Here's a result from ReelTwo's Classification System circa 2003 (based on a Bayesian learner; related to the U Waikato WEKA ML system), if you'd be up for a comparison:
10 categories
2,535 documents
15 sec build time (~170 docs/sec; these were short news abstracts; see pdf for example)
0.9121 F-measure
Build Time is the time to load, model and evaluate (using Leave-One-Out evaluation) a dataset on a WinXP/1GHz Celeron/256MB computer. F-Measure is the micro-averaged F-Measure across all categories in the dataset.
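For anyone wanting to line up their own numbers against the figure above, here is a minimal sketch of a micro-averaged F-measure. It assumes you already have per-category true-positive, false-positive and false-negative counts; the function name, category names and counts are made up for illustration and are not from the ReelTwo system or its results.

```python
from typing import Dict, Tuple


def micro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-category (tp, fp, fn) counts.

    Micro-averaging pools the counts over all categories first and
    only then computes precision, recall and F1 once.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct positive predictions: define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two categories (illustrative only).
example = {"sports": (90, 10, 5), "politics": (40, 5, 15)}
score = micro_f_measure(example)
```

Because the counts are pooled before the score is computed, large categories weigh more than small ones, which is the main difference from a macro-average over per-category F-measures.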
There are some well-known text classification datasets, e.g. the Reuters news dataset from David Lewis of Bell Labs:
More background here: