

Athena: Mining-based interactive management of text databases. Research Report RJ 10153, IBM Almaden Research Center, San Jose, CA 95120, July 1999.

We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and minimal end-user effort. Athena satisfies these requirements through linear-time classification and clustering engines which are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone’s law of succession instead of Laplace’s law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.
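
To make the smoothing enhancement concrete, here is a minimal sketch of a multinomial Naive Bayes classifier whose word probabilities use Lidstone's law of succession, (count + λ) / (total + λ·|V|). Setting λ = 1 recovers Laplace's law. This is an illustration under stated assumptions, not Athena's implementation: the function names, the toy documents, and the default λ = 0.1 are all hypothetical, and the abstract does not say which λ the authors used.

```python
import math
from collections import Counter


def lidstone_estimate(count, total, vocab_size, lam=0.1):
    """Lidstone-smoothed probability; lam=1.0 gives Laplace's law.

    lam=0.1 is an illustrative default, not a value taken from the paper.
    """
    return (count + lam) / (total + lam * vocab_size)


def train(docs):
    """docs is a list of (label, tokens) pairs.

    Returns per-class word counts, per-class document counts, and the vocabulary.
    """
    word_counts = {}
    class_docs = Counter()
    vocab = set()
    for label, tokens in docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        class_docs[label] += 1
        vocab.update(tokens)
    return word_counts, class_docs, vocab


def classify(tokens, word_counts, class_docs, vocab, lam=0.1):
    """Return the class with the highest log posterior for the given tokens."""
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                p = lidstone_estimate(counts[tok], total, len(vocab), lam)
                score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
TOY_DOCS = [
    ("sports", ["ball", "goal", "team"]),
    ("sports", ["team", "win", "goal"]),
    ("tech", ["code", "bug", "compile"]),
    ("tech", ["code", "test", "bug"]),
]

if __name__ == "__main__":
    wc, cd, vocab = train(TOY_DOCS)
    print(classify(["goal", "team", "code"], wc, cd, vocab))
```

With a small λ, a word that never occurred in a class gets much less probability than under Laplace's law, so observed words dominate the score. This matters for text, where the vocabulary is large and most words are absent from any given class.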
