A Roadmap to Text Mining and Web Mining
- Under Construction, Last Modified: Jan 8, 2002 -
Text Mining in General
M. Hearst, Untangling Text Data Mining, ACL99
Mining in Textual Mountains: An Interview with Marti Hearst, Mappa Mundi, 1999
Semio Co., Text Mining and the Knowledge Management Space, 1999
D. Radev, Text Data Mining: An Overview
Workshops
PAKDD-2002 Workshop on Text Mining
SDM-2002 Text Mining Workshop (Text Mining 2002)
ICDM-2001 Workshop on Text Mining (TextDM'2001)
SDM-2001 Text Mining Workshop (TextMine'01)
KDD-2000 Workshop on Text Mining
UMN-IMA Text Mining Workshop (2000)
IJCAI-99 Workshop on Text Mining
ECML-98 Workshop on Text Mining
Tutorials
D. Mladenic & M. Grobelnik, ECML/PKDD-2001 Tutorial on Text Mining
Classes
M. Hearst, Seminar on Text Data Mining (SIMS, UC Berkeley)
W. Cohen, Machine Learning for Text Mining (LTI, CS, CMU)
W. Pratt, Text Mining (ICS, UCI)
Links
Google Directory: Text Mining
Open Directory: Text Mining
Web Mining in General
R. Kosala et al., Web Mining Research: A Survey, SIGKDD Explorations, 2000
R. Cooley et al., Web Mining: Information and Pattern Discovery on the World Wide Web
D. Greening, Data Mining on the Web, WebTechniques, 2000
People
Sergey Brin
Oren Etzioni
David W. Embley
Filippo Menczer
Bamshad Mobasher
Workshops
ECML/PKDD-2001 Semantic Web Mining Workshop
SDM-2001 Workshop on Web Mining
PRICAI-2000 Workshop on Text and Web Mining
INFWET97
Task-Driven Text Mining
Text Categorization
Y. Yang, An Evaluation of Statistical Approaches to Text Categorization, Journal of IR, 1999
Andrew McCallum
Kamal Nigam
David D. Lewis
AAAI98 Workshop on Learning for Text Categorization
Text Mining, Automatic Classification & Indexing
Document Clustering
Clustering Text & Useful Scripts
Scatter/Gather
I. Dhillon, Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning, KDD01.
Rule Mining from Text
H. Ahonen-Myka et al., Applying Data Mining Techniques in Text Analysis, Technical Report, 1997
R. Feldman et al., FACT, 1996
R. Feldman et al., Knowledge Discovery in Texts (KDT), KDD95
Rayid Ghani
Heikki Mannila
Relationship Mining
Y. Park et al., Hybrid Text Mining for Finding Abbreviations and Their Definitions, EMNLP-2001
N. Sundaresan et al., Mining the Web for Relations, WWW9, 2000
L. Larkey et al., Acrophile: An Automated Acronym Extractor and Server, ACM DL-2000
J. Yi et al., Mining the Web for Acronyms Using the Duality of Patterns and Relations, ACM CIKM-99 Workshop on Web Information and Data Management
Topic Detection
Chris Clifton
Text Segmentation
UC Berkeley TextTiling
Text Summarization
Text Summarization Technology
Knowledge Understanding
Udo Hahn
Text Navigation, Visualization and User Interface
UC Berkeley Cat-a-Cone
MIT Shakespeare Project
Data Visualization
Methodology-Driven Text Mining
Neural Networks
WEBSOM - SOM-based Text Mining
D. Merkl et al., Text Data Mining, In A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text, 1998
Vitali Schetinin
Evolutionary Computation
A. Freitas, A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, In Advances in Evolutionary Computation, 2002.
Parallel Text Mining
J. Chen, Parallel Text Mining for Cross-Language Information Retrieval Using a Statistical Translation Model
Hyperlink Analysis
S. Chakrabarti et al., Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, WWW7, 1998
Application-Driven Text Mining
Bioinformatics
Text Mining for Bioinformatics Tutorial
Text Mining for Molecular Biology
Business and Customer Relationship Management (CRM)
D. Evans, Text Mining Towards Decision Supports, 1999
C. Halliman, Business Intelligence Using Smart Techniques: Environmental Scanning Using Text Mining and Competitor Analysis Using Scenarios and Manual Simulation
Text Mining in the Noisy World
Data Mining in the Noisy World
Z. Tian et al., An N-gram-based Approach for Detecting Approximately Duplicate Database Records, International Journal on Digital Libraries, 2001
W. Cohen, Hardening Soft Information Sources, KDD00
M. Hernandez et al., Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem, Data Mining and Knowledge Discovery, 1998
Merge-Purge and Data Cleaning
Machine Learning in the Noisy World
P. Domingos, Unifying Instance-Based and Rule-Based Induction, Machine Learning, 1996
C. Janikow, FID: Fuzzy Decision Trees
Information Retrieval in the Noisy World
G. Bordogna et al., Modeling Vagueness in Information Retrieval, ESSIR, 2000
Databases in the Noisy World (a.k.a. Information Integration)
H. Lu et al., Discovering and Reconciling Semantic Conflicts: A Data Mining Perspective, DS-7, 1997
Tools for Text Mining (or Related Fields)
Information Extraction
D. Appelt et al., Introduction to Information Extraction Technology, IJCAI-99 Tutorial
H. Cunningham, Information Extraction: A User Guide, Technical Report, 1999
I. Muslea, Extraction Patterns for Information Extraction, AAAI-99 Workshop on Machine Learning for Information Extraction
I. Muslea, Extraction Patterns: From Information Extraction to Wrapper Induction, Technical Report, 1998
C. Cardie, Empirical Methods in Information Extraction, AI Magazine, 1997
Repository
I. Muslea, RISE: Repository of Information Extraction
Workshops
AAAI-99 Workshop on Machine Learning for Information Extraction
Machine Learning Techniques for Information Extraction
M. Califf et al., Relational Learning of Pattern-Match Rules for Information Extraction, AAAI-99
Mark Craven
Tom Mitchell
Stephen Soderland
Statistical Information Extraction
Dayne Freitag
Hugo Zaragoza
Wrapper Induction
N. Kushmerick, Wrapper induction: Efficiency and Expressiveness, Artificial Intelligence, 2000
Information Extraction and Text Mining
IJCAI-2001 Workshop on Adaptive Text Extraction and Mining
Information Extraction and Information Retrieval
J. Bear et al., Using Information Extraction to Improve Document Retrieval, TREC6, 1997
Natural Language Processing
Computational Linguistics Journal
COLING-2002 | ACL-2002
C. Manning et al., Foundations of Statistical Natural Language Processing
D. Jurafsky et al., Speech and Language Processing
LIA Publication
Text Mining for Natural Language Processing
D. Lin et al., Discovery of Inference Rules for Question-Answering, Journal of Natural Language Engineering, 2001
D. Lin et al., Induction of Semantic Classes from Natural Language Text, KDD-2001
Natural Language Processing and Databases
NLDB-2002
Machine Learning
Machine Learning Journal
ICML-2002
Repository
UCI Machine Learning Repository
Machine Learning on Text
UW-Madison: Machine Learning for Text Analysis (2000)
ICML-99 Workshop on Machine Learning in Text Data Analysis
Information Retrieval
Information Retrieval Journal
ACM SIGIR-2002
IR Resources
WebIR
The Center for Intelligent Information Retrieval at UMass
R. Baeza-Yates et al., Modern Information Retrieval
Information Retrieval for Text Mining
J. Neto et. al, Document Clustering and Text Summarization
Information Retrieval on the Web
Data Engineering Special Issue on Next-Generation Web Search
AAAI-2000 Workshop on AI for Web Search
V. Raghavan, Information Retrieval on the World-Wide Web, 1997
Information Retrieval and Machine Learning
R. Belew et al., Machine Learning and Information Retrieval
Information Retrieval using Natural Language Processing
A. Arampatzis, Linguistically-Motivated Information Retrieval
Data Mining
Data Mining and Knowledge Discovery Journal
ACM KDD-2002 | ICDM-2002
J. Han et al., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001
KDnuggets Directory
Databases
VLDB Journal
ACM SIGMOD-2002
D. Sullivan, Document Warehousing and Text Mining
Web
World Wide Web Journal
WWW-2002
WWW + Databases: WebDB
Digital Libraries
D-Lib Forum & Magazine
ACM JCDL-2002
Intelligent Agents
Autonomous Agents and Multi-Agent Systems Journal
AAMAS-2002
Agent Web
BotSpot FAQ
Web Agent
C. Petrie, Agent-Based Engineering, the Web, and Intelligence, IEEE Expert, 1996
Syskill & Webert: Identifying Interesting Web Sites
Agent and Information Retrieval
Agent-Based IR
People
Haym Hirsh
Foster Provost
Jason Rennie
Sean Slattery
Charles Elkan
Christos Faloutsos
Geoff Webb
Osmar R. Zaiane
W. Fan
Bing Liu
Institutions
UT-Austin Machine Learning Group
CMU Text Learning Group
University of Helsinki FDK Data Mining and Machine Learning Group
Albert-Ludwigs-University Computational Linguistics Research Group
University of Waikato Text Mining Group
Text Mining at Kent Ridge Digital Lab (KRDL), Singapore
Text Mining at PMSI, France
Imperial College Data Mining Group
Text Mining at KI, Germany
XRCE MLTT
LIA TLN
Projects
IBM Clever
IBM Data Abstraction
Web->KB
WebWatcher
STARTS
FAQFinder
Combining Machine Learning and Natural Language Processing for Knowledge Discovery in Text Corpora
IBM-TRL Text Mining
Products
Brosis Xcise
iCrossReader
Intelligent Miner for Text (IBM)
Leximancer
SRA
Temis
TextAnalyst (Megaputer)
Text Mining by Filter Composition
Text Mining Tools (The Data Warehousing Information Center)
VantagePoint
VisualText (TextAnalysis)
WizSoft
WordStat (Provalis Research)
Alembic Workbench (MITRE)
INTEX (LADL)
LexGram (University of Stuttgart)
LinguistX Platform (InXight)
PAGE (DFKI)
Pinocchio (ITC-Irst)