What is Text Data Mining (briefly): Text Data Mining (TDM) is the structuring of unstructured text content, followed by the analysis of the resulting structured data. Structuring imposes an order on the text that can then be analyzed; examples include n-gram counts, part-of-speech tagging, and named entity extraction. Analysis falls largely into two categories: supervised and unsupervised learning. Supervised learning either classifies data into pre-defined labels (as in sentiment analysis) or performs regression on continuous data, such as measuring the relationship between web page views and the calculated reading level of the page. Unsupervised learning uses no pre-defined labels; it instead relies on techniques such as similarity clustering and keyword density to obtain a clearer picture of the text content.
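To make the structuring step concrete, here is a minimal sketch in Python of two of the ideas mentioned above: counting n-grams and computing keyword density. It uses only the standard library. The function names (`tokenize`, `ngram_counts`, `keyword_density`), the simple regular-expression tokenizer, and the sample sentence are assumptions made for illustration; they are not taken from any of the tools listed below, and a real project would likely use a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens.

    Note: this simple version ignores sentence boundaries, so an n-gram
    can span the end of one sentence and the start of the next.
    """
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


def keyword_density(tokens, keyword):
    """Return the share of tokens equal to the keyword (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return tokens.count(keyword.lower()) / len(tokens)


# Hypothetical sample text, used only to show the functions working.
sample = "Text mining turns text into data. Text mining then analyzes the data."
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, n=2)

print(bigrams.most_common(3))            # most frequent bigrams first
print(keyword_density(tokens, "text"))   # 0.25 (3 of 12 tokens)
```

The unstructured sentence becomes a table of counts, which is the kind of structured output that supervised or unsupervised analysis can then work with.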
Top 5 Software Tools we use at Baylor Libraries for TDM (listed least to most complex):
(1) Voyant – Perfect for the beginner. Entirely online with no software to download and no registration required.
(4) Mallet (MAchine Learning for LanguagE Toolkit) – Free tool providing TDM machine learning functions.