The Wolfram Language has had extensive support for string manipulation since Mathematica 5, and in Version 10 it provided uniform symbolic access to a huge repository of computable data via the Wolfram Knowledgebase. Taking advantage of both of these fundamental capabilities, along with the new machine learning functionality in Classify and Predict, we're excited to be making further inroads into the rich domains of natural language processing and text analytics with TextCases, new in Version 10.2.
TextCases, like its sister functions Cases and StringCases, finds instances of patterns in a given input. Whereas Cases operates on Wolfram Language expressions and StringCases on strings, TextCases assumes that the input is human-understandable text, from which one can extract known syntactic and semantic entities. These include basic textual types such as words, sentences, and paragraphs, as well as more sophisticated semantic types such as countries, cities, and numbers.
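To make the comparison concrete, here is a minimal sketch that applies each of the three functions to the kind of input it is designed for. The sample inputs and the variable names (exprMatches, stringMatches, sentences) are illustrative assumptions, not taken from the original post. The outputs of Cases and StringCases are shown; the exact result of TextCases is left unstated rather than guessed, since it comes from a trained text model.

```wolfram
(* Cases: match a pattern against the elements of a Wolfram Language expression *)
exprMatches = Cases[{1, "a", 2.5, x}, _Integer]
(* {1} *)

(* StringCases: match a string pattern inside a single string *)
stringMatches = StringCases["a1b22c333", DigitCharacter ..]
(* {"1", "22", "333"} *)

(* TextCases: extract entities of a named textual type from natural-language text *)
sentences = TextCases["The cat sat on the mat. The dog ran away.", "Sentence"]
```

The pattern argument is what changes from one function to the next: a structural pattern such as _Integer for Cases, a string pattern such as DigitCharacter .. for StringCases, and a named text type such as "Sentence" for TextCases.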
As a simple example, let's use TextCases to find instances of countries in a sentence: