The Topics Extraction API finds key topics in text. Extracted topics can be used to summarize documents, build navigable word clouds, serve as features for machine learning, visualize large amounts of text, and more! Supporting text snippets let you easily link each topic back to where it appears in your text.
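To make the idea of "topics with supporting text" concrete, here is a minimal frequency-based sketch. It is an assumption for illustration only, not the API's algorithm: candidate topics are unigrams and bigrams that contain no stop words, ranked by how many sentences contain them, and each topic keeps the sentences it came from as its supporting text. The stop list and the `extract_topics` function are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "are", "be", "can", "for", "in", "is", "it",
              "of", "on", "that", "the", "this", "to", "with"}


def extract_topics(text, top_k=5):
    """Rank unigram/bigram candidates by how many sentences contain them.

    Returns a list of (topic, sentence_count, supporting_sentences).
    A frequency baseline for illustration, not the API's algorithm.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = Counter()
    support = defaultdict(list)
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        candidates = set()
        for i, tok in enumerate(tokens):
            if tok in STOP_WORDS:
                continue
            candidates.add(tok)  # unigram candidate
            if i + 1 < len(tokens) and tokens[i + 1] not in STOP_WORDS:
                candidates.add(f"{tok} {tokens[i + 1]}")  # bigram candidate
        for cand in candidates:
            counts[cand] += 1               # count each candidate once per sentence
            support[cand].append(sentence)  # keep the snippet for linking back
    # Most frequent first; on ties prefer longer phrases, then alphabetical order.
    ranked = sorted(counts, key=lambda c: (-counts[c], -len(c.split()), c))
    return [(c, counts[c], support[c]) for c in ranked[:top_k]]


for topic, n, snippets in extract_topics(
        "Text mining is fun. Text mining finds topics. Topics help summarize text.", 2):
    print(topic, n, snippets)
```

A real topic extractor would use better candidate selection and scoring (for example, weighting against a background corpus), but the shape of the output, a topic plus the snippets that support it, is the same.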
When working with text mining applications, we often hear the terms "stop words", "stop word list", or simply "stop list". Stop words are a set of commonly used words in a language. Stop words are commonly eliminated in many text processing applications because they carry little discriminative information.
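Removing stop words amounts to filtering tokens against a set. The sketch below assumes simple lowercase word tokenization and uses a deliberately tiny, hypothetical stop list; published lists are much longer.

```python
import re

# Tiny illustrative stop list (an assumption; published lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "this"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase, split into word tokens, and drop any token in the stop list."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(remove_stop_words("This is a list of the most common words in English"))
# prints ['list', 'most', 'common', 'words', 'english']
```

Using a set for the stop list keeps each membership check constant-time, which matters when filtering large corpora.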
The NGramCounter API generates word and n-gram counts of the desired n-gram size from large amounts of text in any language. You can count the frequencies of words (unigrams), bigrams, trigrams, and so on.
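Counting n-grams comes down to sliding a window of size n over the token sequence and tallying each window. Here is a minimal sketch of that idea, assuming whitespace-delimited text; `count_ngrams` is a hypothetical helper, not part of the API.

```python
import re
from collections import Counter


def count_ngrams(text, n=1):
    """Count n-grams (n >= 1) of lowercase word tokens.

    \\w+ is Unicode-aware in Python 3, so this works for many space-delimited
    languages; languages written without spaces need a proper tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    # zip(tokens[0:], tokens[1:], ..., tokens[n-1:]) yields each window of n tokens.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows)


print(count_ngrams("the cat sat on the mat the cat", 2).most_common(1))
# prints [('the cat', 2)]
```

Setting `n=1` gives plain word frequencies, `n=2` bigrams, `n=3` trigrams, and so on.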
The Text Similarity API computes surface similarity between two pieces of text (long or short) using well-known measures such as Jaccard, Dice and Cosine. Text similarity is used in many applications, such as linking similar entities, de-duplicating product listings, clustering, merging topics, and many other tasks.
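The three measures named above can be sketched on bag-of-words representations. In this version, which is an assumption about the details rather than the API's exact implementation, Jaccard and Dice compare sets of lowercase word tokens, while Cosine compares term-frequency vectors. By convention here, two texts with no tokens score 0.0.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over sets of word tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) over sets of word tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def cosine(a, b):
    """Dot product of term-frequency vectors divided by the product of their norms."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


a, b = "red apple pie", "green apple pie"
print(jaccard(a, b), dice(a, b), cosine(a, b))
```

For this pair, the texts share 2 of 4 distinct words, so Jaccard is 0.5. Dice and Cosine both come out to 2/3, since each text has 3 distinct words that occur once.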
The Sentence Clustering API clusters sentence-level texts, such as legal documents, tweets, Facebook status updates, news articles, and surveys, into logical groups. The API produces meaningful clusters as well as topic labels for the clusters.
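As a rough illustration of clustering short texts and labelling the groups, here is a sketch that uses a swapped-in technique: simple single-pass "leader" clustering. It is not the API's method. Each sentence joins the existing cluster whose first sentence it most resembles by Jaccard similarity on content words, as long as that similarity reaches a threshold. Otherwise it starts a new cluster. Each cluster is labelled with its most frequent content word. The stop list, the threshold of 0.2, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list and threshold, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "are", "i", "in", "is", "it", "my", "of",
              "on", "the", "to", "was"}


def content_words(sentence):
    """Lowercase word tokens with stop words removed (order preserved)."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def jaccard(sa, sb):
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Leader clustering: compare each sentence with each cluster's first sentence."""
    clusters = []  # each item: {"leader": set of words, "members": [sentences]}
    for sentence in sentences:
        words = set(content_words(sentence))
        scores = [jaccard(words, c["leader"]) for c in clusters]
        if scores and max(scores) >= threshold:
            clusters[scores.index(max(scores))]["members"].append(sentence)
        else:
            clusters.append({"leader": words, "members": [sentence]})
    labelled = []
    for c in clusters:
        counts = Counter(w for s in c["members"] for w in content_words(s))
        # Most frequent content word; ties broken alphabetically for determinism.
        label = min(counts, key=lambda w: (-counts[w], w)) if counts else ""
        labelled.append((label, c["members"]))
    return labelled


reviews = ["The battery life is great", "Battery drains fast", "Great battery life",
           "Shipping was slow", "Slow shipping and late delivery"]
for label, members in cluster_sentences(reviews):
    print(label, members)
```

Leader clustering is order-dependent and crude. Its appeal for a sketch is that it needs no preset number of clusters and makes a single pass over the data.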
While it is fairly easy to use a published set of stop words, in many cases such lists are insufficient for certain applications. For example, in clinical texts, terms like "mcg", "dr." and "patient" occur in almost every document, so they carry little discriminative information in that setting.
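One common way to find such domain-specific stop words is to flag terms with very high document frequency in your own corpus. The sketch below is an assumption and not a prescribed method. Its cutoff is a hypothetical `min_doc_fraction` parameter, and its tokenizer drops punctuation, so "dr." is counted as "dr".

```python
import re
from collections import Counter


def corpus_stop_words(documents, min_doc_fraction=0.75):
    """Return terms that appear in at least min_doc_fraction of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document (document frequency).
        doc_freq.update(set(re.findall(r"\w+", doc.lower())))
    cutoff = min_doc_fraction * len(documents)
    return sorted(w for w, df in doc_freq.items() if df >= cutoff)


notes = ["Patient given 50 mcg daily, per Dr. Smith.",
         "Dr. Lee saw the patient; dose 25 mcg.",
         "Patient reports nausea after 10 mcg dose.",
         "The patient was seen by Dr. Patel."]
print(corpus_stop_words(notes))
# prints ['dr', 'mcg', 'patient']
```

The resulting terms can be merged with a published stop list before filtering. The right cutoff depends on the corpus and is worth tuning.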