Overview of NGramCounter


NGramCounter is an API endpoint that generates n-grams from the provided text and counts how often each one occurs. The n-grams and their frequencies are returned in descending order of frequency. For example, if your text is “The cow jumps over the moon. The dog jumps over the moon” and you need bi-gram counts, the output would be as follows:

Here is an example in Spanish. For the text “El gato salta sobre la luna. El hacer salta sobre la luna” and tri-grams, you would get the following output:

N-grams are sets of co-occurring words within a given window. They have many uses in NLP and text mining, from summarization to feature extraction in supervised machine learning tasks.
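To make the definition concrete, here is a minimal Python sketch of a sliding window over a list of words. It only illustrates what an n-gram is; it is not NGramCounter's implementation, and the function name ngrams is made up for this example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    # A window of size n fits len(tokens) - n + 1 times; if n is larger than
    # the token list, the range is empty and no n-grams are produced.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cow jumps over the moon".split()
print(ngrams(tokens, 2))
# [('the', 'cow'), ('cow', 'jumps'), ('jumps', 'over'), ('over', 'the'), ('the', 'moon')]
```

With n set to 1 the same function returns single words, which matches the idea of using 1 to get plain word counts.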

Using the API

To use this API endpoint, you first need a valid API key.

NGramCounter Parameters

  • text: the text in any language to be used for n-gram generation (max 1MB of text per API call) *
  • n-gram: the size of n-gram (n can be 1-10) *
  • case-sensitive: true/false [ true only if you want to distinguish the capitalization in text; default is false ] (optional)

The first two parameters are mandatory (marked with *) and the third is optional. The text parameter accepts any kind of plain text in any language, from news articles to Tweets to customer support emails. Please provide plain text only. The n-gram parameter is a number from 1 to 10: use 1 to generate counts of single words, or 2 or more for larger n-grams. The case-sensitive parameter specifies whether differently capitalized forms of a word should be treated as distinct. For example, if you set case-sensitive to true, “Love” and “love” are regarded as two different words. If you have no use for this, omit the parameter; by default, all words are lowercased.
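The parameter rules above can be checked on the client before a call is made. The following Python sketch is one way to do that. The parameter names (text, n-gram, case-sensitive) come from the list above, while the function name build_params, the exact byte limit behind "1MB", and the way the values are encoded are assumptions for illustration, not part of the documented API.

```python
# Assumption: "max 1MB" is interpreted here as 1024 * 1024 bytes of UTF-8 text.
MAX_TEXT_BYTES = 1024 * 1024


def build_params(text, n, case_sensitive=None):
    """Validate inputs and return a dict of NGramCounter parameters (hypothetical helper)."""
    if not text:
        raise ValueError("text is mandatory")
    if len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("text exceeds the 1MB per-call limit")
    if not 1 <= n <= 10:
        raise ValueError("n-gram must be between 1 and 10")

    params = {"text": text, "n-gram": n}
    # case-sensitive is optional; leaving it out keeps the default (all words lowercased).
    if case_sensitive is not None:
        params["case-sensitive"] = "true" if case_sensitive else "false"
    return params


print(build_params("I love rainy days.", 3))
print(build_params("I Love rainy days.", 1, case_sensitive=True))
```

Rejecting bad values locally avoids spending an API call on a request that cannot succeed.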

Sample Input

Here is an example of NGramCounter parameters in Mashape for generating tri-grams for the text “I love rainy days. How I wish it was raining ! How I wish it was snowing !”

Sample Output

Below is the sample output from NGramCounter for the above text “I love rainy days. How I wish it was raining ! How I wish it was snowing !”

As you can see, the output contains the tri-grams of the above text and their counts, in descending order of frequency. Note that NGramCounter performs basic sentence segmentation in order to compute n-grams. Please keep punctuation in the text so that sentences are segmented reasonably well (i.e. do not attempt to completely clean out the text). Also note that NGramCounter is language-neutral, so you can generate n-grams in multiple languages.
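To show why punctuation matters, here is a rough local approximation in Python of sentence-aware n-gram counting. The splitting rule (break on ".", "!" and "?") and the function name count_ngrams are assumptions made for this sketch; the service's actual segmentation and output format are not documented here and may differ.

```python
import re
from collections import Counter


def count_ngrams(text, n, case_sensitive=False):
    """Count n-grams per sentence and return (ngram, count) pairs, most frequent first."""
    if not case_sensitive:
        text = text.lower()
    counts = Counter()
    # Assumed segmentation rule: a sentence ends at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"\w+", sentence)
        # The window never crosses a sentence boundary.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common()


sample = "I love rainy days. How I wish it was raining ! How I wish it was snowing !"
for ngram, count in count_ngrams(sample, 3):
    print(count, ngram)
```

In this sketch "how i wish" is counted twice, while "days how i" never appears because it would span two sentences. If the periods and exclamation marks were stripped first, such cross-sentence tri-grams would be counted.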

cURL Example

Here is a cURL example of how you can use this API:

Languages Supported

NGramCounter is language neutral, so it should work for most languages.

See Also