What is Sentence Clustering?

The Sentence Clustering API groups sentence level texts (e.g. from news articles, customer support emails, support tickets, blog comments, customer reviews, etc) or short texts (e.g. Tweets, FourSquare tips, sms text messages, Facebook status updates) into logical groups. This API not only produces meaningful clusters, it also provides meaningful topic cues which would make analysis of the clusters much easier than not having any labels or topics. For example, let us say you have simple sentences as follows which you would like to cluster:

    The terrorist attack was really horrifying.
    People were really afraid after the terrorist attack
    The terrorists escaped after the attacks
    CNN news reporter went to the crime scene
    CNN news reports today on this violant crime

The terrorist attack was really horrifying.

People were really afraid after the terrorist attack

The terrorists escaped after the attacks

CNN news reporter went to the crime scene

CNN news reports today on this violant crime

The output from the SentenceClustering endpoint would be as follows:

"results": {
        "numOfClusters": 3,
        "clusters": [
            {
                "clusterScore": 1.4905037742070442,
                "clusterSize": 3,
                <strong>"<span style="color: #b22222;">clusterTopics": "[terrorist attack:1.49, attack:1.02]</span>",</strong>
                "clusteredSentences": [
                    "0000: The terrorist attack was really horrifying",
                    "0001: People were really afraid after the terrorist attack",
                    "0002: The terrorists escaped after the attacks "
                ]
            },
            {
                "clusterScore": 1.0155816276222989,
                "clusterSize": 3,
               <strong> "<span style="color: #b22222;">clusterTopics": "[crime:1.02, cnn:1.02]</span>",</strong>
                "clusteredSentences": [
                    "0003: CNN news reporter went to the crime scene",
                    "0004: CNN news reports today on this violant crime",
                    "0005: This crime was reported by CNN"
                ]
            },
            {
                "clusterScore": 0,
                "clusterSize": 0,
                <strong>"<span style="color: #b22222;">clusterTopics": "[sentences_with_no_cluster_membership]</span>"</strong>,
                "clusteredSentences": []
            }
        ]
    }

"results": {

"numOfClusters": 3,

"clusters": [

{

"clusterScore": 1.4905037742070442,

"clusterSize": 3,

"clusterTopics": "[terrorist attack:1.49, attack:1.02]",

"clusteredSentences": [

"0000: The terrorist attack was really horrifying",

"0001: People were really afraid after the terrorist attack",

"0002: The terrorists escaped after the attacks "

]

{

"clusterScore": 1.0155816276222989,

"clusterSize": 3,

"clusterTopics": "[crime:1.02, cnn:1.02]",

"clusteredSentences": [

"0003: CNN news reporter went to the crime scene",

"0004: CNN news reports today on this violant crime",

"0005: This crime was reported by CNN"

]

{

"clusterScore": 0,

"clusterSize": 0,

"clusterTopics": "[sentences_with_no_cluster_membership]",

"clusteredSentences": []

}

]

}

As you can see above, the sentences are in fact, grouped into logical buckets. The first bucket is about the terrorist attack and the second bucket is about CNN reporting on the crime. Each cluster has a score which is a function of its size and topic informativeness. Each cluster also comes with topic cues under “clusterTopics” which describes the clusters. Sentences that are not found to be part of any cluster are put under “sentences_with_no_cluster_membership” and when this happens, the unclustered sentences can be thought of as being in their own cluster of size “1”. In this specific example however, all sentences were successfully clustered. The sentence ids and topic cues can be used for further analysis and can help with organization of the cluster results.

How is this different from Document Clustering?

Document clustering is about grouping similar documents (e.g. web pages) into logical groups (e.g. web pages about sports, pages about entertainment, pages about politics, etc). The size of each textual unit being clustered is much larger than clustering a set of sentences or pieces of short texts. Conceptually however, these two tasks of clustering sentences and documents are similar. Document clustering has traditionally been the focus of many research groups up until recently. Now, with micro-format texts all over the Web, there is a need for algorithms that actually work well at the sentence level. The ClusterSentences endpoint uses a novel algorithm which is the result of recent research in this area. This algorithm is different from other algorithms like K-means because the focus is not only to cluster sentences (and short texts) into logical buckets but also to simultaneously generate meaningful topics for each cluster.

What type of texts can I cluster?

Essentially, any documents that have sentence level texts. Here are a few examples:

News articles
Customer Support Emails
Support Tickets
Incident Reports
Tweets about a brand, product, company or person
Search Results or Product listings (e.g. eBay, Etsy)
Text messages (SMS)
User Reviews (e.g. Yelp, YellowPages, Urban Spoon)
Micro-reviews (e.g. Foursquare tips)
Clinical texts

Start Clustering Sentences

Before we start…

Before you start, please ensure that you have a valid API key to be able to access the API.

Clustering Algorithm Key Facts

The clustering algorithm performs soft-clustering, meaning the same sentence or short text may appear in different clusters
You get meaningful labels for each cluster
You can use the sentence ids and topics cues to further merge clusters
The current algorithm has only been fine-tuned for the English language

Sentence Clustering API Request

The Sentence Clustering endpoint accepts a JSON request via POST. It takes in two parameters:

Parameter name	Type	Required?	Values
type	text	Yes	“chunk” or “pre-sentenced”
text	text	Yes	a chunk of text or pre-sentenced text (array of sentences)

Clustering a “chunk” or “blob” of text

Clustering a “chunk” of text simply means, you are leaving the clustering endpoint to determine sentence boundaries. We use our default sentencer to extract sentences from text. This typically works well for news articles or well-written texts. For short texts such as comments on blog articles or Tweets, the “pre-sentenced” option would be more appropriate. If you are sending in concatenation of texts as chunk (e.g. concatenated news articles, concatenated Tweets, etc), please ensure that punctuation is available between the two textual units (a “.” should be sufficient). Here is an example JSON request using the “chunk” option for clustering.

Example of Plain JSON request with “chunk”

This request uses the “chunk” option where a chunk of text is sent in with no sentence segmentation.

{
  "type": "chunk",
  "text": "WASHINGTON (AP) — Key elements of the economic proposals President Barack Obama will outline in his State of the Union address Tuesday appear to be aimed at driving the debate in the 2016 election on income inequality and middle-class economic issues, rather than setting a realistic agenda for Congress.  Obama's calls for increasing taxes on the wealthy, making community college free for many students and expanding paid leave for workers stand little chance of winning approval from the new Republican majority on Capitol Hill. But the debate over middle-class economics is looking critical for the coming campaign.    Inequality_and especially the growing opportunity gap_have become the top litmus test of seriousness for 2016,  said Robert Putnam, a Harvard political scientist who has discussed inequality issues with the president and his advisers.  The entry ticket for the presidential sweepstakes is that you have a policy — some policy — for dealing with this issue.   Indeed, potential Republican candidates Jeb Bush and Mitt Romney have been talking openly about income inequality and the need to give lower-earning Americans more opportunities. On the Democratic side, Massachusetts Sen. Elizabeth Warren appears intent on keeping the party focused on a populist economic agenda, even if she doesn't plan to run for president herself.  As the nation's attention increasingly turns to the 2016 election, the Obama White House is making clear that it still wants to set the terms of the economic conversation.   I think we should have a debate in this country between middle-class economics and trickle-down economics and see if we can come to an agreement on the things we do agree on,  White House senior adviser Dan Pfeiffer said Sunday on CBS's  Face the Nation.   The president's advisers argue that's a debate they have won previously, including in Obama's victory over Romney in the 2012 presidential campaign and the fiscal cliff fight with Congress that led to the raising of George W. Bush-era tax rates for the wealthiest Americans. ......remaining text..."
}

{

"type": "chunk",

"text": "WASHINGTON (AP) — Key elements of the economic proposals President Barack Obama will outline in his State of the Union address Tuesday appear to be aimed at driving the debate in the 2016 election on income inequality and middle-class economic issues, rather than setting a realistic agenda for Congress. Obama's calls for increasing taxes on the wealthy, making community college free for many students and expanding paid leave for workers stand little chance of winning approval from the new Republican majority on Capitol Hill. But the debate over middle-class economics is looking critical for the coming campaign. Inequality_and especially the growing opportunity gap_have become the top litmus test of seriousness for 2016, said Robert Putnam, a Harvard political scientist who has discussed inequality issues with the president and his advisers. The entry ticket for the presidential sweepstakes is that you have a policy — some policy — for dealing with this issue. Indeed, potential Republican candidates Jeb Bush and Mitt Romney have been talking openly about income inequality and the need to give lower-earning Americans more opportunities. On the Democratic side, Massachusetts Sen. Elizabeth Warren appears intent on keeping the party focused on a populist economic agenda, even if she doesn't plan to run for president herself. As the nation's attention increasingly turns to the 2016 election, the Obama White House is making clear that it still wants to set the terms of the economic conversation. I think we should have a debate in this country between middle-class economics and trickle-down economics and see if we can come to an agreement on the things we do agree on, White House senior adviser Dan Pfeiffer said Sunday on CBS's Face the Nation. The president's advisers argue that's a debate they have won previously, including in Obama's victory over Romney in the 2012 presidential campaign and the fiscal cliff fight with Congress that led to the raising of George W. Bush-era tax rates for the wealthiest Americans. ......remaining text..."

}

Example JSON request using Unirest Java Library (with “chunk” option)

If you want to send the JSON request in Java, this is how it would look if using the third party Unirest library:

// These code snippets use an open-source library. http://unirest.io/java
HttpResponse response = Unirest.post("https://rxnlp-core.p.mashape.com/generateClusters")
.header("X-Mashape-Key", "<your_api_key>")
.header("Content-Type", "application/json")
.header("Accept", "application/json")
.body("{'type':'chunk','text':'WASHINGTON (AP) — Key elements of the economic proposals President Barack Obama will outline in his State of the Union address Tuesday appear to be aimed at driving the debate in the 2016 election on income inequality and middle-class economic issues, rather than ....remaining text.'}")
.asJson();

// These code snippets use an open-source library. http://unirest.io/java

HttpResponse response = Unirest.post("https://rxnlp-core.p.mashape.com/generateClusters")

.header("X-Mashape-Key", "<your_api_key>")

.header("Content-Type", "application/json")

.header("Accept", "application/json")

.body("{'type':'chunk','text':'WASHINGTON (AP) — Key elements of the economic proposals President Barack Obama will outline in his State of the Union address Tuesday appear to be aimed at driving the debate in the 2016 election on income inequality and middle-class economic issues, rather than ....remaining text.'}")

.asJson();

Example of Plain JSON request with “pre-sentenced” option

This request uses the pre-sentenced option where sentences are already segmented into sentence level texts or you have a set of short texts to cluster. Each short text (.e.g Tweets) can be sent in as a separate sentence using the pre-sentenced option.

{
  "type": "pre-sentenced",
  "text": [
             {"sentence":": Welcome to Tanzania ! ý Citibank Poor Credit Cards"},
             {"sentence":"@AdamCSmith good idea ! chase and citibank have been the worst blood suckers for me ..."},
             {"sentence":"Add your signature to #citibank sucks page :"},
             {"sentence":"Almost had a heart attack looking at my Citibank Student Loan account... "},
            ]
}

{

"type": "pre-sentenced",

"text": [

{"sentence":": Welcome to Tanzania ! ý Citibank Poor Credit Cards"},

{"sentence":"@AdamCSmith good idea ! chase and citibank have been the worst blood suckers for me ..."},

{"sentence":"Add your signature to #citibank sucks page :"},

{"sentence":"Almost had a heart attack looking at my Citibank Student Loan account... "},

]

}

Encoding Issues with JSON

Please note that if you do not escape special characters appropriately and do not use the proper encoding, this can cause “400 Bad Request” errors. Its recommended that you use a JSON wrapper that does the encoding/decoding for you.

ClusterSentences API Response

The ClusterSentences API endpoint returns several values as the output:

Parameter name	Type	Short Description
clusterScore	double	score of cluster - function of topics and cluster size
clusterSize	integer	how many sentences or short texts are part of this cluster?
clusterTopics	text	meaningful labels describing your clusters
clusteredSentences	array of texts	list of sentences with corresponding ids that are part of the cluster

Cluster Score (clusterScore)

The cluster score is the function of the topic meaningfulness and size of the cluster. It can be used to rank clusters or to prune unwanted clusters. For example, if you have 500 clusters, you can choose to use the top 100.

Cluster Size (clusterSize)

This reflects the number of sentences within the cluster. A larger cluster does not necessarily mean the quality of the cluster is better. In fact, if you have all sentences that are related to only one particular topic, you may get back one large cluster rather than several meaningful ones. This defeats the purpose of clustering and you may have to re-analyze your input or ignore the really large clusters that significantly deviate from the size of the other clusters.

Cluster Topics (clusteredTopics)

This is the **label **or **topic **for the clusters. The topics of the cluster try to describe the contents of the cluster. In the example below you would see that the first topic is related to the customer service of Citibank which is thought to be lousy. The different topics are separated by comma and each topic has a corresponding score. You can choose to use the first best topic to represent the cluster or you can use the top N most diverse topics.

Clustered Sentences (clusterSentences)

This is the list of sentences with corresponding ids that are part of a cluster. The clustered sentences are numbered from 0 upto the number of sentences provided in sequence. If you use the chunk option, the sentences are numbered after the text has been segmented into a set of sentences. If you use the pre-sentenced option, then the sentences are numbered in the order sent in the request. The sentence ids along with the topic cues can be used to merge clusters in map-reduce tasks. You can use measures such as Jaccard, Cosine and Dice to measure how similar the sets of sentence ids and topics are. Note that for the sentences you can ommit the actual sentences itself to measure similarity. You just have to deal with the ids. For example, if you have two clusters with the following ids: cluster1: “0001 0002 0003” cluster2: “0001 0002 0004” The Jaccard, Cosine and Dice score from the TextSimilarity API are as follows:

{
    "cosine": "0.667",
    "jaccard": "0.500",
    "dice": "0.667"
}

{

"cosine": "0.667",

"jaccard": "0.500",

"dice": "0.667"

}

These scores show that the clusters do overlap and if the overlap is greater than a specific threshold, the two clusters may be merged (reduced).

Example JSON Response

Please note that this is not a complete response, just a snapshot.

{
  "results": {
    "numOfClusters": 34,
    "clusters": [
      {
        "clusterScore": 10.001701006383902,
        "clusterSize": 53,
        "clusterTopics": "[custom servic:10.00, citibank lousi:7.81, citibank lousi servic:6.96, lousi:4.88]",
        "clusteredSentences": [
          "0001: @AdamCSmith good idea ! chase and citibank have been the worst blood suckers for me , and I will be done with them real soon !",
          "0002: Add your signature to #citibank sucks page :",
          "0016: @BillMcCabe I 've been going rounds with Comcast and Verizon customer service reps all week . I 've heard bad things about Citibank too .",
          "0018: Bodo lah CITIBANK ! Your service is CRAP ! ! ! !",
          "0019: Breaking Now ! Terrible service : I HAD a Citibank supplementary credit card under my father 's name but I rarely use ...",
          "0020: But their systems are so archaic , that they are customer unfriendly .. #citibank #fail",
          "0021: calling citibank to check the balance on a gift card . this line really sucks ",
          "0026: christinagan What are you switching to ? CitiBank sucks but they did not even ask why I was closing just gave me my money ",
          "0030: Citibank : your customer service sucks .",
          "0031: Citibank , your student loan site is awful .",
          "0033: Citibank bankers are accused of deceiving customers with the sale of Lehman Bros products ",
          "0037: Citibank credit card customer service sucks beyond belief ",
          "0039: #citibank creditcard service sucks big time .",
          "0040: @citibank customer service poor",
          "0041: citibank #fail .... Worst customer experience ! ! ! !",
          "0042: Citibank Fraud detection absolutely sucks ",
          "0043: @citibank has the absolute worst customer service .",
          "0044: citibank has the worst credit cards ever .",
          "0046: Citibank is my mortgage company . Well my past two pays were real crappy d/t taking care of family …",
          "0047: Citibank is such a terrible bank with terrible customer service .",
          "0048: Citibank MasterCard is the worst credit card I 've ever had Do not trust the bank that never sleeps Lack of sleep makes it grumpy Dump Citi",
          "0051: Citibank 's hotline is useless ..",
          "0056: #citibank telephone banking sucks",
          "0059: Dear #citibank , your website sucks because it does not allow me to view my statements on my mac because it will not detect acrobat reader",
          "0060: Do not ever take out any TJX credit card . It has the worst customer service .",
          "0061: ECR Citibank credit report information is reliable ? ý Citibank Poor ý : Citibank offers customers the opportunity t ...",
          "0063: @fredlyfriendly #citibank has terrible customer service , high fees , and crappy management #MOVEYOURMONEY !",
          "0068: GRRR ... Citibank has the most annoying customer service !",
          "0076: hsbc #fail : inept customer service reps  NOT a &global& bank , intl wire transfers are a nightmare . poor boo ",
          "0079: I love it when a company gives me awful service and then sends me a satisfaction survey ",
          "0080: I really hate Citibank though we 've been their long-time client . Service is mediocre at best ",
          "0082: I think Citibank Philippines has a lousy call center .",
          "0083: I think Citibank Philippines has a lousy call center . I always get unprofessional customer service reps when I call them ",
          "0090: jesus , I may switch to chase sapphire simply bc citibank is annoying the hell out of me with not allowing me to skip to customer ",
          "0092: Just left Citibank Gold-Useless bunch .",
          "0093: @kellipoliska The worst Customer Service experience I ever received was definitely #Citibank",
          "0094: @libreriasjosue very bad service ... CITI bank is charging for everything , canceling credit lines , making people crazy !",
          "0096: Make your opinion more visible and sign #citibank sucks page :",
          "0102: Nice ! Customer service was useless .",
          "0103: Nice . Dumped Citibank cards after years for poor customer service .",
          "0105: No customer service at Citibank . Lack of phone access to human .",
          "0107: Not only does citibank fund more coal power plants than any other bank-they raise your interest rates  have terrible customer service .",
          "0110: Old news but citibank 's customer service sucks ....",
          "0111: OMG Suzi Orman just confirmed . Citibank has the worst credit card percents ",
          "0112: On the phone with Citi bank credit card customer service , horrible customer service , rude support person ",
          "0117: @poojster very bad citi bank ke employees se ab yeh kaam karate hai , clowns pathetic @mac_outlaws",
          "0123: RT @FredlyFriendly : @fredlyfriendly #citibank has terrible customer service , high fees , and crappy management #MOVEYOURMONEY !",
          "0133: Time to shop for a bank . RT @sarahlynnpablo : Citibank : your customer service sucks .",
          "0135: Today I realized that Citibank sucks ! Their customer service is horrible .",
          "0137: Ver bad customer response by #Citibank #fail ! ! ! !",
          "0148: why is it so hard to get in touch with the Citibank customer service personnel ? or am I calling the wrong number ?",
          "0149: Worst experience with #Citibank service ...",
          "0150: Wow Citibank sure has the worst interest rates ever"
        ]
      },
      {
        "clusterScore": 0,
        "clusterSize": 6,
        "clusterTopics": "[sentences_with_no_cluster_membership]",
        "clusteredSentences": [
          "0034:#citibank called me back after the tedious process I went thru to submit a complaint on that blardy rude call centre staff ",
          "0067:Got my euros ! : D but I 'm super disappointed with citibank !",
          "0070:Having a sunrise at c bean m Kiara . Cooling off after arguing w incompetent pol at citibank damansara perdana",
          "0078:I joined Twitter just so that I could tell people how UNHELPFUL AND TERRIBLE CITIBANK IS .",
          "0100:my citibank account is all screwed up and my house was broken into over the weekend .",
          "0151:Wtf ... #citibank playing FUCKING GAMES wit my interest ... !"
        ]
      }
    ]
  }
}

{

"results": {

"numOfClusters": 34,

"clusters": [

{

"clusterScore": 10.001701006383902,

"clusterSize": 53,

"clusterTopics": "[custom servic:10.00, citibank lousi:7.81, citibank lousi servic:6.96, lousi:4.88]",

"clusteredSentences": [

"0001: @AdamCSmith good idea ! chase and citibank have been the worst blood suckers for me , and I will be done with them real soon !",

"0002: Add your signature to #citibank sucks page :",

"0016: @BillMcCabe I 've been going rounds with Comcast and Verizon customer service reps all week . I 've heard bad things about Citibank too .",

"0018: Bodo lah CITIBANK ! Your service is CRAP ! ! ! !",

"0019: Breaking Now ! Terrible service : I HAD a Citibank supplementary credit card under my father 's name but I rarely use ...",

"0020: But their systems are so archaic , that they are customer unfriendly .. #citibank #fail",

"0021: calling citibank to check the balance on a gift card . this line really sucks ",

"0026: christinagan What are you switching to ? CitiBank sucks but they did not even ask why I was closing just gave me my money ",

"0030: Citibank : your customer service sucks .",

"0031: Citibank , your student loan site is awful .",

"0033: Citibank bankers are accused of deceiving customers with the sale of Lehman Bros products ",

"0037: Citibank credit card customer service sucks beyond belief ",

"0039: #citibank creditcard service sucks big time .",

"0040: @citibank customer service poor",

"0041: citibank #fail .... Worst customer experience ! ! ! !",

"0042: Citibank Fraud detection absolutely sucks ",

"0043: @citibank has the absolute worst customer service .",

"0044: citibank has the worst credit cards ever .",

"0046: Citibank is my mortgage company . Well my past two pays were real crappy d/t taking care of family …",

"0047: Citibank is such a terrible bank with terrible customer service .",

"0048: Citibank MasterCard is the worst credit card I 've ever had Do not trust the bank that never sleeps Lack of sleep makes it grumpy Dump Citi",

"0051: Citibank 's hotline is useless ..",

"0056: #citibank telephone banking sucks",

"0059: Dear #citibank , your website sucks because it does not allow me to view my statements on my mac because it will not detect acrobat reader",

"0060: Do not ever take out any TJX credit card . It has the worst customer service .",

"0061: ECR Citibank credit report information is reliable ? ý Citibank Poor ý : Citibank offers customers the opportunity t ...",

"0063: @fredlyfriendly #citibank has terrible customer service , high fees , and crappy management #MOVEYOURMONEY !",

"0068: GRRR ... Citibank has the most annoying customer service !",

"0076: hsbc #fail : inept customer service reps NOT a &global& bank , intl wire transfers are a nightmare . poor boo ",

"0079: I love it when a company gives me awful service and then sends me a satisfaction survey ",

"0080: I really hate Citibank though we 've been their long-time client . Service is mediocre at best ",

"0082: I think Citibank Philippines has a lousy call center .",

"0083: I think Citibank Philippines has a lousy call center . I always get unprofessional customer service reps when I call them ",

"0090: jesus , I may switch to chase sapphire simply bc citibank is annoying the hell out of me with not allowing me to skip to customer ",

"0092: Just left Citibank Gold-Useless bunch .",

"0093: @kellipoliska The worst Customer Service experience I ever received was definitely #Citibank",

"0094: @libreriasjosue very bad service ... CITI bank is charging for everything , canceling credit lines , making people crazy !",

"0096: Make your opinion more visible and sign #citibank sucks page :",

"0102: Nice ! Customer service was useless .",

"0103: Nice . Dumped Citibank cards after years for poor customer service .",

"0105: No customer service at Citibank . Lack of phone access to human .",

"0107: Not only does citibank fund more coal power plants than any other bank-they raise your interest rates have terrible customer service .",

"0110: Old news but citibank 's customer service sucks ....",

"0111: OMG Suzi Orman just confirmed . Citibank has the worst credit card percents ",

"0112: On the phone with Citi bank credit card customer service , horrible customer service , rude support person ",

"0117: @poojster very bad citi bank ke employees se ab yeh kaam karate hai , clowns pathetic @mac_outlaws",

"0123: RT @FredlyFriendly : @fredlyfriendly #citibank has terrible customer service , high fees , and crappy management #MOVEYOURMONEY !",

"0133: Time to shop for a bank . RT @sarahlynnpablo : Citibank : your customer service sucks .",

"0135: Today I realized that Citibank sucks ! Their customer service is horrible .",

"0137: Ver bad customer response by #Citibank #fail ! ! ! !",

"0148: why is it so hard to get in touch with the Citibank customer service personnel ? or am I calling the wrong number ?",

"0149: Worst experience with #Citibank service ...",

"0150: Wow Citibank sure has the worst interest rates ever"

]

{

"clusterScore": 0,

"clusterSize": 6,

"clusterTopics": "[sentences_with_no_cluster_membership]",

"clusteredSentences": [

"0034:#citibank called me back after the tedious process I went thru to submit a complaint on that blardy rude call centre staff ",

"0067:Got my euros ! : D but I 'm super disappointed with citibank !",

"0070:Having a sunrise at c bean m Kiara . Cooling off after arguing w incompetent pol at citibank damansara perdana",

"0078:I joined Twitter just so that I could tell people how UNHELPFUL AND TERRIBLE CITIBANK IS .",

"0100:my citibank account is all screwed up and my house was broken into over the weekend .",

"0151:Wtf ... #citibank playing FUCKING GAMES wit my interest ... !"

]

}

]

}

Language Support

This API is currently only tuned and optimized for English.

Client Wrapper

See our Python wrapper for this API. You can cluster text from a list of sentences, some plain text content or content from a file.

What is Sentence Clustering?

How is this different from Document Clustering?

What type of texts can I cluster?

Start Clustering Sentences

Before we start…

Clustering Algorithm Key Facts

Sentence Clustering API Request

Clustering a “chunk” or “blob” of text

Example of Plain JSON request with “chunk”

Example JSON request using Unirest Java Library (with “chunk” option)

Example of Plain JSON request with “pre-sentenced” option

Encoding Issues with JSON

ClusterSentences API Response

Cluster Score (clusterScore)

Cluster Size (clusterSize)

Cluster Topics (clusteredTopics)

Clustered Sentences (clusterSentences)

Example JSON Response

​Language Support

Client Wrapper

Language Support