ROUGE 2.0 is an open-source Java package for evaluating summarization tasks using ROUGE measures. This document outlines how to use the ROUGE 2.0 package to evaluate your summarization tasks.

Before you begin…

Please ensure that you have downloaded the most recent version of ROUGE 2.0 before proceeding.
Reference summary refers to the “gold standard” or human summaries.
System summary refers to the machine-generated summaries.
You can have any number of reference summaries for each summarization task.
You can have any number of system summaries for each summarization task.

Step 1: Unpack ROUGE 2.0

The first step is to unpack the ROUGE 2.0 distribution to any location on your machine. After unpacking ROUGE 2.0, you should see the following directory structure:

resources/ - This directory contains replaceable stop words, POS taggers and other language-related resources.
test-summarization/ - This is a test project containing “reference” and “system” folders. It currently has samples for English and Persian.
results.csv - This is a sample results file produced by ROUGE 2.0.
rouge.properties - This is where you configure all your settings, and it is the file you will work with most (see Step 4 for details).
rouge2.0_xx.jar - This is the runnable ROUGE 2.0 jar file that you use to compute ROUGE scores.

Step 2: Getting Started - Create a project directory

To get started, create a directory structure as follows anywhere on your system:

**your-project-name/**
    reference/  ---- contains all reference summaries that will be used for evaluation.
    system/  ---  contains all system-generated summaries.
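If you prefer to script this setup, here is a minimal Java sketch that creates the two folders with java.nio.file. It is not part of ROUGE 2.0. The class name CreateRougeProject and the default folder name my-summarization-project are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of ROUGE 2.0): creates the project layout
// <root>/reference and <root>/system described in Step 2.
public class CreateRougeProject {

    /** Creates the reference/ and system/ folders under root and returns root. */
    public static Path createProject(Path root) throws IOException {
        // createDirectories also creates missing parents and does nothing
        // if the directory already exists.
        Files.createDirectories(root.resolve("reference"));
        Files.createDirectories(root.resolve("system"));
        return root;
    }

    public static void main(String[] args) throws IOException {
        // "my-summarization-project" is a hypothetical project name.
        Path root = createProject(Paths.get(args.length > 0 ? args[0] : "my-summarization-project"));
        System.out.println("Created " + root.toAbsolutePath());
    }
}
```

The root folder can then be set as project.dir in rouge.properties (see Step 4).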

To avoid any complex HTML formatting, ROUGE 2.0 matches summaries for evaluation based on a file naming convention. Please follow the instructions in the next step for the naming convention.

Step 3: Generate your system and reference summaries

When generating your system and reference summaries, please adhere to the following naming convention for evaluation. The naming convention for ROUGE 2.0 is simple: each file name combines the task name with either a reference summary name or a system summary name.

Reference Summary Naming Convention

Your reference summary should be named as follows:

(task_name)_(reference_name)

So if you have a summarization document called news1 and 3 human-composed reference summaries for it, the files would be named:

news1_reference1.txt
news1_reference2.txt
news1_reference3.txt

The underscore is mandatory, and the first part of the file name always refers to the summarization task (i.e., the document or text to be summarized). Please see the samples under test-summarization/reference/.
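The following Java sketch shows how file names following this convention can be grouped by task. It is an illustration, not ROUGE 2.0's own code, and the class and method names are hypothetical. It assumes the task name is everything before the first underscore. The exact splitting rule is not stated here, so it is safest to keep underscores out of task names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not ROUGE 2.0's code): parses (task_name)_(name) file names.
public class SummaryFileNames {

    /** Returns the task part of a name like "news1_reference1.txt" (text before the first '_'). */
    public static String taskOf(String fileName) {
        int underscore = fileName.indexOf('_');
        if (underscore <= 0) {
            // No underscore, or nothing before it: the name breaks the convention.
            throw new IllegalArgumentException("Expected (task_name)_(name), got: " + fileName);
        }
        return fileName.substring(0, underscore);
    }

    /** Groups file names by task name; a TreeMap keeps the tasks sorted. */
    public static Map<String, List<String>> groupByTask(List<String> fileNames) {
        Map<String, List<String>> byTask = new TreeMap<>();
        for (String name : fileNames) {
            byTask.computeIfAbsent(taskOf(name), k -> new ArrayList<>()).add(name);
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<String> refs = List.of("news1_reference1.txt", "news1_reference2.txt",
                                    "news1_reference3.txt", "news2_reference1.txt");
        System.out.println(groupByTask(refs));
    }
}
```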

System Summary Naming Convention

Your system summary should be named as follows:

(task_name)_(system_name)

If you have a document called news1 (as before) and 4 systems (e.g., variations of the same summarization algorithm), each system would generate a separate summary for the same summarization task. The system files would thus be named:

news1_system1.txt
news1_system2.txt
news1_system3.txt
news1_system4.txt

The underscore is mandatory, and the first part of the file name always refers to the summarization task (i.e., the document or text to be summarized). Note that news1 in the reference folder matches news1 in the system folder. Each system summary for news1 is evaluated individually against all reference summaries for news1 in the reference folder, and the scores are averaged. Please see the samples under test-summarization/system/.
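The averaging step can be illustrated with a minimal ROUGE-N sketch in Java. For one system summary and each reference, it counts clipped n-gram overlaps and computes recall, precision and F1. It then averages each measure over the references. It follows the standard ROUGE-N overlap definition, but it is not ROUGE 2.0's implementation. The tokenizer is a naive assumption, and no stop words, stemming or synonyms are applied. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ROUGE-N sketch; ROUGE 2.0's tokenization and options may differ.
public class RougeNSketch {

    /** Naive tokenizer (assumption): lowercase, split on anything that is not a letter, mark or digit. */
    public static String[] tokenize(String text) {
        String t = text.toLowerCase().replaceAll("[^\\p{L}\\p{M}\\p{N}]+", " ").trim();
        return t.isEmpty() ? new String[0] : t.split(" ");
    }

    /** Counts every n-gram (joined with spaces) in the token array. */
    public static Map<String, Integer> ngramCounts(String[] tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            counts.merge(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns {recall, precision, f1} of one system summary against one reference. */
    public static double[] score(String system, String reference, int n) {
        Map<String, Integer> sys = ngramCounts(tokenize(system), n);
        Map<String, Integer> ref = ngramCounts(tokenize(reference), n);
        int overlap = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            // Clipped count: an n-gram matches at most min(count in reference, count in system) times.
            overlap += Math.min(e.getValue(), sys.getOrDefault(e.getKey(), 0));
        }
        int refTotal = ref.values().stream().mapToInt(Integer::intValue).sum();
        int sysTotal = sys.values().stream().mapToInt(Integer::intValue).sum();
        double r = refTotal == 0 ? 0 : (double) overlap / refTotal;
        double p = sysTotal == 0 ? 0 : (double) overlap / sysTotal;
        double f = (r + p) == 0 ? 0 : 2 * r * p / (r + p);
        return new double[] {r, p, f};
    }

    /** Scores one system summary against every reference and averages each measure. */
    public static double[] averageOverReferences(String system, List<String> references, int n) {
        if (references.isEmpty()) {
            throw new IllegalArgumentException("At least one reference summary is needed");
        }
        double[] sum = new double[3];
        for (String ref : references) {
            double[] s = score(system, ref, n);
            for (int i = 0; i < 3; i++) sum[i] += s[i];
        }
        for (int i = 0; i < 3; i++) sum[i] /= references.size();
        return sum;
    }

    public static void main(String[] args) {
        List<String> refs = List.of("the cat sat on the mat", "a cat was on the mat");
        double[] avg = averageOverReferences("the cat was on the mat", refs, 1);
        System.out.printf("R=%.3f P=%.3f F=%.3f%n", avg[0], avg[1], avg[2]);
    }
}
```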

Reference and System Summary Formatting

Use one file for each reference or system summary. Each sentence in your system or reference summary should be on a separate line. There is no need to number the sentences or to place the text in an HTML file. Any file extension is permitted, as long as each sentence in the file is on a separate line. For example, if your system summary has 3 sentences, the file would contain 3 lines, one sentence per line.

More examples can be found in test-summarization/system and test-summarization/reference.
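A summary file in this format can be written with a few lines of Java. The sketch below writes one sentence per line in UTF-8. It rejects sentences that contain line breaks, since those would presumably be treated as separate sentences. The class name, path and sentences are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of ROUGE 2.0): writes a summary as one sentence per line.
public class WriteSummaryFile {

    public static void writeSummary(Path file, List<String> sentences) throws IOException {
        for (String s : sentences) {
            // A line break inside a sentence would split it across two lines.
            if (s.contains("\n") || s.contains("\r")) {
                throw new IllegalArgumentException("Sentence must not contain a line break: " + s);
            }
        }
        Files.createDirectories(file.toAbsolutePath().getParent());
        // Files.write adds a line separator after each element.
        Files.write(file, sentences, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path following the (task_name)_(system_name) convention.
        Path file = Paths.get("my-summarization-project", "system", "news1_system1.txt");
        writeSummary(file, List.of(
                "First sentence of the summary.",
                "Second sentence of the summary.",
                "Third sentence of the summary."));
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8).size() + " lines written");
    }
}
```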

Step 4: Configure rouge.properties

Before you start evaluating your summaries, you need to configure the rouge.properties file located in the root of ROUGE 2.0. This is where you specify the ROUGE type to evaluate, the stop words to use, the output file, whether to use synonyms, and so on. You will be working with this file quite extensively. The table below describes the ROUGE 2.0 parameters and how to set them. The most important parameters are:

project.dir
rouge.type (and parameters related to this selection)
output

| Property Name | Description of Property | Required | Default |
|---|---|---|---|
| project.dir | Location of your reference and system summaries. By default it points to test-summarization within the root folder, which contains some test reference and system summaries. | Yes | test-summarization/ |
| rouge.type | topic, topicUniq or normal. Select normal for typical ROUGE-N evaluation. | Yes | normal |
| ngram | The n-gram size. Only applicable if rouge.type=normal. | Depends | 1 |
| stopwords.use | Whether to use stop words (true/false). | Optional | false |
| stopwords.file | Location of the stop words file. This can be changed based on language. By default it uses the stop words of the Perl version of ROUGE. | Yes (if stopwords.use=true) | resources/stopwords-rouge-default.txt |
| stemmer.use | Whether to use stemming (true/false). | Optional | false |
| stemmer.name | Which stemmer to use, e.g. englishStemmer, turkishStemmer, frenchStemmer. | Yes (if stemmer.use=true) | englishStemmer |
| topic.type | Only set this if topic or topicUniq is used. This should be the POS form in lowercase (based on Stanford's POS Tagger), for example “nn\|jj”, jj, “jj\|vb” or vbp. | Depends | “nn\|jj” |
| synonyms.use | Whether to use synonyms (true/false). | Optional | false |
| output | Where to write results: file or console. | Optional | file |
| outputFile | File to write results to. Only set this if output=file. | Optional | results.csv |
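The property names and defaults in the table can be checked before a run with java.util.Properties, assuming rouge.properties uses the standard Java key=value properties format. The following is a hypothetical helper, not ROUGE 2.0's own configuration loader. It falls back to the defaults listed in the table and rejects rouge.type values other than topic, topicUniq and normal.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;

// Hypothetical settings reader (not ROUGE 2.0's loader); defaults copied from the table.
public class RougeSettings {
    public final String projectDir, rougeType, stopwordsFile, stemmerName, topicType, output, outputFile;
    public final int ngram;
    public final boolean stopwordsUse, stemmerUse, synonymsUse;

    public RougeSettings(Properties p) {
        projectDir = p.getProperty("project.dir", "test-summarization/");
        rougeType = p.getProperty("rouge.type", "normal").trim();
        if (!Set.of("topic", "topicUniq", "normal").contains(rougeType)) {
            throw new IllegalArgumentException("rouge.type must be topic, topicUniq or normal: " + rougeType);
        }
        // Properties.load keeps trailing spaces in values, so trim before parsing.
        ngram = Integer.parseInt(p.getProperty("ngram", "1").trim());
        stopwordsUse = Boolean.parseBoolean(p.getProperty("stopwords.use", "false").trim());
        stopwordsFile = p.getProperty("stopwords.file", "resources/stopwords-rouge-default.txt");
        stemmerUse = Boolean.parseBoolean(p.getProperty("stemmer.use", "false").trim());
        stemmerName = p.getProperty("stemmer.name", "englishStemmer");
        topicType = p.getProperty("topic.type", "nn|jj");
        synonymsUse = Boolean.parseBoolean(p.getProperty("synonyms.use", "false").trim());
        output = p.getProperty("output", "file");
        outputFile = p.getProperty("outputFile", "results.csv");
    }

    public static RougeSettings load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new RougeSettings(p);
    }

    public static void main(String[] args) throws IOException {
        // Example settings; "my-summarization-project" is a hypothetical folder.
        String example = String.join("\n",
                "project.dir=my-summarization-project",
                "rouge.type=normal",
                "ngram=2",
                "stopwords.use=true",
                "output=console");
        RougeSettings s = load(new StringReader(example));
        System.out.println(s.rougeType + " ROUGE-" + s.ngram + " on " + s.projectDir
                + ", stop words: " + s.stopwordsFile + ", output: " + s.output);
    }
}
```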

Step 5: Running ROUGE 2.0 Jar File

Once you have generated your system summaries and formatted your reference summaries as specified in Step 3, and configured rouge.properties as described in Step 4, the next step is to run the evaluation package. Assuming ROUGE 2.0 was unpacked into C:\projects\rouge2.0, you can run it on Linux or Windows from that directory as follows:

java -jar rouge2.0_xx.jar

This step uses all the system summaries and corresponding reference summaries from the project directory you created in Step 2, and computes the appropriate ROUGE scores (as specified in rouge.properties). By default, the rouge.properties file in the root of the ROUGE 2.0 installation is used. You can either modify that file or use a rouge.properties file located elsewhere on disk by passing the rouge.prop system property as follows:

java -Drouge.prop="path_to_properties_file" -jar rouge2.0_xx.jar
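If you launch evaluations from your own Java code, the command can be started with ProcessBuilder. In this sketch the class name and paths are hypothetical. It places the -Drouge.prop option before -jar because the java launcher only treats -D options that precede -jar as system properties; anything after the jar name is passed to the program as arguments. ProcessBuilder passes each argument as-is, so no shell quoting is needed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher (not part of ROUGE 2.0) for running the ROUGE 2.0 jar.
public class RunRouge {

    /** Builds the java command; propertiesPath may be null to use the default rouge.properties. */
    public static List<String> buildCommand(String jarPath, String propertiesPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (propertiesPath != null) {
            // Must come before -jar to be set as a JVM system property.
            cmd.add("-Drouge.prop=" + propertiesPath);
        }
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Use the jar file name from your download; the properties path is optional.
        List<String> cmd = buildCommand("rouge2.0_xx.jar", args.length > 0 ? args[0] : null);
        // inheritIO shows ROUGE 2.0's console output in this process's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("ROUGE 2.0 exited with code " + process.waitFor());
    }
}
```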

The output of the evaluation is printed to the console, and also written to a file if output=file in rouge.properties. Here is an example of output printed to screen:

Here is an example of the results file produced:

If you do not change outputFile in rouge.properties, the results are written to results.csv in the root of the ROUGE 2.0 directory.

Running ROUGE 2.0 for Unicode Texts (Persian, Tamil, etc.)

One of the problems with the original Perl version of ROUGE is that it does not support evaluation of Unicode-based texts because of the way the text is tokenized. This package has been tested with Persian texts and should therefore work in cases where the original Perl package fails. To make sure that this package works for Unicode texts, please ensure that:

In the rouge.properties file, synonyms.use is set to false. When it is true, ROUGE 2.0 tries to POS-tag the text, and if there is no suitable POS tagger for the language in the Stanford POS tagging libraries, this will cause issues.

Other than this setting, you are actually ready to go! Just follow Steps 1 - 5 as specified above.
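To see why tokenization matters for Unicode texts, compare an ASCII-only tokenizer with a Unicode-aware one. This is only an illustration, and the class name is hypothetical. It is not the tokenizer used by ROUGE 2.0 or by the Perl ROUGE script. The ASCII rule drops every Persian character and leaves no tokens to compare. The Unicode rule keeps letters, digits and combining marks from any script; the marks are needed for scripts such as Tamil.

```java
import java.util.Arrays;

// Hypothetical illustration of ASCII-only vs Unicode-aware tokenization.
public class UnicodeTokenizeDemo {

    /** ASCII-only rule: keeps runs of [A-Za-z0-9], so non-Latin scripts disappear. */
    public static String[] asciiTokens(String text) {
        String t = text.replaceAll("[^A-Za-z0-9]+", " ").trim();
        return t.isEmpty() ? new String[0] : t.split(" ");
    }

    /** Unicode-aware rule: keeps runs of letters (\p{L}), combining marks (\p{M}) and digits (\p{N}). */
    public static String[] unicodeTokens(String text) {
        String t = text.replaceAll("[^\\p{L}\\p{M}\\p{N}]+", " ").trim();
        return t.isEmpty() ? new String[0] : t.split(" ");
    }

    public static void main(String[] args) {
        String persian = "این یک خلاصه است"; // Persian for "this is a summary"
        System.out.println(Arrays.toString(asciiTokens(persian)));   // []
        System.out.println(Arrays.toString(unicodeTokens(persian))); // [این, یک, خلاصه, است]
    }
}
```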

Questions?

For questions, you can use our “Ask Question” feature, which can be found on the menu bar. For other inquiries, please use our contact page.