Summarization System Submission

Submit your existing test results for the released test leaderboard, or submit a Docker image of your summarization system for a complete evaluation on the held-out test data for the unreleased test leaderboard.

Share test set results of your summarization system...

Fill out this form to submit existing evaluation results on the released test set. The leaderboard uses the F-score variant of ROUGE-1, ROUGE-2, and ROUGE-L. System scores should be computed on all article-summary pairs in the entire test set, across all extractiveness subsets. (For a more complete evaluation, consider submitting a Docker image of your system for unreleased test data evaluation in the future!)
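
As a rough illustration of what "F-score variant" means for ROUGE-1 and ROUGE-2, the sketch below computes the harmonic mean of n-gram precision and recall. It is not the official scorer: the function names are hypothetical, and it assumes lowercasing, whitespace tokenization, and no stemming. ROUGE-L is not covered. Reported numbers should come from the provided newsroom-test tool.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f(reference, candidate, n=1):
    """F-score of clipped n-gram overlap between two strings.

    Assumption: lowercased, whitespace-split tokens; no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A leaderboard-style number would then be the average of such scores over every article-summary pair in the test set, not over a subset.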

Evaluate your summarization system on the unreleased test data...

Fill out this form to submit a system for evaluation on the unreleased test set. Systems must be submitted as Docker or GPU-enabled Docker containers (nvidia-docker). Docker is used to minimize differences between your training environment and the test environment. The NEWSROOM download tools include:

A working Docker example for the TextRank summarization system.
The CLI tool used to run submitted summarization systems (newsroom-run).
The CLI tool used to evaluate summaries of submitted systems (newsroom-test).

These tools are provided to test systems prior to submission.

The test dataset is provided on standard input as a stream of JSON lines. Each line is similar to a training example, but contains only the "text" field. By default, this field holds the raw article text, but it can be tokenized if requested. From this input, the system must write a list of JSON-encoded strings to standard output, one summary per input line. The output must be in the same order as the input. If possible, standard output should be flushed between summaries.
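
The input and output contract described above can be sketched in a few lines of Python. This is a minimal illustration, not the NEWSROOM reference code: summarize is a hypothetical placeholder standing in for a real model, and the function names are assumptions.

```python
import json
import sys


def summarize(text):
    """Hypothetical placeholder: return the text up to the first '. '."""
    return text.split(". ")[0].strip()


def encode_summaries(lines):
    """Yield one JSON-encoded summary string per input JSON line, in order."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        article = json.loads(line)["text"]
        yield json.dumps(summarize(article))


def main():
    """Stream summaries from standard input to standard output."""
    for encoded in encode_summaries(sys.stdin):
        sys.stdout.write(encoded + "\n")
        sys.stdout.flush()  # flush between summaries
```

Calling main() under the usual `if __name__ == "__main__":` guard would turn this into a script that a container could run. Processing the stream one line at a time keeps the output in input order without buffering the whole dataset.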

The Dockerfile for the container must specify an ENTRYPOINT so that the container runs as an executable (see the TextRank example). Docker containers are run using the provided newsroom-run tool, which runs docker or nvidia-docker as specified.

docker run \
    -a stdin -a stdout \
    -i --rm [IMAGE] \
    < unreleased.jsonl > summaries.jsonl
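
Before submitting, it may help to sanity-check the file written by a command like the one above. The sketch below is an assumption about what a useful check looks like, not part of the NEWSROOM tools, and check_summaries is a hypothetical name. It confirms that there is one output line per input line and that every output line decodes to a JSON string. It cannot confirm that the order is correct.

```python
import json


def check_summaries(input_lines, output_lines):
    """Return a list of problems; an empty list means the output looks well formed."""
    inputs = [line for line in input_lines if line.strip()]
    outputs = [line for line in output_lines if line.strip()]
    problems = []
    if len(inputs) != len(outputs):
        problems.append(f"expected {len(inputs)} summaries, found {len(outputs)}")
    for number, line in enumerate(outputs, start=1):
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not valid JSON")
            continue
        if not isinstance(value, str):
            problems.append(f"line {number} is not a JSON string")
    return problems
```

For example, opening unreleased.jsonl and summaries.jsonl and passing the two file objects to check_summaries should return an empty list for a well-formed run.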

System summaries are evaluated using ROUGE-1, ROUGE-2, and ROUGE-L both with and without stemming. If a tokenizer is specified, summaries are evaluated with and without tokenization. See the provided newsroom-test tool for the exact evaluation procedure. This tool can also be used for evaluation on the provided development and test datasets.

Additional instructions for using Docker, running, and evaluating systems are provided in the NEWSROOM download. Review these instructions before submission.
