CS 288: Statistical Natural Language Processing, Spring 2011

Assignment 2: Phrase-Based Decoding

Due: February 17th
Setup
First, make sure you can access the course materials. The components are:
- code2.tar.gz: the Java source code provided for this course
- data2.tar.gz: the data sets used in this assignment
The authentication restrictions are due to licensing terms. The username and
password are the same as for Assignment 1.
The testing harness
we will be using is MtDecoderTester (in the edu.berkeley.nlp.assignments.assign2 package).
To run it, first unzip the data archive to a local directory ($PATH). Then, build the submission jar using
ant -f build_assign2.xml
Then, try running
java -cp assign2.jar:assign2-submit.jar -server -mx2000m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType MONO_GREEDY
You will see the tester translate 100 French sentences, which will take a few seconds, printing out translations along the way (printing
can be turned off with -noprint). The provided decoder is a very simple monotonic decoder which greedily translates foreign phrases one at a time.
Its accuracy should be poor (a BLEU score of around 20), as should its model score (around -5216; see below for more about the model).
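For intuition, here is a minimal sketch of greedy monotonic decoding. The PhraseOption and PhraseTableSketch types are hypothetical stand-ins for the provided classes (the actual API in assign2.jar differs) and are reused by the later sketches on this page:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-ins for the provided classes; reused in later sketches.
    class PhraseOption {
      int foreignStart, foreignEnd;  // [start, end) span in the foreign sentence
      double tmScore;                // translation-model score of this phrase pair
      List<String> english;          // English side of the phrase pair
    }

    interface PhraseTableSketch {
      // All phrase pairs whose foreign side matches the span [start, end).
      List<PhraseOption> getOptions(int start, int end);
    }

    class GreedyMonoSketch {
      // Greedy monotonic decoding: at each foreign position, commit to the single
      // highest-scoring phrase pair starting there, then jump past it.
      static List<String> decode(int numForeignWords, PhraseTableSketch table) {
        List<String> english = new ArrayList<String>();
        int i = 0;
        while (i < numForeignWords) {
          PhraseOption best = null;
          for (int j = i + 1; j <= numForeignWords; j++) {
            for (PhraseOption opt : table.getOptions(i, j)) {
              if (best == null || opt.tmScore > best.tmScore) best = opt;
            }
          }
          // Assumes the table has an option (e.g. an unknown-word identity
          // translation) for every single word, so best is never null.
          english.addAll(best.english);
          i = best.foreignEnd;
        }
        return english;
      }
    }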
Description
In this assignment, you will implement several phrase-based decoders
and test them with the provided harness. Your decoder will attempt to
find the translation with maximum score under a linear model. This
linear model consists of three features: a trigram language model, a linear distortion model, and
a translation model which assigns scores to phrase pairs. The language model uses the data from Assignment 1, but
we have provided an implementation so you don't need to use your own (though you can if you wish). Each of
these features has a pre-tuned weight, so you should not need to worry about weight tuning (though you may experiment
if you wish). Constructing the translation model table and tuning the weights will be dealt with in later assignments.
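As a concrete illustration of how the three features combine, here is a hedged sketch of the model score of a complete derivation, reusing the hypothetical PhraseOption type from the sketch above. The weights and the exact distortion formula here are stand-ins for the pre-tuned values and the DistortionModel provided in assign2.jar:

    import java.util.List;

    // Sketch of the three-feature linear model score for a complete derivation.
    class LinearModelSketch {
      static double score(List<PhraseOption> derivation, double lmLogProb,
                          double wTm, double wLm, double wDist) {
        double tm = 0.0, dist = 0.0;
        int prevEnd = 0;  // end of the previously translated foreign span
        for (PhraseOption p : derivation) {
          tm += p.tmScore;                             // translation model
          dist -= Math.abs(p.foreignStart - prevEnd);  // linear distortion penalty
          prevEnd = p.foreignEnd;
        }
        return wTm * tm + wLm * lmLogProb + wDist * dist;
      }
    }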
You will need to implement four decoders of increasing complexity. The first is a
monotonic beam-search decoder with no language model. You should return an instance of such a decoder from MonotonicNoLmDecoderFactory.
Using our implementation with beams of size 2000, we decode the test set in 16 seconds, achieving a model score of -5276
and a BLEU score of 17.2. Note that this decoder performs poorly even compared to the greedy decoder, since it uses no language model.
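A minimal sketch of this first decoder, again using the hypothetical types from above: one beam per foreign position, where a hypothesis's only state is how many foreign words it covers:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of monotonic beam search with no LM. Beam i holds partial
    // derivations covering exactly the first i foreign words.
    class MonoNoLmSketch {
      static class Hyp {
        final double score; final Hyp prev; final PhraseOption phrase;
        Hyp(double s, Hyp p, PhraseOption ph) { score = s; prev = p; phrase = ph; }
      }

      static Hyp decode(int n, PhraseTableSketch table, int beamSize) {
        List<List<Hyp>> beams = new ArrayList<List<Hyp>>();
        for (int i = 0; i <= n; i++) beams.add(new ArrayList<Hyp>());
        beams.get(0).add(new Hyp(0.0, null, null));
        for (int i = 0; i <= n; i++) {
          List<Hyp> beam = beams.get(i);
          // Sort best-first and prune to the beam size.
          Collections.sort(beam, new Comparator<Hyp>() {
            public int compare(Hyp a, Hyp b) { return Double.compare(b.score, a.score); }
          });
          if (beam.size() > beamSize) beam.subList(beamSize, beam.size()).clear();
          if (i == n) break;
          // Extend every surviving hypothesis with every phrase starting at i.
          for (Hyp h : beam) {
            for (int j = i + 1; j <= n; j++) {
              for (PhraseOption opt : table.getOptions(i, j)) {
                beams.get(j).add(new Hyp(h.score + opt.tmScore, h, opt));
              }
            }
          }
        }
        List<Hyp> goal = beams.get(n);
        return goal.isEmpty() ? null : goal.get(0);
      }
    }

In fact, with no language model the single best hypothesis per position suffices (the search is plain dynamic programming); the beam structure is shown because the later decoders extend it.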
The second decoder, which should be constructed in MonotonicWithLmDecoderFactory, should implement monotonic beam search with
an integrated trigram language model. This will slow the search, but model score and BLEU will go up significantly.
Using our implementation with beams of size 2000, we decode the test set in 2 minutes, achieving a model score of -3781
and a BLEU score of 26.5.
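The key change is to the search state: with a trigram LM, two hypotheses covering the same number of foreign words can only be recombined if they end in the same two English words. Here is a sketch of such a state, with a hypothetical LmSketch interface standing in for the provided language model API:

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical stand-in for the provided trigram LM scoring call.
    interface LmSketch {
      double trigramLogProb(String w1, String w2, String w3);
    }

    // Search state for LM-integrated monotonic decoding: hypotheses recombine
    // only when both the coverage count and the two-word LM context match.
    class LmStateSketch {
      final int wordsCovered;      // number of foreign words translated so far
      final String[] lmContext;    // last two English words produced

      LmStateSketch(int wordsCovered, String[] lmContext) {
        this.wordsCovered = wordsCovered;
        this.lmContext = lmContext;
      }

      public boolean equals(Object o) {
        if (!(o instanceof LmStateSketch)) return false;
        LmStateSketch s = (LmStateSketch) o;
        return wordsCovered == s.wordsCovered && Arrays.equals(lmContext, s.lmContext);
      }

      public int hashCode() {
        return 31 * wordsCovered + Arrays.hashCode(lmContext);
      }

      // Incremental LM score for appending a phrase's English words, shifting
      // the two-word context as we go. Assumes lmContext has length 2.
      static double extensionLogProb(String[] context, List<String> words, LmSketch lm) {
        double logProb = 0.0;
        String c1 = context[0], c2 = context[1];
        for (String w : words) {
          logProb += lm.trigramLogProb(c1, c2, w);
          c1 = c2;
          c2 = w;
        }
        return logProb;
      }
    }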
The third decoder, which should be constructed in DistortingWithLmDecoderFactory, should implement a beam search
which permits limited distortion as discussed in class. The distortion score can be retrieved from the DistortionModel class,
as can the distortion limit. Adding distortion will also slow the decoder, but again achieves higher model and BLEU scores.
Using our implementation with beams of size 2000, we decode the test set in about 20 minutes, achieving a model score of -3529
and a BLEU score of 27.5.
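With distortion, hypotheses may cover non-contiguous foreign spans, so the state needs a coverage set and the position where the last phrase ended. A sketch follows; the bitmask limits sentences to 64 words, and distortionLimit stands in for the value retrieved from the provided DistortionModel:

    // Search state for limited-distortion decoding: a coverage bitmask replaces
    // the single "words covered" counter, and the end of the last translated
    // phrase is kept for the distortion penalty and limit check.
    class DistortionStateSketch {
      final long coverage;       // bit i set iff foreign word i is translated
      final int lastEnd;         // end of the most recently translated span
      final String[] lmContext;  // last two English words, as before

      DistortionStateSketch(long coverage, int lastEnd, String[] lmContext) {
        this.coverage = coverage;
        this.lastEnd = lastEnd;
        this.lmContext = lmContext;
      }

      // A phrase over foreign span [start, end) can extend this hypothesis iff
      // it overlaps no covered word and respects the distortion limit.
      boolean canExtend(int start, int end, int distortionLimit) {
        long span = ((1L << (end - start)) - 1) << start;
        if ((coverage & span) != 0) return false;
        return Math.abs(start - lastEnd) <= distortionLimit;
      }
    }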
Finally, you should implement a decoder with extensions of your choosing in AwesomeDecoderFactory. This portion of the assignment is open-ended, and you can
improve your decoder in any way you like. Some possible extensions include:
- Implementing/improving the future cost estimates used in the Pharaoh decoder (a sketch follows this list).
- Implementing/improving cube pruning.
- Implementing coarse-to-fine decoding.
- Exploiting equivalent LM states as in a context-encoded language model.
Note that this requires access to additional functionality in the language model, which is provided via the ContextEncodingNgramLanguageModel interface. You can get an instance
of this interface via the NgramLanguageModelAdaptor class.
- Improving/replacing the linear distortion model.
- Using A* instead of beam search.
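As a starting point for the first extension above, here is a sketch of Pharaoh-style future cost estimation: a dynamic program over foreign spans that ignores distortion and cross-phrase LM context. The optimisticScore helper is a hypothetical placeholder; a fuller estimate would also fold in a context-free LM bound on each phrase's English words:

    // Sketch of Pharaoh-style future cost estimation: for every foreign span,
    // precompute the score of the cheapest way to translate it. Adding the
    // estimates for all uncovered spans to each hypothesis's score lets beams
    // compare partial derivations with different coverage fairly.
    class FutureCostSketch {
      static double[][] compute(int n, PhraseTableSketch table) {
        double[][] cost = new double[n + 1][n + 1];  // cost[i][j] for span [i, j)
        for (int len = 1; len <= n; len++) {
          for (int i = 0; i + len <= n; i++) {
            int j = i + len;
            double best = Double.NEGATIVE_INFINITY;
            // Cover the span with a single phrase pair...
            for (PhraseOption opt : table.getOptions(i, j)) {
              best = Math.max(best, optimisticScore(opt));
            }
            // ...or split it into two smaller spans.
            for (int k = i + 1; k < j; k++) {
              best = Math.max(best, cost[i][k] + cost[k][j]);
            }
            cost[i][j] = best;
          }
        }
        return cost;
      }

      // Hypothetical optimistic per-phrase estimate (translation score only).
      static double optimisticScore(PhraseOption opt) {
        return opt.tmScore;
      }
    }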
Evaluation:
The task of an MT decoder is to take a foreign sentence
and find the highest-scoring English translation according to a particular model. If the model is a good one, then translation accuracy (BLEU)
will correlate with model score, though this will not always be the case. As such, we will primarily evaluate your decoder on the model score
it achieves on the training data, though we will measure BLEU as well. Also, with all decoders there is a speed vs. accuracy trade-off: you can
always achieve a higher model score with a slower decoder, or decode very quickly if you do not mind making large numbers of search errors and achieving a low
model score. We will measure the performance of your decoder both in terms of speed and accuracy (model score), and compare it to the speed-accuracy trade-off of our own implementation.
Note that it is up to you to pick a point along the speed-accuracy curve in your submission. However, you can (and should) do further exploration of your decoders
and report on the results in your write-up.
When we autograde your submitted code, we will run the following commands:
java -cp assign2.jar:assign2-submit.jar -server -mx2000m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType MONO_NOLM
java -cp assign2.jar:assign2-submit.jar -server -mx2000m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType MONO_LM
java -cp assign2.jar:assign2-submit.jar -server -mx2000m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType DIST_LM
java -cp assign2.jar:assign2-submit.jar -server -mx2000m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType AWESOME
Please ensure that all of these commands complete on your machine before submitting.
Write-ups: Your write-up should be similar to the one for Assignment 1. It should
include tables or graphs of BLEU, runtime, model score, etc. for your systems, as well as
some error analysis - enough to convince us that you
looked at the specific behavior of your systems and thought about what they are
doing wrong and how you would fix it. You should also describe the extensions you implemented in your AwesomeDecoder, including how and why
they improve on your other decoders.
Submission: You will submit assign2-submit.jar to an online system.
Note that this jar must contain implementations of
the DecoderFactory classes we provide to you, but must not contain any modifications of the source code provided in
assign2.jar. To check that everything is in order, we will run a small "sanity" check when you submit the jar. Specifically,
we will run the commands
java -cp assign2.jar:assign2-submit.jar -server -mx50m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType MONO_NOLM -sanityCheck
java -cp assign2.jar:assign2-submit.jar -server -mx50m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType MONO_LM -sanityCheck
java -cp assign2.jar:assign2-submit.jar -server -mx50m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType DIST_LM -sanityCheck
java -cp assign2.jar:assign2-submit.jar -server -mx50m edu.berkeley.nlp.assignments.assign2.MtDecoderTester -path $PATH -decoderType AWESOME -sanityCheck
The -sanityCheck flag will run the test harness with a tiny amount of data just to make sure no exceptions are thrown.
Please ensure that these commands return successfully before submitting your jar.
You will also submit a write-up in class on the due date.
Grading: Unlike Assignment 1, there will be no hard requirements for completion of this assignment. However, you should ensure
that your times and accuracies are comparable to those quoted in this write-up -- underperforming submissions will receive a lower grade.
As always, additional improvements in memory usage, BLEU score, model score
and decoding speed will also affect your grade, as will your write-up. Your extension(s) will be evaluated primarily based on your write-up, though we will
run your AWESOME decoder to observe any improvements ourselves.