Natural Language Processing Metrics

Once you have trained your NLP model, you need to evaluate its performance. This package provides various metric functions for evaluating and assessing the accuracy of NLP models. Which metrics are useful depends on the type of NLP problem you are working on.

BLEU Score

Metrics.bleu_score - Function
bleu_score(reference_corpus, translation_corpus; max_order=4, smooth=false)

Computes the BLEU score of translated segments against one or more references. Returns the BLEU score, the n-gram precisions, the brevity penalty, the geometric mean of the n-gram precisions, the translation length, and the reference length.

Arguments

  • reference_corpus: list of lists of references for each translation. Each reference should be tokenized into a list of tokens.
  • translation_corpus: list of translations to score. Each translation should be tokenized into a list of tokens.
  • max_order: maximum n-gram order to use when computing BLEU score.
  • smooth: whether or not to apply Lin et al. 2004 smoothing. Defaults to false.
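
A minimal usage sketch: the tokenized sentences below are made up, and the keyword values are purely illustrative. The return tuple is unpacked in the order given in the docstring above.

```julia
using Metrics

# Hypothetical tokenized data: one list of references per translation,
# where each reference and each translation is a vector of tokens.
reference_corpus   = [[["the", "cat", "sat", "on", "the", "mat"]]]
translation_corpus = [["the", "cat", "is", "on", "the", "mat"]]

# Return order per the docstring: BLEU score, n-gram precisions,
# brevity penalty, geometric mean of the precisions, translation
# length, and reference length.
bleu, precisions, bp, geo_mean, translation_length, reference_length =
    bleu_score(reference_corpus, translation_corpus; max_order=4, smooth=true)
```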

Rouge Score

Metrics.rouge_n - Function
rouge_n(evaluated_sentences, reference_sentences; n=2)

Computes ROUGE-N of two text collections of sentences. Returns the F1 score, precision, and recall for ROUGE-N.

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences from the reference set
  • n: size of the n-gram. Defaults to 2.

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

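A minimal usage sketch, assuming each collection is a vector of sentence strings (the example sentences are made up; the return values follow the docstring above):

```julia
using Metrics

# Hypothetical summarizer output and reference sentences.
summary_sentences   = ["the cat was found under the bed"]
reference_sentences = ["the cat was under the bed"]

# Per the docstring, returns F1, precision, and recall for ROUGE-N.
f1, precision, recall = rouge_n(summary_sentences, reference_sentences; n=2)
```
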
Metrics.rouge_l_sentence_level - Function
rouge_l_sentence_level(evaluated_sentences, reference_sentences)

Computes ROUGE-L (sentence level) of two text collections of sentences.

Calculated according to:

    R_lcs = LCS(X, Y) / m
    P_lcs = LCS(X, Y) / n
    F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:

  • X = reference summary
  • Y = candidate summary
  • m = length of the reference summary
  • n = length of the candidate summary

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences from the reference set

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

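A minimal usage sketch with made-up sentences. Treating the return value as a single F_lcs score is an assumption, since the docstring does not state the return shape:

```julia
using Metrics

# Hypothetical single-sentence candidate and reference.
candidate = ["the cat was found under the bed"]
reference = ["the cat was under the bed"]

# F_lcs as defined by the formula above; the single-value return
# is an assumption.
f_lcs = rouge_l_sentence_level(candidate, reference)
```
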
Metrics.rouge_l_summary_level - Function
rouge_l_summary_level(evaluated_sentences, reference_sentences)

Computes ROUGE-L (summary level) of two text collections of sentences.

Calculated according to:

    R_lcs = SUM(i=1..u) LCS∪(r_i, C) / m
    P_lcs = SUM(i=1..u) LCS∪(r_i, C) / n
    F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:

  • SUM(i=1..u) = sum over i from 1 through u
  • u = number of sentences in the reference summary
  • C = candidate summary made up of v sentences
  • m = number of words in the reference summary
  • n = number of words in the candidate summary

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences in the reference summaries

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

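A minimal usage sketch with made-up multi-sentence summaries; as with the sentence-level variant, the single-score return is an assumption:

```julia
using Metrics

# Hypothetical multi-sentence candidate and reference summaries.
candidate_summary = ["the cat was found under the bed",
                     "it was hiding from the dog"]
reference_summary = ["the cat was under the bed",
                     "it hid there to avoid the dog"]

# Summary-level F_lcs per the formula above.
f_lcs = rouge_l_summary_level(candidate_summary, reference_summary)
```
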
Metrics.rouge - Function
rouge(hypotheses, references)

Calculates average ROUGE scores for a list of hypotheses and references.

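A minimal usage sketch with made-up hypothesis/reference pairs; the structure of the returned scores is not documented above, so treat it as an assumption and inspect the result interactively:

```julia
using Metrics

# Hypothetical corpus: one hypothesis string per reference string.
hypotheses = ["the cat was found under the bed",
              "the dog slept on the porch"]
references = ["the cat was under the bed",
              "a dog was sleeping on the porch"]

# Averaged ROUGE scores over the corpus; return structure assumed.
scores = rouge(hypotheses, references)
```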