Natural Language Processing Metrics

Once you have trained your NLP model, you need to evaluate its performance. This package provides various metric functions for evaluating and assessing the accuracy of NLP models. Which metrics are useful depends on the type of NLP problem you are working on.

BLEU Score

Metrics.bleu_score - Function
bleu_score(reference_corpus, translation_corpus; max_order=4, smooth=false)

Computes the BLEU score of translated segments against one or more references. Returns the BLEU score, the n-gram precisions, the brevity penalty, the geometric mean of the n-gram precisions, the translation length, and the reference length.

Arguments

  • reference_corpus: list of lists of references for each translation. Each reference should be tokenized into a list of tokens.
  • translation_corpus: list of translations to score. Each translation should be tokenized into a list of tokens.
  • max_order: maximum n-gram order to use when computing BLEU score.
  • smooth: whether or not to apply Lin et al. 2004 smoothing. Defaults to false.
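
A minimal usage sketch: the tokenized sentences below are made up, and the keyword values are purely illustrative. The return tuple is unpacked in the order given in the docstring above.

```julia
using Metrics

# Hypothetical tokenized data: one list of references per translation,
# where each reference and each translation is a vector of tokens.
reference_corpus   = [[["the", "cat", "sat", "on", "the", "mat"]]]
translation_corpus = [["the", "cat", "is", "on", "the", "mat"]]

# Return order per the docstring: BLEU score, n-gram precisions,
# brevity penalty, geometric mean of the precisions, translation
# length, and reference length.
bleu, precisions, bp, geo_mean, translation_length, reference_length =
    bleu_score(reference_corpus, translation_corpus; max_order=4, smooth=true)
```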

Rouge Score

Metrics.rouge_n - Function
rouge_n(evaluated_sentences, reference_sentences; n=2)

Computes ROUGE-N of two text collections of sentences. Returns the F1 score, precision, and recall for ROUGE-N.

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences from the reference set
  • n: size of the n-gram. Defaults to 2.

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

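A minimal usage sketch, assuming each collection is a vector of sentence strings (the example sentences are made up; the return values follow the docstring above):

```julia
using Metrics

# Hypothetical summarizer output and reference sentences.
summary_sentences   = ["the cat was found under the bed"]
reference_sentences = ["the cat was under the bed"]

# Per the docstring, returns F1, precision, and recall for ROUGE-N.
f1, precision, recall = rouge_n(summary_sentences, reference_sentences; n=2)
```
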
Metrics.rouge_l_sentence_level - Function
rouge_l_sentence_level(evaluated_sentences, reference_sentences)

Computes ROUGE-L (sentence level) of two text collections of sentences.

Calculated according to:

    R_lcs = LCS(X, Y) / m
    P_lcs = LCS(X, Y) / n
    F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:

  • X = reference summary
  • Y = candidate summary
  • m = length of the reference summary
  • n = length of the candidate summary

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences from the reference set

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

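A minimal usage sketch with made-up sentences. Treating the return value as a single F_lcs score is an assumption, since the docstring does not state the return shape:

```julia
using Metrics

# Hypothetical single-sentence candidate and reference.
candidate = ["the cat was found under the bed"]
reference = ["the cat was under the bed"]

# F_lcs as defined by the formula above; the single-value return
# is an assumption.
f_lcs = rouge_l_sentence_level(candidate, reference)
```
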
Metrics.rouge_l_summary_level - Function
rouge_l_summary_level(evaluated_sentences, reference_sentences)

Computes ROUGE-L (summary level) of two text collections of sentences.

Calculated according to:

    R_lcs = SUM(i=1..u) LCS∪(r_i, C) / m
    P_lcs = SUM(i=1..u) LCS∪(r_i, C) / n
    F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:

  • SUM(i=1..u) = sum over i from 1 through u
  • u = number of sentences in the reference summary
  • C = candidate summary made up of v sentences
  • m = number of words in the reference summary
  • n = number of words in the candidate summary

Arguments:

  • evaluated_sentences: the sentences that have been picked by the summarizer
  • reference_sentences: the sentences in the reference summaries

Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)

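A minimal usage sketch with made-up multi-sentence summaries; as with the sentence-level variant, the single-score return is an assumption:

```julia
using Metrics

# Hypothetical multi-sentence candidate and reference summaries.
candidate_summary = ["the cat was found under the bed",
                     "it was hiding from the dog"]
reference_summary = ["the cat was under the bed",
                     "it hid there to avoid the dog"]

# Summary-level F_lcs per the formula above.
f_lcs = rouge_l_summary_level(candidate_summary, reference_summary)
```
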
Metrics.rouge - Function
rouge(hypotheses, references)

Calculates average ROUGE scores for a list of hypotheses and references.

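A minimal usage sketch with made-up hypothesis/reference pairs; the structure of the returned scores is not documented above, so treat it as an assumption and inspect the result interactively:

```julia
using Metrics

# Hypothetical corpus: one hypothesis string per reference string.
hypotheses = ["the cat was found under the bed",
              "the dog slept on the porch"]
references = ["the cat was under the bed",
              "a dog was sleeping on the porch"]

# Averaged ROUGE scores over the corpus; return structure assumed.
scores = rouge(hypotheses, references)
```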