Natural Language Processing Metrics
Once you have trained your NLP model, you need to evaluate its performance. This package provides various metric functions for evaluating and assessing the accuracy of an NLP model. Which metrics are appropriate depends on the type of NLP problem you are working on.
BLEU Score
Metrics.bleu_score — Function
bleu_score(reference_corpus, translation_corpus; max_order=4, smooth=false)
Computes the BLEU score of translated segments against one or more references. Returns the BLEU score, the n-gram precisions, the brevity penalty, the geometric mean of the n-gram precisions, the translation length, and the reference length.
Arguments

reference_corpus: list of lists of references for each translation. Each reference should be tokenized into a list of tokens.
translation_corpus: list of translations to score. Each translation should be tokenized into a list of tokens.
max_order: maximum n-gram order to use when computing the BLEU score. Defaults to 4.
smooth: whether or not to apply Lin et al. (2004) smoothing. Defaults to false.
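To make the returned quantities concrete, here is a small self-contained sketch of the arithmetic behind the score for a single sentence pair. The helpers ngram_counts and clipped_precision are illustrative only, not part of this package's API:

```julia
# Self-contained sketch of BLEU's building blocks; ngram_counts and
# clipped_precision are illustrative helpers, not package API.

# Count the n-grams of a tokenized sentence.
function ngram_counts(tokens::Vector{String}, n::Int)
    counts = Dict{Vector{String},Int}()
    for i in 1:(length(tokens) - n + 1)
        gram = tokens[i:i+n-1]
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# Modified n-gram precision: each translation n-gram is credited at most
# as many times as it occurs in the reference ("clipping").
function clipped_precision(reference, translation, n)
    ref   = ngram_counts(reference, n)
    trans = ngram_counts(translation, n)
    overlap = sum(min(c, get(ref, g, 0)) for (g, c) in trans; init=0)
    total   = max(length(translation) - n + 1, 0)
    return total == 0 ? 0.0 : overlap / total
end

reference   = ["the", "cat", "sat", "on", "the", "mat"]
translation = ["the", "cat", "sat", "on", "mat"]

p1 = clipped_precision(reference, translation, 1)   # 5/5 = 1.0
p2 = clipped_precision(reference, translation, 2)   # 3/4 = 0.75

# Brevity penalty: 1 for translations at least as long as the reference,
# exp(1 - r/c) for shorter ones (c = translation length, r = reference length).
c, r = length(translation), length(reference)
bp = c > r ? 1.0 : exp(1 - r / c)

# BLEU (up to order 2 here) is bp times the geometric mean of the precisions.
bleu = bp * sqrt(p1 * p2)
```

The short translation is penalized through bp even though its unigram precision is perfect, which is exactly the behavior the brevity penalty exists to provide.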
ROUGE Score
Metrics.rouge_n — Function
rouge_n(evaluated_sentences, reference_sentences; n=2)
Computes ROUGE-N of two text collections of sentences. Returns the F1, precision, and recall for ROUGE-N.
Arguments

evaluated_sentences: the sentences that have been picked by the summarizer.
reference_sentences: the sentences from the reference set.
n: size of the n-gram. Defaults to 2.
Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)
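As a concrete illustration of the quantities ROUGE-N reports, here is a self-contained sketch on two tokenized sentences. The helpers below are hypothetical, not part of this package's API, and they work on single token lists rather than sentence collections:

```julia
# Illustrative ROUGE-N computation on two tokenized sentences; the
# helpers here are not part of the package API.

# Count the n-grams of a tokenized sentence.
function ngram_counts(tokens::Vector{String}, n::Int)
    counts = Dict{Vector{String},Int}()
    for i in 1:(length(tokens) - n + 1)
        gram = tokens[i:i+n-1]
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# ROUGE-N: n-gram overlap between candidate and reference, reported as
# recall (over reference n-grams), precision (over candidate n-grams),
# and their harmonic mean (F1).
function rouge_n_sketch(reference, candidate, n)
    ref  = ngram_counts(reference, n)
    cand = ngram_counts(candidate, n)
    matches = sum(min(c, get(ref, g, 0)) for (g, c) in cand; init=0)
    recall    = matches / sum(values(ref))
    precision = matches / sum(values(cand))
    f1 = (precision + recall) == 0 ? 0.0 :
         2 * precision * recall / (precision + recall)
    return f1, precision, recall
end

reference = ["police", "killed", "the", "gunman"]
candidate = ["police", "kill", "the", "gunman"]

f1, p, r = rouge_n_sketch(reference, candidate, 1)  # 3 of 4 unigrams match
```

With n=1, three of the four candidate unigrams appear in the reference, so precision, recall, and F1 all come out to 0.75.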
Metrics.rouge_l_sentence_level — Function
rouge_l_sentence_level(evaluated_sentences, reference_sentences)
Computes ROUGE-L (sentence level) of two text collections of sentences.
Calculated according to:

R_lcs = LCS(X, Y) / m
P_lcs = LCS(X, Y) / n
F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:
X = reference summary
Y = candidate summary
m = length of the reference summary
n = length of the candidate summary
Arguments

evaluated_sentences: the sentences that have been picked by the summarizer.
reference_sentences: the sentences from the reference set.
Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)
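The formulas above can be traced with a self-contained sketch. lcs_length is an illustrative helper, not part of this package's API, and beta = 1.2 is only an example weighting (the package's actual default may differ):

```julia
# Sentence-level ROUGE-L sketch; lcs_length is an illustrative helper,
# not part of the package API.

# Length of the longest common subsequence via dynamic programming.
function lcs_length(x::Vector{String}, y::Vector{String})
    m, n = length(x), length(y)
    dp = zeros(Int, m + 1, n + 1)
    for i in 1:m, j in 1:n
        dp[i+1, j+1] = x[i] == y[j] ? dp[i, j] + 1 : max(dp[i, j+1], dp[i+1, j])
    end
    return dp[m+1, n+1]
end

X = ["police", "killed", "the", "gunman"]          # reference, m = 4
Y = ["police", "kill", "the", "gunman", "today"]   # candidate, n = 5

lcs = lcs_length(X, Y)        # "police", "the", "gunman" -> 3
R_lcs = lcs / length(X)       # 3/4
P_lcs = lcs / length(Y)       # 3/5

# beta weights recall against precision; 1.2 is an example value only.
beta = 1.2
F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)
```

Unlike ROUGE-N, no fixed n-gram order is chosen: the LCS automatically rewards the longest in-order match between the two summaries.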
Metrics.rouge_l_summary_level — Function
rouge_l_summary_level(evaluated_sentences, reference_sentences)
Computes ROUGE-L (summary level) of two text collections of sentences.
Calculated according to:

R_lcs = SUM(i=1, u)[ LCS_union(r_i, C) ] / m
P_lcs = SUM(i=1, u)[ LCS_union(r_i, C) ] / n
F_lcs = ((1 + beta^2) * R_lcs * P_lcs) / (R_lcs + beta^2 * P_lcs)

where:
SUM(i=1, u) = sum over i from 1 through u
u = number of sentences in the reference summary
C = candidate summary, made up of v sentences
m = number of words in the reference summary
n = number of words in the candidate summary
Arguments

evaluated_sentences: the sentences that have been picked by the summarizer.
reference_sentences: the sentences in the reference summaries.
Source: (http://research.microsoft.com/en-us/um/people/cyl/download/papers/rouge-working-note-v1.3.1.pdf)
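The LCS_union term, which takes the union of a reference sentence's LCS matches against each candidate sentence, can be sketched as follows. lcs_positions is an illustrative helper, not part of this package's API; the tokens reproduce the worked example from the ROUGE working note cited above:

```julia
# Sketch of the LCS_union term; lcs_positions is an illustrative helper,
# not part of the package API.

# Indices of reference tokens participating in one LCS of x and y,
# recovered by backtracking through the DP table.
function lcs_positions(x::Vector{String}, y::Vector{String})
    m, n = length(x), length(y)
    dp = zeros(Int, m + 1, n + 1)
    for i in 1:m, j in 1:n
        dp[i+1, j+1] = x[i] == y[j] ? dp[i, j] + 1 : max(dp[i, j+1], dp[i+1, j])
    end
    pos = Set{Int}()
    i, j = m, n
    while i > 0 && j > 0
        if x[i] == y[j]
            push!(pos, i); i -= 1; j -= 1
        elseif dp[i, j+1] >= dp[i+1, j]
            i -= 1
        else
            j -= 1
        end
    end
    return pos
end

r1 = ["w1", "w2", "w3", "w4", "w5"]   # reference sentence
c1 = ["w1", "w2", "w6", "w7", "w8"]   # candidate sentence 1
c2 = ["w1", "w3", "w8", "w9", "w5"]   # candidate sentence 2

# LCS(r1, c1) matches w1 w2; LCS(r1, c2) matches w1 w3 w5.
# Their union covers 4 of r1's 5 tokens, so LCS_union(r1, C) = 4.
hits = union(lcs_positions(r1, c1), lcs_positions(r1, c2))
lcs_union = length(hits)              # 4
```

Taking the union of match positions (rather than summing the two LCS lengths) prevents the same reference word from being credited twice.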
Metrics.rouge — Function
rouge(hypotheses, references)
Calculates average ROUGE scores for a list of hypotheses and references.