Classification Metrics

This package provides a variety of classification metrics for analysing the performance of classification models based on the provided y_true and y_pred. The metrics you choose to evaluate your machine learning model are important: they influence how the performance of machine learning algorithms is measured and compared. For most of these functions, the provided y_true is expected to be one-hot encoded (the sparse variants below are exceptions).

Functions

Metrics.binary_accuracy - Function
binary_accuracy(y_pred, y_true; threshold=0.5)

Calculates Averaged Binary Accuracy based on y_pred and y_true. The threshold argument specifies the minimum predicted probability y_pred required for a sample to be labelled as 1. The default value is 0.5.
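
A minimal usage sketch (the data values are illustrative, not from the package documentation):

    using Metrics

    y_pred = [0.2, 0.7, 0.9, 0.4]   # predicted probabilities
    y_true = [0, 1, 1, 1]

    binary_accuracy(y_pred, y_true)                 # threshold 0.5 labels [0, 1, 1, 0] -> 0.75
    binary_accuracy(y_pred, y_true; threshold=0.3)  # threshold 0.3 labels [0, 1, 1, 1] -> 1.0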

Metrics.cohen_kappa - Function
cohen_kappa(y_pred, y_true)

Measures the agreement between two raters (predicted and ground truth, here) who each classify N items into C mutually exclusive categories, using the observed data to calculate the probabilities of each observer randomly seeing each category. If the raters are in complete agreement then κ = 1. If there is no agreement among the raters other than what would be expected by chance, κ = 0.

Ref: Cohen's Kappa
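
Concretely, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A standalone sketch of that computation in plain Julia (not the package's internals):

    # Integer class labels from two "raters": predictions and ground truth
    y_pred = [1, 0, 1, 1, 0, 1]
    y_true = [1, 0, 0, 1, 0, 1]

    n   = length(y_true)
    p_o = sum(y_pred .== y_true) / n                  # observed agreement: 5/6
    p_e = sum(count(==(c), y_pred) / n * count(==(c), y_true) / n
              for c in unique(vcat(y_pred, y_true)))  # chance agreement: 1/2
    κ   = (p_o - p_e) / (1 - p_e)                     # 2/3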

Metrics.confusion_matrix - Function
confusion_matrix(y_pred, y_true)

Function to create a confusion matrix for classification problems based on the provided y_pred and y_true. Expects y_true to be one-hot encoded already.
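
A usage sketch, assuming the (n_classes, n_samples) column layout that bin_to_cat (under Utils) also produces:

    using Metrics

    y_true = [1 0 0;            # one-hot: 3 classes × 3 samples
              0 1 0;
              0 0 1]
    y_pred = [0.8 0.1 0.2;      # per-class scores, one column per sample
              0.1 0.7 0.3;
              0.1 0.2 0.5]

    confusion_matrix(y_pred, y_true)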

Metrics.f_beta_score - Function
f_beta_score(y_pred, y_true; β=1, avg_type="macro", sample_weights=nothing)

Computes the F-beta score: the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0 (a worked instance of the formula follows the argument list).

Arguments

  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • β=1: determines the weight of recall relative to precision in the combined score. β < 1 gives more weight to precision, while β > 1 favors recall.
  • avg_type="macro": Type of average to be used while calculating the score for multiclass models. Accepts macro, micro, or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.
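
A worked instance of the harmonic-mean formula, in plain arithmetic with illustrative values:

    prec, rec = 0.6, 0.8
    β  = 2                                              # favours recall
    fβ = (1 + β^2) * prec * rec / (β^2 * prec + rec)    # 0.75
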
Metrics.false_alarm_rate - Function
false_alarm_rate(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the false alarm rate of the predictions with respect to the labels, as 1 - specificity(y_pred, y_true, avg_type, sample_weights).

Arguments

  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to be used while calculating the rate for multiclass models. Accepts macro, micro, or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.

See also: specificity
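
Checking the stated identity on a small example (shapes assumed one-hot, as elsewhere in these docs):

    using Metrics

    y_true = [1 0; 0 1]
    y_pred = [0.9 0.4; 0.1 0.6]

    false_alarm_rate(y_pred, y_true) ≈ 1 - specificity(y_pred, y_true)  # true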

Metrics.precision - Function
precision(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the precision of the predictions with respect to the labels.

Arguments

  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to be used while calculating precision for multiclass models. Accepts macro, micro, or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.
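
A sketch of the averaging options, qualified as Metrics.precision to avoid clashing with Base.precision (shapes assumed one-hot; weights illustrative):

    using Metrics

    y_true = [1 0 0 1; 0 1 0 0; 0 0 1 0]
    y_pred = [0.7 0.2 0.1 0.4; 0.2 0.6 0.2 0.5; 0.1 0.2 0.7 0.1]

    Metrics.precision(y_pred, y_true)                    # macro average (default)
    Metrics.precision(y_pred, y_true; avg_type="micro")  # micro average
    Metrics.precision(y_pred, y_true; avg_type="weighted", sample_weights=[2, 1, 1])
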
Metrics.recall - Function
recall(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the recall of the predictions with respect to the labels.

Arguments

  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to be used while calculating recall for multiclass models. Accepts macro, micro, or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.

Aliases: sensitivity and detection_rate

Metrics.sparse_categorical - Function
sparse_categorical(y_pred, y_true)

Calculates Sparse Categorical Accuracy based on y_pred and y_true. It evaluates whether the true label is equal to the index of the maximal predicted value. Here, y_true is expected to provide only an integer (starting from index 0) as the label for each data element (i.e. not one-hot encoded).
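
A usage sketch, with 0-based integer labels and one column of scores per sample (the column orientation is an assumption):

    using Metrics

    y_true = [0, 2, 1]            # integer labels, starting from 0
    y_pred = [0.8 0.1 0.3;        # row 1: scores for class 0
              0.1 0.2 0.6;        # row 2: scores for class 1
              0.1 0.7 0.1]        # row 3: scores for class 2

    sparse_categorical(y_pred, y_true)  # every column's argmax matches its label -> 1.0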

Metrics.specificity - Function
specificity(y_pred, y_true; avg_type="macro", sample_weights=nothing)

Computes the specificity of the predictions with respect to the labels.

Arguments

  • y_pred: predicted values.
  • y_true: ground truth values on the basis of which predicted values are to be assessed.
  • avg_type="macro": Type of average to be used while calculating specificity for multiclass models. Accepts macro, micro, or weighted. Defaults to macro.
  • sample_weights: Class weights to be provided when avg_type is set to weighted. Useful in case of imbalanced classes.

Metrics.top_k_categorical - Function
top_k_categorical(y_pred, y_true; k=3)

Evaluates whether the index of the true value is equal to any of the indices of the top k predicted values. The default value of k is 3.

Metrics.top_k_sparse_categorical - Function
top_k_sparse_categorical(y_pred, y_true; k=3)

Evaluates whether the true value is equal to any of the indices of the top k predicted values. The default value of k is 3. Like sparse_categorical, it expects y_true to provide only an integer (starting from index 0) as the label for each data element (i.e. not one-hot encoded).
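
A usage sketch, reusing the 0-based integer-label convention from sparse_categorical:

    using Metrics

    y_true = [2, 0]
    y_pred = [0.5 0.6;
              0.1 0.3;
              0.4 0.1]

    top_k_sparse_categorical(y_pred, y_true; k=2)  # both labels fall in their column's top 2 -> 1.0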


Combined Stats

Some functions return an overall analysis of the model's performance in a single call. They are:

Metrics.statsfromTFPN - Function
statsfromTFPN(TP, TN, FP, FN)

Computes statistics in the case of binary classification, or one-vs-all statistics in the case of multiclass classification.

Arguments:

  • TP: true positive values
  • TN: true negative values
  • FP: false positive values
  • FN: false negative values

Returns the result stats as a dictionary.
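
A sketch with scalar counts from a binary classifier (the dictionary's exact keys are not documented here, so inspect them rather than assume):

    using Metrics

    stats = statsfromTFPN(40, 45, 5, 10)  # TP, TN, FP, FN
    keys(stats)                           # see which statistics are reported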

Metrics.classwise_stats - Function
classwise_stats(y_pred, y_true)

Computes statistics for each class of a multiclass classification based on the provided y_pred and y_true.

Returns the result stats as a dictionary.

Metrics.global_stats - Function
global_stats(y_pred, y_true; avg_type="macro")

Computes the overall statistics based on the provided y_pred and y_true. avg_type specifies the type of average to be used while evaluating the stats. Currently, it can take the values "macro" or "micro".

Returns the result stats as a dictionary.
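
A sketch of both calls side by side (shapes assumed one-hot, as above):

    using Metrics

    y_true = [1 0 0 1; 0 1 0 0; 0 0 1 0]
    y_pred = [0.7 0.2 0.1 0.4; 0.2 0.6 0.2 0.5; 0.1 0.2 0.7 0.1]

    classwise_stats(y_pred, y_true)                 # per-class stats as a dictionary
    global_stats(y_pred, y_true; avg_type="micro")  # averaged overall stats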


Utils

These are some utility functions to aid the overall performance analysis.

Metrics.bin_to_cat - Function
bin_to_cat(y_pred, y_true)

Function to convert binary data to categorical data with two categories. Returns y_pred and y_true of shape (2, length(y_pred)) as a tuple. Utility function to support performance metrics like precision and recall, where the data first needs to be converted to categorical form before the metric is applied.
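
A sketch of the conversion feeding a multiclass metric:

    using Metrics

    y_pred = [0.9, 0.2, 0.7]
    y_true = [1, 0, 1]

    cat_pred, cat_true = bin_to_cat(y_pred, y_true)  # each of shape (2, 3)
    recall(cat_pred, cat_true)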

Metrics.TFPN - Function
TFPN(y_pred, y_true)

Returns the confusion matrix and the true positive, true negative, false positive and false negative counts for each class, based on y_pred and y_true. Expects y_true to be one-hot encoded already.
