Performance Evaluation

This package provides tools to assess the performance of a machine learning algorithm.

Classification Performance

correctrate(gt, pred)

Compute the correct rate, i.e. the fraction of predictions in pred that equal the corresponding ground-truth labels in gt.

errorrate(gt, pred)

Compute the error rate, i.e. the fraction of predictions in pred that differ from the corresponding ground-truth labels in gt.
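Both definitions reduce to counting elementwise matches. The sketch below restates them in plain Julia 1.x; the names correct_rate_sketch and error_rate_sketch are hypothetical and this is not the package's implementation.

```julia
# Hypothetical helpers restating the two definitions (not MLBase's code).
function correct_rate_sketch(gt::AbstractVector, pred::AbstractVector)
    length(gt) == length(pred) || throw(DimensionMismatch("gt and pred differ in length"))
    return count(gt .== pred) / length(gt)   # fraction of matching labels
end

function error_rate_sketch(gt::AbstractVector, pred::AbstractVector)
    length(gt) == length(pred) || throw(DimensionMismatch("gt and pred differ in length"))
    return count(gt .!= pred) / length(gt)   # fraction of mismatching labels
end

gt   = [1, 1, 1, 2, 2, 2, 3, 3]
pred = [1, 1, 2, 2, 2, 3, 3, 3]
correct_rate_sketch(gt, pred)  # 0.75
error_rate_sketch(gt, pred)    # 0.25
```

The two rates always sum to 1 for the same pair of vectors.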

confusmat(k, gt, pred)

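Compute the confusion matrix of the predictions in pred with respect to the ground truths in gt. Here, k is the number of classes, and labels are integers in 1:k.

It returns an integer matrix R of size (k, k) such that R[i, j] == countnz((gt .== i) & (pred .== j)), that is, R[i, j] is the number of samples whose ground-truth label is i and whose predicted label is j.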

Examples:

julia> gt = [1, 1, 1, 2, 2, 2, 3, 3];

julia> pred = [1, 1, 2, 2, 2, 3, 3, 3];

julia> C = confusmat(3, gt, pred)   # compute confusion matrix
3x3 Array{Int64,2}:
 2  1  0
 0  2  1
 0  0  2

julia> C ./ sum(C, 2)   # normalize each row (per ground-truth class)
3x3 Array{Float64,2}:
 0.666667  0.333333  0.0
 0.0       0.666667  0.333333
 0.0       0.0       1.0

julia> trace(C) / length(gt)  # compute correct rate from confusion matrix
0.75

julia> correctrate(gt, pred)
0.75
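The transcript above appears to use pre-1.0 Julia. In Julia 1.x, trace is spelled tr (from the LinearAlgebra standard library), sum(C, 2) becomes sum(C; dims=2), and countnz on a Boolean array becomes count. The sketch below reproduces the same example under those spellings; confusion_counts is a hypothetical helper that builds the matrix directly from the R[i, j] definition, not the package's confusmat.

```julia
using LinearAlgebra  # provides tr (named trace before Julia 0.7)

# Hypothetical helper: R[i, j] = number of samples with gt == i and pred == j.
function confusion_counts(k::Integer, gt::AbstractVector{<:Integer},
                          pred::AbstractVector{<:Integer})
    length(gt) == length(pred) || throw(DimensionMismatch("gt and pred differ in length"))
    R = zeros(Int, k, k)
    for (g, p) in zip(gt, pred)
        R[g, p] += 1          # labels are assumed to lie in 1:k
    end
    return R
end

gt   = [1, 1, 1, 2, 2, 2, 3, 3]
pred = [1, 1, 2, 2, 2, 3, 3, 3]
C = confusion_counts(3, gt, pred)   # [2 1 0; 0 2 1; 0 0 2]
C ./ sum(C; dims=2)                 # row-normalized: each row sums to 1
tr(C) / length(gt)                  # 0.75, the correct rate
```

A single pass over the samples fills the matrix, which avoids building k^2 Boolean masks as the countnz formulation literally suggests.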

Hit rate (for retrieval tasks)

hitrate(gt, ranklist, k)

Compute the hit rate at rank k for the ranked predictions in ranklist with respect to the ground truths in gt. The i-th column of ranklist is the ranked list of predictions for the i-th sample.

Specifically, if gt[i] is contained in ranklist[1:k, i], the prediction for the i-th sample is said to hit within rank k. The hit rate at rank k is the fraction of samples whose predictions hit within rank k.
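The definition can be checked with a small plain-Julia sketch. hit_rate_sketch is a hypothetical name, and it assumes, as the indexing ranklist[1:k, i] implies, one column per sample with the best-ranked candidate first.

```julia
# Hypothetical helper: fraction of samples whose ground truth appears
# among the top-k entries of their column of ranklist.
function hit_rate_sketch(gt::AbstractVector, ranklist::AbstractMatrix, k::Integer)
    n = length(gt)
    size(ranklist, 2) == n || throw(DimensionMismatch("need one column per sample"))
    hits = count(i -> gt[i] in view(ranklist, 1:k, i), 1:n)
    return hits / n
end

gt = [1, 2, 3, 4]
ranklist = [1 3 3 2;    # rank 1 candidates for samples 1..4
            2 2 1 1;    # rank 2 candidates
            3 1 2 4]    # rank 3 candidates
hit_rate_sketch(gt, ranklist, 1)  # samples 1 and 3 hit at rank 1, so 0.5
hit_rate_sketch(gt, ranklist, 2)  # 0.75
```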

hitrates(gt, ranklist, ks)

Compute hit rates at multiple ranks, given as a vector ks. It returns a vector r of hit rates, where r[i] corresponds to rank ks[i].

Note that computing hit rates for multiple ranks jointly is more efficient than computing them separately.
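One way joint computation can save work is sketched below: locate gt[i] in its column once, then answer every requested rank by comparing that position, so ranklist is scanned only once regardless of how many ranks are asked for. This is an illustrative technique under that assumption, with the hypothetical name hit_rates_sketch, not necessarily how the package does it.

```julia
# Hypothetical helper: position of each ground truth is found once,
# then each k costs only a count over those positions.
function hit_rates_sketch(gt::AbstractVector, ranklist::AbstractMatrix,
                          ks::AbstractVector{<:Integer})
    n = length(gt)
    m = size(ranklist, 1)
    # rank at which gt[i] appears in column i; m + 1 means "never appears"
    pos = [something(findfirst(==(gt[i]), view(ranklist, :, i)), m + 1) for i in 1:n]
    return [count(<=(k), pos) / n for k in ks]
end

gt = [1, 2, 3, 4]
ranklist = [1 3 3 2;
            2 2 1 1;
            3 1 2 4]
hit_rates_sketch(gt, ranklist, [1, 2, 3])  # [0.5, 0.75, 1.0]
```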

Receiver Operating Characteristics (ROC)

The receiver operating characteristic (ROC) is often used to measure the performance of a detector, a thresholded classifier, or a verification algorithm.

The ROC Type

This package uses an immutable type ROCNums defined below to capture the ROC of an experiment:

immutable ROCNums{T<:Real}
    p::T    # number of positive samples in the ground truth
    n::T    # number of negative samples in the ground truth
    tp::T   # true positives: positive samples predicted as positive
    tn::T   # true negatives: negative samples predicted as negative
    fp::T   # false positives: negative samples predicted as positive
    fn::T   # false negatives: positive samples predicted as negative
end

A variety of performance measures can be computed from an ROCNums instance r:

true_positive(r)

the number of true positives (r.tp)

true_negative(r)

the number of true negatives (r.tn)

false_positive(r)

the number of false positives (r.fp)

false_negative(r)

the number of false negatives (r.fn)

true_positive_rate(r)

the fraction of positive samples correctly predicted as positive, defined as r.tp / r.p

true_negative_rate(r)

the fraction of negative samples correctly predicted as negative, defined as r.tn / r.n

false_positive_rate(r)

the fraction of negative samples incorrectly predicted as positive, defined as r.fp / r.n

false_negative_rate(r)

the fraction of positive samples incorrectly predicted as negative, defined as r.fn / r.p

recall(r)

Equivalent to true_positive_rate(r).

precision(r)

the fraction of positive predictions that are correct, defined as r.tp / (r.tp + r.fp).

f1score(r)

the harmonic mean of recall(r) and precision(r), i.e. 2 * precision(r) * recall(r) / (precision(r) + recall(r)).
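The measures above are simple ratios of the six counts. The sketch below restates them in Julia 1.x syntax with a hypothetical stand-in type ROCCounts (written with struct, since the immutable keyword no longer parses in Julia 1.x) and hypothetical short function names; it is not the package's ROCNums or its accessor functions.

```julia
# Hypothetical stand-in for ROCNums, using Julia 1.x `struct` syntax.
struct ROCCounts{T<:Real}
    p::T    # positives in ground truth
    n::T    # negatives in ground truth
    tp::T   # true positives
    tn::T   # true negatives
    fp::T   # false positives
    fn::T   # false negatives
end

tpr(r::ROCCounts)  = r.tp / r.p              # true positive rate (recall)
tnr(r::ROCCounts)  = r.tn / r.n              # true negative rate
fpr(r::ROCCounts)  = r.fp / r.n              # false positive rate
fnr(r::ROCCounts)  = r.fn / r.p              # false negative rate
prec(r::ROCCounts) = r.tp / (r.tp + r.fp)    # precision

function f1(r::ROCCounts)
    P, R = prec(r), tpr(r)
    return 2 * P * R / (P + R)               # harmonic mean of precision and recall
end

# 10 positives, 10 negatives; 8 positives and 6 negatives predicted correctly
r = ROCCounts(10, 10, 8, 6, 4, 2)
tpr(r)   # 0.8
fpr(r)   # 0.4
prec(r)  # 8 / 12
f1(r)    # 16 / 22, the same as 2tp / (2tp + fp + fn)
```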

Computing ROC Curves

The package provides a function roc that computes a single ROCNums instance, or a vector of such instances (an ROC curve), from predictions.

roc(gt, pred)

Compute an ROCNums instance from the ground truths in gt and the predictions in pred.

roc(gt, scores, thres[, ord])

Compute an ROC instance or an ROC curve (a vector of ROC instances) from the given scores and threshold(s) thres.

Predictions are made as follows:

  • When ord = Forward: predict 1 if scores[i] >= thres, otherwise 0.
  • When ord = Reverse: predict 1 if scores[i] <= thres, otherwise 0.

If ord is omitted, it defaults to Forward.
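The thresholding rule and the resulting counts can be sketched as follows. roc_counts_sketch is a hypothetical name, the keyword forward stands in for ord (true for Forward, false for Reverse), gt is assumed to hold 0/1 labels, and a NamedTuple replaces ROCNums to keep the block self-contained.

```julia
# Hypothetical sketch of single-threshold ROC counts for binary labels.
function roc_counts_sketch(gt::AbstractVector, scores::AbstractVector{<:Real},
                           thres::Real; forward::Bool = true)
    pred = forward ? scores .>= thres : scores .<= thres   # predicted positive?
    pos  = gt .== 1                                         # actually positive?
    tp = count(pos .& pred)
    fn = count(pos .& .!pred)
    fp = count(.!pos .& pred)
    tn = count(.!pos .& .!pred)
    return (p = tp + fn, n = tn + fp, tp = tp, tn = tn, fp = fp, fn = fn)
end

gt     = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
roc_counts_sketch(gt, scores, 0.5)
# (p = 3, n = 3, tp = 2, tn = 2, fp = 1, fn = 1)
```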

Returns:

  • When thres is a single number, it produces a single ROCNums instance;
  • When thres is a vector, it produces a vector of ROCNums instances.

Note: Jointly evaluating an ROC curve for multiple thresholds is generally much faster than evaluating for them individually.
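One common reason joint evaluation is faster: sort the positive and negative scores once, then each threshold needs only two binary searches instead of a full pass over the data. The sketch below illustrates that technique for the Forward order only; roc_curve_sketch is a hypothetical name, and this is not claimed to be the package's algorithm.

```julia
# Hypothetical sketch: O(N log N) sorting once, then O(log N) per threshold.
function roc_curve_sketch(gt::AbstractVector, scores::AbstractVector{<:Real},
                          thresholds::AbstractVector{<:Real})
    pos = sort(scores[gt .== 1])   # scores of actual positives
    neg = sort(scores[gt .!= 1])   # scores of actual negatives
    p, n = length(pos), length(neg)
    map(thresholds) do t
        # scores >= t = total minus those strictly below t
        tp = p - (searchsortedfirst(pos, t) - 1)
        fp = n - (searchsortedfirst(neg, t) - 1)
        (p = p, n = n, tp = tp, tn = n - fp, fp = fp, fn = p - tp)
    end
end

gt     = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
curve = roc_curve_sketch(gt, scores, [0.0, 0.5, 1.0])
[c.tp for c in curve]  # [3, 2, 0]
[c.fp for c in curve]  # [3, 1, 0]
```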

roc(gt, (preds, scores), thres[, ord])

Compute an ROC instance or an ROC curve (a vector of ROC instances) for multi-class classification, based on the given predictions, scores, and threshold(s) thres.

Predictions are made as follows:

  • When ord = Forward: predict preds[i] if scores[i] >= thres, otherwise 0.
  • When ord = Reverse: predict preds[i] if scores[i] <= thres, otherwise 0.

If ord is omitted, it defaults to Forward.
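The multi-class rule keeps a predicted label only when its score passes the threshold and emits 0 (no prediction) otherwise. The sketch below shows only that labeling step; thresholded_preds_sketch and the keyword forward (standing in for ord) are hypothetical, and how the package tallies these labels into ROCNums is not reproduced here.

```julia
# Hypothetical sketch of the multi-class thresholding rule only.
function thresholded_preds_sketch(preds::AbstractVector{<:Integer},
                                  scores::AbstractVector{<:Real},
                                  thres::Real; forward::Bool = true)
    keep = forward ? scores .>= thres : scores .<= thres
    return ifelse.(keep, preds, 0)   # 0 marks "no prediction"
end

preds  = [2, 1, 3, 3]
scores = [0.8, 0.2, 0.5, 0.4]
thresholded_preds_sketch(preds, scores, 0.5)                   # [2, 0, 3, 0]
thresholded_preds_sketch(preds, scores, 0.5; forward = false)  # [0, 1, 3, 3]
```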

Returns:

  • When thres is a single number, it produces a single ROCNums instance.
  • When thres is a vector, it produces an ROC curve (a vector of ROCNums instances).

Note: Jointly evaluating an ROC curve for multiple thresholds is generally much faster than evaluating for them individually.

roc(gt, scores, n[, ord])

Compute an ROC curve (a vector of ROC instances) with respect to n evenly spaced thresholds between minimum(scores) and maximum(scores). (See above for how predictions are made.)
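A grid of n evenly spaced thresholds can be produced with Julia's range. The sketch assumes both endpoints are included, which is what range(start, stop; length = n) does; whether the package includes them is not stated here, and evenly_spaced_thresholds is a hypothetical name.

```julia
# Hypothetical helper; assumes minimum and maximum are both thresholds.
evenly_spaced_thresholds(scores::AbstractVector{<:Real}, n::Integer) =
    range(minimum(scores), maximum(scores); length = n)

scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
collect(evenly_spaced_thresholds(scores, 5))  # ≈ [0.1, 0.3, 0.5, 0.7, 0.9]
```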

roc(gt, (preds, scores), n[, ord])

Compute an ROC curve (a vector of ROC instances) for multi-class classification with respect to n evenly spaced thresholds between minimum(scores) and maximum(scores). (See above for how predictions are made.)

roc(gt, scores, ord)

Equivalent to roc(gt, scores, 100, ord).

roc(gt, (preds, scores), ord)

Equivalent to roc(gt, (preds, scores), 100, ord).

roc(gt, scores)

Equivalent to roc(gt, scores, 100, Forward).

roc(gt, (preds, scores))

Equivalent to roc(gt, (preds, scores), 100, Forward).