Getting the maximum accuracy for a binary probabilistic classifier in scikit-learn
Is there a built-in function for the maximum accuracy of a binary probabilistic classifier in scikit-learn?

E.g., for the maximum F1-score I do:
```python
import sklearn.metrics

# AUPRC
precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_true, y_score)
auprc = sklearn.metrics.auc(recall, precision)

max_f1 = 0
for r, p, t in zip(recall, precision, thresholds):
    if p + r == 0:
        continue
    if (2 * p * r) / (p + r) > max_f1:
        max_f1 = (2 * p * r) / (p + r)
        max_f1_threshold = t
```
I could compute the maximum accuracy in a similar fashion:
```python
import numpy as np

accuracies = []
thresholds = np.arange(0, 1, 0.1)
for threshold in thresholds:
    y_pred = np.greater(y_score, threshold).astype(int)
    accuracy = sklearn.metrics.accuracy_score(y_true, y_pred)
    accuracies.append(accuracy)
accuracies = np.array(accuracies)
max_accuracy = accuracies.max()
max_accuracy_threshold = thresholds[accuracies.argmax()]
```
but I wonder whether there is a built-in function for this.
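To my knowledge there is no single built-in, but the accuracy at every useful threshold can be derived from one call to `sklearn.metrics.roc_curve`, since accuracy is a linear combination of TPR and FPR. A minimal sketch (the toy `y_true`/`y_score` arrays are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve

# toy data, purely illustrative
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
n_pos = y_true.sum()
n_neg = y_true.size - n_pos
# accuracy = (TP + TN) / N = (tpr * P + (1 - fpr) * N) / (P + N)
acc = (tpr * n_pos + (1 - fpr) * n_neg) / y_true.size
max_acc = acc.max()
best = thresholds[np.argmax(acc)]
```

Since accuracy is linear in (FPR, TPR), its maximum over the ROC curve is attained at a vertex, so the thresholds that `roc_curve` drops as collinear cannot hide the optimum.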
I started to improve the solution by transforming thresholds = np.arange(0,1,0.1) into a smarter, dichotomous way of finding the maximum. Then I realized, after 2 hours of work, that computing all the accuracies is far cheaper than searching for just the maximum!! (Yes, it is totally counter-intuitive.)
I wrote a lot of comments below to explain the code. Feel free to delete them to make the code more readable.
```python
import numpy as np

# Convention: predict True if y_score > threshold
def roc_curve_data(y_true, y_score):
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    assert y_score.size == y_true.size

    order = np.argsort(y_score)  # sort everything by score
    y_true = y_true[order]
    # the thresholds to consider are the values of the score, plus 0 (accept everything)
    thresholds = np.insert(y_score[order], 0, 0)

    tp = [np.sum(y_true)]   # true positives  (threshold = 0 => accept all => tp[0] = # of positives in y_true)
    fp = [np.sum(~y_true)]  # false positives (threshold = 0 => accept all => fp[0] = # of negatives in y_true)
    tn = [0]                # true negatives  (threshold = 0 => accept all => no negative predictions)
    fn = [0]                # false negatives (threshold = 0 => accept all => no negative predictions)

    for i in range(1, thresholds.size):
        # At each step we stop predicting sample i-1 as true and predict it as false.
        # If y_true[i-1] is True, this step is a mistake:
        tp.append(tp[-1] - int(y_true[i - 1]))
        fn.append(fn[-1] + int(y_true[i - 1]))
        # If y_true[i-1] is False, this step is an improvement:
        fp.append(fp[-1] - int(~y_true[i - 1]))
        tn.append(tn[-1] + int(~y_true[i - 1]))

    tp = np.asarray(tp, dtype=int)
    fp = np.asarray(fp, dtype=int)
    tn = np.asarray(tn, dtype=int)
    fn = np.asarray(fn, dtype=int)
    return (thresholds, tp, fp, tn, fn)
```
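The loop above updates each count by one sample per step, so the same counts fall out of cumulative sums. This is my own vectorized rewrite of the same idea (not the original answer's code), a sketch that follows the same predict-True-if-score-above-threshold convention:

```python
import numpy as np

def roc_counts_vectorized(y_true, y_score):
    # Same convention as above: predict True if y_score > threshold.
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(y_score)
    y_sorted = y_true[order]
    thresholds = np.insert(y_score[order], 0, 0)

    # Walking the thresholds from low to high flips one prediction to False
    # per step; cumulative sums give all four counts at once.
    flips_pos = np.insert(np.cumsum(y_sorted), 0, 0)   # positives flipped to False so far
    flips_neg = np.insert(np.cumsum(~y_sorted), 0, 0)  # negatives flipped to False so far
    tp = y_sorted.sum() - flips_pos
    fn = flips_pos
    fp = (~y_sorted).sum() - flips_neg
    tn = flips_neg
    return thresholds, tp, fp, tn, fn
```

For example, `roc_counts_vectorized([1, 0, 1], [0.9, 0.5, 0.3])` yields the counts at thresholds 0, 0.3, 0.5 and 0.9, matching the loop version step by step.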
The whole process is a single loop, and the algorithm is trivial. In fact, this stupidly simple function is 10 times faster than the solution proposed before me (computing the accuracies for thresholds = np.arange(0,1,0.1)), and 30 times faster than my previous smart-ass dichotomous algorithm...
From there you can compute any KPI you want, for example:
```python
def max_accuracy(thresholds, tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return max(accuracy)

def max_min_sensitivity_specificity(thresholds, tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return max(np.minimum(sensitivity, specificity))
```
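The question also asked for the threshold that attains the maximum, which the KPI functions above throw away. A small helper of my own (not part of the original answer), using the same five-array convention, recovers it; the tiny count arrays below are hand-made just to exercise it:

```python
import numpy as np

def best_accuracy_threshold(thresholds, tp, fp, tn, fn):
    # accuracy at every threshold, then the argmax
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    i = int(np.argmax(accuracy))
    return thresholds[i], accuracy[i]

# hand-made counts for 3 samples and 4 thresholds, illustrative only
thresholds = np.array([0.0, 0.3, 0.5, 0.9])
tp = np.array([2, 1, 1, 0])
fp = np.array([1, 1, 0, 0])
tn = np.array([0, 0, 1, 1])
fn = np.array([0, 1, 1, 2])
best_t, best_acc = best_accuracy_threshold(thresholds, tp, fp, tn, fn)
```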
If you want to test it:
```python
y_score = np.random.uniform(size=100)
y_true = [np.random.binomial(1, p) for p in y_score]
data = roc_curve_data(y_true, y_score)

# I personally use Jupyter; remove this magic otherwise
%matplotlib inline
import matplotlib.pyplot as plt
plt.step(data[0], data[1])
plt.step(data[0], data[2])
plt.step(data[0], data[3])
plt.step(data[0], data[4])
plt.show()

print("max accuracy is", max_accuracy(*data))
print("max of min(sensitivity, specificity) is", max_min_sensitivity_specificity(*data))
```
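An independent way to sanity-check the single-pass counts is a brute-force sweep with `sklearn.metrics.accuracy_score`: accuracy only changes when the threshold crosses one of the observed scores, so those scores (plus 0) are the only candidates. This O(n²) cross-check is my own addition, on synthetic data:

```python
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_score = rng.uniform(size=50)
y_true = rng.binomial(1, y_score)

# every observed score, plus 0, is a candidate threshold
cands = np.insert(np.sort(y_score), 0, 0)
accs = [accuracy_score(y_true, (y_score > t).astype(int)) for t in cands]
max_acc = max(accs)
best_t = cands[int(np.argmax(accs))]
```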
enjoy ;)