Getting the maximum accuracy for a binary probabilistic classifier in scikit-learn
Is there a built-in function for the maximum accuracy of a binary probabilistic classifier in scikit-learn?

E.g., for the maximum F1-score I do:
```python
import sklearn.metrics

# AUPRC
precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_true, y_score)
auprc = sklearn.metrics.auc(recall, precision)

max_f1 = 0
for r, p, t in zip(recall, precision, thresholds):
    if p + r == 0:
        continue
    if (2 * p * r) / (p + r) > max_f1:
        max_f1 = (2 * p * r) / (p + r)
        max_f1_threshold = t
```
I could compute the maximum accuracy in a similar fashion:
```python
import numpy as np

accuracies = []
thresholds = np.arange(0, 1, 0.1)
for threshold in thresholds:
    y_pred = np.greater(y_score, threshold).astype(int)
    accuracy = sklearn.metrics.accuracy_score(y_true, y_pred)
    accuracies.append(accuracy)
accuracies = np.array(accuracies)
max_accuracy = accuracies.max()
max_accuracy_threshold = thresholds[accuracies.argmax()]
```
but I wonder whether there is a built-in function for this.
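To my knowledge there is no single built-in, but the accuracy at every useful threshold can be derived from one call to `sklearn.metrics.roc_curve`, since accuracy is a linear combination of TPR and FPR. A minimal sketch (the toy `y_true`/`y_score` arrays are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve

# toy data, purely illustrative
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
n_pos = y_true.sum()
n_neg = y_true.size - n_pos
# accuracy = (TP + TN) / N = (tpr * P + (1 - fpr) * N) / (P + N)
acc = (tpr * n_pos + (1 - fpr) * n_neg) / y_true.size
max_acc = acc.max()
best = thresholds[np.argmax(acc)]
```

Since accuracy is linear in (FPR, TPR), its maximum over the ROC curve is attained at a vertex, so the thresholds that `roc_curve` drops as collinear cannot hide the optimum.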
I started to improve the solution by transforming thresholds = np.arange(0,1,0.1) into a smarter, dichotomous way of finding the maximum. Then I realized, after 2 hours of work, that computing all the accuracies is far cheaper than searching for just the maximum!! (Yes, it is totally counter-intuitive.)
I wrote a lot of comments below to explain the code. Feel free to delete them to make the code more readable.
```python
import numpy as np

# Convention: predict True if y_score > threshold
def roc_curve_data(y_true, y_score):
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    assert y_score.size == y_true.size

    order = np.argsort(y_score)  # sort everything by score
    y_true = y_true[order]
    # the thresholds to consider are the values of the score, plus 0 (accept everything)
    thresholds = np.insert(y_score[order], 0, 0)

    tp = [np.sum(y_true)]   # true positives  (threshold = 0 => accept all => tp[0] = # of positives in y_true)
    fp = [np.sum(~y_true)]  # false positives (threshold = 0 => accept all => fp[0] = # of negatives in y_true)
    tn = [0]                # true negatives  (threshold = 0 => accept all => no negative predictions)
    fn = [0]                # false negatives (threshold = 0 => accept all => no negative predictions)

    for i in range(1, thresholds.size):
        # At each step we stop predicting sample i-1 as true and predict it as false.
        # If y_true[i-1] is True, this step is a mistake:
        tp.append(tp[-1] - int(y_true[i - 1]))
        fn.append(fn[-1] + int(y_true[i - 1]))
        # If y_true[i-1] is False, this step is an improvement:
        fp.append(fp[-1] - int(~y_true[i - 1]))
        tn.append(tn[-1] + int(~y_true[i - 1]))

    tp = np.asarray(tp, dtype=int)
    fp = np.asarray(fp, dtype=int)
    tn = np.asarray(tn, dtype=int)
    fn = np.asarray(fn, dtype=int)
    return (thresholds, tp, fp, tn, fn)
```
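The loop above updates each count by one sample per step, so the same counts fall out of cumulative sums. This is my own vectorized rewrite of the same idea (not the original answer's code), a sketch that follows the same predict-True-if-score-above-threshold convention:

```python
import numpy as np

def roc_counts_vectorized(y_true, y_score):
    # Same convention as above: predict True if y_score > threshold.
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(y_score)
    y_sorted = y_true[order]
    thresholds = np.insert(y_score[order], 0, 0)

    # Walking the thresholds from low to high flips one prediction to False
    # per step; cumulative sums give all four counts at once.
    flips_pos = np.insert(np.cumsum(y_sorted), 0, 0)   # positives flipped to False so far
    flips_neg = np.insert(np.cumsum(~y_sorted), 0, 0)  # negatives flipped to False so far
    tp = y_sorted.sum() - flips_pos
    fn = flips_pos
    fp = (~y_sorted).sum() - flips_neg
    tn = flips_neg
    return thresholds, tp, fp, tn, fn
```

For example, `roc_counts_vectorized([1, 0, 1], [0.9, 0.5, 0.3])` yields the counts at thresholds 0, 0.3, 0.5 and 0.9, matching the loop version step by step.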
The whole process is a single loop, and the algorithm is trivial. In fact, this stupidly simple function is 10 times faster than the solution proposed before me (computing the accuracies for thresholds = np.arange(0,1,0.1)), and 30 times faster than my previous smart-ass dichotomous algorithm...
From there you can compute any KPI you want, for example:
```python
def max_accuracy(thresholds, tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return max(accuracy)

def max_min_sensitivity_specificity(thresholds, tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return max(np.minimum(sensitivity, specificity))
```
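The question also asked for the threshold that attains the maximum, which the KPI functions above throw away. A small helper of my own (not part of the original answer), using the same five-array convention, recovers it; the tiny count arrays below are hand-made just to exercise it:

```python
import numpy as np

def best_accuracy_threshold(thresholds, tp, fp, tn, fn):
    # accuracy at every threshold, then the argmax
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    i = int(np.argmax(accuracy))
    return thresholds[i], accuracy[i]

# hand-made counts for 3 samples and 4 thresholds, illustrative only
thresholds = np.array([0.0, 0.3, 0.5, 0.9])
tp = np.array([2, 1, 1, 0])
fp = np.array([1, 1, 0, 0])
tn = np.array([0, 0, 1, 1])
fn = np.array([0, 1, 1, 2])
best_t, best_acc = best_accuracy_threshold(thresholds, tp, fp, tn, fn)
```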
If you want to test it:
```python
y_score = np.random.uniform(size=100)
y_true = [np.random.binomial(1, p) for p in y_score]
data = roc_curve_data(y_true, y_score)

# I personally use Jupyter; remove this magic otherwise
%matplotlib inline
import matplotlib.pyplot as plt
plt.step(data[0], data[1])
plt.step(data[0], data[2])
plt.step(data[0], data[3])
plt.step(data[0], data[4])
plt.show()

print("max accuracy is", max_accuracy(*data))
print("max of min(sensitivity, specificity) is", max_min_sensitivity_specificity(*data))
```
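An independent way to sanity-check the single-pass counts is a brute-force sweep with `sklearn.metrics.accuracy_score`: accuracy only changes when the threshold crosses one of the observed scores, so those scores (plus 0) are the only candidates. This O(n²) cross-check is my own addition, on synthetic data:

```python
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_score = rng.uniform(size=50)
y_true = rng.binomial(1, y_score)

# every observed score, plus 0, is a candidate threshold
cands = np.insert(np.sort(y_score), 0, 0)
accs = [accuracy_score(y_true, (y_score > t).astype(int)) for t in cands]
max_acc = max(accs)
best_t = cands[int(np.argmax(accs))]
```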
enjoy ;)