
Logistic Regression

Sigmoid function

$$y = \frac{1}{1 + e^{-x}}$$
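A minimal NumPy sketch of the sigmoid (the function name here is my own):

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^{-x}); np.exp broadcasts, so this works on scalars and arrays
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5: the sigmoid crosses the midpoint at x = 0
```

Large positive inputs map toward 1 and large negative inputs toward 0, which is what lets the output be read as a probability.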

Apply the sigmoid function to the output of a linear regression.

$$\begin{aligned}
Y_{\beta}(x) & = \frac{1}{1 + e^{-\beta^T x+\epsilon}}\\
\\
\Pr(Y_{i}=y\mid \mathbf{X}_{i}) & = {p_{i}}^{y}(1-p_{i})^{1-y}\\
& = \left({\frac{e^{\mathbf{\beta}\cdot \mathbf{X}_{i}}}{1+e^{\mathbf{\beta} \cdot \mathbf{X}_{i}}}}\right)^{y}\left(1-{\frac {e^{\mathbf{\beta}\cdot \mathbf{X}_{i}}}{1+e^{\mathbf{\beta}\cdot \mathbf{X}_{i}}}}\right)^{1-y}\\
& = {\frac {e^{\mathbf{\beta}\cdot \mathbf{X}_{i}\cdot y}}{1+e^{\mathbf{\beta}\cdot \mathbf{X}_{i}}}}
\end{aligned}$$

This can be converted into an odds ratio:

$$\frac{P(x)}{1-P(x)} = e^{\beta^T x}$$

and

$$\ln{\frac{P(x)}{1-P(x)}} = \beta^T x$$
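The log-odds (logit) is the inverse of the sigmoid, which a quick numerical check confirms (function names here are my own):

```python
import numpy as np

def log_odds(p):
    # ln(p / (1 - p)), the logit function
    return np.log(p / (1.0 - p))

z = 1.7
p = 1.0 / (1.0 + np.exp(-z))  # sigmoid(z)
print(log_odds(p))            # recovers z = 1.7
```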
```python
from sklearn.linear_model import LogisticRegression

LR = LogisticRegression(penalty='l2', C=1e5)  # large C means weak regularization
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)
LR.coef_  # fitted coefficients (beta)
```
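Besides hard labels, `predict_proba` returns class probabilities, which is useful when the default 0.5 decision threshold needs adjusting. A self-contained sketch on made-up 1-D data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data for illustration: class flips from 0 to 1 around x = 0
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[3.0]])[:, 1]  # P(y=1 | x=3); should exceed 0.5 here
print(proba)
```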

Confusion Matrix

|              | Predicted True | Predicted False |
| ------------ | -------------- | --------------- |
| Actual True  | TP             | FN              |
| Actual False | FP             | TN              |
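Note that `sklearn.metrics.confusion_matrix` also uses rows for actual and columns for predicted, but sorts labels in ascending order, so for 0/1 labels the negative class comes first and the matrix reads `[[TN, FP], [FN, TP]]`:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
# rows = actual, columns = predicted, labels sorted ascending (0, 1):
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[1 1], [1 2]]
```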

FP is called a type I error, and FN is called a type II error.

Accuracy is the ratio of correct predictions to all predictions.

$$= \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity (recall) is how correctly the positive class is predicted: the percentage of actual positives that are captured.

$$= \frac{TP}{TP + FN} = \frac{TP}{\text{Actual True}}$$

Precision is, out of all positive predictions, the fraction that are correct. There is a trade-off between recall and precision.

$$= \frac{TP}{TP + FP} = \frac{TP}{\text{Predicted True}}$$

Specificity is how correctly the negative class is predicted; it is the recall for class 0.

$$= \frac{TN}{TN + FP} = \frac{TN}{\text{Actual False}}$$

F1 is the harmonic mean of precision and recall; it captures the trade-off between them in a single number.

$$= 2\times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
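As a worked example, the four metrics computed from made-up confusion-matrix counts:

```python
# made-up counts for illustration
TP, FN, FP, TN = 40, 10, 5, 45

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # 85/100 = 0.85
recall    = TP / (TP + FN)                   # 40/50  = 0.8
precision = TP / (TP + FP)                   # 40/45  ~ 0.889
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, recall, precision, f1)
```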

Classification Error Metrics

ROC

The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR, sensitivity) against the False Positive Rate (FPR, 1 − specificity) at varying decision thresholds.
It works better for data with balanced classes.


Precision-Recall Curves

Shows the trade-off between precision and recall at varying thresholds.
Better for data with imbalanced classes.


Multiple Class Error Metrics

Accuracy $= \frac{TP_1+TP_2+TP_3}{\text{Total}}$

Code

```python
from sklearn.metrics import accuracy_score
accuracy_value = accuracy_score(y_test, y_predict)

from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve
```
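The per-class scorers all share the same `(y_true, y_pred)` signature. A self-contained example with hypothetical predictions:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# hypothetical labels and predictions for illustration
y_test    = [0, 1, 1, 0, 1, 1]
y_predict = [0, 1, 0, 0, 1, 1]

print(precision_score(y_test, y_predict))  # 1.0: no false positives
print(recall_score(y_test, y_predict))     # 0.75: one positive missed
print(f1_score(y_test, y_predict))         # harmonic mean of the two
```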
