# Week 5

## Classification
Let $(x_1, y_1), \dots, (x_n, y_n)$ be training data, where each $x_i \in \mathbb{R}^p$ is a feature vector and each $y_i \in \{0, 1\}$ is a class label.

We could try to just do linear regression: under squared error loss, the optimal prediction is the conditional mean $\mathbb{E}[Y \mid X = x] = \mathbb{P}(Y = 1 \mid X = x)$, so a linear fit can be read as an estimate of this probability. The problem is that the fitted values $\hat\beta_0 + \hat\beta^T x$ are not constrained to $[0, 1]$, so in general they cannot be interpreted as probabilities.
## Logistic Regression
Instead of modeling $p(x) = \mathbb{P}(Y = 1 \mid X = x)$ directly as a linear function of $x$, we model

$$p(x) = \frac{e^{\beta_0 + \beta^T x}}{1 + e^{\beta_0 + \beta^T x}},$$

where $\beta_0 \in \mathbb{R}$ and $\beta \in \mathbb{R}^p$ are unknown parameters. This is called the logistic regression model.

For any choice of $\beta_0$, $\beta$, and $x$, the right-hand side lies strictly between 0 and 1, so $p(x)$ is always a valid probability.
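As a quick sketch of this model in code (assuming NumPy; the coefficient values below are made-up illustration values, not fitted ones):

```python
import numpy as np

def logistic_p(x, beta0, beta):
    """P(Y = 1 | X = x) under the logistic regression model."""
    eta = beta0 + beta @ x             # linear predictor beta0 + beta^T x
    return 1.0 / (1.0 + np.exp(-eta))  # equivalent to e^eta / (1 + e^eta)

# Hypothetical coefficients, just to show the output always lies in (0, 1).
beta0, beta = -1.0, np.array([2.0, -0.5])
print(logistic_p(np.array([0.3, 1.2]), beta0, beta))
```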
### Motivation and Interpretation
On one hand, the ratio $\frac{p(x)}{1 - p(x)}$ is called the odds of the event $\{Y = 1\}$. It takes values in $(0, \infty)$: odds near $0$ mean the event is very unlikely, and large odds mean it is very likely. If $p(x) = 0.5$, the odds equal $1$.

The log odds (or logit) of the probability is the logarithm of the odds, i.e.

$$\operatorname{logit}(p(x)) = \log \frac{p(x)}{1 - p(x)}.$$

Hence, in logistic regression, we make the assumption that the log odds is a linear function of $x$:

$$\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta^T x.$$
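To see that this assumption is equivalent to the formula for $p(x)$ given above, solve the log-odds equation for $p(x)$ (a one-line check using only the definitions already introduced):

$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta^T x} \quad\Longrightarrow\quad p(x) = \frac{e^{\beta_0 + \beta^T x}}{1 + e^{\beta_0 + \beta^T x}}.$$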
Another, more insightful motivation connects the logistic function to the exponential family from the perspective of generative models: if the class-conditional distributions $X \mid Y = 0$ and $X \mid Y = 1$ belong to the same exponential family (for example, Gaussians with a common covariance), then by Bayes' rule the posterior probability $\mathbb{P}(Y = 1 \mid X = x)$ has exactly the logistic form above.
### Procedure of Logistic Regression
How to use logistic regression for prediction:

- Get training data $(x_1, y_1), \dots, (x_n, y_n)$.
- Use the training data to select $\hat\beta_0$ and $\hat\beta$, i.e. fit our model to the training data.
- Get a new $x$ whose class we want to predict: compute the estimated probability $\hat p(x) = \frac{e^{\hat\beta_0 + \hat\beta^T x}}{1 + e^{\hat\beta_0 + \hat\beta^T x}}$.
  - If $\hat p(x) > 0.5$, predict that $x$ is in class $1$.
  - If $\hat p(x) \le 0.5$, predict that $x$ is in class $0$.
This procedure is designed to minimize the misclassification error. However, some errors are "worse" than others. For example, for a spam email filter, misclassifying spam as a normal email is unlikely to cause a big issue, but the opposite mistake, sending a normal email to the spam folder, may be terrible. To account for this we can raise the 0.5 threshold to a larger number (such as 0.95) when predicting an email to be spam, as sketched below.
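A minimal sketch of this whole procedure, assuming scikit-learn and a synthetic dataset (the 0.95 threshold is just the example value from above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: two features, labels from a noisy linear score.
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + rng.normal(size=200) > 0).astype(int)

# Fit the model (computes beta0_hat and beta_hat by maximum likelihood).
model = LogisticRegression().fit(X, y)

# Predict a new point with the default 0.5 threshold...
x_new = np.array([[0.5, -0.3]])
p_hat = model.predict_proba(x_new)[0, 1]  # estimated P(Y = 1 | X = x_new)
print(p_hat, int(p_hat > 0.5))

# ...or with a stricter threshold when false positives are costly.
print(int(p_hat > 0.95))
```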
## Multinomial Regression
Let $Y$ take values in $\{1, \dots, K\}$, i.e. there are $K$ possible classes. For $k = 1, \dots, K - 1$, the multinomial regression model assumes

$$\mathbb{P}(Y = k \mid X = x) = \frac{e^{\beta_{0k} + \beta_k^T x}}{1 + \sum_{l=1}^{K-1} e^{\beta_{0l} + \beta_l^T x}}, \qquad \mathbb{P}(Y = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} e^{\beta_{0l} + \beta_l^T x}},$$

where we regard the $K$-th class as the baseline class, whose coefficients are fixed at zero. Note that these $K$ probabilities are nonnegative and sum to $1$.
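A small numerical sketch of these class probabilities (assuming NumPy; the coefficients are made-up illustration values, with the $K$-th class as baseline):

```python
import numpy as np

def multinomial_probs(x, beta0, betas):
    """Class probabilities P(Y = k | X = x) for k = 1, ..., K.

    beta0: intercepts, shape (K-1,); betas: coefficients, shape (K-1, p).
    The K-th (baseline) class implicitly has linear score 0.
    """
    eta = np.append(beta0 + betas @ x, 0.0)  # linear scores, baseline last
    expeta = np.exp(eta)
    return expeta / expeta.sum()             # nonnegative, sums to 1

# Hypothetical K = 3, p = 2 example.
probs = multinomial_probs(np.array([1.0, -0.5]),
                          beta0=np.array([0.2, -0.1]),
                          betas=np.array([[1.0, 0.5], [-0.3, 0.8]]))
print(probs, probs.sum())  # three probabilities summing to 1
```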
## Calculating the Estimator
We use maximum likelihood to estimate $\beta_0$ and $\beta$. Assume the training pairs $(x_i, y_i)$, $i = 1, \dots, n$, are independent, so that the likelihood of the observed labels is

$$L(\beta_0, \beta) = \prod_{i=1}^n p(x_i)^{y_i} \bigl(1 - p(x_i)\bigr)^{1 - y_i}.$$

Then the MLE is defined as

$$(\hat\beta_0, \hat\beta) = \arg\max_{\beta_0, \beta} \log L(\beta_0, \beta) = \arg\max_{\beta_0, \beta} \sum_{i=1}^n \Bigl[ y_i (\beta_0 + \beta^T x_i) - \log\bigl(1 + e^{\beta_0 + \beta^T x_i}\bigr) \Bigr].$$
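As a short code sketch (assuming NumPy), here is this objective written as a negative log-likelihood, the form a numerical optimizer would minimize; `np.logaddexp` computes $\log(1 + e^{\eta})$ stably:

```python
import numpy as np

def neg_log_likelihood(beta0, beta, X, y):
    """-log L(beta0, beta) = -sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]."""
    eta = beta0 + X @ beta  # eta_i = beta0 + beta^T x_i, one per row of X
    return -np.sum(y * eta - np.logaddexp(0.0, eta))
```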
Unfortunately, unlike linear regression, the MLE has no closed-form solution, so we must compute it numerically, for example with gradient descent (GD) or the Newton-Raphson method (NR).
Brief comparison of GD and NR:

- NR usually takes more intelligent steps than GD, since it uses second-order (curvature) information.
- NR requires no tuning, while in GD we have to tune the step size.
- Problem with NR: we have to invert the Hessian matrix at each step. Inverting such a $p \times p$ matrix has complexity $O(p^3)$. For GD we only have to compute the gradient, which has complexity $O(p)$.
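A compact sketch of both optimizers on the logistic regression objective (assuming NumPy; the intercept is folded into $\beta$ by appending a constant-1 column, and the step size and iteration counts are arbitrary illustration choices):

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fit_gd(X, y, step=0.1, iters=5000):
    """Gradient descent on the negative log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ beta) - y)  # gradient of the NLL
        beta -= step * grad / len(y)
    return beta

def fit_nr(X, y, iters=10):
    """Newton-Raphson: solve a p x p linear system at each step."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Hessian X^T W X
        beta -= np.linalg.solve(hess, grad)        # Newton step
    return beta

# Tiny synthetic check: both methods should agree to a few decimals,
# with NR needing far fewer (but more expensive) iterations.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = (rng.random(300) < sigmoid(X @ np.array([-0.5, 2.0, -1.0]))).astype(float)
print(fit_gd(X, y), fit_nr(X, y))
```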