ML-Cheat-Sheet

Basic Rules of Differentiation

Basic Rules

  • Constant Rule: $\frac{d}{dx}\,c = 0$

  • Power Rule: $\frac{d}{dx}\,x^n = n\,x^{n-1}$

  • Linear Combination: $\frac{d}{dx}\,\bigl[a\,f(x) + b\,g(x)\bigr] = a\,f'(x) + b\,g'(x)$

  • Product Rule: $\frac{d}{dx}\,\bigl[f(x)\,g(x)\bigr] = f'(x)\,g(x) + f(x)\,g'(x)$

  • Quotient Rule: $\frac{d}{dx}\,\frac{f(x)}{g(x)} = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}$

  • Chain Rule: $\frac{d}{dx}\,f(g(x)) = f'(g(x))\,g'(x)$

  • Exponential: $\frac{d}{dx}\,e^x = e^x$ | $\frac{d}{dx}\,a^x = a^x \ln a$

  • Logarithmic: $\frac{d}{dx}\,\ln x = \frac{1}{x}$ | $\frac{d}{dx}\,\log_a x = \frac{1}{x \ln a}$
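
A worked example, added here for illustration: applying the chain rule to the sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$, a result reused in the Logistic Regression section below.

$$\frac{d}{dx}\,\sigma(x) = \frac{d}{dx}\,\left(1 + e^{-x}\right)^{-1} = -\left(1 + e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$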

Linear Regression

1. Hypothesis

$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$

2. Cost Function

Mean Squared Error (MSE), with the conventional $\frac{1}{2}$ factor to keep the gradient clean:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

3. Optimization

  • Gradient Descent: $\theta_j := \theta_j - \alpha\,\frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$ (repeat until convergence)
  • Normal Equation: $\theta = \left( X^T X \right)^{-1} X^T y$
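
A minimal sketch of both optimizers, added for illustration (function names and the toy data are my own, not from the sheet); X is assumed to carry a leading column of ones for the intercept:

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y, computed via a least-squares solve for stability
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of (1/2m) * sum of squared errors
        theta -= alpha * grad
    return theta

# Toy usage: recover y = 1 + 2x
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1 + 2 * np.arange(5.0)
print(normal_equation(X, y))   # ~ [1. 2.]
print(gradient_descent(X, y))  # ~ [1. 2.]
```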

Logistic Regression

1. Hypothesis

$h_\theta(x) = \sigma\!\left(\theta^T x\right) = \frac{1}{1 + e^{-\theta^T x}}$

  • Prediction Rule:
    • Predict $y = 1$ if $h_\theta(x) \ge 0.5$, otherwise predict $y = 0$.

2. Cost Function

Log Loss: $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]$

3. Optimization

  • Gradient Descent: $\theta_j := \theta_j - \alpha\,\frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$ (same update form as linear regression, but with the sigmoid hypothesis)
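
A minimal sketch of this update, added for illustration (function names are my own); X is again assumed to include an intercept column of ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, iters=2000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # gradient of the log loss
        theta -= alpha * grad
    return theta

def predict(X, theta):
    # prediction rule: y = 1 if h(x) >= 0.5, else 0
    return (sigmoid(X @ theta) >= 0.5).astype(int)
```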

4. Sigmoid Properties

  • Output: $\sigma(z) \in (0, 1)$
  • Derivative: $\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$

Ridge Regression

Loss Function

Adds an $L_2$ penalty to the squared-error loss to prevent overfitting:

$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$

  • $\lambda$: Regularization parameter. Higher values shrink the weights $\theta$ toward zero.

Optimization

  • Closed-form Solution: $\theta = \left( X^T X + \lambda I \right)^{-1} X^T y$
  • Gradient Descent: $\theta_j := \theta_j \left( 1 - \alpha\,\frac{\lambda}{m} \right) - \alpha\,\frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$
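
A minimal sketch of the closed-form solution, added for illustration (the function name is my own); in practice the intercept column is usually left out of the penalty:

```python
import numpy as np

def ridge_closed_form(X, y, lam=1.0):
    # theta = (X^T X + lambda * I)^{-1} X^T y, via a linear solve rather than an explicit inverse
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```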

Bayesian Classification

Dataset

  • Training set $\left\{ \left( x^{(i)}, y^{(i)} \right) \right\}_{i=1}^{m}$, with feature vectors $x = (x_1, \dots, x_n)$ and class labels $y \in \{c_1, \dots, c_K\}$

Posterior Probability

The probability of class $c_k$ given input $x$, by Bayes' rule:

$P(c_k \mid x) = \frac{P(x \mid c_k)\,P(c_k)}{P(x)} \;\propto\; P(x \mid c_k)\,P(c_k)$

If features are conditionally independent given the class (the naive Bayes assumption):

$P(c_k \mid x) \;\propto\; P(c_k) \prod_{j=1}^{n} P(x_j \mid c_k)$
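
One common instantiation is Gaussian naive Bayes, where each per-class likelihood $P(x_j \mid c_k)$ is a univariate Gaussian. A minimal sketch, added for illustration (function names are my own assumptions, not from the sheet):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    # per class: prior P(c_k), per-feature mean, per-feature variance
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(x, params):
    # argmax_c  log P(c) + sum_j log N(x_j; mu_{c,j}, var_{c,j})
    def score(c):
        prior, mu, var = params[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + log_lik
    return max(params, key=score)
```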

SVM

Hard SVM
Hyperplane: $w^T x + b = 0$
Constraint: $y_i \left( w^T x_i + b \right) \ge 1$ for all $i$
Goal: $\min_{w,\,b}\; \frac{1}{2}\|w\|^2$ s.t. $y_i \left( w^T x_i + b \right) \ge 1$
Lagrangian: $L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i \left( w^T x_i + b \right) - 1 \right]$, $\quad \alpha_i \ge 0$
Partial derivatives: $\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0$, $\quad \frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0$
Solution: $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$
Lagrangian becomes (dual): $\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^T x_j$ s.t. $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$
Weight vector: $w = \sum_i \alpha_i y_i x_i$
Bias: $b = y_s - w^T x_s$ for any support vector $x_s$ (any $i$ with $\alpha_i > 0$)

Soft SVM
Hyperplane: $w^T x + b = 0$
Constraint: $y_i \left( w^T x_i + b \right) \ge 1 - \xi_i$, $\quad \xi_i \ge 0$
Goal: $\min_{w,\,b,\,\xi}\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ s.t. the constraints above
Lagrangian: $L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i \left( w^T x_i + b \right) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i$
Partial Derivatives: $\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0$, $\quad \frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0$, $\quad \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0$
Solution: $w = \sum_i \alpha_i y_i x_i$, $\quad \sum_i \alpha_i y_i = 0$, $\quad \alpha_i = C - \mu_i$
Dual Problem: $\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^T x_j$
s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$
Weight vector: $w = \sum_i \alpha_i y_i x_i$
Bias: $b = y_s - w^T x_s$ for any support vector $x_s$ with $0 < \alpha_s < C$
The reason that $\xi$ disappears: the slack variables vanish from the dual problem because they are handled implicitly through the Lagrange multipliers $\alpha_i$ and $\mu_i$.
Taking the derivative of the Lagrangian with respect to $\xi_i$ gives $C - \alpha_i - \mu_i = 0$; since $\mu_i \ge 0$, this bounds $\alpha_i$ by $C$, and the $\xi_i$ terms cancel out of the objective.
Consequently, the slack variables do not explicitly appear in the dual formulation. Instead, the dual problem balances maximizing the margin against allowing misclassification through the box constraint $0 \le \alpha_i \le C$.

Kernel SVM
Hyperplane: $w^T \phi(x) + b = 0$, where $\phi$ maps inputs into a (possibly high-dimensional) feature space
Constraint: $y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i$, $\quad \xi_i \ge 0$
Goal: $\min_{w,\,b,\,\xi}\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
Lagrangian (Dual): $\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)$
s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$
Weight vector: $w = \sum_i \alpha_i y_i\, \phi(x_i)$ (never formed explicitly; only kernel evaluations are needed)
Decision Function: $f(x) = \operatorname{sign}\!\left( \sum_i \alpha_i y_i\, K(x_i, x) + b \right)$
Bias: $b = y_s - \sum_i \alpha_i y_i\, K(x_i, x_s)$ for any support vector $x_s$ with $0 < \alpha_s < C$
Kernel Functions:
Linear: $K(x, z) = x^T z$
Polynomial: $K(x, z) = \left( x^T z + c \right)^d$
Gaussian (RBF): $K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)$
Sigmoid: $K(x, z) = \tanh\!\left( \kappa\, x^T z + c \right)$
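
A minimal sketch of the decision function, added for illustration (names are my own); it assumes the dual multipliers alphas, labels y_sv, support vectors X_sv, and bias b have already been obtained by solving the dual problem with some QP solver:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def decision_function(x, alphas, y_sv, X_sv, b, kernel=rbf_kernel):
    # f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, y_sv, X_sv))
    return np.sign(s + b)
```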

MLE and MAP

MLE

Construct the likelihood function: the joint distribution $L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$
Take the logarithm to simplify the computation: $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$
Differentiate, set the derivative to 0, and solve for $\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, \ell(\theta)$
Verify the extremum: confirm it is a maximum, e.g. via the second derivative.
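
A short worked example, added for illustration: MLE of a Bernoulli parameter $p$ from $n$ independent flips with $k$ heads.

$$L(p) = p^{k}(1 - p)^{n - k}, \qquad \ell(p) = k \log p + (n - k) \log(1 - p)$$

$$\frac{d\ell}{dp} = \frac{k}{p} - \frac{n - k}{1 - p} = 0 \;\Longrightarrow\; \hat{p}_{\mathrm{MLE}} = \frac{k}{n}$$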

MAP

Combine the prior to construct the posterior probability: $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$
Take the log-posterior: $\log p(\theta \mid x) = \log p(x \mid \theta) + \log p(\theta) + \text{const}$
Differentiate, set the derivative to 0, and solve for $\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \left[ \log p(x \mid \theta) + \log p(\theta) \right]$
Verify the extremum: make sure the maximum has been found.
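
Continuing the same illustrative example with a $\mathrm{Beta}(a, b)$ prior on $p$:

$$\log p(p \mid \text{data}) = (k + a - 1)\log p + (n - k + b - 1)\log(1 - p) + \text{const}$$

$$\frac{k + a - 1}{p} - \frac{n - k + b - 1}{1 - p} = 0 \;\Longrightarrow\; \hat{p}_{\mathrm{MAP}} = \frac{k + a - 1}{n + a + b - 2}$$

With a uniform prior ($a = b = 1$), the MAP estimate reduces to the MLE.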