**Linear regression**
- Model: $f(x) = W^T \cdot x + b$
- Loss (least squares): $L(W, b) = (f(x) - y)^2$
- Optimization: gradient descent, Newton's method
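As a minimal sketch of the gradient-descent option above, the following fits the squared loss on noiseless synthetic data (the data, learning rate, and iteration count are illustrative assumptions, not from the notes):

```python
import numpy as np

# Illustrative toy data drawn from a known linear model (an assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = X @ true_w + true_b

# Gradient descent on the mean squared loss L(W, b) = mean((f(x) - y)^2)
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    err = X @ w + b - y            # f(x) - y
    w -= lr * 2 * X.T @ err / len(y)
    b -= lr * 2 * err.mean()
```

With noiseless data the iterates recover `true_w` and `true_b` to high precision.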
**LASSO regression**
- Model: $f(x) = W^T \cdot x + b$
- Loss (least squares + $L_1$ penalty): $L(W, b) = (f(x) - y)^2 + \lambda \|W\|_1$
- Optimization: coordinate descent
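A sketch of the coordinate-descent solver listed above, using the standard soft-thresholding update for the $\frac{1}{2n}\|y - Xw\|^2 + \lambda\|w\|_1$ scaling (the data, the $1/n$ scaling convention, and all hyperparameters are illustrative assumptions; the intercept is omitted for brevity):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of the L1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

# Illustrative data with a sparse ground truth (an assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ true_w

lam, n = 0.1, len(y)
w = np.zeros(5)
for _ in range(100):                     # coordinate-descent sweeps
    for j in range(5):
        # Partial residual with feature j removed
        r = y - X @ w + X[:, j] * w[j]
        rho = X[:, j] @ r / n
        z = X[:, j] @ X[:, j] / n
        w[j] = soft_threshold(rho, lam) / z
```

The $L_1$ penalty shrinks the truly-zero coefficients to (near) zero while keeping the active ones close to their true values.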
**Ridge regression**
- Model: $f(x) = W^T \cdot x + b$
- Loss (least squares + $L_2$ penalty): $L(W, b) = (f(x) - y)^2 + \frac{1}{2}\lambda \|W\|^2$
- Optimization: gradient descent
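Ridge also has a closed-form solution, so gradient descent can be checked against it. The sketch below assumes the objective $\|Xw - y\|^2 + \frac{1}{2}\lambda\|w\|^2$ with the bias omitted (data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

lam = 1.0
# Setting the gradient 2 X^T (Xw - y) + lam * w to zero gives:
#   w = (2 X^T X + lam I)^{-1} (2 X^T y)
w_closed = np.linalg.solve(2 * X.T @ X + lam * np.eye(3), 2 * X.T @ y)

# Gradient descent on the same objective
w, lr = np.zeros(3), 1e-3
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y) + lam * w
    w -= lr * grad
```

After enough iterations the two solutions agree to numerical precision.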
**Logistic regression**
- Model: $f(x) = \dfrac{1}{1 + e^{-(W^T \cdot x + b)}}$
- Loss (cross-entropy): $-\ln p(y \mid x) = -\dfrac{1}{m}\sum_{i=1}^m \big(y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i)\big)$, where $\hat{y} = \dfrac{1}{1 + e^{-(W^T \cdot x + b)}}$
- Optimization: gradient descent, Newton's method
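A sketch of gradient descent on the cross-entropy loss above; it uses the convenient fact that the gradient of the average cross-entropy with a sigmoid output is $\frac{1}{m}X^T(\hat{y} - y)$ (the separable toy data and learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linearly generated labels (an assumption)
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -3.0]) + 0.5 > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = sigmoid(X @ w + b)                 # y_hat
    w -= lr * X.T @ (p - y) / len(y)       # gradient of the mean cross-entropy
    b -= lr * (p - y).mean()

acc = ((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean()
```

On separable data the training accuracy approaches 1 (the weights themselves keep growing, which is why regularization is usually added in practice).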
**Perceptron**
- Model: $f(x) = \mathrm{sign}(W^T \cdot x + b)$, where $\mathrm{sign}$ is the sign function
- Loss (make misclassified points lie as close to the current separating hyperplane as possible): $L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$, where $M$ is the set of misclassified points
- Optimization: stochastic gradient descent: update the parameters once for each misclassified sample found
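The update rule above can be sketched directly: scan the data, and whenever $y_i(w \cdot x_i + b) \le 0$ apply $w \leftarrow w + \eta\, y_i x_i$, $b \leftarrow b + \eta\, y_i$ (the toy data, constructed with a guaranteed margin so convergence is fast, is an illustrative assumption):

```python
import numpy as np

# Illustrative separable data: classes pushed apart to guarantee a margin
rng = np.random.default_rng(4)
raw = rng.normal(size=(100, 2))
y = np.where(raw.sum(axis=1) > 0, 1, -1)
X = raw + 0.5 * y[:, None]

w, b, lr = np.zeros(2), 0.0, 1.0
for _ in range(1000):                      # cap on passes; converges long before
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:         # x_i in M: misclassified
            w += lr * yi * xi              # w <- w + eta * y_i * x_i
            b += lr * yi                   # b <- b + eta * y_i
            mistakes += 1
    if mistakes == 0:                      # no misclassified points left
        break
```

By the perceptron convergence theorem, the loop terminates on linearly separable data with every point correctly classified.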
**K-nearest neighbors**
- Model: $y = \arg\max_{c_j} \sum_{x_i \in N_K(x)} I(y_i = c_j)$, where $N_K(x)$ is the set of the $K$ nearest neighbors of $x$
- Loss: —
- Optimization: —
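The majority-vote rule above needs no training step; a minimal sketch with brute-force Euclidean distances (the toy points are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest neighbors of x (Euclidean distance)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]          # indices of N_K(x)
    votes = Counter(y_train[nearest])       # sum of I(y_i = c_j) per class c_j
    return votes.most_common(1)[0][0]       # argmax over classes

# Illustrative toy set: two clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([0.05, 0.0]), k=3)
```

For the query point near the first cluster, two of the three nearest neighbors carry label 0, so the vote returns 0.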
**Naive Bayes**
- Model: $P(Y = c_k \mid X = x) = \dfrac{P(Y = c_k)\prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}{\sum_k P(Y = c_k)\prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}$, under the conditional-independence assumption
- Decision rule (minimizing the expected risk, equivalent to maximizing the posterior): $y = f(x) = \arg\max_{c_k} P(Y = c_k)\prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)$
- Parameter estimation (maximum likelihood): $P(Y = c_k) = \dfrac{\sum_{i=1}^N I(y_i = c_k)}{N}$, $\quad P(X^{(j)} = a_{jl} \mid Y = c_k) = \dfrac{\sum_{i=1}^N I(x_i^{(j)} = a_{jl},\ y_i = c_k)}{\sum_{i=1}^N I(y_i = c_k)}$
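The two maximum-likelihood estimates and the argmax decision rule above are just counting and multiplying; a sketch on a tiny categorical dataset (the dataset itself is an illustrative assumption, and no Laplace smoothing is applied):

```python
from math import prod
from collections import Counter, defaultdict

# Illustrative categorical data: each row is (x^(1), x^(2)), labels y in {0, 1}
X = [("a", "s"), ("a", "m"), ("b", "m"), ("b", "s"), ("b", "m")]
y = [0, 0, 1, 1, 1]
N = len(y)

# P(Y = c_k) = count(y_i = c_k) / N
class_count = Counter(y)
prior = {c: cnt / N for c, cnt in class_count.items()}

# P(X^(j) = a | Y = c_k) = count(x_i^(j) = a, y_i = c_k) / count(y_i = c_k)
cond = defaultdict(float)
for xi, yi in zip(X, y):
    for j, a in enumerate(xi):
        cond[(j, a, yi)] += 1
cond = {k: v / class_count[k[2]] for k, v in cond.items()}

def predict(x):
    # y = argmax_{c_k} P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k)
    scores = {c: prior[c] * prod(cond.get((j, a, c), 0.0)
                                 for j, a in enumerate(x))
              for c in prior}
    return max(scores, key=scores.get)
```

Here `predict(("a", "s"))` returns 0 because class 1 never saw feature value `"a"`, so its product of conditionals vanishes.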
**SVM – linearly separable (hard margin)**
- Model: $f(x) = \mathrm{sign}(W^T \cdot x + b)$
- Objective (margin maximization / hinge loss): $\min \dfrac{1}{2}\|w\|^2$ subject to $y_i(W^T \cdot x_i + b) - 1 \ge 0$
- Optimization: Lagrangian duality, SMO
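The constraints and objective can be checked by hand on a classic three-point textbook example, for which the maximum-margin solution is $w = (\frac{1}{2}, \frac{1}{2})$, $b = -2$ (the specific points and solution are that standard example, assumed here for illustration):

```python
import numpy as np

# Classic toy set: two positive points, one negative
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

# Known maximum-margin solution for this set
w, b = np.array([0.5, 0.5]), -2.0

# Hard-margin constraints: y_i (W^T x_i + b) - 1 >= 0 for every i
slack = y * (X @ w + b) - 1

# Geometric margin of the separating hyperplane: 1 / ||w||
margin = 1.0 / np.linalg.norm(w)
```

All three constraint values are nonnegative (zero exactly at the support vectors $(3,3)$ and $(1,1)$), and the margin is $\sqrt{2}$.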
**SVM – approximately linearly separable (soft margin)**
- Model: $f(x) = \mathrm{sign}(W^T \cdot x + b)$
- Objective (margin maximization / hinge loss): $\min \dfrac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i$ subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$
- Optimization: Lagrangian duality, SMO
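The soft-margin problem is equivalent to unconstrained minimization of a hinge loss plus an $L_2$ penalty, which admits a simple subgradient-descent sketch; this is an alternative to the SMO route listed above, and the scaled objective $\frac{\lambda}{2}\|w\|^2 + \frac{1}{N}\sum_i \max(0,\, 1 - y_i(w \cdot x_i + b))$, the data, and all hyperparameters are illustrative assumptions:

```python
import numpy as np

# Illustrative data: separable labels, classes nudged apart for a clear margin
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)
X = X + 0.3 * y[:, None] * np.array([1.0, -1.0])

lam, lr = 0.01, 0.1
w, b = np.zeros(2), 0.0
for _ in range(5000):
    margins = y * (X @ w + b)
    viol = margins < 1                     # points with xi_i > 0 (margin violators)
    # Subgradient of (lam/2)||w||^2 + mean hinge loss
    gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / len(y)
    gb = -y[viol].sum() / len(y)
    w -= lr * gw
    b -= lr * gb

acc = (np.sign(X @ w + b) == y).mean()
```

On this (actually separable) toy set the learned hyperplane classifies essentially all points correctly; with genuinely noisy data the slack variables $\xi_i$ absorb the violations instead.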
**SVM – nonlinear (kernel, dual form)**
- Model: $f(x) = \mathrm{sign}(W^T \cdot x + b)$
- Objective (margin maximization / hinge loss, dual): $\min_\alpha \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^N \alpha_i$ subject to $\sum_{i=1}^N \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$
- Optimization: Lagrangian duality, SMO
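To make the dual form concrete, the following evaluates the dual objective, checks both constraints, and recovers $(w, b)$ from a known dual solution on the same classic three-point example used for the hard-margin case (the $\alpha$ values are that standard textbook solution, assumed here; a full SMO solver is beyond this sketch):

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

def kernel(a, b):
    return a @ b                            # linear kernel K(x_i, x_j)

K = np.array([[kernel(xi, xj) for xj in X] for xi in X])

# Known dual solution for this toy problem: support vectors are points 0 and 2
alpha = np.array([0.25, 0.0, 0.25])

# Dual objective: (1/2) sum_ij a_i a_j y_i y_j K(x_i, x_j) - sum_i a_i
obj = 0.5 * (alpha * y) @ K @ (alpha * y) - alpha.sum()

# Constraints: sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C
C = 10.0
feasible = abs(alpha @ y) < 1e-12 and np.all((0 <= alpha) & (alpha <= C))

# Recover the primal solution: w = sum_i alpha_i y_i x_i,
# b = y_s - sum_i alpha_i y_i K(x_i, x_s) for any support vector s
w = (alpha * y) @ X
sv = 0                                      # index of a support vector (alpha > 0)
b = y[sv] - (alpha * y) @ K[:, sv]
```

The recovered $(w, b) = ((\frac{1}{2}, \frac{1}{2}), -2)$ matches the hard-margin primal solution, as complementary slackness requires: the point with $\alpha_i = 0$ sits strictly outside the margin, while the two support vectors lie exactly on it.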