### Discover more content...

Enter some keywords in the search box above, we will do our best to offer you relevant results.

### We're sorry!

We couldn't find any results for your search. Please try again with another keywords.

## 一. Classification and Representation

$g(x) = \frac{1}{1+e^{-x}}$

$g(x) = \frac{1}{1+e^{-\theta^{T}x}}$

## 二. Logistic Regression Model

### 1. Cost Function

$\rm{CostFunction} = \rm{F}({\theta}) = \frac{1}{m}\sum_{i = 1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^2$

$\rm{CostFunction} = \rm{F}({\theta}) = \frac{1}{m}\sum_{i = 1}^{m} \rm{Cost}(h_{\theta}(x^{(i)}),y^{(i)})$

$\rm{Cost}(h_{\theta}(x^{(i)}),y^{(i)}) = \left{\begin{matrix} -log(h_{\theta}(x)) &if ; y = 1 \ -log(1-h_{\theta}(x)) & if; y = 0 \end{matrix}\right.$

### 2. Simplified Cost Function and Gradient Descent

$\rm{Cost}(h_{\theta}(x),y) = - ylog(h_{\theta}(x)) - (1-y)log(1-h_{\theta}(x))$

\begin{align*} \rm{CostFunction} = \rm{F}({\theta}) &= \frac{1}{m}\sum_{i = 1}^{m} \rm{Cost}(h_{\theta}(x^{(i)}),y^{(i)})\ &= -\frac{1}{m}\left [ \sum_{i=1}^{m} y^{(i)}logh_{\theta}(x^{(i)}) + (1-y^{(i)})log(1-h_{\theta}(x^{(i)})) \right ] \ \left( h_{\theta}(x) = \frac{1}{1+e^{-\theta^{T}x}} \right ) \end{align*}

\begin{align*} h &= g(X\theta)\ \rm{CostFunction} = \rm{F}({\theta}) &= \frac{1}{m} \left ( -\overrightarrow{y}^{T}log(h) - (1-\overrightarrow{y})^{T}log(1-h) \right ) \ \end{align*}

$\theta_{j} := \theta_{j} - \alpha \frac{1}{m} \sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_{j}$

$\theta := \theta - \alpha \frac{1}{m} X^{T}(g(X\Theta)-\vec{y})$

### 3. 求导过程

\begin{align*} \sigma(x)'&=\left(\frac{1}{1+e^{-x}}\right)'=\frac{-(1+e^{-x})'}{(1+e^{-x})^2}=\frac{-1'-(e^{-x})'}{(1+e^{-x})^2}=\frac{0-(-x)'(e^{-x})}{(1+e^{-x})^2}=\frac{-(-1)(e^{-x})}{(1+e^{-x})^2}=\frac{e^{-x}}{(1+e^{-x})^2} \newline &=\left(\frac{1}{1+e^{-x}}\right)\left(\frac{e^{-x}}{1+e^{-x}}\right)=\sigma(x)\left(\frac{+1-1 + e^{-x}}{1+e^{-x}}\right)=\sigma(x)\left(\frac{1 + e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right)\ &=\sigma(x)(1 - \sigma(x))\ \end{align*}

\begin{align*} \frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{-1}{m}\sum_{i=1}^m \left [ y^{(i)} log (h_\theta(x^{(i)})) + (1-y^{(i)}) log (1 - h_\theta(x^{(i)})) \right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} \frac{\partial}{\partial \theta_j} log (h_\theta(x^{(i)})) + (1-y^{(i)}) \frac{\partial}{\partial \theta_j} log (1 - h_\theta(x^{(i)}))\right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})}{h_\theta(x^{(i)})} + \frac{(1-y^{(i)})\frac{\partial}{\partial \theta_j} (1 - h_\theta(x^{(i)}))}{1 - h_\theta(x^{(i)})}\right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \frac{\partial}{\partial \theta_j} \sigma(\theta^T x^{(i)})}{h_\theta(x^{(i)})} + \frac{(1-y^{(i)})\frac{\partial}{\partial \theta_j} (1 - \sigma(\theta^T x^{(i)}))}{1 - h_\theta(x^{(i)})}\right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \sigma(\theta^T x^{(i)}) (1 - \sigma(\theta^T x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{h_\theta(x^{(i)})} + \frac{- (1-y^{(i)}) \sigma(\theta^T x^{(i)}) (1 - \sigma(\theta^T x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{1 - h_\theta(x^{(i)})}\right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{h_\theta(x^{(i)})} - \frac{(1-y^{(i)}) h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{1 - h_\theta(x^{(i)})}\right ] \newline&= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} (1 - h_\theta(x^{(i)})) x^{(i)}j - (1-y^{(i)}) h\theta(x^{(i)}) x^{(i)}j\right ] \newline&= - \frac{1}{m}\sum{i=1}^m \left [ y^{(i)} (1 - h_\theta(x^{(i)})) - (1-y^{(i)}) h_\theta(x^{(i)}) \right ] x^{(i)}j \newline&= - \frac{1}{m}\sum{i=1}^m \left [ y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right ] x^{(i)}j \newline&= - \frac{1}{m}\sum{i=1}^m \left [ y^{(i)} - h_\theta(x^{(i)}) \right ] x^{(i)}j \newline&= \frac{1}{m}\sum{i=1}^m \left [ h_\theta(x^{(i)}) - y^{(i)} \right ] x^{(i)}_j \end{align*}

$\nabla J(\theta) = \frac{1}{m} \cdot X^T \cdot \left(g\left(X\cdot\theta\right) - \vec{y}\right)$

BFGS，
L_BFGS，

1. 不需要手动选择学习率 $\alpha$ 。可以理解为它们有一个智能的内循环(线搜索算法)，它会自动尝试不同的学习速率 $\alpha$，并自动选择一个最好的学习速率 $\alpha$ 。甚至还可以为每次迭代选择不同的学习速率，那么就不需要自己选择了。
2. 收敛速度远远快于梯度下降。



jVal = (theta(1)-5)^2 + (theta(2)-5)^2;




options = optimset('GrabObj','on','MaxIter','100');
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);




optTheta =

5.0000
5.0000

functionVal = 1.5777e-030
exitFlag = 1



optTheta 表示的是最终求得的结果，functionVal 表示的是代价函数的最小值，这里是 0，是我们期望的。exitFlag 表示的是最终是否收敛，1表示收敛。


x =fminunc(fun,x0)                                   %试图从x0附近开始找到函数的局部最小值，x0可以是标量，向量或矩阵
x =fminunc(fun,x0,options)                           %根据结构体options中的设置来找到最小值，可用optimset来设置options
x =fminunc(problem)                                  %为problem找到最小值,而problem是在Input Arguments中定义的结构体

[x,fval]= fminunc(...)                               %返回目标函数fun在解x处的函数值
[x,fval,exitflag]= fminunc(...)                      %返回一个描述退出条件的值exitflag
[x,fval,exitflag,output]= fminunc(...)               %返回一个叫output的结构体，它包含着优化的信息



## 四. Logistic Regression 测试

### 1. Question 1

Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction hθ(x) = 0.7. This means (check all that apply):

A. Our estimate for P(y=1|x;θ) is 0.7.
B. Our estimate for P(y=0|x;θ) is 0.3.
C. Our estimate for P(y=1|x;θ) is 0.3.
D. Our estimate for P(y=0|x;θ) is 0.7.

### 2. Question 2

Suppose you have the following training set, and fit a logistic regression classifier hθ(x)=g(θ0+θ1x1+θ2x2).

Which of the following are true? Check all that apply.

A. Adding polynomial features (e.g., instead using hθ(x)=g(θ0+θ1x1+θ2x2+θ3x21+θ4x1x2+θ5x22) ) could increase how well we can fit the training data.

B. At the optimal value of θ (e.g., found by fminunc), we will have J(θ)≥0.

C. Adding polynomial features (e.g., instead using hθ(x)=g(θ0+θ1x1+θ2x2+θ3x21+θ4x1x2+θ5x22) ) would increase J(θ) because we are now summing over more terms.

D. If we train gradient descent for enough iterations, for some examples x(i) in the training set it is possible to obtain hθ(x(i))>1.

### 3. Question 3

For logistic regression, the gradient is given by ∂∂θjJ(θ)=1m∑mi=1(hθ(x(i))−y(i))x(i)j. Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.

A. θj:=θj−α1m∑mi=1(hθ(x(i))−y(i))x(i) (simultaneously update for all j).

B. θj:=θj−α1m∑mi=1(hθ(x(i))−y(i))x(i)j (simultaneously update for all j).

C. θj:=θj−α1m∑mi=1(11+e−θTx(i)−y(i))x(i)j (simultaneously update for all j).

D. θ:=θ−α1m∑mi=1(θTx−y(i))x(i).

### 4. Question 4

Which of the following statements are true? Check all that apply.

A. The cost function J(θ) for logistic regression trained with m≥1 examples is always greater than or equal to zero.

B. Linear regression always works well for classification if you classify by using a threshold on the prediction made by linear regression.

C. The one-vs-all technique allows you to use logistic regression for problems in which each y(i) comes from a fixed, discrete set of values.

D. For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).

D由于使用代价函数为线性回归代价函数，会有很多局部最优值

### 5. Question 5

Suppose you train a logistic classifier hθ(x)=g(θ0+θ1x1+θ2x2). Suppose θ0=6,θ1=0,θ2=−1. Which of the following figures represents the decision boundary found by your classifier?

6-x2>=0 即X2<6时为1

GitHub Repo：Halfrost-Field

Follow: halfrost · GitHub

Previous Post

Next Post