# 2. Cost Function

$h_{\theta}(x) = \theta_{0} + \theta_{1}x$

1. Initialize ${\theta_{0}}$ and ${\theta_{1}}$: ${\theta_{0}} = 0$, ${\theta_{1}} = 0$.
2. Keep changing ${\theta_{0}}$ and ${\theta_{1}}$ to reduce $F({\theta_{0}},{\theta_{1}})$ until it reaches a minimum (possibly only a local minimum).

\begin{align*}
\rm{temp}0 &:= {\theta_{0}} - \alpha * \frac{\partial }{\partial {\theta_{0}}}\rm{F}({\theta_{0}},{\theta_{1}}) \\
\rm{temp}1 &:= {\theta_{1}} - \alpha * \frac{\partial }{\partial {\theta_{1}}}\rm{F}({\theta_{0}},{\theta_{1}}) \\
{\theta_{0}} &:= \rm{temp}0 \\
{\theta_{1}} &:= \rm{temp}1
\end{align*}

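$\alpha$ is called the learning rate. The temporary variables make the update simultaneous: both partial derivatives are evaluated at the old values of ${\theta_{0}}$ and ${\theta_{1}}$ before either parameter is overwritten.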
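The simultaneous update can be sketched in Python. This is only a minimal illustration, not code from the original post: the function name `gradient_descent`, the example cost $F(\theta_{0},\theta_{1}) = (\theta_{0}-1)^2 + (\theta_{1}+2)^2$, and the values of `alpha` and `num_iters` are all assumptions chosen to keep the example small.

```python
def gradient_descent(dF_dtheta0, dF_dtheta1, alpha=0.1, num_iters=200):
    """Minimize F by simultaneous gradient-descent updates, starting at (0, 0)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        # Both partial derivatives are evaluated at the OLD (theta0, theta1).
        temp0 = theta0 - alpha * dF_dtheta0(theta0, theta1)
        temp1 = theta1 - alpha * dF_dtheta1(theta0, theta1)
        theta0, theta1 = temp0, temp1
    return theta0, theta1


# Assumed example cost: F = (theta0 - 1)^2 + (theta1 + 2)^2, minimum at (1, -2).
theta0, theta1 = gradient_descent(
    lambda t0, t1: 2.0 * (t0 - 1.0),  # dF/dtheta0
    lambda t0, t1: 2.0 * (t1 + 2.0),  # dF/dtheta1
)
print(theta0, theta1)
```

With this convex example the iterates approach the minimum at $(1, -2)$; for a non-convex $F$ the same loop may stop at a local minimum instead.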

## The relationship between the gradient and partial derivatives

### 1. Derivative

$f^{'}(x_{0}) = \lim_{\Delta x\rightarrow 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x\rightarrow 0} \frac{f(x_{0} + \Delta x) - f(x_{0})}{\Delta x}$

### 2. Partial derivatives

$f_{x}(x_{0},y_{0}) = \lim_{\Delta x \rightarrow 0} \frac{f(x_{0} + \Delta x , y_{0}) - f(x_{0},y_{0})}{\Delta x}$
$f_{y}(x_{0},y_{0}) = \lim_{\Delta y \rightarrow 0} \frac{f(x_{0} , y_{0} + \Delta y) - f(x_{0},y_{0})}{\Delta y}$
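The two limits can be approximated numerically by using a small but finite step instead of taking the limit. The sketch below assumes an example function $f(x, y) = x^{2}y$ and a step size of `1e-6`; the helper names `partial_x` and `partial_y` are made up for illustration.

```python
def partial_x(f, x0, y0, dx=1e-6):
    """Approximate f_x(x0, y0): vary x only, keep y fixed."""
    return (f(x0 + dx, y0) - f(x0, y0)) / dx


def partial_y(f, x0, y0, dy=1e-6):
    """Approximate f_y(x0, y0): vary y only, keep x fixed."""
    return (f(x0, y0 + dy) - f(x0, y0)) / dy


# Assumed example: f(x, y) = x^2 * y, so f_x = 2xy and f_y = x^2.
f = lambda x, y: x ** 2 * y
fx = partial_x(f, 3.0, 2.0)  # exact value is 12
fy = partial_y(f, 3.0, 2.0)  # exact value is 9
print(fx, fy)
```

The printed values are close to, but not exactly, 12 and 9, because the step is finite.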

### 3. Directional derivative

$\left.\begin{matrix} \frac{\partial f}{\partial l} \end{matrix}\right|_{(x_{0},y_{0})} = \lim_{t \rightarrow 0^{+}} \frac{f(x_{0} + t\cos \alpha , y_{0} + t\cos \beta) - f(x_{0},y_{0})}{t}$

Here $\cos \alpha$ and $\cos \beta$ are the direction cosines of the direction $l$. If $f$ is differentiable at $(x_{0},y_{0})$, then:

$\left.\begin{matrix} \frac{\partial f}{\partial l} \end{matrix}\right|_{(x_{0},y_{0})} = f_{x}(x_{0},y_{0})\cos \alpha + f_{y}(x_{0},y_{0})\cos \beta$
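As a numerical sanity check, the limit definition and the formula with partial derivatives can be compared on a concrete function. This sketch assumes $f(x, y) = x^{2}y$, the point $(3, 2)$, and a direction at angle $\pi/6$ from the x-axis; in two dimensions the unit vector along $l$ is $(\cos\alpha, \sin\alpha)$, so $\cos\beta = \sin\alpha$.

```python
import math


def directional_derivative(f, x0, y0, angle, t=1e-6):
    """Difference quotient along the unit vector (cos(angle), sin(angle))."""
    cos_a, cos_b = math.cos(angle), math.sin(angle)  # cos(beta) = sin(alpha) in 2D
    return (f(x0 + t * cos_a, y0 + t * cos_b) - f(x0, y0)) / t


# Assumed example: f(x, y) = x^2 * y at the point (3, 2).
f = lambda x, y: x ** 2 * y
x0, y0, angle = 3.0, 2.0, math.pi / 6
fx, fy = 2 * x0 * y0, x0 ** 2  # exact partial derivatives: 12 and 9

numeric = directional_derivative(f, x0, y0, angle)
formula = fx * math.cos(angle) + fy * math.sin(angle)
print(numeric, formula)
```

The two printed numbers agree up to the error introduced by the finite step `t`.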

### 4. Gradient

$f_{x}(x_{0},y_{0}) \vec{i} + f_{y}(x_{0},y_{0}) \vec{j}$

$\textbf{grad}\;\;f(x_{0},y_{0}) = \nabla f(x_{0},y_{0}) = f_{x}(x_{0},y_{0}) \vec{i} + f_{y}(x_{0},y_{0}) \vec{j}$

\begin{align*}
\left.\begin{matrix} \frac{\partial f}{\partial l} \end{matrix}\right|_{(x_{0},y_{0})} &= f_{x}(x_{0},y_{0})\cos \alpha + f_{y}(x_{0},y_{0})\cos \beta \\
&= \textbf{grad}\;\;f(x_{0},y_{0}) \cdot \vec{e_{l}} = \left | \textbf{grad}\;\;f(x_{0},y_{0}) \right | \cos \theta
\end{align*}

Here $\vec{e_{l}} = (\cos \alpha, \cos \beta)$ is the unit vector along $l$, and $\theta$ is the angle between the gradient and $\vec{e_{l}}$.

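1. When $\theta = 0$, $\left.\begin{matrix} \frac{\partial f}{\partial l} \end{matrix}\right|_{(x_{0},y_{0})} = \left | \textbf{grad}\;\;f(x_{0},y_{0}) \right |$. The direction $l$ coincides with the gradient and the directional derivative reaches its maximum, so $f$ increases fastest along the gradient.

2. When $\theta = \pi$, $\left.\begin{matrix} \frac{\partial f}{\partial l} \end{matrix}\right|_{(x_{0},y_{0})} = - \left | \textbf{grad}\;\;f(x_{0},y_{0}) \right |$. The direction $l$ is opposite to the gradient and the directional derivative reaches its minimum, so $f$ decreases fastest along the negative gradient, which is the direction gradient descent moves in.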
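The two cases can be checked by brute force: evaluate the directional derivative over many directions and see where it is largest and smallest. The gradient values $f_{x} = 12$, $f_{y} = 9$ and the grid of 3600 directions below are assumptions picked for illustration.

```python
import math

fx, fy = 12.0, 9.0               # assumed gradient components at some point
grad_norm = math.hypot(fx, fy)   # |grad f| = 15
grad_angle = math.atan2(fy, fx)  # direction of the gradient


def directional(angle):
    """Directional derivative along the unit vector (cos(angle), sin(angle))."""
    return fx * math.cos(angle) + fy * math.sin(angle)


# Scan 3600 evenly spaced directions around the circle.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=directional)   # direction of steepest ascent
worst = min(angles, key=directional)  # direction of steepest descent
print(directional(best), directional(worst), best, grad_angle)
```

Up to the grid resolution, the largest value is $|\textbf{grad}\;f|$ and is attained in the gradient direction, while the smallest is $-|\textbf{grad}\;f|$ in the opposite direction.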

# 4. Linear Regression

The cost function is $F({\theta_{0}},{\theta_{1}}) = \frac{1}{2m}\sum_{i = 1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^2$. To differentiate it with respect to ${\theta_{j}}$, let $u = h_{\theta}(x^{(i)})-y^{(i)}$ and $z = u^{2}$, so that $\frac{\partial z}{\partial u} = 2u$, and apply the chain rule:

\begin{align*}
\frac{\partial }{\partial {\theta_{j}}}\rm{F}({\theta_{0}},{\theta_{1}}) & = \frac{\partial }{\partial {\theta_{j}}} \frac{1}{2m}\sum_{i = 1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^2 \\
& = \frac{1}{2m}\sum_{i = 1}^{m} \frac{\partial z }{\partial u} \frac{\partial u }{\partial {\theta_{j}}} = \frac{1}{2m} * 2 \sum_{i = 1}^{m} u \frac{\partial u }{\partial {\theta_{j}}} \\
& = \frac{1}{m} \sum_{i = 1}^{m} u \frac{\partial u }{\partial {\theta_{j}}}
\end{align*}

Since $u = \theta_{0} + \theta_{1}x^{(i)} - y^{(i)}$, we have $\frac{\partial u}{\partial {\theta_{0}}} = 1$, so:

\begin{align*}
\frac{\partial }{\partial {\theta_{0}}}\rm{F}({\theta_{0}},{\theta_{1}}) &= \frac{1}{m} \sum_{i = 1}^{m} u \frac{\partial u }{\partial {\theta_{0}}} \\
&= \frac{1}{m} \sum_{i = 1}^{m}(\theta_{0} + \theta_{1}x^{(i)} - y^{(i)}) = \frac{1}{m} \sum_{i = 1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)})
\end{align*}

Likewise, with $u = \theta_{0} + \theta_{1}x^{(i)} - y^{(i)}$ we have $\frac{\partial u}{\partial {\theta_{1}}} = x^{(i)}$, so:

\begin{align*}
\frac{\partial }{\partial {\theta_{1}}}\rm{F}({\theta_{0}},{\theta_{1}}) &= \frac{1}{m} \sum_{i = 1}^{m} u \frac{\partial u }{\partial {\theta_{1}}} \\
&= \frac{1}{m} \sum_{i = 1}^{m}(\theta_{0} + \theta_{1}x^{(i)} - y^{(i)}) * x^{(i)} = \frac{1}{m} \sum_{i = 1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)}) * x^{(i)}
\end{align*}
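One way to gain confidence in the two partial derivatives is to compare them with central finite differences of the cost. The sketch below is not from the original post: the function names `cost` and `cost_gradient`, the four data points, and the evaluation point $(0.3, 0.7)$ are assumptions made up for the check.

```python
import numpy as np


def cost(theta0, theta1, x, y):
    """F(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    u = theta0 + theta1 * x - y
    return np.sum(u ** 2) / (2 * len(x))


def cost_gradient(theta0, theta1, x, y):
    """Analytic partial derivatives of F with respect to theta0 and theta1."""
    u = theta0 + theta1 * x - y  # u = h(x) - y for every training example
    return np.mean(u), np.mean(u * x)


# Assumed data, used only to check the formulas numerically.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 1.9, 3.2, 3.8])
g0, g1 = cost_gradient(0.3, 0.7, x, y)

# Central differences of the cost in each parameter.
eps = 1e-6
n0 = (cost(0.3 + eps, 0.7, x, y) - cost(0.3 - eps, 0.7, x, y)) / (2 * eps)
n1 = (cost(0.3, 0.7 + eps, x, y) - cost(0.3, 0.7 - eps, x, y)) / (2 * eps)
print(g0, n0)
print(g1, n1)
```

Each printed pair should agree to several decimal places, since the cost is quadratic in both parameters.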

Substituting these partial derivatives into the gradient descent update gives:

\begin{align*}
\rm{temp}0 &:= {\theta_{0}} - \alpha * \frac{\partial }{\partial {\theta_{0}}}\rm{F}({\theta_{0}},{\theta_{1}}) = {\theta_{0}} - \alpha * \frac{1}{m} \sum_{i = 1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)}) \\
\rm{temp}1 &:= {\theta_{1}} - \alpha * \frac{\partial }{\partial {\theta_{1}}}\rm{F}({\theta_{0}},{\theta_{1}}) = {\theta_{1}} - \alpha * \frac{1}{m} \sum_{i = 1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)}) * x^{(i)} \\
{\theta_{0}} &:= \rm{temp}0 \\
{\theta_{1}} &:= \rm{temp}1
\end{align*}
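The update rule translates almost line by line into NumPy. The following is a minimal sketch rather than a tuned implementation: the function name `fit_linear_regression`, the learning rate `0.05`, the iteration count, and the noiseless data on the line $y = 1 + 2x$ are all assumptions.

```python
import numpy as np


def fit_linear_regression(x, y, alpha=0.05, num_iters=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = theta0 + theta1 * x - y  # h(x^(i)) - y^(i) for all i
        temp0 = theta0 - alpha * np.mean(error)
        temp1 = theta1 - alpha * np.mean(error * x)
        theta0, theta1 = temp0, temp1    # simultaneous update
    return theta0, theta1


# Assumed noiseless data on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
theta0, theta1 = fit_linear_regression(x, y)
print(theta0, theta1)
```

On this data the parameters approach $\theta_{0} = 1$ and $\theta_{1} = 2$. A larger `alpha` can make the iteration diverge, and a smaller one needs more iterations.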

    import numpy as np
    import matplotlib.pyplot as plt

    # In a Jupyter notebook, run `%matplotlib inline` first to render plots inline.

    # Training inputs: 27 examples with one feature each.
    x_train = np.array([[2.5], [3.5], [6.3], [9.9], [9.91], [8.02],
                        [4.5], [5.5], [6.23], [7.923], [2.941], [5.02],
                        [6.34], [7.543], [7.546], [8.744], [9.674], [9.643],
                        [5.33], [5.31], [6.78], [1.01], [9.68],
                        [9.99], [3.54], [6.89], [10.9]], dtype=np.float32)

    # Training targets.
    y_train = np.array([[3.34], [3.86], [5.63], [7.78], [10.6453], [8.43],
                        [4.75], [5.345], [6.546], [7.5754], [2.35654], [5.43646],
                        [6.6443], [7.64534], [7.546], [8.7457], [9.6464], [9.74643],
                        [6.32], [6.42], [6.1243], [1.088], [10.342],
                        [9.24], [4.22], [5.44], [9.33]], dtype=np.float32)

    # Values plotted as the estimate: identical to x_train, i.e. the line y = x.
    y_data = np.array([[2.5], [3.5], [6.3], [9.9], [9.91], [8.02],
                       [4.5], [5.5], [6.23], [7.923], [2.941], [5.02],
                       [6.34], [7.543], [7.546], [8.744], [9.674], [9.643],
                       [5.33], [5.31], [6.78], [1.01], [9.68],
                       [9.99], [3.54], [6.89], [10.9]], dtype=np.float32)

    # Blue dots: real data. Red line: estimate.
    plt.plot(x_train, y_train, 'bo', label='real')
    plt.plot(x_train, y_data, 'r-', label='estimated')
    plt.legend()
    plt.show()

# Linear Regression with One Variable Quiz

## 1.

Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.

Specifically, let $x$ be equal to the number of "A" grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of $y$, which we define as the number of "A" grades they get in their second year (sophomore year).

Here each row is one training example. Recall that in linear regression, our hypothesis is $h_{\theta}(x)=\theta_{0}+\theta_{1}x$, and we use $m$ to denote the number of training examples.

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

Answer: 4

## 2.

Consider the following training set of m=4 training examples:

| x | y   |
|---|-----|
| 1 | 0.5 |
| 2 | 1   |
| 4 | 2   |
| 0 | 0   |

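Consider the linear regression model $h_{\theta}(x)=\theta_{0}+\theta_{1}x$. What are the values of $\theta_{0}$ and $\theta_{1}$ that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)

$\theta_{0}=0,\theta_{1}=0.5$

$\theta_{0}=1,\theta_{1}=1$

$\theta_{0}=0.5,\theta_{1}=0.5$

$\theta_{0}=1,\theta_{1}=0.5$

$\theta_{0}=0.5,\theta_{1}=0$

Answer: $\theta_{0}=0,\theta_{1}=0.5$, because the line $y = 0.5x$ passes through all four training examples.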
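The answer can be cross-checked with a direct least-squares fit. Note that this swaps in `np.polyfit`, which solves the least-squares problem in closed form, in place of gradient descent. Both minimize the same squared-error cost, so they should agree on data that a line fits perfectly.

```python
import numpy as np

# Training set from this question.
x = np.array([1.0, 2.0, 4.0, 0.0])
y = np.array([0.5, 1.0, 2.0, 0.0])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
theta1, theta0 = np.polyfit(x, y, 1)
residuals = theta0 + theta1 * x - y  # h(x^(i)) - y^(i) for each example
print(theta0, theta1)
```

The fitted intercept is zero and the slope is 0.5 up to floating-point rounding, and every residual is likewise zero.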

## 3.

Suppose we set $\theta_{0}=-1,\theta_{1}=2$ in the linear regression hypothesis from Q1. What is $h_{\theta}(6)$?

Answer: $h_{\theta}(6) = -1 + 2 \times 6 = 11$
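The same evaluation as a small Python function (the name `h` and its argument order are chosen here for illustration):

```python
def h(x, theta0, theta1):
    """Linear regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


print(h(6, theta0=-1, theta1=2))  # prints 11
```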

## 4.

Let $f$ be some function so that $f(\theta_{0},\theta_{1})$ outputs a number. For this problem, $f$ is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so $f$ may have local optima). Suppose we use gradient descent to try to minimize $f(\theta_{0},\theta_{1})$ as a function of $\theta_{0}$ and $\theta_{1}$. Which of the following statements are true? (Check all that apply.)

A. No matter how $\theta_{0}$ and $\theta_{1}$ are initialized, so long as $\alpha$ is sufficiently small, we can safely expect gradient descent to converge to the same solution.

B. If the first few iterations of gradient descent cause $f(\theta_{0},\theta_{1})$ to increase rather than decrease, then the most likely cause is that we have set the learning rate $\alpha$ to too large a value.

C. If $\theta_{0}$ and $\theta_{1}$ are initialized at the global minimum, then one iteration will not change their values.

D. Setting the learning rate $\alpha$ to be very small is not harmful, and can only speed up the convergence of gradient descent.

Answer: B and C. A is false because different initializations can lead to different local optima, and D is false because a very small $\alpha$ slows convergence down.
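Statements A, B and C can be illustrated on a small non-convex function. The function $f(\theta_{0},\theta_{1}) = (\theta_{0}^{2}-1)^{2} + \theta_{1}^{2}$, which has two minima at $(\pm 1, 0)$, and the learning rates and starting points below are assumptions picked for this sketch.

```python
def f(t0, t1):
    """Assumed non-convex example with minima at (1, 0) and (-1, 0)."""
    return (t0 ** 2 - 1) ** 2 + t1 ** 2


def grad_f(t0, t1):
    return 4 * t0 * (t0 ** 2 - 1), 2 * t1


def descend(t0, t1, alpha, num_iters):
    """Run num_iters simultaneous gradient-descent updates."""
    for _ in range(num_iters):
        g0, g1 = grad_f(t0, t1)
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1


# A: same small alpha, different initializations, different minima.
right = descend(0.5, 1.0, alpha=0.05, num_iters=500)
left = descend(-0.5, 1.0, alpha=0.05, num_iters=500)

# B: with a large alpha, f increases after the first step.
before = f(0.5, 1.0)
after = f(*descend(0.5, 1.0, alpha=1.0, num_iters=1))

# C: at a global minimum the gradient is zero, so one step changes nothing.
stay = descend(1.0, 0.0, alpha=0.05, num_iters=1)
print(right, left, before, after, stay)
```

Starting at $\theta_{0} = 0.5$ the iterates approach $(1, 0)$, while starting at $\theta_{0} = -0.5$ they approach $(-1, 0)$, which is why statement A fails.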

## 5.

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some $\theta_{0}$, $\theta_{1}$ such that $J(\theta_{0},\theta_{1})=0$.

Which of the statements below must then be true? (Check all that apply.)

A. For this to be true, we must have $\theta_{0}=0$ and $\theta_{1}=0$ so that $h_{\theta}(x)=0$.

B. We can perfectly predict the value of $y$ even for new examples that we have not yet seen (e.g., we can perfectly predict prices of even new houses that we have not yet seen).

C. For these values of $\theta_{0}$ and $\theta_{1}$ that satisfy $J(\theta_{0},\theta_{1})=0$, we have that $h_{\theta}(x^{(i)})=y^{(i)}$ for every training example $(x^{(i)},y^{(i)})$.

D. This is not possible: by the definition of $J(\theta_{0},\theta_{1})$, it is not possible for there to exist $\theta_{0}$ and $\theta_{1}$ so that $J(\theta_{0},\theta_{1})=0$.

Answer: C. $J$ is a sum of squared errors, so $J(\theta_{0},\theta_{1})=0$ forces every individual error on the training set to be zero.
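A small numerical illustration, using made-up data: a training set that lies exactly on the line $y = 2 + 3x$ gives zero cost at $\theta_{0} = 2$, $\theta_{1} = 3$, yet the same parameters can still miss an unseen example. The data and the function name `J` below are assumptions for this sketch.

```python
import numpy as np


def J(theta0, theta1, x, y):
    """Squared-error cost: 1/(2m) * sum((h(x) - y)^2)."""
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * len(x))


# Assumed training set that lies exactly on y = 2 + 3x.
x_train = np.array([0.0, 1.0, 2.0])
y_train = 2.0 + 3.0 * x_train
cost_train = J(2.0, 3.0, x_train, y_train)

# An assumed unseen example that does not lie on that line.
x_new, y_new = np.array([3.0]), np.array([10.0])
cost_new = J(2.0, 3.0, x_new, y_new)
print(cost_train, cost_new)
```

The training cost is zero with nonzero parameters, which rules out A and D, and the nonzero cost on the new example shows why B does not follow.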

GitHub Repo: Halfrost-Field

Follow: halfrost · GitHub