Applications: stock market forecast, self-driving cars, recommendation systems.
Goal: select a function whose output value is the prediction.
Linear model: y = b + w·x_i (b and w are parameters; x_i is a feature; w is the weight, b is the bias).
Notation: a superscript marks the example number; a subscript marks a property (component) of that example.
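A minimal sketch of the linear model above; the weight and bias values are made-up examples, not fitted parameters:

```python
def linear_model(x, w=2.0, b=1.0):
    """Predict y = b + w * x for a single feature x."""
    return b + w * x

print(linear_model(3.0))  # 1.0 + 2.0 * 3.0 = 7.0
```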
Step 2: goodness of function.
The training data are real observed data.
Loss function L. Input: a function; output: how bad that function is. Since each function is determined by its parameters, L(f) = L(b, w).
So we can define the loss function as the sum of squared errors over the training data:
L(w, b) = Σ_n (ŷ^n − (b + w·x^n))², where ŷ^n is the true value of the n-th example and x^n is its feature.
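The sum-of-squared-errors loss can be sketched as follows; the training points are made-up examples generated by y = 1 + 2x:

```python
def loss(w, b, xs, ys):
    """Sum of squared errors of y = b + w * x over the training data."""
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]            # exactly y = 1 + 2x

print(loss(2.0, 1.0, xs, ys))   # 0.0 — the true parameters give zero loss
print(loss(1.0, 0.0, xs, ys))   # (3-1)^2 + (5-2)^2 + (7-3)^2 = 29.0
```

A lower loss means the function fits the training data better.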
Gradient descent: assume there is only one parameter w. Pick a random initial value w^0 and differentiate the loss at that point.
But by how much should we increase or decrease w?
The step size depends on both the differential value at the current point and the learning rate η: w^1 = w^0 − η·(dL/dw).
Then repeat the update. We will reach a local optimum, NOT necessarily the global optimum!
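The one-parameter update rule can be sketched on a toy loss L(w) = (w − 3)², whose derivative 2(w − 3) we can write by hand (the loss, initial value, and learning rate here are made-up examples):

```python
def grad_descent(w0, eta=0.1, steps=100):
    """Minimize the toy loss L(w) = (w - 3)^2 by gradient descent."""
    w = w0
    for _ in range(steps):
        dL_dw = 2 * (w - 3)   # differentiate the loss at the current point
        w = w - eta * dL_dw   # step against the gradient, scaled by eta
    return w

print(round(grad_descent(10.0), 4))  # converges toward the minimum at w = 3
```

If η is too large the updates overshoot and diverge; if too small, convergence is slow.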
What about two parameters? It is actually the same as with one parameter: just compute a derivative for each.
What is the gradient? With two parameters, it is the array (vector) of partial derivatives: ∇L = (∂L/∂w, ∂L/∂b).
But we may worry, because the result depends on the initial point — it looks like we are trying our luck.
In linear regression, however, the loss is convex, so there is no local-optimum problem.
We can still find the partial derivatives directly:
∂L/∂w = Σ_n 2(ŷ^n − (b + w·x^n))·(−x^n)
∂L/∂b = Σ_n 2(ŷ^n − (b + w·x^n))·(−1)
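Putting the pieces together, a sketch of gradient descent on both parameters (w, b) of the linear model, using those partial derivatives; the data, learning rate, and step count are made-up examples:

```python
def fit(xs, ys, eta=0.01, steps=5000):
    """Fit y = b + w * x by gradient descent on the squared-error loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # partial derivatives of L(w, b) = sum_n (y^n - (b + w * x^n))^2
        dw = sum(2 * (y - (b + w * x)) * (-x) for x, y in zip(xs, ys))
        db = sum(2 * (y - (b + w * x)) * (-1) for x, y in zip(xs, ys))
        w -= eta * dw
        b -= eta * db
    return w, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated by y = 1 + 2x
w, b = fit(xs, ys)
print(round(w, 3), round(b, 3))  # recovers w ≈ 2, b ≈ 1
```

Because this loss is convex, any learning rate small enough to keep the updates stable will reach the single global optimum.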