• 2021-02-21

Applications: stock market forecasting, self-driving cars, recommendation.

Role: select a function whose output value is the prediction.


step1:function set

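linear model: $y = b + \sum_i w_i x_i$ ($b$ and $w_i$ are parameters; $x_i$ is a feature, $w_i$ is its weight, and $b$ is the bias). With a single feature this is $y = b + wx$.

A superscript marks which example is meant (e.g. $x^n$ is the n-th example), and a subscript marks a property of that example (e.g. $x_i$ is its i-th feature).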
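
As a minimal sketch, assuming the model is the bias plus a weighted sum over the features, a prediction could be computed like this. The function name `predict` and the example numbers are made up for illustration.

```python
def predict(x, w, b):
    """Linear model: bias plus the weighted sum of the features."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))

# Hypothetical example with two features (values chosen arbitrarily).
print(predict(x=[1.0, 2.0], w=[0.5, -1.0], b=3.0))
```

Each choice of `w` and `b` gives a different function, which is why the parameters describe a whole set of functions.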

step2:goodness of function

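Training data is real data: inputs paired with their actual output values.

Loss function L: its input is a function, and its output is how bad that function is. Since a function f is determined by b and w, L(f) = L(b, w).

So we can define the loss function as: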
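
One common choice, assumed here, is the sum of squared errors over the training examples. In this sketch $x^n$ is the n-th input, $\hat{y}^n$ is its true output, N is the number of examples, and a single feature is assumed. The symbols N and $\hat{y}$ are notation introduced for illustration.

```latex
\[
L(f) = L(b, w) = \sum_{n=1}^{N} \left( \hat{y}^{n} - \left( b + w\, x^{n} \right) \right)^{2}
\]
```

The larger the gap between the true outputs and the predictions, the larger L is, so a smaller L means a better function.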

step3:best function

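gradient descent: assume there is only one parameter w. Pick a random initial value $w^0$ and compute the derivative of L with respect to w at that point. If the derivative is negative, increase w; if it is positive, decrease w.

But how much should w increase or decrease?

The step size depends on both the derivative at the current point and the learning rate $\eta$: the larger either one is, the larger the step.

Then repeat the calculation from the new point. We will reach a local optimum, NOT NECESSARILY THE GLOBAL OPTIMUM!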
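
Written as a formula, one update step could look like the following. Here the superscript on w counts iterations, and the minus sign moves w against the slope, so a negative derivative increases w.

```latex
\[
w^{1} = w^{0} - \eta \left. \frac{dL}{dw} \right|_{w = w^{0}}
\]
```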
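
A small sketch of this loop, under the assumption of a made-up non-convex loss $L(w) = w^4 - 3w^2 + w$ (chosen only because it has two valleys; it is not from the notes). The names `gradient_descent`, `loss` and `d_loss` are hypothetical.

```python
def gradient_descent(derivative, w0, eta=0.01, steps=1000):
    """Repeat w <- w - eta * dL/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * derivative(w)
    return w

# Hypothetical loss with two valleys and its derivative.
def loss(w):
    return w**4 - 3 * w**2 + w

def d_loss(w):
    return 4 * w**3 - 6 * w + 1

# Two different starting points end up in two different valleys.
left = gradient_descent(d_loss, w0=-2.0)
right = gradient_descent(d_loss, w0=2.0)
print(left, loss(left))
print(right, loss(right))
```

Both runs stop where the derivative is close to zero, but only one of them reaches the lower valley, which is what "local, not global" means here.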

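How about two parameters? It is actually the same as with one parameter: compute the partial derivative for each parameter and update both.

What is the gradient? With two parameters, it is the vector (array) of the partial derivatives:

But there is a worry: the result depends on the random starting point, so we are trying our luck.

In linear regression the loss function is convex, so there is no local optimum other than the global one.

We can still compute the partial derivatives: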
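
With parameters w and b, the gradient and the resulting update step could be written as follows. Here $\eta$ is the learning rate and superscripts on w and b count iterations.

```latex
\[
\nabla L =
\begin{bmatrix}
\dfrac{\partial L}{\partial w} \\[2ex]
\dfrac{\partial L}{\partial b}
\end{bmatrix},
\qquad
\begin{bmatrix} w^{1} \\ b^{1} \end{bmatrix}
=
\begin{bmatrix} w^{0} \\ b^{0} \end{bmatrix}
- \eta \, \nabla L\!\left(w^{0}, b^{0}\right)
\]
```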
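
A rough way to see this, assuming a squared-error loss $L(b, w) = \sum_n (\hat{y}^n - (b + w x^n))^2$ and a single feature: run gradient descent from two very different starting points and compare the results. The data, the function names `gradients` and `fit`, and the learning rate are all made up for this sketch.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives of L(b, w) = sum((y - (b + w*x))**2)."""
    grad_w = sum(2 * (y - (b + w * x)) * (-x) for x, y in zip(xs, ys))
    grad_b = sum(2 * (y - (b + w * x)) * (-1) for x, y in zip(xs, ys))
    return grad_w, grad_b

def fit(xs, ys, w0, b0, eta=0.01, steps=5000):
    """Update w and b together, each with its own partial derivative."""
    w, b = w0, b0
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b, xs, ys)
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b

# Made-up data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Two unrelated starting points should end at (almost) the same w and b.
print(fit(xs, ys, w0=-5.0, b0=10.0))
print(fit(xs, ys, w0=8.0, b0=-3.0))
```

The learning rate still matters: with this data a much larger `eta` would make the updates diverge rather than settle.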
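
Assuming a squared-error loss $L(b, w) = \sum_{n=1}^{N} (\hat{y}^n - (b + w x^n))^2$ with a single feature (an assumption; $\hat{y}^n$ denotes the true output of the n-th example), the chain rule gives:

```latex
\[
\frac{\partial L}{\partial w}
= \sum_{n=1}^{N} 2 \left( \hat{y}^{n} - \left( b + w\, x^{n} \right) \right) \left( -x^{n} \right),
\qquad
\frac{\partial L}{\partial b}
= \sum_{n=1}^{N} 2 \left( \hat{y}^{n} - \left( b + w\, x^{n} \right) \right) \left( -1 \right)
\]
```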