Linear regression is a supervised machine learning technique. Regression is the process of predicting a continuous value.

In regression there are two types of variables: a dependent variable and one or more independent variables.

If there is one independent variable, it is simple linear regression; if there is more than one, it is multiple linear regression (often loosely called multivariate regression).

For instance, one of the applications of regression analysis could be in the area of sales forecasting.

You can try to predict a salesperson’s total yearly sales from independent variables such as age, education, and years of experience.

Let's dig deeper and look at how it works.

Let's explain it with an example. Suppose you want to travel from Kolkata to Delhi. Consider two variables: one is the distance and the other is the amount to be paid for petrol.

The amount paid for petrol is the dependent variable, as it depends on the distance.

Distance is the independent variable, as it is fixed for a given journey.

Let us take the amount paid for petrol to be y and the distance to be x.

From your previous journeys you will know how much you have paid for a certain distance.

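Let's put those journeys in tabular form:

| Distance x (km) | Amount paid for petrol y |
|---|---|
| 10 | 300 |
| 12 | 350 |
| 14 | 400 |
| 8 | 250 |
| 20 | 600 |
| 18 | 550 |
| 22 | 700 |
| 21 | 640 |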

Using these previous journeys, I want to estimate how much I will have to pay for 15 km. Let's plot it.

Image from (http://www.alcula.com/calculators/statistics/linear-regression/)

So we can see that there is a linear relationship: the amount paid for petrol depends linearly on the distance. As the distance increases or decreases, the amount paid for petrol goes up or down with it.

Let's now dig into the maths part of it.

From our early maths classes, recall that the equation of a straight line is y = mx + c

m = slope of the line, dy/dx

c = intercept (the value of y when x = 0)
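As an aside, one common way to choose m and c from data is ordinary least squares. That is an assumption here, since the post does not say which fitting method is used, but under it the slope and intercept would be given by the formulas below, where x̄ and ȳ are the means of the observed x and y values and n is the number of observations:

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```

In words: the slope is how much y moves together with x, scaled by how much x varies, and the intercept is chosen so that the line passes through the point of means.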

So now we need to know how good our best-fitting line is. Let's work it out.

It is measured by the mean squared error (how much the predicted values differ from the actual values). The lower the mean squared error, the better the line fits the data. The steps are:

- Find the regression line.
- Insert your X values into the linear regression equation to find the new Y values (Y’).
- Subtract the new Y value from the original to get the error.
- Square the errors.
- Add up the squared errors.
- Find the mean.

From an online regression calculator, we get the equation of the line to be:

y = 31.626016260163x - 20.406504065041

| x (distance in km) |
|---|
| 10 |
| 12 |
| 14 |
| 8 |
| 20 |
| 18 |
| 22 |
| 21 |
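To see where an equation like the one above can come from, here is a minimal sketch that fits a least-squares line in plain Python. The x values are the distances from the post. The y values (`amounts`) are an assumption: they are recovered by adding each difference y - Y’ reported further down to its matching Y’. The name `fit_line` is purely illustrative.

```python
# Distances (x) are the post's values. The amounts paid (y) are an assumption,
# recovered by adding each listed difference (y - Y') back to its prediction Y'.
distances = [10, 12, 14, 8, 20, 18, 22, 21]
amounts = [300, 350, 400, 250, 600, 550, 700, 640]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


m, c = fit_line(distances, amounts)
print(f"y = {m:.6f}x + ({c:.6f})")
```

Run on this data, the sketch gives a slope of about 31.626 and an intercept of about -20.407, in line with the equation from the online calculator.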

Let's calculate the new Y’ values:

| x | Y’ |
|---|---|
| 10 | 295.8537 |
| 12 | 359.1057 |
| 14 | 422.3577 |
| 8 | 232.6016 |
| 20 | 612.1138 |
| 18 | 548.8618 |
| 22 | 675.3659 |
| 21 | 643.7398 |

Y’ = 31.626016260163(10) - 20.406504065041 ≈ 296

Y’ = 31.626016260163(12) - 20.406504065041 ≈ 359

Y’ = 31.626016260163(14) - 20.406504065041 ≈ 422

Y’ = 31.626016260163(8) - 20.406504065041 ≈ 233

Y’ = 31.626016260163(20) - 20.406504065041 ≈ 612

Y’ = 31.626016260163(18) - 20.406504065041 ≈ 549

Y’ = 31.626016260163(22) - 20.406504065041 ≈ 675

Y’ = 31.626016260163(21) - 20.406504065041 ≈ 644
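The substitutions above are easy to get wrong by hand, so here is a small sketch that does the same arithmetic in Python. The slope and intercept are copied from the regression line quoted in the post; the function name `predict` is just an illustrative choice.

```python
# Slope and intercept are copied from the regression line quoted in the post.
SLOPE = 31.626016260163
INTERCEPT = -20.406504065041


def predict(distance_km):
    """Predicted amount paid for petrol for a given distance."""
    return SLOPE * distance_km + INTERCEPT


distances = [10, 12, 14, 8, 20, 18, 22, 21]
predictions = [predict(x) for x in distances]
for x, y_hat in zip(distances, predictions):
    print(f"x = {x:>2}  Y' = {y_hat:.4f}")

# The trip the post asks about earlier.
print(f"15 km -> {predict(15):.2f}")
```

The same function answers the earlier question about a 15 km trip: `predict(15)` comes out at roughly 453.98.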

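Now let's look at the difference between the original y and the predicted value, (y - Y’).

After calculating the differences, we get:

| x | y - Y’ |
|---|---|
| 10 | 4.146341 |
| 12 | -9.10569 |
| 14 | -22.3577 |
| 8 | 17.39837 |
| 20 | -12.1138 |
| 18 | 1.138211 |
| 22 | 24.63415 |
| 21 | -3.73984 |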

Calculating the mean squared error:

Now take the square of each difference, which comes to:

| x | (y - Y’)² |
|---|---|
| 10 | 17.19214753 |
| 12 | 82.91360962 |
| 14 | 499.8678036 |
| 8 | 302.7034173 |
| 20 | 146.7446626 |
| 18 | 1.29552515 |
| 22 | 606.841166 |
| 21 | 13.98638377 |

Now add up the squared differences and divide by the number of samples.

1671.54 / 8 ≈ 209
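The whole calculation, from predictions to the final mean, can be sketched in a few lines of Python. The distances and the line's slope and intercept are taken from the post. The `amounts` list is an assumption: each value is recovered by adding the listed difference y - Y’ back to the matching Y’. The function name `mean_squared_error` is illustrative and not tied to any library.

```python
# Distances are from the post. The amounts paid are an assumption, recovered by
# adding each listed difference (y - Y') back to the matching prediction Y'.
distances = [10, 12, 14, 8, 20, 18, 22, 21]
amounts = [300, 350, 400, 250, 600, 550, 700, 640]

SLOPE = 31.626016260163       # m, from the post's regression line
INTERCEPT = -20.406504065041  # c, from the post's regression line


def mean_squared_error(xs, ys, slope, intercept):
    """Average of the squared differences between actual and predicted y."""
    predictions = [slope * x + intercept for x in xs]             # Y'
    residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]  # y - Y'
    squared = [r ** 2 for r in residuals]
    return sum(squared) / len(squared)


mse = mean_squared_error(distances, amounts, SLOPE, INTERCEPT)
print(f"MSE = {mse:.2f}")
```

Under these assumptions the result is about 208.94, which matches the rounded figure of 209.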

So what does the mean squared error tell us?

The smaller the mean squared error, the closer you are to finding the best-fitting line.

On real data it is very unlikely that you will get a very small mean squared error. There are techniques to minimize it, such as gradient descent, which we will look at more closely in our next post!
