Input Data :
Data set x = 4, 5, 6, 7, 10
Data set y = 3, 8, 20, 30, 12
Total number of elements = 5
Objective :
Find the linear relationship between the two data sets X and Y.
Solution :
Xmean = (4 + 5 + 6 + 7 + 10)/5
= 32/5
Xmean = 6.4
Ymean = (3 + 8 + 20 + 30 + 12)/5
= 73/5
Ymean = 14.6
Intercept = ((∑y)(∑x²) - (∑x)(∑xy)) / (n(∑x²) - (∑x)²)
∑y = 3 + 8 + 20 + 30 + 12
∑y = 73
∑x² = (4)² + (5)² + (6)² + (7)² + (10)²
= 16 + 25 + 36 + 49 + 100
∑x² = 226
∑x = 4 + 5 + 6 + 7 + 10
∑x = 32
∑xy = (4 x 3) + (5 x 8) + (6 x 20) + (7 x 30) + (10 x 12)
∑xy = 12 + 40 + 120 + 210 + 120
∑xy = 502
Apply the values in the above formula
Intercept = ((73 x 226) - (32 x 502)) / ((5 x 226) - (32)²)
= (16498 - 16064) / (1130 - 1024)
= 434 / 106
Intercept = 4.0943
Slope = (n(∑xy) - (∑x)(∑y)) / (n(∑x²) - (∑x)²)
= (5(502) - (32 x 73)) / ((5 x 226) - (32)²)
= (2510 - 2336) / (1130 - 1024)
= 174 / 106
Slope = 1.6415
Regression equation = Intercept + Slope x
Regression equation = 4.0943 + 1.6415 x
Check: the line passes through (Xmean, Ymean), since 4.0943 + (1.6415 x 6.4) ≈ 14.6
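The arithmetic of this example can be reproduced with a short Python sketch. The variable names (`sum_x`, `sum_xy`, and so on) are illustrative and do not come from the calculator itself. In the fitted line, the constant term is the intercept and the coefficient that multiplies x is the slope.

```python
# Data from the worked example.
xs = [4, 5, 6, 7, 10]
ys = [3, 8, 20, 30, 12]
n = len(xs)

# The four sums used by the least squares formulas.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Both coefficients share the same denominator.
denominator = n * sum_x2 - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denominator
intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

print(f"sums: x={sum_x}, y={sum_y}, x^2={sum_x2}, xy={sum_xy}")
# prints "sums: x=32, y=73, x^2=226, xy=502"
print(f"y_hat = {intercept:.4f} + {slope:.4f} x")
# prints "y_hat = 4.0943 + 1.6415 x"
```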
The Linear Regression calculator uses the least squares method to find the line of best fit for two sets of data `X` and `Y`, that is, the linear relationship between the two data sets. It estimates the value of a dependent variable `Y` from a given independent variable `X`. It is an online statistics and probability tool that requires two sets of data `X` and `Y` and finds the relationship between the two variables by fitting a linear equation to the observed data.
Linear regression models the relationship between a dependent variable `y` and an independent variable `x` with a linear prediction function $\hat {y}=a+bx$. The data are modeled with linear functions, and the unknown model parameters are estimated from the data. Models of this kind are called linear models. With two or more independent variables, the method is called multiple linear regression. Linear regression models are often fitted with the least squares method. The least squares regression line is the line $\hat {y}=a+bx$ that makes the vertical distances from the data points to the line as small as possible. We call it "least squares" because the line of best fit is the one that minimizes the sum of the squares of the errors, where each error is the vertical distance $y_i-\hat {y}_i$ between an observed value and the line. So, the line of best fit is the least squares regression line $\hat {y}=a+bx$, where `b` is the slope of the line and `a` is the `Y`-intercept.
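To make the "least squares" idea concrete, the sketch below compares the sum of squared errors of the fitted line with that of a few slightly shifted lines. The helper name `sum_squared_errors`, the data (x = 4, 5, 6, 7, 10 and y = 3, 8, 20, 30, 12), and the nudge sizes are assumptions chosen for illustration.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [4, 5, 6, 7, 10]
ys = [3, 8, 20, 30, 12]

# Least squares coefficients for this data: intercept a and slope b.
a, b = 434 / 106, 174 / 106

best = sum_squared_errors(a, b, xs, ys)

# Moving either coefficient away from its least squares value
# should only increase the total squared error.
nudged = [
    sum_squared_errors(a + da, b + db, xs, ys)
    for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]
]

print(all(value > best for value in nudged))  # prints "True"
```

This only probes four nearby lines, so it is an illustration rather than a proof that the fitted line is optimal.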
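Let us consider two samples $X=(x_1,\ldots,x_n)$ and $Y=(y_1,\ldots, y_n)$ of `n` outcomes. The coefficients `a` and `b` of the least squares regression line, $\hat {y}=a+bx$, can be determined from the equations:
$$b=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2},\qquad a=\frac{\sum y_i\sum x_i^2-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2}$$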
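A minimal Python sketch of this calculation for arbitrary samples follows. It uses the algebraically equivalent mean-centered form, $b=\sum (x_i-\bar x)(y_i-\bar y)/\sum (x_i-\bar x)^2$ and $a=\bar y-b\bar x$, which is a choice made here and not necessarily what the calculator does internally. The function name `fit_line` is hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of outcomes")
    if len(xs) < 2:
        raise ValueError("at least two points are needed")

    x_mean, y_mean = fmean(xs), fmean(ys)
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    b = s_xy / s_xx          # slope
    a = y_mean - b * x_mean  # intercept: the line passes through the means
    return a, b


# Points that lie exactly on y = 1 + 2x recover a = 1 and b = 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints "1.0 2.0"
```

The mean-centered form tends to be better behaved numerically than the raw-sum form when the values are large, which is why it is used in this sketch.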
Linear regression has many applications. If the goal is prediction, linear regression can be used to fit a predictive model to a data set of values of the response and explanatory variables. In business, it can help in analyzing the impact of various factors on sales and profits, for example in predictive analytics, improving operational efficiency, and correcting errors. With this approach, we can analyze the effect of marketing, pricing, and promotions on the sales of a product.
Linear regression can also be useful in studying engine performance from test data in automobiles, in modeling causal relationships between parameters in biological systems, and in many other fields of science and life.
Practice Problem 1:
Mitchell is a basketball player. The number of minutes played in each game, `X`, and the number of points scored, `Y`, are given in the table below.
Game | G1 | G2 | G3 | G4 | G5 | G6 | G7 |
Minutes `X` | 26 | 38 | 19 | 36 | 38 | 12 | 24 |
Points `Y` | 12 | 15 | 9 | 26 | 34 | 5 | 15 |
Day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
Geometry | 14 | 18 | 19 | 36 | 18 | 2 | 14 |
Algebra | 24 | 45 | 19 | 16 | 14 | 5 | 16 |