The Least Squares Line   

One of the most important applications in statistics is finding the equation of the line that best fits a data set of the form
\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]
where by best fit we mean the line that produces the least error. Specifically, the $j$th error, or residual, in approximating the data set with the line $y = mx + b$ is
\[ e_j = m x_j + b - y_j \]
Thus, $e_j^2$ is the square of the vertical distance from the point $(x_j, y_j)$ to the line.

We then define the least squares line for the data set to be the line with the slope m and the y-intercept b that minimizes the total squared error

\[ E(m,b) = \sum_{j=1}^{n} (m x_j + b - y_j)^2 \]
That is, the least squares line minimizes the sum of the squares of the residuals.       
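The definition of the total squared error translates directly into a few lines of code. The following is a minimal sketch in Python; the function name `total_squared_error` and the sample points are hypothetical, chosen only for illustration.

```python
def total_squared_error(m, b, points):
    """Sum of the squared vertical residuals (m*x + b - y)**2 over all points."""
    return sum((m * x + b - y) ** 2 for x, y in points)


# Hypothetical sample data (not from the text): three points on y = 2x + 1.
sample = [(0, 1), (1, 3), (2, 5)]

print(total_squared_error(2, 1, sample))  # 0, the line passes through every point
print(total_squared_error(1, 1, sample))  # 5, a worse line gives a larger total
```

A line that passes through every data point has total squared error zero; any other line gives a positive value, and the least squares line is the one that makes this value as small as possible.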

EXAMPLE 6    Find the least squares line for the data set (1,1), (2,3), (3,5), and (4,4).

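Solution: To find $E(m,b)$, we expand the squared residuals and then compute their sum:
\[
\begin{aligned}
e_1^2 &= (m \cdot 1 + b - 1)^2 = m^2 + 2mb - 2m + b^2 - 2b + 1 \\
e_2^2 &= (m \cdot 2 + b - 3)^2 = 4m^2 + 4mb - 12m + b^2 - 6b + 9 \\
e_3^2 &= (m \cdot 3 + b - 5)^2 = 9m^2 + 6mb - 30m + b^2 - 10b + 25 \\
e_4^2 &= (m \cdot 4 + b - 4)^2 = 16m^2 + 8mb - 32m + b^2 - 8b + 16 \\
E(m,b) &= 30m^2 + 20mb - 76m + 4b^2 - 26b + 51
\end{aligned}
\]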
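The expansion can be double-checked by noting that each squared residual contributes x², 2x, -2xy, 1, -2y, and y² to the coefficients of m², mb, m, b², b, and the constant term, respectively. The sketch below sums those contributions in Python; the helper name `error_coefficients` is an assumption made for illustration.

```python
def error_coefficients(points):
    """Coefficients of E(m, b) in the order (m^2, m*b, m, b^2, b, constant).

    Each term comes from expanding (m*x + b - y)**2 and summing over the data.
    """
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x
    sy = sum(y for _, y in points)        # sum of y
    sxx = sum(x * x for x, _ in points)   # sum of x^2
    sxy = sum(x * y for x, y in points)   # sum of x*y
    syy = sum(y * y for _, y in points)   # sum of y^2
    return (sxx, 2 * sx, -2 * sxy, n, -2 * sy, syy)


data = [(1, 1), (2, 3), (3, 5), (4, 4)]
print(error_coefficients(data))  # (30, 20, -76, 4, -26, 51)
```

The printed tuple matches the coefficients of the expanded sum for the data set in this example.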
The first partial derivatives of $E(m,b)$ are
\[ E_m(m,b) = 60m + 20b - 76 \quad\text{and}\quad E_b(m,b) = 20m + 8b - 26 \]
Thus, the critical points must satisfy

\[
\begin{aligned}
60m + 20b &= 76 \\
20m + 8b &= 26
\end{aligned}
\]

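Multiplying the second equation by $-3$ and adding the result to the first yields

\[
\begin{aligned}
60m + 20b &= 76 \\
-60m - 24b &= -78 \\
0m - 4b &= -2
\end{aligned}
\]
Thus, $b = 0.5$. Substituting $b = 0.5$ into $20m + 8b = 26$ gives $20m = 22$, so $m = 1.1$.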
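The same 2×2 system can be solved mechanically. The sketch below uses Cramer's rule with exact rational arithmetic, which is a different technique from the elimination used in the text; the function name `solve_2x2` is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a11*m + a12*b = r1 and a21*m + a22*b = r2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system has no unique solution")
    m = (r1 * a22 - a12 * r2) / det
    b = (a11 * r2 - a21 * r1) / det
    return m, b


# The critical-point equations 60m + 20b = 76 and 20m + 8b = 26.
m, b = solve_2x2(60, 20, 20, 8, 76, 26)
print(m, b)  # 11/10 1/2
```

Using `Fraction` avoids floating-point rounding, so the result is exactly 11/10 and 1/2, that is, m = 1.1 and b = 0.5.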
        The second partial derivatives of $E(m,b)$ are
\[ E_{mm} = 60, \qquad E_{mb} = 20, \qquad E_{bb} = 8 \]
and as a result, the discriminant is
\[ D = E_{mm} E_{bb} - (E_{mb})^2 = 60 \cdot 8 - (20)^2 = 80 > 0 \]
Since $D > 0$ and $E_{mm} = 60 > 0$, the second derivative test implies that $E(m,b)$ has a minimum at $m = 1.1$ and $b = 0.5$. Thus, the least squares line for the data set (1,1), (2,3), (3,5), and (4,4) is $y = 1.1x + 0.5$.
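As a rough numerical check of the second derivative test, the sketch below recomputes the second partials from the data (for a sum of squared residuals they are 2Σx², 2Σx, and 2n) and confirms that nudging m or b away from the critical point increases the total squared error. The helper names and the step size of 0.1 are assumptions made for illustration.

```python
def second_partials(points):
    """Return (E_mm, E_mb, E_bb, D) for the total squared error of a data set."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    e_mm, e_mb, e_bb = 2 * sxx, 2 * sx, 2 * n
    return e_mm, e_mb, e_bb, e_mm * e_bb - e_mb ** 2


def total_squared_error(m, b, points):
    """Sum of the squared vertical residuals (m*x + b - y)**2."""
    return sum((m * x + b - y) ** 2 for x, y in points)


data = [(1, 1), (2, 3), (3, 5), (4, 4)]
print(second_partials(data))  # (60, 20, 8, 80)

# Compare E at the critical point with E at eight nearby points.
best = total_squared_error(1.1, 0.5, data)
steps = [(dm, db) for dm in (-0.1, 0, 0.1) for db in (-0.1, 0, 0.1)
         if (dm, db) != (0, 0)]
is_minimum = all(total_squared_error(1.1 + dm, 0.5 + db, data) > best
                 for dm, db in steps)
print(is_minimum)  # True
```

Checking a handful of nearby points does not prove that the critical point is a minimum; the discriminant argument does that. It is only a quick sanity check on the arithmetic.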