# Least-Squares Regression in Cost Estimation

Least-squares regression is a statistical technique that may be used to estimate a linear total cost function for a mixed cost, based on past cost data. The cost function may then be used to predict the total cost at a given level of activity, such as the number of units produced or the labor or machine hours used.

Least-squares regression calculates the line of best fit to a set of data pairs, i.e., a series of activity levels and the corresponding total cost at each level. The line is chosen so that the sum of the squares of the vertical distances between the data points and the line is as small as possible, which is where the name "least squares" comes from.

Assuming that cost is plotted on the y-axis and activity level on the x-axis, the required cost line may be represented by the following equation:

y = a + bx

In the above equation, a is the y-intercept of the line and approximates the total fixed cost, which is the same at every level of activity; b is the slope of the line and equals the average variable cost per unit of activity.
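To make the "least squares" criterion concrete, it can be written in terms of a and b. The index notation below, with (x_i, y_i) standing for the i-th of n data pairs and S for the quantity being minimized, is introduced here for illustration only and is not part of the original notation:

$$ S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2 $$

Each term is the squared vertical distance between an observed total cost and the cost predicted by the line at the same activity level; the regression line is the choice of a and b that makes S smallest.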

## Formulas

By using mathematical techniques beyond the scope of this article, the following formulas to calculate a and b may be derived:

$$ \text{Unit Variable Cost } (b) = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

$$ \text{Total Fixed Cost } (a) = \frac{\sum y - b\sum x}{n} $$

Where,
n is the number of data pairs (units and total cost) used in the calculation;
Σy is the sum of the total costs of all data pairs;
Σx is the sum of the units of all data pairs;
Σxy is the sum of the products of units and cost of all data pairs; and
Σx² is the sum of the squares of the units of all data pairs.
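The two formulas translate almost line for line into code. The following is a minimal Python sketch, not a definitive implementation; the function name `fit_cost_line` and its argument names are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def fit_cost_line(units: Sequence[float], costs: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for the cost line y = a + b*x using the summation formulas."""
    n = len(units)
    if n != len(costs):
        raise ValueError("units and costs must have the same length")
    if n < 2:
        raise ValueError("at least two data pairs are required")

    sum_x = sum(units)                                  # Σx
    sum_y = sum(costs)                                  # Σy
    sum_xy = sum(x * y for x, y in zip(units, costs))   # Σxy
    sum_x2 = sum(x * x for x in units)                  # Σx²

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Every activity level is the same, so no slope can be estimated.
        raise ValueError("all activity levels are identical; slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator      # unit variable cost
    a = (sum_y - b * sum_x) / n                         # total fixed cost
    return a, b
```

The explicit check on the denominator is a design choice of this sketch: if every observation has the same activity level, the formula for b divides by zero, and raising an error is clearer than returning a meaningless number.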

The following example, based on the same data as in the high-low method, illustrates the use of the least-squares linear regression method to split a mixed cost into its fixed and variable components.

## Example

Based on the following data on the number of units produced and the corresponding total cost, estimate the total cost of producing 4,000 units using the least-squares linear regression method.

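| Month | Units | Cost (\$) |
|---|---|---|
| 1 | 1,520 | 36,375 |
| 2 | 1,250 | 38,000 |
| 3 | 1,750 | 41,750 |
| 4 | 1,600 | 42,360 |
| 5 | 2,350 | 55,080 |
| 6 | 2,100 | 48,100 |
| 7 | 3,000 | 59,000 |
| 8 | 2,750 | 56,800 |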

Solution:

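| Month | x (units) | y (cost, \$) | x² | xy |
|---|---|---|---|---|
| 1 | 1,520 | 36,375 | 2,310,400 | 55,290,000 |
| 2 | 1,250 | 38,000 | 1,562,500 | 47,500,000 |
| 3 | 1,750 | 41,750 | 3,062,500 | 73,062,500 |
| 4 | 1,600 | 42,360 | 2,560,000 | 67,776,000 |
| 5 | 2,350 | 55,080 | 5,522,500 | 129,438,000 |
| 6 | 2,100 | 48,100 | 4,410,000 | 101,010,000 |
| 7 | 3,000 | 59,000 | 9,000,000 | 177,000,000 |
| 8 | 2,750 | 56,800 | 7,562,500 | 156,200,000 |
| Total | 16,320 | 377,465 | 35,990,400 | 807,276,500 |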

We have,
n = 8;
Σx = 16,320;
Σy = 377,465;
Σx² = 35,990,400; and
Σxy = 807,276,500
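The column totals can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are illustrative, and the data are the eight monthly observations from the example.

```python
# Monthly observations from the example: units produced and total cost ($).
units = [1520, 1250, 1750, 1600, 2350, 2100, 3000, 2750]
costs = [36375, 38000, 41750, 42360, 55080, 48100, 59000, 56800]

n = len(units)
sum_x = sum(units)                                  # Σx
sum_y = sum(costs)                                  # Σy
sum_x2 = sum(x * x for x in units)                  # Σx²
sum_xy = sum(x * y for x, y in zip(units, costs))   # Σxy

print(n, sum_x, sum_y, sum_x2, sum_xy)
# prints: 8 16320 377465 35990400 807276500
```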

Calculating the average variable cost per unit:

$$ b = \frac{8 \times 807{,}276{,}500 - 16{,}320 \times 377{,}465}{8 \times 35{,}990{,}400 - 16{,}320^2} \approx 13.8078 \approx 13.8 $$

Calculating the approximate total fixed cost:

$$ a = \frac{377{,}465 - 13.8078 \times 16{,}320}{8} \approx 19{,}015 $$

The cost-volume formula now becomes:

y = 19,015 + 13.8x

At an activity level of 4,000 units, the estimated total cost is \$74,215 [= 19,015 + 13.8 × 4,000].
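As a cross-check, the whole calculation can be repeated in Python from the five totals. This is a sketch under the example's own figures; the helper name `estimated_total_cost` is hypothetical. It also shows the effect of rounding: the worked answer uses the slope rounded to 13.8, whereas keeping the unrounded coefficients gives an estimate of roughly \$74,246.

```python
# Totals taken from the solution table.
n = 8
sum_x = 16_320
sum_y = 377_465
sum_x2 = 35_990_400
sum_xy = 807_276_500

# Unit variable cost (slope) and total fixed cost (intercept).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def estimated_total_cost(activity, fixed=a, variable=b):
    """Predict total cost at a given activity level from y = a + b*x."""
    return fixed + variable * activity


print(round(b, 4))                          # 13.8078
print(round(a))                             # 19015
print(round(estimated_total_cost(4000)))    # 74246 (unrounded coefficients)
print(round(19_015 + 13.8 * 4_000))         # 74215 (rounded, as in the text)
```

The gap of about \$31 between the two estimates comes entirely from rounding b before multiplying by 4,000 units, so it grows with the activity level being predicted.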