# Least-Squares Regression in Cost Estimation

Least-squares regression is a statistical technique that may be used to estimate a linear total cost function for a mixed cost, based on past cost data. The cost function may then be used to predict the total cost at a given level of activity such as number of units produced or labor/machine hours used.

Least-squares regression mathematically calculates a line of best fit to a set of data pairs i.e. a series of activity levels and corresponding total-cost at each activity level. The calculation involves minimizing the sum of squares of the vertical distances between the data points and the cost function. The name least-squares regression also reflects this proposition, that the ideal fitting of the regression line is achieved by minimizing the sum of squares of the distances between the straight line and all the data points on the graph.

Assuming that the cost varies along y-axis and activity levels along x-axis, the required cost line may be represented in the form of following equation:

y = a + bx

In the above equation, *a* is the y-intercept of the line and it equals the approximate fixed cost at any level of activity. Whereas *b* is the slope of the line and it equals the average variable cost per unit of activity.

## Formulas

By using mathematical techniques beyond the scope of this article, the following formulas to calculate *a* and *b* may be derived:

Unit Variable Cost (b) = | nΣxy − (Σx)(Σy) |

nΣx^{2} − (Σx)^{2} |

Total Fixed Cost (a) = | Σy − bΣx |

n |

Where,

*n* is number of pairs of units–total-cost used in the calculation;

*Σy* is the sum of total costs of all data pairs;

*Σx* is the sum of units of all data pairs;

*Σxy* is the sum of the products of cost and units of all data pairs; and

*Σx ^{2}* is the sum of squares of units of all data pairs.

The following example based on the same data as in high-low method illustrates the usage of least squares linear regression method to split a mixed cost into its fixed and variable components.

## Example

Based on the following data of number of units produced and the corresponding total cost, estimate the total cost of producing 4,000 units. Use the least-squares linear regression method.

Month | Units | Cost |
---|---|---|

1 | 1,520 | $36,375 |

2 | 1,250 | 38,000 |

3 | 1,750 | 41,750 |

4 | 1,600 | 42,360 |

5 | 2,350 | 55,080 |

6 | 2,100 | 48,100 |

7 | 3,000 | 59,000 |

8 | 2,750 | 56,800 |

**Solution:**

Month | x | y | x^{2} | xy |
---|---|---|---|---|

1 | 1,520 | $36,375 | 2,310,400 | 55,290,000 |

2 | 1,250 | 38,000 | 1,562,500 | 47,500,000 |

3 | 1,750 | 41,750 | 3,062,500 | 73,062,500 |

4 | 1,600 | 42,360 | 2,560,000 | 67,776,000 |

5 | 2,350 | 55,080 | 5,522,500 | 129,438,000 |

6 | 2,100 | 48,100 | 4,410,000 | 101,010,000 |

7 | 3,000 | 59,000 | 9,000,000 | 177,000,000 |

8 | 2,750 | 56,800 | 7,562,500 | 156,200,000 |

16,320 | 377,465 | 35,990,400 | 807,276,500 |

We have,

n = 8;

Σx = 16,320;

Σy = 377,465;

Σx^{2} = 35,990,400; and

Σxy = 807,276,500

Calculating the average variable cost per unit:

b = | 8 × 807,276,500 − 16,320 × 377,465 | ≈ 13.8 |

8 × 35,990,400 − 16,320^{2} |

Calculating the approximate total fixed cost:

a = | 377,465 − 13.8078 × 16,320 | ≈ 19,015 |

8 |

The cost-volume formula now becomes:

y = 19,015 + 13.8x

At 4,000 activity level, the estimated total cost is $74,215 [= 19,015 + 13.8 × 4,000].

Written by Irfanullah Jan and last revised on