House Listings

- The descriptive statistics of the data

Statistic | Price | Size | Bedrooms | Baths | Years | FICO | Days |

Mean | $ 357,026.47 | 3440.29 | 3.80 | 2.72 | 8.23 | 708.12 | 29.59 |

Standard Error | $ 15,682.73 | 137.71 | 0.15 | 0.10 | 0.49 | 7.32 | 0.98 |

Median | $ 323,417.00 | 3150.00 | 4.00 | 2.50 | 8.00 | 726.00 | 28.00 |

Mode | #N/A | 3440.00 | 3.00 | 1.50 | 6.00 | 813.00 | 27.00 |

Standard Deviation | $ 160,700.13 | 1411.11 | 1.50 | 1.01 | 5.01 | 75.02 | 10.08 |

Sample Variance | $25,824,533,133.67 | 1991239.34 | 2.26 | 1.03 | 25.14 | 5627.84 | 101.51 |

Kurtosis | 2.37 | 0.56 | -0.20 | -0.24 | -0.20 | -1.35 | 0.62 |

Skewness | 1.53 | 1.03 | 0.66 | 0.59 | 0.75 | -0.17 | 0.42 |

Range | $ 751,518.00 | 6120.00 | 6.00 | 4.00 | 19.00 | 241.00 | 56.00 |

Minimum | $ 167,962.00 | 1550.00 | 2.00 | 1.50 | 1.00 | 583.00 | 2.00 |

Maximum | $ 919,480.00 | 7670.00 | 8.00 | 5.50 | 20.00 | 824.00 | 58.00 |

Sum | $ 37,487,779.00 | 361230.00 | 399.00 | 285.50 | 864.00 | 74353.00 | 3107.00 |

Count | 105.00 | 105.00 | 105.00 | 105.00 | 105.00 | 105.00 | 105.00 |
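Several rows of the table are tied together by simple identities, which gives a quick way to check the Excel output. The sketch below recomputes the Mean, Standard Error, Range, and Sample Variance of the Price column from the other reported rows. The variable names are chosen here for illustration, and small differences from the table are expected because the inputs are rounded to two decimals.

```python
import math

# Values copied from the Price column of the table above.
count = 105
total = 37_487_779.00
std_dev = 160_700.13
minimum, maximum = 167_962.00, 919_480.00

mean = total / count                     # Mean = Sum / Count
std_error = std_dev / math.sqrt(count)   # Standard Error = SD / sqrt(Count)
value_range = maximum - minimum          # Range = Maximum - Minimum
variance = std_dev ** 2                  # Sample Variance = SD squared

print(f"Mean:            {mean:,.2f}")
print(f"Standard Error:  {std_error:,.2f}")
print(f"Range:           {value_range:,.2f}")
print(f"Sample Variance: {variance:,.2f}")
```

The recomputed values agree with the reported rows up to rounding, which suggests the table is internally consistent.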

- The regression model is of the form

Price = β0 + β1(Size) + β2(Township) + ε

In the dataset, Size is the lot size, while Township is the living area. Using Excel, the regression equation is:

From the model, one can note several things. First, the intercept is negative, which is not meaningful, as no home has a lot size of zero and a nonexistent living area. Second, the coefficient of lot size implies that each additional square foot of lot size adds $108.61 to the home price, holding the other variable constant. Third, the coefficient of the living area is not meaningful. Here, R square equals 0.9057, meaning that the model explains 90.57% of the variation in price. One must perform hypothesis tests to assess the significance of the coefficients.

The following tests are performed for each coefficient βi:

H0: βi = 0 against H1: βi ≠ 0

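Rejecting the null hypothesis (that the coefficient equals zero) indicates that the coefficient differs from zero and is thus significant. From Excel, the p-values of the intercept, the Size coefficient, and the Township coefficient are 0.5162, 8.84E-54, and 0.62078, respectively. If the p-value is below 0.05, reject the null hypothesis and conclude that the coefficient is significant. Thus, the Size coefficient is significant, while the intercept and the Township coefficient are not.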
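The decision rule can be written out as a small sketch. The p-values are the ones reported from Excel. Their assignment to the intercept, Size, and Township (in that order) is an assumption inferred from the conclusion that only Size is significant, and the function name is illustrative.

```python
ALPHA = 0.05  # significance level used in the text

# Reported p-values; the order (intercept, Size, Township) is an assumption.
p_values = {"Intercept": 0.5162, "Size": 8.84e-54, "Township": 0.62078}

def is_significant(p_value, alpha=ALPHA):
    """Reject the null hypothesis (coefficient = 0) when p-value < alpha."""
    return p_value < alpha

decisions = {term: is_significant(p) for term, p in p_values.items()}

for term, significant in decisions.items():
    verdict = "reject H0 (significant)" if significant else "fail to reject H0"
    print(f"{term}: p = {p_values[term]:.4g} -> {verdict}")
```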

While testing for multicollinearity, one can use the variance inflation factor (VIF). After regressing Size against Township,

The variance inflation factor of the Size variable is less than 10, indicating only mild variance inflation. Multicollinearity is therefore not a problem here.
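For reference, the VIF of a predictor is computed from the R square of the auxiliary regression of that predictor on the other predictors, as VIF = 1 / (1 - R square). The sketch below implements this formula. The R square values passed to it are hypothetical placeholders, since the auxiliary R square from Excel is not reproduced here, and the threshold of 10 is the rule of thumb used in the text.

```python
def variance_inflation_factor(aux_r_squared):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one
    predictor on the remaining predictors."""
    if not 0.0 <= aux_r_squared < 1.0:
        raise ValueError("auxiliary R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - aux_r_squared)

VIF_THRESHOLD = 10.0  # rule of thumb used in the text

# Hypothetical auxiliary R^2 values, for illustration only.
for r2 in (0.00, 0.50, 0.95):
    vif = variance_inflation_factor(r2)
    flag = "problematic" if vif >= VIF_THRESHOLD else "acceptable"
    print(f"auxiliary R^2 = {r2:.2f} -> VIF = {vif:.2f} ({flag})")
```

Under this rule, the VIF only reaches 10 once the auxiliary R square is 0.90 or higher, so weakly related predictors stay well below the threshold.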

As the coefficient of the Township variable is not significant, include only the Size variable, thus:

R square equals 0.9054, indicating that the model explains 90.54% of variation.

- One can introduce a dummy variable to show the effect of having a pool, where 1 indicates that there is a pool and 0 indicates that there is none.

Thus, the model is of the form:

Price = β0 + β1(Size) + β2(Township) + β3(Pool) + ε

Excel produces the model as:

Again, the intercept is negative, which is not meaningful. To check whether the coefficients are significant, repeat the procedure above.

The p-values for the intercept, Size, Township, and Pool are 0.1353, 3.09E-55, 0.2541, and 0.002403, respectively. At the 0.05 level, the Size and Pool coefficients are significant, while the others are not.

Also, to check for multicollinearity, regress Size against Pool, and Township against Pool.

In both, VIF<10, showing that multicollinearity is not a problem.

However, after excluding the Township variable, the model is of the form:

The model explains 91.28% of the variation.
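Because R square cannot decrease when a predictor is added, the adjusted R square is a fairer basis for comparing the Size-only model (R square 0.9054) with the Size-and-Pool model (R square 0.9128). The sketch below applies the standard adjustment, taking n = 105 from the Count row of the table. The predictor counts (1 and 2) are assumptions based on the variables named in the text, and the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

N_OBS = 105  # Count row of the descriptive statistics table

# Assumed predictor counts: Size only (k = 1); Size and Pool (k = 2).
adj_size_only = adjusted_r_squared(0.9054, N_OBS, n_predictors=1)
adj_size_pool = adjusted_r_squared(0.9128, N_OBS, n_predictors=2)

print(f"Size only:     adjusted R^2 = {adj_size_only:.4f}")
print(f"Size and Pool: adjusted R^2 = {adj_size_pool:.4f}")
```

Under these assumptions the adjusted value is still higher for the model with the Pool dummy, so the gain is not only an artifact of adding a variable.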

- The inclusion of the dummy variable slightly improves the coefficient of determination, from 0.9054 to 0.9128. This improvement might result from confounding variables.
- If, for example, an individual lives in a 4000 SqFt home in Town 3,

If there is a pool, then,

If there is no pool, then,

As expected, a home with a pool is more expensive than one without a pool.
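The comparison can be sketched in code for the model that keeps only Size and Pool. The coefficients below are hypothetical placeholders and are not the Excel estimates, which are not reproduced here. The point is structural: for a fixed size, the predicted prices with and without a pool differ by exactly the Pool coefficient.

```python
def predict_price(size_sqft, has_pool, intercept, size_coef, pool_coef):
    """Predicted price from a model of the form
    Price = intercept + size_coef * Size + pool_coef * Pool."""
    pool = 1 if has_pool else 0  # dummy variable: 1 = pool, 0 = no pool
    return intercept + size_coef * size_sqft + pool_coef * pool

# HYPOTHETICAL coefficients, for illustration only (not the fitted values).
INTERCEPT, SIZE_COEF, POOL_COEF = -10_000.0, 100.0, 20_000.0

with_pool = predict_price(4000, True, INTERCEPT, SIZE_COEF, POOL_COEF)
without_pool = predict_price(4000, False, INTERCEPT, SIZE_COEF, POOL_COEF)
pool_premium = with_pool - without_pool

print(f"With pool:    {with_pool:,.2f}")
print(f"Without pool: {without_pool:,.2f}")
print(f"Difference:   {pool_premium:,.2f}")
```

Substituting the fitted coefficients for the placeholders would reproduce the two predictions for the 4000 SqFt home.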