Call:
lm(formula = total_pr ~ cond + stock_photo + duration + wheels,
data = mariokart)
Residuals:
Min 1Q Median 3Q Max
-19.485 -6.511 -2.530 1.836 263.025
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.5201 8.3701 5.199 7.05e-07 ***
condused -2.5816 5.2272 -0.494 0.622183
stock_photoyes -6.7542 5.1729 -1.306 0.193836
duration 0.3788 0.9388 0.403 0.687206
wheels 9.9476 2.7184 3.659 0.000359 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 24.4 on 138 degrees of freedom
Multiple R-squared: 0.1235, Adjusted R-squared: 0.09808
F-statistic: 4.86 on 4 and 138 DF, p-value: 0.001069
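The output above appears to come from `summary()` on a model fit like the following. This is a sketch; it assumes the `mariokart` data frame comes from the openintro package, which matches the variable names shown.

```r
# Sketch: refit the model summarized above.
# Assumes the mariokart data frame from the openintro package.
library(openintro)

m <- lm(total_pr ~ cond + stock_photo + duration + wheels,
        data = mariokart)
summary(m)
```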
plot(m)
Residuals vs Fitted
This plot shows the residuals plotted against the fitted values. There are three things to watch for here. First, are there any drastic outliers? Yes, there are two, points 65 and 20. (Those are row numbers in the data frame.) You need to investigate those and decide whether to omit them from further analysis. Were they typos? Mismeasurements? Or is the process from which they derive intrinsically subject to occasional extreme variation? In the third case, you probably don’t want to omit them.
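One way to act on that advice is to inspect the flagged rows directly and, only if investigation justifies it, refit without them. A sketch, using the row numbers 20 and 65 from the plot labels:

```r
# Look at the two flagged observations (rows 20 and 65)
mariokart[c(20, 65), ]

# Refit without them only if investigation shows they are data errors,
# not legitimate extreme observations
m2 <- lm(total_pr ~ cond + stock_photo + duration + wheels,
         data = mariokart[-c(20, 65), ])
summary(m2)
```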
Second, is the solid red line near the dashed zero line? Yes, it is, indicating that the residuals have a mean of approximately zero. (The red line shows the mean of the residuals in the immediate region of the \(x\)-values of the observed data.)
Third, is there a pattern to the residuals? No, there is not. The residuals appear to be of the same general magnitude at one end as at the other. Patterns that would call for action include a curve or multiple curves, or a widening or narrowing shape, like the cross section of a horn.
plot(m, which = 1)
Normal Q-Q
This is an important plot. I see many students erroneously claiming that residuals are normally distributed because their histogram has a vague bell shape. That is not good enough. The Q-Q plot is the standard way to assess normality. If the points lie along the dashed line, you can be reasonably safe in an assumption of normality. If they deviate substantially from the dashed line, the residuals are probably not normally distributed.
plot(m, which = 2)
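A formal test can complement the visual check. Base R’s Shapiro-Wilk test is one option; a sketch, meant to be used alongside the Q-Q plot rather than instead of it:

```r
# Shapiro-Wilk test of the residuals; a small p-value suggests
# a departure from normality
shapiro.test(residuals(m))
```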
Scale-Location
Look for two things here. First, the red line should be approximately horizontal, indicating that the average magnitude of the standardized residuals is roughly constant across the range of fitted values. Second, look at the spread of the points around the red line. If they show no pattern, this reinforces the assumption of homoscedasticity that we already found evidence for in the first plot.
plot(m, which = 3)
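The homoscedasticity judgment can also be checked with a formal test. A sketch using the Breusch-Pagan test, assuming the lmtest package is available:

```r
# Breusch-Pagan test for non-constant variance; assumes the lmtest
# package. A small p-value suggests heteroscedasticity.
library(lmtest)
bptest(m)
```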
plot(m, which = 4)
Residuals vs Leverage
This shows you influential points that you may want to remove. Point 84 has high leverage (potential for influence) but is probably not actually very influential because it falls well inside the Cook’s Distance contours. Points 20 and 65 are outliers, but only point 20 lies beyond the Cook’s Distance contour. In this case, you would likely remove point 20 from consideration unless there were a mitigating reason. For example, game collectors often pay extra for a game that has unusual attributes, such as a shrink-wrapped original edition. As an example of a point you would definitely remove, draw a horizontal line from point 20 and a vertical line from point 84. Where they meet would be a high-leverage outlier that is unduly affecting the model, no matter what its underlying cause. On the other hand, what if you have many such points? Unfortunately, that probably means the model isn’t very good.
plot(m, which = 5)
plot(m, which = 6)
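The contour judgments above can also be checked numerically. A sketch; the 4/n cutoff used here is one common rule of thumb, not the only one:

```r
# Cook's distance for every observation, largest first
cd <- cooks.distance(m)
head(sort(cd, decreasing = TRUE))

# One common rule of thumb flags points with distance > 4/n
which(cd > 4 / nrow(mariokart))
```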
#. If you render with the following,
#. you get an error message saying
#. "'which' must be in 1:6"
#. plot(m, which = 7)
Best Subsets Regression
--------------------------------------------------------
Model Index Predictors
--------------------------------------------------------
1 carat
2 carat clarity
3 carat color clarity
4 carat color clarity x
5 carat cut color clarity x
6 carat cut color clarity depth x
7 carat cut color clarity depth table x
8 carat cut color clarity depth table x z
9 carat cut color clarity depth table x y z
--------------------------------------------------------
Subsets Regression Summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Adj. Pred
Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
--------------------------------------------------------------------------------------------------------------------------------------------------------------
1 0.8493 0.8493 0.8493 47343.7252 945466.5323 792389.1367 945493.2192 129350491485.6166 2398132.8805 44.4601 0.1507
2 0.8949 0.8948 0.8948 16756.0962 926075.5885 772986.9665 926164.5448 90268965134.5993 1673786.2236 31.0311 0.1052
3 0.9140 0.9139 0.9139 3925.0856 915270.4554 762172.8186 915412.7855 73867441333.4421 1369818.0969 25.3957 0.0861
4 0.9171 0.9171 0.917 1786.5467 913238.1961 760140.7902 913389.4218 71134846525.0090 1319168.5676 24.4567 0.0829
5 0.9194 0.9194 0.9192 292.5281 911771.5199 758668.3749 911958.3280 69217703948.1984 1283711.1090 23.7993 0.0806
6 0.9197 0.9196 0.9195 102.9375 911582.4842 758479.3794 911778.1880 68974272750.4518 1279220.1538 23.7161 0.0804
7 0.9198 0.9198 0.9196 22.3316 911501.9083 758398.8257 911706.5077 68870038819.1149 1277310.6785 23.6807 0.0802
8 0.9198 0.9198 0.9196 22.2470 911501.8229 758398.7418 911715.3179 68868653231.6045 1277308.6624 23.6806 0.0802
9 0.9198 0.9198 0.9196 24.0000 911503.5757 758400.4957 911725.9664 68869614710.3862 1277350.1772 23.6814 0.0802
--------------------------------------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
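The tables above resemble output from olsrr’s best-subsets routine applied to the diamonds data. A sketch under those assumptions; the predictor names match ggplot2’s diamonds data frame:

```r
# Sketch: best-subsets search over the nine diamond predictors.
# Assumes the olsrr package and ggplot2's diamonds data.
library(olsrr)
library(ggplot2)

full <- lm(price ~ carat + cut + color + clarity + depth + table + x + y + z,
           data = diamonds)
best <- ols_step_best_subset(full)
best
```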
Look for elbows in the plots of the metrics to find the best model.
plot(best)
My impression is that the best model has three variables: carat, color, and clarity. This is based entirely on diminishing returns of adding more variables.