Thursday, April 4, 2013

EVO Data Continued

My previous post attempted to see whether higher revving cars are more likely to score higher on EVO tests (and therefore add some quantitative justification to my suggestion that higher revving cars are more fun). I used an ordered logit model because of the categorical nature of the EVO 5 star rating system. Here I will simplify the analysis by using a plain logit model and combining the 5 star ratings into 2 groups. This also reduces the assumptions involved (e.g., the ordered logit's assumption that the coefficients are the same across all rating levels).

First, I separated the 5 star data into two groups: those rated 4.5 or 5, and those rated below 4.5 (4, 3.5, 3, 2.5). One benefit of this split is that the two groups are roughly equal in size: 134 (46.9%) at >=4.5 and 152 (53.1%) at <4.5. These results are shown in Model 1.

Then, I separated the EVO star data by whether or not the car had the coveted 5 stars. There are 63 (22%) observations with a 5 star rating and 223 (78%) without. These results are shown in Model 2.
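For anyone who wants to reproduce the two cutoffs, here is a minimal sketch in Python. The group counts come from the figures above (63 five star cars, 134 at 4.5 or better, 286 in total). The ratings list itself is a stand-in, not the real data: the breakdown of the 152 lower ratings isn't given here, so they are all entered as 4.0 purely as a placeholder.

```python
# Counts are taken from the post. The split among the 152 ratings below 4.5
# is not given, so they are all entered as 4.0 purely as a placeholder.
ratings = [5.0] * 63 + [4.5] * 71 + [4.0] * 152


def binarize(values, cutoff):
    """Return 1 for ratings at or above the cutoff, otherwise 0."""
    return [1 if v >= cutoff else 0 for v in values]


def share(indicator):
    """Percentage of observations flagged 1."""
    return 100.0 * sum(indicator) / len(indicator)


model1_y = binarize(ratings, 4.5)  # response for Model 1 (>= 4.5 stars)
model2_y = binarize(ratings, 5.0)  # response for Model 2 (5 stars only)

print(f"n = {len(ratings)}")
print(f">= 4.5 stars: {sum(model1_y)} ({share(model1_y):.1f}%)")
print(f"5 stars:      {sum(model2_y)} ({share(model2_y):.1f}%)")
```

The two indicator lists are what a logit model would take as its response variable under each cutoff.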

The conclusions are fairly similar to my previous findings: higher revving, low torque engines make for more EVO worthy cars.

Greater than or equal to 4.5 EVO Rating Cutoff Point Model

Below is a pairwise correlation plot of the variables. I've included an indicator variable, which was not in my previous analysis, for whether the car is naturally aspirated or has some sort of forced induction. All the variables showed some level of right skewness and therefore all variables were log transformed.


Price is slightly skewed to the right. This makes sense given that we have a few very high priced cars (max was 2 million pounds). Engine size (CubicCenti) has a bimodal distribution because of the high number of cars that have 2.0 liter engines. Interestingly, the RPM where max LBFT is achieved (ftrpm) is also bimodal. The cause of this bimodal distribution appears to be whether the engine has forced induction (turbocharged or supercharged). One can see (at the intersection of ftrpm and Induction) that naturally aspirated engines reach maximum torque at a higher RPM. Finally, Weight is skewed to the left. This is due to EVO testing a large number of track cars (Morgans, Caterhams, etc.) that are very lightweight. (I was thinking of removing these light track cars from the analysis because these aren't REALLY cars. But then I had the good luck and privilege of having a ride in a Morgan three wheeler at an event. Being a passenger in the car while the driver was drifting in downtown Manhattan convinced me to leave these wonderful cars in the analysis.)

To fit a model, I included all variables in a logistic regression and used a stepwise regression technique with an AIC criterion. The resulting model is below.

Coefficients:   Estimate   Std. Error   Z Score   P Value
(Intercept)       -28.51        13.46     -2.12      0.03
Price.1             1.33         0.40      3.34      0.00
hprpm               3.26         1.34      2.43      0.02
lbft               -1.76         0.67     -2.62      0.01
X0.60mph           -2.84         0.93     -3.04      0.00

Price and the RPM where maximum horsepower is achieved (hprpm) are positively associated with EVO score. Interestingly, torque is negatively associated with EVO score. Finally, as we would expect, 0-60 time is negatively associated with EVO score (a lower 0-60 time means it's a more "EVO appropriate" car). This model correctly classifies 75.5% of the observations.

That torque comes in negative and HP RPM comes in positive both suggest that higher revving, low torque cars are more EVO worthy cars. 
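To get a feel for the size of these effects, here is a rough sketch that turns the Model 1 coefficients into odds multipliers. It assumes the predictors were transformed with the natural log (the base of the log isn't stated above, so treat that as an assumption). Under that assumption, raising a predictor by p percent multiplies the odds of a 4.5+ rating by (1 + p/100) raised to the coefficient, holding the other variables fixed. The helper name odds_multiplier is just for illustration.

```python
import math

# Model 1 coefficients as reported in the table above.
coefs = {"Price.1": 1.33, "hprpm": 3.26, "lbft": -1.76, "X0.60mph": -2.84}


def odds_multiplier(coef, pct_increase):
    """Factor by which the odds change when the raw predictor rises by
    pct_increase percent, ASSUMING the predictor entered the model as a
    natural log."""
    return math.exp(coef * math.log(1 + pct_increase / 100.0))


for name, b in coefs.items():
    print(f"{name:10s} +10% -> odds x {odds_multiplier(b, 10):.2f}")
```

A multiplier above 1 means the odds of clearing the 4.5 star cutoff go up, and below 1 means they go down, so hprpm and lbft pull in opposite directions, as described above.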

5 Star EVO Model

Below is a pairwise plot for the variables. The plots on the bottom row and the rightmost column are the only difference between the graph below and the graph above.


All of the comments on Graph 1 apply to Graph 2. The relationships between this cutoff and the other variables appear to be similar to those between the first cutoff and the other variables. The same method of stepwise logistic regression was used to fit the model shown below.

Coefficients:   Estimate   Std. Error   Z Score    P Value
(Intercept)     -10.4832        5.586    -1.877   0.060561
Price.1           0.4135       0.2614     1.582    0.11369
ftrpm             1.4126       0.4151     3.403   0.000666
X0.60mph         -4.7284       1.0398    -4.547   5.43E-06

Price and the RPM where torque achieves its maximum (ftrpm) are both positively associated with 5 star EVO cars, and 0-60 time is negatively associated. While torque is not statistically significant in this model, ftrpm is. Therefore, similar conclusions are reached about my original hypothesis, but not as strongly as in Model 1. Model 2 correctly classifies 86.5% of the observations.
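One way to put the two classification rates in context is to compare each against the accuracy of simply guessing the larger group every time. The sketch below does that arithmetic using only the group counts and accuracy figures reported in this post. The comparison itself is an addition for illustration, not part of the fitted models.

```python
def baseline_accuracy(n_positive, n_negative):
    """Accuracy (in percent) of always predicting the larger class."""
    return 100.0 * max(n_positive, n_negative) / (n_positive + n_negative)


# Group sizes and reported accuracies are the figures quoted in the post.
models = {
    "Model 1 (>= 4.5 stars)": {"pos": 134, "neg": 152, "reported": 75.5},
    "Model 2 (5 stars)": {"pos": 63, "neg": 223, "reported": 86.5},
}

for name, m in models.items():
    base = baseline_accuracy(m["pos"], m["neg"])
    gain = m["reported"] - base
    print(f"{name}: baseline {base:.1f}%, reported {m['reported']:.1f}%, "
          f"gain {gain:.1f} points")
```

Because the 5 star group is small, Model 2 starts from a much higher baseline, so its higher raw accuracy does not by itself mean it separates the cars better than Model 1.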

Both models suggest that higher revving cars are more likely to achieve a higher EVO score. Torque is negative in the first model (controlling for other factors), which adds evidence that high torque isn't obviously beneficial. However, torque does not come in as statistically significant in the second model, so the evidence there is less conclusive, even though higher RPM cars still have higher EVO scores. While HP RPM is used in Model 1, LBFT RPM is used in Model 2. I don't think this is a big deal since they are highly correlated (correlation coefficient is 0.745), and it's not surprising that one doesn't come in statistically significant when the other does.

