Tuesday, April 23, 2013

Graphing Car Sales

Autoblog.com's "By the Numbers" feature reports monthly car sales for almost every manufacturer. It started in 2005 and continues today. I gathered the data into a spreadsheet, which you can see here. My main objective was to group manufacturers into clusters and see which companies are directly related. For example, is the monthly change in car sales for Acura more similar to that of Honda, its parent company, or to that of BMW and Mercedes? If it were more similar to BMW, one could conclude that Acura is a 'true' luxury brand. Using monthly car sales is an indirect way of getting at this. That's my original intention. However, I may come up with a few more questions of my own when looking at the data...
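One way to make "more similar" concrete is to correlate month-over-month percentage changes between brands. The sketch below is only an illustration of that idea: the sales figures are made up, and the helper names `pct_change` and `pearson` are my own, not part of any analysis in this post.

```python
import math

def pct_change(series):
    """Month-over-month fractional change for a list of monthly sales."""
    return [curr / prev - 1.0 for prev, curr in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly sales (thousands of units), NOT the Autoblog data.
acura = [10, 12, 11, 13, 12]
honda = [100, 118, 111, 128, 121]
bmw = [20, 21, 25, 22, 24]

acura_vs_honda = pearson(pct_change(acura), pct_change(honda))
acura_vs_bmw = pearson(pct_change(acura), pct_change(bmw))
print("Acura vs Honda:", round(acura_vs_honda, 3))
print("Acura vs BMW:  ", round(acura_vs_bmw, 3))
```

With real data, a matrix of these pairwise correlations could feed a clustering method; which method suits this data best is an open question here.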

This post is really just exploratory data analysis and focuses on plotting manufacturers. I've grouped the manufacturers based on my own knowledge of which ones I think should be considered similar.

Japan Big 3

Here we have Japan's big 3 auto manufacturers: Honda, Toyota, and Nissan. We see a general decrease in car sales in the second half of 2008, and they have yet to recover. The surprisingly large number of cars sold in summer 2009 was a result of Cash for Clunkers. I wonder if US manufacturers had a large increase like that too...
The US Big 3. The same general decline until 2009, then a steady increase after. There is no big increase around the Cash for Clunkers dates like the one for the Japanese manufacturers.
Suzuki dying is the only thing I find sad about this graph.
Mitsubishi looks dead compared to the jagged ups and downs of Mazda and Subaru. I wonder how long Mitsubishi can stay around.
Obviously Kia and Hyundai should be graphed together. I included VW since it looked similar enough and has been targeting the US market specifically with a cheaper (poorer quality) Passat. None of the three has been significantly affected by the general decline due to the economy, and they've all increased since 2009.
Lexus is the biggest Japanese luxury brand, while Acura and Infiniti are far behind.
With all

Car and Driver has Terrible Infographics

Take a look at this infographic.
The data include car sales of Ferrari, Lamborghini, and Maserati in Italy for the years 2010, 2011, and 2012. This is obviously time series data, but the graphic designer thought for some reason that the graphic should be a bar chart bent into a half circle, with half of the picture upside down. Why not just have something like this?
I know my graphic has half years (which don't mean anything) and the labels for the car manufacturers are cut off. I still think it's a more informative graphic than the one C and D produced. What do you think?

Thursday, April 4, 2013

EVO Data Continued

My previous post attempted to see whether higher revving cars are more likely to score higher on EVO tests (and therefore add some quantitative justification to my suggestion that higher revving cars are more fun). I used an ordered logit model because of the categorical nature of the EVO 5 star rating system. Here I will use a plain logit model and combine the 5 star ratings into 2 categories. This should simplify the analysis and the assumptions involved (e.g., that the coefficients are the same at every rating threshold of the ordered logit model).

First, I separated the 5 star data into two groups: those rated 4.5 or 5, and those rated below 4.5 (4, 3.5, 3, 2.5). One benefit of separating the data this way is that the groups are roughly equal in size: 134 (46.9%) for those >=4.5 and 152 (53.1%) for those <4.5. These results are shown in Model 1.

Then, I separated the EVO star data by whether the car had the coveted 5 stars or not. There are 63 (22%) observations with a 5 star rating and 223 (78%) without. These results are shown in Model 2.
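The two splits amount to turning the star rating into a 0/1 outcome at a chosen cutoff. Here is a minimal sketch of that step; the ten ratings in `sample` are invented for illustration, and `dichotomize` and `share` are hypothetical helper names rather than anything from my actual code.

```python
def dichotomize(ratings, cutoff):
    """Return 1 for each rating at or above the cutoff, else 0."""
    return [1 if r >= cutoff else 0 for r in ratings]

def share(flags):
    """Fraction of observations flagged 1."""
    return sum(flags) / len(flags)

# Hypothetical EVO star ratings, NOT the real 286 observations.
sample = [5, 4.5, 4, 3.5, 5, 3, 4.5, 2.5, 4, 4]

model1_outcome = dichotomize(sample, 4.5)  # Model 1: rating >= 4.5
model2_outcome = dichotomize(sample, 5)    # Model 2: rating == 5
print(model1_outcome, share(model1_outcome))
print(model2_outcome, share(model2_outcome))

# Group shares recomputed from the counts quoted in the post.
n_total = 134 + 152
print(round(100 * 134 / n_total, 1))  # share with >= 4.5 stars
print(round(100 * 63 / n_total, 1))   # share with 5 stars
```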

The conclusions are fairly similar to my previous findings, that higher revving, low torque engines are more EVO worthy cars.

Greater Than or Equal to 4.5 EVO Rating Cutoff Point Model

Below is a pairwise correlation plot of the variables. I've included an indicator variable, which was not in my previous analysis, for whether the car is naturally aspirated or has some sort of forced induction. All the variables showed some level of right skewness, and therefore all variables were log transformed.
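To show why a log transform helps with right skew, here is a small sketch that compares sample skewness before and after taking logs. The price list is invented (only the 2 million pound maximum echoes the real data), natural logs are an assumption, and `skewness` is a hand-rolled helper using the simple moment-based definition.

```python
import math

def skewness(xs):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical prices in pounds, with one very expensive car in the tail.
prices = [15000, 22000, 30000, 45000, 60000, 90000, 150000, 2000000]

raw_skew = skewness(prices)
log_skew = skewness([math.log(p) for p in prices])  # natural log assumed
print("skewness of raw prices:", round(raw_skew, 2))
print("skewness of log prices:", round(log_skew, 2))
```

The log pulls the long right tail in, so the transformed variable is less skewed than the raw one, though it need not become symmetric.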


Price is slightly skewed to the right. This makes sense given that we have a few very high priced cars (the max was 2 million pounds). Engine size (CubicCenti) has a bimodal distribution because of the high number of cars that have 2.0 liter engines. Interestingly, the RPM where max LBFT is achieved (ftrpm) is also bimodal. This appears to be due to whether the engine has forced induction (turbocharged or supercharged). One can see (at the intersection of ftrpm and Induction) that the naturally aspirated cars reach maximum torque at a higher RPM. Finally, Weight is skewed to the left. This is due to EVO testing a large number of track cars (Morgans, Caterhams, etc.) that are very lightweight. (I was thinking of removing these light track cars from the analysis because these aren't REALLY cars. But then I had the good luck and privilege of having a ride in a Morgan three wheeler at a jalopnik.com event. Being a passenger in the car while the driver was drifting in downtown Manhattan convinced me to leave these wonderful cars in the analysis.)

To fit a model, I included all variables in a logistic regression and used a stepwise regression technique with an AIC criterion. The resulting model is below.

Coefficients:   Estimate   Std. Error   Z Score   P Value
(Intercept)       -28.51        13.46     -2.12      0.03
Price.1             1.33         0.40      3.34      0.00
hprpm               3.26         1.34      2.43      0.02
lbft               -1.76         0.67     -2.62      0.01
X0.60mph           -2.84         0.93     -3.04      0.00

Price and the RPM where maximum horsepower is achieved (hprpm) are positively associated with EVO score. Interestingly, torque is negatively associated with EVO score. Finally, as we would expect, 0-60 time is negatively associated with EVO score (a lower 0-60 time means it's a more "EVO appropriate" car). This model correctly classifies 75.5% of the observations.

That torque comes in negative and HP RPM comes in positive both suggest that higher revving, low torque cars are more EVO worthy cars. 
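To get a feel for the size of these effects, the coefficients can be turned into odds multipliers. Because the predictors were log transformed, scaling a predictor by some ratio multiplies the odds of a high rating by ratio^beta, holding the others fixed. The sketch below assumes natural logs were used (my assumption for this illustration), and `odds_multiplier` is a hypothetical helper name.

```python
import math

# Model 1 estimates as reported in the coefficient table.
coef = {"Price.1": 1.33, "hprpm": 3.26, "lbft": -1.76, "X0.60mph": -2.84}

def odds_multiplier(beta, ratio):
    """Change in odds when a natural-log-transformed predictor is
    multiplied by `ratio`: exp(beta * ln(ratio)) = ratio ** beta."""
    return math.exp(beta * math.log(ratio))

# Effect of a 10% increase in each predictor, holding the others fixed.
for name, beta in coef.items():
    print(name, round(odds_multiplier(beta, 1.10), 2))
```

Under that assumption, a 10% higher horsepower peak RPM multiplies the odds of a 4.5+ rating by roughly 1.36, while 10% more torque multiplies them by roughly 0.85.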

5 Star EVO Model

Below is a pairwise plot for the variables. The plots in the bottom row and the rightmost column are the only difference between the graph below and the graph above.


All of the comments on Graph 1 apply to Graph 2. The relationships between this cutoff method and the other variables appear to be similar to those for the first cutoff method. The same method of stepwise logistic regression was used to fit the model shown below.

Coefficients:   Estimate   Std. Error   Z Score   P Value
(Intercept)     -10.4832       5.586     -1.877   0.060561
Price.1           0.4135       0.2614     1.582   0.11369
ftrpm             1.4126       0.4151     3.403   0.000666
X0.60mph         -4.7284       1.0398    -4.547   5.43E-06

Price and the RPM where torque achieves its maximum (ftrpm) are both positively associated with 5 star EVO cars, although price is not statistically significant at the 5% level (p = 0.11), and 0-60 time is negatively associated. Torque itself was not retained by the stepwise procedure in this model, but ftrpm was. Therefore, similar conclusions are reached about my original hypothesis, but not as strongly as in Model 1. Model 2 correctly classifies 86.5% of the observations.
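The two classification rates are easier to compare against a naive baseline that always predicts the larger group. The sketch below computes that baseline from the group counts given earlier in this post; `majority_baseline` is a hypothetical helper name, and the comparison is a rough sanity check, not a formal test.

```python
def majority_baseline(n_positive, n_negative):
    """Accuracy of always predicting the larger of the two groups."""
    return max(n_positive, n_negative) / (n_positive + n_negative)

# Group counts and classification rates as quoted in the post.
model1_base = majority_baseline(134, 152)  # >= 4.5 stars vs below
model2_base = majority_baseline(63, 223)   # 5 stars vs not
reported = {"Model 1": 0.755, "Model 2": 0.865}

for name, base in (("Model 1", model1_base), ("Model 2", model2_base)):
    gain = reported[name] - base
    print(name, "baseline", round(base, 3), "gain", round(gain, 3))
```

Seen this way, Model 1 improves on its baseline by about 22 percentage points and Model 2 by about 8.5, so the higher raw rate for Model 2 owes a lot to the groups being unbalanced.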

Both models suggest that higher revving cars are more likely to achieve a higher EVO score. Torque is negative in the first model (controlling for other factors), which adds evidence that high torque isn't obviously beneficial. However, the second model does not retain torque, so it is less conclusive on that point. While HP RPM is used in Model 1, torque RPM (ftrpm) is used in Model 2. I don't think this is a big deal since they are highly correlated (the correlation coefficient is 0.745), and it's not surprising that one doesn't come in statistically significant when the other does.


Monday, April 1, 2013

Horsepower/Liter vs Max RPM

It's clear that the higher the max RPM, the more HP should follow. However, what's the relationship between HP/liter and max RPM? Below is a graph using EVO data. I marked whether each car has some sort of forced induction or is naturally aspirated.

From the graph, there are two separate relationships, one for naturally aspirated engines and one for forced induction engines. Naturally aspirated engines have to rev much higher to achieve the same HP/liter ratio. In addition, the naturally aspirated observations have a much more linear relationship than their forced induction counterparts.
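For anyone who wants to reproduce this kind of comparison, here is a rough sketch of computing HP/liter and fitting a separate straight line for each induction type. The six cars are invented for illustration, the field layout is my own, and ordinary least squares is just one reasonable way to summarize each group.

```python
def hp_per_liter(hp, cubic_centimeters):
    """Specific output: horsepower divided by displacement in liters."""
    return hp / (cubic_centimeters / 1000.0)

def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical cars: (induction, max RPM, horsepower, displacement in cc).
cars = [
    ("NA", 6500, 170, 2000),
    ("NA", 7500, 220, 2200),
    ("NA", 8500, 480, 4000),
    ("Forced", 6000, 250, 2000),
    ("Forced", 6500, 300, 2000),
    ("Forced", 7000, 560, 3800),
]

fits = {}
for group in ("NA", "Forced"):
    rows = [c for c in cars if c[0] == group]
    rpm = [c[1] for c in rows]
    hpl = [hp_per_liter(c[2], c[3]) for c in rows]
    fits[group] = fit_line(rpm, hpl)
    print(group, "slope per 1000 RPM:", round(fits[group][0] * 1000, 1))
```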

With all that's being said about the death of manual transmissions and the increasing ubiquity of electric steering, I'm surprised people aren't talking about forced induction engines taking over from naturally aspirated ones. Higher revving engines can be more fun (as I claimed in my last post). And if turbos can't give us the same stratospheric rev range that natural aspiration can, should we shun the takeover of the turbo as well?

Rough Draft: Is Easy, Abundant Torque Boring?

EVO Data. Are High Torque Cars Boring?

Here is a rough draft of what I'm working on: Are Torquey Engines Boring?, using EVO data.

Rough Draft:

Better Graph:


R Code: