
Python

Correlation & model analysis

In the following example, we take the role of an analyst tasked with studying the automobile market. The goal is to find correlations between a vehicle's final price and its characteristics, so that future vehicle releases can be priced competitively against the current market.

Note that only some parts of the final Jupyter notebook are shown, to keep the focus on the most relevant elements. Feel free to contact me if you would like the full file.

The dataset used here is available on IBM Skills Network.

After cleaning the data and completing the first exploration steps, we can check for the correlations we expect to be relevant to our analysis.

For these visualizations, we use matplotlib and seaborn plots to quickly see whether any correlations stand out.

python2.5.png
python2.3.png
python2.6.png

From these plots, we can see that:

-There is a visible positive correlation between engine size and price.

-There is a negative correlation between highway mpg and price.

-The sedan and wagon body styles have high-price outliers.

-The convertible and hardtop body styles have a large interquartile range.
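The outlier and interquartile-range observations above come from box plots. As a minimal sketch (standard library only, with made-up prices rather than the IBM data), the quantities a default box plot draws can be computed directly: the quartiles, the IQR, and the points lying more than 1.5 * IQR outside the box, which is the default whisker rule in matplotlib and seaborn. The `box_stats` helper is hypothetical.

```python
from statistics import quantiles

# Made-up illustrative prices per body style (NOT values from the IBM dataset)
prices_by_body_style = {
    "sedan": [7295, 8845, 9960, 10295, 11245, 12945, 40960],
    "hardtop": [8249, 11199, 18620, 28176, 45400],
}


def box_stats(values):
    """Quartiles, IQR and 1.5 * IQR outliers, as drawn by a default box plot."""
    # method="inclusive" matches NumPy's default linear-interpolation percentiles
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


for style, prices in prices_by_body_style.items():
    stats = box_stats(prices)
    print(f"{style}: IQR = {stats['iqr']:.1f}, outliers = {stats['outliers']}")
```

With these toy numbers, the single very expensive sedan is flagged as an outlier, while the hardtop prices are spread widely enough that none are.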

- Pearson correlation (measures the linear dependence between two variables)

  - P-value (the probability of observing a correlation at least this strong if the two variables were actually uncorrelated; a small p-value means the correlation is statistically significant)
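To make the two definitions above concrete, here is a small standard-library sketch. With SciPy available, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a p-value; to keep this example dependency-free, the p-value is instead estimated with a permutation test (a swapped-in technique), which counts how often shuffled, and therefore unrelated, data produce a correlation at least as strong as the observed one. The engine-size and price values are made up for illustration, and the helper names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided p-value: share of shuffles of y whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator so the estimate is never exactly zero
    return (hits + 1) / (n_permutations + 1)


# Made-up illustrative values (NOT the IBM dataset)
engine_size = [97, 109, 122, 130, 136, 152, 164, 181, 209, 258]
price = [7799, 8495, 9989, 13950, 15250, 16500, 18920, 24565, 28248, 35550]

r = pearson_r(engine_size, price)
p = permutation_p_value(engine_size, price)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A coefficient near +1 or -1 means a strong linear relationship, and a p-value well below a chosen threshold (commonly 0.05) means such a correlation is unlikely to arise by chance.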

python2.4.png

For each of the correlations shown above:

-Strongly statistically significant

-Medium linear relationship

Next, we use regression plots, distribution plots and the Pearson correlation to visualize and confirm the correlations found earlier.

python2.1.png
python2.2.png

We can confirm that:

-There is a high to very high positive correlation between engine size and price.

-There is a high negative correlation between highway mpg and price.
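The straight line in a regression plot is an ordinary least-squares fit. A minimal sketch of that fit for a single predictor, on made-up numbers rather than the IBM data, shows how the sign of the slope matches the sign of the correlation (negative for highway mpg) and how R^2 measures the share of price variance the line explains. The helper names are hypothetical.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """Share of the variance of y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values (NOT the IBM dataset)
highway_mpg = [54, 47, 43, 38, 34, 30, 28, 25, 22, 18]
price = [6479, 7099, 7999, 8916, 10698, 13495, 15580, 17075, 22470, 32250]

intercept, slope = fit_simple_linear(highway_mpg, price)
predicted = [intercept + slope * mpg for mpg in highway_mpg]
print(f"price = {intercept:.0f} + {slope:.1f} * highway_mpg, "
      f"R^2 = {r_squared(price, predicted):.3f}")
```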

python2.8.png

Red: actual data values

Green: values predicted by the last fitted model

Since highway-mpg alone was not giving us the result we wanted, I tested the distribution plot with other variables and found that a model combining engine size and horsepower produced predictions close to the actual values, so it became our final model.
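The final model described above uses two predictors. The sketch below is an assumption about how such a model can be fit, not the notebook's exact code (which more likely relied on a library such as scikit-learn): it solves the least-squares normal equations for price = b0 + b1 * engine size + b2 * horsepower on made-up data, then compares R^2 against engine size alone. `solve`, `fit_linear` and `predict` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the small system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear(features, y):
    """Least squares via the normal equations (X^T X) w = X^T y; w[0] is the intercept."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, features):
    """Apply intercept w[0] plus one coefficient per feature."""
    return [w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)) for f in features]


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values (NOT the IBM dataset)
engine_size = [97, 109, 122, 130, 136, 152, 164, 181, 209, 258]
horsepower = [69, 85, 88, 111, 110, 154, 121, 160, 182, 262]
price = [7799, 8495, 9989, 13950, 15250, 16500, 18920, 24565, 28248, 35550]

single = [[e] for e in engine_size]
both = [[e, h] for e, h in zip(engine_size, horsepower)]

w_single = fit_linear(single, price)
w_both = fit_linear(both, price)
r2_single = r_squared(price, predict(w_single, single))
r2_both = r_squared(price, predict(w_both, both))
print(f"R^2 engine size only: {r2_single:.3f}, engine size + horsepower: {r2_both:.3f}")
```

On the training data, adding a predictor can never lower R^2 for a least-squares fit, so comparing predicted and actual distributions (as the distribution plot above does) or scoring on held-out data is a better check that the second variable genuinely helps.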

Conclusion

Based on this analysis, whatever kind of vehicle our employer wants to put on the market, we can quickly produce a price estimate close to actual market pricing from its engine size and horsepower.

The estimate can then be refined using other factors, such as body style or fuel type (gas or diesel).
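Categorical factors such as body style or fuel type have to be turned into numeric columns before they can enter a regression. A common choice is one-hot (dummy) encoding, which pandas offers through `pandas.get_dummies` and its `drop_first` option; the dependency-free sketch below, with a hypothetical `one_hot` helper and made-up values, shows the idea.

```python
def one_hot(values, drop_first=True):
    """Dummy-encode a categorical column; returns (kept categories, encoded rows)."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one level avoids a column that is perfectly collinear with the intercept
        categories = categories[1:]
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


# Made-up illustrative values (NOT the IBM dataset)
fuel_type = ["gas", "diesel", "gas", "gas"]
body_style = ["sedan", "hatchback", "convertible", "sedan"]

fuel_cols, fuel_rows = one_hot(fuel_type)
body_cols, body_rows = one_hot(body_style)
print(fuel_cols, fuel_rows)
print(body_cols, body_rows)
```

The encoded columns could then be appended to the engine-size and horsepower features of a two-variable model; the dropped level becomes the baseline that the other coefficients are measured against.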

Further analysis would require additional information, such as region-specific data.
