
Python 

Exploratory Phase

The main goal of an initial exploratory phase is to clean and assemble the dataset and to answer simple questions we might have about it. It also helps determine whether the data will be useful for further exploration, analysis, and visualization.

The following dataset comes from the article "Hotel Booking Demand Datasets" by Nuno Antonio, Ana Almeida, and Luis Nunes, published in Data in Brief, Volume 22, February 2019.

python1.png

- Importing the necessary libraries

- Cleaning null values and removing unneeded columns from the dataset

python2.png
python3.png
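
The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the exact code shown in the screenshots: the small inline table stands in for the real CSV file, the column names (`country`, `children`, `company`, `adr`) are assumed to match the published dataset, and the 50% threshold for dropping a column is an arbitrary choice made for the example.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-in for the hotel bookings table. The real data would
# be loaded with pd.read_csv on the downloaded CSV file instead.
raw = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "country": ["PRT", "GBR", None, "PRT"],
    "children": [0.0, np.nan, 1.0, 2.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adr": [75.0, 98.5, 120.0, 60.0],
})

# Share of missing values per column, used to decide what to drop or fill.
null_share = raw.isna().mean()

# Drop columns that are mostly empty (the 50% threshold is an assumption).
mostly_empty = null_share[null_share > 0.5].index
bookings = raw.drop(columns=mostly_empty)

# Treat a missing child count as 0, and drop rows with no country.
bookings["children"] = bookings["children"].fillna(0).astype(int)
bookings = bookings.dropna(subset=["country"]).reset_index(drop=True)

print(bookings)
```

Whether a null should be filled, dropped, or kept depends on what the column means, so each choice above would need to be revisited against the real data.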

We can then quickly answer questions we might have, such as:

- Which country has the highest number of clients?

- What is the average number of nights stayed per client?

- Which client paid the highest daily rate?

- What percentage of customers are returning customers?

- What is the most common area code among the phone numbers?
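
Each of these questions maps to roughly one line of pandas. The sketch below uses a small made-up table; the column names (`name`, `country`, `stays_in_weekend_nights`, `stays_in_week_nights`, `adr`, `is_repeated_guest`, `phone-number`) are assumptions about how the dataset is laid out, and the area code is assumed to be the first three digits of the phone number.

```python
import pandas as pd

# Made-up sample rows; names and phone numbers are placeholders.
bookings = pd.DataFrame({
    "name": ["Ana Silva", "Tom Reed", "Li Wei", "Eva Stone", "Raj Patel"],
    "country": ["PRT", "GBR", "PRT", "FRA", "PRT"],
    "stays_in_weekend_nights": [2, 0, 1, 2, 0],
    "stays_in_week_nights": [3, 2, 4, 5, 1],
    "adr": [80.0, 150.0, 95.5, 210.0, 60.0],  # average daily rate
    "is_repeated_guest": [0, 1, 0, 0, 1],
    "phone-number": ["669-555-0100", "858-555-0101", "669-555-0102",
                     "212-555-0103", "669-555-0104"],
})

# Country with the highest number of clients.
top_country = bookings["country"].value_counts().idxmax()

# Average number of nights stayed per client (weekend + week nights).
total_nights = (bookings["stays_in_weekend_nights"]
                + bookings["stays_in_week_nights"])
avg_nights = total_nights.mean()

# Client who paid the highest daily rate.
top_payer = bookings.loc[bookings["adr"].idxmax(), "name"]

# Percentage of returning customers (the flag is 0 or 1).
returning_pct = bookings["is_repeated_guest"].mean() * 100

# Most common area code, taken as the first three characters.
top_area_code = bookings["phone-number"].str[:3].mode().iloc[0]

print(top_country, avg_nights, top_payer, returning_pct, top_area_code)
```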

Or aggregate columns from the dataset to find, for example, how many customers arrived on each day of the week.
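
As an illustration of that aggregation, the sketch below combines three date columns into a single date and counts arrivals per weekday. The column names (`arrival_date_year`, `arrival_date_month`, `arrival_date_day_of_month`) are assumed, the rows are made up, and parsing month names with `%B` assumes an English locale.

```python
import pandas as pd

# Made-up sample rows with the arrival date split across three columns.
bookings = pd.DataFrame({
    "arrival_date_year": [2016, 2016, 2016, 2017],
    "arrival_date_month": ["July", "July", "August", "January"],
    "arrival_date_day_of_month": [1, 8, 6, 2],
})

# Build a "YYYY-Month-DD" string and parse it into a proper datetime.
date_text = (
    bookings["arrival_date_year"].astype(str)
    + "-" + bookings["arrival_date_month"]
    + "-" + bookings["arrival_date_day_of_month"].astype(str).str.zfill(2)
)
arrival_date = pd.to_datetime(date_text, format="%Y-%B-%d")

# Count arrivals per weekday, keeping calendar order and showing
# weekdays with no arrivals as 0.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
arrivals_by_day = (
    arrival_date.dt.day_name()
    .value_counts()
    .reindex(weekday_order, fill_value=0)
)

print(arrivals_by_day)
```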

Finally, we can visualize some of the data we find useful, as a starting point for more in-depth visual analysis that can answer more complex questions and, hopefully, reveal meaningful correlations.

Number of canceled vs. valid reservations

python5.png
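
A chart like this one could be produced along the following lines. This is a sketch with matplotlib rather than the code behind the image; it assumes a 0/1 column named `is_canceled` and uses made-up rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample: 1 means the reservation was canceled, 0 means it was kept.
bookings = pd.DataFrame({"is_canceled": [0, 1, 0, 0, 1, 0]})

# Replace the flag with readable labels, then count each label.
status_counts = (
    bookings["is_canceled"]
    .map({0: "Valid", 1: "Canceled"})
    .value_counts()
)

# One bar per reservation status.
fig, ax = plt.subplots()
ax.bar(status_counts.index, status_counts.values)
ax.set_ylabel("Number of reservations")
ax.set_title("Canceled vs. valid reservations")
```

Calling `fig.savefig(...)` afterwards writes the chart to a file; in an interactive session without the `Agg` line, `plt.show()` displays it instead.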

Room price trend 

python7.png
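
One way to obtain the numbers behind such a trend is to average the daily rate per month and per hotel type. The sketch below assumes columns named `hotel`, `arrival_date_month`, and `adr`, and uses made-up rows.

```python
import calendar

import pandas as pd

# Made-up sample rows.
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel",
              "Resort Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_month": ["January", "January", "January",
                           "July", "July", "July"],
    "adr": [80.0, 100.0, 50.0, 180.0, 120.0, 200.0],
})

# Month names in calendar order (January..December).
month_order = list(calendar.month_name)[1:]

# Mean daily rate per month (rows) and hotel type (columns),
# sorted by calendar month, with empty months removed.
price_trend = (
    bookings.pivot_table(index="arrival_date_month", columns="hotel",
                         values="adr", aggfunc="mean")
    .reindex(month_order)
    .dropna(how="all")
)

print(price_trend)
```

With matplotlib installed, `price_trend.plot(marker="o")` would draw one line per hotel type.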

Distribution by hotel type

python6.png

The following visualization is based on a dataset on immigration to Canada from 1980 to 2013, available on Kaggle.

Immigration trend from 1980 to 2013 for the top 5 countries

python8.png
python9.png
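
Selecting the top 5 countries and reshaping the table for a line chart might look like the sketch below. It assumes a wide layout with one row per country and one column per year; the country labels and counts are placeholders, not values from the dataset.

```python
import pandas as pd

# Placeholder table: one row per country, one column per year.
# Country labels and counts are made up for illustration.
immigration = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    1980: [10, 5, 1, 50, 7, 2],
    1981: [20, 5, 1, 40, 8, 3],
    1982: [30, 5, 1, 30, 9, 4],
}).set_index("Country")

# Total immigration per country across all years, keeping the 5 largest.
top5 = immigration.sum(axis=1).nlargest(5).index

# Transpose so that years become rows and each country becomes a column,
# which is the shape a line chart expects.
trend = immigration.loc[top5].T

print(trend)
```

With matplotlib installed, `trend.plot()` would draw one line per country across the years.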