Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 1

Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 1
Photo by Carlos Muza / Unsplash

This post walks you through the SQL queries in the Kaggle python notebook and shows you how to visualize them using Tableau.

We are taking the following dataset for our reference:

Dataset: Used Cars Dataset

How has the price of vehicles changed over the years?

Thinking about buying a used car? Then we need to analyze how the price of different cars has changed over the years and how that might affect future prices. In our previous SQL series, we discussed the "Craigslist used cars dataset" (link above) and discussed the various factors that affect the price of the car, be it the distance traveled, time of its purchase, or the brand. The price of a car reduces as it gets older and older.

In our previous SQL series, we discussed how to write queries. Today, we will talk about how to visualize those results in Tableau.

SQL Query in Kaggle

Following is the code to load necessary libraries in the Kaggle notebook to run your SQL queries.

import numpy as np # linear algebra
import pandas as pd # data processing
      
!pip install pandasql
pd.set_option('display.max_rows', 100)
pd.options.display.float_format = '{:.5f}'.format
import pandasql as ps
from IPython.display import HTML

Next, we have the SQL query to select vehicles at a particular price range and group them by the year of manufacturing to calculate whether a vehicle's price is affected by the year of its manufacturing.

SELECT Avg(price)AS average_price,
       year
FROM   vehicles
WHERE  price BETWEEN 1 AND 655000
GROUP  BY year; 
average_price year
0 39384.35887 NaN
1 1999.33333 1900.00000
2 100.00000 1901.00000
3 5000.00000 1915.00000
4 21900.00000 1920.00000
5 15000.00000 1922.00000
6 16886.66667 1923.00000
7 11833.33333 1924.00000
8 10000.00000 1925.00000
9 10583.66667 1926.00000
10 13416.66667 1927.00000
11 26866.50000 1928.00000
12 23617.85714 1929.00000
13 17619.16667 1930.00000
14 20399.94444 1931.00000
15 34036.92308 1932.00000
16 58500.00000 1933.00000
17 24719.28571 1934.00000
18 10000.00000 1935.00000
19 28075.00000 1936.00000
20 24599.93750 1937.00000
21 6465.00000 1938.00000
22 26386.75000 1939.00000
23 26140.17647 1940.00000
24 25840.41667 1941.00000
25 24250.00000 1942.00000
26 11761.46154 1946.00000
27 18880.00000 1947.00000
28 23829.72727 1948.00000
29 19962.82609 1949.00000
30 10529.06897 1950.00000
31 22152.36842 1951.00000
32 20289.20000 1952.00000
33 14300.21739 1953.00000
34 17589.58333 1954.00000
35 31010.00000 1955.00000
36 20008.50000 1956.00000
37 24255.11765 1957.00000
38 18727.77778 1958.00000
39 24349.40000 1959.00000
40 18126.41176 1960.00000
41 18774.68750 1961.00000
42 17777.39130 1962.00000
43 16797.29167 1963.00000
44 14487.17544 1964.00000
45 23074.98387 1965.00000
46 18976.40625 1966.00000
47 20567.78824 1967.00000
48 20401.75000 1968.00000
49 31741.01205 1969.00000
50 17379.22222 1970.00000
51 18103.29825 1971.00000
52 18745.01149 1972.00000
53 14596.85246 1973.00000
54 13155.59615 1974.00000
55 12524.39286 1975.00000
56 9137.73810 1976.00000
57 10833.83929 1977.00000
58 11701.24691 1978.00000
59 11749.24691 1979.00000
60 11559.46809 1980.00000
61 9264.56863 1981.00000
62 10726.79487 1982.00000
63 9985.03030 1983.00000
64 8310.40244 1984.00000
65 9278.32184 1985.00000
66 11124.78641 1986.00000
67 10982.11009 1987.00000
68 9976.32143 1988.00000
69 7932.36207 1989.00000
70 7930.28926 1990.00000
71 7159.19658 1991.00000
72 9340.10484 1992.00000
73 7932.73973 1993.00000
74 7791.28571 1994.00000
75 9550.36404 1995.00000
76 9549.94636 1996.00000
77 8249.66766 1997.00000
78 7511.40566 1998.00000
79 8679.17483 1999.00000
80 7393.17699 2000.00000
81 7611.78471 2001.00000
82 7079.10738 2002.00000
83 7487.86367 2003.00000
84 7642.66414 2004.00000
85 8302.91257 2005.00000
86 8536.31916 2006.00000
87 8837.61229 2007.00000
88 9661.51915 2008.00000
89 9681.30271 2009.00000
90 10702.25058 2010.00000
91 13714.09688 2011.00000
92 15055.66599 2012.00000
93 16127.65714 2013.00000
94 18890.75078 2014.00000
95 22142.77637 2015.00000
96 23632.19509 2016.00000
97 26777.76141 2017.00000
98 29488.20022 2018.00000
99 33312.31478 2019.00000
100 36845.25795 2020.00000
101 36155.48700 2021.00000
102 3414.50000 2022.00000

You can copy and paste the SQL query we shared in your notebook and see how the query works on data. After taking the data using queries you can create visualizations using tableau which we have discussed in the next section.

How to do it in Tableau?

"Data visualization is the presentation of abstract information in graphical form. Data visualization allows us to spot patterns, trends, and correlations that otherwise might go unnoticed in traditional reports, tables, or spreadsheets."

Thus to read the data better we'll be using Tableau for visualizing the out of our queries.

Following is the Tableau Window, with the data already loaded.  

In the above visualization, we have plotted "Year" against "Average Price".

X-axis/Columns : Year
Y-axis/Rows: AVG(Price)

We select the above specifications by dragging the Year from tables to columns and Avg(price) to the rows.

When we plot our data we see that the year '1960' and '2007' are outliers, which makes the rest of the data negligible and hard for us to observe.

Therefore, we will remove these outliers.

We click on the year '1960; and see some options in the pop-up box. Here, we can exclude the particular outliers to let data make sense using the Exclude option.

Now, we observe our graph. We have selected an Area Graph for this query.

Area Graphs are Line Graphs but with the area below the line filled in with a certain colour or texture. Area Graphs are drawn by first plotting data points on a Cartesian coordinate grid, joining a line between the points and finally filling in the space below the completed line.

Area graphs combine the line chart and bar chart to show how one or more groups’ numeric values change over the progression of a second variable, typically that of time, and is distinguished from a line chart by the addition of shading between lines and a baseline, like in a bar chart.

Let's see how the final plot looks like.

Interpretation

After removing the outliers we can see the data is following a trend that is quite interpretable and shows how the average price of used cars varies across years. Among all the years that were taken into consideration, the few such as the year 1935, 1955 that is touching the peak have the highest pricing of used cars. Also, a sudden dip at 1938 indicates that the used cars price was lowest for this particular year.

In the tableau, we can create many more visualizations for the same thing using the 'Show me' button at the top right corner. Also, in the marks field, we can choose the drop-down button and select from the different types of graphs available.

Conclusion

In the following series, we would be sharing different SQL queries in python and their visualization using Tableau. We're using Tableau because it is much easier and convenient than other python libraries that are being used for plotting various graphs and visualizing the dataset such as seaborn, matplotlib, etc.

We will publish many such articles in the days to come so stay tuned for more such queries and exciting graphs around the same.