Introduction
Welcome to the third part of the SQL series! We will discuss more challenging yet exciting questions on the Cars Dataset.
If you are new here, we recommend you to check out parts one and two first, since many concepts are carried forward in this tutorial.
Find the relevant links below:
Dataset: Used Cars Dataset
Getting Started with SQL Queries  Exercises for Beginners Part1
Getting Started with SQL Queries  Exercises for Beginners Part2
To learn how to run your queries on Dbeaver or Python, visit the link below:
Now, let's get jump straight to the questions.
Q1. How has the percentage market share of manufacturers changed across the years?
select table1.year,manufacturer_grouped_price,total_price,
(100*manufacturer_grouped_price/total_price) as market_share
from
(select sum(price) as manufacturer_grouped_price,manufacturer,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by manufacturer,year
order by year) table1
inner join
(select sum(price) as total_price,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by year
order by year) table2
on table1.year=table2.year
Explanation:
We will divide the whole query into small parts and understand each one indepth.
 Let's first look at the part of the query given below:
(select sum(price) as manufacturer_grouped_price,manufacturer,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by manufacturer,year
order by year) table1
Here, we are extracting sum of price as manufacturer_grouped_price
, manufacturer
and year
, w.r.t the following conditions using the WHERE
clause:
Fixing
manufacturer
not equal to " "
Fixingyear
not equal to " "
Settingprice
between 1 to 655000
After this, we group our results w.r.t the manufacturer
and year
. Next, we arrange the results in ascebding order w.r.t year
. We call this resultant table, table1
.
2. Now, let's understand another chunk of the query below:
(select sum(price) as total_price,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by year
order by year) table2
Here, we are extracting the sum of price as total_price
and year
, w.r.t the following conditions using the WHERE
clause:
Fixing
manufacturer
not equal to " "
Fixingyear
not equal to " "
Settingprice
between 1 to 655000
After this, we group our results w.r.t the year
. Arrange the result in ascending order w.r.t year
. We call this resultant table, table2
.
3. Next, we use INNER JOIN
on table1
and table2
and we specify the joinpredicate to be year
using the query:
on table1.year=table2.year
4. Lastly, from this resultant table we extract the following columns:

table1.year
manufacturer_grouped_price
total_price
 (100*
manufacturer_grouped_price
/total_price
) asmarket_share
.
The final table will give us the percentage market share of manufacturers across the years.
Q2. In which month the highest number of cars are listed for sale?
Answer:
select count(id) vehicle_counts,SUBSTRING(posting_date,6,2) as month
from vehicles
where (posting_date!="") and (price between 1 and 655000)
group by month
order by vehicle_counts desc
limit 1
First, let's learn about the SUBSTRING()
function.
SUBSTRING()
SUBSTRING()
: This function helps to extract some characters from a string.
Syntax:
SUBSTRING(string, start, length)
Explanation:
 We extract no. of ids using
COUNT(id)
asvehicle_counts
andmonths
by applying thesubstring()
function on the columnposting_date
.  We get the above result w.r.t some conditions using the
WHERE()
clause, i.e.:
Fixing
posting_date
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the results w.r.t. the month
.
4. Arranging the result in descending order w.r.t. vehicle_counts
and printing the 1strow using limit 1
.
Q3. What are the most frequent conditions across colours?
Answer:
select count(id) vehicle_counts,condition,paint_color
from vehicles
where (paint_color!="") and (condition!="") and (price between 1 and 655000)
group by condition,paint_color
order by vehicle_counts desc
Explanation:
 We extract the number of ids using
count(id)
,condition
andpaint_color
.  We filter the above results w.r.t some conditions using the
WHERE()
clause, i.e.,
Fixing
paint_color
not equal to " "
Fixingcondition
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the results w.r.t. condition
and paint_color
.
4. Arranging the final table in descending order of vehicle_count
.
Q4. Which colours are the most frequent across vehicle types, ranked in decreasing order of frequency?
Answer:
select count(id) vehicle_counts,type,paint_color,
rank() over(partition by type order by count(id) desc) rank_col
from vehicles
where (paint_color!="") and (type!="") and (price between 1 and 655000)
group by type,paint_color
We already know what the RANK()
function does (Exercise 2, Question #10), so jumping straight to the explanation.
Explanation:
 We extract no. of ids using
count(id)
asvehicle_counts
,type
andpaint_color
.  We also extract rank for each row using the function
rank()
. To specify the ranks, we use the criteria that we partition the result w.r.t.type
and arrange it in descending order w.r.tcount(id)
(using order by) ~ saving this in the column namerank_col
.  We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
paint_color
not equal to " "
Fixingtype
not equal to " "
Settingprice
between 1 to 655000
4. Grouping the above result w.r.t. type
and paint_color
.
Q5. How is the frequency of vehicles distributed across their condition and type?
Answer:
select count(id) as total_vehicles,condition,type
from vehicles
where (condition!="") and (price between 1 and 655000) and (type!="")
group by condition,type
Explanation:
 Extracting no. of ids using
count(id)
astotal_vehicles
,condition
andtype
.  Filtering the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
condition
not equal to " "
Fixingtype
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the final result w.r.t. condition
and type
.
Q6. Rank (without repetition) the models according to their frequency across posting years?
Answer:
select count(id) as total_vehicles,model,SUBSTRING(posting_date,1,4) as post_year,
ROW_NUMBER () over(partition by SUBSTRING(posting_date,1,4) order by count(id) desc) rank_col
from vehicles
where (model!="") and (price between 1 and 655000) and (posting_date!="")
group by model,post_year
We already know what the ROW_NUMBER()
function does (Exercise 2, Question #13), so jumping straight to the explanation.
Explanation:
 We extract the number of ids using
count(id)
and save it in the columntotal_vehicles
.  We also extract
model
,post_year
using thesubstring
function.  Next, we use
row_number
overpost_year
. Then, we sort it by arranging them in descending order using the code:order by count(id) desc
. We save this in a column namedrank_col
.  We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
model
not equal to " "
Fixingposting_date
not equal to " "
Settingprice
between 1 to 655000
5. Grouping the final result w.r.t model
and post_year
.
Q7. The highest number of vehicles were posted for sale in which year?
Answer:
select count(id) as total_vehicles,SUBSTRING(posting_date,1,4) as post_year
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by post_year
order by total_vehicles
limit 1
Explanation:
 We extract no. of ids using
COUNT(id)
astotal_vehicles
andpost_year
by applying thesubstring()
function on the columnposting_date
.  We get the above result w.r.t some conditions using the
WHERE()
clause, i.e.:
Fixing
posting_date
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the results w.r.t. the post_year
.
4. Arranging the result in ascending order w.r.t. total_vehicles
and printing the 1strow using limit 1
.
Q8. Mention the top 10 costliest models for every manufacturer?
Answer:
select *
from(
select max(price) as max_price,model,manufacturer,
ROW_NUMBER () over(partition by manufacturer order by max(price) desc) rank_col
from vehicles
where (price between 1 and 655000) and (model!="") and (manufacturer !="")
group by model,manufacturer
)
where rank_col<=10
Refer to Question 6 for the explanation.
Q9. What is the most occurring description phrase across vehicle types?
Answer:
select count(description) as description_occurance,description,type
from vehicles
where (price between 1 and 655000) and (type!="") and (description !="")
group by type,description
order by description_occurance desc
Explanation:
 We extract no. of description using
count(description)
asdescription_occurance
,description
andtype
.  We filter the above result w.r.t. some conditions using the
where()
clause:
Fixing
type
not equal to " "
Fixingdescription
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the above result w.r.t type
and description
.
4. Arranging the final result in descending order w.r.t. description_occurance
.
Q10. What is the average description character length across the vehicle condition column?
Answer:
select avg(description_length) as average_length, condition
from
(select length(description) as description_length,description,condition
from vehicles
where (price between 1 and 655000) and (condition!="") and (description !=""))
group by condition
Explanation:
 First, we are extracting the length of the description using
length(description)
and save it in a column namedescription_lenght
. We also extract the columnsdescription
andcondition
 (secondSELECT
statement).  Filtering the above result w.r.t the following conditions using
WHERE
clause:
Fixing
condition
not equal to " "
Fixingdescription
not equal to " "
Settingprice
between 1 to 655000
3. Now, we extract avg(description_length)
as average_length
and condition
from the result obtained from step 1  (first SELECT
statement).
4. In the end, we group the results w.r.t condition
using the GROUP BY
command.
Q11. Mention the most selling model across every state?
Answer:
select *
from(
select COUNT(id) vehicle_counts,model,state,
ROW_NUMBER () over(partition by state order by COUNT(id) desc) rank_col
from vehicles
where (price between 1 and 655000) and (model!="") and (state !="")
group by model,state)
where rank_col=1
Refer to Question 6 for the explanation.
Q12. Arrange the number of postings in decreasing order of their posting time?
Answer:
select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts desc
Refer to Question 7 for the explanation.
Q13. What is the percentage distribution of trucks across fuel type?
Answer:
select table1.fuel,fuel_vehicle_counts,total_vehicle_counts,
(100*fuel_vehicle_counts/total_vehicle_counts)
from
(select COUNT(id) as fuel_vehicle_counts,fuel,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by fuel,type) table1
inner join
(select COUNT(id) as total_vehicle_counts,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by type) table2
on table1.type=table2.type
Refer to Question 1 for the explanation.
Q14. How does the price vary for Sedans across their condition ?
Answer:
select avg(price) as average_price,condition
from vehicles
where (price between 1 and 655000) and (condition!="") and
(type="sedan")
group by condition
Explanation:
 Calculating average price using
AVG()
function and storing it in a new column namedaverage_price
.  Extracting
average_price
and correspondingcondition
, w.r.t. the conditions:
Fixing
condition
not equal to " "
Fixingtype
equal to "sedan"
Settingprice
between 1 to 655000
3. Grouping all the result w.r.t. condition
using GROUP BY
.
Q15. Arrange the number of postings in decreasing order of their posting time?
Answer:
select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts desc
Explanation:
 We extract no. of ids using
COUNT(id)
asvehicle_counts
andhours
by applying thesubstring()
function on the columnposting_date
.  We get the above result w.r.t some conditions using the
WHERE()
clause, i.e.:
Fixing
posting_date
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the results w.r.t. the hour
.
4. Arranging the result in descending order w.r.t. vehicle_counts
.
Conclusion
Here, we come to the end of our practice questions. You can walk through these on your own.
But, how much practice do you need to master SQL?
Find out how many questions you can answer without getting stuck or looking back at the explanation. If you get stuck, come back to Exercise Parts 1,2 and 3 and you will find the explanation here.
Click on the link below for more challenging questions on the same dataset.
Getting Started with SQL Queries  Exercises for Beginners Part4
Come meet us there!