Introduction

Welcome to the third part of the SQL series! We will discuss more challenging yet exciting questions on the Cars Dataset.

If you are new here, we recommend you to check out parts one and two first, since many concepts are carried forward in this tutorial.

Find the relevant links below:

Dataset: Used Cars Dataset

Getting Started with SQL Queries - Exercises for Beginners Part-1

Getting Started with SQL Queries - Exercises for Beginners Part-2

To learn how to run your queries on Dbeaver or Python, visit the link below:

DBeaver vs Python: Two Methods To Run Your Queries
This is a step-by-step guide on how to run your SQL queries using Dbeaver and Python..

Now, let's get jump straight to the questions.

ūüí°
To know more about such content - follow us on Twitter.

Q1. How has the percentage market share of manufacturers changed across the years?

select table1.year,manufacturer_grouped_price,total_price,
     (100*manufacturer_grouped_price/total_price) as market_share
from
  (select sum(price) as manufacturer_grouped_price,manufacturer,year 
  from vehicles
  where (price between 1 and 655000) and (manufacturer !="") and (year 
  !="") 
  group by manufacturer,year 
  order by year) table1
inner join 
  (select sum(price) as total_price,year 
  from vehicles
  where (price between 1 and 655000) and (manufacturer !="") and (year 
  !="") 
  group by year 
  order by year) table2
on table1.year=table2.year

Explanation:

We will divide the whole query into small parts and understand each one in-depth.

  1. Let's first look at the part of the query given below:
(select sum(price) as manufacturer_grouped_price,manufacturer,year 
  from vehicles
  where (price between 1 and 655000) and (manufacturer !="") and (year 
  !="") 
  group by manufacturer,year 
  order by year) table1

Here, we are extracting sum of price as manufacturer_grouped_price, manufacturer and year, w.r.t the following conditions using the WHERE clause:

Fixing manufacturernot equal to " "
Fixing year not equal to " "
Setting price between 1 to 655000

After this, we group our results w.r.t the manufacturerand year. Next, we arrange the results in ascebding order w.r.t year. We call this resultant table, table1.

2.  Now, let's understand another chunk of the query below:

(select sum(price) as total_price,year 
  from vehicles
  where (price between 1 and 655000) and (manufacturer !="") and (year 
  !="") 
  group by year 
  order by year) table2

Here, we are extracting the sum of price as total_price and year, w.r.t the following conditions using the WHERE clause:

Fixing manufacturer not equal to " "
Fixing year not equal to " "
Setting price between 1 to 655000

After this, we group our results w.r.t the year. Arrange the result in ascending order w.r.t year. We call this resultant table, table2.

3.  Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be year using the query:

on table1.year=table2.year

4. Lastly, from this resultant table we extract the following columns:

  • table1.year
  • manufacturer_grouped_price
  • total_price
  • (100* manufacturer_grouped_price / total_price ) as market_share.

The final table will give us the percentage market share of manufacturers across the years.

Q2. In which month the highest number of cars are listed for sale?

Answer:

select count(id) vehicle_counts,SUBSTRING(posting_date,6,2) as month
from vehicles
where (posting_date!="") and (price between 1 and 655000)
group by month
order by vehicle_counts desc 
limit 1

First, let's learn about the SUBSTRING() function.

SUBSTRING()

  • SUBSTRING(): This function helps to extract some characters from a string.

Syntax:

SUBSTRING(string, start, length)

Explanation:

  1. We extract no. of ids using COUNT(id) as vehicle_countsand months by applying the substring()function on the column posting_date.
  2. We get the above result w.r.t some conditions using the WHERE()clause, i.e.:-

Fixing posting_date not equal to " "
Setting price between 1 to 655000

3.  Grouping the results w.r.t. the month.

4.  Arranging the result in descending order w.r.t. vehicle_counts and printing the 1st-row using limit 1.

Q3. What are the most frequent conditions across colours?

Answer:

select count(id) vehicle_counts,condition,paint_color 
from vehicles
where (paint_color!="") and (condition!="") and (price between 1 and 655000)
group by condition,paint_color 
order by vehicle_counts desc

Explanation:

  1. We extract the number of ids using count(id), condition and paint_color.
  2. We filter the above results w.r.t some conditions using the WHERE() clause, i.e.,

Fixing paint_color not equal to " "
Fixing condition not equal to " "
Setting price between 1 to 655000

3.  Grouping the results w.r.t. condition and paint_color.

4.  Arranging the final table in descending order of vehicle_count.

Q4. Which colours are the most frequent across vehicle types, ranked in decreasing order of frequency?

Answer:

select count(id) vehicle_counts,type,paint_color,
rank() over(partition by type order by count(id) desc)  rank_col 
from vehicles
where (paint_color!="") and (type!="") and (price between 1 and 655000)
group by type,paint_color
We already know what the RANK()function does (Exercise -2, Question #10), so jumping straight to the explanation.

Explanation:

  1. We extract no. of ids using count(id) as vehicle_counts, type and paint_color.
  2. We also extract rank for each row using the function rank(). To specify the ranks, we use the criteria that we partition the result w.r.t. type and arrange it in descending order w.r.t count(id) (using order by) ~ saving this in the column name rank_col.
  3. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing paint_colornot equal to " "
Fixing type not equal to " "
Setting price between 1 to 655000

4.  Grouping the above result w.r.t. type and paint_color.

Q5. How is the frequency of vehicles distributed across their condition and type?

Answer:

select count(id) as total_vehicles,condition,type
from vehicles
where (condition!="") and (price between 1 and 655000) and (type!="")
group by condition,type

Explanation:

  1. Extracting no. of ids using count(id) as total_vehicles, condition and type.
  2. Filtering the above result w.r.t the following conditions using the WHERE clause:

Fixing conditionnot equal to " "
Fixing type not equal to " "
Setting price between 1 to 655000

3.  Grouping the final result w.r.t. condition and type.

Q6. Rank (without repetition) the models according to their frequency across posting years?

Answer:

select count(id) as total_vehicles,model,SUBSTRING(posting_date,1,4) as post_year,
ROW_NUMBER () over(partition by SUBSTRING(posting_date,1,4) order by count(id) desc)  rank_col
from vehicles
where (model!="") and (price between 1 and 655000) and (posting_date!="")
group by model,post_year
We already know what the ROW_NUMBER()function does (Exercise -2, Question #13), so jumping straight to the explanation.

Explanation:

  1. We extract the number of ids using count(id) and save it in the column total_vehicles.
  2. We also extract model, post_year using the substring function.
  3. Next, we use row_number over post_year. Then, we sort it by arranging them in descending order using the code: order by count(id) desc . We save this in a column named rank_col.
  4. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing modelnot equal to " "
Fixing posting_datenot equal to " "
Setting price between 1 to 655000

5.  Grouping the final result w.r.t modeland post_year.

Q7. The highest number of vehicles were posted for sale in which year?

Answer:

select count(id) as total_vehicles,SUBSTRING(posting_date,1,4) as post_year
from vehicles
where  (price between 1 and 655000) and (posting_date!="")
group by post_year
order by total_vehicles
limit 1

Explanation:

  1. We extract no. of ids using COUNT(id) as total_vehiclesand post_year by applying the substring()function on the column posting_date.
  2. We get the above result w.r.t some conditions using the WHERE()clause, i.e.:-

Fixing posting_date not equal to " "
Setting price between 1 to 655000

3.  Grouping the results w.r.t. the post_year.

4.  Arranging the result in ascending order w.r.t. total_vehicles and printing the 1st-row using limit 1.

Q8. Mention the top 10 costliest models for every manufacturer?

Answer:

select * 
from(
select max(price) as max_price,model,manufacturer,
ROW_NUMBER () over(partition by manufacturer order by max(price) desc)  rank_col
from vehicles
where  (price between 1 and 655000) and (model!="") and (manufacturer !="") 
group by model,manufacturer 
)
where rank_col<=10
Refer to Question 6 for the explanation.

Q9. What is the most occurring description phrase across vehicle types?

Answer:

select count(description) as description_occurance,description,type
from vehicles
where  (price between 1 and 655000) and (type!="") and (description !="") 
group by type,description
order by description_occurance desc

Explanation:

  1. We extract no. of description using count(description) as description_occurance, description and type.
  2. We filter the above result w.r.t. some conditions using the where()clause:

Fixing type not equal to " "
Fixing description not equal to " "
Setting price between 1 to 655000

3.  Grouping the above result w.r.t type and description.

4.  Arranging the final result in descending order w.r.t. description_occurance.

Q10. What is the average description character length across the vehicle condition column?

Answer:

select avg(description_length) as average_length, condition
from
	(select length(description) as description_length,description,condition
	from vehicles
	where  (price between 1 and 655000) and (condition!="") and (description 	!=""))
group by condition

Explanation:

  1. First, we are extracting the length of the description using length(description) and save it in a column name description_lenght. We also extract the columns description and condition -- (second SELECT statement).
  2. Filtering the above result w.r.t the following conditions using WHERE clause:

Fixing condition not equal to " "
Fixing description not equal to " "
Setting price between 1 to 655000

3.  Now, we extract avg(description_length) as average_length and condition from the result obtained from step 1 -- (first SELECT statement).

4.  In the end, we group the results w.r.t condition using the GROUP BY command.

Q11. Mention the most selling model across every state?

Answer:

select * 
from(
  select COUNT(id) vehicle_counts,model,state,
  ROW_NUMBER () over(partition by state order by COUNT(id) desc)  rank_col
  from vehicles
  where  (price between 1 and 655000) and (model!="") and (state !="") 
  group by model,state)
where rank_col=1
Refer to Question 6 for the explanation.

Q12. Arrange the number of postings in decreasing order of their posting time?

Answer:

select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour 
from vehicles
where  (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts desc
Refer to Question 7 for the explanation.

Q13. What is the percentage distribution of trucks across fuel type?

Answer:

select table1.fuel,fuel_vehicle_counts,total_vehicle_counts,
       (100*fuel_vehicle_counts/total_vehicle_counts)
from
  (select COUNT(id) as fuel_vehicle_counts,fuel,type
  from vehicles
  where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
  group by fuel,type) table1
inner join 
  (select COUNT(id) as total_vehicle_counts,type
  from vehicles
  where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
  group by type) table2
on table1.type=table2.type
Refer to Question 1 for the explanation.

Q14. How does the price vary for Sedans across their condition ?

Answer:

select avg(price) as average_price,condition
from vehicles
where  (price between 1 and 655000) and (condition!="")  and  
       (type="sedan")
group by condition

Explanation:

  1. Calculating average price using AVG() function and storing it in a new column named average_price.
  2. Extracting average_price and corresponding condition, w.r.t. the conditions:

Fixing condition not equal to " "
Fixing type equal to "sedan"
Setting price between 1 to 655000

3.  Grouping all the result w.r.t. condition using GROUP BY.

Q15. Arrange the number of postings in decreasing order of their posting time?

Answer:

select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour 
from vehicles
where  (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts desc

Explanation:

  1. We extract no. of ids using COUNT(id) as vehicle_countsand hours by applying the substring()function on the column posting_date.
  2. We get the above result w.r.t some conditions using the WHERE()clause, i.e.:-

Fixing posting_date not equal to " "
Setting price between 1 to 655000

3.  Grouping the results w.r.t. the hour.

4.  Arranging the result in descending order w.r.t. vehicle_counts.

Conclusion

Here, we come to the end of our practice questions. You can walk through these on your own.

But, how much practice do you need to master SQL?

Find out how many questions you can answer without getting stuck or looking back at the explanation. If you get stuck, come back to Exercise Parts 1,2 and 3 and you will find the explanation here.

Click on the link below for more challenging questions on the same dataset.

Getting Started with SQL Queries - Exercises for Beginners Part-4

Come meet us there!