Getting Started with SQL Queries - Exercises for Beginners Part-3
Here's a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.
Introduction
Welcome to the third part of the SQL series! We will discuss more challenging yet exciting questions on the Cars Dataset.
If you are new here, we recommend you to check out parts one and two first, since many concepts are carried forward in this tutorial.
Find the relevant links below:
Dataset: Used Cars Dataset
Getting Started with SQL Queries - Exercises for Beginners Part-1
Getting Started with SQL Queries - Exercises for Beginners Part-2
To learn how to run your queries on Dbeaver or Python, visit the link below:
Now, let's get jump straight to the questions.
Q1. How has the percentage market share of manufacturers changed across the years?
select table1.year,manufacturer_grouped_price,total_price,
(100*manufacturer_grouped_price/total_price) as market_share
from
(select sum(price) as manufacturer_grouped_price,manufacturer,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by manufacturer,year
order by year) table1
inner join
(select sum(price) as total_price,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by year
order by year) table2
on table1.year=table2.yearExplanation:
We will divide the whole query into small parts and understand each one in-depth.
- Let's first look at the part of the query given below:
(select sum(price) as manufacturer_grouped_price,manufacturer,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by manufacturer,year
order by year) table1Here, we are extracting sum of price as manufacturer_grouped_price, manufacturer and year, w.r.t the following conditions using the WHERE clause:
Fixing
manufacturernot equal to " "
Fixingyearnot equal to " "
Settingpricebetween 1 to 655000
After this, we group our results w.r.t the manufacturerand year. Next, we arrange the results in ascebding order w.r.t year. We call this resultant table, table1.
2. Now, let's understand another chunk of the query below:
(select sum(price) as total_price,year
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (year
!="")
group by year
order by year) table2Here, we are extracting the sum of price as total_price and year, w.r.t the following conditions using the WHERE clause:
Fixing
manufacturernot equal to " "
Fixingyearnot equal to " "
Settingpricebetween 1 to 655000
After this, we group our results w.r.t the year. Arrange the result in ascending order w.r.t year. We call this resultant table, table2.
3. Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be year using the query:
on table1.year=table2.year
4. Lastly, from this resultant table we extract the following columns:
-
table1.year manufacturer_grouped_pricetotal_price- (100*
manufacturer_grouped_price/total_price) asmarket_share.
The final table will give us the percentage market share of manufacturers across the years.
Q2. In which month the highest number of cars are listed for sale?
Answer:
select count(id) vehicle_counts,SUBSTRING(posting_date,6,2) as month
from vehicles
where (posting_date!="") and (price between 1 and 655000)
group by month
order by vehicle_counts desc
limit 1
First, let's learn about the SUBSTRING() function.
SUBSTRING()
SUBSTRING(): This function helps to extract some characters from a string.
Syntax:
SUBSTRING(string, start, length)Explanation:
- We extract no. of ids using
COUNT(id)asvehicle_countsandmonthsby applying thesubstring()function on the columnposting_date. - We get the above result w.r.t some conditions using the
WHERE()clause, i.e.:-
Fixing
posting_datenot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the results w.r.t. the month.
4. Arranging the result in descending order w.r.t. vehicle_counts and printing the 1st-row using limit 1.
Q3. What are the most frequent conditions across colours?
Answer:
select count(id) vehicle_counts,condition,paint_color
from vehicles
where (paint_color!="") and (condition!="") and (price between 1 and 655000)
group by condition,paint_color
order by vehicle_counts desc
Explanation:
- We extract the number of ids using
count(id),conditionandpaint_color. - We filter the above results w.r.t some conditions using the
WHERE()clause, i.e.,
Fixing
paint_colornot equal to " "
Fixingconditionnot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the results w.r.t. condition and paint_color.
4. Arranging the final table in descending order of vehicle_count.
Q4. Which colours are the most frequent across vehicle types, ranked in decreasing order of frequency?
Answer:
select count(id) vehicle_counts,type,paint_color,
rank() over(partition by type order by count(id) desc) rank_col
from vehicles
where (paint_color!="") and (type!="") and (price between 1 and 655000)
group by type,paint_colorWe already know what the RANK()function does (Exercise -2, Question #10), so jumping straight to the explanation.Explanation:
- We extract no. of ids using
count(id)asvehicle_counts,typeandpaint_color. - We also extract rank for each row using the function
rank(). To specify the ranks, we use the criteria that we partition the result w.r.t.typeand arrange it in descending order w.r.tcount(id)(using order by) ~ saving this in the column namerank_col. - We want the above result w.r.t the following conditions using the
WHEREclause:
Fixing
paint_colornot equal to " "
Fixingtypenot equal to " "
Settingpricebetween 1 to 655000
4. Grouping the above result w.r.t. type and paint_color.
Q5. How is the frequency of vehicles distributed across their condition and type?
Answer:
select count(id) as total_vehicles,condition,type
from vehicles
where (condition!="") and (price between 1 and 655000) and (type!="")
group by condition,typeExplanation:
- Extracting no. of ids using
count(id)astotal_vehicles,conditionandtype. - Filtering the above result w.r.t the following conditions using the
WHEREclause:
Fixing
conditionnot equal to " "
Fixingtypenot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the final result w.r.t. condition and type.
Q6. Rank (without repetition) the models according to their frequency across posting years?
Answer:
select count(id) as total_vehicles,model,SUBSTRING(posting_date,1,4) as post_year,
ROW_NUMBER () over(partition by SUBSTRING(posting_date,1,4) order by count(id) desc) rank_col
from vehicles
where (model!="") and (price between 1 and 655000) and (posting_date!="")
group by model,post_year
We already know what the ROW_NUMBER()function does (Exercise -2, Question #13), so jumping straight to the explanation.Explanation:
- We extract the number of ids using
count(id)and save it in the columntotal_vehicles. - We also extract
model,post_yearusing thesubstringfunction. - Next, we use
row_numberoverpost_year. Then, we sort it by arranging them in descending order using the code:order by count(id) desc. We save this in a column namedrank_col. - We want the above result w.r.t the following conditions using the
WHEREclause:
Fixing
modelnot equal to " "
Fixingposting_datenot equal to " "
Settingpricebetween 1 to 655000
5. Grouping the final result w.r.t modeland post_year.
Q7. The highest number of vehicles were posted for sale in which year?
Answer:
select count(id) as total_vehicles,SUBSTRING(posting_date,1,4) as post_year
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by post_year
order by total_vehicles
limit 1
Explanation:
- We extract no. of ids using
COUNT(id)astotal_vehiclesandpost_yearby applying thesubstring()function on the columnposting_date. - We get the above result w.r.t some conditions using the
WHERE()clause, i.e.:-
Fixing
posting_datenot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the results w.r.t. the post_year.
4. Arranging the result in ascending order w.r.t. total_vehicles and printing the 1st-row using limit 1.
Q8. Mention the top 10 costliest models for every manufacturer?
Answer:
select *
from(
select max(price) as max_price,model,manufacturer,
ROW_NUMBER () over(partition by manufacturer order by max(price) desc) rank_col
from vehicles
where (price between 1 and 655000) and (model!="") and (manufacturer !="")
group by model,manufacturer
)
where rank_col<=10
Refer to Question 6 for the explanation.
Q9. What is the most occurring description phrase across vehicle types?
Answer:
select count(description) as description_occurance,description,type
from vehicles
where (price between 1 and 655000) and (type!="") and (description !="")
group by type,description
order by description_occurance desc
Explanation:
- We extract no. of description using
count(description)asdescription_occurance,descriptionandtype. - We filter the above result w.r.t. some conditions using the
where()clause:
Fixing
typenot equal to " "
Fixingdescriptionnot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the above result w.r.t type and description.
4. Arranging the final result in descending order w.r.t. description_occurance.
Q10. What is the average description character length across the vehicle condition column?
Answer:
select avg(description_length) as average_length, condition
from
(select length(description) as description_length,description,condition
from vehicles
where (price between 1 and 655000) and (condition!="") and (description !=""))
group by condition
Explanation:
- First, we are extracting the length of the description using
length(description)and save it in a column namedescription_lenght. We also extract the columnsdescriptionandcondition-- (secondSELECTstatement). - Filtering the above result w.r.t the following conditions using
WHEREclause:
Fixing
conditionnot equal to " "
Fixingdescriptionnot equal to " "
Settingpricebetween 1 to 655000
3. Now, we extract avg(description_length) as average_length and condition from the result obtained from step 1 -- (first SELECT statement).
4. In the end, we group the results w.r.t condition using the GROUP BY command.
Q11. Mention the most selling model across every state?
Answer:
select *
from(
select COUNT(id) vehicle_counts,model,state,
ROW_NUMBER () over(partition by state order by COUNT(id) desc) rank_col
from vehicles
where (price between 1 and 655000) and (model!="") and (state !="")
group by model,state)
where rank_col=1
Refer to Question 6 for the explanation.
Q12. Arrange the number of postings in decreasing order of their posting time?
Answer:
select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts descRefer to Question 7 for the explanation.
Q13. What is the percentage distribution of trucks across fuel type?
Answer:
select table1.fuel,fuel_vehicle_counts,total_vehicle_counts,
(100*fuel_vehicle_counts/total_vehicle_counts)
from
(select COUNT(id) as fuel_vehicle_counts,fuel,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by fuel,type) table1
inner join
(select COUNT(id) as total_vehicle_counts,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by type) table2
on table1.type=table2.typeRefer to Question 1 for the explanation.
Q14. How does the price vary for Sedans across their condition ?
Answer:
select avg(price) as average_price,condition
from vehicles
where (price between 1 and 655000) and (condition!="") and
(type="sedan")
group by conditionExplanation:
- Calculating average price using
AVG()function and storing it in a new column namedaverage_price. - Extracting
average_priceand correspondingcondition, w.r.t. the conditions:
Fixing
conditionnot equal to " "
Fixingtypeequal to "sedan"
Settingpricebetween 1 to 655000
3. Grouping all the result w.r.t. condition using GROUP BY.
Q15. Arrange the number of postings in decreasing order of their posting time?
Answer:
select COUNT(id) as vehicle_counts, SUBSTRING(posting_date,12,2) as hour
from vehicles
where (price between 1 and 655000) and (posting_date!="")
group by SUBSTRING(posting_date,12,2)
order by vehicle_counts desc
Explanation:
- We extract no. of ids using
COUNT(id)asvehicle_countsandhoursby applying thesubstring()function on the columnposting_date. - We get the above result w.r.t some conditions using the
WHERE()clause, i.e.:-
Fixing
posting_datenot equal to " "
Settingpricebetween 1 to 655000
3. Grouping the results w.r.t. the hour.
4. Arranging the result in descending order w.r.t. vehicle_counts.
Conclusion
Here, we come to the end of our practice questions. You can walk through these on your own.
But, how much practice do you need to master SQL?
Find out how many questions you can answer without getting stuck or looking back at the explanation. If you get stuck, come back to Exercise Parts 1,2 and 3 and you will find the explanation here.
Click on the link below for more challenging questions on the same dataset.
Getting Started with SQL Queries - Exercises for Beginners Part-4
Come meet us there!