Getting Started with SQL Queries - Exercises for Beginners Part-2
Introduction
Welcome back to Exercise-2 on the SQL Queries for Beginners.
In the first part of the SQL Queries for Beginners, we discussed and learnt about some of the basic yet very important functions of SQL, namely,
SELECT
FROM
WHERE
GROUP BY
ORDER BY
CASE
If you're new here we recommend you to start with our first exercise by clicking on the link provided below:
Getting Started with SQL Queries - Exercises for Beginners Part-1
If you've already completed the exercises in the previous tutorial or if you are already a little ahead in SQL, then you're ready to try this section.
We will keep practising the questions on the same dataset as before, i.e., "Used Car Dataset."
To learn how to run your queries on Dbeaver or Python, visit the link below:
Without further ado, let's start with the next set of questions we have for you.
Q1. Mention the costliest 8 cylinder model across each manufacturer?
Answer:
SELECT Max(price) AS max_price,
cylinders,
model,
manufacturer
FROM vehicles
WHERE ( cylinders = "8 cylinders" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY manufacturer
ORDER BY max_price DESC;
Explanation:
- Calculating maximum price using
MAX()
function and storing it in a new column namedmax_price
. - Extracting
max_price
and correspondingcylinders
,model
andmanufacturer
w.r.t. the following conditions using theWHERE
clause:
Fixing
cylinders
equal to "8 cylinders"
Setting the price limit from 1 to 655000
3. Grouping the result w.r.t manufacturer
together using GROUP BY
.
4. Arranging the result w.r.t max_price
in descending order using ORDER BY
.
Q2. What are the cheapest full-size models across all manufacturers?
Answer:
SELECT Min(price) AS min_price,
model,
manufacturer
FROM vehicles
WHERE ( size = "full-size" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY model,
manufacturer
ORDER BY min_price;
Explanation:
- Calculating minimum price using
MIN()
function and storing it in a new column namedmin_price
. - Extracting
min_price
and correspondingVIN
,model
andmanufacturer
w.r.t. the following conditions using theWHERE
clause:
Fixing
size
equal to "full-size"
Setting the price limit from 1 to 655000
3. Grouping the result w.r.t manufacturer
together using GROUP BY
.
4. Arranging the result w.r.t. min_price
in ascending order using ORDER BY
.
Q3. Compare the average price across manufacturers for the years 2020 and 2021?
Answer:
SELECT table1.manufacturer,
average_price_2020,
average_price_2021
FROM (SELECT Avg(price) AS average_price_2020,
manufacturer
FROM vehicles
WHERE ( year = "2020" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY manufacturer) table1
INNER JOIN
(SELECT Avg(price) AS average_price_2021,
manufacturer
FROM vehicles
WHERE ( year = "2021" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY manufacturer) table2
ON table1.manufacturer = table2.manufacturer;
Let's first understand what INNER JOIN
means.
INNER JOIN keyword
INNER JOIN
: It creates a new final table by combining column values of two tables (table1 and table2) based upon the column with common entries. The query compares each row of table1 with each row of table2 to find all pairs of rows which are common among the two tables. When this is found, column values for each matched pair of rows of A and B are combined into a result row. It is also calledEQUIJOIN
.
Syntax:
SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;
Understanding the query below.
Explanation:
We will divide the whole query into small parts and understand each one in depth.
- Let's first look at the part of the query given below:
(select avg(price) as average_price_2020,manufacturer
from vehicles
where (year="2020") and (price between 1 and 655000)
group by manufacturer) table1
Here, we are extracting average_price
for year 2020 and manufacturer
, w.r.t the following conditions using the WHERE
clause:
Fixing
year
='2020'
Settingprice
between 1 to 655000
After this, we group our results w.r.t the manufacturer
. We call this resultant table, table1
.
2. Now, let's understand another chunk of the query below:
(select avg(price) as average_price_2021,manufacturer
from vehicles
where (year="2021") and (price between 1 and 655000)
group by manufacturer) table2
Here, we are extracting average_price
for year 2021 and manufacturer
, w.r.t the following conditions using the WHERE
clause:
Fixing
year
='2021'
Settingprice
between 1 to 655000
After this, we group our results w.r.t the manufacturer
. We call this resultant table, table2
.
3. Next, we use INNER JOIN
on table1
and table2
and we specify the join-predicate to be manufacturer
using the query:
on table1.manufacturer=table2.manufacturer
4. Lastly, from this resultant table we extract the following columns: table1.manufacturer
, average_price_2020
, average_price_2021
.
This will give us the final result, by comparing the average prices across manufacturers for the years 2020 and 2021.
Q4. Mention all the models for trucks of the manufacturer that has the cheapest average truck price?
Answer:
select model from vehicles where manufacturer in
(select manufacturer
from vehicles
where (type="truck") and (price between 1 and 655000)
group by manufacturer
order by avg(price)
limit 1)
and (type='truck') and (price between 1 and 655000)
Explanation:
- Extracting
manufacturer
(secondSELECT
statement) w.r.t the following conditions usingWHERE
clause:
Fixing
type
='truck'
Settingprice
between 1 to 655000
2. Grouping the results w.r.t the manufacturer
.
3. Arranging the result in ascending order w.r.t average_price
and printing the first row using LIMIT 1
.
4. Finally, extracting model
w.r.t the following conditions:
manufacturer
is equal to what we obtained from the above result.
type
='truck'
Settingprice
between 1 to 655000
Q5. Compare the average prices of automatic sedans across all manufacturers?
Answer:
select avg(price) as average_price,manufacturer
from vehicles
where (type="sedan") and (transmission="automatic") and
(price between 1 and 655000)
group by manufacturer
order by avg(price)
Explanation:
- Calculating average price using
AVG()
function and storing it in a new column namedaverage_price
. - Extracting
average_price
and correspondingmanufacturer
, w.r.t. the following conditions:
type
='sedan'
transmission
='automatic'
Settingprice
between 1 to 655000
3. Grouping all the different manufacturers together using GROUP BY
.
4. Arranging the result w.r.t. average price in ascending order using ORDER BY
.
Q6. How does the price vary for different vehicle types across distance travelled?
Answer:
select avg(price) as average_price,type,
CASE
WHEN odometer >= 0 and odometer <= 20000 then '0-20000'
WHEN odometer > 20000 and odometer <= 40000 then '20000-40000'
WHEN odometer > 40000 and odometer <= 60000 then '40000-60000'
WHEN odometer > 60000 and odometer <= 80000 then '60000-80000'
WHEN odometer > 80000 and odometer <= 100000 then '80000-100000'
WHEN odometer > 100000 then '<100000'
ELSE 'NA'
end as distance_travelled
from vehicles
where price between 1 and 655000
group by type,distance_travelled;
We already know what the CASE
statement means (Exercise-1, Question #3), so jumping straight to the explanation.
Explanation:
- Calculating average price using
AVG()
function and storing it in a new column namedaverage_price
. - Defining cases for different conditions using the
CASE
statement, i.e.: when odometer>=0 and odometer <=20000 then we are printing '0-20000' and so on.. and naming this new columndistance_travelled
using theEND
command. - Setting the price limit between 1 to 655000.
- Grouping the results with respect to the new column
distance_travelled
.
Q7. What is the difference between the average price for excellent conditioned trucks and new conditioned trucks?
Answer:
SELECT table1.TYPE,
excellent_average_price,
new_average_price,
( excellent_average_price-new_average_price ) AS price_difference
FROM (SELECT Avg(price) AS excellent_average_price,
TYPE
FROM vehicles
WHERE ( condition = "excellent" )
AND ( TYPE = "truck" )
AND ( price BETWEEN 1 AND 655000 )) table1
INNER JOIN (SELECT Avg(price) AS new_average_price,
TYPE
FROM vehicles
WHERE ( condition = "new" )
AND ( TYPE = "truck" )
AND ( price BETWEEN 1 AND 655000 )) table2
ON table1.TYPE = table2.TYPE
;
Explanation:
- Let's first look at the part of the query given below:
(select avg(price) as excellent_average_price,type
from vehicles
where (condition="excellent") and (type="truck") and
(price between 1 and 655000 )) table1
Here, we are extracting average_price
as excellent_average_price
and type
, w.r.t the following conditions using the WHERE
clause:
Fixing
condition
='excellent'
Fixingtype
='truck'
Settingprice
between 1 to 655000
We call this resultant table, table1
.
2. Now, let's understand another chunk of the query below:
(select avg(price) as new_average_price,type
from vehicles
where (condition="new") and (type="truck") and
(price between 1 and 655000 )) table2
Here, we are extracting average_price
as new_average_price
and type
, w.r.t the following conditions using the WHERE
clause:
Fixing
condition
='new'
Fixingtype
='truck'
Settingprice
between 1 to 655000
We call this resultant table, table2
.
3. Next, we use INNER JOIN
on table1
and table2
and we specify the join-predicate to be type
using the query:
on table1.type=table2.type
.
4. Lastly, from this resultant table we extract the following columns:
table1.type
excellent_average_price
new_average_price
- [
excellent_average_price
-new_average_price
] =price_difference
. (taking difference of 2 columns).
This will give us the final result by comparing the average price for excellent condition trucks and new condition trucks.
Q8. What is the percentage distribution of trucks across fuel types?
Answer:
SELECT table1.fuel,
fuel_vehicle_counts,
total_vehicle_counts,
( 100 * fuel_vehicle_counts / total_vehicle_counts )
FROM (SELECT COUNT(id) AS fuel_vehicle_counts,
fuel,
TYPE
FROM vehicles
WHERE ( price BETWEEN 1 AND 655000 )
AND ( fuel != "" )
AND ( TYPE = "truck" )
GROUP BY fuel,
TYPE) table1
INNER JOIN (SELECT COUNT(id) AS total_vehicle_counts,
TYPE
FROM vehicles
WHERE ( price BETWEEN 1 AND 655000 )
AND ( fuel != "" )
AND ( TYPE = "truck" )
GROUP BY TYPE) table2
ON table1.TYPE = table2.TYPE;
Explanation:
- Let's first look at the part of the query given below:
(select COUNT(id) as fuel_vehicle_counts,fuel,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by fuel,type) table1
Here, we are extracting average_price as excellent_average_price and type, w.r.t the following conditions using the WHERE clause:
Fixing
fuel
not equal to " "
Fixingtype
='truck'
Settingprice
between 1 to 655000
Grouping the results w.r.t fuel
and type
. We call this resultant table, table1.
2. Now, let's understand another chunk of the query below:
(select COUNT(id) as total_vehicle_counts,type
from vehicles
where (price between 1 and 655000) and (fuel!="") and (type="truck")
group by type) table2
Here, we are extracting COUNT(id)
(no. of ids) as total_vehicle_counts
and type, w.r.t the following conditions using the WHERE clause:
Fixing
fuel
not equal to " "
Fixingtype
='truck'
Settingprice
between 1 to 655000
Grouping the results w.r.t type
. We call this resultant table, table2.
3. Next, we use INNER JOIN
on table1 and table2 and we specify the join-predicate to be type
using the query:
on table1.type=table2.type
.
4. Lastly, from this resultant table we extract the following columns:
table1.fuel
fuel_vehicle_counts
total_vehicle_counts
(100*fuel_vehicle_counts/total_vehicle_counts)
This will give us the final result.
Q9. How does the price vary for hatchbacks and SUVs across manufacturers?
Answer:
SELECT table1.manufacturer,
average_hatchback_price,
average_suv_price
FROM (SELECT Avg(price) AS average_hatchback_price,
manufacturer
FROM vehicles
WHERE ( TYPE = "hatchback" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY manufacturer) table1
INNER JOIN (SELECT Avg(price) AS average_SUV_price,
manufacturer
FROM vehicles
WHERE ( TYPE = "suv" )
AND ( price BETWEEN 1 AND 655000 )
GROUP BY manufacturer) table2
ON table1.manufacturer = table2.manufacturer
;
Explanation:
- Let's first look at the part of the query given below:
(select avg(price) as average_hatchback_price,manufacturer
from vehicles
where (type="hatchback") and (price between 1 and 655000)
group by manufacturer) table1
Here, we are extracting average_price
as average_hatchback_price
and manufacturer
, w.r.t the following conditions using the WHERE
clause:
Fixing
type
='hatchback'
Settingprice
between 1 to 655000
Grouping the result w.r.t the manufacturer
. We call this resultant table, table1
.
2. Now, let's understand another chunk of the query below:
(select avg(price) as average_SUV_price,manufacturer
from vehicles
where (type="SUV") and (price between 1 and 655000)
group by manufacturer) table2
Here, we are extracting average_price
as average_SUV_price
and manufacturer
, w.r.t the following conditions using the WHERE
clause:
Fixing
type
='SUV'
Settingprice
between 1 to 655000
Grouping the result w.r.t the manufacturer
. We call this resultant table, table2
.
3. Next, we use INNER JOIN
on table1
and table2
and we specify the join-predicate to be manufacturer
using the query:
on table1.manufacturer=table2.manufacturer
.
4. Lastly, from this resultant table we extract the following columns:
table1.manufacturer
, average_hatchback_price
, average_SUV_price
.
This will tell us how the price varies for hatchbacks and SUVs across manufacturers.
Q10. Find out the top three manufacturers that have the highest average price across each region?
Answer:
SELECT region,
manufacturer,
average_price,
rank_col
FROM (SELECT Avg(price) AS average_price,
manufacturer,
region,
RANK()
OVER(
PARTITION BY region
ORDER BY Avg(price) desc) rank_col
FROM vehicles
WHERE ( price BETWEEN 1 AND 655000 )
AND ( manufacturer != "" )
GROUP BY manufacturer)
WHERE rank_col <= 3
ORDER BY region;
RANK(), OVER() and PARTITION BY() Functions
RANK()
: It is used to specify ranks for each row within a partition of a result set. It is also known as Window Functions.OVER()
: It is used to define a set of rows in the result set.PARTITION BY()
: It distributes the rows in the result set into partitions by one or more criteria.
Syntax:
SELECT column_1,
RANK () OVER ( PARTITION BY column_2
ORDER BY column_1
) AS New_column_name
FROM table_name;
Explanation:
- We extract
AVG(price)
asaverage_price
,manufacturer
,region
- We also extract rank for each row using the function
rank()
. To specify the ranks, we use the criteria that we partition the result w.r.t.region
and arrange it in descending order w.r.tavg(price)
(using order by) ~saving this in the column namerank_col
. - We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
manufacturer
not equal to " "
Settingprice
between 1 to 655000
4. Grouping the above result w.r.t. manufacturer
5. Finally, we select everything ( SELECT *
) from the above output, with the condition that rank_col
<=3.
This will give us the top three manufacturers that have the highest average price across each region.
Q11. Mark in which quantiles do the models lie according to their average price?
Answer:
select avg(price) as average_price,model,
ntile(4) over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (model !="")
group by model
NTILE(N) Function
NTILE(n)
: It is another Window Function. It is used to distribute the number of rows in the specified number of groups (n). Each row group gets it's rank w.r.t a specified condition. We need to specify the value for the desired number of groups inside the parenthesis.
Syntax:
SELECT *,
NTILE(2) OVER(
ORDER BY Column_1 DESC) Rank
FROM Table_name
Explanation:
- Extracting
AVG(price)
asaverage_price
,model
- We split the rows in
avg(price)
(ascending order usingorder by
) into 4 groups using the functionntile(4)
. - We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
model
not equal to " "
Settingprice
between 1 to 655000
4. We then group the result w.r.t. the model
.
Q12. Mark each manufacturer according to its percentile rank based on its average price?
Answer:
select avg(price) as average_price,manufacturer,
percent_rank() over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (manufacturer !="")
group by manufacturer
First, let's learn more about thepercent_rank()
function.
PERCENT_RANK() Function
PERCENT_RANK()
: This function in SQL Server calculates the relative rank SQL Percentile of each row. It always returns values greater than 0, and the highest value is 1. It does not count any NULL values. This function is nondeterministic, i.e., even for the same input, can exhibit different outputs on different runs.
Syntax:
PERCENT_RANK() OVER (
[PARTITION BY column_1]
ORDER BY column_2 [ASC | DESC], ...
)
Explanation:
- We extract
avg(price)
asaverage_price
,manufacturer
and get the percentile of each row usingpercent_rank()
. We usepercent_rank()
overavg(price)
(ascending) column. - We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
model
not equal to " "
Settingprice
between 1 to 655000
3. Grouping the final result w.r.t manufacturer
.
Q13. What are the ranks of the manufacturers according to the number of models offered across each vehicle type, also ensure that the ranks are not repeated?
Answer:
select count(distinct model) as models,manufacturer,type,
row_number() over(partition by type order by count(distinct model)) rank_col
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (type !="")
group by manufacturer,type;
ROW_NUMBER() Function
ROW_NUMBER()
: It is used to get a unique sequential number for each row. It gives the rank one for the first row and then increments the value by one for each row. We get different ranks for the row having similar values as well.
Syntax:
PERCENT_RANK() OVER (
[PARTITION BY column_1]
ORDER BY column_2 [ASC | DESC], ...
)
Explanation:
- We extract the unique number of models using
count(distinct model)
and save it in the columnmodels
. - We also extract
manufacturer
,type
and get the unique sequential number for each row usingrow_number()
. - We use
row_number
overtype
. Then, we sort it by arranging unique no. of models in ascending order using the code:order by count(distinct model)
. We save this in a column namedrank_col
. - We want the above result w.r.t the following conditions using the
WHERE
clause:
Fixing
manufacturer
not equal to " "
Fixingtype
not equal to " "
Settingprice
between 1 to 655000
5. Grouping the final result w.r.t manufacturer
and type
.
Q14. Find out the cheapest 3 models for each manufacturer, mention all models in case of a tie?
Answer:
select * from (
select avg(price) as average_price,manufacturer,model,
rank() over(partition by manufacturer order by avg(price)) rank_col
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (model
!="")
group by manufacturer,model)
where rank_col<=3
Refer to Question 10 for the explanation.
Q15. Find out the top 5 models that have the highest average price across all vehicle types?
Answer:
select * from (
select avg(price) as average_price,type,model,
rank() over(partition by type order by avg(price) desc) rank_col
from vehicles
where (price between 1 and 655000) and (type !="") and (model !="")
group by type,model)
where rank_col<=5
Refer to Question 10 for the explanation.
Conclusion
Congratulations on making it this far!
The more you practice, the easier it will be to write similar queries regardless of whether you are a beginner or not.
If you enjoyed these problems and want to practice more, click on the link below for more exciting yet challenging questions on the same dataset.
Getting Started with SQL Queries - Exercises for Beginners Part-3
Hope you enjoyed the post. See you in the next one.