Introduction

Welcome back to Exercise-2 on the SQL Queries for Beginners.

In the first part of the SQL Queries for Beginners, we discussed and learnt about some of the basic yet very important functions of SQL, namely,

SELECT
FROM
WHERE
GROUP BY
ORDER BY
CASE

If you're new here we recommend you to start with our first exercise by clicking on the link provided below:

Getting Started with SQL Queries - Exercises for Beginners Part-1

If you've already completed the exercises in the previous tutorial or if you are already a little ahead in SQL, then you're ready to try this section.

We will keep practising the questions on the same dataset as before, i.e., "Used Car Dataset."

To learn how to run your queries on Dbeaver or Python, visit the link below:
DBeaver vs Python: Two Methods To Run Your Queries
This is a step-by-step guide on how to run your SQL queries using Dbeaver and Python..

Without further ado, let's start with the next set of questions we have for you.

ūüí°
To know more about such content - follow us on Twitter.

Q1. Mention the costliest 8 cylinder model across each manufacturer?

Answer:

SELECT Max(price) AS max_price,
       cylinders,
       model,
       manufacturer
FROM   vehicles
WHERE  ( cylinders = "8 cylinders" )
       AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer
ORDER  BY max_price DESC; 

Explanation:

  1. Calculating maximum price using MAX() function and storing it in a new column named max_price.
  2. Extracting max_price and corresponding cylinders, model and manufacturer w.r.t. the following conditions using the WHERE clause:

Fixing cylinders equal to "8 cylinders"
Setting the price limit from 1 to 655000

3.  Grouping the result w.r.t manufacturer together using GROUP BY.

4.  Arranging the result w.r.t max_price in descending order using ORDER BY.

Q2. What are the cheapest full-size models across all manufacturers?

Answer:

SELECT Min(price) AS min_price,
       model,
       manufacturer
FROM   vehicles
WHERE  ( size = "full-size" )
       AND ( price BETWEEN 1 AND 655000 )
GROUP  BY model,
          manufacturer
ORDER  BY min_price; 

Explanation:

  1. Calculating minimum price using MIN() function and storing it in a new column named min_price.
  2. Extracting min_price and corresponding VIN, model and manufacturer w.r.t. the following conditions using the WHERE clause:

Fixing size equal to "full-size"
Setting the price limit from 1 to 655000

3.  Grouping the result w.r.t manufacturer together using GROUP BY.

4.  Arranging the result w.r.t. min_price in ascending order using ORDER BY.

Q3. Compare the average price across manufacturers for the years 2020 and 2021?

Answer:

SELECT table1.manufacturer,
       average_price_2020,
       average_price_2021
FROM   (SELECT Avg(price) AS average_price_2020,
               manufacturer
        FROM   vehicles
        WHERE  ( year = "2020" )
               AND ( price BETWEEN 1 AND 655000 )
        GROUP  BY manufacturer) table1
       INNER JOIN 
       (SELECT Avg(price) AS average_price_2021,
                          manufacturer
        FROM   vehicles
        WHERE  ( year = "2021" )
               AND ( price BETWEEN 1 AND 655000 )
        GROUP  BY manufacturer) table2
ON table1.manufacturer = table2.manufacturer;

Let's first understand what INNER JOIN means.

INNER JOIN keyword

  • INNER JOIN: It creates a new final table by combining column values of two tables (table1 and table2) based upon the column with common entries. The query compares each row of table1 with each row of table2 to find all pairs of rows which are common among the two tables. When this is found, column values for each matched pair of rows of A and B are combined into a result row. It is also called EQUIJOIN.

Syntax:

SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;

Understanding the query below.

Explanation:

We will divide the whole query into small parts and understand each one in depth.

  1. Let's first look at the part of the query given below:
(select avg(price) as average_price_2020,manufacturer
  from vehicles 
  where (year="2020") and (price between 1 and 655000)
  group by manufacturer) table1

Here, we are extracting average_price for year 2020 and manufacturer, w.r.t the following conditions using the WHERE clause:

Fixing year='2020'
Setting price between 1 to 655000

After this, we group our results w.r.t the manufacturer. We call this resultant table, table1.

2.  Now, let's understand another chunk of the query below:

(select avg(price) as average_price_2021,manufacturer
  from vehicles 
  where (year="2021") and (price between 1 and 655000)
  group by manufacturer) table2

Here, we are extracting average_price for year 2021 and manufacturer, w.r.t the following conditions using the WHERE clause:

Fixing year='2021'
Setting price between 1 to 655000

After this, we group our results w.r.t the manufacturer. We call this resultant table, table2.

3.  Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be manufacturer using the query:

on table1.manufacturer=table2.manufacturer

4. Lastly, from this resultant table we extract the following columns: table1.manufacturer, average_price_2020, average_price_2021.

This will give us the final result, by comparing the average prices across manufacturers for the years 2020 and 2021.

Q4. Mention all the models for trucks of the manufacturer that has the cheapest average truck price?

Answer:

select model from vehicles where manufacturer in
  (select manufacturer
  from vehicles 
  where (type="truck") and (price between 1 and 655000)
  group by manufacturer
  order by avg(price)
  limit 1)
and (type='truck') and (price between 1 and 655000)

Explanation:

  1. Extracting manufacturer (second SELECT statement) w.r.t the following conditions using WHERE clause:

Fixing type='truck'
Setting price between 1 to 655000

2.  Grouping the results w.r.t the manufacturer.

3.   Arranging the result in ascending order w.r.t average_price and printing the first row using LIMIT 1.

4.  Finally, extracting model w.r.t the following conditions:

manufacturer is equal to what we obtained from the above result.
type='truck'
Setting price between 1 to 655000

Q5. Compare the average prices of automatic sedans across all manufacturers?

Answer:

select avg(price) as average_price,manufacturer
from vehicles 
where (type="sedan") and (transmission="automatic") and 
 (price between 1 and 655000)
group by manufacturer
order by avg(price)

Explanation:

  1. Calculating average price using AVG() function and storing it in a new column named average_price.
  2. Extracting average_price and corresponding manufacturer, w.r.t. the following conditions:

type='sedan'
transmission='automatic'
Setting price between 1 to 655000

3.   Grouping all the different manufacturers together using GROUP BY.

4.   Arranging the result w.r.t. average price in ascending order using ORDER BY.

Q6. How does the price vary for different vehicle types across distance travelled?

Answer:

select avg(price) as average_price,type,
CASE 
        WHEN odometer >= 0 and odometer <= 20000 then '0-20000'
        WHEN odometer > 20000 and odometer <= 40000 then '20000-40000'
        WHEN odometer > 40000 and odometer <= 60000 then '40000-60000'
        WHEN odometer > 60000 and odometer <= 80000 then '60000-80000'
        WHEN odometer > 80000 and odometer <= 100000 then '80000-100000'
        WHEN odometer > 100000 then '<100000'
        ELSE 'NA'
        end as distance_travelled
from vehicles 
where price between 1 and 655000 
group by type,distance_travelled;
We already know what the CASE statement means (Exercise-1, Question #3), so jumping straight to the explanation.

Explanation:

  1. Calculating average price using AVG() function and storing it in a new column named average_price.
  2. Defining cases for different conditions using the CASE statement, i.e.: when odometer>=0 and odometer <=20000 then we are printing '0-20000' and so on.. and naming this new column distance_travelled using the END command.
  3. Setting the price limit between 1 to 655000.
  4. Grouping the results with respect to the new column distance_travelled.

Q7. What is the difference between the average price for excellent conditioned trucks and new conditioned trucks?

Answer:

SELECT table1.TYPE,
       excellent_average_price,
       new_average_price,
       ( excellent_average_price-new_average_price ) AS price_difference
FROM   (SELECT Avg(price) AS excellent_average_price,
               TYPE
        FROM   vehicles
        WHERE  ( condition = "excellent" )
               AND ( TYPE = "truck" )
               AND ( price BETWEEN 1 AND 655000 )) table1
       INNER JOIN (SELECT Avg(price) AS new_average_price,
                          TYPE
                   FROM   vehicles
                   WHERE  ( condition = "new" )
                          AND ( TYPE = "truck" )
                          AND ( price BETWEEN 1 AND 655000 )) table2
               ON table1.TYPE = table2.TYPE
;

Explanation:

  1. Let's first look at the part of the query given below:
(select avg(price) as excellent_average_price,type
      from vehicles 
      where (condition="excellent") and (type="truck") and 
      (price between 1 and 655000 )) table1

Here, we are extracting average_price as excellent_average_price and type, w.r.t the following conditions using the WHERE clause:

Fixing condition='excellent'
Fixing type='truck'
Setting price between 1 to 655000

We call this resultant table, table1.

2.  Now, let's understand another chunk of the query below:

(select avg(price) as new_average_price,type
      from vehicles 
      where (condition="new") and (type="truck") and 
      (price between 1 and 655000 )) table2

Here, we are extracting average_price as new_average_price and type, w.r.t the following conditions using the WHERE clause:

Fixing condition='new'
Fixing type='truck'
Setting price between 1 to 655000

We call this resultant table, table2.

3.  Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be type using the query:

on table1.type=table2.type.

4.  Lastly, from this resultant table we extract the following columns:

  • table1.type
  • excellent_average_price
  • new_average_price
  • [excellent_average_price - new_average_price] = price_difference. (taking difference of 2 columns).

This will give us the final result by comparing the average price for excellent condition trucks and new condition trucks.

Q8. What is the percentage distribution of trucks across fuel types?

Answer:

SELECT table1.fuel,
       fuel_vehicle_counts,
       total_vehicle_counts,
       ( 100 * fuel_vehicle_counts / total_vehicle_counts )
FROM   (SELECT COUNT(id) AS fuel_vehicle_counts,
               fuel,
               TYPE
        FROM   vehicles
        WHERE  ( price BETWEEN 1 AND 655000 )
               AND ( fuel != "" )
               AND ( TYPE = "truck" )
        GROUP  BY fuel,
                  TYPE) table1
       INNER JOIN (SELECT COUNT(id) AS total_vehicle_counts,
                          TYPE
                   FROM   vehicles
                   WHERE  ( price BETWEEN 1 AND 655000 )
                          AND ( fuel != "" )
                          AND ( TYPE = "truck" )
                   GROUP  BY TYPE) table2
               ON table1.TYPE = table2.TYPE; 

Explanation:

  1. Let's first look at the part of the query given below:
(select COUNT(id) as fuel_vehicle_counts,fuel,type
  from vehicles
  where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
  group by fuel,type) table1

Here, we are extracting average_price as excellent_average_price and type, w.r.t the following conditions using the WHERE clause:

Fixing fuelnot equal to " "
Fixing type='truck'
Setting price between 1 to 655000

Grouping the results w.r.t fuel and type. We call this resultant table, table1.

2.   Now, let's understand another chunk of the query below:

(select COUNT(id) as total_vehicle_counts,type
  from vehicles
  where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
  group by type) table2

Here, we are extracting COUNT(id) (no. of ids) as total_vehicle_counts and type, w.r.t the following conditions using the WHERE clause:

Fixing fuelnot equal to " "
Fixing type='truck'
Setting price between 1 to 655000

Grouping the results w.r.t type. We call this resultant table, table2.

3.   Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be type using the query:

on table1.type=table2.type.

4.   Lastly, from this resultant table we extract the following columns:

  • table1.fuel
  • fuel_vehicle_counts
  • total_vehicle_counts
  • (100*fuel_vehicle_counts/total_vehicle_counts)

This will give us the final result.

Q9. How does the price vary for hatchbacks and SUVs across manufacturers?

Answer:

SELECT table1.manufacturer,
       average_hatchback_price,
       average_suv_price
FROM   (SELECT Avg(price) AS average_hatchback_price,
               manufacturer
        FROM   vehicles
        WHERE  ( TYPE = "hatchback" )
               AND ( price BETWEEN 1 AND 655000 )
        GROUP  BY manufacturer) table1
       INNER JOIN (SELECT Avg(price) AS average_SUV_price,
                          manufacturer
                   FROM   vehicles
                   WHERE  ( TYPE = "suv" )
                          AND ( price BETWEEN 1 AND 655000 )
                   GROUP  BY manufacturer) table2
               ON table1.manufacturer = table2.manufacturer
;

Explanation:

  1. Let's first look at the part of the query given below:
(select avg(price) as average_hatchback_price,manufacturer
	from vehicles 
	where (type="hatchback") and (price between 1 and 655000)
	group by manufacturer) table1

Here, we are extracting average_price as average_hatchback_price and manufacturer, w.r.t the following conditions using the WHERE clause:

Fixing type='hatchback'
Setting price between 1 to 655000

Grouping the result w.r.t the manufacturer. We call this resultant table, table1.

2.  Now, let's understand another chunk of the query below:

(select avg(price) as average_SUV_price,manufacturer
	from vehicles 
	where (type="SUV") and (price between 1 and 655000)
	group by manufacturer) table2

Here, we are extracting average_price as average_SUV_price and manufacturer, w.r.t the following conditions using the WHERE clause:

Fixing type='SUV'
Setting price between 1 to 655000

Grouping the result w.r.t the manufacturer. We call this resultant table, table2.

3.  Next, we use INNER JOIN on table1 and table2 and we specify the join-predicate to be manufacturer using the query:

on table1.manufacturer=table2.manufacturer.

4.  Lastly, from this resultant table we extract the following columns:

table1.manufacturer, average_hatchback_price, average_SUV_price.

This will tell us how the price varies for hatchbacks and SUVs across manufacturers.

Q10. Find out the top three manufacturers that have the highest average price across each region?

Answer:

SELECT region,
       manufacturer,
       average_price,
       rank_col
FROM   (SELECT Avg(price)                    AS average_price,
               manufacturer,
               region,
               RANK()
                 OVER(
                   PARTITION BY region
                   ORDER BY Avg(price) desc) rank_col
        FROM   vehicles
        WHERE  ( price BETWEEN 1 AND 655000 )
               AND ( manufacturer != "" )
        GROUP  BY manufacturer)
WHERE  rank_col <= 3
ORDER  BY region; 

RANK(), OVER() and PARTITION BY() Functions

  • RANK(): It is used to specify ranks for each row within a partition of a result set. It is also known as Window Functions.
  • OVER(): It is used to define a set of rows in the result set.
  • PARTITION BY() : It distributes the rows in the result set into partitions by one or more criteria.

Syntax:

SELECT column_1, 
RANK () OVER ( PARTITION BY column_2
ORDER BY column_1
) AS New_column_name 
FROM table_name;

Explanation:

  1. We extract AVG(price) as average_price, manufacturer, region
  2. We also extract rank for each row using the function rank(). To specify the ranks, we use the criteria that we partition the result w.r.t. region and arrange it in descending order w.r.t avg(price) (using order by) ~saving this in the column name rank_col.
  3. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing manufacturernot equal to " "
Setting price between 1 to 655000

4.  Grouping the above result w.r.t. manufacturer

5.  Finally, we select everything ( SELECT *) from the above output, with the condition that rank_col<=3.

This will give us the top three manufacturers that have the highest average price across each region.

Q11. Mark in which quantiles do the models lie according to their average price?

Answer:

select avg(price) as average_price,model,
ntile(4) over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (model !="") 
group by model

NTILE(N) Function

  • NTILE(n): It is another Window Function. It is used to distribute the number of rows in the specified number of groups (n). Each row group gets it's rank w.r.t a specified condition. We need to specify the value for the desired number of groups inside the parenthesis.

Syntax:

SELECT *, 
       NTILE(2) OVER(
       ORDER BY Column_1 DESC) Rank
FROM Table_name

Explanation:

  1. Extracting AVG(price) as average_price, model
  2. We split the rows in avg(price) (ascending order using order by) into 4 groups using the function ntile(4).
  3. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing modelnot equal to " "
Setting price between 1 to 655000

4. We then group the result w.r.t. the model.

Q12. Mark each manufacturer according to its percentile rank based on its average price?

Answer:

select avg(price) as average_price,manufacturer,
percent_rank() over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (manufacturer !="") 
group by manufacturer

First, let's learn more about thepercent_rank() function.

PERCENT_RANK() Function

  • PERCENT_RANK(): This function in SQL Server calculates the relative rank SQL Percentile of each row. It always returns values greater than 0, and the highest value is 1. It does not count any NULL values. This function is nondeterministic, i.e., even for the same input, can exhibit different outputs on different runs.

Syntax:

PERCENT_RANK() OVER (
      [PARTITION BY column_1]
      ORDER BY column_2 [ASC | DESC], ...
  )

Explanation:

  1. We extract avg(price) as average_price,  manufacturer and get the percentile of each row using percent_rank(). We use percent_rank() over avg(price) (ascending) column.
  2. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing modelnot equal to " "
Setting price between 1 to 655000

3. Grouping the final result w.r.t manufacturer.

Q13. What are the ranks of the manufacturers according to the number of models offered across each vehicle type, also ensure that the ranks are not repeated?

Answer:

select count(distinct model) as models,manufacturer,type,
row_number() over(partition by type order by count(distinct model)) rank_col  
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (type !="") 
group by manufacturer,type;

ROW_NUMBER() Function

  • ROW_NUMBER(): It is used to get a unique sequential number for each row. It gives the rank one for the first row and then increments the value by one for each row. We get different ranks for the row having similar values as well.

Syntax:

PERCENT_RANK() OVER (
      [PARTITION BY column_1]
      ORDER BY column_2 [ASC | DESC], ...
  )

Explanation:

  1. We extract the unique number of models using count(distinct model) and save it in the column models.
  2. We also extract manufacturer, type and get the unique sequential number for each row using row_number().
  3. We use row_number over type. Then, we sort it by arranging unique no. of models in ascending order using the code: order by count(distinct model). We save this in a column named rank_col.
  4. We want the above result w.r.t the following conditions using the WHERE clause:

Fixing manufacturernot equal to " "
Fixing typenot equal to " "
Setting price between 1 to 655000

5.  Grouping the final result w.r.t manufacturerand type.

Q14. Find out the cheapest 3 models for each manufacturer, mention all models in case of a tie?

Answer:

select * from (
  select avg(price) as average_price,manufacturer,model,
  rank() over(partition by manufacturer order by avg(price)) rank_col  
  from vehicles
  where (price between 1 and 655000) and (manufacturer !="") and (model   
  !="") 
  group by manufacturer,model)
where rank_col<=3

Refer to Question 10 for the explanation.

Q15. Find out the top 5 models that have the highest average price across all vehicle types?

Answer:

select * from (
  select avg(price) as average_price,type,model,
  rank() over(partition by type order by avg(price) desc)  rank_col  
  from vehicles
  where (price between 1 and 655000) and (type !="") and (model !="") 
  group by type,model)
where rank_col<=5
Refer to Question 10 for the explanation.

Conclusion

Congratulations on making it this far!

The more you practice, the easier it will be to write similar queries regardless of whether you are a beginner or not.

If you enjoyed these problems and want to practice more, click on the link below for more exciting yet challenging questions on the same dataset.

Getting Started with SQL Queries - Exercises for Beginners Part-3

Hope you enjoyed the post. See you in the next one.