# Introduction

Welcome back to Exercise-2 on the SQL Queries for Beginners.

In the first part of the SQL Queries for Beginners, we discussed and learnt about some of the basic yet very important functions of SQL, namely,

`SELECT`
`FROM`
`WHERE`
`GROUP BY`
`ORDER BY`
`CASE`

If you're new here we recommend you to start with our first exercise by clicking on the link provided below:

Getting Started with SQL Queries - Exercises for Beginners Part-1

If you've already completed the exercises in the previous tutorial or if you are already a little ahead in SQL, then you're ready to try this section.

We will keep practising the questions on the same dataset as before, i.e., "Used Car Dataset."

To learn how to run your queries on Dbeaver or Python, visit the link below:

Without further ado, let's start with the next set of questions we have for you.

💡
To know more about such content - follow us on Twitter.

## Q1. Mention the costliest 8 cylinder model across each manufacturer?

``````SELECT Max(price) AS max_price,
cylinders,
model,
manufacturer
FROM   vehicles
WHERE  ( cylinders = "8 cylinders" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer
ORDER  BY max_price DESC; ``````

Explanation:

1. Calculating maximum price using `MAX()` function and storing it in a new column named `max_price`.
2. Extracting `max_price` and corresponding `cylinders`, `model` and `manufacturer` w.r.t. the following conditions using the `WHERE` clause:

Fixing `cylinders` equal to "8 cylinders"
Setting the price limit from 1 to 655000

3.  Grouping the result w.r.t `manufacturer` together using `GROUP BY`.

4.  Arranging the result w.r.t `max_price` in descending order using `ORDER BY`.

## Q2. What are the cheapest full-size models across all manufacturers?

``````SELECT Min(price) AS min_price,
model,
manufacturer
FROM   vehicles
WHERE  ( size = "full-size" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY model,
manufacturer
ORDER  BY min_price; ``````

Explanation:

1. Calculating minimum price using `MIN()` function and storing it in a new column named `min_price`.
2. Extracting `min_price` and corresponding `VIN`, `model` and `manufacturer` w.r.t. the following conditions using the `WHERE` clause:

Fixing `size` equal to "full-size"
Setting the price limit from 1 to 655000

3.  Grouping the result w.r.t `manufacturer` together using `GROUP BY`.

4.  Arranging the result w.r.t. `min_price` in ascending order using `ORDER BY`.

## Q3. Compare the average price across manufacturers for the years 2020 and 2021?

``````SELECT table1.manufacturer,
average_price_2020,
average_price_2021
FROM   (SELECT Avg(price) AS average_price_2020,
manufacturer
FROM   vehicles
WHERE  ( year = "2020" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer) table1
INNER JOIN
(SELECT Avg(price) AS average_price_2021,
manufacturer
FROM   vehicles
WHERE  ( year = "2021" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer) table2
ON table1.manufacturer = table2.manufacturer;``````

Let's first understand what `INNER JOIN` means.

### INNER JOIN keyword

• `INNER JOIN`: It creates a new final table by combining column values of two tables (table1 and table2) based upon the column with common entries. The query compares each row of table1 with each row of table2 to find all pairs of rows which are common among the two tables. When this is found, column values for each matched pair of rows of A and B are combined into a result row. It is also called `EQUIJOIN`.

Syntax:

``````SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;``````

Understanding the query below.

Explanation:

We will divide the whole query into small parts and understand each one in depth.

1. Let's first look at the part of the query given below:
``````(select avg(price) as average_price_2020,manufacturer
from vehicles
where (year="2020") and (price between 1 and 655000)
group by manufacturer) table1``````

Here, we are extracting `average_price` for year 2020 and `manufacturer`, w.r.t the following conditions using the `WHERE` clause:

Fixing `year`='2020'
Setting `price` between 1 to 655000

After this, we group our results w.r.t the `manufacturer`. We call this resultant table, `table1`.

2.  Now, let's understand another chunk of the query below:

``````(select avg(price) as average_price_2021,manufacturer
from vehicles
where (year="2021") and (price between 1 and 655000)
group by manufacturer) table2``````

Here, we are extracting `average_price` for year 2021 and `manufacturer`, w.r.t the following conditions using the `WHERE` clause:

Fixing `year`='2021'
Setting `price` between 1 to 655000

After this, we group our results w.r.t the `manufacturer`. We call this resultant table, `table2`.

3.  Next, we use `INNER JOIN` on `table1` and `table2` and we specify the join-predicate to be `manufacturer` using the query:

`on table1.manufacturer=table2.manufacturer`

4. Lastly, from this resultant table we extract the following columns: `table1.manufacturer`, `average_price_2020`, `average_price_2021`.

This will give us the final result, by comparing the average prices across manufacturers for the years 2020 and 2021.

## Q4. Mention all the models for trucks of the manufacturer that has the cheapest average truck price?

``````select model from vehicles where manufacturer in
(select manufacturer
from vehicles
where (type="truck") and (price between 1 and 655000)
group by manufacturer
order by avg(price)
limit 1)
and (type='truck') and (price between 1 and 655000)``````

Explanation:

1. Extracting `manufacturer` (second `SELECT` statement) w.r.t the following conditions using `WHERE` clause:

Fixing `type`='truck'
Setting `price` between 1 to 655000

2.  Grouping the results w.r.t the `manufacturer`.

3.   Arranging the result in ascending order w.r.t `average_price` and printing the first row using `LIMIT 1`.

4.  Finally, extracting `model` w.r.t the following conditions:

`manufacturer` is equal to what we obtained from the above result.
`type`='truck'
Setting `price` between 1 to 655000

## Q5. Compare the average prices of automatic sedans across all manufacturers?

``````select avg(price) as average_price,manufacturer
from vehicles
where (type="sedan") and (transmission="automatic") and
(price between 1 and 655000)
group by manufacturer
order by avg(price)``````

Explanation:

1. Calculating average price using `AVG()` function and storing it in a new column named `average_price`.
2. Extracting `average_price` and corresponding `manufacturer`, w.r.t. the following conditions:

`type`='sedan'
`transmission`='automatic'
Setting `price` between 1 to 655000

3.   Grouping all the different manufacturers together using `GROUP BY`.

4.   Arranging the result w.r.t. average price in ascending order using `ORDER BY`.

## Q6. How does the price vary for different vehicle types across distance travelled?

``````select avg(price) as average_price,type,
CASE
WHEN odometer >= 0 and odometer <= 20000 then '0-20000'
WHEN odometer > 20000 and odometer <= 40000 then '20000-40000'
WHEN odometer > 40000 and odometer <= 60000 then '40000-60000'
WHEN odometer > 60000 and odometer <= 80000 then '60000-80000'
WHEN odometer > 80000 and odometer <= 100000 then '80000-100000'
WHEN odometer > 100000 then '<100000'
ELSE 'NA'
end as distance_travelled
from vehicles
where price between 1 and 655000
group by type,distance_travelled;
``````
We already know what the `CASE` statement means (Exercise-1, Question #3), so jumping straight to the explanation.

Explanation:

1. Calculating average price using `AVG()` function and storing it in a new column named `average_price`.
2. Defining cases for different conditions using the `CASE` statement, i.e.: when odometer>=0 and odometer <=20000 then we are printing '0-20000' and so on.. and naming this new column `distance_travelled` using the `END` command.
3. Setting the price limit between 1 to 655000.
4. Grouping the results with respect to the new column `distance_travelled`.

## Q7. What is the difference between the average price for excellent conditioned trucks and new conditioned trucks?

``````SELECT table1.TYPE,
excellent_average_price,
new_average_price,
( excellent_average_price-new_average_price ) AS price_difference
FROM   (SELECT Avg(price) AS excellent_average_price,
TYPE
FROM   vehicles
WHERE  ( condition = "excellent" )
AND ( TYPE = "truck" )
AND ( price BETWEEN 1 AND 655000 )) table1
INNER JOIN (SELECT Avg(price) AS new_average_price,
TYPE
FROM   vehicles
WHERE  ( condition = "new" )
AND ( TYPE = "truck" )
AND ( price BETWEEN 1 AND 655000 )) table2
ON table1.TYPE = table2.TYPE
;``````

Explanation:

1. Let's first look at the part of the query given below:
``````(select avg(price) as excellent_average_price,type
from vehicles
where (condition="excellent") and (type="truck") and
(price between 1 and 655000 )) table1``````

Here, we are extracting `average_price` as `excellent_average_price` and `type`, w.r.t the following conditions using the `WHERE` clause:

Fixing `condition`='excellent'
Fixing `type`='truck'
Setting `price` between 1 to 655000

We call this resultant table, `table1`.

2.  Now, let's understand another chunk of the query below:

``````(select avg(price) as new_average_price,type
from vehicles
where (condition="new") and (type="truck") and
(price between 1 and 655000 )) table2``````

Here, we are extracting `average_price` as `new_average_price` and `type`, w.r.t the following conditions using the `WHERE` clause:

Fixing `condition`='new'
Fixing `type`='truck'
Setting `price` between 1 to 655000

We call this resultant table, `table2`.

3.  Next, we use `INNER JOIN` on `table1` and `table2` and we specify the join-predicate to be `type` using the query:

`on table1.type=table2.type`.

4.  Lastly, from this resultant table we extract the following columns:

• `table1.type`
• `excellent_average_price`
• `new_average_price`
• [`excellent_average_price` - `new_average_price`] = `price_difference`. (taking difference of 2 columns).

This will give us the final result by comparing the average price for excellent condition trucks and new condition trucks.

## Q8. What is the percentage distribution of trucks across fuel types?

``````SELECT table1.fuel,
fuel_vehicle_counts,
total_vehicle_counts,
( 100 * fuel_vehicle_counts / total_vehicle_counts )
FROM   (SELECT COUNT(id) AS fuel_vehicle_counts,
fuel,
TYPE
FROM   vehicles
WHERE  ( price BETWEEN 1 AND 655000 )
AND ( fuel != "" )
AND ( TYPE = "truck" )
GROUP  BY fuel,
TYPE) table1
INNER JOIN (SELECT COUNT(id) AS total_vehicle_counts,
TYPE
FROM   vehicles
WHERE  ( price BETWEEN 1 AND 655000 )
AND ( fuel != "" )
AND ( TYPE = "truck" )
GROUP  BY TYPE) table2
ON table1.TYPE = table2.TYPE; ``````

Explanation:

1. Let's first look at the part of the query given below:
``````(select COUNT(id) as fuel_vehicle_counts,fuel,type
from vehicles
where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
group by fuel,type) table1``````

Here, we are extracting average_price as excellent_average_price and type, w.r.t the following conditions using the WHERE clause:

Fixing `fuel`not equal to " "
Fixing `type`='truck'
Setting `price` between 1 to 655000

Grouping the results w.r.t `fuel` and `type`. We call this resultant table, table1.

2.   Now, let's understand another chunk of the query below:

``````(select COUNT(id) as total_vehicle_counts,type
from vehicles
where  (price between 1 and 655000) and (fuel!="")  and (type="truck")
group by type) table2``````

Here, we are extracting `COUNT(id)` (no. of ids) as `total_vehicle_counts` and type, w.r.t the following conditions using the WHERE clause:

Fixing `fuel`not equal to " "
Fixing `type`='truck'
Setting `price` between 1 to 655000

Grouping the results w.r.t `type`. We call this resultant table, table2.

3.   Next, we use `INNER JOIN` on table1 and table2 and we specify the join-predicate to be `type` using the query:

`on table1.type=table2.type`.

4.   Lastly, from this resultant table we extract the following columns:

• `table1.fuel`
• `fuel_vehicle_counts`
• `total_vehicle_counts`
• `(100*fuel_vehicle_counts/total_vehicle_counts)`

This will give us the final result.

## Q9. How does the price vary for hatchbacks and SUVs across manufacturers?

``````SELECT table1.manufacturer,
average_hatchback_price,
average_suv_price
FROM   (SELECT Avg(price) AS average_hatchback_price,
manufacturer
FROM   vehicles
WHERE  ( TYPE = "hatchback" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer) table1
INNER JOIN (SELECT Avg(price) AS average_SUV_price,
manufacturer
FROM   vehicles
WHERE  ( TYPE = "suv" )
AND ( price BETWEEN 1 AND 655000 )
GROUP  BY manufacturer) table2
ON table1.manufacturer = table2.manufacturer
;``````

Explanation:

1. Let's first look at the part of the query given below:
``````(select avg(price) as average_hatchback_price,manufacturer
from vehicles
where (type="hatchback") and (price between 1 and 655000)
group by manufacturer) table1``````

Here, we are extracting `average_price` as `average_hatchback_price` and `manufacturer`, w.r.t the following conditions using the `WHERE` clause:

Fixing `type`='hatchback'
Setting `price` between 1 to 655000

Grouping the result w.r.t the `manufacturer`. We call this resultant table, `table1`.

2.  Now, let's understand another chunk of the query below:

``````(select avg(price) as average_SUV_price,manufacturer
from vehicles
where (type="SUV") and (price between 1 and 655000)
group by manufacturer) table2``````

Here, we are extracting `average_price` as `average_SUV_price` and `manufacturer`, w.r.t the following conditions using the `WHERE` clause:

Fixing `type`='SUV'
Setting `price` between 1 to 655000

Grouping the result w.r.t the `manufacturer`. We call this resultant table, `table2`.

3.  Next, we use `INNER JOIN` on `table1` and `table2` and we specify the join-predicate to be `manufacturer` using the query:

`on table1.manufacturer=table2.manufacturer`.

4.  Lastly, from this resultant table we extract the following columns:

`table1.manufacturer`, `average_hatchback_price`, `average_SUV_price`.

This will tell us how the price varies for hatchbacks and SUVs across manufacturers.

## Q10. Find out the top three manufacturers that have the highest average price across each region?

``````SELECT region,
manufacturer,
average_price,
rank_col
FROM   (SELECT Avg(price)                    AS average_price,
manufacturer,
region,
RANK()
OVER(
PARTITION BY region
ORDER BY Avg(price) desc) rank_col
FROM   vehicles
WHERE  ( price BETWEEN 1 AND 655000 )
AND ( manufacturer != "" )
GROUP  BY manufacturer)
WHERE  rank_col <= 3
ORDER  BY region; ``````

### RANK(), OVER() and PARTITION BY() Functions

• `RANK()`: It is used to specify ranks for each row within a partition of a result set. It is also known as Window Functions.
• `OVER()`: It is used to define a set of rows in the result set.
• `PARTITION BY()` : It distributes the rows in the result set into partitions by one or more criteria.

Syntax:

``````SELECT column_1,
RANK () OVER ( PARTITION BY column_2
ORDER BY column_1
) AS New_column_name
FROM table_name;``````

Explanation:

1. We extract `AVG(price)` as `average_price`, `manufacturer`, `region`
2. We also extract rank for each row using the function `rank()`. To specify the ranks, we use the criteria that we partition the result w.r.t. `region` and arrange it in descending order w.r.t `avg(price)` (using order by) ~saving this in the column name `rank_col`.
3. We want the above result w.r.t the following conditions using the `WHERE` clause:

Fixing `manufacturer`not equal to " "
Setting `price` between 1 to 655000

4.  Grouping the above result w.r.t. `manufacturer`

5.  Finally, we select everything ( `SELECT *`) from the above output, with the condition that `rank_col`<=3.

This will give us the top three manufacturers that have the highest average price across each region.

## Q11. Mark in which quantiles do the models lie according to their average price?

``````select avg(price) as average_price,model,
ntile(4) over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (model !="")
group by model``````

### NTILE(N) Function

• `NTILE(n)`: It is another Window Function. It is used to distribute the number of rows in the specified number of groups (n). Each row group gets it's rank w.r.t a specified condition. We need to specify the value for the desired number of groups inside the parenthesis.

Syntax:

``````SELECT *,
NTILE(2) OVER(
ORDER BY Column_1 DESC) Rank
FROM Table_name``````

Explanation:

1. Extracting `AVG(price)` as `average_price`, `model`
2. We split the rows in `avg(price)` (ascending order using `order by`) into 4 groups using the function `ntile(4)`.
3. We want the above result w.r.t the following conditions using the `WHERE` clause:

Fixing `model`not equal to " "
Setting `price` between 1 to 655000

4. We then group the result w.r.t. the `model`.

## Q12. Mark each manufacturer according to its percentile rank based on its average price?

``````select avg(price) as average_price,manufacturer,
percent_rank() over (order by avg(price))
from vehicles
where (price between 1 and 655000) and (manufacturer !="")
group by manufacturer``````

First, let's learn more about the`percent_rank()` function.

### PERCENT_RANK() Function

• `PERCENT_RANK()`: This function in SQL Server calculates the relative rank SQL Percentile of each row. It always returns values greater than 0, and the highest value is 1. It does not count any NULL values. This function is nondeterministic, i.e., even for the same input, can exhibit different outputs on different runs.

Syntax:

``````PERCENT_RANK() OVER (
[PARTITION BY column_1]
ORDER BY column_2 [ASC | DESC], ...
)``````

Explanation:

1. We extract `avg(price)` as `average_price`,  `manufacturer` and get the percentile of each row using `percent_rank()`. We use `percent_rank()` over `avg(price)` (ascending) column.
2. We want the above result w.r.t the following conditions using the `WHERE` clause:

Fixing `model`not equal to " "
Setting `price` between 1 to 655000

3. Grouping the final result w.r.t `manufacturer`.

## Q13. What are the ranks of the manufacturers according to the number of models offered across each vehicle type, also ensure that the ranks are not repeated?

``````select count(distinct model) as models,manufacturer,type,
row_number() over(partition by type order by count(distinct model)) rank_col
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (type !="")
group by manufacturer,type;``````

### ROW_NUMBER() Function

• `ROW_NUMBER()`: It is used to get a unique sequential number for each row. It gives the rank one for the first row and then increments the value by one for each row. We get different ranks for the row having similar values as well.

Syntax:

``````PERCENT_RANK() OVER (
[PARTITION BY column_1]
ORDER BY column_2 [ASC | DESC], ...
)``````

Explanation:

1. We extract the unique number of models using `count(distinct model)` and save it in the column `models`.
2. We also extract `manufacturer`, `type` and get the unique sequential number for each row using `row_number()`.
3. We use `row_number` over `type`. Then, we sort it by arranging unique no. of models in ascending order using the code: `order by count(distinct model)`. We save this in a column named `rank_col`.
4. We want the above result w.r.t the following conditions using the `WHERE` clause:

Fixing `manufacturer`not equal to " "
Fixing `type`not equal to " "
Setting `price` between 1 to 655000

5.  Grouping the final result w.r.t `manufacturer`and `type`.

## Q14. Find out the cheapest 3 models for each manufacturer, mention all models in case of a tie?

``````select * from (
select avg(price) as average_price,manufacturer,model,
rank() over(partition by manufacturer order by avg(price)) rank_col
from vehicles
where (price between 1 and 655000) and (manufacturer !="") and (model
!="")
group by manufacturer,model)
where rank_col<=3

``````
Refer to Question 10 for the explanation.

## Q15. Find out the top 5 models that have the highest average price across all vehicle types?

``````select * from (
select avg(price) as average_price,type,model,
rank() over(partition by type order by avg(price) desc)  rank_col
from vehicles
where (price between 1 and 655000) and (type !="") and (model !="")
group by type,model)
where rank_col<=5
``````
Refer to Question 10 for the explanation.

# Conclusion

Congratulations on making it this far!

The more you practice, the easier it will be to write similar queries regardless of whether you are a beginner or not.

If you enjoyed these problems and want to practice more, click on the link below for more exciting yet challenging questions on the same dataset.

Getting Started with SQL Queries - Exercises for Beginners Part-3

Hope you enjoyed the post. See you in the next one.