How to be an awesome Android Marketplace Data Analyst using SQL -Part 3

Introduction

Welcome to the third and the final part of the series on 'Data Analysis using SQL' with a real-life  Android Apps Dataset to answer some analytics-oriented questions.

We hope you are finding the series an efficient way to learn SQL as well as Data Analysis. Visit the two parts in case you haven't;

How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1
This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.
How to be an awesome Android Marketplace Data Analyst using SQL -Part 2
Part two of our ‘Data Analysis using SQL’ using the real-life Android Apps Dataset. A guide to answering some business questions using SQL.

Let's have a quick recap on the dataset we are using:

COLUMNDESCRIPTION
AppName of the Play Store app
Category The category of the app ( GAME, BUSINESS, FAMILY, etc.)
RatingRating of the app out of 5
Reviews Number of reviews for the app on Play Store
Size Size of the app in MBs
Installs Number of installs
TypeWhether the app is free or paid
PricePrice of the app
Content RatingWho is this app meant for? (Teen etc.)
GenresStyle or category of the app ( Entertainment, Tools, etc.)
Last UpdatedThe date on which the app was last updated
Current VerThe current version of the app
Android VerThe Android Version the app is capable with

We will be writing our queries on Falcon SQL editor. Use this easy guide to set up Falcon on your computer and load the database into it, in case you haven't already.

Write SQL queries effortlessly with Falcon by Plotly
This post guides you through downloading and setting up Falcon - a free, open-source SQL editor which enables you to write your SQL queries without the hassle and moreover gives you an option to visualize the results.

Seems like you are all set. Let's solve some business questions using SQL !

✔️
Follow us on Twitter to know more about such posts.

1. Every App in the database is compatible with some version of Android given in the column named 'Android Ver'.
Find out the number of Apps compatible with each version. Which Android Version has the most compatible Apps?


Answer

SELECT [Android Ver],
       Count(App)
FROM   androidapps
GROUP  BY [Android Ver]
ORDER  BY Count(App) DESC;  

Output

The most number of Apps are compatible with Android Version 4.1 and up that is 2202 Apps.  The second and third on the list are Versions 4.0.3 and up and 4.0 and up with 1395 and 1285 each respectively.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,COUNT(),GROUP BY and ORDER BY

To know about these statements in detail refer to our post on beginner SQL queries

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

COUNT()is used to count the number of Apps in each 'Android Version'.

GROUP BYis used to group the Apps in different types of 'Android Version'.

ORDER BY  the keyword is used to sort the result-set in ascending or descending order. ORDER BY  keyword sorts the records in ascending order by default. To sort the records in descending order we use the DESC keyword along with ORDER BY.

2. Find out the total number of Apps that are compatible with Android version 5.0 or above?

Answer

SELECT Count(app),
       [android ver]
FROM   androidapps
WHERE  [android ver] > 5.0
GROUP  BY [android ver] 


SELECT SUM(app)
FROM   androidapps
WHERE  [android ver] > 5.0;

Output

Total Apps compatible with Android Versions 5.0 and above are 3299 which is way lesser than previous Android Versions combined i.e. 4.1 , 4.0.3, etc.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,COUNT(), SUM(),WHERE, and GROUP BY .

SUM() function returns the total sum of a numeric column.

WHERE clause is used to filter records.

The following operators can be used in the WHERE clause:

OperatorDescription
=Equal
>Greater than
<Less than
>=Greater than or equal
<=Less than or equal
<>Not equal. Note: In some versions of SQL this operator may be written as !=
BETWEENBetween a certain range
LIKESearch for a pattern
INTo specify multiple possible values for a column

3. Check out which 'Content Rating' of Apps has the maximum number of Installs and fetch a Top 10 Apps list (by installs) in that category?

Answer

SELECT SUM(installs) AS Total_Installs,
       [Content Rating]
FROM   androidapps
GROUP  BY [Content Rating]
ORDER  BY total_installs DESC;
SELECT App,
       Installs
FROM   androidapps
WHERE  [Content Rating] = 'Everyone'
ORDER  BY Installs DESC
LIMIT  10; 

Output

Top Ten apps by Installs 
The Apps with Content Rating as 'Everyone' has the most number of Installs that is 52179352961 installs. The top Apps in that Content Rating category are Messenger, Google Chrome, Gmail, Hangouts, etc.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,AS, SUM(),WHERE,GROUP BY and LIMIT.

• MySQL supports the LIMIT clause to select a limited number of records.Whereas, Oracle uses FETCH FIRST n ROWS ONLY and ROWNUM and SQL Server and MS Access uses SELECT TOP clause is used to specify the number of records to return.

4. Are paid apps downloaded as much as free apps?

Answer

SELECT Sum(Installs)AS Total_Installs,
       Type
FROM   androidapps
GROUP  BY Type; 

Output

The number of total Installs of Free Apps is 75065572646 whereas the number of total Installs of Paid Apps is 57364881. There is a percentage difference of 199.695% (Woah!). It can be said that Free Apps are more likely to be installed than Paid Apps.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,SUM() and GROUP BY

5. Does the Price of an App depend on its Rating?

Answer

SELECT Avg(Price),
       Rating
FROM   androidapps
GROUP  BY Rating
ORDER  BY Avg(Price) DESC; 

Output

There is no such dependence of Ratings over the Pricing of an App. The Highest Average Price is for the Apps with Rating 2.9 which is a relatively lower rating. The second highest Price of Apps is for Rating 3.8.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,AVG(),GROUP BY and ORDER BY.

AVG() function returns the average value of a numeric column.

6. Which Category of Apps do users don't mind paying for?

Answer

SELECT Sum(Installs),
       Category
FROM   androidapps
WHERE  type = 'Paid'
GROUP  BY Category
ORDER  BY Sum(Installs) DESC; 

Output

Despite the Paid nature of the Apps, Users tend to install the Apps in the Category of Games. I guess that is understandable! Second, on the list is the Category Family with 20649814 installs.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,SUM(),WHERE, GROUP BY and ORDER BY.

7. Which Content Rating of Paid Apps have the users spent the most on?

Answer

SELECT [content rating],
       ( installs * price ) AS Expenditure
FROM   androidapps
WHERE  type = 'Paid'
GROUP  BY [content rating]
ORDER  BY expenditure DESC;

Output

The Content Rating on which the Users have spent the most is the 'Teen' category. The total expenditure on this Category is $169899.99999999997.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,WHERE,AS,GROUP BY and ORDER BY.


Conclusion

Well, that was the end of our 'Data Analysis using SQL' series. We hope you found the series informative and productive by not only learning about the SQL statements and functions but also applying them to a real-world scenario.

Check out our other series about the same using the 'Used Cars' dataset:

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

We hope to bring more informative content like this your way. Suppose you want more articles, tutorials, Data World updates delivered directly to your inbox - subscribe.

See you soon!