How to be an awesome Android Marketplace Data Analyst using SQL -Part 3
Introduction
Welcome to the third and the final part of the series on 'Data Analysis using SQL' with a real-life Android Apps Dataset to answer some analytics-oriented questions.
We hope you are finding the series an efficient way to learn SQL as well as Data Analysis. Visit the two parts in case you haven't;
Let's have a quick recap on the dataset we are using:
COLUMN | DESCRIPTION |
App | Name of the Play Store app |
Category | The category of the app ( GAME, BUSINESS, FAMILY, etc.) |
Rating | Rating of the app out of 5 |
Reviews | Number of reviews for the app on Play Store |
Size | Size of the app in MBs |
Installs | Number of installs |
Type | Whether the app is free or paid |
Price | Price of the app |
Content Rating | Who is this app meant for? (Teen etc.) |
Genres | Style or category of the app ( Entertainment, Tools, etc.) |
Last Updated | The date on which the app was last updated |
Current Ver | The current version of the app |
Android Ver | The Android Version the app is capable with |
We will be writing our queries on Falcon SQL editor. Use this easy guide to set up Falcon on your computer and load the database into it, in case you haven't already.
Seems like you are all set. Let's solve some business questions using SQL !
1. Every App in the database is compatible with some version of Android given in the column named 'Android Ver'.
Find out the number of Apps compatible with each version. Which Android Version has the most compatible Apps?
Answer
SELECT [Android Ver],
Count(App)
FROM androidapps
GROUP BY [Android Ver]
ORDER BY Count(App) DESC;
Output
The most number of Apps are compatible with Android Version 4.1 and up that is 2202 Apps. The second and third on the list are Versions 4.0.3 and up and 4.0 and up with 1395 and 1285 each respectively.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,COUNT()
,GROUP BY
and ORDER BY
To know about these statements in detail refer to our post on beginner SQL queries
• COUNT()
is used to count the number of Apps in each 'Android Version'.
• GROUP BY
is used to group the Apps in different types of 'Android Version'.
•ORDER BY
the keyword is used to sort the result-set in ascending or descending order. ORDER BY
keyword sorts the records in ascending order by default. To sort the records in descending order we use the DESC
keyword along with ORDER BY
.
2. Find out the total number of Apps that are compatible with Android version 5.0 or above?
Answer
SELECT Count(app),
[android ver]
FROM androidapps
WHERE [android ver] > 5.0
GROUP BY [android ver]
SELECT SUM(app)
FROM androidapps
WHERE [android ver] > 5.0;
Output
Total Apps compatible with Android Versions 5.0 and above are 3299 which is way lesser than previous Android Versions combined i.e. 4.1 , 4.0.3, etc.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,COUNT()
, SUM(),WHERE, and GROUP BY
.
•SUM()
function returns the total sum of a numeric column.
• WHERE
clause is used to filter records.
The following operators can be used in the WHERE
clause:
Operator | Description |
= | Equal |
> | Greater than |
< | Less than |
>= | Greater than or equal |
<= | Less than or equal |
<> | Not equal. Note: In some versions of SQL this operator may be written as != |
BETWEEN | Between a certain range |
LIKE | Search for a pattern |
IN | To specify multiple possible values for a column |
3. Check out which 'Content Rating' of Apps has the maximum number of Installs and fetch a Top 10 Apps list (by installs) in that category?
Answer
SELECT SUM(installs) AS Total_Installs,
[Content Rating]
FROM androidapps
GROUP BY [Content Rating]
ORDER BY total_installs DESC;
SELECT App,
Installs
FROM androidapps
WHERE [Content Rating] = 'Everyone'
ORDER BY Installs DESC
LIMIT 10;
Output
The Apps with Content Rating as 'Everyone' has the most number of Installs that is 52179352961 installs. The top Apps in that Content Rating category are Messenger, Google Chrome, Gmail, Hangouts, etc.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,AS
, SUM(),WHERE,GROUP BY
and LIMIT
.
• MySQL supports the LIMIT
clause to select a limited number of records.Whereas, Oracle uses FETCH FIRST n ROWS ONLY
and ROWNUM
and SQL Server and MS Access uses SELECT TOP
clause is used to specify the number of records to return.
4. Are paid apps downloaded as much as free apps?
Answer
SELECT Sum(Installs)AS Total_Installs,
Type
FROM androidapps
GROUP BY Type;
Output
The number of total Installs of Free Apps is 75065572646 whereas the number of total Installs of Paid Apps is 57364881. There is a percentage difference of 199.695% (Woah!). It can be said that Free Apps are more likely to be installed than Paid Apps.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,SUM()
and GROUP BY
5. Does the Price of an App depend on its Rating?
Answer
SELECT Avg(Price),
Rating
FROM androidapps
GROUP BY Rating
ORDER BY Avg(Price) DESC;
Output
There is no such dependence of Ratings over the Pricing of an App. The Highest Average Price is for the Apps with Rating 2.9 which is a relatively lower rating. The second highest Price of Apps is for Rating 3.8.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,AVG(),GROUP BY
and ORDER BY
.
•AVG()
function returns the average value of a numeric column.
6. Which Category of Apps do users don't mind paying for?
Answer
SELECT Sum(Installs),
Category
FROM androidapps
WHERE type = 'Paid'
GROUP BY Category
ORDER BY Sum(Installs) DESC;
Output
Despite the Paid nature of the Apps, Users tend to install the Apps in the Category of Games. I guess that is understandable! Second, on the list is the Category Family with 20649814 installs.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,SUM()
,WHERE
, GROUP BY
and ORDER BY
.
7. Which Content Rating of Paid Apps have the users spent the most on?
Answer
SELECT [content rating],
( installs * price ) AS Expenditure
FROM androidapps
WHERE type = 'Paid'
GROUP BY [content rating]
ORDER BY expenditure DESC;
Output
The Content Rating on which the Users have spent the most is the 'Teen' category. The total expenditure on this Category is $169899.99999999997.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,WHERE
,AS,GROUP BY
and ORDER BY
.
Conclusion
Well, that was the end of our 'Data Analysis using SQL' series. We hope you found the series informative and productive by not only learning about the SQL statements and functions but also applying them to a real-world scenario.
Check out our other series about the same using the 'Used Cars' dataset:
We hope to bring more informative content like this your way. Suppose you want more articles, tutorials, Data World updates delivered directly to your inbox - subscribe.
See you soon!