How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1
This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.
Introduction
SQL is a highly sought-after Analytics skill that is needed when looking for jobs in the Analytics industry. Why is that? Well, the companies you might want to apply to store all their data in relational databases like MySQL, SQL Server, MS Access, or PostgreSQL. To retrieve and conduct further analysis on the data, SQL is a must. So, when searching for a job in the Analytics domain, every candidate would be good in languages like R/Python but adding SQL to your skills is what sets you apart.
Every now and then we come across SQL functions (such as SELECT, WHERE, ORDER BY), but how do we apply them to a real-life dataset?
To learn that - This post will guide you through the SQL queries to answer some analytical questions using a real-life Android Apps Dataset.
But first, let's introduce you to the data;
We will be using Falcon to write our SQL queries. Falcon is a free, open-source SQL editor with inline data visualization. Use this easy guide to set up Falcon on your computer and load the database into it.
Now, that you are all set with the database and the SQL editor, let's begin with some business problems.
1. We can begin the analysis by obtaining the number of distinct Categories and Genres
Answer:
SELECT Count(DISTINCT category) AS Categories,
Count(DISTINCT genres) AS Genres,
FROM androidapps
Output
There are 33 Categories and 118 Genres of Apps in the Play Store Database.
To obtain the number of distinct categories and genres, we have used SELECT
, FROM
, COUNT(DISTINCT )
statements.COUNT()
would show a result of all records whileCOUNT(DISTINCT )
will show a result of only distinct records.
To know about these statements in detail refer to our post on beginner SQL queries
2. There are two 'Types' of Apps, namely Free and Paid Apps. Obtain the number of Paid as well as Free Apps?
Answer:
SELECT type,
Count(*)
FROM androidapps
GROUP BY type
Output
There are 8902 Free Apps and 756 Paid Apps in the Play Store Database, the results are visualized using a bar chart.
To obtain the number of Free and Paid Apps SELECT
,GROUP BY
, COUNT()
statements are used.
•COUNT()
is used to count the number of Apps in each 'Type'.
•GROUP BY
is used to group the Apps in 'Types' i.e. Free v/s Paid.
3. The popularity of Apps can be assessed by the number of installs. Find out the most popular App category?
Answer:
SELECT category,
installs
FROM androidapps
GROUP BY category
ORDER BY installs DESC
Output
The most popular App Categories are ENTERTAINMENT, LIBRARIES_AND_DEMO, LIFESTYLE, and PHOTOGRAPHY with 10,000 installs each.
Google Play Store Research says "Since the installs parameter is independent and not correlated to any other parameters, we must use installs to show the popularity of an app."
To derive the most popular Category of Apps, we have used SELECT
,FROM
,GROUP BY
and ORDER BY
statement.
•ORDER BY
the keyword is used to sort the result-set in ascending or descending order. ORDER BY
keyword sorts the records in ascending order by default. To sort the records in descending order we use the DESC
keyword along with ORDER BY
.
4. Find out the least popular App category, only considering the Apps with 1000+ installs or 100+ reviews?
Answer
SELECT installs,
category,
reviews
FROM androidapps
WHERE reviews > 100
OR installs > 1000
GROUP BY category
ORDER BY installs;
Output
The least popular App Category is BOOKS_AND_REFERENCE with only 1000 installs. The results are visualized using a line graph. We can see a similarity in the trend of installs and reviews. This tells us that either people are more likely to install Apps with more reviews or write a review for Apps with more installs. Always remember, correlation doesn't imply causation.
Google Play Store Research says "Installs and reviews have the strongest inverse correlation."
To derive the least popular Category of Apps, we have used SELECT
,FROM
,WHERE
, GROUP BY
and ORDER BY
statement.
•GROUP BY
statement groups rows that have the same values into summary rows, like "find the number of installs in each category".
5. A number of reviews can also serve as a good variable to assess the popularity of the apps. Analyze the popularity of Apps(by reviews)?
Answer
SELECT app,
reviews
FROM androidapps
GROUP BY category
ORDER BY reviews DESC
Output
By analyzing the popularity of Apps by reviews we can see that the App with the maximum amount of reviews is Colorfy: Coloring Books for Adults.
To derive the most popular App by reviews, we have used SELECT
,FROM
,GROUP BY
and ORDER BY
statements.
6. There must be varying app sizes throughout the database. Deriving App Size by each Category would be beneficial for the Analysis. What is the average App Size for each Category?
Answer
SELECT Avg(size) AS average_size,
category
FROM androidapps
GROUP BY category
ORDER BY size
Output
The distribution of average App Size per Category is obtained and the results are visualized using a bar graph. There are 33 categories, and by the graph, it looks like the GAME category has the highest average size and LIBRARIES_AND_HOME has the lowest average size. But to confirm this insight let's write another query in the next segment.
To obtain the distribution of average App Size per Category we have used SELECT
,FROM
,GROUP BY
,ORDER BY
statements and AVG()
function.
• AVG()
function returns the average value of a numeric column.
7. Write a query to obtain which Category of Apps has the maximum and minimum average App size?
SELECT Category,
AVG(Size)
FROM androidapps
GROUP BY Category
ORDER BY AVG(size);
The distribution of average App Size per Category is obtained and has been arranged in an ascending order to derive the App Category with minimum and maximum average App Size and the results are visualized using a bar graph. So, the initial assumption we made by looking at the previous query's visualization was wrong, the App Category with minimum average App Size is Tools instead of Library. This is one of the reasons why we should always arrange the output in order using the ORDER BY
statement.
To obtain the minimum and maximum average App Size per Category we have used SELECT
,FROM
,GROUP BY
,ORDER BY
statements and AVG()
function.
Conclusion
Writing queries to answer these business questions will help you immensely to begin your SQL and Data Analytics journey. You can get a better look at the trend of the data by looking at the visualizations as well.
This is the first part of the Data Analysis using SQL series using the same Android Apps Database. Stay Tuned for more.
Check out our another 4 part series, that offers a selection of the most useful SQL queries every beginner must practice, using a real-life dataset.
I hope you found this content informative. For more such content delivered directly to your mailbox weekly - consider subscribing :)
Until next time!
Would mean a lot to us if you showed us some ❤️ on Twitter 👇