How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1

This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.

How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1
Photo by Campaign Creators / Unsplash

Introduction

SQL is a highly sought-after Analytics skill that is needed when looking for jobs in the Analytics industry. Why is that? Well, the companies you might want to apply to store all their data in relational databases like MySQL, SQL Server, MS Access, or PostgreSQL. To retrieve and conduct further analysis on the data, SQL is a must. So, when searching for a job in the Analytics domain, every candidate would be good in languages like R/Python but adding SQL to your skills is what sets you apart.

Every now and then we come across SQL functions (such as SELECT, WHERE, ORDER BY), but how do we apply them to a real-life dataset?

To learn that - This post will guide you through the SQL queries to answer some analytical questions using a real-life Android Apps Dataset.

But first, let's introduce you to the data;

COLUMNDESCRIPTION
AppName of the Play Store app
Category The category of the app ( GAME, BUSINESS, FAMILY, etc.)
RatingRating of the app out of 5
Reviews Number of reviews for the app on Play Store
Size Size of the app in MBs
Installs Number of installs
TypeWhether the app is free or paid
PricePrice of the app
Content RatingWho is this app meant for? (Teen etc.)
GenresStyle or category of the app ( Entertainment, Tools, etc.)
Last UpdatedThe date on which the app was last updated
Current VersionThe current version of the app
Android VersionThe Android Version, the app is capable with

We will be using Falcon to write our SQL queries. Falcon is a free, open-source SQL editor with inline data visualization. Use this easy guide to set up Falcon on your computer and load the database into it.

Write SQL queries effortlessly with Falcon by Plotly
This post guides you through downloading and setting up Falcon - a free, open-source SQL editor which enables you to write your SQL queries without the hassle and moreover gives you an option to visualize the results.


Now, that you are all set with the database and the SQL editor, let's begin with some business problems.


1. We can begin the analysis by obtaining the number of distinct Categories and Genres

Answer:

SELECT Count(DISTINCT category) AS Categories,
       Count(DISTINCT genres)   AS Genres,     
FROM   androidapps 

Output

There are 33 Categories and 118 Genres of Apps in the Play Store Database.

To obtain the number of distinct categories and genres, we have used SELECT, FROM, COUNT(DISTINCT ) statements.
COUNT() would show a result of all records whileCOUNT(DISTINCT ) will show a result of only distinct records.

To know about these statements in detail refer to our post on beginner SQL queries

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

2. There are two 'Types' of Apps, namely Free and Paid Apps. Obtain the number of Paid as well as Free Apps?

Answer:

SELECT type,
       Count(*)
FROM   androidapps
GROUP  BY type 

Output

There are 8902 Free Apps and 756 Paid Apps in the Play Store Database, the results are visualized using a bar chart.

To obtain the number of Free and Paid Apps SELECT,GROUP BY , COUNT() statements are used.

COUNT()is used to count the number of Apps in each 'Type'.
GROUP BYis used to group the Apps in 'Types' i.e. Free v/s Paid.

Answer:

SELECT category,
       installs
FROM   androidapps
GROUP  BY category
ORDER  BY installs DESC 

Output

The most popular App Categories are ENTERTAINMENT, LIBRARIES_AND_DEMO, LIFESTYLE, and PHOTOGRAPHY with 10,000  installs each.
Google Play Store Research says "Since the installs parameter is independent and not correlated to any other parameters, we must use installs to show the popularity of an app."

To derive the most popular Category of Apps, we have used SELECT,FROM,GROUP BY and ORDER BY statement.

ORDER BY  the keyword is used to sort the result-set in ascending or descending order. ORDER BY  keyword sorts the records in ascending order by default. To sort the records in descending order we use the DESC keyword along with ORDER BY.

Answer

SELECT installs,
       category,
       reviews
FROM   androidapps
WHERE  reviews > 100
        OR installs > 1000
GROUP  BY category
ORDER  BY installs; 

Output

The least popular App Category is BOOKS_AND_REFERENCE with only 1000 installs. The results are visualized using a line graph. We can see a similarity in the trend of installs and reviews. This tells us that either people are more likely to install Apps with more reviews or write a review for Apps with more installs. Always remember, correlation doesn't imply causation.
Google Play Store Research says "Installs and reviews have the strongest inverse correlation."

To derive the least popular Category of Apps, we have used SELECT,FROM,WHERE, GROUP BY and ORDER BY statement.

GROUP BY  statement groups rows that have the same values into summary rows, like "find the number of installs in each category".

5. A number of reviews can also serve as a good variable to assess the popularity of the apps. Analyze the popularity of Apps(by reviews)?

Answer

SELECT app,
       reviews
FROM   androidapps
GROUP  BY category
ORDER  BY reviews DESC 

Output

By analyzing the popularity of Apps by reviews we can see that the App with the maximum amount of reviews is Colorfy: Coloring Books for Adults.

To derive the most popular App by reviews, we have used SELECT,FROM,GROUP BY and ORDER BY statements.

6. There must be varying app sizes throughout the database. Deriving App Size by each Category would be beneficial for the Analysis. What is the average App Size for each Category?

Answer

SELECT Avg(size) AS average_size,
       category
FROM   androidapps
GROUP  BY category
ORDER  BY size 

Output

The distribution of average App Size per Category is obtained and the results are visualized using a bar graph. There are 33 categories, and by the graph, it looks like the GAME category has the highest average size and LIBRARIES_AND_HOME has the lowest average size. But to confirm this insight let's write another query in the next segment.

To obtain the distribution of average App Size per Category we have used SELECT,FROM,GROUP BY ,ORDER BY statements and AVG() function.

AVG() function returns the average value of a numeric column.

💡
Never miss any of our awesome posts - Follow us on Twitter to stay updated!

7. Write a query to obtain which Category of Apps has the maximum and minimum average App size?

 SELECT Category,
       AVG(Size)
 FROM  androidapps 
 GROUP BY Category 
 ORDER BY AVG(size);
The distribution of average App Size per Category is obtained and has been arranged in an ascending order to derive the App Category with minimum and maximum average App Size and the results are visualized using a bar graph. So, the initial assumption we made by looking at the previous query's visualization was wrong, the App Category with minimum average App Size is Tools instead of Library. This is one of the reasons why we should always arrange the output in order using the  ORDER BY statement.

To obtain the minimum and maximum average App Size per Category we have used SELECT,FROM,GROUP BY ,ORDER BY statements and AVG() function.

Conclusion

Writing queries to answer these business questions will help you immensely to begin your SQL and Data Analytics journey. You can get a better look at the trend of the data by looking at the visualizations as well.
This is the first part of the  Data Analysis using SQL series using the same Android Apps Database. Stay Tuned for more.

Check out our another 4 part series, that offers a selection of the most useful SQL queries every beginner must practice, using a real-life dataset.

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

I hope you found this content informative. For more such content delivered directly to your mailbox weekly - consider subscribing :)

Until next time!

Would mean a lot to us if you showed us some ❤️ on Twitter 👇