How to be an awesome Android Marketplace Data Analyst using SQL -Part 2

Introduction

Businesses have become digitized. Digitalization generates more data, data clubs to form databases, and to manage those huge amounts of data, well, you need SQL.
SQL is a very important addition to your resume when looking for jobs in the Data Domain.

We have created a series on 'Data Analysis using SQL' using the real-life Android Apps Dataset and answered some business questions. This is the second part of the series.

Do check part one, for some informative introductory queries.

How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1
This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.

To recap, here is an introduction to the data we are about to use;

COLUMNDESCRIPTION
AppName of the Play Store app
Category The category of the app ( GAME, BUSINESS, FAMILY, etc.)
RatingRating of the app out of 5
Reviews Number of reviews for the app on Play Store
Size Size of the app in MBs
Installs Number of installs
TypeWhether the app is free or paid
PricePrice of the app
Content RatingWho is this app meant for? (Teen etc.)
GenresStyle or category of the app ( Entertainment, Tools, etc.)
Last UpdatedThe date on which the app was last updated
Current VersionThe current version of the app
Android VersionThe Android Version the app is capable with

Again to recap, we will be using Falcon to write our SQL queries. Falcon is a free, open-source SQL editor with inline data visualization. Use this easy guide to set up Falcon on your computer and load the database into it.

Write SQL queries effortlessly with Falcon by Plotly
This post guides you through downloading and setting up Falcon - a free, open-source SQL editor which enables you to write your SQL queries without the hassle and moreover gives you an option to visualize the results.

Now that you are all set, let's answer some business questions.

✔️
But before that, follow us on Twitter and never miss out on such cool posts.

Let's begin!


1. The analytics team is interested to know the total expenditure by the users on the Apps to estimate the market price. Write a query to obtain that.

Answer

SELECT Sum(installs * price) AS Total_Expenditure
FROM   androidapps 

Output

The total expenditure comes out to be $291148457.79. This figure can serve as an estimate of the total market price of the Apps, although there are various other revenues that factor in such as ad revenue, in app purchases and more (refer here).Further questions can be asked upon the pricing of the Apps for a deeper analysis.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,AS, and SUM()

To know about these statements in detail refer to our post on beginner SQL queries

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

SUM()function returns the total sum of a numeric column.

AS keyword creates an alias, SQL aliases are used to give a table, or a column in a table, a temporary name.

Let's ask further questions regarding the pricing to achieve better insights out of the database.

2. Which Category of Apps has the maximum and minimum expenditure by the users?

Answer (Maximum Expenditure)

SELECT category,
       Sum(Installs*price) AS Total_Expenditure
FROM   androidapps
GROUP  BY Category
ORDER  BY Total_Expenditure DESC ;

Output (Maximum Expenditure)

The category of Apps that has maximum expenditure by the users is 'FAMILY' with a total expenditure of $113679845.36 which adds up to approximately 39% of the total expenditure calculated in the previous query.

SQL statements, clauses, and keywords used in the query are SELECT, FROM,AS, SUM(), ORDER BY, and GROUP BY.

GROUP BY statement groups rows that have the same values into summary rows, like "find the total expenditure in each category".

ORDER BY keyword is used to sort the result-set in ascending or descending order.

Answer (Minimum Expenditure)

SELECT category,
       Sum(Installs*price) AS Total_Expenditure
FROM   androidapps
GROUP  BY Category
ORDER  BY Total_Expenditure ASC ;

Output (Minimum Expenditure)

The category of Apps that has maximum expenditure by the users is 'BEAUTY' , 'COMICS' and  'HOUSE_AND_HOME' with a total expenditure of $0,  which means either these categories don't have any paid Apps or there are no installs for the paid Apps.

For further insights, let's ask more questions regarding the Price of the Apps.

3. What are the top 10 costliest Apps and the Category they belong to? Also, find the number of Installs for each one of them?

Answer

SELECT app,
       category,
       price,
       installs
FROM   androidapps
ORDER  BY price DESC
LIMIT  10; 

Output

The costliest App is 'I'm Rich- Trump Edition' which belongs to the 'LIFESTYLE' Category priced at $400. It has 10,000 installs which brings the total expenditure on this App to  $4000000. Out of top 10 costliest apps, 3 Apps belong to 'LIFESTYLE' and 'FAMILY' each and the rest 4 belong to 'FINANCE'.

SQL statements, clauses, and keywords used in the query are SELECT, FROM, ORDER BY,  and LIMIT.

• MySQL supports the LIMIT clause to select a limited number of records.Whereas, Oracle uses FETCH FIRST n ROWS ONLY and ROWNUM and SQL Server and MS Access uses SELECT TOP clause is used to specify the number of records to return.

Now let's analyze the Apps on the basis of the Application size.

4. Are there any NULL values in the Size column of the Database? If yes, what are the number of NULL values and Non-NULL Values in Size?

Answer

SELECT Sum(CASE
             WHEN size IS NULL THEN 1
             ELSE 0
           END)    AS [Number Of Null Values],
       Count(size) AS [Total Number Of Values]
FROM   androidapps 

Output

A field with a NULL value is a field with no value. A NULL value is different from a zero value or a field that contains spaces. A field with a NULL value is one that has been left blank during record creation.
The number of NULL values in the Size column is 1,227 and the number of Non-NULL values can be caluculates by subtracting the null values from total values, so that is 7,205.

SQL statements, clauses, and keywords used in the query are SELECT, FROM, COUNT() ,SUM(), and CASE.

• The aliases are written inside the square brackets (i.e. [Number Of Null Values]) because it contains spaces.  

CASE statement goes through conditions and returns a value when the first condition is met (like an if-then-else statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL.

CASE Syntax

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN conditionN THEN resultN
    ELSE result
END;

Let's obtain some descriptive statistics of App Size,

5. What are the descriptive statistics (min, max, and average) of application sizes considering all Apps and also across all the categories?

Answer (All Apps)

SELECT Avg(size) AS Average,
       Max(size) AS Maximum,
       Min(size) AS Minimum
FROM   androidapps 

Output (All Apps)

The average App size considering all Apps is 20.39MB ,the maximum size is 100MB and the minimum size is 0.0085MB. Now, let's further look across the same statistics in each category.

SQL statements, clauses, and keywords used in the query are SELECT, AVG(),MIN(),MAX()and FROM

MIN() function returns the smallest value of the selected column.

MAX() function returns the largest value of the selected column.

AVG() function returns the average value of a numeric column.

Answer (per Category)

SELECT Category,AVG(Size),
       MIN(Size), 
       MAX(Size) 
FROM androidapps 
GROUP BY Category ORDER BY AVG(Size);

Output ( per Category)

The Category 'GAME' has the highest average App size of 41.86MB and Category 'Tools' has the lowest average App size of 8.78MB. The maximum App size obtained in the previous query was 100 MB and it can be seen that it belongs in the 'GAME' and 'FAMILY' cateogries. The minimum App size obtained in the previous query was 0.0085MB and it can be seen that it belons in the 'LIBRARIES_AND_DEMO' Category.

SQL statements, clauses, and keywords used in the query are SELECT, AVG(),MIN(),MAX(),FROM , GROUP BY, and  ORDER BY.

Let's further dive into deeper analysis in terms of App size.  

6. Now that we have obtained the App size distribution, let's divide the Apps into two sections, Apps with size 0.0085MB-20.39MB and Apps with size 20.39MB-100.1MB (upper class excluded). Let's analyze the difference between the count, average price, and total installs between these two sections?

Answer (0.0085MB -20.39MB)

SELECT Sum(installs) AS total_installs,
       Count(app)    AS Number_of_apps,
       Avg(price)    AS price
FROM   androidapps
WHERE  size BETWEEN 0.0084 AND 20.40; 

Output (0.0085MB -20.39MB)

There are a total of 5465 Apps with a size between 0.0085 MB and 20.39 MB, the total installs in this section are 10652058180 and the average price of the Apps is $1.3900786825251497.

SQL statements, clauses, and keywords used in the query areSELECT,SUM(),COUNT() AVG(),FROM ,WHERE , and  BETWEEN.

BETWEEN operator selects values within a given range. The values can be numbers, text, or dates. BETWEEN operator is inclusive: begin and end values are included.

Answer (20.39MB-100MB)

SELECT Sum(installs) AS Installations,
       Count(app)    AS Total_apps,
       Avg(price)    AS Price
FROM   androidapps
WHERE  size BETWEEN 20.39 AND 100; 

Output  (20.39MB-100MB)

There are a total of 2967 Apps with a size between 20.30MB and 100 MB, the total installs in this section are 20750499324 and the average price of the Apps is $0.9237007077856374.

SQL statements, clauses, and keywords used in the query areSELECT,SUM(),COUNT() AVG(),FROM ,WHERE , and  BETWEEN.

Even though there are almost half as many apps in the second category, the number of installs is almost double. There is no difference between the average pricing of both categories.

Let's hear the incident of the forgetful Manager now,

7. The Manager is interested in knowing about a particular app in the 'Business' category but hardly remembers the name. They can remember that the app starts with an 'In'. Create a narrowed-down list of Apps using this information.

Answer

SELECT App,
       Category
FROM   androidapps
WHERE  category = 'BUSINESS'
       AND App LIKE 'In%%' 

Output

The narrowed down list will help the Manager in recalling the name of that particular App.

SQL statements, clauses, and keywords used in the query areSELECT,FROM ,WHERE , and  LIKE.

• The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. There are two wildcards often used in conjunction with the LIKE operator:
1. The percent sign (%) represents zero, one, or multiple characters.
2. The underscore sign (_) represents one, single character.

Conclusion

The best way to learn Data Analysis using SQL is to apply it to business problems using real-life dataset. We hope you found this content informative.

This is the first part of the  Data Analysis using SQL series using the Android Apps Database. Stay Tuned for more.  

In case you want to go through the first part ;

How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1
This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.

Take a look at our other 4 part Data Analysis using SQL series using the Used Cars Database.

Getting Started with SQL Queries - Exercises for Beginners Part-1
Here’s a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.

For more such content delivered directly to your mailbox weekly - consider subscribing :)

Happy Learning!