How to be an awesome Android Marketplace Data Analyst using SQL -Part 2
Part two of our 'Data Analysis using SQL' using the real-life Android Apps Dataset. A guide to answering some business questions using SQL.
Introduction
Businesses have become digitized. Digitalization generates more data, data clubs to form databases, and to manage those huge amounts of data, well, you need SQL.
SQL is a very important addition to your resume when looking for jobs in the Data Domain.
We have created a series on 'Data Analysis using SQL' using the real-life Android Apps Dataset and answered some business questions. This is the second part of the series.
Do check part one, for some informative introductory queries.
To recap, here is an introduction to the data we are about to use;
Again to recap, we will be using Falcon to write our SQL queries. Falcon is a free, open-source SQL editor with inline data visualization. Use this easy guide to set up Falcon on your computer and load the database into it.
Now that you are all set, let's answer some business questions.
Let's begin!
1. The analytics team is interested to know the total expenditure by the users on the Apps to estimate the market price. Write a query to obtain that.
Answer
SELECT Sum(installs * price) AS Total_Expenditure
FROM androidapps
Output
The total expenditure comes out to be $291148457.79. This figure can serve as an estimate of the total market price of the Apps, although there are various other revenues that factor in such as ad revenue, in app purchases and more (refer here).Further questions can be asked upon the pricing of the Apps for a deeper analysis.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,AS
, and SUM()
To know about these statements in detail refer to our post on beginner SQL queries
• SUM()
function returns the total sum of a numeric column.
• AS
keyword creates an alias, SQL aliases are used to give a table, or a column in a table, a temporary name.
Let's ask further questions regarding the pricing to achieve better insights out of the database.
2. Which Category of Apps has the maximum and minimum expenditure by the users?
Answer (Maximum Expenditure)
SELECT category,
Sum(Installs*price) AS Total_Expenditure
FROM androidapps
GROUP BY Category
ORDER BY Total_Expenditure DESC ;
Output (Maximum Expenditure)
The category of Apps that has maximum expenditure by the users is 'FAMILY' with a total expenditure of $113679845.36 which adds up to approximately 39% of the total expenditure calculated in the previous query.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
,AS
, SUM()
, ORDER BY
, and GROUP BY
.
• GROUP BY
statement groups rows that have the same values into summary rows, like "find the total expenditure in each category".
• ORDER BY
keyword is used to sort the result-set in ascending or descending order.
Answer (Minimum Expenditure)
SELECT category,
Sum(Installs*price) AS Total_Expenditure
FROM androidapps
GROUP BY Category
ORDER BY Total_Expenditure ASC ;
Output (Minimum Expenditure)
The category of Apps that has maximum expenditure by the users is 'BEAUTY' , 'COMICS' and 'HOUSE_AND_HOME' with a total expenditure of $0, which means either these categories don't have any paid Apps or there are no installs for the paid Apps.
For further insights, let's ask more questions regarding the Price of the Apps.
3. What are the top 10 costliest Apps and the Category they belong to? Also, find the number of Installs for each one of them?
Answer
SELECT app,
category,
price,
installs
FROM androidapps
ORDER BY price DESC
LIMIT 10;
Output
The costliest App is 'I'm Rich- Trump Edition' which belongs to the 'LIFESTYLE' Category priced at $400. It has 10,000 installs which brings the total expenditure on this App to $4000000. Out of top 10 costliest apps, 3 Apps belong to 'LIFESTYLE' and 'FAMILY' each and the rest 4 belong to 'FINANCE'.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
, ORDER BY
, and LIMIT
.
• MySQL supports the LIMIT
clause to select a limited number of records.Whereas, Oracle uses FETCH FIRST n ROWS ONLY
and ROWNUM
and SQL Server and MS Access uses SELECT TOP
clause is used to specify the number of records to return.
Now let's analyze the Apps on the basis of the Application size.
4. Are there any NULL values in the Size column of the Database? If yes, what are the number of NULL values and Non-NULL Values in Size?
Answer
SELECT Sum(CASE
WHEN size IS NULL THEN 1
ELSE 0
END) AS [Number Of Null Values],
Count(size) AS [Total Number Of Values]
FROM androidapps
Output
A field with a NULL value is a field with no value. A NULL value is different from a zero value or a field that contains spaces. A field with a NULL value is one that has been left blank during record creation.
The number of NULL values in the Size column is 1,227 and the number of Non-NULL values can be caluculates by subtracting the null values from total values, so that is 7,205.
SQL statements, clauses, and keywords used in the query are SELECT
, FROM
, COUNT()
,SUM()
, and CASE
.
• The aliases are written inside the square brackets (i.e. [Number Of Null Values]) because it contains spaces.
• CASE
statement goes through conditions and returns a value when the first condition is met (like an if-then-else statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE
clause. If there is no ELSE
part and no conditions are true, it returns NULL.
CASE
Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
Let's obtain some descriptive statistics of App Size,
5. What are the descriptive statistics (min, max, and average) of application sizes considering all Apps and also across all the categories?
Answer (All Apps)
SELECT Avg(size) AS Average,
Max(size) AS Maximum,
Min(size) AS Minimum
FROM androidapps
Output (All Apps)
The average App size considering all Apps is 20.39MB ,the maximum size is 100MB and the minimum size is 0.0085MB. Now, let's further look across the same statistics in each category.
SQL statements, clauses, and keywords used in the query are SELECT
, AVG()
,MIN()
,MAX()
and FROM
• MIN()
function returns the smallest value of the selected column.
• MAX()
function returns the largest value of the selected column.
• AVG()
function returns the average value of a numeric column.
Answer (per Category)
SELECT Category,AVG(Size),
MIN(Size),
MAX(Size)
FROM androidapps
GROUP BY Category ORDER BY AVG(Size);
Output ( per Category)
The Category 'GAME' has the highest average App size of 41.86MB and Category 'Tools' has the lowest average App size of 8.78MB. The maximum App size obtained in the previous query was 100 MB and it can be seen that it belongs in the 'GAME' and 'FAMILY' cateogries. The minimum App size obtained in the previous query was 0.0085MB and it can be seen that it belons in the 'LIBRARIES_AND_DEMO' Category.
SQL statements, clauses, and keywords used in the query are SELECT
, AVG()
,MIN()
,MAX()
,FROM
, GROUP BY
, and ORDER BY
.
Let's further dive into deeper analysis in terms of App size.
6. Now that we have obtained the App size distribution, let's divide the Apps into two sections, Apps with size 0.0085MB-20.39MB and Apps with size 20.39MB-100.1MB (upper class excluded). Let's analyze the difference between the count, average price, and total installs between these two sections?
Answer (0.0085MB -20.39MB)
SELECT Sum(installs) AS total_installs,
Count(app) AS Number_of_apps,
Avg(price) AS price
FROM androidapps
WHERE size BETWEEN 0.0084 AND 20.40;
Output (0.0085MB -20.39MB)
There are a total of 5465 Apps with a size between 0.0085 MB and 20.39 MB, the total installs in this section are 10652058180 and the average price of the Apps is $1.3900786825251497.
SQL statements, clauses, and keywords used in the query areSELECT
,SUM(),COUNT()
AVG()
,FROM
,WHERE
, and BETWEEN
.
• BETWEEN
operator selects values within a given range. The values can be numbers, text, or dates. BETWEEN
operator is inclusive: begin and end values are included.
Answer (20.39MB-100MB)
SELECT Sum(installs) AS Installations,
Count(app) AS Total_apps,
Avg(price) AS Price
FROM androidapps
WHERE size BETWEEN 20.39 AND 100;
Output (20.39MB-100MB)
There are a total of 2967 Apps with a size between 20.30MB and 100 MB, the total installs in this section are 20750499324 and the average price of the Apps is $0.9237007077856374.
SQL statements, clauses, and keywords used in the query areSELECT
,SUM(),COUNT()
AVG()
,FROM
,WHERE
, and BETWEEN
.
Even though there are almost half as many apps in the second category, the number of installs is almost double. There is no difference between the average pricing of both categories.
Let's hear the incident of the forgetful Manager now,
7. The Manager is interested in knowing about a particular app in the 'Business' category but hardly remembers the name. They can remember that the app starts with an 'In'. Create a narrowed-down list of Apps using this information.
Answer
SELECT App,
Category
FROM androidapps
WHERE category = 'BUSINESS'
AND App LIKE 'In%%'
Output
The narrowed down list will help the Manager in recalling the name of that particular App.
SQL statements, clauses, and keywords used in the query areSELECT
,FROM
,WHERE
, and LIKE
.
• The LIKE
operator is used in a WHERE
clause to search for a specified pattern in a column. There are two wildcards often used in conjunction with the LIKE
operator:
1. The percent sign (%) represents zero, one, or multiple characters.
2. The underscore sign (_) represents one, single character.
Conclusion
The best way to learn Data Analysis using SQL is to apply it to business problems using real-life dataset. We hope you found this content informative.
This is the first part of the Data Analysis using SQL series using the Android Apps Database. Stay Tuned for more.
In case you want to go through the first part ;
Take a look at our other 4 part Data Analysis using SQL series using the Used Cars Database.
For more such content delivered directly to your mailbox weekly - consider subscribing :)
Happy Learning!