In this post, we show you how you can set up your own dataset in a Postgres table and analyze it with SQL. The SQL questions (download free ebook with 30+ SQL practice questions and answers) cover concepts that will showcase the basic SQL commands and the more advanced functions in SQL.  

Let us dive in.

GRAB A COPY OF THE DATASET

The dataset is provided below as a CSV file and is a cleaned version of the original dataset on Kaggle. The number of rows is close to 1 Million.

https://drive.google.com/file/d/1jiL0Qy_ulouhOEK7amaGweuuh_CRwEF4/view?usp=sharing

SETUP YOUR OWN POSTGRES DATABASE

USING PANDAS and POSTGRES DOCKER IMAGE

Setting up your own SQL database can be done in a variety of ways. The post below shows you how you can do this with Pandas and Docker images of Postgres and Pgweb.

The approach is the following;

  1. Setup Docker image of Postgres database
  2. Read the source dataset CSV file (link in the previous section) in Pandas
  3. From Pandas write to Postgres database
  4. Pgweb is a very neat SQL client you can use to interact with the Postgres database. Another option we have showcased is to use DBeaver and you can learn how to do that here.
Learn SQL in a browser with PostgreSQL and pgweb
PostgreSQL is a very versatile database. If you want to learn SQL, then a quickway to start is to 1) grab some data you want to analyze 2) insert into aPostgreSQL table and 3) use a SQL client such as pgweb and get started analyzingdata. You can

USING PGCLI

Another very simple approach to insert the CSV into your own Postgres table is to create the table with the query below and then issue the COPY command to read from the CSV file into the table. This can be done using pgcli.

CREATE TABLE "playstore_apps" (
  "app_name" TEXT,
  "category" TEXT,
  "rating" REAL,
  "rating_count" REAL,
  "maximum_installs" INTEGER,
  "free" INTEGER,
  "price" REAL,
  "currency" TEXT,
  "size" REAL,
  "released" TIMESTAMP,
  "content_rating" TEXT
);
COPY playstore_apps FROM 'playstore_apps_clean_1m_rows.csv' DELIMITER ',' CSV;

USING PGFUTTER

Pgfutter is a pretty neat application (simply download the binary for your platform here) and you can use this tool to insert data into Postgres tables from CSV files.

The example shown below assumes the following;

  1. Your database name is data
  2. Your table name is playstore_apps
  3. Your database is hosted on Cloudclusters - this is a hosting provider where you can set up a small Postgres instance for a nominal amount. We use it for testing. THIS IS NOT A RECOMMENDATION TO USE THEM. USE AT YOUR OWN DISCRETION.
  4. The rest of the options need to be provided appropriately. Replace the values with your own specifics.
pgfutter --dbname "data" --table "playstore_apps" --schema "public" --host "postgresql-90455-0.cloudclusters.net" --port "19982" --user "sql_learner" --pass "strong_password_here" csv --skip-header --fields "app_name,category,rating,rating_count,maximum_installs,free,price,currency,size,released,content_rating" playstore_apps_clean_1m_rows.csv


If you are familiar with Pandas, the best option is to use Pandas to insert your data frame into a Postgres table. And as for the table itself, you can set up one on your local computer with Docker or choose a cloud option.

FREE EBOOK WITH SQL EXERCISES

You can find 30+ SQL questions in the document linked below. The SQL questions will take you from simple concepts on selecting data, and filtering using where clauses all the way to commands that will help you aggregate data, use window functions, and more.

datascience.fm - Playstore Apps dataset SQL Exercises
Questions: ----------------------------------------------------------------------------------------- Q1: Take a look at all app’s names, their categories, and their release dates from the table. A: SELECT app_name, category, released FROM playstore_apps E: SELECT indicates which columns...

FULLY MANAGED SOLUTION

If you want a fully managed solution where all you need to do is open your browser and get to work, then reach out to us at hello@datascience.fm

We work with corporates, startups, and colleges to set up the end-to-end infrastructure with 10+ real-world datasets and extensive SQL question banks that are case-based and develop business analytics skills.

Your employees or students will get access to a browser-based environment to run SQL queries.

We set up real-world datasets in a variety of databases to ensure you can provide the training for your employees they need to deliver business impact.

 Reach out to hello@datascience.fm to learn more and get started today.