Preparing Database

Let us setup the database along with tables to see how the batch loads are typically done.

  • We have scripts and data set available in our GitHub repository. If you are using our environment the repository is already cloned under /data/retail_db.

  • It have scripts to create tables with primary keys. Those scripts are generated from MySQL tables and refactored for Postgres. We will create the tables and load the data from the files.

    • Script to create tables: create_db_tables_pg.sql

  • Here are the commands to launch psql and run scripts to create tables as well as load data into tables.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W

\i /data/retail_db/create_db_tables_pg.sql
%run 02_function_get_database_connection.ipynb
retail_connection
<connection object at 0x7f5f0fcfd528; dsn: 'user=itversity_retail_user password=xxx dbname=itversity_retail_db host=localhost port=5432', closed: 0>
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql

SELECT * FROM orders LIMIT 10
0 rows affected.
order_id order_date order_customer_id order_status
%%sql

SELECT * FROM order_items LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.
order_item_id order_item_order_id order_item_product_id order_item_quantity order_item_subtotal order_item_product_price