Preparing Database

Let us setup the database along with tables to see how the batch loads are typically done.

  • We have scripts and data set available in our GitHub repository. If you are using our environment the repository is already cloned under /data/retail_db.

  • It have scripts to create tables with primary keys. Those scripts are generated from MySQL tables and refactored for Postgres. We will create the tables and load the data from the files.

    • Script to create tables: create_db_tables_pg.sql

  • Here are the commands to launch psql and run scripts to create tables as well as load data into tables.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W

\i /data/retail_db/create_db_tables_pg.sql
Copy to clipboard
%run 02_function_get_database_connection.ipynb
Copy to clipboard
retail_connection
Copy to clipboard
<connection object at 0x7f5f0fcfd528; dsn: 'user=itversity_retail_user password=xxx dbname=itversity_retail_db host=localhost port=5432', closed: 0>
Copy to clipboard
%load_ext sql
Copy to clipboard
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Copy to clipboard
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Copy to clipboard
%%sql

SELECT * FROM orders LIMIT 10
Copy to clipboard
0 rows affected.
Copy to clipboard
order_id order_date order_customer_id order_status
%%sql

SELECT * FROM order_items LIMIT 10
Copy to clipboard
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.
Copy to clipboard
order_item_id order_item_order_id order_item_product_id order_item_quantity order_item_subtotal order_item_product_price