Validate myJoin

Let us perform a task to valiate myJoin. Join order_id and order_date from orders as well as order_id and order_item_subtotal from order_items as part of the join results.

  • Read orders data set

  • Create list of tuples using order_id and order_date with order_id as first element.

  • Read order_items data set

  • Create list of tuples using order_item_order_id and order_item_subtotal with order_item_order_id as first element.

  • Invoke myJoin function with these collections.

%run 04_develop_myMap.ipynb
orders_path = "/data/retail_db/orders/part-00000"
orders = open(orders_path). \
    read(). \
    splitlines()
orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
orders_map = myMap(orders, 
                   lambda order: (int(order.split(',')[0]), order.split(',')[1])
                  )
orders_map[:10]
[(1, '2013-07-25 00:00:00.0'),
 (2, '2013-07-25 00:00:00.0'),
 (3, '2013-07-25 00:00:00.0'),
 (4, '2013-07-25 00:00:00.0'),
 (5, '2013-07-25 00:00:00.0'),
 (6, '2013-07-25 00:00:00.0'),
 (7, '2013-07-25 00:00:00.0'),
 (8, '2013-07-25 00:00:00.0'),
 (9, '2013-07-25 00:00:00.0'),
 (10, '2013-07-25 00:00:00.0')]
order_items_path = "/data/retail_db/order_items/part-00000"
order_items = open(order_items_path). \
    read(). \
    splitlines()
order_items[:10]
['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99',
 '5,4,897,2,49.98,24.99',
 '6,4,365,5,299.95,59.99',
 '7,4,502,3,150.0,50.0',
 '8,4,1014,4,199.92,49.98',
 '9,5,957,1,299.98,299.98',
 '10,5,365,5,299.95,59.99']
order_items_map = myMap(order_items,
                        lambda order_item: (int(order_item.split(',')[1]),
                                            float(order_item.split(',')[4])
                                           )
                       )
%run 10_develop_myJoin.ipynb
orders_join = myJoin(orders_map, order_items_map)
orders_join[:10]
[(1, ('2013-07-25 00:00:00.0', 299.98)),
 (2, ('2013-07-25 00:00:00.0', 199.99)),
 (2, ('2013-07-25 00:00:00.0', 250.0)),
 (2, ('2013-07-25 00:00:00.0', 129.99)),
 (4, ('2013-07-25 00:00:00.0', 49.98)),
 (4, ('2013-07-25 00:00:00.0', 299.95)),
 (4, ('2013-07-25 00:00:00.0', 150.0)),
 (4, ('2013-07-25 00:00:00.0', 199.92)),
 (5, ('2013-07-25 00:00:00.0', 299.98)),
 (5, ('2013-07-25 00:00:00.0', 299.95))]