Limitations of using Loops

There are several limitations to using loops.

  • In the examples below, which process collections using loops, most of the functions repeat the same logic to iterate through the elements.

  • We spend more time coding non-business logic (iteration and conditionals) than the business logic itself.

  • It results in a lot of code, which can become a maintenance problem.

def get_customer_orders(orders, customer_id):
    # Loop and conditional: keep the orders placed by the given customer
    orders_filtered = []
    for order in orders:
        if int(order.split(',')[2]) == customer_id:
            orders_filtered.append(order)
    return orders_filtered


def get_customer_orders_for_month(orders, customer_id, order_month):
    # Same loop again; only the condition differs
    orders_filtered = []
    for order in orders:
        order_elements = order.split(',')
        if (int(order_elements[2]) == customer_id and
                order_elements[1].startswith(order_month)):
            orders_filtered.append(order)
    return orders_filtered


# Same loop once more, with the customer, month and statuses hard-coded
for order in orders:
    order_elements = order.split(',')
    if (int(order_elements[2]) == 12431
            and order_elements[1].startswith('2014-01')
            and order_elements[3] in ('PROCESSING', 'PENDING_PAYMENT')):
        print(order)

  • Map Reduce APIs or higher-level libraries such as Pandas solve these problems.

    • We do not have to write the loops and conditionals ourselves.

    • Loops and conditionals are taken care of by the existing APIs.

    • We can focus on the business logic alone, which can be passed in as a lambda function.
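
To make the idea concrete, here is a minimal sketch that writes the loop and the conditional once, in a generic helper, and passes only the business logic in as a lambda. The helper name filter_records is hypothetical, and the sample records are illustrative: three are copied from the outputs later in this section, and one (for customer 10000) is made up so that something gets filtered out.

```python
def filter_records(records, predicate):
    """Generic loop: keep every record for which predicate(record) is True.

    filter_records is a hypothetical helper written for illustration;
    it behaves like the built-in filter, but returns a list.
    """
    filtered = []
    for record in records:
        if predicate(record):
            filtered.append(record)
    return filtered


sample_orders = [
    '27585,2014-01-12 00:00:00.0,12431,PROCESSING',
    '29109,2014-01-21 00:00:00.0,12431,ON_HOLD',
    '45894,2014-05-06 00:00:00.0,12431,CLOSED',
    '99999,2014-01-05 00:00:00.0,10000,CLOSED',  # hypothetical record
]

# Only the business logic changes from one call to the next
customer_orders = filter_records(
    sample_orders,
    lambda order: int(order.split(',')[2]) == 12431
)
january_orders = filter_records(
    sample_orders,
    lambda order: int(order.split(',')[2]) == 12431
        and order.split(',')[1].startswith('2014-01')
)

print(len(customer_orders))  # 3
print(january_orders)
```

The built-in filter used in the next examples follows the same pattern, so in practice there is no need to write a helper like this yourself.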

%run 07_preparing_data_sets.ipynb

Note

Here is the approach using filter, which is part of the Map Reduce APIs. You will learn about Map Reduce APIs soon.

orders_filtered = filter(
    lambda order: int(order.split(',')[2]) == 12431,
    orders
)
list(orders_filtered)
['3774,2013-08-16 00:00:00.0,12431,CANCELED',
 '3870,2013-08-17 00:00:00.0,12431,PENDING_PAYMENT',
 '4032,2013-08-17 00:00:00.0,12431,ON_HOLD',
 '22812,2013-12-12 00:00:00.0,12431,PENDING',
 '22927,2013-12-13 00:00:00.0,12431,CLOSED',
 '25614,2013-12-30 00:00:00.0,12431,CLOSED',
 '27585,2014-01-12 00:00:00.0,12431,PROCESSING',
 '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT',
 '29109,2014-01-21 00:00:00.0,12431,ON_HOLD',
 '29232,2014-01-21 00:00:00.0,12431,ON_HOLD',
 '45894,2014-05-06 00:00:00.0,12431,CLOSED',
 '46217,2014-05-07 00:00:00.0,12431,CLOSED',
 '49678,2014-05-31 00:00:00.0,12431,PENDING',
 '51865,2014-06-15 00:00:00.0,12431,PROCESSING',
 '63146,2014-02-13 00:00:00.0,12431,PENDING_PAYMENT',
 '67110,2014-07-14 00:00:00.0,12431,PENDING']
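
In Python 3, filter does not return a list. It returns a lazy iterator, which is why the examples wrap it in list(...). Once that iterator has been consumed, it is empty, so calling list(orders_filtered) a second time returns an empty list. A small self-contained sketch, using a few records copied from the output above:

```python
sample_orders = [
    '3774,2013-08-16 00:00:00.0,12431,CANCELED',
    '27585,2014-01-12 00:00:00.0,12431,PROCESSING',
    '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT',
]

orders_filtered = filter(
    lambda order: order.split(',')[1].startswith('2014-01'),
    sample_orders
)

print(type(orders_filtered).__name__)  # filter
first_pass = list(orders_filtered)     # consumes the iterator
second_pass = list(orders_filtered)    # nothing left to consume
print(len(first_pass), len(second_pass))  # 2 0
```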
orders_filtered = filter(
    lambda order: int(order.split(',')[2]) == 12431
        and order.split(',')[1].startswith('2014-01'),
    orders
)
list(orders_filtered)
['27585,2014-01-12 00:00:00.0,12431,PROCESSING',
 '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT',
 '29109,2014-01-21 00:00:00.0,12431,ON_HOLD',
 '29232,2014-01-21 00:00:00.0,12431,ON_HOLD']
orders_filtered = filter(
    lambda order: int(order.split(',')[2]) == 12431
        and order.split(',')[1].startswith('2014-01')
        and (order.split(',')[3] in ('PROCESSING', 'PENDING_PAYMENT')),
    orders
)
list(orders_filtered)
['27585,2014-01-12 00:00:00.0,12431,PROCESSING',
 '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT']
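
Each lambda above calls order.split(',') up to three times per record. One possible refinement, sketched here assuming the records keep the order_id,order_date,order_customer_id,order_status layout, is to map each record to its list of fields once and then filter on those fields. The sample records are copied from the outputs above.

```python
sample_orders = [
    '22812,2013-12-12 00:00:00.0,12431,PENDING',
    '27585,2014-01-12 00:00:00.0,12431,PROCESSING',
    '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT',
    '29109,2014-01-21 00:00:00.0,12431,ON_HOLD',
]

# Split every record exactly once: '27585,...' -> ['27585', ..., 'PROCESSING']
order_fields = map(lambda order: order.split(','), sample_orders)

# Business logic only, applied to the already split fields
matching_fields = filter(
    lambda fields: int(fields[2]) == 12431
        and fields[1].startswith('2014-01')
        and fields[3] in ('PROCESSING', 'PENDING_PAYMENT'),
    order_fields
)

# Join the fields back so the result has the same shape as before
orders_filtered = list(map(','.join, matching_fields))
print(orders_filtered)  # the 27585 and 28244 records, as in the output above
```

Because map and filter are both lazy, nothing is split or tested until list(...) pulls the records through.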

Note

Here is the approach using the Pandas library. You will learn how to process data using Pandas in subsequent sections.

import pandas as pd
orders_schema = [
    'order_id',
    'order_date',
    'order_customer_id',
    'order_status'
]
orders = pd.read_csv('/data/retail_db/orders/part-00000', names=orders_schema)
orders.query('order_customer_id == 12431')
       order_id             order_date  order_customer_id     order_status
3773       3774  2013-08-16 00:00:00.0              12431         CANCELED
3869       3870  2013-08-17 00:00:00.0              12431  PENDING_PAYMENT
4031       4032  2013-08-17 00:00:00.0              12431          ON_HOLD
22811     22812  2013-12-12 00:00:00.0              12431          PENDING
22926     22927  2013-12-13 00:00:00.0              12431           CLOSED
25613     25614  2013-12-30 00:00:00.0              12431           CLOSED
27584     27585  2014-01-12 00:00:00.0              12431       PROCESSING
28243     28244  2014-01-15 00:00:00.0              12431  PENDING_PAYMENT
29108     29109  2014-01-21 00:00:00.0              12431          ON_HOLD
29231     29232  2014-01-21 00:00:00.0              12431          ON_HOLD
45893     45894  2014-05-06 00:00:00.0              12431           CLOSED
46216     46217  2014-05-07 00:00:00.0              12431           CLOSED
49677     49678  2014-05-31 00:00:00.0              12431          PENDING
51864     51865  2014-06-15 00:00:00.0              12431       PROCESSING
63145     63146  2014-02-13 00:00:00.0              12431  PENDING_PAYMENT
67109     67110  2014-07-14 00:00:00.0              12431          PENDING
orders.query('order_customer_id == 12431 & order_date.str.startswith("2014-01")')
       order_id             order_date  order_customer_id     order_status
27584     27585  2014-01-12 00:00:00.0              12431       PROCESSING
28243     28244  2014-01-15 00:00:00.0              12431  PENDING_PAYMENT
29108     29109  2014-01-21 00:00:00.0              12431          ON_HOLD
29231     29232  2014-01-21 00:00:00.0              12431          ON_HOLD
orders.query('order_customer_id == 12431 & '
             'order_date.str.startswith("2014-01") & '
             'order_status in ("PROCESSING", "PENDING_PAYMENT")')
       order_id             order_date  order_customer_id     order_status
27584     27585  2014-01-12 00:00:00.0              12431       PROCESSING
28243     28244  2014-01-15 00:00:00.0              12431  PENDING_PAYMENT
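
For comparison, the same three conditions can also be written with boolean masks instead of a query string. The sketch below is self-contained: instead of /data/retail_db/orders/part-00000, it reads a few records (copied from the outputs above) from an in-memory io.StringIO buffer. That substitution is an assumption made only so that the example runs anywhere.

```python
import io

import pandas as pd

orders_schema = [
    'order_id',
    'order_date',
    'order_customer_id',
    'order_status'
]

# A few records copied from the outputs above, standing in for part-00000
sample_csv = io.StringIO(
    '22812,2013-12-12 00:00:00.0,12431,PENDING\n'
    '27585,2014-01-12 00:00:00.0,12431,PROCESSING\n'
    '28244,2014-01-15 00:00:00.0,12431,PENDING_PAYMENT\n'
    '29109,2014-01-21 00:00:00.0,12431,ON_HOLD\n'
)
orders = pd.read_csv(sample_csv, names=orders_schema)

# Each condition is a boolean Series and & combines them element-wise.
# The parentheses around == are required because & binds tighter than ==.
mask = (
    (orders['order_customer_id'] == 12431)
    & orders['order_date'].str.startswith('2014-01')
    & orders['order_status'].isin(['PROCESSING', 'PENDING_PAYMENT'])
)
print(orders[mask])
```

Inside a query string, Pandas handles the precedence of & for you, which is one reason the query version reads more like the original conditions.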