Performing Total Aggregations

We have pre-existing functions to take care of aggregations such as len, sum, min, max etc. Let us understand how they are typically implemented.

  • Get total number of records for a given month using orders.

  • Generate total revenue for a given order id using order items.

  • Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.

  • Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.

%run 07_preparing_data_sets.ipynb
orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
len(orders)
68883
order_items[:10]
['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99',
 '5,4,897,2,49.98,24.99',
 '6,4,365,5,299.95,59.99',
 '7,4,502,3,150.0,50.0',
 '8,4,1014,4,199.92,49.98',
 '9,5,957,1,299.98,299.98',
 '10,5,365,5,299.95,59.99']
len(order_items)
172198

Task 1

Use orders and get total number of records for a given month (201401).

  • Develop a function which take orders collection and month as arguments.

  • Month will be passed as integer in the form of yyyyMM (example 201401).

  • Return the order count

orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
order.split(',')
['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']
order.split(',')[1]
'2013-07-25 00:00:00.0'
order.split(',')[1][:7]
'2013-07'
order.split(',')[1][:7].replace('-', '')
'201307'
int(order.split(',')[1][:7].replace('-', ''))
201307
def get_order_count(orders, order_month):
    order_count = 0
    for order in orders:
        l_order_month = int(order.split(',')[1][:7].replace('-', ''))
        if l_order_month == order_month: order_count += 1
    return order_count
get_order_count(orders, 201401)
5908

Task 2

Use order items data set and compute total revenue generated for a given product_id.

  • Define a function which takes order_items and a product_id as arguments.

  • product_id will be passed as integer

  • Compute revenue generated for a given product id using subtotal (5th field)

  • Return the computed product revenue

order_items[:10]
['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99',
 '5,4,897,2,49.98,24.99',
 '6,4,365,5,299.95,59.99',
 '7,4,502,3,150.0,50.0',
 '8,4,1014,4,199.92,49.98',
 '9,5,957,1,299.98,299.98',
 '10,5,365,5,299.95,59.99']
order_item = '1,1,957,1,299.98,299.98'
order_item.split(',')
['1', '1', '957', '1', '299.98', '299.98']
order_item.split(',')[2]
'957'
int(order_item.split(',')[2])
957
float(order_item.split(',')[4])
299.98
def get_product_revenue(order_items, product_id):
    product_revenue = 0.0
    for order_item in order_items:
        l_product_id = int(order_item.split(',')[2])
        order_item_subtotal = float(order_item.split(',')[4])
        if l_product_id == product_id: product_revenue += order_item_subtotal
    return product_revenue
get_product_revenue(order_items, 502)
3147800.0

Task 3

Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.

  • Define a function which takes order_items and a product_id as arguments.

  • product_id will be passed as integer

  • Get number of items sold for a given product id using quantity (4th field)

  • Compute revenue generated for a given product id using subtotal (5th field)

  • Return the number of items sold as well as revenue generated. When we return more than one value, then the type of the returned results will be tuple.

t1 = (1, 200.0)
t2 = (2, 300.0)
res = (0, 0.0)
res = (res[0] + t1[0], res[1] + t1[1])
res
(1, 200.0)
res = (res[0] + t2[0], res[1] + t2[1])
res
(3, 500.0)
def get_product_metrics(order_items, product_id):
    product_metrics = (0, 0.0)
    for order_item in order_items:
        l_product_id = int(order_item.split(',')[2])
        order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
        if l_product_id == product_id: 
            product_metrics = (product_metrics[0] + order_metric[0], product_metrics[1] + order_metric[1])
    return product_metrics
get_product_metrics(order_items, 502)
(62956, 3147800.0)

Note

Alternative approach to get the product metrics. We can return multiple values as part of the return statement.

def get_product_metrics(order_items, product_id):
    product_count, product_revenue = 0, 0.0
    for order_item in order_items:
        l_product_id = int(order_item.split(',')[2])
        order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
        if l_product_id == product_id: 
            product_count += order_metric[0]
            product_revenue += order_metric[1]
    return product_count, product_revenue
get_product_metrics(order_items, 502)
(62956, 3147800.0)
type(get_product_metrics(order_items, 502))
tuple

Task 4

Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.

  • Each element in the collection should be a tuple.

  • First element is the sales amount and second element is commission percentage.

  • Commission for each sale can be computed by multiplying commission percentage with sales (make sure to divide commission percentage by 100).

  • Some of the records does not have commission percentage, in that case commission amount for that sale shall be 0

  • Function should take a collection of tuples and return commission amount which is of type float.

transactions = [(376.0, 8),
(548.23, 14),
(107.93, 8),
(838.22, 14),
(846.85, 21),
(234.84,),
(850.2, 21),
(992.2, 21),
(267.01,),
(958.91, 21),
(412.59,),
(283.14,),
(350.01, 14),
(226.95,),
(132.7, 14)]
type(transactions)
list
transactions[:6]
[(376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14), (846.85, 21), (234.84,)]
sale = transactions[0]
type(sale)
tuple
commission_amount = round(sale[0] * (sale[1] / 100), 2)
commission_amount
30.08
sale = (234.84,)
commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-64-158aaaf09c96> in <module>
----> 1 commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out

IndexError: tuple index out of range
len(sale)
1
commission_pct = sale[1] / 100 if len(sale) == 2 else 0
commission_pct
0
def get_commission_amount(sales):
    commission_amount = 0.0
    for sale in sales:
        sale_amount = sale[0]
        commission_pct = round(sale[1]/100, 2) if len(sale) == 2 else 0
        commission_amount += sale_amount * commission_pct
    return round(commission_amount, 2)
get_commission_amount(transactions)
1066.51