Performing Total Aggregations¶
We have pre-existing functions to take care of aggregations such as len
, sum
, min
, max
etc. Let us understand how they are typically implemented.
Get total number of records for a given month using orders.
Generate total revenue for a given order id using order items.
Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.
Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.
%run 07_preparing_data_sets.ipynb
orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
'2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
'3,2013-07-25 00:00:00.0,12111,COMPLETE',
'4,2013-07-25 00:00:00.0,8827,CLOSED',
'5,2013-07-25 00:00:00.0,11318,COMPLETE',
'6,2013-07-25 00:00:00.0,7130,COMPLETE',
'7,2013-07-25 00:00:00.0,4530,COMPLETE',
'8,2013-07-25 00:00:00.0,2911,PROCESSING',
'9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
'10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
len(orders)
68883
order_items[:10]
['1,1,957,1,299.98,299.98',
'2,2,1073,1,199.99,199.99',
'3,2,502,5,250.0,50.0',
'4,2,403,1,129.99,129.99',
'5,4,897,2,49.98,24.99',
'6,4,365,5,299.95,59.99',
'7,4,502,3,150.0,50.0',
'8,4,1014,4,199.92,49.98',
'9,5,957,1,299.98,299.98',
'10,5,365,5,299.95,59.99']
len(order_items)
172198
Task 1¶
Use orders and get total number of records for a given month (201401).
Develop a function which take orders collection and month as arguments.
Month will be passed as integer in the form of yyyyMM (example 201401).
Return the order count
orders[:10]
['1,2013-07-25 00:00:00.0,11599,CLOSED',
'2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
'3,2013-07-25 00:00:00.0,12111,COMPLETE',
'4,2013-07-25 00:00:00.0,8827,CLOSED',
'5,2013-07-25 00:00:00.0,11318,COMPLETE',
'6,2013-07-25 00:00:00.0,7130,COMPLETE',
'7,2013-07-25 00:00:00.0,4530,COMPLETE',
'8,2013-07-25 00:00:00.0,2911,PROCESSING',
'9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
'10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
order.split(',')
['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']
order.split(',')[1]
'2013-07-25 00:00:00.0'
order.split(',')[1][:7]
'2013-07'
order.split(',')[1][:7].replace('-', '')
'201307'
int(order.split(',')[1][:7].replace('-', ''))
201307
def get_order_count(orders, order_month):
order_count = 0
for order in orders:
l_order_month = int(order.split(',')[1][:7].replace('-', ''))
if l_order_month == order_month: order_count += 1
return order_count
get_order_count(orders, 201401)
5908
Task 2¶
Use order items data set and compute total revenue generated for a given product_id.
Define a function which takes order_items and a product_id as arguments.
product_id will be passed as integer
Compute revenue generated for a given product id using subtotal (5th field)
Return the computed product revenue
order_items[:10]
['1,1,957,1,299.98,299.98',
'2,2,1073,1,199.99,199.99',
'3,2,502,5,250.0,50.0',
'4,2,403,1,129.99,129.99',
'5,4,897,2,49.98,24.99',
'6,4,365,5,299.95,59.99',
'7,4,502,3,150.0,50.0',
'8,4,1014,4,199.92,49.98',
'9,5,957,1,299.98,299.98',
'10,5,365,5,299.95,59.99']
order_item = '1,1,957,1,299.98,299.98'
order_item.split(',')
['1', '1', '957', '1', '299.98', '299.98']
order_item.split(',')[2]
'957'
int(order_item.split(',')[2])
957
float(order_item.split(',')[4])
299.98
def get_product_revenue(order_items, product_id):
product_revenue = 0.0
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_item_subtotal = float(order_item.split(',')[4])
if l_product_id == product_id: product_revenue += order_item_subtotal
return product_revenue
get_product_revenue(order_items, 502)
3147800.0
Task 3¶
Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.
Define a function which takes order_items and a product_id as arguments.
product_id will be passed as integer
Get number of items sold for a given product id using quantity (4th field)
Compute revenue generated for a given product id using subtotal (5th field)
Return the number of items sold as well as revenue generated. When we return more than one value, then the type of the returned results will be tuple.
t1 = (1, 200.0)
t2 = (2, 300.0)
res = (0, 0.0)
res = (res[0] + t1[0], res[1] + t1[1])
res
(1, 200.0)
res = (res[0] + t2[0], res[1] + t2[1])
res
(3, 500.0)
def get_product_metrics(order_items, product_id):
product_metrics = (0, 0.0)
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
if l_product_id == product_id:
product_metrics = (product_metrics[0] + order_metric[0], product_metrics[1] + order_metric[1])
return product_metrics
get_product_metrics(order_items, 502)
(62956, 3147800.0)
Note
Alternative approach to get the product metrics. We can return multiple values as part of the return statement.
def get_product_metrics(order_items, product_id):
product_count, product_revenue = 0, 0.0
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
if l_product_id == product_id:
product_count += order_metric[0]
product_revenue += order_metric[1]
return product_count, product_revenue
get_product_metrics(order_items, 502)
(62956, 3147800.0)
type(get_product_metrics(order_items, 502))
tuple
Task 4¶
Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.
Each element in the collection should be a tuple.
First element is the sales amount and second element is commission percentage.
Commission for each sale can be computed by multiplying commission percentage with sales (make sure to divide commission percentage by 100).
Some of the records does not have commission percentage, in that case commission amount for that sale shall be 0
Function should take a collection of tuples and return commission amount which is of type float.
transactions = [(376.0, 8),
(548.23, 14),
(107.93, 8),
(838.22, 14),
(846.85, 21),
(234.84,),
(850.2, 21),
(992.2, 21),
(267.01,),
(958.91, 21),
(412.59,),
(283.14,),
(350.01, 14),
(226.95,),
(132.7, 14)]
type(transactions)
list
transactions[:6]
[(376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14), (846.85, 21), (234.84,)]
sale = transactions[0]
type(sale)
tuple
commission_amount = round(sale[0] * (sale[1] / 100), 2)
commission_amount
30.08
sale = (234.84,)
commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-64-158aaaf09c96> in <module>
----> 1 commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out
IndexError: tuple index out of range
len(sale)
1
commission_pct = sale[1] / 100 if len(sale) == 2 else 0
commission_pct
0
def get_commission_amount(sales):
commission_amount = 0.0
for sale in sales:
sale_amount = sale[0]
commission_pct = round(sale[1]/100, 2) if len(sale) == 2 else 0
commission_amount += sale_amount * commission_pct
return round(commission_amount, 2)
get_commission_amount(transactions)
1066.51