Performing Aggregations - Grouped¶
Here are some of the examples for grouped aggregations.
Get number of employees for each department
Get daily revenue for a given month (aggregation for a given day and filtering based up on month).
Number of courses enrolled by each student
Number of students enrolled for each course
Note
Let us perform couple of tasks related to grouped aggregations.
%run 07_preparing_data_sets.ipynb
Task 1¶
Create a function get_count_by_order_status which takes orders list as argument and returns a dict which contain order_status and corresponding count.
def get_count_by_order_status(orders):
order_count = {} # dict
for order in orders:
order_status = order.split(',')[3]
if order_status in order_count:
# if order_count contains key,
# then increment the existing value by 1
order_count[order_status] += 1
else:
# if order_count does not contain key,
# then add new element to order_count
order_count[order_status] = 1
return order_count
get_count_by_order_status(orders)
Task 2¶
Create a function get_revenue_per_order which takes order_items list as argument and returns a dict which contain order_item_order_id and corresponding quantity as well as order_revenue.
Function will take order_items as argument.
It shall return a list of orders with corresponding quantity and revenue. You can either use
list
ordict
ortuple
for quantity and revenue.
Note
This is an example to get order_id and corresponding quantity as well as revenue. For each order_id, quantity and revenue are treated as list.
def get_revenue_per_order(order_items):
revenue_per_order = {}
for order_item in order_items:
order_item_order_id = int(order_item.split(',')[1])
order_item_quantity = int(order_item.split(',')[3])
order_item_subtotal = float(order_item.split(',')[4])
order_metrics = [order_item_quantity, order_item_subtotal]
if revenue_per_order.get(order_item_order_id):
# if revenue_per_order contains key,
# then add order_item_subtotal to existing value
revenue_per_order[order_item_order_id][0] += order_metrics[0]
revenue_per_order[order_item_order_id][1] += order_metrics[1]
else:
# if revenue_per_order does not contain key,
# then add new element to revenue_per_order
revenue_per_order[order_item_order_id] = order_metrics
return revenue_per_order
list(get_revenue_per_order(order_items).items())[:10]
[(1, [1, 299.98]),
(2, [7, 579.98]),
(4, [14, 699.85]),
(5, [10, 1129.8600000000001]),
(7, [7, 579.9200000000001]),
(8, [13, 729.8399999999999]),
(9, [4, 599.96]),
(10, [6, 651.9200000000001]),
(11, [15, 919.79]),
(12, [18, 1299.8700000000001])]
Note
This is an example to get order_id and corresponding quantity as well as revenue. For each order_id, quantity and revenue are treated as dict.
def get_revenue_per_order(order_items):
revenue_per_order = {}
for order_item in order_items:
order_item_order_id = int(order_item.split(',')[1])
order_item_quantity = int(order_item.split(',')[3])
order_item_subtotal = float(order_item.split(',')[4])
order_metrics = {'order_item_quantity': order_item_quantity, 'order_item_subtotal': order_item_subtotal}
if revenue_per_order.get(order_item_order_id):
# if revenue_per_order contains key,
# then add order_item_subtotal to existing value
revenue_per_order[order_item_order_id]['order_item_quantity'] += order_metrics['order_item_quantity']
revenue_per_order[order_item_order_id]['order_item_subtotal'] += order_metrics['order_item_subtotal']
else:
# if revenue_per_order does not contain key,
# then add new element to revenue_per_order
revenue_per_order[order_item_order_id] = order_metrics
return revenue_per_order
list(get_revenue_per_order(order_items).items())[:10]
[(1, {'order_item_quantity': 1, 'order_item_subtotal': 299.98}),
(2, {'order_item_quantity': 7, 'order_item_subtotal': 579.98}),
(4, {'order_item_quantity': 14, 'order_item_subtotal': 699.85}),
(5, {'order_item_quantity': 10, 'order_item_subtotal': 1129.8600000000001}),
(7, {'order_item_quantity': 7, 'order_item_subtotal': 579.9200000000001}),
(8, {'order_item_quantity': 13, 'order_item_subtotal': 729.8399999999999}),
(9, {'order_item_quantity': 4, 'order_item_subtotal': 599.96}),
(10, {'order_item_quantity': 6, 'order_item_subtotal': 651.9200000000001}),
(11, {'order_item_quantity': 15, 'order_item_subtotal': 919.79}),
(12, {'order_item_quantity': 18, 'order_item_subtotal': 1299.8700000000001})]
Task 3¶
Create a function get_order_count_by_month which takes orders list and order_status as arguments and returns a dict which contain order_month and count.
We only have to count those orders which belong to the passed order_status.
If order_status is not passed then we need to count all records.
def get_order_count_by_month(orders, order_status='ALL'):
order_count = {}
for order in orders:
order_month = order.split(',')[1][:7]
l_order_status = order.split(',')[3]
if order_status == 'ALL' or l_order_status == order_status:
if order_month in order_count:
order_count[order_month] += 1
else:
order_count[order_month] = 1
return order_count
get_order_count_by_month(orders)
get_order_count_by_month(orders, 'COMPLETE')
get_order_count_by_month(orders, 'CLOSED')