Performing Aggregations - Grouped

Here are some of the examples for grouped aggregations.

  • Get number of employees for each department

  • Get daily revenue for a given month (aggregation for a given day and filtering based up on month).

  • Number of courses enrolled by each student

  • Number of students enrolled for each course

Note

Let us perform couple of tasks related to grouped aggregations.

%run 07_preparing_data_sets.ipynb

Task 1

Create a function get_count_by_order_status which takes orders list as argument and returns a dict which contain order_status and corresponding count.

def get_count_by_order_status(orders):
    order_count = {} # dict
    for order in orders:
        order_status = order.split(',')[3]
        if order_status in order_count:
            # if order_count contains key, 
            # then increment the existing value by 1
            order_count[order_status] += 1
        else: 
            # if order_count does not contain key,
            # then add new element to order_count
            order_count[order_status] = 1
    return order_count
get_count_by_order_status(orders)

Task 2

Create a function get_revenue_per_order which takes order_items list as argument and returns a dict which contain order_item_order_id and corresponding quantity as well as order_revenue.

  • Function will take order_items as argument.

  • It shall return a list of orders with corresponding quantity and revenue. You can either use list or dict or tuple for quantity and revenue.

Note

This is an example to get order_id and corresponding quantity as well as revenue. For each order_id, quantity and revenue are treated as list.

def get_revenue_per_order(order_items):
    revenue_per_order = {}
    for order_item in order_items:
        order_item_order_id = int(order_item.split(',')[1])
        order_item_quantity = int(order_item.split(',')[3])
        order_item_subtotal = float(order_item.split(',')[4])
        order_metrics = [order_item_quantity, order_item_subtotal]
        if revenue_per_order.get(order_item_order_id):
            # if revenue_per_order contains key, 
            # then add order_item_subtotal to existing value
            revenue_per_order[order_item_order_id][0] += order_metrics[0]
            revenue_per_order[order_item_order_id][1] += order_metrics[1]
        else:
            # if revenue_per_order does not contain key, 
            # then add new element to revenue_per_order
            revenue_per_order[order_item_order_id] = order_metrics
    return revenue_per_order
list(get_revenue_per_order(order_items).items())[:10]
[(1, [1, 299.98]),
 (2, [7, 579.98]),
 (4, [14, 699.85]),
 (5, [10, 1129.8600000000001]),
 (7, [7, 579.9200000000001]),
 (8, [13, 729.8399999999999]),
 (9, [4, 599.96]),
 (10, [6, 651.9200000000001]),
 (11, [15, 919.79]),
 (12, [18, 1299.8700000000001])]

Note

This is an example to get order_id and corresponding quantity as well as revenue. For each order_id, quantity and revenue are treated as dict.

def get_revenue_per_order(order_items):
    revenue_per_order = {}
    for order_item in order_items:
        order_item_order_id = int(order_item.split(',')[1])
        order_item_quantity = int(order_item.split(',')[3])
        order_item_subtotal = float(order_item.split(',')[4])
        order_metrics = {'order_item_quantity': order_item_quantity, 'order_item_subtotal': order_item_subtotal}
        if revenue_per_order.get(order_item_order_id):
            # if revenue_per_order contains key, 
            # then add order_item_subtotal to existing value
            revenue_per_order[order_item_order_id]['order_item_quantity'] += order_metrics['order_item_quantity']
            revenue_per_order[order_item_order_id]['order_item_subtotal'] += order_metrics['order_item_subtotal']
        else:
            # if revenue_per_order does not contain key, 
            # then add new element to revenue_per_order
            revenue_per_order[order_item_order_id] = order_metrics
    return revenue_per_order
list(get_revenue_per_order(order_items).items())[:10]
[(1, {'order_item_quantity': 1, 'order_item_subtotal': 299.98}),
 (2, {'order_item_quantity': 7, 'order_item_subtotal': 579.98}),
 (4, {'order_item_quantity': 14, 'order_item_subtotal': 699.85}),
 (5, {'order_item_quantity': 10, 'order_item_subtotal': 1129.8600000000001}),
 (7, {'order_item_quantity': 7, 'order_item_subtotal': 579.9200000000001}),
 (8, {'order_item_quantity': 13, 'order_item_subtotal': 729.8399999999999}),
 (9, {'order_item_quantity': 4, 'order_item_subtotal': 599.96}),
 (10, {'order_item_quantity': 6, 'order_item_subtotal': 651.9200000000001}),
 (11, {'order_item_quantity': 15, 'order_item_subtotal': 919.79}),
 (12, {'order_item_quantity': 18, 'order_item_subtotal': 1299.8700000000001})]

Task 3

Create a function get_order_count_by_month which takes orders list and order_status as arguments and returns a dict which contain order_month and count.

  • We only have to count those orders which belong to the passed order_status.

  • If order_status is not passed then we need to count all records.

def get_order_count_by_month(orders, order_status='ALL'):
    order_count = {}
    for order in orders:
        order_month = order.split(',')[1][:7]
        l_order_status = order.split(',')[3]
        if order_status == 'ALL' or l_order_status == order_status:
            if order_month in order_count: 
                order_count[order_month] += 1
            else: 
                order_count[order_month] = 1
    return order_count
get_order_count_by_month(orders)
get_order_count_by_month(orders, 'COMPLETE')
get_order_count_by_month(orders, 'CLOSED')