{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Performing Grouped Aggregations\n", "\n", "Let us understand how to perform grouped or by key aggregations using Pandas.\n", "* Here are the steps we need to follow:\n", " * Make sure data is read into Data Frame.\n", " * Identify the key on which data should be aggregated. If the data has to be aggregated on derived field which is not available as part of the Data Frame, then first we need to update data frame with the derived field.\n", " * Using the key group the values using `groupby` function on data frame. We can only pass column names from Data Frame as part of `groupby`.\n", " * Apply required aggregate functions to get aggregated results based up on the key.\n", "* We can apply multiple aggregate functions at a time after creating grouped data frame.\n", "* Pandas Data Frame exposes a function called as `rename` to provide aliases to the aggregated fields." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%HTML\n", "" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "%run 06_csv_to_pandas_data_frame.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Getting number of orders per day" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idorder_dateorder_customer_idorder_status
012013-07-25 00:00:00.011599CLOSED
122013-07-25 00:00:00.0256PENDING_PAYMENT
232013-07-25 00:00:00.012111COMPLETE
342013-07-25 00:00:00.08827CLOSED
452013-07-25 00:00:00.011318COMPLETE
...............
68878688792014-07-09 00:00:00.0778COMPLETE
68879688802014-07-13 00:00:00.01117COMPLETE
68880688812014-07-19 00:00:00.02518PENDING_PAYMENT
68881688822014-07-22 00:00:00.010000ON_HOLD
68882688832014-07-23 00:00:00.05533COMPLETE
\n", "

68883 rows × 4 columns

\n", "
" ], "text/plain": [ " order_id order_date order_customer_id order_status\n", "0 1 2013-07-25 00:00:00.0 11599 CLOSED\n", "1 2 2013-07-25 00:00:00.0 256 PENDING_PAYMENT\n", "2 3 2013-07-25 00:00:00.0 12111 COMPLETE\n", "3 4 2013-07-25 00:00:00.0 8827 CLOSED\n", "4 5 2013-07-25 00:00:00.0 11318 COMPLETE\n", "... ... ... ... ...\n", "68878 68879 2014-07-09 00:00:00.0 778 COMPLETE\n", "68879 68880 2014-07-13 00:00:00.0 1117 COMPLETE\n", "68880 68881 2014-07-19 00:00:00.0 2518 PENDING_PAYMENT\n", "68881 68882 2014-07-22 00:00:00.0 10000 ON_HOLD\n", "68882 68883 2014-07-23 00:00:00.0 5533 COMPLETE\n", "\n", "[68883 rows x 4 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders.groupby(orders['order_date'])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('2013-07-25 00:00:00.0',\n", " 0 1\n", " 1 2\n", " 2 3\n", " 3 4\n", " 4 5\n", " ... \n", " 57786 57787\n", " 57787 57788\n", " 57788 57789\n", " 67415 67416\n", " 68690 68691\n", " Name: order_id, Length: 143, dtype: int64),\n", " ('2013-07-26 00:00:00.0',\n", " 104 105\n", " 105 106\n", " 106 107\n", " 107 108\n", " 108 109\n", " ... \n", " 67418 67419\n", " 67419 67420\n", " 67420 67421\n", " 67421 67422\n", " 68691 68692\n", " Name: order_id, Length: 269, dtype: int64),\n", " ('2013-07-27 00:00:00.0',\n", " 346 347\n", " 347 348\n", " 348 349\n", " 349 350\n", " 350 351\n", " ... \n", " 67422 67423\n", " 67423 67424\n", " 67424 67425\n", " 67425 67426\n", " 68692 68693\n", " Name: order_id, Length: 202, dtype: int64)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(orders.groupby(orders['order_date'])['order_id'])[:3]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "order_date\n", "2013-07-25 00:00:00.0 143\n", "2013-07-26 00:00:00.0 269\n", "2013-07-27 00:00:00.0 202\n", "2013-07-28 00:00:00.0 187\n", "2013-07-29 00:00:00.0 253\n", " ... \n", "2014-07-20 00:00:00.0 285\n", "2014-07-21 00:00:00.0 235\n", "2014-07-22 00:00:00.0 138\n", "2014-07-23 00:00:00.0 166\n", "2014-07-24 00:00:00.0 185\n", "Name: order_id, Length: 364, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders.groupby(orders['order_date'])['order_id'].count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Getting number of orders per status" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "order_status\n", "CANCELED 1428\n", "CLOSED 7556\n", "COMPLETE 22899\n", "ON_HOLD 3798\n", "PAYMENT_REVIEW 729\n", "PENDING 7610\n", "PENDING_PAYMENT 15030\n", "PROCESSING 8275\n", "SUSPECTED_FRAUD 1558\n", "Name: order_status, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders.groupby('order_status')['order_status'].count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Computing revenue per order" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_item_idorder_item_order_idorder_item_product_idorder_item_quantityorder_item_subtotalorder_item_product_price
0119571299.98299.98
12210731199.99199.99
2325025250.0050.00
3424031129.99129.99
454897249.9824.99
.....................
172193172194688814031129.99129.99
17219417219568882365159.9959.99
17219517219668882502150.0050.00
1721961721976888320811999.991999.99
172197172198688835023150.0050.00
\n", "

172198 rows × 6 columns

\n", "
" ], "text/plain": [ " order_item_id order_item_order_id order_item_product_id \\\n", "0 1 1 957 \n", "1 2 2 1073 \n", "2 3 2 502 \n", "3 4 2 403 \n", "4 5 4 897 \n", "... ... ... ... \n", "172193 172194 68881 403 \n", "172194 172195 68882 365 \n", "172195 172196 68882 502 \n", "172196 172197 68883 208 \n", "172197 172198 68883 502 \n", "\n", " order_item_quantity order_item_subtotal order_item_product_price \n", "0 1 299.98 299.98 \n", "1 1 199.99 199.99 \n", "2 5 250.00 50.00 \n", "3 1 129.99 129.99 \n", "4 2 49.98 24.99 \n", "... ... ... ... \n", "172193 1 129.99 129.99 \n", "172194 1 59.99 59.99 \n", "172195 1 50.00 50.00 \n", "172196 1 1999.99 1999.99 \n", "172197 3 150.00 50.00 \n", "\n", "[172198 rows x 6 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1,\n", " 0 299.98\n", " Name: order_item_subtotal, dtype: float64),\n", " (2,\n", " 1 199.99\n", " 2 250.00\n", " 3 129.99\n", " Name: order_item_subtotal, dtype: float64),\n", " (4,\n", " 4 49.98\n", " 5 299.95\n", " 6 150.00\n", " 7 199.92\n", " Name: order_item_subtotal, dtype: float64),\n", " (5,\n", " 8 299.98\n", " 9 299.95\n", " 10 99.96\n", " 11 299.98\n", " 12 129.99\n", " Name: order_item_subtotal, dtype: float64),\n", " (7,\n", " 13 199.99\n", " 14 299.98\n", " 15 79.95\n", " Name: order_item_subtotal, dtype: float64)]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(order_items. \\\n", " groupby('order_item_order_id')['order_item_subtotal'])[:5]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "order_item_order_id\n", "1 299.98\n", "2 579.98\n", "4 699.85\n", "5 1129.86\n", "7 579.92\n", " ... \n", "68879 1259.97\n", "68880 999.77\n", "68881 129.99\n", "68882 109.99\n", "68883 2149.99\n", "Name: order_item_subtotal, Length: 57431, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items. \\\n", " groupby('order_item_order_id')['order_item_subtotal']. \\\n", " sum()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
summinmaxcount
order_item_order_id
1299.98299.98299.981
2579.98129.99250.003
4699.8549.98299.954
51129.8699.96299.985
7579.9279.95299.983
...............
688791259.97129.99999.993
68880999.77149.94250.005
68881129.99129.99129.991
68882109.9950.0059.992
688832149.99150.001999.992
\n", "

57431 rows × 4 columns

\n", "
" ], "text/plain": [ " sum min max count\n", "order_item_order_id \n", "1 299.98 299.98 299.98 1\n", "2 579.98 129.99 250.00 3\n", "4 699.85 49.98 299.95 4\n", "5 1129.86 99.96 299.98 5\n", "7 579.92 79.95 299.98 3\n", "... ... ... ... ...\n", "68879 1259.97 129.99 999.99 3\n", "68880 999.77 149.94 250.00 5\n", "68881 129.99 129.99 129.99 1\n", "68882 109.99 50.00 59.99 2\n", "68883 2149.99 150.00 1999.99 2\n", "\n", "[57431 rows x 4 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items. \\\n", " groupby('order_item_order_id')['order_item_subtotal']. \\\n", " agg(['sum', 'min', 'max', 'count'])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
revenueminmaxitem_count
order_item_order_id
1299.98299.98299.981
2579.98129.99250.003
4699.8549.98299.954
51129.8699.96299.985
7579.9279.95299.983
...............
688791259.97129.99999.993
68880999.77149.94250.005
68881129.99129.99129.991
68882109.9950.0059.992
688832149.99150.001999.992
\n", "

57431 rows × 4 columns

\n", "
" ], "text/plain": [ " revenue min max item_count\n", "order_item_order_id \n", "1 299.98 299.98 299.98 1\n", "2 579.98 129.99 250.00 3\n", "4 699.85 49.98 299.95 4\n", "5 1129.86 99.96 299.98 5\n", "7 579.92 79.95 299.98 3\n", "... ... ... ... ...\n", "68879 1259.97 129.99 999.99 3\n", "68880 999.77 149.94 250.00 5\n", "68881 129.99 129.99 129.99 1\n", "68882 109.99 50.00 59.99 2\n", "68883 2149.99 150.00 1999.99 2\n", "\n", "[57431 rows x 4 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items. \\\n", " groupby('order_item_order_id')['order_item_subtotal']. \\\n", " agg(['sum', 'min', 'max', 'count']). \\\n", " rename(columns={'count': 'item_count', 'sum': 'revenue'})" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_item_idorder_idorder_item_product_idorder_item_quantityorder_item_subtotalorder_item_product_price
0119571299.98299.98
12210731199.99199.99
2325025250.0050.00
3424031129.99129.99
454897249.9824.99
.....................
172193172194688814031129.99129.99
17219417219568882365159.9959.99
17219517219668882502150.0050.00
1721961721976888320811999.991999.99
172197172198688835023150.0050.00
\n", "

172198 rows × 6 columns

\n", "
" ], "text/plain": [ " order_item_id order_id order_item_product_id order_item_quantity \\\n", "0 1 1 957 1 \n", "1 2 2 1073 1 \n", "2 3 2 502 5 \n", "3 4 2 403 1 \n", "4 5 4 897 2 \n", "... ... ... ... ... \n", "172193 172194 68881 403 1 \n", "172194 172195 68882 365 1 \n", "172195 172196 68882 502 1 \n", "172196 172197 68883 208 1 \n", "172197 172198 68883 502 3 \n", "\n", " order_item_subtotal order_item_product_price \n", "0 299.98 299.98 \n", "1 199.99 199.99 \n", "2 250.00 50.00 \n", "3 129.99 129.99 \n", "4 49.98 24.99 \n", "... ... ... \n", "172193 129.99 129.99 \n", "172194 59.99 59.99 \n", "172195 50.00 50.00 \n", "172196 1999.99 1999.99 \n", "172197 150.00 50.00 \n", "\n", "[172198 rows x 6 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items.rename(columns={'order_item_order_id': 'order_id'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 1\n", "\n", "Get order_item_count and order_revenue for each order_id." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_item_idorder_item_order_idorder_item_product_idorder_item_quantityorder_item_subtotalorder_item_product_price
0119571299.98299.98
12210731199.99199.99
2325025250.0050.00
3424031129.99129.99
454897249.9824.99
.....................
172193172194688814031129.99129.99
17219417219568882365159.9959.99
17219517219668882502150.0050.00
1721961721976888320811999.991999.99
172197172198688835023150.0050.00
\n", "

172198 rows × 6 columns

\n", "
" ], "text/plain": [ " order_item_id order_item_order_id order_item_product_id \\\n", "0 1 1 957 \n", "1 2 2 1073 \n", "2 3 2 502 \n", "3 4 2 403 \n", "4 5 4 897 \n", "... ... ... ... \n", "172193 172194 68881 403 \n", "172194 172195 68882 365 \n", "172195 172196 68882 502 \n", "172196 172197 68883 208 \n", "172197 172198 68883 502 \n", "\n", " order_item_quantity order_item_subtotal order_item_product_price \n", "0 1 299.98 299.98 \n", "1 1 199.99 199.99 \n", "2 5 250.00 50.00 \n", "3 1 129.99 129.99 \n", "4 2 49.98 24.99 \n", "... ... ... ... \n", "172193 1 129.99 129.99 \n", "172194 1 59.99 59.99 \n", "172195 1 50.00 50.00 \n", "172196 1 1999.99 1999.99 \n", "172197 3 150.00 50.00 \n", "\n", "[172198 rows x 6 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_item_order_idorder_revenueorder_item_count
01299.981
12579.983
24699.854
351129.865
47579.923
............
57426688791259.973
5742768880999.775
5742868881129.991
5742968882109.992
57430688832149.992
\n", "

57431 rows × 3 columns

\n", "
" ], "text/plain": [ " order_item_order_id order_revenue order_item_count\n", "0 1 299.98 1\n", "1 2 579.98 3\n", "2 4 699.85 4\n", "3 5 1129.86 5\n", "4 7 579.92 3\n", "... ... ... ...\n", "57426 68879 1259.97 3\n", "57427 68880 999.77 5\n", "57428 68881 129.99 1\n", "57429 68882 109.99 2\n", "57430 68883 2149.99 2\n", "\n", "[57431 rows x 3 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_items. \\\n", " groupby('order_item_order_id')['order_item_subtotal']. \\\n", " agg(['sum', 'count']). \\\n", " rename(columns={'sum': 'order_revenue', 'count': 'order_item_count'}). \\\n", " reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2\n", "\n", "Get order count by month using orders data for specific order_status." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idorder_dateorder_customer_idorder_status
012013-07-25 00:00:00.011599CLOSED
122013-07-25 00:00:00.0256PENDING_PAYMENT
232013-07-25 00:00:00.012111COMPLETE
342013-07-25 00:00:00.08827CLOSED
452013-07-25 00:00:00.011318COMPLETE
...............
68878688792014-07-09 00:00:00.0778COMPLETE
68879688802014-07-13 00:00:00.01117COMPLETE
68880688812014-07-19 00:00:00.02518PENDING_PAYMENT
68881688822014-07-22 00:00:00.010000ON_HOLD
68882688832014-07-23 00:00:00.05533COMPLETE
\n", "

68883 rows × 4 columns

\n", "
" ], "text/plain": [ " order_id order_date order_customer_id order_status\n", "0 1 2013-07-25 00:00:00.0 11599 CLOSED\n", "1 2 2013-07-25 00:00:00.0 256 PENDING_PAYMENT\n", "2 3 2013-07-25 00:00:00.0 12111 COMPLETE\n", "3 4 2013-07-25 00:00:00.0 8827 CLOSED\n", "4 5 2013-07-25 00:00:00.0 11318 COMPLETE\n", "... ... ... ... ...\n", "68878 68879 2014-07-09 00:00:00.0 778 COMPLETE\n", "68879 68880 2014-07-13 00:00:00.0 1117 COMPLETE\n", "68880 68881 2014-07-19 00:00:00.0 2518 PENDING_PAYMENT\n", "68881 68882 2014-07-22 00:00:00.0 10000 ON_HOLD\n", "68882 68883 2014-07-23 00:00:00.0 5533 COMPLETE\n", "\n", "[68883 rows x 4 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2013-07\n", "1 2013-07\n", "2 2013-07\n", "3 2013-07\n", "4 2013-07\n", " ... \n", "68878 2014-07\n", "68879 2014-07\n", "68880 2014-07\n", "68881 2014-07\n", "68882 2014-07\n", "Name: order_date, Length: 68883, dtype: object" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders.order_date.str.slice(0, 7)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "orders['order_month'] = orders.order_date.str.slice(0, 7)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_idorder_dateorder_customer_idorder_statusorder_month
012013-07-25 00:00:00.011599CLOSED2013-07
122013-07-25 00:00:00.0256PENDING_PAYMENT2013-07
232013-07-25 00:00:00.012111COMPLETE2013-07
342013-07-25 00:00:00.08827CLOSED2013-07
452013-07-25 00:00:00.011318COMPLETE2013-07
..................
68878688792014-07-09 00:00:00.0778COMPLETE2014-07
68879688802014-07-13 00:00:00.01117COMPLETE2014-07
68880688812014-07-19 00:00:00.02518PENDING_PAYMENT2014-07
68881688822014-07-22 00:00:00.010000ON_HOLD2014-07
68882688832014-07-23 00:00:00.05533COMPLETE2014-07
\n", "

68883 rows × 5 columns

\n", "
" ], "text/plain": [ " order_id order_date order_customer_id order_status \\\n", "0 1 2013-07-25 00:00:00.0 11599 CLOSED \n", "1 2 2013-07-25 00:00:00.0 256 PENDING_PAYMENT \n", "2 3 2013-07-25 00:00:00.0 12111 COMPLETE \n", "3 4 2013-07-25 00:00:00.0 8827 CLOSED \n", "4 5 2013-07-25 00:00:00.0 11318 COMPLETE \n", "... ... ... ... ... \n", "68878 68879 2014-07-09 00:00:00.0 778 COMPLETE \n", "68879 68880 2014-07-13 00:00:00.0 1117 COMPLETE \n", "68880 68881 2014-07-19 00:00:00.0 2518 PENDING_PAYMENT \n", "68881 68882 2014-07-22 00:00:00.0 10000 ON_HOLD \n", "68882 68883 2014-07-23 00:00:00.0 5533 COMPLETE \n", "\n", " order_month \n", "0 2013-07 \n", "1 2013-07 \n", "2 2013-07 \n", "3 2013-07 \n", "4 2013-07 \n", "... ... \n", "68878 2014-07 \n", "68879 2014-07 \n", "68880 2014-07 \n", "68881 2014-07 \n", "68882 2014-07 \n", "\n", "[68883 rows x 5 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "order_month\n", "2013-07 515\n", "2014-07 1419\n", "2013-10 1783\n", "2014-06 1797\n", "2014-05 1854\n", "2014-02 1869\n", "2013-08 1880\n", "2013-12 1898\n", "2014-01 1911\n", "2014-04 1932\n", "2013-09 1933\n", "2014-03 1967\n", "2013-11 2141\n", "Name: order_id, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders.query('order_status == \"COMPLETE\"'). \\\n", " groupby('order_month')['order_id']. \\\n", " count(). \\\n", " sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 3\n", "\n", "Get order_revenue and order_quantity for each order_id. Add quantity of all items for each order_id to get order_quantity." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "order_metrics = order_items. \\\n", " groupby('order_item_order_id')[['order_item_subtotal', 'order_item_quantity']]. \\\n", " agg(['sum'])" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "order_metrics.columns = ['order_revenue', 'order_quantity']" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
order_revenueorder_quantity
order_item_order_id
1299.981
2579.987
4699.8514
51129.8610
7579.927
.........
688791259.973
68880999.7717
68881129.991
68882109.992
688832149.994
\n", "

57431 rows × 2 columns

\n", "
" ], "text/plain": [ " order_revenue order_quantity\n", "order_item_order_id \n", "1 299.98 1\n", "2 579.98 7\n", "4 699.85 14\n", "5 1129.86 10\n", "7 579.92 7\n", "... ... ...\n", "68879 1259.97 3\n", "68880 999.77 17\n", "68881 129.99 1\n", "68882 109.99 2\n", "68883 2149.99 4\n", "\n", "[57431 rows x 2 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "order_metrics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" } }, "nbformat": 4, "nbformat_minor": 4 }