Overview of Standard Transformations
Let us understand the standard transformations we typically perform on data held in collections.
Filtering
Get all the revenue-generating orders, i.e. only the orders with order_status COMPLETE or CLOSED.
Get the sales details of all the dealers within a state.
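The first filtering example can be sketched with a plain loop. This is a minimal illustration, not a definitive implementation: the sample orders and every field name other than order_status (order_id, amount), as well as the function name filter_orders, are assumptions made up for the example.

```python
# Hypothetical sample data; only order_status comes from the text above.
orders = [
    {"order_id": 1, "order_status": "COMPLETE", "amount": 299.98},
    {"order_id": 2, "order_status": "PENDING_PAYMENT", "amount": 199.99},
    {"order_id": 3, "order_status": "CLOSED", "amount": 129.99},
    {"order_id": 4, "order_status": "CANCELED", "amount": 49.98},
]


def filter_orders(orders, statuses):
    """Return only the orders whose order_status is one of statuses."""
    filtered = []
    for order in orders:
        if order["order_status"] in statuses:
            filtered.append(order)
    return filtered


# Keep only the revenue-generating orders.
revenue_orders = filter_orders(orders, {"COMPLETE", "CLOSED"})
for order in revenue_orders:
    print(order)
```

The same pattern works for the dealer example: loop over the sales records and keep those whose state field matches the requested state.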
Row-level transformations such as standardization, masking, cleansing, etc.
Converting names to upper or lower cases.
Standardizing phone numbers or addresses.
Encrypting phone numbers or Social Security numbers.
Masking phone numbers, Social Security numbers or dates of birth so that only limited information is exposed, such as the last 4 digits of a Social Security number.
Removing unwanted spaces and other special characters.
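A few of these row-level transformations can be sketched as follows. The customer records, field names and helper names (clean_phone, mask_ssn) are hypothetical, and the chosen output formats (digits-only phone numbers, '*' as the masking character) are just one possible convention. Encryption is left out because it would normally rely on a dedicated library.

```python
def clean_phone(phone):
    """Standardize a phone number by keeping only its digits."""
    digits = ""
    for ch in phone:
        if ch.isdigit():
            digits += ch
    return digits


def mask_ssn(ssn):
    """Replace every digit except the last 4 with '*', keeping separators."""
    total_digits = 0
    for ch in ssn:
        if ch.isdigit():
            total_digits += 1
    masked = ""
    seen = 0
    for ch in ssn:
        if ch.isdigit():
            seen += 1
            # Only the last 4 digits stay visible.
            masked += ch if seen > total_digits - 4 else "*"
        else:
            masked += ch
    return masked


# Hypothetical sample records.
customers = [
    {"name": "  john   doe ", "phone": "(555) 010-1234", "ssn": "123-45-6789"},
    {"name": "Jane Roe", "phone": "555.010.9876", "ssn": "987-65-4321"},
]

transformed = []
for customer in customers:
    transformed.append({
        # Collapse repeated spaces, trim the ends, convert to upper case.
        "name": " ".join(customer["name"].split()).upper(),
        "phone": clean_phone(customer["phone"]),
        "ssn": mask_ssn(customer["ssn"]),
    })

for row in transformed:
    print(row)
```

Each input record produces exactly one output record, which is what distinguishes row-level transformations from filtering and aggregation.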
Total Aggregations
Revenue generated by a product
Revenue generated for a given day
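A total aggregation reduces many records to a single value. The sketch below assumes a hypothetical list of order items with order_date, product_id and subtotal fields; none of these names, nor the helper total_revenue, come from the text above.

```python
# Hypothetical sample data.
order_items = [
    {"order_date": "2014-01-01", "product_id": 10, "subtotal": 100.0},
    {"order_date": "2014-01-01", "product_id": 20, "subtotal": 50.0},
    {"order_date": "2014-01-02", "product_id": 10, "subtotal": 25.5},
]


def total_revenue(items, field, value):
    """Add up subtotal for the items whose field equals value."""
    total = 0.0
    for item in items:
        if item[field] == value:
            total += item["subtotal"]
    return total


# Revenue generated by one product.
product_revenue = total_revenue(order_items, "product_id", 10)
# Revenue generated for one day.
daily_revenue = total_revenue(order_items, "order_date", "2014-01-01")

print(product_revenue)
print(daily_revenue)
```

Note that a total aggregation is usually a filter followed by an accumulation: the if selects the relevant records and the running total collapses them into one number.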
Grouped Aggregations
Revenue generated by each product
Revenue generated on daily basis
Revenue generated by each car dealer
Revenue generated by each retail store
Sales Commission for each channel partner
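A grouped aggregation produces one value per key instead of a single value. A dict is a natural way to hold the running total for each key. As before, the sample data, field names and the helper revenue_by are assumptions for illustration only.

```python
# Hypothetical sample data.
order_items = [
    {"order_date": "2014-01-01", "product_id": 10, "subtotal": 100.0},
    {"order_date": "2014-01-01", "product_id": 20, "subtotal": 50.0},
    {"order_date": "2014-01-02", "product_id": 10, "subtotal": 25.5},
]


def revenue_by(items, key):
    """Return a dict mapping each distinct value of key to its total subtotal."""
    totals = {}
    for item in items:
        group = item[key]
        if group in totals:
            totals[group] += item["subtotal"]
        else:
            totals[group] = item["subtotal"]
    return totals


revenue_per_product = revenue_by(order_items, "product_id")
revenue_per_day = revenue_by(order_items, "order_date")

print(revenue_per_product)
print(revenue_per_day)
```

Changing the key is all it takes to group by dealer, store or channel partner, provided the records carry the corresponding field.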
Sorting and Ranking
Top 5 stores by revenue
Top 5 car dealers by sales
Top 10 sales agents by commission
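Ranking usually starts from the result of a grouped aggregation. The sketch below picks the top N entries by repeatedly selecting the largest remaining value, which keeps the logic visible as plain loops. The store names, revenue figures and the helper top_n are hypothetical.

```python
# Hypothetical revenue per store, e.g. the output of a grouped aggregation.
store_revenue = {
    "Store A": 1200.0,
    "Store B": 4500.0,
    "Store C": 3100.0,
    "Store D": 800.0,
    "Store E": 2700.0,
    "Store F": 3900.0,
}


def top_n(totals, n):
    """Return up to n (key, value) pairs with the largest values, in descending order."""
    remaining = dict(totals)  # work on a copy so the input is not modified
    ranked = []
    while remaining and len(ranked) < n:
        best_key = None
        for key in remaining:
            if best_key is None or remaining[key] > remaining[best_key]:
                best_key = key
        ranked.append((best_key, remaining.pop(best_key)))
    return ranked


top_stores = top_n(store_revenue, 3)
for rank, (store, revenue) in enumerate(top_stores, start=1):
    print(rank, store, revenue)
```

For this data, the built-in call sorted(store_revenue.items(), key=lambda kv: kv[1], reverse=True)[:3] would give the same ranking. The loop version is shown only to make the selection step explicit.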
Typically, we use external libraries such as Pandas or PySpark to perform these standard transformations. However, we will implement them using conventional loops, both to understand how they work under the hood and to get better at programming.