For a US-based shopping channel and e-commerce retailer
Business Problem:
Ingest and transform data from various markets, moving it from on-premises systems to the Azure cloud.
Abzooba’s Solution:
Proposed a metadata-driven processing framework developed in PySpark.
The framework dynamically handles multiple source files from various markets by fetching configuration values from the underlying metadata tables.
Once the data lands in the ADLS landing zone, it passes through multiple processing layers, including rule-based cleansing, deduplication, and transformation.
The processed data is then loaded into persistent storage in Delta Lake and Azure Synapse target tables.
The process is orchestrated with Stonebranch and Azure Data Factory (ADF).
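
The metadata-driven flow described above can be illustrated with a minimal sketch. It is not the actual framework: plain Python lists of dictionaries stand in for PySpark DataFrames so the example runs without a cluster, and every name in it (`FileConfig`, `METADATA`, `process`, the column names and the specific rules) is a hypothetical placeholder for whatever the real metadata tables contain.

```python
from dataclasses import dataclass

Row = dict[str, str]


@dataclass(frozen=True)
class FileConfig:
    """One hypothetical metadata-table row describing a market's source file."""
    market: str
    required_columns: tuple[str, ...]  # cleansing rule: these must be non-empty
    dedup_keys: tuple[str, ...]        # columns that identify a duplicate row
    rename: dict[str, str]             # transformation: source -> target names


# Assumed stand-in for the metadata tables, keyed by market.
METADATA = {
    "US": FileConfig("US", ("order_id", "amount"), ("order_id",),
                     {"amount": "amount_usd"}),
    "UK": FileConfig("UK", ("order_id", "amount"), ("order_id",),
                     {"amount": "amount_gbp"}),
}


def cleanse(rows: list[Row], cfg: FileConfig) -> list[Row]:
    """Trim whitespace and drop rows with an empty required column."""
    cleaned = []
    for row in rows:
        stripped = {key: value.strip() for key, value in row.items()}
        if all(stripped.get(col) for col in cfg.required_columns):
            cleaned.append(stripped)
    return cleaned


def deduplicate(rows: list[Row], cfg: FileConfig) -> list[Row]:
    """Keep the first row seen for each combination of dedup keys."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in cfg.dedup_keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def transform(rows: list[Row], cfg: FileConfig) -> list[Row]:
    """Rename columns according to the market's configuration."""
    return [{cfg.rename.get(key, key): value for key, value in row.items()}
            for row in rows]


# The layers run in a fixed order; only the configuration varies by market.
LAYERS = (cleanse, deduplicate, transform)


def process(market: str, rows: list[Row]) -> list[Row]:
    """Look up the market's configuration and apply each layer in turn."""
    cfg = METADATA[market]
    for layer in LAYERS:
        rows = layer(rows, cfg)
    return rows
```

The point of the design is that supporting a new market means adding a metadata entry rather than writing new processing code. For example, `process("US", rows)` and `process("UK", rows)` share every function and differ only in the configuration row that is fetched.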
Business Benefits:
Minimal configuration is required to process different files from various markets.
Independent processes can run in parallel, while dependent ones run in the order defined by the configuration.
Stonebranch and ADF enable scheduling and automation of pipeline runs.
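
The idea of running independent processes in parallel while respecting configured dependencies can be sketched as follows. In the described solution this ordering is handled by Stonebranch and ADF; the sketch below only illustrates the principle with Python's standard library, and the process names and the `DEPENDENCIES` mapping are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical dependency configuration: each process maps to the set of
# processes that must finish before it can start.
DEPENDENCIES = {
    "cleanse_us": set(),
    "cleanse_uk": set(),
    "dedup_us": {"cleanse_us"},
    "dedup_uk": {"cleanse_uk"},
    "load_synapse": {"dedup_us", "dedup_uk"},
}


def run_in_waves(dependencies, task):
    """Run `task(name)` for every process, one wave at a time.

    All processes whose prerequisites are complete form a wave and run
    concurrently; the next wave starts only after the current one finishes.
    Returns the list of waves in execution order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            list(pool.map(task, ready))  # blocks until the whole wave is done
            sorter.done(*ready)
            waves.append(ready)
    return waves
```

Calling `run_in_waves(DEPENDENCIES, some_function)` would run the two cleansing steps together, then the two deduplication steps, then the load. Waiting for a full wave keeps the sketch short; a production orchestrator would typically start each process as soon as its own prerequisites finish.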