
Data Ingestion (On-Premise to Cloud) 

Client

For a US-based shopping channel and e-commerce retailer

Business Problem

  • Ingest data for multiple markets from on-premise systems into Azure Cloud and transform it there.

Abzooba’s Solution:

  • Proposed a metadata-driven processing framework built with PySpark.
  • The framework dynamically handles multiple source files coming from various markets by fetching the configuration values from underlying metadata tables.
  • Once the data lands in the ADLS landing zone, it passes through multiple processing layers: rule-based cleansing, deduplication and transformation.
  • The processed data is then loaded into persistent storage in Delta Lake and Azure Synapse target tables.
  • The process is orchestrated with the help of Stonebranch and Azure Data Factory.
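
The metadata-driven idea described above can be illustrated with a minimal sketch. It uses plain Python rows instead of PySpark DataFrames so that it stays self-contained, and every name in it (`FileConfig`, `METADATA`, the column names, the three layer functions) is a hypothetical stand-in rather than part of the actual framework. The point it shows is that adding a market means adding a configuration entry, not new code.

```python
from dataclasses import dataclass, field


# Hypothetical configuration record; in the described framework these values
# would be fetched from the underlying metadata tables.
@dataclass
class FileConfig:
    market: str
    required_columns: list                        # cleansing rule: must be non-empty
    dedup_keys: list                              # columns that identify a duplicate
    renames: dict = field(default_factory=dict)   # source column -> target column


def cleanse(rows, cfg):
    """Rule-based cleansing: drop rows with a missing or empty required column."""
    return [r for r in rows
            if all(r.get(c) not in (None, "") for c in cfg.required_columns)]


def deduplicate(rows, cfg):
    """Deduplication: keep the first row seen for each key combination."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r.get(k) for k in cfg.dedup_keys)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def transform(rows, cfg):
    """Transformation: rename source columns to the target schema."""
    return [{cfg.renames.get(k, k): v for k, v in r.items()} for r in rows]


def process(rows, cfg):
    """Run every layer in order; only the configuration differs per market."""
    for layer in (cleanse, deduplicate, transform):
        rows = layer(rows, cfg)
    return rows


# Illustrative metadata for two markets (assumed values).
METADATA = {
    "us": FileConfig("us", ["order_id"], ["order_id"], {"amt": "amount"}),
    "de": FileConfig("de", ["bestell_id"], ["bestell_id"],
                     {"bestell_id": "order_id", "betrag": "amount"}),
}

us_rows = [
    {"order_id": "1", "amt": 10.0},
    {"order_id": "1", "amt": 10.0},   # duplicate of the first row
    {"order_id": "", "amt": 5.0},     # fails the cleansing rule
]
us_result = process(us_rows, METADATA["us"])
```

In a PySpark version, each layer would presumably operate on a DataFrame instead of a list of dictionaries, but the configuration lookup would follow the same pattern.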

Business Benefits:

  • Minimal configuration required for processing different files from various markets.
  • Independent processes can run in parallel, while dependent ones run in the order defined in the configuration.
  • Stonebranch and ADF enable scheduling and automation of pipeline runs.
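
As a rough illustration of running independent processes in parallel while respecting configured dependencies, the sketch below uses only the Python standard library (`graphlib` requires Python 3.9 or later). In the described solution this scheduling is handled by Stonebranch and Azure Data Factory; the task names and the `run_pipeline` helper here are assumptions made for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_pipeline(dependencies, task_fn, max_workers=4):
    """Run tasks in dependency order, executing independent tasks concurrently.

    `dependencies` maps each task name to the set of tasks it depends on.
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()          # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}              # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for name in sorter.get_ready():
                running[pool.submit(task_fn, name)] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()               # re-raise any task failure
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)             # unblocks dependent tasks
    return finished


# Hypothetical configuration: the two markets are independent of each other,
# while the final load depends on both transformations.
DEPENDENCIES = {
    "transform_us": {"land_us"},
    "transform_de": {"land_de"},
    "load_synapse": {"transform_us", "transform_de"},
}

order = run_pipeline(DEPENDENCIES, lambda name: None)
```

Here `land_us` and `land_de` may run at the same time, whereas `load_synapse` starts only after both transformations have finished.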

Tech Stack
