Purpose and Vision

Purpose

  • Build enterprise-level standards for data lake processing
  • Provide a reusable framework with configurability

Vision

  • Define a standard set of procedures that every dataset must pass through
  • Build reusable and configurable accelerators to expedite the data onboarding process
  • Build a configurable, Spark-based business rule engine that lets users run distributed queries without having to acquire Spark expertise
  • No cluster pre-installation needed, and a minimal resource footprint (a small VM)
  • Dynamic compute to ensure cost optimization, as sketched below
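
The dynamic-compute idea can be illustrated with a minimal Airflow sketch: an ephemeral Spark cluster is provisioned only for the duration of a job and torn down afterwards, so no compute sits idle. The DAG below is an illustrative assumption (Airflow 2.x style) with placeholder helper functions; it is not Pine's actual pipeline code.

    # Hypothetical sketch of the dynamic-compute pattern: provision an ephemeral
    # Spark cluster, run the onboarding job, then delete the cluster.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def create_cluster():
        # Call the cloud provider's API (Dataproc, EMR, HDInsight, ...) here.
        print("provisioning ephemeral Spark cluster")

    def run_onboarding_job():
        # Submit the Spark onboarding job to the cluster created above.
        print("submitting Spark job")

    def delete_cluster():
        # Tear the cluster down regardless of job outcome to avoid idle cost.
        print("deleting ephemeral Spark cluster")

    with DAG(
        dag_id="pine_dynamic_compute_example",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        create = PythonOperator(task_id="create_cluster", python_callable=create_cluster)
        run = PythonOperator(task_id="run_onboarding_job", python_callable=run_onboarding_job)
        delete = PythonOperator(
            task_id="delete_cluster",
            python_callable=delete_cluster,
            trigger_rule="all_done",  # clean up even if the job fails
        )
        create >> run >> delete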

Guiding Principles for Pine

  • Cloud-native solution
  • Use of open-source technologies such as Airflow and Spark
  • Minimal infrastructure footprint
  • Dynamic compute
  • Platform-agnostic solution (any cloud or on-premises)
  • Modular design (plug-and-play model, i.e., add or remove data processing steps as required; see the sketch after this list)
  • Minimal configuration needed on client clusters
  • Configurability and flexibility
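
The modular-design principle referenced above can be sketched as a pipeline whose steps are read from configuration, so a step is added or removed by editing metadata rather than code. The step names and registry below are hypothetical, not Pine's actual module names.

    # Minimal plug-and-play sketch (not Pine's actual API): each processing step
    # is a function registered by name, and the steps applied to a dataset come
    # from configuration.
    STEP_REGISTRY = {
        "profile": lambda df: df,        # stand-ins for the real accelerators
        "standardize": lambda df: df,
        "quality_check": lambda df: df,
    }

    def run_pipeline(df, steps):
        """Apply the configured steps, in order, to the input DataFrame."""
        for name in steps:
            df = STEP_REGISTRY[name](df)
        return df

    # Dataset-level configuration: drop "profile" or add a new step here
    # without touching the pipeline code.
    dataset_config = {"steps": ["profile", "standardize", "quality_check"]}
    # result = run_pipeline(input_df, dataset_config["steps"])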

Features & Benefits of Pine

Features

  • Data cataloguing
  • Data lake categorization
  • Data processing accelerators, including a Data Quality Engine, a Data Standardization Engine, and a Data Profiler (see the sketch after this list)
  • A metadata-driven Business Rule Engine (BRE)
  • Auditing and logging
  • Dynamic Compute
  • Infrastructure as code
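
To illustrate how an accelerator such as the Data Quality Engine can be driven by metadata, the PySpark sketch below applies quality rules supplied as configuration rather than code. The rule format, column names, and input path are assumptions for illustration only, not Pine's actual configuration.

    # Hypothetical configuration-driven data quality check with PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq_example").getOrCreate()

    # Quality rules supplied as metadata rather than code (illustrative format).
    dq_rules = [
        {"column": "customer_id", "check": "not_null"},
        {"column": "customer_id", "check": "unique"},
    ]

    df = spark.read.parquet("/data/raw/customers")  # example input path

    for rule in dq_rules:
        col, check = rule["column"], rule["check"]
        if check == "not_null":
            failed = df.filter(F.col(col).isNull()).count()
        elif check == "unique":
            failed = df.groupBy(col).count().filter(F.col("count") > 1).count()
        else:
            continue
        print(f"{check} on {col}: {failed} failing records")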

Benefits

  • On-board a new dataset in hours, not days
  • Flexibility in choosing the platform and the processing steps
  • Flexibility in configuring the framework
  • With the BRE, it is all about writing business logic as queries (Spark SQL/Hive); see the sketch after this list
  • Little time needed to learn the framework, thanks to the use of open-source technologies
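
As noted in the BRE benefit above, business logic can be expressed purely as a Spark SQL query stored as metadata. The sketch below shows the general idea; the rule structure, table names, and write target are illustrative assumptions, not Pine's shipped configuration.

    # Sketch of the metadata-driven BRE idea: a rule is just a Spark SQL
    # statement, so the user writes business logic as a query and the engine
    # handles distributed execution.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bre_example").getOrCreate()

    # A business rule expressed as Spark SQL, e.g. read from a metadata store.
    rule = {
        "name": "high_value_orders",
        "query": """
            SELECT order_id, customer_id, amount
            FROM orders
            WHERE amount > 10000
        """,
        "target": "curated.high_value_orders",
    }

    # The engine runs the query on the cluster and persists the result.
    result = spark.sql(rule["query"])
    result.write.mode("overwrite").saveAsTable(rule["target"])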

System view of Pine

Pine Framework Architecture

Deployment view
