Olive – Product Information for Azure Marketplace

Getting Started

ODIF Product Overview

Olive Data Ingestion Framework (ODIF) is an ingestion tool that connects to configurable sources and sinks to accelerate data ingestion. It is built with a cloud-agnostic approach, requires no pre-installed cluster, and can be deployed with a minimal resource footprint. Enterprises do not have to worry about setting up a Hadoop cluster; with elastic compute, it is easy to size the compute power based on the source data size. ODIF provides a user-friendly web interface that helps users with data source registration, job configuration, job runs, and monitoring.

Key Features

  • One homogeneous codebase for all sources and sinks.
  • No pre-installation of a Hadoop cluster.
  • Minimal resource footprint.
  • Cloud-agnostic ingestion engine.
  • Compute configuration decided based on the ingestion data size.

Launch ODIF

Deploy ODIF

ODIF deployment on Azure requires a Resource Group, a VNET with subnets, an ODIF virtual machine, and an Azure Databricks workspace. The steps below walk through setting up Olive on Azure.

Create Resource group

  • Search for Resource groups on the Azure home page and select it.
  • Click on the Add button to create a resource group.
  • Under the resource group service, apply the following settings on each tab:
    1. Basics:
      1. Select the appropriate subscription.
      2. Give an appropriate name to the Resource Group.
      3. Select the Region in which to create the Resource Group.
    2. Tags: Tags are optional; users can add them according to their organization's standards.
  • Click on the Review + Create button, then click on the Create button to create the Resource Group.
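If you prefer to script this step rather than use the portal, the resource group can also be created with the Azure SDK for Python. This is a minimal sketch; the resource group name, region, and tags are placeholders, and the subscription ID must be your own:

    # Minimal sketch: create the resource group with the Azure SDK for Python.
    # Requires: pip install azure-identity azure-mgmt-resource
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    subscription_id = "<your-subscription-id>"           # your Azure subscription
    credential = DefaultAzureCredential()                # uses az login / environment credentials

    resource_client = ResourceManagementClient(credential, subscription_id)
    resource_client.resource_groups.create_or_update(
        "odif-rg",                                       # placeholder resource group name
        {"location": "eastus",                           # choose your region
         "tags": {"project": "odif"}},                   # optional tags
    )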

Create Virtual Network and Subnets

To launch ODIF, you need to create a VNET that holds three subnets:

  1. A subnet to hold the ODIF VM.
  2. A public subnet for Databricks.
  3. A private subnet for Databricks.

Steps to Create VNET and Subnets

  • Search for Virtual Network on the Azure home page and select it.
  • Click on the Add button to create a virtual network.
  • Under the virtual network service, apply the following settings:
    1. Basics:
      • Select the appropriate subscription.
      • Select the Resource Group created in the previous section.
      • Give an appropriate name to the virtual network.
      • Select the same region in which the Resource Group was created.
    2. IP Addresses tab: create three subnets under the main VNET:
      • Create a subnet for the ODIF VM (rename the default subnet to ODIF_VM_SUBNET).
      • Create a public subnet for the Databricks workspace (ODIF-DB-PUBLIC).
      • Create a private subnet for the Databricks workspace (ODIF-DB-PRIVATE).
    3. Security: Security settings can be kept at their defaults.
    4. Tags: Tags are optional; users can add them according to their organization's standards.
    5. Click on the Review + Create button, then click on the Create button to create the Virtual Network.
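The VNET and its subnets can also be created programmatically. Below is a minimal sketch with the Azure SDK for Python, assuming the resource group from the previous section; the VNET name and address prefixes are illustrative only:

    # Minimal sketch: create the VNET with the three subnets used by ODIF and Databricks.
    # Requires: pip install azure-identity azure-mgmt-network
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    subscription_id = "<your-subscription-id>"
    credential = DefaultAzureCredential()
    network_client = NetworkManagementClient(credential, subscription_id)

    poller = network_client.virtual_networks.begin_create_or_update(
        "odif-rg",          # resource group created above
        "odif-vnet",        # placeholder VNET name
        {
            "location": "eastus",
            "address_space": {"address_prefixes": ["10.0.0.0/16"]},
            "subnets": [
                {"name": "ODIF_VM_SUBNET", "address_prefix": "10.0.0.0/24"},   # ODIF VM
                {"name": "ODIF-DB-PUBLIC", "address_prefix": "10.0.1.0/24"},   # Databricks public
                {"name": "ODIF-DB-PRIVATE", "address_prefix": "10.0.2.0/24"},  # Databricks private
            ],
        },
    )
    vnet = poller.result()
    print("VNET created:", vnet.id)

Note that when the Databricks workspace is later injected into this VNET, Azure requires the public and private subnets to be delegated to Microsoft.Databricks/workspaces and to have a network security group attached; the portal configures this automatically during workspace creation.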

Create and Configure ODIF VM

  • Search for Virtual Machine on the Azure home page and select it.
  • Click on the Add button to create a virtual machine.
  • Under the virtual machine service, apply the following settings:
    1. Basics:
      • Select the appropriate subscription.
      • Select the resource group created in the earlier section.
      • Give an appropriate name to the VM.
      • Select the same region as used for the Resource Group and VNET.
      • Select the appropriate availability zone.
      • Select the “Olive_Plan2021_Per_Hour-Gen1” image from the marketplace.
      • Select the recommended instance type (D4_v3 – 4 vCPUs, 16 GiB memory).
      • Configure other parameters as per your requirements.
    2. Disks: Disk options can be kept at their defaults.
    3. Networking:
      • Select the virtual network configured in the earlier section.
      • Select the subnet created for the ODIF VM (ODIF_VM_SUBNET).
      • Select the “Advanced” option for NIC network security group.
      • Select the “Create new” option for “Configure network security group”, as shown in the image below.
      • Add an inbound security rule for your public IP so that the application can be accessed.
      • Configure the settings as shown in the images below.
    4. Management: Management options can be kept at their defaults.
    5. Advanced: Advanced options can be kept at their defaults.
    6. Tags: Tags are optional; users can add them according to their organization's standards.
    7. Click on the Review + Create button, then click on the Create button to create the virtual machine.
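If you script the network security group instead of creating it in the portal, the inbound rule for your public IP can look like the sketch below. Port 8081 is the ODIF application port described later in this guide; the NSG name, rule name, and priority are placeholders:

    # Sketch: network security group with an inbound rule allowing your public IP
    # to reach the ODIF web UI on port 8081.
    # Requires: pip install azure-identity azure-mgmt-network
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    subscription_id = "<your-subscription-id>"
    credential = DefaultAzureCredential()
    network_client = NetworkManagementClient(credential, subscription_id)

    nsg = network_client.network_security_groups.begin_create_or_update(
        "odif-rg",
        "odif-vm-nsg",                                   # placeholder NSG name
        {
            "location": "eastus",
            "security_rules": [
                {
                    "name": "allow-odif-ui",             # placeholder rule name
                    "priority": 100,
                    "direction": "Inbound",
                    "access": "Allow",
                    "protocol": "Tcp",
                    "source_address_prefix": "<your-public-ip>/32",
                    "source_port_range": "*",
                    "destination_address_prefix": "*",
                    "destination_port_range": "8081",    # ODIF application port
                }
            ],
        },
    ).result()
    print("NSG created:", nsg.id)

Attach this NSG to the VM's network interface, or select it under the Advanced NIC network security group option during VM creation.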

Set up Databricks workspace

An Azure Databricks workspace provides access to computational resources such as clusters and jobs.

The steps to set up a Databricks workspace are as follows:

  • Search for Azure Databricks on the Azure home page and select it.
  • Click on the Add button to create an Azure Databricks workspace.
  • Under the Azure Databricks service, apply the following settings:
    1. Basics:
      • Select the appropriate subscription and the Resource Group created earlier.
      • Provide a workspace name and select the same region where the Resource Group is located.
    2. Networking: To deploy the workspace into the VNET created earlier (VNet injection), choose to deploy the workspace in your own virtual network, select that VNET, and supply the public subnet (ODIF-DB-PUBLIC) and private subnet (ODIF-DB-PRIVATE).
    3. Tags: Tags are optional; users can add them according to their organization's standards.
    4. Click on the Review + Create button, then click on the Create button to deploy the workspace.
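The workspace can also be deployed with the azure-mgmt-databricks package. This is a rough sketch assuming VNet injection into the subnets created earlier; the workspace and managed resource group names are placeholders, and for this call to succeed the public and private subnets must already be delegated to Microsoft.Databricks/workspaces and have an NSG attached (the portal does this for you). Verify the exact parameter names against the SDK version you install.

    # Rough sketch: deploy an Azure Databricks workspace into the existing VNET.
    # Requires: pip install azure-identity azure-mgmt-databricks
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.databricks import AzureDatabricksManagementClient

    subscription_id = "<your-subscription-id>"
    credential = DefaultAzureCredential()
    databricks_client = AzureDatabricksManagementClient(credential, subscription_id)

    vnet_id = (f"/subscriptions/{subscription_id}/resourceGroups/odif-rg"
               "/providers/Microsoft.Network/virtualNetworks/odif-vnet")

    poller = databricks_client.workspaces.begin_create_or_update(
        "odif-rg",                  # resource group created earlier
        "odif-workspace",           # placeholder workspace name
        {
            "location": "eastus",
            "sku": {"name": "premium"},
            # Azure creates a separate managed resource group for the workspace.
            "managed_resource_group_id": (f"/subscriptions/{subscription_id}"
                                          "/resourceGroups/odif-databricks-managed-rg"),
            # VNet injection into the subnets created in the earlier section.
            "parameters": {
                "custom_virtual_network_id": {"value": vnet_id},
                "custom_public_subnet_name": {"value": "ODIF-DB-PUBLIC"},
                "custom_private_subnet_name": {"value": "ODIF-DB-PRIVATE"},
            },
        },
    )
    workspace = poller.result()
    print("Workspace URL:", workspace.workspace_url)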

Configure ODIF Application

  • Login to ODIF
    Once the application setup completes, access it at http://<VM-IP>:8081. The application runs on port 8081, and VM-IP refers to the IP address reserved for the virtual machine.
    Login with credentials: Username: odif, Password: odif@123
  • Login for RabbitMQ: http://<VM-IP>:15672

Screen Details:

Compute Details:

The odif user is the admin user, which is used to set up the basic configuration. After logging in as admin, Compute Details is the first screen. ODIF launches resources dynamically based on the input source size, so a few details are needed to set up dynamic compute. Select the compute type (Azure default).

The fields on this screen and where to obtain them:

  • Client id: Can be obtained from the Admin.
  • Tenant id: Can be obtained from the Admin.
  • Client secret: Can be obtained from the Admin.
  • Databricks URL: From the Databricks Overview page (see below).
  • Workspace name: From the Databricks Overview page (see below).
  • Subscription id: From the Databricks Overview page (see below).
  • Databricks token: Generate a user token from the Databricks workspace (see below).
  • Workspace resource group: From the Databricks Overview page (see below).
Databricks URL, Workspace Name, Subscription ID, and Workspace Resource Group Details:

To get the Databricks URL, workspace name, Subscription ID, and Workspace resource group, visit the Overview page of the Databricks workspace created earlier.
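The Client id, Tenant id, and Client secret typically correspond to an Azure AD service principal (app registration). If you want to sanity-check those three values before entering them in the Compute Details screen, the small sketch below uses the azure-identity package; the placeholders are your own values, not ODIF defaults:

    # Quick sanity check of the service-principal values before entering them in ODIF.
    # Requires: pip install azure-identity
    from azure.identity import ClientSecretCredential

    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",          # "Tenant id" field
        client_id="<client-id>",          # "Client id" field
        client_secret="<client-secret>",  # "Client secret" field
    )

    # Requesting a management-plane token succeeds only if the three values are valid.
    token = credential.get_token("https://management.azure.com/.default")
    print("Token acquired, expires at:", token.expires_on)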

Generate User Token

To get a Databricks token, click on Launch Workspace. After the workspace launches, click on the workspace name dropdown at the top right, as shown. Select User Settings.

Click on Generate Token, give an appropriate comment, and click on Generate. Make sure to copy the token before closing the popup.
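As an alternative to the UI, a personal access token can also be generated through the Databricks REST API (POST /api/2.0/token/create), provided you already have a token or Azure AD token to authenticate with. This is only a hedged sketch; the comment and lifetime are examples:

    # Optional: generate a Databricks personal access token via the REST API.
    # Requires: pip install requests
    import requests

    databricks_url = "https://<workspace-url>"              # from the Databricks Overview page
    existing_token = "<existing-databricks-or-aad-token>"   # used only to authenticate this call

    resp = requests.post(
        f"{databricks_url}/api/2.0/token/create",
        headers={"Authorization": f"Bearer {existing_token}"},
        json={"comment": "ODIF ingestion", "lifetime_seconds": 7776000},  # ~90 days
    )
    resp.raise_for_status()
    print(resp.json()["token_value"])  # copy this value into the Databricks token field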

Connectors:

The Connectors screen is used to create and modify connectors. Once a connector is created, it can be used as a source or a sink.

Once the details are filled in, connectivity with the source/sink is validated; after successful validation, the connector can be submitted.

Job Configuration:

This screen helps configure a job, where a link between a source and a sink is set up.

  1. The user can give an appropriate job name and select the required source and sink connectors.
  2. In the case of a MySQL connector, database and table information can be provided. Multiple tables from one database can be loaded in a single job.
  3. A user-specific query can also be provided in the Query section.
  4. The user needs to provide the required fields for the sink location.
    This screen submits the job that transfers data from the source to the sink. Keep the default compute option; the other option is for an existing static on-premises Hadoop cluster.

Job Run:

  • The user can submit the job directly, which results in the job executing only once.
  • A scheduled job run option is also provided, where the user is expected to submit a cron frequency (see the example below).
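Assuming standard five-field cron syntax (confirm the exact format your ODIF installation expects), a frequency such as 0 2 * * * runs the job daily at 02:00. The optional snippet below uses the third-party croniter package to preview the next run times before submitting the schedule:

    # Optional: preview a cron frequency locally before submitting a scheduled job.
    # Requires: pip install croniter
    from datetime import datetime
    from croniter import croniter

    frequency = "0 2 * * *"                  # example: every day at 02:00
    it = croniter(frequency, datetime.now())
    print("Next three runs:", [it.get_next(datetime) for _ in range(3)])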

Job Run Log:

The Job Run Log screen shows the status of each job (Succeeded / Failed / In Progress). To see more details of a job, select the job and click on the corresponding button.

User Screen:

An option for creating different users with different roles (admin, developer) is provided. A user with the developer role can only see:

  1. Connectors
  2. Job Configuration
  3. Job Run
  4. Job Run Log

The list of users appears with email ID, role, and status. The User screen is only visible to the admin.

The admin can activate or deactivate users. A user with inactive status cannot log in.
