Olive – Product Information

Getting Started

ODIF Product Overview

Olive Data Ingestion Framework (ODIF) is an ingestion tool that connects to configurable sources and sinks to accelerate data ingestion. It is built with a cloud-agnostic approach, requires no pre-installed cluster, and can be deployed with a minimal resource footprint. Enterprises do not have to set up a Hadoop cluster: with elastic compute, ODIF sizes the compute power based on the size of the source data. Its web interface helps users register data sources, configure jobs, run jobs, and monitor them.

Key Features

  • One homogeneous code base for all sources and sinks.
  • No pre-installation of a Hadoop cluster.
  • Minimal resource footprint.
  • Cloud-agnostic ingestion engine.
  • Compute configuration decided based on ingestion data size.

Launch ODIF

Launching on AWS

ODIF is distributed as an AMI for AWS. This page describes how to launch and set up the application from the AMI.

  1. Locate ODIF on AWS Marketplace.
  2. Choose an instance type from the supported instance types. t2.xlarge is preferred; the minimum is 4 vCPUs and 16 GiB of memory.
  3. Configure instance details.
    Points to consider:

    1. VPC and Subnet:
      1. VPC and subnet can be selected as required or left as default.
    2. Auto-assign Public IP:
      1. Set it to Disable; a selected Elastic IP (EIP) can be associated later.
    3. IAM Role:
      1. Create a new IAM role for the EC2 instance with the required access policies. Refer to IAM Role under Additional Information for creating the role.

    The remaining options can be left as default or selected as required.

  4. Add Storage
    Add EBS storage as required (minimum 32 GB). EBS encryption uses the default aws/ebs key.
  5. Add Tags
    Tags can be added according to organizational or individual policies.
  6. Security Group
    Create a new security group with inbound rules configured for ODIF. ODIF needs to communicate between EMR and EC2, so at a minimum, inbound rules for the user IP and the EMR security group must be added to the EC2 security group.

    Outbound rules can be left as default.
    For security group creation, refer to Security Group under Additional Information.
    Note: The EMR security group must be added here. You can create it in another browser window, or first launch the ODIF EC2 instance, then create the EMR security group and add it to the ODIF EC2 security group as an inbound rule.
  7. EC2 Key Pair
    Generate a new key pair for the EC2 instance and download it to access the instance.
  8. Review and launch the instance.
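
The minimums in the launch steps above (4 vCPU, 16 GiB, 32 GB EBS, public IP auto-assign disabled) can be checked before launching. The following is a minimal, hypothetical pre-launch check: `LaunchPlan` and `check_launch_plan` are illustrative names, not part of ODIF or any AWS SDK, and the instance-type table is a small hand-written subset rather than a lookup against the EC2 API.

```python
from dataclasses import dataclass

# vCPU count and memory (GiB) for a few EC2 instance types; illustrative subset only.
INSTANCE_SPECS = {
    "t2.large": (2, 8),
    "t2.xlarge": (4, 16),
    "t2.2xlarge": (8, 32),
    "m5.xlarge": (4, 16),
}

# Minimums stated in the launch steps.
MIN_VCPU = 4
MIN_MEMORY_GIB = 16
MIN_EBS_GB = 32


@dataclass
class LaunchPlan:
    instance_type: str
    ebs_size_gb: int
    auto_assign_public_ip: bool


def check_launch_plan(plan: LaunchPlan) -> list[str]:
    """Return a list of problems; an empty list means the plan meets the documented minimums."""
    problems = []
    specs = INSTANCE_SPECS.get(plan.instance_type)
    if specs is None:
        problems.append(f"unknown instance type: {plan.instance_type}")
    else:
        vcpu, memory_gib = specs
        if vcpu < MIN_VCPU or memory_gib < MIN_MEMORY_GIB:
            problems.append(
                f"{plan.instance_type} has {vcpu} vCPU / {memory_gib} GiB; "
                f"need at least {MIN_VCPU} vCPU / {MIN_MEMORY_GIB} GiB"
            )
    if plan.ebs_size_gb < MIN_EBS_GB:
        problems.append(f"EBS volume is {plan.ebs_size_gb} GB; need at least {MIN_EBS_GB} GB")
    if plan.auto_assign_public_ip:
        # The launch steps disable auto-assign and use an EIP instead.
        problems.append("disable auto-assign public IP and associate an EIP instead")
    return problems


print(check_launch_plan(LaunchPlan("t2.xlarge", 32, False)))  # []
```

A plan such as `LaunchPlan("t2.large", 20, True)` would report three problems: undersized instance, undersized volume, and auto-assigned public IP.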

Additional Information

IAM Role

IAM roles manage credentials for the applications that run on the EC2 instance. Three main IAM roles are required; the process for creating each is described below:

  1. IAM for EC2
    1. From the Create Role screen, select EC2 and proceed to attach policies to the new role.
    2. Attach Policy
      1. AmazonElasticMapReduceFullAccess: to access EMR, S3, and EC2.
      2. AssumeRole: to manage the dynamic launch of EMR clusters.
    3. Add tags as per organizational or individual policy.
    4. Review and create the role. This role is used to launch the EC2 instance from the AMI.
  2. EMR Default Role
    It allows EMR to call the EC2 service on your behalf. If EMR_DefaultRole already exists, use it; otherwise, create a new one.

    1. From the Create Role screen, select EMR and proceed to attach policies to the new role.
    2. The default policy is attached automatically.
    3. Add tags as per organizational or individual policy.
    4. Review and create the role as EMR_DefaultRole.
  3. EMR_EC2_DefaultRole
    It is the role assumed by the EC2 instances of the EMR cluster, allowing applications on the cluster nodes to call other AWS services on your behalf. If EMR_EC2_DefaultRole already exists, use it; otherwise, create a new one.

    1. From the Create Role screen, select EMR and proceed to attach policies to the new role.
    2. The default policy is attached automatically.
    3. Add tags as per organizational or individual policy.
    4. Review and create the role as EMR_EC2_DefaultRole.
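
Each role above pairs a trust policy (which AWS service may assume the role) with attached permission policies. The sketch below builds the trust policy documents, assuming the standard service principals `ec2.amazonaws.com` and `elasticmapreduce.amazonaws.com`. `ODIF-EC2-Role` is a hypothetical role name, and the managed policy names for the two EMR roles are the AWS defaults assumed here; confirm the current names in the IAM console, since AWS has published newer versions of some EMR managed policies.

```python
import json


def trust_policy(service_principal: str) -> dict:
    """IAM trust policy that lets the given AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Role name -> (trusted service principal, AWS managed policies to attach).
# "ODIF-EC2-Role" is hypothetical; the custom AssumeRole policy is attached separately.
ROLES = {
    "ODIF-EC2-Role": ("ec2.amazonaws.com", ["AmazonElasticMapReduceFullAccess"]),
    "EMR_DefaultRole": ("elasticmapreduce.amazonaws.com", ["AmazonElasticMapReduceRole"]),
    "EMR_EC2_DefaultRole": ("ec2.amazonaws.com", ["AmazonElasticMapReduceforEC2Role"]),
}

for name, (principal, policies) in ROLES.items():
    print(f"{name} (attach: {', '.join(policies)})")
    print(json.dumps(trust_policy(principal), indent=2))
```

Note that both the ODIF EC2 role and EMR_EC2_DefaultRole trust the EC2 service; they differ only in which permissions they carry.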

Security Group

Security groups control incoming and outgoing traffic. ODIF requires the following security groups:

  1. EC2 Instance Security Group
    1. Inbound
      1. All TCP from the user/organization subnet IP address.
      2. All TCP from the EMR security group.
      3. SSH from the user IP.

  2. EMR Security Group
    1. Inbound
      1. All TCP from the EC2 security group.
      2. All TCP from the EMR master/slave security group.
    2. If EMR uses different security groups for master and slave nodes, add the EC2 security group as an inbound source in both the master and the slave security groups. The example below uses a single security group for both master and slave.
      Note: A single security group for both master and slave is recommended.
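
The inbound rules above can be modeled as plain data to check that EMR nodes will accept traffic from the ODIF EC2 instance. This is a hypothetical model, not an AWS API: `InboundRule`, the helper functions, and the `sg-...` IDs are illustrative, and "All TCP" is represented as ports 0 to 65535.

```python
from dataclasses import dataclass

ALL_TCP = (0, 65535)
SSH = (22, 22)


@dataclass(frozen=True)
class InboundRule:
    protocol: str
    ports: tuple  # (from_port, to_port)
    source: str   # CIDR block such as "198.51.100.7/32", or a security group ID


def ec2_sg_rules(org_cidr: str, user_cidr: str, emr_sg_id: str) -> set:
    """Inbound rules for the ODIF EC2 instance security group."""
    return {
        InboundRule("tcp", ALL_TCP, org_cidr),   # All TCP from user/org subnet
        InboundRule("tcp", ALL_TCP, emr_sg_id),  # All TCP from EMR SG
        InboundRule("tcp", SSH, user_cidr),      # SSH from user IP
    }


def emr_sg_rules(ec2_sg_id: str, emr_sg_id: str) -> set:
    """Inbound rules for a shared master/slave EMR security group."""
    return {
        InboundRule("tcp", ALL_TCP, ec2_sg_id),  # All TCP from EC2 SG
        InboundRule("tcp", ALL_TCP, emr_sg_id),  # All TCP from the EMR SG itself
    }


def emr_reachable_from_ec2(ec2_sg_id: str, master_rules: set, slave_rules: set) -> bool:
    """True only if both the master and slave SGs allow all TCP from the EC2 SG."""
    needed = InboundRule("tcp", ALL_TCP, ec2_sg_id)
    return needed in master_rules and needed in slave_rules


EC2_SG, EMR_SG = "sg-0ec2example", "sg-0emrexample"  # hypothetical IDs
shared = emr_sg_rules(EC2_SG, EMR_SG)
print(emr_reachable_from_ec2(EC2_SG, shared, shared))  # True
```

With separate master and slave groups, the check fails as soon as either group lacks the EC2 rule, which is why a single shared group is simpler to keep correct.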

EIP Allocation

An Elastic IP address (EIP) is a static public IPv4 address, reachable from the internet, that can be associated with an instance whose IP would otherwise change. As a best practice, map this EIP in /etc/hosts for easy access to the machine.

  1. Select Elastic IPs from the EC2 console and click Allocate Elastic IP address.
  2. Select Amazon's pool of IPv4 addresses (the default option) and click Allocate.
  3. Select the allocated IP and, from Actions, select Associate Elastic IP address.
  4. For the resource, select Instance and choose the launched instance to associate the EIP with.
  5. Click Associate to associate the EIP with the instance.
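
The /etc/hosts mapping mentioned above can be kept current with a small helper. This sketch only edits text (writing /etc/hosts itself usually needs root); `upsert_hosts_entry` and the hostname `odif` are hypothetical names chosen for illustration.

```python
import ipaddress


def upsert_hosts_entry(hosts_text: str, eip: str, hostname: str) -> str:
    """Return hosts_text with exactly one line mapping hostname to eip."""
    ipaddress.IPv4Address(eip)  # raises ValueError if eip is not a valid IPv4 address
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if len(fields) >= 2 and hostname in fields[1:]:
            continue  # drop any earlier line mapping this hostname (aliases on it too)
        kept.append(line)
    kept.append(f"{eip}\t{hostname}")
    return "\n".join(kept) + "\n"


current = "127.0.0.1\tlocalhost\n"
print(upsert_hosts_entry(current, "54.210.1.10", "odif"), end="")
```

Replacing any earlier line for the same hostname means the helper can be rerun safely if a different EIP is associated later.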

Assume Role

The AssumeRole policy grants access to AWS resources through temporary credentials issued by the AWS Security Token Service (STS). This policy must be attached to the EC2 role so that ODIF can launch EMR clusters dynamically. Follow these steps to create the policy and attach it to the role:

  1. Click Create policy.
  2. Select STS as the service.
  3. Select AssumeRole under Actions.
  4. Select Specific resources and add the role ARN (the role created here).
  5. Provide a name for the policy and create it.
  6. Attach this policy to the role created for the EC2 instance.
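
The policy produced by the steps above is a standard IAM identity policy allowing `sts:AssumeRole` on one role. A minimal sketch that renders it as JSON, assuming the standard `aws` partition; the account ID `123456789012` and role name `ExampleRole` are placeholders, not values from this guide.

```python
import json
import re

# Standard "aws" partition only; role names may include a path such as role/service-role/Name.
ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def assume_role_policy(role_arn: str) -> str:
    """Identity policy allowing the holder to call sts:AssumeRole on one specific role."""
    if not ROLE_ARN.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }
    return json.dumps(policy, indent=2)


print(assume_role_policy("arn:aws:iam::123456789012:role/ExampleRole"))  # placeholder ARN
```

Restricting `Resource` to a specific role ARN, as in step 4, avoids granting the EC2 role permission to assume arbitrary roles.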

Setup ODIF

Login to ODIF

Once the application is set up, open it at http://<EIP>:8081. The application runs on port 8081, and <EIP> is the Elastic IP address reserved for the instance.

Log in with credentials:

The RabbitMQ management console is available at http://<EIP>:15672.
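
If either URL does not load, a quick TCP check tells whether the ports are reachable at all (for example, blocked by the security group). This is a hypothetical troubleshooting helper built on the standard `socket` module, not an ODIF tool; it checks only that a port accepts connections, not that the web application is healthy.

```python
import socket

# Ports from the login section: ODIF web UI and RabbitMQ management console.
ODIF_PORTS = {"ODIF web UI": 8081, "RabbitMQ management": 15672}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_odif(eip: str, timeout: float = 3.0) -> dict[str, bool]:
    """Map each ODIF URL to whether its port accepts TCP connections."""
    return {
        f"{name}: http://{eip}:{port}": port_open(eip, port, timeout)
        for name, port in ODIF_PORTS.items()
    }
```

For example, `check_odif("<EIP>")` run from the user's machine should report `True` for both entries when the EC2 security group allows the user IP.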

ODIF Screen Details

Compute Details

The ODIF user is the admin user used to set up the basic configuration. After you log in as admin, Compute Details is the first screen. ODIF launches resources dynamically based on the input source size, so some details must be provided to set up dynamic compute.

Select the compute type (AWS by default).

| Field | Description |
|---|---|
| Compute Type | Default: AWS. |
| EC2 Role | IAM role for EC2. |
| EMR Role | EMR default role (EMR_DefaultRole). |
| EC2 Key Pair | Key pair created at the time of instance launch. |
| AWS Region | Region where compute resources will be launched. |
| S3 Bucket Name | Existing S3 bucket to hold ODIF assets and EMR logs. |
| EC2 Instance Profile | Instance profile ARN for EMR_EC2_DefaultRole. Note: This must be the instance profile ARN, not the role ARN. |
| EMR Slave Security Group | Security group for EMR slave nodes. |
| EMR Master Security Group | Security group for EMR master nodes. |
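
The most common mistake on this screen is pasting a role ARN where the instance profile ARN is expected. The following sketch validates the fields before submission, assuming the security group fields take `sg-...` IDs; `ComputeDetails` and `validate` are hypothetical names mirroring the table, and the region and bucket checks are rough shape checks only.

```python
import re
from dataclasses import dataclass, replace

ACCOUNT = r"\d{12}"
ROLE_ARN = re.compile(rf"^arn:aws:iam::{ACCOUNT}:role/\S+$")
INSTANCE_PROFILE_ARN = re.compile(rf"^arn:aws:iam::{ACCOUNT}:instance-profile/\S+$")
SG_ID = re.compile(r"^sg-[0-9a-f]+$")
REGION = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")            # rough shape, e.g. us-east-1
BUCKET = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")  # 3-63 chars, lowercase


@dataclass
class ComputeDetails:
    compute_type: str
    ec2_role: str
    emr_role: str
    ec2_key_pair: str
    aws_region: str
    s3_bucket_name: str
    ec2_instance_profile: str
    emr_slave_security_group: str
    emr_master_security_group: str


def validate(d: ComputeDetails) -> list[str]:
    """Return human-readable errors; an empty list means the fields look plausible."""
    errors = []
    if d.compute_type.upper() != "AWS":
        errors.append("Compute Type: only AWS is covered here")
    for label, value in [("EC2 Role", d.ec2_role), ("EMR Role", d.emr_role),
                         ("EC2 Key Pair", d.ec2_key_pair)]:
        if not value.strip():
            errors.append(f"{label}: required")
    if not REGION.match(d.aws_region):
        errors.append(f"AWS Region: unexpected value {d.aws_region!r}")
    if not BUCKET.match(d.s3_bucket_name):
        errors.append(f"S3 Bucket Name: invalid bucket name {d.s3_bucket_name!r}")
    if ROLE_ARN.match(d.ec2_instance_profile):
        errors.append("EC2 Instance Profile: this is a role ARN; use the instance profile ARN")
    elif not INSTANCE_PROFILE_ARN.match(d.ec2_instance_profile):
        errors.append("EC2 Instance Profile: not an instance profile ARN")
    for label, sg in [("EMR Slave Security Group", d.emr_slave_security_group),
                      ("EMR Master Security Group", d.emr_master_security_group)]:
        if not SG_ID.match(sg):
            errors.append(f"{label}: expected a security group ID like sg-...")
    return errors


# Placeholder values for illustration only.
good = ComputeDetails(
    "AWS", "ODIF-EC2-Role", "EMR_DefaultRole", "odif-key", "us-east-1", "odif-assets",
    "arn:aws:iam::123456789012:instance-profile/EMR_EC2_DefaultRole",
    "sg-0123456789abcdef0", "sg-0123456789abcdef0",
)
print(validate(good))  # []
print(validate(replace(good, ec2_instance_profile="arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole")))
```

The second call flags the role ARN, which is exactly the mix-up the note in the table warns about.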

Connectors

    • The Connectors screen is used to create and modify connectors. Once a connector is created, it can be used as a source or a sink.

Once the details are filled in, connectivity with the source or sink is validated; after successful validation, the connector can be submitted.

Job Configuration

This screen helps configure a job, which links a source to a sink. For a MySQL connector, database and table information can be provided. It provides the following features:

  1. Load multiple tables from one database in one job.
  2. Load tables from different databases.
  3. A user-specific query can also be provided in the Query section.
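
The three options above can be pictured as a job that holds a list of table specifications, each naming a table or carrying its own query. The sketch below is a hypothetical model, not ODIF's internal job format: `TableSpec`, `JobConfig`, and the connector and table names are illustrative, and it simply renders the MySQL `SELECT` each spec implies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableSpec:
    database: str
    table: Optional[str] = None
    query: Optional[str] = None  # a user-specific query takes precedence over the table

    def source_sql(self) -> str:
        if self.query:
            return self.query.strip().rstrip(";")
        if not self.table:
            raise ValueError(f"{self.database}: provide a table or a query")
        # MySQL identifier quoting; names containing backticks are not handled here.
        return f"SELECT * FROM `{self.database}`.`{self.table}`"


@dataclass
class JobConfig:
    name: str
    source_connector: str
    sink_connector: str
    tables: list[TableSpec] = field(default_factory=list)

    def extraction_queries(self) -> list[str]:
        return [spec.source_sql() for spec in self.tables]


job = JobConfig(
    name="sales_to_lake",
    source_connector="mysql-prod",
    sink_connector="s3-raw",
    tables=[
        TableSpec("sales", "orders"),     # several tables from one database
        TableSpec("sales", "customers"),
        TableSpec("crm", "contacts"),     # a table from a different database
        TableSpec("crm", query="SELECT id, email FROM contacts WHERE active = 1;"),
    ],
)
for sql in job.extraction_queries():
    print(sql)
```

Letting a query override the table keeps both styles in one list, so a single job can mix whole-table loads with filtered extracts.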

Job Run

This screen submits the job that transfers data from source to sink. Keep the default option; the other option is for static on-premises Hadoop clusters.
