Loading Data from OCI Object Storage to ADB Table via OCI-DI
In this blog we will see how to load data from Object Storage to Autonomous Database table using OCI Data Integration.
Use Case
- Few customers would want to load their SaaS data to Autonomous Database for multiple reasons or if they want to generate a dashboard for their SaaS data using Oracle APEX (Low Code No Code)
- There can be customers who have data in csv, json, excel, etc., formats which they want to ingest in Database table.
The above architecture and this blog covers these two cases (though we are not covering the part where data from SaaS is coming to Object Storage via BICC).
In this blog, we will cover the below topics:
- Provisioning an OCI — DI Instance.
- Creating IAM Policies using OCI-DI with Object Storage & Autonomous Database.
- Creating a flow to transfer data from OCI Object Storage to OCI ADB using OCI DI.
Buckle up as this is going to be a lengthy blog
1. Provision OCI DI Instance
OCI DI (Data Integration) Instance can be provisioned from the path Analytics & AI → Data Lake → Data Integration as shown below
By default, OCI DI instance will be provisioned on a Private Subnet. Here you can chose the Compartment → VCN → Subnet on which you want the instance to be provisioned and click on “Create”
This completes provisioning of OCI DI Instance.
2. IAM Policies
Now, to access Object Storage from OCI DI you need to provide access via OCI IAM Policies. Below are the list of policies to be provided for the same
allow service dataintegration to use virtual-network-family in compartment <>
allow group <> to manage dis-workspaces in compartment <>
allow group <> to manage dis-work-requests in compartment <>
allow group <> to use virtual-network-family in compartment <>
allow group <> to manage tag-namespaces in compartment <>
allow group <> to use object-family in compartment <>
allow any-user to inspect compartments in compartment <> where ALL{request.principal.type=’disworkspace’,request.principal.id=’ocid.’}
allow any-user to read buckets in compartment <> where ALL{request.principal.type=’disworkspace’, request.principal.id=’ocid.’, request.operation=’GetBucket’}
allow any-user to manage buckets in compartment <> where ALL{request.principal.type=’disworkspace’, request.principal.id=’ocid.’}
allow any-user to manage objects in compartment <> where ALL{request.principal.type=’disworkspace’, request.principal.id=’ocid.’}
allow any-user to manage buckets in compartment <> where ALL{request.principal.type=’disworkspace’, request.principal.id=’ocid.’}
NOTE: See this for more about IAM Policies for OCI Data Integration.
3. OCI DI Flow
Once your OCI DI Instance is provisioned click on the Instance name to open it.
In here, we have to perform the below tasks in sequence:
- Create a table in ADB in which you want the Data to be inserted.
- Create a Bucket in OCI Object Storage.
- Create Data Assets for Object Storage & Autonomous Database.
- Create a Data Flow.
- Create an Integration task.
- Create an Application and Publish Task to this Application.
- Test the Integration.
Table Creation
I will not be discussing the table creation or Bucket creation parts in this blog (assuming everyone is aware of that).
Bucket Creation
For OCI DI to access Object Storage, you need to have a Service Gateway created on the Subnet on which OCI DI is created as Object Storage will be in Oracle Services Network, which can be accessed via Service Gateway.
Data Assets Creation
In your OCI DI Instance, click on Create Data Asset to create individual Data Assets for Object Storage & Autonomous Database.
Object Storage Connection
a. Click on Create Data Asset, give a name to your asset.
b. From type chose “OCI Object Storage” (this auto populates your Tenancy OCID, Namespace), leaving you an option to chose the region where your bucket is present.
c. Give name to your Default Connection, else it’ll take the default name and click on “Create”
This creates your Data Asset for Object Storage. Now, click on Buckets to see the buckets of your Object Storage to confirm the working of created Data Asset
Autonomous Database Connection
a. Click on Create Data Asset, give a name to your asset.
b. From type chose “Oracle Autonomous Transaction Processing/Oracle Autonomous Database” based on your Database Workload type, then upload the Wallet file for this Database along with a Wallet password.
c. Give name to your Default Connection, username & password of ADB, chose a TNS alias from the drop down. Here I am chosing a bucket on Object Storage as Staging Location for now and click “Create”
This completes ADB Connection. Now, test the working of the Data Asset by clicking on Schemas tab as shown below
This completes creation of Data Assets. Now let us start by creating Data Flow.
Data Flow Creation
Let us create a Data Flow, to show the source & target and their mappings.
a. Click on Create Data Flow and give a name to it.
b. Drag and drop Source activity from the left side pane on to your canvas (center pane) and then configure it by giving it a name, choosing the Object Storage Data Asset created earlier, Connection and Bucket. Click on “select” in Data entity and you can choose the entity from bucket either by name or pattern (I am doing it via Pattern for now)
Then choose File type “ CSV” and click on “Select”
You can check if this is correct or not by clicking on Attributes Tab which shows the fields from your CSV (first line which is considered as Header by default)
c. Drag and Drop Target activity from your left pane to your canvas (center pane), give it a name, choose Data Asset, Connection, Schema & Entity (Table into which you want to insert). Integration strategy by default will be “insert” which we can leave it as we will be inserting into the table.
You can test this as well by clicking on Attributes tab to see the fields of your chosen table
d. Now, let us connect both source and target and then complete mapping between them.
As the fields are same in both my source & target they are auto-mapped. Click on the link between source & target and choose the Project where you want this to be saved.
Now, click on “Create and Save” to complete creating Data Flow.
Integration Task Creation
a. Click on “Create Integration Task”, give a name to it, chose the project folder, chose the Data Flow created in previous step and click on “Create and close”
This completes creation of Integration Task. Now, let us create an Application.
Application Creation & Publishing Integration Task
a. Click on Create Application, give it a name, chose the compartment and click on “Create”
Once application is created, go to Projects from Home screen, click on Tasks, Select the Task which you want to publish, click on Publish to Application, chose the Application to which you want to publish the task and click on “Publish”.
Test the Integration
To test the integration, go to Applications from Home screen, select the application you have created, go to Tasks tab and against the Task click on the three vertical dots and click on Run as shown below
Now that the run is completed successfully in OCI DI, let us check if the data is inserted in ADB Table and the data in Object Storage Bucket.
This completes the entire flow of loading data from Object Storage Bucket to Autonomous Database table.
NOTE: This is just to demonstrate on how to load data to ADB Table from Object Storage using OCI DI. You can enhance this further by scheduling it and by creting pipelines and pipeline tasks based on your requirement.
If customer has a requirement to show the data from Autonomous Database table on APEX application, the same thing can be achieved which is explained here.
!!! Happy Reading !!!