Unlocking Data Insights Using Oracle Endeca Information Discovery
Let’s say you’re a business analyst who’s been commissioned to create the specs for a portfolio performance report. Your task is to combine information from the back office database, the middle office trading application, and a spreadsheet the portfolio manager has maintained since Excel was introduced. It could be done with the current toolset, but it would take months to create the new data model, the database, and the ETL to load it before you could even begin looking at the actual data to write the report spec. There’s got to be a better way.
What you need is a data discovery tool that can combine structured and unstructured data into a single database, so that new problems can be solved in days or weeks rather than months. Fortunately, Oracle has released Oracle Endeca Information Discovery to supplement Oracle Business Intelligence. Endeca Information Discovery combines an ETL suite, a semi-structured database, and a configurable thin-client interface into one easily implemented platform.
Below is a description of the key platform components:
- Endeca Integrator is an Eclipse-based ETL system. Its IDE comes with a variety of pre-configured transformations used to extract, transform and load data into the Endeca Server. Integrator is as easy to use as any other GUI-based ETL system, but it has direct visibility into the Endeca load process rather than only populating staging tables or calling the Web Services interface. It can also take data from the Endeca Content Acquisition System, such as Web or file-system crawls, and load it into the Endeca Server.
- Endeca Server is a semi-structured database that stores data as records and attributes. A record is a unit of data, and an attribute describes the record. For example, in a database of books, a record is a particular book and one of its attributes is its author. The data model is very extensible and handles “jagged” data, where records do not all carry the same attributes, as well as incomplete data. It can even load data that has not been cleansed. The database runs mostly in memory, which makes it very fast.
- Endeca Studio is an intuitive user interface derived from the Endeca commerce product, which is used to index some of the world’s largest e-commerce sites. It allows fast development of reusable components for diving into the Endeca record set, and it does not require a lot of training. Studio is Java and SOA based, and is typically used in intranet settings.
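To make the record-and-attribute model described for Endeca Server a little more concrete, here is a minimal Python sketch of what “jagged” data looks like. This is not the Endeca API or its storage format; the sample books and the helper names `attribute_coverage` and `select` are made up purely for illustration. Each record is just a bag of attributes, attributes can hold more than one value, and no record is required to carry every attribute.

```python
from collections import Counter

# Hypothetical book records. Each record maps attribute names to a list of
# values; records do not share a fixed schema ("jagged" data).
records = [
    {"title": ["Dune"], "author": ["Frank Herbert"], "year": [1965]},
    {"title": ["Good Omens"], "author": ["Terry Pratchett", "Neil Gaiman"]},
    {"title": ["Untitled manuscript"]},  # no author or year known yet
]


def attribute_coverage(records):
    """Count how many records carry each attribute."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts)


def select(records, attribute, value):
    """Return records where the attribute includes the given value.

    Records that lack the attribute are skipped rather than causing an error.
    """
    return [r for r in records if value in r.get(attribute, [])]


if __name__ == "__main__":
    print(attribute_coverage(records))
    print(select(records, "author", "Neil Gaiman"))
```

The point of the sketch is that nothing had to be modeled up front: the third record loads even though it has only a title, and a query on a missing attribute simply returns no matches.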
Oracle Endeca Information Discovery can be downloaded with a sample application at https://edelivery.oracle.com/. The sample app is built on the AdventureWorks database, which many BI users will be familiar with, and it demonstrates access to structured data very well. Future blog posts will cover using Endeca to tap into unstructured data.