What is Informatica Data Archive®?
Last time, I mentioned we’d do a brief guide to the archiving tool I like: Data Archive, part of the Information Life Cycle suite of tools from Informatica Corporation. But first, a bit of background about where the toolset came from.
Data Archive was developed about 15 years ago by a cabal of whiz kids inside Oracle’s E-Business Suite (OEBS) group. Even back then, OEBS was very good at generating millions of rows and gigabytes of data, most of which was never used once it was a year or so old. And back then, Oracle’s solution was, “buy more storage.”
This group started a company to develop and market a solution to the growing problem. Nice for Oracle, because they liked being a Silicon Valley incubator. Also, they liked having an ecosystem grow up around OEBS. And with third-party solutions for data growth in the market, Oracle could focus on what they did best: making OEBS swell.
The particular genius of the toolset was that it maintained its own extensive metadata about the tables it was to work on. This enabled data to be copied into an archive with its operational integrity fully preserved. The ERP could access the data in the current database or in the archive database, and work perfectly either way – no errors or missing data elements.
Keep in mind, full operational integrity is more than just keeping parent and child tables in sync. There is business logic coded into the application, which must be replicated in the metadata. For example, an old invoice, with all its detail lines and comments, cannot be written to the archive if it’s still awaiting payment. An ambitious undertaking, and not accomplished overnight, but ultimately they built a powerful, flexible, and generic framework to maintain and use this detailed metadata.
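To make the invoice example concrete, here is a minimal sketch of what "business logic in the metadata" might look like. Everything in it is an assumption for illustration: the `Entity` class, the rule functions, and the invoice fields are hypothetical names of mine, not Data Archive's actual metadata model.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration only: these names do not come from
# Informatica Data Archive's real metadata repository.


@dataclass
class Entity:
    """Metadata for one business object: a parent table, its children, and its rules."""
    parent_table: str
    child_tables: list[str]
    rules: list[Callable[[dict], bool]] = field(default_factory=list)

    def is_archivable(self, record: dict) -> bool:
        # A record qualifies only if every business rule passes.
        return all(rule(record) for rule in self.rules)


def older_than_one_year(invoice: dict) -> bool:
    return invoice["age_days"] > 365


def fully_paid(invoice: dict) -> bool:
    # An invoice still awaiting payment must stay in the live database.
    return invoice["amount_due"] == 0


invoice_entity = Entity(
    parent_table="invoices",
    child_tables=["invoice_lines", "invoice_comments"],
    rules=[older_than_one_year, fully_paid],
)

invoices = [
    {"id": 1, "age_days": 900, "amount_due": 0},    # old and paid
    {"id": 2, "age_days": 900, "amount_due": 150},  # old but unpaid
    {"id": 3, "age_days": 30, "amount_due": 0},     # paid but recent
]

archivable_ids = [inv["id"] for inv in invoices if invoice_entity.is_archivable(inv)]
print(archivable_ids)  # [1]
```

The point of the design is that the rules live alongside the table relationships, so the same engine can decide what is eligible for any application once its metadata has been described.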
The company was called Applimation, and you can still see this appellation buried in the product’s internals. In 2009 they were acquired by Informatica, and since then the toolset has expanded its reach to other ERPs and DBMSs, and it’s evolving closer to Informatica’s other toolsets, e.g., PowerCenter.
A very exciting development since that time is the ability to keep the archive in flat files rather than in an actual database. “Back in the day” users (of course!) wanted to use the application they were familiar with to access their old data. But DBAs needed to take data out of the primary database to keep the application fast and supportable. The solution? An archive in a separate database, but accessible by the application.
Scroll forward 10 years, and the terabytes have really begun to stack up, with no end in sight. Now there’s data so old even the most retentive users don’t need instant access to it via their familiar applications. So now you can begin to save real money by archiving to flat files. The data is highly compressed and securely immutable, yet it keeps its tabular structure and can be queried with plain SQL commands. Brilliant! We’ll discuss this more in a later blog entry.
Moving on to the tool itself. Let’s do a verbal equivalent of a YouTube unboxing video. These are the main components you get “out of the box”:
- An application to create and maintain metadata. If you will be working mostly with a major ERP (like OEBS or PeopleSoft) you will get a license for the accelerators already written by Informatica for your particular ERP and module. In that case, you won’t be doing much messing with metadata. But if you are working with applications written in-house, or specialized or second-tier vendor applications, you will use this tool a lot.
- The archiving engine. This is the software that selects the data, moves it to the archive repository, manages the batch processes, and performs other administrative tasks. It has a web-based user interface. This UI is what most people think of as Informatica Data Archive itself.
- The database that holds Informatica Data Archive’s (IDA’s) own repository: metadata from all sources, accelerators, user IDs and privileges, processing logs, etc. Very likely you will need direct access to this data, especially if you are working with custom applications.
If the archive is kept in a flat-file system, there are also the following:
- A basic tool for querying the archived data, through an interface that interprets ANSI SQL commands. This is called the SQL Worksheet.
- A tool for administering the file archive: access management (i.e., user IDs), legal hold and data destruction settings, etc.
Finally, after the software is installed and configured, there will be one or two user schemas for each application to be Archived, as described below. (Typographical note: I capitalize the word “archive” when referring specifically to Informatica Data Archive.)
- One sits in the source database and has select-and-delete access to the tables it manages. This is called the “staging” schema. It owns a few objects of its own but doesn’t occupy much space except during an Archiving cycle, when the interim tables it builds can become sizable. Those tables are normally dropped at the end of the process.
- If Archiving to a database, there is another schema that owns the tables in the historical archive. It should have its own tablespace, to facilitate administering it separately from the current database.
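To picture how the two schemas cooperate during an Archiving cycle, here is a rough sketch using Python's built-in `sqlite3` module. It is my own simplification, not the product's actual procedure: a single in-memory SQLite database stands in for both the source and the archive database, and every table name (`invoices`, `archive_invoices`, `interim_invoice_ids`, and so on) is hypothetical.

```python
import sqlite3

# One in-memory database stands in for source and archive; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, age_days INTEGER, amount_due REAL);
    CREATE TABLE invoice_lines (id INTEGER PRIMARY KEY, invoice_id INTEGER, item TEXT);
    CREATE TABLE archive_invoices (id INTEGER PRIMARY KEY, age_days INTEGER, amount_due REAL);
    CREATE TABLE archive_invoice_lines (id INTEGER PRIMARY KEY, invoice_id INTEGER, item TEXT);
    INSERT INTO invoices VALUES (1, 900, 0), (2, 900, 150), (3, 30, 0);
    INSERT INTO invoice_lines VALUES (10, 1, 'widget'), (11, 1, 'gadget'), (12, 2, 'widget');
""")

ELIGIBLE = "SELECT id FROM interim_invoice_ids"


def run_archive_cycle(conn: sqlite3.Connection) -> int:
    """Copy eligible invoices and their lines to the archive tables, then delete them."""
    # 1. The "staging" step: build an interim table of the keys to be archived.
    conn.execute(
        "CREATE TABLE interim_invoice_ids AS "
        "SELECT id FROM invoices WHERE age_days > 365 AND amount_due = 0"
    )
    with conn:  # commits the copy and delete together, or rolls them back on error
        # 2. Copy the parent rows and their child rows into the archive tables.
        moved = conn.execute(
            f"INSERT INTO archive_invoices SELECT * FROM invoices WHERE id IN ({ELIGIBLE})"
        ).rowcount
        conn.execute(
            "INSERT INTO archive_invoice_lines "
            f"SELECT * FROM invoice_lines WHERE invoice_id IN ({ELIGIBLE})"
        )
        # 3. Delete from the source, children first so no orphans are left behind.
        conn.execute(f"DELETE FROM invoice_lines WHERE invoice_id IN ({ELIGIBLE})")
        conn.execute(f"DELETE FROM invoices WHERE id IN ({ELIGIBLE})")
    # 4. Drop the interim table so the staging area shrinks back down.
    conn.execute("DROP TABLE interim_invoice_ids")
    return moved


moved = run_archive_cycle(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM invoices ORDER BY id")]
print(moved, remaining)
```

In a real installation the interim tables would belong to the staging schema and the archive tables to the history schema in its own tablespace; the sketch only shows the order of operations.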
Those are the major components. In the next installment, we’ll take a look at various approaches to developing the best archiving setup for your specific environment.
Until then —