Approaching a Data Archiving Solution
At this point, you know something about archiving in general, and Informatica Data Archive in particular. Now, as promised last time, let’s look at how you should approach the task in your own environment.
Where to begin? Most shops have multiple applications with multiple databases in use. Shouldn’t you focus on the one with the most obvious problem, to get a big bang for your buck? The short answer here is “Of course.” But consider before you commit: the biggest obstacle to getting Archive going is not technical, but political. Archive is a new thing that rocks the boat, and it is especially unwelcome because, after all, your applications already work without it. More often than not, what you need is an easy target, just to break the ice and get something into production. Once one application is in, and everything still works, the fear factor goes way down. You will then find it much easier to get more and bigger applications into the pipeline.
We’ll discuss later in this series what makes one application easier to archive than another.
How and how fast do you need access to the Archived data? The technical issue targeted here is: Can you keep the Archive in a flat-file Informatica Data Vault (IDV), or do you need to keep history in a separate database schema? Here is a comparison between the two. First, keeping history in a separate database schema:
- Can be accessed readily by an application, or any number of query tools.
- Fast, indexed response times – the Archive is a database.
- Cheaper to administer than the primary, current schema because it needs to be backed up only after each Archive cycle.
- Reference- or dimension-type data can be left in the current schema, and accessed via a db-link. No explicit maintenance or processing needed.
- After a version upgrade to the application database, the history schema can be upgraded to add new columns, new column sizes, even new tables if necessary (all new data elements will be initialized to NULLs). This keeps the current and history schema synchronized and perpetually usable by the base application.
By contrast, keeping history in a flat-file Data Vault:
- Accessible through a JDBC/ODBC connection. This can work for BI tools like Hyperion, but typically a plain-vanilla SQL tool would be used. Informatica supplies one for the purpose, with the tool set.
- The data is highly compressed and sits on plain flat files. Access is through a front end that enables SQL querying. This front end has a much smaller footprint than a real DBMS, but response time will be significantly slower than reading from a database.
- Both administration and the physical storage can be much cheaper than what you’d need to host an active database. Also, it’s outside the reach of a DBMS license.
- Depending on the queries you’ll be running, you may also need to keep reference tables in the IDV. At the moment, this means maintaining them explicitly: for example, by dropping the old copy of a reference table before re-Archiving it. Subsequent releases of Data Archive should eventually simplify this task.
- If a version upgrade adds new data elements to managed tables – no change. Subsequent Archive cycles will incorporate the new columns, but existing data in the Vault is immutable, period. This has implications for security (good) and reporting (“some assembly may be required”).
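The drop-and-re-Archive maintenance mentioned above for reference tables is a simple pattern: each cycle, remove the stale copy and reload it from the current source. A minimal sketch follows; the table name, columns, and rows are hypothetical, and sqlite3 stands in for whatever SQL interface your load process actually uses (the real IDV load mechanics differ).

```python
import sqlite3

def refresh_reference_table(conn, table, columns, rows):
    """Drop-and-reload pattern for a reference table kept alongside the
    archive: remove the stale copy, re-create it, and load the current
    source rows. Names and shapes here are purely illustrative."""
    cur = conn.cursor()
    cur.execute(f"DROP TABLE IF EXISTS {table}")
    cur.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()

# Usage: rebuild a hypothetical currency-code lookup each archive cycle.
conn = sqlite3.connect(":memory:")
refresh_reference_table(conn, "ref_currency", ["code", "name"],
                        [("USD", "US Dollar"), ("EUR", "Euro")])
print(conn.execute("SELECT COUNT(*) FROM ref_currency").fetchone()[0])
```

Because the whole table is replaced each time, the copy in the archive can never drift from the source by more than one cycle.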
It’s quite legitimate to keep both types of archives. So, for example, if you need to support a few reports that run quarterly and use three years of data, but everything older is needed only rarely for, say, compliance purposes, you might plan on a current schema holding 13 months of data; plus a history schema holding data between 14 months and three years of age; plus a file archive going back 10 years.
As database vendors evolve their offerings, the most common use case for Data Archive has also evolved. These days, there is a lot of interest in custom or retired applications, going into a Data Vault. And speaking of retirement…
Is the application being retired? You are in luck! (Project duration-wise, anyway.) In a retirement scenario, all the data is written out to the archive. No filters for current-vs-old records, no business rules to verify – just back up the truck and load it all up. Archive has shortcuts to automatically generate the code needed. Furthermore, once everything is in the Archive, you’re done. No monthly or quarterly (or any) jobs to schedule and monitor.
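The "back up the truck" character of a retirement load can be seen in a toy sketch: walk the catalog and copy every table wholesale, with no date filters and no business rules. Here sqlite3 stands in for both the retiring source and the archive target, and table discovery uses SQLite's sqlite_master catalog; a real DBMS has its own data dictionary, and Data Archive generates this kind of plumbing for you.

```python
import sqlite3

def retire_to_archive(src, dst):
    """Copy every user table wholesale -- no filters, no rules --
    mimicking a retirement-style archive load. Illustrative only."""
    tables = src.execute(
        "SELECT name, sql FROM sqlite_master WHERE type='table'").fetchall()
    for name, ddl in tables:
        dst.execute(ddl)  # re-create the table as-is in the target
        rows = src.execute(f"SELECT * FROM {name}").fetchall()
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            dst.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    dst.commit()

# Usage: a hypothetical orders table, copied in full, old and new alike.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, placed TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "2015-03-01"), (2, "2023-11-30")])
dst = sqlite3.connect(":memory:")
retire_to_archive(src, dst)
print(dst.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```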
How difficult will it be to get it going? If your applications are any of the various modules of PeopleSoft, J.D. Edwards, Deltek, or the Oracle E-Business Suite, the development has already been done. Your task is to get everything configured, get trained in how to run the cycles, do the UAT, and Bob (as they say across the pond) is your uncle. You will need a good, technical PM and competent DBAs. If you are running applications that are not part of this lineup, there will be some development required, typically about three months per application. A specialist consultant can help immensely to ensure your first project gets up and running successfully. Indeed (shameless self-promotion department) that is what we do. We have extensive experience with Archive on a range of custom applications, ERPs, and industries. We have seen the issues, and resolved them.
Next time, a typical project life cycle. Until then.