Archive Development Life Cycle – Or, What’s that light at the end of the tunnel?
Last time we said we’d look at a typical project life cycle. That’s what we’ll do, with this caveat: There is no typical project. Every one is different, sometimes radically different. No doubt you already knew that. And we’ll skip the contract discussions, hardware acquisition, and other tedious start-up parts of the project. Let’s get to the good stuff: What will the implementer (you!) experience down in the trenches?
Testing: If you followed my advice and picked a simple application for your first trial, you should be able to look at some test results within a week or so after installation. “Simple” in this context means that the accelerator for your application was included with the Informatica software; or if not, that it’s no more complicated than “One master table, selected by one date column, and all its relations from a child table, connected by a simple join condition”. There are a few key choices to make when setting up this first trial:
- Take a very small slice of data, i.e., retain almost everything. Do some preliminary analysis and counts to figure out how far back to go. You want to test all the logic of your criteria, but not spend hours moving terabytes of data.
- Use an absolute date, rather than “six months prior to today’s date”. This will simplify the verification of all the logic, and then later you can switch to a relative date for UAT and Production purposes.
- At first, do not purge the data being archived. This allows the initial tests to be easily repeated, and simplifies verification of selection criteria.
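The trial setup above can be sketched in miniature. The schema here is invented for illustration (an ORDERS master table and an ORDER_LINES child, joined on order_id), with SQLite standing in for the real database; the point is the shape of the criteria: master rows selected by one date column against an absolute cutoff, child rows pulled in by a simple join, and nothing purged.

```python
import sqlite3

# Hypothetical two-table schema: one master, one child, joined on a key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY,
                              order_id INTEGER REFERENCES orders(order_id));
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2003-02-14"), (2, "2003-09-30"), (3, "2011-06-01")])
conn.executemany("INSERT INTO order_lines VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2), (13, 3)])

# An absolute cutoff, not "six months before today": the same test run
# produces the same result next week, which makes verification simple.
CUTOFF = "2004-01-01"

# The archive candidates: masters by the date column, children by the join.
# SELECT only -- no purge yet, so the trial can be repeated at will.
masters = conn.execute(
    "SELECT order_id FROM orders WHERE order_date < ? "
    "ORDER BY order_id", (CUTOFF,)).fetchall()
lines = conn.execute(
    "SELECT l.line_id FROM order_lines l "
    "JOIN orders o ON o.order_id = l.order_id "
    "WHERE o.order_date < ? ORDER BY l.line_id", (CUTOFF,)).fetchall()
```

Counting and eyeballing `masters` and `lines` against the source tables is exactly the “all and only the intended data” check described below, just at toy scale.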
As the implementer, you will be keenly interested in making sure that all and only the intended data was archived out, before you submit the results for inspection by others. And here is a risk point: Testing is tedious; no one likes to do it. I often find that after a few testing cycles, application staff tire of confirming my results, or running their own tests — they grow to trust what I submit. This is good, but it’s bad. Testing must be confirmed by independent parties to be valid. Resist the temptation to handle all testing yourself; offer whatever assistance you can to facilitate others’ testing. Otherwise, you may find yourself in the very unpleasant situation of having to restore data to correct a missed assumption.
Later testing stages will focus on two areas: First, business testing of the application, to ensure that it works as expected without having access to all the old data. This is where you occasionally find out about a condition which should have been included in the archive criteria, but was not.
Second, performance. These jobs move hundreds of gigabytes around, so they can take many hours to run, and they consume lots of resources. Normally you do not want to have Archiving going on while the application is available to users. So you need to work out how much data you can remove during the available processing window. Also, there are lots of techniques for optimizing run times, but they necessarily involve trial and error. For this testing, you will be working closely with the DBAs, and you’ll need an environment that’s identical to production, or close to it. Do not add resources to the test environment (e.g., memory) that you won’t have in production.
Custom Development: This may be the biggest difference from one company to another. Sometimes a lot of customization is required, sometimes little or none. One of your earliest priorities in the project life cycle must be to determine how much custom code you really need. How many applications? How closely do the out-of-the-box accelerators match your requirements? Have there been any ad-hoc, in-house projects to address the data-volume problem? (Often that work can be leveraged for Data Archive.) There’s a whole dialog involved here, but the biggest question of all is: What is your management’s expectation? As the implementer, you may not have been deeply involved in the preliminary discussions and negotiations. But this is a risk point! Do not underestimate the complexity or duration required for custom development! The killer is, Archive should go fairly smoothly, as projects go; so when it goes sideways, there can be some unnecessary bad feelings.
As an example, I offer this war story: The requirements were to Archive old invoices, unless they had credit balances. The credit amount was held in its own column, easily checkable to see whether it was greater than zero. Development and testing proceeded on schedule; everything looked good to me and to the system testers. Only late in the game, very very late, did it become apparent that the credit-balance column could occasionally also hold debit (negative) balances. In retrospect, this should have been almost obvious. But at the time, we had to go back, determine if there were other places holding credit or debit balance amounts, build an exhaustive test suite, revise the accelerator, re-run everything, scrutinize the data and the balance-sheet reporting… More egg on my face than I’m comfortable with.
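In code terms, the trap looked roughly like this (table and column names invented, SQLite standing in for the real database): a filter built on “credit_amount greater than zero” treats a debit balance, stored as a negative number in that same column, exactly like a settled invoice, so it sails into the archive set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices "
             "(invoice_id INTEGER PRIMARY KEY, credit_amount NUMERIC)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [(1, 0),        # settled: safe to archive
                  (2, 150.00),   # credit balance: must be kept
                  (3, -75.00)])  # debit balance, hiding in the "credit" column

# The original criterion: archive anything without a positive credit.
buggy = conn.execute("SELECT invoice_id FROM invoices "
                     "WHERE credit_amount <= 0 "
                     "ORDER BY invoice_id").fetchall()

# The corrected criterion: any nonzero balance blocks archiving.
fixed = conn.execute("SELECT invoice_id FROM invoices "
                     "WHERE credit_amount = 0 "
                     "ORDER BY invoice_id").fetchall()
```

With the buggy criterion, invoice 3 (the debit balance) ends up in the archive set; the fix is simply to test for “nonzero” rather than “greater than zero” — trivial to change, expensive to discover late.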
Launch: And then finally you hit Enter … and nothing changes. The job runs successfully, but everything is more or less as it was before. On one hand, the best possible outcome is “anticlimax”; but on the other hand, you do want demonstrable success. So plan for this day:
- To realize the performance benefit of a smaller database, you need to gather stats after archiving. And in most environments, you should reorg affected tablespaces to recover the physical space and reset the HWM.*
- Re-adjust storage allocations where possible, like dropping empty partitions.
- Track storage use before and after. Data Archive has internal reports for this, but I advise tracking it externally to the tool, for performance reasons.
- Similarly, track response and turn-around time for long-running queries, reports, and maintenance tasks like re-indexing.
- Users adapt very quickly to improved response time, and all your hard work will soon fade into the background noise. Don’t be discouraged, this is how it’s supposed to be.
Business as usual: In the basic scenario, DBAs or other technical staff will be trained in how to run Archive. After it’s all set up, tested, and migrated to production, the staff who did all the development should be free to move on to other projects. And here is a risk point: Archive jobs normally run only monthly or even semi-annually. It’s easy for a DBA to forget the training, especially in environments with high staff turnover. If you don’t have one of the original developers around to run the jobs, you will need clear and simple run documentation. This means screen shots, a searchable message index, and a backout plan (with phone numbers). Should be obvious I suppose, but it still catches me by surprise: A lot of technical staff simply will not read manuals, explanations, background, or blog posts.
In conclusion: To maximize your ROI on archiving, don’t skimp on the resources needed to do the job properly.
- Strong technical staff, often including temporaries with specialized skill and experience
- Focused project management, to keep expectations in hand
- Executive commitment, to provide the horsepower to overcome institutional inertia
When it’s all done, you will have eliminated a major source of cost, delay, and inefficiency. Pat yourself on the back for a job well done!
Note: This was intended to be the last of four blog entries on archiving. But, fair warning, you might hear from me again on this topic someday.
Until then – A factette about something you either already knew, or will never need to know:
* HWM (high water mark): Whenever you do a full table scan, Oracle reads through everywhere any data was ever stored, regardless of how much data has been deleted. Say you’ve archived away eight years of a ten-year-old database – select count(*) will still chug through ten years of mostly empty segments.
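The effect is easy to demonstrate in miniature with SQLite standing in for Oracle: the database’s page count plays the role of the high water mark, VACUUM plays the role of a tablespace reorg, and ANALYZE plays the role of gathering stats. Deleting most of the rows does not shrink the database at all; only the reorg reclaims the space.

```python
import sqlite3

# Autocommit mode so VACUUM is not blocked by an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA auto_vacuum = 0")  # the default in most builds
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO history VALUES (?, ?)",
                 [(i, "x" * 500) for i in range(2000)])

pages_full = conn.execute("PRAGMA page_count").fetchone()[0]

# "Archive away" 90% of the rows -- the allocated space does not budge.
conn.execute("DELETE FROM history WHERE id >= 200")
pages_after_delete = conn.execute("PRAGMA page_count").fetchone()[0]

# The reorg reclaims the space and resets the mark; ANALYZE
# refreshes optimizer statistics, as DBMS_STATS would in Oracle.
conn.execute("VACUUM")
conn.execute("ANALYZE")
pages_after_vacuum = conn.execute("PRAGMA page_count").fetchone()[0]
```

Until the VACUUM, a full scan still walks all the mostly-empty pages — the same reason an Oracle count(*) chugs through ten years of segments after eight have been archived away.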