Dealing with Data Growth: Archiving
Hello! If you’re reading this, I know who you are: A person who’s been tasked with doing something about all that old data.
I make the bold assumption that you’ve been given this task — because no one seeks it out. It’s the least “sexy” thing you can do in IT. No new functionality is provided, storage is cheap, and an archiving project only gets noticed if it goes wrong. Welcome to my world!
I have been focused on archive projects for about five years — for many clients in many industries. When not working on archiving, I am involved in a range of other database engineering projects: Architecture, BI development, performance tuning, analytics, etc. My series of blog posts will ease you into archiving: What it really is and is not, what are the various approaches, why I like the industry’s leading tool set (Informatica Data Archive, part of their ILM suite), and how a typical project progresses from start to finish.
Let’s start with the basics. What is archiving? Very fundamentally, it is a physical separation of older, static, infrequently-used data out of the active, frequently-accessed and updated portion of a database. Data is removed from the current, active database and written to the archive. Each row exists in only one place. Both the current and the archive repositories are “databases of record.” Your first line of defense here should be partitioning. But eventually, the proportion of very old and static data overwhelms even a well-tuned partitioning scheme, and you just need to get the rarely-needed stuff out of there.
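To make the mechanics concrete, here is a minimal sketch of that "each row exists in only one place" move, using Python's built-in SQLite driver. The table and column names (`orders`, `orders_archive`, `order_date`) are hypothetical stand-ins, and a real tool handles referential integrity, restartability, and verification far beyond this:

```python
import sqlite3

def archive_old_orders(conn, cutoff_date):
    """Move rows older than cutoff_date from the active table to the archive.

    Hypothetical schema: both tables share identical columns. The copy and
    the delete run in one transaction, so a failure can't leave a row in
    both places (or neither).
    """
    cur = conn.cursor()
    # Copy the static, rarely-needed rows into the archive...
    cur.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff_date,),
    )
    # ...then remove them from the active portion of the database.
    cur.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    conn.commit()
    return cur.rowcount  # number of rows moved to the archive
```

The point of the sketch is the transactional copy-then-delete: after it commits, the active database holds only current data, and the archive is a database of record for the rest.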
These are the benefits to expect:
- An optimized user experience. Even a well-indexed database still has many queries and reports that get bogged down in a cascade of full-table scans.
- Reduced cost and time needed for regular database administration and maintenance. An archive, being static, only needs a single backup and no transaction logging.
- Reduced need and expense for fast, high-availability storage.
- Stabilized licensing expenses, where licensing is priced on database size (this depends on your environment). In a good archiving scenario, the active portion remains at a fixed size, and the growth is off-loaded to the archive.
- When an application is replaced or retired, all of its data is static, so it can all be kept in an archive. You remain in compliance with data retention requirements without having to maintain obsolete software and hardware.
What about the downside? To be blunt, the risk is data loss. Or, worse, making a big investment in an archive solution only to find out that you can’t actually get at the data when you need it. In today’s regulated environment, the cost of lost data looms much higher than the cost of bloated databases.
But don’t panic! A purposely-designed tool implemented by capable staff — guided and advised by years of experience — can give you the benefits you seek while demonstrably reducing your risk to zero. Demonstrably. (Just wanted to use that word one more time!)
When should you start a regular archiving program? In one sense, archiving is a lot like backups and other business-continuity practices: Everyone knows you’re going to need it sooner or later, yet you don’t want to spend a lot of time and money on it. But please, for your own sake, don’t wait until your databases are so large that regular backups and index maintenance can no longer be done in a single weekend! Every application and its database is different. But as a rule of thumb, when the older data that you “never” access is more than half the volume of the whole database, you will get an appreciable benefit from archiving it out.
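That half-the-volume rule of thumb is easy to check with a quick count. Here is a sketch, again using SQLite for portability; the table and column names are hypothetical, and on a large production database you would use row-count statistics or segment sizes rather than a full `COUNT(*)` scan:

```python
import sqlite3

def old_data_fraction(conn, table, date_col, cutoff_date):
    """Return the fraction of rows older than cutoff_date.

    A result above 0.5 suggests, per the rule of thumb, that archiving
    would yield an appreciable benefit. Identifiers are interpolated
    here for brevity; only do that with trusted, hard-coded names.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return 0.0
    old = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {date_col} < ?",
        (cutoff_date,),
    ).fetchone()[0]
    return old / total
```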
The next blog entry will include a very brief guide to the tool I like (Informatica Data Archive) and will move on to consider various architectures and approaches to the problem and what scenarios they would apply to.
Until then —