Why your business is only as good as its data
In the past few years, data analytics has become central to corporate growth and sustainability. Businesses large and small now analyze their data to assess past performance and chart a plan for the future. There’s no doubt that data is essential to business success, yet far too many companies suffer from unreliable data. Fortunately, there are many tools and techniques available to help ensure high-quality, reliable data for analytics.
What is data quality?
Data quality is a measure of how accurate, complete, consistent and up to date your data is. Though the term sounds simple, it’s a cornerstone of data analytics, and analytics can make or break a corporation. Today’s companies are highly competitive and eager to cater to their customers’ needs. They can succeed only when they have reliable data that can forecast the company’s future and shed light on what their customers want. That is where data quality comes in.
Companies hold many different kinds of data. For a service provider, that includes customer data, product data, inventory data, pricing data, marketing data and, of course, internal employee, HR and finance data. To help guarantee success, a company needs to understand trends in its customer base by asking questions such as the following: What are our customers buying? What can we do better? Which marketing campaign has been most successful? How are our customers hearing about us? What other products or sales can we offer? To get the right answers, the company must examine its data for trends between historic customer purchases and current ones. Only then can it start to see what was and what is going to be.
Data quality is the starting point for good analytics. Companies need to ensure that their data is reliable and well formatted so their reports can be accurate. The data integration process by nature pulls in all sorts of raw data, and it’s a challenge to make sure it is all correct and to eliminate data anomalies: inconsistencies such as duplicate or conflicting records. This is where data matching comes in.
What is data matching?
Data matching is the process of linking related records together so you can find and eliminate data anomalies and redundancies. When you think about companies that need a lot of analytics, the main types of data that come to mind are customer data and employee data. Consider the attributes you’d find in employee data: names, phone numbers, email addresses, residential information, pay data, job data and so on. Take names as an example. A company can have several employees named “Smith, Mike,” but of course those Mikes will not share other attributes such as address and phone number. If a database isn’t well designed and lacks proper primary and foreign keys, there may be no way of telling one Mike Smith from another.
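A minimal sketch of the idea, assuming hypothetical field names: treat two same-named records as the same person only when enough of their other attributes agree.

```python
def is_same_person(a: dict, b: dict, threshold: int = 2) -> bool:
    """Count matching non-name attributes; require at least `threshold` matches.

    Field names ("phone", "email", "address") are illustrative assumptions,
    not a fixed schema.
    """
    fields = ("phone", "email", "address")
    matches = sum(1 for f in fields if a.get(f) and a.get(f) == b.get(f))
    return matches >= threshold


mike1 = {"name": "Smith, Mike", "phone": "555-0101", "address": "12 Oak St"}
mike2 = {"name": "Smith, Mike", "phone": "555-0199", "address": "9 Elm Ave"}
mike3 = {"name": "Smith, Mike", "phone": "555-0101", "address": "12 Oak St"}

print(is_same_person(mike1, mike2))  # False: different people, same name
print(is_same_person(mike1, mike3))  # True: likely a duplicated record
```

Real matching tools use fuzzier comparisons (phonetic name matching, edit distance), but the principle is the same: identity comes from a combination of attributes, not a single field.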
This example is worth keeping in mind for data integration. Many companies are moving to electronic records, which simply means transferring their paper and spreadsheet-based documents into a database or data management tool that lets them query information in various ways. With that scenario in mind, consider what kinds of data anomalies can occur. Say a doctor’s office updates its patient John Hardy’s address, but it has three John Hardys. Performing that update in a spreadsheet or another unmanaged platform creates data discrepancies, because nothing stops the update from touching the wrong record, or every record. How does this relate to data matching? During data integration, primary and foreign keys are created, and those unique identifiers, taken together, are what distinguish one person’s data from another’s. There are various data matching tools and techniques you can use to safeguard the quality of your data.
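The John Hardy scenario can be shown in a few lines; this is an illustrative sketch (table and column names are made up), not the office’s actual system.

```python
import sqlite3

# In-memory database with a proper primary key on each patient row.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.executemany(
    "INSERT INTO patients (name, address) VALUES (?, ?)",
    [("John Hardy", "1 A St"), ("John Hardy", "2 B St"), ("John Hardy", "3 C St")],
)

# Updating by name alone touches every John Hardy: the update anomaly.
cur.execute("UPDATE patients SET address = ? WHERE name = ?", ("99 New Rd", "John Hardy"))
rows_by_name = cur.rowcount

# Updating by primary key touches exactly one row.
cur.execute("UPDATE patients SET address = ? WHERE id = ?", ("42 Other Rd", 1))
rows_by_key = cur.rowcount

print(rows_by_name, rows_by_key)  # 3 1
```

A spreadsheet has no equivalent of that `id` column enforcing uniqueness, which is why the same update silently corrupts two of the three records.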
There are many ways to match data, and many benefits to having clean, concise data. My favorite part of post-match data is the standardized results! The best example I can think of comes from a project I worked on a while back. We worked with millions of rows of manually entered data flooding our database. To perform accurate analytics, we had to get that data to its lowest, cleanest, most standardized level possible. That proved to be a huge challenge when it came to addresses. Consider the auto-fill on your smartphone: there are so many ways we type the same address on different occasions. Combinations of “St” versus “Street” or “Apt” versus “#” are just a couple I have noticed. That’s one person typing one address many different ways. Now imagine hundreds of people typing millions of addresses multiple times a day! It was a mess!

Luckily, we used Informatica and Bing’s geospatial address-matching capability to automate the integration. Though this was much faster than manually reviewing millions of rows and updating them, automation isn’t 100% accurate. That’s why it’s always a best practice to randomly choose a data set and test it. Whether the data set is humongous, large, moderate or small, when you can pull a report and get neat, accurate and useful data, there is no limit to the kinds of analytics you can perform to grow your company.
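To make the “St” versus “Street” problem concrete, here is a toy normalizer, a sketch of the standardization step, not the Informatica or Bing logic we actually used. The abbreviation table is a hypothetical starting point; production tools ship with far larger rule sets.

```python
import re

# Hypothetical abbreviation rules: regex pattern -> canonical form.
ABBREV = {
    r"\bstreet\b": "st",
    r"\bavenue\b": "ave",
    r"\bapartment\b": "apt",
    r"#\s*": "apt ",  # "# 4" becomes "apt 4"
}


def normalize(addr: str) -> str:
    """Collapse common spelling variants of one address into a single form."""
    addr = addr.lower().strip()
    addr = re.sub(r"[.,]", "", addr)          # drop punctuation
    for pattern, repl in ABBREV.items():
        addr = re.sub(pattern, repl, addr)    # apply abbreviation rules
    return re.sub(r"\s+", " ", addr)          # squeeze whitespace


variants = ["12 Oak Street, Apt 4", "12 oak st apt 4", "12 OAK ST., # 4"]
print({normalize(v) for v in variants})  # all three collapse to one form
```

Once every variant maps to the same string, exact-match grouping (or a database `GROUP BY`) is enough to find the duplicates.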