Disrupting BI Part 3: Visualizing Better Data with Data Quality
Data visualization proves its value by showing key stakeholders the patterns, trends, exceptions and outliers in the business, yet the picture painted is only as good as the underlying data. It delivers an easily consumable, rapidly understood message in the form of charts, pictures, graphs and other visuals. Users gain immediate insight on demand and can respond to changes and issues in the business as they happen, which they otherwise might not be able to do in time to make a difference. Because users absorb more information and act on it quickly, the supporting data must be of high quality so that decisions are made with correct information.
Achieve More Completeness and Consistency
A sound data quality process assesses and improves the completeness, consistency and accuracy of the data, and some of these techniques are especially important for visualization. Complete data is populated wherever it’s needed, with no gaps such as null or empty values. Consistent data uses one representation for each entity and avoids abbreviations and alternate spellings, such as “grey” and “gray” for the same color. Completeness and consistency are both addressed during the first steps of data quality improvement, commonly referred to as data profiling. Later in the process, incorrect data must be fixed, and accuracy can be assessed by cross-referencing data with other attributes. For example, cross-referencing the city and ZIP code on a record confirms whether they form an allowable combination. Other areas of data quality can improve visualization as well, including matching and merging records (e.g., consolidating customer names into single households) and data enhancement (e.g., adding longitude and latitude to addresses).
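As a rough illustration of the profiling step described above, the following Python sketch measures completeness for one field and counts distinct spellings in another. The sample records, field names and helper functions are hypothetical and exist only for this example; a real profiling tool would cover far more cases.

```python
from collections import Counter

# Hypothetical sample records; None and "" both represent missing values.
records = [
    {"customer": "Smith Company", "color": "grey", "zip": "30301"},
    {"customer": "Smith Co.", "color": "gray", "zip": ""},
    {"customer": "Jones LLC", "color": "Gray", "zip": None},
]


def is_populated(value):
    """Treat None and empty strings as gaps in the data."""
    return value not in (None, "")


def completeness(rows, field):
    """Return the fraction of rows in which the field is populated."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if is_populated(row.get(field)))
    return filled / len(rows)


def value_counts(rows, field):
    """Count distinct values, ignoring case and surrounding whitespace,
    so that alternate spellings of the same entity stand out."""
    return Counter(
        str(row[field]).strip().lower()
        for row in rows
        if is_populated(row.get(field))
    )


zip_completeness = completeness(records, "zip")
color_counts = value_counts(records, "color")

print(f"ZIP completeness: {zip_completeness:.0%}")
print(f"Color spellings: {dict(color_counts)}")
```

In this sample only one of three records has a ZIP code, and the color field contains both “grey” and “gray,” which is the kind of finding that profiling is meant to surface before the data reaches a chart.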
Ensure More Data Accuracy
The importance of data quality grows with data visualization. For example, to see and understand true patterns in customer behavior, each customer must be represented as a single entity. Customer records almost always exist in source data in multiple variations. Consider the customer name “Smith Company.” It may exist in the source data as “Smith Company,” “Smith Co.,” “Smythe Co.,” “Smith Corp.,” etc., and metrics such as sales may be associated with each variation. When visualizing sales for “Smith Company,” the true total may not appear, because the sales are split across those variations. A data quality process would fix this issue by standardizing the name on all records to one accepted name, such as “Smith Company.” Another example of data quality showing its importance is in visualizing map data. Imagine plotting sales by customer location on a map. An incorrect ZIP code in the source data would result in sales information being plotted in the wrong spot on the map. A data quality process would improve the map by cross-referencing ZIP codes with other address information such as city and state and correcting them if illogical combinations are found.
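The two fixes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production approach: the alias table, the ZIP reference table, the sample rows and all function names are hypothetical, and real ZIP codes do not always map to a single city.

```python
from collections import defaultdict

# Hypothetical alias table mapping known name variants to one accepted name.
NAME_ALIASES = {
    "smith company": "Smith Company",
    "smith co.": "Smith Company",
    "smythe co.": "Smith Company",
    "smith corp.": "Smith Company",
}

# Hypothetical reference table of allowable ZIP / city / state combinations.
ZIP_REFERENCE = {
    "30301": ("Atlanta", "GA"),
    "60601": ("Chicago", "IL"),
}

# Hypothetical source rows, as they might arrive before any cleanup.
sales_rows = [
    {"customer": "Smith Company", "sales": 100.0, "zip": "30301", "city": "Atlanta", "state": "GA"},
    {"customer": "Smith Co.", "sales": 250.0, "zip": "30301", "city": "Atlanta", "state": "GA"},
    {"customer": "Smythe Co.", "sales": 50.0, "zip": "60601", "city": "Atlanta", "state": "GA"},
    {"customer": "Jones LLC", "sales": 75.0, "zip": "60601", "city": "Chicago", "state": "IL"},
]


def standardize_name(name):
    """Map a known variant to its accepted name; leave unknown names as-is."""
    cleaned = name.strip()
    return NAME_ALIASES.get(cleaned.lower(), cleaned)


def total_sales_by_customer(rows):
    """Sum sales per customer after standardizing the customer name."""
    totals = defaultdict(float)
    for row in rows:
        totals[standardize_name(row["customer"])] += row["sales"]
    return dict(totals)


def find_zip_mismatches(rows):
    """Return rows whose ZIP is unknown or disagrees with the city and state."""
    return [
        row
        for row in rows
        if ZIP_REFERENCE.get(row["zip"]) != (row["city"], row["state"])
    ]


totals = total_sales_by_customer(sales_rows)
mismatches = find_zip_mismatches(sales_rows)

print(totals)
print([row["customer"] for row in mismatches])
```

Without standardization, a chart of these rows would show three separate Smith customers with partial totals. The mismatch check only flags suspect rows; deciding whether the ZIP code or the city is the wrong value still needs a rule or a human review.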
Embrace More Quality Data Practices
Augmenting data with good data quality practices improves data visualizations. Checking the data for completeness and filling in the blanks ensures that the graphs and charts tell a full story. Checking the data for consistency ensures the actors in the story are credited with appropriate impact. Correcting the data eliminates false information and helps tell a more accurate story. Data visualizations are used to make better decisions, but bad decisions will be made when the information is bad. Data quality fuels better decision-making capabilities when working with data visuals.
Ready to embrace data quality practices that help you visualize more robust, accurate and consistent data? Contact us for a customized solution for your business.