The job of the data scientist is to acquire data, clean it, analyze it, make sense of it, and most crucially of all, communicate its meaning to an audience of (usually) non-data scientists. Effective communication is critical in data science.
Data scientists are highly trained, highly skilled individuals – and they need to be. The language of data science is complex and esoteric. Data modeling and analysis are complicated, and datasets are difficult to understand – especially for non-technical people. Furthermore, the data a data scientist works with may be stored in comma-separated values (CSV) files, Excel files, [No]SQL databases, the Hadoop Distributed File System (HDFS), and the like. As with all data science work, it’s not how the data is stored – nor the actual data itself – that is inherently valuable. Rather, the value is found in the insights that can be drawn from it.
Data isn’t easy to decipher – especially when dealing with large volumes of the stuff. This is where data visualization comes in. Data visualization is the process of presenting data in a visual context – using pictures to understand data, in other words. This is vital for data scientists’ own understanding of the data, let alone for effective communication between the data science team and relevant stakeholders. Indeed, for data scientists to produce truly actionable insights, the findings and observations have to be made available to the stakeholders tasked with acting on them.
Data visualization enables different stakeholders and decision-makers to understand the significance of data by presenting it not as volumes and volumes of records, but in easy-to-interpret graphs, charts, maps, dashboards, and other visualizations. Visualizations provide a consumable way to see and understand trends, patterns, correlations and outliers in data. In this way, data visualizations don’t only reveal insights – they help make those insights actionable.
Why Is Data Visualization So Effective in Data Science?
The simple answer lies in the way the human brain processes information. Data science projects in business usually involve far more information than a person can take in unaided. According to Dell EMC, organizations managed an average of 9.7 petabytes of data in 2018, a 569% increase compared with the 1.45 petabytes they handled in 2016. With so much data, it’s practically impossible to wade through it all line-by-line and pick out patterns and trends – but with data visualization tools and techniques, insights are much easier to see and grasp. The reason is that our brains process visual information much faster than text-based information – 60,000 times faster, in fact, according to estimates. Here’s a visual to help that data sink in quicker…
(Image source: killervisualstrategies.com)
…And here’s an example exercise from Study.com to prove the point.
Question: Looking at the following table, what month recorded the highest sales?
Obviously, it’s December – but it took you a few seconds to read through the figures to find the answer. By comparison, look at this simple visual representation of the same data and ask yourself the same question…
(Image source: study.com)
… You got the answer almost instantaneously, right? And you can also see the peaks and troughs throughout the year – the larger story the data tells.
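The same comparison can be tried in a few lines of code. What follows is a minimal sketch rather than a reproduction of the Study.com exercise: the `monthly_sales` figures are invented for illustration, and `text_bar_chart` and `top_category` are hypothetical helper names. It uses only the Python standard library and draws the bars as text, so nothing needs to be installed.

```python
# Hypothetical monthly sales figures (illustrative only, not the Study.com data).
monthly_sales = {
    "Jan": 120, "Feb": 95, "Mar": 140, "Apr": 135,
    "May": 160, "Jun": 150, "Jul": 110, "Aug": 105,
    "Sep": 170, "Oct": 180, "Nov": 210, "Dec": 260,
}


def text_bar_chart(data, width=40):
    """Return one line per category, with a bar scaled to the largest value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<4}{bar} {value}")
    return lines


def top_category(data):
    """Return the label with the highest value."""
    return max(data, key=data.get)


if __name__ == "__main__":
    print("\n".join(text_bar_chart(monthly_sales)))
    print("Highest sales:", top_category(monthly_sales))
```

Even in this crude text form, the longest bar stands out at a glance, while the raw dictionary has to be read number by number.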
We all know that time is money in business. Organizations that can make better sense of their data more quickly are more competitive in the marketplace. Why? Because they can spot trends and patterns, and make informed, evidence-based decisions, sooner than their rivals. Data visualization helps this to happen.
Consider a marketing team working across 20 or more ad and social media platforms. The team needs to know the effectiveness of its various campaigns so it can optimize spend and targeting – and it needs this information quickly to remain competitive. The process could be completed manually by going into each system, pulling out the various reports, combining the data, and then analyzing it in a spreadsheet – but it would take an age to pore through all the metrics and draw any meaningful conclusions. With data visualization tools, however, all sources of data can be connected automatically and visualizations produced immediately for the team, allowing its members to draw on-the-spot comparisons and conclusions about each campaign’s performance.
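The combining step in that scenario can be sketched in code. This is an illustration under stated assumptions, not a description of any particular tool: the platform names, campaign names, and figures in `platform_reports` are invented, and `combine_reports` and `cost_per_conversion` are hypothetical helpers. Real platforms would each need their own export or API connection.

```python
from collections import defaultdict

# Hypothetical exports from three ad platforms; names and numbers are invented.
platform_reports = {
    "search_ads": [
        {"campaign": "spring_sale", "spend": 500.0, "conversions": 50},
        {"campaign": "brand", "spend": 300.0, "conversions": 10},
    ],
    "social_ads": [
        {"campaign": "spring_sale", "spend": 250.0, "conversions": 25},
    ],
    "video_ads": [
        {"campaign": "brand", "spend": 200.0, "conversions": 15},
    ],
}


def combine_reports(reports):
    """Sum spend and conversions per campaign across all platforms."""
    totals = defaultdict(lambda: {"spend": 0.0, "conversions": 0})
    for rows in reports.values():
        for row in rows:
            totals[row["campaign"]]["spend"] += row["spend"]
            totals[row["campaign"]]["conversions"] += row["conversions"]
    return dict(totals)


def cost_per_conversion(totals):
    """Return (campaign, cost per conversion) pairs, cheapest first."""
    costs = {
        name: t["spend"] / t["conversions"]
        for name, t in totals.items()
        if t["conversions"] > 0  # skip campaigns with no conversions yet
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, cost in cost_per_conversion(combine_reports(platform_reports)):
        print(f"{name:<12} {cost:.2f} per conversion")
```

Once every platform feeds one combined table like this, a single chart can rank all campaigns side by side instead of leaving the team to compare separate reports.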
It’s All About Fast and Clear Communication
There are many different types of visualizations – line plots, scatter plots, histograms, box plots, bar charts, and the list goes on. They may seem simple – but it is precisely this simplicity that makes them so valuable when presenting data science findings to stakeholders.
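To show how little machinery sits behind two of these chart types, here is a rough sketch of the numbers a histogram and a box plot are built from. The `order_values` list is invented, and `histogram_counts` and `five_number_summary` are hypothetical helper names. The quartiles come from the standard library's `statistics.quantiles`, and charting tools may use slightly different quartile conventions.

```python
import statistics
from collections import Counter

# Hypothetical order values; invented for illustration.
order_values = [12, 15, 18, 21, 22, 25, 27, 31, 34, 95]


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Return min, Q1, median, Q3, max: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


if __name__ == "__main__":
    print("Histogram bins:", histogram_counts(order_values, bin_width=10))
    print("Box plot summary:", five_number_summary(order_values))
```

In both summaries the single large order of 95 sits well apart from the rest, which is the kind of outlier a stakeholder would notice immediately on the finished chart.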
(Image source: blog.qlik.com)
In a recent interview with Dataquest, Kristen Sosulski, Clinical Associate Professor of Information, Operations, and Management Sciences at New York University Stern School of Business – and author of Data Visualization Made Simple – makes the point that while very few people can look at a spreadsheet and draw quick and accurate conclusions about what the data says, anyone can compare the size of bars on a bar chart, or follow the trend on a line graph.
Sosulski explains that while data visualization is a key skill at every stage of the data science process, it becomes critical at the point of communication. “There are a lot of angles that you can take with visualization, and ways to look at it,” she says. “I think about data visualization as something that we have in the toolkit to help people better understand our insights and our data. Just on a human level, visualizations allow us to perceive information a lot more clearly when they’re well designed.”
Using visual representations, data scientists can open the eyes of key stakeholders and decision-makers, allowing them to understand clearly and quickly what a dataset is revealing, how a model will help solve a business problem, and what impact the scientist’s proposals and discoveries will have on the organization. Emerging trends – both in the business and in the market – can be pinpointed quickly, outliers spotted, relationships and patterns identified, and the whole data story communicated engagingly in a way that gets the message across without unnecessary delay.
Without visualizations, all you have is data on a spreadsheet. All insights remain buried. The beauty of data science is that it reveals the true value of all those petabytes of data organizations are now managing. But without using data visualizations to communicate the important insights a data science project discovers, that value will be forever lost.