By Xiaoming Zeng, MD, PhD
“A picture is worth a thousand words.” The psychological basis of this well-known saying is that most people process and recognize patterns in visuals more quickly than from text signals. Today in the digital era, we are faced with ever-growing mounds of data—the meaning of which can be difficult to parse. Indeed, one could say that a picture is worth a thousand data points. The use of visual aids allows most people to understand and recognize patterns within large amounts of data much faster than when faced with the raw data alone.
In the digital era, data are generated and collected at an astronomical speed. Consequently, it has become increasingly challenging to make sense of large, unwieldy amounts of data. Healthcare is no exception. Healthcare data are collected and generated from a myriad of sources: electronic health records (EHRs), claims databases, disease registries, imaging, sensors, genomics, patient-reported data, and so on. For example, it was estimated that, by 2020, there would be 2,314 exabytes (1 exabyte equals a million terabytes) of data in healthcare. Although emerging technologies such as big data, machine learning, and artificial intelligence could help store and process the data, individuals still need to communicate the findings from the data to stakeholders, customers, and colleagues. Therefore, data visualization, simply defined as the graphical representation of data and information, has become critically important for identifying and communicating the meaning of the data.
Data visualization has a long history in healthcare. Two prominent historical examples are the data map drawn by epidemiologist John Snow, which helped identify and remove a water pump that was the source of the 1854 cholera outbreak in London, and the Rose Chart (Coxcomb chart) drawn by Florence Nightingale to communicate the avoidable military mortality during the Crimean War (1853–1856). Both examples illustrate that representing data visually is vital to pattern recognition and decision-making.
Nowadays, with the help of computer software, data can be visualized with just a few mouse clicks. Unfortunately, it is also easier to generate misleading, incomplete, cluttered, and hideous data visualizations that often hamper the decision-making process. For this reason, it is essential for a manager to possess the mindset and skills to create accurate and effective visualizations.
Recognizing the importance of visualizing data, the Commission on Accreditation for Health Informatics and Information Management (CAHIIM) stipulates data visualization as one of the curricular requirements at all three educational levels. We therefore expect CAHIIM-accredited programs to implement data visualization content in their curricula to prepare the next generation of health information management professionals. The requirements at each level are:
- Associate: “III.4. Report health care data through graphical representations.”
- Baccalaureate: “III.4. Examine health care findings with data visualizations.”
- Master: “III.3. Present data visually through a computerized application.”
Selected Principles of Effective Data Visualization
Data visualization is a broad field, drawing content from psychology, statistics, computer science, cartography, art, and other disciplines. While it is impossible to discuss all tenets of the field in one article, six principles essential for building effective and accurate data visualizations are highlighted below. The first three principles are general, while the last three are more specific to creating charts. Although data can be visualized in various formats, this article focuses on charting, that is, representing data using graphs such as bar or line charts. Anyone interested in learning more about data visualization can find ample excellent resources (books, websites, blogs) by searching the web. It is important to point out that even though data visualization can be used to clean data, such as by identifying outliers and errors, this article assumes that the data have been pre-processed and cleaned before visualization.
1. Understand the Needs of Your Audience
Understanding the needs of your audience is vital to effective data visualization. To effectively convey a message, you must understand your audience from more than one perspective. Questions you need to ask before preparing the data visualization include:
What is the purpose of the data visualization? Specifically, is it for data exploration or explanation? Must any important information be included? Information collected from these questions often sets up the design direction of the data visualization—static vs. dynamic, highlights vs. plain design, etc.
What are the roles of the readers of the data visualization? If you present data to a seasoned chief financial officer with a sound accounting background, include a table of original data. Conversely, if the busy chief executive officer is in your audience, strategic information, such as that related to the market and competitors, must be included in the data visualization to facilitate the CEO’s decision-making.
How much time do they have to understand the visualization? If they have ample time to process the data visualization, you can create a complicated one with multiple layers of data to examine over time. Otherwise, create multiple simple visualizations that collectively convey the same information, each of which takes less time to process.
What are their technical skills? For example, a user who is not familiar with the underlying data or the software tool may prefer a printed still image. In contrast, another user may want access to a dashboard with filter choices to explore the data by themselves.
What are some of the constraints the users may have? If you do not know your audience beforehand, choose a design that is friendly to everyone, including people with disabilities. For example, colorblind users will not be able to discern certain combinations of colors that are used for coding data, so you should avoid using those combinations in your design.
2. Create Data Visualization with Integrity
Data visualization is a process of gaining information from raw data. In this sense, it is no different from statistics. The impact of a misleading data visualization could be even worse than that of distorted statistics because of the long-lasting impression of visual signals. For example, charting software such as Microsoft Excel may, under its default settings, not start the Y-axis of a bar chart at 0, which distorts inter-bar comparisons by artificially magnifying the differences between bars. Another common misrepresentation of the data is to create a pie chart from a list of percentages that do not sum to 100 percent. A bar chart should be used in this situation.
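The magnifying effect of a truncated Y-axis can be checked with simple arithmetic. The sketch below is illustrative only: the function name `apparent_ratio` and the two counts are hypothetical, and it assumes positive values drawn as bars rising from the axis start.

```python
def apparent_ratio(value_a: float, value_b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the Y-axis starts at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("axis_start must be below both values")
    return (value_a - axis_start) / (value_b - axis_start)


# Hypothetical counts for two units (illustrative numbers only).
unit_a, unit_b = 104, 100

from_zero = apparent_ratio(unit_a, unit_b, axis_start=0)    # 104 / 100
truncated = apparent_ratio(unit_a, unit_b, axis_start=98)   # 6 / 2

print(f"Axis from 0:  bar A is drawn {from_zero:.2f}x as tall as bar B")
print(f"Axis from 98: bar A is drawn {truncated:.2f}x as tall as bar B")
```

With the axis starting at 0, the 4 percent difference is drawn as a 4 percent difference. With the axis starting at 98, the same data make one bar three times as tall as the other.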
Data visualizers should always be mindful of the context of the data at hand. Missing critical contextual information will lead to inaccurate and incomplete data visualizations. For example, mortality rates due to substance misuse might have decreased in one county over the past five years. However, before celebrating the achievement, a comparison to the norm, such as the same five-year trend of average mortality rates due to substance misuse at the state level, needs to be included for a “complete picture.”
The final product of the data visualization must be complete, objective, and without perceptual distortions. A detailed analysis plan is a tool to ensure the integrity of the visualization. Pilot testing the draft data visualization with potential users will also help in identifying flaws in the design.
3. Maximize the Data/Ink Ratio
The data-ink ratio is a concept introduced and advocated by the influential data visualization expert Edward Tufte (1). It reflects a minimalist approach to the design of data visualizations. Although not everyone favors the principle, it helps keep a visualization uncluttered so that important information stands out. Data-ink is the “non-erasable ink” used for the presentation of data. Non-data ink is the ink, or the chart components, that contributes nothing to the presentation of data. The concept is summarized in the formula below, and the goal of data visualization is to get the data-ink ratio as close to 1.0 as possible.
Data-ink ratio = data-ink / total ink used to print the graphic = 1.0 - the proportion of a graphic that can be erased without loss of data-information
Based on the principle of maximizing the data-ink ratio, the following elements of a chart could be reduced if they don’t contribute to the representation of the data: 3D effects, background images, shadow effects, unnecessary borders, and overused grid lines. Unfortunately, many of these non-data-ink elements are part of the default settings of charts created by commercial software, so extra effort is needed to remove the non-data ink from charts created with these tools.
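As a rough worked example, the ratio can be computed for a chart before and after the non-data ink is removed. The ink amounts below are hypothetical estimates in arbitrary units, and the names `data_ink_ratio`, `ratio_for`, and `DATA_ELEMENTS` are introduced only for this sketch. In practice, ink is judged by eye, not measured.

```python
def data_ink_ratio(data_ink: float, non_data_ink: float) -> float:
    """Share of a chart's total ink that presents data (1.0 means no non-data ink)."""
    total_ink = data_ink + non_data_ink
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    return data_ink / total_ink


# Hypothetical ink estimates (arbitrary units) for the same bar chart.
default_chart = {"bars": 40, "grid lines": 25, "border": 10, "3D shadow": 25}
cleaned_chart = {"bars": 40, "grid lines": 5}

# Assumption: only the bars carry data in this example.
DATA_ELEMENTS = {"bars"}


def ratio_for(chart: dict) -> float:
    """Split a chart's elements into data and non-data ink, then compute the ratio."""
    data = sum(ink for name, ink in chart.items() if name in DATA_ELEMENTS)
    non_data = sum(ink for name, ink in chart.items() if name not in DATA_ELEMENTS)
    return data_ink_ratio(data, non_data)


print(f"Default chart: {ratio_for(default_chart):.2f}")
print(f"Cleaned chart: {ratio_for(cleaned_chart):.2f}")
```

In this made-up case, removing the border and shadow and thinning the grid lines raises the ratio from 0.40 to about 0.89.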
4. Select the Proper Chart Based on the Data and Functions of the Chart
New users often want to know what the best tool is for visualizing data. Many software tools are available now, including Microsoft Excel, Tableau, SAS, R, and D3, to name a few. Each of these tools has its own strengths, weaknesses, and learning curve. What is more important is to choose the proper chart for the data you would like to visualize. The selection should be based on the function of the chart (comparison, relationship, composition, distribution, mapping, etc.) and the type of data (nominal, categorical, ratio, interval, qualitative, etc.). The Data Visualization Catalog hosts a list of charts searchable by function and data type. Andrew Abela from the website Extreme Presentation developed a one-page cheat sheet to help users choose the proper chart. For Microsoft Excel users, the website Juice Analytics offers a chart chooser with sample implementations in Excel.
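The selection logic can be pictured as a lookup keyed by chart function and data type. The sketch below is a hypothetical starting point: the pairings in `CHART_SUGGESTIONS` are common conventions chosen for illustration and are not taken from the resources named above.

```python
from typing import Optional

# Hypothetical starting points keyed by (chart function, data type).
# These pairings are illustrative assumptions, not the contents of any
# particular chart chooser.
CHART_SUGGESTIONS = {
    ("comparison", "categorical"): "bar chart",
    ("relationship", "numeric"): "scatter plot",
    ("composition", "categorical"): "stacked bar chart",
    ("distribution", "numeric"): "histogram",
    ("mapping", "geographic"): "choropleth map",
}


def suggest_chart(function: str, data_type: str) -> Optional[str]:
    """Return a suggested chart, or None when the table has no entry."""
    key = (function.strip().lower(), data_type.strip().lower())
    return CHART_SUGGESTIONS.get(key)


print(suggest_chart("Comparison", "categorical"))
print(suggest_chart("distribution", "numeric"))
print(suggest_chart("comparison", "geographic"))  # no entry in this table
```

A real chooser would cover many more combinations. The point is that the chart follows from the question being asked and the kind of data, not from the software at hand.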
5. Use Pie Charts Judiciously
A pie chart is an excellent choice to represent the part/whole relationship of a categorical variable. For example, the racial composition of a patient panel is a good candidate for a pie chart. The circle represents all possible values of the categorical variable, while each slice represents the percentage of observations in that category compared with the total count. One fundamental flaw of the pie chart is that it is hard to discern the difference between two slices if their values are similar. A bar chart should be used in this situation. Another common problem with pie charts is a separate chart legend, which forces users to scan back and forth between the chart and the legend to make the mental connection. If you have to use a pie chart, use it when there are no more than three large slices and, when possible, place the labels adjacent to or inside the slices so users can make the connection quickly.
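These rules of thumb can be written down as a simple check. The sketch below is one possible reading of them: the function name `pie_or_bar`, the 0.5-point rounding tolerance, and the 5-point threshold for "similar" slices are all assumptions made for illustration, and the example percentages are hypothetical.

```python
def pie_or_bar(shares: dict, max_slices: int = 3, min_gap: float = 5.0) -> str:
    """Suggest 'pie chart' only when the rules of thumb above are met.

    shares maps category names to percentages. The rounding tolerance and
    min_gap threshold are illustrative assumptions.
    """
    values = sorted(shares.values(), reverse=True)
    # Part/whole relationship: the percentages must add up to 100.
    if abs(sum(values) - 100.0) > 0.5:
        return "bar chart"
    # No more than three slices.
    if len(values) > max_slices:
        return "bar chart"
    # Slices with similar values are hard to tell apart.
    gaps = [larger - smaller for larger, smaller in zip(values, values[1:])]
    if any(gap < min_gap for gap in gaps):
        return "bar chart"
    return "pie chart"


# Hypothetical percentages for illustration only.
print(pie_or_bar({"Group A": 60, "Group B": 30, "Group C": 10}))
print(pie_or_bar({"Group A": 34, "Group B": 33, "Group C": 33}))
print(pie_or_bar({"Group A": 45, "Group B": 40, "Group C": 30}))
```

The first panel passes all three checks. The second fails because its slices are too similar, and the third fails because its percentages do not sum to 100.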
6. Avoid Using 3D Charts
Many people like to create a data visualization in 3D because it looks “cool.” Some even use the third dimension to represent an additional variable. However, the fundamental drawback of a 3D chart is the perceptual distortion of the values associated with the 3D elements. A reader of a 3D chart can no longer precisely discern the differences between the elements of the chart and, therefore, may not be able to make the right decision based on it. Instead, whenever possible, represent data in a 2D format. Additional variables can be added to 2D charts using color, size, or annotations. Since we rarely need to print data visualizations anymore, we can also create multiple 2D charts, one for each value of the additional variable, and plot them together as a grid to show the whole story.
The Era of Big Data
In summary, data visualization is a must-have skill for health information management professionals in the era of big data in healthcare. With the wide availability of computer tools, data can be easily visualized. However, data visualizers need to be mindful of its principles and strive for effective data visualization. Any deviation from these principles needs to be examined and well-justified. Effective and engaging data visualization will, in consequence, improve data-driven communication and decision-making.
Xiaoming Zeng (email@example.com) is a research professor in the Department of Psychiatry at the University of North Carolina at Chapel Hill.
1. Tufte, Edward R. The Visual Display of Quantitative Information. Cheshire, Conn.: Graphics Press, 2001.