The Scientific Principles of Data Visualization
A previous Journal of AHIMA article on data visualization argued that data visualization is one of the essential skills health information (HI) professionals need to succeed in a data-centric healthcare environment. The article introduced several heuristics for HI practitioners to use when approaching data visualization. Data visualization is both an art and a science. The art makes a visualization engaging and exciting; the science helps present data objectively and intuitively, so that the visualization is effective and efficient.
This article introduces some of the neuropsychological theories and principles behind data visualization. Understanding these theories and principles will set up a solid foundation for building data visualization projects.
Visual Perceptions
Data visualization uses our most important and dominant sensory system, vision, to represent data as graphics that our brain can perceive and store in memory for later use. We use our eyes like a camera to capture images and our brains like computers to process and make sense of the captured images. To some extent, we use our brains to “think” about what we see. The part of the brain that processes this visual information is called the visual cortex. It is the primary cortical region that receives, integrates, and processes visual information relayed from the retinas. The visual cortex is located in the occipital lobe of the cerebral cortex, at the very back of the brain.
The visual cortex processes the visual signals it receives from the eyes and then “forwards” the information to other regions of the brain for further processing, such as reading comprehension or decision-making. Most of the processing in the visual cortex happens at the unconscious or subconscious level, which means it can recognize objects and patterns from the visual signals without engaging other regions of the brain. This implies that, to design an effective data visualization, we should leverage the visual cortex’s capability to recognize patterns and objects in place. On the other hand, a designer should also be conscious of the potential risk of manipulating data visualizations.
Gestalt Principles
Gestalt psychology is a school of thought in psychology concerned with how people perceive objects and the relationship between parts and the whole. Some of the principles from Gestalt psychology can guide the process of creating intuitive visualizations, especially when dealing with relationships between individual elements, groups, and the entire visualization. Six of these principles are detailed below.
Proximity
Human beings perceive items that are close to each other as belonging to the same group. The items could be bars, lines, or dots. In the example below, the dot plot (or scatterplot) shows that the data fall into two separate clusters, and in the column chart, the columns are grouped into four clusters using white space between the clusters. Each column represents the sales amount for one month, and each cluster in the column chart represents a quarter of the year.
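The white-space grouping described above can be sketched in a few lines of Python. This is only an illustrative layout calculation, not the method used to produce the article's figures; the function name `grouped_positions` and the `bar_width` and `gap` parameters are assumptions introduced here.

```python
def grouped_positions(n_items, group_size, bar_width=1.0, gap=1.0):
    """Return an x position for each column, adding extra white space
    of width `gap` after every `group_size` columns (hypothetical helper)."""
    positions = []
    for i in range(n_items):
        group_index = i // group_size  # 0 for Q1, 1 for Q2, and so on
        positions.append(i * bar_width + group_index * gap)
    return positions


months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Three months per quarter: columns within a quarter sit one unit apart,
# while the first column of the next quarter sits two units away.
x_positions = grouped_positions(len(months), group_size=3)
```

The resulting positions could be handed to the bar-drawing function of whichever plotting library is in use. The only design decision is that the distance between quarters is larger than the distance within a quarter, which is what lets proximity do the grouping.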
Similarity
We perceive items with similar properties, such as shape, color, or direction, as belonging to the same group. Following the same example, we added colors to make the dot and column clusters more salient. Note that such a design might be redundant, since the same information (the groups) is encoded twice, once by clustering and once by color.
Enclosure
Items that are enclosed together using boundaries (e.g., circles, polygons, lines) are perceived as a group. Again, using the same examples of the dot plot and column chart, we added circles to the dot plot to emphasize the groups in the dot clusters and used vertical lines in the column chart to draw the boundaries between the quarters.
Closure
This principle describes how our brains tend to ignore gaps and try to complete visualizations that have vacant areas. For example, there is a gap in the first line chart below due to two missing data points, and our brain may arbitrarily add a straight line to complete the line chart, which could be misleading if the actual trend is not a straight line (in most cases, it is not). In the second figure below, both the actual data points and the arbitrary straight line were added for comparison.
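To make the risk concrete, the sketch below reproduces the straight line a reader's brain tends to draw across a gap, so it can be compared with real values once they are known. The data values and the helper name `straight_line_fill` are hypothetical and are not taken from the article's figures.

```python
def straight_line_fill(values):
    """Fill interior runs of None by linear interpolation between the
    nearest known neighbors. Leading or trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            # Point on the straight line joining the two known values.
            filled[i] = values[left] + (values[right] - values[left]) * (i - left) / span
    return filled


# Hypothetical series with two missing points in the middle.
observed = [10.0, 12.0, None, None, 18.0, 17.0]
assumed = straight_line_fill(observed)

# Hypothetical "true" values, used only to show how far the assumed
# straight line can drift from what actually happened.
actual = [10.0, 12.0, 20.0, 9.0, 18.0, 17.0]
errors = [abs(a - b) for a, b in zip(assumed, actual)]
```

Leaving the gap visible, or drawing the interpolated segment in a visually distinct style, signals to the reader that the connecting line is an assumption rather than data.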
Continuity
This principle postulates that items aligned with one another are often perceived as a group. For example, in the bar chart below, it becomes unnecessary to have a horizontal X-axis line because the bottoms of the bars are aligned at the same level as each other. Our brain can “automatically” draw a line from the aligned bars.
Connection
Connected items are perceived to be in the same group. In the example below, the first dot plot shows three sets of dots in different colors. In the second multiple line chart, by connecting the dots using lines, we clearly show three series of dots (data points) in the visualization.
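Before dots can be connected, they have to be grouped into series and ordered along the X-axis. The following is a minimal sketch of that preparation step, assuming the data arrive as `(series, x, y)` records; the record layout and the function name `build_series` are illustrative assumptions.

```python
from collections import defaultdict


def build_series(records):
    """Group (series, x, y) records by series name and sort each group
    by x, so the points can be joined left to right by a single line."""
    grouped = defaultdict(list)
    for name, x, y in records:
        grouped[name].append((x, y))
    return {name: sorted(points) for name, points in grouped.items()}


# Hypothetical, unordered records for three series.
records = [
    ("A", 2, 5.0), ("B", 1, 3.0), ("A", 1, 4.0),
    ("B", 2, 3.5), ("C", 1, 1.0), ("C", 2, 2.0),
]
series = build_series(records)
```

Each entry in `series` is then one line in a multiple line chart, which makes the grouping explicit even if the dots share a color.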
Pre-Attentive Attributes
As mentioned above, the visual cortex recognizes objects and patterns before sending them to other regions of the brain for further processing. This means that when designing a data visualization, if we encode data in objects and patterns that the visual cortex can recognize immediately, without forwarding them to other regions of the brain, we make the data more straightforward for the brain to process. This is called pre-attentive processing. It takes about 200-500 milliseconds for the brain to process the information and store it in spatial memory. A visualization can take advantage of this shortcut in our visual processing to be more engaging. Four categories of attributes can be used in pre-attentive processing.
- Color: Color is one of the most commonly used pre-attentive attributes for calling attention. Two factors affect the perception of color: intensity and hue. Intensity specifies the level of saturation of a color. In other words, it describes how bright or dull a color is. Hue describes the type of color: blue, red, orange, etc. In the column chart below, the color of the tallest column differs from the color of the other columns, which allows a reader to identify that column as March almost instantaneously.
- Form: The concept of form includes many attributes: shape, size, length, grouping, distance, etc. A column chart is an example of using the column form as a pre-attentive attribute. The length of each column represents the quantity for the corresponding category on the X-axis. Our visual cortex can discern even slight differences between the columns almost intuitively.
- Spatial positioning: This attribute uses the positioning and grouping of elements to represent data and patterns. Most of the principles from Gestalt psychology explain the effectiveness of spatial positioning. Again, let’s use the column chart as an example: if we sort the columns from the highest to the lowest value, we alter the spatial positions of the columns (see below). As a result, we can quickly recognize that March has the highest quantity and February has the lowest.
- Movement: In data visualization, movement is most often used as a pre-attentive attribute in the form of animation. One famous example is the late professor Hans Rosling’s mesmerizing animated data presentation of the fertility rate and life expectancy of each country in the world over a period of 40 years (1962-2002). His presentation is available as a TED Talk.
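Two of the attributes above, color and spatial positioning, reduce to small data-preparation steps that can be sketched in Python. The sales figures, color names, and function names below are hypothetical; they only mirror the article's example, in which March is the highest month and February the lowest.

```python
# Hypothetical monthly quantities.
sales = {"Jan": 120, "Feb": 90, "Mar": 200, "Apr": 150}


def highlight_colors(values, accent="orange", muted="lightgray"):
    """Color: give the largest value an accent color and mute the rest,
    so the peak stands out without the reader comparing heights."""
    peak = max(values)
    return [accent if v == peak else muted for v in values]


def sort_descending(data):
    """Spatial positioning: order categories from highest to lowest,
    so the extremes sit at the two ends of the chart."""
    return sorted(data.items(), key=lambda item: item[1], reverse=True)


colors = highlight_colors(list(sales.values()))
ranked = sort_descending(sales)
```

Using a single accent color against a muted palette keeps the emphasis on one column. Sorting changes only the order of the columns, not their values, so it is appropriate when the categories have no natural order that the reader expects to see preserved.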
Awareness Is Key
Effective data visualization is based on neuropsychological principles of how we perceive, process, and integrate information to understand what we see. For HI practitioners, understanding and applying these principles can help guide design approaches and build compelling visualizations for the healthcare organization. We must also keep these principles in mind to avoid misleading readers or manipulating the data.
Xiaoming Zeng (xiaoming_zeng@med.unc.edu) is a research professor in the Department of Psychiatry at the University of North Carolina at Chapel Hill.
Katelyn H. Rouse (hardyka16@ecu.edu) is a clinical assistant professor in the Department of Health Services and Information Management at East Carolina University.