
Using Tables as a Method of Data Visualization

In the first three articles of this data visualization series, we introduced the importance of data visualization for health information (HI) professionals, the psychoneurological basis of intuitive visual processing, and the grammar of graphics that could be used to build layered and informative data visualizations.

Beginning with this article, we will dive into the different types of visualizations in detail. The first type we will discuss is the data table. Although data tables are not usually regarded as a type of data visualization, they are often made available so that users can examine the data alongside the visualizations. Data tables are not as intuitive as charts because they do not leverage the pre-attentive attributes introduced in our previous discussions, but they can still present data effectively with some mindful organization and formatting, as we will discuss in this article. Nothing beats a “good old data table” as the source of truth.

Data in a Table

A data table is a collection of observed values in a two-dimensional tabular form that consists of columns and rows. Data can be stored in higher-dimensional structures in mathematical and statistical programming environments. However, such high-dimensional representations are difficult for humans to process intuitively and accurately (think of a Rubik’s Cube of numbers: Can you see the center block at the core?). In a table, the intersection of a column and a row is called a cell, which contains the value of an observation. Spreadsheet software such as Microsoft Excel or Google Sheets has made generating, entering, storing, and manipulating data in tables simple for end users. Although it is not strictly required, each column usually has a unique header, and each row sometimes has a unique index, usually stored in the leftmost column of the table.

There are two general data types in a table: text and numeric. Text data are often categorical (e.g., gender, race), while numeric data are quantitative (e.g., age, counts, percentages). Sometimes, text data look like numeric data. For example, Medical Record Numbers (MRNs) often consist of a string of digits, which makes them look like numbers. In situations like this, one key consideration is whether we can meaningfully conduct mathematical operations on the data. It makes no sense to add two MRNs or to ask which of them is larger, so MRNs are not numeric data. They should be processed as the text data type.
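The point about digit-only identifiers can be illustrated with a minimal Python sketch. The MRN values below are made up for illustration, and the leading zeros are an assumption about how such identifiers might be issued.

```python
# Hypothetical MRNs, stored as text exactly as they were issued.
mrns = ["00123456", "00098765"]

# Converting to a number silently drops the leading zeros, so the
# identifier no longer matches the original record.
as_numbers = [int(m) for m in mrns]
round_trip = [str(n) for n in as_numbers]

print(round_trip)          # ['123456', '98765']
print(round_trip == mrns)  # False

# Arithmetic on the numeric form runs without error but means nothing.
meaningless_sum = sum(as_numbers)
```

Keeping such columns as text avoids both problems: the identifier is preserved character for character, and software will not offer to total or average it.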

Make Data ‘Tidy’

Data tables are generated and used in all phases of the data analysis process, from storing the collected raw data to displaying the analysis results in summary tables. When preparing data for analysis, which happens at the early stage of the data analysis process, we need to make the data “tidy.” Tidy data is a concept proposed by Hadley Wickham for getting a dataset ready for statistical analysis. In a tidy dataset, 1) each variable forms a column, 2) each observation forms a row, and 3) each type of observational unit forms a table.

Let’s consider this example. Table 1 and Table 2 represent the same data for four fictional patients; each patient has repeated lab tests on different dates. Table 1 is called a “wide table,” which is “untidy,” and Table 2 is called a “long table,” which is considered tidy. In Table 1, each row represents a patient with multiple observations (tests 1 and 2). If additional tests are performed on a patient, we add the test result and date to the right side of the table. Notably, in Table 1, patient Joe does not have a test 2 yet, which results in two empty cells in the table. Table 1 is considered untidy because each row contains two observations of the same patient, and the same variables, Test and Date, appear as duplicated columns.

In Table 2, each column is a variable (patient name, test, or date), and each row is an observation of the values of the three variables for a particular test. The entire table represents the records of the patients’ tests. Although the patient name column now contains duplicated names, such duplication is acceptable in tidy data. There are also no empty cells as in Table 1, because each row represents an observation; a test that has not been performed is simply not entered as a row.

A long table is the preferred format for data tables used for statistical analysis. However, data stored in the wide table format, like those in Table 1, are not uncommon because of the convenience of having all data about a subject stored in one row. Computer tools can be employed to convert wide tables to long tables to make the data tidy. For example, Microsoft Excel can be used to convert a table from the wide format to the long format.
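For readers who work in a programming environment rather than a spreadsheet, the same wide-to-long conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names ("Test 1", "Date 1", and so on), the patient "Ann," and all values are hypothetical and are not the contents of Table 1, and the function wide_to_long is our own helper, not part of any library.

```python
# Hypothetical wide table: one row per patient, repeated Test/Date columns.
# Empty strings stand for the blank cells of a patient with no second test.
wide_rows = [
    {"Patient": "Ann", "Test 1": "5.6", "Date 1": "2022-01-10",
     "Test 2": "5.9", "Date 2": "2022-02-14"},
    {"Patient": "Joe", "Test 1": "6.1", "Date 1": "2022-01-12",
     "Test 2": "", "Date 2": ""},
]


def wide_to_long(rows, n_tests=2):
    """Return one row per (patient, test) observation, skipping blanks."""
    long_rows = []
    for row in rows:
        for i in range(1, n_tests + 1):
            result, date = row[f"Test {i}"], row[f"Date {i}"]
            if result == "" and date == "":
                continue  # a missing test is simply not entered as a row
            long_rows.append(
                {"Patient": row["Patient"], "Test": result, "Date": date}
            )
    return long_rows


long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r)
```

The two wide rows become three long rows: two for Ann and one for Joe, with no empty cells left over. Dedicated data analysis libraries offer equivalent reshaping functions; the loop above only makes the logic explicit.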

The discussion of tidy data is mainly relevant to preparing raw data for statistical analysis. Data scientists may examine such data formats more closely, but they often mean little to the general users of the data. General users often expect to view a table with some summarization or reorganization of the data generated as the product of data analysis. Tidy data guidelines need not apply when generating the summary tables. Although summary data tables usually have fewer rows and columns than the tidy data sets used at the early stage of the data analysis process, the summary data tables must be purposefully organized and formatted for straightforward scanning and reading because they are for the general users to examine and comprehend.

Anatomy of a Summary Table

Before listing the guidelines for presenting data in a table, it is important to understand the structure and components of a table. Summary tables, especially those formatted for professional or academic publications, are often complex with multiple levels of variables. Besides the columns, rows, and cells, summary tables have specific terms for different parts. One resource is the American Psychological Association style guide, which outlines the names, formats, and component definitions of a publishable summary table.

Guidelines for Presenting a Table

When data are presented in a tabular format, readers will generally scan the table to locate specific data points. The following eight guidelines are recommended for creating data tables for digital display, print, or both.

1. Clearly distinguish the headers from the body of the data table.

The headers, which usually are the first row of a data table, provide important contextual information about each column. Headers should be presented differently from the rest of the table by using boldface or a different font. A horizontal line divider could be used to separate the headers from the body of the table. Software like Microsoft Excel or Word allows users to “freeze” the top row(s) of a data table or repeat them on each page if the table is too large to present on one screen or page.

2. Sort data when appropriate and with purpose.

Sorted data could help the end users identify patterns within the data or locate a specific value. Text variables could be sorted alphabetically, and numeric variables could be sorted based on values. Data could be sorted by a single column or multiple columns based on the type of information to show.

The following three tables are examples to show how different sorting can help answer different questions. The data are the same in the three tables: the top 10 countries with the most COVID-19 cases as of April 5, 2022.

In Table 3, the data is sorted by the number of total cases in a country, from the highest value to the lowest. If we ask the question, “Which country has the most cases of COVID-19?”, the USA can be quickly identified as the answer since it is the first row of the sorted table. In Table 4, we sorted the country names alphabetically, which helps answer a question such as, “What are the numbers of COVID-19 cases and total deaths in South Korea?” In Table 5, we used two columns to sort the data, first by the continent’s name and then by the COVID-19 death rate of each country, which helps compare the case and death rates of countries on the same continent. Most data analysis software supports sorting by one or more columns in this way.
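The three sort orders can be sketched in Python as follows. The rows are made-up placeholders, not the figures from Tables 3 to 5, and the field names and the death_rate helper are our own. Sorting the death rate from lowest to highest within a continent is also an assumption, since either direction could be appropriate depending on the question.

```python
# Made-up rows (not the article's figures): country, continent, cases, deaths.
rows = [
    {"country": "Country C", "continent": "Europe", "cases": 900, "deaths": 18},
    {"country": "Country A", "continent": "Asia", "cases": 1200, "deaths": 12},
    {"country": "Country B", "continent": "Europe", "cases": 500, "deaths": 5},
    {"country": "Country D", "continent": "Asia", "cases": 300, "deaths": 9},
]


def death_rate(row):
    """Deaths as a fraction of total cases."""
    return row["deaths"] / row["cases"]


# Like Table 3: one numeric column, highest value first.
by_cases = sorted(rows, key=lambda r: r["cases"], reverse=True)

# Like Table 4: one text column, alphabetical.
by_name = sorted(rows, key=lambda r: r["country"])

# Like Table 5: two keys, continent first, then death rate within a continent.
by_continent_rate = sorted(rows, key=lambda r: (r["continent"], death_rate(r)))

for r in by_continent_rate:
    print(r["continent"], r["country"], f"{death_rate(r):.1%}")
```

Each sorted list holds the same four rows; only the order changes, which is why the choice of sort key should follow from the question the table is meant to answer.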

3. Avoid using gridlines for data tables if possible.

Gridlines are the horizontal and vertical lines delineating the columns, rows, and cells of a table. They are helpful, especially in spreadsheet software, for quickly locating where the cursor is for entering and locating data. However, full gridlines make the data table look less “breathable” and more cluttered. Therefore, unless required, all default gridlines should be removed as the first step of formatting a data table.

Instead, we should leverage the white space between the rows and columns, the padding of the cells, text indentations, and alignments to help users read the data and identify the embedded relationships. In the example below, Table 7 is a non-gridline version of Table 6, which shows the annual US population estimates from 2015 to 2019. Although Table 6 is still readable, the full gridlines surrounding each cell disrupt data scanning. With the gridlines removed, users can scan the data horizontally and vertically without jumping any virtual hurdles. Additionally, in Table 7, a horizontal line divider separates the headers from the data (see guideline 1). Another subtle improvement over Table 6 is the increased space between the total row for the US and the rows for the regions. In combination with the indentations of the regional names, this makes the whole-part hierarchical relationship between the US and the four regions more prominent.

4. Right-align numeric values and headers.

Alignment makes data scannable in a table. Data, whether left- or right-aligned, form a subtle straight line along the side of the alignment. Because numeric calculation starts from the rightmost digit of the numbers, right-aligning numeric values, especially those with decimal points, improves the scannability of the numeric data.

In Table 8, we compare the three different types of alignment (left, center, and right) of the same set of four numbers. The right-aligned numeric data is much easier to scan vertically. It also makes the value for Hospital C stand out, as it is longer than the other three values. A mathematically savvy user may even be able to quickly check the last digits of the four numbers and, after some mental calculation, compare them to the last digit of the average value at the bottom.

5. Left-align text values and headers.

Aligning the text data along the left side makes them readable since we read English from left to right. For languages read from right to left, such as Arabic, it is helpful to right-align the text. In Table 9, we compare the lists of four hospital names in three different alignments (left, centered, and right). The left-aligned data are much more readable than the other two alignments. Centered alignment should not be used in a data table unless it is required, because it does not help users read either the text or the numeric data.
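Guidelines 4 and 5 can be tried out quickly with Python’s string formatting, where “<” left-aligns and “>” right-aligns a value within a fixed width. The following is a minimal sketch; the hospital names echo those mentioned for Tables 8 and 9, but the “Visits” column and all of its numbers are hypothetical.

```python
# Hypothetical hospital counts, used only to show the alignment.
rows = [
    ("Hospital A", 1250),
    ("Hospital B", 980),
    ("Hospital C", 104300),
    ("Hospital D", 2075),
]

# Column widths: wide enough for the header and the longest formatted value.
name_w = max(len("Hospital"), *(len(name) for name, _ in rows))
num_w = max(len("Visits"), *(len(f"{visits:,}") for _, visits in rows))

lines = [
    # Each header is aligned the same way as the column beneath it.
    f"{'Hospital':<{name_w}}  {'Visits':>{num_w}}",
    # A single rule under the headers instead of full gridlines.
    "-" * (name_w + 2 + num_w),
]
for name, visits in rows:
    # '<' left-aligns text; '>' right-aligns numbers so that digits of the
    # same place value line up vertically. ',' adds thousands separators.
    lines.append(f"{name:<{name_w}}  {visits:>{num_w},}")

print("\n".join(lines))
```

When printed in a non-proportional font, the right edge of the numeric column forms a straight line and the longer value for Hospital C stands out, while the hospital names share a common left edge.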

6. Select the appropriate font for better visualization of numeric data in a table.

There are hundreds of font types that we can use when formatting a data table. There are two general families of font types: sans serif and serif. Sans serif refers to the family of fonts that don’t have the small extending feature at the end of the strokes, while serif fonts do have the extending feature. For example, the default font type in the current version of Microsoft Word, Calibri, is a sans serif font type. An example of a serif font type is Times New Roman, which is widely used in publishing and web media. Sans serif font types are usually considered more modern, “clean,” and minimalistic, while serif font types are generally considered more aesthetically pleasing. It is acceptable to use both font types in a table, but some serif fonts, such as Georgia, do not align the characters along the same baseline, which results in an uneven baseline for a string of text. These fonts should not be used to format a table, especially numeric data.

Another, more relevant way to classify font types is proportional versus non-proportional. Non-proportional fonts use the same width for every character, while proportional fonts use different widths for different alphanumeric characters. Popular proportional fonts include Times New Roman, Arial, and Georgia, and popular non-proportional fonts include Courier and Monaco. Although proportional fonts are widely used in publishing media for their aesthetics, they may not be suitable for data tables, as they may disrupt the alignment of the numeric data. To complicate matters, some proportional font types (e.g., Calibri) present the alphabetical characters proportionally while presenting numeric characters non-proportionally, so they can still be used to format numeric data.

Therefore, the best practice for choosing a font type to represent numeric data in a table is to test different font types before deciding which one to use. Table 10 compares four popular font types (Calibri, Cambria, Georgia, and Arial) often used in word processing and publishing. The two sans serif fonts (Calibri and Arial) give a clean presentation of the numeric data. Cambria, a serif font type, is acceptable for formatting numbers even though it looks a little “busier.” However, the numbers in the column using the Georgia font type do not align well either horizontally (check the bottoms of the numeric characters) or vertically (check the positions of the commas). Therefore, Georgia should be avoided when formatting numeric values.

7. Use highlights to draw attention to specific data points.

As previously mentioned, users need help scanning the data table to locate specific data points such as outliers, maximum or minimum values, or any other noteworthy values. Highlighting key data points is critical, especially in large and complex data tables. Common methods of highlighting specific data include bold font, special superscript symbols (e.g., asterisk), or a different font or background color. Please note that when using color to highlight a value, the best practice is to use a color palette that is friendly to people who might be color blind.
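As a small illustration of the symbol-based approach, the Python sketch below marks the largest value in a column with an asterisk and adds a footnote. The hospital values are hypothetical, and marking the maximum is just one example of a “noteworthy” value; the same pattern could flag a minimum or an outlier.

```python
# Hypothetical values; the asterisk marks the maximum for a footnote.
values = {
    "Hospital A": 12.5,
    "Hospital B": 9.8,
    "Hospital C": 31.4,
    "Hospital D": 10.2,
}

# Name of the row that holds the largest value.
top = max(values, key=values.get)

marked = [
    # Text left-aligned, number right-aligned, marker in its own trailing
    # position so that it does not disturb the alignment of the digits.
    f"{name:<12}{value:>6.1f}{'*' if name == top else ' '}"
    for name, value in values.items()
]

print("\n".join(marked))
print("* highest value in the column")
```

Unlike color, a symbol survives black-and-white printing and does not depend on the reader’s color vision, which can make it a useful complement to a color-blind-friendly palette.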

8. Embed data visualizations in data tables when applicable.

Modern spreadsheet and statistical software allows end users to embed data visualizations within a data table. For example, with Microsoft Excel, users can create and embed sparklines (think of a miniature line or bar graph condensed into one cell) or use conditional formatting to highlight the different numeric values in the table. Table 11, generated using Microsoft Excel, displays a time series sparkline beside each row of data to summarize the annual data trend. Using the conditional formatting function available in Microsoft Excel, we can also create green data bars overlaid on the data table to encode the different amounts within each cell.
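The idea behind a sparkline can also be shown outside a spreadsheet. The Python sketch below builds a text-only approximation from Unicode block characters; it is not how Microsoft Excel draws sparklines, the sparkline function is our own helper, and the regional values are hypothetical.

```python
# Eight block characters of increasing height, lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block heights, scaled to the row's own range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: nothing to scale
    span = hi - lo
    return "".join(
        BARS[round((v - lo) / span * (len(BARS) - 1))] for v in values
    )


# Hypothetical annual values for two rows of a table.
table = {
    "Region 1": [10, 12, 15, 14, 18],
    "Region 2": [30, 28, 25, 27, 22],
}
for name, series in table.items():
    print(f"{name:<10}{sparkline(series)}")
```

Because each row is scaled to its own minimum and maximum, these sparklines show the shape of the trend within a row and should not be used to compare magnitudes across rows; the numbers in the table remain the reference for that.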

In summary, although data tables are not usually regarded as a type of data visualization, they are often requested and made available to show the data underlying the data visualization. Effective formatting and organizing of data in a table are essential for users to scan and comprehend the data quickly and accurately. Understanding the concepts and using the guidelines introduced in this article will help HI professionals effectively organize and present data tables in presentations or reports.


Xiaoming Zeng (xiaoming_zeng@med.unc.edu) is a research professor in the Department of Psychiatry at the University of North Carolina at Chapel Hill.

Katelyn H. Rouse (hardyka16@ecu.edu) is a clinical assistant professor in the Department of Health Services and Information Management at East Carolina University.
