What is a Data Dictionary?
By Michelle Hermann, MS, RHIA
A data dictionary, as defined by the AHIMA Press-published textbook Health Information: Management of a Strategic Resource, is a “super catalog” that provides, for each data field or element, a list of information describing the field, where the data originates, edits or rules that apply to the field, the type and width of the field, a description of codes used (if any), which applications or reports use the data element, and so on. This centralized repository, a collection of databases from a wide variety of sources in an organization that are integrated into one database to permit a singular view of the data, provides meaning, relationships to other data, origin, usage, and formatting. As an information resource management tool, it aligns the organization and removes confusion by providing the necessary metadata to designers, users, and administrators. Data definitions should describe and explain the meaning of each data element clearly and concisely.
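To make the "super catalog" idea concrete, here is a minimal Python sketch of a single data dictionary entry and a central catalog keyed by element name. The attribute names (`origin`, `edit_rules`, `used_by`, and so on), the `register` helper, and the example source system, rules, and report are hypothetical, chosen only to mirror the list of information in the definition above.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One data dictionary entry; attribute names are illustrative, not a standard."""
    name: str         # field or element name
    description: str  # plain-language meaning of the element
    origin: str       # where the data originates
    data_type: str    # e.g. "text", "numeric", "date/time"
    width: int        # maximum field width
    edit_rules: list[str] = field(default_factory=list)  # edits or rules that apply
    codes: dict[str, str] = field(default_factory=dict)  # code -> meaning, if any
    used_by: list[str] = field(default_factory=list)     # applications or reports using it


# Central catalog keyed by element name.
catalog: dict[str, DataElement] = {}


def register(element: DataElement) -> None:
    """Add an element, refusing a second definition under the same name."""
    if element.name in catalog:
        raise ValueError(f"duplicate element name: {element.name}")
    catalog[element.name] = element


register(DataElement(
    name="MRN",
    description="Medical record number to serve as patient's unique identifier",
    origin="patient registration system",          # hypothetical source system
    data_type="numeric",
    width=10,
    edit_rules=["required", "exactly 10 digits"],  # hypothetical rules
    used_by=["daily census report"],               # hypothetical consumer
))
```

Rejecting duplicate names is one simple way to keep each element defined in exactly one place; an actual repository would more likely live in a database or metadata tool than in Python objects.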
Essentially, a data dictionary is a communication tool that helps technical and operational teams meet the daily operational needs of the organization. It enables different systems to transmit and share information through standardized definitions and data mapping. It also serves as a reference for all staff and makes onboarding new staff easier by setting out clearer requirements.
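As a rough illustration of data mapping, the following Python sketch renames fields from two systems to the standard terms defined in a data dictionary, so that a record from either system can be compared or shared. The system names, local field names, `FIELD_MAP`, and `to_standard` are all hypothetical.

```python
# Hypothetical local field names in two systems, each mapped to a standard dictionary term.
FIELD_MAP: dict[str, dict[str, str]] = {
    "registration": {"pt_lname": "Patient Last Name", "pt_fname": "Patient First Name", "mrn_no": "MRN"},
    "billing": {"LAST": "Patient Last Name", "FIRST": "Patient First Name", "MED_REC": "MRN"},
}


def to_standard(system: str, record: dict[str, str]) -> dict[str, str]:
    """Rename a record's local fields to the standard terms defined in the data dictionary."""
    mapping = FIELD_MAP[system]
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        # Refuse to guess: an unmapped field has no agreed definition yet.
        raise KeyError(f"{system} fields missing from the mapping: {unmapped}")
    return {mapping[local]: value for local, value in record.items()}


reg = to_standard("registration", {"pt_lname": "Smith", "pt_fname": "Ana", "mrn_no": "0012345678"})
bill = to_standard("billing", {"LAST": "Smith", "FIRST": "Ana", "MED_REC": "0012345678"})
print(reg == bill)  # True: both systems now share one vocabulary
```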
To begin the development of a data dictionary, organizations should consider the following steps:
- Data Stewards should be assigned in all functional domains/business units; they are essential to the development and standardization of data definitions. All Data Stewards within the organization should begin compiling a list of terms (field/attribute names) in their domains. They can do this by reviewing current systems and applications and outputting a list of the terms each one uses. This ensures the terms are organized by domain across the organization.
- After compiling all terms, the Data Stewards should provide or develop definitions in clear and unambiguous language. The Data Stewards should then meet with the teams to identify common terms and to refine and standardize their definitions.
- Once the terms and definitions are reviewed, vetted, and standardized, they should be integrated into a master list that will be used and published enterprise-wide. Data Stewards from all domains should be involved in determining the final definitions and documenting clear, descriptive terminology.
- All teams will need to sign off on the enterprise-wide data dictionary. This step protects the integrity of the data: leaders complete a final review and formally approve adopting these terms in all areas.
- Then, the data dictionary should be published in a location that is easily accessible to all staff. Training and education should be provided for the workforce.
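Parts of the steps above lend themselves to simple tooling. The following Python sketch, assuming each domain's compiled list is a mapping of term to definition, flags terms that two or more domains define differently and refuses to build the master list until those conflicts are resolved and every domain has signed off. The function names, domain names, and example definitions are hypothetical; in practice, Data Stewards resolve conflicts through discussion, not code.

```python
from collections import defaultdict

Terms = dict[str, dict[str, str]]  # domain -> {term: definition}


def find_conflicts(domain_terms: Terms) -> dict[str, dict[str, str]]:
    """Return terms that different domains define differently, as {term: {domain: definition}}."""
    by_term = defaultdict(dict)  # term -> {domain: definition}
    for domain, terms in domain_terms.items():
        for term, definition in terms.items():
            by_term[term][domain] = definition.strip()
    return {term: defs for term, defs in by_term.items() if len(set(defs.values())) > 1}


def build_master(domain_terms: Terms, signed_off: set[str]) -> dict[str, str]:
    """Merge domain lists into one master list once definitions agree and all teams sign off."""
    conflicts = find_conflicts(domain_terms)
    if conflicts:
        raise ValueError(f"terms still need one standardized definition: {sorted(conflicts)}")
    pending = sorted(set(domain_terms) - signed_off)
    if pending:
        raise RuntimeError(f"awaiting sign-off from: {pending}")
    master: dict[str, str] = {}
    for terms in domain_terms.values():
        for term, definition in terms.items():
            master[term] = definition.strip()
    return dict(sorted(master.items()))


draft = {
    "HIM": {"Encounter": "A patient visit"},  # hypothetical definitions
    "Finance": {"Encounter": "A billable visit", "Payer": "Entity responsible for payment"},
}
print(sorted(find_conflicts(draft)))  # ['Encounter']
draft["Finance"]["Encounter"] = "A patient visit"  # after the stewards standardize it
master = build_master(draft, signed_off={"HIM", "Finance"})
```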
Each entry in a data dictionary typically includes elements such as:
- Field or Attribute Name: a unique identifier used to label each attribute
- Optional/Required: indicates whether the field is optional or required
- Type: defines the type of data in the field, such as text, numeric, or date/time
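The three components listed above can be expressed as a small schema. This Python sketch is one possible reading, not a prescribed format: `FieldSpec`, `check_unique`, `is_valid`, and the "Admit Date" field are hypothetical, and treating numeric values as whole numbers and date/time values as ISO 8601 strings are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class FieldType(Enum):
    TEXT = "text"
    NUMERIC = "numeric"
    DATETIME = "date/time"


@dataclass(frozen=True)
class FieldSpec:
    name: str             # Field or Attribute Name: unique label for the attribute
    required: bool        # Optional/Required
    data_type: FieldType  # Type


def check_unique(specs: list[FieldSpec]) -> None:
    """Raise if two specs share a name, since names must uniquely identify attributes."""
    seen: set[str] = set()
    for spec in specs:
        if spec.name in seen:
            raise ValueError(f"duplicate field name: {spec.name}")
        seen.add(spec.name)


def is_valid(spec: FieldSpec, value: Optional[str]) -> bool:
    """Check one raw string value against its spec."""
    if value is None or value == "":
        return not spec.required           # blank is acceptable only for optional fields
    if spec.data_type is FieldType.NUMERIC:
        return value.isdigit()             # assumption: whole numbers only
    if spec.data_type is FieldType.DATETIME:
        try:
            datetime.fromisoformat(value)  # assumption: ISO 8601 dates/times
        except ValueError:
            return False
        return True
    return True                            # any non-blank string is valid text


admit = FieldSpec("Admit Date", required=True, data_type=FieldType.DATETIME)  # hypothetical
middle = FieldSpec("Middle Initial", required=False, data_type=FieldType.TEXT)
check_unique([admit, middle])
print(is_valid(admit, "2024-01-15"), is_valid(admit, "15/01/2024"), is_valid(middle, ""))
# True False True
```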
Here is an example of a data dictionary:
| Table | Field | Type | Format | Length | Description |
|---|---|---|---|---|---|
| PATIENT | | | | | |
| | Patient Last Name | Text | | 25 characters | Last name |
| | Patient First Name | Text | | 25 characters | First name |
| | Middle Initial | Text | | 1 character | Middle initial |
| | Gender | Drop down | 0 (U), 1 (M), 2 (F) | 1 character | Patient's sex |
| | MRN | Numeric | XXXXXXXXXX | 10 characters | Medical record number to serve as patient's unique identifier |
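The example table can also drive automated edit checks. The following Python sketch encodes its Length, Format, and code values as rules and reports any violations in a patient record. `PATIENT_RULES` and `validate_patient` are hypothetical names, reading the MRN format "XXXXXXXXXX" as ten digits is an interpretation based on its Numeric type, and because the table does not mark fields as optional or required, the sketch checks only length, codes, and format.

```python
import re

# Rules transcribed from the example PATIENT table. Reading "XXXXXXXXXX" as ten digits
# is an interpretation based on the field's Numeric type.
PATIENT_RULES: dict[str, dict] = {
    "Patient Last Name": {"max_len": 25},
    "Patient First Name": {"max_len": 25},
    "Middle Initial": {"max_len": 1},
    "Gender": {"codes": {"0": "U", "1": "M", "2": "F"}},
    "MRN": {"pattern": r"[0-9]{10}"},
}


def validate_patient(record: dict[str, str]) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors: list[str] = []
    for name, rule in PATIENT_RULES.items():
        value = record.get(name, "")  # missing fields are treated as blank strings
        if "max_len" in rule and len(value) > rule["max_len"]:
            errors.append(f"{name}: longer than {rule['max_len']} characters")
        if "codes" in rule and value not in rule["codes"]:
            errors.append(f"{name}: {value!r} is not one of {sorted(rule['codes'])}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{name}: {value!r} does not match the required format")
    return errors


good = {"Patient Last Name": "Smith", "Patient First Name": "Ana",
        "Middle Initial": "J", "Gender": "2", "MRN": "0012345678"}
bad = {"Patient Last Name": "Smith", "Middle Initial": "JK", "Gender": "F", "MRN": "12345"}
print(validate_patient(good))  # []
for problem in validate_patient(bad):
    print(problem)
```

A blank Gender or MRN still fails, because an empty string is neither a listed code nor a ten-digit number.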
As this example shows, a data dictionary provides critical information in a structured format that aligns all users across the organization. Data definitions should also be displayed on reports and dashboards to clearly describe the data and the context in which it is used. A shared dictionary makes it easier for database developers, report writers, and end users to communicate in the same language, so that all users report data consistently and meet the needs of the organization. Data definitions should be addressed as part of an information governance program, which helps streamline data across its lifecycle so that it can be used more strategically and efficiently.
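One way to put definitions on reports and dashboards is to generate the legend directly from the dictionary, so the wording on a report always matches the approved definition. This minimal Python sketch assumes a simple name-to-definition mapping; `report_legend` and `DEFINITIONS` are hypothetical names, and a real reporting tool would have its own way of attaching descriptions.

```python
# Definitions drawn from the example PATIENT table above.
DEFINITIONS = {
    "MRN": "Medical record number to serve as patient's unique identifier",
    "Gender": "Patient's sex",
}


def report_legend(columns: list[str], definitions: dict[str, str]) -> str:
    """Build a footnote-style legend defining every column shown on a report."""
    lines = []
    for column in columns:
        if column not in definitions:
            # Governance check: nothing goes on a report without an approved definition.
            raise KeyError(f"column {column!r} has no data dictionary definition")
        lines.append(f"{column}: {definitions[column]}")
    return "\n".join(lines)


print(report_legend(["MRN", "Gender"], DEFINITIONS))
# MRN: Medical record number to serve as patient's unique identifier
# Gender: Patient's sex
```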
Michelle Hermann is director of health information management at Children’s Health System of Texas.