October 1, 2020

Exploring Strategies to Fill Gaps in Medicaid Race, Ethnicity, and Language Data

Elizabeth Lukanen and Emily Zylla, SHADAC

As states seek to address the social determinants of health and advance health equity, they face longstanding and persistent challenges in collecting complete, accurate, and consistent race, ethnicity, and language (REL) data. This expert perspective provides an overview of current REL data collection standards; offers ideas for increasing data completeness by engaging enrollees and enrollment assisters and by modifying enrollment and renewal interfaces; and suggests how states could leverage alternative data sources to improve REL data completeness.


Collecting standardized patient demographic and language data across health care systems is an important first step toward improving population health, identifying health disparities, and addressing health equity and social risk factors. As the COVID-19 pandemic has shown, for example, disaggregated demographic data are critical to identifying problems, implementing solutions, and evaluating impacts. To date, the share of unknown race data varies across state Medicaid programs. The Centers for Medicare & Medicaid Services (CMS) recently classified 14 states’ Medicaid race and ethnicity data as “high concern” because more than 20 percent of the data were missing. Five additional states, with more than 50 percent of their data missing, were classified as “unusable.”[1] A review of the literature confirms that the main deficiency in REL data has shifted over time from accuracy to completeness. The most commonly cited REL data collection challenges include:

  • Lack of mandatory reporting standards
  • Rapidly changing demographics
  • Evolution in how people self-identify
  • Voluntary reporting
  • Lack of understanding of why the data are important
  • Mistrust and enrollee concerns about how data will be used
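
The two thresholds cited above lend themselves to a simple completeness check. The Python sketch below is illustrative only and is not CMS’s DQ Atlas methodology, which may weigh other factors. It assumes a flat list of one race/ethnicity value per enrollee, treats None and the hypothetical codes in MISSING_CODES as missing, and applies only the more-than-20-percent and more-than-50-percent cutoffs.

```python
from typing import Iterable, Optional

# Hypothetical codes treated as "missing"; a state's file may use different ones.
MISSING_CODES = {"", "UNKNOWN", "DECLINED"}


def missing_share(race_values: Iterable[Optional[str]]) -> float:
    """Return the fraction of enrollee records with no usable race/ethnicity value."""
    values = list(race_values)
    if not values:
        raise ValueError("no records supplied")
    missing = sum(
        1 for v in values if v is None or v.strip().upper() in MISSING_CODES
    )
    return missing / len(values)


def rate_completeness(share: float) -> str:
    """Simplified rating that applies only the two cutoffs cited in the text."""
    if share > 0.50:
        return "unusable"
    if share > 0.20:
        return "high concern"
    return "below high-concern cutoff"


records = ["White", None, "Asian", "Unknown", "Black or African American"]
share = missing_share(records)
print(f"{share:.0%} missing -> {rate_completeness(share)}")  # -> 40% missing -> high concern
```

Whether a response such as “declined” should count as missing is itself a policy choice; the sketch counts it as missing only for illustration.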

Current Standards for Collecting Race and Ethnicity Data

There is a robust body of research, guidance, and tools related to the collection of REL data in health care settings. The Department of Health and Human Services (HHS) recommends (but does not require) using the minimum standards established in 1997 by the US Office of Management and Budget (OMB) (Figure 1). HHS issued updated data collection standards after passage of the Affordable Care Act (ACA).

HHS also recommends collecting more granular data that reflect the population of interest and locally relevant categories, with the following considerations:

  • Make sure the categories can be aggregated back to the OMB minimum categories (see Figure 2)
  • Consider analyzing Census data to select these options
  • Rely on established coding systems, such as the CDC Race and Ethnicity Code Set
  • Although “Other” is not an official OMB designation, the US Census Bureau and OMB also recommend including a “some other race” option

The number of response categories a state offers will often depend on the format and space available (e.g., web vs. paper) and on whether the state has the staffing capacity to recode write-in responses. States are encouraged to report additional granularity where sample sizes support it, as long as the additional detail can be aggregated back to the minimum standard set of race and ethnicity categories.
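
To illustrate the aggregation requirement, the Python sketch below rolls detailed race responses up to the OMB minimum race categories. The crosswalk is a small hypothetical excerpt, not the CDC code set; reporting multiple races as “Two or more races” is one common convention rather than a requirement stated here; and any unrecognized write-in is flagged for the manual recoding described above. It handles race only; Hispanic or Latino ethnicity would be rolled up separately in the same way.

```python
from typing import Dict, Iterable

# OMB minimum race categories (1997 standards).
OMB_RACES = {
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
}

# Hypothetical excerpt of a detailed-to-minimum crosswalk. A real crosswalk
# would come from an established code set rather than a hand-written dict.
DETAILED_TO_OMB: Dict[str, str] = {
    "Navajo": "American Indian or Alaska Native",
    "Hmong": "Asian",
    "Vietnamese": "Asian",
    "Somali": "Black or African American",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
    "Lebanese": "White",  # MENA responses currently roll up to White
}


def to_omb_race(responses: Iterable[str]) -> str:
    """Collapse one person's 'mark all that apply' race responses to one reporting category."""
    categories = set()
    for response in responses:
        if response in OMB_RACES:
            categories.add(response)
        elif response in DETAILED_TO_OMB:
            categories.add(DETAILED_TO_OMB[response])
        else:
            # Unrecognized write-ins need manual recoding before they can be counted.
            return "Needs recoding"
    if not categories:
        return "Missing"
    if len(categories) == 1:
        return categories.pop()
    return "Two or more races"


print(to_omb_race(["Hmong", "Vietnamese"]))  # -> Asian
print(to_omb_race(["Somali", "Lebanese"]))   # -> Two or more races
```

Because every detailed group maps to exactly one minimum category, counts can always be reported at the detailed level where sample sizes allow and at the OMB level everywhere else.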

Race and Ethnicity: New Census Research

In preparation for the 2020 Census, the Census Bureau conducted research to improve race and ethnicity data collection.[2] The goals of the project were to increase the accuracy and reliability of reporting in the major OMB racial and ethnic categories, collect detailed data for myriad groups, and lower item nonresponse rates. The research had several major findings, including:

  • Reinforcing the importance of allowing multiple responses
  • Suggesting that “Mark all that apply” or “Select all that apply” is better than “Select one or more”
  • Finding that race/ethnicity terminology is less confusing than terms like “category,” which can suggest a hierarchy
  • Finding that data collection improves when there is a dedicated “Middle Eastern or North African” (MENA) response category for race (MENA responses are currently classified as “White”)
  • Finding that data collection improves when a “write-in line” is used to collect detailed American Indian and Alaska Native (AIAN) responses, whereas including a limited number of conceptual checkboxes (i.e., American Indian, Alaska Native, and Central/South American Indian) decreased detailed reporting for the AIAN category. According to Census Bureau research, there are hundreds of very small detailed AIAN tribes, villages, and indigenous groups. Listing the six largest American Indian groups and Alaska Native groups as checkboxes, for example, would represent only about 10 percent of the entire AIAN population. Therefore, providing a distinct write-in area was determined to be the best overall approach for eliciting detailed responses across AIAN communities and identities.

Overall, the research found evidence that providing a combined race/ethnicity question with detailed checkboxes resulted in increased use of OMB standard categories, decreased nonresponse, and improved accuracy.

Strategies for Improving and Revising Collection Standards

Research continues to inform REL data collection efforts, and practices continue to evolve. Data collection standards change in response to research, but also in response to rapidly changing demographics and the fluidity of racial and ethnic self-identification. States should periodically review their data collection practices, as well as the overall demographics of the state’s population, to ensure REL response options remain appropriate, but should modify collection practices only when needed. If modifications are needed, make sure changes do not compromise the ability to compare or link data to other sources (e.g., other state agency data, provider data). If possible, engage enrollees and their advocates in decisions about how REL data are collected, and seek their feedback on any changes that are made.

The following suggestions are strategies that states could employ to help increase REL data completeness in their Medicaid programs.

Engage Enrollees

Some Medicaid enrollees may be sensitive about answering questions about their race/ethnicity or preferred language because of concerns about discrimination. Consider the following steps to help enrollees overcome that concern:

  • Conduct outreach to ask the enrollee community, and community leaders, for their input about how to best collect this information
    • Proactively seek advice on why data are not being provided and how the state can improve data collection
    • Collect input from stakeholders on how data are collected, defined, and used (e.g., A/B user testing, testing which educational messages work)
    • Report findings back to stakeholders when data are available
  • Develop a communication strategy focused on the importance of REL data collection, addressing concerns about collecting these data, and providing enrollees with information specifically on:
    • How are REL data used and NOT used? (e.g., Can the information be used to enforce immigration laws?)
    • The state’s privacy policy
    • Who has access to these data and how these data are protected
    • Where results are reported

Engage Enrollment Assisters

Enrollment assisters (term used broadly) play a large role in application submission and renewal. In many cases, assisters serve as trusted sources of information and are well positioned to talk to enrollees about providing REL data. Like enrollees, however, assisters may not know why REL data are being collected or how the data are used, and they might skip questions to save time. Assisters pride themselves on being neutral and may not be comfortable making recommendations to clients about providing data. Consider the following steps to help enrollment assisters collect complete and accurate REL data:

  • Develop trainings, role playing, talking points, and fact sheets that address key questions about REL data collection broadly, but also provide specific tips regarding client engagement. (Training and tools should be framed as information to educate their clients, not coerce them.) Trainings can address:
    • Where in the application/renewal form or online process are REL questions asked and how can discussions about the importance of these questions be incorporated into assister workflows?
    • Where can the assister send clients who have questions or complaints about response options?
  • For assisters where Medicaid has a direct or contractual relationship (call center employees, contractors, grantees):
    • Require training on the topic
    • Make enrollee discussions about REL data part of the formalized workflow
  • For the broader assister community:
    • Communicate with umbrella organizations that support smaller community-based organizations as a way to build trust and spread the message

Modify the Enrollment/Renewal Interface

Since the implementation of the ACA, states have made significant progress in streamlining and simplifying their enrollment processes and systems. As the ACA requires, all states must now accept Medicaid applications through multiple modes, including in person, by mail, by telephone, and online. The following strategies could help states increase the collection of REL data through their online enrollment and renewal platforms.

  • In online forms, include text (or links to text) explaining the importance of collecting REL data and how the information will be used
    • Use hover text with links
    • Provide reminder prompts: “Are you sure you want to skip this question?”
  • Use research-tested messages to explain the value of collecting race, ethnicity, and language data (Figure 3)

  • Give enrollees additional opportunities to provide REL data when those data are missing
    • Prompt near the end of the application
    • Prompt when the enrollee logs in to their account (e.g., to re-enroll or report changes)
    • Send an email reminder to complete this information a short time after enrollment or renewal
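
The email-reminder idea above amounts to a simple selection rule. The Python sketch below assumes a hypothetical enrollee record with race, ethnicity, and language fields, a hypothetical seven-day delay (delay_days), and a flag recording whether a reminder was already sent; it returns the IDs to contact and leaves sending the email to the state’s existing systems.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Enrollee:
    # Hypothetical record layout; real eligibility systems will differ.
    enrollee_id: str
    race: Optional[str]
    ethnicity: Optional[str]
    language: Optional[str]
    last_enrollment_event: date  # most recent enrollment or renewal
    reminder_sent: bool = False

    def missing_rel_fields(self) -> List[str]:
        """Names of REL fields with no recorded answer."""
        fields = {"race": self.race, "ethnicity": self.ethnicity, "language": self.language}
        return [name for name, value in fields.items() if not value]


def reminder_list(enrollees: Iterable[Enrollee], today: date, delay_days: int = 7) -> List[str]:
    """IDs of enrollees to email: some REL data missing, no prior reminder,
    and at least `delay_days` since their last enrollment or renewal."""
    selected = []
    for e in enrollees:
        days_since = (today - e.last_enrollment_event).days
        if e.missing_rel_fields() and not e.reminder_sent and days_since >= delay_days:
            selected.append(e.enrollee_id)
    return selected


people = [
    Enrollee("A", None, "Not Hispanic or Latino", "English", date(2020, 9, 1)),
    Enrollee("B", "Asian", "Not Hispanic or Latino", "Vietnamese", date(2020, 9, 1)),
    Enrollee("C", None, None, None, date(2020, 9, 28)),  # enrolled too recently
    Enrollee("D", None, None, "Spanish", date(2020, 9, 1), reminder_sent=True),
]
print(reminder_list(people, today=date(2020, 10, 1)))  # -> ['A']
```

Recording which fields are missing, rather than a single yes/no flag, would let a reminder ask only for the specific items still needed.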

Leverage Alternative Data Sources

Another potential strategy is to use other data sources to fill gaps in, or validate, Medicaid data. This strategy works best when the other data source is more complete (e.g., Minnesota has robust provider quality reporting standards that include REL data collection). States could compare Medicaid data against other data sources to look for systematic gaps (e.g., implausibly low rates of certain races/ethnicities). In some cases, states collect data at several points in time, such as at eligibility determination, at enrollment into a health plan, and at the time of treatment. These data can be used together, both to validate one another and to identify the strongest and most complete data source. Potential data sources might include:

  • Encounter data collected directly or from managed care organizations
  • Data from health and social needs assessments
  • Vital records data
  • Data from other state agencies
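
One way to look for the systematic gaps mentioned above is to compare category shares in Medicaid data with those in a comparison source. In this Python sketch, the 0.5 ratio cutoff (ratio_threshold) and the example counts are hypothetical, and the comparison source is assumed to describe a comparable population; a flagged category is a prompt for review, not proof of undercounting.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def category_shares(values: Iterable[Optional[str]]) -> Dict[str, float]:
    """Share of each category among records with a known value."""
    counts = Counter(v for v in values if v)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def flag_low_categories(
    medicaid_values: Iterable[Optional[str]],
    comparison_values: Iterable[Optional[str]],
    ratio_threshold: float = 0.5,
) -> List[str]:
    """Categories whose Medicaid share is below `ratio_threshold` times their
    share in the comparison source, a possible sign of systematic gaps."""
    medicaid = category_shares(medicaid_values)
    comparison = category_shares(comparison_values)
    return sorted(
        cat for cat, share in comparison.items()
        if medicaid.get(cat, 0.0) < ratio_threshold * share
    )


medicaid = ["White"] * 8 + ["Asian"] + [None] * 3
vital_records = ["White"] * 6 + ["Asian"] * 3 + ["Black or African American"]
print(flag_low_categories(medicaid, vital_records))
# -> ['Asian', 'Black or African American']
```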

Leveraging other data is not without challenges and might be best approached initially as a pilot initiative. For example, data use agreements take time to execute, and data linking can face technical challenges and take up scarce analyst time. Also, data from other sources often have similar quality and completeness problems, or different problems of their own.
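
The gap-filling and validation ideas in this section can be sketched as a priority rule. The Python example below assumes records have already been linked on a shared enrollee ID (in practice this is where data use agreements and linkage work come in) and uses a hypothetical priority order of sources. It keeps the Medicaid value when present, otherwise borrows the first available value from another source, records where each value came from, and lists conflicts for review rather than overwriting anything.

```python
from typing import Dict, List, Optional, Tuple

RaceBySource = Dict[str, Optional[str]]  # enrollee_id -> race (None if missing)


def fill_race(
    medicaid: RaceBySource,
    alternates: List[Tuple[str, RaceBySource]],
) -> Dict[str, Tuple[Optional[str], str]]:
    """Keep the Medicaid value if present; otherwise take the first non-missing
    value from `alternates` (in priority order) and record which source it came from."""
    filled = {}
    for enrollee_id, race in medicaid.items():
        if race:
            filled[enrollee_id] = (race, "medicaid")
            continue
        for source_name, source in alternates:
            value = source.get(enrollee_id)
            if value:
                filled[enrollee_id] = (value, source_name)
                break
        else:  # no alternate source had a value
            filled[enrollee_id] = (None, "missing")
    return filled


def disagreements(medicaid: RaceBySource, other: RaceBySource) -> List[str]:
    """IDs where both sources have a value and the values differ (for validation)."""
    return sorted(
        eid for eid, race in medicaid.items()
        if race and other.get(eid) and other[eid] != race
    )


medicaid = {"1": "Asian", "2": None, "3": None, "4": None}
encounters = {"2": "White", "3": None}
vital_records = {"1": "White", "3": "Black or African American"}

result = fill_race(medicaid, [("encounter", encounters), ("vital_records", vital_records)])
for eid, (race, source) in result.items():
    print(eid, race, source)
print(disagreements(medicaid, vital_records))  # -> ['1']
```

Keeping the source label alongside each value lets analysts report self-reported and borrowed data separately.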

[1] Medicaid.gov. (n.d.). DQ Atlas: Race and Ethnicity [2016 data set]. Available from https://www.medicaid.gov/dq-atlas/landing/topics/single/map?topic=g3m16.

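[2] U.S. Census Bureau. 2015 National Content Test: Race and Ethnicity Analysis Report. Available from https://assets.documentcloud.org/documents/4316468/2015nct-Race-Ethnicity-Analysis.pdf.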