Project Summary

Patient matching (also called record linkage, entity resolution, or data matching) is the process of connecting a patient to their health care records held in disparate health care systems. Patient matching has been identified as a key barrier to achieving interoperability of electronic health care records. The Healthcare Information and Management Systems Society (HIMSS), the Office of the Chief Technology Officer at HHS (HHS IDEA Lab), and the Office of the National Coordinator for Health Information Technology (ONC) have worked on the issue of patient matching for several years. To help bring focus and innovation to the problem, HIMSS decided to sponsor an Innovator-in-Residence (IIR) to work with both the HHS IDEA Lab and ONC to tackle patient matching.

Project Team

Adam Culbertson

Project Details

There are a variety of ways to match patient records. Common methods include deterministic and probabilistic matching. Simple deterministic matching typically relies on string comparisons: attributes in record A are compared to the corresponding attributes in record B, and a match is returned only if all of the attributes agree. Deterministic methods, also described as rules-based or heuristic-based methods, typically have a binary outcome: match or non-match. See table one below for an example.

Table demonstrating Deterministic matching
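The all-or-nothing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular matching product: the field names, the sample records, and the whitespace and case normalization step are all assumptions made for the example.

```python
# Hypothetical field names chosen for illustration only.
FIELDS = ("first_name", "last_name", "date_of_birth")


def normalize(value: str) -> str:
    """Trim whitespace and lowercase (an assumed pre-processing step)."""
    return value.strip().lower()


def deterministic_match(record_a: dict, record_b: dict, fields=FIELDS) -> bool:
    """Return True only if every listed field agrees exactly after normalization."""
    return all(normalize(record_a[f]) == normalize(record_b[f]) for f in fields)


# Made-up sample records.
rec_a = {"first_name": "Jon", "last_name": "Smith", "date_of_birth": "1980-01-02"}
rec_b = {"first_name": "jon ", "last_name": "SMITH", "date_of_birth": "1980-01-02"}
rec_c = {"first_name": "John", "last_name": "Smith", "date_of_birth": "1980-01-02"}

print(deterministic_match(rec_a, rec_b))  # differs only in case and spacing
print(deterministic_match(rec_a, rec_c))  # one extra letter in the first name
```

A single-character difference in one field is enough to produce a non-match here, which is the limitation that probabilistic approaches try to address.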

Probabilistic approaches to matching attempt to account for small differences between data sets by computing a match score for each pair of records and comparing it against thresholds, set by the user, that separate matches, non-matches, and possible matches. In a probabilistic match scenario, we might see something like table two below. If the operator set the match threshold at 0.90, and the records in data set A and data set B produced a match score of 0.93 (calculated with the Jaro string comparator on first name, last name, and date of birth), the pair would be classified as a match.

Table demonstrating Probabilistic matching
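As a rough sketch of how such a score might be computed, the Python below implements the standard Jaro similarity and applies it to first name, last name, and date of birth. The text does not say how the per-field scores are combined, so the simple average used here is an assumption, as are the field names, the sample records, the 0.70 lower threshold, and the choice to treat scores exactly on a threshold as decided.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 (no similarity) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


# Hypothetical field names chosen for illustration only.
FIELDS = ("first_name", "last_name", "date_of_birth")


def record_score(record_a: dict, record_b: dict, fields=FIELDS) -> float:
    """Average the per-field Jaro similarities (the averaging is an assumption)."""
    return sum(jaro(record_a[f].lower(), record_b[f].lower()) for f in fields) / len(fields)


def classify(score: float, upper: float = 0.90, lower: float = 0.70) -> str:
    """Three-way decision; the 0.70 lower threshold is an illustrative value."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


# Made-up sample records.
rec_a = {"first_name": "Jon", "last_name": "Smith", "date_of_birth": "1980-01-02"}
rec_b = {"first_name": "John", "last_name": "Smith", "date_of_birth": "1980-01-02"}

score = record_score(rec_a, rec_b)
print(round(score, 4), classify(score))
```

Unlike an exact comparison, this pair is still classified as a match despite the spelling difference in the first name. Production systems often weight fields differently rather than averaging them equally.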

Probabilistic algorithms need an operator to determine, through experimentation and judgment, where to set the match (upper) and non-match (lower) thresholds. Record pairs whose scores fall between the lower and upper thresholds are undecided and are sent to a queue. These records are typically sent for manual review, where human reviewers can find additional information before applying a set of specific criteria to determine whether the records are a match or a non-match.

To evaluate the effectiveness of patient matching software, there are several key terms to understand. Typically, human reviewers examine the records in duplicate and assign a match, non-match, or undecided status to each. This manually reviewed set of records is then compared to the output of the algorithm.

  • True Positive: The algorithm links two records that represent the same patient.
  • True Negative: The algorithm does not link two records that represent different patients.
  • False Positive: The algorithm links two records that don’t actually match (type I error).
  • False Negative: The algorithm fails to link two records that should be matched (type II error).
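The four outcomes above can be tallied by comparing the algorithm's decisions against the human-reviewed labels. The sketch below is illustrative only: the sample decisions are made up, and precision and recall are shown as two commonly used summary metrics, not as metrics prescribed by this project.

```python
from collections import Counter


def confusion_counts(predicted, actual) -> Counter:
    """Tally TP, FP, FN, TN from parallel lists of booleans (True = match)."""
    counts = Counter()
    for algorithm_says_match, reviewers_say_match in zip(predicted, actual):
        if algorithm_says_match and reviewers_say_match:
            counts["TP"] += 1
        elif algorithm_says_match and not reviewers_say_match:
            counts["FP"] += 1  # type I error
        elif not algorithm_says_match and reviewers_say_match:
            counts["FN"] += 1  # type II error
        else:
            counts["TN"] += 1
    return counts


def precision(counts: Counter) -> float:
    """Share of the algorithm's links that are correct."""
    linked = counts["TP"] + counts["FP"]
    return counts["TP"] / linked if linked else 0.0


def recall(counts: Counter) -> float:
    """Share of the true matches that the algorithm found."""
    true_matches = counts["TP"] + counts["FN"]
    return counts["TP"] / true_matches if true_matches else 0.0


# Made-up decisions for six record pairs.
algorithm = [True, True, False, False, True, False]
reviewers = [True, False, True, False, True, False]

counts = confusion_counts(algorithm, reviewers)
print(dict(counts), precision(counts), recall(counts))
```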

See table three below for a simple example for how two names in two separate data systems would be compared.

Table with examples of patient matching terms

Many patient matching algorithms have been described as black boxes, and for many of them there is little empirical evidence, and there are few well-established methods, for determining how well the matching software performs. Clearer and more transparent performance metrics need to be established to help the field move forward from its current state. Without more transparency and the use of common metrics for algorithm performance, the field of record linkage will have difficulty innovating.

Additional Information


Established in 2012, the Innovator-in-Residence (IIR) Program brings new ideas and expertise into HHS programs through collaboration between the U.S. Department of Health and Human Services (HHS) and private sector not-for-profit organizations. The IIR Program brings together talent from within and outside government to tackle high priority issues in health, health care, and the delivery of human services.