Comparative Analysis of the VA/Kaiser and NLM CORE Problem Subsets:
An Empirical Study Based on Problem Frequency
Adam Wright, PhD1,2, Joshua Feblowitz, MS1,
Allison B. McCoy, PhD3, Dean F. Sittig, PhD3
1Brigham and Women’s Hospital, Boston, MA; 2Harvard Medical School, Boston, MA;
3University of Texas Health Science Center, Houston, TX
Abstract
The problem list is a critical component of the electronic medical record, with implications for clinical care,
provider communication, clinical decision support, quality measurement and research. However, many of its
benefits depend on the use of coded terminologies. Two standard terminologies (ICD-9 and SNOMED-CT) are
available for problem documentation, and two SNOMED-CT subsets (VA/KP and CORE) are available for
SNOMED-CT users. We set out to examine these subsets, characterize their overlap and measure their coverage.
We applied the subsets to a random sample of 100,000 records from Brigham and Women’s Hospital to determine
the proportion of problems covered. Though CORE is smaller (5,814 terms vs. 17,761 terms for VA/KP), 94.8% of
coded problem entries from BWH were in the CORE subset, while only 84.0% of entries had matches in VA/KP
(p<0.001). Though both subsets had reasonable coverage, CORE was superior in our sample, and had fewer
clinically significant gaps.
Introduction
The problem list, “a compilation of clinically relevant physical and diagnostic concerns, procedures, and
psychosocial and cultural issues that may affect the health status and care of patients” (1), is a critical component of
the problem-oriented medical record (2). However, many of its benefits depend on the use of coded problem
terminologies. Two standard terminologies, the International Statistical Classification of Diseases and Related Health
Problems version 9 (ICD-9) and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT),
are available for problem list documentation, and two subsets of SNOMED-CT (VA/KP and
CORE) are available for SNOMED-CT users. We set out to examine the two subsets, characterize their overlap and
objectively measure their coverage with a goal of making a recommendation to clinical system implementers and
standards developers.
Our analysis builds on prior work that studied SNOMED-CT in its entirety, as well as on the internal
evaluation work done by the subset developers (3-6). This internal validation looked at the subsets’ coverage of the
datasets from which they were derived (6); no validation of the subsets on an external dataset has been
reported to date. We provide such a validation on a sample of actual patient data from Brigham and Women’s
Hospital.
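As an illustration of what measuring coverage by problem frequency means, the following is a minimal sketch, not the procedure actually used in this study. It assumes that each problem list entry has already been mapped to a single concept code and that each subset is available as a set of codes; the function name and the toy codes (C1 to C4) are hypothetical placeholders.

```python
from collections import Counter


def coverage(problem_entries, subset_codes):
    """Fraction of coded problem entries whose code appears in the subset.

    Coverage is weighted by frequency: a code that occurs on many
    problem lists counts once per occurrence, not once per distinct code.
    """
    counts = Counter(problem_entries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for code, n in counts.items() if code in subset_codes)
    return covered / total


# Hypothetical toy data: five problem entries drawn from four distinct codes.
entries = ["C1", "C1", "C2", "C3", "C4"]
smaller_subset = {"C1", "C2", "C3"}    # fewer codes, but the frequent ones
larger_subset = {"C2", "C3", "C4"}     # same size here, misses the frequent code

print(coverage(entries, smaller_subset))
print(coverage(entries, larger_subset))
```

Because entries are weighted by how often they occur, a subset can contain fewer terms and still cover a larger share of real problem entries if it includes the most frequently used concepts.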
Background
Problem lists are used for a variety of functions, including direct clinical care of patients, communication between
healthcare providers (particularly in situations where one provider is covering for another) as well as clinical
decision support (7), quality measurement (8) and clinical research (9).
One critical aspect of many electronic problem lists is structure and coding (1, 3, 10). Most modern (and indeed,
most early) clinical information systems support coding of clinical problems using proprietary code sets,
ICD-9 or SNOMED-CT (3). Advantages of coded problems include greater standardization of problem descriptions and
definitions, greater computability and interoperability and, in many cases, the ability to use existing ontologies for
subsumption and related operations, which makes developing logic more efficient (11, 12).
There is presently some debate over whether SNOMED-CT or ICD-9 is the better standard for representing
problem lists, and both are currently acceptable for ONC-approved data exchange (13) and EHR certification (14).
However, empirical data suggest that SNOMED-CT provides better coverage and clinical granularity (3), and
ICD-9 is currently being phased out as a billing standard in the United States in favor of ICD-10 (15). ICD-10 may have