Dissimilarities in the Logical Modeling of Apparently Similar Concepts
in SNOMED CT
Ankur Agrawal, BE, Gai Elhanan, MD, Michael Halper, PhD
New Jersey Institute of Technology, Newark, NJ
Abstract

Concepts whose terms are of a similar word structure are expected to have similar logical representations. Anecdotal examples from SNOMED CT indicate that this may not always be the case. An investigation into the extent of inconsistent modeling in SNOMED CT hierarchies is carried out. A lexical methodology is used to identify sets of similar concepts. It is applied to one of the most attribute-rich hierarchies, Procedure, from which a random sample of 60 sets is derived. These sets are examined in regard to hierarchical, definitional, attribute, attribute/value, and role-group aspects. Thirty percent of the sample sets were found to have at least one type of modeling inconsistency. Their presence may interfere with the performance of terminology-driven applications. With the use of SNOMED expanding, such inconsistencies may eventually affect clinical care. Due to this, external auditing should be encouraged to identify such issues and complement IHTSDO’s efforts.

Introduction

SNOMED CT (SCT) is a comprehensive and complex terminology [1]. In the past decade, SCT gained recognition as a premier clinical terminology through endorsements from many national and international organizations. The most commonly perceived use of terminologies such as SCT is the encoding of clinical data within electronic medical systems including Electronic Health Records (EHRs) and Clinical Information Systems (CISs). Encoding by standards like SCT is essential for sharable and transferrable medical data. However, SCT offers significantly more than the ability to search for medical terms and translate them into codes. It is built upon description logic (DL) principles [2], with each concept being defined by its hierarchical (IS-A) and lateral (attribute) relationships to other concepts in the terminology.

From a clinical perspective, particularly from the point of view of human clinicians, the presentation format of concepts in the form of terms (e.g., fully-specified names and preferred names) is often of primary concern. On the other hand, computer programs, particularly those performing some kind of reasoning, are built around the concepts’ DL formulations. One would expect that these two perspectives be highly consistent. In particular, terms exhibiting a similar word structure should have underlying DL modeling that is analogous in structure, too.

Past research (e.g., [3]) has identified instances of inconsistent modeling in SCT. Such inconsistencies may be perceived to have minimal implications regarding clinical coding. However, inconsistencies may significantly affect the performance of reasoners and inference generation (e.g., in the context of error detection and decision support) as these explicitly rely on the completeness and consistency of formal definitions. Therefore, in this paper we analyze the conceptual representation of sets of concepts similar at the term-level in an attempt to characterize the consistency of the modeling across these concepts. Sets of concepts with similar terms are gathered through standard lexical techniques. Such an analysis is performed on SCT’s Procedure hierarchy, one with a rich collection of attributes.

Background

In SNOMED CT, the logical modeling of concepts should have a consistency commensurate with that of the concepts’ presentations in the form of terms. SCT’s concepts are defined in the context of DL through their relationships to other concepts. The relationships come in two varieties: subsumption (IS-A) and attribute relationship. Each concept, except for SCT’s overall root, has at least one IS-A to another concept called its parent. Collectively, the IS-As impose a hierarchical structure on SCT, with general concepts at the top and more specific concepts below. SCT concepts are organized into a collection of 19 top-level hierarchies, including Clinical finding, Procedure, Body Structure, etc.

Attribute relationships (simply “attributes” from here on) capture the “lateral” knowledge of SCT. For example, the attribute finding site connects Fracture of tarsal bone to Bone structure of tarsus, conveying the knowledge that a fracture of the tarsal bone involves that particular bone structure and not another. SCT’s technical documentation outlines well-defined rules for the domains and ranges of its defining attributes and attribute values [4]. Each concept is further classified by its status of logical definition: fully-defined vs. primitive. A primitive concept is underspecified in the sense of not having enough attributes to distinguish it from its parents
AMIA 2010 Symposium Proceedings Page - 212