US20230360745A1 - Code Point Resolution Using Natural Language Processing and Metathesaurus
- Publication number
- US20230360745A1 US18/223,082
- Authority
- US
- United States
- Prior art keywords
- medical
- csmtcs
- code point
- text
- rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/174—Form filling; Merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/20—ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/60—ICT specially adapted for the handling or processing of medical references relating to pathologies
Definitions
- SMTCs: standardized medical terminology codes
- UMLS: Unified Medical Language System
- Concepts or meanings represented by MCEs that are found within the unstructured medical text may be detected using the vocabularies defined by the metathesaurus. The detected concepts may then be mapped to SMTCs using mappings in the metathesaurus and relevant MCEs.
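The detect-then-map step described above can be sketched as a simple dictionary lookup. This is only an illustrative sketch, not the patent's implementation: the vocabulary, the `MCE-…` and `SMTC-…` identifiers, and the function names are all hypothetical placeholders, and a real metathesaurus lookup would be far richer than substring matching.

```python
from typing import Dict, List, Tuple

# Hypothetical toy metathesaurus (placeholder identifiers, not real codes):
# surface phrase -> concept entity (MCE), and MCE -> candidate SMTCs.
VOCAB: Dict[str, str] = {
    "nephrectomy": "MCE-1",
    "radiation therapy": "MCE-2",
}
MCE_TO_SMTCS: Dict[str, List[str]] = {
    "MCE-1": ["SMTC-100", "SMTC-101"],
    "MCE-2": ["SMTC-200"],
}


def detect_concepts(text: str) -> List[Tuple[str, str]]:
    """Return (phrase, MCE id) pairs for vocabulary phrases found in the text."""
    lowered = text.lower()
    return [(phrase, mce) for phrase, mce in VOCAB.items() if phrase in lowered]


def map_to_smtcs(text: str) -> Dict[str, List[str]]:
    """Map each detected phrase to the SMTCs linked to its MCE."""
    return {
        phrase: MCE_TO_SMTCS.get(mce, [])
        for phrase, mce in detect_concepts(text)
    }


if __name__ == "__main__":
    print(map_to_smtcs("Patient underwent bilateral nephrectomy."))
```

Note that one phrase can yield several SMTCs here, which is the one-to-many situation that the resolution step is meant to settle.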
- a method for exchanging medical information with a medical management system comprises receiving, using a processor of a code point resolver, from the medical management system, medical text via a network interface.
- a code point is a single standardized medical terminology code (SMTC) that corresponds to a medical concept contained within the medical text.
- the method further applies rule-based logic to process the medical text to form a localized mapping of a text portion of the medical text to a plurality of candidate SMTCs (CSMTCs) that are related to at least one metathesaurus concept entity (MCE) in a metathesaurus, and determines the code point from the CSMTCs.
- the method transmits, via the network interface, to the medical management system, the code point.
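The receive, map, resolve and transmit steps above might be arranged as follows. This is a minimal sketch under stated assumptions: the network interface is left out, and `LocalizedMapping`, the two rules, and `handle_request` are hypothetical names chosen for illustration; the source does not specify which rules the rule-based logic contains.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LocalizedMapping:
    """A text portion and the candidate SMTCs (CSMTCs) mapped to it."""
    text_portion: str
    candidates: List[str]


# A rule inspects a mapping and either picks a code point or abstains (None).
Rule = Callable[[LocalizedMapping], Optional[str]]


def single_candidate_rule(mapping: LocalizedMapping) -> Optional[str]:
    """Hypothetical rule: if there is exactly one candidate, it is the code point."""
    return mapping.candidates[0] if len(mapping.candidates) == 1 else None


def first_candidate_rule(mapping: LocalizedMapping) -> Optional[str]:
    """Hypothetical fallback rule: take the first candidate, if any."""
    return mapping.candidates[0] if mapping.candidates else None


def handle_request(
    medical_text: str,
    build_mapping: Callable[[str], LocalizedMapping],
    rules: List[Rule],
) -> Optional[str]:
    """Form the localized mapping, then apply rules in order until one decides."""
    mapping = build_mapping(medical_text)
    for rule in rules:
        code_point = rule(mapping)
        if code_point is not None:
            return code_point  # this value would be sent back to the caller
    return None
```

In a deployment, `handle_request` would sit behind whatever transport the medical management system uses; ordering the rules from most to least specific keeps the fallback from masking a better decision.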
- a code point resolver comprises a memory and a processor.
- the code point resolver is configured to receive, using the processor, medical text from the medical management system via a network interface.
- the code point resolver applies rule-based logic to process the medical text to form a localized mapping of a text portion of the medical text to a plurality of candidate SMTCs (CSMTCs) that are related to at least one metathesaurus concept entity (MCE) in a metathesaurus.
- the code point resolver determines the code point from the CSMTCs, and transmits, via the network interface, to the medical management system, the code point.
- embodiments may take the form of a related computer program product, accessible from a computer-usable or computer-readable medium providing program code for use by, or in connection with, a computer or any instruction execution system.
- a computer-usable or computer-readable medium may be any apparatus that may contain a mechanism for storing, communicating, propagating or transporting the program for use by, or in connection with, the instruction execution system, apparatus, or device.
- FIG. 1 A is a block diagram of a data processing system (DPS) according to one or more embodiments disclosed herein.
- FIG. 1 B is a pictorial diagram that depicts a cloud computing environment according to an embodiment disclosed herein.
- FIG. 1 C is a pictorial diagram that depicts abstraction model layers according to an embodiment disclosed herein.
- FIG. 2 A is a block diagram of a concept tree (alternately, “surface form”) that illustrates an instance in which the text portion maps to a single CUI having multiple CSMTCs (e.g., SNOMED codes), according to some embodiments.
- FIG. 2 B is a block diagram that illustrates multiple CUIs due to ambiguity applying to the covered text, according to some embodiments.
- FIG. 3 A is a block diagram of a concept tree that combines multiple ideas, which is another reason that multiple concepts may exist, according to some embodiments.
- FIG. 3 B is a block diagram that illustrates a concept tree for an example of increased complexity, according to some embodiments.
- FIGS. 4 A- 4 C are block diagrams illustrating process flows for associating a best-fit SMTC with a respective event, according to some embodiments.
- FIG. 5 is a block diagram illustrating a system within which the code point resolver may operate, according to some embodiments.
- FIG. 6 is a flowchart illustrating a process that may be used with the code point resolver, according to some embodiments.
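The concept trees of FIGS. 2 A through 3 B are not reproduced here, but the relationships they are described as showing (covered text mapping to one or more CUIs, each with one or more CSMTCs) could be modeled roughly as below. The class and field names and the placeholder identifiers are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptNode:
    """One concept (CUI) together with its candidate SMTCs."""
    cui: str
    csmtcs: List[str] = field(default_factory=list)


@dataclass
class ConceptTree:
    """Covered text at the root, with the concepts that may apply to it."""
    covered_text: str
    concepts: List[ConceptNode] = field(default_factory=list)

    def is_ambiguous(self) -> bool:
        """True when more than one CUI applies to the covered text."""
        return len(self.concepts) > 1

    def candidates(self) -> List[str]:
        """All CSMTCs under the tree, de-duplicated, in first-seen order."""
        seen: List[str] = []
        for node in self.concepts:
            for code in node.csmtcs:
                if code not in seen:
                    seen.append(code)
        return seen


if __name__ == "__main__":
    # Single CUI with several CSMTCs (the case described for FIG. 2 A).
    tree = ConceptTree("nephrectomy", [ConceptNode("CUI-1", ["SMTC-100", "SMTC-101"])])
    print(tree.is_ambiguous(), tree.candidates())
```

Either shape, one CUI with many codes or many CUIs, leaves more than one candidate, so both feed the same resolution step.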
- FIG. 1 A is a block diagram of an example DPS according to one or more embodiments.
- the DPS 10 may include communications bus 12 , which may provide communications between a processor unit 14 , a memory 16 , persistent storage 18 , a communications unit 20 , an I/O unit 22 , and a display 24 .
- the processor unit 14 serves to execute instructions for software that may be loaded into the memory 16 .
- the processor unit 14 may be a number of processors, a multi-core processor, or some other type of processor, depending on the particular implementation.
- a number, as used herein with reference to an item, means one or more items.
- the processor unit 14 may be implemented using a number of heterogeneous processor systems in which a main processor is present with secondary processors on a single chip.
- the processor unit 14 may be a symmetric multi-processor system containing multiple processors of the same type.
- the memory 16 and persistent storage 18 are examples of storage devices 26 .
- a storage device may be any piece of hardware that is capable of storing information, such as, for example without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis.
- the memory 16 in these examples, may be, for example, a random access memory or any other suitable volatile or non-volatile storage device.
- the persistent storage 18 may take various forms depending on the particular implementation.
- the persistent storage 18 may contain one or more components or devices.
- the persistent storage 18 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above.
- the media used by the persistent storage 18 also may be removable.
- a removable hard drive may be used for the persistent storage 18 .
- the communications unit 20 in these examples may provide for communications with other DPSs or devices.
- the communications unit 20 is a network interface card.
- the communications unit 20 may provide communications through the use of either or both physical and wireless communications links.
- the input/output unit 22 may allow for input and output of data with other devices that may be connected to the DPS 10 .
- the input/output unit 22 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, the input/output unit 22 may send output to a printer.
- the display 24 may provide a mechanism to display information to a user.
- Instructions for the operating system, applications and/or programs may be located in the storage devices 26 , which are in communication with the processor unit 14 through the communications bus 12 .
- the instructions are in a functional form on the persistent storage 18 .
- These instructions may be loaded into the memory 16 for execution by the processor unit 14 .
- the processes of the different embodiments may be performed by the processor unit 14 using computer implemented instructions, which may be located in a memory, such as the memory 16 .
- These instructions are referred to as program code 38 (described below), computer usable program code, or computer readable program code that may be read and executed by a processor in the processor unit 14 .
- the program code in the different embodiments may be embodied on different physical or tangible computer readable media, such as the memory 16 or the persistent storage 18 .
- the DPS 10 may further comprise an interface for a network 29 .
- the interface may include hardware, drivers, software, and the like to allow communications over wired and wireless networks 29 and may implement any number of communication protocols, including those, for example, at various levels of the Open Systems Interconnection (OSI) seven layer model.
- FIG. 1 A further illustrates a computer program product 30 that may contain the program code 38 .
- the program code 38 may be located in a functional form on the computer readable media 32 that is selectively removable and may be loaded onto or transferred to the DPS 10 for execution by the processor unit 14 .
- the program code 38 and computer readable media 32 may form a computer program product 30 in these examples.
- the computer readable media 32 may be computer readable storage media 34 or computer readable signal media 36 .
- Computer readable storage media 34 may include, for example, an optical or magnetic disk that is inserted or placed into a drive or other device that is part of the persistent storage 18 for transfer onto a storage device, such as a hard drive, that is part of the persistent storage 18 .
- the computer readable storage media 34 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory, that is connected to the DPS 10 . In some instances, the computer readable storage media 34 may not be removable from the DPS 10 .
- the program code 38 may be transferred to the DPS 10 using the computer readable signal media 36 .
- the computer readable signal media 36 may be, for example, a propagated data signal containing the program code 38 .
- the computer readable signal media 36 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications links, such as wireless communications links, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications link.
- the communications link and/or the connection may be physical or wireless in the illustrative examples.
- the program code 38 may be downloaded over a network to the persistent storage 18 from another device or DPS through the computer readable signal media 36 for use within the DPS 10 .
- program code stored in a computer readable storage medium in a server DPS may be downloaded over a network from the server to the DPS 10 .
- the DPS providing the program code 38 may be a server computer, a client computer, or some other device capable of storing and transmitting the program code 38 .
- the different components illustrated for the DPS 10 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented.
- the different illustrative embodiments may be implemented in a DPS including components in addition to or in place of those illustrated for the DPS 10 .
- Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
- This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
- On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
- Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
- Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
- Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
- Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure.
- the applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based e-mail).
- the consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
- Platform as a Service (PaaS)
- the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
- Infrastructure as a Service (IaaS)
- the consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
- Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
- Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
- Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
- a cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability.
- At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.
- cloud computing environment 52 includes one or more cloud computing nodes 50 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54 A, desktop computer 54 B, laptop computer 54 C, and/or automobile computer system 54 N may communicate.
- Nodes 50 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
- This allows cloud computing environment 52 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
- The types of computing devices 54 A-N shown in FIG. 1 B are intended to be illustrative only; computing nodes 50 and cloud computing environment 52 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
- Referring to FIG. 1 C, a set of functional abstraction layers provided by cloud computing environment 52 ( FIG. 1 B ) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 1 C are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided:
- Hardware and software layer 60 includes hardware and software components.
- hardware components include: mainframes 61 ; RISC (Reduced Instruction Set Computer) architecture based servers 62 ; servers 63 ; blade servers 64 ; storage devices 65 ; and networks and networking components 66 .
- software components include network application server software 67 and database software 68 .
- Virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71 ; virtual storage 72 ; virtual networks 73 , including virtual private networks; virtual applications and operating systems 74 ; and virtual clients 75 .
- management layer 80 may provide the functions described below.
- Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment.
- Metering and Pricing 82 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses.
- Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources.
- User portal 83 provides access to the cloud computing environment for consumers and system administrators.
- Service level management 84 provides cloud computing resource allocation and management such that required service levels are met.
- Service Level Agreement (SLA) planning and fulfillment 85 provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
- Workloads layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 91 ; software development and lifecycle management 92 ; virtual classroom education delivery 93 ; data analytics processing 94 ; transaction processing 95 ; and application processing 96 .
- Any of the nodes 50 in the computing environment 52 as well as the computing devices 54 A-N may be a DPS 10 .
- the present invention may be a system, a method, and/or a computer readable medium at any possible technical detail level of integration.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the Figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- the one or more embodiments disclosed herein accordingly provide an improvement to computer technology.
- an improvement to a computer database comprising medical information allows for a more efficient and effective resolution of ambiguity that may exist within the database.
- MCEs and SMTCs are known in the medical industry. However, there may be instances in which multiple concepts and related MCEs are present in a span of medical text, or instances in which a single concept may map to multiple SMTCs (a one-to-many relationship). Having multiple concepts and related MCEs or multiple SMTCs may result in a lack of clarity or produce ambiguity, and thus, it may be desirable to eliminate the multiplicity and provide a single MCE and/or SMTC.
- The process of ultimately producing a single SMTC from a span of medical text is referred to herein as “code point resolution”.
- Code point resolution means selecting a single SMTC from among multiple candidate SMTCs (CSMTCs) for a given portion of medical text that is most appropriate to a particular application that will use the information, or domain for the solution.
- the system would consider codes related to “radiation therapy” (treatment) before considering “ionizing radiation” (physical force), which would not be associated with clinical notes very often.
- Code point resolution by the code point resolver may be performed by considering concepts or associated MCEs that are a best fit for an application, and mapping an individual MCE(s) to a single code point.
- the code point resolver may use other information, such as clinical notes and/or structured data to disambiguate. “Medical text”, “clinical notes”, and “other information” may, in some cases, have a similar form, but may constitute separate documents or be delineated in some manner, such as having different origins or being part of a separate entry and the like.
- “Clinical notes”, as defined herein, refers to a wide variety of documents generated on behalf of a patient, and may include, but is not limited by, the FHIR definition of clinical notes. Usually, each note is for a specific event, such as a consultation, discharge, procedure, etc.
- the process may, in some embodiments, determine a fitness score for each CSMTC and then determine the code point as the one CSMTC having the highest fitness score. Other techniques discussed herein may be utilized to determine the code point as well. If structured data is available for the patient, information in that structured data may be utilized in addition to the (unstructured) medical text for improving accuracy.
- “Medical text” may additionally include research papers, clinical trial protocols, or other data not related to a specific patient.
- Code point resolution relates to concept detection and mapping these concepts to concept codes, i.e., the SMTCs, such as the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED—CT) codes, using the approaches and techniques described herein, to reach a decision as to which CSMTC best represents the information in a span or portion of medical text.
- Other SMTCs may involve terminologies such as the International Classification of Diseases (ICD) ICD-9-CM, ICD-10, ICD-O-3, ICD-10-AM, Laboratory Logical Observation Identifiers Names and Codes (LOINC), RxNorm, which is part of the UMLS terminology, and the Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4).
- the SMTCs may support standards such as the American National Standards Institute (ANSI), the Digital Imaging and Communications in Medicine (DICOM), Health Level Seven International (HL7), and the International Organization for Standardization (ISO) standards.
- SNOMED-CT, as a particular instance of an SMTC system, is a standardized vocabulary of clinical terminology that is widely used by health care providers for the electronic exchange of clinical health information.
- SNOMED codes are a common standard for exchanging clinical information between providers. SNOMED-CT codes tend to focus on clinical information. Because of that, SNOMED-CT is often used by care providers and insurance companies for exchanging structured medical information, and thus it is more standard in the industry for “end users”.
- NLP natural language processing
- MCEs detected from the medical text “in context”, i.e., using relationships provided in the metathesaurus and context information from external sources, such as available clinical notes (which, e.g., may be broader than just nearby words in a given medical text item).
- the UMLS includes a biomedical metathesaurus, the UMLSM, that is organized by concept/meaning, and links similar names (or surface forms) for a particular concept from nearly two-hundred vocabularies.
- a “concept” is a fundamental unit of meaning in this metathesaurus, which represents a single meaning—every concept is assigned a concept unique identifier (CUI).
- This metathesaurus also identifies useful relationships between concepts and preserves the meanings and relationships from each vocabulary. Solutions often summarize clinical information as SMTCs, such as SNOMED codes, making use of detected CUIs and relationships defined by the metathesaurus.
- the UMLS CUIs are generally used by NLP related tasks to extract meaning from text.
- UMLS is built from over one-hundred different vocabularies (including SNOMED). Mappings between vocabs get complicated quickly.
- a single surface form may have multiple possible meanings (CUIs), each CUI may be associated with zero or more SNOMED codes, and SNOMED codes may be associated with more than one CUI/concept.
- This results in an n-to-n relationship between UMLS CUIs and SNOMED codes that makes it difficult to get from a word or phrase to the ideal SNOMED code or codes that can be used by a higher level application.
- Muscle weakness vs.
- FIG. 2 A is a block diagram of a concept tree 200 (alternately, surface form) that illustrates an instance in which the text portion maps to a single CUI having multiple CSMTCs (e.g., SNOMED codes).
- the concept tree 200 shows a relationship between a key word(s) of a medical text portion, one or more related CUIs, and one or more SMTCs.
- the concept tree 200 may have a “covered text” field 205 that indicates the relevant text for a medical text portion.
- a medical text (described below) input portion may be “the patient received radiation treatment”, resulting in the covered text being “radiation”.
- the relevant single UMLS CUI 210 meaning is designated “C1522449” from the metathesaurus in this example, which corresponds to the concept/meaning of a therapeutic radiology procedure.
- this CUI may be related to two SMTCs: a first SMTC 215, such as a first SNOMED code (here, by way of example, 108290001 corresponding to radiation oncology and/or radiotherapy), and a second SMTC, such as a second SNOMED code (here, by way of example, 53438000 corresponding to radiation therapy procedure or service).
- each CUI may map to zero or more SMTCs. It may map to zero SMTCs because it is possible that no SMTC is defined for the particular meaning; it is also possible that multiple SMTCs could apply to the meaning. For example, if the CUI for “Therapeutic Radiology Procedure” is discovered for the text “radiation”, there are two SNOMED codes that are mapped to the CUI.
- FIG. 2 B is a block diagram that illustrates multiple CUIs due to ambiguity applying to the covered text.
- a single medical text span or medical text portion may have multiple relevant UMLS concepts defined for it.
- the UMLS is a combination of many vocabularies, and these vocabularies may not agree on a specific meaning. This is partly because a single surface form might have different meanings in different contexts.
- “nephrectomy” could refer to a total nephrectomy in one context, or any type of nephrectomy in another context.
- each CUI has a distinct SNOMED code mapped to it.
- the concept tree 250 has a “covered text” field 255 that indicates the covered text is “nephrectomy”.
- the first UMLS CUI 260 meaning is designated “C0176996” which means a total nephrectomy.
- This is related to a first SMTC 265 , such as a first SNOMED code (here, by way of example, 175905003).
- the second UMLS CUI 270 meaning is designated “C0027695” which means a nephrectomy.
- This is related to a second SMTC 275 such as a second SNOMED code (here, by way of example, 108022006).
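- The concept tree of FIG. 2B can be sketched as a small data structure. This is a minimal illustration only: the class names `Concept` and `ConceptTree` and the method names are assumptions introduced here, while the CUIs and SNOMED codes are the example values given above.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Concept:
    cui: str                      # UMLS concept unique identifier
    meaning: str
    snomed_codes: list[str] = field(default_factory=list)  # zero or more SMTCs


@dataclass
class ConceptTree:
    covered_text: str
    concepts: list[Concept] = field(default_factory=list)

    def candidate_codes(self) -> list[str]:
        """Collect each candidate SNOMED code (CSMTC) once, in first-seen order."""
        seen: dict[str, None] = {}
        for concept in self.concepts:
            for code in concept.snomed_codes:
                seen.setdefault(code, None)
        return list(seen)

    def needs_resolution(self) -> bool:
        """True when more than one CSMTC remains for the covered text."""
        return len(self.candidate_codes()) > 1


# Example values from FIG. 2B as described in the text.
fig_2b = ConceptTree(
    covered_text="nephrectomy",
    concepts=[
        Concept("C0176996", "total nephrectomy", ["175905003"]),
        Concept("C0027695", "nephrectomy", ["108022006"]),
    ],
)
```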
- FIG. 3 A is a block diagram of a concept tree 300 that combines multiple ideas, which is another reason that multiple concepts may exist.
- the concept tree 300 represents a combination of ideas, and the combination does not have a single UMLS CUI.
- FIG. 3 A illustrates an example concept tree 300 in which the covered text field 305 includes a combination of ideas: “BillRoth II” and “GastroJejunostomy”.
- the first UMLS CUI 310 meaning is designated “C0399839” which means a gastrojejunostomy.
- This is related to a first SMTC 315 , such as a first SNOMED code (here, by way of example, 442338001).
- the second UMLS CUI 320 meaning is designated “C0192444” which means a BillRoth II procedure.
- This is related to a second SMTC 325 , such as a second SNOMED code (here, by way of example, 83985009).
- FIG. 3 B is a block diagram that illustrates a concept tree 350 for an example of increased complexity.
- the covered text field 355 includes the term “radiation”, which has multiple CUIs: one with multiple SNOMED codes, and one with a single SNOMED code.
- the first UMLS CUI 360 meaning is designated “C1522449” which means a therapeutic radiology procedure.
- This is related to a first SMTC 365, such as a first SNOMED code (here, by way of example, 108290001 for radiation oncology and/or radiotherapy), and a second SMTC 370, such as a second SNOMED code (here, by way of example, 53438000 for a radiation therapy procedure or service).
- the second UMLS CUI 375 meaning is designated “C1534030” which means radiation ionizing radiotherapy.
- This is related to a further SMTC 380, such as a SNOMED code (here, by way of example, 135576007).
- FIGS. 4 A- 4 C are block diagrams illustrating process flows 400 A, 400 B, 400 C for associating a best-fit SMTC with a respective event.
- FIG. 5 is a block diagram illustrating a system 500 within which the code point resolver 520 may operate.
- medical input data 514 including the medical text 512
- the medical management system 510 may comprise any number of computers, such as DPSs 10 , that are connected via a network, and may be implemented, for example, in a cloud computing environment 52 .
- the code point resolver 520 may operate with the application processing 96 , as described above.
- the medical input data 514 may comprise medical text 512 and related other information, such as clinical notes and structured data, may be all or a part of an electronic medical record (EMR).
- the medical input data 514 may be received by the code point resolver 520 via a network interface 522 , and received by rule-based logic 540 , which may comprise an NLP 542 , pattern matching rules 544 , supervised machine learning (ML) models 546 , or any other rule-based mechanism. Where supervised ML models 546 are used, such models may be trained in a training phase using a set of training data that relates medical text to SMTCs and/or provides selecting a single SMTC from a set of SMTCs.
- the rule-based logic 540 may comprise a knowledge base that stores relationships and mappings between CUIs and SMTCs.
- the medical text 512 may be broken down into medical text portions. For example, if the medical text 512 contains information from multiple visits to a facility, multiple procedures performed, etc., the NLP 542 may break the information into individual portions to simplify the processing. This breaking down or parsing of the medical text 512 by the NLP 542 may be based on a mechanism such as punctuation, keywords, parts of language (nouns, verbs, etc.) or using other known techniques for language parsing. The medical text portions may be further processed by the NLP 542 to remove superfluous words and organize the text in a consistent manner. Additionally, the NLP 542 may perform a tokenization of the medical text 512 . The NLP 542 may determine one or more concepts/CUIs associated with the medical text 512 or text portions.
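- The splitting and tokenization steps described above might look roughly like the following. This is a simplified sketch under stated assumptions: the punctuation-based split, the tiny `STOP_WORDS` set, and the function names are all hypothetical, and a production NLP component would use far richer parsing.

```python
from __future__ import annotations
import re

# Hypothetical list of superfluous words; a real system would use a curated list.
STOP_WORDS = {"the", "a", "an", "had", "in", "of"}


def split_into_portions(medical_text: str) -> list[str]:
    """Break medical text into portions at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.;!?])\s+", medical_text.strip())
    return [p for p in parts if p]


def tokenize(portion: str) -> list[str]:
    """Lower-case the portion, split it into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", portion.lower())
    return [w for w in words if w not in STOP_WORDS]


portions = split_into_portions(
    "The patient had a nephrectomy in 2019. The patient received radiation treatment."
)
tokens = [tokenize(p) for p in portions]
```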
- the code point resolver 520 in order to resolve the code point, i.e., the best-fit SMTC, may consider multiple concepts/CUIs that are the best fit for an application.
- the code point resolver 520 may determine that certain concepts and/or certain types of concepts are more valuable for a medical text portion 512 than others.
- the rule-based logic 540 may make this determination by incorporating NLP 542 , pattern matching rules 544 , and/or supervised machine learning models 546 .
- For example, if the relevance determiner 548 of the rule-based logic 540 determines that a medical text portion 512 relates to a radiation procedure, then it would determine applicable CUIs that are therapeutic or preventive procedures.
- non-procedure-based concepts such as the concept for “electromagnetic radiation” or “radiation physical force” may thus be considered not relevant and filtered out and not considered for mapping into an SMTC(s), since it is much more likely that documentation of a clinical visit is referring to a type of radiation therapy.
- the delineation of a procedure vs. non-procedure may be, for example, found in definitions of the SMTCs themselves, or may be distinguished by being “therapeutic and diagnostic procedures” as opposed to something that is for example a “physical object” (e.g., a positron emission tomography (PET) scan vs PET system).
- disorder vs. organism might be a distinction that could be used to delineate various terms, such as SARS, where the text could potentially refer to either a disorder or an organism.
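- The procedure versus non-procedure filtering described above can be illustrated with a short sketch. The `Candidate` class, the placeholder CUIs, and the fallback of keeping all candidates when nothing matches are assumptions made for illustration; the category labels follow the "therapeutic or preventive procedure" wording used in the text.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cui: str            # placeholder identifiers, not real CUIs
    name: str
    semantic_type: str  # category used to judge relevance


# Categories treated as relevant for a clinical-note application (assumed).
RELEVANT_TYPES = {"Therapeutic or Preventive Procedure", "Diagnostic Procedure"}


def filter_relevant(candidates: list[Candidate],
                    relevant_types: set[str] = RELEVANT_TYPES) -> list[Candidate]:
    """Keep procedure-like concepts; if none qualify, keep everything."""
    kept = [c for c in candidates if c.semantic_type in relevant_types]
    return kept or list(candidates)


radiation_candidates = [
    Candidate("CUI-A", "therapeutic radiology procedure",
              "Therapeutic or Preventive Procedure"),
    Candidate("CUI-B", "ionizing radiation", "Natural Phenomenon or Process"),
]
relevant = filter_relevant(radiation_candidates)
```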
- the surface form matching logic uses the longest match it can find.
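- A longest-match strategy of this kind can be sketched as a greedy scan over tokens. The function names, the `max_len` window, and the sample surface forms are assumptions for illustration, not the matching logic of any particular library.

```python
from __future__ import annotations


def longest_match(tokens: list[str], start: int,
                  surface_forms: set[str], max_len: int = 5):
    """Return (phrase, length) for the longest known phrase starting at `start`."""
    for length in range(min(max_len, len(tokens) - start), 0, -1):
        phrase = " ".join(tokens[start:start + length])
        if phrase in surface_forms:
            return phrase, length
    return None


def find_surface_forms(tokens: list[str], surface_forms: set[str]) -> list[str]:
    """Scan left to right, always consuming the longest match found."""
    matches, i = [], 0
    while i < len(tokens):
        hit = longest_match(tokens, i, surface_forms)
        if hit:
            matches.append(hit[0])
            i += hit[1]      # skip past the matched phrase
        else:
            i += 1
    return matches


forms = {"nephrectomy", "total nephrectomy", "radiation", "radiation therapy"}
found = find_surface_forms(
    "patient had total nephrectomy then radiation therapy".split(), forms
)
```

With these inputs, "total nephrectomy" is preferred over the shorter "nephrectomy" because the longer phrase is tried first.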
- the individual concepts are mapped to a code point.
- the code point resolver 520 determines the code point for a CUI by applying the rule-based logic 540 that considers common parameters of an application.
- the rule-based logic 540 may thus use the relevance determiner 548 to select an SMTC or filter SMTCs based on the most correct intent of the CUI in the context of the medical application (e.g., codes for procedures are favored over codes for non-procedures or other intents) that may be provided to the code point resolver 520 .
- Although the relevance determiner 548 is shown separately from the pattern matching rules 544 and the supervised machine learning models 546, the relevance determiner 548 may make use of them or be a part of them.
- the fitness scorer 549 may make use of the pattern matching rules and/or supervised ML models 546 or be a part of them.
- the code point resolver 520 may have access to external data information sources, such as the interchange coding system 552 (e.g., SNOMED and others discussed above) to provide the SMTCs 554 , and a biomedical metathesaurus 556 (e.g., UMLSM discussed above) to provide the metathesaurus concept entities 558 .
- the rule-based logic 540 may choose the hypernym over the hyponym.
- the case in FIG. 2 B illustrates this. “Nephrectomy” could be mapped to “total nephrectomy” or “nephrectomy”. Since there is uncertainty at this point, the more general one is picked. But if other documents later include medical text about a “total nephrectomy” on the same day, then that decision may be revised to a more specific code.
- mastectomy may be chosen as the hypernym, but could be later determined to refer to other kinds of mastectomies (e.g., simple, radical, bilateral . . . ). These examples are largely based on how UMLS and SNOMED organize the relationships between procedures. The mastectomy could be viewed as an example of speaking generally about something more specific. However, in terms of code resolution, this problem may also happen because different vocabs have different mappings. Sometimes this leads to more than one mapping for the same medical text, and therefore it may be desirable to pick the most general concept for accuracy.
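- The preference for the more general concept can be sketched as keeping only those candidates that have no other candidate as an ancestor. The `parent_of` map and its single-parent shape are simplifying assumptions (real terminologies may give a code several parents), and the is-a link between the two nephrectomy codes is assumed here from the description of FIG. 2B.

```python
from __future__ import annotations


def ancestors(code: str, parent_of: dict[str, str]) -> set[str]:
    """Walk the (assumed single-parent) is-a chain upward from `code`."""
    seen: set[str] = set()
    while code in parent_of and parent_of[code] not in seen:
        code = parent_of[code]
        seen.add(code)
    return seen


def prefer_hypernym(candidates: list[str], parent_of: dict[str, str]) -> list[str]:
    """Drop any candidate that is a more specific form of another candidate."""
    return [
        c for c in candidates
        if not any(o in ancestors(c, parent_of) for o in candidates if o != c)
    ]


# Assumed relation: total nephrectomy (175905003) is-a nephrectomy (108022006).
IS_A = {"175905003": "108022006"}
general = prefer_hypernym(["175905003", "108022006"], IS_A)
```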
- a source rater may determine a reliability of the sources for the respective CSMTCs, and the CSMTC with the highest reliability rating may be selected. Because UMLS has many vocabularies and mappings, some sources are more reliable. If multiple CSMTCs continue to remain, then further logic may apply, such as the oldest CUI that exists in UMLS being chosen. Older CUIs are more familiar, and are more often used in practice. To determine the age of a CUI, it may be possible to determine an absolute or possibly a relative age based on a length or a value of the CUI (e.g., newer CUIs may have longer identifiers).
- the age might be determined by loading each version of the UMLS database and recording when a particular CUI first appeared in the database. In some embodiments, age is used as a proxy for how frequently a code is used, based on a presumption that CUIs that have been around a while are more in use than newer ones. Determining the frequency that a CUI or SMTC is used within a large corpus may be an alternative mechanism for decision-making.
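- The source-reliability and CUI-age tie-breaks described above could be combined as in the following sketch. Every value here is hypothetical: the vocabulary names, reliability ratings, codes, and release labels are placeholders, and the release labels are assumed to sort chronologically as strings.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedCandidate:
    code: str                 # placeholder CSMTC
    cui: str                  # placeholder CUI
    source: str               # vocabulary that supplied the mapping
    first_seen_release: str   # release in which the CUI first appeared


def pick_by_source_then_age(candidates: list[MappedCandidate],
                            reliability: dict[str, float]) -> list[MappedCandidate]:
    """Keep candidates from the most reliable source, then the oldest CUI."""
    def rating(c: MappedCandidate) -> float:
        return reliability.get(c.source, 0.0)

    best = max(rating(c) for c in candidates)
    top = [c for c in candidates if rating(c) == best]
    if len(top) == 1:
        return top
    oldest = min(c.first_seen_release for c in top)
    return [c for c in top if c.first_seen_release == oldest]


ratings = {"VOCAB_A": 0.9, "VOCAB_B": 0.6}   # hypothetical configuration values
pool = [
    MappedCandidate("CODE-1", "CUI-1", "VOCAB_A", "2010AA"),
    MappedCandidate("CODE-2", "CUI-2", "VOCAB_A", "2004AB"),
    MappedCandidate("CODE-3", "CUI-3", "VOCAB_B", "1999AA"),
]
remaining = pick_by_source_then_age(pool, ratings)
```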
- the code point resolver 520 may try to disambiguate using other information, such as clinical notes or structured data.
- the code point resolver 520 will often detect a same event for a particular SMTC in multiple notes associated with the text portion. Some of these notes have more detail than others.
- an operative clinical note may state a specific “skin-saving mastectomy”, while an assessment clinical note may simplify this and simply state in a general manner that the patient had a “mastectomy”.
- These operative and assessment clinical notes may be aggregated together by the code point resolver 520 , and SMTC disambiguation may be performed at that time.
- the code point resolver 520 may consider identifying information related to the events, relationships between SMTCs and CSMTCs, as well as any detected date for the event. Further, the code point resolver 520 may then decide whether to combine the information from two events or not. If events are combined, then the most suitable code, based on the process discussed above, may be selected for the combined event.
- the code point resolver 520 may determine that the patient identifier is the same for two events represented by text portions, and the two clinical notes are both for the same day (or within a predefined segment of time), and thus logically determine that these two different notes both refer to the same event.
- Other rules or logic may be used to make this determination by the rule-based logic 540 .
- FIGS. 4 A- 4 C are block diagrams illustrating process flows for associating a best-fit SMTC with a respective event, according to some embodiments.
- FIG. 4 A is a block diagram illustrating an event 400 A for processing an event with ambiguous CSMTCs 420 , 425 .
- the medical input data 514 relates to a surgery event 410 having a first possible CSMTC 420 , where the CSMTC refers to a nephrectomy, and a second possible CSMTC 425 refers to a total nephrectomy.
- the date evidence 414 associated with the surgery event 410 indicates the date of occurrence simply as being some time in 2019.
- the event evidence 416 makes reference to a generic “nephrectomy”. As can be seen by the medical text 430 A, the indication is that “the patient had a nephrectomy” 432 A “in 2019 ” 434 A.
- FIG. 4 B is a block diagram illustrating an event 400 B that shows another event that has been constructed from a different text portion showing more detail, namely, the text portion clarifies that “the patient had a total nephrectomy” 432 B and has a more specific “December 2019” 434 B date. It also has a single CSMTC 425 .
- the event in FIG. 4 B can be combined with the event in FIG. 4 A based on a relatedness of the procedures and relatedness of the date, even though one is more general than the other.
- FIG. 4 C is a block diagram illustrating a combination 400 C of the events 400 A, 400 B from FIG. 4 A and FIG. 4 B .
- the most specific code 425, the total nephrectomy, has been selected for the new event by the rule-based logic 540, based on the more specific text portion 430 B, 432 B.
- the most specific date 434 B has also been updated by the rule-based logic 540 .
- the evidence from the two prior events 400 A, 400 B may be included in the new event 400 C.
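- The combination of events shown in FIGS. 4A-4C can be sketched as follows. The `Event` class, the partial ISO date strings, the patient identifier, and the rule that two events are related when they share a candidate code are assumptions chosen for illustration; the SNOMED codes are the nephrectomy examples given earlier.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    patient_id: str
    date: str                       # partial ISO date, e.g. "2019" or "2019-12"
    candidate_codes: frozenset
    evidence: list = field(default_factory=list)


def dates_compatible(a: str, b: str) -> bool:
    """A coarse date is compatible with any finer date it is a prefix of."""
    return a.startswith(b) or b.startswith(a)


def combine(a: Event, b: Event) -> Optional[Event]:
    """Merge two events for the same patient, date range, and procedure."""
    shared = a.candidate_codes & b.candidate_codes
    if a.patient_id != b.patient_id or not dates_compatible(a.date, b.date) or not shared:
        return None
    return Event(
        a.patient_id,
        max(a.date, b.date, key=len),   # keep the more specific date
        frozenset(shared),              # keep only codes both events support
        a.evidence + b.evidence,        # retain evidence from both events
    )


event_a = Event("patient-1", "2019", frozenset({"108022006", "175905003"}),
                ["the patient had a nephrectomy in 2019"])
event_b = Event("patient-1", "2019-12", frozenset({"175905003"}),
                ["the patient had a total nephrectomy December 2019"])
event_c = combine(event_a, event_b)
```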
- a fitness score may be determined by the rule-based logic 540 in the event that the code point 516 cannot be determined by other mechanisms described herein.
- the rule-based logic 540 may utilize a fitness scorer 549 for each of the CSMTCs and then choose the CSMTC having the highest score.
- the fitness scorer 549 may perform certain of the rule-based logic 540 described above, such as providing a higher score to a hypernym over a hyponym, providing a higher score for a source that is more reliable or has a higher reliability measure, or providing a higher score where an older UMLS CUI is present.
- a reliability measure for various sources may be provided in the configuration files of the code point resolver 520 , and may be determined from developer experience, and combined potentially with measuring the accuracy of the overall system.
- some sources of the mapping are known or believed to be better sources than others.
- an arbitrary decision may be made by the rule-based logic 540 to ensure a single CSMTC is returned as the code point 516 .
- This arbitrary decision may be based on, e.g., a numerical order of a SNOMED ID value, a random selection of remaining CSMTCs, or any other mechanism to ensure a return of a single value. In some embodiments, it may be advantageous to ensure a consistent return of the code point 516 for a given input of data.
- a weighting may be applied to an MCE that is based on a medical application intent (e.g., a procedure may have a higher weighting than a test); a weighting might be applied in which hypernyms are weighted higher than hyponyms; a source that is more reliable may be weighted higher than one that is less reliable; and codes may be weighted according to an industry acceptance rating.
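- A weighted fitness score with a deterministic tie-break might be sketched as below. The weight values, the feature names, and the `ScoredInput` class are hypothetical; the tie-break by numerical order of the SNOMED ID follows one of the options mentioned above, so that the same input always yields the same code point.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredInput:
    snomed_code: str
    is_procedure: bool         # matches the medical application intent
    is_hypernym: bool          # more general than the other candidates
    source_reliability: float  # 0.0 to 1.0, e.g. from a configuration file


# Hypothetical weights; real values would be tuned per application.
WEIGHTS = {"intent": 2.0, "hypernym": 1.0, "reliability": 1.5}


def fitness(c: ScoredInput, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the features discussed in the text."""
    return (weights["intent"] * c.is_procedure
            + weights["hypernym"] * c.is_hypernym
            + weights["reliability"] * c.source_reliability)


def resolve_code_point(candidates: list[ScoredInput],
                       weights: dict[str, float] = WEIGHTS) -> str:
    """Highest fitness wins; ties go to the numerically lowest code."""
    best = min(candidates,
               key=lambda c: (-fitness(c, weights), int(c.snomed_code)))
    return best.snomed_code


nephrectomy = [
    ScoredInput("108022006", True, True, 0.5),    # nephrectomy (general)
    ScoredInput("175905003", True, False, 0.5),   # total nephrectomy (specific)
]
tied = [
    ScoredInput("108290001", True, False, 0.5),
    ScoredInput("53438000", True, False, 0.5),
]
```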
- FIG. 6 is a flowchart of an example process 600 that may be utilized by the code point resolver 520 .
- the code point resolver 520 receives, via a network interface 522 , medical input data 514 that may comprise medical text 512 , with possibly other information, from a medical management system 510 .
- rule-based logic 540 may be used to process the medical text 512 , and may comprise, in some embodiments, the components of a natural language processor 542 , pattern matching rules 544 , supervised ML models 546 , a relevance determiner 548 , and a fitness scorer 549 . These components may interact with one another or share algorithms and functionality.
- the rule-based logic 540 may utilize other information, such as clinical reports, and structured data, along with medical terminology codes (SMTCs) 554 or an interchange coding system 552 , such as SNOMED.
- the rule-based logic may further utilize a biomedical metathesaurus, such as the UMLSM 556 to provide metathesaurus concept entities 558 .
- a plurality of CSMTCs are associated with the medical input data 514 .
- the code point resolver 520 resolves a single code point 516 from the plurality of CSMTCs.
- the code point 516 is transmitted, via the network interface 522 , to the medical management system 510 in order to assist the medical management system 510 in resolving any ambiguity that may be present in the initial medical input data 514 .
- the code point 516 may be further utilized to construct a timeline of a patient's history. By way of example, this could be used by an oncology application that assists a physician with following National Comprehensive Cancer Network (NCCN) guidelines, and/or matching patients with relevant clinical trials. Additionally, the code point 516 may be utilized to convert the unstructured text and other associated medical input data 514 into structured data, such as into a Fast Healthcare Interoperability Resources (FHIR) record format for storage and use in the above-discussed applications.
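- Converting a resolved code point into structured data might look like the following sketch, which builds a dictionary shaped like a FHIR R4 `Procedure` resource. The function name, the patient identifier, the fixed `completed` status, and the choice of the `Procedure` resource type are assumptions for illustration; a real system would select the resource type and status from the event itself.

```python
from __future__ import annotations
import json


def to_fhir_procedure(patient_id: str, snomed_code: str,
                      display: str, performed: str) -> dict:
    """Build a minimal FHIR R4-style Procedure resource as a plain dict."""
    return {
        "resourceType": "Procedure",
        "status": "completed",          # assumed; not derived from the text
        "code": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": snomed_code,
                "display": display,
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "performedDateTime": performed,  # partial dates such as "2019-12" are allowed
    }


record = to_fhir_procedure("patient-1", "175905003", "total nephrectomy", "2019-12")
serialized = json.dumps(record, indent=2)
```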
Abstract
Description
- Disclosed herein is a system and related method for a code point resolution using natural language processing and a metathesaurus. In the medical domain, applications may use standardized medical terminology codes (SMTCs) to exchange clinical information. A common goal when processing unstructured patient notes is to produce SMTCs corresponding to the medical text. A common approach to extracting SMTCs from natural language is to use a biomedical metathesaurus that contains metathesaurus concept entities (MCEs), such as the Unified Medical Language System (UMLS) Metathesaurus (UMLSM). Concepts or meanings represented by MCEs that are found within the unstructured medical text may be detected using the vocabularies defined by the metathesaurus. The detected concepts may then be mapped to SMTCs using mappings in the metathesaurus and relevant MCEs.
- A method is provided for exchanging medical information with a medical management system. The method comprises receiving, using a processor of a code point resolver, from the medical management system, medical text via a network interface. A code point is a single standardized medical terminology code (SMTC) that corresponds to a medical concept contained within the medical text. The method further applies rule-based logic to process the medical text to form a localized mapping of a text portion of the medical text to a plurality of candidate SMTCs (CSMTCs) that are related to at least one metathesaurus concept entity (MCE) in a metathesaurus, and to determine the code point from the CSMTCs. The method transmits, via the network interface, to the medical management system, the code point.
- A code point resolver is provided, comprising a memory, and a processor. The code point resolver is configured to receive, using a processor of a code point resolver, from the medical management system, medical text via a network interface. A code point is a single standardized medical terminology code (SMTC) that corresponds to a medical concept contained within the medical text. The code point resolver applies rule-based logic to process the medical text to form a localized mapping of a text portion of the medical text to a plurality of candidate SMTCs (CSMTCs) that are related to at least one metathesaurus concept entity (MCE) in a metathesaurus. The code point resolver determines the code point from the CSMTCs, and transmits, via the network interface, to the medical management system, the code point.
- Furthermore, embodiments may take the form of a related computer program product, accessible from a computer-usable or computer-readable medium providing program code for use, by, or in connection, with a computer or any instruction execution system. For the purpose of this description, a computer-usable or computer-readable medium may be any apparatus that may contain a mechanism for storing, communicating, propagating or transporting the program for use, by, or in connection, with the instruction execution system, apparatus, or device.
- Various embodiments are described herein with reference to different subject-matter. In particular, some embodiments may be described with reference to methods, whereas other embodiments may be described with reference to apparatuses and systems. However, a person skilled in the art will gather from the above and the following description that, unless otherwise notified, in addition to any combination of features belonging to one type of subject-matter, any combination between features relating to different subject-matter, in particular, between features of the methods and features of the apparatuses and systems, is also considered to be disclosed within this document.
- The aspects defined above, and further aspects disclosed herein, are apparent from the examples of one or more embodiments to be described hereinafter and are explained with reference to the examples of the one or more embodiments, but to which the invention is not limited. Various embodiments are described, by way of example only, and with reference to the following drawings:
-
FIG. 1A is a block diagram of a data processing system (DPS) according to one or more embodiments disclosed herein. -
FIG. 1B is a pictorial diagram that depicts a cloud computing environment according to an embodiment disclosed herein. -
FIG. 1C is a pictorial diagram that depicts abstraction model layers according to an embodiment disclosed herein. -
FIG. 2A is a block diagram of a concept tree (alternately, “surface form”) that illustrates an instance in which the text portion maps to a single CUI having multiple CSMTCs (e.g., SNOMED codes), according to some embodiments. -
FIG. 2B is a block diagram that illustrates multiple CUIs due to ambiguity applying to the covered text, according to some embodiments. -
FIG. 3A is a block diagram of a concept tree that combines multiple ideas, which is another reason that multiple concepts may exist, according to some embodiments. -
FIG. 3B is a block diagram that illustrates a concept tree for an example of increased complexity, according to some embodiments. -
FIGS. 4A-4C are block diagrams illustrating process flows for associating a best-fit SMTC with a respective event, according to some embodiments. -
FIG. 5 is a block diagram illustrating a system within which the code point resolver may operate, according to some embodiments. -
FIG. 6 is a flowchart illustrating a process that may be used with the code point resolver, according to some embodiments. - The following general computer acronyms may be used below:
TABLE 1: General Computer Acronyms
API       application program interface
ARM       advanced RISC machine
CD-ROM    compact disc ROM
CMS       content management system
CoD       capacity on demand
CPU       central processing unit
CUoD      capacity upgrade on demand
DPS       data processing system
DVD       digital versatile disk
EPROM     erasable programmable read-only memory
FPGA      field-programmable gate arrays
HA        high availability
IaaS      infrastructure as a service
I/O       input/output
IPL       initial program load
ISP       Internet service provider
ISA       instruction-set-architecture
LAN       local-area network
LPAR      logical partition
PaaS      platform as a service
PDA       personal digital assistant
PLA       programmable logic arrays
RAM       random access memory
RISC      reduced instruction set computer
ROM       read-only memory
SaaS      software as a service
SLA       service level agreement
SRAM      static random-access memory
WAN       wide-area network
- FIG. 1A is a block diagram of an example DPS according to one or more embodiments. In this illustrative example, the DPS 10 may include communications bus 12, which may provide communications between a processor unit 14, a memory 16, persistent storage 18, a communications unit 20, an I/O unit 22, and a display 24.
- The processor unit 14 serves to execute instructions for software that may be loaded into the memory 16. The processor unit 14 may be a number of processors, a multi-core processor, or some other type of processor, depending on the particular implementation. A number, as used herein with reference to an item, means one or more items. Further, the processor unit 14 may be implemented using a number of heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, the processor unit 14 may be a symmetric multi-processor system containing multiple processors of the same type.
- The memory 16 and persistent storage 18 are examples of storage devices 26. A storage device may be any piece of hardware that is capable of storing information, such as, for example without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis. The memory 16, in these examples, may be, for example, a random access memory or any other suitable volatile or non-volatile storage device. The persistent storage 18 may take various forms depending on the particular implementation.
- For example, the persistent storage 18 may contain one or more components or devices. For example, the persistent storage 18 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by the persistent storage 18 also may be removable. For example, a removable hard drive may be used for the persistent storage 18.
- The communications unit 20 in these examples may provide for communications with other DPSs or devices. In these examples, the communications unit 20 is a network interface card. The communications unit 20 may provide communications through the use of either or both physical and wireless communications links.
- The input/output unit 22 may allow for input and output of data with other devices that may be connected to the DPS 10. For example, the input/output unit 22 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, the input/output unit 22 may send output to a printer. The display 24 may provide a mechanism to display information to a user.
- Instructions for the operating system, applications and/or programs may be located in the storage devices 26, which are in communication with the processor unit 14 through the communications bus 12. In these illustrative examples, the instructions are in a functional form on the persistent storage 18. These instructions may be loaded into the memory 16 for execution by the processor unit 14. The processes of the different embodiments may be performed by the processor unit 14 using computer implemented instructions, which may be located in a memory, such as the memory 16. These instructions are referred to as program code 38 (described below), computer usable program code, or computer readable program code that may be read and executed by a processor in the processor unit 14. The program code in the different embodiments may be embodied on different physical or tangible computer readable media, such as the memory 16 or the persistent storage 18.
- The DPS 10 may further comprise an interface for a network 29. The interface may include hardware, drivers, software, and the like to allow communications over wired and wireless networks 29 and may implement any number of communication protocols, including those, for example, at various levels of the Open Systems Interconnection (OSI) seven layer model.
- FIG. 1A further illustrates a computer program product 30 that may contain the program code 38. The program code 38 may be located in a functional form on the computer readable media 32 that is selectively removable and may be loaded onto or transferred to the DPS 10 for execution by the processor unit 14. The program code 38 and computer readable media 32 may form a computer program product 30 in these examples. In one example, the computer readable media 32 may be computer readable storage media 34 or computer readable signal media 36. Computer readable storage media 34 may include, for example, an optical or magnetic disk that is inserted or placed into a drive or other device that is part of the persistent storage 18 for transfer onto a storage device, such as a hard drive, that is part of the persistent storage 18. The computer readable storage media 34 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory, that is connected to the DPS 10. In some instances, the computer readable storage media 34 may not be removable from the DPS 10.
- Alternatively, the program code 38 may be transferred to the DPS 10 using the computer readable signal media 36. The computer readable signal media 36 may be, for example, a propagated data signal containing the program code 38. For example, the computer readable signal media 36 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications links, such as wireless communications links, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications link. In other words, the communications link and/or the connection may be physical or wireless in the illustrative examples.
- In some illustrative embodiments, the program code 38 may be downloaded over a network to the persistent storage 18 from another device or DPS through the computer readable signal media 36 for use within the DPS 10. For instance, program code stored in a computer readable storage medium in a server DPS may be downloaded over a network from the server to the DPS 10. The DPS providing the program code 38 may be a server computer, a client computer, or some other device capable of storing and transmitting the program code 38.
- The different components illustrated for the DPS 10 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a DPS including components in addition to or in place of those illustrated for the DPS 10.
- It is to be understood that although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
- Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
- Characteristics are as follows:
- On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
- Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
- Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
- Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
- Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
- Service models are as follows:
- Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based e-mail). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
- Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
- Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
- Deployment models are as follows:
- Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
- Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
- Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
- Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
- A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.
- Referring now to FIG. 1B, illustrative cloud computing environment 52 is depicted. As shown, cloud computing environment 52 includes one or more cloud computing nodes 50 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate. Nodes 50 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 52 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 54A-N shown in FIG. 1B are intended to be illustrative only and that computing nodes 50 and cloud computing environment 52 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
- Referring now to FIG. 1C, a set of functional abstraction layers provided by cloud computing environment 52 (FIG. 1B) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 1C are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided:
- Hardware and software layer 60 includes hardware and software components. Examples of hardware components include: mainframes 61; RISC (Reduced Instruction Set Computer) architecture based servers 62; servers 63; blade servers 64; storage devices 65; and networks and networking components 66. In some embodiments, software components include network application server software 67 and database software 68.
- Virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.
- In one example, management layer 80 may provide the functions described below. Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and Pricing 82 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 85 provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
- Workloads layer 90 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics processing 94; transaction processing 95; and application processing 96.
- Any of the nodes 50 in the computing environment 52 as well as the computing devices 54A-N may be a DPS 10.
- The present invention may be a system, a method, and/or a computer readable media at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
- The one or more embodiments disclosed herein accordingly provide an improvement to computer technology. For example, an improvement to a computer database comprising medical information allows for a more efficient and effective resolution of ambiguity that may exist within the database.
- The following application specific acronyms may be used below:
TABLE 2: Application Specific Acronyms
ANSI      American National Standards Institute
CSMTC     candidate standardized medical terminology codes
CT        clinical terms
CUI       concept unique identifier
DICOM     Digital Imaging and Communications in Medicine
EMR       electronic medical record
HL7       Health Level Seven International
ICD       International Classification of Diseases
ISO       International Organization for Standardization
LOINC     Logical Observation Identifiers Names and Codes
MCE       metathesaurus concept entity
NCCN      National Comprehensive Cancer Network
NLP       natural language processing
OPCS-4    Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4
PET       positron emission tomography
SMTC      standardized medical terminology codes, for example, SNOMED
SNOMED    Systematized Nomenclature of Medicine
UMLS      Unified Medical Language System
UMLSM     Unified Medical Language System metathesaurus
- The use of MCEs and SMTCs is known in the medical industry. However, there may be instances in which multiple concepts and related MCEs are present in a span of medical text, or instances in which a single concept may map to multiple SMTCs (a one-to-many relationship). Having multiple concepts and related MCEs or multiple SMTCs may result in a lack of clarity or produce ambiguity, and thus, it may be desirable to eliminate the multiplicity and provide a single MCE and/or SMTC. The process of ultimately producing a single SMTC from a span of medical text is referred to herein as “code point resolution”. Code point resolution means selecting, from among multiple candidate SMTCs (CSMTCs) for a given portion of medical text, the single SMTC that is most appropriate to the particular application that will use the information, or to the domain of the solution.
By way of example, if a term “radiation” were found in clinical notes with a clinical application, the system would consider codes related to “radiation therapy” (treatment) before considering “ionizing radiation” (physical force), which would not be associated with clinical notes very often.
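The domain preference in the “radiation” example can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Candidate` type, the `category` tags, the `DOMAIN_PRIORITY` table, and the placeholder code strings are hypothetical and are not taken from this disclosure or from any terminology system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    code: str       # placeholder identifier, not a real SMTC
    label: str      # human-readable meaning of the candidate
    category: str   # hypothetical category tag used for ordering


# Hypothetical preference order per application domain (an assumption).
DOMAIN_PRIORITY = {
    "clinical": ["treatment", "physical force"],
}


def order_by_domain(candidates, domain):
    """Return candidates sorted so the domain's preferred categories come first."""
    priority = DOMAIN_PRIORITY.get(domain, [])

    def rank(candidate):
        # Categories not listed for the domain are considered last.
        if candidate.category in priority:
            return priority.index(candidate.category)
        return len(priority)

    # sorted() is stable, so equally ranked candidates keep their input order.
    return sorted(candidates, key=rank)


candidates = [
    Candidate("CODE-B", "ionizing radiation", "physical force"),
    Candidate("CODE-A", "radiation therapy", "treatment"),
]
ordered = order_by_domain(candidates, "clinical")
print([c.label for c in ordered])  # ['radiation therapy', 'ionizing radiation']
```

With a clinical application, the treatment sense is considered first; for a domain with no configured preference, the input order is left unchanged.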
- Disclosed herein is a code point resolver system and related method that may be used to resolve a code point for one or more concepts over a span of unstructured medical text. Code point resolution by the code point resolver may be performed by considering concepts or associated MCEs that are a best fit for an application, and mapping an individual MCE(s) to a single code point. In the case where multiple MCEs cause multiple SMTCs to be CSMTCs, the code point resolver may use other information, such as clinical notes and/or structured data to disambiguate. “Medical text”, “clinical notes”, and “other information” may, in some cases, have a similar form, but may constitute separate documents or be delineated in some manner, such as having different origins or being part of a separate entry and the like. “Clinical notes”, as defined herein, refers to a wide variety of documents generated on behalf of a patient, and may include, but is not limited to, the Fast Healthcare Interoperability Resources (FHIR) definition of clinical notes. Usually, each note is for a specific event, such as a consultation, discharge, procedure, etc. When multiple CSMTCs still exist, the process may, in some embodiments, determine a fitness score for each CSMTC and then determine the code point as the one CSMTC having the highest fitness score. Other techniques discussed herein may be utilized to determine the code point as well. If structured data is available for the patient, information in that structured data may be utilized in addition to the (unstructured) medical text for improving accuracy. “Medical text” may additionally include research papers, clinical trial protocols, or other data not related to a specific patient.
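The “highest fitness score” selection step can be sketched as follows. This is only an illustration under stated assumptions: the word-overlap scoring function is a toy stand-in chosen for brevity, not the fitness measure of this disclosure, and the `Csmtc` type, the function names, and the `CODE-1`/`CODE-2` identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Csmtc:
    code: str          # placeholder identifier, not a real terminology code
    description: str   # description text of the candidate code


def overlap_fitness(context: str) -> Callable[[Csmtc], float]:
    """Build a toy fitness function: count of words shared with the context."""
    context_words = set(context.lower().split())

    def score(candidate: Csmtc) -> float:
        candidate_words = set(candidate.description.lower().split())
        return float(len(context_words & candidate_words))

    return score


def resolve_code_point(
    candidates: Sequence[Csmtc], fitness: Callable[[Csmtc], float]
) -> Optional[Csmtc]:
    """Return the candidate with the highest fitness score, or None if empty."""
    if not candidates:
        return None
    # max() returns the first candidate among ties.
    return max(candidates, key=fitness)


candidates = [
    Csmtc("CODE-1", "radiation oncology and/or radiotherapy"),
    Csmtc("CODE-2", "radiation therapy procedure or service"),
]
note = "patient scheduled for radiation therapy procedure next week"
best = resolve_code_point(candidates, overlap_fitness(note))
print(best.code)  # CODE-2
```

Keeping the scoring function separate from the selection step means a different fitness measure, for example one that also draws on structured patient data, could be substituted without changing `resolve_code_point`.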
- Code point resolution relates to concept detection and to mapping the detected concepts to concept codes, i.e., the SMTCs, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) codes, using the approaches and techniques described herein, to reach a decision as to which CSMTC best represents the information in a span or portion of medical text. Other SMTCs may involve terminologies such as the International Classification of Diseases (ICD) ICD-9-CM, ICD-10, ICD-O-3, ICD-10-AM, Logical Observation Identifiers Names and Codes (LOINC), RxNorm, which is part of the UMLS terminology, and the Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4). The SMTCs may support standards such as the American National Standards Institute (ANSI), the Digital Imaging and Communications in Medicine (DICOM), Health Level Seven International (HL7), and the International Organization for Standardization (ISO) standards. SNOMED-CT, as a particular instance of an SMTC system, is a standardized vocabulary of clinical terminology that is widely used by health care providers for the electronic exchange of clinical health information. SNOMED codes are a common standard for exchanging clinical information between providers. SNOMED-CT codes tend to focus on clinical information. Because of that, SNOMED-CT is often used by care providers and insurance companies for exchanging structured medical information, and thus it is more standard in the industry for “end users”.
- The techniques described herein may involve natural language processing (NLP), and may be practiced, for example, without user interaction. They may disambiguate SMTCs for MCEs detected from the medical text “in context”, i.e., using relationships provided in the metathesaurus and context information from external sources, such as available clinical notes (which, e.g., may be broader than just nearby words in a given medical text item).
- The UMLS includes a biomedical metathesaurus, the UMLSM, that is organized by concept/meaning, and links similar names (or surface forms) for a particular concept from nearly two-hundred vocabularies. A “concept” is a fundamental unit of meaning in this metathesaurus, which represents a single meaning—every concept is assigned a concept unique identifier (CUI). This metathesaurus also identifies useful relationships between concepts and preserves the meanings and relationships from each vocabulary. Solutions often summarize clinical information as SMTCs, such as SNOMED codes, making use of detected CUIs and relationships defined by the metathesaurus.
- The UMLS CUIs are generally used by NLP related tasks to extract meaning from text. UMLS is built from over one-hundred different vocabularies (including SNOMED). Mappings between vocabularies get complicated quickly. A single surface form may have multiple possible meanings (CUIs), each CUI may be associated with zero or more SNOMED codes, and SNOMED codes may be associated with more than one CUI/concept. This results in an n-to-n relationship between UMLS CUIs and SNOMED codes that makes it difficult to get from a word or phrase to the ideal SNOMED code or codes that can be used by a higher level application. By way of example, “muscle weakness” and “incomplete paralysis” might be a single concept in one vocabulary and multiple concepts in another. The UMLS tends to provide the mappings and leave the consumer to figure out what to do with them, and thus, it is not a practical tool for end users or high level systems. An aim herein is then to find a single SMTC, such as a SNOMED code, that is most useful to the consumer/application for the medical text.
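The n-to-n relationship described above can be made concrete with a small Python sketch. The two lookup tables and all identifiers (`CUI-1`, `CODE-A`, and so on) are hypothetical placeholders, not entries from the UMLS or SNOMED; the sketch assumes only the three properties stated in the text: one surface form may have several CUIs, a CUI may have zero or more codes, and a code may belong to more than one CUI.

```python
# Hypothetical lookup tables standing in for metathesaurus data (assumptions).
SURFACE_TO_CUIS = {
    "term": ["CUI-1", "CUI-2", "CUI-3"],
}
CUI_TO_CODES = {
    "CUI-1": ["CODE-A", "CODE-B"],  # one CUI, several codes
    "CUI-2": ["CODE-B"],            # CODE-B is shared with CUI-1
    "CUI-3": [],                    # no code defined for this meaning
}


def codes_for_surface_form(surface_form):
    """Map each candidate code to the list of CUIs that led to it."""
    code_to_cuis = {}
    for cui in SURFACE_TO_CUIS.get(surface_form, []):
        for code in CUI_TO_CODES.get(cui, []):
            code_to_cuis.setdefault(code, []).append(cui)
    return code_to_cuis


result = codes_for_surface_form("term")
print(result)  # {'CODE-A': ['CUI-1'], 'CODE-B': ['CUI-1', 'CUI-2']}
```

Even in this tiny example, three meanings yield two candidate codes, one of which is reachable through two different meanings, so the surface form alone does not determine a single code.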
- FIG. 2A is a block diagram of a concept tree 200 (alternately, surface form) that illustrates an instance in which the text portion maps to a single CUI having multiple CSMTCs (e.g., SNOMED codes). The concept tree 200 shows a relationship between a key word(s) of a medical text portion, one or more related CUIs, and one or more SMTCs. The concept tree 200 may have a “covered text” field 205 that indicates the relevant text for a medical text portion. In the example shown, a medical text (described below) input portion may be “the patient received radiation treatment”, resulting in the covered text being “radiation”. The relevant single UMLS CUI 210 meaning is designated “C1522449” from the metathesaurus in this example, which corresponds to the concept/meaning of a therapeutic radiology procedure. However, this CUI may be related to two SMTCs: a first SMTC 215, such as a first SNOMED code (here, by way of example, 108290001 corresponding to radiation oncology and/or radiotherapy), and a second SMTC, such as a second SNOMED code (here, by way of example, 5343800 corresponding to radiation therapy procedure or service).
- There are reasons that multiple SMTCs may be relevant. Although the single CUI in this example represents a single meaning, each CUI may map to zero or more SMTCs. It may map to zero SMTCs because it is possible that no SMTC is defined for the particular meaning; it is also possible that multiple SMTCs could apply to the meaning. For example, if the CUI for “Therapeutic Radiology Procedure” is discovered for the text “radiation”, there are two SNOMED codes that are mapped to the CUI.
-
FIG. 2B is a block diagram that illustrates multiple CUIs due to ambiguity applying to the covered text. A single medical text span or medical text portion may have multiple relevant UMLS concepts defined for it. The UMLS is a combination of many vocabularies, and these vocabularies may not agree on a specific meaning. This is partly because a single surface form might have different meanings in different contexts. For example, in FIG. 2B, “nephrectomy” could refer to a total nephrectomy in one context, or any type of nephrectomy in another context. In this example, each CUI has a distinct SNOMED code mapped to it. As shown, the concept tree 250 has a “covered text” field 255 that indicates the covered text is “nephrectomy”. The first UMLS CUI 260 meaning is designated “C0176996”, which means a total nephrectomy. This is related to a first SMTC 265, such as a first SNOMED code (here, by way of example, 175905003). The second UMLS CUI 270 meaning is designated “C0027695”, which means a nephrectomy. This is related to a second SMTC 275, such as a second SNOMED code (here, by way of example, 108022006).
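The concept tree of FIG. 2B can be pictured as a small data structure. The following is only a minimal sketch for illustration: the `ConceptTree` class and its `candidate_smtcs` method are hypothetical names that do not appear in the disclosure, while the CUI and SNOMED values are the ones given in the FIG. 2B example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptTree:
    """Covered text, the CUIs found for it, and the SMTCs mapped to each CUI."""
    covered_text: str
    # CUI -> SMTCs (e.g., SNOMED codes); a CUI may map to zero or more codes.
    cui_to_smtcs: Dict[str, List[str]] = field(default_factory=dict)

    def candidate_smtcs(self) -> List[str]:
        """Return each distinct SMTC reachable from the covered text, in order."""
        seen: List[str] = []
        for smtcs in self.cui_to_smtcs.values():
            for code in smtcs:
                if code not in seen:
                    seen.append(code)
        return seen


# Values taken from the FIG. 2B example.
nephrectomy_tree = ConceptTree(
    covered_text="nephrectomy",
    cui_to_smtcs={
        "C0176996": ["175905003"],  # total nephrectomy
        "C0027695": ["108022006"],  # nephrectomy
    },
)
```

Calling `nephrectomy_tree.candidate_smtcs()` yields two candidate codes for one covered text, which is the ambiguity that a code point resolver would have to reduce to a single code.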
FIG. 3A is a block diagram of a concept tree 300 that combines multiple ideas, which is another reason that multiple concepts may exist. In this example, the concept tree 300 represents a combination of ideas, and the combination does not have a single UMLS CUI. FIG. 3A illustrates an example concept tree 300 in which the covered text field 305 includes a combination of ideas: “BillRoth II” and “GastroJejunostomy”. Here, the first UMLS CUI 310 meaning is designated “C0399839”, which means a gastrojejunostomy. This is related to a first SMTC 315, such as a first SNOMED code (here, by way of example, 442338001). The second UMLS CUI 320 meaning is designated “C0192444”, which means a BillRoth II procedure. This is related to a second SMTC 325, such as a second SNOMED code (here, by way of example, 83985009).

In practice, these examples may combine to create enormous complexity.
FIG. 3B is a block diagram that illustrates a concept tree 350 for an example of increased complexity. For example, in FIG. 3B, the covered text field 355 includes the term “radiation”, which has multiple CUIs: one with multiple SNOMED codes, and one with a single SNOMED code. Here, the first UMLS CUI 360 meaning is designated “C1533449”, which means a therapeutic radiology procedure. This is related to two SMTCs: a first SMTC 365, such as a first SNOMED code (here, by way of example, 108390001 for radiation oncology and/or radiotherapy), and a second SMTC 370, such as a second SNOMED code (here, by way of example, 53438000 for a radiation therapy procedure or service). The second UMLS CUI 375 meaning is designated “C1534030”, which means radiation ionizing radiotherapy. This is related to a third (single) SMTC 380, such as a third SNOMED code (here, by way of example, 135576007). Far more complex concept trees are possible by invoking this principle. The result is a significant increase in the number of SMTCs associated with a single event.
FIGS. 4A-4C are block diagrams illustrating process flows 400A, 400B, 400C for associating a best-fit SMTC with a respective event.
FIG. 5 is a block diagram illustrating a system 500 within which the code point resolver 520 may operate. As shown in FIG. 5, medical input data 514, including the medical text 512, may originate from a medical management system 510. The medical management system 510 may comprise any number of computers, such as DPSs 10, that are connected via a network, and may be implemented, for example, in a cloud computing environment 52. The code point resolver 520 may operate with the application processing 96, as described above. The medical input data 514 may comprise medical text 512 and related other information, such as clinical notes and structured data, and may be all or a part of an electronic medical record (EMR). The medical input data 514 may be received by the code point resolver 520 via a network interface 522, and received by rule-based logic 540, which may comprise an NLP 542, pattern matching rules 544, supervised machine learning (ML) models 546, or any other rule-based mechanism. Where supervised ML models 546 are used, such models may be trained in a training phase using a set of training data that relates medical text to SMTCs and/or provides for selecting a single SMTC from a set of SMTCs. The rule-based logic 540 may comprise a knowledge base that stores relationships and mappings between CUIs and SMTCs.

In some embodiments, the medical text 512 may be broken down into medical text portions. For example, if the medical text 512 contains information from multiple visits to a facility, multiple procedures performed, etc., the NLP 542 may break the information into individual portions to simplify the processing. This breaking down or parsing of the medical text 512 by the NLP 542 may be based on a mechanism such as punctuation, keywords, parts of language (nouns, verbs, etc.), or other known techniques for language parsing. The medical text portions may be further processed by the NLP 542 to remove superfluous words and organize the text in a consistent manner. Additionally, the NLP 542 may perform a tokenization of the medical text 512. The NLP 542 may determine one or more concepts/CUIs associated with the medical text 512 or text portions.

The code point resolver 520, in order to resolve the code point, i.e., the best-fit SMTC, may consider multiple concepts/CUIs that are the best fit for an application. The code point resolver 520 may determine that certain concepts and/or certain types of concepts are more valuable for a medical text portion 512 than others. The rule-based logic 540 may make this determination by incorporating NLP 542, pattern matching rules 544, and/or supervised machine learning models 546. By way of example, if the rule-based logic 540 has a relevance determiner 548 that determines a medical text portion 512 relates to a radiation procedure, then it would determine applicable CUIs that are therapeutic or preventive procedures. Since this relates to a procedure, non-procedure-based concepts (such as the concept for “electromagnetic radiation” or “radiation physical force”) may be considered not relevant, filtered out, and not considered for mapping into an SMTC(s), since it is much more likely that documentation of a clinical visit is referring to a type of radiation therapy. The delineation of a procedure vs. a non-procedure may be, for example, found in definitions of the SMTCs themselves, or may be distinguished by being “therapeutic and diagnostic procedures” as opposed to something that is, for example, a “physical object” (e.g., a positron emission tomography (PET) scan vs. a PET system).

In addition to the distinction of “procedure vs. non-procedure”, other forms of distinction may be considered as well. For example, “disorder vs. organism” might be a distinction that could be used to delineate various terms, such as SARS, where the text could potentially refer to either a disorder or an organism. In some embodiments, the surface form matching logic uses the longest match it can find.
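The “procedure vs. non-procedure” filtering described above can be sketched as a simple filter over candidate concepts. This is an illustrative assumption, not the disclosed implementation: the `Concept` class, the `semantic_type` labels, the `filter_by_intent` function, and the placeholder CUIs for the non-procedure concepts are all hypothetical; only “C1522449” comes from the FIG. 2A example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    cui: str
    name: str
    semantic_type: str  # hypothetical coarse label, e.g. "procedure"


def filter_by_intent(concepts: List[Concept], wanted_type: str) -> List[Concept]:
    """Keep only the concepts whose type matches the application's intent."""
    return [c for c in concepts if c.semantic_type == wanted_type]


concepts = [
    Concept("C1522449", "Therapeutic Radiology Procedure", "procedure"),
    # Placeholder identifiers: the text names these concepts but gives no CUIs.
    Concept("CUI_PLACEHOLDER_1", "electromagnetic radiation", "phenomenon"),
    Concept("CUI_PLACEHOLDER_2", "radiation physical force", "phenomenon"),
]

relevant = filter_by_intent(concepts, "procedure")
```

The same function could serve a “disorder vs. organism” distinction by passing a different `wanted_type`, assuming each concept carries such a label.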
In some embodiments, the individual concepts are mapped to a code point. The code point resolver 520 determines the code point for a CUI by applying the rule-based logic 540 that considers common parameters of an application. The rule-based logic 540 may thus use the relevance determiner 548 to select an SMTC or filter SMTCs based on the most correct intent of the CUI in the context of the medical application (e.g., codes for procedures are favored over codes for non-procedures or other intents) that may be provided to the code point resolver 520. Although the relevance determiner 548 is shown separately from the pattern matching rules 544 and the supervised machine learning models 546, the relevance determiner 548 may make use of them or be a part of them. Similarly, the fitness scorer 549, discussed in more detail below, may make use of the pattern matching rules 544 and/or supervised ML models 546 or be a part of them. The code point resolver 520 may have access to external data information sources, such as the interchange coding system 552 (e.g., SNOMED and others discussed above) to provide the SMTCs 554, and a biomedical metathesaurus 556 (e.g., UMLSM discussed above) to provide the metathesaurus concept entities 558.

If multiple CSMTCs remain, and these codes exist in a hypernym-hyponym relationship, then the rule-based logic 540 may choose the hypernym over the hyponym. The case in FIG. 2B illustrates this. “Nephrectomy” could be mapped to “total nephrectomy” or “nephrectomy”. Since there is uncertainty at this point, the more general one is picked. But if other documents later include medical text about a “total nephrectomy” on the same day, then that decision may be revised to a more specific code. In another example, “mastectomy” may be chosen as the hypernym, but could be later determined to refer to other kinds of mastectomies (e.g., simple, radical, bilateral . . . ). These examples are largely based on how UMLS and SNOMED organize the relationships between procedures. The mastectomy could be viewed as an example of speaking generally about something more specific. However, in terms of code resolution, this problem may also happen because different vocabularies have different mappings. Sometimes this leads to more than one mapping for the same medical text, and therefore it may be desirable to pick the most general concept for accuracy.

If multiple CSMTCs continue to remain, then a source rater may determine a reliability of the sources for the respective CSMTCs, and the CSMTC with the highest reliability rating may be chosen. Because UMLS has many vocabularies and mappings, some sources are more reliable than others. If multiple CSMTCs continue to remain, then further logic may apply, such as choosing the oldest CUI that exists in UMLS. Older CUIs tend to be more familiar and are presumed to be used more often in practice. To determine the age of a CUI, it may be possible to determine an absolute or possibly a relative age based on a length or a value of the CUI (e.g., newer CUIs may have longer identifiers). In other embodiments, the age might be determined by loading each version of the UMLS database and recording when a particular CUI first appeared in the database.
In some embodiments, age is used as a proxy for how frequently a code is used, based on a presumption that CUIs that have been around a while are more in use than newer ones. Determining the frequency that a CUI or SMTC is used within a large corpus may be an alternative mechanism for decision-making.
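The narrowing steps described above (prefer the hypernym, then the more reliable source, then the older CUI) can be sketched as a cascade that stops as soon as one candidate remains. This is a minimal sketch under stated assumptions: the `Candidate` fields, the source names `SRC_A` and `SRC_B`, the reliability values, and the `first_seen` years are hypothetical; the two SNOMED codes are those of FIG. 2B.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    smtc: str        # candidate code (CSMTC)
    source: str      # vocabulary that supplied the mapping (hypothetical name)
    first_seen: int  # year the CUI first appeared (hypothetical value)
    hypernyms: FrozenSet[str] = frozenset()  # more general codes for this code


def prefer_hypernyms(cands: List[Candidate]) -> List[Candidate]:
    """Drop any candidate whose more general parent is also a candidate."""
    codes = {c.smtc for c in cands}
    kept = [c for c in cands if not (c.hypernyms & codes)]
    return kept or cands


def prefer_reliable(cands: List[Candidate], reliability: Dict[str, float]) -> List[Candidate]:
    """Keep the candidates whose source has the highest reliability rating."""
    best = max(reliability.get(c.source, 0.0) for c in cands)
    return [c for c in cands if reliability.get(c.source, 0.0) == best]


def prefer_oldest(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates whose CUI appeared earliest."""
    oldest = min(c.first_seen for c in cands)
    return [c for c in cands if c.first_seen == oldest]


def narrow(cands: List[Candidate], reliability: Dict[str, float]) -> List[Candidate]:
    """Apply each rule in turn, stopping once a single candidate remains."""
    steps = (prefer_hypernyms, lambda cs: prefer_reliable(cs, reliability), prefer_oldest)
    for step in steps:
        if len(cands) <= 1:
            break
        cands = step(cands)
    return cands


candidates = [
    Candidate("175905003", "SRC_A", 2001, frozenset({"108022006"})),  # total nephrectomy
    Candidate("108022006", "SRC_B", 1999),                            # nephrectomy
]
remaining = narrow(candidates, {"SRC_A": 0.9, "SRC_B": 0.5})
```

With these inputs the first rule already settles the matter, and the more general nephrectomy code is the one that remains, matching the FIG. 2B discussion. Ordering the rules as a cascade is one design choice; a weighted score is another.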
As noted above, in the case where multiple concepts have resulted in multiple CSMTCs, the code point resolver 520 may try to disambiguate using other information, such as clinical notes or structured data. The code point resolver 520 will often detect a same event for a particular SMTC in multiple notes associated with the text portion. Some of these notes have more detail than others. For example, an operative clinical note may state a specific “skin-saving mastectomy”, while an assessment clinical note may simplify this and state in a general manner that the patient had a “mastectomy”. These operative and assessment clinical notes may be aggregated together by the code point resolver 520, and SMTC disambiguation may be performed at that time. In order for the code point resolver 520 to aggregate events or information from different text portions, it may consider identifying information related to the events, relationships between SMTCs and CSMTCs, as well as any detected date for the event. Further, the code point resolver 520 may then decide whether to combine the information from two events or not. If events are combined, then the most suitable code, based on the process discussed above, may be selected for the combined event.

By way of example, for the mastectomy example above, the code point resolver 520 may determine that the patient identifier is the same for two events represented by text portions, and the two clinical notes are both for the same day (or within a predefined segment of time), and thus logically determine that these two different notes both refer to the same event. Other rules or logic may be used to make this determination by the rule-based logic 540.
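The same-event test and the merge can be sketched as follows. This is a hedged illustration only: the `Event` class, the `same_event` and `merge` functions, the patient identifiers, and the full calendar dates are hypothetical, and relatedness of codes is reduced here to sharing at least one candidate code. The nephrectomy codes of FIG. 2B are used because the text gives no codes for the mastectomy example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class Event:
    patient_id: str
    event_date: date
    candidate_smtcs: FrozenSet[str]


def same_event(a: Event, b: Event, window: timedelta = timedelta(days=0)) -> bool:
    """Same patient, dates within the window, and at least one shared candidate code."""
    return (
        a.patient_id == b.patient_id
        and abs(a.event_date - b.event_date) <= window
        and bool(a.candidate_smtcs & b.candidate_smtcs)
    )


def merge(a: Event, b: Event) -> Event:
    """Combine two notes about one event, keeping only the codes both support."""
    return Event(a.patient_id, min(a.event_date, b.event_date),
                 a.candidate_smtcs & b.candidate_smtcs)


# An ambiguous note and a specific note for the same (hypothetical) patient and day.
assessment = Event("patient-1", date(2019, 12, 3), frozenset({"108022006", "175905003"}))
operative = Event("patient-1", date(2019, 12, 3), frozenset({"175905003"}))

combined = merge(assessment, operative) if same_event(assessment, operative) else None
```

Here the merged event retains only the total nephrectomy code, so the more detailed note resolves the ambiguity left by the general one. A nonzero `window` would model the “predefined segment of time” mentioned above.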
FIGS. 4A-4C are block diagrams illustrating process flows for associating a best-fit SMTC with a respective event, according to some embodiments.
FIG. 4A is a block diagram illustrating an event 400A for processing an event with ambiguous CSMTCs. The medical input data 514 relates to a surgery event 410 having a first possible CSMTC 420, which refers to a nephrectomy, and a second possible CSMTC 425, which refers to a total nephrectomy. The date evidence 414 associated with the surgery event 410 indicates the date of occurrence simply as being some time in 2019. The event evidence 416 makes reference to a generic “nephrectomy”. As can be seen in the medical text 430A, the indication is that “the patient had a nephrectomy” 432A “in 2019” 434A.
FIG. 4B is a block diagram illustrating an event 400B that has been constructed from a different text portion showing more detail: the text portion clarifies that “the patient had a total nephrectomy” 432B and gives a more specific date, “December 2019” 434B. It also has a single CSMTC 425. The event in FIG. 4B can be combined with the event in FIG. 4A based on a relatedness of the procedures and relatedness of the dates, even though one is more general than the other.
FIG. 4C is a block diagram illustrating a combination 400C of the events of FIG. 4A and FIG. 4B. The most specific code 425, the total nephrectomy, has been selected for the new event by the rule-based logic 540, based on the more specific text portion 430B, 432B. The date has also been updated by the rule-based logic 540 to the most specific date 434B. The evidence from the two prior events is carried into the new event 400C.

Returning to FIG. 5, a fitness score may be determined by the rule-based logic 540 in the event that the code point 516 cannot be determined by other mechanisms described herein. The rule-based logic 540 may utilize a fitness scorer 549 to score each of the CSMTCs and then choose the CSMTC having the highest score. The fitness scorer may perform certain of the rule-based logic 540 described above, such as providing a higher score to a hypernym over a hyponym, providing a higher score for a UMLS CUI or SMTC that is more reliable or has a higher reliability measure, and providing a higher score where an older UMLS CUI is present. A reliability measure for various sources may be provided in the configuration files of the code point resolver 520, and may be determined from developer experience, potentially combined with measuring the accuracy of the overall system. When choosing mappings between UMLS CUIs and SMTCs, some sources of the mapping are known or believed to be better sources than others.

In the event that no other principled logic yields a single CSMTC, then an arbitrary decision may be made by the rule-based logic 540 to ensure a single CSMTC is returned as the code point 516. This arbitrary decision may be based on, e.g., a numerical order of a SNOMED ID value, a random selection of remaining CSMTCs, or any other mechanism to ensure a return of a single value. In some embodiments, it may be advantageous to ensure a consistent return of the code point 516 for a given input of data.

Multiple factors discussed above may be used by the fitness scorer 549 to determine the fitness score, and these may be applied, in some embodiments, by applying a weighting to the factors described above. For example, a weighting may be applied to an MCE that is based on a medical application intent (e.g., a procedure may have a higher weighting than a test); a weighting might be applied in which hypernyms are weighted higher than hyponyms; a source that is more reliable may be weighted higher than one that is less reliable; and codes may be weighted according to an industry acceptance rating.
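A weighted fitness score with a deterministic fallback can be sketched as below. The weight values, the factor names, and the functions `fitness` and `pick_code_point` are assumptions made for illustration; the text does not specify any of them. Ties are broken by the numerically smallest SNOMED ID, which is one of the arbitrary but repeatable mechanisms mentioned above.

```python
from typing import Dict, Mapping

# Hypothetical weights for the factors named in the text.
WEIGHTS: Dict[str, float] = {
    "intent": 0.4,       # matches the medical application intent
    "generality": 0.3,   # hypernym favored over hyponym
    "reliability": 0.2,  # reliability of the mapping source
    "acceptance": 0.1,   # industry acceptance rating
}


def fitness(features: Mapping[str, float], weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of factor values; a missing factor counts as 0.0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def pick_code_point(candidates: Mapping[str, Mapping[str, float]]) -> str:
    """Highest fitness wins; ties go to the numerically smallest SNOMED ID."""
    return min(candidates, key=lambda code: (-fitness(candidates[code]), int(code)))


tied = {
    "175905003": {"intent": 1.0, "reliability": 0.5},
    "108022006": {"intent": 1.0, "reliability": 0.5},
}
scored = {
    "175905003": {"intent": 1.0, "reliability": 0.5},
    "108022006": {"intent": 0.0, "reliability": 0.5},
}
```

Because the tie-break depends only on the code values, the same input always produces the same code point, which is the consistency property the text calls advantageous. A random selection would also return a single value but would not be repeatable.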
FIG. 6 is a flowchart of an example process 600 that may be utilized by the code point resolver 520. In operation 602, the code point resolver 520 receives, via a network interface 522, medical input data 514 that may comprise medical text 512, with possibly other information, from a medical management system 510. In operation 604, rule-based logic 540 may be used to process the medical text 512, and may comprise, in some embodiments, the components of a natural language processor 542, pattern matching rules 544, supervised ML models 546, a relevance determiner 548, and a fitness scorer 549. These components may interact with one another or share algorithms and functionality.

The rule-based logic 540 may utilize other information, such as clinical reports and structured data, along with medical terminology codes (SMTCs) 554 or an interchange coding system 552, such as SNOMED. The rule-based logic may further utilize a biomedical metathesaurus, such as the UMLSM 556, to provide metathesaurus concept entities 558. A plurality of CSMTCs are associated with the medical input data 514. Ultimately, in operation 606, the code point resolver 520 resolves a single code point 516 from the plurality of CSMTCs. In operation 608, the code point 516 is transmitted, via the network interface 522, to the medical management system 510 in order to assist the medical management system 510 in resolving any ambiguity that may be present in the initial medical input data 514. The code point 516 may be further utilized to construct a timeline of a patient's history. By way of example, this could be used by an oncology application that assists a physician with following National Comprehensive Cancer Network (NCCN) guidelines, and/or matching patients with relevant clinical trials. Additionally, the code point 516 may be utilized to convert the unstructured text and other associated medical input data 514 into structured data, such as into a Fast Healthcare Interoperability Resources (FHIR) record format for storage and use in the above-discussed applications.
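As an illustration of the structured output mentioned above, a resolved code point could be wrapped in a FHIR-style record. The sketch below builds a plain dictionary shaped like a FHIR R4 Procedure resource; the function name `to_fhir_procedure` and the patient identifier are hypothetical, the disclosure does not state which FHIR resource type or version would be used, and the result is not validated against the FHIR specification.

```python
import json
from typing import Any, Dict

SNOMED_SYSTEM = "http://snomed.info/sct"  # SNOMED CT code system URI used in FHIR


def to_fhir_procedure(patient_id: str, snomed_code: str,
                      display: str, performed: str) -> Dict[str, Any]:
    """Build a minimal Procedure-like record for one resolved code point."""
    return {
        "resourceType": "Procedure",
        "status": "completed",
        "code": {
            "coding": [
                {"system": SNOMED_SYSTEM, "code": snomed_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        # Partial dates such as "2019-12" are kept as given in the evidence.
        "performedDateTime": performed,
    }


# Code and date follow the combined event of FIG. 4C; the patient id is made up.
record = to_fhir_procedure("example-patient", "175905003", "Total nephrectomy", "2019-12")
serialized = json.dumps(record, indent=2)
```

A sequence of such records, ordered by date, is one way the patient timeline described above might be assembled.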
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/223,082 US20230360745A1 (en) | 2021-01-29 | 2023-07-18 | Code Point Resolution Using Natural Language Processing and Metathesaurus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/161,828 US11749384B2 (en) | 2021-01-29 | 2021-01-29 | Code point resolution using natural language processing and metathesaurus |
US18/223,082 US20230360745A1 (en) | 2021-01-29 | 2023-07-18 | Code Point Resolution Using Natural Language Processing and Metathesaurus |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/161,828 Continuation US11749384B2 (en) | 2021-01-29 | 2021-01-29 | Code point resolution using natural language processing and metathesaurus |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230360745A1 (en) | 2023-11-09 |
Family
ID=82612839
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/161,828 Active 2041-07-15 US11749384B2 (en) | 2021-01-29 | 2021-01-29 | Code point resolution using natural language processing and metathesaurus |
US18/223,082 Pending US20230360745A1 (en) | 2021-01-29 | 2023-07-18 | Code Point Resolution Using Natural Language Processing and Metathesaurus |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/161,828 Active 2041-07-15 US11749384B2 (en) | 2021-01-29 | 2021-01-29 | Code point resolution using natural language processing and metathesaurus |
Country Status (1)
Country | Link |
---|---|
US (2) | US11749384B2 (en) |
Also Published As
Publication number | Publication date |
---|---|
US11749384B2 (en) | 2023-09-05 |
US20220246253A1 (en) | 2022-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MERATIVE US L.P., MICHIGAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:064293/0552 Effective date: 20220630 Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LAWRENCE, NICHOLAS TODD;SUAREZ SAIZ, FERNANDO JOSE;SANDERS, COREY;AND OTHERS;REEL/FRAME:064293/0151 Effective date: 20210128 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |