WO2021124150A1

WO2021124150A1 - Populating a tree data structure using a molecular data structure

Info

Publication number: WO2021124150A1
Application number: PCT/IB2020/062027
Authority: WO
Inventors: Gordon E. Johnson; William L. SCHOFIELD, III; Hans Rudolf STRAUB; Jeremy R. KORNBLUTH
Original assignee: 3M Innovative Properties Company
Priority date: 2019-12-20
Filing date: 2020-12-16
Publication date: 2021-06-24

Abstract

A system, method, and computer-readable storage medium for a computing apparatus populating a portion of a tree data structure in a decision tree encoder. The method includes accessing, via the computing apparatus, a molecular data structure that includes a concept molecule. At least one of the concept atoms in the concept molecule is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom. The method includes populating, based on the molecular data structure, at least one node of an existing tree data structure that is derived from the decision tree encoder to form a populated tree data structure and determining whether the populated tree data structure satisfies a condition. Based on the condition being satisfied, the computing apparatus can perform at least one operation.

Description

POPULATING A TREE DATA STRUCTURE USING A MOLECULAR DATA STRUCTURE

BACKGROUND

[0001] Document coding is generally a process of mapping topics included in a document to a code of a code-set. The topics in different scenarios may simply be words but may also, or instead, be concepts related to the words. In most situations, the real interest is in the semantic meanings of one or more words, sentences, and paragraphs within a document.

The code-set to which a document is mapped may be unique to an organization or purpose or may instead be a standardized code-set as set by an industry organization, a government entity, a company, or as may be needed to integrate code-set data with a particular computer system or computing environment. Regardless, document coding is performed for many reasons such as organizing, indexing, inventorying, billing, and the like. The documents may be of different types for these purposes, such as legal documents which may include evidentiary documents, medical records of procedures and services provided, academic and technical articles and papers, and others.

[0002] Initially, document coding was performed manually. There has been an ongoing effort for electronically processing documents for automatic coding. These efforts have progressed but are generally rule-driven. Such rules often provide one-to-one or many-to-one mapping of words or a semantic meaning to one code. These rules are typically inflexible, difficult to define and update, and generally expensive to maintain due to hard-coding within computer programs or components thereof and the computer code and complexity of the rules generally being inaccessible to non-expert computer-coding employees.

BRIEF SUMMARY

[0003] Aspects of the present disclosure can relate to a system, method, and computer-readable storage medium for a computing apparatus populating a portion of a tree data structure in a decision tree encoder. The method includes accessing, via the computing apparatus, a molecular data structure that includes a concept molecule. At least one of the concept atoms in the concept molecule is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom. The method includes populating, based on the molecular data structure, at least one node of an existing tree data structure that is derived from the decision tree encoder to form a populated tree data structure and determining whether the populated tree data structure satisfies a condition. Based on the condition being satisfied, the computing apparatus can perform at least one operation.

[0004] The above summary is not intended to describe each embodiment or every implementation of the disclosure. The Figures and the detailed description that follow more particularly exemplify illustrative embodiments.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

[0005] To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.

[0006] FIG. 1 illustrates a system 100 in accordance with one embodiment.

[0007] FIG. 2 illustrates a method 200 in accordance with one embodiment.

[0008] FIG. 3 illustrates a concept molecule 300 which is an example of a concept molecule (CM) derived from the input noun phrase “open fracture of distal radius” according to an example embodiment.

[0009] FIG. 4 illustrates a logical block diagram of a correlation engine 400 in accordance with one embodiment.

[0010] FIG. 5 illustrates a semantic network 500 in accordance with one embodiment.

[0011] FIG. 6 illustrates a tree data structure 600 in accordance with one embodiment.

[0012] FIG. 7 is a block schematic diagram of a computer system to implement one or more example embodiments.

[0013] FIG. 8 is a logical block diagram of a system architecture for use in generating concept molecules and converting the concept molecules into codes according to an example embodiment.

DETAILED DESCRIPTION

[0014] Aspects of the present disclosure relate to a method, system, and computer readable storage medium for using a molecular data structure to populate (i.e., pre-fill) a tree data structure from a decision tree encoder. The tree data structure can be used to determine a medical code.

[0015] While molecular data structures are known to take unstructured language and to be used to produce medical codes, taking the output from the molecular data structure and using it to pre-fill an existing tree data structure corresponding to a medical code scheme is not known.

[0016] Decision tree encoders can function as hierarchical lists of medical codes. To arrive at the right code, the user may have to cycle through each level (i.e., nodes) of the tree data structure, which can be time consuming. Previously, a user would have to select all of the nodes of the tree data structure in a decision tree encoder as part of a question-and-answer process until the leaf node is reached.

[0017] By using the molecular data structure to pre-fill in the existing tree data structure for a decision tree encoder, the user can shortcut intermediate nodes to arrive at the medical code faster and more efficiently, which is a practical application of the present disclosure. This technique can also allow the user to avoid unnecessary input operations (e.g., keystrokes) and prolong the life of input devices such as keyboards and mice.

[0018] FIG. 1 illustrates a high-level overview of a system 100 for performing aspects of the present disclosure. In at least one embodiment, the system 100 is a computing apparatus configured to utilize the semantic representations described herein. The system 100 can include input 120. The input 120 can be either auditory or text input. For example, system 100 can receive spoken language from a user, or written language from a corpus of medical documents. The system 100 can convert the input 120 into words 102. In at least one embodiment, the words 102 can include phrases such as noun phrases which can be used to create semantic representations of the phrase.

[0019] The words 102 are converted into a molecular data structure 104 using various known techniques. For example, a method of converting words (e.g., arranged in sentences or noun phrase structures) into a molecular data structure 104 is described in U.S. Provisional Patent Application No. 62/786,473, titled Concept Molecular Data Structure Generator, filed December 31, 2018, which is incorporated by reference. In at least one embodiment, the molecular data structure 104 and the tree data structure 106 can be created with the decision tree encoder 108 software. In at least one embodiment, a plurality of concept molecules 114 can be stored in a data store and accessed at a later time by the decision tree encoder 108, thus reducing the processing time needed to translate frequently used words 102 into the molecular data structure 104.

[0020] The extraction of semantics can occur via natural language processing, the findings of which are utilized to generate a semantic graph. The natural language processing is performed to find meaning from words. The meanings are represented by the concepts of the semantic graphs. There is a distinction between concept atoms 116 and concept molecules 114. The atomic concepts (concept atom 116) are the building blocks of the composite concepts (concept molecule 114). Concept molecules 114 are represented in semantic graphs as data structures built from atoms that are arranged considering the semantic relations between them. The arrangement distinguishes between hierarchic and attributive relations.

[0021] In the graphical representation of a molecule, hierarchic relations are shown horizontally and attributive relations vertically. The hierarchical relationship can be between a concept and its subconcepts, the attributive one between a concept and its attributes.
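The two relation kinds described above can be sketched in code. The following is a minimal illustration only, not the data structure of the disclosure: the class name `ConceptAtom`, its fields, the helper `hierarchy_chain`, and the semantic type "modifier" are all assumptions introduced for this example. It models the first line of the concept molecule for "open fracture of distal radius" (diagnosis, injury, fracture) with "open" attached as an attribute.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptAtom:
    """An atomic concept. Field names are illustrative assumptions."""
    label: str
    semantic_type: str
    # Hierarchic relation: concept -> subconcepts (drawn horizontally).
    subconcepts: List["ConceptAtom"] = field(default_factory=list)
    # Attributive relation: concept -> attributes (drawn vertically).
    attributes: List["ConceptAtom"] = field(default_factory=list)

    def add_subconcept(self, atom: "ConceptAtom") -> "ConceptAtom":
        self.subconcepts.append(atom)
        return atom

    def add_attribute(self, atom: "ConceptAtom") -> "ConceptAtom":
        self.attributes.append(atom)
        return atom


def hierarchy_chain(root: ConceptAtom) -> List[str]:
    """Follow the first subconcept at each level and collect the labels."""
    chain = [root.label]
    node = root
    while node.subconcepts:
        node = node.subconcepts[0]
        chain.append(node.label)
    return chain


# First line of the molecule: diagnosis -> injury -> fracture.
diagnosis = ConceptAtom("diagnosis", "diagnosis")
injury = diagnosis.add_subconcept(ConceptAtom("injury", "diagnosis"))
fracture = injury.add_subconcept(ConceptAtom("fracture", "diagnosis"))
# "open" is an attribute of "fracture"; attributes are also atoms.
fracture.add_attribute(ConceptAtom("open", "modifier"))
```

Keeping the two relation kinds in separate lists means a traversal can follow only the hierarchy, only the attributes, or both, which mirrors the horizontal and vertical distinction in the graphical representation.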

[0022] A semantic graph includes at least a concept molecule 114 with at least a concept atom 116 but may also include one or more other concept molecules 114 that each include one or more atoms. Thus, a conceptual semantic graph may be referred to as at least one concept molecule 114 built from at least one atom. In some embodiments, a code of a coding scheme to which a concept molecule 114 has been associated may be included as an atom of the concept molecule 114.

[0023] The content of an input noun phrase, i.e., its semantics, is more than just the words of the phrase. What is lacking if you look only at the words are the implicit contents additional to the words and the implicit connections of all the concepts activated by the words. Both are made explicit in concept molecules (CMs).

[0024] The system 100 can convert the molecular data structure 104 into a tree data structure 106. The tree data structure 106 can be a hierarchical representation of the data from decision tree encoder 108. The tree data structure 106 can include a plurality of nodes 112. In at least one embodiment, the decision tree encoder 108 organizes the medical codes 110 as nodes 112 within the tree data structure 106. The nodes 112 can include root node, parent nodes, child node, and/or leaf nodes. In at least one embodiment, the medical codes 110 can be stored in a separate library or data store.

[0025] The system 100 can include a user interface 118 to display the populated tree data structure 124 (which can be populated from at least part of the molecular data structure 104). The populated tree data structure 124 can be presented to the user and the user can interact with the populated tree data structure 124 using the input device 122. In at least one embodiment, the user can add further information to the various nodes 112 in order to make the populated tree data structure 124 satisfactory, as described further herein.

[0026] FIG. 2 illustrates a method 200 of how the computing apparatus uses a molecular data structure to pre-fill an existing tree data structure from a decision tree encoder.

[0027] In block 202, the computing apparatus can optionally generate a molecular data structure. Generation of a molecular data structure can occur using techniques described herein. In at least one embodiment, the generation occurs based on a corpus of text from associated medical documents. In at least one embodiment, the molecular data structure can be determined based on user input. A plurality of concept molecules can be determined previously (e.g., by experts) and assembled into a library and stored on a data store (existing either within the computing apparatus or external to the computing apparatus). In at least one embodiment, block 202 can occur prior to or contemporaneously with block 204.

[0028] In block 204, the computing device can access the molecular data structure. For example, the computing apparatus can read a plurality of concept molecules from the data store. In at least one embodiment, the molecular data structure can be created contemporaneously as words are provided to the computing apparatus (e.g., natural language processing including speech or text input).

[0029] In at least one embodiment, the accessing can include the generation of the molecular data structure in block 202. For example, the computing apparatus can access the molecular data structure by first receiving a sentence (e.g., an arrangement of words such as a noun-phrase) by the computing apparatus. The sentence can be received via natural language processing or from a document. After the sentence is received, then the computing apparatus can generate the molecular data structure based on the sentence.

[0030] In block 206, the computing apparatus can populate a node (or plurality of nodes) of an existing tree data structure. The computing apparatus can first determine the correlation (e.g., using a correlation engine 400) of any portion of the molecular data structure (e.g., atoms) to the lower nodes (e.g., a child node or leaf node) of the tree data structure. For example, the computing apparatus can populate at least one node of an existing tree data structure that is derived from a decision tree encoder to form a populated tree data structure.

[0031] In at least one embodiment, the computing apparatus can first determine a list of potential medical codes from the molecular data structure as described in Concept Molecular Data Structure Generator and then populate at least one node in the existing tree data structure by identifying a node based on a potential medical code from the list. For example, the computing apparatus can determine whether the code is an attribute of the concept atom.

[0032] In at least one embodiment, the existing tree data structure can be based on a plurality of medical codes. For example, the plurality of medical codes can be indexed according to a coding scheme inherent in the medical code. For example, an ICD-10 CM code for broken internal joint prosthesis is T84.01, which is indexed and dependent from “complications of internal orthopedic prosthetic device” (T84).
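As a rough sketch of how a tree could follow the indexing inherent in the codes, the example below nests each code under its shorter prefixes, so that T84.01 ends up beneath T84. The function names `code_ancestors` and `build_tree` are hypothetical, and treating every added character after the decimal point as one tree level is a simplifying assumption, not a rule stated in the disclosure.

```python
from typing import Dict, Iterable, List


def code_ancestors(code: str) -> List[str]:
    """Return the chain from the three-character category down to `code`.

    Assumption: each extra character after the decimal point adds one level.
    """
    category, _, extension = code.partition(".")
    chain = [category]
    for i in range(1, len(extension) + 1):
        chain.append(f"{category}.{extension[:i]}")
    return chain


def build_tree(codes: Iterable[str]) -> Dict[str, dict]:
    """Nest codes into a dict-of-dicts tree keyed by code."""
    tree: Dict[str, dict] = {}
    for code in codes:
        level = tree
        for step in code_ancestors(code):
            # Create the node on first sight, then descend into it.
            level = level.setdefault(step, {})
    return tree


# Illustrative input: two codes that share the T84 category.
tree = build_tree(["T84.01", "T84.02"])
```

Here `tree` contains a single top-level key, `"T84"`, with both leaf codes nested beneath the intermediate `"T84.0"` node.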

[0033] As described herein, the nodes for existing tree data structure can be further populated with information from the plurality of concept molecules (and plurality of concept atoms). For example, one or more of the data fields from the plurality of concept atoms can match with the one or more data fields from the nodes.

[0034] In at least one embodiment, the computing apparatus can first identify the concept atoms having the same semantic type. For example, the concept atoms that have the same semantic type can be concept atoms that are all nouns with modifiers (or verbs when used to evaluate procedures). In another example, concept atoms having the same semantic type can be concept atoms that are dependent from each other and all relate to a master concept (such as injury). The identified concept atoms can have a hierarchical relationship within the concept molecule.

[0035] The computing apparatus can determine the correlation between the identified concept atoms and a plurality of nodes from the existing tree data structure. In at least one embodiment, the computing apparatus can determine the correlation between the concept atom and the node. In at least one embodiment, the computing apparatus can use data analysis techniques used to find correlation between two hierarchical list structures. For example, the correlation can be based on a tree-edit distance involving a number of matrix permutations.

[0036] In at least one embodiment, the computing apparatus can determine the correlation by first determining a terminal concept atom from the identified concept atoms. Once determined, the computing apparatus can correspond the terminal concept atom to a first node (e.g., leaf or child node) within the existing tree data structure. In at least one embodiment, the correlation can be a numerical value. For example, the computing apparatus can use the Jaccard similarity or cosine similarity between data values.
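One way to obtain a numerical correlation of this kind is a Jaccard similarity over word sets. The sketch below is an illustration under stated assumptions: the helpers `tokens`, `jaccard`, and `best_node` are hypothetical, whitespace tokenization is a simplification, and the node descriptions are example text rather than data from the disclosure.

```python
from typing import Dict, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase, whitespace-split word set (a deliberately simple tokenizer)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_node(atom_label: str, node_descriptions: Dict[str, str]) -> Tuple[str, float]:
    """Return the (code, score) pair whose description best matches the atom."""
    atom_tokens = tokens(atom_label)
    scored = {
        code: jaccard(atom_tokens, tokens(description))
        for code, description in node_descriptions.items()
    }
    code = max(scored, key=scored.get)
    return code, scored[code]


# Illustrative candidate nodes and descriptions.
nodes = {
    "S52": "fracture of forearm",
    "S62": "fracture at wrist and hand level",
}
code, score = best_node("forearm fracture", nodes)
```

With these inputs the atom shares two of three distinct words with the S52 description and one of seven with S62, so S52 is selected. A cosine similarity over term vectors could be substituted for `jaccard` without changing `best_node`.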

[0037] Once the correlation between the terminal concept atom and the first node is determined, the computing apparatus can determine the correlation between a predecessor concept atom and a second node (e.g., a child node or a parent node) within the existing tree data structure. The computing apparatus can determine whether the terminal concept atom and the predecessor concept atom correlate with the first node and the second node within the existing tree data structure.

[0038] In at least one embodiment, the existing tree data structure may not be present initially (i.e., no existing tree data structure or a blank existing tree data structure) and may need to be built from the medical code data structure. For example, the populating can include adding a root node to the existing tree data structure. In at least one embodiment, the root node can be the index of level 0 node. For example, the root node can be abstracted high-level 2020 ICD-10-CM codes.

[0039] Next, the computing apparatus can add a plurality of nodes to the existing tree data structure to form a populated tree data structure. The plurality of nodes comprises at least a parent node and a child node. Relationships amongst the plurality of nodes in the tree data structure are based on prior associations from a medical code (e.g., an ICD-10 medical code). In the above example, the parent nodes (e.g., level 1 nodes) can be the groups of codes, e.g., A00-B99, C00-D49...Z00-Z99. The child nodes (e.g., level 2 nodes) can be more granular details of a level 1 node. For example, for the A00-B99 parent node, the child nodes can be A00-A09, A15-A19...B99-B99. In at least one embodiment, the parent node can correspond to a first level concept atom and the child node can correspond to a second level concept atom. For example, the first level concept atom can be “diseases of the respiratory system,” and the second level concept atom can be “influenza and pneumonia.”
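The level structure described above can be sketched as follows. This is a hedged illustration: the `Node` class, the `populate` function, and the root code string "ICD-10-CM" are assumptions made for the example, and exact label matching stands in for whatever correlation the computing apparatus actually applies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node holding a code (or code range) and a description."""
    code: str
    label: str = ""
    level: int = 0
    children: List["Node"] = field(default_factory=list)

    def add_child(self, code: str, label: str = "") -> "Node":
        child = Node(code, label, self.level + 1)
        self.children.append(child)
        return child


def populate(root: Node, atom_labels: List[str]) -> List[str]:
    """Walk down from the root, selecting at each level the child whose
    label matches the concept atom for that level. Stops at the first miss."""
    selected = []
    node = root
    for label in atom_labels:
        match = next(
            (c for c in node.children if c.label.lower() == label.lower()), None
        )
        if match is None:
            break
        selected.append(match.code)
        node = match
    return selected


# Level 0 root, level 1 parents (groups of codes), level 2 children.
root = Node("ICD-10-CM")
infectious = root.add_child("A00-B99")
infectious.add_child("A00-A09")
respiratory = root.add_child("J00-J99", "diseases of the respiratory system")
flu = respiratory.add_child("J09-J18", "influenza and pneumonia")

path = populate(
    root, ["diseases of the respiratory system", "influenza and pneumonia"]
)
```

In this sketch `path` lists the pre-filled nodes from level 1 downward, which are the intermediate selections a user would otherwise make by hand.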

[0040] In response to a correlation between the concept atoms and nodes within the existing tree data structure, the computing apparatus can present the nodes, via the user interface, that correspond to the identified concept atoms in block 208. For example, the computing apparatus can present the populated tree data structure. In at least one embodiment, presenting the populated tree data structure can include highlighting elements (e.g., medical codes and the associated descriptions) that correspond to the existing tree data structure. In at least one embodiment, the computing apparatus can present a query path that is a sequence of clinical codes based on the populated tree data structure. The query path can also include various requests for information from the user to satisfy the condition or further populate the populated tree data structure.

[0041] In decision block 210, the computing apparatus can determine whether the populated tree data structure satisfies a condition. In at least one embodiment, the condition can be whether the user believes that the existing tree data structure is populated fully. For example, the computing apparatus can prompt a user, via the user interface, whether the populated tree data structure is satisfactory and can receive the indication from the user that the populated tree data structure is satisfactory. In at least one embodiment, the condition can be satisfied when at least one leaf node or child node matches a concept atom (e.g., a terminal concept atom). The condition can be satisfied when a threshold number of leaf nodes or child nodes or a percentage of leaf nodes out of the total leaf nodes matches the concept atoms.
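The count-based and percentage-based variants of the condition could be checked along the following lines. The function `condition_satisfied` and its parameters `min_count` and `min_fraction` are hypothetical names chosen for this sketch, and the disclosure does not specify particular threshold values.

```python
from typing import Iterable, Optional


def condition_satisfied(
    leaf_codes: Iterable[str],
    matched_codes: Iterable[str],
    min_count: int = 1,
    min_fraction: Optional[float] = None,
) -> bool:
    """Return True when enough leaf nodes match concept atoms.

    If `min_fraction` is given, compare the share of matched leaves against
    it; otherwise require at least `min_count` matched leaves.
    """
    leaves = set(leaf_codes)
    matched_leaves = leaves & set(matched_codes)
    if min_fraction is not None:
        if not leaves:
            return False
        return len(matched_leaves) / len(leaves) >= min_fraction
    return len(matched_leaves) >= min_count


# Illustrative leaf codes and a single matched code.
leaves = ["J12.0", "J12.1", "J12.2", "J12.9"]
matched = {"J12.0"}
```

With one of four leaves matched, the default check (at least one match) passes, while a check with `min_fraction=0.5` does not. A user confirmation via the user interface, as also described above, would be a separate condition outside this function.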

[0042] In block 212, the computing apparatus can perform at least one operation in response to the populated tree data structure satisfying a condition. In at least one embodiment, the operation can be presenting, via a user interface of the computing apparatus and in response to the populated tree data structure being incomplete or not satisfied, the populated tree data structure including a node that is at least two levels from a root node. The computing apparatus can receive input from the user, via the user interface, to fill in a leaf node of the populated tree data structure. For example, the computing apparatus can present the populated tree data structure and then ask the user for more information for descendant nodes. In at least one embodiment, the operation can include determining the medical code based on the input of the leaf node (from the user) and communicating the medical code to the user.

[0043] FIG. 3 is an example of a CM 300, derived from the input noun phrase “open fracture of distal radius.” CM 300 consists of eleven atomic concepts bound together to one composite structure which is referred to as the concept molecule (CM). CM 300 is a cutout of a semantic network that represents the semantics of a specialty domain in a most complete and structured manner and which cutout represents the semantics of the input noun phrase.

[0044] The CM 300 is composed of atoms. The first line of the CM 300 includes three atoms, diagnosis 305, injury 314, and fracture 302. The fracture atom 302 has an attribute of open 320. The open attribute 320 is shown as related to the fracture atom 302 by a link 325. Attributes are also atoms.

[0045] The links between the atomic concepts are represented in the CM structure in a way that shows how the single atomic concepts are arranged. Every atom on the same line is of the same semantic type, e.g., all three atoms 305, 314, and 302 of the first line are of the type “diagnosis,” all four atoms 330, 335, 306, and 308 of the second line are of the type “localization,” and both atoms 350 and 310 of the third line are of the type “organ.” At the same time the two lines of “localization,” atom 330 and “bone” atom 350, represent attributes of the atom “diagnosis.” The “open” atom 304 represents an attribute of “fracture” and the “distal” atom 312 represents an attribute of “bone” and is linked as shown at 365. Atoms 330 and 350 are linked to diagnosis atom 305 via link 370.

[0046] CM 300 shows many implicit meanings, not literally mentioned in the input text. Concepts like “diagnosis,” “injury,” “limb,” “upper limb,” “forearm” and “bone” are all covered by the input, but not explicitly mentioned. They appear explicitly in the CM.

[0047] The structure of CM 300 only shows atoms and links which are necessary for the content found in the input text. The structure, however, behind the concept molecule, i.e., the semantic network, has potentially links to more types of attributes than just the ones shown in CM 300.

[0048] In at least one embodiment, the computing apparatus can determine the terminal concept atom from the identified concept atoms. As shown in FIG. 3, the terminal concept atoms can be atom 302, atom 304, atom 308, atom 310, and atom 312.

[0049] FIG. 4 illustrates a method within a correlation engine 400 of the computing apparatus. The correlation engine 400 can map the concept atoms 426 to the tree data structure 424. The tree data structure 424 comprises medical codes as a leaf node, child node, or a parent node. For example, tree data structure 424 comprises node 402, node 404, node 406, node 408, collection of nodes 410, and collection of nodes 412 with each node depending from the node preceding it.

[0050] The concept atoms 426 are hierarchical for the concept molecule 300. A concept atom can correspond to at least one of the nodes of the tree data structure 424. For example, the concept atom 430 (i.e., atom 308) can descend from concept atom 428 (i.e., atom 306). Since the two terms are linked, the correlation engine 400 can match upper limb and forearm to a localization of the injury to match to node 404. Concept atom 420 (i.e., atom 302) can further clarify that node 406 corresponds to a fracture. The correlation engine 400 can determine that concept atom 416 (i.e., atom 310) and concept atom 418 (i.e., atom 312) can correspond to node 408.

[0051] At this phase, the computing apparatus can further determine correlation by correlating the terminal concept atom to a first node within the existing tree data structure. For example, concept atom 416 can map to node 408 having an ICD-10-CM code of S52.5. The computing apparatus can also perform correlation of a predecessor concept atom to a second node within the existing tree data structure. For example, concept atom 430 can map to node 406 having an ICD-10-CM code of S52. Thus, the partial hierarchy of the tree data structure 424 can be established and, from the partial hierarchy, the correlation engine 400 can determine whether the terminal concept atom and the predecessor concept atom correlate with the first node and the second node within the existing tree data structure.

[0052] Further toward the leaves, an expanded tree of child nodes can be present in collection of nodes 410 corresponding to codes S52.50 through S52.59. By further utilizing the concept atom 420 and concept atom 418, the collection of nodes 410 can be further narrowed to leaf nodes of collection of nodes 410. For example, since open fractures are only for B, C, E, F, H, J, M, N, Q, R codes, a segment of leaf nodes can be excluded and not presented to a user via user interface 414. Thus, the user can be presented with only the most relevant child nodes (and codes), allowing the user to be more efficient.
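The narrowing of leaf nodes for an open fracture can be illustrated with a simple filter on the final character of each candidate code. This is a sketch only: the function `narrow_to_open` is hypothetical, the candidate list is example data, and reading the last character of the code string as the relevant extension is an assumption made for the example.

```python
from typing import List

# Final characters that the description above associates with open fractures.
OPEN_FRACTURE_SUFFIXES = set("BCEFHJMNQR")


def narrow_to_open(leaf_codes: List[str]) -> List[str]:
    """Keep only leaf codes whose final character marks an open fracture."""
    return [code for code in leaf_codes if code[-1] in OPEN_FRACTURE_SUFFIXES]


# Illustrative candidate leaves under the S52.50 branch.
candidates = ["S52.501A", "S52.501B", "S52.501C", "S52.501D"]
remaining = narrow_to_open(candidates)
```

Here the "A" and "D" candidates are dropped, so only the two open-fracture leaves would be presented via the user interface.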

[0053] FIG. 5 illustrates a semantic network 500 comprising both concept molecule 502 and concept molecule 504. The concept molecule 502 can include the concept atom 506 and concept atom 508. The concept atom 510 can be an attribute of concept atom 508 and have attribute 512 mapping to medical code J12.9. Concept atom 514 can be dependent on concept atom 510 and have attribute 524 mapping to code J12.0.

[0054] As shown, the semantic network 500 can include associated medical codes as an attribute of a concept atom. This medical code association can be collected during the generation of the molecular data structure discussed herein. Examples of this technique are further discussed in Semantic Graph Textual Coding, U.S. Prov. App. No. 62/692048, filed June 29, 2018, also filed as international application PCT/IB2019/055418 on June 26, 2019, which subsequently published as WO2020/003174 on April 30, 2020, each of which is incorporated by reference.

[0055] The concept molecule 504 can have concept atom 516 with attribute 518 mapping to medical code BB24 and concept atom 522 with attribute 520 mapping to medical code BB24ZZ.

[0056] FIG. 6 illustrates tree data structure 600 corresponding to a portion of the concept molecule 502. The root node 610 can spawn parent node 608. A pneumonia child node 606 can depend from infection parent node 608. Viral child node 604 can further have adenoviral leaf node 602.

[0057] The tree data structure 600 can have multiple levels. For example, root node 610 can exist in level 0, parent node 608 can exist in level 1, child node 606 in level 2, child node 604 in level 3, and leaf node 602 in level 4. In at least one embodiment, at least one node from level 2 or higher matches a concept atom.

[0058] In at least one embodiment, the computing apparatus determines correlation based on the medical code determined in FIG. 5. For example, child node 604 can correspond to attribute 512 and leaf node 602 corresponds to attribute 524 based on the medical code matching. This technique can take away some uncertainties in textual analysis. In at least one embodiment, the child node 604 and leaf nodes in level 4 can form a sub-tree.

[0059] Since leaf node 602 was the highest level (i.e., most terminal) that matched, then the computing apparatus can display that the diagnosis was adenoviral-caused pneumonia, which is a reimbursable medical code. In at least one embodiment, if the highest level determined was J12.9, then the computing apparatus could display J12.0, J12.2, and J12.81 for the user to select from via the user interface.
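The display behavior described above, where a specific code is shown directly and an unspecified code leads to a list of alternatives, might be sketched as follows. The function `codes_to_display`, the `unspecified` set, and the ordering of the sibling list are assumptions for illustration; only the codes themselves come from the description.

```python
from typing import Iterable, List, Set


def codes_to_display(
    deepest_match: str, siblings: Iterable[str], unspecified: Set[str]
) -> List[str]:
    """Return the matched code if it is specific; otherwise return the
    more specific sibling codes for the user to choose from."""
    if deepest_match in unspecified:
        return [code for code in siblings if code != deepest_match]
    return [deepest_match]


# Codes under the viral pneumonia node, as named in the description.
viral_pneumonia_codes = ["J12.0", "J12.2", "J12.81", "J12.9"]
# Assumption: J12.9 is treated as the unspecified entry in this group.
unspecified_codes = {"J12.9"}

specific = codes_to_display("J12.0", viral_pneumonia_codes, unspecified_codes)
choices = codes_to_display("J12.9", viral_pneumonia_codes, unspecified_codes)
```

Passing the unspecified codes explicitly, rather than inferring them from the code string, keeps the sketch free of assumptions about how a particular coding scheme marks unspecified entries.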

[0060] FIG. 7 is a block schematic diagram of a computing system 700 to implement the coding system and for performing methods and algorithms according to example embodiments. All components need not be used in various embodiments.

[0061] One example computing device in the form of a computing apparatus 702 may include a processing unit 706, memory 704, removable storage 714, and non-removable storage 716. Although the example computing apparatus is illustrated and described as computing apparatus 702, the computing apparatus may be in different forms in different embodiments. For example, the computing apparatus may instead be a smartphone, a tablet, smartwatch, smart storage device (SSD), or other computing device including the same or similar elements as illustrated and described with regard to FIG. 7. Devices, such as smartphones, tablets, and smartwatches, are generally collectively referred to as mobile devices or user equipment.

[0062] Although the various data storage elements are illustrated as part of the computing apparatus 702, the storage may also or alternatively include cloud-based storage accessible via a network, such as the Internet or server-based storage. Note also that an SSD may include a processor on which the parser may be run, allowing transfer of parsed, filtered data through I/O channels between the SSD and main memory.

[0063] Memory 704 may include volatile memory 710 and non-volatile memory 712. Computing apparatus 702 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as volatile memory 710 and non-volatile memory 712, removable storage 714 and non-removable storage 716. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.

[0064] Computing apparatus 702 may include or have access to a computing environment that includes input 718, output 720, and a communication connection 722. Output 720 may include a display device, such as a touchscreen, that also may serve as an input device. The input 718 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computing apparatus 702, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common data flow network switch, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Wi-Fi, Bluetooth, or other networks. According to one embodiment, the various components of computing apparatus 702 are connected with a system bus.

[0065] Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 706, such as a program 708. The program 708 in some embodiments comprises software to implement one or more of the methods of generating and completing concept molecules and assigning codes to noun phrases. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms computer-readable medium and storage device do not include carrier waves to the extent carrier waves are deemed too transitory. Storage can also include networked storage, such as a storage area network (SAN). Computer program 708 along with the workspace manager may be used to cause processing unit 706 to perform one or more methods or algorithms described herein.

[0066] In at least one embodiment, the program 708 can also comprise the correlation engine 400 described herein. In at least one embodiment the computing apparatus 702 can communicate with the data store 724. For example, the data store 724 may store the plurality of concept molecules that were previously determined. Thus, the concept molecules can be uploaded into the data store 724 once determined. In at least one embodiment, a second computing apparatus 726 can be used at a different location to input commonly used concept molecules into data store 724. The computing apparatus 702 can access the data store 724 to perform aspects of the present disclosure.

[0067] FIG. 8 is a logical block diagram of a system 800 architecture, according to an example embodiment. The system 800 includes a knowledge engineering portion 810 and a system (application) portion 815. The knowledge engineering portion takes advantage of one or more human experts 820 in the specialty domain via a knowledge base editor program 825 to capture the knowledge of the experts and generate a knowledge base 830 stored in the data store 724 or otherwise accessible to system portion 815.

[0068] An encoding program 835 (i.e., part of program 708) receives free text, such as input noun phrases via input 840. Input 840 may be a storage device or buffer that receives text from any type of input. The encoding program 835 creates the concept molecules utilizing rules stored in the knowledge base 830. Both the concept molecules and the rules are cutouts from a semantic graph of a specialty domain. The encoding program 835 may also utilize already completed concept molecules and in this case just produce medical codes 845 for one or more coding systems.

[0069] List of illustrative embodiments:

1. A method of using a computing apparatus to populate a portion of a tree data structure comprising: accessing, via the computing apparatus, a molecular data structure that includes a concept molecule, the molecular data structure further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; populating, by the computing apparatus and based on the molecular data structure, at least one node of an existing tree data structure that is derived from a decision tree encoder to form a populated tree data structure; determining whether the populated tree data structure satisfies a condition; and performing at least one operation based on the condition being satisfied.

2. The method of embodiment 1, wherein performing at least one operation comprises: presenting, via a user interface of the computing apparatus and in response to the populated tree data structure being incomplete, the populated tree data structure including a node that is at least two levels from a root node; and receiving input from the user, via the user interface, to fill in a leaf node of the populated tree data structure.

2a. The method of any of the preceding embodiments, wherein the method is not performed in the human mind.

2b. The method of any of the preceding embodiments, wherein the method, based on using the computing apparatus, results in increasing coding efficiency of the user in medical coding by allowing the user to avoid selecting some nodes within a decision tree encoder.

3. The method of any of embodiments 1 to 2, wherein determining whether the populated tree data structure satisfies the condition comprises prompting a user, via a user interface, whether the populated tree data structure is satisfactory.

4. The method of any of embodiments 1 to 3, wherein determining whether the populated tree data structure satisfies a condition comprises determining that a terminal concept atom matches a leaf node.

5. The method of embodiment 2, wherein performing at least one operation comprises determining the medical code based on the input of the leaf node.

6. The method of any of embodiments 1 to 5, wherein the existing tree data structure comprises medical codes as a leaf node, child node, or a parent node.

7. The method of any of embodiments 1 to 6, wherein accessing the molecular data structure comprises: receiving a sentence by the computing apparatus; and generating the molecular data structure based on the sentence.

8. The method of embodiment 9, wherein the plurality of nodes is from a sub-tree.

9. The method of any of embodiments 1 to 8, wherein populating the existing tree data structure comprises: identifying concept atoms having a same semantic type; determining a correlation between the identified concept atoms and a plurality of nodes from the existing tree data structure; and in response to the correlation with the nodes, presenting the nodes, via the user interface, that correspond to the identified concept atoms.

10. The method of embodiment 9, wherein the correlation is based on a tree-edit distance based on a number of matrix permutations.

11. The method of embodiment 9, wherein determining the correlation comprises: determining a terminal concept atom from the identified concept atoms; correlating the terminal concept atom to a first node within the existing tree data structure; correlating a predecessor concept atom to a second node within the existing tree data structure; and determining whether the terminal concept atom and the predecessor concept atom correlates with the first node and the second node within the existing tree data structure.

12. The method of embodiment 11, wherein the first node is a child node or a leaf node and the second node is a parent node.

13. The method of any of embodiments 1 to 12, wherein a plurality of concept molecules is previously determined and stored in a data store.

14. The method of any of embodiments 1 to 13, wherein the decision tree encoder comprises a library further comprising an index of medical codes and a description of each medical code.

15. The method of any of embodiments 1 to 14, further comprising determining a list of potential medical codes from the molecular data structure, and wherein populating at least one node comprises identifying a node based on a potential medical code.

16. The method of any of embodiments 1 to 15, wherein the molecular data structure is a cutout of a semantic network that represents the semantics of a specialty domain in a most complete and structured manner and which the cutout represents semantics of an input noun phrase.

17. The method of embodiment 16, wherein codes are assigned based on a completed data structure and the codes correspond to semantics of a specialty subject.

18. The method of any of embodiments 1 to 17, wherein one or more additional concepts have associated attributes.

18a. The method of any of embodiments 1 to 18, further comprising generating the molecular data structure by: receiving a sentence; identifying a master type of concept as a function of meanings of the sentence; generating a molecule data structure having the master type as a top level of the molecule data structure; and inserting additional concepts in the molecule data structure based on molecule data structure rules having an equivalent molecule data structure to complete the molecule data structure.

19. The method of any of embodiments 1 to 18, wherein the sentence is an input noun phrase.

20. The method of embodiment 19, further comprising: generating the molecule data structure by interpreting text of the input noun phrase using rules of a domain specific knowledge base in a stepwise manner such that one rule after the other makes a small interpretation change to the input, until the final molecule data structure is created.

21. The method of embodiment 20, wherein the rules also have a molecule data structure and are thus also cutouts of the same domain specific semantic network as all other molecule data structures, wherein the rules have a dynamic potential with which they are able to transform other molecule data structures, and contrary to these “dynamic” rules, the non-rules molecule data structures comprising inputs, outputs and intermediate states of the interpretation are of purely “static” nature and do not have the ability to change other molecule data structures, but represent a momentary state of interpretation.

22. The method of embodiment 21, wherein the rules have operators assigned to one or more of their atoms, with which operators the method checks the matching of the rule to a given input and executes the changes to it.

23. The method of embodiment 22, wherein associated attributes are included in the molecule data structure in response to no value being provided by the input noun phrase at this site.

24. The method of embodiment 23, wherein including attributes is either performed by a rule calling the attribute explicitly by its name or by a rule with a placeholder character in it which acts as a signal to copy the concept to be attributed from another site of the interim state of interpretation to the correct site of the developing molecule data structure.

25. The method of embodiment 20, further comprising identifying specific sites in the input molecule data structure without an associated value by observing a placeholder character in the corresponding site of the data structure of the rule molecule.

26. The method of embodiment 20, wherein the operations further comprise presenting a question created by a rule to a user, receiving an answer specifying a value for an attribute, proposed by the said rule, and adding the received value to the molecule data structure.

27. The method of embodiment 19, further comprising uploading the molecular data structure into a data store.

28. The method of any of embodiments 1 to 27, wherein the medical code is an ICD-10-CM or ICD-10-PCS code.

29. The method of any of embodiments 1 to 28, wherein the existing tree data structure is not present, and the populating comprises: adding a root node, and adding a plurality of nodes to the existing tree data structure to form a populated tree data structure, the plurality of nodes comprises a parent node and a child node, the parent node corresponds to a first level of concept atom, the child node corresponds to a second level of concept atom, relationships amongst the plurality of nodes in the tree data structure are based on prior associations from a medical code; presenting, with a user interface, a query path that is a sequence of clinical codes based on the populated tree data structure.

30. The method of embodiment 29, further comprising: requesting user input to verify a relationship between a child clinical concept and a parent clinical concept, the user input is associated with the child node; and adding the user input to the dictionary.

31. A non-transitory computer-readable storage medium including instructions that, when processed by a computer, configure the computer to perform the method of any of embodiments 1 to 30.

32. A computing apparatus that populates a portion of a tree data structure, the computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: access, via a computing apparatus, a molecular data structure that includes a concept molecule, the molecular data structure further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; populate, by the computing apparatus and based on the molecular data structure, at least one node of an existing tree data structure that is derived from a decision tree encoder to form a populated tree data structure; determine whether the populated tree data structure satisfies a condition; and perform at least one operation based on the populated tree data structure satisfying the condition.

33. The computing apparatus of embodiment 32, further comprising: a user interface using a display and an input device, wherein performing at least one operation comprises: present, via the user interface and in response to the populated tree data structure being incomplete, the populated tree data structure including a node that is at least two levels from a root node; and receive input from the user, via the input device, to complete a leaf node of the populated tree data structure.

33a. The computing apparatus of any of the preceding embodiments, wherein using the computing apparatus results in increasing coding efficiency of the user in medical coding by allowing the user to avoid selecting some nodes within a decision tree encoder.

33b. The computing apparatus of embodiment 33a, wherein using the computing apparatus results in longer lifespan of the input device.

34. The computing apparatus of any of embodiments 32 to 33, wherein determining whether the populated tree data structure satisfies a condition comprises prompting a user, via a user interface, whether the populated tree data structure is satisfactory.

35. The computing apparatus of any of embodiments 32 to 34, wherein determining whether the populated tree data structure satisfies a condition comprises determining that a terminal concept atom matches a leaf node.

36. The computing apparatus of any of embodiments 32 to 35, wherein performing at least one operation comprises determining the medical code based on the input of the leaf node.

37. The computing apparatus of any of embodiments 32 to 36, wherein the existing tree data structure comprises medical codes as a leaf node, child node, or a parent node.

38. The computing apparatus of any of embodiments 32 to 37, wherein accessing the molecular data structure comprises: receive a sentence by the computing apparatus; and generate the molecular data structure based on the sentence.

39. The computing apparatus of any of embodiments 32 to 38, wherein a plurality of nodes is from a sub-tree.

40. The computing apparatus of any of embodiments 32 to 39, wherein populating the existing tree data structure comprises: identify concept atoms having a same semantic type; determine a correlation between the identified concept atoms and a plurality of nodes from the existing tree data structure; in response to the correlation with the nodes, present nodes, via the user interface, that correspond to the identified concept atoms.

41. The computing apparatus of embodiment 40, wherein the correlation is based on a tree-edit distance based on a number of matrix permutations.

42. The computing apparatus of embodiment 40, wherein determining the correlation comprises: determine a terminal concept atom from the identified concept atoms; correspond the terminal concept atom to a first node within the existing tree data structure; correspond a predecessor concept atom to a second node within the existing tree data structure; and determine whether the terminal concept atom and the predecessor concept atom correlates with the first node and the second node within the existing tree data structure.

43. The computing apparatus of embodiment 42, wherein the first node is a child node or a leaf node and the second node is a parent node.

44. The computing apparatus of any of embodiments 32 to 43, wherein the decision tree encoder comprises a data store comprising an index of medical codes and a description of each medical code.

45. The computing apparatus of any of embodiments 32 to 44, wherein the instructions further configure the apparatus to determine a list of potential medical codes from the molecular data structure, and wherein populating at least one node comprises identify a node based on a potential medical code.

46. The computing apparatus of any of embodiments 32 to 45, wherein the molecular data structure is a cutout of a semantic network that represents the semantics of a specialty domain in a most complete and structured manner and which the cutout represents semantics of an input noun phrase.

47. The computing apparatus of any of embodiments 32 to 46, wherein codes are assigned based on a completed data structure and the codes correspond to semantics of a specialty subject.

48. The computing apparatus of any of embodiments 32 to 47, wherein one or more additional concepts have associated attributes.
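The populating and condition-checking steps recited in the embodiments above can be pictured with a small sketch. It is illustrative only and makes several assumptions that are not part of the disclosure: the `Node` class, the `populate` and `is_complete` helpers, the example labels, and the placeholder codes `CODE-A` and `CODE-B` are all hypothetical, and exact label equality stands in for whatever correlation an embodiment actually uses.

```python
class Node:
    """Hypothetical node of a tree derived from a decision tree encoder."""

    def __init__(self, label, children=None, code=None):
        self.label = label
        self.children = list(children or [])
        self.code = code        # placeholder medical code, set on leaves here
        self.selected = False   # True once populated from a concept atom


def populate(root, atom_path):
    """Select nodes whose labels match successive concept atoms.

    atom_path lists atom names from the most distant predecessor concept
    atom down to the terminal concept atom. Label equality is an assumed
    stand-in for correlation. Returns the deepest populated node.
    """
    current = root
    current.selected = True
    for atom in atom_path:
        match = next((c for c in current.children if c.label == atom), None)
        if match is None:
            break               # no correlated node, stop populating
        match.selected = True
        current = match
    return current


def is_complete(node):
    """Assumed condition: the populated path ends at a leaf node."""
    return not node.children


def build_tree():
    """Build a small made-up tree with placeholder codes on the leaves."""
    femur = Node("femur", [Node("left", code="CODE-A"),
                           Node("right", code="CODE-B")])
    return Node("root", [Node("fracture", [femur])])


# Incomplete case: the remaining choices would be presented to the user.
deepest = populate(build_tree(), ["fracture", "femur"])
print(deepest.label, is_complete(deepest), [c.label for c in deepest.children])

# Complete case: the terminal concept atom matches a leaf node.
leaf = populate(build_tree(), ["fracture", "femur", "left"])
print(leaf.label, is_complete(leaf), leaf.code)
```

In the incomplete case the user only has to choose between the two remaining leaf nodes instead of walking the tree from the root, which is the efficiency gain that embodiments 2b and 33a describe.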

[0070] "Child node" refers to a node that is a descendant of any node.

[0071] "Concept atom" refers to an indivisible concept.

[0072] "Concept molecule" refers to a well-structured composite of atoms linked together with clearly distinguishable hierarchic and attributive relations. Concept molecules are built of concept atoms, which are arranged in a structure which represents the relations between the concept atoms. The resulting structure is described in detail by H.R. Straub in the book “Das Interpretierende System” (Z/I/M Verlag, 2001).

[0073] "Correlation" refers to a mutual relationship or connection between two or more things.
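As an illustration only, a concept molecule with hierarchic and attributive relations, as defined in paragraph [0072], might be modeled as follows. The `ConceptAtom` class, its fields, and the example atoms are assumptions made for this sketch and are not taken from the disclosure or from the cited book.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ConceptAtom:
    """Hypothetical model of a concept atom (all names are illustrative)."""
    name: str
    semantic_type: str
    # Hierarchic relations to subordinate atoms.
    children: List["ConceptAtom"] = field(default_factory=list)
    # Attributive relations to atoms that qualify this atom.
    attributes: List["ConceptAtom"] = field(default_factory=list)

    def atoms(self) -> Iterator["ConceptAtom"]:
        """Yield this atom and every atom reachable from it, depth first."""
        yield self
        for related in self.children + self.attributes:
            yield from related.atoms()


# A made-up molecule for the noun phrase "fracture of the left femur".
femur = ConceptAtom("femur", "body_part",
                    attributes=[ConceptAtom("left", "laterality")])
fracture = ConceptAtom("fracture", "injury", attributes=[femur])
molecule = ConceptAtom("diagnosis", "master_type", children=[fracture])

names = [atom.name for atom in molecule.atoms()]
print(names)
```

Keeping the two relation kinds in separate fields is one way to keep hierarchic and attributive links clearly distinguishable; other representations, such as labeled edges in a graph, would serve equally well.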

[0074] "Correspond" refers to having a close similarity; match or agree almost exactly. Correspond can refer to a (high) probability of a concept atom matching to a node. Correspond can refer to being equivalent or similar in character, quantity, quality, origin, structure, or function while correlate is to compare things and bring them into a relation having corresponding characteristics.

[0075] "Data store" refers to a repository for persistently storing and managing collections of data which include not just repositories like databases, but also simpler store types such as simple files, emails etc. A database is a series of bytes that is managed by a database management system.

[0076] "Decision tree encoder" refers to a software program that uses a tree data structure to produce a medical code. The decision tree encoder can be an index of a plurality of medical codes and be used to select a medical code based on the relationship hierarchy to a concept. Examples of decision tree encoders are commercially available under the trade designation Codefinder by 3M, or Encoder Pro by Optum, or findacode.com.

[0077] "Leaf node" refers to a node of the tree data structure that does not have any children.

[0078] "Level" refers to one plus the number of edges between the node and the root node. A higher level may refer to a root or parent node as in a “higher level” concept.
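The definition of "level" in paragraph [0078] can be made concrete with a minimal sketch. The `Node` class and the `level` function are hypothetical names chosen for illustration, and the parent link is an assumed representation.

```python
from typing import Optional


class Node:
    """Minimal tree node with a parent link (illustrative only)."""

    def __init__(self, label: str, parent: Optional["Node"] = None):
        self.label = label
        self.parent = parent


def level(node: Node) -> int:
    """Return one plus the number of edges between node and the root."""
    edges = 0
    while node.parent is not None:
        node = node.parent
        edges += 1
    return 1 + edges


root = Node("root")
child = Node("child", parent=root)
grandchild = Node("grandchild", parent=child)

print(level(root), level(child), level(grandchild))
```

Under this definition the root node is at level 1, so a node that is "at least two levels from a root node" is at level 3 or deeper.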

[0079] "Medical code" refers to a code defined for a particular medical purpose, such as by a governmental body, a consortium, a standard setting group, and the like. Some such coding schemes may be for medical billing, which may include facility and professional reimbursement fact coding elements. Examples of such medical billing codes include codes associated with the International Classification of Diseases (ICD) codes (versions 9 and 10), Current Procedural Terminology (CPT) codes, Healthcare Common Procedure Coding System (HCPCS) codes, and Physician Quality Reporting System (PQRS) codes. In some examples, the codes associated with facility reimbursement include ICD codes, CPT codes, and HCPCS codes. Generally, these reimbursement facts are related to the services and equipment provided by the facility where the patient encounter occurred. Codes associated with professional reimbursement may include ICD codes, CPT codes, and PQRS codes.

[0080] "Molecular data structure" refers to a semantic representation of the words. The molecular data structure can be based on the semantics of the specialty itself. It is not closed as all coding systems and other standards (e.g., SNOMED) must be, but open and able to adjust to any of the many diverging standards. The molecular data structure is not simply hierarchical, but multidimensional-multifocal. This means that it incorporates many hierarchical trees and interweaves them into an elaborate semantic network.

[0081] "Parent node" refers to a node that is a predecessor of any node.

[0082] "Phrase" refers to a group of words standing together as a conceptual unit.

[0083] "Populated tree data structure" refers to a tree data structure that is populated from at least one concept atom from the concept molecule. The tree data structure can be partially populated or fully populated (i.e., the full tree).

[0084] "Predecessor concept atom" refers to a concept atom that is a predecessor (immediate or distant) to the terminal concept atom. The opposite is a descendant concept atom.

[0085] “Sentence” refers to a textual unit consisting of one or more words that are grammatically linked.

[0086] "Semantic network" refers to a knowledge base that represents semantic relations between concepts in a network. The term can also be used to refer to a collection of concept molecules.

[0087] "Sub-tree" refers to a portion of the tree data structure consisting of a node of the tree and all of its descendants. The sub-tree corresponding to the root node is the entire tree, and each node is the root node of the sub-tree it determines.
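A short sketch may help picture the sub-tree definition in paragraph [0087]. The `TreeNode` class, the `subtree_labels` helper, and the example labels are hypothetical and chosen only for illustration.

```python
class TreeNode:
    """Minimal tree node holding a label and child nodes (illustrative)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])

    def is_leaf(self):
        """A leaf node has no children."""
        return not self.children


def subtree_labels(node):
    """Labels of the sub-tree rooted at node: the node plus all descendants."""
    labels = [node.label]
    for child in node.children:
        labels.extend(subtree_labels(child))
    return labels


a = TreeNode("a", [TreeNode("b"), TreeNode("c")])
root = TreeNode("root", [a, TreeNode("d")])

print(subtree_labels(root))  # the sub-tree of the root is the entire tree
print(subtree_labels(a))     # every node is the root of its own sub-tree
```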

[0088] "Terminal concept atom" refers to a concept atom that does not have any dependent concept atoms.

[0089] "Tree data structure" refers to an abstract model having nodes that are linked together in a hierarchical tree structure with a root node, parent node, child node, and leaf node. A tree data structure has one path to a particular node. An example of a tree data structure is described at https://www.cs.cmu.edu/~clo/www/CMU/DataStructures/Lessons/lesson4_l.htm.

[0090] "Tree-edit distance" refers to a technique to determine distance between ordered labeled trees. See, for example, Kaizhong Zhang and Dennis Shasha, Simple fast algorithms for the editing distance between trees and related problems, Society for Industrial and Applied Mathematics Journal on Computing, Vol. 18, No. 6, 1245-1262 (December 1989).
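To make the notion of tree-edit distance more concrete, the following sketch computes a unit-cost edit distance between two ordered labeled trees using a plain memoized recursion over forests. It is not the optimized algorithm of the cited Zhang and Shasha paper, and it does not reflect how any embodiment computes a correlation. The tuple encoding of trees and the function names are assumptions made for this example.

```python
from functools import lru_cache

# A tree is a tuple (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered labeled forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Map the two rightmost roots onto each other, relabeling if they differ.
    relabel = (forest_distance(f_children, g_children)
               + forest_distance(f[:-1], g[:-1])
               + (f_label != g_label))
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    """Edit distance between two trees, each treated as a one-tree forest."""
    return forest_distance((t1,), (t2,))


t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
t3 = ("a", (("x", (("b", ()), ("c", ()))),))

print(tree_edit_distance(t1, t2))  # one deletion of node "c"
print(tree_edit_distance(t3, t1))  # one deletion of node "x"
```

A smaller distance indicates that fewer insertions, deletions, and relabelings are needed to turn one tree into the other, which is why such a measure can serve as a basis for comparing a concept molecule with a portion of a tree data structure.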

[0091] "User" refers to an entity that uses the computing apparatus, e.g., a medical coder.

[0092] "User interface" refers to the means by which the user and a computer system interact, in particular the use of input devices and software.

[0093] "Words" refers to distinct meaningful elements of speech or writing, used with others (or sometimes alone) to form a sentence.

[0094] In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized, and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.

[0095] The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or computer readable storage device such as one or more non-transitory memories or other type of hardware-based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.

[0096] The functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like. For example, the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware. The term “logic” encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, or the like. The terms “component,” “system,” and the like may refer to computer-related entities, hardware, and software in execution, firmware, or combination thereof. A component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware. The term “processor” may refer to a hardware component, such as a processing unit of a computer system.

[0097] Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing apparatus to implement the disclosed subject matter. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable storage device or media. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.

Claims

What is claimed is:

1. A method of a computing apparatus populating a portion of a tree data structure comprising: accessing, via the computing apparatus, a molecular data structure that includes a concept molecule, the concept molecule further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; populating, by the computing apparatus and based on the molecular data structure, at least one node of an existing tree data structure that is derived from a decision tree encoder to form a populated tree data structure; determining whether the populated tree data structure satisfies a condition; and performing at least one operation based on the condition being satisfied.

2. The method of claim 1, wherein performing at least one operation comprises: presenting, via a user interface of the computing apparatus and in response to the populated tree data structure being incomplete, the populated tree data structure including a node that is at least two levels from a root node; and receiving input from the user, via the user interface, to fill in a leaf node of the populated tree data structure.

3. The method of any of claims 1 to 2, wherein determining whether the populated tree data structure satisfies the condition comprises prompting a user, via a user interface, whether the populated tree data structure is satisfactory.

4. The method of any of claims 1 to 3, wherein determining whether the populated tree data structure satisfies a condition comprises determining that a terminal concept atom matches a leaf node.

5. The method of claim 2, wherein performing at least one operation comprises determining the medical code based on the input of the leaf node.

6. The method of any of claims 1 to 5, wherein the existing tree data structure comprises medical codes as a leaf node, child node, or a parent node.

7. The method of any of claims 1 to 6, wherein accessing the molecular data structure comprises: receiving a sentence by the computing apparatus; and generating the molecular data structure based on the sentence.

8. The method of any of claims 1 to 7, wherein populating the existing tree data structure comprises: identifying concept atoms having a same semantic type; determining a correlation between the identified concept atoms and a plurality of nodes from the existing tree data structure; and in response to the correlation with the nodes, presenting the nodes, via the user interface, that correspond to the identified concept atoms.

9. The method of claim 8, wherein the correlation is based on a tree-edit distance based on a number of matrix permutations.

10. The method of claim 8, wherein determining the correlation comprises: determining a terminal concept atom from the identified concept atoms; correlating the terminal concept atom to a first node within the existing tree data structure; correlating a predecessor concept atom to a second node within the existing tree data structure; and determining whether the terminal concept atom and the predecessor concept atom correlates with the first node and the second node within the existing tree data structure.

11. The method of any of claims 1 to 10, further comprising determining a list of potential medical codes from the molecular data structure, and wherein populating at least one node comprises identifying a node based on a potential medical code.

12. The method of any of claims 1 to 11, wherein the existing tree data structure is not present, and the populating comprises: adding a root node, and adding a plurality of nodes to the existing tree data structure to form a populated tree data structure, the plurality of nodes comprises a parent node and a child node, the parent node corresponds to a first level of concept atom, the child node corresponds to a second level of concept atom, relationships amongst the plurality of nodes in the tree data structure are based on prior associations from a medical code; presenting, with a user interface, a query path that is a sequence of clinical codes based on the populated tree data structure.

13. The method of claim 12, further comprising: requesting user input to verify a relationship between a child clinical concept and a parent clinical concept, the user input is associated with the child node; and adding the user input to the dictionary.

14. A non-transitory computer-readable storage medium including instructions that, when processed by a computer, configure the computer to perform the method of any of claims 1 to 13.

15. A computing apparatus, the computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: access, via a computing apparatus, a molecular data structure that includes a concept molecule, the molecular data structure further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; populate, by the computing apparatus and based on the molecular data structure, at least one node of an existing tree data structure that is derived from a decision tree encoder to form a populated tree data structure; determine whether the populated tree data structure satisfies a condition; and perform at least one operation based on the populated tree data structure satisfying the condition.

16. The computing apparatus of claim 15, further comprising: a user interface using a display and an input device, wherein performing at least one operation comprises: present, via the user interface and in response to the populated tree data structure being incomplete, the populated tree data structure including a node that is at least two levels from a root node; and receive input from the user, via the input device, to complete a leaf node of the populated tree data structure.

17. The computing apparatus of claim 15 or 16, wherein determining whether the populated tree data structure satisfies a condition comprises prompting a user, via a user interface, whether the populated tree data structure is satisfactory.

18. The computing apparatus of any of claims 15 to 17, wherein determining whether the populated tree data structure satisfies a condition comprises determining that a terminal concept atom matches a leaf node.

19. The computing apparatus of any of claims 15 to 18, wherein performing at least one operation comprises determining the medical code based on the input of the leaf node.

20. The computing apparatus of any of claims 15 to 19, wherein populating the existing tree data structure comprises: identify concept atoms having a same semantic type; determine a correlation between the identified concept atoms and a plurality of nodes from the existing tree data structure; in response to the correlation with the nodes, present nodes, via the user interface, that correspond to the identified concept atoms.