WO2022034420A1

WO2022034420A1 - Exemplar-based searching of medical records

Info

Publication number: WO2022034420A1
Application number: PCT/IB2021/056883
Authority: WO
Inventors: Hans R. STRAUB; G. Edward JOHNSON; Jeremy R. KORNBLUTH; William L. Schofield III; David E. Yarowsky
Original assignee: 3M Innovative Properties Company
Priority date: 2020-08-12
Filing date: 2021-07-28
Publication date: 2022-02-17

Abstract

Aspects of the present disclosure relate to a method of analyzing unstructured text. The method includes receiving an input noun phrase from a user interface. The method includes generating, from the input noun phrase and with a computing device, a molecular data structure that includes a concept molecule. The method also includes accessing a repository of patient data molecules stored in a data store and identifying at least some of the repository of patient data molecules that correspond to the concept molecule. The computing device can perform at least one action with the at least some of the repository of patient data molecules.

Description

EXEMPLAR-BASED SEARCHING OF MEDICAL RECORDS

BACKGROUND

[0001] Document coding is generally a process of mapping topics included in a document to a code of a code-set. In some scenarios the topics may simply be words, but the real interest may also, or instead, lie in the semantic meanings of one or more words, sentences, and paragraphs within a document, that is, in its semantics, which consist not of words but of concepts. The code-set to which a document is mapped may be unique to an organization or purpose but may instead be a standardized code-set as set by an industry organization, a government entity, or a company, or as may be needed to integrate code-set data with a particular computer system or computing environment. Regardless, document coding is performed for many reasons such as organizing, indexing, inventorying, billing, and the like. The documents may be of different types for these purposes, such as legal documents, which may include evidentiary documents, medical records of procedures and services provided, academic and technical articles and papers, and others.

[0002] Initially, document coding was performed manually. There has been an ongoing effort to process documents electronically for automatic coding. These efforts have progressed but are generally rule-driven. Such rules often provide one-to-one or many-to-one mapping of words or a semantic meaning to one code. These rules are typically inflexible, difficult to define and update, and generally expensive to maintain, because they are hard-coded within computer programs or components thereof and because the computer code and the complexity of the rules are generally inaccessible to employees who are not experts in computer coding.

[0003] Document searching can be performed using strings, where a match is found when the search string matches a string in the document. Document searching in this manner may use a lot of hardware resources.

BRIEF SUMMARY

[0004] A first aspect of the present disclosure relates to a method of analyzing unstructured text. The method can include receiving an input noun phrase from a user interface. The input noun phrase corresponds to a language exemplar. The method can include generating, from the input noun phrase and with a computing device, a molecular data structure that includes a concept molecule. The concept molecule further includes a plurality of concept atoms and at least one of the concept atoms is an attribute of another concept atom or a medical code. At least one of the concept atoms has a hierarchical relationship to another concept atom. The method also includes accessing a repository of patient data molecules stored in a data store. The patient data molecule can include a plurality of patient data atoms. A patient data atom corresponds to patient medical data and at least one of the plurality of patient data atoms is an attribute of another patient data atom. The method can also include identifying at least some of the repository of patient data molecules that correspond to the concept molecule. The identifying can occur using a molecule search algorithm. The method can also include performing at least one action with the at least some of the identified repository of patient data molecules.

[0005] A second aspect relates to a computing system. The computing system can include a first computing device which includes a display configured to present a user interface. The first computing device can include an input device for interacting with the user interface, a processor, and a memory storing instructions that, when executed by the processor, configure the computing device to perform the method of analyzing unstructured text described herein.

[0006] The computing system can also include a second computing device. The second computing device can include a second display configured to present the user interface, and a second input device. Performing at least one action by the first computing device can also include accessing, by the second computing device, medical records attributed to a subset of patient data molecules.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

[0007] To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.

[0008] FIG. 1 illustrates a method 100 in accordance with one embodiment.

[0009] FIG. 2 illustrates a method 200 in accordance with one embodiment.

[0010] FIG. 3 illustrates a concept molecule 300 derived from the input noun phrase “open fracture of distal radius” in accordance with one embodiment.

[0011] FIG. 4 illustrates an overview 400 in accordance with one embodiment.

[0012] FIG. 5 illustrates a method 500 in accordance with one embodiment.

[0013] FIG. 6 illustrates a chronology 600 of applying rules to an input noun phrase in accordance with one embodiment.

[0014] FIG. 7 illustrates a process 700 in accordance with one embodiment.

[0015] FIG. 8 illustrates a process 800 in accordance with one embodiment.

[0016] FIG. 9 is a block schematic diagram of a computing system 900 to implement one or more example embodiments.

DETAILED DESCRIPTION

[0017] Aspects of the present disclosure relate to using a patient data molecule and concept molecule to retrieve a plurality of medical records from an input noun phrase. The input noun phrase can be first converted to a concept molecule then searched against a plurality of patient data molecules to retrieve a medical record that corresponds to the input noun phrase.

[0018] Searching in this manner can improve the accuracy of the search with respect to the input noun phrase as well as overall performance. For example, the patient data molecule can have a hierarchical organization with attributive relationships to any of the plurality of medical records. By organizing a medical record within a patient data molecule and using corresponding schemes for both the patient data molecule and the concept molecule, the time for a computing device to return a relevant search result can be decreased and performance of the underlying computing device can be improved over searching each and every medical record.

[0019] Aspects of the present disclosure can be performed by one or more computing devices, for example, on a distributed or a localized computing device. By using the molecular data structure to organize the plurality of medical records, the computing device can also produce more relevant search results, which is a practical application of the present disclosure. In one example, the plurality of medical records can be at least 1000 medical records and the duration of time between receiving the input noun phrase and returning the medical records can be no greater than 20 seconds.

[0020] The method 100 can involve identifying at least some of the repository of patient data molecules from an input noun phrase and performing one or more actions based on the identification.

[0021] The method 100 can begin at block 102. In block 102, the computing device receives an input noun phrase. A user can input the input noun phrase via a user interface of a computing device. The input noun phrase can be a search string and correspond to a language exemplar. For example, the input noun phrase can be “cystitis in the urinary bladder” or name some other medical condition. In at least one embodiment, the input noun phrase can be text of a coding scheme. In some further embodiments, the input noun phrase may be text of a new record or document.

[0022] In block 104, the computing device can generate, from the input noun phrase, a molecular data structure such as a concept molecule. Block 104 can be similar to that described in U.S. Patent App. No. 16/724,590, filed Dec. 23, 2019. The molecular data structure can be a form of semantic network and is described in further detail by H.R. Straub in “Das Interpretierende System” (2001). Natural language processing technology automatically converts the input noun phrase into a structured semantic representation. For example, the findings of the natural language processing can be used to generate the molecular data structure.

[0023] Natural language processing can be performed to find meaning from words. In some embodiments, a meaning is generally referred to as a concept. Such embodiments distinguish between concept types, such as atomic (single, simple and indivisible) and molecular (composite) concepts. The atomic concepts (concept atoms) are building blocks of the composite concepts (concept molecules).

[0024] Concept molecules can be represented in semantic networks as data structures built from atoms that are arranged considering the semantic relations between them. The arrangement distinguishes between hierarchical relationships and/or attributive relationships. In the graphical representation of a molecule, hierarchical relationships are shown horizontally and attributive relationships vertically. The hierarchical relationship can be between a concept and its subconcepts, the attributive relationship between a concept and its attributes.

[0025] A semantic network can include at least a concept molecule with at least a concept atom but may also include one or more other concept molecules that each include one or more concept atoms. Thus, a conceptual semantic network may be referred to as at least one concept molecule built from at least one atom or a plurality of concept atoms. In an example, at least one of the concept atoms is an attribute of another concept atom or a medical code. At least one of the concept atoms can have a hierarchical relationship to another concept atom.
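The distinction between hierarchical and attributive relationships can be illustrated with a small data structure. The following is only a minimal sketch: the class name `Atom`, its fields, and the method `line` are hypothetical and do not appear in the disclosure, and the atom names are borrowed from the fracture example discussed for FIG. 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Atom:
    """One indivisible concept (hypothetical representation)."""
    name: str
    # Hierarchical relationship: the next, more specific concept on the same line.
    subconcept: Optional["Atom"] = None
    # Attributive relationships: atoms that describe this atom.
    attributes: List["Atom"] = field(default_factory=list)

    def line(self) -> List[str]:
        """Names along the hierarchical line that starts at this atom."""
        names: List[str] = []
        node: Optional["Atom"] = self
        while node is not None:
            names.append(node.name)
            node = node.subconcept
        return names


# "open" is an attribute of "fracture"; "fracture" specifies "injury",
# which in turn specifies "diagnosis".
fracture = Atom("fracture", attributes=[Atom("open")])
injury = Atom("injury", subconcept=fracture)
diagnosis = Atom("diagnosis", subconcept=injury)

print(diagnosis.line())
print([a.name for a in fracture.attributes])
```

In this sketch a molecule is simply the atom at the start of its top line together with everything reachable from it; attributes are themselves atoms and can carry their own lines and attributes.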

[0026] In some embodiments, a code of a coding scheme to which a concept molecule has been associated may be included as a concept atom of the concept molecule. Examples of atoms, concepts, and concept molecule data structures are described herein. In at least one embodiment, generating the molecular data structure can include interpreting the text of the input noun phrase using rules of a domain-specific knowledge base in a stepwise manner such that one rule after the other makes a small interpretation change to the input, until the final molecular data structure is created.

[0027] In block 106, the computing device can apply rules to complete the molecular data structure. For example, the rules can have an equivalent structure based on a semantic net of the specialty domain. The rules can be applied by matching them to the semantic network as it has been built up by the information processing to its current state. The rules are able to detect data missing from the semantic network. In at least one embodiment, the rules are associated with the scheme. For example, the scheme may refer to a higher-level construct and a rule can be a specific implementation of the scheme.
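The stepwise application of rules, in which each rule makes one small change until nothing more applies, might be sketched as follows. This is an illustration under stated assumptions, not the rule engine of the disclosure: the dictionary representation, the two rule functions, and `apply_rules` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical representation: line type -> chain of atom names.
Molecule = Dict[str, List[str]]
Rule = Callable[[Molecule], Optional[Molecule]]


def fracture_is_injury(m: Molecule) -> Optional[Molecule]:
    """Insert the implied atom "injury" in front of "fracture"."""
    chain = m.get("diagnosis", [])
    if "fracture" in chain and "injury" not in chain:
        i = chain.index("fracture")
        return {**m, "diagnosis": chain[:i] + ["injury"] + chain[i:]}
    return None  # rule does not match the current state


def injury_is_diagnosis(m: Molecule) -> Optional[Molecule]:
    """Insert the implied atom "diagnosis" at the start of the line."""
    chain = m.get("diagnosis", [])
    if "injury" in chain and "diagnosis" not in chain:
        return {**m, "diagnosis": ["diagnosis"] + chain}
    return None


def apply_rules(m: Molecule, rules: List[Rule]) -> Molecule:
    """Apply rules one after the other until no rule changes the molecule."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(m)
            if result is not None:
                m, changed = result, True
    return m


start = {"diagnosis": ["fracture"]}
done = apply_rules(start, [injury_is_diagnosis, fracture_is_injury])
print(done)
```

Each rule matches against the structure as built so far, so the order in which the rules are listed does not affect the final result in this small example.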

[0028] Block 108 and block 110 can be optional and relate to further narrowing or identifying the concept molecule or the patient data molecule. Block 108 and block 110 can be further described in FIG. 8. For example, questions in block 108 can be based on further narrowing or identifying the concept molecule. These questions can be presented, via a user interface, to a user.

[0029] In block 110, the computing device can receive answers to the questions. The answers can further refine the generated concept molecule or the patient data molecule. For example, the computing device can receive an answer specifying a value for an attribute proposed by the rule, and add the received value to the molecular data structure to form a refined molecular data structure. The computing device can then use the refined molecular data structure to identify at least some of the repository of patient data molecules.

[0030] In block 112, the computing device can access a repository of patient data molecules stored in a data store. The patient data molecule can be arranged according to various rules and schemes as described in block 208. These rules and schemes can be the same rules and schemes in block 208 or a different set of rules. For example, the rules used to generate the concept molecule can be based on a first version of a medical code and the rules used to create the patient data molecule can be based on a second version of a medical code. The patient data molecule comprises a plurality of patient data atoms. A patient data atom corresponds to patient medical data and at least one of the plurality of patient data atoms is an attribute of another patient data atom.

[0031] In at least one aspect, the patient data molecule can differ from the concept molecule in that the patient data molecule can correspond to a plurality of patients or to a plurality of medical records for an individual patient. In at least one embodiment, the concept molecule is not associated with any medical record but with an input noun phrase. The patient data molecule can further be based on documents holding patient data. The patient data can be mapped according to the rules which may follow a hierarchical relationship. In at least one embodiment, the hierarchical relationship of the patient data molecule can be at least partially aligned with the hierarchical relationship of the concept molecule. For example, the plurality of patient data molecules can be arranged according to a first edition of a medical code and the concept molecule can be arranged according to a second edition of a medical code. While some of the relationships may match, others may not.

[0032] In block 114, the computing device can identify at least some of the repository of patient data molecules that correspond to the concept molecule. The repository of patient data molecules can be present in a data store that is different from the data store that hosts the plurality of medical records. Keeping the repository in a data store separate from the one hosting the plurality of medical records allows the structured data to be accessed more quickly. In at least one embodiment, the computing device can utilize object-oriented queries. For example, the computing device can convert the concept molecule into a particular object-oriented query and utilize the object-oriented query to search the plurality of patient data molecules. The object-oriented query can be issued against a repository of structured information that is linked to the plurality of medical records. In at least one embodiment, the computing device can perform an informational retrieval query. The informational retrieval query can be based on machine learning and return a ranked list of a plurality of patient data molecules based on previous interactions with the plurality of patient data molecules.

[0033] In at least one embodiment, block 114 can use a tree search algorithm. Tree search algorithms can be similar to graph search algorithms except that the entire tree can be searched. Specifically, the tree search algorithm can be a semantic-based tree structure search. For example, trees with attributes and nodes can be searched and compared with the concept molecule, instead of trees with only nodes. Instead of searching within a tree, the tree search algorithm can be configured to identify a matching tree.

[0034] In at least one embodiment, block 114 can use a molecule search which can take advantage of the hierarchical relationship between a plurality of patient data atoms in the patient data molecule. For example, the whole patient data molecule can be searched. If the concept molecule is related to the input noun phrase “shaft of ulna”, then the concept molecule can have relationships describing the attributes of “shaft”, “ulna”, and the radiality of the concept atoms. The patient data molecule can represent all three terms simultaneously and have relationships between each of the terms. Thus, a hierarchical search can be performed.
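One way to picture a molecule search that respects relationships, rather than matching isolated terms, is to treat each molecule as a set of typed relations and require that every relation of the concept molecule be present in the patient data molecule. The sketch below assumes this simplified encoding; the relation triples, the identifiers `pdm_1` to `pdm_3`, and the function `molecule_search` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: (source atom, relation kind, target atom).
Relation = Tuple[str, str, str]
Molecule = FrozenSet[Relation]


def molecule_search(concept: Molecule, repository: Dict[str, Molecule]) -> List[str]:
    """Return keys of patient data molecules containing every concept relation."""
    return [key for key, molecule in repository.items() if concept <= molecule]


# Concept molecule for "shaft of ulna".
concept = frozenset({
    ("bone", "hierarchical", "ulna"),
    ("ulna", "attributive", "shaft"),
})

repository = {
    "pdm_1": frozenset({
        ("bone", "hierarchical", "ulna"),
        ("ulna", "attributive", "shaft"),
        ("diagnosis", "hierarchical", "fracture"),
    }),
    # Contains "bone" and "shaft", but related to a different bone.
    "pdm_2": frozenset({
        ("bone", "hierarchical", "humerus"),
        ("humerus", "attributive", "shaft"),
    }),
    # Mentions the ulna but says nothing about the shaft.
    "pdm_3": frozenset({("bone", "hierarchical", "ulna")}),
}

matches = molecule_search(concept, repository)
print(matches)
```

A plain keyword search for "shaft" would also return `pdm_2`; requiring the relations to match is what excludes it here.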

[0035] In at least one embodiment, block 114 can also use a weighted partial graph search to identify at least some of the repository of patient data molecules. The weighted partial graph search can be similar to a weighted graph search algorithm except that the weights can depend on the search term. For example, if the input noun phrase is “lipoma on the left ankle”, then the weight can be focused on “lipoma” as opposed to “left”. The combination of multiple partial graph searches may produce a stronger match to an existing graph. Further, the weighted partial graph search can be based on a correspondence of the concept molecule to a portion of a patient data molecule.
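The idea that weights depend on the search term can be illustrated with a simple score. The sketch below is a deliberate simplification: it scores matching atoms only and ignores the links between them, and the weight values, the identifiers, and the functions `partial_match_score` and `rank` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def partial_match_score(weights: Dict[str, int], patient_atoms: Set[str]) -> float:
    """Share of the total (non-zero) weight carried by atoms found in the patient data."""
    total = sum(weights.values())
    matched = sum(w for atom, w in weights.items() if atom in patient_atoms)
    return matched / total


def rank(weights: Dict[str, int], repository: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Order patient data molecules from strongest to weakest partial match."""
    scored = [(key, partial_match_score(weights, atoms)) for key, atoms in repository.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Assumed weights for "lipoma on the left ankle": the finding outweighs the side.
weights = {"lipoma": 6, "ankle": 3, "left": 1}

repository = {
    "pdm_a": {"lipoma", "ankle", "right"},
    "pdm_b": {"fracture", "ankle", "left"},
    "pdm_c": {"lipoma", "ankle", "left"},
}

ranking = rank(weights, repository)
print(ranking)
```

With these weights, a lipoma on the wrong side still ranks above a different finding on the correct side, which reflects the emphasis on “lipoma” over “left”.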

[0036] In at least one embodiment, block 114 can utilize a graph search algorithm. Examples of graph search algorithms include breadth-first graph search algorithms (e.g., first in, first out), greedy best-first graph search algorithms, A* graph search algorithms, weighted graph search algorithms (e.g., weighted A* or Dijkstra's graph search algorithm), and depth-first graph search algorithms. For example, depth-first graph search algorithms can operate on the principle of last in, first out. A depth-first graph search algorithm can compare the most specified attributes of the concept molecule to attributes of the patient data molecule to determine a match. The patient data molecules in the repository are fully specified molecules based on patient data.
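A depth-first traversal that reaches the most specified atoms of a concept molecule and then checks them against a patient data molecule could look like the sketch below. The adjacency-list encoding, the example graphs, and the function names are hypothetical; treating the leaves of the graph as the "most specified" atoms is likewise an assumption.

```python
from typing import Dict, List, Set

# Hypothetical encoding: atom -> more specific atoms and attributes below it.
Graph = Dict[str, List[str]]


def most_specified_atoms(graph: Graph, root: str) -> List[str]:
    """Depth-first traversal with an explicit stack (last in, first out)."""
    stack, seen, leaves = [root], set(), []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = graph.get(node, [])
        if not children:
            leaves.append(node)  # nothing more specific below this atom
        stack.extend(children)
    return leaves


def all_atoms(graph: Graph) -> Set[str]:
    """Every atom named anywhere in the graph."""
    atoms = set(graph)
    for children in graph.values():
        atoms.update(children)
    return atoms


def is_match(concept: Graph, root: str, patient: Graph) -> bool:
    """Match when every most specified concept atom occurs in the patient data."""
    available = all_atoms(patient)
    return all(atom in available for atom in most_specified_atoms(concept, root))


concept = {"fracture": ["open", "radius"], "radius": ["distal"]}
patient_1 = {"fracture": ["open", "radius"], "radius": ["distal", "left"]}
patient_2 = {"fracture": ["closed", "radius"], "radius": ["distal"]}

print(most_specified_atoms(concept, "fracture"))
print(is_match(concept, "fracture", patient_1), is_match(concept, "fracture", patient_2))
```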

[0037] In block 116, the computing device can perform at least one action with the at least some of the repository of patient data molecules. One example includes returning a set of documents relating to the at least some of the repository of patient data molecules. For example, the computing device can fetch the medical record and present the medical record that matches the input noun phrase. The computing device can highlight any portion of the medical record that is associated with the input noun phrase. In this way, medical records containing the relevant information are returned to the user.
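Highlighting the portions of a returned medical record that relate to the input noun phrase can be sketched with a regular expression. The marker strings, the sample record text, and the function `highlight` are assumptions for illustration; an actual user interface would likely apply its own styling.

```python
import re
from typing import List


def highlight(text: str, terms: List[str], left: str = "[[", right: str = "]]") -> str:
    """Wrap each occurrence of a term in markers, ignoring case."""
    if not terms:
        return text
    # Try longer terms first so that a short term cannot split a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in ordered)
    return re.sub(pattern, lambda m: f"{left}{m.group(0)}{right}", text, flags=re.IGNORECASE)


record = "X-ray shows an open fracture of the distal radius."
marked = highlight(record, ["open fracture", "distal radius"])
print(marked)
```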

[0038] In another example, the computing device can be configured to return at least some of the repository of patient data molecules for data analysis. In at least one embodiment, some of the patient data molecules that are associated with the input noun phrase can be further analyzed using another subprocess separate from block 114 or the data store. For example, once identified, the medical record corresponding to the patient data molecules can be transmitted to another computer system for additional analysis.

[0039] In at least one embodiment, the action can be identifying patients associated with the at least some of the repository of patient data molecules for use in clinical trials. For example, if “ocular and orbital fracture” is searched for as the input noun phrase in connection with a clinical trial for protective eyewear, then the medical records corresponding to the ocular and orbital fractures can be statistically analyzed by a separate system for performance of the respective eyewear.

[0040] FIG. 2 is a block flow diagram of a method 200 of generating a patient data molecule to be stored in the repository of patient data molecules described herein. Method 200 can be used to arrange the plurality of medical records into a format with a hierarchical relationship such that the format corresponds to the hierarchical relationship used to generate the concept molecule. In at least one embodiment, the hierarchical relationship is established by a medical code or some common scheme.

[0041] Because the patient data molecules of new medical records are built according to the same processing, the output patient data molecules are aligned structurally for purposes of later matching. In at least one embodiment, the method 200 includes receiving a medical record and extracting semantics therefrom. The computing device can generate a plurality of patient data molecules for the extracted semantics, and the one or more generated patient data molecules are output or stored, such as to a calling process or as data stored to a data storage device or a memory device.

[0042] Method 200 can be similar to block 104, except that the input is a medical record and the semantic network/patient data molecule contains a plurality of patient data atoms that relate to text from the medical record.

[0043] For example, in block 202, the computing device can receive/retrieve one or more medical records from a plurality of medical records. The medical record can be associated with a single patient or a group of patients. For example, the medical records can be from the same general system (e.g., within a hospital system network) or can exist in multiple networks. The plurality of medical records can be stored within a data store.

[0044] In block 204, the computing device can extract semantics from the plurality of medical records. In at least one embodiment, the extraction of semantics can also include extracting passages from each medical record to have an attributive relationship with a patient data molecule or patient data atom.

[0045] The extraction of semantics can be performed by natural language processing in its proper sense, the findings of which can be utilized to generate a semantic network or graph. The natural language processing is performed to find meaning from words. The meanings are represented by the concepts of the semantic graphs. There is a distinction between atomic (single, simple and indivisible) and molecular (composite) concepts. The atomic concepts (patient data atoms) are the building blocks of the composite concepts (patient data molecules). Patient data molecules can be represented in semantic graphs as data structures built from atoms that are arranged considering the semantic relations between them. The arrangement distinguishes between hierarchic and attributive relations.

[0046] In the graphical representation of a molecule, hierarchic relations are shown horizontally and attributive relations vertically. The hierarchic relation is between a concept and its subconcepts, the attributive one between a concept and its attributes. The molecule is thus a well-structured composite of atoms linked together with clearly distinguishable hierarchic and attributive relations. In at least one embodiment, the hierarchic relationship between concepts can be established according to a scheme, such as a coding scheme which may utilize one or more editions of a medical code set. The attributive relationship can be to the medical record itself, or to portions thereof (such as paragraphs or phrases extracted based on correlation with the patient data molecule).

[0047] In block 206, the computing device can generate a patient data molecule. The patient data molecule can be based on a coding scheme as the higher-level construct. For example, each atom can include an attributive relation with a portion of the medical record.

[0048] Block 208, block 210, block 212, and block 214 of method 200 can be similar to corresponding blocks of method 100.

[0049] For example, block 208 can be similar to block 106 except that the rules can be used to define both the hierarchical relationship and the attributive relationship. For example, if a medical code defines the hierarchical relationship of a most specified medical code and a least specified medical code, then the rules can populate the remaining submolecules/atoms between the most specified medical code and the least specified medical code. Likewise, the rules can group the medical records into specific attributive relationships with the patient data atoms. For example, if a patient data atom relates to “fractures”, then all the documents having “fracture” among their keywords can be attributes of the “fracture” patient data atom.
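Both uses of the rules in block 208, filling in the atoms between a least specified and a most specified code and grouping documents under atoms by keyword, can be sketched briefly. The parent table below is a made-up stand-in for a medical code hierarchy, and the document identifiers, keyword lists, and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical child -> parent table standing in for a medical code hierarchy.
PARENT_OF: Dict[str, str] = {
    "distal radius fracture": "radius fracture",
    "radius fracture": "forearm fracture",
    "forearm fracture": "fracture",
}


def fill_chain(most_specified: str, least_specified: str) -> List[str]:
    """Return the chain from least to most specified, filling the gaps."""
    chain = [most_specified]
    while chain[-1] != least_specified:
        parent = PARENT_OF.get(chain[-1])
        if parent is None:
            raise ValueError(f"{least_specified!r} is not an ancestor of {most_specified!r}")
        chain.append(parent)
    return chain[::-1]


def group_by_keyword(atoms: List[str], keywords: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Attach each document to every atom that appears among its keywords."""
    return {atom: [doc for doc, words in keywords.items() if atom in words] for atom in atoms}


chain = fill_chain("distal radius fracture", "fracture")
docs = {"note_1": ["fracture", "radius"], "note_2": ["lipoma"], "note_3": ["fracture"]}
evidence = group_by_keyword(["fracture", "lipoma"], docs)

print(chain)
print(evidence)
```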

[0050] In block 210, the computing device can determine whether there is missing data and, if so, generate questions for the missing data similar to block 108. For example, the computing device can determine which patient data atoms are missing an attributive medical record in order to ask a user whether a particular medical record (or portion thereof) is an attribute of a patient data atom. In addition, the computing device can import additional information that is not present in the medical record. The question can be triggered based on the presence of other information. For example, the computing device can ask whether the patient has had a stroke, even though a stroke is not mentioned in the medical record, because it is a statistically common comorbidity given the information that is present.
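Detecting patient data atoms that lack an attributive medical record and turning each gap into a question for the user might be sketched as follows. The evidence table, the candidate record identifier, the question wording, and the function name are all assumptions.

```python
from typing import Dict, List


def questions_for_missing_records(evidence: Dict[str, List[str]], candidates: List[str]) -> List[str]:
    """Ask, for each atom without evidence, whether a candidate record supports it."""
    questions = []
    for atom, records in evidence.items():
        if records:
            continue  # this atom already has an attributive medical record
        for record in candidates:
            questions.append(f"Is {record} an attribute of the patient data atom '{atom}'?")
    return questions


# Hypothetical state: only "fracture" has a supporting record so far.
evidence = {"fracture": ["note_1"], "radius": [], "distal": []}
questions = questions_for_missing_records(evidence, ["note_2"])

for question in questions:
    print(question)
```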

[0051] If there is a question in block 210, then the computing device can receive the answers as in block 110. In block 214, the computing device can complete the patient data molecule, which can be stored in a data store having a repository of patient data molecules.

[0052] FIG. 3 is an example of a concept molecule 300, derived from the input noun phrase “open fracture of distal radius”. Concept molecule 300 consists of eleven atomic concepts bound together into one composite structure. Concept molecule 300 is a cutout of a semantic network (semantic net) that represents the semantics of a specialty domain in a most complete and structured manner, and the cutout represents the semantics of the input noun phrase.

[0053] The concept molecule 300 is composed of atoms. The first line of the concept molecule 300 includes three atoms, diagnosis 305, injury 310, and fracture 315. The fracture atom 315 has an attribute of open 320. The open attribute 320 is shown as related to the fracture atom 315 by a link 325. Attributes are also atoms.

[0054] The links between the atomic concepts are represented in the concept molecule structure in a way that shows how the single atomic concepts are arranged. Every atom on the same line is of the same semantic type, e.g., all three concept atoms 305, 310, and 315 of the first line are of the type “diagnosis”, all four concept atoms 330, 335, 340, and 345 of the second line are of the type “localization”, and both atoms 350 and 355 of the third line are of the type “organ”. At the same time, the two lines beginning with “localization” atom 330 and “bone” atom 350 represent attributes of the atom “diagnosis”. The “open” atom 320 represents an attribute of “fracture” and the “distal” atom 360 represents an attribute of “bone” and is linked as shown at 365. Concept atoms 330 and 350 are linked to diagnosis atom 305 via link 370.

[0055] Concept molecule 300 shows many implicit meanings, not literally mentioned in the input text. Concepts like “diagnosis”, “injury”, “limb”, “upper limb”, “forearm” and “bone” are all covered by the input, but not explicitly mentioned. They appear explicitly in the concept molecule.

[0056] The structure of concept molecule 300 shows atoms and links which are used for the content found in the input text. The structure behind the concept molecule, i.e., the semantic net, however, potentially has links to more types of attributes than just the ones shown in concept molecule 300.
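The composition of concept molecule 300 can be tallied in a few lines. The sketch below assumes a particular assignment of atom names to the three lines, inferred from the concepts named in the text; the disclosure gives reference numerals for the lines but does not spell out every atom name, so the placement of “limb”, “upper limb”, “forearm”, and “radius” is an assumption.

```python
from typing import Dict, List

phrase = "open fracture of distal radius"

# Assumed contents of the three lines of concept molecule 300.
lines: Dict[str, List[str]] = {
    "diagnosis": ["diagnosis", "injury", "fracture"],
    "localization": ["localization", "limb", "upper limb", "forearm"],
    "organ": ["bone", "radius"],
}
# Attributes named in the text: "open" of "fracture", "distal" of "bone".
attributes: Dict[str, List[str]] = {"fracture": ["open"], "bone": ["distal"]}

atoms = [atom for line in lines.values() for atom in line]
atoms += [atom for values in attributes.values() for atom in values]

words = phrase.split()
explicit = [atom for atom in atoms if atom in words]
implicit = [atom for atom in atoms if atom not in words]

print(len(atoms))
print(explicit)
print(implicit)
```

Under these assumptions only four of the atoms appear literally in the input noun phrase; the remaining atoms are supplied by the semantic net.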

[0057] FIG. 4 illustrates an overview 400 of concept molecule 402, patient data molecule 404, patient data molecule 406, and patient data molecule 408 according to an example embodiment.

[0058] The concept molecule 402 is an example of a semantic graph built from text received in block 102, such as a language exemplar relating to a medical procedure. In at least one embodiment, the patient data molecules 404-408 can be built from portions of a defined coding scheme, such as a medical billing coding scheme as described in method 200. In at least one embodiment, the concept molecule 402 can be formed using the same or a similar scheme as the patient data molecules.

[0059] With regard to atoms as discussed above, each individual arrow element is an atom, such as “bone” and “humerus.” However, bone and humerus can be combined to form a composite (molecular) concept, as a humerus is a bone. Concepts can extend in other dimensions as well, such as to provide more specific detail with regard to a concept or to provide included or implied detail already present. For example, the atom “shaft” provides more specific location detail with regard to the “bone humerus” concept. Similarly, with included or implied details, the “bone humerus” concept is inclusive of the “anatomy” and “diagnosis” atoms, which are themselves also concepts.

[0060] As an example, in patient data molecule 404, various medical records (e.g., medical record 412 and medical record 414) can be associated with the hierarchy for “anatomy” and “shaft”, respectively, in an attributive relationship. For example, the medical record containing the phrase “inspection of the patient's anatomy reveals” can be highlighted and the document corresponding to the instance can be associated with “anatomy” as evidence. The radiologist notes (a medical record), if describing “shaft of the humerus”, can be attributive with both “shaft” and “humerus.” Of further note is that patient data molecule 404 can match with concept molecule 402. In contrast, the patient data molecule 406 and patient data molecule 408 do not involve the humerus shaft, and the patient data molecule 410 involves a wire procedure, which differs from concept molecule 402.

[0061] FIG. 5 illustrates an exemplary method 500 of generating a concept molecule. The method 500 can be associated with block 104. The order of the operations in method 500 may change depending on the input noun phrase in block 102. For example, other types of concepts may be identified before the master type of concept is found. As more words in the phrase are processed and ambiguities are resolved, the master concept may be identified. The method 500 can begin in block 502.

[0062] In block 502, the computing device can identify a master type of concept as a function of meanings of the input noun phrase. In at least one embodiment, identifying a master type concept from the input noun phrase comprises searching for a master type concept in a top level of the semantics of the specialty domain that matches a concept in the input noun phrase having a same semantic type. For example, medical procedures from the input noun phrase can match the top level of a scheme, e.g., a coding scheme.

[0063] In block 504, the computing device can generate the molecular data structure having the master type as a top level of the molecular data structure. For example, the top level can indicate that there are no other superordinate concepts. In this example, the top level can indicate the highest level of a medical code.

[0064] In at least one embodiment, generating a molecular data structure having the master type concept as a top level of the molecular data structure can also include adding concepts of the same semantic type to the top level of the molecular data structure to provide a chain of concepts in the top level of the data structure that matches the semantics of the specialty domain. For example, additional concepts related to a humerus can be added to the top level of the molecular data structure to complete a concept molecule. The added concepts may either specify the master type concept, and thus provide a chain of concepts in the top level of the data structure, all of the same type, namely the master type, or be added at one or more attribute sites of the master type concept, where the added concepts represent properties of the master type.

[0065] The attribute concepts may be arranged in the same sort of chain as the chain of concepts that specifies the master type. As in the master type concept chain, where all concepts are of the same type, namely the master type, all concepts in the chain of one property attribute are also of the same type, namely the semantic type of the attribute binding site. In the semantic net, alternative attributes of the same attribute type bind at the same attribute binding site of the master type concept. In a molecule, a concept can bind at one site; this will be one of the alternative attributes of the same attributive type or a chain starting with one of the alternative attributes. The chosen concept can represent the actual choice of the specific molecule among the possible alternatives in the overall semantic net.

[0066] In a self-similar way, concepts bound to the master type concept may also act as a focus for adding further concepts, adding them either on the same line to specify the focus concept itself or at specific attribute binding sites of the focus concept in order to specify attributive properties of the focus concept. Each bound concept may again act as such a focus concept.
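The structure described in the preceding paragraphs (a chain of same-type concepts on one line, attribute binding sites, and self-similar nesting) might be modeled as follows. This is only a sketch under stated assumptions: the class name `Atom`, its fields, and the example concepts are hypothetical, and each binding site is keyed by the semantic type of the bound concept.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Atom:
    """One concept atom; names and fields are illustrative assumptions."""
    name: str
    semantic_type: str
    # Next concept on the same line: same semantic type, more specific.
    specifier: Optional["Atom"] = None
    # Attribute binding sites, keyed by semantic type; one concept per site.
    attributes: Dict[str, "Atom"] = field(default_factory=dict)

    def specify(self, atom: "Atom") -> "Atom":
        """Append a concept to the chain on the same line."""
        if atom.semantic_type != self.semantic_type:
            raise ValueError("chain concepts must share a semantic type")
        tail = self
        while tail.specifier is not None:
            tail = tail.specifier
        tail.specifier = atom
        return self

    def bind(self, atom: "Atom") -> "Atom":
        """Bind a concept at the attribute site matching its semantic type."""
        if atom.semantic_type in self.attributes:
            raise ValueError("attribute site already occupied")
        self.attributes[atom.semantic_type] = atom
        return self


# A bound concept can itself act as a focus for further concepts.
humerus = Atom("humerus", "anatomy").bind(Atom("left", "laterality"))
molecule = (
    Atom("fracture", "diagnosis")
    .specify(Atom("closed fracture", "diagnosis"))
    .bind(humerus)
)
```

Because every bound `Atom` carries its own chain and binding sites, the same two operations apply at every level, which is one way to express the self-similarity noted above.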

[0067] In block 506, the computing device can insert additional concepts in the molecular data structure to complete the molecular data structure. The insertion can be based on molecular data structure rules having an equivalent molecular data structure. For example, the rules can have a molecular data structure that is equal to the declarative molecular data structure. An associated attribute can be included in the molecular data structure either directly by a corresponding rule or by a rule containing a placeholder character. Depending on the operator of the placeholder atom, such a rule then matches in response to any value, or no value, being provided by the input noun phrase.

[0068] In one embodiment, specific sites in the input molecular data structure that lack an associated value are identified by observing the placeholder character in the corresponding data structure of the rule molecule. The computing device can then present a question to a user, receive an answer specifying a value for an attribute, and add the value to the molecular data structure.
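The placeholder mechanism can be sketched as a comparison between a rule molecule and the input molecule. The sketch below flattens both to dictionaries of site names and values, uses "?" as a hypothetical placeholder character, and ignores placeholder operators; the function names, the "causative agent" site, and the sample answer are assumptions made for illustration.

```python
from typing import Dict, List, Optional

PLACEHOLDER = "?"  # assumption: the placeholder character used in rule molecules

# Hypothetical rule molecule: the causative agent site is left open.
RULE = {
    "diagnosis": "cystitis",
    "anatomy": "urinary bladder",
    "causative agent": PLACEHOLDER,
}


def open_sites(rule: Dict[str, str], molecule: Dict[str, Optional[str]]) -> List[str]:
    """Sites marked with a placeholder in the rule and lacking a value in the input."""
    return [
        site
        for site, value in rule.items()
        if value == PLACEHOLDER and not molecule.get(site)
    ]


def questions(rule: Dict[str, str], molecule: Dict[str, Optional[str]]) -> List[str]:
    """One question per open site, to be presented via the user interface."""
    return [f"Which {site} applies?" for site in open_sites(rule, molecule)]


def add_answer(molecule: Dict[str, Optional[str]], site: str, value: str) -> Dict[str, Optional[str]]:
    """Return a refined copy of the molecule with the answered site filled in."""
    refined = dict(molecule)
    refined[site] = value
    return refined


molecule = {"diagnosis": "cystitis", "anatomy": "urinary bladder"}
pending = open_sites(RULE, molecule)
# Hypothetical user answer; any value supplied by the user would be stored the same way.
refined = add_answer(molecule, "causative agent", "Escherichia coli")
```

Returning a copy keeps the original molecule available, so the unrefined and refined searches can both be run.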

[0069] In at least one embodiment, rules can be structured using a rule molecule that can be used to form the concept molecule. For example, a rule molecule can be considered dynamic. Rule molecules additionally have operators (indicators of how to match and change the input) and may also have pronouns. Each interpretation step looks at an input molecule set, finds a rule that matches it, and executes the rule, resulting in a new input molecule set (the output molecule set of the present step is the input molecule set for the next step). In this way, the interpretation algorithm moves step by step from one input molecule set to the next, each step guided by a matching rule.

[0070] The rule molecules can have a “dynamic” potential with which rules can transform the corresponding concept molecules. The molecules of input, output and all the intermediate states of the text interpretation are, in contrast, of a “static” nature. The totality of the rules of an application is contained in one or more knowledge bases.

[0071] In order to execute their dynamic potential, the rule molecules can have operators assigned to one or more of their atoms. The operators are used to execute the changes to the input. The search for the exact rule to apply and the application of the rule are controlled by a software program, a “semantic interpreter”, which is part of an Encoding Program as well as a Knowledge Base Interpreter. The totality of the rules of an application is contained in one or more knowledge bases. These are the “rule bases”, which extend the software. The rule bases contain the algorithms which are created and maintained by the knowledge engineers. The language of the molecules can be seen as a high-level programming language, designed to be dealt with by domain experts (knowledge engineers) and not by software engineers, adjusted to be simple, precise and potent at the same time.

[0072] FIG. 6 is a flow diagram illustrating an interpretation chronology 600 of applying rules to an input noun phrase as described in method 500. Rules are dynamic molecules. As molecules, the rules operate in a multifocal semantic space. Their IFs (conditions) and THENs (effects) are clearly settled along the axes (i.e., degrees of freedom) of the semantic space. The chronology 600 begins with an input 606 noun phrase, which is represented at the beginning by the concept molecule 602. A first rule 608 is matched to the beginning state of the concept molecule, resulting in the concept “lower leg” being added to the state molecule. A second rule 610 then matches, adding the implicit concept “lower limb” to the molecule, followed by a third rule 612 completing the concept molecule 604, which is provided as output 614 for conversion into a code.

[0073] In further examples, the sequence/chronology may be more complicated and not as serial as shown.
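The stepwise interpretation, in which each step finds one matching rule and applies it to produce the next state, can be sketched as a simple loop. The sketch reduces a state molecule to a flat set of concept names and reduces a rule to an IF part (`requires`) and a THEN part (`adds`); these simplifications, the starting concepts "fracture" and "tibia", and the concept added by the third rule are assumptions, since the figure's actual input is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    requires: FrozenSet[str]  # IF: concepts that must be present in the state
    adds: str                 # THEN: concept added to the state

    def matches(self, state: FrozenSet[str]) -> bool:
        return self.requires <= state and self.adds not in state


def interpret(state: FrozenSet[str], rules: List[Rule]) -> Tuple[FrozenSet[str], List[str]]:
    """Apply one matching rule per step until no rule matches any more."""
    trace: List[str] = []
    while True:
        rule = next((r for r in rules if r.matches(state)), None)
        if rule is None:
            return state, trace
        state = state | {rule.adds}  # output of this step is input of the next
        trace.append(rule.name)


# Hypothetical rule base loosely following the "lower leg" / "lower limb" example.
RULES = [
    Rule("first rule", frozenset({"tibia"}), "lower leg"),
    Rule("second rule", frozenset({"lower leg"}), "lower limb"),
    Rule("third rule", frozenset({"fracture", "lower limb"}), "fracture of lower limb"),
]

final_state, trace = interpret(frozenset({"fracture", "tibia"}), RULES)
```

The loop terminates because a rule only matches while its added concept is still absent, so each rule can fire at most once in this simplified form.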

[0074] FIG. 7 illustrates a process 700 describing at least part of the method 100, according to various embodiments. The process 700 can start with a user 714 interacting with a user interface. The user 714 can be an individual that interacts with medical records. For example, the user can be a clinician such as a doctor, nurse, coder, reviewer, or researcher.

[0075] The user 714 can provide a language exemplar of what they are looking for in the form of an input noun phrase 702, which is received by the computing device. The input noun phrase 702 is shown as “cystitis in the urinary bladder.” A natural language processing module can automatically convert the input noun phrase 702 to a concept molecule 704 as described in block 104 and block 106 in FIG. 1.

[0076] In at least one embodiment, the concept molecule 704 can be transformed into an object-oriented query 706 that the computing device can use to conduct a tree search, molecule search, or graph search against a plurality of patient data molecules 708 stored in a data store.

[0077] The graph search can be performed using a graph search algorithm and can return a subset of patient data molecules 710 that may be related to the concept molecule 704. A plurality of medical records can have attributive relationships with any of the subset of patient data molecules 710. Thus, the plurality of medical records in a data store can be refined to a subset of medical records 712. The subset of medical records 712 can have one or more strings that indicate an association with the input noun phrase 702.

[0078] The process 700 can result in medical records containing the relevant information being returned to the user 714. By organizing the plurality of medical records into a plurality of patient data molecules 708 and then using a molecular data structure derived from the input noun phrase, faster and more relevant searches can be performed using the computing device.
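The search step of process 700 can be illustrated with a depth-first traversal over each stored patient data molecule, checking at every node whether the concept molecule matches. This is a sketch only: molecules are represented as nested dictionaries, matching is by exact concept name (the disclosure also allows probabilistic correspondence), and the repository contents, record identifiers, and function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List

Molecule = Dict[str, Any]  # {"name": str, "attributes": {site: Molecule}}


def matches(query: Molecule, node: Molecule) -> bool:
    """True if every attribute specified in the query is present in the node."""
    if query["name"] != node["name"]:
        return False
    node_attrs = node.get("attributes", {})
    return all(
        site in node_attrs and matches(sub, node_attrs[site])
        for site, sub in query.get("attributes", {}).items()
    )


def walk_depth_first(root: Molecule) -> Iterator[Molecule]:
    """Visit every node of a molecule depth-first using an explicit stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("attributes", {}).values())


def search(query: Molecule, repository: List[Dict[str, Any]]) -> List[str]:
    """Return the record ids whose patient data molecule contains a match."""
    return [
        entry["record_id"]
        for entry in repository
        if any(matches(query, node) for node in walk_depth_first(entry["molecule"]))
    ]


QUERY = {"name": "cystitis", "attributes": {"anatomy": {"name": "urinary bladder"}}}

REPOSITORY = [
    {"record_id": "rec-1", "molecule": {
        "name": "cystitis",
        "attributes": {"anatomy": {"name": "urinary bladder"},
                       "causative agent": {"name": "Escherichia coli"}}}},
    {"record_id": "rec-2", "molecule": {
        "name": "encounter",
        "attributes": {"diagnosis": {
            "name": "cystitis",
            "attributes": {"anatomy": {"name": "urinary bladder"}}}}}},
    {"record_id": "rec-3", "molecule": {
        "name": "fracture",
        "attributes": {"anatomy": {"name": "humerus"}}}},
]

hits = search(QUERY, REPOSITORY)
```

The query only constrains the attributes it specifies, so a more fully specified patient data molecule (such as the first entry) still matches a less specific concept molecule.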

[0079] FIG. 8 illustrates a process 800 that is similar to process 700. The process 800 can include acquiring additional semantic knowledge that can be used to engage in a system-driven interaction that permits the user 714 to refine their object-oriented query 706.

[0080] As shown, the computing device can present a question 802 to the user 714 via the user interface. The question 802 can be relevant to complete the concept molecule 704 as described in block 108 and block 110. For example, the semantics of a cystitis diagnosis can include the germ/bacterium causing the infection, so that the user 714 can be provided with the option of a more focused question 802. Conversely, the user 714 might be offered the opportunity to make a question 802 less specific.

[0081] In at least one embodiment, the user 714 could be offered the opportunity to provide other question 802 restrictions, e.g., Boolean, fielded, and/or free text constraints. For example, the above search could be restricted to female patients between 18 and 24 years old, where the words painful or discomfort appear in at least some of the plurality of medical records.

[0082] As a result of receiving the answer to question 802, the computing device can populate additional sections of the concept molecule 704 to form a refined concept molecule 804 (i.e., refined molecular data structure). The refined concept molecule 804 can be translated to the object-oriented query 806 and searched as described in process 700.
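The additional restrictions mentioned above (fielded constraints on sex and age combined with a free-text constraint) could be applied as a filter over the records returned by the molecule search. The sketch below assumes a hypothetical `Record` type with `sex`, `age`, and `text` fields and made-up sample records; it is not the query format of the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    record_id: str
    sex: str
    age: int
    text: str


def restrict(records: Iterable[Record], sex: str, min_age: int, max_age: int,
             any_words: Iterable[str]) -> List[Record]:
    """Keep records that satisfy the fielded constraints and contain any of the words."""
    wanted = {word.lower() for word in any_words}
    selected = []
    for record in records:
        words = set(re.findall(r"[a-z]+", record.text.lower()))
        if record.sex == sex and min_age <= record.age <= max_age and wanted & words:
            selected.append(record)
    return selected


# Hypothetical records already returned by the molecule search.
RECORDS = [
    Record("A", "female", 21, "Painful urination reported."),
    Record("B", "female", 30, "Discomfort in the lower abdomen."),
    Record("C", "male", 20, "Painful urination reported."),
    Record("D", "female", 19, "No complaints."),
]

restricted = restrict(RECORDS, "female", 18, 24, ["painful", "discomfort"])
```

Applying the restrictions after the molecule search keeps the semantic matching and the Boolean, fielded, or free-text constraints separate, which is one possible design among several.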

[0083] FIG. 9 is a block schematic diagram of a computing system 900 to implement the system and for performing methods and algorithms according to example embodiments. All components need not be used in various embodiments.

[0084] One example computing device in the form of a computing device 902 may include a processing unit 906, memory 904, removable storage 914, and non-removable storage 916. Although the example computing device is illustrated and described as computing device 902, the computing device may be in different forms in different embodiments. For example, the computing device may instead be a smartphone, a tablet, smartwatch, smart storage device (SSD), or other computing device including the same or similar elements as illustrated and described with regard to FIG. 9. Devices, such as smartphones, tablets, and smartwatches, are generally collectively referred to as mobile devices or user equipment.

[0085] Although the various data storage elements are illustrated as part of the computing device 902, the storage may also or alternatively include cloud-based storage accessible via a network, such as the Internet or server-based storage. Note also that an SSD may include a processor on which the parser may be run, allowing transfer of parsed, filtered data through I/O channels between the SSD and main memory.

[0086] Memory 904 may include volatile memory 910 and non-volatile memory 912. Computing device 902 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as volatile memory 910 and non-volatile memory 912, removable storage 914 and non-removable storage 916. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.

[0087] Computing device 902 may include or have access to a computing environment that includes input 918, output 920, and a communication connection 922. Output 920 may include a display device, such as a touchscreen, that also may serve as an input device. The input 918 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computing device 902, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common data flow network switch, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Wi-Fi, Bluetooth, or other networks. According to one embodiment, the various components of computing device 902 are connected with a system bus.

[0088] Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 906, such as a program 908. The program 908 in some embodiments comprises software to implement one or more of the methods of generating and completing concept molecules and assigning codes to noun phrases. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms computer-readable medium and storage device do not include carrier waves to the extent carrier waves are deemed too transitory. Storage can also include networked storage, such as a storage area network (SAN). Computer program 908 along with the workspace manager may be used to cause processing unit 906 to perform one or more methods or algorithms described herein.

[0089] In at least one embodiment, the program 908 can also comprise the natural language processing described herein. In at least one embodiment, the computing device 902 can communicate with the data store 924. For example, the data store 924 may store the plurality of patient data molecules that were previously determined, and the medical records associated therewith. Thus, the patient data molecule can be uploaded into the data store 924 once determined. In at least one embodiment, a second computing device 926 can be used at a different location to input commonly used concept molecules into data store 924. The computing device 926 can have a user interface to interact with the medical record. The computing device 902 and computing device 926 can access the data store 924 to perform aspects of the present disclosure. The computing device 926 and the computing device 902 can include an input device such as a mouse, keyboard, or voice inputs to receive the input noun phrase and a display for presenting the user interface.

[0090] "Attributive relationship" refers to a relationship where a thing is an attribute to another thing.

[0091] "Concept atom" refers to an indivisible concept.

[0092] "Concept molecule" refers to a well-structured composite of atoms linked together with clearly distinguishable hierarchic and attributive relations. Concept molecules are built of concept atoms, which are arranged in a structure which represents the relations between the concept atoms. The resulting structure is described in detail by H.R. Straub in the book “Das Interpretierende System” (Z/I/M Verlag, 2001).

[0093] "Correspond" refers to having a close similarity or match. Correspond can refer to a (high) probability of a concept atom matching to a patient data molecule. Correspond can refer to being equivalent or similar in character, quantity, quality, origin, structure, or function while correlate is to compare things and bring them into a relation having corresponding characteristics.

[0094] "Data store" refers to a repository for persistently storing and managing collections of data which include not just repositories like databases, but also simpler store types such as simple files, emails, etc. A database is a series of bytes that is managed by a database management system (DBMS).

[0095] "Graph" refers to a mathematical abstraction having a set of objects in which some pairs of the objects are related.

[0096] "Graph search algorithm" refers to the processes of visiting each node in a graph. Example graph search algorithms are described at https://neo4j.com/blog/graph-search-algorithm-basics/#:~:text=There%20are%20two%20basic%20types,until%20the%20query%20is%20answere.

[0097] "Hierarchical relationship" refers to a relationship based on levels of subordination and superordination. The hierarchy can be established by a particular ontology.

[0098] "Input noun phrase" refers to a group of words standing together as a conceptual unit. Words can be distinct meaningful elements of speech or writing, used with others (or sometimes alone) to form a sentence.

[0099] "Medical code" refers to a code defined for a particular medical purpose, such as by a governmental body, a consortium, a standard setting group, and the like. Some such coding schemes may be for medical billing, which may include facility and professional reimbursement fact coding elements. Examples of such medical billing codes include codes associated with the International Classification of Diseases (ICD) codes (versions 9 and 10), Current Procedural Terminology (CPT) codes, Healthcare Common Procedure Coding System (HCPCS) codes, and Physician Quality Reporting System (PQRS) codes. In some examples, the codes associated with facility reimbursement include ICD codes, CPT codes, and HCPCS codes. Generally, these reimbursement facts are related to the services and equipment provided by the facility where the patient encounter occurred. Codes associated with professional reimbursement may include ICD codes, CPT codes, and PQRS codes.

[0100] "Medical record" refers to documentation related to clinical procedures performed on a patient or population of patients.

[0101] "Molecular data structure" refers to a cutout of a semantic network that represents the semantics of text of the input noun phrase in a complete and structured manner based on semantics of a specialty domain. The cutout can represent the semantics of the input noun phrase.

[0102] "Patient data atom" refers to an indivisible concept having an attributive relationship to at least one medical record.

[0103] "Patient data molecule" refers to a data structure that organizes and stores patient data. The patient data molecule can be similar to a semantic network, except that the semantic network can refer to a collection of known ontologies. The patient data molecule represents some unknown ontologies. In some embodiments, inheritance, normalization, and recursiveness can be contributions of the molecular data structure.

[0104] "Scheme" refers to a particular ordered system or arrangement. A scheme can follow certain rules or structures. For example, the scheme can be defined for a particular purpose, such as by a governmental body, a consortium, a standard setting group, and the like. A coding scheme may be for medical billing (i.e., a medical code), which may include facility and professional reimbursement fact coding elements. Examples of such medical billing codes include codes associated with the International Classification of Diseases (ICD) codes (versions 9 and 10), Current Procedural Terminology (CPT) codes, Healthcare Common Procedure Coding System (HCPCS) codes, and Physician Quality Reporting System (PQRS) codes. In some examples, the codes associated with facility reimbursement include ICD codes, CPT codes, and HCPCS codes. Generally, these reimbursement facts are related to the services and equipment provided by the facility where the patient encounter occurred. Codes associated with professional reimbursement may include ICD codes, CPT codes, and PQRS codes. Generally, these reimbursement facts are related to the services and equipment provided by the attending medical professional. In other examples, the facility and professional reimbursement facts may include any medical billing codes.

[0105] "Semantic network" refers to a knowledge base that represents semantic relations between concepts in a network. The semantic network can be a directed or undirected graph having nodes, which represent concepts, and edges. The semantic network can represent the semantics of a specialty domain in a most complete and structured manner.

[0106] "User" refers to an entity that uses a computing device, e.g., a clinician, or a medical coder.

[0107] "User interface" refers to the means by which the user and a computer system interact, in particular the use of input devices and software.

[0108] In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized, and that structural, logical and electrical changes may be made without departing from the scope of the present disclosure. The following description of example embodiments is, therefore, not to be taken in a limited sense, and the scope of the present disclosure is defined by the appended claims.

[0109] The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or computer readable storage device such as one or more non-transitory memories or other type of hardware-based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.

[0110] The functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like. For example, the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware. The term “logic” encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, or the like. The terms “component,” “system,” and the like may refer to computer-related entities, hardware, and software in execution, firmware, or combination thereof. A component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware. The term “processor” may refer to a hardware component, such as a processing unit of a computer system.

[0111] Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from any computer-readable storage device or media. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.

List of Illustrative Embodiments:

1. A method of analyzing unstructured text, comprising: receiving an input noun phrase from a user interface, the input noun phrase corresponds to a language exemplar; generating, from the input noun phrase and with a computing device, a molecular data structure that includes a concept molecule, the concept molecule further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; accessing a repository of patient data molecules stored in a data store, wherein a patient data molecule comprises a plurality of patient data atoms, a patient data atom corresponds to patient medical data, at least one of the plurality of patient data atoms is an attribute of another patient data atom; identifying at least some of the repository of patient data molecules that correspond to the concept molecule; and performing at least one action with the at least some of the repository of patient data molecules.

2. The method of embodiment 1, wherein performing at least one action comprises returning or retrieving medical records relating to the at least some of the repository of patient data molecules.

3. The method of any of the preceding embodiments, wherein performing at least one action comprises returning at least some of the repository of patient data molecules for data analysis.

4. The method of any of the preceding embodiments, wherein performing at least one action comprises identifying patients associated with the at least some of the repository of patient data molecules for use in clinical trials.

5. The method of any of the preceding embodiments, wherein the patient data molecule corresponds to a plurality of patients or to a plurality of medical records for an individual patient.

6. The method of any of the preceding embodiments, wherein the patient data molecule corresponds to a plurality of portions from a plurality of medical records, the plurality of portions is associated with multiple patients.

7. The method of any of the preceding embodiments, wherein identifying at least some of the repository of patient data molecules that correspond to the concept molecule utilizes a graph search algorithm.

8. The method of embodiment 7, wherein the graph search algorithm is a depth-first algorithm.

9. The method of embodiment 7, wherein the identifying at least some of the repository of patient data molecules further comprises: comparing most specified attributes of the concept molecule to attributes of the patient data molecule to determine a match, wherein the repository of patient data molecules are fully specified molecules based on patient data.

10. The method of any of the preceding embodiments, wherein the plurality of patient data molecules is arranged in a second hierarchical relationship, wherein the hierarchical relationship is partially aligned with the second hierarchical relationship.

11. The method of any of the preceding embodiments, wherein the generating the molecular data structure comprises: identifying a master type of concept as a function of meanings of the input noun phrase; generating the molecular data structure having the master type as a top level of the molecular data structure; and inserting additional concepts in the molecular data structure based on molecular data structure rules having an equivalent molecular data structure.

12. The method of any of the preceding embodiments, wherein generating the molecular data structure comprises interpreting text of the input noun phrase using rules of a domain specific knowledge base in a stepwise manner such that one rule after the other makes a small interpretation change to the input, until the final molecular data structure is created.

13. The method of any of the preceding embodiments, further comprising: presenting, via the user interface, a question created by a rule to a user; receiving an answer specifying a value for an attribute proposed by the rule; and adding the received value to the molecular data structure to form a refined molecular data structure; wherein identifying at least some of the repository of patient data molecules uses the refined molecular data structure.

14. The method of any of the preceding embodiments, wherein the concept molecule or patient data molecule is not constrained by the medical code.

15. The method of any of the preceding embodiments, wherein the concept molecule is not based on an existing semantic network.

16. The method of any of the preceding embodiments, wherein the patient data molecule uses rules of a domain specific knowledge base.

17. The method of any of the preceding embodiments, wherein the data store is a graph database.

18. The method of embodiment 17, wherein the graph database is an RDF triplestore.

19. The method of any of the preceding embodiments, further comprising generating a plurality of patient data molecules according to a scheme, wherein the scheme is based on the medical code and its hierarchy.

20. The method of embodiment 19, wherein generating the plurality of patient data molecules comprises: receiving a medical record; extracting semantics from the medical record; generating the patient data molecule using the semantics and the scheme to form a hierarchical relationship amongst the plurality of patient data atoms within the patient data molecule, wherein portions of the medical record have an attributive relationship with some of the plurality of patient data atoms; applying rules related to the scheme; and completing the patient data molecule.

21. The method of embodiment 20, wherein generating the concept molecule and generating the patient data molecule both use corresponding schemes.

22. A non-transitory computer-readable storage medium including instructions that, when processed by a computer, configure the computer to perform the method of any of the preceding embodiments.

23. A computing system comprising: a first computing device, comprising: a display configured to present a user interface; an input device for interacting with the user interface; a processor; and a memory storing instructions that, when executed by the processor, configure the computing device to: receive an input noun phrase from the user interface, the input noun phrase corresponds to a language exemplar; generate, from the input noun phrase and with a computing device, a molecular data structure that includes a concept molecule, the concept molecule further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; access a repository of patient data molecules stored in a data store, wherein a patient data molecule comprises a plurality of patient data atoms, a patient data atom corresponds to patient medical data, at least one of the plurality of patient data atoms is an attribute of another patient data atom; identify at least some of the repository of patient data molecules that correspond to the concept molecule; and perform at least one action with the at least some of the repository of patient data molecules.

24. The computing system of any of the preceding embodiments, further comprising: a second computing device, comprising: a second display configured to present the user interface; and a second input device; wherein performing at least one action comprises accessing, by the second computing device, medical records attributed to a subset of patient data molecules.

25. The computing system of any of the preceding embodiments, wherein performing at least one action comprises returning a set of documents or portions thereof relating to the at least some of the repository of patient data molecules.

26. The computing system of any of the preceding embodiments, wherein performing at least one action comprises returning at least some of the repository of patient data molecules for data analysis.

27. The computing system of any of the preceding embodiments, wherein performing at least one action comprises identifying patients associated with the at least some of the repository of patient data molecules for use in clinical trials.

28. The computing system of any of the preceding embodiments, wherein the plurality of patient data molecules is arranged in a second hierarchical relationship, wherein the hierarchical relationship is partially aligned with the second hierarchical relationship.

29. The computing system of any of the preceding embodiments, wherein the instructions further configure the computing device to: present, via the user interface, a question created by a rule to a user; receive an answer specifying a value for an attribute proposed by the rule; and add the received value to the molecular data structure to form a refined molecular data structure; wherein identifying at least some of the repository of patient data molecules uses the refined molecular data structure.

30. The computing system of any of the preceding embodiments, wherein the instructions further configure the computing device to generate a plurality of patient data molecules according to a scheme, wherein the scheme is based on the medical code.

31. The computing system of embodiment 30, wherein generating the plurality of patient data molecules comprises: receiving a medical record; extracting semantics from the medical record; generating the patient data molecule using the semantics and the scheme to form the hierarchical relationship amongst the plurality of patient data atoms within the patient data molecule, wherein portions of the medical record have an attributive relationship with some of the plurality of patient data atoms; applying rules related to the scheme; and completing the patient data molecule.

32. The computing system of embodiment 31, wherein generating the concept molecule and generating the patient data molecule both use corresponding schemes.
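
As an informal illustration of the structures recited in embodiment 23, the following minimal Python sketch models a molecule as a tree of atoms and checks whether a patient data molecule corresponds to a concept molecule. The names `Atom`, `corresponds`, and `search` are hypothetical and do not appear in the disclosure. Attributes are simplified here to string key-value pairs, although the disclosure treats attributes as atoms in their own right. The depth-first comparison is one possible graph search, assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Atom:
    """Hypothetical atom: a label, simplified attributes, and child atoms."""
    label: str
    # Attributive relationships, simplified to name -> value strings.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Hierarchical relationships to lower-level atoms.
    children: List["Atom"] = field(default_factory=list)


def corresponds(concept: Atom, patient: Atom) -> bool:
    """Return True if every attribute and child specified on the concept
    atom is also present, with the same value, on the patient data atom."""
    if concept.label != patient.label:
        return False
    for name, value in concept.attributes.items():
        if patient.attributes.get(name) != value:
            return False
    # Depth-first: each concept child must match at least one patient child.
    return all(
        any(corresponds(c_child, p_child) for p_child in patient.children)
        for c_child in concept.children
    )


def search(concept: Atom, repository: List[Atom]) -> List[Atom]:
    """Identify the patient data molecules that correspond to the concept."""
    return [molecule for molecule in repository if corresponds(concept, molecule)]


# Illustrative data only; the values are invented for this sketch.
concept = Atom("fracture", {"laterality": "left"},
               [Atom("bone", {"name": "femur"})])
repository = [
    Atom("fracture", {"laterality": "left", "type": "closed"},
         [Atom("bone", {"name": "femur", "part": "shaft"})]),
    Atom("fracture", {"laterality": "right"},
         [Atom("bone", {"name": "femur"})]),
]
matches = search(concept, repository)
```

In this sketch the patient data molecules may carry more attributes than the concept molecule; only the attributes the concept molecule specifies are compared, so a less specific exemplar matches more of the repository.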

Claims

What is claimed is:

2. The method of claim 1, wherein performing at least one action comprises returning medical records relating to the at least some of the repository of patient data molecules.

3. The method of claim 1 or 2, wherein performing at least one action comprises returning at least some of the repository of patient data molecules for data analysis.

4. The method of any of claims 1 to 3, wherein the patient data molecule corresponds to a plurality of patients or to a plurality of medical records for an individual patient.

5. The method of any of claims 1 to 4, wherein identifying at least some of the repository of patient data molecules that correspond to the concept molecule utilizes a graph search algorithm.

6. The method of claim 5, wherein the identifying at least some of the repository of patient data molecules further comprises: comparing most specified attributes of the concept molecule to attributes of the patient data molecule to determine a match, wherein the repository of patient data molecules are fully specified molecules based on patient data.

7. The method of any of claims 1 to 6, wherein the plurality of patient data molecules is arranged in a second hierarchical relationship, wherein the hierarchical relationship is partially aligned with the second hierarchical relationship.

8. The method of any of claims 1 to 7, wherein the generating the molecular data structure comprises: identifying a master type of concept as a function of meanings of the input noun phrase; generating the molecular data structure having the master type as a top level of the molecular data structure; and inserting additional concepts in the molecular data structure based on molecular data structure rules having an equivalent molecular data structure.

9. The method of any of claims 1 to 8, further comprising: presenting, via the user interface, a question created by a rule to a user; receiving an answer specifying a value for an attribute proposed by the rule; and adding the received value to the molecular data structure to form a refined molecular data structure; wherein identifying at least some of the repository of patient data molecules uses the refined molecular data structure.

10. The method of any of claims 1 to 7, further comprising generating a plurality of patient data molecules according to a scheme, wherein the scheme is based on the medical code and its hierarchy.

11. The method of claim 10, wherein generating the plurality of patient data molecules comprises: receiving a medical record; extracting semantics from the medical record; generating the patient data molecule using the semantics and the scheme to form a hierarchical relationship amongst the plurality of patient data atoms within the patient data molecule, wherein portions of the medical record have an attributive relationship with some of the plurality of patient data atoms; applying rules related to the scheme; and completing the patient data molecule.

12. A non-transitory computer-readable storage medium including instructions that, when processed by a computer, configure the computer to perform the method of any of claims 1 to 11.

13. A computing system comprising: a first computing device, comprising: a display configured to present a user interface; an input device for interacting with the user interface; a processor; and a memory storing instructions that, when executed by the processor, configure the computing device to: receive an input noun phrase from the user interface, the input noun phrase corresponds to a language exemplar; generate, from the input noun phrase and with a computing device, a molecular data structure that includes a concept molecule, the concept molecule further comprises a plurality of concept atoms, at least one of the concept atoms is an attribute of another concept atom or a medical code, and at least one of the concept atoms has a hierarchical relationship to another concept atom; access a repository of patient data molecules stored in a data store, wherein a patient data molecule comprises a plurality of patient data atoms, a patient data atom corresponds to patient medical data, at least one of the plurality of patient data atoms is an attribute of another patient data atom; identify at least some of the repository of patient data molecules that correspond to the concept molecule; and perform at least one action with the at least some of the repository of patient data molecules.

14. The computing system of claim 13, further comprising: a second computing device, comprising: a second display configured to present the user interface; and a second input device; wherein performing at least one action comprises accessing, by the second computing device, medical records attributed to a subset of patient data molecules.

15. The computing system of claim 13 or 14, wherein performing at least one action comprises returning a set of documents or portions thereof relating to the at least some of the repository of patient data molecules.

16. The computing system of any of claims 13 to 15, wherein the plurality of patient data molecules is arranged in a second hierarchical relationship, wherein the hierarchical relationship is partially aligned with the second hierarchical relationship.

17. The computing system of any of claims 13 to 16, wherein the instructions further configure the computing device to: present, via the user interface, a question created by a rule to a user; receive an answer specifying a value for an attribute proposed by the rule; and add the received value to the molecular data structure to form a refined molecular data structure; wherein identifying at least some of the repository of patient data molecules uses the refined molecular data structure.

18. The computing system of claim 17, wherein the instructions further configure the computing device to generate a plurality of patient data molecules according to a scheme, wherein the scheme is based on the medical code.

19. The computing system of claim 18, wherein generating the plurality of patient data molecules comprises: receiving a medical record; extracting semantics from the medical record; generating the patient data molecule using the semantics and the scheme to form the hierarchical relationship amongst the plurality of patient data atoms within the patient data molecule, wherein portions of the medical record have an attributive relationship with some of the plurality of patient data atoms; applying rules related to the scheme; and completing the patient data molecule.

20. The computing system of claim 19, wherein generating the concept molecule and generating the patient data molecule both use corresponding schemes.
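
As an informal illustration that forms no part of the claims, the question-and-answer refinement recited in claims 9 and 17 might be sketched as follows. The names `Rule`, `next_question`, and `refine` are hypothetical, and the molecular data structure is reduced to a flat dictionary of attribute values purely for brevity. The sketch assumes each rule proposes one attribute and asks about it only when that attribute is still unspecified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: proposes an attribute and the question that elicits it."""
    attribute: str
    question: str


def next_question(molecule: Dict[str, str], rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule whose proposed attribute is not yet specified."""
    for rule in rules:
        if rule.attribute not in molecule:
            return rule
    return None


def refine(molecule: Dict[str, str], rules: List[Rule],
           ask: Callable[[str], str]) -> Dict[str, str]:
    """Present questions, receive answers, and add each received value to a
    copy of the molecular data structure, yielding a refined structure."""
    refined = dict(molecule)  # leave the original structure untouched
    rule = next_question(refined, rules)
    while rule is not None:
        refined[rule.attribute] = ask(rule.question)
        rule = next_question(refined, rules)
    return refined


# Illustrative use: canned answers stand in for the user interface.
rules = [Rule("laterality", "Which side?"), Rule("bone", "Which bone?")]
canned = {"Which side?": "left", "Which bone?": "femur"}
refined = refine({"master_type": "fracture", "bone": "femur"}, rules, canned.__getitem__)
```

The refined structure, rather than the original one, would then be used when identifying corresponding patient data molecules.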
