CN116682553B - Diagnosis recommendation system integrating knowledge and patient representation

Info

Publication number: CN116682553B
Application number: CN202310961100.XA
Authority: CN
Inventors: 李劲松; 辛然; 田雨; 周天舒
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2023-08-02
Filing date: 2023-08-02
Publication date: 2023-11-03
Anticipated expiration: 2043-08-02
Also published as: CN116682553A

Abstract

The application discloses a diagnosis recommendation system integrating knowledge and patient representation, comprising a knowledge graph construction module, a knowledge representation learning module and a diagnosis recommendation module. The application uses a medical knowledge graph to link patient data with medical knowledge, so that the relationship between the two is expressed more comprehensively and accurately. The application associates the time-series data of patient diagnosis with the medical knowledge graph to construct a patient information graph, prunes the patient information graph with a rule learning algorithm to reduce the domain of knowledge representation learning, and forms patient information sequences from the associated data, thereby making better use of patient history data and improving the accuracy of diagnosis recommendation. The application provides a semantic association method that computes the similarity of fine-grained semantic units, which aligns medical entities more accurately. The application introduces a patient sequence representation learning model built on an unsupervised convolutional neural network, which further improves the accuracy of diagnosis recommendation.

Description

Diagnosis recommendation system integrating knowledge and patient representation

Technical Field

The application belongs to the technical field of medical health information, and particularly relates to a diagnosis recommendation system for fusing knowledge and patient representation.

Background

In recent years, as medical informatization has become widespread, medical data has accumulated explosively, and how to use this data to achieve accurate diagnosis and treatment has become a hot topic in the medical field.

Clinical manifestations and morphological abnormalities are two important concepts in medical diagnosis and are important facts in the progression of a patient's condition. Clinical manifestations generally refer to the symptoms and signs of a disease, as perceived by the patient and observed by the physician, including pain, fever, nausea, vomiting, diarrhea, rash, etc.; from a patient's clinical manifestations the physician can determine the type and severity of the disease and formulate a corresponding treatment regimen. Morphological abnormalities generally refer to changes in tissue or cellular structure caused by disease, including lumps, tumors, ulcers, inflammation, and the like. These morphological abnormalities are usually obtained through imaging, pathology, etc., and provide objective evidence for clinical manifestations. Morphological abnormalities can inform the physician about disease progression and treatment response.

Most existing diagnosis recommendation systems analyze medical data directly. However, clinical manifestations and morphological abnormalities are often linked by complex multi-factor correlations: the interactions among different factors must be considered, and interpreting them requires extensive clinical experience and professional knowledge. Traditional systems underutilize these complex correlations, so their recommendation results lack accuracy and comprehensiveness.

Diagnosis recommendation is an application scenario that depends heavily on knowledge. The knowledge graph, as a technology for representing and storing knowledge, offers strong semantic expressiveness and rich semantic associations. Knowledge representation learning techniques offer a potential solution to the underutilization of the complex knowledge associations among clinical manifestations and morphological abnormalities, but still have some drawbacks.

Existing diagnostic recommendation systems suffer from the following three disadvantages:

1. Knowledge-graph-based diagnosis recommendation systems often have difficulty associating knowledge with data. When existing standard medical terms are aligned with the medical entities found in electronic medical records, the out-of-vocabulary (OOV) problem frequently arises, making it difficult to ensure semantic consistency and interoperability between different medical entities. An automatic association method is lacking, so cascading errors are produced and lead to reasoning errors.

2. Traditional diagnosis recommendation systems are based on machine learning and rule templates, and have low accuracy when handling complex features that are hard to express with simple logic, such as the spatial alignment of entities including clinical manifestations and morphological abnormalities. More advanced techniques, such as using knowledge graph technology to learn the complex logical associations in expert knowledge, are therefore needed to improve accuracy.

3. Existing diagnosis recommendation systems make insufficient use of the association information and temporal information across a patient's multiple visits. To understand a patient's disease development more comprehensively, information from multiple visits needs to be integrated into a continuously observed patient sequence record, and patient representation learning needs to be used to capture time-related information for model prediction and to provide a visual display of the evidence behind a result.

Disclosure of Invention

The application aims to provide a diagnosis recommendation system integrating knowledge and patient representation, addressing the shortcomings of knowledge graph technology in existing diagnosis recommendation systems, such as the difficulty of semantically associating knowledge with data and the underutilization of complex semantic associations.

The aim of the application is realized by the following technical scheme: a diagnosis recommendation system for fusing knowledge and patient representation comprises a knowledge graph construction module, a knowledge representation learning module and a diagnosis recommendation module;

(1) Knowledge graph construction module: defining the ontology of the patient information map, constructing a medical knowledge graph, extracting patient information, and performing entity association mapping between the patient information and the medical knowledge graph to complete the construction of the patient information map;

(2) A knowledge representation learning module comprising:

screening and pruning the patient information map by adopting a rule learning algorithm, and removing patient nodes and associated edges thereof to obtain a simplified patient information map;

training a knowledge representation learning model by using the simplified patient information map, and converting any entity into a knowledge representation embedding vector;

connecting fact nodes associated with each patient node in a patient information map into a patient longitudinal sequence in a time sequence relationship, dividing the fact nodes in the patient longitudinal sequence into a plurality of subsequences according to body parts where facts occur, substituting knowledge representation embedding vectors of entities into the patient longitudinal sequence to obtain a representation matrix of each subsequence, and training a patient sequence representation learning model by using the subsequence representation matrix to obtain a patient sequence representation of each patient;

(3) Diagnosis recommendation module: based on the patient sequence representation and the subsequence representation corresponding to each body part, finding in the patient diagnosis information base the patient with the highest overall similarity to the patient to be diagnosed, and taking the diagnosis result of the found patient as the diagnosis recommendation result for the patient to be diagnosed.

Further, the patient information map comprises patient information and medical knowledge graph information and is stored in an attribute graph model. The node set contains nodes of different types and their attributes; the node types include patient, body part, clinical manifestation, morphological abnormality and disease, with clinical manifestation nodes and morphological abnormality nodes serving as fact nodes. The relation set contains the logical relationships between different nodes and the temporal relationships between the different fact nodes of a patient.

Further, the construction of the medical knowledge graph comprises two parts, a body part knowledge graph and a diagnosis knowledge graph; the body part knowledge graph contains body parts and their relations, and models the inclusion, laterality and orientation relations between body part entities.

Further, the extracting of patient information includes: using a deep-learning-based sequence labeling model to extract, from the patient's electronic medical record data, the entities and relations defined in the ontology of the patient information map; after entity extraction is completed, the edges between nodes are completed by a distant supervision method with predefined rules, which completes the relation extraction.

Further, the entity association map includes:

extracting all entities from the medical knowledge graph as an initial medical term library;

splitting the different types of entities to be mapped into information units, calculating the similarity between each information unit and the candidate terms of the corresponding information unit type in the medical term library, and, if no term in the medical term library satisfies the similarity condition, adding the information unit to the medical term library, until every information unit has a corresponding term in the medical term library;

taking medical terms formed by splicing the terms corresponding to the information units as mapping results, and if the corresponding mapping result terms do not exist in the medical term library, adding the terms to the medical term library;

and establishing a hierarchical relationship between the newly added term and other terms in the medical term library while the term is newly added.

Further, the training of the knowledge representation learning model includes:

defining and simplifying the calculation of the entity relation triplet score in the patient information map; for each entity and relation in the simplified patient information map, embedding the entity and relation into a set dimension vector space to obtain an entity vector and a relation vector; splicing the head node entity vector and the relation vector along the last dimension, carrying out convolution operation on the spliced vector to obtain a new vector, and carrying out dot product operation on the new vector and the tail node entity vector to obtain a triplet score;

and performing negative sampling on the triplet data during training, computing the scores of the negative (corrupted) triplets, training with a hinge loss function, and obtaining the knowledge representation embedding vector of any entity through the trained knowledge representation learning model.

Further, the patient sequence representation learning model consists of a time sequence information extraction layer and a full connection layer which are sequentially connected;

the time sequence information extraction layer uses a convolutional neural network as a time filter to extract a local time mode of a patient subsequence representation matrix and performs unilateral convolutional calculation;

the full-connection layer extracts an output vector of the linear layer as a coded representation of a patient sub-sequence, and averages all the coded representations of the patient sub-sequences component by component to be used as a patient sequence representation; the KL divergence loss function is used in the training process.

Further, the timing information extraction layer includes:

defining a temporal filter as $f \in \mathbb{R}^{s \times d}$, where $s$ is the temporal filter window size and $d$ is the dimension of the knowledge representation embedding vectors; constructing $q$ temporal filters in total, the output of any temporal filter $t$ being $o^{t}_{i} = \sigma(f_t * x_i + b_t)$, where $\sigma$ is the activation function, $o^{t}_{i}$ is the $i$-th dimension output of temporal filter $f_t$, $x_i$ is the $i$-th knowledge representation embedding vector of the representation matrix of the subsequence, $*$ is the convolution function, and $b_t$ is the bias vector of the $t$-th temporal filter;

splicing the vectors output by the $q$ temporal filters, pooling with a stride of 1, and reshaping to obtain a vector whose dimension depends on $L$, where $L$ is the subsequence length.

Further, the fully connected layer consists of four linear layers and two activation functions applied in a fixed feedforward order, the dimensions of the linear layers depending on $M$, the length of the patient longitudinal sequence. The output vector of the initialized patient sequence representation learning model is reshaped into a matrix, and a KL divergence loss is used: the loss function compares the $i$-th column of the reshaped output matrix with the $i$-th column of the representation matrix of the subsequence and involves a Softmax function. After training, the output vector of one linear layer of the patient sequence representation learning model is extracted as the coded representation of the patient subsequence, and the coded representations of all subsequences of the patient are averaged component by component to give the patient sequence representation.
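As an illustration of the two operations named above, the following minimal sketch computes a column-wise KL divergence loss and a component-wise average of subsequence encodings. It is a sketch under stated assumptions, not the patented model: the direction of the divergence, the application of Softmax to both columns, and all function names (`column_kl_loss`, `mean_pool`) are assumptions made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions with strictly positive entries.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def column_kl_loss(target_cols, output_cols):
    # Assumption: the loss sums KL(softmax(target) || softmax(output)) over columns.
    return sum(
        kl_divergence(softmax(t), softmax(o))
        for t, o in zip(target_cols, output_cols)
    )


def mean_pool(encodings):
    # Component-wise average of the subsequence encodings of one patient.
    n = len(encodings)
    return [sum(col) / n for col in zip(*encodings)]


loss_same = column_kl_loss([[1.0, 2.0], [0.5, 0.5]], [[1.0, 2.0], [0.5, 0.5]])
loss_diff = column_kl_loss([[1.0, 2.0]], [[2.0, 1.0]])
patient_repr = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

The loss is zero when the output columns reproduce the input columns and positive otherwise, which is the property an unsupervised reconstruction objective of this kind relies on.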

Further, the patient diagnosis information base is constructed as follows: let D be the set of diseases supported by the diagnosis recommendation system; screen, from the electronic medical record data of hospital patients, the list P of patients diagnosed with a disease in D; and compute the patient sequence representations of the patients in P to construct the patient diagnosis information base.
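The retrieval step over the patient diagnosis information base can be sketched as a nearest-neighbour lookup. The sketch below is illustrative only: the text does not name the similarity measure, so cosine similarity is an assumption, the sketch compares only the overall patient sequence representation (not the per-body-part subsequence representations), and the names `recommend` and `diagnosis_base` are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(query_repr, diagnosis_base):
    """diagnosis_base: list of (patient_id, diagnosis, representation) records."""
    best = max(diagnosis_base, key=lambda rec: cosine(query_repr, rec[2]))
    return best[0], best[1]


# Toy information base for patients diagnosed with diseases in the set D.
base = [
    ("p1", "disease_A", [1.0, 0.0, 0.0]),
    ("p2", "disease_B", [0.0, 1.0, 0.0]),
]
result = recommend([0.9, 0.1, 0.0], base)
```

The diagnosis of the most similar stored patient is returned as the recommendation for the patient to be diagnosed.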

Compared with the traditional diagnosis recommendation system based on the machine learning model and the rule template, the diagnosis recommendation system has the following main advantages:

1. A medical knowledge graph is introduced: the application uses the medical knowledge graph to link patient data with medical knowledge, which expresses the relationship between the two more comprehensively and accurately and enables the diagnosis recommendation system to better understand and process patient data.

2. Patient time-series data is considered: the application associates the time-series data of patient diagnosis with the medical knowledge graph to construct a patient information map, prunes the patient information map with a rule learning algorithm to reduce the domain of knowledge representation learning, and forms patient information sequences from the associated data. Patient history data is thereby better utilized and the accuracy of diagnosis recommendation is improved.

3. The entity alignment method is improved: the application provides a semantic association method that computes the similarity of fine-grained semantic units, which resolves the out-of-vocabulary (OOV) cases that arise when medical entities from knowledge sources and data sources are aligned. The method aligns medical entities more accurately, so that the diagnosis recommendation system can better understand and process medical data.

4. A representation learning model is introduced: the application introduces a patient sequence representation learning model constructed based on an unsupervised convolutional neural network, can better utilize the historical data of the patient, and improves the accuracy of diagnosis recommendation. Meanwhile, the model can automatically extract complex features in the historical data of the patient, and the complicated process of manually extracting the features is avoided.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a diagnosis recommendation system integrating knowledge and patient representation according to an embodiment of the present application;

FIG. 2 is a flowchart of entity association mapping according to an embodiment of the present application;

FIG. 3 is a flowchart in which the knowledge representation learning module obtains the patient representation and the diagnosis recommendation module obtains the diagnosis recommendation result, according to an embodiment of the present application.

Detailed Description

For a better understanding of the technical solution of the present application, the following detailed description of the embodiments of the present application refers to the accompanying drawings.

It should be understood that the described embodiments are merely some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The embodiment of the application provides a diagnosis recommendation system for fusing knowledge and patient representation, which comprises a knowledge graph construction module, a knowledge representation learning module and a diagnosis recommendation module as shown in fig. 1. The following description further presents some embodiments of the implementation of the modules of the diagnostic recommendation system that integrate knowledge and patient representations in accordance with the requirements of the present application.

1. Knowledge graph construction module

The knowledge graph construction module comprises four parts, namely knowledge ontology definition, medical knowledge graph construction, patient information extraction and entity association mapping.

1.1 ontology definition

The knowledge graph defined by the application is named the patient information map; it comprises patient information extracted from the text data of the patient's electronic medical record and the medical knowledge graph information constructed in section 1.2. The patient information map $G$ is a collection of facts stored in the form of triples $(h, r, t)$. $G = (V, E)$, where $V$ and $E$ denote the node set and the edge set of $G$ respectively; with $R$ the set of binary relations and $\mathcal{E}$ the set of entities, $G \subseteq \mathcal{E} \times R \times \mathcal{E}$.

In the embodiment of the application, the patient information map is stored in an attribute graph model, and the ontology is defined as follows. The node set contains nodes of different types and the attributes of each node; the node types include patient, body part, clinical manifestation, morphological abnormality, disease, etc., where clinical manifestation nodes and morphological abnormality nodes serve as fact nodes; the node attributes include node identifier, site of occurrence, laterality, meta-manifestation, trend, frequency of occurrence, current-state time, size, nature, boundary condition, etc. The relation set contains the logical relationships between patients and body parts, between body parts and morphological abnormalities, between clinical manifestations and diseases, and between morphological abnormalities and diseases, the membership relationships between body parts, and the temporal relationships between the different fact nodes of a patient.

1.2 construction of medical knowledge graph

The embodiment of the application constructs the medical knowledge graph from two medical knowledge sources and provides information of expert knowledge sources for the patient information graph.

First, a body part knowledge graph containing body parts and their relations is constructed. It abstracts the spatial information of the medical entities corresponding to clinical manifestations and morphological abnormalities, and models the inclusion, laterality and orientation relations between body part entities. Body part is defined broadly in the present application: it refers to the site where a clinical manifestation or morphological abnormality occurs, and includes not only anatomical sites (structures of body tissues or organs, including bones, muscles, blood vessels, nerves, organs, etc.) but also lesion sites (abnormal conditions of a body part, including tumors, cysts, ulcers, inflammation, etc.). The inclusion relation indicates that one part contains another; the laterality relation indicates that a part lies to one side of the central axis; and the orientation relation indicates the position of one part relative to another. When constructing the body part knowledge graph, the embodiment of the application refers to two internationally used medical classification systems: ICD-10 (International Statistical Classification of Diseases and Related Health Problems, Tenth Revision) and SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms). ICD-10 contains classification codes for diseases and health problems as well as for body parts, and serves as an important reference for constructing the body part knowledge graph. In addition to the names and classifications of body parts, SNOMED CT provides relations between body parts, such as inclusion, laterality and orientation. By associating these relations with entities, the relations between body parts can be described more accurately, providing richer semantic information for subsequent processing.

Second, a diagnosis knowledge graph containing entities such as body parts, clinical manifestations, morphological abnormalities and diseases, together with their relations, is constructed from knowledge sources such as the experience of hospital departments, clinical guidelines, consensus statements issued by expert groups, academic dictionaries and white papers.

Finally, the entities of both medical knowledge graphs, the body part knowledge graph and the diagnosis knowledge graph, are decomposed using the entity association mapping flow described in section 1.4 and are association-mapped with the patient information.

1.3 patient information extraction

Patient information extraction uses a deep-learning-based sequence labeling model to extract, from the text of the patient's electronic medical record, the entities and relations defined in the ontology of the patient information map.

The sequence labeling model predicts the probability that each character in the electronic medical record text belongs to each entity and attribute, and identifies the entities and attributes associated with the patient entity. The sequence labeling model used in the embodiment of the application is a hybrid of a bidirectional long short-term memory network and a conditional random field, denoted the BiLSTM-CRF model. The model first captures contextual information through the BiLSTM, then builds the state probability matrix and the transition probability matrix from the BiLSTM output at each character position and constructs the conditional random field (CRF) model, which achieves better results on the sequence labeling task.
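To make the role of the state and transition matrices concrete, the following sketch shows Viterbi decoding, the standard way a CRF layer turns per-character state scores and label-to-label transition scores into the best label sequence. It is a generic illustration, not the embodiment's implementation; the label set (0 = O, 1 = B, 2 = I) and all score values are made up for the example.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label path.

    emissions[i][y]: score of label y at character position i.
    transitions[a][b]: score of moving from label a to label b.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev = score
        score, ptr = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda a: prev[a] + transitions[a][y])
            ptr.append(best_prev)
            score.append(prev[best_prev] + transitions[best_prev][y] + em[y])
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda y: score[y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Labels: 0 = O, 1 = B, 2 = I; the transition O -> I is heavily penalised.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[0.1, 2.0, 0.0], [0.0, 0.5, 1.5], [2.0, 0.1, 0.3]]
path = viterbi(emissions, transitions)
```

In a BiLSTM-CRF model the emission scores would come from the BiLSTM outputs and the transition scores would be learned parameters.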

After entity extraction is completed, the embodiment uses a distant supervision method with predefined rules to complete the edges between nodes and thus the relation extraction; specifically, a relation edge is created between any combination of nodes in the same sentence that satisfies a predefined rule.
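The rule-based edge completion can be sketched as follows. The rule table, relation names and entity type names here are hypothetical and stand in for whatever predefined rules an implementation would use; the sketch only shows the mechanism of creating an edge for every ordered pair of entities in one sentence whose types match a rule.

```python
from itertools import permutations

# Hypothetical predefined rules: (head type, tail type) -> relation name.
RULES = {
    ("body_part", "morphological_abnormality"): "has_abnormality",
    ("body_part", "clinical_manifestation"): "has_manifestation",
}


def complete_edges(sentence_entities):
    """sentence_entities: list of (entity_text, entity_type) found in one sentence."""
    edges = []
    for (head, head_type), (tail, tail_type) in permutations(sentence_entities, 2):
        relation = RULES.get((head_type, tail_type))
        if relation is not None:
            edges.append((head, relation, tail))
    return edges


edges = complete_edges(
    [("basal ganglia", "body_part"), ("lacunar foci", "morphological_abnormality")]
)
```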

1.4 entity association mapping

The entity association mapping process of the application uses an entity disambiguation flow based on natural language processing to associate the patient information with the entities in the medical knowledge graph. A medical term library is defined, and out-of-vocabulary (OOV) words are managed and updated at the level of fine-grained information units to solve the OOV problem in entity association mapping. As shown in fig. 2, the entity association mapping is implemented in the following steps:

(1) Initial medical term library construction. All entities are extracted from the medical knowledge graph constructed in this embodiment to form the initial medical term library of the embodiment.

(2) Entity term mapping

The entities to be mapped in the entity association mapping process come from two sources: the medical knowledge source entities consolidated in section 1.2, i.e., the entities contained in the initial medical term library; and the entities extracted in section 1.3 from the text data of patient electronic medical records.

First, the different types of entities to be mapped are split into information units, which may correspond to attributes in the ontology. The splitting is organized as separate information unit splitting tasks according to the entity types defined in the ontology. For example, the morphological abnormality "bilateral basal ganglia lacunar foci" is split into the information units "basal ganglia (site of occurrence); lacunar foci (meta-manifestation)". The information units contained in each entity to be mapped are identified at the character level using the sequence labeling model BiLSTM-CRF.

Second, for the information unit of each dimension, candidate terms of the corresponding information unit type are recalled from the medical term library using the BM25 algorithm, and the similarity between the information unit and each candidate term is computed with a RoBERTa-based text matching model. If no term in the medical term library satisfies the similarity condition, the information unit is added to the medical term library as a new term, and its term code and synonyms are established. These steps are repeated for each information unit until the information unit of every dimension of the entity to be mapped has a corresponding term and term code in the medical term library.

Finally, the medical term formed by concatenating the terms corresponding to the information units of the entity to be mapped is taken as the mapping result; if this term does not exist in the medical term library, it is added to the library and its term code and synonyms are established.

(3) Term relation completion. When a new term is added, the hierarchical relations between the new term and the other terms in the medical term library are established. For example, with the body part as an information unit of a morphological abnormality term, the laterality, inclusion and orientation relations of body parts stored in the body part knowledge graph are used to infer and complete the laterality, inclusion and orientation relations of the morphological abnormality term.

Through the steps, the entity association mapping of the medical knowledge graph and the patient information is completed. And after the entity association mapping is completed, the construction of the patient information map is completed.
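The mapping loop of step (2) can be sketched as below. This is a simplified illustration under stated assumptions: `difflib.SequenceMatcher` stands in for the RoBERTa-based text matching model, candidate recall is reduced to filtering by information unit type instead of BM25, the threshold of 0.8 is arbitrary, and term codes, synonyms and the hierarchy completion of step (3) are omitted. The function names and the dictionary layout of the term library are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for the text matching model: character-level similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_unit(unit, unit_type, term_library, threshold=0.8):
    """Map one information unit to a library term, adding it as a new term if none matches."""
    candidates = term_library.setdefault(unit_type, [])
    best = max(candidates, key=lambda term: similarity(unit, term), default=None)
    if best is not None and similarity(unit, best) >= threshold:
        return best
    candidates.append(unit)  # out-of-vocabulary unit becomes a new term
    return unit


def map_entity(units, term_library):
    """units: list of (unit_text, unit_type). Returns the concatenated mapped term."""
    mapped = " ".join(map_unit(text, utype, term_library) for text, utype in units)
    composed = term_library.setdefault("composed", [])
    if mapped not in composed:
        composed.append(mapped)
    return mapped


library = {"site": ["basal ganglia"], "manifestation": ["lacunar focus"]}
mapped = map_entity([("Basal Ganglia", "site"), ("cyst", "manifestation")], library)
```

In this toy run the site unit matches an existing term, while the manifestation unit has no sufficiently similar term and is added to the library, which is the OOV handling the text describes.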

2. Knowledge representation learning module

The knowledge representation learning module comprises two parts, namely knowledge representation learning and patient sequence representation learning, the overall flow is as shown in fig. 3, and the specific implementation of each part is described in detail below.

2.1 knowledge representation learning

Because the patient information map fuses patient information and its data contains a large amount of redundancy, the embodiment of the application screens and prunes the patient information map with a rule learning algorithm before knowledge representation learning. The rule learning algorithm used in the embodiment is Anytime Bottom-Up Rule Learning (AnyBURL), which is used for knowledge graph rule mining. A patient node stored in the patient information map is taken as the start node of a rule path and the node of the diagnosed disease as the target node of the rule path; a rule path of length n leads from the start node to the target node. Each rule decomposes a single relation into multiple relations (including inverse relations). The rule learning algorithm assigns a confidence score to each rule; rules whose confidence score exceeds a set threshold are retained, and the nodes and edges of the patient information map that lie outside the retained rules are pruned.

The graph obtained after pruning by removing the patient nodes and their associated edges from the patient information map is referred to below as the simplified patient information map.
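The pruning step can be illustrated with a small sketch. It is not AnyBURL itself: rules are simplified here to sequences of relation names with a confidence score, the sketch does not check that a path ends at a disease node, and the graph layout, relation names and threshold are assumptions made for the example.

```python
def follow(graph, node, relations):
    """Yield the edge lists of all paths from node that follow the relation sequence."""
    if not relations:
        yield []
        return
    for rel, nxt in graph.get(node, []):
        if rel == relations[0]:
            for rest in follow(graph, nxt, relations[1:]):
                yield [(node, rel, nxt)] + rest


def prune(graph, patients, rules, threshold):
    """Keep only edges on paths that instantiate a rule whose confidence exceeds threshold."""
    kept = set()
    for relations, confidence in rules:
        if confidence <= threshold:
            continue
        for patient in patients:
            for path in follow(graph, patient, relations):
                kept.update(path)
    return kept


graph = {
    "patient1": [("has_fact", "fever"), ("has_fact", "rash")],
    "fever": [("indicates", "flu")],
    "rash": [("seen_with", "allergy")],
}
rules = [(("has_fact", "indicates"), 0.9), (("has_fact", "seen_with"), 0.2)]
kept = prune(graph, ["patient1"], rules, threshold=0.5)
# Simplified map: drop patient nodes and their associated edges.
simplified = {e for e in kept if e[0] != "patient1" and e[2] != "patient1"}
```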

The score of a triplet $(h, r, t)$ in the simplified patient information map is defined as follows, where $h$ and $t$ are the entities corresponding to the head node and the tail node respectively, and $r$ is the relation. First, each entity and relation in the simplified patient information map is embedded into a $d$-dimensional vector space to obtain the entity vectors $e_h$, $e_t$ and the relation vector $e_r$. Then, the head entity vector $e_h$ and the relation vector $e_r$ are spliced along the last dimension to obtain the spliced vector $[e_h; e_r]$, where $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of vectors. The purpose of this step is to combine the entity and relation vectors into one larger vector to facilitate the subsequent convolution operation. Next, a convolution operation is applied to the spliced vector to obtain a new vector, using a convolution kernel of size $k$, an activation function and a residual term. The purpose of this step is to perform feature extraction and representation learning in the joint space of entities and relations, yielding a more meaningful vector representation. Finally, the dot product of the new vector and the tail entity vector $e_t$ is taken, using a parameter matrix corresponding to the relation $r$, to obtain the score of the triplet $(h, r, t)$.

During training of the knowledge representation learning model, negative sampling is applied to the triplet data to give a positive sample set and a negative sample set, and the score of each negative (corrupted) triplet is computed. The hinge loss is taken as the loss function of the knowledge representation learning model; the hyperparameter of the loss function is obtained by cross-validation.

With the trained knowledge representation learning model, any entity $e$ in the simplified patient information map is converted into a $d$-dimensional knowledge representation embedding vector.
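The scoring and training objective described above can be sketched in plain Python. This is an illustration under assumptions, not the embodiment's exact formulas: the residual term is omitted, ReLU is assumed as the activation, the relation matrix `w_r` is assumed to project the convolved vector to the embedding dimension before the dot product, and the hinge loss uses the common margin form max(0, margin - positive score + negative score) with an arbitrary margin of 1.0.

```python
def relu(x):
    return max(0.0, x)


def conv1d(vec, kernel):
    # Valid 1-D convolution (no padding); output length is len(vec) - len(kernel) + 1.
    k = len(kernel)
    return [sum(vec[i + j] * kernel[j] for j in range(k)) for i in range(len(vec) - k + 1)]


def score(e_h, e_r, e_t, kernel, w_r):
    spliced = e_h + e_r  # concatenate head entity and relation vectors
    v = [relu(x) for x in conv1d(spliced, kernel)]
    # Assumed use of the relation matrix: project v (len(v) x d) before the dot product.
    projected = [sum(v[i] * w_r[i][j] for i in range(len(v))) for j in range(len(e_t))]
    return sum(p * t for p, t in zip(projected, e_t))


def hinge_loss(pos_score, neg_score, margin=1.0):
    # Zero once the positive triplet outscores the negative one by at least the margin.
    return max(0.0, margin - pos_score + neg_score)


e_h, e_r = [1.0, 0.0], [0.0, 1.0]
kernel = [1.0, 1.0]
w_r = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
pos = score(e_h, e_r, [1.0, 0.0], kernel, w_r)   # true tail entity
neg = score(e_h, e_r, [0.0, 1.0], kernel, w_r)   # corrupted tail entity
loss = hinge_loss(pos, neg)
```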

2.2 patient sequence representation learning

After the knowledge representation of each node in the simplified patient information map is obtained, the patient sequences are generated. The fact nodes (clinical manifestations and morphological abnormalities) associated with each patient node in the patient information map are connected according to their temporal relationships into a longitudinal sequence of length $M$, giving the longitudinal sequence of each patient, in which every entity is a term in the medical term library.

A patient may have multiple visits, with repeated descriptions of a single body part. To learn a sub-sequence representation for each body part of the patient, and thereby obtain a finer-grained diagnosis conclusion, the fact nodes stored in the patient's longitudinal sequence are segmented into sub-sequences of fixed length $L$ according to the body part at which each fact occurs. Assuming there are $K$ body parts in total, the sub-sequence corresponding to each body part $k$ is ordered chronologically and defined as $E_p^k = (e_1^k, e_2^k, \ldots, e_L^k)$. If a sub-sequence is shorter than $L$, it is padded with 0 to length $L$.
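
The segmentation into per-body-part sub-sequences can be sketched as follows. The record layout `(timestamp, body_part, term_id)` and the function name are hypothetical, and the handling of sub-sequences longer than L (keeping the most recent L facts) is an assumption, since the text only specifies zero padding for shorter ones.

```python
from collections import defaultdict

PAD = 0  # padding id; real term ids are assumed to be positive integers


def split_by_body_part(facts, length):
    """Group fact nodes by body part, order them by time, and pad to a fixed length.

    facts: iterable of (timestamp, body_part, term_id) tuples.
    Returns a dict mapping body_part to a list of exactly `length` term ids.
    """
    groups = defaultdict(list)
    for _, part, term in sorted(facts, key=lambda f: f[0]):
        groups[part].append(term)
    result = {}
    for part, terms in groups.items():
        recent = terms[-length:]                      # assumption: keep the latest facts
        result[part] = recent + [PAD] * (length - len(recent))
    return result
```

For example, three facts recorded out of order for two body parts yield two chronologically ordered, zero-padded sub-sequences of the same length.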

Substituting the knowledge representation embedding vectors of the entities into the patient's longitudinal sequence gives the representation matrix $X_p^k \in \mathbb{R}^{L \times d}$ of each sub-sequence of the patient, in which the time-sequence relationship of the patient's fact nodes is expressed by the order of the sub-sequence. The representation matrix $X_p^k$ of each sub-sequence is input to the patient sequence representation learning model, which consists of a timing information extraction layer and a fully connected layer connected in sequence.
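
Building the representation matrix of one sub-sequence amounts to an embedding lookup, sketched below. The dictionary-based embedding table and the convention that padding id 0 maps to a zero vector are assumptions made for illustration.

```python
def representation_matrix(subsequence, embeddings, dim):
    """Stack the embedding vectors of a padded sub-sequence into an L x d matrix.

    subsequence: list of term ids of length L, where 0 denotes padding.
    embeddings: dict mapping term id to a list of `dim` floats.
    """
    rows = []
    for term in subsequence:
        if term == 0:
            rows.append([0.0] * dim)          # padding position becomes a zero row
        else:
            rows.append(list(embeddings[term]))
    return rows
```

Row order follows the sub-sequence order, which is how the temporal relationship between fact nodes is carried into the model input.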

Specifically, the timing information extraction layer uses a convolutional neural network (CNN) as a temporal filter to extract local temporal patterns from the patient sub-sequence representation matrix, performing a one-sided convolution. A temporal filter is defined as $F_t \in \mathbb{R}^{s \times d}$, where $s$ is the window size of the temporal filter, and $q$ temporal filters are constructed in total. For any temporal filter $t$, the output of the filter is

$$o_i^t = \sigma\bigl(F_t * x_i + b_t\bigr)$$

where $\sigma$ is the activation function, in this embodiment the rectified linear unit (ReLU); $o_i^t$ is the $i$-th dimension output of the temporal filter $F_t$; $x_i$ is the $i$-th dimension knowledge representation embedding vector of the representation matrix $X_p^k$ of the sub-sequence; $*$ is the convolution function; and $b_t$ is the bias vector of the $t$-th temporal filter. The vectors output by the $q$ temporal filters are concatenated, pooled with a stride of 1, and reshaped into a single vector.
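
A one-sided (causal) temporal filter of the kind described above can be sketched as follows. Several details are assumptions rather than statements from the text: the window for position i covers rows i-s+1 to i with zero padding on the left, the bias is a scalar per filter, and the pooling step is omitted because its window size is not given.

```python
def temporal_filter(X, F, b):
    """Apply one s x d filter causally over the rows of an L x d matrix.

    Output position i only sees rows i-s+1 .. i, so later facts never
    influence earlier outputs. ReLU is used as the activation.
    """
    length, s, d = len(X), len(F), len(X[0])
    padded = [[0.0] * d for _ in range(s - 1)] + X   # left zero padding
    out = []
    for i in range(length):
        acc = b
        for j in range(s):
            for c in range(d):
                acc += F[j][c] * padded[i + j][c]
        out.append(max(0.0, acc))
    return out


def extract(X, filters, biases):
    """Concatenate the outputs of q temporal filters into one vector of length q * L."""
    features = []
    for F, b in zip(filters, biases):
        features.extend(temporal_filter(X, F, b))
    return features
```

Left padding is what makes the convolution one-sided: changing the last row of the input leaves all earlier output positions unchanged.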

The fully connected layer consists of four linear layers of preset dimensions and two activation functions, applied in a fixed feedforward order. In this embodiment the activation function is the rectified linear unit (ReLU).

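Let the output vector of the initialized patient sequence representation learning model be $\hat{y}$. $\hat{y}$ is reshaped into a matrix, denoted $\hat{X}$, and the KL divergence loss (Kullback-Leibler divergence loss) is used as the loss function, i.e.

$$\mathcal{L}_{seq} = \sum_{i} \mathrm{KL}\bigl(\mathrm{Softmax}(X_{:,i}) \,\|\, \mathrm{Softmax}(\hat{X}_{:,i})\bigr)$$

where $\mathcal{L}_{seq}$ is the loss function of the patient sequence representation learning model, $\hat{X}_{:,i}$ is the $i$-th column of $\hat{X}$, $X_{:,i}$ is the $i$-th column of the representation matrix $X$ of the sub-sequence, and $\mathrm{Softmax}$ is the Softmax function. The model is trained until it converges.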
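
The column-wise KL divergence loss between the input representation matrix and the reshaped model output can be sketched as follows. The direction of the divergence (input distribution as the reference) and the summation over columns are assumptions; the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_loss(X, X_hat):
    """Sum over columns of KL(softmax(X[:, i]) || softmax(X_hat[:, i])).

    X and X_hat are matrices of equal shape, given as lists of rows.
    """
    n_rows, n_cols = len(X), len(X[0])
    loss = 0.0
    for i in range(n_cols):
        p = softmax([X[r][i] for r in range(n_rows)])
        q = softmax([X_hat[r][i] for r in range(n_rows)])
        # Terms with p == 0 contribute nothing to the divergence.
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return loss
```

The loss is zero when the reconstruction matches the input exactly and grows as the column distributions drift apart, which is what lets the model be trained without diagnosis labels.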

The output vector $z_p^k$ of the designated linear layer of the trained patient sequence representation learning model is extracted as the coded representation of patient sub-sequence $k$, and the coded representations of the patient's $K$ sub-sequences are averaged component by component to give the patient sequence representation $z_p$, i.e. $z_p = \frac{1}{K} \sum_{k=1}^{K} z_p^k$.
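
The component-wise averaging of sub-sequence codes into one patient sequence representation is simple enough to state directly; the sketch below assumes each code is a list of floats of the same length, and the function name is hypothetical.

```python
def patient_representation(sub_codes):
    """Average K sub-sequence codes component by component."""
    if not sub_codes:
        raise ValueError("at least one sub-sequence code is required")
    k = len(sub_codes)
    return [sum(column) / k for column in zip(*sub_codes)]
```

Averaging keeps the patient representation in the same vector space as the body-part codes, so the same similarity measure can be used at both levels.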

3. Diagnostic recommendation module

The application converts the diagnosis recommendation process into a process of finding the patient with the highest overall similarity with the patient p to be diagnosed from the patient diagnosis information base, and the overall similarity between the patient p to be diagnosed and any patient in the patient diagnosis information base is defined as the similarity between the patient sequence representations of the two patients.

The patient diagnosis information base is constructed as follows: let D be the set of diseases that the diagnosis recommendation system supports; a list P of patients diagnosed with a disease in D is screened out from the electronic medical record data of hospital patients, and the patient sequence representation of each patient in P is calculated to build the patient diagnosis information base.

Specifically, the diagnosis recommendation process of the present application is as follows:

First, the patient sequence representation $z_p$ of the patient to be diagnosed $p$ at the current visit and the corresponding sub-sequence representations $z_p^k$ are calculated.

Second, the similarity between the patient sequence representation of the patient to be diagnosed $p$ and that of each patient in the patient diagnosis information base is calculated. The patient sequence representation similarity used in this embodiment is the cosine similarity. For the patient to be diagnosed $p$ and a patient $p'$ in the patient diagnosis information base, the overall similarity $\mathrm{sim}(p, p')$ and the similarity $\mathrm{sim}_k(p, p')$ between corresponding body parts $k$ are:

$$\mathrm{sim}(p, p') = \frac{z_p \cdot z_{p'}}{\lVert z_p \rVert \, \lVert z_{p'} \rVert}, \qquad \mathrm{sim}_k(p, p') = \frac{z_p^k \cdot z_{p'}^k}{\lVert z_p^k \rVert \, \lVert z_{p'}^k \rVert}$$

where $\lVert \cdot \rVert$ denotes the modulus of a vector, $z_p$ and $z_{p'}$ are the patient sequence representations of patient $p$ and patient $p'$, and $z_p^k$ and $z_{p'}^k$ are the sub-sequence representations of body part $k$ for patient $p$ and patient $p'$. A threshold $\theta$ is set; when the similarity $\mathrm{sim}_k(p, p')$ reaches $\theta$, the clinical manifestations and morphological abnormalities of patient $p$ and patient $p'$ on body part $k$ are considered similar. As shown in FIG. 3, the similarity between the patient to be diagnosed $p$ and each patient in the patient diagnosis information base is calculated for each body part, the results are sorted in descending order, and the patients with the highest overall similarity are recalled. In this embodiment the top 5 patients are recalled, and the diagnosis results of the recalled patients are taken as the diagnosis recommendation results for the patient to be diagnosed.
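
The recall step can be sketched as a cosine-similarity ranking over the patient diagnosis information base. The record layout `(patient_id, representation, diagnosis)`, the function names, and the use of "greater than or equal to" for the threshold comparison are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_rep, info_base, top_n=5):
    """Return (patient_id, diagnosis, similarity) for the top_n most similar patients."""
    scored = [(pid, diag, cosine(query_rep, rep)) for pid, rep, diag in info_base]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


def similar_parts(query_parts, other_parts, threshold):
    """List body parts whose sub-sequence representations reach the threshold."""
    return [
        part for part in query_parts
        if part in other_parts
        and cosine(query_parts[part], other_parts[part]) >= threshold
    ]
```

Ranking by overall similarity produces the recommendation, while the per-body-part check explains which regions make a recalled patient resemble the query patient.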

Further, the diagnosis recommendation module can provide visual content display for doctors, including patient information maps, patient information sequences and visual results of patient representations.

Patient information map: in this embodiment, the patient information map triple data are imported into a graph database, and the visualization tool of the graph database is used for display, supporting queries over the entities and relations of the knowledge graph.

Patient information sequence: through the association described in Section 1.4, this embodiment forms a longitudinal multi-hop query sequence between patient entities and fact entities. By performing a breadth-first search over the edges associated with each patient in the patient information map, the doctor can be shown the longitudinal fact sequence of the patient across multiple visits, so as to understand how the condition of each body part developed over different visits and to find the disease and treatment plan matching the patient's condition more quickly.
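
The breadth-first traversal that collects a patient's fact sequence can be sketched as follows. The adjacency-list layout (patient-to-fact edges plus fact-to-next-fact time edges) and the node names are hypothetical; a graph database would normally express this as a multi-hop query instead.

```python
from collections import deque


def fact_sequence(graph, patient):
    """Collect the fact nodes reachable from a patient node in breadth-first order.

    graph: dict mapping a node to the list of nodes it points to.
    """
    seen = {patient}
    order = []
    queue = deque([patient])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order
```

Because each hop follows a time-sequence edge, the depth at which a fact is reached reflects how many steps it lies from the patient's first recorded fact for that chain.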

Patient representation visualization: the patient sequence representations calculated in Section 2.2 are reduced in dimension using Uniform Manifold Approximation and Projection (UMAP). Compared with the classical t-SNE dimension reduction method, UMAP better preserves global structure, takes less time, places no restriction on the embedding dimension, and scales to data sets of larger dimension. By constructing a patient cohort and plotting the reduced-dimension codes of the cohort's patient sequence representations, the stratification of patients can be visualized and the clustering of different types of patients understood.

The diagnosis recommendation system fusing knowledge and patient representation provided by the application uses multiple medical data sources, including text, structured data and unstructured data, to automatically construct a medical knowledge graph that includes the spatial representation relations of body parts. It defines a semantic association method that calculates the similarity of fine-grained semantic units, which resolves the out-of-vocabulary (OOV) cases that arise when aligning medical entities between knowledge sources and data sources. It associates the time-series data of patient visits with the medical knowledge graph to construct a patient information map, prunes the patient information map with a rule learning algorithm to reduce the knowledge representation learning domain, and forms patient information sequences from the associated data. It learns knowledge representations of the patient information map and, by calculating semantic similarity, handles complex logical association problems such as the spatial alignment of morphological abnormalities. It trains a patient sequence representation learning model based on an unsupervised convolutional neural network that embeds both knowledge representations and time-series information, so that the time-series information of patient visits is used comprehensively. Compared with traditional diagnosis recommendation systems based on machine learning models and rule templates, the system combines medical knowledge graphs, patient data and deep learning models to improve the accuracy and efficiency of diagnosis recommendation, and can thereby better support the medical and health industry.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

The foregoing description of the preferred embodiment(s) is (are) merely intended to illustrate the embodiment(s) of the present application, and it is not intended to limit the embodiment(s) of the present application to the particular embodiment(s) described.

Claims

1. The diagnosis recommendation system integrating the knowledge and the patient representation is characterized by comprising a knowledge graph construction module, a knowledge representation learning module and a diagnosis recommendation module;

the patient information map comprises patient information and medical knowledge map information, the patient information and the medical knowledge map information are stored in an attribute map model, a node set comprises nodes of different types and node attributes, the node types comprise patients, body parts, clinical manifestations, morphological abnormalities and diseases, and the clinical manifestation nodes and the morphological abnormality nodes are taken as fact nodes; the relationship set comprises the logic relationship among different nodes and the time sequence relationship among different fact nodes of a patient;

the construction of the medical knowledge graph comprises two parts, namely a body part knowledge graph and a diagnosis knowledge graph; the body part knowledge graph comprises body parts and relations thereof, and models the containing, lateral and azimuth relations among body part entities;

the entity association map includes:

establishing a hierarchical relationship between the newly added term and other terms in the medical term library while the term is newly added;

(2) A knowledge representation learning module comprising:

the patient sequence representation learning model consists of a time sequence information extraction layer and a full connection layer which are sequentially connected;

the full-connection layer extracts an output vector of the linear layer as a coded representation of a patient sub-sequence, and averages all the coded representations of the patient sub-sequences component by component to be used as a patient sequence representation; a KL divergence loss function is adopted in the training process;

2. The system for diagnosis recommendation fusing knowledge and patient representation of claim 1, wherein the extracting patient information comprises: extracting, from the patient electronic medical record data, the entities and relations defined in the patient information map ontology by using a sequence labeling model based on deep learning; after the entity extraction is completed, the edges between the nodes are completed by a distant supervision method with predefined rules, which completes the relation extraction.

3. The diagnostic recommendation system for fusing knowledge and patient representations according to claim 1, wherein training of the knowledge representation learning model comprises:

4. The system for diagnosis recommendation fusing knowledge and patient representation of claim 1, wherein the timing information extraction layer comprises:

defining a temporal filter as $F_t \in \mathbb{R}^{s \times d}$, where $s$ is the window size of the temporal filter and $d$ is the knowledge representation embedding vector dimension; constructing $q$ temporal filters in total, the output of any temporal filter $t$ being $o_i^t = \sigma(F_t * x_i + b_t)$, where $\sigma$ is the activation function, $o_i^t$ is the $i$-th dimension output of the temporal filter $F_t$, $x_i$ is the $i$-th dimension knowledge representation embedding vector of the representation matrix of the sub-sequence, $*$ is the convolution function, and $b_t$ is the bias vector of the $t$-th temporal filter;

concatenating the vectors output by the $q$ temporal filters, pooling with a stride of 1, and reshaping to obtain a vector whose dimension is determined by $L$, where $L$ is the sub-sequence length.

5. The system of claim 4, wherein the fully connected layer is composed of four linear layers of respective preset dimensions and two activation functions applied in a preset feedforward order, $M$ being the length of the patient longitudinal sequence; the output vector of the initialized patient sequence representation learning model is reshaped into a matrix, denoted $\hat{X}$, and the KL divergence loss is used, i.e. the loss function is $\mathcal{L}_{seq} = \sum_{i} \mathrm{KL}\bigl(\mathrm{Softmax}(X_{:,i}) \,\|\, \mathrm{Softmax}(\hat{X}_{:,i})\bigr)$, where $\hat{X}_{:,i}$ is the $i$-th column of $\hat{X}$, $X_{:,i}$ is the $i$-th column of the representation matrix of the sub-sequence, and $\mathrm{Softmax}$ is the Softmax function; the output vector of the designated linear layer of the trained patient sequence representation learning model is extracted as the coded representation of the patient sub-sequence, and the coded representations of all sub-sequences of the patient are averaged component by component as the patient sequence representation.

6. The system for fusing knowledge and patient representation diagnosis recommendation of claim 1, wherein the patient diagnosis information base is constructed as follows: the set of diseases supported by the diagnosis recommendation system is denoted D; a list P of patients diagnosed with a disease in the set D is screened out from the electronic medical record data of hospital patients; and the patient sequence representations of the patients in the list P are calculated to construct the patient diagnosis information base.