CN112364174A - Patient medical record similarity evaluation method and system based on knowledge graph - Google Patents

Publication number
CN112364174A
Authority
CN
China
Prior art keywords
knowledge
entity
vector
similarity
entities
Legal status
Pending
Application number
CN202011131273.1A
Other languages
Chinese (zh)
Inventor
郭伟
宋贤
鹿旭东
孔兰菊
崔立真
Current Assignee
Shandong University
Original Assignee
Shandong University
Application filed by Shandong University
Priority to CN202011131273.1A
Publication of CN112364174A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
                            • G06F16/367 Ontology
                • G06F40/00 Handling natural language data
                    • G06F40/20 Natural language analysis
                        • G06F40/279 Recognition of textual entities
                            • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
                                • G06F40/295 Named entity recognition
                    • G06F40/30 Semantic analysis
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/044 Recurrent networks, e.g. Hopfield networks
                            • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
                        • G06N3/08 Learning methods
                            • G06N3/084 Backpropagation, e.g. using gradient descent
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
                • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
                    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Abstract

The invention provides a patient medical record similarity evaluation method and system based on a knowledge graph. The text data of patient medical records is acquired and preprocessed; entity recognition and entity relation extraction are performed on the preprocessed data using a joint information extraction model based on weakly supervised learning, and the resulting medical entity types are represented as knowledge vectors; triples in the knowledge graph are constructed from the extracted entity relations and represented as joint knowledge vectors; and the knowledge vectors are merged into the same semantic space by means of the joint vectors, so that similarity is determined by calculating the semantic distance between entities. By repeatedly applying a bidirectional recurrent neural network, the invention mines and extracts important knowledge from electronic medical records, extends the knowledge graph concept from its original subject domain to the medical domain, and performs entity recognition and relation extraction even though the available types of medical entities are relatively limited, thereby improving the accuracy of similarity evaluation.

Description

Patient medical record similarity evaluation method and system based on knowledge graph
Technical Field
The invention relates to the technical field of text data processing, in particular to a patient medical record similarity evaluation method and system based on a knowledge graph.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
An Electronic Medical Record (EMR) is the digital information, such as text, charts, images and data, generated by medical staff through a medical institution's information system while treating a patient; it supports storing, managing, transmitting and reproducing medical records. With the continuing development of smart healthcare and informatization, electronic medical records have gradually replaced paper records and become the main carrier of personal medical and health information. They record patients' medical activities in more detail, contain a large amount of information about patients' health conditions, and are highly valuable for diagnosis, case analysis, prognosis and other medical tasks. Given the mass of case data and the diversity of its information types, checking unstructured electronic medical records for similarity and duplication helps detect illegally copied record content and can effectively control the quality of hospital medical records. Methods for extracting information from unstructured medical record text and structuring it fall into two categories. The first is rule-based: regular expressions that satisfy certain grammatical rules are designed and matched against the medical record to find the corresponding information types, such as diseases, symptoms, treatments and examinations. The second is based on natural language processing: it relies mainly on techniques such as word segmentation, information extraction and syntactic analysis to extract entity words from unstructured text, and then matches these words against a medical terminology dictionary to obtain the medical vocabulary in the history text for entity recognition.
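The two extraction approaches described above can be sketched in a few lines of Python. The regular expressions, the term dictionary and the English sample sentence are hypothetical illustrations, not rules taken from the patent, and the dictionary step replaces proper word segmentation with a simple substring lookup.

```python
import re

# Hypothetical rule set: each regular expression captures one information type.
RULES = {
    "disease": re.compile(r"diagnosed with ([a-z ]+?)(?:[,.;]|$)"),
    "symptom": re.compile(r"complains of ([a-z ]+?)(?:[,.;]|$)"),
}

# Hypothetical medical terminology dictionary: term -> information type.
TERM_DICTIONARY = {"hypertension": "disease", "headache": "symptom", "ct scan": "examination"}


def extract_by_rules(text):
    """Rule-based method: match every regular expression against the record."""
    found = []
    for info_type, pattern in RULES.items():
        for match in pattern.finditer(text.lower()):
            found.append((info_type, match.group(1).strip()))
    return found


def extract_by_dictionary(text):
    """Dictionary method (simplified): keep dictionary terms that occur in the record."""
    lowered = text.lower()
    return [(info_type, term) for term, info_type in TERM_DICTIONARY.items() if term in lowered]


record = "Patient complains of headache, diagnosed with hypertension. CT scan ordered."
print(extract_by_rules(record))       # [('disease', 'hypertension'), ('symptom', 'headache')]
print(extract_by_dictionary(record))
```

Rules are precise but must be written by hand for every pattern; the dictionary approach generalizes across phrasings but only finds terms that the dictionary already contains.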
Information extraction is an important step in constructing a knowledge graph. It is generally divided into two subtasks, entity recognition and relation extraction, and both pipelined methods and joint learning methods are widely used to extract entities and their relations. The pipelined method treats information extraction as two independent tasks: Named Entity Recognition (NER) and Relation Classification (RC). In addition, a knowledge graph presents the key points of the knowledge contained in information, and their internal relations, in a visual form; it is a cornerstone of smart healthcare and is important for clinical decision support, personalized medical services and similar applications.
The inventors found that knowledge graphs are not yet widely applied in medicine, mainly because of difficulties in two areas: extracting information from unstructured text, and constructing the knowledge graph. Performing text information extraction as two separate subtasks produces non-negligible error propagation, and conventional supervised learning is time-consuming and labor-intensive, so the triples produced by information extraction may be missing, inaccurate or incoherent.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a patient medical record similarity evaluation method and system based on a knowledge graph. A deep learning method is used to extract information from electronic medical records: a bidirectional recurrent neural network is applied repeatedly to mine and extract important knowledge from the records, the knowledge graph concept is extended from its original subject domain to the medical domain, and entity recognition and relation extraction are performed even though the available types of medical entities are relatively limited, thereby improving the accuracy of similarity evaluation.
To achieve the above purpose, the invention adopts the following technical solutions:
A first aspect of the present invention provides a patient medical record similarity evaluation method based on a knowledge graph.
A patient medical record similarity evaluation method based on a knowledge graph comprises the following steps:
acquiring and preprocessing the text data of the patient medical record;
performing entity recognition and entity relation extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and representing the obtained medical entity types as knowledge vectors;
constructing triples in the knowledge graph from the obtained entity relations, and representing the triples as joint knowledge vectors;
and merging the knowledge vectors into the same semantic space by means of the joint vectors, and determining the similarity of the medical records by calculating the semantic distance between disease entities.
A second aspect of the present invention provides a patient medical record similarity evaluation system based on a knowledge graph.
A system for evaluating similarity of medical records of patients based on a knowledge graph comprises:
a data acquisition module configured to: acquire and preprocess the text data of the patient medical record;
an information extraction module configured to: perform entity recognition and entity relation extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and represent the obtained medical entity types as knowledge vectors;
a triple construction module configured to: construct triples in the knowledge graph from the obtained entity relations, and represent the triples as joint knowledge vectors;
a similarity discrimination module configured to: merge the knowledge vectors into the same semantic space by means of the joint vectors, and determine similarity by calculating the semantic distance between entities.
A third aspect of the present invention provides a computer-readable storage medium, on which a program is stored, which, when being executed by a processor, implements the steps of the method for similarity evaluation of patient medical records based on a knowledge-graph according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for evaluating the similarity of patient medical records based on a knowledge graph according to the first aspect of the present invention.
Compared with the prior art, the invention has the beneficial effects that:
1. The method, system, medium and electronic device of the invention use a deep learning method to extract information from electronic medical records, repeatedly apply a bidirectional recurrent neural network to mine and extract important knowledge from the records, extend the knowledge graph concept from its original subject domain to the medical domain, and perform entity recognition and relation extraction even though the available types of medical entities are relatively limited, improving the accuracy of similarity evaluation.
2. The method, system, medium and electronic device of the invention preprocess unstructured medical record data and calculate medical record similarity using deep semantic information; compared with traditional exact field-matching methods, recognition accuracy is greatly improved.
3. In the method, system, medium and electronic device of the invention, an attention mechanism is added to a BI-LSTM-CRF network to build a joint knowledge extraction network that extracts relations while recognizing entities. This allows the triples in the knowledge graph to be constructed quickly and avoids the error propagation between subtasks that arises when information extraction is split into two subtasks.
4. In the method, system, medium and electronic device of the invention, the joint knowledge extraction network based on weakly supervised learning extracts useful information from the data while reducing the laborious work of manually labeling data sets, and offers better scalability.
Drawings
The accompanying drawings, which constitute a part of this specification, are included to provide a further understanding of the invention; the exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.
Fig. 1 is a schematic flow chart of a method for evaluating similarity of medical records of a patient based on a knowledge graph according to embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of a BI-LSTM network structure provided in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of an LSTM-CRF network structure provided in embodiment 1 of the present invention.
FIG. 4 is a schematic diagram of a BI-LSTM-CRF network according to embodiment 1 of the present invention.
Fig. 5 is a schematic diagram of an Att-BiLSTM network structure provided in embodiment 1 of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example 1:
as shown in fig. 1, embodiment 1 of the present invention provides a method for evaluating similarity of medical records of patients based on a knowledge graph, including the following steps:
firstly, entity recognition is performed with a bidirectional LSTM network with a conditional random field layer (BI-LSTM-CRF), and the obtained medical entity types are represented as knowledge vectors;
then an attention mechanism (Att-BiLSTM) is added to the BI-LSTM-CRF network to construct a Joint Knowledge Extraction Network (JKENet), which extracts relations while recognizing entities and represents them as joint knowledge vectors; JKENet is used to quickly construct the triples in the knowledge graph;
and finally, the knowledge vectors are merged into the same semantic space by means of the joint vectors, and similarity is determined by calculating the semantic distance between entities.
The details are as follows:
in this embodiment, weakly supervised learning reduces the laborious work of manually labeling data sets while still extracting useful information from the data, and offers better scalability than methods such as supervised learning.
Medical entity recognition is performed with a bidirectional LSTM network with a conditional random field layer (BI-LSTM-CRF), and the obtained medical entity types are represented as knowledge vectors.
An attention mechanism is introduced into the BI-LSTM-CRF network: attention probabilities are calculated to highlight the importance of key words in the medical record, and an Att-BiLSTM network model is used to extract the evolution relations. The resulting joint knowledge extraction network extracts relations while recognizing entities and is used to construct the triples in the knowledge graph.
The knowledge vectors are merged into the same semantic space by means of the joint vectors: the triples of multiple knowledge graphs are trained together, and the spaces occupied by triples with similar relations are constrained. Joint knowledge embedding is achieved through TransE with parameter sharing and soft alignment, and similarity is determined by calculating the semantic distance between entities.
(1) Bidirectional LSTM network based on conditional random field layer (BI-LSTM-CRF)
A bidirectional LSTM network can effectively use past features (via forward states) and future features (via backward states) within a given time frame. The BI-LSTM network is trained by backpropagation through time (BPTT); its structure is shown in FIG. 2.
Forward and backward passes over the unfolded network work as in a regular network, except that the hidden states must be unfolded over all time steps and the beginning and end of each data point require special handling. When a whole sentence is scanned forward and backward, the hidden states only need to be reset to 0 at the start of the sentence, and a batched implementation can process multiple sentences at once. The private information recognition model is built by extracting a number of basic features and special features from the electronic medical record.
The Conditional Random Field (CRF) model focuses on the sentence level rather than on individual positions. In a CRF the inputs and outputs are connected directly, whereas in LSTM and BI-LSTM networks they are connected through memory cells and recurrent components. The LSTM network and the CRF network are combined into an LSTM-CRF model, as shown in FIG. 3.
Through the LSTM layer, the model can effectively use past input features; through the CRF layer, it can effectively use sentence-level tag information. The CRF layer is represented by the lines connecting consecutive output layers, and its parameter is a state transition matrix. With such a layer, past and future tags can be used effectively to predict the current tag, much as a bidirectional LSTM network uses past and future input features.
A bidirectional LSTM network and a CRF network are then combined into a BI-LSTM-CRF network, whose structure is shown in FIG. 4. In addition to using past input features and sentence-level tag information as the LSTM-CRF model does, the BI-LSTM-CRF model can also use future input features, which may improve tagging accuracy.
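How the state transition matrix lets past and future tags influence the current tag can be illustrated with Viterbi decoding for a linear-chain CRF. This is a minimal NumPy sketch under stated assumptions: the per-position emission scores are taken as given (for example from an LSTM layer), the tag set O/B/I in the usage example is hypothetical, and training of the CRF (the forward algorithm and its loss) is not shown.

```python
import numpy as np


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence and its score for a linear-chain CRF.

    emissions:   (T, K) array, score of tag k at position t (e.g. from an LSTM layer).
    transitions: (K, K) array, transitions[i, j] = score of moving from tag i to tag j.
    """
    num_steps, num_tags = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag at t = 0
    backpointers = np.zeros((num_steps, num_tags), dtype=int)
    for t in range(1, num_steps):
        # candidate[i, j]: best path ending in tag i at t-1, then moving to tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Follow the back-pointers from the best final tag.
    best_tag = int(score.argmax())
    path = [best_tag]
    for t in range(num_steps - 1, 0, -1):
        best_tag = int(backpointers[t, best_tag])
        path.append(best_tag)
    return path[::-1], float(score.max())


# Hypothetical tags: 0 = O, 1 = B, 2 = I. Penalise the invalid transition O -> I.
emissions = np.array([[2.0, 1.0, 0.0], [0.0, 0.0, 1.5]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -10.0
print(viterbi_decode(emissions, transitions))  # ([1, 2], 2.5)
```

Without the transition penalty the locally best tags would give the invalid sequence O, I; the sentence-level transition scores switch the first tag to B.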
A Bi-LSTM (bidirectional long short-term memory) network takes the generated vectors as input and outputs a prediction vector for the target word. Each iteration involves a vector layer, a forward LSTM layer, a backward LSTM layer and a concatenation layer, and the output vector is updated from the outputs of the forward and backward LSTM layers. Given the training set, the forward LSTM considers the context before the target word, i.e. the context from $\omega_1$ to $\omega_t$, and obtains a vector $c_t$ for the target word, calculated as follows:
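$$i_t = \delta\left(W_{\omega i}\,\omega_t + W_{hi}\,h_{t-1} + W_{ci}\,c_{t-1} + b_i\right) \qquad (1)$$
$$f_t = \delta\left(W_{\omega f}\,\omega_t + W_{hf}\,h_{t-1} + W_{cf}\,c_{t-1} + b_f\right) \qquad (2)$$
$$c_t = f_t\,c_{t-1} + i_t \tanh\left(W_{\omega c}\,\omega_t + W_{hc}\,h_{t-1} + b_c\right) \qquad (3)$$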
In formula (1), $W = \{\omega_1, \ldots, \omega_t, \omega_{t+1}, \ldots, \omega_n\}$ denotes a word sequence, $\omega_t \in \mathbb{R}^d$ is the d-dimensional vector representation of the t-th word in a sentence, n is the number of words in the sentence, $h_{t-1}$ is the previous hidden vector of the memory module in the Bi-LSTM, and $c_{t-1}$ is the previous cell vector of the memory module. Meanwhile, the backward LSTM processes the target word while considering the context after it, i.e. from $\omega_{t+1}$ to $\omega_n$, and obtains another vector $o_t$, calculated as in formula (4):
$$o_t = \delta\left(W_{\omega o}\,\omega_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right) \qquad (4)$$
Finally, the two simultaneously generated vectors $c_t$ and $o_t$ are fed into the concatenation layer, and the vector $h_t$ of the target word is obtained with the hyperbolic tangent function, as in formula (5):
$$h_t = o_t \tanh(c_t) \qquad (5)$$
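Equations (1) to (5) have the form of a standard LSTM cell with peephole connections: input gate $i_t$, forget gate $f_t$, cell vector $c_t$, output gate $o_t$ and hidden vector $h_t$. The NumPy sketch below reads them that way and runs one such cell forward and one backward over a sentence, concatenating the two hidden states per word as a BiLSTM usually does. The assumptions are: δ is taken to be the logistic sigmoid, the peephole weights $W_{ci}$, $W_{cf}$, $W_{co}$ are diagonal (stored as vectors), and the parameter names and random initialisation are hypothetical. It is an illustration of the equations, not the patent's implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(words, params, hidden_size):
    """Run one LSTM direction over a sentence using equations (1)-(5).

    words: (n, d) array of word vectors; returns an (n, hidden_size) array of h_t.
    Peephole weights are treated as diagonal, i.e. applied element-wise.
    """
    h = np.zeros(hidden_size)  # hidden state reset to 0 at the start of the sentence
    c = np.zeros(hidden_size)
    outputs = []
    for w in words:
        i = sigmoid(params["W_wi"] @ w + params["W_hi"] @ h + params["w_ci"] * c + params["b_i"])  # (1)
        f = sigmoid(params["W_wf"] @ w + params["W_hf"] @ h + params["w_cf"] * c + params["b_f"])  # (2)
        c = f * c + i * np.tanh(params["W_wc"] @ w + params["W_hc"] @ h + params["b_c"])           # (3)
        o = sigmoid(params["W_wo"] @ w + params["W_ho"] @ h + params["w_co"] * c + params["b_o"])  # (4), uses new c_t
        h = o * np.tanh(c)                                                                         # (5)
        outputs.append(h)
    return np.stack(outputs)


def init_params(d, hidden_size, rng):
    """Hypothetical random initialisation of one direction's parameters."""
    params = {}
    for gate in "ifco":
        params[f"W_w{gate}"] = rng.normal(scale=0.1, size=(hidden_size, d))
        params[f"W_h{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        params[f"b_{gate}"] = np.zeros(hidden_size)
    for gate in "ifo":
        params[f"w_c{gate}"] = rng.normal(scale=0.1, size=hidden_size)
    return params


def bilstm(words, fwd, bwd, hidden_size):
    """Concatenate forward and backward hidden states for every word."""
    forward = lstm_pass(words, fwd, hidden_size)
    backward = lstm_pass(words[::-1], bwd, hidden_size)[::-1]  # re-align to word order
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
words = rng.normal(size=(5, 4))  # 5 words, 4-dimensional word vectors
fwd, bwd = init_params(4, 3, rng), init_params(4, 3, rng)
print(bilstm(words, fwd, bwd, 3).shape)  # (5, 6)
```

In practice a framework layer such as PyTorch's `torch.nn.LSTM(..., bidirectional=True)` would be used instead; note that it does not include peephole terms.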
(2) Bidirectional LSTM network based on an attention mechanism (Att-BiLSTM)
Once entities have been recognized, relation extraction is needed to complete the information extraction task. However, splitting information extraction into two subtasks causes error propagation between them, and training the relation extraction model in a supervised manner makes data labeling time-consuming and labor-intensive, consuming most of the manual effort. To address these problems, joint information extraction based on weakly supervised learning is adopted, drawing on distant supervision methods, with the aim of building a weakly supervised joint information extraction approach that preserves extraction quality, minimizes manually labeled data sets, and offers good extraction speed and scalability.
An attention mechanism is introduced into the bidirectional LSTM network to construct an Att-BiLSTM network for text classification problems, which addresses the difficulty CNN models have in learning long-distance semantic information. The Att-BiLSTM network consists of five parts:
input layer: the input sentence; for Chinese text, the words obtained by segmenting the sentence;
embedding layer: maps each word in the sentence to a fixed-length vector;
LSTM layer: a bidirectional LSTM processes the embedding vectors, obtaining higher-level sentence features from the word vectors;
attention layer: applies attention weighting to the outputs of the bidirectional LSTM;
output layer: outputs the specific result.
In a plain Bi-LSTM, the output vector of the last time step is used as the feature vector, followed by softmax classification. With attention, a weight is calculated for each time step, the weighted sum of the vectors of all time steps is used as the feature vector, and softmax classification follows. In experiments, adding attention improved the results. The model structure is shown in FIG. 5.
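The difference between using the last time step and using an attention-weighted sum can be shown with a small NumPy sketch. The scoring form $\alpha = \mathrm{softmax}(w^\top \tanh(H))$ with a trainable vector w is one common choice and is an assumption here; the patent text does not specify it, and the parameter names are hypothetical.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def last_step_features(H):
    """Plain Bi-LSTM: use the output of the last time step as the sentence feature."""
    return H[-1]


def attention_features(H, w):
    """Attention: weight every time step, then take the weighted sum of all outputs.

    H: (T, k) Bi-LSTM outputs; w: (k,) attention vector (a trainable parameter).
    """
    alpha = softmax(np.tanh(H) @ w)  # one weight per time step, summing to 1
    return alpha @ H, alpha          # weighted sum over time steps


def classify(feature, W_out, b_out):
    """Softmax classification layer on top of the sentence feature vector."""
    return softmax(W_out @ feature + b_out)


H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy outputs for 3 time steps
feature, alpha = attention_features(H, np.zeros(2))  # zero w gives uniform weights
print(alpha)  # [0.333... 0.333... 0.333...]
```

With a learned w, time steps carrying key words receive larger weights, so they contribute more to the feature vector than they would through the last hidden state alone.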
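The encoding layer uses a bidirectional RNN, and the output of its last hidden layer is the concatenation of the forward and backward vectors:
$$h_j = \left[\overrightarrow{h}_j ; \overleftarrow{h}_j\right]$$
and the output of the attention layer is
$$c_i = \sum_{j=1}^{T} \alpha_{ij}\, h_j, \qquad \alpha_{ij} = \frac{\exp\left(e_{ij}\right)}{\sum_{k=1}^{T} \exp\left(e_{ik}\right)}, \qquad e_{ij} = a\left(s_{i-1}, h_j\right)$$
where T is the length of the input sequence and $a(\cdot)$ is the attention scoring function.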
In the above formulas, $h_j$ is the hidden-layer output of the encoding layer at time j, and $s_{i-1}$ is the hidden-layer output of the decoding layer at time i-1. Computing $c_i$ is therefore a linear operation: $c_i$ is a weighted average of the encoding layer's hidden-layer outputs over all time steps.
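The weighted-average computation of $c_i$ from the encoder outputs $h_j$ and the decoder state $s_{i-1}$ can be sketched as follows. The additive scoring function $e_{ij} = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)$ and the parameter names W_a, U_a, v_a are assumptions for illustration; the patent does not state which scoring function it uses.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def context_vector(H, s_prev, W_a, U_a, v_a):
    """Compute c_i as a weighted average of the encoder outputs h_j.

    H:      (T, k) encoder hidden-layer outputs h_1..h_T
    s_prev: (m,)   decoder hidden-layer output s_{i-1}
    W_a: (a, m), U_a: (a, k), v_a: (a,) parameters of an assumed additive score
    """
    scores = np.tanh(H @ U_a.T + W_a @ s_prev) @ v_a  # e_ij for every j, shape (T,)
    alpha = softmax(scores)                           # alpha_ij, sums to 1 over j
    return alpha @ H, alpha                           # c_i = sum_j alpha_ij * h_j


rng = np.random.default_rng(1)
H = rng.normal(size=(4, 3))
s_prev = rng.normal(size=2)
W_a, U_a = rng.normal(size=(5, 2)), rng.normal(size=(5, 3))
c, alpha = context_vector(H, s_prev, W_a, U_a, rng.normal(size=5))
```

Because the weights are non-negative and sum to 1, $c_i$ always lies in the convex hull of the encoder outputs; only the weights, not the combination itself, depend non-linearly on the inputs.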
The encoding-layer vectors are used as input and sequence labels are generated as output. The final prediction vector $h_t$ is multiplied by the forward LSTM prediction vector and by the position index of the word, updated and concatenated, and passed through a hyperbolic tangent to obtain a prediction vector; this vector is multiplied by its position vector and its bias is added, giving the prediction label vector $T_t$ as output. The resulting semantic vector is fed into a Softmax layer for similarity calculation; the resulting entity label probability is added to the probability value from the TransE link similarity calculation and normalized, and the entity label probability is output, calculated as follows:
$$y_t = W_y\, T_t$$
$$p_t^{i} = \frac{\exp\left(y_t^{i}\right)}{\sum_{j=1}^{N_t} \exp\left(y_t^{j}\right)}$$
where $W_y$ is the matrix of the Softmax layer, $N_t$ is the number of labels, $T_t$ is the prediction label vector, $y_t$ gives the entity relation label scores, and $p_t^{i}$ is the normalized probability of label i.
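The text describes projecting the prediction label vector $T_t$ with the Softmax-layer matrix $W_y$, normalising over the $N_t$ labels, and then adding a TransE-based probability and normalising again. The sketch below follows that description literally under stated assumptions: the bias term is omitted, and the TransE-derived probability vector `p_transe` is a hypothetical input, since the text does not say how it is computed.

```python
import numpy as np


def label_probabilities(T_t, W_y):
    """Softmax layer: y_t = W_y T_t, then normalise over the N_t labels."""
    y_t = W_y @ T_t
    e = np.exp(y_t - y_t.max())  # subtract the max for numerical stability
    return e / e.sum()


def fuse_with_transe(p_label, p_transe):
    """Hypothetical fusion: add a TransE-derived probability per label and renormalise."""
    combined = p_label + p_transe
    return combined / combined.sum()


p = label_probabilities(np.zeros(3), np.eye(3))  # equal scores -> uniform probabilities
q = fuse_with_transe(p, np.array([1.0, 0.0, 0.0]))
print(q)  # approximately [0.667 0.167 0.167]
```

Renormalising after the addition keeps the output a valid probability distribution, whatever scale the TransE-based values have.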
(3) Joint knowledge extraction network
The semantic distance calculation depends on the joint vector generation model and can use any distance between vectors, such as the Euclidean distance. The entities and relations in the original data are retrained with a knowledge representation learning method and converted into embedding vectors. The retraining uses the TransE representation learning model, which takes the training set, randomly initialized in vector form, as input and outputs vectors for the entity set and the predefined relation set in the training set. Given an entity set, a relation set and a training set, negative samples are constructed from the training set by randomly replacing the head or tail entity; the distance between entities and relations is calculated for both correct triples and negative samples, the error between them is adjusted, and the entities and relations are expressed as vectors that conform to the real relations. The TransE loss function is:
$$L = \sum_{(h,r,t) \in \Delta} \; \sum_{(h',r',t') \in \Delta'} \left[\gamma + f(h,r,t) - f(h',r',t')\right]_+ \qquad (10)$$
In equation (10), the TransE loss sums, over positive and negative samples, the margin hyperparameter plus the difference between the positive-sample distance and the negative-sample distance, where γ is the margin hyperparameter, f(h, r, t) is the distance of a positive sample, f(h', r', t') is the distance of a negative sample, Δ is the positive sample set, Δ' is the negative sample set, and $[x]_+ = \max(0, x)$.
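Negative sampling and the margin loss can be sketched in NumPy. The assumptions: f is the L1 distance $\lVert h + r - t \rVert_1$ (L2 is also common), triples are stored as integer (head, relation, tail) index rows, and, as is usual in practice, each positive triple is paired with its own corrupted copy rather than summing over every pair in Δ × Δ'. Gradient updates are not shown.

```python
import numpy as np


def transe_distance(h, r, t):
    """f(h, r, t) = ||h + r - t||_1 for each row."""
    return np.abs(h + r - t).sum(axis=-1)


def corrupt(triples, num_entities, rng):
    """Build negative samples by randomly replacing the head or the tail entity.

    May occasionally reproduce a true triple; real implementations often filter these.
    """
    negatives = triples.copy()
    replace_head = rng.random(len(triples)) < 0.5
    random_entities = rng.integers(0, num_entities, size=len(triples))
    negatives[replace_head, 0] = random_entities[replace_head]
    negatives[~replace_head, 2] = random_entities[~replace_head]
    return negatives


def transe_loss(E, R, positives, negatives, gamma=1.0):
    """Margin loss: sum of [gamma + f(h, r, t) - f(h', r', t')]_+ over sample pairs."""
    pos = transe_distance(E[positives[:, 0]], R[positives[:, 1]], E[positives[:, 2]])
    neg = transe_distance(E[negatives[:, 0]], R[negatives[:, 1]], E[negatives[:, 2]])
    return np.maximum(0.0, gamma + pos - neg).sum()


E = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])  # toy entity vectors
R = np.array([[1.0, 0.0]])                          # toy relation vector
positives = np.array([[0, 0, 1]])                   # e0 + r = e1 exactly
print(transe_loss(E, R, positives, np.array([[0, 0, 0]]), gamma=2.0))  # 1.0
```

The loss is zero once every negative triple is at least γ farther from satisfying h + r ≈ t than its positive counterpart.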
Joint knowledge embeddings are learned from the triples in the knowledge graphs and from aligned entities. TransE and its extension PTransE are used to learn separate knowledge vectors for each of the two knowledge bases, and these knowledge vectors are merged into the same semantic space through joint vectors obtained from the aligned entities. In the merged semantic space, entities are aligned according to the semantic distance between them. How the semantic distance is calculated depends on the joint vector generation model; using an energy function, it is given by formula (11):
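$$E(e_1, e_2) = \left\lVert \mathbf{e}_1 - \mathbf{e}_2 \right\rVert \qquad (11)$$
where $\mathbf{e}_1$ and $\mathbf{e}_2$ are the vectors of two entities in the joint semantic space.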
If the value of the energy function is below a threshold, the two entities are considered similar. The newly aligned entity pairs are then used to update the joint vectors and to find new entity pairs; the iterative learning of joint vectors and entity alignment uses two strategies, hard alignment and soft alignment.
Through the weakly supervised joint information extraction algorithm, the method avoids error propagation between subtasks, extracts valuable triple information from text without large amounts of time and labor for data labeling, and addresses the non-uniform and non-standard data formats of existing medical record information management. The knowledge base is trained with an algorithm combining TransE and representation learning to generate a semantic vector space that better matches the real world, enabling knowledge-graph-based similarity calculation for patient medical records.
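One round of threshold-based alignment followed by a hard-alignment update can be sketched as follows. Assumptions: the energy is the L1 distance between entity vectors, each entity of the first graph is only compared with its nearest neighbour in the second graph, and "hard alignment" is illustrated by making each newly aligned pair share a single vector (their mean). The soft-alignment strategy and the re-training between rounds are not shown.

```python
import numpy as np


def energy(e1, e2):
    """Energy of a candidate entity pair in the joint space: ||e1 - e2||_1 (assumed form)."""
    return np.abs(e1 - e2).sum(axis=-1)


def align_entities(E1, E2, threshold):
    """Return (i, j) pairs whose nearest-neighbour energy is below the threshold."""
    dist = energy(E1[:, None, :], E2[None, :, :])  # (n1, n2) energy matrix
    pairs = []
    for i in range(len(E1)):
        j = int(dist[i].argmin())
        if dist[i, j] < threshold:
            pairs.append((i, j))
    return pairs


def hard_align(E1, E2, pairs):
    """Hard alignment sketch: each newly aligned pair shares one vector (their mean)."""
    E1, E2 = E1.copy(), E2.copy()
    for i, j in pairs:
        shared = (E1[i] + E2[j]) / 2.0
        E1[i] = E2[j] = shared
    return E1, E2


E1 = np.array([[0.0, 0.0], [10.0, 10.0]])  # toy vectors from knowledge base 1
E2 = np.array([[0.1, 0.0], [50.0, 50.0]])  # toy vectors from knowledge base 2
pairs = align_entities(E1, E2, threshold=0.5)
print(pairs)  # [(0, 0)]
A, B = hard_align(E1, E2, pairs)
```

Repeating these two steps, with re-training in between, gradually grows the set of aligned entity pairs from which the joint vectors are learned.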
Example 2:
the embodiment 2 of the invention provides a patient medical record similarity evaluation system based on a knowledge graph, which comprises:
a data acquisition module configured to: acquire and preprocess the text data of the patient medical record;
an information extraction module configured to: perform entity recognition and entity relation extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and represent the obtained medical entity types as knowledge vectors;
a triple construction module configured to: construct triples in the knowledge graph from the obtained entity relations, and represent the triples as joint knowledge vectors;
a similarity discrimination module configured to: merge the knowledge vectors into the same semantic space by means of the joint vectors, and determine similarity by calculating the semantic distance between entities.
The working method of the system is the same as the method for evaluating the similarity of the patient medical record based on the knowledge graph provided in the embodiment 1, and the details are not repeated here.
Example 3:
embodiment 3 of the present invention provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the steps in the method for assessing similarity of medical records of a patient based on a knowledge graph according to embodiment 1 of the present invention, where the steps are as follows:
acquiring and preprocessing the text data of patient medical records;
performing entity identification and entity relationship extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and representing the obtained medical entity types as knowledge vectors;
constructing triples in the knowledge graph from the obtained entity relationships, and representing the triples as joint knowledge vectors;
merging the knowledge vectors into the same semantic space using the joint vectors, and determining the similarity of the medical records by calculating the semantic distance between disease entities.
The detailed steps are the same as in the knowledge-graph-based patient medical record similarity evaluation method provided in Embodiment 1 and are not repeated here.
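The final step compares records through the semantic distance between their disease entities, but the text does not say how per-entity distances are combined into one record score. One hedged possibility, sketched below, is to turn the Euclidean distance d into a similarity 1 / (1 + d) and take a symmetric best-match average over the disease entities of the two records. The aggregation rule, the function names and the toy vectors are all assumptions.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def entity_similarity(u: list[float], v: list[float]) -> float:
    """Map a semantic distance d to a similarity in (0, 1]: 1 / (1 + d)."""
    return 1.0 / (1.0 + euclidean(u, v))


def disease_record_similarity(diseases_a, diseases_b, emb) -> float:
    """Symmetric best-match average over the disease entities of two records."""
    if not diseases_a or not diseases_b:
        return 0.0

    def best_match(src, dst):
        # For each disease in src, keep the similarity of its closest counterpart in dst.
        return sum(max(entity_similarity(emb[x], emb[y]) for y in dst) for x in src) / len(src)

    return 0.5 * (best_match(diseases_a, diseases_b) + best_match(diseases_b, diseases_a))


# Hypothetical disease embeddings in the merged semantic space.
emb = {"hypertension": [0.0, 0.0], "essential hypertension": [0.0, 1.0], "asthma": [3.0, 4.0]}
print(disease_record_similarity(["hypertension"], ["essential hypertension"], emb))  # 0.5
print(disease_record_similarity(["hypertension"], ["asthma"], emb))  # 1/6, about 0.1667
```

Averaging in both directions keeps the score symmetric, so a record with many extra diagnoses is not judged identical to a record that contains only one of them.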
Example 4:
Embodiment 4 of the present invention provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the knowledge-graph-based patient medical record similarity evaluation method according to the first aspect of the present invention, namely:
acquiring and preprocessing the text data of patient medical records;
performing entity identification and entity relationship extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and representing the obtained medical entity types as knowledge vectors;
constructing triples in the knowledge graph from the obtained entity relationships, and representing the triples as joint knowledge vectors;
merging the knowledge vectors into the same semantic space using the joint vectors, and determining the similarity of the medical records by calculating the semantic distance between disease entities.
The detailed steps are the same as in the knowledge-graph-based patient medical record similarity evaluation method provided in Embodiment 1 and are not repeated here.
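Claim 2 below specifies a bidirectional LSTM with a conditional random field (CRF) layer for entity identification. At inference time a CRF layer typically selects the highest-scoring tag sequence with Viterbi decoding, after which the tags are grouped into entity spans. The sketch assumes the BiLSTM has already produced per-token emission scores, uses a hypothetical BIO tag set (`B-DIS`, `I-DIS`, `O`) with hand-picked scores, and omits the start and end transitions that full CRF implementations usually include.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag path for one sentence.

    emissions[i][k]: score of tag k at token i (e.g. a BiLSTM output projection)
    transitions[j][k]: score of moving from tag j to tag k (learned by the CRF)
    """
    num_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for i in range(1, len(emissions)):
        new_score, ptrs = [], []
        for k in range(num_tags):
            best_j = max(range(num_tags), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[i][k])
            ptrs.append(best_j)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(num_tags), key=lambda k: score[k])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return [tags[k] for k in reversed(path)]


def bio_to_entities(tokens, labels):
    """Group BIO labels into (entity text, entity type) spans."""
    entities, current, etype = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != etype):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [token], label[2:]
        elif label.startswith("I-"):
            current.append(token)
        else:  # "O" closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities


# Hand-picked scores for illustration; a trained model would supply these.
tags = ["O", "B-DIS", "I-DIS"]
tokens = ["chronic", "gastritis", "confirmed"]
emissions = [[0.5, 1.0, 0.2], [0.1, 0.6, 0.9], [1.0, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0],  # from O (O -> I-DIS is penalised)
               [0.0, -1.0, 1.0],   # from B-DIS
               [0.0, 0.0, 0.5]]    # from I-DIS
labels = viterbi(emissions, transitions, tags)
print(labels, bio_to_entities(tokens, labels))
```

The large negative O to I-DIS transition is what lets the CRF rule out ill-formed tag sequences that independent per-token decisions could produce.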
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A knowledge-graph-based patient medical record similarity evaluation method, characterized by comprising the following steps:
acquiring and preprocessing the text data of patient medical records;
performing entity identification and entity relationship extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and representing the obtained medical entity types as knowledge vectors;
constructing triples in the knowledge graph from the obtained entity relationships, and representing the triples as joint knowledge vectors;
merging the knowledge vectors into the same semantic space using the joint vectors, and determining similarity by calculating the semantic distance between entities.
2. The method according to claim 1, wherein the joint information extraction model uses a bidirectional LSTM network with a conditional random field (CRF) layer for entity identification, and the medical entity types are represented as knowledge vectors.
3. The method according to claim 1, wherein the joint information extraction model uses an attention-based bidirectional LSTM network with a conditional random field layer for entity relationship extraction.
4. The method according to claim 3, wherein the attention-based bidirectional LSTM network with a conditional random field layer comprises an input layer, an Embedding layer, an LSTM layer, an Attention layer and an output layer;
the Embedding layer is configured to map each word in the sentence to a fixed-length vector;
the LSTM layer is configured to process the embedding vectors with a bidirectional LSTM;
the Attention layer is configured to apply attention weighting to the outputs of the bidirectional LSTM.
5. The method according to claim 1, wherein determining the similarity specifically comprises:
merging the knowledge vectors into the same semantic space using the joint vectors, training the triples of the multiple knowledge graphs jointly, and constraining the space in which triples with similar relations are located;
performing joint knowledge embedding with a TransE model using parameter sharing and soft alignment, and determining the similarity by calculating the semantic distance between entities.
6. The method according to claim 5, wherein the entities and relationships in the raw data are retrained by knowledge representation learning and converted into embedding vectors;
the retraining uses a TransE model that takes the training set, randomly initialized in vector form, as input and outputs vectors for the entity set and the predefined relation set of the training set;
given an entity set, a relation set and a training set, negative samples are constructed by randomly replacing the head entity or the tail entity of training triples; the distance of each correct triple and the distance of each negative sample are calculated, the embeddings are adjusted according to the error between the two, and the entities and relations are thereby represented as vectors consistent with the real relations.
7. The method according to claim 5, wherein entities are aligned according to their semantic distance in the merged semantic space, and each newly aligned entity pair is used to update the joint vectors and to find further entity pairs.
8. A knowledge-graph-based patient medical record similarity evaluation system, characterized by comprising:
a data acquisition module configured to acquire and preprocess the text data of patient medical records;
an information extraction module configured to perform entity identification and entity relationship extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and to represent the obtained medical entity types as knowledge vectors;
a triple building module configured to construct triples in the knowledge graph from the obtained entity relationships, and to represent the triples as joint knowledge vectors;
a similarity discrimination module configured to merge the knowledge vectors into the same semantic space using the joint vectors, and to determine similarity by calculating the semantic distance between entities.
9. A computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of the method for similarity assessment of patient medical records based on a knowledge-graph according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for similarity assessment of patient medical records based on a knowledge graph according to any one of claims 1-7 when executing the program.
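Claim 4 lists an Attention layer that weights the outputs of the bidirectional LSTM before the output layer. A minimal pure-Python sketch, assuming the formulation commonly used for attention-based BiLSTM relation classification (score_t = w^T tanh(h_t), alpha = softmax(score), h* = tanh(sum_t alpha_t h_t)) followed by a softmax output layer over relation types. The query vector `w`, the output weights and the relation names are hypothetical, and the patent does not state which attention variant it uses.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(hidden_states, w):
    """Attention over BiLSTM outputs.

    hidden_states: n vectors h_t (forward and backward states concatenated, dim d)
    w: learned query vector of dim d
    Returns the sentence vector h* = tanh(sum_t alpha_t h_t) and the weights alpha.
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in hidden_states]
    alpha = softmax(scores)
    dim = len(hidden_states[0])
    r = [sum(a * h[i] for a, h in zip(alpha, hidden_states)) for i in range(dim)]
    return [math.tanh(x) for x in r], alpha


def output_layer(h_star, weights, biases, relations):
    """Softmax classifier over relation types: p = softmax(W h* + b)."""
    logits = [sum(wi * x for wi, x in zip(row, h_star)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(relations)), key=lambda k: probs[k])
    return relations[best], probs


# Toy BiLSTM outputs for a two-token sentence and hypothetical parameters.
H = [[1.0, 0.0], [0.0, 1.0]]
h_star, alpha = attention_layer(H, w=[10.0, 0.0])
relation, probs = output_layer(h_star, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                               ["has_symptom", "treated_with"])
print(alpha, relation)
```

With the query vector pointing along the first hidden dimension, almost all attention mass falls on the first token, which is the behaviour the Attention layer is meant to learn for the tokens that signal a relation.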

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011131273.1A CN112364174A (en) 2020-10-21 2020-10-21 Patient medical record similarity evaluation method and system based on knowledge graph

Publications (1)

Publication Number Publication Date
CN112364174A 2021-02-12

Family

ID=74511378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011131273.1A Pending CN112364174A (en) 2020-10-21 2020-10-21 Patient medical record similarity evaluation method and system based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112364174A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536754A (en) * 2018-03-14 2018-09-14 四川大学 Electronic health record entity relation extraction method based on BLSTM and attention mechanism
CN109255033A (en) * 2018-11-05 2019-01-22 桂林电子科技大学 A kind of recommended method of the knowledge mapping based on location-based service field
CN109284396A (en) * 2018-09-27 2019-01-29 北京大学深圳研究生院 Medical knowledge map construction method, apparatus, server and storage medium
CN110046252A (en) * 2019-03-29 2019-07-23 北京工业大学 A kind of medical textual hierarchy method based on attention mechanism neural network and knowledge mapping
CN110334219A (en) * 2019-07-12 2019-10-15 电子科技大学 The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN110598001A (en) * 2019-08-05 2019-12-20 平安科技(深圳)有限公司 Method, device and storage medium for extracting association entity relationship
CN110807084A (en) * 2019-05-15 2020-02-18 北京信息科技大学 Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy
CN110825721A (en) * 2019-11-06 2020-02-21 武汉大学 Hypertension knowledge base construction and system integration method under big data environment
CN110826303A (en) * 2019-11-12 2020-02-21 中国石油大学(华东) Joint information extraction method based on weak supervised learning
CN111367986A (en) * 2020-03-12 2020-07-03 北京工商大学 Joint information extraction method based on weak supervised learning

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883736A (en) * 2021-02-22 2021-06-01 零氪科技(北京)有限公司 Medical entity relationship extraction method and device
CN112992317A (en) * 2021-05-10 2021-06-18 明品云(北京)数据科技有限公司 Medical data processing method, system, equipment and medium
CN113539409A (en) * 2021-07-28 2021-10-22 平安科技(深圳)有限公司 Treatment scheme recommendation method, device, equipment and storage medium
CN113539409B (en) * 2021-07-28 2024-04-26 平安科技(深圳)有限公司 Treatment scheme recommendation method, device, equipment and storage medium
CN113436698A (en) * 2021-08-27 2021-09-24 之江实验室 Automatic medical term standardization system and method integrating self-supervision and active learning
CN114036307A (en) * 2021-09-17 2022-02-11 清华大学 Knowledge graph entity alignment method and device
CN114417845A (en) * 2022-03-30 2022-04-29 支付宝(杭州)信息技术有限公司 Identical entity identification method and system based on knowledge graph
CN115080764A (en) * 2022-07-21 2022-09-20 神州医疗科技股份有限公司 Medical similar entity classification method and system based on knowledge graph and clustering algorithm
CN115080764B (en) * 2022-07-21 2022-11-01 神州医疗科技股份有限公司 Medical similar entity classification method and system based on knowledge graph and clustering algorithm
CN115312186A (en) * 2022-08-09 2022-11-08 北京至真互联网技术有限公司 Auxiliary screening system for diabetic retinopathy
CN115312186B (en) * 2022-08-09 2023-06-09 北京至真互联网技术有限公司 Auxiliary screening system for diabetic retinopathy
CN115036034B (en) * 2022-08-11 2022-11-08 之江实验室 Similar patient identification method and system based on patient characterization map
CN115036034A (en) * 2022-08-11 2022-09-09 之江实验室 Similar patient identification method and system based on patient characterization map
CN115658927A (en) * 2022-11-17 2023-01-31 浙江大学 Time sequence knowledge graph-oriented unsupervised entity alignment method and device
CN115658927B (en) * 2022-11-17 2023-04-11 浙江大学 Unsupervised entity alignment method and unsupervised entity alignment device for time sequence knowledge graph
CN115798733A (en) * 2023-01-09 2023-03-14 神州医疗科技股份有限公司 Intelligent auxiliary reasoning system and method for orphan disease
CN116092622A (en) * 2023-04-10 2023-05-09 江苏瀚云医疗信息技术有限公司 Electronic medical record quality control system based on Neo4j atlas and AI algorithm
CN116092622B (en) * 2023-04-10 2023-07-04 江苏瀚云医疗信息技术有限公司 Electronic medical record quality control system based on Neo4j atlas and AI algorithm
CN116434933A (en) * 2023-04-14 2023-07-14 酒泉海容网络科技有限公司 Intelligent nursing remote data processing method and system based on intelligent medical treatment
CN116434933B (en) * 2023-04-14 2023-10-24 湖南提奥医疗科技有限公司 Intelligent nursing remote data processing method and system based on intelligent medical treatment
CN116682553A (en) * 2023-08-02 2023-09-01 浙江大学 Diagnosis recommendation system integrating knowledge and patient representation
CN116682553B (en) * 2023-08-02 2023-11-03 浙江大学 Diagnosis recommendation system integrating knowledge and patient representation

Similar Documents

Publication Publication Date Title
CN112364174A (en) Patient medical record similarity evaluation method and system based on knowledge graph
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN107977361B (en) Chinese clinical medical entity identification method based on deep semantic information representation
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN111160008A (en) Entity relationship joint extraction method and system
CN112818676A (en) Medical entity relationship joint extraction method
Zhang et al. Semi-supervised structured prediction with neural CRF autoencoder
CN107480194B (en) Method and system for constructing multi-mode knowledge representation automatic learning model
KR102361616B1 (en) Method and apparatus for recognizing named entity considering context
CN111460824A (en) Unmarked named entity identification method based on anti-migration learning
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN111582506A (en) Multi-label learning method based on global and local label relation
CN116932722A (en) Cross-modal data fusion-based medical visual question-answering method and system
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114564959A (en) Method and system for identifying fine-grained named entities of Chinese clinical phenotype
Zhao et al. Deeply supervised active learning for finger bones segmentation
CN116822579A (en) Disease classification ICD automatic coding method and device based on contrast learning
CN115238026A (en) Medical text subject segmentation method and device based on deep learning
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN114021584A (en) Knowledge representation learning method based on graph convolution network and translation model
CN112749277A (en) Medical data processing method and device and storage medium
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
CN116680407A (en) Knowledge graph construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination