CN112364174A - Patient medical record similarity evaluation method and system based on knowledge graph

Info

Publication number: CN112364174A
Application number: CN202011131273.1A
Authority: CN
Inventors: 郭伟; 宋贤; 鹿旭东; 孔兰菊; 崔立真
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-02-12

Abstract

The invention provides a patient medical record similarity evaluation method and system based on a knowledge graph. The method acquires and preprocesses patient medical record text data; performs entity recognition and entity relationship extraction on the preprocessed data using a joint information extraction model based on weakly supervised learning, and represents the resulting medical entity types with knowledge vectors; constructs triples in the knowledge graph from the extracted entity relationships and represents them with joint knowledge vectors; and merges the knowledge vectors into the same semantic space using the joint vectors, judging similarity by computing the semantic distance between entities. The invention mines important knowledge from electronic medical records by applying a bidirectional recurrent neural network multiple times, extends the knowledge graph concept from the subject domain to the medical domain, and performs entity recognition and relationship extraction given that the types of medical entities are currently relatively limited, thereby improving the accuracy of similarity evaluation.

Description

Patient medical record similarity evaluation method and system based on knowledge graph

Technical Field

The invention relates to the technical field of text data processing, in particular to a patient medical record similarity evaluation method and system based on a knowledge graph.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

An Electronic Medical Record (EMR) is the digital information, such as text, charts, figures, and data, that medical staff generate with a medical institution's information system while treating a patient; it supports storing, managing, transmitting, and reproducing medical records. With the continuing development of intelligent healthcare and informatization, electronic medical records are gradually replacing paper records as the main carrier of personal medical and health information. They record a patient's medical activities in detail, contain a large amount of information about the patient's health, and are highly valuable for diagnosis, case analysis, prognosis, and similar tasks. Given massive case data with diverse information types, similarity-based duplicate checking and analysis of unstructured electronic medical records helps find improperly copied record content and thus supports quality control of hospital medical records. Structuring unstructured medical record text requires information extraction, for which there are two kinds of methods. The first is rule-based: regular expressions that satisfy certain grammatical rules are designed and matched against the medical record to find the corresponding information types, such as diseases, symptoms, treatments, and examinations. The second is based on natural language processing: it relies mainly on techniques such as word segmentation, information extraction, and syntactic analysis to extract entity words from the unstructured text, and then matches them against a medical terminology dictionary to obtain the medical vocabulary in the record text for entity recognition.

Information extraction is an important step in constructing a knowledge graph and is generally divided into two subtasks, entity recognition and relationship extraction. Pipelined methods and joint learning methods are both widely used to extract entities and their relationships. The pipelined method treats information extraction as two independent tasks, namely Named Entity Recognition (NER) and Relationship Classification (RC). In addition, a knowledge graph presents the key points of knowledge and their internal connections visually; it is a cornerstone of intelligent healthcare and is important for clinical decision support, personalized medical services, and the like.

The inventors found that knowledge graphs are not yet widely applied in medicine, mainly because of two difficulties: extracting information from unstructured text and drawing the knowledge graph. Implementing text information extraction as two subtasks produces non-negligible error propagation, and conventional supervised learning is time-consuming and labor-intensive, so the extracted triples tend to be missing, inaccurate, or disfluent.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a patient medical record similarity evaluation method and system based on a knowledge graph. They use deep learning to extract information from electronic medical records, mine important knowledge from the records by applying a bidirectional recurrent neural network multiple times, extend the knowledge graph concept from the subject domain to the medical domain, and perform entity recognition and relationship extraction given that the types of medical entities are currently relatively limited, thereby improving the accuracy of similarity evaluation.

In order to achieve the purpose, the invention adopts the following technical scheme:

A first aspect of the invention provides a patient medical record similarity evaluation method based on a knowledge graph.

A patient medical record similarity evaluation method based on a knowledge graph comprises the following steps:

acquiring and preprocessing the text data of the patient medical record;

performing entity identification and entity relationship extraction on the preprocessed data by adopting a combined information extraction model based on weak supervised learning, and expressing the obtained medical entity type by using a knowledge vector;

constructing triples in the knowledge graph according to the obtained entity relationship, and expressing the triples by using a joint knowledge vector;

and merging the knowledge vectors into the same semantic space by using the joint vector, and judging the similarity of the medical records by calculating the semantic distance between disease entities.

A second aspect of the invention provides a patient medical record similarity evaluation system based on a knowledge graph.

A system for evaluating similarity of medical records of patients based on a knowledge graph comprises:

a data acquisition module configured to: acquiring and preprocessing the text data of the patient medical record;

an information extraction module configured to: performing entity identification and entity relationship extraction on the preprocessed data by adopting a combined information extraction model based on weak supervised learning, and expressing the obtained medical entity type by using a knowledge vector;

a triplet building module configured to: constructing triples in the knowledge graph according to the obtained entity relationship, and expressing the triples by using a joint knowledge vector;

a similarity discrimination module configured to: and merging the knowledge vectors into the same semantic space by using the joint vector, and judging the similarity by calculating the semantic distance between the entities.

A third aspect of the present invention provides a computer-readable storage medium, on which a program is stored, which, when being executed by a processor, implements the steps of the method for similarity evaluation of patient medical records based on a knowledge-graph according to the first aspect of the present invention.

A fourth aspect of the present invention provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for estimating similarity of medical records of patients based on a knowledge graph according to the first aspect of the present invention.

Compared with the prior art, the invention has the beneficial effects that:

1. The method, system, medium, or electronic device of the invention uses deep learning to extract information from electronic medical records, mines important knowledge from the records by applying a bidirectional recurrent neural network multiple times, extends the knowledge graph concept from the subject domain to the medical domain, and performs entity recognition and relationship extraction given that the types of medical entities are currently relatively limited, improving the accuracy of similarity evaluation.

2. The method, the system, the medium or the electronic equipment of the invention preprocesses the data of the unstructured medical record data and calculates the similarity of the medical record data by utilizing deep semantic information, and compared with the traditional field full matching method, the identification accuracy is greatly improved.

3. According to the method, the system, the medium or the electronic equipment, an attention mechanism is added into a BI-LSTM-CRF network to construct a joint knowledge extraction network, the relation extraction is carried out while the entity is identified, the method is used for quickly constructing the triples in the knowledge graph, and error propagation between subtasks caused by the fact that the information extraction is divided into two subtasks is avoided.

4. The method, the system, the medium or the electronic equipment provided by the invention have the advantages that the complex work of manually marking a data set is reduced while the effective information extraction of the data is carried out by the joint knowledge extraction network based on the weak supervision learning, and the expandability is better.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention, not to limit it.

Fig. 1 is a schematic flow chart of a method for evaluating similarity of medical records of a patient based on a knowledge graph according to embodiment 1 of the present invention.

FIG. 2 is a schematic diagram of a BI-LSTM network structure provided in embodiment 1 of the present invention.

Fig. 3 is a schematic diagram of an LSTM-CRF network structure provided in embodiment 1 of the present invention.

FIG. 4 is a schematic diagram of a BI-LSTM-CRF network according to embodiment 1 of the present invention.

Fig. 5 is a schematic diagram of an Att-BiLSTM network structure provided in embodiment 1 of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.

Example 1:

as shown in fig. 1, embodiment 1 of the present invention provides a method for evaluating similarity of medical records of patients based on a knowledge graph, including the following steps:

firstly, entity recognition is carried out by adopting a bidirectional LSTM network (BI-LSTM-CRF) with a conditional random field layer, and the obtained medical entity type is expressed by using a knowledge vector;

then adding an attention mechanism (Att-BiLSTM) to the BI-LSTM-CRF network to construct a Joint Knowledge Extraction Network (JKENet), which performs relationship extraction while recognizing entities and represents the results with joint knowledge vectors; JKENet is used to quickly construct triples in the knowledge graph;

and finally, merging the knowledge vectors into the same semantic space by using the joint vector, and realizing the discrimination of the similarity by calculating the semantic distance between the entities.

In detail, the following contents are included:

in the embodiment, the complex work of manually marking the data set is reduced through weak supervised learning while effective information extraction is carried out on the data, and the method has better expandability compared with methods such as supervised learning.

Medical entity identification is carried out by adopting a bidirectional LSTM network (BI-LSTM-CRF) with a conditional random field layer, and the type of the medical entity is obtained and expressed by a knowledge vector.

An attention mechanism is introduced into the BI-LSTM-CRF network: attention probabilities are computed to highlight the importance of key words in the medical record, and the Att-BiLSTM network model is used to extract relations. The resulting joint knowledge extraction network performs relationship extraction while recognizing entities and is used to construct the triples in the knowledge graph.

The knowledge vectors are merged into the same semantic space using the joint vectors; the triples of multiple knowledge graphs are trained together, constraining the space in which triples with similar relations lie. Joint knowledge embedding is realized through TransE with parameter sharing and soft alignment, and the semantic distance between entities is computed to judge similarity.
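The final step can be sketched as follows. The text does not specify how entity-level distances are aggregated into a record-level score, so this example assumes a symmetric mean nearest-neighbour Euclidean distance between the entity vectors of two records; `record_distance`, `is_similar`, and the threshold are hypothetical names chosen for illustration only.

```python
import numpy as np

def record_distance(vecs_a, vecs_b):
    """Assumed aggregation: symmetric mean nearest-neighbour Euclidean distance.

    vecs_a, vecs_b: arrays of shape (n_entities, dim) holding the entity
    vectors of two medical records in the shared semantic space.
    """
    # d[i, j] = Euclidean distance between entity i of A and entity j of B
    d = np.linalg.norm(vecs_a[:, None, :] - vecs_b[None, :, :], axis=2)
    return float((d.min(axis=1).mean() + d.min(axis=0).mean()) / 2)

def is_similar(vecs_a, vecs_b, threshold):
    """Two records count as similar when their distance is below the threshold."""
    return record_distance(vecs_a, vecs_b) < threshold
```

A smaller distance means the two records mention entities that lie closer together in the embedding space; any other vector distance could be substituted for the Euclidean norm.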

(1) Bidirectional LSTM network based on conditional random field layer (BI-LSTM-CRF)

A bidirectional LSTM network can effectively use past features (via forward states) and future features (via backward states) within a specified time frame. The BI-LSTM network is trained with backpropagation through time (BPTT); its structure is shown in Fig. 2.

The forward and backward passes over the network unrolled in time are carried out much as in a conventional network, except that the hidden states must be unrolled for all time steps and the beginning and end of the data points need special handling. When a whole sentence is scanned forwards and backwards, the hidden state only needs to be reset to 0 at the beginning of the sentence, and a batch implementation allows multiple sentences to be processed simultaneously. The private information identification model is built by extracting basic features and special features from the electronic medical record.

The Conditional Random Field (CRF) model focuses on the sentence level rather than on individual positions. In a CRF the inputs and outputs are directly connected, whereas in LSTM and BI-LSTM networks they are connected through memory cells and recurrent components. The LSTM network and the CRF network are integrated into an LSTM-CRF model, as shown in Fig. 3.

Through the LSTM layer, the model can effectively use past input features, and through the CRF layer, it can effectively use sentence-level label information. The CRF layer is represented by lines connecting consecutive output layers and has a state transition matrix as its parameter. With such a layer, past and future labels can be used to predict the current label, similar to the way a bidirectional LSTM network uses past and future input features.
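To illustrate how a state transition matrix lets neighbouring labels influence each other, here is a minimal Viterbi decoding sketch. It assumes the network supplies a per-position score for each label (`emissions`) and that `transitions[i, j]` scores moving from label i to label j; these names and the score layout are assumptions, not taken from the source.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence and its score.

    emissions:   (n_steps, n_tags) per-position label scores (assumed input)
    transitions: (n_tags, n_tags) CRF state transition scores
    """
    n_steps, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n_steps, n_tags), dtype=int)
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in tag i at t-1, then tag j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Follow the back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1], float(score.max())
```

With a transition matrix that penalizes label changes, the decoder prefers a consistent label sequence even when individual positions would score slightly higher under a different label.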

Then a BI-directional LSTM network and a CRF network are combined into a BI-LSTM-CRF network, and the network structure is shown in FIG. 4. In addition to being able to utilize past input features and sentence-level tag information like the LSTM-CRF model, the BI-LSTM-CRF model is also able to utilize future input features, an additional function that may improve the accuracy of the annotation.

The Bi-LSTM (bidirectional long short-term memory) network takes the generated vectors as input and produces a prediction vector for the target word as output. It consists mainly of a vector layer, a forward LSTM layer, a backward LSTM layer, and a connection layer, and the output vector is determined by the outputs of the forward and backward LSTM layers. Given the training set, the forward LSTM considers the context before the target word, i.e. the context information from $\omega_1$ to $\omega_t$, and obtains a vector $c_t$ for the target word, computed as follows:

$$i_t = \delta(W_{\omega i}\,\omega_t + W_{hi}\,h_{t-1} + W_{ci}\,c_{t-1} + b_i) \tag{1}$$

$$c_t = f_t\,c_{t-1} + i_t \tanh(W_{\omega c}\,\omega_t + W_{hc}\,h_{t-1} + b_c) \tag{3}$$

where, in formula (1), $W = \{\omega_1, \ldots, \omega_t, \omega_{t+1}, \ldots, \omega_n\}$ denotes the word sequence, $\omega_t \in \mathbb{R}^d$ is the $d$-dimensional vector representation of the $t$-th word in a sentence, $n$ is the number of words in the sentence, $h_{t-1}$ is the previous hidden vector in the memory module of the Bi-LSTM, and $c_{t-1}$ is the previous original vector in the memory module. Meanwhile, the backward LSTM processes the target word and considers the context after it, i.e. from $\omega_{t+1}$ to $\omega_n$, yielding another vector $o_t$, computed as in formula (4):

$$o_t = \delta(W_{\omega o}\,\omega_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o) \tag{4}$$

Finally, the two vectors $c_t$ and $o_t$ are fed into the connection layer, and the vector $h_t$ of the target word is obtained with a hyperbolic tangent function, as in formula (5):

$$h_t = o_t \tanh(c_t) \tag{5}$$
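As a minimal sketch of formulas (1) to (5), the following NumPy code runs one LSTM step and a bidirectional pass over a sentence. It reads the formulas as the standard LSTM gates with peephole connections. The forget gate $f_t$ of formula (2) is not reproduced in the text, so the code assumes the usual form $f_t = \delta(W_{\omega f}\omega_t + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)$. The function names, the dictionary layout of the weights, and the elementwise peephole weights are likewise assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(omega_t, h_prev, c_prev, p):
    """One LSTM step; p maps weight names to arrays (hypothetical layout)."""
    i_t = sigmoid(p["W_wi"] @ omega_t + p["W_hi"] @ h_prev
                  + p["W_ci"] * c_prev + p["b_i"])              # formula (1)
    # Assumed forget gate: formula (2) is missing from the text.
    f_t = sigmoid(p["W_wf"] @ omega_t + p["W_hf"] @ h_prev
                  + p["W_cf"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(
        p["W_wc"] @ omega_t + p["W_hc"] @ h_prev + p["b_c"])    # formula (3)
    o_t = sigmoid(p["W_wo"] @ omega_t + p["W_ho"] @ h_prev
                  + p["W_co"] * c_t + p["b_o"])                 # formula (4)
    h_t = o_t * np.tanh(c_t)                                    # formula (5)
    return h_t, c_t

def init_params(d, k, rng):
    """Random weights for word dimension d and hidden size k."""
    p = {}
    for g in "ifco":
        p[f"W_w{g}"] = rng.normal(scale=0.1, size=(k, d))
        p[f"W_h{g}"] = rng.normal(scale=0.1, size=(k, k))
        p[f"b_{g}"] = np.zeros(k)
    for g in "ifo":  # peephole weights, applied elementwise (assumption)
        p[f"W_c{g}"] = rng.normal(scale=0.1, size=k)
    return p

def bilstm(words, p_fwd, p_bwd, k):
    """Scan the sentence in both directions and concatenate per-word states."""
    def run(seq, p):
        h, c = np.zeros(k), np.zeros(k)  # state reset to 0 at sentence start
        out = []
        for w in seq:
            h, c = lstm_step(w, h, c, p)
            out.append(h)
        return out
    fwd = run(words, p_fwd)
    bwd = run(words[::-1], p_bwd)[::-1]
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
```

For a sentence of n word vectors and hidden size k, `bilstm` returns an array of shape (n, 2k), one concatenated forward and backward vector per word.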

(2) Bidirectional LSTM network based on an attention mechanism (Att-BiLSTM)

After entities are recognized, relationship extraction is needed to complete the information extraction task. However, splitting information extraction into two subtasks causes error propagation between them, and training the relationship extraction model in a supervised manner makes data labeling time-consuming and labor-intensive. To address these problems, this embodiment adopts joint information extraction based on weakly supervised learning, drawing on distant supervision, with the aim of building a joint extraction model that preserves extraction quality, minimizes manually labeled data, and offers good extraction speed and scalability.

An attention mechanism is introduced into the bidirectional LSTM network to build an Att-BiLSTM network for text classification problems, which addresses the fact that CNN models are poorly suited to learning long-distance semantic information. The Att-BiLSTM network consists of five parts:

Input layer: the input sentence; for Chinese, the words obtained by segmenting the sentence;

Embedding layer: maps each word in the sentence to a fixed-length vector;

LSTM layer: processes the embedding vectors with a bidirectional LSTM, which in effect derives a higher-level sentence representation from the word vectors;

Attention layer: applies attention weighting to the outputs of the bidirectional LSTM;

Output layer: outputs the final result.

In a plain Bi-LSTM, the output vector of the last time step is used as the feature vector for softmax classification. With attention, a weight is first computed for each time step, the weighted sum of the vectors of all time steps is used as the feature vector, and softmax classification is then performed. In experiments, adding attention did improve the results; the model structure is shown in Fig. 5.
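The difference between the two feature choices can be sketched as follows. The scoring form `tanh(H) @ w` is one common way to compute attention weights and is assumed here; the source does not state the scoring function, and `w` is a hypothetical learned parameter.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def last_step_feature(H):
    """Plain Bi-LSTM: use the output of the last time step."""
    return H[-1]

def attention_feature(H, w):
    """Attention: weighted sum of the outputs of all time steps.

    H: (n_steps, k) Bi-LSTM outputs; w: (k,) scoring vector (assumption).
    """
    scores = np.tanh(H) @ w   # one score per time step
    alpha = softmax(scores)   # attention weights, summing to 1
    return alpha @ H, alpha
```

Either feature vector would then be passed to the softmax classifier; with attention, every time step contributes in proportion to its weight instead of only the last one.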

The encoding layer uses a bidirectional RNN, and the output of its last hidden layer is the concatenation of two vectors, expressed as

and the output of the Attention layer is

In the above formulas, $h_j$ is the hidden-layer output at time step $j$ of the encoding layer, and $s_{i-1}$ is the hidden-layer output at time step $i-1$ of the decoding layer. The computation of $c_i$ is thus a linear model: $c_i$ is a weighted average of the encoding layer's hidden-layer outputs over all time steps.
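The two formulas referred to above are not reproduced in this text. As a hedged reconstruction, assuming the standard encoder-decoder attention formulation that matches the descriptions of $h_j$, $s_{i-1}$ and $c_i$, they could read as follows. The alignment function $a(\cdot)$ and the weights $\alpha_{ij}$ are assumed names that do not appear in the source.

```latex
\begin{align}
h_j &= \left[\overrightarrow{h}_j \,;\, \overleftarrow{h}_j\right] \\
e_{ij} &= a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}, \qquad
c_i = \sum_{j} \alpha_{ij}\, h_j
\end{align}
```

Under this reading, $c_i$ is linear in the encoder outputs $h_j$, and because the weights $\alpha_{ij}$ are non-negative and sum to 1, it is a weighted average of them.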

The encoding-layer vector is taken as input and sequence labels are generated as output. To produce the final prediction vector $h_t$, the forward LSTM prediction vector is multiplied by the position index of the word, updated, and concatenated; the prediction vector is then obtained through a hyperbolic tangent operation, multiplied by its position vector, and added to its bias, giving the prediction label vector $T_t$ as output. The generated semantic vector is fed into a Softmax layer for similarity calculation; the generated entity label probability is added to the TransE link similarity probability and normalized, and the entity label probability is output. The specific calculation is as follows:

where $W_y$ is the matrix of the Softmax layer, $N_t$ is the number of labels, $T_t$ is the prediction label vector, and $y_t$ is the entity relationship label probability, from which the normalized label probabilities are obtained.

(3) Joint knowledge extraction network

The semantic distance calculation depends on the model that generates the joint vectors, and any vector distance, such as the Euclidean distance, can be used. The entities and relations in the original data are retrained with a knowledge representation learning method and converted into embedding vectors. The retraining model is the TransE model from representation learning: the training set is randomly initialized into vector form as input, and the vectors corresponding to the entity set and the predefined relation set in the training set are generated as output. Given an entity set, a relation set, and a training set, negative samples are constructed by randomly replacing the head or tail entity of training triples. The distance between the entities and the relation is computed for each correct triple and for each negative sample, and the error between the two is adjusted so that entities and relations are represented as vectors consistent with the real relations. The TransE loss function is as follows:

In equation (10), the TransE loss is a sum of terms, each consisting of the hyperparameter plus the difference between a positive-sample distance and a negative-sample distance, where $\gamma$ is the hyperparameter, $f(h, r, t)$ is the distance of a positive sample, $f(h', r', t')$ is the distance of a negative sample, $\Delta$ is the positive sample set, $\Delta'$ is the negative sample set, and $[x]_+$ denotes $\max(0, x)$.
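Equation (10) itself is not reproduced in this text. Going by the description above, it has the usual margin-based form $L = \sum_{(h,r,t)\in\Delta}\sum_{(h',r',t')\in\Delta'} [\gamma + f(h,r,t) - f(h',r',t')]_+$. The sketch below is written under that assumption, with the further assumptions that $f$ is the L2 norm of $h + r - t$ and that each positive triple is paired with one negative sample; all function names are hypothetical.

```python
import numpy as np

def distance(h, r, t):
    """f(h, r, t) = ||h + r - t||_2 (L2 norm assumed)."""
    return float(np.linalg.norm(h + r - t))

def corrupt(triple, n_entities, rng):
    """Build a negative sample by randomly replacing the head or the tail."""
    h, r, t = triple
    while True:
        e = int(rng.integers(n_entities))
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand != triple:
            return cand

def transe_loss(pos, neg, ent, rel, gamma=1.0):
    """Margin loss over paired positive and negative triples of indices.

    ent: (n_entities, dim) entity vectors; rel: (n_relations, dim) relation vectors.
    """
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos, neg):
        d_pos = distance(ent[h], rel[r], ent[t])
        d_neg = distance(ent[h2], rel[r2], ent[t2])
        total += max(0.0, gamma + d_pos - d_neg)  # [x]_+ = max(0, x)
    return total
```

The loss is zero once every correct triple is closer than its negative sample by at least the margin $\gamma$, which is the condition that training drives the embeddings toward.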

The triples and aligned entities in the knowledge graphs are used to learn joint knowledge vectors. The two knowledge bases each learn their own knowledge vectors through TransE and its extension PTransE; the knowledge vectors are then merged into the same semantic space by the joint vectors, which are derived from the aligned entities. In the merged semantic space, entities are aligned according to the semantic distance between them. How this distance is computed depends on the model that generates the joint vectors; formula (11) gives it in terms of an energy function:

If the value of the energy function is below the threshold, the two entities are considered similar. The newly aligned entity pairs are used to update the joint vectors and to find further entity pairs. This iterative learning of joint vectors and entity alignment uses two strategies, hard alignment and soft alignment.
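The threshold test can be sketched as follows. Formula (11) is not reproduced in this text, so the energy function is assumed here to be the Euclidean distance between two entity vectors in the merged space; `energy`, `align`, and the nearest-neighbour search are illustrative choices, not the source's definition.

```python
import numpy as np

def energy(e1, e2):
    """Assumed energy function: Euclidean distance between entity vectors."""
    return float(np.linalg.norm(e1 - e2))

def align(ents_a, ents_b, threshold):
    """Pair each entity of knowledge base A with its nearest entity in B,
    keeping the pair only if the energy is below the threshold."""
    pairs = []
    for i, a in enumerate(ents_a):
        dists = np.linalg.norm(ents_b - a, axis=1)
        j = int(dists.argmin())
        if dists[j] < threshold:
            pairs.append((i, j))
    return pairs
```

In an iterative scheme, the pairs returned here would be fed back to update the joint vectors before the next round of alignment.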

Through the joint information extraction algorithm based on weakly supervised learning, the method avoids error propagation among subtasks, does not require large amounts of time and manual effort to label data, extracts valuable triple information from text, and addresses the non-uniform and non-standard data formats of existing medical record management. The knowledge base is trained with an algorithm that combines TransE and representation learning, producing a semantic vector space more consistent with the real world and enabling knowledge-graph-based similarity calculation for patient medical records.

Example 2:

the embodiment 2 of the invention provides a patient medical record similarity evaluation system based on a knowledge graph, which comprises:

The working method of the system is the same as the method for evaluating the similarity of the patient medical record based on the knowledge graph provided in the embodiment 1, and the details are not repeated here.

Example 3:

embodiment 3 of the present invention provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the steps in the method for assessing similarity of medical records of a patient based on a knowledge graph according to embodiment 1 of the present invention, where the steps are as follows:

acquiring and preprocessing the text data of the patient medical record;

The detailed steps are the same as the method for evaluating the similarity of the patient medical record based on the knowledge graph provided in the embodiment 1, and are not repeated herein.

Example 4:

an embodiment 4 of the present invention provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the method for evaluating similarity of medical records of patients based on a knowledge graph according to the first aspect of the present invention, where the steps are as follows:

acquiring and preprocessing the text data of the patient medical record;

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A patient medical record similarity evaluation method based on a knowledge graph is characterized by comprising the following steps:

acquiring and preprocessing the text data of the patient medical record;

and merging the knowledge vectors into the same semantic space by using the joint vector, and judging the similarity by calculating the semantic distance between the entities.

2. The method as claimed in claim 1, wherein the information extraction model is a combination of a bi-directional LSTM network with conditional random field layers for entity recognition, and the medical entity type is expressed by knowledge vector.

3. The method of claim 1, wherein the information extraction model is combined with an attention-based bidirectional LSTM network with conditional random field layers for entity relationship extraction.

4. The method of claim 3, wherein the attention-based bidirectional LSTM network with conditional random field layers comprises an input layer, an Embedding layer, an LSTM layer, an Attention layer and an output layer;

an Embedding layer configured to: mapping each word in the sentence into a vector with a fixed length;

an LSTM layer configured to: calculating an embedding vector by using a bidirectional LSTM;

an Attention layer configured to: attention weighting is used for the results of the bi-directional LSTM.

5. The method for assessing similarity of medical records of patients based on a knowledge-graph as claimed in claim 1, wherein the similarity determination specifically comprises:

merging the knowledge vectors into the same semantic space by using the joint vector, training the triples of the plurality of knowledge graphs together, and constraining the space where the triples with similar relations are located;

and performing joint knowledge embedding through a TransE model with parameter sharing and soft alignment, and calculating the semantic distance between entities to realize the discrimination of the similarity.

6. The method of claim 5, wherein the entities and relationships involved in the raw data are retrained by means of knowledge representation learning and converted into an embedded vector form;

retraining by adopting a TransE model, randomly initializing a training set into a vector form as input, and generating word vectors corresponding to an entity set and a predefined relation set in the training set as output;

giving an entity set, a relation set and a training set, randomly replacing a head entity or a tail entity through the training set to construct a negative sample, calculating the distance between a correct triple entity and the relation, calculating the distance between the entity relation in the negative sample, adjusting the error between the entity relation and the negative sample, and representing the entity relation into a vector which accords with the real relation.

7. The method of claim 5, wherein the alignment between the entities is performed by semantic distance between the entities in the merged semantic space, and the newly aligned pair of entities is used to update the joint vector and find a new pair of entities.

8. A system for evaluating similarity of medical records of patients based on a knowledge graph is characterized by comprising:

9. A computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of the method for similarity assessment of patient medical records based on a knowledge-graph according to any one of claims 1 to 7.

10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for similarity assessment of patient medical records based on a knowledge graph according to any one of claims 1-7 when executing the program.