CN115080764B - Medical similar entity classification method and system based on knowledge graph and clustering algorithm - Google Patents

Medical similar entity classification method and system based on knowledge graph and clustering algorithm

Info

Publication number
CN115080764B
CN115080764B (application number CN202210856458.1A)
Authority
CN
China
Prior art keywords
entity
medical
similar
entities
vectors
Prior art date
Legal status
Active
Application number
CN202210856458.1A
Other languages
Chinese (zh)
Other versions
CN115080764A (en)
Inventor
刘硕
杨雅婷
宋佳祥
朱宁
白焜太
许娟
史文钊
Current Assignee
Digital Health China Technologies Co Ltd
Original Assignee
Digital Health China Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Digital Health China Technologies Co Ltd filed Critical Digital Health China Technologies Co Ltd
Priority to CN202210856458.1A priority Critical patent/CN115080764B/en
Publication of CN115080764A publication Critical patent/CN115080764A/en
Application granted granted Critical
Publication of CN115080764B publication Critical patent/CN115080764B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of knowledge graphs, and in particular to a medical similar entity classification method and system based on a knowledge graph and a clustering algorithm. The method comprises: forming the data of a medical database into a triple data set, using the triple data set as a training set, and training a knowledge graph learning model to obtain a vectorized medical knowledge graph of the medical database; passing the triples through a mean pooling layer to obtain representative vectors of the triples, and clustering the representative vectors of entities and relations with the unsupervised clustering algorithm Kmeans to obtain a similar-term entity library of the medical knowledge graph; taking entities in the same cluster as positive samples and entities in different clusters as negative samples, inputting the positive and negative samples to train an entity similarity classification model, and judging entity similarity based on the trained entity similarity classification model. The invention removes the tedious manual labeling of similar entities and realizes accurate construction of the medical knowledge graph without manual work.

Description

Medical similar entity classification method and system based on knowledge graph and clustering algorithm
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a medical similar entity classification method and system based on knowledge graphs and a clustering algorithm.
Background
A knowledge graph is composed of nodes and edges; such a multi-relation graph generally comprises several types of nodes and several types of edges. Entities (nodes) refer to things in the real world such as people, place names, concepts, drugs, companies, etc., and relationships (edges) express some kind of connection between different entities, for example a person "lives in" Beijing, Zhang San and Li Si are "friends", or logistic regression is "prerequisite knowledge" for deep learning.
At present, applications based on medical knowledge graphs are widespread, such as knowledge-graph-based intelligent question answering, visualization and search. However, a similar entity classification task that requires no manual labeling on top of a constructed knowledge graph has yet to be developed, which makes the construction of medical knowledge graphs difficult.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a medical similar entity classification method and system based on a knowledge graph and a clustering algorithm, so as to solve the difficulty of classifying similar entities without manual labeling on the basis of an established knowledge graph, and to realize classification of similar entities in the knowledge graph without manual labeling.
In order to solve the problems, the invention adopts the following technical scheme:
Since current similar entity classification tasks rely on manually labeling whether entities are similar, a similar entity classification task that needs no manual labeling is provided: first, the entity and relation nodes in the knowledge graph are converted into vector representations; clustering is then performed on the triples formed by these vectorized entity and relation nodes, and similar entities are obtained from the clusters; positive and negative samples are constructed from the clustering result of the similar entities, and these samples serve as input data for training a similar entity classification model.
In a first aspect, the present invention provides a method for classifying medical similar entities based on a knowledge graph and a clustering algorithm, comprising:
s100, forming a triple data set by data of a medical database, taking the triple data set as a training set, selecting correct triples and error triples from the training set, inputting a knowledge graph learning model for training, generating the knowledge graph learning model, obtaining updated vectorization representations of embedded layer entities and relations based on the knowledge graph learning model, and obtaining a medical knowledge graph represented vectorially by the medical database;
s200, based on the obtained vectorization-expressed medical knowledge graph of the medical database, obtaining representative vectors of triples from the triples through a mean pooling layer, and clustering the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge graph;
s300, based on the similar term entity library in the medical knowledge graph, taking the entities in the same cluster as positive samples, taking the entities in different clusters as negative samples, inputting the positive samples and the negative samples, training an entity similar classification model, and performing similar judgment on the entities based on the entity similar classification model.
As an implementation manner, in the step S200, the clustering the representative vectors of the entities and the relations by using an unsupervised clustering algorithm Kmeans includes:
s201, randomly selecting K entities as central points in a data set of the medical knowledge graph;
s202, defining a loss function, and calculating the similarity between entities;
and S203, for each entity in the data set, distributing the entity to the nearest central point according to the calculated cosine distance, re-acquiring K clusters, and for each re-acquired cluster, re-calculating the central point of the cluster until the loss function is converged.
As an implementation manner, the loss function in step S202 is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

where A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
As an implementation manner, the calculating of the entity similarity classification model in step S300 includes:
S301, mapping the positive samples and the negative samples through an embedding layer weight matrix to obtain word vectors of the positive-sample and negative-sample embedding layers, and using the word vectors as the embedding layer matrix representation of the input data, wherein the dimensionality of the embedding-layer word vectors is 256;
s302, extracting time series characteristics of word vectors of the positive sample embedding layer and the negative sample embedding layer through the inside of lstm;
S303, performing binary classification through the linear layer, and judging whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

where W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
As an implementable manner, in step S302, the extracting of the time-series features of the word vectors of the positive-sample and negative-sample embedding layers through the interior of the LSTM includes:

serially inputting the word vectors of the positive-sample and negative-sample embedding layers into an LSTM computing unit, and obtaining Lstm_embedding vector representations in different sequence directions through the following formulas:

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

where i_t is the input gate, f_t is the forget gate, and o_t is the output gate; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
As an implementation manner, in the step S100, the selecting correct triples and incorrect triples from the training set, and inputting a knowledge-graph learning model for training includes:
the correct triplet is S (h, l, t), the error triplet is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
and judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

the loss function L being:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

where [x]+ denotes max(0, x) and λ is an adjustable hyper-parameter.
In a second aspect, the present invention provides a medical similar entity classification system based on knowledge graph and clustering algorithm, including: the system comprises a medical knowledge map vectorization representation module, a similar term entity library construction module and an entity similarity judgment module;
the medical knowledge map vectorization representation module is used for forming a triple data set by data of a medical database, taking the triple data set as a training set, selecting correct triples and error triples from the training set, inputting a knowledge map learning model for training, generating a knowledge map learning model, obtaining vectorization representations of updated embedded layer entities and relations as representation vectors of a knowledge map based on the knowledge map learning model, and obtaining the vectorization representations of the medical knowledge map of the medical database;
the similar term entity library construction module is used for acquiring representative vectors of triples through a mean pooling layer based on the obtained vectorized medical knowledge map of the medical database, and clustering the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge map;
and the entity similarity judgment module is used for taking the entities in the same cluster as positive samples and the entities in different clusters as negative samples based on the similar term entity library in the medical knowledge graph, inputting the positive samples and the negative samples, training an entity similarity classification model, and performing similarity judgment on the entities based on the entity similarity classification model.
As an implementation manner, the similar term entity library construction module comprises a central point selection unit, a similarity calculation unit and a central point re-determination unit;
the central point selecting unit is used for randomly selecting K entities as central points in the data set of the medical knowledge map;
the similarity calculation unit is used for defining a loss function and calculating the similarity between entities;
and the central point re-determining unit is used for distributing each entity in the data set to a central point closest to the entity according to the calculated cosine distance, re-acquiring K clusters, and re-calculating the central point of each cluster for each newly acquired cluster until the loss function is converged.
As an implementable manner, the loss function in the similarity calculation unit is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

where A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
As an implementation manner, the entity similarity judgment module comprises a word vector determination unit, a time series feature extraction unit and a similarity judgment unit;
the word vector determining unit is used for mapping the positive samples and the negative samples through an embedding layer weight matrix to obtain word vectors of the positive-sample and negative-sample embedding layers, which are used as the embedding layer matrix representation of the input data, the dimensionality of the word vectors being 256;
the time series feature extraction unit is used for extracting time series features of the word vectors of the positive sample embedding layer and the negative sample embedding layer through the inside of lstm;
the similarity judging unit is used for performing binary classification through the linear layer and judging whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

where W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
As an implementable manner, in the time-series feature extraction unit, the extracting time-series features of the word vectors of the positive sample and negative sample embedding layers through the lstm interior includes:
serially inputting the word vectors of the positive sample embedding layer and the negative sample embedding layer into an LSTM calculating unit, and obtaining Lstm _ embedding vector representations in different sequence directions through calculation of the following formula:
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

where i_t is the input gate, f_t is the forget gate, and o_t is the output gate; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
As an implementation manner, the selecting, by the medical knowledge-graph vectorization representation module, correct triples and incorrect triples from the training set, and inputting a knowledge-graph learning model for training includes:
the correct triplet is S (h, l, t), the error triplet is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
and judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

the loss function L being:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

where [x]+ denotes max(0, x) and λ is an adjustable hyper-parameter.
In a third aspect, the invention provides a computer apparatus comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the medical similar entity classification method based on the knowledge graph and the clustering algorithm when executing the computer program.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the above medical similar entity classification method based on the knowledge-graph and clustering algorithm.
The invention has the following beneficial effects: the medical similar entity classification method and system based on the knowledge graph and clustering algorithm construct a vectorized medical knowledge graph, cluster it with an unsupervised clustering algorithm, and judge entity similarity with an LSTM-based entity similarity classification model, thereby forming an accurate medical knowledge graph.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:
fig. 1 is a flow chart of a medical similar entity classification method based on a knowledge graph and a clustering algorithm according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of clustering representative vectors of entities and relationships by using an unsupervised clustering algorithm Kmeans according to the embodiment of the present invention.
FIG. 3 is a schematic diagram of a calculation process of the entity similarity classification model according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of a medical similar entity classification system based on a knowledge graph and a clustering algorithm according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
It should be noted that these examples are only for illustrating the present invention, not for limiting the present invention, and that the simple modification of the method based on the idea of the present invention is within the scope of the claimed invention.
The method comprises the steps of converting entity nodes and relation nodes in a knowledge graph into embedding vector representation, clustering based on the vectorized entity nodes and the vectorized triple representation of the relation, obtaining similar entities, constructing positive and negative samples according to the clustering results of the similar entities, and using the positive and negative samples as input data to train a similar entity classification model.
Referring to fig. 1, a method for classifying medical similar entities based on a knowledge graph and a clustering algorithm includes:
s100, forming a triple data set by data of a medical database, taking the triple data set as a training set, selecting correct triples and incorrect triples from the training set, inputting a knowledge graph learning model for training, generating the knowledge graph learning model, obtaining updated vectorization representation of embedded layer entities and relations as representation vectors of a knowledge graph based on the knowledge graph learning model, and obtaining the vectorization representation of the medical knowledge graph of the medical database.
The correct triplet is S (h, l, t), the error triplet is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

An initialized entity vector and relation vector are randomly generated and then normalized, so that the knowledge graph is expressed by these vectors. The goal of the whole algorithm is to determine the undetermined parameters, namely the entity vectors and relation vectors: the method treats the vector of every element of the knowledge graph in the knowledge base as a parameter to be determined, and the objective is to obtain all of the undetermined coefficients, that is, to obtain the best network, with which prediction can then be carried out. How is each undetermined parameter found? As with linear regression, a loss function L is introduced:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

where [x]+ denotes max(0, x) and λ is an adjustable hyper-parameter.
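As a concrete illustration of the training procedure described above, the following sketch implements a TransE-style embedding update with the margin loss just given; it is only a minimal example, and the library (PyTorch), entity/relation counts, embedding dimension, margin and batch handling are assumptions made for illustration rather than details fixed by this disclosure.

```python
# Minimal TransE-style sketch of step S100 (assumptions: PyTorch, toy sizes, L2 distance).
import torch
import torch.nn.functional as F

num_entities, num_relations, dim, margin = 1000, 50, 128, 1.0  # placeholder sizes

entity_emb = torch.nn.Embedding(num_entities, dim)
relation_emb = torch.nn.Embedding(num_relations, dim)
optimizer = torch.optim.Adam(
    list(entity_emb.parameters()) + list(relation_emb.parameters()), lr=1e-3)

def distance(h, l, t):
    # d(h + l, t) = ||h + l - t||
    return torch.norm(entity_emb(h) + relation_emb(l) - entity_emb(t), p=2, dim=-1)

def train_step(pos_triples):
    # pos_triples: LongTensor of shape (batch, 3) holding (head, relation, tail) ids
    h, l, t = pos_triples.t()
    # corrupt either the head or the tail with a random entity to build the error triple
    corrupt_head = torch.rand(len(h)) < 0.5
    rand_ent = torch.randint(0, num_entities, (len(h),))
    h_neg = torch.where(corrupt_head, rand_ent, h)
    t_neg = torch.where(corrupt_head, t, rand_ent)
    # margin ranking loss: [lambda + d(pos) - d(neg)]_+
    loss = F.relu(margin + distance(h, l, t) - distance(h_neg, l, t_neg)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # keep entity vectors normalized, as described above
    with torch.no_grad():
        entity_emb.weight.data = F.normalize(entity_emb.weight.data, dim=-1)
    return loss.item()
```

After training converges, the embedding weights serve as the vectorized representation of the medical knowledge graph used in the following steps.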
S200, based on the obtained vectorization-expressed medical knowledge graph of the medical database, obtaining representative vectors of triples from the triples through a mean pooling layer, and clustering the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge graph.
For each obtained triple (h, r, t), the vector representations h', r' and t' of h, r and t are passed through a mean pooling layer, and the pooled vector is used as the representative vector g of the triple. The mean pooling layer formula is defined as:

$$g = \frac{1}{3}\left(h' + r' + t'\right)$$
referring to fig. 2, as an implementation manner, the clustering the representative vectors of the entities and the relations by using an unsupervised clustering algorithm Kmeans includes:
s201, randomly selecting K entities as central points in a data set of the medical knowledge graph;
s202, defining a loss function, and calculating the similarity between entities;
s203, for each entity in the data set, distributing the entity to the central point closest to the entity according to the calculated cosine distance, re-acquiring K clusters, and for each re-acquired cluster, re-calculating the central point of the cluster until the loss function converges.
Wherein, the loss function in step S202 is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

where A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
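To make steps S201 to S203 concrete, the sketch below pools each triple into a representative vector and runs a Kmeans-style loop under the cosine distance defined above; the NumPy implementation, the data shapes, the value of K and the convergence tolerance are illustrative assumptions only.

```python
# Illustrative sketch of step S200 (assumptions: NumPy, toy data, K and tolerance chosen arbitrarily).
import numpy as np

def representative_vector(h_vec, r_vec, t_vec):
    # mean pooling over the three vectors of a triple: g = (h' + r' + t') / 3
    return (h_vec + r_vec + t_vec) / 3.0

def cosine_distance(a, b):
    # dist(A, B) = 1 - cos(alpha)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return 1.0 - cos

def kmeans_cosine(vectors, k, n_iter=100, tol=1e-4):
    rng = np.random.default_rng(0)
    # S201: randomly pick K vectors as initial centers
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    prev_loss = np.inf
    for _ in range(n_iter):
        # S203: assign each vector to the nearest center under cosine distance
        dists = np.array([[cosine_distance(v, c) for c in centers] for v in vectors])
        labels = dists.argmin(axis=1)
        loss = dists[np.arange(len(vectors)), labels].sum()
        # recompute each center as the mean of its cluster
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        if abs(prev_loss - loss) < tol:  # stop once the loss has converged
            break
        prev_loss = loss
    return labels, centers
```

Entities whose representative vectors fall into the same cluster then form one group of the similar-term entity library.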
S300, based on the similar term entity library in the medical knowledge graph, taking the entities in the same cluster as positive samples, taking the entities in different clusters as negative samples, inputting the positive samples and the negative samples, training an entity similar classification model, and performing similar judgment on the entities based on the entity similar classification model.
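As one possible way of assembling the training data mentioned in step S300, the following sketch builds positive pairs from entities that share a cluster and negative pairs across clusters; the pair counts, sampling strategy and helper names are hypothetical and chosen only for this example.

```python
# Sketch: build positive / negative entity pairs from the Kmeans cluster labels
# (assumption: `labels` comes from the clustering sketch above, `entities` is a parallel list of names).
import itertools
import random

def build_pairs(entities, labels, n_negative=1000, seed=0):
    rng = random.Random(seed)
    positives, negatives = [], []
    # group entities by cluster id
    by_cluster = {}
    for ent, lab in zip(entities, labels):
        by_cluster.setdefault(int(lab), []).append(ent)
    # positive samples: every pair of entities inside the same cluster (label 1)
    for members in by_cluster.values():
        positives.extend((a, b, 1) for a, b in itertools.combinations(members, 2))
    # negative samples: random pairs drawn from two different clusters (label 0)
    cluster_ids = list(by_cluster)
    while len(negatives) < n_negative and len(cluster_ids) > 1:
        c1, c2 = rng.sample(cluster_ids, 2)
        negatives.append((rng.choice(by_cluster[c1]), rng.choice(by_cluster[c2]), 0))
    return positives + negatives
```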
Referring to fig. 3, as an implementation manner, the calculating of the entity similarity classification model in step S300 includes:
S301, mapping the positive samples and the negative samples through an embedding layer weight matrix to obtain word vectors of the positive-sample and negative-sample embedding layers, and using the word vectors as the embedding layer matrix representation of the input data, wherein the dimensionality of the embedding-layer word vectors is 256;
s302, extracting time series characteristics of word vectors of the positive sample embedding layer and the negative sample embedding layer through the inside of lstm;
S303, performing binary classification through the linear layer, and judging whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

where W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
Specifically, in the step S302, the extracting time-series features of the word vectors of the positive sample and the negative sample embedding layer through the lstm interior includes:
serially inputting the word vectors of the positive sample embedding layer and the negative sample embedding layer into an LSTM calculating unit, and obtaining Lstm _ embedding vector representations in different sequence directions through calculation of the following formula:
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

where i_t is the input gate, f_t is the forget gate, and o_t is the output gate; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
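For illustration, the sketch below wires steps S301 to S303 together with PyTorch: a 256-dimensional embedding layer, an LSTM that extracts the sequence features, and a final linear layer followed by softmax; the vocabulary size, hidden size and the way an entity pair is encoded as a token sequence are assumptions made only for this sketch, not details taken from the disclosure.

```python
# Minimal sketch of the entity similarity classifier (assumptions: PyTorch; toy vocabulary;
# an entity pair is encoded as one concatenated token-id sequence).
import torch
import torch.nn as nn

class EntitySimilarityClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)          # S301: 256-dim embedding-layer word vectors
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # S302: time-series feature extraction
        self.linear = nn.Linear(hidden_dim, 2)                        # S303: W3, binary classification

    def forward(self, pair_ids):
        # pair_ids: (batch, seq_len) token ids for an entity pair
        emb = self.embedding(pair_ids)
        _, (h_t, _) = self.lstm(emb)          # h_t: final hidden state of the LSTM
        logits = self.linear(h_t[-1])         # y_hat = W3 * h_t
        return torch.softmax(logits, dim=-1)  # p: probability of "similar" vs "not similar"

# Training-loop sketch: entities from the same cluster form positive pairs (label 1),
# entities from different clusters form negative pairs (label 0).
model = EntitySimilarityClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.NLLLoss()                      # negative log-likelihood on log-probabilities

pair_ids = torch.randint(0, 5000, (8, 12))    # placeholder batch of 8 encoded entity pairs
labels = torch.randint(0, 2, (8,))            # placeholder similar / not-similar labels
probs = model(pair_ids)
loss = criterion(torch.log(probs + 1e-12), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```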
Referring to fig. 4, the system for classifying medical similar entities based on knowledge graph and clustering algorithm includes: the medical knowledge map vectorization representation module 100, the similar term entity library construction module 200 and the entity similarity judgment module 300;
the medical knowledge graph vectorization representation module 100 is configured to form a triple data set from data of a medical database, use the triple data set as a training set, select a correct triple and an incorrect triple from the training set, input a knowledge graph learning model for training, generate a knowledge graph learning model, obtain a vectorized representation of updated embedded layer entities and relationships as a representation vector of a knowledge graph based on the knowledge graph learning model, and obtain a vectorized representation of the medical knowledge graph of the medical database;
the similar term entity library construction module 200 is configured to obtain, based on the obtained vectorized medical knowledge graph of the medical database, representative vectors of triples from the triples through a mean pooling layer, and perform clustering on the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge graph;
the entity similarity determination module 300 is configured to use entities in the same cluster as a positive sample and entities in different clusters as negative samples based on a similar term entity library in the medical knowledge graph, input the positive sample and the negative sample, train an entity similarity classification model, and perform similarity determination on the entities based on the entity similarity classification model.
As an implementation manner, the similar term entity library construction module 200 includes a central point selection unit 201, a similarity calculation unit 202, and a central point re-determination unit 203;
the central point selecting unit 201 is configured to randomly select K entities as central points in the data set of the medical knowledge graph;
the similarity calculation unit 202 is configured to define a loss function and calculate a similarity between entities;
the center point re-determining unit 203 is configured to, for each entity in the data set, allocate the entity to a center point closest to the entity according to the calculated cosine distance, re-acquire K clusters, and re-calculate a center point of each newly acquired cluster until the loss function converges.
As an implementable embodiment, the loss function in the similarity calculation unit 202 is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

where A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
As an implementation manner, the entity similarity judging module 300 includes a word vector determining unit 301, a time series feature extracting unit 302 and a similarity judging unit 303;
the word vector determining unit 301 is configured to map the positive samples and the negative samples through an embedding layer weight matrix to obtain word vectors of the positive-sample and negative-sample embedding layers, which are used as the embedding layer matrix representation of the input data, where the dimensionality of the word vectors is 256;
the time series feature extraction unit 302 is configured to extract time series features of the word vectors of the positive sample and negative sample embedding layers through an lstm interior;
the similarity determination unit 303 is configured to perform binary classification through a linear layer and determine whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

where W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
As an implementation manner, in the time-series feature extraction unit 302, the internally extracting, by lstm, the time-series features of the word vectors of the positive sample and the negative sample embedding layer includes:
serially inputting the word vectors of the positive sample embedding layer and the negative sample embedding layer into an LSTM calculating unit, and obtaining Lstm _ embedding vector representations in different sequence directions through calculation of the following formula:
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

where i_t is the input gate, f_t is the forget gate, and o_t is the output gate; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
As an implementation, the selecting, in the medical knowledge-graph vectorization representation module 100, correct triples and incorrect triples from the training set, and inputting a knowledge-graph learning model for training includes:
the correct triple is S (h, l, t), the error triple is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

the loss function L being:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

where [x]+ denotes max(0, x) and λ is an adjustable hyper-parameter.
A computer apparatus, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the medical similar entity classification method based on the knowledge graph and the clustering algorithm when executing the computer program.
The processor may be a Central Processing Unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the computer to perform desired functions. The memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by a processor to implement the above method steps of the various embodiments of the application and/or other desired functions.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the above-mentioned steps of the method for classifying medically similar entities based on a knowledge-graph and clustering algorithm.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Finally, it is noted that the above-mentioned embodiments illustrate rather than limit the invention, and that, while the invention has been described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A medical similar entity classification method based on knowledge graph and clustering algorithm is characterized by comprising the following steps:
s100, forming a triple data set by data of a medical database, taking the triple data set as a training set, selecting correct triples and error triples from the training set, inputting a knowledge graph learning model for training, generating the knowledge graph learning model, obtaining updated vectorization representations of embedded layer entities and relations based on the knowledge graph learning model, and obtaining a medical knowledge graph represented vectorially by the medical database;
s200, based on the obtained vectorization-expressed medical knowledge graph of the medical database, obtaining representative vectors of triples from the triples through a mean pooling layer, and clustering the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge graph;
s300, based on a similar term entity library in the medical knowledge graph, taking entities in the same cluster as positive samples, taking entities in different clusters as negative samples, inputting the positive samples and the negative samples, training an entity similar classification model, and performing similar judgment on the entities based on the entity similar classification model;
the calculation of the entity similarity classification model in step S300 includes:
s301, mapping the positive sample and the negative sample through an embedded layer weight matrix to obtain word vectors of the embedded layers of the positive sample and the negative sample, and representing the word vectors as an embedded layer matrix of input data;
s302, extracting time series characteristics of word vectors of the positive sample embedding layer and the negative sample embedding layer through the inside of lstm;
S303, performing binary classification through the linear layer, and judging whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

wherein W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
2. The method for classifying medical similar entities based on knowledge graph and clustering algorithm as claimed in claim 1, wherein in the step S200, the clustering the representative vectors of entities and relations by using unsupervised clustering algorithm Kmeans comprises:
s201, randomly selecting K entities as central points in a data set of the medical knowledge graph;
s202, defining a loss function, and calculating the similarity between entities;
and S203, for each entity in the data set, distributing the entity to the nearest central point according to the calculated cosine distance, re-acquiring K clusters, and for each re-acquired cluster, re-calculating the central point of the cluster until the loss function is converged.
3. The method for classifying medical similar entities based on knowledge graph and clustering algorithm as claimed in claim 2, wherein said loss function in step S202 is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

wherein A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
4. The method for classifying medical similar entities based on the knowledge graph and clustering algorithm as claimed in claim 1, wherein in the step S302, the extracting of the time-series features of the word vectors of the positive-sample and negative-sample embedding layers through the interior of the LSTM comprises:

serially inputting the word vectors of the positive sample embedding layer and the negative sample embedding layer into an LSTM calculating unit, and obtaining Lstm_embedding vector representations in different sequence directions through the following formulas:

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

wherein i_t is the input gate, f_t is the forget gate, and o_t is the output gate; W_i, W_f, W_c, W_xo, W_ho and W_co respectively denote the weight matrices of the linear layers in which they appear, and b_i, b_c, b_f and b_o respectively denote the bias weight matrices of those linear layers; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
5. The method for classifying medical similar entities based on knowledge graph and clustering algorithm as claimed in claim 1, wherein in the step S100, the selecting correct triples and incorrect triples from the training set and inputting the knowledge graph learning model for training comprises:
the correct triplet is S (h, l, t), the error triplet is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
and judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

the loss function L being:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

wherein [x]+ denotes max(0, x) and λ is an adjustable hyper-parameter.
6. A medical similar entity classification system based on knowledge graph and clustering algorithm is characterized by comprising: the system comprises a medical knowledge map vectorization representation module, a similar term entity library construction module and an entity similarity judgment module;
the medical knowledge map vectorization representation module is used for forming a triple data set by data of a medical database, taking the triple data set as a training set, selecting correct triples and error triples from the training set, inputting a knowledge map learning model for training, generating a knowledge map learning model, obtaining vectorization representations of updated embedded layer entities and relations as representation vectors of a knowledge map based on the knowledge map learning model, and obtaining the vectorization representations of the medical knowledge map of the medical database;
the similar term entity library construction module is used for acquiring representative vectors of triples through a mean pooling layer based on the obtained vectorized medical knowledge map of the medical database, and clustering the representative vectors of entities and relations by using an unsupervised clustering algorithm Kmeans to obtain a similar term entity library in the medical knowledge map;
the entity similarity judgment module is used for taking the entities in the same cluster as positive samples and the entities in different clusters as negative samples based on the similar term entity library in the medical knowledge map, inputting the positive samples and the negative samples, training an entity similarity classification model, and performing similarity judgment on the entities based on the entity similarity classification model;
the entity similarity judgment module comprises a word vector determination unit, a time series feature extraction unit and a similarity judgment unit;
the word vector determining unit is used for mapping the positive samples and the negative samples through an embedded layer weight matrix to obtain word vectors of the embedded layers of the positive samples and the negative samples, and taking the word vectors as embedded layer matrix representation of input data;
the time series feature extraction unit is used for extracting time series features of the word vectors of the positive sample embedding layer and the negative sample embedding layer through the inside of lstm;
the similarity judging unit is used for performing binary classification through the linear layer and judging whether the two entities are similar according to the following formulas:

$$\hat{y} = W_3\,h_t$$

$$p = \mathrm{softmax}(\hat{y})$$

wherein W_3 is the weight matrix of the last linear layer; h_t is the final hidden-state output of the LSTM network; ŷ is the result output by the LSTM after passing through the linear layer; p is the finally output probability value of whether the entities are similar; and softmax is a normalization function that normalizes ŷ so that the results are distributed in the interval from 0 to 1.
7. The medical similar entity classification system based on the knowledge graph and the clustering algorithm as claimed in claim 6, wherein the similar term entity library construction module comprises a central point selection unit, a similarity calculation unit and a central point re-determination unit;
the central point selecting unit is used for randomly selecting K entities as central points in the data set of the medical knowledge map;
the similarity calculation unit is used for defining a loss function and calculating the similarity between entities;
and the central point re-determining unit is used for distributing each entity in the data set to a central point closest to the entity according to the calculated cosine distance, re-acquiring K clusters, and re-calculating the central point of each cluster for re-acquisition until the loss function is converged.
8. The system of claim 7, wherein the loss function in the similarity calculation unit is:
$$\cos\alpha = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

$$\mathrm{dist}(A, B) = 1 - \cos\alpha$$

wherein A and B are the attribute vectors of the hypothetical vectors a and b, A_i and B_i respectively denote the components of the attribute vectors A and B, α is the angle between the vectors a and b, and dist(A, B) denotes the cosine distance between the vectors a and b.
9. The system of claim 6, wherein the time-series feature extraction unit extracts the time-series features of the word vectors of the positive and negative sample embedding layers through lstm internally, and comprises:
serially inputting the word vectors of the positive sample embedding layer and the negative sample embedding layer into an LSTM calculating unit, and obtaining Lstm _ embedding vector representations in different sequence directions through calculation of the following formula:
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$

$$o_t = \sigma\left(W_{xo}\,x_t + W_{ho}\,h_{t-1} + W_{co}\,c_t + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

wherein i_t is the input gate, f_t is the forget gate, and o_t is the output gate; W_i, W_f, W_c, W_xo, W_ho and W_co respectively denote the weight matrices of the linear layers in which they appear, and b_i, b_c, b_f and b_o respectively denote the bias weight matrices of those linear layers; the parameters W denote the weight matrices of the linear layers for the memory cell; x_t is the representative vector corresponding to the character currently input to the computing module; h_{t-1} is the hidden-layer state output corresponding to the previous character; c_{t-1} is the memory-cell state corresponding to the previous character; b denotes the bias weight matrix of the linear layer; and tanh and σ are activation functions.
10. The system of claim 6, wherein the medical knowledge-graph vectorization module selects the correct triples and the incorrect triples from the training set, and inputs a knowledge-graph learning model to perform the training, the system comprising:
the correct triplet is S (h, l, t), the error triplet is S '(h', l, t) or S '(h, l, t'), wherein h is a head entity, t is a tail entity, l is the relation between h and t, and h 'and t' are respectively obtained by replacing the head entity and the tail entity by a random entity;
and judging the correct triples and the incorrect triples by the following distance calculation:

$$d(h + l,\ t) = \lVert h + l - t \rVert$$

the loss function L being:

$$L = \sum_{(h,l,t)\in S}\ \sum_{(h',l,t')\in S'} \left[\lambda + d(h + l,\ t) - d(h' + l,\ t')\right]_{+}$$

wherein (h' + l, t') corresponds to an error triple, [x]+ denotes max(0, x), and λ is an adjustable hyper-parameter.
11. A computer device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for classifying medically similar entities based on a knowledge-graph and clustering algorithm according to any one of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the method for classifying medically similar entities based on a knowledge-graph and clustering algorithm according to any one of claims 1 to 5.
CN202210856458.1A 2022-07-21 2022-07-21 Medical similar entity classification method and system based on knowledge graph and clustering algorithm Active CN115080764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210856458.1A CN115080764B (en) 2022-07-21 2022-07-21 Medical similar entity classification method and system based on knowledge graph and clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210856458.1A CN115080764B (en) 2022-07-21 2022-07-21 Medical similar entity classification method and system based on knowledge graph and clustering algorithm

Publications (2)

Publication Number Publication Date
CN115080764A CN115080764A (en) 2022-09-20
CN115080764B true CN115080764B (en) 2022-11-01

Family

ID=83259597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210856458.1A Active CN115080764B (en) 2022-07-21 2022-07-21 Medical similar entity classification method and system based on knowledge graph and clustering algorithm

Country Status (1)

Country Link
CN (1) CN115080764B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329102B (en) * 2022-10-12 2023-02-03 北京道达天际科技股份有限公司 Knowledge representation learning method based on news knowledge graph
CN115687932B (en) * 2022-12-23 2023-03-28 阿里健康科技(中国)有限公司 Multi-element group data labeling method, model training method, device, equipment and medium
CN115859987B (en) * 2023-01-19 2023-06-16 阿里健康科技(中国)有限公司 Entity mention identification module, and linking method, device and medium thereof
CN117010494B (en) * 2023-09-27 2024-01-05 之江实验室 Medical data generation method and system based on causal expression learning
CN117708306B (en) * 2024-02-06 2024-05-03 神州医疗科技股份有限公司 Medical question-answering architecture generation method and system based on layered question-answering structure
CN117747124A (en) * 2024-02-20 2024-03-22 浙江大学 Medical large model logic inversion method and system based on network excitation graph decomposition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334211A (en) * 2019-06-14 2019-10-15 电子科技大学 A kind of Chinese medicine diagnosis and treatment knowledge mapping method for auto constructing based on deep learning
CN111291191A (en) * 2018-12-07 2020-06-16 国家新闻出版广电总局广播科学研究院 Radio and television knowledge graph construction method and device
CN112364174A (en) * 2020-10-21 2021-02-12 山东大学 Patient medical record similarity evaluation method and system based on knowledge graph
CN113111180A (en) * 2021-03-22 2021-07-13 杭州祺鲸科技有限公司 Chinese medical synonym clustering method based on deep pre-training neural network
CN114564966A (en) * 2022-03-04 2022-05-31 中国科学院地理科学与资源研究所 Spatial relation semantic analysis method based on knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3905097A1 (en) * 2020-04-30 2021-11-03 Robert Bosch GmbH Device and method for determining a knowledge graph

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291191A (en) * 2018-12-07 2020-06-16 国家新闻出版广电总局广播科学研究院 Radio and television knowledge graph construction method and device
CN110334211A (en) * 2019-06-14 2019-10-15 电子科技大学 A kind of Chinese medicine diagnosis and treatment knowledge mapping method for auto constructing based on deep learning
CN112364174A (en) * 2020-10-21 2021-02-12 山东大学 Patient medical record similarity evaluation method and system based on knowledge graph
CN113111180A (en) * 2021-03-22 2021-07-13 杭州祺鲸科技有限公司 Chinese medical synonym clustering method based on deep pre-training neural network
CN114564966A (en) * 2022-03-04 2022-05-31 中国科学院地理科学与资源研究所 Spatial relation semantic analysis method based on knowledge graph

Also Published As

Publication number Publication date
CN115080764A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN115080764B (en) Medical similar entity classification method and system based on knowledge graph and clustering algorithm
CN110309267B (en) Semantic retrieval method and system based on pre-training model
CN114582470B (en) Model training method and device and medical image report labeling method
CN110097096B (en) Text classification method based on TF-IDF matrix and capsule network
CN112905795A (en) Text intention classification method, device and readable medium
ElGhany et al. Diagnosis of Various Skin Cancer Lesions Based on Fine-Tuned ResNet50 Deep Network.
CN113282756B (en) Text clustering intelligent evaluation method based on hybrid clustering
CN111881954A (en) Transduction reasoning small sample classification method based on progressive cluster purification network
CN115471739A (en) Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning
CN112214595A (en) Category determination method, device, equipment and medium
CN114925212A (en) Relation extraction method and system for automatically judging and fusing knowledge graph
CN114997287A (en) Model training and data processing method, device, equipment and storage medium
CN114706989A (en) Intelligent recommendation method based on technical innovation assets as knowledge base
CN113553442A (en) Unsupervised event knowledge graph construction method and system
CN116720632B (en) Engineering construction intelligent management method and system based on GIS and BIM
CN111950646A (en) Hierarchical knowledge model construction method and target identification method for electromagnetic image
CN112818164B (en) Music type identification method, device, equipment and storage medium
CN111199154B (en) Fault-tolerant rough set-based polysemous word expression method, system and medium
CN110457455B (en) Ternary logic question-answer consultation optimization method, system, medium and equipment
CN111882441A (en) User prediction interpretation Treeshap method based on financial product recommendation scene
Shi et al. Three-way spectral clustering
CN116975595B (en) Unsupervised concept extraction method and device, electronic equipment and storage medium
CN116431757B (en) Text relation extraction method based on active learning, electronic equipment and storage medium
Liu et al. Learning to describe collective search behavior of evolutionary algorithms in solution space
CN113837228B (en) Fine granularity object retrieval method based on punishment perception center loss function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant