CN115687642A

CN115687642A - Traditional Chinese medicine diagnosis and treatment knowledge discovery method based on clinical knowledge graph representation learning

Info

Publication number: CN115687642A
Application number: CN202211274375.8A
Authority: CN
Inventors: 翁衡; 老膺荣; 楚晓丽
Original assignee: Guangdong Hospital of Traditional Chinese Medicine
Current assignee: Guangdong Hospital of Traditional Chinese Medicine
Priority date: 2022-10-18
Filing date: 2022-10-18
Publication date: 2023-02-03

Abstract

The invention discloses a traditional Chinese medicine diagnosis and treatment knowledge discovery method and device based on clinical knowledge graph representation learning, and relates to the technical field of medical big data and knowledge graphs. The framework has broad application prospects in knowledge graph fusion reasoning, traditional Chinese medicine diagnosis and treatment knowledge discovery, decision support, knowledge question answering and the like.

Description

Traditional Chinese medicine diagnosis and treatment knowledge discovery method based on clinical knowledge graph representation learning

Technical Field

The invention relates to the technical field of medical artificial intelligence and knowledge graphs, in particular to a traditional Chinese medicine diagnosis and treatment knowledge discovery method and system based on deep analysis of a traditional Chinese medicine knowledge graph.

Background

Traditional Chinese medicine has a history of five thousand years. Through long-term practice, with holistic observation and treatment based on syndrome differentiation at its core, it has developed a distinctive way of thinking and personalized diagnosis and treatment techniques, and offers definite clinical efficacy, relatively safe medication, flexible treatment modes and relatively low cost. However, a great deal of experiential knowledge exists only in the minds of renowned senior traditional Chinese medicine physicians and is difficult to apply to clinical decision support, while relying solely on the decomposition of medical guidelines cannot cope with all situations.

Existing clinical decision support systems lack deep analysis of the context (the ins and outs) of knowledge graph triples, and have difficulty explaining a diagnostic decision the way a senior specialist would. Discovering the experiential diagnosis and treatment knowledge of renowned senior traditional Chinese medicine physicians and supporting clinical decisions are current difficulties in artificial intelligence research. To address the problems of model interpretability and intelligent application, the invention provides a framework for traditional Chinese medicine diagnosis and treatment knowledge discovery and decision support based on clinical knowledge graph representation learning.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a traditional Chinese medicine diagnosis and treatment knowledge discovery method based on clinical knowledge graph representation learning. The method addresses the problem of learning the potential causal relationships in multi-hop relational tuples of traditional Chinese medicine principles, methods, prescriptions and medicines, improves the performance of the traditional Chinese medicine knowledge graph representation learning model, and finally, through knowledge distillation, optimizes the downstream applications of traditional Chinese medicine diagnosis and treatment knowledge discovery and clinical decision support.

The technical scheme of the invention is realized as follows:

a traditional Chinese medicine diagnosis and treatment knowledge discovery method based on clinical knowledge graph representation learning is characterized by comprising the following steps:

collecting data from an existing data resource library to construct a clinical knowledge graph, wherein the clinical knowledge graph is a set of triples, and each triple comprises a head entity s, a relation path r and a tail entity o;

constructing a traditional Chinese medicine clinical knowledge graph representation learning framework with a triple/multi-hop relational tuple adaptive knowledge representation learning model as its core, and screening, identifying and expanding the relation paths;

constructing a dynamic coding knowledge graph model, and performing triple completion and multi-hop relation prediction on the clinical knowledge graph;

and performing knowledge distillation on the dynamic coding knowledge graph model to reduce the size of the dynamic coding knowledge graph model, wherein the dynamic coding knowledge graph model is used for optimization and application of downstream tasks.

In a further embodiment, the head entity and the tail entity comprise Chinese and Western medicine disease names, Chinese and Western medicine four-diagnosis symptoms, test and examination reports, and drug names.

In a further embodiment, in screening, identifying and expanding the relation paths, the relation paths include:

the single-hop relation Edge: s → r → o;

the multi-hop relation Path: s → r1 → … → rk → o;

and the relation paths are screened through a causal relationship constraint and represented through semantic composition of relation embeddings, and the screened multi-hop relations are expanded into candidate triple combinations.

In a further embodiment, the traditional Chinese medicine clinical knowledge graph representation learning framework, which takes the triple/multi-hop relational tuple adaptive knowledge representation learning model as its core, is constructed on the basis of the BERT triple/path embedding technique and is subjected to knowledge distillation.

In a further embodiment, constructing the dynamic coding knowledge graph model further includes model training and optimization: for a given input sequence, the head entity s or the tail entity o is replaced with [MASK] for the model to predict; position encodings are added to the masked sequence, which is fed into a Transformer Encoder to obtain the final hidden states, and these are then used to predict the masked entity.

In a further embodiment, in triple completion and multi-hop relation prediction on the clinical knowledge graph, triple completion is regarded as a special case of the multi-hop relation. An edge (left) or a path (right) is used as the input sequence, an entity is replaced with the special token [MASK], the sequence is fed into a customized Transformer model, and the final hidden state corresponding to [MASK] is used to predict the target entity, thereby obtaining the set of all target entities reached from the entity s via the path p.

In further embodiments, the knowledge distillation comprises: embedded layer knowledge distillation, transformer layer knowledge distillation and output layer knowledge distillation.

In a further implementation mode, the dynamic coding knowledge graph model is used for optimization and application of downstream tasks and specifically comprises intelligent diagnosis and intelligent construction of a famous doctor personalized diagnosis knowledge graph.

The invention also proposes a computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method according to any one of the above when executing the computer program.

The invention also proposes a computer-readable medium, in which a computer program is stored, characterized in that said program, when executed by a processor, implements the method according to any one of the preceding claims.

Compared with the prior art, the invention has the following advantages.

In view of the fuzzy and complex relation mappings between entities in the traditional Chinese medicine knowledge graph, the invention creates a rule-based module for screening and generating potential multi-hop causal relationships, which realizes multi-hop relationship identification and expansion. It further unifies triple completion (KGC) and multi-hop relation prediction (PQA) through a dynamic coding knowledge representation model that fuses knowledge graph context. This addresses the problem of learning the potential causal relationships in multi-hop relational tuples of traditional Chinese medicine principles, methods, prescriptions and medicines, improves the performance of the traditional Chinese medicine knowledge graph representation learning model, and finally, through knowledge distillation, optimizes the downstream applications of diagnosis and treatment knowledge discovery and clinical decision support. The framework has broad application prospects in knowledge graph fusion reasoning, traditional Chinese medicine diagnosis and treatment knowledge discovery, decision support, knowledge question answering and the like.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of the multi-hop relational inference screening extraction of the present invention;

FIG. 2 is a schematic diagram of the triple screening rule of the present invention;

FIG. 3 is a schematic diagram of a dynamically encoded knowledge graph model in accordance with the present invention;

FIG. 4 is a schematic illustration of knowledge distillation of a dynamically encoded knowledge graph model according to the present invention;

FIG. 5 is a schematic diagram of a personalized knowledge graph for the diagnosis and treatment of gynecological diseases;

FIG. 6 is a schematic diagram of the medical knowledge discovery and decision support framework of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected" and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases by those skilled in the art.

Referring to fig. 1 to 6, an embodiment of the present invention discloses a method for discovering traditional Chinese medicine diagnosis and treatment knowledge based on clinical knowledge graph representation learning, including the following steps:

S10 clinical knowledge graph multi-hop relationship identification

To support traditional Chinese medicine clinical diagnosis and treatment knowledge discovery and decision support, the invention cleans and organizes integrated Chinese and Western medicine disease diagnosis and treatment data into a data set, and expresses the relationships in triple form. For example, for the sentence "insulin resistance is a symptom of diabetes", the entities and the relation are extracted and arranged into a "symptom-disease" triple (insulin resistance, symptom, diabetes).

A knowledge graph is composed of triples, each comprising a head entity, a relation and a tail entity. Facts are described in the knowledge graph $G = (E, R, S)$, where $E = \{e_1, e_2, \ldots, e_{|E|}\}$ is the entity set of the knowledge base, containing $|E|$ different entities; $R = \{r_1, r_2, \ldots, r_{|R|}\}$ is the relation set of the knowledge base, containing $|R|$ different relations; and $S \subseteq E \times R \times E$ is the set of triples in the knowledge base. The basic forms of a triple mainly include (entity 1, relation, entity 2) and (concept, attribute, attribute value). The clinical knowledge graph established in this research contains nearly 100,000 entities of 20 types, more than 30 relation types, and 140 million triples.

Conventional knowledge graph models learn only static representations of existing entities and relations. However, entities and relations rarely exist in isolation; through entity associations they can usually be expanded into rich graph contexts. When entities and relations appear in different graph contexts they may take on different meanings, just as words do in different textual contexts. Multi-hop relationships, in which one entity reaches another through a path composed of a series of relations, can also provide more information for relational reasoning.

The single-hop relationship Edge: s → r → o. One entity reaches another in a single hop, represented by one relation edge; this is the most basic form in the knowledge graph. For example, pulsatilla → drug-disease → abdominal mass.

The multi-hop relationship Path: s → r1 → … → rk → o. One entity reaches another through a path composed of a series of relations; because it contains multi-hop information, this representation often supports stronger reasoning. For example, pulsatilla → drug-mechanism → (phlegm-dampness-toxin-stasis) → mechanism-disease → abdominal mass.

Practice shows that recognizing multi-hop relation paths expands the inference patterns available between entities, and that single-hop inference capability also increases as the length of the multi-hop paths increases; that is, training on multi-hop inference helps improve triple prediction. However, since not all relation paths are reliable, the invention designs a causal relationship constraint to screen them. A relation path is represented through semantic composition of relation embeddings, and the screened multi-hop relations are expanded into candidate triple combinations.

The rule for screening potential multi-hop causal relationships is illustrated as follows. If the clinical knowledge graph contains the two triples "pulsatilla-R-abdominal mass" and "pulsatilla-R-phlegm-dampness-toxin-stasis", which reflect the relationship between disease and mechanism and the relationship between drug and mechanism (the positive example in FIG. 1), they can form the causal multi-hop relationship "pulsatilla can treat the phlegm-dampness-toxin-stasis mechanism in abdominal mass". In contrast, the triples "dry mouth-R-diabetes" and "dry mouth-R-chronic kidney disease" reflect only an ordinary co-occurrence relationship and do not strengthen any causal relationship among diabetes, chronic kidney disease and dry mouth, so they are excluded by the screening rule (the negative example in FIG. 1).

In the relation set, embedding the two-hop relation "insulin-metabolic abnormality-diabetes" gives the "insulin-diabetes" relation a richer semantic representation; because it contains multi-hop information, this representation supports stronger reasoning and yields better entity prediction accuracy. Meanwhile, entities connected by a multi-hop relationship are written in the form "Drug 1-r3, r2-Disease 1", which converts a complex chain of edges into a triple that can be represented as text and is convenient to feed to the model.
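The screening and flattening step described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the rule set `CAUSAL_PATTERNS`, the relation names and the helper functions are hypothetical, and the sketch only handles two-hop paths matched against an allow-list of relation sequences.

```python
from collections import defaultdict

# Hypothetical rule set (an assumption, not taken from the patent): ordered
# relation pairs whose composition is treated as a plausible causal chain.
CAUSAL_PATTERNS = {("drug-mechanism", "mechanism-disease")}


def two_hop_paths(triples):
    """Enumerate all two-hop paths s -r1-> m -r2-> o in a list of triples."""
    out_edges = defaultdict(list)
    for s, r, o in triples:
        out_edges[s].append((r, o))
    paths = []
    for s, r1, m in triples:
        for r2, o in out_edges[m]:
            if o != s:  # skip trivial cycles back to the start entity
                paths.append((s, (r1, r2), o))
    return paths


def screen_paths(paths, patterns=CAUSAL_PATTERNS):
    """Keep only paths whose relation sequence matches a causal pattern."""
    return [p for p in paths if p[1] in patterns]


def flatten(path):
    """Render a path as a text triple in the 'Drug 1-r3, r2-Disease 1' style."""
    s, rels, o = path
    return f"{s}-{', '.join(rels)}-{o}"


triples = [
    ("pulsatilla", "drug-mechanism", "phlegm-dampness-toxin-stasis"),
    ("phlegm-dampness-toxin-stasis", "mechanism-disease", "abdominal mass"),
    ("dry mouth", "symptom-disease", "diabetes"),
    ("dry mouth", "symptom-disease", "chronic kidney disease"),
]
kept = screen_paths(two_hop_paths(triples))
flattened = [flatten(p) for p in kept]
```

In this toy graph only the pulsatilla path survives screening; the two "dry mouth" triples share a head entity but do not chain into a path, so they produce no candidate.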

S20, the triple/multi-hop relational tuple self-adaptive knowledge representation learning model comprises the following steps:

S21 entity link prediction:

Given that the traditional Chinese medicine clinical knowledge graph has entities with rich conceptual meanings and fuzzy, complex triple logic, the invention adopts a BERT-based knowledge graph representation learning model that learns dynamic, adaptive representations of entities and relations from the rich graph-structure context. Drawing on BERT, a recent technique for learning contextualized word embeddings, the dynamic coding knowledge graph model also uses a Transformer Encoder (BERT) architecture to produce embeddings. For a given input sequence $X = (x_1, x_2, \ldots, x_n)$, the first and last elements are entities from $E$, and the other elements are relations from $R$. For each element $x_i$ in $X$, the input representation is constructed as:

$$h_i^0 = x_i^{ele} + x_i^{pos}$$

where $x_i^{ele}$ is the element embedding and $x_i^{pos}$ is the position embedding. The former identifies the current element and the latter identifies its position in the sequence. The invention allows one element embedding for each entity/relation in $E \cup R$, and one position embedding for each position within length $k$.

After all input representations are constructed, they are fed into a stack of $L$ consecutive Transformer encoders for sequence encoding, giving:

$$h_i^{\lambda} = \mathrm{Transformer}(h_i^{\lambda-1}), \quad \lambda = 1, 2, \ldots, L$$

where $\lambda$ indexes the layers of the Transformer stack and $h_i^{\lambda}$ is the hidden state of $x_i$ after the $\lambda$-th layer. Unlike sequential left-to-right or right-to-left encoding strategies, the Transformer uses a multi-head self-attention mechanism that lets each element attend to all elements in the sequence, and is therefore more effective at context modeling. After processing by the multi-layer stack, the resulting representation $h_i^L$ is adaptive to the input.
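As a rough illustration of the input representation described above (each element's embedding plus the embedding of its position), the following Python sketch builds the layer-0 vectors for one sequence. The vocabulary, the dimension `DIM`, the maximum length `MAX_LEN` and the random initialization are all assumptions made for the example; the Transformer layers themselves are omitted.

```python
import random


def build_embedding_table(keys, dim, rng):
    """Create a randomly initialised embedding vector for every key."""
    return {key: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for key in keys}


def input_representation(sequence, element_emb, position_emb):
    """Return the element embedding plus position embedding for each x_i."""
    if len(sequence) > len(position_emb):
        raise ValueError("sequence is longer than the position table")
    return [
        [e + p for e, p in zip(element_emb[x], position_emb[i])]
        for i, x in enumerate(sequence)
    ]


rng = random.Random(0)
DIM = 4      # hypothetical hidden size
MAX_LEN = 5  # hypothetical maximum sequence length

# One shared table for entities, relations and the [MASK] token.
vocabulary = ["pulsatilla", "abdominal mass", "treatment", "[MASK]"]
element_emb = build_embedding_table(vocabulary, DIM, rng)
position_emb = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(MAX_LEN)]

h0 = input_representation(["pulsatilla", "treatment", "[MASK]"], element_emb, position_emb)
```

Because the position embedding is added, the same entity or relation placed at a different position in the sequence receives a different input vector.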

S22 multi-hop relationship representation learning:

In knowledge graph based recommendation and question answering scenarios, search or prediction is conventionally performed on a given triple structure. Between a subject entity and an object entity, there may be two kinds of context:

Edge s → r → o: one entity reaches the other in a single hop, represented by one relation edge; this is the most basic form in the knowledge graph. For example, pulsatilla → treatment → abdominal mass.

Path s → r1 → … → rk → o: one entity reaches the other through a path composed of a series of relations; because it contains multi-hop information, this representation often supports stronger reasoning.

Traditional knowledge graph embedding models support only the first, simple case: it is cast as a triple completion task in which the head or tail entity of a triple is masked and the set of masked target entities is inferred.

In practice, multi-hop relationships are more common: given a starting entity s and a relation path p, the task is to infer the set of all target entities that can be reached from s via p. The invention treats a triple as a special case of a multi-hop relationship. Given an edge (left) or a path (right) as the input sequence, an entity is replaced with the special token [MASK]; the sequence is fed into a customized Transformer model, and the final hidden state corresponding to [MASK] is used to predict the target entity. This unifies triple prediction and multi-hop relationship prediction and improves model performance.

S23, model training and optimizing:

The invention designs an entity prediction experiment, in which missing entities are predicted from a given graph context. This task corresponds to single-hop or multi-hop question answering over knowledge graphs (KGs). Two given entities in a KG may be connected in two ways, by an edge or by a path, and the invention trains on the two in different ways.

For an edge s → r → o, two instances are generated: ? → r → o and s → r → ?. Each is a single-hop question in a question answering task, in which the missing entity is to be predicted. For example, pulsatilla → treatment → ? asks which syndrome types pulsatilla can treat.

For a path s → r1 → … → rk → o, two instances are likewise generated, predicting the head entity and the tail entity respectively. Each can be seen as a multi-hop question in a question answering task, for example one asking which disease syndrome types pulsatilla can treat.

Unifying the two cases, for a given input sequence $X = (x_1, x_2, \ldots, x_n)$ two training instances are created: one replaces $x_1$ with [MASK] and lets the model predict the head entity, and the other replaces $x_n$ with [MASK] and lets the model predict the tail entity. Position encodings are then added to the masked sequence, which is fed into the Transformer Encoder to obtain the final hidden states $h_1^L$ and $h_n^L$ at the first and last positions; these are used to predict the masked entity.
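The construction of the two masked training instances can be sketched in a few lines of Python. The function name `make_training_instances` and the (masked sequence, target) return format are illustrative choices, not taken from the patent.

```python
MASK = "[MASK]"


def make_training_instances(sequence):
    """Create the head-masked and tail-masked instances for one edge or path.

    `sequence` is (s, r1, ..., rk, o): entities at both ends, relations between.
    Each instance is a pair (masked input sequence, target entity).
    """
    seq = list(sequence)
    if len(seq) < 3:
        raise ValueError("need at least a head entity, one relation and a tail entity")
    head_instance = ([MASK] + seq[1:], seq[0])    # ? -> r1 -> ... -> o
    tail_instance = (seq[:-1] + [MASK], seq[-1])  # s -> r1 -> ... -> ?
    return [head_instance, tail_instance]


edge = ("pulsatilla", "treatment", "abdominal mass")
path = ("pulsatilla", "drug-mechanism", "mechanism-disease", "abdominal mass")

edge_instances = make_training_instances(edge)
path_instances = make_training_instances(path)
```

The same function serves both edges and paths, which mirrors the unified treatment in the text: a triple is simply a path with one relation.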

As in BERT, the model uses a feedforward network followed by a softmax classification layer to predict the entity:

$$z_1 = \mathrm{FeedForward}(h_1^L), \quad z_n = \mathrm{FeedForward}(h_n^L)$$

$$p_1 = \mathrm{softmax}(E^{ele} z_1), \quad p_n = \mathrm{softmax}(E^{ele} z_n)$$

where $h_1^L$/$h_n^L$ is the final hidden state at the first/last position, $z_1$/$z_n$ is the hidden state after the feedforward layer, $E^{ele} \in \mathbb{R}^{V \times D}$ is the classification weight shared with the input element embedding matrix, $D$ is the hidden size, $V$ is the entity vocabulary size, and $p_1$/$p_n$ is the predicted distribution of $x_1$/$x_n$ (s/o) over all entities.

Since the target is classification, the cross entropy between the one-hot label and the prediction is used as the training loss:

$$\mathcal{L}(X) = -\sum_t y_t \log p_t$$

where $y_t$ and $p_t$ are the $t$-th components of $y_1$/$y_n$ and $p_1$/$p_n$ respectively. Because a one-hot label restricts each entity prediction task to a single correct answer, a label smoothing strategy is used to relax this restriction: $y_t = \epsilon$ is set for the target entity, and $y_t = \frac{1-\epsilon}{V-1}$ for each of the other entities, so that the labels sum to 1. Details are shown in FIG. 3.
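The label-smoothed cross entropy can be checked numerically with a small, dependency-free Python sketch. It assumes the convention in which the target entity receives weight ε and the remaining 1 − ε is spread uniformly over the other V − 1 entities; the logits and the value ε = 0.8 are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_labels(target_index, vocab_size, epsilon):
    """Weight epsilon on the target, (1 - epsilon)/(V - 1) on every other entity."""
    off_target = (1.0 - epsilon) / (vocab_size - 1)
    labels = [off_target] * vocab_size
    labels[target_index] = epsilon
    return labels


def cross_entropy(labels, probs):
    """Cross entropy -sum_t y_t log p_t; zero-weight terms are skipped."""
    return -sum(y * math.log(p) for y, p in zip(labels, probs) if y > 0.0)


logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical scores over a 4-entity vocabulary
probs = softmax(logits)
labels = smoothed_labels(target_index=0, vocab_size=4, epsilon=0.8)
loss = cross_entropy(labels, probs)
```

With ε = 1 the labels reduce to the one-hot case and the loss is simply the negative log-probability of the target entity.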

S30 knowledge distillation:

In practical service scenarios, the original Transformer model has many parameters, consumes substantial computing resources and is slow. Drawing on the knowledge distillation approach of TinyBERT [55], the invention distills the dynamic coding knowledge graph model into a model about 1/3 the size of the original, with prediction about 4 times faster and no significant drop in performance on the knowledge link prediction task, as shown in FIG. 4:

(1) Embedding-layer distillation (embedding layer knowledge distillation)

$$L_{embedding} = \mathrm{MSE}(E^S W_e, E^T)$$

where $E^S \in \mathbb{R}^{l \times d_0}$ and $E^T \in \mathbb{R}^{l \times d}$ are the embeddings of the student network and the teacher network respectively, $l$ is the sequence length, $d_0$ is the student embedding dimension, and $d$ is the teacher embedding dimension. Since the embedding layer of the student network is usually smaller than that of the teacher network, to obtain a smaller and faster model, $W_e$ is a trainable $d_0 \times d$ linear transformation matrix that projects the student embedding into the space of the teacher embedding. The MSE is then computed to obtain the embedding loss.
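The embedding loss can be illustrated with a tiny pure-Python example in which the student embedding is projected by a d0 × d matrix before the mean squared error against the teacher embedding is taken. The matrices are toy values chosen by hand; in a real setting the projection would be learned during distillation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mse(a, b):
    """Mean squared error over all entries of two equally shaped matrices."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def embedding_distillation_loss(student_emb, teacher_emb, w_e):
    """MSE between the projected student embedding (E^S W_e) and E^T."""
    return mse(matmul(student_emb, w_e), teacher_emb)


# Toy shapes: l = 2 tokens, student dimension d0 = 2, teacher dimension d = 3.
student_emb = [[1.0, 0.0], [0.0, 1.0]]
w_e = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # d0 x d projection
teacher_emb = [[1.0, 2.0, 3.0], [4.0, 5.0, 7.0]]  # differs in one entry

loss_embedding = embedding_distillation_loss(student_emb, teacher_emb, w_e)
```

Here the projected student embedding matches the teacher everywhere except one entry that is off by 1, so the loss is 1/6.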

(2) Transformer-layer distillation (Transformer layer knowledge distillation)

Transformer-layer distillation of the dynamic coding knowledge graph model is performed at layer intervals. For example, if the teacher dynamic coding knowledge graph model has 12 layers in total and the student BERT is set to 4 layers, the Transformer loss is computed every 3 teacher layers. The mapping function is g(m) = 3 × m, where m is the index of the student encoder layer. Concretely, the 1st Transformer layer of the student corresponds to the 3rd layer of the teacher, the 2nd to the 6th, the 3rd to the 9th, and the 4th to the 12th. The Transformer loss of each layer is in turn divided into two parts: attention-based knowledge distillation and hidden-state-based knowledge distillation.
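The layer correspondence g(m) = 3 × m for the 12-layer teacher and 4-layer student can be generated as follows. The function name `layer_mapping` and the requirement that the teacher depth be an exact multiple of the student depth are assumptions made for this sketch.

```python
def layer_mapping(num_teacher_layers, num_student_layers):
    """Map each student layer m (1-based) to teacher layer g(m) = step * m."""
    if num_teacher_layers % num_student_layers != 0:
        raise ValueError("teacher depth must be a multiple of student depth")
    step = num_teacher_layers // num_student_layers
    return {m: step * m for m in range(1, num_student_layers + 1)}


mapping = layer_mapping(num_teacher_layers=12, num_student_layers=4)
```

For the 12/4 configuration this yields student layers 1 to 4 paired with teacher layers 3, 6, 9 and 12.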

Attention-based knowledge distillation

$$L_{attn} = \frac{1}{h} \sum_{i=1}^{h} \mathrm{MSE}(A_i^S, A_i^T)$$

where $A_i \in \mathbb{R}^{l \times l}$, $h$ is the number of attention heads, $l$ is the input length, $A_i^S$ is the attention score matrix of the $i$-th attention head of the student network, and $A_i^T$ is the attention score matrix of the $i$-th attention head of the teacher network.

Knowledge distillation based on hidden states

$$L_{hidden} = \mathrm{MSE}(H^S W_h, H^T)$$

where $H^S \in \mathbb{R}^{l \times d_0}$ and $H^T \in \mathbb{R}^{l \times d}$ are the hidden layer outputs of the student Transformer and the teacher Transformer respectively. As in the embedding loss, $W_h$ is a trainable linear transformation that projects the student hidden states into the space of the teacher hidden states.

(3) Prediction-layer distillation (output layer knowledge distillation)

$$L_{pred} = -\mathrm{softmax}(z^T) \cdot \mathrm{log\_softmax}(z^S / t)$$

where $z^T$ and $z^S$ are the outputs (logits) of the teacher and the student, and $t$ is the temperature, set to 1. In addition to mimicking the behavior of the intermediate layers, this loss is used to mimic the behavior of the teacher network at the prediction layer. Specifically, it computes the softmax cross entropy between the probability distribution output by the teacher and the probability distribution output by the student. The specific implementation of this layer is task dependent.
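A minimal numerical sketch of the prediction-layer loss follows: the teacher logits are turned into a probability distribution, the student logits (divided by the temperature) into log-probabilities, and the two are combined as a soft cross entropy. The logits are invented for the example and the default temperature of 1 follows the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def prediction_distillation_loss(teacher_logits, student_logits, t=1.0):
    """Soft cross entropy: -softmax(z_T) . log_softmax(z_S / t)."""
    teacher_probs = softmax(teacher_logits)
    student_log_probs = log_softmax([v / t for v in student_logits])
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


teacher_logits = [2.0, 0.0, -1.0]  # hypothetical teacher outputs
matched = prediction_distillation_loss(teacher_logits, [2.0, 0.0, -1.0])
mismatched = prediction_distillation_loss(teacher_logits, [-1.0, 0.0, 2.0])
```

The loss is smallest when the student distribution equals the teacher distribution, so `matched` is lower than `mismatched`.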

An example of associating and fusing the common knowledge graph with a personalized knowledge graph based on this framework is as follows:

As shown in FIG. 5, the knowledge representation learning framework normalizes concept entities that originally exist in free-text medical records, performs entity alignment and link prediction against the existing knowledge graph, and automatically constructs a dynamic, highly interpretable knowledge graph that can guide the summarization of disease-specific empirical medication and prompt modification (addition and subtraction) rules from symptom, drug efficacy and interaction information.

The traditional Chinese medicine knowledge graph representation model and construction method, together with representation learning on diagnosis and treatment data, are applied to make the knowledge of principles, methods, prescriptions and medicines explicit. Following the traditional Chinese medicine clinical pathway framework, the method integrates modern medicine, literature evidence, objectified four-diagnosis information and the personalized clinical experience of renowned senior traditional Chinese medicine physicians. It builds a large-scale common knowledge graph of effective information and relations covering ancient and modern literature, traditional Chinese medicine disease names, Chinese medicines/prescriptions, tests/examinations, Western medicine disease names, traditional Chinese medicine symptoms, hospital departments and the like, as well as personalized knowledge graphs defined by the domain knowledge of renowned physicians/specialties. For the personalized knowledge graph of a specialty's renowned physician, high-accuracy completion prediction and knowledge alignment of knowledge elements can be achieved with only a small number of iterations. The method thereby associates, fuses and mutually verifies common and personalized knowledge, makes the implicit knowledge of group and individual expertise explicit, and is applied to the discovery of renowned physician/specialty diagnosis and treatment knowledge and to the optimization of downstream tasks such as personalized recommendation, knowledge question answering and assisted diagnosis.

Information extraction, concept mapping and entity alignment are performed on multi-modal information; multi-modal information embedding learning is integrated with heterogeneous knowledge graph construction; and intelligent diagnosis is achieved with an HNE-based graph neural network optimized by small-sample weakly supervised learning, as shown in FIG. 6. The knowledge graph is also used in practical applications at the applicant's institution, such as medical guidance recommendation and knowledge question answering.

The technical scheme of the invention is as follows:

1. by utilizing the traditional Chinese medicine big data intelligent processing and knowledge service platform 1100, a plurality of digital resources such as traditional Chinese medicine classical ancient book documents, 10 ten thousand famous medical examinations and the like and related physical, legal, prescription and biomedical knowledge bases, the traditional Chinese and western medicine combined diagnosis and treatment knowledge map is constructed by methods such as Information Extraction (IE), concept normalized Conversation Notification (CN), entity Alignment assessment (EA) and the like, and 20 entities such as traditional Chinese and western medicine disease names, traditional Chinese and western medicine four diagnosis body examination symptoms, examination and examination reports, medicines and the like are contained, wherein the relationship number is approximately 10: more than 30, triplet: 140 million clinical knowledge maps.

2. Based on the clinical knowledge graph preprocessing, a traditional Chinese medicine clinical knowledge graph representation learning framework with a triple/multi-hop relation tuple self-adaptive knowledge representation learning model as a core is further constructed, the multi-hop relation tuple of the traditional Chinese medicine and pharmacology method is automatically generated based on rules, and the multi-hop relation with potential multi-hop cause and effect exists is screened, identified and expanded. (KGE)

3. The learning model is represented by self-adaptive traditional Chinese medicine Knowledge, dynamic coding Knowledge representation of Knowledge Graph context fusion is realized, triple Completion (KGC) and multi-hop relation Prediction (PQA) are unified, and the performance of the learning model represented by the traditional Chinese medicine Knowledge Graph is improved.

4. Through knowledge distillation, a knowledge graph model optimizes Chinese medicine diagnosis knowledge discovery and clinical assistant decision downstream application, including Intelligent diagnosis (Intelligent Diagnostics), dynamic interpretable famous-medicine personalized diagnosis knowledge graph presentation (The visualization of personalized knowledge graph), knowledge recognition and answering and other three-level work, so that end-to-end optimization verification is realized, and interactive online continuous incremental knowledge graph expansion is supported.

The invention has the following beneficial effects. To address the implicit knowledge in traditional Chinese medicine diagnosis and treatment experience and the confounding factors of the real world, effective information and relationships, such as traditional Chinese medicine disease names, traditional Chinese medicines/prescriptions, laboratory tests/examinations, Western medicine disease names, traditional Chinese medicine symptoms and hospital departments, are extracted from real cases and organized according to the knowledge definitions of the famous physician's or specialty's field. A dynamically encoded knowledge graph model that incorporates knowledge graph context is then applied to perform fused learning over single-hop triples and multi-hop path information based on rule-derived potential causal relationships. After multiple rounds of iteration, a balance between the prediction error of knowledge elements and the spatial vectors is reached. Downstream tasks are then optimized and applied: intelligent diagnosis is realized, the knowledge of famous physicians/specialties becomes interpretable, and the method is effectively applied to personalized recommendation, knowledge question answering and auxiliary diagnosis model optimization for famous physicians/specialties.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A traditional Chinese medicine diagnosis and treatment knowledge discovery method based on clinical knowledge graph representation learning is characterized by comprising the following steps:

collecting data from an existing data resource library to construct a clinical knowledge graph, wherein the clinical knowledge graph is a triple set, and the triple comprises a head entity s, a relation path r and a tail entity o;

constructing a dynamic coding knowledge graph model, and performing triple completion and multi-hop relation prediction on the clinical knowledge graph;

2. The method of claim 1, wherein the head entity and the tail entity comprise traditional Chinese and Western medicine disease names, symptoms and signs from the four diagnostic methods and physical examination, laboratory test and examination reports, and drug names.

3. The method of claim 1, wherein the relationship path is screened, identified and expanded, and comprises:

a single-hop relationship (edge): s → r → o;

a multi-hop relationship (path): s → r₁ → … → rₖ → o;

constraining and screening the relationship paths by causal relationships, representing the relationship paths by embedding their semantic composition, and expanding the screened multi-hop relationships into the candidate triple combinations.

4. The method of claim 1, wherein the learning framework, whose core is the triple/multi-hop relation tuple adaptive knowledge representation learning model, is constructed using the BERT-based triple/path embedding technique and knowledge distillation.

5. The method of claim 1, wherein the constructing of the dynamically coded knowledge graph model further comprises model training and optimization, wherein, for a given input sequence X = (x₁, x₂, …, xₙ), x₁ or xₙ is replaced with [MASK] so that the model predicts the head entity s or the tail entity o; a position encoding is added to the masked sequence, the sequence is fed into a Transformer encoder to obtain the final hidden states, and the final hidden state is then used to predict the masked entity.

6. The method of claim 1, wherein, in the triple completion and multi-hop relation prediction on the clinical knowledge graph, triple completion is treated as a special case of multi-hop relation prediction: an edge (left) or a path (right) is used as the input sequence, an entity is replaced with the special token [MASK], the sequence is input into a customized Transformer model, and the final hidden state corresponding to [MASK] is used to predict the target entity, thereby obtaining the set of all target entities reached from the entity s via the path p.

7. The method of claim 1, wherein the knowledge distillation comprises: embedding layer knowledge distillation, Transformer layer knowledge distillation and output layer knowledge distillation.

8. The method of claim 1, wherein the dynamically encoded knowledge graph model is used for optimization and application of downstream tasks, and specifically comprises intelligent diagnosis and intelligent construction of a famous doctor personalized diagnosis knowledge graph.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 8 when executing the computer program.

10. A computer-readable medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
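By way of non-limiting illustration, the three distillation levels named in claim 7 (embedding layer, Transformer layer and output layer) could be combined into one training objective as sketched below. The text does not specify the loss functions, so the choices here are assumptions: mean squared error for the embedding and Transformer layers, a temperature-scaled soft cross-entropy for the output layer, and equal default weights. The sketch also assumes that teacher and student share vector dimensions and that their layers are already matched one to one; all names are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student distribution against the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def distillation_loss(student, teacher, weights=(1.0, 1.0, 1.0), temperature=1.0):
    """Weighted sum of embedding-layer, Transformer-layer and output-layer losses.

    `student` and `teacher` are dicts with keys:
      "embedding": list of floats (embedding-layer output),
      "hidden":    list of lists of floats (one vector per matched Transformer layer),
      "logits":    list of floats (scores over candidate entities).
    """
    w_emb, w_layer, w_out = weights
    emb_loss = mse(student["embedding"], teacher["embedding"])
    layer_losses = [mse(hs, ht) for hs, ht in zip(student["hidden"], teacher["hidden"])]
    layer_loss = sum(layer_losses) / len(layer_losses)
    out_loss = soft_cross_entropy(student["logits"], teacher["logits"], temperature)
    return w_emb * emb_loss + w_layer * layer_loss + w_out * out_loss
```

When the student reproduces the teacher exactly, the two squared-error terms vanish and only the entropy of the teacher's soft targets remains, so the output-layer term has a nonzero floor; the weights control how strongly each level pulls the student toward the teacher.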