CN114121212A - Traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning - Google Patents

Info

Publication number
CN114121212A
Authority
CN
China
Prior art keywords
entity
symptom
group
representation
chinese medicine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111402132.3A
Other languages
Chinese (zh)
Other versions
CN114121212B (en
Inventor
王伟
李书晨
何洁月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111402132.3A priority Critical patent/CN114121212B/en
Publication of CN114121212A publication Critical patent/CN114121212A/en
Application granted granted Critical
Publication of CN114121212B publication Critical patent/CN114121212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/90ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to alternative medicines, e.g. homeopathy or oriental medicines

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Alternative & Traditional Medicine (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medicinal Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning, comprising the following steps in sequence. Step 1: construct a traditional Chinese medicine knowledge graph $\mathcal{G}$ with herbs as the core, by packaging herb properties such as nature, flavor, channel tropism and efficacy into triples and adding the symptom-herb treatment relationships from the prescription data set. Step 2: update the embedded representation of each entity through the propagation and aggregation of neighborhood information in the knowledge graph. Step 3: using the entity embeddings obtained in step 2, regard the symptom combination of each prescription sample as a group, let the group representation interact with the herb entities in the knowledge graph $\mathcal{G}$, and output the herbs best suited to the symptom combination as a traditional Chinese medicine prescription. The invention uses data mining to simulate the process of 'treatment based on syndrome differentiation' in traditional Chinese medicine diagnosis and treatment, and generates prescriptions from symptoms to assist clinical treatment.

Description

Traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning
Technical Field
The invention relates to a traditional Chinese medicine prescription generation method, in particular to a traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning.
Background
Traditional Chinese medicine prescription generation produces a set of herbs to treat a patient's symptoms by analyzing the interactions between symptoms and herbs. In actual diagnosis and treatment, the doctor first deduces the syndrome from the patient's symptoms and then prescribes according to that syndrome. Treatment based on syndrome differentiation is the basic principle by which traditional Chinese medicine (TCM) understands and treats disease, and it is a method of studying and treating disease that is specific to TCM. A syndrome is the nature of the disease as revealed by a number of symptoms of primary and secondary importance. The symptoms in a group play different roles in the decision that leads to syndrome induction: the syndrome is not induced by simply accumulating the symptoms, but by weighing them comprehensively according to which are primary and which are secondary.
A knowledge graph is a large semantic network composed of rich entity and relationship information, and it can supplement the user-item relationships in a recommendation task. Knowledge-graph recommendation commonly uses two types of method: embedding-based and path-based. Embedding-based methods mainly use the information in the graph to better characterize entities and relationships; the Trans family of models is representative of this type. Path-based methods make recommendations using connectivity patterns in the knowledge graph, such as meta-paths or meta-graphs. However, most such methods depend on domain knowledge and are difficult to apply in practice. Building on these, a further type of method combines the two by integrating entity embeddings with connectivity information, mainly optimizing the entity embeddings through the path connectivity of the knowledge graph. Group recommendation builds on ordinary recommendation: it uses a specific aggregation method to aggregate users into user groups, learns a representation of each group, and models the interactions between user groups and items to predict each group's preference for each item.
Disclosure of Invention
Purpose of the invention: existing prescription generation methods cannot reasonably simulate the actual diagnosis and treatment process of traditional Chinese medicine. To address this problem, the invention provides a traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning (KGAPG). The invention treats prescription generation as a group recommendation task: the symptom set of one patient is regarded as a group, and the different influence of each symptom within the group is taken into account, so that multiple symptoms are aggregated into a single group representation by an attention mechanism. In TCM theory this group corresponds to the "syndrome", so the interaction between syndromes and herbs can be modeled. First, a traditional Chinese medicine knowledge graph is constructed and the semantic relations among herbs are learned from it. Then an attention mechanism learns the influence weight of each symptom in the symptom group, and the symptoms are aggregated into the representation of the symptom group, that is, the representation of the syndrome. Finally, the syndrome representation interacts with the herb representations to output a predicted score for how suitable each herb is for the symptom group.
The technical scheme is as follows: in order to achieve the above purpose, the method for generating a traditional Chinese medicine prescription based on knowledge graph and group representation learning of the invention sequentially comprises the following steps:
step 1, knowledge graph construction and initial embedding layer: taking herbs as the core, the properties of each herb, such as nature, flavor, channel tropism and efficacy, are packaged into triples, and the treatment relationships between symptoms and herbs in the prescription data set are added, finally forming the traditional Chinese medicine knowledge graph $\mathcal{G}$; the embedded representation of each entity in the knowledge graph is initialized by a TransR model;
step 2, a neighbor information transmission and aggregation layer: updating the embedded representation of each entity through the propagation and aggregation of high-order neighborhood information in the knowledge graph, and enriching the semantic relation of each entity in the knowledge graph of the traditional Chinese medicine;
step 3, syndrome induction and prediction layer: using the entity embeddings obtained in step 2, the symptom combination of each prescription sample is regarded as a group, which represents the syndrome of traditional Chinese medicine theory; the embedded representation of the group is learned with an attention mechanism; the group representation then interacts with the herb entities in the traditional Chinese medicine knowledge graph $\mathcal{G}$, and the herbs best suited to the symptom combination are output to form a traditional Chinese medicine prescription.
Further, step 1 specifically comprises: take the existing triple (ephedra, hasEffect, sweating) in the traditional Chinese medicine knowledge graph $\mathcal{G}$ as an example, where "ephedra" is a herb entity in the knowledge graph, "sweating" is an efficacy entity in the knowledge graph, and "hasEffect" is the semantic relationship between them, which can be read as "ephedra has the efficacy of sweating". Denote the triple by $(e_h, r, e_t)$, where $e_h$, $r$ and $e_t$ are the head entity (ephedra), the relationship (hasEffect) and the tail entity (sweating), respectively. First, the entity embeddings $\mathbf{e}_h$ and $\mathbf{e}_t$ in the $d$-dimensional entity space are projected by the matrix $W_r \in \mathbb{R}^{k \times d}$ into the $k$-dimensional space of the relationship $r$, giving the representation of $e_h$ in the relationship space, $\mathbf{e}_h^r = W_r \mathbf{e}_h$, and the representation of $e_t$ in the relationship space, $\mathbf{e}_t^r = W_r \mathbf{e}_t$. The embeddings are then optimized under the translation principle $\mathbf{e}_h^r + \mathbf{e}_r \approx \mathbf{e}_t^r$, where $\mathbf{e}_r$ is the embedded representation of the relationship $r$ in the $k$-dimensional relationship space. This yields the initial embedded representations of the two entities "ephedra" and "sweating" in the knowledge graph $\mathcal{G}$. In the same way, every entity in $\mathcal{G}$ obtains its initial embedded representation after training with the TransR model.
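The projection and translation principle of step 1 can be illustrated with a minimal Python sketch. It assumes that the fit of a triple is measured by the squared Euclidean distance between the translated head and the tail in the relationship space; the function names, the matrix, the vectors and the entity `other` are hypothetical toy values chosen for illustration, not data from the invention.

```python
def matvec(W, x):
    """Multiply a k x d matrix (list of rows) by a length-d vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def transr_score(W_r, e_h, e_r, e_t):
    """Squared distance between W_r e_h + e_r and W_r e_t (lower = better fit)."""
    h_r = matvec(W_r, e_h)  # head entity projected into the relationship space
    t_r = matvec(W_r, e_t)  # tail entity projected into the relationship space
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, e_r, t_r))


# Toy values (hypothetical): d = 3 entity dimensions, k = 2 relation dimensions.
W_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
ephedra = [0.25, 0.5, 0.75]
has_effect = [0.25, -0.5]
sweating = [0.5, 0.0, -0.25]  # chosen so that W_r e_h + e_r equals W_r e_t
other = [-1.0, 1.0, 0.0]      # an unrelated tail entity

score_true = transr_score(W_r, ephedra, has_effect, sweating)
score_false = transr_score(W_r, ephedra, has_effect, other)
print(score_true, score_false)
```

In this toy setting the triple that satisfies the translation principle receives a lower score than the corrupted one, which is the property that training is meant to encourage.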
Further, step 2 specifically comprises:
step 21, let $\mathbf{e}_h$ denote the initial embedding of entity $e_h$ in the knowledge graph, obtained from the TransR embedding of step 1. The entities directly connected to $e_h$ are called its direct neighbors, and $\mathcal{N}_h = \{(e_h, r, e_t) \mid (e_h, r, e_t) \in \mathcal{G}\}$ denotes the set of triples headed by $e_h$. The aggregated representation $\mathbf{e}_{\mathcal{N}_h}$ of the neighboring entities is given by formula (1):
$$\mathbf{e}_{\mathcal{N}_h} = \sum_{(e_h, r, e_t) \in \mathcal{N}_h} \pi(e_h, r, e_t)\, \mathbf{e}_t \tag{1}$$
where $\pi(e_h, r, e_t)$ is the weight that the neighbor entity $e_t$ of $e_h$ carries in the aggregated representation, which can also be understood as the importance of the relationship $r$ to the entity $e_h$. The weight depends on the distance between $e_h$ and $e_t$ in the space of the relationship $r$, as defined in formula (2):
$$\pi(e_h, r, e_t) = (W_r \mathbf{e}_t)^{\top} \tanh(W_r \mathbf{e}_h + \mathbf{e}_r) \tag{2}$$
where $W_r \in \mathbb{R}^{k \times d}$ ($d$ denotes the embedding dimension) is a trainable weight matrix and $\mathbf{e}_r$ is the embedded representation of the relationship $r$. Finally, the weights are normalized by the softmax function:
$$\pi(e_h, r, e_t) \leftarrow \frac{\exp\big(\pi(e_h, r, e_t)\big)}{\sum_{(e_h, r', e_{t'}) \in \mathcal{N}_h} \exp\big(\pi(e_h, r', e_{t'})\big)}$$
step 22, after the neighbor aggregated representation $\mathbf{e}_{\mathcal{N}_h}$ of entity $e_h$ is obtained, it is used to update the original embedding $\mathbf{e}_h$ of the entity. The representation of $e_h$ after aggregating and updating with direct-neighbor information is $\mathbf{e}_h^{(1)} = f_{agg}(\mathbf{e}_h, \mathbf{e}_{\mathcal{N}_h})$, where $f_{agg}(\cdot)$ is the aggregation function defined in formula (3):
$$f_{agg}(\mathbf{e}_h, \mathbf{e}_{\mathcal{N}_h}) = \mathrm{LeakyReLU}\big(W_1(\mathbf{e}_h + \mathbf{e}_{\mathcal{N}_h})\big) + \mathrm{LeakyReLU}\big(W_2(\mathbf{e}_h \odot \mathbf{e}_{\mathcal{N}_h})\big) \tag{3}$$
where $W_1, W_2 \in \mathbb{R}^{d' \times d}$ ($d$ denotes the embedding dimension) are trainable weight matrices, $\odot$ denotes the element-wise product, and LeakyReLU is the activation function.
step 23, building on steps 21 and 22, more propagation layers are stacked to obtain the high-order neighbor aggregated representation of each entity. The entity embeddings are updated recursively through an $l$-layer network, with information propagating node by node through the knowledge graph. In short, after propagation through $l$ layers, the final-layer representation of entity $e_h$ contains the information of the high-order neighbor entities that $e_h$ can reach within $l$ hops.
Further, step 3 specifically comprises:
step 31, the symptoms of each prescription sample are defined as a group $S_p = \{s_i \mid s_i \in S\}$, where $s_i$ is the $i$-th symptom and $S$ is the set of all symptoms in the data set. The syndrome induction process uses an attention mechanism to learn the influence of each symptom on its symptom group, that is, the weight each symptom carries within the group. The weight is learned by an attention network: the weight $a(S_p, s_i)$ of each symptom $s_i$ in the symptom set $S_p$ is defined by formula (4) (equation image BDA0003365314500000039, not reproduced in this text), whose two trainable parameters appear only in images BDA00033653145000000310 and BDA00033653145000000311, and in which $\mathbf{s}_i$ is the embedded representation of symptom $s_i$ obtained in step 2. After the weight of every symptom in a symptom set is obtained, the weights are normalized by the softmax function to give the influence score $\tilde{a}(S_p, s_i)$ of each symptom in the group, as shown in formula (5):
$$\tilde{a}(S_p, s_i) = \frac{\exp\big(a(S_p, s_i)\big)}{\sum_{s_j \in S_p} \exp\big(a(S_p, s_j)\big)} \tag{5}$$
step 32, on the basis of step 31, the representation of the symptom group $S_p$, that is, the representation $\mathbf{s}_d$ of the latent syndrome of each symptom combination, is obtained as defined in formula (6):
$$\mathbf{s}_d = \sum_{s_i \in S_p} \tilde{a}(S_p, s_i)\, \mathbf{s}_i \tag{6}$$
where $\tilde{a}(S_p, s_i)$ is the normalized influence score of symptom $s_i$ and $\mathbf{s}_i$ is its embedded representation. Then a single-layer MLP, with its capacity for nonlinear processing, learns a more expressive syndrome representation, as defined in formula (7):
$$\mathbf{s}_d = \mathrm{ReLU}(W_{mlp}\, \mathbf{s}_d + b_{mlp}) \tag{7}$$
where $W_{mlp}$ and $b_{mlp}$ are learnable parameters and ReLU is the activation function. At this point, the attention-based symptom aggregation has produced the latent syndrome representation of each prescription sample, in keeping with the basic process of traditional Chinese medicine diagnosis and treatment. The whole process is adaptive.
step 33, the latent syndrome of each prescription sample obtained in the above steps interacts with the herbs to predict how likely each herb is to be suitable for treating the symptom set. The predicted score is computed with an inner product, as shown in formula (8):
$$\hat{y}(S_p, h) = \mathbf{s}_d^{\top} \mathbf{h} \tag{8}$$
where $\hat{y}(S_p, h)$ is the probability score that herb $h$ is suitable for treating the symptom group $S_p$, that is, the latent syndrome $\mathbf{s}_d$, and $\mathbf{h}$ is the embedded representation of the herb entity in the knowledge graph. Finally, the $N$ herbs with the highest probability scores are output as the prescription for the input symptom combination.
Beneficial effects:
The invention provides a traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning, which converts prescription generation into a group recommendation task. The model first uses knowledge from the field of traditional Chinese medicine to construct a traditional Chinese medicine knowledge graph and learns the embedded representation of each entity through the high-order connectivity of the graph. It then regards the symptom combination of each input sample as a group, learns the influence weight of each symptom in the group with an attention mechanism, and aggregates the symptom embeddings into the embedded representation of the group, that is, of the latent syndrome reflected by the symptom combination, thereby simulating the process of "treatment based on syndrome differentiation" in traditional Chinese medicine diagnosis and treatment. The advantages are as follows:
(1) group recommendation is introduced into the prescription generation task, emphasizing the primary and secondary influence of different symptoms during syndrome induction. In the group aggregation of the syndrome induction stage, an attention mechanism learns the influence score of each symptom in the symptom group of a prescription sample;
(2) the rich item structure information in the knowledge graph alleviates the sparsity problem of the recommendation task, and the high-order connectivity of the knowledge graph captures the latent semantic relations between herbs and their various attributes and symptoms.
Drawings
FIG. 1 is a block diagram of the overall framework of the KGAPG model of the present invention;
FIG. 2 is a flow chart of a method of the present invention;
FIG. 3 is a schematic view of a Chinese medicine knowledge map;
FIG. 4 is a verification graph comparing neighbor information propagation and aggregation layer depth;
FIG. 5 is a comparison verification diagram of knowledge-graph entity embedding dimension.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described below with reference to specific embodiments and illustrative drawings, it being understood that the preferred embodiments described herein are for the purpose of illustration and explanation only and are not intended to limit the present invention.
The invention relates to a traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning, which sequentially comprises the following steps of:
step 1: the knowledge-graph construction and initial embedding layer,
The entities in the knowledge graph are composites of multiple attributes; different relationships focus on different attributes of an entity, and different relationships have different semantic spaces. Take the existing triple (ephedra, hasEffect, sweating) in the traditional Chinese medicine knowledge graph $\mathcal{G}$ as an example, where "ephedra" is a herb entity in the knowledge graph, "sweating" is an efficacy entity in the knowledge graph, and "hasEffect" is the semantic relationship between them, which can be read as "ephedra has the efficacy of sweating". Denote the triple by $(e_h, r, e_t)$, where $e_h$, $r$ and $e_t$ are the head entity (ephedra), the relationship (hasEffect) and the tail entity (sweating), respectively. First, the entity embeddings $\mathbf{e}_h$ and $\mathbf{e}_t$ in the $d$-dimensional entity space are projected by the matrix $W_r \in \mathbb{R}^{k \times d}$ into the $k$-dimensional space of the relationship $r$, giving the representation of $e_h$ in the relationship space, $\mathbf{e}_h^r = W_r \mathbf{e}_h$, and the representation of $e_t$ in the relationship space, $\mathbf{e}_t^r = W_r \mathbf{e}_t$. The embeddings are then optimized under the translation principle $\mathbf{e}_h^r + \mathbf{e}_r \approx \mathbf{e}_t^r$, where $\mathbf{e}_r$ is the embedded representation of the relationship $r$ in the $k$-dimensional relationship space. For a given triple $(e_h, r, e_t)$, the plausibility score (or energy score) is given by formula (1):
$$g(e_h, r, e_t) = \lVert W_r \mathbf{e}_h + \mathbf{e}_r - W_r \mathbf{e}_t \rVert_2^2 \tag{1}$$
A lower score $g(e_h, r, e_t)$ indicates that the triple is more likely to be true, and a higher score that it is more likely to be false. The training of TransR considers the relative order between real and false triples and separates them with the following pairwise ranking loss function:
$$\mathcal{L}_{KG} = \sum_{(e_h, r, e_t, e_{t'}) \in \mathcal{T}} -\ln \sigma\big(g(e_h, r, e_{t'}) - g(e_h, r, e_t)\big) \tag{2}$$
where $\mathcal{T} = \{(e_h, r, e_t, e_{t'}) \mid (e_h, r, e_t) \in \mathcal{G},\ (e_h, r, e_{t'}) \notin \mathcal{G}\}$, $(e_h, r, e_{t'})$ is a false triple constructed by randomly replacing one of the entities in a real triple, and $\sigma(\cdot)$ is the sigmoid function.
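A minimal Python sketch of the pairwise ranking idea may help here. It assumes the loss takes the form of a sum of -ln(sigmoid(g_false - g_real)) over paired real and false triples, where lower g means a more plausible triple; the function name and the score values are hypothetical.

```python
import math


def pairwise_ranking_loss(real_scores, false_scores):
    """Sum of -ln(sigmoid(g_false - g_real)) over paired real/false triples."""
    total = 0.0
    for g_real, g_false in zip(real_scores, false_scores):
        x = g_false - g_real
        # -ln(sigmoid(x)) equals ln(1 + exp(-x)); this form avoids overflow.
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total


# Hypothetical plausibility scores for one real triple and its corrupted copy.
well_separated = pairwise_ranking_loss([0.1], [3.0])   # real triple scores lower
wrongly_ordered = pairwise_ranking_loss([3.0], [0.1])  # real triple scores higher
print(well_separated < wrongly_ordered)
```

The loss is small when the real triple scores well below its corrupted counterpart and grows when the order is reversed, so minimizing it pushes real and false triples apart.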
Step 2: the neighbor information propagation and aggregation layer,
Let $\mathbf{e}_h$ denote the initial embedding of entity $e_h$ in the knowledge graph, obtained from the TransR embedding of step 1. The entities directly connected to $e_h$ are called its direct neighbors. The embedded representations of the direct neighbors are propagated to entity $e_h$ and aggregated there, and the representation of $e_h$ is then combined with this aggregated neighbor representation to obtain a higher-order representation of $e_h$. With $\mathcal{N}_h = \{(e_h, r, e_t) \mid (e_h, r, e_t) \in \mathcal{G}\}$ denoting the set of triples headed by $e_h$, the aggregated representation $\mathbf{e}_{\mathcal{N}_h}$ of the neighboring entities is defined in formula (3):
$$\mathbf{e}_{\mathcal{N}_h} = \sum_{(e_h, r, e_t) \in \mathcal{N}_h} \pi(e_h, r, e_t)\, \mathbf{e}_t \tag{3}$$
where $\pi(e_h, r, e_t)$ is the weight that the neighbor entity $e_t$ of $e_h$ carries in the aggregated representation, which can also be understood as the importance of the relationship $r$ to the entity $e_h$. The weight depends on the distance between $e_h$ and $e_t$ in the space of the relationship $r$, as defined in formula (4):
$$\pi(e_h, r, e_t) = (W_r \mathbf{e}_t)^{\top} \tanh(W_r \mathbf{e}_h + \mathbf{e}_r) \tag{4}$$
where $W_r \in \mathbb{R}^{k \times d}$ ($d$ is the embedding dimension) is a trainable weight matrix and $\mathbf{e}_r$ is the embedded representation of the relationship $r$. After the aggregation weights of all direct neighbors of entity $e_h$ are obtained, they are normalized by the softmax function, as shown in formula (5):
$$\pi(e_h, r, e_t) \leftarrow \frac{\exp\big(\pi(e_h, r, e_t)\big)}{\sum_{(e_h, r', e_{t'}) \in \mathcal{N}_h} \exp\big(\pi(e_h, r', e_{t'})\big)} \tag{5}$$
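The attention-weighted neighbor aggregation can be sketched in plain Python. The sketch assumes the unnormalized weight has the form (W_r e_t)^T tanh(W_r e_h + e_r), followed by a softmax over the neighbors of the head entity; the function names, the identity projection and the toy vectors are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def raw_weight(W_r, e_h, e_r, e_t):
    """Assumed unnormalized weight: (W_r e_t)^T tanh(W_r e_h + e_r)."""
    h_r, t_r = matvec(W_r, e_h), matvec(W_r, e_t)
    return sum(t * math.tanh(h + r) for t, h, r in zip(t_r, h_r, e_r))


def softmax(values):
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_neighbors(e_h, neighbors):
    """neighbors: list of (W_r, e_r, e_t), one per triple headed by the entity."""
    weights = softmax([raw_weight(W_r, e_h, e_r, e_t)
                       for W_r, e_r, e_t in neighbors])
    e_N = [sum(w * n[2][i] for w, n in zip(weights, neighbors))
           for i in range(len(e_h))]
    return weights, e_N


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projection with k = d = 2
e_h = [1.0, 0.0]
neighbors = [
    (identity, [0.0, 0.0], [1.0, 0.0]),   # tail aligned with the head
    (identity, [0.0, 0.0], [-1.0, 0.0]),  # tail pointing the opposite way
]
weights, e_N = aggregate_neighbors(e_h, neighbors)
print(weights, e_N)
```

In this toy example the neighbor whose projection agrees with the translated head receives the larger normalized weight, so it dominates the aggregated representation.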
After the neighbor aggregated representation $\mathbf{e}_{\mathcal{N}_h}$ of entity $e_h$ is obtained, it is used to update the original embedding $\mathbf{e}_h$ of the entity. The representation of $e_h$ after aggregating and updating with direct-neighbor information is $\mathbf{e}_h^{(1)} = f_{agg}(\mathbf{e}_h, \mathbf{e}_{\mathcal{N}_h})$, where $f_{agg}(\cdot)$ is the aggregation function defined in formula (6):
$$f_{agg}(\mathbf{e}_h, \mathbf{e}_{\mathcal{N}_h}) = \mathrm{LeakyReLU}\big(W_1(\mathbf{e}_h + \mathbf{e}_{\mathcal{N}_h})\big) + \mathrm{LeakyReLU}\big(W_2(\mathbf{e}_h \odot \mathbf{e}_{\mathcal{N}_h})\big) \tag{6}$$
where $W_1, W_2 \in \mathbb{R}^{d' \times d}$ ($d'$ and $d$ are embedding dimensions) are trainable weight matrices, $\odot$ denotes the element-wise product, and LeakyReLU is the activation function. Through the propagation and aggregation of direct neighbors, each entity in the knowledge graph contains not only its own information but also the information that flows to it from its direct neighbors along first-order connectivity.
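A minimal sketch of such an aggregation function follows, assuming the Bi-Interaction form named in the experiments: the sum and the element-wise product of the two vectors are each passed through a weight matrix and LeakyReLU, then added. The negative slope of 0.2, the identity matrices and the input vectors are hypothetical choices for illustration.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    """Element-wise LeakyReLU; the negative slope 0.2 is an assumed value."""
    return [v if v > 0 else slope * v for v in x]


def bi_interaction(e_h, e_N, W1, W2):
    """LeakyReLU(W1 (e_h + e_N)) + LeakyReLU(W2 (e_h * e_N)), element-wise *."""
    summed = [a + b for a, b in zip(e_h, e_N)]
    product = [a * b for a, b in zip(e_h, e_N)]
    left = leaky_relu(matvec(W1, summed))
    right = leaky_relu(matvec(W2, product))
    return [l + r for l, r in zip(left, right)]


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights with d' = d = 2
updated = bi_interaction([1.0, -2.0], [0.5, 1.0], identity, identity)
print(updated)
```

The product term is what distinguishes this aggregator from a plain sum: it lets features that are strong in both the entity and its neighborhood reinforce each other.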
Building on this propagation and aggregation of direct-neighbor information, more propagation layers are stacked to obtain the high-order neighbor information of each entity. The entity embeddings are updated recursively through an $l$-layer network, with information propagating node by node through the knowledge graph. In short, after propagation through $l$ layers, the final-layer representation of entity $e_h$ contains the information of the high-order neighbor entities that $e_h$ can reach within $l$ hops.
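The relationship between the number of layers and the reachable neighborhood can be illustrated with a small Python sketch. It tracks only which entities can contribute information after a given number of layers, not the embeddings themselves; the toy graph and the function name are hypothetical.

```python
def receptive_field(adjacency, entity, num_layers):
    """Entities whose information can reach `entity` after `num_layers` layers."""
    reached = {entity}
    for _ in range(num_layers):
        # Each layer pulls in the direct neighbors of everything reached so far.
        reached |= {nbr for node in reached for nbr in adjacency.get(node, ())}
    return reached


# Hypothetical toy graph, stored as an undirected adjacency list.
adjacency = {
    "cough": ["ephedra"],
    "ephedra": ["cough", "sweating", "lung meridian"],
    "sweating": ["ephedra", "cassia twig"],
    "lung meridian": ["ephedra"],
    "cassia twig": ["sweating"],
}
one_layer = receptive_field(adjacency, "cough", 1)
two_layers = receptive_field(adjacency, "cough", 2)
three_layers = receptive_field(adjacency, "cough", 3)
print(sorted(two_layers))
```

With two layers the symptom "cough" already receives information from the attributes of the herb it is linked to, which is how symptom embeddings come to carry herb background knowledge.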
Step 3: the syndrome induction and prediction layer,
The symptoms of each prescription sample are defined as a group $S_p = \{s_i \mid s_i \in S\}$, where $s_i$ is the $i$-th symptom and $S$ is the set of all symptoms in the data set. According to the theory of traditional Chinese medicine, treatment based on syndrome differentiation must weigh all of a patient's symptoms according to which are primary and which are secondary, and this affects the doctor's decision during syndrome induction. In the syndrome induction process, an attention mechanism learns the influence of each symptom on its symptom group, that is, the weight each symptom carries within the group. The weight is learned by an attention network: the weight $a(S_p, s_i)$ of each symptom $s_i$ in the symptom set $S_p$ is defined by formula (7) (equation image BDA0003365314500000067, not reproduced in this text), whose two trainable parameters, with sizes determined by the embedding dimension $d$, appear only in images BDA0003365314500000068 and BDA0003365314500000069, and in which $\mathbf{s}_i$ is the embedded representation of symptom $s_i$ obtained in step 2. After the weight of every symptom in a symptom set is obtained, the weights are normalized by the softmax function to give the final influence score $\tilde{a}(S_p, s_i)$ of each symptom in the group, as shown in formula (8):
$$\tilde{a}(S_p, s_i) = \frac{\exp\big(a(S_p, s_i)\big)}{\sum_{s_j \in S_p} \exp\big(a(S_p, s_j)\big)} \tag{8}$$
On this basis, the representation of the symptom group $S_p$, that is, the representation $\mathbf{s}_d$ of the latent syndrome of each symptom combination, is obtained as defined in formula (9):
$$\mathbf{s}_d = \sum_{s_i \in S_p} \tilde{a}(S_p, s_i)\, \mathbf{s}_i \tag{9}$$
where $\tilde{a}(S_p, s_i)$ is the normalized influence score of symptom $s_i$ and $\mathbf{s}_i$ is its embedded representation. Then a single-layer MLP, with its capacity for nonlinear processing, learns a more expressive syndrome representation, as defined in formula (10):
$$\mathbf{s}_d = \mathrm{ReLU}(W_{mlp}\, \mathbf{s}_d + b_{mlp}) \tag{10}$$
where $W_{mlp}$ and $b_{mlp}$ are learnable parameters and ReLU is the activation function. At this point, the attention-based symptom aggregation has produced the latent syndrome representation of each prescription sample, in keeping with the basic process of traditional Chinese medicine diagnosis and treatment. The whole process is adaptive.
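The syndrome induction step can be sketched in Python as follows. The exact form of the attention weight is not recoverable from this text, so the sketch assumes a common choice, a_i = v_att^T ReLU(W_att s_i); that form, the parameter names `W_att` and `v_att`, and all toy values are assumptions made for illustration. The softmax normalization, the weighted sum and the single-layer MLP with ReLU follow the description above.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(v, 0.0) for v in x]


def softmax(values):
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def induce_syndrome(symptoms, W_att, v_att, W_mlp, b_mlp):
    """symptoms: list of symptom embeddings. Returns (influence scores, s_d)."""
    # Assumed attention form (hypothetical): a_i = v_att^T ReLU(W_att s_i).
    raw = [sum(v * z for v, z in zip(v_att, relu(matvec(W_att, s))))
           for s in symptoms]
    alpha = softmax(raw)  # normalized influence score of each symptom
    pooled = [sum(a * s[i] for a, s in zip(alpha, symptoms))
              for i in range(len(symptoms[0]))]
    hidden = [z + b for z, b in zip(matvec(W_mlp, pooled), b_mlp)]
    return alpha, relu(hidden)  # single-layer MLP with ReLU


identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, s_d = induce_syndrome([[1.0, 0.0], [0.0, 1.0]],
                             identity, [1.0, 0.0], identity, [0.0, 0.0])
print(alpha, s_d)
```

Replacing `alpha` with uniform weights would give the attention-free average aggregation that serves as the comparison baseline in the experiments.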
Finally, the latent syndrome of each prescription sample obtained in the above steps interacts with the herbs to predict how likely each herb is to be suitable for treating the symptom set. The predicted score is computed with an inner product, as shown in formula (11):
$$\hat{y}(S_p, h) = \mathbf{s}_d^{\top} \mathbf{h} \tag{11}$$
where $\hat{y}(S_p, h)$ is the probability score that herb $h$ is suitable for treating the symptom group $S_p$, that is, the latent syndrome $\mathbf{s}_d$, and $\mathbf{h}$ is the embedded representation of the herb entity in the knowledge graph. Finally, the $N$ herbs with the highest probability scores are output as the prescription for the input symptom combination.
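The scoring and top-N selection can be sketched in a few lines of Python. The function name, the herb names and the embedding values are hypothetical toy data; only the inner-product scoring and the selection of the N highest-scoring herbs come from the description.

```python
def recommend_herbs(s_d, herb_embeddings, top_n):
    """Score every herb by inner product with s_d and keep the top_n herbs."""
    scores = {herb: sum(a * b for a, b in zip(s_d, emb))
              for herb, emb in herb_embeddings.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


# Hypothetical two-dimensional herb embeddings and syndrome representation.
herbs = {
    "ephedra": [1.0, 0.0],
    "licorice": [0.0, 1.0],
    "ginseng": [-1.0, 0.5],
}
prescription, scores = recommend_herbs([1.0, 0.5], herbs, top_n=2)
print(prescription)
```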
4. Model optimization
Given a prescription sample $P = \langle S_p, H_p \rangle$, where $S_p$ and $H_p$ are the symptom set and the herb set of the sample, the herb set actually used, $H_p$, is represented as a multi-hot vector $\mathbf{h}_p'$ of dimension $|H|$, and $v(S_p, H)$ is the output probability vector over all herbs given the symptom group $S_p$. The weighted mean square error (WMSE) between $\mathbf{h}_p'$ and $v(S_p, H)$ is defined in formula (12):
$$\mathrm{WMSE}\big(\mathbf{h}_p', v(S_p, H)\big) = \sum_{i=1}^{|H|} w_i \big(h'_{pi} - v(S_p, H)_i\big)^2 \tag{12}$$
where $h'_{pi}$ and $v(S_p, H)_i$ are the $i$-th elements of the two vectors and $w_i$ is the weight of the $i$-th herb, defined in formula (13) (equation image BDA0003365314500000075, not reproduced in this text) in terms of $\mathrm{freq}(i)$, the frequency with which the $i$-th herb appears across all prescription samples. The herb weights balance the contributions of herbs of different frequencies: the more frequent a herb, the lower its weight.
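A small Python sketch of the weighted error follows. The exact weight formula is not recoverable from this text, so the sketch assumes one simple choice that satisfies the stated property (higher frequency, lower weight): the largest herb frequency divided by the herb's own frequency. That weighting, the function names and the toy numbers are assumptions.

```python
def herb_weights(frequencies):
    """Assumed weighting (hypothetical): max frequency / frequency of herb i."""
    top = max(frequencies)
    return [top / f for f in frequencies]


def wmse(target, predicted, weights):
    """Weighted sum of squared differences between two equal-length vectors."""
    return sum(w * (t - p) ** 2 for w, t, p in zip(weights, target, predicted))


# Toy data: three herbs appearing 100, 50 and 10 times in the prescriptions.
weights = herb_weights([100, 50, 10])
target = [1.0, 0.0, 1.0]       # multi-hot vector of the herbs actually used
predicted = [0.5, 0.5, 0.5]    # model output for every herb
loss = wmse(target, predicted, weights)
print(weights, loss)
```

Under this weighting an error on the rare third herb costs ten times as much as the same error on the most common herb, which keeps frequent herbs from dominating training.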
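The interaction between the latent syndrome and the herbs uses the loss function of formula (14), which accumulates the WMSE over the prescription samples:
$$\mathcal{L}_{pre} = \sum_{\langle S_p, H_p \rangle} \mathrm{WMSE}\big(\mathbf{h}_p', v(S_p, H)\big) \tag{14}$$
Finally, the model is optimized by jointly learning formulas (2) and (14). Writing $\mathcal{L}_{KG}$ for the knowledge graph loss of formula (2), the joint objective function is given by formula (15):
$$\mathcal{L} = \mathcal{L}_{KG} + \mathcal{L}_{pre} + \lambda_{\Theta} \lVert \Theta \rVert_2^2 \tag{15}$$
where $\lambda_{\Theta}$ controls the $L_2$ regularization term that prevents overfitting, and $\Theta$ is the set of all parameters of the model.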
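As a minimal illustration, the joint objective can be written as a plain Python function. It assumes the objective is the unweighted sum of the knowledge graph loss and the prediction loss plus an L2 penalty scaled by the regularization coefficient; the function name and the numbers are hypothetical.

```python
def joint_objective(kg_loss, prediction_loss, parameters, lam):
    """kg_loss + prediction_loss + lam * (sum of squared parameters)."""
    l2_penalty = sum(p * p for p in parameters)
    return kg_loss + prediction_loss + lam * l2_penalty


# Hypothetical loss values and a flattened list of two model parameters.
total = joint_objective(kg_loss=1.5, prediction_loss=2.0,
                        parameters=[1.0, -2.0], lam=0.1)
print(total)
```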
Experiment:
To verify the effectiveness of the KGAPG prescription generation model, experiments are carried out on a public traditional Chinese medicine prescription data set, and parameter studies and ablation analysis are performed to further verify the model. The prescription data set and the traditional Chinese medicine knowledge graph data used in the invention are shown in Table 1.
TABLE 1 prescription data set and Chinese medicine knowledge map data
Figure BDA0003365314500000081
The traditional Chinese medicine prescription data set contains 26360 complete prescription samples involving 360 symptoms and 753 herbs. A triple is the general representation used in a knowledge graph: (head, relation, tail), where head and tail are the head entity and the tail entity, and relation is the relationship between the two entities. Most natural-language statements can be represented in this form. For example, the triple (ephedra, hasEffect, sweating) states that the herb "ephedra" has the efficacy of "sweating". In the theory of traditional Chinese medicine, herbs are characterized by the four natures, the five flavors and channel tropism. The traditional Chinese medicine knowledge graph, with herbs at its core, is constructed on the basis of this background knowledge. FIG. 3 is a schematic illustration of the traditional Chinese medicine knowledge graph.
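The construction of such triples can be sketched in Python. Only the relation `hasEffect` and the triple (ephedra, hasEffect, sweating) come from the text; the other relation names (`hasFlavor`, `hasMeridian`, `treats`), the direction of the treatment relation, and the sample data are hypothetical.

```python
def build_knowledge_graph(herb_properties, prescriptions):
    """Collect (head, relation, tail) triples from herb attributes and samples."""
    triples = set()
    # Attribute triples with the herb as the head entity.
    for herb, properties in herb_properties.items():
        for relation, values in properties.items():
            for value in values:
                triples.add((herb, relation, value))
    # Treatment triples linking each herb to every symptom of its prescription.
    for symptoms, herbs in prescriptions:
        for herb in herbs:
            for symptom in symptoms:
                triples.add((herb, "treats", symptom))
    return triples


herb_properties = {
    "ephedra": {
        "hasEffect": ["sweating"],
        "hasFlavor": ["pungent"],
        "hasMeridian": ["lung"],
    },
}
prescriptions = [(["cough", "fever"], ["ephedra"])]
triples = build_knowledge_graph(herb_properties, prescriptions)
print(len(triples))
```

Storing the triples in a set removes the duplicates that arise when the same herb and symptom co-occur in many prescription samples.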
Table 2 compares the performance of the model with and without the attention mechanism when the syndrome induction and prediction layer aggregates multiple symptoms into a group. The table shows that attention-based syndrome induction outperforms average aggregation without attention. Through the high-order connectivity of the knowledge graph, the symptom entities capture similarity information between herbs, so the embedded representation of a symptom also carries herb background knowledge such as nature, flavor and channel tropism. Consequently, in the syndrome induction stage the latent syndrome representation also integrates this herb background knowledge, which ties syndromes and herbs more closely together. On this basis, the attention mechanism dynamically captures the individual influence of each symptom in the symptom group, in keeping with the basic idea of treatment based on syndrome differentiation in traditional Chinese medicine.
TABLE 2 influence of the attention mechanism on the process of syndrome induction
Figure BDA0003365314500000082
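The two aggregation strategies compared in Table 2 can be contrasted with a minimal sketch. It assumes the attention score h^T W_att s_i given as formula (4) in claim 4, followed by softmax normalization and a weighted sum; the embeddings, the parameter values and the function names are hypothetical toy choices, not the trained model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(symptoms):
    """Average aggregation: every symptom contributes equally."""
    n = len(symptoms)
    return [sum(column) / n for column in zip(*symptoms)]


def attention_pool(symptoms, W_att, h):
    """Attention aggregation: score each symptom, normalize, weighted sum."""
    scores = []
    for s in symptoms:
        projected = [sum(w * x for w, x in zip(row, s)) for row in W_att]
        scores.append(sum(a * b for a, b in zip(h, projected)))
    weights = softmax(scores)
    dim = len(symptoms[0])
    pooled = [sum(w * s[k] for w, s in zip(weights, symptoms)) for k in range(dim)]
    return pooled, weights


# Hypothetical 2-dimensional symptom embeddings and attention parameters.
symptoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_att = [[1.0, 0.0], [0.0, 1.0]]
h = [2.0, 0.0]

syndrome_mean = mean_pool(symptoms)
syndrome_att, weights = attention_pool(symptoms, W_att, h)
```

With these toy values the second symptom receives a smaller weight than the other two, so the attention-pooled vector differs from the plain average, which weights all three symptoms equally.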
Table 3 shows how the aggregation function used in the neighbor information propagation and aggregation layer to update the embedded representation of an entity affects model performance. Three aggregators were studied: the GCN aggregator, the GraphSage aggregator and the Bi-Interaction aggregator. The table shows that the additional feature interaction of the Bi-Interaction aggregator during information aggregation improves the representation learning of the nodes, which demonstrates the rationality and effectiveness of the Bi-Interaction aggregator.
TABLE 3 Effect of different aggregators
Figure BDA0003365314500000091
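The difference between the three aggregators can be sketched as follows. The Bi-Interaction form uses the element-wise product and LeakyReLU named in claim 3; the GCN and GraphSage forms are written in their commonly used sum and concatenation variants, which is an assumption because the text does not spell them out. The weight matrices, the LeakyReLU slope and the input vectors are hypothetical toy values.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(v, slope=0.01):
    """Element-wise LeakyReLU; the slope is an assumed default."""
    return [x if x > 0 else slope * x for x in v]


def gcn_aggregator(W, e_h, e_nh):
    """Sum the entity and neighbor vectors, then transform."""
    return leaky_relu(matvec(W, [a + b for a, b in zip(e_h, e_nh)]))


def graphsage_aggregator(W, e_h, e_nh):
    """Concatenate the entity and neighbor vectors, then transform."""
    return leaky_relu(matvec(W, e_h + e_nh))


def bi_interaction_aggregator(W1, W2, e_h, e_nh):
    """Sum branch plus element-wise product branch."""
    added = leaky_relu(matvec(W1, [a + b for a, b in zip(e_h, e_nh)]))
    multiplied = leaky_relu(matvec(W2, [a * b for a, b in zip(e_h, e_nh)]))
    return [p + q for p, q in zip(added, multiplied)]


# Hypothetical toy inputs: entity embedding and its neighbor aggregate.
e_h = [1.0, -2.0]
e_nh = [0.5, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
concat_W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

gcn_out = gcn_aggregator(identity, e_h, e_nh)
sage_out = graphsage_aggregator(concat_W, e_h, e_nh)
bi_out = bi_interaction_aggregator(identity, identity, e_h, e_nh)
```

Only the Bi-Interaction variant contains the product term, so it is the only one of the three whose output depends on the feature-wise interaction between an entity and its neighborhood.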
FIG. 4 illustrates the effect of the depth of the neighbor information propagation and aggregation layer on model performance. The depth embodies the high-order connectivity of the knowledge graph and controls how far away an entity can aggregate information from. The figure shows that the model achieves the best results when the depth is 2, which indicates that second-order relationships between entities can effectively represent the complexity of herbs. As the depth increases further, noise may be introduced, reducing model performance.
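How depth bounds the neighborhood an entity can draw on may be illustrated with a short sketch that lists the entities reachable within a given number of hops. The edge list is a hypothetical toy graph, edges are treated as bidirectional (an assumption), and `receptive_field` is an illustrative helper rather than part of the described model.

```python
def receptive_field(edges, start, depth):
    """Return the entities reachable from `start` within `depth` hops."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    reached = {start}
    frontier = {start}
    for _ in range(depth):
        # Expand one hop and keep only entities not seen before.
        frontier = {n for node in frontier for n in neighbors.get(node, ())} - reached
        reached |= frontier
    return reached - {start}


# Hypothetical chain: symptom - herb - efficacy - herb - symptom.
edges = [
    ("cough", "ephedra"),
    ("ephedra", "sweating"),
    ("sweating", "cinnamon twig"),
    ("cinnamon twig", "chills"),
]

for depth in (1, 2, 3):
    print(depth, sorted(receptive_field(edges, "cough", depth)))
```

In this toy graph a depth of 2 lets a symptom see a herb and that herb's efficacy, while larger depths pull in progressively more distant entities, which is one way to picture the noise mentioned above.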
FIG. 5 illustrates the effect of the embedding dimension d of the entities in the knowledge graph on model performance. The experiments varied the embedding dimension over 32, 64, 128, 256 and 512. The figure shows that the model achieves the best performance when the embedding dimension is 256, which indicates that appropriately increasing the dimension represents the complex herbal information in the knowledge graph more fully.
Case verification:
In order to verify the rationality and validity of the prescription generation method proposed by the present invention, the proposed KGAPG model was tested on two real prescription cases. Given a group of symptoms, the model generates a set of herbs to treat these symptoms collectively. The two prescription generation cases are shown in Table 4. The herbs in bold in the table are those generated by the KGAPG model that also appear in the real herb set.
TABLE 4 Prescription generation cases
Figure BDA0003365314500000092
Figure BDA0003365314500000101
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; all equivalent substitutions or alterations made on the basis of the above technical solutions fall within the scope of the present invention.

Claims (4)

1. A traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning is characterized by sequentially comprising the following steps:
step 1, knowledge graph construction and initial embedding layer: taking herbs as the core, encapsulating the properties of herbs such as nature, flavor, meridian tropism and efficacy into triples, adding the treatment relationships between symptoms and herbs in the prescription data set to the knowledge graph, and finally forming the traditional Chinese medicine knowledge graph $\mathcal{G}$; initializing the embedded representation of each entity in the knowledge graph through a TransR model;
step 2, neighbor information propagation and aggregation layer: updating the embedded representation of each entity through the propagation and aggregation of high-order neighborhood information in the knowledge graph, and enriching the semantic relations of each entity in the traditional Chinese medicine knowledge graph;
step 3, syndrome induction and prediction layer: according to the entity embedded representations obtained in step 2, regarding the symptom combination corresponding to each prescription sample as a group, the group representing syndrome information in the theory of traditional Chinese medicine; learning the embedded representation of the group by using an attention mechanism; performing interactive learning between the group representation and the herb entities in the traditional Chinese medicine knowledge graph $\mathcal{G}$; and finally outputting the several herbs most suitable for the symptom combination to form a traditional Chinese medicine prescription.
2. The method for generating a prescription of traditional Chinese medicine based on knowledge-graph and group representation learning according to claim 1, wherein the step 1 is specifically as follows:
for each triple in the traditional Chinese medicine knowledge graph $\mathcal{G}$, the triple is denoted $(e_h, r, e_t)$, where $e_h$, $r$ and $e_t$ represent the head entity, the relation (for example hasEffect) and the tail entity of the knowledge graph respectively; first, the entities in the d-dimensional entity space are projected by the matrix $W_r \in \mathbb{R}^{k \times d}$ into the k-dimensional relation space of the relation $r$, giving the embedded representation $W_r e_h$ of entity $e_h$ and the embedded representation $W_r e_t$ of entity $e_t$ in the relation space; the translation principle $W_r e_h + e_r \approx W_r e_t$ is then optimized, where $e_r$ is the embedded representation of the relation $r$ in the k-dimensional relation space; in this way, the initial embedded representation of each entity in the traditional Chinese medicine knowledge graph $\mathcal{G}$ is finally obtained after training with the TransR model.
3. The method for generating a prescription of traditional Chinese medicine based on knowledge-graph and group representation learning according to claim 2, wherein the step 2 is specifically as follows:
step 21, denoting the initial embedding of an entity $e_h$ in the knowledge graph, obtained after the TransR embedding of step 1, as $e_h$; the other entities directly connected to entity $e_h$ are called the direct neighbors of the entity, and $e_{\mathcal{N}_h}$ denotes the aggregate representation of the neighbor entities of $e_h$, as shown in formula (1):
$$e_{\mathcal{N}_h} = \sum_{(e_h, r, e_t) \in \mathcal{N}_h} \tilde{\pi}(e_h, r, e_t)\, e_t \tag{1}$$
where $\mathcal{N}_h$ is the set of triples whose head entity is $e_h$, and $\tilde{\pi}(e_h, r, e_t)$ is the weight of the neighbor entity $e_t$ of $e_h$ in the aggregate representation, which can also be understood as the importance of the relation $r$ to the entity $e_h$; the weight depends on $e_h$ and $e_t$ in the space of the relation $r$, and its unnormalized form is defined as shown in formula (2):
$$\pi(e_h, r, e_t) = (W_r e_t)^{\top} \tanh(W_r e_h + e_r) \tag{2}$$
where $W_r \in \mathbb{R}^{k \times d}$ (d denotes the embedding dimension) is a trainable weight matrix and $e_r$ is the embedded representation of the relation $r$; finally, the weights are normalized by the softmax function to
$$\tilde{\pi}(e_h, r, e_t) = \frac{\exp(\pi(e_h, r, e_t))}{\sum_{(e_h, r', e_{t'}) \in \mathcal{N}_h} \exp(\pi(e_h, r', e_{t'}))}$$
step 22, after obtaining the neighbor aggregate representation $e_{\mathcal{N}_h}$ of entity $e_h$, using $e_{\mathcal{N}_h}$ to update the original embedding $e_h$ of the entity; the embedding of entity $e_h$ after the direct neighbor information has been aggregated and updated is expressed as $e_h^{(1)} = f_{agg}(e_h, e_{\mathcal{N}_h})$, where $f_{agg}(\cdot)$ is an aggregation function defined as shown in formula (3):
$$f_{agg}(e_h, e_{\mathcal{N}_h}) = \mathrm{LeakyReLU}\big(W_1 (e_h + e_{\mathcal{N}_h})\big) + \mathrm{LeakyReLU}\big(W_2 (e_h \odot e_{\mathcal{N}_h})\big) \tag{3}$$
where $W_1, W_2 \in \mathbb{R}^{d \times d}$ (d denotes the embedding dimension) are trainable weight matrices, $\odot$ denotes the element-wise product, and LeakyReLU is an activation function;
step 23, on the basis of step 21 and step 22, stacking further propagation layers to obtain the high-order neighbor aggregate representation of each entity: the entity embedded representation is updated recursively in the l-th layer of the network, and information propagates node by node through the knowledge graph; after propagation through l layers, the final representation of entity $e_h$ in the l-th layer contains the information of the high-order neighbor entities that $e_h$ can reach within l steps.
4. The method for generating a prescription of traditional Chinese medicine based on knowledge-graph and group representation learning according to claim 3, wherein the step 3 is specifically as follows:
step 31, defining the multiple symptoms of each prescription sample as a group $S_p = \{ s_i \mid s_i \in S \}$, where $s_i$ represents the i-th symptom and $S$ represents the set of all symptoms in the data set; the syndrome induction process uses an attention mechanism to learn the influence of the different symptoms in each symptom group on the group, namely the weight of each symptom within a symptom group; the weight is learned by an attention network, and the weight $\alpha(S_p, s_i)$ of each symptom $s_i$ in the symptom set $S_p$ is defined as shown in formula (4):
$$\alpha(S_p, s_i) = h^{\top} W_{att}\, s_i \tag{4}$$
where $W_{att}$ and $h$ are trainable parameters and $s_i$ is the embedded representation of the symptom $s_i$ obtained through step 2; after the weight of each symptom in a symptom set has been obtained, the weights are normalized by a softmax function to give the influence score $\tilde{\alpha}(S_p, s_i)$ of each symptom in the set, as shown in formula (5):
$$\tilde{\alpha}(S_p, s_i) = \frac{\exp(\alpha(S_p, s_i))}{\sum_{s_j \in S_p} \exp(\alpha(S_p, s_j))} \tag{5}$$
step 32, on the basis of step 31, the representation of the symptom group $S_p$, that is, the representation $s_d$ of the potential syndrome underlying each symptom combination, is obtained as shown in formula (6):
$$s_d = \sum_{s_i \in S_p} \tilde{\alpha}(S_p, s_i)\, s_i \tag{6}$$
then, by virtue of the nonlinear processing capability of a single-layer MLP, a more expressive syndrome representation is learned, as defined by formula (7):
$$s_d = \mathrm{ReLU}(W_{mlp} \cdot s_d + b_{mlp}) \tag{7}$$
where $W_{mlp}$ and $b_{mlp}$ are learnable parameters and ReLU is an activation function; thus, by means of the attention-based symptom aggregation method, the potential syndrome representation of each prescription sample is obtained;
step 33, the potential syndrome of each prescription sample obtained by the above steps is interacted with the herbs to predict the likelihood that each herb is suitable for treating the symptom group; the prediction score is calculated using the inner product, as shown in formula (8):
$$\hat{y}(S_p, h) = s_d^{\top} h \tag{8}$$
where $\hat{y}(S_p, h)$ indicates the probability that herb $h$ is suitable for treating the symptom group $S_p$, that is, the potential syndrome $s_d$, and $h$ is the embedded representation of the herb entity in the knowledge graph; finally, the top N herbs with the highest probability scores are output as the prescription applicable to the input symptom combination.
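The final prediction stage of claim 4 can be sketched in a few lines. The sketch assumes a syndrome vector is already available, applies the single-layer MLP with ReLU of formula (7), scores each herb by the inner product described for formula (8), and keeps the N highest-scoring herbs. All numeric values, the herb names `herb_b` and `herb_c`, and the function names are hypothetical.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def refine_syndrome(s_d, W_mlp, b_mlp):
    """Single-layer MLP with ReLU applied to the syndrome vector."""
    pre_activation = [z + b for z, b in zip(matvec(W_mlp, s_d), b_mlp)]
    return [max(0.0, x) for x in pre_activation]


def score_herbs(s_d, herb_embeddings):
    """Inner product between the syndrome vector and each herb embedding."""
    return {
        name: sum(a * b for a, b in zip(s_d, embedding))
        for name, embedding in herb_embeddings.items()
    }


def top_n(scores, n):
    """Names of the n herbs with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy values; a trained model would supply these.
s_d = [0.5, -1.0]
W_mlp = [[1.0, 0.0], [0.0, 1.0]]
b_mlp = [0.5, 0.5]
herbs = {"ephedra": [1.0, 0.0], "herb_b": [0.0, 1.0], "herb_c": [0.5, 0.5]}

refined = refine_syndrome(s_d, W_mlp, b_mlp)
scores = score_herbs(refined, herbs)
prescription = top_n(scores, 2)
print(prescription)
```

Because the score is a plain inner product, ranking all herbs for one symptom group costs one pass over the herb embeddings.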
CN202111402132.3A 2021-11-19 2021-11-19 Traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning Active CN114121212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111402132.3A CN114121212B (en) 2021-11-19 2021-11-19 Traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning

Publications (2)

Publication Number Publication Date
CN114121212A true CN114121212A (en) 2022-03-01
CN114121212B CN114121212B (en) 2024-04-02

Family

ID=80371712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111402132.3A Active CN114121212B (en) 2021-11-19 2021-11-19 Traditional Chinese medicine prescription generation method based on knowledge graph and group representation learning

Country Status (1)

Country Link
CN (1) CN114121212B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334211A (en) * 2019-06-14 2019-10-15 电子科技大学 A kind of Chinese medicine diagnosis and treatment knowledge mapping method for auto constructing based on deep learning
CN112131399A (en) * 2020-09-04 2020-12-25 牛张明 Old medicine new use analysis method and system based on knowledge graph
WO2021139247A1 (en) * 2020-08-06 2021-07-15 平安科技(深圳)有限公司 Construction method, apparatus and device for medical domain knowledge map, and storage medium
WO2021189971A1 (en) * 2020-10-26 2021-09-30 平安科技(深圳)有限公司 Medical plan recommendation system and method based on knowledge graph representation learning
CN113539412A (en) * 2021-07-19 2021-10-22 闽江学院 Chinese herbal medicine recommendation system based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631612A (en) * 2023-06-09 2023-08-22 广东工业大学 Graph convolution herbal medicine recommendation method and computer based on multi-graph fusion
CN116631612B (en) * 2023-06-09 2024-03-19 广东工业大学 Graph convolution herbal medicine recommendation method and computer based on multi-graph fusion

Also Published As

Publication number Publication date
CN114121212B (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant