CN115050481A - Traditional Chinese medicine prescription efficacy prediction method based on graph convolution neural network - Google Patents


Info

Publication number
CN115050481A
Authority
CN
China
Prior art keywords
chinese medicine
prescription
formula
traditional chinese
efficacy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210683366.8A
Other languages
Chinese (zh)
Other versions
CN115050481B (en)
Inventor
丁长松
刘佳俊
李力松
黄辛迪
梁杨
黄群富
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Chinese Medicine
Original Assignee
Hunan University of Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Chinese Medicine
Priority to CN202210683366.8A
Publication of CN115050481A
Application granted
Publication of CN115050481B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/90 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to alternative medicines, e.g. homeopathy or oriental medicines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Pathology (AREA)
  • Toxicology (AREA)
  • Alternative & Traditional Medicine (AREA)
  • Animal Behavior & Ethology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network, comprising the following steps: S1, a data preprocessing module preprocesses the traditional Chinese medicine prescription data using natural language processing techniques and constructs a Chinese medicine database and a prescription database that meet data mining requirements; S2, a prescription graph construction module represents the prescription text data as a graph and constructs a heterogeneous prescription graph containing Chinese medicine nodes and prescription nodes; S3, a node representation learning module learns embedded representations of the nodes in the heterogeneous prescription graph through a graph convolutional network for use by downstream tasks; and S4, a prediction module uses the final node embeddings, obtained by training on Chinese medicine prescription samples, to obtain the probability distribution of each prescription over the efficacy labels.

Description

Traditional Chinese medicine prescription efficacy prediction method based on graph convolution neural network
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network.
Background
The traditional Chinese medicine prescription (formula) is the main means by which traditional Chinese medicine applies Chinese medicines to prevent and treat disease. It serves as a bridge between the fundamentals of traditional Chinese medicine and clinical practice, and has always been a focus of both theoretical and clinical research. The efficacy of a prescription is a summary of its clinical effects and characteristics, made on the basis of traditional Chinese medicine theory and compatibility theory. Accurately grasping the efficacy of Chinese medicines after they are combined is key to the clinical diagnosis and treatment effect. Traditionally, determining the efficacy of a prescription requires considering the composition of the medicines, the dosage ratios, syndrome differentiation and so on, which consumes a great deal of manpower and material resources. If the efficacy of a prescription could be predicted by modern scientific techniques, the result would provide a valuable reference for subsequent clinical practice and would also benefit the inheritance and innovation of traditional Chinese medicine theory.
The continuous emergence of intelligent information technology brings new opportunities for research on the modernization of traditional Chinese medicine, and the deep integration of artificial intelligence with traditional Chinese medicine allows it to be better inherited, developed and innovated. Existing methods for analyzing prescription efficacy with artificial intelligence can be roughly divided into two categories. (1) Topic-model-based methods: Wang et al. treat Chinese medicines as "words", prescriptions as "documents", and efficacies as "topics". They use a topic model to mine latent relations between efficacy labels and Chinese medicines, propose a novel supervised topic model for prescription efficacy prediction, and incorporate the results of mining prescription compatibility rules into the learning process. However, as a typical bag-of-words model, the topic model is severely limited on short text data such as Chinese medicine prescriptions, and it struggles to capture the complex correlation between medicine composition and prescription efficacy. (2) Neural-network-based methods: Cheng et al. propose an improved deep learning model consisting of a bidirectional long short-term memory network and a convolutional neural network, which predicts the efficacy of a prescription by building multiple binary classifiers. However, this method does not consider how changes in dosage within a combination affect the overall efficacy, and therefore has certain limitations.
As an important technique for propagating and aggregating information between nodes, the graph convolutional network (GCN) can effectively apply deep learning to unstructured data, and has achieved notable results in biomedicine, traffic and logistics, e-commerce, mobile social networking and other fields. When a prescription is composed, its monarch, minister, assistant and guide structure is considered, as are the compatibility of the medicines and their dosage proportions. Many factors are involved, including the prescription, the Chinese medicines, their properties, their dosages and the efficacy; among these, the prescription, the Chinese medicines, their properties and the efficacy exhibit an obvious spatial topological structure. On this basis, and having summarized the shortcomings of existing methods, the invention constructs all prescription data into a single heterogeneous graph, introduces Chinese medicine attributes and common dosage ranges as external knowledge, and provides a GCN-based prescription efficacy prediction method, contributing to the modernization of traditional Chinese medicine.
Disclosure of Invention
The invention aims to provide a method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network, which fully learns node embeddings of Chinese medicines and prescriptions through graph convolution and mines the relationship between medicine composition and prescription efficacy. To solve the above problems, the technical scheme is as follows:
The invention discloses a method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network, which comprises the following steps:
S1, a data preprocessing module preprocesses the traditional Chinese medicine prescription data using natural language processing techniques and constructs a Chinese medicine database and a prescription database that meet data mining requirements;
S2, a prescription graph construction module represents the prescription text data as a graph and constructs a heterogeneous prescription graph containing Chinese medicine nodes and prescription nodes;
S3, a node representation learning module learns embedded representations of the nodes in the heterogeneous prescription graph through a graph convolutional network for use by downstream tasks;
and S4, a prediction module uses the final node embeddings, obtained by training on Chinese medicine prescription samples, to obtain the probability distribution of each prescription over the efficacy labels.
Further, the Chinese medicine database comprises the medicine name, aliases, nature and taste, meridian tropism, toxicity and common dosage range; the nature and taste comprise cold, hot, warm, cool, neutral, sour, bitter, sweet, pungent and salty; the meridian tropism comprises lung, pericardium, heart, large intestine, triple energizer, small intestine, stomach, gallbladder, bladder, spleen, liver and kidney. The prescription database comprises the prescription name, the constituent Chinese medicines, the actual dosages and the efficacy information. The efficacy information comprises dryness-treating formulas, tonifying formulas, blood-regulating formulas, wind-dispelling formulas, phlegm-eliminating formulas, carbuncle and ulcer formulas, dampness-eliminating formulas, qi-regulating formulas, interior-warming formulas, harmonizing formulas, heat-clearing formulas, emergency treatment formulas, nerve-calming formulas, eyesight-improving formulas, purgative formulas, resuscitation-inducing formulas, digestion-promoting formulas, astringent formulas and exterior-relieving formulas.
Further, in S1, preprocessing the prescription data comprises:
S101, standardizing Chinese medicine names: replacing synonyms of Chinese medicine names in the prescriptions with standard names through string matching;
S102, converting dosages to modern units: splitting the name and the dosage of each Chinese medicine using word segmentation, and converting the different dosage units used in the prescriptions into the modern dosage unit, grams.
Further, in S2, constructing the heterogeneous prescription graph comprises:
S201, quantitative representation of Chinese medicines: converting the abstract descriptions of Chinese medicine attributes into specific numerical values that form the Chinese medicine attribute vector
$x \in \mathbb{R}^{d}$,
where d represents the number of Chinese medicine attributes. The attributes comprise nature, taste, meridian tropism and toxicity. Because the nature, taste and toxicity attributes carry modifiers describing degree, they are quantified on an exponential scale: $2^{-1}$, $2^{0}$ and $2^{1}$ represent "slightly cold", "cold" and "very cold", respectively. The meridian tropism attribute is quantified in binary form: 1 indicates that the attribute is present and 0 that it is absent;
S202, quantitative representation of prescriptions: matching the Chinese medicines contained in a prescription with their corresponding attribute vectors, and summing these vectors to obtain the prescription attribute vector
Figure BDA0003699159400000032
S203, normalizing the dosage of each Chinese medicine according to the following relative dose conversion formula:
Figure BDA0003699159400000033
wherein
Figure BDA0003699159400000034
represents the relative dosage of Chinese medicine i in prescription j, $d_{ij}$ represents the actual dosage of Chinese medicine i in prescription j, and $d_{min}$ and $d_{max}$ represent the minimum and maximum of the common dosage range of that Chinese medicine, respectively;
S204, calculating the similarity between Chinese medicines according to the following formula:
Figure BDA0003699159400000035
wherein $x_a$ and $x_b$ represent the attribute vectors of Chinese medicine a and Chinese medicine b, respectively;
S205, constructing the heterogeneous prescription graph FH = (V, E), where V is the set of nodes and E is the set of edges in the graph, with feature matrix
$X \in \mathbb{R}^{n \times d}$
and corresponding adjacency matrix
$A \in \mathbb{R}^{n \times n}$,
where n denotes the number of nodes with features and d denotes the node feature dimension. FH is an undirected graph, and its adjacency matrix is defined as follows:
Figure BDA0003699159400000038
wherein β is the threshold for creating an edge between two Chinese medicine nodes: when the similarity of two Chinese medicines is greater than β, an edge is established between them; otherwise, no edge is established.
Further, in S3, the embedded representations of the nodes in the heterogeneous prescription graph are learned through a graph convolutional network. A single graph convolution layer captures information from immediate neighbors, and stacking multiple layers captures higher-order neighborhood information;
the node attribute matrix
$X \in \mathbb{R}^{n \times d}$
is input to the first layer, and the adjacency matrix A of FH is obtained from $A_{i,j}$ and $A_{j,i}$. The prescription nodes and Chinese medicine nodes are updated by aggregating information from their neighboring nodes. For a one-layer graph convolutional network, the k-dimensional node matrix
$H^{(1)} \in \mathbb{R}^{n \times k}$
is formalized as:
$H^{(1)} = \rho(\tilde{A} X W_0)$
wherein
$\tilde{A}$
is the normalized representation of the adjacency matrix A,
$W_0$
represents the learnable parameters, and ρ is an activation function;
the multi-layer graph convolutional network is represented as:
$H^{(l+1)} = \rho(\tilde{A} H^{(l)} W_l)$
wherein l represents the layer index, and $H^{(0)} = X$.
Further, in S4, the prescription efficacy prediction module is implemented as follows:
for each Chinese medicine prescription, the distance between the predicted efficacy and the true efficacy is evaluated. The output dimension of the last graph convolution layer is set equal to the number of efficacy labels, yielding the final embedded representation of the prescription node
Figure BDA0003699159400000047
which is fed into a Sigmoid classifier to obtain the probability distribution of the prescription over all efficacies
Figure BDA0003699159400000048
wherein C represents the number of efficacy labels, and
Figure BDA0003699159400000049
is calculated as follows:
Figure BDA00036991594000000410
Further, in S4, the prescription efficacy prediction uses the multi-label cross-entropy loss as the loss function:
Figure BDA00036991594000000411
where n represents the number of training samples, $y^{(i)} \in \{0, 1\}$ represents the true efficacy label, and
Figure BDA00036991594000000412
represents the predicted value.
The method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network provided by the invention has the following beneficial effects:
The method combines relevant theories of traditional Chinese medicine to represent Chinese medicines and prescriptions quantitatively, which facilitates the mining of traditional Chinese medicine information.
The method takes into account the influence of dosage on prescription efficacy and normalizes the dosage of each Chinese medicine through the relative dose conversion formula, which facilitates dosage comparison among different Chinese medicines.
The method represents prescription text data as a graph that contains medicine composition, dosage and Chinese medicine attribute information, which facilitates learning information-rich node embeddings.
The method applies the graph convolutional neural network, a cutting-edge technique in the field of artificial intelligence, to the prescription efficacy prediction scenario. It considers both the relationships between prescriptions and Chinese medicines and the relationships among Chinese medicines, and is therefore technically innovative to a certain degree.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general flow chart of the method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network according to the present invention.
FIG. 2 is a framework diagram of the graph convolutional neural network efficacy prediction model according to the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features, and advantages of the present invention more comprehensible, specific embodiments of the present invention are described below with reference to the accompanying drawings.
It should be noted that the description of the embodiments is provided to help understanding of the present invention, and is not intended to limit the present invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to FIG. 1 and FIG. 2, the method for predicting the efficacy of traditional Chinese medicine prescriptions based on a graph convolutional neural network of this embodiment comprises the following steps:
S1, a data preprocessing module preprocesses the traditional Chinese medicine prescription data using natural language processing techniques and constructs a Chinese medicine database and a prescription database that meet data mining requirements;
S2, a prescription graph construction module represents the prescription text data as a graph and constructs a heterogeneous prescription graph containing Chinese medicine nodes and prescription nodes;
S3, a node representation learning module learns embedded representations of the nodes in the heterogeneous prescription graph through a graph convolutional network for use by downstream tasks;
and S4, a prediction module uses the final node embeddings, obtained by training on Chinese medicine prescription samples, to obtain the probability distribution of each prescription over the efficacy labels.
As a preferred embodiment, the Chinese medicine database comprises the medicine name, aliases, nature and taste, meridian tropism, toxicity and common dosage range; the nature and taste comprise cold, hot, warm, cool, neutral, sour, bitter, sweet, pungent and salty, and the meridian tropism comprises lung, pericardium, heart, large intestine, triple energizer, small intestine, stomach, gallbladder, bladder, spleen, liver and kidney. The prescription database comprises the prescription name, the constituent Chinese medicines, the actual dosages and the efficacy information. The efficacy information comprises dryness-treating formulas, tonifying formulas, blood-regulating formulas, wind-dispelling formulas, phlegm-eliminating formulas, carbuncle and ulcer formulas, dampness-eliminating formulas, qi-regulating formulas, interior-warming formulas, harmonizing formulas, heat-clearing formulas, emergency treatment formulas, nerve-calming formulas, eyesight-improving formulas, purgative formulas, resuscitation-inducing formulas, digestion-promoting formulas, astringent formulas and exterior-relieving formulas.
Examples of Chinese medicine prescription data are shown in Table 1, and examples of Chinese medicine data are shown in Table 2.
Table 1 Examples of Chinese medicine prescription data
Figure BDA0003699159400000061
Table 2 Examples of Chinese medicine data
Figure BDA0003699159400000062
Preferably, in S1, preprocessing the prescription data comprises:
S101, standardizing Chinese medicine names: replacing synonyms of Chinese medicine names in the prescriptions with standard names through string matching;
S102, converting dosages to modern units: splitting the name and the dosage of each Chinese medicine using word segmentation, and converting the different dosage units used in the prescriptions into the modern dosage unit, grams.
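Steps S101 and S102 can be illustrated with a minimal Python sketch. The alias table, the unit conversion factors and the helper names (`ALIASES`, `UNIT_TO_GRAM`, `parse_item`) are hypothetical and chosen only for illustration; the patent does not list them, and a regular expression stands in here for the word segmentation technique it mentions.

```python
import re

# Hypothetical alias table (romanized names): alias -> standard name.
ALIASES = {"guolao": "gancao", "yunling": "fuling"}

# Illustrative unit-to-gram factors. These values are assumptions; the
# appropriate factors depend on the era and source of each prescription.
UNIT_TO_GRAM = {"g": 1.0, "qian": 3.0, "liang": 30.0}

# name, then a number, then a unit, with optional whitespace in between
ITEM = re.compile(r"^(?P<name>[^\d\s]+)\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>[^\d\s]+)$")


def normalize_name(name):
    """S101: replace a known alias with the standard medicine name."""
    return ALIASES.get(name, name)


def parse_item(text):
    """S102: split 'name amount unit' and convert the dosage to grams."""
    match = ITEM.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse prescription item: {text!r}")
    unit = match.group("unit")
    if unit not in UNIT_TO_GRAM:
        raise ValueError(f"unknown dosage unit: {unit!r}")
    grams = float(match.group("amount")) * UNIT_TO_GRAM[unit]
    return normalize_name(match.group("name")), grams
```

Under these assumed tables, `parse_item("yunling 3 qian")` yields the standard name `fuling` with a dosage in grams.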
Preferably, in S2, constructing the heterogeneous prescription graph comprises:
S201, quantitative representation of Chinese medicines: converting the abstract descriptions of Chinese medicine attributes into specific numerical values that form the Chinese medicine attribute vector
$x \in \mathbb{R}^{d}$,
where d represents the number of Chinese medicine attributes. The attributes comprise nature, taste, meridian tropism and toxicity. Nature, taste and toxicity are quantified on an exponential scale: $2^{-1}$, $2^{0}$ and $2^{1}$ represent "slightly cold", "cold" and "very cold", respectively. The meridian tropism attribute is quantified in binary form: 1 indicates that the attribute is present and 0 that it is absent. The quantitative representation of Chinese medicines is shown in Table 3. For example, ginseng is slightly warm in nature, slightly bitter and sweet in taste, enters the lung, heart, spleen and kidney meridians, and is nontoxic, so its quantitative representation is:
$x_{\text{ginseng}} = (0,0,0.5,0,0,0,0.5,1,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0)$.
Table 3 Quantitative representation of Chinese medicines
Figure BDA0003699159400000072
S202, quantitative representation of prescriptions: matching the Chinese medicines contained in a prescription with their corresponding attribute vectors, and summing these vectors to obtain the prescription attribute vector
Figure BDA0003699159400000073
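The quantification in S201 and the summation in S202 can be sketched as follows. The attribute ordering (five natures, five tastes, twelve meridians, one toxicity slot) is inferred from the 23-dimensional ginseng example above rather than stated explicitly, and the function names and degree keys are hypothetical.

```python
NATURES = ["cold", "hot", "warm", "cool", "neutral"]
TASTES = ["sour", "bitter", "sweet", "pungent", "salty"]
MERIDIANS = ["lung", "pericardium", "heart", "large intestine",
             "triple energizer", "small intestine", "stomach",
             "gallbladder", "bladder", "spleen", "liver", "kidney"]

# Exponential scale for degree modifiers: 2^-1, 2^0, 2^1.
DEGREE = {"slightly": 2.0 ** -1, "plain": 2.0 ** 0, "very": 2.0 ** 1}


def quantify(nature, taste, meridians, toxicity=None):
    """Build an attribute vector.

    nature and taste map an attribute name to a degree key;
    meridians is a list of names; toxicity is a degree key or None.
    """
    vec = [0.0] * (len(NATURES) + len(TASTES) + len(MERIDIANS) + 1)
    for name, degree in nature.items():
        vec[NATURES.index(name)] = DEGREE[degree]
    offset = len(NATURES)
    for name, degree in taste.items():
        vec[offset + TASTES.index(name)] = DEGREE[degree]
    offset += len(TASTES)
    for name in meridians:
        vec[offset + MERIDIANS.index(name)] = 1.0  # binary: present
    if toxicity is not None:
        vec[-1] = DEGREE[toxicity]
    return vec


def prescription_vector(herb_vectors):
    """S202: element-wise sum of the attribute vectors of the medicines."""
    return [sum(column) for column in zip(*herb_vectors)]


ginseng = quantify(
    nature={"warm": "slightly"},
    taste={"bitter": "slightly", "sweet": "plain"},
    meridians=["lung", "heart", "spleen", "kidney"],
)
```

With this ordering, `ginseng` reproduces the vector given in the text.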
S203, normalizing the dosage of each Chinese medicine according to the following relative dose conversion formula:
Figure BDA0003699159400000074
wherein
Figure BDA0003699159400000075
represents the relative dosage of Chinese medicine i in prescription j, $d_{ij}$ represents the actual dosage of Chinese medicine i in prescription j, and $d_{min}$ and $d_{max}$ represent the minimum and maximum of the common dosage range of that Chinese medicine, respectively. Taking Wuling San as an example, the relative dosage conversion results for each Chinese medicine in the prescription are shown in Table 4:
Table 4 Example of relative dose conversion
Figure BDA0003699159400000076
Figure BDA0003699159400000081
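Because the relative dose formula survives here only as an image placeholder, the following sketch assumes a min-max style normalization over the common dosage range. This is one plausible reading of the symbols $d_{ij}$, $d_{min}$ and $d_{max}$, not a confirmed reproduction of the patent's formula, and the function name and example numbers are hypothetical.

```python
def relative_dose(d_ij, d_min, d_max):
    """Assumed form: position of the actual dose within the common range.

    Returns 0.0 at the lower bound and 1.0 at the upper bound; doses
    outside the common range fall below 0 or above 1.
    """
    if d_max <= d_min:
        raise ValueError("d_max must be greater than d_min")
    return (d_ij - d_min) / (d_max - d_min)


# Hypothetical medicine with a common range of 3 g to 15 g, dosed at 9 g.
example = relative_dose(9.0, 3.0, 15.0)
```

Whatever the exact formula, the design intent stated in the text is the same: dosages of different Chinese medicines become comparable once each is expressed relative to its own common range.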
S204, calculating the similarity between Chinese medicines according to the following formula:
Figure BDA0003699159400000082
wherein $x_a$ and $x_b$ represent the attribute vectors of Chinese medicine a and Chinese medicine b, respectively;
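The similarity formula is likewise not reproduced in this text. A minimal sketch, assuming cosine similarity between the two attribute vectors (a common choice for non-negative feature vectors, but an assumption here), might look like this:

```python
import numpy as np


def cosine_similarity(x_a, x_b):
    """Cosine of the angle between two attribute vectors (assumed measure)."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    denom = np.linalg.norm(x_a) * np.linalg.norm(x_b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(x_a @ x_b / denom)
```

For the non-negative attribute vectors of S201 this value lies between 0 and 1, which makes it straightforward to compare against a fixed threshold such as β.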
S205, constructing the heterogeneous prescription graph FH = (V, E), where V is the set of nodes and E is the set of edges in the graph, with feature matrix
$X \in \mathbb{R}^{n \times d}$
and corresponding adjacency matrix
$A \in \mathbb{R}^{n \times n}$,
where n denotes the number of nodes with features and d denotes the node feature dimension. FH is an undirected graph, and its adjacency matrix is defined as follows:
Figure BDA0003699159400000085
wherein β is the threshold for creating an edge between two Chinese medicine nodes: when the similarity of two Chinese medicines is greater than β, an edge is established between them; otherwise, no edge is established.
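One way to assemble such an adjacency matrix is sketched below. Only the β threshold rule for edges between Chinese medicines is stated in the text; the remaining choices are assumptions made for illustration: prescription nodes are ordered before Chinese medicine nodes, prescription-to-medicine edges are weighted by relative dose, retained medicine-to-medicine edges are weighted by their similarity, and every node has a self-loop of weight 1.

```python
import numpy as np


def build_adjacency(rel_dose, herb_sim, beta=0.6):
    """Build a symmetric adjacency matrix for the heterogeneous graph.

    rel_dose: (p, h) array, relative dose of medicine i in prescription j
              (0 where the medicine is absent). Assumed edge weight.
    herb_sim: (h, h) symmetric array of medicine-medicine similarities.
    beta:     similarity threshold for medicine-medicine edges.
    """
    rel_dose = np.asarray(rel_dose, dtype=float)
    herb_sim = np.asarray(herb_sim, dtype=float)
    p, h = rel_dose.shape
    A = np.zeros((p + h, p + h))
    A[:p, p:] = rel_dose            # prescription -> medicine
    A[p:, :p] = rel_dose.T          # medicine -> prescription (undirected)
    A[p:, p:] = np.where(herb_sim > beta, herb_sim, 0.0)
    np.fill_diagonal(A, 1.0)        # assumed self-loops
    return A
```

Because both off-diagonal blocks come from the same `rel_dose` array and `herb_sim` is symmetric, the result is symmetric, matching the statement that FH is undirected.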
Preferably, in S3, the embedded representations of the nodes in the heterogeneous prescription graph are learned through a graph convolutional network. A single graph convolution layer captures information from immediate neighbors, and stacking multiple layers captures higher-order neighborhood information;
the node attribute matrix
$X \in \mathbb{R}^{n \times d}$
is input to the first layer, and the adjacency matrix A of FH is obtained from $A_{i,j}$ and $A_{j,i}$. The prescription nodes and Chinese medicine nodes are updated by aggregating information from their neighboring nodes. For a one-layer graph convolutional network, the k-dimensional node matrix
$H^{(1)} \in \mathbb{R}^{n \times k}$
is formalized as:
$H^{(1)} = \rho(\tilde{A} X W_0)$
wherein
$\tilde{A}$
is the normalized representation of the adjacency matrix A,
$W_0$
represents the learnable parameters, and ρ is an activation function;
the multi-layer graph convolutional network is represented as:
$H^{(l+1)} = \rho(\tilde{A} H^{(l)} W_l)$
wherein l represents the layer index, and $H^{(0)} = X$.
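The layer-wise propagation described above, in which each layer multiplies the normalized adjacency matrix by the previous layer's node matrix and a learnable weight matrix and then applies an activation, can be sketched in NumPy. The symmetric normalization $D^{-1/2} A D^{-1/2}$ and the ReLU activation are assumptions (both are common in graph convolutional networks, but the text only says "normalized" and "an activation function"). As a further design choice of this sketch, the last layer returns raw scores so that the Sigmoid of S4 can be applied afterwards.

```python
import numpy as np


def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} A D^{-1/2}, D being the degree matrix."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def gcn_forward(A, X, weights, activation=relu):
    """Stacked graph convolution: H <- activation(A_norm @ H @ W) per layer.

    The activation is skipped on the last layer (a choice of this sketch).
    """
    A_norm = normalize_adjacency(A)
    H = np.asarray(X, dtype=float)
    for index, W in enumerate(weights):
        H = A_norm @ H @ W
        if index < len(weights) - 1:
            H = activation(H)
    return H
```

In use, `weights` would hold one matrix per layer, for example shapes `(d, k)` and `(k, C)` for a two-layer network whose output dimension equals the number of efficacy labels.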
Preferably, in S4, the prescription efficacy prediction module is implemented as follows:
for each Chinese medicine prescription, the distance between the predicted efficacy and the true efficacy is evaluated. The output dimension of the last graph convolution layer is set equal to the number of efficacy labels, yielding the final embedded representation of the prescription node
Figure BDA00036991594000000812
which is fed into a Sigmoid classifier to obtain the probability distribution of the prescription over all efficacies
Figure BDA0003699159400000091
wherein C represents the number of efficacy labels, and
Figure BDA0003699159400000092
is calculated as follows:
Figure BDA0003699159400000093
preferably, in S4, the efficacy prediction of the chinese prescription uses a multi-label cross-entropy loss function as the loss function:
Figure BDA0003699159400000094
where n represents the number of training set samples, y (i) E 0, 1 represents the true efficacy label,
Figure BDA0003699159400000095
indicating the predicted value.
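The Sigmoid classifier and the multi-label cross-entropy loss can be sketched as follows. The exact loss expression is given only as an image in the source; this sketch assumes the common form in which the binary cross-entropy is summed over the C labels and averaged over the n samples. The function names and the clipping constant `eps` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def multilabel_cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy summed over labels and averaged over samples.

    y_true and y_prob have shape (n, C). The normalization is an ASSUMPTION,
    since the source gives the formula only as an image.
    """
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(y_prob, eps, 1.0 - eps)        # avoid log(0)
    per_label = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(per_label.sum(axis=1).mean())
```

Each label is scored independently, so a prescription may receive high probability for several efficacies at once, which is what distinguishes this setting from single-label softmax classification.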
To verify the effectiveness of the traditional Chinese medicine prescription efficacy prediction method based on the graph convolution neural network, experiments were carried out on 2274 prescriptions containing 539 traditional Chinese medicines; basic statistics of the prescription data set are shown in Table 5. The prescription data were randomly divided into a training set and a test set at a ratio of 8:2, and 20% of the training set was randomly selected as the validation set. The prediction model was built with a two-layer GCN, with the node embedding dimension of the first GCN layer set to 800 and that of the second GCN layer set to 19. β was set to 0.6 and the learning rate to 0.01; the model was trained with the Adam optimizer for at most 500 epochs, and an early-stopping mechanism was used to prevent overfitting: training stops if the validation loss does not decrease for 10 consecutive epochs.
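The data split and the early-stopping rule described above can be sketched in plain Python. The function names, the fixed random seed and the rounding of set sizes are assumptions of this sketch; the ratios, the patience of 10 and the cap of 500 epochs come from the text.

```python
import random

def split_indices(n, test_ratio=0.2, val_ratio=0.2, seed=0):
    """Randomly split n sample indices into train / validation / test.

    The test set is carved out first (8:2), then val_ratio of the remaining
    training data becomes the validation set. Rounding is an assumption.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    test, train_full = idx[:n_test], idx[n_test:]
    n_val = round(len(train_full) * val_ratio)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

def early_stopping_epoch(val_losses, patience=10, max_epochs=500):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs, or when epochs run out.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)
```

Under these rounding assumptions, 2274 prescriptions yield 1455 training, 364 validation and 455 test samples.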
TABLE 5 Basic information statistics of the prescription data set
Statistic            Efficacy   Chinese medicine
Minimum number       1          1
Maximum number       7          50
Mean number          2.67       9.66
Standard deviation   1.04       3.81
In the invention, three commonly used multi-label evaluation metrics are selected to measure model performance, namely Precision (P), F1 score (F1-score, F1) and Hamming Loss (HL), calculated as follows:
Figure BDA0003699159400000096
Figure BDA0003699159400000101
Figure BDA0003699159400000102
wherein m represents the number of samples in the test set, C represents the number of efficacy labels, and
Figure BDA0003699159400000103
represents the j-th label of the i-th sample. In general, the higher the Precision and F1-score and the lower the Hamming Loss, the better the prediction performance of the model.
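The three metrics can be sketched in NumPy. Hamming loss has a standard definition (the fraction of the m×C label slots predicted incorrectly). For Precision and F1, the source formulas are available only as images, so micro-averaging over all label slots is used here as an assumption; the function names are hypothetical.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of the m x C label slots that are predicted incorrectly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def micro_precision_f1(y_true, y_pred):
    """Micro-averaged precision and F1 over all samples and labels.

    Micro-averaging is an ASSUMPTION; the source formulas are images.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return float(precision), float(f1)
```

The predictions passed to these functions are 0/1 matrices, obtained for example by thresholding the Sigmoid probabilities; the threshold value is not stated in the source.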
Comparative experiments were performed using the following models:
Multilayer Perceptron (MLP): information is passed between layers through full connections, and complex data such as nonlinear, imbalanced and small-sample data can be processed.
Long Short-Term Memory (LSTM): a recurrent neural network that extracts text features along the sequence, alleviates the vanishing-gradient and exploding-gradient problems of ordinary recurrent neural networks, and has high computational efficiency.
Bi-directional Long Short-Term Memory (Bi-LSTM): composed of a forward LSTM and a backward LSTM, it overcomes the inability of the conventional LSTM, which processes the sequence in one direction only, to capture context information from both sides, and is often used for classification tasks.
To minimize the effect of randomness on the experimental results, 10 runs were performed on the traditional Chinese medicine prescription data set and the mean ± standard deviation is reported; the comparative experimental results are shown in Table 5.
TABLE 5 Experimental results of different models on the traditional Chinese medicine prescription data set
Model   MLP             LSTM            Bi-LSTM         TCMGCN
P↑      0.7338±0.0138   0.7072±0.0209   0.7050±0.0149   0.7503±0.0114
F1↑     0.6041±0.0074   0.6022±0.0109   0.6057±0.0094   0.6275±0.0130
HL↓     0.0967±0.0020   0.0992±0.0022   0.0987±0.0022   0.0936±0.0030
Table 5 shows the experimental results of the four methods on the traditional Chinese medicine prescription data set, where TCMGCN is the method proposed by the present invention. TCMGCN achieves the best multi-label classification performance on all three evaluation metrics (precision, F1 score and Hamming loss) and outperforms the three baseline models. Specifically, in relative terms, TCMGCN improves precision by 2.25% over the strongest baseline MLP, improves the F1 score by 3.60% over the strongest baseline Bi-LSTM, and reduces the Hamming loss by 3.21% relative to the strongest baseline MLP. The MLP, LSTM and Bi-LSTM deep learning models all achieve reasonable results, which indicates that treating traditional Chinese medicine prescription efficacy prediction as a multi-label classification task is reasonable to a certain extent. Bi-LSTM is slightly better than LSTM in F1 score and Hamming loss because it receives the context information of the sequence in both directions, which gives it a deeper level of memory and stronger discriminative ability. MLP is better than LSTM and Bi-LSTM in precision and Hamming loss because MLP takes the updating of its network parameters as the training objective, so its output is closer to the actual situation. The strong performance of TCMGCN may be explained by two factors: the prescription heterogeneous graph captures the relationships between prescriptions and traditional Chinese medicines and among traditional Chinese medicines, and the graph convolution layers aggregate information from labeled and unlabeled nodes so that label information propagates across the whole graph. In conclusion, the traditional Chinese medicine prescription efficacy prediction method based on the graph convolution neural network is effective to a certain extent.
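The percentages quoted above are relative changes with respect to the strongest baseline, not differences in absolute points. A minimal check using the mean values from the results table (the helper name `relative_change_percent` is hypothetical):

```python
def relative_change_percent(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return abs(new - baseline) / baseline * 100.0

# Mean values taken from the results table; each baseline is the strongest
# competitor for that metric (MLP for P and HL, Bi-LSTM for F1).
results = {
    "P":  {"TCMGCN": 0.7503, "baseline": 0.7338},
    "F1": {"TCMGCN": 0.6275, "baseline": 0.6057},
    "HL": {"TCMGCN": 0.0936, "baseline": 0.0967},
}

for name, r in results.items():
    change = relative_change_percent(r["TCMGCN"], r["baseline"])
    print(f"{name}: {change:.2f}%")
```

This reproduces the reported 2.25%, 3.60% and 3.21%; the corresponding absolute differences are 0.0165, 0.0218 and 0.0031.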
Table 6 shows the efficacy prediction results of TCMGCN for two specific examples, with bold font indicating correctly predicted efficacies. The prescription input in the first example is astragalus (20 g), cassia twig (30 g), angelica (15 g), dried rehmannia root (15 g), achyranthes root (30 g), Ligusticum wallichii (15 g), Salvia miltiorrhiza (15 g) and leech (10 g), and the method provided by the invention hits all efficacy labels. The prescription input in the second example is aconite (30 g), grassleaved sweetflag rhizome (30 g), polygala root (30 g), gastrodia tuber (30 g), Buthus martensii Karsch (30 g), notopterygium root (30 g), batryticated silkworm (30 g) and arisaema tuber (30 g), and the method of the invention hits all efficacy labels except 'damp clearing'. The TCMGCN method provided by the invention therefore has good practical application value and can reasonably predict the efficacy of traditional Chinese medicine prescriptions.
TABLE 6 case analysis
Figure BDA0003699159400000111
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The embodiments of the present invention are described in detail above with reference to the drawings, but the present invention is not limited to the described embodiments. Various changes, modifications, substitutions and alterations to these embodiments will occur to those skilled in the art without departing from the spirit and scope of the present invention.

Claims (7)

1. A traditional Chinese medicine prescription efficacy prediction method based on a graph convolution neural network, characterized by comprising the following steps:
S1, a data preprocessing module preprocesses the traditional Chinese medicine prescription data through natural language processing technology and constructs a traditional Chinese medicine database and a prescription database that meet the data mining requirements;
S2, a prescription graph construction module represents the prescription text data in graph form and constructs a prescription heterogeneous graph containing traditional Chinese medicine nodes and prescription nodes;
S3, a node representation learning module learns the embedded representations of the nodes in the prescription heterogeneous graph through a graph convolution network, for use by downstream tasks;
and S4, a prediction module obtains the probability distribution of the traditional Chinese medicine prescription over the efficacies from the final node embedded representations obtained by training on traditional Chinese medicine prescription samples.
2. The prediction method of claim 1, wherein the traditional Chinese medicine database comprises drug name, alternative names, nature and flavor, channel tropism, toxicity and usual dosage range information, wherein the nature and flavor comprise cold, hot, warm, cool, neutral, sour, bitter, sweet, pungent and salty, and the channel tropism comprises lung, pericardium, heart, large intestine, triple energizer, small intestine, stomach, gallbladder, bladder, spleen, liver and kidney;
the prescription database comprises the prescription name, traditional Chinese medicine composition, actual dosage and efficacy information; the efficacy information comprises a dryness treatment formula, a tonifying formula, a blood regulating formula, a wind dispelling formula, a phlegm eliminating formula, a carbuncle and ulcer formula, a dampness eliminating formula, a qi regulating formula, an interior warming formula, a harmonizing formula, a heat clearing formula, an emergency treatment formula, a nerve calming formula, an eyesight improving formula, a purgation formula, a resuscitation inducing formula, a digestion promoting formula, an astringing formula and an exterior syndrome relieving formula.
3. The prediction method of claim 2, wherein the preprocessing of the prescription data in S1 comprises:
S101, standardizing traditional Chinese medicine names: performing synonym replacement on the traditional Chinese medicine names in a traditional Chinese medicine prescription through string matching;
S102, modern conversion of dosage: splitting the name and the dosage of each traditional Chinese medicine by word segmentation, and converting the different dosage units used in the traditional Chinese medicine prescription into the modern dosage unit, grams.
4. The prediction method according to claim 3, wherein in S2, constructing the prescription heterogeneous graph comprises:
S201, quantitative representation of traditional Chinese medicines: converting the abstract descriptions of traditional Chinese medicine attributes into specific numerical values that serve as the traditional Chinese medicine attribute vector
Figure FDA0003699159390000021
wherein d represents the number of traditional Chinese medicine attributes; the attributes of a traditional Chinese medicine comprise nature, flavor, channel tropism and toxicity; nature, flavor and toxicity are quantized on an exponential scale, with $2^{-1}$, $2^{0}$ and $2^{1}$ representing, for example, "slightly cold", "cold" and "very cold" respectively; channel tropism is quantized in binary form, with 1 indicating that the attribute is present and 0 indicating that it is absent;
S202, quantitative representation of traditional Chinese medicine prescriptions: matching the traditional Chinese medicines contained in a prescription with their corresponding attribute vectors, and summing these attribute vectors to obtain the prescription attribute vector
Figure FDA0003699159390000022
S203, normalizing the dosage according to the following relative dose conversion formula:
Figure FDA0003699159390000023
wherein
Figure FDA0003699159390000024
represents the relative dosage of traditional Chinese medicine i in prescription j, $d_{ij}$ represents the actual dosage of traditional Chinese medicine i in prescription j, and $d_{min}$ and $d_{max}$ respectively represent the minimum and maximum of the usual dosage range of the traditional Chinese medicine;
S204, calculating the similarity of traditional Chinese medicines according to the following formula:
Figure FDA0003699159390000025
wherein $x_a$ and $x_b$ respectively represent the attribute vectors of traditional Chinese medicine a and traditional Chinese medicine b;
S205, constructing the prescription heterogeneous graph FH(V, E), wherein V is the node set of the graph and E is the edge set of the graph, with the feature matrix of the graph
Figure FDA0003699159390000026
and the corresponding adjacency matrix
Figure FDA0003699159390000027
wherein n denotes the number of nodes with features and d denotes the node feature dimension; FH is represented as an undirected graph, and its adjacency matrix is expressed as follows:
Figure FDA0003699159390000028
wherein β is the threshold for connecting edges between traditional Chinese medicine nodes: when the similarity of two traditional Chinese medicines is greater than β, a connecting edge is established between them; otherwise, no edge is established.
5. The prediction method according to claim 4, wherein in S3, the embedded representations of the nodes in the prescription heterogeneous graph are learned through a graph convolution network; a single graph convolution layer captures information from immediate neighbors, and stacking multiple layers captures higher-order neighborhood information;
the node attribute vectors are input to the first layer as
Figure FDA0003699159390000031
the adjacency matrix A of FH is obtained from $A_{i,j}$ and $A_{j,i}$, and the traditional Chinese medicine prescription nodes and traditional Chinese medicine nodes are updated by aggregating information from their neighboring nodes; for a one-layer graph convolution network, the k-dimensional node matrix
Figure FDA0003699159390000032
is formalized as:
Figure FDA0003699159390000033
wherein
Figure FDA0003699159390000034
is the normalized representation of the adjacency matrix A,
Figure FDA0003699159390000035
represents the learned parameters, and ρ is an activation function;
the multi-layer graph convolution network is represented as:
Figure FDA0003699159390000036
wherein l denotes the layer index, and $H^{(0)} = X$.
6. The prediction method according to claim 5, wherein in S4, the traditional Chinese medicine prescription efficacy prediction module is implemented as follows:
for each traditional Chinese medicine prescription, the distance between the predicted efficacy and the true efficacy is evaluated; the output dimension of the last graph convolution layer is set equal to the number of efficacy labels, and the final embedded representation of the traditional Chinese medicine prescription nodes
Figure FDA0003699159390000037
is fed into a Sigmoid classifier to obtain the probability distribution of the traditional Chinese medicine prescription over all efficacies
Figure FDA0003699159390000038
wherein C represents the number of efficacy labels, and
Figure FDA0003699159390000039
is calculated as follows:
Figure FDA00036991593900000310
7. The prediction method of claim 6, wherein in S4, the efficacy prediction of the traditional Chinese medicine prescription uses the multi-label cross-entropy loss as its loss function:
Figure FDA00036991593900000311
wherein n represents the number of training set samples, $y^{(i)} \in \{0, 1\}$ represents the true efficacy label, and
Figure FDA0003699159390000041
denotes the predicted value.
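The attribute quantization of S201 and the summation of S202 in claim 4 can be illustrated with a short sketch. The vector layout (11 graded attributes followed by 12 binary channel-tropism slots), the degree keys and all names below are hypothetical choices made for illustration; only the exponential grades $2^{-1}$, $2^{0}$, $2^{1}$, the binary encoding of channel tropism and the summation of herb vectors come from the claims.

```python
import numpy as np

# Exponential grades from S201; the dictionary keys are hypothetical labels.
DEGREE = {"slightly": 2.0 ** -1, "normal": 2.0 ** 0, "very": 2.0 ** 1}

# Hypothetical layout: graded attributes first, then binary channel tropism.
GRADED = ["cold", "hot", "warm", "cool", "neutral",
          "sour", "bitter", "sweet", "pungent", "salty", "toxic"]
MERIDIANS = ["lung", "pericardium", "heart", "large intestine",
             "triple energizer", "small intestine", "stomach", "gallbladder",
             "bladder", "spleen", "liver", "kidney"]

def herb_vector(graded, meridians):
    """Quantize one herb: graded maps attribute -> degree key, meridians is a list."""
    vec = np.zeros(len(GRADED) + len(MERIDIANS))
    for attr, degree in graded.items():
        vec[GRADED.index(attr)] = DEGREE[degree]
    for m in meridians:
        vec[len(GRADED) + MERIDIANS.index(m)] = 1.0   # 1 = attribute present
    return vec

def prescription_vector(herb_vectors):
    """S202: the prescription attribute vector is the sum of its herb vectors."""
    return np.sum(herb_vectors, axis=0)
```

For example, a slightly cold, bitter herb entering the lung channel combined with a very cold herb entering the lung and liver channels gives a prescription vector whose "cold" component is 0.5 + 2 = 2.5 and whose "lung" component is 2.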
CN202210683366.8A 2022-06-17 2022-06-17 Traditional Chinese medicine prescription efficacy prediction method based on graph convolution neural network Active CN115050481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210683366.8A CN115050481B (en) 2022-06-17 2022-06-17 Traditional Chinese medicine prescription efficacy prediction method based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN115050481A true CN115050481A (en) 2022-09-13
CN115050481B CN115050481B (en) 2023-10-31

Family

ID=83160800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210683366.8A Active CN115050481B (en) 2022-06-17 2022-06-17 Traditional Chinese medicine prescription efficacy prediction method based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN115050481B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant