CN110010251B - Traditional Chinese medicine community information generation method, system, device and storage medium - Google Patents

Traditional Chinese medicine community information generation method, system, device and storage medium

Info

Publication number
CN110010251B
Authority
CN
China
Prior art keywords
chinese medicine
traditional chinese
network
edge
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910104918.3A
Other languages
Chinese (zh)
Other versions
CN110010251A (en)
Inventor
赵淦森
王剑飞
黎子靖
庄序填
王桂兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University
Priority to CN201910104918.3A
Publication of CN110010251A
Application granted
Publication of CN110010251B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Abstract

The invention discloses a method, a system, a device and a storage medium for generating traditional Chinese medicine community information. The method comprises: establishing a prescription set; calculating the dependence of each traditional Chinese medicine in the prescription set on the other traditional Chinese medicines in the prescription set; calculating the association degree between the corresponding traditional Chinese medicines; establishing a traditional Chinese medicine network; calculating the walk probability of each edge in the traditional Chinese medicine network; performing random walks according to the calculated walk probability of each edge to obtain a plurality of traditional Chinese medicine sequences; vectorizing each traditional Chinese medicine sequence; and outputting the traditional Chinese medicines classified into the same category as a traditional Chinese medicine community, among other steps. The method can discover overlapping medicine communities and potential traditional Chinese medicine compatibilities, and has lower computational complexity and higher computational efficiency than existing algorithms based on association rules or community discovery. The invention is widely applicable to the technical field of pharmaceutical informatics.

Description

Traditional Chinese medicine community information generation method, system, device and storage medium
Technical Field
The invention relates to the technical field of pharmacy informatics, in particular to a method, a system, a device and a storage medium for generating traditional Chinese medicine community information.
Background
Traditional Chinese medicine is one of the treasures of Chinese culture, and a classical traditional Chinese medicine formula (prescription) is the essence of traditional Chinese medicine theory that has withstood the test of practice, and has great medical research value. Traditional Chinese medicine theory is concerned with the compatibility of Chinese medicines. It is hoped that the information in traditional Chinese medicine prescriptions can be processed to output brand-new Chinese medicine combinations that follow specific combination rules, so as to obtain new prescriptions with better curative effect.
Existing techniques for mining new prescriptions are mainly based on traditional association rules or community discovery algorithms, and have obvious shortcomings. Association rule methods simply take the co-occurrence frequency and count of medicines in prescriptions as the measure of how closely the medicines are related. Although this reflects some common medicine collocation rules to a certain extent, relying on co-occurrence alone usually ignores the complexity of traditional Chinese medicine collocation: because the compatibility of Chinese medicines involves six relationships, including mutual reinforcement, mutual antagonism, mutual killing, mutual aversion and contraindication, association-rule-based methods have difficulty uncovering the collocation principles behind the medicines. Community-discovery-based methods also have limitations. Although a non-overlapping community discovery algorithm can uncover part of the knowledge in the complex theory of traditional Chinese medicine, it ignores the overlapping nature of medicine communities in use. Moreover, the relationships among traditional Chinese medicines, which have many attributes, are complex, and community discovery algorithms have difficulty making full use of clinical diagnosis and treatment data, so that some valuable traditional Chinese medicine data are hard to use effectively and the complex relationships among the medicines are hard to express.
Interpretation of terms:
Graph Embedding: Graph Embedding is a model that combines graph analysis (graph analytics) with representation learning. The purpose of graph analytics is to mine valuable information from a graph. Representation learning converts data into vector representations, so that mature data mining algorithms, such as classification, prediction and clustering algorithms, can extract useful information from the data more easily. The goal of a Graph Embedding model is to combine the two and learn, from graph data, a vector representation that retains the useful information in the graph (such as graph structure information and association information between graph nodes).
Random walk: the basic idea of random-walk-based Graph Embedding is to sample a set of paths from a graph and then learn feature vector representations of the nodes or edges in the graph from the sampled paths. Since the graph can be represented by the sampled paths, it is in effect converted into a "document" composed of nodes, to which a word embedding method, of which Word2Vec is representative, can be applied. The first Graph Embedding method based on the idea of random walk is DeepWalk, which combines random walk with Word2Vec.
Fuzzy clustering: fuzzy clustering analysis is a mathematical method that uses the language of fuzzy mathematics to describe and classify objects according to given requirements. It generally means constructing a fuzzy matrix from the attributes of the objects under study and determining the clustering relation on that basis according to a certain membership degree, that is, quantitatively determining the fuzzy relation among samples by fuzzy mathematical methods so as to cluster them objectively and accurately. Clustering divides a data set into a plurality of classes or clusters so that the differences between data in different classes are as large as possible and the differences between data in the same class are as small as possible, i.e., the principle of "minimizing inter-class similarity and maximizing intra-class similarity".
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a method, a system, an apparatus and a storage medium for generating Chinese medicine community information.
In one aspect, the invention includes a method for generating information of a traditional Chinese medicine community, comprising the following steps:
establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicine;
respectively calculating the dependence of each traditional Chinese medicine in the prescription set on other traditional Chinese medicines in the prescription set;
calculating the association degree between the corresponding traditional Chinese medicines according to the dependence degrees;
establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are respectively in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the degree of association between the traditional Chinese medicine corresponding to any two nodes is greater than a preset first threshold value, an edge with weight connecting the two nodes exists, otherwise, the edge connecting the two nodes does not exist; the weight of the edge is equal to the degree of association between the traditional Chinese medicine corresponding to the two nodes connected with the edge;
calculating the walk probability of each edge in the traditional Chinese medicine network by using a random walk algorithm, so as to perform directed processing on the traditional Chinese medicine network;
in the traditional Chinese medicine network subjected to directed processing, performing random walks according to the calculated walk probability of each edge, thereby obtaining a plurality of traditional Chinese medicine sequences; each traditional Chinese medicine sequence consists of the traditional Chinese medicines corresponding to the nodes passed during a random walk;
vectorizing each traditional Chinese medicine sequence to obtain a plurality of traditional Chinese medicine vectors;
processing each Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into corresponding categories;
and outputting the traditional Chinese medicine classified into the same category as a traditional Chinese medicine community.
Further, the calculation formula of the dependency is as follows:
Figure BDA0001966505750000031
In the formula, Ind(h2|h1) is the dependence of traditional Chinese medicine h2 on traditional Chinese medicine h1; |h1| is the number of occurrences of traditional Chinese medicine h1 in the prescription set; f(h1, h2)_i is the i-th prescription among the prescriptions containing both h1 and h2; and f(h1, h2)_i.length is the number of traditional Chinese medicines contained in prescription f(h1, h2)_i.
Further, the calculation formula of the association degree is as follows:
Figure BDA0001966505750000032
In the formula, the left-hand side is the association degree between traditional Chinese medicine h1 and traditional Chinese medicine h2; Ind(h2|h1) is the dependence of h2 on h1; Ind(h1|h2) is the dependence of h1 on h2; |h1| is the number of occurrences of h1 in the prescription set; |h2| is the number of occurrences of h2 in the prescription set; and k is a preset second threshold.
Further, the calculation formula used by the random walk algorithm is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, σ(Z)_j is the walk probability of the j-th edge connected to node Z in the traditional Chinese medicine network, Z_j is the weight of the j-th edge connected to node Z, i is a summation index, and K is the number of edges connected to node Z in the traditional Chinese medicine network.
Further, the step of performing random walks in the traditional Chinese medicine network subjected to directed processing according to the calculated walk probability of each edge, thereby obtaining a plurality of traditional Chinese medicine sequences, specifically includes:
setting the number of walks to be started from each node in the traditional Chinese medicine network;
setting the number of edges traversed by each random walk;
traversing all nodes in the traditional Chinese medicine network and using each in turn as a starting point for random walks, according to the number of walks, the number of edges traversed and the walk probability of each edge;
and outputting, in walk order, the traditional Chinese medicines corresponding to the nodes passed during each random walk, thereby obtaining a plurality of traditional Chinese medicine sequences.
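The walk steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `random_walks`, the nested-dictionary layout of the network, the `seed` parameter and the toy probabilities are all hypothetical, and the walk probability of every edge is taken as already computed.

```python
import random

def random_walks(directed, walks_per_node, edges_per_walk, seed=None):
    """Sample medicine sequences from a directed network.

    directed[a][b] is the walk probability of stepping from a to b
    (hypothetical layout; the probabilities leaving a node sum to 1).
    """
    rng = random.Random(seed)
    sequences = []
    for start in directed:                    # every node serves as a starting point
        for _ in range(walks_per_node):       # number of walks per node
            walk = [start]
            current = start
            for _ in range(edges_per_walk):   # number of edges traversed per walk
                neighbors = directed[current]
                if not neighbors:             # isolated node: the walk stops early
                    break
                targets = list(neighbors)
                weights = [neighbors[t] for t in targets]
                current = rng.choices(targets, weights=weights, k=1)[0]
                walk.append(current)
            sequences.append(walk)            # nodes in visiting order
    return sequences

# Toy network with made-up walk probabilities (assumption for illustration).
example_network = {
    "A": {"B": 0.6, "C": 0.4},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
}
example_walks = random_walks(example_network, walks_per_node=2, edges_per_walk=3, seed=0)
```

Each returned list is one traditional Chinese medicine sequence; with 3 nodes and 2 walks per node the sketch yields 6 sequences of 4 nodes each.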
Further, the step of vectorizing each traditional Chinese medicine sequence to obtain a plurality of traditional Chinese medicine vectors specifically includes:
inputting each traditional Chinese medicine sequence, as a document, into the skip-gram model of the Word2vec algorithm;
receiving the weights output by the hidden-layer linear neurons (Hidden Layer Linear Neurons) of the skip-gram model;
and returning the weights output by the hidden-layer neurons as the traditional Chinese medicine vectors.
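To illustrate what treating a sequence "as a document" means for a skip-gram model, the sketch below builds the (center, context) pairs on which such a model is trained. It covers only this data-preparation step, not the training of the hidden-layer weights; the function name `skipgram_pairs` and the window size are assumptions introduced here.

```python
def skipgram_pairs(sequences, window=2):
    """Build (center, context) training pairs from medicine sequences.

    For each position in a sequence, every other node within `window`
    positions is treated as context (window size is an assumed setting).
    """
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:                    # a node is not its own context
                    pairs.append((center, seq[j]))
    return pairs

# One toy sequence of three placeholder medicines.
toy_pairs = skipgram_pairs([["a", "b", "c"]], window=1)
```

A skip-gram model learns one vector per medicine by predicting the context element of each pair from the center element, so medicines that share walk neighborhoods end up with similar vectors.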
Further, the step of processing each traditional Chinese medicine vector using a clustering algorithm specifically includes:
setting the categories of a Fuzzy C-Means clustering algorithm; each category has a corresponding third threshold;
inputting each traditional Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receiving the classification probabilities that the Fuzzy C-Means clustering algorithm outputs for each traditional Chinese medicine vector;
and when a classification probability reaches the corresponding third threshold, classifying the traditional Chinese medicine corresponding to that traditional Chinese medicine vector into the category to which the third threshold belongs.
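The thresholding described above, which lets one medicine fall into several categories, can be sketched as follows. The sketch uses the standard Fuzzy C-Means membership formula with cluster centers that are assumed to be already known; it does not perform the iterative center updates of the full algorithm. The function names, the fuzzifier m = 2, the centers and the threshold values are all assumptions for illustration.

```python
import math

def fcm_memberships(vector, centers, m=2.0):
    """Fuzzy C-Means membership of one vector in each cluster (fixed centers)."""
    dists = [math.dist(vector, c) for c in centers]
    if any(d == 0.0 for d in dists):          # vector coincides with a center
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

def assign_communities(vectors, centers, thresholds, m=2.0):
    """Put a medicine into every category whose threshold its membership reaches."""
    communities = [[] for _ in centers]
    for name, vec in vectors.items():
        memberships = fcm_memberships(vec, centers, m)
        for idx, (u, t) in enumerate(zip(memberships, thresholds)):
            if u >= t:
                communities[idx].append(name)
    return communities

# Toy 2-D vectors and centers (made-up values).
toy_centers = [(0.0, 0.0), (2.0, 0.0)]
toy_vectors = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 0.0)}
toy_communities = assign_communities(toy_vectors, toy_centers, thresholds=[0.4, 0.4])
```

In this toy case medicine C lies midway between the two centers, has membership 0.5 in each, and therefore appears in both communities, which is the overlapping behavior the method aims for.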
In another aspect, the present invention further provides a system for generating traditional Chinese medicine community information, comprising:
the prescription set module is used for establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicine;
the dependency calculation module is used for calculating the dependency of each traditional Chinese medicine in the prescription set on other traditional Chinese medicines in the prescription set respectively;
the association degree calculation module is used for calculating the association degree between the corresponding traditional Chinese medicines according to the dependence degrees;
the traditional Chinese medicine network module is used for establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are respectively in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the degree of association between the traditional Chinese medicine corresponding to any two nodes is greater than a preset first threshold value, an edge with weight connecting the two nodes exists, otherwise, the edge connecting the two nodes does not exist; the weight of the edge is equal to the degree of association between the traditional Chinese medicine corresponding to the two nodes connected with the edge;
the directed processing module is used for calculating the walk probability of each edge in the traditional Chinese medicine network by using a random walk algorithm, so as to perform directed processing on the traditional Chinese medicine network;
the random walk module is used for performing random walks in the traditional Chinese medicine network subjected to directed processing according to the calculated walk probability of each edge, so as to obtain a plurality of traditional Chinese medicine sequences; each traditional Chinese medicine sequence consists of the traditional Chinese medicines corresponding to the nodes passed during a random walk;
the vectorization processing module is used for vectorizing each traditional Chinese medicine sequence so as to obtain a plurality of traditional Chinese medicine vectors;
the clustering module is used for processing each traditional Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into corresponding categories;
and the output module is used for outputting the traditional Chinese medicine classified into the same category as a traditional Chinese medicine community.
In another aspect, the present invention further includes a device for generating information of a community of traditional Chinese medicine, including a memory for storing at least one program and a processor for loading the at least one program to perform the method of the present invention.
In another aspect, the invention also includes a storage medium having stored therein processor-executable instructions for performing the inventive method when executed by a processor.
The beneficial effects of the invention are as follows: the invention measures the relationship between two traditional Chinese medicines by the dependency and takes it as the weight of the edges in the traditional Chinese medicine network, which increases the information content of the network; random walks on the network yield traditional Chinese medicine sequences and traditional Chinese medicine vectors for processing by a clustering algorithm, so that overlapping medicine communities and potential traditional Chinese medicine compatibilities can be discovered.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for generating Chinese medicine community information according to the present invention;
FIG. 2 is a diagram of a network of traditional Chinese medicine according to an embodiment of the present invention;
FIG. 3 is a block diagram of a network of Chinese herbs before being directed processed in an embodiment of the present invention;
FIG. 4 is a structural diagram of a network of Chinese medicine after being subjected to directed processing in the embodiment of the present invention;
FIG. 5 is a diagram of a network for randomly walking Chinese herbs in an embodiment of the present invention;
FIG. 6 is a block diagram of a sequence of Chinese herbs obtained by performing random walk according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a skip-gram model used in an embodiment of the present invention.
Detailed Description
The invention comprises a traditional Chinese medicine community information generation method, and with reference to fig. 1, the method comprises the following steps:
S1, establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicines;
S2, respectively calculating the dependence of each traditional Chinese medicine in the prescription set on the other traditional Chinese medicines in the prescription set;
S3, calculating the association degree between the corresponding traditional Chinese medicines according to the dependence degrees;
S4, establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the association degree between the traditional Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connecting the two nodes exists, otherwise no edge connects the two nodes; the weight of the edge is equal to the association degree between the traditional Chinese medicines corresponding to the two nodes connected by the edge;
S5, calculating the walk probability of each edge in the traditional Chinese medicine network by using a random walk algorithm, so as to perform directed processing on the traditional Chinese medicine network;
S6, in the traditional Chinese medicine network subjected to directed processing, performing random walks according to the calculated walk probability of each edge, thereby obtaining a plurality of traditional Chinese medicine sequences; each traditional Chinese medicine sequence consists of the traditional Chinese medicines corresponding to the nodes passed during a random walk;
S7, vectorizing each traditional Chinese medicine sequence to obtain a plurality of traditional Chinese medicine vectors;
S8, processing each traditional Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into a corresponding category;
and S9, outputting the traditional Chinese medicines classified into the same category as a traditional Chinese medicine community.
In step S1, a plurality of prescriptions with good therapeutic effect are collected by consulting renowned veteran practitioners of traditional Chinese medicine, querying classic works of traditional Chinese medicine, accessing traditional Chinese medicine databases and the like, thereby forming a prescription set. Each prescription consists of one or more traditional Chinese medicines. For example, a prescription for treating prostatosis comprises tortoise shell, rhizoma anemarrhenae, peach kernel, hispid fig, combined spicebush root, Chinese taxillus twig, lychee shell and other medicines.
Different prescriptions may contain the same medicine. For example, the above prescription for treating prostatosis contains hispid fig, and a prescription for promoting qi circulation and removing dampness also contains hispid fig. In this embodiment, the medicines in the prescriptions are distinguished by the type of the medicine itself. For example, when the above prescription for treating prostatosis and a prescription for promoting qi circulation and removing dampness constitute a prescription set, the set contains the hispid fig of the prescription for treating prostatosis and the hispid fig of the prescription for promoting qi circulation and removing dampness, and in this embodiment these two are regarded as the same traditional Chinese medicine in the prescription set.
In step S2, the dependence of each traditional Chinese medicine in the prescription set on the other traditional Chinese medicines in the prescription set is calculated. For example, when the above prescription for treating prostatosis and a prescription for promoting qi circulation and removing dampness constitute a prescription set, then for the hispid fig in the prescription for treating prostatosis, its dependence with respect to each of the other traditional Chinese medicines in that prescription (tortoise shell, rhizoma anemarrhenae, peach kernel, combined spicebush root, Chinese taxillus twig and lychee shell) and in the other prescriptions is calculated; likewise, for the tortoise shell in the prescription for treating prostatosis, its dependence with respect to each of the other traditional Chinese medicines in that prescription (hispid fig, rhizoma anemarrhenae, peach kernel, combined spicebush root, Chinese taxillus twig and lychee shell) and in the other prescriptions is calculated.
In the present embodiment, the dependency is calculated in step S2 by the following formula:
Figure BDA0001966505750000061
In the formula, Ind(h2|h1) is the dependence of traditional Chinese medicine h2 on traditional Chinese medicine h1; |h1| is the number of occurrences of traditional Chinese medicine h1 in the prescription set; f(h1, h2)_i is the i-th prescription among the prescriptions containing both h1 and h2; and f(h1, h2)_i.length is the number of traditional Chinese medicines contained in prescription f(h1, h2)_i.
In the dependency formula of this embodiment, each traditional Chinese medicine is labeled by the type of the medicine itself. For example, the hispid fig in the prescription for treating prostatosis and the hispid fig in the prescription for promoting qi circulation and removing dampness are both labeled h1, and |h1| denotes the number of occurrences of hispid fig in the prescription set: if only one prescription for treating prostatosis and one prescription for promoting qi circulation and removing dampness in the set contain hispid fig, then hispid fig appears twice in the set and |h1| = 2. f(h1, h2)_i is the i-th prescription among the prescriptions containing both h1 and h2. For example, if the prescription set contains a prescription for treating prostatosis and a prescription for tonifying the kidney that both contain hispid fig and rhizoma anemarrhenae, and hispid fig is labeled h1 and rhizoma anemarrhenae is labeled h2, then f(h1, h2)_1 denotes the prescription for treating prostatosis and f(h1, h2)_2 denotes the prescription for tonifying the kidney. f(h1, h2)_i.length is the number of traditional Chinese medicines contained in prescription f(h1, h2)_i. For example, in this embodiment f(h1, h2)_1 denotes the prescription for treating prostatosis, whose complete composition is hispid fig, rhizoma anemarrhenae, peach kernel, combined spicebush root, Chinese taxillus twig and lychee shell, i.e. 6 traditional Chinese medicines in total, so f(h1, h2)_1.length = 6.
In this embodiment, it can be seen from the dependency formula that the dependency is not commutative, that is, the dependence of traditional Chinese medicine h2 on traditional Chinese medicine h1 is generally not equal to the dependence of traditional Chinese medicine h1 on traditional Chinese medicine h2.
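The quantities |h1|, f(h1, h2)_i and f(h1, h2)_i.length can be made concrete with a short sketch. The dependency formula itself is given only as an image and is not reproduced in this text, so the sketch ASSUMES one plausible form, Ind(h2|h1) = (1/|h1|) · Σ_i 1/f(h1, h2)_i.length, purely for illustration; the actual formula may differ. The function name `dependency`, the representation of prescriptions as sets and the placeholder medicines "x" and "y" are likewise hypothetical.

```python
def dependency(prescriptions, h1, h2):
    """Ind(h2 | h1) under an ASSUMED form (not taken from the patent):
    (1 / |h1|) * sum over prescriptions containing both h1 and h2
    of 1 / (number of medicines in that prescription).
    """
    count_h1 = sum(1 for p in prescriptions if h1 in p)        # |h1|
    if count_h1 == 0:
        return 0.0
    shared = [p for p in prescriptions if h1 in p and h2 in p]  # f(h1, h2)_i
    return sum(1.0 / len(p) for p in shared) / count_h1         # len(p) is .length

# Toy prescription set: the first entry is the 6-medicine example above,
# the other two are made-up prescriptions with placeholder medicines.
toy_prescriptions = [
    {"hispid fig", "rhizoma anemarrhenae", "peach kernel",
     "combined spicebush root", "Chinese taxillus twig", "lychee shell"},
    {"hispid fig", "rhizoma anemarrhenae", "x"},
    {"rhizoma anemarrhenae", "y"},
]
ind_anem_on_fig = dependency(toy_prescriptions, "hispid fig", "rhizoma anemarrhenae")
ind_fig_on_anem = dependency(toy_prescriptions, "rhizoma anemarrhenae", "hispid fig")
```

Here hispid fig occurs in two prescriptions and rhizoma anemarrhenae in three, so the two directions divide the same sum by different counts and the results differ, which matches the non-commutativity noted above.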
In step S3, the association degree between the corresponding traditional Chinese medicines is calculated from the dependencies calculated in step S2. In this embodiment, the association degree is a parameter reflecting how closely any two traditional Chinese medicines in the prescription set are associated. The pairing of any two traditional Chinese medicines in the prescription set into a medicine pair has already been completed while calculating the dependencies in step S2, so in step S3 the medicine pairs formed in step S2 and the dependencies of the two traditional Chinese medicines in each pair can be used directly to calculate the association degree of the pair.
In this embodiment, in step S3, the degree of association is calculated by the following formula:
Figure BDA0001966505750000071
In the formula, the left-hand side is the association degree between traditional Chinese medicine h1 and traditional Chinese medicine h2; Ind(h2|h1) is the dependence of h2 on h1; Ind(h1|h2) is the dependence of h1 on h2; |h1| is the number of occurrences of h1 in the prescription set; |h2| is the number of occurrences of h2 in the prescription set; and k is a preset second threshold.
In this embodiment, either the dependence of traditional Chinese medicine h1 on traditional Chinese medicine h2 or the dependence of h2 on h1 is used directly as the association degree between h1 and h2. Specifically, a second threshold k is set. When the smaller of |h1| and |h2| is less than k, the smaller of the two dependences Ind(h1|h2) and Ind(h2|h1) is used as the association degree between h1 and h2; when the smaller of |h1| and |h2| is greater than or equal to k, the larger of the two dependences is used. Setting the second threshold and choosing the association degree according to how the threshold compares with |h1| and |h2| avoids the unreasonable result of always taking the larger of Ind(h2|h1) and Ind(h1|h2) as the association degree: when the denominator in the formula is small, the resulting association degree is large, so a traditional Chinese medicine that itself occurs only rarely would obtain a large association degree with the other traditional Chinese medicines in the same prescription.
In this embodiment, as can be known from the calculation formula of the association degree, the calculation of the association degree has commutative property, that is, the association degree of the traditional Chinese medicine h2 to the traditional Chinese medicine h1 is equal to the association degree of the traditional Chinese medicine h1 to the traditional Chinese medicine h 2.
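The selection rule described above can be written out directly. This is a small sketch in which the function name `association_degree` and the numeric inputs are hypothetical; the dependence values are taken as already computed.

```python
def association_degree(ind_h2_on_h1, ind_h1_on_h2, count_h1, count_h2, k):
    """Association degree between h1 and h2 from their two dependences.

    If the rarer medicine occurs fewer than k times, take the smaller
    dependence; otherwise take the larger one.
    """
    if min(count_h1, count_h2) < k:
        return min(ind_h2_on_h1, ind_h1_on_h2)
    return max(ind_h2_on_h1, ind_h1_on_h2)

# Made-up dependences and occurrence counts.
rare_case = association_degree(0.25, 0.10, count_h1=2, count_h2=3, k=3)
common_case = association_degree(0.25, 0.10, count_h1=2, count_h2=3, k=2)
swapped = association_degree(0.10, 0.25, count_h1=3, count_h2=2, k=3)
```

Swapping the roles of h1 and h2 swaps both the dependences and the counts, and the result is unchanged, which is the commutative property stated above.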
In step S4, a traditional Chinese medicine network is established; the network is in effect a data set recording the traditional Chinese medicines in the prescription set and their interrelationships. Referring to fig. 2, the traditional Chinese medicine network includes a plurality of nodes, which correspond one-to-one to the traditional Chinese medicines in the prescription set. The network also includes edges connecting pairs of nodes, generated according to the following rule: when the association degree between the traditional Chinese medicines corresponding to any two nodes is greater than a preset first threshold, an edge connecting the two nodes is generated, and the association degree between the two medicines is assigned to the edge as its weight; when the association degree between the traditional Chinese medicines corresponding to the two nodes is not greater than the preset first threshold, no edge exists between the two nodes.
Preferably, the first threshold may be set to 0, since the association degree between two chinese medicines has a value greater than 0 as long as the two chinese medicines are present in the same prescription, that is, as long as the two chinese medicines are present in the same prescription, there is an edge between two nodes corresponding to the two chinese medicines in the chinese medicine network.
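The edge-generation rule can be sketched as follows. The adjacency-dictionary layout, the function name `build_network`, the lookup helper and the toy association degrees are assumptions made for illustration; the default first threshold of 0 follows the preferred setting above.

```python
from itertools import combinations

def build_network(medicines, association_of, first_threshold=0.0):
    """Build an undirected weighted network as a nested dictionary.

    An edge a-b exists only if association_of(a, b) exceeds the first
    threshold; its weight is that association degree.
    """
    network = {m: {} for m in medicines}      # one node per medicine
    for a, b in combinations(medicines, 2):
        weight = association_of(a, b)
        if weight > first_threshold:
            network[a][b] = weight            # symmetric weights:
            network[b][a] = weight            # the network is undirected
    return network

# Made-up association degrees for three placeholder medicines.
pair_weights = {frozenset(("A", "B")): 0.5, frozenset(("A", "C")): 0.2}

def lookup(a, b):
    """Return the stored association degree, or 0 if the pair never co-occurs."""
    return pair_weights.get(frozenset((a, b)), 0.0)

toy_network = build_network(["A", "B", "C"], lookup)
```

Medicines B and C never co-occur in this toy data, so their association degree is 0 and no edge is created between them.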
In step S4, each edge in the traditional Chinese medicine network may be given its weight as follows. The degree of association between the traditional Chinese medicines corresponding to every pair of nodes is initialized to 0, and all prescriptions in the prescription set are traversed. Each time a prescription is visited, the traditional Chinese medicines it contains are combined pairwise into medicine pairs, and the degree of association of each pair is calculated according to steps S2 and S3. The values obtained for the same medicine pair are accumulated across prescriptions. After all prescriptions have been visited, the final accumulated value is the degree of association of that medicine pair, that is, the weight of the edge between the two corresponding nodes in the traditional Chinese medicine network.
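The accumulation and thresholding described for step S4 might look roughly like the sketch below. The names `build_network` and `pair_association` are hypothetical; `pair_association` is a caller-supplied placeholder for the per-prescription result of steps S2 and S3, not the patent's formula. The network is stored as a plain adjacency dictionary.

```python
from collections import defaultdict
from itertools import combinations


def build_network(prescriptions, pair_association, first_threshold=0.0):
    """Build an undirected weighted network as {node: {neighbor: weight}}.

    prescriptions: list of prescriptions, each a list of medicine names
    pair_association: hypothetical callable (h1, h2, prescription) -> float
        standing in for the value produced by steps S2 and S3
    first_threshold: an edge is kept only if its accumulated weight exceeds it
    """
    totals = defaultdict(float)  # every pair implicitly starts at 0
    for prescription in prescriptions:
        # Pairwise combination of the medicines in this prescription.
        for h1, h2 in combinations(sorted(set(prescription)), 2):
            totals[(h1, h2)] += pair_association(h1, h2, prescription)

    # One node per medicine, even if it ends up with no edges.
    graph = {h: {} for prescription in prescriptions for h in prescription}
    for (h1, h2), weight in totals.items():
        if weight > first_threshold:
            graph[h1][h2] = weight  # same weight in both directions:
            graph[h2][h1] = weight  # the edge is undirected at this stage
    return graph
```

For a quick trial, a toy stand-in such as `lambda h1, h2, p: 1.0 / len(p)` can be passed as `pair_association`; it is an arbitrary placeholder chosen only to exercise the accumulation.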
The degree of association between two traditional Chinese medicines calculated in step S3 is symmetric, so the weight of the edge between the two corresponding nodes is symmetric as well, and the edges of the traditional Chinese medicine network obtained in step S4 are therefore undirected. In step S5, based on the original weight of each edge, a random walk algorithm is used to calculate the walk probability of each edge in the network, and this probability is assigned to the edge as its new weight. Because the walk probability calculated by the random walk algorithm depends on which node is the start and which is the end, each re-weighted edge has a direction, and the directed processing of the traditional Chinese medicine network is thereby achieved.
In this embodiment, the calculation formula used by the random walk algorithm in step S5 is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, $\sigma(Z)_j$ is the walk probability of the j-th edge connected to node Z in the traditional Chinese medicine network, $Z_j$ is the weight of the j-th edge connected to node Z, i is an index, and K is the number of all edges connected to node Z.
Fig. 3 shows a traditional Chinese medicine network with 4 nodes, in which every edge is undirected before the directed processing. Applying the directed processing to the network of Fig. 3 yields the network shown in Fig. 4, in which the weight of each edge depends on which of its two nodes is the start and which is the end, so that each edge has a direction.
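As an illustration of the directed processing in step S5, the sketch below applies a softmax over the weights of the edges leaving each node. The function name `to_directed` and the adjacency-dictionary layout `{node: {neighbor: weight}}` are assumptions made for this example only.

```python
import math


def to_directed(graph):
    """Turn undirected edge weights into per-node walk probabilities.

    graph: {node: {neighbor: weight}} with symmetric weights
    returns: {node: {neighbor: probability}}, where the probabilities of the
        edges leaving each node sum to 1
    """
    directed = {}
    for node, neighbors in graph.items():
        if not neighbors:
            directed[node] = {}
            continue
        # Subtracting the maximum weight leaves the softmax unchanged
        # but avoids overflow in exp() for large weights.
        top = max(neighbors.values())
        exps = {n: math.exp(w - top) for n, w in neighbors.items()}
        total = sum(exps.values())
        directed[node] = {n: e / total for n, e in exps.items()}
    return directed
```

The normalization runs over the edges of the starting node, so the probability of walking from a to b generally differs from that of walking from b to a even though the original weight is the same. This is what gives each edge a direction.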
In step S6, random walk is performed in the directed traditional Chinese medicine network according to the calculated walk probability of each edge, so as to obtain a plurality of traditional Chinese medicine sequences. Step S6 specifically includes the following steps:
S601, setting the number of walks corresponding to each node in the traditional Chinese medicine network;
S602, setting the number of edges traversed by each random walk;
S603, traversing all nodes in the traditional Chinese medicine network, using each in turn as a starting point for random walks performed according to the number of walks, the number of edges traversed and the walk probability of each edge;
S604, outputting, in walk order, the traditional Chinese medicines corresponding to the nodes passed through in each random walk, thereby obtaining a plurality of traditional Chinese medicine sequences.
In step S601, the number of walks is set. In step S603, each walk starts from a node in the network and stops after traversing the number of edges set in step S602; the edge taken at each step is chosen at random according to the walk probabilities determined by the edges and their directions. In step S604, the traditional Chinese medicines corresponding to the nodes passed through in each random walk are output in walk order. Each random walk yields one traditional Chinese medicine sequence, so multiple random walks yield a plurality of traditional Chinese medicine sequences.
The principle of step S6 is shown in Fig. 4 and 5. Fig. 4 is a network of 6 nodes, and Fig. 6 shows the traditional Chinese medicine sequences obtained by random walks that start from traditional Chinese medicine 1, traditional Chinese medicine 2 and traditional Chinese medicine 6 respectively and follow different routes.
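Steps S601 to S604 can be sketched as follows. The names `random_walks`, `walks_per_node` and `walk_length` are hypothetical, and the input is assumed to be a dictionary `{node: {neighbor: probability}}` holding the walk probabilities from step S5.

```python
import random


def random_walks(directed, walks_per_node, walk_length, seed=None):
    """Generate medicine sequences by weighted random walks.

    directed: {node: {neighbor: walk probability}}
    walks_per_node: number of walks started from every node (S601)
    walk_length: number of edges traversed by each walk (S602)
    """
    rng = random.Random(seed)
    sequences = []
    for start in directed:                      # S603: every node is a start
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_length):
                neighbors = directed[walk[-1]]
                if not neighbors:               # isolated node: stop early
                    break
                nodes = list(neighbors)
                weights = [neighbors[n] for n in nodes]
                # Pick the next node in proportion to the walk probability.
                walk.append(rng.choices(nodes, weights=weights, k=1)[0])
            sequences.append(walk)              # S604: nodes in walk order
    return sequences
```

Each returned list is one traditional Chinese medicine sequence. A walk over `walk_length` edges visits `walk_length + 1` nodes unless it reaches a node with no outgoing edges.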
In step S7, vectorization processing is performed on each traditional Chinese medicine sequence, thereby obtaining a plurality of traditional Chinese medicine vectors. Step S7 specifically includes the following steps:
S701, inputting each traditional Chinese medicine sequence, as a document, into the skip-gram model of the Word2vec algorithm;
S702, receiving the weights output by the hidden-layer linear neurons (Hidden Layer Linear Neurons) in the skip-gram model;
S703, returning the weights output by the hidden-layer neurons as traditional Chinese medicine vectors.
The principle of the skip-gram model is shown in Fig. 7. The skip-gram model receives a document and predicts the context from a center word in that document; this embodiment exploits that function by inputting each traditional Chinese medicine sequence into the skip-gram model as a document. The hidden-layer neurons in the skip-gram model output weight values for the received traditional Chinese medicine sequences, and these weight values serve as the traditional Chinese medicine vectors. The vectors obtained can be used directly in step S8, or used in step S8 after other attribute features of the traditional Chinese medicines have been merged into them.
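To make the role of the hidden-layer weights concrete, here is a deliberately small skip-gram sketch in pure Python. It uses a full softmax and plain stochastic gradient descent, so it is not the optimized Word2vec implementation; the function name `train_skipgram` and all hyperparameter names and defaults are assumptions chosen for illustration.

```python
import math
import random


def train_skipgram(sequences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Learn one vector per medicine from walk sequences (toy skip-gram)."""
    rng = random.Random(seed)
    vocab = sorted({h for seq in sequences for h in seq})
    index = {h: i for i, h in enumerate(vocab)}
    size = len(vocab)
    # Input-to-hidden weights: row i is the vector of medicine i.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(size)]
    # Hidden-to-output weights, used only during training.
    w_out = [[0.0] * dim for _ in range(size)]

    # (center, context) pairs within the window around each position.
    pairs = []
    for seq in sequences:
        ids = [index[h] for h in seq]
        for pos, center in enumerate(ids):
            lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
            pairs.extend((center, ids[j]) for j in range(lo, hi) if j != pos)

    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            hidden = w_in[center]  # linear hidden layer: a row lookup
            scores = [sum(a * b for a, b in zip(hidden, row)) for row in w_out]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            grad_hidden = [0.0] * dim
            for v in range(size):
                # Softmax cross-entropy error for output medicine v.
                err = exps[v] / total - (1.0 if v == context else 0.0)
                for d in range(dim):
                    grad_hidden[d] += err * w_out[v][d]
                    w_out[v][d] -= lr * err * hidden[d]
            for d in range(dim):
                hidden[d] -= lr * grad_hidden[d]

    return {h: list(w_in[index[h]]) for h in vocab}
```

The returned dictionary maps each medicine name to its row of the input-to-hidden weight matrix, which is the usual reading of "the weights output by the hidden layer". In practice a library implementation of Word2vec would normally be preferred over this toy loop.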
In step S8, each traditional Chinese medicine vector is processed using a clustering algorithm. Step S8 specifically includes:
S801, setting the categories for a Fuzzy C-Means clustering algorithm, each category having its own third threshold;
S802, inputting each traditional Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receiving the classification probabilities that the algorithm outputs for each traditional Chinese medicine vector;
S803, when a classification probability reaches the corresponding third threshold, classifying the traditional Chinese medicine corresponding to that traditional Chinese medicine vector into the category corresponding to the third threshold.
The clustering algorithm used in step S8 is the Fuzzy C-Means clustering algorithm. To use it, the required categories are set, together with a third threshold for each category. After receiving the traditional Chinese medicine vectors, the Fuzzy C-Means clustering algorithm outputs the classification probabilities of each vector; when a classification probability reaches a third threshold, the traditional Chinese medicine corresponding to the vector is classified into the category corresponding to that threshold. Once all traditional Chinese medicine vectors have been clustered, each category set in the Fuzzy C-Means clustering algorithm contains its corresponding traditional Chinese medicines, that is, the traditional Chinese medicines have been classified into the categories. In step S9, the traditional Chinese medicines classified into the same category are output as a traditional Chinese medicine community. Such communities reflect medical information hidden in the prescriptions and can be used for further research or experiments, which may lead to new prescriptions with good effect.
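The clustering and thresholding of steps S801 to S803 can be illustrated with the standard Fuzzy C-Means update rules, written here in pure Python. The function names, the fuzzifier `m=2.0`, the fixed iteration count and the random initialization are assumptions for this sketch; the source does not specify them.

```python
import math
import random


def fuzzy_c_means(vectors, n_categories, m=2.0, iterations=100, seed=0):
    """Return memberships[i][k]: degree to which vector i belongs to category k."""
    rng = random.Random(seed)
    n, dim = len(vectors), len(vectors[0])
    # Random initial memberships, normalized so each row sums to 1.
    memberships = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_categories)]
        total = sum(row)
        memberships.append([value / total for value in row])

    for _ in range(iterations):
        # Category centers: means weighted by membership ** m.
        centers = []
        for k in range(n_categories):
            weights = [memberships[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(weights[i] * vectors[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: inversely related to the distance to each center.
        for i in range(n):
            inverse = [max(math.dist(vectors[i], c), 1e-12) ** (-2.0 / (m - 1.0))
                       for c in centers]
            total = sum(inverse)
            memberships[i] = [value / total for value in inverse]
    return memberships


def communities(names, memberships, third_thresholds):
    """Put a medicine into every category whose third threshold it reaches."""
    groups = [set() for _ in third_thresholds]
    for name, row in zip(names, memberships):
        for k, threshold in enumerate(third_thresholds):
            if row[k] >= threshold:
                groups[k].add(name)
    return groups
```

Because a medicine joins every category whose threshold its membership reaches, lowering the thresholds lets one medicine belong to several communities at once, which is how overlapping communities can arise.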
With the Fuzzy C-Means clustering algorithm properly configured, experimental verification shows that the method of this embodiment can produce the following result. For a prescription for treating prostatosis that contains tortoise shell, rhizoma anemarrhenae, peach kernel, hispid fig, combined spicebush root, Chinese taxillus twig, lychee shell and other traditional Chinese medicines, tortoise shell and rhizoma anemarrhenae are classified into one category, and hispid fig, combined spicebush root and Chinese taxillus twig are classified into another category. According to traditional Chinese medicine theory and further research, tortoise shell and rhizoma anemarrhenae are the core medicines in the prescription with the effect of treating prostatosis. Hispid fig, combined spicebush root and Chinese taxillus twig have no obvious effect on prostatosis, but they form a combination that is used according to the constitution of the patient and also has medical value. The method thus uncovers the compatibility of hispid fig, combined spicebush root and Chinese taxillus twig.
Steps S1 to S9 in this embodiment constitute a Graph Embedding method, which has the following advantages:
1. The relationship between two traditional Chinese medicines is measured by the dependency, and the dependency is used as the weight of the edges in the traditional Chinese medicine network, which increases the information content of the network;
2. According to the weights of the edges, the traditional Chinese medicine network is converted into a probabilistic directed graph using the softmax formula, so that random walk can be performed on it;
3. Random walk on the probabilistic directed graph amounts to a weighted DeepWalk. Compared with basic DeepWalk, it takes the weights into account when outputting traditional Chinese medicine sequences and, from them, traditional Chinese medicine vectors, and these vectors can more easily be combined with other attribute features of the traditional Chinese medicines;
4. Processing the traditional Chinese medicine vectors with the Fuzzy C-Means clustering algorithm makes it possible to find overlapping medicine communities and thus potential compatibilities of traditional Chinese medicines;
5. Compared with existing algorithms based on association rules or community discovery, obtaining the medicine vectors through the Graph Embedding technique has lower computational complexity and can achieve higher computational efficiency.
The embodiment further includes a system for generating information of a traditional Chinese medicine community, including:
the prescription set module is used for establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicine;
the dependency calculation module is used for calculating the dependency of each traditional Chinese medicine in the prescription set on other traditional Chinese medicines in the prescription set respectively;
the relevancy calculation module is used for calculating the relevancy between the corresponding traditional Chinese medicine medicaments according to the dependencies;
the traditional Chinese medicine network module is used for establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are respectively in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the degree of association between the traditional Chinese medicine corresponding to any two nodes is greater than a preset first threshold value, an edge with weight connecting the two nodes exists, otherwise, the edge connecting the two nodes does not exist; the weight of the edge is equal to the degree of association between the traditional Chinese medicine corresponding to the two nodes connected with the edge;
the directed processing module is used for calculating the wandering probability of each edge in the traditional Chinese medicine network by using a random wandering algorithm so as to perform directed processing on the traditional Chinese medicine network;
the random walk module is used for carrying out random walk in the traditional Chinese medicine network subjected to oriented processing according to the calculated walk probability of each edge so as to obtain a plurality of traditional Chinese medicine sequences; each traditional Chinese medicine sequence consists of traditional Chinese medicine medicines corresponding to nodes passing through in the random walk process;
the vectorization processing module is used for vectorizing each traditional Chinese medicine sequence so as to obtain a plurality of traditional Chinese medicine vectors;
the clustering module is used for processing each traditional Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into corresponding categories;
and the output module is used for outputting the traditional Chinese medicine classified into the same category as a traditional Chinese medicine community.
Each module in the traditional Chinese medicine community information generation system can be a hardware module or a software module with corresponding functions.
The invention also comprises a traditional Chinese medicine community information generation device which comprises a memory and a processor, wherein the memory is used for storing at least one program, and the processor is used for loading the at least one program to execute the method.
The invention also includes a storage medium having stored therein processor-executable instructions for performing the inventive method when executed by a processor.
The system, the device and the storage medium for generating the Chinese medicine community information in the embodiment can execute the method for generating the Chinese medicine community information, can execute any combination of the implementation steps of the method embodiment, and have corresponding functions and beneficial effects of the method.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A traditional Chinese medicine community information generation method is characterized by comprising the following steps:
establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicine;
respectively calculating the dependence of each traditional Chinese medicine in the prescription set on other traditional Chinese medicines in the prescription set;
calculating the association degree between the corresponding traditional Chinese medicine medicaments according to the dependence degrees;
establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are respectively in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the degree of association between the traditional Chinese medicine corresponding to any two nodes is greater than a preset first threshold value, an edge with weight connecting the two nodes exists, otherwise, the edge connecting the two nodes does not exist; the weight of the edge is equal to the degree of association between the traditional Chinese medicine corresponding to the two nodes connected with the edge;
calculating the wandering probability of each edge in the traditional Chinese medicine network by using a random wandering algorithm, so as to perform directed processing on the traditional Chinese medicine network;
in the traditional Chinese medicine network subjected to oriented processing, random walk is carried out according to the calculated walk probability of each edge, so that a plurality of traditional Chinese medicine sequences are obtained; each traditional Chinese medicine sequence consists of traditional Chinese medicine medicines corresponding to nodes passing through in the random walk process;
vectorizing each traditional Chinese medicine sequence to obtain a plurality of traditional Chinese medicine vectors;
processing each Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into corresponding categories;
outputting the traditional Chinese medicine classified into the same category as a traditional Chinese medicine community;
the step of performing random walk in the traditional Chinese medicine network subjected to the directed processing according to the calculated walk probability of each edge so as to obtain a plurality of traditional Chinese medicine sequences specifically comprises:
setting the walking times corresponding to each node in the traditional Chinese medicine network;
setting the number of edges passed by each random walk;
traversing all nodes in the traditional Chinese medicine network to respectively serve as starting points to carry out random walk according to the walking times, the number of the passed edges and the walking probability of each edge;
and outputting the traditional Chinese medicine corresponding to the nodes passing through in each random walk process according to the walk sequence, thereby obtaining a plurality of traditional Chinese medicine sequences.
2. The method as claimed in claim 1, wherein the dependency is calculated by the following formula:
Figure FDA0003509854900000011
in the formula, Ind(h2|h1) is the dependency of traditional Chinese medicine h2 on traditional Chinese medicine h1, |h1| is the number of occurrences of traditional Chinese medicine h1 in the prescription set, f(h1, h2)_i is the i-th prescription among the prescriptions that contain both traditional Chinese medicine h1 and traditional Chinese medicine h2, and f(h1, h2)_i.length is the number of traditional Chinese medicines contained in prescription f(h1, h2)_i.
3. The method as claimed in claim 2, wherein the calculation formula of the association degree is:
Figure FDA0003509854900000021
in the formula, the quantity being defined is the degree of association between traditional Chinese medicine h1 and traditional Chinese medicine h2, Ind(h2|h1) is the dependency of traditional Chinese medicine h2 on traditional Chinese medicine h1, Ind(h1|h2) is the dependency of traditional Chinese medicine h1 on traditional Chinese medicine h2, |h1| is the number of occurrences of traditional Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of traditional Chinese medicine h2 in the prescription set, and k is a preset second threshold.
4. The method as claimed in claim 1, wherein the calculation formula used by the random walk algorithm is softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
in the formula, $\sigma(Z)_j$ is the walk probability of the j-th edge connected to node Z in the traditional Chinese medicine network, $Z_j$ is the weight of the j-th edge connected to node Z in the traditional Chinese medicine network, i is an index, and K is the number of all edges connected to node Z in the traditional Chinese medicine network.
5. The method of claim 1, wherein the step of vectorizing each of the chinese medicine sequences to obtain a plurality of chinese medicine vectors includes:
inputting each Chinese medicine sequence into a skip-gram model in a Word2vec algorithm as a document;
receiving the weights output by the hidden-layer linear neurons (Hidden Layer Linear Neurons) in the skip-gram model;
and returning the weight output by the hidden layer neuron as a traditional Chinese medicine vector.
6. The method of claim 1, wherein the step of processing each of the TCM vectors by using a clustering algorithm specifically comprises:
performing category setting on a Fuzzy C-Means clustering algorithm; each category corresponds to a corresponding third threshold value;
inputting each traditional Chinese medicine vector into a Fuzzy C-Means clustering algorithm, and receiving classification probability output by the Fuzzy C-Means clustering algorithm and corresponding to each traditional Chinese medicine vector;
and when the classification probability reaches a corresponding third threshold value, classifying the traditional Chinese medicine corresponding to the traditional Chinese medicine vector corresponding to the classification probability into a category corresponding to the third threshold value.
7. A system for generating information of a traditional Chinese medicine community, comprising:
the prescription set module is used for establishing a prescription set; the prescription set comprises a plurality of prescriptions, and each prescription consists of corresponding traditional Chinese medicine;
the dependency calculation module is used for calculating the dependency of each traditional Chinese medicine in the prescription set on other traditional Chinese medicines in the prescription set respectively;
the relevancy calculation module is used for calculating the relevancy between the corresponding traditional Chinese medicine medicaments according to the dependencies;
the traditional Chinese medicine network module is used for establishing a traditional Chinese medicine network; the traditional Chinese medicine network comprises a plurality of nodes which are respectively in one-to-one correspondence with the traditional Chinese medicines in the prescription set; when the degree of association between the traditional Chinese medicine corresponding to any two nodes is greater than a preset first threshold value, an edge with weight connecting the two nodes exists, otherwise, the edge connecting the two nodes does not exist; the weight of the edge is equal to the degree of association between the traditional Chinese medicine corresponding to the two nodes connected with the edge;
the directed processing module is used for calculating the wandering probability of each edge in the traditional Chinese medicine network by using a random wandering algorithm so as to perform directed processing on the traditional Chinese medicine network;
the random walk module is used for carrying out random walk in the traditional Chinese medicine network subjected to oriented processing according to the calculated walk probability of each edge so as to obtain a plurality of traditional Chinese medicine sequences; each traditional Chinese medicine sequence consists of traditional Chinese medicine medicines corresponding to nodes passing through in the random walk process;
the vectorization processing module is used for vectorizing each traditional Chinese medicine sequence so as to obtain a plurality of traditional Chinese medicine vectors;
the clustering module is used for processing each traditional Chinese medicine vector by using a clustering algorithm; the clustering algorithm is used for classifying the traditional Chinese medicine corresponding to each traditional Chinese medicine vector into corresponding categories;
the output module is used for outputting the traditional Chinese medicine classified into the same category as a traditional Chinese medicine community;
wherein performing random walk in the traditional Chinese medicine network subjected to the directed processing according to the calculated walk probability of each edge, so as to obtain a plurality of traditional Chinese medicine sequences, specifically comprises:
setting the walking times corresponding to each node in the traditional Chinese medicine network;
setting the number of edges passed by each random walk;
traversing all nodes in the traditional Chinese medicine network to respectively serve as starting points to carry out random walk according to the walking times, the number of the passed edges and the walking probability of each edge;
and outputting the traditional Chinese medicine corresponding to the nodes passing through in each random walk process according to the walk sequence, thereby obtaining a plurality of traditional Chinese medicine sequences.
8. An apparatus for generating community information of traditional Chinese medicine, comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of any one of claims 1 to 6.
9. A storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method of any one of claims 1-6.
CN201910104918.3A 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium Active CN110010251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910104918.3A CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910104918.3A CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN110010251A CN110010251A (en) 2019-07-12
CN110010251B true CN110010251B (en) 2022-04-15

Family

ID=67165631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910104918.3A Active CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN110010251B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706789B (en) * 2019-10-10 2022-05-24 电子科技大学 Excavation method for incompatibility of traditional Chinese medicines
CN113011471A (en) * 2021-02-26 2021-06-22 山东英信计算机技术有限公司 Social group dividing method, social group dividing system and related devices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615222A (en) * 2008-06-23 2009-12-30 中国医学科学院放射医学研究所 A kind of Chinese prescription designing technique based on the Chinese medicine effective component group
CN104820775A (en) * 2015-04-17 2015-08-05 南京大学 Discovery method of core drug of traditional Chinese medicine prescription
CN106126649A (en) * 2016-06-24 2016-11-16 北京千安哲信息技术有限公司 A kind of similar Chinese crude drug method for digging and device
CN107519262A (en) * 2017-10-14 2017-12-29 杜运升 One kind is promoted the sexual maturity scattered medicine and preparation method and application
CN108037093A (en) * 2017-12-20 2018-05-15 荣贵福 It is a kind of differentiate " Baizhi to be measured whether the method for carrying out sulfur fumigation
CN108647236A (en) * 2018-03-30 2018-10-12 山东管理学院 A kind of prescriptions of traditional Chinese medicine vector space model method and device based on Term co-occurrence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于中药组分和"组分结构"理论的中药研究模式的探讨;严红梅等;《中草药》;20150428;第1103-1110页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant