CN110010251A - Chinese medicine community information generation method, system, device and storage medium - Google Patents
- Publication number
- CN110010251A CN201910104918.3A
- Authority
- CN
- China
- Prior art keywords
- chinese medicine
- prescription
- network
- degree
- random walk
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Toxicology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Chemical & Material Sciences (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method, system, device and storage medium for generating Chinese medicine community information. The method comprises: establishing a prescription set; calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the set; calculating the degree of association between each corresponding pair of Chinese medicines; establishing a Chinese medicine network; calculating the walk probability of each edge in the Chinese medicine network; performing random walks according to the calculated walk probabilities to obtain multiple Chinese medicine sequences; vectorizing each Chinese medicine sequence; and outputting the Chinese medicines classified into the same category as a Chinese medicine community. The invention can discover overlapping drug communities and potential drug pairings. Compared with existing algorithms based on association rules or community discovery, it has lower computational complexity and can achieve higher computational efficiency. The invention is widely applicable in the technical field of pharmacoinformatics.
Description
Technical field
The present invention relates to the technical field of pharmacoinformatics, and in particular to a method, system, device and storage medium for generating Chinese medicine community information.
Background art
Traditional Chinese medicine (TCM) is one of the quintessences of Chinese culture. Classical TCM prescriptions are the essence of TCM theory, tested by long practice, and have great medical research value. TCM theory emphasizes the compatibility of drugs: a prescription is usually composed of several Chinese medicines, and its medical function derives from those constituent medicines and from the way they are combined. One research direction for TCM prescriptions is therefore the compatibility relationships between Chinese medicines, with the aim of processing existing prescription information and outputting new combinations of Chinese medicines that follow specific combination rules, so as to mine new prescriptions with better curative effect.
Existing techniques for mining new prescriptions are mainly based on traditional association rules or community discovery algorithms, and therefore have obvious shortcomings. Association rule methods use only the co-occurrence frequency and count of drugs in prescriptions as the measure of how closely drugs are related. Although they can reflect some common drug pairing rules to a certain extent, relying purely on co-occurrence tends to ignore the complexity of Chinese medicine pairing. Two drugs in a pairing may stand in any of six relationships: mutual reinforcement, mutual assistance, mutual restraint, mutual detoxification, mutual inhibition and incompatibility. Uncovering the pairing principles behind a prescription is therefore difficult for methods based on association rules. Methods based on community discovery are also limited. Non-overlapping community discovery algorithms can uncover part of the knowledge in the complex theory of Chinese medicine, but they ignore the fact that drug communities overlap in use. Moreover, the properties of Chinese medicines and the relationships between drugs are complex, and community discovery algorithms often cannot make full use of clinical diagnosis and treatment data, so that some valuable Chinese medicine data cannot be used effectively and the complex relationships between Chinese medicines are difficult to reveal.
Explanation of terms:
Graph embedding: graph embedding is a model that combines graph analytics with representation learning. The purpose of graph analytics is to mine useful information from a graph. Representation learning converts data into vector representations so that mature data mining algorithms, such as classification, prediction and clustering algorithms, can more easily extract useful information from the data. The goal of a graph embedding model is to combine the two: to learn from graph data vector representations that preserve the useful information in the graph (such as graph structure information and the associations between nodes).
Random walk: the basic idea of graph embedding methods based on random walks is to sample a set of paths from the graph and then learn feature vector representations of the nodes or edges from the sampled paths. Because the graph is represented by the sampled paths, it is in effect converted into "documents" made up of nodes, so word embedding methods, of which word2vec is representative, can be applied. The first graph embedding method based on the random walk idea was DeepWalk, which combines random walks with Word2Vec.
Fuzzy clustering: fuzzy cluster analysis is a mathematical method that uses the language of fuzzy mathematics to describe and classify things according to certain requirements. It generally constructs a fuzzy matrix from the attributes of the objects under study and, on that basis, determines cluster relationships according to degrees of membership. In other words, it uses fuzzy mathematics to quantify the fuzzy relationships between samples so that clustering is objective and accurate. Clustering divides a data set into multiple classes or clusters such that the differences between data in different classes are as large as possible and the differences between data within a class are as small as possible, i.e. the principle of "minimizing similarity between classes and maximizing similarity within classes".
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a method, system, device and storage medium for generating Chinese medicine community information.
In one aspect, the present invention includes a method for generating Chinese medicine community information, comprising the following steps:
Establishing a prescription set; the prescription set includes multiple prescriptions, each composed of its corresponding Chinese medicines;
Calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
Calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
Establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes it connects;
Calculating the walk probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
Performing random walks in the directed Chinese medicine network according to the calculated walk probabilities of the edges, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence consists of the Chinese medicines corresponding to the nodes passed through during one random walk;
Vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
Processing the Chinese medicine vectors with a clustering algorithm; the clustering algorithm classifies the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
Outputting the Chinese medicines classified into the same category as a Chinese medicine community.
Further, the dependency degree is calculated by the following formula:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i.
Further, the degree of association is calculated by the following formula:
In the formula, the left-hand side is the degree of association between Chinese medicine h1 and Chinese medicine h2, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
Further, the calculation formula used by the random walk algorithm is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, σ(Z)_j is the walk probability of the j-th edge connected to node Z in the Chinese medicine network, Z_j is the weight of the j-th edge connected to node Z, i is an index, and K is the number of edges connected to node Z in the Chinese medicine network.
Further, the step of performing random walks in the directed Chinese medicine network according to the calculated walk probabilities of the edges, to obtain multiple Chinese medicine sequences, specifically includes:
Setting the number of walks to be started from each node in the Chinese medicine network;
Setting the number of edges traversed by each random walk;
Traversing all nodes in the Chinese medicine network and performing random walks from each node as the starting point, according to the number of walks, the number of edges traversed and the walk probability of each edge;
Outputting, in walk order, the Chinese medicines corresponding to the nodes passed through in each random walk, to obtain multiple Chinese medicine sequences.
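The four sub-steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `generate_walks`, the dictionary layout of the network and the sample probabilities are all assumptions made for the example.

```python
import random

def generate_walks(probabilities, walks_per_node, edges_per_walk, seed=0):
    """Sample random walks from every node of a directed, weighted network.

    probabilities: {node: {neighbor: walk probability}} (hypothetical layout).
    """
    rng = random.Random(seed)
    sequences = []
    for start in probabilities:               # every node serves as a starting point
        for _ in range(walks_per_node):       # number of walks per node
            walk = [start]
            for _ in range(edges_per_walk):   # number of edges per walk
                out = probabilities[walk[-1]]
                if not out:
                    break                     # isolated node: the walk stops early
                neighbors = list(out)
                weights = [out[n] for n in neighbors]
                walk.append(rng.choices(neighbors, weights=weights)[0])
            sequences.append(walk)            # nodes in walk order
    return sequences

# Illustrative probabilities only.
probs = {"A": {"B": 0.3, "C": 0.7}, "B": {"A": 1.0}, "C": {"A": 1.0}}
sequences = generate_walks(probs, walks_per_node=2, edges_per_walk=4)
```

Each returned list plays the role of one Chinese medicine sequence; a walk over n edges yields a sequence of n + 1 nodes.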
Further, the step of vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors specifically includes:
Inputting each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
Receiving the weights output by the hidden-layer neurons (Hidden Layer Linear Neurons) of the skip-gram model;
Returning the weights output by the hidden-layer neurons as the Chinese medicine vectors.
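To make concrete what treating a sequence as a document means, the sketch below generates the (center, context) pairs on which a skip-gram model is trained. It covers only this data-preparation step and does not train the network; the function name `skipgram_pairs` and the `window` parameter are assumptions for illustration, and a real system would normally rely on an existing Word2vec implementation.

```python
def skipgram_pairs(sequences, window=2):
    """Build (center, context) training pairs from node sequences."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            # every other item inside the window is a context of `center`
            pairs.extend((center, seq[j]) for j in range(lo, hi) if j != i)
    return pairs

pairs = skipgram_pairs([["A", "B", "C"]], window=1)
```

Chinese medicines that often appear close together in the walks share many contexts, which is why their learned vectors tend to end up close to each other.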
Further, the step of processing the Chinese medicine vectors with a clustering algorithm specifically includes:
Setting the categories of the Fuzzy C-Means clustering algorithm; each category has its own third threshold;
Inputting each Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receiving the category probabilities output by the algorithm for each Chinese medicine vector;
When a category probability reaches the corresponding third threshold, classifying the Chinese medicine corresponding to that Chinese medicine vector into the category corresponding to that third threshold.
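The thresholding step is what allows one Chinese medicine to belong to several communities. The sketch below illustrates it under simplifying assumptions: the cluster centers are fixed by hand, whereas a full Fuzzy C-Means run would update them iteratively, and the membership is computed with the standard Fuzzy C-Means membership formula with fuzzifier m. The names `memberships` and `assign`, the centers, the vectors and the threshold values are all hypothetical.

```python
import math

def memberships(vector, centers, m=2.0):
    """Fuzzy C-Means membership of `vector` in each cluster, for fixed centers."""
    dists = [math.dist(vector, c) for c in centers]
    if 0.0 in dists:                      # vector coincides with one or more centers
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def assign(herb_vectors, centers, third_thresholds):
    """Return {category index: [herbs]}; a herb may appear in several categories."""
    communities = {i: [] for i in range(len(centers))}
    for herb, vec in herb_vectors.items():
        for i, u in enumerate(memberships(vec, centers)):
            if u >= third_thresholds[i]:  # probability reaches the third threshold
                communities[i].append(herb)
    return communities

centers = [(0.0, 0.0), (2.0, 0.0)]
vectors = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.5)}
communities = assign(vectors, centers, third_thresholds=[0.4, 0.4])
```

In this toy data, "B" lies midway between the two centers, receives a membership of 0.5 in each, and is therefore placed in both categories.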
In another aspect, the present invention also includes a system for generating Chinese medicine community information, comprising:
A prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, each composed of its corresponding Chinese medicines;
A dependency degree calculation module, for calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
A degree of association calculation module, for calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
A Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes it connects;
A directed processing module, for calculating the walk probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
A random walk module, for performing random walks in the directed Chinese medicine network according to the calculated walk probabilities of the edges, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence consists of the Chinese medicines corresponding to the nodes passed through during one random walk;
A vectorization module, for vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
A clustering module, for processing the Chinese medicine vectors with a clustering algorithm; the clustering algorithm classifies the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
An output module, for outputting the Chinese medicines classified into the same category as a Chinese medicine community.
In another aspect, the present invention also includes a device for generating Chinese medicine community information, comprising a memory and a processor; the memory stores at least one program, and the processor loads the at least one program to execute the method of the present invention.
In another aspect, the present invention also includes a storage medium storing processor-executable instructions which, when executed by a processor, perform the method of the present invention.
The beneficial effects of the present invention are as follows. The invention measures the relationship between two Chinese medicines by the dependency degree and uses the dependency degree as the weight of the edges in the Chinese medicine network, which increases the information content of the network. The Chinese medicine sequences and Chinese medicine vectors obtained from random walks on the network are processed by a clustering algorithm, which makes it possible to discover overlapping drug communities and potential Chinese medicine pairings. Compared with existing algorithms based on association rules or community discovery, the invention has lower computational complexity and can achieve higher computational efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the Chinese medicine community information generation method of the present invention;
Fig. 2 is a structure diagram of the Chinese medicine network in an embodiment of the present invention;
Fig. 3 is a structure diagram of a Chinese medicine network before directed processing in an embodiment of the present invention;
Fig. 4 is a structure diagram of a Chinese medicine network after directed processing in an embodiment of the present invention;
Fig. 5 is a structure diagram of a Chinese medicine network on which a random walk is performed in an embodiment of the present invention;
Fig. 6 is a diagram of the Chinese medicine sequences obtained by random walks in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the skip-gram model used in an embodiment of the present invention.
Detailed description of the embodiments
The present invention includes a method for generating Chinese medicine community information which, referring to Fig. 1, comprises the following steps:
S1. Establishing a prescription set; the prescription set includes multiple prescriptions, each composed of its corresponding Chinese medicines;
S2. Calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
S3. Calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
S4. Establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes it connects;
S5. Calculating the walk probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
S6. Performing random walks in the directed Chinese medicine network according to the calculated walk probabilities of the edges, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence consists of the Chinese medicines corresponding to the nodes passed through during one random walk;
S7. Vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
S8. Processing the Chinese medicine vectors with a clustering algorithm; the clustering algorithm classifies the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
S9. Outputting the Chinese medicines classified into the same category as a Chinese medicine community.
In step S1, multiple prescriptions with good therapeutic effect are collected, for example by consulting distinguished veteran TCM doctors, looking up classical TCM texts or accessing Chinese medicine databases, to form the prescription set. Each prescription is composed of one or more Chinese medicines. For example, a prescription for treating prostatosis includes tortoise plastron, rhizoma anemarrhenae, peach kernel, radix fici simplicissimae, root of three-nerved spicebush, herba taxilli, lychee exocarp and other drugs.
Different prescriptions may contain the same drug. For example, the above prescription for treating prostatosis contains radix fici simplicissimae, and a prescription for promoting qi circulation and removing dampness also contains radix fici simplicissimae. In this embodiment, the drugs in the prescriptions are distinguished by drug type. For example, when the above prescription for treating prostatosis and a prescription for promoting qi circulation and removing dampness form a prescription set, the set contains both the radix fici simplicissimae of the prostatosis prescription and the radix fici simplicissimae of the qi-promoting, dampness-removing prescription, and this embodiment treats the two as the same Chinese medicine within the prescription set.
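A minimal Python sketch of this convention follows: each prescription is stored as a set of drug names, so the same drug appearing in different prescriptions maps to a single entry in the vocabulary. The function name `build_prescription_set`, the prescription labels and the shortened ingredient lists are hypothetical and chosen only for illustration.

```python
def build_prescription_set(raw):
    """Return prescriptions as frozensets of normalized names, plus the drug vocabulary."""
    prescription_set = [
        frozenset(name.strip().lower() for name in herbs) for herbs in raw.values()
    ]
    vocabulary = sorted(set().union(*prescription_set))
    return prescription_set, vocabulary

# Hypothetical, abbreviated prescriptions.
raw_prescriptions = {
    "prostatosis": ["tortoise plastron", "rhizoma anemarrhenae", "radix fici simplicissimae"],
    "qi-promoting": ["Radix Fici Simplicissimae ", "dried tangerine peel"],
}
prescription_set, vocabulary = build_prescription_set(raw_prescriptions)
```

Normalizing the names before building the sets is one way to ensure that spelling variants of the same drug are not counted as different Chinese medicines.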
In step S2, the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set is calculated. For example, when the above prescription for treating prostatosis and a prescription for promoting qi circulation and removing dampness form the prescription set, then for the radix fici simplicissimae in the prostatosis prescription, its dependency degree is calculated with respect to the tortoise plastron, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp in the prostatosis prescription, and with respect to the other Chinese medicines in the other prescriptions. Likewise, for the tortoise plastron in the prostatosis prescription, its dependency degree is calculated with respect to the radix fici simplicissimae, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp in the prostatosis prescription, and with respect to the other Chinese medicines in the other prescriptions.
In this embodiment, the dependency degree is calculated in step S2 by the following formula:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i.
In the dependency degree formula of this embodiment, each Chinese medicine is labelled by its drug type. For example, the radix fici simplicissimae in the prescription for treating prostatosis and the radix fici simplicissimae in the prescription for promoting qi circulation and removing dampness are both labelled as radix fici simplicissimae. With radix fici simplicissimae as h1, |h1| denotes the number of occurrences of radix fici simplicissimae in the prescription set; for example, if only the prescription for treating prostatosis and the prescription for promoting qi circulation and removing dampness contain radix fici simplicissimae, then it occurs twice in the prescription set and |h1| = 2. f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2. For example, suppose the prescription set contains a prescription for treating prostatosis and a prescription for tonifying the kidney, both of which contain radix fici simplicissimae and rhizoma anemarrhenae. If radix fici simplicissimae is labelled h1 and rhizoma anemarrhenae is labelled h2, then f(h1,h2)_1 can denote the prescription for treating prostatosis and f(h1,h2)_2 the prescription for tonifying the kidney. f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i. For example, in this embodiment f(h1,h2)_1 denotes the prescription for treating prostatosis, whose complete formula is radix fici simplicissimae, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp; the prescription contains six Chinese medicines in total, so f(h1,h2)_1.length = 6.
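The quantities that enter the dependency degree can be computed directly from the prescription set. The sketch below stops at those inputs, namely |h1|, the prescriptions f(h1,h2)_i and their lengths, and does not assume any particular way of combining them. The helper names and the kidney-tonifying ingredient list are hypothetical; only the six-drug prostatosis formula follows the example in the text.

```python
def occurrences(prescription_set, herb):
    """|h|: number of prescriptions in the set that contain `herb`."""
    return sum(1 for p in prescription_set if herb in p)

def co_prescriptions(prescription_set, h1, h2):
    """f(h1, h2)_i for i = 1, 2, ...: prescriptions containing both drugs."""
    return [p for p in prescription_set if h1 in p and h2 in p]

prostatosis = {"radix fici simplicissimae", "rhizoma anemarrhenae", "peach kernel",
               "root of three-nerved spicebush", "herba taxilli", "lychee exocarp"}
# Hypothetical kidney-tonifying prescription; the third ingredient is a placeholder.
kidney = {"radix fici simplicissimae", "rhizoma anemarrhenae", "placeholder herb"}
prescription_set = [prostatosis, kidney]

h1, h2 = "radix fici simplicissimae", "rhizoma anemarrhenae"
count_h1 = occurrences(prescription_set, h1)           # |h1|
shared = co_prescriptions(prescription_set, h1, h2)    # f(h1, h2)_1, f(h1, h2)_2
lengths = [len(p) for p in shared]                     # f(h1, h2)_i.length
```

Here `lengths[0]` corresponds to f(h1,h2)_1.length = 6 from the worked example.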
In this embodiment, it follows from the dependency degree formula that the dependency degree is not commutative: the dependency degree of Chinese medicine h2 on Chinese medicine h1 is usually not equal to the dependency degree of Chinese medicine h1 on Chinese medicine h2.
In step S3, the degree of association between each corresponding pair of Chinese medicines is calculated from the dependency degrees obtained in step S2. In this embodiment, the degree of association is a parameter that reflects how strongly any two Chinese medicines in the prescription set are associated. In calculating the dependency degrees, step S2 has in effect already paired up any two Chinese medicines in the prescription set into drug pairs. Step S3 can therefore directly use each drug pair formed in step S2, together with the dependency degrees of the two Chinese medicines in the pair, to calculate the degree of association of that pair.
In this embodiment, the degree of association is calculated in step S3 by the following formula:
In the formula, the left-hand side is the degree of association between Chinese medicine h1 and Chinese medicine h2, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
In this embodiment, either the dependency degree of Chinese medicine h1 on Chinese medicine h2 or the dependency degree of Chinese medicine h2 on Chinese medicine h1 is used directly as the degree of association between h1 and h2. Specifically, a second threshold k is set. When the smaller of |h1| and |h2| is less than k, the smaller of Ind(h1|h2) and Ind(h2|h1) is taken as the degree of association between h1 and h2. When the smaller of |h1| and |h2| is greater than or equal to k, the larger of Ind(h1|h2) and Ind(h2|h1) is taken as the degree of association. Choosing the value according to how the second threshold compares with |h1| and |h2| avoids the unreasonable result that can arise from always taking the larger of Ind(h2|h1) and Ind(h1|h2): when the denominator in the formula is very small, the degree of association becomes very large, so that a Chinese medicine that itself occurs very rarely would have a very large degree of association with the other Chinese medicines in the same prescription.
In this embodiment, it follows from the degree of association formula that the degree of association is commutative: the degree of association of Chinese medicine h2 with Chinese medicine h1 equals the degree of association of Chinese medicine h1 with Chinese medicine h2.
In step S4, the Chinese medicine network is established. The network is in effect a data set that records the Chinese medicines in the prescription set and the relationships between them. Referring to Fig. 2, the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set. The network also includes edges, each connecting two nodes. The edges are generated by the following rule: when the degree of association between the Chinese medicines corresponding to any two nodes is greater than the preset first threshold, an edge connecting the two nodes is generated, and the degree of association between the two corresponding Chinese medicines is assigned to the edge as its weight; when the degree of association between the Chinese medicines corresponding to two nodes is not greater than the preset first threshold, there is no edge between the two nodes.
Preferably, the first threshold can be set to 0. As long as two Chinese medicines appear in the same prescription, the degree of association between them is greater than 0, so whenever two Chinese medicines appear in the same prescription there is an edge between their two corresponding nodes in the Chinese medicine network.
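The edge rule and the first threshold can be sketched as follows. The nested-dictionary graph layout, the function name `build_network` and the sample degrees of association are assumptions for illustration.

```python
from itertools import combinations

def build_network(herbs, association, first_threshold=0.0):
    """Undirected weighted network as {node: {neighbor: weight}}.

    association maps frozenset({a, b}) to the degree of association of a and b;
    pairs that are absent are treated as having a degree of association of 0.
    """
    graph = {h: {} for h in herbs}
    for a, b in combinations(herbs, 2):
        weight = association.get(frozenset((a, b)), 0.0)
        if weight > first_threshold:      # an edge exists only above the threshold
            graph[a][b] = weight          # same weight stored in both directions
            graph[b][a] = weight
    return graph

herbs = ["A", "B", "C"]
assoc = {frozenset(("A", "B")): 0.4, frozenset(("B", "C")): 0.1}
graph = build_network(herbs, assoc)
```

With the threshold at 0, "A" and "C" stay unconnected because they never share a prescription in this toy data; raising the threshold to 0.2 would also remove the edge between "B" and "C".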
In step S4, each edge in the Chinese medicine network can be given its weight by the following procedure. The degree of association between the Chinese medicines corresponding to every pair of nodes in the network is initialized to 0. All prescriptions in the prescription set are then traversed; each time a prescription is examined, the degree of association of each drug pair formed by pairing its Chinese medicines two by two is calculated according to steps S2 and S3, and the values obtained for the same drug pair across these executions of steps S2 and S3 are accumulated. Once all prescriptions have been examined, the final accumulated value is the degree of association of the drug pair, i.e. the weight of the edge between the two nodes corresponding to that pair in the Chinese medicine network.
The degree of association between two Chinese medicines calculated in step S3 is commutative, so the weight of the edge between two nodes in the Chinese medicine network is symmetric as well, and the edges of the network obtained in step S4 are undirected.
In step S5, starting from the original weights of the edges between nodes, the walk probability of each edge in the Chinese medicine network is calculated using a random walk algorithm, and each edge is assigned a new weight. Because the walk probability calculated by the random walk algorithm depends on which node is the start and which is the end, each edge in the re-weighted network has a direction, which converts the Chinese medicine network into a directed network.
In the present embodiment, the calculation formula used by the random walk algorithm in step S5 is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, $\sigma(Z)_j$ is the migration probability of the j-th edge connected to node Z in the Chinese medicine network, $Z_j$ is the weight of the j-th edge connected to node Z, i is the summation index, and K is the number of all edges connected to node Z in the Chinese medicine network.
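As a rough illustration of the softmax step, the sketch below normalizes the weights of the edges leaving each node. The nested-dict graph layout and the function name `migration_probabilities` are assumptions made for this example.

```python
import math


def migration_probabilities(graph):
    """Return a directed graph whose edge weights are softmax-normalized per start node."""
    directed = {}
    for node, neighbors in graph.items():
        if not neighbors:
            directed[node] = {}
            continue
        # Subtract the maximum weight before exponentiating for numerical stability;
        # this does not change the softmax result.
        top = max(neighbors.values())
        exps = {nbr: math.exp(w - top) for nbr, w in neighbors.items()}
        total = sum(exps.values())
        directed[node] = {nbr: e / total for nbr, e in exps.items()}
    return directed


undirected = {
    "A": {"B": 2.0, "C": 1.0},
    "B": {"A": 2.0},
    "C": {"A": 1.0},
}
probs = migration_probabilities(undirected)
print(probs["A"]["B"], probs["B"]["A"])
```

The edge between A and B has the same undirected weight in both directions, but the normalized value from A to B differs from the value from B to A because each node has a different set of neighbors. This is why the network becomes directed.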
Fig. 3 shows a Chinese medicine network with 4 nodes; before orientedization processing, each edge is undirected. After orientedization processing of the Chinese medicine network shown in Fig. 3, the network shown in Fig. 4 is obtained. The weight of each edge now depends on which of the two nodes it connects is the start and which is the end, so each edge has a direction.
In step S6, random walks are carried out in the Chinese medicine network after orientedization processing, according to the calculated migration probability of each edge, so as to obtain multiple Chinese medicine sequences. Step S6 specifically includes the following steps:
S601. Set the number of migrations corresponding to each node in the Chinese medicine network;
S602. Set the number of edges that each random walk passes through;
S603. According to the number of migrations, the number of edges passed through, and the migration probability of each edge, traverse all nodes in the Chinese medicine network, taking each node in turn as a starting point for random walks;
S604. Output the Chinese medicines corresponding to the nodes passed through in each random walk, in migration order, so as to obtain multiple Chinese medicine sequences.
Step S601 sets the total number of migrations to be carried out. In step S603, each migration starts from one node in the Chinese medicine network and stops after passing through the number of edges set in step S602; the edges passed through in that migration, and their directions, are determined at random according to the migration probabilities. In step S604, the Chinese medicines corresponding to the nodes passed through in each random walk are output in migration order. Each random walk yields one Chinese medicine sequence, so multiple random walks yield multiple Chinese medicine sequences.
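Steps S601 to S604 can be sketched as below. The parameter names `walks_per_node` and `edges_per_walk`, the fixed seed, and the early stop at a node with no outgoing edge are assumptions made for this illustration.

```python
import random


def generate_walks(directed, walks_per_node, edges_per_walk, seed=0):
    """Start `walks_per_node` walks from every node; each follows up to `edges_per_walk` edges."""
    rng = random.Random(seed)
    walks = []
    for start in directed:
        for _ in range(walks_per_node):
            walk = [start]
            current = start
            for _ in range(edges_per_walk):
                neighbors = directed[current]
                if not neighbors:  # no outgoing edge: stop this walk early
                    break
                targets = list(neighbors)
                weights = [neighbors[t] for t in targets]
                # Pick the next node at random, weighted by migration probability.
                current = rng.choices(targets, weights=weights, k=1)[0]
                walk.append(current)
            walks.append(walk)
    return walks


directed = {
    "A": {"B": 0.7, "C": 0.3},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
}
sequences = generate_walks(directed, walks_per_node=2, edges_per_walk=4)
for seq in sequences:
    print(" -> ".join(seq))
```

Each printed line is one Chinese medicine sequence in migration order. A walk over 4 edges visits 5 nodes, and a node may appear more than once in a sequence.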
The principle of step S6 is shown in Fig. 4 and Fig. 5. Fig. 4 shows a Chinese medicine network with 6 nodes, and Fig. 6 shows the Chinese medicine sequences obtained by random walks that start from Chinese medicine 1, Chinese medicine 2, and Chinese medicine 6 respectively and follow different migration routes.
In step S7, vectorization processing is performed on each Chinese medicine sequence to obtain multiple Chinese medicine vectors. Step S7 specifically includes the following steps:
S701. Input each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
S702. Receive the weights output by the hidden-layer neurons (Hidden Layer Linear Neurons) of the skip-gram model;
S703. Return the weights output by the hidden-layer neurons as Chinese medicine vectors.
The principle of the skip-gram model is shown in Fig. 7. The skip-gram model receives a document and predicts the context from the center word of the received document. The present embodiment uses this function of the skip-gram model by inputting the Chinese medicine sequences into it as documents. The hidden-layer neurons (Hidden Layer Linear Neurons) of the skip-gram model output the weight values corresponding to the received Chinese medicine sequences, and these weight values are the Chinese medicine vectors corresponding to the Chinese medicine sequences. The resulting Chinese medicine vectors can be used directly in step S8, or can be used in step S8 after other Chinese medicine property features have been incorporated.
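To make the "sequence as document" idea concrete, the sketch below lists the (center, context) pairs that a skip-gram model would be trained to predict from one Chinese medicine sequence. It does not train a model; the `window` parameter and the function name are assumptions, and in practice the vectors would come from the hidden-layer weights learned by a Word2vec implementation on these pairs.

```python
def skipgram_pairs(sequences, window=2):
    """Return (center, context) pairs, the training examples a skip-gram model predicts."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a position is never its own context
                    pairs.append((center, seq[j]))
    return pairs


sequences = [["A", "B", "C", "A"]]
pairs = skipgram_pairs(sequences, window=1)
print(pairs)
```

Chinese medicines that often appear near each other in the random-walk sequences share many context pairs, so a model trained on them tends to place their vectors close together.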
In step S8, each Chinese medicine vector is processed using a clustering algorithm. Step S8 specifically includes:
S801. Set the categories for the Fuzzy C-Means clustering algorithm, each category having its own corresponding third threshold;
S802. Input each Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receive the category probabilities output by the algorithm for each Chinese medicine vector;
S803. When a category probability reaches the corresponding third threshold, classify the Chinese medicine corresponding to that Chinese medicine vector into the category corresponding to that third threshold.
The clustering algorithm used in step S8 is the Fuzzy C-Means clustering algorithm. When using it, the required categories are set first, together with a third threshold for each category. After receiving the Chinese medicine vectors, the Fuzzy C-Means clustering algorithm outputs the category probabilities corresponding to these vectors; when a category probability reaches a given third threshold, the Chinese medicine corresponding to that vector is classified into the category corresponding to that threshold. After all Chinese medicine vectors have been clustered, every category set in the Fuzzy C-Means clustering algorithm has its corresponding Chinese medicines, i.e., the Chinese medicines have been classified into the categories. In step S9, the Chinese medicines classified into the same category are output as a Chinese medicine community. These Chinese medicine communities reflect the medical information hidden in the prescriptions and can be used for further research or experiments, so as to arrive at new prescriptions that work well.
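The thresholding in steps S801 to S803 can be sketched as follows. This example assumes the cluster centers are already known and computes only the standard Fuzzy C-Means membership values for them (a full implementation would also update the centers iteratively); the fuzzifier `m = 2`, the toy 2-D vectors, and the threshold values are all hypothetical.

```python
import math


def memberships(vector, centers, m=2.0):
    """Fuzzy C-Means membership of one vector in each cluster, for fixed centers."""
    dists = [math.dist(vector, c) for c in centers]
    # A vector lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def assign_communities(vectors, centers, thresholds, m=2.0):
    """Put each medicine in every category whose membership reaches that category's threshold."""
    communities = [[] for _ in centers]
    for herb, vector in vectors.items():
        for idx, u in enumerate(memberships(vector, centers, m)):
            if u >= thresholds[idx]:
                communities[idx].append(herb)
    return communities


vectors = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 0.0)}
centers = [(0.0, 0.0), (2.0, 0.0)]
communities = assign_communities(vectors, centers, thresholds=[0.4, 0.4])
print(communities)
```

Medicine C lies midway between the two centers, so its membership is 0.5 in each category and it is placed in both. This is how per-category thresholds on fuzzy memberships allow overlapping communities.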
With suitable settings of the Fuzzy C-Means clustering algorithm, experimental verification shows that the method of the present embodiment can achieve the following effect: for prescriptions for treating prostatosis that include Chinese medicines such as tortoise plastron, rhizoma anemarrhenae, peach kernel, radix fici simplicissimae, the root of three-nerved spicebush, herba taxilli, and lychee exocarp, tortoise plastron and rhizoma anemarrhenae are classified into one category, and radix fici simplicissimae, the root of three-nerved spicebush, and herba taxilli are classified into another. According to traditional Chinese medicine theory and further research, tortoise plastron and rhizoma anemarrhenae are the core medicines with a prostatosis therapeutic effect in these prescriptions. Although radix fici simplicissimae, the root of three-nerved spicebush, and herba taxilli have no obvious prostatosis therapeutic effect, they form a drug combination used according to the patient's constitution and likewise have medical value. The method of the present embodiment therefore uncovers the Chinese medicine compatibility of radix fici simplicissimae, the root of three-nerved spicebush, and herba taxilli.
Steps S1-S9 in the present embodiment constitute a Graph Embedding method, which has the following advantages:
1. The relationship between two Chinese medicines is measured by the dependency degree, and the dependency degree is used for the edge weights in the Chinese medicine network, which increases the information content of the Chinese medicine network;
2. According to the edge weights in the Chinese medicine network, the Chinese medicine network is converted into a probabilistic directed graph using the softmax formula, so that random walks can be carried out on this basis;
3. The random walk carried out on the probabilistic directed graph is a weighted DeepWalk. Compared with basic DeepWalk, it takes edge weights into account; it outputs Chinese medicine sequences from which Chinese medicine vectors are derived, and these vectors can readily incorporate other Chinese medicine property features;
4. The Chinese medicine vectors are processed using the Fuzzy C-Means clustering algorithm, which makes it possible to find overlapping medicine communities and to discover potential medicine combinations;
5. Obtaining the medicine vectors through the Graph Embedding technique has lower computational complexity than existing algorithms based on association rules or community discovery, and can achieve higher computational efficiency.
The present embodiment further includes a Chinese medicine community information generation system, comprising:
a prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, each composed of corresponding Chinese medicines;
a dependency degree calculation module, for calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
an association degree calculation module, for calculating the degree of association between the corresponding Chinese medicines according to the dependency degrees;
a Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two of the nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of the edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
an orientedization processing module, for calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, so as to perform orientedization processing on the Chinese medicine network;
a random walk module, for carrying out random walks in the Chinese medicine network after orientedization processing, according to the calculated migration probability of each edge, so as to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed through in a random walk;
a vectorization processing module, for performing vectorization processing on each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
a cluster module, for processing each Chinese medicine vector using a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
an output module, for outputting the Chinese medicines classified into the same category as a Chinese medicine community.
Each module in the Chinese medicine community information generation system can be a hardware module or a software module with the corresponding function.
The invention also includes a Chinese medicine community information generating device, comprising a memory and a processor, wherein the memory is used to store at least one program and the processor is used to load the at least one program to execute the method of the present invention.
The invention also includes a storage medium storing processor-executable instructions which, when executed by a processor, are used to perform the method of the present invention.
The Chinese medicine community information generation system, device, and storage medium of the present embodiment can execute the Chinese medicine community information generation method of the present invention and any combination of the implementation steps of the method embodiment, and have the corresponding functions and beneficial effects of the method.
The above describes preferred implementations of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.
Claims (10)
1. A Chinese medicine community information generation method, characterized by comprising the following steps:
establishing a prescription set; the prescription set includes multiple prescriptions, each composed of corresponding Chinese medicines;
calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
calculating, according to the dependency degrees, the degree of association between the corresponding Chinese medicines;
establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two of the nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of the edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, so as to perform orientedization processing on the Chinese medicine network;
carrying out random walks in the Chinese medicine network after orientedization processing, according to the calculated migration probability of each edge, so as to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed through in a random walk;
performing vectorization processing on each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
processing each Chinese medicine vector using a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
outputting the Chinese medicines classified into the same category as a Chinese medicine community.
2. The Chinese medicine community information generation method according to claim 1, characterized in that the calculation formula of the dependency degree is:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1, h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1, h2)_i.length is the number of Chinese medicines contained in prescription f(h1, h2)_i.
3. The Chinese medicine community information generation method according to claim 2, characterized in that the calculation formula of the degree of association is:
In the formula, the result is the degree of association between Chinese medicine h1 and Chinese medicine h2, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
4. The Chinese medicine community information generation method according to claim 1, characterized in that the calculation formula used by the random walk algorithm is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, $\sigma(Z)_j$ is the migration probability of the j-th edge connected to node Z in the Chinese medicine network, $Z_j$ is the weight of the j-th edge connected to node Z, i is the summation index, and K is the number of all edges connected to node Z in the Chinese medicine network.
5. The Chinese medicine community information generation method according to claim 1, characterized in that the step of carrying out random walks in the Chinese medicine network after orientedization processing, according to the calculated migration probability of each edge, so as to obtain multiple Chinese medicine sequences, specifically includes:
setting the number of migrations corresponding to each node in the Chinese medicine network;
setting the number of edges that each random walk passes through;
according to the number of migrations, the number of edges passed through, and the migration probability of each edge, traversing all nodes in the Chinese medicine network, taking each node in turn as a starting point for random walks;
outputting the Chinese medicines corresponding to the nodes passed through in each random walk, in migration order, so as to obtain multiple Chinese medicine sequences.
6. The Chinese medicine community information generation method according to claim 1, characterized in that the step of performing vectorization processing on each Chinese medicine sequence to obtain multiple Chinese medicine vectors specifically includes:
inputting each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
receiving the weights output by the hidden-layer neurons (Hidden Layer Linear Neurons) of the skip-gram model;
returning the weights output by the hidden-layer neurons as Chinese medicine vectors.
7. The Chinese medicine community information generation method according to claim 1, characterized in that the step of processing each Chinese medicine vector using a clustering algorithm specifically includes:
setting the categories for the Fuzzy C-Means clustering algorithm, each category having its own corresponding third threshold;
inputting each Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receiving the category probabilities output by the Fuzzy C-Means clustering algorithm for each Chinese medicine vector;
when a category probability reaches the corresponding third threshold, classifying the Chinese medicine corresponding to that Chinese medicine vector into the category corresponding to that third threshold.
8. A Chinese medicine community information generation system, characterized by comprising:
a prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, each composed of corresponding Chinese medicines;
a dependency degree calculation module, for calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
an association degree calculation module, for calculating the degree of association between the corresponding Chinese medicines according to the dependency degrees;
a Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two of the nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of the edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
an orientedization processing module, for calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, so as to perform orientedization processing on the Chinese medicine network;
a random walk module, for carrying out random walks in the Chinese medicine network after orientedization processing, according to the calculated migration probability of each edge, so as to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed through in a random walk;
a vectorization processing module, for performing vectorization processing on each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
a cluster module, for processing each Chinese medicine vector using a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
an output module, for outputting the Chinese medicines classified into the same category as a Chinese medicine community.
9. A Chinese medicine community information generating device, characterized by comprising a memory and a processor, wherein the memory is used to store at least one program and the processor is used to load the at least one program to execute the method of any one of claims 1-7.
10. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, are used to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910104918.3A CN110010251B (en) | 2019-02-01 | 2019-02-01 | Traditional Chinese medicine community information generation method, system, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110010251A (en) | 2019-07-12 |
CN110010251B (en) | 2022-04-15 |
Family
ID=67165631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910104918.3A Active CN110010251B (en) | 2019-02-01 | 2019-02-01 | Traditional Chinese medicine community information generation method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110010251B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101615222A (en) * | 2008-06-23 | 2009-12-30 | 中国医学科学院放射医学研究所 | A kind of Chinese prescription designing technique based on the Chinese medicine effective component group |
CN104820775A (en) * | 2015-04-17 | 2015-08-05 | 南京大学 | Discovery method of core drug of traditional Chinese medicine prescription |
CN106126649A (en) * | 2016-06-24 | 2016-11-16 | 北京千安哲信息技术有限公司 | A kind of similar Chinese crude drug method for digging and device |
CN107519262A (en) * | 2017-10-14 | 2017-12-29 | 杜运升 | One kind is promoted the sexual maturity scattered medicine and preparation method and application |
CN108037093A (en) * | 2017-12-20 | 2018-05-15 | 荣贵福 | It is a kind of differentiate " Baizhi to be measured whether the method for carrying out sulfur fumigation |
CN108647236A (en) * | 2018-03-30 | 2018-10-12 | 山东管理学院 | A kind of prescriptions of traditional Chinese medicine vector space model method and device based on Term co-occurrence |
Non-Patent Citations (1)
Title |
---|
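Yan Hongmei et al.: "Discussion of a research model for Chinese medicine based on Chinese medicine components and the 'component structure' theory", Chinese Traditional and Herbal Drugs (《中草药》) * |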
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706789A (en) * | 2019-10-10 | 2020-01-17 | 电子科技大学 | Excavation method for incompatibility of traditional Chinese medicines |
CN110706789B (en) * | 2019-10-10 | 2022-05-24 | 电子科技大学 | Excavation method for incompatibility of traditional Chinese medicines |
WO2022179384A1 (en) * | 2021-02-26 | 2022-09-01 | 山东英信计算机技术有限公司 | Social group division method and division system, and related apparatuses |
Also Published As
Publication number | Publication date |
---|---|
CN110010251B (en) | 2022-04-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||