CN110010251A - A kind of Chinese medicine community information generation method, system, device and storage medium - Google Patents

A kind of Chinese medicine community information generation method, system, device and storage medium

Info

Publication number
CN110010251A
Authority
CN
China
Prior art keywords
chinese medicine
prescription
network
degree
random walk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910104918.3A
Other languages
Chinese (zh)
Other versions
CN110010251B (en)
Inventor
赵淦森
王剑飞
黎子靖
庄序填
王桂兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University
Priority to CN201910104918.3A
Publication of CN110010251A
Application granted
Publication of CN110010251B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references
    • G16H70/40: ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Toxicology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Chemical & Material Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method, system, device and storage medium for generating Chinese medicine community information. The method comprises: establishing a prescription set; calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set; calculating the degree of association between each corresponding pair of Chinese medicines; establishing a Chinese medicine network; calculating the migration probability of each edge in the Chinese medicine network; performing random walks according to the calculated migration probabilities to obtain multiple Chinese medicine sequences; vectorizing each Chinese medicine sequence; and outputting the Chinese medicines assigned to the same category as a Chinese medicine community. The invention can discover overlapping drug communities and potential drug compatibilities. Compared with existing algorithms based on association rules or community detection, it has lower computational complexity and can achieve higher computational efficiency. The invention is widely applicable in the technical field of pharmacoinformatics.

Description

A kind of Chinese medicine community information generation method, system, device and storage medium
Technical field
The present invention relates to the technical field of pharmacoinformatics, and in particular to a method, system, device and storage medium for generating Chinese medicine community information.
Background art
Traditional Chinese medicine (TCM) is a quintessential part of Chinese culture. Classical TCM prescriptions embody the essence of TCM theory, have stood the test of practice, and have great medical research value. TCM theory emphasizes drug compatibility: a herbal prescription is usually composed of several Chinese medicines, and its medical function derives both from those constituent drugs and from the relationships among them. One research direction for TCM prescriptions is therefore the compatibility relationship between Chinese medicines, with the aim of processing existing prescription information to output entirely new drug combinations that follow specific combination rules, and thereby discovering new prescriptions with better curative effect.
Existing techniques for mining new prescriptions are mainly based on traditional association rules or community detection algorithms, and both have obvious drawbacks. Association rule methods use only the co-occurrence frequency and count of drugs in prescriptions to measure how closely drugs are related. Although they can reflect some common drug pairing rules to a certain extent, relying purely on co-occurrence often ignores the complexity of Chinese medicine pairing. In drug compatibility, two drugs may stand in any of six relationships: mutual reinforcement, mutual assistance, mutual restraint, mutual detoxication, mutual inhibition and opposition. Uncovering the pairing principles behind a prescription is therefore difficult with association rule methods. Methods based on community detection also have limitations. Non-overlapping community detection algorithms can discover part of the knowledge in TCM theory but ignore the overlapping nature of drug communities. Moreover, the properties of Chinese medicines and the relationships between drugs are complex, and community detection algorithms often fail to make full use of clinical diagnosis and treatment data, so that some valuable Chinese medicine data are not used effectively and the complex relationships among Chinese medicines are hard to reveal.
Explanation of terms:
Graph Embedding: Graph Embedding is a class of models that combines the graph analysis problem (graph analytics) with the representation learning problem (representation learning). The purpose of graph analytics is to mine useful, valuable information from a graph. Representation learning converts data into vector representations, so that mature data mining algorithms, such as classification, prediction and clustering algorithms, can more easily extract the useful information in the data. The goal of a Graph Embedding model is to combine the two: to learn from graph data a vector representation that preserves the useful information in the graph (such as graph structure information and the association information between graph nodes).
Random walk (random walk): the basic idea of random-walk-based Graph Embedding methods is to sample a set of paths from the graph and then learn feature vector representations of the nodes or edges in the graph from the sampled paths. Because the graph can be represented by the sampled paths, it is in effect converted into a "document" made up of nodes, so Word Embedding methods, of which word2vec is representative, can be applied. The first Graph Embedding method proposed on the basis of the random walk idea was DeepWalk, which combines random walks with Word2Vec.
Fuzzy clustering: fuzzy cluster analysis is a mathematical method that uses the language of fuzzy mathematics to describe and classify things according to certain requirements. It generally constructs a fuzzy matrix from the attributes of the objects under study and determines the clustering relationship on that basis according to degrees of membership; that is, it uses fuzzy mathematics to determine quantitatively the fuzzy relationships between samples, so that clustering is objective and accurate. Clustering divides a data set into multiple classes or clusters so that differences between data in different classes are as large as possible and differences between data within a class are as small as possible, i.e. the principle of "minimizing similarity between classes and maximizing similarity within classes".
Summary of the invention
In order to solve the above technical problems, the object of the present invention is to provide a method, system, device and storage medium for generating Chinese medicine community information.
In one aspect, the present invention includes a Chinese medicine community information generation method, comprising the following steps:
Establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
Calculating, for each Chinese medicine in the prescription set, its dependency degree on each of the other Chinese medicines in the prescription set;
Calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
Establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
Calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
Performing random walks in the directed Chinese medicine network according to the calculated migration probabilities of the edges, thereby obtaining multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during one random walk;
Vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
Processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm assigns the Chinese medicine corresponding to each Chinese medicine vector to a corresponding category;
Outputting the Chinese medicines assigned to the same category as a Chinese medicine community.
Further, the calculation formula of the dependency degree is as follows:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i.
Further, the calculation formula of the degree of association is as follows:
In the formula, the degree of association between Chinese medicine h1 and Chinese medicine h2 is expressed in terms of the following quantities: Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
Further, the calculation formula used by the random walk algorithm is the following softmax function:
$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$
In the formula, σ(Z)_j is the migration probability of the j-th edge connected to node Z in the Chinese medicine network, Z_j is the weight of the j-th edge connected to node Z, i is an index, and K is the number of all edges connected to node Z in the Chinese medicine network.
Further, the step of performing random walks in the directed Chinese medicine network according to the calculated migration probabilities of the edges, thereby obtaining multiple Chinese medicine sequences, specifically includes:
Setting the number of walks to be started from each node in the Chinese medicine network;
Setting the number of edges that each random walk passes through;
Traversing all nodes in the Chinese medicine network and, taking each node in turn as the starting point, performing random walks according to the number of walks, the number of edges per walk and the migration probability of each edge;
Outputting, in walk order, the Chinese medicines corresponding to the nodes passed during each random walk, thereby obtaining multiple Chinese medicine sequences.
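The walk procedure in the steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `random_walks`, the parameters `walks_per_node` and `edges_per_walk`, the `seed` argument and the toy network with nodes A, B and C are all assumptions introduced here. The network is assumed to be stored as a dictionary that maps each node to its neighbours and the migration probability of the edge leading to each neighbour.

```python
import random
from typing import Dict, List, Optional

# Hypothetical directed network: node -> {neighbour: migration probability}.
toy_network: Dict[str, Dict[str, float]] = {
    "A": {"B": 0.3, "C": 0.7},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
}


def random_walks(directed: Dict[str, Dict[str, float]],
                 walks_per_node: int,
                 edges_per_walk: int,
                 seed: Optional[int] = None) -> List[List[str]]:
    """Start `walks_per_node` walks from every node; each walk crosses at most
    `edges_per_walk` edges, chosen according to the migration probabilities."""
    rng = random.Random(seed)
    sequences: List[List[str]] = []
    for start in directed:
        for _ in range(walks_per_node):
            walk = [start]
            current = start
            for _ in range(edges_per_walk):
                neighbours = directed[current]
                if not neighbours:  # isolated node: the walk ends early
                    break
                current = rng.choices(list(neighbours),
                                      weights=list(neighbours.values()),
                                      k=1)[0]
                walk.append(current)
            sequences.append(walk)
    return sequences


sequences = random_walks(toy_network, walks_per_node=2, edges_per_walk=3, seed=0)
```

Each returned list is one Chinese medicine sequence in walk order. With 3 nodes and 2 walks per node the sketch yields 6 sequences of 4 nodes each.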
Further, the step of vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors specifically includes:
Inputting each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
Receiving the weights output by the hidden-layer neurons (Hidden Layer Linear Neurons) of the skip-gram model;
Returning the weights output by the hidden-layer neurons as the Chinese medicine vectors.
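To make the role of the sequences as "documents" concrete, the sketch below shows how one Chinese medicine sequence could be turned into the (center, context) pairs on which a skip-gram model is trained. It covers only pair generation; the training of the model and the extraction of hidden-layer weights are not shown. The function name `skipgram_pairs` and the `window` parameter are assumptions for illustration, since the text does not state a window size.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs: every item paired with each neighbour
    that lies at most `window` positions away in the same sequence."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# A hypothetical three-node sequence produced by one random walk.
pairs = skipgram_pairs(["A", "B", "C"], window=1)
# pairs == [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
```

Chinese medicines that often appear close together in the walks therefore share many training pairs, which is what pulls their vectors toward each other during training.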
Further, the step of processing each Chinese medicine vector with a clustering algorithm specifically includes:
Setting the categories of the Fuzzy C-Means clustering algorithm, each category having its own corresponding third threshold;
Inputting each Chinese medicine vector into the Fuzzy C-Means clustering algorithm and receiving the category probabilities that the algorithm outputs for each Chinese medicine vector;
When a category probability reaches the corresponding third threshold, assigning the Chinese medicine corresponding to that Chinese medicine vector to the category to which the third threshold corresponds.
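The thresholding step is what allows one Chinese medicine to fall into several communities at once. The sketch below illustrates it under stated assumptions: the cluster centers are taken as already known (a full Fuzzy C-Means run would estimate them iteratively), the fuzzifier `m = 2` is a common default rather than a value given in the text, and the names `memberships`, `assign`, the toy vectors and the thresholds are hypothetical. The membership formula is the standard Fuzzy C-Means one, u_c = 1 / Σ_k (d_c / d_k)^(2/(m-1)).

```python
import math
from typing import Dict, List, Sequence


def memberships(vector: Sequence[float],
                centers: Sequence[Sequence[float]],
                m: float = 2.0) -> List[float]:
    """Standard Fuzzy C-Means membership of one vector in every cluster."""
    dists = [math.dist(vector, c) for c in centers]
    if 0.0 in dists:  # the vector coincides with at least one center
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_c / d_k) ** power for d_k in dists) for d_c in dists]


def assign(vectors: Dict[str, Sequence[float]],
           centers: Sequence[Sequence[float]],
           thresholds: Sequence[float]) -> List[List[str]]:
    """Put a medicine into every category whose membership reaches that
    category's threshold, so categories may overlap."""
    communities: List[List[str]] = [[] for _ in centers]
    for name, vec in vectors.items():
        for c, u in enumerate(memberships(vec, centers)):
            if u >= thresholds[c]:
                communities[c].append(name)
    return communities


centers = [(0.0, 0.0), (2.0, 0.0)]                      # assumed, not learned here
vectors = {"A": (0.1, 0.0), "B": (1.0, 0.0), "C": (1.9, 0.0)}
communities = assign(vectors, centers, thresholds=[0.4, 0.4])
# communities == [["A", "B"], ["B", "C"]]
```

Medicine B lies midway between the two centers, has membership 0.5 in each, and therefore appears in both communities.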
In another aspect, the invention also includes a Chinese medicine community information generation system, comprising:
A prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
A dependency degree calculation module, for calculating, for each Chinese medicine in the prescription set, its dependency degree on each of the other Chinese medicines in the prescription set;
A degree of association calculation module, for calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
A Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
A directed conversion module, for calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
A random walk module, for performing random walks in the directed Chinese medicine network according to the calculated migration probabilities of the edges, thereby obtaining multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during one random walk;
A vectorization module, for vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
A clustering module, for processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm assigns the Chinese medicine corresponding to each Chinese medicine vector to a corresponding category;
An output module, for outputting the Chinese medicines assigned to the same category as a Chinese medicine community.
In another aspect, the invention also includes a Chinese medicine community information generation device, comprising a memory and a processor; the memory is used to store at least one program, and the processor is used to load the at least one program to execute the method of the present invention.
In another aspect, the invention also includes a storage medium storing processor-executable instructions which, when executed by a processor, are used to perform the method of the present invention.
The beneficial effects of the present invention are as follows. The invention measures the relationship between two Chinese medicines by the dependency degree and uses it as the weight of the edges in the Chinese medicine network, which increases the information content of the network. The Chinese medicine sequences obtained by random walks on the network, and the Chinese medicine vectors derived from them, are then processed by a clustering algorithm, so that overlapping drug communities and potential drug compatibilities can be discovered. Compared with existing algorithms based on association rules or community detection, the invention has lower computational complexity and can achieve higher computational efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the Chinese medicine community information generation method of the present invention;
Fig. 2 is a structure diagram of the Chinese medicine network in an embodiment of the present invention;
Fig. 3 is a structure diagram of a Chinese medicine network before directed conversion in an embodiment of the present invention;
Fig. 4 is a structure diagram of a Chinese medicine network after directed conversion in an embodiment of the present invention;
Fig. 5 is a structure diagram of a Chinese medicine network on which a random walk is performed in an embodiment of the present invention;
Fig. 6 is a diagram of the Chinese medicine sequences obtained by random walks in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the skip-gram model used in an embodiment of the present invention.
Detailed description of the embodiments
The present invention includes a Chinese medicine community information generation method which, referring to Fig. 1, comprises the following steps:
S1. Establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
S2. Calculating, for each Chinese medicine in the prescription set, its dependency degree on each of the other Chinese medicines in the prescription set;
S3. Calculating, from the dependency degrees, the degree of association between each corresponding pair of Chinese medicines;
S4. Establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, a weighted edge connects the two nodes, and otherwise no edge connects them; the weight of an edge is equal to the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
S5. Calculating the migration probability of each edge in the Chinese medicine network using a random walk algorithm, thereby converting the Chinese medicine network into a directed network;
S6. Performing random walks in the directed Chinese medicine network according to the calculated migration probabilities of the edges, thereby obtaining multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during one random walk;
S7. Vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
S8. Processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm assigns the Chinese medicine corresponding to each Chinese medicine vector to a corresponding category;
S9. Outputting the Chinese medicines assigned to the same category as a Chinese medicine community.
In step S1, multiple prescriptions with good therapeutic effect are collected, for example by consulting distinguished veteran TCM doctors, consulting classical Chinese medicine texts or accessing Chinese medicine databases, to form the prescription set. Each prescription consists of one or more Chinese medicines. For example, a prescription for treating prostatosis includes tortoise plastron, rhizoma anemarrhenae, peach kernel, radix fici simplicissimae, root of three-nerved spicebush, herba taxilli, lychee exocarp and other drugs.
Different prescriptions may contain the same drug. For example, the above prescription for treating prostatosis contains radix fici simplicissimae, and a prescription for promoting qi circulation and removing dampness also contains radix fici simplicissimae. In the present embodiment, the drugs involved in the prescriptions are distinguished by their identity as a type of drug. For example, when the above prescription for treating prostatosis and the prescription for promoting qi circulation and removing dampness form a prescription set, the set contains both the radix fici simplicissimae in the former and the radix fici simplicissimae in the latter, and the present embodiment regards the two as the same Chinese medicine in the prescription set.
In step S2, the dependency degree of each Chinese medicine in the prescription set on each of the other Chinese medicines in the prescription set is calculated. For example, when the above prescription for treating prostatosis and the prescription for promoting qi circulation and removing dampness form the prescription set, then for the radix fici simplicissimae in the prostatosis prescription, its dependency degree is calculated with respect to the tortoise plastron, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp in that prescription and with respect to the other Chinese medicines in the other prescriptions; likewise, for the tortoise plastron in the prostatosis prescription, its dependency degree is calculated with respect to the radix fici simplicissimae, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp in that prescription and with respect to the other Chinese medicines in the other prescriptions.
In the present embodiment, the dependency degree is calculated in step S2 by the following formula:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i.
In the dependency degree formula of the present embodiment, each Chinese medicine is labeled according to its identity as a type of drug. For example, the radix fici simplicissimae in the prescription for treating prostatosis and the radix fici simplicissimae in the prescription for promoting qi circulation and removing dampness are both labeled h1. |h1| then denotes the number of occurrences of radix fici simplicissimae in the prescription set: if only the prescription for treating prostatosis and the prescription for promoting qi circulation and removing dampness contain radix fici simplicissimae, it occurs twice in the prescription set in total, so |h1| = 2. f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2. For example, suppose the prescription set contains a prescription for treating prostatosis and a prescription for tonifying the kidney, both of which contain radix fici simplicissimae and rhizoma anemarrhenae. If radix fici simplicissimae is labeled h1 and rhizoma anemarrhenae is labeled h2, then f(h1,h2)_1 can denote the prescription for treating prostatosis and f(h1,h2)_2 the prescription for tonifying the kidney. f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i. For example, in the present embodiment f(h1,h2)_1 denotes the prescription for treating prostatosis, whose complete formula is radix fici simplicissimae, rhizoma anemarrhenae, peach kernel, root of three-nerved spicebush, herba taxilli and lychee exocarp; the prescription contains 6 Chinese medicines in total, so f(h1,h2)_1.length = 6.
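The quantities named in the worked example can be collected with a few lines of Python. The sketch below only gathers the inputs of the dependency degree formula, namely |h1| and the values f(h1,h2)_i.length; it does not combine them, because the formula itself is not reproduced in this text. The function names are assumptions, and the contents of the kidney-tonifying prescription are hypothetical, since the text states only that it contains radix fici simplicissimae and rhizoma anemarrhenae. Each prescription is assumed to list a given Chinese medicine at most once.

```python
from typing import Dict, List

# Toy prescription set. The first entry follows the six-drug formula quoted
# above; the second entry's third ingredient is a hypothetical filler.
PRESCRIPTIONS: Dict[str, List[str]] = {
    "treating prostatosis": [
        "radix fici simplicissimae", "rhizoma anemarrhenae", "peach kernel",
        "root of three-nerved spicebush", "herba taxilli", "lychee exocarp",
    ],
    "tonifying the kidney": [
        "radix fici simplicissimae", "rhizoma anemarrhenae", "tortoise plastron",
    ],
}


def occurrences(herb: str, prescriptions: Dict[str, List[str]]) -> int:
    """|h|: the number of prescriptions in which the medicine appears."""
    return sum(1 for herbs in prescriptions.values() if herb in herbs)


def co_prescription_lengths(h1: str, h2: str,
                            prescriptions: Dict[str, List[str]]) -> List[int]:
    """f(h1,h2)_i.length for every prescription that contains both medicines."""
    return [len(herbs) for herbs in prescriptions.values()
            if h1 in herbs and h2 in herbs]


h1, h2 = "radix fici simplicissimae", "rhizoma anemarrhenae"
count_h1 = occurrences(h1, PRESCRIPTIONS)                 # |h1| == 2
lengths = co_prescription_lengths(h1, h2, PRESCRIPTIONS)  # [6, 3]
```

Here `lengths[0]` corresponds to f(h1,h2)_1.length = 6 from the example above.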
In the present embodiment, it follows from the dependency degree formula that the dependency degree is not commutative: the dependency degree of Chinese medicine h2 on Chinese medicine h1 and the dependency degree of Chinese medicine h1 on Chinese medicine h2 are usually unequal.
In step S3, the degree of association between each corresponding pair of Chinese medicines is calculated from the dependency degrees obtained in step S2. In the present embodiment, the degree of association is a parameter that reflects how strongly any two Chinese medicines in the prescription set are related. While the dependency degrees are being calculated in step S2, every two Chinese medicines in the prescription set have in effect already been paired into a drug pair. Step S3 can therefore directly take each drug pair formed in step S2, together with the dependency degrees of the two Chinese medicines in the pair, and calculate the degree of association of that drug pair.
In the present embodiment, the degree of association is calculated in step S3 by the following formula:
In the formula, the degree of association between Chinese medicine h1 and Chinese medicine h2 is expressed in terms of the following quantities: Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
In the present embodiment, either the dependency degree of Chinese medicine h1 on Chinese medicine h2 or the dependency degree of Chinese medicine h2 on Chinese medicine h1 is used directly as the degree of association between h1 and h2. Specifically, a second threshold k is set. When the smaller of |h1| and |h2| is less than k, the smaller of the two dependency degrees is taken as the degree of association between h1 and h2; when the smaller of |h1| and |h2| is greater than or equal to k, the larger of the two dependency degrees is taken. Setting the second threshold and choosing the value of the degree of association according to how the threshold compares with |h1| and |h2| avoids the unreasonable result that could arise from always taking the larger of Ind(h2|h1) and Ind(h1|h2): when the denominator in the formula is very small, the degree of association becomes very large, so that a Chinese medicine that itself occurs very rarely would have a very high degree of association with the other Chinese medicines in the same prescription.
In the present embodiment, it follows from the degree of association formula that the degree of association is commutative: the degree of association of Chinese medicine h2 with Chinese medicine h1 is equal to the degree of association of Chinese medicine h1 with Chinese medicine h2.
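The selection rule described for the degree of association can be written as a small Python function. This is a sketch of the rule as stated in the text; the function and parameter names are assumptions, and the dependency degrees passed in the usage lines are made-up numbers rather than values computed from real prescriptions.

```python
def association(ind_h1_on_h2: float, ind_h2_on_h1: float,
                count_h1: int, count_h2: int, k: int) -> float:
    """Degree of association between h1 and h2.

    If the rarer of the two medicines occurs fewer than k times, take the
    smaller dependency degree; otherwise take the larger one.
    """
    if min(count_h1, count_h2) < k:
        return min(ind_h1_on_h2, ind_h2_on_h1)
    return max(ind_h1_on_h2, ind_h2_on_h1)


# Hypothetical inputs: h1 occurs once, h2 occurs 40 times, threshold k = 5.
rare_pair = association(0.9, 0.1, count_h1=1, count_h2=40, k=5)     # 0.1
# Hypothetical inputs: both medicines occur at least k times.
common_pair = association(0.6, 0.3, count_h1=20, count_h2=40, k=5)  # 0.6
```

Swapping the roles of h1 and h2 swaps both pairs of arguments and leaves the result unchanged, which is the commutativity noted above.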
In step S4, the Chinese medicine network is established. The network is in effect a data set that records each Chinese medicine in the prescription set and the correlations among them. Referring to Fig. 2, the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set. The network also includes edges, each connecting two nodes, which are generated by the following rule: when the degree of association between the Chinese medicines corresponding to any two nodes is greater than the preset first threshold, an edge connecting the two nodes is generated, and the degree of association between the two corresponding Chinese medicines is assigned to the edge as its weight; when the degree of association does not exceed the preset first threshold, there is no edge between the two nodes.
Preferably, the first threshold can be set to 0. Because the degree of association between two Chinese medicines is greater than 0 whenever the two appear in the same prescription, this means that as long as two Chinese medicines appear in the same prescription, there is an edge between their two corresponding nodes in the Chinese medicine network.
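The edge generation rule can be sketched in Python as below. The adjacency-dictionary representation, the function name `build_network` and the example pairs A, B and C with their weights are assumptions made for illustration; the input is assumed to be a mapping from drug pairs to already computed degrees of association.

```python
from typing import Dict, Tuple


def build_network(assoc: Dict[Tuple[str, str], float],
                  first_threshold: float = 0.0) -> Dict[str, Dict[str, float]]:
    """Build an undirected weighted network as node -> {neighbour: weight}.

    Every medicine becomes a node; an edge is added only when the degree of
    association is strictly greater than the first threshold.
    """
    network: Dict[str, Dict[str, float]] = {}
    for (a, b), weight in assoc.items():
        network.setdefault(a, {})
        network.setdefault(b, {})
        if weight > first_threshold:
            network[a][b] = weight  # the same weight is stored in both
            network[b][a] = weight  # directions because the edge is undirected
    return network


# Hypothetical degrees of association for three medicines.
network = build_network({("A", "B"): 0.5, ("A", "C"): 0.0, ("B", "C"): 0.2})
# network == {"A": {"B": 0.5}, "B": {"A": 0.5, "C": 0.2}, "C": {"B": 0.2}}
```

With the preferred threshold of 0, the pair (A, C) gets no edge because its degree of association is not greater than 0, while both medicines still exist as nodes.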
In step S4, each edge in the Chinese medicine network can be assigned a weight as follows. The degree of association between the Chinese medicines corresponding to every pair of nodes is initialized to 0, and all prescriptions in the prescription set are then traversed. Each time a prescription is examined, steps S2 and S3 are used to calculate the degree of association of every medicine pair obtained by combining the Chinese medicines in that prescription two at a time, and the values obtained for the same medicine pair across the executions of steps S2 and S3 are accumulated. After all prescriptions have been examined, the final accumulated value is the degree of association of the medicine pair, i.e. the weight of the edge between the two corresponding nodes in the Chinese medicine network.
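The accumulation over prescriptions can be sketched as below. The per-prescription calculation of steps S2 and S3 is not reproduced here; it is represented by a hypothetical callable `pair_score(prescription, h1, h2)` supplied by the caller, and the function name `accumulate_edge_weights` is likewise an assumption made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def accumulate_edge_weights(prescriptions, pair_score):
    """Sum the per-prescription score of every medicine pair.

    prescriptions: iterable of lists of medicine names.
    pair_score:    hypothetical stand-in for steps S2 and S3; called as
                   pair_score(prescription, h1, h2) and returning a number.
    """
    weights = defaultdict(float)  # every pair implicitly starts at 0
    for prescription in prescriptions:
        # Sort the distinct medicines so each unordered pair has one key.
        medicines = sorted(set(prescription))
        for h1, h2 in combinations(medicines, 2):
            weights[(h1, h2)] += pair_score(prescription, h1, h2)
    return dict(weights)
```

Pairs that never occur together do not appear in the result, which corresponds to a degree of association that stays at its initial value of 0.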
The degree of association between two Chinese medicines calculated in step S3 is commutative, so the weight of the edge between two nodes in the Chinese medicine network is also commutative; the edges of the network obtained in step S4 therefore carry only a scalar weight and no direction. In step S5, starting from the original weights of the edges, a Random Walk algorithm is used to calculate the migration probability of each edge in the Chinese medicine network, and this probability is assigned to the edge as its new weight. Because the migration probability calculated by the Random Walk algorithm depends on which node is the start and which is the end, each edge has a direction after the weights are reassigned, which realizes the directed conversion of the Chinese medicine network.
In this embodiment, the calculation formula used by the Random Walk algorithm in step S5 is the following softmax function:

$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$

In the formula, σ(Z)_j is the migration probability of the j-th edge connected to node Z in the Chinese medicine network, Z_j is the weight of the j-th edge connected to node Z, i is an index, and K is the number of edges connected to node Z in the Chinese medicine network.
Fig. 3 shows a Chinese medicine network with 4 nodes; before the directed conversion, every edge is undirected. After the directed conversion is applied to the network of Fig. 3, the network shown in Fig. 4 is obtained. The weight of each edge now depends on which of its two nodes is the start and which is the end, so each edge has a direction.
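The directed conversion can be illustrated by applying a softmax over the weights of the edges incident to each node. The sketch below assumes the nested-dictionary graph layout `{node: {neighbour: weight}}`; the function name `transition_probabilities` is hypothetical.

```python
import math


def transition_probabilities(graph):
    """Convert {node: {neighbour: weight}} into {node: {neighbour: probability}}."""
    directed = {}
    for node, neighbours in graph.items():
        if not neighbours:
            directed[node] = {}  # isolated node: nowhere to walk
            continue
        # Subtracting the largest weight before exponentiating keeps the
        # computation numerically stable and does not change the softmax.
        top = max(neighbours.values())
        exps = {nbr: math.exp(w - top) for nbr, w in neighbours.items()}
        total = sum(exps.values())
        directed[node] = {nbr: e / total for nbr, e in exps.items()}
    return directed
```

Because the normalization runs over the edges of the starting node only, the probability of walking from a to b generally differs from that of walking from b to a even though the undirected weight is the same. For a node with a single edge the probability is 1.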
In step S6, random walks are carried out in the Chinese medicine network after the directed conversion, according to the calculated migration probability of each edge, to obtain multiple Chinese medicine sequences. Step S6 specifically includes the following steps:
S601. Set the number of walks corresponding to each node in the Chinese medicine network;
S602. Set the number of edges traversed in each random walk;
S603. According to the number of walks, the number of edges traversed and the migration probability of each edge, traverse all nodes in the Chinese medicine network, taking each in turn as the starting point of a random walk;
S604. Output the Chinese medicines corresponding to the nodes passed in each random walk, in the order in which they were visited, to obtain multiple Chinese medicine sequences.
Step S601 sets the total number of walks to be carried out. In step S603, each walk starts from one node in the Chinese medicine network and stops after passing through the number of edges set in step S602; the edges passed in a walk are chosen at random according to the migration probability determined by each edge and its direction. In step S604, the Chinese medicines corresponding to the nodes passed in each random walk are output in the order visited. Each random walk yields one Chinese medicine sequence, so multiple random walks yield multiple Chinese medicine sequences.
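Steps S601 to S604 can be sketched as follows. The sketch assumes the migration probabilities are held as `{node: {neighbour: probability}}`; the function name `random_walks` and its parameters `walks_per_node`, `walk_length` and `seed` are hypothetical names introduced for the example.

```python
import random


def random_walks(probs, walks_per_node, walk_length, seed=None):
    """Generate medicine sequences by weighted random walks.

    probs:          {node: {neighbour: migration probability}}.
    walks_per_node: number of walks started from each node (S601).
    walk_length:    number of edges traversed per walk (S602).
    """
    rng = random.Random(seed)
    sequences = []
    for _ in range(walks_per_node):
        for start in probs:  # every node serves as a starting point (S603)
            walk = [start]
            current = start
            for _ in range(walk_length):
                neighbours = probs[current]
                if not neighbours:
                    break  # isolated node: the walk cannot continue
                current = rng.choices(
                    list(neighbours), weights=list(neighbours.values()), k=1
                )[0]
                walk.append(current)
            sequences.append(walk)  # nodes in visiting order (S604)
    return sequences
```

Each returned sequence contains `walk_length + 1` medicines unless the walk reaches a node without outgoing edges.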
The principle of step S6 is shown in Fig. 4 and Fig. 5. Fig. 4 is a Chinese medicine network with 6 nodes, and Fig. 6 shows the Chinese medicine sequences obtained by random walks along different routes starting from Chinese medicine 1, Chinese medicine 2 and Chinese medicine 6, respectively.
In step S7, each Chinese medicine sequence is vectorized to obtain multiple Chinese medicine vectors. Step S7 specifically includes the following steps:
S701. Input each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
S702. Receive the weights output by the hidden-layer linear neurons (Hidden Layer Linear Neurons) of the skip-gram model;
S703. Return the weights output by the hidden-layer neurons as the Chinese medicine vectors.
The principle of the skip-gram model is shown in Fig. 7. The skip-gram model receives a document and predicts the context from the center word of the received document. This embodiment makes use of this capability by inputting the Chinese medicine sequences into the skip-gram model as documents. The hidden-layer linear neurons of the skip-gram model output weight values corresponding to the received Chinese medicine sequences, and these weight values are the Chinese medicine vectors corresponding to the sequences. The resulting Chinese medicine vectors can be used directly in step S8, or can be used in step S8 after other Chinese medicine property features have been incorporated.
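To make the role of the hidden-layer weights concrete, the following is a deliberately small skip-gram sketch in pure Python. It is an assumption-laden illustration rather than the Word2vec implementation: it uses a full softmax output layer instead of the negative sampling or hierarchical softmax that Word2vec normally uses, and the function name `train_skipgram` and all hyperparameters are hypothetical.

```python
import math
import random


def train_skipgram(sequences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Train a tiny full-softmax skip-gram model and return {medicine: vector}."""
    rng = random.Random(seed)
    vocab = sorted({m for seq in sequences for m in seq})
    index = {m: i for i, m in enumerate(vocab)}
    n = len(vocab)
    # w_in holds the input-to-hidden weights: one row (vector) per medicine.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    w_out = [[0.0] * dim for _ in range(n)]  # hidden-to-output weights
    for _ in range(epochs):
        for seq in sequences:
            for pos, centre in enumerate(seq):
                lo, hi = max(0, pos - window), min(len(seq), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    target = index[seq[ctx_pos]]
                    # Linear hidden layer: simply the centre medicine's row.
                    hidden = w_in[index[centre]]
                    scores = [sum(h * o for h, o in zip(hidden, row)) for row in w_out]
                    top = max(scores)
                    exps = [math.exp(s - top) for s in scores]
                    total = sum(exps)
                    # Gradient of the cross-entropy loss w.r.t. each output score.
                    errors = [e / total - (1.0 if j == target else 0.0)
                              for j, e in enumerate(exps)]
                    grad_hidden = [sum(errors[j] * w_out[j][d] for j in range(n))
                                   for d in range(dim)]
                    for j in range(n):
                        for d in range(dim):
                            w_out[j][d] -= lr * errors[j] * hidden[d]
                    for d in range(dim):
                        hidden[d] -= lr * grad_hidden[d]  # updates w_in in place
    return {m: w_in[index[m]] for m in vocab}
```

The returned dictionary maps each medicine to its row of the input-to-hidden weight matrix, which plays the part of the Chinese medicine vector in step S703. In practice a library implementation of Word2vec would normally be used instead of this hand-written loop.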
In step S8, each Chinese medicine vector is processed with a clustering algorithm. Step S8 specifically includes:
S801. Set the categories for the Fuzzy C-Means clustering algorithm; each category has its own third threshold;
S802. Input each Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receive the category probabilities output by the algorithm for each Chinese medicine vector;
S803. When a category probability reaches the corresponding third threshold, classify the Chinese medicine corresponding to that Chinese medicine vector into the category corresponding to that third threshold.
The clustering algorithm used in step S8 is the Fuzzy C-Means clustering algorithm. When using it, the required categories are set first, together with a third threshold for each category. After receiving the Chinese medicine vectors, the Fuzzy C-Means clustering algorithm outputs the category probabilities corresponding to these vectors; when a category probability reaches a given third threshold, the Chinese medicine corresponding to that vector is classified into the category corresponding to that third threshold. Once all Chinese medicine vectors have been clustered, every category set in the Fuzzy C-Means clustering algorithm has its corresponding Chinese medicines, i.e. the Chinese medicines have been classified into the various categories. In step S9, the Chinese medicines classified into the same category are output as a Chinese medicine community. These communities reflect medical information hidden in the prescriptions and can be used for further research or experiments, so as to arrive at new, effective prescriptions.
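The clustering and threshold steps can be sketched as below. This is a minimal pure-Python Fuzzy C-Means written for illustration only: the function names `fuzzy_c_means` and `assign_communities`, the fuzzifier `m = 2`, the fixed iteration count and the random initialization are all assumptions, not details from the source.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, iterations=100, seed=0):
    """Return (memberships, centres); memberships[i][c] is the degree to
    which point i belongs to cluster c, and each row sums to 1."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(iterations):
        # Update centres as membership-weighted means of the points.
        centres = []
        for c in range(n_clusters):
            wts = [u[i][c] ** m for i in range(len(points))]
            total = sum(wts)
            centres.append([sum(w * p[d] for w, p in zip(wts, points)) / total
                            for d in range(dim)])
        # Update memberships from the distances to the new centres.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, centre), 1e-12) for centre in centres]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return u, centres


def assign_communities(names, memberships, thresholds):
    """Put a medicine into every category whose threshold its membership reaches."""
    communities = [[] for _ in thresholds]
    for name, row in zip(names, memberships):
        for c, (prob, threshold) in enumerate(zip(row, thresholds)):
            if prob >= threshold:
                communities[c].append(name)
    return communities
```

Because a medicine is added to every category whose threshold it reaches, the same medicine can appear in more than one community, which is how overlapping communities arise from fuzzy memberships.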
With the Fuzzy C-Means clustering algorithm suitably configured, experimental verification shows that the method of this embodiment can achieve the following effect: for a prescription for treating prostatosis that contains Chinese medicines such as tortoise plastron, rhizoma anemarrhenae, peach kernel, radix fici simplicissimae, the root of three-nerved spicebush, herba taxilli and lychee exocarp, tortoise plastron and rhizoma anemarrhenae are grouped into one category, and radix fici simplicissimae, the root of three-nerved spicebush and herba taxilli are grouped into another. According to traditional Chinese medicine theory and further research, tortoise plastron and rhizoma anemarrhenae are the core medicines in this prescription with a therapeutic effect on prostatosis. Although radix fici simplicissimae, the root of three-nerved spicebush and herba taxilli have no obvious therapeutic effect on prostatosis, they form a combination of medicines used according to the patient's physical condition and likewise have medicinal value. The method of this embodiment thus uncovers the compatibility of radix fici simplicissimae, the root of three-nerved spicebush and herba taxilli.
Steps S1-S9 of this embodiment constitute a Graph Embedding method, which has the following advantages:
1. The relationship between two Chinese medicines is measured by the dependency degree, and the dependency degree is used as the edge weight in the Chinese medicine network, which increases the information content of the network;
2. According to the edge weights in the Chinese medicine network, the softmax formula converts the network into a probabilistic directed graph, on which random walks can then be carried out;
3. The random walk carried out on the probabilistic directed graph is a weighted DeepWalk. Compared with basic DeepWalk, it takes edge weights into account; it outputs Chinese medicine sequences and, from them, Chinese medicine vectors, into which other Chinese medicine property features can readily be incorporated;
4. Processing the Chinese medicine vectors with the Fuzzy C-Means clustering algorithm makes it possible to find overlapping medicine communities and to discover potential medicine combinations;
5. Obtaining medicine vectors with the Graph Embedding technique has lower computational complexity than existing algorithms based on association rules or community detection, and can achieve higher computational efficiency.
This embodiment further includes a Chinese medicine community information generation system, comprising:
a prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
a dependency degree calculation module, for calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
an association degree calculation module, for calculating the degree of association between the corresponding Chinese medicines according to the dependency degrees;
a Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
a directed conversion module, for calculating the migration probability of each edge in the Chinese medicine network using a Random Walk algorithm, so as to carry out the directed conversion of the Chinese medicine network;
a random walk module, for carrying out random walks in the Chinese medicine network after the directed conversion, according to the calculated migration probability of each edge, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during a random walk;
a vectorization module, for vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
a clustering module, for processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
an output module, for outputting the Chinese medicines classified into the same category as a Chinese medicine community.
Each module of the Chinese medicine community information generation system may be a hardware module or a software module with the corresponding function.
The invention also includes a Chinese medicine community information generating device, comprising a memory and a processor; the memory is used to store at least one program, and the processor is used to load the at least one program to execute the method of the invention.
The invention also includes a storage medium storing processor-executable instructions which, when executed by a processor, are used to execute the method of the invention.
The Chinese medicine community information generation system, device and storage medium of this embodiment can execute the Chinese medicine community information generation method of the invention, can carry out any combination of the steps of the method embodiment, and have the corresponding functions and beneficial effects of the method.
The above describes preferred implementations of the invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of this application.

Claims (10)

1. A Chinese medicine community information generation method, characterized by comprising the following steps:
establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
calculating, according to the dependency degrees, the degree of association between the corresponding Chinese medicines;
establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
calculating the migration probability of each edge in the Chinese medicine network using a Random Walk algorithm, so as to carry out the directed conversion of the Chinese medicine network;
carrying out random walks in the Chinese medicine network after the directed conversion, according to the calculated migration probability of each edge, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during a random walk;
vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
outputting the Chinese medicines classified into the same category as a Chinese medicine community.
2. The Chinese medicine community information generation method according to claim 1, characterized in that the calculation formula of the dependency degree is:
In the formula, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, f(h1,h2)_i is the i-th prescription among the prescriptions that contain both Chinese medicine h1 and Chinese medicine h2, and f(h1,h2)_i.length is the number of Chinese medicines contained in prescription f(h1,h2)_i.
3. The Chinese medicine community information generation method according to claim 2, characterized in that the calculation formula of the degree of association is:
The formula gives the degree of association between Chinese medicine h1 and Chinese medicine h2; in it, Ind(h2|h1) is the dependency degree of Chinese medicine h2 on Chinese medicine h1, Ind(h1|h2) is the dependency degree of Chinese medicine h1 on Chinese medicine h2, |h1| is the number of occurrences of Chinese medicine h1 in the prescription set, |h2| is the number of occurrences of Chinese medicine h2 in the prescription set, and k is a preset second threshold.
4. The Chinese medicine community information generation method according to claim 1, characterized in that the calculation formula used by the Random Walk algorithm is the following softmax function:

$$\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{i=1}^{K} e^{Z_i}}$$

In the formula, σ(Z)_j is the migration probability of the j-th edge connected to node Z in the Chinese medicine network, Z_j is the weight of the j-th edge connected to node Z, i is an index, and K is the number of edges connected to node Z in the Chinese medicine network.
5. The Chinese medicine community information generation method according to claim 1, characterized in that the step of carrying out random walks in the Chinese medicine network after the directed conversion, according to the calculated migration probability of each edge, to obtain multiple Chinese medicine sequences specifically includes:
setting the number of walks corresponding to each node in the Chinese medicine network;
setting the number of edges traversed in each random walk;
according to the number of walks, the number of edges traversed and the migration probability of each edge, traversing all nodes in the Chinese medicine network, taking each in turn as the starting point of a random walk;
outputting the Chinese medicines corresponding to the nodes passed in each random walk, in the order in which they were visited, to obtain multiple Chinese medicine sequences.
6. The Chinese medicine community information generation method according to claim 1, characterized in that the step of vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors specifically includes:
inputting each Chinese medicine sequence as a document into the skip-gram model of the Word2vec algorithm;
receiving the weights output by the hidden-layer linear neurons (Hidden Layer Linear Neurons) of the skip-gram model;
returning the weights output by the hidden-layer neurons as the Chinese medicine vectors.
7. The Chinese medicine community information generation method according to claim 1, characterized in that the step of processing each Chinese medicine vector with a clustering algorithm specifically includes:
setting the categories for the Fuzzy C-Means clustering algorithm; each category has its own third threshold;
inputting each Chinese medicine vector into the Fuzzy C-Means clustering algorithm, and receiving the category probabilities output by the algorithm for each Chinese medicine vector;
when a category probability reaches the corresponding third threshold, classifying the Chinese medicine corresponding to that Chinese medicine vector into the category corresponding to that third threshold.
8. A Chinese medicine community information generation system, characterized by comprising:
a prescription set module, for establishing a prescription set; the prescription set includes multiple prescriptions, and each prescription is composed of its corresponding Chinese medicines;
a dependency degree calculation module, for calculating the dependency degree of each Chinese medicine in the prescription set on the other Chinese medicines in the prescription set;
an association degree calculation module, for calculating the degree of association between the corresponding Chinese medicines according to the dependency degrees;
a Chinese medicine network module, for establishing a Chinese medicine network; the Chinese medicine network includes multiple nodes in one-to-one correspondence with the Chinese medicines in the prescription set; when the degree of association between the Chinese medicines corresponding to any two nodes is greater than a preset first threshold, there is a weighted edge connecting the two nodes, and otherwise there is no edge connecting the two nodes; the weight of an edge equals the degree of association between the Chinese medicines corresponding to the two nodes that the edge connects;
a directed conversion module, for calculating the migration probability of each edge in the Chinese medicine network using a Random Walk algorithm, so as to carry out the directed conversion of the Chinese medicine network;
a random walk module, for carrying out random walks in the Chinese medicine network after the directed conversion, according to the calculated migration probability of each edge, to obtain multiple Chinese medicine sequences; each Chinese medicine sequence is composed of the Chinese medicines corresponding to the nodes passed during a random walk;
a vectorization module, for vectorizing each Chinese medicine sequence to obtain multiple Chinese medicine vectors;
a clustering module, for processing each Chinese medicine vector with a clustering algorithm; the clustering algorithm is used to classify the Chinese medicine corresponding to each Chinese medicine vector into a corresponding category;
an output module, for outputting the Chinese medicines classified into the same category as a Chinese medicine community.
9. A Chinese medicine community information generating device, characterized by comprising a memory and a processor, wherein the memory is used to store at least one program and the processor is used to load the at least one program to execute the method of any one of claims 1-7.
10. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, are used to execute the method of any one of claims 1-7.
CN201910104918.3A 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium Active CN110010251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910104918.3A CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910104918.3A CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN110010251A true CN110010251A (en) 2019-07-12
CN110010251B CN110010251B (en) 2022-04-15

Family

ID=67165631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910104918.3A Active CN110010251B (en) 2019-02-01 2019-02-01 Traditional Chinese medicine community information generation method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN110010251B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615222A (en) * 2008-06-23 2009-12-30 中国医学科学院放射医学研究所 A kind of Chinese prescription designing technique based on the Chinese medicine effective component group
CN104820775A (en) * 2015-04-17 2015-08-05 南京大学 Discovery method of core drug of traditional Chinese medicine prescription
CN106126649A (en) * 2016-06-24 2016-11-16 北京千安哲信息技术有限公司 A kind of similar Chinese crude drug method for digging and device
CN107519262A (en) * 2017-10-14 2017-12-29 杜运升 One kind is promoted the sexual maturity scattered medicine and preparation method and application
CN108037093A (en) * 2017-12-20 2018-05-15 荣贵福 It is a kind of differentiate " Baizhi to be measured whether the method for carrying out sulfur fumigation
CN108647236A (en) * 2018-03-30 2018-10-12 山东管理学院 A kind of prescriptions of traditional Chinese medicine vector space model method and device based on Term co-occurrence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
严红梅等: "基于中药组分和"组分结构"理论的中药研究模式的探讨", 《中草药》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706789A (en) * 2019-10-10 2020-01-17 电子科技大学 Excavation method for incompatibility of traditional Chinese medicines
CN110706789B (en) * 2019-10-10 2022-05-24 电子科技大学 Excavation method for incompatibility of traditional Chinese medicines
WO2022179384A1 (en) * 2021-02-26 2022-09-01 山东英信计算机技术有限公司 Social group division method and division system, and related apparatuses

Also Published As

Publication number Publication date
CN110010251B (en) 2022-04-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant