CN111180045B - Method for mining relation between drug pairs and efficacy from prescription information - Google Patents

Info

Publication number
CN111180045B
CN111180045B CN201911165949.6A CN201911165949A CN111180045B CN 111180045 B CN111180045 B CN 111180045B CN 201911165949 A CN201911165949 A CN 201911165949A CN 111180045 B CN111180045 B CN 111180045B
Authority
CN
China
Prior art keywords
prescription
efficacy
vocabulary
pair
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911165949.6A
Other languages
Chinese (zh)
Other versions
CN111180045A (en)
Inventor
张引
白宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201911165949.6A
Publication of CN111180045A
Application granted
Publication of CN111180045B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/90 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to alternative medicines, e.g. homeopathy or oriental medicines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for mining the relation between drug pairs and efficacy from prescription information. The method comprises the following steps: 1) Authoritative prescription information data is collected, comprising the efficacy and indications of each prescription and its traditional Chinese medicine composition. 2) The data is cleaned and structured to facilitate subsequent model training and information mining. 3) A data mining model is constructed and fitted to the samples, learning parameters with strong interpretability. 4) The interpretable parameters learned by the model are extracted and filtered to remove noise and retain drug pair information, thereby mining the relation between drug pairs and efficacy. The invention filters with a heuristic strategy that measures the degree of association between an efficacy and a drug pair by the efficacy prediction accuracy, which removes most invalid relations.

Description

Method for mining relation between drug pairs and efficacy from prescription information
Technical Field
The invention relates to forward (feed-forward) networks, interpretability theory in neural networks, and unsupervised training, and in particular to a method for mining the relation between drug pairs and efficacy from prescription information.
Background
A prescription is a recipe of medicines compiled for treating disease. In ancient China, diseases were long treated with single drugs. Through long-term medical practice, several medicines came to be combined and prepared as decoctions, which were the earliest prescriptions. A prescription therefore encodes the relation between traditional Chinese medicines and their efficacy.
A drug pair is a combination of two different drugs, and different drug pairs have different application value. Mining the relation between drug pairs and efficacy from prescriptions gives traditional Chinese medicine experts statistical support for analyzing the functions of different drug pairs.
Data mining, which is a hotspot problem in artificial intelligence and database field research, refers to a non-trivial process of revealing implicit, previously unknown and potentially valuable information from a vast amount of data in a database. The data mining is a decision support process, and is mainly based on artificial intelligence, machine learning, pattern recognition, statistics, databases, visualization technologies and the like, so that enterprise data is analyzed with high automation, inductive reasoning is made, potential patterns are mined from the data, a decision maker is helped to adjust market strategies, risks are reduced, and correct decisions are made. The knowledge discovery process consists of three phases: (1) preparing data; (2) data mining; (3) results are expressed and interpreted. Data mining may interact with users or knowledge bases.
Neural networks perform very well in supervised learning tasks, but their lack of interpretability limits their application in unsupervised settings. The invention uses a forward network, which has a relatively simple structure among neural networks, to preserve interpretability, and indirectly obtains information about a task B by training on a task A, thereby realizing unsupervised data mining.
The goal is to establish the correspondence between drug pairs and efficacy using authoritative prescription information data. The technical difficulties involved include: 1. the lack of annotated supervision data; 2. how to prevent overfitting; 3. how to reject erroneous predictions produced by training; 4. how to design the model.
Disclosure of Invention
The invention aims to use authoritative prescription information data to provide an interpretable neural network, a strategy for preventing overfitting, and a method for filtering invalid relations between drug pairs and efficacy, and finally to obtain the correspondence between drug pairs and efficacy.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method for mining the relation between drug pairs and efficacy from prescription information comprises the following steps:
1) Collecting authoritative prescription information data, and extracting prescription information through an OCR and rule method;
2) Extracting composition information of text types from the prescription information to generate structural prescription composition data corresponding to each prescription sample;
3) Extracting keywords in the prescription efficacy description text from all prescription information, and establishing a prescription efficacy vocabulary; determining prescription efficacy vocabulary vectors corresponding to each prescription sample according to prescription efficacy description texts and prescription efficacy vocabulary corresponding to each prescription sample, wherein the prescription efficacy vocabulary vectors corresponding to all the prescription samples form a prescription efficacy vocabulary matrix;
4) Pairing all the medicine components in the structural prescription composition data of all the prescription samples in pairs; in order to prevent overfitting, counting the occurrence frequency of each drug pair in all prescription samples, screening out high-frequency drug pairs to form a high-frequency drug pair table, and further reducing the size of an input vocabulary; according to the structural prescription composition data and the high-frequency medicine pair table corresponding to each prescription sample, a high-frequency medicine pair vector corresponding to each prescription sample is obtained, and the high-frequency medicine pair vectors corresponding to all prescription samples form a high-frequency medicine pair matrix;
5) Designing a model structure: to maintain the interpretability of the model, the model selects a single layer forward network, and model parameters will represent the degree of association between each drug pair vocabulary and each efficacy; training a single-layer forward network according to the prescription efficacy vocabulary matrix obtained in the step 3) and the high-frequency medicine pair matrix obtained in the step 4) to obtain a trained network model;
6) Extracting weight parameters from the trained model, and filtering according to a heuristic strategy, wherein the heuristic strategy is as follows: over all prescriptions containing drug pair A, calculate the accuracy with which efficacy B is predicted; the higher the accuracy, the greater the probability that drug pair A has efficacy B. The specific operation is as follows: inputting the high-frequency drug pair matrix obtained in step 4) into the trained network model to obtain a final predicted prescription efficacy vocabulary matrix; then, according to the final predicted prescription efficacy vocabulary matrix and the actual prescription efficacy vocabulary matrix obtained in step 3), calculating the prediction accuracy of each drug pair on the different efficacy words and ranking them, so as to obtain the relation between drug pairs and efficacy.
Preferably, a TCM prescription composition extraction tool is used to structurally extract the composition information of each prescription from the prescription information extracted in step 1).
Preferably, the step 3) specifically includes:
3.1 Extracting keywords in the prescription efficacy description text from the prescription information of all prescription samples, wherein the keywords comprise all four-character, three-character and two-character words, and each four-character word is split into two two-character words; counting the number of occurrences of each distinct word, and filtering out low-frequency words to obtain a prescription efficacy vocabulary containing p efficacy words;
3.2 According to the prescription efficacy description text corresponding to each prescription sample and the prescription efficacy vocabulary, determining the prescription efficacy vocabulary vector corresponding to each prescription sample, wherein the vectors of all prescription samples form a prescription efficacy vocabulary matrix $Y = \{y_{ij}\}$, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, p$, where n represents the number of prescription samples. The value of $y_{ij}$ is determined as follows: if the prescription efficacy description text corresponding to the i-th prescription sample contains the j-th efficacy word in the prescription efficacy vocabulary, then $y_{ij} = 1$; otherwise $y_{ij} = 0$.
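The construction of the matrix Y in step 3.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_efficacy_matrix`, the English efficacy terms and the tiny vocabulary are hypothetical, and NumPy is assumed as the array library.

```python
import numpy as np

def build_efficacy_matrix(sample_terms, vocab):
    """Return a 0/1 matrix with one row per prescription and one column per efficacy word."""
    index = {word: j for j, word in enumerate(vocab)}
    Y = np.zeros((len(sample_terms), len(vocab)), dtype=np.int8)
    for i, terms in enumerate(sample_terms):
        for term in terms:
            j = index.get(term)
            if j is not None:  # terms outside the vocabulary are ignored
                Y[i, j] = 1
    return Y

# Hypothetical vocabulary and efficacy keywords for two prescriptions.
vocab = ["relieve cough", "resolve phlegm", "clear heat"]
sample_terms = [
    ["relieve cough", "resolve phlegm"],
    ["clear heat", "detoxify"],  # "detoxify" is not in the vocabulary
]
Y = build_efficacy_matrix(sample_terms, vocab)
print(Y)
```

Terms that were filtered out of the vocabulary simply leave their row unchanged, which mirrors the rule that $y_{ij}$ is 1 only for words present in the vocabulary.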
Preferably, the step 4) specifically includes:
4.1 A preprocessing step for preventing overfitting, which keeps the model from fitting relations between low-frequency drug pairs and efficacy and thus avoids erroneous results, specifically: pairing all the drug components in the structured prescription composition data of all prescription samples two by two, counting the frequency of each drug pair over all prescription samples, and selecting the q most frequent drug pairs to form a high-frequency drug pair table, the low-frequency drug pairs being discarded.
4.2 According to the structured prescription composition data corresponding to each prescription sample and the high-frequency drug pair table, obtaining the high-frequency drug pair vector corresponding to each prescription sample, wherein the vectors of all prescription samples form a high-frequency drug pair matrix $X = \{x_{ij}\}$, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, q$, where n represents the number of prescription samples. The value of $x_{ij}$ is determined as follows: if the structured prescription composition data of the i-th prescription contains the j-th drug pair in the high-frequency drug pair table, then $x_{ij} = 1$; otherwise $x_{ij} = 0$.
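Step 4.2 can be illustrated with a short sketch. The function name `build_pair_matrix`, the sample compositions and the two-entry pair table are hypothetical; the sketch also assumes that each pair is stored as an alphabetically sorted tuple so that (A, B) and (B, A) map to the same column.

```python
from itertools import combinations
import numpy as np

def build_pair_matrix(compositions, pair_table):
    """Return a 0/1 matrix with one row per prescription and one column per high-frequency pair."""
    index = {pair: j for j, pair in enumerate(pair_table)}
    X = np.zeros((len(compositions), len(pair_table)), dtype=np.int8)
    for i, herbs in enumerate(compositions):
        # Sorting makes every pair appear in one canonical order.
        for pair in combinations(sorted(set(herbs)), 2):
            j = index.get(pair)
            if j is not None:  # low-frequency pairs are not in the table
                X[i, j] = 1
    return X

compositions = [
    ["ephedra", "cassia twig", "aconite"],
    ["cassia twig", "licorice"],
]
pair_table = [("aconite", "cassia twig"), ("cassia twig", "ephedra")]
X = build_pair_matrix(compositions, pair_table)
print(X)
```

The second prescription contains only a pair that is absent from the table, so its row stays all zeros.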
Preferably, the step 5) specifically comprises the following steps:
establishing a single-layer forward network, which preserves the interpretability of the neural network, and training it according to the prescription efficacy vocabulary matrix obtained in step 3) and the high-frequency drug pair matrix obtained in step 4), wherein the training formula is as follows:
$Y = W \cdot X + b$
wherein X represents the high-frequency drug pair matrix, Y represents the prescription efficacy vocabulary matrix, the training parameter W represents the degree of relation between drug pairs and efficacy, and b is the offset; after the first training is finished, the parameters $W_l$ and $b_l$ are obtained.
Inputting the high-frequency drug pair matrix X into the network model after the first training yields the predicted prescription efficacy vocabulary matrix $Y_l$:
$Y_l = W_l \cdot X + b_l$
The training loss function adopted in the training process is:
$\mathrm{loss} = -Y * \log Y_l$
After training is finished, the parameters $W'$ and $b'$ are obtained.
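A single-layer network of this kind can be sketched in plain NumPy. Several choices below are assumptions that the source does not state: a sigmoid is applied to the linear output so that predictions lie between 0 and 1, binary cross-entropy is used in place of the $-Y * \log Y_l$ form, plain gradient descent is the optimizer, and samples are stored as rows, so the linear map is written `X @ W + b` rather than $W \cdot X + b$. The names `train_single_layer`, `lr` and `epochs` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_layer(X, Y, lr=0.5, epochs=500, seed=0):
    """Fit W (pairs x efficacies) and b by gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, q = X.shape
    p = Y.shape[1]
    W = rng.normal(scale=0.01, size=(q, p))
    b = np.zeros(p)
    for _ in range(epochs):
        Y_hat = sigmoid(X @ W + b)
        # Gradient of the sample-averaged cross-entropy with respect to the logits.
        grad = (Y_hat - Y) / n
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy data: drug pair 0 always co-occurs with efficacy 0, drug pair 1 with efficacy 1.
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W, b = train_single_layer(X, Y)
Y_prob = sigmoid(X @ W + b)
print(np.round(W, 2))
```

Because there is no hidden layer, each entry `W[k, j]` can be read directly as the learned association between drug pair k and efficacy j, which is the interpretability property the text relies on.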
Preferably, the step 6) adopts a heuristic strategy to filter out invalid relations between drug pairs and efficacy, specifically:
6.1 Extracting $W'$ and $b'$ from the network model trained in step 5), and inputting the high-frequency drug pair matrix X into the trained network model to obtain the final predicted prescription efficacy vocabulary matrix $Y' = \{y'_{ij}\}$, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, p$, with the formula:
$Y' = W' \cdot X + b'$
Each entry of the final predicted prescription efficacy vocabulary matrix $Y'$ is a value between 0 and 1; by setting a threshold T, entries of $Y'$ smaller than T are recorded as 0 and entries larger than T are recorded as 1, which converts $Y'$ into a matrix $Y'' = \{y''_{ij}\}$ consisting of 0s and 1s, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, p$.
6.2 Traversing each drug pair, collecting all prescription samples that contain the drug pair, and computing a score for each predicted efficacy, with the specific formula:
$\mathrm{score} = \mathrm{correct} / (\mathrm{correct} + \mathrm{error})$
where correct denotes the number of correctly predicted prescriptions, counted as follows: if $y_{ij} = 1$ and $y''_{ij} = 1$, the sample is counted in correct; if $y_{ij} \neq y''_{ij}$ or $y_{ij} = y''_{ij} = 0$, it is not counted. error denotes the number of prediction errors, counted as follows: if $y_{ij} \neq y''_{ij}$, the sample is counted in error; otherwise it is not counted;
6.3 After the score between each drug pair and each efficacy is calculated, all efficacies of each drug pair are sorted by score to obtain a ranked list of the efficacies most relevant to the drug pair; the top-ranked efficacies are taken as the efficacies related to the drug pair, giving the relation between drug pairs and efficacy.
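Steps 6.1 and 6.2 can be sketched together. The function name `pair_efficacy_scores`, the toy matrices and the threshold value 0.5 are hypothetical; the sketch also assumes that a pair and efficacy combination with no counted samples receives a score of 0, which the source does not specify.

```python
import numpy as np

def pair_efficacy_scores(X, Y, Y_prob, threshold=0.5):
    """score[k, j] = correct / (correct + error) over prescriptions containing drug pair k."""
    Y_bin = (Y_prob > threshold).astype(int)  # thresholded predictions
    q, p = X.shape[1], Y.shape[1]
    scores = np.zeros((q, p))
    for k in range(q):
        rows = X[:, k] == 1  # prescriptions that contain pair k
        y_true, y_pred = Y[rows], Y_bin[rows]
        correct = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
        error = (y_true != y_pred).sum(axis=0)
        total = correct + error
        # Cases where label and prediction are both 0 are counted in neither term.
        scores[k] = np.divide(correct, total, out=np.zeros(p), where=total > 0)
    return scores

X = np.array([[1, 0], [1, 1], [0, 1]])
Y = np.array([[1, 0], [1, 1], [0, 1]])
Y_prob = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]])
scores = pair_efficacy_scores(X, Y, Y_prob)
print(scores)
```

In this toy input, drug pair 1 appears in the last two prescriptions; efficacy 1 is predicted correctly once and missed once, so its score is 0.5.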
Compared with the prior art, the invention has the beneficial effects that:
1) A strategy for preventing overfitting is proposed. Overfitting is prevented mainly by controlling the sizes of the input and output vocabularies (the input vocabulary is the drug pair vocabulary and the output vocabulary is the efficacy vocabulary), which reduces the number of model parameters. This strategy helps avoid learning poorly generalizing parameters during unsupervised learning.
2) An interpretable neural network is designed.
y′=Wx+b
The entries of the parameter matrix W represent the pairwise associations between efficacies and drug pairs. The single-layer design also reduces the model's tendency to overfit.
3) A method for filtering invalid relations between drug pairs and efficacy is provided. After the model is trained, the useful drug pair and efficacy relations must be identified from the learned parameters; the strategy provided by the invention measures the degree of association between an efficacy and a drug pair by the efficacy prediction accuracy, so that most invalid relations can be removed.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
The invention discloses a method for mining the relation between efficacy and drug pairs from prescription information, which comprises the following steps:
Step one: the prescription books used are mainly prescription dictionaries, which contain authoritative prescriptions in a uniform format; the data size is on the order of ten thousand entries, which meets the processing requirement. OCR is applied to the prescription dictionaries and similar sources to convert them into text. Structured-information extraction rules are built with Python regular expressions, and the extracted prescription information is stored, for example in a MySQL database, with data table fields including: prescription name, prescription composition, prescription efficacy and indications, prescription usage, prescription contraindications, etc.;
Step two: the prescription information is preliminarily cleaned. Some prescription records have incomplete fields; prescription data without efficacy and indication information is removed, leaving 15241 prescription samples in total. The text-type prescription composition description is converted into a structured composition using a TCM prescription composition extraction tool.
For example:
The text-type prescription composition is described as: dried rehmannia root, 8 liang; yam and dogwood, 4 liang each; alisma orientale, poria cocos and moutan bark, 3 liang each; cassia twig and processed aconite, 1 liang each.
The structured composition is: dried rehmannia root, yam, dogwood, alisma orientale, poria cocos, moutan bark, cassia twig and aconite.
Step three, constructing an efficacy vocabulary, and establishing a prescription efficacy vocabulary matrix:
Prescription efficacy descriptions have certain regularities. Frequent four-character phrases can be cut into two-character words, for example "antipruritic and insecticidal" or "heat-clearing and detoxifying". To avoid a huge vocabulary that would make the model output too sparse, each four-character phrase is split into two two-character words: antipruritic, insecticidal, heat-clearing, detoxifying. The remaining efficacy words are mostly three-character and two-character words, for example: regulating spleen and stomach, regulating large intestine, dispelling wind, and promoting bone union. The prescription efficacy vocabulary determines the output of the model, and to prevent overfitting it should be as small as possible. The frequency of every word is counted, and the 1621 most frequent words are kept as the prescription efficacy vocabulary, which keeps vocabulary construction complete, low-noise and labor-saving.
According to the prescription efficacy description text corresponding to each prescription sample and the prescription efficacy vocabulary, the prescription efficacy vocabulary vector corresponding to each prescription sample is determined, and the vectors of all prescription samples form a prescription efficacy vocabulary matrix $Y = \{y_{ij}\}$ of dimension $15241 \times 1621$, where $i = 0, 1, \dots, 15241$; $j = 0, 1, \dots, 1621$. The value of $y_{ij}$ is determined as follows: if the prescription efficacy description text corresponding to the i-th prescription sample contains the j-th efficacy word in the prescription efficacy vocabulary, then $y_{ij} = 1$; otherwise $y_{ij} = 0$.
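The splitting and frequency filtering described above can be sketched as follows. The function names `split_terms` and `build_vocabulary` and the three sample descriptions are hypothetical. The sketch uses Chinese terms because the split is character-based: 清热解毒 (heat-clearing and detoxifying) becomes 清热 and 解毒, while the two-character term 祛风 (dispelling wind) is kept as is. Counting every occurrence, rather than once per prescription, is an assumption.

```python
from collections import Counter

def split_terms(terms):
    """Split each four-character term into two two-character terms."""
    result = []
    for term in terms:
        if len(term) == 4:
            result.extend([term[:2], term[2:]])
        else:
            result.append(term)
    return result

def build_vocabulary(samples, size):
    """Keep the `size` most frequent terms across all prescription samples."""
    counts = Counter()
    for terms in samples:
        counts.update(split_terms(terms))
    return [term for term, _ in counts.most_common(size)]

samples = [["清热解毒"], ["清热", "祛风"], ["解毒"]]
vocabulary = build_vocabulary(samples, size=2)
print(vocabulary)
```

With `size=1621` and the full corpus, the same procedure would yield a vocabulary of the size reported in the text.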
Step four: a drug pair vocabulary is constructed, and low-frequency drug pairs are filtered out according to frequency statistics to obtain the high-frequency drug pair table. An example follows:
Prescription 1: dried rehmannia root, cassia twig, aconite root
Prescription 2: cassia twig, aconite root, licorice root
The drug pairs that appear are: (dried rehmannia root, cassia twig), (dried rehmannia root, aconite root), (cassia twig, aconite root), (cassia twig, licorice root), (aconite root, licorice root). Of these, (cassia twig, aconite root) appears in both prescriptions, so its frequency is 2.
The 1333 most frequent drug pairs are kept as the final high-frequency drug pair vocabulary, which keeps the model from fitting relations between low-frequency drug pairs and efficacy and thus avoids erroneous results.
According to the structured prescription composition data corresponding to each prescription sample and the high-frequency drug pair table, the high-frequency drug pair vector corresponding to each prescription sample is obtained, and the vectors of all prescription samples form a high-frequency drug pair matrix $X = \{x_{ij}\}$ of dimension $15241 \times 1333$, where $i = 0, 1, \dots, 15241$; $j = 0, 1, \dots, 1333$. The value of $x_{ij}$ is determined as follows: if the structured prescription composition data of the i-th prescription contains the j-th drug pair in the high-frequency drug pair table, then $x_{ij} = 1$; otherwise $x_{ij} = 0$.
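The pair counting in the two-prescription example of step four can be reproduced with a few lines of standard-library Python. The function name `count_pairs` is hypothetical, and storing each pair as an alphabetically sorted tuple is an assumption made so that the order of drugs within a prescription does not matter.

```python
from collections import Counter
from itertools import combinations

def count_pairs(prescriptions):
    """Count in how many prescriptions each unordered drug pair occurs."""
    counts = Counter()
    for herbs in prescriptions:
        counts.update(combinations(sorted(set(herbs)), 2))
    return counts

prescriptions = [
    ["dried rehmannia root", "cassia twig", "aconite root"],
    ["cassia twig", "aconite root", "licorice root"],
]
counts = count_pairs(prescriptions)
# Keep only the single most frequent pair; the text keeps the top 1333.
high_freq = [pair for pair, _ in counts.most_common(1)]
print(high_freq)
```

Each three-drug prescription contributes three pairs, and the one pair shared by both prescriptions ends up with a count of 2.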
Step five: in order to maintain the interpretability of the model, a single-layer forward neural network is built with Tensorflow and trained according to the prescription efficacy vocabulary matrix and the high-frequency drug pair matrix, wherein the training formula is as follows:
$Y = W \cdot X + b$
wherein X represents the high-frequency drug pair matrix, with dimension 1333; Y represents the prescription efficacy vocabulary matrix, with dimension 1621; the training parameter W represents the degree of relation between drug pairs and efficacy; and b is an offset whose function is to prevent weight bias caused by the differing frequencies of the efficacy labels. After the first training is finished, the parameters $W_l$ and $b_l$ are obtained.
The high-frequency drug pair matrix X is input into the network model after the first training to obtain the predicted prescription efficacy vocabulary matrix $Y_l$, with the prediction formula:
$Y_l = W_l \cdot X + b_l$
The neural network training loss function is the cross-entropy loss:
$\mathrm{loss} = -Y * \log Y_l$
After training is finished, the parameters $W'$ and $b'$ are obtained.
Step six: weight parameters are extracted from the trained model and filtered according to a heuristic strategy. The heuristic is: over all prescriptions containing drug pair A, calculate the accuracy with which efficacy B is predicted; the higher the accuracy, the greater the probability that drug pair A has efficacy B. The subsequent heuristic processing uses numpy as the data processing tool.
The prescriptions are labeled with the trained model: $W'$ and $b'$ are extracted from the trained network model, and the high-frequency drug pair matrix X, of dimension $15241 \times 1333$ (15241 is the number of prescription samples and 1333 is the number of drug pairs), is input into the trained network model to obtain the final predicted prescription efficacy vocabulary matrix $Y' = \{y'_{ij}\}$, $i = 0, 1, \dots, 15241$; $j = 0, 1, \dots, 1621$, where 1621 is the number of efficacy words.
The calculation formula is:
$Y' = W' \cdot X + b'$
An example follows:
Drug pairs: [(ephedra, cassia twig), (ephedra, aconite root), (cassia twig, aconite root)]
Prediction result: (relieving cough, resolving phlegm, clearing heat)
True label: (relieving cough, resolving phlegm, detoxification)
The accuracy of each drug pair on each predicted efficacy is then counted:
$\mathrm{score} = \mathrm{correct} / (\mathrm{correct} + \mathrm{error})$
correct is the number of correctly predicted prescriptions. Note that when the efficacy label is 0 and the prediction is also 0, the sample is not counted in correct, because labels of 0 are far more numerous and would bias the calculation. error is the number of prediction errors: a sample is counted in error when the efficacy label is 0 and the prediction is 1, or when the efficacy label is 1 and the prediction is 0. In the example above, correct for (ephedra, cassia twig) on "relieving cough" is 1, because the prediction of "relieving cough" agrees with the true label; error for (ephedra, cassia twig) on "detoxification" is 1, because "detoxification" was not predicted correctly.
All efficacies of each drug pair are ranked by score, and the top 5 are taken as the final efficacies of the drug pair.
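The worked example and the final top-5 ranking can be sketched as follows. The set arithmetic mirrors the counting rule for a single prescription; the function name `top_k_efficacies`, the three-word efficacy vocabulary and the score values are hypothetical and serve only to show the ranking step.

```python
import numpy as np

def top_k_efficacies(scores, efficacy_vocab, k=5):
    """For each drug pair (row), return the k efficacy words with the highest score."""
    order = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return [[efficacy_vocab[j] for j in row] for row in order]

# Counting rule applied to the single prescription in the example.
predicted = {"relieving cough", "resolving phlegm", "clearing heat"}
true_label = {"relieving cough", "resolving phlegm", "detoxification"}
correct_terms = predicted & true_label  # label 1 and prediction 1
error_terms = predicted ^ true_label    # label and prediction disagree

# Hypothetical scores of one drug pair over a three-word efficacy vocabulary.
efficacy_vocab = ["clearing heat", "relieving cough", "resolving phlegm"]
scores = np.array([[0.2, 0.9, 0.5]])
ranked = top_k_efficacies(scores, efficacy_vocab, k=2)
print(ranked)
```

Each drug pair in the prescription gains one count of correct for every term in `correct_terms` and one count of error for every term in `error_terms`; efficacy words in neither set leave both counters unchanged.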

Claims (6)

1. A method for mining the relation between drug pairs and efficacy from prescription information comprises the following steps:
1) Collecting authoritative prescription information data, and extracting prescription information through an OCR tool;
2) Extracting composition information of text types from the prescription information to generate structural prescription composition data corresponding to each prescription sample;
3) Extracting keywords in the prescription efficacy description text from all prescription information, and establishing a prescription efficacy vocabulary; determining prescription efficacy vocabulary vectors corresponding to each prescription sample according to prescription efficacy description texts and prescription efficacy vocabulary corresponding to each prescription sample, wherein the prescription efficacy vocabulary vectors corresponding to all the prescription samples form a prescription efficacy vocabulary matrix;
4) Pairing all the medicine components in the structural prescription composition data of all the prescription samples in pairs, counting the occurrence frequency of each medicine pair in all the prescription samples, and screening out high-frequency medicine pairs to form a high-frequency medicine pair table; according to the structural prescription composition data and the high-frequency medicine pair table corresponding to each prescription sample, a high-frequency medicine pair vector corresponding to each prescription sample is obtained, and the high-frequency medicine pair vectors corresponding to all prescription samples form a high-frequency medicine pair matrix;
5) Establishing a single-layer forward network, and training the single-layer forward network according to the prescription efficacy vocabulary matrix obtained in the step 3) and the high-frequency medicine pair matrix obtained in the step 4) to obtain a trained network model;
6) Inputting the high-frequency drug pair matrix obtained in step 4) into the trained network model to obtain a final predicted prescription efficacy vocabulary matrix; then, according to the final predicted prescription efficacy vocabulary matrix and the actual prescription efficacy vocabulary matrix obtained in step 3), calculating the prediction accuracy of each drug pair on the different efficacy words and ranking them, so as to obtain the relation between drug pairs and efficacy.
2. The method for mining drug pair and efficacy relation from prescription information according to claim 1, wherein in the step 2), the TCM prescription composition extraction tool is used to perform structural extraction of composition information of the prescription from the prescription information extracted in the step 1).
3. The method for mining drug pair and efficacy relationship from prescription information according to claim 1, wherein the step 3) specifically comprises:
3.1 Extracting keywords in the prescription efficacy description text from the prescription information of all prescription samples, wherein the keywords comprise all four-character, three-character and two-character words, and each four-character word is split into two two-character words; filtering out low-frequency words to obtain a prescription efficacy vocabulary containing p efficacy words;
3.2 According to the prescription efficacy description text corresponding to each prescription sample and the prescription efficacy vocabulary, determining the prescription efficacy vocabulary vector corresponding to each prescription sample, wherein the vectors of all prescription samples form a prescription efficacy vocabulary matrix $Y = \{y_{ij}\}$, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, p$, where n represents the number of prescription samples. The value of $y_{ij}$ is determined as follows: if the prescription efficacy description text corresponding to the i-th prescription sample contains the j-th efficacy word in the prescription efficacy vocabulary, then $y_{ij} = 1$; otherwise $y_{ij} = 0$.
4. The method for mining drug pair and efficacy relationship from prescription information according to claim 1, wherein the step 4) is specifically:
4.1 Pairing all the drug components in the structured prescription composition data of all prescription samples two by two, counting the frequency of each drug pair over all prescription samples, and selecting the q most frequent drug pairs to form a high-frequency drug pair table;
4.2 According to the structured prescription composition data corresponding to each prescription sample and the high-frequency drug pair table, obtaining the high-frequency drug pair vector corresponding to each prescription sample, wherein the vectors of all prescription samples form a high-frequency drug pair matrix $X = \{x_{ij}\}$, $i = 0, 1, \dots, n$; $j = 0, 1, \dots, q$, where n represents the number of prescription samples. The value of $x_{ij}$ is determined as follows: if the structured prescription composition data of the i-th prescription contains the j-th drug pair in the high-frequency drug pair table, then $x_{ij} = 1$; otherwise $x_{ij} = 0$.
5. The method for mining drug pair and efficacy relationship from prescription information according to claim 1, wherein the step 5) specifically comprises:
establishing a single-layer forward network, and training the single-layer forward network according to the prescription efficacy vocabulary matrix obtained in step 3) and the high-frequency drug pair matrix obtained in step 4), wherein the training formula is as follows:
Y=W·X+b
wherein X represents the high-frequency drug pair matrix, Y represents the prescription efficacy vocabulary matrix, the training parameter W represents the degree of relationship between drug pairs and efficacy, and b is an offset; the parameters $W_l$ and $b_l$ are obtained after the first training is finished;
inputting the high-frequency drug pair matrix X into the network model after the first training to obtain a predicted prescription efficacy vocabulary matrix $Y_l$, wherein the loss function adopted in the training process is:
$\mathrm{loss} = -Y * \log Y_l$
$Y_l = W_l \cdot X + b_l$
wherein the parameters $W'$ and $b'$ are obtained after training is finished.
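The training step of claim 5 can be sketched in plain Python as follows. The claim does not name an activation function or an optimizer, so this sketch assumes a softmax over the efficacy terms (which keeps predictions between 0 and 1 and makes the loss −Y·log Ŷ well defined) and full-batch gradient descent; `epochs` and `lr` are hypothetical hyperparameters. Rows are prescriptions, so the forward pass is written as x·W + b with W of shape q × p, which is the transpose of the W·X notation in the claim.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(X, W, b):
    """Forward pass: softmax(x @ W + b) for every prescription row x in X."""
    out = []
    for x in X:
        z = [b[j] + sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(b))]
        out.append(softmax(z))
    return out


def cross_entropy(Y, Y_hat, eps=1e-12):
    """loss = -sum(Y * log(Y_hat)), summed over all prescriptions and efficacy terms."""
    return -sum(y * math.log(p + eps)
                for row_y, row_p in zip(Y, Y_hat)
                for y, p in zip(row_y, row_p))


def train(X, Y, epochs=500, lr=0.5):
    """Fit W (q x p) and b (p) by full-batch gradient descent (assumed optimizer)."""
    n, q, p = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * p for _ in range(q)]
    b = [0.0] * p
    for _ in range(epochs):
        Y_hat = predict(X, W, b)  # predictions from the parameters at epoch start
        for x, y, s in zip(X, Y, Y_hat):
            total = sum(y)
            for j in range(p):
                # Gradient of -sum_j y_j log softmax(z)_j with respect to z_j.
                g = (s[j] * total - y[j]) / n
                b[j] -= lr * g
                for k in range(q):
                    W[k][j] -= lr * g * x[k]
    return W, b


if __name__ == "__main__":
    # Toy drug pair matrix X and efficacy matrix Y (illustrative data).
    X = [[1, 0], [0, 1], [1, 1]]
    Y = [[1, 0], [0, 1], [1, 1]]
    W, b = train(X, Y)
    print(cross_entropy(Y, predict(X, W, b)))
```

After training, each column of row k in `W` plays the role the claim assigns to W: a learned strength of association between drug pair k and each efficacy term.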
6. The method for mining drug pair and efficacy relationship from prescription information according to claim 5, wherein said step 6) specifically comprises:
6.1 Extracting the parameters $W'$ and $b'$ obtained after training is finished from the trained network model in step 5), and inputting the high-frequency drug pair matrix X into the trained network model to obtain a final predicted prescription efficacy vocabulary matrix $Y' = \{y'_{ij}\}$, $i = 0, 1, \ldots, n$; $j = 0, 1, \ldots, p$, wherein n represents the number of prescription samples, i denotes the ith prescription sample, p represents the number of efficacy terms in the prescription efficacy vocabulary, and j denotes the jth efficacy term; the calculation formula is as follows:
Y′=W′·X+b′
each item in the final predicted prescription efficacy vocabulary matrix $Y'$ is a numerical value between 0 and 1; by setting a threshold value T, each value in $Y'$ smaller than T is recorded as 0 and each value larger than T is recorded as 1, so that $Y'$ is converted into a matrix $Y'' = \{y''_{ij}\}$ consisting of 0s and 1s, $i = 0, 1, \ldots, n$; $j = 0, 1, \ldots, p$;
6.2 Traversing each drug pair, counting all prescription samples with the drug pair, and predicting the weight of each efficacy, wherein the specific formula is as follows:
score=correct/(correct+error)
wherein correct indicates the number of correctly predicted prescriptions, counted as follows: if $y_{ij} = 1$ and $y''_{ij} = 1$, correct is incremented by 1; if $y_{ij} \neq y''_{ij}$ or $y_{ij} = y''_{ij} = 0$, it is not counted; error indicates the number of prediction errors, counted as follows: if $y_{ij} \neq y''_{ij}$, error is incremented by 1; otherwise it is not counted;
6.3 After the score between each drug pair and each efficacy is calculated, all efficacies of each drug pair are sorted by score to obtain a ranked list of the most relevant efficacies for that drug pair; the top-ranked efficacies are taken as the relevant efficacies of the drug pair, thereby obtaining the relation between drug pairs and efficacy.
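The thresholding, scoring and ranking in sub-steps 6.1 to 6.3 can be sketched as follows. Several details are assumptions because the claim leaves them open: a predicted value exactly equal to T is mapped to 0, a drug pair and efficacy with no counted prescriptions gets a score of 0, and `top` is a hypothetical parameter for how many efficacies to keep. Function names are illustrative.

```python
def binarize(Y_pred, T=0.5):
    """Convert predicted values to 0/1: values above T become 1, all others 0."""
    return [[1 if v > T else 0 for v in row] for row in Y_pred]


def pair_efficacy_scores(X, Y, Y_bin):
    """Compute score = correct / (correct + error) for every (drug pair, efficacy).

    Only prescriptions that contain the drug pair (X[i][k] == 1) are considered.
    correct counts rows where the true and predicted labels are both 1;
    error counts rows where they differ; rows where both are 0 are ignored.
    """
    q, p = len(X[0]), len(Y[0])
    scores = [[0.0] * p for _ in range(q)]
    for k in range(q):
        rows = [i for i, x in enumerate(X) if x[k] == 1]
        for j in range(p):
            correct = sum(1 for i in rows if Y[i][j] == 1 and Y_bin[i][j] == 1)
            error = sum(1 for i in rows if Y[i][j] != Y_bin[i][j])
            if correct + error > 0:
                scores[k][j] = correct / (correct + error)
    return scores


def rank_efficacies(scores, pair_table, vocabulary, top=3):
    """For each drug pair, list its efficacy terms sorted by descending score."""
    ranking = {}
    for pair, row in zip(pair_table, scores):
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranking[pair] = [(vocabulary[j], row[j]) for j in order[:top]]
    return ranking


if __name__ == "__main__":
    # Toy inputs (illustrative data): 3 prescriptions, 2 drug pairs, 2 efficacy terms.
    X = [[1, 0], [1, 1], [0, 1]]
    Y = [[1, 0], [1, 1], [0, 1]]
    Y_pred = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]
    scores = pair_efficacy_scores(X, Y, binarize(Y_pred, T=0.5))
    table = [("mahuang", "guizhi"), ("mahuang", "gancao")]
    print(rank_efficacies(scores, table, ["efficacy_a", "efficacy_b"], top=2))
```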
CN201911165949.6A 2019-11-25 2019-11-25 Method for mining relation between drug pairs and efficacy from prescription information Active CN111180045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911165949.6A CN111180045B (en) 2019-11-25 2019-11-25 Method for mining relation between drug pairs and efficacy from prescription information

Publications (2)

Publication Number Publication Date
CN111180045A CN111180045A (en) 2020-05-19
CN111180045B true CN111180045B (en) 2023-05-12

Family

ID=70653754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911165949.6A Active CN111180045B (en) 2019-11-25 2019-11-25 Method for mining relation between drug pairs and efficacy from prescription information

Country Status (1)

Country Link
CN (1) CN111180045B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133382B (en) * 2020-09-24 2024-02-20 南京泛泰数字科技研究院有限公司 Learning method and system for medical analysis by using algorithm model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005027012A2 (en) * 2003-09-16 2005-03-24 Pfizer Inc. System and method for the computer-assisted identification of drugs and indications
KR20090072422A (en) * 2007-12-28 2009-07-02 대구한의대학교산학협력단 System analyzing efficacy and genealogy of traditional medicine prescription
CN102122325A (en) * 2011-04-20 2011-07-13 天津师范大学 Method for automatically analyzing efficacy of Chinese medicine formula
CN106803012A (en) * 2016-12-29 2017-06-06 杭州师范大学钱江学院 Prescription function prediction method based on probability topic model and Chinese medicine base attribute
CN109947901A (en) * 2019-02-20 2019-06-28 杭州师范大学 Prescription Effect prediction technique based on multi-layer perception (MLP) and natural language processing technique

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于改进关联规则算法的中药药对药味间性味归经功效属性关系的发现研究;尚尔鑫等;《世界科学技术(中医药现代化)》;20100620(第03期);第377-382页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant