CN113838583B - Intelligent medicine curative effect evaluation method based on machine learning and application thereof - Google Patents
- Publication number
- CN113838583B (application CN202111135248.5A)
- Authority
- CN
- China
- Prior art keywords
- medicine
- medicines
- drug
- evaluating
- efficacy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medicinal Chemistry (AREA)
- Mathematical Physics (AREA)
- Chemical & Material Sciences (AREA)
- Computing Systems (AREA)
- Pharmacology & Pharmacy (AREA)
- Toxicology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses an intelligent drug curative effect evaluation method based on machine learning and an application thereof. The method comprises: establishing a mapping between drugs and their corresponding target diseases or symptoms; extracting the potential side effects of each drug and calculating similarity indexes between drugs; labeling online drug data to mark whether each drug is effective, structuring the drug text data, and extracting multidimensional population information and feature vectors related to drugs and treatment; dividing the structured data into a training set and a validation set, building an ensemble prediction model from multiple algorithms and different feature variable selection mechanisms, and selecting the scheme with the best predictive performance; and finally ranking the effective rates of similar drugs according to the drug similarity indexes. The application provides multiple functions for evaluating the curative effect of drugs.
Description
Technical Field
The invention relates to the fields of biomedicine and artificial intelligence, and in particular to an intelligent drug curative effect evaluation method based on machine learning and an application thereof.
Background
In the biopharmaceutical field, efficacy, effectiveness, and efficiency (benefit) are three indicators used to evaluate drugs at different stages and in different settings. Efficacy generally refers to the magnitude of the therapeutic effect that a drug can achieve under ideal conditions during the clinical trial phase; it is the maximum expected effect of the drug. Effectiveness (the curative effect) is the magnitude of the therapeutic effect that the drug achieves under actual medical and health conditions, that is, the result observed in real-world data. Efficiency refers to whether a drug's value is commensurate with the cost paid by an individual or by society; it considers not only clinical effectiveness but also cost-benefit and socioeconomic benefit, and is commonly used in health economics assessment.
After a drug passes its Phase III clinical trial and is approved for marketing, its effectiveness is examined in the real world. Under real conditions, factors such as patient population, dosage, and frequency of use are far more complex than in randomized clinical trials, so real-world evaluation of drug effectiveness is receiving increasing attention. The development of big data technology has made it possible to mine massive data such as online drug reviews, case reports, package inserts, and precautions.
Existing studies and methods for real-world drug effectiveness generally rely on a single data source, such as survey reports, clinical follow-up, or a Phase IV trial, and the population they can cover remains limited by factors such as research funding, study scale, and selection bias. The invention uses text mining and ensemble machine learning algorithms to integrate data from different information sources, extract effective feature values, and establish a comprehensive drug effectiveness evaluation system together with a decision mechanism for its application, realizing multiple functions such as drug recommendation, evaluation of effectiveness and side effects, and comparison of similar drugs.
The invention can monitor and evaluate the effectiveness of marketed drugs over the long term and on a large scale, and can further serve as an important reference for evaluating drug effectiveness and cost-benefit.
Disclosure of Invention
The invention aims to provide an intelligent drug evaluation method based on machine learning and an application thereof. The method combines massive internet data with hospital medical records and follow-up or survey report data to obtain broader, real-time feedback on drug use, and evaluates the curative effect of drugs comprehensively from multiple information sources. It avoids drawbacks of traditional post-marketing evaluation, such as the high cost of recruiting subjects and artificial inclusion and exclusion criteria, and evaluates the curative effect and side effects of drugs under various conditions more comprehensively and efficiently.
The first aspect of the invention provides an intelligent medicine evaluation method based on machine learning, which comprises the following steps:
1) The mapping between each drug and its corresponding target diseases or symptoms is extracted from the drug package insert and the drug guide of the drug regulatory authority: assume I drugs indexed by i = 1, ..., I; the corresponding target diseases or symptoms are indexed by j = 1, ..., J (J target diseases or symptoms), and the corresponding potential side effects are indexed by k = 1, ..., K (K potential side effects).
The similarity index between drugs is then calculated. Specifically, given one drug with its set of target diseases or symptoms and a second drug with its own set of target diseases or symptoms, a drug similarity index is computed for the pair.
2) The online drug reviews, hospital medical records, and follow-up records are grouped by the target diseases or symptoms from step 1), and each review and medical record is labeled "effective" or "ineffective".
Specifically, the labeling is automatic: sentiment analysis is performed on the semantics, and VADER (Valence Aware Dictionary and sEntiment Reasoner) scores each sentence from -1 (negative) to 1 (positive), with 0 a neutral opinion. Further, manual verification may be performed after automatic labeling.
3) The text data is structured: a) multidimensional population information such as age, gender, race, marital status, and region is extracted; b) feature vectors are extracted: feature words or phrases such as inflammation reduction, fever, headache, cold, and cough are extracted from the online drug reviews, medical records, and follow-up records to obtain feature vectors.
4) The data set converted from text data into structured data is divided into a training set and a validation set in a given ratio.
Specifically, the training set and the validation set may be split in a ratio of 8:2, 7:3, or 6:4.
5) Multiple algorithms are selected as classifiers for predicting the classification problem.
Specifically, four classifiers may be selected for the binary classification problem: a) OneVsRest SVM, b) Logistic Regression, c) Random Forest, d) Bagging meta-estimator with a logistic regression base estimator.
6) Different characteristic variable selection mechanisms are established, and a scheme with optimal prediction effect of various classifiers under different characteristic variables is selected.
Specifically, the feature variables may be obtained by combining term occurrence counts (Count), term frequency-inverse document frequency (tf-idf), and the VADER score, for example:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
Further, the scheme with the best predictive performance is selected by evaluating the F1 score,
F1 score = 2*(Recall * Precision) / (Recall + Precision);
where Recall = true positive/(true positive + false negative), precision = true positive/(true positive + false positive).
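The F1 definition above can be sketched in a few lines of Python. This is only an illustration of the formula as written; the function names and the choice to return 0.0 when a denominator is zero are assumptions, not part of the patent text.

```python
def confusion_counts(y_true, y_pred, positive="effective"):
    """Count true positives, false positives and false negatives for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2 * (Recall * Precision) / (Recall + Precision)."""
    # Assumption: an undefined ratio (zero denominator) is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (recall * precision) / (recall + precision)


# Toy labels, invented for illustration only.
y_true = ["effective", "effective", "effective", "ineffective", "ineffective"]
y_pred = ["effective", "effective", "ineffective", "effective", "ineffective"]
f1 = f1_from_counts(*confusion_counts(y_true, y_pred))
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.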
A second aspect of the present invention provides an intelligent drug evaluation application based on machine learning, the application comprising a plurality of functions. Function 1) evaluates the curative effect of a given drug: the user inputs a drug name and obtains the drug's effectiveness score, its ranking among similar drugs, and its side effect ranking. Function 2) searches for drugs for a given disease or symptom: the user inputs the names of one or more diseases or symptoms and obtains the effectiveness scores and rankings of the corresponding drug or drugs, together with the side effect ranking of each drug. Function 3) ranks the effectiveness of drugs and similar drugs, and their side effects, for multidimensional populations differing in age, gender, race, marital status, region, and so on.
In the embodiment of the invention, the mapping between each drug and its corresponding target diseases or symptoms is determined from approved information such as the drug package insert and drug guide, and drug effectiveness is predicted from information such as online drug reviews, medical records, and follow-up records. Building the prediction model proceeds as follows: first, sentiment analysis is performed on each sentence with VADER to mark whether the drug the sentence refers to is effective; then the structured data set is divided into a training set and a validation set, and several binary classifiers (models) are trained with different feature extraction rules to obtain the optimal scheme. At the application level, the predictions of whether drugs are effective are applied to the initially determined mapping, and the following are calculated: the effective rate and side effects of each drug for each of its target diseases or symptoms, and the effective rates of one or more similar drugs for the same target disease or symptom. Thus, when a drug is entered at the user end, the effective rate and side effects for its target diseases or symptoms are shown together with the effective rates of similar drugs; when a disease or symptom is entered, the effective rates of the corresponding drug or drugs and their side effects are shown. In addition, by filtering on population information, the effective rate within a population of a given age, gender, race, marital status, region, and so on can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an intelligent drug efficacy evaluation method based on machine learning in an embodiment of the invention.
Fig. 2 is a schematic diagram of a software running structure of an intelligent drug efficacy evaluation application based on machine learning in an embodiment of the present invention.
Description of the embodiments
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings, in which it is evident that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Embodiments of the present invention are described in detail below.
Example 1
Referring to Fig. 1, which is a flow chart of the intelligent drug curative effect evaluation method based on machine learning according to an embodiment of the present invention, the method includes:
101. Establish a mapping between each drug and its corresponding target diseases or symptoms.
As shown in Table 1, the diseases or symptoms treatable by the drug Metronidazole include sepsis, endocarditis, meningitis, colitis, tetanus, canker sore, and so on. The target diseases or symptoms of each drug are extracted from the package insert or drug guide and recorded, per drug, as that drug's set of target diseases or symptoms. In this example, the drug is Metronidazole and its target set is {sepsis, endocarditis, meningitis, colitis, tetanus, canker sore, ...}. After the mapping between the drug and its target diseases or symptoms is established, step 102 is performed.
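A minimal sketch of how the mapping from step 101 might be stored, and inverted so that drugs can later be looked up by disease or symptom. The dictionary layout and the names `DRUG_TARGETS` and `invert_mapping` are assumptions made for illustration; the Ornidazole entry uses the target list given for Table 2 in step 102.

```python
from collections import defaultdict

# Drug -> set of target diseases or symptoms, using the examples named in the text.
DRUG_TARGETS = {
    "Metronidazole": {"sepsis", "endocarditis", "meningitis",
                      "colitis", "tetanus", "canker sore"},
    "Ornidazole": {"sepsis", "amoebiasis", "meningitis",
                   "periodontitis", "endometritis", "canker sore"},
}


def invert_mapping(drug_targets):
    """Build a disease/symptom -> set of drugs index from a drug -> targets mapping."""
    index = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            index[target].add(drug)
    return dict(index)


TARGET_DRUGS = invert_mapping(DRUG_TARGETS)
```

Keeping both directions makes the two user-facing queries described later (by drug name and by disease or symptom) simple dictionary lookups.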
102. Extract the potential side effects of each drug and calculate the similarity index between drugs.
As shown in Table 1, the potential side effects of Metronidazole include nausea, vomiting, loss of appetite, abdominal cramps, headache, dizziness, paresthesia, numbness of limbs, and so on. The potential side effects of the drug are extracted from the package insert or drug guide and recorded as the drug's set of potential side effects. In this example, the set is {nausea, vomiting, loss of appetite, abdominal cramps, headache, dizziness, paresthesia, numbness of limbs, ...}.
As shown in Table 2, for the drug Ornidazole the set of target diseases or symptoms is {sepsis, amoebiasis, meningitis, periodontitis, endometritis, canker sore, ...} and the set of potential side effects is {nausea, bad breath, dizziness, drowsiness, rash, cramps, confusion, numbness of limbs, ...}.
The similarity index of the two drugs is then calculated from their sets of target diseases or symptoms.
Further, a data dictionary of the diseases or symptoms treated by drugs is designed. It contains the superordinate and subordinate concepts of diseases or symptoms; for example, periodontitis and canker sore are both oral infections, and if the superordinate concept "oral infection" is used, the similarity index of the two drugs is 3/8 = 0.375.
Further, a similar-word merging data dictionary is designed. It contains words at the same level that can be regarded as equivalent, such as {numbness of limbs}, {headache}, {dizziness}, and the like.
Table 1. Target diseases or symptoms and potential side effects of Metronidazole
Table 2. Target diseases or symptoms and potential side effects of Ornidazole
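The similarity formula itself is not legible in this text. One reading that reproduces the worked value 3/8 = 0.375 is the ratio of shared targets to all distinct targets of the two drugs (a Jaccard index) after subordinate terms are mapped to their superordinate concept. The sketch below assumes that reading; the function names and the two-entry `SUPERORDINATE` dictionary are illustrative only.

```python
# Assumed superordinate-concept dictionary, limited to the two entries named in the text.
SUPERORDINATE = {"periodontitis": "oral infection", "canker sore": "oral infection"}

METRONIDAZOLE = {"sepsis", "endocarditis", "meningitis", "colitis", "tetanus", "canker sore"}
ORNIDAZOLE = {"sepsis", "amoebiasis", "meningitis", "periodontitis", "endometritis", "canker sore"}


def normalize(targets, superordinate=None):
    """Replace each term by its superordinate concept when one is defined."""
    superordinate = superordinate or {}
    return {superordinate.get(term, term) for term in targets}


def similarity_index(targets_a, targets_b, superordinate=None):
    """Assumed form: |A intersect B| / |A union B| on the normalized target sets."""
    a = normalize(targets_a, superordinate)
    b = normalize(targets_b, superordinate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


with_dictionary = similarity_index(METRONIDAZOLE, ORNIDAZOLE, SUPERORDINATE)
without_dictionary = similarity_index(METRONIDAZOLE, ORNIDAZOLE)
```

Under this assumption, merging periodontitis and canker sore into "oral infection" leaves three shared targets out of eight distinct ones, which matches the 0.375 stated in the text.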
103. Label the online drug reviews, medical records, and follow-up records to mark whether the drug is effective.
Specifically, VADER (Valence Aware Dictionary and sEntiment Reasoner) scores each sentence from -1 (negative) to 1 (positive), with 0 a neutral opinion. Using the VADER module in Python, the polarity_scores method returns four scores for a sentence: (a) negative, (b) positive, (c) neutral, and (d) compound. The compound score is a normalized aggregate of the sentence's word-level valence scores, ranging from -1 to 1, and is used to measure whether the sentiment of a sentence is positive or negative. VADER is suited to sentiment analysis of English sentences, so the data sources used for evaluating the curative effect should be in English as far as possible; if Chinese text is collected, it can be translated into English by an automatic translator and checked manually.
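The text does not state how a VADER score is turned into an "effective" or "ineffective" label. The sketch below assumes a symmetric cut-off of 0.05 on the compound score, a convention often used with VADER, and routes scores near zero to manual review; the cut-off, the `needs_review` label, and the function name are all assumptions. In practice the compound value would come from the `compound` entry of the dictionary returned by `SentimentIntensityAnalyzer().polarity_scores(text)` in the vaderSentiment package, which is not imported here.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to a drug effectiveness label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "effective"
    if compound <= -threshold:
        return "ineffective"
    # Assumption: near-neutral sentences are left for the manual verification step.
    return "needs_review"


# Placeholder scores, not real VADER output.
labels = [label_from_compound(score) for score in (0.62, -0.41, 0.0)]
```

Sending near-neutral sentences to review rather than forcing a binary label is one way to connect automatic labeling with the optional manual verification mentioned earlier.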
104. Structure the text data and extract multidimensional population information and the feature vectors related to drugs and treatment.
Population information includes, but is not limited to, age, gender, race, marital status, and region. For online drug reviews, such information can be obtained from the back-end database; hospital medical record management systems can also provide it; and follow-up studies should be designed in advance so that the follow-up records contain as much of this information as possible.
Words that capture important textual features in the online drug reviews, hospital medical records, and follow-up records are converted into vectors using term counts (CountVectorizer) and term frequency-inverse document frequency (tf-idf).
Further, the feature vectors may follow rules such as:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
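The six rules above might be built with scikit-learn roughly as follows. This is a sketch under assumptions: "top 10000" is read as the vectorizer's `max_features=10000` option, the VADER score is appended as one extra column, and `build_features` and `FEATURE_SETS` are hypothetical names. The sample texts and scores are invented.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def build_features(texts, vader_scores, kind="tfidf", max_features=None, add_vader=False):
    """Return (fitted vectorizer, sparse feature matrix) for one feature rule."""
    if kind not in ("count", "tfidf"):
        raise ValueError("kind must be 'count' or 'tfidf'")
    vectorizer_cls = CountVectorizer if kind == "count" else TfidfVectorizer
    vectorizer = vectorizer_cls(max_features=max_features)
    matrix = vectorizer.fit_transform(texts)
    if add_vader:
        # Append the sentiment score as a single additional column.
        column = csr_matrix(np.asarray(vader_scores, dtype=float).reshape(-1, 1))
        matrix = hstack([matrix, column]).tocsr()
    return vectorizer, matrix


FEATURE_SETS = {
    "FS-1": dict(kind="count"),
    "FS-2": dict(kind="count", add_vader=True),
    "FS-3": dict(kind="count", max_features=10000, add_vader=True),
    "FS-4": dict(kind="tfidf"),
    "FS-5": dict(kind="tfidf", add_vader=True),
    "FS-6": dict(kind="tfidf", max_features=10000, add_vader=True),
}

texts = ["the fever went away quickly", "no effect on my headache",
         "cough improved after two days"]
scores = [0.5, -0.3, 0.4]  # placeholder sentiment scores, not real VADER output
shapes = {name: build_features(texts, scores, **cfg)[1].shape
          for name, cfg in FEATURE_SETS.items()}
```

For brevity the vectorizer is fitted on all texts here; in a real pipeline it would be fitted on the training set only and then applied to the validation set.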
105. Divide the structured data into a training set and a validation set.
The data set, converted into vector form in the preceding steps, is divided into a training set and a validation set in a ratio of 8:2, 7:3, or 6:4.
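A minimal sketch of such a split using only the Python standard library. The shuffle, the fixed seed, and the name `split_dataset` are assumptions; the text only specifies the ratios.

```python
import random


def split_dataset(records, train_fraction=0.8, seed=0):
    """Shuffle records reproducibly and split them into (training, validation) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# The three ratios named in the text, applied to ten dummy records.
splits = {ratio: split_dataset(range(10), ratio) for ratio in (0.8, 0.7, 0.6)}
```

A stratified split that keeps the share of "effective" labels equal in both parts may be preferable when the labels are imbalanced; that refinement is not shown.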
106. Select multiple algorithms as classifiers for the classification problem.
Further, four commonly used algorithms are chosen to train the classifiers: a) OneVsRest SVM, b) Logistic Regression, c) Random Forest, d) Bagging meta-estimator with a logistic regression base estimator.
107. Different characteristic variable selection mechanisms are established, and a scheme with optimal prediction effect of various classifiers under different characteristic variables is selected.
Table 3 shows the F1 scores obtained by training on the data with each combination of the four classifiers of step 106 and the six feature variable selection rules of step 104,
F1 score = 2*(Recall * Precision) / (Recall + Precision);
where Recall = true positive/(true positive + false negative), Precision = true positive/(true positive + false positive).
Table 3. Predictive performance of the classifiers under different feature variables
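The comparison behind Table 3 might be set up with scikit-learn as sketched below, here for a single feature rule (TF-IDF with at most 10000 features, without the VADER column). The hyperparameters, the helper names, and the toy sentences are assumptions for illustration and do not reproduce the patent's data or its reported scores.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def make_classifiers(seed=0):
    """The four classifier families named in step 106, with assumed settings."""
    return {
        "OneVsRest SVM": OneVsRestClassifier(LinearSVC()),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Bagging (logistic regression base)": BaggingClassifier(
            LogisticRegression(max_iter=1000), n_estimators=10, random_state=seed),
    }


def evaluate(train_texts, train_labels, val_texts, val_labels):
    """Fit each classifier on the training set and score F1 on the validation set."""
    vectorizer = TfidfVectorizer(max_features=10000)
    x_train = vectorizer.fit_transform(train_texts)  # fit on training data only
    x_val = vectorizer.transform(val_texts)
    scores = {}
    for name, clf in make_classifiers().items():
        clf.fit(x_train, train_labels)
        scores[name] = f1_score(val_labels, clf.predict(x_val), zero_division=0)
    best = max(scores, key=scores.get)
    return scores, best


# Toy data, invented for illustration; 1 = effective, 0 = ineffective.
POSITIVE = ["pain relieved quickly", "fever gone and feeling better",
            "worked well for my infection", "symptoms improved in two days"]
NEGATIVE = ["no improvement at all", "did not work for me",
            "still sick after a week", "made my symptoms worse"]
train_texts = POSITIVE * 3 + NEGATIVE * 3
train_labels = [1] * 12 + [0] * 12
val_texts = ["fever improved quickly", "no improvement, still sick"]
val_labels = [1, 0]
scores, best = evaluate(train_texts, train_labels, val_texts, val_labels)
```

Repeating `evaluate` for each of the six feature rules and tabulating the results would give a grid of the same shape as Table 3.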
108. Calculate the effective rate of each drug for its target diseases or symptoms using the optimal scheme obtained through training.
According to Table 3, the optimal scheme for predicting whether a drug is effective for a target disease or symptom is the random forest with feature extraction rule FS-6: TfidfVectorizer top 10000 features + VADER score; the corresponding F1 score is 0.760. This scheme is used to predict the unlabeled data and to calculate the effective rate of a given drug for each of its target diseases or symptoms.
109. Obtain the effective-rate ranking and the potential side effect ranking of similar drugs according to the drug similarity index.
For a given drug, such as Metronidazole, the similarity index between it and every other drug is calculated using the method of step 102; the five most similar drugs can be taken, and the effective rates of the drug and of these similar drugs are each calculated by the model. Using the potential side effect feature words extracted in step 102, the side effects mentioned for the drug across all data sources are counted and ranked; preferably the top 10 are reported, or another number of top-ranked side effects as appropriate.
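Step 109 can be sketched as two small ranking helpers. The effective rate is taken here as the share of records predicted effective, and side effects are counted by simple substring matching; both choices, the function names, and the drugs "DrugX" and "DrugY" are hypothetical.

```python
from collections import Counter


def effective_rate(predictions):
    """Share of records predicted effective (1) among all records; 0.0 if empty."""
    return sum(predictions) / len(predictions) if predictions else 0.0


def rank_similar_drugs(similarities, predictions_by_drug, top_n=5):
    """Keep the top_n most similar drugs, then rank them by effective rate."""
    most_similar = sorted(similarities, key=similarities.get, reverse=True)[:top_n]
    rates = {drug: effective_rate(predictions_by_drug.get(drug, []))
             for drug in most_similar}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def rank_side_effects(texts, side_effect_terms, top_n=10):
    """Count how many texts mention each side effect term and return the top_n."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in side_effect_terms:
            if term in lowered:
                counts[term] += 1
    return counts.most_common(top_n)


# Invented inputs: similarity to the query drug and model predictions per drug.
similarities = {"Ornidazole": 0.375, "DrugX": 0.2, "DrugY": 0.1}
predictions = {"Ornidazole": [1, 1, 0, 1], "DrugX": [1, 1, 1, 1], "DrugY": [0]}
drug_ranking = rank_similar_drugs(similarities, predictions, top_n=2)
side_effect_ranking = rank_side_effects(
    ["Felt nausea and dizziness", "nausea again", "no issues"],
    ["nausea", "dizziness", "rash"])
```

Substring matching is deliberately crude; the similar-word merging dictionary described in step 102 would be the natural place to normalize terms before counting.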
Example 2
Based on the intelligent drug curative effect evaluation method described in the above embodiment, an intelligent drug curative effect evaluation application based on machine learning is developed. The back end of the application includes a database that collects and manages the different data sources described above; the middle tier includes the intelligent drug curative effect evaluation method, with support for model parameter tuning and real-time monitoring; and the front end provides the following functions:
Evaluating the curative effect of a given drug: the user inputs the drug name and obtains the drug's effectiveness score, its ranking among similar drugs, and its side effect ranking;
the drug effectiveness score is the effective rate obtained from the drug curative effect evaluation model.
Searching for drugs for a given disease or symptom: the user inputs the names of one or more diseases or symptoms and obtains the effectiveness scores and rankings of the corresponding drug or drugs, together with the side effect ranking of each drug;
the disease or symptom is entered into the system, the mapping between drugs and target diseases or symptoms obtained in step 101 of Example 1 is used to find the corresponding drugs, and the effectiveness score, ranking, and potential side effect ranking of each corresponding drug are calculated by the model.
Ranking the effectiveness of drugs and similar drugs, and their side effects, for multidimensional populations differing in age, gender, race, marital status, region, and so on;
the population information is used as a filter condition when calculating drug effectiveness and when looking up similar drugs and their side effect rankings.
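The population filter in the last function can be sketched as a filtered effective-rate calculation. The record layout (keys `drug`, `gender`, `region`, `effective`) and the function name are assumptions for illustration; the records themselves are invented.

```python
def effective_rate_for_group(records, drug, **criteria):
    """Effective rate of a drug among records matching all given population criteria.

    Returns None when no record matches, so that an empty group is not reported as 0.
    """
    selected = [r for r in records
                if r["drug"] == drug
                and all(r.get(key) == value for key, value in criteria.items())]
    if not selected:
        return None
    return sum(r["effective"] for r in selected) / len(selected)


# Invented records; "effective" holds the model prediction (1 or 0).
records = [
    {"drug": "Metronidazole", "gender": "F", "region": "north", "effective": 1},
    {"drug": "Metronidazole", "gender": "F", "region": "south", "effective": 0},
    {"drug": "Metronidazole", "gender": "M", "region": "north", "effective": 1},
    {"drug": "Ornidazole", "gender": "F", "region": "north", "effective": 1},
]
rate_all = effective_rate_for_group(records, "Metronidazole")
rate_female = effective_rate_for_group(records, "Metronidazole", gender="F")
rate_missing = effective_rate_for_group(records, "Metronidazole", region="west")
```

The same filter could be applied before the similar-drug and side effect rankings so that all three outputs refer to the same population.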
Claims (9)
1. An intelligent drug curative effect evaluation method based on machine learning, characterized by comprising the following steps:
1) Extracting, from the drug package insert and the drug guide of the drug regulatory authority, the mapping between each drug and its corresponding target diseases or symptoms: assume I drugs indexed by i = 1, ..., I, whose corresponding target diseases or symptoms are indexed by j = 1, ..., J (J target diseases or symptoms) and whose corresponding potential side effects are indexed by k = 1, ..., K (K potential side effects); and calculating the similar-drug index of each drug;
2) Grouping the online drug reviews, hospital medical records, and follow-up records by the target diseases or symptoms from step 1), and labeling each review and medical record as effective or ineffective;
3) Structuring the text data:
a) Extracting multidimensional population information such as age, gender, race, marital status, and region,
b) Extracting feature vectors: extracting characteristic words or phrases from online drug comments, medical record lists and follow-up records to obtain characteristic vectors;
4) Dividing the data set converted from text data into a structured data set into a training set and a verification set according to a certain proportion;
5) Selecting a plurality of algorithms as classifiers for predicting the classification problems;
6) Different characteristic variable selection mechanisms are established, and a scheme with optimal prediction effect of various classifiers under different characteristic variables is selected;
7) Calculating medicine by using the optimal scheme obtained by training in the step 6)Treatment of diseases or symptoms against the target->The ranking of the medicines in the same kind of medicines is obtained according to the medicine similarity index calculated in the step 1), and the potential side effect of the medicines in the step 1) is +.>Feature words in the dataset are extracted and ranked.
2. The intelligent drug efficacy evaluation method according to claim 1, wherein the labeling in step 2) is automatic: sentiment analysis is performed on the semantics, and each statement is scored by VADER (Valence Aware Dictionary and sEntiment Reasoner) with a value from -1 (negative) to 1 (positive), where 0 is a neutral opinion.
3. The intelligent drug efficacy evaluation method according to claim 2, wherein the automatic labeling in step 2) is followed by manual verification.
4. The intelligent drug efficacy evaluation method according to claim 1, wherein the training set and the validation set in step 4) can be split in a ratio of 8:2 or 7:3.
5. The intelligent drug efficacy evaluation method according to claim 1, wherein four classifiers are used in step 5):
a) OneVsRest SVM,
b) Logistic Regression,
c) Random Forest,
d) Bagging meta-estimator with a logistic regression base estimator.
6. The intelligent drug efficacy evaluation method according to claim 1, wherein the feature variables in step 6) are obtained by combining the occurrence frequency of specific words (Count), the term frequency-inverse document frequency (tf-idf), and the VADER score, for example:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
7. The intelligent drug efficacy evaluation method according to claim 1, wherein the scheme with the best prediction performance in step 6) is determined by the F1 score, where F1 score = 2 × (Recall × Precision) / (Recall + Precision), Recall = true positives / (true positives + false negatives), and Precision = true positives / (true positives + false positives).
8. The intelligent drug efficacy evaluation method according to claim 1, wherein the drug similarity index in step 1) is calculated as follows: assume that one drug has its corresponding target diseases or symptoms, and that a second drug has its own corresponding target diseases or symptoms; the similarity index of the two drugs is then [formula not preserved in this text].
9. A machine-learning-based intelligent drug efficacy evaluation application, characterized in that it implements a plurality of functions according to the method of any one of claims 1 to 8, comprising:
function 1) evaluating the efficacy of a given drug: the drug name is input, and the drug's effectiveness score, its ranking among similar drugs, and its side-effect ranking are obtained;
function 2) searching for the drugs that treat a given disease or symptom: the names of one or more diseases or symptoms are input, and the effectiveness score, ranking, and side-effect ranking of each corresponding drug are obtained;
function 3) ranking drug effectiveness, similar drugs, and their side effects for multidimensional populations that differ in age, sex, race, marital status, and region.
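The data split of step 4) and the F1 criterion of claim 7 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `split_dataset` and `f1_score`, the fixed shuffle seed, the label strings, and the toy labels are all hypothetical, and only the 8:2 ratio and the F1, Recall, and Precision definitions come from the claims.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (training, validation) lists.

    train_fraction=0.8 gives the 8:2 split; 0.7 gives 7:3.
    The seeded shuffle is an assumption made for reproducibility.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def f1_score(y_true, y_pred, positive="effective"):
    """F1 = 2 * (Recall * Precision) / (Recall + Precision)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: define F1 as 0 instead of dividing by 0
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["effective", "effective", "effective", "ineffective", "ineffective"]
y_pred = ["effective", "effective", "ineffective", "effective", "ineffective"]

train, validation = split_dataset(range(10), train_fraction=0.8)
score = f1_score(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
```

Comparing this score across each classifier and feature-selection combination on the validation set is one way to pick the best-performing scheme that step 6) calls for.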
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111135248.5A CN113838583B (en) | 2021-09-27 | 2021-09-27 | Intelligent medicine curative effect evaluation method based on machine learning and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113838583A (en) | 2021-12-24 |
CN113838583B (en) | 2023-10-24 |
Family
ID=78970737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111135248.5A Active CN113838583B (en) | 2021-09-27 | 2021-09-27 | Intelligent medicine curative effect evaluation method based on machine learning and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113838583B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115497630B (en) * | 2022-08-24 | 2023-11-03 | 中国医学科学院北京协和医院 | Method and system for processing acute severe ulcerative colitis data |
CN116758062A (en) * | 2023-08-11 | 2023-09-15 | 之江实验室 | Drug effectiveness evaluation method and device |
CN118072980A (en) * | 2024-04-18 | 2024-05-24 | 首都医科大学附属北京儿童医院 | Method and related equipment for evaluating mucociliary clearance function of mucous membrane in nasal cavity |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200069A (en) * | 2014-08-13 | 2014-12-10 | 周晋 | Drug use recommendation system and method based on symptom analysis and machine learning |
CN104951665A (en) * | 2015-07-22 | 2015-09-30 | 浙江大学 | Method and system of medicine recommendation |
CN107092797A (en) * | 2017-04-26 | 2017-08-25 | 广东亿荣电子商务有限公司 | A kind of medicine proposed algorithm based on deep learning |
CN107403069A (en) * | 2017-07-31 | 2017-11-28 | 京东方科技集团股份有限公司 | A kind of medicine disease association relationship analysis system and method |
CN111599403A (en) * | 2020-05-22 | 2020-08-28 | 电子科技大学 | Parallel drug-target correlation prediction method based on sequencing learning |
CN112116978A (en) * | 2020-09-17 | 2020-12-22 | 陕西师范大学 | Method, system and device for recommending rheumatism immunity medicine |
CN113160879A (en) * | 2021-04-25 | 2021-07-23 | 上海基绪康生物科技有限公司 | Method for predicting drug relocation through side effect based on network learning |
CN113241193A (en) * | 2021-06-01 | 2021-08-10 | 平安科技(深圳)有限公司 | Drug recommendation model training method, recommendation method, device, equipment and medium |
CN113316720A (en) * | 2019-01-15 | 2021-08-27 | 国际商业机器公司 | Determining a drug effectiveness ranking for a patient using machine learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11217334B2 (en) * | 2016-02-29 | 2022-01-04 | Mor Research Applications Ltd. | System and method for selecting optimal medications for a specific patient |
US11238966B2 (en) * | 2019-11-04 | 2022-02-01 | Georgetown University | Method and system for assessing drug efficacy using multiple graph kernel fusion |
Also Published As
Publication number | Publication date |
---|---|
CN113838583A (en) | 2021-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113838583B (en) | Intelligent medicine curative effect evaluation method based on machine learning and application thereof | |
Tashkandi et al. | Efficient in-database patient similarity analysis for personalized medical decision support systems | |
Gurulingappa et al. | Semi-Supervised Information Retrieval System for Clinical Decision Support. | |
Zhao et al. | Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study | |
Patil et al. | A new approach: role of data mining in prediction of survival of burn patients | |
Asghar et al. | Health miner: opinion extraction from user generated health reviews | |
Chen et al. | Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records | |
Li et al. | Modelling online user behavior for medical knowledge learning | |
Afsana et al. | Automatically assessing quality of online health articles | |
Afzal et al. | Context-aware grading of quality evidences for evidence-based decision-making | |
Taghizadeh et al. | SINA-BERT: a pre-trained language model for analysis of medical texts in Persian | |
Pedersen et al. | Deep learning detects and visualizes bleeding events in electronic health records | |
Kartchner et al. | Zero-Shot Information Extraction for Clinical Meta-Analysis using Large Language Models | |
Wegrzyn-Wolska et al. | Social media analysis for e-health and medical purposes | |
Zengul et al. | A practical and empirical comparison of three topic modeling methods using a COVID-19 corpus: LSA, LDA, and Top2Vec | |
Cheng et al. | Prediction of blood culture outcome using hybrid neural network model based on electronic health records | |
Al Amin et al. | Data driven classification of opioid patients using machine learning–an investigation | |
Kusa et al. | Effective matching of patients to clinical trials using entity extraction and neural re-ranking | |
Li et al. | Patient similarity via medical attributed heterogeneous graph convolutional network | |
Rahul et al. | Cardiovascular Disease Classification Using Different Algorithms | |
Zhang et al. | From electronic health records to terminology base: A novel knowledge base enrichment approach | |
Goodwin et al. | Automatically linking registered clinical trials to their published results with deep highway networks | |
Montenegro et al. | The HoPE model architecture: A novel approach to pregnancy information retrieval based on conversational agents | |
Al-Smadi | DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning | |
Li et al. | Tracking biomedical articles along the translational continuum: a measure based on biomedical knowledge representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||