CN113838583A - Intelligent drug efficacy evaluation method based on machine learning and application thereof - Google Patents
Intelligent drug efficacy evaluation method based on machine learning and application thereof
- Publication number
- CN113838583A (application CN202111135248.5A)
- Authority
- CN
- China
- Prior art keywords
- medicine
- ranking
- symptom
- evaluating
- medicines
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medicinal Chemistry (AREA)
- Mathematical Physics (AREA)
- Chemical & Material Sciences (AREA)
- Computing Systems (AREA)
- Pharmacology & Pharmacy (AREA)
- Toxicology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses an intelligent drug efficacy evaluation method based on machine learning and its application. The method comprises: establishing a mapping between each drug and its target diseases or symptoms; extracting each drug's potential side effects and calculating similarity indexes between drugs; labeling online drug data to mark whether the drug was effective, structuring the drug-related text data, and extracting multi-dimensional population information and drug- and treatment-related feature vectors; dividing the structured data into a training set and a validation set, building an ensemble prediction model from several algorithms and different feature-variable selection mechanisms, and selecting the scheme with the best prediction performance; and finally, ranking similar drugs by effective rate according to the drug similarity index and providing multiple drug efficacy evaluation functions through the application.
Description
Technical Field
The invention relates to the fields of biomedicine and artificial intelligence, and in particular to an intelligent drug efficacy evaluation method based on machine learning and its application.
Background
In the biopharmaceutical field, efficacy, effectiveness and efficiency are three indicators used to evaluate drugs at different stages and in different settings. Efficacy generally refers to the magnitude of therapeutic effect a drug can achieve under ideal conditions in clinical trials, i.e., its maximum expected effect. Effectiveness is the magnitude of therapeutic effect the drug achieves under actual medical and health-care conditions, i.e., the result observed in real-world data. Efficiency refers to whether the value of a drug to the public justifies the cost paid by individuals or society; it considers cost-effectiveness as well as clinical effectiveness and is generally used in health-economics evaluations.
Once a drug passes Phase III clinical trials and is approved for marketing, its effectiveness is tested in the real world. Under real-world conditions, factors such as patient populations, dosage and frequency of use are far more complex than in randomized clinical trials, so real-world evaluation of drug effectiveness is receiving increasing attention. Advances in big-data technology also make it possible to mine large volumes of data and extract information from online drug reviews, case reports, medication guidelines, precautions and similar sources.
Existing research on real-world drug effectiveness typically relies on a single data source, for example investigation reports, clinical follow-up visits or Phase IV trials, and the population such studies can cover remains limited by research funding, study scale, selection bias and similar factors. The invention uses text-mining techniques and ensemble machine-learning algorithms to integrate data from different information sources, extract effective feature values, and build a comprehensive drug effectiveness evaluation system together with the decision mechanism for applying it, supporting functions such as drug recommendation, evaluation of effectiveness and side effects, and comparison of similar drugs.
The invention can not only monitor and evaluate drug effectiveness over the long term and at large scale after a drug is marketed, but can also serve as an important reference indicator for cost-effectiveness evaluation of the drug.
Disclosure of Invention
The invention aims to provide an intelligent machine-learning-based drug evaluation method and its application that combine large volumes of internet data with hospital medical records, follow-up visits or investigation reports, so as to obtain real-time feedback on drug use over a wider range and evaluate drug effectiveness comprehensively from multiple information sources. This avoids drawbacks of conventional post-marketing effectiveness evaluation, such as the high cost of recruiting subjects and artificial inclusion and exclusion criteria, and evaluates the effectiveness and side effects of drugs under a wide range of conditions more comprehensively and efficiently.
In a first aspect, the invention provides an intelligent drug evaluation method based on machine learning, comprising the following steps:
1) Extract the mapping between each drug and the diseases or symptoms it treats from drug package inserts and the medication guides of the drug regulatory authority. Suppose the drugs are $D_i$, $i = 1, \dots, I$, where $I$ is the number of drugs; drug $D_i$ has target diseases or symptoms $T_{ij}$, $j = 1, \dots, J$, where $J$ is the number of target diseases or symptoms, and potential side effects $S_{ik}$, $k = 1, \dots, K$, where $K$ is the number of potential side effects.
Calculate the similarity index between drugs. Specifically, suppose drug $D_a$ has the set of target diseases or symptoms $T_a$ and drug $D_b$ has the set $T_b$; the drug similarity index is then $\mathrm{Sim}(D_a, D_b) = \frac{|T_a \cap T_b|}{|T_a \cup T_b|}$.
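The drug similarity index can be read as the Jaccard index of the two drugs' target sets, $|T_a \cap T_b| / |T_a \cup T_b|$; this reading reproduces the 3/8 = 0.375 value given in Example 1. The following is a minimal sketch under that assumption. The dictionary layout `TARGETS` and the function name `similarity_index` are illustrative, not taken from the original.

```python
# Hypothetical drug -> target-set mapping, as it might be built from package inserts
TARGETS = {
    "metronidazole": {"septicemia", "endocarditis", "meningitis",
                      "colitis", "tetanus", "canker sore"},
    "ornidazole": {"septicemia", "amebiasis", "meningitis",
                   "periodontitis", "endometritis", "canker sore"},
}


def similarity_index(targets_a: set[str], targets_b: set[str]) -> float:
    """Jaccard index: shared targets divided by all distinct targets."""
    union = targets_a | targets_b
    if not union:
        return 0.0  # two drugs with no recorded targets share nothing
    return len(targets_a & targets_b) / len(union)


# 3 shared targets (septicemia, meningitis, canker sore) out of 9 distinct ones
print(similarity_index(TARGETS["metronidazole"], TARGETS["ornidazole"]))
```

Storing targets as sets keeps the intersection and union operations direct, and the empty-union guard avoids a division by zero for drugs with no mapped targets.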
2) Group the online drug reviews, hospital medical records and follow-up records by the drugs' target diseases or symptoms from step 1), and label each review and medical record as "effective" or "ineffective".
Specifically, labeling is automatic and based on sentiment analysis of the text: VADER (Valence Aware Dictionary and sEntiment Reasoner) scores each sentence with a value from -1 (negative) to 1 (positive), where 0 is a neutral opinion. The automatic labels can then be checked manually.
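The automatic labeling step can be sketched as a mapping from a sentence's compound score to a label. This sketch assumes the commonly used VADER cut-offs (compound >= 0.05 positive, <= -0.05 negative) and assumes that near-neutral sentences are left unlabeled for the manual check; neither threshold nor the routing rule is specified in the original, and the function names are hypothetical.

```python
from typing import Callable, Optional

# Assumed cut-offs, following the usual VADER convention
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound: float) -> Optional[str]:
    """Map a compound score in [-1, 1] to an efficacy label.

    Returns None for near-neutral scores so they can be routed to manual checking
    (an assumption, not a rule from the original).
    """
    if compound >= POS_THRESHOLD:
        return "effective"
    if compound <= NEG_THRESHOLD:
        return "ineffective"
    return None


def auto_label(texts: list[str], score: Callable[[str], float]) -> list[Optional[str]]:
    """Label each review or record using any scorer that returns a compound score."""
    return [label_from_compound(score(text)) for text in texts]
```

With the `vaderSentiment` package installed, a scorer could be `lambda t: sia.polarity_scores(t)["compound"]`, where `sia = SentimentIntensityAnalyzer()` is imported from `vaderSentiment.vaderSentiment`. Passing the scorer in as a callable keeps the labeling rule testable without the sentiment library.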
3) Structure the text data: a) extract multi-dimensional population information such as age, gender, race, marital and childbearing status, and region; b) extract feature vectors: extract characteristic words or phrases such as anti-inflammatory, fever, headache, cold and cough from online drug reviews, medical records and follow-up records to obtain feature vectors;
4) Convert the text data into a structured data set and divide it into a training set and a validation set in a given proportion.
Specifically, the training and validation sets may be split in a ratio of 8:2, 7:3 or 6:4.
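The split can be sketched with scikit-learn's `train_test_split`. Stratifying on the labels, so that the effective/ineffective proportions are similar in both parts, is a design choice assumed here rather than stated in the original; `split_dataset` is a hypothetical helper name.

```python
from sklearn.model_selection import train_test_split


def split_dataset(X, y, train_ratio: float = 0.8, seed: int = 0):
    """Split into training and validation sets (0.8, 0.7 or 0.6 for 8:2, 7:3, 6:4).

    Returns X_train, X_valid, y_train, y_valid. Stratifying on y (an assumption)
    keeps the label proportions comparable in both parts.
    """
    return train_test_split(
        X, y, train_size=train_ratio, random_state=seed, stratify=y
    )
```

A fixed `seed` makes the split reproducible, which matters when several classifier and feature-set combinations are compared on the same validation set.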
5) Select several algorithms as classifiers to predict the binary classification problem.
Specifically, four classifiers for the binary problem may be selected: a) OneVsRest SVM, b) Logistic Regression, c) Random Forest, d) Bagging meta-estimator with a Logistic Regression base estimator.
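The four candidate classifiers could be instantiated with scikit-learn as follows. This is a sketch only: the linear SVM kernel, the number of trees and bagging rounds, and `max_iter` are assumed values, not parameters from the original, and `make_classifiers` is a hypothetical helper.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def make_classifiers(seed: int = 0) -> dict:
    """Return the four candidate binary classifiers (hyperparameters are assumptions)."""
    return {
        "OneVsRest SVM": OneVsRestClassifier(SVC(kernel="linear")),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # The base estimator is passed positionally because the keyword is
        # `estimator` in recent scikit-learn releases and `base_estimator` in older ones.
        "Bagging (Logistic Regression base)": BaggingClassifier(
            LogisticRegression(max_iter=1000), n_estimators=10, random_state=seed
        ),
    }
```

Each entry exposes the same `fit`/`predict` interface, so the classifiers can be compared in a single loop over feature sets.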
6) Establish different feature-variable selection mechanisms, and select the scheme that gives the best prediction performance across the classifiers and feature variables.
Specifically, the feature variables may be obtained by combining word occurrence counts (Count), term frequency-inverse document frequency (tf-idf, i.e., Tfidf) and the VADER score, for example:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
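The six feature sets above can be sketched with scikit-learn's `CountVectorizer` and `TfidfVectorizer`, appending the VADER compound score as one extra sparse column. The sketch assumes that "top 10000" means keeping the 10,000 most frequent terms via `max_features`; `FEATURE_SETS` and `FeatureBuilder` are hypothetical names.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Feature-set name -> (vectorizer class, max_features, append VADER score?)
# Assumption: "top 10000" is read as keeping the 10,000 most frequent terms.
FEATURE_SETS = {
    "FS-1": (CountVectorizer, None, False),
    "FS-2": (CountVectorizer, None, True),
    "FS-3": (CountVectorizer, 10000, True),
    "FS-4": (TfidfVectorizer, None, False),
    "FS-5": (TfidfVectorizer, None, True),
    "FS-6": (TfidfVectorizer, 10000, True),
}


class FeatureBuilder:
    """Builds one of the FS-1 ... FS-6 feature matrices from raw texts."""

    def __init__(self, fs_name: str):
        vectorizer_cls, max_features, self.use_vader = FEATURE_SETS[fs_name]
        self.vectorizer = vectorizer_cls(max_features=max_features)

    def _append_vader(self, X_text, vader_scores):
        if not self.use_vader:
            return X_text.tocsr()
        # One extra column holding each document's VADER compound score
        column = csr_matrix(np.asarray(vader_scores, dtype=float).reshape(-1, 1))
        return hstack([X_text, column], format="csr")

    def fit_transform(self, texts, vader_scores):
        """Learn the vocabulary on the training texts and build the matrix."""
        return self._append_vader(self.vectorizer.fit_transform(texts), vader_scores)

    def transform(self, texts, vader_scores):
        """Reuse the training vocabulary for validation or unlabeled texts."""
        return self._append_vader(self.vectorizer.transform(texts), vader_scores)
```

Fitting the vocabulary only on the training texts and reusing it through `transform` keeps validation data from leaking into the feature definition.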
Further, the optimal prediction scheme is evaluated by F1-score,
F1 score = 2*(Recall * Precision) / (Recall + Precision);
wherein Recall = true positive/(true positive + false negative), Precision = true positive/(true positive + false positive).
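As a worked check of the F1 formula, the sketch below computes it from confusion-matrix counts. The counts in the example are made up for illustration, and the zero-division guard is an added convention, not part of the original.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * (Recall * Precision) / (Recall + Precision), with zero-division guards."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (recall * precision) / (recall + precision)


# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives
print(round(f1_from_counts(8, 2, 4), 3))  # 0.727
```

Here Precision = 8/10 = 0.8 and Recall = 8/12 ≈ 0.667, so F1 = 16/22 ≈ 0.727, the same value as the equivalent form 2TP / (2TP + FP + FN).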
The invention further provides an application of machine-learning-based intelligent drug evaluation with multiple functions: function 1) evaluating the effectiveness of a given drug: entering the drug name returns its effectiveness score, its ranking among similar drugs, and a ranking of its side effects; function 2) finding drugs for a disease or symptom: entering the names of one or more diseases or symptoms returns the effectiveness score, ranking and side-effect ranking of each corresponding drug; function 3) ranking the effectiveness and side effects of a drug and its similar drugs for population subgroups defined by age, gender, race, marital and childbearing status, region and so on.
In an embodiment of the invention, the mapping between each drug and its target diseases or symptoms is determined from approved information such as drug package inserts and medication guides, and drug effectiveness is predicted from information such as online drug reviews, medical records and follow-up records. Building the prediction model involves the following steps: first, VADER (Valence Aware Dictionary and sEntiment Reasoner) performs sentiment analysis on each statement to label whether the drug it refers to was effective; then the structured data set is divided into a training set and a validation set, and several binary classifiers (models) are trained under different feature-extraction schemes to obtain the optimal scheme. At the application level, the predictions of whether a drug was effective are applied to the mapping established at the start, and the following are calculated: the effective rate of each drug for each target disease or symptom, its side effects, and the effective rates of one or more similar drugs for the same target disease or symptom. Thus, when a user enters a drug, the effective rate and side effects of the drug for its target diseases or symptoms are shown together with the effective rates of similar drugs; when a user enters a disease or symptom, the effective rates of the one or more corresponding drugs and their respective side effects are shown. In addition, filtering by population information reveals the age, gender, race, marital and childbearing status, region and so on of the people taking the drug, as well as the effective rate within each subgroup.
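The disease-to-drug search described above can be sketched as a reverse lookup over the drug-to-target mapping, followed by sorting on the model's effective rates. The data layout, the function name `drugs_for_conditions`, and the choice to omit drugs without a computed rate are assumptions for illustration.

```python
def drugs_for_conditions(conditions, targets, rates):
    """Find drugs whose target set contains each queried condition, ranked by effective rate.

    targets: {drug: set of target diseases/symptoms} (the drug-to-target mapping)
    rates:   {(drug, condition): effective rate predicted by the model}
    Drugs without a computed rate for a condition are omitted (an assumption).
    Returns {condition: [(drug, rate), ...]} sorted from most to least effective.
    """
    result = {}
    for condition in conditions:
        candidates = [
            (drug, rates[(drug, condition)])
            for drug, drug_targets in targets.items()
            if condition in drug_targets and (drug, condition) in rates
        ]
        result[condition] = sorted(candidates, key=lambda c: c[1], reverse=True)
    return result
```

Because the query accepts a list of conditions, the same call covers both the single- and multiple-symptom inputs of function 2.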
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of an intelligent drug efficacy evaluation method based on machine learning in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a software operation structure of a machine learning-based intelligent drug efficacy evaluation application in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below in a clear and complete manner with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The following describes embodiments of the present invention in detail.
Example 1
Referring to Fig. 1, which is a schematic flowchart of the machine-learning-based intelligent drug efficacy evaluation method according to an embodiment of the present invention, the method includes:
101. Establish the mapping between each drug and its target diseases or symptoms.
As shown in Table 1, the diseases or symptoms treated by metronidazole include septicemia, endocarditis, meningitis, colitis, tetanus and oral ulcer (canker sore). The target diseases or symptoms of each drug are extracted from the drug package insert or medication guide; the drug is denoted $D_i$ and its set of target diseases or symptoms $T_i$. In this example, $D_1$ = metronidazole and $T_1$ = {septicemia, endocarditis, meningitis, colitis, tetanus, canker sore}. After the mapping between the drugs and their target diseases or symptoms is established, step 102 is performed.
102. Extracting the corresponding potential side effects of the medicines, and calculating similarity indexes among the medicines.
As shown in Table 1, the potential side effects of metronidazole include nausea, vomiting, loss of appetite, abdominal cramps, headache, dizziness, paresthesia and numbness of the extremities. The potential side effects of each drug are extracted from the package insert or medication guide and denoted $S_i$. In this example, $S_1$ = {nausea, vomiting, loss of appetite, abdominal cramps, headache, dizziness, paresthesia, numbness of limbs}.
As shown in Table 2, $D_2$ = ornidazole; its target diseases or symptoms are $T_2$ = {septicemia, amebiasis, meningitis, periodontitis, endometritis, canker sore} and its potential side effects are $S_2$ = {nausea, oral malodor, dizziness, drowsiness, rash, spasm, confusion, numbness of limbs}.
The similarity index of the two drugs is $\mathrm{Sim}(D_1, D_2) = \frac{|T_1 \cap T_2|}{|T_1 \cup T_2|} = \frac{3}{9} \approx 0.333$, since they share septicemia, meningitis and canker sore out of nine distinct targets.
Further, a data dictionary of the diseases or symptoms treated by drugs can be designed to include broader and narrower concepts of each disease or symptom. For example, periodontitis and oral ulcer (canker sore) are both oral infections; if the broader concept "oral infection" is used, the two drugs share 3 of 8 distinct targets, so their similarity index is 3/8 = 0.375.
Furthermore, a synonym-merging data dictionary can be designed, containing terms close enough in meaning to be treated as the same, such as {numbness of the extremities} and {numbness of limbs}, or {headache} and {dizziness}.
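The effect of the two dictionaries on the similarity index can be reproduced with a short sketch. The dictionary contents below cover only the examples in the text, and the order of application (merge synonyms first, then map to broader concepts) is an assumption; `SYNONYMS`, `BROADER` and `normalize_terms` are hypothetical names.

```python
# Hypothetical dictionaries containing only the examples given in the text
SYNONYMS = {"numbness of the extremities": "numbness of limbs"}
BROADER = {"periodontitis": "oral infection", "canker sore": "oral infection"}


def normalize_terms(terms: set[str]) -> set[str]:
    """Merge synonyms first, then replace narrower terms by their broader concept."""
    merged = {SYNONYMS.get(term, term) for term in terms}
    return {BROADER.get(term, term) for term in merged}


def similarity_index(a: set[str], b: set[str]) -> float:
    """Jaccard index of two target sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


metronidazole = {"septicemia", "endocarditis", "meningitis", "colitis", "tetanus", "canker sore"}
ornidazole = {"septicemia", "amebiasis", "meningitis", "periodontitis", "endometritis", "canker sore"}

print(similarity_index(metronidazole, ornidazole))  # 3 shared of 9 distinct targets
print(similarity_index(normalize_terms(metronidazole), normalize_terms(ornidazole)))  # 0.375
```

After normalization, ornidazole's periodontitis and canker sore collapse into a single "oral infection" entry, leaving 8 distinct targets of which 3 are shared.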
103. Label the online reviews, medical records and follow-up records of each drug to indicate whether the drug was effective.
Specifically, VADER (Valence Aware Dictionary and sEntiment Reasoner) rates each sentence with a value from -1 (negative) to 1 (positive), where 0 is a neutral opinion. Using the VADER module in Python, the polarity_scores method gives four scores for a sentence: (a) negative, (b) positive, (c) neutral and (d) compound. The negative, neutral and positive scores are the proportions of the text falling into each category. The compound score is obtained by summing the valence scores of the individual words and normalizing the sum to between -1 and 1, and it measures the overall positive or negative sentiment of the sentence. VADER is designed for sentiment analysis of English sentences, so the data sources used for efficacy evaluation should be in English where possible; Chinese texts, for example, can be translated into English with an automatic translator and then checked manually.
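The normalization that turns a summed valence into a compound score between -1 and 1 can be sketched as follows. It follows the `normalize` step of the reference vaderSentiment implementation (x / sqrt(x² + α), α = 15). The sketch is simplified: it omits VADER's rule-based adjustments for negation, intensifiers and punctuation emphasis, and the valences in the example are hypothetical rather than taken from the VADER lexicon.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1] as score / sqrt(score^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    # The ratio already lies in (-1, 1); the clamp mirrors the reference code.
    return max(-1.0, min(1.0, norm))


def compound_score(word_valences: list[float]) -> float:
    """Simplified compound score: sum the per-word valences, then normalize.

    The reference implementation also applies rule-based adjustments
    (negation, intensifiers, punctuation emphasis) before normalizing.
    """
    return normalize(sum(word_valences))


print(compound_score([1.9, 1.3]))  # two hypothetical positive word valences
```

The square-root form keeps the score bounded while staying roughly proportional for small sums, which is why long, strongly worded reviews approach but never exceed ±1.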
104. And structuring the text data, and extracting multi-dimensional crowd information and medicine and treatment related feature vectors.
The population information includes, but is not limited to, age, gender, race, marital and childbearing status, and region. For online drug reviews this information can be obtained from the site's back-end database, and for hospital records from the hospital's medical record management system; follow-up surveys should be designed in advance to capture as much of this population information as possible.
Words in online drug reviews, hospital medical records and follow-up records that reflect important characteristics of the texts are converted into vectors using word counts (CountVectorizer) and term frequency-inverse document frequency (tf-idf, TfidfVectorizer).
The feature sets can be constructed, for example, as follows:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
105. The structured data is divided into a training set and a validation set.
The data set converted into vector form in the steps above is divided into a training set and a validation set in a ratio of 8:2, 7:3 or 6:4.
106. Select several algorithms as classifiers to predict the binary classification problem.
Further, four commonly used algorithms are chosen to train the classifiers: a) OneVsRest SVM, b) Logistic Regression, c) Random Forest, d) Bagging meta-estimator with a Logistic Regression base estimator.
107. Establish different feature-variable selection mechanisms, and select the scheme that gives the best prediction performance across the classifiers and feature variables.
Table 3 shows the F1-scores obtained by training each of the four classifiers from step 106 on each of the six feature-selection rules from step 104, where
F1 score = 2*(Recall * Precision) / (Recall + Precision);
wherein Recall = true positive/(true positive + false negative) and Precision = true positive/(true positive + false positive).
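The comparison summarized in Table 3 can be sketched as a loop over every classifier and feature-set pair, scoring each on the validation set with F1 and keeping the best. The sketch assumes the feature matrices are built in advance and that labels are encoded as 1 = effective and 0 = ineffective. The toy data, the two stand-in classifiers, and the name `select_best` are illustrative only; they do not reproduce the original results.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier


def select_best(classifiers: dict, feature_sets: dict, y_train, y_valid):
    """Train every classifier on every feature set and rank by validation F1.

    feature_sets maps a name (e.g. "FS-6") to a pre-built (X_train, X_valid) pair.
    Returns the best (classifier, feature set, F1) triple and the full score table.
    """
    table = {}
    for clf_name, clf in classifiers.items():
        for fs_name, (X_train, X_valid) in feature_sets.items():
            model = clone(clf).fit(X_train, y_train)  # fresh, unfitted copy per cell
            table[(clf_name, fs_name)] = f1_score(y_valid, model.predict(X_valid))
    best = max(table, key=table.get)
    return (*best, table[best]), table


# Toy demo with two stand-in classifiers and two hypothetical feature sets
X_tr = [[i, i % 2] for i in range(20)]
y_tr = [int(i >= 10) for i in range(20)]
X_va = [[i + 0.5, i % 2] for i in range(20)]
y_va = y_tr
fs = {"FS-a": (X_tr, X_va), "FS-b": ([r[:1] for r in X_tr], [r[:1] for r in X_va])}
clfs = {"LR": LogisticRegression(max_iter=1000), "Tree": DecisionTreeClassifier(random_state=0)}
best, table = select_best(clfs, fs, y_tr, y_va)
print(best)
```

Cloning each classifier before fitting prevents one feature set's training from leaking into the next cell of the table.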
108. Use the optimal scheme obtained by training to calculate the effective rate of each drug for its target diseases or symptoms.
According to Table 3, the optimal scheme for predicting effectiveness for the target diseases or symptoms is Random Forest with feature-extraction scheme FS-6: TfidfVectorizer top 10000 features + VADER score, with a corresponding F1-score of 0.760. This scheme is used to predict the unlabeled data, and the effective rate of each drug for each of its diseases or symptoms is calculated.
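Once the chosen model has labeled the remaining records, the effective rate for each drug and target pair is the share of its records predicted "effective". The following is a minimal sketch, assuming predictions arrive as (drug, target, label) tuples with 1 = effective. The tuple layout, the function name and the sample predictions are hypothetical.

```python
from collections import defaultdict


def effective_rates(predictions):
    """predictions: iterable of (drug, target, label), label 1 = effective, 0 = ineffective.

    Returns {(drug, target): share of records predicted effective}.
    """
    counts = defaultdict(lambda: [0, 0])  # [effective count, total count]
    for drug, target, label in predictions:
        counts[(drug, target)][0] += int(label == 1)
        counts[(drug, target)][1] += 1
    return {key: effective / total for key, (effective, total) in counts.items()}


# Made-up model outputs for illustration
preds = [
    ("metronidazole", "colitis", 1),
    ("metronidazole", "colitis", 0),
    ("metronidazole", "colitis", 1),
    ("metronidazole", "tetanus", 1),
]
print(effective_rates(preds))  # colitis: 2 of 3 effective; tetanus: 1 of 1
```

Keying the counts by (drug, target) keeps the rates aligned with the drug-to-target mapping, so the same table can feed both the per-drug and per-symptom views.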
109. Rank the effective rates of similar drugs according to the drug similarity index, and rank the potential side effects.
Drug similarity index: $\mathrm{Sim}(D_a, D_b) = \frac{|T_a \cap T_b|}{|T_a \cup T_b|}$.
For a given drug, such as metronidazole, the similarity index between it and every other drug is calculated as in step 102. The top five most similar drugs can be taken, and the model calculates the effective rates of the drug and of each similar drug. Using the side-effect feature words extracted in step 102, the side effects of the drug are counted across all data sources and ranked; the top 10, or another number as appropriate, can be reported.
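Both rankings in step 109 can be sketched with the standard library. The following are assumptions for illustration and do not come from the original: similarity is the Jaccard index of target sets; each text counts at most once per side-effect term; matching is a case-insensitive substring test, which is crude (for example, "pain" would also match "painless"); and the function names are hypothetical.

```python
from collections import Counter


def top_similar(drug: str, targets: dict[str, set[str]], k: int = 5) -> list[tuple[str, float]]:
    """Rank other drugs by the Jaccard similarity of their target sets."""
    base = targets[drug]
    scores = []
    for other, other_targets in targets.items():
        if other == drug:
            continue
        union = base | other_targets
        scores.append((other, len(base & other_targets) / len(union) if union else 0.0))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


def rank_side_effects(texts: list[str], side_effects: set[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count how many texts mention each side-effect term (case-insensitive substring match)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        counts.update(term for term in side_effects if term in lowered)
    return counts.most_common(top_n)
```

Counting documents rather than raw occurrences keeps a single long review that repeats a symptom from dominating the ranking.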
Example 2
Based on the intelligent drug efficacy evaluation method of the embodiment above, a machine-learning-based intelligent drug efficacy evaluation application is developed. Its back end includes a database that collects and manages the different data sources described above; its middle tier contains the intelligent drug efficacy evaluation method, with support for model parameter tuning and real-time monitoring; and its front end provides the following functions:
1) Evaluating the effectiveness of a given drug: entering the drug name returns its effectiveness score, its ranking among similar drugs, and a ranking of its side effects;
the drug effectiveness score is the effective rate of the drug obtained from the drug efficacy evaluation model.
2) Finding drugs for a disease or symptom: entering the names of one or more diseases or symptoms returns the effectiveness score, ranking and side-effect ranking of each corresponding drug;
when a disease or symptom is entered, the corresponding drugs are found using the drug-to-target mapping obtained in step 101 of Example 1, and the effectiveness score, ranking and potential side-effect ranking of each drug are calculated by the model.
3) Ranking the effectiveness and side effects of a drug and its similar drugs for population subgroups defined by age, gender, race, marital and childbearing status, region and so on;
population information serves as a filter when calculating drug effectiveness and when ranking similar drugs and their side effects.
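Using population information as a filter can be sketched as restricting the labeled records to one subgroup before computing the effective rate. The record fields (`drug`, `label`, `gender`, `age_group`, and so on) and the function name are hypothetical; returning None for an empty subgroup is an assumed convention.

```python
def subgroup_rate(records, drug, **filters):
    """Effective rate of `drug` among records matching all demographic filters.

    records: dicts with keys such as "drug", "label" (1 = effective) and
    demographic fields like "age_group", "gender" or "region" (hypothetical names).
    Returns None when no record matches.
    """
    matched = [
        r for r in records
        if r["drug"] == drug and all(r.get(key) == value for key, value in filters.items())
    ]
    if not matched:
        return None
    return sum(r["label"] for r in matched) / len(matched)
```

For example, `subgroup_rate(records, "metronidazole", gender="F", region="north")` would give the effective rate among female patients in one region. Returning None rather than 0 keeps "no data" distinct from "never effective".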
Claims (9)
1. An intelligent drug efficacy evaluation method based on machine learning is characterized by comprising the following steps:
1) extracting the mapping between each drug and the diseases or symptoms it treats from drug package inserts and the medication guides of the drug regulatory authority: suppose the drugs are $D_i$, $i = 1, \dots, I$, where $I$ is the number of drugs; drug $D_i$ has target diseases or symptoms $T_{ij}$, $j = 1, \dots, J$, where $J$ is the number of target diseases or symptoms, and potential side effects $S_{ik}$, $k = 1, \dots, K$, where $K$ is the number of potential side effects; and calculating the similarity index between drugs $D_i$;
2) grouping the online drug reviews, hospital medical records and follow-up records by the target diseases or symptoms $T_{ij}$ from step 1), and labeling each review and medical record as "effective" or "ineffective";
3) structuring the text data:
a) extracting multi-dimensional population information such as age, sex, ethnicity, marital and childbearing status, and region;
b) extracting feature vectors: extracting feature words or phrases from the online drug reviews, medical records, and follow-up records;
4) splitting the structured data set obtained from the text data into a training set and a validation set in a given ratio;
5) selecting several algorithms as classifiers for the binary classification problem ('effective' vs. 'ineffective');
6) establishing different feature-selection schemes, comparing the prediction performance of the classifiers under each set of feature variables, and selecting the best-performing scheme;
7) using the optimal scheme obtained from training in step 6) to calculate the effective rate of each medicine for each of its target diseases or symptoms; obtaining the ranking of that effective rate among similar medicines according to the drug similarity index calculated in step 1); and ranking the potential side effects of the medicine identified in step 1) by extracting the corresponding feature words from the data set.
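The three outputs of step 7) are the effective rate, the rank among similar medicines, and the side-effect ranking. They can be sketched in plain Python as below. This is a sketch under assumptions, not the patent's code:

- the predictions are assumed to come from the selected classifier;
- the list of similar medicines is assumed to have been obtained already via the similarity index;
- side-effect "feature words" are matched by a simple case-insensitive substring test, which is a simplification.

```python
from collections import Counter


def effective_rate(predictions):
    """Share of a drug's records (for one target condition) predicted 'effective'."""
    return sum(p == "effective" for p in predictions) / len(predictions)


def rank_among_similar(drug, rates, similar):
    """1-based rank of `drug` by effective rate within itself plus its similar drugs."""
    group = [drug, *similar]
    ordered = sorted(group, key=lambda d: (-rates[d], d))
    return ordered.index(drug) + 1


def rank_side_effects(texts, side_effects):
    """Rank listed side-effect terms by how many texts mention them."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in side_effects:
            if term in lowered:  # naive substring match (assumption)
                counts[term] += 1
    return counts.most_common()


rates = {
    "drug_a": effective_rate(["effective", "effective", "ineffective", "effective"]),
    "drug_b": 0.80,
    "drug_c": 0.60,
}
print(rates["drug_a"])  # 0.75
print(rank_among_similar("drug_a", rates, ["drug_b", "drug_c"]))  # 2
print(rank_side_effects(
    ["Mild nausea after the first dose", "Headache gone, some dizziness", "nausea again"],
    ["nausea", "dizziness", "rash"],
))
# [('nausea', 2), ('dizziness', 1)]
```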
2. The intelligent drug efficacy evaluation method according to claim 1, wherein the labeling in step 2) is performed automatically by semantic sentiment analysis: each sentence is scored with VADER (Valence Aware Dictionary and sEntiment Reasoner) on a scale from -1 (negative) to 1 (positive), where 0 indicates a neutral opinion.
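Turning a VADER score into a label might look like the sketch below. Two parts are assumptions and are not stated in the claim:

- the ±0.05 cut-offs, taken from the thresholds suggested in the VADER project's README;
- the mapping of positive sentiment to 'effective'.

Near-zero scores are kept as 'neutral' so they can be routed to the manual check of claim 3. The function name is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a label (assumed mapping).

    The score itself would come from, e.g.:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "effective"
    if compound <= -threshold:
        return "ineffective"
    return "neutral"  # left for manual review


print(label_from_compound(0.6), label_from_compound(-0.4), label_from_compound(0.0))
# effective ineffective neutral
```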
3. The intelligent drug efficacy evaluation method according to claim 2, wherein the labeling in step 2) is performed automatically and then checked manually.
4. The intelligent drug efficacy evaluation method according to claim 1, wherein the training set and the validation set in step 4) are split in a ratio of 8:2 or 7:3.
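The 8:2 split of claim 4 maps directly onto scikit-learn's `train_test_split`, as in this minimal sketch. The toy texts and labels are placeholders. Stratifying by label is an added assumption; the claim fixes only the ratio.

```python
from sklearn.model_selection import train_test_split

# Toy structured data set: 10 documents with binary labels
# (1 = "effective", 0 = "ineffective"). Placeholder values only.
texts = [f"review {i}" for i in range(10)]
labels = [1, 0] * 5

# 8:2 split; pass test_size=0.3 for the 7:3 variant. stratify keeps the
# class proportions similar in both parts (an added assumption).
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_train), len(X_val))  # 8 2
```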
5. The intelligent drug efficacy evaluation method according to claim 1, wherein four classifiers are used in step 5):
a) OneVsRest SVM,
b) Logistic Regression,
c) Random Forest,
d) Bagging meta-estimator with a logistic regression base estimator.
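The four classifiers can be instantiated with scikit-learn and compared by validation F1, as in this sketch. The hyperparameters are illustrative guesses and do not come from the patent. The synthetic data from `make_classification` stands in for a real vectorized review set. A linear SVM is assumed for "SVM".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# The four classifiers of claim 5; hyperparameters are illustrative guesses.
classifiers = {
    "OneVsRest SVM": OneVsRestClassifier(LinearSVC()),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Bagging + LR": BaggingClassifier(
        LogisticRegression(max_iter=1000), n_estimators=10, random_state=0
    ),
}

# Synthetic stand-in for a vectorized review data set.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit each classifier and compare validation F1, as in steps 5) and 6).
f1_by_model = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    f1_by_model[name] = f1_score(y_val, clf.predict(X_val))

best = max(f1_by_model, key=f1_by_model.get)
print(best, round(f1_by_model[best], 3))
```

In the full method this loop would also run over each feature set from claim 6, keeping the best (feature set, classifier) pair.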
6. The intelligent drug efficacy evaluation method according to claim 1, wherein the feature variables in step 6) are built from word occurrence counts (Count), term frequency-inverse document frequency (tf-idf, Tfidf), and VADER scores, for example:
FS-1: CountVectorizer,
FS-2: CountVectorizer + VADER score,
FS-3: CountVectorizer top 10000 features + VADER score,
FS-4: TfidfVectorizer,
FS-5: TfidfVectorizer + VADER score,
FS-6: TfidfVectorizer top 10000 features + VADER score.
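The six feature sets could be assembled as sparse matrices, as in the sketch below. Assumptions:

- "top 10000" is read as scikit-learn's `max_features=10000`, which keeps the most frequent terms; the patent may mean a different selection.
- `vader_scores` is a hypothetical list of precomputed compound scores, one per document.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def build_feature_sets(texts, vader_scores):
    """Return the FS-1..FS-6 matrices; VADER scores become one extra column."""
    vader_col = csr_matrix(np.asarray(vader_scores, dtype=float).reshape(-1, 1))
    count_full = CountVectorizer().fit_transform(texts)
    tfidf_full = TfidfVectorizer().fit_transform(texts)
    # "Top 10000" interpreted as the 10000 most frequent terms (assumption).
    count_top = CountVectorizer(max_features=10000).fit_transform(texts)
    tfidf_top = TfidfVectorizer(max_features=10000).fit_transform(texts)
    return {
        "FS-1": count_full,
        "FS-2": hstack([count_full, vader_col], format="csr"),
        "FS-3": hstack([count_top, vader_col], format="csr"),
        "FS-4": tfidf_full,
        "FS-5": hstack([tfidf_full, vader_col], format="csr"),
        "FS-6": hstack([tfidf_top, vader_col], format="csr"),
    }


docs = ["mild nausea but it worked", "did not help at all", "worked well no side effects"]
fs = build_feature_sets(docs, [0.3, -0.4, 0.6])
print({name: m.shape for name, m in fs.items()})
```

For a real evaluation, the vectorizers should be fitted on the training split only and then used to transform the validation split, to avoid leaking vocabulary statistics.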
7. The intelligent drug efficacy evaluation method according to claim 1, wherein the optimal prediction in step 6) is evaluated by the F1 score, $\mathrm{F1} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$, where $\mathrm{Recall} = \frac{TP}{TP + FN}$ and $\mathrm{Precision} = \frac{TP}{TP + FP}$, with $TP$, $FP$, and $FN$ denoting the numbers of true positives, false positives, and false negatives.
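The F1 formula of claim 7 can be checked by hand with a small pure-Python function. Treating 'effective' as the positive class is an assumption, and returning 0.0 when a denominator is zero is a common convention rather than something the claim specifies.

```python
def precision_recall_f1(y_true, y_pred, positive="effective"):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["effective", "effective", "ineffective", "effective", "ineffective"]
y_pred = ["effective", "ineffective", "effective", "effective", "ineffective"]
print(precision_recall_f1(y_true, y_pred))  # TP=2, FP=1, FN=1 -> all equal 2/3
```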
8. The intelligent drug efficacy evaluation method according to claim 1, wherein the drug similarity index in step 1) is calculated as follows: suppose one medicine corresponds to one set of target diseases or symptoms and another medicine corresponds to another set of target diseases or symptoms; the similarity index between the two medicines is then computed from these two target sets [formula missing in the source text].
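Claim 8's formula did not survive in this text, so the exact index is unknown. Purely as a hypothetical stand-in, the sketch below uses the Jaccard index of the two medicines' target sets, a common set-overlap measure. It also uses an assumed threshold of 0.5 for calling two medicines similar. Neither choice comes from the patent.

```python
def jaccard_similarity(targets_a, targets_b):
    """Hypothetical stand-in for the claim 8 index: |A & B| / |A | B|."""
    a, b = set(targets_a), set(targets_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_drugs(drug, drug_targets, threshold=0.5):
    """Drugs whose (assumed) Jaccard similarity to `drug` meets the threshold."""
    return sorted(
        other
        for other, targets in drug_targets.items()
        if other != drug
        and jaccard_similarity(drug_targets[drug], targets) >= threshold
    )


drug_targets = {
    "drug_a": {"headache", "fever"},
    "drug_b": {"headache"},
    "drug_c": {"insomnia"},
}
print(jaccard_similarity(drug_targets["drug_a"], drug_targets["drug_b"]))  # 0.5
print(similar_drugs("drug_a", drug_targets))  # ['drug_b']
```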
9. A machine-learning-based intelligent drug efficacy evaluation application that performs the following functions using the method according to any one of claims 1 to 8:
function 1) evaluating the efficacy of a given medicine: the user inputs the name of the medicine and obtains its effectiveness score, its ranking among similar medicines, and the ranking of its side effects;
function 2) searching for the medicines that correspond to a given disease or symptom: the user inputs the names of one or more diseases or symptoms and obtains the effectiveness score, ranking, and side-effect ranking of each corresponding medicine;
function 3) ranking the effectiveness of a medicine, its similar medicines, and their side effects for population groups defined along multiple dimensions such as age, sex, ethnicity, marital and childbearing status, and region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111135248.5A CN113838583B (en) | 2021-09-27 | 2021-09-27 | Intelligent medicine curative effect evaluation method based on machine learning and application thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111135248.5A CN113838583B (en) | 2021-09-27 | 2021-09-27 | Intelligent medicine curative effect evaluation method based on machine learning and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113838583A true CN113838583A (en) | 2021-12-24 |
CN113838583B CN113838583B (en) | 2023-10-24 |
Family ID: 78970737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111135248.5A Active CN113838583B (en) | 2021-09-27 | 2021-09-27 | Intelligent medicine curative effect evaluation method based on machine learning and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113838583B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200069A (en) * | 2014-08-13 | 2014-12-10 | 周晋 | Drug use recommendation system and method based on symptom analysis and machine learning |
CN104951665A (en) * | 2015-07-22 | 2015-09-30 | 浙江大学 | Method and system of medicine recommendation |
CN107092797A (en) * | 2017-04-26 | 2017-08-25 | 广东亿荣电子商务有限公司 | A kind of medicine proposed algorithm based on deep learning |
CN107403069A (en) * | 2017-07-31 | 2017-11-28 | 京东方科技集团股份有限公司 | A kind of medicine disease association relationship analysis system and method |
US20190035496A1 (en) * | 2016-02-29 | 2019-01-31 | Mor Research Applications Ltd | System and method for selecting optimal medications for a specific patient |
CN111599403A (en) * | 2020-05-22 | 2020-08-28 | 电子科技大学 | Parallel drug-target correlation prediction method based on sequencing learning |
CN112116978A (en) * | 2020-09-17 | 2020-12-22 | 陕西师范大学 | Method, system and device for recommending rheumatism immunity medicine |
US20210134418A1 (en) * | 2019-11-04 | 2021-05-06 | Georgetown University | Method and System for Assessing Drug Efficacy Using Multiple Graph Kernel Fusion |
CN113160879A (en) * | 2021-04-25 | 2021-07-23 | 上海基绪康生物科技有限公司 | Method for predicting drug relocation through side effect based on network learning |
CN113241193A (en) * | 2021-06-01 | 2021-08-10 | 平安科技(深圳)有限公司 | Drug recommendation model training method, recommendation method, device, equipment and medium |
CN113316720A (en) * | 2019-01-15 | 2021-08-27 | 国际商业机器公司 | Determining a drug effectiveness ranking for a patient using machine learning |
2021-09-27: CN202111135248.5A (CN113838583B), CN, Active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115497630A (en) * | 2022-08-24 | 2022-12-20 | 中国医学科学院北京协和医院 | Method and system for processing acute severe ulcerative colitis data |
CN115497630B (en) * | 2022-08-24 | 2023-11-03 | 中国医学科学院北京协和医院 | Method and system for processing acute severe ulcerative colitis data |
CN116758062A (en) * | 2023-08-11 | 2023-09-15 | 之江实验室 | Drug effectiveness evaluation method and device |
CN118072980A (en) * | 2024-04-18 | 2024-05-24 | 首都医科大学附属北京儿童医院 | Method and related equipment for evaluating mucociliary clearance function of mucous membrane in nasal cavity |
Also Published As
Publication number | Publication date |
---|---|
CN113838583B (en) | 2023-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Basiri et al. | A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques | |
Castillo-Sánchez et al. | Suicide risk assessment using machine learning and social networks: a scoping review | |
Perez et al. | Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora | |
CN113838583A (en) | Intelligent drug efficacy evaluation method based on machine learning and application thereof | |
Ramachandran et al. | Named entity recognition on bio-medical literature documents using hybrid based approach | |
Liu et al. | Extracting features with medical sentiment lexicon and position encoding for drug reviews | |
Wu et al. | KAICD: A knowledge attention-based deep learning framework for automatic ICD coding | |
Shen et al. | Enhancing ontology-driven diagnostic reasoning with a symptom-dependency-aware Naïve Bayes classifier | |
Afsana et al. | Automatically assessing quality of online health articles | |
Falissard et al. | Neural translation and automated recognition of ICD-10 medical entities from natural language: Model development and performance assessment | |
Taghizadeh et al. | SINA-BERT: a pre-trained language model for analysis of medical texts in Persian | |
Rakhsha et al. | Detecting adverse drug reactions from social media based on multichannel convolutional neural networks modified by support vector machine | |
Al-Jefri et al. | Using machine learning for automatic identification of evidence-based health information on the web | |
Chaturvedi et al. | Identifying mentions of pain in mental health records text: a natural language processing approach | |
Roosan et al. | Artificial intelligent context-aware machine-learning tool to detect adverse drug events from social media platforms | |
Al Amin et al. | Data driven classification of opioid patients using machine learning–an investigation | |
Cousyn et al. | Towards using scientific publications to automatically extract information on rare diseases | |
Liu et al. | Sentiment classification with medical word embeddings and sequence representation for drug reviews | |
Al-Smadi | DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning | |
Vithanage et al. | Contextual Word Embedding for Biomedical Knowledge Extraction: A Rapid Review and Case Study | |
Liu et al. | Clinical quantitative information recognition and entity-quantity association from Chinese electronic medical records | |
Shi et al. | Enhancing efficiency and capacity of telehealth services with intelligent triage: a bidirectional LSTM neural network model employing character embedding | |
He et al. | A method of electronic medical record similarity computation | |
Raza | Improving Clinical Decision Making with a Two-Stage Recommender System: A Case Study on MIMIC-III Dataset | |
Gatto et al. | HealthE: Recognizing Health Advice & Entities in Online Health Communities |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |