CN108376567A - Label propagation algorithm-based clinical drug-drug adverse reaction detection method - Google Patents


Info

Publication number
CN108376567A
Authority
CN
China
Prior art keywords
label
sample
similarity
drug
representing
Prior art date
Legal status
Granted
Application number
CN201810010035.1A
Other languages
Chinese (zh)
Other versions
CN108376567B (en)
Inventor
张强
魏小鹏
燕智策
赵腊生
Current Assignee
Dalian University
Original Assignee
Dalian University
Priority date: 2018-01-05
Filing date: 2018-01-05
Publication date: 2018-08-07
Application filed by Dalian University
Priority to CN201810010035.1A
Publication of CN108376567A
Application granted
Publication of CN108376567B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Medicinal Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a label propagation algorithm-based method for detecting clinical drug-drug adverse reactions. Starting from a given drug sample set, it rebuilds the label propagation scheme around a new similarity measure and a new label initialization, and uses the result to detect adverse drug reactions. First, the drug features are filtered with the CHI (chi-square) method and the most informative features are selected. Second, a new sample similarity is constructed from the sample label similarity and the sample similarity adjusted by the Laplace operator. Third, the initialization information of the unknown-label samples is established from the information of the known-label samples. Finally, drugs with adverse reactions are detected by label propagation. By reconstructing both the drug similarity calculation and the label propagation scheme, the invention makes the similarity between drugs more accurate and label propagation smoother, and can effectively improve the detection of adverse drug reactions at the clinical stage.

Description

Label propagation algorithm-based clinical drug-drug adverse reaction detection method
Technical Field
The invention relates to the field of drug safety detection, and in particular to a method for detecting clinical drug-drug adverse reactions based on a label propagation algorithm.
Background
Traditional drug safety detection methods include frequency-based methods, such as the proportional reporting ratio (PRR), the reporting odds ratio (ROR) and the MHRA composite criteria, and Bayesian methods, such as the Bayesian confidence propagation neural network (BCPNN) and the multi-item gamma-Poisson shrinker (MGPS). All of them detect adverse reactions of drugs that are already on the market. In practice, drugs should be screened before they reach the market, so that unsafe drugs, which can cause other diseases or even the death of the patient after being taken, are kept off the market. In recent years, with the rise of big data, big-data methods have also been applied in the medical field to screen new drugs. These methods fall into two main categories: similarity-based approaches and classification-model-based approaches. Similarity-based approaches rest on the assumption that similar drugs have similar effects. Classification-model-based approaches treat drug screening as a binary classification problem and apply conventional data mining or machine learning methods. At present, researchers working on drug big data tend to prefer similarity-based methods, because these methods better explain the causes of adverse drug reactions and can also achieve higher detection performance than classification-model-based methods.
Although researchers have contributed a great deal to drug screening with similarity-based methods, many new drugs that reach the market still cause adverse reactions. The reason is that similarity-based methods cannot classify drugs accurately when they use the raw similarity between drugs directly, because the drug classes overlap substantially. The label propagation algorithm (LPA) is an adverse drug reaction detection algorithm built on the similarity approach: starting from the samples with known labels, it uses the similarity between samples directly to propagate labels iteratively until the label values of all samples converge, which yields the adverse reaction information of the samples under test. However, LPA inherits the weaknesses of similarity-based methods, and it also has shortcomings in how the feature information of the sample data is selected and in how the labels of unlabeled samples are initialized.
Disclosure of Invention
The invention provides a clinical drug-drug adverse reaction detection method based on a label propagation algorithm. The algorithm is adjusted with respect to the data features, the sample similarity and the sample label initialization, in order to remedy the shortcomings of the drug similarity method and of the label propagation algorithm.
The technical solution adopted by the invention is a label propagation algorithm-based clinical drug-drug adverse reaction detection method comprising the following steps:
step 1: filter the drug features with the chi-square (CHI) method and select the features that carry the most information;
step 2: construct a new sample similarity from the sample label similarity and the sample similarity adjusted by the Laplace operator;
step 3: initialize the labels of the unknown-label samples from the information of the known-label samples;
step 4: combine steps 1, 2 and 3 into a new label propagation algorithm, and use it to obtain the detection result for the samples to be identified.
Steps 1, 2 and 3 comprise the following specific sub-steps:
(1) The drug data set consists of two parts: a drug sample data set and a drug label data set. In the drug sample data set, each drug is represented by a 1 × N binary vector, where N denotes the total number of samples. In the drug label data set, each drug is represented by a 1 × c vector, where c is both the number of known-label samples and the number of labels per sample (each sample carries multiple labels); the drug label data set is denoted Y;
(2) In the drug training data set, compute the feature scores with the CHI method and select, from all the drug data, the features that carry more information:
where one term gives the frequency with which feature $t_i$ occurs in class $c_k$ and another gives the degree to which the feature is concentrated in that class; $a$ is the number of samples in class $c_k$ that contain feature $t_i$, $b$ is the number of samples outside $c_k$ that contain $t_i$, $c$ is the number of samples in $c_k$ that do not contain $t_i$, $d$ is the number of samples outside $c_k$ that do not contain $t_i$, and $N = a + b + c + d$ is the total number of samples;
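The four counts $a$, $b$, $c$ and $d$ defined above are the cells of a 2 × 2 contingency table. The sketch below computes the standard textbook chi-square statistic from them, $\chi^2 = N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))$. It is an assumption that this is the core of the formula used here; the frequency and concentration terms described above are not reproduced in this text and are left out. The function name `chi_square` is hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard 2x2 chi-square statistic for feature t_i and class c_k.

    a: samples in class c_k that contain t_i
    b: samples outside c_k that contain t_i
    c: samples in class c_k that do not contain t_i
    d: samples outside c_k that do not contain t_i
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A feature or class that never varies carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# A feature present in all 4 samples of the class and in none of the 4 others
print(chi_square(4, 0, 0, 4))  # 8.0
```

A feature that is spread evenly across the class and its complement, such as `chi_square(2, 2, 2, 2)`, scores 0.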
(3) Compute the Laplacian-adjusted sample similarity matrix A of the drugs selected in sub-step (2):
where $s_i$ and $s_j$ are the feature vectors of the i-th and j-th drugs.
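The adjusted similarity starts from the Tanimoto coefficient (TC) of two binary feature vectors, $|s_i \wedge s_j| / |s_i \vee s_j|$. The Laplacian adjustment itself is not reproduced in this text, so the minimal sketch below computes only the unadjusted TC matrix that the adjustment is applied to. The function name is hypothetical, and treating two all-zero vectors as having similarity 0 is an assumption.

```python
import numpy as np


def tanimoto_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto (Jaccard) similarity of 0/1 row vectors.

    TC(i, j) = |s_i AND s_j| / |s_i OR s_j|
    """
    X = X.astype(float)
    inter = X @ X.T                      # |s_i AND s_j| for 0/1 vectors
    counts = X.sum(axis=1)               # number of 1s in each row
    union = counts[:, None] + counts[None, :] - inter
    # Empty unions (two all-zero rows) get similarity 0 instead of 0/0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
```

For example, the rows `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one set bit out of three distinct set bits, so their similarity is 1/3.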
(4) Compute the label similarity matrix C of the drugs:
where one term is the weight of the t-th label, with $N_p$ the total number of samples and $N_t$ the number of samples carrying the t-th label; $l$ is a 1 × n label vector whose t-th component for sample i is that sample's t-th label; a further set contains the labeled samples within the k-ξ neighbour set of the unknown-label sample $x_j$; and the final term, the average of the three preceding cases, gives the similarity between the labels of unlabeled samples.
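(5) Form the similarity matrix $S = TC \odot C$, the element-wise product of the adjusted sample similarity TC and the label similarity matrix C.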
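The combination in sub-step (5), written `TC.*C` in the embodiment below, uses MATLAB-style element-wise multiplication: each entry S(i, j) is TC(i, j) times C(i, j), not a matrix product. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def combined_similarity(tc: np.ndarray, c: np.ndarray) -> np.ndarray:
    """S = TC .* C: element-wise (Hadamard) product of two similarity matrices."""
    if tc.shape != c.shape:
        raise ValueError("TC and C must have the same shape")
    return tc * c  # `*` is element-wise for NumPy arrays; `@` would be a matrix product
```

Because both factors lie in [0, 1], S(i, j) is high only when two drugs are similar both in features and in labels.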
(6) Reconstruct the label initialization of the unknown-label samples from the label information of the known-label samples and the similarity matrix A:
where $P_{diff}$ is the probability of a reaction among known-label samples whose similarity is less than 0.5, and $P_{sim}$ is the probability of a reaction among known-label samples whose similarity is greater than 0.5.
Step 4 comprises the following specific sub-steps:
(1) Iteratively normalize the similarity matrix S with the Bregmanian Bi-Stochastication (BBS) algorithm to obtain the converged normalized matrix W.
(2) Detect the drugs with the label propagation algorithm, using the normalized matrix W from sub-step (1) and the label initialization from sub-step (6):
$F = (1-u)(I - uW)^{-1} Y$
where u is the fraction of label information that a drug receives from the other drugs, 1 − u is the fraction of its own label information that it retains, and I is the N × N identity matrix.
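The closed form $F = (1-u)(I - uW)^{-1} Y$ (stated in claim 3) is the fixed point of the iteration $F \leftarrow uWF + (1-u)Y$: setting $F = uWF + (1-u)Y$ and solving for F gives the closed form. The sketch below implements both so that they can be checked against each other. It assumes W is row-normalized (as the BBS step produces) and 0 ≤ u < 1, which makes the iteration converge; the function names are hypothetical.

```python
import numpy as np


def propagate_closed_form(W, Y, u=0.97):
    """F = (1 - u) (I - u W)^{-1} Y, solved without forming the inverse."""
    n = W.shape[0]
    return (1.0 - u) * np.linalg.solve(np.eye(n) - u * W, Y)


def propagate_iterative(W, Y, u=0.97, tol=1e-10, max_iter=10_000):
    """Repeat F <- u W F + (1 - u) Y until successive iterates differ by < tol."""
    F = Y.astype(float).copy()
    for _ in range(max_iter):
        F_next = u * (W @ F) + (1.0 - u) * Y
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F
```

Using `np.linalg.solve` avoids explicitly inverting the N × N matrix. With u close to 1 (the embodiment uses 0.97) the iteration needs several hundred steps, so the closed form is usually preferable for a few hundred drugs.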
Building on the label propagation model, the invention addresses three aspects, namely the data features, the sample similarity and the sample label initialization, using respectively the CHI feature selection method, the Laplace operator together with the label similarity method, and a custom initialization scheme for unknown-label samples. Theoretical analysis and experiments show that the improved model is better suited to detecting adverse drug reaction events.
Drawings
Fig. 1 is a flow chart of the label propagation method that integrates multiple improvements.
Detailed Description
As shown in Fig. 1, to improve the label propagation algorithm and detect experimental drugs effectively, a drug data set is first obtained, its sample features are filtered with the CHI feature selection method, and the most informative features are retained. Next, the Tanimoto (Jaccard) coefficient (TC) is adjusted with the Laplace operator to compute the sample similarity of the drugs, the label similarity of the drugs is computed with the label similarity method, and the drug similarity is reconstructed from these two similarities. The drug similarity matrix is then normalized with the BBS algorithm. Finally, the label information of the test samples is initialized from the label information of the training drugs, labels are propagated iteratively until the sample labels converge, and the results are scored with an evaluation method.
The invention is described in detail below with reference to examples and figures:
The experimental data come from the FAERS DDI dataset and the Chemical structure dataset. From the FAERS DDI dataset, 645 drugs and 63,473 adverse reaction records occurring between these drugs were mined; this data is referred to as the DDI dataset. Chemical structure data for the 645 drugs were obtained from the Chemical structure dataset, with each drug represented by an 881-dimensional {0,1} vector. Drugs with identical chemical structures were preprocessed so that every drug in the data used is distinct. Five-fold cross-validation was performed on the preprocessed data, as follows:
Step one: perform initial preprocessing on the drug feature data: among drugs with identical features, randomly keep one and delete the rest, and delete every feature column that takes only one value. This leaves 638 drugs with 616 features each. The 638 drugs are randomly split into 5 equal folds with a cross-validation function (yielding the corresponding feature and label matrices); in each run, one fold is held out as the test set and the remaining folds serve as the training set.
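The preprocessing in step one can be sketched with NumPy as below: rows are visited in a random order before `np.unique(..., axis=0, return_index=True)` keeps the first copy of each distinct feature vector, so the retained duplicate is random; constant columns are dropped; and the remaining drugs are shuffled and cut into folds with `np.array_split`. The function name, `seed` and `n_folds` parameters are hypothetical. Slicing the label data is left to the caller through the returned `keep` indices, because the exact layout of the label matrix is not specified here.

```python
import numpy as np


def preprocess(X, n_folds=5, seed=0):
    """Deduplicate drugs, drop constant feature columns, split into folds.

    Returns the cleaned feature matrix, the original row indices that were
    kept (use them to slice the label data too), and a list of fold index
    arrays that refer to rows of the cleaned matrix.
    """
    rng = np.random.default_rng(seed)
    # Visit rows in random order so the retained copy of each duplicate is random.
    order = rng.permutation(X.shape[0])
    _, first = np.unique(X[order], axis=0, return_index=True)
    keep = np.sort(order[first])
    X = X[keep]
    # A column that takes only one value carries no information.
    varying = (X != X[0]).any(axis=0)
    X = X[:, varying]
    # Shuffle the remaining drugs and cut them into n_folds nearly equal parts.
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    return X, keep, folds
```

With 638 drugs and 5 folds, `np.array_split` produces folds of 128, 128, 128, 127 and 127 drugs.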
1. Screen all drug features in the training drug data set with the CHI method:
where $\chi^2(t_i, C_k)$ is the amount of information that feature $t_i$ carries about class $C_k$, and the averaged term is the mean information of feature $t_i$ over all classes. If the selection condition holds, feature $t_i$ is kept; otherwise it is deleted.
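A vectorized sketch of this screening step is shown below. It computes the standard 2 × 2 chi-square statistic for every (feature, label) pair, treating each label column as one class $C_k$, and averages it over all classes, as described above. The exact selection condition is not reproduced in this text, so the cut-off is a hypothetical `threshold` parameter that defaults, as an assumption, to the mean score over all features.

```python
import numpy as np


def chi_square_filter(X, Y, threshold=None):
    """Average 2x2 chi-square of each binary feature over all binary labels.

    X: (n_samples, n_features) 0/1 feature matrix
    Y: (n_samples, n_labels)   0/1 label matrix, one column per class C_k
    threshold: hypothetical cut-off; defaults to the mean score over features
    """
    X = X.astype(float)
    Y = Y.astype(float)
    n = X.shape[0]
    a = X.T @ Y              # feature present, sample in class
    b = X.T @ (1 - Y)        # feature present, sample not in class
    c = (1 - X).T @ Y        # feature absent, sample in class
    d = (1 - X).T @ (1 - Y)  # feature absent, sample not in class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    num = n * (a * d - b * c) ** 2
    chi2 = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    avg = chi2.mean(axis=1)  # average information of t_i over all classes
    if threshold is None:
        threshold = avg.mean()
    keep = avg > threshold
    return keep, avg
```

The matrix products count all four contingency cells for every feature and label at once, which keeps the step cheap even for 616 features and several hundred labels.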
2. For the unknown-label samples, the labels are initialized as follows:
where $P_{diff}$ is the probability of a reaction among known-label samples whose similarity is less than 0.5, and $P_{sim}$ is the probability of a reaction among known-label samples whose similarity is greater than 0.5.
Step two: compute the sample similarity matrix A, the label similarity matrix C, and the new similarity matrix S formed from them:
$S = TC \odot C$ (element-wise product);
where $l_i$ and $l_j$ are the feature vectors of the i-th and j-th samples. The label similarity matrix C is computed with the k-ξ nearest-neighbour method, using k = 2 nearest neighbours and a similarity threshold ξ = 0.80.
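The text does not spell out how k and ξ are combined. The sketch below assumes, as a labeled interpretation, that the k-ξ neighbour set of sample j is the union of its k most similar other samples and every sample whose similarity to j is at least ξ, optionally restricted to labeled samples as in the definition of C. The function name and the `labeled` mask are hypothetical; an intersection rule would be an equally plausible reading.

```python
import numpy as np


def k_xi_neighbours(sim, j, k=2, xi=0.80, labeled=None):
    """Hypothetical k-xi neighbour set of sample j.

    Assumption: the union of j's k most similar samples and every sample whose
    similarity to j is at least xi (j itself excluded). If `labeled` (a boolean
    mask) is given, only labeled samples are returned.
    """
    s = np.asarray(sim[j], dtype=float).copy()
    s[j] = -np.inf                                # never count j as its own neighbour
    top_k = np.argsort(-s, kind="stable")[:k]     # k most similar samples
    above = np.flatnonzero(s >= xi)               # samples above the threshold
    nbrs = np.union1d(top_k, above)
    if labeled is not None:
        nbrs = nbrs[labeled[nbrs]]
    return nbrs
```

For a row of similarities `[1.0, 0.9, 0.85, 0.1, 0.82]` with j = 0, the two nearest neighbours are samples 1 and 2, and sample 4 is added by the 0.80 threshold.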
Step three: compute the normalized matrix W from the similarity matrix S of step two:
where $l$ is an n × 1 vector whose elements are all 1, and $W^{+}$ denotes the positive part of the matrix W.
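The symbols above (an all-ones vector and the positive part $W^{+}$) match the Euclidean, Frobenius-norm form of bi-stochastic normalization, which alternates an affine projection onto matrices whose rows and columns sum to 1 with clipping negative entries to zero. The sketch below assumes this variant; the exact update used in the patent is not reproduced in this text, and the function name and stopping parameters are hypothetical.

```python
import numpy as np


def bbs_normalize(S, max_iter=1000, tol=1e-9):
    """Euclidean BBS sketch: alternate an affine projection and W+ = max(W, 0)."""
    n = S.shape[0]
    W = np.asarray(S, dtype=float).copy()
    I = np.eye(n)
    J = np.ones((n, n))                 # 1 1^T, built from the all-ones vector
    for _ in range(max_iter):
        # Projection onto matrices whose rows and columns all sum to 1.
        P = W + (I / n + (W.sum() / n**2) * I - W / n) @ J - (J @ W) / n
        W_next = np.maximum(P, 0.0)     # W+: keep only the positive part
        converged = np.abs(W_next - W).max() < tol
        W = W_next
        if converged:
            break
    return W
```

For example, `[[1.0, 0.5], [0.5, 1.0]]` is mapped in one projection to `[[0.75, 0.25], [0.25, 0.75]]`, which is already doubly stochastic and stays fixed. A doubly stochastic W keeps the propagation step well behaved for 0 ≤ u < 1.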
Step four: propagate the labels with the label propagation algorithm:
$F = (1-u)(I - uW)^{-1} Y$
where u is the fraction of label information that a drug receives from the other drugs, 1 − u is the fraction of its own label information that it retains, and I is the 638 × 638 identity matrix. In the experiment the best value was u = 0.97; the detection results are shown in Table 1.
Table 1. Comparison of the detection performance of the conventional method and the method of the invention
Model                                      AUC                 AUPR
Conventional label propagation algorithm   0.8063 ± 0.0050     0.6457 ± 0.0154
Proposed label propagation algorithm       0.8119 ± 0.0054     0.6522 ± 0.0163
Following the steps above, the conventional label propagation algorithm for drug detection was compared with the label propagation algorithm that integrates the proposed methods. As Table 1 shows, the proposed method achieves a higher mean AUC (0.8119 vs. 0.8063) and a higher mean AUPR (0.6522 vs. 0.6457) than the conventional method.
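The text does not state how AUC and AUPR were computed (for example, over all drug pairs or per drug). As a hedged sketch of the two metrics themselves, AUC can be computed as the probability that a randomly chosen positive outranks a randomly chosen negative, and AUPR can be estimated by average precision over the ranked list; other AUPR estimators (such as trapezoidal interpolation) give slightly different values. The function names are hypothetical.

```python
import numpy as np


def auc_roc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def average_precision(y_true, scores):
    """AUPR estimated as the mean precision at the rank of each positive."""
    y = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y[order]                                   # labels in ranked order
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k[hits].mean()
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the AUC is 0.75 and the average precision is 5/6. In a 5-fold setting such as the one above, each metric is computed per fold and summarized across the folds.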
In summary, the proposed LPA method identifies the given adverse drug reaction data well and is robust. It processes the drug data at three levels (features, similarity and initial label values): it improves the conventional similarity calculation and adjusts the label initialization, which makes the labels easier to determine and increases detection accuracy.
The above describes only the preferred embodiment of the invention, and the scope of the invention is not limited to it; any person skilled in the art may make equivalent changes to the technical solution and inventive concept of the invention within the technical scope disclosed herein.

Claims (3)

1. A clinical drug-drug adverse reaction detection method based on a label propagation algorithm, characterized by comprising the following steps:
step 1: filter the drug features with the chi-square method and select the features that carry the most information;
step 2: construct a new sample similarity from the sample label similarity and the sample similarity adjusted by the Laplace operator;
step 3: initialize the labels of the unknown-label samples from the information of the known-label samples;
step 4: combine steps 1, 2 and 3 into a new label propagation algorithm, and use it to obtain the detection result for the samples to be identified.
2. The clinical drug-drug adverse reaction detection method based on a label propagation algorithm according to claim 1, wherein the model used for feature filtering in step 1 is:
where one term gives the frequency with which feature $t_i$ occurs in class $c_k$ and another gives the degree to which the feature is concentrated in that class; $a$ is the number of samples in class $c_k$ that contain feature $t_i$, $b$ is the number of samples outside $c_k$ that contain $t_i$, $c$ is the number of samples in $c_k$ that do not contain $t_i$, $d$ is the number of samples outside $c_k$ that do not contain $t_i$, and $N = a + b + c + d$ is the total number of samples;
the model used to construct the new sample similarity in step 2 is:
$S(i,j) = TC_{\text{after}}(i,j) \cdot C(i,j)$
where $TC_{\text{after}}(i,j)$ is the Laplacian-adjusted similarity between sample i and sample j, and $C(i,j)$ is the similarity between the sample labels, given by:
where one term is the weight of the t-th label, with $N_p$ the total number of samples and $N_t$ the number of samples carrying the t-th label; $l$ is a 1 × n label vector whose t-th component for sample i is that sample's t-th label; a further set contains the labeled samples within the k-ξ neighbour set of the unknown-label sample $x_j$; and the final term, the average of the three preceding cases, gives the similarity between the labels of unlabeled samples;
the model used to initialize the label information of the unknown-label samples in step 3 is:
where $P_{diff}$ is the probability of a reaction among known-label samples whose similarity is less than 0.5, and $P_{sim}$ is the probability of a reaction among known-label samples whose similarity is greater than 0.5.
3. The clinical drug-drug adverse reaction detection method based on a label propagation algorithm according to claim 2, wherein in step 3 the labels are propagated as $F = (1-u)(I - uW)^{-1} Y$ to obtain the detection result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810010035.1A CN108376567B (en) 2018-01-05 2018-01-05 Label propagation algorithm-based clinical drug-drug adverse reaction detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810010035.1A CN108376567B (en) 2018-01-05 2018-01-05 Label propagation algorithm-based clinical drug-drug adverse reaction detection method

Publications (2)

Publication Number Publication Date
CN108376567A true CN108376567A (en) 2018-08-07
CN108376567B CN108376567B (en) 2022-04-01

Family

ID=63016617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810010035.1A Active CN108376567B (en) 2018-01-05 2018-01-05 Label propagation algorithm-based clinical drug-drug adverse reaction detection method

Country Status (1)

Country Link
CN (1) CN108376567B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383725A (en) * 2018-12-28 2020-07-07 国家食品药品监督管理总局药品评价中心 Adverse reaction data identification method and device, electronic equipment and readable medium
CN115083623A (en) * 2022-06-22 2022-09-20 开封市中心医院 Adverse drug reaction mining method, system, terminal and medium based on label propagation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043896A (en) * 2009-12-28 2011-05-04 中国人民解放军第二军医大学东方肝胆外科医院 Clinical tissue sample bank information management method
US20120000592A1 (en) * 2010-07-01 2012-01-05 Sagent Pharmaceuticals, Inc. Label, labeling system and method of labeling for containers for drug products
CN105354595A (en) * 2015-10-30 2016-02-24 苏州大学 Robust visual image classification method and system
CN106055879A (en) * 2016-05-24 2016-10-26 北京千安哲信息技术有限公司 Adverse drug reaction mining method and system


Also Published As

Publication number Publication date
CN108376567B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
US8340437B2 (en) Methods and systems for determining optimal features for classifying patterns or objects in images
CN112270666A (en) Non-small cell lung cancer pathological section identification method based on deep convolutional neural network
CN111009321A (en) Application method of machine learning classification model in juvenile autism auxiliary diagnosis
US11710540B2 (en) Multi-level architecture of pattern recognition in biological data
Ma et al. A new classifier fusion method based on historical and on-line classification reliability for recognizing common CT imaging signs of lung diseases
CN110705621A (en) Food image identification method and system based on DCNN and food calorie calculation method
US20230112591A1 (en) Machine learning based medical data checker
WO2022100497A1 (en) Method for determining mutation state of epidermal growth factor receptor, and medium and electronic device
Chakradeo et al. Breast cancer recurrence prediction using machine learning
Ding et al. High-order correlation detecting in features for diagnosis of Alzheimer’s disease and mild cognitive impairment
Tian et al. Radiomics and its clinical application: artificial intelligence and medical big data
CN108376567B (en) Label propagation algorithm-based clinical drug-drug adverse reaction detection method
Laajili et al. Application of radiomics features selection and classification algorithms for medical imaging decision: MRI radiomics breast cancer cases study
Batool et al. Towards Improving Breast Cancer Classification using an Adaptive Voting Ensemble Learning Algorithm
CN114596253A (en) Alzheimer's disease identification method based on brain imaging genome features
Nugroho et al. Image dermoscopy skin lesion classification using deep learning method: systematic literature review
Thapa et al. Deep learning for breast cancer classification: Enhanced tangent function
CN116805522A (en) Diagnostic report output method, device, terminal and storage medium
Warjurkar et al. A study on brain tumor and parkinson’s disease diagnosis and detection using deep learning
Çınaroğlu et al. New initialization approaches for the k-means and particle swarm optimization based clustering algorithms
Syafiandini et al. Cancer subtype identification using deep learning approach
Deepa et al. Performance Analysis of the Classification of Breast Cancer
Ashraf et al. Iterative weighted k-NN for constructing missing feature values in Wisconsin breast cancer dataset
Depeursinge et al. A classification framework for lung tissue categorization
Chhabra et al. Comparison of different edge detection techniques to improve quality of medical images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant