CN109215799B - Screening method for false correlation signals in combined adverse drug reaction report data - Google Patents

Info

Publication number
CN109215799B
Authority
CN
China
Prior art keywords
adverse reaction
correlation
data
drug
medicine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810946651.8A
Other languages
Chinese (zh)
Other versions
CN109215799A (en
Inventor
魏建香
张亚楠
刘美含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201810946651.8A priority Critical patent/CN109215799B/en
Publication of CN109215799A publication Critical patent/CN109215799A/en
Application granted granted Critical
Publication of CN109215799B publication Critical patent/CN109215799B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for screening false correlation signals in combined-medication adverse drug reaction (ADR) report data, comprising the following steps: S1, data acquisition and preprocessing, in which ADR data are acquired and standardized; S2, data splitting, in which the data are split into one-to-one drug-adverse reaction combinations; S3, frequency statistics and analysis, in which the frequency of each combination in the data is counted and correlation analysis is performed; S4, establishing a simple false correlation model; S5, establishing a proportional imbalance model for false correlation signals; S6, signal detection, in which signal detection is performed on the data before and after optimization; and S7, effectiveness evaluation, in which the false correlation screening for combined medication is evaluated by comparing the signal detection results before and after screening. The invention improves the quality of combined-medication data and the accuracy and efficiency of adverse reaction signal detection.

Description

Screening method for false correlation signals in combined adverse drug reaction report data
Technical Field
The invention relates to a signal screening method, in particular to a method for screening false correlation signals in combined-medication adverse drug reaction report data, and belongs to the field of drug safety data mining.
Background
Drug interactions have long been an important issue in drug safety and a difficult problem in adverse drug reaction (ADR) signal detection. According to statistics from the U.S. Food and Drug Administration (FDA), between 6% and 30% of adverse reactions result from drug interactions.
From an operational point of view, monitoring the safety of interactions between different drugs is very challenging. In actual clinical practice, drugs are frequently combined, and a large number of unpredictable drug combinations exist. As a result, the combinations reported in China's spontaneous reporting system are highly diverse and the number of potential reports is very large, which increases the difficulty of drug-drug interaction (DDI) safety monitoring and makes DDI complex and hard to control. False correlations in combined-medication data therefore need to be screened out to improve the accuracy and efficiency of DDI detection.
No unified international standard has yet been established for mining and analyzing adverse reaction signals of combined medication. ADR analysis of combined medication remains an active research problem, and few international studies address the degree of association between drugs and adverse reactions in combined medication. The data splitting method currently used in China and abroad associates every drug in a combined-medication report with every adverse reaction in that report; with this processing, false correlation signals inevitably arise.
In summary, a problem to be solved urgently by those skilled in the art is how to provide, on the basis of the prior art, a method for screening false correlation signals in combined-medication adverse reaction report data that fully measures the correlation between drugs and adverse reactions in combined medication, deletes the false correlations between them, and improves the accuracy of adverse reaction signal detection.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention provides a method for screening false correlation signals in combined-medication adverse drug reaction report data, comprising the following steps:
S1, data acquisition and preprocessing: acquiring ADR data and standardizing the adverse reaction names and drug names in the ADR data;
S2, data splitting: splitting reports in the preprocessed ADR data that involve multiple drugs and multiple adverse reactions, or a single drug and multiple adverse reactions, into one-to-one drug-adverse reaction combinations;
S3, frequency statistics and analysis: counting the frequency of each combination in the ADR data and performing correlation analysis;
S4, establishing a simple false correlation model;
S5, establishing a proportional imbalance model for false correlation signals;
S6, signal detection: applying an ADR signal detection method to the data before and after optimization with the simple false correlation model and the proportional imbalance model;
S7, effectiveness evaluation: evaluating the effectiveness of the false correlation screening for combined medication, using the adverse reaction warnings in drug package inserts as the known library to compare the signal detection results before and after screening.
Preferably, in the data acquisition and preprocessing step S1, the ADR data are ADR data from the national center.
Preferably, the data splitting step S2 specifically comprises: first separating the combined-medication data from the single-medication data in the ADR data; splitting the n adverse reactions caused by a single drug into n drug-adverse reaction combinations; and splitting the i drugs and j adverse reactions of a combined-medication report into i × j one-to-one drug-adverse reaction combinations.
Preferably, the frequency statistics and analysis step S3 specifically comprises:
counting the frequency of each record in the split ADR data and constructing a combined-medication data set with four attributes: drug name, adverse reaction name, frequency of occurrence in single medication, and degree of association between the drug and the adverse reaction. To calculate the correlation between a drug and an adverse reaction, it is assumed that each adverse reaction occurs completely independently when a drug is used alone, so that the number of reports of an adverse reaction under single use of a drug reflects how likely the drug is to cause that reaction. The correlation coefficient between drug D1 and adverse reaction A1 is defined as
Figure RE-GDA0001861624100000031
The correlation between drug D1 and adverse reaction A1 under single medication is then calculated with the total probability formula:
Figure RE-GDA0001861624100000032
where $P(A_1 \mid D_1)$ is the probability that adverse reaction A1 occurs when drug D1 is used, with $P(A_1 \mid D_1) = R_{D_1}(A_1)$, and
Figure RE-GDA0001861624100000033
is the probability that adverse reaction A1 occurs when drug D1 is not used, given by
Figure RE-GDA0001861624100000034
Preferably, the step S4 of establishing the simple false correlation model specifically comprises:
assuming that when the number of reports in which a drug Dk causes adverse reaction Ai is less than or equal to X, the combination of drug Dk and adverse reaction Ai is regarded as a false correlation, which gives the simple false correlation model
$N_{D_k}(A_i) \le X$
where $N_{D_k}(A_i)$ is the number of occurrences of adverse reaction Ai caused by drug Dk when administered alone.
Preferably, in step S4, when adverse reaction A1 occurs fewer than 3 times, drug D1 can be considered to produce adverse reaction A1 only occasionally, and the combination is disregarded. Combined with the formula
$N_{D_k}(A_i) \le X$
the simple false correlation model can be refined to
$N_{D_1}(A_1) < 3$.
Preferably, the step S5 of establishing the proportional imbalance model for false correlation signals specifically comprises:
S51, calculating the proportional imbalance $d_{D_1}(A_1)$ between the mapping of drug D1 to adverse reaction A1 and the combination with the maximum correlation coefficient, using the formula
Figure RE-GDA0001861624100000042
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), R_{D_3}(A_1))$ is the maximum of the correlation coefficients of drugs D1, D2, and D3 with adverse reaction A1;
S52, generalizing the formula in S51 to obtain the formula for the false correlation signal of the mapping between drug K and adverse reaction A1 in combined medication:
Figure RE-GDA0001861624100000043
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), \ldots)$ is the maximum of the correlation coefficients with adverse reaction A1 over the drugs taken together, and $R_k(A_1)$ is the correlation coefficient between drug K and adverse reaction A1;
S53, judging, under combined medication, whether drug K and adverse reaction A1 form a false correlation pair, using the judgment formula
$N_k(A_1) = 0 \mid d_k(A_1) \ge 2$
where "|" is the logical OR operation, $N_k(A_1)$ is the number of occurrences of adverse reaction A1 when drug K is taken alone, $d_k(A_1)$ is the false correlation signal of drug K with the adverse reaction, and A1 can be any adverse reaction arising under combined medication.
Preferably, in the signal detection step S6, signal detection uses the proportional reporting ratio (PRR) algorithm: when a drug-adverse reaction combination satisfies PRR ≥ 2 and χ² ≥ 4, the combination is regarded as a positive adverse reaction signal.
Preferably, in the effectiveness evaluation step S7, the evaluation indexes include precision, recall, and the F index.
Compared with the prior art, the invention has the advantages that:
The correlation model provided by the invention effectively measures the correlation between co-administered drugs and adverse reactions. The false correlation measurement model built on it provides effective guidance for deleting false correlations, improves the quality of combined-medication data, and improves the accuracy and efficiency of adverse reaction signal detection, thereby providing a reference for detecting adverse reaction signals of combined medication.
In addition, the invention provides a reference for other related problems in the same field; it can be extended and applied to other signal screening methods in the field and has broad application prospects.
The embodiments of the present invention are described in detail below with reference to the accompanying drawings to facilitate understanding of the technical solutions of the present invention.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a graph of the combination frequency distribution of the original data set according to the present invention;
FIG. 3 is a schematic diagram of the splitting of single-medication data according to the present invention;
FIG. 4 is a schematic diagram of the splitting of combined-medication data according to the present invention.
Detailed Description
As shown in FIGS. 1-4, the invention discloses a method for screening false correlation signals in combined-medication adverse drug reaction report data, comprising the following steps:
S1, data acquisition and preprocessing: acquiring ADR data and standardizing the adverse reaction names and drug names in the ADR data.
The ADR data here are ADR data from the national center.
S2, data splitting: splitting reports in the preprocessed ADR data that involve multiple drugs and multiple adverse reactions, or a single drug and multiple adverse reactions, into one-to-one drug-adverse reaction combinations.
Specifically, the combined-medication data are first separated from the single-medication data in the ADR data; the n adverse reactions caused by a single drug are split into n drug-adverse reaction combinations, and the i drugs and j adverse reactions of a combined-medication report are split into i × j one-to-one drug-adverse reaction combinations.
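The splitting rule in step S2 can be illustrated with a minimal Python sketch. The tuple layout `(report_id, drugs, reactions)`, the function names, and the sample reports are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from itertools import product


def split_report(report_id, drugs, reactions):
    """Split one report into one-to-one (report_id, drug, reaction) rows.

    A report with i drugs and j reactions yields i * j rows.
    """
    return [(report_id, drug, reaction) for drug, reaction in product(drugs, reactions)]


def split_reports(reports):
    """Separate single-medication rows from combined-medication rows."""
    single, combined = [], []
    for report_id, drugs, reactions in reports:
        rows = split_report(report_id, drugs, reactions)
        # One drug in the report means single medication; otherwise combined.
        (single if len(drugs) == 1 else combined).extend(rows)
    return single, combined


# Hypothetical example reports (identifiers are illustrative only).
reports = [
    ("r1", ["D1"], ["A1", "A2"]),
    ("r2", ["D1", "D2", "D3"], ["A1"]),
]
single_rows, combined_rows = split_reports(reports)
```

Here the single-drug report `r1` yields two rows and the three-drug report `r2` yields three rows, matching the n and i × j counts described in the text.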
S3, frequency statistics and analysis: counting the frequency of each combination in the ADR data and performing correlation analysis.
Specifically, the frequency of each record in the split ADR data is counted, covering both single-medication and combined-medication drug-adverse reaction combinations.
Frequencies of single-medication combinations are counted as follows: adverse reactions caused by drug D1 are recorded as $D_1 \to A_1$, $D_1 \to A_2$, with frequencies $N_{D_1}(A_1)$, $N_{D_1}(A_2)$; adverse reactions not caused by drug D1 are recorded as
Figure RE-GDA0001861624100000061
with frequency
Figure RE-GDA0001861624100000062
Frequencies of combined-medication combinations are counted as follows: when drugs D1 and D2 used together cause adverse reactions A1 and A2, the frequencies counted from the split data are recorded in the form $N^{*}_{D_1}(A_1)$ for D1 causing A1, $N^{*}_{D_2}(A_1)$ for D2 causing A1, and so on.
A combined-medication data set is constructed as a data table with four attributes: drug name, adverse reaction name, frequency of occurrence in single medication, and degree of association between the drug and the adverse reaction. To calculate the correlation between a drug and an adverse reaction, it is assumed that each adverse reaction occurs completely independently when a drug is used alone, so that the number of reports of an adverse reaction under single use of a drug reflects how likely the drug is to cause that reaction. The correlation coefficient between drug D1 and adverse reaction A1 is defined as
Figure RE-GDA0001861624100000071
The correlation between drug D1 and adverse reaction A1 under single medication is then calculated with the total probability formula:
Figure RE-GDA0001861624100000072
where $P(A_1 \mid D_1)$ is the probability that adverse reaction A1 occurs when drug D1 is used, with $P(A_1 \mid D_1) = R_{D_1}(A_1)$, and
Figure RE-GDA0001861624100000073
is the probability that adverse reaction A1 occurs when drug D1 is not used, given by
Figure RE-GDA0001861624100000074
In the same way, the correlation coefficient between drug D2 and adverse reaction A1 can be calculated. Extending the formula to the general case in the single-medication adverse reaction database, the correlation coefficient between any drug k and adverse reaction Ai is
Figure RE-GDA0001861624100000075
where N is the number of all adverse reactions in the ADR data.
S4, establishing a simple false correlation model.
Specifically, assume that when the number of reports in which a drug Dk causes adverse reaction Ai is less than or equal to X, the combination of drug Dk and adverse reaction Ai is regarded as a false correlation. This gives the simple false correlation model
$N_{D_k}(A_i) \le X$
where $N_{D_k}(A_i)$ is the number of occurrences of adverse reaction Ai caused by drug Dk when administered alone.
In general, when adverse reaction A1 occurs fewer than 3 times, drug D1 can be considered to produce adverse reaction A1 only occasionally, and the combination is disregarded. Combined with the formula
$N_{D_k}(A_i) \le X$
the simple false correlation model can be refined to
$N_{D_1}(A_1) < 3$.
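A minimal Python sketch of the simple model follows. It flags a combined-medication pair when its single-medication count is below 3, as stated in the text. The dictionary layout, the function name, and the sample counts are assumptions made for illustration, and treating a pair that never appears under single medication as a count of 0 is a choice of this sketch.

```python
def simple_false_correlations(single_counts, combined_pairs, threshold=3):
    """Return the combined-medication pairs flagged by the simple model.

    single_counts maps (drug, reaction) to the number of single-medication
    reports. A pair is flagged when that number is below `threshold`;
    a pair absent from single_counts is treated as 0 (assumption).
    """
    return {
        pair for pair in combined_pairs
        if single_counts.get(pair, 0) < threshold
    }


# Hypothetical counts, for illustration only.
single_counts = {("D1", "A1"): 12, ("D2", "A1"): 2}
combined_pairs = {("D1", "A1"), ("D2", "A1"), ("D3", "A1")}
flagged = simple_false_correlations(single_counts, combined_pairs)
```

With these sample values, the pairs for D2 and D3 are flagged, while D1 with A1 is kept because it was reported 12 times under single medication.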
S5, establishing a proportional imbalance model for false correlation signals.
Specifically, the step comprises:
S51, in a group of combined-medication data, the drug-adverse reaction combination with the maximum correlation coefficient is assumed to be truly associated, and whether another combination is falsely associated is judged by measuring the difference between its correlation coefficient and the maximum correlation coefficient.
The proportional imbalance $d_{D_1}(A_1)$ between the mapping of drug D1 to adverse reaction A1 and the combination with the maximum correlation coefficient is calculated with the formula
Figure RE-GDA0001861624100000081
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), R_{D_3}(A_1))$ is the maximum of the correlation coefficients of drugs D1, D2, and D3 with adverse reaction A1.
S52, the formula in S51 is generalized to obtain the formula for the false correlation signal of the mapping between drug K and adverse reaction A1 in combined medication:
Figure RE-GDA0001861624100000082
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), \ldots)$ is the maximum of the correlation coefficients with adverse reaction A1 over the drugs taken together, and $R_k(A_1)$ is the correlation coefficient between drug K and adverse reaction A1.
S53, assume that when $d_k(A_1) \ge 2$ the combination of drug K and adverse reaction A1 is a false correlation. Further assume that if a drug never produces adverse reaction A1 when taken alone, the drug is considered not to cause A1; if A1 then occurs when the drug is used together with other drugs, the pairing of the drug with A1 is considered a false correlation.
Based on these assumptions, whether drug K and adverse reaction A1 form a false correlation pair under combined medication is judged with the formula
$N_k(A_1) = 0 \mid d_k(A_1) \ge 2$
where "|" is the logical OR operation, $N_k(A_1)$ is the number of occurrences of adverse reaction A1 when drug K is taken alone, $d_k(A_1)$ is the false correlation signal of drug K with the adverse reaction, and A1 can be any adverse reaction arising under combined medication.
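The S53 judgment can be sketched in Python as follows. The sketch covers only the final OR rule: the proportional imbalance value is passed in as a precomputed number, because the formula for it is not reproduced in this text. The function name and the handling of a missing value (`None`) are assumptions of this sketch.

```python
def is_false_association(n_single, d, threshold=2.0):
    """Apply the judgment N_k(A1) = 0 | d_k(A1) >= threshold.

    n_single: number of reports of the reaction when the drug is taken alone.
    d: precomputed proportional imbalance value for the pair, or None when
       it is unavailable (assumption of this sketch).
    """
    if n_single == 0:
        # The drug never produced this reaction on its own.
        return True
    return d is not None and d >= threshold


# Hypothetical values, for illustration only.
never_alone = is_false_association(n_single=0, d=None)
large_imbalance = is_false_association(n_single=5, d=2.4)
small_imbalance = is_false_association(n_single=5, d=1.3)
```

The default threshold of 2 follows the value given in the text; it is exposed as a parameter so that other thresholds can be compared.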
S6, signal detection: applying an ADR signal detection method to the data before and after optimization with the simple false correlation model and the proportional imbalance model. For the original data and the optimized data (before and after screening), the proportional reporting ratio (PRR) algorithm commonly used in China is used for signal detection: when a drug-adverse reaction combination satisfies PRR ≥ 2 and χ² ≥ 4, the combination is regarded as a positive adverse reaction signal.
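The PRR criterion can be illustrated with a short Python sketch. The patent text does not spell out the PRR or χ² formulas, so the sketch assumes the standard definitions on a 2 × 2 table, with the Pearson χ² computed without continuity correction. The cell names `a`, `b`, `c`, `d`, the function names, and the sample counts are illustrative assumptions.

```python
def prr_and_chi2(a, b, c, d):
    """PRR and Pearson chi-square for a 2 x 2 table (standard definitions).

    a: target drug, target reaction      b: target drug, other reactions
    c: other drugs, target reaction      d: other drugs, other reactions
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, chi2


def is_positive_signal(a, b, c, d, prr_min=2.0, chi2_min=4.0):
    """Positive signal when PRR >= prr_min and chi-square >= chi2_min."""
    # Tables with an empty margin or c == 0 are rejected here; this
    # conservative handling is a choice of the sketch, not of the patent.
    if c == 0 or min(a + b, c + d, a + c, b + d) == 0:
        return False
    prr, chi2 = prr_and_chi2(a, b, c, d)
    return prr >= prr_min and chi2 >= chi2_min


# Hypothetical counts, for illustration only.
prr, chi2 = prr_and_chi2(20, 80, 50, 850)
signal = is_positive_signal(20, 80, 50, 850)
no_signal = is_positive_signal(1, 99, 10, 890)
```

For the first sample table the PRR is 3.6 and the χ² value is well above 4, so the combination counts as a positive signal; the second table has a PRR below 2 and is rejected.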
S7, effectiveness evaluation: evaluating the effectiveness of the false correlation screening for combined medication, using the adverse reaction warnings in drug package inserts as the known library to compare the signal detection results before and after screening. The evaluation indexes include precision, recall, and the F index.
The technical solution of the present invention is further described below with reference to a specific embodiment.
First, the ADR data are analyzed and selected. A total of 1,823,144 ADR reports covering January 1, 2010 to December 31, 2011 were obtained from the national adverse drug reaction monitoring center, of which 608,710 reports record many-to-many relations between multiple drugs and multiple adverse reactions; these data are split into one-to-one relations. After splitting, abnormal data are removed, and records lacking a drug name or an adverse reaction name are deleted. Western medicine data account for 1,874,904 records. From the database, 16,383 spontaneously reported adverse reaction reports from Jiangsu Province are selected: the combined-medication data comprise 8,172 records covering 3,264 combined-medication combinations, and the single-medication data comprise 8,212 records covering 547 drugs and 137 adverse reactions; combined medication accounts for 49.89 percent of the total database. In this example, the western medicine data are selected as the experimental subject.
The attribute fields of the adverse reaction report table in the database are shown in Table 1, where bgbm is the code of each adverse reaction report, pzmc is the name of the drug causing the adverse reaction, and blfymc is the name of the adverse reaction in the report. The experiment needs only three fields (report code, drug name, and adverse reaction name), which are selected and recombined into a new original data set:
TABLE 1 adverse reaction report Attribute Table
Figure RE-GDA0001861624100000101
Next, the data are split. In the data set, some records pair a drug with several adverse reactions, and each such record is divided into multiple records, one per adverse reaction. For example, drug D1 used alone produces adverse reactions A1 and A2; the data format in the original data set and the mapping before splitting are shown in the figure. The record is split into two data records, and the structure after splitting is shown in FIG. 3.
The split records, each pairing one drug with one adverse reaction, form the single-medication data set. The drug name and adverse reaction name attributes of this data set are selected, a database group-by operation is performed on the two attributes to count the frequency of each drug-adverse reaction combination, and the frequency is added to the single-medication data set. The resulting data table is shown in Table 2:
TABLE 2 Frequency of drug-adverse reaction combinations under single medication
Figure RE-GDA0001861624100000102
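The group-by counting described above can be sketched in Python with `collections.Counter` standing in for the database grouping operation. The row layout `(bgbm, pzmc, blfymc)` follows the three fields named in the text; the function name and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def count_combinations(rows):
    """Count how often each (drug, reaction) pair occurs.

    Each row is (bgbm, pzmc, blfymc): report code, drug name, reaction name.
    The report code is ignored because grouping is on the other two fields.
    """
    return Counter((pzmc, blfymc) for _bgbm, pzmc, blfymc in rows)


# Hypothetical single-medication rows, for illustration only.
rows = [
    ("r1", "D1", "A1"),
    ("r1", "D1", "A2"),
    ("r2", "D1", "A1"),
    ("r3", "D2", "A1"),
]
frequency = count_combinations(rows)
```

The resulting mapping plays the role of the frequency column added to the single-medication data set.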
Under a general ADR signal mining algorithm, D1, D2, and D3 are all assumed to produce adverse reaction A1, and interactions among D1, D2, and D3 are disregarded; the splitting of such an adverse reaction report under this assumption is shown in FIG. 4.
It is assumed that when a drug is taken alone each adverse reaction occurs completely independently, and that the number of reports of an adverse reaction under single use of a drug reflects how likely the drug is to cause that reaction. The total probability formula is used to define the correlation between drug D1 and adverse reaction A1 under single medication:
Figure RE-GDA0001861624100000111
In a group of combined-medication data producing a given adverse reaction, the drug-adverse reaction combination with the maximum correlation coefficient is assumed to be truly associated, so false associations can be judged by measuring the difference between the correlation coefficients of the other combinations and the maximum correlation coefficient. If the difference is large enough to reach a certain threshold, the combination is considered a false correlation signal relative to the combination with the maximum correlation coefficient, and the mapping of that drug to the adverse reaction is likely a false association.
The proportional imbalance coefficient $d_k(A_1)$ between the mapping of drug K to adverse reaction A1 and the combination with the maximum correlation coefficient is calculated by the formula
Figure RE-GDA0001861624100000112
When the proportional imbalance coefficient reaches a certain threshold, the association between the drug and adverse reaction A1 is considered false. Experiments show that the effect is most pronounced when the threshold equals 2.
Assume that when $d_k(A_1) \ge 2$ the combination of drug K and adverse reaction A1 is a false association. Further assume that if a drug never produces adverse reaction A1 when taken alone, the drug is considered not to cause A1; if A1 then occurs when the drug is used together with other drugs, the pairing of the drug with A1 is considered a false association. The false correlation judgment function is therefore constructed as:
$N_k(A_1) = 0 \mid d_k(A_1) \ge 2$.
In the formula, "|" is the logical OR operation, $N_k(A_1)$ is the number of occurrences of adverse reaction A1 when drug K is taken alone, and $d_k(A_1)$ is the false correlation signal of drug K with the adverse reaction. A1 can be any adverse reaction arising under combined medication.
Then, false associations are deleted and the data are merged. First, false associations are removed from the combined-medication data set by checking it against the false correlation data set. Next, three fields (drug, adverse reaction, and frequency of occurrence) are selected from the combined-medication and single-medication data sets, and the selected records form a new data set. Finally, the new data set is grouped by the drug and adverse reaction fields, the frequencies within each group are summed, and the sum replaces the original frequency attribute, yielding the optimized data set for signal mining. If the false associations are not removed and the three fields of the combined-medication and single-medication data sets are selected directly to form the new data set, the unoptimized original data set is obtained.
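The deletion and merging step can be sketched in Python as below. Frequencies are held in dictionaries keyed by `(drug, reaction)`; this layout, the function name, and the sample values are assumptions made for illustration rather than the data structures of the embodiment.

```python
from collections import Counter


def build_datasets(single_counts, combined_counts, false_pairs):
    """Return (original, optimized) frequency tables.

    original:  single + combined frequencies, nothing removed.
    optimized: same merge after dropping the combined-medication pairs
               listed in false_pairs.
    """
    original = Counter(single_counts) + Counter(combined_counts)
    kept = {pair: n for pair, n in combined_counts.items() if pair not in false_pairs}
    optimized = Counter(single_counts) + Counter(kept)
    return original, optimized


# Hypothetical frequencies, for illustration only.
single_counts = {("D1", "A1"): 12}
combined_counts = {("D1", "A1"): 3, ("D2", "A1"): 3}
false_pairs = {("D2", "A1")}
original, optimized = build_datasets(single_counts, combined_counts, false_pairs)
```

Adding two `Counter` objects sums the frequencies per pair, which corresponds to the grouping and summation described in the text.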
Subsequently, an ADR signal detection algorithm is applied to the optimized data set for data mining. The ADR signal mining algorithm used in this embodiment is the proportional reporting ratio (PRR) algorithm commonly used in China. In this embodiment, signal mining is performed only on drug-adverse reaction combinations that occur at least 3 times in the database, and the critical values for a positive adverse reaction signal are PRR ≥ 2 and χ² ≥ 4.
Finally, the results are analyzed. For each data set, PRR and χ² are calculated from the fourfold (2 × 2) table of the disproportionality algorithm; the drug-adverse reaction combinations satisfying PRR ≥ 2, χ² ≥ 4, and N ≥ 2 are then selected, and their drug and adverse reaction fields form the positive adverse reaction signal set. This yields mining results for the two different data sets.
The result evaluation criteria mainly include accuracy, recall, and F-factor, specifically:
1) rate of accuracy
The accuracy here is for the prediction result of the present invention, which indicates how many of the samples predicted to be positive signals are samples of true positive signals. The calculation formula is as follows:
Figure RE-GDA0001861624100000121
where TP is the number of positive signals predicted to be positive signals and FP is the number of negative signals predicted to be positive signals.
2) Recall rate
The recall refers to the positive signal samples in the data set: it indicates what proportion of the positive signals in the sample were correctly predicted. The calculation formula is as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP is the number of original positive signals predicted to be positive signals and FN is the number of original positive signals predicted to be negative signals.
3) F index
The F index is the harmonic mean of the accuracy rate and the recall rate, and the calculation formula is as follows:
$$F = \frac{2 \times \text{Accuracy} \times \text{Recall}}{\text{Accuracy} + \text{Recall}}$$
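The three indexes can be computed from a set of predicted positive signals and a known reference set, as in the following minimal sketch. The function name `evaluate` and the representation of signals as `(drug, reaction)` tuples are assumptions for illustration.

```python
def evaluate(predicted, known):
    """Return (accuracy, recall, f_index) for predicted positive signals.

    predicted: iterable of (drug, reaction) pairs flagged as positive.
    known:     iterable of (drug, reaction) pairs in the reference library.
    "Accuracy" follows the definition in the text, TP / (TP + FP).
    """
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)   # true positives
    fp = len(predicted - known)   # predicted positive but not in the library
    fn = len(known - predicted)   # in the library but not predicted
    accuracy = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = accuracy + recall
    f_index = 2 * accuracy * recall / total if total else 0.0
    return accuracy, recall, f_index
```

Returning 0.0 when a denominator is zero is a design choice of this sketch, made so that an empty prediction set does not raise an error.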
The accuracy, recall, and F index values computed on the three data sets (the original data set, the simple-model optimized data set, and the proportional-imbalance optimized data set) are as follows:
Table 3. Evaluation indexes of the mining results
Figure RE-GDA0001861624100000133
Table 3 shows that the positive signals mined from the data set optimized with the simple false correlation model have higher accuracy than those mined from the original unoptimized data set, but lower recall and F index. The main reason is that the simple false correlation model deletes drug-adverse reaction combinations that occur fewer than 3 times in the single-medication data set, which reduces the number of samples in the optimized data set and therefore the total number of positive signals the algorithm finds. The lower recall shows that the number of true positive signals also decreased, which indicates that the accuracy improved mainly because the sample size shrank. This analysis shows that the simple false correlation model contributes little to eliminating false correlations and improving data quality.
For the positive signals mined from the data set optimized with the proportional-imbalance false correlation model, the accuracy, recall, and F index are all higher than those of the original data set, with the F index 3 percentage points higher. The proportional-imbalance false correlation model can therefore effectively identify and remove false correlations in combined medication and effectively optimize the original data set, making it more suitable for detecting positive adverse reaction signals.
In summary, based on Chinese ADR report data, the invention studies the problem of screening false correlations between drugs and adverse reactions in combined medication, proposes a simple false correlation model and a false correlation signal proportional imbalance model, and tests the effectiveness of the screening with evaluation indexes. The final result is that, compared with the positive signals mined from the original data set, those mined from the data set screened with the false correlation signal proportional imbalance model improve by 2 to 3 percentage points in accuracy, recall, and F index. The false correlation signal proportional imbalance model can therefore effectively screen out false correlations and improve the accuracy and efficiency of signal detection.
The correlation model can effectively measure the correlation between co-administered drugs and adverse reactions. The false correlation measurement model built on it provides effective guidance for deleting false correlations, improves the quality of combined-medication data, and improves the precision and efficiency of adverse reaction signal detection, thereby providing a reference for detecting adverse reaction signals of combined medication.
In addition, the invention provides a reference for other related problems in the same field; it can be extended on this basis and applied to the technical solutions of other signal screening methods in the same field, and has broad application prospects.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way for clarity only; those skilled in the art should take the specification as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (7)

1. A screening method for false correlation signals in combined adverse drug reaction report data, characterized by comprising the following steps:
S1, a data acquisition and preprocessing step: acquiring ADR data, and standardizing the adverse reaction names and drug names in the ADR data;
S2, a data splitting step: splitting the reports in the preprocessed ADR data that involve multiple drugs with multiple adverse reactions, or a single drug with multiple adverse reactions, into one-to-one drug-adverse reaction combinations;
S3, a frequency statistics and analysis step: counting the frequency of occurrence of each combination in the ADR data and performing correlation analysis;
S4, establishing a simple false correlation model;
S5, establishing a false correlation signal proportional imbalance model;
S6, a signal detection step: performing signal detection, using an ADR signal detection method, on the data before and after optimization with the simple false correlation model and the false correlation signal proportional imbalance model;
S7, an effectiveness evaluation step: evaluating the effectiveness of the false correlation screening for combined medication, using the adverse reaction warnings in the drug package inserts as the known library to judge the signal detection results before and after the false correlation screening;
wherein the frequency statistics and analysis step of S3 specifically comprises:
counting the occurrence frequency of each record in the split ADR data and constructing a combined-medication data set comprising four attributes: drug name, adverse reaction name, occurrence frequency under single medication, and the degree of association between the drug and the adverse reaction; and calculating the correlation between the drug and the adverse reaction, wherein, assuming that the occurrence of each adverse reaction is completely independent when a drug is used alone, the number of reports of an adverse reaction produced when the drug is used alone reflects how likely the drug is to cause that adverse reaction, and the correlation coefficient between drug $D_1$ and adverse reaction $A_1$ is defined as
Figure FDA0003431738490000021
then, using the total probability formula, the correlation with which drug $D_1$ causes adverse reaction $A_1$ when used alone is calculated by the formula
Figure FDA0003431738490000022
where $P(A_1 \mid D_1)$ is the probability that adverse reaction $A_1$ is caused when drug $D_1$ is used, $R_{D_1}(A_1) = P(A_1 \mid D_1)$, and
Figure FDA0003431738490000023
is the probability that adverse reaction $A_1$ is caused when drug $D_1$ is not used,
Figure FDA0003431738490000024
the step of establishing the false correlation signal proportional imbalance model of S5 specifically comprises:
S51, calculating the proportional imbalance $d_{D_1}(A_1)$ between the mapping of drug $D_1$ to adverse reaction $A_1$ and the combination with the maximum correlation coefficient, the calculation formula being
Figure FDA0003431738490000025
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), R_{D_3}(A_1))$ denotes the maximum of the correlation coefficients of drug $D_1$, drug $D_2$, and drug $D_3$ with adverse reaction $A_1$;
S52, generalizing the calculation formula in S51 to obtain the formula for the false correlation signal of the mapping between drug $K$ and adverse reaction $A_1$ under combined medication
Figure FDA0003431738490000031
where $\mathrm{MAX}(R_{D_1}(A_1), R_{D_2}(A_1), \ldots)$ denotes the maximum of the correlation coefficients between the co-administered drugs and adverse reaction $A_1$, and $R_k(A_1)$ is the correlation coefficient between drug $K$ and adverse reaction $A_1$;
S53, under combined medication, judging whether drug $K$ and adverse reaction $A_1$ form a false correlation pair, the judgment formula being
$N_k(A_1) = 0 \;|\; d_k(A_1) \ge 2$,
where $|$ is the logical OR operation, $N_k(A_1)$ represents the number of occurrences of adverse reaction $A_1$ when drug $K$ is used alone, $d_k$ represents the false correlation signal value of drug $K$ and the adverse reaction, and adverse reaction $A_1$ may be any adverse reaction caused by the combined medication.
2. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 1, wherein in the data acquisition and preprocessing step of S1, the ADR data are ADR data from the national center.
3. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 1, wherein the data splitting step of S2 specifically comprises: first separating the combined-medication data from the single-medication data in the ADR data; splitting n adverse reactions caused by a single drug into n drug-adverse reaction combinations one by one; and splitting i drugs and j adverse reactions of a combined-medication report into $i \times j$ drug-adverse reaction combinations in one-to-one correspondence.
4. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 1, wherein the step of establishing the simple false correlation model of S4 specifically comprises:
assuming that when the number of reports in which a drug $D_k$ causes adverse reaction $A_i$ is less than or equal to $X$, the combination of drug $D_k$ and adverse reaction $A_i$ is regarded as a false correlation, the simple false correlation model is obtained as
Figure FDA0003431738490000041
where
Figure FDA0003431738490000042
is the number of occurrences of adverse reaction $A_i$ caused by drug $D_k$ when administered alone.
5. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 4, wherein in the step of establishing the simple false correlation model of S4, when adverse reaction $A_1$ occurs fewer than 3 times, drug $D_1$ is considered to cause adverse reaction $A_1$ only by chance and the drug-adverse reaction combination is disregarded; combined with the formula
Figure FDA0003431738490000043
the simple false correlation model can be optimized to
$N_{D_1}(A_1) < 3$.
6. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 1, wherein in the signal detection step of S6, the proportional reporting ratio algorithm is used for signal detection, and when a drug-adverse reaction combination satisfies $\mathrm{PRR} \ge 2$ and $\chi^2 \ge 4$, the combination is regarded as a positive adverse reaction signal.
7. The screening method for false correlation signals in combined adverse drug reaction report data according to claim 1, wherein in the effectiveness evaluation step of S7, the evaluation indexes include accuracy, recall, and F index.
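For readers who want a concrete picture of the report splitting in claim 3 and the decision rules in claims 1, 4 and 5, the following Python sketch may help. It is not the claimed implementation: all function names and the report layout `(drugs, reactions)` are hypothetical, and the imbalance value $d_k(A_1)$ is taken as an input because its formula appears only as an image in this text.

```python
from collections import Counter
from itertools import product


def split_report(drugs, reactions):
    """Split one report with i drugs and j reactions into i x j pairs."""
    return list(product(drugs, reactions))


def count_pairs(reports):
    """Count how often each (drug, reaction) pair occurs over all reports.

    reports: iterable of (drugs, reactions) tuples, each a list of names.
    """
    counts = Counter()
    for drugs, reactions in reports:
        counts.update(split_report(drugs, reactions))
    return counts


def simple_false_pairs(combined_counts, single_counts, threshold=3):
    """Simple model: a combined-medication pair is treated as a false
    correlation when the same pair occurs fewer than `threshold` times
    under single medication (the text uses 3)."""
    return {
        pair for pair in combined_counts
        if single_counts.get(pair, 0) < threshold
    }


def is_false_by_imbalance(n_single, d_value):
    """Judgment rule of S53: N_k(A_1) = 0 OR d_k(A_1) >= 2.

    n_single: occurrences of the reaction when the drug is used alone.
    d_value:  imbalance value d_k(A_1), computed elsewhere.
    """
    return n_single == 0 or d_value >= 2
```

Either rule produces a set of pairs that can then be removed from the combined-medication data before signal detection.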
CN201810946651.8A 2018-08-20 2018-08-20 Screening method for false correlation signals in combined adverse drug reaction report data Active CN109215799B (en)


Publications (2)

Publication Number Publication Date
CN109215799A CN109215799A (en) 2019-01-15
CN109215799B true CN109215799B (en) 2022-03-15



Similar Documents

Publication Publication Date Title
CN109215799B (en) Screening method for false correlation signals in combined adverse drug reaction report data
WO2016029570A1 (en) Intelligent alert analysis method for power grid scheduling
CN104756106B (en) Data source in characterize data storage system
CN104216349B (en) Utilize the yield analysis system and method for the sensing data of manufacturing equipment
CN112669991A (en) Method for detecting adverse drug reaction signals
CN111159272A (en) Data quality monitoring and early warning method and system based on data warehouse and ETL
CN108831563B (en) Decision method for distinguishing classification detection of adverse drug reaction signals
CN111159161A (en) ETL rule-based data quality monitoring and early warning system and method
CN110879822B (en) Drug adverse reaction signal detection method based on association rule analysis
CN111082985A (en) API (application program interface) monitoring method based on open platform
CN115098740B (en) Data quality detection method and device based on multi-source heterogeneous data source
CN115544519A (en) Method for carrying out security association analysis on threat information of metering automation system
CN108565029B (en) Method for eliminating adverse drug reaction data shielding effect based on layering strategy
CN110703183A (en) Intelligent electric energy meter fault data analysis method and system
CN117035563B (en) Product quality safety risk monitoring method, device, monitoring system and medium
CN108122059B (en) Production risk identification method and automatic early warning system for pharmaceutical manufacturing enterprise
CN117633249A (en) Basic variable construction method and device for SDGs space type monitoring index
CN117314150A (en) Construction enterprise potential safety hazard prevention system and method based on data analysis
CN116628609A (en) Intelligent monitoring method and system for multi-source data analysis
CN114944208B (en) Quality control method, quality control device, electronic equipment and storage medium
CN108804494B (en) Method for minimizing data shielding effect in adverse drug reaction signal detection
CN110837504A (en) Industrial control system abnormal system event identification method
CN115983582A (en) Data analysis method and energy consumption management system
CN115470251A (en) Big data analysis display device
CN114496293A (en) Construction method and construction system of severe risk prediction model of COVID-19 patient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant