CN116913548A - Adverse reaction data analysis method, device, electronic equipment and storage medium - Google Patents

Adverse reaction data analysis method, device, electronic equipment and storage medium

Publication number
CN116913548A
Authority
CN
China
Prior art keywords
adverse reaction
information
target
data
reaction data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310685543.0A
Other languages
Chinese (zh)
Inventor
周立运
Name withheld on request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rubik's Cube Medical Technology Suzhou Co ltd
Original Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rubik's Cube Medical Technology Suzhou Co ltd
Priority to CN202310685543.0A
Publication of CN116913548A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an adverse reaction data analysis method, an adverse reaction data analysis device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring adverse reaction data related to a drug; based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data; and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result. By adopting the method and the device, the analysis efficiency of adverse reaction data of the medicine can be improved.

Description

Adverse reaction data analysis method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for analyzing adverse reaction data, an electronic device, and a storage medium.
Background
Currently, adverse reaction data analysis is a critical component in drug development.
However, existing adverse reaction data analysis lacks the support of underlying drug research and development data covering the full life cycle, and the massive adverse event data reported by all parties is still analyzed and integrated by manual experience. As a result, even after a large investment of manpower and effort, users' requirements for high-precision analysis of adverse drug reactions cannot be met.
Therefore, the conventional adverse reaction data analysis method has a problem of low analysis efficiency.
Disclosure of Invention
The invention provides an adverse reaction data analysis method, an adverse reaction data analysis device, electronic equipment and a storage medium, which are used for solving the defect that an adverse reaction data analysis system in the prior art cannot meet the high-precision analysis requirement of a user on adverse reaction of medicines.
The invention provides an adverse reaction data analysis method, which comprises the following steps:
acquiring adverse reaction data related to a drug;
based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data;
and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result.
According to the adverse reaction data analysis method provided by the invention, the adverse reaction data analysis system comprises at least one of a time analysis unit, a medicine analysis unit and an enterprise analysis unit;
the step of analyzing the adverse reaction data under the target dimension by the adverse reaction data analysis system to obtain an analysis result comprises the following steps:
determining a target dimension, wherein the target dimension comprises at least one dimension of a time dimension, a medicine dimension and an enterprise dimension;
extracting information to be analyzed associated with the target dimension from the target information;
and carrying out adverse reaction data analysis on the information to be analyzed under the target dimension through at least one of the time analysis unit, the medicine analysis unit and the enterprise analysis unit to obtain an analysis result.
According to the adverse reaction data analysis method provided by the invention, the extracting of the information to be analyzed associated with the target dimension from the target information comprises the following steps:
acquiring a target object, wherein the target object is described based on at least one of patient information, medicine information, enterprise information, indication information, reporting time and medical term information;
and extracting information to be analyzed associated with both the target dimension and the target object from the target information.
According to the method for analyzing adverse reaction data provided by the invention, the analyzing the adverse reaction data under the target dimension of the information to be analyzed by at least one of the time analysis unit, the medicine analysis unit and the enterprise analysis unit to obtain an analysis result comprises the following steps:
under the condition that the target dimension is the time dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target time in the information to be analyzed through the time analysis unit to obtain an analysis result;
under the condition that the target dimension is the medicine dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target medicine in the information to be analyzed through the medicine analysis unit to obtain an analysis result;
and under the condition that the target dimension is the enterprise dimension, analyzing a target medicine associated with a target enterprise in the information to be analyzed and the type and/or occurrence frequency of adverse reaction events associated with the target medicine by the enterprise analysis unit to obtain an analysis result.
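The three cases above can be sketched as a small dispatch table. This is only an illustration under stated assumptions: the record fields (`drug`, `enterprise`, `event`, `reported`), the sample values and the function names are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical "information to be analyzed"; field names are assumptions.
REPORTS = [
    {"drug": "DrugA", "enterprise": "Acme", "event": "nausea", "reported": "2023-01"},
    {"drug": "DrugA", "enterprise": "Acme", "event": "rash", "reported": "2023-02"},
    {"drug": "DrugB", "enterprise": "Acme", "event": "nausea", "reported": "2023-02"},
    {"drug": "DrugC", "enterprise": "Beta", "event": "headache", "reported": "2023-02"},
]

def time_unit(reports, target_time):
    """Time dimension: event types and counts reported at the target time."""
    return Counter(r["event"] for r in reports if r["reported"] == target_time)

def drug_unit(reports, target_drug):
    """Medicine dimension: event types and counts for the target drug."""
    return Counter(r["event"] for r in reports if r["drug"] == target_drug)

def enterprise_unit(reports, target_enterprise):
    """Enterprise dimension: the enterprise's drugs and the events per drug."""
    per_drug = defaultdict(Counter)
    for r in reports:
        if r["enterprise"] == target_enterprise:
            per_drug[r["drug"]][r["event"]] += 1
    return dict(per_drug)

# One analysis unit per target dimension.
UNITS = {"time": time_unit, "drug": drug_unit, "enterprise": enterprise_unit}

def analyze(reports, dimension, target):
    """Route the request to the analysis unit for the chosen dimension."""
    return UNITS[dimension](reports, target)

by_time = analyze(REPORTS, "time", "2023-02")
by_drug = analyze(REPORTS, "drug", "DrugA")
by_enterprise = analyze(REPORTS, "enterprise", "Acme")
```

Keeping each unit as a separate function mirrors the "at least one of" wording: a deployment that only needs one dimension can register only that unit.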
According to the method for analyzing adverse reaction data provided by the invention, the analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target drug in the information to be analyzed by the drug analyzing unit, to obtain the analysis result, comprises the following steps:
based on the causal relationship between each target medicine and each adverse reaction event in the information to be analyzed, counting the types and/or occurrence times of the adverse reaction events corresponding to each target medicine from the information to be analyzed;
and comparing the types and/or occurrence times of the adverse reaction events corresponding to the target medicines through the medicine analysis unit, and determining the analysis result based on the comparison result.
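A minimal sketch of this counting and comparison, assuming each drug-event pair carries a causality label. The label values, the sample pairs and the ranking rule (most occurrences first) are illustrative assumptions rather than anything the patent specifies.

```python
from collections import Counter, defaultdict

# Hypothetical (drug, adverse event, causality assessment) triples.
PAIRS = [
    ("DrugA", "nausea", "related"),
    ("DrugA", "nausea", "related"),
    ("DrugA", "rash", "unrelated"),
    ("DrugB", "nausea", "related"),
    ("DrugB", "dizziness", "related"),
    ("DrugB", "rash", "related"),
]

def count_events(pairs, accepted=("related",)):
    """Count events per drug, keeping only pairs with an accepted causality."""
    counts = defaultdict(Counter)
    for drug, event, causality in pairs:
        if causality in accepted:
            counts[drug][event] += 1
    return counts

def compare(counts):
    """Return (drug, number of event types, total occurrences), highest total first."""
    rows = [(drug, len(c), sum(c.values())) for drug, c in counts.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))

ranking = compare(count_events(PAIRS))
```

Here the pair marked "unrelated" is excluded before counting, so DrugA contributes one event type with two occurrences and DrugB three types with three occurrences.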
According to the method for analyzing adverse reaction data provided by the invention, the method for acquiring adverse reaction data related to the drug comprises the following steps:
acquiring medicine information data, extracting information text in the medicine information data, performing adverse reaction association classification on the information text, and further screening adverse reaction data of information sources from the medicine information data based on the adverse reaction association classification result; and,
obtaining adverse reaction data of a clinical adverse reaction report source and/or a post-marketing adverse reaction database source.
According to the method for analyzing adverse reaction data provided by the invention, the information extraction is performed on the adverse reaction data based on the data type of the adverse reaction data, so as to obtain the target information of the adverse reaction data, and the method comprises the following steps:
under the condition that the data type of the adverse reaction data comprises texts, extracting the entity from the texts in the adverse reaction data to obtain adverse reaction entities;
and determining the target information based on the adverse reaction entities and the relationship between the adverse reaction entities in the text.
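As a rough illustration of the text case, the sketch below uses a lexicon lookup as a stand-in for entity extraction and treats co-occurrence in one sentence as the relationship between entities. Both choices, the lexicons and the function names are assumptions made for the example; the patent does not state which extraction technique is used.

```python
import re
from itertools import product

# Hypothetical lexicons standing in for a trained entity extractor.
DRUG_LEXICON = {"druga", "drugb"}
EVENT_LEXICON = {"nausea", "rash", "dizziness"}

def extract_entities(sentence):
    """Return the drug and adverse event mentions found in one sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    drugs = [t for t in tokens if t in DRUG_LEXICON]
    events = [t for t in tokens if t in EVENT_LEXICON]
    return drugs, events

def extract_target_info(text):
    """Pair each drug with each event mentioned in the same sentence."""
    info = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        drugs, events = extract_entities(sentence)
        info.extend({"drug": d, "event": e} for d, e in product(drugs, events))
    return info

target_info = extract_target_info(
    "DrugA was associated with nausea and rash. DrugB was well tolerated."
)
```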
The invention also provides an adverse reaction data analysis device, which comprises:
an acquisition unit for acquiring adverse reaction data related to the drug;
the extraction unit is used for extracting information of the adverse reaction data based on the data type of the adverse reaction data to obtain target information of the adverse reaction data;
and the analysis unit is used for carrying out adverse reaction data analysis on the target information under the target dimension through an adverse reaction data analysis system to obtain an analysis result.
The invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the adverse reaction data analysis method according to any one of the above when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of adverse reaction data analysis as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a method of adverse reaction data analysis as described in any one of the above.
According to the adverse reaction data analysis method, device, electronic equipment, storage medium and computer program product provided by the invention, acquiring the adverse reaction data related to the drug provides underlying drug research and development data covering the full life cycle for the subsequent adverse reaction data analysis; and carrying out adverse reaction data analysis in a target dimension, through the adverse reaction data analysis system, on the target information obtained by information extraction allows multi-dimensional, systematic combined analysis of the adverse reaction data to be realized rapidly and meets the user's requirement for high-precision analysis of adverse drug reactions, thereby further solving the problem of low analysis efficiency of existing adverse reaction data analysis.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an adverse reaction data analysis method provided by the invention;
FIG. 2 is a schematic structural diagram of an adverse reaction data analysis device provided by the invention;
FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Establishing pharmacovigilance during clinical trials, strengthening quality management of expedited safety data reports during clinical trials, monitoring those reports, identifying safety signals, evaluating and controlling safety risks, and establishing pharmacovigilance covering the whole life cycle of a drug are among the important tasks of regulatory science.
The China Hospital Pharmacovigilance System (CHPS) is an active monitoring system established by the national adverse drug reaction monitoring center to monitor adverse drug reactions after marketing. A cooperation mechanism among the parties is established through CASSA, the CHPS opens a data channel between data sources, and drug safety problems can be verified by building a risk analysis model.
However, CHPS currently monitors only adverse events of drugs after marketing, and its data sources are concentrated on data from some of the tertiary hospitals in CASSA with strong research capability and a high degree of informatization, so the analyzed data sources are limited. A data analysis system built on this basis therefore has a narrow monitoring range, which directly affects the effectiveness and timeliness of data analysis of adverse drug reactions.
In addition, the massive adverse event data reported by all parties still has to be analyzed and integrated by manual experience, so that even after a large investment of manpower and effort, users' requirements for high-precision analysis of adverse drug reactions cannot be met.
Based on this, the embodiment of the invention provides an adverse reaction data analysis method, which is applied to a data analysis scene of adverse drug reactions, for example, directly performs data analysis on report texts of suspected unexpected serious adverse reactions (Suspected Unexpected Serious Adverse Reaction, SUSAR), so as to meet the high-precision analysis requirement of users on the adverse drug reactions and improve the data analysis efficiency on the adverse drug reactions.
Fig. 1 is a flow chart of an adverse reaction data analysis method provided by the present invention, and as shown in fig. 1, the adverse reaction data analysis method provided by the embodiment of the present invention may include the following steps:
Step 110, obtaining adverse reaction data related to the medicine;
step 120, based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data;
Step 130, carrying out adverse reaction data analysis on the target information in the target dimension through an adverse reaction data analysis system to obtain an analysis result.
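Steps 110 to 130 can be pictured as a three-stage pipeline. The sketch below is a toy illustration only: the record layout, the two extractors and the phrase pattern " caused " are assumptions invented for the example and are not part of the described method.

```python
from collections import Counter
from typing import Callable, Dict, List

def extract_from_text(payload: str) -> dict:
    """Toy text extractor: expects the hypothetical pattern '<drug> caused <event>'."""
    drug, _, event = payload.partition(" caused ")
    return {"drug": drug.strip(), "event": event.strip()}

def extract_from_table(payload: dict) -> dict:
    """Toy table extractor: the row already carries named fields."""
    return {"drug": payload["drug"], "event": payload["event"]}

# Step 120 picks an extractor according to the data type of each record.
EXTRACTORS: Dict[str, Callable] = {"text": extract_from_text, "table": extract_from_table}

def acquire() -> List[dict]:
    """Step 110: stand-in for reading adverse reaction data from several sources."""
    return [
        {"type": "text", "payload": "DrugA caused nausea"},
        {"type": "table", "payload": {"drug": "DrugA", "event": "rash"}},
        {"type": "text", "payload": "DrugB caused nausea"},
    ]

def extract(records: List[dict]) -> List[dict]:
    """Step 120: type-dependent information extraction."""
    return [EXTRACTORS[r["type"]](r["payload"]) for r in records]

def analyze(target_info: List[dict], dimension: str) -> Counter:
    """Step 130: count records grouped by the chosen target dimension."""
    return Counter(info[dimension] for info in target_info)

result = analyze(extract(acquire()), "drug")
```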
Specifically, adverse reaction data related to a drug refers to data related to adverse reaction events, which may include adverse drug reactions (ADR) and adverse drug events (ADE). An adverse drug reaction is a harmful reaction, unrelated to the purpose of the medication or unexpected, that occurs when a qualified drug is used at its normal dosage; an adverse drug event is any untoward medical event that occurs during drug therapy and is not necessarily causally related to the drug therapy.
The adverse reaction data may include data of various modalities such as text, charts (images and tables), audio and video. The adverse reaction data can be acquired through external equipment, blockchain-based acquisition, polling, on request, direct reading of pre-stored data, and the like. Taking the text in the adverse reaction data as an example, the text can be input directly by a user, obtained by transcribing acquired audio, obtained by performing OCR (Optical Character Recognition) on an image captured with an image acquisition device such as a scanner, mobile phone or camera, or obtained from the Internet through a crawler. The method for obtaining the adverse reaction data is not particularly limited in the embodiment of the invention.
At present, adverse reaction event monitoring focuses mainly on drugs after marketing and the monitoring data comes from very few sources, so a data analysis system built on that data has a narrow monitoring range and lacks the support of underlying drug research and development data covering the full life cycle. In the embodiment of the present invention, the adverse reaction data therefore comes from at least two data sources. Further, the at least two data sources can comprise data sources for monitoring adverse reaction events of drugs after marketing, patient medical records, and data sources for monitoring adverse reaction events of drugs in the clinical trial stage, so that adverse reaction event monitoring covers the whole life cycle of the drug, the monitoring range of a data analysis system built on the adverse reaction data is widened, and the effectiveness and timeliness of data analysis of adverse drug reactions are improved.
In some embodiments, the obtaining adverse reaction data related to the drug, i.e., step 110 specifically includes:
step 111, obtaining the medicine information data, extracting the information text in the medicine information data, performing adverse reaction association classification on the information text, and further screening the adverse reaction data of the information source from the medicine information data based on the adverse reaction association classification result;
and step 112, obtaining adverse reaction data of a clinical adverse reaction report source and/or a post-marketing adverse reaction database source.
It should be understood that, here, the step 111 and the step 112 are parallel steps, and the execution sequence of the step 111 and the step 112 is not specifically limited in the embodiment of the present invention.
Specifically, the data source of the drug information data is an information source, for example, an information document related to a clinical test stage or a post-market application stage of the drug. The drug information data may be bulletin or news from pharmaceutical companies or regulators, or may be literature or conference information related to medicine, pharmacy, biology, health science, or nursing science, etc., which is not particularly limited in the embodiment of the present invention.
A crawler can be used to acquire the drug information data: data sources such as pharmaceutical company websites, official accounts and information websites are crawled, with automatic detection and early warning, so that the drug information data is monitored in real time and the availability and information quality of the information data sources are ensured. Furthermore, the crawler program can be written in Python and can crawl the target websites with multiple threads to obtain the drug information data.
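A multithreaded crawler of this kind could look roughly like the following, using only the Python standard library. The function names, the worker count and the choice to record failed pages as `None` are assumptions for illustration; a real crawler would also need rate limiting, robots.txt handling and retries.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

def crawl(urls, fetch_fn=fetch, max_workers=4):
    """Fetch pages concurrently; a failed URL maps to None instead of raising."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch_fn(url)
        except Exception:
            return None  # a monitoring job could raise an early warning here

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Passing the fetch function as a parameter keeps the threading logic testable without network access, for example `crawl(url_list, fetch_fn=my_stub)`.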
In consideration of the fact that the drug information data may include information contents in different directions, for example, contents related to adverse reaction events of drugs, and contents in directions such as marketing and sales of drugs, the drug information data needs to be screened to obtain adverse reaction data therein.
In order to obtain adverse reaction data, the information text in the drug information data may be extracted first. The information text is understood as text information in the drug information data, and the information text may be natural language of any language, such as english, chinese, japanese, etc., which is not particularly limited in the embodiment of the present invention.
The information text may be obtained by parsing text in the drug information data, by performing speech recognition on audio therein, or by performing OCR on an image therein, which is not particularly limited in the embodiment of the present invention.
In some embodiments, considering that there may be data of a chart mode in the drug information data, extracting the information text in the drug information data in step 111 includes:
step 111-1, performing chart detection on the drug information data based on a target detection model, wherein the target detection model is trained on a sample data set in which each pre-labeled chart area covers its title area;
Step 111-2, if the chart position information in the medicine information data is detected, the information text is extracted from the chart of the medicine information data based on the chart position information.
Specifically, in order to extract the information text of the chart portion in the drug information data, chart detection can be performed on the drug information data based on the trained target detection model: the drug information data is input into the trained target detection model, which performs chart detection on it. If a chart is detected in the drug information data, the model outputs the chart position information. The chart position information here may include not only the position information of the area occupied by the chart in the drug information data but also the position information of the area occupied by the title of the chart, that is, the chart position information may include the position information of the chart area and the position information of the title area.
The target detection model can be obtained through training in the following way:
A sample document set is obtained in which each sample document is pre-labeled with a chart area and a title area, and the pre-labeled chart area covers the title area associated with the chart. In other words, each sample document in the set is pre-labeled with a chart position and a title position, and each pre-labeled chart position is bound to its corresponding title position.
For example, when the sample document contains a chart a, the area occupied by the title corresponding to the chart a is labeled as a title area, and the area occupied by the title area and the chart a is labeled as a chart area, so that the labeled chart area covers the title area.
Then, training the initial target detection model based on the pre-labeled sample document set, and continuously learning the position information of the chart area and the title area in the sample document in the training process of the initial target detection model, so that the target detection model obtained by training can accurately predict charts and titles thereof possibly contained in the medicine information data and obtain chart position information.
After the chart position information in the drug information data is obtained, the position information A of the chart area and the position information B of the title area contained in it can be analyzed to determine the title uniquely corresponding to each chart. For example, a unique correspondence between the two objects can be determined when position information A and position information B meet the following conditions:
(1) position information B is not in the invalid area of the document page;
(2) position information A covers position information B;
(3) position information B is not in the invalid chart area contained in position information A;
(4) the distance between the title and the chart meets a preset threshold;
(5) the title content contains a preset title keyword.
The title content is then used as an index to preliminarily filter out chart position information irrelevant to adverse reaction events, for example by checking the title content against preset keywords related to drug marketing and drug sales, which gives a preliminary filtering of non-target charts. Text therefore does not need to be extracted from every chart in the drug information data, which reduces the workload of extracting the information text and improves extraction efficiency.
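A simplified sketch of the title matching and the title-based filtering follows. It implements only the coverage condition and the title-keyword condition, then drops charts whose titles contain excluded keywords. The `(x0, y0, x1, y1)` box format, the keyword lists and the function names are assumptions made for the example.

```python
def covers(outer, inner):
    """True when box `outer` fully contains box `inner`; boxes are (x0, y0, x1, y1)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def match_titles(charts, titles, title_keywords=("Table", "Figure")):
    """Bind each chart to its title when exactly one candidate title qualifies."""
    matches = {}
    for chart_id, chart_box in charts.items():
        candidates = [
            text for text, box in titles
            if covers(chart_box, box) and any(k in text for k in title_keywords)
        ]
        if len(candidates) == 1:  # keep only unambiguous chart-title pairs
            matches[chart_id] = candidates[0]
    return matches

def drop_non_target(matches, excluded=("sales", "marketing")):
    """Preliminary filter: discard charts whose title mentions an excluded topic."""
    return {cid: title for cid, title in matches.items()
            if not any(word in title.lower() for word in excluded)}

charts = {"c1": (0, 0, 100, 100), "c2": (0, 200, 100, 300)}
titles = [
    ("Table 1: Adverse events by grade", (10, 0, 90, 10)),
    ("Figure 2: Quarterly sales", (10, 200, 90, 210)),
]
kept = drop_non_target(match_titles(charts, titles))
```

Only the charts left in `kept` would go on to text extraction, which is where the saving in extraction work comes from.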
At this time, the information text extracted from the remaining charts may include the following two cases:
1) For a chart in a plain text format, the content in a chart source file can be directly analyzed, and chart positioning is carried out by combining chart position information so as to accurately acquire information text in the chart;
2) For charts in image format, the chart can be located based on the chart position information, and the information text can be extracted from the located chart using OCR.
After the information text in the drug information data is obtained, note that although charts irrelevant to adverse reaction events have been preliminarily filtered out based on the chart titles, there is no guarantee that all the information text extracted from the remaining charts is relevant to adverse reaction events. Therefore, to improve the accuracy of the subsequent data analysis of adverse drug reactions, adverse reaction association classification can further be performed on the information text to screen the adverse reaction data of the information source from the drug information data.
The adverse reaction related classification of the information text can be realized by a machine learning technology, such as a trained text classification model, so as to identify and screen the adverse reaction data from the medicine information data. It will be appreciated that the text classification model herein is used to determine whether the informational text is associated with an adverse reaction.
To improve the accuracy and efficiency of the text classification model, the classifier can be trained from the large-scale pre-trained Multilingual BERT (mBERT) model combined with fine-tuning. The model can process text in multiple languages simultaneously, without a separate model being customized for each language.
Further, the text classification model can be obtained through training the following steps:
First, the crawled sample text is preprocessed, including removing useless information such as stop words and punctuation, and is converted into a token sequence for model processing.
Then, in the data labeling stage, a semi-supervised method is adopted: specific keywords, for example "adverse reaction" and "adverse event", are first used to filter out candidate documents, which are then manually checked to label positive and negative samples. The remaining negative samples are randomly drawn from the remaining documents.
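The keyword pre-filter and the random draw of extra negatives can be sketched as follows. The keyword list, the function names and the fixed random seed are assumptions for illustration; the manual review step in between is not modeled.

```python
import random

KEYWORDS = ("adverse reaction", "adverse event")  # assumed keyword list

def split_candidates(documents, keywords=KEYWORDS):
    """Separate documents that mention a keyword (sent to manual review) from the rest."""
    hits, rest = [], []
    for doc in documents:
        (hits if any(k in doc.lower() for k in keywords) else rest).append(doc)
    return hits, rest

def sample_negatives(rest, n, seed=0):
    """Randomly draw up to n extra negative samples from the unmatched documents."""
    rng = random.Random(seed)
    return rng.sample(rest, min(n, len(rest)))

docs = [
    "Serious adverse event reported in phase II trial",
    "Company announces new sales office",
    "Label updated after adverse reaction reports",
    "Quarterly revenue guidance raised",
    "Conference schedule published",
]
to_review, unmatched = split_candidates(docs)
negatives = sample_negatives(unmatched, 2)
```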
The mBERT model is then used as the base model and is adapted to the text classification task by fine-tuning. Specifically, a fully connected layer is added on top of the mBERT model, a cross-entropy loss function is used to measure the loss of the model during training, and a stochastic gradient descent algorithm is used for optimization.
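In symbols, the training objective described here can be written as below. The notation is an assumption introduced for illustration: h(x) stands for the sentence representation produced by mBERT, W and b for the parameters of the added fully connected layer, and the two classes for "associated with an adverse reaction" and "not associated".

```latex
% Assumption: h(x) is the mBERT sentence representation of text x,
% and W, b are the parameters of the added fully connected layer.
\[
  p_\theta(c \mid x) = \operatorname{softmax}\bigl(W\,h(x) + b\bigr)_c ,
  \qquad c \in \{0, 1\}
\]
% Cross-entropy loss over N labeled texts (x_i, y_i):
\[
  \mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)
\]
% Stochastic gradient descent step on a mini-batch B with learning rate \eta:
\[
  \theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}_B(\theta)
\]
```

During fine-tuning, the parameters θ include both the mBERT weights and the new layer, so the gradient step updates the whole network.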
After multiple rounds of training, an efficient and accurate text classification model can be obtained, and the model can carry out adverse reaction association classification on the information text to obtain an adverse reaction association classification result. On the basis, adverse reaction data of information sources can be screened from the drug information data based on the adverse reaction association classification result.
Besides the adverse reaction data of the information sources, the adverse reaction data of the clinical adverse reaction report sources and/or the adverse reaction database sources after marketing can be obtained, so that the adverse reaction data of at least two data sources can be obtained.
The clinical adverse reaction report source may be reports provided by the enterprise to research institutions and regulators during the clinical trial, and may include at least SUSAR (Suspected Unexpected Serious Adverse Reaction) reports, DSUR (Development Safety Update Report) reports and SAE (Serious Adverse Event) reports.
The post-marketing adverse reaction database source reflects adverse event information and medication error information for marketed drugs. Post-marketing adverse reaction databases may include the FDA Adverse Event Reporting System (FAERS), the European Union Drug Regulating Authorities Pharmacovigilance database (EMA EudraVigilance), the Japanese Adverse Drug Event Report database (JADER), and the like.
FAERS is the United States post-marketing pharmacovigilance database, which monitors the safety of all marketed drugs and therapeutic biologics. The public can obtain information through the "dashboard" (a categorized summary based on key fields such as report type, region, patient age and severity) and the "quarterly data files" (raw report data extracted from FAERS over a specified time frame).
Wherein, the data elements presented by the quarter data file are:
(1) general information: case report ID (ASCII), FAERS case report ID, case report version, pharmaceutical enterprise unique case report ID, regulatory agency case report ID, date of delivery to FDA, date of first/last receipt by FDA, type of report (urgent, non-urgent, direct), form of report (spontaneous report, study report, unknown, XML), whether pharmaceutical enterprise electronically submitted (ASCII), severity (XML), whether fast reporting criteria (XML) are met;
(2) patient information: patient age and age group, patient gender, weight;
(3) drug information: drug unique identification number (ASCII), drug name (primary and secondary suspected drug in ASCII, suspected concomitant drug in XML), active substance name, route of administration, dose, frequency, cumulative dose, dosage form, concomitant drug, date of expiration of drug use, course of treatment, drug indication in MedDRA preferred terms, action taken with the drug (such as increasing or decreasing the dose; XML), dechallenge effect (ASCII), rechallenge effect (ASCII), lot number, expiration date (ASCII), new drug application number;
(4) event information: adverse drug reactions expressed in MedDRA preferred terms, MedDRA version (XML), date of occurrence (XML), severity, country of occurrence of event, and outcome (XML);
(5) report source information: the type of reporter, the country in which the reporter is located, the pharmaceutical enterprise/organization code that sent the report, the report source (ASCII), whether the voluntary report notifies the pharmaceutical enterprise (ASCII).
It can be understood that, the adverse reaction data of the information source is obtained by screening from the drug information data in step 111, and the adverse reaction data of the clinical adverse reaction report source and/or the adverse reaction database source after marketing is obtained in step 112, so that the adverse reaction data of at least two data sources are obtained, the monitoring range of the data analysis system can be widened, and the effectiveness and timeliness of the data analysis on the adverse reaction of the drug are improved.
In order to perform data analysis on the adverse reaction data, the adverse reaction data obtained in step 110 may be further extracted to obtain target information of the adverse reaction data. The target information herein may be understood as information related to analysis of adverse reaction data, such as drug information in adverse reaction data, patient information, adverse reaction information, reporting time, and the like.
The data types of the adverse reaction data can comprise text, images, audio, video, and the like, and information extraction can be performed in different modes for different data types. The information extraction mode herein may include at least one of entity extraction, entity relation extraction, keyword extraction, voice recognition, and OCR. For example, for the text data type, named entity recognition (Named Entity Recognition, NER) or keyword extraction may be used for target information extraction. For the image data type, OCR may first be used to extract text from the image, and target information extraction is then performed on the text; alternatively, target information extraction may be performed directly on the image by image recognition. For the audio and video data types, target information extraction may be performed directly on the audio and video data; alternatively, the audio data may be transcribed to obtain a transcribed text on which target information extraction is then performed, or OCR may be applied to each frame of the video data to extract text on which target information extraction is then performed.
In some embodiments, based on the data type to which the adverse reaction data belongs, information extraction is performed on the adverse reaction data to obtain target information of the adverse reaction data, that is, step 120 specifically includes:
step 121, performing entity extraction on the text in the adverse reaction data based on the pre-training language model to obtain an adverse reaction entity under the condition that the data type of the adverse reaction data comprises the text;
in step 122, the target information is determined based on the adverse reaction entities and the relationship between the adverse reaction entities in the text.
In particular, entity extraction may also be referred to as named entity recognition (Named Entity Recognition, NER), which refers to automatically recognizing named entities from textual information. In the case that the data type of the adverse reaction data includes text, in order to obtain the target information, entity extraction can be performed on the text in the adverse reaction data based on the pre-training language model, so as to obtain an adverse reaction entity.
Entity extraction is performed on texts in adverse reaction data, and the entity extraction can be divided into the following two cases:
1) For plain text not containing charts, entity extraction can be performed on the text based on a trained NER model or a pre-training language model to obtain adverse reaction entities;
2) For text containing a chart, in order to effectively use the chart information in the text to enhance the entity extraction result, thereby improving the completeness and accuracy of entity extraction, entity extraction can be carried out on the text containing the chart through a pre-training language model to obtain adverse reaction entities.
In some embodiments, in step 121, for the text including the chart, entity extraction is performed on the text in the adverse reaction data based on the pre-trained language model to obtain an adverse reaction entity, which specifically includes:
step 121-1, obtaining form information of a form in the adverse reaction data, and performing entity masking on the form information to obtain a complete blank filling template;
step 121-2, determining a target text sequence based on the complete blank filling template and the text in the adverse reaction data;
step 121-3, performing entity extraction on the target text sequence based on the pre-training language model to obtain an adverse reaction entity;
the pre-training language model is obtained by pre-training the initial language model based on an alignment entity, a preset entity filling task and a complete filling task; the alignment entity is obtained by entity alignment of text and associated form information in the sample data.
Specifically, for the case that the text in the adverse reaction data includes a table, the trained object detection model described in step 111-1 may be used first to obtain table information of the table in the adverse reaction data, where the table information is used to reflect contents of the table, and may specifically include contents inside the table, and may also include title contents of the table. The table information may be obtained after positioning based on the chart position information.
The text to be detected can be input into a trained target detection model, chart position information of the text is output, then one-to-one binding is carried out on the chart and the title by analyzing the position information of the chart and the position information of the title in the chart position information, and the title content is acquired based on the title position after the binding. And finally, screening the chart positions as required by taking the title content as an index, and extracting a key chart from the text after obtaining a target chart position corresponding to the title content, wherein the key chart is specifically table information of a table and comprises the content in the table and the title content of the table.
After the table information in the text is obtained, entity masking can be performed on the table information to obtain the complete blank filling template. The complete blank filling template may be used to describe relationships between entities in the form information.
In some embodiments, the entity masking of the table information in step 121-1, to obtain the complete blank-filling template, specifically includes:
step 121-11, obtaining layout information corresponding to the form information;
step 121-12, determining entity relations among cell entities in the form information according to the layout information; wherein the entity relationship comprises at least one of: the same column entity relationship, the same row entity relationship and the same row column entity relationship;
step 121-13, filling the cell entities into task templates preset corresponding to the entity relationship to obtain template texts;
and step 121-14, in the template text, masking the target entity in the cell entities to obtain the complete blank filling template.
Specifically, the layout information corresponding to the table information can represent the layout position of each cell entity in the table, and the layout information can specifically include the title row cell, the title column cell, and the correspondence between other cells and the title row cell and the title column cell of the table, where the other cells refer to cells other than the title row cell and the title column cell. It can be determined whether any two or more of the aligned entities have an entity relationship through the layout information.
The entity relationship between the cell entities in the table information mainly considers the relationship of the following two cases:
1) Within a single column or row, the title cell has an entity relationship with the other cells, namely a same-column entity relationship or a same-row entity relationship, which generally refers to a relationship between two cells. Any non-title cell in a row has a same-row entity relationship with the title cell of that row, and any non-title cell in a column has a same-column entity relationship with the title cell of that column.
2) Any other cell has a relationship with its corresponding title row cell and title column cell, namely a same-row-and-column entity relationship, which generally refers to a relationship between three cells.
For example, the layout information corresponding to the table information may be shown as table 1 below, where the first row of the table is a header row and the first column is a header column.
TABLE 1
            | Sorafenib-ravastatin | Sorafenib alone | P-value
Median OS   | 10.7months           | 10.5months      | 0.975
Median PFS  | 5.0months            | 4.4months       | 0.986
……          | ……                   | ……              | ……
For the layout information of table 1, it can be obtained that a same-column entity relationship exists between the cell entities "P-value" and "0.975", a same-row entity relationship exists between the cell entities "Median PFS" and "4.4months", and a same-row-and-column entity relationship exists among the cell entities "Median OS", "10.7months" and "Sorafenib-ravastatin".
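The way entity relationships follow from the layout can be illustrated with a short sketch. It assumes, as in table 1, that the first row is the title row and the first column is the title column; the function name `table_relations` and the list-of-rows representation are hypothetical choices made for illustration.

```python
def table_relations(table):
    """Derive entity relationships from a table given as a list of rows.

    Row 0 is treated as the title row and column 0 as the title column.
    Returns three lists: same-column pairs, same-row pairs, and
    same-row-and-column triples.
    """
    title_row = table[0]
    same_column, same_row, same_row_and_column = [], [], []
    for r in range(1, len(table)):
        row_title = table[r][0]
        for c in range(1, len(title_row)):
            column_title = title_row[c]
            cell = table[r][c]
            same_column.append((column_title, cell))        # two-cell relation
            same_row.append((row_title, cell))              # two-cell relation
            same_row_and_column.append((row_title, cell, column_title))  # three-cell relation
    return same_column, same_row, same_row_and_column


TABLE_1 = [
    ["", "Sorafenib-ravastatin", "Sorafenib alone", "P-value"],
    ["Median OS", "10.7months", "10.5months", "0.975"],
    ["Median PFS", "5.0months", "4.4months", "0.986"],
]
SAME_COLUMN, SAME_ROW, SAME_ROW_AND_COLUMN = table_relations(TABLE_1)
```

Each non-title cell contributes one relation of each kind, so the 2 x 3 body of this excerpt yields six pairs per two-cell relation and six triples.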
After the entity relation among the cell entities is obtained, the cell entities can be filled into task templates preset corresponding to the entity relation, and template texts are obtained.
The preset task template is text for describing the entity relationship between the cell entities. In some embodiments, the task template may be presented as shown in table 2 below:
TABLE 2
Number of entities | Task template
2                  | Ent1 is associated with Ent2.
3                  | Ent1 and Ent2 may be related to Ent3.
In table 2, there are two task templates, corresponding respectively to the text describing the relationship between 2 entities and the text describing the relationship among 3 entities. Ent1, Ent2, and Ent3 are placeholders corresponding to the cell entities between which the entity relationship exists.
After the task template is determined, the cell entities can be filled into the task template preset corresponding to the entity relationship, so that a template text is obtained. For example, the two cell entities "P-value" and "0.975" having the same-column entity relationship may be filled into the task template corresponding to the entity number 2, and the resulting template text may be represented as "P-value is associated with 0.975".
For another example, the three cell entities "Median OS", "10.7months", and "Sorafenib-ravastatin" having the same-row-and-column entity relationship may be filled into the task template corresponding to the entity number 3, and the resulting template text may be expressed as "Median OS and 10.7months may be related to Sorafenib-ravastatin".
Then, in the template text, mask processing is performed on the target entity among the cell entities to obtain the complete blank filling template. The target entity here may be a randomly selected entity filled into the template text. In the masking process, a new special token [SOE] may be introduced to replace the masked entity.
For example, the target entity "Median OS" may be randomly selected from the template text "Median OS and 10.7months may be related to Sorafenib-ravastatin" and masked, and the resulting complete blank filling template may be expressed as "[SOE] and 10.7months may be related to Sorafenib-ravastatin". The resulting complete blank filling template may be denoted as X_C.
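Template filling and entity masking can be sketched as follows. The template strings are taken from table 2 and the mask token [SOE] from the description; the function names and the option of passing a fixed `target_index` (instead of a random choice) are assumptions made so that the example is reproducible.

```python
import random

# Task templates from table 2, keyed by the number of related entities.
TASK_TEMPLATES = {
    2: "{0} is associated with {1}.",
    3: "{0} and {1} may be related to {2}.",
}
MASK_TOKEN = "[SOE]"


def fill_template(entities):
    """Fill cell entities into the task template matching their count."""
    return TASK_TEMPLATES[len(entities)].format(*entities)


def mask_entity(entities, target_index=None, rng=random):
    """Replace one entity with the mask token and return (template, answer).

    If target_index is None, the target entity is chosen at random.
    """
    if target_index is None:
        target_index = rng.randrange(len(entities))
    masked = list(entities)
    answer = masked[target_index]
    masked[target_index] = MASK_TOKEN
    return fill_template(masked), answer


TEMPLATE_TEXT = fill_template(("P-value", "0.975"))
CLOZE, ANSWER = mask_entity(
    ("Median OS", "10.7months", "Sorafenib-ravastatin"), target_index=0
)
```

Substituting the mask token before formatting, rather than searching the finished sentence, avoids accidentally masking a different occurrence of the same string.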
It should be noted that, in the embodiment of the present invention, the number of template texts included in the complete blank filling template is not specifically limited, and may include 3 or 5 template texts, for example.
After the complete blank filling template is obtained, a target text sequence can be determined based on the complete blank filling template and texts in the adverse reaction data. In one embodiment, the target text sequence may be expressed in the form of:
X = [CLS] X_C [SEP] X_E [SEP]
wherein X is the target text sequence; X_C is the complete blank filling template; X_E is the text in the adverse reaction data; [CLS] and [SEP] are special tokens.
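The composition of the target text sequence can be sketched as plain string assembly. This is only an illustration under the assumption that several template texts are joined by spaces; in practice a tokenizer usually inserts the special tokens itself, and the function name `build_target_sequence` is hypothetical.

```python
def build_target_sequence(cloze_templates, text):
    """Assemble X = [CLS] X_C [SEP] X_E [SEP] from templates and report text."""
    x_c = " ".join(cloze_templates)  # complete blank filling template
    x_e = text.strip()               # text in the adverse reaction data
    return f"[CLS] {x_c} [SEP] {x_e} [SEP]"


SEQUENCE = build_target_sequence(
    ["[SOE] and 10.7months may be related to Sorafenib-ravastatin."],
    "Median OS was 10.7months in the Sorafenib-ravastatin arm.",
)
```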
And then, extracting the entity of the target text sequence based on the pre-training language model to obtain an adverse reaction entity, and specifically, inputting the target text sequence into the pre-training language model to obtain the adverse reaction entity output by the pre-training language model.
The pre-training language model can be obtained through training by the following steps:
1. A sample document is obtained, the sample document including text information and associated form information. That is, a document containing both text information and associated form information can be taken as a sample document. The sample document may be acquired from a database containing a large amount of medical, biological, health, or nursing literature, for example from a literature retrieval database such as PubMed or PubTab.
2. After obtaining the text information and the form information in the sample document, entity alignment can be performed on the text information and the form information, and the entity alignment can be used for judging whether two or more entities with different information sources point to the same object in the real world or not, and collecting named entities with the same reference together, so that aligned entities are obtained.
It is understood that the alignment entity is derived from both text information and form information and points to the same object in the real world.
Entity alignment can be realized by character string matching, that is, the cell content in the form information is matched with each entry in the text information to obtain the alignment entity. The character string matching may be performed in the following two ways:
1) Taking English as an example, a table, the cell contents in the table, and the related text information are acquired. Each word in a cell (except stop words and punctuation marks) is converted to its root form, and an attempt is made to find the corresponding position of each word of the cell in the text information. Converting words to their root form improves matching: for example, when a word in a cell is plural and the corresponding word in the text information is singular, the two still match after root conversion, which further improves matching efficiency and accuracy.
2) Enumerating possible phrases in the text information, comparing each possible phrase with the content of the cell, scoring each phrase based on the overlapping proportion of the word appearing in the cell and the phrase, obtaining the score of each phrase in the text information, and reserving the phrase with the highest score as an alignment entity. For example, the content of the cell is "Sequenced genome falciparum", the phrase in the text information is "Sequenced genome plasmodium falciparum", and the score corresponding to this is 0.75. Further, a score threshold may be set, and a phrase with a score higher than a preset score threshold may be determined as an aligned entity with successful matching, for example, the score threshold may be 0.5.
3. Based on the alignment entity, a preset entity filling task and a complete filling task, pre-training the initial language model to obtain a pre-training language model.
Two self-supervision tasks, namely an entity filling task (EI) and a complete filling task (TCT), are designed during pre-training.
Compared with the existing mask language model, the entity filling task EI masks the entries of the aligned entities in the text information and requires the model to restore them. Meanwhile, the complete filling task TCT converts a plurality of aligned entities in the table layout into a text with a missing entity, and the model needs to extract the correct entity from the text information to fill the blank. Through these two tasks, the knowledge of the form is well integrated into the language model.
The training of the obtained pre-training language model can fully utilize entity information in form information related to text information, so that good processing precision is achieved on downstream NLP tasks (such as information extraction, relation extraction, classification and the like).
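Returning to the second character string matching approach used for entity alignment, the phrase scoring can be sketched as below. The description does not give an exact formula, so the score here is an assumption: shared words divided by the number of distinct words in the cell and phrase together, which reproduces the 0.75 of the example. Root conversion is omitted, and the names `overlap_score`, `candidate_phrases`, and `align_cell` are hypothetical.

```python
import re


def words(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(cell, phrase):
    """Shared words divided by distinct words in cell and phrase (assumed formula)."""
    cell_words, phrase_words = set(words(cell)), set(words(phrase))
    union = cell_words | phrase_words
    return len(cell_words & phrase_words) / len(union) if union else 0.0


def candidate_phrases(text, max_len):
    """Enumerate every contiguous phrase of up to max_len words."""
    tokens = words(text)
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def align_cell(cell, text, threshold=0.5, max_len=6):
    """Return (best phrase, score) if the best score exceeds the threshold, else None."""
    best = max(candidate_phrases(text, max_len),
               key=lambda phrase: overlap_score(cell, phrase),
               default=None)
    if best is None:
        return None
    score = overlap_score(cell, best)
    return (best, score) if score > threshold else None


MATCH = align_cell(
    "Sequenced genome falciparum",
    "We sequenced genome plasmodium falciparum isolates from patients",
)
```

With the threshold of 0.5 mentioned in the description, the phrase scoring 0.75 is retained as the alignment entity, while a cell with no overlapping phrase yields no match.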
The pre-trained language model may include an input layer, a semantic information extraction layer, a prediction layer, and a pointer layer, wherein the semantic information extraction layer may be a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM), the prediction layer may be a conditional random field (Conditional Random Fields, CRF), and the pointer layer may be an attention-based pointer network (Pointer Network). The prediction layer and the pointer layer are both connected with the semantic information extraction layer, and the prediction layer and the pointer layer are parallel.
Entity extraction is carried out on the target text sequence through the trained pre-training language model to obtain an adverse reaction entity, and the method specifically comprises the following steps of:
1) Inputting the target text sequence to an input layer, and encoding the target text sequence into a semantic vector sequence by using the input layer;
2) Inputting the encoded semantic vector sequence to a semantic information extraction layer, and outputting semantic information of each word in the semantic vector sequence in the context;
3) Inputting semantic information of each word in the semantic vector sequence in the context to a prediction layer, solving complex conditions such as entity crossing or nesting through global optimization, and obtaining a first entity output by the prediction layer based on an entity filling task;
4) Inputting semantic information of each word in the context of the semantic vector sequence to a pointer layer, finding out a corresponding position in text information of an entity missing in a complete blank filling template, and obtaining a second entity output by the pointer layer based on the complete blank filling task;
5) On this basis, the first entity and the second entity can be fused to obtain an adverse reaction entity. If the pointer layer is able to find the corresponding entity in the text but the prediction layer is not, the second entity output by the pointer layer is also taken as a final output entity. Experiments show that the entity correction method based on the Pointer Network can effectively improve the prediction performance of the CRF.
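The fusion in step 5) amounts to keeping the prediction layer's entities and adding any entity that only the pointer layer found. A minimal sketch, assuming each entity is represented as a (start, end, text) span and using the hypothetical name `fuse_entities`:

```python
def fuse_entities(crf_entities, pointer_entities):
    """Merge prediction-layer and pointer-layer outputs.

    Entities are (start, end, text) spans. Spans found only by the pointer
    layer are added to the prediction layer's output; duplicates are dropped.
    """
    fused = list(dict.fromkeys(crf_entities))  # keep order, drop duplicates
    seen = set(fused)
    for entity in pointer_entities:
        if entity not in seen:
            fused.append(entity)
            seen.add(entity)
    return sorted(fused)


FIRST = [(0, 9, "Sorafenib"), (25, 31, "nausea")]
SECOND = [(25, 31, "nausea"), (40, 55, "hepatic failure")]
FUSED = fuse_entities(FIRST, SECOND)
```

How overlapping but non-identical spans should be reconciled is not specified in the description, so this sketch treats them as distinct entities.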
On the basis of obtaining the adverse reaction entity, the situation that a plurality of medicines can be given to one patient in a clinical test and a plurality of corresponding adverse reaction events can occur is considered, so that the corresponding relation among the adverse reaction entities is also required to be extracted. A relationship extraction model can be constructed on the pre-training language model to further extract the relationship between the adverse reaction entities in the text.
Alternatively, the relationship between the adverse reaction entities in the text can be obtained by:
encoding the target text sequence and each entity to obtain a semantic vector sequence;
based on the semantic vector sequence, extracting the relation of each entity to obtain the relation among the entities.
Specifically, the target text sequence and each adverse reaction entity obtained from the entity extraction result are first input into the model, and the target text sequence is encoded into a semantic vector sequence by the pre-training language model. With reference to the method of the PURE model, the adverse reaction entities may be added to the input text as special tokens.
Then, a relation classification model is constructed on the tokens corresponding to the adverse reaction entities, and relation extraction is performed on the adverse reaction entities to judge whether a specific relation exists between the given entities.
The target information thus obtained may include the adverse reaction entities and the relationships between the adverse reaction entities. For example, adverse reaction entities include, but are not limited to: (1) drug information (drug name, dose, mode of administration, route of administration), (2) indication, (3) adverse reaction/event information, (4) SMQ of the adverse reaction, (5) report type/severity/causal relationship, (6) patient information (age, sex, age at which the adverse reaction/event occurred), (7) examination result, (8) diagnosis details, (9) ICSR number and reporting time of the report.
In addition, the correspondence between entities of ICSR number-reporting time-indication-drug information-adverse reaction-SMQ of adverse reaction-report type/severity/causality-age can be obtained.
It should be noted that, considering that the ICSR numbers of the same patient are the same, if the same patient reports multiple SUSAR reports, only the version number will change, so the ICSR number + version number may be used as the unique code of the SUSAR report.
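The use of ICSR number plus version number as a unique code can be sketched as follows. The record fields (`icsr_number`, `version`), the example identifier format, and the function names are hypothetical.

```python
def report_key(icsr_number, version):
    """Unique code of a SUSAR report: ICSR number + version number."""
    return f"{icsr_number}+{version}"


def index_reports(reports):
    """Index reports by unique code, rejecting exact duplicates."""
    index = {}
    for report in reports:
        key = report_key(report["icsr_number"], report["version"])
        if key in index:
            raise ValueError(f"duplicate SUSAR report: {key}")
        index[key] = report
    return index


def versions_by_case(reports):
    """Group version numbers under the ICSR number shared by one patient."""
    cases = {}
    for report in reports:
        cases.setdefault(report["icsr_number"], []).append(report["version"])
    return {icsr: sorted(versions) for icsr, versions in cases.items()}


REPORTS = [
    {"icsr_number": "EXAMPLE-0001", "version": 1},
    {"icsr_number": "EXAMPLE-0001", "version": 2},
    {"icsr_number": "EXAMPLE-0002", "version": 1},
]
INDEX = index_reports(REPORTS)
CASES = versions_by_case(REPORTS)
```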
In some embodiments, after obtaining the target information, the target information may be further normalized, which specifically includes:
1) And matching the drug names, the enterprise names and the indications in the target information with a pre-constructed standard drug dictionary, an enterprise dictionary and an indication dictionary to obtain standard names.
2) Unifying the descriptions of the evaluation source and the evaluation result of the drug and reaction/event correlation in the target information.
The evaluation source here generally indicates whether the correlation evaluation in the adverse reaction data comes from the enterprise or from the researcher. For example, the enterprise as an evaluation source is described in different reports as "company", "sponsor", "Pharmaceutical Company", "AMH", etc.; the researcher as an evaluation source is described in different reports as "reporter", "researcher judgment", "Primary Source Reporter", etc.
The evaluation results (causal relationships) include: definitely related, probably related, possibly related, uncertain/to be evaluated or unable to be evaluated, possibly unrelated, unrelated, etc.
Each evaluation result is described differently in different reports. Definitely related is described as "affirmative", "certain", "Related", "defined", "Strong correlation", etc.; probably related is described as "likely", "high probability", "Perfect correlation"; possibly related is described as "possibility", "certain probability", "association", "relation", "Moderate correlation"; uncertain/to be evaluated or unable to be evaluated is described as "unable to evaluate", "uncertainty"; possibly unrelated is described as "possibly irrelevant", "Weak correlation"; unrelated is described as "irrelevant", "No correlation".
3) Mapping the acquired indication information to the MedDRA dictionary (Medical Dictionary for Regulatory Activities). MedDRA is an internationally recognized set of medical terms used by regulatory authorities and the regulated biopharmaceutical industry for data entry, retrieval, evaluation and presentation throughout the regulatory process, from pre-marketing to post-marketing.
The terms of the MedDRA dictionary are attributed and associated through a 5-level structure: 1. SOC (System Organ Class); 2. HLGT (High Level Group Term); 3. HLT (High Level Term); 4. PT (Preferred Term); 5. LLT (Lowest Level Term). Generally, PT and SOC are the two dimensions commonly used in data analysis. In the SUSAR report, an LLT code is marked beside each indication, and the PT and SOC codes and the corresponding terms can be obtained based on the hierarchical correspondence in the dictionary.
4) Linking the acquired adverse event information with the MedDRA dictionary and the SMQ dictionary. SMQ (Standardised MedDRA Queries) refers to sets of terms from one or more SOCs that are related to a particular medical condition or field of interest and are intended to aid in case identification.
The logic for linking adverse event information with the MedDRA dictionary is the same as for indication information: in the SUSAR report, an LLT code is marked beside each adverse event, and the PT code is obtained from the LLT code based on the MedDRA dictionary. Further, the corresponding SMQ information can be obtained from the LLT code or the PT code based on the existing logic of the SMQ dictionary.
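Two of the normalization steps above, unifying evaluation results and resolving LLT codes, can be sketched with lookup tables. The synonym lists come from the description; the canonical labels, the placeholder codes such as "LLT-0001", and the function names are assumptions, and the code tables are toy data rather than content from a MedDRA or SMQ release.

```python
# Canonical evaluation result -> descriptions seen in different reports.
CAUSALITY_SYNONYMS = {
    "definitely related": ["affirmative", "certain", "Related", "defined",
                           "Strong correlation"],
    "probably related": ["likely", "high probability", "Perfect correlation"],
    "possibly related": ["possibility", "certain probability", "association",
                         "relation", "Moderate correlation"],
    "not assessable": ["unable to evaluate", "uncertainty"],
    "possibly unrelated": ["possibly irrelevant", "Weak correlation"],
    "unrelated": ["irrelevant", "No correlation"],
}
_CAUSALITY_LOOKUP = {
    synonym.lower(): canonical
    for canonical, synonyms in CAUSALITY_SYNONYMS.items()
    for synonym in synonyms
}


def normalize_causality(raw):
    """Map a free-text evaluation result to its canonical label."""
    return _CAUSALITY_LOOKUP.get(raw.strip().lower(), "unmapped")


# Toy hierarchy with placeholder codes (not real MedDRA or SMQ codes).
LLT_TO_PT = {"LLT-0001": "PT-0001", "LLT-0002": "PT-0001"}
PT_TO_SOC = {"PT-0001": "SOC-0001"}
PT_TO_SMQ = {"PT-0001": ["SMQ-0001"]}


def resolve_llt(llt_code):
    """Follow LLT -> PT -> SOC and collect the SMQs attached to the PT."""
    pt_code = LLT_TO_PT.get(llt_code)
    if pt_code is None:
        return None
    return {
        "llt": llt_code,
        "pt": pt_code,
        "soc": PT_TO_SOC.get(pt_code),
        "smq": PT_TO_SMQ.get(pt_code, []),
    }
```

Descriptions that match no synonym are returned as "unmapped" so that they can be reviewed rather than silently assigned to a category.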
Based on this, according to the target information obtained in step 120, the adverse reaction data analysis system is used to analyze the adverse reaction data under the target dimension of the target information, so as to obtain an analysis result.
As adverse reaction reports accumulate, dynamic and multi-dimensional analysis, in addition to the evaluation of a single report, has become a high-demand scenario. At present, realizing such analysis requires a great deal of time to manually collect and summarize reports and then perform unified data cleaning on the summary, which consumes considerable manpower and time. This need can be met rapidly by using the adverse reaction data analysis system to analyze the adverse reaction data in the target dimension of the target information. The adverse reaction data analysis system can thus solve the problems that existing adverse reaction data analysis modes cannot meet users' requirements for high-precision analysis of adverse drug reactions and have low analysis efficiency.
In some embodiments, the adverse reaction data analysis system includes at least one of a time analysis unit, a drug analysis unit, and an enterprise analysis unit, and the adverse reaction data analysis system analyzes the adverse reaction data in the target dimension of the target information to obtain an analysis result, that is, step 130 specifically includes:
step 131, determining a target dimension, wherein the target dimension comprises at least one dimension of a time dimension, a medicine dimension and an enterprise dimension;
step 132, extracting information to be analyzed associated with the target dimension from the target information;
and step 133, analyzing adverse reaction data under a target dimension of the information to be analyzed by at least one of a time analysis unit, a medicine analysis unit and an enterprise analysis unit to obtain an analysis result.
Specifically, the target dimension may be any single dimension of a time dimension, a medicine dimension, and an enterprise dimension; at least two dimensions may also be included, for example, the target dimension may include any two of a time dimension, a drug dimension, and an enterprise dimension; the three dimensions can be simultaneously included, and the dimensions can be flexibly selected according to scene requirements, and the embodiment of the invention is not particularly limited.
After the target dimension is determined, information to be analyzed associated with the target dimension may be extracted from the target information. For example, when the target dimension includes a time dimension and a medicine dimension, the information to be analyzed of any target medicine within any target time can be extracted from the target information; when the target dimension includes a time dimension, a medicine dimension and an enterprise dimension, the information to be analyzed of any target medicine under any target enterprise within any target time can be extracted from the target information.
In some embodiments, extracting the information to be analyzed associated with the target dimension from the target information, that is, step 132 specifically includes:
step 132-1, obtaining a target object, wherein the target object is described based on at least one of patient information, medicine information, enterprise information, indication information, reporting time and medical term information;
step 132-2, extracting information to be analyzed associated with both the target dimension and the target object from the target information.
Specifically, the information to be analyzed may be extracted not only based on the target dimension; information to be analyzed that is associated with both the target dimension and the target object may also be extracted. The target object herein may be described based on at least one of patient information, drug information, enterprise information, indication information, reporting time, and medical term information. The medical term information may include, for example, SOC (System Organ Class), HLT (High Level Term), PT (Preferred Term), SMQ, and the like.
For the extraction of the information to be analyzed associated with both the target dimension and the target object, the information to be analyzed associated with the target dimension may be extracted first, and then the information to be analyzed associated with the target object may be extracted; or firstly extracting the information to be analyzed associated with the target object, and then extracting the information to be analyzed associated with the target dimension; of course, the information to be analyzed associated with both the target dimension and the target object may also be extracted at the same time, which is not particularly limited in the embodiment of the present invention.
For example, in the case where the target dimension includes a time dimension, the first target information may be first extracted from the target information based on at least one of patient information, drug information, enterprise information, indication information, medical term information; and extracting the first target information in any time period from the first target information, thereby obtaining information to be analyzed associated with both the target dimension and the target object.
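The two-stage extraction in the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Report` record, its field names, the sample values and the helper names `filter_by_object` and `filter_by_period` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    # Hypothetical flattened record of the "target information".
    drug: str
    enterprise: str
    indication: str
    preferred_term: str
    report_date: date


def filter_by_object(reports, **criteria):
    """Keep reports whose fields equal every given criterion (target object)."""
    return [r for r in reports
            if all(getattr(r, key) == value for key, value in criteria.items())]


def filter_by_period(reports, start, end):
    """Keep reports whose reporting date lies in [start, end] (time dimension)."""
    return [r for r in reports if start <= r.report_date <= end]


# Made-up sample data for illustration only.
reports = [
    Report("Drug A", "Enterprise X", "Indication 1", "Nausea", date(2022, 3, 1)),
    Report("Drug A", "Enterprise X", "Indication 1", "Rash", date(2020, 5, 9)),
    Report("Drug B", "Enterprise Y", "Indication 2", "Nausea", date(2022, 8, 15)),
]

# Stage 1: first target information, selected by the target object.
first_target = filter_by_object(reports, drug="Drug A")
# Stage 2: restrict the first target information to the target time period.
to_analyze = filter_by_period(first_target, date(2021, 6, 7), date(2023, 4, 7))
```

Because both filters are plain predicates, applying them in the opposite order yields the same information to be analyzed, which is consistent with the statement that the extraction order is not particularly limited.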
After the information to be analyzed is obtained, the adverse reaction data analysis under the target dimension can be carried out on the information to be analyzed through at least one of a time analysis unit, a medicine analysis unit and an enterprise analysis unit, so that an analysis result can be obtained.
In some embodiments, performing adverse reaction data analysis in the target dimension on the information to be analyzed through at least one of the time analysis unit, the drug analysis unit and the enterprise analysis unit to obtain the analysis result, that is, step 133, specifically includes:
step 133-1, under the condition that the target dimension is the time dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target time in the information to be analyzed through a time analysis unit to obtain an analysis result;
step 133-2, under the condition that the target dimension is the medicine dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target medicine in the information to be analyzed by a medicine analysis unit to obtain an analysis result;
and step 133-3, under the condition that the target dimension is the enterprise dimension, analyzing, by the enterprise analysis unit, the target medicine associated with the target enterprise in the information to be analyzed and the type and/or occurrence frequency of the adverse reaction event associated with the target medicine to obtain an analysis result.
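The selection of an analysis unit by target dimension in steps 133-1 to 133-3 can be sketched as a simple dispatch table. The sketch below is only an illustration: the record layout, the dimension keys and the function names are assumptions, and each unit is reduced to counting event types.

```python
from collections import Counter


def time_unit(records, target_year):
    # Types and counts of events reported in the target time (here: a year).
    return Counter(r["event_type"] for r in records if r["year"] == target_year)


def drug_unit(records, target_drug):
    # Types and counts of events corresponding to the target drug.
    return Counter(r["event_type"] for r in records if r["drug"] == target_drug)


def enterprise_unit(records, target_enterprise):
    # Drugs associated with the target enterprise, with event types and counts.
    return Counter((r["drug"], r["event_type"])
                   for r in records if r["enterprise"] == target_enterprise)


# Hypothetical mapping from target dimension to analysis unit.
ANALYSIS_UNITS = {"time": time_unit, "drug": drug_unit, "enterprise": enterprise_unit}


def analyze(records, target_dimension, target):
    """Route the information to be analyzed to the unit for the target dimension."""
    if target_dimension not in ANALYSIS_UNITS:
        raise ValueError(f"unsupported target dimension: {target_dimension!r}")
    return ANALYSIS_UNITS[target_dimension](records, target)


# Made-up records for illustration only.
records = [
    {"year": 2022, "drug": "Drug A", "enterprise": "Enterprise X", "event_type": "death"},
    {"year": 2022, "drug": "Drug B", "enterprise": "Enterprise X", "event_type": "life threatening"},
    {"year": 2023, "drug": "Drug A", "enterprise": "Enterprise Y", "event_type": "death"},
]
```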
Specifically, the time analysis unit in the adverse reaction data analysis system may be used to perform adverse reaction data analysis in the time dimension on the information to be analyzed. Under the condition that the target dimension is the time dimension, the total number of SUSARs reported, the total number of medicines involved, and the types and/or numbers of occurrences of adverse reaction events within any time period up to the statistics cutoff can be counted in the time dimension; the data can be further restricted based on any one or a combination of patient information, medicine names, enterprise names, indication information and the like. The target time in the information to be analyzed may be input by the user through a client of the adverse reaction data analysis system.
For example, the types of adverse reaction events within the target time period from June 7, 2021 to April 7, 2023 can be counted, where the types may include death, life-threatening, hospitalization/prolonged hospitalization, permanent/significant loss of function, teratogenesis/birth defects, and other important medical events, together with the number of occurrences of adverse reaction events of each type; the annual number of occurrences of each type of adverse reaction event within the target time can also be counted.
Further, the statistical results can be analyzed, and the analysis results can be displayed visually using tables, time-stacked charts, pie charts and the like.
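The time-dimension statistics described above (counts per event type within the target period, and annual counts per type) can be sketched with the standard-library `collections.Counter`. The event tuples and the function name `analyze_time_dimension` are hypothetical, and the sample data are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical (report_date, event_type) pairs extracted from SUSAR reports.
events = [
    (date(2021, 7, 1), "death"),
    (date(2022, 1, 15), "life threatening"),
    (date(2022, 9, 3), "death"),
    (date(2023, 5, 1), "death"),  # falls outside the target period below
]


def analyze_time_dimension(events, start, end):
    """Count event types, overall and per year, within [start, end]."""
    in_period = [(d, t) for d, t in events if start <= d <= end]
    by_type = Counter(t for _, t in in_period)
    by_year_and_type = Counter((d.year, t) for d, t in in_period)
    return by_type, by_year_and_type


by_type, by_year_and_type = analyze_time_dimension(
    events, date(2021, 6, 7), date(2023, 4, 7)
)
```

The per-year counter is the kind of table that a time-stacked chart would be drawn from, while the per-type counter corresponds to a pie chart.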
The medicine analysis unit in the adverse reaction data analysis system can be used to perform adverse reaction data analysis in the medicine dimension on the information to be analyzed. Under the condition that the target dimension is the medicine dimension, the total number of SUSARs and the types and/or numbers of occurrences of adverse events reported for any target medicine can be obtained in the medicine dimension; the data can be further restricted based on any one or a combination of patient information, indication information, reporting time and the like.
For example, the type and number of adverse events reported by the target drug "fluxapyropali capsule" may be obtained.
In some embodiments, analyzing, by the drug analysis unit, the type and/or the occurrence number of the adverse reaction event corresponding to the target drug in the information to be analyzed to obtain the analysis result, that is, step 133-2, specifically includes:
based on the causal relationship between each target medicine and each adverse reaction event in the information to be analyzed, counting the types and/or occurrence times of the adverse reaction events corresponding to each target medicine from the information to be analyzed;
and comparing the types and/or occurrence times of the adverse reaction events corresponding to the target medicines through the medicine analysis unit, and determining an analysis result based on the comparison result.
Specifically, the causal relationship between each target drug and each adverse reaction event can be obtained based on the relationship between the drug-adverse event-evaluation source-evaluation result, so that the type and/or occurrence frequency of the adverse reaction event corresponding to each target drug can be counted from the information to be analyzed. For example, the type and/or the occurrence number of the adverse reaction event corresponding to the target medicine a, the type and/or the occurrence number of the adverse reaction event corresponding to the target medicine B, the type and/or the occurrence number of the adverse reaction event corresponding to the target medicine C, and the like may be obtained.
On this basis, the medicine analysis unit can compare the types and/or occurrence times of the adverse reaction events corresponding to the target medicines, and the analysis result is determined based on the comparison result.
In practical application, the medicine analysis unit in the adverse reaction data analysis system can provide an entry to medicine analysis. After entering the medicine analysis page, the user clicks the medicine comparison button to enter the medicine comparison page. By default, the medicine from which the page was entered is filled into the medicine search box automatically; the user then enters the medicines to be compared into the search box, with input suggestions provided. For example, five medicines can be compared.
Further, the comparison can be restricted to the SUSAR events of the medicines under specified indications, reporting times, genders and age ranges.
The comparison result thus obtained may include, for each medicine, the medicine name, the total number of SUSARs, the number of clinical trials, the number of events at each risk level, the PT types and their counts, and the SMQ types and their counts.
Then, the analysis result can be determined based on the comparison result, and the analysis result can be displayed visually using tables, time-stacked charts, pie charts and the like.
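The counting and comparison in the medicine dimension can be sketched as follows. The sketch assumes that each row of the drug, adverse event, evaluation source and evaluation result relationship describes one assessed event, and that an event is attributed to a medicine only when the evaluation result is in a set of "causal" values; that set, the row layout and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (drug, adverse event PT, evaluation source, evaluation result).
# Each row is assumed to describe one distinct assessed event.
rows = [
    ("Drug A", "Nausea", "investigator", "related"),
    ("Drug A", "Nausea", "sponsor", "related"),
    ("Drug A", "Rash", "investigator", "unrelated"),
    ("Drug B", "Nausea", "investigator", "related"),
    ("Drug B", "Headache", "sponsor", "possibly related"),
]

# Assumed set of evaluation results that count as a causal relationship.
CAUSAL_RESULTS = {"related", "possibly related"}


def count_events_per_drug(rows, causal_results=CAUSAL_RESULTS):
    """Count event types per drug, keeping only causally related rows."""
    counts = defaultdict(Counter)
    for drug, event, _source, result in rows:
        if result in causal_results:
            counts[drug][event] += 1
    return counts


def compare_drugs(counts, drugs):
    """Build an event-by-drug table of counts for side-by-side comparison."""
    events = sorted({event for drug in drugs for event in counts[drug]})
    return {event: {drug: counts[drug][event] for drug in drugs} for event in events}


comparison = compare_drugs(count_events_per_drug(rows), ["Drug A", "Drug B"])
```

The resulting nested mapping has one row per event type and one column per medicine, which is a convenient shape for the tabular display mentioned above.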
The enterprise analysis unit in the adverse reaction data analysis system can be used to perform adverse reaction data analysis in the enterprise dimension on the information to be analyzed. Under the condition that the target dimension is the enterprise dimension, the analysis result is determined based on the target medicine associated with the target enterprise in the information to be analyzed and the type and/or occurrence number of adverse reaction events associated with the target medicine.
For example, the total number of SUSARs reported by any target enterprise, the types and numbers of occurrences of the adverse events, and the names and number of the medicines corresponding to the adverse events can be obtained in the enterprise dimension; the data can be further restricted based on any one or a combination of patient information, medicine names, indication information, reporting time, SOC, HLT, PT and SMQ.
In practical application, the analysis result may include the proportion and number of SUSARs reported by the target enterprise, broken down by time, sex, age, country, drug, indication and adverse reaction; it may also include the adverse event (PT) types reported by the target enterprise for different drugs at different times.
For example, the left side of the page may provide search criteria, through which "time" (all times by default), "drug" (by default, the drug with the largest number of reports) and "PT type" can be searched, and the right side of the page displays the PT types (pie chart) and the number of reports over reporting time (line chart) corresponding to the target drug.
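The data behind the enterprise page described above can be sketched as follows, including the default choice of the medicine with the largest number of reports. The tuple layout, the function name `enterprise_view` and the sample values are assumptions; reporting time is simplified to a year.

```python
from collections import Counter

# Hypothetical reports: (enterprise, drug, PT, report year).
reports = [
    ("Enterprise X", "Drug A", "Nausea", 2022),
    ("Enterprise X", "Drug A", "Rash", 2022),
    ("Enterprise X", "Drug A", "Nausea", 2023),
    ("Enterprise X", "Drug B", "Headache", 2023),
    ("Enterprise Y", "Drug C", "Nausea", 2023),
]


def enterprise_view(reports, enterprise, drug=None):
    """Return PT counts and yearly report counts for one drug of an enterprise."""
    own = [r for r in reports if r[0] == enterprise]
    if not own:
        return None
    if drug is None:
        # Default: the drug with the largest number of reports.
        drug = Counter(r[1] for r in own).most_common(1)[0][0]
    selected = [r for r in own if r[1] == drug]
    return {
        "drug": drug,
        "pt_counts": Counter(r[2] for r in selected),        # pie chart data
        "reports_by_year": Counter(r[3] for r in selected),  # line chart data
    }


view = enterprise_view(reports, "Enterprise X")
```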
According to the adverse reaction data analysis method provided by the embodiment of the invention, the adverse reaction data related to the medicine is subjected to structural analysis through the adverse reaction data analysis system to obtain the target information of the adverse reaction data, so that the monitoring range based on data analysis is widened, and the effectiveness and timeliness of the data analysis on the adverse reaction of the medicine are improved.
In addition, the target information is matched against the MedDRA and SMQ dictionaries for standardization, so that a user who provides only adverse reaction data (for example, SUSAR report text) can conveniently and rapidly carry out multi-dimensional, systematic and visual combined analysis of the data; this meets the user's search and statistical needs regarding adverse reactions/events and solves the problem that existing adverse reaction data analysis is inefficient.
Based on the above embodiments, there is provided an adverse reaction data analysis method including:
S1, acquiring medicine information data.
S2, carrying out chart detection on the drug information data based on a target detection model, wherein the target detection model is obtained by training on a sample data set in which chart regions are pre-annotated so as to cover the title region; if chart position information in the medicine information data is detected, the information text is extracted from the chart of the medicine information data based on the chart position information.
S3, carrying out adverse reaction association classification on the information text, and screening adverse reaction data of information sources from the drug information data based on the adverse reaction association classification result;
and obtaining adverse reaction data from a clinical adverse reaction report source and/or a post-marketing adverse reaction database source.
S4, under the condition that the data type of the adverse reaction data comprises texts, extracting the entity from the texts in the adverse reaction data based on a pre-training language model to obtain adverse reaction entities; the target information is determined based on the adverse reaction entities and the relationship between the adverse reaction entities in the text.
In S4, based on the pre-training language model, entity extraction is performed on the text in the adverse reaction data to obtain an adverse reaction entity, which specifically includes:
S41, acquiring form information of a form in the adverse reaction data, and performing entity masking on the form information to obtain a complete blank filling template; determining a target text sequence based on the complete blank filling template and texts in the adverse reaction data; based on a pre-training language model, extracting entities from the target text sequence to obtain adverse reaction entities; the pre-training language model is obtained by pre-training the initial language model based on an alignment entity, a preset entity filling task and a complete filling task; the alignment entity is obtained by entity alignment of text and associated form information in the sample data.
S5, determining a target dimension, wherein the target dimension comprises at least one dimension of a time dimension, a medicine dimension and an enterprise dimension; acquiring a target object, wherein the target object is described based on at least one of patient information, medicine information, enterprise information, indication information, reporting time and medical term information;
Extracting information to be analyzed associated with both the target dimension and the target object from the target information;
under the condition that the target dimension is the time dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target time in the information to be analyzed through a time analysis unit in the adverse reaction data analysis system to obtain an analysis result;
under the condition that the target dimension is the medicine dimension, based on the causal relationship between each target medicine and each adverse reaction event in the information to be analyzed, counting the type and/or occurrence frequency of the adverse reaction event corresponding to each target medicine from the information to be analyzed; and comparing types and/or occurrence times of the adverse reaction events corresponding to the target medicines through a medicine analysis unit in the adverse reaction data analysis system, and determining an analysis result based on the comparison result.
And under the condition that the target dimension is the enterprise dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target medicine in the information to be analyzed through an enterprise analysis unit in the adverse reaction data analysis system to obtain an analysis result.
The adverse reaction data analysis device provided by the invention is described below, and the adverse reaction data analysis device described below and the adverse reaction data analysis method described above can be referred to correspondingly.
Fig. 2 is a schematic structural diagram of an adverse reaction data analysis device provided by the present invention, and as shown in fig. 2, the adverse reaction data analysis device provided by the embodiment of the present invention includes:
an acquisition unit 210 for acquiring adverse reaction data related to a drug;
an extracting unit 220, configured to extract information from the adverse reaction data based on a data type to which the adverse reaction data belongs, so as to obtain target information of the adverse reaction data;
and an analysis unit 230, configured to perform adverse reaction data analysis on the target information in the target dimension by using an adverse reaction data analysis system, so as to obtain an analysis result.
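The cooperation of the three units listed above can be sketched as a small pipeline object. This is only a structural illustration: the class name, the callable signatures and the stub units used in the usage example are assumptions, and real units would wrap the acquisition, extraction and analysis logic of the embodiments.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AdverseReactionAnalysisDevice:
    acquire: Callable[[], List[str]]            # role of acquisition unit 210
    extract: Callable[[List[str]], List[str]]   # role of extraction unit 220
    analyze: Callable[[List[str], str], Any]    # role of analysis unit 230

    def run(self, target_dimension: str) -> Any:
        data = self.acquire()                   # adverse reaction data
        target_info = self.extract(data)        # target information
        return self.analyze(target_info, target_dimension)


# Usage with stub units (made-up data, for illustration only).
device = AdverseReactionAnalysisDevice(
    acquire=lambda: ["report: Drug A caused nausea", "report: Drug B caused rash"],
    extract=lambda data: [text.split(": ", 1)[1] for text in data],
    analyze=lambda info, dimension: {"dimension": dimension, "count": len(info)},
)
result = device.run("drug")
```

Passing the units in as callables keeps the three responsibilities separable, which mirrors the statement that the units may or may not be physically separate.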
According to the adverse reaction data analysis device provided by the embodiment of the invention, adverse reaction data related to the medicine are acquired, which provides underlying data support covering the full life cycle of medicine research and development for the subsequent adverse reaction data analysis; and the adverse reaction data analysis system performs adverse reaction data analysis in the target dimension on the target information obtained by information extraction, so that multi-dimensional and systematic combined analysis of the adverse reaction data can be realized rapidly and the user's need for high-precision analysis of adverse drug reactions is met, thereby solving the problem that existing adverse reaction data analysis is inefficient.
Based on the above embodiment, the obtaining unit 210 is specifically configured to:
acquiring medicine information data, extracting information text in the medicine information data, performing adverse reaction association classification on the information text, and further screening adverse reaction data of information sources from the medicine information data based on the adverse reaction association classification result;
and obtaining adverse reaction data from a clinical adverse reaction report source and/or a post-marketing adverse reaction database source.
Based on the above embodiment, the obtaining unit 210 is further specifically configured to:
performing chart detection on the drug information data based on a target detection model, wherein the target detection model is obtained by training on a sample data set in which chart regions are pre-annotated so as to cover the title region;
if the chart position information in the medicine information data is detected, the information text is extracted from the chart of the medicine information data based on the chart position information.
Based on the above embodiment, the extracting unit 220 is specifically configured to:
under the condition that the data type of the adverse reaction data comprises texts, extracting the entity from the texts in the adverse reaction data based on a pre-training language model to obtain adverse reaction entities;
The target information is determined based on the adverse reaction entities and the relationship between the adverse reaction entities in the text.
Based on the above embodiment, the extracting unit 220 is further specifically configured to:
acquiring form information of a form in the adverse reaction data, and performing entity masking on the form information to obtain a complete blank filling template;
determining a target text sequence based on the complete blank filling template and the text in the adverse reaction data;
based on a pre-training language model, entity extraction is carried out on the target text sequence, and the adverse reaction entity is obtained;
the pre-training language model is obtained by pre-training the initial language model based on an alignment entity, a preset entity filling task and a complete filling task; the alignment entity is obtained by performing entity alignment on text and associated form information in the sample data.
Based on the above embodiment, the extracting unit 220 is further specifically configured to:
acquiring layout information corresponding to the table information;
determining entity relations among cell entities in the table information according to the layout information; wherein the entity relationship comprises at least one of: the same column entity relationship, the same row entity relationship and the same row column entity relationship;
Filling the cell entities into task templates preset corresponding to the entity relationship to obtain template texts;
and in the template text, carrying out mask processing on the target entity in the cell entity to obtain the complete blank filling template.
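The four steps above (layout, entity relationships, template filling, masking) can be sketched as follows for a rectangular table. The mask token, the wording of the task templates and the function names are assumptions introduced for illustration; only the same-row and same-column relationships are shown, and the combined row-and-column relationship is omitted for brevity.

```python
MASK = "[MASK]"  # assumed mask token; the source does not name one

# Hypothetical task templates, one per entity relationship.
TASK_TEMPLATES = {
    "same_row": "{anchor} and {target} are in the same row.",
    "same_column": "{anchor} and {target} are in the same column.",
}


def entity_relations(table):
    """Derive same-row and same-column cell pairs from the table layout."""
    relations = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for c2 in range(c + 1, len(row)):        # cells to the right
                relations.append(("same_row", cell, row[c2]))
            for r2 in range(r + 1, len(table)):      # cells below
                relations.append(("same_column", cell, table[r2][c]))
    return relations


def build_cloze_templates(table):
    """Fill each relation into its task template, then mask the target entity."""
    results = []
    for relation, anchor, target in entity_relations(table):
        template = TASK_TEMPLATES[relation]
        template_text = template.format(anchor=anchor, target=target)
        masked_text = template.format(anchor=anchor, target=MASK)
        results.append((template_text, masked_text, target))
    return results


# Made-up table for illustration only.
table = [
    ["Drug", "Adverse event"],
    ["Drug A", "Nausea"],
]
cloze = build_cloze_templates(table)
```

Each tuple holds the filled template text, the masked version that a language model would be asked to complete, and the masked entity that serves as the expected answer.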
Based on the above embodiment, the adverse reaction data analysis system includes at least one of a time analysis unit, a drug analysis unit, and an enterprise analysis unit, and the analysis unit 230 is specifically configured to:
determining a target dimension, wherein the target dimension comprises at least one dimension of a time dimension, a medicine dimension and an enterprise dimension;
extracting information to be analyzed associated with the target dimension from the target information;
and carrying out adverse reaction data analysis on the information to be analyzed under the target dimension through at least one of the time analysis unit, the medicine analysis unit and the enterprise analysis unit to obtain an analysis result.
Based on the above embodiment, the analysis unit 230 is further specifically configured to:
acquiring a target object, wherein the target object is described based on at least one of patient information, medicine information, enterprise information, indication information, reporting time and medical term information;
And extracting information to be analyzed associated with both the target dimension and the target object from the target information.
Based on the above embodiment, the analysis unit 230 is further specifically configured to:
under the condition that the target dimension is the time dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target time in the information to be analyzed through the time analysis unit to obtain an analysis result;
under the condition that the target dimension is the medicine dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target medicine in the information to be analyzed through the medicine analysis unit to obtain an analysis result;
and under the condition that the target dimension is the enterprise dimension, analyzing a target medicine associated with a target enterprise in the information to be analyzed and the type and/or occurrence frequency of adverse reaction events associated with the target medicine by the enterprise analysis unit to obtain an analysis result.
Based on the above embodiment, the analysis unit 230 is further specifically configured to:
based on the causal relationship between each target medicine and each adverse reaction event in the information to be analyzed, counting the types and/or occurrence times of the adverse reaction events corresponding to each target medicine from the information to be analyzed;
And comparing the types and/or occurrence times of the adverse reaction events corresponding to the target medicines through the medicine analysis unit, and determining the analysis result based on the comparison result.
Fig. 3 illustrates a physical schematic diagram of an electronic device, as shown in fig. 3, where the electronic device may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a data analysis method comprising: acquiring adverse reaction data related to a drug; based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data; and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the adverse reaction data analysis method provided by the above methods, the method comprising: acquiring adverse reaction data related to a drug; based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data; and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the adverse reaction data analysis method provided by the above methods, the method comprising: acquiring adverse reaction data related to a drug; based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data; and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for analyzing adverse reaction data, comprising:
acquiring adverse reaction data related to a drug;
based on the data type of the adverse reaction data, extracting information from the adverse reaction data to obtain target information of the adverse reaction data;
and carrying out adverse reaction data analysis on the target information under the target dimension by using an adverse reaction data analysis system to obtain an analysis result.
2. The adverse reaction data analysis method according to claim 1, wherein the adverse reaction data analysis system includes at least one of a time analysis unit, a medicine analysis unit, and an enterprise analysis unit;
The step of analyzing the adverse reaction data under the target dimension by the adverse reaction data analysis system to obtain an analysis result comprises the following steps:
determining a target dimension, wherein the target dimension comprises at least one dimension of a time dimension, a medicine dimension and an enterprise dimension;
extracting information to be analyzed associated with the target dimension from the target information;
and carrying out adverse reaction data analysis on the information to be analyzed under the target dimension through at least one of the time analysis unit, the medicine analysis unit and the enterprise analysis unit to obtain an analysis result.
3. The method according to claim 2, wherein the extracting information to be analyzed associated with the target dimension from the target information includes:
acquiring a target object, wherein the target object is described based on at least one of patient information, medicine information, enterprise information, indication information, reporting time and medical term information;
and extracting information to be analyzed associated with both the target dimension and the target object from the target information.
4. The method for analyzing adverse reaction data according to claim 2, wherein the analyzing the adverse reaction data in the target dimension of the information to be analyzed by at least one of the time analysis unit, the medicine analysis unit, and the enterprise analysis unit to obtain an analysis result includes:
Under the condition that the target dimension is the time dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target time in the information to be analyzed through the time analysis unit to obtain an analysis result;
under the condition that the target dimension is the medicine dimension, analyzing the type and/or occurrence frequency of the adverse reaction event corresponding to the target medicine in the information to be analyzed through the medicine analysis unit to obtain an analysis result;
and under the condition that the target dimension is the enterprise dimension, analyzing a target medicine associated with a target enterprise in the information to be analyzed and the type and/or occurrence frequency of adverse reaction events associated with the target medicine by the enterprise analysis unit to obtain an analysis result.
5. The method for analyzing adverse reaction data according to claim 4, wherein analyzing, by the drug analysis unit, the type and/or occurrence number of adverse reaction events corresponding to the target drug in the information to be analyzed, to obtain the analysis result, includes:
based on the causal relationship between each target medicine and each adverse reaction event in the information to be analyzed, counting the types and/or occurrence times of the adverse reaction events corresponding to each target medicine from the information to be analyzed;
And comparing the types and/or occurrence times of the adverse reaction events corresponding to the target medicines through the medicine analysis unit, and determining the analysis result based on the comparison result.
6. The method of any one of claims 1 to 5, wherein the acquiring adverse reaction data related to a drug comprises:
acquiring medicine information data, extracting information text in the medicine information data, performing adverse reaction association classification on the information text, and further screening adverse reaction data of information sources from the medicine information data based on the adverse reaction association classification result; and,
obtaining adverse reaction data from a clinical adverse reaction report source and/or a post-marketing adverse reaction database source.
7. The method for analyzing adverse reaction data according to any one of claims 1 to 5, wherein the extracting information of the adverse reaction data based on the data type to which the adverse reaction data belongs to obtain target information of the adverse reaction data includes:
under the condition that the data type of the adverse reaction data comprises texts, extracting the entity from the texts in the adverse reaction data based on a pre-training language model to obtain adverse reaction entities;
The target information is determined based on the adverse reaction entities and the relationship between the adverse reaction entities in the text.
8. An adverse reaction data analysis device, comprising:
an acquisition unit for acquiring adverse reaction data related to the drug;
an extraction unit for extracting information from the adverse reaction data based on the data type to which the adverse reaction data belongs, to obtain target information of the adverse reaction data;
and an analysis unit for performing adverse reaction data analysis on the target information in the target dimension through an adverse reaction data analysis system to obtain an analysis result.
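The three-unit structure of the device in claim 8 could be wired together as in the following Python sketch. The class name, the callable signatures, and the toy lambdas are assumptions chosen for illustration; the claim only requires an acquisition unit, an extraction unit, and an analysis unit operating in that order.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class AdverseReactionAnalyzer:
    """Hypothetical container for the three units named in the claim."""
    acquire: Callable[[], List[Any]]                # acquisition unit
    extract: Callable[[List[Any]], List[Any]]       # extraction unit
    analyze: Callable[[List[Any], str], Any]        # analysis unit

    def run(self, dimension: str) -> Any:
        data = self.acquire()
        target_info = self.extract(data)
        return self.analyze(target_info, dimension)

# Toy units: each record is "drug: event"; the analysis lists the
# distinct values along the requested dimension.
device = AdverseReactionAnalyzer(
    acquire=lambda: ["drug A: rash", "drug B: rash", "drug A: fever"],
    extract=lambda data: [tuple(s.split(": ")) for s in data],
    analyze=lambda info, dim: sorted(
        {pair[0 if dim == "drug" else 1] for pair in info}
    ),
)
result = device.run("event")
```

Passing the units in as callables keeps each one replaceable, which matches the claim's separation of acquisition, extraction, and analysis.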
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the adverse reaction data analysis method of any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the adverse reaction data analysis method according to any one of claims 1 to 7.
CN202310685543.0A 2023-06-09 2023-06-09 Adverse reaction data analysis method, device, electronic equipment and storage medium Pending CN116913548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310685543.0A CN116913548A (en) 2023-06-09 2023-06-09 Adverse reaction data analysis method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310685543.0A CN116913548A (en) 2023-06-09 2023-06-09 Adverse reaction data analysis method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116913548A (en) 2023-10-20

Family

ID=88357213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310685543.0A Pending CN116913548A (en) 2023-06-09 2023-06-09 Adverse reaction data analysis method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116913548A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649909A (en) * 2024-01-29 2024-03-05 吉林省乾宇升科技有限公司 Optimized matching method for biomedical clinical test
CN117649909B (en) * 2024-01-29 2024-04-19 吉林省乾宇升科技有限公司 Optimized matching method for biomedical clinical test

Similar Documents

Publication Publication Date Title
US10818397B2 (en) Clinical content analytics engine
US10878962B2 (en) System and method for extracting oncological information of prognostic significance from natural language
US20220044812A1 (en) Automated generation of structured patient data record
Meystre et al. Automatic trial eligibility surveillance based on unstructured clinical data
Matci et al. Address standardization using the natural language process for improving geocoding results
Liu et al. A knowledge base of clinical trial eligibility criteria
Pereira et al. ICD9-based text mining approach to children epilepsy classification
Negi et al. A novel method for drug-adverse event extraction using machine learning
CN112541066A (en) Text-structured-based medical and technical report detection method and related equipment
CN116913548A (en) Adverse reaction data analysis method, device, electronic equipment and storage medium
CN110532367A (en) A kind of information cuing method and system
CN115394393A (en) Intelligent diagnosis and treatment data processing method and device, electronic equipment and storage medium
CN116913549A (en) Adverse reaction event early warning method, device, system and electronic equipment
Rijo et al. Decision Support System to Diagnosis and Classification of Epilepsy in Children.
CN112561714B (en) Nuclear protection risk prediction method and device based on NLP technology and related equipment
US8756234B1 (en) Information theory entropy reduction program
Lavanya et al. Auto capture on drug text detection in social media through NLP from the heterogeneous data
CN114548100A (en) Clinical scientific research auxiliary method and system based on big data technology
Wenger et al. Automated Extraction of Sentencing Decisions from Court Cases in the Hebrew Language
Usip et al. PeNLP parser: an extraction and visualization tool for precise maternal, neonatal and child healthcare geo-locations from unstructured data
Ficheur et al. Interoperability of medical databases: construction of mapping between hospitals laboratory results assisted by automated comparison of their distributions
Shi et al. Constructing a finer-grained representation of clinical trial results from ClinicalTrials.gov
CN116992839B (en) Automatic generation method, device and equipment for medical records front page
EP3738054A1 (en) A system and method for extracting oncological information of prognostic significance from natural language
Osebe et al. H4H: A Comprehensive Repository of Housing Resources for Homelessness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination