CN116913549A - Adverse reaction event early warning method, device, system and electronic equipment - Google Patents


Info

Publication number
CN116913549A
Authority
CN
China
Prior art keywords
entity
information
early warning
event
adverse reaction
Legal status
Pending
Application number
CN202310685558.7A
Other languages
Chinese (zh)
Inventor
周立运
Name withheld upon request
Current Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Original Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Application filed by Rubik's Cube Medical Technology Suzhou Co ltd
Priority to CN202310685558.7A
Publication of CN116913549A

Classifications

    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G06F16/35 Clustering; Classification
    • G06F16/367 Ontology
    • G06F40/279 Recognition of textual entities
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Toxicology (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biomedical Technology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of medicine technology and provides an adverse reaction event early warning method, device, system and electronic equipment. The adverse reaction event early warning method comprises the following steps: acquiring a target text set, wherein the target text set comprises a plurality of pieces of text information related to adverse reaction events; performing entity extraction on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and the relations among the entities; and performing intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result. With the method and device, highly accurate early warning results for adverse drug reaction events can be obtained.

Description

Adverse reaction event early warning method, device, system and electronic equipment
Technical Field
The invention relates to the field of medicine technology, and in particular to an adverse reaction event early warning method, device, system and electronic equipment.
Background
At present, monitoring drug safety, identifying safety signals, evaluating and controlling safety risks, and establishing pharmacovigilance that covers the whole life cycle of a drug are among the important tasks of regulatory science.
However, acquiring and processing adverse drug reaction data requires substantial manpower and material resources, and the data sources are scattered and of uneven quality. This makes early warning harder and comprehensive, accurate monitoring and early warning more difficult to achieve; even when monitoring is performed, it often cannot provide results that guide action.
Therefore, the development and application of existing adverse reaction early warning technology for drugs still face many challenges, and high-accuracy early warning results for adverse drug reaction events are difficult to obtain.
Disclosure of Invention
The invention provides an adverse reaction event early warning method, device, system and electronic equipment, which address the shortcomings of the prior art, namely that acquiring and processing adverse drug reaction data requires substantial investment of manpower and material resources, and that high-accuracy early warning results for adverse drug reaction events are difficult to obtain.
The invention provides an adverse reaction event early warning method, comprising the following steps:
acquiring a target text set, wherein the target text set comprises a plurality of pieces of text information related to adverse reaction events;
performing entity extraction on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and the relations among the entities; and
performing intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result.
According to the adverse reaction event early warning method provided by the invention, performing intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result comprises the following steps:
determining a first early warning result based on at least one of a drug entity, an adverse reaction event entity and an age entity in the event entity information;
determining a second early warning result based on a danger-level entity and/or a causal relationship entity in the event entity information; and
performing intelligent analysis and early warning of the attention level of the adverse reaction events based on the first early warning result and/or the second early warning result to obtain the early warning result.
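The patent does not specify how the first and second early warning results are computed or combined. As a minimal sketch, assuming a hypothetical three-level attention scale, a hypothetical watch list of drug/event pairs, and illustrative rules (a vulnerable age group raises the first result; a serious event with "certain" or "probable" causality raises the second; the final result takes the higher of the two), the logic could look like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical attention scale; the source does not define one.
LEVELS = ["low", "medium", "high"]

# Hypothetical watch list of (drug, adverse event) pairs of interest.
WATCH_LIST = {("drug_x", "hepatotoxicity")}


@dataclass
class EventEntities:
    drug: Optional[str] = None
    adverse_event: Optional[str] = None
    age: Optional[int] = None
    severity: Optional[str] = None   # danger-level entity
    causality: Optional[str] = None  # causal relationship entity


def first_warning(e: EventEntities) -> int:
    """Illustrative rule using the drug, adverse event and age entities."""
    level = 0
    if (e.drug, e.adverse_event) in WATCH_LIST:
        level = 1
        # Assumption: minors and the elderly warrant more attention.
        if e.age is not None and (e.age < 18 or e.age >= 65):
            level = 2
    return level


def second_warning(e: EventEntities) -> int:
    """Illustrative rule using the danger-level and causality entities."""
    serious = e.severity == "serious"
    related = e.causality in {"certain", "probable"}
    return 2 if serious and related else 1 if serious or related else 0


def combined_warning(e: EventEntities) -> str:
    # Assumed combination rule: keep the higher of the two levels.
    return LEVELS[max(first_warning(e), second_warning(e))]
```

Taking the maximum is only one possible design; a weighted score would let the two results trade off against each other instead.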
According to the adverse reaction event early warning method provided by the invention, determining the first early warning result based on at least one of the drug entity, the adverse reaction event entity and the age entity in the event entity information comprises the following steps:
tagging the drug entity, the adverse reaction event entity and the age entity in the event entity information, respectively, to obtain a drug entity tag, an adverse reaction event entity tag and an age entity tag; and
determining the first early warning result based on at least one of the drug entity tag, the adverse reaction event entity tag and the age entity tag.
According to the adverse reaction event early warning method provided by the invention, the causal relationship entity is determined by the following steps:
matching related word segments in the text information with preset causal relationship entities based on the semantic similarity between them, and determining the causal relationship entity from the matching result; and/or
extracting the causal relationship entity from the text information with a pre-trained entity extraction model.
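The similarity-matching branch can be sketched as follows. The patent names neither the similarity measure nor the preset causal relationship entities: the list below is hypothetical, character-bigram cosine similarity is used only as a dependency-free stand-in for a true semantic (e.g. embedding-based) similarity, and the 0.6 threshold is an assumption.

```python
from collections import Counter
from math import sqrt
from typing import Optional

# Hypothetical preset causal relationship entities.
PRESET_CAUSALITY = ["certainly related", "probably related", "possibly related",
                    "unlikely related", "unrelated"]


def bigrams(text: str) -> Counter:
    """Character bigrams of the lower-cased text with spaces removed."""
    t = text.lower().replace(" ", "")
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_causality(segment: str, threshold: float = 0.6) -> Optional[str]:
    """Return the most similar preset entity, or None if nothing is close enough."""
    scores = {p: cosine(bigrams(segment), bigrams(p)) for p in PRESET_CAUSALITY}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Swapping `cosine(bigrams(...), ...)` for a cosine over sentence embeddings would keep the same matching structure while capturing synonyms that share no characters.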
According to the adverse reaction event early warning method provided by the invention, acquiring the target text set comprises the following steps:
acquiring information documents related to drugs;
extracting candidate text information from the information documents, and classifying the candidate text information to screen out target text information related to adverse reaction events; and
determining the target text set based on at least two of the following: the target text information, text information in safety reports of clinical trial drugs, and text information in safety monitoring data of marketed drugs.
According to the adverse reaction event early warning method provided by the invention, performing entity extraction on the text information in the target text set to obtain the event entity information comprises the following steps:
acquiring form information in the text information, and masking the entities in the form information to obtain a cloze (fill-in-the-blank) template;
determining a target text sequence based on the text information and the cloze template; and
inputting the target text sequence into a pre-trained language model, which outputs the event entity information;
wherein the pre-trained language model is obtained by pre-training an initial language model based on aligned entities, a preset entity-filling task and a cloze task, and the aligned entities are obtained by aligning entities between the text information and the associated form information in sample documents.
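A minimal sketch of the cloze template construction, assuming the form information arrives as field/value pairs. The `[MASK]` token follows the BERT convention, and the sentence-pair layout of the target sequence is an assumption; the patent specifies neither.

```python
from typing import Dict, Tuple

MASK = "[MASK]"  # BERT-style mask token (assumed)


def build_cloze_template(form: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Mask every entity value in the form while keeping the field names.

    Returns the template and the masked values, which could serve as
    targets for a cloze-style training task.
    """
    parts, answers = [], {}
    for field, value in form.items():
        parts.append(f"{field}: {MASK}")
        answers[field] = value
    return "; ".join(parts), answers


def build_target_sequence(text: str, template: str) -> str:
    # Assumed layout: free text and template joined as a sentence pair.
    return f"[CLS] {text} [SEP] {template} [SEP]"
```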
According to the adverse reaction event early warning method provided by the invention, after entity extraction is performed on the text information in the target text set to obtain the event entity information, the method further comprises:
normalizing the entities in the event entity information based on at least one of the preset dictionaries, the Common Terminology Criteria for Adverse Events and preset keywords;
wherein the normalized entities are used for the intelligent analysis and early warning of the attention level of the adverse reaction events to obtain the early warning result.
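Entity normalization against a dictionary with a keyword fallback might look like the sketch below. The synonym dictionary and keyword rules are hypothetical examples; a real system would draw on curated dictionaries and standard terminologies rather than these few entries.

```python
from typing import Dict, Optional

# Hypothetical synonym dictionary: surface form -> preferred term.
SYNONYMS: Dict[str, str] = {
    "liver injury": "hepatotoxicity",
    "hepatic injury": "hepatotoxicity",
    "skin rash": "rash",
}

# Hypothetical keyword rules applied when no dictionary entry matches.
KEYWORDS: Dict[str, str] = {"liver": "hepatotoxicity", "rash": "rash"}


def normalize_entity(mention: str) -> Optional[str]:
    key = " ".join(mention.lower().split())  # case-fold and collapse whitespace
    if key in SYNONYMS:                      # 1) exact dictionary hit
        return SYNONYMS[key]
    if key in SYNONYMS.values():             # 2) already a preferred term
        return key
    for kw, term in KEYWORDS.items():        # 3) keyword fallback
        if kw in key:
            return term
    return None                              # unresolved: leave for review
```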
The invention also provides an adverse reaction event early warning device, comprising:
a text set acquisition unit, configured to acquire a target text set, wherein the target text set comprises a plurality of pieces of text information related to adverse reaction events;
an entity extraction unit, configured to perform entity extraction on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and the relations among the entities; and
an event early warning unit, configured to perform intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result.
The invention also provides an adverse reaction event early warning system, comprising a client and a server, wherein:
the client is configured to receive keywords to be searched and send them to the server, the keywords to be searched comprising a target entity and/or a target early warning result; and
the server is configured to receive the keywords to be searched from the client and to determine adverse reaction event information related to those keywords based on the event entity information and early warning results of adverse reaction events, the event entity information and early warning results being determined by the adverse reaction event early warning method described above.
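On the server side, keyword search over stored results could be as simple as the filter below, assuming each stored record carries its normalized entities and its early warning result. `EventRecord` and `search` are hypothetical names used for illustration; the patent does not describe the storage or query mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EventRecord:
    entities: List[str]  # normalized entities of one adverse reaction event
    warning: str         # early warning result, e.g. "high"
    text: str = ""       # source text (optional)


def search(records: List[EventRecord],
           target_entity: Optional[str] = None,
           target_warning: Optional[str] = None) -> List[EventRecord]:
    """Return records matching the target entity and/or target warning result."""
    if target_entity is None and target_warning is None:
        return []  # nothing to search for
    hits = []
    for r in records:
        if target_entity is not None and target_entity not in r.entities:
            continue
        if target_warning is not None and r.warning != target_warning:
            continue
        hits.append(r)
    return hits
```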
The invention also provides electronic equipment, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the adverse reaction event early warning methods described above.
The invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the adverse reaction event early warning methods described above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the adverse reaction event early warning methods described above.
With the adverse reaction event early warning method, device, system, electronic equipment, non-transitory computer-readable storage medium and computer program product provided by the invention, event entity information is obtained by performing entity extraction on a plurality of pieces of text information related to adverse reaction events, and intelligent analysis and early warning of the attention level of the adverse reaction events is performed based on that information to obtain an early warning result. Comprehensive, high-quality adverse reaction data can thus be acquired and processed, comprehensive and accurate monitoring and early warning can be achieved, and the accuracy and reliability of safety early warning across the whole life cycle of a drug are significantly improved.
Drawings
To describe the technical solutions of the invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of an adverse reaction event early warning method provided by the invention;
FIG. 2 is a schematic structural diagram of an adverse reaction event early warning device provided by the invention;
FIG. 3 is a schematic structural diagram of an adverse reaction event early warning system provided by the invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the invention.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
With the reform of China's national drug review and approval system, the number of new drug clinical trials in China has increased greatly. The growing number of clinical trials brings new opportunities to patients but also considerable safety risks. Therefore, establishing pharmacovigilance during clinical trials, strengthening the quality management of expedited safety data reporting during clinical trials, monitoring those expedited reports, identifying safety signals, evaluating and controlling safety risks, and establishing pharmacovigilance covering the whole life cycle of a drug are among the important tasks of regulatory science.
However, existing adverse reaction event monitoring systems mainly focus on monitoring adverse reaction signals of drugs after marketing, while pharmacovigilance during pre-marketing clinical trials still merely satisfies the reporting requirements of the regulatory authorities and needs further optimization and improvement. For example, the China Hospital Pharmacovigilance System (CHPS) is an active monitoring system established by the national adverse drug reaction monitoring center for post-marketing adverse drug reaction monitoring. It builds a partner cooperation mechanism through the national adverse drug reaction monitoring sentinel alliance (CASSA), uses CHPS to open data channels between data sources, and verifies drug safety issues by building risk analysis models. However, CHPS currently monitors only adverse events of marketed drugs, and its data sources are concentrated on a subset of tertiary hospitals within CASSA that have strong research capability and a high degree of informatization, so the monitored data sources are limited.
As another example, adverse reaction collection systems used during clinical trials (such as Taimei Medical) follow the requirements of the new edition of Good Clinical Practice (GCP) and implement full-chain management of safety information during clinical trials as follows: when investigators discover a serious adverse event (SAE) during a clinical trial, they report it immediately to the sponsor; the sponsor or a contract research organization (CRO) then reports events assessed as suspected unexpected serious adverse reactions (SUSARs) to all investigators/clinical trial institutions and ethics committees participating in the trial, as well as to the drug regulatory and health authorities.
It can be seen that, although the problem of unified transmission between different data sources is currently addressed by the Common Data Model (CDM), no research-project-level CDM has yet been established, so complex, large-scale safety signal early warning cannot be achieved.
In addition, systems on the market that serve medical institutions include, but are not limited to, clinical decision support systems, medical safety early warning information systems, stroke early warning systems and clinical trial management systems. However, these solutions can generally only issue early warning prompts: they cannot warn of risks or events arising in diagnosis, treatment and trial processes, nor can they provide guidance on what action to take.
In summary, acquiring and processing existing adverse drug reaction data requires substantial manpower and material resources, and the data sources are scattered and of uneven quality. This makes early warning harder and comprehensive, accurate monitoring and early warning more difficult to achieve; even when monitoring is performed, it cannot provide results that guide action.
Accordingly, to obtain high-accuracy early warning results for adverse drug reaction events that include recommendations for guiding action, the inventive concept of the invention is as follows: acquire a plurality of pieces of text information related to adverse reaction events, perform entity extraction on the text information to obtain event entity information, and perform intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result. From a regulatory perspective, this enables monitoring that looks longitudinally at the whole picture of a trial and transversely at the differences among clinical trials; from a diagnosis and treatment perspective, it provides useful overall monitoring and early warning, yielding high-accuracy early warning results for adverse drug reaction events.
Based on this inventive concept, the invention provides an adverse reaction event early warning method, device, system, electronic device, non-transitory computer-readable storage medium and computer program product, which are applied to adverse reaction event early warning scenarios for drugs in the field of medicine technology to improve the accuracy and reliability of safety early warning across the whole life cycle of a drug.
The technical solutions of the invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a schematic flow chart of the adverse reaction event early warning method provided by the invention. Each step may be executed by an adverse reaction event early warning device, which may be implemented in software and/or hardware and integrated in an electronic device. The electronic device may be a terminal device (such as a smartphone or personal computer), a server (such as a local server, a cloud server or a server cluster), a processor, a chip, and so on. As shown in FIG. 1, the adverse reaction event early warning method may include the following steps:
step 110, acquiring a target text set, wherein the target text set comprises a plurality of pieces of text information related to adverse reaction events;
step 120, performing entity extraction on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and the relations among the entities;
step 130, performing intelligent analysis and early warning of the attention level of the adverse reaction events based on the event entity information to obtain an early warning result.
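Steps 110 to 130 form a three-stage pipeline. The sketch below wires the stages together as pluggable functions; `run_pipeline` is a hypothetical name, and the stage implementations a caller passes in stand for the acquisition, extraction and warning components described in this document, which are not reproduced here.

```python
from typing import Callable, Dict, Iterable, List

Entities = Dict[str, object]


def run_pipeline(acquire: Callable[[], Iterable[str]],
                 extract: Callable[[str], Entities],
                 warn: Callable[[Entities], str]) -> List[Dict[str, object]]:
    """Run acquisition, entity extraction and early warning for each text."""
    results = []
    for text in acquire():            # step 110: target text set
        entities = extract(text)      # step 120: entities and relations
        results.append({
            "text": text,
            "entities": entities,
            "warning": warn(entities),  # step 130: early warning result
        })
    return results
```

For instance, passing toy stages that flag any text mentioning a watched drug yields one result dict per input text, each holding the text, its entities and its warning level.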
Specifically, the target text set may be a set containing a plurality of pieces of text information, each of which is related to an adverse reaction event. Adverse reaction events here may include adverse drug reactions (ADRs) and adverse drug events (ADEs). An adverse drug reaction is a harmful reaction, unrelated to the purpose of medication or unexpected, that occurs when a qualified drug is used at a normal dosage; an adverse drug event is any untoward medical occurrence during drug therapy that does not necessarily have a causal relationship with the therapy.
The target text set may be acquired from an external device, via blockchain, by polling, by request, or directly from pre-stored data. The text information in the target text set may be entered directly by a user, obtained by transcribing captured audio, obtained by capturing an image with an image acquisition device such as a scanner, mobile phone or camera and applying OCR (Optical Character Recognition) to it, or crawled from the Internet.
Because current monitoring covers only adverse events of marketed drugs, early warning data sources are limited, which lowers the accuracy of drug early warning. In the embodiments of the invention, therefore, the text information comes from at least two types of data sources: text information related to adverse reaction events is acquired from multiple types of data sources, and adverse reaction event early warning is then performed on it, which can significantly improve the accuracy and reliability of safety early warning across the whole life cycle of a drug.
In some embodiments, acquiring the target text set, i.e. step 110, specifically includes:
step 111, acquiring information documents related to drugs;
step 112, extracting candidate text information from the information documents, and classifying the candidate text information to screen out target text information related to adverse reaction events;
step 113, determining the target text set based on at least two of the target text information, the text information in safety reports of clinical trial drugs, and the text information in safety monitoring data of marketed drugs.
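Step 113 requires text from at least two source types. A minimal sketch that enforces this and merges the texts; the function name and the source names in the test are hypothetical, and the exact-duplicate removal is an added assumption not stated in the patent.

```python
from typing import Dict, List


def build_target_text_set(sources: Dict[str, List[str]]) -> List[str]:
    """Merge texts from at least two non-empty source types, dropping exact duplicates."""
    non_empty = {name: texts for name, texts in sources.items() if texts}
    if len(non_empty) < 2:
        raise ValueError("at least two source types are required")
    seen, merged = set(), []
    for texts in non_empty.values():
        for t in texts:
            if t not in seen:  # keep first occurrence, preserve order
                seen.add(t)
                merged.append(t)
    return merged
```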
Specifically, an information document related to a drug may relate to any stage of the drug's whole life cycle, for example the clinical trial stage or the post-marketing stage. It may be a notice or news item from a pharmaceutical company or a regulator, or a paper or conference material related to medicine, pharmacy, biology, health or nursing. In addition, the target text set may draw on text information in safety reports of clinical trial drugs, text information in safety monitoring data of marketed drugs, patient medical records, and so on.
For example, crawler technology can be used to crawl data sources such as pharmaceutical company websites, WeChat official accounts and news websites, with automatic detection and early warning, so that information documents related to drugs are monitored in real time and the availability and information quality of the data sources are ensured. Further, a crawler program can be written in Python and use multithreading to crawl the target websites and obtain information documents related to drugs.
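The multithreaded crawling described above could be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor` and `urllib.request.urlopen`. The `crawl` function and its injectable `fetch` parameter are illustrative assumptions; the real crawler's URL lists, parsing and scheduling are not described in the source. Failed fetches are simply skipped here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as UTF-8 (undecodable bytes are replaced)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls: List[str],
          fetch: Callable[[str], str] = fetch_url,
          max_workers: int = 8) -> Dict[str, str]:
    """Fetch all URLs concurrently; return {url: page} for successful fetches."""

    def safe_fetch(url: str) -> Optional[str]:
        try:
            return fetch(url)
        except Exception:  # network or HTTP errors: skip this URL
            return None

    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`.
        for url, page in zip(urls, pool.map(safe_fetch, urls)):
            if page is not None:
                results[url] = page
    return results
```

Threads suit this workload because crawling is I/O-bound, so the GIL does not limit throughput while threads wait on the network.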
Since the acquired information documents may contain multimodal information such as text and charts (images and tables), the candidate text information in the information documents may be extracted first so that information from more data sources can be captured. Here, the candidate text information may include the plain text of the information documents and text obtained from the charts.
In some embodiments, for charts in the information documents, a method of extracting candidate text information is provided; that is, extracting candidate text information from the information documents in step 112 specifically includes:
step 112-1, performing chart detection on the information documents with a trained target detection model to obtain the position information of each chart, wherein the target detection model is trained on a sample document set in which chart regions are pre-annotated so as to cover the associated title regions;
step 112-2, extracting candidate text information from each chart based on its position information.
When an information document includes a chart, the document may first be input into the trained target detection model, which performs chart detection on the document and outputs the position information of the chart in the document.
The target detection model can be trained as follows:
a sample document set is obtained in which each sample document is pre-annotated with chart regions and title regions, and each pre-annotated chart region covers the title region associated with that chart. In other words, each sample document is pre-annotated with chart positions and title positions, and each chart position is bound to its corresponding title position.
For example, when a sample document contains chart A, the region of the title of chart A is annotated as a title region, and the region spanning chart A and its title is annotated as a chart region; that is, during annotation, the annotated chart region covers the title region.
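The annotation convention, in which the chart region covers the title region, amounts to taking the smallest bounding box that encloses both the chart body and its title. A sketch with a hypothetical `Box` type in (x0, y0, x1, y1) page coordinates:

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def chart_label(figure: Box, title: Box) -> Box:
    """Annotated chart region: smallest box covering the chart body and its title."""
    return Box(min(figure.x0, title.x0), min(figure.y0, title.y0),
               max(figure.x1, title.x1), max(figure.y1, title.y1))


def covers(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)
```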
Then, an initial target detection model is trained on the pre-annotated sample document set. During training, the model learns the position information of the chart regions and title regions in the sample documents, so that the trained target detection model can accurately predict the charts an information document may contain and obtain their position information.
It will be appreciated that the trained target detection model can also provide the title position information corresponding to the chart position information.
After the position information of a chart in an information document is obtained, the text in the chart can be extracted to obtain candidate text information. Extracting the text of a chart may involve the following two cases:
1) for a chart in a text-based format, the text in the chart can be obtained accurately by directly parsing the content of the chart's source file in combination with the chart's position information;
2) for a chart in image format, OCR can be used to extract the text information from the chart.
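For the first case (text-based charts), once the detector has returned the chart's position, the chart text can be collected from words whose boxes fall inside that region. This sketch assumes the source-file parser yields `(text, x0, y0, x1, y1)` tuples with y increasing downward, which is a hypothetical input format, and sorts the words into approximate reading order.

```python
from typing import List, Tuple

Word = Tuple[str, float, float, float, float]  # (text, x0, y0, x1, y1)


def text_in_region(words: List[Word],
                   region: Tuple[float, float, float, float]) -> str:
    """Join the words lying fully inside `region`, top-to-bottom then left-to-right."""
    rx0, ry0, rx1, ry1 = region
    inside = [w for w in words
              if w[1] >= rx0 and w[2] >= ry0 and w[3] <= rx1 and w[4] <= ry1]
    # Rounding y0 groups words on the same line before sorting by x0.
    inside.sort(key=lambda w: (round(w[2]), w[1]))
    return " ".join(w[0] for w in inside)
```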
Since not all candidate text information extracted from the information documents is related to adverse reaction events, the candidate text information can be further classified to screen out the target text information related to adverse reaction events, improving the efficiency and reliability of text analysis and early warning for adverse reaction events.
The candidate text information can be classified with machine learning techniques, for example with a trained text classification model that identifies and extracts the target text information related to adverse reaction events.
To improve the accuracy and efficiency of the text classification model, the classifier can be trained by fine-tuning the large-scale pre-trained Multilingual BERT (mBERT) model. mBERT handles text in many languages with a single model, so no separate model has to be customized for each language.
The text classification model can be obtained through training the following steps:
First, the crawled sample texts are preprocessed: useless information such as stop words and punctuation is removed, and each text is converted into a token sequence for the model to process.
Then, in the data labeling stage, a semi-supervised approach is used. Documents are first filtered with specific keywords, for example "adverse reaction" and "adverse event", and the filtered documents are then checked manually to label positive and negative samples. Additional negative samples are drawn at random from the documents that did not match the keywords.
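The keyword pre-filtering and random negative sampling described above might look like the following minimal Python sketch. The keyword tuple, the function names `select_candidates` and `sample_negatives`, the fixed seed and the toy documents are illustrative assumptions; the manual review of keyword hits is not modeled.

```python
import random

# Illustrative keywords; the text mentions terms such as "adverse reaction" and "adverse event".
KEYWORDS = ("adverse reaction", "adverse event")


def select_candidates(docs, keywords=KEYWORDS):
    """Split documents into keyword hits (sent for manual labeling) and the rest."""
    hits, rest = [], []
    for doc in docs:
        text = doc.lower()
        if any(k in text for k in keywords):
            hits.append(doc)
        else:
            rest.append(doc)
    return hits, rest


def sample_negatives(rest, n, seed=0):
    """Randomly draw up to n extra negative samples from documents without keyword hits."""
    rng = random.Random(seed)
    return rng.sample(rest, min(n, len(rest)))


docs = [
    "Grade 3 adverse events were reported in 12 patients.",
    "Pharmacokinetics of the study drug in healthy volunteers.",
    "Annual market share of oncology drugs.",
]
hits, rest = select_candidates(docs)
negatives = sample_negatives(rest, 1)
```

Substring matching after lowercasing means "adverse events" also matches the keyword "adverse event"; a production filter would more likely tokenize and handle other languages.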
The mBERT model is then used as the base model and adapted to the text classification task by fine-tuning. Specifically, a fully connected layer is added on top of mBERT, the cross-entropy loss function measures the model's loss during training, and the parameters are optimized with stochastic gradient descent.
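The classification head, loss and optimizer named above can be illustrated with a NumPy sketch. As a simplifying assumption, the mBERT encoder is replaced by fixed feature vectors (standing in for pooled sentence embeddings) and only the fully connected layer is trained; real fine-tuning would also update the encoder weights. The function names, learning rate, epoch count and synthetic data are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())


def train_head(features, labels, num_classes=2, lr=0.1, epochs=50, batch_size=8, seed=0):
    """Train a fully connected layer on fixed sentence features with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x, y = features[idx], labels[idx]
            grad = softmax(x @ W + b)
            grad[np.arange(len(y)), y] -= 1.0  # gradient of softmax + cross-entropy w.r.t. logits
            grad /= len(y)
            W -= lr * x.T @ grad
            b -= lr * grad.sum(axis=0)
    return W, b


# Synthetic stand-in for encoder outputs: two well-separated clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 4)), rng.normal(2.0, 0.5, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
W, b = train_head(X, y)
```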
After multiple rounds of training, an efficient and accurate text classification model can be obtained, and the model can classify candidate text information so as to screen out target text information related to adverse reaction events.
On this basis, the target text set can be built from at least two of the following: the target text information screened from drug-related information literature, the text information in safety reports of clinical trial drugs, and the text information in safety monitoring data of marketed drugs.
The safety reports of clinical trial drugs may be reports that a company provides to research institutions and regulators during a clinical trial, and may include at least SUSAR (Suspected Unexpected Serious Adverse Reaction) reports, DSURs (Development Safety Update Reports) and SAE (Serious Adverse Event) reports.
The safety monitoring data of marketed drugs reflect adverse event and medication error information for drugs on the market, and can be obtained from public databases that collect such information, for example the FDA Adverse Event Reporting System (FAERS), the European Medicines Agency's EudraVigilance database (European Union Drug Regulating Authorities Pharmacovigilance, EMA EudraVigilance) and the Japanese Adverse Drug Event Report database (JADER).
FAERS is the United States post-marketing pharmacovigilance database, used to monitor the safety of all marketed drugs and therapeutic biologics. The public can access it through a dashboard, which summarizes reports by key fields such as report type, region, patient age and severity, and through quarterly data files, which contain the raw report data extracted from FAERS for a specified period.
The quarterly data files contain the following data elements:
(1) General information: case report ID (ASCII), FAERS case report ID, case report version, manufacturer's unique case report ID, regulatory agency case report ID, date sent to FDA, dates of initial/latest receipt by FDA, report type (expedited, non-expedited, direct), report category (spontaneous report, study report, unknown; XML), whether the manufacturer submitted electronically (ASCII), seriousness (XML), whether expedited reporting criteria are met (XML);
(2) Patient information: patient age and age group, sex, weight;
(3) Drug information: drug unique identification number (ASCII), drug name (primary and secondary suspect drugs in ASCII; suspect and concomitant drugs in XML), active ingredient name, route of administration, dose, frequency, cumulative dose, dosage form, concomitant drugs, end date of drug use, duration of therapy, drug indication as a MedDRA preferred term, action taken with the drug (e.g. dose increased or reduced; XML), dechallenge outcome (ASCII), rechallenge outcome (ASCII), lot number, expiration date (ASCII), new drug application number;
(4) Event information: adverse drug reactions coded as MedDRA preferred terms, MedDRA version (XML), date of onset (XML), seriousness, country where the event occurred, and outcome (XML);
(5) Report source information: reporter type, reporter's country, code of the manufacturer/organization that sent the report, report source (ASCII), and whether the voluntary reporter notified the manufacturer (ASCII).
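The ASCII quarterly extracts are distributed as `$`-delimited text files (for example DEMO for case/patient data and REAC for reactions), so they can be read with Python's standard `csv` module. The sketch below assumes a header row and uses only a small subset of columns (`primaryid`, `caseid`, `caseversion`, `age`, `sex`, `pt`); the records are made up, and the exact file layout should be checked against the documentation shipped with each quarterly release.

```python
import csv
import io


def read_faers_ascii(text):
    """Parse '$'-delimited FAERS ASCII content (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="$", quoting=csv.QUOTE_NONE)
    return list(reader)


def reactions_by_case(reac_rows):
    """Group MedDRA preferred terms from REAC rows by primaryid."""
    grouped = {}
    for row in reac_rows:
        grouped.setdefault(row["primaryid"], []).append(row["pt"])
    return grouped


# Made-up toy records using a small subset of columns.
demo = "primaryid$caseid$caseversion$age$sex\n1001$10$1$63$F\n1002$10$2$63$F\n"
reac = "primaryid$caseid$pt\n1002$10$Nausea\n1002$10$Rash\n"
demo_rows = read_faers_ascii(demo)
case_reactions = reactions_by_case(read_faers_ascii(reac))
```

`QUOTE_NONE` keeps stray quote characters in free-text fields from being interpreted as CSV quoting.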
The text information in clinical trial drug safety reports and in marketed-drug safety monitoring data relates to adverse reaction events, and so does the target text information screened from drug-related information literature. For safety early warning across the whole drug life cycle, the target text set can therefore be built from at least two of these three types of data sources. Introducing multiple types of data sources, each with its own text extraction method, broadens the coverage of adverse reaction event early warning and enriches the early warning information.
In order to realize early warning of adverse reaction events, after the target text set is obtained according to step 110, entity extraction can be performed on text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities.
Entity extraction, also called named entity recognition (NER), automatically recognizes named entities in text information. Here, the extracted event entity information is the entity information related to adverse reaction events, comprising the entities and the relations among them.
Entity extraction is performed on text information, and can be divided into the following two cases:
1) For plain text information not containing charts, entity extraction can be performed on the text information based on a trained NER model or a pre-training language model to obtain event entity information;
2) For text information containing charts, the chart content should be exploited to strengthen entity extraction and improve its completeness and accuracy; entity extraction on such text can be performed with a pre-trained language model to obtain the event entity information.
In some embodiments, entity extraction is performed on text information containing charts to obtain event entity information, which specifically includes:
Step 121: obtain the table information in the text information, and apply entity masking to it to obtain a complete blank filling template;
Step 122: determine a target text based on the text information and the complete blank filling template;
Step 123: input the target text into the pre-trained language model, which outputs the event entity information. The pre-trained language model is obtained by pre-training an initial language model based on aligned entities, a preset entity filling task and a complete filling task; the aligned entities are obtained by aligning the entities in the text information of sample documents with those in the associated table information.
Specifically, for text information containing charts, the trained target detection model described in step 112-1 can first be used to obtain the table information in the text information; the table information may be chart information.
The text information to be processed is input into the trained target detection model, which outputs the chart positions and title positions. By analyzing these positions, each chart is bound one-to-one to its title, and the title content is read from the bound title position. Finally, using the title content as an index, the required charts are selected; once the target chart position corresponding to a title is obtained, the key chart information is extracted from the text information.
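One way to perform the one-to-one binding is sketched below. Because the labeled chart area encloses its title area, this sketch scores each (chart, title) pair by the fraction of the title box lying inside the chart box and binds greedily from the highest score; the overlap criterion, the 0.5 threshold, the box format `(x0, y0, x1, y1)` and all function names are assumptions rather than the method fixed by the text.

```python
def overlap_ratio(chart, title):
    """Fraction of the title box's area that lies inside the chart box; boxes are (x0, y0, x1, y1)."""
    ix0, iy0 = max(chart[0], title[0]), max(chart[1], title[1])
    ix1, iy1 = min(chart[2], title[2]), min(chart[3], title[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (title[2] - title[0]) * (title[3] - title[1])
    return inter / area if area > 0 else 0.0


def bind_charts_to_titles(charts, titles, min_ratio=0.5):
    """Greedy one-to-one binding: highest-overlap pairs first, each box used at most once."""
    pairs = sorted(
        ((overlap_ratio(c, t), i, j) for i, c in enumerate(charts) for j, t in enumerate(titles)),
        reverse=True,
    )
    binding, used_titles = {}, set()
    for ratio, i, j in pairs:
        if ratio < min_ratio:
            break
        if i not in binding and j not in used_titles:
            binding[i] = j
            used_titles.add(j)
    return binding


def select_charts(charts, titles, title_texts, keyword):
    """Return the chart boxes whose bound title text contains the keyword."""
    binding = bind_charts_to_titles(charts, titles)
    return [charts[i] for i, j in sorted(binding.items()) if keyword.lower() in title_texts[j].lower()]


charts = [(50, 70, 550, 400), (50, 470, 550, 800)]
titles = [(50, 70, 550, 95), (50, 470, 550, 495)]
title_texts = ["Table 1. Efficacy outcomes", "Table 2. Adverse events by grade"]
selected = select_charts(charts, titles, title_texts, "adverse events")
```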
After the table information is obtained, entity masking is applied to it to obtain a complete blank filling template, which describes the relationships between the entities in the table information.
In some embodiments, applying entity masking to the table information in step 121 to obtain the complete blank filling template specifically includes:
Step 121-1: obtain the layout information corresponding to the table information;
Step 121-2: determine the entity relationships among the cell entities in the table information according to the layout information, where an entity relationship is at least one of a same-column entity relationship, a same-row entity relationship and a same-row-and-column entity relationship;
Step 121-3: fill the cell entities into the task template preset for the corresponding entity relationship to obtain template text;
Step 121-4: in the template text, mask a target entity among the cell entities to obtain the complete blank filling template.
Specifically, the layout information represents the position of each cell entity in the table. It may include the title-row cells, the title-column cells, and the correspondence between every other cell (any cell that is neither a title-row nor a title-column cell) and its title-row and title-column cells. The layout information makes it possible to determine whether two or more aligned entities stand in an entity relationship.
The entity relationship between the cell entities in the table information mainly considers the relationship of the following two cases:
1) Within a single column or row, the title cell is related to each of the other cells, giving a same-column or same-row entity relationship, which generally links two cells. Any non-title cell in a row has a same-row entity relationship with the title cell of that row, and any non-title cell in a column has a same-column entity relationship with the title cell of that column.
2) Each other cell is related to both its title-row cell and its title-column cell, giving a same-row-and-column entity relationship, which generally links three cells.
For example, the layout information corresponding to the table information may be shown as table 1 below, where the first row of the table is a header row and the first column is a header column.
TABLE 1
| | Sorafenib-ravastatin | Sorafenib alone | P-value |
|---|---|---|---|
| Median OS | 10.7months | 10.5months | 0.975 |
| Median PFS | 5.0months | 4.4months | 0.986 |
| …… | …… | …… | …… |
From the layout of Table 1, the cell entities "P-value" and "0.975" have a same-column entity relationship, "Median PFS" and "4.4months" have a same-row entity relationship, and "Median OS", "10.7months" and "Sorafenib-ravastatin" have a same-row-and-column entity relationship.
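The relationships listed for Table 1 can be enumerated mechanically from the layout. The following sketch assumes the table is given as a list of rows whose first row and first column hold titles; the relation labels and the function name `table_relations` are hypothetical. Entity order follows the templates used later: (column title, cell) for same-column, (row title, cell) for same-row, and (row title, cell, column title) for same-row-and-column.

```python
def table_relations(table):
    """Enumerate entity relationships in a table whose first row and first column hold titles."""
    header, body = table[0], table[1:]
    relations = []
    for row in body:
        row_title = row[0]
        for c in range(1, len(row)):
            cell, col_title = row[c], header[c]
            relations.append(("same_column", (col_title, cell)))
            relations.append(("same_row", (row_title, cell)))
            relations.append(("same_row_and_column", (row_title, cell, col_title)))
    return relations


table_1 = [
    ["", "Sorafenib-ravastatin", "Sorafenib alone", "P-value"],
    ["Median OS", "10.7months", "10.5months", "0.975"],
    ["Median PFS", "5.0months", "4.4months", "0.986"],
]
rels = table_relations(table_1)
```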
After the entity relation among the cell entities is obtained, the cell entities can be filled into task templates preset corresponding to the entity relation, and template texts are obtained.
The preset task template is text for describing the entity relationship between the cell entities. In some embodiments, the task template may be presented as shown in table 2 below:
TABLE 2
| Number of entities | Task template |
|---|---|
| 2 | Ent1 is associated with Ent2. |
| 3 | Ent1 and Ent2 may be related to Ent3. |
Table 2 contains two task templates: one for text describing a relationship between 2 entities and one for a relationship among 3 entities. Ent1, Ent2 and Ent3 are placeholders for the cell entities that share the entity relationship.
Once the task templates are determined, the cell entities are filled into the template preset for their entity relationship to obtain template text. For example, the two cell entities "P-value" and "0.975", which have a same-column entity relationship, are filled into the template for 2 entities, giving the template text "P-value is associated with 0.975."
Similarly, the three cell entities "Median OS", "10.7months" and "Sorafenib-ravastatin", which have a same-row-and-column entity relationship, are filled into the template for 3 entities, giving the template text "Median OS and 10.7months may be related to Sorafenib-ravastatin."
Next, a target entity among the cell entities is masked in the template text to obtain the complete blank filling template. The target entity may be any randomly chosen entity filled into the template text. For masking, a new special token [SOE] is introduced to replace the masked entity.
For example, the target entity "Median OS" may be randomly selected from the template text "Median OS and 10.7months may be related to Sorafenib-ravastatin." and masked, giving the complete blank filling template "[SOE] and 10.7months may be related to Sorafenib-ravastatin." The resulting complete blank filling template is denoted $X_C$.
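Template filling and masking can be sketched as follows, using the two templates of Table 2 and the [SOE] token. The names `fill_template`, `make_cloze` and `build_cloze_block`, and the choice to join several masked template texts with spaces, are illustrative assumptions.

```python
import random

TEMPLATES = {2: "{0} is associated with {1}.", 3: "{0} and {1} may be related to {2}."}
MASK = "[SOE]"


def fill_template(entities):
    """Fill the task template chosen by the number of related cell entities."""
    return TEMPLATES[len(entities)].format(*entities)


def make_cloze(entities, target):
    """Replace the entity at index `target` with [SOE]; return (cloze text, answer)."""
    masked = list(entities)
    masked[target] = MASK
    return fill_template(masked), entities[target]


def build_cloze_block(entity_groups, k, rng):
    """Sample k related entity groups, mask one random entity in each, and join them into X_C."""
    chosen = rng.sample(entity_groups, min(k, len(entity_groups)))
    clozes = [make_cloze(g, rng.randrange(len(g))) for g in chosen]
    return " ".join(text for text, _ in clozes), [answer for _, answer in clozes]


cloze, answer = make_cloze(("Median OS", "10.7months", "Sorafenib-ravastatin"), 0)
```

The answers returned alongside the masked text are what the complete filling task later asks the model to recover from the text information.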
It should be noted that the embodiment of the present invention does not limit the number of template texts in the complete blank filling template; it may contain, for example, 3 or 5 template texts.
After the complete blank filling template is obtained, a target text sequence can be determined based on the text information and the complete blank filling template. In one embodiment, the target text sequence may be expressed in the form of:
$$X = [\mathrm{CLS}]\; X_C\; [\mathrm{SEP}]\; X_E\; [\mathrm{SEP}]$$
where $X$ is the target text sequence, $X_C$ is the complete blank filling template, $X_E$ is the text information, and [CLS] and [SEP] are special tokens.
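Assembling $X$ is simple concatenation; the useful detail is remembering where $X_E$ starts, since the pointer layer has to point into that span. This sketch uses whitespace tokenization as a simplifying assumption (an mBERT-style model would use its own subword tokenizer), and `build_input` is a hypothetical name.

```python
def build_input(cloze_text, passage):
    """Assemble X = [CLS] X_C [SEP] X_E [SEP]; return tokens and the (start, end) span of X_E."""
    x_c = cloze_text.split()
    x_e = passage.split()
    tokens = ["[CLS]"] + x_c + ["[SEP]"] + x_e + ["[SEP]"]
    start = len(x_c) + 2  # skip [CLS], X_C and the first [SEP]
    return tokens, (start, start + len(x_e))


tokens, (s, e) = build_input("P-value is associated with [SOE]", "The P-value was 0.975 .")
```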
Thereupon, the target text may be input into the pre-trained language model, outputting event entity information.
The pre-training language model can be obtained through training by the following steps:
1. Obtain sample documents that include both text information and associated table information; any document containing text together with related tables can serve as a sample document. Sample documents may be collected from databases holding large numbers of medical, biological, health or nursing documents, for example from literature retrieval databases such as PubMed and PubTab.
2. After the text information and table information of a sample document are obtained, entity alignment is performed between them. Entity alignment decides whether two or more entities from different information sources refer to the same real-world object, and groups named entities with the same referent together, yielding aligned entities.
It is understood that the alignment entity is derived from both text information and form information and points to the same object in the real world.
Entity alignment can be implemented by string matching, that is, by matching the content of each cell in the table information against the terms in the text information to obtain aligned entities. Two string matching methods may be used:
1) Taking English as an example, obtain a table, its cell contents and the related text information. Each word in a cell (other than stop words and punctuation) is reduced to its root (stem), and the position of each cell word in the text information is then located. Stemming makes matching more effective: for example, a word that is plural in the cell but singular in the text still matches once both are reduced to the same root, which improves matching accuracy.
2) Enumerate the candidate phrases in the text information, compare each with the cell content, and score each phrase by the overlap between its words and the words of the cell; the highest-scoring phrase is kept as the aligned entity. For example, for the cell content "Sequenced genome falciparum" and the phrase "Sequenced genome plasmodium falciparum" in the text, the score is 0.75 (three of the four distinct words are shared). A score threshold, for example 0.5, can also be set so that only phrases scoring above it count as successfully matched aligned entities.
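Both matching methods can be sketched in plain Python. Two assumptions are made here: a crude lowercase-and-strip-plural-"s" rule stands in for a real stemmer, and the overlap score is taken as the Jaccard ratio of the word sets, which reproduces the 0.75 of the "Sequenced genome falciparum" example while still penalizing overly short phrases. The stop-word list, maximum phrase length and all function names are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "with", "to", "for"}


def normalize(word):
    """Crude stand-in for stemming: lowercase, drop punctuation, strip a plural 's'."""
    w = re.sub(r"[^\w-]", "", word.lower())
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w


def content_words(text):
    """Normalized words of a text, without stop words and empty tokens."""
    words = (normalize(w) for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def locate_cell_words(cell, text):
    """Method 1: positions of each normalized cell word among the text's words."""
    tokens = [normalize(w) for w in text.split()]
    return {w: [i for i, t in enumerate(tokens) if t == w] for w in content_words(cell)}


def best_phrase(cell, text, max_len=6, threshold=0.5):
    """Method 2: best-scoring phrase (Jaccard overlap of word sets), or None below threshold."""
    cell_set = set(content_words(cell))
    words = text.split()
    best_score, best = 0.0, None
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            phrase_set = set(content_words(" ".join(words[i:j])))
            if not phrase_set:
                continue
            score = len(cell_set & phrase_set) / len(cell_set | phrase_set)
            # prefer higher scores; on ties keep the shorter phrase
            if score > best_score or (score == best_score and best and j - i < len(best.split())):
                best_score, best = score, " ".join(words[i:j])
    return (best_score, best) if best_score >= threshold else (best_score, None)


result = best_phrase("Sequenced genome falciparum", "We report the sequenced genome plasmodium falciparum strain")
```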
3. Based on the alignment entity, a preset entity filling task and a complete filling task, pre-training the initial language model to obtain a pre-training language model.
Two self-supervised tasks are designed for pre-training: an entity filling task (EI) and a complete filling task (TCT).
Compared with a conventional masked language model, the EI task masks the mentions of aligned entities in the text information and requires the model to restore them. The TCT task converts several aligned entities from the table layout into text with a missing entity, and the model must extract the correct entity from the text information to fill the blank. Together, the two tasks integrate table knowledge into the language model.
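The input side of the EI task can be sketched as span masking over token positions of aligned entities. This assumes entity spans are given as half-open token index pairs and that every token of a span is replaced by a generic `[MASK]` token; the name `entity_infilling` and the label convention (`None` for positions the EI loss ignores) are hypothetical.

```python
def entity_infilling(tokens, entity_spans, mask="[MASK]"):
    """Mask every token of each aligned-entity span (start, end); labels hold the originals."""
    masked = list(tokens)
    labels = [None] * len(tokens)  # None = position not scored by the EI loss
    for start, end in entity_spans:
        for i in range(start, end):
            labels[i] = tokens[i]
            masked[i] = mask
    return masked, labels


tokens = "Median OS was 10.7 months with sorafenib".split()
masked, labels = entity_infilling(tokens, [(0, 2)])
```

Unlike random token masking, whole aligned entities are hidden, so the model must rely on context (including the table-derived text) to restore them.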
A language model pre-trained in this way makes full use of the entity information in the tables related to the text information, which yields good accuracy on downstream NLP tasks such as information extraction, relation extraction and classification.
In some embodiments, the pre-trained language model includes a prediction layer and a pointer layer in parallel, and step 123 (inputting the target text sequence into the pre-trained language model and outputting the event entity information) specifically includes:
Step 123-1: input the target text sequence into the pre-trained language model to obtain a first entity output by the prediction layer based on the entity filling task and a second entity output by the pointer layer based on the complete filling task;
Step 123-2: fuse the first entity and the second entity to obtain the event entity information.
Specifically, the pre-trained language model may include an input layer, a semantic information extraction layer, a prediction layer and a pointer layer. The semantic information extraction layer may be a bidirectional long short-term memory network (BiLSTM), the prediction layer may be a conditional random field (CRF), and the pointer layer may be an attention-based pointer network (Pointer Network). The prediction layer and the pointer layer are both connected to the semantic information extraction layer and run in parallel. Entity extraction may include the following steps:
1) The target text sequence is fed to the input layer, which encodes it into a semantic vector sequence;
2) The semantic vector sequence is fed to the semantic information extraction layer, which outputs the contextual semantic information of each word;
3) The contextual semantic information is fed to the prediction layer, which handles complex cases such as overlapping or nested entities through global (sequence-level) optimization and outputs the first entity based on the entity filling task;
4) The contextual semantic information is also fed to the pointer layer, which locates in the text information each entity missing from the complete blank filling template and outputs the second entity based on the complete filling task;
5) The first entity and the second entity are then fused to obtain the event entity information: if the pointer layer finds an entity in the text information that the prediction layer did not predict, that second entity is also included in the final output. Experiments show that this Pointer Network-based entity correction effectively improves the prediction performance of the CRF.
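The fusion rule in step 5) can be sketched on entity spans. Entities are assumed to be `(start, end, type)` tuples with half-open token spans; every CRF entity is kept, and a pointer entity is added only when it does not overlap an entity already kept. The overlap criterion and the function names are assumptions, since the text only says that pointer entities missed by the prediction layer are added.

```python
def overlaps(a, b):
    """True if half-open spans a=(start, end, ...) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def fuse_entities(crf_entities, pointer_entities):
    """Keep all CRF entities; add pointer entities that no kept entity overlaps."""
    fused = list(crf_entities)
    for ent in pointer_entities:
        if not any(overlaps(ent, kept) for kept in fused):
            fused.append(ent)
    return sorted(fused)


crf = [(0, 2, "DRUG")]
pointer = [(0, 2, "DRUG"), (5, 7, "AE"), (1, 3, "DRUG")]
fused = fuse_entities(crf, pointer)
```

Here the pointer layer contributes the adverse event span the CRF missed, while its duplicate and overlapping drug spans are discarded.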
In addition, a patient in a clinical trial may receive several drugs and experience several corresponding adverse reaction events. To further improve the accuracy and reliability of adverse reaction event early warning, a relation extraction model can be built on top of the pre-trained language model to extract the relations among the entities.
In some embodiments, after the entities in the event entity information are obtained, entity relation extraction may further be performed, which may include:
Encoding the target text sequence and each entity to obtain a semantic vector sequence;
based on the semantic vector sequence, extracting the relation of each entity to obtain the relation among the entities.
Specifically, the target text sequence and each entity obtained by entity extraction are first input into the model, and the pre-trained language model encodes the target text sequence into a semantic vector sequence. Following the approach of the PURE model, entity information can be added to the input text as special marker tokens.
A relation classification model is then built on the representations of the tokens corresponding to the entity information, and relation extraction is performed on each entity pair to decide whether a specific relation holds between the given entities.
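The marker insertion can be sketched in the spirit of PURE's typed entity markers. The marker strings (`<S:DRUG>`, `</S:DRUG>`, and so on), the `(start, end, type)` span format and the function names are assumptions. A relation classifier would then typically read the encoder's hidden states at the two opening markers, whose positions `marker_positions` returns; the classifier itself is not shown.

```python
def insert_entity_markers(tokens, subj, obj):
    """Wrap subject/object spans (start, end, type) in typed marker tokens."""
    inserts = {}
    for role, (start, end, etype) in (("S", subj), ("O", obj)):
        inserts.setdefault(start, []).append(f"<{role}:{etype}>")
        inserts.setdefault(end, []).insert(0, f"</{role}:{etype}>")  # closers go before openers
    out = []
    for i in range(len(tokens) + 1):
        out.extend(inserts.get(i, []))
        if i < len(tokens):
            out.append(tokens[i])
    return out


def marker_positions(marked):
    """Indices of the subject and object opening markers."""
    s = next(i for i, t in enumerate(marked) if t.startswith("<S:"))
    o = next(i for i, t in enumerate(marked) if t.startswith("<O:"))
    return s, o


tokens = "Patient received sorafenib and developed rash".split()
marked = insert_entity_markers(tokens, (2, 3, "DRUG"), (5, 6, "AE"))
```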
In some embodiments, the entities in the event entity information obtained through entity extraction and relation extraction include, but are not limited to: (1) drug information (drug name, dose, mode of administration, route of administration); (2) indication; (3) adverse reaction/event information; (4) SMQ of the adverse reaction; (5) report type/severity/causality; (6) patient information (age, sex, age at onset of the adverse reaction/event); (7) examination results; (8) diagnosis details; (9) ICSR number and reporting time of the report.
In addition, the correspondence among the entities ICSR number, reporting time, indication, drug information, adverse reaction, SMQ of the adverse reaction, report type/severity/causality and age can be obtained.
Note that all SUSAR reports for the same patient share the same ICSR number; when multiple SUSAR reports are filed for that patient, only the version number changes. The ICSR number plus the version number can therefore serve as the unique identifier of a SUSAR report.
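A composite key of this kind might be built as below. The `#v` separator and the `icsr`/`version` dictionary fields are assumptions, and `latest_versions` (keeping only the highest version per ICSR number) is an illustrative use of the key rather than a step stated in the text.

```python
def report_key(icsr_number, version):
    """Unique SUSAR report identifier: ICSR number plus version number."""
    return f"{icsr_number}#v{version}"


def latest_versions(reports):
    """Keep the highest-version report per ICSR number (reports are dicts with 'icsr' and 'version')."""
    latest = {}
    for report in reports:
        current = latest.get(report["icsr"])
        if current is None or report["version"] > current["version"]:
            latest[report["icsr"]] = report
    return latest


reports = [{"icsr": "A-001", "version": 1}, {"icsr": "A-001", "version": 3}, {"icsr": "B-002", "version": 1}]
keys = {report_key(r["icsr"], r["version"]) for r in reports}
```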
In some embodiments, after entity extraction is performed on text information in the target text set to obtain event entity information, the method further includes:
standardizing the entities in the event entity information based on at least one of the preset dictionaries, the Common Terminology Criteria for Adverse Events, and preset keywords; the standardized entities are used for intelligent analysis and early warning of the degree of attention that adverse reaction events warrant, yielding an early warning result.
Specifically, because the extracted entities in the event entity information may not use standard terminology, they are standardized after entity extraction on the text information in the target text set. Standardization here means mapping the entities onto medical dictionaries and standard terminology, so that subsequent early warning of adverse reaction events is efficient and accurate.
In some embodiments, the normalizing the entity in the event entity information specifically includes at least one of the following steps:
standardizing at least one of the drug entities, indication entities and adverse reaction event entities, each against the corresponding preset dictionary;
standardizing the report type entities based on the data source type of the text information corresponding to the event entity information;
standardizing the severity entities based on the Common Terminology Criteria for Adverse Events;
and standardizing the causal relationship entities based on the preset keywords.
In this embodiment, the preset dictionaries may include the MedDRA dictionary (Medical Dictionary for Regulatory Activities), the SMQ (Standardised MedDRA Queries) dictionary, the CDE (Center for Drug Evaluation) drug dictionary, and the like.
MedDRA is an internationally validated medical terminology used by regulatory authorities and the regulated biopharmaceutical industry for data entry, retrieval, evaluation and presentation throughout the regulatory process, from pre-marketing to post-marketing. It aims to standardize the exchange of clinical information.
MedDRA terms are organized and linked through a five-level hierarchy: 1. SOC (System Organ Class); 2. HLGT (High Level Group Term); 3. HLT (High Level Term); 4. PT (Preferred Term); 5. LLT (Lowest Level Term). PT and SOC are the two dimensions most commonly used in data analysis. In SUSAR reports, indications are annotated with LLT codes, from which the PT and SOC codes and their terms can be obtained through the hierarchical correspondence in the dictionary.
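Walking the hierarchy from an LLT code up to its SOC amounts to a chain of dictionary lookups. In the sketch below the codes are hypothetical placeholders (real MedDRA codes and files are licensed), the term names borrow the pneumonia example used for indications in this document, and the SOC name "Infections and infestations" is added as an assumption. Real MedDRA is multiaxial, so a PT may link to several HLTs and SOCs with one primary SOC; this sketch follows a single path only.

```python
# Hypothetical placeholder codes; real MedDRA codes and files come with a MedDRA licence.
LLT_TO_PT = {"LLT-0001": "PT-0001"}
PT_TO_HLT = {"PT-0001": "HLT-0001"}
HLT_TO_HLGT = {"HLT-0001": "HLGT-0001"}
HLGT_TO_SOC = {"HLGT-0001": "SOC-0001"}
TERMS = {
    "LLT-0001": "Infectious pneumonia",
    "PT-0001": "Infectious pneumonia",
    "HLT-0001": "Lower respiratory tract and lung infections",
    "HLGT-0001": "Infections - pathogen unspecified",
    "SOC-0001": "Infections and infestations",
}


def expand_llt(llt_code):
    """Walk LLT -> PT -> HLT -> HLGT -> SOC along a single (primary) path."""
    path = {"LLT": llt_code}
    path["PT"] = LLT_TO_PT[llt_code]
    path["HLT"] = PT_TO_HLT[path["PT"]]
    path["HLGT"] = HLT_TO_HLGT[path["HLT"]]
    path["SOC"] = HLGT_TO_SOC[path["HLGT"]]
    return {level: (code, TERMS[code]) for level, code in path.items()}


expanded = expand_llt("LLT-0001")
```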
SMQs are standardized query groupings built on MedDRA and are used to retrieve and communicate clinical trial safety information at different stages.
Illustratively, the "drug name" entity and CDE drug dictionary may be matched and normalized to yield a standard drug name (e.g., fluxapyroxad capsule);
the "indication" entity is matched against the MedDRA dictionary and standardized to obtain the standard indication name, at any level from LLT up to SOC; for example, LLT "infectious pneumonia", PT "infectious pneumonia", HLT "lower respiratory tract and lung infections", HLGT "infections - pathogen unspecified";
the "adverse reaction/event information" entity is matched against MedDRA and/or SMQ and standardized to obtain standard adverse event names (such as death, constipation, decreased platelet count, pneumonia).
The matching standardization of the adverse event information and the MedDRA dictionary can be performed based on the hierarchical correspondence in the dictionary.
Matching and standardizing adverse event information against the SMQ dictionary: in SUSAR reports, adverse events are annotated with LLT codes. The PT codes are obtained from the LLT codes via the MedDRA dictionary, and the corresponding SMQ information can then be obtained from the LLT or PT codes using the existing logic of the SMQ dictionary.
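The LLT-to-SMQ lookup can be sketched as a membership test, treating each SMQ as a set of member term codes that may contain PTs and LLTs. The codes and SMQ names below are hypothetical placeholders, and real SMQs additionally have narrow/broad scopes and hierarchies that this sketch ignores.

```python
def smqs_for_llt(llt_code, llt_to_pt, smq_members):
    """Return SMQ names whose member set contains the LLT or its PT."""
    pt_code = llt_to_pt[llt_code]
    return sorted(name for name, members in smq_members.items() if pt_code in members or llt_code in members)


# Hypothetical placeholder codes and SMQ names.
llt_to_pt = {"LLT-0101": "PT-0101", "LLT-0202": "PT-0202"}
smq_members = {
    "SMQ X (hypothetical)": {"PT-0101", "PT-0303"},
    "SMQ Y (hypothetical)": {"PT-0202", "LLT-0101"},
}
```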
Report type entities can be standardized based on the data source type of the text information corresponding to the event entity information. For example, the standardized report types may include: (1) clinical trial; (2) post-marketing safety monitoring; (3) literature.
Severity entities can be standardized based on the Common Terminology Criteria for Adverse Events by grading the adverse reactions/events into: (1) death | teratogenicity or birth defect | permanent or significant loss of function; (2) life-threatening; (3) hospitalization or prolonged hospitalization | other important medical events.
Causal relationship entities can be standardized based on preset keywords, which may be obtained by keyword detection over the word segments of the report. The standardized causality categories comprise: (1) positive correlation | likely correlation; (2) uncertain / to be evaluated or unassessable | possibly unrelated.
Different reports express these categories with the following wordings:
"positive correlation": "affirmative", "certain", "Related", "defined", "Strong correlation";
"likely correlation": "likely", "high probability", "Perfect correlation";
"possible correlation": "possibility", "certain probability", "association", "relation", "Moderate correlation";
"uncertain / to be evaluated or unassessable": "unable to evaluate", "uncertainty";
"possibly unrelated": "possibly irrelevant", "Weak correlation";
"unrelated": "irrelevant", "No correlation".
By standardizing the entities in the event entity information, the method provided by the embodiment of the invention supplies a data basis for subsequent adverse reaction event early warning and thereby improves its accuracy and reliability.
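The keyword-based causality standardization can be sketched as an inverted lookup from report wordings to categories. Category names and wordings follow the list above (lowercased); exact matching after whitespace normalization is an assumption chosen so that, for instance, "related" does not fire inside "unrelated". The function name is hypothetical.

```python
CAUSALITY_VARIANTS = {
    "positive correlation": ["affirmative", "certain", "related", "defined", "strong correlation"],
    "likely correlation": ["likely", "high probability", "perfect correlation"],
    "possible correlation": ["possibility", "certain probability", "association", "relation", "moderate correlation"],
    "uncertain / to be evaluated or unassessable": ["unable to evaluate", "uncertainty"],
    "possibly unrelated": ["possibly irrelevant", "weak correlation"],
    "unrelated": ["irrelevant", "no correlation"],
}
# Invert to a wording -> category lookup; exact matching avoids substring false positives.
PHRASE_TO_CATEGORY = {p: cat for cat, phrases in CAUSALITY_VARIANTS.items() for p in phrases}


def normalize_causality(phrase):
    """Map a causality wording from a report to its standard category, or None if unknown."""
    return PHRASE_TO_CATEGORY.get(" ".join(phrase.lower().split()))
```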
In some embodiments, the causal relationship entity is determined by:
matching related word segments in the text information with preset causal relationship entities based on their semantic similarity, and determining the causal relationship entity from the matching result; and/or
extracting causal relationship entities from the text information with a pre-trained entity extraction model.
Specifically, a causal relationship entity reflects the causal relationship between a drug and an adverse reaction event. The preset causal relationship entities are predefined labels that express this relationship, for example: positive correlation, likely correlation, uncertain, to be evaluated, unassessable, possibly unrelated, and unrelated.
The causal relationship entity in the text information can be obtained by word-segmentation matching. A related segmented word is a word in the text that describes the causal relationship, such as "affirmative", "uncertain" or "related". Each related segmented word is matched with a preset causal relationship entity based on the semantic similarity between them, and the causal relationship entity is determined from the matching result. For example, if the text information contains the word "affirmative" or "certain", which matches the preset causal relationship entity "positive correlation", the causal relationship entity of the text information is "positive correlation"; likewise, if the text information contains "No correlation", which matches the preset causal relationship entity "unrelated", the causal relationship entity of the text information is "unrelated".
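A minimal sketch of similarity-based matching, assuming each segmented word and each preset entity can be embedded as a vector by some encoder. The text does not name the encoder, so `embed` is a caller-supplied placeholder, the three-dimensional vectors in the usage example are made-up toy values rather than output of any real model, and the 0.8 threshold is likewise an assumption.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_causal_entity(
    word: str,
    presets: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical encoder supplied by the caller
    threshold: float = 0.8,          # assumed minimum similarity for a match
) -> Optional[str]:
    """Return the preset causal entity most similar to `word`, or None below threshold."""
    word_vec = embed(word)
    best = max(presets, key=lambda p: cosine(word_vec, embed(p)))
    return best if cosine(word_vec, embed(best)) >= threshold else None


# Toy embeddings for illustration only (not produced by any real model).
TOY_VECTORS = {
    "positive correlation": [1.0, 0.0, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
    "affirmative": [0.9, 0.1, 0.0],
    "no correlation": [0.1, 0.95, 0.0],
    "weather": [0.0, 0.0, 1.0],
}
PRESETS = ["positive correlation", "unrelated"]
print(match_causal_entity("affirmative", PRESETS, TOY_VECTORS.__getitem__))  # positive correlation
```

The threshold keeps unrelated words such as "weather" from being forced onto the nearest preset entity.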
In another embodiment, the causal relationship entity in the text information may be obtained with a pre-trained entity extraction model: the text information is input into the model, and the model outputs the causal relationship entity. The pre-trained entity extraction model may be a general named entity recognition model or the pre-trained language model described in the present invention, which is not specifically limited.
It should be noted that the causal relationship entity may be determined by word-segmentation matching, by the pre-trained entity extraction model, or by fusing the causal relationship entities obtained in both ways into a final causal relationship entity, which is not specifically limited in the embodiment of the present invention.
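The text leaves the fusion strategy open. One plausible policy, sketched below purely as an assumption, is: use whichever result exists, keep it when both agree, and on disagreement trust the model only when its confidence clears a threshold. The function `fuse_causality` and its 0.9 default are hypothetical.

```python
from typing import Optional


def fuse_causality(
    matched: Optional[str],       # result of word-segmentation matching
    predicted: Optional[str],     # result of the pre-trained extraction model
    model_confidence: float = 0.0,
    min_confidence: float = 0.9,  # assumed threshold, not from the source
) -> Optional[str]:
    """Combine the two causal-entity results into one final causal relationship entity."""
    if matched is None:
        return predicted
    if predicted is None or matched == predicted:
        return matched
    # Disagreement: prefer the model only when it is confident enough.
    return predicted if model_confidence >= min_confidence else matched
```

Falling back to the matched keyword on low-confidence disagreement keeps the result traceable to wording that actually appears in the report.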
On this basis, the event entity information obtained in step 120 can be used to perform intelligent attention-level analysis and early warning on the adverse reaction event to obtain an early warning result. The attention level indicates how much attention the adverse reaction event deserves: the higher the attention level, the higher the level of the early warning result; conversely, the lower the attention level, the lower the level of the early warning result.
In some embodiments, the early warning result obtained by the intelligent attention-level analysis may be one of: (1) key attention; (2) general attention; (3) no attention required. Dedicated early warning rules can be constructed based on CDE evaluation experience, and early warning is performed according to these rules.
In some embodiments, performing the intelligent attention-level analysis and early warning on the adverse reaction event based on the event entity information to obtain the early warning result, that is, step 130, specifically includes:
step 131, determining a first early warning result based on at least one of a drug entity, an adverse reaction event entity and an age entity in the event entity information;
step 132, determining a second early warning result based on the hazard level entity and/or the causal relationship entity in the event entity information;
step 133, performing early warning on the adverse reaction event based on the first early warning result and/or the second early warning result to obtain the early warning result.
Specifically, the early warning rules may include at least one of the following:
1) Determining a first early warning result based on at least one of a drug entity, an adverse reaction event entity and an age entity in the event entity information.
In some embodiments, determining the first early warning result includes:
step 131-1, tagging the drug entity, the adverse reaction event entity and the age entity in the event entity information to obtain a drug entity tag, an adverse reaction event entity tag and an age entity tag, respectively;
step 131-2, determining the first early warning result based on at least one of the drug entity tag, the adverse reaction event entity tag and the age entity tag.
Specifically, a tag system can be constructed according to CDE evaluation experience and business requirements. The preset tags may include the following:
drug entity tags: (1) high-risk variety; (2) variety for which risk measures have been taken in the past;
adverse event (AE) entity tags: (1) rare AE; (2) SMQ of key concern;
age entity tags: (1) 3 years old or younger; (2) 3 to 18 years old; (3) 60 years old or older.
For example, when a drug entity is tagged, a high-risk variety is tagged (1), and a variety for which risk measures have been taken in the past is tagged (2). Likewise, when an age entity is tagged, an age of 3 years or younger is tagged (1), an age between 3 and 18 years is tagged (2), and an age of 60 years or older is tagged (3). Once the drug entity tag, the adverse reaction event entity tag and the age entity tag are obtained, the first early warning result can be determined.
For the drug entity tag, if either a class (1) or a class (2) tag appears in the data source, the first early warning result is classified as key attention. For the adverse reaction event (AE) entity tag, a class (1) tag yields a first early warning result of key attention, whereas a class (2) tag must be judged in combination with other indicators. For the age entity tag, a class (1) tag yields a first early warning result of key attention, whereas a class (2) tag must be judged in combination with other indicators.
2) Determining a second early warning result based on the hazard level entity and/or the causal relationship entity in the event entity information.
In one embodiment, the relationship between the second early warning result and the hazard level entity and causal relationship entity is shown in Table 3 below.
Table 3
As shown in Table 3, when the hazard level entity is classified as "death", "teratogenesis/birth defect" or "permanent or significant loss of function", the second early warning result is "key attention" regardless of the classification of the causal relationship entity; as another example, when the hazard level entity is classified as "life-threatening" and the causal relationship entity is classified as "uncertain/to be evaluated or unable to evaluate", "possibly unrelated" or "unrelated", the second early warning result is "general attention".
3) Combining the first early warning result and the second early warning result: when they contain two or more "general attention" results, the combined early warning result is "key attention".
It should be noted that, under the three early warning rules above, the early warning result may be obtained from the first early warning result, from the second early warning result, or by fusing the first and second early warning results, which is not specifically limited in the embodiment of the present invention.
In addition, manual early warning can also be used; when the machine early warning result is inconsistent with the manual early warning result, the manual early warning result takes precedence.
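The three rules and the manual override can be sketched as follows. Only conditions stated in the text are encoded: tags that must be "combined with other indicators" yield no result on their own, and because the body of Table 3 is not reproduced here, only its two rows described above are implemented, with every other combination returning `None`. The age boundaries (exactly 3 years in class (1), 18 years in class (2)), the "no attention required" default, and all names are assumptions.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Level(IntEnum):
    NO_ATTENTION = 0
    GENERAL = 1
    KEY = 2


def age_tag(age_years: float) -> Optional[int]:
    """Age entity tag: (1) <= 3, (2) 3-18, (3) >= 60; None otherwise (boundaries assumed)."""
    if age_years <= 3:
        return 1
    if age_years <= 18:
        return 2
    if age_years >= 60:
        return 3
    return None


def first_result(drug_tags: set[int], ae_tags: set[int], age: Optional[int]) -> Optional[Level]:
    """Rule 1: drug tag (1)/(2), AE tag (1) or age tag (1) -> key attention."""
    if drug_tags & {1, 2} or 1 in ae_tags or age == 1:
        return Level.KEY
    return None  # AE tag (2) / age tag (2) need other indicators (not modelled)


SERIOUS_OUTCOMES = {"death", "teratogenesis/birth defect",
                    "permanent or significant loss of function"}
WEAK_CAUSALITY = {"uncertain/to be evaluated or unable to evaluate",
                  "possibly unrelated", "unrelated"}


def second_result(hazard: str, causality: str) -> Optional[Level]:
    """Rule 2: only the two rows of Table 3 described in the text."""
    if hazard in SERIOUS_OUTCOMES:
        return Level.KEY  # regardless of causality
    if hazard == "life-threatening" and causality in WEAK_CAUSALITY:
        return Level.GENERAL
    return None  # remaining Table 3 cells are not reproduced here


def combine(results: Iterable[Optional[Level]], manual: Optional[Level] = None) -> Level:
    """Rule 3 plus manual override: >= 2 general-attention results escalate to key attention."""
    if manual is not None:
        return manual  # manual early warning takes precedence when present
    known = [r for r in results if r is not None]
    if Level.KEY in known:
        return Level.KEY
    general_count = sum(1 for r in known if r == Level.GENERAL)
    if general_count >= 2:
        return Level.KEY  # "combined early warning"
    return Level.GENERAL if general_count == 1 else Level.NO_ATTENTION
```

Returning `None` for unspecified cases, instead of guessing a level, makes the gaps in the rule table explicit to whoever fills them in.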
In some embodiments, after the early warning result is obtained, the method further includes:
displaying the early warning result in at least one of a table, a time-stacked chart and a pie chart.
To make it easier for users to analyze and monitor the early warning results, the early warning results can be displayed visually.
1) Table: a table presenting the early warning results may include the following dimensions: ICSR number, reporting time, indication, drug name, adverse reaction, SMQ, report type/severity/causality, age, early warning result, and manual early warning. The early warning result in the table is determined according to the early warning rules.
The rows are sorted by early warning level in the order key attention, general attention, no attention; rows within the same level are sorted by reporting time in descending order.
2) Time-stacked chart: shows how the SUSAR events with different early warning results change over time. The time axis can be switched between CDE receipt time and event occurrence time, and different time intervals (year, half year, quarter, month, week, day) can be selected; the chart can also be downloaded via "Download" or enlarged via "Full screen".
3) Pie chart: displays the early warning results along four dimensions: drug, enterprise, PT (adverse event) and SMQ.
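The table ordering and the data behind the time-stacked chart can be sketched as below. `ReportRow`, its fields, the level strings and the monthly bucketing are assumptions for illustration; other intervals such as quarters or years would only change the bucket key.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Display order of early warning levels in the table.
LEVEL_ORDER = {"key attention": 0, "general attention": 1, "no attention": 2}


@dataclass(frozen=True)
class ReportRow:
    icsr_number: str
    report_date: date
    warning: str  # one of the LEVEL_ORDER keys


def sort_for_table(rows: list[ReportRow]) -> list[ReportRow]:
    """Order by level (key -> general -> none), newest report first within a level."""
    return sorted(rows, key=lambda r: (LEVEL_ORDER[r.warning], -r.report_date.toordinal()))


def monthly_stack(rows: list[ReportRow]) -> dict[tuple[str, str], int]:
    """Count reports per (YYYY-MM, level): one stacked bar segment per pair."""
    return dict(Counter((r.report_date.strftime("%Y-%m"), r.warning) for r in rows))
```

Negating the ordinal date gives a descending time order inside an otherwise ascending sort key.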
The method provided by the embodiment of the invention constructs dedicated early warning rules and a tag system based on CDE evaluation experience and integrates the early warning function into the actual working scenario, which maximizes evaluation efficiency and allows important early warning signals to be found earlier.
The adverse reaction event early warning device provided by the invention is described below, and the adverse reaction event early warning device described below and the adverse reaction event early warning method described above can be correspondingly referred to each other.
Fig. 2 is a schematic structural diagram of an adverse reaction event early warning device provided by the present invention, and as shown in fig. 2, the adverse reaction event early warning device includes:
a text set obtaining unit 210, configured to obtain a target text set, where the target text set includes a plurality of text information related to adverse reaction events;
entity extraction unit 220, configured to perform entity extraction on the text information in the target text set, so as to obtain event entity information, where the event entity information includes entities and relationships between the entities;
and an event early warning unit 230, configured to perform intelligent attention-level analysis and early warning on the adverse reaction event based on the event entity information to obtain an early warning result.
The adverse reaction event early warning device provided by the embodiment of the invention extracts event entity information from multiple pieces of text information related to adverse reaction events and performs intelligent attention-level analysis and early warning on the adverse reaction events based on that information to obtain the early warning result. In this way, comprehensive, high-quality adverse reaction data can be acquired and processed, comprehensive and accurate monitoring and early warning are achieved, and the accuracy and reliability of safety early warning covering the whole drug life cycle are significantly improved.
Based on the above embodiment, the text set obtaining unit 210 is specifically configured to:
acquiring information documents related to medicines;
extracting candidate text information in the information document, and classifying the candidate text information to screen out target text information related to the adverse reaction event;
and determining the target text set based on at least two of the target text information, text information in a safety report of the clinical trial drug and text information in safety monitoring data of the marketed drug.
Based on the above embodiment, the text set obtaining unit 210 is further specifically configured to: perform chart detection on the information document with a trained target detection model to obtain position information of the chart, where the target detection model is trained on a sample document set in which chart regions, including their title areas, are pre-annotated; and extract candidate text information from the chart based on the position information of the chart.
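Once the detection model returns a chart's bounding box, extracting its text can be as simple as keeping the OCR words whose centers fall inside that box. The sketch below assumes the OCR output is a list of words with bounding boxes; `Box`, `Word` and `text_in_chart` are hypothetical names, and the reading-order heuristic (sort by rounded top edge, then left edge) is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def text_in_chart(words: list[Word], chart: Box) -> str:
    """Join OCR words whose center lies inside the detected chart region."""
    inside = [
        w for w in words
        if chart.contains((w.box.x0 + w.box.x1) / 2, (w.box.y0 + w.box.y1) / 2)
    ]
    # Approximate reading order: top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (round(w.box.y0), w.box.x0))
    return " ".join(w.text for w in inside)
```

Using the word center rather than full containment tolerates boxes that slightly overrun the detected chart edge.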
Based on the above embodiment, the entity extraction unit 220 is further specifically configured to: acquire table information in the text information and perform entity masking on the table information to obtain a cloze (fill-in-the-blank) template; determine a target text sequence based on the text information and the cloze template; and input the target text sequence into a pre-trained language model, which outputs the event entity information. The pre-trained language model is obtained by pre-training an initial language model based on aligned entities, a preset entity filling task and a cloze task; the aligned entities are obtained by performing entity alignment between the text information and the associated table information in sample documents.
Based on the above embodiment, the entity extraction unit 220 is further specifically configured to:
acquiring layout information corresponding to the table information;
determining the entity relationships among the cell entities in the table information according to the layout information, where the entity relationship includes at least one of: a same-column entity relationship, a same-row entity relationship, and a same-row-and-column entity relationship;
filling the cell entities into the preset task templates corresponding to the entity relationships to obtain template text;
masking the target entities among the cell entities in the template text to obtain the cloze template.
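A minimal sketch of turning a table into a masked (cloze) template, assuming the first row holds column headers and the first column holds a row key, so that each data cell is linked to its row key (same row) and its column header (same column). The sentence pattern, the `[MASK]` and `[SEP]` tokens and the function names are assumptions; the actual task templates are not given in the text.

```python
MASK = "[MASK]"  # assumed mask token of the language model
SEP = "[SEP]"    # assumed separator between body text and template


def build_cloze_template(table: list[list[str]], target_columns: set[str]) -> tuple[str, list[str]]:
    """Fill cells into a row/column template and mask cells in `target_columns`.

    Returns the template text and the masked-out answers (usable as training labels).
    """
    header, *rows = table
    sentences: list[str] = []
    answers: list[str] = []
    for row in rows:
        row_key = row[0]
        for column, value in zip(header[1:], row[1:]):
            if column in target_columns:
                answers.append(value)
                value = MASK
            sentences.append(f"{column} of {row_key} is {value}.")
    return " ".join(sentences), answers


def build_target_sequence(text: str, template: str) -> str:
    """Concatenate body text and template into one model input."""
    return f"{text} {SEP} {template}"
```

Keeping the masked answers alongside the template is what lets the same routine serve both pre-training (answers known) and inference (answers predicted).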
Based on the above embodiment, the entity extraction unit 220 is further specifically configured to:
inputting the target text sequence into the pre-trained language model to obtain a first entity output by the prediction layer based on the entity filling task and a second entity output by the pointer layer based on the cloze task;
and fusing the first entity and the second entity to obtain the entity in the event entity information.
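The fusion step is not detailed in the text. One common way to merge two span predictions, shown here as an assumption, is to pool both sets and greedily keep the highest-scoring spans that do not overlap an already kept span. `Span`, with character offsets and a score, is a hypothetical representation of the two layers' outputs.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    label: str    # entity type, e.g. "drug"
    score: float  # confidence from the layer that produced it


def fuse_entities(first: list[Span], second: list[Span]) -> list[Span]:
    """Merge prediction-layer and pointer-layer spans, resolving overlaps by score."""
    kept: list[Span] = []
    for span in sorted([*first, *second], key=lambda s: s.score, reverse=True):
        # Keep a span only if it does not overlap any higher-scoring kept span.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

Identical spans predicted by both layers collapse to the higher-scoring copy, and conflicting boundaries are settled by confidence.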
Based on the above embodiment, the entity extraction unit 220 is further specifically configured to:
encoding the target text sequence and each entity to obtain a semantic vector sequence;
and performing relation extraction on the entities based on the semantic vector sequence to obtain the relationships among the entities.
Based on the above embodiment, the adverse reaction event early warning device further includes an entity normalization unit configured to:
normalize the entities in the event entity information based on at least one of the preset dictionaries, the Common Terminology Criteria for Adverse Events (CTCAE) and preset keywords, where the normalized entities are used for early warning on the adverse reaction event to obtain the early warning result.
Based on the above embodiments, the entity normalization unit is specifically configured to:
normalize at least one of the drug entity, the indication entity and the adverse reaction event entity among the entities based on the respective preset dictionaries;
normalize the report type entity among the entities based on the data source type of the text information corresponding to the event entity information;
normalize the severity entity among the entities based on the Common Terminology Criteria for Adverse Events (CTCAE);
normalize the causal relationship entity among the entities based on the preset keywords.
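The normalization routes can be sketched as a small dispatcher. The dictionary contents, source-type keys and report-type labels below are hypothetical placeholders; only the CTCAE grade names (Grade 1 mild through Grade 5 death) reflect the published criteria, and pulling the grade number out of free text is a simplification. Causality would be handled by a keyword table built from the groupings listed earlier.

```python
from typing import Optional

# Hypothetical synonym dictionary (a real system would use a curated drug dictionary).
DRUG_DICTIONARY = {"asa": "aspirin", "acetylsalicylic acid": "aspirin"}

# Hypothetical mapping from data source type to normalized report type.
REPORT_TYPE_BY_SOURCE = {
    "clinical_trial_safety_report": "clinical trial report",
    "postmarketing_monitoring": "post-marketing report",
}

# CTCAE severity grades.
CTCAE_GRADES = {1: "mild", 2: "moderate", 3: "severe", 4: "life-threatening", 5: "death"}


def normalize_severity(raw: str) -> Optional[str]:
    """Map text such as 'Grade 3' to its CTCAE grade name (None if no valid grade)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return CTCAE_GRADES.get(int(digits)) if digits else None


def normalize_entity(entity_type: str, value: str, source_type: Optional[str] = None) -> Optional[str]:
    """Route an extracted entity to the normalizer for its type."""
    if entity_type == "drug":
        # Unknown names pass through unchanged.
        return DRUG_DICTIONARY.get(value.strip().lower(), value)
    if entity_type == "report_type":
        # Report type depends on where the text came from, not on its wording.
        return REPORT_TYPE_BY_SOURCE.get(source_type or "", value)
    if entity_type == "severity":
        return normalize_severity(value)
    return value  # other entity types are left as-is in this sketch
```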
Based on the above embodiment, the event early warning unit 230 is specifically configured to:
Determining a first early warning result based on at least one of a drug entity, an adverse reaction event entity and an age entity in the event entity information; and
determining a second early warning result based on the hazard level entity and/or the causal relationship entity in the event entity information;
and carrying out early warning on the adverse reaction event based on the first early warning result and/or the second early warning result to obtain the early warning result.
Based on the above embodiment, the event early warning unit 230 is specifically configured to:
marking a drug entity, an adverse reaction event entity and an age entity in the event entity information respectively to obtain a drug entity label, an adverse reaction event entity label and an age entity label;
and determining the first early warning result based on at least one of the drug entity tag, the adverse reaction event entity tag and the age entity tag.
Based on the above embodiment, the device further includes a causal relationship entity determining unit configured to:
match related segmented words in the text information with preset causal relationship entities based on the semantic similarity between them, and determine the causal relationship entity based on the matching result; and/or
perform causal relationship entity extraction on the text information with a pre-trained entity extraction model to obtain the causal relationship entity.
Optionally, based on the foregoing embodiment, fig. 3 is a schematic structural diagram of an adverse reaction event early warning system provided by the present invention, and as shown in fig. 3, an adverse reaction event early warning system is provided, including a client 310 and a server 320, where:
the client 310 is configured to obtain a keyword to be retrieved, and send the keyword to be retrieved to the server, where the keyword to be retrieved includes a target entity and/or a target early warning result;
the server 320 is configured to receive the keyword to be retrieved from the client, and to determine adverse reaction event information related to the keyword based on the event entity information and early warning results of adverse reaction events, where the event entity information and early warning results are determined by the adverse reaction event early warning method described above.
Specifically, with the adverse reaction event early warning method described in the above embodiments, entity extraction is performed on the text information in the target text set to obtain the event entity information of adverse reaction events, and intelligent attention-level analysis and early warning is then performed to obtain the early warning results, which facilitates information search and location. On this basis, an adverse reaction event early warning system can be built for quickly searching and locating target information.
The adverse reaction event early warning system includes a client and a server. The client may be a user terminal such as a smartphone, computer or tablet, through which the user enters the keywords to be retrieved; the keywords are then sent to the server for retrieval. The keywords to be retrieved are those for which related information is sought from the early warning system, and may specifically include a target entity and/or a target early warning result.
After receiving the keywords to be retrieved from the client, the server searches the predetermined event entity information and early warning results of adverse reaction events for data associated with those keywords.
1. Basic search
Searches can be performed along dimensions such as "early warning result", "drug name", "enterprise", "adverse event", "ICSR number", "indication" and "SMQ".
1) When the target early warning result is "key attention", the search can be further narrowed to "combined early warning" results (i.e., reports that do not themselves meet a "key attention" condition but contain two or more "general attention" conditions).
2) Autocomplete suggestions are offered as the keywords are typed, and multiple (up to 5) drugs, indications, etc. can be selected for the search; different search fields are combined with "AND", and multiple values within the same field are combined with "OR".
3) When the keywords to be retrieved include an adverse event entity or an indication entity, the input is linked to the MedDRA dictionary: related terms are suggested as the user types, and the MedDRA hierarchy level of each suggested option is displayed.
It should be noted that indications and adverse events in SUSAR reports are coded at the "LLT" level; when a higher-level MedDRA term is entered in the search, reports containing LLT-level terms under that term are retrieved automatically. Further, since SOC and PT are the main analysis levels, SOC and PT results are given higher priority in the autocomplete suggestions (for example, the top 10 suggestions can be shown).
4) When the keywords to be retrieved include an SMQ, the relevant SMQ name can be entered for the search, with autocomplete suggestions; the SMQ search type can also be selected at the lower right of the drop-down box: narrow, broad or algorithmic (narrow by default).
Further, clicking the tree icon to the right of the SMQ search box allows searching directly through the SMQ dictionary (with the different SMQ hierarchy levels built in): the input box at the upper left of the pop-up window supports autocomplete, clicking an option on the left moves it into the selected items on the right, and different SMQ search types can be selected at the lower right.
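Expanding a higher-level MedDRA term to the LLT-level terms beneath it, so that reports coded at the LLT level are found, amounts to a traversal of the hierarchy. The hierarchy below is a made-up placeholder fragment, not MedDRA content (the licensed MedDRA files would supply the real parent-child links), and the function names are hypothetical. The `seen` set handles terms reachable by more than one path, since MedDRA is multi-axial rather than a strict tree.

```python
from collections import deque

# Placeholder fragment of a SOC > HLGT > HLT > PT > LLT hierarchy (not real MedDRA terms).
CHILDREN: dict[str, list[str]] = {
    "soc_skin": ["hlgt_dermal"],
    "hlgt_dermal": ["hlt_rashes"],
    "hlt_rashes": ["pt_rash", "pt_urticaria"],
    "pt_rash": ["llt_rash", "llt_skin_rash"],
    "pt_urticaria": ["llt_hives"],
}


def expand_to_llt(term: str, children: dict[str, list[str]] = CHILDREN) -> set[str]:
    """Return every leaf (LLT-level) term at or below `term`."""
    leaves: set[str] = set()
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if node in seen:  # a term may be reachable by several paths
            continue
        seen.add(node)
        kids = children.get(node, [])
        if kids:
            queue.extend(kids)
        else:
            leaves.add(node)
    return leaves


def search_reports(term: str, reports: dict[str, set[str]]) -> list[str]:
    """IDs of reports whose LLT-coded events fall under `term`."""
    targets = expand_to_llt(term)
    return sorted(report_id for report_id, llts in reports.items() if llts & targets)
```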
2. Event detail retrieval
Information related to the adverse reaction event (report type, severity, causality and event level) can be searched, defaulting to "all"; multiple values can be selected within the same field (such as "teratogenic" and "life-threatening"), where different fields are combined with "AND" and values within the same field are combined with "OR".
3. Tag retrieval
Tag-related information (AE, drug, age) can be searched, defaulting to "all"; multiple values can be selected within the same field (such as "high-risk variety" and "variety for which risk measures have been taken in the past"), where different fields are combined with "AND" and values within the same field are combined with "OR".
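The combination logic shared by the basic, event detail and tag searches (AND across fields, OR within a field, an empty selection meaning "all") can be sketched as a record filter. The record layout and field names are assumptions, and the five-value cap, stated in the text for basic search, is applied to every field here for simplicity.

```python
from typing import Mapping, Sequence

MAX_VALUES_PER_FIELD = 5  # limit stated for drug/indication selection in basic search


def matches(record: Mapping[str, object], query: Mapping[str, Sequence[str]]) -> bool:
    """True if the record satisfies every field (AND) with any selected value (OR)."""
    for field, selected in query.items():
        if not selected:
            continue  # empty selection = "all", no constraint
        if len(selected) > MAX_VALUES_PER_FIELD:
            raise ValueError(f"at most {MAX_VALUES_PER_FIELD} values allowed for {field!r}")
        value = record.get(field)
        # Fields such as tags may hold several values.
        values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not values & set(selected):
            return False
    return True


def search(records: Sequence[Mapping[str, object]],
           query: Mapping[str, Sequence[str]]) -> list[Mapping[str, object]]:
    """Return the records that satisfy the query."""
    return [r for r in records if matches(r, query)]
```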
The adverse reaction event early warning system provided by the embodiment of the invention uses the pre-built event entity information and early warning results of adverse reaction events to enable rapid retrieval by target entity and/or target early warning result, which improves information query efficiency, maximizes evaluation efficiency, and allows important early warning signals to be found earlier.
Fig. 4 illustrates a physical schematic diagram of an electronic device. As shown in Fig. 4, the electronic device may include: a processor 410, a communication interface (Communications Interface) 420, a memory 430 and a communication bus 440, where the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform an adverse reaction event early warning method comprising:
Acquiring a target text set, wherein the target text set comprises a plurality of text information related to adverse reaction events;
entity extraction is carried out on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities;
and based on the event entity information, performing intelligent analysis and early warning on the attention degree of the adverse reaction event to obtain an early warning result.
Further, the logic instructions in the memory 430 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the adverse reaction event early warning method provided by the above methods, and the method includes:
acquiring a target text set, wherein the target text set comprises a plurality of text information related to adverse reaction events;
entity extraction is carried out on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities;
and based on the event entity information, performing intelligent analysis and early warning on the attention degree of the adverse reaction event to obtain an early warning result.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the adverse reaction event early warning method provided by the above methods, the method comprising:
acquiring a target text set, wherein the target text set comprises a plurality of text information related to adverse reaction events;
Entity extraction is carried out on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities;
and based on the event entity information, performing intelligent analysis and early warning on the attention degree of the adverse reaction event to obtain an early warning result.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An adverse reaction event early warning method is characterized by comprising the following steps:
acquiring a target text set, wherein the target text set comprises a plurality of text information related to adverse reaction events;
entity extraction is carried out on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities;
and based on the event entity information, performing intelligent analysis and early warning on the attention degree of the adverse reaction event to obtain an early warning result.
2. The adverse reaction event early warning method according to claim 1, wherein the performing intelligent analysis early warning on the attention degree of the adverse reaction event based on the event entity information to obtain an early warning result comprises:
Determining a first early warning result based on at least one of a drug entity, an adverse reaction event entity and an age entity in the event entity information; and
determining a second early warning result based on the hazard level entity and/or the causal relationship entity in the event entity information;
and carrying out intelligent analysis and early warning on the attention degree of the adverse reaction event based on the first early warning result and/or the second early warning result to obtain the early warning result.
3. The adverse reaction event early warning method according to claim 2, wherein the determining the first early warning result based on at least one of a drug entity, an adverse reaction event entity, and an age entity in the event entity information includes:
marking a drug entity, an adverse reaction event entity and an age entity in the event entity information respectively to obtain a drug entity label, an adverse reaction event entity label and an age entity label;
and determining the first early warning result based on at least one of the drug entity tag, the adverse reaction event entity tag and the age entity tag.
4. The adverse reaction event early warning method according to claim 2, wherein the causal relationship entity is determined by:
matching related segmented words in the text information with preset causal relationship entities based on the semantic similarity between them, and determining the causal relationship entity based on the matching result; and/or
performing causal relationship entity extraction on the text information with a pre-trained entity extraction model to obtain the causal relationship entity.
5. The adverse reaction event early warning method according to any one of claims 1 to 4, wherein the acquiring the target text set includes:
acquiring information documents related to medicines;
extracting candidate text information in the information document, and classifying the candidate text information to screen out target text information related to the adverse reaction event;
and determining the target text set based on at least two of the target text information, text information in a safety report of the clinical trial drug and text information in safety monitoring data of the marketed drug.
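Claim 5 screens candidate text from information documents with a classifier and then merges at least two text sources. In the sketch below, a keyword rule stands in for the trained classifier. The cue words, the naive split on full stops, and the function names are assumptions made only for illustration.

```python
# Hypothetical keyword cues standing in for the trained text classifier.
ADR_CUES = ("adverse", "side effect", "toxicity", "reaction")

def is_adr_related(sentence: str) -> bool:
    s = sentence.lower()
    return any(cue in s for cue in ADR_CUES)

def screen_candidates(document: str) -> list:
    """Split a document into candidate sentences (naively, on '.') and keep ADR-related ones."""
    candidates = [s.strip() for s in document.split(".") if s.strip()]
    return [s for s in candidates if is_adr_related(s)]

def build_target_text_set(*sources: list) -> list:
    """Merge at least two non-empty sources, dropping exact duplicates and keeping order."""
    non_empty = [s for s in sources if s]
    if len(non_empty) < 2:
        raise ValueError("at least two non-empty text sources are required")
    seen, merged = set(), []
    for source in non_empty:
        for text in source:
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged
```

In use, the screened literature text, text from clinical-trial safety reports and text from post-marketing monitoring data would be passed as separate sources.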
6. The adverse reaction event early warning method according to any one of claims 1 to 4, wherein the entity extraction of the text information in the target text set to obtain event entity information includes:
acquiring form information from the text information, and performing entity masking on the form information to obtain a cloze template;
determining a target text sequence based on the text information and the cloze template;
inputting the target text sequence into a pre-trained language model, and outputting the event entity information;
wherein the pre-trained language model is obtained by pre-training an initial language model based on aligned entities, a preset entity filling task and a cloze task; the aligned entities are obtained by performing entity alignment between the text information and the associated form information in sample documents.
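The input construction in claim 6 (mask the form values to get a cloze template, then join it with the free text) and the entity alignment used for pre-training can be sketched as follows. The `[MASK]`/`[SEP]` tokens follow common BERT-style conventions and are assumptions, as is using one mask per field (real tokenizers often need one mask per sub-word). Running the actual pre-trained model is out of scope here.

```python
MASK = "[MASK]"  # assumed BERT-style mask token
SEP = "[SEP]"    # assumed BERT-style separator token

def build_cloze_template(form: dict) -> str:
    """Mask every field value in the form, keeping field names as prompts."""
    return "; ".join(f"{field}: {MASK}" for field in form)

def build_target_sequence(text: str, form: dict) -> str:
    """Concatenate the free text and the cloze template into one model input."""
    return f"{text} {SEP} {build_cloze_template(form)}"

def align_entities(text: str, form: dict) -> dict:
    """Keep form fields whose value also appears verbatim in the text (naive alignment)."""
    return {field: value for field, value in form.items() if value and value in text}
```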
7. The adverse reaction event early warning method according to any one of claims 1 to 4, wherein, after performing entity extraction on the text information in the target text set to obtain the event entity information, the method further includes:
normalizing the entities in the event entity information based on at least one of preset dictionaries, the Common Terminology Criteria for Adverse Events (CTCAE) and preset keywords;
wherein the normalized entities are used for performing intelligent analysis and early warning on the attention degree of the adverse reaction event to obtain the early warning result.
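Claim 7 normalizes extracted entities before the warning step. The sketch below shows a dictionary lookup for event terms plus a keyword pattern for severity grades (CTCAE grades run from 1 to 5). The synonym dictionary entries and the regular expression are hypothetical examples, not an official CTCAE or MedDRA mapping.

```python
import re
from typing import Optional

# Hypothetical synonym dictionary mapping surface forms to preferred terms.
PREFERRED_TERMS = {
    "skin rash": "rash",
    "rash": "rash",
    "elevated alt": "alanine aminotransferase increased",
    "alt increased": "alanine aminotransferase increased",
}

# Hypothetical keyword pattern for grades written in free text (CTCAE uses 1 to 5).
GRADE_PATTERN = re.compile(r"\bgrade\s*([1-5])\b", re.IGNORECASE)

def normalize_event(term: str) -> Optional[str]:
    """Lower-case, collapse whitespace, then look the term up in the dictionary."""
    key = " ".join(term.lower().split())
    return PREFERRED_TERMS.get(key)

def normalize_grade(text: str) -> Optional[int]:
    """Extract a numeric grade such as 'Grade 3' or 'grade3', if present."""
    m = GRADE_PATTERN.search(text)
    return int(m.group(1)) if m else None
```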
8. An adverse reaction event early warning device, characterized by comprising:
the text set acquisition unit is used for acquiring a target text set, wherein the target text set comprises a plurality of text information related to adverse reaction events;
the entity extraction unit is used for carrying out entity extraction on the text information in the target text set to obtain event entity information, wherein the event entity information comprises entities and relations among the entities;
and the event early warning unit is used for carrying out intelligent analysis early warning on the attention degree of the adverse reaction event based on the event entity information to obtain an early warning result.
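The device of claim 8 is three units applied in sequence. One way to picture that wiring is the sketch below, where each unit is an injected callable. The class name, field names and result shape are hypothetical; the claim does not prescribe any software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class AdverseReactionWarningDevice:
    """Hypothetical wiring of the three claimed units as injected callables."""
    acquire_text_set: Callable[[], List[str]]              # text set acquisition unit
    extract_entities: Callable[[str], Dict[str, Any]]      # entity extraction unit
    warn: Callable[[Dict[str, Any]], str]                  # event early warning unit

    def run(self) -> List[Dict[str, Any]]:
        """Acquire texts, extract entities from each, and attach a warning result."""
        results = []
        for text in self.acquire_text_set():
            entities = self.extract_entities(text)
            results.append({"text": text, "entities": entities, "warning": self.warn(entities)})
        return results
```

Injecting the units keeps each one replaceable, for example swapping a rule-based warning unit for a learned one without touching extraction.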
9. An adverse reaction event early warning system, characterized by comprising a client and a server; wherein:
the client is used for receiving the keywords to be searched and sending the keywords to be searched to the server; the keyword to be searched comprises a target entity and/or a target early warning result;
the server is used for receiving the keywords to be searched of the client and determining adverse reaction event information related to the keywords to be searched based on event entity information and early warning results of adverse reaction events; wherein, the event entity information and the early warning result of the adverse reaction event are determined based on the adverse reaction event early warning method according to any one of claims 1 to 7.
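On the server side of claim 9, a keyword can name a target entity or a target early warning result. The lookup might be sketched as below, assuming a hypothetical in-memory record shape of `{"entities": {...}, "warning": ...}` and exact case-insensitive matching. The client/server transport is not shown.

```python
from typing import Any, Dict, List

def search_events(records: List[Dict[str, Any]], keyword: str) -> List[Dict[str, Any]]:
    """Return records whose entity values or warning result match the keyword.

    Hypothetical record shape: {"entities": {...}, "warning": "..."}.
    """
    kw = keyword.strip().lower()
    hits = []
    for rec in records:
        values = [str(v).lower() for v in rec["entities"].values()]
        if kw in values or kw == str(rec["warning"]).lower():
            hits.append(rec)
    return hits
```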
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the adverse reaction event warning method of any one of claims 1 to 7 when the computer program is executed.
CN202310685558.7A 2023-06-09 2023-06-09 Adverse reaction event early warning method, device, system and electronic equipment Pending CN116913549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310685558.7A CN116913549A (en) 2023-06-09 2023-06-09 Adverse reaction event early warning method, device, system and electronic equipment

Publications (1)

Publication Number Publication Date
CN116913549A (en) 2023-10-20

Family

ID=88361845

Country Status (1)

Country Link
CN (1) CN116913549A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558464A (en) * 2024-01-12 2024-02-13 四川大学华西医院 (West China Hospital of Sichuan University) Method for constructing ADR prediction model of elderly patient, prediction system and storage medium
CN117558464B (en) * 2024-01-12 2024-04-26 四川大学华西医院 (West China Hospital of Sichuan University) Method for constructing ADR prediction model of elderly patient, prediction system and storage medium

Similar Documents

Publication Publication Date Title
US10818397B2 (en) Clinical content analytics engine
US20200381087A1 (en) Systems and methods of clinical trial evaluation
WO2021068601A1 (en) Medical record detection method and apparatus, device and storage medium
US20220044812A1 (en) Automated generation of structured patient data record
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN114996388A (en) Intelligent matching method and system for diagnosis name standardization
CN112541066A (en) Text-structured-based medical and technical report detection method and related equipment
CN116913549A (en) Adverse reaction event early warning method, device, system and electronic equipment
G. Rodrigo et al. Machine learning from crowds: A systematic review of its applications
Falissard et al. Neural translation and automated recognition of ICD-10 medical entities from natural language: Model development and performance assessment
CN113707304B (en) Triage data processing method, triage data processing device, triage data processing equipment and storage medium
CN116913548A (en) Adverse reaction data analysis method, device, electronic equipment and storage medium
Wong et al. Medication-rights detection using incident reports: A natural language processing and deep neural network approach
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product
CN112053760B (en) Medication guide method, medication guide device, and computer-readable storage medium
JP6026036B1 (en) DATA ANALYSIS SYSTEM, ITS CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM
US8756234B1 (en) Information theory entropy reduction program
US10586616B2 (en) Systems and methods for generating subsets of electronic healthcare-related documents
WO2022061259A1 (en) System and method for automatic analysis and management of a workers' compensation claim
Armstrong et al. Person or PC? A Comparison of Human and Computer Coding as Content Analyses Tools Evaluating Severe Weather
Ficheur et al. Interoperability of medical databases: construction of mapping between hospitals laboratory results assisted by automated comparison of their distributions
CN116992839B (en) Automatic generation method, device and equipment for medical records front page
CN114822859B (en) Treatment thread mining and searching method and device
CN111079420B (en) Text recognition method and device, computer readable medium and electronic equipment
US20230143418A1 (en) Method and system for determining relationships between linguistic entities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination