CN112766903A - Method, apparatus, device and medium for identifying adverse events - Google Patents

Info

Publication number
CN112766903A
Authority
CN
China
Prior art keywords
text
word
adverse event
adverse
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110065632.6A
Other languages
Chinese (zh)
Other versions
CN112766903B (en)
Inventor
赵奇
金毅
黄晞益
刘戈
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AstraZeneca Investment China Co Ltd
Original Assignee
AstraZeneca Investment China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AstraZeneca Investment China Co Ltd
Priority to CN202110065632.6A
Publication of CN112766903A
Application granted
Publication of CN112766903B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method, an apparatus, a device and a medium for identifying adverse events. The method comprises: selectively obtaining text to be recognized from one or more data sources; selecting a recognition model corresponding to the type of the text; and performing semantic recognition on the text using the selected recognition model to identify adverse events in the text. With this method, a different recognition model can be selected for each text type and then used to identify adverse events in the text to be recognized, so that manual screening and identification are not needed. This avoids missed or late adverse event reports and allows adverse events to be reported in a timely and accurate manner, so that companies, hospitals and other places that manufacture or use drugs or medical devices can improve the accuracy and timeliness of adverse event reporting, saving resources, improving efficiency and empowering the whole process of adverse event identification and reporting.

Description

Method, apparatus, device and medium for identifying adverse events
Technical Field
The present disclosure relates to the field of medicine, and more particularly, to a method, apparatus, device, and medium for identifying adverse events.
Background
In recent years, as regulations have become stricter, the requirements for Adverse Event (AE) reporting have become increasingly demanding. The national drug administration requires drug marketing authorization holders to establish a sound adverse drug event monitoring system and to report adverse events in time. Late reporting may result in the product being removed from the market or even the drug approval document being revoked. Therefore, pharmaceutical companies currently require all employees to report to the pharmacovigilance department on the day they learn of an adverse event, so that product safety is assessed in time and patient safety is ensured.
In addition, in places where drugs or medical devices are used (such as hospitals), there is a growing number of adverse events caused by drug exposure during pregnancy (of maternal or paternal origin), drug exposure during lactation, drug overdose, drug abuse, misuse, off-label use, medication errors, occupational exposure, lack of efficacy and disease progression, exposure to pathogens, drug interactions, medical device malfunction, death of unknown cause, suicide or attempted suicide, and unexpected benefits. If these adverse events are not reported to the relevant pharmacovigilance departments in a timely and effective manner, such places face substantial risks of compensation claims, accountability and even closure, and it is difficult to ensure the safety of patients or users.
At present, adverse events are identified by manual screening. As the sources of adverse event information become more complex and diverse, manual screening requires ever more personnel, while the available human resources are limited. The compliance risk of adverse event reporting is therefore increasingly prominent.
With complex and diverse sources of adverse event information and limited human resources, the need to avoid missed or late AE reports and to report AEs in a timely and accurate manner grows day by day.
Therefore, there is a need for a method that automatically identifies adverse events and handles the different sources of adverse event information separately, so as to avoid missed or late AE reports and to report AEs in a timely and accurate manner.
Disclosure of Invention
To solve the above problems, the present disclosure provides a method for identifying adverse events that handles different sources of adverse event information separately, so as to avoid missed or late AE reports and to report AEs in a timely and accurate manner. This helps companies, hospitals and other places that manufacture or use drugs or medical devices to improve the accuracy and timeliness of adverse event reporting, saving resources, improving efficiency and empowering the whole process of AE identification and reporting.
The embodiment of the disclosure provides a method for identifying adverse events, which comprises the following steps: selectively obtaining text to be recognized from one or more data sources; selecting a recognition model corresponding to the type of the text according to the type of the text; and performing semantic recognition on the text using the selected recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, the type of the text is determined based on a length of the text and/or a source of the text.
According to an embodiment of the present disclosure, selecting a recognition model corresponding to the type of the text includes: selecting a first recognition model if the type of the text is a first type, wherein the first recognition model comprises a word segmenter, a converter, a feature extractor and a classifier. The word segmenter segments the sentences in the text, the converter converts the segmentation results into vector sequences, the feature extractor extracts semantic features based on the vector sequences, and the classifier judges whether the text contains an adverse event based on the extracted semantic features.
According to an embodiment of the present disclosure, the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter segments the text character by character and the second word segmenter segments the text word by word; the converter comprises a first converter and a second converter, wherein the first converter converts the segmentation result of the first word segmenter into a character vector sequence and the second converter converts the segmentation result of the second word segmenter into a word vector sequence.
According to an embodiment of the present disclosure, segmenting the text word by word with the second word segmenter includes: generating, according to a trie (dictionary tree) built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of the sentences in the text, thereby segmenting the text word by word.
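The dictionary-based step above can be sketched roughly as follows. This is a minimal illustration, assuming a nested-dict trie and toy English-like dictionary entries without spaces; the names `build_trie` and `build_dag`, the `"$end"` marker and the sample dictionaries are assumptions made for the example and do not come from the disclosure.

```python
from typing import Dict, Iterable, List

END = "$end"  # hypothetical end-of-word marker; never collides with a 1-char key


def build_trie(*dictionaries: Iterable[str]) -> dict:
    """Merge a general dictionary and a domain dictionary into one trie."""
    root: dict = {}
    for dictionary in dictionaries:
        for word in dictionary:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = True
    return root


def build_dag(sentence: str, trie: dict) -> Dict[int, List[int]]:
    """Map each start index to every end index that closes a dictionary word.

    A single character is always allowed as a fallback, so every position
    has at least one outgoing edge and the graph covers the whole sentence.
    """
    dag: Dict[int, List[int]] = {}
    for start in range(len(sentence)):
        ends: List[int] = []
        node = trie
        for pos in range(start, len(sentence)):
            node = node.get(sentence[pos])
            if node is None:
                break
            if END in node:
                ends.append(pos)
        if start not in ends:
            ends.insert(0, start)  # single-character fallback edge
        dag[start] = ends
    return dag


general_dictionary = {"take", "drug"}   # toy general dictionary (assumption)
domain_dictionary = {"drugx"}           # toy domain dictionary (assumption)
trie = build_trie(general_dictionary, domain_dictionary)
dag = build_dag("takedrugx", trie)
```

Each path through `dag` from index 0 to the end of the sentence corresponds to one candidate segmentation; how the final path is chosen is not specified in this passage.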
According to an embodiment of the present disclosure, performing semantic recognition on the text using the selected recognition model to identify an adverse event in the text includes: segmenting the sentences in the text character by character using the first word segmenter and word by word using the second word segmenter; converting the segmentation result of the first word segmenter into a character vector sequence using the first converter and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter; extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and judging, using the classifier and based on the extracted semantic features, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the probability that an adverse event occurs in the text is greater than a preset threshold.
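The data flow of this embodiment (two segmentations, two vector sequences, one feature vector, one thresholded probability) can be sketched as below. Every component is a deliberately simple stand-in: the whitespace word segmenter, the deterministic toy vectors, the mean-pooling feature extractor and the logistic classifier with hand-set weights are all assumptions for illustration, not the models used in the disclosure.

```python
import math
from typing import List, Sequence

Vector = List[float]
DIM = 4  # hypothetical vector size


def char_segment(text: str) -> List[str]:
    """First segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def word_segment(text: str) -> List[str]:
    """Second segmenter stand-in: whitespace split instead of a dictionary."""
    return text.split()


def to_vectors(tokens: Sequence[str]) -> List[Vector]:
    """Converter stand-in: deterministic toy vectors, not learned embeddings."""
    return [[((sum(map(ord, tok)) + k) % 10) / 10.0 for k in range(DIM)]
            for tok in tokens]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a vector sequence; an empty sequence pools to zeros."""
    if not vectors:
        return [0.0] * DIM
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(DIM)]


def extract_features(char_vecs: Sequence[Vector],
                     word_vecs: Sequence[Vector]) -> Vector:
    """Feature extractor stand-in: pool each sequence, then concatenate."""
    return mean_pool(char_vecs) + mean_pool(word_vecs)


def adverse_event_probability(features: Vector,
                              weights: Vector, bias: float) -> float:
    """Classifier stand-in: logistic function over a weighted sum."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def contains_adverse_event(text: str, weights: Vector, bias: float,
                           threshold: float = 0.5) -> bool:
    """Apply the rule from the text: positive when probability > threshold."""
    char_vecs = to_vectors(char_segment(text))
    word_vecs = to_vectors(word_segment(text))
    features = extract_features(char_vecs, word_vecs)
    return adverse_event_probability(features, weights, bias) > threshold
```

In a real system the stand-ins would be replaced by trained components; only the strict "greater than a preset threshold" comparison is taken directly from the text.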
According to an embodiment of the present disclosure, selecting a recognition model corresponding to the type of the text includes: selecting a second recognition model if the type of the text is a second type, wherein the second recognition model comprises a named entity recognizer, an adverse event name recognizer, a semantic role recognizer, a semantic role filter and an event determiner. The named entity recognizer recognizes named entities in the text; the adverse event name recognizer recognizes adverse event names in the text; the semantic role recognizer recognizes, according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; the semantic role filter screens out at least a part of the roles according to the recognized semantic roles and a preset rule; and the event determiner determines whether the text contains an adverse event according to the screened roles and preset trigger words.
According to an embodiment of the present disclosure, performing semantic recognition on the text using the selected recognition model to identify an adverse event in the text includes: recognizing named entities in the text using the named entity recognizer; recognizing adverse event names in the text using the adverse event name recognizer; recognizing, using the semantic role recognizer and according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; screening out at least a part of the roles using the semantic role filter according to the recognized semantic roles and a preset rule; and determining, using the event determiner and according to the screened roles and the preset trigger words, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the screened roles and the preset trigger words satisfy a preset event triple.
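The final decision step can be illustrated with a small rule check. The passage does not spell out the exact form of the preset event triple, so the sketch below adopts one hypothetical reading: the triple is (subject, cause, adverse outcome), and it counts as satisfied when all three screened roles are filled and at least one preset trigger word occurs in the sentence. The class `ScreenedRoles`, the function `satisfies_event_triple` and the sample trigger words are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ScreenedRoles:
    """Roles left after the semantic role filter (hypothetical structure)."""
    subject: Optional[str] = None
    cause: Optional[str] = None
    outcome: Optional[str] = None


def satisfies_event_triple(roles: ScreenedRoles,
                           tokens: Iterable[str],
                           trigger_words: Set[str]) -> bool:
    """Return True when every role is filled and a trigger word is present."""
    has_trigger = any(tok in trigger_words for tok in tokens)
    triple_complete = all((roles.subject, roles.cause, roles.outcome))
    return has_trigger and triple_complete


TRIGGERS = {"developed", "experienced"}  # toy trigger words (assumption)
sentence = "patient developed rash after taking drugx".split()
full = ScreenedRoles(subject="patient", cause="drugx", outcome="rash")
partial = ScreenedRoles(subject="patient", outcome="rash")
```

Here `satisfies_event_triple(full, sentence, TRIGGERS)` is true, while the same call with `partial` is false because no cause was screened out of the sentence.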
According to an embodiment of the present disclosure, the second recognition model further includes a coreference resolver configured to perform coreference resolution in the text to determine the association between a drug and an adverse event, and identifying the adverse event in the text using the selected recognition model further comprises: after the semantic role recognizer has recognized, according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text, completing coreference resolution in the text using the coreference resolver.
According to an embodiment of the present disclosure, the adverse event includes at least the following three elements: a subject, a cause, and an adverse outcome.
According to an embodiment of the present disclosure, the method further includes: feeding back the recognition result regarding the adverse event through a predetermined reporter.
According to an embodiment of the present disclosure, the recognition model is a recognition model for the medical field, and the adverse event is an adverse event in the medical field.
The embodiment of the present disclosure provides an apparatus for identifying an adverse event, including: an acquisition module configured to selectively acquire text to be recognized from one or more data sources; a selection module configured to select a recognition model corresponding to the type of the text according to the type of the text; and a recognition module configured to perform semantic recognition on the text using the selected recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, the type of the text is determined based on a length of the text and/or a source of the text.
According to an embodiment of the present disclosure, the selection module is configured to select a first recognition model if the type of the text is a first type, wherein the first recognition model comprises a word segmenter, a converter, a feature extractor and a classifier. The word segmenter segments the sentences in the text, the converter converts the segmentation results into vector sequences, the feature extractor extracts semantic features based on the vector sequences, and the classifier judges whether the text contains an adverse event based on the extracted semantic features.
According to an embodiment of the present disclosure, the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter segments the text character by character and the second word segmenter segments the text word by word; the converter comprises a first converter and a second converter, wherein the first converter converts the segmentation result of the first word segmenter into a character vector sequence and the second converter converts the segmentation result of the second word segmenter into a word vector sequence.
According to an embodiment of the present disclosure, segmenting the text word by word with the second word segmenter includes: generating, according to a trie (dictionary tree) built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of the sentences in the text, thereby segmenting the text word by word.
According to an embodiment of the present disclosure, the recognition module is configured to: segment the sentences in the text character by character using the first word segmenter and word by word using the second word segmenter; convert the segmentation result of the first word segmenter into a character vector sequence using the first converter and convert the segmentation result of the second word segmenter into a word vector sequence using the second converter; extract semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and judge, using the classifier and based on the extracted semantic features, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the probability that an adverse event occurs in the text is greater than a preset threshold.
According to an embodiment of the present disclosure, the selection module is configured to select a second recognition model if the type of the text is a second type, wherein the second recognition model comprises a named entity recognizer, an adverse event name recognizer, a semantic role recognizer, a semantic role filter and an event determiner. The named entity recognizer recognizes named entities in the text; the adverse event name recognizer recognizes adverse event names in the text; the semantic role recognizer recognizes, according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; the semantic role filter screens out at least a part of the roles according to the recognized semantic roles and a preset rule; and the event determiner determines whether the text contains an adverse event according to the screened roles and preset trigger words.
According to an embodiment of the present disclosure, the recognition module is configured to: recognize named entities in the text using the named entity recognizer; recognize adverse event names in the text using the adverse event name recognizer; recognize, using the semantic role recognizer and according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text; screen out at least a part of the roles using the semantic role filter according to the recognized semantic roles and a preset rule; and determine, using the event determiner and according to the screened roles and the preset trigger words, whether the text contains an adverse event, wherein the text is determined to contain an adverse event when the screened roles and the preset trigger words satisfy a preset event triple.
According to an embodiment of the present disclosure, the second recognition model further includes a coreference resolver configured to perform coreference resolution in the text to determine the association between a drug and an adverse event, and the recognition module is further configured to: after the semantic role recognizer has recognized, according to the recognized named entities and adverse event names, the semantic roles involved in the occurrence of an adverse event in a sentence of the text, complete coreference resolution in the text using the coreference resolver.
An embodiment of the present disclosure provides a device for identifying an adverse event, including: a processor, and a memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform any of the methods described above.
The disclosed embodiments provide a computer-readable recording medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, cause the processor to perform any one of the methods described above.
Embodiments of the present disclosure provide a method, an apparatus, a device and a medium for identifying adverse events. With this method, a different recognition model can be selected for each text type and then used to identify adverse events in the text to be recognized, so that manual screening and identification are not needed. This avoids missed or late AE reports and allows AEs to be reported in a timely and accurate manner, so that companies, hospitals and other places that manufacture or use drugs or medical devices can improve the accuracy and timeliness of adverse event reporting, saving resources, improving efficiency and empowering the whole process of AE identification and reporting.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly introduced below. It is apparent that the drawings in the following description are only exemplary embodiments of the disclosure, and that other drawings may be derived from those drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 illustrates a flow diagram of a method of identifying adverse events according to an embodiment of the present disclosure.
Fig. 2A illustrates a block diagram of a first recognition model according to an embodiment of the present disclosure.
Fig. 2B illustrates a flow diagram for adverse event recognition of text using a first recognition model in accordance with an embodiment of the present disclosure.
Fig. 2C is an example of adverse event recognition of a first type of text using a first recognition model.
Fig. 3A illustrates a block diagram of a second recognition model according to an embodiment of the present disclosure.
Fig. 3B illustrates a flow diagram for adverse event recognition of text using a second recognition model in accordance with an embodiment of the present disclosure.
Fig. 3C is an example of adverse event recognition of a second type of text using a second recognition model.
Fig. 3D and 3E illustrate examples of semantic role annotated results in accordance with an embodiment of the disclosure.
Fig. 4 illustrates a block diagram of an apparatus 400 for identifying adverse events according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
In the present specification and the drawings, substantially the same or similar steps and elements are denoted by the same or similar reference numerals, and repeated descriptions of the steps and elements will be omitted. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance or order.
In the specification and drawings, elements are described in singular or plural according to embodiments. However, the singular and plural forms are appropriately selected for the proposed cases only for convenience of explanation and are not intended to limit the present disclosure thereto. Thus, the singular may include the plural and the plural may also include the singular, unless the context clearly dictates otherwise.
In the prior art, Adverse Events (AEs) are identified by manual screening. When the sources of adverse event information are complex and diverse and human resources are limited, the risk of identifying AEs by manual screening becomes increasingly prominent, which creates compliance risks for adverse event reporting at companies, hospitals and other places where drugs or medical devices are manufactured or used.
To solve the above problems, the present disclosure provides a method for identifying adverse events that handles different sources of adverse event information separately, so as to avoid missed or late AE reports and to report AEs in a timely and accurate manner. This helps companies, hospitals and other places that manufacture or use drugs or medical devices to improve the accuracy and timeliness of adverse event reporting, saving resources, improving efficiency and empowering the whole process of AE identification and reporting.
The method for identifying adverse events provided by the present disclosure is described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a flow diagram of a method of identifying adverse events according to an embodiment of the present disclosure.
Referring to fig. 1, in step S110, text to be recognized may be selectively acquired from one or more data sources.
As an example, the one or more data sources may be papers published on the internet. The published papers relate to the medical field and may describe adverse events arising at companies, hospitals or other places that manufacture or use drugs or medical devices. A paper is generally a document-level text whose content is usually very long.
As another example, the one or more data sources may be data recorded by a Call Center at a company, hospital or other place that manufactures or uses drugs or medical devices. Typically, a person calls the call center and the operator records the information from the call, which may include matters reported or mentioned by a patient or customer. This information is usually relatively short, e.g., a few words.
As yet another example, the one or more data sources may be data from call records, where a call record is typically a summary written by a medical representative of a pharmaceutical company or the like after a call with a doctor, and may contain adverse events mentioned by the doctor. The content length of a call record is typically moderate, falling between those of the two data sources above.
As another example, data may be selected from the one or more data sources either manually or through a preset automatic configuration. For example, papers published on the internet, data recorded by a call center, or call records may be manually selected as the one or more data sources. Alternatively, an automatic configuration may be preset so that, in some time periods (for example, 00:00 to 07:00 each day), data recorded by the call center is selected as the one or more data sources, and in another time period (for example, 08:00 to 20:00 each day), papers published on the internet or call records are selected as the one or more data sources. This is not limited herein, and other time periods or data sources may be selected in other manners.
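A preset automatic configuration of this kind can be pictured as a small schedule table. The sketch below uses the two example time periods given above; the source identifiers (`"call_center"`, `"papers"`, `"call_records"`), the inclusive window boundaries and the empty result outside any window are assumptions for illustration.

```python
from datetime import time
from typing import List, Tuple

# (window start, window end, data sources active in that window)
SCHEDULE: List[Tuple[time, time, List[str]]] = [
    (time(0, 0), time(7, 0), ["call_center"]),
    (time(8, 0), time(20, 0), ["papers", "call_records"]),
]


def select_sources(now: time) -> List[str]:
    """Return the data sources configured for the given time of day."""
    for start, end, sources in SCHEDULE:
        if start <= now <= end:  # boundaries treated as inclusive (assumption)
            return list(sources)
    return []  # no window configured for this time


night_sources = select_sources(time(3, 30))
day_sources = select_sources(time(12, 0))
late_sources = select_sources(time(22, 0))
```

Manual selection would simply bypass the schedule and pass an explicit list of sources to the acquisition step.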
The text to be recognized, obtained from the data sources in the above examples, may or may not contain adverse event information.
In step S120, a recognition model corresponding to the type of the text may be selected according to the type of the text.
According to an embodiment of the present disclosure, the type of the text may be determined based on the length of the text and/or the source of the text. The type of the text may be a first type or a second type.
As an example, the type of the text may be determined based on the length of the text. For example, text with a particularly long content length may be determined to be of the first type, text with a relatively short content length may be determined to be of the second type, and text with a moderate content length may be determined to be of the first type.
As another example, the type of the text may be determined based on the source of the text. For example, text from a paper published on the internet may be determined to be of the first type, text from a call record may be determined to be of the first type, and text from data recorded by a call center may be determined to be of the second type.
As yet another example, the type of the text may be determined based on both the length of the text and the source of the text. For example, text that comes from a paper published on the internet and has an especially long content length may be determined to be of the first type; text that comes from a call record and has a moderate content length may be determined to be of the first type; text that comes from a call record and has an especially long content length may be determined to be of the second type; text that comes from data recorded by a call center and has an especially long content length may be determined to be of the first type; and text that comes from data recorded by a call center and has an especially short content length may be determined to be of the second type.
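The length-only and source-only examples can be sketched as two small functions. The passage gives no numeric boundaries, so the character thresholds below are purely hypothetical, as are the source identifiers; a combined rule could be written as a lookup keyed on both the source and the length bucket.

```python
FIRST_TYPE = "first"
SECOND_TYPE = "second"

SHORT_MAX_CHARS = 100    # hypothetical upper bound for "relatively short"
LONG_MIN_CHARS = 2000    # hypothetical lower bound for "particularly long"

# Source-based mapping taken from the source-only example in the text.
SOURCE_TYPES = {
    "paper": FIRST_TYPE,
    "call_record": FIRST_TYPE,
    "call_center": SECOND_TYPE,
}


def length_bucket(text: str) -> str:
    """Classify content length as 'short', 'medium' or 'long'."""
    if len(text) <= SHORT_MAX_CHARS:
        return "short"
    if len(text) >= LONG_MIN_CHARS:
        return "long"
    return "medium"


def type_by_length(text: str) -> str:
    """Long and medium text map to the first type, short text to the second."""
    return SECOND_TYPE if length_bucket(text) == "short" else FIRST_TYPE


def type_by_source(source: str) -> str:
    """Look up the text type for a known data source."""
    return SOURCE_TYPES[source]
```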
According to an embodiment of the present disclosure, the recognition models may include a first recognition model and a second recognition model.
According to the embodiment of the present disclosure, in the case where the type of the text is a first type, a first recognition model corresponding to the first type is selected, which will be described in detail below with reference to fig. 2A.
According to the embodiment of the present disclosure, in the case where the type of the text is the second type, a second recognition model corresponding to the second type is selected, which will be described in detail below with reference to fig. 3A.
In step S130, semantic recognition may be performed on the text using the selected recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, semantic recognition may be performed on the text using the selected first recognition model to recognize an adverse event in the text, which will be described in detail below with reference to fig. 2B.
According to the embodiment of the present disclosure, semantic recognition may be performed on the text to identify an adverse event in the text by using the selected second recognition model, and the recognition process will be described in detail below with reference to fig. 3B.
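Steps S120 and S130 amount to a dispatch from text type to recognition model. A minimal sketch follows, in which the two model functions are placeholder stubs (simple keyword checks) standing in for the trained first and second recognition models; all names here are assumptions.

```python
def identify_adverse_events(text, text_type, models):
    """Select the model registered for `text_type` (S120) and run it (S130)."""
    try:
        model = models[text_type]
    except KeyError:
        raise ValueError(f"no recognition model registered for {text_type!r}")
    return model(text)


# Placeholder stubs only: the real models are the trained recognition models.
def first_model_stub(text):
    return {"model": "first", "contains_ae": "dizziness" in text}


def second_model_stub(text):
    return {"model": "second", "contains_ae": "bleeding" in text}


MODELS = {"first": first_model_stub, "second": second_model_stub}
```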
The method of identifying adverse events according to embodiments of the present disclosure has been described above in connection with fig. 1. Because the method selects a recognition model according to the type of the text, adverse events in the text can be identified automatically, without manual identification. This avoids missed or delayed reports of adverse events and allows adverse events in the text to be identified, and then reported, in a timely and accurate manner. As a result, the accuracy and timeliness of adverse event reporting by companies, hospitals, and other organizations that manufacture or use drugs or medical devices are improved, and the entire process of identifying and reporting adverse events saves resources, gains efficiency, and is empowered.
The first recognition model, the second recognition model and the recognition process thereof will be described in detail below with reference to fig. 2A and 3A.
FIG. 2A illustrates a block diagram of a first recognition model according to an embodiment of the present disclosure.
Referring to fig. 2A, the first recognition model 200 may include a tokenizer 210, a transformer 220, a feature extractor 230, and a classifier 240.
The tokenizer 210 may be configured to segment the sentences in the text. It may include a first tokenizer, configured to segment the text character by character, and a second tokenizer, configured to segment the text word by word.
According to the embodiment of the present disclosure, the first tokenizer segments each sentence in the text character by character, yielding segmented sentences at single-character granularity.
As an example, the text to be recognized may be "dizziness occurs after a patient takes aspirin", and the character-level segmentation result may be "one/patient/take/use/a/s/p/lin/back/show/head/halo", where each slash-separated item corresponds to a character of the original sentence.
According to the embodiment of the present disclosure, the second tokenizer segments the text word by word. It may first generate a directed acyclic graph (DAG) of all possible segmentations of the sentence from a dictionary tree (trie) built from a general dictionary and a domain-specific dictionary, and then obtain the optimal segmentation by Viterbi decoding, yielding segmented sentences at word granularity. The general dictionary may be obtained from an open-source lexicon, and the domain-specific dictionary is a medical-field dictionary that may be constructed according to actual needs.
As an example, the text to be recognized may be "a patient appears dizzy after taking aspirin", and the word-level segmentation result may be "a/patient/taken/aspirin/back/appeared/dizzy".
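The two segmentation granularities can be sketched as follows. This is a toy illustration under stated assumptions: the dictionary `FREQ` and its frequencies are made up, a plain dict lookup stands in for the dictionary tree, an unspaced English string stands in for the original sentence, and the best path through the DAG is found with a Viterbi-style dynamic program over log-probabilities.

```python
import math

# Toy merged dictionary (general + domain) with made-up frequencies.
FREQ = {"patient": 50, "pat": 5, "took": 40, "to": 60,
        "aspirin": 30, "as": 45, "pi": 3, "in": 70}
TOTAL = sum(FREQ.values())


def char_segment(sentence):
    """Character-level segmentation: one token per character."""
    return list(sentence)


def build_dag(sentence):
    """Map each start index to the end indices of candidate tokens."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i + 1, n + 1) if sentence[i:j] in FREQ]
        if i + 1 not in ends:
            ends.append(i + 1)  # single-character fallback keeps the DAG connected
        dag[i] = ends
    return dag


def segment(sentence):
    """Word-level segmentation: highest-probability path through the DAG."""
    dag = build_dag(sentence)
    n = len(sentence)
    best = {n: (0.0, n)}  # index -> (best log-probability to the end, next index)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j], 1) / TOTAL) + best[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words
```

Unknown single characters are given a frequency of 1, so the dynamic program prefers dictionary words whenever they cover the input.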
The converter 220 may be configured to convert the segmentation results into vector sequences. It may include a first converter, configured to convert the segmentation result of the first tokenizer into a character vector sequence, and a second converter, configured to convert the segmentation result of the second tokenizer into a word vector sequence.
According to the embodiment of the disclosure, the first converter may take the segmentation result of the first tokenizer (for example, "one/patient/take/use/a/s/p/lin/back/show/head/halo" described above) and convert it into a character vector sequence (char-sequence) using an open-source pre-trained language model (for example, an ALBERT model) that has been fine-tuned on medical-field articles so that it captures the domain-specific semantics of the medical field. The medical-field articles may be obtained from publicly available medical publications.
According to the embodiment of the present disclosure, the second converter may convert the segmentation result of the second tokenizer (for example, "a/patient/taken/aspirin/back/appeared/dizzy" described above) into a word vector sequence (word-sequence) using an algorithm that converts words into vectors (for example, the CBOW algorithm).
Feature extractor 230 may be used to extract semantic features based on the vector sequence.
According to the embodiment of the present disclosure, the feature extractor 230 may feed the character vector sequence (char-sequence) and the word vector sequence (word-sequence) obtained above into a deep learning model separately and extract their semantic features using the convolution layers of the model, yielding two sentence vectors. The deep learning model may be trained in advance on labeled training samples; the samples may come from data recorded by a call center, and the labels indicate which data contain an adverse event and which do not.
As an example, the semantic features may be extracted using convolution kernels of sizes 3, 4, and 5 in the deep learning model.
As an example, the deep learning model may be a convolutional neural network (CNN), and more specifically an algorithm that classifies text using a convolutional neural network, such as the TextCNN algorithm.
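The convolution step can be illustrated in plain Python. The sketch below slides one filter per kernel width (3, 4 and 5) over a vector sequence, applies ReLU and max-over-time pooling, and returns a short sentence vector. The weights from `toy_kernels` are untrained placeholders and the single filter per width is a simplification; a real TextCNN would learn many filters per width.

```python
def conv1d_max(seq, kernel):
    """Slide `kernel` (k x d) over `seq` (n x d); ReLU, then max over positions."""
    k = len(kernel)
    best = 0.0  # ReLU floor; also the result when the sequence is shorter than k
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        score = sum(w * x
                    for w_row, x_row in zip(kernel, window)
                    for w, x in zip(w_row, x_row))
        best = max(best, score)
    return best


def toy_kernels(dim, widths=(3, 4, 5)):
    """Untrained placeholder filters: each one simply averages its window."""
    return [[[1.0 / (k * dim)] * dim for _ in range(k)] for k in widths]


def sentence_vector(seq, kernels):
    """One pooled feature per kernel, concatenated into a sentence vector."""
    return [conv1d_max(seq, kernel) for kernel in kernels]
```

Running the same function once on the char-sequence and once on the word-sequence gives the two sentence vectors that the classifier consumes.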
The classifier 240 may be used to determine whether the text contains an adverse event based on the extracted semantic features.
According to the embodiment of the present disclosure, the classifier 240 may concatenate the two sentence vectors obtained above and feed the result into a fully connected layer of the deep learning model, trained with a cross-entropy loss function, to obtain a classification result indicating whether the text contains an adverse event. As an example, when the probability that the text contains an adverse event is greater than a predetermined threshold, it may be determined that the text contains an adverse event; the predetermined threshold may be, for example, 50% or 60%.
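The classification step can be sketched as concatenation, one fully connected layer, a softmax, and a threshold test. The weights and bias passed in are assumed to come from training, which is not shown; the function names and the class ordering (index 1 = "contains an adverse event") are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(char_vec, word_vec, weights, bias, threshold=0.5):
    """Return (probability of an adverse event, decision at `threshold`)."""
    features = char_vec + word_vec  # concatenate the two sentence vectors
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    p_ae = softmax(logits)[1]  # index 1 is assumed to be the AE class
    return p_ae, p_ae > threshold
```

Raising `threshold` from 0.5 to 0.6 trades some recall for fewer false alarms, which is the choice the two example values in the text reflect.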
The first recognition model and its components have been described above with reference to fig. 2A. Accordingly, adverse event recognition on a text using the first recognition model may follow the process shown in fig. 2B, which is a flowchart of performing adverse event recognition on a text using the first recognition model according to an embodiment of the present disclosure.
Referring to fig. 2B, in step S210, the sentences in the text may be segmented character by character using the first tokenizer and word by word using the second tokenizer.
In step S220, the segmentation result of the first tokenizer may be converted into a character vector sequence using the first converter, and the segmentation result of the second tokenizer may be converted into a word vector sequence using the second converter.
In step S230, semantic features may be extracted from the character vector sequence and the word vector sequence using the feature extractor.
In step S240, it may be determined whether the text contains an adverse event based on the extracted semantic features by using the classifier, wherein in the case that the probability of the occurrence of the adverse event in the text is greater than a predetermined threshold, it is determined that the text contains the adverse event.
For details of each step in fig. 2B, reference may be made to the description of the corresponding portion of fig. 2A, and details are not repeated here. Adverse event recognition on a text using the first recognition model is described below by way of example with reference to fig. 2C.
Fig. 2C is an example of adverse event recognition of a first type of text using a first recognition model.
Referring to fig. 2C, the text to be recognized may be "dizziness occurred after a patient takes aspirin" as described above.
First, in step S2021 the first tokenizer segments the text to be recognized character by character, and in step S2011 the second tokenizer segments it word by word, yielding "one/patient/take/use/a/s/p/lin/back/show/head/halo" and "a/patient/taken/aspirin/back/appeared/dizzy", respectively;
next, in step S2022 the first converter processes the character-level segmentation result, and in step S2012 the second converter processes the word-level segmentation result, yielding two different vector sequences, namely the character vector sequence (char-sequence) and the word vector sequence (word-sequence);
then, in step S2023 and step S2013 respectively, the feature extractor extracts semantic features from each vector sequence through a convolutional layer, yielding a new sentence vector for each;
next, in step S2030, the classifier processes the concatenation of the two new sentence vectors to obtain a binary classification result, and when the probability that the text contains an adverse event is greater than the predetermined threshold, it may be determined that the text contains an adverse event;
finally, in step S2040, the AE identification result is obtained.
The first recognition model and the process of recognizing the first type of text with it have been described in detail above with reference to figs. 2A to 2C. Next, the second recognition model and the process of recognizing the second type of text with it will be described in detail with reference to figs. 3A to 3E.
FIG. 3A illustrates a block diagram of a second recognition model according to an embodiment of the present disclosure.
Referring to fig. 3A, the second recognition model 300 may include a named entity identifier 310, an adverse event name identifier 320, a semantic role identifier 330, a semantic role filter 340, and an event determiner 350.
The named entity identifier 310 can be used to identify named entities in text.
According to an embodiment of the present disclosure, the named entity identifier 310 may identify named entities such as patients, drugs, factories, time, etc. by sequence modeling the text using a Transformer model in conjunction with Conditional Random Fields (CRF).
As an example, the text to be recognized may be "Professor Sun has two patients who developed vomiting and dizziness symptoms after taking aspirin, 100 mg/d. There may also be a few patients who developed bleeding symptoms after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] in combination." After the named entity identifier 310 identifies the named entities in the text, the sequence labeling result may be: "Professor Sun [PER] has two patients [PER] who developed vomiting and dizziness symptoms after taking aspirin [MED]. There may also be a few patients [PER] who developed bleeding symptoms after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] in combination.", where PER denotes a person and MED denotes a drug.
The adverse event name identifier 320 may be used to identify the adverse event name in the text.
According to an embodiment of the present disclosure, the adverse event name identifier 320 may generate a Dictionary tree using a Medical Dictionary for Regulatory Activities (MedDRA), and then find the adverse event name appearing therein using pattern recognition.
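Dictionary-tree lookup of adverse event names can be sketched as below. The three terms are illustrative stand-ins rather than actual MedDRA content, and the longest-match scan is one simple matching strategy assumed here; the disclosure does not specify the matching procedure beyond building a dictionary tree.

```python
_END = None  # sentinel key marking the end of a term; never equal to a character


def build_trie(terms):
    """Build a character-level dictionary tree from a list of terms."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[_END] = term
    return root


def find_terms(text, trie):
    """Return (start, end, term) for the longest term found at each position."""
    hits, i = [], 0
    while i < len(text):
        node, j, last = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                last = (i, j, node[_END])
        if last:
            hits.append(last)
            i = last[1]  # continue after the matched term
        else:
            i += 1
    return hits
```

For example, with a trie built from `["bleeding", "dizziness", "vomiting"]`, scanning "vomiting and dizziness" reports both names together with their character offsets.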
As an example, after the adverse event name identifier 320 performs adverse event name recognition on the above sequence labeling result, the result may be: "Professor Sun [PER] has two patients [PER] who developed vomiting and dizziness symptoms [AE] after taking aspirin [MED]. There may also be a few patients [PER] who developed bleeding symptoms [AE] after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] in combination.", where AE denotes an adverse event, and "vomiting and dizziness symptoms" and "bleeding symptoms" are the adverse event names.
Semantic role identifier 330 may be used to identify semantic roles in a sentence of text where an adverse event occurred based on the identified named entity and the adverse event name.
According to the embodiment of the present disclosure, the semantic role identifier 330 may perform semantic role labeling by using an open source model (e.g., BERT-BLSTM-CRF) according to the content in the text to be recognized, and find out corresponding semantic role components of the adverse events in the sentence, including core semantic roles (e.g., patient, doctor, etc.) and auxiliary semantic roles (e.g., medicine, AE, etc.).
As an example, the semantic role identifier 330 may perform semantic role identification on the name recognition result obtained above, and the semantic role labeling result may be: "Professor Sun [PER] [A0] has two patients [PER] [A0] who developed vomiting and dizziness symptoms [AE] [A1] after taking aspirin [MED] [A1]. There may also be a few patients [PER] [A0] who developed bleeding symptoms [AE] [A1] after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] [A1] in combination.", where A0 denotes the agent (the performer of an action) and A1 denotes the patient (the recipient of an action).
Semantic role filter 340 may be operable to filter out at least a portion of roles based on the identified semantic roles and predetermined rules.
According to the embodiment of the present disclosure, the semantic role filter 340 may filter out at least a part of the roles by applying rules, formulated in advance, that identify components unlikely to play a role. The rules may be heuristic; for example, a rule may target conditional clauses (such as "even if" or "if"), other hedged statements (such as "possibly"), or may retain only specific drug roles. Roles that satisfy the filtering rules are filtered out.
As an example, semantic role screening is performed on the semantic role labeling result obtained above; since no role in the result satisfies the rules, the information of all entities is retained.
As another example, suppose the text is "there may also be a few patients who took ticagrelor [produced by Shandong Qinghua Pharmaceutical] in combination". The patient role here is "there may also be a few patients", which is a hedged statement from which no specific patient information can be extracted, so it satisfies the rule and is filtered out. The drug role is "ticagrelor [produced by Shandong Qinghua Pharmaceutical]", which is not one of the required specific drug roles, so it is judged invalid by the rules and is filtered out.
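The heuristic screening can be sketched as a pair of rules applied to each role. The cue words in `HEDGE_WORDS`, the drug list `TARGET_DRUGS`, and the role representation (a dict with `text` and `label`) are all hypothetical; the disclosure only states the kinds of rules, not their contents.

```python
import re

# Hypothetical cue words for conditional or hedged statements.
HEDGE_WORDS = {"if", "possibly", "perhaps", "may", "might"}
# Hypothetical list of the specific drugs whose roles are retained.
TARGET_DRUGS = {"aspirin"}


def keep_role(role):
    """Apply the heuristic rules to one role: {'text': ..., 'label': ...}."""
    words = set(re.findall(r"[a-z]+", role["text"].lower()))
    if words & HEDGE_WORDS:
        return False  # hedged or conditional statement
    if role["label"] == "MED" and not (words & TARGET_DRUGS):
        return False  # not one of the required specific drugs
    return True


def filter_roles(roles):
    """Return only the roles that survive the rules."""
    return [role for role in roles if keep_role(role)]
```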
The event determiner 350 may be configured to determine whether the text includes an adverse event according to the screened roles and predetermined trigger words.
According to the embodiment of the present disclosure, the event determiner 350 determines whether the text includes an adverse event by checking whether the screened roles and the predetermined trigger words satisfy a preset event triple. The trigger words may be verbs such as "use" and "occur". The preset event triple consists of three elements: a patient, a trigger word, and a drug or an adverse event name. As an example, the preset event triple may be "patient-use-drug" or "patient-occur-AE".
As an example, the screened roles and the predetermined trigger words are extracted and checked against the designed event triple templates, yielding the event "two patients [Patient]-take-aspirin [Medicine]" and the event "two patients [Patient]-occur-vomiting and dizziness symptoms [AE]". These events satisfy the preset event triple condition and are therefore judged to be AE entries.
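Matching roles and trigger words against event triple templates can be sketched as follows. The template set, the label names and the function names are assumptions made for illustration; the decision rule (any matching triple means an adverse event is present) follows the description above.

```python
# Hypothetical templates: (subject label, trigger word, object label).
TEMPLATES = {("PER", "take", "MED"), ("PER", "occur", "AE")}


def match_triples(roles, triggers):
    """roles: list of (text, label); triggers: trigger verbs found in the text."""
    found = []
    for subj_text, subj_label in roles:
        for trigger in triggers:
            for obj_text, obj_label in roles:
                if (subj_label, trigger, obj_label) in TEMPLATES:
                    found.append((subj_text, trigger, obj_text))
    return found


def contains_adverse_event(roles, triggers):
    """The text is flagged when at least one preset triple is satisfied."""
    return bool(match_triples(roles, triggers))
```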
The second recognition model and its components have been described above with reference to fig. 3A. Accordingly, adverse event recognition on a text using the second recognition model may follow the process shown in fig. 3B, which is a flowchart of performing adverse event recognition on a text using the second recognition model according to an embodiment of the present disclosure.
Referring to fig. 3B, in step S310, a named entity in the text may be identified using the named entity identifier.
In step S320, the adverse event name in the text may be identified using the adverse event name identifier.
In step S330, the semantic roles of the adverse event in the sentences of the text may be identified according to the identified named entities and adverse event names using the semantic role identifier.
In step S340, at least a part of the roles may be filtered out according to the identified semantic roles and predetermined rules using the semantic role filter.
In step S350, the event determiner may be utilized to determine whether an adverse event is included in the text according to the screened role and the predetermined trigger word, where in a case that the screened role and the predetermined trigger word satisfy a preset event triple, it is determined that the text includes the adverse event.
For details of each step in fig. 3B, reference may be made to the description of the corresponding portion in fig. 3A, and details are not repeated here.
Particularly long texts (such as chapter-level texts) often contain conflicting or ambiguous references, so it is necessary to determine what each reference denotes. The second recognition model provided by the present disclosure therefore further includes a coreference resolver, which may be used to perform coreference resolution in the text to determine the association between a drug and an adverse event.
According to the embodiment of the disclosure, for a particularly long text (such as a chapter-level text), the coreference resolver may establish a causal relationship between a drug name and the drug's effect and, where references conflict, complete coreference resolution of the entities based on the similarity of entity attributes, thereby determining the association between the drug and the AE.
By way of example, the particularly long text may be "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d … … … aspirin has an anticoagulant effect, which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms." Here, "due to the anticoagulant effect, 2 patients developed local bleeding symptoms" lacks a drug subject and must be linked to the aspirin mentioned earlier through "anticoagulant effect". Substituting aspirin for "anticoagulant effect" gives: aspirin caused 2 patients to develop local bleeding symptoms.
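The substitution in this example can be sketched with a pattern-based stand-in. Note the swap: the disclosure resolves references by entity attribute similarity, whereas the sketch below only learns "drug has an effect" links from a fixed sentence pattern and then replaces later mentions of the effect with the drug. The function names and the pattern are assumptions.

```python
import re


def learn_effect_links(text, drugs):
    """Map each effect phrase to the drug stated to have it ('<drug> has a(n) <effect>')."""
    links = {}
    for drug in drugs:
        pattern = rf"{re.escape(drug)} has an? ([a-z ]+? effect)"
        for match in re.finditer(pattern, text):
            links[match.group(1)] = drug
    return links


def resolve(sentence, links):
    """Replace each linked effect phrase (and a leading article) with its drug."""
    for effect, drug in links.items():
        pattern = rf"(?:\b(?:the|an?) )?{re.escape(effect)}"
        sentence = re.sub(pattern, drug, sentence)
    return sentence
```

After resolution the sentence has an explicit drug subject, so the later role filtering and event triple steps can operate on it directly.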
As can be seen from the above, the process of performing adverse event recognition on the text using the second recognition model may further include performing coreference resolution in the text using the coreference resolver, after the semantic role identifier has identified the semantic roles of the adverse event in the sentences of the text according to the identified named entities and adverse event names. For example, the coreference resolution step may be inserted between step S330 and step S340 in fig. 3B.
The foregoing will now be described, by way of example, with reference to fig. 3C.
Fig. 3C is an example of adverse event recognition of a second type of text using a second recognition model.
Referring to fig. 3C, the text to be recognized may be "Professor Sun has two patients who developed vomiting and dizziness symptoms after taking aspirin, 100 mg/d. There may also be a few patients who developed bleeding symptoms after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] in combination."
First, in step S3040, after the named entity identifier identifies the named entities in the text, the sequence labeling result is: "Professor Sun [PER] has two patients [PER] who developed vomiting and dizziness symptoms after taking aspirin [MED]. There may also be a few patients [PER] who developed bleeding symptoms after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] in combination."
Next, in step S3010, the adverse event name identifier performs adverse event name recognition on the sequence labeling result, and the result is: "Professor Sun [PER] has two patients [PER] who developed vomiting and dizziness symptoms [AE] after taking aspirin [MED]. There may also be a few patients [PER] who developed bleeding symptoms [AE] after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] in combination."
Next, in step S3020, the semantic role identifier performs semantic role identification on the name recognition result obtained above, and the semantic role labeling result is: "Professor Sun [PER] [A0] has two patients [PER] [A0] who developed vomiting and dizziness symptoms [AE] [A1] after taking aspirin [MED] [A1]. There may also be a few patients [PER] [A0] who developed bleeding symptoms [AE] [A1] after taking ticagrelor [produced by Shandong Qinghua Pharmaceutical] [MED] [A1] in combination." The semantic role labeling results are shown in fig. 3D and fig. 3E, in which n denotes a noun, nh a person name, v a verb, m a numeral, u an auxiliary word, c a conjunction, wp punctuation, nd a direction noun, A0 the agent, A1 the patient (recipient), TMP time, ADV an adverbial, and DIS a discourse marker.
Since the text in the above example is not a chapter-level text, no coreference resolution step is needed; that is, step S2030 is skipped and only step S2050 is executed. In step S2050, the semantic role filter is applied to the text. Since the example text contains no hedged statement that satisfies the rules, nothing is filtered out and all entity information is retained.
Finally, in step S2060, the event determiner extracts from the example text the event "two patients [Patient]-take-aspirin [Medicine]" and the event "two patients [Patient]-occur-vomiting and dizziness symptoms [AE]", which satisfy the preset event triple condition and are therefore judged to be AE entries.
As can be seen, the above example text does not involve a coreference resolution step. For a better understanding of the present disclosure, another example is described below.
Referring again to fig. 3C, the text to be recognized may be "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d … … … aspirin has an anticoagulant effect, which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms … …".
First, in step S3040, after the named entity identifier identifies the named entities in the text, the sequence labeling result is: "32 patients [PER] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] … … … aspirin [MED] has an anticoagulant effect, which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients [PER] developed local bleeding symptoms … …".
Next, in step S3010, the adverse event name identifier performs adverse event name recognition on the sequence labeling result, and the result is: "32 patients [PER] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] … … … aspirin [MED] has an anticoagulant effect, which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients [PER] developed local bleeding symptoms [AE] … …".
Next, in step S3020, the semantic role identifier performs semantic role identification on the name recognition result obtained above, and the semantic role labeling result is: "32 patients [PER] [A0] in the experimental group took aspirin effervescent tablets, 100 mg/d [MED] [A1] … … … aspirin [MED] [A0] has an anticoagulant effect [A1], which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients [PER] [A0] developed local bleeding symptoms [AE] [A1] … …".
Next, in step S2030, the coreference resolver performs coreference resolution on the above text. Specifically, in "32 patients in the experimental group took aspirin effervescent tablets, 100 mg/d … … … aspirin has an anticoagulant effect, which has been confirmed in actual use … …. Due to the anticoagulant effect, 2 patients developed local bleeding symptoms", the clause "due to the anticoagulant effect, 2 patients developed local bleeding symptoms" lacks a drug subject and must be linked to the aspirin mentioned earlier through "anticoagulant effect". Substituting aspirin for "anticoagulant effect" gives: aspirin [MED] [A0] caused 2 patients [PER] [A0] to develop local bleeding symptoms [AE] [A1].
Then, in step S2050, the semantic role filter is applied to the text. Since the example text contains no hedged statement that satisfies the rules, nothing is filtered out and all entity information is retained.
Finally, in step S2060, the event determiner extracts from the example text the event "32 patients [Patient]-take-aspirin effervescent tablets [Medicine]" and the event "aspirin [Patient]-cause-2 patients to develop local bleeding symptoms [AE]", which satisfy the preset event triple condition and are therefore judged to be AE entries.
According to an embodiment of the present disclosure, the above adverse event may include at least the following three elements: a subject, a cause, and an adverse effect. In the above examples, the patient is the subject, the drug is the cause, and the adverse reaction after the drug is used is the adverse effect. Of course, other variations according to embodiments of the present disclosure are possible. For example, the subject may be a patient or a user of a drug or medical device, the cause may be the drug or medical device, and the adverse effect may arise from drug exposure during pregnancy (maternal or paternal), drug exposure during lactation, drug overdose, drug abuse, misuse, off-label use accompanied by adverse events, medication errors, occupational exposure, lack of efficacy and disease progression, exposure to pathogens, drug interactions, medical device failure, unexplained death, suicide or attempted suicide, or unexpected benefits. Such variations are covered by the present disclosure, and those skilled in the art can readily obtain the corresponding adverse event identification results using the method for identifying adverse events disclosed above, so they are not described in detail here.
According to the embodiment of the present disclosure, the method for identifying an adverse event may further include feeding back the identification result through a predetermined reporter. When an adverse event is identified in the text to be recognized, the predetermined reporter uploads the specific AE entry identified to the reporting system. The predetermined reporter may be a person responsible for reporting in the reporting system.
In addition, across different data sources, the method for identifying adverse events provided by the present disclosure achieves an accuracy rate greater than 50% and a recall rate greater than 99%; see table 1 below.
TABLE 1
[Table 1 is provided as an image in the original publication: accuracy rate and recall rate of the method on each data source.]
Referring to table 1, the recall rate is the ratio of the number of AEs identified across all texts to be recognized to the total number of actual AEs. For example, if the 100 texts to be recognized contain 4 AEs, the recall rate is 75% when 3 of them are identified and 100% when all 4 are identified. The accuracy rate is the ratio of the number of identified AEs that are genuine to the total number of identified AEs. For example, if the 100 texts to be recognized contain 4 AEs and 5 AEs are identified, of which 4 are genuine, the accuracy rate is 80%.
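The two ratios defined above can be written out directly. The function names are illustrative; what the text calls the accuracy rate corresponds to what is commonly called precision.

```python
def recall_rate(true_identified, actual_total):
    """Share of the actual AEs that were identified."""
    return true_identified / actual_total if actual_total else 0.0


def accuracy_rate(true_identified, identified_total):
    """Share of the identified AEs that are genuine (commonly called precision)."""
    return true_identified / identified_total if identified_total else 0.0
```

With the numbers from the text, `recall_rate(3, 4)` gives 0.75 and `accuracy_rate(4, 5)` gives 0.8.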
As can be seen from table 1, the recognition results of the method for identifying adverse events provided by the present disclosure exceed the respective target values, indicating a very good recognition effect.
The method for identifying adverse events provided by the present disclosure has been described in detail above with reference to figs. 1 to 3E. As the description shows, the method selects a recognition model according to the type of the text and then performs adverse event recognition on the text to be recognized with the selected model. No manual screening is required, missed or late AE reports are avoided, and AEs are reported in a timely and accurate manner. This helps companies and hospitals that manufacture or use drugs or medical devices improve the accuracy and timeliness of adverse event reporting, and the entire process of AE identification and reporting saves resources, gains efficiency, and is empowered. In addition, as table 1 above shows, the method captures drug adverse events as completely as possible, leaving virtually none missed. On this basis, the resources required for manual review are greatly reduced.
The present disclosure provides an apparatus for identifying an adverse event in addition to the above-described method for identifying an adverse event, and an apparatus for identifying an adverse event according to an embodiment of the present disclosure will be described next with reference to fig. 4.
Fig. 4 illustrates a block diagram of an apparatus 400 for identifying adverse events according to an embodiment of the present disclosure.
Referring to fig. 4, the apparatus 400 for identifying an adverse event may include an acquisition module 410, a selection module 420, and an identification module 430.
The acquisition module 410 may be configured to selectively acquire text to be recognized from one or more data sources.
As an example, the one or more data sources may be papers published on the internet, where the published papers relate to the medical field and may describe adverse events arising at companies, hospitals, or the like that manufacture or use a drug or medical instrument. Such papers are generally document-level texts, and their content is usually very long.
As another example, the one or more data sources may be data recorded by a call center (Call Center) of a company, hospital, or the like that manufactures or uses the drug or medical instrument. Typically, a person calls the call center, and the operator at the call center records the information of the call, which may be reported or mentioned by a patient or customer. Typically, the information for a call is relatively short, e.g., a few words.
As yet another example, the one or more data sources may be data recorded in call logs, where a call log is typically a call summary written by a medical representative of a pharmaceutical company or the like after a call with a doctor, and may contain adverse events mentioned by the doctor. Typically, the content of a call log is of moderate length, between that of the two data sources above.
The selection module 420 may be configured to select a recognition model corresponding to the type of the text according to the type of the text.
According to an embodiment of the present disclosure, the type of text may be determined based on the length of the text and/or the source of the text. The types of text may include a first type and a second type.
According to an embodiment of the present disclosure, the recognition models may include a first recognition model and a second recognition model.
According to an embodiment of the present disclosure, in a case where the type of the text is a first type, a first recognition model corresponding to the first type is selected.
According to an embodiment of the present disclosure, in a case where the type of the text is a second type, a second recognition model corresponding to the second type is selected.
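The selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the length cutoff, the source labels, and the choice to treat long or paper-sourced text as the "first" type are all hypothetical, since the text only says that the type depends on length and/or source.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical values: the disclosure fixes neither a cutoff nor source labels.
LONG_TEXT_MIN_CHARS = 500
FIRST_TYPE_SOURCES = {"paper"}


@dataclass
class Text:
    content: str
    source: str  # e.g. "paper", "call_center", "call_log" (illustrative labels)


def text_type(text: Text) -> str:
    """Classify the text as 'first' or 'second' from its length and/or source."""
    if text.source in FIRST_TYPE_SOURCES or len(text.content) >= LONG_TEXT_MIN_CHARS:
        return "first"
    return "second"


def select_model(text: Text, models: Dict[str, object]) -> object:
    """Return the recognition model registered for the text's type."""
    return models[text_type(text)]


models = {"first": "first recognition model", "second": "second recognition model"}
print(select_model(Text("Patient reported a headache.", "call_center"), models))
```

Keeping the type decision in one small function means the cutoff or the source mapping can be changed without touching either recognition model.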
The recognition module 430 may be configured to perform semantic recognition on the text using the selected recognition model to identify adverse events in the text.
According to an embodiment of the present disclosure, semantic recognition may be performed on the text using the selected first recognition model to identify adverse events in the text.
According to the embodiment of the disclosure, semantic recognition can be performed on the text by utilizing the selected second recognition model to recognize the adverse events in the text.
According to an embodiment of the present disclosure, the first recognition model may include a word segmenter, a converter, a feature extractor, and a classifier.
The word segmenter may be configured to segment the sentences in a text, and may include a first word segmenter and a second word segmenter, wherein the first word segmenter may be configured to segment the text character by character, and the second word segmenter may be configured to segment the text word by word.
The converter may be configured to convert the segmentation results into vector sequences, and may include a first converter configured to convert the segmentation result of the first word segmenter into a character vector sequence and a second converter configured to convert the segmentation result of the second word segmenter into a word vector sequence.
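A minimal sketch of the two segmentation granularities, assuming the first word segmenter works at the character level and the second at the word level. The word-level part follows the dictionary-based directed-acyclic-graph idea recited in the claims, but a plain Python set stands in for the dictionary tree, and "fewest tokens, longer word on ties" is a hypothetical path-selection rule. The sample dictionaries and sentence are illustrative only.

```python
from typing import Dict, List, Set


def char_segment(sentence: str) -> List[str]:
    """Character-level segmentation: one token per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def build_dag(sentence: str, dictionary: Set[str]) -> Dict[int, List[int]]:
    """Map each start index to every end index that yields a dictionary word."""
    max_len = max((len(w) for w in dictionary), default=1)
    dag = {}
    for i in range(len(sentence)):
        ends = [i + 1]  # a single character is always a valid fallback token
        for j in range(i + 2, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in dictionary:
                ends.append(j)
        dag[i] = ends
    return dag


def word_segment(sentence: str, dictionary: Set[str]) -> List[str]:
    """Pick the DAG path with the fewest tokens (hypothetical selection rule)."""
    dag = build_dag(sentence, dictionary)
    n = len(sentence)
    best = {n: (0, n)}  # best[i] = (token count for sentence[i:], chosen end)
    for i in range(n - 1, -1, -1):
        best[i] = min(((best[j][0] + 1, j) for j in dag[i]),
                      key=lambda t: (t[0], -t[1]))
    tokens, i = [], 0
    while i < n:
        j = best[i][1]
        tokens.append(sentence[i:j])
        i = j
    return tokens


general_dictionary = {"服药", "后"}   # illustrative general-purpose entries
domain_dictionary = {"头晕"}          # illustrative domain-specific entry
dictionary = general_dictionary | domain_dictionary
sentence = "服药后头晕"
print(char_segment(sentence))
print(word_segment(sentence, dictionary))
```

Production segmenters typically weight DAG paths by word frequency rather than token count; the simpler rule is used here only to keep the example short.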
A feature extractor may be used to extract semantic features based on the vector sequence.
The classifier may be configured to determine whether the text contains an adverse event based on the extracted semantic features.
According to an embodiment of the present disclosure, the second recognition model may include a named entity recognizer, an adverse event name recognizer, a semantic role recognizer, a semantic role filter, and an event determiner.
According to an embodiment of the present disclosure, in a case where the type of the text is the first type, the identification module may be configured to perform: segmenting the sentences in the text character by character using the first word segmenter and segmenting the sentences in the text word by word using the second word segmenter; converting the segmentation result of the first word segmenter into a character vector sequence using the first converter and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter; extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and judging, using the classifier, whether the text contains an adverse event based on the extracted semantic features, wherein the text is determined to contain an adverse event when the probability of an adverse event occurring in the text is greater than a preset threshold, and the preset threshold may be, for example, 50% or 60%.
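The order of operations in the first recognition model can be sketched as a small pipeline. This is an illustration under assumptions: every component is passed in as a plain callable, the parameter names are hypothetical, and the toy stand-ins in the usage lines only show the data flow, not a trained model. The strict "greater than" comparison follows the text.

```python
from typing import Callable, List

Vector = List[float]


def run_first_model(
    text: str,
    first_segmenter: Callable[[str], List[str]],
    second_segmenter: Callable[[str], List[str]],
    first_converter: Callable[[str], Vector],
    second_converter: Callable[[str], Vector],
    feature_extractor: Callable[[List[Vector], List[Vector]], Vector],
    classifier: Callable[[Vector], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if the text is judged to contain an adverse event."""
    # Two segmentations of the same text, at two granularities.
    first_tokens = first_segmenter(text)
    second_tokens = second_segmenter(text)
    # One vector sequence per segmentation result.
    first_vectors = [first_converter(t) for t in first_tokens]
    second_vectors = [second_converter(t) for t in second_tokens]
    # Semantic features are extracted from both sequences together.
    features = feature_extractor(first_vectors, second_vectors)
    # The classifier outputs the probability that an adverse event occurs.
    probability = classifier(features)
    return probability > threshold


# Toy stand-ins, for illustration only.
result = run_first_model(
    "dizzy after dose",
    first_segmenter=lambda s: [c for c in s if c != " "],
    second_segmenter=str.split,
    first_converter=lambda tok: [float(ord(tok[0]))],
    second_converter=lambda tok: [float(len(tok))],
    feature_extractor=lambda a, b: [float(len(a)), float(len(b))],
    classifier=lambda feats: 0.7,
    threshold=0.6,
)
print(result)
```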
According to an embodiment of the present disclosure, in a case where the type of the text is the second type, the identification module may be configured to perform: identifying a named entity in the text using the named entity recognizer; identifying an adverse event name in the text using the adverse event name recognizer; identifying, using the semantic role recognizer, the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name; screening out at least a part of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and determining, using the event determiner, whether the text contains an adverse event according to the screened roles and preset trigger words, wherein the text is determined to contain an adverse event when the screened roles and the preset trigger words satisfy a preset event triple.
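The final decision step of the second recognition model can be sketched as a rule check. This is a hedged illustration: it assumes the event triple consists of the three elements named in the claims (subject, cause, adverse outcome) and that a trigger word must appear in the sentence. The `ScreenedRoles` structure, the function name, and the example trigger words are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ScreenedRoles:
    """Roles kept by the semantic role filter (hypothetical structure)."""
    subject: Optional[str]   # who experienced the event
    cause: Optional[str]     # e.g. a drug or medical instrument
    outcome: Optional[str]   # the adverse result


def satisfies_event_triple(
    roles: ScreenedRoles, sentence: str, trigger_words: Set[str]
) -> bool:
    """True when all three roles are filled and a trigger word is present."""
    has_trigger = any(word in sentence for word in trigger_words)
    has_all_roles = all([roles.subject, roles.cause, roles.outcome])
    return has_trigger and has_all_roles


sentence = "The patient felt dizziness after taking drug X."
roles = ScreenedRoles(subject="patient", cause="drug X", outcome="dizziness")
print(satisfies_event_triple(roles, sentence, {"after taking", "caused"}))
```

A rule of this kind is easy to audit, which may matter for short call-center records where each reported event has to be traceable to the words that triggered it.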
According to an embodiment of the present disclosure, the second recognition model further includes a coreference resolver configured to perform coreference resolution in the text to determine an association between the drug and the adverse event, wherein identifying the adverse event in the text using the selected recognition model further comprises: after the semantic role recognizer has identified the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name, performing coreference resolution in the text using the coreference resolver.
Since the details of the above operations have been introduced in the process of describing the method for identifying adverse events according to the present disclosure, the details are not repeated here for brevity, and the related details can refer to the above description of fig. 1 to 3E.
A method and apparatus for identifying adverse events according to embodiments of the present disclosure have been described above with reference to figs. 1 to 4. However, it should be understood that the various modules in the apparatus shown in fig. 4 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function. For example, the modules may correspond to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware. By way of example, and not limitation, the device described with reference to fig. 4 may be a personal computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing program instructions.
It should be noted that although the apparatus 400 for identifying adverse events is described above as being divided into modules that respectively perform corresponding processes, it is clear to those skilled in the art that the apparatus may also perform these processes without any specific division into modules or without explicit demarcation between the modules. Further, the apparatus described above with reference to fig. 4 is not limited to the above-described modules; other modules (e.g., a storage module, a data processing module, etc.) may be added as needed, or the above modules may be combined.
Further, the method of identifying adverse events according to the present disclosure may be recorded in a computer-readable recording medium. In particular, according to the present disclosure, there may be provided a computer-readable recording medium storing computer-executable instructions that, when executed by a processor, may cause the processor to perform the method of identifying adverse events as described above. Examples of the computer-readable recording medium may include magnetic media (e.g., hard disks, floppy disks, and magnetic tapes); optical media (e.g., CD-ROM and DVD); magneto-optical media (e.g., magneto-optical disks); and hardware devices (e.g., Read Only Memory (ROM), Random Access Memory (RAM), flash memory, etc.) that are specially configured to store and execute program instructions. Further, according to the present disclosure, there may also be provided an apparatus comprising a processor and a memory having stored therein computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, cause the processor to perform the method of identifying an adverse event as described above. Examples of computer-executable instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
In addition, some operations of the method for identifying adverse events according to the present disclosure may be implemented by software, some operations may be implemented by hardware, and other operations may be implemented by a combination of hardware and software.
It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of this disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The exemplary embodiments of the present disclosure described in detail above are merely illustrative, and not restrictive. It will be appreciated by those skilled in the art that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and that such modifications are intended to be within the scope of the disclosure.

Claims (23)

1. A method of identifying an adverse event, comprising:
selectively obtaining text to be recognized from one or more data sources;
selecting a recognition model corresponding to the type of the text according to the type of the text; and
performing semantic recognition on the text using the selected recognition model to identify adverse events in the text.
2. The method of claim 1, wherein the type of text is determined based on a length of the text and/or a source of the text.
3. The method of claim 2, wherein the selecting, according to the type of text, a recognition model corresponding to the type of text comprises:
selecting a first recognition model if the type of text is a first type, wherein the first recognition model comprises a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter is used for segmenting sentences in the text, the converter is used for converting segmentation results into vector sequences, the feature extractor is used for extracting semantic features based on the vector sequences, and the classifier is used for judging whether the text contains adverse events based on the extracted semantic features.
4. The method of claim 3, wherein the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter is configured to segment the text character by character and the second word segmenter is configured to segment the text word by word;
the converter comprises a first converter and a second converter, wherein the first converter is used for converting the segmentation result of the first word segmenter into a character vector sequence, and the second converter is used for converting the segmentation result of the second word segmenter into a word vector sequence.
5. The method of claim 4, wherein the second word segmenter being configured to segment the text word by word comprises:
generating, according to a dictionary tree built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of the sentences in the text, thereby segmenting the text word by word.
6. The method of claim 4, wherein said performing semantic recognition on the text using the selected recognition model to identify adverse events in the text comprises:
segmenting the sentences in the text character by character using the first word segmenter and segmenting the sentences in the text word by word using the second word segmenter;
converting the segmentation result of the first word segmenter into a character vector sequence using the first converter and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter;
extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and
determining, with the classifier, whether the text contains an adverse event based on the extracted semantic features,
wherein the text is determined to contain an adverse event when the probability of an adverse event occurring in the text is greater than a preset threshold.
7. The method of claim 2, wherein the selecting, according to the type of text, a recognition model corresponding to the type of text comprises:
selecting a second recognition model if the type of text is a second type, wherein the second recognition model comprises a named entity recognizer, an adverse event name recognizer, a semantic role recognizer, a semantic role filter, and an event determiner, wherein the named entity recognizer is used for recognizing a named entity in the text, the adverse event name recognizer is used for recognizing an adverse event name in the text, the semantic role recognizer is used for recognizing the semantic roles involved in the occurrence of an adverse event in a sentence of the text according to the recognized named entity and adverse event name, the semantic role filter is used for screening out at least a part of the roles according to the recognized semantic roles and a preset rule, and the event determiner is used for determining whether the text contains an adverse event according to the screened roles and preset trigger words.
8. The method of claim 7, wherein said performing semantic recognition on the text using the selected recognition model to identify adverse events in the text comprises:
identifying a named entity in the text using the named entity recognizer;
identifying an adverse event name in the text using the adverse event name recognizer;
identifying, using the semantic role recognizer, the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name;
screening out at least a part of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and
determining, using the event determiner, whether the text contains an adverse event according to the screened roles and the preset trigger words,
wherein the text is determined to contain an adverse event when the screened roles and the preset trigger words satisfy a preset event triple.
9. The method of claim 8, wherein the second recognition model further comprises a coreference resolver for performing coreference resolution in the text to determine an association between the drug and the adverse event,
wherein said identifying adverse events in said text using the selected recognition model further comprises:
after the semantic role recognizer has identified the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name, performing coreference resolution in the text using the coreference resolver.
10. The method of any one of claims 1 to 9, wherein the adverse event comprises at least the following three elements: a subject, a cause, and an adverse outcome.
11. The method of claim 10, further comprising: feeding back the recognition result regarding the adverse event through a predetermined reporter.
12. The method of claim 1, wherein the recognition model is a recognition model of a medical domain and the adverse event is an adverse event of the medical domain.
13. An apparatus for identifying an adverse event, comprising:
an acquisition module configured to selectively acquire text to be recognized from one or more data sources;
a selection module configured to select a recognition model corresponding to the type of the text according to the type of the text; and
a recognition module configured to perform semantic recognition on the text using the selected recognition model to identify adverse events in the text.
14. The apparatus of claim 13, wherein the type of text is determined based on a length of the text and/or a source of the text.
15. The apparatus of claim 14, wherein the selection module comprises:
selecting a first recognition model if the type of text is a first type, wherein the first recognition model comprises a word segmenter, a converter, a feature extractor, and a classifier, wherein the word segmenter is used for segmenting sentences in the text, the converter is used for converting segmentation results into vector sequences, the feature extractor is used for extracting semantic features based on the vector sequences, and the classifier is used for judging whether the text contains adverse events based on the extracted semantic features.
16. The apparatus of claim 15, wherein the word segmenter comprises a first word segmenter and a second word segmenter, wherein the first word segmenter is configured to segment the text character by character and the second word segmenter is configured to segment the text word by word;
the converter comprises a first converter and a second converter, wherein the first converter is used for converting the segmentation result of the first word segmenter into a character vector sequence, and the second converter is used for converting the segmentation result of the second word segmenter into a word vector sequence.
17. The apparatus of claim 16, wherein the second word segmenter being configured to segment the text word by word comprises:
generating, according to a dictionary tree built from a general dictionary and a domain-specific dictionary, a directed acyclic graph of all possible word segmentations of the sentences in the text, thereby segmenting the text word by word.
18. The apparatus of claim 16, wherein the identification module comprises:
segmenting the sentences in the text character by character using the first word segmenter and segmenting the sentences in the text word by word using the second word segmenter;
converting the segmentation result of the first word segmenter into a character vector sequence using the first converter and converting the segmentation result of the second word segmenter into a word vector sequence using the second converter;
extracting semantic features based on the character vector sequence and the word vector sequence using the feature extractor; and
determining, with the classifier, whether the text contains an adverse event based on the extracted semantic features,
wherein the text is determined to contain an adverse event when the probability of an adverse event occurring in the text is greater than a preset threshold.
19. The apparatus of claim 18, wherein the selection module comprises:
selecting a second recognition model if the type of text is a second type, wherein the second recognition model comprises a named entity recognizer, an adverse event name recognizer, a semantic role recognizer, a semantic role filter, and an event determiner, wherein the named entity recognizer is used for recognizing a named entity in the text, the adverse event name recognizer is used for recognizing an adverse event name in the text, the semantic role recognizer is used for recognizing the semantic roles involved in the occurrence of an adverse event in a sentence of the text according to the recognized named entity and adverse event name, the semantic role filter is used for screening out at least a part of the roles according to the recognized semantic roles and a preset rule, and the event determiner is used for determining whether the text contains an adverse event according to the screened roles and preset trigger words.
20. The apparatus of claim 19, wherein the identification module comprises:
identifying a named entity in the text using the named entity recognizer;
identifying an adverse event name in the text using the adverse event name recognizer;
identifying, using the semantic role recognizer, the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name;
screening out at least a part of the roles according to the identified semantic roles and a preset rule using the semantic role filter; and
determining, using the event determiner, whether the text contains an adverse event according to the screened roles and the preset trigger words,
wherein the text is determined to contain an adverse event when the screened roles and the preset trigger words satisfy a preset event triple.
21. The apparatus of claim 20, wherein the second recognition model further comprises a coreference resolver for performing coreference resolution in the text to determine an association between the drug and the adverse event,
wherein said identifying adverse events in said text using the selected recognition model further comprises:
after the semantic role recognizer has identified the semantic roles involved in the occurrence of an adverse event in the sentences of the text according to the identified named entity and adverse event name, performing coreference resolution in the text using the coreference resolver.
22. An apparatus for identifying an adverse event, comprising:
a processor, and
a memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-12.
23. A computer-readable recording medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, cause the processor to perform the method of any one of claims 1-12.
CN202110065632.6A 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event Active CN112766903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110065632.6A CN112766903B (en) 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110065632.6A CN112766903B (en) 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event

Publications (2)

Publication Number Publication Date
CN112766903A true CN112766903A (en) 2021-05-07
CN112766903B CN112766903B (en) 2024-02-06

Family

ID=75702951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110065632.6A Active CN112766903B (en) 2021-01-18 2021-01-18 Method, device, equipment and medium for identifying adverse event

Country Status (1)

Country Link
CN (1) CN112766903B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122416A (en) * 2017-03-31 2017-09-01 北京大学 A kind of Chinese event abstracting method
CN108231059A (en) * 2017-11-27 2018-06-29 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN109657158A (en) * 2018-11-29 2019-04-19 山西大学 A kind of adverse drug events information extracting method based on social network data
CN109670174A (en) * 2018-12-14 2019-04-23 腾讯科技(深圳)有限公司 A kind of training method and device of event recognition model
CN110597994A (en) * 2019-09-17 2019-12-20 北京百度网讯科技有限公司 Event element identification method and device
CN111669757A (en) * 2020-06-15 2020-09-15 国家计算机网络与信息安全管理中心 Terminal fraud call identification method based on conversation text word vector
CN112015901A (en) * 2020-09-08 2020-12-01 迪爱斯信息技术股份有限公司 Text classification method and device and warning situation analysis system
CN112131882A (en) * 2020-09-30 2020-12-25 绿盟科技集团股份有限公司 Multi-source heterogeneous network security knowledge graph construction method and device

Also Published As

Publication number Publication date
CN112766903B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant