CN112241457A - Event detection method for event of affair knowledge graph fused with extension features - Google Patents


Info

Publication number
CN112241457A
Authority
CN
China
Prior art keywords
event
medical
data set
trigger
entity
Legal status
Pending
Application number
CN202011002672.8A
Other languages
Chinese (zh)
Inventor
方钰
徐蔚
王晨
翟鹏珺
Current Assignee
Tongji University
Original Assignee
Tongji University
Application filed by Tongji University
Priority to CN202011002672.8A
Publication of CN112241457A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Machine Translation (AREA)

Abstract

Current research on event detection for Chinese medical affair knowledge graphs relies mainly on pattern matching and clustering methods. It does not consider the distribution of entities within medical events, and it ignores the discourse-consistency distribution of medical events within each document. The invention discloses an affair knowledge graph event detection method fused with extended features, which uses entity information within medical event sentences and event information between medical event sentences to assist event detection. First, a medical event representation template is designed according to the ACE standard and an event trigger word dictionary is constructed. Second, medical text is annotated with entities by a semi-automatic corpus annotation method. Then, on top of the basic features, entity information features selected according to the distribution of entities in events are added as extended features, and an SVM performs preliminary identification and classification of trigger words in medical event sentences. Finally, the discourse-consistency distribution of medical documents is used to improve the final event detection result.

Description

Event detection method for event of affair knowledge graph fused with extension features
Technical Field
The present invention relates to event detection, a subtask of event extraction in natural language processing. Medical event detection is an important subtask in constructing a medical affair knowledge graph.
Background
The smart city applies intelligent information technology to fields such as social life, urban construction and industrial production, so that information technology can better serve people. Research on smart-city healthcare has attracted growing attention in recent years, including event detection for medical affair knowledge graphs.
Event detection is an important and challenging task in information extraction. According to the event definition of the ACE (Automatic Content Extraction) evaluation, an event consists of an event trigger, which indicates the event type in a sentence, and event arguments, which describe the participants in the event. For example, in the sentence "In February 2012 the patient was diagnosed with rectal cancer at our hospital because of bloody stool", the trigger is "diagnosed", from which the sentence is inferred to describe a diagnosis event. The event detection task aims to identify, in each event sentence, the trigger word that characterizes the event and to classify that trigger word into an event type.
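The ACE event structure described above can be represented as a small record. The sketch below is illustrative only: the class name `MedicalEvent`, the argument role names (`Time`, `Disease`) and the English rendering of the example sentence are assumptions, not part of the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MedicalEvent:
    """An ACE-style event: the trigger word, its event type, and its arguments."""

    sentence: str
    trigger: str
    event_type: str
    # Hypothetical argument roles -> text spans; the patent does not list role names.
    arguments: dict[str, str] = field(default_factory=dict)

    def trigger_span(self) -> tuple[int, int]:
        """Character offsets [start, end) of the first occurrence of the trigger."""
        start = self.sentence.find(self.trigger)
        if start < 0:
            raise ValueError(f"trigger {self.trigger!r} not found in sentence")
        return start, start + len(self.trigger)


# The example sentence from the Background, rendered in English.
event = MedicalEvent(
    sentence="In February 2012 the patient was diagnosed with rectal cancer "
             "at our hospital because of bloody stool",
    trigger="diagnosed",
    event_type="Diagnosis",
    arguments={"Time": "February 2012", "Disease": "rectal cancer"},
)
start, end = event.trigger_span()
print(event.sentence[start:end], event.event_type)  # diagnosed Diagnosis
```

Event detection, in these terms, means predicting `trigger` and `event_type` for a sentence; argument extraction is a separate subtask not covered by this method.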
Most event detection studies use the open-domain news event corpus annotated for the ACE 2005 evaluation. As event detection research has developed, and with the digitization of text in specialized fields and the spread of smart cities, event detection has gradually moved into specialized domains such as finance, law, biomedicine and medicine. In the medical domain, for example, texts contain many strongly related dynamic events, such as treatment guideline events for various diseases in clinical guidelines and the sequence of diagnosis and treatment events from admission to discharge in an electronic medical record. Extracting this event information from medical text can help doctors diagnose and predict diseases.
However, research on event detection for affair knowledge graphs in the Chinese medical domain is limited and falls mainly into two categories. The first uses template matching based on syntactic analysis: after parsing, the predicate verbs of a sentence are compared with the trigger words set in templates by string matching or similarity calculation. Such methods are severely limited by the predefined templates, require manually designed extraction rules, and do not suit medical text with irregular grammatical structure. The second uses clustering based on semantic features to classify trigger words automatically, but the resulting event types are too fine-grained: a single diagnosis and treatment event is split into several small events, and its completeness is lost. Existing research therefore has two main shortcomings. First, it considers only basic lexical and syntactic features and ignores how entities are distributed across different event sentences in medical text. Second, events in a medical document usually follow a discourse-consistency pattern, that is, different events are distributed according to regular relations, yet existing research performs only sentence-level event detection and does not exploit this consistency. Both shortcomings limit the quality of medical affair knowledge graph event detection.
Disclosure of Invention
Medical affair knowledge graph event detection is a key subtask in constructing affair knowledge graphs for smart-city healthcare, and extracting valuable events from rich medical texts is important for assisting doctors with diagnosis and medical decision-making. Current research on event detection for Chinese medical affair knowledge graphs relies mainly on pattern matching and clustering; it does not consider the distribution of entities within medical events and ignores the discourse-consistency distribution of medical events within each document.
To address these problems, the invention performs event detection on medical event sentences as follows. Based on the intra-sentence entity distribution of medical text, it adds entity information as extended features to the basic features and uses an SVM classifier to perform preliminary event detection on medical event sentences. It then uses the discourse-consistency distribution to reclassify event sentences whose classification probability is low, improving overall event detection accuracy.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the invention provides an affair knowledge graph event detection method fused with extended features, comprising the following steps:
step 1, constructing a Chinese medical event data set;
step 2, preprocessing the raw Chinese medical event data set;
step 3, defining a Chinese medical event representation template;
step 4, constructing a candidate trigger word dictionary;
step 5, defining Chinese medical named entity categories;
step 6, semi-automatically annotating the corpus of the data set;
step 7, introducing entity information as extended features on top of the basic features and using a classifier to complete preliminary event detection;
step 8, extracting event sentences with low classification probability from the preliminary event detection result of step 7 and reclassifying them using discourse consistency information;
step 9, combining the high-probability event detection results output by step 7 with the reclassified results output by step 8 to form the final medical affair knowledge graph event detection result.
Advantageous effects
Existing research on Chinese medical affair knowledge graph event detection does not fully use the entity information within medical events and does not consider the distribution patterns between medical events. To address this, the invention implements an affair knowledge graph event detection method fused with extended features. Based on a statistical analysis of entity distribution in events, it takes the trigger word feature, trigger word part-of-speech feature, trigger word context feature and candidate trigger dictionary feature, which are the only features considered in previous research, as basic features, and takes the intra-sentence entity type feature and the intra-sentence count of each entity type as extended features. An SVM classifier performs trigger word identification and preliminary event classification, and event sentences with low classification probability in the preliminary detection are then reclassified using the discourse-consistency distribution of medical events, improving the overall event detection result. The invention helps advance research on event detection for Chinese medical affair knowledge graphs.
The invention was evaluated with event detection experiments on a data set of history-of-present-illness sections from Chinese electronic medical records; adding the extended features and the discourse-consistency features noticeably improved the event detection results.
Drawings
FIG. 1 Flow diagram of the affair knowledge graph event detection method fused with extended features
FIG. 2 Example of preprocessed medical event text
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, a detailed description of the embodiments of the present invention will be given below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative and explanatory of the invention and are not restrictive thereof.
The affair knowledge graph event detection method fused with extended features uses entity information within medical event sentences and event information between medical event sentences to assist event detection. First, a medical event representation template is designed according to the ACE standard and an event trigger word dictionary is constructed. Second, medical text is annotated with entities by a semi-automatic corpus annotation method. Then, on top of the basic features, entity information features selected according to the distribution of entities in events are added as extended features, and an SVM performs preliminary identification and classification of trigger words in medical event sentences. Finally, the discourse-consistency distribution of medical documents is used to improve the final event detection result. The implementation process of the invention is shown in FIG. 1 and consists of the following nine steps:
step 1, constructing a Chinese medical event data set;
step 2, preprocessing the raw Chinese medical event data set;
step 3, defining a Chinese medical event representation template;
step 4, constructing a candidate trigger word dictionary;
step 5, defining Chinese medical named entity categories;
step 6, semi-automatically annotating the corpus of the data set;
step 7, introducing entity information as extended features on top of the basic features and using a classifier to complete preliminary event detection;
step 8, extracting event sentences with low classification probability from the preliminary event detection result of step 7 and reclassifying them using discourse consistency information;
step 9, combining the high-probability event detection results output by step 7 with the reclassified results output by step 8 to form the final medical affair knowledge graph event detection result.
Each step is described in detail below.
The first step is as follows: a chinese medical event data set is constructed.
The current medical history part in the electronic medical record is the core part of the medical record, records the whole process of patients from illness to treatment and treatment, contains abundant medical events, and collects the current medical history document in the Chinese electronic medical record to construct a Chinese medical event data set. The raw medical event data set is provided to the second step.
The second step: preprocessing the raw Chinese medical event data set.
2.1: delete sentences in the data set that are unrelated to medical events;
2.2: split sentences that contain several medical events so that each sentence in the data set corresponds to one medical event;
2.3: because doctors' writing habits differ, the same disease, surgery or drug may be referred to by different words in different documents, including abbreviations, shorthand and variant forms; this complicates later processing, so medical terms for diseases, surgeries, drugs and the like are normalized to unified forms;
2.4: the cleaned Chinese medical event data set is obtained and provided to the third, fourth, fifth and sixth steps. The preprocessed medical event text is shown in FIG. 2.
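Steps 2.2 and 2.3 can be approximated in code as below. This is a minimal sketch under stated assumptions: sentences are split at Chinese sentence-final punctuation and the full-width semicolon (the patent does not say how multi-event sentences are split), the normalization table `TERM_TABLE` is a hypothetical example, and step 2.1 (removing irrelevant sentences) is not shown.

```python
import re

# Split after Chinese sentence-final punctuation and the full-width semicolon.
# Using the semicolon as an event boundary is an assumption standing in for step 2.2.
SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？；])")

# Hypothetical normalization table for step 2.3: variant form -> canonical term.
TERM_TABLE = {
    "直肠ca": "直肠癌",
    "直肠CA": "直肠癌",
}


def normalize(sentence: str, table: dict) -> str:
    """Replace variant terms with canonical ones, trying longer variants first."""
    for variant in sorted(table, key=len, reverse=True):
        sentence = sentence.replace(variant, table[variant])
    return sentence


def preprocess(document: str, table: dict) -> list:
    """Split a document into sentences and normalize the medical terms in each."""
    pieces = (piece.strip() for piece in SENTENCE_BOUNDARY.split(document))
    return [normalize(piece, table) for piece in pieces if piece]


print(preprocess("患者因便血就诊。诊断为直肠ca；行直肠癌根治术。", TERM_TABLE))
# ['患者因便血就诊。', '诊断为直肠癌；', '行直肠癌根治术。']
```

Replacing longer variants first keeps a short variant from rewriting part of a longer one before the longer one is matched.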
The third step: defining a Chinese medical event representation template.
Following the ACE event definition and the content of the data set from the second step, medical events in the data set are divided into six categories: admission, examination, laboratory test, treatment, surgery and diagnosis. Laboratory test events are further divided into pathology and immunohistochemistry, and treatment events into general treatment and chemotherapy. The defined event types are provided to the fourth and sixth steps.
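A plausible reading of the third step is that the classifiers predict eight fine-grained labels: the four undivided categories plus the two subtypes of each divided category, which matches the later text treating pathology, immunohistochemistry, chemotherapy and general treatment events as separate classes. The mapping below sketches that reading; the English label identifiers are hypothetical, and `LabTest` stands for the category that is split into pathology and immunohistochemistry.

```python
# Coarse event categories mapped to the fine-grained labels a classifier would predict.
# All identifiers are hypothetical English names for the categories in the text.
EVENT_TAXONOMY = {
    "Admission": ["Admission"],
    "Examination": ["Examination"],
    "LabTest": ["Pathology", "Immunohistochemistry"],
    "Treatment": ["GeneralTreatment", "Chemotherapy"],
    "Surgery": ["Surgery"],
    "Diagnosis": ["Diagnosis"],
}

# Flatten to the fine-grained label set and keep a reverse map to the coarse category.
FINE_LABELS = [label for subtypes in EVENT_TAXONOMY.values() for label in subtypes]
COARSE_OF = {label: coarse
             for coarse, subtypes in EVENT_TAXONOMY.items()
             for label in subtypes}

print(len(FINE_LABELS), COARSE_OF["Chemotherapy"])  # 8 Treatment
```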
The fourth step: constructing a candidate trigger word dictionary.
A separate candidate trigger word dictionary is built for each medical event category defined in the third step. Trigger words that occur frequently in each event category of the data set cleaned in the second step are selected as candidate triggers for that category and added to the corresponding dictionary, which is then expanded with synonyms of the candidate triggers from Tongyici Cilin (the Chinese synonym thesaurus). The trigger dictionaries are provided to the seventh and eighth steps.
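The fourth step can be sketched as follows, assuming the frequent triggers are chosen by a top-k cut-off per event type (the patent only says "higher occurrence frequency") and that synonyms come from a plain dictionary standing in for a Tongyici Cilin lookup. `build_trigger_dictionary` and `dictionary_types` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def build_trigger_dictionary(labeled, top_k, synonyms):
    """Build one candidate trigger set per event type.

    labeled:  iterable of (trigger_word, event_type) pairs from the annotated data.
    top_k:    number of most frequent triggers kept per type (an assumed cut-off).
    synonyms: word -> list of synonyms; a stand-in for a Tongyici Cilin lookup.
    """
    counts = defaultdict(Counter)
    for trigger, event_type in labeled:
        counts[event_type][trigger] += 1

    dictionary = {}
    for event_type, counter in counts.items():
        seeds = [word for word, _ in counter.most_common(top_k)]
        expanded = set(seeds)
        for word in seeds:  # expand each frequent trigger with its synonyms
            expanded.update(synonyms.get(word, []))
        dictionary[event_type] = expanded
    return dictionary


def dictionary_types(word, dictionary):
    """Event types whose candidate dictionary contains the word."""
    return sorted(t for t, words in dictionary.items() if word in words)


pairs = [("诊断", "Diagnosis"), ("诊断", "Diagnosis"), ("确诊", "Diagnosis"),
         ("手术", "Surgery"), ("手术", "Surgery"), ("行", "Surgery")]
trigger_dict = build_trigger_dictionary(pairs, top_k=1, synonyms={"诊断": ["确诊", "诊为"]})
print(dictionary_types("确诊", trigger_dict))  # ['Diagnosis']
```

Because one word may end up in several dictionaries, `dictionary_types` returns a list; that list is what the candidate trigger dictionary feature in 7.1 would encode.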
The fifth step: defining Chinese medical named entity categories.
Based on the medical named entity classifications of the i2b2 challenge and the CCKS competition, and on the content of the data set from the second step, named entities in the data set are divided into three major categories: disease, symptom and treatment. The symptom category comprises body part and symptom description, and the treatment category comprises drug, surgery and general treatment. The defined entity categories are provided to the sixth step.
The sixth step: semi-automatically annotating the corpus of the data set.
6.1: annotate entities in the corpus provided by the second step using a semi-automatic method based on medical dictionaries.
For each entity category of the fifth step, entities are collected from medical websites such as the DXY (Dingxiangyuan) dictionary, the Sogou medical dictionary and 39 Health Net to build one dictionary per category, and entities in the event sentences of the data set are labeled automatically with the reverse (backward) maximum matching algorithm. The entity-annotated data set is provided to 7.2.
6.2: according to the event types defined in the third step, the trigger word of each sentence in the data set provided by 6.1 and its corresponding event type are annotated manually. The data set with both entity and trigger word annotations is provided to the seventh and eighth steps.
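The reverse (backward) maximum matching used in 6.1 scans a sentence from right to left and, at each position, takes the longest dictionary entry that ends there. A minimal sketch, assuming the per-category entity dictionaries have been merged into one mapping from entity string to category (`ENTITY_DICT` and its entries are illustrative):

```python
def reverse_maximum_match(text, entity_dict, max_len=None):
    """Label entities by reverse (backward) maximum matching.

    Returns (start, end, surface, category) tuples in left-to-right order.
    """
    if max_len is None:
        max_len = max((len(w) for w in entity_dict), default=1)
    spans = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            word = text[end - size:end]
            if word in entity_dict:
                spans.append((end - size, end, word, entity_dict[word]))
                end -= size
                break
        else:
            end -= 1  # no entity ends here: skip one character
    spans.reverse()
    return spans


ENTITY_DICT = {"直肠": "BodyPart", "直肠癌": "Disease", "便血": "SymptomDescription"}
print(reverse_maximum_match("因便血诊断为直肠癌", ENTITY_DICT))
# [(1, 3, '便血', 'SymptomDescription'), (6, 9, '直肠癌', 'Disease')]
```

The longest-match rule is why "直肠癌" is labeled as a disease rather than "直肠" as a body part.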
The seventh step: introducing entity information as extended features on top of the basic features and using a classifier to complete preliminary event detection.
7.1: construct the basic features of the classifier: (1) trigger word feature; (2) trigger word part-of-speech feature; (3) trigger word context feature; (4) candidate trigger dictionary feature.
(1) The collected medical dictionaries are added to the user dictionary of the NLP toolkit LTP, which is then used to segment the text of the data set into words. Each trigger word is converted into a word vector with word2vec, and this vector is used as the trigger word feature.
(2) LTP performs part-of-speech tagging on the text of the data set, and the part of speech of the trigger word in its sentence is used as the trigger word part-of-speech feature.
(3) A context window of three words before and three words after the trigger word is selected, and the words in the window and their parts of speech are used as the trigger word context feature.
(4) A given trigger word usually corresponds to one or a few fixed event types, so the type of the candidate trigger dictionary containing the trigger word is used as the candidate trigger dictionary feature.
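The four basic features of 7.1 can be sketched as a sparse feature dictionary. This assumes the sentence has already been segmented and POS-tagged (by LTP in the patent) and, for simplicity, uses the trigger word itself as an indicator feature where the patent uses its word2vec vector; the function name `basic_features` and the feature-key format are hypothetical.

```python
def basic_features(words, tags, i, trigger_dict, window=3):
    """Sparse basic features for the candidate word at index i.

    words, tags:  segmented words and their POS tags (assumed LTP output).
    trigger_dict: event type -> set of candidate trigger words (fourth step).
    window:       context size on each side; 3 as in 7.1 (3).
    """
    # (1) trigger word and (2) its part of speech; the patent uses a word2vec
    # vector for (1), replaced here by a one-hot indicator for simplicity.
    feats = {f"word={words[i]}": 1.0, f"pos={tags[i]}": 1.0}
    # (3) words and POS tags inside the context window, keyed by relative offset.
    for offset in range(-window, window + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(words):
            continue
        feats[f"ctx[{offset}]word={words[j]}"] = 1.0
        feats[f"ctx[{offset}]pos={tags[j]}"] = 1.0
    # (4) every candidate trigger dictionary that contains the word.
    for event_type, triggers in trigger_dict.items():
        if words[i] in triggers:
            feats[f"dict={event_type}"] = 1.0
    return feats


words = ["患者", "于", "我院", "诊断", "为", "直肠癌"]
tags = ["n", "p", "n", "v", "v", "n"]
features = basic_features(words, tags, 3, {"Diagnosis": {"诊断", "确诊"}})
print(sorted(features))
```

Positions that fall outside the sentence simply contribute no context feature, so a trigger near a sentence boundary gets a shorter context.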
7.2: construct the extended features of the classifier: (1) intra-sentence entity type feature; (2) intra-sentence count of each entity type.
(1) Intra-sentence entity type feature: in the medical event data set provided by the sixth step, the entity types contained in sentences of different event types follow certain patterns. For example, surgery entities often appear in admission, diagnosis and surgery events; body part entities often appear in pathology, examination and immunohistochemistry events; and drug entities often appear in chemotherapy and general treatment events. This entity distribution information is therefore very helpful for classifying trigger words, and the entity types that appear in an event sentence are used as the intra-sentence entity type feature.
(2) Intra-sentence count of each entity type: in the medical event data set provided by the sixth step, some sentences of different event types have similar entity type distributions. Statistics show that some admission events and surgery events contain surgery + disease or surgery + symptom entities, and that some chemotherapy events and general treatment events contain only drug entities. However, pathology events usually contain more body part entities than examination events, and chemotherapy events usually contain more drug entities than general treatment events, so entity counts also play an important role in trigger word classification. The invention uses the number of entities of each type in an event sentence as the intra-sentence entity count feature.
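The two extended features of 7.2 reduce to a presence indicator and a count per entity type. A sketch, assuming the entity types are the six fine-grained categories of the fifth step under hypothetical English names and that the input is the list of entity categories annotated in one sentence by 6.1:

```python
from collections import Counter

# Fine-grained entity categories from the fifth step (English names are hypothetical).
ENTITY_TYPES = ["Disease", "BodyPart", "SymptomDescription",
                "Drug", "Surgery", "GeneralTreatment"]


def extended_features(sentence_entity_types):
    """Extended features for one sentence from the entity types annotated in it.

    has=<type>:   (1) whether the entity type occurs in the sentence.
    count=<type>: (2) how many entities of that type occur.
    """
    counts = Counter(sentence_entity_types)
    feats = {}
    for entity_type in ENTITY_TYPES:
        feats[f"has={entity_type}"] = 1.0 if counts[entity_type] else 0.0
        feats[f"count={entity_type}"] = float(counts[entity_type])
    return feats


# Several body-part entities: the pattern 7.2 associates with pathology events.
features = extended_features(["BodyPart", "BodyPart", "BodyPart", "Disease"])
print(features["count=BodyPart"], features["has=Drug"])  # 3.0 0.0
```

The presence indicators alone cannot separate, say, chemotherapy from general treatment when both contain only drug entities; the counts carry that distinction.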
7.3: train the binary trigger word identification classifier
In the medical event data set provided by the sixth step, a sentence may contain several candidate triggers, and some uncommon triggers are absent from the candidate trigger dictionary or do not appear in the training set. Therefore all candidate triggers, verbs and verbal nouns in a sentence receive binary labels: a word is labeled 1 if it is the trigger of the sentence and 0 otherwise. The trigger word feature, the trigger word part-of-speech feature and the intra-sentence entity count feature are combined, a Support Vector Machine (SVM) is trained on the resulting feature vectors, and the trained binary trigger identification classifier is provided to step 7.4.
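The candidate selection and binary labeling in 7.3 can be sketched as below. The POS tag names `v` and `vn` are assumptions (tag sets differ between tools), and `candidate_labels` is a hypothetical helper; training the SVM itself is not shown.

```python
def candidate_labels(words, tags, trigger_words, gold_index, candidate_pos=("v", "vn")):
    """Binary training labels for one sentence.

    words, tags:   segmented words and POS tags (e.g. from LTP).
    trigger_words: union of all candidate trigger dictionaries.
    gold_index:    index of the annotated trigger in this sentence (step 6.2).
    candidate_pos: POS tags treated as verbs / verbal nouns (assumed tag names).
    """
    labels = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        # Dictionary triggers plus any verb-like word, so unseen triggers are still candidates.
        if word in trigger_words or tag in candidate_pos:
            labels.append((i, word, 1 if i == gold_index else 0))
    return labels


words = ["患者", "于", "我院", "诊断", "为", "直肠癌"]
tags = ["n", "p", "n", "v", "v", "n"]
print(candidate_labels(words, tags, {"诊断", "确诊"}, gold_index=3))
# [(3, '诊断', 1), (4, '为', 0)]
```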
7.4: the medical event data set is input into the binary trigger identification classifier, and the identification results are provided to step 7.5.
7.5: train the multi-class event detection classifier
An SVM is trained to classify the event type of the words identified as triggers in 7.4. The training features combine the four basic features of 7.1 and the two extended features of 7.2, and the trained multi-class event detection classifier is provided to step 7.6.
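The two classification stages of 7.3 to 7.6 differ mainly in which feature groups they see. The sketch below wires them together with the classifiers passed in as plain functions, so that stubs can stand in for the trained SVMs; the group names, function names and stub rules are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Features = Dict[str, float]

# Feature groups per stage, following 7.3 (binary) and 7.5 (multi-class).
BINARY_GROUPS = ("trigger", "pos", "entity_count")
MULTI_GROUPS = ("trigger", "pos", "context", "dictionary",  # four basic groups (7.1)
                "entity_type", "entity_count")              # two extended groups (7.2)


def combine(groups: Dict[str, Features], names: Iterable[str]) -> Features:
    """Concatenate the selected feature groups into one sparse vector, prefixing keys by group."""
    vector: Features = {}
    for name in names:
        for key, value in groups.get(name, {}).items():
            vector[f"{name}:{key}"] = value
    return vector


def preliminary_detection(
    candidates: Iterable[Tuple[int, str, Dict[str, Features]]],
    is_trigger: Callable[[Features], bool],
    classify: Callable[[Features], Tuple[str, float]],
) -> List[Tuple[int, str, str, float]]:
    """Binary trigger identification, then event type classification of the accepted words.

    Returns (sentence_id, word, event_type, probability) for each identified trigger.
    """
    results = []
    for sentence_id, word, groups in candidates:
        if is_trigger(combine(groups, BINARY_GROUPS)):
            event_type, probability = classify(combine(groups, MULTI_GROUPS))
            results.append((sentence_id, word, event_type, probability))
    return results


# Toy stubs standing in for the trained SVMs.
def is_trigger(features: Features) -> bool:
    return features.get("pos:v", 0.0) > 0


def classify(features: Features) -> Tuple[str, float]:
    return ("Diagnosis", 0.9) if "dictionary:Diagnosis" in features else ("Other", 0.3)


candidates = [
    (0, "诊断", {"trigger": {"诊断": 1.0}, "pos": {"v": 1.0}, "dictionary": {"Diagnosis": 1.0}}),
    (0, "患者", {"trigger": {"患者": 1.0}, "pos": {"n": 1.0}}),
]
print(preliminary_detection(candidates, is_trigger, classify))
# [(0, '诊断', 'Diagnosis', 0.9)]
```

In practice, `is_trigger` and `classify` would wrap trained SVMs; for example, scikit-learn's `DictVectorizer` turns these sparse dictionaries into vectors, and `SVC(probability=True)` exposes `predict_proba`, which could supply the probabilities used in the eighth step.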
7.6: the medical event data set is input into the multi-class event detection classifier to complete preliminary event detection, and the preliminary results are provided to the eighth step.
The eighth step: extracting event sentences with low classification probability from the preliminary event detection result of step 7.6 and reclassifying them using discourse consistency information.
Because the history of present illness describes how the patient's disease arises and evolves, the events in it are related. For example, an admission event cannot occur after a surgery event, and a pathology event cannot occur before a surgery event. The invention therefore builds on the seventh step and further improves the detection result using the discourse consistency of medical text.
Sentences whose classification probability in 7.6 is below 40% are treated as untrusted event sentences, sentences whose probability is above 60% as trusted event sentences, and the untrusted event sentences are reclassified. Untrusted event sentences whose preceding and following sentences are both trusted are extracted to form the reclassification set. Nine features, comprising the intra-sentence entity type feature, the counts of each entity type, the basic features used for event detection and the entity count features used for event detection, are combined into discourse consistency features, and an SVM is used to reclassify the reclassification set.
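The selection rule of the eighth step and the merge of the ninth step can be sketched as follows. The thresholds are the 40% and 60% given above; `reclassify` is a hypothetical stand-in for the discourse-level SVM. The text does not say how sentences that are neither trusted nor selected for reclassification are treated, so this sketch assumes they keep their preliminary label.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (event type, classification probability) for one sentence


def reclassification_set(preds: List[Prediction], low: float = 0.4, high: float = 0.6) -> List[int]:
    """Indices of untrusted sentences (p < low) whose neighbours are both trusted (p > high)."""
    trusted = {i for i, (_, p) in enumerate(preds) if p > high}
    return [i for i, (_, p) in enumerate(preds)
            if p < low and i - 1 in trusted and i + 1 in trusted]


def final_labels(preds: List[Prediction],
                 reclassify: Callable[[int, str, str], str],
                 low: float = 0.4, high: float = 0.6) -> List[str]:
    """Merge preliminary labels with reclassified ones (the ninth step).

    Assumption: sentences outside the reclassification set keep their preliminary label.
    """
    labels = [label for label, _ in preds]
    for i in reclassification_set(preds, low, high):
        labels[i] = reclassify(i, preds[i - 1][0], preds[i + 1][0])
    return labels


# Toy stub for the discourse-level SVM: a sentence right after surgery is read as pathology.
def reclassify(i: int, prev_type: str, next_type: str) -> str:
    return "Pathology" if prev_type == "Surgery" else "Examination"


document = [("Admission", 0.9), ("Chemotherapy", 0.3), ("Surgery", 0.8),
            ("Admission", 0.35), ("Chemotherapy", 0.7), ("Diagnosis", 0.5)]
print(reclassification_set(document))  # [1, 3]
print(final_labels(document, reclassify))
```

Requiring both neighbours to be trusted means the discourse features of a reclassified sentence are built only from labels the first-stage classifier was confident about.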
The ninth step: forming the final medical affair knowledge graph event detection result.
The high-probability event detection results output by the seventh step and the reclassified results output by the eighth step together form the final medical affair knowledge graph event detection result.
Innovation point
To address the shortcomings of event detection research in smart-city healthcare, an affair knowledge graph event detection method fused with extended features is proposed. Unlike previous Chinese medical event detection methods, it fully uses the distribution of medical entities within event sentences and the discourse-level distribution between event sentences: it adds entity-based extended features to the basic features, obtains a preliminary event detection result by combining binary and multi-class classification, and finally adds discourse-consistency features on top of the entity-feature results, using a discourse-level multi-class classifier to further improve event detection accuracy.
The proposed method performs well on Chinese medical event detection and supports the subsequent construction of the affair knowledge graph.

Claims (10)

1. An affair knowledge graph event detection method fused with extended features, characterized by comprising the following steps:
step 1, constructing a Chinese medical event data set;
step 2, preprocessing the raw Chinese medical event data set;
step 3, defining a Chinese medical event representation template;
step 4, constructing a candidate trigger word dictionary;
step 5, defining Chinese medical named entity categories;
step 6, semi-automatically annotating the corpus of the data set;
step 7, introducing entity information as extended features on top of the basic features and using a classifier to complete preliminary event detection;
step 8, extracting event sentences with low classification probability from the preliminary event detection result of step 7 and reclassifying them using discourse consistency information;
step 9, combining the high-probability event detection results output by step 7 with the reclassified results output by step 8 to form the final medical affair knowledge graph event detection result.
2. The affair knowledge graph event detection method fused with extended features according to claim 1, wherein in the first step:
the history of present illness is the core part of an electronic medical record: it records the whole course of the patient's illness, from onset through diagnosis and treatment, and contains many medical events, so history-of-present-illness documents are collected from Chinese electronic medical records to construct a Chinese medical event data set; the raw medical event data set is provided to the second step.
3. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein
the second step: preprocessing the raw Chinese medical event data set,
2.1: deleting sentences in the data set that are irrelevant to medical events;
2.2: splitting sentences that contain multiple medical events, so that each sentence in the data set corresponds to one medical event;
2.3: unifying medical vocabulary such as disease, surgery and drug names: because doctors' writing habits differ, the same disease, surgery or drug may be referred to by different words in different documents, including abbreviations, shorthand and variant forms, which complicates subsequent processing;
2.4: obtaining the cleaned Chinese medical event data set and providing it to the third, fourth, fifth and sixth steps.
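The preprocessing in 2.1 to 2.3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the inventors' implementation: splitting on sentence-final punctuation only approximates the one-event-per-sentence splitting of 2.2 (which may still need manual work), the relevance test of 2.1 is a caller-supplied predicate, and the `CANONICAL` variant table is hypothetical example data standing in for a curated medical vocabulary.

```python
import re

# Hypothetical variant -> canonical name table (illustrative entries only).
CANONICAL = {"5-FU": "氟尿嘧啶", "胃Ca": "胃癌"}

# Zero-width split point after Chinese/ASCII sentence-final punctuation.
SENT_END = re.compile(r"(?<=[。；;！？])")


def split_sentences(text: str) -> list[str]:
    """Rough stand-in for step 2.2: split after sentence-final punctuation."""
    return [s.strip() for s in SENT_END.split(text) if s.strip()]


def build_normalizer(table: dict[str, str]):
    """Step 2.3: map every variant to its canonical form in a single regex pass."""
    if not table:
        return lambda s: s
    # Longest variant first so a short variant never pre-empts a longer one.
    alternation = "|".join(re.escape(v) for v in sorted(table, key=len, reverse=True))
    pattern = re.compile(alternation)
    return lambda s: pattern.sub(lambda m: table[m.group(0)], s)


def preprocess(text: str, table: dict[str, str], is_relevant=lambda s: True) -> list[str]:
    """Split, drop irrelevant sentences (step 2.1), then normalize terms."""
    normalize = build_normalizer(table)
    return [normalize(s) for s in split_sentences(text) if is_relevant(s)]


print(preprocess("患者诊断为胃Ca。予5-FU化疗；病情稳定。", CANONICAL))
```

Because replacement happens in one pass, text that has just been normalized is never rescanned and rewritten again.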
4. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein
the third step: defining a Chinese medical event representation template,
following the event definition of the ACE (Automatic Content Extraction) evaluation and combining it with the content of the data set from the second step, the medical events in the data set are divided into six categories: admission, examination, inspection, treatment, surgery and diagnosis. Inspection events are further divided into two subtypes, pathological inspection and immunohistochemistry, and treatment events into two subtypes, general treatment and chemotherapy. The defined event types are provided to the fourth and sixth steps.
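One way to encode this template for the later classification steps is a small type hierarchy. The Python sketch below is an assumption for illustration: the claims do not state whether the classifiers predict top-level types or subtypes, and here the subtypes are used as the fine-grained label set.

```python
# Hypothetical encoding of the step-3 template: top-level type -> subtypes.
EVENT_TYPES: dict[str, list[str]] = {
    "admission": [],
    "examination": [],
    "inspection": ["pathological inspection", "immunohistochemistry"],
    "treatment": ["general treatment", "chemotherapy"],
    "surgery": [],
    "diagnosis": [],
}


def leaf_types(schema: dict[str, list[str]] = EVENT_TYPES) -> list[str]:
    """Fine-grained labels: a type's subtypes if it has any, else the type itself."""
    return [sub for top, subs in schema.items() for sub in (subs or [top])]


print(leaf_types())
```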
5. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the fourth step: constructing a candidate trigger word dictionary,
a candidate trigger word dictionary is built for each event type defined in the third step: from the data set cleaned in the second step, the most frequent trigger words of each event type are selected as candidate triggers and added to that type's dictionary, and the dictionary is then expanded with words similar to the candidate triggers in the synonym thesaurus Tongyici Cilin (同义词词林). The trigger dictionaries are provided to the seventh and eighth steps.
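A minimal Python sketch of this step, assuming the annotated (trigger, event type) pairs are already available. The plain dictionary `synonyms` is a hypothetical stand-in for Tongyici Cilin lookups, and `top_k`, the cutoff for "higher occurrence frequency", is likewise a hypothetical parameter that the claim does not fix.

```python
from collections import Counter, defaultdict


def build_trigger_dicts(labelled, top_k, synonyms):
    """labelled: iterable of (trigger_word, event_type) pairs from the cleaned data set.
    Returns {event_type: set of candidate trigger words}."""
    counts = defaultdict(Counter)
    for trigger, etype in labelled:
        counts[etype][trigger] += 1
    dicts = {}
    for etype, counter in counts.items():
        # Seed each type's dictionary with its most frequent triggers.
        seeds = [w for w, _ in counter.most_common(top_k)]
        expanded = set(seeds)
        # Expand with thesaurus neighbours of each seed trigger.
        for w in seeds:
            expanded.update(synonyms.get(w, ()))
        dicts[etype] = expanded
    return dicts
```

For example, with `top_k=1` the chemotherapy dictionary keeps only its most frequent trigger 化疗 plus its listed synonyms.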
6. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the fifth step: defining Chinese medical named entity categories,
with reference to the medical named entity classifications of the i2b2 challenge and the CCKS evaluation tasks, and combining them with the content of the data set from the second step, the named entities in the data set are divided into three categories: disease, symptom and treatment, wherein the symptom category is further subdivided, and the treatment category comprises three subcategories: drug, surgery and general treatment means. The defined entity categories are provided to the sixth step.
7. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the sixth step: semi-automatically labeling the data set corpus,
6.1: performing entity labeling on the corpus provided in the second step with a semi-automatic, medical-dictionary-based method;
for each entity category defined in the fifth step, entities are collected from medical websites such as the DXY (Dingxiangyuan) dictionary library, the Sogou medical dictionary and 39 Health Net (39.net), an entity dictionary is built for each category, and the event sentences in the data set are automatically labeled with entities using the reverse maximum matching algorithm; the entity-labeled data set is provided to 6.2;
6.2: according to the event types defined in the third step, manually labeling the trigger word and its corresponding event type for each sentence of the data set provided by 6.1; the data set with completed entity and trigger word labeling is provided to the seventh and eighth steps.
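Reverse maximum matching, named in 6.1, scans from the end of the sentence and at each position takes the longest dictionary word ending there. A minimal Python sketch follows; `entity_dict` (surface form to entity category) is a hypothetical merged dictionary, and the example entries are illustrative rather than taken from the DXY, Sogou or 39 Health Net lexicons.

```python
def reverse_max_match(sentence, entity_dict, max_len=None):
    """Return (start, end, category) spans found by reverse maximum matching."""
    if max_len is None:
        max_len = max((len(w) for w in entity_dict), default=1)
    spans = []
    end = len(sentence)
    while end > 0:
        # Try the longest window ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            word = sentence[end - size:end]
            if word in entity_dict:
                spans.append((end - size, end, entity_dict[word]))
                end -= size
                break
        else:
            end -= 1  # no dictionary word ends here; skip one character
    spans.reverse()  # report spans in left-to-right order
    return spans


entities = {"胃癌": "disease", "癌": "disease", "腹痛": "symptom", "氟尿嘧啶": "drug"}
print(reverse_max_match("患者腹痛，诊断为胃癌，予氟尿嘧啶化疗", entities))
```

Because the longest match wins, 胃癌 is labeled as one disease entity even though the shorter entry 癌 is also in the dictionary.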
8. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the seventh step: introducing entity information as extension features on top of the basic features and completing preliminary event detection with classifiers,
7.1: constructing the basic features of the classifier: (1) trigger word features; (2) trigger word part-of-speech features; (3) trigger word context features; (4) candidate trigger dictionary features;
(1) the collected medical dictionaries are added to the user dictionary of the NLP toolkit LTP, and LTP is used to segment the texts in the data set; each trigger word is converted into a word vector with the word2vec tool, and this vector serves as the trigger word feature;
(2) LTP performs part-of-speech tagging on the texts in the data set, and the part of speech of the trigger word in its sentence serves as the trigger word part-of-speech feature;
(3) a context window of 3 words before and after the trigger word is selected, and the words in the window and their parts of speech serve as the trigger word context features;
(4) a given trigger word usually corresponds to one fixed event type or a few, so the type of the candidate trigger dictionary containing the trigger word serves as the candidate trigger dictionary feature;
7.2: constructing the extension features of the classifier: (1) intra-sentence entity type features; (2) intra-sentence per-type entity count features;
(1) intra-sentence entity type features: in the medical event data set provided in the sixth step, the entity types that appear in sentences of different event types follow regular patterns: surgery entities appear in admission, diagnosis and surgery events; body part entities appear in pathological inspection and immunohistochemistry events; and drug entities appear in chemotherapy and general treatment events. The entity types appearing in an event sentence serve as the intra-sentence entity type features;
(2) intra-sentence per-type entity count features: the number of entities of each type appearing in an event sentence serves as the intra-sentence per-type entity count features;
7.3: training the trigger word recognition binary classifier;
all candidate trigger words, verbs and verbal nouns in each sentence are given binary labels: a word is labeled 1 if it is the trigger word of the sentence and 0 otherwise; the trigger word features, the trigger word part-of-speech features and the intra-sentence per-type entity count features are used as combined features, a Support Vector Machine (SVM) tool is trained on the vectors formed by these combined features, and the resulting trigger word recognition binary classifier is provided to step 7.4;
7.4: inputting the medical event data set into the trigger word recognition binary classifier, and providing the trigger word recognition result to step 7.5;
7.5: training the event detection multi-class classifier;
the words recognized as trigger words in 7.4 are used to train event type classification with the SVM tool; the classifier is trained on combined features built from the four types of basic features in 7.1 and the two types of extension features in 7.2, and the resulting event detection multi-class classifier is provided to step 7.6;
7.6: inputting the medical event data set into the event detection multi-class classifier to complete preliminary event detection, and providing the preliminary event detection result to the eighth step.
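Steps 7.1 to 7.6 can be sketched with scikit-learn as below. This is an illustration under assumptions, not the patented implementation: segmentation and POS tags are assumed to come from LTP upstream, the raw trigger word string stands in for its word2vec vector, a linear `SVC` is assumed for the unspecified SVM tool, and the same feature dictionaries feed both stages instead of the exact feature subsets listed in 7.3 and 7.5. The entity type inventory passed to `entity_features` is hypothetical.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

PAD = "<PAD>"


def basic_features(tokens, pos_tags, i, trigger_dicts, window=3):
    """7.1: trigger word, its POS, a +/-window context, dictionary membership."""
    # The word string stands in for the word2vec vector used in the claim.
    feats = {"trigger": tokens[i], "trigger_pos": pos_tags[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        inside = 0 <= j < len(tokens)
        feats[f"w[{off}]"] = tokens[j] if inside else PAD
        feats[f"p[{off}]"] = pos_tags[j] if inside else PAD
    for etype, words in trigger_dicts.items():
        if tokens[i] in words:
            feats[f"in_dict_{etype}"] = 1
    return feats


def entity_features(entity_types, all_types):
    """7.2: which entity types occur in the sentence, and how many of each."""
    counts = Counter(entity_types)
    feats = {}
    for t in all_types:
        feats[f"has_{t}"] = int(counts[t] > 0)
        feats[f"n_{t}"] = counts[t]
    return feats


def make_svm(probability=False):
    # DictVectorizer one-hot encodes string values (e.g. "trigger=化疗"),
    # passes numbers through, and ignores features unseen at fit time.
    return make_pipeline(DictVectorizer(), SVC(kernel="linear", probability=probability, random_state=0))


def train_two_stage(cand_feats, is_trigger, trig_feats, event_types):
    """7.3 binary trigger recognizer and 7.5 multi-class event typer."""
    binary = make_svm().fit(cand_feats, is_trigger)
    multi = make_svm(probability=True).fit(trig_feats, event_types)
    return binary, multi


def detect(binary, multi, cand_feats):
    """7.4 + 7.6: keep candidates accepted as triggers, then type them.
    Returns (candidate index, event type, probability of that type)."""
    keep = [i for i, y in enumerate(binary.predict(cand_feats)) if y == 1]
    if not keep:
        return []
    probs = multi.predict_proba([cand_feats[i] for i in keep])
    results = []
    for i, row in zip(keep, probs):
        best = int(row.argmax())
        results.append((i, multi.classes_[best], float(row[best])))
    return results
```

`probability=True` turns on Platt scaling, so each predicted event type comes with the probability that the eighth step thresholds.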
9. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the eighth step: extracting event sentences with low classification probability from the preliminary event detection result of step 7.6, and reclassifying them using document-consistency information,
events within a document are interrelated, so the detection result is further improved by exploiting the document-level consistency of medical texts;
sentences whose classification probability in 7.6 is below 40% are called untrustworthy event sentences, sentences whose classification probability is above 60% are called trustworthy event sentences, and untrustworthy event sentences are reclassified. Untrustworthy event sentences whose preceding and following sentences are both trustworthy are extracted to construct a medical event set for reclassification; nine features, comprising entity type features and per-type entity count features together with the basic event detection features and the per-type intra-sentence entity count features of the sentence, are combined into document-consistency features, and the SVM is used to reclassify this medical event set.
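The selection rule of the eighth step can be sketched in Python as follows. The 40% and 60% thresholds come from the claim; representing the neighbouring sentences by their predicted event types (`prev_type`, `next_type`) is an assumption about what the document-consistency features contain, since the claim text does not spell this out. The resulting feature dictionaries would then be passed to an SVM for reclassification.

```python
LOW, HIGH = 0.40, 0.60  # untrustworthy below LOW, trustworthy above HIGH


def select_for_reclassification(probs):
    """Indices of untrustworthy sentences whose previous and next sentences are trustworthy."""
    return [
        i for i in range(1, len(probs) - 1)
        if probs[i] < LOW and probs[i - 1] > HIGH and probs[i + 1] > HIGH
    ]


def consistency_features(own_feats, prev_type, next_type):
    """Copy the sentence's own features and add neighbouring event types (assumed)."""
    feats = dict(own_feats)
    feats["prev_type"] = prev_type
    feats["next_type"] = next_type
    return feats


def build_reclassification_set(sentences):
    """sentences: (feature_dict, predicted_type, probability) triples in document order."""
    probs = [p for _, _, p in sentences]
    return [
        (i, consistency_features(sentences[i][0], sentences[i - 1][1], sentences[i + 1][1]))
        for i in select_for_reclassification(probs)
    ]
```

Sentences with a probability between 40% and 60%, and the first and last sentences of a document, are never selected because they cannot meet the stated condition.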
10. The affair knowledge graph event detection method fused with extension features according to claim 1, wherein the ninth step: combining the high-probability classification results of the seventh step with the reclassification results of the eighth step to form the final medical affair knowledge graph event detection result;
the event detection results with high classification probability output in the seventh step and the reclassified results output in the eighth step together form the final medical affair knowledge graph event detection result.
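The ninth step amounts to a merge, sketched here in Python. The claim names only two sources of results, so in this sketch (an assumption) sentences that were neither trustworthy in the seventh step nor reclassified in the eighth step are left out of the final result.

```python
def merge_results(preliminary, reclassified, high=0.60):
    """preliminary: (event_type, probability) per sentence in document order.
    reclassified: {sentence_index: event_type} from the eighth step.
    Returns {sentence_index: final event_type}."""
    # Keep only trustworthy (high-probability) preliminary results.
    final = {i: t for i, (t, p) in enumerate(preliminary) if p > high}
    # Reclassified sentences take their new labels.
    final.update(reclassified)
    return final
```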

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002672.8A CN112241457A (en) 2020-09-22 2020-09-22 Event detection method for event of affair knowledge graph fused with extension features

Publications (1)

Publication Number Publication Date
CN112241457A 2021-01-19

Family

ID=74171062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002672.8A Pending CN112241457A (en) 2020-09-22 2020-09-22 Event detection method for event of affair knowledge graph fused with extension features

Country Status (1)

Country Link
CN (1) CN112241457A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572958A (en) * 2014-12-29 2015-04-29 中国科学院计算机网络信息中心 Event extraction based sensitive information monitoring method
CN107092674A (en) * 2017-04-14 2017-08-25 福建工程学院 The automatic abstracting method and system of a kind of Chinese medicine acupuncture field event trigger word

Non-Patent Citations (1)

Title
Wang Chen (王晨) et al.: "Chinese medical event detection based on feature extension and document consistency", Proceedings of 5th International Conference on Automation, Control and Robotics Engineering (CACRE 2020) *

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN112749564A (en) * 2021-01-31 2021-05-04 云知声智能科技股份有限公司 Medical record event element extraction method and device, electronic equipment and storage medium
CN112948552A (en) * 2021-02-26 2021-06-11 北京信息科技大学 Method and device for online expansion of affair map
CN112948552B (en) * 2021-02-26 2023-06-02 北京信息科技大学 Online expansion method and device for an affair map
CN113076411A (en) * 2021-04-26 2021-07-06 同济大学 Medical query expansion method based on knowledge graph
CN113177416B (en) * 2021-05-17 2022-06-07 同济大学 Event element detection method combining sequence labeling and pattern matching
CN113177416A (en) * 2021-05-17 2021-07-27 同济大学 Event element detection method combining sequence labeling and pattern matching
CN113468333A (en) * 2021-09-02 2021-10-01 华东交通大学 Event detection method and system fusing hierarchical category information
CN113779358A (en) * 2021-09-14 2021-12-10 支付宝(杭州)信息技术有限公司 Event detection method and system
CN113779358B (en) * 2021-09-14 2024-05-24 支付宝(杭州)信息技术有限公司 Event detection method and system
CN113553853A (en) * 2021-09-16 2021-10-26 南方电网数字电网研究院有限公司 Named entity recognition method and device, computer equipment and storage medium
CN114817575A (en) * 2022-06-24 2022-07-29 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model
CN114817575B (en) * 2022-06-24 2022-09-02 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model
CN116110533A (en) * 2023-02-27 2023-05-12 之江实验室 Event map-based drug type and dosage recommendation system and method
CN116110533B (en) * 2023-02-27 2023-09-01 之江实验室 Event map-based drug type and dosage recommendation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210119