CN112527961A - Automatic extraction method for emergency response level of emergency plan and responsibility of administrative unit - Google Patents
- Publication number
- CN112527961A (application CN202011498662.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- trigger
- emergency
- entity
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for automatically extracting the emergency response levels of an emergency plan and the responsibilities of administrative units, comprising the following steps. S1: preprocess the emergency plan, split its text content by directory title, and store the text in a database according to directory title level. S2: label the directory titles processed in step S1 with classification categories to form a labeled data set, then train on the labeled data set through word segmentation, vectorization and classification. S3: extract the key information. S4: de-duplicate and splice the extracted administrative unit names, output the responsibilities of each administrative unit, and standardize the extracted entities related to trigger conditions. S5: obtain the response levels and their corresponding trigger conditions, and output the analysis result. Starting from a templated emergency plan, the method extracts and standardizes the emergency response levels in the plan and the trigger conditions corresponding to each level.
Description
Technical Field
The invention belongs to the field of data processing, and particularly relates to an automatic extraction method for emergency response grade of an emergency plan and responsibility of an administrative unit.
Background
An emergency plan is a plan for emergency management, command and rescue in the event of emergencies such as natural disasters, serious accidents, environmental pollution and man-made destruction. It is usually a comprehensive accident response plan that describes in detail who should do what, when and how, before, during and after an accident, together with emergency response plans compiled for the accident scenarios that may occur at each on-site facility and location. The emergency response plan covers all foreseeable hazardous conditions and specifies the responsibilities of the personnel involved in the emergency.
At present most emergency plans are stored as paper files or electronic documents, and because writing quality is uneven their content varies widely. Moreover, existing emergency plan digitization systems usually only convert the plan into a template; they do not extract or standardize the trigger conditions corresponding to each emergency response level, the responsibilities of the relevant functional departments, and so on. When an emergency occurs it is therefore difficult to judge which emergency plan applies and which response level has been reached, so responses are easily delayed or the responsible department is unclear, which seriously reduces the efficiency of emergency command and handling.
Disclosure of Invention
To solve these problems, the invention provides a method for automatically extracting the emergency response levels of an emergency plan and the responsibilities of administrative units. Starting from a templated emergency plan, the method extracts and standardizes the emergency response levels and the trigger conditions corresponding to each level; it also extracts the functional units mentioned anywhere in the full text of the plan together with their scopes of responsibility.
The technical scheme of the invention is as follows:
A method for automatically extracting the emergency response levels of an emergency plan and the responsibilities of administrative units comprises the following steps:
S1: preprocess the emergency plan, split its text content by directory title, and store the text in a database according to directory title level;
S2: label the directory titles processed in step S1 with classification categories to form a labeled data set; train on the labeled data set through word segmentation, vectorization and classification;
S3: extract the key information: extract the names and scopes of responsibility of administrative units from the text content under all directory titles; based on the classification result of step S2, extract the response levels and corresponding trigger conditions from text classified as describing emergency response levels, early warning levels or event classification; the key information is extracted by combining entity recognition with entity type classification;
S4: de-duplicate and splice the extracted administrative unit names, output the responsibilities of each administrative unit, and standardize the extracted entities related to trigger conditions;
S5: according to the directory title level, obtain the names and responsibilities of the administrative units under each level of directory title, obtain the response levels and corresponding trigger conditions, and output the analysis result.
Preferably, step S1 proceeds as follows: the content is split by the directory titles of the plan; for each section the text content is stored together with its directory title and that title's parent node, the parent node of a first-level directory title being set to 'root'; the standardized emergency plan text is then stored in a database for further processing.
Preferably, in step S2 the classification uses a supervised binary classification model; labeling the data set consists of marking whether the content under each directory title is 'emergency response' content, with '1' if it is and '0' otherwise;
the training process in step S2 is as follows: the directory title is first segmented into words with jieba, word weights are then computed with TF-IDF to vectorize the text, and finally a multinomial naive Bayes classifier performs the classification.
Preferably, the step of entity identification and entity type classification described in step S3 is as follows:
S3.1: text data processing: in the training stage, entity recognition is performed on each directory title and all text under it; the entity types to be recognized are numerals, emergency response levels, condition trigger words, number-boundary keywords, quantity units and administrative unit names;
S3.2: building the entity recognition and trigger word classification model: each directory title and all text under it are one-hot encoded character by character, and the encoded vectors are the input of the model; the vectors are fed into a Bi-LSTM model, which encodes them into a final state vector for each input character, and these state vectors are cached; the state vectors are passed to a CRF model for decoding to obtain the final sequence labeling result; if the result contains Trigger entities, the final state vectors of the characters in each Trigger entity are located, their arithmetic mean is taken as the vector of that Trigger entity, and this vector is fed to a Softmax classifier.
Preferably, during training the loss of the whole model is the sum of the entity recognition loss and the Trigger classification loss, and training yields the final entity recognition and trigger word classification model.
Preferably, the de-duplication and splicing in step S4 proceed as follows: using the final state vector of each character output in step S3.2, the character vectors of each entity recognized as ORG are averaged to give the vector of that entity; the vectors of all entities recognized as ORG in the text under the directory are extracted and their pairwise cosine similarities are computed; for each entity the most similar other entity is found, and when the cosine similarity of the two exceeds 0.9 they are judged to describe the same administrative unit and are placed in one group; the entities are thus divided into groups by similarity, and an entity with no similarity above 0.9 forms a group of its own; the longest name in each group is selected as the name of the administrative unit, and the sentences containing any entity of the group are spliced in order and output as the responsibilities of that administrative unit.
Preferably, the standardization applies to the extracted trigger word, numeral, quantity unit and keyword entities, and every extracted trigger condition must contain a trigger word, a numeral entity and a quantity unit entity together.
Preferably, when several trigger word entities appear in one sentence, the sentence is split again into clauses at punctuation marks so that each clause ultimately contains only one set of trigger conditions.
Preferably, the quantity units allowed for each trigger word are restricted, and trigger conditions are screened by a second matching of trigger word against quantity unit.
Preferably, during standardization, when two numeral entities are extracted from one set of trigger conditions, they are taken as the numeric bounds of that trigger condition.
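The standardization rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the entity type names (Trigger, M, Q, Keyword) follow the labeling scheme of the description, while the `ALLOWED_UNITS` table, the output field names and the function name `normalize_clause` are hypothetical. It also assumes that the sentence has already been split into clauses, that the trigger word has already been classified into a category, and that numerals are written as Arabic digits.

```python
# Hypothetical table of quantity units allowed for each trigger category.
ALLOWED_UNITS = {"death": {"person"}, "economic_loss": {"yuan"}}

def normalize_clause(entities):
    """entities: list of (type, text) pairs recognized in one clause."""
    triggers = [text for kind, text in entities if kind == "Trigger"]
    numbers = [float(text) for kind, text in entities if kind == "M"]
    units = [text for kind, text in entities if kind == "Q"]
    keywords = [text for kind, text in entities if kind == "Keyword"]

    # A trigger condition needs exactly one trigger, a numeral and a unit.
    if len(triggers) != 1 or not numbers or not units:
        return None
    trigger, unit = triggers[0], units[0]

    # Second matching: the unit must be allowed for this trigger category.
    if unit not in ALLOWED_UNITS.get(trigger, set()):
        return None

    condition = {"trigger": trigger, "unit": unit}
    if len(numbers) == 2:
        # Two numerals are read as the lower and upper bound.
        condition["lower"] = min(numbers)
        condition["upper"] = max(numbers)
    else:
        condition["value"] = numbers[0]
        condition["keyword"] = keywords[0] if keywords else None
    return condition
```

Clauses that lack a numeral or unit, or whose unit does not fit the trigger category, return `None` and are dropped from the result.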
The invention has the beneficial effects that:
Through a series of text analysis steps such as text classification, entity recognition and entity standardization, all emergency management plans are extracted and standardized, so that each plan yields the trigger conditions corresponding to its different response levels together with the names and responsibilities of the associated administrative units. When an accident occurs, knowing only key information such as casualties and economic loss and querying the standardized plan database is enough to quickly match the emergency response level to be activated, the administrative units involved and their scopes of responsibility, which enables a fast response, improves handling efficiency and reduces injury and loss.
Drawings
FIG. 1 is a flow chart of the present invention
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the method for automatically extracting the emergency response levels of an emergency plan and the responsibilities of administrative units comprises the following specific steps:
S1: preprocess the emergency plan, split its text content by directory title, and store the text in a database according to directory title level;
S2: label the directory titles processed in step S1 with classification categories to form a labeled data set; train on the labeled data set through word segmentation, vectorization and classification;
S3: extract the names and scopes of responsibility of administrative units from the text content under all directory titles; based on the classification result of step S2, extract the response levels and corresponding trigger conditions from text classified as describing emergency response levels, early warning levels or event classification; the key information is extracted by combining entity recognition with entity type classification;
S4: de-duplicate and splice the extracted administrative unit names, output the responsibilities of each administrative unit, and standardize the extracted entities related to trigger conditions;
S5: according to the directory title level, obtain the names and responsibilities of the administrative units under each level of directory title, obtain the response levels and corresponding trigger conditions, and output the analysis result.
Corresponding to these steps, the method is divided into an emergency plan text preprocessing module, a directory title classification module, a plan entity extraction module, an administrative unit responsibility standardization module and a response trigger condition standardization module.
In one embodiment of the invention, the emergency plan text preprocessing module stores the acquired emergency plan text according to directory level. The content is split by the directory of the plan, and for each section the text content is stored together with the directory title and the parent node of that directory. In particular, the parent node of a first-level directory is set to 'root'. The standardized emergency plan text is then stored in the database for further processing.
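The storage layout described above can be sketched in Python. This is a minimal illustration that assumes the headings have already been detected and carry a numeric level; the record fields (`title`, `parent`, `level`, `text`) and the function name `build_records` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class Section:
    title: str
    parent: str  # title of the parent heading, or 'root' for first-level headings
    level: int
    text: str

def build_records(headings):
    """headings: list of (level, title, text) tuples in document order."""
    records = []
    stack = []  # titles of the currently open ancestor headings
    for level, title, text in headings:
        # Drop ancestors at the same or a deeper level than this heading.
        del stack[level - 1:]
        parent = stack[-1] if stack else "root"
        records.append(Section(title, parent, level, text))
        stack.append(title)
    return records
```

Each record can then be written to a database row, so that the directory tree can be rebuilt later from the `parent` field alone.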
In one embodiment of the invention, the directory title classification module uses the classification result to decide whether a directory level describes emergency response level classification, and levels and trigger conditions are extracted only from text whose classification result is 'yes'. This speeds up analysis and, under the constraint of the classification, improves the accuracy of the extraction result. Because directory titles contain few words and are short texts, a simple classifier already achieves good results, so the invention classifies directory titles with a naive Bayes classification algorithm.
The directory titles of all existing emergency plan texts in the database are manually labeled by category. Since the model is a supervised binary classifier, labeling the data set only requires marking whether the content of a directory is 'emergency response' content, such as emergency response, early warning response or event level: '1' if it is, '0' otherwise.
The title text is first segmented into words with jieba, the TF-IDF weight of each word is then computed to vectorize the text, and finally the vectorized text is classified with a multinomial naive Bayes classifier.
TF-IDF (term frequency-inverse document frequency) is a statistical method for evaluating how important a word is to one document in a collection or corpus. It consists of two parts, TF and IDF. The main idea is that if a word occurs with high frequency (TF) in one article and rarely in other articles, the word or phrase discriminates well between categories and is suitable for classification. Specifically:
TF (term frequency) is the frequency with which the term occurs in the document.
That is, $\mathrm{TF}_{i,j} = n_{i,j} / \sum_k n_{k,j}$, where $n_{i,j}$ is the number of occurrences of term $t_i$ in document $d_j$ and the denominator is the total number of term occurrences in $d_j$.
IDF (inverse document frequency): the IDF of a term is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient. The fewer documents contain the term t, the larger the IDF, and the better the term distinguishes categories. The IDF is computed as in Formula III:
$\mathrm{IDF}_i = \log \dfrac{|D|}{|\{j : t_i \in d_j\}|}$ (Formula III)
where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the term $t_i$. To avoid a zero denominator when the term does not occur in the corpus, smoothing is generally applied, i.e. the denominator $1 + |\{j : t_i \in d_j\}|$ is used.
Finally, the calculation method of TF-IDF is shown in formula IV:
$\text{TF-IDF} = \mathrm{TF} \times \mathrm{IDF}$ (Formula IV)
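The TF-IDF weighting can be sketched in plain Python. The sketch assumes the titles have already been segmented into tokens (the patent uses jieba for this step) and uses the smoothed denominator, 1 plus the document count, mentioned in the text; the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # TF (relative frequency) times smoothed IDF.
            term: (count / total) * math.log(n_docs / (1 + df[term]))
            for term, count in counts.items()
        })
    return weights
```

With the smoothed denominator, a term that occurs in nearly every document receives a weight close to zero or below it, which matches the intent that such terms carry little classification information.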
A naive Bayes classifier is a classifier built on Bayes' theorem. In the training stage, the features and classes of the training samples are input, the frequency of each class in the training samples and the conditional probability of each feature given each class are computed, and these probabilities are stored. In the prediction stage, the input text is segmented and vectorized, the probability of the text belonging to each category is computed, and the category with the highest probability is taken as the classification result. The naive Bayes formula is given in Formula V:
$P(y_k \mid x) \propto P(y_k) \prod_i P(x_i \mid y_k)$ (Formula V)
where $x = (x_1, \dots, x_n)$ denotes the combination of terms occurring in the text and $y_k$ denotes a particular class; the normalizing factor $P(x)$ is the same for every class and is therefore omitted.
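Formula V can be sketched as a small multinomial naive Bayes classifier in plain Python. Two choices here are assumptions rather than details from the patent: add-one (Laplace) smoothing of the conditional probabilities, and summing logarithms instead of multiplying probabilities to avoid numerical underflow. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns (priors, term_counts, vocab)."""
    priors = Counter(label for _, label in samples)
    term_counts = defaultdict(Counter)
    for tokens, label in samples:
        term_counts[label].update(tokens)
    vocab = {t for tokens, _ in samples for t in tokens}
    return priors, term_counts, vocab

def predict_nb(tokens, priors, term_counts, vocab):
    n_samples = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, count in priors.items():
        total = sum(term_counts[label].values())
        # log P(y_k) + sum_i log P(x_i | y_k), with add-one smoothing (assumption).
        score = math.log(count / n_samples)
        for t in tokens:
            score += math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For the binary task described here, the labels would be '1' for 'emergency response' titles and '0' for all others.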
In keeping with the structure of plan texts, classification proceeds through the directory hierarchy level by level: the first-level directories are classified first, and when a first-level directory is classified as '1', i.e. as 'emergency response' content, its second-level and lower directories are not classified; otherwise its second-level directories are classified, and so on.
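The level-by-level traversal can be sketched as a short recursive function. The tree representation (a dict from parent title to child titles, rooted at 'root') and the `classify` callback are assumptions made for illustration; any title classifier that returns '1' or '0' could be plugged in.

```python
def find_response_sections(tree, classify, node="root"):
    """tree: {parent_title: [child_title, ...]}; classify: title -> '1' or '0'."""
    hits = []
    for child in tree.get(node, []):
        if classify(child) == "1":
            # Matched: keep this directory and skip its sub-directories.
            hits.append(child)
        else:
            # Not matched: descend and classify the next level.
            hits.extend(find_response_sections(tree, classify, child))
    return hits
```

Because matched directories are never expanded, the classifier is called only on the titles that still need a decision.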
In one embodiment of the invention, the plan entity extraction module extracts the names and scopes of responsibility of administrative units from the text content under all directories. Based on the result of the directory title classification module, it also extracts the response levels and corresponding trigger conditions from text classified as describing 'emergency response level, early warning level, event level' and similar content. The key information is extracted mainly by combining Bi-LSTM-CRF entity recognition with entity type classification.
Bi-LSTM is a bidirectional long short-term memory network formed by combining a forward LSTM and a backward LSTM. Both LSTM and Bi-LSTM are often used to model context information in natural language processing tasks.
CRF stands for conditional random field, a discriminative probabilistic graphical model. Given an observed sequence, a CRF models the conditional probability of a label sequence. In this task the observation sequence is the character sequence, the label sequence is the corresponding sequence of tags, and the label sequence has a linear-chain structure.
The advantage of Bi-LSTM is that its bidirectional design learns the dependencies within the observation sequence (the input characters), and during training the LSTM automatically extracts features of the observation sequence for the target (such as recognizing entities). Its disadvantage is that it cannot learn the relationships within the label sequence (the output tags). In named entity recognition the labels are interdependent; for example, a B-type label (the beginning of an entity) is never immediately followed by another B-type label. So although an LSTM spares sequence labeling tasks such as NER from laborious feature engineering, it cannot learn the label context.
A CRF, by contrast, can model and learn the characteristics of the label sequence, but requires the features of the observation sequence to be extracted manually. It is therefore common to add a CRF layer after the LSTM to obtain the benefits of both.
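The kind of label dependency mentioned above can be illustrated with a hard-rule checker. This is only an illustration: a CRF layer learns transition scores from data rather than applying fixed rules, and the tag format used here (`B_`, `M_`, `E_`, `S_` prefixes plus `O`) is assumed from the labeling scheme described later in this document. The function name is hypothetical.

```python
def is_valid_sequence(tags):
    """Check tags such as 'B_ORG', 'M_ORG', 'E_ORG', 'S_ORG', 'O' for consistency."""
    open_type = None  # entity type currently being spanned, if any
    for tag in tags:
        prefix, _, etype = tag.partition("_")
        if open_type is None:
            if prefix in ("M", "E"):
                return False  # continuation without a beginning
            if prefix == "B":
                open_type = etype
        else:
            if prefix not in ("M", "E") or etype != open_type:
                return False  # e.g. B followed by B, by O, or by another type
            if prefix == "E":
                open_type = None
    return open_type is None  # a sequence must not end inside an entity
```

A per-character classifier without a CRF layer can emit sequences that fail this check, which is the weakness the added CRF layer is meant to address.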
The main steps of entity identification and entity type classification are as follows:
1. Text data processing: in the training stage, entity recognition is performed on each directory title and all text under it. Six types of entity are recognized: numerals (M), emergency response levels (Response), condition trigger words (Trigger), number-boundary keywords (Keyword), quantity units (Q) and administrative unit names (ORG). The data are labeled as follows: each sentence is split into characters and every character receives a label under the BMESO scheme. All non-entity characters are labeled 'O'. An entity is labeled according to its entity type: a single-character entity is labeled S_<entity type>; otherwise its first character is labeled B_<entity type>, its middle characters M_<entity type> and its last character E_<entity type>. For condition trigger words (Trigger), the trigger conditions currently supported by the invention mainly comprise seven categories, such as death, injury (including serious injury, minor injury, poisoning, disability, adverse reaction, local organ disability, acute severe radiation sickness and the like), missing persons, economic loss (property loss), earthquake magnitude, duration, air quality index, emergency transfer and the like. All of these are uniformly labeled Trigger in the entity recognition stage, and the specific type of each trigger word is classified after the entities have been extracted.
Take the key sentence "A sudden public incident causing more than 10 casualties, of which more than 3 are deaths or critical cases, is a particularly major incident, and a first-level emergency response is started" as an example. The sentence is split into characters (including all punctuation and other symbols), and after labeling the tag corresponding to each character is: "O, O, O, O, O, O, O, B_Trigger, E_Trigger, B_M, E_M, B_Keyword, E_Keyword, O, O, O, O, B_Trigger, E_Trigger, O, O, B_Keyword, E_Keyword, S_M, O, O, O, O, O, O, O, O, O, B_Response, E_Response, O, O, O, O, O, O, O, O".
2. Building the entity recognition and trigger word classification model: each directory title and all text under it are one-hot encoded character by character, and the encoded vectors are the input of the model. The vectors are fed into a Bi-LSTM model, which encodes them into a final state vector for each input character, and these state vectors are cached. The state vectors are then passed to a CRF model for decoding to obtain the final sequence labeling result. If the result contains Trigger entities, the final state vectors of the characters in each Trigger entity are located, their arithmetic mean is taken as the vector of that Trigger entity, and this vector is fed to a Softmax classifier.
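The BMESO labeling strategy of step 1 can be sketched as follows. The input format (character offsets with an exclusive end) and the function name `bmeso_tags` are assumptions made for illustration; the short Chinese fragments in the usage note are invented examples, not quotations from the patent.

```python
def bmeso_tags(text, entities):
    """entities: list of (start, end, type), end exclusive, non-overlapping."""
    tags = ["O"] * len(text)  # every non-entity character stays 'O'
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"S_{etype}"  # single-character entity
        else:
            tags[start] = f"B_{etype}"  # first character
            for i in range(start + 1, end - 1):
                tags[i] = f"M_{etype}"  # middle characters
            tags[end - 1] = f"E_{etype}"  # last character
    return tags
```

For example, `bmeso_tags("死亡10人以上", [(0, 2, "Trigger"), (2, 4, "M"), (4, 5, "Q"), (5, 7, "Keyword")])` tags the two characters of the trigger word as `B_Trigger, E_Trigger` and the single-character unit as `S_Q`.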
Softmax is a very common and important function, widely used especially in multi-class settings. It maps its inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1, so the class probabilities of a multi-class problem also sum to exactly 1. The Softmax function is defined in Formula VI:
$S_i = \dfrac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}}$ (Formula VI)
where $V_i$ is the classifier output for class $i$, $i$ is the class index and $C$ is the total number of classes; $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements. Softmax converts the multi-class output values into relative probabilities, and in practice the class with the highest probability is selected as the classification result.
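The two operations applied to a Trigger entity, averaging its per-character state vectors and applying Softmax to the class scores, can be sketched in plain Python. The sketch assumes state vectors are given as lists of floats and that the class scores have already been computed by whatever layer precedes Softmax; subtracting the maximum score is a standard numerical-stability step and does not change the result. The function names are illustrative.

```python
import math

def mean_pool(state_vectors, start, end):
    """Arithmetic mean of the per-character state vectors in [start, end)."""
    span = state_vectors[start:end]
    dim = len(span[0])
    return [sum(vec[d] for vec in span) / len(span) for d in range(dim)]

def softmax(scores):
    """S_i = exp(V_i) / sum_j exp(V_j), shifted by max(V) for stability."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted trigger category is the index of the largest value returned by `softmax`.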
During training, the loss of the whole model is the sum of the entity recognition loss and the Trigger classification loss, and the final entity recognition and trigger word classification model is obtained through training.
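Written as a formula, the joint objective is simply the unweighted sum of the two terms. The symbol names below are assumptions introduced for illustration; the text does not name the individual losses or state any weighting between them.

$$ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{NER}} + \mathcal{L}_{\text{Trigger}} $$

Here $\mathcal{L}_{\text{NER}}$ stands for the loss of the sequence labeling (entity recognition) part and $\mathcal{L}_{\text{Trigger}}$ for the loss of the Softmax trigger word classifier.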
In the prediction stage, the text standardized by the emergency plan text preprocessing module is processed by analyzing the text content under each directory in turn according to the directory hierarchy, and the text is split into sentences at periods. The directory titles are classified by the directory title classification module. When the text under a directory is classified as 'emergency response' content, entity recognition is performed on that text and each entity recognized as Trigger is classified; otherwise, only entity recognition is performed on the text, in order to extract the administrative units in the document.
In one embodiment of the present invention, the administrative unit role standardization module processes the entities of category ORG extracted in the entity recognition results of the plan entity extraction module.
First, the entity names of administrative units are deduplicated: because the text may refer to an administrative unit by a colloquial name, an abbreviation, an alias, and so on, the extracted administrative unit names need to be deduplicated. The specific method is as follows:
Entity recognition is performed by the plan entity extraction module, and the Bi-LSTM model outputs the final state vector of each character. For each entity recognized as ORG, the vectors of its characters are averaged to form the vector of that entity. The vectors of all entities recognized as ORG in the text under a directory are extracted in this way, and cosine similarity is computed pairwise. For each entity, the entity with which it has the highest similarity is found; when the cosine similarity of the two is greater than 0.9, they are judged to describe the same administrative unit and are placed in the same group. Comparing similarities in this way divides the entities into groups, and an entity that has no similarity greater than 0.9 with any other entity forms a group of its own.
Second, after the unit names are deduplicated, the name with the longest character length in each group is selected as the administrative unit name, and the sentences containing any entity in the group are spliced in order and output as the responsibility of the administrative unit.
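The grouping and splicing steps can be sketched as follows. This is an illustrative reading, not the patent's code: the unit names, the 2-dimensional entity vectors and the sentences are made up, the function names are hypothetical, and merging groups with a union-find structure is one possible way to turn "most similar neighbor above 0.9" into groups.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_entities(names: List[str], vectors: List[Sequence[float]],
                   threshold: float = 0.9) -> List[List[str]]:
    """Group each entity with its most similar neighbor when similarity > threshold."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path compression
            i = parent[i]
        return i

    for i in range(len(names)):
        best_j, best_sim = None, -1.0
        for j in range(len(names)):
            if j != i:
                sim = cosine(vectors[i], vectors[j])
                if sim > best_sim:
                    best_j, best_sim = j, sim
        if best_j is not None and best_sim > threshold:
            parent[find(i)] = find(best_j)    # same administrative unit

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

def responsibilities(group: List[str], sentences: List[str]) -> str:
    """Splice, in order, the sentences that mention any name in the group."""
    return " ".join(s for s in sentences if any(n in s for n in group))

# Hypothetical data for illustration only.
names = ["Municipal Health Commission", "Health Commission", "Fire Brigade"]
vectors = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]
sentences = ["The Health Commission organizes medical rescue.",
             "The Fire Brigade handles on-site rescue.",
             "The Municipal Health Commission reports casualties."]

groups = group_entities(names, vectors)
unit_name = max(groups[0], key=len)           # longest name represents the group
duty = responsibilities(groups[0], sentences)
print(groups, unit_name, duty)
```

An entity with no neighbor above the threshold simply stays in its own group, matching the rule stated above.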
In one embodiment of the present invention, the response trigger condition standardization module standardizes the extracted entities related to trigger conditions, based on the entity recognition and trigger word classification results of the plan entity extraction module. Standardizing the response trigger conditions mainly means standardizing the extracted trigger word entities, number word entities, quantifier entities and keyword entities.
First, for trigger word entities: given the characteristics of plan texts, the invention requires that each extracted trigger condition simultaneously contain a trigger word, a number word entity and a quantifier entity. When several trigger word entities appear in one sentence, the sentence is split again at punctuation marks, subject to this rule, so that each resulting clause contains only one set of trigger conditions.
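The clause-splitting rule can be sketched on an already labeled token sequence. This is a minimal illustration under stated assumptions: the entity type names `Trigger`, `Number` and `Quantifier`, the punctuation set, the token list and the function name `split_conditions` are all hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]                        # (text, entity type or "O")
REQUIRED = {"Trigger", "Number", "Quantifier"} # hypothetical type names
PUNCT = {",", ";", "，", "；", "、"}            # assumed clause separators

def split_conditions(tokens: List[Token]) -> List[List[Token]]:
    """Split at punctuation and keep only clauses holding a complete trigger condition."""
    clauses: List[List[Token]] = []
    current: List[Token] = []
    for text, etype in tokens:
        if text in PUNCT:
            clauses.append(current)
            current = []
        else:
            current.append((text, etype))
    clauses.append(current)
    return [c for c in clauses if REQUIRED <= {etype for _, etype in c}]

tokens = [("death", "Trigger"), ("3", "Number"), ("persons", "Quantifier"), (",", "O"),
          ("injury", "Trigger"), ("10", "Number"), ("persons", "Quantifier"), (",", "O"),
          ("start", "O"), ("response", "O")]
conditions = split_conditions(tokens)
print(conditions)
```

The last clause is dropped because it contains no trigger word, number word or quantifier, so each remaining clause carries exactly one set of trigger conditions.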
Second, for quantifier processing: to ensure the accuracy of the trigger conditions, each trigger word category is restricted to a set of permitted quantifiers. For example, if the trigger word is classified as 'death', the corresponding quantifier can only come from 'people, names, examples' and the like. This secondary matching of trigger word and quantifier screens out more accurate trigger conditions.
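The secondary matching can be expressed as a lookup table from trigger word category to permitted quantifiers. This is a sketch only: the 'death' entry uses the quantifiers as rendered in the text above, while the `economic_loss` entry, the table name and the function name are hypothetical.

```python
from typing import Dict, Set

ALLOWED_QUANTIFIERS: Dict[str, Set[str]] = {
    "death": {"people", "names", "examples"},  # as rendered in the text above
    "economic_loss": {"yuan"},                 # hypothetical entry for illustration
}

def quantifier_matches(trigger_class: str, quantifier: str) -> bool:
    """Return True when the quantifier is permitted for the trigger word category."""
    return quantifier in ALLOWED_QUANTIFIERS.get(trigger_class, set())

print(quantifier_matches("death", "people"))
print(quantifier_matches("death", "yuan"))
```

A candidate trigger condition whose quantifier fails this check would be discarded.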
For number words and keywords: an emergency response trigger condition is usually a range limit on the number of people, the earthquake magnitude, the economic loss, and so on. Therefore, when standardizing a trigger condition, if two number word entities are extracted from one set of trigger conditions, they are taken as the numeric boundaries of the trigger condition. For example, if the text is 'injured 1 to 3 persons, then four-level emergency response is started' and the entity recognition stage extracts trigger word: injury, number: 0,3, quantifier: person, then the standardized trigger condition is: number of injured people: 0-3 persons. When only one number word entity is extracted from a set of trigger conditions but the set also contains a keyword entity, the numeric boundary of the trigger condition is determined from the keyword. For example, if the text is 'the number of injured people is below 3, four-level emergency response is started' and the entity recognition stage extracts trigger word: injury, number: 3, quantifier: person, keyword: below, then the standardized trigger condition is: number of injured people: 0-3 persons. In particular, when the text gives no upper limit for the numeric boundary, the value '99999' is uniformly used as the upper limit.
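The boundary rules can be sketched as a small function. This is an illustration under assumptions: the keyword sets `BELOW` and `ABOVE` and the function name are hypothetical, and raising an error for a single number without a recognized keyword is a choice made here, since the text does not cover that case.

```python
from typing import List, Optional, Tuple

NO_UPPER_LIMIT = 99999                      # upper bound used when the text gives none
BELOW = {"below", "under", "less than"}     # hypothetical keyword set
ABOVE = {"above", "over", "more than"}      # hypothetical keyword set

def normalize_bounds(numbers: List[int], keyword: Optional[str] = None) -> Tuple[int, int]:
    """Return the (lower, upper) numeric boundary of a trigger condition."""
    if len(numbers) == 2:
        low, high = sorted(numbers)         # two numbers give both boundaries
        return low, high
    if len(numbers) == 1 and keyword in BELOW:
        return 0, numbers[0]                # e.g. "below 3" becomes 0-3
    if len(numbers) == 1 and keyword in ABOVE:
        return numbers[0], NO_UPPER_LIMIT   # no upper limit stated in the text
    raise ValueError("cannot determine boundary")  # case not covered by the text

print(normalize_bounds([0, 3]))
print(normalize_bounds([3], "below"))
print(normalize_bounds([10], "more than"))
```

The returned pair can then be formatted together with the trigger word and quantifier, for example as "number of injured people: 0-3 persons".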
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An automatic extraction method for the emergency response level of an emergency plan and the responsibility of an administrative unit, characterized by comprising the following steps:
S1: preprocessing the emergency plan, splitting the text content of the emergency plan according to the directory titles, and storing the text content of the emergency plan in a database according to directory title level;
S2: labeling the classification categories of the directory titles processed in step S1 to form a labeled data set; training on the labeled data set, and performing word segmentation, vectorization and classification processing;
S3: extraction of key information: extracting the names and responsibility scopes of administrative units from the text content under all directory titles; according to the classification result obtained in step S2, extracting the response levels and the corresponding trigger conditions from the text whose classification result is content describing the emergency response level, the early warning level or the event classification; the key information is extracted by combining entity recognition and entity type classification;
S4: performing deduplication and splicing processing on the extracted administrative unit names, outputting the responsibilities of the administrative units, and standardizing the extracted entities related to the trigger conditions;
S5: according to the directory title level, acquiring the names and responsibilities of the administrative units under each level of directory title, acquiring the response levels and the corresponding trigger conditions, and outputting the analysis result.
2. The method for automatically extracting the emergency response level and the administrative responsibility of the emergency plan according to claim 1, wherein the specific process of the step S1 is as follows: splitting the content according to the directory title of the plan, storing the text content in each section of text, simultaneously storing the directory title and the father node of the directory title, defining the father node of the first-level directory title as 'root', and storing the standardized emergency plan text into a database for further processing.
3. The method for automatically extracting emergency response levels and administrative unit responsibilities of emergency plans according to claim 1, wherein in step S2, the classification labeling uses a supervised binary classification model, and labeling the data set requires marking whether the content under each directory title is 'emergency response' type content: if so, it is labeled '1', otherwise it is labeled '0';
the training process in step S2 is: first, performing word segmentation on the directory titles using jieba, then computing word frequencies with TF-IDF and performing vectorization, and finally classifying with a multinomial naive Bayes classifier.
4. The method for automatically extracting emergency response level and administrative responsibility of emergency plans according to claim 1, wherein the step of entity identification and entity type classification in step S3 is as follows:
S3.1: processing text data: in the training stage, when entity recognition is performed on each directory title and all text under the directory title, the types of entities to be recognized are: number words, emergency response levels, condition trigger words, keywords for numeric boundaries, units of quantity, and administrative unit names;
S3.2: establishing the entity recognition and trigger word classification model: each directory title and all text under it are one-hot encoded character by character, and the encoded vectors are the input vectors of the model; the vectors are fed into a Bi-LSTM model, the final state vector of each input character is obtained through model encoding, and the final state vectors are temporarily stored; the final state vectors are passed to a CRF model for decoding to obtain the final sequence labeling result; if the sequence labeling result contains Trigger entities, the final state vector corresponding to each character in each Trigger entity is found, the arithmetic mean of these vectors is taken as the vector of the Trigger entity, and this vector is fed into a Softmax classifier.
5. The method for automatically extracting emergency response levels and administrative unit responsibilities of emergency plans according to claim 4, wherein Loss of the whole model is generated by adding an entity recognition model Loss and a Trigger classification Loss in the training process, and a final entity recognition and Trigger word classification model is obtained through training.
6. The method for automatically extracting emergency response level and administrative responsibility of emergency plans according to claim 4, wherein the method for performing deduplication processing and splicing in step S4 is as follows: averaging each word vector of the word of which the recognition result is the ORG by using the final state vector of each word output in the step S3.2 to be used as a vector of an entity word, extracting word vectors of each word of which all text entities are recognized as the ORG under the directory, calculating cosine similarity in pairs, taking the word of which each word has the highest similarity with other words, judging that the description is the same administrative unit when the cosine similarity between the two words is greater than 0.9, dividing the two entities into one group, dividing the entities into different groups by comparing the similarity, and respectively forming one group if no similarity is greater than 0.9; and selecting the character with the longest length in each group as the name of the administrative unit, splicing sentences containing any entity in the group according to the sequence, and outputting the sentences as the responsibility of the administrative unit.
7. The method of claim 6, wherein the standardization is performed on the extracted trigger word entities, number word entities, quantifier entities and keyword entities, and each extracted trigger condition must simultaneously include a trigger word, a number word entity and a quantifier entity.
8. The method of claim 7, wherein when a plurality of trigger word entities appear in a sentence, the sentence is split again at punctuation marks such that only one set of trigger conditions appears in each final clause.
9. The method for automatically extracting emergency response levels and administrative unit responsibilities of emergency plans according to claim 7, wherein quantifier words corresponding to the trigger words are limited, and the trigger conditions are screened by secondary matching of the trigger words and the quantifier words.
10. The method of claim 7, wherein, when the trigger condition is standardized and two number word entities are extracted from a set of trigger conditions, the two number word entities are determined as the numeric boundaries of the trigger condition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011498662.8A CN112527961B (en) | 2020-12-18 | 2020-12-18 | Automatic extraction method for emergency response level of emergency plan and responsibility of administrative unit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112527961A (en) | 2021-03-19 |
CN112527961B (en) | 2022-05-13 |
Family
ID=75001253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011498662.8A Active CN112527961B (en) | 2020-12-18 | 2020-12-18 | Automatic extraction method for emergency response level of emergency plan and responsibility of administrative unit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527961B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780983A (en) * | 2021-08-16 | 2021-12-10 | 创意信息技术股份有限公司 | Emergency plan driving system based on process engine |
CN113936432A (en) * | 2021-12-17 | 2022-01-14 | 中国气象局公共气象服务中心(国家预警信息发布中心) | Weather early warning image-text generation method and device and electronic equipment |
CN114201959A (en) * | 2021-11-16 | 2022-03-18 | 湖南长泰工业科技有限公司 | Mobile emergency command method |
CN114357171A (en) * | 2022-01-04 | 2022-04-15 | 中国建设银行股份有限公司 | Emergency event processing method and device, storage medium and electronic equipment |
CN117112499A (en) * | 2023-10-25 | 2023-11-24 | 数研院(福建)信息产业发展有限公司 | Data directory grading method, medium and equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108665141A (en) * | 2018-04-03 | 2018-10-16 | 山东科技大学 | A method of extracting emergency response procedural model automatically from accident prediction scheme |
US20180342312A1 (en) * | 2017-05-26 | 2018-11-29 | Christopher Khatchig Kaypekian | Method and system for direct access to medical patient records |
CN110223038A (en) * | 2019-05-30 | 2019-09-10 | 山东科技大学 | A kind of emergency response pre-planned scheme text quality evaluating system and method based on process extraction |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780983A (en) * | 2021-08-16 | 2021-12-10 | 创意信息技术股份有限公司 | Emergency plan driving system based on process engine |
CN114201959A (en) * | 2021-11-16 | 2022-03-18 | 湖南长泰工业科技有限公司 | Mobile emergency command method |
CN113936432A (en) * | 2021-12-17 | 2022-01-14 | 中国气象局公共气象服务中心(国家预警信息发布中心) | Weather early warning image-text generation method and device and electronic equipment |
CN114357171A (en) * | 2022-01-04 | 2022-04-15 | 中国建设银行股份有限公司 | Emergency event processing method and device, storage medium and electronic equipment |
CN117112499A (en) * | 2023-10-25 | 2023-11-24 | 数研院(福建)信息产业发展有限公司 | Data directory grading method, medium and equipment |
CN117112499B (en) * | 2023-10-25 | 2024-01-02 | 数研院(福建)信息产业发展有限公司 | Data directory grading method, medium and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112527961B (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112527961B (en) | Automatic extraction method for emergency response level of emergency plan and responsibility of administrative unit | |
CN107818138B (en) | Case law regulation recommendation method and system | |
KR101999152B1 (en) | English text formatting method based on convolution network | |
CN110674274B (en) | Knowledge graph construction method for food safety regulation question-answering system | |
CN112732934B (en) | Power grid equipment word segmentation dictionary and fault case library construction method | |
CN113221567A (en) | Judicial domain named entity and relationship combined extraction method | |
CN111950273A (en) | Network public opinion emergency automatic identification method based on emotion information extraction analysis | |
Ayishathahira et al. | Combination of neural networks and conditional random fields for efficient resume parsing | |
CN113168499A (en) | Method for searching patent document | |
WO2018160551A1 (en) | Automatic human-emulative document analysis enhancements | |
CN113196277A (en) | System for retrieving natural language documents | |
CN113196278A (en) | Method for training a natural language search system, search system and corresponding use | |
CN112989830B (en) | Named entity identification method based on multiple features and machine learning | |
CN115618085B (en) | Interface data exposure detection method based on dynamic tag | |
CN113704396A (en) | Short text classification method, device, equipment and storage medium | |
CN113673223A (en) | Keyword extraction method and system based on semantic similarity | |
CN115238040A (en) | Steel material science knowledge graph construction method and system | |
Anandika et al. | A study on machine learning approaches for named entity recognition | |
CN115292490A (en) | Analysis algorithm for policy interpretation semantics | |
KR102563539B1 (en) | System for collecting and managing data of denial list and method thereof | |
CN113987175A (en) | Text multi-label classification method based on enhanced representation of medical topic word list | |
CN117874206A (en) | Query method for natural language identification and Chinese word segmentation of high-efficiency data asset based on large model | |
CN111191455A (en) | Legal provision prediction method in traffic accident damage compensation | |
CN115712720A (en) | Rainfall dynamic early warning method based on knowledge graph | |
Wu et al. | On constructing a knowledge base of chinese criminal cases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
Denomination of invention: A Method for Automatically Extracting Emergency Response Levels and Administrative Unit Responsibilities from Emergency Plans; Effective date of registration: 20231007; Granted publication date: 20220513; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: HANGZHOU XUJIAN SCIENCE AND TECHNOLOGY Co.,Ltd.; Registration number: Y2023980059619 |