CN113901815B - Emergency working condition event detection method based on dam operation log - Google Patents

Emergency working condition event detection method based on dam operation log

Info

Publication number
CN113901815B
CN113901815B (application CN202111202004.4A; application publication CN113901815A)
Authority
CN
China
Prior art keywords
vector
word
sentence
document
embedded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111202004.4A
Other languages
Chinese (zh)
Other versions
CN113901815A (en
Inventor
孙卫
周华
迟福东
毛莺池
李然
陈豪
王龙宝
程永
卢俊
钟鸣
夏旭东
李玲
赵欢
罗松
马建平
袁溯
吴胜亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Lancang River Hydropower Co Ltd
Original Assignee
Hohai University HHU
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Lancang River Hydropower Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU, Huaneng Group Technology Innovation Center Co Ltd, Huaneng Lancang River Hydropower Co Ltd filed Critical Hohai University HHU
Priority to CN202111202004.4A priority Critical patent/CN113901815B/en
Publication of CN113901815A publication Critical patent/CN113901815A/en
Application granted granted Critical
Publication of CN113901815B publication Critical patent/CN113901815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F40/279 Recognition of textual entities; G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q50/08 Construction
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping


Abstract

The invention discloses an emergency working condition event detection method for dam operation logs. The method constructs a set of dam emergency working condition event types; encodes every word segment in the dam operation log and converts the encodings into the corresponding embedded vectors; fuses each word segment's embedded vector, named-entity-type vector and part-of-speech tagging vector to strengthen its semantic information; and fuses context information with sentence-document dual attention, where sentence-level attention raises the weight of important words that may trigger events in each sentence and document-level attention raises the weight of important sentences that may trigger events in each log document, enhancing local and global semantic information and alleviating the polysemy and word/trigger-mismatch problems of traditional Chinese event detection. Because each sentence in an ordinary dam log document contains at most two events, binary classification suffers from an imbalance between positive and negative samples; to avoid this, a model trained for event detection is used, and every document is classified according to the events it contains.

Description

Emergency working condition event detection method based on dam operation log
Technical Field
The invention relates to an emergency working condition event detection method based on dam operation logs, which performs event detection on dam operation logs in the hydraulic engineering field, in particular on the various special working condition events a dam experiences over long-term operation and the corresponding response events, and belongs to the technical field of natural language processing.
Background
The task of event detection is to identify event trigger words in large-scale unstructured natural language text and to classify the event types correctly; a trigger word is the core word or phrase that clearly and unambiguously signals the occurrence of an event. Event detection is important for event semantic modeling and facilitates the subsequent structured management and storage of events.
In the field of hydraulic engineering, dam facilities provide flood control, ice prevention, water storage, water supply, power generation and other functions, and are a mainstay of China's water conservancy development. Over decades of operation, a dam experiences natural risk events such as floods, earthquakes and rainstorms, which may endanger its structural safety and the lives and property of the people downstream. After such a special event, the dam manager therefore arranges a comprehensive special inspection to maintain the dam structure. In addition, daily inspection and overhaul of the dam are important measures for guaranteeing the safety of the dam body. After each of these responses, inspection personnel record the cause of the inspection event and the inspection result in writing, forming a dam operation log file.
By processing the dam operation logs, the safety condition of the dam can be analyzed, a dam event knowledge base can be formed, and the intelligent management level of the dam can be improved. An emergency working condition event detection method oriented to dam operation logs can bypass event triggers, detect all predefined events in the logs and classify the event type of each document, providing a basis for subsequent event extraction, event graph construction and event knowledge base construction.
Chinese text contains many ambiguities, and an event is typically composed of an event trigger and event arguments. Event triggers are often verbs and commonly suffer from polysemy and from mismatches between triggers and segmented words, so event detection methods centered on trigger recognition are prone to classification errors.
Disclosure of Invention
The invention aims to address the problems of the prior art: a dam encounters various natural events and response events during operation, yet standardized records of these events are lacking. The invention provides an emergency working condition event detection method based on dam operation logs that avoids the trigger identification step by simulating the triggers within sentences, detects dam special working condition events from the logs, classifies the event type of each document, and provides a basis for subsequent event extraction.
The technical scheme is as follows: an emergency working condition event detection method based on a dam operation log comprises the following steps:
(1) Preprocessing a log file: firstly, sorting and splitting the dam operation logs by record date, labeling each document, sorting, labeling and word-segmenting the sentences in each document, labeling the entity type and part of speech of each word, and then constructing the set of dam emergency working condition event types; sorting refers to ordering logs of different dates, and splitting refers to splitting the same day's logs according to document content;
(2) Encoding vector embedding: encoding all word segments in the dam operation log with the ALBERT pre-trained model and converting them into the embedded vectors corresponding to the words;
(3) BiLSTM feature fusion: using BiLSTM to fuse each word segment's embedded vector, named-entity-type vector and part-of-speech tagging vector, reinforcing the semantic information of the word segment;
(4) Dual attention mechanism semantic enhancement: fusing context information with sentence-document dual attention, where sentence-level attention raises the weight of important words that may trigger events in each sentence and document-level attention raises the weight of important sentences that may trigger events in each log document, alleviating the polysemy and word/trigger-mismatch problems of traditional Chinese event detection;
(5) Training a model with the Focal loss function and performing classification: because each sentence in an ordinary dam log document contains at most two events, binary classification suffers from an imbalance between positive and negative samples; to avoid this, a model trained with the Focal loss function is used to classify the events to which every document belongs.
The dam emergency working condition event type set comprises typical events such as earthquake, heavy rain, flood discharge, pre-flood safety large inspection, comprehensive special inspection, daily maintenance, daily inspection and the like.
The named entity types comprise names, departments, positions, time, date, measured values, percentages, defect types and the like; the part of speech tagging vector comprises nouns, verbs, adjectives, quantity words, pronouns and the like.
Further, the step (1) includes the following steps:
(1.1) firstly dividing the dam operation log file into multiple documents by log record date, sorting and labeling each document, sorting and labeling the sentences in each document, and segmenting them into words using jieba;
(1.2) performing entity type labeling and part-of-speech labeling on the word segmentation result, wherein the entity type labeling converts the entity type labeling into a low-dimensional vector by searching a randomly initialized embedded table, the part-of-speech labeling adopts Stanford CoreNLP to label the part of speech of each word, and then converts the part-of-speech labeling into the low-dimensional vector by searching a corresponding embedded table;
(1.3) predefining the event types of emergency working conditions of the dam, including typical events such as earthquake, heavy rain, flood discharge, pre-flood safety large inspection, comprehensive special inspection, daily maintenance, daily inspection and the like.
Further, the step (2) includes the following steps:
all the segmentations in (1.1) are encoded using the ALBERT pre-training model, converted into vector representations that can be processed by the computer.
Further, the step (3) includes the following steps:
(3.1) concatenating the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word, wherein the embedded vector is the vector obtained in step (2), the entity-type vector is the vector corresponding to each word's named-entity recognition result (name, organization, position, time, date, numerical value, percentage, etc.), and the part-of-speech tagging vector is the vector corresponding to each word's part-of-speech tag (noun, verb, adjective, numeral, pronoun, etc.);
(3.2) processing the concatenated vectors within a single sentence with the BiLSTM model, one vector per input step; the bidirectional LSTM units capture word context and output two hidden states, the forward state $\overrightarrow{h_k}$ and the backward state $\overleftarrow{h_k}$, which are combined into the output vector $h_k = [\overrightarrow{h_k}; \overleftarrow{h_k}]$.
Further, the step (4) includes the following steps:
(4.1) in the training set, converting the predefined emergency working condition event contained in each sentence into embedded vectors $t_1$ and $t_2$ by looking up randomly initialized embedding tables, and converting each document into an embedded vector d using Doc2Vec;
(4.2) for all words in each sentence, calculating each word's weight within the sentence using a local attention mechanism, raising the attention weight of words that trigger the target event type and thereby simulating the trigger; the calculation formula is:

$$\alpha_s^{(k)} = \frac{\exp(h_k t_1^T)}{\sum_j \exp(h_j t_1^T)}$$

where $h_k$ is the k-th part of the output vector h, $\alpha_s^{(k)}$ is the k-th part of the local attention vector $\alpha_s$, and $t_1^T$ is the transpose of the event type embedding vector; the trigger refers to the event trigger, i.e. the word, usually a verb, that triggers an event;
(4.3) for all words in each sentence, calculating the weight, within the document, of the sentence containing the word using a global attention mechanism, obtaining the specific meaning of the trigger in its context, assisting in judging the event type of the sentence, and resolving trigger ambiguity through context information; the calculation formula is:

$$\alpha_d^{(k)} = \frac{\exp(h_k t_2^T + h_k d^T)}{\sum_j \exp(h_j t_2^T + h_j d^T)}$$

where $h_k$ is the k-th part of the output vector h, $\alpha_d^{(k)}$ is the k-th part of the global attention vector $\alpha_d$, $t_2^T$ is the transpose of the event type embedding vector, and $d^T$ is the transpose of the document-level embedding vector;
(4.4) weighting and fusing the local and global attention to improve event detection precision; the weight vectors and the weighted fusion are computed as:

$$v_s = \alpha_s \cdot t_1$$
$$v_d = \alpha_d \cdot t_2$$
$$o = \sigma(\lambda \cdot v_s + (1-\lambda) \cdot v_d)$$

where the final output value o consists of two parts, $v_s$ and $v_d$: $v_s$ is generated by the dot product of $\alpha_s$ and the event type embedding vector $t_1$ and captures local features while simulating the hidden event trigger; $v_d$ is generated by the dot product of $\alpha_d$ and $t_2$ and captures global features and context information. $\sigma$ is the Sigmoid function, and $\lambda \in [0,1]$ is a hyperparameter trading off $v_s$ against $v_d$.
Further, the step (5) includes the following steps:
the data set is processed in sentence units, training data is formed in < sentences, event type > pairs, whether a given sentence conveys an event of t type is represented, and event type labels are 1 or 0, such as < near dam bank, hub area side slope and highway inspection conditions: no abnormality, daily inspection > training pair label 1, < near dam bank, hub area side slope and highway inspection condition: no anomaly exists, the label of the earthquake > training pair is 0, and because the number of events possibly expressed by a single sentence is less than the number of predefined events, the problem of unbalance of the number of negative samples far greater than the number of positive samples caused by two-class identification is solved, a model obtained by Focal loss function training is introduced, the influence of the positive samples and the difficult-to-separate samples on the model is enhanced, and the calculation formula is as follows:
Figure BDA0003305279490000041
where x is composed of sentences and target event types, y ε {0,1}, o (x (i) ) Is a model predictive value of the model, and, |θ| 2 Is the sum of squares of all elements in the model, delta is more than 0, the weight of an L2 normalization term, beta is a parameter for balancing the positive and negative weight proportion of a sample, gamma is a parameter for balancing the difficult-to-classify and easy-to-classify weight proportion of the sample, and beta=0.25 and gamma=2 are set in the experiment.
And finally, performing event detection on the dam operation log file by using the trained model, and classifying the event types contained in each document.
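For illustration only (not part of the claimed method), the <sentence, event type> pairing of step (5) can be sketched as follows; the English event-type names are illustrative translations, and the gold annotation is assumed to be available:

```python
# Hypothetical illustration: pair each sentence with every predefined event
# type; label 1 only for the types the sentence actually conveys.

EVENT_TYPES = ["earthquake", "heavy rain", "flood discharge",
               "pre-flood safety inspection", "comprehensive special inspection",
               "daily maintenance", "daily inspection"]

def make_pairs(sentence, gold_types, event_types=EVENT_TYPES):
    """Build (sentence, event type, label) training triples."""
    return [(sentence, t, 1 if t in gold_types else 0) for t in event_types]

pairs = make_pairs("Near-dam bank, hub-area side slope and highway "
                   "inspection condition: no abnormality",
                   {"daily inspection"})
positives = sum(label for _, _, label in pairs)
print(positives, len(pairs) - positives)  # 1 positive vs 6 negatives: imbalance
```

The one-positive-to-many-negatives ratio visible here is exactly the imbalance the Focal loss is introduced to counteract.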
An emergency working condition event detection system based on a dam operation log, which carries out event detection on the dam operation log in the water conservancy field, comprises:
the log file preprocessing module: firstly, sorting and splitting dam operation logs according to the recording date, marking each document, sorting sentences in each document, marking and word segmentation, marking entity types and parts of speech of each word, and then constructing a dam emergency working condition event type set;
the code vector embedding module: encoding all the word fragments in the dam operation log by using an ALBERT preprocessing model, and converting the encoded word fragments into embedded vectors corresponding to the words;
BiLSTM feature fusion module: using BiLSTM to fuse an embedded vector, a named entity type and a part-of-speech labeling vector corresponding to the word segmentation, and reinforcing semantic information of the word segmentation;
a dual-attention mechanism semantic enhancement module: context information is fused using sentence-document dual attention;
A classification module: the model is trained with the Focal loss function, and the trained model classifies the events to which every document belongs.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for detecting an emergency condition event based on a dam operation log as described above when executing the computer program.
A computer readable storage medium storing a computer program for executing the dam operation log-based emergency condition event detection method as described above.
The beneficial effects are that: compared with the prior art, the emergency working condition event detection method based on dam operation logs simulates the hidden event trigger by capturing keyword- and sentence-level semantic information with local attention, realizing trigger-free event detection; it introduces rich document-level context information through global attention to assist in judging a word's meaning in its real context, skipping the trigger identification step and judging the event type directly. This avoids the Chinese word/trigger mismatch and word ambiguity problems and improves event detection precision.
Drawings
FIG. 1 is a model training flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a model training framework in accordance with an embodiment of the present invention.
Detailed Description
The present invention is further illustrated below in conjunction with specific embodiments, it being understood that these embodiments are meant to be illustrative of the invention only and not limiting the scope of the invention, and that modifications of the invention, which are equivalent to those skilled in the art to which the invention pertains, will fall within the scope of the invention as defined in the claims appended hereto.
As shown in fig. 1, the emergency working condition event detection method based on the dam operation log mainly comprises the following steps:
step (1) preprocessing the dam operation log file
(1.1) first, divide the dam operation log file into multiple documents by log record date as the training set, sort and label each document, sort and label the sentences in each document, and segment them into words using jieba. As shown in fig. 2, the sentence "Near-dam bank, hub-area side slope and highway inspection condition: no abnormality" is first split into "near-dam", "bank", "hub area", "side slope", "highway", "inspection", "condition", ":", "no", "abnormality".
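As a rough illustration of dictionary-based word segmentation — a simplified stand-in, not jieba's actual algorithm (jieba uses a prefix dictionary plus an HMM for unknown words) — a forward-maximum-matching sketch over a small hypothetical vocabulary:

```python
# Simplified stand-in for jieba: forward maximum matching over a tiny
# hand-built vocabulary (entries are for illustration only).

VOCAB = {"近坝", "库岸", "枢纽区", "边坡", "公路", "巡视", "情况", "无", "异常"}
MAX_LEN = max(len(w) for w in VOCAB)

def forward_max_match(text, vocab=VOCAB, max_len=MAX_LEN):
    """Greedily take the longest dictionary word at each position;
    fall back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + j]
            if cand in vocab or j == 1:
                tokens.append(cand)
                i += j
                break
    return tokens

sentence = "近坝库岸、枢纽区边坡及公路巡视情况:无异常"
print(forward_max_match(sentence))
```

Punctuation and unmatched characters come out as single-character tokens, mirroring the ":" token in the fig. 2 example.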
(1.2) label the entity type and part of speech of each segmentation result; the entity-type label is converted into a low-dimensional vector by looking up a randomly initialized embedding table, while the part of speech of each word is tagged with Stanford CoreNLP and then converted into a low-dimensional vector by looking up the corresponding embedding table.
And (1.3) predefining dam emergency working condition event types for a dam operation log, wherein the typical events comprise earthquake, storm, flood discharge, pre-flood safety large inspection, comprehensive special inspection, daily maintenance, daily inspection and the like.
Step (2) encoding the segmented word into a word vector
All the word segments obtained in (1.1) are encoded using the ALBERT pre-trained model and converted into vector representations that the computer can process.
And (3) extracting semantic information after splicing word vectors, named entity types and part-of-speech tagging vectors.
(3.1) concatenate the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word, wherein the embedded vector is the vector obtained in step (2), the entity-type vector is the vector corresponding to each word's named-entity recognition result (name, organization, position, time, date, measured value, percentage, defect type, etc.), and the part-of-speech tagging vector is the vector corresponding to each word's part-of-speech tag (noun, verb, adjective, numeral, pronoun, etc.).
(3.2) process the concatenated vectors within a single sentence with the BiLSTM model, one vector per input step; the bidirectional LSTM units capture word context and output two hidden states, the forward state $\overrightarrow{h_k}$ and the backward state $\overleftarrow{h_k}$, which are combined into the output vector $h_k = [\overrightarrow{h_k}; \overleftarrow{h_k}]$.
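A minimal NumPy sketch of steps (3.1)–(3.2) with toy dimensions. For brevity one LSTM cell is reused for both directions, whereas a real BiLSTM learns separate forward and backward parameters; all sizes and weights here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(xs, W, U, b, hidden):
    """Minimal LSTM over a sequence; returns the hidden state at each step."""
    h, c, out = np.zeros(hidden), np.zeros(hidden), []
    for x in xs:
        z = W @ x + U @ h + b  # gates stacked: input, forget, output, candidate
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
        out.append(h)
    return out

d_word, d_ent, d_pos, hidden, T = 8, 3, 3, 5, 4  # toy sizes
# (3.1) concatenate word embedding, entity-type and POS vectors per token
xs = [np.concatenate([rng.normal(size=d) for d in (d_word, d_ent, d_pos)])
      for _ in range(T)]
d_in = d_word + d_ent + d_pos

W = rng.normal(scale=0.1, size=(4 * hidden, d_in))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

fw = lstm_pass(xs, W, U, b, hidden)               # forward direction
bw = lstm_pass(xs[::-1], W, U, b, hidden)[::-1]   # backward direction
# (3.2) h_k = [forward; backward]
H = np.stack([np.concatenate([f, r]) for f, r in zip(fw, bw)])
print(H.shape)  # (T, 2*hidden)
```

The matrix H — one row $h_k$ per word — is what the dual attention mechanism of step (4) consumes.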
Step (4) capturing sentence-level and document-level context using a dual attention mechanism, enhancing the word vector representation, and simulating a trigger
(4.1) converting the events contained in each sentence of the training set into embedded vectors $t_1$ and $t_2$ by looking up randomly initialized embedding tables, and converting each document into an embedded vector d using Doc2Vec.
(4.2) for all words in each sentence, calculate each word's weight within the sentence using a local attention mechanism, raising the attention weight of words that trigger the target event type and thereby simulating the trigger; the calculation formula is:

$$\alpha_s^{(k)} = \frac{\exp(h_k t_1^T)}{\sum_j \exp(h_j t_1^T)}$$

where $h_k$ is the k-th part of the output vector h, $\alpha_s^{(k)}$ is the k-th part of the local attention vector $\alpha_s$, and $t_1^T$ is the transpose of the event type embedding vector. In fig. 2, $t_1$ assists the local attention mechanism in simulating a trigger for each word segment.
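A sketch of the sentence-level attention as reconstructed above — a softmax over word/event-type dot products. The vectors are random toy values; only the shape of the computation is meant:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_attention(H, t1):
    """alpha_s[k] = exp(h_k . t1) / sum_j exp(h_j . t1): softmax over words."""
    scores = H @ t1           # one score per word in the sentence
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

T, d = 6, 10                  # 6 words, hidden size 10 (toy values)
H = rng.normal(size=(T, d))   # BiLSTM outputs h_1..h_T
t1 = rng.normal(size=d)       # event-type embedding

alpha_s = local_attention(H, t1)
print(alpha_s.round(3))       # weights over words, summing to 1
```

Words whose hidden state aligns with the event-type embedding receive the largest weights — these play the role of the simulated trigger.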
(4.3) for all words in each sentence, calculate the weight, within the document, of the sentence containing the word using a global attention mechanism, obtaining the specific meaning of the trigger in its context, assisting in judging the sentence's event type, and resolving trigger ambiguity through context information; the calculation formula is:

$$\alpha_d^{(k)} = \frac{\exp(h_k t_2^T + h_k d^T)}{\sum_j \exp(h_j t_2^T + h_j d^T)}$$

where $h_k$ is the k-th part of the output vector h, $\alpha_d^{(k)}$ is the k-th part of the global attention vector $\alpha_d$, $t_2^T$ is the transpose of the event type embedding vector, and $d^T$ is the transpose of the document-level embedding vector. In fig. 2, $t_2$ and d assist the global attention and avoid the ambiguity caused by purely local attention.
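The document-level attention differs from the local one only in its score: it adds a dot product with the document embedding, so the same word is weighted differently in different documents. A toy sketch with random vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

def global_attention(H, t2, d_vec):
    """alpha_d[k] proportional to exp(h_k . t2 + h_k . d):
    scores combine the event type AND the document context."""
    scores = H @ t2 + H @ d_vec
    e = np.exp(scores - scores.max())  # stable softmax
    return e / e.sum()

T, d = 6, 10
H = rng.normal(size=(T, d))   # BiLSTM outputs for one sentence
t2 = rng.normal(size=d)       # event-type embedding (global branch)
d_vec = rng.normal(size=d)    # Doc2Vec-style document embedding

alpha_d = global_attention(H, t2, d_vec)
print(alpha_d.round(3))
```

With d_vec set to zero this collapses back to the local attention, which makes explicit that the document embedding is what disambiguates polysemous triggers.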
(4.4) weighting and fusing the local and global attention to improve event detection precision; the formulas are:

$$v_s = \alpha_s \cdot t_1$$
$$v_d = \alpha_d \cdot t_2$$
$$o = \sigma(\lambda \cdot v_s + (1-\lambda) \cdot v_d)$$

where the final output value o consists of two parts, $v_s$ and $v_d$: $v_s$ is generated by the dot product of $\alpha_s$ and $t_1$ and captures local features while simulating the hidden event trigger; $v_d$ is generated by the dot product of $\alpha_d$ and $t_2$ and captures global features and context information. $\sigma$ is the Sigmoid function, and $\lambda \in [0,1]$ is a hyperparameter trading off $v_s$ against $v_d$.
Step (5) adopting a Focal loss function to avoid the problem of imbalance of positive and negative samples and realize classification of all documents
The data set is processed in sentence units, and training data is formed as <sentence, event type> pairs indicating whether the given sentence conveys an event of type t, with label 1 or 0. For example, the training pair <"Near-dam bank, hub-area side slope and highway inspection condition: no abnormality", daily inspection> has label 1, while <"Near-dam bank, hub-area side slope and highway inspection condition: no abnormality", earthquake> has label 0. Because the number of events a single sentence can express is far smaller than the number of predefined events, binary classification yields far more negative samples than positive ones. To address this imbalance, the Focal loss function is introduced to strengthen the influence of positive and hard-to-classify samples on the model; the calculation formula is:
$$J(\theta) = -\sum_i \left[ \beta \left(1 - o(x^{(i)})\right)^{\gamma} y^{(i)} \log o(x^{(i)}) + (1-\beta) \left(o(x^{(i)})\right)^{\gamma} \left(1 - y^{(i)}\right) \log\left(1 - o(x^{(i)})\right) \right] + \delta \|\theta\|_2^2$$

where x consists of a sentence and a target event type, y ∈ {0,1}, $o(x^{(i)})$ is the model's predicted value, $\|\theta\|_2^2$ is the sum of squares of all model parameters, δ > 0 is the weight of the L2 regularization term, β is a parameter balancing the weights of positive and negative samples, and γ is a parameter balancing the weights of hard- and easy-to-classify samples; the experiments set β = 0.25 and γ = 2;
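A NumPy sketch of the class-balanced Focal loss as reconstructed above (the toy scores and labels are illustrative):

```python
import numpy as np

def focal_loss(o, y, beta=0.25, gamma=2.0, theta=None, delta=0.0):
    """Class-balanced focal loss with an optional L2 term.
    o: predicted probabilities, y: 0/1 labels, per the patent's setup."""
    o = np.clip(o, 1e-7, 1 - 1e-7)            # avoid log(0)
    pos = beta * (1 - o) ** gamma * y * np.log(o)
    neg = (1 - beta) * o ** gamma * (1 - y) * np.log(1 - o)
    loss = -(pos + neg).sum()
    if theta is not None:
        loss += delta * np.sum(theta ** 2)    # L2 regularization
    return loss

y = np.array([1, 0, 0, 0])            # one positive event type out of four
o = np.array([0.9, 0.1, 0.2, 0.8])    # scores; the last is a hard negative
print(round(float(focal_loss(o, y)), 4))
# The (1-o)**gamma and o**gamma factors down-weight easy samples, so the
# confident easy negatives (o=0.1, o=0.2) contribute almost nothing while
# the hard negative (o=0.8) dominates the loss.
```

This is exactly the behavior the text describes: positive and hard-to-classify samples are strengthened relative to the abundant easy negatives.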
and finally, performing event detection on the dam operation log file by using the trained model, and classifying the documents based on event types contained in each document.
An emergency working condition event detection system based on a dam operation log, which carries out event detection on the dam operation log in the water conservancy field, comprises:
the log file preprocessing module: firstly, sorting and splitting dam operation logs according to the recording date, marking each document, sorting sentences in each document, marking and word segmentation, marking entity types and parts of speech of each word, and then constructing a dam emergency working condition event type set;
the encoding vector embedding module: encodes all segmented words in the dam operation log with an ALBERT pre-trained model and converts them into the embedded vectors corresponding to the words;
BiLSTM feature fusion module: using BiLSTM to fuse an embedded vector, a named entity type and a part-of-speech labeling vector corresponding to the word segmentation, and reinforcing semantic information of the word segmentation;
a dual-attention mechanism semantic enhancement module: context information is fused using sentence-document dual attention;
a model training and classification module: trains the model with a Focal loss function and uses the trained model to classify each document by the event to which it belongs.
It will be apparent to those skilled in the art that the steps of the dam-operation-log-based emergency condition event detection method or system of the embodiments described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed over a network of computing devices, and may alternatively be implemented in program code executable by a computing device, so that they may be stored in a memory device for execution by the computing device. In some cases the steps shown or described may be performed in a different order than herein, or they may be separately fabricated as individual integrated-circuit modules, or multiple modules or steps among them may be fabricated as a single integrated-circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Claims (5)

1. The emergency working condition event detection method based on the dam operation log is used for carrying out event detection on the dam operation log in the water conservancy field and is characterized by comprising the following steps:
(1) Preprocessing a log file: firstly sorting and splitting the dam operation logs by recording date, numbering each document, then sorting, numbering and word-segmenting the sentences in each document, tagging the entity type and part of speech of each word, and finally constructing the set of dam emergency-condition event types;
(2) Encoding vector embedding: encoding all segmented words in the dam operation log with an ALBERT pre-trained model and converting them into the embedded vectors corresponding to the words, i.e. vector representations that a computer can process;
(3) BiLSTM feature fusion: using BiLSTM to fuse an embedded vector, a named entity type and a part-of-speech labeling vector corresponding to the word segmentation, and reinforcing semantic information of the word segmentation;
(4) Dual attention mechanism semantic enhancement: context information is fused using sentence-document dual attention;
(5) Training a model with the Focal loss function and classifying: a model is trained with the Focal loss function and used to classify all documents by the events to which they belong;
the step (1) comprises the following steps:
(1.1) firstly, dividing the dam operation log file into a plurality of documents according to the log record date, sorting and numbering each document, sorting and numbering the sentences in each document, and using jieba to segment the sentences into words;
the method comprises the steps of (1.2) marking entity types and parts of speech of a word segmentation result, converting the entity types into low-dimensional vectors by searching an embedded table which is randomly initialized, marking parts of speech of each word by using Stanford CoreNLP, and then converting the parts of speech into the low-dimensional vectors by searching the embedded table;
(1.3) predefining the event types of dam emergency working conditions, including earthquake, heavy rain, flood discharge, pre-flood safety large inspection, comprehensive special inspection, daily maintenance and daily inspection events;
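Steps (1.1)–(1.3) can be sketched as follows. The toy splitter, the sample log lines and the helper name `preprocess` are stand-ins for illustration only; in the patent, jieba performs the word segmentation and Stanford CoreNLP the part-of-speech tagging.

```python
import re

# The seven predefined emergency-condition event types of step (1.3).
EVENT_TYPES = ["earthquake", "heavy rain", "flood discharge",
               "pre-flood safety large inspection",
               "comprehensive special inspection",
               "daily maintenance", "daily inspection"]

def preprocess(log_text):
    """Split a raw log into dated documents with numbered sentences (step 1.1).

    Each input line is assumed to be '<date>: <entries>'; a simple regex
    stands in for the real sentence splitting and jieba word segmentation.
    """
    documents = {}
    for line in log_text.strip().splitlines():
        date, body = line.split(":", 1)
        sentences = [s.strip() for s in re.split(r"[.;]", body) if s.strip()]
        # number the sentences within the document, as the patent requires
        documents[date.strip()] = list(enumerate(sentences))
    return documents

log = ("2021-10-15: slope inspected; no abnormality found.\n"
       "2021-10-16: flood discharge started.")
docs = preprocess(log)
```

Each dated document then feeds the downstream encoding and attention stages sentence by sentence.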
the step (3) comprises the following steps:
(3.1) concatenating the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word, wherein the embedded vector is the vector obtained in step (2), the entity-type vector is the vectorized named-entity recognition result of all words, and the part-of-speech tagging vector is the vectorized part-of-speech tagging result of all words;
(3.2) processing the concatenated vectors within a single sentence with the BiLSTM model, each vector being one input; the bidirectional LSTM units capture word context information and output two hidden states, a forward state h→_k and a backward state h←_k, which are synthesized into the output vector h_k = [h→_k; h←_k].
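The feature fusion of steps (3.1)–(3.2) amounts to two concatenations, sketched below with NumPy; the random vectors merely stand in for the real ALBERT embeddings and LSTM computations, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step (3.1): concatenate, per word, the word embedding with the
# entity-type and part-of-speech lookup vectors.
word_emb = rng.normal(size=8)   # stand-in for the ALBERT embedding (step 2)
ent_emb = rng.normal(size=3)    # stand-in for the entity-type lookup (step 1.2)
pos_emb = rng.normal(size=3)    # stand-in for the POS lookup (step 1.2)
x = np.concatenate([word_emb, ent_emb, pos_emb])   # one BiLSTM input

# Step (3.2): the BiLSTM emits a forward and a backward hidden state
# per word; random vectors stand in for the actual LSTM recurrences.
h_fwd = rng.normal(size=5)
h_bwd = rng.normal(size=5)
h = np.concatenate([h_fwd, h_bwd])   # output vector h_k = [h_fwd; h_bwd]
```

The concatenated h_k carries context from both reading directions, which is what the later attention stages score against the event-type embeddings.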
In step (4): in the training set, the emergency-condition predefined event contained in each sentence is converted into an embedded vector t_1 by looking up a randomly initialized embedding table, and each document is converted into an embedded vector d using Doc2Vec; for all words in each sentence, a local attention mechanism calculates the weight of each word in the sentence, raising the attention weight of words that trigger the target event type and simulating a trigger;
the step (4) comprises the following steps:
(4.1) in the training set, converting the emergency-condition predefined event contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and converting each document into an embedded vector d using Doc2Vec;
(4.2) for all words in each sentence, calculating the weight of each word in the sentence with a local attention mechanism, raising the attention weight of words that trigger the target event type and simulating a trigger, with the calculation formula:

α_s^k = exp(h_k · t_1^T) / Σ_j exp(h_j · t_1^T)

where h_k is the k-th part of the output vector h, α_s^k is the k-th part of the local attention vector α_s, and t_1^T is the transpose of the event-type embedding vector;
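A NumPy sketch of the sentence-level (local) attention of step (4.2). Since the patent's formula appears only as an image, the softmax scoring of h_k against t_1 used here is a reconstruction, and all vector values are toy data.

```python
import numpy as np

def local_attention(H, t1):
    """Local attention: alpha_s^k = softmax over words of (h_k . t1).

    H  -- (n_words, dim) matrix of BiLSTM output vectors h_k
    t1 -- (dim,) event-type embedding vector
    """
    scores = H @ t1            # h_k . t1^T for every word k
    scores -= scores.max()     # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()         # normalized attention weights, sum to 1

# Toy data: words 0 and 2 align with the event embedding, word 1 does not,
# so the attention concentrates on the trigger-like words.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t1 = np.array([2.0, 0.0])
alpha_s = local_attention(H, t1)
```

Words whose hidden states align with the event-type embedding receive higher weight, which is how the mechanism simulates a trigger word without trigger annotation.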
(4.3) for all words in each sentence, calculating with a global attention mechanism the weight within the document of the sentence containing the word, obtaining the specific meaning of the trigger in this scene, assisting the judgment of the sentence's event type, and resolving trigger ambiguity through context information, with the calculation formula:

α_d^k = exp(h_k · t_2^T + h_k · d^T) / Σ_j exp(h_j · t_2^T + h_j · d^T)

where h_k is the k-th part of the output vector h, α_d^k is the k-th part of the global attention vector α_d, t_2^T is the transpose of the event-type embedding vector, and d^T is the transpose of the document-level embedded vector;
(4.4) weighting and fusing the local attention and the global attention to improve event detection precision, with the formulas:

v_s = α_s · t_1
v_d = α_d · t_2
o = σ(λ·v_s + (1-λ)·v_d)

where the final output value o consists of two parts, v_s and v_d; v_s is generated by the dot product of α_s and t_1 and captures local features, simulating hidden event triggers; v_d is generated by the dot product of α_d and t_2 and captures global features and context information; σ is the Sigmoid function, and λ ∈ [0,1] is a hyperparameter trading off v_s against v_d.
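The fusion of step (4.4) can be sketched as follows; `fuse` is an illustrative helper following the formulas v_s = α_s·t_1, v_d = α_d·t_2, o = σ(λ·v_s + (1-λ)·v_d), and the vector dimensions and values are toy data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse(alpha_s, alpha_d, t1, t2, lam=0.5):
    """Fuse local and global attention into the output probability o.

    v_s = alpha_s . t1 captures local features (simulated triggers);
    v_d = alpha_d . t2 captures global, document-level context.
    """
    v_s = float(alpha_s @ t1)
    v_d = float(alpha_d @ t2)
    return sigmoid(lam * v_s + (1.0 - lam) * v_d)

# Toy same-dimension vectors, for illustration only.
alpha_s = np.array([0.7, 0.2, 0.1])
alpha_d = np.array([0.5, 0.3, 0.2])
t1 = np.array([1.0, 0.0, -1.0])
t2 = np.array([0.5, 0.5, 0.0])
o = fuse(alpha_s, alpha_d, t1, t2)  # probability the sentence conveys the event
```

λ near 1 trusts the sentence-level evidence, λ near 0 trusts the document-level context; the sigmoid maps the weighted sum into the (0, 1) range the Focal loss expects.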
2. The method for detecting emergency condition events based on dam operation logs according to claim 1, wherein in step (5) the data set is processed in sentence units, the training data is formed as < sentence, event type > pairs representing whether a given sentence conveys an event of type t, with label 1 or 0; a Focal loss function is introduced to strengthen the influence of positive samples and hard samples on the model, with the calculation formula:
J(θ) = -(1/N) Σ_{i=1}^{N} [ β·(1-o(x^(i)))^γ · y^(i) · log o(x^(i)) + (1-β)·o(x^(i))^γ · (1-y^(i)) · log(1-o(x^(i))) ] + δ·‖θ‖²

where x consists of a sentence and a target event type, y ∈ {0, 1}, o(x^(i)) is the model's predicted value, ‖θ‖² is the sum of squares of all parameters in the model, δ > 0 is the weight of the L2 regularization term, β is a parameter balancing the weight of positive and negative samples, and γ is a parameter balancing the weight of hard and easy samples;
and finally, performing event detection on the dam operation log file by using the trained model, and classifying the event types contained in each document.
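The < sentence, event type > pair construction described in claim 2 can be illustrated as follows; `make_pairs`, the abbreviated event-type list and the sample sentence are hypothetical stand-ins.

```python
# Abbreviated event-type list, for illustration; the patent defines seven.
EVENT_TYPES = ["earthquake", "heavy rain", "flood discharge", "daily inspection"]

def make_pairs(sentence, conveyed_events):
    """Return (sentence, event_type, label) triples for one sentence.

    Label is 1 when the sentence conveys the event type, 0 otherwise,
    so each sentence yields one pair per predefined event type.
    """
    return [(sentence, t, 1 if t in conveyed_events else 0)
            for t in EVENT_TYPES]

pairs = make_pairs("slope and highway inspected: no abnormality",
                   {"daily inspection"})
# Only one positive among four pairs -- the class imbalance that
# motivates the Focal loss of claim 2.
```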
3. An emergency working condition event detection system based on a dam operation log, which carries out event detection on the dam operation log in the water conservancy field, is characterized by comprising:
the log file preprocessing module: first sorts and splits the dam operation logs by recording date, numbers each document, then sorts, numbers and word-segments the sentences in each document, tags the entity type and part of speech of each word, and finally constructs the set of dam emergency-condition event types;
the encoding vector embedding module: encodes all segmented words in the dam operation log with an ALBERT pre-trained model and converts them into the embedded vectors corresponding to the words, i.e. vector representations that a computer can process;
BiLSTM feature fusion module: using BiLSTM to fuse an embedded vector, a named entity type and a part-of-speech labeling vector corresponding to the word segmentation, and reinforcing semantic information of the word segmentation;
a dual-attention mechanism semantic enhancement module: context information is fused using sentence-document dual attention;
a model training and classification module: trains the model with a Focal loss function and uses the trained model to classify all documents by the events to which they belong;
the log file preprocessing module is realized as follows:
(1.1) firstly, dividing the dam operation log file into a plurality of documents according to the log record date, sorting and numbering each document, sorting and numbering the sentences in each document, and using jieba to segment the sentences into words;
the method comprises the steps of (1.2) marking entity types and parts of speech of a word segmentation result, converting the entity types into low-dimensional vectors by searching an embedded table which is randomly initialized, marking parts of speech of each word by using Stanford CoreNLP, and then converting the parts of speech into the low-dimensional vectors by searching the embedded table;
(1.3) predefining the event types of dam emergency working conditions, including earthquake, heavy rain, flood discharge, pre-flood safety large inspection, comprehensive special inspection, daily maintenance and daily inspection events;
the BiLSTM feature fusion module is realized as follows:
(3.1) concatenating the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word, wherein the embedded vector is the vector obtained in step (2), the entity-type vector is the vectorized named-entity recognition result of all words, and the part-of-speech tagging vector is the vectorized part-of-speech tagging result of all words;
(3.2) processing the concatenated vectors within a single sentence with the BiLSTM model, each vector being one input; the bidirectional LSTM units capture word context information and output two hidden states, a forward state h→_k and a backward state h←_k, which are synthesized into the output vector h_k = [h→_k; h←_k].
In the dual-attention mechanism semantic enhancement module: in the training set, the emergency-condition predefined event contained in each sentence is converted into an embedded vector t_1 by looking up a randomly initialized embedding table, and each document is converted into an embedded vector d using Doc2Vec; for all words in each sentence, a local attention mechanism calculates the weight of each word in the sentence, raising the attention weight of words that trigger the target event type and simulating a trigger;
the dual-attention mechanism semantic enhancement module converts the emergency-condition predefined event contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and converts each document into an embedded vector d using Doc2Vec; for all words in each sentence, a local attention mechanism calculates the weight of each word in the sentence, raising the attention weight of words that trigger the target event type and simulating a trigger; the module is implemented as follows:
(4.1) in the training set, converting the emergency-condition predefined event contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and converting each document into an embedded vector d using Doc2Vec;
(4.2) for all words in each sentence, calculating the weight of each word in the sentence with a local attention mechanism, raising the attention weight of words that trigger the target event type and simulating a trigger, with the calculation formula:

α_s^k = exp(h_k · t_1^T) / Σ_j exp(h_j · t_1^T)

where h_k is the k-th part of the output vector h, α_s^k is the k-th part of the local attention vector α_s, and t_1^T is the transpose of the event-type embedding vector;
(4.3) for all words in each sentence, calculating with a global attention mechanism the weight within the document of the sentence containing the word, obtaining the specific meaning of the trigger in this scene, assisting the judgment of the sentence's event type, and resolving trigger ambiguity through context information, with the calculation formula:

α_d^k = exp(h_k · t_2^T + h_k · d^T) / Σ_j exp(h_j · t_2^T + h_j · d^T)

where h_k is the k-th part of the output vector h, α_d^k is the k-th part of the global attention vector α_d, t_2^T is the transpose of the event-type embedding vector, and d^T is the transpose of the document-level embedded vector;
(4.4) weighting and fusing the local attention and the global attention to improve event detection precision, with the formulas:

v_s = α_s · t_1
v_d = α_d · t_2
o = σ(λ·v_s + (1-λ)·v_d)

where the final output value o consists of two parts, v_s and v_d; v_s is generated by the dot product of α_s and t_1 and captures local features, simulating hidden event triggers; v_d is generated by the dot product of α_d and t_2 and captures global features and context information; σ is the Sigmoid function, and λ ∈ [0,1] is a hyperparameter trading off v_s against v_d.
4. A computer device, characterized by: the computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the dam operation log-based emergency condition event detection method according to any one of claims 1-2 when executing the computer program.
5. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program for executing the emergency condition event detection method based on the dam operation log according to any one of claims 1 to 2.
CN202111202004.4A 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log Active CN113901815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111202004.4A CN113901815B (en) 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log


Publications (2)

Publication Number Publication Date
CN113901815A CN113901815A (en) 2022-01-07
CN113901815B true CN113901815B (en) 2023-05-05

Family

ID=79192213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111202004.4A Active CN113901815B (en) 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log

Country Status (1)

Country Link
CN (1) CN113901815B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082898A (en) * 2022-07-04 2022-09-20 小米汽车科技有限公司 Obstacle detection method, obstacle detection device, vehicle, and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN110135457A (en) * 2019-04-11 2019-08-16 中国科学院计算技术研究所 Event trigger word abstracting method and system based on self-encoding encoder fusion document information

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN103049532A (en) * 2012-12-21 2013-04-17 东莞中国科学院云计算产业技术创新与育成中心 Method for creating knowledge base engine on basis of sudden event emergency management and method for inquiring knowledge base engine
CN111881258B (en) * 2020-07-28 2023-06-20 广东工业大学 Self-learning event extraction method and application thereof
CN112612871B (en) * 2020-12-17 2023-09-15 浙江大学 Multi-event detection method based on sequence generation model
CN112765952A (en) * 2020-12-28 2021-05-07 大连理工大学 Conditional probability combined event extraction method under graph convolution attention mechanism
CN113312500B (en) * 2021-06-24 2022-05-03 河海大学 Method for constructing event map for safe operation of dam


Also Published As

Publication number Publication date
CN113901815A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN111079430B (en) Power failure event extraction method combining deep learning and concept map
CN109635109A (en) Sentence classification method based on LSTM and combination part of speech and more attention mechanism
CN112836046A (en) Four-risk one-gold-field policy and regulation text entity identification method
CN111274814B (en) Novel semi-supervised text entity information extraction method
CN112560486A (en) Power entity identification method based on multilayer neural network, storage medium and equipment
CN113157859B (en) Event detection method based on upper concept information
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
Marreddy et al. Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques
CN113901815B (en) Emergency working condition event detection method based on dam operation log
CN112257425A (en) Power data analysis method and system based on data classification model
CN109446299A (en) The method and system of searching email content based on event recognition
CN112328792A (en) Optimization method for recognizing credit events based on DBSCAN clustering algorithm
CN112232078A (en) Scheduling operation ticket auditing method based on bidirectional GRU and attention mechanism
CN111178080A (en) Named entity identification method and system based on structured information
CN111079582A (en) Image recognition English composition running question judgment method
Kayesh et al. A deep learning model for mining and detecting causally related events in tweets
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN114492392A (en) Annual report risk mining system and method based on phrase vector construction
Hu et al. A classification model of power operation inspection defect texts based on graph convolutional network
CN112488593B (en) Auxiliary bid evaluation system and method for bidding
Wang et al. Disaster Detector on Twitter Using Bidirectional Encoder Representation from Transformers with Keyword Position Information
CN114298041A (en) Network security named entity identification method and identification device
CN116627915B (en) Dam emergency working condition event detection method and system based on slot semantic interaction
CN103119585B (en) Knowledge acquisition device and method
Sakahira et al. Creating a Disaster Chain Diagram from Japanese Newspaper Articles Using Mechanical Methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant