CN114201970A - Method and device for capturing power grid scheduling event detection based on semantic features - Google Patents


Publication number
CN114201970A
Authority
CN
China
Prior art keywords
information
vocabulary
vocabularies
text information
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111393510.6A
Other languages
Chinese (zh)
Inventor
张亮
翟海保
屈刚
葛敏辉
李慧星
许凌
金皓纯
杜宽
韩博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Branch Of State Grid Corp ltd
Beijing Kedong Electric Power Control System Co Ltd
Original Assignee
East China Branch Of State Grid Corp ltd
Beijing Kedong Electric Power Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Branch Of State Grid Corp ltd and Beijing Kedong Electric Power Control System Co Ltd
Priority to CN202111393510.6A
Publication of CN114201970A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply

Abstract

The invention discloses a method and a device for capturing power grid scheduling event detection based on semantic features. The method comprises: obtaining text information containing domain vocabulary and common vocabulary; training on the domain vocabulary representation information and the common vocabulary representation information to obtain basic data text information; processing the vocabulary in the basic data text information to obtain processed text information; performing algorithm verification on the Chinese-related information in the processed text information and training through a mixed characterization algorithm; and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result. Through automated and intelligent means, the method helps power grid dispatching management extract events and entities from its work, thereby improving the working efficiency of staff.

Description

Method and device for capturing power grid scheduling event detection based on semantic features
Technical Field
The invention relates to a method and a device for capturing power grid scheduling event detection based on semantic features, and belongs to the technical field of semantic recognition.
Background
With the deepening of the construction requirements for "dual-carbon" goals and new-type electric power systems, the construction of various informatized and intelligent systems for power grid dispatching is advancing rapidly. During this construction, however, the quantity and variety of record-keeping (trace) documents produced in power grid dispatching keep increasing, and more automated and intelligent means are needed to help power grid dispatching management extract events and entities from its work, thereby improving the working efficiency of staff.
In the current development of artificial intelligence, event extraction is one of the important tasks of natural language processing, and event detection (ED) is one of its key steps. Event detection uses deep learning to identify event trigger words and then classify events, with the aim of identifying event instances of specific types in plain text.
To date, many methods [1-2] have been proposed and have achieved good performance. Among them, traditional document-based media event monitoring methods detect events mainly through text similarity and clustering. Yang et al. [3-4] proposed the basic steps of document-based event detection, including text preprocessing, data representation, and data organization or clustering, which remain the basic components of many event detection methods. Salton [5] used TF-IDF to weight the important words of documents in a corpus, an approach widely adopted by later event detection methods. To remedy the shortcomings of term-vector and bag-of-words models, Kumaran et al. [6] proposed a VSM text vector model fused with named entities, which weights important word features and compensates for the shortcomings of TF-IDF.
However, existing event extraction methods find it challenging to capture enough semantic information from plain text, because a word may have different meanings in different sentences.
Consider sentence 1, "the total installed capacity of the thermal power stations in province A is xxx kW", and sentence 2, "the code of conduct required of power supply clerks while standing". The same word (rendered here as "station" in sentence 1 and "standing" in sentence 2) has different meanings in the two sentence contexts. In addition, "installed capacity" in sentence 1 is a single complete term in the power grid business scenario, but common word segmentation tools split it into two words, "installed" and "capacity", so the information cannot be extracted correctly and completely. It is therefore difficult to make full use of such cue words for efficient information extraction with conventional word embeddings.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method and a device for capturing power grid scheduling event detection based on semantic features, which can identify event instances of specific types in plain text. Through automated and intelligent means, the method helps power grid dispatching management extract events and entities from its work, thereby improving the working efficiency of staff.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect, the invention provides a method for capturing power grid scheduling event detection based on semantic features, which comprises the following steps:
acquiring text information with field vocabularies and common vocabularies;
training the text information to obtain domain vocabulary representation information and common vocabulary representation information;
training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
processing the vocabulary in the basic data text information to obtain the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm;
dividing the Chinese related information into vocabularies, and primarily classifying the vocabularies;
and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result.
Further, training the text information to obtain domain vocabulary representation information and common vocabulary representation information, including:
acquiring characteristics from the domain vocabulary and the common vocabulary according to the Token-Level neural network;
let $T = t_1, t_2, \ldots, t_n$ be the tokens of the sentence, where $n$ is the number of tokens, and let $x_i$ be the concatenation of the embedding of $t_i$ and the relative position of $t_i$ to $t_c$; a convolutional layer with window size $s$ is introduced to capture compositional semantics, with the formula as follows:
$$h_{ij} = \tanh\left(w_i \cdot x_{j:j+s-1} + b_i\right) \tag{1}$$
[Equation (2): Figure BDA0003369159560000031, not reproduced in the text]
equation (1) shows the convolution process, where $w_i$ is a filter of the convolutional layer, $x_{j:j+s-1}$ is the concatenation of the embeddings from $x_j$ to $x_{j+s-1}$, and $b_i$ is a bias;
equation (2) uses dynamic multi-pooling to provide important signals from different parts of the sentence: one pooled term aggregates the results to the left of $t_c$, and the other aggregates the results to the right of $t_c$; by concatenating the left and right pooled terms, the representation fbword of the domain vocabulary is obtained; by applying the same process to the common-vocabulary-level sequence, the representation fnword of the common vocabulary is obtained.
Further, processing the vocabulary in the basic data text information to obtain the processed text information includes:
performing part-of-speech tagging on the vocabulary through POS, where POS denotes the part of speech of a word;
analyzing the dependency relationship among the vocabularies through DR;
and calculating the distance between each class of words and the Head through a DIS algorithm.
Further, performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm, with the formulas as follows:
$$\alpha_{Ti} = s\left(W_{Ti} f'_{char} + U_{Ti} f'_{word} + V_{Ti} f'_{F} + b_{Ti}\right) \tag{3}$$
$$\alpha_{Tc} = s\left(W_{Tc} f'_{char} + U_{Tc} f'_{word} + V_{Tc} f'_{F} + b_{Tc}\right) \tag{4}$$
wherein $s$ is the sigmoid function, $W \in \mathbb{R}^{d' \times d'}$, $U \in \mathbb{R}^{d' \times d'}$ and $V \in \mathbb{R}^{d' \times d'}$ are weight matrices, and $b$ is a bias;
constructing an 82-dimensional vector $f'_{F}$, and obtaining the concatenation of the features and the words as a new representation: $f'_{h} = [f'_{F}; f'_{nword}]$;
with the gates introduced for trigger recognition and the event type classifier, the final vectors used as input are obtained, with the formulas as follows:
$$f_{Ti} = \alpha_{Ti} f'_{char} + (1 - \alpha_{Ti}) f'_{h} \tag{5}$$
$$f_{Tc} = \alpha_{Tc} f'_{char} + (1 - \alpha_{Tc}) f'_{h} \tag{6}$$
where $f_{Ti}$ is the hybrid representation for trigger recognition, $f_{Tc}$ is the hybrid representation for the event type classifier, $\alpha_{Ti}$ and $(1 - \alpha_{Ti})$ represent the importance of $f'_{char}$ and $f'_{h}$ in trigger recognition, respectively, and $\alpha_{Tc}$ plays a similar role in the event type classifier.
Further, inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result, including:
obtaining $x_i$ as the concatenation of the embedding and the relative position, concatenating $x_i$ with the feature vectors defined above, and then using the vocabulary-level features as the input of the convolutional layer to capture compositional semantics and obtain the feature maps;
dividing each feature map output $c_j$ into $i$ parts according to the number of triggers in the sentence; with dynamic multi-pooling, the output of a filter for each part is given by $p_{ji} = \max(c_{ji})$, where $c_{ji}$ is the $i$-th part of feature map $c_j$; $p_{ji}$ is obtained for each feature map, and all $p_{ji}$ are concatenated to form the final result.
Further, the method also comprises the following steps: and connecting the feature vector and the vocabulary features into a vector Fword, acquiring a character-level vector Fchar, and generating two mixed representations for the trigger recognition and trigger type classification layers.
In a second aspect, the present invention provides a device for capturing power grid scheduling event detection based on semantic features, including:
the acquisition unit is used for acquiring text information with field vocabularies and common vocabularies;
the first training unit is used for training the text information to obtain field vocabulary representation information and common vocabulary representation information;
the second training unit is used for training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
the processing unit is used for processing the vocabulary in the basic data text information and acquiring the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
the hybrid representation training unit is used for performing algorithm verification on information related to Chinese in the processed text information and training through a mixed characterization algorithm;
the classification unit is used for dividing the Chinese-related information into vocabularies and primarily classifying the vocabularies;
and the detection result acquisition unit is used for inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to acquire a final detection result.
In a third aspect, the invention provides a device for capturing power grid scheduling event detection based on semantic features, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of the above.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects:
the present invention employs a hybrid representation method that learns information from words, characters and dependencies. Two separate representations, character-level and word-level, are learned using Token-Level neural networks; dependency information is obtained from a dependency parser, and its representation is generated through one-hot encoding. Through the study of the technical method and verification by related experiments, event instances of specific types can be identified in plain text, so that automated and intelligent means help power grid dispatching management extract events and entities from its work and improve the working efficiency of staff.
Drawings
Fig. 1 is a schematic structural diagram of a method and an apparatus for capturing power grid scheduling event detection based on semantic features according to an embodiment of the present invention.
Fig. 2 is a schematic cross-sectional structure diagram of a method and an apparatus for capturing power grid scheduling event detection based on semantic features according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
The embodiment introduces a method and a device for capturing power grid scheduling event detection based on semantic features, which comprise the following steps:
acquiring text information with field vocabularies and common vocabularies;
training the text information to obtain domain vocabulary representation information and common vocabulary representation information;
training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
processing the vocabulary in the basic data text information to obtain the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm;
dividing the Chinese related information into vocabularies, and primarily classifying the vocabularies;
and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result.
The method and the device for capturing the power grid scheduling event detection based on the semantic features provided by the embodiment specifically relate to the following steps in an application process:
The model provided by the method is divided into two stages, and the hybrid representation is processed by a dynamic multi-pooling convolutional neural network. The first stage is trigger recognition, which uses a Token-Level neural network to exploit the character composition structure of triggers and capture potential trigger semantics containing the related roles. The second stage, called trigger type classification, determines the specific type of the event. In both stages, dependency information extracted by the dependency parser is used to generate a feature representation through the Token-Level neural networks; finally, this is combined with the word feature representation to obtain the hybrid representation.
The architecture of the power grid dispatching semantic event detection method consists of four parts: representation of the input sequence, feature representation based on the dependency parser, hybrid representation learning, and the dynamic multi-pooling convolutional neural network, as shown in Fig. 2:
First, trigger recognition: representation of the input sequence. Two levels of embedding are used herein, namely domain vocabulary embedding and common-vocabulary-level embedding. To further improve performance, pre-trained weights are used to initialize the embeddings. Features are obtained from the domain vocabulary and the common vocabulary by two Token-Level neural networks, whose architecture is similar to NPNS. Let $T = t_1, t_2, \ldots, t_n$ be the tokens of the sentence, where $n$ is the number of tokens, and let $x_i$ be the concatenation of the embedding of $t_i$ (domain vocabulary or common vocabulary) and the relative position of $t_i$ to $t_c$. A convolutional layer with window size $s$ is introduced to capture compositional semantics, and its hidden layer is expressed as follows:
$$h_{ij} = \tanh\left(w_i \cdot x_{j:j+s-1} + b_i\right) \tag{1}$$
[Equation (2): Figure BDA0003369159560000081, not reproduced in the text]
Equation (1) shows the convolution process, where $w_i$ is a filter of the convolutional layer, $x_{j:j+s-1}$ is the concatenation of the embeddings from $x_j$ to $x_{j+s-1}$, and $b_i$ is a bias.
Equation (2) uses dynamic multi-pooling to provide important signals from different parts of the sentence: one pooled term aggregates the results to the left of $t_c$, and the other aggregates the results to the right of $t_c$. By concatenating the left and right pooled terms, the representation fbword of the domain vocabulary is obtained. By applying the same process to the common-vocabulary-level sequence, the representation fnword of the common vocabulary is also obtained. The method comprises the following steps:
1) inputting text information containing domain vocabulary and common vocabulary for training;
2) performing training through equation (1) and equation (2) to obtain the domain vocabulary representation information and the common vocabulary representation information;
3) once the domain vocabulary representation information and the common vocabulary representation information are trained to a mature and usable state, storing them in the power grid information corpus as the basic data information for the dependency parser algorithm.
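The convolution of equation (1) and the left/right pooling described for equation (2) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy embeddings and the choice to split the feature map at index `c` (left part `j < c`, right part `j >= c`) are all hypothetical, and only a single filter is shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def conv_feature_map(x: Sequence[Vector], w: Vector, b: float, s: int) -> List[float]:
    """Equation (1) for one filter: h_j = tanh(w . x_{j:j+s-1} + b)."""
    feature_map = []
    for j in range(len(x) - s + 1):
        # Concatenate the s token vectors in the window starting at j.
        window = [value for token in x[j:j + s] for value in token]
        feature_map.append(math.tanh(sum(wi * vi for wi, vi in zip(w, window)) + b))
    return feature_map


def dynamic_multi_pool(h: Sequence[float], c: int) -> List[float]:
    """Max-pool separately to the left and right of position c (assumed split)."""
    left, right = list(h[:c]), list(h[c:])
    # Assumption: an empty side contributes 0.0.
    return [max(left) if left else 0.0, max(right) if right else 0.0]


# Toy input: four tokens with 2-dimensional vectors, window size s = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [0.5, -0.5, 0.5, 0.5]
h = conv_feature_map(x, w, b=0.0, s=2)
pooled = dynamic_multi_pool(h, c=1)
```

With several filters, the pooled pairs of all filters would be concatenated to form the vocabulary representation.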
Second, trigger recognition: feature representation based on dependency parsing. The dependency parser is an important part of dependency-based syntactic analysis. Syntactic dependencies can be used to obtain deep semantic information, and they are used in the neural network model by incorporating them directly into the embedding. In this work, three different feature abstraction layers are used to represent three features:
POS: part-of-speech tagging.
DR: dependency relation analysis.
DIS: distance from the head.
POS is the part of speech of a word, which plays an important role in natural language processing tasks such as named entity recognition, parsing, and event extraction. Nouns or pronouns may act as subjects in sentences, but they are not interchangeable, because grammatical components place constraints on parts of speech. POS is therefore suitable as an abstract feature for expressing different characteristics of textual semantic information, and it is used herein as a feature to enhance the word-based features. There are roughly 52 common parts of speech in Chinese, so a 52-dimensional one-hot vector is used herein to represent the POS of each word in a sentence. Each dimension of the vector corresponds to one part of speech: the dimension for the word's part of speech is 1 and the remaining 51 dimensions are 0.
Dependencies express the semantic relationships between sentence components. For event detection tasks, triggers are typically predicates (i.e., verbs). In a power grid dispatching corpus, triggers acting as the object of a verb account for 19% of cases. It is therefore considered herein that dependencies can be used to improve trigger detection. At the dependency feature layer, the vector dimension is 23 (22 relation types and one "other" type). The 22 relation types are the frequently used syntactic dependencies; to reduce the complexity of the feature representation, all other dependencies are classified as the "other" type.
The distance from the head is determined by the length of the dependency path. Specifically, if a word is directly related to the head, its distance to the head is defined herein as 1; if the path includes one intermediate dependency, the distance to the head is 2. For example, in the sentence of Fig. 1, the dependency path by which a secondary circuit malfunction causes a heavy gas alarm is as follows:
secondary circuit malfunction → causes → heavy gas alarm
Here, "secondary circuit malfunction" is the head of the sentence, and "causes" is the intermediate dependency between the head and "heavy gas alarm".
The method comprises the following steps:
1) as in fig. 1, inputting text information;
2) based on the step of representing the input sequence, acquiring words in the text information, including field words and common words;
3) performing part-of-speech tagging on the vocabulary through the POS;
4) analyzing the dependency relationship among the vocabularies through DR;
5) calculating the distance between each vocabulary and the Head through a DIS algorithm;
6) using the vocabulary, part-of-speech analysis, dependency relation, distance and other information obtained in the first two parts as the input of the "mixed characterization learning" step.
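The three feature layers above can be sketched as one-hot vectors that are concatenated per word. This is an illustrative sketch only: the POS and DR dimensions (52 and 23) come from the text, while the 7-dimensional DIS vector is an assumption chosen so that the total is 82, and the index values, the `parent` map and the clipping of long paths are hypothetical.

```python
from typing import Dict, List, Optional

POS_DIM = 52  # from the text: roughly 52 common parts of speech
DR_DIM = 23   # from the text: 22 relation types plus "other"
DIS_DIM = 7   # assumption: chosen so that 52 + 23 + 7 = 82


def one_hot(index: int, dim: int) -> List[int]:
    """Return a vector of length dim with a single 1 at position index."""
    vec = [0] * dim
    vec[index] = 1
    return vec


def distance_to_head(word: int, parent: Dict[int, Optional[int]]) -> int:
    """Count dependency arcs between a word and the head (the head itself is 0)."""
    distance = 0
    node = word
    while parent[node] is not None:
        node = parent[node]
        distance += 1
    return distance


def word_features(pos_id: int, dr_id: int, distance: int) -> List[int]:
    """Concatenate the POS, DR and DIS one-hot vectors into one 82-dim vector."""
    dis_id = min(distance, DIS_DIM - 1)  # assumption: clip long paths to the last slot
    return one_hot(pos_id, POS_DIM) + one_hot(dr_id, DR_DIM) + one_hot(dis_id, DIS_DIM)


# Path from the text: secondary circuit malfunction -> causes -> heavy gas alarm.
# Word 0 is the head, word 1 ("causes") depends on it, word 2 depends on word 1.
parent = {0: None, 1: 0, 2: 1}
alarm_distance = distance_to_head(2, parent)
features = word_features(pos_id=3, dr_id=5, distance=alarm_distance)
```

The `pos_id` and `dr_id` values here are placeholders; in practice they would come from a part-of-speech tagger and a dependency parser.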
Third, hybrid representation learning. For Chinese event detection, using only domain vocabulary representations or only common-vocabulary-level representations does not provide enough information. For example, in a character-level representation, the word for "trigger" is understood as a sequence of two characters (rendered literally as 'lead' and 'hair'). In a word-level representation, the word-level sequence provides more explicit information for disambiguating the semantics of the first character (rendered as "quote").
After the embedding layers, the word-level feature representation fbword, the character-level representation fchar and the feature representation $f'_{F}$ are obtained. The information flow for trigger recognition and the event type classifier is modeled herein by learning two gates, with the formulas as follows:
$$\alpha_{Ti} = s\left(W_{Ti} f'_{char} + U_{Ti} f'_{word} + V_{Ti} f'_{F} + b_{Ti}\right) \tag{3}$$
$$\alpha_{Tc} = s\left(W_{Tc} f'_{char} + U_{Tc} f'_{word} + V_{Tc} f'_{F} + b_{Tc}\right) \tag{4}$$
where $s$ is the sigmoid function, $W \in \mathbb{R}^{d' \times d'}$, $U \in \mathbb{R}^{d' \times d'}$ and $V \in \mathbb{R}^{d' \times d'}$ are weight matrices, and $b$ is a bias.
Based on the three feature layers in Fig. 2, three feature representations are obtained. An 82-dimensional vector $f'_{F}$ is constructed by concatenating the three feature representations. Finally, the concatenation of the features and the words is obtained as a new representation: $f'_{h} = [f'_{F}; f'_{nword}]$.
With the gates introduced for trigger recognition and the event type classifier, the final vectors used as input are obtained:
$$f_{Ti} = \alpha_{Ti} f'_{char} + (1 - \alpha_{Ti}) f'_{h} \tag{5}$$
$$f_{Tc} = \alpha_{Tc} f'_{char} + (1 - \alpha_{Tc}) f'_{h} \tag{6}$$
Here $f_{Ti}$ is the hybrid representation for trigger recognition and $f_{Tc}$ is the hybrid representation for the event type classifier. $\alpha_{Ti}$ and $(1 - \alpha_{Ti})$ represent the importance of $f'_{char}$ and $f'_{h}$ in trigger recognition, respectively. $\alpha_{Tc}$ plays a similar role in the event type classifier.
The method comprises the following steps:
1) taking the Chinese-related results of the first two parts as the data input of this step;
2) performing algorithm verification on the vocabulary information segmented in the first part, to avoid decomposing vocabulary-level information into character-level information;
3) training the mixed characterization algorithm through equations (3), (4), (5) and (6);
4) performing a more reasonable vocabulary segmentation of the Chinese-related information;
5) preliminarily classifying the vocabulary through the event type classifier, and using the result as the data input of the dynamic multi-pooling convolutional neural network in the next step.
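The gating of equations (3) to (6) can be sketched as below. This is a minimal illustration, assuming that all three inputs have already been projected to a common dimension d' and that the gate is applied element-wise; the function names and the toy values are hypothetical, and the same `gate` and `mix` functions would be called once with the trigger recognition parameters and once with the event type classifier parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def gate(W: Matrix, U: Matrix, V: Matrix, b: Vector,
         f_char: Vector, f_word: Vector, f_feat: Vector) -> Vector:
    """Equations (3)/(4): alpha = s(W f_char + U f_word + V f_F + b)."""
    pre = [wc + uw + vf + bias
           for wc, uw, vf, bias in zip(matvec(W, f_char), matvec(U, f_word),
                                       matvec(V, f_feat), b)]
    return [sigmoid(z) for z in pre]


def mix(alpha: Vector, f_char: Vector, f_h: Vector) -> Vector:
    """Equations (5)/(6): f = alpha * f_char + (1 - alpha) * f_h, element-wise."""
    return [a * c + (1.0 - a) * h for a, c, h in zip(alpha, f_char, f_h)]


# Toy example with d' = 2 and all-zero parameters, so every gate value is 0.5.
zeros = [[0.0, 0.0], [0.0, 0.0]]
f_char, f_word, f_feat, f_h = [1.0, 0.0], [0.2, 0.4], [0.1, 0.3], [0.0, 1.0]
alpha = gate(zeros, zeros, zeros, [0.0, 0.0], f_char, f_word, f_feat)
f_mixed = mix(alpha, f_char, f_h)
```

A gate value near 1 lets the character-level representation dominate, while a value near 0 favours the feature-plus-word representation.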
Fourth, the dynamic multi-pooling convolutional neural network. A traditional convolutional neural network uses only one pooling layer, which applies a max operation over the whole sentence. This means that a traditional convolutional neural network captures only the most important information in the sentence representation. In event detection, however, a sentence may contain two or more events, and arguments may play different roles with different triggers. A traditional convolutional neural network can only capture the most useful information of the entire sentence and loses the other information in the same sentence. To address this problem, prior work obtains more valuable information through a dynamic multi-pooling convolutional neural network (DMCNN) without losing the max-pooling value.
In the present study, a similar neural network is used, whose trigger detection architecture is shown in Fig. 3.
First, $x_i$ is obtained as the concatenation of the embedding and the relative position; $x_i$ is concatenated with the feature vectors defined above, and the vocabulary-level features are then used as the input of the convolutional layer to capture compositional semantics and obtain the feature maps. Specifically, the convolution operation produces a new feature by scanning a window of $h$ words with a convolution kernel. Let $x_{i:i+j}$ denote the concatenation of the words $x_i, x_{i+1}, \ldots, x_{i+j}$. The convolution kernel is applied to each window of $h$ words in the sentence, $x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}$, to generate a feature map whose elements $c_i$ are indexed from 1 to $n-h+1$. The $j$-th convolution kernel yields the feature at position $i$:
$$c_{ij} = \sigma\left(w_j \cdot x_{i:i+h-1} + b\right)$$
where $\sigma$ is a non-linear function (tanh is typically used), $j$ ranges from 1 to $m$, and $m$ is the number of convolution kernels.
The feature map output $c_j$ is then divided into $i$ parts according to the number of triggers in the sentence. For example, if a sentence has one trigger, the sentence is divided into two parts; when there are two triggers, they divide the sentence into three parts. With dynamic multi-pooling, the output of a filter for each part is given by $p_{ji} = \max(c_{ji})$, where $c_{ji}$ is the $i$-th part of feature map $c_j$; $p_{ji}$ is obtained for each feature map, and all $p_{ji}$ are concatenated to form the final result.
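The splitting and pooling step can be illustrated with a short Python sketch. It is an assumption-laden example rather than the patented implementation: the function names and toy feature maps are hypothetical, and the convention that each trigger position closes the part it belongs to is one possible choice that the text does not specify.

```python
from typing import List, Sequence


def split_by_triggers(feature_map: Sequence[float],
                      trigger_positions: Sequence[int]) -> List[List[float]]:
    """Split one feature map c_j into len(trigger_positions) + 1 parts."""
    parts = []
    start = 0
    for position in sorted(trigger_positions):
        # Assumption: the trigger position is the last element of its part.
        parts.append(list(feature_map[start:position + 1]))
        start = position + 1
    parts.append(list(feature_map[start:]))
    return parts


def dynamic_multi_pooling(feature_maps: Sequence[Sequence[float]],
                          trigger_positions: Sequence[int]) -> List[float]:
    """Compute p_ji = max(c_ji) for every part of every feature map and concatenate."""
    pooled = []
    for c_j in feature_maps:
        for part in split_by_triggers(c_j, trigger_positions):
            pooled.append(max(part) if part else 0.0)  # empty part -> 0.0 (assumption)
    return pooled


# Two feature maps (m = 2) over five window positions, two triggers at 1 and 3.
feature_maps = [[0.1, 0.9, 0.3, 0.7, 0.2],
                [0.5, 0.4, 0.8, 0.1, 0.6]]
pooled = dynamic_multi_pooling(feature_maps, trigger_positions=[1, 3])
# pooled has 2 feature maps x 3 parts = 6 values
```

Compared with a single max over each feature map, this keeps one maximum per segment, so information around each trigger is preserved.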
Finally, the above feature vectors and the vocabulary features are connected into a vector Fword. We use a similar approach to obtain the character-level vector Fchar, producing two mixed representations for the trigger recognition and trigger type classification layers.
Finally, the algorithm is trained and used for classification; here, event detection is treated as a multi-class classification problem. The hybrid representation fN learned from the architecture described above is the input to the convolutional neural network. The input is a word-segmented sentence, and Dropout is added to prevent overfitting.
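A minimal sketch of a classification step with Dropout is given below. The text above does not specify the output layer, so the linear layer followed by softmax is an assumption made for illustration, as are the function names, the dropout rate of 0.5 and the toy weights.

```python
import math
import random


def dropout(vec, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` in training."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(f, weights, bias, rate=0.5, rng=None, training=False):
    """Linear layer + softmax over event classes (assumed output layer)."""
    rng = rng or random.Random(0)
    f = dropout(f, rate, rng, training)
    scores = [sum(w * x for w, x in zip(row, f)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


# Toy example (hypothetical values): 2-dimensional input, 3 event classes
probs = classify([1.0, 2.0],
                 [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                 [0.0, 0.0, 0.0])
predicted = probs.index(max(probs))
print(predicted)
```

Dropout is active only when `training=True`; at inference time the input passes through unchanged, so predictions are deterministic.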
Through experiments, deep learning can further be used to identify event trigger words and to classify events, the purpose being to identify event instances of specific types in plain text.
Finally, with event identification and classification as the targets, the system structure of the power grid scheduling semantic event detection method is constructed from four parts: input sequence characterization, dependency-analysis-based feature characterization, hybrid characterization learning, and the dynamic multi-pooling convolutional neural network. Automated and intelligent means thus assist the extraction of events and entities in power grid dispatching management work, improving the working efficiency of staff.
Example 2
The embodiment provides a device for capturing power grid scheduling event detection based on semantic features, which includes:
the acquisition unit is used for acquiring text information with field vocabularies and common vocabularies;
the first training unit is used for training the text information to obtain field vocabulary representation information and common vocabulary representation information;
the second training unit is used for training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
the processing unit is used for processing the vocabulary in the basic data text information and acquiring the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
the hybrid representation training unit is used for carrying out algorithm verification on the Chinese-related information in the processed text information and training it through a hybrid characterization algorithm;
the classification unit is used for dividing the Chinese-related information into vocabularies and primarily classifying the vocabularies;
and the detection result acquisition unit is used for inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to acquire a final detection result.
Example 3
The embodiment provides a device for capturing power grid scheduling event detection based on semantic features, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of:
acquiring text information with field vocabularies and common vocabularies;
training the text information to obtain domain vocabulary representation information and common vocabulary representation information;
training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
processing the vocabulary in the basic data text information to obtain the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm;
dividing the Chinese related information into vocabularies, and primarily classifying the vocabularies;
and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result.
Example 4
The present embodiments provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the methods:
acquiring text information with field vocabularies and common vocabularies;
training the text information to obtain domain vocabulary representation information and common vocabulary representation information;
training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
processing the vocabulary in the basic data text information to obtain the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm;
dividing the Chinese related information into vocabularies, and primarily classifying the vocabularies;
and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. A method for capturing power grid scheduling event detection based on semantic features is characterized by comprising the following steps:
acquiring text information with field vocabularies and common vocabularies;
training the text information to obtain domain vocabulary representation information and common vocabulary representation information;
training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
processing the vocabulary in the basic data text information to obtain the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm;
dividing the Chinese related information into vocabularies, and primarily classifying the vocabularies;
and inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result.
2. The semantic feature-based grasping power grid scheduling event detection method according to claim 1, characterized in that: training the text information to obtain domain vocabulary representation information and common vocabulary representation information, including:
acquiring characteristics from the domain vocabulary and the common vocabulary according to the Token-Level neural network;
let $T = t_1, t_2, \ldots, t_n$ be the tokens of the sentence, where $t_i$ is the i-th token, and let $x_i$ be the concatenation of the embedding of $t_i$ and the position of $t_i$ relative to $t_c$; a convolutional layer with window size s is introduced to capture the compositional semantics, with the formulas as follows:
$h_{ij} = \tanh(w_i \cdot x_{j:j+s-1} + b_i)$ (1)
[formula image FDA0003369159550000011] (2)
equation (1) shows the convolution process, where $w_i$ is a filter of the convolutional layer, $x_{j:j+s-1}$ is the concatenation of the embeddings from $x_j$ to $x_{j+s-1}$, and $b_i$ is a bias;
equation (2) provides important signals for different parts of the sentence by using dynamic multi-pooling, where [formula image FDA0003369159550000021] is the aggregated result to the left of $t_c$ and [formula image FDA0003369159550000022] is the aggregated result to the right of $t_c$; by concatenating [formula image FDA0003369159550000023] and [formula image FDA0003369159550000024], the domain vocabulary representation fbword is obtained; the common vocabulary representation fnword is obtained by applying the same procedure to the common-vocabulary-level sequence.
3. The semantic feature-based grasping power grid scheduling event detection method according to claim 1, characterized in that: processing the vocabulary in the basic data text information to acquire the processed text information, wherein the processing comprises the following steps:
performing part-of-speech tagging on the vocabulary through POS, wherein POS is the part of speech of the word;
analyzing the dependency relationship among the vocabularies through DR;
and calculating the distance between each class of words and the Head through a DIS algorithm.
4. The semantic feature-based grasping power grid scheduling event detection method according to claim 1, characterized in that: performing algorithm verification on information related to Chinese in the processed text information, and training through a mixed characterization algorithm, wherein the formula is as follows:
$\alpha_{Ti} = s(W_{Ti} f'_{char} + U_{Ti} f'_{word} + V_{Ti} f'_{F} + b_{Ti})$ (3)
$\alpha_{Tc} = s(W_{Tc} f'_{char} + U_{Tc} f'_{word} + V_{Tc} f'_{F} + b_{Tc})$ (4)
wherein s is the sigmoid function, $W \in \mathbb{R}^{d' \times d'}$, $U \in \mathbb{R}^{d' \times d'}$ and $V \in \mathbb{R}^{d' \times d'}$ are weight matrices, and b is a bias;
constructing an 82-dimensional vector $f'_F$, and concatenating the features and the words to obtain a new representation: $f'_h = [f'_F; f'_{nword}]$;
from the grid incorporating the trigger kernel generator and the event type classifier, a final vector is obtained as input, the formula is as follows:
$f_{Ti} = \alpha_{Ti} f'_{char} + (1 - \alpha_{Ti}) f'_h$ (5)
$f_{Tc} = \alpha_{Tc} f'_{char} + (1 - \alpha_{Tc}) f'_h$ (6)
where $f_{Ti}$ is the hybrid representation for trigger recognition, $f_{Tc}$ is the hybrid representation for the event type classifier, $\alpha_{Ti}$ and $(1 - \alpha_{Ti})$ represent the importance of $f'_{char}$ and $f'_h$ in trigger recognition, respectively, and $\alpha_{Tc}$ plays a similar role in the event type classifier.
5. The semantic feature-based grasping power grid scheduling event detection method according to claim 1, characterized in that: inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to obtain a final detection result, wherein the final detection result comprises the following steps:
obtaining the embedding and relative position of $x_i$, concatenating $x_i$ with the feature vectors defined above, and then using the vocabulary-level features as the input of the convolutional layer to capture the compositional semantics and obtain the feature map;
dividing the feature map output $c_j$ into i parts according to the number of triggers in the sentence; with dynamic multi-pooling, the output of a filter for each part is given by $p_{ji} = \max(c_{ji})$, so that $p_{ji}$ is obtained for each feature map and all $p_{ji}$ are concatenated to form the final result.
6. The semantic feature-based grasping power grid scheduling event detection method according to claim 1, characterized in that: further comprising: and connecting the feature vector and the vocabulary features into a vector Fword, acquiring a character-level vector Fchar, and generating two mixed representations for the trigger recognition and trigger type classification layers.
7. A device for capturing power grid scheduling event detection based on semantic features is characterized by comprising:
the acquisition unit is used for acquiring text information with field vocabularies and common vocabularies;
the first training unit is used for training the text information to obtain field vocabulary representation information and common vocabulary representation information;
the second training unit is used for training the field vocabulary representation information and the common vocabulary representation information to obtain basic data text information;
the processing unit is used for processing the vocabulary in the basic data text information and acquiring the processed text information; the processing content comprises labeling vocabularies, analyzing the dependency relationship among the vocabularies, and calculating the distance between each type of vocabularies and a head;
the hybrid representation training unit is used for carrying out algorithm verification on the Chinese-related information in the processed text information and training it through a hybrid characterization algorithm;
the classification unit is used for dividing the Chinese-related information into vocabularies and primarily classifying the vocabularies;
and the detection result acquisition unit is used for inputting the preliminarily classified vocabulary information into a pre-constructed convolutional neural network to acquire a final detection result.
8. A device for capturing power grid scheduling event detection based on semantic features, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program when executed by a processor implements the steps of the method of any one of claims 1 to 6.
CN202111393510.6A 2021-11-23 2021-11-23 Method and device for capturing power grid scheduling event detection based on semantic features Pending CN114201970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111393510.6A CN114201970A (en) 2021-11-23 2021-11-23 Method and device for capturing power grid scheduling event detection based on semantic features

Publications (1)

Publication Number Publication Date
CN114201970A true CN114201970A (en) 2022-03-18

Family

ID=80648445

Country Status (1)

Country Link
CN (1) CN114201970A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232280A (en) * 2019-06-20 2019-09-13 北京理工大学 A kind of software security flaw detection method based on tree construction convolutional neural networks
CN110472051A (en) * 2019-07-24 2019-11-19 中国科学院软件研究所 A kind of event detecting method indicating study based on variable quantity
US20200074321A1 (en) * 2018-09-04 2020-03-05 Rovi Guides, Inc. Methods and systems for using machine-learning extracts and semantic graphs to create structured data to drive search, recommendation, and discovery
CN111222318A (en) * 2019-11-19 2020-06-02 陈一飞 Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
CN113312500A (en) * 2021-06-24 2021-08-27 河海大学 Method for constructing event map for safe operation of dam
WO2021204017A1 (en) * 2020-11-20 2021-10-14 平安科技(深圳)有限公司 Text intent recognition method and apparatus, and related device
CN113673567A (en) * 2021-07-20 2021-11-19 华南理工大学 Panorama emotion recognition method and system based on multi-angle subregion self-adaption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination