CN115470786A - Entity information extraction method for electric power defect text based on improved Transformer encoder - Google Patents

Entity information extraction method for electric power defect text based on improved Transformer encoder

Info

Publication number
CN115470786A
CN115470786A (application CN202211044230.9A)
Authority
CN
China
Prior art keywords
model
vector
cwg
character
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211044230.9A
Other languages
Chinese (zh)
Inventor
龙云 (Long Yun)
卢有飞 (Lu Youfei)
刘璐豪 (Liu Luhao)
梁雪青 (Liang Xueqing)
吴任博 (Wu Renbo)
张扬 (Zhang Yang)
赵宏伟 (Zhao Hongwei)
陈明辉 (Chen Minghui)
张少凡 (Zhang Shaofan)
邹时容 (Zou Shirong)
蔡燕春 (Cai Yanchun)
刘璇 (Liu Xuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202211044230.9A priority Critical patent/CN115470786A/en
Publication of CN115470786A publication Critical patent/CN115470786A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the intersection of artificial intelligence and electric power systems, and in particular to an entity information extraction method for electric power defect text based on an improved Transformer encoder. The method introduces a pre-trained language model, a dictionary, a fine-tuned TENER model and a conditional random field model to build a CWG-TENER model; the model is trained, optimized and tested on labeled defect texts of power-system secondary equipment to obtain a power-equipment defect-text information extraction model; the power-equipment defect text whose information is to be extracted is then fed into this model to obtain the extracted information. The method can extract the entity information contained in defect texts of power-system secondary equipment and can provide decision support when power-system secondary equipment fails.

Description

Entity information extraction method for electric power defect text based on improved Transformer encoder
Technical Field
The invention belongs to the intersection of artificial intelligence and electric power systems, and particularly relates to an entity information extraction method for electric power defect text based on an improved Transformer encoder.
Background
The informatization of the power sector has made data about the power system grow explosively. A large amount of production-process information is recorded in power-equipment defect texts, and deeply mining the valuable information they contain is of great significance to the development of the power industry. However, current power defect texts lack efficient structured management, and irregular text entry reduces how well the information can be used. Because defect-text information is insufficiently exploited, the same defect frequently recurs in different regions; when power equipment develops a defect during operation, operation and maintenance personnel relying only on their own experience cannot judge the exact cause in time, and defective equipment that is not handled promptly and properly can trigger a series of cascading faults. Meanwhile, with the continuous development of Artificial Intelligence (AI), applying AI technology to the power industry has become an inevitable requirement of its development. Natural language processing technology has already been applied successfully in power systems, for example in the power internet of things and power-oriented intelligent search engines. Named Entity Recognition (NER), one of the basic tasks of natural language processing, can make power defect texts far more useful thanks to its strong information extraction and classification capabilities.
In recent years, the application of named entity recognition to machine translation, question-answering systems and the like has confirmed its strength in entity recognition, including for exploiting the information stored centrally by power systems. Many scholars have tried to use NER to enable cross-region retrieval of unstructured power-system text, decision support, and the construction of intelligent diagnosis platforms, with good results.
However, the recurrent neural networks (RNNs) widely applied to the NER task on electric power text make insufficient use of long-range context and offer limited parallelism, which restricts their application in scenarios where the power system has accumulated large numbers of defect texts and accuracy requirements are high. Many scholars have therefore tried, based on concepts such as the self-attention mechanism and position embedding, to build models that improve on the bidirectional long short-term memory network (BiLSTM), a typical recurrent neural network applied to NER. In particular, researchers have proposed the Transformer encoder, which models long-range context with a fully connected self-attention structure; it has begun to be applied to many NLP tasks and performs well, but its study and application to the information extraction problem of power defect text have not yet been developed.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides an entity information extraction method for power defect text based on an improved Transformer encoder. On the basis of the original Transformer-based NER model, a pre-trained language model is introduced to convert the text data into character and word vectors, a dictionary obtained by segmenting a large corpus is introduced, word information is fused on top of the character information, and the network is updated in a graph manner, so that character, word and global information are better fused and the entity information of the power defect text can be extracted more accurately.
The invention can be achieved by adopting the following technical scheme:
the entity information extraction method of the power defect text based on the improved Transformer encoder comprises the following steps:
s1, introducing a defect recording data text of secondary equipment of an electric power system, and labeling the data text;
s2, introducing a pre-training model, a dictionary, a fine-tuning TENER model and a conditional random field model, building a CWG-TENER model, and performing optimization training on the CWG-TENER model by using a data text with a label to obtain a power equipment defect text information extraction model;
and S3, inputting the defect text of the electric power equipment with the information to be extracted into the defect text information extraction model of the electric power equipment to obtain the extracted information.
Specifically, step S2 comprises:
S21, introducing a pre-trained model and a dictionary, and extracting the character vectors of the data text and the word vectors of the dictionary words, the dictionary being obtained by word segmentation of a large original corpus;
S22, forming the extracted character vectors into a character vector set C, matching the data text against the words in the dictionary, and forming the word vectors of the matched words into a word vector set W;
S23, building a character-word graph (CWG) model;
S24, replacing the CRF layer of the TENER model with a fully connected layer so that the output dimension equals the character/word vector dimension, obtaining the fine-tuned TENER model;
S25, taking the character vector set C and the word vector set W as input to the fine-tuned TENER model to obtain the initial node feature vectors $C_0$ and the initial edge feature vectors $W_0$, replacing the nodes of the CWG model and the edges of the CWG model with $C_0$ and $W_0$ respectively, and defining the initial value of the CWG model's global variable as $g_0$;
S26, aggregating the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model to obtain the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$ after the first aggregation;
S27, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$;
S28, updating the character vectors and word vectors with the fine-tuned TENER model, and computing the updated global vector with an LSTM-style network state update formula;
S29, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the updated character vectors, word vectors and global vector respectively, and aggregating the nodes, edges and global variable of the CWG model again;
S210, repeating steps S28 to S29 T times to obtain the final character feature vector set;
S211, inputting the final character feature vector set into the conditional random field model CRF and computing the output optimal label sequence;
S212, optimizing the model parameters with the Adam optimizer according to the optimal label sequence, and training for a preset number of cycles to obtain the power-equipment defect-text information extraction model.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention provides an entity information extraction method for power defect text based on an improved Transformer encoder. A pre-trained language model, a dictionary, a fine-tuned TENER model and a conditional random field model are introduced to build the CWG-TENER model, which is trained, optimized and tested on labeled defect texts of power-system secondary equipment to obtain the power-equipment defect-text information extraction model. The model can extract the entity information contained in defect texts of power-system secondary equipment more effectively, facilitates the subsequent construction of a knowledge graph, and provides a decision-support function when power-system secondary equipment fails.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of the workflow for extracting key information from defect record texts of power secondary equipment in an embodiment of the present invention;
FIG. 2 is a labeled example diagram of one of the embodiments of the invention;
fig. 3 is a schematic structural diagram of the CWG model in the embodiment of the present invention;
fig. 4 is a schematic diagram of a CWG-TENER model structure and an operation flow in the embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to the drawings and examples. Obviously, the described examples are some, but not all, examples of the present invention, and the embodiments of the present invention are not limited thereto. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Example 1:
In this embodiment, for the problem of extracting the "defect phenomenon" information from texts describing functional defects of secondary equipment in the power system, a character-word graph model is constructed for the defect text, an improved Transformer-based encoder suited to NER is used to aggregate and update the graph neural network, and finally a conditional random field model outputs a labeling sequence for the text, from which the "defect phenomenon" information is extracted.
As shown in fig. 1, the embodiment provides a method for extracting entity information of a power defect text based on an improved Transformer encoder, which specifically includes the following steps:
S1, a defect record data text of power-system secondary equipment is introduced and labeled; a labeling example is shown in Fig. 2.
Taking the extraction of the "defect phenomenon" phrase as an example, the first character of the phrase expressing the defect phenomenon is labeled "B", the remaining characters of the phrase are labeled "I", and the characters unrelated to the defect phenomenon are labeled "O".
Take the text of Fig. 2, "The protection device operates abnormally.", as an example: the "defect phenomenon" phrase is "device operates abnormally", so the first character of that phrase is labeled "B", its remaining characters are labeled "I", and the characters of "protection" and the full stop, which are unrelated to the defect phenomenon, are labeled "O".
S2, a pre-trained model, a dictionary, a fine-tuned TENER model and a conditional random field model are introduced to build the CWG-TENER model, which is then trained and optimized on the data text labeled in S1 to obtain the power-equipment defect-text information extraction model.
S21: a pre-trained model and a dictionary are introduced, and the character vectors of the data text and the word vectors of the dictionary words are extracted. The dictionary is obtained by word segmentation of a large original corpus, and the pre-trained model is any one of the BERT model, the BERT-wwm model and the ERNIE model.
BERT (Bidirectional Encoder Representations from Transformers) is a Transformer-based bidirectional encoder published by Google in 2018. It is described as the "first deep bidirectional, unsupervised language representation, pre-trained using only a plain-text corpus". A pretrained BERT model can be adapted to a wide range of natural language processing tasks by fine-tuning only an additional output layer.
BERT-wwm (Whole Word Masking) is an upgraded version of BERT released by Google in 2019 that mainly changes the training-sample generation strategy of the original pre-training stage: the original WordPiece masking is replaced by Whole Word Masking. For Chinese, this means that if one character of a word is masked, the other characters belonging to the same word are masked as well.
ERNIE (Enhanced Representation through Knowledge Integration) is a BERT-based optimization model published in 2019. It mainly improves the masking mechanism, which consists of three types of masks: basic-level masking (word piece), phrase-level masking (WWM style) and entity-level masking.
BERT/BERT-wwm is trained on Wikipedia data, so it works better on formal text; ERNIE additionally uses web data such as Baidu forum posts and Q&A pages, which gives it an advantage on informal text (e.g., microblogs). For traditional Chinese text, BERT or BERT-wwm should be used, since the ERNIE vocabulary contains very little traditional Chinese.
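As an illustration of this step, the following sketch extracts character vectors with a pre-trained Chinese BERT-wwm checkpoint, assuming the Hugging Face transformers library; the checkpoint name is illustrative, and any of the three models above could be substituted:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; the patent only requires "BERT / BERT-wwm / ERNIE".
tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-bert-wwm")
encoder = AutoModel.from_pretrained("hfl/chinese-bert-wwm")

text = "保护装置运行异常。"          # assumed Chinese rendering of the Fig. 2 sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# One d_model-dimensional vector per character (plus [CLS]/[SEP] specials).
char_vectors = outputs.last_hidden_state[0, 1:-1]   # drop [CLS] and [SEP]
print(char_vectors.shape)                            # (9, 768)
```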
S22: the extracted character vectors form a character vector set C, the data text is matched with words in a dictionary, and word vectors corresponding to the matched words form a word vector set W.
The character vector set C is:
$$C = [\,c_1,\ c_2,\ \ldots,\ c_m\,]$$
where $c_1, c_2, \ldots, c_m$ are the character vectors extracted from the data text with the pre-trained model and m is the total number of characters in the text.
Taking the sequence of Fig. 2, "The protection device operates abnormally.", as an example, there are 9 characters in total, so m = 9; with $d_{model}$ the dimension of the character vectors, the C obtained from this sequence is a $d_{model} \times m$ matrix.
The word vector set W is:
$$W = [\,w_{b_1,e_1},\ w_{b_2,e_2},\ \ldots,\ w_{b_n,e_n}\,]$$
where $w_{b_i,e_i}$ is the word vector of the i-th matched word, with the same dimension as the character vectors; $b_i$ and $e_i$ are the head and tail characters of the word corresponding to the i-th word vector; and n is the total number of dictionary words matched in the data text.
A matched word is defined as a dictionary word whose characters appear as a consecutive character span of the data text.
Taking the sequence of Fig. 2, "The protection device operates abnormally.", as an example, 5 words are matched: "protect", "device", "run", "abnormal" and "protection", so n = 5, and the W obtained from this sequence is a $d_{model} \times n$ matrix.
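A minimal sketch of this word-matching step, assuming a toy dictionary and the same example sentence (the function name and dictionary contents are illustrative; the real dictionary is produced by large-scale corpus segmentation):

```python
def match_words(text: str, dictionary: set[str]) -> list[tuple[int, int, str]]:
    """Return (head index b, tail index e, word) for every dictionary word
    that occurs as a consecutive character span of the text."""
    matches = []
    for b in range(len(text)):
        for e in range(b + 1, len(text)):           # words of length >= 2
            word = text[b:e + 1]
            if word in dictionary:
                matches.append((b, e, word))
    return matches

text = "保护装置运行异常。"
dictionary = {"保护", "装置", "运行", "异常", "保护装置"}   # toy dictionary
print(match_words(text, dictionary))
# 5 matched words -> n = 5, so W is a d_model x 5 matrix
```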
S23: a CWG (Character-Word Graph) model is constructed. The CWG model is a directed graph built from the data text information: each character vector $c_i$ forms a node of the graph, and each word vector $w_{b_j,e_j}$ forms an edge from the node corresponding to its head character $b_j$ to the node corresponding to its tail character $e_j$.
The CWG model constructed from the sequence of figure 2 is shown in figure 3.
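For illustration, the CWG structure can be held in a simple adjacency container such as the following sketch (the data layout is an assumption; the patent only fixes that characters are nodes and matched words are head-to-tail edges):

```python
from dataclasses import dataclass, field

@dataclass
class CWG:
    num_chars: int                                   # one node per character
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_word(self, b: int, e: int, word: str) -> None:
        """Matched word -> directed edge from head-character node b to tail-character node e."""
        self.edges.append((b, e, word))

    def predecessors(self, i: int) -> list[tuple[int, str]]:
        """Incoming edges of node i: (head node b, word) pairs used during node aggregation."""
        return [(b, w) for (b, e, w) in self.edges if e == i]

graph = CWG(num_chars=9)
for b, e, w in [(0, 1, "保护"), (0, 3, "保护装置"), (2, 3, "装置"),
                (4, 5, "运行"), (6, 7, "异常")]:
    graph.add_word(b, e, w)
print(graph.predecessors(3))   # edges arriving at the tail character of "装置"/"保护装置"
```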
The CWG model is then processed with the cyclic operation "update → aggregate → update → … → aggregate" to extract text features. Specifically: compute the initial values of the character vectors C (nodes), the word vectors W (edges) and the global vector g → aggregate the character vectors C (nodes), word vectors W (edges) and global vector g → update them → … → aggregate them. This process is described in detail below.
S24: a fine-tuned TENER model is introduced to perform the "update" operation of the CWG model. The TENER model is a Transformer model adapted to the named entity recognition task; the fine-tuning consists of replacing the model's CRF layer with a fully connected layer so that the output dimension equals the character/word vector dimension, which yields the fine-tuned TENER model.
In this embodiment, when the nodes are updated in the subsequent steps, the output of the attention mechanism is $C_{t+1}'$, and the fully connected layer is:
$$C_{t+1} = U_{Linear}\, C_{t+1}' + B_{Linear}$$
where $U_{Linear}$ and $B_{Linear}$ are trainable parameters of the fully connected layer. One round of updating thus yields a character vector $C_{t+1}$ with the same dimension as $C_t$, i.e. $C_{t+1} \in \mathbb{R}^{d_{model} \times m}$.
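A sketch of this fine-tuning idea in PyTorch, in which the encoder output is projected back to the character/word vector dimension by a fully connected layer instead of being fed to a CRF (module names are illustrative, and standard multi-head attention stands in for the TENER encoder):

```python
import torch
import torch.nn as nn

class FineTunedTENER(nn.Module):
    """Illustrative: a Transformer-style encoder whose output is projected back
    to d_model by a fully connected layer instead of being fed to a CRF."""
    def __init__(self, d_model: int, n_heads: int, d_input: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_input, n_heads, batch_first=True)
        self.fc = nn.Linear(d_input, d_model)        # replaces the CRF layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, m, d_input); output: (batch, m, d_model), same shape family as C_t
        attn_out, _ = self.attn(x, x, x)
        return self.fc(attn_out)

model = FineTunedTENER(d_model=768, n_heads=8, d_input=768)
c_t = torch.randn(1, 9, 768)
print(model(c_t).shape)       # torch.Size([1, 9, 768])
```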
S25: the character vector set C and the word vector set W obtained in S22 are taken as input of the fine-tuned TENER model to obtain the outputs $C_0$ and $W_0$, i.e. the character vectors and word vectors produced by the first round of the "update" operation. These are taken as the initial node and edge feature vectors and replace the nodes and edges of the CWG model of S23. At the same time, the initial value of the global variable of the CWG model is defined as $g_0 = \mathrm{average}(C, W)$.
S26: the feature vectors of the character nodes of the CWG, the edge feature vectors and the global variable are respectively aggregated to obtain the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$ after the first aggregation.
The specific method is as follows:
The aggregation formula for the nodes is:
$$\bar c_i^{\,t} = \mathrm{MultiAtt}\big(c_i^{\,t},\ \{\, c_j^{\,t} \oplus w_{j,i}^{\,t} \,\}\big)$$
where i denotes the i-th character and t the t-th round of updating; $c_i^{\,t}$ is the feature vector of the character node before the t-th round of aggregation; $\bar c_i^{\,t}$ is the aggregated character node feature vector; $c_j^{\,t}$ is the feature vector of a predecessor node of $c_i^{\,t}$; $w_{j,i}^{\,t}$ is the corresponding incoming edge feature vector of $c_i^{\,t}$; $\oplus$ denotes the concatenation of two vectors; and MultiAtt(·) denotes aggregation with multi-head attention.
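Under the reconstruction above, the node-aggregation step could be sketched as follows, using PyTorch multi-head attention as MultiAtt (the exact attention variant and dimension handling in the patent's figures are not fully recoverable, so this is an assumption):

```python
import torch
import torch.nn as nn

d_model, n_heads = 768, 8
multi_att = nn.MultiheadAttention(2 * d_model, n_heads, batch_first=True)
to_query = nn.Linear(d_model, 2 * d_model)   # project the node query to the key/value width

def aggregate_node(c_i, preds):
    """c_i: (d_model,) node vector; preds: list of (c_j, w_ji) predecessor/edge pairs.
    Returns the aggregated node vector, cf. the MultiAtt formula above."""
    if not preds:
        return to_query(c_i)
    keys = torch.stack([torch.cat([c_j, w_ji]) for c_j, w_ji in preds]).unsqueeze(0)
    query = to_query(c_i).view(1, 1, -1)
    out, _ = multi_att(query, keys, keys)
    return out.view(-1)

c_i = torch.randn(d_model)
preds = [(torch.randn(d_model), torch.randn(d_model)) for _ in range(2)]
print(aggregate_node(c_i, preds).shape)   # torch.Size([1536])
```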
The aggregation formula for the edges is:
$$\bar w_{b,e}^{\,t} = \mathrm{MultiAtt}\big(w_{b,e}^{\,t},\ C_{b,e}^{\,t}\big)$$
where $w_{b,e}^{\,t}$ is the feature vector of the edge pointing from node b to node e before the t-th round of aggregation, $\bar w_{b,e}^{\,t}$ is the feature vector of the edge after aggregation, and $C_{b,e}^{\,t}$ is the set of feature vectors of all the characters matched by the word corresponding to edge $w_{b,e}$.
The global variable is computed as:
$$\bar g_c^{\,t} = \mathrm{MultiAtt}\big(g^{\,t},\ C^{\,t}\big)$$
$$\bar g_w^{\,t} = \mathrm{MultiAtt}\big(g^{\,t},\ W^{\,t}\big)$$
$$\bar g^{\,t} = \mathrm{average}\big(\bar g_c^{\,t},\ \bar g_w^{\,t}\big)$$
where $C^{\,t}$ is the set of feature vectors of all the characters in the input text sequence, $W^{\,t}$ is the set of word vectors of all the matched words, $g^{\,t}$ is the global vector before the t-th round of aggregation, $\bar g_c^{\,t}$ is the global vector after fusing the character vector information in the t-th round of aggregation, $\bar g_w^{\,t}$ is the global vector after fusing the word vector information in the t-th round of aggregation, and $\bar g^{\,t}$ is the final global vector obtained by the t-th round of aggregation.
S27: the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$ replace the character node feature vectors, edge feature vectors and global feature vector of the CWG model.
S28: the character vectors and word vectors are updated with the fine-tuned TENER model, and the updated global vector is computed with an LSTM-style network state update formula.
S281: the t-th round of aggregation of the character node feature vectors is performed, and its output is used to build the input of the fine-tuned TENER model, which yields the new character vectors. Specifically, from the aggregated character vectors $\bar c_i^{\,t}$ of step S26, the model input $\hat c_i^{\,t}$ is constructed by combining $c_i^{\,t}$, $\bar c_i^{\,t}$ and the aggregated global vector $\bar g^{\,t}$, where i denotes the i-th character and t the t-th round of updating; $c_i^{\,t}$ is the character node feature vector before the t-th round of aggregation, $\bar c_i^{\,t}$ the character node feature vector after the t-th round of aggregation, $\bar g^{\,t}$ the aggregated global vector of the t-th round, and $\hat c_i^{\,t}$ the constructed input of the fine-tuned TENER model.
The constructed input $\hat C^{\,t} = [\hat c_1^{\,t}, \ldots, \hat c_m^{\,t}]$ is fed into the fine-tuned TENER model for updating, which yields the output $C_{t+1}$:
$$C_{t+1} = \mathrm{FTTENER}_c\big(\hat C^{\,t}\big)$$
where $\mathrm{FTTENER}_c(\cdot)$ denotes the fine-tuned TENER model for the character vectors.
The fine-tuned TENER model contains an attention mechanism and position codes; the attention mechanism may be single-head or multi-head. The position code of $\hat c_i^{\,t}$ relative to $\hat c_j^{\,t}$ is:
$$p_{i,j} = i - j$$
$$R_{ij}[2k] = \sin\!\Big(p_{i,j}\,/\,10000^{2k/d_{input}}\Big)$$
$$R_{ij}[2k+1] = \cos\!\Big(p_{i,j}\,/\,10000^{2k/d_{input}}\Big)$$
where i and j denote the i-th and j-th characters, $p_{i,j}$ is the relative position of $\hat c_i^{\,t}$ with respect to $\hat c_j^{\,t}$, (2k) and (2k+1) are element indices within a vector, $d_{input}$ is the input dimension of the FTTENER model, and $R_{ij}$ is the final position code of $\hat c_i^{\,t}$ relative to $\hat c_j^{\,t}$.
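A sketch of this sinusoidal relative position code (the constants follow the standard Transformer formulation, which is assumed here):

```python
import math
import torch

def relative_position_code(i: int, j: int, d_input: int) -> torch.Tensor:
    """R_ij for character i relative to character j: sinusoids of the signed distance p_ij = i - j."""
    p_ij = i - j
    r = torch.zeros(d_input)
    for k in range(0, d_input, 2):
        angle = p_ij / (10000 ** (k / d_input))
        r[k] = math.sin(angle)
        if k + 1 < d_input:
            r[k + 1] = math.cos(angle)
    return r

print(relative_position_code(2, 5, d_input=8))
```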
In this embodiment, when the single-head attention mechanism is used in the fine-tuned TENER model, the computation is as follows.
The input is $\hat C^{\,t} \in \mathbb{R}^{d_{input} \times m}$, where $d_{model}$ is the dimension of the character vectors and m is the total number of characters in the text. The input is projected into different spaces by three learnable matrices $W_Q$, $W_K$ and $W_V$, and the attention output is given by:
$$Q = W_Q\, \hat C^{\,t},\qquad K = W_K\, \hat C^{\,t},\qquad V = W_V\, \hat C^{\,t}$$
$$A_{y,z} = Q_y\, K_z^{\top}$$
where Q is the query vector of the attention mechanism, K the key vector, V the value vector, $Q_y$ the query vector of the y-th text character, $K_z$ the key vector of the z-th text character, $K_z^{\top}$ the transpose of that vector, z the index of the character attended to by the y-th text character, and $A_{y,z}$ the attention value of the y-th text character with respect to the z-th text character. The single-head attention output, i.e. the $C_{t+1}'$ that enters the fully connected layer of S24, is:
$$\mathrm{Att}(A, V) = \mathrm{softmax}(A)\, V$$
From the position-coding formula above, with $d_{input} = 5\,d_{model}$ ($d_{model}$ being the character-vector encoding dimension), the position code of $\hat c_i^{\,t}$ relative to $\hat c_j^{\,t}$ is obtained by substituting $d_{input} = 5\,d_{model}$ into the formulas above.
When a multi-head attention mechanism is used in the fine-tuned TENER model to improve the self-attention capacity, n groups of mapping matrices $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)}$ are set, and the output equations are:
$$Q^{(h)} = W_Q^{(h)}\, \hat C^{\,t},\qquad K^{(h)} = W_K^{(h)}\, \hat C^{\,t},\qquad V^{(h)} = W_V^{(h)}\, \hat C^{\,t}$$
$$A_{y,z}^{(h)} = Q_y^{(h)}\, {K_z^{(h)}}^{\top}$$
$$\mathrm{head}^{(h)} = \mathrm{softmax}\big(A^{(h)}\big)\, V^{(h)}$$
where n is the number of heads and the superscript h is the head index, i.e. $Q^{(h)}$, $K^{(h)}$, $V^{(h)}$ are the query, key and value vectors of the h-th head; $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)}$ are the learnable matrices corresponding to these three vectors; $Q_y^{(h)}$ is the query vector of the y-th text character at the h-th head; ${K_z^{(h)}}^{\top}$ is the transpose of the key vector of the z-th text character at the h-th head; $A_{y,z}^{(h)}$ is the attention value of the y-th text character with respect to the z-th text character at the h-th head; and $\mathrm{head}^{(h)}$ is the output of the h-th head of the multi-head attention mechanism.
With $W_o$ a learnable parameter, the output $C_{t+1}'$ is:
$$C_{t+1}' = W_o\,[\,\mathrm{head}^{(1)};\ \ldots;\ \mathrm{head}^{(n)}\,]$$
The dimension is then $d_{input} = 5\,d_{model}$, and the position code of $\hat c_i^{\,t}$ relative to $\hat c_j^{\,t}$ is the same as in the single-head case.
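A compact sketch of the multi-head attention described above with un-scaled scores $A_{y,z} = Q_y K_z^{\top}$ (whether additional position terms enter the score is not recoverable from the original figures, so the plain dot product is assumed):

```python
import torch

def multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads):
    """x: (m, d_input); W_q/W_k/W_v/W_o: (d_input, d_input)."""
    m, d = x.shape
    d_head = d // n_heads
    Q = (x @ W_q).view(m, n_heads, d_head).transpose(0, 1)   # (n_heads, m, d_head)
    K = (x @ W_k).view(m, n_heads, d_head).transpose(0, 1)
    V = (x @ W_v).view(m, n_heads, d_head).transpose(0, 1)
    A = Q @ K.transpose(1, 2)                                # un-scaled attention scores
    heads = torch.softmax(A, dim=-1) @ V                     # (n_heads, m, d_head)
    return heads.transpose(0, 1).reshape(m, d) @ W_o         # concatenate heads, then W_o

m, d, n_heads = 9, 64, 4
x = torch.randn(m, d)
W_q, W_k, W_v, W_o = (torch.randn(d, d) for _ in range(4))
print(multi_head_attention(x, W_q, W_k, W_v, W_o, n_heads).shape)   # torch.Size([9, 64])
```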
S282: the t-th round of aggregation of the edge feature vectors is performed, and its output is taken as the input of the fine-tuned TENER model to obtain the new word vectors. Specifically, from the aggregated edge vectors $\bar w_i^{\,t}$ of S26, the model input $\hat w_i^{\,t}$ is constructed by combining $w_i^{\,t}$, $\bar w_i^{\,t}$ and the aggregated global vector $\bar g^{\,t}$, where i denotes the i-th edge and t the t-th round of updating; $w_i^{\,t}$ is the edge feature vector before the t-th round of aggregation, $\bar w_i^{\,t}$ the aggregated edge feature vector of the t-th round, $\bar g^{\,t}$ the aggregated global vector of the t-th round, and $\hat w_i^{\,t}$ the constructed input of the fine-tuned TENER model.
The constructed input $\hat W^{\,t}$ is fed into the fine-tuned TENER model for updating, which yields the output $W_{t+1}$:
$$W_{t+1} = \mathrm{FTTENER}_w\big(\hat W^{\,t}\big)$$
where $\mathrm{FTTENER}_w(\cdot)$ denotes the fine-tuned TENER model for the word vectors.
The FTTENER model in S282 differs slightly from that in S281 in the position code of its input. Specifically, four relative distances are defined between words i and j:
$$d_{ij}^{bb} = b_i - b_j,\qquad d_{ij}^{be} = b_i - e_j,\qquad d_{ij}^{eb} = e_i - b_j,\qquad d_{ij}^{ee} = e_i - e_j$$
where $d_{ij}^{bb}$ denotes the distance between the start character of the i-th word and the start character of the j-th word, $d_{ij}^{be}$ the distance between the start character of the i-th word and the end character of the j-th word, and so on. Each distance is encoded as:
$$p_{pos}[2k] = \sin\!\Big(pos\,/\,10000^{2k/d_{input}}\Big),\qquad p_{pos}[2k+1] = \cos\!\Big(pos\,/\,10000^{2k/d_{input}}\Big)$$
where $p_{pos}$ is the encoding vector of one of the four relative positions of the words, pos is one of $d_{ij}^{bb}, d_{ij}^{be}, d_{ij}^{eb}, d_{ij}^{ee}$, (2k) is the index of an element in the vector, and $d_{input}$ is the input dimension of the FTTENER model.
In this embodiment, as shown in Fig. 3, the relative distances are computed between the matched words of the sequence, the remaining relative position codes are obtained by analogy, and $d_{model}$ is the coding dimension of the word vector. The final position code is:
$$R_{ij} = U_r\,\big[\,p_{d_{ij}^{bb}};\ p_{d_{ij}^{be}};\ p_{d_{ij}^{eb}};\ p_{d_{ij}^{ee}}\,\big]$$
where $U_r$ is a trainable parameter representing a linear layer, so that the final $R_{ij}$ has the same dimension as the input vectors; in this embodiment $R_{ij}$ is the position code of the final i-th word vector relative to the j-th word vector.
S283: the global variable is updated from $\bar g^{\,t}$ with the LSTM network state update formula to obtain $g_{t+1}$. LSTM networks (long short-term memory networks) are commonly used for NLP problems, and the update of $g_{t+1}$ follows the update of the state value in an LSTM cell:
$$\hat i_{t+1},\ \hat f_{t+1} = U_{*}\,\bar g^{\,t} + V_{*}\, g_{t} + b_{*},\qquad * \in \{i, f\}$$
$$u_{t+1} = \tanh\big(U_{u}\,\bar g^{\,t} + V_{u}\, g_{t} + b_{u}\big)$$
$$i_{t+1},\ f_{t+1} = \sigma\big(\hat i_{t+1}\big),\ \sigma\big(\hat f_{t+1}\big)$$
$$g_{t+1} = f_{t+1} \odot g_{t} + i_{t+1} \odot u_{t+1}$$
where U, V and b are trainable parameters, the subscript * denotes i and f respectively (i.e. the first formula actually contains two formulas), and $u_{t+1}$, $i_{t+1}$, $f_{t+1}$, $\hat i_{t+1}$, $\hat f_{t+1}$ are intermediate variables introduced for clarity of formulation.
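A sketch of this gated global-vector update (the gate layout follows a standard LSTM cell, which is an assumption since the patent's gate equations are rendered only as images):

```python
import torch
import torch.nn as nn

d = 768
U = nn.Linear(d, 3 * d)               # acts on the aggregated global vector
V = nn.Linear(d, 3 * d, bias=False)   # acts on the previous global vector g_t

def update_global(g_bar: torch.Tensor, g_prev: torch.Tensor) -> torch.Tensor:
    """g_{t+1} = f ⊙ g_t + i ⊙ u, with gates i, f and candidate u from U, V (and bias b)."""
    pre = U(g_bar) + V(g_prev)
    i_hat, f_hat, u_pre = pre.chunk(3, dim=-1)
    i_gate, f_gate = torch.sigmoid(i_hat), torch.sigmoid(f_hat)
    u = torch.tanh(u_pre)
    return f_gate * g_prev + i_gate * u

g_bar, g_prev = torch.randn(d), torch.randn(d)
print(update_global(g_bar, g_prev).shape)   # torch.Size([768])
```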
S29: the nodes, edges and global variable of the CWG model are replaced with the updated character vectors, word vectors and global vector respectively, and the nodes, edges and global variable of the CWG model are aggregated again.
S210: steps S28 to S29 are repeated T times to obtain the final character feature vector set.
S211: the character vector set corresponding to the final nodes is fed into the conditional random field model CRF, and the optimal label sequence is computed. The specific calculation formulas are:
$$\psi_i\big(l_{i-1},\ l_i\big) = r_i^{\top}\, W_{l_{i-1},\,l_i} + b_{l_{i-1},\,l_i}$$
$$P\big(\hat y \mid s\big) = \frac{\exp\Big(\sum_i \psi_i\big(\hat y_{i-1},\ \hat y_i\big)\Big)}{\sum_{y \in Y(s)} \exp\Big(\sum_i \psi_i\big(y_{i-1},\ y_i\big)\Big)}$$
where i denotes the i-th node; $r_i^{\top}$ is the transpose of the final feature vector of the i-th node; $l_i$ is the label of the i-th node; $W_{l_{i-1},l_i}$ and $b_{l_{i-1},l_i}$ are trainable parameters for the label pair $l_{i-1}$, $l_i$; $\psi_i$ is an intermediate variable introduced for clarity of formulation, computed from the three variables $l_{i-1}$, $l_i$ and $r_i$; $\hat y$ is the optimal label sequence; $\hat y_i$ is the label of the i-th node in the optimal label sequence; Y(s) is the set of all candidate label sequences in the current situation s; and $P(\hat y \mid s)$ is the probability that the optimal label sequence in the current situation s is $\hat y$.
For the training process, the loss function is the negative log-likelihood $-\log P(\hat y \mid s)$ of the labeled sequence, where N is the total number of label sequences contained in Y(s), i.e. the number of sequences summed over in the denominator.
In this embodiment, if the data set containing the sequence of Fig. 2 is taken as the training set, the label dictionary is defined as:
tag2label = {B: 0, I: 1, O: 2}
For this sequence, $\hat y$ is then a combination of the three labels over the 9 characters, so there are $3^9$ possible values, i.e. Y(s) contains $3^9$ elements.
For the test and decoding process, the optimal label sequence $y^{*}$ is found by:
$$y^{*} = \arg\max_{y \in Y(s)}\ p(y \mid s)$$
where $p(y \mid s)$ is the probability of an arbitrary label sequence y in the current situation s, and the optimal label sequence $y^{*}$ gives the labeling result for each input character.
In this embodiment, if the sequence of Fig. 2 is used as the test set with the label dictionary defined above, the completely correct optimal result is $y^{*} = 2,2,0,1,1,1,1,1,2$.
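The mapping between this numeric output and the BIO labels can be illustrated as follows (the inverse-dictionary helper and the Chinese rendering of the sentence are illustrative):

```python
tag2label = {"B": 0, "I": 1, "O": 2}
label2tag = {v: k for k, v in tag2label.items()}

y_star = [2, 2, 0, 1, 1, 1, 1, 1, 2]          # optimal sequence for the Fig. 2 sentence
text = "保护装置运行异常。"
tags = [label2tag[y] for y in y_star]
print(list(zip(text, tags)))
# characters 3-8 form the extracted "defect phenomenon" span (B I I I I I)
```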
S212: according to the sequence $y^{*}$ obtained in S211, the model parameters are optimized with an Adam optimizer, and after a preset number of training cycles the best-performing CWG-TENER model is obtained for extracting power-equipment defect-text information.
In this embodiment, model quality is judged by performance on the test set, specifically by the three common named entity recognition metrics: precision, recall and F1 value.
And S3, inputting the defect text of the electric power equipment with the information to be extracted into the defect text information extraction model of the electric power equipment to obtain the extracted information.
The information to be extracted may include the defect phenomena, defect causes, solution measures and other information recorded in the defect record texts of the power secondary equipment.
The extracted information can be used to build a subsequent knowledge graph, so that the solution can be looked up when power-system secondary equipment fails, providing a decision-support function. Compared with existing models, this model obtains more complete and accurate entity information, so the resulting decision-support system is more practical.
In this embodiment, the target extraction information is defect phenomenon information, and the overall model architecture is shown in fig. 4.
First, the data text is fed into the pre-trained language model BERT/BERT-wwm/ERNIE, which converts the characters of the data text into character vectors and the dictionary words into word vectors of the same dimension. The TENER model performs a first feature extraction on the character vectors and word vectors to obtain the initial values with which they enter the "aggregate → update → aggregate → …" cycle, and the initial value of the global vector is computed at the same time. The character vectors, word vectors and global vector are aggregated to obtain the aggregation output. The character vectors and word vectors then have the position codes added and are fed into a Transformer layer with N heads, i.e. the "update" layer, to obtain the update output, while the update output of the global vector is computed as well. Finally, the character vectors, word vectors and global vector are passed through the linear layer to obtain an output with the same dimension as the initial one, which is fed into the aggregation layer again; this cycle is repeated T times. After the last aggregation operation, the final character feature vectors are fed into the CRF layer to obtain the final label output.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (9)

1. The method for extracting the entity information of the power defect text based on the improved Transformer encoder is characterized by comprising the following steps of:
s1, introducing a data text of defect record of secondary equipment of an electric power system, and labeling the data text;
s2, introducing a pre-training model, a dictionary, a fine-tuning TENER model and a conditional random field model, building a CWG-TENER model, and performing optimization training on the CWG-TENER model by using a data text with a label to obtain a power equipment defect text information extraction model;
and S3, inputting the defect text of the electric power equipment with the information to be extracted into the defect text information extraction model of the electric power equipment to obtain the extracted information.
2. The method for extracting entity information of power defect texts based on the improved Transformer encoder as claimed in claim 1, wherein the labeling of the data texts comprises:
the first character of the phrase representing the defect phenomenon in the data text is marked as 'B', the rest characters in the phrase are marked as 'I', and the characters not representing the defect phenomenon in the text are marked as 'O'.
3. The method for extracting entity information of power defect text based on improved Transformer encoder as claimed in claim 1, wherein the step S2 comprises:
s21, introducing a pre-training model and a dictionary, and extracting character vectors of a data text and word vectors of dictionary words, wherein the dictionary is obtained by word segmentation based on a plurality of original corpora;
s22, the extracted character vectors form a character vector set C, the data text is matched with words in a dictionary, and word vectors corresponding to the matched words form a word vector set W;
s23, building a word graph CWG model;
S24, replacing the CRF layer of the TENER model with a fully connected layer so that the output dimension equals the character/word vector dimension, obtaining the fine-tuned TENER model;
S25, taking the character vector set C and the word vector set W as input of the fine-tuned TENER model to obtain the initial node feature vectors $C_0$ and the initial edge feature vectors $W_0$, replacing the nodes of the CWG model and the edges of the CWG model with $C_0$ and $W_0$ respectively, and defining the initial value of the global variable of the CWG model as $g_0$;
S26, aggregating the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model to obtain the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$ after the first aggregation;
S27, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the character vectors $\bar C^{\,1}$, word vectors $\bar W^{\,1}$ and global vector $\bar g^{\,1}$;
s28, updating the character vector and the word vector by finely adjusting the TENER model, and calculating the updating output of the global vector by an LSTM network state updating formula;
S29, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the updated character vectors, word vectors and global vector respectively, and aggregating the nodes, edges and global variable of the CWG model again;
S210, repeating steps S28 to S29 T times to obtain the final character feature vector set;
s211, inputting the final character feature vector set into a conditional random field model CRF, and calculating to obtain an output optimal label sequence;
and S212, optimizing the model parameters by using an Adam optimizer according to the optimal label sequence, and circularly training for a preset number of times to obtain the electric power equipment defect text information extraction model.
4. The method for extracting entity information of power defect text based on the improved Transformer encoder as claimed in claim 3, wherein the pre-training model is any one of a BERT model, a BERT-wwm model and an ERNIE model.
5. The method as claimed in claim 3, wherein the CWG model is a directed graph formed from the data text information, wherein the character vectors form the nodes of the graph, and each word vector $w_{b_j,e_j}$ forms an edge from the node corresponding to its head character $b_j$ to the node corresponding to its tail character $e_j$.
6. The method as claimed in claim 3, wherein in step S26 the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model are respectively aggregated, wherein
the aggregation formula for the nodes of the CWG model is:
$$\bar c_i^{\,t} = \mathrm{MultiAtt}\big(c_i^{\,t},\ \{\, c_j^{\,t} \oplus w_{j,i}^{\,t} \,\}\big)$$
where i denotes the i-th character and t the t-th round of updating; $c_i^{\,t}$ is the feature vector of the character node before the t-th round of aggregation; $\bar c_i^{\,t}$ is the aggregated character node feature vector; $c_j^{\,t}$ is the feature vector of a predecessor node of $c_i^{\,t}$; $w_{j,i}^{\,t}$ is the corresponding incoming edge feature vector of $c_i^{\,t}$; $\oplus$ denotes the concatenation of two vectors; and MultiAtt(·) denotes aggregation with multi-head attention;
the aggregation formula for the edges of the CWG model is:
$$\bar w_{b,e}^{\,t} = \mathrm{MultiAtt}\big(w_{b,e}^{\,t},\ C_{b,e}^{\,t}\big)$$
where $w_{b,e}^{\,t}$ is the feature vector of the edge pointing from node b to node e before the t-th round of aggregation, $\bar w_{b,e}^{\,t}$ is the feature vector of the edge after aggregation, and $C_{b,e}^{\,t}$ is the set of feature vectors of all the characters matched by the word corresponding to edge $w_{b,e}$;
the calculation formula for the global variable of the CWG model is:
$$\bar g_c^{\,t} = \mathrm{MultiAtt}\big(g^{\,t},\ C^{\,t}\big),\qquad \bar g_w^{\,t} = \mathrm{MultiAtt}\big(g^{\,t},\ W^{\,t}\big),\qquad \bar g^{\,t} = \mathrm{average}\big(\bar g_c^{\,t},\ \bar g_w^{\,t}\big)$$
where $C^{\,t}$ is the set of feature vectors of all the characters in the input text sequence, $W^{\,t}$ is the set of word vectors of all the matched words, $g^{\,t}$ is the global vector before the t-th round of aggregation, $\bar g_c^{\,t}$ is the global vector after fusing the character vector information in the t-th round of aggregation, $\bar g_w^{\,t}$ is the global vector after fusing the word vector information in the t-th round of aggregation, and $\bar g^{\,t}$ is the final global vector obtained by the t-th round of aggregation.
7. The method for extracting entity information of power defect text based on an improved Transformer encoder as claimed in claim 3, wherein the step S28 comprises:
S281, performing the t-th round of aggregation on the feature vectors of the character nodes, adding the position codes to the aggregation output and taking it as the input of the N-head fine-tuned TENER model to obtain the updated character vectors;
S282, performing the t-th round of aggregation on the feature vectors of the edges, adding the position codes to the aggregation output and taking it as the input of the N-head fine-tuned TENER model to obtain the updated word vectors;
S283, updating the global variable from $\bar g^{\,t}$ with the LSTM network state update formula to obtain $g_{t+1}$, the calculation formula being:
$$\hat i_{t+1},\ \hat f_{t+1} = U_{*}\,\bar g^{\,t} + V_{*}\, g_{t} + b_{*},\qquad * \in \{i, f\}$$
$$u_{t+1} = \tanh\big(U_{u}\,\bar g^{\,t} + V_{u}\, g_{t} + b_{u}\big)$$
$$i_{t+1},\ f_{t+1} = \sigma\big(\hat i_{t+1}\big),\ \sigma\big(\hat f_{t+1}\big)$$
$$g_{t+1} = f_{t+1} \odot g_{t} + i_{t+1} \odot u_{t+1}$$
where U, V and b are trainable parameters, the subscript * denotes i and f respectively (i.e. the first formula actually contains two formulas), and $u_{t+1}$, $i_{t+1}$, $f_{t+1}$, $\hat i_{t+1}$, $\hat f_{t+1}$ are intermediate variables introduced for clarity of formulation.
8. The method as claimed in claim 3, wherein in step S211 the final character feature vector set is input into the conditional random field model CRF and the output optimal label sequence is computed, the calculation formulas comprising:
$$\psi_i\big(l_{i-1},\ l_i\big) = r_i^{\top}\, W_{l_{i-1},\,l_i} + b_{l_{i-1},\,l_i}$$
$$P\big(\hat y \mid s\big) = \frac{\exp\Big(\sum_i \psi_i\big(\hat y_{i-1},\ \hat y_i\big)\Big)}{\sum_{y \in Y(s)} \exp\Big(\sum_i \psi_i\big(y_{i-1},\ y_i\big)\Big)}$$
where i denotes the i-th node; $r_i^{\top}$ is the transpose of the final feature vector of the i-th node; $l_i$ is the label of the i-th node; $W_{l_{i-1},l_i}$ and $b_{l_{i-1},l_i}$ are trainable parameters for the label pair $l_{i-1}$, $l_i$; $\psi_i$ is an intermediate variable introduced for clarity of formulation, computed from the three variables $l_{i-1}$, $l_i$ and $r_i$; $\hat y$ is the optimal label sequence; $\hat y_i$ is the label of the i-th node in the optimal label sequence; Y(s) is the set of all candidate label sequences in the current situation s; and $P(\hat y \mid s)$ is the probability that the optimal label sequence in the current situation s is $\hat y$.
9. The method for extracting entity information of power defect text based on an improved Transformer encoder as claimed in claim 1, wherein the information to be extracted in step S3 comprises: the defect phenomenon information, defect cause information and solution information recorded in the defect record text of the power secondary equipment.
CN202211044230.9A 2022-08-30 2022-08-30 Entity information extraction method for electric power defect text based on improved Transformer encoder Pending CN115470786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211044230.9A CN115470786A (en) 2022-08-30 2022-08-30 Entity information extraction method for electric power defect text based on improved Transformer encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211044230.9A CN115470786A (en) 2022-08-30 2022-08-30 Entity information extraction method for electric power defect text based on improved Transformer encoder

Publications (1)

Publication Number Publication Date
CN115470786A true CN115470786A (en) 2022-12-13

Family

ID=84369353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211044230.9A Pending CN115470786A (en) 2022-08-30 2022-08-30 Entity information extraction method for electric power defect text based on improved Transformer encoder

Country Status (1)

Country Link
CN (1) CN115470786A (en)

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111708882B (en) Transformer-based Chinese text information missing completion method
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN112632972A (en) Method for rapidly extracting fault information in power grid equipment fault report
CN115145551A (en) Intelligent auxiliary system for machine learning application low-code development
CN115617990B (en) Power equipment defect short text classification method and system based on deep learning algorithm
CN113779988A (en) Method for extracting process knowledge events in communication field
CN113987183A (en) Power grid fault handling plan auxiliary decision-making method based on data driving
CN113312912A (en) Machine reading understanding method for traffic infrastructure detection text
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN114238649B (en) Language model pre-training method with common sense concept enhancement
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
CN115409122A (en) Method, system, equipment and medium for analyzing concurrent faults of power transformation equipment
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN113177113A (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN116680407A (en) Knowledge graph construction method and device
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
CN114429144B (en) Diversified machine translation method using auxiliary memory
CN113342982B (en) Enterprise industry classification method integrating Roberta and external knowledge base
CN115906846A (en) Document-level named entity identification method based on double-graph hierarchical feature fusion
CN114154505B (en) Named entity identification method oriented to power planning review field
CN113590745B (en) Interpretable text inference method
CN115470786A (en) Entity information extraction method for electric power defect text based on improved Transformer encoder
CN115034236A (en) Chinese-English machine translation method based on knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Lu Youfei

Inventor after: Cai Yanchun

Inventor after: Liu Xuan

Inventor after: Liu Luhao

Inventor after: Liang Xueqing

Inventor after: Wu Renbo

Inventor after: Zhang Yang

Inventor after: Zhao Hongwei

Inventor after: Chen Minghui

Inventor after: Zhang Shaofan

Inventor after: Zou Shirong

Inventor before: Long Yun

Inventor before: Zou Shirong

Inventor before: Cai Yanchun

Inventor before: Liu Xuan

Inventor before: Lu Youfei

Inventor before: Liu Luhao

Inventor before: Liang Xueqing

Inventor before: Wu Renbo

Inventor before: Zhang Yang

Inventor before: Zhao Hongwei

Inventor before: Chen Minghui

Inventor before: Zhang Shaofan

CB03 Change of inventor or designer information