Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides an entity information extraction method for power defect texts based on an improved Transformer encoder. On the basis of an original Transformer-based NER model, a pre-training language model is introduced to convert the text data into character and word vectors, and a dictionary obtained by word segmentation of a large amount of corpus is introduced so that word information is fused on top of character information. The network is updated in a graph manner, so that character, word and global information are better fused and the entity information of the power defect text can be extracted more accurately.
The invention can be achieved by adopting the following technical scheme:
the entity information extraction method of the power defect text based on the improved Transformer encoder comprises the following steps:
S1, introducing a defect record data text of secondary equipment of an electric power system, and labeling the data text;
S2, introducing a pre-training model, a dictionary, a fine-tuned TENER model and a conditional random field model, building a CWG-TENER model, and performing optimization training on the CWG-TENER model by using the labeled data text to obtain a power equipment defect text information extraction model;
and S3, inputting the power equipment defect text from which information is to be extracted into the power equipment defect text information extraction model to obtain the extracted information.
Specifically, the step S2 includes:
S21, introducing a pre-training model and a dictionary, and extracting character vectors of the data text and word vectors of the dictionary words, wherein the dictionary is obtained by word segmentation of a large amount of original corpus;
S22, forming a character vector set C from the extracted character vectors, matching the data text against the words in the dictionary, and forming a word vector set W from the word vectors corresponding to the matched words;
S23, building a character-word graph (CWG) model;
S24, replacing the CRF layer of the TENER model with a fully connected layer so that the output dimension is the same as the character and word vector dimension, thereby obtaining a fine-tuned TENER model;
S25, taking the character vector set C and the word vector set W as the input of the fine-tuned TENER model to obtain the initial value C_0 of the node feature vectors and the initial value W_0 of the edge feature vectors, replacing the nodes and edges of the CWG model with the initial value C_0 of the node feature vectors and the initial value W_0 of the edge feature vectors respectively, and defining the initial value of the CWG model global variable as g_0;
S26, aggregating the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model respectively, to obtain the character vectors, word vectors and global vector after the first aggregation;
S27, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the aggregated character vectors, word vectors and global vector;
S28, updating the character vectors and word vectors by the fine-tuned TENER model, and calculating the updated output of the global vector by the LSTM network state-update formula;
S29, replacing the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model with the updated character vectors, word vectors and global vector respectively, and aggregating the nodes of the CWG model, the edges of the CWG model and the global variable of the CWG model;
S210, cycling the steps S28 to S29 for T times to obtain a final character feature vector set;
S211, inputting the final character feature vector set into the conditional random field model CRF, and calculating the output optimal label sequence;
and S212, optimizing the model parameters by using an Adam optimizer according to the optimal label sequence, and training cyclically for a preset number of times to obtain the power equipment defect text information extraction model.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention provides an entity information extraction method for power defect texts based on an improved Transformer encoder. A pre-training language model, a dictionary, a fine-tuned TENER model and a conditional random field model are introduced to build a CWG-TENER model, and the model is optimized, trained and tested with labeled power system secondary equipment defect texts to obtain a power equipment defect text information extraction model. The model can extract the entity information required from power system secondary equipment defect texts more effectively, which facilitates the subsequent construction of a knowledge graph and provides an auxiliary decision-making function when the secondary equipment of the power system fails.
Detailed Description
The technical solutions of the present invention will be described in further detail with reference to the drawings and examples, and it is obvious that the described examples are some examples of the present invention, but not all examples, and the embodiments of the present invention are not limited thereto. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Example 1:
In this embodiment, aiming at the problem of extracting the "defect phenomenon" information from texts describing functional defects of secondary devices in a power system, a "character-word graph" model is constructed for the defect text, an improved Transformer-based encoder suitable for NER is used to aggregate and update the graph neural network, and finally a conditional random field model is used to output a labeling sequence for the text, from which the "defect phenomenon" information is extracted.
As shown in fig. 1, the embodiment provides a method for extracting entity information of a power defect text based on an improved Transformer encoder, which specifically includes the following steps:
S1, introducing the defect record data text of the secondary equipment of the power system, and labeling the data text; the labeling result of the data text is shown in Fig. 2.
Taking the extraction of the "defect phenomenon" phrase as an example, the first character of the phrase representing the defect phenomenon in the text is labeled "B", the remaining characters of the phrase are labeled "I", and the characters irrelevant to the defect phenomenon in the text are labeled "O".
Taking the text "The protection device is operating abnormally." in Fig. 2 as an example, the "defect phenomenon" phrase is "device operating abnormally"; the first character of "device" is labeled "B", the remaining characters of the phrase are labeled "I", and "protection" and the final period, which are unrelated to the "defect phenomenon", are labeled "O".
S2, introducing a pre-training model, a dictionary, a fine-tuned TENER model and a conditional random field model, building the CWG-TENER model, and optimizing and training the model with the data text labeled in S1 to obtain the power equipment defect text information extraction model.
S21: introducing a pre-training model and a dictionary, extracting character vectors of data texts and word vectors of dictionary words, wherein the dictionary is obtained based on a large number of original corpus participles, and the pre-training model is any one of the following models: BERT model, BERT-wwm model, ERNIE model.
BERT (Bidirectional Encoder Representations from Transformers) is a Transformer-based bidirectional encoder published by Google in 2018. It is the "first deep bidirectional, unsupervised language representation, pre-trained using only a plain text corpus". The pre-trained BERT model can be adapted to various natural language processing tasks with only the fine-tuning of an additional output layer.
BERT-wwm (Whole Word Masking) is an upgraded version of BERT released by Google in 2019 that mainly changes the training sample generation strategy of the original pre-training stage: the original WordPiece masking is replaced by Whole Word Masking. For Chinese, this means that if one character is masked, the other characters belonging to the same word are also masked.
ERNIE (Enhanced Representation through Knowledge Integration) is a BERT-based optimized model published in 2019. It mainly improves the masking mechanism, which consists of three types of masking: basic-level masking (word piece), phrase-level masking (WWM style) and entity-level masking.
BERT/BERT-wwm is trained on Wikipedia data and performs better on formal text; ERNIE additionally uses web data such as Baidu forum posts and Q&A data, which is advantageous for informal text (e.g., microblogs). If traditional Chinese data is to be processed, BERT or BERT-wwm should be used, since there is little traditional Chinese in the vocabulary of ERNIE.
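A minimal sketch of how character vectors could be extracted with a pre-trained Chinese BERT via the Hugging Face transformers library is given below; the checkpoint name "bert-base-chinese" and the use of transformers are illustrative assumptions, not requirements of the invention:

```python
import torch
from transformers import BertTokenizer, BertModel

# Illustrative only: a BERT-wwm or ERNIE checkpoint could be substituted here.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

text = "保护装置运行异常。"  # example sentence from Fig. 2 (back-translated, assumed)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    outputs = model(**inputs)

# Approximately one d_model-dimensional vector per character: shape (m, d_model)
char_vectors = outputs.last_hidden_state.squeeze(0)
print(char_vectors.shape)
```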
S22: the extracted character vectors form a character vector set C, the data text is matched with words in a dictionary, and word vectors corresponding to the matched words form a word vector set W.
The character vector set C is:

C = [c_1, c_2, ..., c_m]

where c_1, c_2, ..., c_m are the character vectors extracted from the data text by the pre-training model, and m is the total number of characters in the text.

Taking the sequence "The protection device is operating abnormally." shown in Fig. 2 as an example, there are 9 characters in total, so m = 9; letting the dimension of the character vectors be d_model, the C obtained from this sequence is a d_model × m matrix.
The word vector set W is:

W = [w_{b_1,e_1}, w_{b_2,e_2}, ..., w_{b_n,e_n}]

where w_{b_i,e_i} is the word vector corresponding to the i-th matched word and has the same dimension as the character vectors, b_i and e_i are the head and tail characters of the word corresponding to the i-th word vector, and n is the total number of dictionary words matched by the data text.

The specific definition of a matched word is: if a word in the dictionary appears as a contiguous character sequence in the data text, it is a matched word.

Taking the sequence "The protection device is operating abnormally." shown in Fig. 2 as an example, there are 5 matched words, namely "protect", "device", "run", "abnormal" and "protection", so n = 5, and the W obtained from this sequence is a d_model × n matrix.
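A minimal sketch of the matching step, assuming the dictionary is a plain set of words and that a word matches when it appears as a contiguous substring of the text (the mini-dictionary below is an assumed back-translation of the Fig. 2 example):

```python
def match_words(text, dictionary):
    """Return (word, b, e) triples: dictionary words appearing contiguously in text,
    with b and e the (0-based) positions of their head and tail characters."""
    matches = []
    m = len(text)
    for b in range(m):
        for e in range(b, m):
            word = text[b:e + 1]
            if word in dictionary:
                matches.append((word, b, e))
    return matches

# Hypothetical mini-dictionary for the Fig. 2 sentence
dictionary = {"保护", "装置", "运行", "异常", "保护装置"}
print(match_words("保护装置运行异常。", dictionary))
# -> 5 matched words, i.e. n = 5
```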
S23: constructing a CWG (Character-Word Graph) model, wherein the CWG model has a specific structure of a directed Graph formed by data text information, and a Character vector c
i Nodes, word vectors, forming a graph
Form the slave character b
j Corresponding node pointing character e
j The edge of the corresponding node.
The CWG model constructed from the sequence of figure 2 is shown in figure 3.
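A minimal sketch of this graph structure as plain Python data, assuming the node and edge indices follow the (word, b, e) matches produced above; the representation is illustrative, not the invention's internal data format:

```python
# Nodes: one per character vector c_i; edges: one per matched word w_{b,e},
# directed from the head-character node b to the tail-character node e.
chars = list("保护装置运行异常。")                        # 9 nodes (assumed example text)
edges = [("保护", 0, 1), ("装置", 2, 3), ("运行", 4, 5),
         ("异常", 6, 7), ("保护装置", 0, 3)]              # 5 edges

cwg = {
    "nodes": list(range(len(chars))),                    # node i holds character vector c_i
    "edges": [(b, e) for _, b, e in edges],              # edge (b, e) holds word vector w_{b,e}
    "global": None,                                       # global variable g, initialised later
}
print(cwg)
```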
Then, a cyclic operation of "update → aggregation → update → ... → aggregation" is performed on the CWG model to extract text features, specifically: compute the initial values of the character vectors C (nodes), the word vectors W (edges) and the global vector g through an update → aggregate the character vectors C (nodes), the word vectors W (edges) and the global vector g → update the character vectors C (nodes), the word vectors W (edges) and the global vector g → ... → aggregate the character vectors C (nodes), the word vectors W (edges) and the global vector g. This process is described in detail below.
S24: a fine-tuning teer model is introduced to perform the "update" operation of the CWG model. The TENER model is a Transformer model improved based on a named entity recognition task, and the specific mode of fine tuning is as follows: and replacing the CRF layer of the model with a full connection layer to ensure that the output dimension is the same as the word and phrase vector dimension, thereby obtaining the fine tuning TENER model.
In this embodiment, when the nodes are updated in the subsequent steps, the output obtained by the attention mechanism is C_{t+1}'. The formula of the fully connected layer is:

C_{t+1} = U_Linear · C_{t+1}' + B_Linear

where U_Linear and B_Linear are trainable parameters of the fully connected layer. A round of updating is thereby completed, and the dimension of the resulting character vectors C_{t+1} is identical to that of C_t.
S25: taking the character vector set C and the word vector set W obtained in the S22 as the input of the fine tuning model TENER to obtain an output C 0 And W 0 That is, the character vector and the word vector obtained after the first round of "update" operation are used as the initial values of the feature vectors of the nodes and the edges to replace the CWG module of S23Nodes and edges of the pattern. Simultaneously defining the initial value of the global variable of the CWG model as g 0 =average(C,W);
S26: respectively aggregating the feature vector, the edge feature vector and the global variable of the character nodes of the CWG to obtain a character vector after the first aggregation
Word vector
And a global vector
The specific method comprises the following steps:
the aggregation formula of the nodes is as follows:
wherein i represents the ith character and t represents the tth round of updating.
Aggregating the feature vectors of the preceding character nodes for the t-th round,
for the aggregated character node feature vector,
is composed of
The feature vector of the predecessor node of (a),
is composed of
The incoming edge feature vector of (a) is,
representing the concatenation of two vectors, and the MultiAtt () representing the aggregation in a multi-headed attentive manner.
The aggregation formula of the edges is:
wherein,
the feature vector of the edge pointing from node b to node e before the t-th round of aggregation,
is the feature vector of the edge after the aggregation,
is equal to the edge w
b,e All characters corresponding to the word match correspond to a set of feature vectors.
The calculation formula of the global variable is as follows:
wherein,
for a set of feature vectors corresponding to all characters in the input text sequence,
forming a word vector set for the word vectors corresponding to all the matched words, g
t For the global vector before the t-th aggregation,
is a global vector after character vector information is merged in the t-th aggregation process,
is a global vector after word vector information is merged in the t-th round aggregation process,
and aggregating the obtained final global vector for the t round.
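The original aggregation formulas are given as images in the specification; the following PyTorch sketch only illustrates the node-aggregation idea under the assumption that a standard multi-head attention (standing in for MultiAtt) attends from the node vector to the concatenation of its predecessor-node and incoming-edge vectors. Shapes and module choices are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_model, n_heads = 768, 8
multi_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def aggregate_node(c_i, pred_nodes, in_edges):
    """Schematic node aggregation: fuse a character node with its predecessor
    nodes and incoming edge vectors via multi-head attention."""
    # Concatenate predecessor node vectors and incoming edge vectors along the sequence axis.
    context = torch.cat([pred_nodes, in_edges], dim=0).unsqueeze(0)  # (1, k, d_model)
    query = c_i.view(1, 1, d_model)                                  # (1, 1, d_model)
    out, _ = multi_att(query, context, context)                      # attend over the context
    return out.view(d_model)                                         # aggregated node vector

# Dummy example: a node with 2 predecessor nodes and 1 incoming edge
c_bar = aggregate_node(torch.randn(d_model),
                       torch.randn(2, d_model),
                       torch.randn(1, d_model))
print(c_bar.shape)
```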
S27: by character vector
Word vector
And a global vector
And replacing the feature vector, the edge feature vector and the global feature vector of the character node of the CWG model.
S28: and updating the character vectors and the word vectors by finely adjusting the TENER model, and calculating the updating output of the global vectors by an LSTM network state updating formula.
S281: performing the t-th round of aggregation on the feature vectors of the character nodes, and taking the aggregated output as the input of the fine-tuned TENER model to obtain new character vectors. Specifically, the input of the fine-tuned TENER model is constructed from the quantities described in step S26, where i denotes the i-th character and t denotes the t-th round of updating: the feature vectors of the character nodes before the t-th round of aggregation, the character node feature vectors after the t-th round of aggregation and the aggregated global vector of the t-th round together form the constructed input of the fine-tuned TENER model.

The constructed input is fed into the fine-tuned TENER model FTTENER_c() for updating to obtain the output C_{t+1}, where FTTENER_c() denotes the fine-tuned TENER model for the character vectors.
The fine-tuned TENER model contains an attention mechanism and position encodings, where the attention mechanism comprises a single-head attention mechanism and a multi-head attention mechanism. The position encoding of character c_i relative to character c_j is built from their relative position: i and j denote the i-th and the j-th character respectively, p_{i,j} is the relative position of c_i with respect to c_j, 2k and 2k+1 are the indices of the elements in the encoding vector, d_input is the input dimension of the FTTENER model, and R_{ij} is the final position encoding of c_i relative to c_j.
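The encoding formula appears as an image in the original specification; a minimal sketch, assuming the standard Transformer-style sinusoidal form suggested by the (2k)/(2k+1) element indices, is:

```python
import math
import torch

def relative_position_encoding(p_ij, d_input):
    """Sinusoidal encoding of the relative position p_ij = i - j (assumed form)."""
    r = torch.zeros(d_input)
    for k in range(d_input // 2):
        freq = p_ij / (10000 ** (2 * k / d_input))
        r[2 * k] = math.sin(freq)       # even elements: sine
        r[2 * k + 1] = math.cos(freq)   # odd elements: cosine
    return r

R_ij = relative_position_encoding(p_ij=3, d_input=768)
print(R_ij.shape)
```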
In this embodiment, when the single-head attention mechanism is used in the fine-tuned TENER model, the computation is as follows. The input is the constructed character representation, where d_model is the dimension of the character vectors and m is the total number of characters in the text. The input is projected into different spaces by three learnable matrices, and the attention output is computed from the resulting query, key and value:

A_{y,z} = Q_y · K_z^T

Attn(Q, K, V) = softmax(A) V

where Q is the query vector in the attention mechanism, K is the key vector in the attention mechanism, V is the value vector in the attention mechanism, Q_y is the query vector of the y-th text character, K_z is the key vector of the z-th text character, ^T denotes the transpose of a vector, z is the index of the character attended to by the y-th text character, and A_{y,z} is the attention value of the y-th text character for the z-th text character. The single-head attention output C_{t+1}' is then processed by the fully connected layer described in S24.

According to the above position encoding, when d_input = 5·d_model (d_model being the encoding dimension of the character vectors), the position encoding of c_i relative to c_j is obtained with this input dimension.
when a multi-head attention mechanism is used in the fine-tuning TENER model for improving the self-attention capacity, n groups of mapping matrixes are set
The output equation is as follows:
where n is the number of heads and the superscript h is the head index, i.e., Q
(h) 、K
(h) 、V
(h) Respectively as the query vector and the key vector value vector of the h head,
is a learnable matrix corresponding to the above-mentioned three vectors,
is the query vector at the h-th head of the y-th text character,
is the transpose of the h-th head of the z-th text character's key vector,
attention value, head, at h head for y text character to z text character
(h) Is the output of the h head in the multi-head attention mechanism.
To learn the parameters, an output C is obtained
t+1 ' is:
C t+1 ′=W o [head (1) ;...;head (n) ]
dimension d at this time
input =5d
model ,
Relative to
The position coding of (2) is the same as in the single head case.
S282: performing the t-th round of aggregation on the feature vectors of the edges, and taking the aggregated output as the input of the fine-tuned TENER model to obtain new word vectors. Specifically, the input of the fine-tuned TENER model is constructed from the quantities described in S26, where i denotes the i-th edge and t denotes the t-th round of updating: the feature vectors of the edges before the t-th round of aggregation, the edge feature vectors after the t-th round of aggregation and the aggregated global vector of the t-th round together form the constructed input.

The constructed input is fed into the fine-tuned TENER model FTTENER_w() for updating to obtain the output W_{t+1}, where FTTENER_w() denotes the fine-tuned TENER model for the word vectors.
The FTTENER model in S282 differs slightly from the model in S281 in the position encoding of its input. For words, four relative distances are used: the distance between the start character of the i-th word and the start character of the j-th word, the distance between the start character of the i-th word and the end character of the j-th word, and so on for the remaining start/end combinations. Each of these four distances pos is encoded into a relative position vector p_pos, where 2k is the index of an element in the vector and d_input is the input dimension of the FTTENER model.

In this embodiment, taking the graph of Fig. 3 as an example, the relative distances between matched words are computed from their start and end characters, and the remaining relative position encodings are obtained by analogy; d_model is the encoding dimension of the word vectors.

The four relative position vectors are combined through a linear layer with the trainable parameter U_r, so that the final R_ij has the same dimension as the input vector; in this embodiment, R_ij is the final position encoding of the i-th word vector relative to the j-th word vector.
S283: updating the global variable by the LSTM network state-update formula to obtain g_{t+1}. LSTM networks (long short-term memory networks) are commonly used for NLP problems; the update of the global vector follows the update of the state value in an LSTM, and the calculation formula is:

g_{t+1} = f_{t+1} ⊙ g_t + i_{t+1} ⊙ u_{t+1}

where U, V and b are trainable parameters; the gate equation is written once for both i and f (i.e., the first equation actually contains two equations, one for the input gate i_{t+1} and one for the forget gate f_{t+1}), and u_{t+1}, i_{t+1} and f_{t+1} are intermediate variables introduced for clarity of formulation.
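A minimal sketch of this LSTM-style state update, assuming the gates are computed from an aggregated character/word summary x_t and the previous global vector g_t with trainable parameters (the exact inputs to the gates are assumptions):

```python
import torch
import torch.nn as nn

d_model = 768

class GlobalUpdate(nn.Module):
    """LSTM-style update of the global vector g (schematic)."""
    def __init__(self, d):
        super().__init__()
        self.gates = nn.Linear(2 * d, 2 * d)      # produces input gate i and forget gate f
        self.candidate = nn.Linear(2 * d, d)      # produces candidate state u

    def forward(self, g_t, x_t):
        h = torch.cat([x_t, g_t], dim=-1)
        i_f = torch.sigmoid(self.gates(h))
        i_t1, f_t1 = i_f.chunk(2, dim=-1)
        u_t1 = torch.tanh(self.candidate(h))
        return f_t1 * g_t + i_t1 * u_t1           # g_{t+1} = f ⊙ g_t + i ⊙ u

update = GlobalUpdate(d_model)
g_next = update(torch.randn(d_model), torch.randn(d_model))
print(g_next.shape)
```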
S29: respectively replacing nodes, edges and global variables of the CWG model with the updated character vector, word vector and global vector, and aggregating the nodes, edges and global variables of the CWG model;
s210: and (5) circulating the step S28 to the step S29 for T times to obtain a final character feature vector set.
S211: inputting the character vector set corresponding to the finally obtained node into a conditional random field model CRF, and calculating to obtain an optimal label sequence, wherein the specific calculation formula comprises the following steps:
wherein i represents the ith node;
transpose for the final eigenvector of the ith node; l
i A label for the ith node;
and
to a label l
i-1 And l
i Trainable parameters of (a);
intermediate variables introduced for the clear representation of the formula, the sum
i-1 、l
i 、
Calculation formula related to three variables
And
the same meaning is applied;
in order to optimize the sequence of the tag,
a tag representing the ith node in the optimal tag sequence,
the same meaning is applied; y(s) is the set of all labels in the current situation s;
represents the best tag sequence under the current situation s as
The probability of (c).
For the training process, the loss function is:
wherein N is the total number of tag sequences contained in Y(s).
In this embodiment, if the data set containing the sequence shown in Fig. 2 is taken as the training set, the label dictionary is defined as:

tag2label = {B: 0, I: 1, O: 2}

Then, for this sequence, a label sequence is an arbitrary combination of the three labels over its 9 characters and has 3^9 possible values, i.e., Y(s) contains 3^9 elements in total.
For the test and decoding process, the optimal tag sequence y* is found by:

y* = argmax_{y∈Y(s)} p(y|s)

where p(y|s) denotes the probability of an arbitrary label sequence y under the current situation s; the optimal label sequence y* gives the labeling result corresponding to each input character.

In this embodiment, if the sequence shown in Fig. 2 is used as the test set with the above label dictionary, the completely correct optimal result is y* = (2, 2, 0, 1, 1, 1, 1, 1, 2).
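A minimal sketch showing how the "defect phenomenon" span can be read off such a label sequence; the index-to-label mapping follows tag2label above, and the character list is the assumed back-translated example sentence:

```python
label2tag = {0: "B", 1: "I", 2: "O"}          # inverse of tag2label = {B:0, I:1, O:2}
chars = list("保护装置运行异常。")              # assumed example sentence (9 characters)
y_star = [2, 2, 0, 1, 1, 1, 1, 1, 2]

def extract_entities(chars, label_ids):
    """Collect maximal B,I,... runs as extracted entity strings."""
    entities, current = [], []
    for ch, idx in zip(chars, label_ids):
        tag = label2tag[idx]
        if tag == "B":
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        else:
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities

print(extract_entities(chars, y_star))   # -> ["装置运行异常"] ("device operating abnormally")
```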
S212: sequence y obtained according to S28 * And optimizing the model parameters by using an Adam optimizer, and circularly training for a certain number of times to obtain the CWG-TENER model with better effect for extracting the defect text information of the power equipment.
In this embodiment, the criterion of model performance is the performance of the model on the test set, specifically the three common named entity recognition indicators: precision, recall and F1 score.
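A minimal sketch of how these three indicators could be computed at the entity-span level, assuming entities are compared as exact (start, end) spans; this evaluation convention is an assumption, not specified by the invention:

```python
def span_prf(gold_spans, pred_spans):
    """Precision, recall and F1 over sets of (start, end) entity spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Dummy example: one gold entity, one predicted entity, exact match
print(span_prf(gold_spans=[(2, 7)], pred_spans=[(2, 7)]))   # -> (1.0, 1.0, 1.0)
```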
S3, inputting the power equipment defect text from which information is to be extracted into the power equipment defect text information extraction model to obtain the extracted information.
The information to be extracted may include: the defect phenomena, defect causes, solution measures and other information involved in the defect record texts of power secondary equipment.
The extracted information can be used to construct a subsequent knowledge graph, so that the solution can be queried when the secondary equipment of the power system fails, providing an auxiliary decision-making function. Compared with existing models, the present model obtains more complete and accurate entity information, so that the resulting auxiliary decision-making system is more practical.
In this embodiment, the target extraction information is defect phenomenon information, and the overall model architecture is shown in fig. 4.
Firstly, the data text is input into the pre-training language model BERT/BERT-wwm/ERNIE, the characters in the data text are converted into character vectors, and the words in the dictionary are converted into word vectors of the same dimension by the pre-training language model. The TENER model then performs a first feature extraction on the character vectors and word vectors to obtain the initial values with which they enter the "aggregation → update → aggregation → ..." cycle, and the initial value of the global vector is calculated at the same time. The character vectors, word vectors and global vector are aggregated to obtain the aggregation output. The character vectors and word vectors are then combined with the position encodings and input into a Transformer layer with N heads, i.e., the "update" layer, to obtain the update output, and the update output of the global vector is calculated at the same time. Finally, the character vectors, word vectors and global vector are input into the linear layer to obtain an output with the same dimension as the initial one, which is input into the aggregation layer again, and this is cycled T times. After the last aggregation operation, the final character feature vectors are input into the CRF layer to obtain the final label output.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.