CN115936009A - Electric power text semantic analysis method - Google Patents

Electric power text semantic analysis method

Info

Publication number
CN115936009A
Authority
CN
China
Prior art keywords
text
layer
powerbert
training
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211693288.6A
Other languages
Chinese (zh)
Inventor
贾骏
杨景刚
付慧
张国江
胡成博
路永玲
李双伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co Ltd
Priority to CN202211693288.6A priority Critical patent/CN115936009A/en
Publication of CN115936009A publication Critical patent/CN115936009A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a semantic analysis method for electric power text, which comprises the following steps: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text; training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model; and inputting the power text to be subjected to semantic analysis into the trained PowerBERT power text model to obtain a semantic analysis result.

Description

Electric power text semantic analysis method
Technical Field
The invention belongs to the technical field of power equipment fault diagnosis, and particularly relates to a power text semantic analysis method.
Background
At present, research on intelligent operation and maintenance of power transformers mainly focuses on mining and analyzing the state data of structured equipment, while research on the unstructured data of power transformer operation and maintenance texts has started slowly. Because these unstructured operation and maintenance texts contain a large number of complex technical terms, existing text classification models cannot handle them well: classification accuracy is not high, and the terms and structures in the operation and maintenance texts are not well understood.
Disclosure of Invention
The invention aims to: in order to solve the problem that existing text classification models cannot handle the unstructured data of power transformer operation and maintenance texts well, the invention provides a semantic analysis method for power text, which can greatly improve the completeness of training on complex terms and structures in the power field, avoid repeatedly falling into local optima over multiple rounds of training, and realize the mining and analysis of unstructured power transformer operation and maintenance text data.
The technical scheme is as follows: a power text semantic analysis method comprises the following steps:
step 1: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text;
step 2: training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model;
step 3: inputting the power text to be classified into the trained PowerBERT power text model to obtain a classification result.
Further, the embedding layer includes: a word embedding model for breaking down the input power text into word vectors, a block embedding model for distinguishing which block a word belongs to, and a position embedding model for representing the absolute position of each word.
Furthermore, each layer of the Transformer-Encoder comprises a Transformer structure and an Encoder structure; the input of the first-layer Transformer-Encoder is the output of the embedding layer, the input of each subsequent Transformer-Encoder is the output of the previous Transformer-Encoder, and the output of the last Transformer-Encoder is the input of the output layer.
Further, the Transformer structure comprises a multi-head attention layer, wherein the multi-head attention layer is formed by combining a plurality of self-attention layers;
assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer are derived according to the following equation:
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be optimized;
the output of the self-attention layer is expressed as:
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function;
and splicing the outputs of the multiple self-attention layers and performing a linear transformation to obtain the output of the current Transformer structure.
Further, the Encoder structure comprises a normalization layer and a fully connected layer, and is represented as:
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation, FeedForward(L_1) denotes the output of the fully connected layer, Y_Encoder represents the output of the Encoder layer, and L_1 represents an intermediate variable;
Further, the fully connected layer comprises two layers of fully connected neural networks, wherein ReLU is adopted as the activation function in the first layer and no activation function is used in the second layer; the fully connected layer is expressed as:
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be optimized in the fully connected layer.
Further, the step 2 specifically includes:
splitting the training process into two threads by adopting a real-time dynamic masking mechanism, namely a CPU thread and a GPU thread, wherein the CPU thread performs the masking operation in real time based on the multiple text crossing mechanism and the GPU thread trains the masked samples; while the GPU thread trains one batch of samples, the CPU thread performs the masking operation on the next batch of samples.
Further, the masking operation performed in real time based on the multiple text crossing mechanism specifically includes:
assume the original power text sequence is a_1, a_2, a_3, …, a_n; after masking based on the multiple text crossing mechanism, the obtained sequence is a_1, …, [mask], …, a_n, in which the character at each subscript in T is replaced;
where the set of subscripts of the masked characters is T = {t_1, t_2, …, t_s}, s denotes the total number of masked characters, and any masked subscript t_n < N;
each training input of PowerBERT is expressed as:
[CLS] a_1 … [mask] … a_n [SEP]  (6)
wherein [CLS] represents the starting position of the corpus and [SEP] represents the division between multiple corpora;
assuming that the length of the sample input in each PowerBERT training step is M, if the length of the training text sequence is less than M-2, a completion operation is performed, the completion mark being [PAD]; the input of the training text is then represented as:
[CLS] a_1 … [mask] … a_n [SEP] [PAD] … [PAD]  (7)
if the sequence length of the training text is larger than M-2, the training text is truncated to M-2 before being input;
of the characters selected for masking, 80% are replaced with [mask], 10% are replaced with random words, and 10% are left unchanged, finally obtaining the masked sample;
the multiple text crossing mechanism comprises:
judging whether the training text comprises electric power professional vocabulary; if so, the masking operation adopts 40% character mask + 30% entity mask + 30% fragment mask; if not, the masking operation adopts 40% character mask + 60% fragment mask.
Further, the training of the masked sample specifically includes:
according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of the PowerBERT power text model and the output sequence Y_Embeddings of the embedding layer, forming the masked representations of the output sequences, denoted Y'_PowerBERT and Y'_Embeddings;
the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
in the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask is the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary;
comparing P_mask with the distribution w_t of the true word over the positions y_i in the vocabulary, the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
and the cross entropy loss is used with a back propagation algorithm to optimize the parameters in each layer of Transformer-Encoder in the PowerBERT power text model.
Beneficial effects: compared with the prior art, the invention has the following advantages:
(1) By analyzing the characteristics of power transformer operation and maintenance texts, the method constructs a PowerBERT model for such texts and pre-trains the model on unstructured data such as operation regulations, technical standards and defect records; multiple mask mechanisms and a dynamic loading strategy for the mask mechanisms are adopted in the pre-training process; finally, a large-scale pre-trained language model that performs better on electric power text analysis is obtained;
(2) The invention uses a pre-training mechanism based on the cross combination of multiple mask mechanisms, such as the character mask, the entity mask and the fragment mask, together with a dynamic loading strategy for the mask mechanisms, thereby improving the completeness of training on complex terms and structures in the electric power field, improving the knowledge extraction and processing capability of the model, and avoiding repeatedly falling into local optima over multiple rounds of training.
Drawings
FIG. 1 is a PowerBERT basic architecture;
FIG. 2 is an embedded layer basic architecture;
FIG. 3 is a schematic view of a multi-head attention mechanism;
FIG. 4 is a schematic diagram of a multiple mask cross training mechanism;
FIG. 5 is a schematic diagram of a PowerBERT training process;
FIG. 6 is a mask probability distribution graph;
FIG. 7 is a diagram illustrating the error in the training process.
Detailed Description
The technical solution of the present invention will be further explained with reference to the accompanying drawings and embodiments.
The embodiment discloses a semantic analysis method for an electric power text, which mainly comprises the following steps:
step 1: aiming at the unstructured and complex text of the power equipment, a BERT-based PowerBERT power text model is constructed, and the deep semantic features of the power text can be effectively captured; the specific operation of the step comprises the following steps:
as shown in fig. 1, powerBERT electricityThe framework of the force text model is based on BERT and consists of an embedded layer, a plurality of layers of transform-encoders and an output layer, and the input original corpus is marked as X PowerBERT . The embedding layer comprises word embedding, block embedding and position embedding, and converts the codes and positions of characters in the input original corpus into corresponding vector information. Each layer of Transformer-Encoder comprises a Transformer structure and an Encoder structure, the Transformer structure is used for capturing the intrinsic meanings in the corpus, the Encoder structure is used for merging, linking and normalizing the weight of each layer, the input of the first Transformer-Encoder structure is the output of the embedded layer, namely Y Embeddings The input of the subsequent transform-Encoder structure is the output of the previous transform-Encoder structure, and the output of the last transform-Encoder structure is the encoding matrix Y of the electric power corpus Embeddings For final training (Fine-Tuning) and subsequent downstream tasks.
The embedding layer will now be described as follows:
the embedding layer is used for processing an original text input sequence into a vector matrix which can be calculated by BERT, and in order to completely represent text corpus information, each section of input text is decomposed into three embedding matrixes of words, blocks and positions. Where [ CLS ] represents the starting position of the corpus and [ SEP ] represents the division between multiple corpus.
The word embedding model converts the original text X_PowerBERT into a real-valued vector V_t based on the word vector matrix W_t, namely:
V_t = X_PowerBERT × W_t  (1)
In equation (1), each row of the word vector matrix W_t represents a specific vocabulary entry; the input text is represented by one-hot encoding (OneHotEncoding), so that the product records the corresponding entries in the word vector V_t.
The block embedding model encodes which block the current word belongs to, so that block vectors can be used to distinguish occurrences when the same word appears repeatedly at different positions in the same sentence.
The position embedding model is used for representing the absolute position of each word so as to record the position information of each word in a sentence.
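As an illustration of how these three embeddings combine, the following PyTorch sketch builds an embedding layer of the kind described above; the module name and the sizes (vocab_size, hidden_size, max_len, num_blocks) are illustrative assumptions rather than values given in this document.

```python
import torch
import torch.nn as nn

class PowerBertEmbeddings(nn.Module):
    """Minimal sketch of the embedding layer: word + block + position embeddings.
    Sizes here are illustrative assumptions, not values from the patent."""
    def __init__(self, vocab_size=21128, hidden_size=768, max_len=512, num_blocks=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)    # character/word vectors
        self.block_emb = nn.Embedding(num_blocks, hidden_size)   # which block a character belongs to
        self.pos_emb = nn.Embedding(max_len, hidden_size)        # absolute position of each character
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, token_ids, block_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        x = self.word_emb(token_ids) + self.block_emb(block_ids) + self.pos_emb(positions)
        return self.norm(x)   # vector information fed to the first Transformer-Encoder
```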
The Transformer structure mainly includes a Multi-Head Attention layer (Multi-Head-Attention), which is formed by combining a plurality of Self-Attention layers (Self-Attention), as shown in FIG. 3.
Assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer can then be calculated according to equation (2):
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be trained.
The self-attention layer (Self-Attention) computes on the inputs Q, K and V according to formula (3):
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function.
The outputs of the multiple self-attention layers are spliced and then linearly transformed to give the output matrix Y_MHA of this layer.
Compared with traditional recurrent neural network models (such as LSTM, RNN and ELMo), the self-attention layer can ignore the distance between words: all words are processed simultaneously and there is no information attenuation. This improves computational efficiency and solves the long-distance attenuation problem that traditional recurrent models suffer from during computation.
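The following PyTorch sketch shows one way the multi-head attention layer of equations (2) and (3) can be realized: each head computes softmax(Q·K^T/√d_k)·V, and the head outputs are spliced and linearly transformed. The head count and hidden size are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Sketch of a multi-head attention layer built from several self-attention heads
    (equations (2) and (3)); head count and hidden size are illustrative assumptions."""
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads, self.d_k = num_heads, hidden_size // num_heads
        self.W_q = nn.Linear(hidden_size, hidden_size)   # W_Q
        self.W_k = nn.Linear(hidden_size, hidden_size)   # W_K
        self.W_v = nn.Linear(hidden_size, hidden_size)   # W_V
        self.W_o = nn.Linear(hidden_size, hidden_size)   # linear transform after splicing the heads

    def forward(self, x_mha):
        b, n, _ = x_mha.shape
        # Q = X_MHA W_Q, K = X_MHA W_K, V = X_MHA W_V, then split into heads
        q, k, v = (proj(x_mha).view(b, n, self.num_heads, self.d_k).transpose(1, 2)
                   for proj in (self.W_q, self.W_k, self.W_v))
        # softmax(Q K^T / sqrt(d_k)) V, computed per head -- equation (3)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        out = torch.softmax(scores, dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, n, -1)      # splice the self-attention outputs
        return self.W_o(out)                             # Y_MHA
```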
The Encoder structure of this embodiment mainly consists of a normalization layer and a fully connected layer; the calculation is shown in formula (4):
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation performed on the matrix, FeedForward(L_1) denotes the output of the fully connected layer, and L_1 is an intermediate variable. In this embodiment the fully connected layer comprises two layers of fully connected neural networks: the first layer uses ReLU as the activation function and the second layer uses no activation function; the specific model is shown in formula (5):
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be trained in the fully connected layer.
The output Y_Encoder of equation (4) is the output of the Encoder layer, i.e. the encoded text vector, which better reflects the semantic information of the text. The output Y_Encoder of one Transformer-Encoder structure serves as the input of the next Transformer-Encoder structure for further encoding, and the deep semantic information of the text is extracted by connecting several Transformer-Encoder structures.
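Combining the pieces above, a single Transformer-Encoder layer can be sketched as follows; it reuses the MultiHeadAttention sketch from the previous block, and the feed-forward width is an assumed value rather than one specified here.

```python
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """Sketch of one Transformer-Encoder layer: multi-head attention followed by the
    Encoder structure of equations (4) and (5). Sizes are illustrative assumptions."""
    def __init__(self, hidden_size=768, ffn_size=3072, num_heads=12):
        super().__init__()
        self.mha = MultiHeadAttention(hidden_size, num_heads)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)
        # FeedForward(X) = max(0, X W1 + b1) W2 + b2 -- equation (5)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, ffn_size),
            nn.ReLU(),
            nn.Linear(ffn_size, hidden_size),
        )

    def forward(self, x_encoder):
        l1 = self.norm1(x_encoder + self.mha(x_encoder))   # L1 = LayerNorm(X + MHA(X))
        return self.norm2(l1 + self.ffn(l1))               # Y_Encoder = LayerNorm(L1 + FeedForward(L1))
```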
Step 2: in view of the complexity of the PowerBERT power text model constructed in step 1, a model training method based on power text is provided; the training method comprises a multiple text crossing mechanism and a dynamic loading strategy, so that the collected power text can be used to train the PowerBERT power text model efficiently; the specific operation comprises the following steps:
in this embodiment, powerBERT pre-training is performed based on a mask training thought, words in the input corpus are masked in advance, and the vocabulary at the mask position is restored by using context information. The training mode can avoid the problem of information exposure defect existing in the traditional NLP algorithm (the algorithm reversely deduces words needing to be predicted in the forward model from the reverse model). Meanwhile, a masking mechanism of a cross combination of multiple masking mechanisms such as an entity masking and a fragment masking and a masking mechanism dynamic loading strategy in a training process are added on the basis of a character masking of the general BERT, so that the problem of local optimum in multiple rounds of training is avoided, and the text understanding capability of the model is enhanced.
Assume that the original text sequence is a 1 ,a 2 ,a 3 …a n The sequence after the mask is
Figure SMS_16
Where the set of subscripts for the masked characters is T = { T = 1 ,t 2 ,…,t s Denotes the total number of masked characters, and any masked character t n < N. The input of the PowerBERT training text once is shown in equation (6) below.
Figure SMS_17
Assuming that the sample length of each training input of the PowerBERT is M, if the text sequence length is less than M-2, a completion operation needs to be performed, and the text completion operation is marked as "[ PAD ]", the input of the training text is as shown in the following formula (7):
Figure SMS_18
if the length of the text sequence is larger than M-2, the text is required to be cut off to M-2 and then input.
In the mask training process, the traditional masking mode masks a single character. In this case, the pre-trained model can guess the masked content from the surrounding words, which weakens its comprehension of the whole sentence. For example, in "oil leakage of the submersible pump of the transformer cooler", if the character "oil" is masked, the sentence becomes "[mask] leakage of the submersible pump of the transformer cooler", and the trained model can guess that "[mask]" is "oil" from the neighbouring words "submersible pump" and "leakage" alone, without attending to the other components of the sentence, leading to a local optimum. If the whole term "submersible pump" is masked as a named entity of the power field, or even the whole fragment "submersible pump of the cooler" is masked, the task becomes harder for the model: the masked information must be guessed by combining the context, which improves the training effect.
Therefore, in this embodiment a cross-training strategy of character mask, entity mask and fragment mask is adopted when pre-training the model, as shown in FIG. 4 and in the sketch below. It is first determined whether the power equipment text contains electric power professional vocabulary; if so, a cross-training strategy of 40% character mask, 30% entity mask and 30% fragment mask is adopted; if not, a cross-training strategy of 40% character mask and 60% fragment mask is adopted. Then, of the selected positions, 80% are replaced by [mask], 10% are replaced by random words and 10% are kept as they are, finally obtaining the masked text.
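A minimal sketch of this selection logic follows; the tokenizer, the professional-vocabulary set power_vocab, and the way entity or fragment spans would actually be located are assumptions for illustration, not components defined in this text.

```python
import random

def choose_mask_kind(tokens, power_vocab):
    """Pick a masking kind for one training text according to the cross-training strategy:
    40%/30%/30% character/entity/fragment masks when professional vocabulary is present,
    otherwise 40%/60% character/fragment masks. `power_vocab` is an assumed term set."""
    if any(term in "".join(tokens) for term in power_vocab):
        kinds, weights = ["char", "entity", "fragment"], [0.4, 0.3, 0.3]
    else:
        kinds, weights = ["char", "fragment"], [0.4, 0.6]
    return random.choices(kinds, weights=weights, k=1)[0]

def apply_mask(tokens, positions, vocab, mask_token="[mask]"):
    """Replace the chosen positions: 80% with [mask], 10% with a random word, 10% unchanged."""
    out = list(tokens)
    for i in positions:
        r = random.random()
        if r < 0.8:
            out[i] = mask_token
        elif r < 0.9:
            out[i] = random.choice(vocab)
        # else: keep the original character
    return out
```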
As shown in FIG. 5, the mask training of the general BERT model is performed in the data preprocessing stage, so the same corpus has only one mask pattern, which reduces the reuse efficiency of the training data. In addition, computational efficiency suffers because the CPU and the GPU work in series. Therefore, this embodiment adopts a real-time Dynamic Masking mechanism and splits the training process into two threads: the CPU thread is responsible for the real-time dynamic mask operation, and the GPU thread is responsible for training the masked samples; while the GPU trains one batch of samples, the CPU masks the next batch. Computing resources are thus fully utilized, training time is shortened, and training efficiency and completeness are improved.
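A sketch of this producer-consumer split is shown below; mask_batch and train_on_gpu stand in for the masking and training routines, and the queue depth is an assumption for illustration.

```python
import queue
import threading

def dynamic_mask_training(raw_batches, mask_batch, train_on_gpu, prefetch=2):
    """Sketch of real-time dynamic masking: a CPU thread masks the next batch while
    the GPU thread trains on the current one. The callables are assumed placeholders."""
    ready = queue.Queue(maxsize=prefetch)

    def cpu_worker():
        for batch in raw_batches:
            ready.put(mask_batch(batch))   # real-time masking on the CPU thread
        ready.put(None)                    # sentinel: no more batches

    threading.Thread(target=cpu_worker, daemon=True).start()
    while True:
        masked = ready.get()               # blocks until the next masked batch is ready
        if masked is None:
            break
        train_on_gpu(masked)               # GPU thread trains the already-masked batch
```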
The method for calculating the training error of the PowerBERT model comprises the following steps:
First, according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of PowerBERT and the output sequence Y_Embeddings of the embedding layer, forming the masked representations Y'_PowerBERT and Y'_Embeddings of the output sequences.
Then, the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
In the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask can be understood as the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary. Subsequently, P_mask is compared with the distribution w_t of the true word over the positions y_i in the vocabulary, and the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
Then the loss is used to optimize the parameters in each Transformer-Encoder of the PowerBERT model with the back propagation algorithm. The calculation process is shown in FIG. 6.
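A compact sketch of this error computation, under the assumption that the masked-position indices, the true character ids, the word vector matrix W_t and the bias b are available as tensors, is:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(y_powerbert, mask_positions, true_token_ids, W_t, b):
    """Sketch of equations (8) and (9): gather the masked rows of Y_PowerBERT, project
    them onto the vocabulary with W_t and b, and take the cross entropy against the
    true characters. Tensor names and shapes are assumptions for illustration."""
    # y_powerbert: (seq_len, hidden); W_t: (vocab_size, hidden); b: (vocab_size,)
    y_masked = y_powerbert[mask_positions]        # rows for the subscript set T
    logits = y_masked @ W_t.t() + b               # scores over the vocabulary
    log_p_mask = F.log_softmax(logits, dim=-1)    # log of P_mask, equation (8)
    # cross entropy H(P_mask, w_t), equation (9), averaged over the masked characters
    return F.nll_loss(log_p_mask, true_token_ids)
```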
To accommodate Chinese text, this embodiment uses Chinese Wikipedia as the training corpus of the general BERT. However, because power text contains a large number of proper nouns and terms, a model trained only on such a basic corpus often does not perform well on power text mining tasks. Therefore, this embodiment also collects the operation rules and general system of power equipment, the related technical standards of power equipment, and the defect records of power equipment from the Power Production Management System (PMS) over the last decade as training material, as detailed in Table 1.
TABLE 1 PowerBERT training corpus
In the pre-training process, the entity needs to be masked, and the used electric power professional vocabulary data set is shown in table 2:
TABLE 2 Electric power professional vocabulary data set
In this embodiment, based on Table 1 and Table 2 above, different corpora and training methods are used to pre-train multiple models, including the general BERT and the power text BERT. The general BERT is obtained from the Chinese edition of Wikipedia using the general BERT architecture and training method; the power text BERT is obtained by further training the general BERT on the operation rules and management regulations of power equipment, the technical standards of power equipment and the defect records of power equipment.
in order to compare the merits of the training method of this embodiment, the power text BERT and the PowerBERT obtained based on the training method proposed in this embodiment using the same linguistic data as the power text BERT are subjected to a comparative test. The model training parameters are shown in table 3, and the error in the training process is shown in fig. 7.
TABLE 3 PowerBERT pre-training parameters
Example 2:
the embodiment discloses a power text processing method, which comprises the following steps:
aiming at the unstructured property of the power text, a power text entity extraction model is adopted to extract entity words and power proper nouns in the power text, and the entity words and the power proper nouns can help a physical examination worker to quickly find out really needed information in a mass information source.
The main task of the power text entity extraction model is: given a passage of transformer operation and maintenance inspection text, extract the entity words from it, including person names, place names, organization names, time words and other proper nouns.
For the power text entity extraction model to accurately extract the required words, it must understand the meaning of each word in the text, establish the relations between words, and understand the semantics of the whole text.
The power text entity extraction model of this embodiment essentially labels an input sequence, that is, each element in the sequence is labelled with a tag from a tag set. The model mainly comprises a language model for text encoding and a classification layer; the language model for text encoding may be any one of the LSTM model, the general BERT model, the power text BERT model, or the PowerBERT model described in embodiment 1. The main purpose of the classification layer is sequence labelling, and a common sequence labelling task is mainly realized with a Conditional Random Field (CRF) model.
Let X = {x_1, x_2, …, x_n} represent the power transformer operation and maintenance text encoded by PowerBERT, used as the input symbol sequence of the CRF, and let Y = {y_1, y_2, …, y_n} represent the tag sequence output by the CRF; the conditional random field structure model is shown in FIG. 1.
With X = {x_1, x_2, …, x_n} as the input sequence and Y = {y_1, y_2, …, y_n} as the output state sequence, the conditional random field constructs feature functions between the input sequence and the output states:
f(X, i, y_i, y_{i-1})  (10)
where i is the current position, y_i denotes the current output tag, and y_{i-1} denotes the output tag at the previous position. In the training process, the feature function f is equivalent to known prior knowledge, namely the features provided by the training samples. With the feature functions known, the probability distribution function of the CRF can be expressed as shown in formula (11):
P(Y|X) = (1/Z(X)) · exp( Σ_i Σ_k λ_k · f_k(X, i, y_i, y_{i-1}) )  (11)
where Z(X) is the normalization factor and λ is the parameter to be optimized, representing the weight of each feature function; by optimizing the parameters λ, the predicted output sequence obtains the highest probability distribution.
The power text entity extraction model is now described as follows:
the electric power text entity extraction model is a BERT-CRF model, and a BERT structure in the BERT-CRF model is the PowerBERT mentioned in the embodiment 1; firstly, inputting a text into a pre-trained BERT model to obtain coding information of the text; then the coding information is used as the input of a CRF structure, and the label corresponding to each word is obtained through a transfer matrix; and finally, entity extraction of the operation and maintenance text of the power transformer is realized, and the electric power professional vocabulary in the operation and maintenance text is identified.
In order to train the model to complete the electric power text entity extraction task, the electric power text entity extraction data set needs to be constructed first. The embodiment collects the operation rules and the general system of the electric power equipment, the relevant technical standards of the electric power equipment and the defect records of the PMS electric power equipment in the last decade as the training corpora.
The process of constructing the power text entity extraction data set is as follows: according to the electric power professional vocabulary, sentences containing 2 to 8 professional terms are searched for in the training text, and 840 samples are obtained after manual checking and comparison. The samples are randomly split 50% / 25% / 25%: 50% form the training set, 25% the validation set and 25% the test set. That is, the 50% of training sentences are input into the model, the model searches for the electric power professional vocabulary in them, and back propagation is performed according to the error; after each round of training, the validation set is used to verify the generalization ability of the current model and decide whether to stop training; after training is completed, the test set is used for final verification.
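A minimal sketch of this 50% / 25% / 25% random split, assuming the 840 labelled sentences are held in a Python list, is:

```python
import random

def split_dataset(samples, seed=42):
    """Randomly split the labelled sentences 50% / 25% / 25% into train / validation / test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: n // 2]                  # 50% training set
    valid = shuffled[n // 2 : (3 * n) // 4]     # 25% validation set
    test = shuffled[(3 * n) // 4 :]             # 25% test set
    return train, valid, test
```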
The power text entity extraction task is realized by setting the training parameters reasonably; the training parameters of the BERT-CRF model are as follows:
for the BERT-series models, during fine-tuning the number of training epochs is 5, the training batch size is 4, the AdamW optimizer is used with a learning rate of 5e-5, the warmup proportion is 0.1, weight_decay is 0.01, adam_epsilon is 1e-8 and dropout is 0.1; finally, BIOES labels are used for entity identification.
In this embodiment, a long short-term memory network (LSTM), the general BERT, the power text BERT and the PowerBERT are selected as encoders; their encoding results are input into CRF networks with the same structure and compared on the same training set, with the test results shown in Table 4.
Table 4 Electric power text entity extraction experimental results
As can be seen in Table 4, the general BERT is 10% -20% improved in both the recall rate and accuracy of the validation set and the test set compared to the conventional LSTM model. Compared with other models, the PowerBERT through the power text and the improved training strategy achieves the best indexes.
A deep learning model usually needs to be trained on a large number of labelled samples to obtain strong generalization ability, but acquiring such data usually requires manual labelling, which is time-consuming and labour-intensive. If a deep learning model with strong classification performance can be trained with only a small number of training samples, practical engineering application of the model becomes much easier. To verify the entity extraction ability of each model with a small number of labelled training samples, this embodiment uses four models, namely the long short-term memory network (LSTM), the general BERT, the power text BERT and the PowerBERT, to analyse power text entity extraction performance under different numbers of training samples (30, 50 and 80); the test results are shown in Table 5.
TABLE 5 Power text entity extraction experiments with different numbers of training samples
Analysing Table 5, it can be found that when the number of samples is no more than 50, the LSTM results are better than the general BERT but worse than the power text BERT and the PowerBERT. This indicates that although the general BERT has more parameters, it has not learned from power text and therefore lacks power-domain knowledge, so its transfer effect is unsatisfactory; the power text BERT and the PowerBERT possess power-domain knowledge and therefore perform better on small-sample data. As the amount of data increases, the results of the LSTM, the general BERT, the power text BERT and the PowerBERT all improve.
When the number of samples is 80, the recall of the PowerBERT reaches 81.26% and its accuracy reaches 81.6%, both exceeding 80%, and its F1 score reaches 0.8015, all higher than every other model. Compared with the LSTM model, recall is increased by 11.91% and accuracy by 8.26%; compared with the general BERT, recall is improved by 10.06% and accuracy by 4.43%; compared with the power text BERT, recall is improved by 4.40% and accuracy by 5.85%, a significant improvement in both.
As the number of training samples continues to increase, the performance of the PowerBERT continues to lead the other models. In practical applications it is easy to obtain more than 80 labelled samples, so the PowerBERT can perform better than the LSTM, the power text BERT and the general BERT in real engineering applications.
Model comparison study:
when the electric text entity is extracted based on the PowerBERT-CRF model, the text is coded through the PowerBERT, and then a final entity extraction result is obtained through a CRF layer. In order to verify whether the effect of extracting the power text entity is influenced by accessing other neural network structures (such as a linear layer and an LSTM layer) after the PowerBERT and inputting the output of the neural network structures into the CRF, the embodiment takes the PowerBERT as text coding, inputs the output of the BERT into different neural network structures for further coding, and finally inputs the output into the CRF for entity extraction and compares the classification precision of the entities. The model types tested in this example are PowerBERT-CRF (meaning that the CRF layer is directly accessed after PowerBERT), powerBERT-Liner-CRF (meaning that a linear layer is accessed after PowerBERT, and then the output of the linear layer is input to the CRF layer), and PowerBERT-LSTM-CRF (meaning that an LSTM layer is accessed after PowerBERT for further encoding, and then the output of LSTM is input to the CRF layer).
The results of the experiment are shown in table 6 below:
table 6 electric power text entity extraction experiment for modifying number of output layers
Figure SMS_32
As can be seen from the table above, in terms of recall the PowerBERT-LSTM-CRF structure is highest, improving on the other two structures by 2% to 3%, while there is little difference between the PowerBERT-CRF and the PowerBERT-Liner-CRF. In terms of accuracy the PowerBERT-Liner-CRF is highest, more than 1% above the PowerBERT-CRF and more than 2% above the PowerBERT-LSTM-CRF. Although the PowerBERT-LSTM-CRF structure achieves the best overall effect, the LSTM is a recurrent neural network whose next output depends on the previous result, so the network cannot be computed in parallel; GPU utilization is low, and model training and prediction time increase greatly. Therefore, when model inference speed and memory are the priority and the requirements on recall and accuracy are not high, the PowerBERT-CRF can be considered first.

Claims (9)

1. A semantic analysis method for power text, characterized in that: the method comprises the following steps:
step 1: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text;
step 2: training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model;
step 3: inputting the power text to be subjected to semantic analysis into the trained PowerBERT power text model to obtain a semantic analysis result.
2. The electric power text semantic analysis method according to claim 1, characterized in that: the embedding layer includes: a word embedding model for breaking down the input power text into word vectors, a block embedding model for distinguishing which block a word belongs to, and a position embedding model for representing the absolute position of each word.
3. The electric power text semantic analysis method according to claim 1, characterized in that: each layer of the Transformer-Encoder comprises a Transformer structure and an Encoder structure; the input of the first-layer Transformer-Encoder is the output of the embedding layer, the input of each subsequent Transformer-Encoder is the output of the previous Transformer-Encoder, and the output of the last Transformer-Encoder is the input of the output layer.
4. The electric power text semantic analysis method according to claim 3, characterized in that: the Transformer structure comprises a multi-head attention layer, wherein the multi-head attention layer is formed by combining a plurality of self-attention layers;
assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer are derived according to the following equation:
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be optimized;
the output of the self-attention layer is expressed as:
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function;
and splicing the outputs of the multiple self-attention layers, and performing linear transformation to obtain the output of the current Transformer structure.
5. The electric power text semantic analysis method according to claim 3, characterized in that: the Encoder structure includes a normalization layer and a fully connected layer, represented as:
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation, FeedForward(L_1) denotes the output of the fully connected layer, Y_Encoder represents the output of the Encoder layer, and L_1 represents an intermediate variable.
6. The electric power text semantic analysis method according to claim 5, characterized in that: the fully connected layer comprises two layers of fully connected neural networks, wherein ReLU is adopted as the activation function in the first layer and no activation function is used in the second layer; the fully connected layer is expressed as:
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be optimized in the fully connected layer.
7. The electric power text semantic analysis method according to claim 1, characterized in that: the step 2 specifically comprises:
splitting the training process into two threads by adopting a real-time dynamic masking mechanism, namely a CPU thread and a GPU thread, wherein the CPU thread performs the masking operation in real time based on the multiple text crossing mechanism and the GPU thread trains the masked samples; while the GPU thread trains one batch of samples, the CPU thread performs the masking operation on the next batch of samples.
8. The electric power text semantic analysis method according to claim 7, characterized in that: the masking operation performed in real time based on the multiple text crossing mechanism specifically comprises:
assume the original power text sequence is a_1, a_2, a_3, …, a_n; after masking based on the multiple text crossing mechanism, the obtained sequence is a_1, …, [mask], …, a_n, in which the character at each subscript in T is replaced;
where the set of subscripts of the masked characters is T = {t_1, t_2, …, t_s}, s denotes the total number of masked characters, any masked subscript satisfies t_n < N, and N represents the maximum number of masked characters;
each training input of PowerBERT is expressed as:
[CLS] a_1 … [mask] … a_n [SEP]  (6)
wherein [CLS] represents the starting position of the corpus and [SEP] represents the division between multiple corpora;
assuming that the length of the input sample of each PowerBERT training step is M, if the length of the training text sequence is less than M-2, a padding operation is performed, the padding mark being [PAD]; the input of the training text is then represented as:
[CLS] a_1 … [mask] … a_n [SEP] [PAD] … [PAD]  (7)
if the sequence length of the training text is larger than M-2, the training text is truncated to M-2 before being input;
of the characters selected for masking, 80% are replaced with [mask], 10% are replaced with random words, and 10% are left unchanged, finally obtaining the masked sample;
the multiple text crossing mechanism comprises:
judging whether the training text comprises electric power professional vocabulary; if so, the masking operation adopts 40% character mask + 30% entity mask + 30% fragment mask; if not, the masking operation adopts 40% character mask + 60% fragment mask.
9. The electric power text semantic analysis method according to claim 7, characterized in that: the training of the masked sample specifically includes:
according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of the PowerBERT power text model and the output sequence Y_Embeddings of the embedding layer, forming the masked representations of the output sequences, denoted Y'_PowerBERT and Y'_Embeddings;
the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
in the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask is the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary;
comparing P_mask with the distribution w_t of the true word over the positions y_i in the vocabulary, the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
and the cross entropy loss is used with a back propagation algorithm to optimize the parameters in each layer of Transformer-Encoder in the PowerBERT power text model.
CN202211693288.6A 2022-12-28 2022-12-28 Electric power text semantic analysis method Pending CN115936009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211693288.6A CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211693288.6A CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Publications (1)

Publication Number Publication Date
CN115936009A true CN115936009A (en) 2023-04-07

Family

ID=86698942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211693288.6A Pending CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Country Status (1)

Country Link
CN (1) CN115936009A (en)

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111694924B (en) Event extraction method and system
CN112528676B (en) Document-level event argument extraction method
CN110516055A (en) A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN110189749A (en) Voice keyword automatic identifying method
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN110009025A (en) A kind of semi-supervised additive noise self-encoding encoder for voice lie detection
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
CN116483991A (en) Dialogue abstract generation method and system
CN117421595A (en) System log anomaly detection method and system based on deep learning technology
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN117332788B (en) Semantic analysis method based on spoken English text
CN116029295A (en) Electric power text entity extraction method, defect positioning method and fault diagnosis method
CN116975161A (en) Entity relation joint extraction method, equipment and medium of power equipment partial discharge text
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN114579706B (en) Automatic subjective question review method based on BERT neural network and multi-task learning
CN115936009A (en) Electric power text semantic analysis method
CN113157866B (en) Data analysis method, device, computer equipment and storage medium
CN115221284A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN115292490A (en) Analysis algorithm for policy interpretation semantics
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN114298052A (en) Entity joint labeling relation extraction method and system based on probability graph
CN113627146B (en) Knowledge constraint-based two-step refute a rumour text generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination