CN115936009A - Electric power text semantic analysis method - Google Patents

Electric power text semantic analysis method

Info

Publication number
CN115936009A
Authority
CN
China
Prior art keywords
text
layer
powerbert
training
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211693288.6A
Other languages
Chinese (zh)
Inventor
贾骏
杨景刚
付慧
张国江
胡成博
路永玲
李双伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co Ltd
Priority to CN202211693288.6A priority Critical patent/CN115936009A/en
Publication of CN115936009A publication Critical patent/CN115936009A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a semantic analysis method for electric power text, which comprises the following steps: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text; training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model; and inputting the power text to be subjected to semantic analysis into the trained PowerBERT power text model to obtain a semantic analysis result.

Description

Electric power text semantic analysis method
Technical Field
The invention belongs to the technical field of power equipment fault diagnosis, and particularly relates to a power text semantic analysis method.
Background
At present, research on intelligent operation and maintenance of power transformers mainly focuses on mining and analyzing the state data of structured equipment, while research on the unstructured data of power transformer operation and maintenance texts has started slowly. Because these unstructured operation and maintenance texts contain a large number of complex technical terms, existing text classification models cannot handle them well: classification accuracy is not high, and the terms and structures in the operation and maintenance texts are not well understood.
Disclosure of Invention
The invention aims to: in order to solve the problem that existing text classification models cannot handle the unstructured data of power transformer operation and maintenance texts well, the invention provides a semantic analysis method for power text, which can greatly improve the completeness of training on complex terms and structures in the power field, avoid repeatedly falling into local optima over multiple rounds of training, and realize the mining and analysis of unstructured power transformer operation and maintenance text data.
The technical scheme is as follows: a power text semantic analysis method comprises the following steps:
step 1: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text;
step 2: training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model;
step 3: inputting the power text to be classified into the trained PowerBERT power text model to obtain a classification result.
Further, the embedding layer includes: a word embedding model for breaking down the input power text into word vectors, a block embedding model for distinguishing which block a word belongs to, and a position embedding model for representing the absolute position of each word.
Furthermore, each layer of the Transformer-Encoder comprises a Transformer structure and an Encoder structure; the input of the first-layer Transformer-Encoder is the output of the embedding layer, the input of each subsequent Transformer-Encoder is the output of the previous Transformer-Encoder, and the output of the last Transformer-Encoder is the input of the output layer.
Further, the Transformer structure comprises a multi-head attention layer, wherein the multi-head attention layer is formed by combining a plurality of self-attention layers;
assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer are derived according to the following equation:
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be optimized;
the output of the self-attention layer is expressed as:
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function;
and splicing the outputs of the multiple self-attention layers and performing a linear transformation to obtain the output of the current Transformer structure.
Further, the Encoder structure comprises a normalization layer and a fully connected layer, and is represented as:
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation, FeedForward(L_1) denotes the output of the fully connected layer, Y_Encoder represents the output of the Encoder layer, and L_1 represents an intermediate variable;
Further, the fully connected layer comprises two layers of fully connected neural networks, wherein ReLU is adopted as the activation function in the first layer and no activation function is used in the second layer; the fully connected layer is expressed as:
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be optimized in the fully connected layer.
Further, the step 2 specifically includes:
splitting the training process into two threads by adopting a real-time dynamic masking mechanism, namely a CPU thread and a GPU thread, wherein the CPU thread performs the masking operation in real time based on the multiple text crossing mechanism and the GPU thread trains the masked samples; while the GPU thread trains one batch of samples, the CPU thread performs the masking operation on the next batch of samples.
Further, the masking operation performed in real time based on the multiple text crossing mechanism specifically includes:
assume the original power text sequence is a_1, a_2, a_3, …, a_n; after masking based on the multiple text crossing mechanism, the obtained sequence is a_1, …, [mask], …, a_n, in which the character at each subscript in T is replaced;
where the set of subscripts of the masked characters is T = {t_1, t_2, …, t_s}, s denotes the total number of masked characters, and any masked subscript t_n < N;
each training input of PowerBERT is expressed as:
[CLS] a_1 … [mask] … a_n [SEP]  (6)
wherein [CLS] represents the starting position of the corpus and [SEP] represents the division between multiple corpora;
assuming that the length of the sample input in each PowerBERT training step is M, if the length of the training text sequence is less than M-2, a completion operation is performed, the completion mark being [PAD]; the input of the training text is then represented as:
[CLS] a_1 … [mask] … a_n [SEP] [PAD] … [PAD]  (7)
if the sequence length of the training text is larger than M-2, the training text is truncated to M-2 before being input;
of the characters selected for masking, 80% are replaced with [mask], 10% are replaced with random words, and 10% are left unchanged, finally obtaining the masked sample;
the multiple text crossing mechanism comprises:
judging whether the training text comprises electric power professional vocabulary; if so, the masking operation adopts 40% character mask + 30% entity mask + 30% fragment mask; if not, the masking operation adopts 40% character mask + 60% fragment mask.
Further, the training of the masked sample specifically includes:
according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of the PowerBERT power text model and the output sequence Y_Embeddings of the embedding layer, forming the masked representations of the output sequences, denoted Y'_PowerBERT and Y'_Embeddings;
the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
in the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask is the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary;
comparing P_mask with the distribution w_t of the true word over the positions y_i in the vocabulary, the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
and the cross entropy loss is used with a back propagation algorithm to optimize the parameters in each layer of Transformer-Encoder in the PowerBERT power text model.
Beneficial effects: compared with the prior art, the invention has the following advantages:
(1) By analyzing the characteristics of power transformer operation and maintenance texts, the method constructs a PowerBERT model for such texts and pre-trains the model on unstructured data such as operation regulations, technical standards and defect records; multiple mask mechanisms and a dynamic loading strategy for the mask mechanisms are adopted in the pre-training process; finally, a large-scale pre-trained language model that performs better on electric power text analysis is obtained;
(2) The invention uses a pre-training mechanism based on the cross combination of multiple mask mechanisms, such as the character mask, the entity mask and the fragment mask, together with a dynamic loading strategy for the mask mechanisms, thereby improving the completeness of training on complex terms and structures in the electric power field, improving the knowledge extraction and processing capability of the model, and avoiding repeatedly falling into local optima over multiple rounds of training.
Drawings
FIG. 1 is a PowerBERT basic architecture;
FIG. 2 is an embedded layer basic architecture;
FIG. 3 is a schematic view of a multi-head attention mechanism;
FIG. 4 is a schematic diagram of a multiple mask cross training mechanism;
FIG. 5 is a schematic diagram of a PowerBERT training process;
FIG. 6 is a mask probability distribution graph;
FIG. 7 is a diagram illustrating the error in the training process.
Detailed Description
The technical solution of the present invention will be further explained with reference to the accompanying drawings and embodiments.
The embodiment discloses a semantic analysis method for an electric power text, which mainly comprises the following steps:
step 1: aiming at the unstructured and complex text of the power equipment, a BERT-based PowerBERT power text model is constructed, and the deep semantic features of the power text can be effectively captured; the specific operation of the step comprises the following steps:
as shown in fig. 1, powerBERT electricityThe framework of the force text model is based on BERT and consists of an embedded layer, a plurality of layers of transform-encoders and an output layer, and the input original corpus is marked as X PowerBERT . The embedding layer comprises word embedding, block embedding and position embedding, and converts the codes and positions of characters in the input original corpus into corresponding vector information. Each layer of Transformer-Encoder comprises a Transformer structure and an Encoder structure, the Transformer structure is used for capturing the intrinsic meanings in the corpus, the Encoder structure is used for merging, linking and normalizing the weight of each layer, the input of the first Transformer-Encoder structure is the output of the embedded layer, namely Y Embeddings The input of the subsequent transform-Encoder structure is the output of the previous transform-Encoder structure, and the output of the last transform-Encoder structure is the encoding matrix Y of the electric power corpus Embeddings For final training (Fine-Tuning) and subsequent downstream tasks.
The embedding layer will now be described as follows:
the embedding layer is used for processing an original text input sequence into a vector matrix which can be calculated by BERT, and in order to completely represent text corpus information, each section of input text is decomposed into three embedding matrixes of words, blocks and positions. Where [ CLS ] represents the starting position of the corpus and [ SEP ] represents the division between multiple corpus.
The word embedding model converts the original text X_PowerBERT into a real-valued vector V_t based on the word vector matrix W_t, namely:
V_t = X_PowerBERT × W_t  (1)
In equation (1), each row of the word vector matrix W_t represents a specific vocabulary entry; the input text is represented by one-hot encoding (OneHotEncoding), so that the product records the corresponding entries in the word vector V_t.
The block embedding model encodes which block the current word belongs to, so that block vectors can be used to distinguish occurrences when the same word appears repeatedly at different positions in the same sentence.
The position embedding model is used for representing the absolute position of each word so as to record the position information of each word in a sentence.
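As an illustration of how these three embeddings combine, the following PyTorch sketch builds an embedding layer of the kind described above; the module name and the sizes (vocab_size, hidden_size, max_len, num_blocks) are illustrative assumptions rather than values given in this document.

```python
import torch
import torch.nn as nn

class PowerBertEmbeddings(nn.Module):
    """Minimal sketch of the embedding layer: word + block + position embeddings.
    Sizes here are illustrative assumptions, not values from the patent."""
    def __init__(self, vocab_size=21128, hidden_size=768, max_len=512, num_blocks=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)    # character/word vectors
        self.block_emb = nn.Embedding(num_blocks, hidden_size)   # which block a character belongs to
        self.pos_emb = nn.Embedding(max_len, hidden_size)        # absolute position of each character
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, token_ids, block_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        x = self.word_emb(token_ids) + self.block_emb(block_ids) + self.pos_emb(positions)
        return self.norm(x)   # vector information fed to the first Transformer-Encoder
```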
The Transformer structure mainly includes a Multi-Head Attention layer (Multi-Head-Attention), which is formed by combining a plurality of Self-Attention layers (Self-Attention), as shown in FIG. 3.
Assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer can then be calculated according to equation (2):
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be trained.
The self-attention layer (Self-Attention) computes on the inputs Q, K and V according to formula (3):
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function.
The outputs of the multiple self-attention layers are spliced and then linearly transformed to give the output matrix Y_MHA of this layer.
Compared with traditional recurrent neural network models (such as LSTM, RNN and ELMo), the self-attention layer can ignore the distance between words: all words are processed simultaneously and there is no information attenuation. This improves computational efficiency and solves the long-distance attenuation problem that traditional recurrent models suffer from during computation.
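The following PyTorch sketch shows one way the multi-head attention layer of equations (2) and (3) can be realized: each head computes softmax(Q·K^T/√d_k)·V, and the head outputs are spliced and linearly transformed. The head count and hidden size are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Sketch of a multi-head attention layer built from several self-attention heads
    (equations (2) and (3)); head count and hidden size are illustrative assumptions."""
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads, self.d_k = num_heads, hidden_size // num_heads
        self.W_q = nn.Linear(hidden_size, hidden_size)   # W_Q
        self.W_k = nn.Linear(hidden_size, hidden_size)   # W_K
        self.W_v = nn.Linear(hidden_size, hidden_size)   # W_V
        self.W_o = nn.Linear(hidden_size, hidden_size)   # linear transform after splicing the heads

    def forward(self, x_mha):
        b, n, _ = x_mha.shape
        # Q = X_MHA W_Q, K = X_MHA W_K, V = X_MHA W_V, then split into heads
        q, k, v = (proj(x_mha).view(b, n, self.num_heads, self.d_k).transpose(1, 2)
                   for proj in (self.W_q, self.W_k, self.W_v))
        # softmax(Q K^T / sqrt(d_k)) V, computed per head -- equation (3)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        out = torch.softmax(scores, dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, n, -1)      # splice the self-attention outputs
        return self.W_o(out)                             # Y_MHA
```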
The Encoder structure of this embodiment mainly consists of a normalization layer and a fully connected layer; the calculation is shown in formula (4):
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation performed on the matrix, FeedForward(L_1) denotes the output of the fully connected layer, and L_1 is an intermediate variable. In this embodiment the fully connected layer comprises two layers of fully connected neural networks: the first layer uses ReLU as the activation function and the second layer uses no activation function; the specific model is shown in formula (5):
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be trained in the fully connected layer.
The output Y_Encoder of equation (4) is the output of the Encoder layer, i.e. the encoded text vector, which better reflects the semantic information of the text. The output Y_Encoder of one Transformer-Encoder structure serves as the input of the next Transformer-Encoder structure for further encoding, and the deep semantic information of the text is extracted by connecting several Transformer-Encoder structures.
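Combining the pieces above, a single Transformer-Encoder layer can be sketched as follows; it reuses the MultiHeadAttention sketch from the previous block, and the feed-forward width is an assumed value rather than one specified here.

```python
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """Sketch of one Transformer-Encoder layer: multi-head attention followed by the
    Encoder structure of equations (4) and (5). Sizes are illustrative assumptions."""
    def __init__(self, hidden_size=768, ffn_size=3072, num_heads=12):
        super().__init__()
        self.mha = MultiHeadAttention(hidden_size, num_heads)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)
        # FeedForward(X) = max(0, X W1 + b1) W2 + b2 -- equation (5)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, ffn_size),
            nn.ReLU(),
            nn.Linear(ffn_size, hidden_size),
        )

    def forward(self, x_encoder):
        l1 = self.norm1(x_encoder + self.mha(x_encoder))   # L1 = LayerNorm(X + MHA(X))
        return self.norm2(l1 + self.ffn(l1))               # Y_Encoder = LayerNorm(L1 + FeedForward(L1))
```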
Step 2: in view of the complexity of the PowerBERT power text model constructed in step 1, a model training method based on power text is provided; the training method comprises a multiple text crossing mechanism and a dynamic loading strategy, so that the collected power text can be used to train the PowerBERT power text model efficiently; the specific operation comprises the following steps:
in this embodiment, powerBERT pre-training is performed based on a mask training thought, words in the input corpus are masked in advance, and the vocabulary at the mask position is restored by using context information. The training mode can avoid the problem of information exposure defect existing in the traditional NLP algorithm (the algorithm reversely deduces words needing to be predicted in the forward model from the reverse model). Meanwhile, a masking mechanism of a cross combination of multiple masking mechanisms such as an entity masking and a fragment masking and a masking mechanism dynamic loading strategy in a training process are added on the basis of a character masking of the general BERT, so that the problem of local optimum in multiple rounds of training is avoided, and the text understanding capability of the model is enhanced.
Assume that the original text sequence is a 1 ,a 2 ,a 3 …a n The sequence after the mask is
Figure SMS_16
Where the set of subscripts for the masked characters is T = { T = 1 ,t 2 ,…,t s Denotes the total number of masked characters, and any masked character t n < N. The input of the PowerBERT training text once is shown in equation (6) below.
Figure SMS_17
Assuming that the sample length of each training input of the PowerBERT is M, if the text sequence length is less than M-2, a completion operation needs to be performed, and the text completion operation is marked as "[ PAD ]", the input of the training text is as shown in the following formula (7):
Figure SMS_18
if the length of the text sequence is larger than M-2, the text is required to be cut off to M-2 and then input.
In the mask training process, the traditional masking mode masks a single character. In this case, the pre-trained model can guess the masked content from the surrounding words, which weakens its comprehension of the whole sentence. For example, in "oil leakage of the submersible pump of the transformer cooler", if the character "oil" is masked, the sentence becomes "[mask] leakage of the submersible pump of the transformer cooler", and the trained model can guess that "[mask]" is "oil" from the neighbouring words "submersible pump" and "leakage" alone, without attending to the other components of the sentence, leading to a local optimum. If the whole term "submersible pump" is masked as a named entity of the power field, or even the whole fragment "submersible pump of the cooler" is masked, the task becomes harder for the model: the masked information must be guessed by combining the context, which improves the training effect.
Therefore, in this embodiment a cross-training strategy of character mask, entity mask and fragment mask is adopted when pre-training the model, as shown in FIG. 4 and in the sketch below. It is first determined whether the power equipment text contains electric power professional vocabulary; if so, a cross-training strategy of 40% character mask, 30% entity mask and 30% fragment mask is adopted; if not, a cross-training strategy of 40% character mask and 60% fragment mask is adopted. Then, of the selected positions, 80% are replaced by [mask], 10% are replaced by random words and 10% are kept as they are, finally obtaining the masked text.
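A minimal sketch of this selection logic follows; the tokenizer, the professional-vocabulary set power_vocab, and the way entity or fragment spans would actually be located are assumptions for illustration, not components defined in this text.

```python
import random

def choose_mask_kind(tokens, power_vocab):
    """Pick a masking kind for one training text according to the cross-training strategy:
    40%/30%/30% character/entity/fragment masks when professional vocabulary is present,
    otherwise 40%/60% character/fragment masks. `power_vocab` is an assumed term set."""
    if any(term in "".join(tokens) for term in power_vocab):
        kinds, weights = ["char", "entity", "fragment"], [0.4, 0.3, 0.3]
    else:
        kinds, weights = ["char", "fragment"], [0.4, 0.6]
    return random.choices(kinds, weights=weights, k=1)[0]

def apply_mask(tokens, positions, vocab, mask_token="[mask]"):
    """Replace the chosen positions: 80% with [mask], 10% with a random word, 10% unchanged."""
    out = list(tokens)
    for i in positions:
        r = random.random()
        if r < 0.8:
            out[i] = mask_token
        elif r < 0.9:
            out[i] = random.choice(vocab)
        # else: keep the original character
    return out
```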
As shown in FIG. 5, the mask training of the general BERT model is performed in the data preprocessing stage, so the same corpus has only one mask pattern, which reduces the reuse efficiency of the training data. In addition, computational efficiency suffers because the CPU and the GPU work in series. Therefore, this embodiment adopts a real-time Dynamic Masking mechanism and splits the training process into two threads: the CPU thread is responsible for the real-time dynamic mask operation, and the GPU thread is responsible for training the masked samples; while the GPU trains one batch of samples, the CPU masks the next batch. Computing resources are thus fully utilized, training time is shortened, and training efficiency and completeness are improved.
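A sketch of this producer-consumer split is shown below; mask_batch and train_on_gpu stand in for the masking and training routines, and the queue depth is an assumption for illustration.

```python
import queue
import threading

def dynamic_mask_training(raw_batches, mask_batch, train_on_gpu, prefetch=2):
    """Sketch of real-time dynamic masking: a CPU thread masks the next batch while
    the GPU thread trains on the current one. The callables are assumed placeholders."""
    ready = queue.Queue(maxsize=prefetch)

    def cpu_worker():
        for batch in raw_batches:
            ready.put(mask_batch(batch))   # real-time masking on the CPU thread
        ready.put(None)                    # sentinel: no more batches

    threading.Thread(target=cpu_worker, daemon=True).start()
    while True:
        masked = ready.get()               # blocks until the next masked batch is ready
        if masked is None:
            break
        train_on_gpu(masked)               # GPU thread trains the already-masked batch
```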
The method for calculating the training error of the PowerBERT model comprises the following steps:
First, according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of PowerBERT and the output sequence Y_Embeddings of the embedding layer, forming the masked representations Y'_PowerBERT and Y'_Embeddings of the output sequences.
Then, the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
In the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask can be understood as the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary. Subsequently, P_mask is compared with the distribution w_t of the true word over the positions y_i in the vocabulary, and the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
Then the loss is used to optimize the parameters in each Transformer-Encoder of the PowerBERT model with the back propagation algorithm. The calculation process is shown in FIG. 6.
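A compact sketch of this error computation, under the assumption that the masked-position indices, the true character ids, the word vector matrix W_t and the bias b are available as tensors, is:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(y_powerbert, mask_positions, true_token_ids, W_t, b):
    """Sketch of equations (8) and (9): gather the masked rows of Y_PowerBERT, project
    them onto the vocabulary with W_t and b, and take the cross entropy against the
    true characters. Tensor names and shapes are assumptions for illustration."""
    # y_powerbert: (seq_len, hidden); W_t: (vocab_size, hidden); b: (vocab_size,)
    y_masked = y_powerbert[mask_positions]        # rows for the subscript set T
    logits = y_masked @ W_t.t() + b               # scores over the vocabulary
    log_p_mask = F.log_softmax(logits, dim=-1)    # log of P_mask, equation (8)
    # cross entropy H(P_mask, w_t), equation (9), averaged over the masked characters
    return F.nll_loss(log_p_mask, true_token_ids)
```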
To accommodate Chinese text, this embodiment uses Chinese Wikipedia as the training corpus of the general BERT. However, because power text contains a large number of proper nouns and terms, a model trained only on such a basic corpus often does not perform well on power text mining tasks. Therefore, this embodiment also collects the operation rules and general system of power equipment, the related technical standards of power equipment, and the defect records of power equipment from the Power Production Management System (PMS) over the last decade as training material, as detailed in Table 1.
TABLE 1 PowerBERT training corpus
In the pre-training process, the entity needs to be masked, and the used electric power professional vocabulary data set is shown in table 2:
TABLE 2 Electric power professional vocabulary data set
In this embodiment, based on Table 1 and Table 2 above, different corpora and training methods are used to pre-train multiple models, including the general BERT and the power text BERT. The general BERT is obtained from the Chinese edition of Wikipedia using the general BERT architecture and training method; the power text BERT is obtained by further training the general BERT on the operation rules and management regulations of power equipment, the technical standards of power equipment and the defect records of power equipment.
in order to compare the merits of the training method of this embodiment, the power text BERT and the PowerBERT obtained based on the training method proposed in this embodiment using the same linguistic data as the power text BERT are subjected to a comparative test. The model training parameters are shown in table 3, and the error in the training process is shown in fig. 7.
TABLE 3 PowerBERT pre-training parameters
Example 2:
the embodiment discloses a power text processing method, which comprises the following steps:
aiming at the unstructured property of the power text, a power text entity extraction model is adopted to extract entity words and power proper nouns in the power text, and the entity words and the power proper nouns can help a physical examination worker to quickly find out really needed information in a mass information source.
The main task of the power text entity extraction model is: given a passage of transformer operation and maintenance inspection text, extract the entity words from it, including person names, place names, organization names, time words and other proper nouns.
For the power text entity extraction model to accurately extract the required words, it must understand the meaning of each word in the text, establish the relations between words, and understand the semantics of the whole text.
The power text entity extraction model of this embodiment essentially labels an input sequence, that is, each element in the sequence is labelled with a tag from a tag set. The model mainly comprises a language model for text encoding and a classification layer; the language model for text encoding may be any one of the LSTM model, the general BERT model, the power text BERT model, or the PowerBERT model described in embodiment 1. The main purpose of the classification layer is sequence labelling, and a common sequence labelling task is mainly realized with a Conditional Random Field (CRF) model.
Let X = {x_1, x_2, …, x_n} represent the power transformer operation and maintenance text encoded by PowerBERT, used as the input symbol sequence of the CRF, and let Y = {y_1, y_2, …, y_n} represent the tag sequence output by the CRF; the conditional random field structure model is shown in FIG. 1.
With X = {x_1, x_2, …, x_n} as the input sequence and Y = {y_1, y_2, …, y_n} as the output state sequence, the conditional random field constructs feature functions between the input sequence and the output states:
f(X, i, y_i, y_{i-1})  (10)
where i is the current position, y_i denotes the current output tag, and y_{i-1} denotes the output tag at the previous position. In the training process, the feature function f is equivalent to known prior knowledge, namely the features provided by the training samples. With the feature functions known, the probability distribution function of the CRF can be expressed as shown in formula (11):
P(Y|X) = (1/Z(X)) · exp( Σ_i Σ_k λ_k · f_k(X, i, y_i, y_{i-1}) )  (11)
where Z(X) is the normalization factor and λ is the parameter to be optimized, representing the weight of each feature function; by optimizing the parameters λ, the predicted output sequence obtains the highest probability distribution.
The power text entity extraction model is now described as follows:
the electric power text entity extraction model is a BERT-CRF model, and a BERT structure in the BERT-CRF model is the PowerBERT mentioned in the embodiment 1; firstly, inputting a text into a pre-trained BERT model to obtain coding information of the text; then the coding information is used as the input of a CRF structure, and the label corresponding to each word is obtained through a transfer matrix; and finally, entity extraction of the operation and maintenance text of the power transformer is realized, and the electric power professional vocabulary in the operation and maintenance text is identified.
In order to train the model to complete the electric power text entity extraction task, the electric power text entity extraction data set needs to be constructed first. The embodiment collects the operation rules and the general system of the electric power equipment, the relevant technical standards of the electric power equipment and the defect records of the PMS electric power equipment in the last decade as the training corpora.
The process of constructing the power text entity extraction data set is as follows: according to the electric power professional vocabulary, sentences containing 2 to 8 professional terms are searched for in the training text, and 840 samples are obtained after manual checking and comparison. The samples are randomly split 50% / 25% / 25%: 50% form the training set, 25% the validation set and 25% the test set. That is, the 50% of training sentences are input into the model, the model searches for the electric power professional vocabulary in them, and back propagation is performed according to the error; after each round of training, the validation set is used to verify the generalization ability of the current model and decide whether to stop training; after training is completed, the test set is used for final verification.
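A minimal sketch of this 50% / 25% / 25% random split, assuming the 840 labelled sentences are held in a Python list, is:

```python
import random

def split_dataset(samples, seed=42):
    """Randomly split the labelled sentences 50% / 25% / 25% into train / validation / test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: n // 2]                  # 50% training set
    valid = shuffled[n // 2 : (3 * n) // 4]     # 25% validation set
    test = shuffled[(3 * n) // 4 :]             # 25% test set
    return train, valid, test
```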
The power text entity extraction task is realized by setting the training parameters reasonably; the training parameters of the BERT-CRF model are as follows:
for the BERT-series models, during fine-tuning the number of training epochs is 5, the training batch size is 4, the AdamW optimizer is used with a learning rate of 5e-5, the warmup proportion is 0.1, weight_decay is 0.01, adam_epsilon is 1e-8 and dropout is 0.1; finally, BIOES labels are used for entity identification.
In this embodiment, a long short-term memory network (LSTM), the general BERT, the power text BERT and the PowerBERT are selected as encoders; their encoding results are input into CRF networks with the same structure and compared on the same training set, with the test results shown in Table 4.
Table 4 Electric power text entity extraction experimental results
As can be seen in Table 4, the general BERT is 10% -20% improved in both the recall rate and accuracy of the validation set and the test set compared to the conventional LSTM model. Compared with other models, the PowerBERT through the power text and the improved training strategy achieves the best indexes.
A deep learning model usually needs to be trained on a large number of labelled samples to obtain strong generalization ability, but acquiring such data usually requires manual labelling, which is time-consuming and labour-intensive. If a deep learning model with strong classification performance can be trained with only a small number of training samples, practical engineering application of the model becomes much easier. To verify the entity extraction ability of each model with a small number of labelled training samples, this embodiment uses four models, namely the long short-term memory network (LSTM), the general BERT, the power text BERT and the PowerBERT, to analyse power text entity extraction performance under different numbers of training samples (30, 50 and 80); the test results are shown in Table 5.
TABLE 5 Power text entity extraction experiments with different numbers of training samples
Analysing Table 5, it can be found that when the number of samples is no more than 50, the LSTM results are better than the general BERT but worse than the power text BERT and the PowerBERT. This indicates that although the general BERT has more parameters, it has not learned from power text and therefore lacks power-domain knowledge, so its transfer effect is unsatisfactory; the power text BERT and the PowerBERT possess power-domain knowledge and therefore perform better on small-sample data. As the amount of data increases, the results of the LSTM, the general BERT, the power text BERT and the PowerBERT all improve.
When the number of samples is 80, the recall of the PowerBERT reaches 81.26% and its accuracy reaches 81.6%, both exceeding 80%, and its F1 score reaches 0.8015, all higher than every other model. Compared with the LSTM model, recall is increased by 11.91% and accuracy by 8.26%; compared with the general BERT, recall is improved by 10.06% and accuracy by 4.43%; compared with the power text BERT, recall is improved by 4.40% and accuracy by 5.85%, a significant improvement in both.
As the number of training samples continues to increase, the performance of the PowerBERT continues to lead the other models. In practical applications it is easy to obtain more than 80 labelled samples, so the PowerBERT can perform better than the LSTM, the power text BERT and the general BERT in real engineering applications.
Model comparison study:
when the electric text entity is extracted based on the PowerBERT-CRF model, the text is coded through the PowerBERT, and then a final entity extraction result is obtained through a CRF layer. In order to verify whether the effect of extracting the power text entity is influenced by accessing other neural network structures (such as a linear layer and an LSTM layer) after the PowerBERT and inputting the output of the neural network structures into the CRF, the embodiment takes the PowerBERT as text coding, inputs the output of the BERT into different neural network structures for further coding, and finally inputs the output into the CRF for entity extraction and compares the classification precision of the entities. The model types tested in this example are PowerBERT-CRF (meaning that the CRF layer is directly accessed after PowerBERT), powerBERT-Liner-CRF (meaning that a linear layer is accessed after PowerBERT, and then the output of the linear layer is input to the CRF layer), and PowerBERT-LSTM-CRF (meaning that an LSTM layer is accessed after PowerBERT for further encoding, and then the output of LSTM is input to the CRF layer).
The results of the experiment are shown in table 6 below:
table 6 electric power text entity extraction experiment for modifying number of output layers
Figure SMS_32
As can be seen from the table above, in terms of recall the PowerBERT-LSTM-CRF structure is highest, improving on the other two structures by 2% to 3%, while there is little difference between the PowerBERT-CRF and the PowerBERT-Liner-CRF. In terms of accuracy the PowerBERT-Liner-CRF is highest, more than 1% above the PowerBERT-CRF and more than 2% above the PowerBERT-LSTM-CRF. Although the PowerBERT-LSTM-CRF structure achieves the best overall effect, the LSTM is a recurrent neural network whose next output depends on the previous result, so the network cannot be computed in parallel; GPU utilization is low, and model training and prediction time increase greatly. Therefore, when model inference speed and memory are the priority and the requirements on recall and accuracy are not high, the PowerBERT-CRF can be considered first.

Claims (9)

1. A semantic analysis method for power text, characterized in that: the method comprises the following steps:
step 1: constructing a PowerBERT power text model, wherein the PowerBERT power text model comprises an embedding layer, multiple layers of Transformer-Encoders and an output layer; the embedding layer is used for converting the characters and positions in the input power text into corresponding vector information; the Transformer-Encoders are used for capturing the intrinsic meaning in the vector information output by the embedding layer to obtain an encoding matrix of the electric power text;
step 2: training the PowerBERT power text model based on a multiple text crossing mechanism and a real-time dynamic mask mechanism to obtain a trained PowerBERT power text model;
step 3: inputting the power text to be subjected to semantic analysis into the trained PowerBERT power text model to obtain a semantic analysis result.
2. The electric power text semantic analysis method according to claim 1, characterized in that: the embedding layer includes: a word embedding model for breaking down the input power text into word vectors, a block embedding model for distinguishing which block a word belongs to, and a position embedding model for representing the absolute position of each word.
3. The electric power text semantic analysis method according to claim 1, characterized in that: each layer of the Transformer-Encoder comprises a Transformer structure and an Encoder structure; the input of the first-layer Transformer-Encoder is the output of the embedding layer, the input of each subsequent Transformer-Encoder is the output of the previous Transformer-Encoder, and the output of the last Transformer-Encoder is the input of the output layer.
4. The electric power text semantic analysis method according to claim 3, characterized in that: the Transformer structure comprises a multi-head attention layer, wherein the multi-head attention layer is formed by combining a plurality of self-attention layers;
assume the input of the multi-head attention layer is the matrix X_MHA; the inputs Q, K and V of each self-attention layer are derived according to the following equation:
Q = X_MHA · W_Q,  K = X_MHA · W_K,  V = X_MHA · W_V  (2)
in the formula, W_Q, W_K, W_V are the transformation parameter matrices to be optimized;
the output of the self-attention layer is expressed as:
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V  (3)
in the formula, d_k is the dimension of the input matrix and softmax(·) is the activation function;
and splicing the outputs of the multiple self-attention layers, and performing linear transformation to obtain the output of the current Transformer structure.
5. The electric power text semantic analysis method according to claim 3, characterized in that: the Encoder structure includes a normalization layer and a fully connected layer, represented as:
L_1 = LayerNorm(X_Encoder + MHA(X_Encoder)),  Y_Encoder = LayerNorm(L_1 + FeedForward(L_1))  (4)
in the formula, X_Encoder represents the input of the Encoder structure, MHA(X_Encoder) represents the output of the Transformer structure, LayerNorm(·) represents the layer normalization operation, FeedForward(L_1) denotes the output of the fully connected layer, Y_Encoder represents the output of the Encoder layer, and L_1 represents an intermediate variable.
6. The electric power text semantic analysis method according to claim 5, characterized in that: the fully connected layer comprises two layers of fully connected neural networks, wherein ReLU is adopted as the activation function in the first layer and no activation function is used in the second layer; the fully connected layer is expressed as:
FeedForward(X) = max(0, X·W_1 + b_1)·W_2 + b_2  (5)
wherein X represents the input of the fully connected layer, and W_1, W_2, b_1, b_2 represent the parameters to be optimized in the fully connected layer.
7. The electric power text semantic analysis method according to claim 1, characterized in that: the step 2 specifically comprises:
splitting the training process into two threads by adopting a real-time dynamic masking mechanism, namely a CPU thread and a GPU thread, wherein the CPU thread performs the masking operation in real time based on the multiple text crossing mechanism and the GPU thread trains the masked samples; while the GPU thread trains one batch of samples, the CPU thread performs the masking operation on the next batch of samples.
8. The electric power text semantic analysis method according to claim 7, characterized in that: the masking operation performed in real time based on the multiple text crossing mechanism specifically comprises:
assume the original power text sequence is a_1, a_2, a_3, …, a_n; after masking based on the multiple text crossing mechanism, the obtained sequence is a_1, …, [mask], …, a_n, in which the character at each subscript in T is replaced;
where the set of subscripts of the masked characters is T = {t_1, t_2, …, t_s}, s denotes the total number of masked characters, any masked subscript satisfies t_n < N, and N represents the maximum number of masked characters;
each training input of PowerBERT is expressed as:
[CLS] a_1 … [mask] … a_n [SEP]  (6)
wherein [CLS] represents the starting position of the corpus and [SEP] represents the division between multiple corpora;
assuming that the length of the input sample of each PowerBERT training step is M, if the length of the training text sequence is less than M-2, a padding operation is performed, the padding mark being [PAD]; the input of the training text is then represented as:
[CLS] a_1 … [mask] … a_n [SEP] [PAD] … [PAD]  (7)
if the sequence length of the training text is larger than M-2, the training text is truncated to M-2 before being input;
of the characters selected for masking, 80% are replaced with [mask], 10% are replaced with random words, and 10% are left unchanged, finally obtaining the masked sample;
the multiple text crossing mechanism comprises:
judging whether the training text comprises electric power professional vocabulary; if so, the masking operation adopts 40% character mask + 30% entity mask + 30% fragment mask; if not, the masking operation adopts 40% character mask + 60% fragment mask.
9. The electric power text semantic analysis method according to claim 7, characterized in that: the training of the masked sample specifically includes:
according to the subscript set T = {t_1, t_2, …, t_s} of the masked characters, the columns corresponding to the masked characters are extracted from the output sequence Y_PowerBERT of the PowerBERT power text model and the output sequence Y_Embeddings of the embedding layer, forming the masked representations of the output sequences, denoted Y'_PowerBERT and Y'_Embeddings;
the probability distribution matrix P_mask on the vocabulary corresponding to the mask positions is calculated according to formula (8):
P_mask = softmax(Y'_PowerBERT · W_t^T + b)  (8)
in the formula, W_t represents the word vector matrix and b is the bias coefficient to be trained; P_mask is the probability that each output masked character in Y'_PowerBERT is identical with the word at each position y_i in the vocabulary;
comparing P_mask with the distribution w_t of the true word over the positions y_i in the vocabulary, the cross entropy loss H(P_mask, w_t) is calculated according to equation (9):
H(P_mask, w_t) = -Σ_i w_t(y_i) · log P_mask(y_i)  (9)
and the cross entropy loss is used with a back propagation algorithm to optimize the parameters in each layer of Transformer-Encoder in the PowerBERT power text model.
CN202211693288.6A 2022-12-28 2022-12-28 Electric power text semantic analysis method Pending CN115936009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211693288.6A CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211693288.6A CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Publications (1)

Publication Number Publication Date
CN115936009A true CN115936009A (en) 2023-04-07

Family

ID=86698942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211693288.6A Pending CN115936009A (en) 2022-12-28 2022-12-28 Electric power text semantic analysis method

Country Status (1)

Country Link
CN (1) CN115936009A (en)

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111694924B (en) Event extraction method and system
CN112528676B (en) Document-level event argument extraction method
CN110516055A (en) A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN110189749A (en) Voice keyword automatic identifying method
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN110009025A (en) A kind of semi-supervised additive noise self-encoding encoder for voice lie detection
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
CN116483991A (en) Dialogue abstract generation method and system
CN117421595A (en) System log anomaly detection method and system based on deep learning technology
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN117332788B (en) Semantic analysis method based on spoken English text
CN116029295A (en) Electric power text entity extraction method, defect positioning method and fault diagnosis method
CN116975161A (en) Entity relation joint extraction method, equipment and medium of power equipment partial discharge text
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN114579706B (en) Automatic subjective question review method based on BERT neural network and multi-task learning
CN115936009A (en) Electric power text semantic analysis method
CN113157866B (en) Data analysis method, device, computer equipment and storage medium
CN115221284A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN115292490A (en) Analysis algorithm for policy interpretation semantics
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN114298052A (en) Entity joint labeling relation extraction method and system based on probability graph
CN113627146B (en) Knowledge constraint-based two-step refute a rumour text generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination