CN116028785A - Training method and device for power grid time sequence data feature extraction model - Google Patents

Training method and device for power grid time sequence data feature extraction model

Info

Publication number
CN116028785A
Authority
CN
China
Prior art keywords
time sequence
sequence data
power grid
data
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211705050.0A
Other languages
Chinese (zh)
Inventor
刘浩
甘津瑞
吴鹏
刘鑫
谢涛
陈庆涛
陈凡
罗超
周攀
常文婧
马欢
施雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super High Voltage Branch Of State Grid Anhui Electric Power Co ltd
State Grid Smart Grid Research Institute Co ltd
State Grid Anhui Electric Power Co Ltd
Original Assignee
Super High Voltage Branch Of State Grid Anhui Electric Power Co ltd
State Grid Smart Grid Research Institute Co ltd
State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super High Voltage Branch Of State Grid Anhui Electric Power Co ltd, State Grid Smart Grid Research Institute Co ltd, State Grid Anhui Electric Power Co Ltd filed Critical Super High Voltage Branch Of State Grid Anhui Electric Power Co ltd
Priority to CN202211705050.0A
Publication of CN116028785A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The embodiment of the invention relates to a training method and device for a power grid time sequence data feature extraction model, wherein the training method comprises the following steps: acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhanced time sequence data; performing a numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data; performing contrast learning training on an initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain a contrast loss; performing mask prediction on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model. In this way, self-supervised pre-training is performed on the time sequence data using both contrast learning and mask prediction, jointly modeling discriminative features and context information, thereby improving the generalization capability of the pre-trained time sequence feature extraction network.

Description

Training method and device for power grid time sequence data feature extraction model
Technical Field
The embodiment of the invention relates to the field of data processing, and in particular to a training method and device for a power grid time sequence data feature extraction model.
Background
In recent years, time series data analysis has played an important role in many fields, including financial markets, demand forecasting, and climate modeling. Power grid scenarios are equipped with abundant sensor devices that generate massive volumes of online monitoring time sequence data. Analysis techniques such as time sequence forecasting and anomaly detection can effectively detect abnormal states in the power grid, improving the intelligence of fault diagnosis, preventing serious faults in advance, and strongly supporting the construction of new power systems.
However, existing time sequence analysis methods do not specifically address the noise in time sequence data, which causes the model to overfit the training data, impairs its generalization capability, and hinders further improvement of time sequence model performance.
Disclosure of Invention
In view of the above, in order to solve the above technical problems or some technical problems, an embodiment of the present invention provides a training method and apparatus for a power grid time series data feature extraction model.
In a first aspect, an embodiment of the present invention provides a training method for a power grid time sequence data feature extraction model, including:
Acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data;
performing numerical conversion operation on the original power grid time sequence data to obtain second enhanced time sequence data;
performing contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain contrast loss;
performing mask prediction on the first enhanced time sequence data through an initial model to obtain reconstruction loss;
and optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain a power grid time sequence data characteristic extraction model.
In one possible embodiment, the method further comprises:
determining a time step index of a fixed proportion and determining a starting index position;
expanding a time step index with a fixed length based on a time step index with a fixed proportion and a starting index position to obtain a plurality of time sequence data segments;
and carrying out zero setting operation on the original power grid time sequence data corresponding to the time sequence data segments to obtain first enhanced time sequence data after mask enhancement processing.
In one possible embodiment, the method further comprises:
Scaling the original power grid time sequence data;
performing translation processing on the original power grid time sequence data subjected to the scaling processing;
and carrying out dithering treatment on the original power grid time sequence data after the translation treatment to obtain second enhanced time sequence data after the numerical conversion operation.
In one possible embodiment, the method further comprises:
extracting a first query feature corresponding to the first enhanced time sequence data and extracting a second query feature corresponding to the second enhanced time sequence data;
extracting a first query mapping feature based on the first query feature and a second query mapping feature based on the second query feature;
extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features;
extracting a first key feature corresponding to the first enhanced time sequence data and extracting a second key feature corresponding to the second enhanced time sequence data;
a first key map feature is extracted based on the first key feature and a second key map feature is extracted based on the second key feature.
In one possible embodiment, the method further comprises:
Calculating a first contrast loss based on the first query prediction feature and the second key mapping feature;
a second contrast loss is calculated based on the second query prediction feature and the first key mapping feature.
In one possible embodiment, the method further comprises:
extracting coding features of the first enhancement time sequence data;
predicting the first enhancement time sequence data through a reconstruction decoder based on the coding characteristics to obtain prediction data;
and determining reconstruction loss based on the prediction data and the original data corresponding to the first enhancement time sequence data.
In one possible embodiment, the method further comprises:
optimizing training parameters of the initial model through a first formula, wherein the first formula is: L = L_con + α · L_rec;
wherein L_con is the contrast loss, L_rec is the reconstruction loss, and α is the weight balancing the two losses.
In a second aspect, an embodiment of the present invention provides a method for extracting a power grid time sequence data feature, including:
acquiring original power grid time sequence data of a power grid to be subjected to feature extraction;
and inputting the original power grid time sequence data into a power grid time sequence data feature extraction model to obtain power grid time sequence data features of the power grid.
In a third aspect, an embodiment of the present invention provides a training device for a power grid time sequence data feature extraction model, including:
the data processing module is used for acquiring original power grid time sequence data, and carrying out mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data;
the data processing module is further used for performing numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data;
the training module is used for carrying out contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain contrast loss;
the training module is further used for carrying out mask prediction on the first enhanced time sequence data through an initial model to obtain reconstruction loss;
and the optimization module is used for optimizing the training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain a power grid time sequence data characteristic extraction model.
In a possible embodiment, the data processing module is further configured to determine a fixed proportion of the time step index and determine a starting index position; expanding a time step index with a fixed length based on a time step index with a fixed proportion and a starting index position to obtain a plurality of time sequence data segments; and carrying out zero setting operation on the original power grid time sequence data corresponding to the time sequence data segments to obtain first enhanced time sequence data after mask enhancement processing.
In a possible implementation manner, the data processing module is further configured to scale the original grid time sequence data; performing translation processing on the original power grid time sequence data subjected to the scaling processing; and carrying out dithering treatment on the original power grid time sequence data after the translation treatment to obtain second enhanced time sequence data after the numerical conversion operation.
In a possible implementation manner, the training module is further configured to extract a first query feature corresponding to the first enhanced time sequence data, and extract a second query feature corresponding to the second enhanced time sequence data; extracting a first query mapping feature based on the first query feature and a second query mapping feature based on the second query feature; extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features; extracting a first key feature corresponding to the first enhanced time sequence data and extracting a second key feature corresponding to the second enhanced time sequence data; a first key map feature is extracted based on the first key feature and a second key map feature is extracted based on the second key feature.
In one possible implementation, the training module is further configured to calculate a first contrast loss based on the first query prediction feature and the second key mapping feature; a second contrast loss is calculated based on the second query prediction feature and the first key mapping feature.
In a possible implementation manner, the training module is further configured to extract coding features of the first enhanced time sequence data; predicting the first enhancement time sequence data through a reconstruction decoder based on the coding characteristics to obtain prediction data; and determining reconstruction loss based on the prediction data and the original data corresponding to the first enhancement time sequence data.
In a possible implementation manner, the optimization module is specifically configured to optimize the training parameters of the initial model through a first formula, where the first formula is: L = L_con + α · L_rec; wherein L_con is the contrast loss, L_rec is the reconstruction loss, and α is the weight balancing the two losses.
In a fourth aspect, an embodiment of the present invention provides a power grid time series data feature extraction device, including:
the acquisition module is used for acquiring original power grid time sequence data of the power grid to be subjected to feature extraction;
The extraction module is used for inputting the original power grid time sequence data into a power grid time sequence data feature extraction model to obtain power grid time sequence data features of the power grid;
the power grid time sequence data characteristic extraction model is obtained through training by the method in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer apparatus, including a processor and a memory, wherein the memory stores a training program of the power grid time sequence data feature extraction model, and the processor executes the program to implement the training method of the power grid time sequence data feature extraction model described in the first aspect and the power grid time sequence data feature extraction method described in the second aspect.
In a sixth aspect, an embodiment of the present invention provides a storage medium, including: the storage medium stores one or more programs executable by one or more processors to implement the training method of the grid time series data feature extraction model described in the first aspect and the grid time series data feature extraction method described in the second aspect.
According to the training scheme of the power grid time sequence data feature extraction model provided by the embodiment of the invention, original power grid time sequence data are acquired and subjected to mask enhancement processing to obtain first enhanced time sequence data; a numerical transformation operation is performed on the original power grid time sequence data to obtain second enhanced time sequence data; contrast learning training is performed on an initial model based on the first and second enhanced time sequence data to obtain a contrast loss; mask prediction is performed on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and training parameters of the initial model are optimized based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model. Existing time sequence analysis methods do not specifically address the noise in time sequence data, so the model overfits the training data and loses generalization capability. In contrast, this scheme performs self-supervised pre-training on the time sequence data using both contrast learning and mask prediction, jointly modeling discriminative features and context information, thereby improving the generalization capability of the pre-trained time sequence feature extraction network.
According to the power grid time sequence data feature extraction scheme provided by the embodiment of the invention, original power grid time sequence data of a power grid to be subjected to feature extraction are acquired and input into the power grid time sequence data feature extraction model to obtain the power grid time sequence data features of the power grid. Because the model is trained to specifically address the noise in time sequence data while jointly modeling discriminative features and context information, the scheme improves the generalization capability of the trained time sequence feature extraction network and greatly improves the training speed and prediction accuracy of various downstream power grid time sequence tasks.
Drawings
Fig. 1 is a flow chart of a training method of a power grid time sequence data feature extraction model according to an embodiment of the present invention;
fig. 2 is a flow chart of a training method of another power grid time sequence data feature extraction model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a power grid time sequence data feature extraction model according to an embodiment of the present invention;
FIG. 4 is a diagram of an encoder model architecture according to an embodiment of the present invention;
FIG. 5 is a mask prediction model architecture diagram according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a training device for a power grid time sequence data feature extraction model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a power grid time sequence data feature extraction device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
For the purpose of facilitating an understanding of the embodiments of the present invention, reference will now be made to the following description of specific embodiments, taken in conjunction with the accompanying drawings, which are not intended to limit the embodiments of the invention.
Fig. 1 is a flow chart of a training method of a power grid time sequence data feature extraction model according to an embodiment of the present invention, as shown in fig. 1, the method specifically includes:
S11, acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data.
The embodiment of the invention constructs a training method for a power grid time sequence data feature extraction model based on the joint modeling of contrast learning and mask prediction. First, original power grid time sequence data are acquired and subjected to mask enhancement processing, which specifically includes: randomly selecting a fixed proportion of non-overlapping time step indices from the input original power grid time sequence data; then, taking each selected index as a starting position, extending it over a fixed-length run of time steps to form a time sequence data segment; and finally, setting all values at the collected index positions to zero to obtain the first enhanced time sequence data.
For example, given 100 points of original power grid time sequence data, suppose the mask enhancement processing is to cover 80 points: starting from the first data point, every 5 consecutive points form one time sequence data segment, and segments are selected until 80 points are covered; finally, the data values at the selected index positions are set to zero, i.e. the real data are masked, yielding the first enhanced time sequence data.
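A minimal sketch of this mask enhancement step in Python (an illustrative implementation, not code from the patent; the function name, the 15% mask ratio, and the segment length of 5 are assumed defaults):

```python
import numpy as np

def mask_augment(x: np.ndarray, mask_ratio: float = 0.15, seg_len: int = 5) -> np.ndarray:
    """Zero out non-overlapping fixed-length segments covering ~mask_ratio of the series."""
    x_aug = x.copy()
    n = len(x_aug)
    n_segments = max(1, int(n * mask_ratio) // seg_len)
    # Candidate start positions spaced seg_len apart, so extended segments cannot overlap.
    candidates = np.arange(0, n - seg_len + 1, seg_len)
    starts = np.random.choice(candidates, size=n_segments, replace=False)
    for s in starts:
        x_aug[s:s + seg_len] = 0.0  # mask the real values at these index positions
    return x_aug
```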
And S12, performing numerical conversion operation on the original power grid time sequence data to obtain second enhanced time sequence data.
The numerical transformation first scales the original power grid time sequence data by a single randomly sampled value ε ~ N(0, 0.5):

x'_t = ε · x_t

then translates the scaled data by another single randomly sampled value ε' ~ N(0, 0.5):

x''_t = x'_t + ε'

and finally dithers the translated data at each time step with n randomly sampled values ε_i ~ N(0, 0.5):

x'''_i = x''_i + ε_i

where n is the length of the time sequence data, yielding the second enhanced time sequence data after the numerical transformation operation.
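A sketch of this numerical transformation (illustrative only; it treats 0.5 as the standard deviation of the Gaussian noise, which is an assumption, and composes the three steps in the order described):

```python
import numpy as np

def numeric_augment(x: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """Scale, then translate, then jitter each time step with Gaussian noise."""
    scale = np.random.normal(0.0, sigma)            # single scaling value
    shift = np.random.normal(0.0, sigma)            # single translation value
    jitter = np.random.normal(0.0, sigma, len(x))   # n values, one per time step
    return (x * scale + shift) + jitter
```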
And S13, performing contrast learning training on the initial model based on the first enhancement time sequence data and the second enhancement time sequence data to obtain contrast loss.
The contrast learning training phase includes two branches, a query branch and a key branch, and five core modules: a query encoder, a query mapping head, a query prediction head, a key encoder, and a key mapping head. The model architecture is shown in fig. 3 and is described in detail below.
For the query branch: first, the mask-enhanced time sequence data serves as the first enhanced time sequence data and the numerically transformed time sequence data serves as the second enhanced time sequence data, and the query encoder extracts the first and second query features from them; then, the query mapping head extracts the first and second query mapping features from the query features; finally, the query prediction head extracts the first and second query prediction features from the query mapping features.
For the key branch: first, the key encoder extracts the first and second key features from the first and second enhanced time sequence data; then, the key mapping head extracts the first and second key mapping features from the key features. Finally, one contrast loss is calculated between the first query prediction feature and the second key mapping feature, and another between the second query prediction feature and the first key mapping feature.
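The flow through the two branches can be sketched as follows (a schematic PyTorch-style forward pass; the module and variable names mirror the five core modules above but are otherwise assumptions):

```python
import torch

def contrastive_forward(x_mask, x_num, q_enc, q_map, q_pred, k_enc, k_map):
    """Query branch: encoder -> mapping head -> prediction head.
    Key branch: encoder -> mapping head, detached from the gradient."""
    # Query branch on both augmented views
    z1 = q_pred(q_map(q_enc(x_mask)))   # first query prediction feature
    z2 = q_pred(q_map(q_enc(x_num)))    # second query prediction feature
    # Key branch on both augmented views (updated by EMA, not by backprop)
    with torch.no_grad():
        k1 = k_map(k_enc(x_mask))       # first key mapping feature
        k2 = k_map(k_enc(x_num))        # second key mapping feature
    # Contrast losses are then computed on the pairs (z1, k2) and (z2, k1).
    return z1, z2, k1, k2
```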
S14, carrying out mask prediction on the first enhanced time sequence data through an initial model to obtain reconstruction loss.
The mask prediction phase includes two core modules: a query encoder and a reconstruction decoder. The query encoder is shared with the contrast learning phase, and the reconstruction decoder is a Transformer encoding module; the mask prediction model structure is shown in fig. 5. First, the query encoder extracts coding features from the first enhanced time sequence data; then, the reconstruction decoder predicts the time sequence values of the masked regions from the coding features; finally, the reconstruction loss between the mask predictions and the real time sequence data is calculated for parameter optimization of the mask prediction modules.
And S15, optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain a power grid time sequence data feature extraction model.
The contrast learning training and mask prediction stages correspond to the contrast loss and the reconstruction loss respectively, and the two losses simultaneously optimize the whole model end to end. The overall loss function is calculated as shown in Equation 1:
L = L_con + α · L_rec    (Equation 1)
where L_con is the contrast loss, L_rec is the reconstruction loss, and α is the weight balancing the two losses.
Training parameters of the initial model are optimized according to the computed loss; for example, when the loss has not yet reached the expected value, the model training parameters are adjusted and training continues until the loss reaches the expected value, at which point training of the representation model is complete and the power grid time sequence data feature extraction model is obtained.
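One training step combining the two losses according to Equation 1 might look like the following sketch (illustrative; `model`, `loader`, the two loss helpers, the tensor versions of the augmentations, and α = 0.5 are all assumed):

```python
import torch

alpha = 0.5  # weight balancing the two losses (assumed value)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for x in loader:                                       # batches of original grid time series
    x_mask = mask_augment_t(x)                         # first enhanced time sequence data
    x_num = numeric_augment_t(x)                       # second enhanced time sequence data
    l_con = contrast_loss_fn(model, x_mask, x_num)     # contrast learning stage
    l_rec = reconstruction_loss_fn(model, x_mask, x)   # mask prediction stage
    loss = l_con + alpha * l_rec                       # Equation 1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```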
The training method of the power grid time sequence data feature extraction model provided by the embodiment of the invention integrates contrast learning and mask prediction for time sequence data, exploiting the complementarity of discriminative feature learning and context modeling to pre-train the power grid time sequence data model and extract general features with strong generalization capability, greatly improving the training speed and prediction accuracy of various downstream power grid time sequence tasks, including time sequence forecasting, time sequence classification, anomaly detection, and the like.
According to the training method of the power grid time sequence data feature extraction model provided by the embodiment of the invention, original power grid time sequence data are acquired and subjected to mask enhancement processing to obtain first enhanced time sequence data; a numerical transformation operation is performed on the original power grid time sequence data to obtain second enhanced time sequence data; contrast learning training is performed on an initial model based on the first and second enhanced time sequence data to obtain a contrast loss; mask prediction is performed on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and training parameters of the initial model are optimized based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model. Existing time sequence analysis methods do not specifically address the noise in time sequence data, so the model overfits the training data and loses generalization capability. In contrast, the method performs self-supervised pre-training on the time sequence data using both contrast learning and mask prediction, jointly modeling discriminative features and context information, thereby improving the generalization capability of the pre-trained time sequence feature extraction network.
Fig. 2 is a flow chart of another training method of a power grid time sequence data feature extraction model according to an embodiment of the present invention, as shown in fig. 2, the method specifically includes:
s21, determining a time step index of a fixed proportion and determining a starting index position.
S22, expanding a time step index with a fixed length based on the time step index with a fixed proportion and the starting index position to obtain a plurality of time sequence data segments.
S23, carrying out zero setting operation on original power grid time sequence data corresponding to the time sequence data segments to obtain first enhancement time sequence data after mask enhancement processing.
Hereinafter, S21 to S23 will be collectively described:
First, original power grid time sequence data are acquired and subjected to mask enhancement processing, which specifically includes: randomly selecting a fixed proportion of non-overlapping time step indices from the input original power grid time sequence data; then, taking each selected index as a starting position, extending it over a fixed-length run of time steps to form a time sequence data segment; and finally, setting all values at the collected index positions to zero to obtain the first enhanced time sequence data.
For example, given 100 points of original power grid time sequence data, suppose the mask enhancement processing is to cover 60 points: starting from the first data point, every 3 consecutive points form one time sequence data segment, and segments are selected until 60 points are covered; finally, the data values at the selected index positions are set to zero, i.e. the real data are masked, yielding the first enhanced time sequence data.
S24, performing scaling processing on the original power grid time sequence data.
S25, carrying out translation processing on the original power grid time sequence data subjected to the scaling processing.
S26, dithering is carried out on the original power grid time sequence data after the translation processing, and second enhanced time sequence data after the numerical value transformation operation is obtained.
Hereinafter, S24 to S26 will be collectively described:
The numerical transformation first scales the original power grid time sequence data by a single randomly sampled value ε ~ N(0, 0.5):

x'_t = ε · x_t

then translates the scaled data by another single randomly sampled value ε' ~ N(0, 0.5):

x''_t = x'_t + ε'

and finally dithers the translated data at each time step with n randomly sampled values ε_i ~ N(0, 0.5):

x'''_i = x''_i + ε_i

where n is the length of the time sequence data, yielding the second enhanced time sequence data after the numerical transformation operation.
S27, extracting first query features corresponding to the first enhanced time sequence data and extracting second query features corresponding to the second enhanced time sequence data.
S28, extracting first query mapping features based on the first query features, and extracting second query mapping features based on the second query features.
S29, extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features.
S210, extracting first key features corresponding to the first enhancement time sequence data and extracting second key features corresponding to the second enhancement time sequence data.
S211, extracting a first key mapping feature based on the first key feature, and extracting a second key mapping feature based on the second key feature.
Hereinafter, S27 to S211 will be collectively described:
the contrast learning training phase includes two branches: query branches and key branches, five core modules: the model architecture is shown in fig. 3, and the query encoder, the query mapping head, the query pre-header, the key encoder and the key mapping head are all described in detail below.
For query branches: firstly, the time sequence data after mask enhancement is first enhancement time sequence data, the time sequence data after numerical conversion is second enhancement time sequence data, the first enhancement time sequence data can extract first query characteristics through a query encoder, and the second enhancement time sequence data can extract second query characteristics through the query encoder; then, the first query feature extracts the first query mapping feature through the query mapping head, and the second query feature extracts the second query mapping feature through the query mapping head; finally, the first query mapping feature extracts a first query prediction feature through the query prediction head, and the second query mapping feature extracts a second query prediction feature through the query prediction head.
For the key branches: firstly, extracting first key features from first enhancement time sequence data through a key encoder, and extracting second key features from second enhancement time sequence data through the key encoder; then, the first key feature extracts the first key map feature through the key map header, and the second key feature extracts the second key map feature through the key map header.
S212, calculating a first contrast loss based on the first query prediction feature and the second key mapping feature.
S213, calculating a second contrast loss based on the second query prediction feature and the first key mapping feature.
The first contrast loss and the second contrast loss are calculated by Equation 2:

L_con = -log( exp(sim(z_i, z̃_i)) / Σ_{j=1}^{K} exp(sim(z_i, z̃_j)) )    (Equation 2)

where z_i is the query prediction feature, z̃_i is the corresponding key mapping feature, z̃_j ranges over the key mapping features of the data in the same training batch (the other samples act as negatives), sim(·, ·) is a similarity measure such as cosine similarity, and K is the number of training samples in the batch.
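A batched sketch of this contrast loss (the standard InfoNCE form; the use of cosine similarity and the absence of a temperature parameter are assumptions where the patent's original formula image is not recoverable):

```python
import torch
import torch.nn.functional as F

def contrast_loss(z: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """z: query prediction features (K, d); k: key mapping features (K, d).
    Positive pairs sit on the diagonal; other batch samples are negatives."""
    z = F.normalize(z, dim=1)
    k = F.normalize(k, dim=1)
    logits = z @ k.t()                       # (K, K) pairwise cosine similarities
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)   # -log softmax of the positive pair
```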
After optimization by the contrast loss, the parameters of the query encoder and the query mapping head are used to update the parameters of the key encoder and the key mapping head through an exponential moving average.
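The exponential moving average update can be sketched as follows (the momentum value 0.99 is an assumed hyperparameter):

```python
import torch

@torch.no_grad()
def ema_update(key_module: torch.nn.Module, query_module: torch.nn.Module,
               momentum: float = 0.99) -> None:
    """Move key encoder / key mapping head parameters towards the query modules."""
    for k_param, q_param in zip(key_module.parameters(), query_module.parameters()):
        k_param.mul_(momentum).add_(q_param, alpha=1.0 - momentum)
```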
It should be noted that the query encoder and the key encoder adopt the same network structure: each is a Transformer encoding module. The time sequence value at each time step is passed through a linear mapping encoding plus a positional encoding and then fed into the subsequent multi-head attention layer and feed-forward layer to encode the features of the time sequence data; the specific encoder model architecture is shown in fig. 4. The query mapping head and the key mapping head adopt the same network structure, comprising three neural network layers, each consisting of a fully connected layer, layer normalization, and a nonlinear activation function; the fully connected dimension of the intermediate layers is 1024 and that of the output layer is 256. The query prediction head has the same network structure as the mapping heads, the only difference being that it has only two neural network layers.
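These modules map onto standard building blocks roughly as follows (a sketch; the embedding width, head count, and encoder depth are assumptions, while the 1024/256 head dimensions and the 3-layer/2-layer head structure follow the description):

```python
import torch.nn as nn

d_model = 256  # assumed embedding width

encoder = nn.Sequential(
    nn.Linear(1, d_model),  # linear mapping encoding of each time-step value
    # a positional encoding would be added to the embeddings here
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=4,
    ),
)

def head(num_layers: int, hidden: int = 1024, out: int = 256) -> nn.Sequential:
    """Each layer: fully connected -> layer normalization -> nonlinear activation."""
    blocks, dim = [], d_model
    for i in range(num_layers):
        width = out if i == num_layers - 1 else hidden
        blocks += [nn.Linear(dim, width), nn.LayerNorm(width), nn.ReLU()]
        dim = width
    return nn.Sequential(*blocks)

mapping_head = head(3)     # query/key mapping heads: three layers
prediction_head = head(2)  # query prediction head: two layers
```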
S214, extracting coding features of the first enhancement time sequence data.
S215, predicting the first enhancement time sequence data through a reconstruction decoder based on the coding characteristics to obtain prediction data.
S216, determining reconstruction loss based on the prediction data and the original data corresponding to the first enhancement time sequence data.
Hereinafter, S214 to S216 will be collectively described:
First, the query encoder extracts coding features from the first enhanced time sequence data; then, the reconstruction decoder predicts the time sequence values of the masked regions from the coding features; finally, the reconstruction loss between the mask predictions and the real time sequence data is calculated for parameter optimization of the mask prediction modules. The reconstruction loss is calculated as shown in Equation 3:
L_rec = MSE(pred[masked_index], target[masked_index])    (Equation 3)
where MSE is the mean squared error loss, pred and target are the reconstructed values and the target values respectively, and masked_index is the set of masked time step indices.
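Equation 3 restricted to the masked positions, as a short sketch:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(pred: torch.Tensor, target: torch.Tensor,
                        masked_index: torch.Tensor) -> torch.Tensor:
    """Mean squared error over the masked time steps only (Equation 3)."""
    return F.mse_loss(pred[masked_index], target[masked_index])
```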
Furthermore, the contrast loss and the reconstruction loss correspond to the contrast learning and mask prediction stages respectively, and the two losses simultaneously optimize the whole model end to end. The overall loss value is calculated by Equation 1 in the embodiment corresponding to fig. 1. Training parameters of the initial model are optimized according to the computed loss; for example, when the loss has not yet reached the expected value, the model training parameters are adjusted until it does, completing training of the representation model and yielding the power grid time sequence data feature extraction model.
According to the training method of the power grid time sequence data feature extraction model provided by the embodiment of the invention, original power grid time sequence data are acquired and subjected to mask enhancement processing to obtain first enhanced time sequence data; a numerical transformation operation is performed on the original power grid time sequence data to obtain second enhanced time sequence data; contrast learning training is performed on an initial model based on the first and second enhanced time sequence data to obtain a contrast loss; mask prediction is performed on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and training parameters of the initial model are optimized based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model. The method performs self-supervised pre-training on the time sequence data using both contrast learning and mask prediction, jointly modeling discriminative features and context information, thereby improving the generalization capability of the pre-trained time sequence feature extraction network. Through this jointly modeled self-supervised learning paradigm, the pre-trained feature extraction network can be fine-tuned for various downstream time sequence tasks, greatly improving their training speed and prediction accuracy.
Fig. 6 is a schematic structural diagram of a training device for a power grid time sequence data feature extraction model according to an embodiment of the present invention, which specifically includes:
the data processing module 601 is configured to obtain original power grid time sequence data, and perform mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data. The detailed description refers to the corresponding related description of the above method embodiments, and will not be repeated here.
The data processing module 601 is further configured to perform a numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data. The detailed description refers to the corresponding related description of the above method embodiments, and will not be repeated here.
The training module 602 is configured to perform contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data, so as to obtain contrast loss. The detailed description refers to the corresponding related description of the above method embodiments, and will not be repeated here.
The training module 602 is further configured to perform mask prediction on the first enhanced timing data through an initial model, so as to obtain a reconstruction loss. The detailed description refers to the corresponding related description of the above method embodiments, and will not be repeated here.
And the optimizing module 603 is configured to optimize the training parameters of the initial model based on the contrast loss and the reconstruction loss, so as to obtain a power grid time sequence data feature extraction model. The detailed description refers to the corresponding related description of the above method embodiments, and will not be repeated here.
The training device of the power grid time series data feature extraction model provided in this embodiment may be a training device of the power grid time series data feature extraction model as shown in fig. 6, and may perform all steps of the training method of the power grid time series data feature extraction model as shown in fig. 1-2, so as to achieve the technical effects of the training method of the power grid time series data feature extraction model as shown in fig. 1-2, and specifically please refer to the related description of fig. 1-2, which is not repeated herein for brevity.
The embodiment of the invention also provides a power grid time sequence data feature extraction method that uses the trained power grid time sequence data feature extraction model to extract the power grid time sequence data features of a power grid. It improves the generalization capability of the trained time sequence feature extraction network and greatly improves the training speed and prediction accuracy of various downstream power grid time sequence tasks, including time sequence forecasting, time sequence classification, anomaly detection, and the like.
Fig. 7 is a schematic structural diagram of a power grid time sequence data feature extraction device according to an embodiment of the present invention, which specifically includes:
An acquisition module 701, configured to acquire original grid time sequence data of a grid to be subjected to feature extraction;
the extraction module 702 is configured to input the original grid time sequence data to a grid time sequence data feature extraction model, so as to obtain a grid time sequence data feature of the grid.
The power grid time series data feature extraction device provided in this embodiment may be a power grid time series data feature extraction device as shown in fig. 7, and may perform all steps of the power grid time series data feature extraction method, thereby achieving the technical effects of the power grid time series data feature extraction method, and for brevity, description will not be repeated here.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention, and the computer device 800 shown in fig. 8 includes: at least one processor 801, memory 802, at least one network interface 804, and other user interfaces 803. The various components in computer device 800 are coupled together by a bus system 805. It is appreciated that the bus system 805 is used to enable connected communications between these components. The bus system 805 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration, the various buses are labeled as bus system 805 in fig. 8.
The user interface 803 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).
It will be appreciated that the memory 802 in embodiments of the invention can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 802 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory 802 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 8021 and application programs 8022.
The operating system 8021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 8022 includes various application programs such as a Media Player (Media Player), a Browser (Browser), and the like for realizing various application services. The program for implementing the method of the embodiment of the present invention may be contained in the application program 8022.
In the embodiment of the present invention, by calling a program or an instruction stored in the memory 802, specifically, a program or an instruction stored in the application program 8022, the processor 801 is configured to perform method steps provided by each method embodiment, for example, including:
acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data; performing numerical conversion operation on the original power grid time sequence data to obtain second enhanced time sequence data; performing contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain contrast loss; performing mask prediction on the first enhanced time sequence data through an initial model to obtain reconstruction loss; and optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain a power grid time sequence data characteristic extraction model.
In one possible embodiment, a fixed proportion of the time step index is determined and a starting index position is determined; expanding a time step index with a fixed length based on a time step index with a fixed proportion and a starting index position to obtain a plurality of time sequence data segments; and carrying out zero setting operation on the original power grid time sequence data corresponding to the time sequence data segments to obtain first enhanced time sequence data after mask enhancement processing.
In one possible implementation, scaling the raw grid-time-series data; performing translation processing on the original power grid time sequence data subjected to the scaling processing; and carrying out dithering treatment on the original power grid time sequence data after the translation treatment to obtain second enhanced time sequence data after the numerical conversion operation.
In one possible implementation manner, extracting a first query feature corresponding to the first enhanced time sequence data, and extracting a second query feature corresponding to the second enhanced time sequence data; extracting a first query mapping feature based on the first query feature and a second query mapping feature based on the second query feature; extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features; extracting a first key feature corresponding to the first enhanced time sequence data and extracting a second key feature corresponding to the second enhanced time sequence data; a first key map feature is extracted based on the first key feature and a second key map feature is extracted based on the second key feature.
In one possible implementation, a first contrast loss is calculated based on the first query prediction feature and the second key mapping feature; a second contrast loss is calculated based on the second query prediction feature and the first key mapping feature.
In one possible implementation, the coding features of the first enhanced temporal data are extracted; predicting the first enhancement time sequence data through a reconstruction decoder based on the coding characteristics to obtain prediction data; and determining reconstruction loss based on the prediction data and the original data corresponding to the first enhancement time sequence data.
In one possible embodiment, the training parameters of the initial model are optimized by a first formula: L = L_con + α · L_rec; wherein L_con is the contrast loss, L_rec is the reconstruction loss, and α is the weight balancing the two losses.
The method disclosed in the above embodiment of the present invention may be applied to the processor 801 or implemented by the processor 801. The processor 801 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware in the processor 801 or by instructions in software. The processor 801 described above may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in a hardware decoding processor, or in a combination of hardware and software elements in a decoding processor. The software elements may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory 802, and the processor 801 reads the information in the memory 802 and, in combination with its hardware, performs the steps of the above method.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The computer device provided in this embodiment may be a computer device as shown in fig. 8, and may perform all the steps of the training method of the power grid time series data feature extraction model shown in fig. 1-2, so as to achieve the technical effects of the training method of the power grid time series data feature extraction model shown in fig. 1-2, and the detailed description will be omitted herein for brevity.
The embodiment of the invention also provides a storage medium (computer readable storage medium). The storage medium here stores one or more programs. Wherein the storage medium may comprise volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk, or solid state disk; the memory may also comprise a combination of the above types of memories.
When one or more programs in the storage medium are executable by one or more processors, the training method of the power grid time sequence data characteristic extraction model executed on the computer equipment side is realized.
The processor is used for executing a training program of the power grid time sequence data characteristic extraction model stored in the memory so as to realize the following steps of a training method of the power grid time sequence data characteristic extraction model executed on the computer equipment side:
acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhancement time sequence data; performing numerical conversion operation on the original power grid time sequence data to obtain second enhanced time sequence data; performing contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain contrast loss; performing mask prediction on the first enhanced time sequence data through an initial model to obtain reconstruction loss; and optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain a power grid time sequence data characteristic extraction model.
In one possible embodiment, a fixed proportion of the time step index is determined and a starting index position is determined; expanding a time step index with a fixed length based on a time step index with a fixed proportion and a starting index position to obtain a plurality of time sequence data segments; and carrying out zero setting operation on the original power grid time sequence data corresponding to the time sequence data segments to obtain first enhanced time sequence data after mask enhancement processing.
In one possible implementation, scaling the raw grid-time-series data; performing translation processing on the original power grid time sequence data subjected to the scaling processing; and carrying out dithering treatment on the original power grid time sequence data after the translation treatment to obtain second enhanced time sequence data after the numerical conversion operation.
In one possible implementation manner, extracting a first query feature corresponding to the first enhanced time sequence data, and extracting a second query feature corresponding to the second enhanced time sequence data; extracting a first query mapping feature based on the first query feature and a second query mapping feature based on the second query feature; extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features; extracting a first key feature corresponding to the first enhanced time sequence data and extracting a second key feature corresponding to the second enhanced time sequence data; a first key map feature is extracted based on the first key feature and a second key map feature is extracted based on the second key feature.
In one possible implementation, a first contrast loss is calculated based on the first query prediction feature and the second key mapping feature; a second contrast loss is calculated based on the second query prediction feature and the first key mapping feature.
In one possible implementation, the coding features of the first enhanced temporal data are extracted; predicting the first enhancement time sequence data through a reconstruction decoder based on the coding characteristics to obtain prediction data; and determining reconstruction loss based on the prediction data and the original data corresponding to the first enhancement time sequence data.
In one possible embodiment, the training parameters of the initial model are optimized through a first formula: L = L_con + αL_rec, wherein L_con is the contrast loss, L_rec is the reconstruction loss, and α is a weight balancing the two losses.
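Folding the first formula into a training step, with l_con and l_rec computed as in the sketches above; the value of α and the choice of optimizer are assumptions.

```python
import torch

# model and optimizer are assumed to exist, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
ALPHA = 0.5                           # assumed weight balancing the two losses

def train_step(optimizer: torch.optim.Optimizer,
               l_con: torch.Tensor, l_rec: torch.Tensor) -> float:
    loss = l_con + ALPHA * l_rec      # first formula: L = L_con + alpha * L_rec
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```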
Those of skill in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments is intended to illustrate the general principles of the invention and not to limit the invention to the particular embodiments disclosed; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the invention are intended to be included within its scope.

Claims (12)

1. A training method for a power grid time sequence data feature extraction model, characterized by comprising the following steps:
acquiring original power grid time sequence data, and performing mask enhancement processing on the original power grid time sequence data to obtain first enhanced time sequence data;
performing a numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data;
performing contrast learning training on an initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain a contrast loss;
performing mask prediction on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and
optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model.
2. The method of claim 1, wherein the acquiring original power grid time sequence data and performing mask enhancement processing on the original power grid time sequence data to obtain first enhanced time sequence data comprises:
determining a fixed proportion of time step indices and their starting index positions;
expanding each starting index into a fixed-length span of time steps to obtain a plurality of time sequence data segments; and
setting the original power grid time sequence data corresponding to the time sequence data segments to zero to obtain the first enhanced time sequence data after mask enhancement processing.
3. The method of claim 1, wherein the performing a numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data comprises:
scaling the original power grid time sequence data;
performing translation processing on the scaled original power grid time sequence data; and
performing jitter processing on the translated original power grid time sequence data to obtain the second enhanced time sequence data after the numerical transformation operation.
4. The method according to any one of claims 1-3, wherein the performing contrast learning training on the initial model based on the first enhanced time sequence data and the second enhanced time sequence data comprises:
extracting a first query feature corresponding to the first enhanced time sequence data and extracting a second query feature corresponding to the second enhanced time sequence data;
extracting a first query mapping feature based on the first query feature and a second query mapping feature based on the second query feature;
extracting first query prediction features based on the first query mapping features, and extracting second query prediction features based on the second query mapping features;
extracting a first key feature corresponding to the first enhanced time sequence data and extracting a second key feature corresponding to the second enhanced time sequence data;
extracting a first key mapping feature based on the first key feature, and extracting a second key mapping feature based on the second key feature.
5. The method of claim 4, wherein obtaining the contrast loss comprises:
calculating a first contrast loss based on the first query prediction feature and the second key mapping feature;
calculating a second contrast loss based on the second query prediction feature and the first key mapping feature.
6. The method according to claim 1 or 2, wherein the performing mask prediction on the first enhanced time sequence data through the initial model to obtain a reconstruction loss comprises:
extracting coding features of the first enhanced time sequence data;
predicting the first enhanced time sequence data through a reconstruction decoder based on the coding features to obtain prediction data; and
determining the reconstruction loss based on the prediction data and the original data corresponding to the first enhanced time sequence data.
7. The method according to claim 1, wherein the optimizing training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model comprises:
optimizing the training parameters of the initial model through a first formula, wherein the first formula is: L = L_con + αL_rec,
wherein L_con is the contrast loss, L_rec is the reconstruction loss, and α is a weight balancing the two losses.
8. A power grid time sequence data feature extraction method, characterized by comprising the following steps:
acquiring original power grid time sequence data of a power grid to be subjected to feature extraction;
inputting the original power grid time sequence data into a power grid time sequence data feature extraction model to obtain power grid time sequence data features of the power grid;
wherein the power grid time sequence data feature extraction model is trained by the method of any one of claims 1-7.
9. A training device for a power grid time sequence data feature extraction model, characterized by comprising:
a data processing module, used for acquiring original power grid time sequence data and carrying out mask enhancement processing on the original power grid time sequence data to obtain first enhanced time sequence data;
the data processing module is further used for performing a numerical transformation operation on the original power grid time sequence data to obtain second enhanced time sequence data;
a training module, used for carrying out contrast learning training on an initial model based on the first enhanced time sequence data and the second enhanced time sequence data to obtain a contrast loss;
the training module is further used for carrying out mask prediction on the first enhanced time sequence data through the initial model to obtain a reconstruction loss; and
an optimization module, used for optimizing the training parameters of the initial model based on the contrast loss and the reconstruction loss to obtain the power grid time sequence data feature extraction model.
10. A power grid time sequence data feature extraction device, characterized by comprising:
an acquisition module, used for acquiring original power grid time sequence data of a power grid to be subjected to feature extraction;
an extraction module, used for inputting the original power grid time sequence data into a power grid time sequence data feature extraction model to obtain power grid time sequence data features of the power grid;
wherein the power grid time sequence data feature extraction model is trained by the method of any one of claims 1-7.
11. A computer device, characterized by comprising: a processor and a memory, the processor being configured to execute a training program of the power grid time sequence data feature extraction model stored in the memory, so as to implement the training method of the power grid time sequence data feature extraction model according to any one of claims 1-7 and the power grid time sequence data feature extraction method according to claim 8.
12. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the training method of the power grid time sequence data feature extraction model according to any one of claims 1-7 and the power grid time sequence data feature extraction method according to claim 8.
CN202211705050.0A 2022-12-28 2022-12-28 Training method and device for power grid time sequence data feature extraction model Pending CN116028785A (en)

Priority Applications (1)

Application Number: CN202211705050.0A; Priority Date: 2022-12-28; Filing Date: 2022-12-28; Title: Training method and device for power grid time sequence data feature extraction model

Applications Claiming Priority (1)

Application Number: CN202211705050.0A; Priority Date: 2022-12-28; Filing Date: 2022-12-28; Title: Training method and device for power grid time sequence data feature extraction model

Publications (1)

Publication Number: CN116028785A; Publication Date: 2023-04-28

Family ID: 86073461

Family Applications (1)

Application Number: CN202211705050.0A; Title: Training method and device for power grid time sequence data feature extraction model; Status: Pending

Country Status (1)

Country: CN; Publication: CN116028785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN116776228A * (Hefei University of Technology; priority 2023-08-17, published 2023-09-19): Power grid time sequence data decoupling self-supervision pre-training method and system
CN116776228B * (Hefei University of Technology; priority 2023-08-17, published 2023-10-20): Power grid time sequence data decoupling self-supervision pre-training method and system

Similar Documents

Publication Publication Date Title
CN110909046B (en) Time-series abnormality detection method and device, electronic equipment and storage medium
CN111880998B (en) Service system anomaly detection method and device, computer equipment and storage medium
CN116028785A (en) Training method and device for power grid time sequence data feature extraction model
CN112861722A (en) Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution
CN112462261B (en) Motor abnormality detection method and device, electronic equipment and storage medium
CN112101400A (en) Industrial control system abnormality detection method, equipment, server and storage medium
CN114239718B (en) High-precision long-term time sequence prediction method based on multi-element time sequence data analysis
CN114611415B (en) Beyond-visual-range propagation loss prediction method based on SL-TrellisNets network
CN113284001A (en) Power consumption prediction method and device, computer equipment and storage medium
CN116579413A (en) Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model
CN117036255A (en) Pipeline defect detection and evaluation method and device based on deep learning
CN117368881B (en) Multi-source data fusion long-sequence radar image prediction method and system
CN113611354A (en) Protein torsion angle prediction method based on lightweight deep convolutional network
CN115496285B (en) Power load prediction method and device and electronic equipment
CN115510757A (en) Design method for long-time sequence prediction based on gated convolution and time attention mechanism
CN116148906A (en) Multi-attention-based fishing boat track anomaly detection method and system
CN115217152A (en) Method and device for predicting opening and closing deformation of immersed tunnel pipe joint
CN115484456A (en) Video anomaly prediction method and device based on semantic clustering
Gaur et al. Precipitation Nowcasting using Deep Learning Techniques
CN112818846A (en) Video frame feature extraction method and device and electronic equipment
Ung et al. Leverage samples with single positive labels to train cnn-based models for multi-label plant species prediction
CN111724393A (en) K-sparse image reconstruction method based on path orthogonal matching
CN117575111B (en) Agricultural remote sensing image space-time sequence prediction method based on transfer learning
CN116089822B (en) Equipment RUL prediction method and system based on space-time attention network
CN112766212B (en) Hyperspectral remote sensing image water body inherent attribute inversion method, device and equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination