CN116579413A - Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model - Google Patents

Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model

Info

Publication number
CN116579413A
Authority
CN
China
Prior art keywords
time sequence
input
data
feature
sequence data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310582798.4A
Other languages
Chinese (zh)
Inventor
刘浩
甘津瑞
吴鹏
周飞
姚一杨
王剑
邵进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Smart Grid Research Institute Co ltd, State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd filed Critical State Grid Smart Grid Research Institute Co ltd
Priority to CN202310582798.4A priority Critical patent/CN116579413A/en
Publication of CN116579413A publication Critical patent/CN116579413A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a method and a device for fine-tuning a time sequence data pre-training model, and a time sequence data prediction model. The method comprises: acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting input time sequence features from the input time sequence data; performing a linear calculation on the input time sequence features with a linear layer to generate corresponding dynamic prompt features; determining enhanced time sequence features by combining mask features, the dynamic prompt features, and the input time sequence features; and inputting the enhanced time sequence features into the decoder for decoding, and predicting the time sequence data at the future time to be predicted. With the method and the device, a dynamic prompt feature is generated for each input time sequence feature by taking its implicit context knowledge into account, and this instance-level prompt information is used to fine-tune the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and improves the prediction accuracy of downstream time sequence tasks.

Description

Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a device for fine-tuning a time sequence data pre-training model, and a time sequence data prediction model.
Background
In recent years, time series data analysis has played an important role in many fields, including finance, medicine, and astronomy. Power grid scenarios in particular are equipped with abundant sensor devices that generate massive amounts of online monitoring time sequence data; analysis techniques such as time sequence prediction and anomaly detection can effectively detect abnormal states in the power grid, improving the intelligence of fault diagnosis, preventing serious faults in advance, and strongly supporting the construction of new power systems.
Commonly used time sequence analysis approaches include semi-supervised training, self-supervised training, and the like. Self-supervised training, as a general model pre-training paradigm, can learn the key information and natural patterns of time sequence data while ignoring noise in the data, which alleviates overfitting of the model to the training data and improves its generalization ability.
However, under the conventional pre-training-then-fine-tuning paradigm, the noise and limited scale of time sequence training data still cause overfitting when the pre-trained model is fine-tuned for a downstream task, reducing the prediction accuracy of the time sequence model. It is therefore necessary to design a suitable model fine-tuning paradigm for power grid time sequence pre-training models.
Disclosure of Invention
In view of the above, embodiments of the invention provide a method and a device for fine-tuning a time sequence data pre-training model, and a time sequence data prediction model, to solve the technical problem in the prior art that the noise and scale of time sequence training data make the pre-trained model prone to overfitting, which reduces the prediction accuracy of the time sequence model.
The technical solution provided by the invention is as follows:
A first aspect of an embodiment of the present invention provides a method for fine-tuning a time sequence data pre-training model, including: acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting input time sequence features from the input time sequence data; performing a linear calculation on the input time sequence features with a linear layer to generate corresponding dynamic prompt features; determining enhanced time sequence features by combining mask features, the dynamic prompt features, and the input time sequence features; and inputting the enhanced time sequence features into the decoder for decoding, and predicting the time sequence data at the future time to be predicted.
Optionally, the method for fine-tuning the time sequence data pre-training model further comprises: acquiring actual time sequence data at the future time; calculating an error loss between the actual time sequence data and the predicted time sequence data with a preset loss function; and adjusting the parameters of the linear layer according to the error loss.
Optionally, determining the enhanced time sequence features by combining the mask features, the dynamic prompt features, and the input time sequence features includes: splicing the input time sequence features and the mask features to obtain spliced features; and adding the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
A second aspect of an embodiment of the present invention provides a device for fine-tuning a time sequence data pre-training model, including: a data acquisition module for acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting input time sequence features from the input time sequence data; a dynamic prompt module for performing a linear calculation on the input time sequence features with a linear layer to generate corresponding dynamic prompt features; an enhancement module for determining enhanced time sequence features by combining mask features, the dynamic prompt features, and the input time sequence features; and a decoding module for inputting the enhanced time sequence features into the decoder for decoding, and predicting the time sequence data at the future time to be predicted.
Optionally, the device for fine-tuning the time sequence data pre-training model further includes an optimization module, configured to: acquire actual time sequence data at the future time; calculate an error loss between the actual time sequence data and the predicted time sequence data with a preset loss function; and adjust the parameters of the linear layer according to the error loss.
Optionally, the enhancement module is specifically configured to: splice the input time sequence features and the mask features to obtain spliced features; and add the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
A third aspect of an embodiment of the present invention provides a time sequence data prediction model, including: an encoder for extracting input time sequence features from input time sequence data; a dynamic prompt generator for performing a linear calculation on the input time sequence features with a linear layer to generate corresponding dynamic prompt features, and for outputting enhanced time sequence features by combining mask features, the dynamic prompt features, and the input time sequence features; and a decoder for decoding the enhanced time sequence features to obtain time sequence data at a future time.
Optionally, the dynamic prompt generator includes: a splicing layer for splicing the input time sequence features and the mask features to obtain spliced features; a single linear layer for performing a linear calculation on the input time sequence features according to the weight and bias parameters of the linear layer to obtain the dynamic prompt features; and an enhancement layer for adding the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
Optionally, the encoder includes: a linear mapping layer for mapping the input time sequence data to a high-dimensional feature space in units of each time step; a first position coding layer for adding time sequence position information to the high-dimensional features in the high-dimensional feature space; and a first Transformer layer for interacting, based on a self-attention mechanism, the high-dimensional features with added time sequence position information across different time steps to obtain the input time sequence features.
Optionally, the decoder includes: a second position coding layer for adding position information to the enhanced time sequence features; a second Transformer layer for decoding, based on a self-attention mechanism, the enhanced time sequence features with added position information across different time steps to obtain decoded features; and a linear prediction layer for mapping the decoded features to time sequence data of future time points in the original dimension.
Optionally, the time sequence data prediction model further comprises a model optimization module for adjusting the parameters in the dynamic prompt generator according to the error loss between the time sequence data at the future time and the actual time sequence data.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a computer to perform the method for fine-tuning a time sequence data pre-training model according to the first aspect of the embodiments of the present invention or any implementation of the first aspect.
A fifth aspect of an embodiment of the present invention provides an electronic device, including a memory and a processor communicatively connected to each other, the memory storing computer instructions and the processor executing the computer instructions to perform the method for fine-tuning a time sequence data pre-training model according to the first aspect of the embodiments of the present invention or any implementation of the first aspect.
The technical solution provided by the invention has the following effects:
With the method and the device for fine-tuning a time sequence data pre-training model provided by the embodiments of the invention, a pre-training model and input time sequence data are acquired, the pre-training model comprising an encoder and a decoder, the encoder being used to extract input time sequence features from the input time sequence data; a linear layer performs a linear calculation on the input time sequence features to generate corresponding dynamic prompt features; enhanced time sequence features are determined by combining mask features, the dynamic prompt features, and the input time sequence features; and the enhanced time sequence features are input into the decoder for decoding to predict the time sequence data at the future time to be predicted. In this fine-tuning method, a dynamic prompt feature is generated for each input time sequence feature by taking its implicit context knowledge into account, and the dynamic prompt feature is added to the input time sequence feature as instance-level prompt information for fine-tuning the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and greatly improves the prediction accuracy of downstream time sequence tasks.
In the time sequence data prediction model provided by the embodiments of the invention, a dynamic prompt generator is added between the encoder and the decoder of the pre-training model formed by the two, and is used to generate dynamic prompt features for the input time sequence features. The dynamic prompt generator takes the implicit context knowledge of each input time sequence feature into account when generating its dynamic prompt feature, and the dynamic prompt feature is added to the input time sequence feature as instance-level prompt information for fine-tuning the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and greatly improves the prediction accuracy of downstream time sequence tasks.
The time sequence data prediction model provided by the embodiments of the invention uses a single linear layer in the dynamic prompt generator, so the linear layer has an extremely small number of parameters. The model therefore does not need to store any intermediate-layer gradients and only needs to compute gradient information for the dynamic prompt generator, which greatly reduces the GPU memory footprint of the model.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a method for fine-tuning a time sequence data pre-training model according to an embodiment of the present invention;
FIG. 2 is a block diagram of a device for fine-tuning a time sequence data pre-training model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a time sequence data prediction model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a dynamic prompt generator according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an encoder according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a decoder according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
The terms "first", "second", "third", "fourth" and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
As described in the background, the noise and scale of time sequence training data make a pre-trained model prone to overfitting, which reduces the prediction accuracy of the time sequence model, so it is necessary to design a suitable model fine-tuning paradigm for power grid time sequence pre-training models. Prompt learning, a method from current academic research, performs model fine-tuning for a downstream task by adding prompt information without significantly changing the structure and parameters of the pre-training model, and has been successfully applied in natural language processing and computer vision. However, no existing work has designed such "prompts" for the field of time sequence analysis.
In view of this, the method for fine-tuning a time sequence data pre-training model according to the embodiments of the invention introduces prompt learning into the field of time sequence analysis. In line with the characteristics of time sequence data, it takes the implicit context knowledge of each input into account to form instance-level prompt information suitable for time sequence data, which is used to fine-tune the model parameters for the downstream task, effectively avoiding overfitting during fine-tuning of the time sequence pre-training model and greatly improving the prediction accuracy of downstream power grid time sequence tasks.
In accordance with an embodiment of the present invention, a method for fine-tuning a time sequence data pre-training model is provided. It should be noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that shown or described herein.
This embodiment provides a method for fine-tuning a time sequence data pre-training model, which can be used in electronic devices such as computers, mobile phones, and tablet computers. FIG. 1 is a flowchart of a method for fine-tuning a time sequence data pre-training model according to an embodiment of the present invention; as shown in FIG. 1, the method includes the following steps:
step S101: a pre-training model and input timing data is acquired, the pre-training model including an encoder and a decoder, the encoder being configured to extract input timing characteristics of the input timing data. Specifically, the pre-training model adopts a transducer model; before the fine tuning method provided by the embodiment of the invention is adopted, the encoder and the decoder in the transducer model can be pre-trained by adopting the historical time sequence data as a training set, and parameters in the encoder and the decoder are determined, wherein the parameters are fixed in the subsequent fine tuning process of the model. The pre-training process of the transducer model can be implemented with reference to the prior art, and will not be described in detail herein. It should be noted that the input time series data may be time series data acquired from the power grid system, and the input time series data is used as training data for the model fine tuning process.
The encoder consists of a linear mapping layer, a position coding layer, and two Transformer blocks. The linear mapping layer maps the input time sequence data to a high-dimensional feature space in units of each time step; the position coding layer adds time sequence position information to the high-dimensional features in that space; and the two Transformer blocks let the high-dimensional features with added position information interact across different time steps based on a self-attention mechanism. In the encoder, adding Transformer blocks involves a trade-off between parameter count and model accuracy: more parameters generally give better accuracy but a slower model. This method adopts two Transformer blocks in the encoder, which preserves the inference speed of the overall model while guaranteeing its accuracy.
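For illustration, a minimal PyTorch sketch of an encoder with this structure is given below. It is a hedged interpretation of the description rather than the patented implementation: the hyperparameters (d_model = 512, following the hourly example later in the text, n_heads = 8, max_len = 1024) and the learned positional embedding are assumptions.

    import torch
    import torch.nn as nn

    class TimeSeriesEncoder(nn.Module):
        # Linear mapping per time step + position coding + two Transformer blocks.
        def __init__(self, in_dim: int, d_model: int = 512, n_heads: int = 8, max_len: int = 1024):
            super().__init__()
            self.input_proj = nn.Linear(in_dim, d_model)                  # linear mapping layer
            self.pos_embed = nn.Parameter(torch.zeros(max_len, d_model))  # position coding layer
            block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.blocks = nn.TransformerEncoder(block, num_layers=2)      # two Transformer blocks

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, h, in_dim) -> F: (batch, h, d_model)
            f = self.input_proj(x) + self.pos_embed[: x.size(1)]
            return self.blocks(f)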
When this model fine-tuning method is adopted, the encoder of the pre-training model and the input time sequence data are obtained first, and the encoder maps the input time sequence data to the feature space, completing the extraction of the input time sequence features. If the input time sequence features were passed directly to the decoder for decoding, the noise and scale of the time sequence data would easily cause overfitting and low prediction accuracy. Therefore, this method first designs prompt information for the input time sequence features output by the encoder and adds it to those features, which effectively avoids the overfitting problem.
Step S102: and carrying out linear calculation on the input time sequence features by adopting a linear layer to generate corresponding dynamic prompt features. Specifically, the model fine tuning method adopts a prompt learning method to generate prompt information and adds the prompt information into the input time sequence characteristics. The prior prompt learning is to reform a downstream task and increase expert knowledge to enable task input and output to be suitable for an original language model, so that a good task effect is obtained in a zero sample or less sample scene. However, the hint features obtained with existing hint learning are typically represented by shared learnable parameters, i.e. the resulting hint features are features representing the entire training set.
In this embodiment, the generated prompt features are dynamic: a dynamic prompt feature is generated from the current input data and represents the contextual characteristics of that data, including instance-level information. "Instance level" can be understood at the level of individual inputs, where each input is an instance; that is, a corresponding dynamic prompt feature is generated for each input, so the prompt feature changes as the input changes and is generated dynamically. Compared with non-instance-level features, where all inputs share the same prompt feature, dynamic prompt features have stronger and more specific expressive power.
Specifically, the linear calculation can be implemented with a single linear layer: processing the input time sequence features with a single linear layer generates the dynamic prompt features over the prediction time steps. The linear calculation can be expressed by the following formula:
D = W·F + b
where F ∈ R^(h×d) is the input time sequence feature, h is the length of the input time sequence data, and d is the dimension of the feature vector at each time step (for example, if the time step is one hour, d is the feature vector dimension corresponding to each hour, which may be 512); W ∈ R^(l×h) is the weight parameter of the linear layer, where l is the preset prediction length of the time sequence data; b ∈ R^d is the bias parameter of the linear layer; and D ∈ R^(l×d) is the generated dynamic prompt feature. The weight and bias parameters of the linear layer are both learnable.
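As an illustration, the following PyTorch sketch implements this computation for batched inputs; the einsum applies W along the time axis, so an h-step feature map F yields an l-step prompt D, matching the shapes above. The initialization scheme is an assumption.

    import torch
    import torch.nn as nn

    class DynamicPromptGenerator(nn.Module):
        # Single linear layer computing D = W·F + b, with
        # F in R^(h×d), W in R^(l×h), b in R^d, D in R^(l×d).
        def __init__(self, h: int, l: int, d: int):
            super().__init__()
            self.W = nn.Parameter(torch.randn(l, h) * 0.02)  # learnable weight
            self.b = nn.Parameter(torch.zeros(d))            # learnable bias

        def forward(self, F: torch.Tensor) -> torch.Tensor:
            # F: (batch, h, d) -> D: (batch, l, d)
            return torch.einsum("lh,bhd->bld", self.W, F) + self.b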
It should be noted that the linear layer is a single layer, i.e., it has a very small number of parameters. Therefore, the fine-tuning method does not need to store any intermediate-layer gradients, which greatly reduces its GPU memory footprint.
Step S103: and determining enhanced timing characteristics by combining mask characteristics, the dynamic prompt characteristics and the input timing characteristics. Wherein the mask features are determined by the pre-training model, i.e. not only the parameters in the encoder and decoder but also the mask features are determined simultaneously during the pre-training of the transducer model. The mask feature functions to determine which features in locations are missing. Mask features are specifically defined by M ε R d Copy-spread at a predicted timing length l, i.e. mask features expressed asIn the mask feature, d represents a one-dimensional vector of dimension d, which contains d learnable parameters. Specifically, after mask features are determined in the pre-training model, enhanced timing features are obtained in combination with input timing features and dynamic cue features corresponding to each input timing feature.
Step S104: and inputting the enhanced time sequence characteristics to the decoder for decoding, and predicting time sequence data at a future time based on the future time to be predicted. Specifically, in the trimming method, the generated enhanced timing characteristics are input to a decoder, which is determined by training the transducer model, instead of the input timing characteristics of the encoder output.
Specifically, the decoder consists of a position coding layer, a single Transformer block, and a linear prediction layer. The position coding layer first adds position information to the enhanced time sequence features, the Transformer block then decodes the feature information across different time steps based on a self-attention mechanism, and finally the linear prediction layer maps the decoded features to predicted time sequence data in the original dimension.
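A matching decoder sketch is shown below, again as an interpretation rather than the patented implementation: since the description mentions only self-attention, the single block is modeled with an encoder-style Transformer layer, and whether the head returns all h + l steps or only the last l is left to the caller. Shapes and hyperparameters are assumptions.

    import torch
    import torch.nn as nn

    class TimeSeriesDecoder(nn.Module):
        # Position coding layer + one Transformer block + linear prediction layer.
        def __init__(self, d_model: int, out_dim: int, n_heads: int = 8, max_len: int = 1024):
            super().__init__()
            self.pos_embed = nn.Parameter(torch.zeros(max_len, d_model))  # position coding layer
            block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.block = nn.TransformerEncoder(block, num_layers=1)       # single Transformer block
            self.head = nn.Linear(d_model, out_dim)                       # linear prediction layer

        def forward(self, c_hat: torch.Tensor) -> torch.Tensor:
            # c_hat: (batch, h+l, d_model) -> (batch, h+l, out_dim)
            z = self.block(c_hat + self.pos_embed[: c_hat.size(1)])
            return self.head(z)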
With the method for fine-tuning a time sequence data pre-training model provided by the embodiments of the invention, a pre-training model and input time sequence data are acquired, the pre-training model comprising an encoder and a decoder, the encoder being used to extract input time sequence features from the input time sequence data; a linear layer performs a linear calculation on the input time sequence features to generate corresponding dynamic prompt features; enhanced time sequence features are determined by combining mask features, the dynamic prompt features, and the input time sequence features; and the enhanced time sequence features are input into the decoder for decoding to predict the time sequence data at the future time to be predicted. In this fine-tuning method, a dynamic prompt feature is generated for each input time sequence feature by taking its implicit context knowledge into account, and the dynamic prompt feature is added to the input time sequence feature as instance-level prompt information for fine-tuning the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and greatly improves the prediction accuracy of downstream time sequence tasks.
In one embodiment, the method for fine-tuning the time sequence data pre-training model further comprises: acquiring actual time sequence data at the future time; calculating an error loss between the actual time sequence data and the predicted time sequence data with a preset loss function; and adjusting the parameters of the linear layer according to the error loss.
Specifically, the fine-tuning method uses the linear layer to generate a dynamic prompt feature for each input time sequence feature. To make the prediction of the fine-tuned model more accurate, the model can be optimized by adjusting the learnable parameters in the linear layer, i.e., its weight and bias parameters. It should be noted that this optimization adjusts mainly the parameters of the linear layer; in both the foregoing fine-tuning process and the optimization here, the parameters of the encoder and the decoder are kept fixed, i.e., they are not updated, so as to prevent losing the information learned during encoder and decoder training.
Through the above steps S101 to S104, time sequence data at a time in the future relative to the input time sequence data is predicted; for example, if the input time sequence data covers 10 a.m. to 12 noon on February 10, 2022, the predicted time sequence data may cover the period immediately following, such as from 12 noon onward on the same day. When optimizing the model, the actual time sequence data at the corresponding future time can be obtained directly once it is available, the loss error between the actual and predicted time sequence data is calculated, and the parameters of the linear layer are adjusted based on this loss error.
Specifically, the preset loss function can be the mean squared error loss, i.e., the loss function can be expressed as:
L = MSE(pred, target)
where MSE is the mean squared error loss, pred and target are the predicted and actual time sequence data respectively, and L is the calculated loss error. In other embodiments, other loss functions can be used to calculate the loss error; the specific choice of loss function is not limited by the embodiments of the present invention.
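A minimal fine-tuning step consistent with this description might look as follows, assuming the encoder, decoder, and DynamicPromptGenerator sketched earlier, a mask_feat vector fixed at pre-training, and the enhance helper sketched in the next subsection; the choice of Adam and the learning rate are assumptions.

    import torch
    import torch.nn as nn

    # Freeze the pre-trained encoder and decoder: only the prompt
    # generator's single linear layer is updated during fine-tuning.
    for p in encoder.parameters():
        p.requires_grad = False
    for p in decoder.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adam(prompt_gen.parameters(), lr=1e-3)  # assumed optimizer and lr
    loss_fn = nn.MSELoss()

    def fine_tune_step(x: torch.Tensor, target: torch.Tensor) -> float:
        # x: (batch, h, in_dim) input window; target: (batch, l, in_dim) actual future values
        F = encoder(x)                                  # input time sequence features (frozen)
        D = prompt_gen(F)                               # dynamic prompt features
        C_hat = enhance(F, mask_feat, D)                # enhanced features (see next subsection)
        pred = decoder(C_hat)[:, -target.size(1):, :]   # keep the l predicted steps
        loss = loss_fn(pred, target)                    # L = MSE(pred, target)
        optimizer.zero_grad()
        loss.backward()                                 # gradients reach only prompt_gen
        optimizer.step()
        return loss.item()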
In one embodiment, determining the enhanced time sequence features by combining the mask features, the dynamic prompt features, and the input time sequence features comprises: splicing the input time sequence features and the mask features to obtain spliced features; and adding the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
Specifically, when splicing the input time sequence feature F ∈ R^(h×d) and the expanded mask feature M̃ ∈ R^(l×d), a feature concatenation function can be used to join the two features along the length dimension, and the spliced feature is expressed as:
C = Concat(F, M̃)
where Concat is the feature concatenation function and the spliced feature is C ∈ R^((h+l)×d).
The enhanced time sequence feature is expressed as:
Ĉ = Add(C, D)
where D is the dynamic prompt feature, Add is the element-wise (position-wise) addition function, and Ĉ is the enhanced time sequence feature.
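The sketch below implements these two steps under one concrete reading: since D has length l while C has length h + l, the "corresponding positions" are taken to be the mask portion, so D is added onto the expanded mask before concatenation (equivalent to adding it to the last l rows of C). This alignment is an assumption; the text leaves it implicit.

    import torch

    def enhance(F: torch.Tensor, M: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
        # F: (batch, h, d) input time sequence features
        # M: (d,)          learnable mask vector, fixed after pre-training
        # D: (batch, l, d) dynamic prompt features
        M_tilde = M.expand(D.size(0), D.size(1), -1)   # copy-spread M to (batch, l, d)
        # C = Concat(F, M_tilde) along the length dimension gives (batch, h+l, d);
        # the element-wise Add places D on the mask portion of C.
        return torch.cat([F, M_tilde + D], dim=1)      # enhanced feature Ĉ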
An embodiment of the invention further provides a device for fine-tuning a time sequence data pre-training model, as shown in FIG. 2, which comprises:
the data acquisition module is used for acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting the input time sequence characteristics of the input time sequence data; the specific content refers to the corresponding parts of the above method embodiments, and will not be described herein.
The dynamic prompting module is used for carrying out linear calculation on the input time sequence characteristics by adopting a linear layer to generate corresponding dynamic prompting characteristics; the specific content refers to the corresponding parts of the above method embodiments, and will not be described herein.
The enhancement module is used for determining enhanced time sequence characteristics by combining mask characteristics, the dynamic prompt characteristics and the input time sequence characteristics; the specific content refers to the corresponding parts of the above method embodiments, and will not be described herein.
And the decoding module is used for inputting the enhanced time sequence characteristics to the decoder for decoding, and predicting time sequence data at the future time based on the future time to be predicted. The specific content refers to the corresponding parts of the above method embodiments, and will not be described herein.
With the device for fine-tuning a time sequence data pre-training model provided by the embodiments of the invention, a pre-training model and input time sequence data are acquired, the pre-training model comprising an encoder and a decoder, the encoder being used to extract input time sequence features from the input time sequence data; a linear layer performs a linear calculation on the input time sequence features to generate corresponding dynamic prompt features; enhanced time sequence features are determined by combining mask features, the dynamic prompt features, and the input time sequence features; and the enhanced time sequence features are input into the decoder for decoding to predict the time sequence data at the future time to be predicted. Thus, in this fine-tuning device, a dynamic prompt feature is generated for each input time sequence feature by taking its implicit context knowledge into account, and the dynamic prompt feature is added to the input time sequence feature as instance-level prompt information for fine-tuning the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and greatly improves the prediction accuracy of downstream time sequence tasks.
Optionally, the device for fine-tuning the time sequence data pre-training model further includes an optimization module, configured to: acquire actual time sequence data at the future time; calculate an error loss between the actual time sequence data and the predicted time sequence data with a preset loss function; and adjust the parameters of the linear layer according to the error loss.
Optionally, the enhancement module is specifically configured to: splice the input time sequence features and the mask features to obtain spliced features; and add the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
For a functional description of the device for fine-tuning a time sequence data pre-training model provided by the embodiments of the invention, refer to the description of the method for fine-tuning a time sequence data pre-training model in the embodiments above.
An embodiment of the invention further provides a time sequence data prediction model, as shown in FIG. 3, which comprises: an encoder for extracting input time sequence features from input time sequence data; a dynamic prompt generator for performing a linear calculation on the input time sequence features with a linear layer to generate corresponding dynamic prompt features, and for outputting enhanced time sequence features by combining mask features, the dynamic prompt features, and the input time sequence features; and a decoder for decoding the enhanced time sequence features to obtain time sequence data at a future time.
Specifically, the encoder and the decoder are obtained by training a Transformer model; that is, the encoder and the decoder together form the pre-training model, and the time sequence data prediction model adds a dynamic prompt generator between them on this basis. For the data processing of the dynamic prompt generator, refer to the description of steps S102 and S103 in the embodiment of the method for fine-tuning a time sequence data pre-training model above, which is not repeated here.
In the time sequence data prediction model provided by the embodiments of the invention, a dynamic prompt generator is added between the encoder and the decoder of the pre-training model they form, and is used to generate dynamic prompt features for the input time sequence features. The dynamic prompt generator takes the implicit context knowledge of each input time sequence feature into account when generating its dynamic prompt feature, and the dynamic prompt feature is added to the input time sequence feature as instance-level prompt information for fine-tuning the model parameters for the downstream task, which effectively avoids overfitting during fine-tuning of the time sequence pre-training model and greatly improves the prediction accuracy of downstream time sequence tasks.
In one embodiment, as shown in FIG. 4, the dynamic prompt generator includes: a splicing layer for splicing the input time sequence features and the mask features to obtain spliced features; a single linear layer for performing a linear calculation on the input time sequence features according to its weight and bias parameters to obtain the dynamic prompt features; and an enhancement layer for adding the dynamic prompt features and the spliced features at corresponding positions to obtain the enhanced time sequence features.
Specifically, the splicing layer uses a feature concatenation function to splice the features, and the spliced feature is expressed by the following formula:
C = Concat(F, M̃)
where Concat is the feature concatenation function, F ∈ R^(h×d) is the input time sequence feature, M̃ ∈ R^(l×d) is the expanded mask feature, and the spliced feature is C ∈ R^((h+l)×d).
The calculation of the single linear layer can be expressed by the following formula:
D = W·F + b
where W ∈ R^(l×h) is the weight parameter of the linear layer, l is the preset prediction length of the time sequence data, b ∈ R^d is the bias parameter of the linear layer, and D ∈ R^(l×d) is the generated dynamic prompt feature.
The enhancement layer is implemented with an element-wise (position-wise) addition function, and the enhanced feature is expressed as:
Ĉ = Add(C, D)
where D is the dynamic prompt feature, Add is the element-wise addition function, and Ĉ is the enhanced time sequence feature.
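Tying the pieces together, the following end-to-end forward pass shows how the three components of FIG. 3 might interact, reusing the sketches given earlier; all shapes (batch 8, input length 96, prediction length 24, 7 variables, feature width 512) are illustrative assumptions, not patent values.

    import torch

    b, h, l, in_dim, d = 8, 96, 24, 7, 512
    encoder = TimeSeriesEncoder(in_dim, d_model=d)      # pre-trained, kept frozen
    decoder = TimeSeriesDecoder(d, out_dim=in_dim)      # pre-trained, kept frozen
    prompt_gen = DynamicPromptGenerator(h, l, d)        # the only module being tuned
    mask_feat = torch.zeros(d)                          # mask vector from pre-training

    x = torch.randn(b, h, in_dim)       # input time sequence data
    F = encoder(x)                      # (b, h, d)   input time sequence features
    D = prompt_gen(F)                   # (b, l, d)   dynamic prompt features
    C_hat = enhance(F, mask_feat, D)    # (b, h+l, d) enhanced time sequence features
    pred = decoder(C_hat)[:, -l:, :]    # (b, l, in_dim) predicted future values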
The time sequence data prediction model provided by the embodiments of the invention uses a single linear layer in the dynamic prompt generator, so the linear layer has an extremely small number of parameters. The model therefore does not need to store any intermediate-layer gradients and only needs to compute gradient information for the dynamic prompt generator, which greatly reduces the GPU memory footprint of the model.
In one embodiment, as shown in FIG. 5, the encoder includes: a linear mapping layer for mapping the input time sequence data to a high-dimensional feature space in units of each time step; a first position coding layer for adding time sequence position information to the high-dimensional features in the high-dimensional feature space; and a first Transformer layer for interacting, based on a self-attention mechanism, the high-dimensional features with added time sequence position information across different time steps to obtain the input time sequence features. Specifically, the first Transformer layer consists of two identical Transformer blocks, which preserves the inference speed of the overall model while guaranteeing its accuracy.
In one embodiment, as shown in FIG. 6, the decoder includes: a second position coding layer for adding position information to the enhanced time sequence features; a second Transformer layer for decoding, based on a self-attention mechanism, the enhanced time sequence features with added position information across different time steps to obtain decoded features; and a linear prediction layer for mapping the decoded features to time sequence data of future time points in the original dimension. The second Transformer layer consists of a single Transformer block.
In one embodiment, as shown in FIG. 3, the time sequence data prediction model further includes a model optimization module for adjusting the parameters in the dynamic prompt generator according to the error loss between the time sequence data at the future time and the actual (target) time sequence data. Specifically, the error loss calculation can use the mean squared error loss, i.e., the error loss is expressed as:
L = MSE(pred, target)
where MSE is the mean squared error loss, pred and target are the predicted time sequence data at the future time and the actual time sequence data respectively, and L is the calculated loss error.
An embodiment of the present invention further provides a storage medium, as shown in FIG. 7, on which a computer program 601 is stored; when executed by a processor, the program implements the steps of the method for fine-tuning a time sequence data pre-training model in the above embodiments. The storage medium also stores audio/video stream data, feature frame data, interaction request signaling, encrypted data, preset data sizes, and the like. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may carry out the methods of the above embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
An embodiment of the present invention further provides an electronic device, as shown in FIG. 8, which may include a processor 51 and a memory 52; the processor 51 and the memory 52 may be connected by a bus or by other means, with a bus connection taken as the example in FIG. 8.
The processor 51 may be a central processing unit (CPU). The processor 51 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 52, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. The processor 51 executes the non-transitory software programs, instructions, and modules stored in the memory 52 to run the various functional applications and data processing of the processor, i.e., to implement the method for fine-tuning a time sequence data pre-training model in the above method embodiments.
The memory 52 may include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required for at least one function, and the data storage area may store data created by the processor 51, and the like. In addition, the memory 52 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, which may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 52 and, when executed by the processor 51, perform the method for fine-tuning a time sequence data pre-training model in the embodiment shown in FIG. 1.
For specific details of the electronic device, refer to the corresponding descriptions and effects in the embodiment shown in FIG. 1, which are not repeated here.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for fine tuning a time series data pre-training model, comprising:
acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting the input time sequence characteristics of the input time sequence data;
performing linear calculation on the input time sequence features by adopting a linear layer to generate corresponding dynamic prompt features;
determining enhanced timing characteristics by combining mask characteristics, the dynamic prompt characteristics and the input timing characteristics;
and inputting the enhanced time sequence characteristics to the decoder for decoding, and predicting time sequence data at a future time based on the future time to be predicted.
2. The method of fine tuning a time series data pre-training model of claim 1, further comprising:
acquiring actual time sequence data at a future moment;
calculating error loss between the actual time sequence data and the predicted time sequence data by adopting a preset loss function;
and adjusting parameters of the linear layer according to the error loss.
3. The method of claim 1, wherein determining the enhanced timing characteristics by combining the mask characteristics, the dynamic prompt characteristics, and the input timing characteristics comprises:
splicing the input time sequence features and the mask features to obtain spliced features;
and adding the dynamic prompt feature and the spliced feature according to the corresponding positions to obtain the enhanced time sequence feature.
4. A time series data pre-training model fine tuning device, comprising:
the data acquisition module is used for acquiring a pre-training model and input time sequence data, wherein the pre-training model comprises an encoder and a decoder, and the encoder is used for extracting the input time sequence characteristics of the input time sequence data;
the dynamic prompting module is used for carrying out linear calculation on the input time sequence characteristics by adopting a linear layer to generate corresponding dynamic prompting characteristics;
the enhancement module is used for determining enhanced time sequence characteristics by combining mask characteristics, the dynamic prompt characteristics and the input time sequence characteristics;
and the decoding module is used for inputting the enhanced time sequence characteristics to the decoder for decoding, and predicting time sequence data at the future time based on the future time to be predicted.
5. A time series data prediction model, comprising:
an encoder for extracting an input timing characteristic of the input timing data;
the dynamic prompt generator is used for carrying out linear calculation on the input time sequence characteristics by adopting a linear layer, generating corresponding dynamic prompt characteristics, and outputting enhanced time sequence characteristics by combining mask characteristics, the dynamic prompt characteristics and the input time sequence characteristics;
and the decoder is used for decoding according to the enhanced time sequence characteristics to obtain time sequence data at a future moment.
6. The temporal data prediction model of claim 5, wherein the dynamic prompt generator comprises:
the splicing layer is used for splicing the input time sequence features and the mask features to obtain spliced features;
the single-layer linear layer is used for carrying out linear calculation on the input time sequence characteristics according to the weight parameters and the bias parameters of the linear layer to obtain dynamic prompt characteristics;
and the enhancement layer is used for adding the dynamic prompt feature and the spliced feature according to the corresponding positions to obtain the enhanced time sequence feature.
7. The temporal data prediction model of claim 5, wherein the encoder comprises:
a linear mapping layer for mapping the input time series data to a high-dimensional feature space in units of each time step;
a first position coding layer for adding timing position information for the high-dimensional features in the high-dimensional feature space;
and the first Transformer layer is used for interacting the high-dimensional characteristics after adding the time sequence position information on different time steps based on a self-attention mechanism to obtain the input time sequence characteristics.
8. The temporal data prediction model of claim 5, further comprising:
and the model optimization module is used for adjusting parameters in the dynamic prompt generator according to error loss between the time sequence data at the future moment and the actual time sequence data.
9. A computer readable storage medium storing computer instructions for causing the computer to perform the time series data pre-training model fine tuning method according to any one of claims 1-3.
10. An electronic device, comprising: a memory and a processor, said memory and said processor being communicatively coupled to each other, said memory storing computer instructions, said processor executing said computer instructions to perform a method of fine-tuning a time series data pre-training model as claimed in any one of claims 1-3.
CN202310582798.4A 2023-05-22 2023-05-22 Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model Pending CN116579413A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310582798.4A CN116579413A (en) 2023-05-22 2023-05-22 Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310582798.4A CN116579413A (en) 2023-05-22 2023-05-22 Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model

Publications (1)

Publication Number Publication Date
CN116579413A true CN116579413A (en) 2023-08-11

Family

ID=87545019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310582798.4A Pending CN116579413A (en) 2023-05-22 2023-05-22 Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model

Country Status (1)

Country Link
CN (1) CN116579413A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776228A (en) * 2023-08-17 2023-09-19 合肥工业大学 Power grid time sequence data decoupling self-supervision pre-training method and system
CN116776228B (en) * 2023-08-17 2023-10-20 合肥工业大学 Power grid time sequence data decoupling self-supervision pre-training method and system
CN117388716A (en) * 2023-12-11 2024-01-12 四川长园工程勘察设计有限公司 Battery pack fault diagnosis method, system and storage medium based on time sequence data
CN117388716B (en) * 2023-12-11 2024-02-13 四川长园工程勘察设计有限公司 Battery pack fault diagnosis method, system and storage medium based on time sequence data

Similar Documents

Publication Publication Date Title
CN116579413A (en) Time sequence data pre-training model fine adjustment method and device and time sequence data prediction model
JP7194284B2 (en) Quantization model optimization method, device, information recommendation method, device, neural network model optimization method, device, electronic device, and computer program
CN110347873B (en) Video classification method and device, electronic equipment and storage medium
CN113034380B (en) Video space-time super-resolution method and device based on improved deformable convolution correction
CN111819580A (en) Neural architecture search for dense image prediction tasks
CN112863180B (en) Traffic speed prediction method, device, electronic equipment and computer readable medium
CN110796619A (en) Image processing model training method and device, electronic equipment and storage medium
CN114418030B (en) Image classification method, training method and device for image classification model
US20190327481A1 (en) Intra-prediction video coding method and device
CN114283120B (en) Domain-adaptive-based end-to-end multisource heterogeneous remote sensing image change detection method
CN114285728A (en) Prediction model training method, flow prediction method, device and storage medium
US20210073645A1 (en) Learning apparatus and method, and program
WO2023279693A1 (en) Knowledge distillation method and apparatus, and terminal device and medium
CN113409803B (en) Voice signal processing method, device, storage medium and equipment
CN116522099A (en) Time sequence data self-supervision pre-training model, construction method, equipment and storage medium
CN116957024A (en) Method and device for reasoning by using neural network model
CN115333961B (en) Wireless communication network management and control method based on deep reinforcement learning and related equipment
US20220327663A1 (en) Video Super-Resolution using Deep Neural Networks
CN114511767B (en) Rapid state prediction method for time sequence diagram data
CN115170887A (en) Target detection model training method, target detection method and device thereof
CN113947250A (en) Urban fine-grained flow prediction method and system based on limited data resources
CN112818846A (en) Video frame feature extraction method and device and electronic equipment
CN114730380A (en) Deep parallel training of neural networks
WO2023206532A1 (en) Prediction method and apparatus, electronic device and computer-readable storage medium
CN114363951B (en) Inter-cell flow collaborative prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination