CN117932347A - Small-sample time series prediction method and system based on adversarial transfer learning - Google Patents

Small-sample time series prediction method and system based on adversarial transfer learning

Info

Publication number
CN117932347A
Authority
CN
China
Prior art keywords
time sequence
sequence data
domain
small sample
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410332311.1A
Other languages
Chinese (zh)
Other versions
CN117932347B (en)
Inventor
蒿颜奇
罗川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202410332311.1A priority Critical patent/CN117932347B/en
Publication of CN117932347A publication Critical patent/CN117932347A/en
Application granted granted Critical
Publication of CN117932347B publication Critical patent/CN117932347B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a small-sample time series prediction method and system based on adversarial transfer learning, belonging to the technical field of time series prediction. The method constructs a composite neural network model comprising a k-means clustering algorithm, a one-dimensional convolutional neural network, a domain classifier, and an LSTM (long short-term memory) network, and uses the trained model to predict small-sample time series data. Through adversarial transfer learning, the composite model is trained on source-domain time series data so that it can predict target-domain time series data, performing well on both the source and target domains even when only a small amount of target-domain data is available. This addresses the low data utilization and inaccurate predictions that traditional time series prediction models suffer from due to poor generalization and large distribution gaps between domains.

Description

Small-sample time series prediction method and system based on adversarial transfer learning
Technical Field
The invention belongs to the technical field of time series prediction, and in particular relates to a small-sample time series prediction method and system based on adversarial transfer learning.
Background
The goal of transfer learning is to use information previously obtained from a source task to effectively overcome the distribution differences observed across different but interrelated target domains or tasks, a phenomenon also known as domain shift. Transfer learning requires domain matching, i.e., optimizing for a specific target domain to improve model performance there; adversarial transfer learning lets the model autonomously learn feature representations that generalize across domains.
In the prior art, when predicting a small-sample time series, the limited number of samples means that existing neural network models may overfit to noise and detail in the training set, leaving them with weak generalization. When processing new data, their predictions tend to fluctuate widely, accuracy is low, and the uncertainty of the results is difficult to evaluate or quantify. At the same time, large distribution gaps between domains leave traditional models with low data utilization, so they cannot make effective and accurate predictions.
Disclosure of Invention
To address the above shortcomings of the prior art, the small-sample time series prediction method and system based on adversarial transfer learning provided herein can solve, through adversarial transfer learning, the problems of low data utilization and low prediction accuracy caused by the poor generalization of traditional models and large differences between data distributions.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
In one aspect, the invention provides a small-sample time series prediction method based on adversarial transfer learning, comprising the following steps:
S1: based on adversarial transfer learning, construct and train a composite neural network model to obtain a trained composite neural network model;
S2: acquire the small-sample time series data to be predicted, and extract features on each time slice with the trained composite neural network model to obtain a feature representation of the time series data;
S3: perform regression prediction with the trained composite neural network model on the feature representation of the time series data to obtain the small-sample time series prediction result.
The beneficial effects of the invention are as follows: through adversarial transfer learning, a model trained on source-domain time series data can directly help predict target-domain time series data. By treating the feature extraction module as a generator and the domain classification module as a discriminator, the invention can predict time series even when only a small amount of target-domain data is available, remaining accurate and convenient under data scarcity, and it solves the problems of low data utilization and inaccurate prediction caused by the poor generalization of traditional models and large distribution gaps between domains.
Further: the specific steps of constructing and training the composite neural network model are as follows:
a1: acquiring time sequence data in the existing multi-domain small sample feature space, and dividing the time sequence data into a training set and a testing set;
A2: dividing the time sequence data of the training set into a source domain and a target domain on different granularities by using a k-means clustering algorithm to obtain source domain long time sequence data and target domain time sequence data;
A3: according to the source domain long time sequence data, utilizing a multi-layer one-dimensional convolutional neural network to extract the characteristics on a time slice, and obtaining the characteristic representation of the source domain long time sequence data and the internal association of the characteristic attribute;
a4: according to the characteristic representation of the source domain long time sequence data and the internal association of the characteristic attribute, the domain classification is utilized, the characteristic representation of the source domain is migrated to the target domain through resistance migration learning, the characteristic representation learning is carried out on the target domain time sequence data, and the characteristic representation of the target domain is obtained;
a5: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using an LSTM long-term memory neural network to obtain a small sample time sequence prediction result;
A6: and calculating to obtain a final loss function of the composite neural network model according to the test set and the small sample time sequence prediction result, judging whether the final loss function is minimum, if so, completing training to obtain a trained composite neural network model, otherwise, returning to A3.
The beneficial effect of the above further scheme is: training the composite neural network model improves its generalization and performance metrics, enabling it to predict small-sample time series data.
Further: the specific steps of A3 are as follows:
A301: according to the source domain long time sequence data, extracting the characteristics on a time slice by utilizing a plurality of one-dimensional convolution layers to obtain the original characteristic representation of the source domain long time sequence data;
A302: and flattening according to the original local characteristics of the source domain long time sequence data to obtain the characteristic representation and the internal association of the characteristic attribute of the source domain long time sequence data.
The beneficial effect of the above further scheme is: extracting features from the time series data with a one-dimensional convolutional neural network effectively captures local features, improves generalization, and recognizes features over longer time spans.
Further: the specific steps of A4 are as follows:
A401: according to the local characteristics of the source domain long time sequence data and the internal correlation of the characteristic attributes, processing is carried out by utilizing a gradient inversion layer, so that the local characteristics of the forward source domain long time sequence data are obtained;
A402: and utilizing a plurality of layers of full-connection layers and a Softmax function, migrating local features of the forward source domain long time sequence data to a target domain through resistance migration learning, and assisting the target domain time sequence data to perform feature representation learning to obtain feature representation of the target domain.
The beneficial effect of the above further scheme is: adversarial learning between feature extraction and domain classification migrates the source-domain feature representation to the target domain, improving both the feature extraction and domain classification capabilities and making the prediction more accurate.
Further: the specific steps of A5 are as follows:
A501: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using a forward LSTM long-term memory neural network to obtain a small sample time sequence forward prediction result;
A502: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using a reverse LSTM long-term memory neural network to obtain a small sample time sequence reverse prediction result;
A503: and adding according to the forward prediction result and the reverse prediction result to obtain a small sample time sequence prediction result.
The beneficial effect of the above further scheme is: the forward and backward LSTM networks improve the composite model's ability to capture and predict global information in the features of long time series data, give it stronger expressive power, and reduce the influence of exploding or vanishing gradients.
Further: the expression of the final loss function is as follows:
wherein, As a final loss function,/>Is the global optimum,/>Weights extracted for features,/>Weight for regression prediction,/>Weights for domain classification,/>Loss function for regression prediction,/>For feature extraction,/>For regression prediction,/>For domain classification,/>Is the total number of domains,/>For 1 st source domain,/>For/>Source domain/>In order for the domain of interest to be a target,Loss function classifying a domain,/>As a gradient inversion function,/>For the characteristic value of the small sample time sequence data in the test set,/>For one piece of data in the small sample time sequence data set in the test set,/>For the number of sample timing predictions,/>/>, For small sample timing data in test setActual value/>For the/>, in the small sample timing prediction resultResults,/>To test the first/>, of small sample data in the setActual tag value of personal field,/>For the/>, in the small sample timing prediction resultTag value of individual field.
The beneficial effect of the above further scheme is: the final loss function makes the quality of the prediction directly observable and eases optimization of the composite neural network model.
In another aspect, the invention provides a small-sample time series prediction system based on adversarial transfer learning, comprising:
a clustering module, which divides the time series data into a source domain and a target domain at different granularities with a k-means clustering algorithm, obtaining source-domain long time series data and target-domain time series data;
a feature extraction module, which extracts features on each time slice from the time series data with a multi-layer one-dimensional convolutional neural network, obtaining the feature representation of the source-domain long time series data and the internal associations among feature attributes;
a domain classification module, which migrates the source-domain feature representation to the target domain through adversarial transfer learning according to that feature representation and the internal associations among feature attributes, assisting representation learning on the target-domain time series data to obtain the target-domain feature representation;
and a time series regression prediction module, which predicts the small-sample time series from the time series data and the feature representations.
The beneficial effects of the invention are as follows: based on the GAN (generative adversarial network) idea, the feature extraction module serves as the generator and the domain classification module as the discriminator for adversarial transfer learning on small-sample time series data. The system model is trained on source-domain time series data; once trained, it can predict target-domain time series data and performs well with only a small amount of small-sample data.
Further: the feature extraction module includes: the one-dimensional convolution layers are sequentially connected with the ReLU activation layer and the pooling layer;
the plurality of one-dimensional convolution layers are used for dividing the source domain long time sequence data into time slices and carrying out quick feature extraction;
and the flattening layer is used for flattening the original characteristic representation of the time sequence data.
The beneficial effect of the above further scheme is: with several one-dimensional convolution layers, a feature representation of long time series data can be obtained from a small amount of small-sample data, improving the composite model's feature extraction on small-sample time series data and easing subsequent analysis and prediction.
Further: the domain classification module includes: the gradient inversion layer is connected with the full-connection layers in sequence, the last full-connection layer of the full-connection layers is connected with the Softmax function, and the full-connection layers except the last layer are connected with the ReLU activation function and the Dropout discarding layer in sequence;
the gradient inversion layer is used for carrying out forward processing on the characteristic representation of the source domain long time sequence data and the internal association of the characteristic attribute;
The plurality of full-connection layers are used for converting the characteristic representation of the input source domain long time sequence data into the characteristic representation of the specified domain classification task;
The Dropout discarding layer is configured to discard data with a set probability.
The beneficial effect of the above further scheme is: the gradient reversal layer gives the regression prediction module's loss and the domain classification module's loss the same monotonicity, the Softmax function classifies the data, and the Dropout layer discards part of the data to prevent overfitting.
Drawings
FIG. 1 is a flow chart of the small-sample time series prediction method based on adversarial transfer learning;
FIG. 2 is a block diagram of the small-sample time series prediction system based on adversarial transfer learning;
FIG. 3 is a model architecture diagram of the small-sample time series prediction system based on adversarial transfer learning;
FIG. 4 shows an application of the small-sample time series prediction system based on adversarial transfer learning.
Detailed Description
The following description of embodiments of the invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall within the spirit and scope of the invention as defined by the appended claims.
Example 1
In one embodiment of the invention, as shown in FIG. 1, a small-sample time series prediction method based on adversarial transfer learning comprises the following steps:
S1: based on adversarial transfer learning, construct and train a composite neural network model to obtain a trained composite neural network model;
S2: acquire the small-sample time series data to be predicted, and extract features on each time slice with the trained composite neural network model to obtain a feature representation of the time series data;
S3: perform regression prediction with the trained composite neural network model on the feature representation of the time series data to obtain the small-sample time series prediction result.
Before the composite neural network model is used to predict small-sample time series data, it must be trained. The specific steps of constructing and training the composite neural network model are as follows:
A1: acquire time series data in the existing multi-domain small-sample feature space, and divide it into a training set and a test set;
A2: divide the training-set time series data into a source domain and a target domain at different granularities with a k-means clustering algorithm, obtaining source-domain long time series data and target-domain time series data;
A3: from the source-domain long time series data, extract features on each time slice with a multi-layer one-dimensional convolutional neural network, obtaining the feature representation of the source-domain long time series data and the internal associations among feature attributes;
A4: from that feature representation and the internal associations among feature attributes, use domain classification and adversarial transfer learning to migrate the source-domain feature representation to the target domain, performing representation learning on the target-domain time series data to obtain the target-domain feature representation;
A5: from the target-domain feature representation and the target-domain time series data, perform regression prediction with an LSTM (long short-term memory) network to obtain the small-sample time series prediction result;
A6: compute the final loss function of the composite neural network model from the test set and the prediction result, and judge whether the final loss has reached its minimum; if so, training is complete and the trained composite neural network model is obtained; otherwise, return to A3.
The k-means clustering algorithm in A2 uses the following distance (reconstructed here from the variable definitions):

$$D(C_a,C_b)=\left\|x_a-x_b\right\|_2=\sqrt{\sum_{j=1}^{m}\big(x_{aj}-x_{bj}\big)^2}$$

wherein $D$ is the distance between different domains, $\|\cdot\|_2$ the 2-norm, $k$ the number of clusters at each granularity, $j$ the feature index, $m$ the total number of features, $C_a$ and $C_b$ two different domains, and $x_{aj}$ and $x_{bj}$ the feature values of domains $C_a$ and $C_b$ respectively.
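As one illustration, the distance above and the k-means assignment step it drives can be sketched in NumPy (a minimal sketch; the function names are illustrative, not from the patent):

```python
import numpy as np

def domain_distance(xa, xb):
    """L2 (Euclidean) distance between the feature vectors of two domains,
    matching D(C_a, C_b) above."""
    xa, xb = np.asarray(xa, dtype=float), np.asarray(xb, dtype=float)
    return float(np.sqrt(np.sum((xa - xb) ** 2)))

def assign_domain(x, centroids):
    """k-means assignment step: the index of the nearest cluster centre."""
    return int(np.argmin([domain_distance(x, c) for c in centroids]))
```

Repeating the assignment step and re-computing centroids yields the source/target split at a given granularity k.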
The specific steps of A3 are as follows:
A301: from the source-domain long time series data, extract features on each time slice with several one-dimensional convolution layers to obtain the original feature representation of the source-domain long time series data, where the one-dimensional convolutional network (reconstructed here from the variable definitions) is

$$h^{l}=\sigma\big(W^{l}*h^{l-1}+b^{l}\big),\qquad h^{0}=x,\qquad F=\mathrm{Flatten}\big(h^{L}\big)$$

wherein $F$ is the final output feature representation, $h^{L}$ the output of the $L$-th convolution layer, $W^{l}$ the filter, $h^{l-1}$ the output of the $(l-1)$-th convolution layer, $b^{l}$ the bias, $\mathrm{Flatten}$ the flattening layer, $\sigma$ the activation function, $x$ the input features, $l$ the index of each convolution layer, and $L$ the number of convolution layers;
A302: flatten the original local features of the source-domain long time series data to obtain its feature representation and the internal associations among feature attributes.
The specific steps of A4 are as follows:
A401: process the local features of the source-domain long time series data and the internal associations among feature attributes with a gradient reversal layer, obtaining forward-processed local features of the source-domain long time series data; the gradient reversal layer (reconstructed here from the variable definitions, following the standard DANN schedule) is

$$R(x)=x,\qquad \frac{dR}{dx}=-\lambda I,\qquad \lambda=\frac{2}{1+e^{-10p}}-1,\qquad p=\frac{i\,n_b+b}{N\,n_b}$$

wherein $R$ is the gradient reversal function, $x$ the feature values of a specific piece of data, $dR/dx$ the derivative of the gradient reversal function during back-propagation, $\lambda$ a hyper-parameter that changes dynamically with the iterative process, $I$ the identity matrix, $e$ the natural constant, $p$ the training progress, $b$ the current batch, $i$ the current iteration, $n_b$ the length of the smallest total batch across the target- and source-domain training data, and $N$ the overall number of iterations; as the model iterates to reduce its error, $p$ (and hence $\lambda$) varies between 0 and 1.
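The adaptation schedule of the gradient reversal layer can be sketched as a small function (the constant gamma = 10 is the usual DANN choice and an assumption here, not taken from the patent):

```python
import math

def grl_lambda(i, n, gamma=10.0):
    """Adaptation weight for the gradient reversal layer.

    p = i / n is the training progress in [0, 1]; lambda rises smoothly
    from 0 toward 1, so domain-adversarial pressure is weak early in
    training and strong near the end.
    """
    p = i / n
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
```

In the forward pass the layer is the identity; only the backward pass multiplies incoming gradients by -lambda.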
A402: and utilizing a plurality of layers of full-connection layers and a Softmax function, migrating local features of the forward source domain long time sequence data to a target domain through resistance migration learning, and assisting the target domain time sequence data to perform feature representation learning to obtain feature representation of the target domain.
The specific steps of A5 are as follows:
A501: from the target-domain feature representation and the target-domain time series data, perform regression prediction with a forward LSTM network to obtain the forward small-sample time series prediction;
A502: likewise, perform regression prediction with a backward LSTM network to obtain the backward small-sample time series prediction;
A503: add the forward and backward predictions to obtain the small-sample time series prediction result, i.e.

$$\hat{y}=\overrightarrow{y}+\overleftarrow{y}$$

wherein $\hat{y}$ is the small-sample time series prediction, $\overrightarrow{y}$ the forward prediction output by the forward LSTM network, and $\overleftarrow{y}$ the backward prediction output by the backward LSTM network.
The LSTM network in A5 (reconstructed here from the variable definitions) is:

$$f_t=\sigma\big(W_{fh}h_{t-1}+W_{fx}^{\mathsf T}x_t+b_f\big)$$
$$i_t=\sigma\big(W_{ih}h_{t-1}+W_{ix}^{\mathsf T}x_t+b_i\big)$$
$$o_t=\sigma\big(W_{oh}h_{t-1}+W_{ox}^{\mathsf T}x_t+b_o\big)$$
$$\tilde{c}_t=\tanh\big(W_{ch}h_{t-1}+W_{cx}^{\mathsf T}x_t+b_c\big)$$
$$c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t$$
$$h_t=o_t\odot\tanh(c_t)$$

wherein $f_t$ is the forget gate, $i_t$ the input gate, $o_t$ the output gate, $\tilde{c}_t$ the candidate gate, $\sigma$ the sigmoid activation function, $\tanh$ the tanh activation function, $c_t$ the updated cell state, $c_{t-1}$ the cell state at the previous moment, $h_t$ the hidden-layer output at time $t$, $h_{t-1}$ the hidden-layer output of the previous cell, $W_{fh}$ and $W_{fx}$ the forget-gate weights multiplied by the hidden output and the feature values, $W_{ih}$ and $W_{ix}$ the corresponding input-gate weights, $W_{oh}$ and $W_{ox}$ the output-gate weights, $W_{ch}$ and $W_{cx}$ the candidate-gate weights, $x_t$ the feature values at time $t$, $\mathsf T$ the matrix transpose, and $b_f$, $b_i$, $b_o$, $b_c$ the biases of the forget, input, output, and candidate gates. One LSTM prediction unit is divided into the input gate $i_t$, output gate $o_t$, and forget gate $f_t$: the input gate quantifies the amount of information added from the input and the previous hidden state, the output gate judges whether information from the cell state should be output to the hidden state, and the forget gate quantifies the amount of information removed from the cell state.
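The gate equations above can be exercised with a minimal NumPy cell (a sketch; stacking the four gates into one weight matrix W over the concatenated [h_prev, x_t] is an implementation choice, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM gate equations.

    W has shape (4H, H + D) and maps the concatenated [h_prev, x_t] to the
    stacked pre-activations of the gates f, i, o and candidate c~.
    """
    H = len(h_prev)
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:H])               # forget gate
    i = sigmoid(z[H:2 * H])          # input gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    c_tilde = np.tanh(z[3 * H:])     # candidate state
    c_t = f * c_prev + i * c_tilde   # updated cell state
    h_t = o * np.tanh(c_t)           # hidden output
    return h_t, c_t
```

Running the cell over the sequence and over its mirror copy, then summing the two outputs, gives the bidirectional prediction of A503.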
In one embodiment of the invention, a GRU (gated recurrent unit) network may be used instead. Compared with the LSTM network, information flow is still regulated by gates, but there are only an update gate and a reset gate (reconstructed here from the variable definitions):

$$z_t=\sigma\big(W_z[h_{t-1},x_t]+b_z\big)$$
$$r_t=\sigma\big(W_r[h_{t-1},x_t]+b_r\big)$$
$$\tilde{h}_t=\tanh\big(W_h[r_t\odot h_{t-1},x_t]+b_h\big)$$
$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$

wherein $z_t$ is the update gate, $r_t$ the reset gate, $\sigma$ the sigmoid activation function, $\tanh$ the tanh activation function, $W_z$, $W_r$, $W_h$ the weights of the update gate, reset gate, and candidate hidden output, $h_{t-1}$ the hidden-layer output of the previous cell, $x_t$ the feature values at time $t$, $b_z$, $b_r$, $b_h$ the biases of the update gate, reset gate, and candidate hidden output, $\tilde{h}_t$ the candidate hidden-layer output, and $h_t$ the final hidden-layer output. The update gate $z_t$ quantifies the extent to which the new hidden state comes from the previous hidden state versus the current input, and the reset gate $r_t$ determines how much of the previous hidden state should be forgotten.
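The GRU alternative can likewise be sketched in NumPy (an illustrative cell matching the equations above; the argument layout is an implementation choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: update gate z_t, reset gate r_t, candidate h~_t,
    and the interpolated final hidden output h_t."""
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh + bz)                                   # update gate
    r = sigmoid(Wr @ xh + br)                                   # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1.0 - z) * h_prev + z * h_tilde                     # final output
```

With only two gates instead of three plus a cell state, the GRU trades some capacity for fewer parameters, which can matter in the small-sample setting.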
In A6, training the composite neural network model must make the final output loss as small as possible, which indicates that the model has reached its best prediction; the final loss function (reconstructed here from the variable definitions, in the standard DANN form) is:

$$E(\theta_f,\theta_y,\theta_d)=\frac{1}{n}\sum_{i=1}^{n}L_y\big(G_y(G_f(x_i;\theta_f);\theta_y),\,y_i\big)+\frac{1}{n}\sum_{i=1}^{n}\sum_{k\in\{S_1,\dots,S_{K-1},T\}}L_d\big(G_d(R(G_f(x_i;\theta_f));\theta_d),\,d_{ik}\big)$$

$$(\hat{\theta}_f,\hat{\theta}_y,\hat{\theta}_d)=\arg\min_{\theta_f,\theta_y,\theta_d}E(\theta_f,\theta_y,\theta_d)$$

wherein $E$ is the final loss function, $(\hat{\theta}_f,\hat{\theta}_y,\hat{\theta}_d)$ the global optimum, $\theta_f$ the feature-extraction weights, $\theta_y$ the regression-prediction weights, $\theta_d$ the domain-classification weights, $L_y$ the regression loss, $G_f$ feature extraction, $G_y$ regression prediction, $G_d$ domain classification, $K$ the total number of domains, $S_1$ the first source domain, $S_{K-1}$ the $(K-1)$-th source domain, $T$ the target domain, $L_d$ the domain-classification loss, $R$ the gradient reversal function, $x_i$ the feature values of small-sample time series data in the test set, $i$ the index of one piece of test-set data, $n$ the number of predictions, $y_i$ the actual value of the $i$-th test-set sample, $\hat{y}_i$ the $i$-th predicted result, $d_{ik}$ the actual label of the $k$-th domain for the $i$-th test-set sample, and $\hat{d}_{ik}$ the predicted label of the $k$-th domain.
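A runnable sketch of the combined objective, under the assumption that L_y is mean squared error and L_d is softmax cross-entropy (the patent does not fix these concrete choices, and the weights are illustrative):

```python
import numpy as np

def combined_loss(y_pred, y_true, d_logits, d_true, w_reg=1.0, w_dom=1.0):
    """Weighted sum of a regression loss (L_y, here MSE) and a
    domain-classification loss (L_d, here softmax cross-entropy).

    The adversarial sign flip comes from the gradient reversal layer
    during back-propagation, not from a minus sign in this expression.
    """
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    reg = float(np.mean((y_pred - y_true) ** 2))                 # L_y: MSE
    z = np.asarray(d_logits, float)
    z = z - z.max(axis=1, keepdims=True)                         # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    dom = float(-np.mean(np.log(p[np.arange(len(d_true)), d_true])))  # L_d: CE
    return w_reg * reg + w_dom * dom
```

Minimizing this total over the feature-extraction, regression, and domain-classification weights corresponds to the optimum described above.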
The beneficial effects of the invention are as follows: a k-means clustering algorithm, a one-dimensional convolutional neural network, a domain classifier, and an LSTM (long short-term memory) network form the composite neural network model; through adversarial transfer learning, the model trained on source-domain time series data predicts target-domain time series data, performing well on both the source and target domains.
Example 2
In one embodiment of the present invention, as shown in fig. 2, a small sample timing prediction system structure diagram based on resistance transfer learning includes:
The clustering module is used for dividing the time sequence data into a source domain and a target domain on different granularities by using a k-means clustering algorithm to acquire source domain long time sequence data and target domain time sequence data;
the feature extraction module is used for carrying out feature extraction on time slices on the time series data by utilizing a multi-layer one-dimensional convolutional neural network to obtain feature representation of the source domain long time series data and internal association of feature attributes;
The domain classification module is used for migrating the characteristic representation of the source domain to the target domain through resistance migration learning according to the characteristic representation of the source domain long time sequence data and the internal association of the characteristic attribute, assisting the characteristic representation learning of the target domain time sequence data, and obtaining the characteristic representation of the target domain;
and the time sequence regression prediction module is used for predicting the time sequence of the small sample according to the time sequence data and the characteristic representation.
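As a toy sketch of the clustering step performed by the clustering module, a minimal one-dimensional k-means can split series-level values into two groups that then serve as source and target domains. The function name, the choice of k = 2, and the sample values below are assumptions:

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal 1-D k-means: returns final centers and the grouped values."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)          # pick k initial centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # recompute centers as group means (keep old center for empty groups)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

With two well-separated groups of series statistics, the two clusters converge to their means and can be labeled as source and target domains.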
Wherein, the feature extraction module includes: a plurality of one-dimensional convolution layers and a flattening layer connected in sequence, each one-dimensional convolution layer being followed in sequence by a ReLU activation layer and a pooling layer;
the one-dimensional convolution layers are used for dividing the source domain long time series data into time slices and performing fast feature extraction;
and the flattening layer is used for flattening the original feature representation of the time series data.
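The Conv → ReLU → pooling stage described above can be sketched for a single channel and a single time slice as follows (pure Python for clarity; the kernel, the pooling size of 2, and the function name are assumptions):

```python
def conv1d_relu_pool(x, w):
    """One Conv1d + ReLU + max-pool step on a 1-D time slice."""
    # valid cross-correlation of the slice x with kernel w
    conv = [sum(x[i + k] * w[k] for k in range(len(w)))
            for i in range(len(x) - len(w) + 1)]
    relu = [max(v, 0.0) for v in conv]                      # ReLU activation
    # non-overlapping max pooling with window size 2
    return [max(relu[i], relu[i + 1]) for i in range(0, len(relu) - 1, 2)]
```

Stacking several such stages and flattening the final output yields the feature representation that the later modules consume.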
The domain classification module comprises: a gradient reversal layer and a plurality of fully connected layers connected in sequence; the last of the fully connected layers is followed by a Softmax function, and each fully connected layer except the last is followed in sequence by a ReLU activation function and a Dropout layer;
the gradient reversal layer is used for forward processing of the feature representation of the source domain long time series data and the internal association of the feature attributes;
the fully connected layers are used for converting the input feature representation of the source domain long time series data into a feature representation for the specified domain classification task;
and the Dropout layer is used for discarding data with a set probability.
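The gradient reversal behaviour attributed to the layer above — identity on the forward pass, sign-flipped (and optionally scaled) gradients on the backward pass — can be sketched as a minimal manual-autograd object. The class name and the `lam` scaling factor are assumptions:

```python
class GradReverse:
    """Identity in the forward pass; multiplies gradients by -lam on backward.

    This is what lets the feature extractor and domain classifier train
    adversarially in a single backward pass, without segmented training.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                      # pass features through unchanged

    def backward(self, grad):
        return -self.lam * grad       # reverse the gradient flowing back
```

In PyTorch this would normally be a custom `autograd.Function`; the sketch shows only the sign-flip contract.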
Fig. 3 shows a model structure diagram of the small sample time sequence prediction system based on resistance transfer learning. The clustering module divides the small sample time series data into different source domains and target domains and assigns the corresponding domain labels; the time series data of the source domains and of the target domain are input to the feature extraction module, which performs feature extraction on time slices using a multi-layer one-dimensional convolutional neural network and, after flattening, feeds the result to the domain classification module and the time sequence regression prediction module. In the domain classification module, the data first enters a gradient reversal layer, which avoids segmented training and preserves a better effect in domain-adversarial transfer; it then passes through a plurality of fully connected layers. Each fully connected layer except the last, namely Fc1 and Fc2, is followed in sequence by a ReLU activation function and a Dropout layer (Dropout1 after Fc1, Dropout2 after Fc2) to prevent overfitting, and the last fully connected layer Fc3 is followed by a Softmax function for classifying the data. The time sequence regression prediction module comprises a forward LSTM and a reverse LSTM long short-term memory neural network: in the input sequence, the first item is the original sequence and the second item is its mirror copy, so the feed-forward and feedback inputs complement each other. The module can thus capture global information in the time series data, learn from both past and future feature representations, and finally output a prediction result that takes into account context information from past and future instances.
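The bidirectional combination described above (a forward pass over the original sequence and a backward pass over its mirror copy, with the two output streams merged) can be sketched with a toy recurrence standing in for a real LSTM cell. Everything here is illustrative; in particular, the summation of the two directions and the `step` function are assumptions:

```python
def run_rnn(seq, step, h0=0.0):
    """Run a scalar recurrence h = step(h, x) over seq, collecting outputs."""
    h, out = h0, []
    for x in seq:
        h = step(h, x)
        out.append(h)
    return out

def bidirectional(seq, step):
    """Forward pass plus backward pass over the mirror copy, merged per step."""
    fwd = run_rnn(seq, step)
    bwd = run_rnn(seq[::-1], step)[::-1]   # backward outputs, re-aligned in time
    return [f + b for f, b in zip(fwd, bwd)]
```

Each output position thus sees both past (forward state) and future (backward state) context, which is the property the module relies on.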
The beneficial effects of the invention are as follows: based on the idea of a GAN (generative adversarial network), the invention uses the feature extraction module as the generator and the domain classification module as the discriminator to perform resistance transfer learning on small sample time series data. The system model is trained with source-domain time series data, and the trained system can then predict target-domain time series data, maintaining good performance even when only a small amount of small sample time series data is available.
Example 3
In one embodiment of the invention, a data set on urban air quality is selected as the small sample time series data. The data set consists of six kinds of data collected by the Microsoft urban computing research group in an urban air project, spanning one year, specifically from May 1, 2014 to April 30, 2015, and comprising city data, region data, air quality data, weather forecast data, quality station data and meteorological data. In the air quality data, the feature space contains 12 attributes, namely time, PM10 index, O3 index, CO index, SO2 index, NO2 index, weather, temperature, humidity, atmospheric pressure, wind power and wind direction, which can be used to forecast PM2.5 concentration values. The processor and GPU in this example are an Intel Xeon(R) and a GeForce RTX 3090, implemented with Python 3.8 and the PyTorch 1.7.7 framework, and the training parameters are set as follows: the batch size is 32, the number of epochs is 100, the learning rate is 1e-3, and the drop probability of the Dropout layer is set to 0.3.
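The training parameters quoted above can be collected into a configuration mapping (the key names are assumptions; the values are those stated in the text):

```python
# Hyperparameters from the embodiment; "epochs" stands for the "period size".
train_config = {
    "batch_size": 32,
    "epochs": 100,
    "learning_rate": 1e-3,
    "dropout_p": 0.3,
}
```

Keeping the settings in one mapping makes it easy to log them alongside each training run.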
According to the city distribution, the data set is divided by the clustering module into cluster A and cluster B, where cluster A consists of 19 cities near city A and cluster B covers 24 cities near city B, comprising 43 regions in total; regions 1-42 are used as the training set and region 43 as the test set, with cluster A as the source domain and cluster B as the target domain. The data set contains 2,891,393 hourly records from 437 air quality monitoring stations, and the data items include various air indicators and geographic environment indicators.
As shown in fig. 4, which is a process diagram of the small sample timing prediction system based on resistance transfer learning applied in practice to predicting PM2.5:
In the training process, the clustering module first classifies the air quality data sets of the stations in different regions and divides them into source domains and a target domain. The feature extraction module then extracts features on time slices; with the feature extraction module acting as the generator and the domain classification module as the discriminator, the feature extraction module generates the feature representation of the source domain long time series data and the internal association of the feature attributes, and the domain classification module discriminates them, transferring the features learned on the source domains to the target domain and assisting the feature representation learning of the target-domain time series data to obtain the feature representation of the target domain. Finally, the time sequence regression prediction module predicts the air quality data, and the composite neural network model is optimized according to the final loss function so that it reaches the best predicted value.
In the testing process, the air quality data set of a new station to be predicted is input into the trained composite neural network model: with the domain classification module frozen, the feature extraction module extracts features from the new station's air quality data set, and the time sequence regression prediction module produces the predicted PM2.5 change curve.
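The frozen-discriminator test flow described above — feature extraction followed directly by regression, with the domain classifier unused — can be sketched with hypothetical callables (all names here are assumptions):

```python
def predict_new_site(features, extract, predict, domain_classify=None):
    """Inference path: the domain classifier is frozen/ignored at test time.

    Data flows extractor -> regressor only; `domain_classify` is accepted
    but never called, mirroring the frozen discriminator.
    """
    z = extract(features)     # trained feature extraction module
    return predict(z)         # trained time sequence regression module
```

For example, with `extract = lambda x: [v * 2 for v in x]` and `predict = sum`, the call `predict_new_site([1, 2], extract, predict)` evaluates the same two-stage pipeline.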
The time sequence regression prediction module is used for small sample time sequence prediction. The model comprises a bidirectional LSTM long short-term memory neural network with three layers, each with a hidden size of 128. To improve training performance, the data are scaled to the range [0, 1] by min-max normalization, and any missing value in a column attribute is replaced by the mean of the corresponding column.
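The preprocessing described above (mean imputation of missing values per column, then min-max scaling to [0, 1]) might look as follows; the function name and the use of `None` for missing entries are assumptions:

```python
def preprocess(col):
    """Impute missing values with the column mean, then min-max scale to [0, 1]."""
    known = [v for v in col if v is not None]
    mean = sum(known) / len(known)
    filled = [mean if v is None else v for v in col]   # mean imputation
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) for v in filled]      # min-max normalization
```

Applying this per attribute column gives features in a common [0, 1] range, which the text notes improves training performance.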
In one embodiment of the invention, the root mean square error (RMSE) and the mean absolute error (MAE) are selected as indicators to evaluate the model error and the air quality prediction performance at the target test location. The model performance evaluation indicators are calculated as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$

wherein $n$ is the number of small sample time sequence predictions, $y_i$ is the actual value of the $i$-th small sample time series record in the test set, and $\hat{y}_i$ is the $i$-th result in the small sample time sequence prediction.
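The two indicators are the standard RMSE and MAE definitions; a direct implementation:

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error over paired actual/predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error over paired actual/predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```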
The PM2.5 concentration value is used as the indicator showing the change over the next month, with a sequence size of 24 x 30, representing one value per hour over one month. The performance of the various models in predicting PM2.5 concentration values is evaluated, with the results shown in Table 1.
TABLE 1 (best values within each group shown in bold)

| Type | Model        | RMSE       | MAE         |
|------|--------------|------------|-------------|
|      | GRU          | 12.482     | 63.6334     |
|      | Conv-GRU     | 12.4194    | 62.6333     |
| DA   | DA-Conv-GRU  | **3.9553** | 40.1001     |
| DG   | DG-Conv-GRU  | 4.5386     | **37.7518** |
|      | LSTM         | 3.7976     | 36.4188     |
|      | Conv-LSTM    | 3.4802     | 35.9811     |
| DA   | DA-Conv-LSTM | **2.8610** | **30.2536** |
| DG   | DG-Conv-LSTM | 3.9908     | 32.5313     |
In Table 1, GRU denotes the gated recurrent unit neural network, LSTM the long short-term memory neural network, and Conv the one-dimensional convolutional neural network; DA denotes a binary classifier for domain adaptation on the hybrid network, in which knowledge is transferred from a source domain to a given target domain, and DG denotes multiple classifiers for domain generalization on the hybrid network, in which knowledge is transferred from multiple source domains to an unknown target domain. Different model combinations produce different losses. As can be seen from Table 1, the hybrid models with transfer strategies DA and DG have smaller RMSE and MAE and better predictive ability; when facing significant differences in distribution between the test and training sets, the transfer strategies significantly reduce the RMSE and MAE of the corresponding models. For the highly interconnected, long-term sequences of the air quality data set, the prediction modules using LSTM perform better than those using GRU, and DA performs better than DG. Compared with the other models, the proposed composite neural network model (DA-Conv-LSTM) attains the smallest root mean square error and mean absolute error, showing excellent performance; the results indicate that the model not only has better prediction performance on the original data but also has good generalization capability.
The beneficial effects of the invention are as follows: through resistance transfer learning, the invention can directly use the model trained on source-domain time series data to help predict target-domain time series data. By using the feature extraction module as the generator and the domain classification module as the discriminator, the invention can predict time series data when only a small amount of target-domain time series data is available, offering accuracy and convenience under data scarcity, and solving the problems of low data utilization and inaccurate prediction caused by the poor generalization capability of traditional models and the excessive difference in data distribution between domains.

Claims (9)

1. The small sample time sequence prediction method based on resistance transfer learning is characterized by comprising the following steps of:
s1: based on the resistance transfer learning, constructing and training a composite neural network model to obtain a trained composite neural network model;
s2: acquiring small sample time sequence data to be predicted, and extracting features on a time slice by utilizing a trained composite neural network model to obtain feature representation of the time sequence data;
s3: and carrying out regression prediction by using the trained composite neural network model according to the characteristic representation of the time sequence data to obtain a small sample time sequence prediction result.
2. The small sample time sequence prediction method based on resistance transfer learning according to claim 1, wherein the specific steps of constructing and training the composite neural network model are as follows:
a1: acquiring time sequence data in the existing multi-domain small sample feature space, and dividing the time sequence data into a training set and a testing set;
A2: dividing the time sequence data of the training set into a source domain and a target domain on different granularities by using a k-means clustering algorithm to obtain source domain long time sequence data and target domain time sequence data;
A3: according to the source domain long time sequence data, utilizing a multi-layer one-dimensional convolutional neural network to extract the characteristics on a time slice, and obtaining the characteristic representation of the source domain long time sequence data and the internal association of the characteristic attribute;
A4: according to the feature representation of the source domain long time series data and the internal association of the feature attributes, domain classification is utilized, the feature representation of the source domain is migrated to the target domain through resistance transfer learning, and feature representation learning is performed on the target-domain time series data to obtain the feature representation of the target domain;
a5: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using an LSTM long-term memory neural network to obtain a small sample time sequence prediction result;
A6: and calculating to obtain a final loss function of the composite neural network model according to the test set and the small sample time sequence prediction result, judging whether the final loss function is minimum, if so, completing training to obtain a trained composite neural network model, otherwise, returning to A3.
3. The small sample timing prediction method based on resistance transfer learning according to claim 2, wherein the specific steps of A3 are as follows:
A301: according to the source domain long time sequence data, extracting the characteristics on a time slice by utilizing a plurality of one-dimensional convolution layers to obtain the original characteristic representation of the source domain long time sequence data;
A302: and flattening according to the original local characteristics of the source domain long time sequence data to obtain the characteristic representation and the internal association of the characteristic attribute of the source domain long time sequence data.
4. The small sample timing prediction method based on resistance transfer learning according to claim 2, wherein the specific steps of A4 are as follows:
A401: according to the local features of the source domain long time series data and the internal association of the feature attributes, processing is performed using a gradient reversal layer to obtain the local features of the forward source domain long time series data;
A402: and utilizing a plurality of fully connected layers and a Softmax function, the local features of the forward source domain long time series data are migrated to the target domain through resistance transfer learning, assisting the target-domain time series data in feature representation learning to obtain the feature representation of the target domain.
5. The small sample timing prediction method based on resistance transfer learning according to claim 2, wherein the specific steps of A5 are as follows:
A501: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using a forward LSTM long-term memory neural network to obtain a small sample time sequence forward prediction result;
A502: according to the characteristic representation of the target domain and the time sequence data of the target domain, performing regression prediction by using a reverse LSTM long-term memory neural network to obtain a small sample time sequence reverse prediction result;
A503: and adding according to the forward prediction result and the reverse prediction result to obtain a small sample time sequence prediction result.
6. The small sample timing prediction method based on resistance transfer learning according to claim 2, wherein the expression of the final loss function is as follows:
wherein the quantities appearing in the final loss function are, in order: the final loss function; the global optimum; the weights for feature extraction; the weights for regression prediction; the weights for domain classification; the loss function for regression prediction; the feature-extraction mapping; the regression-prediction mapping; the domain-classification mapping; the total number of domains; the 1st to k-th source domains and the target domain; the loss function for domain classification; the gradient reversal function; the feature values of the small sample time series data in the test set; a single record of the small sample time series data set in the test set; the number of small sample time sequence predictions; the actual value of the i-th small sample time series record in the test set; the i-th result in the small sample time sequence prediction; the actual domain label of the i-th small sample record in the test set; and the predicted domain label of the i-th record.
7. A small sample timing prediction system based on resistance transfer learning for performing the small sample timing prediction method based on resistance transfer learning according to any one of claims 1 to 6, comprising:
The clustering module is used for dividing the time sequence data into a source domain and a target domain on different granularities by using a k-means clustering algorithm to acquire source domain long time sequence data and target domain time sequence data;
the feature extraction module is used for carrying out feature extraction on time slices on the time series data by utilizing a multi-layer one-dimensional convolutional neural network to obtain feature representation of the source domain long time series data and internal association of feature attributes;
The domain classification module is used for migrating the feature representation of the source domain to the target domain through resistance transfer learning according to the feature representation of the source domain long time series data and the internal association of the feature attributes, assisting the feature representation learning of the target-domain time series data, and obtaining the feature representation of the target domain;
and the time sequence regression prediction module is used for predicting the time sequence of the small sample according to the time sequence data and the characteristic representation.
8. The small sample timing prediction system based on resistance transfer learning of claim 7, wherein the feature extraction module comprises: a plurality of one-dimensional convolution layers and a flattening layer connected in sequence, each one-dimensional convolution layer being followed in sequence by a ReLU activation layer and a pooling layer;
the plurality of one-dimensional convolution layers are used for dividing the source domain long time series data into time slices and performing fast feature extraction;
and the flattening layer is used for flattening the original feature representation of the time series data.
9. The small sample timing prediction system based on resistance transfer learning of claim 7, wherein the domain classification module comprises: a gradient reversal layer and a plurality of fully connected layers connected in sequence; the last of the fully connected layers is followed by a Softmax function, and each fully connected layer except the last is followed in sequence by a ReLU activation function and a Dropout layer;
the gradient reversal layer is used for forward processing of the feature representation of the source domain long time series data and the internal association of the feature attributes;
the fully connected layers are used for converting the input feature representation of the source domain long time series data into a feature representation for the specified domain classification task;
and the Dropout layer is used for discarding data with a set probability.
CN202410332311.1A 2024-03-22 2024-03-22 PM2.5 prediction method and system based on resistance transfer learning Active CN117932347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410332311.1A CN117932347B (en) 2024-03-22 2024-03-22 PM2.5 prediction method and system based on resistance transfer learning

Publications (2)

Publication Number Publication Date
CN117932347A true CN117932347A (en) 2024-04-26
CN117932347B CN117932347B (en) 2024-06-11

Family

ID=90765021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410332311.1A Active CN117932347B (en) 2024-03-22 2024-03-22 PM2.5 prediction method and system based on resistance transfer learning

Country Status (1)

Country Link
CN (1) CN117932347B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930686A (en) * 2016-07-05 2016-09-07 四川大学 Secondary protein structureprediction method based on deep neural network
CN112150209A (en) * 2020-06-19 2020-12-29 南京理工大学 Construction method of CNN-LSTM time sequence prediction model based on clustering center
CN112633658A (en) * 2020-12-16 2021-04-09 广东电网有限责任公司广州供电局 Low-voltage distribution area topological relation identification method based on CNN-LSTM
CN113032917A (en) * 2021-03-03 2021-06-25 安徽大学 Electromechanical bearing fault detection method based on generation countermeasure and convolution cyclic neural network and application system
US20210375441A1 (en) * 2020-05-29 2021-12-02 Regents Of The University Of Minnesota Using clinical notes for icu management
KR102374817B1 (en) * 2021-03-05 2022-03-16 경북대학교 산학협력단 Machinery fault diagnosis method and system based on advanced deep neural networks using clustering analysis of time series properties
CN114239652A (en) * 2021-12-15 2022-03-25 杭州电子科技大学 Clustering-based method for recognizing cross-tested EEG emotion through adaptation of confrontation partial domains
US20220108134A1 (en) * 2020-10-01 2022-04-07 Nvidia Corporation Unsupervised domain adaptation with neural networks
CN115114842A (en) * 2022-04-27 2022-09-27 中国水利水电科学研究院 Rainstorm waterlogging event prediction method based on small sample transfer learning algorithm
CN115640901A (en) * 2022-11-01 2023-01-24 华南理工大学 Small sample load prediction method based on hybrid neural network and generation countermeasure
US20230039900A1 (en) * 2021-08-07 2023-02-09 Fuzhou University Method for realizing a multi-channel convolutional recurrent neural network eeg emotion recognition model using transfer learning
CN115730635A (en) * 2022-12-06 2023-03-03 江南大学 Electric vehicle load prediction method
CN116849697A (en) * 2023-05-22 2023-10-10 四川大学 Basin bottom dysfunction assessment method based on self-supervision transfer learning
CN117371543A (en) * 2023-08-31 2024-01-09 浙江工业大学 Enhanced soft measurement method based on time sequence diffusion probability model
CN117494584A (en) * 2023-12-28 2024-02-02 湖南大学 High-dimensional reliability design optimization method based on neural network anti-migration learning
CN117688362A (en) * 2023-12-12 2024-03-12 天津大学 Photovoltaic power interval prediction method and device based on multivariate data feature enhancement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZENG, Yuyang et al.: "Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model", Journal of Information Processing Systems, 31 December 2021 *
SONG Chuang; ZHAO Jiajia; WANG Kang; LIANG Xinkai: "A Survey of Few-Shot Learning Research for Intelligent Perception", Acta Aeronautica et Astronautica Sinica, no. 1, 31 December 2020 *
DONG Liang et al.: "Image Classification Based on Convolutional Neural Network and Transfer Learning", Information & Computer (Theory Edition), 31 December 2021 *

Also Published As

Publication number Publication date
CN117932347B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN109142171B (en) Urban PM10 concentration prediction method based on feature expansion and fusing with neural network
CN109117883B (en) SAR image sea ice classification method and system based on long-time memory network
Wu et al. A hybrid support vector regression approach for rainfall forecasting using particle swarm optimization and projection pursuit technology
CN109086926B (en) Short-time rail transit passenger flow prediction method based on combined neural network structure
CN112232543A (en) Multi-site prediction method based on graph convolution network
Chiang et al. Hybrid time-series framework for daily-based PM 2.5 forecasting
Li et al. Deep spatio-temporal wind power forecasting
CN112598165A (en) Private car data-based urban functional area transfer flow prediction method and device
CN111222689A (en) LSTM load prediction method, medium, and electronic device based on multi-scale temporal features
CN112285376A (en) Wind speed prediction method based on CNN-LSTM
CN116307103A (en) Traffic accident prediction method based on hard parameter sharing multitask learning
CN114596726B (en) Parking berth prediction method based on interpretable space-time attention mechanism
CN117390506A (en) Ship path classification method based on grid coding and textRCNN
CN117436653A (en) Prediction model construction method and prediction method for travel demands of network about vehicles
CN117932347B (en) PM2.5 prediction method and system based on resistance transfer learning
CN117152427A (en) Remote sensing image semantic segmentation method and system based on diffusion model and knowledge distillation
CN116245259A (en) Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment
CN115481788A (en) Load prediction method and system for phase change energy storage system
Mao et al. Naive Bayesian algorithm classification model with local attribute weighted based on KNN
Sangeetha et al. Crime Rate Prediction and Prevention: Unleashing the Power of Deep Learning
Bi et al. Multi-indicator water time series imputation with autoregressive generative adversarial networks
Liu et al. Short-term Load Forecasting Approach with SVM and Similar Days Based on United Data Mining Technology
Chen et al. Combining random forest and graph wavenet for spatial-temporal data prediction
Shi Image Recognition of Skeletal Action for Online Physical Education Class based on Convolutional Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant