CN117439069A - Electric quantity prediction method based on neural network - Google Patents


Info

Publication number
CN117439069A
Authority
CN
China
Prior art keywords
data
neural network
model
training
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311401832.XA
Other languages
Chinese (zh)
Inventor
杨冰
李云
江再玉
熊根鑫
张俊权
石文娟
殷柯
程峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing China Power Information Technology Co Ltd
Original Assignee
Beijing China Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing China Power Information Technology Co Ltd filed Critical Beijing China Power Information Technology Co Ltd
Priority to CN202311401832.XA priority Critical patent/CN117439069A/en
Publication of CN117439069A publication Critical patent/CN117439069A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/004Generation forecast, e.g. methods or systems for forecasting future energy generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Power Engineering (AREA)
  • Mathematical Physics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • Water Supply & Treatment (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Human Resources & Organizations (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an electric quantity prediction method based on a neural network, comprising six main steps: data preparation, network structure design, data division, network training, model evaluation, and prediction application. The beneficial technical effects are as follows: by virtue of strong modeling capability, automatic feature extraction, long-term dependence modeling, multi-source data fusion, and similar advantages, the method effectively overcomes the defects of traditional methods and improves prediction accuracy. The invention handles the fusion of multi-source data well and effectively integrates different types of data. The invention can also extract features automatically: the neural network learns feature representations from the data without manual feature selection and construction. Through layer-by-layer feature extraction and abstraction, the neural network learns higher-level feature representations from the raw data, effectively overcoming the limitations of feature engineering in traditional methods.

Description

Electric quantity prediction method based on neural network
Technical Field
The invention belongs to the technical field of electric power, and particularly relates to an electric quantity prediction method based on a neural network.
Background Art
Existing electric quantity prediction methods have the following defects:
Data sparsity: electric quantity data are usually collected in units of hours, days, or months, and in practical applications the data at many time points may be missing or incomplete. This sparsity degrades the predictive performance of a model and also complicates missing-value handling and imputation.
Feature selection and construction: the accuracy of electric quantity prediction is affected by feature selection and construction. Current methods may not be comprehensive enough in feature selection, requiring deeper domain knowledge and feature-engineering skill to improve model performance. In addition, conventional feature construction is often too simple to fully mine the hidden information in the data.
Nonlinear modeling difficulty: electric quantity is related to many factors (such as seasonal variation, weather conditions, and holidays), among which complex nonlinear relations exist. Current models remain limited in handling such nonlinearity and may fail to fully capture the nonlinear characteristics of the data.
Disclosure of Invention
Aiming at the defects in the background art, the invention provides an electric quantity prediction method based on a neural network. The method comprises the following steps:
An electric quantity prediction method based on a neural network, implemented by a computer:
Step 1: data preparation: historical electric quantity data are collected together with related factors, namely weather conditions and seasons. The collected data are then preprocessed. Preprocessing includes normalization and feature selection.
Step 2: network structure design: a feedforward neural network or a recurrent neural network is selected as the neural network structure, and the numbers of nodes in the input, hidden, and output layers of the network are determined.
Step 3: data division: the collected data are divided into a training set and a test set. A portion of the data is used for training and the remainder for testing.
Step 4: network training: the neural network is trained using the training set. The weights and biases of the network are adjusted by the backpropagation algorithm to minimize the prediction error.
Step 5: model evaluation: the trained neural network is evaluated using the test set. The error between the predicted result and the actual value is calculated with the root mean square error (RMSE) or the mean absolute error (MAE).
Step 6: prediction application: the trained neural network model is used to predict future electric quantity. The input data to be predicted are fed into the network to obtain the corresponding output.
Advantageous technical effects
The method has the following characteristics:
the electric quantity prediction method based on the neural network can effectively solve the defects in the traditional method through the advantages of strong modeling capability, automatic feature extraction, long-term dependence modeling, multi-source data fusion and the like, and improves the accuracy and effect of prediction.
Multi-source data fusion: the neural network can well process fusion of multi-source data and effectively integrate different types of data. For example, weather data, holiday information, etc. may be input to the neural network along with the power data for training to improve the predictive performance of the model.
And (3) automatically extracting features: the neural network can automatically learn the characteristic representation in the data without manually selecting and constructing the characteristics. Through layer-by-layer feature extraction and abstraction, the neural network can learn higher-level feature representation from the original data, and effectively overcomes the limitation of feature engineering in the traditional method.
Data modeling capabilities: the neural network has strong data modeling and expression capability, and can flexibly process nonlinear relations and complex characteristic combinations. Compared with the traditional linear model, the neural network can better capture the hidden relation and mode in the data, and the prediction accuracy is improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a block flow diagram of step 1.
Fig. 3 is a block flow diagram of step 2.
Fig. 4 is a block flow diagram of step 3.
Fig. 5 is a flow chart of step 5.
Detailed Description
The structural features of the present invention will now be described in detail with reference to the accompanying drawings.
Referring to fig. 1, an electric quantity prediction method based on a neural network, implemented by a computer:
Step 1: data preparation: historical electric quantity data are collected together with related factors, namely weather conditions and seasons. The collected data are then preprocessed. Preprocessing includes normalization and feature selection.
Step 2: network structure design: a feedforward neural network or a recurrent neural network is selected as the neural network structure, and the numbers of nodes in the input, hidden, and output layers of the network are determined.
Step 3: data division: the collected data are divided into a training set and a test set. A portion of the data is used for training and the remainder for testing.
Step 4: network training: the neural network is trained using the training set. The weights and biases of the network are adjusted by the backpropagation algorithm to minimize the prediction error.
Step 5: model evaluation: the trained neural network is evaluated using the test set. The error between the predicted result and the actual value is calculated with the root mean square error (RMSE) or the mean absolute error (MAE).
Step 6: prediction application: the trained neural network model is used to predict future electric quantity. The input data to be predicted are fed into the network to obtain the corresponding output.
Referring to fig. 2, further, the specific method of step 1 is as follows:
1.1 data collection: historical electric quantity data, weather conditions and seasonal attribute data are obtained from public platforms, electric power companies, energy companies or government institutions on one hand, and on the other hand, sensors can be used for real-time acquisition.
1.2 data preprocessing: the data is preprocessed to facilitate training and testing of the neural network model.
Data cleaning: deleting the missing value and the abnormal value in the data.
Data conversion: the electric quantity data are normalized or standardized so that data of different scales can be compared on the same scale.
Feature selection: the most representative features are manually selected from the other factors related to the electric quantity (weather conditions, seasonal attributes) to improve prediction accuracy.
1.3 preliminary analysis of data: the data are analyzed preliminarily, including statistical description and visual analysis. Observing and analyzing the data helps determine the input and output variables of the neural network model and identify which factors may affect the electric quantity prediction.
1.4 establishing a dataset: the prepared data set is divided into a training set, a validation set and a test set. The ratio of the three is 5-7:2-3:1-2, i.e., 50-70% for model training, 20-30% for validating parameters of the model, and 10-20% for final model testing and prediction.
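As a minimal illustration of step 1, the Python sketch below cleans, normalizes, and splits a dataset chronologically 60/30/10, inside the 5-7:2-3:1-2 range given above. The file name and column names are illustrative assumptions, not part of the invention.

```python
# Minimal sketch of step 1 (assumed CSV and column names; 6:3:1 split).
import numpy as np
import pandas as pd

df = pd.read_csv("power_history.csv")   # hypothetical file: load plus weather/season
df = df.dropna()                        # data cleaning: drop rows with missing values

# Min-max normalization so features of different scales are comparable.
features = df[["load", "temperature", "season"]].to_numpy(dtype=float)  # assumed columns
mins, maxs = features.min(axis=0), features.max(axis=0)
features = (features - mins) / (maxs - mins + 1e-12)

# Chronological 60/30/10 split into training, validation, and test sets.
n = len(features)
train, val, test = np.split(features, [int(0.6 * n), int(0.9 * n)])
print(train.shape, val.shape, test.shape)
```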
Referring to fig. 3, further, the specific steps of step 2 are as follows:
2.1 selecting a network type: according to the requirements of electric quantity prediction and the characteristics of the data, a feedforward neural network, a recurrent neural network, a long short-term memory network (LSTM), or a convolutional neural network (CNN) is selected as the network type.
2.2 determining the network layer number and the node number: and determining the layer number of the neural network and the node number of each layer according to the complexity of the problem and the characteristics of the data. Increasing the depth and number of nodes of the network may increase the representation capabilities of the network, but may also increase the complexity of computation and training.
2.3 selecting an activation function: ReLU, Sigmoid, or Tanh is selected as the activation function for each layer of neurons. The choice of activation function takes into account the network's nonlinear expression capability, the vanishing-gradient problem, and similar considerations.
2.4 adding regularization and dropout: to prevent overfitting and improve the generalization ability of the model, regularization terms (e.g., L1 regularization, L2 regularization) and dropout layers can be added to the network to reduce the complexity and number of parameters of the network.
2.5 building a network connection: the input data and the output data are connected with an input layer and an output layer of the neural network. The dimension of the input data is ensured to be matched with the number of the nodes of the input layer, and the number of the nodes of the output layer meets the requirement of the prediction problem.
2.6 parameter initialization: and carrying out random initialization, xavier initialization or He initialization on parameters of the neural network. Good parameter initialization can accelerate the convergence speed of the network and improve the training performance.
Further, in step 2:
Feedforward neural network: the most basic neural network structure, consisting of an input layer, several hidden layers, and an output layer. Each layer is fully connected to the next, and information flows only from input to output, with no feedback loops. The numbers of layers and nodes can be designed according to the complexity and data volume of the specific task; in general, increasing them improves the expressive and learning capability of the model but also increases the cost of computation and training. Preferably, the number of layers is 1-5 and each layer has 10-100 nodes.
Recurrent neural network: a neural network structure with time-series processing capability, in which recurrent connections are introduced between hidden layers so that the network can propagate and exploit historical information. The numbers of layers and nodes can be designed according to the complexity of the task and the length of the time series; in general, increasing them improves the memory and context-awareness of the model. Preferably, the number of layers is 1-3 and each layer has 10-100 nodes.
Long short-term memory network (LSTM): a special recurrent structure that, by introducing a gating mechanism, overcomes the vanishing- and exploding-gradient problems of conventional recurrent networks and can effectively capture and memorize long-term dependencies. The numbers of layers and nodes can be designed according to the complexity of the task and the length of the input sequence; increasing them improves the modeling and memory capacity of the model. Preferably, the number of layers is 1-3 and each layer has 10-100 nodes.
Convolutional neural network (CNN): a neural network structure specialized for processing images and spatial data, characterized by translation invariance and local perception. A CNN typically consists of convolutional layers, pooling layers, and fully connected layers. The numbers of layers and nodes can be chosen according to the complexity and size of the input and the task requirements; in general, increasing them improves the model's ability to extract and represent features. Preferably, the number of layers is 2-5 and each layer has 16-128 nodes.
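For concreteness, a minimal numpy sketch of the feedforward option follows: two hidden layers of 32 nodes each, within the 1-5 layer and 10-100 node ranges stated above. The input and output sizes are illustrative assumptions.

```python
# Minimal feedforward network for step 2, sketched in plain numpy.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in=3, n_hidden=32, n_out=1):
    """He-style initialization (see the initialization formulas below)."""
    sizes = [n_in, n_hidden, n_hidden, n_out]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU hidden layers, linear output (a regression head for load values)."""
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)
    w, b = params[-1]
    return x @ w + b

params = init_mlp()
print(forward(params, rng.standard_normal((4, 3))).shape)  # -> (4, 1)
```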
Further, in step 2:
The activation function is a nonlinear transformation commonly used in neural networks; it maps the input of a neuron onto the output space and increases the nonlinear expression capability of the network. The ReLU, Sigmoid, and Tanh activation functions are used as follows:
ReLU (Rectified Linear Unit) function: outputs 0 for neurons with negative input and passes positive input through unchanged. The formula is as follows:
f(x)=max(0,x)
where x is the input of the neuron, f (x) is its output, and max is the maximum of the two numbers.
Sigmoid function: particularly useful in classification problems. The method is to map the input to a real number domain between 0 and 1, and the output result is a probability value. The formula is as follows:
f(x)=1/(1+exp(-x))
where x is the input of the neuron, f (x) is its output, exp represents the natural exponential function.
Tanh function: also called hyperbolic tangent function, the output range is between-1 and 1, and is more suitable for processing multi-class classification problems. The formula is as follows:
f(x)=(exp(x)-exp(-x))/(exp(x)+exp(-x))
where x is the input of the neuron, f (x) is its output, exp represents the natural exponential function.
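The three formulas above transcribe directly into numpy; a minimal sketch:

```python
# Direct numpy transcriptions of the three activation formulas above.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # f(x) = 1 / (1 + exp(-x))

def tanh(x):
    return np.tanh(x)                  # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))
```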
The activation function is applied during neuron activation in the neural network, and the appropriate function is selected for the situation at hand to obtain better performance. Here:
Input: the input data of the neuron, i.e., the sample feature vector.
Output: the result the neuron produces after the activation function.
In addition, weights and biases are set manually:
Weight: the value weighting each connection into a neuron, used to adjust the influence of each input on the neuron's output.
Bias: the bias value of each neuron, corresponding to its initial activation state, used to adjust the output offset of the network.
Regularization terms, including L1 regularization, L2 regularization, are used to prevent overfitting.
L1 regularization penalizes the weights of the model so that small weights shrink to zero faster, thereby reducing the number of features and selecting the more important ones. The cost function term is as follows:
L1 = λΣ|Wi|, where Wi is each element of the weight matrix and λ is a hyperparameter controlling the regularization strength. Compared with L2 regularization, L1 regularization tends to produce sparse weight matrices: most weights approach zero and only a few retain larger values.
L2 regularization also penalizes the weights of the model, but it penalizes the sum of squared weights, making the weights spread out rather than shrink to zero. The cost function term is as follows:
L2 = λΣ(Wi)², where Wi is each element of the weight matrix and λ is a hyperparameter controlling the regularization strength. L2 regularization tends to make all weights smaller, but does not significantly reduce their number.
If features need to be filtered and the dataset is relatively large, L1 regularization can be used. If overfitting must be avoided but the number of features is not a particular concern, L2 regularization can be used.
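A minimal sketch of the two penalty terms, added to a base loss (the weight matrix and λ value are illustrative):

```python
# The two penalty terms above as numpy one-liners, added to a base loss.
import numpy as np

def l1_penalty(weights, lam):
    return lam * sum(np.abs(w).sum() for w in weights)   # λ Σ|Wi|

def l2_penalty(weights, lam):
    return lam * sum((w ** 2).sum() for w in weights)    # λ Σ(Wi)²

weights = [np.array([[0.5, -1.0], [0.0, 2.0]])]
mse = 0.1  # placeholder base loss
print(mse + l1_penalty(weights, 0.01), mse + l2_penalty(weights, 0.01))
```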
In the electric quantity prediction task of the neural network, random initialization, xavier initialization or He initialization is adopted, so that proper initial weight is provided for the neural network, and training and convergence are better performed.
Random initialization: by randomly selecting the initial values of the weights from a uniform distribution or a normal distribution. This approach is less effective in many cases because random initialization may result in weights that are too large or too small, making training of the neural network difficult.
Xavier initialization (also called Glorot initialization): suitable for neural networks with Sigmoid or Tanh activation functions. It computes the initial weights from the numbers of input and output neurons of each layer. For a layer with n input neurons and m output neurons, Xavier initialization uses the following formula:
W=np.random.randn(n,m)/sqrt(n)
where W represents a weight matrix, np.random.randn is used to generate random numbers meeting a standard normal distribution, and sqrt represents a square root.
He initialization: suitable for neural networks with the ReLU activation function. Similar to Xavier initialization, He initialization also computes the initial weights from the number of input neurons of each layer. For a layer with n input neurons, He initialization uses the following formula:
W=np.random.randn(n,m)/sqrt(n/2)
where W represents a weight matrix, np.random.randn is used to generate random numbers meeting a standard normal distribution, and sqrt represents a square root.
The Xavier initialization and He initialization take into account the number of input and output neurons per layer, which can better accommodate signaling between different layers. They generally increase the convergence rate of the neural network and improve the training effect compared to simple random initialization methods. The choice of which initialization method to use should be determined according to the specific neural network architecture and task requirements.
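The three schemes can be compared side by side; the sketch below follows the two formulas above verbatim (the layer sizes n and m are illustrative):

```python
# The three initialization schemes above, written out with numpy.
import numpy as np

n, m = 64, 32                                      # fan-in, fan-out (illustrative)

w_random = np.random.randn(n, m) * 0.01            # plain random (the scale is ad hoc)
w_xavier = np.random.randn(n, m) / np.sqrt(n)      # Xavier/Glorot, per the formula above
w_he     = np.random.randn(n, m) / np.sqrt(n / 2)  # He, for ReLU layers

for name, w in [("random", w_random), ("xavier", w_xavier), ("he", w_he)]:
    print(name, round(float(w.std()), 4))          # resulting weight scales
```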
Referring to fig. 4, further, the specific steps of step 3 are as follows:
3.1 loss function selection: based on the characteristics and objectives of the electric quantity prediction task, the mean square error (MSE) or the mean absolute error (MAE) is selected as the objective function of network optimization.
3.2 optimization algorithm selection: gradient descent, stochastic gradient descent, or the Adam optimization algorithm is selected to update the parameters of the neural network so as to minimize the loss function.
Gradient descent: minimizes the cost function by continuously updating the weight parameters of the neural network. Specifically, in each iteration the errors of all samples are used to compute the gradient of the weight parameters, whose values are then adjusted in the direction opposite to the gradient. This continues until the cost function reaches a minimum or a set number of iterations.
Stochastic gradient descent: uses the error of only one sample at a time to update the weight parameters. Compared with gradient descent, it iterates faster but is less stable and more susceptible to noise.
Adam optimization algorithm: an optimization algorithm combining momentum gradient descent with an adaptive learning rate. When updating the weight parameters, it takes into account the historical gradient and momentum of each weight and adjusts the learning rate accordingly. It can adapt the learning rate separately for different weight parameters and therefore optimizes the neural network more effectively.
For large training datasets, stochastic gradient descent is preferable, while the Adam optimization algorithm is better suited to training deep neural networks.
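For illustration, a single parameter-update step of stochastic gradient descent and of Adam can be sketched in numpy as follows; the gradient would come from backpropagation during network training (step 4), and all values shown are toy values:

```python
# One parameter-update step for the two optimizers discussed above.
import numpy as np

def sgd_step(w, grad, lr=0.01):
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad        # momentum term
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2   # adaptive-learning-rate term
    m_hat = state["m"] / (1 - b1 ** state["t"])           # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
print(sgd_step(w, grad), adam_step(w, grad, state))
```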
3.3 learning rate adjustment: the learning rate is a hyperparameter of the optimization algorithm that determines the step size of each parameter update. Adjusting it controls the convergence speed and stability of the model. A fixed learning rate may be used, or an adaptive one; adaptive schemes include learning rate decay and dynamic adjustment of the learning rate.
Fixed learning rate: the learning rate is kept unchanged throughout training. This approach suits small, relatively simple training sets, where the best rate can be found by testing different values.
Learning rate decay: the method is a method for gradually reducing the learning rate, and can enable the model to approach the globally optimal solution more quickly in the early stage of training and adjust parameters more finely in the later stage.
Dynamically adjusting the learning rate: the learning rate is automatically adjusted according to the performance condition of the model in the training process.
Early stop method: model performance is monitored on the validation set, training is stopped when performance is no longer improved, so that overfitting can be avoided, and the learning rate can be adjusted according to actual conditions.
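A minimal sketch of 3.3, combining exponential learning-rate decay with the early-stop rule just described; the decay rate, patience, and validation losses are illustrative assumptions:

```python
# Learning-rate decay plus early stopping: stop when the validation loss has
# not improved for `patience` consecutive epochs.
def decayed_lr(lr0, epoch, rate=0.95):
    return lr0 * rate ** epoch

best, patience, wait = float("inf"), 5, 0
for epoch, val_loss in enumerate([0.9, 0.7, 0.65, 0.66, 0.66, 0.67, 0.68, 0.69]):
    lr = decayed_lr(0.01, epoch)
    if val_loss < best:
        best, wait = val_loss, 0       # improvement: reset the patience counter
    else:
        wait += 1
        if wait >= patience:
            print(f"early stop at epoch {epoch}, lr={lr:.5f}")
            break
```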
3.4 batch training and iteration number: the number of samples per training batch is determined and the appropriate number of iterations is set. The frequency of parameter updating can be reduced in batch training, and training efficiency and stability are improved. The number of iterations should be determined based on the convergence of the network and the variation in training error.
3.5 model evaluation and tuning: in the training process, the model is evaluated by using the verification set regularly, and tuning is performed according to the evaluation result. The method of evaluation is to select the best model by comparing the indexes of loss function, prediction precision, etc. of different models, and different super parameter settings can be tried to further improve the performance of the model.
3.6 integration method: an ensemble learning method is used to improve the accuracy and stability of the electric quantity prediction model. Integration methods include bagging, boosting, and/or random forests. By combining the predictions of multiple base models, a more accurate and robust prediction can be obtained, as the sketch below illustrates.
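As a minimal illustration of the integration idea, bagging reduces at prediction time to averaging the outputs of several independently trained base models; in the sketch below the base models are toy stand-ins for trained networks:

```python
# Bagging in miniature: average the predictions of several base models.
import numpy as np

def bagged_predict(models, x):
    return np.mean([m(x) for m in models], axis=0)

# Toy callables standing in for independently trained networks:
models = [lambda x, k=k: x * 0.9 + k * 0.01 for k in range(5)]
print(bagged_predict(models, np.array([100.0])))   # averaged load forecast
```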
Further, the specific steps of step 4 are as follows:
4.1 data preprocessing: and performing data cleaning, feature selection, missing value processing, feature normalization and/or data normalization on the input data to improve the quality and usability of the data. At the same time, the input data is subjected to feature engineering, and more useful features are extracted by transforming and combining the original data.
4.2 data partitioning: the data set is divided into a training set, a validation set and a test set. The training set is used for training the model and updating parameters, the verification set is used for optimizing the model and selecting the optimal super parameters, and the test set is used for evaluating the generalization capability of the model. The duty ratios of the training set, the verification set and the test set are as follows: 70% ± 20% of the data set was used for training, 15% ± 10% of the data set was used for validation, and 15% ± 10% of the data set was used for testing.
4.3 data enhancement: in the event that the data is scarce or unbalanced, data enhancement techniques may be used to expand the number and diversity of samples of the training set. The data enhancement method comprises the following steps: random clipping, rotation, translation, flipping, etc., to generate new training samples. Data enhancement can increase the generalization ability and robustness of the model.
4.4 batch processing: in the training process, in order to improve the calculation efficiency, the data can be divided into small batches for processing. The data for each batch is used as input to the model, and the loss is calculated and parameters updated by forward and backward propagation. The batch processing can improve the training speed, so that the optimization process is more stable.
4.5 feature scaling: scaling the features before training the neural network can improve the performance and training effect of the model. Feature scaling methods such as normalization adjust the data to the same scale range and avoid the effect of scale differences between features on the model.
4.6 data sampling and balancing: for unbalanced data sets, sample distribution among the categories, such as undersampling and oversampling, may be balanced by the sampling method. Undersampling deletes samples of some majority categories, and oversampling replicates or generates samples of some minority categories. Through data sampling and balancing, the learning ability of the model to a few class samples can be improved.
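A minimal sketch of the oversampling described in 4.6: minority-class rows are duplicated at random until the class counts match (the labels are toy values):

```python
# Oversampling by duplicating minority-class indices until classes balance.
import numpy as np

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])            # imbalanced labels
minority = np.flatnonzero(y == 1)
extra = np.random.choice(minority, size=(y == 0).sum() - minority.size, replace=True)
balanced_idx = np.concatenate([np.arange(y.size), extra])
print(np.bincount(y[balanced_idx]))               # -> [6 6]
```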
Referring to fig. 5, further, the specific steps of step 5 are as follows:
5.1 model selection: based on task type, data characteristics and expected performance factors, a neural network model, a decision tree model or a support vector machine model is selected.
5.2 initialization parameters: before starting training the model, the parameters of the model need to be initialized: the model weights and biases are initialized by using a random initialization method, so that the model can learn from a better starting point. The initialization method comprises uniform distribution initialization, normal distribution initialization or zero initialization, etc.
5.3 forward propagation and loss calculation: the input samples are fed through the model by forward propagation to obtain its output. At the same time, a loss function (cost function) is computed, according to the task requirements, to evaluate the difference between the model's prediction and the true label. The aforementioned loss function is the mean square error (MSE) or cross-entropy loss.
5.4 backpropagation and parameter update: the gradient of the loss with respect to each model parameter is computed by the backpropagation algorithm. Using these gradients, the parameters are updated according to an optimization algorithm (stochastic gradient descent) to reduce the loss. The rate and direction of the updates are controlled by hyperparameters such as the learning rate. Forward and backward propagation are repeated over many iterations, gradually optimizing the parameters so that the model's predictions approach the true labels.
5.5 model evaluation and tuning: in the training process, the model is evaluated by using the verification set, and tuning is performed according to the evaluation result. The indexes adopted in the evaluation are as follows: accuracy, precision, recall, and/or F1 score, etc. According to the evaluation result, the super-parameters of the model can be adjusted, the model structure can be modified, or other optimization strategies can be selected so as to improve the performance and generalization capability of the model.
5.6 model save and load: after training is completed, the trained model needs to be saved for subsequent prediction and use.
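Steps 5.2-5.6 can be tied together in one minimal numpy sketch: Xavier-style initialization, forward pass, MSE loss, manual backpropagation, gradient-descent updates, and saving/loading the trained parameters. The network size and data are illustrative, not the invention's prescribed configuration.

```python
# End-to-end sketch of 5.2-5.6 for a one-hidden-layer regression network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))                 # toy inputs (e.g. load, temp, season)
y = X @ np.array([[0.5], [-0.2], [0.1]]) + 0.3   # toy target

W1 = rng.standard_normal((3, 16)) / np.sqrt(3)   # Xavier init (5.2, per the formula above)
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) / np.sqrt(16)
b2 = np.zeros(1)
lr = 0.05

for epoch in range(200):
    h = np.maximum(0.0, X @ W1 + b1)             # forward pass (5.3), ReLU hidden layer
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)              # MSE loss
    # backward pass (5.4): gradients of MSE through the linear/ReLU layers
    g_pred = 2 * (pred - y) / len(X)
    gW2, gb2 = h.T @ g_pred, g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (h > 0)
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("final MSE:", round(float(loss), 5))
np.savez("model.npz", W1=W1, b1=b1, W2=W2, b2=b2)   # model save (5.6)
restored = np.load("model.npz")                     # model load for later prediction
```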

Claims (8)

1. The electric quantity prediction method based on the neural network is characterized by comprising the following steps of: the computer is used for carrying out the following operations,
step 1: data preparation: collecting historical electric quantity data and other factors related to the historical electric quantity data, wherein the other factors refer to weather conditions and seasons; preprocessing the collected data; preprocessing comprises normalization and feature selection;
step 2: network structure design: selecting a feedforward neural network or a recurrent neural network as the neural network structure, and determining the numbers of nodes of the input layer, hidden layer and output layer of the network;
step 3: dividing data: dividing the collected data into a training set and a testing set; a portion of the data is used for training and the remainder of the data is used for testing;
step 4: training a network: training the neural network by using the training set; adjusting the weights and biases of the network by a backpropagation algorithm to minimize the prediction error;
step 5: model evaluation: evaluating the trained neural network by using the test set; calculating an error index between the predicted result and the actual value by using a root mean square error or an average absolute error;
step 6: prediction application: predicting the future electric quantity by using the trained neural network model; and transmitting the input data to be predicted into a network to obtain a corresponding output result.
2. The neural network-based power prediction method according to claim 1, wherein: the specific method of the step 1 is as follows:
1.1 data collection: historical electric quantity data, weather conditions and seasonal attribute data are obtained from public platforms, electric power companies, energy companies or government institutions on one hand, and sensors can be used for real-time acquisition on the other hand;
1.2 data preprocessing: preprocessing data to facilitate training and testing of the neural network model;
data cleaning: deleting the missing value and the abnormal value in the data;
data conversion: normalizing or standardizing the electric quantity data;
feature selection: manually selecting the most representative characteristic from other factors related to the electric quantity so as to improve the accuracy of prediction;
1.3 preliminary analysis of data: preliminary analysis is carried out on the data, including methods such as statistical description and visual analysis; by observing and analyzing the data, the input variable and the output variable of the neural network model can be determined, and the factors which possibly affect the electric quantity prediction are determined;
1.4 establishing a dataset: dividing the prepared data set into a training set, a verification set and a test set; the ratio of the three is 5-7:2-3:1-2, i.e., 50-70% for model training, 20-30% for validating parameters of the model, and 10-20% for final model testing and prediction.
3. The neural network-based power prediction method according to claim 1, wherein: the specific steps of the step 2 are as follows:
2.1 selecting a network type: according to the requirements of electric quantity prediction and the characteristics of the data, a feedforward neural network, a recurrent neural network, a long short-term memory network or a convolutional neural network is selected as the network type;
2.2 determining the network layer number and the node number: determining the number of layers of the neural network and the node number of each layer according to the complexity of the problem and the characteristics of data;
2.3 selecting an activation function: selecting ReLU, sigmoid or Tanh as an activation function for each layer of neurons;
2.4 adding regularization: to prevent overfitting and improve generalization ability of the model, regularization terms may be added in the network to reduce complexity and number of parameters of the network;
2.5 building a network connection: connecting the input data and the output data with an input layer and an output layer of the neural network;
2.6 parameter initialization: and carrying out random initialization, xavier initialization or He initialization on parameters of the neural network.
4. A neural network-based power prediction method according to claim 3, wherein: in step 2,
feedforward neural network: the most basic neural network structure, consisting of an input layer, several hidden layers, and an output layer; each layer is fully connected to the next, and information flows only from input to output, with no feedback loops; number of layers: 1-5; number of nodes per layer: between 10 and 100;
recurrent neural network: a neural network structure with time-series processing capability, in which recurrent connections are introduced between hidden layers so that the network can propagate and exploit historical information; number of layers: 1-3; number of nodes per layer: between 10 and 100;
long short-term memory network: a special recurrent neural network structure that, by introducing a gating mechanism, overcomes the vanishing- and exploding-gradient problems of conventional recurrent networks and can effectively capture and memorize long-term dependencies; number of layers: 1-3; number of nodes per layer: between 10 and 100;
convolutional neural network: a neural network structure specialized for processing images and spatial data, characterized by translation invariance and local perception; it typically consists of convolutional layers, pooling layers, and fully connected layers; number of layers: 2-5; number of nodes per layer: between 16 and 128.
5. The neural network-based power prediction method according to claim 1, wherein: in step 2,
the activation function is a nonlinear transformation commonly used in the neural network, and can map the input of the neurons to the output space, so as to increase the nonlinear expression capacity of the network; the method of using ReLU, sigmoid, tanh activation functions is as follows:
ReLU function: the method is that 0 is output for the neuron with the input of negative value, and the neuron with the input of positive value outputs itself;
sigmoid function: particularly useful in classification problems; mapping the input to a real number domain between 0 and 1, and outputting a probability value as the result;
tanh function: also called hyperbolic tangent function, the output range is between-1 and 1, and the method is more suitable for processing multiple classification problems;
regularization terms, including L1 regularization, L2 regularization, for preventing overfitting; l1 regularization achieves regularization by punishing the weights of the model so that smaller weights are shrunk to zero faster, thereby reducing the number of features and selecting more important features; the regularization of L2 is realized by punishing the weight of the model, but the regularization of L2 punishs the square sum of the weights, so that the weight is more scattered instead of shrinking to zero; if features need to be screened and the data set is larger, L1 regularization can be used; if overfitting needs to be avoided, but the number of features need not be of particular concern, L2 regularization can be used;
in the electric quantity prediction task of the neural network, random initialization, xavier initialization or He initialization is adopted, so that proper initial weight is provided for the neural network, and training and convergence are better carried out; random initialization: by randomly selecting initial values of weights from a uniform distribution or a normal distribution; xavier initialization: the method is suitable for the neural network with the Sigmoid activation function or the tanh activation function; it dynamically calculates initial weights based on the number of input and output neurons for each layer; he initialization: is suitable for a neural network with a ReLU activation function; he initialization dynamically calculates initial weights according to the number of input neurons of each layer; the Xavier initialization and He initialization take into account the number of input and output neurons per layer, which can better accommodate signaling between different layers.
6. The neural network-based power prediction method according to claim 1, wherein: the specific steps of the step 3 are as follows:
3.1 loss function selection: according to the characteristics and the targets of the electric quantity prediction task, a square error or an average absolute error is selected as an objective function of network optimization;
3.2 optimization algorithm selection: selecting a gradient descent method, a random gradient descent method or an Adam optimization algorithm to update parameters of the neural network so as to minimize a loss function;
wherein, gradient descent method: the cost function is minimized by continuously updating the weight parameters of the neural network;
random gradient descent method: updating the weight parameters using only one sample error at a time;
adam optimization algorithm: is an optimization algorithm combining a momentum gradient descent method and an adaptive learning rate;
in the case of a large training dataset, a random gradient descent method is preferred; the Adam optimization algorithm is more suitable for training the deep neural network;
3.3 learning rate adjustment: the learning rate is a super parameter in the optimization algorithm, which determines the stride of parameter update; the convergence speed and stability of the model can be controlled by adjusting the learning rate; the learning rate may use a fixed learning rate, or an adaptive learning rate may be employed, the adaptive learning rate including: learning rate decay and dynamic adjustment of learning rate;
3.4 batch training and iteration number: determining the number of samples of each training batch, and setting proper iteration times; the frequency of parameter updating can be reduced by batch training, and the training efficiency and stability are improved; the iteration times are determined according to the convergence condition of the network and the change of training errors;
3.5 model evaluation and tuning: in the training process, the model is evaluated by using the verification set regularly, and tuning is performed according to the evaluation result; the method for evaluating is to select the optimal model by comparing indexes such as loss functions, prediction precision and the like of different models, and different super-parameter settings can be tried to further improve the performance of the model;
3.6 integration method: an integrated learning method is used for improving the accuracy and stability of the electric quantity prediction model; the integration method comprises bagging, boosting and/or random forest; by combining the prediction results of multiple base models, more accurate and robust prediction results can be obtained.
7. The neural network-based power prediction method according to claim 1, wherein: the specific steps of the step 4 are as follows:
4.1 data preprocessing: performing data cleaning, feature selection, missing value processing, feature standardization and/or data normalization operation on the input data to improve the quality and usability of the data; meanwhile, the input data is subjected to feature engineering, and more useful features are extracted by transforming and combining the original data;
4.2 data partitioning: dividing the data set into a training set, a verification set and a test set; the training set is used for training the model and updating parameters, the verification set is used for optimizing the model and selecting the optimal super parameters, and the test set is used for evaluating the generalization capability of the model; the duty ratios of the training set, the verification set and the test set are as follows: 70% ± 20% of the data set was used for training, 15% ± 10% of the data set was used for validation, and 15% ± 10% of the data set was used for testing.
8. The neural network-based power prediction method according to claim 1, wherein: the specific steps of the step 5 are as follows:
5.1 model selection: selecting a neural network model, a decision tree model or a support vector machine model based on task types, data characteristics and expected performance factors;
5.2 initialization parameters: before starting training the model, the parameters of the model need to be initialized: initializing the weight and bias of the model by using a random initialization method;
5.3 forward propagation and loss calculation: sending the input sample into the model through forward propagation to obtain an output result of the model; meanwhile, calculating a loss function to evaluate the difference between the prediction result of the model and the real label; the loss function is mean square error or cross entropy loss;
5.4 back propagation and parameter update: outputting gradient information for each parameter by the calculation model through a back propagation algorithm; updating parameters of the model according to an optimization algorithm by utilizing gradient information so as to reduce the value of a loss function; the speed and direction of parameter updating are controlled by super parameters such as learning rate; forward propagation and reverse propagation are repeatedly carried out through multiple iterations, and parameters of the model are gradually optimized, so that a prediction result of the model approaches to a real label;
5.5 model evaluation and tuning: in the training process, the model is evaluated by using the verification set, and tuning is performed according to the evaluation result; the indexes adopted in the evaluation are as follows: accuracy, precision, recall, and/or F1 score, etc.;
5.6 model save and load: after training is completed, the trained model needs to be saved for subsequent prediction and use.
CN202311401832.XA 2023-10-26 2023-10-26 Electric quantity prediction method based on neural network Pending CN117439069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311401832.XA CN117439069A (en) 2023-10-26 2023-10-26 Electric quantity prediction method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311401832.XA CN117439069A (en) 2023-10-26 2023-10-26 Electric quantity prediction method based on neural network

Publications (1)

Publication Number Publication Date
CN117439069A true CN117439069A (en) 2024-01-23

Family

ID=89554801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311401832.XA Pending CN117439069A (en) 2023-10-26 2023-10-26 Electric quantity prediction method based on neural network

Country Status (1)

Country Link
CN (1) CN117439069A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649153A (en) * 2024-01-29 2024-03-05 南京典格通信科技有限公司 Mobile communication network user experience quality prediction method based on information integration
CN117649153B (en) * 2024-01-29 2024-04-16 南京典格通信科技有限公司 Mobile communication network user experience quality prediction method based on information integration

Similar Documents

Publication Publication Date Title
CN109886498B (en) EMD-GRU short-term power load prediction method based on feature selection
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN108596327B (en) Seismic velocity spectrum artificial intelligence picking method based on deep learning
CN110212528B (en) Power distribution network measurement data missing reconstruction method
CN112380765A (en) Photovoltaic cell parameter identification method based on improved balance optimizer algorithm
CN107609667B (en) Heat supply load prediction method and system based on Box _ cox transformation and UFCNN
CN117439069A (en) Electric quantity prediction method based on neural network
CN114462718A (en) CNN-GRU wind power prediction method based on time sliding window
CN114707712A (en) Method for predicting requirement of generator set spare parts
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN114580545A (en) Wind turbine generator gearbox fault early warning method based on fusion model
CN116345555A (en) CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method
CN115561005A (en) Chemical process fault diagnosis method based on EEMD decomposition and lightweight neural network
CN113627597B (en) Method and system for generating countermeasure sample based on general disturbance
Mathur et al. Predictive analysis of traditional, deep learning and ensemble learning approach for short-term wind speed forecasting
CN117132132A (en) Photovoltaic power generation power prediction method based on meteorological data
CN115759343A (en) E-LSTM-based user electric quantity prediction method and device
CN112598186B (en) Improved LSTM-MLP-based small generator fault prediction method
CN116303786A (en) Block chain financial big data management system based on multidimensional data fusion algorithm
CN113762591B (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning
CN113642784B (en) Wind power ultra-short-term prediction method considering fan state
CN115481788A (en) Load prediction method and system for phase change energy storage system
CN114897204A (en) Method and device for predicting short-term wind speed of offshore wind farm
CN112183814A (en) Short-term wind speed prediction method
CN117688367B (en) Wind power generation ultra-short term power prediction method and device based on instant learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination