CN113240094A - SVM-based LSTM hyper-parameter optimization method, system, medium and device - Google Patents

SVM-based LSTM hyper-parameter optimization method, system, medium and device

Info

Publication number
CN113240094A
CN113240094A
Authority
CN
China
Prior art keywords
hyper
parameter
lstm
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110634057.7A
Other languages
Chinese (zh)
Inventor
伍卫国
马春苗
王思敏
朱肖肖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110634057.7A priority Critical patent/CN113240094A/en
Publication of CN113240094A publication Critical patent/CN113240094A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an SVM (Support Vector Machine)-based hyper-parameter optimization method, system, medium and device for LSTM (Long Short-Term Memory) models. N groups of hyper-parameter combinations are randomly selected and input into an LSTM temperature prediction model for training. The 3 parameter combinations with the smallest model RMSE among the existing results are selected and perturbed, and training yields a hyper-parameter combination result set. An SVM surrogate model is trained on this result set and used to predict the whole hyper-parameter space. The first N/n best-predicted hyper-parameter combinations are brought into the LSTM temperature prediction model to obtain real RMSE results, which update the result set; a stepwise reduction is realized by updating N = N/n, and the cycle ends when N < 1, yielding the 3 hyper-parameter combinations with the smallest RMSE. These are brought into the trained LSTM temperature prediction model to evaluate its comprehensive performance, and the best-scoring hyper-parameter combination is taken as the optimal one. The invention reduces the number of selected sample points and improves optimization efficiency.

Description

SVM-based LSTM hyper-parameter optimization method, system, medium and device
Technical Field
The invention belongs to the technical field of data centers, and particularly relates to an SVM (Support Vector Machine)-based hyper-parameter optimization method, system, medium and device for LSTM (Long Short-Term Memory) networks.
Background
With the rapid development of the mobile internet, the number and scale of data centers in China have grown quickly, and over-cooling by the refrigeration equipment in data centers causes increasingly serious resource waste. To optimize data-center energy consumption, a temperature prediction model is established so that hot spots and cooling lag can be avoided. Temperature rise and fall is a gradual process that depends on the time series, so a Long Short-Term Memory (LSTM) network, which has great advantages in processing data strongly correlated with time series, is selected for modeling.
LSTM is a special kind of Recurrent Neural Network (RNN). When training an original RNN, gradient explosion or gradient vanishing occurs easily as the training time lengthens and the number of network layers grows, so long sequences cannot be processed and information from distant inputs cannot be captured. LSTM solves this long-term memory problem: three gates (valves) act on the RNN nodes to regulate whether the previous network memory state contributes to the current computation, and an LSTM prediction model greatly improves prediction accuracy on time-series data.
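For reference, the three gates mentioned here are, in the standard textbook LSTM formulation (these equations are not reproduced from the source and are shown only for orientation):

    f_t = σ(Wf · [h_(t-1), x_t] + bf)   (forget gate)
    i_t = σ(Wi · [h_(t-1), x_t] + bi)   (input gate)
    o_t = σ(Wo · [h_(t-1), x_t] + bo)   (output gate)
    c_t = f_t ∘ c_(t-1) + i_t ∘ tanh(Wc · [h_(t-1), x_t] + bc)
    h_t = o_t ∘ tanh(c_t)

where σ is the sigmoid function and ∘ denotes element-wise multiplication; the forget gate f_t is what regulates whether the previous memory state c_(t-1) acts on the current computation.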
The performance and computational cost of a machine learning model depend heavily on its hyper-parameter selection. LSTM is a neural-network algorithm and therefore a black-box model: the influence of the hyper-parameters on accuracy is only known after the model has been run, so how to efficiently select a well-performing hyper-parameter combination is worth studying.
The hyper-parameter optimization problem is commonly referred to as the HPO (Hyper-Parameter Optimization) problem and is defined as follows: A is a machine learning algorithm with N hyper-parameters; the domain of the n-th hyper-parameter is Yn, so the configuration space of all hyper-parameters is Y = Y1 × Y2 × … × YN; λ denotes the hyper-parameter vector, and Aλ denotes the algorithm instantiated with parameters λ. For a given data set, the goal is to find a hyper-parameter combination that minimizes the value of L.
Given data sets Dtrain and Dvalid, the target is:

λ* = argmin (λ ∈ Y) L(Aλ, Dtrain, Dvalid)

where L is the loss function measuring, on the training and validation sets, the model generated by algorithm A with hyper-parameter vector λ.
In recent years, representative hyper-parameter optimization methods include Grid Search, Random Search and Bayesian Optimization. Grid search adopts a traversal strategy: it tries every combination of hyper-parameters and selects the best-performing configuration, but its time complexity explodes as the parameter dimension grows. Random search samples the hyper-parameter search space randomly and thus explores it more widely; it can find good configurations in fewer iterations, but its results vary greatly between runs and the search is unsystematic. Bayesian optimization uses a Gaussian-process regression model as a surrogate, continuously updates the surrogate with data to generate new sampling points, and iterates until a better-performing parameter configuration is produced.
Bayesian optimization is currently the most widely used, but it has the following limitations: training the surrogate model needs many sample points (a sample point is a hyper-parameter combination together with the model accuracy it produces), so optimization efficiency is not high; the surrogate is mostly a Gaussian-process regression model, and the effect of other surrogate models is not compared; and the final hyper-parameter selection uses RMSE as the only criterion, without considering model fitting stability and similar aspects.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the above defects of the prior art by providing an SVM-based LSTM hyper-parameter optimization method, system, medium and device, so as to realize directional, efficient hyper-parameter optimization and comprehensive performance evaluation for the long short-term memory network algorithm model.
The invention adopts the following technical scheme:
an LSTM hyper-parameter optimization method based on SVM comprises the following steps:
S1, randomly selecting N groups of hyper-parameter combinations and inputting them into an LSTM temperature prediction model for training;
S2, selecting the 3 parameter combinations with the smallest model RMSE from the existing results, perturbing them, and training them in the LSTM temperature prediction model trained in step S1 to obtain a hyper-parameter combination result set;
S3, training an SVM surrogate model with the hyper-parameter combination result set obtained in step S2, and predicting the whole hyper-parameter space;
S4, selecting the first N/n groups of hyper-parameter combinations with the best prediction results in step S3 and substituting them into the LSTM temperature prediction model of step S1 to obtain real RMSE results;
and S5, updating the hyper-parameter combination result set according to the real RMSE results obtained in step S4; a stepwise reduction is realized by updating N = N/n, and the cycle ends when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are then brought into the LSTM temperature prediction model trained in step S1 to obtain its comprehensive performance, and the best-scoring hyper-parameter combination is used as the optimal hyper-parameter combination for the hyper-parameter optimization.
Specifically, in step S1, A is defined as a machine learning algorithm with N hyper-parameters; the domain of the n-th hyper-parameter is Yn, and the configuration space of all hyper-parameters is Y = Y1 × Y2 × … × YN; λ denotes the hyper-parameter vector, and Aλ denotes the algorithm instantiated with parameters λ.
Specifically, in step S1, in the LSTM network structure, Xn and Yn represent an input and an output, U, W, V represent weights, and hn represents the hidden-layer state. Without gate (valve) control, hn = f(U·Xn + Wn-1·Sn-1 + Wn-2·Sn-2 + … + Wn-R·Sn-R), i.e., the association with the previous R inputs is realized.
Specifically, in step S2, the parameter perturbation specifically includes:
taking a group of parameters as a reference, adding, subtracting or multiplying and dividing step length in a value range of each parameter by P0=(Unit0,Batch_size0,Dropout0) For the reference 6 parameter combinations, P0For reference parameter combinations, Unit0Is P0Corresponding to the number of hidden layers, Batch _ size0Is P0Dropout corresponding to the number of training samples per time0Is P0And (4) corresponding rejection rate.
Further, the 6 parameter combinations are specifically:
P1=(Unit0-10,Batch_size0,Dropout0)
P2=(Unit0+10,Batch_size0,Dropout0)
P3=(Unit0,Batch_size0/2,Dropout0)
P4=(Unit0,Batch_size0*2,Dropout0)
P5=(Unit0,Batch_size0,Dropout0-0.05)
P6=(Unit0,Batch_size0,Dropout0+0.05)。
Specifically, in step S5, the model comprehensive performance Performance(hp) is expressed as:

Performance(hp) = k1 * accuracy(hp) + k2 * stability(hp)

where k1 is the weight of model accuracy in the comprehensive performance, accuracy(hp) is the model accuracy, k2 is the weight of model stability, and stability(hp) is the model stability.
Specifically, in step S5, when N is 1 or more, the process returns to step S2.
Another technical solution of the present invention is an LSTM hyper-parameter optimization system based on SVM, comprising:
the training module randomly selects N sets of hyper-parameter combinations and inputs the hyper-parameter combinations into the LSTM temperature prediction model for training;
the combination module selects the 3 parameter combinations with the smallest model RMSE from the existing results, perturbs them, and brings them into the LSTM temperature prediction model trained by the training module for training, to obtain a hyper-parameter combination result set;
the prediction module trains the SVM surrogate model with the hyper-parameter combination result set obtained by the combination module and predicts the whole hyper-parameter space;
the selection module brings the first N/n groups of hyper-parameter combinations with the best prediction results of the prediction module into the LSTM temperature prediction model of the training module to obtain real RMSE results;
and the optimization module updates the hyper-parameter combination result set according to the real RMSE results obtained by the selection module, realizes a stepwise reduction by updating N = N/n, and ends the cycle when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are brought into the LSTM temperature prediction model trained by the training module to obtain its comprehensive score, and the best-scoring hyper-parameter combination is used as the optimal hyper-parameter combination for the hyper-parameter optimization.
Another aspect of the invention is a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the SVM based LSTM hyper-parameter optimization methods.
Another technical solution of the present invention is a computing device, including:
one or more processors, memory, and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for performing any of the SVM based LSTM hyper-parameter optimization methods.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to an LSTM hyper-parameter optimization method based on SVM, which realizes the ultra-parameter optimization of the directivity, high efficiency and more comprehensive model comprehensive performance evaluation of a long-short term memory network algorithm model; on the basis of random sampling, predicting the rmse of a model corresponding to a hyper-parameter space by using a proxy model, selecting first groups of hyper-parameters with better prediction results for disturbance, improving the probability of generating an optimal solution and avoiding local optimization, so that the hyper-parameter combination selection has more directionality; the SVM is selected as a proxy model, is a learning tool based on small samples, and has fewer initial sample points and higher optimization efficiency compared with a group optimization algorithm; and outputting a loss function, calculating the squared error of the loss function, adding the squared error of the loss function and the RSME by a certain weight, and obtaining a hyper-parameter combination with the optimal model comprehensive performance when model errors of different parameter combinations are smaller.
Further, the process of hyper-parameter optimization is to find, for a certain data set, a set of hyper-parameter combinations that minimizes the L value; that is, the target is defined over the given data sets Dtrain and Dvalid. Through this analysis, the parameter optimization problem is converted into a mathematical model, and the input and output targets of LSTM hyper-parameter optimization are determined.
Furthermore, the LSTM regulates, through three gates acting on the RNN nodes, whether the previous network memory state contributes to the computation of the current network, and the LSTM prediction model improves prediction accuracy on time-series data. Analyzing the LSTM network structure facilitates the selection of the hyper-parameters to be optimized in the LSTM network model, thereby determining the hyper-parameter space P.
Furthermore, the first several groups of hyper-parameters with the best prediction results are selected for perturbation, which raises the probability of generating the optimal solution, avoids local optima, and makes the selection of hyper-parameter combinations more directional.
Furthermore, an SVM is selected as the surrogate model; the SVM is a learning tool based on small samples, so fewer initial sample points are needed than swarm optimization algorithms require, and the optimization efficiency is higher.
furthermore, when the model performance is evaluated, the accuracy and the stability of the model are comprehensively considered, and compared with the existing method which only takes rmse as a unique selection standard when the final super-parameter is selected, the method is more comprehensive in consideration.
Further, the number of selected hyper-parameter combinations is reduced via N = N/n: the prediction accuracy of the SVM is expected to be poor at first and to improve as more hyper-parameter combinations are tried, so the number of selected combinations can be reduced over time to improve optimization efficiency.
In summary, the invention uses the SVM as the surrogate model and exploits its small-sample nature to reduce the number of selected sample points and improve optimization efficiency, thereby realizing directional, efficient hyper-parameter optimization and comprehensive performance evaluation for the long short-term memory network algorithm model. In addition, the method converts the hyper-parameter optimization problem into a mathematical model for analysis; during optimization, the first several groups of hyper-parameters with the best predictions are selected for perturbation, raising the probability of generating the optimal solution while avoiding local optima; a stepwise reduction of the number of selected hyper-parameter combinations is realized through N = N/n; and finally, model accuracy and model stability are considered together when evaluating model performance. Through this process, the SVM-based LSTM hyper-parameter optimization algorithm of the invention realizes efficient hyper-parameter optimization and a more comprehensive evaluation of model performance.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic diagram of the structure of an LSTM;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a diagram of random search results;
FIG. 4 is a graph of the results of the SVM-based LSTM hyper-parameter optimization;
FIG. 5 is a graph of the LSTM temperature prediction model stability visualization.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
Referring to fig. 1, the LSTM network is a variant of the RNN. Compared with other neural networks, the RNN is the most effective tool for processing time-series-related data: the result of its output layer is related not only to the current input but also to the previous hidden-layer result, which amounts to a certain memory of the time series. LSTM solves the long-term memory problem: three gates acting on the RNN nodes regulate whether the previous network memory state contributes to the computation of the current network. In the LSTM network structure, the small circles indicate the added gates, Xn and Yn represent an input and an output, U, W, V represent weights, and hn represents the hidden-layer state; hn is related not only to Xn but also to the previous R hidden-layer outputs. Without gate control, hn = f(U·Xn + Wn-1·Sn-1 + Wn-2·Sn-2 + … + Wn-R·Sn-R), i.e., the association with the previous R inputs is realized. However, when R is large, i.e., the current output is associated with inputs far in the past, the training cost grows exponentially with R.
Referring to fig. 2, the LSTM hyper-parameter optimization method based on SVM of the present invention includes the following steps:
S1, randomly selecting N groups of hyper-parameter combinations and inputting them into an LSTM temperature prediction model for training;
when implementing an LSTM network, the network structure has the following important hyper-parameters:
activation represents an Activation function, default set to tanh, generally without modification.
The Optimizer represents an Optimizer, and it is generally accepted that setting Adam to calculate the weight update step size has a better effect.
Learning Rate indicates a Learning Rate, the default is 0.01 in the keras framework, and the smaller the Learning Rate, the finer the Learning will be, but the Learning Rate will be decreased.
Timestep indicates how many time series of incoming data are associated with each data.
Epoch represents the number of iterations, i.e., the number of times of complete training using all samples, and sets the Loss Function to MAE (mean absolute error) to represent the training result error, and stops training when the Loss Function is converging.
Dropout controls neurons to be discarded according to a certain probability in the deep learning network, and overfitting can be prevented.
The Unit represents the number of output layers of one Cell of the LSTM, and can be understood as the number of hidden layers of a general neural network, and when the number of nodes of the hidden layers is set to be smaller, the fitting effect of the network is reduced; when the setting is more than enough, the training time is prolonged and the training is easy to fall into a local minimum point.
The Batch _ size represents the number of samples of a training session, and this parameter affects the optimization and training speed of the model. Generally, the larger the value of Batch _ size, the faster the training speed, i.e., the faster the result error converges, but the generalization ability of the model decreases.
The default hyper-parameters of the neural network are a reasonable choice; therefore Activation, Optimizer, Learning Rate and Epoch all adopt default values, Timestep is set directly according to actual requirements, and the hyper-parameter space P to be searched is expressed as:

P = (Unit, Batch_size, Dropout)

where the value range of Unit is [10, 200], that of Batch_size is [2, 512], and that of Dropout is [0, 0.5].

These ranges are discretized: Unit is incremented in steps of 10, Batch_size in powers of two (2^n), and Dropout in steps of 0.05; the hyper-parameter space then contains 20 × 9 × 4 = 720 hyper-parameter combinations.
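As a concrete sketch of this search space and of where the three tuned hyper-parameters enter the network, the following Python code enumerates the combinations and builds a minimal Keras LSTM model. The single-layer architecture, the input shape and the 4-value dropout grid are illustrative assumptions (a full 0.05-step sweep of [0, 0.5] would give 11 dropout values, while the text counts 4); only Unit, Dropout and Batch_size are the hyper-parameters actually tuned here.

    from itertools import product
    from tensorflow import keras

    # Discretized hyper-parameter space (the dropout grid is an assumption
    # chosen to match the stated count of 20 * 9 * 4 = 720 combinations).
    units = list(range(10, 201, 10))                   # 20 values
    batch_sizes = [2 ** k for k in range(1, 10)]       # 2 .. 512, 9 values
    dropouts = [round(0.05 * k, 2) for k in range(4)]  # assumed 4-value grid
    hyper_space = list(product(units, batch_sizes, dropouts))
    assert len(hyper_space) == 720

    def build_lstm(unit, dropout, timestep=12, n_features=1):
        """Minimal LSTM temperature predictor; tanh activation and the Adam
        optimizer are the defaults named in the text, and the loss is MAE."""
        model = keras.Sequential([
            keras.layers.LSTM(unit, input_shape=(timestep, n_features)),
            keras.layers.Dropout(dropout),  # discard neurons with probability `dropout`
            keras.layers.Dense(1),          # next-minute temperature
        ])
        model.compile(optimizer="adam", loss="mae")
        return model

    # Batch_size enters at training time:
    # model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size)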
S2, selecting the 3 parameter combinations with the smallest model RMSE from the existing results, perturbing them, and training them in the LSTM temperature prediction model trained in step S1 to obtain a hyper-parameter combination result set;
Parameter perturbation: taking a group of parameters as the reference, each parameter is perturbed by adding/subtracting or multiplying/dividing a step within its value range. For example, with P0 = (Unit0, Batch_size0, Dropout0) as the reference, the perturbation can produce up to 6 parameter combinations:
P1=(Unit0-10,Batch_size0,Dropout0)
P2=(Unit0+10,Batch_size0,Dropout0)
P3=(Unit0,Batch_size0/2,Dropout0)
P4=(Unit0,Batch_size0*2,Dropout0)
P5=(Unit0,Batch_size0,Dropout0-0.05)
P6=(Unit0,Batch_size0,Dropout0+0.05)
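A minimal Python sketch of this perturbation rule follows; the clamping to the stated value ranges is an added assumption that accounts for the "up to 6" wording:

    def perturb(p0):
        """Generate up to 6 perturbed combinations around p0 = (unit, batch_size, dropout)."""
        unit, batch, dropout = p0
        candidates = [
            (unit - 10, batch, dropout),
            (unit + 10, batch, dropout),
            (unit, batch // 2, dropout),
            (unit, batch * 2, dropout),
            (unit, batch, round(dropout - 0.05, 2)),
            (unit, batch, round(dropout + 0.05, 2)),
        ]
        # Keep only combinations inside the stated value ranges.
        return [(u, b, d) for (u, b, d) in candidates
                if 10 <= u <= 200 and 2 <= b <= 512 and 0 <= d <= 0.5]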
S3, training an SVM surrogate model with the hyper-parameter combination result set obtained in step S2, and predicting the whole hyper-parameter space;

S4, selecting the first N/n groups of hyper-parameter combinations with the best prediction results in step S3 and substituting them into the LSTM temperature prediction model of step S1 to obtain real RMSE results;

and S5, updating the hyper-parameter combination result set according to the real RMSE results obtained in step S4; a stepwise reduction is realized by updating N = N/n, and the cycle ends when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are then brought into the LSTM temperature prediction model trained in step S1 to obtain its comprehensive score, and the best-scoring hyper-parameter combination is taken as the optimal one, completing the selection of the optimal hyper-parameter combination.
3. Introduction to the model
LSTM temperature prediction model: a temperature prediction model that establishes the thermal coupling relation between the set temperature of the refrigeration equipment and the operating temperature of the IT equipment.
SVM surrogate model: predicts the RMSE of the LSTM temperature prediction model corresponding to a hyper-parameter combination. Unit, Batch_size and Dropout are taken as features, and the RMSE of the corresponding LSTM temperature prediction model as the training target. The SVM settings all adopt default parameters: the kernel function is rbf, the penalty coefficient C is 1, and the kernel coefficient gamma is 1/n_features = 1/3.
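With scikit-learn, this surrogate can be set up roughly as follows; a sketch under the stated settings (gamma="auto" gives 1/n_features = 1/3 for the three features, matching the text):

    import numpy as np
    from sklearn.svm import SVR

    def fit_surrogate(result_set):
        """result_set: list of ((Unit, Batch_size, Dropout), rmse) pairs."""
        X = np.array([hp for hp, _ in result_set], dtype=float)
        y = np.array([rmse for _, rmse in result_set])
        svr = SVR(kernel="rbf", C=1.0, gamma="auto")  # rbf kernel, C = 1, gamma = 1/3
        svr.fit(X, y)
        return svr

    def predict_space(svr, hyper_space):
        """Rank the hyper-parameter space by predicted RMSE (ascending)."""
        preds = svr.predict(np.array(hyper_space, dtype=float))
        return [hyper_space[i] for i in np.argsort(preds)]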
4. Comprehensive performance of the model:
model accuracy: the root mean square error is obtained by the error function L, the smaller the root mean square error (rmse) of the model obtained by L is, the higher the accuracy of the model is, and the accuracy of the model accuracycacy (hp) is expressed as:
accuracy(hp)=L(hp,n)
model stability (robustness): after the model is basically stable in the iteration process, the difference sum of squares of each iteration relative to the last iteration is calculated, the smaller the result is, the higher the stability of the model is, and the stability (hp) of the model is expressed as:
Figure BDA0003104708770000111
wherein, hp represents a parameter combination, m represents the iteration number of the model basically reaching stability, n represents the total iteration number, and L (hp, k) represents the rmse of the model at the kth iteration of setting the parameter combination hp.
Comprehensive performance of the model: the model's accuracy and stability are weighted and summed; the smaller the result, the better the comprehensive performance. The comprehensive performance Performance(hp) is expressed as:

Performance(hp) = k1 * accuracy(hp) + k2 * stability(hp)

where k1 is the weight of model accuracy in the comprehensive performance, accuracy(hp) is the model accuracy, k2 is the weight of model stability, and stability(hp) is the model stability.
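A sketch of this composite score in Python; the per-iteration RMSE history and the weights k1, k2 are inputs the text leaves unspecified, so the defaults below are assumptions:

    def stability(rmse_history, m):
        """Sum of squared differences between consecutive iterations, computed
        from iteration m (where the model is basically stable) to the final
        iteration; rmse_history[k] plays the role of L(hp, k)."""
        return sum((rmse_history[k] - rmse_history[k - 1]) ** 2
                   for k in range(m + 1, len(rmse_history)))

    def performance(rmse_history, m, k1=0.5, k2=0.5):  # assumed equal weights
        accuracy = rmse_history[-1]  # accuracy(hp) = L(hp, n), the final RMSE
        return k1 * accuracy + k2 * stability(rmse_history, m)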
In another embodiment of the present invention, an LSTM hyperparameter optimization system based on an SVM is provided, which can be used to implement the above LSTM hyperparameter optimization method based on an SVM, and specifically, the LSTM hyperparameter optimization system based on an SVM includes a training module, a combination module, a prediction module, a selection module, and an optimization module.
The training module randomly selects N sets of hyper-parameter combinations and inputs the hyper-parameter combinations into the LSTM temperature prediction model for training;
the combination module selects the 3 parameter combinations with the smallest model RMSE from the existing results, perturbs them, and brings them into the LSTM temperature prediction model trained by the training module for training, to obtain a hyper-parameter combination result set;

the prediction module trains the SVM surrogate model with the hyper-parameter combination result set obtained by the combination module and predicts the whole hyper-parameter space;

the selection module brings the first N/n groups of hyper-parameter combinations with the best prediction results of the prediction module into the LSTM temperature prediction model of the training module to obtain real RMSE results;

and the optimization module updates the hyper-parameter combination result set according to the real RMSE results obtained by the selection module, realizes a stepwise reduction by updating N = N/n, and ends the cycle when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are brought into the LSTM temperature prediction model trained by the training module to obtain its comprehensive score, and the best-scoring hyper-parameter combination is used as the optimal hyper-parameter combination for the hyper-parameter optimization.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory for storing a computer program comprising program instructions, the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor of the embodiment of the invention can be used to run the SVM-based LSTM hyper-parameter optimization method, comprising:
randomly selecting N groups of hyper-parameter combinations and inputting them into an LSTM temperature prediction model for training; selecting the 3 parameter combinations with the smallest model RMSE from the existing results, perturbing them, and training them in the trained LSTM temperature prediction model to obtain a hyper-parameter combination result set; training an SVM surrogate model with the obtained hyper-parameter combination result set and predicting the whole hyper-parameter space; selecting the first N/n groups of hyper-parameter combinations with the best prediction results and bringing them into the LSTM temperature prediction model to obtain real RMSE results; and updating the hyper-parameter combination result set according to the obtained real RMSE results, realizing a stepwise reduction by updating N = N/n, ending the cycle when N < 1, bringing the 3 hyper-parameter combinations with the smallest RMSE into the trained LSTM temperature prediction model to obtain its comprehensive performance, and using the best-scoring hyper-parameter combination as the optimal hyper-parameter combination for the hyper-parameter optimization.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.
One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to perform the corresponding steps of the above-described embodiments with respect to the SVM-based LSTM hyper-parameter optimization method; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:
randomly selecting N groups of hyper-parameter combinations and inputting them into an LSTM temperature prediction model for training; selecting the 3 parameter combinations with the smallest model RMSE from the existing results, perturbing them, and training them in the trained LSTM temperature prediction model to obtain a hyper-parameter combination result set; training an SVM surrogate model with the obtained hyper-parameter combination result set and predicting the whole hyper-parameter space; selecting the first N/n groups of hyper-parameter combinations with the best prediction results and bringing them into the LSTM temperature prediction model to obtain real RMSE results; and updating the hyper-parameter combination result set according to the obtained real RMSE results, realizing a stepwise reduction by updating N = N/n, ending the cycle when N < 1, bringing the 3 hyper-parameter combinations with the smallest RMSE into the trained LSTM temperature prediction model to obtain its comprehensive performance, and using the best-scoring hyper-parameter combination as the optimal hyper-parameter combination for the hyper-parameter optimization.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The pseudo code of the LSTM hyperparameter optimization method based on the SVM is expressed as follows:
the algorithm is as follows: LSTM hyper-parameter optimization algorithm based on SVM
Input: hyper-parameter space P, initial number of selected hyper-parameter combinations N, screening ratio n
Output: the parameter combination Best_hp with optimal comprehensive model performance
(The algorithm's pseudo-code is given as a figure in the original document.)
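Reconstructed from the steps above, the loop can be sketched in Python as follows, reusing the perturb, fit_surrogate and predict_space sketches from the earlier sections; the result-set cache and the integer rounding of N/n are assumptions consistent with the worked example below:

    import random

    def optimize(hyper_space, N, n, train_and_rmse, top_k=3):
        """SVM-based LSTM hyper-parameter optimization loop.

        hyper_space    -- list of (Unit, Batch_size, Dropout) tuples
        N              -- initial number of hyper-parameter combinations
        n              -- screening ratio (N is divided by n each round)
        train_and_rmse -- callable mapping a combination to its real RMSE
        """
        results = {}  # result set: hp -> RMSE; tried combinations are cached

        def evaluate(hps):
            for hp in hps:
                if hp not in results:  # skip if already in the result set
                    results[hp] = train_and_rmse(hp)

        # S1: random initial sample of N combinations
        evaluate(random.sample(hyper_space, N))

        while True:
            # S2: perturb the top-k combinations with the smallest RMSE so far
            best = sorted(results, key=results.get)[:top_k]
            for hp in best:
                evaluate(perturb(hp))

            # Stepwise reduction N = N/n; the cycle ends when N < 1.
            N = N // n  # assumed integer division (the example rounds 5/2 down to 2)
            if N < 1:
                break

            # S3: train the SVM surrogate on all (hp, RMSE) pairs collected so far
            svr = fit_surrogate(list(results.items()))
            # S4: bring the N best-predicted untried combinations into the LSTM
            #     model to obtain their real RMSE, then update the result set (S5)
            untried = [hp for hp in hyper_space if hp not in results]
            evaluate(predict_space(svr, untried)[:N])

        # The top-k smallest-RMSE combinations are then compared by their
        # comprehensive Performance(hp), and the best-scoring one is chosen.
        return sorted(results, key=results.get)[:top_k]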
Sample data example
Referring to fig. 5, the process of finding the optimal parameters is described with data-center data as the sample set, taking the temperature prediction model established by optimizing the LSTM as the example. The previous minute of data predicts the next minute's temperature, with one data point every 5 s, so Timestep is 12; further, N is set to 10 and n to 2. The specific optimization process is as follows:
1) Randomly select 10 groups of hyper-parameter combinations, substitute them into the LSTM temperature prediction model to obtain their RMSE, and update the result set.
2) Select the 3 hyper-parameter combinations with the smallest RMSE for perturbation, substitute the perturbed combinations into the LSTM temperature prediction model to obtain their RMSE, and update the result set.
3) Train the SVM surrogate model on the hyper-parameter combination result set, take the first 10/2 = 5 best-predicted groups, bring them into the LSTM temperature prediction model to obtain the real RMSE, and update the result set.
4) Select the 3 smallest-RMSE hyper-parameter combinations again for perturbation, substitute them into the LSTM temperature prediction model to obtain their RMSE, and update the result set.
5) Train the SVM surrogate model on the result set, take the first 5/2 = 2 best-predicted groups (rounded down), bring them into the LSTM temperature prediction model to obtain the real RMSE, and update the result set.
6) Select the 3 smallest-RMSE hyper-parameter combinations again for perturbation, substitute them into the LSTM temperature prediction model to obtain their RMSE, and update the result set.
7) Train the SVM surrogate model on the result set, take the first 2/2 = 1 best-predicted group, bring it into the LSTM temperature prediction model to obtain the real RMSE, and update the result set.
8) Select the 3 smallest-RMSE hyper-parameter combinations again for perturbation, substitute them into the LSTM temperature prediction model to obtain their RMSE, and update the result set.
9) Now 1/2 < 1, so SVM training and prediction end; the 3 smallest-RMSE parameter combinations are selected directly from the result set, and their comprehensive model performance is evaluated to obtain the parameter combination with the best comprehensive performance.
note: before the parameter combination is carried into the LSTM temperature prediction model to obtain rmse, whether a result set exists or not is judged firstly, and if yes, direct skipping is carried out.
The optimization process tries at most about 90 hyper-parameter combinations (fewer than one hundred), selects the 3 combinations with the smallest RMSE, then evaluates their comprehensive model performance separately and selects the best-performing hyper-parameter combination. The method can therefore improve optimization efficiency while selecting a parameter combination with good comprehensive model performance.
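Under the sample settings (N = 10, n = 2, Timestep = 12), the sketches above would be combined roughly as follows; the dummy data arrays and the epoch count are illustrative stand-ins for the real temperature series:

    import numpy as np

    # Dummy stand-ins for the sliding-window temperature data (assumed shapes).
    rng = np.random.default_rng(0)
    x_train, y_train = rng.normal(size=(500, 12, 1)), rng.normal(size=(500,))
    x_val, y_val = rng.normal(size=(100, 12, 1)), rng.normal(size=(100,))

    def train_and_rmse(hp):
        unit, batch_size, dropout = hp
        model = build_lstm(unit, dropout, timestep=12)
        model.fit(x_train, y_train, epochs=50,  # epoch count is illustrative
                  batch_size=batch_size, verbose=0)
        pred = model.predict(x_val, verbose=0)
        return float(np.sqrt(np.mean((pred.ravel() - y_val.ravel()) ** 2)))

    top3 = optimize(hyper_space, N=10, n=2, train_and_rmse=train_and_rmse)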
Setting N = 100 and n = 2, fig. 3 shows the results (RMSE) of training the model with the 100 initially randomly selected sets of hyper-parameters. The RMSE values of the 100 hyper-parameter combinations are spread between 0.29 and 0.87; because the combinations are chosen at random, the results vary widely between them. Fig. 4 shows the hyper-parameter results obtained in the second and third iterations with the method of the invention. From the experimental results, the selected hyper-parameter combinations perform better and better, and the combinations selected in the last iterations tend toward a good result, which verifies the effectiveness of the method of the invention.
In summary, the invention provides an SVM-based LSTM hyper-parameter optimization method and system. For the LSTM hyper-parameter optimization problem, an SVM is used as the surrogate model, and its small-sample nature is exploited to reduce the number of selected sample points and improve optimization efficiency. In addition, on the basis of random sampling, the first several groups of hyper-parameters with the best prediction results are selected for perturbation, raising the probability of generating the optimal solution while avoiding local optima; a stepwise reduction of the number of selected hyper-parameter combinations is realized through N = N/n; and finally, model accuracy and model stability are considered together when evaluating model performance. Directional, efficient hyper-parameter optimization of the long short-term memory network algorithm model and a more comprehensive evaluation of its performance are thereby realized.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. An LSTM hyper-parameter optimization method based on SVM is characterized by comprising the following steps:
S1, randomly selecting N groups of hyper-parameter combinations and inputting them into an LSTM temperature prediction model for training;
S2, selecting the 3 parameter combinations with the smallest model RMSE from the existing results, perturbing them, and training them in the LSTM temperature prediction model trained in step S1 to obtain a hyper-parameter combination result set;
S3, training an SVM surrogate model with the hyper-parameter combination result set obtained in step S2, and predicting the whole hyper-parameter space;
S4, selecting the first N/n groups of hyper-parameter combinations with the best prediction results in step S3 and substituting them into the LSTM temperature prediction model of step S1 to obtain real RMSE results;
and S5, updating the hyper-parameter combination result set according to the real RMSE results obtained in step S4; a stepwise reduction is realized by updating N = N/n, and the cycle ends when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are then brought into the LSTM temperature prediction model trained in step S1 to obtain its comprehensive performance, and the best-scoring hyper-parameter combination is used as the optimal hyper-parameter combination for the hyper-parameter optimization.
2. The method according to claim 1, wherein in step S1, A is defined as a machine learning algorithm with N hyper-parameters; the domain of the n-th hyper-parameter is Yn, and the configuration space of all hyper-parameters is Y = Y1 × Y2 × … × YN; λ denotes the hyper-parameter vector, and Aλ denotes the algorithm instantiated with parameters λ.
3. The method of claim 1, wherein in step S1, in the LSTM network structure, Xn and Yn represent an input and an output, U, W, V represent weights, and hn represents the hidden-layer state; without gate control, hn = f(U·Xn + Wn-1·Sn-1 + Wn-2·Sn-2 + … + Wn-R·Sn-R).
4. The method according to claim 1, wherein in step S2, the parameter perturbation is specifically:
taking a group of parameters as the reference, each parameter is perturbed by adding/subtracting or multiplying/dividing a step within its value range, with P0 = (Unit0, Batch_size0, Dropout0) as the reference parameter combination, where Unit0 is the number of hidden units of P0, Batch_size0 is the number of training samples per batch of P0, and Dropout0 is the dropout (rejection) rate of P0.
5. The method according to claim 4, wherein the 6 perturbed parameter combinations are specifically:
P1=(Unit0-10,Batch_size0,Dropout0)
P2=(Unit0+10,Batch_size0,Dropout0)
P3=(Unit0,Batch_size0/2,Dropout0)
P4=(Unit0,Batch_size0*2,Dropout0)
P5=(Unit0,Batch_size0,Dropout0-0.05)
P6=(Unit0,Batch_size0,Dropout0+0.05)。
6. The method of claim 1, wherein in step S5, the model comprehensive performance Performance(hp) is expressed as:

Performance(hp) = k1 * accuracy(hp) + k2 * stability(hp)

where k1 is the weight of model accuracy in the comprehensive performance, accuracy(hp) is the model accuracy, k2 is the weight of model stability, and stability(hp) is the model stability.
7. The method of claim 1, wherein in step S5, when N is greater than or equal to 1, the method returns to step S2.
8. An LSTM hyper-parameter optimization system based on SVM, characterized by comprising:
the training module randomly selects N sets of hyper-parameter combinations and inputs the hyper-parameter combinations into the LSTM temperature prediction model for training;
the combination module selects the 3 parameter combinations with the smallest model RMSE from the existing results, perturbs them, and brings them into the LSTM temperature prediction model trained by the training module for training, to obtain a hyper-parameter combination result set;
the prediction module trains the SVM surrogate model with the hyper-parameter combination result set obtained by the combination module and predicts the whole hyper-parameter space;
the selection module brings the first N/n groups of hyper-parameter combinations with the best prediction results of the prediction module into the LSTM temperature prediction model of the training module to obtain real RMSE results;
and the optimization module updates the hyper-parameter combination result set according to the real RMSE results obtained by the selection module, realizes a stepwise reduction by updating N = N/n, and ends the cycle when N < 1; the 3 hyper-parameter combinations with the smallest RMSE are brought into the LSTM temperature prediction model trained by the training module to obtain its comprehensive score, and the best-scoring hyper-parameter combination is used as the optimal hyper-parameter combination for the hyper-parameter optimization.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
10. A computing device, comprising:
one or more processors, memory, and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-8.
CN202110634057.7A 2021-06-07 2021-06-07 SVM-based LSTM hyper-parameter optimization method, system, medium and device Pending CN113240094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110634057.7A CN113240094A (en) 2021-06-07 2021-06-07 SVM-based LSTM hyper-parameter optimization method, system, medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110634057.7A CN113240094A (en) 2021-06-07 2021-06-07 SVM-based LSTM hyper-parameter optimization method, system, medium and device

Publications (1)

Publication Number Publication Date
CN113240094A true CN113240094A (en) 2021-08-10

Family

ID=77137123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110634057.7A Pending CN113240094A (en) 2021-06-07 2021-06-07 SVM-based LSTM hyper-parameter optimization method, system, medium and device

Country Status (1)

Country Link
CN (1) CN113240094A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780575A (en) * 2021-08-30 2021-12-10 征图智能科技(江苏)有限公司 Super-parameter optimization method of progressive deep learning model
CN113780575B (en) * 2021-08-30 2024-02-20 征图智能科技(江苏)有限公司 Visual classification method based on progressive deep learning model

Similar Documents

Publication Publication Date Title
Xu et al. House price forecasting with neural networks
Yang Multiobjective firefly algorithm for continuous optimization
CN111416797B (en) Intrusion detection method for optimizing regularization extreme learning machine by improving longicorn herd algorithm
CN108876038B (en) Big data, artificial intelligence and super calculation synergetic material performance prediction method
CN111461286A (en) Spark parameter automatic optimization system and method based on evolutionary neural network
CN114548591A (en) Time sequence data prediction method and system based on hybrid deep learning model and Stacking
CN114880806A (en) New energy automobile sales prediction model parameter optimization method based on particle swarm optimization
CN115860100A (en) Neural network model training method and device and computing equipment
Kumar et al. Wind speed prediction using deep learning-LSTM and GRU
Regazzoni et al. A physics-informed multi-fidelity approach for the estimation of differential equations parameters in low-data or large-noise regimes
CN113240094A (en) SVM-based LSTM hyper-parameter optimization method, system, medium and device
CN114564787A (en) Bayesian optimization method, device and storage medium for target-related airfoil design
Wen et al. MapReduce-based BP neural network classification of aquaculture water quality
CN113868956A (en) Two-stage self-adaptive combined proxy model modeling method
CN110210072B (en) Method for solving high-dimensional optimization problem based on approximate model and differential evolution algorithm
Zengcong et al. Efficient multi-objective CMA-ES algorithm assisted by knowledge-extraction-based variable-fidelity surrogate model
CN114968824B (en) Testing method and system based on chain multi-path coverage
CN116054144A (en) Distribution network reconstruction method, system and storage medium for distributed photovoltaic access
CN117010260A (en) Automatic history fit model prediction method, system and equipment for fractured reservoir
Lu et al. Laplacian deep echo state network optimized by genetic algorithm
Cheng et al. Trust region based MPS method for global optimization of high dimensional design problems
CN114722490A (en) Agent model global optimization method based on mixed increase and interval reduction
Altundogan et al. Genetic algorithm based fuzzy cognitive map concept relationship determination and sigmoid configuration
CN113112092A (en) Short-term probability density load prediction method, device, equipment and storage medium
Wu et al. Learning of sparse fuzzy cognitive maps using evolutionary algorithm with lasso initialization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination