CN112733996B - Hydrological time series prediction method based on GA-PSO (genetic algorithm-particle swarm optimization) optimized XGBoost

Info

Publication number
CN112733996B
Authority
CN
China
Prior art keywords
pso
xgboost
hydrological
model
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110049321.0A
Other languages
Chinese (zh)
Other versions
CN112733996A (en
Inventor
马露
万定生
余宇峰
杨志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN202110049321.0A
Publication of CN112733996A
Application granted
Publication of CN112733996B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The invention discloses a hydrological time series prediction method based on GA-PSO-optimized XGBoost, comprising: collecting the rainfall values corresponding to hydrological stations and the flow of those stations, and organizing them into a hydrological time series data set; preprocessing the data and dividing the sample data set into a training set and a test set; optimizing hyperparameters of XGBoost, such as the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth, with an improved GA-PSO combined optimization algorithm, and training the XGBoost model on the sample data set to obtain a GA-PSO-optimized XGBoost hydrological time series prediction model; and testing the GA-PSO-optimized XGBoost hydrological prediction model. Because GA-PSO is used to optimize the parameters of the XGBoost model, the model built with the optimal parameters achieves higher accuracy in hydrological prediction.

Description

Hydrological time series prediction method based on GA-PSO (genetic algorithm-particle swarm optimization) optimized XGBoost
Technical Field
The invention belongs to the field of hydrological prediction technology, and particularly relates to a hydrological time series prediction method based on GA-PSO (genetic algorithm-particle swarm optimization) optimized XGBoost.
Background
At present, the hydrology industry in China is advancing from traditional hydrology to modern hydrology. Observation technology based on automatic hydrological stations has spread rapidly, and data collection has moved from manual recording to automatic stations that record every few minutes or even every second, so the coverage of hydrological data is increasingly comprehensive. Hydrological data are large in volume, varied in category, spatiotemporal in nature and quickly updated. They are also influenced by conditions such as seasonal climate, geomorphic characteristics and hydrological laws, so many valuable patterns and much information are hidden in them. How to analyze these data effectively and extract useful information to serve hydrological forecasting, flood detection and similar tasks has become a focus of attention. In the traditional hydrology industry, a physical model is generally established according to the hydrological environment and process, and manual experience is then added for prediction. From an information perspective, if specific patterns can be mined from the long-term historical time series data of a drainage basin, the future water level and flow of the basin can be predicted effectively from the approximate trend, which helps to prevent flood disasters. The importance of hydrological time series prediction is therefore self-evident.
In recent years, some scholars have applied machine learning methods, such as LSTM networks, BP neural networks and support vector machines, to hydrological time series prediction. These methods achieve good results and improve on the computation speed and accuracy of traditional models, but they also have problems: LSTM and BP neural networks have strong learning ability, but easily fall into local optima, need a large number of parameters and converge slowly; the support vector machine predicts well, but for large-scale training samples its computation is slow and it depends on the selection of hyperparameters. It is therefore necessary to find a prediction model that combines efficiency and accuracy.
The genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm are the most frequently used and most basic algorithms for optimizing model parameters. In GA optimization, the whole population exists in encoded form and moves gradually and uniformly towards the optimal region; GA is "memoryless", and individuals are updated only through crossover and mutation, so its global search capability is stronger. In contrast, PSO "has memory": it updates each particle by changing its velocity and position, which are closely related to the position at the previous moment. PSO is therefore better suited to local search, has fewer parameters to adjust and converges quickly, but has difficulty avoiding premature convergence.
Disclosure of Invention
Purpose of the invention: the invention aims to overcome the defects of the prior art and provides a hydrological time series prediction method based on GA-PSO-optimized XGBoost.
Technical scheme: the hydrological time series prediction method based on GA-PSO (genetic algorithm-particle swarm optimization) optimized XGBoost according to the invention comprises the following steps:
Step S1, collecting the rainfall values of all rainfall stations of a river basin within a certain time period and the water levels of the corresponding water level stations, and organizing them into a hydrological time series data set;
Step S2, preprocessing each hydrological sample in the hydrological time series data set of step S1, and dividing the sample data set into a hydrological training data set L and a hydrological test data set T;
Step S3, optimizing hyperparameters of the XGBoost model, such as the learning rate lr, the number of base learners n_estimators, the minimum leaf weight and the maximum tree depth, with an improved GA-PSO combined optimization algorithm, and training the XGBoost model on the sample data set to obtain a GA-PSO-optimized XGBoost hydrological time series prediction model;
Step S4, testing the GA-PSO-optimized XGBoost hydrological prediction model.
Step S1 obtains the data set and the corresponding label information. Specifically, the current and previous 7 hours of rainfall values of the rainfall stations of the river basin and the current and previous 7 hours of flow values of the corresponding hydrological station are organized as the hydrological time series data set.
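The windowing described for step S1 can be sketched as follows. This is a minimal illustration in plain Python, assuming a single rainfall series and a single flow series sampled hourly; the function name `build_samples` and the `lags` and `horizon` parameters are hypothetical and are not taken from the patent.

```python
def build_samples(rain, flow, lags=7, horizon=1):
    """Build (features, target) pairs from hourly rainfall and flow series.

    Each feature vector holds the current and previous `lags` hours of
    rainfall followed by the current and previous `lags` hours of flow.
    The target is the flow `horizon` hours ahead (an assumed setup).
    """
    if len(rain) != len(flow):
        raise ValueError("rain and flow must have the same length")
    X, y = [], []
    for t in range(lags, len(flow) - horizon):
        window_rain = rain[t - lags:t + 1]  # hours t-7 .. t
        window_flow = flow[t - lags:t + 1]  # hours t-7 .. t
        X.append(list(window_rain) + list(window_flow))
        y.append(flow[t + horizon])
    return X, y


# Toy hourly data (illustrative values only).
rain = [0.0, 0.2, 1.5, 3.0, 0.5, 0.0, 0.0, 0.1, 0.4, 0.0]
flow = [10.0, 10.1, 10.8, 12.5, 13.0, 12.2, 11.5, 11.0, 10.9, 10.7]
X, y = build_samples(rain, flow)
print(len(X), len(X[0]), y)
```

With several rainfall stations, the same window would simply be repeated per station and concatenated into one feature vector.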
Step S2 preprocesses the data in the data set and partitions the data set. Specifically:
Step S2.1, the preprocessing of the hydrological sample data x(t) in step S2 includes missing value processing, error value correction and normalization;
the normalization formula is as follows:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x^{*}$ is the normalized value, $x$ is the initial value, $x_{\min}$ is the minimum value in the original sequence, and $x_{\max}$ is the maximum value in the original sequence;
Step S2.2, taking the first 80% of the preprocessed hydrological time series data set as the hydrological training data set L, and the remaining 20% as the hydrological test data set T.
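The normalization and the chronological split of step S2 can be sketched as follows. This is a minimal example in plain Python; the function names `min_max_normalize` and `chronological_split`, and the handling of a constant series, are assumptions made for illustration.

```python
def min_max_normalize(series):
    """Scale a sequence to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(series), max(series)
    if x_max == x_min:
        # Constant series: return zeros (an assumed convention).
        return [0.0 for _ in series]
    span = x_max - x_min
    return [(x - x_min) / span for x in series]


def chronological_split(samples, train_fraction=0.8):
    """Keep time order: the first part is for training, the rest for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


flow = [10.0, 12.0, 15.0, 20.0, 18.0, 14.0, 11.0, 10.0, 13.0, 16.0]
normalized = min_max_normalize(flow)
train, test = chronological_split(normalized)
print(normalized)
print(len(train), len(test))
```

The split is not shuffled, so the test set always lies after the training set in time, which matches the "first 80%, remaining 20%" wording.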
The XGBoost model has many parameters, and better parameter values improve the accuracy of sequence prediction. The hyperparameters of the XGBoost model, such as the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth, are therefore optimized with an improved GA-PSO algorithm. Step S3 specifically comprises the following steps:
Step S3.1, initializing the value ranges of the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth of the XGBoost model, and setting the number of iterations of the overall GA-PSO optimization algorithm to T*;
Step S3.2, randomly generating N subgroups, wherein the chromosome of each particle in a subgroup corresponds to one set of XGBoost parameters (lr, n_estimators, min_weights, max_depth);
Step S3.3, using R² as the individual fitness value, and initializing the individual fitness values of all particles in the N subgroups of step S3.2;
Step S3.4, performing one round of classical GA optimization on the N subgroups to finally obtain N optimal particles, wherein the GA optimization is as follows: each subgroup contains m individuals, the number of iterations of each subgroup is set to T1, and selection, crossover and mutation operations are performed on the m encoded individuals to update the population;
Step S3.5, calculating the fitness value of each particle after mutation, and updating the optimal individual of the current iteration according to the fitness value;
Step S3.6, returning to step S3.4 to continue the classical GA optimization until the upper limit T1 of the number of iterations is reached and the termination condition is satisfied; each subgroup then has T1 historical optimal particles, whose fitness values are compared, and the particle with the highest fitness value is taken as the optimal individual of the subgroup, so that N optimal individuals are finally obtained from the N subgroups;
Step S3.7, decoding the N optimal individuals obtained in step S3.6 to serve as the initial particle swarm of the PSO algorithm, and performing improved PSO optimization, wherein the number of iterations of the PSO algorithm is set to T2;
Step S3.8, initializing the initial velocities of the initial particles of the PSO algorithm, still using R² as the fitness value, and updating the velocity and position of each particle with the improved formulas, thereby updating the historical optimal position of each particle, denoted pbest, and the global optimal position of the swarm, denoted gbest;
the particle velocity and position update formulas in PSO are:
$$v_i^{t+1} = \omega v_i^{t} + c_1\,\mathrm{rand}_1\,(pbest_i^{t} - x_i^{t}) + c_2\,\mathrm{rand}_2\,(gbest^{t} - x_i^{t})$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where $v_i^{t}$ represents the velocity of particle $i$ at the current time $t$, $x_i^{t}$ represents the position of the particle at the current time $t$, $pbest_i^{t}$ represents the individual extreme point, $gbest^{t}$ represents the global extreme point, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, and $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random numbers in the interval [0, 1];
a non-linear decreasing weight method is adopted for the weight ω:
Figure BDA0002898654550000041
the learning factors are likewise nonlinear functions of the weight:
Figure BDA0002898654550000042
Step S3.9, judging whether the current number of PSO iterations is less than or equal to T2; if so, returning to step S3.8 to continue the current PSO optimization; otherwise, jumping to step S3.10;
Step S3.10, judging whether the current total number of iterations has reached T*; if not, K particles are randomly selected from the historical optimal particles of the PSO to replace K individuals in each GA subgroup of step S3.2, and the method returns to step S3.2 to continue the optimization; if so, the optimal solution is output;
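The PSO stage of steps S3.7 to S3.9 can be sketched as follows, using the standard velocity and position update with inertia weight ω and learning factors c1 and c2. Several things here are assumptions for illustration: ω, c1 and c2 are held constant because the nonlinear schedules are not reproduced in this text, positions are clamped to the parameter ranges, the search is reduced to two parameters (lr and n_estimators, with ranges borrowed from the embodiment), and `toy_fitness` is a hypothetical stand-in for the R² of a trained XGBoost model.

```python
import random


def pso_step(positions, velocities, pbest, gbest, omega, c1, c2, bounds, rng):
    """Apply one standard PSO velocity and position update to every particle."""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        new_x, new_v = [], []
        for d, (lo, hi) in enumerate(bounds):
            r1, r2 = rng.random(), rng.random()
            vd = (omega * v[d]
                  + c1 * r1 * (pbest[i][d] - x[d])
                  + c2 * r2 * (gbest[d] - x[d]))
            xd = min(max(x[d] + vd, lo), hi)  # clamp to the range (assumption)
            new_v.append(vd)
            new_x.append(xd)
        positions[i], velocities[i] = new_x, new_v


def update_bests(positions, fitness, pbest, pbest_fit, gbest, gbest_fit):
    """Refresh each particle's pbest and the swarm's gbest (higher is better)."""
    for i, x in enumerate(positions):
        f = fitness(x)
        if f > pbest_fit[i]:
            pbest[i], pbest_fit[i] = list(x), f
        if f > gbest_fit:
            gbest, gbest_fit = list(x), f
    return gbest, gbest_fit


def toy_fitness(x):
    """Hypothetical stand-in for R^2; peaks at lr=0.1, n_estimators=120."""
    lr, n_estimators = x
    return 1.0 - (lr - 0.1) ** 2 - ((n_estimators - 120.0) / 200.0) ** 2


rng = random.Random(0)
bounds = [(0.01, 0.4), (10.0, 220.0)]  # lr and n_estimators ranges
positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(10)]
velocities = [[0.0, 0.0] for _ in positions]
pbest = [list(p) for p in positions]
pbest_fit = [toy_fitness(p) for p in positions]
best_index = max(range(len(positions)), key=lambda i: pbest_fit[i])
gbest, gbest_fit = list(pbest[best_index]), pbest_fit[best_index]
initial_best = gbest_fit

for t in range(30):  # plays the role of T2
    pso_step(positions, velocities, pbest, gbest, 0.7, 1.5, 1.5, bounds, rng)
    gbest, gbest_fit = update_bests(
        positions, toy_fitness, pbest, pbest_fit, gbest, gbest_fit)

print(gbest, gbest_fit)
```

In the method described here, the initial swarm would be the N decoded GA optima rather than random points, and the fitness call would train and score an XGBoost model.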
The XGBoost model in step S4 is a tree ensemble model whose internal decision trees are regression trees. The detailed process of step S4 is as follows:
the loss function of the GA-PSO-optimized XGBoost hydrological time series prediction model is set as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l(y_i, \hat{y}_i)$ is the loss function, which measures the difference between the predicted value $\hat{y}_i$ and the actual value $y_i$; $K$ represents the number of decision trees contained in the model; and
$$\Omega(f) = \gamma M + \frac{1}{2}\lambda \sum_{m=1}^{M} w_m^{2}$$
is the regularization term, where $\gamma$ is the penalty constant on the gain function for splitting leaf nodes, $M$ is the number of leaf nodes, $w_m$ is the weight of the $m$-th leaf node, and $\lambda$ is the penalty coefficient of the L2 regularization term;
the predicted value of the $i$-th sample after the $j$-th training round is:
$$\hat{y}_i^{(j)} = \hat{y}_i^{(j-1)} + f_j(x_i)$$
the simplified objective function of the $j$-th training round is:
$$Obj^{(j)} \approx \sum_{i=1}^{n} \left[ g_i f_j(x_i) + \frac{1}{2} h_i f_j^{2}(x_i) \right] + \Omega(f_j)$$
where $g_i = \partial_{\hat{y}_i^{(j-1)}}\, l(y_i, \hat{y}_i^{(j-1)})$ is the first derivative of the loss function and $h_i = \partial^{2}_{\hat{y}_i^{(j-1)}}\, l(y_i, \hat{y}_i^{(j-1)})$ is the second derivative of the loss function.
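A small worked example may make the roles of the first and second derivatives concrete. The sketch below assumes a squared-error loss l = ½(y − ŷ)², which is an assumption since the text does not fix the loss; for that choice the first derivative is ŷ − y and the second derivative is 1, and the second-order expansion is exact. All function names and the toy numbers are hypothetical.

```python
def squared_error(y, y_hat):
    """Assumed loss: l = 0.5 * (y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2


def grad_hess(y, y_hat):
    """First and second derivative of the assumed loss w.r.t. y_hat."""
    return y_hat - y, 1.0


def regularization(leaf_weights, gamma, lam):
    """gamma * (number of leaves) + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def simplified_objective(y, y_prev, f_out, leaf_weights, gamma, lam):
    """Second-order objective for one new tree with per-sample outputs f_out."""
    total = 0.0
    for yi, yp, fi in zip(y, y_prev, f_out):
        g, h = grad_hess(yi, yp)
        total += g * fi + 0.5 * h * fi * fi
    return total + regularization(leaf_weights, gamma, lam)


y = [3.0, 5.0, 4.0]            # actual values
y_prev = [2.5, 4.0, 4.5]       # predictions after the previous round
leaf_weights = [0.5, -0.5]     # a new tree with two leaves
f_out = [0.5, 0.5, -0.5]       # leaf weight assigned to each sample
obj = simplified_objective(y, y_prev, f_out, leaf_weights, gamma=0.1, lam=1.0)

# For squared error, the data part of the objective equals the true loss change.
loss_change = sum(
    squared_error(yi, yp + fi) - squared_error(yi, yp)
    for yi, yp, fi in zip(y, y_prev, f_out))
print(obj, loss_change)
```

Here the data term is −0.625 and the regularization term is 0.45, so the new tree lowers the training loss but pays a complexity penalty.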
Advantageous effects: compared with the prior art, the invention has the following advantages:
The invention optimizes the parameters of the XGBoost model with a GA-PSO combined optimization algorithm, which avoids being trapped in a local optimum when searching for the optimal parameters, and the model built with the optimal parameters achieves higher accuracy in hydrological prediction. While maintaining prediction accuracy, the method also converges faster, and the computation speed on large-scale training samples is improved to a certain extent.
The XGBoost prediction model after parameter optimization has better prediction performance and precision, and the generalization capability of the prediction model is improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of GA-PSO optimization according to an embodiment of the present invention;
FIG. 3 is a graph comparing the fitness value curves of the GA-PSO, GA and PSO optimization algorithms in the embodiment;
FIG. 4 shows the detailed sequence (471, 481) at a forecast period of 1 h in the embodiment;
FIG. 5 shows the detailed sequence (2068, 2107) at a forecast period of 1 h in the embodiment.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in FIG. 1, the hydrological time series prediction method based on GA-PSO-optimized XGBoost according to the invention mainly comprises 4 steps:
Step S1, selecting data of the Longshan watershed to organize the hydrological time series data set. The data run from 01:00 on 24 December 2010 to 25 July 2014 and comprise 31416 hourly records; each record consists of five attributes: the flow value of the Longshan station and the rainfall values of four rainfall stations. The four rainfall stations are: Longshan, rear love, stream and moon;
Step S2, preprocessing each hydrological sample in the hydrological time series data set of step S1, and dividing the sample data set into a hydrological training data set L and a hydrological test data set T;
Step S2.1, the preprocessing of the hydrological sample data in step S2 includes missing value processing, error value correction and normalization;
the normalization formula is as follows:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x^{*}$ is the normalized value, $x$ is the initial value, $x_{\min}$ is the minimum value in the original sequence, and $x_{\max}$ is the maximum value in the original sequence;
Step S2.2, taking the first 80% of the preprocessed hydrological time series data set as the hydrological training data set L, and the remaining 20% as the hydrological test data set T. In this embodiment, the 26000 hourly records from 01:00 on 24 December 2010 to 08:00 on 11 December 2013 are selected as the training set L, and the 5416 records from 08:00 on 11 December 2013 to 01:00 on 25 July 2014 are selected as the test set T;
Step S3, optimizing the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth of the XGBoost model with an improved GA-PSO combined optimization algorithm, and training the XGBoost model on the sample data set L to obtain the GA-PSO-optimized XGBoost hydrological time series prediction model;
Step S3.1, initializing the value ranges of the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth of the XGBoost model: the range of lr is set to (0.01, 0.4), the range of n_estimators to (10, 220), the range of gamma to (3, 10) and the range of max_depth to (0, 0.2). The number of iterations of the overall GA-PSO optimization algorithm is denoted T*. For GA-PSO, the initial population number is set to N = 50 and the number of iterations T* to 100; in the GA, the crossover probability cp is 0.85, the mutation probability mp is 0.05 and the number of iterations is T1 = 50; the number of iterations of the improved PSO optimization is T2. The parameters of the XGBoost model are optimized with the GA-PSO optimization algorithm; the specific flow is shown in FIG. 2, and the specific steps are as follows:
Step S3.2, randomly generating N subgroups, wherein the chromosome of each particle in a subgroup corresponds to one set of XGBoost parameters (lr, n_estimators, min_weights, max_depth);
Step S3.3, using R² as the individual fitness value, and initializing the individual fitness values of all particles in the N subgroups of step S3.2;
Step S3.4, performing one round of classical GA optimization on the 50 subgroups to finally obtain 50 optimal particles, wherein the GA optimization is as follows: each subgroup contains 50 individuals, the number of iterations of each subgroup is set to T1, and selection, crossover and mutation operations are performed on the 50 encoded individuals to update the population;
Step S3.5, calculating the fitness value of each particle after mutation, and updating the optimal individual of the current iteration according to the fitness value;
Step S3.6, returning to step S3.4 to continue the classical GA optimization until the upper limit T1 of the number of iterations is reached and the termination condition is satisfied; each subgroup then has T1 historical optimal particles, whose fitness values are compared, and the particle with the highest fitness value is taken as the optimal individual of the subgroup, so that 50 optimal individuals are finally obtained from the 50 subgroups;
Step S3.7, decoding the N optimal individuals obtained in step S3.6 to serve as the initial particle swarm of the PSO algorithm, and performing improved PSO optimization, wherein the number of iterations of the PSO algorithm is set to T2;
Step S3.8, initializing the initial velocities of the initial particles of the PSO algorithm, still using R² as the fitness value, and updating the velocity and position of each particle with the improved formulas, thereby updating the historical optimal position of each particle, denoted pbest, and the global optimal position of the swarm, denoted gbest;
the particle velocity and position update formulas in PSO are:
$$v_i^{t+1} = \omega v_i^{t} + c_1\,\mathrm{rand}_1\,(pbest_i^{t} - x_i^{t}) + c_2\,\mathrm{rand}_2\,(gbest^{t} - x_i^{t})$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where $v_i^{t}$ represents the velocity of particle $i$ at the current time $t$, $x_i^{t}$ represents the position of the particle at the current time $t$, $pbest_i^{t}$ represents the individual extreme point, $gbest^{t}$ represents the global extreme point, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, and $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random numbers in the interval [0, 1];
a non-linear decreasing weight method is adopted for the weight ω:
Figure BDA0002898654550000077
the learning factors are likewise nonlinear functions of the weight:
Figure BDA0002898654550000078
Step S3.9, judging whether the current number of PSO iterations is less than or equal to T2; if so, returning to step S3.8 to continue the current PSO optimization; otherwise, jumping to step S3.10;
Step S3.10, judging whether the current total number of iterations has reached T*; if not, K = N/2 = 25 particles are randomly selected from the historical optimal particles of the PSO to replace K individuals in each GA subgroup of step S3.2, and the method returns to step S3.2 to continue the optimization; if so, the optimal solution is output;
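The classical GA optimization of a single subgroup (steps S3.4 to S3.6) can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the text: chromosomes are real-valued rather than encoded, selection is by binary tournament, crossover is arithmetic, mutation resamples a gene uniformly, and one elite individual is carried over unchanged. The search is reduced to lr and n_estimators, 50 generations are used as an assumed value for T1, and `toy_fitness` is a hypothetical stand-in for the R² of a trained XGBoost model.

```python
import random

CP, MP = 0.85, 0.05                    # crossover and mutation probabilities
BOUNDS = [(0.01, 0.4), (10.0, 220.0)]  # lr and n_estimators ranges


def toy_fitness(ind):
    """Hypothetical stand-in for R^2; peaks at lr=0.1, n_estimators=120."""
    lr, n_estimators = ind
    return 1.0 - (lr - 0.1) ** 2 - ((n_estimators - 120.0) / 200.0) ** 2


def tournament(pop, fits, rng):
    """Binary tournament selection; returns a copy of the fitter individual."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return list(pop[a] if fits[a] >= fits[b] else pop[b])


def ga_generation(pop, fitness, rng):
    """One generation: selection, crossover with CP, mutation with MP."""
    fits = [fitness(ind) for ind in pop]
    elite = list(pop[max(range(len(pop)), key=lambda i: fits[i])])
    children = [elite]  # elitism (assumption): the best individual survives
    while len(children) < len(pop):
        p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
        if rng.random() < CP:
            a = rng.random()  # arithmetic crossover keeps genes within bounds
            p1, p2 = ([a * u + (1 - a) * v for u, v in zip(p1, p2)],
                      [a * v + (1 - a) * u for u, v in zip(p1, p2)])
        for child in (p1, p2):
            for d, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < MP:
                    child[d] = rng.uniform(lo, hi)  # resample the gene
            if len(children) < len(pop):
                children.append(child)
    return children


rng = random.Random(42)
subgroup = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(50)]
initial_best = max(toy_fitness(ind) for ind in subgroup)
for _ in range(50):  # assumed number of generations per subgroup
    subgroup = ga_generation(subgroup, toy_fitness, rng)
final_best = max(toy_fitness(ind) for ind in subgroup)
best_individual = max(subgroup, key=toy_fitness)
print(best_individual, final_best)
```

Running this once per subgroup would yield one optimal individual per subgroup, which is what the PSO stage takes as its initial swarm.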
Step S4, testing the GA-PSO-optimized XGBoost hydrological prediction model.
The loss function of the GA-PSO-optimized XGBoost hydrological time series prediction model is set as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l(y_i, \hat{y}_i)$ is the loss function, which measures the difference between the predicted value $\hat{y}_i$ and the actual value $y_i$; $K$ represents the number of decision trees contained in the model; and
$$\Omega(f) = \gamma M + \frac{1}{2}\lambda \sum_{m=1}^{M} w_m^{2}$$
is the regularization term, where $\gamma$ is the penalty constant on the gain function for splitting leaf nodes, $M$ is the number of leaf nodes, $w_m$ is the weight of the $m$-th leaf node, and $\lambda$ is the penalty coefficient of the L2 regularization term;
the predicted value of the $i$-th sample after the $j$-th training round is:
$$\hat{y}_i^{(j)} = \hat{y}_i^{(j-1)} + f_j(x_i)$$
the simplified objective function of the $j$-th training round is:
$$Obj^{(j)} \approx \sum_{i=1}^{n} \left[ g_i f_j(x_i) + \frac{1}{2} h_i f_j^{2}(x_i) \right] + \Omega(f_j)$$
where $g_i = \partial_{\hat{y}_i^{(j-1)}}\, l(y_i, \hat{y}_i^{(j-1)})$ is the first derivative of the loss function and $h_i = \partial^{2}_{\hat{y}_i^{(j-1)}}\, l(y_i, \hat{y}_i^{(j-1)})$ is the second derivative of the loss function.
In the embodiment, the optimal parameters of the XGBoost model, as optimized by the GA-PSO optimization algorithm for forecast periods of 1 to 6 hours, are shown in Table 1 below:
TABLE 1
Figure BDA0002898654550000084
The optimal model is used to predict the flow data of Longshan, and the results are compared with those of an SVM (support vector machine) model and an LSTM (long short-term memory) model; the final prediction results are shown in FIG. 4. Four evaluation indexes are used for the prediction results: MRE (mean relative error), MAE (mean absolute error), RMSE (root mean square error) and R² (coefficient of determination), calculated as follows:
Figure BDA0002898654550000085
Figure BDA0002898654550000086
Figure BDA0002898654550000087
Figure BDA0002898654550000088
where $y_i$ is the actual value, $\hat{y}_i$ is the forecast value, $\bar{y}$ is the average value, and $n$ is the number of samples.
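The four evaluation indexes can be computed as in the sketch below. It uses the commonly used definitions of these metrics, which are assumed here because the formula images are not reproduced in this text; in particular, MRE is taken as the mean of |y − ŷ| / |y| without a percentage factor. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Return MRE, MAE, RMSE and R^2 using their common definitions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    abs_err = [abs(a - p) for a, p in zip(y_true, y_pred)]
    sq_err = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    mre = sum(e / abs(a) for e, a in zip(abs_err, y_true)) / n
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(sq_err) / n)
    r2 = 1.0 - sum(sq_err) / sum((a - mean_y) ** 2 for a in y_true)
    return {"MRE": mre, "MAE": mae, "RMSE": rmse, "R2": r2}


y_true = [10.0, 20.0, 30.0, 40.0]   # illustrative actual flows
y_pred = [12.0, 18.0, 33.0, 37.0]   # illustrative forecasts
scores = evaluate(y_true, y_pred)
print(scores)
```

Lower MRE, MAE and RMSE and a higher R² indicate a better forecast. The same R² value is what the optimization stages use as the fitness of a parameter set.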
Table 2 compares the predicted values of XGBoost using the optimal parameters with those of the two other prediction models, SVM and LSTM, at a forecast period of 1 h.
TABLE 2
Figure BDA00028986545500000811
Figure BDA0002898654550000091
Table 3 shows the comparison of the evaluation indexes of the three models in all the forecast periods.
TABLE 3
Figure BDA0002898654550000092
FIG. 3 shows the fitness value curve of the GA-PSO optimization algorithm (GPSO for short) at a forecast period of 1 h, compared with the classical GA and classical PSO algorithms. Two detailed sequences from the test set, (471, 481) and (2068, 2107), are shown in FIG. 4 and FIG. 5, respectively.

Claims (5)

1. A hydrological time series prediction method based on GA-PSO-optimized XGBoost, characterized by comprising the following steps:
Step S1, collecting the rainfall values of all rainfall stations of a river basin within a certain time period and the water levels of the corresponding water level stations, and organizing them into a hydrological time series data set;
Step S2, preprocessing each hydrological sample in the hydrological time series data set of step S1, and dividing the sample data set into a hydrological training data set L and a hydrological test data set T;
Step S3, optimizing the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth of the XGBoost model with an improved GA-PSO combined optimization algorithm, and training the XGBoost model on the hydrological training data set L to obtain the GA-PSO-optimized XGBoost hydrological time series prediction model; the specific contents are as follows:
Step S3.1, initializing the value ranges of the learning rate lr, the number of base learners n_estimators, the minimum leaf weight min_weights and the maximum tree depth max_depth of the XGBoost model, and setting the number of iterations of the overall GA-PSO optimization algorithm to T*;
Step S3.2, randomly generating N subgroups, wherein the chromosome of each particle in a subgroup corresponds to one set of XGBoost parameters (lr, n_estimators, min_weights, max_depth);
Step S3.3, using R² as the individual fitness value, and initializing the individual fitness values of all particles in the N subgroups of step S3.2;
Step S3.4, performing classical GA optimization on the N subgroups to finally obtain N optimal particles, wherein the GA optimization is as follows: each subgroup contains m individuals, the number of iterations of each subgroup is set to T1, and selection, crossover and mutation operations are performed on the m encoded individuals to update the population;
Step S3.5, calculating the fitness value of each particle after mutation, and updating the optimal individual of the current iteration according to the fitness value;
Step S3.6, returning to step S3.4 to continue the GA optimization of the subgroups until the upper limit T1 of the number of iterations is reached and the termination condition is satisfied; each subgroup then has T1 historical optimal particles, whose fitness values are compared, and the particle with the highest fitness value is taken as the optimal individual of the subgroup, so that N optimal individuals are finally obtained from the N subgroups;
Step S3.7, decoding the N optimal individuals obtained in step S3.6 to serve as the initial particle swarm of the PSO algorithm, and performing improved PSO optimization, wherein the number of iterations of the PSO algorithm is set to T2;
Step S3.8, initializing the initial velocities of the initial particles of the PSO algorithm, still using R² as the fitness value, and updating the velocity and position of each particle with the improved formulas, thereby updating the historical optimal position of each particle, denoted pbest, and the global optimal position of the swarm, denoted gbest;
Step S3.9, judging whether the current number of PSO iterations is less than or equal to T2; if so, returning to step S3.8 to continue the current PSO optimization; otherwise, jumping to step S3.10;
Step S3.10, judging whether the current total number of iterations has reached T*; if not, K particles are randomly selected from the historical optimal particles of the PSO to replace K individuals in each GA subgroup of step S3.2, and the method returns to step S3.2 to continue the optimization; if so, the optimal solution is output;
Step S4, testing the optimal GA-PSO-optimized XGBoost hydrological prediction model obtained in step S3 on the test set T.
2. The GA-PSO optimized XGboost-based hydrological time series prediction method according to claim 1, characterized in that: the hydrologic time series data set in step S1 includes current and previous 7-hour rainfall values of the rainfall station corresponding to the water system watershed, and current and previous 7-hour flow values of the corresponding hydrologic station.
3. The GA-PSO optimized XGboost-based hydrological time series prediction method according to claim 1, characterized in that: the preprocessing of the hydrological sample data x(t) in step S2 includes missing value processing, error value correction and normalization;
the normalization formula is as follows:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x^{*}$ is the normalized value, $x$ is the initial value, $x_{\min}$ is the minimum value in the original sequence, and $x_{\max}$ is the maximum value in the original sequence;
and the first 80% of the preprocessed hydrological time series data set is taken as the hydrological training data set L, and the remaining 20% of the data as the hydrological test data set T.
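The normalization and the 80/20 split of claim 3 can be sketched in a few lines. The function names are hypothetical, the series is assumed to be a non-constant list of numbers, and the split is assumed to be chronological (no shuffling), which matches "the first 80%" in the claim.

```python
def min_max_normalize(series):
    """Scale a sequence to [0, 1] with x* = (x - x_min) / (x_max - x_min).
    Assumes the series is not constant (x_max > x_min)."""
    x_min, x_max = min(series), max(series)
    span = x_max - x_min
    return [(x - x_min) / span for x in series]

def chronological_split(data, train_fraction=0.8):
    """Take the first `train_fraction` of the data as the training set L
    and the remainder as the test set T, preserving time order."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]
```

Keeping the time order in the split avoids leaking future observations into the training set.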
4. The GA-PSO optimized XGboost-based hydrological time series prediction method according to claim 1, characterized in that: the particle velocity and position update formula in step S3.8 is:
$$v_{i}^{t+1} = \omega\, v_{i}^{t} + c_{1}\,\mathrm{rand}_{1}\,\left(p_{i}^{t} - x_{i}^{t}\right) + c_{2}\,\mathrm{rand}_{2}\,\left(p_{g}^{t} - x_{i}^{t}\right)$$
$$x_{i}^{t+1} = x_{i}^{t} + v_{i}^{t+1}$$
wherein $v_{i}^{t}$ represents the velocity of the particle at the current time t, $x_{i}^{t}$ represents the position of the particle at the current time t, $p_{i}^{t}$ represents the individual extreme point, $p_{g}^{t}$ represents the global extreme point, $\omega$ is the inertia weight, $c_{1}$ and $c_{2}$ are the learning factors, and $\mathrm{rand}_{1}$ and $\mathrm{rand}_{2}$ are random numbers within the interval [0, 1];
the weight ω follows a nonlinear decreasing schedule:
Figure FDA0002898654540000035
the learning factors are likewise expressed as a nonlinear function of the weight:
Figure FDA0002898654540000036
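A single velocity and position update of the kind described in claim 4 can be sketched as below. The claimed formulas for ω and for the learning factors are contained in images not reproduced in this text, so `inertia_weight` uses a quadratic decay purely as a stand-in assumption, and `c1` and `c2` are simply passed in by the caller. All names and default values are hypothetical.

```python
import random

def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Stand-in nonlinear decreasing schedule (quadratic decay).
    This is NOT the patent's formula, only an illustrative assumption."""
    return w_end + (w_start - w_end) * (1.0 - t / t_max) ** 2

def pso_step(positions, velocities, pbest, gbest, w, c1, c2, rng):
    """Update every particle in place:
    v <- w*v + c1*rand1*(pbest - x) + c2*rand2*(gbest - x);  x <- x + v."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = rng.random(), rng.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

In a full optimizer, each call to `pso_step` would be followed by re-evaluating the fitness of every particle and refreshing `pbest` and `gbest`.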
5. The GA-PSO optimized XGboost-based hydrological time series prediction method according to claim 1, characterized in that: the detailed process of step S4 is:
the loss function of the GA-PSO optimized XGboost hydrological time series prediction model is set as follows:
$$Obj = \sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right) + \sum_{k=1}^{K} \Omega\left(f_{k}\right)$$
wherein $l\left(y_{i}, \hat{y}_{i}\right)$ is the loss function, which measures the difference between the predicted value $\hat{y}_{i}$ and the actual value $y_{i}$ over the $n$ samples; K represents the number of decision trees contained in the model;
$$\Omega(f) = \gamma M + \frac{1}{2}\lambda \sum_{m=1}^{M} w_{m}^{2}$$
is the regular term, wherein γ is the penalty constant of the gain function for splitting leaf nodes, M is the number of leaf nodes, λ is the L2 regular term penalty coefficient, and $w_{m}$ is the weight of the m-th leaf node;
the predicted value of the i-th sample in the j-th training round is as follows:
$$\hat{y}_{i}^{(j)} = \hat{y}_{i}^{(j-1)} + f_{j}\left(x_{i}\right)$$
the simplified objective function of the j-th training model is:
$$Obj^{(j)} \approx \sum_{i=1}^{n}\left[g_{i}\, f_{j}\left(x_{i}\right) + \frac{1}{2} h_{i}\, f_{j}^{2}\left(x_{i}\right)\right] + \Omega\left(f_{j}\right)$$
in the formula,
$$g_{i} = \frac{\partial\, l\left(y_{i}, \hat{y}_{i}^{(j-1)}\right)}{\partial\, \hat{y}_{i}^{(j-1)}}$$
is the first derivative of the loss function, and
$$h_{i} = \frac{\partial^{2}\, l\left(y_{i}, \hat{y}_{i}^{(j-1)}\right)}{\partial \left(\hat{y}_{i}^{(j-1)}\right)^{2}}$$
is the second derivative of the loss function;
and testing the test set by using the optimal parameters of the XGboost model found by the GA-PSO optimization algorithm.
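To make the roles of the first and second derivatives concrete, the sketch below evaluates them for a squared-error loss and computes the leaf weight that minimizes the second-order objective for one leaf. The squared-error loss is an assumption (the claim does not fix the form of l), and the closed-form expressions come from the standard gradient-boosted-tree derivation rather than from the claim text. Function names are hypothetical.

```python
def grad_hess_squared_error(y_true, y_prev):
    """For l = 0.5 * (y - y_hat)^2: g_i = y_hat_i - y_i and h_i = 1.
    y_prev holds the predictions from the previous training round."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h

def leaf_weight(g, h, lam):
    """Weight minimizing the second-order objective on one leaf:
    w* = -G / (H + lambda), with G = sum(g) and H = sum(h)."""
    return -sum(g) / (sum(h) + lam)

def leaf_objective(g, h, lam, gamma):
    """Objective contribution of one leaf at its optimal weight:
    -0.5 * G^2 / (H + lambda) + gamma."""
    G, H = sum(g), sum(h)
    return -0.5 * G * G / (H + lam) + gamma
```

With `lam = 0` the optimal leaf weight reduces to the mean residual of the samples in the leaf, and a larger λ shrinks it toward zero.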
CN202110049321.0A 2021-01-14 2021-01-14 GA-PSO (genetic algorithm-particle swarm optimization) based hydrological time sequence prediction method for optimizing XGboost Active CN112733996B (en)

Publications (2)

Publication Number Publication Date
CN112733996A CN112733996A (en) 2021-04-30
CN112733996B true CN112733996B (en) 2022-07-12

Family

ID=75593039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110049321.0A Active CN112733996B (en) 2021-01-14 2021-01-14 GA-PSO (genetic algorithm-particle swarm optimization) based hydrological time sequence prediction method for optimizing XGboost

Country Status (1)

Country Link
CN (1) CN112733996B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326660B (en) * 2021-06-17 2022-11-29 广西路桥工程集团有限公司 Tunnel surrounding rock extrusion deformation prediction method based on GA-XGboost model
CN113503750B (en) * 2021-06-25 2022-07-29 太原理工大学 Method for determining optimal back pressure of direct air cooling unit
CN113553760A (en) * 2021-06-25 2021-10-26 太原理工大学 Soft measurement method for final-stage exhaust enthalpy of steam turbine
CN114282431B (en) * 2021-12-09 2023-08-18 淮阴工学院 Runoff interval prediction method and system based on improved SCA and QRGRU
CN115225560B (en) * 2022-07-15 2023-08-22 国网河南省电力公司信息通信公司 Route planning method in power communication service
CN115169243A (en) * 2022-07-28 2022-10-11 中铁三局集团有限公司 GA-PSO-GLSSVM algorithm-based soil-rock composite stratum deep foundation pit deformation time sequence prediction method
CN117272051B (en) * 2023-11-21 2024-03-08 浪潮通用软件有限公司 Time sequence prediction method, device and medium based on LSTM optimization model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015719A (en) * 2020-08-27 2020-12-01 河海大学 Regularization and adaptive genetic algorithm-based hydrological prediction model construction method

Also Published As

Publication number Publication date
CN112733996A (en) 2021-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant