CN116187501A - Low-temperature prediction based on Catboost model - Google Patents

Low-temperature prediction based on Catboost model

Info

Publication number
CN116187501A
Authority
CN
China
Prior art keywords
model
variable
correlation
data
catboost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211509183.0A
Other languages
Chinese (zh)
Inventor
冯钢
孙国东
杜翔
冯向南
霍博渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Narentai Energy Co ltd
Original Assignee
Narentai Energy Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Narentai Energy Co ltd filed Critical Narentai Energy Co ltd
Priority to CN202211509183.0A
Publication of CN116187501A
Pending legal status (current)

Classifications

    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G01W1/10: Devices for predicting weather conditions
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06Q50/06: Energy or water supply
    • G06Q50/26: Government or public services
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Environmental & Geological Engineering (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Environmental Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)

Abstract

The invention provides a low-temperature prediction method based on a Catboost model, which comprises the following steps: (1) acquiring meteorological data; (2) establishing lag features of the meteorological data through autocorrelation coefficients; (3) preprocessing the data, specifically missing-value filling and data normalization; (4) selecting features through LassoCV; (5) establishing a LassoCV-Catboost model to perform low-temperature prediction; (6) optimizing the parameters of the Catboost model through a genetic algorithm (GA) to obtain the final low-temperature prediction model; (7) evaluating the model.

Description

Low-temperature prediction based on Catboost model
Technical Field
The invention belongs to the technical field of early warning and monitoring of icing disasters on power transmission lines, and more specifically relates to a low-temperature prediction method based on a Catboost model within the technical field of low-temperature prediction. The invention can be used for predicting low temperatures.
Background
Power-grid operating experience shows that conductor breakage and tower-collapse accidents caused by icing disasters on transmission lines do great damage to the lines and adversely affect the safe and stable operation of the power system. Transmission-line icing accidents mostly occur in microclimate areas and are a comprehensive physical phenomenon influenced by factors such as temperature, humidity, convection of cold and warm air, atmospheric circulation and wind. Low temperature is one of the important causes of transmission-line icing, so accurate low-temperature prediction can provide good data support for short-term icing prediction of transmission lines. Low-temperature data have time-series characteristics, and most traditional prediction methods are univariate time-series models; in fact, the change of air temperature results from the combined action of multiple meteorological factors, and the factors highly correlated with air temperature include wind direction, wind speed and relative humidity. Traditional time-series air-temperature prediction models mainly include multiple linear regression, the autoregressive integrated moving average method (ARIMA) and the grey prediction method; their predictions struggle to follow the dynamic changes of air temperature and basically tend toward the average. Tao et al. proposed a long short-term memory network air-temperature prediction model based on random forests; Niu Zhijuan et al. used principal component analysis with a back-propagation neural network (BP) and a radial basis function neural network (RBF) to build air-temperature prediction models, which consider the influence of multi-element meteorological data on air temperature but do not take the time-series characteristics of the multi-element meteorological data themselves into account. Jiang Genwei et al. proposed an application of a PSO-RBF-ANN model to air-temperature prediction; although the structural parameters of the RBF model are optimized by particle swarm optimization, it still suffers from the univariate time-series prediction problem, so its prediction accuracy is not high.
Disclosure of Invention
The invention aims to provide a low-temperature prediction method based on a Catboost model, addressing the problems that traditional prediction methods have difficulty learning from massive data and do not fully consider the influence of multi-element meteorological data, and of their time correlation, on air-temperature change.
The method first establishes lag features using the autocorrelation coefficient (Autocorrelation); then, exploiting the ability of LassoCV to measure the importance of individual feature variables, it screens out the features highly correlated with low temperature from the established features as input variables of the Catboost model and models the low-temperature time-series data; finally, the parameters of the Catboost model are optimized through a genetic algorithm GA to obtain the final low-temperature prediction model.
The implementation of the invention specifically includes the following steps:
(1) Acquiring meteorological data;
(2) Establishing lag features of the meteorological data through autocorrelation coefficients;
(3) Preprocessing the data, specifically missing-value filling and data normalization;
(4) Selecting features through LassoCV;
(5) Establishing a LassoCV-Catboost model for low-temperature prediction;
(6) Optimizing the parameters of the Catboost model through a genetic algorithm GA to obtain the final low-temperature prediction model;
(7) Evaluating the model.
further, the hysteresis feature establishing method in the step (2) is as follows:
measuring the current moment y by means of an autocorrelation coefficient t With a lag of y from k t-k Correlation between them. The correlation measure is the degree of correlation of two random variables, and the correlation coefficient can measure the linear correlation between the two variables.
Figure BDA0003969950260000021
r k Denoted by y t And his degree of correlation of the k-order hysteresis. r is (r) k Referred to as autocorrelation coefficients (Autocorrelation Coefficient, ACF), the autocorrelation study is a relationship of values of a time series at different time points.
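A minimal sketch of this step, assuming hourly meteorological data in a pandas DataFrame with a "temperature" column; the column name, the maximum lag and the |r_k| > 0.2 cutoff are illustrative assumptions, not values taken from the patent:

```python
import pandas as pd

def build_lag_features(df: pd.DataFrame, col: str = "temperature",
                       max_lag: int = 24, min_acf: float = 0.2) -> pd.DataFrame:
    """Create lag features for `col`, keeping only the lags whose
    autocorrelation coefficient r_k exceeds `min_acf` (illustrative threshold)."""
    out = df.copy()
    for k in range(1, max_lag + 1):
        r_k = df[col].autocorr(lag=k)          # pandas computes the lag-k ACF
        if abs(r_k) > min_acf:
            out[f"{col}_lag{k}"] = df[col].shift(k)
    # the first rows have no lagged history and are dropped
    return out.dropna()
```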
Further, the data preprocessing in step (3) proceeds as follows:
The data are processed and modelled in a Python environment. Missing values and null values are filled with the fillna function, using the mean value for filling. Next, the date and time information (year, month, day) is combined into a single datetime so that it can be used as a Pandas index.
The input data are then normalized. The normalization method chosen here is linear Min-Max scaling, with the following formula:

x_i^{*} = \frac{x_i - \min}{\max - \min}   (2)

where x_i (i = 1, 2, 3, …, n) is a meteorological input feature, x_i^{*} is the normalized value, max is the maximum value of the corresponding meteorological element, and min is its minimum value.
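A sketch of this preprocessing step, under the assumption that the raw table has separate year/month/day/hour columns and numeric meteorological columns (the column names are illustrative, not taken from the patent):

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # fill missing/null values with the column mean, as described above
    df = df.fillna(df.mean(numeric_only=True))

    # merge the date/time columns into one datetime and use it as the Pandas index
    df["datetime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
    df = df.set_index("datetime").drop(columns=["year", "month", "day", "hour"])

    # Min-Max scaling: x* = (x - min) / (max - min), applied per column
    return (df - df.min()) / (df.max() - df.min())
```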
Further, the LassoCV feature selection in step (4) proceeds as follows:
Set the linear regression model as

Y = X^{T}\beta + \varepsilon   (3)

where X = [x_1, x_2, …, x_i, …, x_n]^T with x_i = [x_{i,1}, x_{i,2}, …, x_{i,m}] ∈ R^{1×m} being the low-temperature feature data after the autocorrelation preprocessing, Y = [y_1, y_2, …, y_n]^T ∈ R^{n×1} is the response variable, β = [β_1, β_2, …, β_m]^T ∈ R^{m×1} is the vector of model coefficients, and ε = [ε_1, ε_2, …, ε_n]^T ∈ R^{n×1} is the error vector. The ordinary least squares estimate of the linear regression model is

\hat{\beta} = \arg\min_{\beta}\|Y - X^{T}\beta\|_{2}^{2}, \qquad \hat{\beta} = (XX^{T})^{-1}XY   (4)

Adding the constraint term, i.e. LASSO, gives

\hat{\beta} = \arg\min_{\beta}\left\{\|Y - X^{T}\beta\|_{2}^{2} + \lambda\|\beta\|_{1}\right\}   (5)

The parameter λ is the penalty coefficient of the parameter estimate; its value is determined by ten-fold cross-validation, and the parameter α is determined in the same way.
The LASSO regression problem is solved by the least angle regression (LARS) method, a variable-screening algorithm built on the forward selection algorithm and the forward gradient algorithm that yields more accurate feature vectors. It is described as follows:
1) The forward selection algorithm proceeds as follows: in X = [x_1, x_2, …, x_i, …, x_n]^T, select the independent variable x_k = [x_{k,1}, x_{k,2}, …, x_{k,m}] that is most closely correlated with the target variable y_k, giving

\hat{y}_k = x_k\beta_k   (6)

where the coefficient β_k is determined by

\beta_k = \frac{\langle x_k, y_k\rangle}{\langle x_k, x_k\rangle}   (7)

The variable residual is

y_{\mathrm{res},k} = y_k - \hat{y}_k

The variable residual is taken as the new target variable, and the variable set X without x_k is taken as the new set of independent variables; this process is repeated until the residual is smaller than the set range or the number of remaining independent variables is zero, at which point the algorithm terminates.
2) The forward gradient algorithm selects, at each step, the feature variable x_k with the largest correlation and uses it to approximate the target variable y_k. Unlike the forward selection algorithm, its residual is defined as

y_{\mathrm{res},k} = y_k - x_k\beta_k   (8)

Taking this residual as the new objective function and the original variable set X = [x_1, x_2, …, x_i, …, x_n]^T as the variable set, the calculation is repeated according to equation (8) until the residual y_{\mathrm{res},k} is smaller than the set threshold range, yielding the optimal solution.
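A sketch of the LassoCV selection step with scikit-learn, assuming X_scaled is the normalized feature matrix (including the lag features) and y is the low-temperature target; keeping the features with non-zero coefficients is the usual way to read a LASSO fit and is an assumption here, not a prescription from the patent:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def select_features(X_scaled: np.ndarray, y: np.ndarray, feature_names: list[str]):
    # LassoCV picks the penalty lambda (called alpha in scikit-learn) by cross-validation;
    # cv=10 mirrors the ten-fold cross-validation described above
    lasso = LassoCV(cv=10, random_state=0).fit(X_scaled, y)
    # features with non-zero coefficients are kept as Catboost inputs
    keep = [name for name, coef in zip(feature_names, lasso.coef_) if abs(coef) > 1e-8]
    return keep, lasso.alpha_
```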
Further, the specific steps of the genetic algorithm GA parameter optimization in step (6) are as follows:
Low-temperature prediction is defined as using the historical sequence of meteorological elements {…, x_{t-1}, x_t} to predict the low-temperature sequence at future times {x_{t+1}, x_{t+2}, …}. The preprocessed and feature-selected data set is fed into the Catboost model for training, and a genetic algorithm GA combined with ten-fold cross-validation is then used to optimize the main hyperparameters of the Catboost model, including iterations, learning_rate, max_depth and criterion, so as to improve the accuracy of the model's low-temperature prediction.
Further, the specific steps of the model evaluation in step (7) are as follows:
Based on the actual low-temperature values and the predicted low-temperature values, the accuracy of the model is compared with that of the traditional prediction model ARIMA and a long short-term memory network (LSTM). The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as the evaluation indices of the model. Their calculation formulas are as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i' - y_i)^{2}}   (9)

\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i' - y_i\right|   (10)

\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i' - y_i}{y_i}\right| \times 100\%   (11)

where y_i' is the predicted low-temperature value, y_i is the actual air-temperature value, and N is the number of data points. The test results of the LassoCV-Catboost model are compared with the results predicted by the ARIMA and LSTM models; the final fitting-result comparison chart is shown in FIG. 2.
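A short sketch of these three metrics with NumPy (the function and variable names are illustrative):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute RMSE, MAE and MAPE between actual and predicted low temperatures."""
    err = y_pred - y_true
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined when an actual value is exactly 0
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
    }
```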
Compared with the prior art, the invention has the following advantages:
the first, catboost gradient lifting tree integrated model is suitable for modeling tasks of multi-element time series data, and compared with a traditional time series prediction method, the unique gradient single-side sampling (GOSS), feature binding technology (EFB) and histogram algorithm (Hist) of the integrated model better solve the problems of high dimensionality, nonlinearity and local minimum, and have stronger data learning capacity and generalization capacity.
Second, lassoCV has the ability to analyze feature importance and a regularization term was added to prevent data overfitting.
Thirdly, the invention establishes the hysteresis characteristic through the autocorrelation coefficient, then uses the LassoCV to perform characteristic selection on the multi-element weather time sequence data, and finally performs parameter tuning through the genetic algorithm GA, thereby providing more effective and accurate data for the construction of the model and reducing the complexity of the model.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a graph of the final fit result of the present invention;
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they are carried out according to techniques or conditions described in the literature of this field or according to the product specifications. Materials or equipment whose manufacturer is not indicated are conventional products available from commercial sources.
The implementation flow chart of the low-temperature prediction method based on the Catboost model is shown in FIG. 1; the method comprises the following steps:
(1) Acquiring meteorological data;
(2) Establishing lag features of the meteorological data through autocorrelation coefficients;
(3) Preprocessing the data, specifically missing-value filling and data normalization;
(4) Selecting features through LassoCV;
(5) Establishing a LassoCV-Catboost model for low-temperature prediction;
(6) Optimizing the parameters of the Catboost model through a genetic algorithm GA to obtain the final low-temperature prediction model;
(7) Evaluating the model.
Preferably, the lag features in step (2) are established as follows:
The autocorrelation coefficient is used to measure the correlation between the current value y_t and the value y_{t-k} lagged by k. Correlation describes the degree of association between two random variables, and the correlation coefficient measures the linear correlation between the two variables.

r_k = \frac{\sum_{t=k+1}^{n}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{n}(y_t-\bar{y})^{2}}   (12)

where r_k denotes the degree of correlation between y_t and its k-order lag. r_k is called the autocorrelation coefficient (Autocorrelation Coefficient, ACF); autocorrelation studies the relationship between the values of a time series at different points in time.
Preferably, the specific process of the step (3) is as follows:
The data are processed and modelled in a Python environment. Missing values and null values are filled with the fillna function, using the mean value for filling. Next, the date and time information (year, month, day) is combined into a single datetime so that it can be used as a Pandas index.
The input data are then normalized. The normalization method chosen here is linear Min-Max scaling, with the following formula:

x_i^{*} = \frac{x_i - \min}{\max - \min}   (13)

where x_i (i = 1, 2, 3, …, n) is a meteorological input feature, x_i^{*} is the normalized value, max is the maximum value of the corresponding meteorological element, and min is its minimum value.
Preferably, the specific process of the step (4) is as follows:
Set the linear regression model as

Y = X^{T}\beta + \varepsilon

where X = [x_1, x_2, …, x_i, …, x_n]^T with x_i = [x_{i,1}, x_{i,2}, …, x_{i,m}] ∈ R^{1×m} being the low-temperature feature data after the autocorrelation preprocessing, Y = [y_1, y_2, …, y_n]^T ∈ R^{n×1} is the response variable, β = [β_1, β_2, …, β_m]^T ∈ R^{m×1} is the vector of model coefficients, and ε = [ε_1, ε_2, …, ε_n]^T ∈ R^{n×1} is the error vector. The ordinary least squares estimate of the linear regression model is

\hat{\beta} = \arg\min_{\beta}\|Y - X^{T}\beta\|_{2}^{2}, \qquad \hat{\beta} = (XX^{T})^{-1}XY   (14)

Adding the constraint term, i.e. LASSO, gives

\hat{\beta} = \arg\min_{\beta}\left\{\|Y - X^{T}\beta\|_{2}^{2} + \lambda\|\beta\|_{1}\right\}   (15)

The parameter λ is the penalty coefficient of the parameter estimate; its value is determined by ten-fold cross-validation, and the parameter α is determined in the same way.
The LASSO regression problem is solved by the least angle regression (LARS) method, a variable-screening algorithm built on the forward selection algorithm and the forward gradient algorithm that yields more accurate feature vectors. It is described as follows:
1) The forward selection algorithm proceeds as follows: in X = [x_1, x_2, …, x_i, …, x_n]^T, select the independent variable x_k = [x_{k,1}, x_{k,2}, …, x_{k,m}] that is most closely correlated with the target variable y_k, giving

\hat{y}_k = x_k\beta_k   (16)

where the coefficient β_k is determined by

\beta_k = \frac{\langle x_k, y_k\rangle}{\langle x_k, x_k\rangle}   (17)

The variable residual is

y_{\mathrm{res},k} = y_k - \hat{y}_k   (18)

The variable residual is taken as the new target variable, and the variable set X without x_k is taken as the new set of independent variables; this process is repeated until the residual is smaller than the set range or the number of remaining independent variables is zero, at which point the algorithm terminates.
2) The forward gradient algorithm selects, at each step, the feature variable x_k with the largest correlation and uses it to approximate the target variable y_k. Unlike the forward selection algorithm, its residual is defined as

y_{\mathrm{res},k} = y_k - x_k\beta_k   (19)

Taking this residual as the new objective function and the original variable set X = [x_1, x_2, …, x_i, …, x_n]^T as the variable set, the calculation is repeated according to equation (19) until the residual y_{\mathrm{res},k} is smaller than the set threshold range, yielding the optimal solution.
The specific steps of the LASSO (least angle regression) algorithm are as follows:
Step 1: according to equations (16) and (17), find the variable x_k with the highest correlation with the objective function, remove it from the variable set, and determine the new target variable according to equation (19);
Step 2: repeat Step 1 until a new variable x_l is obtained whose correlation with the target variable y_{res,k} is the same as the correlation of the variable x_k with y_{res,k};
Step 3: on the angular bisector of x_k and x_l, approximate again by means of equation (19) to obtain a variable x_t such that the correlation of x_t with y_{res,k} equals that of x_k and x_l with y_{res,k}; add the variable x_t to the feature set, and take the common angular bisector of the feature set as the new approach direction;
Step 4: repeat the above process until y_{res,k} is small enough or the variable set is empty; the final feature set contains the required feature variables.
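The stepwise procedure above corresponds to least angle regression; a brief sketch of the LASSO solution path with scikit-learn's lars_path (which implements this angular-bisector update) is given below, under the assumption that X_scaled and y are the prepared inputs:

```python
import numpy as np
from sklearn.linear_model import lars_path

def lasso_path_features(X_scaled: np.ndarray, y: np.ndarray, feature_names: list[str]):
    # method="lasso" makes LARS trace the LASSO regularization path
    alphas, active, coefs = lars_path(X_scaled, y, method="lasso")
    # `active` lists feature indices in the order they enter the model
    return [feature_names[i] for i in active], alphas
```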
Preferably, the specific process of the step (6) is as follows:
Low-temperature prediction is defined as using the historical sequence of meteorological elements {…, x_{t-1}, x_t} to predict the low-temperature sequence at future times {x_{t+1}, x_{t+2}, …}. The preprocessed and feature-selected data set is fed into the Catboost model for training, and a genetic algorithm GA combined with ten-fold cross-validation is then used to optimize the main hyperparameters of the Catboost model, including iterations, learning_rate, max_depth and criterion, so as to improve the accuracy of the model's low-temperature prediction.
Preferably, the specific process of the step (7) is as follows:
Based on the actual low-temperature values and the predicted low-temperature values, the accuracy of the model is compared with that of the traditional prediction model ARIMA and a long short-term memory network (LSTM). The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as the evaluation indices of the model. Their calculation formulas are as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i' - y_i)^{2}}   (20)

\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i' - y_i\right|   (21)

\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i' - y_i}{y_i}\right| \times 100\%   (22)

where y_i' is the predicted low-temperature value, y_i is the actual air-temperature value, and N is the number of data points. The test results of the LassoCV-Catboost model are compared with the results predicted by the ARIMA and LSTM models; the final fitting-result comparison chart is shown in FIG. 2.
Application example of the invention:
(1) Data acquisition: the meteorological data are downloaded from the website of the Resource and Environment Science Data Center of the Chinese Academy of Sciences (http://www.resdc.cn/).
(2) The invention predicts historical low-temperature data of a certain site in Yunnan Province, and the model is evaluated and compared.
(3) The comparison chart is shown in FIG. 2 of the accompanying drawings. The RMSE of the invention is 1.432, the MAE is 1.223 and the MAPE is 11.38%.

Claims (7)

1. A low-temperature prediction method based on a Catboost model, characterized in that the influence of multi-element meteorological data, and of their time correlation, on temperature change is fully considered, the method comprising the following steps:
(1) Acquiring meteorological data;
(2) Establishing lag features of the meteorological data through autocorrelation coefficients;
(3) Preprocessing the data, specifically missing-value filling and data normalization;
(4) Selecting features through LassoCV;
(5) Establishing a LassoCV-Catboost model for low-temperature prediction;
(6) Optimizing the parameters of the Catboost model through a genetic algorithm GA to obtain the final low-temperature prediction model;
(7) Evaluating the model.
2. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that the lag features in step (2) are established as follows:
the autocorrelation coefficient is used to measure the correlation between the current value y_t and the value y_{t-k} lagged by k; correlation describes the degree of association between two random variables, and the correlation coefficient measures the linear correlation between the two variables;

r_k = \frac{\sum_{t=k+1}^{n}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{n}(y_t-\bar{y})^{2}}   (1)

where r_k denotes the degree of correlation between y_t and its k-order lag; r_k is called the autocorrelation coefficient (Autocorrelation Coefficient, ACF), and autocorrelation studies the relationship between the values of a time series at different points in time.
3. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that the data preprocessing in step (3) proceeds as follows:
the data are processed and modelled in a Python environment; missing values and null values are filled with the fillna function, using the mean value for filling, and the date and time information (year, month, day) is combined into a single datetime so that it can be used as a Pandas index;
the input data are normalized; the normalization method chosen is linear Min-Max scaling, with the following formula:

x_i^{*} = \frac{x_i - \min}{\max - \min}   (2)

where x_i (i = 1, 2, 3, …, n) is a meteorological input feature, x_i^{*} is the normalized value, max is the maximum value of the corresponding meteorological element, and min is its minimum value.
4. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that the LassoCV feature selection in step (4) proceeds as follows:
setting the linear regression model as:

Y = X^{T}\beta + \varepsilon   (3)

where X = [x_1, x_2, …, x_i, …, x_n]^T with x_i = [x_{i,1}, x_{i,2}, …, x_{i,m}] ∈ R^{1×m} being the low-temperature feature data after the autocorrelation preprocessing, Y = [y_1, y_2, …, y_n]^T ∈ R^{n×1} is the response variable, β = [β_1, β_2, …, β_m]^T ∈ R^{m×1} is the vector of model coefficients, and ε = [ε_1, ε_2, …, ε_n]^T ∈ R^{n×1} is the error vector; the ordinary least squares estimate of the linear regression model is

\hat{\beta} = \arg\min_{\beta}\|Y - X^{T}\beta\|_{2}^{2}, \qquad \hat{\beta} = (XX^{T})^{-1}XY

when the constraint term, i.e. LASSO, is added, it is expressed as:

\hat{\beta} = \arg\min_{\beta}\left\{\|Y - X^{T}\beta\|_{2}^{2} + \lambda\|\beta\|_{1}\right\}   (4)

the parameter λ is the penalty coefficient of the parameter estimate; its value is determined by ten-fold cross-validation, and the parameter α is determined in the same way;
the LASSO regression problem is solved by the least angle regression (LARS) method, a variable-screening algorithm built on the forward selection algorithm and the forward gradient algorithm that yields more accurate feature vectors, described as follows:
1) the forward selection algorithm proceeds as follows: in X = [x_1, x_2, …, x_i, …, x_n]^T, select the independent variable x_k = [x_{k,1}, x_{k,2}, …, x_{k,m}] that is most closely correlated with the target variable y_k, giving

\hat{y}_k = x_k\beta_k   (5)

where the coefficient β_k is determined by

\beta_k = \frac{\langle x_k, y_k\rangle}{\langle x_k, x_k\rangle}   (6)

the variable residual is

y_{\mathrm{res},k} = y_k - \hat{y}_k   (7)

the variable residual is taken as the new target variable, and the variable set X without x_k is taken as the new set of independent variables; this process is repeated until the residual is smaller than the set range or the number of remaining independent variables is zero, at which point the algorithm terminates;
2) the forward gradient algorithm selects, at each step, the feature variable x_k with the largest correlation and uses it to approximate the target variable y_k; unlike the forward selection algorithm, its residual is defined as:

y_{\mathrm{res},k} = y_k - x_k\beta_k   (8)

taking this residual as the new objective function and the original variable set X = [x_1, x_2, …, x_i, …, x_n]^T as the variable set, the calculation is repeated according to equation (8) until the residual y_{\mathrm{res},k} is smaller than the set threshold range, yielding the optimal solution;
the specific steps of the LASSO (least angle regression) algorithm are as follows:
Step 1: according to equations (5) and (6), find the variable x_k with the highest correlation with the objective function, remove it from the variable set, and determine the new target variable according to equation (8);
Step 2: repeat Step 1 until a new variable x_l is obtained whose correlation with the target variable y_{res,k} is the same as the correlation of the variable x_k with y_{res,k};
Step 3: on the angular bisector of x_k and x_l, approximate again by means of equation (8) to obtain a variable x_t such that the correlation of x_t with y_{res,k} equals that of x_k and x_l with y_{res,k}; add the variable x_t to the feature set, and take the common angular bisector of the feature set as the new approach direction;
Step 4: repeat the above process until y_{res,k} is small enough or the variable set is empty; the final feature set contains the required feature variables.
5. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that in step (5), the features selected by LassoCV are fed into the Catboost model for training.
6. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that the specific steps of the genetic algorithm GA parameter optimization in step (6) are as follows:
low-temperature prediction is defined as using the historical sequence of meteorological elements {…, x_{t-1}, x_t} to predict the low-temperature sequence at future times {x_{t+1}, x_{t+2}, …}; the preprocessed and feature-selected data set is fed into the Catboost model for training, and a genetic algorithm GA combined with ten-fold cross-validation is then used to optimize the main hyperparameters of the Catboost model, including iterations, learning_rate, max_depth and criterion, so as to improve the accuracy of the model's low-temperature prediction.
7. The low-temperature prediction method based on the Catboost model according to claim 1, characterized in that the specific steps of the model evaluation in step (7) are as follows:
based on the actual low-temperature values and the predicted low-temperature values, the accuracy of the model of the invention is compared with that of the traditional prediction model ARIMA and a long short-term memory network (LSTM); the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as the evaluation indices of the model, with the following calculation formulas:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i' - y_i)^{2}}   (9)

\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i' - y_i\right|   (10)

\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i' - y_i}{y_i}\right| \times 100\%   (11)

where y_i' is the predicted low-temperature value, y_i is the actual air-temperature value, and N is the number of data points; the test results of the LassoCV-Catboost model are compared with the results predicted by the ARIMA and LSTM models, and the final fitting-result comparison chart is shown in FIG. 2.
CN202211509183.0A 2022-11-29 2022-11-29 Low-temperature prediction based on Catboost model Pending CN116187501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211509183.0A CN116187501A (en) 2022-11-29 2022-11-29 Low-temperature prediction based on Catboost model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211509183.0A CN116187501A (en) 2022-11-29 2022-11-29 Low-temperature prediction based on Catboost model

Publications (1)

Publication Number Publication Date
CN116187501A true CN116187501A (en) 2023-05-30

Family

ID=86441046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211509183.0A Pending CN116187501A (en) 2022-11-29 2022-11-29 Low-temperature prediction based on Catboost model

Country Status (1)

Country Link
CN (1) CN116187501A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046756A (en) * 2019-04-08 2019-07-23 东南大学 Short-time weather forecasting method based on Wavelet Denoising Method and Catboost
CN111311025A (en) * 2020-03-17 2020-06-19 南京工程学院 Load prediction method based on meteorological similar days
KR102149053B1 (en) * 2020-05-14 2020-08-31 주식회사 애자일소다 Modeling system and method for predicting component
CN113641959A (en) * 2021-08-13 2021-11-12 山东电工电气集团有限公司 High-voltage cable joint temperature trend prediction method
CN113705877A (en) * 2021-08-23 2021-11-26 武汉大学 Real-time monthly runoff forecasting method based on deep learning model
CN113821895A (en) * 2021-09-01 2021-12-21 南方电网科学研究院有限责任公司 Construction method and device of power transmission line icing thickness prediction model and storage medium
CN115018193A (en) * 2022-07-01 2022-09-06 北京华能新锐控制技术有限公司 Time series wind energy data prediction method based on LSTM-GA model


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
周元哲: "Python Data Analysis and Machine Learning", 30 June 2022, China Machine Press, pages 116-118 *
成立明 et al.: "Python Web Crawling, Data Analysis and Visualization: Tools Explained with Practical Cases", 31 January 2021, Harbin Institute of Technology Press, pages 142-237 *
艾小伟: "Python Programming: From Basic Development to Data Analysis", 31 July 2021, China Machine Press, pages 248-249 *
董力铭 et al.: "Coupled modelling of the categorical gradient boosting algorithm (CatBoost) and the bat algorithm (Bat) for predicting water-surface evaporation in northwestern China", Water Saving Irrigation, no. 2, 10 February 2021 (2021-02-10), pages 63-69 *
钟晓妮 et al.: "Chinese Dictionary of Biomedical Statistics: Descriptive Statistics Volume", 31 December 2020, China Statistics Press, page 79 *

Similar Documents

Publication Publication Date Title
Bouzgou et al. Minimum redundancy–maximum relevance with extreme learning machines for global solar radiation forecasting: Toward an optimized dimensionality reduction for solar time series
CN111310968A (en) LSTM neural network circulation hydrological forecasting method based on mutual information
CN111105104A (en) Short-term power load prediction method based on similar day and RBF neural network
CN111028100A (en) Refined short-term load prediction method, device and medium considering meteorological factors
CN115271186B (en) Reservoir water level prediction and early warning method based on delay factor and PSO RNN Attention model
CN109143408B (en) Dynamic region combined short-time rainfall forecasting method based on MLP
CN113536665B (en) Road surface temperature short-term prediction method and system based on characteristic engineering and LSTM
CN113139605A (en) Power load prediction method based on principal component analysis and LSTM neural network
CN111506868B (en) Ultra-short-term wind speed prediction method based on HHT weight optimization
CN118095570A (en) Intelligent load prediction method and system for transformer area, electronic equipment, medium and chip
CN115329930A (en) Flood process probability forecasting method based on mixed deep learning model
CN116205508A (en) Distributed photovoltaic power generation abnormality diagnosis method and system
CN115310648A (en) Medium-and-long-term wind power combination prediction method based on multi-meteorological variable model identification
CN114429238A (en) Wind turbine generator fault early warning method based on space-time feature extraction
CN116960962A (en) Mid-long term area load prediction method for cross-area data fusion
CN112861418A (en) Short-term icing thickness prediction method for stay cable based on GA-WOA-GRNN network
CN117200223A (en) Day-ahead power load prediction method and device
CN117290673A (en) Ship energy consumption high-precision prediction system based on multi-model fusion
JP7342369B2 (en) Prediction system, prediction method
CN115034426B (en) Rolling load prediction method based on phase space reconstruction and multi-model fusion Stacking integrated learning mode
CN115936236A (en) Method, system, equipment and medium for predicting energy consumption of cigarette factory
CN116187501A (en) Low-temperature prediction based on Catboost model
CN115907228A (en) Short-term power load prediction analysis method based on PSO-LSSVM
CN115759343A (en) E-LSTM-based user electric quantity prediction method and device
CN112581311B (en) Method and system for predicting long-term output fluctuation characteristics of aggregated multiple wind power plants

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination