CN111832828B - Intelligent precipitation prediction method based on the Fengyun-4 (FY-4) meteorological satellite - Google Patents

Info

Publication number
CN111832828B
Authority
CN
China
Prior art keywords
data
satellite
gbdt
model
precipitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010693436.9A
Other languages
Chinese (zh)
Other versions
CN111832828A (en
Inventor
刘年庆
宋丽莉
熊怡
蒋建莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Satellite Meteorological Center
Original Assignee
National Satellite Meteorological Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Satellite Meteorological Center
Priority to CN202010693436.9A
Publication of CN111832828A
Application granted
Publication of CN111832828B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Biomedical Technology (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an intelligent precipitation prediction method based on the Fengyun-4 (FY-4) meteorological satellite. The observation data of the FY-4 satellite are integrated for temporal and spatial continuity, and a GBDT mathematical model for precipitation prediction is designed. Before the integrated satellite observation data are used to train the GBDT model, a genetic algorithm optimizes the initial values of the model parameters to obtain optimal initial values. The integrated data then train the precipitation prediction model, and the trained model performs the precipitation prediction. Integrating the training data makes maximum use of the FY-4 data and ensures that the sample data are sufficient and realistic. Optimizing the initial parameter values with the genetic algorithm improves the data-processing capability of the prediction model, the accuracy of the precipitation prediction, and the stability of the prediction computation.

Description

Intelligent precipitation prediction method based on the Fengyun-4 (FY-4) meteorological satellite
Technical Field
The invention belongs to the technical field of weather prediction, and particularly relates to an intelligent precipitation prediction method based on the Fengyun-4 (FY-4) meteorological satellite.
Background
The observation schedule of the Fengyun-4 (FY-4) imager has to accommodate the production of cloud-derived wind products, momentum-wheel (flywheel) unloading, and maximizing the scan frequency of the target area. As a result, observations come in two durations, 5 minutes and 15 minutes, and two spatial ranges, the full disk and the China region. The observation mode switches many times a day, and there are gaps while the momentum wheel is being unloaded. To make maximum use of the FY-4 data, the observation data must therefore be integrated for temporal and spatial continuity.
In existing FY-4-based precipitation prediction methods, the observation data are not integrated for temporal and spatial continuity, so the mathematical model trained on them has poor robustness, and its predictions have poor accuracy and low reliability. In addition, existing precipitation prediction models generally use a single algorithm and do not fuse and optimize several effective algorithms. Their data-processing capability is therefore insufficient; for precipitation prediction in particular they adapt poorly, and their results are inaccurate and unstable.
Disclosure of Invention
The invention aims to solve the above technical problems in the prior art and provides an intelligent precipitation prediction method based on the Fengyun-4 (FY-4) meteorological satellite, which uses FY-4 observation data to predict precipitation over the China region.
The specific technical scheme of the invention is as follows:
Step one, time integration of satellite observation data:
The routine China-region observations each last 5 minutes. For each channel, the mean of 3 consecutive 5-minute observations is taken as one 15-minute value, and each 15-minute interval is treated as one time step. The value of channel i in the newly generated 15-minute China-region data is:
$$\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{i,j}$$
where $x_{i,j}$ is the value of channel i in the j-th 5-minute observation, i is the Fengyun-4 (FY-4) channel number, ranging from 1 to 14, j is the index of the 5-minute observation, and n = 3 is the number of 5-minute observations averaged.
The 17:15 data are missing because observation stops for momentum-wheel (flywheel) unloading at local midnight at the sub-satellite point. They are replaced by the mean of the data at the two adjacent time steps, 17:00 and 17:30. Because the 17:00 data are a full-disk observation and the 17:30 data are a China-region observation, their coverage differs, so the 17:00 data must first be reduced to the China-region extent before the mean is computed;
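The time integration above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `integrate_15min` and `fill_gap`, and the layout of a frame as a flat list of per-channel values, are assumptions made for the example.

```python
from statistics import fmean

def integrate_15min(frames_5min):
    """Average n consecutive 5-minute frames into one 15-minute frame.

    frames_5min: list of frames; each frame is a list of per-channel values
    (hypothetical layout: frame[channel_index]).
    """
    n_channels = len(frames_5min[0])
    return [fmean(frame[i] for frame in frames_5min) for i in range(n_channels)]

def fill_gap(before, after):
    """Replace a missing time step (e.g. 17:15) by the mean of its neighbours.

    Both inputs must already cover the same region and channels.
    """
    return [(b + a) / 2 for b, a in zip(before, after)]

# Three 5-minute frames with two channels each (made-up values).
frames = [[280.0, 250.0], [282.0, 251.0], [284.0, 252.0]]
print(integrate_15min(frames))               # [282.0, 251.0]
print(fill_gap([1.0, 2.0], [3.0, 6.0]))      # [2.0, 4.0]
```

In practice each frame would be a 2-D image per channel rather than a single value, but the averaging is applied element by element in the same way.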
Step two, spatial integration of satellite scan data:
The time-integrated routine full-disk observations are cropped to the same area as the routine China-region observations, so that all data share the same extent and temporal frequency. This completes the temporal and spatial integration of the satellite data: the integrated data have 4 time steps per hour and cover the routine China-region observation range;
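A cropping step of this kind might look like the sketch below. It assumes, purely for illustration, that the full-disk data have already been resampled onto a regular latitude/longitude grid; the function name `crop_to_region` and the grid layout are hypothetical, and the bounds are the China-region range quoted elsewhere in this description (3° to 55° N, 60° to 137° E).

```python
CHINA_LAT = (3.0, 55.0)    # degrees north
CHINA_LON = (60.0, 137.0)  # degrees east

def crop_to_region(grid, lats, lons, lat_range=CHINA_LAT, lon_range=CHINA_LON):
    """Keep only the grid cells whose coordinates fall inside the region.

    grid: 2-D list indexed as grid[row][col]
    lats: latitude of each row; lons: longitude of each column
    (assumed regular lat/lon grid, which is a simplification).
    """
    rows = [r for r, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [c for c, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    return [[grid[r][c] for c in cols] for r in rows]

# Tiny 3 x 3 "full disk" in which only the centre cell lies in the region.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(crop_to_region(grid, [0.0, 10.0, 60.0], [50.0, 100.0, 140.0]))  # [[5]]
```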
Step three, selection of Fengyun-4 (FY-4) channels for precipitation prediction:
The FY-4 satellite has 14 channels. Satellite precipitation prediction conventionally selects only infrared and water-vapor channels on physical grounds, because with more channels the relationships among the channels and their effect on precipitation prediction become too hard to analyze. However, every channel carries information that contributes to precipitation; selecting too few channels loses information and prevents the precipitation accuracy from improving. To exploit the channel information fully, channels 7 to 14 of the channel list are used;
Step four, genetic algorithm settings and parameter configuration:
Precipitation is predicted with a mathematical model that combines a genetic algorithm with the GBDT (Gradient Boosting Decision Tree) algorithm. The genetic algorithm optimizes the initial values of the GBDT model parameters to obtain the optimal initial values;
The genetic algorithm is designed as follows:
1. Population initialization and chromosome coding
Individuals are real-coded: each individual is a string of real numbers that contains all parameters of the GBDT model. The initial population size is 20 to 100;
2. Objective function and fitness function
The GBDT model is configured with the parameters encoded by an individual and trained on the training data, and the system output is then predicted. The sum of the absolute errors between the expected output and the predicted output is taken as the individual's fitness value F:
$$F = k \sum_{i=1}^{n} \left| y_i - o_i \right|$$
where n is the number of output nodes, $y_i$ is the expected output of the leaf node of the i-th regression tree of the GBDT model, $o_i$ is the actual output of the leaf node of the i-th regression tree, and k is a coefficient;
3. Selection operation
Selection uses the proportional selection method, i.e. a selection strategy based on fitness proportion. The selection probability $p_i$ of each individual i is: [formula missing]
where $F_i$ is the fitness value of individual i and N is the number of individuals in the population;
4. Crossover operation
Because individuals are real-coded, real-valued crossover is used. The crossover of the k-th chromosome $a_k$ and the l-th chromosome $a_l$ at gene position j is:
$$a_{kj} = a_{kj}(1-b) + a_{lj}\,b$$
$$a_{lj} = a_{lj}(1-b) + a_{kj}\,b$$
where the coefficient b is a random number in [0, 1]. The crossover probability is set to 0.4 to 0.99;
5. Mutation operation
The j-th gene $a_{ij}$ of the i-th individual is selected for mutation, which is carried out as follows: [formula missing]
where $a_{\max}$ is the upper bound of gene $a_{ij}$, $a_{\min}$ is its lower bound, g is the current generation number, $G_{\max}$ is the maximum number of generations, and r is a random number in [0, 1]. The genetic algorithm runs for 100 to 500 generations, and the mutation probability is 0.0001 to 0.1;
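The five operations above can be put together in a short Python sketch. Several parts are assumptions rather than the patent's method: the selection and mutation formulas are not preserved in this text, so the sketch uses roulette-wheel selection weighted by the reciprocal of the error and a common non-uniform mutation that shrinks as the generation count grows. The error function is a toy stand-in for "train GBDT with these parameters and measure the absolute error", and the names `evolve`, `toy_error`, and `BOUNDS` are hypothetical.

```python
import random

def evolve(error_fn, bounds, pop_size=20, generations=100,
           p_cross=0.6, p_mut=0.05, seed=0):
    """Minimise error_fn over real-coded individuals inside `bounds`."""
    rng = random.Random(seed)
    # 1. Real-coded initial population, one gene per model parameter.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = list(min(pop, key=error_fn))
    for g in range(generations):
        # 2. Fitness: the error of each individual (smaller is better).
        errors = [error_fn(ind) for ind in pop]
        # 3. Selection: roulette wheel on reciprocal error (assumption).
        weights = [1.0 / (e + 1e-12) for e in errors]
        pop = [list(ind) for ind in rng.choices(pop, weights=weights, k=pop_size)]
        # 4. Crossover: arithmetic crossover at one gene, as in the text.
        for k in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                j = rng.randrange(len(bounds))
                b = rng.random()
                a_kj, a_lj = pop[k][j], pop[k + 1][j]
                pop[k][j] = a_kj * (1 - b) + a_lj * b
                pop[k + 1][j] = a_lj * (1 - b) + a_kj * b
        # 5. Mutation: non-uniform step that decays with g (assumption).
        for ind in pop:
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < p_mut:
                    f_g = rng.random() * (1 - g / generations) ** 2
                    if rng.random() > 0.5:
                        ind[j] += (hi - ind[j]) * f_g   # move towards upper bound
                    else:
                        ind[j] -= (ind[j] - lo) * f_g   # move towards lower bound
        candidate = min(pop, key=error_fn)
        if error_fn(candidate) < error_fn(best):
            best = list(candidate)
    return best

# Toy objective: pretend the best learning rate is 0.1 and the best depth is 5.
BOUNDS = [(0.01, 0.3), (2.0, 8.0)]

def toy_error(params):
    return abs(params[0] - 0.1) + abs(params[1] - 5.0)

print(evolve(toy_error, BOUNDS))
```

The default population size, generation count, crossover probability, and mutation probability are chosen from inside the ranges given in the text. In a real run, `error_fn` would train and evaluate a GBDT model, which makes each fitness evaluation expensive.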
Step five, GBDT algorithm settings and computation process:
In step four, the genetic algorithm optimizes the initial values of the GBDT model parameters to obtain the optimal initial values. In step five, the integrated historical scan data for each satellite observation point, which can include the point's longitude and latitude, the day of the year, and the point's altitude, are fed into the GBDT model to obtain the predicted precipitation for that point. The predicted precipitation is compared with the actual precipitation in the historical data to obtain the residual, which is used to train the GBDT model. Training yields the optimal parameter selection of the GBDT model. Real-time scan data for the observation points, again with inputs such as longitude and latitude, day of the year, and altitude, are then fed into the trained GBDT model to obtain the precipitation prediction for each point;
(1) The GBDT algorithm is set up as follows:
(1) Selection of the loss function
Whether a set of GBDT parameters is suitable for predicting precipitation is judged by a loss function. The loss function is:
$L(y, F) = |y - F|$, where y is the actual value and F(x) is the predicted value; [formula missing] Huber loss [formula missing]. Obtaining the most suitable parameters for each regression tree is equivalent to minimizing the loss function L;
(2) Objective function
The objective is to obtain the most suitable parameters of each regression tree, namely the allocation proportion $\rho_t$ of the t-th regression tree and the parameters $\theta_t$ of the t-th regression tree: [formula missing]
where ρ is the allocation proportion of each regression tree, i.e. the learning rate or step size of the tree, and θ denotes the parameters of the regression tree;
where the parameter iteration is: [formula missing]
and the step size is selected as: [formula missing]
Here $E_{x,y}$ is the error rate, x is the input data, $f(x_i)$ is the predicted value of the i-th iteration, h denotes the second derivative, $h(x, \theta)$ is a regression tree, and $h(x_i; \theta_t)$ is the i-th iteration of the t-th regression tree;
(3) Calculation of the residual:
When the GBDT algorithm trains a regression tree, the residual is used as the target value for training the next regression tree. Training stops when the number of regression trees reaches the required number and the residual $r_{it}$ falls within the expected range. $r_{it}$ is the direction in which the weak learner of the i-th iteration is built: [formula missing]
(2) The iteration flow of the GBDT algorithm is as follows:
GBDT trains each decision tree on the errors left by the preceding tree. Each round aims to reduce the previous residual: to eliminate the residual, GBDT iterates several times, and each iteration builds a new model along the gradient direction in which the residual decreases. The specific iteration flow is:
(1) Initialize the first model [formula missing] and the initial values $\theta_0$ of the GBDT model parameters; the genetic algorithm optimizes these initial values to obtain the optimal initial parameter values;
(2) At each iteration, compute the residual $r_{it}$ [formula missing], i.e. the direction in which the weak learner of the i-th iteration is built, and use it to train the next regression tree;
(3) Build the regression training set with the residual as the target, $\{(x_i, r_{it})\}_{i=1,\ldots,n}$, where $x_i$ is the input data, which can include the longitude and latitude of the satellite observation point, the day of the year, and the altitude of the point;
(4) Find the currently most suitable parameter selection: [formula missing]
(5) After M iterations, the final prediction result is obtained: [formula missing], where $y_i$ is the predicted precipitation for the satellite observation point.
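The iteration flow above can be illustrated with a small gradient-boosting sketch in pure Python. It is not the patent's model: it assumes the absolute loss $L(y, F) = |y - F|$ named in the text, uses one-split regression stumps instead of full regression trees to stay short, initializes with the median, and takes the learning rate and number of trees as plain arguments (in the method described they would come from the genetic algorithm). The names `fit_stump`, `fit_gbdt`, and `predict` are hypothetical.

```python
from statistics import median

def fit_stump(X, target):
    """Find the single split (feature, threshold) with least squared error."""
    best = None
    for f in range(len(X[0])):
        # Exclude the largest value so the right-hand side is never empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, target) if row[f] <= thr]
            right = [t for row, t in zip(X, target) if row[f] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    return best[1], best[2]

def fit_gbdt(X, y, n_trees=50, lr=0.1):
    """Boost stumps on the negative gradient of the absolute loss."""
    f0 = median(y)                      # initial constant model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of |y - F| with respect to F is sign(y - F).
        grad = [(yi > pi) - (yi < pi) for yi, pi in zip(y, pred)]
        f, thr = fit_stump(X, grad)
        # Leaf value: median of the actual residuals that fall in the leaf.
        res_l = [yi - pi for row, yi, pi in zip(X, y, pred) if row[f] <= thr]
        res_r = [yi - pi for row, yi, pi in zip(X, y, pred) if row[f] > thr]
        leaf_l, leaf_r = lr * median(res_l), lr * median(res_r)
        trees.append((f, thr, leaf_l, leaf_r))
        pred = [pi + (leaf_l if row[f] <= thr else leaf_r)
                for row, pi in zip(X, pred)]
    return f0, trees

def predict(model, row):
    """Sum the initial value and the contribution of every stump."""
    f0, trees = model
    return f0 + sum(l if row[f] <= thr else r for f, thr, l, r in trees)

# Toy data: no rain for small x, 10 mm for large x (made-up values).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
model = fit_gbdt(X, y)
print(round(predict(model, [0.0]), 2), round(predict(model, [5.0]), 2))
```

On this toy set each round shrinks the residual by the factor 1 - lr, so the two predictions approach 0 and 10 as more trees are added.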
Training sample settings: Traditional precipitation models apply one template to the whole country, so they cannot handle precipitation differences caused by different altitudes, by different longitude and latitude positions, or by different seasons. In the invention, after the observed precipitation product and the satellite data are matched to a resolution of 5 km, each sample uses the data of the 3 × 3 block of 9 points centered on the observation point, which makes the prediction more stable, and adds the longitude and latitude, the day of the year, and the altitude of the point as inputs. These input features take time, position, and altitude into account together, which is a substantial improvement over the traditional single nationwide precipitation model. Each input training sample contains 292 features.
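The text gives the total of 292 features without a breakdown. One decomposition that is consistent with the rest of the description, offered here as an assumption, is 4 time steps per hour × 8 channels × 9 neighbourhood points = 288 satellite values, plus latitude, longitude, day of the year, and altitude. The sketch below builds such a sample; `build_sample` and the `patch[t][c][p]` layout are hypothetical.

```python
def build_sample(patch, lat, lon, day_of_year, altitude):
    """Flatten one observation point's inputs into a single feature list.

    patch[t][c][p]: value at time step t (4 per hour), channel c (8 selected
    channels) and neighbourhood point p (3 x 3 block, 9 points).
    """
    features = [v for step in patch for channel in step for v in channel]
    features += [lat, lon, day_of_year, altitude]
    return features

# Dummy patch filled with zeros, and made-up location values.
patch = [[[0.0] * 9 for _ in range(8)] for _ in range(4)]
sample = build_sample(patch, 39.9, 116.4, 200, 50.0)
print(len(sample))  # 292
```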
Training process: To shorten both training time and prediction time, a separate model is trained for each sub-region. The China-region observations cover 3° to 55° north latitude and 60° to 137° east longitude. One model is trained on the data within each 4° × 4° tile, so training and prediction can run in parallel, which improves efficiency and greatly shortens prediction time. The latitude span is 55° − 3° = 52°, which requires 13 tiles. The longitude span is 137° − 60° = 77°, which requires 20 tiles, the 20th of which covers only 1° of longitude by 4° of latitude. In total, 20 × 13 = 260 regional models have to be trained. Each model is trained on data from 2019. At a satellite resolution of 0.04°, each standard tile contains 100 × 100 points; multiplied by 365 days, this gives 3.65 million training samples.
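The tile arithmetic in this paragraph can be checked with a few lines of Python. The helper `tile_edges` is a hypothetical name; the bounds, tile size, and resolution are the figures quoted in the text.

```python
import math

def tile_edges(start, stop, step=4.0):
    """Split [start, stop] into tiles of width `step`; the last may be narrower."""
    n = math.ceil((stop - start) / step)
    return [(start + i * step, min(start + (i + 1) * step, stop)) for i in range(n)]

lat_tiles = tile_edges(3.0, 55.0)     # latitude tiles
lon_tiles = tile_edges(60.0, 137.0)   # longitude tiles

print(len(lat_tiles), len(lon_tiles), len(lat_tiles) * len(lon_tiles))  # 13 20 260
print(lon_tiles[-1])                  # (136.0, 137.0), the 1-degree-wide tile

points_per_tile = round(4.0 / 0.04) ** 2   # 100 x 100 points in a full tile
print(points_per_tile * 365)               # 3650000 samples per tile for one year
```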
Prediction settings: For satellite data outside the training set, once the satellite has acquired the 4 time steps within one hour, the data are split into 4° × 4° tiles. Each tile is passed to its corresponding model for prediction, and the prediction results are finally stitched together to form the precipitation distribution over the China region.
The precipitation prediction method based on the Fengyun-4 (FY-4) satellite has the following technical effects:
1. The FY-4 observation data are integrated for temporal and spatial continuity, which makes maximum use of the data and ensures that the sample data are sufficient and realistic. Training the precipitation prediction model on the integrated data improves the accuracy and reliability of the prediction results.
2. A mathematical model for precipitation prediction is designed around the characteristics of the FY-4 observation data, with the genetic algorithm and the GBDT model deeply fused and optimized. Before the integrated satellite observation data train the GBDT model, the genetic algorithm optimizes the initial values of the model parameters to obtain the optimal initial values, and the iterative computation of the GBDT model then proceeds from these initial values.
Drawings
Fig. 1: GBDT algorithm flow chart.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The observation schedule of the Fengyun-4 (FY-4) imager has to accommodate the production of cloud-derived wind products, momentum-wheel (flywheel) unloading, and maximizing the scan frequency of the target area. Observations therefore come in two durations, 5 minutes and 15 minutes, and two spatial ranges, the full disk and the China region, and the observation mode switches many times a day. To make maximum use of the FY-4 data, the observation results must be integrated for temporal and spatial continuity.
1. Time integration of satellite scan data
The Fengyun-4 (FY-4) imager observes as follows. Each day it acquires 40 full-disk images and 165 China-region images; the China-region images cover 3° to 55° north latitude and 60° to 137° east longitude. A full-disk observation is made once per hour, starting on the hour and lasting 15 minutes.
Every 3 hours, 3 consecutive full-disk observations are made: the on-the-hour full-disk image at 00:00 (UTC, here and below), 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, and 21:00, plus one full-disk image immediately before and one immediately after it. For example, the 3 consecutive full-disk observations around 00:00 span 23:45:00 to 23:59:59, 00:00:00 to 00:14:59, and 00:15:00 to 00:29:59.
When no full-disk observation is under way, China-region observations are made, each lasting 5 minutes. Positioning and calibration observations are made every 15 minutes in the gaps between observations. Observation stops for momentum-wheel (flywheel) unloading at local midnight at the sub-satellite point, from 17:15 to 17:30.
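The daily image counts quoted above can be cross-checked against the schedule. The sketch below enumerates the full-disk start times and then assumes, as a simplification not stated in the text, that every remaining minute outside the 15-minute unloading gap is spent on 5-minute China-region scans; the time taken by positioning and calibration observations is ignored.

```python
MINUTES_PER_DAY = 24 * 60

# Start minute of every full-disk observation in one UTC day.
full_disk_starts = set()
for hour in range(24):
    full_disk_starts.add(hour * 60)                                # on the hour
    if hour % 3 == 0:
        full_disk_starts.add((hour * 60 - 15) % MINUTES_PER_DAY)   # one before
        full_disk_starts.add(hour * 60 + 15)                       # one after

n_full_disk = len(full_disk_starts)
unloading_minutes = 15                                             # 17:15 to 17:30
remaining = MINUTES_PER_DAY - n_full_disk * 15 - unloading_minutes
n_china = remaining // 5

print(n_full_disk, n_china)  # 40 165
```

Under these assumptions the counts match the 40 full-disk and 165 China-region images per day given in the text.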
Because the satellite observation times are irregular, which is inconvenient as input to the later algorithm, the satellite data are integrated in time: for each channel, the mean of 3 consecutive 5-minute routine China-region observations is taken as one 15-minute value. The value of channel i in the newly generated 15-minute China-region data is:
$$\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{i,j}$$
where $x_{i,j}$ is the value of channel i in the j-th 5-minute observation, i is the Fengyun-4 (FY-4) channel number, ranging from 1 to 14, j is the index of the 5-minute observation, and n = 3 is the number of 5-minute observations averaged.
The 17:15 data are missing because observation stops for momentum-wheel (flywheel) unloading at local midnight at the sub-satellite point; they are replaced by the mean of the data at the two adjacent time steps. Because the 17:00 data are a full-disk observation and the 17:30 data are a China-region observation, their coverage differs, so the 17:00 data are first reduced to the China-region extent before the mean is computed.
2. Spatial integration of satellite scan data
The time-integrated routine full-disk observations are cropped to the same area as the routine China-region observations, so that all data share the same extent and temporal frequency. This completes the temporal and spatial integration of the satellite data: the integrated data have 4 time steps per hour and cover the routine China-region observation range.
3. Selection of Fengyun-4 (FY-4) channels for precipitation prediction
The FY-4 satellite has 14 channels in total, as shown in Table 1. Satellite precipitation prediction conventionally selects only infrared and water-vapor channels on physical grounds, because with more channels the relationships among the channels and their effect on precipitation prediction become too hard to analyze. However, every channel carries information that contributes to precipitation, and selecting too few channels loses information and prevents the precipitation accuracy from improving. The invention predicts precipitation with an artificial-intelligence algorithm, which can mine the channel information more thoroughly, so 8 of the listed channels are used when working with FY-4 data, namely channels 7 to 14.
Table 1 Fengyun-4 (FY-4) satellite channel list
Channel number   Center wavelength (µm)   Channel type
1                0.47                     Visible light
2                0.65                     Visible light
3                0.83                     Near infrared
4                1.37                     Short wave infrared
5                1.61                     Short wave infrared
6                2.22                     Short wave infrared
7                3.72H                    Medium wave infrared
8                3.72L                    Medium wave infrared
9                6.25                     Water vapor
10               7.1                      Water vapor
11               8.5                      Long wave infrared
12               10.8                     Long wave infrared
13               12                       Long wave infrared
14               13.5                     Long wave infrared
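As a small illustration of the channel selection, the table can be held as a lookup and filtered to channels 7 to 14. The dictionary `CHANNELS` simply transcribes Table 1; the helper `pick_channels` and the idea of an observation stored as a 14-element list are assumptions made for the example.

```python
# Channel number -> (center wavelength label in micrometres, channel type).
CHANNELS = {
    1: ("0.47", "Visible light"),
    2: ("0.65", "Visible light"),
    3: ("0.83", "Near infrared"),
    4: ("1.37", "Short wave infrared"),
    5: ("1.61", "Short wave infrared"),
    6: ("2.22", "Short wave infrared"),
    7: ("3.72H", "Medium wave infrared"),
    8: ("3.72L", "Medium wave infrared"),
    9: ("6.25", "Water vapor"),
    10: ("7.1", "Water vapor"),
    11: ("8.5", "Long wave infrared"),
    12: ("10.8", "Long wave infrared"),
    13: ("12", "Long wave infrared"),
    14: ("13.5", "Long wave infrared"),
}

SELECTED = [ch for ch in sorted(CHANNELS) if 7 <= ch <= 14]

def pick_channels(observation, selected=SELECTED):
    """Keep only the selected channels from a 14-element observation.

    observation[k] holds the value of channel k + 1 (hypothetical layout).
    """
    return [observation[ch - 1] for ch in selected]

print(SELECTED)                              # [7, 8, 9, 10, 11, 12, 13, 14]
print(pick_channels(list(range(1, 15))))     # [7, 8, 9, 10, 11, 12, 13, 14]
```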
4. Algorithm selection and parameter configuration
GBDT (Gradient Boosting Decision Tree) is an iterative decision-tree algorithm. It consists of multiple decision trees built with a forward stagewise algorithm and an additive model. It is an ensemble method for classification and regression that works by reducing the residuals produced during learning, and it makes its final decision by summing the conclusions of all the trees. There are two kinds of decision trees: classification trees, which are generally used for classification problems, and regression trees, which are generally used for prediction problems. GBDT generalizes well. Its core is that each tree fits the residual left by the preceding tree, and the results of all trees are combined through a series of formulas into the final prediction output. Classification trees do not lend themselves to this process, so the trees used in the GBDT algorithm of the invention are regression trees. The flow chart is shown in Fig. 1.
The genetic algorithm optimizes the initial values of the GBDT model parameters to obtain the optimal initial values. The integrated historical scan data for each satellite observation point, with inputs such as the point's longitude and latitude, the day of the year, and the point's altitude, are fed into the GBDT model to obtain the predicted precipitation for that point. The predicted precipitation is compared with the actual precipitation in the historical data to obtain the residual, which is used to train the GBDT model. Training yields the optimal parameter selection of the GBDT model. Real-time scan data for the observation points, with the same kinds of inputs, are then fed into the trained GBDT model to obtain the precipitation prediction for each point.
The genetic algorithm settings and parameter configuration are as follows:
Precipitation is predicted with a mathematical model that combines a genetic algorithm with the GBDT algorithm. The genetic algorithm optimizes the initial values of the GBDT model parameters to obtain the optimal initial values;
The genetic algorithm is designed as follows:
1. Population initialization and chromosome coding
Individuals are real-coded: each individual is a string of real numbers that contains all parameters of the GBDT model. The initial population size is 20 to 100;
2. Objective function and fitness function
The GBDT model is configured with the parameters encoded by an individual and trained on the training data, and the system output is then predicted. The sum of the absolute errors between the expected output and the predicted output is taken as the individual's fitness value F:
$$F = k \sum_{i=1}^{n} \left| y_i - o_i \right|$$
where n is the number of output nodes, $y_i$ is the expected output of the leaf node of the i-th regression tree of the GBDT model, $o_i$ is the actual output of the leaf node of the i-th regression tree, and k is a coefficient;
3. Selection operation
Selection uses the proportional selection method, i.e. a selection strategy based on fitness proportion. The selection probability $p_i$ of each individual i is: [formula missing]
where $F_i$ is the fitness value of individual i and N is the number of individuals in the population;
4. Crossover operation
Because individuals are real-coded, real-valued crossover is used. The crossover of the k-th chromosome $a_k$ and the l-th chromosome $a_l$ at gene position j is:
$$a_{kj} = a_{kj}(1-b) + a_{lj}\,b$$
$$a_{lj} = a_{lj}(1-b) + a_{kj}\,b$$
where the coefficient b is a random number in [0, 1]. The crossover probability is set to 0.4 to 0.99;
5. Mutation operation
The j-th gene $a_{ij}$ of the i-th individual is selected for mutation, which is carried out as follows: [formula missing]
where $a_{\max}$ is the upper bound of gene $a_{ij}$, $a_{\min}$ is its lower bound, g is the current generation number, $G_{\max}$ is the maximum number of generations, and r is a random number in [0, 1]. The genetic algorithm runs for 100 to 500 generations, and the mutation probability is 0.0001 to 0.1.
The GBDT algorithm comprises the following steps of:
in the fourth step, optimizing the initial value of the parameter of the GBDT algorithm model by utilizing a genetic algorithm to obtain the optimal initial value of the parameter of the GBDT algorithm model; in the fifth step, the integrated historical data of the satellite observation point scanning, which can be longitude and latitude information, a annual serial number of the same day and the altitude of the same point, are substituted into the GBDT algorithm model to obtain the predicted precipitation corresponding to the satellite observation point; comparing the predicted precipitation with the actual precipitation in the historical data of satellite observation point scanning to obtain residual errors, and further finishing training the GBDT algorithm model; obtaining optimal parameter selection of the GBDT algorithm model through training, and substituting real-time data of an observation point of satellite real-time scanning such as longitude and latitude information, a year serial number of the same day and the altitude of the same point into the trained GBDT algorithm model to obtain a corresponding precipitation prediction value of the satellite observation point;
(1) The GBDT algorithm is set to:
(1) selection of a loss function
For the GBDT algorithm to predict precipitation, or to find the parameters of the appropriate GBDT algorithm, the measurement is whether the parameters of the GBDT algorithm are verified, the determination is made by a loss function, the loss function:
L(y,F)=|y-F|,where y is the actual value, F (x) is the predicted value, +.>Loss of Hu Ba->Obtaining the most appropriate parameters for each regression tree is equivalent to minimizing the loss function L;
(2) objective function
The objective function yields the most suitable parameters of each regression tree, namely the allocation proportion $\rho_t$ of the $t$-th regression tree and the parameters $\theta_t$ of the $t$-th regression tree, where $\rho$ is the allocation proportion of each regression tree, i.e. the learning rate or step size of the tree, and $\theta$ is the parameter of the regression tree;
where the parameter iteration is:
and the step size is selected as:
where $E_{x,y}$ denotes the error rate, $x$ denotes the input data, $f(x_i)$ denotes the predicted value of the $i$-th iteration, and $h$ denotes the second derivative; $h(x, \theta)$ denotes a regression tree, and $h(x_i; \theta_t)$ denotes the $i$-th iteration of the $t$-th regression tree;
(3) calculation of residual:
When the GBDT algorithm trains the regression trees, the residual is used as the target value to train the next regression tree, until the number of regression trees built reaches the required number and the residual $r_{it}$ falls within the expected range, at which point training stops; $r_{it}$ is the direction in which the weak learner of the $i$-th iteration is built;
(2) The iteration flow of the GBDT algorithm is as follows:
Each decision tree in GBDT is trained on the errors in the results of the previous decision tree. Each calculation aims to reduce the previous residual; to eliminate the residual, GBDT iterates several times, and each iteration builds a new model in the gradient direction of residual reduction. The specific iteration flow is as follows:
(1) initialize the first equation and the initial values $\theta_0$ of the GBDT model parameters; the initial values are optimized with the genetic algorithm to obtain the optimal initial parameter values of the GBDT model;
(2) in each iteration, compute the residual $r_{it}$, that is, the direction in which the weak learner of the $i$-th iteration is built, in order to train the next regression tree;
(3) construct the regression problem $\{(x_i, r_{it})\}_{i=1,\dots,n}$ with the residual as the target, where $x_i$ is the input data, which may be the latitude and longitude of the satellite observation point, the day of the year, and the altitude of the point;
(4) find the most suitable parameter choice at present:
(5) after $M$ iterations, the final prediction result is obtained, where $y_i$ is the predicted precipitation for the satellite observation point.
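The iteration flow in steps (1)-(5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses the absolute loss $L(y, F) = |y - F|$, whose negative gradient is $\mathrm{sign}(y - F)$, fits one-split regression stumps on a single feature, uses a fixed learning rate in place of a per-tree proportion $\rho_t$, and omits the genetic-algorithm initialization. The names `fit_stump`, `predict_stump`, and `boost`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares one-split regression tree on a 1-D feature.

    Returns (threshold, left_value, right_value).
    """
    best_sse, best = np.inf, None
    for thr in np.unique(x)[:-1]:          # keep the right side non-empty
        left = x <= thr
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (thr, lv, rv)
    return best

def predict_stump(stump, x):
    thr, lv, rv = stump
    return np.where(x <= thr, lv, rv)

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for L(y, F) = |y - F| with stumps as weak learners."""
    f0 = np.median(y)                      # (1) initial model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        r = np.sign(y - pred)              # (2) residual = negative gradient
        stump = fit_stump(x, r)            # (3)-(4) fit the next tree to (x_i, r_it)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return f0, stumps, pred                # (5) prediction after M iterations

# Synthetic data, made up for illustration only.
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x
f0, stumps, pred = boost(x, y)
mae_initial = np.abs(y - f0).mean()
mae_final = np.abs(y - pred).mean()
```

Each tree is fitted to the current residuals rather than to the raw target, so the mean absolute error of the summed model falls below that of the initial constant prediction.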
The optimal parameter settings of the GBDT model, obtained by training on the training set and 5-fold cross-validation on the test set, are shown in Table 2. With these parameters, the mean precipitation error for 2019 was 0.0476 mm, far better than the 0.327 mm error obtained with the default parameters.
Table 2 parameter table
5. Training sample setup
Traditional precipitation models apply a single nationwide template, so they cannot handle precipitation differences caused by different altitudes in different regions, by different latitude and longitude positions, or by different seasons. In the present invention, after the live precipitation product and the satellite data are matched to a 5 km resolution, the data of the 9 points in the 3×3 neighborhood centered on the observation point are used to make the prediction more stable, and the latitude and longitude, the day of the year, and the altitude of the point are added; together these form the input of one sample. This feature design considers time, position, and altitude simultaneously, a substantial improvement over the traditional unified nationwide precipitation model. Each input training sample contains 292 features, as shown in the following table:
Table 3 Input sample content list
The corresponding label is the one-hour precipitation at that point.
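One way to arrive at 292 features is 9 neighborhood points × 8 channels × 4 time steps per hour = 288 satellite values, plus latitude, longitude, day of the year, and altitude. This breakdown is an inference from the 8 selected channels and the 4 time steps per hour stated elsewhere in this document, not a reproduction of Table 3, and the array layout and the function name `build_sample` below are hypothetical. A minimal NumPy sketch under those assumptions:

```python
import numpy as np

def build_sample(cube, row, col, lat, lon, day_of_year, altitude):
    """Assemble one input sample from a (time, channel, rows, cols) data cube.

    Assumed layout: 4 time steps per hour, 8 channels, and a 3x3 window
    centred on (row, col); the point must not lie on the grid border.
    """
    window = cube[:, :, row - 1:row + 2, col - 1:col + 2]   # shape (4, 8, 3, 3)
    extras = np.array([lat, lon, day_of_year, altitude], dtype=float)
    return np.concatenate([window.ravel(), extras])

# Synthetic cube for illustration: 4 time steps, 8 channels, 10 x 10 grid.
cube = np.arange(4 * 8 * 10 * 10, dtype=float).reshape(4, 8, 10, 10)
sample = build_sample(cube, row=5, col=5, lat=39.9, lon=116.4,
                      day_of_year=200, altitude=44.0)
```

Flattening the window keeps all 288 satellite values as separate features, so the model can weigh each neighbor, channel, and time step individually.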
6. Parallel computing setup
To shorten both training time and prediction time, multiple models are trained by region. The observed China region spans 3°N-55°N and 60°E-137°E. One model is trained on the data within each 4° × 4° area, so that training and prediction can run in parallel, which improves efficiency and greatly shortens prediction time. The latitude direction spans 55° - 3° = 52°, requiring 13 regional models; the longitude direction spans 137° - 60° = 77°, requiring 20 models, of which the 20th needs only 1° longitude × 4° latitude of data. In total, 20 × 13 = 260 regional models need to be trained. Each model is trained on the 2019 data; at a satellite resolution of 0.04°, each standard area contains 100 × 100 points, which multiplied by 365 days gives 3.65 million training samples.
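The tile counts above follow from simple arithmetic, reproduced here as a short Python check. The variable names are illustrative, and the sample count assumes, as the text does, one sample per grid point per day.

```python
import math

LAT_MIN, LAT_MAX = 3, 55      # degrees north
LON_MIN, LON_MAX = 60, 137    # degrees east
TILE_DEG = 4                  # each model covers a 4 x 4 degree area
RES_DEG = 0.04                # satellite grid resolution in degrees

n_lat_tiles = math.ceil((LAT_MAX - LAT_MIN) / TILE_DEG)   # 52 / 4
n_lon_tiles = math.ceil((LON_MAX - LON_MIN) / TILE_DEG)   # 77 / 4, rounded up
n_models = n_lat_tiles * n_lon_tiles

# Width in degrees of the last, partial longitude tile.
last_lon_width = (LON_MAX - LON_MIN) - (n_lon_tiles - 1) * TILE_DEG

points_per_side = round(TILE_DEG / RES_DEG)
samples_per_model = points_per_side ** 2 * 365

print(n_lat_tiles, n_lon_tiles, n_models, last_lon_width, samples_per_model)
```

Running this prints `13 20 260 1 3650000`, matching the figures in the text.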
7. Predictive settings
For satellite data outside the training set, once the satellite has acquired the 4 time steps of data within one hour, the data are split into 4° × 4° areas. Each area is fed into its corresponding model for prediction, and the prediction results are finally stitched together to form the precipitation distribution over the China region.
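The split, predict, and stitch procedure can be sketched as follows. This is an illustration under assumptions: the grid is held as a NumPy array of per-point feature vectors, tiles are 100 × 100 points (4° at 0.04°), and `models` is a hypothetical dictionary mapping a tile index to any callable that returns one precipitation value per point; a trained GBDT model would take that place. The toy grid and the three-feature vectors are made up to keep the example small.

```python
import numpy as np

def predict_mosaic(features, models, tile=100):
    """Predict tile by tile and stitch the results into one grid.

    features: array of shape (rows, cols, n_features).
    models:   dict mapping (tile_row, tile_col) to a callable that takes an
              (n_points, n_features) array and returns n_points values.
    """
    rows, cols, n_features = features.shape
    out = np.empty((rows, cols), dtype=float)
    for i, r0 in enumerate(range(0, rows, tile)):
        for j, c0 in enumerate(range(0, cols, tile)):
            block = features[r0:r0 + tile, c0:c0 + tile]   # edge tiles may be smaller
            h, w = block.shape[:2]
            pred = models[(i, j)](block.reshape(h * w, n_features))
            out[r0:r0 + h, c0:c0 + w] = np.asarray(pred).reshape(h, w)
    return out

# Toy example: a 200 x 225 grid gives 2 x 3 tiles, the last column 25 points wide.
features = np.zeros((200, 225, 3))
# Stand-in "models" that return a constant equal to 10 * tile_row + tile_col.
models = {(i, j): (lambda x, v=10 * i + j: np.full(len(x), float(v)))
          for i in range(2) for j in range(3)}
precip = predict_mosaic(features, models)
```

Because each tile is handled independently, the loop body could be dispatched to separate workers, which is the parallelism the per-region models are designed to allow.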
The precipitation prediction method based on the Fengyun-4 satellite has the following technical effects:
1. The observation data of the Fengyun-4 satellite are integrated for temporal and spatial continuity, so that the satellite data are used to the fullest extent and the sample data are sufficient and realistic; the precipitation prediction model is trained on the integrated data, which improves the accuracy and reliability of the prediction results.
2. A mathematical model for precipitation prediction is designed for the characteristics of Fengyun-4 satellite observation data, with the genetic algorithm and the GBDT model deeply integrated and optimized: before the GBDT model is trained on the integrated satellite observation data, the genetic algorithm optimizes the initial values of the GBDT model parameters to obtain the optimal initial parameter values, and the iterative computation of the GBDT model then proceeds from those initial values.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. A precipitation prediction method based on the Fengyun-4 satellite, characterized by comprising the following steps:
step one, satellite observation data time integration:
for each satellite channel, the regular observations of the China region from three consecutive 5-minute periods are averaged and integrated into 15-minute data, with 15 minutes taken as one time step; that is, the value of each channel of the newly generated 15-minute China-region observation data is:
where $i$ is the Fengyun-4 channel number, ranging from 1 to 14; $j$ is the index of the 5-minute data; and $n$ is 3, the number of 5-minute data sets averaged;
step two, satellite scanning data space integration:
the time-integrated full-disc regular observation data are cropped to the same area as the regular China observation region, so that all data share the same extent and temporal frequency, completing the temporal and spatial integration of the satellite data; the integrated satellite data have 4 time steps per hour, and their extent is the regular observation extent of the China region;
step three, selecting the Fengyun-4 satellite precipitation prediction channels:
data from 8 Fengyun-4 satellite channels are selected for the calculation, and the relationships among the channels and their effect on precipitation prediction are analyzed;
step four, genetic algorithm setup and parameter configuration:
precipitation is predicted with a mathematical model combining a genetic algorithm and the GBDT algorithm; the initial values of the GBDT model parameters are optimized with the genetic algorithm to obtain the optimal initial values of the GBDT model;
wherein the genetic algorithm is designed as follows:
(1) Population initialization and chromosome coding
Individuals are encoded as real numbers: each individual is a string of real numbers containing all parameters of the GBDT model; the initial population size is 20-100;
(2) Determining the objective function and the fitness function
All parameters of the GBDT model are obtained from an individual, the GBDT model is trained with the training data and used to predict the system output, and the absolute value of the error between the expected output and the predicted actual output is taken as the individual fitness value $F$, computed as:
$F = k \sum_{i=1}^{n} |y_i - o_i|$
where $n$ is the number of network output nodes; $y_i$ is the expected output of the leaf node of the $i$-th regression tree of the GBDT model; $o_i$ is the actual output of the leaf node of the $i$-th regression tree; and $k$ is a coefficient;
(3) Selection operation
The selection operation uses proportional selection, i.e. a selection strategy based on fitness proportion; the selection probability $p_i$ of each individual $i$ is:
where $F_i$ is the fitness value of individual $i$; $F_j$ is the fitness value of individual $j$; and $N$ is the number of individuals in the population;
(4) Crossover operation
Since individuals are real-number encoded, the crossover operation uses the real-number crossover method; the crossover of the $k$-th chromosome $a_k$ and the $l$-th chromosome $a_l$ at gene $j$ is:
$a_{kj} = a_{kj}(1 - b) + a_{lj} b$
$a_{lj} = a_{lj}(1 - b) + a_{kj} b$
where the coefficient $b$ is a random number in $[0, 1]$; the crossover probability is set to 0.4-0.99;
(5) Mutation operation
The $j$-th gene $a_{ij}$ of the $i$-th individual is selected for mutation, as follows:
where $a_{\max}$ is the upper bound of gene $a_{ij}$; $a_{\min}$ is the lower bound of gene $a_{ij}$; $g$ is the current iteration number, and the final number of generations of the genetic algorithm is 100-500; $G_{\max}$ is the maximum number of generations; $r$ is a random number in $[0, 1]$; and the mutation probability is 0.0001-0.1;
step five, setting and calculating process of GBDT algorithm
in step four, the initial values of the GBDT model parameters are optimized with the genetic algorithm to obtain the optimal initial parameter values; in step five, the integrated historical scan data for each satellite observation point are fed into the GBDT model to obtain the predicted precipitation for that observation point; the predicted precipitation is compared with the actual precipitation in the historical data to obtain the residual, and the GBDT model is trained on this basis; training yields the optimal parameter selection for the GBDT model, and real-time data for an observation point from the satellite's real-time scan are then fed into the trained model to obtain the predicted precipitation for that observation point;
(1) The GBDT algorithm is set to:
(1) selection of a loss function
Whether the GBDT algorithm predicts precipitation well, or equivalently whether suitable GBDT parameters have been found, is measured by a loss function; the loss function is: $L(y, F) = |y - F|$, where $y$ is the actual value and $F(x)$ is the predicted value; the Huber loss is used; obtaining the most suitable parameters for each regression tree is equivalent to minimizing the loss function $L$;
(2) objective function
The objective function yields the most suitable parameters of each regression tree, namely the allocation proportion $\rho_t$ of the $t$-th regression tree and the parameters $\theta_t$ of the $t$-th regression tree, where $\rho$ is the allocation proportion of each regression tree, i.e. the learning rate or step size of the tree, and $\theta$ is the parameter of the regression tree;
where the parameter iteration is:
and the step size is selected as:
where $E_{x,y}$ denotes the error rate, $x$ denotes the input data, $f(x_i)$ denotes the predicted value of the $i$-th iteration, and $h$ denotes the second derivative; $h(x, \theta)$ denotes a regression tree, and $h(x_i; \theta_t)$ denotes the $i$-th iteration of the $t$-th regression tree;
(3) calculation of residual:
when the GBDT algorithm trains the regression trees, the residual is used as the target value to train the next regression tree, until the number of regression trees built reaches the required number and the residual $r_{it}$ falls within the expected range, at which point training stops; $r_{it}$ is the direction in which the weak learner of the $i$-th iteration is built;
(2) The iteration flow of the GBDT algorithm is as follows:
each decision tree in GBDT is trained on the errors in the results of the previous decision tree; each calculation aims to reduce the previous residual; to eliminate the residual, GBDT iterates several times, and each iteration builds a new model in the gradient direction of residual reduction; the specific iteration flow is as follows:
(1) initialize the first equation and the initial values $\theta_0$ of the GBDT model parameters; the initial values are optimized with the genetic algorithm to obtain the optimal initial parameter values of the GBDT model;
(2) in each iteration, compute the residual $r_{it}$, which represents the direction in which the weak learner of the $i$-th iteration is built, in order to train the next regression tree;
(3) construct the regression problem $\{(x_i, r_{it})\}_{i=1,\dots,n}$ with the residual as the target, where $x_i$ is the input data, namely the latitude and longitude of the satellite observation point, the day of the year, and the altitude of the point;
(4) find the most suitable parameter choice at present:
(5) after $M$ iterations, the final prediction result is obtained, where $y_i$ is the predicted precipitation for the satellite observation point.
2. The precipitation prediction method based on the Fengyun-4 satellite according to claim 1, wherein each training sample takes as input the data of the 9 points in the 3×3 neighborhood centered on the observation point, together with the latitude and longitude, the day of the year, and the altitude of the point; each input training sample contains 292 features in total.
3. The precipitation prediction method based on the Fengyun-4 satellite according to any one of claims 1-2, wherein, to shorten training time and prediction time, multiple models are trained by region; the observed China region spans 3°N-55°N and 60°E-137°E; one model is trained on the data within each 4° × 4° area; the latitude direction spans 55° - 3° = 52°, requiring 13 regional models, and the longitude direction spans 137° - 60° = 77°, requiring 20 models, of which the 20th needs only 1° longitude × 4° latitude of data; that is, 20 × 13 = 260 regional models need to be trained in total; each model is trained on the 2019 data; at a satellite resolution of 0.04°, each standard area contains 100 × 100 points, which multiplied by 365 days gives 3.65 million training samples.
4. The precipitation prediction method based on the Fengyun-4 satellite according to any one of claims 1-2, wherein, for satellite data outside the training set, once the satellite has acquired the 4 time steps of data within one hour, the data are split into 4° × 4° areas; each area is fed into its corresponding model for prediction, and the prediction results are finally stitched together to form the precipitation distribution over the China region.
CN202010693436.9A 2020-07-17 2020-07-17 Intelligent precipitation prediction method based on wind cloud No. four meteorological satellites Active CN111832828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010693436.9A CN111832828B (en) 2020-07-17 2020-07-17 Intelligent precipitation prediction method based on wind cloud No. four meteorological satellites

Publications (2)

Publication Number Publication Date
CN111832828A CN111832828A (en) 2020-10-27
CN111832828B true CN111832828B (en) 2023-12-19

Family

ID=72923590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010693436.9A Active CN111832828B (en) 2020-07-17 2020-07-17 Intelligent precipitation prediction method based on wind cloud No. four meteorological satellites

Country Status (1)

Country Link
CN (1) CN111832828B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant