CN115952743A - Multi-source precipitation data collaborative downscaling method and system coupled with random forest and HASM - Google Patents

Info

Publication number
CN115952743A
Authority
CN
China
Prior art keywords
data
source
precipitation data
downscaling
precipitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310239673.1A
Other languages
Chinese (zh)
Other versions
CN115952743B (en)
Inventor
赵娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Geographic Sciences and Natural Resources of CAS filed Critical Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN202310239673.1A priority Critical patent/CN115952743B/en
Publication of CN115952743A publication Critical patent/CN115952743A/en
Application granted granted Critical
Publication of CN115952743B publication Critical patent/CN115952743B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Image Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of electric digital data processing and provides a multi-source precipitation data collaborative downscaling method and system coupling a random forest and HASM. The method comprises the following steps: performing regression prediction on the value of target precipitation data with a pre-constructed random forest model, according to multi-source precipitation data, to obtain a prediction result corresponding to the target precipitation data; calculating the residual of each training sample in the training data set of the random forest model, and calculating, based on the random forest model, the real precipitation interpretation degree corresponding to the precipitation data of each source; interpolating the residuals by HASM, and summing the interpolation result and the prediction result to obtain downscaling comprehensive data; and determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the real precipitation interpretation degree. In this way, the downscaling result corresponding to each data source is output while high-precision, high-resolution downscaling comprehensive data are obtained.

Description

Multi-source precipitation data collaborative downscaling method and system for coupling random forest and HASM
Technical Field
The application relates to the technical field of electric digital data processing, in particular to a multi-source precipitation data collaborative downscaling method and system for coupling a random forest and a HASM.
Background
Ecological environment elements, such as temperature, precipitation and carbon dioxide concentration, are the natural basis on which human society lives and develops. A scientific understanding of the basic spatial distribution and change patterns of these elements is the primary task of ecological environment informatics. Water resources are the primary ecological environment element on which human beings depend and are also an indispensable production resource for social and economic development. Among the earth's spheres, fresh water is the most important resource for human beings, and at present it is obtained mainly from atmospheric precipitation. Therefore, how to acquire precipitation grid data with high precision and high spatial resolution, in order to evaluate existing water resources, is a key and difficult point in the field of ecological environment informatics.
With the advent of the information age, the requirements on the spatial resolution of precipitation data keep rising. To obtain precipitation grid data with high spatial resolution conveniently, coarse-resolution data are often downscaled to produce data with higher resolution.
Traditional downscaling methods can be roughly divided into two categories, dynamical downscaling and statistical downscaling. Practice shows that both have limitations. Dynamical downscaling requires a large amount of computation. Statistical downscaling requires less computation but is clearly affected by collinearity; it also requires the regression relationship to be uniformly stationary, so it cannot depict the heterogeneity of geographic space well, and the accuracy of the downscaling result is insufficient. In addition, traditional methods usually output only a single set of high-resolution downscaling comprehensive data for the multi-source precipitation data. They cannot simultaneously produce the downscaling result corresponding to the precipitation data of each source, and they cannot preserve the multi-scale characteristics of the original data.
Therefore, there is a need to provide an improved solution to the above-mentioned deficiencies of the prior art.
Disclosure of Invention
The application aims to provide a multi-source precipitation data collaborative downscaling method and system for coupling random forests and HASMs, so as to solve or alleviate the problems in the prior art.
In order to achieve the above purpose, the present application provides the following technical solutions:
The application provides a multi-source precipitation data collaborative downscaling method coupling a random forest and HASM, comprising the following steps:
performing, according to the multi-source precipitation data, regression prediction on the value of the target precipitation data by using a pre-constructed random forest model, to obtain a prediction result corresponding to the target precipitation data; wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution;
calculating the residual of each training sample in the training data set of the random forest model, and calculating, based on the random forest model, the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data;
interpolating the residual of each training sample in the training data set by the high accuracy surface modeling (HASM) method, and summing the interpolation result and the prediction result to obtain downscaling comprehensive data;
and determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the real precipitation interpretation degree.
Preferably, the random forest model is constructed by the following steps:
according to the target resolution, carrying out spatial resolution unified processing on the pre-acquired original multi-source precipitation data to obtain multi-source precipitation data with unified resolution;
constructing the training data set according to observation station precipitation data and multi-source precipitation data with unified resolution;
and constructing the random forest model based on the training data set.
Preferably, according to the target resolution, performing spatial resolution unified processing on the pre-acquired original multi-source precipitation data to obtain the multi-source precipitation data, specifically:
calculating a spatial scale unification factor according to the target resolution and the spatial resolution of the precipitation data of each source in the original multi-source precipitation data;
and converting each original pixel in the precipitation data of each source in the original multi-source precipitation data into a target pixel under the target resolution based on the spatial scale unification factor, and setting the value of the target pixel as the value of the corresponding original pixel to obtain the multi-source precipitation data with unified resolution.
Preferably, the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data is calculated as follows:
calculating the overall interpretation degree of the random forest model; the overall interpretation degree characterizes the degree of variation in real precipitation that the random forest model can interpret after being trained with all training samples of the training data set;
removing the training samples corresponding to the first source precipitation data from the training data set, then retraining the random forest model, and calculating a first interpretation degree; the first source precipitation data is the precipitation data of any one source in the multi-source precipitation data;
and subtracting the first interpretation degree from the overall interpretation degree to obtain the real precipitation interpretation degree corresponding to the first source precipitation data, wherein the real precipitation interpretation degree represents the degree of variation in real precipitation that can be interpreted by the first source precipitation data.
Preferably, determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the real precipitation interpretation degree specifically includes:
multiplying the downscaling comprehensive data by the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data to obtain the downscaling result of the precipitation data of each source in the multi-source precipitation data.
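The calculation described in the two preceding preferred features can be summarized in symbols. The notation below is introduced here for illustration only and does not appear in the application: $E(\cdot)$ denotes the interpretation degree of a trained model (one possible measure, assumed here, is the coefficient of determination $R^2$ against the observation station precipitation), $n$ is the number of sources, and $P$ is the downscaling comprehensive data. Removing the training samples of source $i$ is read here as retraining without the variable $x_i$, which is an interpretation of the text.

```latex
\begin{aligned}
E_{\mathrm{all}} &= E\bigl(\mathrm{RF}(x_1, \dots, x_n)\bigr), \\
E_{-i} &= E\bigl(\mathrm{RF}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)\bigr), \\
E_i &= E_{\mathrm{all}} - E_{-i}, \\
P_i &= E_i \cdot P, \qquad i = 1, \dots, n.
\end{aligned}
```

Here $E_i$ is the real precipitation interpretation degree of source $i$ and $P_i$ is the downscaling result attributed to that source.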
The embodiment of the application provides a multi-source precipitation data collaborative downscaling system coupling a random forest and HASM, comprising:
the regression prediction unit is configured to perform, according to the multi-source precipitation data, regression prediction on the value of the target precipitation data by using a pre-constructed random forest model, to obtain a prediction result corresponding to the target precipitation data; wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution;
the calculation unit is configured to calculate the residual of each training sample in the training data set of the random forest model, and to calculate, based on the random forest model, the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data;
the first downscaling unit is configured to interpolate the residual of each training sample in the training data set by the high accuracy surface modeling (HASM) method, and to sum the interpolation result and the prediction result to obtain downscaling comprehensive data;
and the second downscaling unit is configured to determine the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the real precipitation interpretation degree.
Beneficial effects:
In the technical scheme provided by the embodiment of the application, a random forest model (Random Forest, RF) is used to perform regression prediction on the value of the target precipitation data, yielding a prediction result corresponding to the target precipitation data. The residual of each training sample in the training data set of the random forest model is then calculated and interpolated by the high accuracy surface modeling method (HASM for short), and the interpolation result is summed with the prediction result of the random forest model to obtain high-resolution downscaling comprehensive data. Meanwhile, the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data is calculated based on the random forest model and is combined with the downscaling comprehensive data to determine the downscaling result corresponding to the precipitation data of each source. Coupling the random forest model with HASM thus exploits the simplicity and efficiency of the random forest model, while the high-accuracy fitting of the residuals by HASM reduces the errors caused by geographic spatial heterogeneity, so that high-precision, high-resolution precipitation comprehensive data are obtained.
While the precipitation comprehensive data are obtained, the random forest model also provides the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data, so that the high-precision, high-resolution downscaling result corresponding to each source can be calculated conveniently and quickly. The multi-scale characteristics of each source's precipitation data are therefore preserved, the change patterns of the original data can be revealed, and the requirements of different usage scenarios in water resource assessment in the field of ecological environment informatics are met.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments and illustrations of the application are intended to explain the application and are not intended to limit the application. Wherein:
Fig. 1 is a logic diagram of a multi-source precipitation data collaborative downscaling method coupled with random forest and high-precision curved surface modeling provided in accordance with some embodiments of the present application;
Fig. 2 is a schematic flow diagram of a method for collaborative downscaling of multi-source precipitation data coupled with random forest and high-precision curved surface modeling provided in accordance with some embodiments of the present application;
Fig. 3 is a schematic structural diagram of a multi-source precipitation data collaborative downscaling system coupled with random forest and high-precision curved surface modeling, provided in accordance with some embodiments of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. The various examples are provided by way of explanation of the application and are not limiting of the application. In fact, it will be apparent to those skilled in the art that modifications and variations can be made in the present application without departing from the scope or spirit of the application. For instance, features illustrated or described as part of one embodiment, can be used with another embodiment to yield a still further embodiment. It is therefore intended that the present application cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
As described in the background, obtaining high-precision, high spatial resolution precipitation grid data has been a problem that the art is desirous of solving. At present, the main ways of directly acquiring the data of the atmospheric precipitation grid are:
(1) Acquisition by meteorological station observation (also called observation stations)
A precipitation sensor is installed at a position on the ground, and the atmospheric precipitation at that position is observed and recorded to obtain precipitation data for that position. The advantage of this approach is high data accuracy: the exact precipitation on the ground can be obtained. However, because of economic cost and actual ground conditions, observation stations are often very sparse and spatially continuous observation data cannot be obtained. In economically underdeveloped areas in particular, such as the western regions of China, large areas often have no observation stations at all.
(2) Retrieval by inversion of satellite remote sensing data
The development of satellite remote sensing technology provides a new way of acquiring ground precipitation. A fixed relation between ground precipitation and reflectivity is established through mathematical reconstruction of atmospheric processes, and grid data of ground precipitation are then obtained by inversion of satellite data. The advantage of this approach is that ground precipitation data over a large area can be acquired synchronously. Its accuracy, however, is usually limited by the inversion algorithm and by the weather conditions at acquisition time (such as cloud state), and the acquired data are generally not continuous in time because of the revisit period of the satellite.
(3) Obtaining by numerical simulation
With the development of computer technology, scientists from many disciplines have worked together to abstract the climate system into mathematical formulas and to build numerical software, on the basis of computer hardware and software, that simulates the real climate system and produces precipitation grid data. The advantage of this approach is that grid data of ground precipitation that are continuous in space and time can be obtained. However, because knowledge of the climate system is limited and the climate system is highly variable, the data obtained often carry a certain systematic bias. The computing power required also rises sharply as the target resolution increases, which greatly raises the economic cost of data acquisition.
(4) Reanalysis data
Reanalysis data are produced by assimilating various meteorological data on the basis of a forecast model to obtain an initial field with minimum error. Reanalysis is essentially a data fusion method: observation station data, satellite remote sensing data and numerical simulation data are fused to obtain precipitation data with improved precision. The advantage of this approach is that multi-source data are combined and their respective strengths are fully used in the final result. However, the spatial resolution of the resulting data is coarse and is usually suited only to large-scale climate analysis; errors also accumulate in the process, so the resulting data sometimes fail to improve the quality of the precipitation data.
As can be seen from the above description, the precipitation data obtained by these approaches have the following disadvantages in use: the spatial resolution is relatively coarse, and the optimal spatial scales differ. For example, observation station data are generally point data (vector data), with the observations stored in an attribute table attached to the corresponding station; satellite remote sensing data (also called satellite remote sensing products) are raster data with a spatial resolution of generally about 10 km; the spatial resolution of precipitation data obtained through numerical simulation (numerical simulation results) is about 5 km to 30 km; and the spatial resolution of reanalysis data is generally 25 km at best. Because of the scale effect in geography, each kind of precipitation data can only reveal the patterns at its own spatial scale. Therefore, to ensure that the precipitation data remain useful, the original spatial-scale patterns of the data should be preserved while the spatial resolution is improved.
Downscaling is a method for improving the spatial resolution of precipitation data: it converts coarse-resolution precipitation data into high-resolution precipitation data by certain technical means and improves the precision of the data. However, conventional downscaling methods usually downscale a single data source; when applied to multi-source precipitation data, they suffer from heavy computation and from insufficient precision caused by geographic spatial heterogeneity.
Therefore, the method and the system can fully utilize respective advantages of the multi-source precipitation data in the process of generating the high-spatial-resolution precipitation data, obtain high-resolution and high-precision precipitation comprehensive data, and simultaneously can conveniently obtain respective corresponding scale reduction results of the multi-source precipitation data.
Exemplary method
The embodiment of the application provides a multi-source precipitation data collaborative downscaling method for coupling random forest and high-precision curved surface modeling, and as shown in fig. 1 and fig. 2, the method comprises the following steps:
Step S101: performing, according to the multi-source precipitation data, regression prediction on the value of the target precipitation data by using a pre-constructed random forest model, to obtain a prediction result corresponding to the target precipitation data.
Wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution.
In the embodiment of the application, the original precipitation data can be satellite remote sensing data, numerical simulation results, reanalysis data and the like. In practice, precipitation data of at least 2 sources in a research area can be obtained according to research purposes and actual conditions, and the method does not set an upper limit on the number of the precipitation data sources. In addition, the multi-source precipitation data may also include observation site precipitation data.
The spatial resolution of the original multi-source precipitation data is unified to the target resolution to obtain the target precipitation data. These data have the target resolution, but the value of each pixel still needs to be estimated from the original multi-source precipitation data. At this point, regression prediction can be performed on the value of the target precipitation data with a pre-constructed random forest model, according to the multi-source precipitation data, to obtain a preliminary prediction result.
The random forest model is an algorithm that integrates multiple decision trees through ensemble learning. It is simple and efficient, can perform fast regression prediction on precipitation data from multiple sources, and makes it convenient to obtain a preliminary prediction of the value of the target precipitation data at the target resolution. Most original precipitation data also contain some noise, such as missing or abnormal values; the noise resistance of the random forest handles this effectively, prevents the model from overfitting, and improves the accuracy of the prediction result. In addition, when the number of precipitation data sources increases, the random forest model can still process the resulting high-dimensional precipitation data efficiently, which avoids a loss of downscaling efficiency caused by excessive data dimensionality.
In some embodiments, the random forest model is constructed by: according to the target resolution, carrying out spatial resolution unified processing on the pre-acquired original multi-source precipitation data to obtain multi-source precipitation data with unified resolution; constructing a training data set according to observation station precipitation data and multi-source precipitation data with unified resolution; and constructing a random forest model based on the training data set.
Specifically, the target resolution is a data spatial resolution obtained by downscaling the low-resolution original multi-source precipitation data, and it can be understood that the target resolution is higher than the spatial resolution of the original precipitation data. This embodiment will be described with an example of a target resolution of 1km.
To facilitate downscaling, the spatial resolution of the original multi-source precipitation data must first be unified. A training data set is then constructed by combining the observation station precipitation data with the resolution-unified multi-source precipitation data, and the random forest model is trained on that training data set.
The training process of the random forest model is essentially data-driven: the model learns the mapping from an input data sequence X to output data Y and, based on that mapping, simulates and predicts new data. The set composed of input data sequences and output data is called the training data set. In the embodiment of the application, the observation station precipitation data are used as the output data of the training data set (namely Y). According to the geographic position (namely the longitude and latitude coordinates) of each observation station, the values of the multi-source precipitation data at the station position are extracted from the multi-source precipitation data with unified spatial resolution to form the input data sequence X. The input data sequence consists of precipitation data in several dimensions, and each component of the sequence is formed by the precipitation values of one source. For example, the values of the satellite remote sensing data correspond to component x1, the values of the numerical simulation result correspond to x2, the values of the reanalysis data correspond to x3, and so on. In this way, the observation station precipitation data and the multi-source precipitation data yield a training data set with observation station precipitation data as Y and precipitation data from the various sources as X.
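The construction of the training data set can be illustrated with a minimal Python sketch. It is not taken from the application: the function names `sample_grid` and `build_training_set`, the north-up grid layout, the grid origin and cell size, and all station and grid values are hypothetical and chosen only for illustration.

```python
def sample_grid(grid, x_min, y_max, cell_size, lon, lat):
    """Return the value of the cell containing (lon, lat), or None if outside the grid.

    Assumes a north-up grid: row 0 is the northernmost row, column 0 the westernmost.
    """
    col = int((lon - x_min) // cell_size)
    row = int((y_max - lat) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


def build_training_set(stations, source_grids, x_min, y_max, cell_size):
    """Build (X, Y): X holds one value per source at each station, Y the station observation."""
    X, Y = [], []
    for lon, lat, observed in stations:
        features = [sample_grid(g, x_min, y_max, cell_size, lon, lat) for g in source_grids]
        if None in features:
            continue  # skip stations that fall outside the gridded data
        X.append(features)
        Y.append(observed)
    return X, Y


# Hypothetical resolution-unified grids (x1, x2, x3) sharing one extent and cell size.
satellite = [[2.0, 5.0], [0.0, 1.5]]
simulation = [[2.5, 4.0], [0.5, 1.0]]
reanalysis = [[3.0, 4.5], [0.2, 1.2]]

# Hypothetical stations: (longitude, latitude, observed precipitation).
stations = [(100.2, 39.8, 3.1), (100.7, 39.3, 1.0), (105.0, 39.9, 9.9)]

X, Y = build_training_set(stations, [satellite, simulation, reanalysis],
                          x_min=100.0, y_max=40.0, cell_size=0.5)
print(X)  # [[2.0, 2.5, 3.0], [1.5, 1.0, 1.2]]
print(Y)  # [3.1, 1.0]
```

The third station lies outside the grid extent and is dropped. The resulting X and Y could then be passed to any random forest regression implementation.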
Subsequently, a random forest model is constructed and trained based on the constructed training data set, and various parameters of the model are learned through a training process so as to perform regression prediction on new data.
Further, performing spatial resolution unification on the pre-acquired original multi-source precipitation data according to the target resolution, to obtain the multi-source precipitation data, specifically comprises the following steps: calculating a spatial scale unification factor according to the target resolution and the spatial resolution of the precipitation data of each source in the original multi-source precipitation data; and converting each original pixel in the precipitation data of each source in the original multi-source precipitation data into target pixels at the target resolution based on the spatial scale unification factor, and setting the value of each target pixel to the value of the corresponding original pixel to obtain the multi-source precipitation data with unified resolution.
For example, in the original multi-source rainfall data, the raster data with different spatial resolutions comprise satellite remote sensing data, a numerical simulation result, reanalysis data and the like, wherein the spatial resolution of the satellite remote sensing data is 10km, the spatial resolution of the numerical simulation result is 20km, the spatial resolution of the reanalysis data is 25km, and the target resolution is 1km.
Traditional spatial resolution unification is usually implemented by interpolation, which makes it difficult for the resulting high-resolution data to preserve the change patterns of the original data. In the embodiment of the application, in order to preserve the spatial multi-scale patterns contained in the original multi-source precipitation data, a pixel repetition method, which is entirely different from interpolation, is used to unify the spatial resolution of the original multi-source precipitation data to the target resolution. Specifically, a spatial scale unification factor is first calculated. For example, the spatial resolution of the satellite remote sensing data is 10 km and the target resolution is 1 km, so the spatial scale unification factor between the satellite remote sensing data and the target resolution is 10/1 = 10; the factors for the precipitation data of the other sources are calculated in the same way. Then each original pixel of each source in the original multi-source precipitation data is converted into pixels at the target resolution. For example, in the satellite remote sensing data one original pixel represents a real ground area of 10 km × 10 km; converted to the target resolution, it becomes 10 × 10 = 100 target pixels, each representing a real ground area of 1 km × 1 km. The value of each target pixel is then set to the value of the corresponding original pixel, that is, the precipitation values in the 100 target pixels all equal the precipitation value of the original pixel. In this way, the multi-scale patterns of the original product are preserved as far as possible after the resolution is unified.
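The pixel repetition described above can be sketched in a few lines of Python. This is an illustrative sketch, not the application's implementation: the function names and the 2 × 2 sample grid are hypothetical, and the sketch assumes the source resolution is an integer multiple of the target resolution.

```python
def unification_factor(source_resolution_km, target_resolution_km):
    """Spatial scale unification factor, e.g. 10 km / 1 km = 10."""
    factor, remainder = divmod(source_resolution_km, target_resolution_km)
    if remainder:
        raise ValueError("this sketch assumes an integer ratio between resolutions")
    return int(factor)


def repeat_pixels(grid, factor):
    """Expand each coarse pixel into a factor x factor block holding the same value."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    fine = []
    for row in grid:
        # Repeat every value `factor` times along the row (column direction).
        fine_row = [value for value in row for _ in range(factor)]
        # Repeat the expanded row `factor` times (row direction), as independent lists.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


coarse = [[2.0, 5.0],
          [0.0, 1.5]]                    # hypothetical 10 km precipitation values
factor = unification_factor(10, 1)       # 10
fine = repeat_pixels(coarse, factor)     # each coarse pixel becomes 100 fine pixels
print(len(fine), len(fine[0]))           # 20 20
```

No new values are created: every 1 km pixel carries exactly the value of the 10 km pixel that contains it, which is what distinguishes this step from interpolation.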
Step S102: calculating the residual of each training sample in the training data set of the random forest model, and calculating, based on the random forest model, the real precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data.
On the basis of the constructed random forest model, the residual error of each training sample in the training data set is calculated, so that an unbiased regression model is obtained, and the expression of the unbiased regression model is as follows:
Y = RF(X) + δ, X = {x1, x2, x3, …},
where Y represents the value of the precipitation data to be predicted, namely the value of the target precipitation data; X represents the input data sequence composed of the multi-source precipitation data, each variable in the sequence representing precipitation data from one source (for example, x1 represents the satellite remote sensing data, x2 represents the numerical simulation result, and x3 represents the reanalysis data); and δ represents the residual.
Step S103, interpolating the residual of each training sample in the training data set by the high-precision curved surface modeling (HASM) method, and summing the interpolation result and the prediction result to obtain the downscaling comprehensive data.
The HASM method is a mathematical model proposed by the team of the Chinese scholar Yue Tianxiang by combining system theory, surface theory, and optimal control theory. It can accurately express and analyze eco-environmental elements: the gridded (raster) representation of an eco-environmental element is abstracted as a mathematical "surface", which is then simulated with high precision by surface modeling techniques to obtain a spatially continuous surface of that element.
In the embodiment of the present application, the residual corresponding to each training sample in the training data set is essentially point data located at an observation station. To further improve downscaling accuracy, the embodiment exploits the high-precision surface simulation capability of the HASM method: the residuals are interpolated with HASM, which converts them from point data into surface (raster) data at the target resolution, and the interpolation result is then summed with the prediction result to obtain the downscaling comprehensive data, expressed as follows:
Y’ = RF(X) + HASM(δ), X = {x1, x2, x3, …},
where Y’ represents the downscaling comprehensive data, δ represents the residual, HASM(δ) represents the result of interpolating the residual with the HASM method, and X is the input data sequence composed of the multi-source precipitation data. It should be noted that, in the above formula, X, Y’ and HASM(δ) are all raster data whose spatial resolution equals the target resolution, which is 1 km in this example.
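The residual-correction formula can be sketched as follows. This is only an illustration under loud assumptions: HASM itself is not implemented here, and inverse-distance weighting (IDW) is swapped in as a simple stand-in interpolator; the station coordinates, observations, and random forest predictions are made-up toy values, and `idw_surface` is a hypothetical helper name.

```python
import numpy as np

def idw_surface(station_xy, station_resid, grid_x, grid_y, power=2.0):
    """Spread station residuals over a grid by inverse-distance weighting.

    NOTE: IDW is a stand-in chosen for brevity; it is NOT the HASM algorithm.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)                  # both (ny, nx)
    cells = np.stack([gx.ravel(), gy.ravel()], axis=1)    # (ncell, 2)
    # Distance from every grid cell to every station: (ncell, nstation)
    d = np.linalg.norm(cells[:, None, :] - station_xy[None, :, :], axis=2)
    out = np.empty(len(cells))
    hit = d.min(axis=1) == 0.0          # cells that coincide with a station
    nearest = d.argmin(axis=1)
    out[hit] = station_resid[nearest[hit]]
    w = 1.0 / d[~hit] ** power          # weights for the remaining cells
    out[~hit] = (w * station_resid).sum(axis=1) / w.sum(axis=1)
    return out.reshape(gy.shape)

# Toy inputs (all values are illustrative assumptions).
station_xy = np.array([[0.0, 0.0], [2.0, 0.0]])   # station coordinates
y_obs = np.array([6.0, 4.0])                      # observed precipitation
rf_at_stations = np.array([5.0, 5.0])             # RF(X) at the stations
delta = y_obs - rf_at_stations                    # residuals, point data

grid_x = np.array([0.0, 1.0, 2.0])
grid_y = np.array([0.0])
rf_grid = np.full((1, 3), 5.0)                    # RF(X) on the target grid

# Y' = RF(X) + interpolated residual surface
y_prime = rf_grid + idw_surface(station_xy, delta, grid_x, grid_y)
```

At the two station cells the corrected surface reproduces the observations, which is the purpose of adding the interpolated residual back to the regression prediction.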
Conventional downscaling methods suffer from a large computational load or from insufficient accuracy of the downscaling results. In the embodiment of the application, regression prediction is performed on the multi-source precipitation data by the random forest model, so that a preliminary prediction of the precipitation data at the target resolution is obtained conveniently and efficiently. The residuals are then interpolated using the high-precision surface simulation capability of the HASM method to correct the error of the preliminary prediction at the target resolution, finally yielding high-precision, high-resolution downscaling comprehensive data. The method downscales the multi-source precipitation data with a small computational load while improving the precision of the downscaling result, and thus provides a new approach to downscaling multi-source precipitation data.
Step S104, determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the true precipitation interpretation degree.
The embodiment of the application can output not only the downscaling comprehensive data but also, using the true precipitation interpretation degrees and the downscaling comprehensive data, the downscaling result of the precipitation data of each source.
In order to obtain the downscaling results of the precipitation data of all sources at the same time, the true precipitation interpretation degree corresponding to the precipitation data of each source is calculated. In some embodiments, the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data is calculated as follows: calculating the overall interpretation degree of the random forest model, where the overall interpretation degree characterizes the degree of variation of real precipitation that the random forest model can explain after being trained with all training samples of the training data set; removing the training samples corresponding to the first source precipitation data from the training data set, retraining the random forest model, and calculating a first interpretation degree, where the first source precipitation data is the precipitation data of any one source in the multi-source precipitation data; and subtracting the first interpretation degree from the overall interpretation degree to obtain the true precipitation interpretation degree corresponding to the first source precipitation data, which characterizes the degree of variation of real precipitation that the first source precipitation data can explain.
The overall interpretation degree of the random forest model, also called the variance interpretation degree (interpretation degree for short) and denoted Q, is a value between 0 and 1. It represents the degree of variation of real precipitation that can be explained when a training data set is constructed from the multi-source precipitation data (such as x1, x2, x3) and the random forest model is trained with all training samples, that is, the overall goodness of fit of the random forest model.
In order to obtain the true precipitation interpretation degree corresponding to the precipitation data of each source, the embodiment removes from the training data set the training samples belonging to each source in turn. For example, the true precipitation interpretation degree corresponding to the satellite remote sensing data is solved as follows: first, the numerical sequence x1 corresponding to the satellite remote sensing data (the first source precipitation data) is removed from the training data set, so that the remaining precipitation data x2, x3, … form the input data sequence X, which together with Y forms a new training data set; the random forest model is then reconstructed and retrained, and the interpretation degree Q1 of the new random forest model (i.e., the first interpretation degree) is calculated; finally, Q1 is subtracted from the overall interpretation degree Q to obtain the true precipitation interpretation degree q1 corresponding to the satellite remote sensing data. Namely:
q1 = Q - Q1,
The same procedure is used to determine the true precipitation interpretation degrees corresponding to the precipitation data from the other sources; for example, the degree corresponding to x2 is q2, the degree corresponding to x3 is q3, and so on.
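The leave-one-source-out computation of q1, q2, q3 can be sketched as below. To keep the example self-contained, an ordinary least-squares fit with an R² score is swapped in for the random forest model; this substitution is an assumption of the sketch, not the method of the application. The names `r2_linear` and `interpretation_degrees` and the toy data are hypothetical, and any scoring function that refits a model and returns its explained-variance score could be passed in instead.

```python
import numpy as np

def r2_linear(X, y):
    """Explained variance (R^2) of a least-squares fit with intercept.

    Stand-in for 'train the random forest and compute its interpretation
    degree'; NOT a random forest.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid ** 2).sum() / ss_tot

def interpretation_degrees(X, y, score=r2_linear):
    """Return the overall degree Q and the per-source degrees q_i = Q - Q_i."""
    Q = score(X, y)                            # all sources included
    q = []
    for i in range(X.shape[1]):
        X_without_i = np.delete(X, i, axis=1)  # drop source i, then refit
        q.append(Q - score(X_without_i, y))
    return Q, np.array(q)

# Toy data: three mutually orthogonal "sources"; y depends on x1 and x2 only.
x1 = np.array([1.0, -1.0, 1.0, -1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
x3 = np.array([1.0, -1.0, -1.0, 1.0])
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 1.0 * x2

Q, q = interpretation_degrees(X, y)
```

In this toy case the source that drives most of the variation in y receives the largest degree, and the source unrelated to y receives a degree of zero.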
Further, the downscaling result of the precipitation data of each source in the multi-source precipitation data is determined based on the downscaling comprehensive data and the true precipitation interpretation degree, specifically: the downscaling comprehensive data is multiplied by the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data to obtain the downscaling result of the precipitation data of that source. The specific expression is as follows:
X1 = Y’ × q1 = (RF(X) + HASM(δ)) × q1,
where X1 represents the downscaling result corresponding to x1, namely the downscaling result of the satellite remote sensing data. Its spatial resolution is converted from 10 km to 1 km, that is, each pixel after downscaling is 1 km × 1 km, and the value precision of each pixel is also improved.
The downscaling results X2 and X3 corresponding to x2 and x3, namely the downscaled numerical simulation result and the downscaled reanalysis data, are obtained by a similar procedure.
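The per-source scaling described above amounts to one multiplication per source. A minimal sketch, assuming a toy 1 × 3 comprehensive raster and made-up interpretation degrees (the function name `per_source_results` is hypothetical):

```python
import numpy as np

def per_source_results(y_prime, q):
    """Stack of per-source rasters X_i = Y' * q_i, shape (n_sources, ny, nx)."""
    q = np.asarray(q, dtype=float)
    return q[:, None, None] * y_prime[None, :, :]

# Illustrative values only.
y_prime = np.array([[6.0, 5.0, 4.0]])   # downscaling comprehensive data Y'
q = [0.9, 0.1, 0.0]                     # q1, q2, q3 for the three sources

X_down = per_source_results(y_prime, q)
X1, X2, X3 = X_down                     # one raster per source
```

Every per-source raster shares the grid of the comprehensive data, so each result is at the target resolution.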
In summary, the embodiment of the application provides a multi-source data collaborative downscaling method coupling a random forest model and HASM. By coupling the two, the method makes full use of the respective advantages of the multi-source precipitation data to obtain high-precision, high-resolution precipitation raster data (namely the downscaling comprehensive data). At the same time, the random forest model is used to obtain the residuals of the training samples of the training data set and the degree to which the precipitation data of each source explains real precipitation, so that high-precision, high-resolution downscaling results of the precipitation data of all sources are obtained simultaneously. That is, the method yields high-precision, high-resolution downscaling comprehensive data from precipitation data of multiple sources, and also conveniently yields the downscaling result corresponding to the precipitation data of each source.
By the method, a comprehensive downscaling result that couples the multi-source precipitation data is finally obtained, and the downscaling result corresponding to the precipitation data of each source can be conveniently obtained based on the random forest model, while the multi-scale characteristics and the spatial pattern contained in the original precipitation data are preserved to the greatest possible extent.
Exemplary System
The embodiment of the application provides a multi-source precipitation data collaborative downscaling system coupling random forest and high-precision curved surface modeling. As shown in FIG. 3, the system includes: a regression prediction unit 301, a calculation unit 302, a first downscaling unit 303, and a second downscaling unit 304. Wherein:
the regression prediction unit 301 is configured to perform regression prediction on the value of the target precipitation data by using a pre-constructed random forest model according to the multi-source precipitation data to obtain a prediction result corresponding to the target precipitation data. Wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution.
The calculation unit 302 is configured to calculate the residual of each training sample in the training data set of the random forest model, and to calculate, based on the random forest model, the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data.
The first downscaling unit 303 is configured to perform interpolation processing on the residual error of each training sample in the training data set by using a high-precision curved surface modeling method, and perform summation calculation on the interpolation processing result and the prediction result to obtain downscaling comprehensive data.
The second downscaling unit 304 is configured to determine the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the true precipitation interpretation degree.
In some embodiments, the system further includes a model building unit (not shown in the figures), which comprises: a scale unification module configured to perform spatial resolution unification processing on pre-acquired original multi-source precipitation data according to the target resolution to obtain the multi-source precipitation data; a training data construction module configured to construct a training data set according to observation station precipitation data and the multi-source precipitation data; and a training module configured to construct the random forest model based on the training data set.
In some embodiments, the scale unification module is further configured to: calculating a spatial scale unity factor according to the target resolution and the spatial resolution of the precipitation data of each source in the original multi-source precipitation data; and converting each original pixel in the precipitation data of each source in the original multi-source precipitation data into a target pixel under the target resolution based on the spatial scale unification factor, and setting the value of the target pixel as the value of the corresponding original pixel to obtain the multi-source precipitation data.
In some embodiments, the calculation unit further comprises an interpretation degree calculation module, which is configured to: calculate the overall interpretation degree of the random forest model, where the overall interpretation degree characterizes the degree of variation of real precipitation that the random forest model can explain after being trained with all training samples of the training data set; remove the training samples corresponding to the first source precipitation data from the training data set, retrain the random forest model, and calculate a first interpretation degree, where the first source precipitation data is the precipitation data of any one source in the multi-source precipitation data; and subtract the first interpretation degree from the overall interpretation degree to obtain the true precipitation interpretation degree corresponding to the first source precipitation data, which characterizes the degree of variation of real precipitation that the first source precipitation data can explain.
In some embodiments, the second downscaling unit is further configured to: multiply the downscaling comprehensive data by the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data to obtain the downscaling result of the precipitation data of each source in the multi-source precipitation data.
The multi-source precipitation data collaborative downscaling system coupling random forest and high-precision curved surface modeling provided by the embodiment of the application can implement the steps and flows of the multi-source precipitation data collaborative downscaling method coupling random forest and high-precision curved surface modeling provided by any of the above embodiments and achieves the same technical effects, which are not repeated here.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A multi-source precipitation data collaborative downscaling method coupling random forest and HASM, characterized by comprising the following steps:
according to the multi-source precipitation data, carrying out regression prediction on the value of the target precipitation data by using a pre-constructed random forest model to obtain a prediction result corresponding to the target precipitation data; wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution;
calculating the residual error of each training sample in the training data set of the random forest model, and calculating the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data based on the random forest model;
interpolating the residual of each training sample in the training data set by a high-precision curved surface modeling method, and summing the interpolation result and the prediction result to obtain downscaling comprehensive data;
and determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the true precipitation interpretation degree.
2. The multi-source precipitation data collaborative downscaling method coupling random forest and HASM according to claim 1, characterized in that the random forest model is constructed by:
according to the target resolution, carrying out spatial resolution unified processing on the pre-acquired original multi-source precipitation data to obtain multi-source precipitation data with unified resolution;
constructing the training data set according to observation station precipitation data and multi-source precipitation data with unified resolution;
and constructing the random forest model based on the training data set.
3. The multi-source precipitation data collaborative downscaling method coupling random forest and HASM according to claim 2, characterized in that performing spatial resolution unification processing on the pre-acquired original multi-source precipitation data according to the target resolution to obtain multi-source precipitation data with unified resolution specifically comprises:
calculating a spatial scale unity factor according to the target resolution and the spatial resolution of the precipitation data of each source in the original multi-source precipitation data;
and based on the spatial scale unification factor, converting each original pixel in the precipitation data of each source in the original multi-source precipitation data into a target pixel under the target resolution, and setting the value of the target pixel as the value of the corresponding original pixel to obtain the multi-source precipitation data with unified resolution.
4. The multi-source precipitation data collaborative downscaling method coupling random forest and HASM according to claim 2, characterized in that the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data is calculated as follows:
calculating the overall interpretation degree of the random forest model; the overall interpretation degree characterizes the change degree of real rainfall which can be interpreted by the random forest model after the random forest model is trained by using all training samples of the training data set;
removing training samples corresponding to the first source precipitation data from the training data set, then training the random forest model again, and calculating a first interpretation degree; the first source precipitation data is precipitation data of any one source in the multi-source precipitation data;
and subtracting the first interpretation degree from the overall interpretation degree to obtain the true precipitation interpretation degree corresponding to the first source precipitation data.
5. The multi-source precipitation data collaborative downscaling method coupling random forest and HASM according to claim 1, characterized in that determining the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the true precipitation interpretation degree specifically comprises:
multiplying the downscaling comprehensive data by the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data to obtain the downscaling result of the precipitation data of each source in the multi-source precipitation data.
6. A multi-source precipitation data collaborative downscaling system coupling random forest and HASM, characterized by comprising:
a regression prediction unit configured to perform regression prediction on the value of the target precipitation data by using a pre-constructed random forest model according to the multi-source precipitation data to obtain a prediction result corresponding to the target precipitation data; wherein the spatial resolution of the target precipitation data is equal to the downscaled target resolution; the spatial resolution of the multi-source precipitation data is lower than the target resolution;
a calculation unit configured to calculate the residual of each training sample in the training data set of the random forest model, and to calculate, based on the random forest model, the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data;
a first downscaling unit configured to interpolate the residual of each training sample in the training data set by a high-precision curved surface modeling method, and to sum the interpolation result and the prediction result to obtain downscaling comprehensive data;
and a second downscaling unit configured to determine the downscaling result of the precipitation data of each source in the multi-source precipitation data based on the downscaling comprehensive data and the true precipitation interpretation degree.
7. The multi-source precipitation data collaborative downscaling system coupling random forest and HASM according to claim 6, characterized by further comprising a model construction unit, the model construction unit comprising:
a scale unification module configured to perform spatial resolution unification processing on the pre-acquired original multi-source precipitation data according to the target resolution to obtain multi-source precipitation data with unified resolution;
a training data construction module configured to construct the training data set according to observation station precipitation data and the multi-source precipitation data with unified resolution;
and a training module configured to construct the random forest model based on the training data set.
8. The multi-source precipitation data collaborative downscaling system coupling random forest and HASM according to claim 7, characterized in that the scale unification module is further configured to:
calculating a spatial scale unity factor according to the target resolution and the spatial resolution of the precipitation data of each source in the original multi-source precipitation data;
and based on the spatial scale unification factor, converting each original pixel in the precipitation data of each source in the original multi-source precipitation data into a target pixel under the target resolution, and setting the value of the target pixel as the value of the corresponding original pixel to obtain the multi-source precipitation data with unified resolution.
9. The multi-source precipitation data collaborative downscaling system coupling random forest and HASM according to claim 7, characterized in that the calculation unit further comprises an interpretation degree calculation module;
the interpretation degree calculation module is configured to: calculate the overall interpretation degree of the random forest model; the overall interpretation degree characterizes the degree of variation of real precipitation that the random forest model can explain after being trained with all training samples of the training data set;
remove training samples corresponding to the first source precipitation data from the training data set, then train the random forest model again, and calculate a first interpretation degree; the first source precipitation data is precipitation data of any one source in the multi-source precipitation data;
and subtract the first interpretation degree from the overall interpretation degree to obtain the true precipitation interpretation degree corresponding to the first source precipitation data, wherein the true precipitation interpretation degree characterizes the degree of variation of real precipitation that the first source precipitation data can explain.
10. The multi-source precipitation data collaborative downscaling system coupling random forest and HASM according to claim 6, characterized in that the second downscaling unit is further configured to:
multiply the downscaling comprehensive data by the true precipitation interpretation degree corresponding to the precipitation data of each source in the multi-source precipitation data to obtain the downscaling result of the precipitation data of each source in the multi-source precipitation data.
CN202310239673.1A 2023-03-14 2023-03-14 Multi-source precipitation data collaborative downscaling method and system coupled with random forest and HASM Active CN115952743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310239673.1A CN115952743B (en) 2023-03-14 2023-03-14 Multi-source precipitation data collaborative downscaling method and system coupled with random forest and HASM

Publications (2)

Publication Number Publication Date
CN115952743A true CN115952743A (en) 2023-04-11
CN115952743B CN115952743B (en) 2023-05-05

Family

ID=85903362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310239673.1A Active CN115952743B (en) 2023-03-14 2023-03-14 Multi-source precipitation data collaborative downscaling method and system coupled with random forest and HASM

Country Status (1)

Country Link
CN (1) CN115952743B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108761A (en) * 2023-04-12 2023-05-12 中国科学院地理科学与资源研究所 Regional climate simulation method and system for coupling deep learning and HASM

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104143043A (en) * 2014-06-27 2014-11-12 南京林业大学 Multifunctional climate data model and application thereof
US20220043182A1 (en) * 2019-10-14 2022-02-10 Guangzhou Institute of Geography, Guangdong Academy of Science Spatial autocorrelation machine learning-based downscaling method and system of satellite precipitation data
CN115659853A (en) * 2022-12-28 2023-01-31 中国科学院地理科学与资源研究所 Nonlinear mixed-effect strain coefficient downscaling method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NA ZHAO: "A New HASM-Based Downscaling Method for High-Resolution Precipitation Estimates", 《REMOTE SENSING》 *
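REN MEIFANG; PANG BO; XU ZONGXUE; ZHAO YANJUN: "Temperature downscaling in the Yarlung Zangbo River basin based on a random forest model", Plateau Meteorology *
ZHAO NA: "Downscaling of climate-model temperature and precipitation based on the HASM method: a case study of the Heihe River basin", Journal of Desert Research *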

Also Published As

Publication number Publication date
CN115952743B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
US20220043182A1 (en) Spatial autocorrelation machine learning-based downscaling method and system of satellite precipitation data
CN113297527B (en) PM based on multisource city big data 2.5 Overall domain space-time calculation inference method
CN111898543A (en) Building automatic extraction method integrating geometric perception and image understanding
CN115204618B (en) CCMVS region carbon source sink equalization inversion method
CN112699959B (en) Multi-source multi-scale precipitation data fusion method and device based on energy functional model
CN111210483B (en) Simulated satellite cloud picture generation method based on generation of countermeasure network and numerical mode product
CN113837450B (en) Deep learning-based river network dense watershed water situation trend prediction method and application thereof
CN110909447B (en) High-precision short-term prediction method for ionization layer region
CN111652404A (en) All-weather earth surface temperature inversion method and system
CN112819066A (en) Res-UNet single tree species classification technology
CN112100922A (en) Wind resource prediction method based on WRF and CNN convolutional neural network
CN115952743A (en) Multi-source precipitation data collaborative downscaling method and system coupled with random forest and HASM
CN113627093A (en) Underwater mechanism cross-scale flow field characteristic prediction method based on improved Unet network
CN117933095B (en) Earth surface emissivity real-time inversion and assimilation method based on machine learning
CN114331842A (en) DEM super-resolution reconstruction method combined with topographic features
CN116822185A (en) Daily precipitation data space simulation method and system based on HASM
CN117454116A (en) Ground carbon emission monitoring method based on multi-source data interaction network
Zhang et al. SolarGAN: Synthetic annual solar irradiance time series on urban building facades via Deep Generative Networks
TWI717796B (en) System, method and storage medium for estimating the amount of sunshine in geographic location by artificial intelligence
CN116931129A (en) Short-term precipitation prediction method, device, equipment and medium based on multi-mode set
CN112001291A (en) Method and system for quickly extracting main river channel in gravel distribution area of flood fan
CN102880753A (en) Method for converting land utilization spatial characteristic scale based on fractal dimension
Chen et al. FC-ZSM: Spatiotemporal downscaling of rain radar data using a feature constrained zooming slow-mo network
Zhang et al. Digital twin empowered PV power prediction
CN115393731A (en) Method and system for generating virtual cloud picture based on interactive scenario and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant