CN111932457A - High-space-time fusion processing algorithm and device for remote sensing image - Google Patents


Info

Publication number: CN111932457A
Application number: CN202010786064.4A
Authority: CN (China)
Prior art keywords: remote sensing image, resolution, spatial resolution
Other languages: Chinese (zh)
Other versions: CN111932457B (en)
Inventors: 张永梅, 马健喆, 滑瑞敏, 张奕, 孙捷, 李小冬
Current Assignee: North China University of Technology
Original Assignee: North China University of Technology
Application filed by North China University of Technology
Priority to CN202010786064.4A; application granted; publication of CN111932457B
Legal status: Granted; Active


Classifications

    • G06T 3/40: Scaling the whole image or part thereof
    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 3/4076: Super resolution by iteratively correcting the provisional high-resolution image using the original low-resolution image
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 2207/10032: Satellite or aerial image; remote sensing
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a high space-time fusion processing algorithm and device for remote sensing images. A first remote sensing image with low spatial but high temporal resolution and a second remote sensing image with high spatial but low temporal resolution are acquired. The first remote sensing image is resampled to obtain a third remote sensing image, and the second remote sensing image is down-sampled to obtain a fourth remote sensing image. The third and fourth remote sensing images are input into an SRCNN network to obtain a first reconstructed image at an intermediate spatial resolution. The first reconstructed image is resampled to the second spatial resolution and, together with the second remote sensing image, input into the SRCNN network to obtain a second reconstructed image; the second reconstructed image is then input into a space-time fusion model to obtain a remote sensing image with both high spatial and high temporal resolution. The method addresses the problems of the prior art, in which the fusion algorithm used to obtain high space-time-resolution data by space-time fusion of remote sensing images has high complexity, low fusion quality and low operating efficiency.

Description

High-space-time fusion processing algorithm and device for remote sensing image
Technical Field
The invention relates to the technical field of image processing and pattern recognition, in particular to a high space-time fusion processing algorithm and device for remote sensing images, electronic equipment and a computer readable storage medium.
Background
Optical remote sensing images offer rich spectral features, diverse descriptions of ground objects, and convenient acquisition, processing and analysis, and are applied to vegetation change monitoring, land surface temperature monitoring, crop growth monitoring, flood monitoring and the like. At present, large-scale monitoring of these phenomena mainly relies on low-spatial-resolution imagery such as that of the Moderate Resolution Imaging Spectroradiometer (MODIS). Although such imagery plays an important role in large-scale monitoring, its low spatial resolution and large number of mixed pixels make the extraction accuracy achievable from it difficult to meet requirements at small regional scales, particularly for monitoring urban surface vegetation change, temperature and floods. Meanwhile, medium- and high-spatial-resolution satellites (such as Landsat) are limited in practical monitoring because, constrained by revisit period and weather, their data are difficult to obtain right after changes in vegetation, temperature or crop growth, or after a flood occurs. Therefore, how to acquire remote sensing imagery with both high temporal and high spatial resolution is the key to solving the problem.
Constrained by sensor hardware and launch costs, a single satellite sensor cannot provide remote sensing imagery with both high spatial and high temporal resolution, which restricts the application of such imagery. In addition, satellites carrying early sensors have acquired massive amounts of data since launch; because of the technology and conditions of the time, those images embody heavy trade-offs between spatial and temporal resolution and are difficult to use directly.
Remote sensing image space-time fusion merges high-spatial/low-temporal-resolution data with low-spatial/high-temporal-resolution data to obtain data with both high spatial and high temporal resolution. Obtaining high space-time-resolution imagery through such fusion helps resolve the contradiction between spatial and temporal resolution, enriches the application fields and value of existing imagery, and improves the utilization of massive historical remote sensing archives. At low cost, space-time fusion can satisfy the great demand of application fields for both temporal and spatial resolution of remote sensing image data, enriching the temporal and spatial information of imagery through data-fusion prediction. Space-time fusion techniques fall roughly into three categories: fusion methods based on transformation models, fusion methods based on reconstruction models, and fusion methods based on learning models.
The fusion method based on the transformation model was the earliest remote sensing space-time data fusion method. Malenovsky et al. first demonstrated experimentally the feasibility of space-time fusion of MODIS (250 m, 500 m) and Landsat TM (30 m) images. Transformation-based fusion applies a data transform (such as wavelet decomposition) to the remote sensing images, fuses the transformed data, and then inverse-transforms to obtain the high-resolution image at the unknown time. Borrowing from wavelet-based spatial/spectral fusion, one approach fuses the low-frequency information of the MODIS image sampled to 480 m resolution with the high-frequency information of the Landsat image sampled to 240 m resolution; the generated prediction image has the detail of the Landsat data while preserving the overall characteristics of the MODIS image. However, the spatial resolution is improved only to 240 m, and because the Landsat data used in fusion belong to an earlier time phase, the final result differs considerably from the real data in regions where the surface has changed. Results of this method thus have low resolution and poor fusion quality in changed areas.
Shevyrnogov et al. applied principal component transformation to space-time fusion: the first principal component of Multispectral Scanner (MSS) data is obtained by principal component analysis to extract a luminance component, and vegetation index data with high space-time resolution are obtained by fusing the MSS luminance-component data with NOAA sensor vegetation index data.
Fusion based on the reconstruction model assumes that a low-resolution pixel value can be obtained as a linear combination of the corresponding high-resolution pixels. In their space-time fusion, Zhukov et al. introduced a sliding window, based on the assumption that a central pixel is related only to its surrounding pixels, to keep the spatial variation of homogeneous pixels continuous. On this basis, under the assumption that the closer a surrounding pixel is to the central pixel the more it influences the target pixel, spatial variability of pixels was taken into account and Euclidean distance was used to express the correlation. Butetto et al. screened the surrounding pixels in the window by their spectral distance to the central pixel, eliminating heterogeneous pixels that are spatially close but spectrally very different; the resulting fusion algorithm handles well the case of spatially continuous change of similar ground objects, but if the change between the two times is, for example, the growth of crops across spring and winter, the fused vegetation index is biased. Zurita-Milla et al., following an object-oriented idea, segmented the low- and high-resolution data to obtain object classification maps and performed mixed-pixel decomposition with a sliding-window technique, realizing fusion prediction of high space-time data, but the final effect was not ideal. Reconstruction-based fusion mainly decomposes the pixels of the low-spatial-resolution images, computes the change relationships between remote sensing images of different time phases as well as between images of different resolutions, and obtains the high-spatial-resolution image at the unknown time by interpolation. Its accuracy is often poor when the land cover is complex and mixed pixels are severe.
The Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) proposed by Gao et al. is currently the most practical and widely used remote sensing space-time data fusion method. It fuses Landsat ETM+ data with 30 m spatial resolution and a 16-day revisit period with MODIS images with 500 m spatial resolution and a 0.5-day revisit period. Ignoring atmospheric and similar errors, and following the basic assumption of the reconstruction model, it extracts the neighbourhood information of the central pixel with a sliding window, searches for similar surrounding pixels, jointly considers their differences in spatial distance, spectral distance and temporal distance, and expresses the mapping from low-resolution to high-resolution data with a weight matrix, from which the reflectance of the central pixel is computed. The fused data combine the high temporal dynamics of MODIS with Landsat's ability to describe spatial detail, adapt to some extent to seasonal crop-change information, and exhibit good consistency and continuity.
Although the STARFM model can achieve a good fusion effect, its results can show patch artifacts and it is insensitive to land-cover change. To address this, Hilker et al. designed the Spatial Temporal Adaptive Algorithm for mapping Reflectance Change (STAARCH), which selects, from multiple sets of MODIS and Landsat images, the best set for STARFM fusion, thereby improving fusion quality. To address STARFM's poor performance when fusing regions with severe mixed pixels, Zhu et al., inspired by mixed-pixel decomposition, designed the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM), which improves STARFM's fusion accuracy on complex land cover by adding the variation trend of sliding-window pixels between different time phases.
Fusion methods based on learning models stem from the vigorous development of machine learning; sparse representation and super-resolution reconstruction are the learning-model techniques most widely used in remote sensing space-time fusion. Yang et al. jointly trained on high- and low-resolution images to generate corresponding dictionaries, used the dictionaries to find the similarity between the high- and low-resolution images, and exploited that similarity to enhance the resolution of the low-resolution images. Building on Yang's idea, Zeyde et al. trained on low-resolution data with K-SVD to obtain a low-resolution dictionary and computed the high-resolution dictionary from it and the corresponding high-resolution data, thereby improving the resolution of the low-resolution data. Because the K-SVD algorithm operates on only one image at a time during training, this reduces algorithmic complexity and improves operating efficiency.
Inspired by Zeyde's method, Song and Huang designed a learning-model fusion method based on sparse-representation super-resolution reconstruction: the low-resolution data are super-resolved by sparse representation to the same resolution as the high-resolution data, the reconstructed low-resolution data and the original high-resolution data are high-pass filtered, and STARFM fusion is then performed. With one super-resolution reconstruction and two STARFM space-time fusions, the method fully computes the change information between high- and low-resolution data across time phases and can fuse and predict various kinds of surface-change information well. At present, learning-model fusion methods are still developing; most use shallow learning, while algorithms adopting deep learning generally have high complexity and low operating efficiency, make large-area fusion difficult, and cannot tolerate too large a spatial-resolution ratio, generally at most about 4 times.
The STARFM method proposed by Gao et al. is the most widely applied method in remote sensing image space-time fusion. It assumes that the reflectance of a low-spatial-resolution (e.g. MODIS) pixel can be expressed as a linear combination of the reflectances of the corresponding high-spatial-resolution (e.g. Landsat) pixels, while ignoring geometric errors and atmospheric-correction errors. If the spatial resolution of the MODIS image is raised so that it matches that of the Landsat data, an original low-spatial-resolution pixel and the pixel at the corresponding high-spatial-resolution position satisfy:
$$M(x, y, t) = L(x, y, t) + \varepsilon \qquad (1)$$

where $M(x,y,t)$ and $L(x,y,t)$ denote the reflectances at time $t$ of the MODIS and Landsat pixels with coordinates $(x,y)$, and $\varepsilon$ denotes the systematic error between images of different resolutions caused by sensor differences and the like. Assuming that $\varepsilon$ does not change over time, if the MODIS images at times $t_0$ and $t_k$ and the Landsat image at time $t_0$ are known, the unknown Landsat image at time $t_k$ can be obtained from equation (2):

$$L(x, y, t_k) = L(x, y, t_0) + M(x, y, t_k) - M(x, y, t_0) \qquad (2)$$
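As a minimal illustration (not from the patent text), equation (2) is a pixel-wise operation on co-registered arrays; the following NumPy sketch assumes the three inputs have already been resampled to a common grid.

```python
import numpy as np

def predict_landsat_tk(L_t0: np.ndarray, M_t0: np.ndarray, M_tk: np.ndarray) -> np.ndarray:
    """Base prediction of equation (2): L(tk) = L(t0) + M(tk) - M(t0).

    Relies on the assumption that the sensor bias term epsilon in
    equation (1) is constant over time, so it cancels out; all arrays
    must be co-registered and share the same shape and resolution.
    """
    return L_t0.astype(np.float64) + M_tk.astype(np.float64) - M_t0.astype(np.float64)
```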
To mitigate the mixed-pixel phenomenon this assumption entails, the STARFM method introduces neighbouring-pixel information through a sliding-window technique:

$$L(x_{w/2}, y_{w/2}, t_k) = \sum_{i=1}^{w}\sum_{j=1}^{w} W_{i,j,k} \times \bigl(M(x_i, y_j, t_k) + L(x_i, y_j, t_0) - M(x_i, y_j, t_0)\bigr) \qquad (3)$$

where $w$ is the sliding-window size and $W_{i,j,k}$ is the joint weight matrix. The STARFM method searches for similar pixels within the sliding window, excludes poor-quality pixels, and assigns non-zero weights only to the pixels that pass screening. Similar pixels are found by formula (4):

$$\bigl|f(x_i, y_j) - f(x_{w/2}, y_{w/2})\bigr| \le L_{stdev} \times 2/m \qquad (4)$$

where $f(x_i, y_j)$ is a candidate pixel when searching for similar pixels in the $i$-th band, $f(x_{w/2}, y_{w/2})$ is the centre pixel of the sliding window, $L_{stdev}$ is the standard deviation of the input Landsat data, and $m$ is the total number of ground-object classes estimated in advance. After the similar-pixel search, poor-quality pixels are excluded to further improve precision, using formulas (5) and (6).
$$S_{i,j,k} < \bigl|L(x_{w/2}, y_{w/2}, t_0) - M(x_{w/2}, y_{w/2}, t_0)\bigr| + \sigma_{LM} \qquad (5)$$

$$T_{i,j,k} < \bigl|M(x_{w/2}, y_{w/2}, t_k) - M(x_{w/2}, y_{w/2}, t_0)\bigr| + \sigma_{MM} \qquad (6)$$

where $\sigma_{LM}$ is the standard deviation between the Landsat and MODIS images of the same time phase, $\sigma_{MM}$ is the standard deviation between the MODIS images of different time phases, and $S_{i,j,k}$ and $T_{i,j,k}$ are the spectral and temporal distances from the candidate pixel to the centre pixel, computed by formulas (7) and (8) respectively. When weights are assigned to the screened similar pixels, the spectral distance $S_{i,j,k}$, the temporal distance $T_{i,j,k}$ and the spatial distance $D_{i,j,k}$ must be considered jointly; they are computed as follows.
$$S_{i,j,k} = \bigl|L(x_i, y_j, t_k) - M(x_i, y_j, t_k)\bigr| \qquad (7)$$

$$T_{i,j,k} = \bigl|M(x_i, y_j, t_0) - M(x_i, y_j, t_k)\bigr| \qquad (8)$$

$$d_{i,j,k} = \sqrt{(x_{w/2} - x_i)^2 + (y_{w/2} - y_j)^2} \qquad (9)$$

$$D_{i,j,k} = 1.0 + d_{i,j,k}/A \qquad (10)$$

where $A$ is a constant that sets the relative importance of $S_{i,j,k}$, $T_{i,j,k}$ and $D_{i,j,k}$. The weight $w_{i,j,k}$ of a similar pixel is computed by formula (11) or (12):

$$w_{i,j,k} = S_{i,j,k} \times T_{i,j,k} \times D_{i,j,k} \qquad (11)$$

$$w_{i,j,k} = \ln(S_{i,j,k} \times B + 1) \times \ln(T_{i,j,k} \times B + 1) \times D_{i,j,k} \qquad (12)$$

where $B$ is a scaling factor that depends on the difference between the sensor resolutions. The weights $w_{i,j,k}$ of all screened similar pixels within the sliding window are computed and normalized to obtain the joint weight matrix $W_{i,j,k}$, which is used to predict the pixels of the high-resolution remote sensing image at the unknown time.
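A compact NumPy sketch of one sliding-window prediction under formulas (3) and (7) to (12) follows; it is an illustrative reading of the text rather than the patent's implementation. The known image pair at $t_0$ supplies the spectral distance, `mask` marks the pixels that survived the screening of formulas (4) to (6), and, as in Gao et al.'s STARFM, the combined index is inverted during normalisation so that smaller distances receive larger weights.

```python
import numpy as np

def starfm_window_predict(L0, M0, Mk, mask, A=1500.0):
    """Predict the centre pixel of one (w x w) sliding window.

    L0, M0, Mk : Landsat(t0), MODIS(t0), MODIS(tk) reflectance windows
                 on a common grid; mask : boolean array of screened
                 similar pixels (assumed non-empty); A : constant of
                 formula (10) balancing S, T and D.
    """
    w = L0.shape[0]
    c = w // 2
    S = np.abs(L0 - M0) + 1e-6                # spectral distance, formula (7)
    T = np.abs(M0 - Mk) + 1e-6                # temporal distance, formula (8)
    yy, xx = np.mgrid[0:w, 0:w]
    d = np.hypot(yy - c, xx - c)              # Euclidean distance, formula (9)
    D = 1.0 + d / A                           # spatial distance, formula (10)
    C = S * T * D                             # combined index, formula (11)
    inv = np.where(mask, 1.0 / C, 0.0)        # zero weight for excluded pixels
    W = inv / inv.sum()                       # normalised joint weight matrix
    # formula (3): weighted sum of the per-pixel base predictions
    return float(np.sum(W * (Mk + L0 - M0)))
```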
The technical route of the STARFM method is shown in Fig. 1: the low-resolution data are resampled to the high resolution, a sliding window is constructed to screen similar pixels between the data pairs, the distance, time and spectral factors are computed to obtain the joint weights, and the window is slid across the image to obtain the complete high-resolution data at the unknown time.
Because the resolution gap between MODIS and Landsat images is so large, mixed pixels are severe during resampling, which limits the prediction accuracy of the STARFM method.
When STARFM introduces the sliding-window technique, its weight calculation depends on a combination of spectral distance, temporal distance and spatial distance, which can be regarded as manual features extracted from expert knowledge.
With the rapid development of machine learning, fusion methods based on learning models have emerged. For example, Huang et al. proposed super-resolving the low-spatial-resolution image with sparse representation and then fusing the high-pass-filtered data, with good fusion results. However, learning-model fusion methods mostly use shallow learning, have high complexity and low operating efficiency, make large-area fusion difficult, and cannot tolerate too large a spatial-resolution ratio, generally at most about 4 times.
Meanwhile, remote sensing images are voluminous and space-time fusion algorithms are complex, so serial computation on a CPU is inefficient and time-consuming. Obtaining massive time series of high space-time-resolution remote sensing images through a space-time fusion algorithm is therefore very difficult.
For the problems in the prior art that the fusion algorithm used to obtain high space-time-resolution data by space-time fusion of remote sensing images has high complexity, low fusion quality and low operating efficiency, no effective solution has yet been proposed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a high spatiotemporal fusion processing algorithm, an apparatus, an electronic device and a computer-readable storage medium for remote sensing images, so as to solve the problems in the prior art that the complexity of a fusion algorithm is high, the fusion quality is low, and the operation efficiency is low in the process of performing spatiotemporal fusion on remote sensing images to obtain high spatiotemporal resolution feature data.
Therefore, the embodiment of the invention provides the following technical scheme:
the invention provides a high space-time fusion processing algorithm for remote sensing images in a first aspect, which comprises the following steps:
acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period;
resampling the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image;
performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than the spatial resolution of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to the spatial resolution of the third remote sensing image;
inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image;
resampling the first reconstructed image to obtain a first reconstructed image with the second spatial resolution;
inputting the first reconstructed image with the second spatial resolution and the second remote sensing image into an SRCNN network to obtain a second reconstructed image;
inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; wherein the high-spatial high-temporal remote sensing image has the first period and the second spatial resolution.
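Taken together, the seven steps of the first aspect form a simple pipeline. The sketch below is only an orchestration outline; `resample`, `downsample`, `srcnn_stage1`, `srcnn_stage2` and `starfm_fuse` are hypothetical helper names, not functions defined by the patent.

```python
def high_spatiotemporal_fusion(first_img, second_img):
    """Orchestrates the claimed steps for a MODIS-like (low-space/high-time)
    first image and a Landsat-like (high-space/low-time) second image;
    all helpers used below are assumed to exist elsewhere."""
    third_img = resample(first_img, target="intermediate")        # step 2
    fourth_img = downsample(second_img, target="intermediate")    # step 3
    recon_mid = srcnn_stage1(third_img, fourth_img)               # step 4
    recon_mid_up = resample(recon_mid, target="high")             # step 5
    recon_high = srcnn_stage2(recon_mid_up, second_img)           # step 6
    return starfm_fuse(recon_high, second_img)                    # step 7
```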
Optionally, the first remote sensing image is a MODIS image and the second remote sensing image is a Landsat8 image.
Optionally, inputting the second reconstructed image into the space-time fusion model to obtain a high-space high-time remote sensing image further includes:
acquiring a second reconstructed image at the first moment, a second reconstructed image at the second moment and a second remote sensing image at the first moment; taking the second reconstructed image as a low-resolution image and taking the second remote sensing image as a high-resolution image;
searching similar pixels in a preset sliding window according to the second reconstructed image at the first moment, the second reconstructed image at the second moment and the second remote sensing image at the first moment;
acquiring the spectral distance, the time distance and the spatial distance between the similar pixel and the central pixel;
configuring the weight of the similar pixel according to the spectral distance, the time distance and the spatial distance;
normalizing the weights of the similar pixels to obtain a combined weight matrix;
predicting a second remote sensing image at a second moment according to the combined weight matrix; and taking the second remote sensing image at the second moment as the high-space high-time remote sensing image.
Optionally, the method further comprises:
PSNR and SSIM are calculated by the following formulas:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(f_1, f_2)}\right)$$

and/or

$$\mathrm{SSIM}(f_1, f_2) = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)}$$

where $f_1$ denotes the original image without super-resolution reconstruction, $f_2$ the super-resolution-reconstructed image, $\mathrm{MAX}$ the maximum possible pixel value, $\mathrm{MSE}(f_1, f_2)$ the mean square error between the two images, $\mu_1$ and $\sigma_1^2$ the mean and variance of the original image, $\mu_2$ and $\sigma_2^2$ the mean and variance of the reconstructed image, $\sigma_{12}$ the covariance between the two images, and $C_1$, $C_2$ constants used to maintain stability.
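For reference, a NumPy sketch of the two metrics as reconstructed above; the 8-bit peak value, the 0.01/0.03 stabilising constants and the single global SSIM window are conventional simplifying assumptions rather than values taken from the patent.

```python
import numpy as np

def psnr(f1, f2, max_val=255.0):
    """Peak signal-to-noise ratio between original and reconstruction."""
    mse = np.mean((f1.astype(np.float64) - f2.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(f1, f2, max_val=255.0):
    """SSIM computed from global statistics (practical implementations
    average the same expression over local windows)."""
    C1, C2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()
    cov12 = ((f1 - mu1) * (f2 - mu2)).mean()
    return (((2 * mu1 * mu2 + C1) * (2 * cov12 + C2)) /
            ((mu1 ** 2 + mu2 ** 2 + C1) * (var1 + var2 + C2)))
```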
Optionally, before the program is executed by the thread in the SRCNN network, the convolution kernel parameters are preloaded from the global memory to the shared memory, and the convolution kernel parameters are read from the shared memory when the thread in the first predetermined block executes the program.
Optionally, before a thread in the SRCNN network executes a program, preloading a related image area from a global memory to a shared memory, and reading the related image area from the shared memory when a thread in a second predetermined block executes the program; wherein the related image area comprises image areas involved by all threads in the second predetermined block.
Optionally, in the SRCNN network, the convolutional layer and the nonlinear mapping layer are merged, and the same thread performs the convolutional operation and the nonlinear mapping operation.
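A minimal Numba-CUDA sketch of these three optional optimisations, preloading the convolution kernel parameters into shared memory, cooperatively preloading the image region a block will touch, and fusing the convolution with the nonlinear mapping in the same thread, might look as follows; the tile and kernel sizes are illustrative assumptions.

```python
import numpy as np
from numba import cuda, float32

TILE = 16         # threads per block side (assumed)
KSIZE = 9         # convolution kernel side, e.g. SRCNN's first layer
R = KSIZE // 2    # halo radius around the tile

@cuda.jit
def conv_relu_shared(img, kern, out):
    # shared memory: kernel weights plus the block's image tile with halo
    sk = cuda.shared.array((KSIZE, KSIZE), float32)
    tile = cuda.shared.array((TILE + 2 * R, TILE + 2 * R), float32)

    tx, ty = cuda.threadIdx.x, cuda.threadIdx.y
    bx, by = cuda.blockIdx.x * TILE, cuda.blockIdx.y * TILE

    # preload the convolution kernel parameters once per block
    if tx < KSIZE and ty < KSIZE:
        sk[ty, tx] = kern[ty, tx]

    # cooperatively preload the image region involved by this block
    for j in range(ty, TILE + 2 * R, cuda.blockDim.y):
        for i in range(tx, TILE + 2 * R, cuda.blockDim.x):
            yy = min(max(by + j - R, 0), img.shape[0] - 1)
            xx = min(max(bx + i - R, 0), img.shape[1] - 1)
            tile[j, i] = img[yy, xx]
    cuda.syncthreads()

    y, x = by + ty, bx + tx
    if y < out.shape[0] and x < out.shape[1]:
        acc = 0.0
        for kj in range(KSIZE):
            for ki in range(KSIZE):
                acc += tile[ty + kj, tx + ki] * sk[kj, ki]
        out[y, x] = acc if acc > 0.0 else 0.0   # convolution fused with ReLU

# launch example: grid of blocks covering the image, TILE x TILE threads each
# grid = ((width + TILE - 1) // TILE, (height + TILE - 1) // TILE)
# conv_relu_shared[grid, (TILE, TILE)](d_img, d_kern, d_out)
```

Each thread then reads the kernel weights and its image neighbourhood from fast shared memory instead of global memory, and the fused nonlinear mapping costs no extra kernel launch.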
In a second aspect of the present invention, there is provided a high spatial-temporal fusion processing apparatus for remote sensing images, comprising:
the acquisition module is used for acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period;
the first resampling module is used for resampling the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image;
the down-sampling module is used for performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than the spatial resolution of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to the spatial resolution of the third remote sensing image;
the first input module is used for inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image;
the second resampling module is used for resampling the first reconstructed image to obtain a first reconstructed image with the second spatial resolution;
the second input module is used for inputting the first reconstructed image with the second spatial resolution and the second remote sensing image into the SRCNN network to obtain a second reconstructed image; wherein the second reconstructed image has the first period and the second spatial resolution;
the third input module is used for inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; wherein the high-spatial high-temporal remote sensing image has the first period and the second spatial resolution.
In a third aspect of the present invention, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the remote sensing image high space-time fusion processing algorithm of any one of the first aspect.
In a fourth aspect of the present invention, there is provided a computer readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the algorithm for processing high spatial-temporal fusion of remote sensing images according to any one of the first aspect is implemented.
The technical scheme of the embodiment of the invention has the following advantages:
the embodiment of the invention provides a remote sensing image high space-time fusion processing algorithm, a device, electronic equipment and a computer readable storage medium, wherein the remote sensing high space-time fusion processing algorithm comprises the following steps: acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period; resampling the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image; performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than that of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to that of the third remote sensing image; inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image; resampling the first reconstructed image to obtain a first reconstructed image with a second spatial resolution; inputting the first reconstructed image and the second remote sensing image with the second spatial resolution into the SRCNN network to obtain a second reconstructed image; inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; the remote sensing image with high space and high time has the first period and the second spatial resolution. The method solves the problems that in the prior art, the process of performing space-time fusion on the remote sensing image to obtain high space-time resolution characteristic data has high complexity of a fusion algorithm, low fusion quality and low operation efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of obtaining the complete high-resolution data at an unknown time using the STARFM technique;
FIG. 2 is a flow chart of a remote sensing high spatiotemporal fusion method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a super-resolution reconstruction method according to an embodiment of the present invention;
FIG. 4 is a comparison of remote sensing images sampled to 90m resolution by different sensors;
FIG. 5 is a schematic diagram of a CUDA memory model;
FIG. 6 is a schematic diagram of kernel function sharing;
FIG. 7 is a schematic diagram of an improved image block reading;
FIG. 8 is a comparison graph of remote sensing images of different sensors;
FIG. 9 is a Landsat8 true color remote sensing image map;
FIG. 10 is a graph showing experimental results of different methods for reconstruction with a resolution of 90 m;
FIG. 11 is a graph comparing experimental results obtained by super-resolution reconstruction through secondary learning with experimental raw data according to an embodiment of the present invention;
FIG. 12 is a graph comparing the results of three spatiotemporal fusion methods;
FIG. 13 is a block diagram of the remote sensing high space-time fusion device according to the embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In this application, the word "exemplary" is used to mean "serving as an example, instance, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes are not set forth in detail in order to avoid obscuring the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
In accordance with an embodiment of the present invention, there is provided an embodiment of a remote sensing high spatiotemporal fusion method, it being noted that the steps illustrated in the flowchart of the accompanying drawings may be carried out in a computer system such as a set of computer executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be carried out in an order different than that described herein.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Aiming at the 'space-time contradiction' of the remote sensing image and the high complexity of the space-time fusion algorithm of the remote sensing image, the embodiment provides a remote sensing high space-time fusion method which can be used for electronic equipment, and fig. 2 is a flow chart of the remote sensing high space-time fusion method according to the embodiment of the invention, and as shown in fig. 2, the flow comprises the following steps:
step S201, acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period. Specifically, the first remote sensing image with low space and high time is an MODIS image, and the second remote sensing image with high space and low time is a Landsat8 image. Specifically, the MODIS image is an MOD09Q1 remote sensing image with a spatial resolution of 250m, and the Landsat8 image is a Landsat8 remote sensing image with a spatial resolution of 30 m.
Step S202, resampling the first remote sensing image to obtain a third remote sensing image; and the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image.
Step S203, performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; and the spatial resolution of the fourth remote sensing image is smaller than that of the second remote sensing image, and is equal to that of the third remote sensing image.
Specifically, the MODIS image is sampled to obtain a MODIS image at a third spatial resolution and the Landsat8 image is sampled to obtain a Landsat8 image at the same third spatial resolution: the 30 m Landsat8 remote sensing image is down-sampled to 90 m resolution (the third spatial resolution) using the nearest-neighbour method, and the 250 m MOD09Q1 remote sensing image is resampled to 90 m resolution using bicubic interpolation. The sampled remote sensing images are formally of the same resolution but still differ greatly in spatial detail.
Step S204, inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image. Specifically, the SRCNN is used to perform the first learning on the two sets of sampled data, so as to obtain a MOD09Q1 reconstructed image with a resolution of 90m (i.e., a third spatial resolution).
In step S205, the first reconstructed image is resampled to obtain a first reconstructed image with a second spatial resolution. Specifically, the resulting 90m resolution reconstructed image is resampled to 30m resolution.
Step S206, inputting the first reconstructed image with the second spatial resolution and the second remote sensing image into the SRCNN network to obtain a second reconstructed image, i.e. a super-resolved reconstructed image. Specifically, the second learning pass uses the SRCNN together with the 30 m Landsat8 original image to obtain a MOD09Q1 reconstructed image at 30 m resolution. The two SRCNN networks used in the embodiment of the invention have the same structure and differ only in their inputs and outputs, and that difference lies only in resolution. Because the SRCNN training process works on cropped image blocks, the parameter formats of the two networks are fully consistent, so the trained parameters of the first SRCNN network can be migrated to the second, improving program efficiency and mitigating the increase in time complexity brought by deep learning.
Step S207, inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; the remote sensing image with high space and high time has the first period and the second spatial resolution.
Existing remote sensing image space-time fusion algorithms are complex and computationally long-running, making it very difficult to acquire large numbers of time-series remote sensing images with high space-time resolution. To improve fusion quality, the embodiment of the invention introduces deep learning, with its stronger learning capacity and more powerful feature extraction, into remote sensing space-time fusion: on the basis of existing learning-model fusion, the low-resolution image is reconstructed by a secondary-learning (two-pass) method on an SRCNN (Super-Resolution Convolutional Neural Network). The SRCNN has a simple structure, which relatively limits the added complexity, and the secondary-learning scheme relieves the impact of an excessive resolution gap during fusion. Following the idea of STARFM, the embodiment of the invention uses the neural network to automatically extract features during fusion and learn the mapping relation, replacing the original process of screening similar pixels within a sliding window and computing their weights.
The embodiment of the invention adopts the SRCNN method to carry out super-resolution reconstruction on the low-spatial resolution data, and replaces the resampling process of the low-spatial resolution image of the STARFM method. The spatial resolution difference between the MODIS and Landsat images usually used for space-time fusion is too large, and the capability difference for describing spatial detail information is large. In addition, because there are differences between different sensors of the remote sensing images, even if there are large differences between the images in the pixel reflectivity of the same position at the same time, it is difficult to directly introduce the SRCNN into the remote sensing images for space-time fusion. The embodiment of the invention selects one of the intermediate resolutions to carry out transition, reconstructs the low-resolution data to the intermediate resolution through the learning training of the SRCNN, and reconstructs the intermediate-resolution data to the designated high resolution.
Through the above steps, the SRCNN performs super-resolution reconstruction of the low-resolution image with the high-resolution image as prior knowledge; because the resolution gap between the two fused image sets is too large, the method of the embodiment of the invention is improved with a secondary-learning scheme. Whereas STARFM extracts manual features using expert knowledge when screening similar pixels and computing weights, the present method extracts features automatically. To counter the increase in time complexity introduced by deep learning, a CUDA-based GPU parallelization realizes parallel processing of the SRCNN, making full use of the GPU's memory hierarchy and thread-configuration methods and further improving operating efficiency over conventional direct parallelization.
The super-resolution reconstruction method based on the quadratic learning related to the above steps is described in detail below with reference to specific alternative embodiments.
Due to diffraction, the highest resolution an imaging system can achieve is generally limited by its optics; conventional image-quality-improvement methods struggle to recover information beyond the system's cut-off frequency, whereas Super-Resolution (SR) reconstruction uses signal processing to reconstruct information beyond the cut-off frequency and so obtain an image whose resolution exceeds that of the imaging system.
Conventional super-resolution reconstruction techniques are classified into reconstruction-based methods and shallow learning-based methods. Based on a reconstruction method, sub-pixel precision alignment is carried out on a plurality of low-resolution images to obtain the motion offset between the images with different resolutions, so that space motion parameters are constructed, and then the high-resolution images are reconstructed through various prior constraint conditions and an optimal solution. The method based on shallow learning aims at guiding the reconstruction of the image by learning to obtain the mapping relation between the high-resolution image and the low-resolution image, and is generally divided into three stages of feature extraction, learning and reconstruction, and the stages are mutually independent, so that the feature extraction and expression capacity of the shallow learning is limited and needs to be further enhanced.
In recent years, deep learning techniques have been developed rapidly. A deep learning network is adopted in super-resolution reconstruction, and the design is carried out by still referring to the idea of the traditional super-resolution reconstruction method structurally, so that the super-resolution reconstruction method can be used as a predictor to output a more accurate predicted value.
In 2016, a convolutional neural network (CNN) was applied to super-resolution reconstruction for the first time in a new network, SRCNN. Starting from the relationship between deep learning and traditional sparse coding (SC), the network is divided into three stages, image-block extraction, nonlinear mapping and image reconstruction, which correspond to the three convolutional layers of a deep convolutional neural network framework and are unified within one neural network, realizing super-resolution reconstruction from a low-resolution to a high-resolution image. The network directly learns an end-to-end mapping between low- and high-resolution imagery, requiring almost no pre- or post-processing beyond optimization.
The three convolutional layers of the SRCNN use kernels of size 9×9, 1×1 and 5×5, and the first two layers output 64 and 32 feature maps respectively. The SRCNN treats the sparse-coding process as convolution; its network design is simple, and its reconstruction quality improves greatly over the SCSR (sparse-coding-based super-resolution) method representative of shallow learning, making it a super-resolution method worth building on. The embodiment of the invention introduces a secondary-learning scheme into remote sensing image space-time fusion to improve the reconstruction of low-resolution remote sensing images.
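For concreteness, the three-layer structure just described fits in a few lines; this PyTorch sketch is a stand-in under stated assumptions (single-band input, ReLU activations, padding chosen to preserve spatial size), not the patent's exact network.

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """9x9 patch extraction (64 maps) -> 1x1 nonlinear mapping (32 maps)
    -> 5x5 reconstruction, following the SRCNN layer sizes above."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return self.net(x)
```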
The embodiment of the invention adopts the SRCNN method to super-resolve the low-spatial-resolution data, replacing the resampling of the low-spatial-resolution image in the STARFM method. However, the MOD09Q1 remote sensing image used here has only 250 m spatial resolution with severe mixed pixels, while the Landsat8 remote sensing image has 30 m spatial resolution: a resolution gap of 8 to 9 times and a large difference in the ability to describe spatial detail. Moreover, because of the various differences among the sensors acquiring the images, pixel reflectances differ considerably even at the same position and time, so directly introducing the SRCNN into remote sensing image space-time fusion is difficult.
Learning-model methods cannot tolerate too large a spatial-resolution gap, generally about 4 times at most: an excessive gap makes the deep neural network hard to train, leaves the resolution enhancement of the low-resolution image inadequate, and makes the severe mixed-pixel phenomenon hard to alleviate.
In order to solve the above problem, the embodiment of the present invention improves the super-resolution reconstruction stage by using a secondary learning method. The low spatial resolution remote sensing image is reconstructed to 90m resolution from 250m resolution through learning and then to 30m resolution, so that the difference of the resolutions in the two learning processes is ensured to be within 4 times. The technical route of the super-resolution reconstruction method based on the quadratic learning is shown in fig. 3.
The 30 m Landsat8 remote sensing image is down-sampled to 90 m resolution using the nearest-neighbour method, and the 250 m MOD09Q1 remote sensing image is resampled to 90 m resolution using bicubic interpolation. As shown in Fig. 4, the sampled images, though formally of the same resolution, still differ greatly in spatial detail. The SRCNN performs a first learning pass on the two sampled data sets to obtain a MOD09Q1 reconstructed image at 90 m resolution; this is resampled to 30 m resolution and passed through a second learning pass with the 30 m Landsat8 original image to obtain a MOD09Q1 reconstructed image at 30 m resolution, which serves as the low-resolution input to the space-time fusion model of the embodiment of the invention.
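The two sampling operations are plain raster resampling; a sketch using `scipy.ndimage.zoom` (with `order=0` for nearest neighbour, and cubic spline interpolation, `order=3`, standing in for bicubic) under placeholder array sizes:

```python
import numpy as np
from scipy.ndimage import zoom

landsat_30m = np.random.rand(900, 900)   # placeholder 30 m band
modis_250m = np.random.rand(108, 108)    # placeholder 250 m band

# Landsat8: 30 m -> 90 m by nearest neighbour (downsampling)
landsat_90m = zoom(landsat_30m, 30.0 / 90.0, order=0)

# MOD09Q1: 250 m -> 90 m by cubic interpolation (resampling up)
modis_90m = zoom(modis_250m, 250.0 / 90.0, order=3)
```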
The input of the conventional STARFM method is only achieved by resampling the MODIS image to Landsat8 resolution through an interpolation method, and when the SRCNN network is used for super-resolution reconstruction, the embodiment of the invention learns the Landsat8 image as the prior knowledge of MOD09Q1 data, so that on one hand, the accuracy of the super-resolution reconstruction is obviously superior to the result of resampling through the interpolation method, and the input quality of the STARFM is improved; on the other hand, in the learning process, various errors caused by different sensors can be relieved, and the reconstructed image is closer to the input high-resolution image in style and similarity. The method can restore the original difference information between the high-resolution data and the low-resolution data as much as possible, so that the fusion result is more accurate.
Deep learning generally has good migration capability, namely, model parameters which are trained on a certain data set are migrated and used, and problems can be processed on a different data set. The good property can help the deep learning neural network to process different problems, only by transferring and using a small amount of training samples of new problems to fine-tune the parameters of the model for the trained model, the network can be quickly updated to a mode for processing the new problems, and a good prediction classification effect is realized. The transfer learning can save a large amount of time cost and improve the operation efficiency.
The two SRCNN networks for secondary learning in the embodiment of the invention have the same structure, and the difference is only in that the input and the output are different, and the difference of the input and the output is only reflected in the resolution. The training process of the SRCNN network is carried out through image block cutting, so that two network parameter formats for secondary learning have complete consistency, the trained network parameters for the primary learning can be transferred to the network for the secondary learning, the program operation efficiency is improved, and the time complexity improvement brought by deep learning is relieved.
In an alternative embodiment, the SRCNN network is trained using the formula $L(x, y, t_k) = L(x, y, t_0) + M(x, y, t_k) - M(x, y, t_0)$, where $M(x,y,t)$ and $L(x,y,t)$ denote the reflectances at time $t$ of the MODIS and Landsat pixels at coordinate $(x, y)$. $L(x,y,t_0)$, $M(x,y,t_k)$ and $M(x,y,t_0)$ are input into the SRCNN network to obtain an output, and the network is corrected according to the mean square error between $L(x,y,t_k)$ and that output, yielding the trained SRCNN network. The embodiment of the invention thus improves the STARFM method by fully exploiting the strong feature extraction and expression capability of deep learning, replacing manually extracted features with automatically extracted ones.
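Under the SRCNN sketch given earlier, one training step of this scheme could look like the following; treating the theoretical value $L(t_0) + M(t_k) - M(t_0)$ as the network input and the true high-resolution image at $t_k$ as the label is one reading of this embodiment, and the tensor shapes and optimiser settings are assumptions.

```python
import torch
import torch.nn as nn

model = SRCNN()                                  # from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

def train_step(L_t0, M_t0, M_tk, L_tk):
    """One MSE correction step; inputs are (N, 1, H, W) float tensors."""
    theory = L_t0 + M_tk - M_t0                  # predicted theoretical value
    optimizer.zero_grad()
    loss = criterion(model(theory), L_tk)        # compare with the actual label
    loss.backward()
    optimizer.step()
    return loss.item()
```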
The following is a detailed description of a specific automatic feature extraction method based on deep learning.
The embodiment of the invention uses SRCNN as a basic frame, extracts image characteristics by a deep learning neural network method, and discovers and learns the mapping relation between high-resolution data and low-resolution data.
The basic idea of STARFM is to predict high spatial resolution data at unknown times using high and low spatial resolution data at known times and low spatial resolution data at unknown times. In the concrete implementation of STARFM, in order to reduce the influence of the pixel mixing phenomenon, the reflectivity of the central pixel is calculated by introducing adjacent pixel information through a sliding window technology, similar pixels are screened according to a specific rule, bad pixels are eliminated, the corresponding weight of the pixels is calculated by a specific formula according to the spectral distance and the time distance between image pairs and the space distance inside the window, and the weight matrix of the sliding window is obtained.
The rules used by STARFM in its implementation, whether for screening or excluding pixels within the sliding window or the specific formulas used to calculate the weights, all belong to the domain of expert knowledge, i.e., manually extracted features. The embodiment of the invention replaces this process with deep learning by a neural network, realizing automatic feature extraction.
Based on the basic idea of STARFM, given the high- and low-resolution data at time t0 and the low-resolution data at time tk, the data calculated by the formula L(x, y, tk) = L(x, y, t0) + M(x, y, tk) - M(x, y, t0) can be regarded as a predicted theoretical value. Taking this data as the learning sample and the actual high-resolution data at the corresponding time as the label, both are input into the SRCNN for training, so that the mapping relation between the predicted theoretical value and the actual label can be learned. The sliding-window calculation used by STARFM is likewise a direct way of computing this mapping.
When performing space-time fusion of remote sensing images, the embodiment of the invention uses a neural network to extract features automatically and learns the mapping relation between the predicted theoretical value and the corresponding label. Although more time is needed for learning up front, once the mapping is learned, the images to be fused and predicted can be obtained quickly. By contrast, although the conventional STARFM method does not incur this up-front learning cost, every fusion prediction requires a large amount of high-complexity computation; in theory, when a large amount of data must be fused and predicted, the embodiment of the invention holds a relative advantage in time complexity.
The embodiment of the invention makes two improvements to the traditional STARFM method. On the one hand, drawing on learning-based space-time fusion methods, super-resolution reconstruction of the low-spatial-resolution data replaces the original direct resampling. To alleviate the effect of an excessive resolution gap, the embodiment adopts a secondary-learning scheme, realizes super-resolution reconstruction of the low-spatial-resolution data with the SRCNN network, and enriches the detail information of the low-resolution image by using the high-resolution data of a different sensor as prior knowledge during reconstruction. On the other hand, based on the basic idea of STARFM and with the SRCNN as the framework, features are extracted automatically by deep learning during the fusion; compared with the original manual calculation of features with a sliding window, this markedly improves fusion quality.
The reconstruction method of fig. 2 improves the spatial resolution of the high-temporal-resolution, low-spatial-resolution data to obtain high space-time-resolution data of a certain precision, which is then input into the improved STARFM method of the embodiment of the invention; this is equivalent to performing two layers of space-time fusion, ensuring a better fusion effect. In an alternative embodiment, referring to fig. 1, the remote sensing image high space-time fusion processing algorithm further includes: obtaining the second reconstructed image at the first time, the second reconstructed image at the second time, and the second remote sensing image at the first time, the second reconstructed images serving as low-resolution data and the second remote sensing image at the first time serving as high-resolution data; searching for similar pixels within a preset sliding window according to these three images; obtaining the spectral distance, temporal distance and spatial distance between each similar pixel and the central pixel; configuring the weight of each similar pixel according to those distances; normalizing the weights of the similar pixels to obtain a combined weight matrix; and predicting the second remote sensing image at the second time according to the combined weight matrix, the predicted second remote sensing image at the second time being taken as the high-space high-time remote sensing image. A simplified sketch of this sliding-window prediction follows.
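The sketch below walks through these steps in plain NumPy. The similarity threshold, the way the three distances are combined and the window size are illustrative assumptions, not the patent's exact screening rules or weight formulas.

```python
# Simplified sliding-window prediction in the STARFM spirit: screen similar
# pixels, combine spectral, temporal and spatial distances into normalised
# weights, and predict the fine image at tk.
import numpy as np

def predict_window(L0, M0, Mk, half=25, sim_thresh=0.02):
    """L0: fine image at t0; M0, Mk: coarse images already on the fine grid.
    All inputs are 2-D reflectance arrays; returns the predicted fine image."""
    H, W = L0.shape
    out = np.zeros_like(L0)
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    d_spatial = 1.0 + np.sqrt(yy ** 2 + xx ** 2) / half   # relative distance
    for i in range(half, H - half):
        for j in range(half, W - half):
            win = np.s_[i - half:i + half + 1, j - half:j + half + 1]
            L0w, M0w, Mkw = L0[win], M0[win], Mk[win]
            similar = np.abs(L0w - L0[i, j]) <= sim_thresh  # screen pixels
            S = np.abs(L0w - M0w) + 1e-6                  # spectral distance
            T = np.abs(Mkw - M0w) + 1e-6                  # temporal distance
            C = S * T * d_spatial                         # combined distance
            w = np.where(similar, 1.0 / C, 0.0)
            w /= w.sum() + 1e-12                          # combined weights
            out[i, j] = np.sum(w * (Mkw + L0w - M0w))     # weighted prediction
    return out
```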
In an alternative embodiment, before the threads in the SRCNN network execute the program, the convolution kernel parameters are preloaded from global memory into shared memory, from which the threads within a first predetermined block read them during execution. In another alternative embodiment, before the threads execute the program, the relevant image area is preloaded from global memory into shared memory, from which the threads within a second predetermined block read it during execution; the relevant image area comprises the image areas involved by all threads in the second predetermined block. In yet another alternative embodiment, the convolution layer and the nonlinear mapping layer are merged in the SRCNN network, with the convolution operation and the nonlinear mapping operation executed by the same thread.
The parallelization method based on the memory hierarchy technology and the thread configuration is described in detail below with reference to specific embodiments.
Aiming at the problem that introducing deep learning into the space-time fusion algorithm increases the computation load and reduces operating efficiency, the embodiment of the invention applies CUDA programming to the computation-intensive parts of the improved SRCNN module to realize parallelization. The module features a large data volume, a regular data storage layout and an essentially identical processing procedure for each datum, making it well suited to highly parallel processing on the GPU.
The memory hierarchy technique is the key to parallelization improvement with CUDA; a schematic diagram of the CUDA memory model is shown in fig. 5. Each thread has its own private local memory; each thread block has a shared memory accessible to all threads in the block, whose lifetime matches that of the thread block; and beyond the thread blocks there is a global memory accessible to all threads.
The GPU has a hierarchical memory structure including global memory, shared memory and registers, whose access bandwidths and latencies differ greatly, as shown in Table 1. Global memory is the largest memory on the GPU and is accessible to every thread; however, it is also the slowest and often becomes the bottleneck of the whole program. Shared memory acts like a cache: it is accessible only to threads within the same thread block, with greater bandwidth and lower latency than global memory. Registers are the fastest but smallest memories, accessible only to the owning thread; memory access is much faster if reused data can be kept in registers.
TABLE 1 Comparison of the bandwidth and latency of different GPU memories
(The table is provided as an image in the original publication; it compares the access bandwidth and latency of global memory, shared memory and registers.)
The parallel improvement of the SRCNN in the embodiment of the invention mainly lies in parallelizing the convolution operations. The parameters of the convolution layers usually account for about 5% of the whole network, while their computation accounts for 90%-95%, so the efficiency of the convolution implementation directly determines the efficiency of the whole neural network. The traditional direct parallelization method handles details relatively coarsely: every thread accesses global memory to read the convolution kernel (filter) parameters, causing redundant reads and a great waste of memory bandwidth.
Fig. 6 shows the kernel-sharing scheme. The embodiment of the invention makes full use of the division of threads into blocks: the convolution kernel parameters are preloaded from global memory into shared memory before the threads execute, and the threads within the same block then read the convolution kernel parameters from shared memory during execution, avoiding redundant reads and reducing the occupation of global memory bandwidth.
In the traditional direct parallelization method, image blocks are also read from global memory, with the redundant reads arising mainly from the overlap between the image patches processed by each thread; the overlap area is closely related to the size and stride of the convolution kernel. As shown in fig. 7, where the shared memory capacity allows, the embodiment of the invention preloads the image area involved by all threads of the same block (i.e., a portion of the whole image) from global memory into shared memory before the threads execute, so that the threads of the block need only access shared memory during execution. The embodiment further brings the nonlinear mapping (ReLU) performed after the convolution into the parallelization: in the traditional direct parallelization method, the convolution layer and the nonlinear mapping layer are separated and executed on two different threads, wasting data exchange between threads. The embodiment merges the convolution layer and the nonlinear mapping layer, i.e., the ReLU operation is performed in the same thread immediately after the convolution, avoiding writing data to global memory after the convolution layer and reading it back before the ReLU layer, and improving operating efficiency.
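The following Numba-CUDA sketch illustrates the three optimisations together: the filter and the image tile (with its halo) are preloaded into shared memory once per thread block, and the ReLU is fused into the same thread immediately after the convolution. Numba is an assumed stand-in for the patent's CUDA C programming, and the tile and filter sizes are illustrative.

```python
# Tiled convolution with shared-memory preloading and a fused ReLU.
import numpy as np
from numba import cuda, float32

K = 5          # filter side (assumed)
TILE = 16      # thread-block side
HALO = K - 1   # extra border pixels each tile must load

@cuda.jit
def conv_relu(img, filt, out):
    tile = cuda.shared.array(shape=(TILE + HALO, TILE + HALO), dtype=float32)
    sfilt = cuda.shared.array(shape=(K, K), dtype=float32)
    tx, ty = cuda.threadIdx.x, cuda.threadIdx.y
    x, y = cuda.grid(2)

    if tx < K and ty < K:                 # preload the filter once per block
        sfilt[ty, tx] = filt[ty, tx]

    # cooperatively preload the image tile plus halo from global memory
    for dy in range(ty, TILE + HALO, TILE):
        for dx in range(tx, TILE + HALO, TILE):
            gy = cuda.blockIdx.y * TILE + dy
            gx = cuda.blockIdx.x * TILE + dx
            if gy < img.shape[0] and gx < img.shape[1]:
                tile[dy, dx] = img[gy, gx]
    cuda.syncthreads()

    if y < out.shape[0] and x < out.shape[1]:
        acc = 0.0
        for fy in range(K):               # valid (unpadded) convolution
            for fx in range(K):
                acc += tile[ty + fy, tx + fx] * sfilt[fy, fx]
        out[y, x] = max(acc, 0.0)         # ReLU fused in the same thread
```

A launch would use (TILE, TILE) threads per block over a grid covering the valid output, e.g. conv_relu[(gx_blocks, gy_blocks), (TILE, TILE)](d_img, d_filt, d_out).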
Aiming at the increase in time complexity brought by deep learning, the embodiment of the invention realizes parallel processing of the SRCNN through the CUDA-based GPU parallelization improvements, making full use of the GPU memory hierarchy and thread configuration, and further improves operating efficiency over the traditional direct parallelization method.
In an optional embodiment, the remote sensing image high space-time fusion processing algorithm further comprises calculating PSNR and SSIM by the following formulas:
$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(f_1, f_2)}\right)$$

and/or,

$$\mathrm{SSIM}(f_1, f_2) = \frac{(2\mu_1\mu_2 + C_1)\,(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)\,(\sigma_1^2 + \sigma_2^2 + C_2)}$$
where f1 denotes the original image without super-resolution reconstruction, f2 denotes the reconstructed image after super-resolution reconstruction, MSE(f1, f2) denotes the mean square error between them, μ1 and σ1² denote the mean and variance of the original image, μ2 and σ2² denote the mean and variance of the reconstructed image, σ12 denotes their covariance, MAX denotes the peak value of the pixel range, and C1, C2 are constants used to maintain stability. PSNR is a very widely used objective evaluation index for images; the larger the PSNR, the smaller the image distortion. Although PSNR is sensitive to image errors, it carries no information about the subjective viewing characteristics of the human eye, whose sensitivity is affected by spatial frequency, luminance, chrominance, surrounding neighborhood information and the like, so subjective evaluation by the human eye can run contrary to the objective PSNR evaluation. The embodiment of the invention therefore further considers SSIM as a second objective evaluation index. SSIM measures the similarity between images from three different aspects: luminance, contrast and structure. Generally, the larger the SSIM value, the higher the similarity between the images and the smaller the distortion.
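As an illustration, both indices can be computed as follows. Reflectance images normalised to [0, 1] (so MAX = 1) and the constants C1 = 0.01², C2 = 0.03² are common choices assumed here, and the SSIM shown is the global single-window form rather than a sliding-window average.

```python
# PSNR and (global) SSIM as defined above, for 2-D images in [0, 1].
import numpy as np

def psnr(f1, f2, max_val=1.0):
    mse = np.mean((f1 - f2) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(f1, f2, C1=0.01 ** 2, C2=0.03 ** 2):
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()            # sigma_1^2, sigma_2^2
    cov = np.mean((f1 - mu1) * (f2 - mu2))     # sigma_12
    return ((2 * mu1 * mu2 + C1) * (2 * cov + C2)) / \
           ((mu1 ** 2 + mu2 ** 2 + C1) * (var1 + var2 + C2))
```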
The following is a detailed description of the experimental results and analysis of the remote sensing high space-time fast fusion algorithm.
Data selection method
The embodiment of the invention selects the MODIS and Landsat images classically fused in the field of remote sensing space-time fusion, combining their respective advantages in temporal and spatial resolution. Band 1 (red) and band 2 (near-infrared) of MOD09Q1 (8-day composite data) with a spatial resolution of 250 m and a revisit period of 1 day are selected, together with the corresponding band 4 and band 5 of Landsat8 with a spatial resolution of 30 m and a revisit period of 16 days; the figures are illustrated with the red-band experimental data.
The research area selected by the embodiment of the invention lies on the border between Shaanxi and Inner Mongolia, covering most of the northern part of Yulin City in Shaanxi Province and part of the southeast of Ordos City in Inner Mongolia; its geographic coordinates are 38°47′59″N to 39°12′5″N and 110°20′35″E to 110°49′6″E, and the area clipped for the experiment is 45 km × 45 km. The area lies in the transition zone between the Mu Us Sandy Land and the Loess Plateau; it once suffered severe wind erosion, desertification and soil loss, and is a key area of national ecological environment construction. Hongjiannao Lake in its northwest is the largest desert lake in China and the world's largest breeding ground and habitat of the rare relict gull; in recent years its water level has been shrinking continuously owing to reduced inflow and evaporation, with a shrinkage of as much as 44.7% from 1980 to 2015. China launched the Three-North Shelterbelt Project in 1978 and by 2019 had entered the closing stage of the fifth phase of its second period. The area is one of the key areas for desertification prevention and control in China and can serve as a representative for studying the ecological recovery achieved by the Three-North Shelterbelt, so space-time fusion over the Hongjiannao ecosystem has extremely high research and application value.
As can be seen from fig. 8, the low-resolution MOD09Q1 image exhibits severe pixel mixing in some regions. Fig. 9 shows that the ground-object types in the research area are complex, including water bodies, mountains, sandy land, forest, cultivated land, buildings, roads and so on, partly distributed in alternation, making the area a type that is difficult to handle with traditional space-time fusion methods.
The data used in the experiments all come from the USGS EarthExplorer portal (https://earthexplorer.usgs.gov/); scenes over the research area from 2015 to 2019 with less than 1% cloud cover were selected. The MODIS images were converted from the HDF data format to the TIF data format with the MCTK (MODIS Conversion Toolkit) tool and projected to the UTM-WGS84 coordinate system, and all data were clipped to the designated area according to the corresponding coordinates and registered using ENVI software.
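The patent performs this preprocessing with MCTK and ENVI; purely for illustration, a roughly equivalent sketch with GDAL's Python bindings is shown below. The HDF subdataset name, the EPSG code (UTM zone 49N over the study area) and the bounds are assumptions.

```python
# Convert a MOD09Q1 band from HDF to GeoTIFF, reproject and clip with GDAL.
from osgeo import gdal

# hypothetical study-area bounds in the target projection (placeholders)
xmin, ymin, xmax, ymax = 500000.0, 4290000.0, 545000.0, 4335000.0  # 45 km x 45 km

band1 = gdal.Open('HDF4_EOS:EOS_GRID:"MOD09Q1.A2015001.hdf":'
                  'MOD_Grid_250m_Surface_Reflectance:sur_refl_b01')
gdal.Warp('mod09q1_red.tif', band1,
          dstSRS='EPSG:32649',                   # UTM zone 49N on WGS84
          outputBounds=(xmin, ymin, xmax, ymax), # clip to the study area
          xRes=250, yRes=250)
```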
Experimental results and analysis of the super-resolution reconstruction method based on secondary learning
The traditional SRCNN is a single-image super-resolution reconstruction method, i.e., the sample and the label are derived from transformations of the same image. Before training, the sample image is downsampled to a low-resolution image at only 1/3 of the original resolution, then enlarged back to the original size by bicubic interpolation; the result is still regarded as a low-resolution image and used as the SRCNN input sample. During training, the SRCNN learns the original, non-downsampled image as high-resolution prior knowledge, and the trained network is applied to test-set images of the same type for evaluation.
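A minimal sketch of this training-pair preparation, with OpenCV as an assumed tool choice:

```python
# Build a (low-resolution input, high-resolution label) pair from one image:
# downsample to 1/3 resolution, then enlarge back with bicubic interpolation.
import cv2

def make_pair(img):
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // 3, h // 3), interpolation=cv2.INTER_CUBIC)
    blurred = cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)
    return blurred, img   # (input sample, label)
```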
When introducing the SRCNN, the embodiment of the invention made two attempts, taking the first learning stage as an example. The first method applies single-image super-resolution reconstruction directly to the high-resolution remote sensing image: only one set of data, the Landsat8 image downsampled to 90 m resolution, is trained with the traditional SRCNN method, and the resulting network is then applied to the low-resolution remote sensing image of the same area at the same time, namely the MOD09Q1 data resampled to 90 m resolution, with the results compared against the 90 m-resolution Landsat8 image in a detection experiment. The second method uses an existing high-resolution image as prior knowledge for the low-resolution image of the same area at the same time: two sets of data, the 90 m-resolution Landsat8 image obtained by downsampling and the 90 m-resolution MOD09Q1 image obtained by resampling, are used to train the SRCNN network, through which a reconstructed 90 m-resolution MOD09Q1 image is obtained, likewise compared against the downsampled 90 m-resolution Landsat8 image in a detection experiment.
The comparison of the experimental results obtained on the test set is shown in fig. 10 (the pixel counts do not correspond because the border convolutions are unpadded). It can be seen that the second method, grounded in the MOD09Q1 data information, is closer to Landsat8 than the first method. The embodiment of the invention trains the SRCNN with the mean square error (MSE) as the loss function; the smaller the MSE, the better the reconstruction effect. The MSE of the first method is 0.08757569 and that of the second method is 0.04204063, and the result of the second method shows richer spatial detail information than that of the first.
Adopting the second method, the embodiment of the invention obtains a higher-quality 90 m-resolution MOD09Q1 reconstructed image after the first learning. This image is resampled to 30 m resolution, and a second super-resolution reconstruction is performed with an SRCNN network trained in the same second-method manner, this time using the original 30 m-resolution Landsat8 remote sensing image as the prior knowledge. The second super-resolution reconstruction result at 30 m resolution shown in fig. 11(c) is thus obtained; its MSE value against the corresponding Landsat8 image is 0.02961750.
Fig. 11 compares the experimental results obtained by super-resolution reconstruction through secondary learning with the raw experimental data according to an embodiment of the invention. It can be seen that the spatial detail is gradually enriched from fig. 11(a) to fig. 11(d). Viewed as a whole, the experiment trains the neural network with fig. 11(d) as the prior knowledge of fig. 11(a), finally obtaining fig. 11(c) via fig. 11(b). Fig. 11(d), as the prior knowledge, has the richest spatial detail information, but its time resolution is 16 days; the experimentally obtained fig. 11(c) is second only to fig. 11(d) in spatial detail, with a time resolution of only 1 day. The embodiment of the invention performs space-time fusion on fig. 11(c) and fig. 11(d) to obtain a fused image with a time resolution of 1 day and richer spatial detail information, thereby improving the accuracy of remote sensing image space-time fusion.
The embodiment of the invention adopts Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) as objective evaluation indexes to further analyze experimental results, and the calculation formulas are respectively formula (13) and formula (14).
$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(f_1, f_2)}\right) \tag{13}$$

$$\mathrm{SSIM}(f_1, f_2) = \frac{(2\mu_1\mu_2 + C_1)\,(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)\,(\sigma_1^2 + \sigma_2^2 + C_2)} \tag{14}$$
where MSE(f1, f2) denotes the mean square error between image f1 and image f2; μ1 and σ1² denote the mean and variance of image f1; μ2 and σ2² denote the mean and variance of image f2; σ12 denotes the covariance between image f1 and image f2; MAX denotes the peak value of the pixel range; and C1, C2 are constants used to maintain stability.
PSNR is a very widely used objective evaluation index for images; the larger the PSNR, the smaller the image distortion. Although PSNR is sensitive to image errors, it carries no information about the subjective viewing characteristics of the human eye, whose sensitivity is affected by spatial frequency, luminance, chrominance, surrounding neighborhood information and the like, so subjective evaluation by the human eye may be completely opposite to the objective PSNR evaluation. The embodiment of the invention therefore further considers SSIM as a second objective evaluation index. SSIM measures the similarity between images from three different aspects: luminance, contrast and structure. Generally, the larger the SSIM value, the higher the similarity between the images and the smaller the distortion.
Table 2 shows the objective evaluation results of the super-resolution reconstruction experiments of the embodiment of the invention. First, comparing the data in the second to fourth rows, the PSNR difference between the two reconstruction methods when reconstructing to the intermediate resolution is small, but the second method's advantage in SSIM is obvious. Comparing rows 5 and 6, the super-resolution reconstruction method based on secondary learning proposed by the embodiment achieves higher PSNR and SSIM than the simple resampling used by STARFM, with an especially significant improvement in SSIM.
TABLE 2 Objective evaluation results of the super-resolution reconstruction experiments
(The table is provided as an image in the original publication.)
Experimental results and analysis of the remote sensing high space-time fusion algorithm
The traditional STARFM method constructs a weighted model of the relationship between the reflectance of low-spatial-resolution pixels and that of high-spatial-resolution pixels by introducing neighboring-pixel information: the value of each central pixel is determined by the other pixels within a certain range around it, the weight of each pixel value in the window depends on the temporal distance, spatial distance and spectral distance over the input image pairs on the window, similar pixels are searched within the window, and a specific screening rule is applied.
The embodiment of the invention regards this model-construction process as manual feature extraction based on expert knowledge and improves it with machine learning. Super-resolution reconstruction with the SRCNN is a process in which a deep learning neural network learns the mapping relation between a low-resolution image and a high-resolution image. Based on the basic idea of STARFM, given the high- and low-resolution data at time t0 and the low-resolution data at time tk, the high-resolution data at time tk is to be predicted; the low-resolution image used for learning is set as the high-resolution data at t0 plus the change information of the low-resolution data from t0 to tk. The embodiment uses the original high-resolution data at tk as the prior knowledge for learning, inputs both into the SRCNN network to learn the mapping relation between them, and thereby realizes the improvement of automatic feature extraction by the deep learning neural network.
Fig. 12 compares the results of the three space-time fusion methods. Fig. 12(a) shows the prediction obtained by fusion with the STARFM method; although its style is relatively close to that of the Landsat8 image, blocky patch artifacts remain serious in some regions. Fig. 12(b) shows the improved STARFM-based space-time fusion method in which resampling is replaced by the super-resolution reconstruction technique, realizing double fusion; compared with fig. 12(a) it has richer spatial detail and alleviates pixel mixing to a certain extent. Fig. 12(c) shows the prediction of the space-time fusion algorithm of the embodiment of the invention; its style is substantially consistent with the Landsat8 data, and regions that appear only as patches (with severe pixel mixing) in fig. 12(a) and fig. 12(b) show the same spatial detail information as the Landsat8 image in fig. 12(c), significantly improving the fusion quality.
Table 3 compares the objective evaluation indexes of the space-time fusion algorithm of the embodiment of the invention with the original STARFM method and the improved STARFM method. The fusion algorithm of the embodiment achieves the highest PSNR and SSIM and the best fusion effect; the comparison of objective evaluation indexes verifies that the algorithm effectively improves the quality of remote sensing image space-time fusion.
TABLE 3 Objective index evaluation of the experimental results of the space-time fusion methods
(The table is provided as an image in the original publication.)
Experimental results and analysis of the super-resolution convolutional neural network parallelization method
Aiming at the problem that introducing deep learning into the space-time fusion algorithm increases the computation load and reduces operating efficiency, the embodiment of the invention parallelizes the SRCNN with CUDA to improve the operating efficiency of the fusion algorithm of the embodiment.
The feed-forward process of the embodiment of the invention is as follows: image downsampling or resampling is completed at the CPU side (the SRCNN network adopted in the fusion process likewise combines the data there) to obtain the input image specified by the SRCNN network of the embodiment; according to the size and stride of the convolution kernel, the high-resolution and low-resolution images are divided evenly into image blocks of the specified size as network input, and memory space is allocated for the subsequent image processing; the image blocks input to the network are transferred from the CPU side to the GPU side, where memory space is allocated; the Kernel function is defined, mainly to compute the convolution between an image block and a filter, with the convolution and ReLU operations placed in the same thread and the image mapped to the threads pixel by pixel; the utilization of memory and threads is balanced and the allocation of thread blocks and thread grids is set, while the pixels involved by the threads of a block are preloaded into the block's shared memory together with the convolution kernel parameters of the layer; the SRCNN method of the embodiment executes in parallel over the pixels according to the thread blocks in the GPU, and after all threads have finished computing, the results are transferred back to the CPU; the Kernel function is called in a loop, and the remaining image blocks are transferred in turn from the CPU side to the GPU side. Repeating the above steps completes the feed-forward pass over the whole image. Finally, at the CPU side, the computed results are assembled and normalized to obtain the final reconstruction result, which is stored and displayed.
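A hedged host-side sketch of this loop is shown below. It reuses the conv_relu kernel and the K and TILE constants from the earlier Numba sketch, and simplifies the block handling (no overlap management or final normalisation).

```python
# Host-side feed-forward loop: move image blocks to the GPU, launch the
# fused convolution+ReLU kernel per block, and collect results on the CPU.
import numpy as np
from numba import cuda

def feed_forward(blocks, filt):            # blocks: list of 2-D float32 arrays
    d_filt = cuda.to_device(filt)          # filter copied to the GPU once
    results = []
    for block in blocks:
        d_in = cuda.to_device(block)
        out_shape = (block.shape[0] - K + 1, block.shape[1] - K + 1)
        d_out = cuda.device_array(out_shape, dtype=np.float32)
        grid = ((out_shape[1] + TILE - 1) // TILE,
                (out_shape[0] + TILE - 1) // TILE)
        conv_relu[grid, (TILE, TILE)](d_in, d_filt, d_out)
        results.append(d_out.copy_to_host())   # back to the CPU
    return results                          # assembled and normalised on CPU
```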
The back-propagation process performed on the GPU in the embodiment of the invention is as follows: the mean square error is taken as the loss function to measure the loss between the feed-forward result and the target high-resolution image block; another Kernel function is defined to compute the gradient of the output layer from the loss function and, from it, the bias gradient, weight gradient and input-layer gradient of each convolution layer; the convolution kernel parameters in shared memory are updated with the computed results, and the feed-forward process is iterated again with the new convolution kernel parameters and the input-layer image blocks. The loop repeats until the loss function value falls below a set threshold, and the network parameters are saved. Finally, at the CPU side, the results are assembled and normalized to obtain the final result.
Table 4 compares the average per-image processing time of the SRCNN method of the embodiment of the invention under different conditions. GPU parallelization markedly improves the running efficiency of the program, and the convolution-kernel sharing, image-block sharing and merging of the convolution layer with the nonlinear mapping layer proposed by the embodiment further improve it, reducing the fusion time for high space-time-resolution images and helping to provide technical support for deep applications of massive space-time remote sensing imagery.
TABLE 4 Average per-image processing time before and after the SRCNN parallelization improvement
(The table is provided as an image in the original publication.)
This embodiment further provides a remote sensing image high space-time fusion processing device, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a high spatial-temporal fusion processing apparatus for remote sensing images, as shown in fig. 13, including: the obtaining module 1301 is configured to obtain a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period; a first resampling module 1302, configured to resample the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image; the down-sampling module 1303 is configured to down-sample the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than the spatial resolution of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to the spatial resolution of the third remote sensing image; a first input module 1304, configured to input the third remote sensing image and the fourth remote sensing image to an SRCNN network, so as to obtain a first reconstructed image with an intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image; a second resampling module 1305, configured to resample the first reconstructed image to obtain a first reconstructed image with the second spatial resolution; a second input module 1306, configured to input the first reconstructed image with the second spatial resolution and the second remote sensing image into an SRCNN network to obtain a second reconstructed image; a third input module 1307, configured to input the second reconstructed image to a space-time fusion model to obtain a high-space high-time remote sensing image; wherein the high-spatial high-temporal remote sensing image has the first period and the second spatial resolution.
The remote sensing high space-time fusion device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any remote sensing image high space-time fusion processing algorithm in the embodiment.
The embodiment of the invention also provides a non-transient computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the remote sensing image high space-time fusion processing algorithm in any method embodiment. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
In summary, the embodiment of the invention provides a super-resolution reconstruction method based on secondary learning, aimed at the problems that an excessive resolution gap between the two image sets makes deep learning neural network training difficult, that the resolution enhancement effect on the low-resolution image is poor, and that severe pixel mixing is hard to alleviate. The SRCNN method is adopted to perform super-resolution reconstruction on the low-spatial-resolution data, replacing the resampling of the low-spatial-resolution image in the STARFM method; by means of secondary learning and the use of high-resolution data as prior knowledge, the low-resolution data is reconstructed into high-resolution data through SRCNN super-resolution reconstruction, obtaining a good reconstruction result and alleviating the systematic errors caused by sensor differences.
An automatic image feature extraction method based on deep learning is provided to discover and learn the mapping relation between the high-resolution and low-resolution data. The embodiment of the invention replaces the manual feature extraction of the traditional STARFM method with automatic feature extraction by deep learning: in traditional STARFM fusion, a sliding-window technique assigns a value to the central pixel, similar pixels are searched within the window, and corresponding weights are then generated from the temporal, spatial and spectral distances. Based on the basic idea of STARFM, the embodiment fuses the difference map of the low-resolution data at the unknown time with the high-resolution data at the known time as the input data of the SRCNN, uses the existing high-resolution data at the unknown time as prior knowledge, learns the mapping relation between the two, realizes automatic image feature extraction based on deep learning, and significantly improves fusion quality.
Aiming at the problem that introducing deep learning into the space-time fusion algorithm increases the computation load and reduces operating efficiency, the embodiment of the invention provides a parallelization method based on the memory hierarchy technique and thread configuration, realizing parallel processing of the SRCNN. The SRCNN module features a large data volume, a regular data storage layout and an essentially identical processing procedure for each datum; the GPU is used for highly parallel processing, and the methods of sharing convolution kernels and image blocks and of merging the convolution layer with the nonlinear mapping improve operating efficiency.
The remote sensing space-time fusion problem stems from the physical limitations of remote sensing satellites themselves: in a single acquired remote sensing image, spatial resolution and temporal resolution are inherently contradictory and cannot both be optimal. The technical problems to be solved by the invention are the space-time contradiction of remote sensing images and the high complexity of remote sensing image space-time fusion algorithms. The embodiment of the invention provides a remote sensing high space-time fast fusion algorithm, studies an improved machine learning method, fuses remote sensing image data of high spatial but low temporal resolution with remote sensing image data of high temporal but low spatial resolution, performs parallel processing, integrates the spatial advantage of high-spatial-resolution satellite data with the temporal advantage of low-spatial-resolution satellite data, and improves fusion efficiency.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A high spatial-temporal fusion processing algorithm for remote sensing images is characterized by comprising the following steps:
acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period;
resampling the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image;
performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than the spatial resolution of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to the spatial resolution of the third remote sensing image;
inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image;
resampling the first reconstructed image to obtain a first reconstructed image with the second spatial resolution;
inputting the first reconstructed image with the second spatial resolution and the second remote sensing image into an SRCNN network to obtain a second reconstructed image;
inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; wherein the high-spatial high-temporal remote sensing image has the first period and the second spatial resolution.
2. The remote sensing image high space-time fusion processing algorithm according to claim 1, wherein: the first remote sensing image is a MODIS image; and the second remote sensing image is a Landsat8 image.
3. The remote-sensing image high space-time fusion processing algorithm according to claim 1, wherein inputting the second reconstructed image into a space-time fusion model to obtain a high space-time remote-sensing image comprises:
acquiring a second reconstructed image at the first moment, a second reconstructed image at the second moment and a second remote sensing image at the first moment; the second reconstructed image is used as low-resolution data, and the second remote sensing image at the first moment is used as high-resolution data;
searching similar pixels in a preset sliding window according to the second reconstructed image at the first moment, the second reconstructed image at the second moment and the second remote sensing image at the first moment;
acquiring the spectral distance, the time distance and the spatial distance between the similar pixel and the central pixel;
configuring the weight of the similar pixel according to the spectral distance, the time distance and the spatial distance;
normalizing the weights of the similar pixels to obtain a combined weight matrix;
predicting a second remote sensing image at a second moment according to the combined weight matrix; and taking the second remote sensing image at the second moment as the high-space high-time remote sensing image.
4. The remote sensing image high space-time fusion processing algorithm according to claim 1, characterized in that the method further comprises:
calculating PSNR and SSIM by the following formulas:
$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}(f_1, f_2)}\right)$$

and/or,

$$\mathrm{SSIM}(f_1, f_2) = \frac{(2\mu_1\mu_2 + C_1)\,(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)\,(\sigma_1^2 + \sigma_2^2 + C_2)}$$
wherein f1 represents the original image without super-resolution reconstruction, f2 represents the reconstructed image after super-resolution reconstruction, MSE(f1, f2) represents the mean square error between the original image without super-resolution reconstruction and the reconstructed image after super-resolution reconstruction, μ1 and σ1² respectively represent the mean and variance of the original image without super-resolution reconstruction, μ2 and σ2² respectively represent the mean and variance of the reconstructed image after super-resolution reconstruction, σ12 represents the covariance between the original image without super-resolution reconstruction and the reconstructed image after super-resolution reconstruction, MAX represents the peak value of the pixel range, and C1, C2 are constants used to maintain stability.
5. The remote sensing image high space-time fusion processing algorithm according to claim 1, wherein before a thread in the SRCNN network executes a program, a convolution kernel parameter is preloaded to a shared memory from a global memory, and the thread in a first predetermined block reads the convolution kernel parameter from the shared memory when executing the program.
6. The remote sensing image high space-time fusion processing algorithm according to claim 1, wherein before a thread in the SRCNN network executes a program, a related image area is preloaded to a shared memory from a global memory, and the related image area is read from the shared memory when the thread in a second predetermined block executes the program; wherein the related image area comprises image areas involved by all threads in the second predetermined block.
7. The remote sensing image high space-time fusion processing algorithm according to claim 1, wherein the convolution layer and the nonlinear mapping layer are combined in the SRCNN network, and the convolution operation and the nonlinear mapping operation are performed by the same thread.
8. A remote sensing image high space-time fusion processing device is characterized by comprising:
the acquisition module is used for acquiring a first remote sensing image with low space and high time and a second remote sensing image with high space and low time; the spatial resolution of the first remote sensing image is a first spatial resolution, the period of the first remote sensing image is a first period, the spatial resolution of the second remote sensing image is a second spatial resolution, and the period of the second remote sensing image is a second period;
the first resampling module is used for resampling the first remote sensing image to obtain a third remote sensing image; the spatial resolution of the third remote sensing image is greater than that of the first remote sensing image;
the down-sampling module is used for performing down-sampling processing on the second remote sensing image to obtain a fourth remote sensing image; the spatial resolution of the fourth remote sensing image is smaller than the spatial resolution of the second remote sensing image, and the spatial resolution of the fourth remote sensing image is equal to the spatial resolution of the third remote sensing image;
the first input module is used for inputting the third remote sensing image and the fourth remote sensing image into an SRCNN network to obtain a first reconstructed image with intermediate spatial resolution; wherein the intermediate spatial resolution is the same spatial resolution as the third remote sensing image and the fourth remote sensing image;
the second resampling module is used for resampling the first reconstructed image to obtain a first reconstructed image with the second spatial resolution;
the second input module is used for inputting the first reconstructed image with the second spatial resolution and the second remote sensing image into the SRCNN network to obtain a second reconstructed image;
the third input module is used for inputting the second reconstructed image into a space-time fusion model to obtain a high-space high-time remote sensing image; wherein the high-spatial high-temporal remote sensing image has the first period and the second spatial resolution.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the remote sensing image high space-time fusion processing algorithm of any one of claims 1-7.
10. A computer readable storage medium having stored thereon computer instructions, wherein the instructions when executed by a processor implement the remote sensing image high space-time fusion processing algorithm of any one of claims 1-7.
CN202010786064.4A 2020-08-06 2020-08-06 High space-time fusion processing algorithm and device for remote sensing image Active CN111932457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010786064.4A CN111932457B (en) 2020-08-06 2020-08-06 High space-time fusion processing algorithm and device for remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010786064.4A CN111932457B (en) 2020-08-06 2020-08-06 High space-time fusion processing algorithm and device for remote sensing image

Publications (2)

Publication Number Publication Date
CN111932457A true CN111932457A (en) 2020-11-13
CN111932457B CN111932457B (en) 2023-06-06

Family

ID=73306904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010786064.4A Active CN111932457B (en) 2020-08-06 2020-08-06 High space-time fusion processing algorithm and device for remote sensing image

Country Status (1)

Country Link
CN (1) CN111932457B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140072209A1 (en) * 2012-09-13 2014-03-13 Los Alamos National Security, Llc Image fusion using sparse overcomplete feature dictionaries
CN103077515A (en) * 2012-12-29 2013-05-01 北方工业大学 Multi-spectral image building change detection method
CN105046648A (en) * 2015-06-25 2015-11-11 北京师范大学 Method for constructing high temporal-spatial remote sensing data
CN108573276A (en) * 2018-03-12 2018-09-25 浙江大学 A kind of change detecting method based on high-resolution remote sensing image
CN108932710A (en) * 2018-07-10 2018-12-04 武汉商学院 Remote sensing Spatial-temporal Information Fusion method
CN109034066A (en) * 2018-07-27 2018-12-18 北方工业大学 Building identification method based on multi-feature fusion
CN109360147A (en) * 2018-09-03 2019-02-19 浙江大学 Multispectral image super resolution ratio reconstruction method based on Color Image Fusion
CN109472743A (en) * 2018-10-25 2019-03-15 中国科学院电子学研究所 The super resolution ratio reconstruction method of remote sensing images

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ZHENFENG SHAO: "Remote Sensing Image Fusion With Deep Convolutional Neural Network" *
ZHANG YONGMEI ET AL.: "Remote Sensing High Spatio-temporal Fusion Method Based on Deep Learning and Super-Resolution Reconstruction" *
HUA RUIMIN: "Research and Design of Parallel Processing Methods for Spatio-temporal Remote Sensing Big Data", China Master's Theses Full-text Database (Electronic Journal) *
WANG XIAOYI: "Research on Resolution Enhancement Algorithms for Remote Sensing Images Based on Data Fusion" *
GUO WENJING; LI AINONG; ZHAO ZHIQIANG; WANG JIYAN: "Reconstruction Method for Time-series Higher-resolution NDVI Datasets Based on AVHRR and TM Data", Remote Sensing Technology and Application *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508832A (en) * 2020-12-03 2021-03-16 中国矿业大学 Object-oriented remote sensing image data space-time fusion method, system and equipment
CN112508832B (en) * 2020-12-03 2024-02-13 中国矿业大学 Object-oriented remote sensing image data space-time fusion method, system and equipment
CN112819697A (en) * 2021-02-04 2021-05-18 北京师范大学 Remote sensing image space-time fusion method and system
CN113012044A (en) * 2021-02-19 2021-06-22 北京师范大学 Remote sensing image space-time fusion method and system based on deep learning
WO2022180616A1 (en) * 2021-02-28 2022-09-01 The State Of Israel, Ministry Of Agriculture & Rural Development, Agricultural Research Organization (Aro) (Volcani Institute) Estimation of a crop coefficient vector based on multispectral remote sensing
CN112862809A (en) * 2021-03-09 2021-05-28 中央财经大学 Spatial resolution enhancement method based on weak supervised deep learning, terminal equipment and computer readable storage medium
CN112862809B (en) * 2021-03-09 2023-07-18 中央财经大学 Spatial resolution enhancement method based on weak supervision deep learning, terminal equipment and computer readable storage medium
CN113160100A (en) * 2021-04-02 2021-07-23 深圳市规划国土房产信息中心(深圳市空间地理信息中心) Fusion method, fusion device and medium based on spectral information image
CN113256493A (en) * 2021-05-28 2021-08-13 北京环境特性研究所 Thermal infrared remote sensing image reconstruction method and device
CN113256493B (en) * 2021-05-28 2023-04-18 北京环境特性研究所 Thermal infrared remote sensing image reconstruction method and device
CN115272084B (en) * 2022-09-27 2022-12-16 成都信息工程大学 High-resolution image reconstruction method and device
CN115272084A (en) * 2022-09-27 2022-11-01 成都信息工程大学 High-resolution image reconstruction method and device
CN115346004B (en) * 2022-10-18 2023-01-31 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) Remote sensing time sequence data reconstruction method combining space-time reconstruction and CUDA acceleration
CN115346004A (en) * 2022-10-18 2022-11-15 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) Remote sensing time sequence data reconstruction method combining space-time reconstruction and CUDA acceleration
CN115359369A (en) * 2022-10-19 2022-11-18 中国科学院、水利部成都山地灾害与环境研究所 Mountain satellite image fusion method and system based on time phase self-adaption
CN116310883A (en) * 2023-05-17 2023-06-23 山东建筑大学 Agricultural disaster prediction method based on remote sensing image space-time fusion and related equipment
CN116310883B (en) * 2023-05-17 2023-10-20 山东建筑大学 Agricultural disaster prediction method based on remote sensing image space-time fusion and related equipment
CN116612391A (en) * 2023-07-21 2023-08-18 四川发展环境科学技术研究院有限公司 Land illegal invasion detection method based on spectrum remote sensing and multi-feature fusion
CN116612391B (en) * 2023-07-21 2023-09-19 四川发展环境科学技术研究院有限公司 Land illegal invasion detection method based on spectrum remote sensing and multi-feature fusion
CN117115679A (en) * 2023-10-25 2023-11-24 北京佳格天地科技有限公司 Screening method for space-time fusion remote sensing image pairs

Also Published As

Publication number Publication date
CN111932457B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN111932457B (en) High space-time fusion processing algorithm and device for remote sensing image
Sun et al. A band divide-and-conquer multispectral and hyperspectral image fusion method
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN109949399B (en) Scene three-dimensional reconstruction method based on unmanned aerial vehicle aerial image
Li et al. Survey of single image super‐resolution reconstruction
CN110415199B (en) Multispectral remote sensing image fusion method and device based on residual learning
Yu et al. FROM-GLC Plus: Toward near real-time and multi-resolution land cover mapping
CN114254715A (en) Super-resolution method, system and application of GF-1WFV satellite image
CN116309070A (en) Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
Wang et al. Enhanced image prior for unsupervised remoting sensing super-resolution
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
Li et al. HS2P: Hierarchical spectral and structure-preserving fusion network for multimodal remote sensing image cloud and shadow removal
Wang et al. Settlement extraction in the North China Plain using Landsat and Beijing-1 multispectral data with an improved watershed segmentation algorithm
Cui et al. Combined Model Color-Correction Method Utilizing External Low-Frequency Reference Signals for Large-Scale Optical Satellite Image Mosaics.
Ye et al. An unsupervised SAR and optical image fusion network based on structure-texture decomposition
Lv et al. Novel enhanced UNet for change detection using multimodal remote sensing image
Long et al. Dual self-attention Swin transformer for hyperspectral image super-resolution
Jing et al. Cloud removal for optical remote sensing imagery using the SPA-CycleGAN network
Xiong et al. Fusing Landsat-7, Landsat-8 and Sentinel-2 surface reflectance to generate dense time series images with 10m spatial resolution
Li et al. Integrating MODIS and Landsat imagery to monitor the small water area variations of reservoirs
Zhao et al. Multi-stream deep residual network for cloud imputation using multi-resolution remote sensing imagery
Xu et al. Extraction of rivers and lakes on Tibetan Plateau based on Google Earth Engine
Wang et al. Spatiotemporal temperature fusion based on a deep convolutional network
Khandelwal et al. Cloudnet: A deep learning approach for mitigating occlusions in landsat-8 imagery using data coalescence
Yang et al. Cropland mapping in fragmented agricultural landscape using modified pyramid scene parsing network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant