CN117422619A - Training method of image reconstruction model, image reconstruction method, device and equipment


Info

Publication number
CN117422619A
Authority
CN
China
Prior art keywords
image
training
network
feature
inputting
Prior art date
Legal status
Pending
Application number
CN202311142367.2A
Other languages
Chinese (zh)
Inventor
姜垣良
任庆滢
董绍华
张行
马云栋
刘海鹏
Current Assignee
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Priority date
Filing date
Publication date
Application filed by China University of Petroleum Beijing filed Critical China University of Petroleum Beijing
Priority to CN202311142367.2A priority Critical patent/CN117422619A/en
Publication of CN117422619A publication Critical patent/CN117422619A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Abstract

The present disclosure relates to the field of image processing technologies, and in particular to a training method of an image reconstruction model, an image reconstruction method, an image reconstruction device, and an apparatus. A training image is input into a generator network for calculation to obtain a reconstructed image, wherein the generator network comprises global feature extraction, local feature extraction and image reconstruction, and the local feature extraction comprises performing local feature extraction on the global features through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module; the reconstructed image and the training image are input into a discriminator network for calculation to obtain a discrimination result; a loss value is calculated according to the training image, the reconstructed image and the discrimination result; and the above steps are iterated until the loss value meets a preset requirement, and the generator network is taken as the image reconstruction model. Through this embodiment, a non-local dense receptive-field adversarial network is established, the resolution of the image is improved, and detail information is well restored.

Description

Training method of image reconstruction model, image reconstruction method, device and equipment
Technical Field
Embodiments of the present disclosure relate to the field of image processing technologies, and in particular, to a training method of an image reconstruction model, an image reconstruction method, an image reconstruction device, and an apparatus.
Background
At present, oil and gas transmission pipelines carry an important task. Oil and gas transmission often has to cross terrain with complex geographical conditions, which makes safety inspection of the pipelines difficult: inspection and detection work can hardly cover all areas, and the complexity of the terrain makes the pipelines vulnerable to geological disasters such as earthquakes and landslides. Satellite remote sensing technology is therefore applied to pipeline safety detection in the current oil and gas transmission field. However, because the distance between the satellite and the ground is very long, the resolution of the images captured by the satellite imager can hardly support continuous magnification of the image, so the content of close-range views becomes blurred and the image quality is low.
To address these problems, the resolution of satellite images can be improved with image reconstruction technology. Existing common remote sensing image super-resolution reconstruction methods include interpolation-based, edge-based, statistics-based, sparse-representation-based and deep-learning-based methods. The interpolation-based method increases the resolution of an image by interpolating a low-resolution image; it is computationally simple but tends to introduce image blurring and false details. The edge-based method uses edge information in the image for super-resolution reconstruction, but performs poorly on smooth regions.
Convolutional Neural Networks (CNNs) are a type of deep learning model. The core idea of a CNN is to extract features from the input data by convolution operations and to combine and abstract these features through a layer-by-layer stacked neural network structure. When processing image data it has several key components: convolution layers, activation functions, pooling layers, fully connected layers and Dropout layers. The training process of a CNN typically uses a back-propagation algorithm to optimize the parameters: the prediction error of the model is measured by a loss function that compares the model output with the real labels, then the gradients are calculated by back propagation and the weight parameters in the network are updated. However, a common CNN is not well suited to super-resolution reconstruction of satellite remote sensing images and has the following problems:
1. data scarcity: the acquisition and labeling costs of the remote sensing images are high, resulting in the scarcity of training data. The common CNN needs a large amount of marking data to learn image characteristics in the training process, but in the super-resolution of the remote sensing images, the available high-resolution remote sensing images are fewer, so that the problem of data scarcity is caused, and the generalization capability and performance of the model are affected.
2. Edge retention problem: remote sensing images often contain rich edge information, whereas common CNNs may cause blurring or distortion of edges when performing convolution and pooling operations. Due to the local nature of the convolution and pooling operations, the edge information is susceptible to blurring, resulting in unclear or lossy edges of the reconstructed high resolution image.
3. Scaling problem: super-resolution reconstruction of the remote sensing image involves a scale transformation, i.e. reconstruction from a low resolution image to a high resolution image. However, common CNN models are typically trained based on fixed input and output dimensions at design time, and are not well suited to scaling tasks. This may result in the model not performing well when dealing with scaling problems.
4. Learning complex textures and details: the remote sensing image has rich, complex texture and detail information, which places high demands on the learning ability of the model. Conventional CNNs may have limitations in handling complex textures and details, so the reconstructed high-resolution image may lack some detail information or contain artifacts.
A generative adversarial network (GAN) is a deep learning model consisting of two main components, a generator and a discriminator. The generator takes random noise as input and maps it to data samples similar to real samples. It generally adopts a deep convolutional neural network, a recurrent neural network or a similar structure, and gradually generates more realistic data samples through multiple neural network layers. The discriminator classifies the input data sample and determines whether it is a real sample or a sample generated by the generator. It also typically adopts a deep convolutional neural network or a similar structure, and provides feedback signals regarding sample authenticity by learning to distinguish between real samples and generated samples.
The training process of a GAN is achieved through adversarial learning between the generator and the discriminator. During training, the generator and the discriminator compete with each other and continuously optimize their own performance: the generator improves the quality of the generated samples by minimizing the discrimination error of the discriminator on the generated samples, while the discriminator improves its accuracy by maximizing its ability to discriminate between real samples and generated samples.
However, GAN training is unstable, and the training process is relatively complex, involving a dynamic balance between the generator and the discriminator. During training, unstable competition between the generator and the discriminator may occur, making it difficult for training to converge or producing undesirable results. The hyperparameters and network structure must be carefully adjusted to achieve good training results. In some cases the generator may fall into the problem of mode collapse, which means that the generator can only generate a limited number of modes and cannot generate diverse, high-quality super-resolution images. Moreover, a general GAN model cannot adapt well to super-resolution reconstruction of images of the China-Myanmar pipeline region.
Existing image super-resolution reconstruction methods therefore need to solve the problems that image blurring and false details are introduced, the effect on smooth areas is poor, the dependence on statistical models is strong, and the flexibility and adaptability of the models are poor.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiments of the specification provide a training method of an image reconstruction model, an image reconstruction method, a device and equipment, which establish a non-local dense receptive-field adversarial network, improve the resolution of remote sensing images of oil and gas transmission pipelines, restore detail information well, and greatly improve the accuracy of satellite remote sensing detection.
In order to solve any one of the above technical problems, the specific technical solutions of the embodiments of the present specification are as follows:
in one aspect, embodiments of the present disclosure provide a method for training an image reconstruction model, including,
inputting the training image into a generator network for calculation to obtain a reconstructed image, wherein the generator network comprises global feature extraction, local feature extraction and image reconstruction, and the global feature extraction comprises global feature extraction of the training image through a first convolution layer to obtain global features; the local feature extraction comprises local feature extraction of the global feature through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain a local feature; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
Inputting the reconstructed image and the training image into a discriminator network for calculation to obtain a discrimination result;
calculating a loss value according to the training image, the reconstructed image and the discrimination result;
and iterating the steps until the loss value meets the preset requirement, and taking the generator network as an image reconstruction model.
Further, performing local feature extraction on the global features through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain the local features further comprises:
inputting the global features into a residual block for feature extraction to obtain first features;
inputting the first feature into three RRNL blocks connected in series to perform feature extraction, and taking the output result of the last RRNL block as a second feature;
and inputting the second characteristic to an RFB module for characteristic extraction to obtain the local characteristic.
Further, each RRNL block is formed by serially connecting an RRDB structure with a Non-local structure, and the RRDB structure is formed by serially connecting three residual stacking blocks;
the calculating step of any one of the RRNL blocks includes:
inputting the input value of the RRNL block into three serially connected residual stacking blocks for feature extraction;
And inputting the features output by the last residual stacking block into a Non-local structure for feature extraction, and taking the features extracted by the Non-local structure as the output value of the RRNL block.
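By way of a non-limiting illustration of the structure just recited, the following PyTorch-style sketch composes an RRNL block from three residual stacking (dense) blocks connected in series followed by a Non-local block. The class names, channel count, growth rate and the 0.2 residual scaling factor are assumptions introduced for illustration and are not values fixed by this description; the Non-local block itself is sketched separately after the steps describing its computation.

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """One residual stacking (dense) block: densely connected 3x3 convs with a residual connection."""
    def __init__(self, channels=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + i * growth, growth, 3, padding=1) for i in range(4)
        ])
        self.fuse = nn.Conv2d(channels + 4 * growth, channels, 3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))   # dense connections
        return x + 0.2 * self.fuse(torch.cat(feats, dim=1))          # assumed residual scaling

class RRNLBlock(nn.Module):
    """RRNL block: an RRDB (three dense blocks in series) followed by a Non-local block."""
    def __init__(self, channels=64, non_local=None):
        super().__init__()
        self.rrdb = nn.Sequential(*[DenseResidualBlock(channels) for _ in range(3)])
        # non_local is a placeholder; a NonLocalBlock is sketched further below.
        self.non_local = non_local or nn.Identity()

    def forward(self, x):
        out = x + 0.2 * self.rrdb(x)        # residual-in-residual connection (assumed scaling)
        return self.non_local(out)          # output value of the RRNL block
```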
Further, inputting the feature output by the last residual stacking block into the Non-local structure for feature extraction further comprises:
carrying out convolution calculation on the features output by the last residual stacking block through three 1×1 convolution layers to obtain an input feature map;
inputting the resulting input feature map into a value branch, a key branch and a query branch of the Non-local structure respectively for calculation, wherein the value branch is used for mapping the input feature map to a value feature map to obtain the feature value of each pixel position, the key branch is used for mapping the input feature map to a key feature map through a convolution operation to obtain the key feature of each pixel position, and the query branch is used for mapping the input feature map to a query feature map through a convolution operation to obtain the query feature of each pixel position;
multiplying the query feature output by the query branch and the key feature output by the key branch, and inputting the product result to a softmax layer of the Non-local structure for normalization processing to obtain an attention map between a pixel position in the input feature map and other pixel positions except the pixel position in the input feature map;
And multiplying the feature value output by the value branch by the attention map to obtain the feature extracted by the Non-local structure.
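A minimal sketch of the Non-local computation described in the steps above, assuming a PyTorch implementation: 1×1 convolutions produce the query, key and value feature maps, an attention map over all pixel positions is obtained with softmax, and the value features are re-weighted by that map. The reduced channel width, the output 1×1 convolution and the residual connection around the block are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Non-local block: query/key/value branches built from 1x1 convolutions,
    attention over all pixel positions, then re-weighting of the value features."""
    def __init__(self, channels=64, reduced=32):
        super().__init__()
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # (b, h*w, reduced) query features
        k = self.key(x).flatten(2)                      # (b, reduced, h*w) key features
        v = self.value(x).flatten(2).transpose(1, 2)    # (b, h*w, c) value features
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # attention map between pixel positions
        out = torch.bmm(attn, v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.out(out)                        # residual connection (assumed)
```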
Further, the RFB module includes three branches, a first branch includes a convolution layer having a convolution kernel of 1×1, a convolution layer having a convolution kernel of 3×3 and a rate of 1, a second branch includes a convolution layer having a convolution kernel of 1×1, a convolution layer having a convolution kernel of 3×3, and a convolution layer having a convolution kernel of 3×3 and a rate of 3, and a third branch includes a convolution layer having a convolution kernel of 1×1, a convolution layer having a convolution kernel of 5×5, and a convolution layer having a convolution kernel of 3×3 and a rate of 5;
inputting the second feature to an RFB module for feature extraction, and obtaining the local feature further comprises:
inputting the second features into a first branch, a second branch and a third branch respectively for calculation;
inputting the calculation results of the first branch, the second branch and the third branch to a concatenation (concat) layer for feature splicing and fusion;
and inputting the output of the first-layer 1×1 convolution layer in the third branch and the result of the feature splicing and fusion of the concatenation layer into a ReLU activation function layer for calculation to obtain the local features.
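The three-branch RFB layout described above could be sketched as follows, again under a PyTorch assumption. The dilation rates 1, 3 and 5, the shortcut from the first 1×1 convolution of the third branch and the final ReLU follow the description; the 1×1 fusion convolution applied after the concatenation and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class RFBModule(nn.Module):
    """Receptive Field Block with three branches of increasing receptive field
    (dilation rates 1, 3 and 5), feature concatenation/fusion, a shortcut from the
    first 1x1 convolution of the third branch, and a final ReLU."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1, dilation=1))
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=3, dilation=3))
        self.branch3_in = nn.Conv2d(channels, channels, 1)      # also feeds the shortcut
        self.branch3_rest = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Conv2d(channels, channels, 3, padding=5, dilation=5))
        self.fuse = nn.Conv2d(3 * channels, channels, 1)        # fusion after concat (assumed 1x1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        b1 = self.branch1(x)
        b2 = self.branch2(x)
        b3_in = self.branch3_in(x)
        b3 = self.branch3_rest(b3_in)
        fused = self.fuse(torch.cat([b1, b2, b3], dim=1))       # feature splicing and fusion
        return self.relu(fused + b3_in)                          # shortcut into the ReLU
```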
Further, the loss values include a generator network loss value and a discriminator network loss value;
the formula for calculating the generator network loss value is:
GLoss = Min(λ_a L_a + λ_p L_p + λ_i L_i + λ_tv L_tv)
wherein GLoss represents the generator network loss value, λ_a, λ_p, λ_i and λ_tv are all constant coefficients, and L_a, L_p, L_i and L_tv are the adversarial loss, the perceptual loss, the image mean square error loss and the total variation loss, respectively;
the formula for calculating the discriminator network loss value is:
DLoss = Max(D(x_real) - 1 - D(x_fake))
wherein DLoss represents the discriminator network loss value, Max represents taking the maximum value, D represents the discriminator network, x_real represents the training image, x_fake = G(z), x_fake represents the reconstructed image constructed by the generator network based on the training image x_real, G represents the generator network, and z represents noise.
Based on the same inventive concept, the embodiments of the present disclosure further provide an image reconstruction method, including:
receiving a real image of an image to be reconstructed;
and inputting the real image into an image reconstruction model for processing to obtain a reconstructed image, wherein the image reconstruction model is obtained by using the training method of the image reconstruction model.
In another aspect, an embodiment of the present specification further provides a training apparatus for an image reconstruction model, where the apparatus includes:
the generator network training unit is used for inputting training images into a generator network for calculation to obtain reconstructed images, wherein the generator network comprises global feature extraction, local feature extraction and image reconstruction, and the global feature extraction comprises global feature extraction of the training images through a first convolution layer to obtain global features; the local feature extraction comprises performing local feature extraction on the global features through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain local features; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
the discriminator network training unit is used for inputting the reconstructed image and the training image into a discriminator network for calculation to obtain a discrimination result;
the loss value calculation unit is used for calculating a loss value according to the training image, the reconstructed image and the discrimination result;
and the iterative training unit is used for iterating the steps until the loss value meets the preset requirement, and taking the generator network as an image reconstruction model.
Based on the same inventive concept, the embodiments of the present specification also provide an image reconstruction apparatus, including:
the real image receiving unit is used for receiving a real image of the image to be reconstructed;
and the image reconstruction unit is used for inputting the real image into an image reconstruction model for processing to obtain a reconstructed image, wherein the image reconstruction model is obtained by using the training method of the image reconstruction model.
Finally, the embodiments of the present specification also provide a computer device, including a memory, a processor, and a computer program stored on the memory, where the processor implements the above method when executing the computer program.
By the method of the embodiments of the present specification, a pipeline optical remote sensing image containing longitude, latitude and elevation information, obtained by synthetic aperture radar (Synthetic Aperture Radar, SAR) technology, can be used as the training image, and deep image features of the pipeline optical remote sensing image are extracted by a dense receptive field network formed by connecting a residual block and three RRNL blocks in series. Adding residual blocks introduces skip connections into the neural network, so that the network can learn a residual mapping. In image super-resolution there is a correlation between the details of the high-resolution image and the low-frequency information, and by introducing residual blocks the network can better capture and retain this information, which helps reduce information loss during super-resolution, so that the generated high-resolution image is more accurate and richer in detail. An RFB mechanism is further introduced to simulate the receptive field of human vision and strengthen the local feature extraction capability of the dense receptive field network. A receptive field is the region of the input image to which a neuron responds; the Receptive Field Block (RFB) module enlarges the receptive field of each neuron in the neural network so that it covers a larger spatial range. In image super-resolution, an enlarged receptive field helps the network capture a wider range of contextual information, especially when restoring detail, and thus helps the network better understand the structure and texture of the image, improving the quality of the super-resolution result. Local feature extraction is therefore performed on the pipeline optical remote sensing image and the reconstruction quality of the pipeline optical remote sensing image is improved, and super-resolution reconstruction is performed on the training image in combination with the extracted global features of the training image, so that important targets such as oil and gas pipelines are more easily captured and identified by the image reconstruction model, abnormal conditions of the pipeline such as leakage, damage and corrosion are more easily detected, and the safety and monitoring effect of the pipeline are improved. In addition, the high-resolution reconstructed image can provide more detailed geographic information and landform features, which facilitates pipeline planning and design and makes pipeline planning and design more accurate and reliable.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation system of a training method of an image reconstruction model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a training method of an image reconstruction model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the structure of an RRNL-SRGAN generator network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a network of discriminators according to an embodiment of the present disclosure;
FIG. 5 illustrates steps for extracting local features in an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the RRNL block structure in the embodiment of the disclosure;
fig. 7 is a schematic structural diagram of an RRDB block according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram showing the structure of a Dense block in the embodiment of the present disclosure;
FIG. 9 is a schematic diagram showing a Non-local structure in an embodiment of the present disclosure;
fig. 10 is a schematic view showing the structure of an RFB module in the embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an image reconstruction model training apparatus according to an embodiment of the present disclosure;
fig. 12 is a flowchart of an image reconstruction method according to an embodiment of the present disclosure;
fig. 13 is a schematic view showing a structure of an image reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 14 is a schematic view showing the structure of a computer device in the embodiment of the present specification;
FIG. 15 shows a visual-effect comparison of reconstructed images including houses and buildings in an embodiment of the present disclosure;
FIG. 16 shows a visual-effect comparison of reconstructed images including river and green areas in an embodiment of the present specification.
[ reference numerals description ]:
101. a terminal;
102. a processor;
1101. a generator network training unit;
1102. a discriminator network training unit;
1103. a loss value calculation unit;
1104. an iterative training unit;
1301. a real image receiving unit;
1302. an image reconstruction unit;
1402. a computer device;
1404. a processing device;
1406. a storage resource;
1408. a driving mechanism;
1410. An input/output module;
1412. an input device;
1414. an output device;
1416. a presentation device;
1418. a graphical user interface;
1420. a network interface;
1422. a communication link;
1424. a communication bus.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the embodiments of the present disclosure, are intended to be within the scope of the embodiments of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and the claims of the embodiments of the present specification and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present description described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
It should be noted that, in the technical scheme of the application, the acquisition, storage, use, processing and the like of the data all conform to the relevant regulations of national laws and regulations.
Fig. 1 is a schematic diagram of an implementation system of a training method of an image reconstruction model in an embodiment of the present disclosure, which includes a terminal 101 and a server 102. The terminal 101 and the server 102 may communicate over a network, which may include a local area network (Local Area Network, abbreviated as LAN), a wide area network (Wide Area Network, abbreviated as WAN), the internet, or a combination thereof, and which connects websites, user equipment (e.g., computing devices) and backend systems.
The user inputs a large number of training images to the server 102 through the terminal 101, and the server 102 trains the image reconstruction model by using the training images input from the terminal 101, and obtains and stores the trained image reconstruction model. Then, the user may send a low-resolution image to be reconstructed to the server 102 through the terminal 101; the server 102 processes the low-resolution image to be reconstructed by using the stored image reconstruction model to generate a high-resolution reconstructed image, and finally the server 102 feeds the generated high-resolution reconstructed image back to the user through the terminal 101 so that the user can perform subsequent processing.
In addition, it should be noted that, fig. 1 is only one application environment provided by the present disclosure, and in practical application, other application environments may also be included, which is not limited in this specification.
Specifically, the embodiments of the specification provide a training method of an image reconstruction model, which establishes a non-local dense receptive-field adversarial network, improves the resolution of remote sensing images of oil and gas transmission pipelines, restores detail information well, and greatly improves the accuracy of satellite remote sensing detection. Fig. 2 is a flow chart of a training method of an image reconstruction model according to an embodiment of the present disclosure; the figure describes the process of training an image reconstruction model. The order of steps recited in the embodiments is merely one possible execution order and does not represent the only order of execution. When a system or apparatus product is executed in practice, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings. As shown in fig. 2, the method may be performed by the server 102 and may include:
step 201: inputting the training image into a generator network for calculation to obtain a reconstructed image;
in the step, the generator network comprises global feature extraction, local feature extraction and image reconstruction, wherein the global feature extraction comprises global feature extraction of training images through a first convolution layer to obtain global features; the local feature extraction comprises local feature extraction of the global feature through a dense receptive field network formed by connecting a residual block and three RRNL (residual-in-residual non-local dense block) blocks in series and an RFB module; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
Step 202: inputting the reconstructed image and the training image into a discriminator network for calculation to obtain a discrimination result;
step 203: calculating a loss value according to the training image, the reconstructed image and the discrimination result;
step 204: and iterating the steps until the loss value meets the preset requirement, and taking the generator network as an image reconstruction model.
By the method of the embodiments of the present specification, a pipeline optical remote sensing image containing longitude, latitude and elevation information, obtained by synthetic aperture radar (Synthetic Aperture Radar, SAR) technology, can be used as the training image, and deep image features of the pipeline optical remote sensing image are extracted by the dense receptive field network formed by connecting a residual block and three RRNL blocks in series. Adding residual blocks introduces skip connections into the neural network, so that the network can learn a residual mapping. In image super-resolution there is a correlation between the details of the high-resolution image and the low-frequency information, and by introducing residual blocks the network can better capture and retain this information, which helps reduce information loss during super-resolution, so that the generated high-resolution image is more accurate and richer in detail. An RFB mechanism is further introduced to simulate the receptive field of human vision and strengthen the local feature extraction capability of the dense receptive field network. A receptive field is the region of the input image to which a neuron responds; the Receptive Field Block (RFB) module enlarges the receptive field of each neuron in the neural network so that it covers a larger spatial range. In image super-resolution, an enlarged receptive field helps the network capture a wider range of contextual information, especially when restoring detail, and thus helps the network better understand the structure and texture of the image, improving the quality of the super-resolution result. Local feature extraction is therefore performed on the pipeline optical remote sensing image and the reconstruction quality of the pipeline optical remote sensing image is improved, and super-resolution reconstruction is performed on the training image in combination with the extracted global features of the training image, so that important targets such as oil and gas pipelines are more easily captured and identified by the image reconstruction model, abnormal conditions of the pipeline such as leakage, damage and corrosion are more easily detected, and the safety and monitoring effect of the pipeline are improved. In addition, the high-resolution reconstructed image can provide more detailed geographic information and landform features, which facilitates pipeline planning and design and makes pipeline planning and design more accurate and reliable.
In the embodiments of the present disclosure, the training images may be obtained by SAR technology. To ensure that the trained image reconstruction model can perform super-resolution reconstruction well on pipeline optical remote sensing images of different areas, the training images of the embodiments of the present disclosure may come from different areas, thereby improving the representativeness of the training images.
The training images are then preprocessed. In the embodiments of the present disclosure, in addition to conventional denoising, speckle removal and image registration, specific preprocessing methods such as filtering, polarization correction, median filtering, Gaussian filtering and histogram equalization may be used to eliminate artifacts and interference in the training images and improve the quality of the database.
The training image is then subjected to data enhancement and data splitting.
Through operations such as rotation, translation, flipping and noise addition, the image data set is expanded and the generalization capability of the image reconstruction model is increased. In addition, data enhancement can be performed by means of random flipping, random rotation, random translation, color disturbance, random noise addition, sample mixing, color space transformation and the like, which is not limited in the embodiments of the present specification.
Then, a large number of training images are divided into a training set, a verification set and a test set according to a preset proportion to train the image reconstruction model, and to ensure the representativeness and fairness of the data set, a cross-validation method may be adopted to better evaluate the performance of the image reconstruction model. One possible implementation of this step is sketched below.
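A minimal sketch of the augmentation and dataset-splitting step just described, assuming torchvision transforms and a standard random split; the split ratio, the transform parameters and the helper name split_dataset are assumptions for illustration only.

```python
import torch
from torch.utils.data import random_split
from torchvision import transforms

# Assumed augmentation pipeline: random flips, rotations, translations and color jitter
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomAffine(degrees=15, translate=(0.05, 0.05)),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
])

def split_dataset(dataset, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split the remote sensing image dataset into training/validation/test sets (assumed ratio)."""
    n = len(dataset)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    n_test = n - n_train - n_val
    return random_split(dataset, [n_train, n_val, n_test],
                        generator=torch.Generator().manual_seed(seed))
```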
A generative adversarial network is then constructed. The generative adversarial network structure is based on the generator framework of SRGAN and is mainly divided into three parts: global feature extraction, local feature extraction and image reconstruction. The image reconstruction is based on the global features, the local features and the image information acquired through the network; its principle is to adjust the dimensions of the feature parameters of the training image through several convolution layers to obtain the reconstructed image. Image reconstruction is a common method in the art and is not described again in the embodiments of the present disclosure.
Fig. 3 is a schematic structural diagram of an RRNL-SRGAN generator network according to an embodiment of the present disclosure. The LR image is a low-resolution image obtained by downsampling, that is, the training image. Convolution calculation is first performed through a standard 2D convolution layer, for example a 9×9 convolution layer, to obtain the global features. Calculation is then performed through a PReLU activation function layer, local feature extraction is performed on the global features through the dense receptive field network formed by connecting a residual block and three RRNL blocks in series and the RFB module, calculation is performed through a Normalization layer, and finally the reconstructed image SR is obtained through an Upsample layer.
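Read together with fig. 3, the following sketch shows one way the generator forward pass described above could be assembled in PyTorch, reusing the RRNLBlock, NonLocalBlock and RFBModule sketches given earlier. The 9×9 head, the upscaling by PixelShuffle, the ×4 scale factor and the channel count are assumptions consistent with SRGAN-style generators rather than values fixed by this embodiment.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block with the batch-normalization layers removed."""
    def __init__(self, channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.block(x)

class RRNLSRGANGenerator(nn.Module):
    """Generator: 9x9 conv + PReLU (global features), residual block + 3 RRNL blocks + RFB
    (local features, dense receptive field network), then upsampling and reconstruction."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, channels, 9, padding=4), nn.PReLU())
        self.body = nn.Sequential(
            ResidualBlock(channels),
            *[RRNLBlock(channels, non_local=NonLocalBlock(channels)) for _ in range(3)],
            RFBModule(channels),
        )
        upsample = []
        for _ in range(scale // 2):            # e.g. x4 realised as two x2 PixelShuffle stages
            upsample += [nn.Conv2d(channels, channels * 4, 3, padding=1),
                         nn.PixelShuffle(2), nn.PReLU()]
        self.tail = nn.Sequential(*upsample, nn.Conv2d(channels, 3, 9, padding=4))

    def forward(self, lr):
        global_feat = self.head(lr)            # global feature extraction
        local_feat = self.body(global_feat)    # local feature extraction
        return self.tail(global_feat + local_feat)   # image reconstruction (SR output)
```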
A discriminator network is then constructed, and the reconstructed image and the training image are input into the discriminator network for calculation to obtain the discrimination result. The discriminator network is used to distinguish between a true high-resolution image and a reconstructed image generated by the generator network. The structure of the discriminator network according to the embodiments of the specification is shown in fig. 4, where HR represents a true high-resolution image and SR represents a reconstructed image. The discriminator network is mainly composed of eight feature extraction blocks (each of which mainly consists of Conv, BN and LeakyReLU); the convolution layers in the feature extraction blocks use a convolution kernel size of 3 and padding of 1, with strides alternating between 1 and 2, and the number of feature maps is increased from 64 to 1024. At the end of the discriminator, the final discrimination result is obtained through an adaptive average pooling layer and a LeakyReLU function.
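Under the same PyTorch assumption, the discriminator described above could be sketched as follows. The exact stride pattern and channel progression from 64 to 1024 over the eight blocks, as well as the small head after the adaptive average pooling, are an illustrative reading of fig. 4 rather than a definitive implementation.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Eight Conv-BN-LeakyReLU feature extraction blocks (3x3 kernels, padding 1, strides
    alternating between 1 and 2, channels 64 -> 1024), then adaptive average pooling and
    a small convolutional head (assumed) producing one score per image."""
    def __init__(self, in_channels=3):
        super().__init__()
        cfg = [(in_channels, 64, 1), (64, 64, 2), (64, 128, 1), (128, 128, 2),
               (128, 256, 1), (256, 256, 2), (256, 512, 2), (512, 1024, 2)]
        blocks = []
        for c_in, c_out, stride in cfg:
            blocks += [nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(1024, 1024, 1), nn.LeakyReLU(0.2, inplace=True),
                                  nn.Conv2d(1024, 1, 1))

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)   # one discrimination score per input
```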
A loss value is then calculated according to the training image, the reconstructed image and the discrimination result, and the above steps are iterated until the loss value meets the preset requirement, at which point the generator network is taken as the image reconstruction model. The number of training iterations can be set manually, and the loss value is used to evaluate the accuracy of the image reconstruction model, so that the image reconstruction model is trained and learns in the correct direction.
According to one embodiment of the present disclosure, as shown in fig. 5, the local feature extraction is performed on the global feature by using a dense receptive field network formed by connecting a residual block with three RRNL blocks in series and an RFB module, where obtaining the local feature further includes:
step 501: inputting the global features into a residual block for feature extraction to obtain first features;
step 502: inputting the first feature into three RRNL blocks connected in series to perform feature extraction, and taking the output result of the last RRNL block as a second feature;
step 503: and inputting the second characteristic to an RFB module for characteristic extraction to obtain the local characteristic.
In the embodiments of the present specification, compared with the original SRGAN structure, the residual block module removes the Batch Normalization (BN) layer to reduce the introduction of artifacts and thereby enhance the generalization capability of the network. Continuing with FIG. 3, among the three RRNL blocks connected in series, the output of each RRNL block is used as the input of the next RRNL block, the output result of the last RRNL block is used as the second feature, and feature extraction is then performed through the RFB module to obtain the local features.
According to an embodiment of the present disclosure, a schematic structure of the RRNL block is shown in fig. 6. Each RRNL block is formed by connecting an RRDB structure with a Non-local structure in series, wherein the schematic structure of the RRDB block is shown in fig. 7; the RRDB structure is formed by connecting three residual stacking blocks (Dense blocks) in series and is densely connected to effectively expand the network capacity. A schematic diagram of the structure of the Dense block is shown in fig. 8.
The calculating step of any one of the RRNL blocks includes:
inputting the input value of the RRNL block into three serially connected residual stacking blocks for feature extraction;
and inputting the features output by the last residual stacking block into a Non-local structure for feature extraction, and taking the features extracted by the Non-local structure as the output value of the RRNL block.
Further, a schematic diagram of the Non-local structure is shown in fig. 9. The Non-local structure directly calculates the relationship between two positions through non-local operations to quickly capture long-range dependencies. The Non-local module is used to model long-range relationships between pixels, regardless of how far apart they are in the image. In image super-resolution, the relationship between pixels is important for recovering lost high-frequency details. Through the Non-local module, the network can better understand the dependency relationships between different pixels, so that details can be recovered more accurately. This helps generate a more natural, clearer high-resolution image.
Inputting the feature output by the last residual stacking block into a Non-local structure for feature extraction further comprises:
carrying out convolution calculation on the features output by the last residual stacking block through three 1×1 convolution layers to obtain an input feature map;
inputting the resulting input feature map into the value branch, the key branch and the query branch of the Non-local structure respectively for calculation:
wherein the value branch is used for mapping the input feature map to a value feature map to obtain a feature value for calculating each pixel position, the feature map comprises feature representations of different positions in the original image, and when the attention weighted value is calculated, the value features (value) are multiplied by the attention distribution to calculate the weighted value;
the key branches are used for mapping the input feature map to a key feature map through convolution operation to obtain key features of each pixel position, wherein each pixel position corresponds to one key feature, and the key features are used for calculating the attention distribution. The key features contain association information between pixels;
the query branch is used for mapping the input feature map to a query feature map through convolution operation to obtain query features of each pixel position, wherein each pixel position corresponds to one query feature, and the query features are used for calculating the attention distribution. The query feature is used for measuring the association degree between pixels;
performing matrix multiplication on the query features output by the query branch and the reshaped key features output by the key branch to obtain a relation matrix, and inputting the product result (the relation matrix) to the softmax layer of the Non-local structure for normalization processing to obtain an attention map between each pixel position in the input feature map and the other pixel positions in the input feature map;
and multiplying the feature values output by the value branch by the attention map to obtain the features extracted by the Non-local structure, wherein the feature of each point in the output result is related to all other points through the attention map and therefore carries global context information.
Finally, an RFB mechanism is introduced to simulate the receptive field of human vision so as to strengthen the feature extraction capability of the network. The RFB structure as a whole draws on Inception, replaces dense links with sparse connections, and additionally introduces three dilated convolution layers. Specifically, a structural schematic diagram of the RFB module is shown in fig. 10, where the RFB module includes three branches: the first branch includes a convolution layer with a 1×1 convolution kernel and a convolution layer with a 3×3 convolution kernel and a rate of 1; the second branch includes a convolution layer with a 1×1 convolution kernel, a convolution layer with a 3×3 convolution kernel, and a convolution layer with a 3×3 convolution kernel and a rate of 3; and the third branch includes a convolution layer with a 1×1 convolution kernel, a convolution layer with a 5×5 convolution kernel, and a convolution layer with a 3×3 convolution kernel and a rate of 5;
inputting the second feature to an RFB module for feature extraction, and obtaining the local feature further comprises:
Inputting the second features into a first branch, a second branch and a third branch respectively for calculation;
inputting the calculation results of the first branch, the second branch and the third branch to a concatenation (concat) layer for feature splicing and fusion;
and inputting the output of the first-layer 1×1 convolution layer in the third branch, which is carried across the middle three-layer network through a shortcut operation, together with the result of the feature splicing and fusion of the concatenation layer, into a ReLU activation function layer for calculation to obtain the local features.
According to one embodiment of the present description, the loss values include a generator network loss value and a discriminator network loss value.
The formula for calculating the generator network loss value is formula (1):
GLoss = Min(λ_a L_a + λ_p L_p + λ_i L_i + λ_tv L_tv) (1)
wherein GLoss represents the generator network loss value; λ_a, λ_p, λ_i and λ_tv are all constant coefficients, set to 0.001, 0.006, 1 and 2e-8 in this order in the examples of the present specification; and L_a, L_p, L_i and L_tv are the adversarial loss, the perceptual loss, the image mean square error loss and the total variation loss, respectively.
The adversarial loss L_a is given by formula (2), wherein x_real is the input real image information, x_fake is the reconstructed image generated by the generator network, one expectation term denotes taking the mean over the training images and the other denotes taking the mean over the reconstructed images, and the discriminator is used in both terms to measure the gap between the reconstructed image x_fake and the real image x_real. The value inside the Log function is the output of the discriminator network, typically a value representing the result of the discriminator's judgment on the input sample, which may be a probability value or a score.
The discriminator is a discrimination network that is trained continuously; through the discriminator (discrimination network), one value can be output for an input value (image).
The perceptual loss L_p is given by formula (3):
L_p = (1/N) Σ_{i=1}^{N} || Vgg((x_fake)_i) - Vgg((x_real)_i) ||_1 (3)
wherein (x_fake)_i represents the i-th reconstructed image, (x_real)_i represents the i-th real image, Vgg(·) represents the loss network (the loss network described in the embodiments of this specification is a Vgg-19 pre-training network), N is the training batch size, representing the number of samples used to update the model weights in one iteration, and ||·||_1 represents the L1 norm.
The image mean square error loss L_i is given by formula (4):
L_i = (1/N) Σ_{i=1}^{N} || (x_fake)_i - (x_real)_i ||_1 (4)
wherein N is the training batch size, representing the number of samples used to update the model weights in one iteration, ||·||_1 represents the L1 norm, (x_fake)_i represents the i-th reconstructed image, and (x_real)_i represents the i-th real image.
The total variation loss L_tv is given by formula (5):
L_tv = Σ_{i,j} sqrt( (x_{i,j} - x_{i,j-1})^2 + (x_{i+1,j} - x_{i,j})^2 ) (5)
wherein x_{i,j} is the pixel in the i-th row and j-th column of the image, x_{i,j-1} is the pixel to its left, and x_{i+1,j} is the pixel below it; the total variation loss is the sum, over all pixels, of the square root of the sum of the squared differences between each pixel and its left and lower neighbouring pixels.
The formula for calculating the discriminator network loss value is formula (6):
DLoss = Max(D(x_real) - 1 - D(x_fake)) (6)
wherein DLoss represents the discriminator network loss value, Max represents taking the maximum value, D represents the discriminator network, x_real represents the training image, x_fake = G(z), x_fake represents the reconstructed image constructed by the generator network based on the training image x_real, G represents the generator network, and z represents noise.
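A sketch of how the loss terms of expressions (1) to (6) might be computed in PyTorch. The VGG-19 feature layer used for the perceptual loss, the small epsilon in the total variation term and the adversarial term of the generator loss are assumptions; expression (2) is not reproduced above, so the adversarial term below is only a schematic placeholder.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Assumed perceptual-loss feature extractor: VGG-19 features up to an intermediate layer
_vgg = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(x_fake, x_real):
    """L_p: L1 distance between VGG-19 features of reconstructed and real images, expression (3)."""
    return F.l1_loss(_vgg(x_fake), _vgg(x_real))

def image_loss(x_fake, x_real):
    """L_i: per-pixel L1 distance between reconstructed and real images, expression (4)."""
    return F.l1_loss(x_fake, x_real)

def tv_loss(x):
    """L_tv: total variation over left and lower pixel differences, expression (5) (eps for stability)."""
    dh = x[:, :, 1:, :-1] - x[:, :, :-1, :-1]
    dw = x[:, :, :-1, 1:] - x[:, :, :-1, :-1]
    return torch.sqrt(dh.pow(2) + dw.pow(2) + 1e-8).sum()

def generator_loss(x_fake, x_real, d_fake, la=0.001, lp=0.006, li=1.0, ltv=2e-8):
    """GLoss of expression (1); the adversarial term here is only a placeholder assumption."""
    adv = -torch.log(torch.sigmoid(d_fake) + 1e-8).mean()
    return (la * adv + lp * perceptual_loss(x_fake, x_real)
            + li * image_loss(x_fake, x_real) + ltv * tv_loss(x_fake))

def discriminator_loss(d_real, d_fake):
    """DLoss of expression (6): maximize D(x_real) - 1 - D(x_fake), i.e. minimize its negative."""
    return -(d_real - 1.0 - d_fake).mean()
```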
According to one embodiment of the present description, a Nadam optimizer is used in the training process of the image reconstruction model, and the learning rate is adjusted as the number of training iterations increases in order to stabilize the training process, for example as in the sketch below.
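As an illustration of the optimizer configuration just mentioned, a Nadam optimizer with a decaying learning rate could be set up as follows; the initial learning rate and the step schedule are assumptions, not values specified in this embodiment.

```python
import torch

def configure_optimizer(model, lr=1e-4):
    """Nadam optimizer with a step-decay learning-rate schedule (assumed: halve every 50 epochs)."""
    optimizer = torch.optim.NAdam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
    return optimizer, scheduler

# Usage: call scheduler.step() once per epoch so the learning rate
# changes as the number of training iterations increases.
```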
In accordance with one embodiment of the present description, to verify the feasibility of a trained image reconstruction model, two indicators, peak signal to noise ratio (Peak Signal to Noise Ratio, PSNR) and structural similarity (Structural Similarity Index Measure, SSIM), are selected to evaluate the feasibility of the model.
Specifically, PSNR is an index that measures image quality and represents the ratio of the maximum possible power of a signal to the power of the destructive noise that affects the accuracy of its representation. It is obtained from the MSE (mean square error), whose specific formula is shown in (7):
MSE = (1/(m·n)) Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [ I(i,j) - K(i,j) ]^2 (7)
where I and K are two m×n monochromatic images, one of which is a noisy approximation of the other, m and n are the image dimensions, and i and j are the pixel coordinates.
The specific formula of PSNR is shown in (8):
PSNR = 10 · log10( MAX_I^2 / MSE ) (8)
wherein MAX_I represents the maximum possible pixel value of the image, which is 255 if each sample point is represented by 8 bits. The smaller the MSE, the larger the PSNR; in general, a larger PSNR represents better image quality.
SSIM is an index for measuring the similarity of two pictures. Its inputs are two images, one an undistorted image and the other a restored image, denoted x and y respectively. The SSIM formula is given by (9):
SSIM(x, y) = [l(x, y)]^α [c(x, y)]^β [s(x, y)]^γ (9)
wherein α, β and γ are all greater than 0, and l(x, y), c(x, y) and s(x, y) are the luminance comparison, contrast comparison and structure comparison respectively, with specific formulas (10)-(12):
l(x, y) = (2 μ_x μ_y + c_1) / (μ_x^2 + μ_y^2 + c_1) (10)
c(x, y) = (2 σ_x σ_y + c_2) / (σ_x^2 + σ_y^2 + c_2) (11)
s(x, y) = (σ_xy + c_3) / (σ_x σ_y + c_3) (12)
wherein μ_x and μ_y are the means of x and y, σ_x and σ_y are the standard deviations of x and y, σ_xy is the covariance of x and y, and c_1, c_2 and c_3 are constants used to avoid a zero denominator. In actual engineering it is common to set α = β = γ = 1 and c_3 = c_2 / 2, so that SSIM simplifies to form (13):
SSIM(x, y) = (2 μ_x μ_y + c_1)(2 σ_xy + c_2) / ((μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)) (13)
It can thus be seen that SSIM is a number between 0 and 1; the closer it is to 1, the smaller the difference between the output image and the undistorted image.
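The evaluation indicators above can be computed directly. The following NumPy sketch follows expressions (7), (8) and (13); computing SSIM globally over the whole image in a single window, and choosing c1 and c2 by the usual (0.01·MAX_I)^2 and (0.03·MAX_I)^2 convention, are assumptions, since the specification only states that these constants avoid a zero denominator.

```python
import numpy as np

def psnr(img_i, img_k, max_i=255.0):
    """PSNR from expressions (7) and (8): 10*log10(MAX_I^2 / MSE)."""
    mse = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)

def ssim(x, y, max_i=255.0):
    """Global (single-window) SSIM from expression (13); c1, c2 use the common convention."""
    c1, c2 = (0.01 * max_i) ** 2, (0.03 * max_i) ** 2
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```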
The PSNR and SSIM means for the different methods are shown in Table 1:
TABLE 1 mean values of PSNR and SSIM under different methods
In terms of subjective effect, the reconstruction performance of the non-local dense receptive-field adversarial model proposed herein on the pipeline remote sensing images can be shown more intuitively through the images. Figs. 15-16 show the reconstruction comparison results of each super-resolution algorithm on the pipeline remote sensing images, wherein (a)-(c) in fig. 15 and (d)-(f) in fig. 16 are the input images, (a1)-(c1) in fig. 15 and (d1)-(f1) in fig. 16 are images reconstructed by the bicubic algorithm, (a2)-(c2) in fig. 15 and (d2)-(f2) in fig. 16 are images reconstructed by the nearest algorithm, (a3)-(c3) in fig. 15 and (d3)-(f3) in fig. 16 are images reconstructed by srcnn, (a4)-(c4) in fig. 15 and (d4)-(f4) in fig. 16 are images reconstructed by srgan, and (a5)-(c5) in fig. 15 and (d5)-(f5) in fig. 16 are images reconstructed by the model proposed herein. Fig. 15 is a comparison of the visual effects of reconstructed images including houses and buildings. Fig. 15 (a) is a remote sensing image including storage tanks, houses and greenery; it can be seen that (a1), (a2) and (a3) in fig. 15 are very blurry at the edges of the tanks, while (a4) and (a5) in fig. 15 restore the edge information of the tanks better, and for both the division of the roof grid and the edge area of the tanks, the texture of (a5) in fig. 15 is clearer and more vivid than that of (a4) in fig. 15. Fig. 15 (b) is a remote sensing image including a residential site and houses; the processing of the site markings in (b1), (b2) and (b3) in fig. 15 is quite blurry, (b4) and (b5) in fig. 15 look more realistic and restore the white lines on the site, and (b5) in fig. 15 handles the edge demarcation of the houses more clearly than (b4) in fig. 15. Fig. 15 (c) is a remote sensing image including houses and roads; compared with (c1), (c2), (c3) and (c4) in fig. 15, the content shown in (c5) in fig. 15 is clearer and the boundaries are well defined, which better suits the viewing habits of the human eye.
Fig. 16 is a comparison of the visual effects of reconstructed images including river and green areas. It can be seen from fig. 16 that the algorithm proposed herein extracts texture better than the other algorithms. For example, (d) in fig. 16 is a remote sensing image including water areas and fields; (d4) and (d5) in fig. 16 delineate the boundary between water and land more clearly than (d1), (d2) and (d3) in fig. 16, and (d5) in fig. 16 reconstructs the cultivation texture of the fields more realistically and clearly than (d4) in fig. 16 before the improvement. Fig. 16 (e) is a remote sensing image including a water area and a green area, from which it can be seen that (e5) in fig. 16 has a more pronounced reconstruction effect than the first four images. Fig. 16 (f) is a remote sensing image of terraced fields; the images reconstructed by the bicubic, nearest and srcnn algorithms perform poorly on detailed texture and are processed too smoothly, and their visual effect is far inferior to that of the image reconstructed by the model proposed herein. In particular, as shown in (f5) in fig. 16, the layered structure of the terraced fields is restored relatively faithfully.
In addition, the quality and realism of the reconstructed image may also be verified by manual subjective evaluation or expert evaluation, and the embodiment of the present specification is not limited.
Based on the same inventive concept, the embodiment of the present disclosure further provides a training device for an image reconstruction model, as shown in fig. 11, including:
the generator network training unit 1101 is configured to input a training image into a generator network for calculation to obtain a reconstructed image, where the generator network includes global feature extraction, local feature extraction and image reconstruction, and the global feature extraction includes global feature extraction on the training image through a first convolution layer to obtain global features; the local feature extraction comprises local feature extraction of the global feature through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain a local feature; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
the discriminator network training unit 1102 is configured to input the reconstructed image and the training image into the discriminator network for calculation to obtain a discrimination result;
a loss value calculating unit 1103 for calculating a loss value according to the training image, the reconstructed image and the discrimination result;
an iterative training unit 1104, configured to iterate the above steps until the loss value meets a predetermined requirement, and to take the generator network as the image reconstruction model; a minimal code sketch of how these units cooperate is given below.
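As an illustrative aside, the following sketch shows one way the four units above might cooperate in code. It assumes PyTorch and hypothetical generator, discriminator, loss-function and optimizer objects supplied by the caller; none of these names, the stopping threshold, or the iteration budget are prescribed by this embodiment.

```python
def train_step(generator, discriminator, g_opt, d_opt,
               train_img, g_loss_fn, d_loss_fn):
    # Generator network training unit: compute a reconstructed image from the training image.
    recon_img = generator(train_img)

    # Discriminator network training unit: judge the reconstructed image against the training image.
    d_opt.zero_grad()
    d_real = discriminator(train_img)
    d_fake = discriminator(recon_img.detach())
    d_loss = d_loss_fn(d_real, d_fake)      # loss value calculation unit (discriminator side)
    d_loss.backward()
    d_opt.step()

    # Loss value calculation unit (generator side), followed by a parameter update.
    g_opt.zero_grad()
    g_loss = g_loss_fn(recon_img, train_img, discriminator(recon_img))
    g_loss.backward()
    g_opt.step()
    return g_loss.item(), d_loss.item()


def train(generator, discriminator, g_opt, d_opt, loader,
          g_loss_fn, d_loss_fn, target_loss=1e-3, max_iters=100_000):
    # Iterative training unit: repeat until the loss value meets the predetermined requirement.
    for step, train_img in zip(range(max_iters), loader):
        g_loss, _ = train_step(generator, discriminator, g_opt, d_opt,
                               train_img, g_loss_fn, d_loss_fn)
        if g_loss <= target_loss:
            break
    return generator  # the trained generator network serves as the image reconstruction model
```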
Since the principle of the device for solving the problem is similar to that of the method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Based on the same inventive concept, the embodiments of the present disclosure further provide an image reconstruction method, as shown in fig. 12, including:
step 1201: receiving a real image of an image to be reconstructed;
step 1202: and inputting the real image into an image reconstruction model for processing to obtain a reconstructed image.
In this step, the image reconstruction model is obtained by using the training method of the image reconstruction model described above.
In the embodiment of the present specification, the real image of the image to be reconstructed is first subjected to preprocessing such as denoising, speckle removal and filtering, and the preprocessed image is then input into the image reconstruction model for processing, so as to generate the reconstructed image.
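By way of illustration only, the inference step described above might look like the following sketch; the file path, the `preprocess` helper and the assumed (1, C, H, W) tensor layout are hypothetical placeholders, not details fixed by this embodiment.

```python
import torch

def preprocess(real_image: torch.Tensor) -> torch.Tensor:
    # Placeholder for denoising / speckle removal / filtering of the received real image.
    return real_image

def reconstruct(model_path: str, real_image: torch.Tensor) -> torch.Tensor:
    # Load the trained generator network saved as the image reconstruction model.
    model = torch.load(model_path, map_location="cpu")
    model.eval()
    with torch.no_grad():
        # real_image is assumed to be a (1, C, H, W) tensor.
        return model(preprocess(real_image))
```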
Based on the same inventive concept, the embodiments of the present disclosure further provide an image reconstruction apparatus, as shown in fig. 13, including:
a real image receiving unit 1301 configured to receive a real image of an image to be reconstructed;
an image reconstruction unit 1302, configured to input the real image into an image reconstruction model for processing, so as to obtain a reconstructed image, where the image reconstruction model is obtained by using the above-mentioned training method of the image reconstruction model.
Since the principle of the device for solving the problem is similar to that of the method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Fig. 14 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure, where an apparatus in the present disclosure may be the computer device in the present embodiment, and perform the method in the present disclosure. Computer device 1402 may include one or more processing devices 1404, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 1402 may also include any storage resources 1406 for storing any kind of information, such as code, settings, data, etc. For example, and without limitation, storage resource 1406 may include any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any storage resource may store information using any technology. Further, any storage resource may provide volatile or non-volatile retention of information. Further, any storage resources may represent fixed or removable components of computer device 1402. In one case, when the processing device 1404 executes associated instructions stored in any storage resource or combination of storage resources, the computer device 1402 can perform any of the operations of the associated instructions. The computer device 1402 also includes one or more drive mechanisms 1408, such as a hard disk drive mechanism, an optical disk drive mechanism, and the like, for interacting with any storage resources.
Computer device 1402 may also include an input/output module 1410 (I/O) for receiving various inputs (via input device 1412) and for providing various outputs (via output device 1414). One particular output mechanism may include a presentation device 1416 and an associated Graphical User Interface (GUI) 1418. In other embodiments, the input/output module 1410 (I/O), the input device 1412 and the output device 1414 may be omitted, the computer device then acting only as one computer device in a network. Computer device 1402 may also include one or more network interfaces 1420 for exchanging data with other devices via one or more communication links 1422. One or more communication buses 1424 couple together the above-described components.
The communication link 1422 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication link 1422 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
The present description embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method.
The embodiments of the present specification also provide computer-readable instructions which, when executed by a processor, cause the processor to perform the above-described method.
It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
It should also be understood that, in the embodiments of the present specification, the term "and/or" merely describes an association relationship between associated objects, meaning that three relationships may exist. For example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In the embodiments of the present specification, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the various illustrative elements and steps have been described above generally in terms of function in order to best explain the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in the embodiments of this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present description.
In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present specification, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present specification. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The principles and implementations of the embodiments of the present specification are explained by applying specific embodiments in the embodiments of the present specification, and the above explanation of the embodiments is only for helping to understand the methods of the embodiments of the present specification and the core ideas thereof; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope based on the ideas of the embodiments of the present specification, the contents of the present specification should not be construed as limiting the embodiments of the present specification in view of the above.

Claims (10)

1. A method of training an image reconstruction model, the method comprising:
inputting the training image into a generator network for calculation to obtain a reconstructed image, wherein the generator network comprises global feature extraction, local feature extraction and image reconstruction, and the global feature extraction comprises global feature extraction of the training image through a first convolution layer to obtain global features; the local feature extraction comprises local feature extraction of the global feature through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain a local feature; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
Inputting the reconstructed image and the training image into a discriminator network for calculation to obtain a discrimination result;
calculating a loss value according to the training image, the reconstructed image and the discrimination result;
and iterating the steps until the loss value meets the preset requirement, and taking the generator network as an image reconstruction model.
2. The method of claim 1, wherein extracting local features from the global features by a dense receptive field network of residual blocks in series with three RRNL blocks and RFB modules further comprises:
inputting the global features into a residual block for feature extraction to obtain first features;
inputting the first feature into three RRNL blocks connected in series to perform feature extraction, and taking the output result of the last RRNL block as a second feature;
and inputting the second characteristic to an RFB module for characteristic extraction to obtain the local characteristic.
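As an illustrative aside (not part of the claim language), the local-feature path of claim 2 could be wired up as in the following PyTorch sketch; the `residual_block`, `rrnl_block` and `rfb_module` factories are assumed stand-ins for the concrete blocks described elsewhere in this specification.

```python
import torch.nn as nn

class DenseReceptiveFieldNetwork(nn.Module):
    """Residual block -> three serial RRNL blocks -> RFB module."""

    def __init__(self, channels, residual_block, rrnl_block, rfb_module):
        super().__init__()
        self.residual = residual_block(channels)                                     # yields the first feature
        self.rrnl_chain = nn.Sequential(*[rrnl_block(channels) for _ in range(3)])   # last output is the second feature
        self.rfb = rfb_module(channels)                                              # yields the local feature

    def forward(self, global_feature):
        first_feature = self.residual(global_feature)
        second_feature = self.rrnl_chain(first_feature)
        return self.rfb(second_feature)
```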
3. The method of claim 2, wherein each RRNL block is formed by concatenating an RRDB structure with a Non-local structure, the RRDB structure being formed by concatenating three residual stacked blocks;
the calculating step of any one of the RRNL blocks includes:
Inputting the input value of the RRNL block into three serially connected residual stacking blocks for feature extraction;
and inputting the features output by the last residual stacking block into a Non-local structure for feature extraction, and taking the features extracted by the Non-local structure as the output value of the RRNL block.
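Again purely as an illustration, an RRNL block as stated in claim 3 might be composed as follows; `residual_stack_block` and `non_local_block` are assumed placeholders for the residual stacking blocks of the RRDB structure and the Non-local structure.

```python
import torch.nn as nn

class RRNLBlock(nn.Module):
    """RRDB structure (three serial residual stacking blocks) followed by a Non-local structure."""

    def __init__(self, channels, residual_stack_block, non_local_block):
        super().__init__()
        self.rrdb = nn.Sequential(*[residual_stack_block(channels) for _ in range(3)])
        self.non_local = non_local_block(channels)

    def forward(self, x):
        # Features output by the last residual stacking block are refined by the Non-local structure.
        return self.non_local(self.rrdb(x))
```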
4. A method according to claim 3, wherein inputting the features of the last residual stack block output to a Non-local structure for feature extraction further comprises:
carrying out convolution calculation on the features output by the last residual stacking block through three 1×1 convolution layers to obtain an input feature map;
inputting the first convolution result into a value branch, a key branch and a query branch of the Non-local structure respectively for calculation, wherein the value branch is used for mapping the input feature map to a value feature map to obtain a feature value for calculating each pixel position, the key branch is used for mapping the input feature map to a key feature map through convolution operation to obtain a key feature of each pixel position, and the query branch is used for mapping the input feature map to a query feature map through convolution operation to obtain a query feature of each pixel position;
Multiplying the query feature output by the query branch and the key feature output by the key branch, and inputting the product result to a softmax layer of the Non-local structure for normalization processing to obtain an attention map between a pixel position in the input feature map and other pixel positions except the pixel position in the input feature map;
and multiplying the feature value output by the value branch by the attention map to obtain the feature extracted by the Non-local structure.
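The attention computation of claim 4 can be illustrated with the following PyTorch sketch; the inner channel width, the output projection and the residual connection are assumptions added so the module runs end to end, not elements recited by the claim.

```python
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Non-local structure: 1x1 convolutions form query/key/value feature maps,
    a softmax attention map relates each pixel position to all others,
    and the feature values are weighted by that attention map."""

    def __init__(self, channels, inner_channels=None):
        super().__init__()
        inner = inner_channels or max(channels // 2, 1)
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, inner, kernel_size=1)
        self.out = nn.Conv2d(inner, channels, kernel_size=1)  # projection back to the input width (assumption)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C'): query feature per pixel position
        k = self.key(x).flatten(2)                      # (B, C', HW): key feature per pixel position
        v = self.value(x).flatten(2).transpose(1, 2)    # (B, HW, C'): feature value per pixel position

        attn = F.softmax(q @ k, dim=-1)                 # attention map between each pixel and every other pixel
        y = attn @ v                                    # attention-weighted feature values
        y = y.transpose(1, 2).reshape(b, -1, h, w)
        return self.out(y) + x                          # residual connection is an assumption
```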
5. A method according to claim 3, wherein the RFB module comprises three branches, the first branch comprising one convolution layer with a convolution kernel of 1 x 1, one convolution layer with a convolution kernel of 3 x 3 and a rate of 1, the second branch comprising one convolution layer with a convolution kernel of 1 x 1, one convolution layer with a convolution kernel of 3 x 3 and a rate of 3, the third branch comprising one convolution layer with a convolution kernel of 1 x 1, one convolution layer with a convolution kernel of 5 x 5 and one convolution layer with a convolution kernel of 3 x 3 and a rate of 5;
inputting the second feature to an RFB module for feature extraction, and obtaining the local feature further comprises:
Inputting the second features into a first branch, a second branch and a third branch respectively for calculation;
inputting the calculation results of the first branch, the second branch and the third branch to a concatenation layer for feature splicing and fusion;
and inputting the output of the first 1×1 convolution layer in the third branch, together with the features spliced and fused by the concatenation layer, into a ReLU activation function layer for calculation to obtain the local feature.
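For illustration, the three-branch structure of claim 5 might be realized as below; the padding values, the 1×1 fusion convolution and the uniform channel widths are assumptions needed to keep tensor shapes consistent and are not dictated by the claim.

```python
import torch
import torch.nn as nn

class RFBModule(nn.Module):
    """Receptive Field Block: three dilated-convolution branches, concatenation and fusion,
    then a ReLU over the fused features plus a shortcut from the third branch's 1x1 output."""

    def __init__(self, channels):
        super().__init__()
        c = channels
        self.branch1 = nn.Sequential(
            nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, 3, padding=1, dilation=1))
        self.branch2 = nn.Sequential(
            nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, 3, padding=3, dilation=3))
        self.branch3_reduce = nn.Conv2d(c, c, 1)          # first 1x1 layer of the third branch
        self.branch3_rest = nn.Sequential(
            nn.Conv2d(c, c, 5, padding=2),
            nn.Conv2d(c, c, 3, padding=5, dilation=5))
        self.fuse = nn.Conv2d(3 * c, c, 1)                # fuses the concatenated branches (assumption)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        b1 = self.branch1(x)
        b2 = self.branch2(x)
        b3_in = self.branch3_reduce(x)
        b3 = self.branch3_rest(b3_in)
        fused = self.fuse(torch.cat([b1, b2, b3], dim=1))
        return self.relu(fused + b3_in)                   # shortcut from the third branch's 1x1 output
```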
6. The method of claim 1, wherein the loss value includes a generator network loss value and a discriminator network loss value;
the formula for calculating the generator network loss value is:
GLoss = Min(λ_a·L_a + λ_p·L_p + λ_i·L_i + λ_tv·L_tv)
wherein GLoss represents the generator network loss value; λ_a, λ_p, λ_i and λ_tv are all constant coefficients; and L_a, L_p, L_i and L_tv are the adversarial loss, the perceptual loss, the image mean square error loss and the total variation loss, respectively;
the formula for calculating the loss value of the discriminator network is as follows:
DLoss = Max(D(x_real) − 1 − D(x_fake))
wherein DLoss represents the discriminator network loss value, Max represents taking the maximum value, D represents the discriminator network, x_real represents the training image, x_fake = G(z), x_fake represents the reconstructed image constructed by the generator network based on the training image x_real, G represents the generator network, and z represents noise.
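A hedged, illustrative rendering of these two loss formulas in PyTorch follows; the weight values, the adversarial-loss form and the pixel-space stand-in for the perceptual loss are assumptions, since the claim only fixes the weighted-sum structure and the symbols it defines.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake, real, d_fake, weights=(1e-3, 1.0, 1.0, 2e-8)):
    """GLoss = λ_a·L_a + λ_p·L_p + λ_i·L_i + λ_tv·L_tv (the weights here are placeholders)."""
    la, lp, li, ltv = weights
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))  # adversarial loss (one common choice)
    # A perceptual loss normally compares features from a pretrained network; a pixel-space stand-in is used here.
    perceptual = F.l1_loss(fake, real)
    mse = F.mse_loss(fake, real)                                               # image mean square error loss
    tv = (fake[..., :, 1:] - fake[..., :, :-1]).abs().mean() + \
         (fake[..., 1:, :] - fake[..., :-1, :]).abs().mean()                   # total variation loss
    return la * adv + lp * perceptual + li * mse + ltv * tv

def discriminator_loss(d_real, d_fake):
    """DLoss = Max(D(x_real) - 1 - D(x_fake)); maximizing it equals minimizing its negative."""
    return -(d_real - 1 - d_fake).mean()
```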
7. A method of image reconstruction, the method comprising:
Receiving a real image of an image to be reconstructed;
inputting the real image into an image reconstruction model for processing to obtain a reconstructed image, wherein the image reconstruction model is obtained by using the training method of the image reconstruction model as set forth in any one of claims 1 to 6.
8. A training apparatus for an image reconstruction model, the apparatus comprising:
the system comprises a generator network training unit, a reconstruction unit and a reconstruction unit, wherein the generator network training unit is used for inputting training images into a generator network for calculation to obtain reconstructed images, the generator network comprises global feature extraction, local feature extraction and image reconstruction, and the global feature extraction comprises global feature extraction of the training images through a first convolution layer to obtain global features; the local feature extraction comprises local feature extraction of the global feature through a dense receptive field network formed by connecting a residual block and three RRNL blocks in series and an RFB module to obtain a local feature; the image reconstruction includes: performing image reconstruction on the global features and the local features to obtain a reconstructed image;
the discriminator network training unit is used for inputting the reconstructed image and the training image into a discriminator network for calculation to obtain a discrimination result;
The loss value calculation unit is used for calculating a loss value according to the training image, the reconstructed image and the discrimination result;
and the iterative training unit is used for iterating the steps until the loss value meets the preset requirement, and taking the generator network as an image reconstruction model.
9. An image reconstruction apparatus, the apparatus comprising:
the real image receiving unit is used for receiving a real image of the image to be reconstructed;
an image reconstruction unit for inputting the real image into an image reconstruction model for processing to obtain a reconstructed image, wherein the image reconstruction model is obtained by using the training method of the image reconstruction model according to any one of claims 1-6.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.
CN202311142367.2A 2023-09-06 2023-09-06 Training method of image reconstruction model, image reconstruction method, device and equipment Pending CN117422619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311142367.2A CN117422619A (en) 2023-09-06 2023-09-06 Training method of image reconstruction model, image reconstruction method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311142367.2A CN117422619A (en) 2023-09-06 2023-09-06 Training method of image reconstruction model, image reconstruction method, device and equipment

Publications (1)

Publication Number Publication Date
CN117422619A true CN117422619A (en) 2024-01-19

Family

ID=89531439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311142367.2A Pending CN117422619A (en) 2023-09-06 2023-09-06 Training method of image reconstruction model, image reconstruction method, device and equipment

Country Status (1)

Country Link
CN (1) CN117422619A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876692A (en) * 2024-03-11 2024-04-12 中国石油大学(华东) Feature weighted connection guided single-image remote sensing image denoising method
CN117876692B (en) * 2024-03-11 2024-05-17 中国石油大学(华东) Feature weighted connection guided single-image remote sensing image denoising method

Similar Documents

Publication Publication Date Title
CN108898560B (en) Core CT image super-resolution reconstruction method based on three-dimensional convolutional neural network
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN110074813B (en) Ultrasonic image reconstruction method and system
CN110728658A (en) High-resolution remote sensing image weak target detection method based on deep learning
CN111259898A (en) Crop segmentation method based on unmanned aerial vehicle aerial image
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN111259853A (en) High-resolution remote sensing image change detection method, system and device
CN111784560A (en) SAR and optical image bidirectional translation method for generating countermeasure network based on cascade residual errors
CN113129272A (en) Defect detection method and device based on denoising convolution self-encoder
CN116310883B (en) Agricultural disaster prediction method based on remote sensing image space-time fusion and related equipment
CN117422619A (en) Training method of image reconstruction model, image reconstruction method, device and equipment
CN114742733A (en) Cloud removing method and device, computer equipment and storage medium
CN113989100A (en) Infrared texture sample expansion method based on pattern generation countermeasure network
CN114565594A (en) Image anomaly detection method based on soft mask contrast loss
CN106845343A (en) A kind of remote sensing image offshore platform automatic testing method
CN115375548A (en) Super-resolution remote sensing image generation method, system, equipment and medium
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN116091492B (en) Image change pixel level detection method and system
CN105931184B (en) SAR image super-resolution method based on combined optimization
CN111047525A (en) Method for translating SAR remote sensing image into optical remote sensing image
CN111311732A (en) 3D human body grid obtaining method and device
CN115760603A (en) Interference array broadband imaging method based on big data technology
CN116091893A (en) Method and system for deconvolution of seismic image based on U-net network
CN113781311A (en) Image super-resolution reconstruction method based on generation countermeasure network
CN114331930A (en) Panchromatic multispectral image fusion method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination