CN116597033A - Image reconstruction method, device, equipment and medium - Google Patents

Image reconstruction method, device, equipment and medium

Info

Publication number
CN116597033A
CN116597033A (application CN202310544909.2A)
Authority
CN
China
Prior art keywords
image, encoder, similarity, resolution, inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310544909.2A
Other languages
Chinese (zh)
Inventor
李金新 (Li Jinxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202310544909.2A
Publication of CN116597033A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods

Abstract

The present disclosure provides an image reconstruction method, which relates to the field of artificial intelligence. The method comprises the following steps: inputting an image to be reconstructed to a trained encoder; and obtaining a super-resolution image output by the trained encoder. The encoder is trained in advance as follows: inputting a first image to the encoder to generate a super-resolution-reconstructed second image; inputting the second image to a decoder to generate a third image having a lower resolution; inputting the third image to the encoder to generate a super-resolution-reconstructed fourth image; and updating the encoder according to the similarity of the second image to the first image and the similarity of the fourth image to the first image. The present disclosure also provides an image reconstruction apparatus, a device, a storage medium, and a program product.

Description

Image reconstruction method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to an image reconstruction method, apparatus, device, medium, and program product.
Background
Image reconstruction refers to the process of generating, from existing image information, a new image similar to the original through a series of mathematical models and algorithms. Common image reconstruction methods include interpolation-based, transformation-based, and machine learning-based methods. Image reconstruction is commonly used in application scenarios such as image recognition, image enhancement, and image restoration, for example license plate recognition, face recognition, and medical image analysis. Image resolution directly affects the recognition effect, and recognition results on low resolution images are less accurate.
In the process of realizing the inventive concept, the inventor found that current image super-resolution reconstruction techniques incur high time and labor costs, and that the image reconstruction effect is not ideal.
Disclosure of Invention
In view of the foregoing, the present disclosure provides image reconstruction methods, apparatus, devices, media, and program products.
In one aspect of the disclosed embodiments, there is provided an image reconstruction method including: inputting an image to be reconstructed to a trained encoder; and obtaining a super-resolution image output by the trained encoder. The encoder is trained in advance as follows: inputting a first image to the encoder to generate a super-resolution-reconstructed second image; inputting the second image to a decoder to generate a third image having a lower resolution; inputting the third image to the encoder to generate a super-resolution-reconstructed fourth image; and updating the encoder according to the similarity of the second image to the first image and the similarity of the fourth image to the first image.
According to an embodiment of the present disclosure, updating the encoder according to the similarity of each of the second and fourth images to the first image includes: obtaining, with a discriminator based on a convolutional neural network, a first similarity between the second image and the first image and a second similarity between the fourth image and the first image; comparing the first similarity with the second similarity to obtain a comparison result; and updating the encoder according to the comparison result.
According to an embodiment of the present disclosure, if the similarity of the second image to the first image and that of the fourth image to the first image are not equal, the method further includes: taking the fourth image as a new second image, and cyclically performing the steps of generating a third image having a lower resolution, generating a super-resolution-reconstructed fourth image, and updating the encoder.
According to an embodiment of the present disclosure, updating the encoder according to the comparison result includes: if the first similarity is less than the second similarity minus a first threshold, updating the encoder according to a first loss function value between the second image and the fourth image; and if the first similarity is greater than the second similarity plus a second threshold, not updating the encoder.
According to an embodiment of the present disclosure, before updating the encoder according to the comparison result, the method includes: updating the decoder according to a first loss function value between the second image and the fourth image.
According to an embodiment of the present disclosure, before inputting the third image to the encoder, the method further comprises: obtaining a second loss function value between the first image and the third image; updating the encoder and the decoder according to the second loss function value.
According to an embodiment of the disclosure, the discriminator comprises a first convolutional layer, a conversion layer, and a deep convolutional neural network structure, where the conversion layer is obtained based on at least one convolutional layer, and obtaining the first similarity between the second image and the first image using the discriminator comprises: processing the second image with the first convolutional layer so that the second image is the same size as the first image; performing a dimension-raising operation on the first image and the second image through the conversion layer; and calculating the first similarity between the dimension-raised first image and second image using the deep convolutional neural network structure.
According to an embodiment of the present disclosure, the decoder includes a first convolution group, a second convolution group, and N channels, where the number of convolution layers and their kernel sizes differ between each channel and at least one other channel, N being greater than or equal to 2. Inputting the second image to the decoder to generate a third image having a lower resolution comprises: inputting the second image into the first convolution group to obtain a first feature vector; inputting the first feature vector into each of the N channels to obtain N second feature vectors; and inputting the N second feature vectors to the second convolution group to obtain the third image.
Another aspect of an embodiment of the present disclosure provides an image reconstruction apparatus, including: a first input module that inputs an image to be reconstructed to a trained encoder; and a first output module that obtains the super-resolution image output by the trained encoder. The encoder is trained in advance as follows: inputting a first image to the encoder to generate a super-resolution-reconstructed second image; inputting the second image to a decoder to generate a third image having a lower resolution; inputting the third image to the encoder to generate a super-resolution-reconstructed fourth image; and updating the encoder according to the similarity of the second image to the first image and the similarity of the fourth image to the first image.
The apparatus comprises means for performing the steps of the method as claimed in any one of the preceding claims, respectively.
Another aspect of an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages: the super-resolution reconstruction process is regarded as an encoding process, and a trained encoder processes the image to be reconstructed and outputs a super-resolution image. In pre-training the encoder, the first image is used as input, and the super-resolution-reconstructed second and fourth images, together with the lower-resolution third image, are obtained through the processing of the encoder and decoder. From the similarity of each of the second and fourth images to the first image, it can be determined whether the reconstruction effect of the encoder meets expectations and tends to be stable, and the encoder can be optimized toward producing better reconstructed images.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of an image recognition method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an image reconstruction method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a training method of an encoder according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates an architecture diagram of a training method of an encoder in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a first update of the encoder and decoder according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a second update of the encoder according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow chart of a method of training an encoder in accordance with another embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart for obtaining a first similarity according to an embodiment of the disclosure;
FIG. 9 schematically illustrates a schematic diagram of a discriminator in accordance with an embodiment of the disclosure;
FIG. 10 schematically illustrates a flowchart of obtaining a third image according to an embodiment of the disclosure;
FIG. 11 schematically illustrates an architecture diagram of a decoder according to an embodiment of the present disclosure;
FIG. 12 schematically shows a block diagram of an image reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 13 schematically illustrates a block diagram of a training device of an encoder in accordance with an embodiment of the present disclosure; and
FIG. 14 schematically illustrates a block diagram of an electronic device adapted to implement an image reconstruction method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C" are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
To facilitate an understanding of the invention by those skilled in the art, some terms or nouns involved in the various embodiments of the invention are explained below:
image resolution: the number of pixels is contained in a unit inch.
Low resolution image: the resolution of the image is low, blurring and even distorting the image.
High resolution image: the resolution of the image is high, the color is rich, and the image is free from blurring.
Super-resolution image: an image that is reconstructed from the low resolution image and is consistent with the original high resolution image size.
For example, banks currently have a large number of business scenarios requiring images, such as face recognition, including but not limited to identity confirmation when customers open cards, manage finances, or manage accounts, all of which impose requirements on the definition of the acquired images. Identity verification through face recognition, for example when handling bank cards, logging in to mobile banking, reporting loss, or changing a mobile phone number, can fail when the definition of images acquired by different devices is inconsistent. Note that the embodiments of the present disclosure are not limited to face recognition in a banking scenario: they may be used for face recognition in other scenarios, or for image reconstruction in any scenario, and are not limited to face images.
In the related art, a supervised learning method is generally used: pairs of low-resolution and high-resolution images serve as training data for learning a mapping function between them, so as to raise the resolution of low-resolution images. This approach requires human construction of image pairs, for example manually capturing or generating low-resolution or high-resolution images to compose the pairs, and relies on extensive manual processing of training data, which is costly in manpower, time, and training. During training, the artificial image pairs in the training set are fed into the model, and training depends on learning from batches of image pairs, so the information in each individual image is not fully utilized; the trained model is also limited by the training set and generalizes poorly.
In some embodiments of the present disclosure, an image reconstruction method is provided that regards the process of super-resolution reconstruction as an encoding process, processes an image to be reconstructed with a trained encoder, and outputs a super-resolution image. In pre-training the encoder, the first image is used as input, and the super-resolution-reconstructed second and fourth images, together with the lower-resolution third image, are obtained through the processing of the encoder and decoder. From the similarity of each of the second and fourth images to the first image, it can be determined whether the reconstruction effect of the encoder meets expectations and tends to be stable, and the encoder can be optimized toward producing better reconstructed images.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information (such as the user image) of the user is obtained or collected. The related processes of collecting, storing, using, processing, transmitting, providing, disclosing and applying personal information of the user all accord with the regulations of related laws and regulations, necessary security measures are taken, and the public order is not violated.
Fig. 1 schematically illustrates an application scenario diagram of an image recognition method according to an embodiment of the present disclosure. The present disclosure is not limited thereto.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like. In some embodiments, the terminal device 101, 102, 103 may have a camera unit or be capable of receiving video or images acquired from the camera unit.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the image recognition request, and feed back the processing result (for example, obtain or generate the image recognition result according to the user request) to the terminal device.
It should be noted that, the image recognition method provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The image recognition method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The image reconstruction method according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 11 based on the scene described in fig. 1.
Fig. 2 schematically shows a flowchart of an image reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 2, the image reconstruction method of this embodiment includes operations S210 to S220.
In operation S210, an image to be reconstructed is input to a trained encoder.
An image to be reconstructed, acquired by a terminal device, may be obtained; its resolution is low. Specifically, an image obtained by shooting or screen capture with a terminal device (such as a mobile phone) may be used as the image to be reconstructed, or an image shot and transmitted by another terminal device may be used.
In operation S220, a super-resolution image output by the trained encoder is obtained.
The encoder may be constructed using, for example, a super-resolution convolutional network (SRCNN), a fast super-resolution convolutional network (FSRCNN), a deeply-recursive convolutional network (DRCN), or a Laplacian pyramid super-resolution network (LapSRN), and is trained in advance according to the operations shown in fig. 3 below.
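By way of a hedged illustration only (the patent does not fix a particular network topology or publish weights), a minimal SRCNN-style encoder could be sketched in PyTorch as follows; the 64/32 layer widths follow the original SRCNN paper, and the bicubic pre-upsampling step and scale factor are assumptions of this sketch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SRCNNEncoder(nn.Module):
        """Minimal SRCNN-style encoder: maps a low-resolution image to a
        super-resolution image. Layer widths are the SRCNN paper's values,
        not the patent's."""
        def __init__(self, channels: int = 3, scale: int = 2):
            super().__init__()
            self.scale = scale
            self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
            self.mapping = nn.Conv2d(64, 32, kernel_size=1)
            self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # SRCNN operates on a bicubic-upsampled copy of the input.
            x = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                              align_corners=False)
            x = F.relu(self.extract(x))
            x = F.relu(self.mapping(x))
            return self.reconstruct(x)

A (1, 3, H, W) low-resolution tensor then maps to a (1, 3, scale*H, scale*W) super-resolution tensor.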
The embodiment of the disclosure provides a single-image super-resolution reconstruction method based on an encoding-decoding structure, which can reconstruct an input low-resolution image into a high-resolution image that meets expectations through repeated iterative encoding and decoding operations, using only the input low-resolution image during training. The method eliminates the cost of constructing a dataset of low-resolution and high-resolution image pairs. Further description is provided below with reference to figs. 3 and 4.
Fig. 3 schematically illustrates a flow chart of a training method of an encoder according to an embodiment of the present disclosure. Fig. 4 schematically illustrates a block diagram of a training method of an encoder according to an embodiment of the present disclosure.
As shown in fig. 3, the training method of the encoder of this embodiment includes operations S310 to S340.
In operation S310, the first image is input to an encoder, and a super-resolution reconstructed second image is generated. The second image has a higher resolution than the first image, for example, a super resolution image.
The first image may be a low resolution image acquired by the terminal device, i.e., an image whose resolution is low and which appears blurred or even distorted. Specifically, a low resolution image is one whose resolution is below a standard resolution. The standard resolution can be set flexibly according to the requirements of the actual usage scenario; for example, for face recognition at an intelligent teller machine of a bank outlet, it can be determined by the minimum resolution at which the intelligent teller machine can normally perform face recognition.
For example, in the pre-training process, the training set may include one or more first images. The training set may contain both images with rich backgrounds and images with simple backgrounds, both fully lit and insufficiently lit images, and both normally exposed and abnormally exposed images, among others. The encoder obtained by such training has a more universal reconstruction effect and stronger generalization capability.
In operation S320, the second image is input to the decoder, and a third image having a lower resolution is generated.
In some embodiments, the decoder may add interference to the second image to reduce its resolution, including, for example, downsampling, blurring, spatially non-uniform noise, motion panning, and compression. In particular, the third image may be obtained using degradation methods such as globally non-uniform Gaussian noise, anisotropic Gaussian kernel blurring, random-direction motion blurring, JPEG compression, and bicubic or bilinear interpolation downsampling. The third image may be a low resolution image. The decoder may be obtained based on a neural network.
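The decoder of this disclosure is itself a learned network (see fig. 11 below); purely to illustrate the degradations listed above, a non-learned sketch might look like the following, where the blur kernel, scale factor, and noise level are assumptions:

    import torch
    import torch.nn.functional as F

    def degrade(img: torch.Tensor, scale: int = 2, noise_std: float = 0.05) -> torch.Tensor:
        """Illustrative, non-learned degradation: depthwise blur with a fixed
        kernel (a stand-in for anisotropic Gaussian kernel blurring), bicubic
        downsampling, and spatially non-uniform Gaussian noise."""
        c = img.shape[1]
        kernel = torch.full((c, 1, 3, 3), 1.0 / 9.0, device=img.device)
        blurred = F.conv2d(img, kernel, padding=1, groups=c)
        low = F.interpolate(blurred, scale_factor=1.0 / scale, mode="bicubic",
                            align_corners=False)
        # Spatially non-uniform noise: a per-pixel standard deviation map.
        std_map = noise_std * torch.rand_like(low)
        return (low + std_map * torch.randn_like(low)).clamp(0.0, 1.0)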
In operation S330, the third image is input to the encoder, and a super-resolution-reconstructed fourth image is generated. The fourth image may have a higher resolution than the third image and may be a super-resolution image. The effect of this operation is to further reconstruct the third image output by the decoder in order to assess the super-resolution image output by the encoder: if the reconstruction effect of the second image is good, the loss difference between the third image and the first image is small, and otherwise it is large. Likewise, the loss difference between the second image and the fourth image is then also small.
In operation S340, the encoder is updated according to the similarity of each of the second and fourth images to the first image.
Operation S340 may be performed for each first image, updating the encoder whenever the update condition is satisfied. Each time, the similarities of the second and fourth images to the first image are calculated, and whether to adjust the encoder parameters is decided from that calculation, so that the reconstructed image output by the encoder becomes more accurate and better meets the requirements.
In some embodiments, whichever of the second and fourth images is more similar to the first image, a back-propagation algorithm is performed based on the similarity between that image and the first image, and the parameters of the encoder are adjusted with a gradient descent algorithm. This makes it possible to determine the loss value (i.e., the similarity) from the better reconstruction result, and thus to optimize the encoder.
In other embodiments, which of the second and fourth images is more similar to the first image serves as the basis for deciding whether to update the encoder according to the loss function value between the second image and the fourth image, as described in detail below with reference to fig. 7.
According to an embodiment of the present disclosure, the process of super-resolution reconstruction is regarded as an encoding process: an image to be reconstructed is processed with a trained encoder, and a super-resolution image is output. In pre-training the encoder, the first image is used as input, and the super-resolution-reconstructed second and fourth images, together with the lower-resolution third image, are obtained through the processing of the encoder and decoder. From the similarity of each of the second and fourth images to the first image, it can be determined whether the reconstruction effect of the encoder meets expectations and tends to be stable, and the encoder can be optimized toward producing better reconstructed images.
Fig. 5 schematically shows a flow chart of updating an encoder and decoder for the first time according to an embodiment of the present disclosure.
Before inputting the third image to the encoder, the first update of the encoder and decoder in this embodiment includes operations S510 to S520 as shown in fig. 5.
In operation S510, a second loss function value between the first image and the third image is obtained.
In operation S520, the encoder and decoder are updated according to the second loss function value.
In some embodiments, the second loss function value may be calculated using the L2 loss function and the perceptual loss function, and the two results may be added directly or after weighting; for example, the loss function values may be multiplied by learnable weights and then summed, with the weights iteratively adjusted during training.
Referring to fig. 4, a second loss function is constructed based on the difference between the two low resolution images, i.e., the first image and the third image; it characterizes that difference. By minimizing the loss based on this second loss function, the difference between the first image and the third image obtained after the encoding and decoding process can be minimized; that is, the second image reconstructed by the encoder comes ever closer to the content of the first image, at a higher resolution.
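As a minimal sketch of how such weighted loss values might be combined (the patent does not name a specific perceptual feature extractor, so feature_net is assumed to be any frozen network such as a VGG slice), the same module can serve both the first loss (L1 plus perceptual) and the second loss (L2 plus perceptual):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedReconstructionLoss(nn.Module):
        """Pixel loss plus perceptual loss, combined with learnable weights
        per the weighting scheme described above."""
        def __init__(self, feature_net: nn.Module, pixel: str = "l2"):
            super().__init__()
            self.feature_net = feature_net.eval()
            for p in self.feature_net.parameters():
                p.requires_grad_(False)       # frozen feature extractor
            self.pixel = pixel
            # Learnable combination weights, iteratively adjusted in training.
            self.w_pixel = nn.Parameter(torch.tensor(1.0))
            self.w_percep = nn.Parameter(torch.tensor(1.0))

        def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
            pixel_fn = F.l1_loss if self.pixel == "l1" else F.mse_loss
            pixel_loss = pixel_fn(pred, target)
            percep_loss = F.mse_loss(self.feature_net(pred),
                                     self.feature_net(target))
            return self.w_pixel * pixel_loss + self.w_percep * percep_loss

The second loss function of operations S510 to S520 would then be WeightedReconstructionLoss(feature_net, pixel="l2"), and the first loss function described below would use pixel="l1".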
In some embodiments, referring to fig. 4, before updating the encoder in operation S340, a second update may be performed on the decoder, including: the decoder is updated according to a first loss function value between the second image and the fourth image.
In some embodiments, the first loss function value may be calculated using the L1 loss function and the perceptual loss function, and the two results may be added directly or after weighting; for example, the loss function values may be multiplied by learnable weights and then summed, with the weights iteratively adjusted during training.
With continued reference to fig. 4, a first loss function is constructed based on the difference between the two super-resolution images, i.e., the second image and the fourth image. The first loss function characterizes the difference between the super-resolution image generated by the encoder the first time and the one generated the second time. Likewise, by minimizing the loss based on the first loss function, the difference between the super-resolution images output by the encoder in the two passes can be minimized; that is, the reconstruction effect becomes better and more stable. Since the fourth image is reconstructed by the encoder from the low resolution image output by the decoder, updating the decoder optimizes the decoding effect and facilitates cooperation with the encoder.
Fig. 6 schematically shows a flow chart of updating an encoder a second time according to an embodiment of the present disclosure.
As shown in fig. 6, one of the embodiments in operation S340 includes operations S610 to S630.
In operation S610, a first similarity between the second image and the first image and a second similarity between the fourth image and the first image are obtained using a discriminator, which is obtained based on a convolutional neural network.
For example, the image feature vectors of the first image, the second image, and the fourth image may be extracted by using a discriminator, and the first similarity and the second similarity may be obtained by calculating the euclidean distance.
In operation S620, the magnitudes of the first similarity and the second similarity are compared, and a comparison result is obtained.
In operation S630, the encoder is updated according to the comparison result. This influences the optimization direction of the encoder during training and prevents the encoder from being updated in a direction with a poorer effect.
In some embodiments, updating the encoder based on the comparison result includes:
if the first similarity is less than the difference of the second similarity minus the first threshold, the encoder is updated according to a first loss function value between the second image and the fourth image.
If the first similarity is greater than the second similarity by a sum of the second threshold, the encoder is not updated. The first threshold is the same as or different from the second threshold. The first threshold and the second threshold are both greater than or equal to 0.
For example, suppose the first and second similarities take values in [0, 1]. If the first threshold is 0.1, the first similarity is 0.6, and the second similarity is 0.9, then the first similarity is less than the second similarity minus the first threshold (0.8), so the encoder is updated. If the second threshold is also 0.1, the first similarity is 0.85, and the second similarity is 0.7, then the first similarity is greater than the second similarity plus the second threshold (0.8), so the encoder is not updated.
The benefit is that when the update condition is met, the fourth image is closer, i.e., more similar, to the first image than the second image is, so the encoder reconstructed the fourth image better; the encoder can then be updated according to the loss between the two super-resolution images, iteratively optimizing the encoder parameters. Conversely, if the second image is closer to the first image than the fourth image is, the encoder reconstructed the fourth image worse, and the encoder need not be updated, which prevents damaging its existing reconstruction capability.
In some embodiments, it is understood that in the second update, if the first similarity is less than the second similarity minus the first threshold, the decoder and the encoder may be updated simultaneously according to the first loss function value. If the first similarity is greater than the second similarity plus the second threshold, only the decoder is updated according to the first loss function value.
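A compact sketch of this gating rule, using the numeric examples given above (the return convention and the handling of the near-equal middle zone are assumptions of the sketch):

    def should_update_encoder(sim_1: float, sim_2: float,
                              thr_1: float = 0.1, thr_2: float = 0.1) -> bool:
        """True when the fourth image is clearly the better reconstruction
        (update the encoder with the first loss); False when the second
        image is clearly better, or when the scores are near-equal
        (the near-equal case is treated as convergence elsewhere)."""
        if sim_1 < sim_2 - thr_1:
            return True     # fourth image clearly better: update the encoder
        return False        # clearly worse or near-equal: do not update

    # Numeric examples from the text:
    assert should_update_encoder(0.6, 0.9) is True    # 0.6 < 0.9 - 0.1
    assert should_update_encoder(0.85, 0.7) is False  # 0.85 > 0.7 + 0.1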
Fig. 7 schematically illustrates a flow chart of a method of training an encoder according to another embodiment of the present disclosure.
As shown in fig. 7, the training method of the encoder of this embodiment includes operations S310 to S340, and operations S710 to S750. Operations S310 to S340 are the same as above, and are not described here again.
In operation S710, a comparison result between the first similarity (similarity 1) and the second similarity (similarity 2) is obtained.
In operation S720, if similarity 1 is smaller than similarity 2 minus the first threshold, the encoder is updated.
In operation S730, if similarity 1 is greater than similarity 2 plus the second threshold, the encoder is not updated.
In operation S740, if the similarities of the second and fourth images to the first image are not equal, the fourth image is used as a new second image, and the steps of generating a third image having a lower resolution, generating a super-resolution-reconstructed fourth image, and updating the encoder are performed cyclically.
Referring to fig. 7, if the condition of operation S720 or S730 is satisfied, operations S320 to S710 are performed again with the fourth image as the new second image; depending on the comparison result, one of operations S720 to S750 is executed, and the loop continues as long as operation S750 is not executed.
In particular, the similarities of the second and fourth images to the first image being unequal, i.e., similarity 1 and similarity 2 being unequal, covers both the case where similarity 1 is smaller than similarity 2 minus the first threshold and the case where similarity 1 is larger than similarity 2 plus the second threshold. In some embodiments, the first and second thresholds may both be 0. In other embodiments, at least one of them may be nonzero.
In operation S750, if the absolute value of the difference between similarity 1 and similarity 2 is smaller than or equal to a third threshold, training ends or the next first image is input to continue training. The third threshold is greater than or equal to 0. When the absolute value of the difference between the first and second similarities is at most the third threshold, the difference between the second and fourth images is small, and the reconstruction effect of the encoder tends to be stable. In some embodiments, it may further be checked whether the first and second similarities exceed a preset value, to ensure the super-resolution reconstruction effect is good enough, and whether the first loss function value is close to the previously calculated one, for example with the absolute value of their difference at most a fourth threshold, to further ensure reconstruction stability.
According to the embodiment of the disclosure, when the preset condition is met, the fourth image re-enters the training process as a new second image. First, multiple inputs can be constructed from the same first image for multi-round iterative training, which reduces the number of training samples required in the training set or improves the training effect. Second, the encoder can be mined in depth to keep reconstructing a better image from the first image, with the reconstruction result guiding the updates of the encoder parameters. Third, since each round may update the decoder or the encoder based on the result of encoding and decoding the first image, promoting the fourth image to the new second image, which serves as the decoder's input in the next round, continues the decoding and encoding process and yields a certain transfer learning effect, because every round derives from the same first image. Transfer learning learns new knowledge from existing knowledge, finds the similarity between the two, and achieves a better learning effect through the transfer of that similarity.
Fig. 8 schematically illustrates a flowchart of obtaining a first similarity according to an embodiment of the present disclosure. Fig. 9 schematically shows a schematic diagram of a arbiter according to an embodiment of the disclosure.
As shown in fig. 8, obtaining the first similarity in this embodiment includes operations S810 to S830. The discriminator comprises a first convolutional layer, a conversion layer, and a deep convolutional neural network structure, where the conversion layer is obtained based on at least one convolutional layer.
Referring to fig. 9, the discriminator consists of a conversion layer and a VGG19 network (i.e., a deep convolutional neural network structure), taking a low resolution image and a high resolution image as inputs. Because the input image sizes are not uniform, a 5×5 convolution layer (by way of example only) is required to reduce the super-resolution image size to be consistent with the low-resolution image size. The conversion layer extracts shallow feature information from the input image. Deep feature information is then extracted with the VGG19 network, and corresponding one-dimensional feature information is computed with a fully connected layer. The one-dimensional feature difference between the two images is calculated with an L1 function and mapped to between 0 and 1 with a sigmoid function; the closer the value is to 1, the higher the similarity, and vice versa. The VGG19 network consists of 19 weight layers (16 convolutional layers and 3 fully connected layers), plus 5 max pooling layers and 1 sigmoid layer.
It can be appreciated that the deep convolutional neural network structure of the present disclosure is not limited to VGG19 network, and can be flexibly set in the case of implementing similarity calculation.
In operation S810, the second image is processed using the first convolution layer so that the second image is the same size as the first image. Referring to fig. 9, the first convolution layer may be a 5×5 convolution layer.
In operation S820, the first image and the second image are subjected to a dimension-raising operation through the conversion layer. The aim is to prepare feature information for extraction by the deep convolutional neural network structure.
In operation S830, the first similarity between the dimension-raised first image and second image is calculated using the deep convolutional neural network structure.
It is understood that the second similarity can also be calculated through operations S810 to S830, which are not repeated here. A rough code sketch of the discriminator follows.
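In the sketch below, the torchvision VGG19 stands in for the deep feature stack; the 128-dimensional fully connected head, the stride of the resizing convolution, and the exact sigmoid mapping are assumptions (in practice pretrained VGG19 weights would be loaded):

    import torch
    import torch.nn as nn
    from torchvision.models import vgg19

    class SimilarityDiscriminator(nn.Module):
        """Sketch of the fig. 9 discriminator: a 5x5 resizing convolution,
        a shallow conversion layer, a VGG19 feature stack, and a fully
        connected head whose L1 difference is squashed into (0, 1]."""
        def __init__(self, channels: int = 3, scale: int = 2):
            super().__init__()
            # First convolution layer: shrink the SR input to the LR size.
            self.resize = nn.Conv2d(channels, channels, kernel_size=5,
                                    stride=scale, padding=2)
            # Conversion layer: shallow features / dimension raising.
            self.convert = nn.Conv2d(channels, 3, kernel_size=3, padding=1)
            self.backbone = vgg19(weights=None)  # pretrained weights assumed
            self.backbone.classifier[-1] = nn.Linear(4096, 128)  # 1-D features

        def forward(self, sr: torch.Tensor, lr: torch.Tensor) -> torch.Tensor:
            f_sr = self.backbone(self.convert(self.resize(sr)))
            f_lr = self.backbone(self.convert(lr))
            diff = (f_sr - f_lr).abs().mean(dim=1)   # L1 feature difference
            # 2*sigmoid(-d) maps d=0 to 1 and large d toward 0; a plain
            # sigmoid tops out at 0.5, so the scaling is an assumption.
            return 2.0 * torch.sigmoid(-diff)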
Fig. 10 schematically illustrates a flowchart of obtaining a third image according to an embodiment of the disclosure. Fig. 11 schematically illustrates a block diagram of a decoder according to an embodiment of the present disclosure.
As shown in fig. 10, this embodiment obtains the third image through operations S1010 to S1030. The decoder includes a first convolution group, a second convolution group, and N channels, where the number of convolution layers and their kernel sizes differ between each channel and at least one other channel, N being greater than or equal to 2.
The decoder structure is shown in fig. 11 and comprises a number of 1×1, 3×3, 5×5, and 7×7 convolution layers, with several convolution layers forming a group. The 1×1 convolution groups are used to raise and reduce dimensionality. The 3×3, 5×5, and 7×7 convolution groups are used to extract and filter feature information. A low resolution image is ultimately generated.
In operation S1010, the second image is input to the first convolution group to obtain a first feature vector. This raises the dimension, facilitating subsequent extraction and filtering of feature information.
In operation S1020, the first feature vectors are input to the N channels, respectively, to obtain N second feature vectors.
In operation S1030, the N second feature vectors are input to the second convolution group, resulting in the third image. This reduces the dimension, facilitating generation of the third image.
After the N second feature vectors are extracted, they are fused together in the second convolution group, which achieves the purpose of reducing the image resolution and yields a low-resolution image. Different channels extract different image feature vectors, so the resolution can be reduced while the image content is better represented; the method is thus applicable to images with different contents and has a wider application range.
In some embodiments, the second feature vector extracted by each channel may be mixed with a specific noise vector before being input to the second convolution group, further serving the purpose of adding noise.
Specifically, each channel may add a noise vector different from that of at least one other channel. Noise vectors may include texture, shape, and spatial-relationship feature vectors of an image, as well as high-level semantic feature vectors. A low-resolution image obtained in this way participates in the training process, which can enhance the anti-interference capability of the encoder and improve the image reconstruction effect. A sketch of such a decoder follows.
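Under the same caveat that channel widths, the downsampling stride, and the noise model are assumptions, the fig. 11 decoder with three parallel branches and per-branch noise mixing might be sketched as:

    import torch
    import torch.nn as nn

    class MultiBranchDecoder(nn.Module):
        """Sketch of the fig. 11 decoder: a 1x1 dimension-raising group,
        parallel branches with 3x3 / 5x5 / 7x7 kernels, additive noise per
        branch, and a 1x1 dimension-reducing group followed by a strided
        convolution that yields the low-resolution image."""
        def __init__(self, channels: int = 3, width: int = 64,
                     kernels=(3, 5, 7), scale: int = 2, noise_std: float = 0.05):
            super().__init__()
            self.noise_std = noise_std
            self.first_group = nn.Sequential(
                nn.Conv2d(channels, width, kernel_size=1), nn.ReLU(inplace=True))
            # One branch per kernel size: each extracts and filters features.
            self.branches = nn.ModuleList(
                nn.Conv2d(width, width, kernel_size=k, padding=k // 2)
                for k in kernels)
            self.second_group = nn.Conv2d(width * len(kernels), channels,
                                          kernel_size=1)
            self.down = nn.Conv2d(channels, channels, kernel_size=3,
                                  stride=scale, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feat = self.first_group(x)
            outs = []
            for branch in self.branches:
                v = branch(feat)
                # Mix a branch-specific noise vector into the feature map.
                outs.append(v + self.noise_std * torch.randn_like(v))
            fused = self.second_group(torch.cat(outs, dim=1))
            return self.down(fused)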
In some embodiments, referring to figs. 2 to 11, the training process of the encoder is further described as follows; a condensed code sketch follows the steps:
step 1, the first image is input to an encoder, which reconstructs it into a super-resolution image a (second image).
Step 2, the super-resolution image a is input to a decoder, which reconstructs a corresponding low-resolution image (third image).
And step 3, calculating a loss value between the first image and the low-resolution image generated in step 2 through the L2 loss function and the perceptual loss function, and iteratively updating the parameters of the encoder and the decoder.
And 4, passing the low-resolution image obtained in the step 3 through an encoder again to generate a super-resolution image B (fourth image).
And step 5, calculating a loss value between super-resolution image A and super-resolution image B through the L1 loss function and the perceptual loss function, and iteratively updating the decoder.
The L1 and L2 loss functions above are the least absolute deviation (L1) and least squares error (L2) losses, respectively.
And 6, respectively judging the similarity of the super-resolution image A and the super-resolution image B with the first image through a discriminator, if the similarity of the super-resolution image A is lower than that of the super-resolution image B, updating the encoder parameters by using the loss value obtained in the step 5, otherwise, not updating.
And 7, taking the super-resolution image B as input, and iteratively repeating the steps 2-6.
And step 8, iterating several times until the absolute value of the similarity difference between super-resolution image A and super-resolution image B is within a certain range and the loss value between the generated third image and the first image no longer changes significantly. Reconstruction for this first image is then finished; the next first image can be input, or the whole training ends and the encoder is deployed into the practical application.
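Putting steps 1 to 8 together, a condensed single-image training loop might look like the sketch below. The loss modules follow the weighted pixel-plus-perceptual sketch earlier, the discriminator follows the fig. 9 sketch, and the optimizers, thresholds, iteration cap, and detach points (which keep each round's autograd graph separate) are all assumptions:

    import torch

    def train_on_first_image(first_image, encoder, decoder, discriminator,
                             loss_1, loss_2, opt_enc, opt_dec,
                             thr_1=0.1, thr_3=0.05, max_iters=20):
        """Sketch of steps 1-8 for one first image. loss_1 is the
        L1 + perceptual loss, loss_2 the L2 + perceptual loss."""
        image_a = encoder(first_image)                      # step 1
        for _ in range(max_iters):
            # Steps 2-3: decode A, update encoder and decoder on the
            # loss between the first image and the third (low-res) image.
            low = decoder(image_a)
            opt_enc.zero_grad(); opt_dec.zero_grad()
            loss_2(low, first_image).backward()
            opt_enc.step(); opt_dec.step()

            # Step 4: re-decode and re-encode with fresh graphs.
            image_b = encoder(decoder(image_a.detach()))
            l1 = loss_1(image_b, image_a.detach())

            # Step 6: score both reconstructions against the first image.
            with torch.no_grad():
                sim_1 = float(discriminator(image_a, first_image).mean())
                sim_2 = float(discriminator(image_b, first_image).mean())

            opt_enc.zero_grad(); opt_dec.zero_grad()
            l1.backward()
            opt_dec.step()                    # step 5: decoder always updates
            if sim_1 < sim_2 - thr_1:         # step 6 gate (fig. 6)
                opt_enc.step()

            if abs(sim_1 - sim_2) <= thr_3:   # step 8: stable, stop
                break
            image_a = image_b.detach()        # step 7: iterate with image B
        return encoder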
Based on the image reconstruction method, the disclosure also provides an image reconstruction device and a training device of the encoder. The details will be described below in conjunction with fig. 12 and 13.
Fig. 12 schematically shows a block diagram of the structure of an image reconstruction apparatus according to an embodiment of the present disclosure.
As shown in fig. 12, the image reconstruction apparatus 1200 of this embodiment includes a first input module 1210 and a first output module 1220.
The first input module 1210 may perform operation S210 for inputting an image to be reconstructed to a trained encoder.
The first output module 1220 may perform operation S220 for obtaining a super-resolution image output by the trained encoder.
Fig. 13 schematically shows a block diagram of a training device of an encoder according to an embodiment of the present disclosure.
As shown in fig. 13, the training apparatus 1300 of the encoder of this embodiment includes a first super-resolution reconstruction module 1310, a first low-resolution reconstruction module 1320, a second super-resolution reconstruction module 1330, and a first update module 1340.
The first super-resolution reconstruction module 1310 may perform operation S310 for inputting the first image to the encoder to generate a super-resolution reconstructed second image.
The first low resolution reconstruction module 1320 may perform operation S320 for inputting the second image to a decoder, generating a third image having a lower resolution.
In some embodiments, the first low resolution reconstruction module 1320 may perform operations S1010 to S1030, which are not described herein.
The second super-resolution reconstruction module 1330 may perform operation S330 for inputting the third image to the encoder to generate a fourth image after super-resolution reconstruction.
The first updating module 1340 may perform operation S340 for updating the encoder according to the similarity of each of the second and fourth images to the first image.
In some embodiments, the first update module 1340 may perform operations S510-S520, operations S610-S630, and operations S710-S730, which are not described herein.
In some embodiments, the first update module 1340 may include a arbiter unit that may perform operations S810-S830, which are not described herein.
In some embodiments, the training apparatus 1300 may include an iterative update module that may perform operations S740 to S750, which are not described in detail herein.
It should be noted that, in the embodiment of the apparatus portion, the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each module/unit/subunit and the like are the same as or similar to the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each corresponding step in the embodiment of the method portion, and are not described herein again.
According to an embodiment of the present disclosure, any of the plurality of modules in the image reconstruction apparatus 1200 or the training apparatus 1300 of the encoder may be combined in one module to be implemented, or any of the plurality of modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to embodiments of the present disclosure, at least one of the image reconstruction apparatus 1200 or the encoder training apparatus 1300 may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package, or an application specific integrated circuit (ASIC); by hardware or firmware in any other reasonable way of integrating or packaging the circuitry; or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the image reconstruction apparatus 1200 or the encoder training apparatus 1300 may be at least partially implemented as computer program modules which, when executed, perform the corresponding functions.
Fig. 14 schematically illustrates a block diagram of an electronic device adapted to implement an image reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 14, an electronic device 1400 according to an embodiment of the present disclosure includes a processor 1401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1402 or a program loaded from a storage section 1408 into a Random Access Memory (RAM) 1403. The processor 1401 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1401 may also include on-board memory for caching purposes. The processor 1401 may include a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1403, various programs and data necessary for the operation of the electronic device 1400 are stored. The processor 1401, ROM 1402, and RAM 1403 are connected to each other through a bus 1404. The processor 1401 performs various operations of the method flow according to the embodiment of the present disclosure by executing programs in the ROM 1402 and/or the RAM 1403. Note that the program may be stored in one or more memories other than the ROM 1402 and the RAM 1403. The processor 1401 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the disclosure, the electronic device 1400 may also include an input/output (I/O) interface 1405, which is likewise connected to the bus 1404. The electronic device 1400 may also include one or more of the following components connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication portion 1409 including a network interface card such as a LAN card or a modem. The communication portion 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as needed. Removable media 1411, such as magnetic disks, optical disks, magneto-optical disks, or semiconductor memories, are mounted on the drive 1410 as needed, so that a computer program read therefrom is installed into the storage portion 1408 as needed.
The present disclosure also provides a computer-readable storage medium that may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into that apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 1402 and/or RAM 1403 described above and/or one or more memories other than ROM 1402 and RAM 1403.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1401. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of a signal, downloaded and installed via the communication section 1409, and/or installed from the removable medium 1411. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, and the like, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (12)

1. An image reconstruction method, comprising:
inputting an image to be reconstructed to a trained encoder;
obtaining a super-resolution image output by the trained encoder;
wherein the trained encoder is configured to be trained in advance according to:
inputting the first image to the encoder to generate a second image after super-resolution reconstruction;
inputting the second image to a decoder, generating a third image having a lower resolution;
inputting the third image to the encoder to generate a fourth image after super-resolution reconstruction;
updating the encoder according to the respective similarities of the second image and the fourth image to the first image.
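For illustration only, the training flow recited in claim 1 can be read as the following minimal Python sketch; the function names and the callables are assumptions, since the claim fixes only the data flow between the encoder and the decoder, not any implementation.

```python
# Minimal sketch of one training iteration from claim 1 (illustrative only;
# `encoder`, `decoder`, and `similarity` are assumed callables, e.g. PyTorch
# modules, since the claim fixes the data flow rather than the framework).
def training_iteration(encoder, decoder, similarity, first_image):
    second_image = encoder(first_image)    # super-resolution reconstruction
    third_image = decoder(second_image)    # lower-resolution third image
    fourth_image = encoder(third_image)    # second super-resolution pass
    # Respective similarities of both reconstructions to the first image
    # drive the encoder update (refined in claims 2 and 4 below).
    first_similarity = similarity(second_image, first_image)
    second_similarity = similarity(fourth_image, first_image)
    return second_image, fourth_image, first_similarity, second_similarity
```

Claim 3 below repeats this iteration with the fourth image taking the second image's place until the two similarities agree.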
2. The method of claim 1, wherein updating the encoder according to the respective similarities of the second image and the fourth image to the first image comprises:
obtaining a first similarity between the second image and the first image and a second similarity between the fourth image and the first image with a discriminator, the discriminator being obtained based on a convolutional neural network;
comparing the first similarity with the second similarity to obtain a comparison result;
and updating the encoder according to the comparison result.
3. The method of claim 1 or 2, wherein, if the respective similarities of the second image and the fourth image to the first image are not equal, the method further comprises:
taking the fourth image as a new second image, and cyclically performing the steps of generating a third image having a lower resolution, generating a fourth image after super-resolution reconstruction, and updating the encoder.
4. The method of claim 2, wherein updating the encoder according to the comparison result comprises:
if the first similarity is less than the second similarity minus a first threshold, updating the encoder according to a first loss function value between the second image and the fourth image;
if the first similarity is greater than the sum of the second similarity and a second threshold, not updating the encoder.
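Claims 2 and 4 together amount to a thresholded update rule. A hedged sketch follows; the L1 loss standing in for the unspecified "first loss function" and the threshold values are assumptions not fixed by the claims.

```python
# Hedged sketch of the update rule in claims 2 and 4 (assumptions: L1 loss
# as the "first loss function"; threshold values are illustrative).
import torch.nn.functional as F

def maybe_update_encoder(encoder_optimizer, first_similarity, second_similarity,
                         second_image, fourth_image,
                         first_threshold=0.05, second_threshold=0.05):
    if first_similarity < second_similarity - first_threshold:
        # First branch of claim 4: the first reconstruction is clearly worse,
        # so pull the two reconstructions together via the first loss value.
        first_loss = F.l1_loss(second_image, fourth_image)
        encoder_optimizer.zero_grad()
        first_loss.backward()
        encoder_optimizer.step()
    elif first_similarity > second_similarity + second_threshold:
        # Second branch of claim 4: clearly better already; skip the update.
        pass
```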
5. The method of claim 2, wherein, prior to updating the encoder according to the comparison result, the method further comprises:
updating the decoder according to a first loss function value between the second image and the fourth image.
6. The method of claim 1, wherein prior to inputting the third image to the encoder, the method further comprises:
obtaining a second loss function value between the first image and the third image;
updating the encoder and the decoder according to the second loss function value.
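Claim 6 adds a joint pre-update before the third image is re-encoded. A short sketch, assuming mean squared error for the unspecified "second loss function" and a single optimizer spanning both networks:

```python
# Sketch of claim 6's pre-step (assumptions: MSE as the "second loss
# function"; `joint_optimizer` spans encoder and decoder parameters).
import torch.nn.functional as F

def joint_update(joint_optimizer, first_image, third_image):
    second_loss = F.mse_loss(third_image, first_image)  # both low-resolution
    joint_optimizer.zero_grad()
    second_loss.backward()
    joint_optimizer.step()
```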
7. The method of claim 2, wherein the discriminator comprises a first convolution layer, a transform layer, and a deep convolutional neural network structure, the transform layer being obtained based on at least one convolution layer, and wherein obtaining the first similarity between the second image and the first image with the discriminator comprises:
processing the second image with the first convolution layer such that the second image is the same size as the first image;
performing a dimension-raising operation on the first image and the second image through the transform layer;
and calculating the first similarity between the dimension-raised first image and second image using the deep convolutional neural network structure.
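One possible shape for the discriminator of claim 7 is sketched below: a strided first convolution layer brings the second image to the first image's size, a transform layer raises the channel dimension, and a deep convolutional head scores similarity. All layer widths, kernel sizes, and the 2x scale factor are illustrative assumptions; a practical discriminator would likely be deeper.

```python
# Illustrative discriminator per claim 7 (all sizes are assumptions).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, scale=2, channels=3, hidden=64):
        super().__init__()
        # First convolution layer: strided so the second (super-resolved)
        # image matches the first image's spatial size.
        self.first_conv = nn.Conv2d(channels, channels, 3, stride=scale, padding=1)
        # Transform layer: at least one convolution raising the dimension.
        self.transform = nn.Conv2d(channels, hidden, 3, padding=1)
        # Deep convolutional neural network structure (truncated here).
        self.score = nn.Sequential(
            nn.Conv2d(2 * hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hidden, 1),
        )

    def forward(self, second_image, first_image):
        resized = self.first_conv(second_image)       # same size as first_image
        pair = torch.cat([self.transform(first_image),
                          self.transform(resized)], dim=1)
        return torch.sigmoid(self.score(pair))        # similarity score in (0, 1)
```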
8. The method of claim 1, wherein,
the decoder comprises a first convolution group, a second convolution group, and N channels, wherein each channel differs from at least one other channel in the number and size of its convolution groups, and N is greater than or equal to 2;
and wherein inputting the second image to the decoder to generate the third image having a lower resolution comprises:
inputting the second image into the first convolution group to obtain a first feature vector;
respectively inputting the first feature vector into the N channels to obtain N second feature vectors;
and inputting the N second feature vectors to the second convolution group to obtain the third image.
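Claim 8's decoder can likewise be sketched as a first convolution group feeding N parallel channels whose convolutions differ in number and kernel size, with a second convolution group fusing the branch outputs into the lower-resolution third image. With N = 3 and all kernel sizes, depths, and the downscale factor as assumptions:

```python
# Illustrative decoder per claim 8 (N = 3; all sizes are assumptions).
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, channels=3, hidden=32, scale=2):
        super().__init__()
        # First convolution group: produces the first feature vector.
        self.first_group = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU())
        # N channels, each differing from another in number/size of convolutions.
        self.channels_n = nn.ModuleList([
            nn.Sequential(nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(hidden, hidden, 5, padding=2), nn.ReLU(),
                          nn.Conv2d(hidden, hidden, 5, padding=2), nn.ReLU()),
            nn.Sequential(nn.Conv2d(hidden, hidden, 7, padding=3), nn.ReLU()),
        ])
        # Second convolution group: fuses the N second feature vectors and
        # downscales to the lower-resolution third image.
        self.second_group = nn.Sequential(
            nn.Conv2d(3 * hidden, hidden, 3, stride=scale, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1))

    def forward(self, second_image):
        first_feature = self.first_group(second_image)
        second_features = [branch(first_feature) for branch in self.channels_n]
        return self.second_group(torch.cat(second_features, dim=1))  # third image
```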
9. An image reconstruction apparatus comprising:
a first input module configured to input an image to be reconstructed to a trained encoder;
a first output module configured to obtain the super-resolution image output by the trained encoder;
wherein the trained encoder is configured to be trained in advance according to:
inputting the first image to the encoder to generate a second image after super-resolution reconstruction;
inputting the second image to a decoder, generating a third image having a lower resolution;
inputting the third image to the encoder to generate a fourth image after super-resolution reconstruction;
updating the encoder according to the respective similarities of the second image and the fourth image to the first image.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202310544909.2A 2023-05-15 2023-05-15 Image reconstruction method, device, equipment and medium Pending CN116597033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310544909.2A CN116597033A (en) 2023-05-15 2023-05-15 Image reconstruction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310544909.2A CN116597033A (en) 2023-05-15 2023-05-15 Image reconstruction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116597033A true CN116597033A (en) 2023-08-15

Family

ID=87604019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310544909.2A Pending CN116597033A (en) 2023-05-15 2023-05-15 Image reconstruction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116597033A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797462A (en) * 2023-08-18 2023-09-22 深圳市优森美科技开发有限公司 Real-time video super-resolution reconstruction method based on deep learning
CN116797462B (en) * 2023-08-18 2023-10-24 深圳市优森美科技开发有限公司 Real-time video super-resolution reconstruction method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination