CN116547696A - Image enhancement method and device


Info

Publication number: CN116547696A
Application number: CN202180079713.XA
Authority: CN (China)
Prior art keywords: training, image, deblurring, neural network, application
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: Fengyi Shen (沈枫易), Onay Urfalioglu (奥纳伊·优厄法利欧格路)
Original/Current Assignee: Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of CN116547696A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G06F 18/2185 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor, the supervisor being an automated module, e.g. intelligent oracle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V 10/993 Evaluation of the quality of the acquired pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/254 Fusion techniques of classification results, e.g. of results related to same input data

Abstract

An image enhancement method, comprising: generating an input image by concatenating an original input image with a depth map; generating bottleneck features by encoding the input image using an encoder; and injecting a perturbation vector into the bottleneck features. The bottleneck features with the injected perturbation vector are fed to an image generator, which generates an enhanced image from the bottleneck features and the perturbation vector. The method further comprises: at a discriminator, receiving the enhanced image and a sharp image randomly selected from a sharp image dataset, and determining an image enhancement score from a comparison between the enhanced image and the randomly selected sharp image. The method can provide multiple output images, reduce processing complexity, and improve the quality of the output images.

Description

Method and system for deblurring blurred images
Cross reference
The present application claims priority to prior U.S. non-provisional application No. 17/098,605, entitled "Method and system for deblurring blurred images," filed on November 16, 2020, the contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision, and more particularly, to a method and system for deblurring blurred digital images.
Background
Images captured by digital cameras often suffer from unwanted blurring artifacts. Blur in an image captured by a digital camera may arise from motion of objects in the scene during image capture, movement of the digital camera during image capture, low illumination or insufficient light in the scene during image capture, and the like. The task of processing a blurred image (i.e., a blurry image) to generate a clean image (i.e., a sharper image with little or no blur) is referred to as deblurring. The blur in an image may be uniform (i.e., every part of the image is blurred in the same way, e.g., blur due to shake of the digital camera) or non-uniform (i.e., different parts of the image are blurred in different ways, e.g., blur due to movement of objects in the scene). Non-uniform blur in an image may also be referred to as dynamic scene blur, because it is often caused by dynamic motion of objects in the scene during image capture. Deblurring non-uniformly blurred images is often very challenging, because the blur is caused by dynamic motion of objects in the scene during image capture, which may be irregular and variable.
Some existing solutions for dynamic scene image deblurring (commonly referred to simply as image deblurring) utilize neural networks, particularly neural networks that have been trained end-to-end for image deblurring (i.e., trained to output a predicted deblurred image from a blurred image input to the neural network). Such solutions typically require a large set of labeled training data, where each labeled training sample is a blurred image paired with a ground-truth clean image. It is difficult, if not impossible, to obtain real blurred-clean image pairs (e.g., a digital camera typically cannot capture a blurred image and a clean image of the same scene at the same time).
Another challenge faced by existing image deblurring solutions is that it is difficult, if not impossible, to include all possible types of dynamic scene blur in the training data; doing so (if it were possible) would also risk model overfitting. Existing image deblurring solutions tend to be sensitive to the training data. That is, the trained neural network performs well when deblurring images that are similar to images in the training dataset, but performs poorly on images that differ from the training dataset (e.g., fall outside its statistical distribution). When a trained neural network is applied to a real blurred image that differs from the training dataset, the clean image output by the network may contain unwanted artifacts such as ghosting. A conventionally trained neural network cannot adapt to new blurred images.
Accordingly, there is a need for methods and systems for self-adaptive deblurring of blurred images.
Disclosure of Invention
In various examples, the present disclosure describes methods and systems for deblurring a blurred image using a neural network. The disclosed neural network is designed to perform both a primary image deblurring task and an auxiliary task. The auxiliary task is defined to be related to the primary task but easier to learn (e.g., no additional ground-truth labels need to be collected at test time). In the disclosed examples, the trained neural network is further trained for a predetermined number of iterations on a particular blurred input image (i.e., a particular blurred image input to the neural network) to update the weights of the trained neural network, such that the weights are customized (i.e., adapted) to process that particular blurred input image. After this further training is completed, the further-trained neural network deblurs the particular blurred input image to generate a clean image. Customizing (i.e., adapting) the weights of the neural network based on a particular blurred input image and using the further-trained neural network to deblur that image may be referred to herein as self-adaptive deblurring, and further training the trained neural network to adapt to the particular blurred input image may be referred to herein as application-time training.
In examples of the present disclosure, self-adaptive deblurring may be deployed in real-world settings (e.g., for deblurring a real image captured by a camera of an electronic device), and the neural network may be trained to deblur real blurred images at the application stage. The present disclosure describes methods and systems that enable training of the neural network during the application phase that is relatively fast (e.g., requiring only a few training iterations), so that the weights of the neural network can be adapted on the fly to deblur each particular real image. The disclosed methods and systems provide the following technical effect: a particular blurred input image can be used to immediately further train a trained neural network (i.e., a neural network that has already been trained to deblur blurred input images) during the application phase. The further-trained neural network provides improved performance in deblurring the particular blurred input image compared to deblurring it with an existing trained neural network.
Application-time training of the trained neural network may be performed on the auxiliary task only (e.g., an auxiliary reconstruction task), using relatively few iterations (e.g., ten or fewer iterations, or five or fewer iterations). This provides the following technical effect: application-time training of the trained neural network can be performed on the fly during the application phase, without excessive use of memory resources and/or without excessive computation time, and thus can be performed by a resource-constrained system, such as a handheld or mobile device (e.g., a smartphone, tablet, or notebook).
Examples of the present disclosure may enable a higher-quality clean output image to be generated after image deblurring compared to some existing deblurring methods and systems. The disclosed methods and systems enable further training of a trained neural network to adapt its weights to deblur a particular blurred input image, without requiring a ground truth (i.e., a clean image) for that particular image.
In some exemplary aspects, the present disclosure describes an image deblurring method. The method includes obtaining a deblurring neural network having meta-training weights, wherein the meta-training weights were previously obtained by meta-training the deblurring neural network on a primary deblurring task and an auxiliary reconstruction task. The method further includes obtaining an application-time blurred input image. The method further comprises: performing application-time training on the deblurring neural network with the meta-training weights, using the application-time blurred input image, to obtain application-time training weights of the deblurring neural network by: executing the auxiliary reconstruction task on the application-time blurred input image to predict a reconstructed blurred image; and updating the meta-training weights of the deblurring neural network based on an auxiliary loss calculated from an auxiliary loss function, the application-time blurred input image, and the reconstructed blurred image. The method further includes, after the application-time training is completed, generating a deblurred output image from the application-time blurred input image using the deblurring neural network with the application-time training weights. (A simplified sketch of this application-time procedure follows the examples below.)
In any of the above examples, the application-time training includes a plurality of iterations using the application-time blurred input image, the application-time blurred input image being a single blurred image; each iteration may include performing the auxiliary reconstruction task and updating the weights of the deblurring neural network.
In any of the examples above, the application-time training includes at most five iterations.
In any of the above examples, the auxiliary reconstruction task is performed using features passed from the primary deblurring task.
In any of the above examples, the deblurring neural network may include: a shared subnetwork for processing the application-time blurred input image, wherein the shared subnetwork is coupled to a primary subnetwork for performing the primary deblurring task and an auxiliary subnetwork for performing the auxiliary reconstruction task; the primary subnetwork comprising a primary output neural network layer for performing the primary deblurring task, the primary subnetwork processing output from the shared subnetwork to generate the deblurred output image; and the auxiliary subnetwork comprising an auxiliary output neural network layer for performing the auxiliary reconstruction task, the auxiliary subnetwork processing output from the shared subnetwork to generate the reconstructed blurred image.
In any of the above examples, the features output by the primary output neural network layer may be replicated to a neural network layer of the auxiliary subnetwork.
In any of the above examples, in the primary subnetwork, the output from the primary output neural network layer may represent a residual between the deblurred output image and the application-time blurred input image, and the residual may be added to the application-time blurred input image to generate the deblurred output image.
In any of the above examples, the meta-training of the deblurring neural network may be performed based on a training data set, wherein the training data set includes input-output pairs of labeled training data.
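As a rough, non-authoritative sketch of the method aspect above, the following adapts the meta-training weights on a single application-time blurred input image using only the auxiliary reconstruction loss, then generates the deblurred output with the adapted weights. It assumes a two-headed network returning (deblurred image, reconstructed blurred image); the L1 loss, SGD optimizer, learning rate, and step count are illustrative assumptions, not choices fixed by this disclosure.

    import copy
    import torch
    import torch.nn.functional as F

    def deblur_with_application_time_training(meta_net, blurred, steps=5, lr=1e-4):
        # Work on a copy: application-time weights are discarded afterwards,
        # and the next image starts again from the meta-training weights.
        net = copy.deepcopy(meta_net)
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(steps):                       # few iterations (e.g., <= 5)
            _, reconstructed = net(blurred)          # auxiliary reconstruction task
            aux_loss = F.l1_loss(reconstructed, blurred)
            opt.zero_grad()
            aux_loss.backward()                      # also updates shared weights
            opt.step()
        with torch.no_grad():                        # primary task, adapted weights
            deblurred, _ = net(blurred)
        return deblurred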
In some exemplary aspects, the present disclosure describes a training method for a deblurring neural network. The method includes initializing weights of the deblurring neural network. The method further comprises: performing a round of meta-training of the deblurring neural network on a primary deblurring task and an auxiliary reconstruction task, obtaining meta-training weights by: sampling a batch of training data, wherein the batch comprises a plurality of blurred training images, each blurred training image paired with a corresponding clean training image in the batch; for each given blurred training image in the batch, performing the auxiliary reconstruction task on the given blurred training image to predict a corresponding reconstructed blurred image, and calculating a corresponding temporary weight set based on an auxiliary loss calculated from an auxiliary loss function, the given blurred training image, and the corresponding reconstructed blurred image; for each given pair of blurred training image and corresponding clean training image in the batch, performing the primary deblurring task to predict a corresponding predicted clean image, and calculating a corresponding primary loss from a primary loss function, the corresponding clean training image, the corresponding predicted clean image, and the corresponding temporary weight set; and updating the weights of the deblurring neural network by summing the gradients of the correspondingly calculated primary losses. The method further includes, after meta-training is completed, storing the meta-training weights, to be used for further training the deblurring neural network with the meta-training weights on an application-time blurred input image. (A simplified sketch of one such meta-training round follows the examples below.)
In any of the above examples, the auxiliary reconstruction task is performed using features passed from the primary deblurring task.
In any of the above examples, the deblurring neural network may include: a shared subnetwork for processing the blurred training image, wherein the shared subnetwork is coupled to a primary subnetwork for performing the primary deblurring task and an auxiliary subnetwork for performing the auxiliary reconstruction task; the primary subnetwork comprising a primary output neural network layer for performing the primary deblurring task, the primary subnetwork processing output from the shared subnetwork to generate a deblurred output image; and the auxiliary subnetwork comprising an auxiliary output neural network layer for performing the auxiliary reconstruction task, the auxiliary subnetwork processing output from the shared subnetwork to generate the reconstructed blurred image.
In any of the above examples, the features output by the primary output neural network layer may be replicated to a neural network layer of the auxiliary subnetwork.
In any of the above examples, in the primary subnetwork, the output from the primary output neural network layer may represent a residual between the deblurred output image and the blurred training image, and the residual may be added to the blurred training image to generate the deblurred output image.
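As a rough illustration of one meta-training round as described in the training-method aspect above: the inner step derives a per-image temporary weight set from the auxiliary reconstruction loss, and the outer step updates the meta weights from the sum of the primary-loss gradients. This is a simplified first-order variant (second-order terms are ignored); the L1 losses, optimizer, and inner learning rate are assumptions, not fixed by this disclosure.

    import copy
    import torch
    import torch.nn.functional as F

    def meta_train_round(net, meta_opt, batch, inner_lr=1e-4):
        # One round of meta-training on a sampled batch of
        # (blurred, clean) training image pairs.
        meta_opt.zero_grad()
        for blurred, clean in batch:
            # Inner step: temporary weight set from the auxiliary loss.
            tmp = copy.deepcopy(net)
            _, reconstructed = tmp(blurred)
            aux_loss = F.l1_loss(reconstructed, blurred)
            grads = torch.autograd.grad(aux_loss, tmp.parameters())
            with torch.no_grad():
                for p, g in zip(tmp.parameters(), grads):
                    p -= inner_lr * g
            # Outer step: primary loss evaluated with the temporary weights.
            deblurred, _ = tmp(blurred)
            primary_loss = F.l1_loss(deblurred, clean)
            primary_loss.backward()
            # Sum the primary-loss gradients into the meta weights.
            with torch.no_grad():
                for p, q in zip(net.parameters(), tmp.parameters()):
                    p.grad = q.grad.clone() if p.grad is None else p.grad + q.grad
        meta_opt.step()  # update the meta-training weights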
In some exemplary aspects, the disclosure describes an apparatus comprising a processor to execute instructions to cause the apparatus to perform any of the methods described above.
In some example aspects, the disclosure describes a computer-readable medium having instructions stored therein. The instructions, when executed by a processor of a computing device, cause the computing device to perform any of the methods described above.
Drawings
Reference will now be made, by way of example, to the accompanying drawings, which show exemplary embodiments of the present application, and in which:
FIG. 1 is a block diagram of an example system architecture that may be used for meta-training and application-time training in accordance with some embodiments of the present disclosure.
Fig. 2 is a block diagram of one example hardware architecture of a neural network processor, according to some embodiments of the present disclosure.
Fig. 3 is a block diagram of one example architecture of a deblurring network according to some embodiments of the present disclosure.
Fig. 4 is a flowchart of an example method for meta-training and application-time training of a deblurring network according to some embodiments of the present disclosure.
Fig. 5 is a flowchart of an example method for meta-training and application-time training of a deblurring network according to some embodiments of the present disclosure.
Fig. 6 is an example pseudo code for implementing the example method of fig. 5, according to some embodiments of the disclosure.
Fig. 7 is a flowchart of an example method for meta-training and application-time training of a deblurring network according to some embodiments of the present disclosure.
Like reference numerals may be used in different figures to denote like components.
Detailed Description
The technical scheme of the present disclosure is described below with reference to the accompanying drawings.
The image deblurring methods and systems described in the examples herein may be applied to scenarios in which blurred digital images are deblurred. In the disclosed methods and systems, the neural network is first trained by meta-training to perform a primary task and an auxiliary task. The primary task is to generate a clean image from a blurred input image. The auxiliary task is defined as a task related to the primary task but easier to learn; for example, the auxiliary task may be reconstructing the blurred input image. The result of the meta-training is the meta-training weights of the neural network. When the neural network has been meta-trained on a training device (e.g., a computing system) other than the application device, the neural network and meta-training weights may be provided or deployed to the application device (e.g., an end-consumer system), such as a handheld device, smartphone, tablet, or digital camera. When the neural network is trained on the application device itself, the meta-training weights are stored in the memory of the application device after training is completed. In the application phase, the neural network with the meta-training weights can be further trained using a particular blurred input image (without a corresponding ground-truth clean output image) to update the meta-training weights into application-time training weights. After this further training at the application stage, the particular blurred input image may be deblurred using the neural network with the application-time training weights to output a clean image. It should be noted that after a clean image has been output for a particular blurred input image, the application-time training weights may be discarded. That is, starting again from the meta-training weights, the neural network may be retrained to deblur each new blurred input image. Thus, the disclosed methods and systems provide the following technical effect: a blurred image can be deblurred by a neural network that is further trained on the fly during the application phase. In this disclosure, the application phase refers to the use of the deblurring network for real-world applications (e.g., deblurring a real-world captured image), and may also be referred to as the inference phase, prediction phase, or online phase.
Further training the neural network to update the meta-training weights into application-time training weights may be referred to as application-time training. Application-time training of the neural network is performed only on the auxiliary task. This may enable application-time training to be performed with relatively few iterations (e.g., ten or fewer, or five or fewer), so that it can be performed on the fly during the application phase without excessive use of memory resources and/or without excessive computation time. The technical effect is that, after further training is completed, the trained neural network with the application-time training weights can be used to generate a high-quality deblurred image from a real-world blurred input image in a relatively short computation time (e.g., less than one second), and this can be done on resource-limited systems, such as handheld or mobile devices (e.g., smartphones, tablets, or notebooks) as well as desktop devices (e.g., desktop or personal computing devices).
To facilitate an understanding of the present disclosure, some prior art techniques for image restoration are now discussed. Some prior art techniques are conventional (i.e., not machine-learning based). Conventional techniques include, for example, deconvolution using an estimated blur kernel, based on the assumption that the entire image is uniformly blurred. Such techniques typically perform poorly when the blur is non-uniform. Deblurring dynamic scenes using conventional techniques is often challenging.
There have been some attempts to apply machine-learning-based techniques to image restoration. For example, meta-learning (especially model-agnostic meta-learning (MAML)) has been proposed as a solution for single-image super-resolution (SISR). The goal of SISR is to obtain a high-resolution output image from only a single low-resolution input image. With MAML, the neural network is quickly trained (e.g., in several gradient steps, or training iterations) to adapt its learned weights, and a high-resolution output image is predicted from a particular low-resolution input image using the network with the adapted weights. Further training the neural network to adapt its learned weights and then predicting the high-resolution output image from the particular input image may be referred to as test-time training, because the particular input image can be considered an adaptation test for the neural network. Test-time training typically must be fast (e.g., within a few gradient steps, or training iterations) to be of practical use, and it typically also requires each input datum (e.g., each test input image) to be paired with ground-truth output data (e.g., the expected output image) in order to train the network. However, this test-time training approach may not be suitable for the image deblurring problem, because a ground-truth output image cannot be obtained for a real blurred image.
Another machine-learning-based technique contemplated for image restoration is auxiliary learning. In auxiliary learning, an auxiliary task is defined that is related to a primary task (e.g., image deblurring) but easier to learn. Some layers (and weights) of the neural network are shared between the auxiliary task and the primary task, so that updating the weights to improve the network's performance on the auxiliary task also affects its performance on the primary task. An auxiliary task supporting learning of the primary deblurring task may be defined so that ground-truth output data for each test input can be readily obtained during test-time training. However, this approach on its own generally provides little or no improvement on the primary deblurring task, because the weight updates only directly serve the auxiliary task.
In various examples, the present disclosure describes methods and systems for image deblurring in which a neural network is trained by a technique referred to herein as meta-auxiliary learning, which can realize the advantages of both meta-learning and auxiliary learning. As discussed below, the disclosed methods and systems enable test-time training of the neural network using a particular blurred input image that has no corresponding ground-truth clean output image, for rapid adaptation (e.g., within several training iterations) of the network's previous meta-training weights, to improve the network's performance in predicting a clean output image from that particular blurred input image. In the examples disclosed herein, test-time training of the deblurring network may be performed to enable deblurring of real images at the application stage. Accordingly, the present disclosure refers to test-time training of the deblurring network as application-time training. Application-time training is performed on the auxiliary task, but the prior meta-training of the neural network and the architecture of the neural network are designed to ensure that the primary deblurring task also benefits from application-time training. Examples of the disclosed methods and systems may provide good performance in the presence of data distribution differences, where the blur of the test image or real image may differ from the blur found in the training dataset.
In some examples, the present disclosure describes example methods of training a neural network to learn an image deblurring task. In particular, this disclosure describes examples of training a neural network to adapt its weights to improve deblurring performance for a particular input image. The training method relates to computer vision processing. In particular, the training method may be applied to data processing methods such as data training, machine learning, or deep learning to perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, etc. on training data (e.g., blurred image data in the context of the present disclosure) to obtain a trained neural network. Furthermore, the present disclosure describes an example image deblurring method that may be performed using the trained neural network described above. In the examples discussed herein, input data (e.g., real-world blurred image data) are used to further train the trained neural network to obtain output data (e.g., a deblurred image). It should be noted that the neural network training method and the deblurring method described herein may be considered as based on the same idea, and may also be considered as two parts of one system, or two stages of an overall process: for example, a model training phase and a model application phase.
In general, examples disclosed herein relate to a number of neural network applications. For ease of understanding, some concepts related to neural networks and some related terms that may be related to the examples disclosed herein are described below.
A neural network is composed of neurons. A neuron is a computational unit that takes inputs $x_s$ and an intercept of 1. The output of the computational unit may be:

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, $b$ is the offset (i.e., bias) of the neuron, and $f$ is the activation function of the neuron, used to introduce a nonlinearity into the neural network to convert the input of the neuron into an output. The output of the activation function may be used as the input to a neuron in the next convolutional layer of the neural network. For example, the activation function may be a sigmoid function. A neural network is formed by connecting a plurality of such neurons; in other words, the output of one neuron may be the input of another neuron. The input of each neuron may be associated with a local receptive field of the previous layer, to extract features of that local receptive field. The local receptive field may be a region composed of several neurons.
A deep neural network (DNN), also referred to as a multi-layer neural network, can be understood as a neural network including a first layer (commonly referred to as the input layer), a plurality of hidden layers, and a last layer (commonly referred to as the output layer). There is no particular metric for "multiple". A layer is considered a fully-connected layer when there is a full connection between the two adjacent layers of the neural network. Specifically, for two adjacent layers (e.g., the i-th layer and the (i+1)-th layer) to be fully connected, each neuron in the i-th layer must be connected to each neuron in the (i+1)-th layer.
The processing at each layer of a DNN may be relatively simple. Briefly, the operation of each layer is represented by the following linear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight (also called coefficient) matrix, and $\alpha(\cdot)$ is the activation function. At each layer, the operation is performed on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$.
Since there are a large number of layers in a DNN, there are also a large number of weights $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the weight $W$ as an example. In this example, in a three-layer DNN (i.e., a DNN with three hidden layers), the linear weight from the fourth neuron of the second layer to the second neuron of the third layer is denoted $W_{24}^{3}$. The superscript 3 indicates the layer of the weight $W$ (i.e., the third layer in this example), and the subscript indicates that the output is at index 2 of layer three (i.e., the second neuron of the third layer) while the input is at index 4 of layer two (i.e., the fourth neuron of the second layer). In general, the weight from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer can be expressed as $W_{jk}^{L}$. It should be noted that the input layer has no $W$ parameter.
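To make the indexing concrete, the following minimal sketch (PyTorch is used in these sketches purely for illustration and is not part of this disclosure) computes one layer, where element W[j, k] plays the role of the weight from neuron k of the previous layer to neuron j of the current layer:

    import torch

    x = torch.randn(4)           # input vector (4 neurons in layer L-1)
    W = torch.randn(3, 4)        # W[j, k]: weight from neuron k to neuron j
    b = torch.randn(3)           # offset (bias) vector of layer L
    y = torch.relu(W @ x + b)    # output vector (3 neurons in layer L)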
In DNNs, more hidden layers may allow the DNNs to better model complex situations (e.g., real world situations). Theoretically, DNNs with more parameters are more complex and have more capacity (this may refer to the ability of the learned model to adapt to various possible scenarios), indicating that DNNs can accomplish more complex learning tasks. Training of DNNs is the process of learning a weight matrix. The purpose of the training is to obtain a training weight matrix consisting of the learning weights W of all layers of DNN.
A convolutional neural network (CNN) is a DNN with a convolutional structure. The CNN includes a feature extractor consisting of a convolution layer and a sub-sampling layer. The feature extractor may be regarded as a filter. The convolution process may be regarded as convolving a two-dimensional (2D) input image or a convolutional feature map with a trainable filter.
The convolutional layer is a neuron layer that performs convolutional processing on an input in the CNN. In a convolutional layer, one neuron may be connected to only a subset of neurons (i.e., not all neurons) in an adjacent layer. That is, the convolutional layer is typically not a fully-connected layer. A convolutional layer typically comprises several feature maps, each of which may consist of a number of neurons arranged in a rectangle. Neurons in the same feature map share weights. The shared weights may be collectively referred to as a convolution kernel. Typically, the convolution kernel is a two-dimensional matrix of weights. It should be appreciated that the convolution kernel may be independent of the manner and location of image information extraction. One principle behind the convolutional layer is that the statistics of one part of the image are the same as the statistics of another part of the image. This means that the image information learned from one part of the image can also be applied to another part of the image. Multiple convolution kernels may be used at the same convolution layer to extract different image information. In general, the larger the number of convolution kernels, the more rich the image information that the convolution operation reflects.
The convolution kernel may be initialized to a two-dimensional matrix of random values. During the training process of CNN, the weights of the convolution kernels are learned. One advantage of using convolution kernels to share weights between neurons in the same feature map is that the connections between the convolution layers of the CNN are reduced (compared to fully connected layers) and the risk of overfitting is reduced.
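As an illustrative sketch of weight sharing in a convolution layer (the layer sizes here are arbitrary): the same small kernel is applied at every spatial location, so the layer has far fewer parameters than a fully-connected layer over the same image.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

    # All neurons in one feature map share the same 3x3 kernel, so the layer
    # has only 3*3*3*16 weights + 16 biases = 448 parameters:
    print(sum(p.numel() for p in conv.parameters()))   # 448

    x = torch.randn(1, 3, 64, 64)   # one RGB input image
    fmap = conv(x)                  # 16 feature maps, one per convolution kernel
    print(fmap.shape)               # torch.Size([1, 16, 64, 64])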
In training the DNN, predicted values of the DNN output may be compared to a desired target value (e.g., a true value). The weight vector (vector containing the weight W of a given layer) of each layer of DNN is updated based on the difference between the predicted value and the desired target value. For example, if the predicted value of the DNN output is too high, the weight vector for each layer may be adjusted to lower the predicted value. Such comparison and adjustment may be performed iteratively until a convergence condition is met (e.g., a predefined maximum number of iterations have been performed, or the predicted value of the DNN output converges sufficiently with the desired target value). A loss function or objective function is defined as a method of quantitatively indicating the proximity of a predicted value to a target value. The objective function represents an amount to be optimized (e.g., minimized or maximized) so that the predicted value is as close as possible to the target value. The loss function more specifically represents the difference between the predicted value and the target value, and the goal of training DNN is to minimize the loss function.
Back propagation is an algorithm that trains DNNs. The back propagation is used to adjust (also referred to as update) the values of parameters (e.g., weights) in the DNN to reduce errors (or losses) in the output. For example, the defined loss function is calculated from the forward propagation from the input to the output of the DNN. Back propagation calculates the gradient of the loss function from the parameters of the DNN, and a gradient algorithm (e.g., gradient descent) is used to update the parameters to reduce the loss function. The back propagation is performed iteratively, thereby converging or minimizing the loss function.
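As a concrete illustration of the above (the squared-error loss here is one common choice, not necessarily the loss used in this disclosure), gradient descent updates the weights $\theta$ against a loss $L$ with learning rate $\eta$:

$$L(\theta) = \tfrac{1}{2}\,\lVert \hat{y}(\theta) - y \rVert^2, \qquad \theta \leftarrow \theta - \eta\, \nabla_{\theta} L(\theta)$$

where $\hat{y}(\theta)$ is the predicted value and $y$ is the desired target value; back propagation is the procedure that computes $\nabla_{\theta} L(\theta)$ layer by layer.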
Referring to fig. 1, fig. 1 illustrates a system 100 according to an example embodiment of the present disclosure. The following description should not be construed as limiting any examples of the present disclosure. As shown in system 100, labeled training data may be stored in database 130. Database 130 may be located in a server or data center, or may be provided as a service by a cloud computing service provider. In the context of the present disclosure, labeled training data refers to training data used for learning the meta-training weights of the deblurring neural network 101 (also referred to as the deblurring network 101 for simplicity). The labeled training data includes input-output image pairs, where the input image is a blurred image and the paired output image (i.e., the expected output image) is a ground-truth clean (i.e., non-blurred) image. The labeled training data is different from the application-time training data, which may be unlabeled real-world data (e.g., a real image captured by the application device 110, discussed below) or unlabeled test data. The application-time data includes blurred images without paired output images (i.e., without ground-truth clean images). As will be discussed further below, application-time training of the deblurring network 101 may be performed with a single real input image to obtain application-time training weights for the deblurring network 101, and the deblurring network 101 with the application-time training weights may be used to predict a corresponding single deblurred output image from that single real input image.
Database 130 may contain, for example, labeled training data that has been previously collected and is typically used for training models for image tasks (e.g., image recognition). The input images of the labeled training data stored in the database 130 may alternatively or additionally be images optionally collected (e.g., with user consent) from the application device 110 (which may be a user device). For example, images captured by the camera of the application device 110 and stored on the application device 110 may optionally be anonymized and uploaded to the database 130 for storage as input images of the labeled training data. The labeled training data stored in database 130 may include input-output image pairs where the input images are synthetically blurred versions of the paired output images (e.g., ground-truth clean images).
As will be discussed further below, the deblurring network 101 may be trained by the meta-training device 120 based on training data stored in the database 130. Additionally or alternatively, the meta-training device 120 may train the deblurring network 101 using training data obtained from other sources, such as distributed storage (or a cloud storage platform). The meta-trained deblurring network 101 (i.e., the result of meta-training by the meta-training device 120) has a set of meta-training weights. According to examples disclosed herein, the application device 110 may further train the meta-trained deblurring network 101 to deblur a particular blurred real image. The application-time training on the application device 110 may be performed using images (e.g., digital photographs) captured by a camera (not shown) of the application device 110. The application device 110 may not have access to the training data stored in the database 130.
In the examples disclosed herein, the meta-trained deblurring network 101 may be implemented in the processing unit 111 of the application device 110. For example, the deblurring network 101 may be encoded and stored as instructions in a memory (not shown) of the application device 110, which the processing unit 111 executes to implement the deblurring network 101. In some examples, the deblurring network 101 may be encoded and stored as instructions in a memory of the processing unit 111 (e.g., the weights of the deblurring network 101 may be stored in a corresponding weight memory of the processing unit 111, which may be embodied as the neural network processor 200 shown in fig. 2). In some examples, the deblurring network 101 may be implemented (in software and/or hardware) in an integrated circuit of the application device 110. Although fig. 1 shows an example in which the meta-training device 120 is separate from the application device 110, it should be understood that the present disclosure is not limited to this embodiment. In some examples, there may be no separate meta-training device 120 and application device 110. That is, meta-training of the deblurring network 101 and application-time training of the deblurring network 101 may be performed on the same device (e.g., the application device 110).
The application device 110 may be a user device, such as a client terminal, a mobile terminal, a tablet, a notebook, an Augmented Reality (AR) device, a Virtual Reality (VR) device, or an in-vehicle terminal. The application device 110 may also be a server, a cloud computing platform, or the like, through which a user may access. In fig. 1, the application device 110 includes an I/O interface 112 for data interaction with external devices. For example, the application device 110 may provide upload data (e.g., image data, such as photographs and/or videos captured by the application device 110) to the database 130 via the I/O interface 112. Although fig. 1 shows an example in which a user directly interacts with the application device 110, it should be understood that the present disclosure is not limited to this embodiment. In some examples, there may be a user device separate from application device 110 with which the user interacts, which in turn communicates data with application device 110 via I/O interface 112.
In this example, the application device 110 includes a data storage 114, which data storage 114 may be a system memory (e.g., random access memory (random access memory, RAM), read-only memory (ROM), etc.) or a mass storage device (e.g., solid state drive, hard disk drive, etc.). The data memory 114 may store data accessible to the processing unit 111. For example, the data storage 114 may be separate from the processing unit 111, storing the captured image and/or the restored image on the application device 110.
In some examples, the application device 110 may optionally call data and code or the like from the external data storage system 150 for processing, or may store data and instructions or the like obtained by the respective processing in the data storage system 150.
It should be noted that fig. 1 is merely a schematic diagram of an example system architecture 100 according to an embodiment of the present disclosure. The relationships and interactions between the devices, components, and processing units shown in fig. 1, etc. are not intended to limit the present disclosure.
Fig. 2 is a block diagram of one example hardware architecture of an example neural network processor 200, according to an embodiment of the disclosure. The neural network processor 200 may be disposed on an integrated circuit (also referred to as a computer chip). The neural network processor 200 may be provided in the application device 110 shown in fig. 1, performing calculations for the processing unit 111 and implementing the deblurring network 101 (including performing application-time training of the deblurring network). Additionally or alternatively, the neural network processor 200 may be provided in the meta-training device 120 shown in fig. 1 to perform meta-training of the deblurring network 101. All algorithms of the layers in the neural network (e.g., the layers of deblurring network 101 discussed further below) may be implemented in neural network processor 200.
The neural network processor 200 may be any processor capable of performing the required calculations (e.g., the calculation of a number of exclusive or operations) in the neural network. For example, the neural network processor 200 may be a neural processing unit (neural processing unit, NPU for short), a tensor processing unit (tensor processing unit, TPU for short), or a graphics processing unit (graphics processing unit, GPU for short), or the like. The neural network processor 200 may be a coprocessor of an optional host central processing unit (central processing unit, CPU) 220. For example, the neural network processor 200 and the host CPU 220 may be mounted on the same package. The host CPU 220 may be responsible for executing core functions of the application device 110 (e.g., execution of an Operating System (OS), managing communications, etc.). The host CPU 220 may manage the operation of the neural network processor 200, for example, by assigning tasks to the neural network processor 200.
The neural network processor 200 includes an arithmetic circuit 203. The controller 204 of the neural network processor 200 controls the arithmetic circuit 203 so as to extract data (e.g., matrix data) from the input memory 201 and the weight memory 202 of the neural network processor 200, for example, and perform data operations (e.g., addition and multiplication operations).
In some examples, the arithmetic circuitry 203 internally includes a plurality of processing units (also referred to as Process Engines (PEs)). In some examples, the operational circuitry 203 is a two-dimensional systolic array. In other examples, the operation circuit 203 may be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some examples, the operational circuitry 203 is a general-purpose matrix processor.
In one example operation, the arithmetic circuit 203 retrieves weight data of the weight matrix B from the weight memory 202 and buffers the weight data in each PE of the arithmetic circuit 203. The arithmetic circuit 203 acquires input data of the input matrix a from the input memory 201, performs matrix operation based on the input data of the matrix a and the weight data of the matrix B, and stores the obtained partial or final matrix result in the accumulator 208 of the neural network processor 200.
In this example, the neural network processor 200 includes a vector calculation unit 207. The vector calculation unit 207 includes a plurality of arithmetic processing units. If necessary, the vector calculation unit 207 performs further processing, such as vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison, on the output from the operation circuit 203 (which the vector calculation unit 207 may retrieve from the accumulator 208). The vector calculation unit 207 may be mainly used for the operations of non-convolutional layers or fully-connected layers of the neural network. For example, the vector calculation unit 207 may perform processing such as pooling or normalization. The vector calculation unit 207 may apply a nonlinear function to the output of the operation circuit 203 (e.g., to a vector of accumulated values) to generate an activation value. The activation value may be used by the operation circuit 203 as an activation input for the next layer of the neural network. In some examples, the vector calculation unit 207 generates a normalized value, a combined value, or both.
In this example, the neural network processor 200 includes a memory unit access controller 205 (also referred to as a direct memory access controller (DMAC)). The memory unit access controller 205 is used to access memory external to the neural network processor 200 (e.g., the data memory 114 of the application device 110) via the bus interface unit 210. The memory unit access controller 205 may access data from memory external to the neural network processor 200 and transfer the data directly to one or more memories of the neural network processor 200. For example, the memory unit access controller 205 may transfer weight data directly to the weight memory 202, or may transfer input data directly to the unified memory 206 and/or the input memory 201. The unified memory 206 is used to store input data and output data (e.g., processed vectors from the vector calculation unit 207).
The bus interface unit 210 is also used for interaction between the memory unit access controller 205 and an instruction fetch memory (also referred to as an instruction fetch buffer) 209. The bus interface unit 210 is also used to enable the instruction fetch memory 209 to fetch instructions from a memory external to the neural network processor 200 (e.g., the data memory 114 of the application device 110). Instruction fetch memory 209 is used to store instructions for use by controller 204.
Typically, unified memory 206, input memory 201, weight memory 202, and instruction fetch memory 209 are memories (also referred to as on-chip memories) of neural network processor 200. The data memory 114 is independent of the hardware architecture of the neural network processor 200.
Fig. 3 is a block diagram of an example architecture of deblurring network 101. Details of meta-training and application-time training of deblurring network 101 are discussed further below.
The deblurring network 101 is meta-trained by jointly training on the auxiliary task and the primary deblurring task. The input to the deblurring network 101 is a blurred input image (denoted $I_b$), which may be represented as a two-dimensional matrix of pixel values for each of a plurality of channels (e.g., red-green-blue (RGB) channels) of the input image. The architecture of the deblurring network 101 is described below in connection with the example of fig. 3. It should be appreciated that the architecture of the deblurring network 101 may be modified (e.g., with a smaller or larger number of neural network layers). In the following discussion, the neural network layers (or blocks) of the deblurring network 101 are referred to simply as layers.
In this example, the deblurring network 101 is a single-scale network having a plurality of convolution layers 306 and a plurality of deconvolution layers 308 for processing the blurred input image $I_b$. In this example, each convolution layer 306 is followed by a corresponding residual layer 310a, and each deconvolution layer 308 is followed by a corresponding residual layer 310b. The first convolution layer 306 of the deblurring network 101 receives the blurred input image $I_b$ and performs a convolution operation on it to generate an output feature map (i.e., a feature representation) encoding features of the blurred input image $I_b$. This output feature map is provided as the input feature map to the first residual layer 310a.
Each subsequent convolution layer 306 receives as input the output feature map (i.e., feature representation) produced by the previous layer, and performs a convolution operation on it using a different convolution kernel. When the convolution layer 306 is the first convolution layer 306 in the deblurring network 101, its output feature map encodes features of the blurred input image $I_b$; when it is another convolution layer 306 in the deblurring network 101, its output feature map encodes features of its input feature map. Each deconvolution layer 308 decodes the feature map generated by the corresponding residual layer 310a of the deblurring network 101 to generate an output feature map (i.e., a feature representation). The deblurring network 101 has two output layers 302 and 304, referred to herein as the primary output layer 302 and the auxiliary output layer 304. The primary output layer 302 performs a convolution operation to generate a feature map (i.e., a feature representation) that is combined with the blurred input image to generate the output image (i.e., the predicted clean image) for the primary deblurring task. The auxiliary output layer 304 generates the output image (i.e., the reconstructed blurred image) for the auxiliary task. In this example, the auxiliary output layer 304 is preceded by an auxiliary convolution layer 306b (followed by an auxiliary residual layer 310c). Short connections (indicated by dashed arrows) are used to provide direct paths from the residual layers 310a (following the corresponding convolution layers 306) to the residual layers 310b (following the corresponding deconvolution layers 308), passing feature maps (feature representations), and from the primary output layer 302 to the auxiliary convolution layer 306b. Short connections (also referred to as skip connections) may be used in residual neural networks so that the weights of the deblurring network 101 are learned faster.
The deblurring network 101 has a shared subnetwork 312, which comprises a plurality of shared layers (including the convolution layers 306, the deconvolution layers 308, and the residual layers 310a and 310b). These shared layers are coupled to a primary subnetwork 314 that performs the primary task and an auxiliary subnetwork 316 that performs the auxiliary task (where the primary task is the deblurring task and the auxiliary task is defined to support learning of the deblurring task, as described below). This means that training the deblurring network 101 to perform the auxiliary task will also affect (and potentially improve) the performance of the primary deblurring task. The primary subnetwork 314 has layers specific to the primary deblurring task (i.e., the primary output layer 302), and the auxiliary subnetwork 316 has layers specific to the auxiliary task (i.e., the auxiliary convolution layer 306b, the auxiliary residual layer 310c, and the auxiliary output layer 304). The primary subnetwork 314 and the auxiliary subnetwork 316 may also be referred to as the primary branch and the auxiliary branch, respectively. It should be appreciated that each of the subnetworks 312, 314, and 316 is defined as a respective set of one or more layers of the overall deblurring network 101 (e.g., as shown in fig. 3) and is not independent of the overall deblurring network 101. The primary subnetwork 314 processes the feature maps (i.e., feature representations) generated by the shared subnetwork 312 to generate the output image (i.e., the predicted deblurred image) for the primary deblurring task. The auxiliary subnetwork 316 processes the feature maps (i.e., feature representations) generated by the shared subnetwork 312 to generate the output image (i.e., the reconstructed blurred image) for the auxiliary task. It should be understood that the deblurring network 101 may be implemented with different subnetworks and layers than those described in this example. However, the short connection from the primary output layer 302 to the auxiliary convolution layer 306b is useful, as discussed further below.
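The following is a minimal sketch of the topology just described (shared subnetwork 312, primary subnetwork 314, auxiliary subnetwork 316, and the short connection from the primary output layer into the auxiliary branch). The layer counts, channel widths, and the use of PyTorch are assumptions for illustration; FIG. 3 does not fix these.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))
        def forward(self, x):
            return x + self.body(x)

    class DeblurNet(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            # Shared subnetwork 312: conv (+res) encoder, deconv (+res) decoder.
            self.enc = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), ResBlock(ch),
                nn.Conv2d(ch, ch, 3, stride=2, padding=1), ResBlock(ch))
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), ResBlock(ch))
            # Primary subnetwork 314: primary output layer 302 (residual learning).
            self.primary_out = nn.Conv2d(ch, 3, 3, padding=1)
            # Auxiliary subnetwork 316: aux conv 306b, aux res 310c, aux output 304.
            self.aux_conv = nn.Conv2d(3, ch, 3, padding=1)
            self.aux_res = ResBlock(ch)
            self.aux_out = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, blurred):
            feat = self.dec(self.enc(blurred))       # shared features
            residual = self.primary_out(feat)        # output of layer 302
            deblurred = blurred + residual           # residual learning
            # Short connection: the primary output layer's features are copied
            # into the auxiliary branch (feature passing), so auxiliary-task
            # gradients also reach the shared and primary weights via this path.
            reconstructed = self.aux_out(self.aux_res(self.aux_conv(residual)))
            return deblurred, reconstructed

A single forward pass returns both task outputs, e.g. deblurred, reconstructed = DeblurNet()(torch.randn(1, 3, 64, 64)).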
The primary deblurring task outputs a predicted clean image (i.e., a deblurred image), denoted $\hat{I}_c$, based on the blurred input image $I_b$. The predicted clean image $\hat{I}_c$ has the same size and resolution (i.e., the same pixel dimensions) as the blurred input image $I_b$. In the example of FIG. 3, the primary deblurring task utilizes residual learning to predict the difference (also referred to as the residual) between the blurred input image $I_b$ and the ground-truth expected output image. The predicted residual is added to the original blurred input image $I_b$ to output the predicted deblurred image $\hat{I}_c$. The auxiliary task in this example is defined as self-supervised reconstruction, meaning that the auxiliary task outputs a reconstructed blurred image, denoted $\hat{I}_b$, based on the blurred input image $I_b$. That is, the auxiliary task performed by the deblurring network 101 is to learn weights that map the blurred input image $I_b$ to a set of feature maps, and to reconstruct the blurred image from those feature maps. It should be appreciated that a different auxiliary task may be defined (instead of the reconstruction task); for example, any other auxiliary task that supports learning of the primary deblurring task may be used, provided the auxiliary task can be trained at application time without ground-truth data separate from the application-time input image (or where such ground-truth data can be readily obtained).
As described above, training the deblurring network 101 to perform the auxiliary task also benefits the performance of the primary deblurring task. Performing the auxiliary task during training of the deblurring network 101 provides regularization that guides the learning of the deblurring network's weights for the primary deblurring task, which may result in better performance on the primary deblurring task. It should be noted that the weights of the deblurring network 101 learned during training include weights that encode blur information about the blurred input image, which can be used both to reconstruct the blurred input image and to generate a deblurred output image (i.e., a clean output image). The output features of the primary output layer 302 are copied to the auxiliary convolution layer 306b over the short connection (also referred to as feature passing), so that back-propagation during application-time training of the auxiliary task updates the weights of the deblurring network 101, including the weights in the convolution kernels and deconvolution kernels, in a way that benefits the primary task.
Fig. 4 is a flowchart of an example method 400 of deblurring an input image using the deblurring network 101. Method 400 provides an overview of the different training phases of the deblurring network 101 and may include steps performed by the meta-training device 120 and the application device 110, for example, as shown in fig. 1.
The method 400 may begin at optional step 402, in which pre-training may be performed. Labeled training data (e.g., from the database 130) may be used to pre-train the deblurring network 101 by jointly training it to perform the primary and auxiliary tasks. For example, pre-training may be performed by obtaining a deblurred output image (from the primary subnet 314) and a reconstructed blurred image (from the auxiliary subnet 316), calculating a loss for each subnet using a respective loss function, and performing back propagation based on the sum of the losses. Then, in subsequent step 404, the pre-trained weights may be used as the initial weights at the start of meta-training.
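As a hedged illustration of this joint pre-training step, the snippet below sums the primary (deblurring) and auxiliary (reconstruction) losses and back-propagates the total. It assumes the DeblurNet sketch above and L1 losses; both are assumptions of this sketch rather than requirements of the method.

```python
import torch
import torch.nn.functional as F

def pretrain_step(net, optimizer, blurred, clean):
    """One pre-training iteration on a labeled (blurred, clean) image pair."""
    pred_clean, recon_blur = net(blurred)
    primary_loss = F.l1_loss(pred_clean, clean)    # loss for the primary subnet
    aux_loss = F.l1_loss(recon_blur, blurred)      # loss for the auxiliary subnet
    optimizer.zero_grad()
    (primary_loss + aux_loss).backward()           # back-propagate the sum of losses
    optimizer.step()
```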
If optional step 402 is not performed, the weights of deblurring network 101 may be randomly initialized at the beginning of meta-training. For example, each weight may be set to a respective random value.
In step 404, meta-training of the deblurring network 101 is performed. In some examples, meta-training may be performed by the meta-training device 120 in fig. 1 using labeled training data from the database 130. In other examples, meta-training may be performed by the application device 110. If pre-training is performed in step 402, the labeled training data used for meta-training may be different from the labeled training data used for pre-training (e.g., sampled from a different training dataset, or from different samples of the same training dataset). Meta-training may be performed until a convergence condition is met (e.g., the loss functions are optimized with respect to the weight values of the deblurring network 101, or a predetermined number of training iterations has been completed). Further details of meta-training are described below.
The meta-training weights of deblurring network 101 are then stored. If the meta-training is performed in a meta-training device 120 that is separate from the application device 110, the meta-training device 120 may transmit meta-training weights to the application device 110 and the meta-training weights may be stored by the application device 110 in its local memory. If the meta-training is performed by the application device 110, the meta-training weights may be stored directly in the local memory of the application device 110. The deblurring network 101 with meta-training weights may be referred to as a meta-training deblurring network 101.
In step 406, application-time training of the deblurring network 101 is performed using a specific blurred input image. The specific blurred input image may be, for example, a real image (e.g., a digital photograph) captured by a digital camera, such as a stand-alone digital camera or a digital camera of an electronic device such as a smart phone, notebook, or tablet computer. The application-time training of the deblurring network 101 updates the meta-training weights to application-time training weights such that the deblurring network 101 is adapted to deblurring that particular blurred input image. After the application-time training of the meta-trained deblurring network 101, the deblurring network 101 with the application-time training weights receives the specific blurred input image, generates a predicted clean image (i.e., a deblurred image) from it, and outputs the predicted clean image. The result of the application-time training is an application-time trained deblurring network 101 that performs better than the meta-trained deblurring network 101 at deblurring the particular input image. Further details of the application-time training of the meta-trained deblurring network 101 are described below.
The predicted clean image generated by the application-time trained deblurring network 101 may be stored, for example, in a local memory of the application device 110. The generated predicted clean image may also be displayed (e.g., on a display screen of the application device 110) or otherwise presented to the user. After the predicted clean image (for the particular blurred input image) is generated, the application-time trained deblurring network 101 may revert to the meta-training deblurring network 101 (i.e., the application-time trained weights may be discarded). For example, the values of the as-applied training weights may be replaced with previously stored meta-training weights. The application time training of the deblurring network 101 may be repeated starting with meta-training weights for deblurring of each particular blurred input image.
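The revert step can be as simple as reloading a stored copy of the meta-training weights, as in this sketch (assuming the PyTorch model from the earlier sketches):

```python
# Store the meta-training weights once, after meta-training completes.
meta_weights = {k: v.clone() for k, v in net.state_dict().items()}

# ... application-time training and deblurring of one particular input image ...

# Revert: discard the application-time training weights before the next input image.
net.load_state_dict(meta_weights)
```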
FIG. 5 is a flowchart of one example method for performing step 404 to perform meta-training of deblurring network 101. For example, the method of FIG. 5 may be performed by meta-training device 120 shown in FIG. 1. In some examples, meta-training may be performed by application device 110 instead of meta-training device 120.
Fig. 6 illustrates example pseudo code 600 that may be used (e.g., as software instructions executed by a processor) to implement the method of fig. 5. Fig. 5 and 6 will be discussed together.
Optionally, in step 502, pre-training weights for deblurring network 101 may be obtained. If optional step 402 of FIG. 4 is performed to pre-train deblurring network 101, step 502 may be performed.
In step 504, parameters for meta-training of the deblurring network 101 are initialized. For example, if pre-trained weights are obtained in step 502, the weights of the deblurring network 101 may be initialized to the pre-trained values. If pre-trained weights are not available, the weights of the deblurring network 101 may be randomly initialized (e.g., by setting the value of each weight to a corresponding random value). Other parameters for meta-training of the deblurring network 101 that may be initialized at step 504 include the gradient step sizes used to update the weights of the deblurring network 101 by gradient descent.
Step 504 may be performed using the instructions represented by line 602 in the pseudo code 600. In line 602, the weights of the deblurring network 101 are denoted θ, where θ_s represents the weights of the shared subnet 312 (e.g., the weights in the shared layers of the shared subnet 312 of the deblurring network 101 as shown in fig. 3), θ_pri represents the weights of the layers used only for the primary deblurring task (e.g., the weights in the layers of the primary subnet 314 of the deblurring network 101 as shown in fig. 3), and θ_aux represents the weights of the layers used only for the auxiliary task (e.g., the weights in the layers of the auxiliary subnet 316 of the deblurring network 101 as shown in fig. 3). The gradient-descent step sizes for the auxiliary and primary loss functions are denoted α and β, respectively. The values of θ_s, θ_pri, and θ_aux may be obtained from pre-training or may be randomly initialized. The values of α and β may be selected, for example, according to the desired convergence speed of the gradient descent of the auxiliary and primary loss functions.
Meta-training involves training deblurring network 101 to perform auxiliary tasks, and then training deblurring network 101 to perform primary deblurring tasks. Meta-training may include multiple rounds of training, where each round of training includes training deblurring network 101 to perform the auxiliary tasks for multiple iterations, followed by one training iteration of deblurring network 101 to perform the primary deblurring task. The multiple rounds of training may be repeated until a condition is met (e.g., the secondary and primary loss functions each converge on an optimal value, a predetermined number of rounds of training have been performed, or all of the labeled training data in the training dataset have been processed). Line 604 of pseudo code 600 may represent instructions to repeat training until a convergence condition is met.
A round of training (comprising a plurality of training iterations as described above) will now be described.
In step 506, training of the auxiliary task is performed. This may be performed by steps 508 and 510.
In step 508, labeled training data is sampled. For example, a batch of labeled training data may be sampled from the database 130 containing labeled training data. The sampled batch of labeled training data includes paired training images, where each blurred input image is paired with a corresponding ground-truth clean (i.e., non-blurred) image. Each pair of a blurred input image and its corresponding ground-truth clean training image may be referred to herein as an input-output image pair. The number of input-output image pairs in the sampled batch may be denoted N (where N is a positive integer). Each blurred input image (denoted I_b^n) is paired with a clean output image (denoted I_c^n), where n is the index of the input-output image pair (n is a positive integer no greater than N). Line 606 of the pseudo code 600 may represent instructions for obtaining the labeled training data.
In step 510, the auxiliary task is performed and temporary weights are calculated based on the calculated auxiliary loss. Step 510 is iterated for each value of the index n in the sampled batch (i.e., iterating from n=1 to n=N).
For the nth blurred image, the auxiliary task is performed using the blurred training image I_b^n as the input image to generate a reconstructed blurred image, denoted Î_b^n. The auxiliary loss (denoted L_aux) is calculated from the auxiliary loss function, the blurred training image I_b^n, and the reconstructed blurred image Î_b^n, where the blurred training image I_b^n is used as the ground truth. The auxiliary loss L_aux may be calculated, for example, as the absolute distance between the reconstructed blurred image Î_b^n and the blurred training image I_b^n. It is noted that the clean training image I_c^n is not used to train the deblurring network 101 to perform the auxiliary task. The weights of the deblurring network 101 are then updated by gradient descent of the auxiliary loss with respect to the weights of the deblurring network 101, which can be expressed by the following formula:

θ̂_n ← θ − α ∇_θ L_aux(I_b^n, Î_b^n)

where n is the index of the blurred image used for one training iteration of the deblurring network 101 on the auxiliary task, θ̂_n represents the temporary weights of the deblurring network 101 calculated in that iteration, ← represents the update operation (e.g., replacing the previous values of the weights with updated values), α is the step size of the gradient descent, and ∇_θ is the gradient operator. It should be noted that for each training iteration of the deblurring network on the auxiliary task (i.e., for each respective blurred training image I_b^n), a corresponding set of temporary weights θ̂_n is calculated.
Each of the N blurred input images in the sampled batch of labeled training data is propagated forward through the deblurring network 101. A respective set of temporary weights is calculated by computing the respective auxiliary loss and updating the weights of the deblurring network 101 by back-propagating that auxiliary loss (e.g., by gradient descent). Once the N sets of temporary weights are obtained, the method proceeds to step 512. Lines 608-614 of the pseudo code may represent instructions for performing step 510.
In step 512, training of the deblurring network 101 is performed on the primary deblurring task. This may be performed by step 514.
In step 514, the primary deblurring task is performed to generate predicted clean images, and the weights of the deblurring network 101 are updated based on the calculated primary loss. One iteration of step 514 may be performed per round of training. To calculate the primary loss, for each nth blurred training image I_b^n in the sampled batch, a predicted clean image is generated, denoted Î_c^n.
The primary loss (denoted L_pri) is calculated from the primary loss function, the ground-truth clean image I_c^n, and the predicted clean image Î_c^n. The ground-truth clean image I_c^n from the nth labeled training data in the sampled batch (paired with the nth blurred input image I_b^n) is used as the ground truth. The primary loss L_pri may be calculated, for example, as the absolute distance between the predicted clean image Î_c^n and the clean training image I_c^n. The primary loss L_pri is also calculated based on the temporary weights θ̂_n obtained when training the deblurring network 101 to perform the auxiliary task. The primary loss L_pri is calculated for each labeled training data (input-output image pair) in the sampled batch, and the gradients of all calculated primary losses are summed. The summed gradient is then used to update the weights of the deblurring network 101, which can be expressed by the following formula:

θ ← θ − β ∇_θ Σ_{n=1}^{N} L_pri(I_c^n, Î_c^n; θ̂_n)

where β is the step size of the gradient descent. Lines 616-618 of the pseudo code may represent instructions for performing step 514.
It should be noted that the training of the deblurring network 101 based on the primary loss uses the temporary weights calculated from the auxiliary loss. In this way, training the deblurring network 101 to perform the primary deblurring task is designed so that updates based on the auxiliary loss improve the performance of the deblurring network 101 on the primary deblurring task.
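The following sketch ties steps 506-514 together as one meta-training round, assuming the DeblurNet sketch above; the use of torch.func.functional_call, the L1 losses, and the variable names are assumptions of this sketch, not the patent's pseudo code 600. The inner loop computes a temporary weight set per image from the auxiliary loss (with create_graph=True so the primary loss can differentiate through the update), and the outer step applies the summed primary-loss gradient.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def meta_training_round(net, batch, alpha=1e-4, beta=1e-4):
    """One round of meta-training on a sampled batch of N (blurred, clean) pairs."""
    params = dict(net.named_parameters())
    primary_losses = []
    for blurred, clean in batch:                  # tensors of shape (1, 3, H, W)
        # Auxiliary task: reconstruct the blurred image (its own ground truth).
        _, recon_blur = functional_call(net, params, (blurred,))
        aux_loss = F.l1_loss(recon_blur, blurred)
        grads = torch.autograd.grad(aux_loss, list(params.values()),
                                    create_graph=True)
        # Temporary weights: one auxiliary gradient step of size alpha.
        temp = {name: p - alpha * g
                for (name, p), g in zip(params.items(), grads)}
        # Primary deblurring loss evaluated with the temporary weights.
        pred_clean, _ = functional_call(net, temp, (blurred,))
        primary_losses.append(F.l1_loss(pred_clean, clean))
    # Outer update: summed primary-loss gradient, step size beta.
    total_loss = torch.stack(primary_losses).sum()
    outer_grads = torch.autograd.grad(total_loss, list(params.values()))
    with torch.no_grad():
        for p, g in zip(params.values(), outer_grads):
            p -= beta * g
```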
The conditions are then checked to determine if meta-training should end. If the conditions are not met (e.g., neither the auxiliary loss function nor the primary loss function of the deblurring network 101 converges), the method returns to step 506 to perform another round of training. If the condition is met, the method proceeds to step 516.
In step 516, after the meta-training is completed, weights learned during the meta-training are stored, which may be referred to as meta-training weights. For example, if the meta-training is performed in a meta-training device 120 that is separate from the application device 110, the meta-training device 120 may transmit meta-training weights to the application device 110, and the meta-training weights may be stored by the application device 110 in its local memory. If the meta-training is performed by the application device 110, the meta-training weights may be stored directly in the local memory of the application device 110.
The meta-training weights may be stored and used as the initial weights of the deblurring network 101 for application-time training of the deblurring network 101. Application-time training uses an application-specific blurred image (e.g., a real blurred digital image without a corresponding ground-truth image) to update the meta-training weights of the deblurring network 101 in relatively few training iterations (e.g., fewer than ten iterations, or even fewer than five iterations). The updated meta-training weights are the application-time training weights. After application-time training is completed, the deblurring network 101 with the application-time training weights performs better at deblurring the particular blurred input image.
As shown in fig. 4, the application-time training of the deblurring network 101 follows the meta-training of the deblurring network 101. However, it should be appreciated that the application-time training of the deblurring network 101 does not necessarily follow the meta-training immediately. For example, the meta-training weights may be stored by the application device 110 in local memory, and when it is desired to deblur a particular blurred input image, such as in response to user input on the application device 110 (e.g., a user selecting an option to deblur a captured image), the stored meta-training weights may be used as the starting point for application-time training at that later time. Application-time training may be considered training on real data (e.g., a real blurred image, without a ground-truth clean image) during the application phase, unlike meta-training, which is performed on a training dataset comprising labeled training data (i.e., training data comprising input-output pairs of blurred input images and corresponding ground-truth clean images).
FIG. 7 is a flowchart of one example method for performing step 406 to perform application-time training of deblurring network 101. For example, the method in fig. 7 may be performed by the application device 110 shown in fig. 1.
In step 702, parameters for application-time training of the deblurring network 101 are initialized. Initialization may include obtaining the deblurring network 101 with the meta-training weights (e.g., by initializing the weights of the deblurring network 101 with the values of the stored meta-training weights). Other parameters for application-time training that may be initialized in step 702 include the gradient step size used to adapt the weights by gradient descent.
In step 704, an application-time blurred input image is acquired. For example, the application-time blurred input image may be a real image captured by a digital camera of application device 110. The application-time blurred input image may be obtained in real-time as the image is captured, or may be a previously captured image, such as retrieved from a local memory of the application device 110.
In the application-time training of the deblurring network 101, the weights of the deblurring network 101 are updated by further training the deblurring network 101 to perform the auxiliary task (e.g., the self-supervised auxiliary reconstruction task). Updating the weights of the deblurring network 101 based on the auxiliary task also benefits the primary deblurring task because the deblurring network 101 (see fig. 3) passes features from the primary subnet 314 to the auxiliary subnet 316.
In step 706, the auxiliary task is performed on the application-time blurred input image to predict a reconstructed blurred image. The auxiliary task in this example is to reconstruct the blurred input image, with features passed from the primary deblurring task. However, as discussed above, the auxiliary task may be defined as any other auxiliary task that supports learning of the primary deblurring task (i.e., learning to perform the auxiliary task enables the deblurring network 101 to learn weights that generate feature maps relevant to performing the primary task).
In step 708, the weights of the deblurring network 101 are updated based on the calculated auxiliary loss (these weights were initialized to the meta-training weights in step 702). The updated weights of the deblurring network 101 are referred to as the application-time training weights. The calculation of the auxiliary loss may be the same as in meta-training (e.g., as shown in line 612 of the pseudo code 600). That is, the auxiliary loss may be calculated from the auxiliary loss function (the same auxiliary loss function used in meta-training), the application-time blurred input image, and the reconstructed blurred image. The updating of the weights of the deblurring network 101 may be performed over several iterations (e.g., fewer than ten, or even fewer than five, iterations) by repeating steps 706 and 708. Alternatively, steps 706 and 708 may be performed only once as a single iteration. In general, to achieve better performance, the number of iterations performed during application-time training should equal the number of iterations performed on the auxiliary task during meta-training.
In step 710, after the application-time training is complete (e.g., after the defined number of iterations of steps 706 and 708 have been performed), the deblurring network 101 with the application-time training weights is used to generate a predicted clean image from the application-time blurred input image. For example, the predicted clean image may be stored locally on the application device 110 and/or may be output to a display device (e.g., a display screen of the application device 110) for viewing by a user.
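A minimal sketch of this application-time procedure (steps 704-710) follows, assuming the meta-trained DeblurNet from the earlier sketches; the plain gradient-descent optimizer and the five-iteration default are illustrative assumptions consistent with, but not mandated by, the description above.

```python
import torch
import torch.nn.functional as F

def deblur_at_application_time(net, meta_weights, blurred, steps=5, alpha=1e-4):
    """Adapt to one blurred input image via the auxiliary task, then deblur it."""
    net.load_state_dict(meta_weights)               # start from the meta-training weights
    optimizer = torch.optim.SGD(net.parameters(), lr=alpha)
    for _ in range(steps):                          # a few auxiliary-task iterations
        _, recon_blur = net(blurred)
        aux_loss = F.l1_loss(recon_blur, blurred)   # no ground-truth clean image needed
        optimizer.zero_grad()
        aux_loss.backward()
        optimizer.step()                            # yields the application-time weights
    with torch.no_grad():                           # step 710: generate the output
        pred_clean, _ = net(blurred)
    return pred_clean
```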
In various examples, the present disclosure describes methods and systems for image deblurring. A machine learning based method is described that can achieve deblurring of dynamic scenes and deblurring of images with uniform image blur.
The disclosed method for training the deblurring network involves meta-training and application-time (or test-time) training on an auxiliary task (also referred to as meta-auxiliary learning), which may enable the weights of the deblurring network 101 to be adapted to each application-time input image in relatively few iterations (e.g., fewer than five application-time iterations). Such rapid adaptation of the weights of the deblurring network 101 may make the disclosed deblurring method and system useful in practical applications (e.g., for deblurring real-world images captured by a digital camera of an electronic device).
The deblurring network (e.g., having the architecture described above) is designed such that training on the auxiliary task (e.g., the self-supervised auxiliary reconstruction task) benefits the primary deblurring task.
In various evaluation tests, examples of the present disclosure have been found to perform better than some other existing machine-learning-based deblurring techniques that do not adapt at application time.
The present disclosure describes examples in the context of image deblurring, including deblurring of captured digital images. It should be appreciated that the present disclosure may be applicable to deblurring of both still images (e.g., digital photographs) and video images (e.g., each image to be deblurred is a corresponding frame in video).
Although described in the context of deblurring, it should be understood that the present disclosure may be applicable to other image restoration tasks, such as defogging, denoising, or rain removal, among others. For example, the primary deblurring task may instead be a primary defogging task, a primary denoising task, or a primary rain-removal task. It should be understood that the present disclosure is not limited to image deblurring only. Any task that would benefit from application-time training and that lacks ground-truth data during application-time training may benefit from examples of the present disclosure.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether a function is performed by hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality for each particular application using different methods, but such implementation is not intended to be beyond the scope of this disclosure.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding process in the above method embodiment for the specific working process of the above system, apparatus and unit, and this is not repeated here.
It should be understood that the disclosed systems and methods may be implemented in other ways. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in a single location or distributed over a plurality of network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as a stand-alone product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing description is only a specific embodiment of the present application and is not intended to limit the scope of the present disclosure. Any changes or substitutions that would be obvious to one skilled in the art within the scope of the present disclosure are intended to be within the scope of the present disclosure.

Claims (22)

1. A method of deblurring an image, comprising:
acquiring a deblurring neural network having meta-training weights, wherein the meta-training weights were previously obtained by meta-training the deblurring neural network on a primary deblurring task and an auxiliary reconstruction task;
acquiring an application-time blurred input image;
performing application-time training on the deblurring neural network having the meta-training weights, using the application-time blurred input image, to obtain application-time training weights of the deblurring neural network, by:
executing the auxiliary reconstruction task on the application-time blurred input image to predict a reconstructed blurred image; and
updating the meta-training weights of the deblurring neural network based on the auxiliary loss calculated from the auxiliary loss function, the application-time blurred input image, and the reconstructed blurred image; and
after the application-time training is finished, generating a deblurred output image from the application-time blurred input image using the deblurring neural network having the application-time training weights.
2. The method of claim 1, wherein the application-time training comprises a plurality of iterations using the application-time blurred input image, the application-time blurred input image being a single blurred image, and wherein each iteration comprises performing the auxiliary reconstruction task and updating the weights of the deblurring neural network.
3. The method of claim 2, wherein the application time training comprises at most five iterations.
4. The method according to any one of claims 1 to 3, wherein the auxiliary reconstruction task is performed using features passed from the primary deblurring task.
5. A method according to any one of claims 1 to 3, wherein the deblurring neural network comprises:
a shared subnetwork for processing the application-time blurred input image, wherein the shared subnetwork is coupled to a primary subnetwork for performing the primary deblurring task and an auxiliary subnetwork for performing the auxiliary reconstruction task;
the primary subnetwork comprising a primary output neural network layer for performing the primary deblurring task, the primary subnetwork processing output from the shared subnetwork to generate the deblurred output image; and
the auxiliary subnetwork comprising an auxiliary output neural network layer for performing the auxiliary reconstruction task, the auxiliary subnetwork processing output from the shared subnetwork to generate the reconstructed blurred image.
6. The method of claim 5, wherein features output by the primary output neural network layer are replicated to a neural network layer of the auxiliary subnetwork.
7. The method of claim 5, wherein in the primary subnetwork, the output from the primary output neural network layer represents a residual between the deblurred output image and the application-time blurred input image, and the residual is added to the application-time blurred input image to generate the deblurred output image.
8. The method of any of claims 1 to 7, wherein the meta-training of the deblurring neural network is performed based on a training data set, wherein the training data set comprises input-output pairs of labeled training data.
9. A method of training a deblurring neural network, comprising:
initializing the weights of a deblurring neural network;
performing a round of meta-training of the deblurring neural network to perform a primary deblurring task and an auxiliary reconstruction task, and obtaining meta-training weights by:
sampling a batch of training data, wherein the sampled batch comprises a plurality of blurred training images, each blurred training image being paired with a corresponding clean training image in the sampled batch;
for each given fuzzy training image in the sample batch, performing the auxiliary reconstruction task on the given fuzzy training image to predict a corresponding reconstructed fuzzy image, and calculating a corresponding temporary weight set based on the auxiliary loss calculated from an auxiliary loss function, the given fuzzy training image, and the corresponding reconstructed fuzzy image;
for each given pair of blurred training images and corresponding clean training images in the sample batch, performing the primary deblurring task to predict a corresponding predicted clean image and calculating a corresponding primary loss according to a primary loss function, the corresponding clean training image, the corresponding predicted clean image and a corresponding temporary weight set; and
updating the weights of the deblurring neural network using a summed gradient of the correspondingly calculated primary losses; and after the meta-training is completed, storing the meta-training weights, wherein the deblurring neural network having the meta-training weights may be further trained at application time for an application-time blurred input image.
10. The method of claim 9, wherein the auxiliary reconstruction task is performed by features passed from the primary deblurring task.
11. The method according to any one of claims 9 or 10, wherein the deblurring neural network comprises:
a shared subnetwork for processing the blurred training image, wherein the shared subnetwork is coupled to a primary subnetwork for performing the primary deblurring task and an auxiliary subnetwork for performing the auxiliary reconstruction task;
the primary subnetwork comprising a primary output neural network layer for performing the primary deblurring task, the primary subnetwork processing output from the shared subnetwork to generate a deblurred output image; and
the auxiliary subnetwork comprising an auxiliary output neural network layer for performing the auxiliary reconstruction task, the auxiliary subnetwork processing output from the shared subnetwork to generate the reconstructed blurred image.
12. The method of claim 11, wherein features output by the primary output neural network layer are replicated to a neural network layer of the auxiliary subnetwork.
13. The method of claim 11, wherein in the primary subnetwork, the output from the primary output neural network layer represents a residual between the deblurred output image and the blurred training image, and the residual is added to the blurred training image to generate the deblurred output image.
14. An apparatus for deblurring an image, comprising:
a processor to execute instructions to cause the apparatus to:
acquiring a deblurring neural network having meta-training weights, wherein the meta-training weights were previously obtained by meta-training the deblurring neural network on a primary deblurring task and an auxiliary reconstruction task;
acquiring an application-time blurred input image;
performing application-time training on the deblurring neural network having the meta-training weights, using the application-time blurred input image, to obtain application-time training weights of the deblurring neural network, by:
Executing the auxiliary reconstruction task on the application-time blurred input image to predict a reconstructed blurred image; and
updating the meta-training weights of the deblurring neural network based on the auxiliary loss calculated from the auxiliary loss function, the application-time blurred input image, and the reconstructed blurred image; and
after the application-time training is finished, generating a deblurred output image from the application-time blurred input image using the deblurring neural network having the application-time training weights.
15. The apparatus of claim 14, wherein the application-time training comprises a plurality of iterations using the application-time blurred input image, the application-time blurred input image being a single blurred image, and wherein each iteration comprises performing the auxiliary reconstruction task and updating the weights of the deblurring neural network.
16. The apparatus of claim 15, wherein the application time training comprises at most five iterations.
17. The apparatus according to any of claims 14 to 16, wherein the auxiliary reconstruction task is performed by features passed from the primary deblurring task.
18. The apparatus of any one of claims 14 to 17, wherein the deblurring neural network comprises:
a shared subnetwork for processing the application-time blurred input image, wherein the shared subnetwork is coupled to a primary subnetwork for performing the primary deblurring task and an auxiliary subnetwork for performing the auxiliary reconstruction task;
the primary subnetwork comprising a primary output neural network layer for performing the primary deblurring task, the primary subnetwork processing output from the shared subnetwork to generate the deblurred output image; and
the auxiliary subnetwork comprising an auxiliary output neural network layer for performing the auxiliary reconstruction task, the auxiliary subnetwork processing output from the shared subnetwork to generate the reconstructed blurred image.
19. The apparatus of claim 18, wherein features output by the primary output neural network layer are replicated to a neural network layer of the auxiliary subnetwork.
20. The apparatus of any one of claims 14 to 19, wherein the meta-training of the deblurring neural network is performed based on a training data set, wherein the training data set comprises input-output pairs of labeled training data.
21. A computer program comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 13.
22. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 13.
CN202180079713.XA 2021-02-19 2021-02-19 Image enhancement method and device Pending CN116547696A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/054088 WO2022174908A1 (en) 2021-02-19 2021-02-19 Method and apparatus for image enhancement

Publications (1)

Publication Number Publication Date
CN116547696A (en) 2023-08-04

Family

ID=74673209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180079713.XA Pending CN116547696A (en) 2021-02-19 2021-02-19 Image enhancement method and device

Country Status (3)

Country Link
EP (1) EP4232944A1 (en)
CN (1) CN116547696A (en)
WO (1) WO2022174908A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893413A (en) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement

Also Published As

Publication number Publication date
WO2022174908A1 (en) 2022-08-25
EP4232944A1 (en) 2023-08-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination