CN113496465A - Image scaling - Google Patents

Image scaling

Info

Publication number
CN113496465A
CN113496465A
Authority
CN
China
Prior art keywords
image
resolution
neural network
reversible
high frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010203650.1A
Other languages
Chinese (zh)
Inventor
郑书新
刘畅
贺笛
柯国霖
李亚韬
边江
刘铁岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Priority to CN202010203650.1A priority Critical patent/CN113496465A/en
Priority to KR1020227033277A priority patent/KR20220157402A/en
Priority to EP21711982.5A priority patent/EP4121936A1/en
Priority to PCT/US2021/018950 priority patent/WO2021188254A1/en
Priority to JP2022548582A priority patent/JP2023517486A/en
Priority to US17/802,775 priority patent/US20230093734A1/en
Publication of CN113496465A publication Critical patent/CN113496465A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 1/00 General purpose image data processing
            • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
        • G06T 3/00 Geometric image transformations in the plane of the image
            • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
                • G06T 3/4007 … based on interpolation, e.g. bilinear interpolation
                • G06T 3/4023 … based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
                • G06T 3/4046 … using neural networks
                • G06T 3/4053 … based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
                    • G06T 3/4076 … using the original low-resolution images to iteratively correct the high-resolution images
                • G06T 3/4084 … in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
        • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 Special algorithmic details
                • G06T 2207/20048 Transform domain processing
                    • G06T 2207/20064 Wavelet transform [DWT]
                • G06T 2207/20081 Training; Learning
                • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

According to implementations of the present disclosure, a scheme for image scaling is presented. In this scheme, an input image having a first resolution is acquired. Using a trained reversible neural network, an output image having a second resolution, together with high frequency information obeying a predetermined distribution, is generated based on the input image, where the first resolution is higher than the second resolution and the input image and the output image have the same semantics. Further, another input image having the second resolution is acquired. Using the inverse network of the reversible neural network, another output image having the first resolution is generated based on this other input image and high frequency information sampled from the predetermined distribution, where this input image and output image likewise have the same semantics. The scheme can reduce an original image to a visually pleasing low-resolution image with the same semantics, and can reconstruct a high-quality high-resolution image from the low-resolution image.

Description

Image scaling
Background
Image scaling is one of the most common operations in digital image processing. On the one hand, with the widespread use of high resolution images and videos on the internet, image reduction is essential for storing, transmitting and sharing such large data, because reduced images significantly save storage space and improve bandwidth utilization while preserving the same semantic information. On the other hand, many such image reduction scenarios inevitably place high demands on the inverse task, i.e., enlarging the reduced image back to its original size.
Conventional image reduction (i.e., reducing a high resolution image to a low resolution image) schemes tend to result in the loss of high frequency information in the high resolution image. Due to the absence of high frequency information, conventional image magnification (i.e., magnifying a low resolution image into a high resolution image) schemes often fail to reconstruct a high quality high resolution image from a low resolution image.
Disclosure of Invention
According to implementations of the present disclosure, a scheme for image scaling is presented. In this scheme, an input image having a first resolution is acquired. Using a trained reversible neural network, an output image having a second resolution, together with high frequency information obeying a predetermined distribution, is generated based on the input image, where the first resolution is higher than the second resolution and the input image and the output image have the same semantics. Further, another input image having the second resolution is acquired. Using the inverse network of the reversible neural network, another output image having the first resolution is generated based on this other input image and high frequency information sampled from the predetermined distribution, where this input image and output image likewise have the same semantics. The scheme can reduce an original image to a visually pleasing low-resolution image with the same semantics, and can reconstruct a high-quality high-resolution image from the low-resolution image.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
FIG. 1A illustrates a schematic block diagram of a computing device capable of implementing various implementations of the present disclosure;
FIG. 1B illustrates a schematic diagram of the working principle of an image scaling module according to an implementation of the present disclosure;
FIG. 2A illustrates a schematic block diagram of a reversible neural network in accordance with implementations of the present disclosure;
FIG. 2B illustrates a schematic diagram of an example reversible neural network element, according to an implementation of the present disclosure;
FIG. 3A illustrates a schematic block diagram of an inverse network of a reversible neural network in accordance with implementations of the present disclosure;
FIG. 3B illustrates a schematic diagram of an example reversible neural network element, in accordance with implementations of the present disclosure;
FIG. 4 illustrates a flow diagram of an example method for image scaling according to an implementation of the present disclosure;
FIG. 5 illustrates a flow diagram of an example method for image scaling according to an implementation of the present disclosure; and
FIG. 6 illustrates a block diagram of an example system capable of implementing implementations of the present disclosure.
In the drawings, the same or similar reference characters are used to designate the same or similar elements.
Detailed Description
The present disclosure will now be discussed with reference to several example implementations. It should be understood that these implementations are discussed only to enable those of ordinary skill in the art to better understand and thus implement the present disclosure, and are not intended to imply any limitation as to the scope of the present disclosure.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to." The term "based on" is to be read as "based, at least in part, on." The terms "one implementation" and "an implementation" are to be read as "at least one implementation." The term "another implementation" is to be read as "at least one other implementation." The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As used herein, a "neural network" is capable of processing an input and providing a corresponding output. It generally includes an input layer, an output layer, and one or more hidden layers between them. The layers in the neural network are connected in sequence, so that the output of a previous layer is provided as the input of a subsequent layer; the input layer receives the input of the neural network model, and the output of the output layer is the final output of the neural network model. Each layer of the neural network model includes one or more nodes (also referred to as processing nodes or neurons), each of which processes input from the previous layer. The terms "neural network," "model," "network," and "neural network model" are used interchangeably herein.
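The layered computation described above can be sketched in a few lines of numpy. The layer sizes, ReLU activation, and random weights below are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

# Minimal sketch of the layered structure: an input layer, one hidden
# layer, and an output layer; each layer's output feeds the next layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input (4) -> hidden (8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)   # hidden (8) -> output (2)

def neural_network(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden nodes apply a ReLU
    return W2 @ h + b2                # output layer gives the final output

out = neural_network(np.ones(4))
assert out.shape == (2,)
```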
As mentioned above, image scaling is one of the most common operations in digital image processing. However, conventional image reduction schemes (i.e., reducing a high resolution image to a low resolution image) tend to lose the high frequency information in the high resolution image. This absence of high frequency information makes the task of image magnification (i.e., magnifying a low resolution image into a high resolution image) very challenging: multiple High Resolution (HR) images may correspond to the same Low Resolution (LR) image, which is also referred to as the ill-posedness of the image magnification process. Therefore, conventional approaches often fail to reconstruct a high quality HR image from the LR image.
Conventional approaches typically adopt a Super Resolution (SR) method to magnify the LR image. Current SR methods focus primarily on learning prior information through example-based strategies or deep learning models. Clearly, if the target LR image was pre-reduced from a corresponding HR image, taking the image reduction method into account during image enlargement helps improve the quality of HR image reconstruction. However, current SR methods do not take this into account.
Conventional image reduction methods employ a frequency-based kernel (such as bilinear interpolation, bicubic interpolation, etc.) as a low-pass filter to subsample the input HR image to the target resolution. These methods typically result in the image being overly smooth because high frequency information is suppressed. Recently, several image reduction methods have been proposed that preserve detail or structural similarity. However, none of these perceptually oriented image reduction methods takes into account the potential mutual enhancement between image reduction and its inverse task (i.e., image magnification).
Some conventional approaches attempt to model image reduction and image enlargement as a joint task, taking into account the potential mutual enhancement between image reduction and its inverse task (i.e., image enlargement). For example, some schemes propose an image reduction model based on an auto-encoder framework, where the encoder and decoder act as the image reduction and SR models, respectively, so that the image reduction and image magnification processes are jointly trained as a unified task. Some schemes propose to estimate the reduced low resolution image using a convolutional neural network and to perform HR image reconstruction using a learned or specified SR model. Still other schemes propose content-adaptive-resampler-based image reduction methods that can be trained with any existing SR model. Although these schemes can to some extent improve the quality of the HR image restored from the reduced LR image, they cannot fundamentally resolve the ill-posedness of the image enlargement process, and therefore cannot reconstruct a high-quality HR image from the LR image.
In accordance with implementations of the present disclosure, a scheme for image scaling is presented. In this scheme, an input image having a first resolution is scaled to an output image having a second resolution using a reversible neural network. Furthermore, the inverse network of this neural network is capable of scaling an input image having the second resolution to an output image having the first resolution. Specifically, when performing image reduction, the neural network converts the HR image into an LR image and high frequency information obeying a specific distribution. When performing image enlargement, the inverse network of the neural network converts an LR image and high frequency information sampled from that specific distribution into an HR image. Because the reversible neural network is used to model both the image reduction and image enlargement processes, this scheme can reduce the original image to a visually pleasing low-resolution image and greatly mitigate the ill-posedness of the image enlargement process, thereby reconstructing a high-quality high-resolution image from the low-resolution image.
Various example implementations of this approach are described in further detail below in conjunction with the figures.
FIG. 1A illustrates a block diagram of a computing device 100 capable of implementing multiple implementations of the present disclosure. It should be understood that the computing device 100 shown in FIG. 1A is merely exemplary and should not be construed as limiting in any way the functionality or scope of the implementations described in this disclosure. As shown in FIG. 1A, the computing device 100 takes the form of a general-purpose computing device. Components of the computing device 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.
In some implementations, the computing device 100 may be implemented as various user terminals or service terminals. The service terminals may be servers, mainframe computing devices, and the like provided by various service providers. A user terminal may be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile handset, station, unit, device, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, Personal Communication System (PCS) device, personal navigation device, Personal Digital Assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination thereof, including the accessories and peripherals of these devices. It is also contemplated that the computing device 100 can support any type of user interface (such as "wearable" circuitry, etc.).
The processing unit 110 may be a real or virtual processor and can perform various processes according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device 100. The processing unit 110 may also be referred to as a Central Processing Unit (CPU), a microprocessor, a controller, or a microcontroller.
Computing device 100 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing device 100 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof.
Storage device 130 may be a removable or non-removable medium and may include a machine-readable medium, such as memory, a flash drive, a diskette, or any other medium, which may be used to store information and/or data and which may be accessed within computing device 100. The computing device 100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 1A, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces.
The communication unit 140 enables communication with another computing device over a communication medium. Additionally, the functionality of the components of computing device 100 may be implemented in a single computing cluster or multiple computing machines, which are capable of communicating over a communications connection. Thus, the computing device 100 may operate in a networked environment using logical connections to one or more other servers, Personal Computers (PCs), or another general network node.
The input device 150 may be one or more of a variety of input devices, such as a mouse, keyboard, trackball, voice input device, and the like. The output device 160 may be one or more output devices, such as a display, speakers, printer, or the like. Through the communication unit 140, the computing device 100 may also communicate as needed with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device 100, or with any devices (e.g., network cards, modems, etc.) that enable the computing device 100 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
In some implementations, some or all of the various components of computing device 100 may be provided in the form of a cloud computing architecture, in addition to being integrated on a single device. In a cloud computing architecture, these components may be remotely located and may work together to implement the functionality described in this disclosure. In some implementations, cloud computing provides computing, software, data access, and storage services that do not require end users to know the physical location or configuration of the systems or hardware providing these services. In various implementations, cloud computing provides services over a wide area network (such as the internet) using appropriate protocols. For example, cloud computing providers provide applications over a wide area network, and they may be accessed through a web browser or any other computing component. The software or components of the cloud computing architecture and corresponding data may be stored on a server at a remote location. The computing resources in a cloud computing environment may be consolidated at a remote data center location or they may be dispersed. Cloud computing infrastructures can provide services through shared data centers, even though they appear as a single point of access to users. Accordingly, the components and functionality described herein may be provided from a service provider at a remote location using a cloud computing architecture. Alternatively, they may be provided from a conventional server, or they may be installed directly or otherwise on the client device.
Computing device 100 may be used to implement image scaling according to various implementations of the present disclosure. The memory 120 may include an image scaling module 122 having one or more program instructions that may be accessed and executed by the processing unit 110 to implement the functionality of the various implementations described herein.
In performing image scaling, the computing device 100 may receive an input image 170 through the input device 150. In some implementations, the input image 170 may be, for example, an image having a first resolution. The input image 170 may be input to the image scaling module 122 in the memory 120. The image scaling module 122 may generate an output image 180 having a second resolution and high frequency information from a particular distribution based on the input image 170 using the trained reversible neural network, wherein the first resolution is higher than the second resolution and the input image 170 and the output image 180 have the same semantics. In other implementations, the input image 170 may be, for example, an image having a second resolution. The input image 170 may be input to the image scaling module 122 in the memory 120. The image scaling module 122 may generate an output image 180 with a first resolution based on the input image 170 and the high frequency information from the particular distribution using an inverse network of the reversible neural network, wherein the first resolution is higher than the second resolution and the input image 170 and the output image 180 have the same semantics. Output image 180 may be output via output device 160.
In some implementations, the image scaling module 122 may perform image reduction (i.e., convert an HR image to an LR image) using the trained reversible neural network, and may perform the inverse magnification (i.e., reconstruct an HR image from an LR image) using the inverse network of the reversible neural network. FIG. 1B illustrates a schematic diagram of the working principle of the image scaling module 122 according to an implementation of the present disclosure. As shown in FIG. 1B, the image scaling module 122 may utilize a reversible neural network 191 (denoted $f_\theta$) to generate, based on the input image 170 having a high resolution, an output image 180 having a low resolution together with high frequency information 185 obeying a predetermined distribution. For example, the high frequency information 185 may be embodied as high-frequency noise that is unrelated to the semantics of the input image 170. The image scaling module 122 may utilize the inverse network 192 of the reversible neural network 191 (denoted $f_\theta^{-1}$) to generate an output image 180 having a high resolution based on an input image 170 having a low resolution and high frequency information 175 sampled from the predetermined distribution. As used herein, a "predetermined distribution" may include, but is not limited to, a Gaussian distribution, a uniform distribution, and the like, and may be specified during training of the reversible neural network.
A reversible neural network (INN) is a popular network structure for generative models. It specifies a mapping $m = f_\theta(n)$ together with its inverse mapping $n = f_\theta^{-1}(m)$. An INN is typically composed of at least one reversible block. For the $l$-th block, the input $h^l$ is split along the channel axis into $h_1^l$ and $h_2^l$, which undergo the affine transformations:

$$h_1^{l+1} = h_1^l + \phi(h_2^l) \quad (1)$$

$$h_2^{l+1} = h_2^l \odot \exp(\rho(h_1^{l+1})) + \eta(h_1^{l+1}) \quad (2)$$

The corresponding output is $[h_1^{l+1}, h_2^{l+1}]$. Given the output, its inverse transformation can be computed as follows:

$$h_2^l = (h_2^{l+1} - \eta(h_1^{l+1})) \odot \exp(-\rho(h_1^{l+1})) \quad (3)$$

$$h_1^l = h_1^{l+1} - \phi(h_2^l) \quad (4)$$

where $\phi$, $\rho$ and $\eta$ may be arbitrary functions, and $\odot$ denotes the Hadamard (element-wise) product.
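A minimal numpy sketch of one such reversible block may help. The concrete choices of $\phi$, $\rho$ and $\eta$ below (tanh, scaled tanh, sine) are arbitrary stand-ins for the learned sub-networks; invertibility holds for any such functions because the inverse never needs to invert them:

```python
import numpy as np

# Hypothetical stand-ins for the learned sub-network functions.
def phi(t):
    return np.tanh(t)

def rho(t):
    return 0.5 * np.tanh(t)  # kept small so exp() stays well-conditioned

def eta(t):
    return np.sin(t)

def block_forward(h1, h2):
    """Forward coupling: (h1, h2) -> (h1', h2')."""
    h1_new = h1 + phi(h2)
    h2_new = h2 * np.exp(rho(h1_new)) + eta(h1_new)  # * is the Hadamard product
    return h1_new, h2_new

def block_inverse(h1_new, h2_new):
    """Exact inverse, computed from the block's outputs alone."""
    h2 = (h2_new - eta(h1_new)) * np.exp(-rho(h1_new))
    h1 = h1_new - phi(h2)
    return h1, h2

rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
o1, o2 = block_forward(h1, h2)
r1, r2 = block_inverse(o1, o2)
assert np.allclose(r1, h1) and np.allclose(r2, h2)  # lossless round trip
```

Note that invertibility here is structural rather than learned: the round trip is exact regardless of training, which is what allows the forward network and its inverse to share the same parameters.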
When an INN is applied to the image scaling task, based on an input high-resolution image $x$ it can output not only the reduced low-resolution image $y$ but also high frequency information $z$ following a certain distribution, embodied for example as high-frequency noise independent of the semantics of the image. This enables the inverse network of the INN to reconstruct a high-quality high-resolution image $x$ from the low-resolution image $y$ and the noise $z$. That is, the high frequency information $z$ lost in the image reduction process must be kept to make the image scaling process reversible, and the entire image scaling process can be represented by the mappings $(y, z) = f_\theta(x)$ and $x = f_\theta^{-1}(y, z)$.
However, in the image enlargement process it is often necessary to enlarge an arbitrary LR image, so the high frequency information $z$ corresponding to the input LR image is normally unavailable. The inventors have noted that, according to the Nyquist-Shannon sampling theorem, the information lost during image reduction is equivalent to high-frequency detail. Given that the set of HR images corresponding to the same LR image contains different high-frequency details, these details typically exhibit some variability and randomness. Thus, $z$ can be modeled as a random variable, whose distribution captures the manner in which $z$ is output by the INN (i.e., by $f_\theta$). In particular, the INN may be trained so that $z$ satisfies a specified distribution $p(z)$. In this way, the high-frequency noise $z$ output by the reversible neural network need not be preserved during image reduction. Furthermore, during image magnification, a high-resolution image may be reconstructed based on the low-resolution image and any sample drawn from the specified distribution.
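The "discard $z$, resample $z$" workflow can be illustrated with a toy invertible map standing in for the trained $f_\theta$. The orthogonal linear transform below is purely an assumption for this sketch, not the patent's network; it only demonstrates that around an invertible mapping, $z$ can be thrown away at reduction time and redrawn from $p(z)$ at magnification time without disturbing $y$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy invertible map: an orthogonal linear transform (Q from a QR
# factorization), split into a "low-res" part y and a "detail" part z.
W, _ = np.linalg.qr(rng.normal(size=(8, 8)))

def f(x):                 # (y, z) = f_theta(x)
    out = W @ x
    return out[:4], out[4:]

def f_inv(y, z):          # x = f_theta^{-1}(y, z)
    return W.T @ np.concatenate([y, z])

x = rng.normal(size=8)
y, z = f(x)                          # image reduction: keep y, discard z
assert np.allclose(f_inv(y, z), x)   # with the true z, reconstruction is exact

z_sample = rng.standard_normal(4)    # image magnification: draw z ~ N(0, I)
x_rec = f_inv(y, z_sample)           # a valid reconstruction for this y
y_rec, _ = f(x_rec)
assert np.allclose(y_rec, y)         # the resampled z never disturbs y
```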
Fig. 2A illustrates a schematic block diagram of a reversible neural network 191 in accordance with an implementation of the present disclosure. It should be understood that the structure of the reversible neural network 191 as shown in fig. 2A is merely exemplary, and is not intended to limit the scope of the present disclosure. Implementations of the present disclosure are also applicable to reversible neural networks having different structures.
As shown in fig. 2A, the reversible neural network 191 may be formed by one or more downsampling modules 210 connected in series. For simplicity, one downsampling module 210 is shown in fig. 2A. The scale of image reduction supported by the reversible neural network 191 may be determined by the scale of image reduction supported by each downsampling module 210 and the number of downsampling modules 210 included. For example, assuming that each downsampling module 210 supports a 2-fold image reduction and the reversible neural network 191 includes 2 downsampling modules 210, the reversible neural network 191 supports a 4-fold image reduction.
As shown in FIG. 2A, the downsampling module 210 may include a transform module 230 and one or more INN units 220-1, 220-2, ..., 220-M (collectively or individually referred to as "INN units 220", where M ≥ 1).
The transform module 230 may decompose the input image 170 having a high resolution into a low frequency component 242 and a high frequency component 241, where the low frequency component 242 represents the semantics of the input image 170 and the high frequency component 241 is related to those semantics. In some implementations, the transform module 230 may be implemented as a wavelet transform module, such as a Haar transform module. For example, when the transform module 230 is implemented as a Haar transform module, the downsampling module 210 supports a 2x reduction of the image. Specifically, the Haar transform module converts an input image or a set of feature maps having height $H$, width $W$ and $C$ channels into an output tensor of shape $\frac{H}{2} \times \frac{W}{2} \times 4C$. The first $C$ slices of the output tensor approximate a low-pass representation equivalent to bilinear-interpolation downsampling. The remaining three groups of $C$ slices contain the residual components in the vertical, horizontal and diagonal directions, respectively; these residual components are based on the high frequency information in the original HR image. Alternatively, the transform module 230 may be implemented using a 1 × 1 reversible convolution block, or as any known or future-developed transform module capable of decomposing the input image 170 into low and high frequency components. It should be appreciated that the implementation of the transform module 230 may differ when the image reduction scale supported by the downsampling module 210 differs. The low frequency component 242 and the high frequency component 241 are then fed to the first INN unit 220-1.
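As a sketch of the decomposition just described, under the assumption of a simple averaging (unnormalized) Haar variant, the transform and its exact inverse can be written as:

```python
import numpy as np

def haar_downsample(img):
    """2x Haar-style transform: (H, W, C) -> (H/2, W/2, 4C).

    The first C channels are the low-pass 2x2 average (a bilinear-like 2x
    downsampling); the remaining 3C channels hold the vertical, horizontal
    and diagonal residuals.
    """
    a = img[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    low  = (a + b + c + d) / 4   # low frequency component
    vert = (a + b - c - d) / 4   # vertical detail
    horz = (a - b + c - d) / 4   # horizontal detail
    diag = (a - b - c + d) / 4   # diagonal detail
    return np.concatenate([low, vert, horz, diag], axis=-1)

def haar_upsample(t):
    """Exact inverse of haar_downsample: no information is lost."""
    C = t.shape[-1] // 4
    low, vert, horz, diag = np.split(t, 4, axis=-1)
    a = low + vert + horz + diag
    b = low + vert - horz - diag
    c = low - vert + horz - diag
    d = low - vert - horz + diag
    out = np.empty((t.shape[0] * 2, t.shape[1] * 2, C))
    out[0::2, 0::2], out[0::2, 1::2] = a, b
    out[1::2, 0::2], out[1::2, 1::2] = c, d
    return out

img = np.random.default_rng(2).random((8, 8, 3))
t = haar_downsample(img)
assert t.shape == (4, 4, 12)               # H/2 x W/2 x 4C
assert np.allclose(haar_upsample(t), img)  # the transform is lossless
```

The key property for the INN is the last assertion: unlike a plain low-pass downsampler, this decomposition is invertible, so no information is lost before the coupling units.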
As described above, the structure of each INN unit 220 should be reversible, thereby ensuring that the network structure of the neural network 191 is reversible. The INN unit 220 is configured to extract corresponding features from the input low frequency component and high frequency component, and to convert the high frequency component, which is related to the image semantics, into high frequency information that obeys a predetermined distribution and is unrelated to the image semantics.
Fig. 2B shows a schematic diagram of an example INN unit 220 in accordance with an implementation of the present disclosure. It is assumed herein that the low frequency component and the high frequency component input to the INN unit 220 are represented as h₁ and h₂, respectively. As shown in fig. 2B, the affine transformation shown in the above formula (1) may be applied to the low frequency component h₁, and the affine transformation shown in the above formula (2) may be applied to the high frequency component h₂. The transformation functions φ, η and ρ shown in fig. 2B may be arbitrary functions. It should be understood that the INN unit 220 shown in fig. 2B is provided for purposes of example only and is not intended to limit the scope of the present disclosure. Implementations of the present disclosure are also applicable to INN units having other structures. Examples of INN units include, but are not limited to, invertible convolution blocks, invertible residual network units, invertible generative network units, deep invertible network units, and the like.
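As a hedged illustration of why such a coupling structure is exactly invertible, the following NumPy sketch applies an additive update to the low frequency branch and a scale-and-shift update to the high frequency branch, then undoes both. The functions `phi`, `eta` and `rho` are fixed toy functions standing in for the arbitrary transformations φ, η and ρ; in the network they would be learned sub-modules.

```python
import numpy as np

# phi, eta, rho stand in for arbitrary learned transformations; here they
# are fixed toy functions so that the round trip can be checked numerically.
phi = lambda t: np.tanh(t)
eta = lambda t: 0.5 * t
rho = lambda t: 0.1 * np.tanh(t)   # bounded output keeps exp() stable

def coupling_forward(h1, h2):
    """One invertible coupling step on (low-frequency, high-frequency) inputs."""
    h1_out = h1 + phi(h2)                              # additive update of low-freq branch
    h2_out = h2 * np.exp(rho(h1_out)) + eta(h1_out)    # scale-and-shift of high-freq branch
    return h1_out, h2_out

def coupling_inverse(h1_out, h2_out):
    """Exact inverse of coupling_forward: no information is lost."""
    h2 = (h2_out - eta(h1_out)) * np.exp(-rho(h1_out))
    h1 = h1_out - phi(h2)
    return h1, h2
```

The inverse recovers the inputs bit-for-bit up to floating-point error regardless of what φ, η and ρ compute, which is what allows these functions to be arbitrary.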
Fig. 3A shows a schematic block diagram of an inverse network 192 of the reversible neural network 191 shown in fig. 2A. As shown in fig. 3A, the network 192 may be formed of one or more upsampling modules 310 connected in series. For simplicity, one upsampling module 310 is shown in fig. 3A. The proportion of image magnification supported by the inverse network 192 may be determined by the proportion of image magnification supported by each upsampling module 310 and the number of upsampling modules 310 included. For example, assuming that each upsampling module 310 supports a 2 x magnification of the image and the inverse network 192 includes 2 upsampling modules 310, the inverse network 192 supports a 4 x magnification of the image.
As shown in FIG. 3A, for example, the upsampling module 310 may include a transform module 330 and one or more INN units 320-1, 320-2, ..., 320-M (collectively or individually referred to as "INN units 320", where M ≥ 1). The structure of the INN unit 320 is the inverse of the structure of the INN unit 220 shown in fig. 2B, for example, as shown in fig. 3B. Taking INN unit 320-M as an example, it is assumed herein that the input image 170 with low resolution that is input to INN unit 320-M is represented as h₁, and the high frequency information 175 obeying the predetermined distribution is represented as h₂. As shown in fig. 3B, the inverse of the affine transformation shown in the above formula (3) may be applied to h₁, and the inverse of the affine transformation shown in the above formula (4) may be applied to h₂. The transformation functions φ, η and ρ shown in fig. 3B may be arbitrary functions. It should be understood that the INN unit 320 shown in fig. 3B is provided for exemplary purposes only and is not intended to limit the scope of the present disclosure. Implementations of the present disclosure are also applicable to INN units having other structures. Examples of INN units include, but are not limited to, invertible convolution blocks, invertible residual network units, invertible generative network units, deep invertible network units, and the like.
As shown in fig. 3A, the one or more INN units 320 may convert the input image 170 having a low resolution and the high frequency information 175 from a predetermined distribution into high frequency components 341 and low frequency components 342 to be combined. In contrast to the transformation module 230 as shown in fig. 2A, the transformation module 330 may combine the high frequency components 341 and the low frequency components 342 into the output image 180 having a high resolution. In some implementations, when transform module 230 is implemented as a wavelet transform module, transform module 330 may be implemented as an inverse wavelet transform module. For example, when the transform module 230 is implemented as a Haar transform module, the transform module 330 may be implemented as an inverse Haar transform module. Alternatively, transform module 330 may also be implemented using a 1 x 1 reversible convolution block, or as any known or future developed transform module capable of merging low and high frequency components into an image.
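A matching sketch of the inverse transform module: it redistributes the four sub-bands back into 2×2 pixel blocks. The normalization here mirrors the orthonormal forward Haar transform assumed earlier for the transform module 230; when the three high frequency sub-bands are zero, the result is a constant block expansion of the low-pass band.

```python
import numpy as np

def haar_upsample(y):
    """Inverse of a single-level Haar transform: (H/2, W/2, 4C) -> (H, W, C).

    Assumes channel order (low-pass, vertical, horizontal, diagonal)
    and the orthonormal 1/2 normalization, both illustrative choices.
    """
    c = y.shape[2] // 4
    ll = y[:, :, 0 * c:1 * c]
    hl = y[:, :, 1 * c:2 * c]
    lh = y[:, :, 2 * c:3 * c]
    hh = y[:, :, 3 * c:4 * c]
    h2, w2 = y.shape[0], y.shape[1]
    x = np.empty((h2 * 2, w2 * 2, c), dtype=y.dtype)
    x[0::2, 0::2, :] = (ll + hl + lh + hh) / 2.0   # top-left of each 2x2 block
    x[0::2, 1::2, :] = (ll + hl - lh - hh) / 2.0   # top-right
    x[1::2, 0::2, :] = (ll - hl + lh - hh) / 2.0   # bottom-left
    x[1::2, 1::2, :] = (ll - hl - lh + hh) / 2.0   # bottom-right
    return x
```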
The training process of the reversible neural network will be described in further detail below. Hereinafter, for the sake of simplicity, the neural network to be trained and its inverse network are collectively referred to as the "model". As can be seen from the above description, the training goal of the model is to determine the mapping f_θ between the high resolution image x, the low resolution image y, and the specified distribution p(z).
To achieve this training goal, in some implementations, a set of high resolution images {x^(n)}, n = 1, ..., N (also referred to as a "first set of training images", where N represents the number of images) and a set of low resolution images semantically corresponding thereto (also referred to as a "second set of training images") may be acquired as training data for training the model. In some implementations, the second set of training images with low resolution may be generated based on the first set of training images with high resolution. For example, interpolation, or any known or to-be-developed method, may be utilized to generate, from a high resolution training image, a low resolution training image that semantically corresponds to it. The scope of the present disclosure is not limited in this regard. In some implementations, an objective function for training the model may be generated based on the first and second sets of training images. The parameters of the model may then be determined by minimizing the objective function.
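As a sketch of how the second training set might be derived from the first, the following uses simple 2×2 averaging. The patent text only requires some interpolation-like resampling (e.g. bicubic), so the box filter here is an illustrative assumption, not the prescribed method.

```python
import numpy as np

def downscale_by_2(hr):
    """Generate a semantically matching low-resolution training image
    by 2x2 block averaging. Assumes an (H, W, C) array with even H, W.
    """
    h, w, c = hr.shape
    # Split each spatial axis into (blocks, 2) and average within blocks.
    return hr.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
```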
In some implementations, an objective function for training the model may be determined based on a difference between the low resolution training image and the low resolution image generated by the model based on the high resolution training image. For example, for a high resolution training image x^(n) in the first set of training images, suppose that the low resolution image generated by the model based on x^(n) is represented as f_θ^y(x^(n)), and the low resolution training image in the second set of training images corresponding to x^(n) is represented as y_guide^(n). An objective function (also referred to as the "first objective function" or "low resolution guidance loss function") for training the reversible neural network may be generated based on the difference between the low resolution training image y_guide^(n) and the model-generated low resolution image f_θ^y(x^(n)). For example, the first objective function may be expressed as:
L_guide(θ) := Σ_{n=1}^{N} ℓ_Y(y_guide^(n), f_θ^y(x^(n)))
where ℓ_Y represents a difference metric function, e.g. the L1 loss function or the L2 loss function.
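A minimal sketch of this guidance loss, assuming the L1 or L2 pixel metric mentioned above; the function name and signature are illustrative.

```python
import numpy as np

def guide_loss(lr_generated, lr_guide, metric="l1"):
    """Low-resolution guidance loss: difference between the model's
    low-resolution output and the interpolation-generated training image."""
    diff = lr_generated - lr_guide
    if metric == "l1":
        return np.abs(diff).mean()
    return (diff ** 2).mean()   # "l2"
```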
Additionally or alternatively, in some implementations, the objective function for training the model may be determined based on a difference between a high resolution training image and the high resolution image reconstructed by the model based on the low resolution image. For example, for a high resolution training image x^(n) in the first set of training images, suppose that the low resolution image generated by the model based on x^(n) is represented as f_θ^y(x^(n)), and the high resolution image reconstructed by the model based on this low resolution image is represented as f_θ^{-1}(f_θ^y(x^(n)), z), where z obeys the predetermined distribution p(z) (i.e., z ~ p(z)). An objective function (also referred to as the "second objective function" or "high resolution reconstruction loss function") for training the reversible neural network may be generated based on the difference between the high resolution training image x^(n) and the high resolution reconstructed image f_θ^{-1}(f_θ^y(x^(n)), z). For example, the second objective function may be expressed as:
L_recon(θ) := Σ_{n=1}^{N} E_{z~p(z)}[ℓ_X(x^(n), f_θ^{-1}(f_θ^y(x^(n)), z))]
where ℓ_X measures the difference between the high resolution original image and the reconstructed image, and E_{z~p(z)}[·] denotes the expectation over z obeying the predetermined distribution p(z).
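The expectation over z in the reconstruction loss can be approximated by Monte Carlo sampling, as in the following sketch. Here `inverse_fn` and `p_sample` are hypothetical stand-ins for the inverse network and a sampler of the predetermined distribution; neither name comes from the patent.

```python
import numpy as np

def recon_loss(x, inverse_fn, y, p_sample, num_samples=8, rng=None):
    """Monte Carlo estimate of the reconstruction loss: the expected L1
    difference between the original HR image x and reconstructions
    inverse_fn(y, z), averaged over samples z drawn via p_sample."""
    rng = rng or np.random.default_rng(0)
    losses = [np.abs(x - inverse_fn(y, p_sample(rng))).mean()
              for _ in range(num_samples)]
    return float(np.mean(losses))
```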
Additionally or alternatively, another goal of model training is to encourage the model to capture the data distribution of the high resolution training images. It is assumed here that the data distribution over the first set of training data {x^(n)} is denoted q(x). For example, for a high resolution training image x^(n) in the first set of training images, the high resolution image reconstructed by the model is represented as x̂^(n) = f_θ^{-1}(ŷ^(n), z), where ŷ^(n) = f_θ^y(x^(n)) represents the low resolution image obtained after the model reduces the high resolution training image x^(n), and z represents a random variable obeying the predetermined distribution p(z). By traversing the first set of training data {x^(n)}, a reduced set of low resolution images {ŷ^(n)} may be obtained, whose data distribution may be expressed as f_θ^y#[q(x)], i.e., the data distribution of the transformed random variable f_θ^y(x), where the original random variable x obeys the data distribution q(x) (x ~ q(x)). Similarly, the data distribution of the high resolution images reconstructed by the model may be expressed as f_θ^{-1}#[f_θ^y#[q(x)], p(z)]. In some implementations, an objective function (also referred to as a "third objective function" or "distribution matching loss function") for training the reversible neural network may be generated based on the difference between the original data distribution q(x) and the model-reconstructed data distribution f_θ^{-1}#[f_θ^y#[q(x)], p(z)]. For example, the third objective function may be expressed as:
L_distr(θ) := L_P(f_θ^{-1}#[f_θ^y#[q(x)], p(z)], q(x)) (6)
where L_P is used to measure the difference between two data distributions.
In some cases, it may be difficult to directly minimize the third objective function as shown in equation (6), since both distributions are high-dimensional and may have unknown density functions. In some implementations, the JS divergence can be utilized to measure the difference between the two data distributions. That is, the third objective function may also be expressed as:
L_distr(θ) := JS(f_θ^{-1}#[f_θ^y#[q(x)], p(z)], q(x)) (7)
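For intuition, the JS divergence has a closed form when both distributions are explicit discrete probability vectors, as sketched below. Note this is only illustrative: the image distributions in equation (6) are high-dimensional with unknown densities, which is why in practice the divergence would be estimated (e.g. adversarially) rather than computed directly.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    It is symmetric and bounded by log(2), unlike the KL divergence.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```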
in some implementations, an overall objective function for training the model may be generated based on a combination of the first objective function, the second objective function, and the third objective function. For example, the overall objective function may be expressed as:
L_total := λ1·L_recon + λ2·L_guide + λ3·L_distr (8)
where λ1, λ2 and λ3 are coefficients used to balance the different loss terms.
In some implementations, to improve the stability of model training, a pre-training process may be performed before the model is trained using the overall objective function as shown in equation (8). During pre-training, a weakened but more stable distribution matching loss function may be used. For example, the distribution matching loss function may be constructed based on a cross-entropy loss function to improve the stability of model training. The distribution matching loss function (also referred to as the "fourth objective function") constructed based on the cross-entropy (CE) loss function may be expressed as:
L′_distr(θ) := CE(f_θ^z#[q(x)], p(z)) = −E_{x~q(x)}[log p(z = f_θ^z(x))] (9)
where CE represents the cross entropy loss function and f_θ^z(x) represents the high frequency information output by the model for the input x. Accordingly, the overall objective function used in the pre-training process may be expressed as:
L_IRN := λ1·L_recon + λ2·L_guide + λ3·L′_distr (10)
where λ1, λ2 and λ3 are coefficients used to balance the different loss terms.
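Assuming the predetermined distribution p(z) is a standard normal (a common choice, though not fixed by this passage), the cross-entropy term in equation (9) reduces to the negative mean log-density of the latent outputs, as in this sketch:

```python
import numpy as np

def gaussian_ce_loss(z):
    """Cross-entropy surrogate for distribution matching: the negative mean
    log-density of the network's latent outputs z under a standard normal
    prior p(z). Minimizing it pushes the high-frequency latent toward the
    predetermined distribution."""
    z = np.asarray(z, dtype=float)
    log_p = -0.5 * (z ** 2 + np.log(2.0 * np.pi))   # log N(z; 0, 1), elementwise
    return -log_p.mean()
```

The loss is minimized when the latent samples concentrate where the prior density is highest, and grows as they drift away from it.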
In some implementations, after pre-training, the model may be re-trained with a second round of training using the overall objective function as shown in equation (8). Alternatively, in some implementations, after pre-training, a second round of training of the model may be performed with an overall objective function as shown in equation (11) below:
L_IRN+ := λ1·L_recon + λ2·L_guide + λ3·L_distr + λ4·L_percp (11)
where the perceptual loss function L_percp measures the difference in semantic features between the original high resolution image and the reconstructed high resolution image. For example, the semantic features of both images may be extracted by other known reference models, which are not described in detail herein. λ1, λ2, λ3 and λ4 are coefficients used to balance the different loss terms.
FIG. 4 illustrates a flow diagram of a method 400 for image scaling according to some implementations of the present disclosure. The method 400 may be implemented by the computing device 100, for example, at the image scaling module 122 in the memory 120 of the computing device 100. At block 410, the computing device 100 acquires an input image having a first resolution. At block 420, the computing device 100 generates, using the trained reversible neural network, an output image having a second resolution and high frequency information subject to a predetermined distribution based on the input image, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image and the high frequency information based on the input image includes: decomposing, with the transform module, the input image into a low frequency component representing semantics of the input image and a high frequency component related to the semantics; and generating, with the at least one reversible network element, the output image and the high frequency information unrelated to the semantics based on the low frequency component and the high frequency component.
In some implementations, the transformation module includes any one of: a wavelet transform module; and a reversible convolution block.
In some implementations, the method 400 further includes: training the reversible neural network such that the trained reversible neural network is capable of generating a second image having the second resolution and first high frequency information obeying the predetermined distribution based on the first image of the first resolution, and the trained inverse network of the reversible neural network is capable of generating a fourth image having the first resolution based on the third image of the second resolution and second high frequency information obeying the predetermined distribution.
In some implementations, training the reversible neural network includes: acquiring a first set of training images having the first resolution; acquiring a second set of training images respectively corresponding to semantics of the first set of training images and having the second resolution; and training the reversible neural network based on the first and second sets of training images.
In some implementations, acquiring the second set of training images includes: generating the second set of training images based on the first set of training images using interpolation.
In some implementations, training the reversible neural network includes: determining a plurality of objective functions based on the first and second sets of training images; determining a total objective function for training the reversible neural network by combining at least a portion of the plurality of objective functions; and determining network parameters of the reversible neural network by minimizing the overall objective function.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; and determining a first objective function based on a difference between the second set of training images and the third set of training images.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; generating, with the inverse network, a fourth set of training images having the first resolution based on the third set of training images and high frequency information that obeys the predetermined distribution; and determining a second objective function based on a difference between the first set of training images and the fourth set of training images.
In some implementations, determining the plurality of objective functions includes: determining a first data distribution of the first set of training images; determining a second data distribution for the fourth set of training images; and determining a third objective function based on a difference between the first data distribution and the second data distribution.
In some implementations, determining the plurality of objective functions includes: determining a third data distribution of the set of random variables; and determining a fourth objective function based on a difference between the third data distribution and the predetermined distribution.
FIG. 5 illustrates a flow diagram of a method 500 for image scaling according to some implementations of the present disclosure. The method 500 may be implemented by the computing device 100, for example, at the image scaling module 122 in the memory 120 of the computing device 100. At block 510, computing device 100 acquires an input image having a second resolution. At block 520, the computing device 100 generates an output image with a first resolution based on the input image and the high frequency information from the predetermined distribution using the trained reversible neural network, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image based on the input image and the high frequency information includes: generating, with the at least one reversible network element, a low frequency component and a high frequency component to be merged based on the input image and the high frequency information, the low frequency component representing semantics of the input image and the high frequency component relating to the semantics; and combining, with the transform module, the low frequency component and the high frequency component into the output image.
In some implementations, the transformation module includes any one of: an inverse wavelet transform module; and a reversible convolution block.
As can be seen from the above description, implementations of the present disclosure propose a scheme for image scaling. When performing image reduction, the reversible neural network converts the HR image into an LR image and high frequency noise obeying a specific distribution. When performing image magnification, the inverse network of the reversible neural network converts the LR image and random noise obeying that specific distribution into an HR image. Because the reversible neural network is utilized to model the image reduction and image enlargement processes jointly, the scheme can reduce the original image into a visually pleasing low resolution image, and can reconstruct a high quality high resolution image from the low resolution image, thereby greatly mitigating the ill-posedness of the image enlargement process. Furthermore, various experimental data indicate that implementations of the present disclosure achieve better image reconstruction performance indicators, such as higher peak signal-to-noise ratio (PSNR) and/or structural similarity (SSIM), than conventional image scaling schemes.
Implementations of the present disclosure can be widely applied in the field of image and/or video processing. For example, online video streaming plays a crucial role in our lives, such as video websites, live websites, video streaming mobile applications, and so on. High quality online video streams are desirable, such as high resolution video with rich perceptual details. High-resolution video tends to occupy a large amount of network bandwidth for transmission. Therefore, to conserve network bandwidth, high-resolution video is typically processed and compressed when it is transmitted to the user client. This will result in low resolution video, often of poor quality, being presented at the user client. The above problems can be solved by applying the image scaling scheme according to the implementation of the present disclosure.
Fig. 6 illustrates a block diagram of an example system 600 capable of implementing implementations of the present disclosure. As shown in fig. 6, system 600 may include a video streaming service provider 610, a server 620, and a client device 630. For example, a video streaming service provider 610 may provide video data requested by a client device 630 to a server 620, and the server 620 may transmit the video data from the video streaming service provider 610 to the client device 630 via a network.
As shown in fig. 6, in some implementations, a video streaming service provider 610 may provide a high-resolution video stream 601, also referred to as a high-resolution image sequence 601, to a server 620. The server 620 may convert the high resolution image sequence 601 into a low resolution image sequence using the reversible neural network 191 described above. In some implementations, the server 620 may send the resulting sequence of low-resolution images directly to the client device 630 as the low-resolution video stream 602. In this case, the client device 630 may receive the sequence of low resolution images. Additionally or alternatively, in some implementations, the server 620 may video-encode the sequence of low resolution images to generate an encoded low resolution video stream 602, and send the encoded low resolution video 602 to the client device 630 via the network. In this case, the client device 630 may decode the received encoded low resolution video 602 to obtain a decoded sequence of low resolution images. The client device 630 may then reconstruct the resulting sequence of low resolution images into a high resolution video stream 603 using the inverse network 192 of the reversible neural network 191 described above. In this way, the client can obtain a high quality video stream while conserving network bandwidth.
Implementations of the present disclosure can be applied to the field of image and/or video storage in addition to the field of image and/or video processing. For example, when storing high resolution images and/or videos in a storage device, the reversible neural network 191 described above may be utilized to convert the high resolution images and/or videos into low resolution images and/or videos and corresponding high frequency information subject to a predetermined distribution. The resulting low resolution images and/or videos may then be stored in a storage device, while the resulting corresponding high frequency information is discarded. When the images and/or videos stored in the storage device are to be accessed, the images and/or videos of low resolution may be first acquired from the storage device, and then the images and/or videos of high resolution may be reconstructed based on the acquired images and/or videos and random noise subject to the predetermined distribution using the inverse network 192 of the reversible neural network 191 described above. In this way, storage space for storing images and/or video can be saved without losing image and/or video quality.
Some example implementations of the present disclosure are listed below.
In a first aspect, the present disclosure provides a computer-implemented method. The method comprises the following steps: acquiring an input image having a first resolution; and generating, with the trained reversible neural network, an output image with a second resolution and high frequency information subject to a predetermined distribution based on the input image, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some embodiments, the method further comprises: storing the output image without storing the high frequency information.
In some embodiments, the method further comprises: encoding the output image; and providing the encoded output image.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image and the high frequency information based on the input image includes: decomposing, with the transform module, the input image into a low frequency component representing semantics of the input image and a high frequency component related to the semantics; and generating, with the at least one reversible network element, the output image and the high frequency information unrelated to the semantics based on the low frequency component and the high frequency component.
In some implementations, the transformation module includes any one of: a wavelet transform module; and a reversible convolution block.
In some implementations, the method further comprises: training the reversible neural network such that the trained reversible neural network is capable of generating a second image having the second resolution and first high frequency information obeying the predetermined distribution based on the first image of the first resolution, and the trained inverse network of the reversible neural network is capable of generating a fourth image having the first resolution based on the third image of the second resolution and second high frequency information obeying the predetermined distribution.
In some implementations, training the reversible neural network includes: acquiring a first set of training images having the first resolution; acquiring a second set of training images respectively corresponding to semantics of the first set of training images and having the second resolution; and training the reversible neural network based on the first and second sets of training images.
In some implementations, acquiring the second set of training images includes: generating the second set of training images based on the first set of training images using interpolation.
In some implementations, training the reversible neural network includes: determining a plurality of objective functions based on the first and second sets of training images; determining a total objective function for training the reversible neural network by combining at least a portion of the plurality of objective functions; and determining network parameters of the reversible neural network by minimizing the overall objective function.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; and determining a first objective function based on a difference between the second set of training images and the third set of training images.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; generating, with the inverse network, a fourth set of training images having the first resolution based on the third set of training images and high frequency information that obeys the predetermined distribution; and determining a second objective function based on a difference between the first set of training images and the fourth set of training images.
In some implementations, determining the plurality of objective functions includes: determining a first data distribution of the first set of training images; determining a second data distribution for the fourth set of training images; and determining a third objective function based on a difference between the first data distribution and the second data distribution.
In some implementations, determining the plurality of objective functions includes: determining a third data distribution of the set of random variables; and determining a fourth objective function based on a difference between the third data distribution and the predetermined distribution.
In a second aspect, the present disclosure provides a computer-implemented method. The method comprises the following steps: acquiring an input image with a second resolution; and generating an output image with a first resolution based on the input image and high frequency information from a predetermined distribution using a trained reversible neural network, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some implementations, acquiring the input image includes: obtaining the encoded input image; and decoding the encoded input image.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image based on the input image and the high frequency information includes: generating, with the at least one reversible network element, a low frequency component and a high frequency component to be merged based on the input image and the high frequency information, the low frequency component representing semantics of the input image and the high frequency component relating to the semantics; and combining, with the transform module, the low frequency component and the high frequency component into the output image.
In some implementations, the transformation module includes any one of: an inverse wavelet transform module; and a reversible convolution block.
In a third aspect, the present disclosure provides an electronic device. The electronic device includes: a processing unit; and a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the apparatus to perform acts comprising: acquiring an input image having a first resolution; and generating, with the trained reversible neural network, an output image with a second resolution and high frequency information subject to a predetermined distribution based on the input image, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some implementations, the actions further include: storing the output image without storing the high frequency information.
In some implementations, the actions further include: encoding the output image; and providing the encoded output image.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image and the high frequency information based on the input image includes: decomposing, with the transform module, the input image into a low frequency component representing semantics of the input image and a high frequency component related to the semantics; and generating, with the at least one reversible network element, the output image and the high frequency information unrelated to the semantics based on the low frequency component and the high frequency component.
In some implementations, the transformation module includes any one of: a wavelet transform module; and a reversible convolution block.
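As a concrete, non-limiting sketch of the wavelet option, a single-level 2-D Haar transform splits an image into a half-resolution low-frequency band and three high-frequency detail bands, and is exactly invertible; the function names below are illustrative, not from the disclosure:

```python
import numpy as np

def haar_decompose(image):
    """Single-level 2-D Haar transform: splits a 2-D array (with even
    height and width) into one low-frequency band (LL) and three
    high-frequency detail bands (LH, HL, HH)."""
    a = image[0::2, 0::2]  # top-left of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low frequency: half-resolution content
    lh = (a + b - c - d) / 2.0  # horizontal detail
    hl = (a - b + c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

def haar_reconstruct(ll, highs):
    """Exact inverse of haar_decompose (the transform is orthogonal)."""
    lh, hl, hh = highs
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out
```

The low-frequency band carries the half-resolution content of the image, while the detail bands carry the information the downscaled image must discard, which is why the transform is a natural first stage of the reversible network.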
In some implementations, the actions further include: training the reversible neural network such that the trained reversible neural network is capable of generating a second image having the second resolution and first high frequency information obeying the predetermined distribution based on the first image of the first resolution, and the trained inverse network of the reversible neural network is capable of generating a fourth image having the first resolution based on the third image of the second resolution and second high frequency information obeying the predetermined distribution.
In some implementations, training the reversible neural network includes: acquiring a first set of training images having the first resolution; acquiring a second set of training images respectively corresponding to semantics of the first set of training images and having the second resolution; and training the reversible neural network based on the first and second sets of training images.
In some implementations, acquiring the second set of training images includes: generating the second set of training images based on the first set of training images using interpolation.
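A minimal sketch of deriving the second (low-resolution) training set from the first, assuming a 2-D single-channel image; block averaging stands in here for the interpolation step, whereas a production pipeline would more likely use bicubic interpolation via an imaging library:

```python
import numpy as np

def make_lr_target(hr_image, factor=2):
    """Derive a low-resolution training target with the same semantics
    as the high-resolution image. Block averaging is a minimal stand-in
    for interpolation-based downscaling."""
    h, w = hr_image.shape
    assert h % factor == 0 and w % factor == 0, "dimensions must divide"
    return hr_image.reshape(h // factor, factor,
                            w // factor, factor).mean(axis=(1, 3))
```

Each high-resolution training image then pairs with its downscaled counterpart, giving the guidance targets used by the objective functions below.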
In some implementations, training the reversible neural network includes: determining a plurality of objective functions based on the first and second sets of training images; determining a total objective function for training the reversible neural network by combining at least a portion of the plurality of objective functions; and determining network parameters of the reversible neural network by minimizing the overall objective function.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; and determining a first objective function based on a difference between the second set of training images and the third set of training images.
In some implementations, determining the plurality of objective functions includes: generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; generating, with the inverse network, a fourth set of training images having the first resolution based on the third set of training images and high frequency information that obeys the predetermined distribution; and determining a second objective function based on a difference between the first set of training images and the fourth set of training images.
In some implementations, determining the plurality of objective functions includes: determining a first data distribution of the first set of training images; determining a second data distribution for the fourth set of training images; and determining a third objective function based on a difference between the first data distribution and the second data distribution.
In some implementations, determining the plurality of objective functions includes: determining a third data distribution of the set of random variables; and determining a fourth objective function based on a difference between the third data distribution and the predetermined distribution.
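The combination of the four objective functions into a total objective can be sketched as follows. The weights and the use of an L1 distance are illustrative assumptions, and the two distribution gaps are passed in as precomputed scalars (in practice they might come from an adversarial critic or a likelihood term):

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two image batches."""
    return np.mean(np.abs(a - b))

def total_objective(lr_guidance, lr_pred, hr_orig, hr_recon,
                    dist_gap_hr, dist_gap_z,
                    w_guide=1.0, w_recon=1.0, w_hr=0.1, w_z=0.1):
    """Weighted sum of the four objectives described above:
    (1) guidance loss between generated low-resolution images and the
        interpolation-based targets,
    (2) reconstruction loss between original high-resolution images and
        those rebuilt by the inverse network,
    (3) a distribution-matching term on the high-resolution images, and
    (4) a term pushing the random variables toward the predetermined
        (e.g. Gaussian) distribution."""
    return (w_guide * l1(lr_guidance, lr_pred)
            + w_recon * l1(hr_orig, hr_recon)
            + w_hr * dist_gap_hr
            + w_z * dist_gap_z)
```

Network parameters would then be chosen by minimizing this total objective, exactly as the summary describes; omitting a term corresponds to combining only "at least a portion" of the plurality of objective functions.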
In a fourth aspect, the present disclosure provides an electronic device. The electronic device includes: a processing unit; and a memory coupled to the processing unit and having instructions stored thereon that, when executed by the processing unit, cause the device to perform acts comprising: acquiring an input image having a second resolution; and generating, with a trained reversible neural network, an output image having a first resolution based on the input image and high frequency information from a predetermined distribution, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
In some implementations, acquiring the input image includes: obtaining the encoded input image; and decoding the encoded input image.
In some implementations, the reversible neural network includes a transformation module and at least one reversible network element, and generating the output image based on the input image and the high frequency information includes: generating, with the at least one reversible network element, a low frequency component and a high frequency component to be merged based on the input image and the high frequency information, the low frequency component representing semantics of the input image and the high frequency component relating to the semantics; and combining, with the transform module, the low frequency component and the high frequency component into the output image.
In some implementations, the transformation module includes any one of: a wavelet inverse transformation module; and a reversible convolution block.
In a fifth aspect, the present disclosure provides a computer program product tangibly stored on a non-transitory computer storage medium and comprising machine-executable instructions that, when executed by a device, cause the device to perform a method according to the first or second aspect described above.
In a further aspect, the present disclosure provides a computer-readable medium having stored thereon machine-executable instructions which, when executed by a device, cause the device to perform a method according to the first or second aspect described above.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. A computer-implemented method, comprising:
acquiring an input image having a first resolution; and
generating, with a trained reversible neural network, an output image having a second resolution and high frequency information subject to a predetermined distribution based on the input image, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
2. The method of claim 1, wherein the reversible neural network comprises a transformation module and at least one reversible network element, and generating the output image and the high frequency information based on the input image comprises:
decomposing, with the transform module, the input image into a low frequency component representing semantics of the input image and a high frequency component related to the semantics; and
generating, with the at least one reversible network element, the output image and the high frequency information unrelated to the semantics based on the low frequency component and the high frequency component.
3. The method of claim 2, wherein the transformation module comprises any one of:
a wavelet transform module; and
a reversible convolution block.
4. The method of claim 1, further comprising:
training the reversible neural network such that:
the trained reversible neural network is capable of generating a second image having the second resolution and first high-frequency information obeying the predetermined distribution based on the first image of the first resolution, and
the trained inverse network of the reversible neural network is capable of generating a fourth image having the first resolution based on the third image of the second resolution and second high frequency information obeying the predetermined distribution.
5. The method of claim 4, wherein training the reversible neural network comprises:
acquiring a first set of training images having the first resolution;
acquiring a second set of training images respectively corresponding to semantics of the first set of training images and having the second resolution; and
training the reversible neural network based on the first and second sets of training images.
6. The method of claim 5, wherein acquiring the second set of training images comprises:
generating the second set of training images based on the first set of training images using interpolation.
7. The method of claim 5, wherein training the reversible neural network comprises:
determining a plurality of objective functions based on the first and second sets of training images;
determining a total objective function for training the reversible neural network by combining at least a portion of the plurality of objective functions; and
determining network parameters of the reversible neural network by minimizing the overall objective function.
8. The method of claim 7, wherein determining the plurality of objective functions comprises:
generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images; and
determining a first objective function based on differences between the second set of training images and the third set of training images.
9. The method of claim 7, wherein determining the plurality of objective functions comprises:
generating, with the reversible neural network, a third set of training images having the second resolution and a set of random variables based on the first set of training images;
generating, with the inverse network, a fourth set of training images having the first resolution based on the third set of training images and high frequency information that obeys the predetermined distribution; and
determining a second objective function based on a difference between the first set of training images and the fourth set of training images.
10. The method of claim 9, wherein determining the plurality of objective functions comprises:
determining a first data distribution of the first set of training images;
determining a second data distribution for the fourth set of training images; and
determining a third objective function based on a difference between the first data distribution and the second data distribution.
11. The method of claim 9, wherein determining the plurality of objective functions comprises:
determining a third data distribution of the set of random variables; and
determining a fourth objective function based on a difference between the third data distribution and the predetermined distribution.
12. A computer-implemented method, comprising:
acquiring an input image with a second resolution; and
generating, with a trained reversible neural network, an output image having a first resolution based on the input image and high frequency information from a predetermined distribution, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
13. The method of claim 12, wherein the reversible neural network comprises a transformation module and at least one reversible network element, and generating the output image based on the input image and the high frequency information comprises:
generating, with the at least one reversible network element, a low frequency component and a high frequency component to be merged based on the input image and the high frequency information, the low frequency component representing semantics of the input image and the high frequency component relating to the semantics; and
combining, with the transform module, the low frequency component and the high frequency component into the output image.
14. The method of claim 13, wherein the transformation module comprises any one of:
a wavelet inverse transformation module; and
a reversible convolution block.
15. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the apparatus to perform acts comprising:
acquiring an input image having a first resolution; and
generating, with a trained reversible neural network, an output image having a second resolution and high frequency information subject to a predetermined distribution based on the input image, wherein the first resolution is higher than the second resolution and the input image and the output image have the same semantics.
16. The apparatus of claim 15, wherein the reversible neural network comprises a transformation module and at least one reversible network element, and generating the output image and the high frequency information based on the input image comprises:
decomposing, with the transform module, the input image into a low frequency component representing semantics of the input image and a high frequency component related to the semantics; and
generating, with the at least one reversible network element, the output image and the high frequency information unrelated to the semantics based on the low frequency component and the high frequency component.
17. The apparatus of claim 16, wherein the transformation module comprises any one of:
a wavelet transform module; and
a reversible convolution block.
18. The apparatus of claim 15, wherein the actions further comprise:
training the reversible neural network such that:
the trained reversible neural network is capable of generating a second image having the second resolution and first high-frequency information obeying the predetermined distribution based on the first image of the first resolution, and
the trained inverse network of the reversible neural network is capable of generating a fourth image having the first resolution based on the third image of the second resolution and second high frequency information obeying the predetermined distribution.
19. The apparatus of claim 18, wherein training the reversible neural network comprises:
acquiring a first set of training images having the first resolution;
acquiring a second set of training images respectively corresponding to semantics of the first set of training images and having the second resolution; and
training the reversible neural network based on the first and second sets of training images.
20. The apparatus of claim 19, wherein training the reversible neural network comprises:
determining a plurality of objective functions based on the first and second sets of training images;
determining a total objective function for training the reversible neural network by combining at least a portion of the plurality of objective functions; and
determining network parameters of the reversible neural network by minimizing the overall objective function.

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN202010203650.1A CN113496465A (en) 2020-03-20 2020-03-20 Image scaling
KR1020227033277A KR20220157402A (en) 2020-03-20 2021-02-21 image rescaling
EP21711982.5A EP4121936A1 (en) 2020-03-20 2021-02-21 Image rescaling
PCT/US2021/018950 WO2021188254A1 (en) 2020-03-20 2021-02-21 Image rescaling
JP2022548582A JP2023517486A (en) 2020-03-20 2021-02-21 image rescaling
US17/802,775 US20230093734A1 (en) 2020-03-20 2021-02-21 Image rescaling


Publications (1)

Publication Number Publication Date
CN113496465A true CN113496465A (en) 2021-10-12

Family

ID=74873807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010203650.1A Pending CN113496465A (en) 2020-03-20 2020-03-20 Image scaling

Country Status (6)

Country Link
US (1) US20230093734A1 (en)
EP (1) EP4121936A1 (en)
JP (1) JP2023517486A (en)
KR (1) KR20220157402A (en)
CN (1) CN113496465A (en)
WO (1) WO2021188254A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920013A (en) * 2021-10-14 2022-01-11 中国科学院深圳先进技术研究院 Small image multi-target detection method based on super-resolution
WO2023197805A1 (en) * 2022-04-11 2023-10-19 北京字节跳动网络技术有限公司 Image processing method and apparatus, and storage medium and electronic device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4144087A1 (en) 2020-04-29 2023-03-08 Deep Render Ltd Image compression and decoding, video compression and decoding: methods and systems


Also Published As

Publication number Publication date
EP4121936A1 (en) 2023-01-25
US20230093734A1 (en) 2023-03-23
JP2023517486A (en) 2023-04-26
WO2021188254A1 (en) 2021-09-23
KR20220157402A (en) 2022-11-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination