CN112119408A - Method for acquiring image quality enhancement network, image quality enhancement method, image quality enhancement device, movable platform, camera and storage medium - Google Patents
- Publication number
- CN112119408A (application CN201980031413.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- image quality
- network
- quality enhancement
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/00 — Image enhancement or restoration
- G06N3/045 — Neural networks; architecture, e.g. interconnection topology: combinations of networks
- G06N3/061 — Physical realisation of neural networks using biological neurons, e.g. biological neurons connected to an integrated circuit
- G06N3/08 — Neural networks; learning methods
- H04N23/50 — Cameras or camera modules comprising electronic image sensors; constructional details
- H04N23/80 — Camera processing pipelines; components thereof
Abstract
The embodiments of the present application provide a method for acquiring an image quality enhancement network, an image quality enhancement method and apparatus, a movable platform, a camera, and a computer-readable storage medium. The method for acquiring the image quality enhancement network comprises: training a preset generative adversarial network on a plurality of preset training images, wherein the generative adversarial network comprises a generator; and, after training of the generative adversarial network is completed, using the generator as the image quality enhancement network. The generator, i.e., the image quality enhancement network, comprises N scale extraction networks for extracting features at different scales, where N is an integer greater than 0, and the image quality enhancement network performs image quality enhancement on any input image to be enhanced. With these embodiments, a network capable of enhancing image quality is obtained through training, improving the user experience.
Description
Technical Field
The embodiments of the present application relate to the technical field of image processing, and in particular to a method for acquiring an image quality enhancement network, an image quality enhancement method and apparatus, a movable platform, a camera, and a computer-readable storage medium.
Background
With the development of imaging technology, ever higher image quality is demanded. However, due to limitations of network bandwidth or storage space, image quality often has to be reduced during transmission or storage in order to reduce the file size, degrading the user experience: lossy compression methods reduce the image quality of an image.
In the related art, image quality enhancement algorithms such as histogram equalization or gamma correction mainly rely on hand-crafted heuristics and on characteristics of the human eye; their enhancement capability is limited and is easily constrained by the image scene.
Disclosure of Invention
The embodiment of the application provides a method for acquiring an image quality enhancement network, an image quality enhancement method, an image quality enhancement device, a movable platform, a camera and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a method for obtaining an image quality enhancement network, including:
training a preset generative adversarial network according to a plurality of preset training images; wherein the generative adversarial network comprises a generator;
after training of the generative adversarial network is completed, using the generator as the image quality enhancement network; the generator, i.e., the image quality enhancement network, comprises N scale extraction networks for extracting features at different scales, where N is an integer greater than 0, and the image quality enhancement network is used to perform image quality enhancement on any input image to be enhanced.
In a second aspect, an embodiment of the present application provides an image quality enhancement method, including:
inputting an image to be enhanced into a preset image quality enhancement network, the image quality enhancement network performing image quality enhancement on the image to be enhanced to obtain an image whose image quality is higher than that of the image to be enhanced; the image quality enhancement network comprises N scale extraction networks for extracting features at different scales and is obtained by training it as the generator of a generative adversarial network; N is an integer greater than 0.
In a third aspect, an embodiment of the present application provides an apparatus for acquiring an image quality enhancement network, including:
a memory and a processor; the memory is connected with the processor through a communication bus and is used for storing computer instructions executable by the processor; the processor is configured to read the computer instructions from the memory to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides an image quality enhancement apparatus, including a memory and a processor; the memory is connected with the processor through a communication bus and is used for storing computer instructions executable by the processor; the processor is configured to read the computer instructions from the memory to implement the image quality enhancement method of any one of the second aspect.
In a fifth aspect, an embodiment of the present application provides a movable platform, including:
a body;
a power system, arranged in the body, for providing power to the movable platform; and
the image quality enhancement apparatus according to the fourth aspect.
In a sixth aspect, an embodiment of the present application provides a camera, including:
a housing;
a lens assembly arranged inside the housing;
a sensor assembly, arranged in the housing, for sensing light passing through the lens assembly and generating an electrical signal; and
the image quality enhancement apparatus according to the fourth aspect.
In a seventh aspect, this application provides a computer-readable storage medium, on which computer instructions are stored, and when executed, the computer instructions implement the steps of the method according to any one of the first and second aspects.
In these embodiments, training is performed under the generative adversarial network framework, so that the generator of the generative adversarial network contains N scale extraction networks for extracting features at different scales; the trained generator is then used as an image quality enhancement network that performs image quality enhancement on any input image to be enhanced.
In these embodiments, an image to be enhanced is input to a preset image quality enhancement network, which performs image quality enhancement on it to obtain a target image whose image quality is higher than that of the input, thereby implementing the image quality enhancement process.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without inventive labor.
Fig. 1 is a flowchart of a method for acquiring an image quality enhancement network according to an embodiment of the present disclosure.
Fig. 2 is a structural diagram of an image quality enhancement network according to an embodiment of the present application.
Fig. 3 is a structural diagram of a scale extraction network according to an embodiment of the present application.
Fig. 4 is a flowchart of another method for acquiring an image quality enhancement network according to an embodiment of the present disclosure.
Fig. 5 is a diagram of the training structure of a generative adversarial network according to an embodiment of the present disclosure.
Fig. 6 is a structural diagram of an image quality evaluation network according to an embodiment of the present application.
Fig. 7 is a flowchart of an image quality enhancement method according to an embodiment of the present disclosure.
Fig. 8 is a scene diagram of an image quality enhancement network application according to an embodiment of the present application.
Fig. 9 is a block diagram of an apparatus for acquiring an image quality enhancement network according to an embodiment of the present application.
Fig. 10 is a block diagram of an image quality enhancement apparatus according to an embodiment of the present application.
Fig. 11 is a structural diagram of a movable platform according to an embodiment of the present application.
Fig. 12 is a structural diagram of a camera according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms; they are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly second information as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "while", or "in response to determining", depending on the context. The terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
Image quality enhancement algorithms in the related art, such as histogram equalization or gamma correction, mainly perform enhancement based on hand-crafted heuristics and characteristics of the human eye; their capability to improve image quality is limited, they are easily constrained by the image scene, and they can often only process one type of image in a specific scene. Professional image editing applications such as Photoshop require the user to have considerable software skill and experience, and the processing time is long because it depends on manual operation, so the efficiency of image quality improvement is low.
Based on this, please refer to Fig. 1, a flowchart of a method for acquiring an image quality enhancement network according to an exemplary embodiment of the present application. The method may be executed by an electronic device, such as a server or a terminal, and includes:
in step S101, a preset generative countermeasure network is trained according to a plurality of preset training images; wherein the generative countermeasure network comprises a generator.
In step S102, after training of the generative adversarial network is completed, the generator is used as the image quality enhancement network. The generator, i.e., the image quality enhancement network, comprises N scale extraction networks for extracting features at different scales, where N is an integer greater than 0, and the image quality enhancement network is used to perform image quality enhancement on any input image to be enhanced.
It should be noted that image quality here is related to the degree of lossy compression of the image: if image information is lost during compression, the image quality is correspondingly reduced. In one example, an image contains both high-frequency information, which varies rapidly, and low-frequency information, which varies slowly; part of the high-frequency information may be lost during compression, so that the image quality of the compressed-then-decompressed image is lower than that of the uncompressed original image.
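As a toy illustration of this kind of loss (the patent does not specify a codec; the signals and quantization step below are invented for illustration), coarse quantization wipes out a rapidly varying pattern while a slowly varying ramp survives in coarse form:

```python
# Crude "lossy encoder": quantize pixel values to multiples of `step`.
def quantize(pixels, step):
    return [round(p / step) * step for p in pixels]

# A slowly varying (low-frequency) ramp and a rapidly varying
# (high-frequency) alternating pattern, both with 8-bit-style values.
low_freq = [i * 8 for i in range(16)]                     # 0, 8, ..., 120
high_freq = [0 if i % 2 == 0 else 15 for i in range(16)]  # 0, 15, 0, 15, ...

step = 32
# The ramp keeps several distinct levels after quantization, but the fine
# alternation collapses to a single constant: the high-frequency detail
# is irrecoverably lost, which is exactly what lowers the image quality.
print(sorted(set(quantize(low_freq, step))))   # [0, 32, 64, 96, 128]
print(sorted(set(quantize(high_freq, step))))  # [0]
```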
In one example, the original image is encoded by an encoder into a lossily encoded binary file, and the binary file is then decoded by a decoder to obtain the training image.
It is to be understood that the embodiments of the present application place no limitation on the specific types of encoder and decoder, which may be selected according to the actual situation. In one possible implementation, several encoders of different types may be used to encode the original image, yielding compressed images encoded by different encoders, which are then decoded to obtain training images. During training, for each encoder type, the generative adversarial network is trained separately with the training images corresponding to that encoder type, so as to obtain image quality enhancement networks adapted to different coding schemes, realize image quality enhancement in different directions, and improve the enhancement capability.
In one possible implementation, to improve the efficiency of network training, the original image is pre-cropped to a specified size; in one example, the original image may be an image block cropped to a size (length × width × depth) of 384 × 384 × 3.
In this embodiment, a Generative Adversarial Network (GAN) is a deep learning model comprising a generator and a discriminator. The electronic device takes the training images as input to the generator and trains it; after training, the generator is used as the image quality enhancement network, which contains N scale extraction networks for extracting features at different scales, where N is an integer greater than 0.
In addition, to improve the stability and convergence of training, the training images and original images may be normalized. For example, the electronic device may divide each channel pixel of the training image and the original image by 255.0, so that each channel pixel value lies in [0, 1], thereby implementing the normalization of the training image and the original image.
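A minimal sketch of this normalization step, in pure Python for clarity (a real pipeline would apply it to a whole image tensor in one vectorized operation):

```python
# Map 8-bit channel values (0-255) into [0, 1] by dividing by 255.0,
# as described in the text.
def normalize_channel(pixels):
    return [p / 255.0 for p in pixels]

channel = [0, 64, 128, 255]
normalized = normalize_channel(channel)
print(normalized)  # every value now lies in [0, 1]
```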
Referring to Fig. 2, a structural diagram of the image quality enhancement network: the network further includes a first convolution layer 21, which performs a convolution operation on the training image to realize feature extraction and obtain the convolved image data. The first convolution layer 21 includes a convolution kernel and an activation function; the activation function may be any one of PReLU, ReLU, tanh, Sigmoid, and ELU, for example a PReLU, which is fast to compute. The embodiments of the present application place no limitation on the size of the convolution kernel, which may be set according to the actual situation.
In addition, since a convolution operation reduces the spatial size of the output image, the first convolution layer 21 also performs a padding operation on the input training image to ensure that the output image has the same size (length and width) as the input image.
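The padding amount follows the standard convolution output-size arithmetic (a textbook formula, not specific to this patent):

```python
# Spatial size after a convolution: out = (in + 2*pad - kernel) // stride + 1.
# "Same" padding chooses pad so that out == in for stride 1 and odd kernels.
def conv_out_size(in_size, kernel, stride=1, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

def same_pad(kernel):
    """Padding that preserves spatial size for stride 1 and an odd kernel."""
    return (kernel - 1) // 2

# A 3x3 convolution with no padding shrinks a 384-wide input to 382 ...
print(conv_out_size(384, kernel=3, pad=0))            # 382
# ... while padding (3 - 1) // 2 = 1 keeps it at 384.
print(conv_out_size(384, kernel=3, pad=same_pad(3)))  # 384
```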
Referring to Fig. 2, the first convolution layer 21 is connected to the first scale extraction network 22, and the N scale extraction networks 22 are connected in sequence (Fig. 2 shows 4 of them as an example). The input of each scale extraction network 22 is either the output of the first convolution layer 21, for the network connected directly to it, or the output of the previous scale extraction network 22.
In an embodiment, the scale extraction networks 22 each extract features at a different scale from the input image. Referring to Fig. 2, the image quality enhancement network further includes at least one third convolution layer 23 (Fig. 2 shows 3 of them as an example). The third convolution layers 23 reduce the dimensionality of the scale features output by the scale extraction networks 22 to obtain low-dimensional feature vectors and generate the quality-enhanced target image. The number of third convolution layers 23 can be set according to the actual situation and is not limited by the embodiments of the present application.
In one implementation, the scale extraction networks 22 extract features at different scales from the input image to generate an image containing those features. Because the dimensionality (depth) of this output is increased by the first convolution layer 21 and the scale extraction networks 22, and the final output must have the same depth as the training image, the image containing multi-scale features must be reduced in dimension. To this end, at least one third convolution layer 23 is arranged in sequence in the image quality enhancement network; the third convolution layers 23 reduce the image containing multi-scale features to a low-dimensional image and generate the quality-enhanced target image. The number of third convolution layers can be adapted to the degree of compression of the training data so as to output a high-quality image.
As an example, suppose the training image has size 384 × 384 × 3, i.e., depth 3, and the image containing multi-scale features generated by the scale extraction networks 22 has size 384 × 384 × 9, i.e., depth 9; third convolution layers 23 are then needed to reduce its dimension. If the image quality enhancement network includes 2 third convolution layers 23, they reduce the dimension in turn: the first reduces the image to size 384 × 384 × 6, and the second to 384 × 384 × 3, so that the final image containing multi-scale features has the same depth as the training image. The amount of reduction at each step can be set according to the actual situation and is not limited by the embodiments of the present application.
The third convolution layer 23 includes a convolution kernel and an activation function, which may be any one of PReLU, ReLU, tanh, Sigmoid, and ELU. As an example, the activation function of every third convolution layer except the last may be a PReLU, which is fast to compute, while that of the last layer may be a Sigmoid, which maps the output into [0, 1] and thereby realizes the normalization. The embodiments of the present application place no limitation on the size of the convolution kernel, which may be set according to the actual situation.
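For reference, the two activation functions singled out above can be sketched as follows (the PReLU slope 0.25 is an illustrative initial value, not taken from the patent; in practice the slope is a learned parameter):

```python
import math

def prelu(x, a=0.25):
    """Parametric ReLU: identity for x >= 0, learned slope `a` for x < 0."""
    return x if x >= 0 else a * x

def sigmoid(x):
    """Squashes any real value into (0, 1) - suitable for the final layer."""
    return 1.0 / (1.0 + math.exp(-x))

print(prelu(2.0), prelu(-2.0))  # 2.0 -0.5
print(sigmoid(0.0))             # 0.5
```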
Referring to Fig. 3, a structural diagram of the scale extraction network: it includes a plurality of filters 221, a splicing (concatenation) layer 222, a second convolution layer 223, and an addition layer 224, connected in sequence.
the plurality of filters 221 are used for extracting scale features of the input image to obtain a plurality of different scale features, the scale features extracted by the different filters 221 are different, and the plurality of filters 221 contribute to effective extraction of the scale features. For example, the filter includes convolution kernels, different scale features of the input image can be obtained through different convolution kernels, a large convolution kernel can obtain contour information of the input image, and a small convolution kernel can obtain detail information of the input image. It should be noted that, in the embodiment of the present application, there is no limitation on the arrangement manner of the plurality of filters 221, for example, the filters 221 may be arranged in parallel, and as an example, 2 filters 221 listed in fig. 3 are used for exemplary illustration: the 2 filters 221 form a parallel relationship, and the electronic device inputs the input image into the first filter 221 and the second filter 221 respectively to perform scale feature extraction, so as to obtain 2 different scale features; it should be noted that the "first" and "second" are only used to distinguish the filter 221, and do not have any meaning per se.
Each filter 221 includes a convolution kernel and an activation function, which may be any one of PReLU, ReLU, tanh, Sigmoid, and ELU, for example a PReLU, which is fast to compute. The embodiments of the present application place no limitation on the kernel size; the kernel sizes of different filters 221 may be the same or different and may be set according to the actual situation.
In addition, since a convolution reduces the image size, and the input and output must have the same spatial size for the subsequent concatenation, each filter 221 also performs a padding operation on the input image so that its output has the same size (length and width) as the input.
The outputs of the filters 221 then serve as the inputs of the splicing layer 222, which concatenates the features at different scales output by the filters 221 into spliced image data. Taking Fig. 3 as an example: if the filters 221 finally output 2 images of size 254 × 254 × 128 (length × width × depth), the spliced image has size 254 × 254 × 256 (128 × 2 = 256).
The depth of the spliced image data is thus increased, but it must subsequently be combined with the input image of the scale extraction network. Hence the scale extraction network provides a second convolution layer 223, which convolves the spliced data to obtain image data with the same depth as the input image. Finally, the addition layer 224 sums the output of the second convolution layer 223 with the input image point by point, fusing the extracted multi-scale features with the original features of the input image to obtain image data containing features at different scales.
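The whole scale extraction block can be summarized as shape bookkeeping (parallel filters → concatenation → depth-restoring convolution → point-wise residual addition); the concrete sizes below are illustrative, not mandated by the patent:

```python
# Shapes are (height, width, depth).
def concat_depth(shapes):
    """Concatenating along depth: spatial dims must match, depths add up."""
    h, w, _ = shapes[0]
    assert all(s[:2] == (h, w) for s in shapes), "spatial sizes must match"
    return (h, w, sum(d for _, _, d in shapes))

def scale_block_output(input_shape, filter_depths):
    h, w, in_depth = input_shape
    # 1. Each padded filter preserves height/width but sets its own depth.
    filter_outs = [(h, w, d) for d in filter_depths]
    # 2. The splicing layer stacks the filter outputs along depth.
    spliced = concat_depth(filter_outs)
    assert spliced[2] == sum(filter_depths)
    # 3. The second convolution restores the input depth ...
    restored = (h, w, in_depth)
    # 4. ... so the addition layer can sum point by point with the input.
    assert restored == input_shape
    return restored

print(scale_block_output((254, 254, 128), filter_depths=[128, 128]))
```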
Referring to Fig. 4, a flowchart of another method for acquiring an image quality enhancement network according to an exemplary embodiment of the present application. The method may be executed by an electronic device, such as a server or a terminal, and includes:
in step S201, a preset generative countermeasure network is trained according to a plurality of preset training images and original images; wherein the generative confrontation network comprises a generator and an arbiter.
In step S202, after training of the generative adversarial network is completed, the generator is used as the image quality enhancement network and the discriminator as an image quality evaluation network. The generator, i.e., the image quality enhancement network, comprises N scale extraction networks for extracting features at different scales, where N is an integer greater than 0; the image quality enhancement network performs image quality enhancement on any input image to be enhanced to obtain a target image, and the image quality evaluation network evaluates the probability that the target image is the original image.
In an embodiment, the input data of the image quality evaluation network are the target image output by the image quality enhancement network and the original image, and the output data is the probability that the target image is the original image.
Referring to fig. 5, which shows the training structure of the generative adversarial network: an original image is encoded by an encoder 40 and decoded by a decoder 10 to obtain a training image; the training image is input into a generator 20, which outputs a quality-enhanced target image; the target image and the original image are input into a discriminator 30. The generative adversarial network is then trained according to the probability that the generated target image is the original image, and the parameters of the generator and the discriminator are updated.
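The data flow of fig. 5 can be sketched minimally as follows. The encoder/decoder pair is replaced here by additive noise, and both networks are tiny stand-ins; all shapes and layers are illustrative assumptions rather than the structures described in the application:

```python
import torch

# Tiny stand-ins for the networks of fig. 5 (assumed shapes).
G = torch.nn.Conv2d(3, 3, 3, padding=1)  # generator / image quality enhancement network
D = torch.nn.Sequential(                  # discriminator / image quality evaluation network
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 8 * 8, 1),
    torch.nn.Sigmoid(),
)

original = torch.rand(1, 3, 8, 8)
# Stand-in for "encoded by encoder 40 and decoded by decoder 10":
# a degraded copy of the original image.
training_image = (original + 0.1 * torch.randn_like(original)).clamp(0, 1)

target = G(training_image)   # generator outputs the quality-enhanced target image
p_fake = D(target)           # probability that the target image is the original
p_real = D(original)         # discriminator also scores the real original image
```

In actual training these probabilities would feed the loss described below, and the parameters of G and D would be updated alternately.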
Referring to fig. 6, which is a structural diagram of the image quality evaluation network. The image quality evaluation network includes a fifth convolutional layer 31, a plurality of sixth convolutional layers 32 (fig. 6 shows 4 sixth convolutional layers 32 as an example), a first fully connected layer 33, and a second fully connected layer 34, connected in sequence.
The fifth convolutional layer 31 performs a convolution operation on the input target image and original image to realize feature extraction. The sixth convolutional layers 32 are connected in sequence; the input image of a sixth convolutional layer 32 is the output image of the fifth convolutional layer 31 if that sixth convolutional layer 32 is connected to the fifth convolutional layer 31, and otherwise is the output image of the previous sixth convolutional layer 32. Each sixth convolutional layer 32 extracts features of its input image to obtain higher-dimensional features. Taking fig. 6 as an example, assume the dimension of the image data output by the fifth convolutional layer 31 is 64. The 1st sixth convolutional layer 32 performs feature extraction on the output of the fifth convolutional layer 31, and its convolution operation raises the dimension of the output image to 128; the 2nd sixth convolutional layer 32 performs feature extraction on the output of the previous sixth convolutional layer 32, raising the dimension to 256; the 3rd sixth convolutional layer 32 likewise raises the dimension to 512; and the 4th sixth convolutional layer 32 raises the dimension of its output image to 1024.
The fifth convolutional layer 31 includes a convolution kernel and an activation function; the activation function may be any one of a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function. For example, the activation function may be a PReLU function, which is faster to compute. It can be understood that the embodiments of the present application impose no limitation on the size of the convolution kernel, which may be set according to the actual situation.
Each sixth convolutional layer 32 includes a convolution kernel, an activation function, and a BN (Batch Normalization) layer; the activation function may be any one of a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function. For example, the activation function may be a ReLU function, which is faster to compute. Again, no limitation is imposed on the size of the convolution kernel, which may be set according to the actual situation; the BN layer accelerates network convergence.
Next, the first fully connected layer 33 classifies the high-dimensional features output by the last sixth convolutional layer 32; finally, the second fully connected layer 34 classifies the result output by the first fully connected layer 33 to obtain the probability that the target image is the original image, and the entire generative adversarial network is trained based on this probability and a preset loss function.
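The architecture of fig. 6 (a fifth convolutional layer producing 64-dimensional features, four sixth convolutional layers doubling the dimension to 128, 256, 512, and 1024 with BN and ReLU, then two fully connected layers) can be sketched as below. The strides, kernel sizes, pooling, and FC widths are assumptions for illustration; with a Wasserstein adversarial loss the final Sigmoid would typically be omitted, but it is kept here because the text describes the output as a probability:

```python
import torch
import torch.nn as nn

def sixth_conv(cin, cout):
    # Sixth-convolutional-layer pattern: convolution + BN + ReLU
    # (stride 2 assumed so spatial size shrinks as dimension grows).
    return [nn.Conv2d(cin, cout, 3, stride=2, padding=1),
            nn.BatchNorm2d(cout), nn.ReLU()]

class Discriminator(nn.Module):
    """Sketch of the image quality evaluation network of fig. 6."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.PReLU(),  # fifth convolutional layer
            *sixth_conv(64, 128),
            *sixth_conv(128, 256),
            *sixth_conv(256, 512),
            *sixth_conv(512, 1024),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc1 = nn.Linear(1024, 64)  # first fully connected layer
        self.fc2 = nn.Linear(64, 1)     # second fully connected layer

    def forward(self, x):
        # Sigmoid maps the score to the probability that x is the original image.
        return torch.sigmoid(self.fc2(self.fc1(self.features(x))))

prob = Discriminator()(torch.rand(2, 3, 32, 32))
```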
In one embodiment, training the preset generative adversarial network includes: training the generative adversarial network based on a preset loss function, where the loss function comprises an adversarial loss function and an image quality loss function. The adversarial loss function evaluates both the generation capability of the image quality enhancement network in generating a target image and the discrimination capability of the image quality evaluation network in distinguishing the target image from an original image. The image quality loss function evaluates the generation capability of the image quality enhancement network, that is, whether the image quality enhancement network converges, whether the target image meets a preset condition, and so on.
The adversarial loss function is implemented with the Wasserstein loss, which alleviates the problem of vanishing gradients and simplifies the network training process. The image quality loss function in the generative adversarial network may be any one of: the Multi-Scale Structural Similarity index (MS-SSIM), the Structural Similarity index (SSIM), the Mean Absolute Difference (MAD), the Sum of Absolute Differences (SAD), the Sum of Absolute Transformed Differences (SATD, based on the Hadamard transform), the Sum of Squared Differences (SSD), and the Mean Squared Difference (MSD).
For example, the image quality loss function is related to a VGG-19 pre-training model. Denote this loss function by l_VGG, and let Î = G(D(E(I₀))), where E represents the encoder, D represents the decoder, G represents the image quality enhancement network, and I₀ represents the original image. By using the VGG-19 pre-training model in the image quality loss function, this embodiment compares, in a high-dimensional feature space, the information loss between the target image output by the image quality enhancement network and the original image.
It can be understood that the ratio between the adversarial loss function and the image quality loss function can be set according to the actual situation and is not limited by the embodiments of the present application. By way of example, let the adversarial loss function be l_adver and the total loss function of the generative adversarial network be loss_all; the total loss can then be set as: loss_all = l_VGG + 0.001 * l_adver.
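The composite loss above can be sketched as follows. The Wasserstein formulation is computed on raw discriminator scores, and the mean squared error between VGG-19 features is assumed as the concrete form of l_VGG (the application does not fix the distance):

```python
import torch

def adversarial_losses(d_real, d_fake):
    # Wasserstein losses on raw discriminator (critic) scores:
    # the critic pushes real scores up and fake scores down; the
    # generator's adversarial term l_adver pushes fake scores up.
    d_loss = d_fake.mean() - d_real.mean()
    l_adver = -d_fake.mean()
    return d_loss, l_adver

def loss_all(vgg_feat_target, vgg_feat_original, d_fake):
    # l_VGG: distance between VGG-19 features of the target image and
    # the original image (mean squared error assumed here).
    l_vgg = torch.mean((vgg_feat_target - vgg_feat_original) ** 2)
    l_adver = -d_fake.mean()
    # loss_all = l_VGG + 0.001 * l_adver, as set in the embodiment.
    return l_vgg + 0.001 * l_adver

total = loss_all(torch.ones(4), torch.zeros(4), torch.tensor([2.0]))
```

The 0.001 weight keeps the perceptual (VGG) term dominant while the adversarial term sharpens details.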
In one embodiment, the optimizer in the generative adversarial network employs any one of: the Root Mean Square Propagation algorithm (RMSProp), the Adaptive Moment Estimation algorithm (Adam), the AdaGrad algorithm, and the AdaDelta algorithm.
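As a brief sketch, an RMSProp optimizer can be attached to the network parameters as follows; the learning rate and the toy loss are assumed values for illustration only:

```python
import torch

param = torch.nn.Parameter(torch.zeros(3))
optimizer = torch.optim.RMSprop([param], lr=1e-4)  # RMSProp, one of the listed choices

loss = ((param - 1.0) ** 2).sum()  # toy loss standing in for loss_all
loss.backward()
optimizer.step()                   # one update step toward the minimum
```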
In an embodiment, the parameters of the image quality enhancement network may also be adjusted based on different training image formats (such as RGB format, YUV format, etc.) to adapt to images of different formats.
Accordingly, referring to fig. 7, an embodiment of the present application further provides an image quality enhancement method, which can be executed by an electronic device, where the electronic device may be a server or a terminal, and the method includes:
Step 301, inputting an image to be enhanced into a preset image quality enhancement network, which performs image quality enhancement on the image to obtain a target image whose image quality is higher than that of the image to be enhanced. The image quality enhancement network comprises N scale extraction networks for extracting features of different scales and is obtained by training it as the generator of a generative adversarial network; N is an integer greater than 0.
In this embodiment, the image to be enhanced is an image that has lost image information after being encoded by an encoder and decoded by a decoder. It can be understood that this embodiment imposes no limitation on the format of the image to be enhanced; for example, it may be an RGB image or a YUV image.
Since the image quality enhancement network is composed of convolutional layers, it can process an image of arbitrary resolution.
Referring to fig. 8, which shows an application scenario of the image quality enhancement network: the image to be enhanced, output by the decoder 10, is input into the image quality enhancement network 20 for processing, so as to obtain a target image whose image quality is higher than that of the image to be enhanced.
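Because the network is built only from convolutional layers, the same weights apply at any spatial size. A toy fully convolutional stand-in (layers and channel counts are assumptions, not the network of the application) demonstrates the arbitrary-resolution property:

```python
import torch

# Toy fully convolutional "enhancement" network: with no fully
# connected layers, it accepts inputs of arbitrary resolution.
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.PReLU(),
    torch.nn.Conv2d(16, 3, 3, padding=1),
)

out_small = net(torch.rand(1, 3, 24, 36))   # one resolution...
out_large = net(torch.rand(1, 3, 123, 77))  # ...and a very different one
```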
In an implementation, the image quality enhancement network includes a first convolution layer connected to the first of the N scale extraction networks and configured to perform a convolution operation on the image to be enhanced to obtain image data after the convolution operation. The N scale extraction networks are connected in sequence; the input image of each scale extraction network is the output image of the first convolution layer or the output image of the previous scale extraction network, and the scale extraction networks extract features of different scales from the input image. The first convolution layer includes a convolution kernel and an activation function; for example, the activation function is a PReLU function.
The image quality enhancement network further comprises at least one third convolution layer, which reduces the dimension of the scale features output by the scale extraction networks to obtain low-dimensional feature vectors, so as to generate an image with enhanced image quality. The activation function of a third convolutional layer 23 may be any one of a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function; for example, the activation function of each third convolutional layer other than the last may be a PReLU function, which is faster to compute, while the activation function of the last third convolutional layer may be a Sigmoid function, so as to implement normalization.
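The overall generator described above (first convolution layer, N sequentially connected scale extraction networks, then third convolution layers ending in a Sigmoid) can be sketched as below. The scale extraction networks are simplified here to single residual convolutions, and all channel counts are assumptions:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the image quality enhancement network: first conv +
    PReLU, N scale extraction stand-ins, then third conv layers whose
    last activation is a Sigmoid for normalization."""

    def __init__(self, n_scale=3, channels=64):
        super().__init__()
        self.first_conv = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.PReLU())
        # Stand-ins for the N sequentially connected scale extraction
        # networks (the real blocks use multiple parallel filters).
        self.scales = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                           nn.PReLU())
             for _ in range(n_scale)]
        )
        self.third_convs = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(),  # non-last: PReLU
            nn.Conv2d(channels, 3, 3, padding=1), nn.Sigmoid(),       # last: Sigmoid
        )

    def forward(self, x):
        h = self.first_conv(x)
        for s in self.scales:
            h = s(h) + h  # each scale network feeds the next (residual stand-in)
        return self.third_convs(h)

target = Generator()(torch.rand(1, 3, 16, 16))
```

The final Sigmoid keeps every output pixel in [0, 1], which is what the text means by normalization.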
In one embodiment, the scale extraction network comprises: a plurality of filters, a splice layer, a second convolution layer and an addition layer connected in sequence:
the filters respectively extract different scale features of the input image to obtain a plurality of different scale features, and the scale features extracted by different filters are different; the splicing layer splices the plurality of different scale features to obtain spliced image data; the second convolution layer performs a convolution operation on the spliced image data to obtain image data with the same dimension as the input image; and the addition layer performs a point-by-point summation of the image data output by the second convolution layer and the input image to obtain image data containing features of different scales. Each filter includes a convolution kernel and an activation function, which is, for example, a PReLU function.
As an example, if an image to be enhanced loses part of its image information (e.g., high-frequency information or other image information) during encoding, then in the process of enhancing its image quality, the multiple filters in the scale extraction network can extract its different scale features so as to restore the lost image information.
Accordingly, referring to fig. 9, an embodiment of the present application further provides an apparatus 100 for acquiring an image quality enhancement network, including:
a memory 101 and a processor 102; the memory 101 is connected to the processor 102 through a communication bus 103 for storing computer instructions executable by the processor 102; the processor 102 is configured to read computer instructions from the memory 101, and when executed, is configured to:
training a preset generative adversarial network according to a plurality of preset training images; wherein the generative adversarial network comprises a generator.
After the training of the generative adversarial network is finished, taking the generator as an image quality enhancement network; the generator, i.e. the image quality enhancement network, comprises N scale extraction networks for extracting features of different scales, where N is an integer greater than 0, and the image quality enhancement network is used for performing image quality enhancement processing on any input image to be enhanced.
The processor 102 executes the program code contained in the memory 101. The processor 102 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 102 may be any conventional processor.
The memory 101 stores the program code of the method for acquiring the image quality enhancement network. The memory 101 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. Further, the apparatus 100 for acquiring the image quality enhancement network may cooperate, through a network connection, with a network storage apparatus that performs the storage function of the memory 101. The memory 101 may be an internal storage unit of the apparatus 100 for acquiring the image quality enhancement network, for example, a hard disk or a memory of the apparatus 100. The memory 101 may also be an external storage device of the apparatus 100, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the apparatus 100. Further, the memory 101 may include both an internal storage unit and an external storage device of the apparatus 100. The memory 101 is used to store the computer program code and the other programs and data needed by the apparatus 100 for acquiring the image quality enhancement network, and may also be used to temporarily store data that has been output or is to be output.
The various embodiments described herein may be implemented using a computer-readable medium, such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, an implementation such as a process or a function may be implemented with a separate software module that performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, which may be stored in the memory 101 and executed by the controller.
The apparatus 100 for acquiring an image quality enhancement network may include, but is not limited to, a memory 101 and a processor 102. It will be understood by those skilled in the art that fig. 9 is only an example of the apparatus 100 for acquiring the image quality enhancement network, and does not constitute a limitation to the apparatus 100 for acquiring the image quality enhancement network, and may include more or less components than those shown in the drawings, or may combine some components, or different components, for example, the apparatus may further include an input/output device, a network access device, and the like.
As an example, the training image is generated from the original image by encoding and decoding.
As an example, N of the scale extraction networks are connected in sequence.
The image quality enhancement network further comprises a first convolution layer; the first convolution layer is connected with a first one of the N scale extraction networks and is used for performing convolution operation on the training image to obtain image data after the convolution operation.
The input image of each scale extraction network is the output image of the first convolution layer or the output image of the last scale extraction network.
As an example, the scale extraction network comprises a plurality of filters; the filters are used for extracting the scale features of the input image to obtain a plurality of different scale features.
By way of example, the filter includes a convolution kernel and an activation function.
As an example, the activation function includes at least one of: a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function.
As an example, the activation function is a PReLU function.
As an example, the scale extraction network further includes a splicing layer, where the splicing layer is configured to splice a plurality of different scale features output by the plurality of filters to obtain spliced image data.
As an example, the scale extraction network further comprises a second convolutional layer; and the second convolution layer performs convolution operation on the spliced image data to obtain image data with the same dimensionality as the input image.
As an example, the scale extraction network further comprises an addition layer; and the addition layer performs point-by-point summation on the image data output by the second convolution layer and the input image to obtain image data containing high-frequency characteristics and low-frequency characteristics.
As an example, the image quality enhancement network further includes at least one third convolution layer; and the at least one third convolution layer is used for reducing the dimension of the scale features output by the scale extraction network to obtain low-dimension feature vectors and generating a target image with enhanced image quality.
As an example, the third convolutional layer includes a convolution kernel and an activation function.
As an example, the activation function is a PReLU function or a Sigmoid function.
As an example, the generative adversarial network includes a discriminator.
The processor 102 is further configured to perform the following operation:
after the training of the generative adversarial network is finished, taking the discriminator as an image quality evaluation network.
As an example, the input data of the image quality evaluation network is a target image and the original image output by the image quality enhancement network, and the output data is a probability that the target image is the original image.
As an example, the processor 102 is further configured to perform the following operations:
training the generative adversarial network based on a preset loss function; wherein the loss function includes an adversarial loss function and an image quality loss function.
As an example, the adversarial loss function is implemented using the Wasserstein loss.
As an example, the image quality loss function is associated with a VGG-19 pre-training model.
As an example, the optimizer in the generative adversarial network is implemented using the Root Mean Square Propagation (RMSProp) algorithm.
As an example, the image quality loss function in the generative adversarial network includes at least one of: a multi-scale structural similarity algorithm, a mean absolute difference algorithm, a sum of absolute differences algorithm, a Hadamard transform algorithm, a sum of squared differences algorithm, and a mean squared difference algorithm.
As an example, the original image is an image cropped to a designated size.
Accordingly, referring to fig. 10, an embodiment of the present invention further provides an image quality enhancement apparatus 200, including:
a memory 201 and a processor 202; the memory 201 is connected to the processor 202 via a communication bus 203 for storing computer instructions executable by the processor 202; the processor 202 is configured to read computer instructions from the memory 201, and when executed, is configured to:
inputting an image to be enhanced into a preset image quality enhancement network, which performs image quality enhancement on the image to obtain a target image whose image quality is higher than that of the image to be enhanced; the image quality enhancement network comprises N scale extraction networks for extracting features of different scales and is obtained by training it as the generator of a generative adversarial network; N is an integer greater than 0.
The processor 202 executes the program code contained in the memory 201. The processor 202 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 201 stores the program code of the image quality enhancement method. The memory 201 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. Further, the image quality enhancement apparatus 200 may cooperate, through a network connection, with a network storage apparatus that performs the storage function of the memory 201. The memory 201 may be an internal storage unit of the image quality enhancement apparatus 200, such as a hard disk or a memory of the apparatus. The memory 201 may also be an external storage device of the image quality enhancement apparatus 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the apparatus. Further, the memory 201 may include both an internal storage unit and an external storage device of the image quality enhancement apparatus 200. The memory 201 is used to store the computer program code and the other programs and data required by the image quality enhancement apparatus 200, and may also be used to temporarily store data that has been output or is to be output.
The various embodiments described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in memory and executed by the controller.
The image quality enhancement apparatus 200 may include, but is not limited to, a memory 201 and a processor 202. Those skilled in the art will appreciate that fig. 10 is only an example of the image quality enhancement apparatus 200, and does not constitute a limitation to the image quality enhancement apparatus 200, and may include more or less components than those shown in the drawings, or combine some components, or different components, for example, the device may further include an input/output device, a network access device, and the like.
As an example, the image to be enhanced is an image output by a decoder.
As an example, N of the scale extraction networks are connected in sequence.
The image quality enhancement network further comprises a first convolution layer; the first convolution layer is connected with the first of the N scale extraction networks and is used for performing a convolution operation on the image to be enhanced to obtain image data after the convolution operation.
The input image of each scale extraction network is the output image of the first convolution layer or the output image of the last scale extraction network.
As an example, the scale extraction network comprises: the filter, the splicing layer and the second convolution layer are connected in sequence.
The filters are used for extracting different scale features of the input image to obtain a plurality of different scale features.
The splicing layer is used for splicing the plurality of different scale features to obtain spliced image data.
And the second convolution layer performs convolution operation on the spliced image data to obtain image data with the same dimensionality as the input image.
As an example, the scale extraction network further comprises an addition layer.
And the addition layer performs point-by-point summation on the image data output by the second convolution layer with the same dimensionality as the input image and the input image to obtain image data with different scales containing high-frequency characteristics and low-frequency characteristics.
By way of example, the filter includes a convolution kernel and an activation function.
As an example, the activation function includes at least one of: a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function.
As an example, the activation function is a PReLU function.
As an example, the image quality enhancement network further includes at least one third convolution layer; the at least one third convolution layer is used for reducing the dimension of the scale features output by the scale extraction network to obtain low-dimensional feature vectors and generating an image with enhanced image quality.
As an example, the third convolutional layer includes a convolution kernel and an activation function.
As an example, the activation function is a PReLU function or a Sigmoid function.
Accordingly, referring to fig. 11, an embodiment of the present invention further provides a movable platform 001, including:
a machine body 02.
And the power system 03 is arranged in the machine body 02 and used for providing power for the movable platform 001.
And the image quality enhancing apparatus 200 described above.
Those skilled in the art will appreciate that fig. 11 is merely an example of a movable platform and is not intended to be limiting and may include more or fewer components than shown, or some components in combination, or different components, e.g., the movable platform may also include input-output devices, network access devices, etc.
Accordingly, referring to fig. 12, an embodiment of the present application further provides a camera 002, including:
a housing 04.
And the lens assembly 05 is arranged inside the shell 04.
And the sensor component 06 is arranged inside the shell 04 and used for sensing the light passing through the lens component and generating an electric signal.
And the image quality enhancing apparatus 200.
Those skilled in the art will appreciate that fig. 12 is merely an example of a camera and is not meant to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components, e.g., the camera may also include a network access device, etc.
Accordingly, the present embodiment also provides a computer-readable storage medium, on which computer instructions are stored, and when executed, the computer instructions implement the steps of any one of the above methods.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.
Claims (37)
1. A method for obtaining an image quality enhancement network is characterized by comprising the following steps:
training a preset generative adversarial network according to a plurality of preset training images; wherein the generative adversarial network comprises a generator;
after the training of the generative adversarial network is finished, taking the generator as an image quality enhancement network; the generator, i.e. the image quality enhancement network, comprises N scale extraction networks for extracting features of different scales, wherein N is an integer greater than 0, and the image quality enhancement network is used for performing image quality enhancement processing on any input image to be enhanced.
2. The method of claim 1, wherein the training image is generated from an original image by encoding and decoding.
3. The method of claim 2, wherein the N scale extraction networks are connected in series;
the image quality enhancement network further comprises a first convolution layer; the first convolution layer is connected to the first of the N scale extraction networks and is used for performing a convolution operation on the training image to obtain convolved image data;
the input image of each scale extraction network is the output image of the first convolution layer or the output image of the previous scale extraction network.
4. The method of claim 3, wherein the scale extraction network comprises a plurality of filters; the filters are used for extracting the scale features of the input image to obtain a plurality of different scale features.
5. The method of claim 4, wherein the filter comprises a convolution kernel and an activation function.
6. The method of claim 5, wherein the activation function comprises at least one of: a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function.
7. The method of claim 6, wherein the activation function is a PReLU function.
8. The method of claim 4, wherein the scale extraction network further comprises a stitching layer for concatenating the plurality of different scale features output by the plurality of filters to obtain stitched image data.
9. The method of claim 8, wherein the scale extraction network further comprises a second convolution layer; the second convolution layer performs a convolution operation on the stitched image data to obtain image data with the same dimensionality as the input image.
10. The method of claim 9, wherein the scale extraction network further comprises an addition layer; the addition layer performs a point-wise summation of the image data output by the second convolution layer and the input image to obtain image data containing the different scale features.
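Claims 4–10 together describe one scale extraction network: parallel filters (convolution kernel plus PReLU activation), a stitching layer, a second convolution restoring the input dimensionality, and a residual addition layer. A sketch assuming PyTorch; the kernel sizes (3, 5, 7) and the channel count are guesses, since the claims do not fix them:

```python
import torch
import torch.nn as nn

class ScaleExtractionBlock(nn.Module):
    """Sketch of one scale extraction network (claims 4-10). 'Different
    scale features' are assumed to come from parallel convolutions with
    different kernel sizes; the claims do not specify the exact filters."""

    def __init__(self, channels=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Parallel filters: convolution kernel + PReLU activation (claims 5-7)
        self.filters = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, k, padding=k // 2),
                          nn.PReLU())
            for k in kernel_sizes)
        # Second convolution layer: back to input dimensionality (claim 9)
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        # Stitching layer: concatenate the multi-scale features (claim 8)
        stitched = torch.cat([f(x) for f in self.filters], dim=1)
        # Addition layer: point-wise sum with the input (claim 10)
        return self.fuse(stitched) + x

x = torch.rand(1, 64, 32, 32)
y = ScaleExtractionBlock()(x)   # output has the same shape as the input
```

Because the residual addition forces the output shape to match the input, N such blocks can be chained in series as claim 3 requires.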
11. The method of claim 1, wherein the image quality enhancement network further comprises at least one third convolution layer; the at least one third convolution layer is used for reducing the dimensionality of the scale features output by the scale extraction networks to obtain low-dimensional feature vectors and generate a quality-enhanced target image.
12. The method of claim 11, wherein the third convolutional layer comprises a convolutional kernel and an activation function.
13. The method of claim 12, wherein the activation function is a PReLU function or a Sigmoid function.
14. The method of claim 2, wherein the generative adversarial network further comprises a discriminator;
the method further comprises:
after the training of the generative adversarial network is finished, taking the discriminator as an image quality evaluation network.
15. The method of claim 14, wherein the input data of the image quality evaluation network are the target image output by the image quality enhancement network and the original image, and the output data is the probability that the target image is the original image.
16. The method of claim 1, wherein training the preset generative adversarial network comprises:
training the generative adversarial network based on a preset loss function, wherein the loss function includes an adversarial loss function and an image quality loss function.
17. The method of claim 16, wherein the adversarial loss function is implemented using a Wasserstein loss.
18. The method of claim 16, wherein the image quality loss function is based on a VGG-19 pre-trained model.
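Claims 16–18 combine an adversarial term with an image quality term. A toy numpy sketch of both terms; the feature arrays here merely stand in for VGG-19 activations, which the sketch does not compute:

```python
import numpy as np

def wasserstein_critic_loss(d_real, d_fake):
    """Wasserstein-style critic loss (claim 17): the critic is trained to
    score originals above generated images, so the loss is the negated gap."""
    return float(np.mean(d_fake) - np.mean(d_real))

def perceptual_loss(feat_original, feat_enhanced):
    """Image quality loss (claim 18): mean squared distance between deep
    features; the patent takes such features from a VGG-19 pre-trained model."""
    return float(np.mean((feat_original - feat_enhanced) ** 2))

# Made-up critic scores and feature maps for illustration
adv = wasserstein_critic_loss(np.array([0.9, 0.8]), np.array([0.2, 0.1]))  # -0.7
quality = perceptual_loss(np.ones((2, 2)), np.zeros((2, 2)))               # 1.0
total = adv + quality
```

In practice the two terms would be weighted; the claims do not state the weighting, so none is shown here.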
19. The method of claim 1, wherein the optimizer in the generative adversarial network is implemented using a root mean square propagation (RMSProp) algorithm.
20. The method of claim 1, wherein the image quality loss function in the generative adversarial network comprises at least one of: a multi-level structural similarity (MS-SSIM) algorithm, a mean absolute difference algorithm, a sum of absolute errors algorithm, a Hadamard transform algorithm, a sum of squared differences algorithm, and a mean squared error algorithm.
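Several of the pixel-level quality losses listed in claim 20 reduce to simple arithmetic on the difference image; a small numpy illustration with made-up 2×2 patches (MS-SSIM and the Hadamard-transform variant are omitted, as they need more machinery than fits a sketch):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])   # hypothetical original patch
y = np.array([[1.0, 2.5], [2.0, 4.0]])   # hypothetical enhanced patch

sad = np.abs(x - y).sum()     # sum of absolute errors
mad = np.abs(x - y).mean()    # mean absolute difference
ssd = ((x - y) ** 2).sum()    # sum of squared differences
mse = ((x - y) ** 2).mean()   # mean squared error
```

The squared variants penalize large outliers more heavily than the absolute ones, which is one reason the claim offers them as alternatives.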
21. The method of claim 1, wherein the original image is an image cropped to a specified size.
22. An image quality enhancement method, comprising:
inputting an image to be enhanced into a preset image quality enhancement network, and performing image quality enhancement on the image to be enhanced by the image quality enhancement network to obtain a target image whose image quality is higher than that of the image to be enhanced; wherein the image quality enhancement network comprises N scale extraction networks for extracting different scale features and is obtained by training as the generator of a generative adversarial network; N is an integer greater than 0.
23. The method of claim 22, wherein the image to be enhanced is an image output by a decoder.
24. The method of claim 22, wherein the N scale extraction networks are connected in series;
the image quality enhancement network further comprises a first convolution layer; the first convolution layer is connected to the first of the N scale extraction networks and is used for performing a convolution operation on the image to be enhanced to obtain convolved image data;
the input image of each scale extraction network is the output image of the first convolution layer or the output image of the previous scale extraction network.
25. The method of claim 22, wherein the scale extraction network comprises: a plurality of filters, a stitching layer, and a second convolution layer connected in sequence;
the filters are used for extracting different scale features of the input image to obtain a plurality of different scale features;
the stitching layer is used for concatenating the plurality of different scale features to obtain stitched image data;
the second convolution layer performs a convolution operation on the stitched image data to obtain image data with the same dimensionality as the input image.
26. The method of claim 25, wherein the scale extraction network further comprises an addition layer;
the addition layer performs a point-wise summation of the image data output by the second convolution layer and the input image to obtain image data containing the different scale features.
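The stitching, second convolution, and addition layers of claims 25–26 can be illustrated with plain numpy, treating a 1×1 convolution as a weighted sum over channels; the feature values and weights here are made up:

```python
import numpy as np

# Two hypothetical single-channel feature maps from two filters (H = W = 2)
f1 = np.ones((1, 2, 2))
f2 = np.full((1, 2, 2), 2.0)

# Stitching layer: concatenate along the channel axis -> shape (2, 2, 2)
stitched = np.concatenate([f1, f2], axis=0)

# "Second convolution" as a 1x1 conv: weighted sum over channels,
# restoring the single-channel dimensionality of the input
w = np.array([0.5, 0.25])
fused = np.tensordot(w, stitched, axes=1)   # shape (2, 2)

# Addition layer: point-wise sum with the block's input image
x = np.zeros((2, 2))
out = fused + x
```

The concatenation raises the channel count, the 1×1 fusion restores it, and the residual addition leaves the spatial shape untouched, exactly what allows the blocks to be chained in series.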
27. The method of claim 25, wherein the filter comprises a convolution kernel and an activation function.
28. The method of claim 27, wherein the activation function comprises at least one of: a PReLU function, a ReLU function, a tanh function, a Sigmoid function, and an ELU function.
29. The method of claim 28, wherein the activation function is a PReLU function.
30. The method of claim 22, wherein the image quality enhancement network further comprises at least one third convolution layer connected in sequence; the at least one third convolution layer is used for reducing the dimensionality of the scale features output by the scale extraction networks to obtain low-dimensional feature vectors and generate a quality-enhanced target image.
31. The method of claim 30, wherein the third convolutional layer comprises a convolutional kernel and an activation function.
32. The method of claim 31, wherein the activation function is a PReLU function or a Sigmoid function.
33. An apparatus for acquiring an image quality enhancement network, comprising:
a memory and a processor; the memory is connected to the processor through a communication bus and is used for storing computer instructions executable by the processor; the processor is used for reading computer instructions from the memory to implement the method for obtaining an image quality enhancement network according to any one of claims 1 to 21.
34. An image quality enhancement device, comprising a memory and a processor; the memory is connected to the processor through a communication bus and is used for storing computer instructions executable by the processor; the processor is used for reading computer instructions from the memory to implement the image quality enhancement method of any one of claims 22 to 32.
35. A movable platform, comprising:
a body;
a power system arranged in the body and used for providing power for the movable platform; and
the image quality enhancement device according to claim 34.
36. A camera, comprising:
a housing;
a lens assembly arranged inside the housing;
a sensor assembly arranged inside the housing and used for sensing light passing through the lens assembly and generating an electric signal; and
the image quality enhancement device according to claim 34.
37. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 32.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/103356 WO2021035629A1 (en) | 2019-08-29 | 2019-08-29 | Method for acquiring image quality enhancement network, image quality enhancement method and apparatus, mobile platform, camera, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112119408A true CN112119408A (en) | 2020-12-22 |
Family
ID=73799219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980031413.7A Pending CN112119408A (en) | 2019-08-29 | 2019-08-29 | Method for acquiring image quality enhancement network, image quality enhancement method, image quality enhancement device, movable platform, camera and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112119408A (en) |
WO (1) | WO2021035629A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112862073A (en) * | 2021-02-03 | 2021-05-28 | 北京大学 | Compressed data analysis method and device, storage medium and terminal |
CN116458894A (en) * | 2023-04-21 | 2023-07-21 | 山东省人工智能研究院 | Electrocardiosignal enhancement and classification method based on composite generation countermeasure network |
WO2024104000A1 (en) * | 2022-11-17 | 2024-05-23 | 京东方科技集团股份有限公司 | Terminal image quality enhancement method and device, and computer readable storage medium |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113538230A (en) * | 2021-06-09 | 2021-10-22 | 安徽七天教育科技有限公司 | Image recovery system based on scanning test paper and use method |
CN114549334A (en) * | 2021-07-22 | 2022-05-27 | 中山大学 | Image rain removing method and system of multidimensional aggregation network based on rain line characteristic reconstruction |
CN113989152A (en) * | 2021-10-29 | 2022-01-28 | 北京百度网讯科技有限公司 | Image enhancement method, device, equipment and storage medium |
CN114827723B (en) * | 2022-04-25 | 2024-04-09 | 阿里巴巴(中国)有限公司 | Video processing method, device, electronic equipment and storage medium |
US20230368336A1 (en) * | 2022-05-12 | 2023-11-16 | Honeywell International Inc. | Using integrated system to enhance a reference image |
CN114972107B (en) * | 2022-06-14 | 2024-07-26 | 福州大学 | Low-illumination image enhancement method based on multi-scale stacked attention network |
CN116188346B (en) * | 2023-05-04 | 2023-07-11 | 安翰科技(武汉)股份有限公司 | Image quality enhancement method and device for endoscope image |
CN117152025B (en) * | 2023-10-30 | 2024-03-01 | 硕橙(厦门)科技有限公司 | Method, device and equipment for enhancing over-bright image based on composite scale features |
CN118172285B (en) * | 2024-05-14 | 2024-09-20 | 浙江芯劢微电子股份有限公司 | Image quality optimization method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108765319A (en) * | 2018-05-09 | 2018-11-06 | 大连理工大学 | A kind of image de-noising method based on generation confrontation network |
CN110060216A (en) * | 2019-04-17 | 2019-07-26 | 广东工业大学 | A kind of image repair method, device and equipment based on generation confrontation network |
CN110084757A (en) * | 2019-04-15 | 2019-08-02 | 南京信息工程大学 | A kind of infrared depth image enhancement method based on generation confrontation network |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018053340A1 (en) * | 2016-09-15 | 2018-03-22 | Twitter, Inc. | Super resolution using a generative adversarial network |
CN107481209B (en) * | 2017-08-21 | 2020-04-21 | 北京航空航天大学 | Image or video quality enhancement method based on convolutional neural network |
CN107590786A (en) * | 2017-09-08 | 2018-01-16 | 深圳市唯特视科技有限公司 | A kind of image enchancing method based on confrontation learning network |
CN110163809A (en) * | 2019-03-31 | 2019-08-23 | 东南大学 | Confrontation network DSA imaging method and device are generated based on U-net |
2019
- 2019-08-29 WO PCT/CN2019/103356 patent/WO2021035629A1/en active Application Filing
- 2019-08-29 CN CN201980031413.7A patent/CN112119408A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108765319A (en) * | 2018-05-09 | 2018-11-06 | 大连理工大学 | A kind of image de-noising method based on generation confrontation network |
CN110084757A (en) * | 2019-04-15 | 2019-08-02 | 南京信息工程大学 | A kind of infrared depth image enhancement method based on generation confrontation network |
CN110060216A (en) * | 2019-04-17 | 2019-07-26 | 广东工业大学 | A kind of image repair method, device and equipment based on generation confrontation network |
Non-Patent Citations (1)
Title |
---|
Zhang Yuanqi: "Color Image Denoising Method Based on Generative Adversarial Networks", China Master's Theses Full-text Database * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112862073A (en) * | 2021-02-03 | 2021-05-28 | 北京大学 | Compressed data analysis method and device, storage medium and terminal |
WO2024104000A1 (en) * | 2022-11-17 | 2024-05-23 | 京东方科技集团股份有限公司 | Terminal image quality enhancement method and device, and computer readable storage medium |
CN116458894A (en) * | 2023-04-21 | 2023-07-21 | 山东省人工智能研究院 | Electrocardiosignal enhancement and classification method based on composite generation countermeasure network |
CN116458894B (en) * | 2023-04-21 | 2024-01-26 | 山东省人工智能研究院 | Electrocardiosignal enhancement and classification method based on composite generation countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
WO2021035629A1 (en) | 2021-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112119408A (en) | Method for acquiring image quality enhancement network, image quality enhancement method, image quality enhancement device, movable platform, camera and storage medium | |
CN110473137B (en) | Image processing method and device | |
US11481613B2 (en) | Execution method, execution device, learning method, learning device, and recording medium for deep neural network | |
US9349072B2 (en) | Local feature based image compression | |
CN112396645B (en) | Monocular image depth estimation method and system based on convolution residual learning | |
CN109816615A (en) | Image processing method, device, equipment and storage medium | |
CN107871306B (en) | Method and device for denoising picture | |
US11983906B2 (en) | Systems and methods for image compression at multiple, different bitrates | |
US11062210B2 (en) | Method and apparatus for training a neural network used for denoising | |
CN110677651A (en) | Video compression method | |
CN112598579A (en) | Image super-resolution method and device for monitoring scene and storage medium | |
WO2016050729A1 (en) | Face inpainting using piece-wise affine warping and sparse coding | |
US20220414838A1 (en) | Image dehazing method and system based on cyclegan | |
CN110753225A (en) | Video compression method and device and terminal equipment | |
CN108734653A (en) | Image style conversion method and device | |
CN114402596A (en) | Neural network model compression | |
CN113450290A (en) | Low-illumination image enhancement method and system based on image inpainting technology | |
CN111698508B (en) | Super-resolution-based image compression method, device and storage medium | |
CN114978189A (en) | Data coding method and related equipment | |
Uddin et al. | A perceptually inspired new blind image denoising method using L1 and perceptual loss | |
CN113256744B (en) | Image coding and decoding method and system | |
CN114612316A (en) | Method and device for removing rain from nuclear prediction network image | |
CN113592009A (en) | Image semantic segmentation method and device, storage medium and electronic equipment | |
CN113382244B (en) | Coding and decoding network structure, image compression method, device and storage medium | |
CN111107377A (en) | Depth image compression method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20201222 |