CN112419219A - Image enhancement model training method, image enhancement method and related device


Info

Publication number
CN112419219A
Authority
CN
China
Prior art keywords
image
enhancement model
image enhancement
feature
features
Prior art date
Legal status
Pending
Application number
CN202011333906.7A
Other languages
Chinese (zh)
Inventor
孟冬伟
周济
罗先桂
张建
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN202011333906.7A
Publication of CN112419219A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image enhancement model training method, an image enhancement method and a related device. The image enhancement model training method comprises: acquiring a clear sample image and a degraded image corresponding to the clear sample image; performing feature extraction on the degraded image to obtain a first image feature of the degraded image; performing feature mining on the first image feature through a plurality of cascaded residual networks to obtain a second image feature; performing image enhancement on the degraded image based on the second image feature to obtain a predicted image; and training an image enhancement model by using the difference between the predicted image and the clear sample image. This scheme improves the image enhancement effect of the image enhancement model.

Description

Image enhancement model training method, image enhancement method and related device
Technical Field
The present application relates to the field of model training technologies, and in particular, to an image enhancement model training method, an image enhancement method, and a related apparatus.
Background
With the rapid development of internet technology, live streaming has become an increasingly common part of everyday life and provides a new form of entertainment. The live broadcast industry now places ever higher demands on high-frame-rate, high-resolution live pictures in order to improve the viewing experience of users. Currently, the frame rate of mainstream live video is typically 60 fps, and the resolution is 1080 x 1920.
However, in many specific live scenes, there are often uncontrollable factors or hidden risks in anchor behavior, in the devices themselves, and so on. For example, the network environment may be poor, so high-definition video cannot be transmitted in real time, or the anchor's device may be weak, so noise appears when the live picture is rendered. As a result, the quality of live video tends to be uneven.
In the industry, the block-matching-based BM3D denoising algorithm is commonly used to process live pictures, but it only performs denoising along a single dimension and cannot enhance details. The GRDB algorithm, based on a deep convolutional neural network and an attention mechanism, achieves an obvious denoising effect, but its running time is long, it cannot reach real-time performance at 60 frames, and it cannot enhance detail texture.
Disclosure of Invention
The application provides an image enhancement model training method, an image enhancement method and a related device.
The application provides an image enhancement model training method, which comprises: acquiring a clear sample image and a degraded image corresponding to the clear sample image; performing feature extraction on the degraded image to obtain a first image feature of the degraded image; performing feature mining on the first image feature through a plurality of cascaded residual networks to obtain a second image feature; performing image enhancement on the degraded image based on the second image feature to obtain a predicted image; and training an image enhancement model by using the difference between the predicted image and the clear sample image.
Wherein the step of performing feature mining on the first image feature through the plurality of cascaded residual networks to obtain the second image feature comprises: performing feature encoding on the first image feature through a plurality of cascaded first residual networks to obtain coded image features; and performing feature decoding on the coded image features through a plurality of cascaded second residual networks to obtain decoded image features matching the size of the degraded image, and determining the decoded image features as the second image feature.
Wherein the step of performing feature encoding on the first image feature through the plurality of cascaded first residual networks to obtain the coded image features comprises: down-sampling the first image feature with a convolutional layer of stride 2 in each of at least four sequentially cascaded first residual networks to obtain the coded image features; each first residual network further comprises 4 densely connected convolutional layers with a kernel size of 3x3 and one convolutional layer with a kernel size of 1x1.
Wherein the step of performing feature decoding on the coded image features through the plurality of cascaded second residual networks comprises: up-sampling the coded image features by pixel reconstruction through at least four sequentially cascaded second residual networks to obtain the decoded image features matching the size of the degraded image; each second residual network further comprises 4 densely connected deconvolution layers with a kernel size of 3x3 and one deconvolution layer with a kernel size of 1x1.
Wherein the step of performing image enhancement on the degraded image based on the second image feature to obtain the predicted image comprises: processing the second image feature through a convolutional layer with a kernel size of 3x3 to obtain high-frequency residual information; and superimposing the high-frequency residual information onto the degraded image to obtain the predicted image.
Wherein the step of training the image enhancement model by using the difference between the predicted image and the clear sample image comprises: obtaining a loss function of the image enhancement model from the difference between the predicted image and the clear sample image; and adjusting parameters of the image enhancement model through the loss function so as to train the image enhancement model.
Wherein the loss function comprises at least one of a reconstruction loss function, a perceptual loss function, and an adversarial loss function.
Wherein, when the loss function comprises a reconstruction loss function, the step of obtaining the loss function of the image enhancement model from the difference between the predicted image and the clear sample image comprises: calculating the Manhattan distance between the predicted image and the clear sample image; applying Gaussian blur with a Gaussian kernel of a set value to the clear sample image to obtain a blurred image of the clear sample image; subtracting the blurred image from the clear sample image to obtain a texture residual of the clear sample image; normalizing the texture residual to obtain a spatial weight for each position of the clear sample image; and multiplying the spatial weight by the Manhattan distance to obtain the reconstruction loss function of the image enhancement model.
Wherein the step of acquiring the clear sample image and the corresponding degraded image comprises: acquiring a clear sample image whose resolution exceeds a resolution threshold; and applying transcoding degradation and noise degradation to the clear sample image to obtain the degraded image.
The image enhancement model training method further comprises: judging, through the L1 norm, whether each convolution kernel in each convolutional layer of the image enhancement model is important; and if a convolution kernel is not important, removing the convolution kernel based on a pruning ratio and adjusting the model parameters of the image enhancement model accordingly.
The application also provides an image enhancement model training device, comprising: an acquisition module for acquiring a clear sample image and a degraded image corresponding to the clear sample image; a feature extraction module for performing feature extraction on the degraded image to obtain a first image feature of the degraded image; a mining module for performing feature mining on the first image feature through a plurality of cascaded residual networks to obtain a second image feature; an enhancement module for performing image enhancement on the degraded image based on the second image feature to obtain a predicted image; and a training module for training the image enhancement model by using the difference between the predicted image and the clear sample image.
The application also provides an image enhancement method, which comprises: acquiring an image to be enhanced and an image enhancement model, wherein the image enhancement model is trained by any one of the image enhancement model training methods above; performing feature extraction on the image to be enhanced with the image enhancement model to obtain a first image feature of the image to be enhanced; performing feature mining on the first image feature through a plurality of cascaded residual networks in the image enhancement model to obtain a second image feature of the image to be enhanced; and performing image enhancement on the image to be enhanced based on the second image feature through the image enhancement model to obtain an enhanced image.
The present application also provides an image enhancement apparatus, comprising: an acquisition module for acquiring an image to be enhanced and an image enhancement model, wherein the image enhancement model is trained by any one of the image enhancement model training methods above; a feature extraction module for performing feature extraction on the image to be enhanced with the image enhancement model to obtain a first image feature of the image to be enhanced; a mining module for performing feature mining on the first image feature through a plurality of cascaded residual networks in the image enhancement model to obtain a second image feature of the image to be enhanced; and an enhancement module for performing image enhancement on the image to be enhanced based on the second image feature through the image enhancement model to obtain an enhanced image.
The present application further provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the image enhancement model training method or the image enhancement method.
The present application also provides a computer readable storage medium having stored thereon program instructions that, when executed by a processor, implement the image enhancement model training method or the image enhancement method described above.
According to the above scheme, feature extraction is first performed on the degraded image to obtain the first image feature of the degraded image; feature mining is then performed on the first image feature through a plurality of cascaded residual networks to obtain the second image feature; image enhancement is performed on the degraded image based on the second image feature to obtain a predicted image; and the image enhancement model is trained by using the difference between the predicted image and the clear sample image. Because the image enhancement model is based on cascaded residual networks, feature mining through these networks improves the semantic expression capability of the first image feature, yielding the second image feature. Training the model on the difference between the predicted image obtained from the second image feature and the clear sample image thus substantially improves the enhancement effect of the image enhancement model and brings its output close to the clear sample image.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an image enhancement model training method according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of the image enhancement model training method of the present application;
FIG. 3 is a schematic diagram illustrating a flow of obtaining a reconstruction loss function according to the embodiment of FIG. 2;
FIG. 4 is a schematic diagram of an embodiment of the image enhancement model of the embodiment of FIG. 2;
FIG. 5 is a schematic diagram of a partial structure of an embodiment of the feature encoding subnetwork of the embodiment of FIG. 2;
FIG. 6 is a schematic diagram of a partial structure of an embodiment of a feature decoding subnetwork of the embodiment of FIG. 2;
FIG. 7 is a schematic flowchart of an embodiment of an image enhancement method of the present application;
FIG. 8 is a block diagram of an embodiment of an image enhancement model training apparatus according to the present application;
FIG. 9 is a block diagram of an embodiment of an image enhancement apparatus according to the present application;
FIG. 10 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 11 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, and there may be three relationships, e.g., a and/or B, and: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in this document, the character "/", generally, the former and latter related objects are in an "or" relationship. Further, herein, "more" than two or more than two.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of an image enhancement model training method according to the present application. Specifically, the method may include the steps of:
step S11: and acquiring a clear sample image and a degraded image corresponding to the clear sample image.
Before training the image enhancement model, a training data set is prepared.
In a specific application scene, a GPU video card or a high-definition camera with super-high rendering capability may be used to record a lossless super-high-definition sample video, and in order to remove redundancy in sample content, a plurality of frames of images in the sample video may be extracted according to a preset frequency, so as to obtain a clear sample image. The preset frequency may be 100 frames/decimation, 200 frames/decimation, etc., and the specific preset frequency may be determined based on the actual situation, which is not limited herein.
The sample video may be a video such as a live game original painting, a dance screen recording video, an outdoor activity green screen video, and the like, and is not limited herein.
And after the clear sample image is obtained, performing degradation processing on the clear sample image to obtain a degraded image corresponding to the clear sample image. The degradation processing of the clear sample images can simulate the degradation phenomenon of various types of sample videos in practical application to carry out degradation processing, so that the training data of the training data set can be strengthened based on the specific sample types, and the image enhancement effect is improved.
Step S12: and performing feature extraction on the degraded image to obtain a first image feature of the degraded image.
And performing feature extraction on each degraded image through an image enhancement model to obtain first image features corresponding to each degraded image.
In a specific application scenario, feature extraction may be performed on the degraded image through the convolutional layer, so as to obtain a first image feature of the degraded image. In a specific application scenario, convolution processing may be performed based on three channels (RGB mode) of the degraded image to obtain multi-channel features of the degraded image, so as to obtain first image features of each degraded image.
In a specific application scenario, feature extraction may also be performed on the degraded image through a tree model, L1, L2 penalty values, or recursive feature elimination method, so as to obtain a first image feature of the degraded image. And are not limited herein.
Step S13: and performing feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features.
And performing feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features. And carrying out feature mining on the first image features through a plurality of cascaded residual error networks to enrich the image features of the degraded image and obtain second image features.
In a specific application scenario, the first image feature can be subjected to convolution processing for multiple times, so that feature mining of the first image feature is achieved, and semantic expression capability of the first image feature is improved.
Step S14: and performing image enhancement on the degraded image based on the second image characteristics to obtain a predicted image.
After the second image characteristic is obtained, the degraded image is subjected to image enhancement processing based on the second image characteristic, and an output of an image enhancement model, namely a prediction image is obtained.
Step S15: and training an image enhancement model by using the difference between the predicted image and the clear sample image.
And after the predicted image is obtained, comparing the predicted image with the clear sample image to obtain the difference between the predicted image and the clear sample image. The image enhancement model is trained based on the difference between the predicted image and the clear sample image. In a specific application scenario, a difference threshold may be set, and it is determined whether the difference between the predicted image and the clear sample image is smaller than the difference threshold, and when the difference between the predicted image and the clear sample image is not smaller than the difference threshold, the relevant parameters of the image enhancement training model are modified based on the difference between the predicted image and the clear sample image. And when the difference between the predicted image and the clear sample image is smaller than the difference threshold value, finishing the training of the image enhancement training model.
Through the above manner, the image enhancement model training method of the embodiment obtains the first image feature of the degraded image by performing feature extraction on the degraded image, then performs feature mining on the first image feature through a plurality of cascaded residual error networks to obtain the second image feature, performs image enhancement on the degraded image based on the second image feature to obtain the predicted image, and then trains the image enhancement model by using the difference between the predicted image and the clear sample image. According to the image enhancement model based on the cascade residual error network, the first image features can be subjected to feature mining through the cascade residual error networks, so that the semantic expression capacity of the first image features is improved, the second image features are obtained, finally, the image enhancement model is trained by utilizing the difference between the predicted image obtained through the second image features and the clear sample image, the enhancement effect of the image enhancement model can be fully improved, and the output of the image enhancement model can be close to the clear sample image to a certain extent.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another embodiment of the image enhancement model training method of the present application. In this embodiment, original live game footage is used as the training samples, so the model is trained for the live-game direction. In other embodiments, other directions, such as dance video or outdoor footage, may be used for training, or several directions may be trained simultaneously; the subsequent degradation processing would then be based on the actual degradation phenomena of those directions, which is not limited here.
Step S21: acquiring a clear sample image whose resolution exceeds a resolution threshold, and applying transcoding degradation and noise degradation to the clear sample image to obtain a degraded image.
In live game streaming, the quality degradation of live video takes many forms. First, the rendering capability of each anchor's game engine is limited, which causes noise and locally blurred details in the original live picture, i.e., the game original picture, during rendering. Second, some anchors have poor network environments and cannot transmit high-definition video in real time, so the video is over-compressed before transmission, producing severe compression noise.
Therefore, in this step, lossless ultra-high-definition game footage can be recorded with a GPU graphics card of ultra-high rendering capability to obtain a sample video, and frames are extracted from it to obtain clear sample images. In a specific application scenario, the resolution of the clear sample images needs to exceed a resolution threshold to ensure the training effect of the final model; the threshold may be set based on the actual application and is not limited here.
In a specific application scene, lossless ultra-high-definition game footage covering various popular games can be recorded for 10 hours with a GPU graphics card of ultra-high rendering capability. To remove redundancy in the picture content, one frame is extracted for every 100 frames of video, giving 21600 frames in total.
Transcoding degradation and noise degradation are then applied to the clear sample images to obtain degraded images; because this embodiment trains for the live-game direction, the transcoding and noise degradation need to simulate the degradation of actual live game streams. In a specific application scene, several bitrates, transcoding modes, and quantization parameters can be used to transcode the clear sample images, after which JPEG compression, Gaussian noise, Poisson noise, and the like are randomly added to simulate the various degradation phenomena a live stream undergoes during game rendering and transmission transcoding. In one configuration, 5 bitrates (1M, 2M, 4M, 6M, and 8M), 2 transcoding modes (h264 and h265), and 10 quantization parameters (QP values: 10, 15, 24, 28, 32, 36, 38, 40, 44, 48), i.e., 100 transcoding configurations in total, are used to transcode the clear sample images into low-quality data, and JPEG compression, Gaussian noise, and Poisson noise are then randomly added. Through these three processing steps, 2160000 degraded images can be generated from the 21600 frames of clear sample images. The samples contain multi-dimensional degradation effects such as transcoding degradation, noise degradation, and compression degradation, and one frame of clear sample image can correspond to several frames of degraded images with different degradation phenomena, so as to simulate the picture degradation of actual live game streaming as closely as possible and improve the robustness of the image enhancement model.
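As an illustration, the noise-degradation stage can be sketched in Python with OpenCV and NumPy as below. This is a minimal sketch rather than the patent's implementation: the probabilities and noise/quality ranges are assumptions, and the transcoding stage (h264/h265 at the listed bitrates and QP values) would normally be performed beforehand with an external transcoder.

```python
# Illustrative sketch of the noise-degradation step (JPEG compression,
# Gaussian noise, Poisson noise). All parameter ranges are assumptions.
import cv2
import numpy as np

def degrade(clear_bgr: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    img = clear_bgr.astype(np.float32)

    # Randomly add Gaussian noise with a random standard deviation.
    if rng.random() < 0.5:
        img += rng.normal(0.0, rng.uniform(1.0, 10.0), img.shape)

    # Randomly add Poisson (shot) noise.
    if rng.random() < 0.5:
        scale = rng.uniform(0.5, 2.0)
        img = rng.poisson(np.clip(img, 0, None) * scale) / scale

    img = np.clip(img, 0, 255).astype(np.uint8)

    # Randomly re-compress as JPEG at a random quality level.
    if rng.random() < 0.5:
        quality = int(rng.integers(30, 90))
        ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
        if ok:
            img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return img
```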
Step S22: and performing feature extraction on the degraded image to obtain a first image feature of the degraded image.
And performing feature extraction on the degraded image through a feature extraction sub-network of the image enhancement model to obtain a first image feature of the degraded image.
In a specific application scene, feature extraction can be performed on the degraded image through the two-dimensional convolution layer, so that multi-channel first image features of the degraded image are obtained. In a specific application scenario, three image channels (RGB mode) can be mapped to 64 channels by two-dimensional 5x5 convolution, so as to obtain a first image feature with 64 channel feature. In other embodiments, the first image features of other numbers of channels may also be obtained, which is not limited herein.
Step S23: and carrying out feature coding on the first image features through a plurality of cascaded first residual error networks to obtain coded image features.
And performing feature coding on the first image feature through a feature coding sub-network of the image enhancement model to obtain a coded image feature. In particular, the feature coding sub-network of the image enhancement model may comprise a plurality of cascaded first residual networks. In a specific application scenario, the feature coding sub-network of the image enhancement model may include four cascaded first residual networks.
In a specific application scenario, the first image feature may be downsampled by using a convolutional layer with a step size of 2 through four sequentially cascaded first residual error networks, so as to obtain a coded image feature. In a specific application scenario, 2-time down-sampling can be performed through four sequentially cascaded first residual error networks by using a convolution with a step length of 2 x3, and the number of channels is increased by 2 times while each time of down-sampling is performed, so that the semantic expression capability of the first image feature is improved. Each first residual network further comprises 4 densely-connected convolutional layers with the core size of 3x3 and one convolutional layer with the core size of 1x 1. In the downsampling process, although the semantic expression capability of the features is improved, the size of the features of the coded image is changed to be matched with the display area, so that the features are conveniently mined.
In a specific application scenario, when a 64-channel first image feature is obtained, the first image feature is input into a 3x3 convolutional layer with a step size of 2 in a first residual network for convolution processing, then sequentially input into 4 convolutional layers with a kernel size of 3x3 and a convolutional layer with a kernel size of 1x1 in the first residual network for convolution processing, the output result of the convolutional layer with a kernel size of 1x1 is input into a 3x3 convolutional layer with a step size of 2 in a second first residual network for convolution processing, then sequentially input into 4 convolutional layers with a kernel size of 3x3 and a convolutional layer with a kernel size of 1x1 in the second first residual network for convolution processing, the output result of the convolutional layer with a second kernel size of 1x1 is input into a 3x3 convolutional layer with a step size of 2 in a third first residual network for convolution processing, and then sequentially input into a 4 convolutional layers with a kernel size of 3x 362 and a kernel size of 3 in the third first residual network for convolution processing And performing convolution processing on the convolution layer of 1x1, inputting the output result of the convolution layer of which the third kernel size is 1x1 into a 3x3 convolution layer of which the step size is 2 in a fourth first residual network for convolution processing, and sequentially inputting the output result into 4 convolution layers of which the kernel sizes are 3x3 and one convolution layer of which the kernel size is 1x1 in the fourth first residual network for convolution processing to finally obtain the coded image characteristics. Wherein, the number of channels of the coded image features is 644Therefore, the semantic expression capability of the coded image features is improved. In other embodiments, when there are multiple residual error networks, the processing method is similar to the above application scenario, and is not described herein again.
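For illustration, one such first residual network can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the patent does not give the growth rate of the densely connected layers, the activation function, or the exact wiring of the local residual connection, so the choices here (growth of 32, ReLU, residual from the down-sampled features) are assumptions.

```python
import torch
import torch.nn as nn

class EncoderRDB(nn.Module):
    """One 'first residual network': a stride-2 3x3 down-sampling convolution,
    4 densely connected 3x3 convolutions, and a 1x1 fusion convolution.
    Growth rate and local residual wiring are assumptions."""
    def __init__(self, in_ch: int, growth: int = 32):
        super().__init__()
        out_ch = in_ch * 2                       # channels double per stage
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.dense = nn.ModuleList([
            nn.Conv2d(out_ch + i * growth, growth, 3, padding=1)
            for i in range(4)
        ])
        self.fuse = nn.Conv2d(out_ch + 4 * growth, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.down(x))
        feats = [x]
        for conv in self.dense:                  # dense connectivity
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))   # local residual

# Encoder: four cascaded stages, 64 -> 128 -> 256 -> 512 -> 1024 channels.
encoder = nn.Sequential(*[EncoderRDB(64 * 2 ** i) for i in range(4)])
```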
Step S24: and performing feature decoding on the coded image features through a plurality of cascaded second residual error networks to obtain decoded image features matched with the size of the degraded image, and determining the decoded image features as second image features.
And performing feature decoding on the coded image features through a feature decoding sub-network of the image enhancement model to obtain decoded image features matched with the size of the degraded image, and determining the decoded image features as second image features. In particular, the feature coding sub-network of the image enhancement model may comprise a plurality of cascaded second residual networks. In a specific application scenario, the feature coding sub-network of the image enhancement model may include four cascaded second residual networks.
In a specific application scene, pixel reconstruction can be used for performing 2-time upsampling on first image features through at least four sequentially cascaded second residual error networks, and half of the number of channels is reduced after upsampling to obtain decoded image features matched with the size of a degraded image; each second residual network further comprises 4 densely-connected deconvolution layers with the core size of 3x3 and one deconvolution layer with the core size of 1x 1.
Specifically, when 64 is obtained4After the coded image of the channel is characterized, the coded image is input into a first second residual error network to be subjected to 2-time upsampling through pixel recombination (PixelShuffle), then sequentially input into 4 deconvolution layers with a kernel size of 3x3 and a deconvolution layer with a kernel size of 1x1 in the first second residual error network to be subjected to deconvolution processing, the output result of the deconvolution layer with the first kernel size of 1x1 is input into a second residual error network to be subjected to 2-time upsampling through pixel recombination, then sequentially input into 4 deconvolution layers with a kernel size of 3x3 and a deconvolution layer with a kernel size of 1x1 in the second residual error network to be subjected to deconvolution processing, the output result of the convolution deconvolution layer with a kernel size of 1x1 is input into a third second residual error network to be subjected to 2-time upsampling through pixel recombination, and then sequentially input into a deconvolution layer with a kernel size of 3x3 and a deconvolution layer with a kernel size of 1 in the third residual error network to be subjected to 2-time upsampling processing To arrange the thirdThe output result of the deconvolution layer with the kernel size of 1x1 is input into a fourth second residual network to be subjected to 2 times of upsampling through pixel recombination, and then is sequentially input into 4 deconvolution layers with the kernel size of 3x3 and one deconvolution layer with the kernel size of 1x1 in the fourth residual network to be subjected to deconvolution processing, and finally the decoded image characteristic is obtained. The number of channels of the decoded image features is 64, so that the size of the decoded image features is adapted to the degraded image, and the decoded image features are determined as second image features, namely depth convolution features of the degraded image. In other embodiments, when there are multiple residual error networks, the processing method is similar to the above application scenario, and is not described herein again.
Common upsampling methods include bilinear interpolation, transposed convolution, upsampling (unsampling), and pooling (unpoiuting), and this step is not limited to the specific method of upsampling.
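A matching sketch of one second residual network, mirroring the encoder sketch above, is given below. The 1x1 channel-expansion convolution before PixelShuffle is an assumption introduced so that each stage halves the channel count as described; the patent speaks of deconvolution layers, whereas plain convolutions after PixelShuffle are used here as a common equivalent.

```python
import torch
import torch.nn as nn

class DecoderRDB(nn.Module):
    """One 'second residual network': 2x up-sampling via pixel reconstruction
    (PixelShuffle) followed by 4 densely connected 3x3 layers and a 1x1
    fusion layer. Growth rate and residual wiring are assumptions."""
    def __init__(self, in_ch: int, growth: int = 32):
        super().__init__()
        out_ch = in_ch // 2                       # channels halve per stage
        # Assumed 1x1 expansion so that PixelShuffle(2), which trades 4
        # channels for a 2x2 spatial block, lands exactly on out_ch channels.
        self.expand = nn.Conv2d(in_ch, out_ch * 4, 1)
        self.shuffle = nn.PixelShuffle(2)
        self.dense = nn.ModuleList([
            nn.Conv2d(out_ch + i * growth, growth, 3, padding=1)
            for i in range(4)
        ])
        self.fuse = nn.Conv2d(out_ch + 4 * growth, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.shuffle(self.expand(x))          # 2x spatial, half channels
        feats = [x]
        for conv in self.dense:                   # dense connectivity
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))

# Decoder: four cascaded stages, 1024 -> 512 -> 256 -> 128 -> 64 channels.
decoder = nn.Sequential(*[DecoderRDB(1024 // 2 ** i) for i in range(4)])
```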
Step S25: and processing the second image characteristic through a convolution layer with the kernel size of 3x3 to obtain high-frequency residual error information, and overlapping the high-frequency residual error information and the degraded image to obtain a predicted image.
And processing the second image characteristic through an image reconstruction sub-network of the image enhancement model to realize image enhancement of the degraded image.
In a specific application scenario, the second image feature is processed by a convolution layer with a kernel size of 3 × 3, so as to obtain high-frequency residual information. Since the second image feature is resized to fit the degraded image during feature decoding, the high frequency residual information of this step also fits the degraded image. And superposing the high-frequency residual error information and the degraded image to obtain a predicted image. The predicted image is the output of the image enhancement model in the training process.
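As a minimal sketch (in PyTorch, assuming 64-channel decoded features and an RGB output), the image reconstruction sub-network can be written as:

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """A 3x3 convolution predicts the high-frequency residual, which is
    superimposed onto the degraded input to give the predicted image."""
    def __init__(self, feat_ch: int = 64):
        super().__init__()
        self.to_residual = nn.Conv2d(feat_ch, 3, 3, padding=1)

    def forward(self, feats: torch.Tensor, degraded: torch.Tensor) -> torch.Tensor:
        residual = self.to_residual(feats)   # high-frequency residual information
        return degraded + residual           # predicted (enhanced) image
```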
Step S26: and obtaining a loss function of the image enhancement model through the difference between the predicted image and the clear sample image, and performing parameter adjustment on the image enhancement model through the loss function so as to train the image enhancement model.
And obtaining a loss function of the image enhancement model through the difference between the predicted image and the clear sample image, and performing parameter adjustment on the image enhancement model through the loss function so as to train the image enhancement model.
In a specific application scenario, the loss function includes at least one of a reconstruction loss function, a perceptual loss function, and a countering loss function. For example: the loss function may include a reconstruction loss function and a perceptual loss function, or the loss function may include a reconstruction loss function, a perceptual loss function, and an antagonistic loss function, which are not limited herein.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a flow of acquiring a reconstruction loss function according to the embodiment of fig. 2.
Step S31: and calculating to obtain the Manhattan distance between the predicted image and the clear sample image based on the predicted image and the clear sample image.
The present embodiment uses manhattan distance as the basis reconstruction loss function, and performs spatial content sub-adaptive weighting on the basis reconstruction loss function. The manhattan distance (L1 distance) between the predicted image and the clear sample image is first calculated.
Step S32: and carrying out Gaussian blur with a Gaussian kernel as a set value on the clear sample image to obtain a blurred image of the clear sample image.
And carrying out Gaussian blur with a Gaussian kernel as a set value on the clear sample image to obtain a blurred image of the clear sample after the Gaussian blur. In a specific application scenario, gaussian blurring with a gaussian kernel of 5 may be performed on a clear sample image to obtain a blurred image corresponding to the clear sample image.
Step S33: and performing subtraction on the clear sample image and the blurred image to obtain a texture residual error of the clear sample image.
And performing subtraction on the clear sample image and the blurred image to obtain a texture residual error of the clear sample image. The texture residual reflects the position of the sharp sample image where the texture difference is large.
Step S34: and carrying out normalization processing on the texture residual error to obtain the spatial weight of each position of the clear sample image.
And carrying out normalization processing on the texture residual error to obtain the spatial weight of each position of the clear sample image.
Step S35: and multiplying the space weight by the Manhattan distance to obtain a reconstruction loss function of the image enhancement model.
And multiplying the spatial weight of each position of the clear sample image obtained in the previous step by the Manhattan distance between the prediction image obtained in the step S31 and the clear sample image, and adjusting the Manhattan distance to obtain a reconstruction loss function of the image enhancement model. Therefore, the content perception weight reconstruction loss function can well restore the high-frequency area of the degraded image, and the output degraded image is clearer.
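Putting steps S31-S35 together, the content-aware weighted reconstruction loss can be sketched as below. The kernel size of 5 matches the example above; the exact normalization of the texture residual is not specified in the text, so the sum-normalization (with a small epsilon and a rescaling that keeps the loss magnitude comparable to plain L1) is an assumption.

```python
import torch
import torchvision.transforms.functional as TF

def weighted_reconstruction_loss(pred: torch.Tensor, clear: torch.Tensor) -> torch.Tensor:
    """Content-aware weighted L1 (Manhattan) loss; expects NCHW tensors."""
    l1 = (pred - clear).abs()                          # S31: per-pixel Manhattan distance
    blurred = TF.gaussian_blur(clear, kernel_size=5)   # S32: Gaussian blur of the label
    texture = (clear - blurred).abs()                  # S33: texture residual
    # S34: normalize the residual into a spatial weight per position (assumed form).
    w = texture / (texture.sum(dim=(2, 3), keepdim=True) + 1e-8)
    w = w * (texture.shape[2] * texture.shape[3])      # keep overall loss scale comparable
    return (w * l1).mean()                             # S35: weight times distance
```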
The perceptual loss function is obtained as follows: first, multi-layer convolutional features of the predicted image and the clear sample image are extracted with a VGG deep convolutional network; these features contain both low-level texture features and high-level semantic features. The cosine distances between the corresponding multi-layer feature maps are then calculated, and finally the cosine distances of the different layers are weighted and summed to give the perceptual loss value.
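A sketch of such a perceptual loss is shown below, assuming a fixed pretrained VGG19 feature extractor; the particular layer indices and layer weights are assumptions, as the text only says that low-level texture and high-level semantic layers are both used.

```python
import torch
import torch.nn.functional as F
import torchvision

class PerceptualLoss(torch.nn.Module):
    """Cosine-distance perceptual loss over multi-layer VGG features.
    Layer indices and weights below are illustrative assumptions."""
    def __init__(self, layers=(3, 8, 17, 26), weights=(1.0, 1.0, 1.0, 1.0)):
        super().__init__()
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_weights = dict(zip(layers, weights))

    def forward(self, pred: torch.Tensor, clear: torch.Tensor) -> torch.Tensor:
        loss = pred.new_zeros(())
        x, y = pred, clear
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_weights:
                # Channel-wise cosine similarity at each spatial position.
                cos = F.cosine_similarity(x, y, dim=1)
                loss = loss + self.layer_weights[i] * (1.0 - cos).mean()
        return loss
```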
The adversarial loss function is obtained with a generative adversarial network: a discriminator network, learned jointly with the enhancement model, produces the loss value.
Based on these three loss functions, the output of the image enhancement model is fed back, so that the relevant model parameters can be modified based on the clear sample image and the training of the image enhancement model completed.
In a specific application scenario, network pruning can be applied to the image enhancement model during training in order to improve its running speed. Pruning is a deep learning technique aimed at producing a smaller, more efficient neural network; it is a model optimization technique that removes redundant values from the weight tensors. The pruned image enhancement model runs faster, and its computational cost is reduced. Specifically, the L1 norm is used to judge whether each convolution kernel in each convolutional layer of the image enhancement model is important; if a convolution kernel is not important, it is removed based on the pruning ratio, and the model parameters of the image enhancement model are adjusted.
In a specific application scenario, in each convolutional layer of the image enhancement model, whether a convolution kernel is important is first judged by its L1 norm. A pruning ratio is set, the unimportant convolution kernels in the layer are removed directly, and fine-tuning training is performed. To determine the pruning ratio, each convolutional layer is assumed to be independent and is pruned at different ratios; the model's PSNR (peak signal-to-noise ratio) and VMAF (a video quality metric) on the validation set are evaluated, a sensitivity analysis is performed, and a reasonable pruning ratio is then chosen. When pruning, if a convolution kernel in a convolutional layer is removed, the corresponding channel of that layer's output feature map disappears, so the parameters of the subsequent normalization layer and of the corresponding channel of the next convolutional layer are adjusted accordingly to remove the convolution kernel completely and realize network pruning.
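The filter-importance test and removal can be sketched as follows. Rebuilding the convolution as a new, narrower layer and returning the kept indices, so that the following normalization layer and the next convolution's input channels can be adjusted as described above, is one common way to realize this; the details are assumptions.

```python
import torch
import torch.nn as nn

def prune_by_l1(conv: nn.Conv2d, ratio: float):
    """Rank output filters of `conv` by L1 norm and drop the lowest `ratio`.
    Returns the pruned layer plus the indices of the kept filters, which the
    caller uses to slice the following normalization layer and the next
    convolution's input channels."""
    w = conv.weight.data                              # shape (out, in, kH, kW)
    l1 = w.abs().sum(dim=(1, 2, 3))                   # per-filter importance
    n_keep = max(1, int(w.shape[0] * (1.0 - ratio)))
    keep = torch.argsort(l1, descending=True)[:n_keep].sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned, keep
```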
In a specific application scenario, after training of the image enhancement model is finished and before it is applied, quantization acceleration can be performed for deployment on the target GPU hardware. Specifically, the image enhancement model can be trained at FP32 precision and deployed with low-precision INT8 data. Reducing FP32 precision to INT8 precision amounts to re-encoding the information: a tensor value originally represented with 32 bits is represented with 8 bits.
In a specific application scenario, a linear mapping (linear quantization) can be used that maps the range [-|T|, |T|] of the training data onto [-127, 127], while values beyond the threshold |T| are mapped directly to the boundary values; such a mapping is called saturation (saturate). By counting the distribution of activation values in the output of each convolutional layer of the image enhancement model, a suitable threshold can be selected so that the sparse, widely scattered large activation values map to 127 without losing too much precision. Specifically, the value of T can be calculated from the KL divergence (also called KL distance): when an activation value has magnitude greater than |T|, it saturates to -127 if negative and 127 if positive; when its magnitude is less than |T|, it is mapped into [-127, 127] by the linear rule. This completes the quantization acceleration for deployment, improves the running speed of the image enhancement model, noticeably improves the model's inference capability while the PSNR and VMAF indices drop only slightly, and greatly reduces the cost of large-scale deployment.
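The saturating linear mapping itself is simple; a sketch (in NumPy, with the per-layer threshold T assumed to have been chosen already via the KL-divergence calibration described above) is:

```python
import numpy as np

def quantize_int8(activations: np.ndarray, T: float) -> np.ndarray:
    """Map values in [-T, T] linearly onto [-127, 127]; saturate beyond."""
    scale = 127.0 / T
    q = np.round(activations * scale)                 # linear mapping
    return np.clip(q, -127, 127).astype(np.int8)      # saturation at +/-127
```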
In this way, degraded images simulating the actual degradation phenomena of live game streaming are prepared, so the image enhancement model can be trained for the live-game direction, improving its pertinence and its image enhancement effect. Specifically, the training method performs feature extraction, feature encoding, feature decoding, and image reconstruction on the degraded image in turn, with feature encoding and feature decoding performing feature mining through a plurality of cascaded residual networks; the residual networks thereby enrich the semantic expression capability of the image features, and image enhancement based on these enriched features directly improves the enhancement effect of the model. The image enhancement model is optimized through the three loss functions so as to preserve the spatial structure of the degraded picture, improve the visual quality of the predicted image, and enrich textures using the label image, guaranteeing the training effect. In addition, improved live picture quality gives subsequent transcoding services more headroom in bitrate, saving space and further reducing bandwidth cost.
Referring to fig. 4-6, fig. 4 is a schematic structural diagram of an embodiment of the image enhancement model of the embodiment of fig. 2. Fig. 5 is a schematic partial structure diagram of an embodiment of the feature encoding subnetwork of fig. 2, and fig. 6 is a schematic partial structure diagram of an embodiment of the feature decoding subnetwork of fig. 2.
The image enhancement model 40 of the present embodiment includes a feature extraction subnetwork 41, a feature encoding subnetwork 42, a feature decoding subnetwork 43, an image reconstruction subnetwork 44, and a loss function subnetwork 45. The image enhancement model 40 of the present embodiment may be trained by using a U-net network (image semantic segmentation network) as a basic network, and in other embodiments, may also be trained by using other image enhancement networks as the basic network.
The degraded image is input into a feature extraction sub-network 41 to be subjected to feature extraction to obtain a first image feature, the first image feature is input into a feature coding sub-network 42 to be subjected to feature coding to obtain a coded image feature, the coded image feature is input into a feature decoding sub-network 43 to be subjected to feature decoding to obtain a decoded image feature, namely a second image feature, and the second image feature is input into an image reconstruction sub-network 44 to be subjected to image reconstruction to obtain a predicted image. The training of the image enhancement model 40 is completed by performing a feedback process based on the predicted image by the loss function subnetwork 45 to modify the relevant parameters of the image enhancement model 40.
The feature extraction sub-network 41 comprises a two-dimensional 5x5 convolutional layer for performing feature extraction on the degraded image.
The feature encoding sub-network 42 comprises a plurality of first residual networks 421, where each first residual network 421 comprises: a 3x3 convolution with stride 2 (RDB), a convolutional layer conv1 with kernel size 3x3, a convolutional layer conv2 with kernel size 3x3, a convolutional layer conv3 with kernel size 3x3, a convolutional layer conv4 with kernel size 3x3, and a convolution conv5 with kernel size 1x1. The first image feature is fed into the stride-2 3x3 convolution (RDB) for processing, then passed sequentially through each convolutional layer for feature encoding, finally yielding the coded image features.
The feature decoding sub-network 43 comprises a plurality of second residual networks 431, where each second residual network 431 comprises: a pixel reconstruction (PixelShuffle) sub-network 4311, a deconvolution layer 4312 with kernel size 3x3, a deconvolution layer 4313 with kernel size 3x3, a deconvolution layer 4314 with kernel size 3x3, a deconvolution layer 4315 with kernel size 3x3, and a deconvolution layer 4316 with kernel size 1x1. The coded image features are fed into the pixel reconstruction sub-network 4311 for processing, then passed sequentially through each deconvolution layer for feature decoding, finally yielding the decoded image features. The feature encoding sub-network 42 and the feature decoding sub-network 43 are symmetric.
The image reconstruction sub-network 44 comprises a convolutional layer with kernel size 3x3 that processes the second image feature to produce the high-frequency residual information.
With the above structure, the image enhancement model of the present embodiment can quickly remove the degradation phenomenon on the degraded image, thereby realizing image enhancement of the degraded image.
Referring to fig. 7, fig. 7 is a flowchart illustrating an embodiment of an image enhancement method according to the present application.
Step S51: and acquiring an image to be enhanced and an image enhancement model.
And acquiring an image to be enhanced and an image enhancement model which need to be subjected to image enhancement. In a specific application scene, the image to be enhanced may be a live game video in a live scene, a live dance video, or other kinds of live videos or other images requiring image enhancement. And the image enhancement model may be an image enhancement model trained based on the species direction of the image to be enhanced. Specifically, the training method of the image enhancement model of the present embodiment is the same as the training method of the image enhancement model of the embodiment in fig. 1 or the embodiment in fig. 2, and please refer to the foregoing, which is not repeated herein.
In a specific application scene, when the image to be enhanced is a live game video in a live broadcast scene, the image enhancement model can be an image enhancement model obtained by training based on a live game original picture as a training sample, so that the adaptability between the image enhancement model and the image to be enhanced is improved.
In a specific application scenario, the image enhancement model may include a feature extraction sub-network, a feature coding sub-network, a feature decoding sub-network, and an image reconstruction sub-network, and the feature extraction sub-network may include two-dimensional 5x5 convolutional layers. The feature coding subnetwork may comprise a plurality of first residual networks, wherein the first residual networks may comprise: a 3x3 convolutional layer with a step size of 2, 4 convolutional layers with a core size of 3x3, and one convolutional layer with a core size of 1x 1. The feature decoding subnetwork may comprise a plurality of second residual networks, wherein the second residual networks may comprise: a pixel recombination subnetwork, 4 deconvolution layers with a kernel size of 3x3, and one deconvolution layer with a kernel size of 1x 1. The image reconstruction sub-network may include convolution layers with a kernel size of 3x 3.
Step S52: and performing feature extraction on the image to be enhanced by using the image enhancement model to obtain a first image feature of the image to be enhanced.
And performing feature extraction on the image to be enhanced through the image enhancement model to obtain first image features corresponding to the image to be enhanced.
In a specific application scenario, feature extraction can be performed on an image to be enhanced through a convolution layer of an image enhancement model, so that first image features of the image to be enhanced are obtained. In a specific application scenario, convolution processing may be performed based on three channels (RGB modes) of an image to be enhanced to obtain multi-channel features of the image to be enhanced, so as to obtain first image features of the image to be enhanced.
In a specific application scenario, a first image feature of an image to be enhanced may be obtained by feature extraction of a two-dimensional 5x5 convolution layer in a feature extraction sub-network of an image enhancement model.
Step S53: and performing feature mining on the first image features through a plurality of cascaded residual error networks in the image enhancement model to obtain second image features of the image to be enhanced.
And performing feature mining on the first image features through a plurality of cascaded residual error networks in the image enhancement model to obtain second image features. And carrying out feature mining on the first image features through a plurality of cascaded residual error networks to enrich the image features of the degraded image and obtain second image features.
In a specific application scenario, the first image feature can be subjected to convolution processing for multiple times, so that feature mining of the first image feature is achieved, and semantic expression capability of the first image feature is improved.
In a specific application scenario, the first image features may also be sequentially feature-coded through a plurality of first residual error networks of the feature coding sub-network of the image enhancement model, so as to obtain the coded image features of the image to be enhanced. And then, carrying out feature decoding on the coded image features through a feature decoding sub-network of the image enhancement model to obtain decoded image features, namely second image features of the image to be enhanced.
Step S54: and carrying out image enhancement on the image to be enhanced based on the second image characteristic through the image enhancement model to obtain an enhanced image.
After the second image characteristic is obtained, the image enhancement processing is carried out on the image to be enhanced through the image enhancement model based on the second image characteristic, and the output of the image enhancement model, namely the enhanced image, is obtained.
In a specific application scenario, the second image features may be processed through the image reconstruction sub-network of the image enhancement model to obtain high-frequency residual information. The high-frequency residual information is then superimposed on the image to be enhanced to obtain the enhanced image.
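As a usage illustration for the sketch model above, enhancing one frame might look as follows; the random tensor merely stands in for an image to be enhanced, and the final clamp assumes pixel values in [0, 1].

```python
model = ImageEnhancementModel().eval()
frame = torch.rand(1, 3, 256, 256)           # stand-in for an image to be enhanced
with torch.no_grad():
    enhanced = model(frame).clamp(0.0, 1.0)  # input plus high-frequency residual
print(enhanced.shape)                        # torch.Size([1, 3, 256, 256])
```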
By the above method, the image enhancement method of this embodiment sequentially performs feature extraction, feature coding, feature decoding, and image reconstruction on the image to be enhanced through the image enhancement model, where the feature coding and the feature decoding perform feature mining through a plurality of cascaded residual error networks. To a certain extent, the residual error networks can fully enrich the semantic expression capability of the image features, so that image enhancement is performed on the basis of enriched image features, which directly improves the image enhancement effect of the image enhancement model and the visual quality of the enhanced image.
Referring to fig. 8, fig. 8 is a schematic diagram of a framework of an embodiment of an image enhancement model training apparatus according to the present application. The image enhancement model training apparatus 80 includes an obtaining module 81, a feature extraction module 82, a mining module 83, an enhancement module 84, and a training module 85. The obtaining module 81 is configured to obtain a clear sample image and a degraded image corresponding to the clear sample image; the feature extraction module 82 is configured to perform feature extraction on the degraded image to obtain first image features of the degraded image; the mining module 83 is configured to perform feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features; the enhancement module 84 is configured to perform image enhancement on the degraded image based on the second image features to obtain a predicted image; and the training module 85 is configured to train the image enhancement model by using the difference between the predicted image and the clear sample image.
The obtaining module 81 is further configured to obtain the clear sample image, where the resolution of the clear sample image exceeds a resolution threshold, and to perform transcoding degradation and noise degradation on the clear sample image to obtain the degraded image.
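For instance, the degradation step might be approximated per image as in the sketch below, where JPEG re-encoding stands in for transcoding degradation of video frames; the quality setting, the noise level, and the use of PIL are illustrative assumptions, not the patented procedure.

```python
import io
import numpy as np
from PIL import Image

def make_degraded(clear: Image.Image, quality: int = 20,
                  noise_sigma: float = 5.0) -> Image.Image:
    """Build the degraded half of a (clear, degraded) training pair.
    Assumes an RGB input; quality/noise_sigma are illustrative values."""
    buf = io.BytesIO()
    clear.save(buf, format="JPEG", quality=quality)   # transcoding degradation
    buf.seek(0)
    degraded = np.asarray(Image.open(buf), dtype=np.float32)
    degraded += np.random.normal(0.0, noise_sigma, degraded.shape)  # noise degradation
    return Image.fromarray(np.clip(degraded, 0, 255).astype(np.uint8))
```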
The mining module 83 is further configured to perform feature coding on the first image features through a plurality of cascaded first residual error networks to obtain coded image features, and to perform feature decoding on the coded image features through a plurality of cascaded second residual error networks to obtain decoded image features matched with the size of the degraded image, the decoded image features being determined as the second image features.
The enhancement module 84 is further configured to process the second image features through a convolutional layer with a kernel size of 3x3 to obtain high-frequency residual information, and to superimpose the high-frequency residual information on the degraded image to obtain the predicted image.
The training module 85 is further configured to obtain a loss function of the image enhancement model according to the difference between the predicted image and the clear sample image, and to adjust the parameters of the image enhancement model through the loss function so as to train the image enhancement model.
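Taken together, one training update of the kind modules 81 to 85 describe might be sketched as follows, reusing the ImageEnhancementModel defined earlier. The plain L1 (Manhattan-distance) loss and the Adam optimizer with its learning rate are assumptions, since claim 7 below permits several loss functions.

```python
import torch
import torch.nn.functional as F

model = ImageEnhancementModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # rate is an assumption

def train_step(degraded, clear):
    """One update: predict, measure the difference to the clear sample
    image, and adjust the model parameters through the loss."""
    predicted = model(degraded)            # predicted image
    loss = F.l1_loss(predicted, clear)     # difference to the clear sample image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```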
According to the scheme, the image enhancement effect of the image enhancement model can be improved.
Referring to fig. 9, fig. 9 is a schematic diagram of a framework of an embodiment of an image enhancement apparatus according to the present application.
The image enhancement device 90 includes: an obtaining module 91 configured to obtain an image to be enhanced and an image enhancement model, where the image enhancement model is obtained by training through the image enhancement model training method of the embodiment of fig. 1 or the embodiment of fig. 2; a feature extraction module 92 configured to perform feature extraction on the image to be enhanced by using the image enhancement model to obtain first image features of the image to be enhanced; a mining module 93 configured to perform feature mining on the first image features through a plurality of cascaded residual error networks in the image enhancement model to obtain second image features of the image to be enhanced; and an enhancement module 94 configured to perform image enhancement on the image to be enhanced based on the second image features through the image enhancement model to obtain an enhanced image.
According to the scheme, the image quality of the image to be enhanced can be improved.
Referring to fig. 10, fig. 10 is a schematic diagram of a framework of an embodiment of an electronic device according to the present application. The electronic device 100 includes a memory 101 and a processor 102 coupled to each other, and the processor 102 is configured to execute program instructions stored in the memory 101 to implement the steps of any of the above image enhancement model training method embodiments or the steps of any of the above image enhancement method embodiments. In a specific implementation scenario, the electronic device 100 may include, but is not limited to, a microcomputer or a server; the electronic device 100 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 102 is configured to control itself and the memory 101 to implement the steps of any of the above image enhancement model training method embodiments. The processor 102 may also be referred to as a CPU (Central Processing Unit) and may be an integrated circuit chip with signal processing capability. The processor 102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 102 may be jointly implemented by a plurality of integrated circuit chips.
By the scheme, the image enhancement effect of the image enhancement model can be improved, and the image quality of the image to be enhanced can be improved.
Referring to fig. 11, fig. 11 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 110 stores program instructions 1101 capable of being executed by a processor, the program instructions 1101 being for implementing the steps of any of the above-described image enhancement model training method embodiments or the steps of any of the above-described image enhancement method embodiments.
By the scheme, the image enhancement effect of the image enhancement model can be improved, and the image quality of the image to be enhanced can be improved.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical function division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (15)

1. An image enhancement model training method, characterized in that the image enhancement model training method comprises:
acquiring a clear sample image and a degraded image corresponding to the clear sample image;
performing feature extraction on the degraded image to obtain a first image feature of the degraded image;
performing feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features;
performing image enhancement on the degraded image based on the second image characteristics to obtain a predicted image;
training the image enhancement model using the difference between the predicted image and the clear sample image.
2. The method of claim 1, wherein the step of performing feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features comprises:
performing feature coding on the first image features through a plurality of cascaded first residual error networks to obtain coded image features;
and performing feature decoding on the coded image features through a plurality of cascaded second residual error networks to obtain decoded image features matched with the size of the degraded image, and determining the decoded image features as the second image features.
3. The method according to claim 2, wherein the step of performing feature coding on the first image features through a plurality of cascaded first residual error networks to obtain the coded image features comprises:
performing down-sampling on the first image features by using convolution layers with a stride of 2 through at least four sequentially cascaded first residual error networks to obtain the coded image features;
wherein each of the first residual error networks further comprises 4 densely-connected convolutional layers with a kernel size of 3x3 and one convolutional layer with a kernel size of 1x1.
4. The method according to claim 2, wherein the step of feature-decoding the encoded image features through a plurality of cascaded second residual networks to obtain decoded image features matching the degraded image size and determining the decoded image features as the second image features comprises:
the first image features are up-sampled by using pixel reconstruction through at least four sequentially cascaded second residual error networks to obtain decoded image features matched with the size of the degraded image;
wherein each second residual network further comprises 4 densely-connected deconvolution layers with a core size of 3x3 and one deconvolution layer with a core size of 1x 1.
5. The method according to claim 1, wherein the step of performing image enhancement on the degraded image based on the second image feature to obtain a predicted image comprises:
processing the second image features through a convolution layer with a kernel size of 3x3 to obtain high-frequency residual information;
and superposing the high-frequency residual information and the degraded image to obtain the predicted image.
6. The method according to claim 1, wherein the step of training the image enhancement model using the difference between the predicted image and the clear sample image comprises:
obtaining a loss function of the image enhancement model according to the difference between the predicted image and the clear sample image;
and carrying out parameter adjustment on the image enhancement model through the loss function so as to train the image enhancement model.
7. The method of claim 6, wherein the loss function comprises at least one of a reconstruction loss function, a perceptual loss function, and an adversarial loss function.
8. The method according to claim 7, wherein when the loss function comprises a reconstruction loss function, the step of obtaining the loss function of the image enhancement model from the difference between the predicted image and the clear sample image comprises:
calculating the Manhattan distance between the predicted image and the clear sample image;
performing Gaussian blur with a preset Gaussian kernel on the clear sample image to obtain a blurred image of the clear sample image;
subtracting the blurred image from the clear sample image to obtain a texture residual of the clear sample image;
normalizing the texture residual to obtain a spatial weight for each position of the clear sample image;
and multiplying the spatial weight by the Manhattan distance to obtain the reconstruction loss function of the image enhancement model.
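Read as code, the reconstruction loss of claim 8 might be sketched as follows; the Gaussian kernel size and sigma, the absolute value taken on the texture residual, and the max-normalization of the spatial weights are assumptions about details the claim leaves open.

```python
import torch
import torchvision.transforms.functional as TF

def reconstruction_loss(predicted, clear, kernel_size=11, sigma=3.0):
    """Spatially weighted Manhattan-distance loss; parameter values are
    illustrative assumptions."""
    l1 = (predicted - clear).abs()                         # Manhattan distance map
    blurred = TF.gaussian_blur(clear, kernel_size, [sigma, sigma])
    texture = (clear - blurred).abs()                      # texture residual
    # One reading of the "normalization processing": scale to [0, 1].
    weight = texture / (texture.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    return (weight * l1).mean()                            # weighted reconstruction loss
```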
9. The image enhancement model training method according to claim 1, wherein the step of acquiring a clear sample image and a degraded image corresponding to the clear sample image comprises:
acquiring the clear sample image, wherein the resolution of the clear sample image exceeds a resolution threshold;
and carrying out transcoding degradation and noise degradation on the clear sample image to obtain the degraded image.
10. The image enhancement model training method of claim 1, further comprising:
judging, through an L1 norm, whether the convolution kernels in each convolution layer of the image enhancement model are important;
and if a convolution kernel is not important, removing the convolution kernel based on a pruning proportion and adjusting the model parameters of the image enhancement model.
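As a sketch of this pruning criterion, each convolution kernel can be ranked by the L1 norm of its weights and the lowest-ranked fraction marked for removal; physically deleting the kernels and re-adjusting the remaining parameters is omitted here, and the pruning ratio is an assumption.

```python
import torch
import torch.nn as nn

def unimportant_kernels(conv: nn.Conv2d, prune_ratio: float = 0.25):
    """Return indices of the output kernels with the smallest L1 norms."""
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 norm per kernel
    num_prune = int(conv.out_channels * prune_ratio)
    return torch.argsort(importance)[:num_prune].tolist()       # least important first
```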
11. An image enhancement model training apparatus, comprising:
the acquisition module is used for acquiring a clear sample image and a degraded image corresponding to the clear sample image;
the characteristic extraction module is used for extracting the characteristics of the degraded image to obtain first image characteristics of the degraded image;
the mining module is used for carrying out feature mining on the first image features through a plurality of cascaded residual error networks to obtain second image features;
the enhancement module is used for carrying out image enhancement on the degraded image based on the second image characteristics to obtain a predicted image;
and the training module is used for training the image enhancement model by utilizing the difference between the predicted image and the clear sample image.
12. An image enhancement method, characterized in that the image enhancement method comprises:
acquiring an image to be enhanced and an image enhancement model, wherein the image enhancement model is obtained by training according to the image enhancement model training method of any one of claims 1-10;
performing feature extraction on the image to be enhanced by using the image enhancement model to obtain a first image feature of the image to be enhanced;
performing feature mining on the first image features through a plurality of cascaded residual error networks in the image enhancement model to obtain second image features of the image to be enhanced;
and carrying out image enhancement on the image to be enhanced based on the second image characteristic through the image enhancement model to obtain an enhanced image.
13. An image enhancement apparatus, characterized in that the image enhancement apparatus comprises:
an obtaining module, configured to obtain an image to be enhanced and an image enhancement model, where the image enhancement model is obtained by training according to the image enhancement model training method according to any one of claims 1 to 10;
the characteristic extraction module is used for extracting the characteristics of the image to be enhanced by utilizing the image enhancement model to obtain first image characteristics of the image to be enhanced;
the mining module is used for carrying out feature mining on the first image features through a plurality of cascaded residual error networks in the image enhancement model to obtain second image features of the image to be enhanced;
and the enhancement module is used for carrying out image enhancement on the image to be enhanced based on the second image characteristic through the image enhancement model to obtain an enhanced image.
14. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the image enhancement model training method according to any one of claims 1 to 10 or the image enhancement method according to claim 12.
15. A computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the image enhancement model training method of any one of claims 1 to 10 or the image enhancement method of claim 12.
CN202011333906.7A 2020-11-25 2020-11-25 Image enhancement model training method, image enhancement method and related device Pending CN112419219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011333906.7A CN112419219A (en) 2020-11-25 2020-11-25 Image enhancement model training method, image enhancement method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011333906.7A CN112419219A (en) 2020-11-25 2020-11-25 Image enhancement model training method, image enhancement method and related device

Publications (1)

Publication Number Publication Date
CN112419219A true CN112419219A (en) 2021-02-26

Family

ID=74842027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011333906.7A Pending CN112419219A (en) 2020-11-25 2020-11-25 Image enhancement model training method, image enhancement method and related device

Country Status (1)

Country Link
CN (1) CN112419219A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139972A (en) * 2021-03-22 2021-07-20 杭州电子科技大学 Cerebral apoplexy MRI image focus region segmentation method based on artificial intelligence
CN113112406A (en) * 2021-04-12 2021-07-13 南方科技大学 Feature determination method and device, electronic equipment and storage medium
CN113436112A (en) * 2021-07-21 2021-09-24 杭州海康威视数字技术股份有限公司 Image enhancement method, device and equipment
CN113436112B (en) * 2021-07-21 2022-08-26 杭州海康威视数字技术股份有限公司 Image enhancement method, device and equipment
WO2023005699A1 (en) * 2021-07-29 2023-02-02 广州安思创信息技术有限公司 Video enhancement network training method and device, and video enhancement method and device
CN113658076A (en) * 2021-08-18 2021-11-16 中科天网(广东)科技有限公司 Image restoration method, device, equipment and medium based on feature entanglement modulation
CN114387190A (en) * 2022-03-23 2022-04-22 山东省计算中心(国家超级计算济南中心) Adaptive image enhancement method and system based on complex environment
CN114936983A (en) * 2022-06-16 2022-08-23 福州大学 Underwater image enhancement method and system based on depth cascade residual error network
CN115331077A (en) * 2022-08-22 2022-11-11 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment
CN115331077B (en) * 2022-08-22 2024-04-26 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment
CN117422855A (en) * 2023-12-19 2024-01-19 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN117422855B (en) * 2023-12-19 2024-05-03 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112419219A (en) Image enhancement model training method, image enhancement method and related device
CN110300977B (en) Method for image processing and video compression
US20210266565A1 (en) Compression for deep neural network
US11503295B2 (en) Region-based image compression and decompression
Sadek et al. Robust video steganography algorithm using adaptive skin-tone detection
CN112102212B (en) Video restoration method, device, equipment and storage medium
CN104641643A (en) Decomposition of residual data during signal encoding, decoding and reconstruction in a tiered hierarchy
US8594449B2 (en) MPEG noise reduction
KR20210017185A (en) Method and apparatus for removing compressed poisson noise of an image based on deep neural network
CN112330541A (en) Live video processing method and device, electronic equipment and storage medium
CN113674165A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
Shin et al. Expanded adaptive scaling normalization for end to end image compression
CN116847087A (en) Video processing method and device, storage medium and electronic equipment
WO2006131866A2 (en) Method and system for image processing
Olano et al. Variable bit rate GPU texture decompression
Seppälä et al. Enhancing image coding for machines with compressed feature residuals
CN116601958A (en) Virtual viewpoint drawing, rendering and decoding methods and devices, equipment and storage medium
NO20200708A1 (en) Method, computer program and system for detecting changes and moving objects in a video view
CN117857842B (en) Image quality processing method in live broadcast scene and electronic equipment
Le et al. Bridging the gap between image coding for machines and humans
CN112565819B (en) Video data processing method and device, electronic equipment and storage medium
US11272185B2 (en) Hierarchical measurement of spatial activity for text/edge detection
CN117939125A (en) Image processing method, computer device, and computer-readable storage medium
Venkataramanan et al. Quality Assessment in Media and Entertainment: Challenges and Trends
Bellay Deep Learning Methods for Efficient Image Coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination