CN114743245A - Training method of enhanced model, image processing method, device, equipment and medium

Info

Publication number: CN114743245A
Application number: CN202210375152.4A
Authority: CN (China)
Prior art keywords: image, training, enhancement model, image enhancement, model
Other languages: Chinese (zh)
Inventors: 况志强, 许盛辉, 潘照明
Current assignee: Netease Media Technology Beijing Co Ltd
Original assignee: Netease Media Technology Beijing Co Ltd
Application filed by Netease Media Technology Beijing Co Ltd
Priority to: CN202210375152.4A
Publication of: CN114743245A
Legal status: Pending

Classifications

    • G - Physics
    • G06 - Computing; Calculating or Counting
    • G06F - Electric Digital Data Processing
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to a training method and apparatus for an image enhancement model, an image processing method and apparatus, an electronic device, and a computer-readable medium, and belongs to the technical field of image processing. The training method of the image enhancement model comprises the following steps: acquiring a training data set for the image enhancement model, and performing image degradation processing on the training sample images in the training data set to obtain input sample images; adding an attention focusing module to an image semantic segmentation network to construct an initial image enhancement model, and inputting the input sample images into the initial image enhancement model to obtain corresponding output sample images; and inputting the output sample images into a discriminator to obtain a discriminator loss, and iteratively updating the model parameters of the initial image enhancement model based on that loss to obtain the trained image enhancement model. Training the image enhancement model on data synthesized by the degradation model effectively improves image enhancement quality.

Description

Training method of enhanced model, image processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method and apparatus for an image enhancement model, an image processing method and apparatus, an electronic device, and a computer-readable medium.
Background
With the proliferation of images and videos, web pages present many different types of visual content for users to browse. However, these images and videos often suffer from low definition, uneven frame rates, and insufficient color.
Image quality improvement mainly rests on three aspects: color enhancement, deblurring, and image super-resolution. For images or videos with a poor visual impression, these three modules can raise indexes such as the saturation, sharpness, and contrast of the picture while reducing its noise and blur. The processing must not be too strong, however, or it produces an unrealistic look and feel.
In view of the above, there is a need in the art for a method for effectively improving the image enhancement quality.
It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a training method and apparatus for an image enhancement model, an image processing method and apparatus, an electronic device, and a computer-readable medium, so as to effectively improve image enhancement quality at least to a certain extent.
According to a first aspect of the present disclosure, there is provided a training method of an image enhancement model, including:
acquiring a training data set of the image enhancement model, and performing image degradation processing on training sample images in the training data set to obtain input sample images of the image enhancement model;
adding an attention focusing module into an image semantic segmentation network to construct an initial image enhancement model, and inputting the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image;
and inputting the output sample image into a discriminator to obtain corresponding discriminator loss, and performing iterative updating on model parameters in the initial image enhancement model based on the discriminator loss to obtain the trained image enhancement model.
In an exemplary embodiment of the present disclosure, the performing image degradation processing on a training sample image in the training data set to obtain an input sample image of the image enhancement model includes:
acquiring a self-training degradation method set and a random degradation method set for image degradation processing;
determining a target self-training degeneration method from the self-training degeneration method set, and determining a target random degeneration method from the random degeneration method set;
and performing image degradation processing on the training sample image by the target self-training degradation method and the target random degradation method to obtain an input sample image corresponding to the training sample image.
In an exemplary embodiment of the present disclosure, the inputting the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image includes:
sequentially obtaining a feature map corresponding to each feature extraction layer of the input sample image through a plurality of feature extraction layers in the image semantic segmentation network;
adjusting, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in the feature map corresponding to that feature extraction layer;
and obtaining an output sample image corresponding to the input sample image through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map.
In an exemplary embodiment of the disclosure, the inputting the output sample image into a discriminator to obtain a corresponding discriminator loss, and iteratively updating model parameters in the initial image enhancement model based on the discriminator loss includes:
inputting the output sample image into a discriminator, obtaining a first output result of the discriminator through an image semantic segmentation network in the discriminator, and obtaining a second output result of the discriminator through a cross network in the discriminator;
iteratively updating model parameters in the initial image enhancement model based on the first output result of the discriminator and the second output result of the discriminator.
According to a second aspect of the present disclosure, there is provided an image processing method comprising:
acquiring an original image to be processed, and inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model is obtained by the training method of the image enhancement model according to any of the above;
and carrying out image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image.
In an exemplary embodiment of the present disclosure, the performing, by the image enhancement model, image enhancement processing on the original image to obtain an enhanced image corresponding to the original image includes:
sequentially obtaining a feature map corresponding to each feature extraction layer of the original image through a plurality of feature extraction layers in the image semantic segmentation network;
determining, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in the feature map corresponding to that feature extraction layer;
obtaining an output image of the image enhancement model through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map;
and obtaining an enhanced image corresponding to the original image according to the output image of the image enhancement model.
According to a third aspect of the present disclosure, there is provided a training apparatus for an image enhancement model, comprising:
the image degradation processing module is used for acquiring a training data set of the image enhancement model and performing image degradation processing on training sample images in the training data set to obtain input sample images of the image enhancement model;
the enhancement model building module is used for adding the attention focusing module into an image semantic segmentation network to build an initial image enhancement model, inputting the input sample image into the initial image enhancement model and obtaining an output sample image corresponding to the input sample image;
and the enhancement model training module is used for inputting the output sample image into a discriminator to obtain corresponding discriminator loss, and iteratively updating model parameters in the initial image enhancement model based on the discriminator loss to obtain the trained image enhancement model.
In an exemplary embodiment of the present disclosure, the image degradation processing module includes:
the degradation method set acquisition unit is used for acquiring a self-training degradation method set and a random degradation method set for image degradation processing;
the target degradation method determining unit is used for determining a target self-training degradation method from the self-training degradation method set and determining a target random degradation method from the random degradation method set;
and the image degradation processing unit is used for carrying out image degradation processing on the training sample image through the target self-training degradation method and the target random degradation method to obtain an input sample image corresponding to the training sample image.
In an exemplary embodiment of the present disclosure, the augmentation model building module includes:
the feature map extraction unit is used for sequentially obtaining a feature map corresponding to the input sample image on each feature extraction layer through a plurality of feature extraction layers in the image semantic segmentation network;
an attention coefficient adjusting unit, configured to adjust, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer;
and the feature upsampling unit is used for obtaining an output sample image corresponding to the input sample image through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map.
In an exemplary embodiment of the present disclosure, the augmented model training module includes:
a discriminator loss determining unit, configured to input the output sample image into a discriminator, obtain a first output result of the discriminator through an image semantic segmentation network in the discriminator, and obtain a second output result of the discriminator through a cross network in the discriminator;
and the model parameter updating unit is used for iteratively updating the model parameters in the initial image enhancement model based on the first output result of the discriminator and the second output result of the discriminator.
According to a fourth aspect of the present disclosure, there is provided an image processing apparatus comprising:
the image enhancement device comprises an original image input module, a pre-training module and a processing module, wherein the original image input module is used for acquiring an original image to be processed and inputting the original image into an image enhancement model which is trained in advance, and the image enhancement model is obtained by the training device of the image enhancement model;
and the image enhancement processing module is used for carrying out image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image.
In an exemplary embodiment of the present disclosure, the image enhancement processing module includes:
the original image feature map extraction unit is used for sequentially obtaining a feature map corresponding to each feature extraction layer of the original image through a plurality of feature extraction layers in the image semantic segmentation network;
an attention coefficient determining unit, configured to determine, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer;
an output image determining unit, configured to obtain an output image of the image enhancement model through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map;
and the enhanced image determining unit is used for obtaining an enhanced image corresponding to the original image according to the output image of the image enhancement model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the training method or the image processing method of the image enhancement model of any one of the above via execution of the executable instructions.
According to a sixth aspect of the present disclosure, there is provided a computer readable medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the training method or the image processing method of the image enhancement model of any one of the above.
The exemplary embodiments of the present disclosure may have the following advantageous effects:
in the training method of the image enhancement model according to the exemplary embodiments of the present disclosure, an attention focusing module is added to an image semantic segmentation network to construct an initial image enhancement model; the input sample images of the model are obtained by performing image degradation processing on the training sample images in a training data set; a corresponding discriminator loss is obtained from the output sample images; and the model parameters of the initial image enhancement model are iteratively updated based on that loss to obtain the image enhancement model. This training method considers and designs a more complex degradation space and trains the image enhancement model on data synthesized by the degradation model. The resulting model performs very well on different types of real degraded data, effectively improving image enhancement quality, saving manual screening cost, supporting real-time computation, and optimizing the user's visual experience when browsing images or videos.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
FIG. 1 shows a flow diagram of a method of training an image enhancement model according to an example embodiment of the present disclosure;
FIG. 2 shows a schematic flow diagram of an image degradation process on a training sample image according to an example embodiment of the present disclosure;
FIG. 3 shows a schematic flow chart of obtaining an output sample image by an initial image enhancement model according to an example embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of an iterative update of model parameters based on the discriminator loss in accordance with an exemplary embodiment of the present disclosure;
FIG. 5 illustrates an overall framework diagram of a training method of an image enhancement model according to one embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of a manner of image degradation in accordance with one particular embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of a manner of random degeneration in accordance with one embodiment of the present disclosure;
FIG. 8 illustrates a schematic structural diagram of an attention module in accordance with one embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of the discriminator network in accordance with an embodiment of the present disclosure;
FIG. 10 shows a flow diagram of an image processing method of an example embodiment of the present disclosure;
FIG. 11 is a schematic flowchart illustrating an enhanced image corresponding to an original image obtained by an image enhancement model according to an exemplary embodiment of the disclosure;
FIG. 12 shows a block diagram of a training apparatus for an image enhancement model according to an example embodiment of the present disclosure;
fig. 13 shows a block diagram of an image processing apparatus of an example embodiment of the present disclosure;
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The present exemplary embodiment first provides a training method of an image enhancement model. Referring to fig. 1, the method for training the image enhancement model may include the following steps:
and S110, acquiring a training data set of the image enhancement model, and performing image degradation processing on training sample images in the training data set to obtain input sample images of the image enhancement model.
And S120, adding the attention focusing module into the image semantic segmentation network to construct an initial image enhancement model, and inputting the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image.
And S130, inputting the output sample image into a discriminator to obtain corresponding discriminator loss, and performing iterative updating on model parameters in the initial image enhancement model based on the discriminator loss to obtain a trained image enhancement model.
In the training method of the image enhancement model according to the exemplary embodiments of the present disclosure, an attention focusing module is added to an image semantic segmentation network to construct an initial image enhancement model; the input sample images of the model are obtained by performing image degradation processing on the training sample images in a training data set; a corresponding discriminator loss is obtained from the output sample images; and the model parameters of the initial image enhancement model are iteratively updated based on that loss to obtain the image enhancement model. This training method considers and designs a more complex degradation space and trains the image enhancement model on data synthesized by the degradation model. The resulting model performs very well on different types of real degraded data, effectively improving image enhancement quality, saving manual screening cost, supporting real-time computation, and optimizing the user's visual experience when browsing images or videos.
The above steps of the present exemplary embodiment will be described in more detail with reference to fig. 2 to 4.
In step S110, a training data set of the image enhancement model is obtained, and image degradation processing is performed on training sample images in the training data set to obtain input sample images of the image enhancement model.
In the present exemplary embodiment, it is first necessary to acquire a training data set for the image enhancement model; for example, the DIV2K and Flickr2K image super-resolution reconstruction data sets, the WED data set, and a large number of face images from the FFHQ (Flickr-Faces-HQ) data set may be adopted as the training data set. After the training data set of the image enhancement model is determined, image degradation processing needs to be performed on the training sample images in the training data set to obtain the input sample images of the image enhancement model.
In this exemplary embodiment, as shown in fig. 2, performing image degradation processing on a training sample image in a training data set to obtain an input sample image of an image enhancement model may specifically include the following steps:
and S210, acquiring a self-training degradation method set and a random degradation method set for image degradation processing.
The image degradation processing can adopt two modes: self-training degradation and random degradation. The random degradation method set can include noise, blur, scaling, compression, and other methods, while the self-training degradation method set can include shallow, middle, and high-level blur, where the shallow, middle, and high levels each denote a degree of blur.
Step S220, a target self-training degeneration method is determined from the self-training degeneration method set, and a target random degeneration method is determined from the random degeneration method set.
And randomly determining a self-training degeneration method from the self-training degeneration method set as a target self-training degeneration method, and randomly determining a random degeneration method from the random degeneration method set as a target random degeneration method.
And S230, carrying out image degradation processing on the training sample image through a target self-training degradation method and a target random degradation method to obtain an input sample image corresponding to the training sample image.
The training sample image is processed by randomly superposing the target self-training degradation method and the target random degradation method, yielding the input sample image corresponding to the training sample image that is fed into the network.
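As a minimal illustration of steps S210 to S230, the sketch below composes one method from each set by random superposition. It is a sketch under stated assumptions, not the patent's implementation: the concrete operations, kernel sizes, and parameter ranges are illustrative, and the single Gaussian blur standing in for the learned shallow/middle/high-level blur kernels is a placeholder.

```python
import random
import cv2
import numpy as np

def gaussian_blur(img):
    k = random.choice([3, 5, 7])                 # illustrative kernel sizes
    return cv2.GaussianBlur(img, (k, k), 0)

def downscale_upscale(img):
    h, w = img.shape[:2]
    s = random.uniform(0.25, 0.75)               # illustrative scale range
    small = cv2.resize(img, (int(w * s), int(h * s)), interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

def gaussian_noise(img):
    sigma = random.uniform(1.0, 15.0)            # illustrative noise level
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def jpeg_compress(img):
    q = random.randint(30, 90)                   # illustrative quality range
    _, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

RANDOM_METHODS = [gaussian_blur, downscale_upscale, gaussian_noise, jpeg_compress]
SELF_TRAINING_METHODS = [gaussian_blur]          # placeholder for learned blur levels

def degrade(clean_img):
    """Pick one target method from each set and superpose them in random order."""
    ops = [random.choice(SELF_TRAINING_METHODS), random.choice(RANDOM_METHODS)]
    random.shuffle(ops)
    out = clean_img
    for op in ops:
        out = op(out)
    return out
```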
In step S120, the attention focusing module is added to the image semantic segmentation network to construct an initial image enhancement model, and the input sample image is input into the initial image enhancement model, so as to obtain an output sample image corresponding to the input sample image.
In the present exemplary embodiment, the backbone network of the image enhancement model adopts a Unet + NATM structure. Unet (an image semantic segmentation network) is a segmentation network with a U-shaped symmetric structure: the left side consists of feature extraction layers and the right side of upsampling layers, and the feature map produced by each feature extraction layer is connected to the corresponding upsampling layer, so that the feature maps of every layer can be used effectively in subsequent computation. The NATM (Network Attention Module) applies attention processing to the features and enhances the perceptual learning capability of the model. Applying this attention mechanism to the Unet segmentation network achieves attention to salient regions and suppression of irrelevant background regions.
In this exemplary embodiment, as shown in fig. 3, inputting an input sample image into an initial image enhancement model to obtain an output sample image corresponding to the input sample image, which may specifically include the following steps:
and S310, sequentially obtaining a feature map corresponding to each feature extraction layer of the input sample image through a plurality of feature extraction layers in the image semantic segmentation network.
First, the feature map corresponding to the input sample image at each feature extraction layer is obtained in sequence through the plurality of feature extraction layers on the left side of the image semantic segmentation network.
Step S320, adjusting, through the attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, the channel attention coefficient corresponding to each channel and the region attention coefficient corresponding to each region in the feature map of that feature extraction layer.

When the feature map obtained by each feature extraction layer is connected to the corresponding upsampling layer, the channel attention coefficient corresponding to each channel and the region attention coefficient corresponding to each region in that feature map can be adjusted through the attention focusing module corresponding to the feature extraction layer.
The channel attention coefficient may be used to adjust the channel-wise focus of the feature map; channel attention is top-down, conscious, active attention, i.e., attention with a predetermined purpose that depends on the task and is deliberately directed at a certain object. The region attention coefficient may be used to adjust the spatial focus of the feature map; spatial attention is bottom-up, unconscious, passive attention, i.e., saliency-based attention driven by external stimuli that requires no active intervention and is unrelated to the task. The pooling and gating mechanisms can be viewed approximately as a bottom-up, saliency-based attention mechanism.
Step S330, obtaining an output sample image corresponding to the input sample image through the plurality of upsampling layers in the image semantic segmentation network, based on the channel attention coefficient, the region attention coefficient, and the feature map corresponding to each feature extraction layer.

The channel-wise and spatial focus of the feature map corresponding to each feature extraction layer is adjusted based on that layer's channel attention coefficient and region attention coefficient; the result is connected to the corresponding upsampling layer, and the output sample image corresponding to the input sample image is obtained through the plurality of upsampling layers.
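The following PyTorch sketch illustrates the shape of steps S310 to S330: a toy two-level Unet whose skip connections pass through an attention block that applies a channel attention coefficient per channel and a region attention coefficient per spatial position. The layer widths and the internals of the NATM stand-in are assumptions; the patent does not disclose the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NATM(nn.Module):
    """Stand-in for the attention focusing module: channel + region attention."""
    def __init__(self, channels):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce one weight per channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        # Region attention: produce one weight per spatial position.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        ca = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # channel coefficients
        x = x * ca
        ra = torch.sigmoid(self.spatial_conv(x))                   # region coefficients
        return x * ra

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class UnetNATM(nn.Module):
    """Toy two-level Unet with attention applied to each skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 32), conv_block(32, 64)
        self.mid = conv_block(64, 64)
        self.att1, self.att2 = NATM(32), NATM(64)
        self.dec2, self.dec1 = conv_block(128, 64), conv_block(96, 32)
        self.out = nn.Conv2d(32, 3, 1)

    def forward(self, x):
        f1 = self.enc1(x)                       # feature map, level 1
        f2 = self.enc2(F.max_pool2d(f1, 2))     # feature map, level 2
        m = self.mid(F.max_pool2d(f2, 2))
        u2 = F.interpolate(m, scale_factor=2)   # upsampling layer, level 2
        d2 = self.dec2(torch.cat([u2, self.att2(f2)], dim=1))
        u1 = F.interpolate(d2, scale_factor=2)  # upsampling layer, level 1
        d1 = self.dec1(torch.cat([u1, self.att1(f1)], dim=1))
        return torch.sigmoid(self.out(d1))
```

For a 3-channel input whose height and width are divisible by 4, `UnetNATM()(x)` returns an image of the same size; the attention-weighted skip features are what the upsampling layers consume.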
In step S130, the output sample image is input to the discriminator to obtain a corresponding discriminator loss, and model parameters in the initial image enhancement model are iteratively updated based on the discriminator loss to obtain a trained image enhancement model.
The discriminator can be used to judge the quality of the generated image; its judgment result serves as the discriminator loss, which drives the iterative updating of the model parameters in the initial image enhancement model. By computing its losses in parallel, the discrimination network can sharpen the discrimination difference.
In the present exemplary embodiment, as shown in fig. 4, inputting the output sample image into the discriminator to obtain a corresponding discriminator loss, and iteratively updating the model parameters in the initial image enhancement model based on the discriminator loss, which may specifically include the following steps:
and S410, inputting the output sample image into a discriminator, obtaining a first output result of the discriminator through an image semantic segmentation network in the discriminator, and obtaining a second output result of the discriminator through a cross network in the discriminator.
In the present exemplary embodiment, the discriminator is trained to recognize the original training sample images as "true" and the output sample images as "false", while the generator, formed by the image semantic segmentation network and the network attention focusing module, aims to confuse the discriminator into wrongly judging the output sample images it generates as "true", thereby achieving the purpose of image reconstruction.
In the present exemplary embodiment, the training sample image and the output sample image are input into the discriminator, and the discriminator network extracts image features to determine whether its input is "true" or "false". Specifically, "false" may be represented by 0 and "true" by 1. The discriminator network is divided into two branches: one branch uses a Unet, obtaining the overall features of the image through the image semantic segmentation network and producing the first output result of the discriminator from those overall features; the other branch uses a cross network, obtaining the detail features of the image and producing the second output result of the discriminator. Combining the discrimination results output by the Unet network and the cross network fuses the overall and detail features of the image more effectively, thereby improving the image enhancement effect.
And S420, carrying out iterative updating on model parameters in the initial image enhancement model based on the first output result of the discriminator and the second output result of the discriminator.
Finally, the overall loss of the image enhancement model can be obtained from the first and second output results of the discriminator. Back-propagating this overall loss iteratively updates the neural network parameters of the image enhancement model, so that the image semantic segmentation network generates images ever closer to the original training sample images. Iteration stops when the discriminator can no longer distinguish whether its input is an image generated by the image semantic segmentation network or an original training sample image, completing the training process and yielding the trained image enhancement model.
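A minimal sketch of this adversarial update, assuming the discriminator returns its two branch outputs as probabilities and that binary cross-entropy is used for both; the handles and the equal weighting of the loss terms are illustrative, and the actual overall loss may combine further terms (e.g., pixel or perceptual losses).

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, clean_img, degraded_img):
    # 1) Update the discriminator: real images -> "true" (1), generated -> "false" (0).
    fake = generator(degraded_img).detach()
    real_out1, real_out2 = discriminator(clean_img)
    fake_out1, fake_out2 = discriminator(fake)
    d_loss = (F.binary_cross_entropy(real_out1, torch.ones_like(real_out1)) +
              F.binary_cross_entropy(real_out2, torch.ones_like(real_out2)) +
              F.binary_cross_entropy(fake_out1, torch.zeros_like(fake_out1)) +
              F.binary_cross_entropy(fake_out2, torch.zeros_like(fake_out2)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Update the generator: fool both discriminator branches into saying "true".
    fake = generator(degraded_img)
    fake_out1, fake_out2 = discriminator(fake)
    g_loss = (F.binary_cross_entropy(fake_out1, torch.ones_like(fake_out1)) +
              F.binary_cross_entropy(fake_out2, torch.ones_like(fake_out2)))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```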
Fig. 5 is a general block diagram of the training method of the image enhancement model according to an embodiment of the present disclosure, illustrating the above steps of this exemplary embodiment. The block diagram depicts the model training process, in which the image enhancement model adopts the Unet + NATM structure. The specific training method is as follows: the training sample image is randomly superposed with self-training degradation and random degradation to obtain the input sample image 501; the input sample image 501 is fed into the network, where the Unet-based generator produces the output sample image 502; the output sample image 502 is judged by the discriminator; and the model is iteratively trained based on the discriminator loss.
As shown in fig. 6, which is a schematic diagram of the image degradation modes in one embodiment of the present disclosure, the image degradation process may include two modes, i.e., self-training degradation and random degradation, where the random degradation methods may include noise, blur, scaling, and compression, and the self-training degradation methods may include shallow, middle, and high-level blur.
Fig. 7 is a schematic diagram of the random degradation modes in one embodiment of the present disclosure, which fall into four categories: blur, scaling, noise, and compression. Blur can be divided into Gaussian blur, mean blur, anisotropic blur, and the like; scaling into bilinear, bicubic, and area-based scaling, and the like; noise into Gaussian noise, color noise, Poisson noise, gray noise, and the like; and compression may include JPEG compression and the like.
Fig. 8 is a schematic structural diagram of an attention focusing module in an embodiment of the present disclosure, where the NATM module may be used to perform attention processing on features to enhance the perceptual learning capability of the model. By applying the attention mechanism to the Unet segmentation network, attention to the salient region and suppression to the irrelevant background region can be well realized.
Fig. 9 is a schematic structural diagram of the discriminator network in an embodiment of the disclosure. The discriminator network is divided into two branches: one branch uses a Unet, obtaining the overall features of the output sample image through the image semantic segmentation network and producing the first output result of the discriminator from those overall features; the other branch uses a cross network, obtaining the detail features of the output sample image and producing the second output result of the discriminator. Each of the two branches takes a 512x512 input. Combining the discrimination results output by the Unet network and the cross network before feeding them back fuses the overall and detail features of the image more effectively, thereby improving the image enhancement effect.
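Because the patent does not detail the internals of the cross network, the sketch below only mirrors the two-branch shape described here: a shallow convolutional stand-in for the Unet branch that captures overall features, and another stand-in for the cross network that captures detail features, each reducing a 512x512 input to a single probability.

```python
import torch
import torch.nn as nn

class DualBranchDiscriminator(nn.Module):
    """Sketch of the two-branch discriminator; both branches are stand-ins."""
    def __init__(self):
        super().__init__()
        self.unet_branch = nn.Sequential(          # stand-in for a full Unet branch
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
        self.detail_branch = nn.Sequential(        # stand-in for the cross network
            nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):                          # x: (B, 3, 512, 512)
        out1 = torch.sigmoid(self.unet_branch(x))  # first output result
        out2 = torch.sigmoid(self.detail_branch(x))  # second output result
        return out1, out2
```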
On the other hand, the present exemplary embodiment also provides an image processing method. Referring to fig. 10, the image processing method may include the steps of:
and S1010, acquiring an original image to be processed, and inputting the original image into a pre-trained image enhancement model.
The image enhancement model can be obtained by a training method of the image enhancement model as in fig. 1 to 4.
And S1020, carrying out image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image.
In this exemplary embodiment, as shown in fig. 11, performing image enhancement processing on an original image through an image enhancement model to obtain an enhanced image corresponding to the original image may specifically include the following steps:
step S1110, sequentially obtaining a feature map corresponding to the original image in each feature extraction layer through a plurality of feature extraction layers in the image semantic segmentation network.
First, the feature map corresponding to the original image at each feature extraction layer is obtained in sequence through the plurality of feature extraction layers on the left side of the image semantic segmentation network.
Step S1120, determining, through the attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, the channel attention coefficient corresponding to each channel and the region attention coefficient corresponding to each region in the feature map of that feature extraction layer.

When the feature map obtained by each feature extraction layer is connected to the corresponding upsampling layer, the channel attention coefficient corresponding to each channel and the region attention coefficient corresponding to each region in that feature map can be determined through the attention focusing module corresponding to the feature extraction layer.
And S1130, obtaining an output image of the image enhancement model through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient, the region attention coefficient and the feature map corresponding to each feature extraction layer.
The channel-wise and spatial focus of the feature map corresponding to each feature extraction layer is adjusted based on that layer's channel attention coefficient and region attention coefficient; the result is connected to the corresponding upsampling layer, and the output image corresponding to the original image is obtained through the plurality of upsampling layers.
And S1140, obtaining an enhanced image corresponding to the original image according to the output image of the image enhancement model.
Specifically, the output image of the image enhancement model may be directly used as the enhanced image corresponding to the original image, or the size of the output image of the image enhancement model may be enlarged by using methods such as image interpolation, so as to obtain the final enhanced image.
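A minimal inference sketch under these assumptions; the tensor layout and the optional 2x bicubic enlargement are illustrative choices, not values from the patent.

```python
import torch
import torch.nn.functional as F

def enhance(model, original, upscale=False):
    """Run the trained enhancement model on a (1, 3, H, W) float tensor.

    If upscale is set, enlarge the output by bicubic interpolation, as in the
    optional final step described above."""
    model.eval()
    with torch.no_grad():
        out = model(original)
    if upscale:
        out = F.interpolate(out, scale_factor=2, mode='bicubic', align_corners=False)
    return out
```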
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken into multiple step executions, etc.
Further, the present disclosure also provides a training device for the image enhancement model. Referring to fig. 12, the training apparatus of the image enhancement model may include an image degradation processing module 1210, an enhancement model construction module 1220, and an enhancement model training module 1230. Wherein:
the image degradation processing module 1210 may be configured to obtain a training data set of an image enhancement model, and perform image degradation processing on a training sample image in the training data set to obtain an input sample image of the image enhancement model;
the enhancement model constructing module 1220 may be configured to add the attention focusing module to the image semantic segmentation network to construct an initial image enhancement model, and input the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image;
the enhanced model training module 1230 may be configured to input the output sample image into a discriminator to obtain a corresponding discriminator loss, and iteratively update model parameters in the initial image enhanced model based on the discriminator loss to obtain a trained image enhanced model.
In some exemplary embodiments of the present disclosure, the image degradation processing module 1210 may include a degradation method set acquiring unit, a target degradation method determining unit, and an image degradation processing unit.
Wherein:
the degradation method set acquisition unit may be configured to acquire a self-training degradation method set and a random degradation method set for image degradation processing;
the target degeneration method determination unit may be configured to determine a target self-training degeneration method from a set of self-training degeneration methods, and a target stochastic degeneration method from a set of stochastic degeneration methods;
the image degradation processing unit may be configured to perform image degradation processing on the training sample image through a target self-training degradation method and a target random degradation method, so as to obtain an input sample image corresponding to the training sample image.
In some exemplary embodiments of the present disclosure, the enhancement model construction module 1220 may include a feature map extraction unit, an attention coefficient adjustment unit, and a feature upsampling unit. Wherein:
the feature map extraction unit can be used for sequentially obtaining a feature map corresponding to each feature extraction layer of the input sample image through a plurality of feature extraction layers in the image semantic segmentation network;
the attention coefficient adjusting unit may be configured to adjust, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer;
the feature upsampling unit may be configured to obtain an output sample image corresponding to the input sample image through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map.
In some exemplary embodiments of the present disclosure, the enhanced model training module 1230 may include a discriminator loss determination unit and a model parameter update unit. Wherein:
the discriminator loss determining unit can be used for inputting the output sample image into the discriminator, obtaining a first output result of the discriminator through an image semantic segmentation network in the discriminator and obtaining a second output result of the discriminator through a cross network in the discriminator;
the model parameter updating unit may be configured to iteratively update the model parameters in the initial image enhancement model based on the first output result of the discriminator and the second output result of the discriminator.
Further, the present disclosure also provides an image processing apparatus. Referring to fig. 13, the image processing apparatus may include an original image input module 1310 and an image enhancement processing module 1320. Wherein:
the original image input module 1310 may be configured to obtain an original image to be processed, and input the original image into a pre-trained image enhancement model, where the image enhancement model is obtained by the training apparatus of the image enhancement model as described above;
the image enhancement module 1320 may be configured to perform image enhancement processing on the original image through the image enhancement model, so as to obtain an enhanced image corresponding to the original image.
In some exemplary embodiments of the present disclosure, the image enhancement processing module 1320 may include an original image feature map extraction unit, an attention coefficient determination unit, an output image determination unit, and an enhanced image determination unit. Wherein:
the original image feature map extraction unit can be used for sequentially obtaining a feature map corresponding to the original image in each feature extraction layer through a plurality of feature extraction layers in the image semantic segmentation network;
the attention coefficient determining unit may be configured to determine, through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network, a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer;
the output image determining unit may be configured to obtain an output image of the image enhancement model through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map;
the enhanced image determining unit may be configured to obtain an enhanced image corresponding to the original image according to an output image of the image enhancement model.
The specific details of each module/unit in the training apparatus for image enhancement model and the image processing apparatus have been described in detail in the corresponding method embodiment section, and are not described herein again.
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present invention.
It should be noted that the computer system 1400 of the electronic device shown in fig. 14 is merely an example and should not impose any limitation on the functions or the scope of use of the embodiments of the present invention.
As shown in fig. 14, the computer system 1400 includes a Central Processing Unit (CPU) 1401, which can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403. The RAM 1403 also stores various programs and data necessary for system operation. The CPU 1401, the ROM 1402, and the RAM 1403 are connected to one another via a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.
The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication section 1409 including a network interface card such as a LAN card or a modem. The communication section 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1410 as necessary, so that a computer program read therefrom is installed into the storage portion 1408 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1409 and/or installed from the removable medium 1411. When the computer program is executed by the Central Processing Unit (CPU) 1401, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments above.
It should be noted that although in the above detailed description several modules of the device for action execution are mentioned, this division is not mandatory. Indeed, the features and functionality of two or more of the modules described above may be embodied in one module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided into embodiments by a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A training method of an image enhancement model is characterized by comprising the following steps:
acquiring a training data set of the image enhancement model, and performing image degradation processing on training sample images in the training data set to obtain input sample images of the image enhancement model;
adding an attention focusing module into an image semantic segmentation network to construct an initial image enhancement model, and inputting the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image;
and inputting the output sample image into a discriminator to obtain corresponding discriminator loss, and performing iterative updating on model parameters in the initial image enhancement model based on the discriminator loss to obtain the trained image enhancement model.
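By way of illustration only (this sketch is not part of the claims), the training scheme of claim 1 could be realized in PyTorch roughly as follows; the degrade() function and the tiny generator/discriminator networks are hypothetical stand-ins for the degradation pipeline, the attention-augmented semantic segmentation network, and the discriminator described above:

    import torch
    import torch.nn as nn

    def degrade(hq):
        # Stand-in degradation: a downsample/upsample round trip that loses detail.
        lr = nn.functional.interpolate(hq, scale_factor=0.5, mode="bilinear")
        return nn.functional.interpolate(lr, size=hq.shape[-2:], mode="bilinear")

    generator = nn.Sequential(          # stand-in for the enhancement model
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
    discriminator = nn.Sequential(      # stand-in for the discriminator
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    for step in range(100):
        hq = torch.rand(4, 3, 64, 64)       # training sample images (dummy data)
        inp = degrade(hq)                   # image degradation -> input sample images
        fake = generator(inp)               # output sample images

        # Update the discriminator: real images toward 1, generated toward 0.
        d_loss = bce(discriminator(hq), torch.ones(4, 1)) \
               + bce(discriminator(fake.detach()), torch.zeros(4, 1))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

        # Update the generator on the discriminator loss: fool the discriminator.
        g_loss = bce(discriminator(fake), torch.ones(4, 1))
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()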
2. The method for training the image enhancement model according to claim 1, wherein performing image degradation processing on the training sample images in the training data set to obtain the input sample images of the image enhancement model comprises:
acquiring a self-training degradation method set and a random degradation method set for image degradation processing;
determining a target self-training degradation method from the self-training degradation method set, and determining a target random degradation method from the random degradation method set;
and performing image degradation processing on the training sample image by the target self-training degradation method and the target random degradation method to obtain an input sample image corresponding to the training sample image.
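As an illustrative sketch only (not part of the claims), the selection step of claim 2 amounts to sampling one method from each pool and composing them; the pools below are hypothetical examples, and a learned (self-trained) degrader would normally be trained beforehand:

    import random
    import torch
    import torch.nn as nn

    # Hypothetical self-training degradation method set; simple fixed operators
    # stand in here for learned degradation networks.
    self_trained_pool = [nn.AvgPool2d(3, stride=1, padding=1), nn.Identity()]

    # Hypothetical random degradation method set.
    def gaussian_noise(x):
        return x + 0.05 * torch.randn_like(x)

    def down_up(x):
        lr = nn.functional.interpolate(x, scale_factor=0.5, mode="bilinear")
        return nn.functional.interpolate(lr, size=x.shape[-2:], mode="bilinear")

    random_pool = [gaussian_noise, down_up]

    def make_input_sample(training_sample):
        target_self = random.choice(self_trained_pool)    # target self-training method
        target_rand = random.choice(random_pool)          # target random method
        return target_rand(target_self(training_sample))  # apply both degradations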
3. The method for training an image enhancement model according to claim 1, wherein the inputting the input sample image into the initial image enhancement model to obtain an output sample image corresponding to the input sample image comprises:
sequentially obtaining a feature map corresponding to each feature extraction layer of the input sample image through a plurality of feature extraction layers in the image semantic segmentation network;
adjusting a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network;
and obtaining an output sample image corresponding to the input sample image through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map.
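For illustration only (not part of the claims), one plausible form of the attention focusing module of claim 3, producing a coefficient per channel and a coefficient per spatial region; the layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class AttentionFocus(nn.Module):
        # Hypothetical attention focusing module: reweights a feature map by a
        # per-channel coefficient and then by a per-region (spatial) coefficient.
        def __init__(self, channels):
            super().__init__()
            self.channel = nn.Sequential(      # one coefficient per channel
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, 1), nn.Sigmoid())
            self.region = nn.Sequential(       # one coefficient per spatial region
                nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

        def forward(self, feat):
            feat = feat * self.channel(feat)   # apply channel attention coefficients
            return feat * self.region(feat)    # apply region attention coefficients

One such module would sit after each feature extraction layer, and the reweighted feature maps would then feed the upsampling layers.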
4. The method for training the image enhancement model according to claim 1, wherein the inputting the output sample image into a discriminator to obtain a corresponding discriminator loss, and iteratively updating model parameters in the initial image enhancement model based on the discriminator loss comprises:
inputting the output sample image into a discriminator, obtaining a first output result of the discriminator through an image semantic segmentation network in the discriminator, and obtaining a second output result of the discriminator through a cross network in the discriminator;
iteratively updating model parameters in the initial image enhancement model based on the first output result of the discriminator and the second output result of the discriminator.
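As a sketch only (not part of the claims), the discriminator of claim 4 can be read as one backbone with two heads: a dense, segmentation-style head yielding the first output, and a global head standing in for the cross network yielding the second. The architecture below is an assumption:

    import torch
    import torch.nn as nn

    class TwoBranchDiscriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2))
            self.pixel_head = nn.Conv2d(32, 1, 1)        # first output: per-pixel map
            self.global_head = nn.Sequential(            # second output: global score
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

        def forward(self, x):
            h = self.backbone(x)
            return self.pixel_head(h), self.global_head(h)

    bce = nn.BCEWithLogitsLoss()
    disc = TwoBranchDiscriminator()
    fake = torch.rand(2, 3, 64, 64)                      # generator output samples
    out1, out2 = disc(fake)                              # first and second outputs
    g_loss = bce(out1, torch.ones_like(out1)) + bce(out2, torch.ones_like(out2))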
5. An image processing method, comprising:
acquiring an original image to be processed, and inputting the original image into a pre-trained image enhancement model, wherein the image enhancement model is obtained by the training method of the image enhancement model according to any one of claims 1 to 4;
and carrying out image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image.
6. The image processing method according to claim 5, wherein the image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image comprises:
sequentially obtaining a feature map corresponding to each feature extraction layer of the original image through a plurality of feature extraction layers in the image semantic segmentation network;
determining a channel attention coefficient corresponding to each channel and a region attention coefficient corresponding to each region in a feature map corresponding to the feature extraction layer through an attention focusing module corresponding to each feature extraction layer in the image semantic segmentation network;
obtaining an output image of the image enhancement model through a plurality of upsampling layers in the image semantic segmentation network based on the channel attention coefficient and the region attention coefficient corresponding to each feature extraction layer and the feature map;
and obtaining an enhanced image corresponding to the original image according to the output image of the image enhancement model.
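Purely as an illustrative sketch (not part of the claims), inference per claims 5-6 reduces to a single forward pass through the trained model; the clamp to display range is an assumption:

    import torch

    @torch.no_grad()
    def enhance(model, original):
        # model: the trained image enhancement model; original: (B, 3, H, W) tensor.
        model.eval()
        output = model(original)       # features -> attention -> upsampling layers
        return output.clamp(0.0, 1.0)  # enhanced image corresponding to the original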
7. An apparatus for training an image enhancement model, comprising:
the image degradation processing module is used for acquiring a training data set of the image enhancement model and performing image degradation processing on training sample images in the training data set to obtain input sample images of the image enhancement model;
the enhancement model building module is used for adding the attention focusing module into an image semantic segmentation network to build an initial image enhancement model, inputting the input sample image into the initial image enhancement model and obtaining an output sample image corresponding to the input sample image;
and the enhanced model training module is used for inputting the output sample image into a discriminator to obtain corresponding discriminator loss, and iteratively updating model parameters in the initial image enhanced model based on the discriminator loss to obtain the trained image enhanced model.
8. An image processing apparatus characterized by comprising:
an original image input module, configured to obtain an original image to be processed, and input the original image into a pre-trained image enhancement model, where the image enhancement model is obtained by a training apparatus of the image enhancement model as claimed in claim 7;
and the image enhancement processing module is used for carrying out image enhancement processing on the original image through the image enhancement model to obtain an enhanced image corresponding to the original image.
9. An electronic device, comprising:
a processor; and
a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the method of training an image enhancement model as claimed in any one of claims 1 to 4, or the method of image processing as claimed in any one of claims 5 to 6.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out a method of training an image enhancement model as set forth in any one of claims 1 to 4, or an image processing method as set forth in any one of claims 5 to 6.
CN202210375152.4A 2022-04-11 2022-04-11 Training method of enhanced model, image processing method, device, equipment and medium Pending CN114743245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375152.4A CN114743245A (en) 2022-04-11 2022-04-11 Training method of enhanced model, image processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210375152.4A CN114743245A (en) 2022-04-11 2022-04-11 Training method of enhanced model, image processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114743245A true CN114743245A (en) 2022-07-12

Family

ID=82281665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210375152.4A Pending CN114743245A (en) 2022-04-11 2022-04-11 Training method of enhanced model, image processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114743245A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375544A (en) * 2022-08-08 2022-11-22 中加健康工程研究院(合肥)有限公司 Super-resolution method based on an attention and UNet generative adversarial network

Similar Documents

Publication Publication Date Title
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN110310229A Image processing method, image processing apparatus, terminal device and readable storage medium
CN111901598B (en) Video decoding and encoding method, device, medium and electronic equipment
CN111832570A (en) Image semantic segmentation model training method and system
CN110163237A (en) Model training and image processing method, device, medium, electronic equipment
Panetta et al. Tmo-net: A parameter-free tone mapping operator using generative adversarial network, and performance benchmarking on large scale hdr dataset
Montulet et al. Deep learning for robust end-to-end tone mapping
CN113411550B (en) Video coloring method, device, equipment and storage medium
CN111598796A (en) Image processing method and device, electronic device and storage medium
CN111583138A (en) Video enhancement method and device, electronic equipment and storage medium
Yin et al. Attentive U-recurrent encoder-decoder network for image dehazing
CN111932458B (en) Image information extraction and generation method based on inter-region attention mechanism
CN110852980A (en) Interactive image filling method and system, server, device and medium
CN112381717A (en) Image processing method, model training method, device, medium, and apparatus
DE112016005482T5 (en) Object detection with adaptive channel features
CN112419179A (en) Method, device, equipment and computer readable medium for repairing image
CN114743245A (en) Training method of enhanced model, image processing method, device, equipment and medium
CN114299573A (en) Video processing method and device, electronic equipment and storage medium
CN114913061A (en) Image processing method and device, storage medium and electronic equipment
CN113538304B (en) Training method and device for image enhancement model, and image enhancement method and device
CN110689478B (en) Image stylization processing method and device, electronic equipment and readable medium
Li et al. SPN2D-GAN: semantic prior based night-to-day image-to-image translation
CN115375539A (en) Image resolution enhancement, multi-frame image super-resolution system and method
CN114387315A (en) Image processing model training method, image processing device, image processing equipment and image processing medium
CN111260756A (en) Method and apparatus for transmitting information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination