CN116894801A - Image quality enhancement method and device, electronic equipment and storage medium - Google Patents

Image quality enhancement method and device, electronic equipment and storage medium

Info

Publication number
CN116894801A
Authority
CN
China
Prior art keywords
image
quality
low
original
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310892867.1A
Other languages
Chinese (zh)
Inventor
周凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202310892867.1A
Publication of CN116894801A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image quality enhancement method, an image quality enhancement device, electronic equipment and a storage medium, and relates to the technical field of image processing. The method comprises the following steps: based on semantic priori of a preset service scene, performing semantic segmentation on an original high-quality image in an original image set to obtain a semantic segmentation result; according to the semantic segmentation result, carrying out data enhancement on the original low-quality images in the original image set to obtain a data enhancement result; inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model; and the trained image quality enhancement model is utilized to enhance the image quality of the image in the preset service scene, so that the condition that the original low-quality image is directly used as input data of the trained image quality enhancement model is avoided, and the practical application effect of the trained image quality enhancement model is improved.

Description

Image quality enhancement method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image quality enhancement method, an image quality enhancement device, an electronic device, and a storage medium.
Background
Image quality enhancement techniques essentially boost all edge texture details in an image. In the current scheme, low-quality images are directly input into an image quality enhancement model for training, the corresponding high-quality images are used as labels, and the model parameters are optimized to obtain a final image quality enhancement model. When this model is applied in an actual service scene, it can only treat the whole image uniformly, so the enhancement effect is poor and the service requirements are difficult to meet.
Disclosure of Invention
In order to overcome the defects in the prior art, embodiments of the present invention provide an image quality enhancement method, an image quality enhancement device, an electronic device and a storage medium, so as to solve the above problems in the prior art.
The technical scheme of the invention can be realized as follows:
in a first aspect, the present invention provides an image quality enhancement method, the method comprising:
based on semantic priori of a preset service scene, performing semantic segmentation on an original high-quality image in an original image set to obtain a semantic segmentation result;
according to the semantic segmentation result, carrying out data enhancement on the original low-quality images in the original image set to obtain a data enhancement result;
inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model;
and performing image quality enhancement on the images in the preset service scene by using the trained image quality enhancement model.
Optionally, the semantic segmentation result includes a first target semantic region mask image, a second target semantic region mask image and a transition region mask image, and the step of performing data enhancement on the original low-quality image in the original image set according to the semantic segmentation result to obtain a data enhancement result includes:
based on the first target semantic region mask image, fusing the original low-quality image and the original high-quality image to obtain a first low-quality image, wherein the first low-quality image has a clear semantic region;
based on the second target semantic region mask image, fusing the original low-quality image and the original high-quality image to obtain a second low-quality image, wherein the second low-quality image has a clear background region;
and fusing the original low-quality image and the original high-quality image based on the transition region mask image to obtain a third low-quality image, wherein the third low-quality image has a transition region meeting the preset definition, and the data enhancement result comprises the first low-quality image, the second low-quality image and the third low-quality image.
Optionally, the step of fusing the original low-quality image and the original high-quality image based on the first target semantic region mask image to obtain a first low-quality image includes:
processing the original high-quality image by using the first target semantic region mask image to obtain a high-quality first target semantic region;
respectively processing the original low-quality image and the original high-quality image by using the mask image of the second target semantic region to obtain a low-quality second target semantic region and a high-quality second target semantic region;
and carrying out weighted fusion on the low-quality second target semantic region and the high-quality second target semantic region, and combining the fusion result with the high-quality first target semantic region to obtain the first low-quality image.
Optionally, the step of fusing the original low-quality image and the original high-quality image based on the second target semantic region mask image to obtain a second low-quality image further includes:
processing the original high-quality image by using the mask image of the second target semantic region to obtain a high-quality second target semantic region;
respectively processing the original low-quality image and the original high-quality image by using the first target semantic region mask image to obtain a low-quality first target semantic region and a high-quality first target semantic region;
and carrying out weighted fusion on the low-quality first target semantic region and the high-quality first target semantic region, and combining the fusion result with the high-quality second target semantic region to obtain the second low-quality image.
Optionally, the step of fusing the original low-quality image and the original high-quality image based on the transition region mask image to obtain a third low-quality image further includes:
generating a non-transition region mask image based on the first target semantic region mask image, the second target semantic region mask image, and the transition region mask image;
processing the original high-quality image by using the non-transition region mask image to obtain a high-quality non-transition region;
respectively processing the original low-quality image and the original high-quality image by utilizing the transition region mask image to obtain a low-quality transition region and a high-quality transition region;
and carrying out weighted fusion on the low-quality transition region and the high-quality transition region, and combining a fusion result with the high-quality non-transition region to obtain the third low-quality image.
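The transition-region fusion above can be sketched as follows. This is a minimal single-channel NumPy illustration, not the patented implementation: it assumes the non-transition region mask is simply the complement 1 − S of the transition region mask, and that the weighting parameter alpha weights the low-quality pixels in the weighted fusion.

```python
import numpy as np

def third_low_quality(lr, hr, S, alpha):
    """Blend the transition region of lr and hr, keep the rest from hr.

    lr, hr : H x W arrays (original low-/high-quality images)
    S      : H x W binary transition-region mask
    alpha  : weight of the low-quality pixels inside the transition region
    Assumption: the non-transition mask is the complement 1 - S.
    """
    N = 1.0 - S                                        # non-transition mask
    fused = alpha * (S * lr) + (1.0 - alpha) * (S * hr)  # weighted transition fusion
    return fused + N * hr                              # combine with HQ non-transition region
```

When alpha is 0, the result is the original high-quality image; as alpha grows, only the transition region degrades toward the low-quality image.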
Optionally, the data enhancement result includes a plurality of enhancement image sets corresponding to different weighting parameters, and the step of inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model includes:
inputting each enhanced image set into the pre-constructed image quality enhancement model for training in ascending order of the weighting parameters, so as to obtain an image quality enhancement model to be adjusted.
Optionally, the method further comprises:
and under the condition that the weighting parameter corresponding to the enhanced image set of the input model is a preset maximum value, updating the enhanced image set of the input model by utilizing the original low-quality image, and adjusting the trained image quality enhanced model by utilizing the updated enhanced image set so as to minimize the loss function of the trained image quality enhanced model.
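The small-to-large weighting schedule can be sketched as a simple curriculum loop. This is a hedged NumPy sketch, not the actual training pipeline: the function names, the single mask, and the switch to the raw low-quality image once the weighting parameter reaches its preset maximum are illustrative assumptions.

```python
import numpy as np

def fuse_region(lr, hr, mask, alpha):
    """Degrade only the masked region: alpha weights the low-quality pixels."""
    return alpha * (mask * lr) + (1.0 - alpha) * (mask * hr) + (1.0 - mask) * hr

def curriculum_inputs(lr, hr, mask, alphas):
    """Yield (alpha, model_input) pairs of gradually decreasing quality.

    Once alpha reaches its preset maximum (1.0 here), the schedule switches
    to the original low-quality image for the final fine-tuning stage.
    """
    for alpha in sorted(alphas):
        if alpha >= 1.0:
            yield alpha, lr                # final stage: raw low-quality input
        else:
            yield alpha, fuse_region(lr, hr, mask, alpha)
```

Each yielded input would be fed to the image quality enhancement model with the original high-quality image as the label, so that the learning difficulty grows with the training steps.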
In a second aspect, the present invention provides an image quality enhancement apparatus, the apparatus comprising:
the semantic segmentation module is used for carrying out semantic segmentation on the original high-quality images in the original image set based on semantic priori of a preset service scene to obtain a semantic segmentation result;
the data enhancement module is used for carrying out data enhancement on the original low-quality images in the original image set according to the semantic segmentation result to obtain a data enhancement result;
the model training module is used for inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model;
and the image quality enhancement module is used for enhancing the image quality of the image in the preset service scene by utilizing the trained image quality enhancement model.
In a third aspect, the present invention provides an electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image quality enhancement method as described in the foregoing first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed, implements the image quality enhancement method according to the foregoing first aspect.
Compared with the prior art, the image quality enhancement method provided by the invention comprises the following steps: based on semantic priori of a preset service scene, performing semantic segmentation on an original high-quality image in an original image set to obtain a semantic segmentation result; according to the semantic segmentation result, carrying out data enhancement on the original low-quality images in the original image set to obtain a data enhancement result; inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model; and carrying out image quality enhancement on the images in the preset service scene by using the trained image quality enhancement model. The data for training the image quality enhancement model is obtained by carrying out semantic segmentation on the original high-quality image and then carrying out data enhancement on the original low-quality image by using the semantic segmentation result, so that the original low-quality image is prevented from being directly used as input data of the training image quality enhancement model, and the practical application effect of the trained image quality enhancement model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary diagram of training an image quality enhancement model in a conventional manner, according to an embodiment of the present invention;
fig. 2 is a flowchart of an image quality enhancement method according to an embodiment of the present invention;
FIG. 3 is a diagram of an exemplary training image quality enhancement model according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of an image quality enhancement device according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Icon: 100-an image quality enhancing device; 101-a semantic segmentation module; 102-a data enhancement module; 103-a model training module; 104-an image quality enhancement module; 200-an electronic device; 210-memory; 220-processor.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
As shown in fig. 1, in the conventional image quality enhancement method, an original low-quality image is directly input into an image quality enhancement model, a reconstructed image is output, and then a corresponding original high-quality image is used as a label to optimize model parameters, so as to obtain a final image quality enhancement model.
When the final image quality enhancement model is applied in an actual service scene, it can only process the whole image uniformly and cannot perform targeted enhancement on specific regions; as a result, the enhancement effect in the target region is poor, and it is difficult to meet the service requirements.
In order to improve the enhancement effect of image quality in a service scene, an embodiment of the present invention provides an image quality enhancement method, which will be described in detail below.
Referring to fig. 2, the image quality enhancement method includes steps S101 to S104.
S101, carrying out semantic segmentation on an original high-quality image in an original image set based on semantic priori of a preset service scene to obtain a semantic segmentation result.
The preset service scene can be an entertainment live broadcast scene, an urban traffic scene and the like, and a plurality of semantic areas can be segmented from the image by adopting a semantic segmentation model corresponding to the preset service scene, and a corresponding semantic area mask image is acquired.
For example, in an urban traffic scene, a semantic segmentation model corresponding to the traffic scene may be used, and a plurality of semantic regions such as pedestrian regions, vehicle regions, road regions, background regions, etc. may be segmented, and corresponding pedestrian region mask images, vehicle region mask images, road region mask images, and background region mask images may be acquired.
The original image set comprises a plurality of image pairs, each image pair comprises a frame of original low-quality image and a frame of original high-quality image, and the original low-quality image and the original high-quality image belonging to the same image pair have the same content. In order to ensure the accuracy of the semantic segmentation result, the original high-quality image containing more detail information can be selected for semantic segmentation.
By way of example, assume that the preset service scene is an entertainment live scene, in which the audience is typically most interested in the anchor, so the image of the scene may be divided into two semantic regions: the anchor portrait and the background. For the original high-quality image in each image pair contained in the original image set, a portrait segmentation model may be adopted to segment, according to the pixel at each position of the original high-quality image, an anchor portrait region mask image F ∈ R^(H×W) and a background region mask image B ∈ R^(H×W). The segmentation process may be as follows:
for a position (i, j) on the original high-quality image, if the pixel at that position belongs to the anchor portrait region, the pixel F(i, j) at that position in the anchor portrait region mask image F ∈ R^(H×W) is 1, and the pixel B(i, j) at that position in the background region mask image B ∈ R^(H×W) is 0.
Similarly, if the pixel at that position belongs to the background region, F(i, j) is 0 and B(i, j) is 1.
After the anchor portrait region mask image F ∈ R^(H×W) and the background region mask image B ∈ R^(H×W) are segmented, a corresponding transition region mask image S ∈ R^(H×W) needs to be generated for the transition region between the anchor portrait region and the background region.
For a position (i, j) on the original high-quality image, if the pixel at that position belongs to the transition region between the anchor portrait region and the background region, the pixel S(i, j) at that position in the transition region mask image S ∈ R^(H×W) is 1; conversely, if the pixel at that position does not belong to the transition region, S(i, j) is 0.
Gaussian blur may also be performed on the transition region mask image S ∈ R^(H×W) to expand the range of the boundary region: after the Gaussian blur, the pixel at each position whose value is higher than a set threshold ε is set to 1, and the pixels at the other positions are set to 0, to obtain the final transition region mask image S ∈ R^(H×W).
As can be seen from the above segmentation process, in the anchor portrait region mask image F ∈ R^(H×W), for the pixel F(i, j) at position (i, j), F(i, j) = 1 means that the position belongs to the anchor portrait region, and F(i, j) = 0 means that it does not.
In the background region mask image B ∈ R^(H×W), for the pixel B(i, j) at position (i, j), B(i, j) = 1 means that the position belongs to the background region, and B(i, j) = 0 means that it does not.
In the transition region mask image S ∈ R^(H×W), for the pixel S(i, j) at position (i, j), S(i, j) = 1 means that the position belongs to the transition region between the anchor portrait region and the background region, and S(i, j) = 0 means that it does not.
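The mask construction described above can be sketched in NumPy. This is an illustrative single-channel sketch, not the patented segmentation model: the portrait mask F is taken as given, the initial transition band is assumed to be where a blurred copy of F is fractional, and a simple box blur stands in for the Gaussian blur.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box blur (stand-in for the Gaussian blur in the text)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def transition_mask(F, eps=0.05, k=3):
    """Build a widened binary transition mask S from the portrait mask F.

    Pixels where the blurred portrait mask is neither ~0 nor ~1 straddle
    the portrait/background boundary; blurring that band again and
    thresholding at eps expands it, as described above.
    """
    blurred = box_blur(F.astype(float), k)
    boundary = (blurred > eps) & (blurred < 1.0 - eps)  # initial transition band
    S = box_blur(boundary.astype(float), k)             # expand the band
    return (S > eps).astype(np.uint8)                   # threshold back to binary
```

The background mask would simply be the complement B = 1 − F of the portrait mask for the two-region case described here.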
S102, carrying out data enhancement on the original low-quality images in the original image set according to the semantic segmentation result to obtain a data enhancement result.
Since the image in any service scene can be divided into at least two semantic regions, the emphasis of image quality enhancement often falls on the target semantic regions that are most relevant to the application of the image.
For example, an image of an urban traffic scene may be divided into a plurality of semantic regions such as pedestrian regions, vehicle regions, road regions and background regions; road safety mainly concerns pedestrians and vehicles, so the pedestrian regions and vehicle regions may serve as the target semantic regions for image quality enhancement.
For an entertainment live scene, the image may be divided into two semantic regions, the anchor portrait and the background, both of which may serve as target semantic regions for image quality enhancement.
In addition, in order to ensure the overall effect of the enhanced image, attention also needs to be paid to the transition regions between the target semantic regions.
In the embodiment of the invention, the semantic segmentation result comprises a first target semantic region mask image, a second target semantic region mask image and a transition region mask image.
And carrying out data enhancement on the original low-quality image in each image pair contained in the original image set by utilizing the first target semantic region mask image, the second target semantic region mask image and the transition region mask image to obtain a data enhancement result for training an image quality enhancement model.
The data enhancement results corresponding to the original low-quality images in each image pair contained in the original image set comprise a first low-quality image, a second low-quality image and a third low-quality image, wherein the first low-quality image has a clear first target semantic region and a second target semantic region meeting preset definition, the second low-quality image has a clear second target semantic region and a first target semantic region meeting preset definition, and the third low-quality image has a clear non-transition region and a transition region meeting preset definition.
The implementation procedure of step S102 will be described in detail based on the following settings.
Assume that the preset service scene is an entertainment live scene. Let the original low-quality image in each image pair contained in the original image set be I^LR ∈ R^(H×W), and the corresponding original high-quality image be I^HR ∈ R^(H×W).
From I^HR ∈ R^(H×W), the anchor portrait region mask image F ∈ R^(H×W) and the background region mask image B ∈ R^(H×W) are segmented, serving as the first target semantic region mask image and the second target semantic region mask image, respectively.
From I^HR ∈ R^(H×W), the transition region mask image S ∈ R^(H×W) for the transition region between the anchor portrait region and the background region is also segmented.
Step S102 may include substeps S102-1 through S102-3.
S102-1, fusing the original low-quality image and the original high-quality image based on the first target semantic region mask image to obtain a first low-quality image.
It will be appreciated that in an entertainment live scene, the first low quality image includes a clear presenter image area and a background area that meets a preset definition, the preset definition being related to weighting parameters at the time of the fusion process.
In a possible implementation, the implementation procedure of step S102-1 may be as follows:
S102-1a, processing the original high-quality image by using the first target semantic region mask image to obtain a high-quality first target semantic region.
For any position on the original high-quality image, if the pixel at that position in the first target semantic region mask image is 1, the pixel at that position in the original high-quality image is retained; if it is 0, the pixel is not retained. The positions of the retained pixels in the original high-quality image form the high-quality first target semantic region.
Understandably, in the entertainment live scene, the anchor portrait region mask image F ∈ R^(H×W) is used to process the original high-quality image I^HR ∈ R^(H×W) to obtain the high-quality anchor portrait region I_F^HR = F ⊙ I^HR, where ⊙ denotes pixel-wise multiplication.
S102-1b, respectively processing the original low-quality image and the original high-quality image by using the second target semantic region mask image to obtain a low-quality second target semantic region and a high-quality second target semantic region.
For any position on the original low-quality image, if the pixel at that position in the second target semantic region mask image is 1, the pixel at that position in the original low-quality image is retained; if it is 0, the pixel is not retained. The positions of the retained pixels in the original low-quality image form the low-quality second target semantic region.
It will be appreciated that in the entertainment live scene, the background region mask image B ∈ R^(H×W) is used to process the original low-quality image I^LR ∈ R^(H×W) to obtain the low-quality background region I_B^LR = B ⊙ I^LR.
For any position on the original high-quality image, if the pixel of the position on the second target semantic region mask image is 1, the pixel of the position on the original high-quality image is reserved, and if the pixel of the position on the second target semantic region mask image is 0, the pixel of the position on the original high-quality image is not reserved, and each position of the reserved pixel on the original high-quality image forms a high-quality second target semantic region.
It will be appreciated that in the entertainment live scene, the background region mask image B ∈ R^(H×W) is used to process the original high-quality image I^HR ∈ R^(H×W) to obtain the high-quality background region I_B^HR = B ⊙ I^HR.
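The pixel-wise masking in steps S102-1a and S102-1b can be sketched directly. A minimal single-channel NumPy illustration (real images would be H×W×3); the array values and mask layout are made up for demonstration.

```python
import numpy as np

def extract_region(image, mask):
    """mask ⊙ image: keep pixels where mask == 1, zero the rest."""
    return mask * image

hr = np.arange(16.0).reshape(4, 4)       # stand-in high-quality image
lr = np.zeros((4, 4))                    # stand-in low-quality image
F = np.zeros((4, 4)); F[1:3, 1:3] = 1.0  # portrait-region mask (hypothetical)
B = 1.0 - F                              # background mask is the complement

hr_portrait = extract_region(hr, F)      # high-quality portrait region
hr_background = extract_region(hr, B)    # high-quality background region
lr_background = extract_region(lr, B)    # low-quality background region

# complementary masks tile the image exactly
assert np.allclose(hr_portrait + hr_background, hr)
```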
S102-1c, carrying out weighted fusion on the low-quality second target semantic region and the high-quality second target semantic region, and combining the fusion result with the high-quality first target semantic region to obtain a first low-quality image.
In the entertainment live scene, the first low-quality image I_1^LR, the high-quality anchor portrait region I_F^HR, the low-quality background region I_B^LR and the high-quality background region I_B^HR satisfy the following formula:
I_1^LR = I_F^HR + α · I_B^LR + (1 − α) · I_B^HR
where α ∈ [0, 1] is the weighting parameter.
It will be appreciated that when α = 0, the above formula degenerates to I_1^LR = I_F^HR + I_B^HR = I^HR, that is, both the anchor portrait region and the background region of the first low-quality image come from the original high-quality image; correspondingly, during subsequent model training, the model can learn the identity mapping.
When α = 1, the formula degenerates to I_1^LR = I_F^HR + I_B^LR, that is, the background region of the original high-quality image is entirely replaced by the background region of the original low-quality image.
When 0 < α < 1, the background region of the first low-quality image is the weighted sum of the background region of the original low-quality image and that of the original high-quality image.
Obviously, as the value of α gradually increases from 0 to 1, the image quality of the background region of the first low-quality image gradually decreases; correspondingly, during model training, the learning difficulty gradually increases with the number of training steps, and after the input quality reaches that of the original low-quality image, the trained model converges.
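The composition of the first low-quality image can be sketched from the formula above. A hedged single-channel NumPy sketch; the masks and pixel values are illustrative.

```python
import numpy as np

def first_low_quality(lr, hr, F, B, alpha):
    """I1 = F ⊙ I_HR + alpha * (B ⊙ I_LR) + (1 - alpha) * (B ⊙ I_HR)."""
    return F * hr + alpha * (B * lr) + (1.0 - alpha) * (B * hr)

hr = np.full((2, 2), 2.0); lr = np.zeros((2, 2))
F = np.array([[1.0, 0.0], [0.0, 1.0]]); B = 1.0 - F

# alpha = 0 degenerates to the original high-quality image (identity mapping)
assert np.allclose(first_low_quality(lr, hr, F, B, 0.0), hr)

# alpha = 1: portrait from the HQ image, background fully from the LQ image
out = first_low_quality(lr, hr, F, B, 1.0)
assert out[0, 0] == 2.0 and out[0, 1] == 0.0
```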
S102-2, fusing the original low-quality image and the original high-quality image based on the second target semantic region mask image to obtain a second low-quality image.
It will be appreciated that in an entertainment live scene, the second low quality image includes a clear background region and a presenter region that meets a preset definition, the preset definition being related to weighting parameters at the time of the fusion process.
In a possible implementation, the implementation procedure of step S102-2 may be as follows:
s102-2a, processing the original high-quality image by using the mask image of the second target semantic region to obtain a high-quality second target semantic region.
For any position on the original high-quality image, if the pixel of the position on the second target semantic region mask image is 1, the pixel of the position on the original high-quality image is reserved, and if the pixel of the position on the second target semantic region mask image is 0, the pixel of the position on the original high-quality image is not reserved, and each position of the reserved pixel on the original high-quality image forms a high-quality second target semantic region.
It will be appreciated that in an entertainment live scene, the background region mask image B ∈ R^{H×W} is used to process the original high-quality image I_HR ∈ R^{H×W} to obtain the high-quality background region.
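The mask-based extraction above can be sketched in a few lines of NumPy; the array shapes and values below are illustrative, not from the patent:

```python
import numpy as np

# Illustrative 4x4 single-channel image and binary background mask
# (the names B and I_HR follow the notation above; values are made up).
H, W = 4, 4
I_HR = np.arange(H * W, dtype=np.float32).reshape(H, W)  # original high-quality image
B = np.zeros((H, W), dtype=np.float32)
B[:, :2] = 1.0  # left half of the frame marked as background

# A pixel is retained where the mask is 1 and zeroed out where it is 0;
# the retained pixels form the high-quality background region.
high_quality_background = B * I_HR
```

Multiplying by the binary mask keeps the region of interest and removes everything else in a single vectorized operation.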
S102-2b, the original low-quality image and the original high-quality image are respectively processed by using the mask image of the first target semantic region, so that a low-quality first target semantic region and a high-quality first target semantic region are obtained.
For any position on the original low-quality image, the pixel at that position is retained if the corresponding pixel in the first target semantic region mask image is 1, and discarded if it is 0; the positions of the retained pixels together form the low-quality first target semantic region.
Understandably, in an entertainment live scene, the anchor portrait region mask image F ∈ R^{H×W} is used to process the original low-quality image I_LR ∈ R^{H×W} to obtain the low-quality anchor portrait region.
For any position on the original high-quality image, the pixel at that position is retained if the corresponding pixel in the first target semantic region mask image is 1, and discarded if it is 0; the positions of the retained pixels together form the high-quality first target semantic region.
Understandably, in an entertainment live scene, the anchor portrait region mask image F ∈ R^{H×W} is used to process the original high-quality image I_HR ∈ R^{H×W} to obtain the high-quality anchor portrait region.
S102-2c, carrying out weighted fusion on the low-quality first target semantic region and the high-quality first target semantic region, and combining the fusion result with the high-quality second target semantic region to obtain a second low-quality image.
In an entertainment live scene, the second low-quality image I_LR^(2), the high-quality background region, the low-quality anchor portrait region, and the high-quality anchor portrait region satisfy the following formula (where ⊙ denotes element-wise masking):

I_LR^(2) = B ⊙ I_HR + α · (F ⊙ I_LR) + (1 − α) · (F ⊙ I_HR)

where α ∈ [0,1] denotes the weighting parameter.
It will be appreciated that when α=0, both the anchor portrait region and the background region on the second low-quality image are from the original high-quality image, and accordingly, the model can learn an identity map when model training is performed subsequently;
when α=1, the anchor portrait region in the original high-quality image is entirely replaced by the anchor portrait region of the original low-quality image.
When 0 < α < 1, the anchor portrait region on the second low-quality image is obtained by a weighted summation of the anchor portrait region of the original low-quality image and that of the original high-quality image.
Obviously, as the value of α gradually increases from 0 to 1, the image quality of the anchor portrait region on the second low-quality image gradually decreases; accordingly, during model training, the learning difficulty of the model increases gradually with the number of training steps, so that the trained model converges once the image quality of the original low-quality image is reached.
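As a sanity check on the weighting behavior described above, the construction of the second low-quality image can be sketched as follows; the helper name `second_low_quality` and the random toy images are assumptions for illustration:

```python
import numpy as np

# Complementary binary masks: F marks the anchor portrait region, B = 1 - F
# marks the background region (toy 4x4 images filled with random values).
rng = np.random.default_rng(0)
H, W = 4, 4
I_HR = rng.random((H, W)).astype(np.float32)  # original high-quality image
I_LR = rng.random((H, W)).astype(np.float32)  # original low-quality image
F = np.zeros((H, W), dtype=np.float32)
F[1:3, 1:3] = 1.0
B = 1.0 - F

def second_low_quality(alpha: float) -> np.ndarray:
    """Clear background from I_HR plus an alpha-weighted anchor portrait region."""
    return B * I_HR + alpha * (F * I_LR) + (1.0 - alpha) * (F * I_HR)

# alpha = 0 reproduces the high-quality image (the identity-mapping case);
# alpha = 1 swaps in the low-quality portrait region wholesale.
```

Because the two masks partition the frame, the α=0 output equals I_HR exactly, matching the identity-mapping observation above.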
S102-3, fusing the original low-quality image and the original high-quality image based on the transition region mask image to obtain a third low-quality image.
It will be appreciated that in an entertainment live scene, the third low-quality image includes a clear non-transition region and a transition region (the region between the anchor portrait region and the background region) that meets a preset definition, where the preset definition is related to the weighting parameter used in the fusion process.
In a possible implementation, the implementation procedure of step S102-3 may be as follows:
s102-3a, generating a non-transition region mask image based on the first target semantic region mask image, the second target semantic region mask image and the transition region mask image.
In an entertainment live scene, the non-transition region mask image can be expressed as F + B − S: the anchor portrait region mask image F ∈ R^{H×W} is first combined with the background region mask image B ∈ R^{H×W}, and the transition region mask image S ∈ R^{H×W} is then removed from the combined result.
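With binary masks stored as arrays, the F + B − S composition is a one-liner; the clipping below is a defensive assumption in case the portrait and background masks overlap:

```python
import numpy as np

# Toy binary masks on a 4-pixel-wide frame: portrait on the left,
# background on the right, a 2-pixel transition band straddling the border.
H, W = 4, 4
F = np.zeros((H, W), dtype=np.float32); F[:, :2] = 1.0   # anchor portrait mask
B = np.zeros((H, W), dtype=np.float32); B[:, 2:] = 1.0   # background mask
S = np.zeros((H, W), dtype=np.float32); S[:, 1:3] = 1.0  # transition band

# Combine portrait and background, then remove the transition band.
non_transition = np.clip(F + B - S, 0.0, 1.0)
```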
S102-3b, processing the original high-quality image by using the non-transition region mask image to obtain a high-quality non-transition region.
For any position on the original high-quality image, the pixel at that position is retained if the corresponding pixel in the non-transition region mask image is 1, and discarded if it is 0; the positions of the retained pixels together form the high-quality non-transition region.
It will be appreciated that in an entertainment live scene, the non-transition region mask image F + B − S is used to process the original high-quality image I_HR ∈ R^{H×W} to obtain the high-quality non-transition region.
S102-3c, respectively processing the original low-quality image and the original high-quality image by using the transition region mask image to obtain a low-quality transition region and a high-quality transition region.
For any position on the original low-quality image, the pixel at that position is retained if the corresponding pixel in the transition region mask image is 1, and discarded if it is 0; the positions of the retained pixels together form the low-quality transition region.
It will be appreciated that in an entertainment live scene, the transition region mask image S ∈ R^{H×W} is used to process the original low-quality image I_LR ∈ R^{H×W} to obtain the low-quality transition region.
For any position on the original high-quality image, the pixel at that position is retained if the corresponding pixel in the transition region mask image is 1, and discarded if it is 0; the positions of the retained pixels together form the high-quality transition region.
It will be appreciated that in an entertainment live scene, the transition region mask image S ∈ R^{H×W} is used to process the original high-quality image I_HR ∈ R^{H×W} to obtain the high-quality transition region.
And S102-3d, carrying out weighted fusion on the low-quality transition region and the high-quality transition region, and combining the fusion result with the high-quality non-transition region to obtain a third low-quality image.
In the entertainment live scene, the third low-quality image I_LR^(3), the high-quality non-transition region, the low-quality transition region, and the high-quality transition region satisfy the following formula (where ⊙ denotes element-wise masking):

I_LR^(3) = (F + B − S) ⊙ I_HR + α · (S ⊙ I_LR) + (1 − α) · (S ⊙ I_HR)

where α ∈ [0,1] denotes the weighting parameter.
It will be appreciated that when α=0, both the transition and non-transition regions on the third low quality image are from the original high quality image, and accordingly, the model can learn an identity map when model training is performed subsequently;
When α=1, the transition region in the original high-quality image is entirely replaced by the transition region of the original low-quality image.
When 0 < α < 1, the transition region on the third low-quality image is obtained by a weighted summation of the transition region of the original low-quality image and that of the original high-quality image.
Obviously, as the value of α gradually increases from 0 to 1, the image quality of the transition region on the third low-quality image gradually decreases; accordingly, during model training, the learning difficulty of the model increases gradually with the number of training steps, so that the trained model converges once the image quality of the original low-quality image is reached.
The following continues the description of steps S103 and S104 in fig. 2.
And S103, inputting the data enhancement result into a pre-constructed image quality enhancement model for training, and obtaining a trained image quality enhancement model.
The data enhancement result comprises a plurality of enhanced image sets corresponding to different weighting parameters, where the weighting parameter is the α described above.
Assuming that the preset service scene is an entertainment live scene, let the original low-quality image in each image pair of the original image set be I_LR ∈ R^{H×W} and the corresponding original high-quality image be I_HR ∈ R^{H×W}. The data enhancement result obtained in steps S101 to S102 contains several enhanced image sets, and each enhanced image set corresponds to a different value of α.
Let the reconstructed image output by the image quality enhancement model be I_SR ∈ R^{H×W}, and let the loss function of the image quality enhancement model be L.
The training targets of the image quality enhancement model are:
min L(I_HR, I_SR) = min L(I_HR, Φ(Θ(I_LR)))
where Φ represents the image quality enhancement model and Θ(I_LR) represents the data enhancement of the original low-quality image.
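The objective above can be made concrete with stand-in callables; L is taken to be mean squared error here purely for illustration — the patent does not fix a specific loss:

```python
import numpy as np

def L(hr: np.ndarray, sr: np.ndarray) -> float:
    """Stand-in loss: mean squared error between ground truth and output."""
    return float(np.mean((hr - sr) ** 2))

# Placeholder Theta (data enhancement) and Phi (enhancement model); a real
# setup would substitute the fusion pipeline and a trained network.
Theta = lambda img: img
Phi = lambda img: img

I_HR = np.ones((2, 2), dtype=np.float32)
I_LR = np.zeros((2, 2), dtype=np.float32)
loss = L(I_HR, Phi(Theta(I_LR)))  # the quantity minimized during training
```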
In a possible implementation, step S103 includes substeps S103-1 through S103-2.
S103-1, sequentially inputting each enhanced image set into a pre-constructed image quality enhancement model for training, in order of the weighting parameters from small to large, to obtain a trained image quality enhancement model.
It will be appreciated that the multiple enhanced image sets are sorted by the value of α from 0 to 1 and sequentially input into the pre-constructed image quality enhancement model for training.
Because the image quality of the sequentially input enhanced image sets gradually decreases toward that of the original low-quality image, the learning difficulty of the image quality enhancement model increases gradually; when the image quality of the original low-quality image is reached, i.e. α=1, the resulting trained image quality enhancement model has converged.
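The small-to-large ordering of step S103-1 amounts to sorting the enhanced image sets by α before feeding them to the trainer; `train_one_stage` below is a placeholder, not an API from the patent:

```python
# Enhanced image sets keyed by their weighting parameter alpha
# (string labels stand in for the actual image data).
enhanced_sets = {0.75: "set_c", 0.0: "set_a", 1.0: "set_d", 0.25: "set_b"}

training_order = []

def train_one_stage(image_set):
    # Stand-in for one curriculum stage of model training.
    training_order.append(image_set)

# Curriculum: easiest sets first (small alpha, quality closest to I_HR).
for alpha in sorted(enhanced_sets):
    train_one_stage(enhanced_sets[alpha])
```

Sorting by α realizes the easy-to-hard curriculum: the model first sees nearly clean data, then progressively degraded data.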
S103-2, under the condition that the weighting parameter corresponding to the enhanced image set of the input model is a preset maximum value, updating the enhanced image set of the input model by utilizing the original low-quality image, and adjusting the trained image quality enhanced model by utilizing the updated enhanced image set so as to minimize the loss function of the trained image quality enhanced model.
The preset maximum value may be 1. When the weighting parameter α=1, the enhanced image set corresponding to this weighting parameter is updated with the original low-quality images, yielding the updated image set.
Since the trained image quality enhancement model has converged, the model is fine-tuned by controlling the occurrence ratio of the original low-quality image I_LR ∈ R^{H×W} in the training data, so that the loss function satisfies min L(I_HR, I_SR) = min L(I_HR, Φ(Θ(I_LR))).
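One plausible reading of "controlling the occurrence ratio" in step S103-2 is to sample each fine-tuning example from the original low-quality pool with a fixed probability; the function name and the 0.5 ratio below are assumptions for illustration:

```python
import random

def build_finetune_batch(original_lr, enhanced_alpha1, batch_size, ratio=0.5):
    """Mix original low-quality samples into the alpha=1 enhanced set."""
    batch = []
    for _ in range(batch_size):
        # With probability `ratio`, draw from the original low-quality pool.
        pool = original_lr if random.random() < ratio else enhanced_alpha1
        batch.append(random.choice(pool))
    return batch

random.seed(42)  # deterministic for illustration
batch = build_finetune_batch(["lr_0", "lr_1"], ["enh_0", "enh_1"], batch_size=8)
```

Raising `ratio` exposes the converged model to more genuine low-quality inputs during the fine-tuning stage.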
Compared with the existing image quality model training process shown in fig. 1, the training process in the embodiment of the invention differs in that, for any original low-quality image, a semantic segmentation model is applied to the corresponding original high-quality image to segment out different semantic regions and a transition region, and these regions are then combined to different degrees to enhance the data of the original low-quality image. As shown in fig. 3, in an entertainment live scene, a clear portrait region is combined with a blurred background region, a blurred portrait region is combined with a clear background region, and a blurred transition region is combined with a clear non-transition region.
By adjusting the difficulty of data enhancement, the difficulty of model learning is increased gradually, which avoids the non-convergence caused by an excessive initial difficulty; throughout the process, the model progressively learns to process and optimize different semantic regions, so that the enhancement effect on the target region is optimized in a targeted manner.
S104, utilizing the trained image quality enhancement model to enhance the image quality of the image in the preset service scene.
In the entertainment live broadcast scene, because semantic priors highly relevant to the entertainment live broadcast scene are introduced during training of the image quality enhancement model, the trained image quality enhancement model can specifically enhance the anchor portrait region and the background region in the live scene while preserving the transitional details between the semantic regions.
In order to perform the corresponding steps in the above method embodiments and various possible implementations, an implementation of the image quality enhancement apparatus 100 is given below.
Referring to fig. 4, the image quality enhancement apparatus 100 includes a semantic segmentation module 101, a data enhancement module 102, a model training module 103, and an image quality enhancement module 104.
The semantic segmentation module 101 is configured to perform semantic segmentation on an original high-quality image in the original image set based on a semantic priori of a preset service scene, so as to obtain a semantic segmentation result.
The data enhancement module 102 is configured to perform data enhancement on the original low-quality images in the original image set according to the semantic segmentation result, so as to obtain a data enhancement result.
The model training module 103 is configured to input the data enhancement result into a pre-constructed image quality enhancement model for training, and obtain a trained image quality enhancement model.
The image quality enhancement module 104 is configured to enhance the image quality of the image in the preset service scene by using the trained image quality enhancement model.
Optionally, the semantic segmentation result includes a first target semantic region mask image, a second target semantic region mask image, and a transition region mask image, and the data enhancement module 102 is specifically configured to fuse the original low-quality image and the original high-quality image based on the first target semantic region mask image to obtain a first low-quality image, where the first low-quality image has a clear semantic region; based on the second target semantic region mask image, fusing the original low-quality image and the original high-quality image to obtain a second low-quality image, wherein the second low-quality image has a clear background region; and fusing the original low-quality image and the original high-quality image based on the transition region mask image to obtain a third low-quality image, wherein the third low-quality image has a transition region meeting the preset definition, and the data enhancement result comprises a first low-quality image, a second low-quality image and a third low-quality image.
Optionally, the data enhancement module 102 is configured to, when configured to fuse the original low-quality image and the original high-quality image based on the first target semantic region mask image to obtain the first low-quality image, specifically process the original high-quality image by using the first target semantic region mask image to obtain a high-quality first target semantic region; respectively processing the original low-quality image and the original high-quality image by using the mask image of the second target semantic region to obtain a low-quality second target semantic region and a high-quality second target semantic region; and carrying out weighted fusion on the low-quality second target semantic region and the high-quality second target semantic region, and combining the fusion result with the high-quality first target semantic region to obtain a first low-quality image.
Optionally, the data enhancement module 102 is configured to, when configured to fuse the original low-quality image and the original high-quality image based on the second target semantic region mask image to obtain the second low-quality image, specifically process the original high-quality image by using the second target semantic region mask image to obtain a high-quality second target semantic region; respectively processing an original low-quality image and an original high-quality image by using the first target semantic region mask image to obtain a low-quality first target semantic region and a high-quality first target semantic region; and carrying out weighted fusion on the low-quality first target semantic region and the high-quality first target semantic region, and combining the fusion result with the high-quality second target semantic region to obtain a second low-quality image.
Optionally, the data enhancement module 102 is configured to, when configured to fuse the original low-quality image and the original high-quality image based on the transition region mask image to obtain the third low-quality image, specifically configured to generate a non-transition region mask image based on the first target semantic region mask image, the second target semantic region mask image, and the transition region mask image; processing the original high-quality image by using the non-transition region mask image to obtain a high-quality non-transition region; respectively processing an original low-quality image and an original high-quality image by using the transition region mask image to obtain a low-quality transition region and a high-quality transition region; and carrying out weighted fusion on the low-quality transition region and the high-quality transition region, and combining the fusion result with the high-quality non-transition region to obtain a third low-quality image.
Optionally, the data enhancement result includes a plurality of enhancement image sets corresponding to different weighting parameters, and the model training module 103 is specifically configured to sequentially input each enhancement image set into a pre-constructed image quality enhancement model for training according to the weighting parameters from small to large, so as to obtain a trained image quality enhancement model.
Optionally, the model training module 103 is further specifically configured to update the enhanced image set of the input model with the original low-quality image and adjust the trained image quality enhancement model with the updated enhanced image set to minimize a loss function of the trained image quality enhancement model when the weighting parameter corresponding to the enhanced image set of the input model is a preset maximum value.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the image quality enhancement apparatus 100 described above may refer to the corresponding process in the foregoing method embodiment, and will not be described herein again.
Further, referring to fig. 5, the electronic device 200 may include a memory 210 and a processor 220.
The processor 220 may be a general-purpose central processing unit (Central Processing Unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the image quality enhancement method provided in the above method embodiment.
The memory 210 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 210 may be standalone and coupled to the processor 220 via a communication bus, or may be integrated with the processor 220. The memory 210 is used to store machine-executable instructions for performing aspects of the present application. The processor 220 is operative to execute the machine-executable instructions stored in the memory 210 to implement the method embodiments described above.
The embodiment of the present application also provides a computer-readable storage medium containing a computer program, which when executed can be used to perform the related operations in the image quality enhancement method provided by the above-mentioned method embodiment.
The present invention is not limited to the above embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method for enhancing image quality, the method comprising:
based on semantic priori of a preset service scene, performing semantic segmentation on an original high-quality image in an original image set to obtain a semantic segmentation result;
according to the semantic segmentation result, carrying out data enhancement on the original low-quality images in the original image set to obtain a data enhancement result;
inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model;
and carrying out image quality enhancement on the images in the preset service scene by using the trained image quality enhancement model.
2. The method of claim 1, wherein the semantic segmentation result includes a first target semantic region mask image, a second target semantic region mask image, and a transition region mask image, and the step of performing data enhancement on the original low-quality images in the original image set according to the semantic segmentation result to obtain a data enhancement result includes:
based on the first target semantic region mask image, fusing the original low-quality image and the original high-quality image to obtain a first low-quality image, wherein the first low-quality image has a clear first target semantic region and a second target semantic region meeting preset definition;
based on the mask image of the second target semantic region, fusing the original low-quality image and the original high-quality image to obtain a second low-quality image, wherein the second low-quality image has a clear second target semantic region and a first target semantic region meeting the preset definition;
based on the transition region mask image, fusing the original low-quality image and the original high-quality image to obtain a third low-quality image, wherein the third low-quality image has a clear non-transition region and a transition region meeting the preset definition;
the data enhancement result includes the first low-quality image, the second low-quality image, and the third low-quality image.
3. The method of claim 2, wherein the step of fusing the original low-quality image and the original high-quality image based on the first target semantic region mask image to obtain a first low-quality image comprises:
processing the original high-quality image by using the first target semantic region mask image to obtain a high-quality first target semantic region;
respectively processing the original low-quality image and the original high-quality image by using the mask image of the second target semantic region to obtain a low-quality second target semantic region and a high-quality second target semantic region;
and carrying out weighted fusion on the low-quality second target semantic region and the high-quality second target semantic region, and combining a fusion result with the high-quality first target semantic region to obtain the first low-quality image.
4. The method of claim 2, wherein the step of fusing the original low-quality image and the original high-quality image based on the second target semantic region mask image to obtain a second low-quality image further comprises:
processing the original high-quality image by using the mask image of the second target semantic region to obtain a high-quality second target semantic region;
respectively processing the original low-quality image and the original high-quality image by using the first target semantic region mask image to obtain a low-quality first target semantic region and a high-quality first target semantic region;
and carrying out weighted fusion on the low-quality first target semantic region and the high-quality first target semantic region, and combining a fusion result with the high-quality second target semantic region to obtain the second low-quality image.
5. The method of claim 2, wherein the step of fusing the original low-quality image and the original high-quality image based on the transition region mask image to obtain a third low-quality image further comprises:
generating a non-transition region mask image based on the first target semantic region mask image, the second target semantic region mask image, and the transition region mask image;
processing the original high-quality image by using the non-transition region mask image to obtain a high-quality non-transition region;
respectively processing the original low-quality image and the original high-quality image by utilizing the transition region mask image to obtain a low-quality transition region and a high-quality transition region;
and carrying out weighted fusion on the low-quality transition region and the high-quality transition region, and combining a fusion result with the high-quality non-transition region to obtain the third low-quality image.
6. The method of claim 1, wherein the data enhancement result includes a plurality of enhanced image sets corresponding to different weighting parameters, and the step of inputting the data enhancement result into a pre-constructed image quality enhancement model for training, and obtaining the trained image quality enhancement model includes:
and sequentially inputting each enhanced image set into a pre-constructed image quality enhancement model for training, in order of the weighting parameters from small to large, to obtain a trained image quality enhancement model.
7. The method of claim 6, wherein the method further comprises:
and under the condition that the weighting parameter corresponding to the enhanced image set of the input model is a preset maximum value, updating the enhanced image set of the input model by utilizing the original low-quality image, and adjusting the trained image quality enhanced model by utilizing the updated enhanced image set so as to minimize the loss function of the trained image quality enhanced model.
8. An image quality enhancement device, the device comprising:
the semantic segmentation module is used for carrying out semantic segmentation on the original high-quality images in the original image set based on semantic priori of a preset service scene to obtain a semantic segmentation result;
the data enhancement module is used for carrying out data enhancement on the original low-quality images in the original image set according to the semantic segmentation result to obtain a data enhancement result;
the model training module is used for inputting the data enhancement result into a pre-constructed image quality enhancement model for training to obtain a trained image quality enhancement model;
And the image quality enhancement module is used for enhancing the image quality of the image in the preset service scene by utilizing the trained image quality enhancement model.
9. An electronic device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the image quality enhancement method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed, implements the image quality enhancement method according to any one of claims 1 to 7.
CN202310892867.1A 2023-07-19 2023-07-19 Image quality enhancement method and device, electronic equipment and storage medium Pending CN116894801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310892867.1A CN116894801A (en) 2023-07-19 2023-07-19 Image quality enhancement method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310892867.1A CN116894801A (en) 2023-07-19 2023-07-19 Image quality enhancement method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116894801A true CN116894801A (en) 2023-10-17

Family

ID=88310522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310892867.1A Pending CN116894801A (en) 2023-07-19 2023-07-19 Image quality enhancement method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116894801A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117575969A (en) * 2023-10-31 2024-02-20 广州成至智能机器科技有限公司 Infrared image quality enhancement method and device, electronic equipment and storage medium
CN117575969B (en) * 2023-10-31 2024-05-07 广州成至智能机器科技有限公司 Infrared image quality enhancement method and device, electronic equipment and storage medium
CN117422855A (en) * 2023-12-19 2024-01-19 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN117422855B (en) * 2023-12-19 2024-05-03 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination