CN117745544A - Image super-resolution method of skin detector

Image super-resolution method of skin detector

Info

Publication number
CN117745544A
CN117745544A
Authority
CN
China
Prior art keywords
image
discriminator
generator
layer
loss
Prior art date
Legal status
Pending
Application number
CN202311763736.XA
Other languages
Chinese (zh)
Inventor
孙建华 (Sun Jianhua)
Current Assignee
Beijing Adss Development Co ltd
Original Assignee
Beijing Adss Development Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Adss Development Co ltd filed Critical Beijing Adss Development Co ltd
Priority to CN202311763736.XA
Publication of CN117745544A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image super-resolution method for a skin detector, relating to the technical field of image processing. The method processes low-resolution images captured by the skin detector with a deep learning model based on a generative adversarial network, constructing a high-order degradation model, a generator model, and a discriminator model, and optimizes the visual quality of the generated images during training through a designed combination of a pixel-level loss function, a perceptual loss function, and an adversarial loss function.

Description

Image super-resolution method of skin detector
Technical Field
The invention relates to the technical field of image processing, in particular to an image super-resolution method of a skin detector.
Background
In dermatology and the cosmetics industry, accurate assessment of skin condition using skin detectors is critical to providing personalized care and treatment. Capturing skin images with clearly visible detail generally requires a high-resolution camera, but such cameras are expensive, which limits their use in low-cost skin detection equipment. Many devices therefore adopt inexpensive low-resolution cameras and rely on super-resolution technology to raise the image resolution.
However, conventional super-resolution techniques (such as bilinear or bicubic interpolation) often fail to recover high-frequency details such as skin texture and subtle variations, especially when complex skin textures are processed. These methods are also limited in preserving image realism, frequently producing distortion and blurring, and they tend to amplify noise during upscaling, particularly in low-quality raw images. This further degrades image quality and causes the reconstructed high-resolution image to deviate from the actual skin condition, thereby affecting accurate assessment of the skin.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides an image super-resolution method for a skin detector, addressing the low detection efficiency of skin detectors caused by the poor quality with which existing super-resolution techniques process low-quality raw images.
In order to achieve the above purpose, the invention is realized by the following technical scheme: the image super-resolution method of the skin detector specifically comprises the following steps:
S1, constructing a high-order degradation model: repeating the degradation process multiple times to simulate how low-resolution images degrade in the real world;
S2, constructing a generator model: building a deep network for image super-resolution through residual learning, where the generator comprises several residual dense blocks (RRDB) and an improved network layer design;
S3, constructing a discriminator model: built with a U-Net architecture and spectral normalization, and used to distinguish generated images from real images;
S4, adversarial training of the generator and discriminator: the generator produces a high-resolution image while the discriminator distinguishes real images from generated ones, forming an adversarial training relationship for deep learning;
S5, designing a loss function to optimize image quality: a combined loss function comprising pixel-level loss, perceptual loss, and adversarial loss is used during training to optimize the visual quality of the generated image.
Preferably, the degradation model in S1 is designed with the following mathematical formula:
x = D(y) = [(y * k)↓r + n]_JPEG
where y is the original high-resolution image, x is the degraded low-resolution image, k is the blur kernel, ↓r is a downsampling operation with factor r, n is added noise, and [·]_JPEG denotes JPEG compression. The high-order degradation model repeatedly applies this degradation process to simulate the image degradation that occurs during real-world transmission and processing.
Preferably, the generator in S2 is constructed from multiple residual dense blocks (RRDB), each of which is a network combining several convolutional layers and activation layers; the generated high-resolution image can be expressed as ŷ = G(x).
The output of the generator is the result of the input x after repeated residual learning and enhancement, and the calculation formula of the residual block is:
R(x) = F(x) + x
where x is the input of the residual block and F(x) is a combination of network layers; the output module of the generator is built by repeatedly learning the residual between the input x and the output R(x) and applying further enhancement.
Preferably, F(x) comprises two or more convolutional layers together with a normalization layer and an activation layer; the operations of a residual block with two convolutional layers are as follows:
First-layer convolution: F1(x) = ReLU(BN(Conv(x)))
Second-layer convolution: F2(x) = BN(Conv(F1(x)))
Residual learning: R(x) = F2(x) + x
where Conv denotes the convolution operation, BN denotes batch normalization, and ReLU is the activation function. Through these operations, the residual block allows information to skip some layers in the network directly.
Preferably, the U-Net structure in S3 is used to capture global and local features of the image; each convolutional layer in the discriminator can be expressed as:
C_i(I) = ReLU(BN(Conv(I)))
where Conv denotes a convolution operation, BN denotes batch normalization, ReLU is the activation function, I is the input image, and C_i is the output of the i-th convolutional layer. After the multi-layer convolution operations, the discriminator outputs a scalar value representing the probability that the image is judged to be real, calculated as:
D(I) = Sigmoid(FC(C_n(I)))
where FC is a fully-connected layer and C_n is the output of the last convolutional layer; the Sigmoid function compresses the output scalar to between 0 and 1, representing a probability.
Preferably, the discriminator in S3, stabilized by the spectral normalization technique, is trained with a discriminator loss to distinguish real images from generated images, expressed as:
L_D = -log D(T(x)) - log(1 - D(G(x)))
where T(x) is the true high-resolution image and G(x) is the generated image.
Preferably, the pixel-level loss (typically an L1 or L2 loss) directly calculates the difference between the generated image and the target high-resolution image at the pixel level. The L1 loss is expressed as:
L1 = (1 / (W·H)) Σ_{x,y} |G(x, y) - T(x, y)|
where W and H are the width and height of the image, the sum runs over all positions x = 1..W and y = 1..H, G(x, y) is the pixel value of the generated image at location (x, y), and T(x, y) is the pixel value of the target high-resolution image at location (x, y).
Preferably, the perceptual loss in S5 uses a VGG network to extract image features and calculate the difference between the generated image and the target image in feature space, expressed as:
L_perc = (1 / (W_i·H_i)) Σ_{x,y} (φ_i(G)_{x,y} - φ_i(T)_{x,y})²
where φ_i denotes the feature map of the i-th layer of the network, and W_i and H_i are the dimensions of the i-th layer feature map.
Preferably, the adversarial loss encourages the generator in S5 to produce images that the discriminator finds difficult to distinguish, expressed as:
L_adv = -log D(G(x))
where D is the discriminator and G(x) is the image generated by the generator.
Preferably, the adversarial training in S4 is based on a deep learning algorithm, performing model training and evaluation of the training output.
The invention provides an image super-resolution method for a skin detector with the following beneficial effects compared with the prior art: by constructing a high-order degradation model, a generator model, and a discriminator model, a deep learning model based on a generative adversarial network (GAN) processes the low-resolution images captured by the skin detector; the advanced GAN architecture effectively improves image resolution while preserving image realism, reducing noise, and optimizing practical performance.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention.
FIG. 2 is a flow chart of data preparation, data processing, and training in accordance with the present invention.
FIG. 3 is a flow chart of the training of the generator and discriminator in the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making creative efforts based on the embodiments of the present invention are included in the protection scope of the present invention.
Referring to FIGS. 1-3, the embodiment of the invention provides a technical scheme: the image super-resolution method of the skin detector specifically comprises the following steps:
S1, constructing a high-order degradation model: repeating the degradation process multiple times to simulate how low-resolution images degrade in the real world; the high-order degradation model is a model for simulating complex real-world degradation processes;
S2, constructing a generator model: building a deep network for image super-resolution through residual learning, where the generator comprises several residual dense blocks (RRDB) and an improved network layer design;
S3, constructing a discriminator model: built with a U-Net architecture and spectral normalization, and used to distinguish generated images from real images;
S4, adversarial training of the generator and discriminator: the generator produces a high-resolution image while the discriminator distinguishes real images from generated ones, forming an adversarial training relationship for deep learning;
S5, designing a loss function: a combined loss function comprising pixel-level loss, perceptual loss, and adversarial loss is used during training to optimize the visual quality of the generated image.
In this embodiment, the degradation model in S1 is designed with the following mathematical formula:
x = D(y) = [(y * k)↓r + n]_JPEG
where y is the original high-resolution image, x is the degraded low-resolution image, k is the blur kernel, ↓r is a downsampling operation with factor r, n is added noise, and [·]_JPEG denotes JPEG compression. The high-order degradation model repeatedly applies this degradation process to simulate the image degradation that occurs during real-world transmission and processing.
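As a minimal illustration of this pipeline, the sketch below applies blur, downsampling, noise, and JPEG compression in sequence and repeats the process to obtain a higher-order degradation; the Gaussian kernel, noise level, JPEG quality, and number of rounds are assumed values, not parameters specified by the patent:

```python
# Hypothetical sketch of x = [(y * k)↓r + n]_JPEG, applied repeatedly.
import io

import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def degrade_once(y: np.ndarray, r: int = 2, blur_sigma: float = 1.5,
                 noise_std: float = 5.0, jpeg_quality: int = 75) -> np.ndarray:
    """One first-order round: blur -> downsample -> add noise -> JPEG-compress."""
    blurred = gaussian_filter(y.astype(np.float64), sigma=(blur_sigma, blur_sigma, 0))
    down = blurred[::r, ::r]                                      # (y * k) ↓ r
    noisy = down + np.random.normal(0.0, noise_std, down.shape)   # + n
    img = Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)            # [.]_JPEG
    buf.seek(0)
    return np.asarray(Image.open(buf))

def higher_order_degrade(y: np.ndarray, rounds: int = 2) -> np.ndarray:
    """Repeat the first-order model to simulate complex real-world degradation."""
    x = y
    for _ in range(rounds):
        x = degrade_once(x)
    return x
```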
In this embodiment, the generator in S2 is constructed from multiple residual dense blocks (RRDB), each of which is a network combining several convolutional layers and activation layers; the generated high-resolution image can be expressed as ŷ = G(x).
The output of the generator is the result of the input x after repeated residual learning and enhancement, and the calculation formula of the residual block is:
R(x) = F(x) + x
where x is the input of the residual block and F(x) is a combination of network layers (e.g., convolutional, normalization, and activation layers); the output module of the generator is built by repeatedly learning the residual between the input x and the output R(x) and applying further enhancement.
In this embodiment, F(x) comprises two or more convolutional layers together with a normalization layer and an activation layer; the operations of a residual block with two convolutional layers are as follows:
First-layer convolution: F1(x) = ReLU(BN(Conv(x)))
Second-layer convolution: F2(x) = BN(Conv(F1(x)))
Residual learning: R(x) = F2(x) + x
where Conv denotes the convolution operation, BN denotes batch normalization, and ReLU is the activation function. Through these operations, the residual block allows information to skip some layers in the network directly.
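For concreteness, a minimal PyTorch sketch of this two-layer residual block, and of a generator that stacks several such blocks before a learned upsampling stage, is given below; the channel width, block count, and 4x upscale factor are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """F1(x) = ReLU(BN(Conv(x))), F2(x) = BN(Conv(F1(x))), R(x) = F2(x) + x."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.relu(self.bn1(self.conv1(x)))  # first-layer convolution
        f2 = self.bn2(self.conv2(f1))            # second-layer convolution
        return f2 + x                            # skip connection: R(x) = F2(x) + x

class Generator(nn.Module):
    """Stacked residual blocks followed by pixel-shuffle upsampling: y_hat = G(x)."""
    def __init__(self, channels: int = 64, num_blocks: int = 8, scale: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.head(x)
        feat = feat + self.body(feat)            # repeated residual learning
        return self.tail(self.up(feat))
```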
In this embodiment, the U-Net structure in S3 is used to capture global and local features of the image; each convolutional layer in the discriminator can be expressed as:
C_i(I) = ReLU(BN(Conv(I)))
where Conv denotes a convolution operation, BN denotes batch normalization, ReLU is the activation function, I is the input image, and C_i is the output of the i-th convolutional layer. After the multi-layer convolution operations, the discriminator outputs a scalar value representing the probability that the image is judged to be real, calculated as:
D(I) = Sigmoid(FC(C_n(I)))
where FC is a fully-connected layer and C_n is the output of the last convolutional layer; the Sigmoid function compresses the output scalar to between 0 and 1, representing a probability.
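A simplified PyTorch sketch of such a discriminator follows; it stacks spectrally normalized Conv-BN-ReLU layers and ends with a fully connected layer and a Sigmoid. The layer count and channel widths are assumptions, and the full U-Net encoder-decoder of S3 is reduced here to a plain convolutional stack:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Discriminator(nn.Module):
    def __init__(self, in_channels: int = 3, base: int = 64, num_layers: int = 4):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(num_layers):
            out_ch = base * 2 ** i
            layers += [
                spectral_norm(nn.Conv2d(ch, out_ch, 4, stride=2, padding=1)),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),       # C_i(I) = ReLU(BN(Conv(I)))
            ]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse C_n(I) to one vector per image
        self.fc = nn.Linear(ch, 1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feat = self.pool(self.features(img)).flatten(1)
        return torch.sigmoid(self.fc(feat))  # D(I) = Sigmoid(FC(C_n(I)))
```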
In this embodiment, the discriminator in S3, stabilized by the spectral normalization technique, is trained with a discriminator loss to distinguish real images from generated images, expressed as:
L_D = -log D(T(x)) - log(1 - D(G(x)))
where T(x) is the true high-resolution image and G(x) is the generated image. This definition of the discriminator loss lets the discriminator learn to better distinguish real images from generated ones while indirectly driving the generator to produce higher-quality images; optimizing the discriminator's performance also provides a balancing mechanism that keeps the competition between generator and discriminator fair.
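Written as code, this loss amounts to binary cross-entropy over a real and a generated batch; a minimal sketch, where `disc`, `gen`, `hr`, and `lr` are assumed to be the discriminator, generator, and paired high/low-resolution batches defined elsewhere:

```python
import torch

def discriminator_loss(disc, gen, hr: torch.Tensor, lr: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    real_prob = disc(hr)                # D(T(x)) on true high-resolution images
    fake_prob = disc(gen(lr).detach())  # D(G(x)); detach so only D is updated here
    return -(torch.log(real_prob + eps)
             + torch.log(1.0 - fake_prob + eps)).mean()
```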
In this embodiment, the design of the loss function in S5 includes pixel-level loss, perceptual loss, and GAN loss. The pixel-level loss (typically an L1 or L2 loss) directly calculates the difference between the generated image and the target high-resolution image at the pixel level. The L1 loss is expressed as:
L1 = (1 / (W·H)) Σ_{x,y} |G(x, y) - T(x, y)|
where W and H are the width and height of the image, the sum runs over all positions x = 1..W and y = 1..H, G(x, y) is the pixel value of the generated image at location (x, y), and T(x, y) is the pixel value of the target high-resolution image at location (x, y); the pixel-level loss compares the images pixel by pixel to achieve visual matching.
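In code, this is simply the mean absolute difference over all pixel positions; a minimal sketch, equivalent to `torch.nn.functional.l1_loss`:

```python
import torch

def pixel_l1_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # mean of |G(x, y) - T(x, y)| over batch, channels, and the W x H positions
    return (generated - target).abs().mean()
```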
In this embodiment, the perceptual loss in S5 uses a VGG network to extract image features and calculate the difference between the generated image and the target image in feature space, expressed as:
L_perc = (1 / (W_i·H_i)) Σ_{x,y} (φ_i(G)_{x,y} - φ_i(T)_{x,y})²
where φ_i denotes the feature map of the i-th layer of the network, and W_i and H_i are the dimensions of the i-th layer feature map. By emphasizing similarity in high-level feature representations rather than only pixel-level similarity, the perceptual loss helps generate more visually convincing images; because it accounts for image content and texture, it also reduces the over-smoothing problem of pure pixel-level losses, making the generated image more perceptually natural and realistic.
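A sketch of such a perceptual loss using torchvision's pretrained VGG16 is shown below; the specific feature layer (the first 16 modules, up to relu3_3) and the squared-error distance are assumptions, since the patent does not fix them:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self, layer_index: int = 16):
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features[:layer_index].eval()
        for p in features.parameters():
            p.requires_grad_(False)  # phi is a fixed, pretrained feature extractor
        self.phi = features

    def forward(self, generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # mean squared distance between phi_i(G) and phi_i(T), averaged over the
        # channels and the W_i x H_i spatial positions of the feature maps
        return (self.phi(generated) - self.phi(target)).pow(2).mean()
```

Inputs are assumed to be normalized the way the pretrained VGG expects (ImageNet mean and standard deviation).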
In this embodiment, the adversarial loss encourages the generator in S5 to produce images that the discriminator finds difficult to distinguish, expressed as:
L_adv = -log D(G(x))
where D is the discriminator and G(x) is the image generated by the generator. The adversarial loss enhances the realism of the generated image by pushing the generator toward outputs the discriminator cannot distinguish from real ones; it helps correct blurred details, because the discriminator focuses particularly on regions where real and fake are easy to tell apart; and it drives the generator to improve continuously through its contest with the discriminator, producing finer, higher-quality images.
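As a one-line helper, with `disc` the discriminator and `fake` a generated batch (both assumed from earlier in this section):

```python
import torch

def adversarial_loss(disc, fake: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # L_adv = -log D(G(x)), averaged over the batch
    return -torch.log(disc(fake) + eps).mean()
```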
In this embodiment, the adversarial training in S4 is based on a deep learning algorithm that trains the generator and discriminator networks to convert low-resolution images into high-resolution images, performing model training and evaluation of the training output. The discriminator and generator are updated alternately during training: the discriminator aims to accurately distinguish real images from generated ones, while the generator attempts to produce high-resolution images good enough to fool the discriminator. This process creates a dynamic adversarial relationship that improves the quality and realism of the generated image.
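A minimal sketch of one such alternating update, reusing the helper losses sketched above; the optimizers and the loss weights (lambda_pix, lambda_perc, lambda_adv) are assumed values, not taken from the patent:

```python
import torch

def train_step(gen, disc, opt_g, opt_d, lr_batch, hr_batch, perc_loss,
               lambda_pix: float = 1.0, lambda_perc: float = 1.0,
               lambda_adv: float = 0.005):
    # Discriminator update: learn to separate T(x) from G(x).
    opt_d.zero_grad()
    d_loss = discriminator_loss(disc, gen, hr_batch, lr_batch)
    d_loss.backward()
    opt_d.step()

    # Generator update: combined pixel + perceptual + adversarial objective.
    opt_g.zero_grad()
    sr = gen(lr_batch)
    g_loss = (lambda_pix * pixel_l1_loss(sr, hr_batch)
              + lambda_perc * perc_loss(sr, hr_batch)
              + lambda_adv * adversarial_loss(disc, sr))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```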
The model training process and the evaluation of its output are further illustrated by the flowcharts: FIG. 2 shows data preparation, data processing, and training, and FIG. 3 shows the training of the generator and discriminator.
Compared with traditional super-resolution techniques, the invention has the following advantages: a deep learning model based on a generative adversarial network processes the low-resolution images captured by the skin detector; the advanced GAN architecture effectively improves image resolution while preserving image realism and reducing noise, further optimizing practical performance. Specifically:
1. By applying the high-order degradation model and the perceptual loss, the invention can more closely simulate the real-world image degradation process and improve the clarity and detail of skin images, especially skin texture and fine features, thereby recovering high-quality detail from low-resolution images more accurately;
2. The residual dense block (RRDB) design in the generator architecture and the adversarial loss function work together, enabling the network to reduce noise and distortion while improving image resolution; this combination of architecture and loss functions generates high-resolution images that are clearer and nearly free of noise and compression artifacts, providing more accurate images for applications such as skin lesion detection;
3. The training strategy of the invention includes dynamic learning-rate adjustment and the use of real-world datasets, which helps improve the stability and generalization ability of the model; through this strategy the model adapts better to various degradation conditions and performs better on real-world data, meaning it can stably generate high-quality images across different skin types and conditions;
4. The invention provides a model interpretability mechanism so that users can understand and identify the key factors influencing the images the model generates, enhancing the model's tuning and optimization capability and allowing researchers and practitioners to further improve the model for specific application requirements, achieving better image quality and analysis precision.
Everything not described in detail in this specification is well known to those skilled in the art.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. The image super-resolution method of the skin detector is characterized by comprising the following steps:
S1, constructing a high-order degradation model: simulating the degradation process of low-resolution images in the real world by designing a degradation model and repeating its degradation process multiple times;
S2, constructing a generator model: building a deep network for image super-resolution through residual learning, where the generator comprises several residual dense blocks (RRDB) and a network layer design;
S3, constructing a discriminator model: the discriminator model is built with a U-Net architecture and spectral normalization, and is used to distinguish generated images from real images;
S4, adversarial training of the generator and discriminator: the generator produces a high-resolution image while the discriminator distinguishes real images from generated ones, forming an adversarial training relationship for deep learning;
S5, designing a loss function to optimize image quality: a combined loss function comprising pixel-level loss, perceptual loss, and adversarial loss is used during training to optimize the visual quality of the generated image.
2. The image super-resolution method of the skin detector according to claim 1, wherein the degradation model in S1 has the following mathematical formula:
x = D(y) = [(y * k)↓r + n]_JPEG
where y is the original high-resolution image, x is the degraded low-resolution image, k is the blur kernel, ↓r is a downsampling operation with factor r, n is added noise, and [·]_JPEG denotes JPEG compression; the high-order degradation model repeatedly applies this degradation process to simulate the image degradation that occurs during real-world transmission and processing.
3. The image super-resolution method of the skin detector according to claim 1, wherein the generator in S2 comprises multiple residual dense blocks (RRDB), each formed by combining several convolutional layers and activation layers, and the generated high-resolution image can be expressed as ŷ = G(x).
The output of the generator is the result of the input x after repeated residual learning and enhancement, and the calculation formula of the residual block is:
R(x) = F(x) + x
where x is the input of the residual block and F(x) is a combination of network layers; the output module of the generator is built by repeatedly learning the residual between the input x and the output R(x) and applying further enhancement.
4. The image super-resolution method of the skin detector according to claim 3, wherein said F(x) comprises two or more convolutional layers together with a normalization layer and an activation layer, and the residual block with two convolutional layers operates as follows:
First-layer convolution:
F1(x) = ReLU(BN(Conv(x)))
Second-layer convolution:
F2(x) = BN(Conv(F1(x)))
Residual learning between input x and output R(x):
R(x) = F2(x) + x
where Conv denotes the convolution operation, BN denotes batch normalization, and ReLU denotes the activation function.
5. The image super-resolution method of the skin detector according to claim 1, wherein the U-Net structure in S3 is used to capture global and local features of the image, and each convolutional layer in the discriminator can be expressed as:
C_i(I) = ReLU(BN(Conv(I)))
where Conv denotes a convolution operation, BN denotes batch normalization, ReLU denotes the activation function, I denotes the input image, and C_i denotes the output of the i-th convolutional layer; after the multi-layer convolution operations, the discriminator outputs a scalar value representing the probability that the image is judged to be real, calculated as:
D(I) = Sigmoid(FC(C_n(I)))
where FC is a fully-connected layer, C_n is the output of the last convolutional layer, and the Sigmoid function compresses the output scalar to between 0 and 1, representing a probability.
6. The image super-resolution method of the skin detector according to claim 1, wherein the spectral normalization technique in S3 is used in training the discriminator to distinguish real images from generated images with the discriminator loss, expressed as:
L_D = -log D(T(x)) - log(1 - D(G(x)))
where T(x) is the true high-resolution image and G(x) is the generated image.
7. The image super-resolution method of the skin detector according to claim 1, wherein the adversarial training in S4 is based on a deep learning algorithm, performing model training and evaluation of the training output, and converting low-resolution images into high-resolution images by training the generator and discriminator networks; the discriminator and generator are updated alternately during training, the discriminator aiming to accurately distinguish real images from generated ones while the generator attempts to produce high-resolution images good enough to fool the discriminator, thereby forming a dynamic adversarial relationship.
8. The image super-resolution method of the skin detector according to claim 1, wherein the pixel-level loss (typically an L1 or L2 loss) in S5 is the difference between the generated image and the target high-resolution image at the pixel level, and the L1 loss is expressed as:
L1 = (1 / (W·H)) Σ_{x,y} |G(x, y) - T(x, y)|
where W and H are the width and height of the image, G(x, y) is the pixel value of the generated image at location (x, y), and T(x, y) is the pixel value of the target high-resolution image at location (x, y).
9. The image super-resolution method of the skin detector according to claim 1, wherein the perceptual loss in S5 uses a VGG network to extract image features and calculate the difference between the generated image and the target image in feature space, expressed as:
L_perc = (1 / (W_i·H_i)) Σ_{x,y} (φ_i(G)_{x,y} - φ_i(T)_{x,y})²
where φ_i denotes the feature map of the i-th layer of the network, and W_i and H_i are the dimensions of the i-th layer feature map.
10. The image super-resolution method of the skin detector according to claim 1, wherein the adversarial loss encourages the generator in S5 to produce images that the discriminator finds difficult to distinguish, expressed as:
L_adv = -log D(G(x))
where D is the discriminator and G(x) is the image generated by the generator.
CN202311763736.XA, filed 2023-12-20, priority date 2023-12-20: Image super-resolution method of skin detector, published as CN117745544A (Pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311763736.XA CN117745544A (en) 2023-12-20 2023-12-20 Image super-resolution method of skin detector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311763736.XA CN117745544A (en) 2023-12-20 2023-12-20 Image super-resolution method of skin detector

Publications (1)

Publication Number Publication Date
CN117745544A 2024-03-22

Family

Family ID: 90250515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311763736.XA Pending CN117745544A (en) 2023-12-20 2023-12-20 Image super-resolution method of skin detector

Country Status (1)

Country Link
CN: CN117745544A


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination