WO2024131707A1

WO2024131707A1 - Hair enhancement method, neural network, electronic device, and storage medium

Info

Publication number: WO2024131707A1
Application number: PCT/CN2023/139420
Authority: WO
Inventors: 张航; 许合欢; 王进
Original assignee: 虹软科技股份有限公司
Priority date: 2022-12-22
Filing date: 2023-12-18
Publication date: 2024-06-27
Also published as: CN116188295A

Abstract

A hair enhancement method, a neural network, an electronic device, and a storage medium. The hair enhancement method comprises: acquiring an image feature of an original image; performing residual calculation and feature fusion on the image feature of the original image by means of a plurality of sequentially-connected residual modules to obtain a residual module fused feature, wherein in two adjacent residual modules, an input feature of the latter residual module is an output feature of the former residual module; and performing feature reconstruction on the basis of the residual module fused feature to obtain an enhanced image corresponding to the original image.

Description

Hair enhancement method, neural network, electronic device and storage medium

cross reference

This application claims priority to a Chinese patent application filed with the China Patent Office on December 22, 2022, with application number 202211659099.7 and titled “Hair enhancement method, neural network, electronic device and storage medium”, the entire contents of which are incorporated by reference into this application.

Technical Field

The present application relates to but is not limited to the field of image processing technology, and in particular to a hair enhancement method, a neural network, an electronic device and a storage medium.

Background technique

With the popularity of mobile devices, taking photos has gradually become a way for people to record their lives. In order to better freeze the picture, people have higher and higher requirements for the quality of mobile phone photos, for example, the picture must be clean, colorful, and with clear textures.

Due to the limitations of shooting conditions, areas such as pet hair or hair in portraits will inevitably have problems such as blur, noise, and out-of-focus, resulting in low-quality photos. The most common solution for improving image quality is the super-resolution reconstruction method based on deep learning, which can process a low-resolution image through a convolutional neural network to obtain a high-resolution image and increase the high-frequency detail information missing in the image. However, since the current mainstream super-resolution reconstruction methods are all for natural images, although the resolution of the picture is improved when taking pictures, the texture details of pet hair or hair in portraits are often unsatisfactory.

Currently, no effective solution has been proposed for the problem of poor processing of pet hair or hair texture details in portraits.

SUMMARY OF THE INVENTION

The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

In a first aspect, the present application provides a hair enhancement method, the method comprising:

Obtain image features of the original image;

The residual calculation and feature fusion are performed on the image features of the original image through a plurality of sequentially connected residual modules to obtain residual module fusion features, wherein, in two adjacent residual modules, the input feature of the latter residual module is The feature is the output feature of the previous residual module;

Feature reconstruction is performed based on the residual module fusion features to obtain an enhanced image corresponding to the original image.

In some embodiments, the plurality of sequentially connected residual modules include a first residual module and a second residual module that are sequentially connected; performing residual calculation and feature fusion on the image features of the original image through the plurality of sequentially connected residual modules to obtain the residual module fusion features includes:

Performing convolution fusion on the image features based on the first residual module to obtain a first feature;

Performing convolution fusion on the first features based on a second residual module to obtain a second feature;

The first feature and the second feature are fused to obtain the residual module fusion feature.

In some embodiments, the plurality of sequentially connected residual modules include a first residual module, a second residual module, a third residual module, and a fourth residual module that are sequentially connected; performing residual calculation and feature fusion on the image features of the original image through the plurality of sequentially connected residual modules to obtain the residual module fusion features includes:

Performing convolution fusion on the second features based on a third residual module to obtain a third feature;

Performing convolution fusion on the third feature based on a fourth residual module to obtain a fourth feature;

The first feature, the second feature, the third feature and the fourth feature are fused to obtain the residual module fusion feature.

In some embodiments, each of the residual modules includes an initial layer and a plurality of sequentially connected residual layers, and the method for obtaining the output features of the residual module includes:

For two adjacent residual layers, convolution calculation is performed on the final output feature of the previous residual layer in the subsequent residual layer to obtain a convolution output feature, and the convolution output feature is added to the final output feature of the previous residual layer as the final output feature of the subsequent residual layer;

In the case of multiple residual layers, the final output feature of each residual layer and the final output feature of the initial layer of the residual module are concatenated to obtain a residual layer concatenated feature;

Determine the output features of the residual module according to the input features of the residual module and the residual layer concatenation features.

In some embodiments, acquiring the image features of the original image includes:

Obtain the initial features of the original image;

The initial features are downsampled to obtain image features of the original image.

In some embodiments, downsampling the initial features to obtain the image features of the original image includes:

The initial features are downsampled step by step based on a plurality of sequentially connected downsampling modules to obtain image features of the original image, wherein, in two adjacent downsampling modules, the input features of the latter downsampling module are the output features of the former downsampling module.

In some of the embodiments, downsampling the initial features is achieved by wavelet transform.

In some embodiments, the step of reconstructing features based on the residual module fusion features to obtain an enhanced image corresponding to the original image includes:

The enhanced image is obtained by performing multiple upsampling and feature fusion calculations on the fusion features of the residual module based on multiple upsampling modules connected in sequence; wherein the number of the upsampling modules corresponds one to one to the number of the downsampling modules, and in two adjacent upsampling modules, the input features of the subsequent upsampling module are jointly determined according to the output features of the previous upsampling module and the output features of the target downsampling module, and the target downsampling module refers to the downsampling module corresponding to the subsequent upsampling module.

In some embodiments, the wavelet transform comprises:

The initial features of the original image are sampled at intervals in rows and columns according to a preset step size to obtain sampling results;

A plurality of different frequency band information of the initial feature is calculated according to the sampling result as the image feature of the original image.

In some embodiments, the hair enhancement method is implemented based on a neural network, and a method for obtaining sample image pairs for training the neural network includes:

Acquire a first sample image, where the image quality of the first sample image meets a preset image quality threshold;

Performing image degradation on the first sample image to obtain a second sample image, wherein the image quality of the second sample image is lower than the image quality of the first sample image;

The first sample image and the second sample image are regarded as a sample image pair.

In a second aspect, an embodiment of the present application provides a neural network, including an acquisition module, a plurality of sequentially connected residual modules and a reconstruction module;

The acquisition module is configured to acquire image features of the original image;

The plurality of residual modules are configured to sequentially perform residual calculation and feature fusion on the image features of the original image to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module;

The reconstruction module is configured to perform feature reconstruction based on the fusion features of the residual module to obtain an enhanced image corresponding to the original image.

In a third aspect, an embodiment of the present application provides an electronic device, comprising a processor and a memory, wherein the memory is configured to store executable instructions of the processor; and the processor is configured to execute the hair enhancement method as described in any one of the first aspects above by executing the executable instructions.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores one or more programs, and the one or more programs can be executed by one or more processors to implement the hair enhancement method as described in any of the first aspects above.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are used to provide an understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solution of the present application and do not constitute a limitation on the technical solution of the present application.

FIG1 is a flow chart of a hair enhancement method according to an embodiment of the present application;

FIG2 is a flow chart of a method for generating a residual module fusion feature according to an embodiment of the present application;

FIG3 is a schematic diagram of the structure of multiple residual modules according to an embodiment of the present application;

FIG4 is a flow chart of a method for calculating output features of a residual module according to an embodiment of the present application;

FIG5 is a schematic diagram of the internal structure of a residual module according to an embodiment of the present application;

FIG6 is a flow chart of wavelet transform according to an embodiment of the present application;

FIG7 is a schematic diagram of the effect of wavelet transformation according to an embodiment of the present application;

FIG8 is a flow chart of a method for acquiring a sample image pair according to an embodiment of the present application;

FIG9 is a schematic diagram of the structure of a neural network according to an embodiment of the present application;

FIG10 is a schematic diagram showing a comparison between an original image and an enhanced image according to an embodiment of the present application;

FIG11 is a structural block diagram of a neural network according to an embodiment of the present application.

Details

The present application is described and illustrated below in conjunction with the accompanying drawings and embodiments.

Unless otherwise defined, the technical terms or scientific terms involved in this application shall have the general meaning understood by people with ordinary skills in the technical field to which this application belongs. The words "one", "a", "a kind of", "the", "these" and the like in this application do not represent quantitative restrictions, and they can be singular or plural. The terms "include", "comprise", "have" and any variants thereof involved in this application are intended to cover non-exclusive inclusions; for example, a process, method and system, product or device comprising a series of steps or modules (units) is not limited to the listed steps or modules (units), but may include unlisted steps or modules (units), or may include other steps or modules (units) inherent to these processes, methods, products or devices. The words "connect", "connected", "coupled" and the like involved in this application are not limited to physical or mechanical connections, but may include electrical connections, whether directly or indirectly. The "multiple" involved in this application refers to two or more. "And/or" describes the association relationship of associated objects, indicating that there can be three relationships. For example, "A and/or B" can mean: A exists alone, A and B exist at the same time, and B exists alone. Usually, the character "/" indicates that the objects associated with each other are in an "or" relationship. The terms "first", "second", "third", etc. involved in this application are only used to distinguish similar objects and do not represent a specific ordering of objects.

The present application describes multiple embodiments, but the description is exemplary rather than restrictive, and for those of ordinary skill in the art, there may be more embodiments and implementations within the scope of the embodiments described in the present application. Although many possible feature combinations are shown in the drawings and discussed in the specific embodiments, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with any other feature or element in any other embodiment, or may replace any other feature or element in any other embodiment.

This application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features, and elements that have been disclosed in this application may also be combined with any conventional features or elements to form a unique inventive solution as defined by the claims. Any features or elements of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution as defined by the claims. Therefore, it should be understood that any feature shown and/or discussed in this application may be implemented alone or in any appropriate combination. Therefore, the embodiments are not subject to other limitations except as provided in the appended claims and their equivalents. In addition, the invention may be described in the appended claims. Various modifications and changes are made within the scope of protection.

In addition, when describing representative embodiments, the specification may have presented the method and/or process as a specific sequence of steps. However, to the extent that the method or process does not rely on the specific order of the steps described herein, the method or process should not be limited to the steps of the specific order described. As will be understood by those of ordinary skill in the art, other sequences of steps are also possible. Therefore, the specific sequence of the steps set forth in the specification should not be interpreted as a limitation to the claims. In addition, the claims for the method and/or process should not be limited to the steps of performing them in the order written, and those skilled in the art can easily understand that these sequences can be changed and still remain within the spirit and scope of the embodiments of the present application.

The present application provides a hair enhancement method, as shown in FIG1 , which comprises the following steps:

Step S101, obtaining image features of an original image.

The image features of the original image in this embodiment are image features related to hair. The hair in this embodiment includes pet hair and/or human hair. The original image can be any type of image. If the original image contains hair, the method in this embodiment can enhance the detailed texture of the hair to obtain a clearer image.

The process of acquiring image features can be achieved by a trained neural network through convolution calculation.

Step S102, performing residual calculation and feature fusion on the image features of the original image through a plurality of sequentially connected residual modules to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module.

The “connected in sequence” in this step indicates the data transmission relationship between the residual modules, for example, multiple residual modules can be cascaded.

In this embodiment, the neural network used for hair enhancement includes multiple residual modules, which perform residual calculations on input features in turn to obtain multiple residual features, and then perform feature fusion on the multiple residual features through convolution calculation to obtain residual module fusion features. Among them, since multiple residual modules are cascaded, the output features of the first residual module are the input features of the second residual module, the output features of the second residual module are the input features of the third residual module, and so on. The input of the first residual module is the image feature of the original image. In this embodiment, the number of residual modules is not limited, so the number of residual modules can be 2, 3, 4, 5, or even more.

Through multiple cascaded residual modules, the receptive field of the neural network can be deepened, and features at different scales can be better extracted, which is conducive to restoring complex hair textures. For example, the scale of the convolutional layer used for feature fusion can be 1×1 to increase the correlation between features of different depths.

Step S103, reconstructing features based on the residual module fusion features to obtain an enhanced image corresponding to the original image.

In one implementation of this embodiment, feature reconstruction can be achieved through convolution calculation.

Through the above steps S101 to S103, in this embodiment, the image features of the original image are calculated based on multiple residual modules, so as to obtain the residual module fusion features including details such as direction and texture at different scales in the original image. The enhanced image obtained by feature reconstruction based on the residual module fusion features has higher resolution and richer details than the original image, which can improve the processing effect of the hair texture details in the pet hair or portrait, and enhance the texture details in the image.

In one implementation of this embodiment, multiple residual modules of the neural network of this embodiment are cascaded, and the residual module fusion feature is obtained by fusion of the outputs of the multiple residual modules. The multiple sequentially connected residual modules may include a first residual module and a second residual module that are sequentially connected. FIG2 is a flow chart of a method for generating residual module fusion features according to an embodiment of the present application. As shown in FIG2, the method may include the following steps:

Step S201, performing convolution fusion on the image features based on the first residual module to obtain a first feature;

Step S202, performing convolution fusion on the first feature based on the second residual module to obtain a second feature;

Step S203, fusing the first feature and the second feature to obtain a residual module fusion feature.

In this embodiment, a method for processing image features using multiple residual modules is provided. The first feature output by the first residual module is used as the input of the second residual module, and the receptive field of the extracted features can be increased step by step. Finally, all outputs are fused to obtain features under different receptive fields, thereby enhancing the restoration effect of the original image. In this embodiment, the fusion of the first feature and the second feature can be achieved through a 1×1 convolution layer to enhance the correlation between features of different receptive fields.

It can be seen that the more residual modules there are, the deeper the receptive field is, the more features of different scales can be extracted, and the greater the amount of calculation. Therefore, in order to balance the detail features and the amount of calculation, in some embodiments, the neural network may also include a third residual module and a fourth residual module. As shown in FIG3 , the neural network includes four residual modules (Multi-Scale Res-Block, referred to as MSRB). At this time, the image features are convolutionally fused based on the first residual module to obtain the first feature; the first feature is convolutionally fused based on the second residual module to obtain the second feature, the second feature is convolutionally fused based on the third residual module to obtain the third feature, and the third feature is convolutionally fused based on the fourth residual module to obtain the fourth feature. Finally, the first feature, the second feature, the third feature and the fourth feature are convolutionally fused through the convolution layer of the fusion module to obtain the residual module fusion feature. The fusion module in this embodiment is a 1×1 convolution layer, which is used to change the number of output channels and increase the correlation between each feature at different receptive field depths.

In some embodiments, the residual module may include an initial layer and a plurality of sequentially connected residual layers, and the output features of the residual module are calculated through the plurality of residual layers. FIG4 is a flow chart of a method for calculating the output features of the residual module according to an embodiment of the present application. The first feature, the second feature, the third feature, and the fourth feature in the above embodiment as output features of the residual module can be obtained by the method, and the method comprises the following steps:

Step S401, for two adjacent residual layers, convolution calculation is performed on the final output feature of the previous residual layer by the subsequent residual layer to obtain a convolution output feature, and the convolution output feature is added to the final output feature of the previous residual layer as the final output feature of the subsequent residual layer;

Step S402, when there are multiple residual layers, concatenate the final output feature of each residual layer with the final output feature of the initial layer of the residual module to obtain a residual layer concatenation feature;

Step S403, determining the output features of the residual module according to the input features of the residual module and the residual layer concatenation features.

In one implementation of this embodiment, the residual layer splicing features can be first convolved through a convolution layer to reduce the channel, and then added to the input features of the residual module to obtain the output features of the residual module.

Among them, the first layer of the residual module serves as the initial layer, which can be an ordinary convolution layer, which is set to calculate the input features of the residual module and directly obtain the final output features of the initial layer. The residual module is a residual layer from the second layer onwards, and the residual layer of the second layer (i.e., the first residual layer) adds its own convolution output features and the final output features of the initial layer as its own final output features.

FIG5 is a schematic diagram of the internal structure of a residual module according to an embodiment of the present application. As shown in FIG5 , the residual module is composed of a convolution layer for residual calculation and a convolution layer for splicing and fusion. Exemplarily, the present embodiment includes 4 residual structures for residual calculation. After the residual calculation, the output features of each residual layer are concatenated (concat), and then a 1×1 convolution layer is connected to reduce the number of channels to reduce the amount of calculation of the neural network. The following steps may be included: first, the input feature S of the residual module is convolved through the initial layer of the residual module to obtain S01, S01 is the final output feature of the initial layer, S01 passes through a convolution layer to obtain the convolution output feature S01', S01' and S01 are added to form the first residual structure, and the final output feature S02 of the first residual layer is obtained, S02 passes through a convolution layer to obtain the convolution output feature S02', S02' and S02 are added to form a second residual structure, and the final output feature S03 of the second residual layer is obtained, S03 After a convolution layer, the convolution output feature S03' is obtained. S03' and S03 are added to form the third residual structure to obtain the final output feature S04 of the third residual layer. Finally, S01, S02, S03, and S04 are concatenated (concat) in the channel dimension to obtain the residual layer concatenation feature. The residual layer concatenation feature is convoluted and fused by a 1×1 convolution layer to increase the correlation between features of different receptive field depths and reduce the number of channels to obtain S'. Finally, S' and S are added to form a residual structure again to obtain the output feature of the residual module. The "⊕" in Figure 5 represents addition. This embodiment The addition process may be elementwise add to achieve element-by-element addition, thereby retaining more information in the original image and ensuring that the texture details of the enhanced image are consistent with the direction information of the hair in the original image.

Through the above steps S401 to S403, within each residual module, the receptive field is gradually increased through multiple residual layers, and multi-scale features under different receptive fields are obtained, which is conducive to restoring the hair texture.

In some embodiments, the image features of the original image are obtained by first obtaining the initial features of the original image, such as the underlying features at the pixel level, and then downsampling the initial features to obtain the image features of the original image for residual calculation. In this embodiment, by downsampling to obtain the image features, not only the high and low frequency information after decomposition can be obtained, but also more image details can be obtained.

In one implementation of this embodiment, when the image features of the original image are obtained based on the initial features, it can be implemented by downsampling step by step, which may include: downsampling the initial features step by step based on multiple sequentially connected downsampling modules to obtain image features at different scales. Among them, in two adjacent downsampling modules, the input features of the later downsampling module are the output features of the previous downsampling module.

In one implementation of this embodiment, multiple step-by-step decomposition and downsampling of the initial features can be achieved through wavelet transform (WT). Compared with the usual downsampling through convolution calculation, wavelet transform can save calculation amount without losing various feature information of the original image. It can not only efficiently obtain the high and low frequency information after decomposition, but also restore it through inverse transformation without losing details, and the calculation amount is very small, which is very conducive to deployment on mobile terminals. Therefore, for texture features such as hair, the use of wavelet transform can better retain details and reduce losses. Optionally, discrete wavelet transform (DWT) is used in this embodiment.

After the wavelet transform, in order to reduce the amount of calculation, the input features after the wavelet transform can be convolved to reduce the number of channels, thereby finally obtaining the output features of the downsampling module.

In one implementation of this embodiment, the step-by-step decomposition and feature extraction of the initial features include a total of 3 downsampling modules, and each downsampling module includes DWT decomposition and convolution calculation. Exemplarily, the first layer of decomposition and convolution structure performs DWT decomposition on the initial feature x0, and then convolves the decomposed features to reduce the number of channels, and then enhances the nonlinearity through the ReLU operation to obtain x1. The second layer of decomposition and convolution structure performs DWT decomposition on x1, and also performs convolution and ReLU operations on the decomposed features to obtain the output feature x2. The third layer of decomposition and convolution operation performs DWT decomposition on the feature x2, and then performs convolution operation on the decomposed features to obtain the output feature x3, which can be used as the input feature S of the residual module. Among them, the size of the convolution layer can be 3×3 to reduce the amount of calculation, and the number of convolution layers is not limited.

In some embodiments, FIG6 is a flow chart of wavelet transform according to an embodiment of the present application. As shown in FIG6 , the method includes:

Step S601 , performing interval sampling on the initial features of the original image in rows and columns according to a preset step size to obtain sampling results.

The preset step size can be set according to the requirements. When the preset step size is 2, the sampling calculation can be performed by the following formulas 1 to 6:
p01＝p[:,∶,0::2,∶]/2 Formula 1
p02＝p[:,∶,1::2,∶]/2 Formula 2
p1＝p01[:,∶,∶,0::2] Formula 3
p2＝p02[:,∶,∶,0::2] Formula 4
p3＝p01[:,∶,∶,1::2] Formula 5
p4＝p02[:,∶,∶,1::2] Formula 6

Among them, p represents the pixel of the initial feature, p01 represents the pixel obtained by sampling every two pixels starting from 0 in the column direction of the image, and taking half of the sampling result, and p02 represents the pixel obtained by sampling every two pixels starting from 1 in the column direction of the image, and taking half of the sampling result. p1 to p4 represent four pixels in a 2×2 square, p1 is the pixel obtained by sampling every two pixels starting from 0 in the row direction of the image for p01, p2 is the pixel obtained by sampling every two pixels starting from 0 in the row direction of the image for p02, p3 is the pixel obtained by sampling every two pixels starting from 1 in the row direction of the image for p01, and p4 is the pixel obtained by sampling every two pixels starting from 1 in the row direction of the image for p02. And so on, complete the entire sampling process and get the sampling result.

Step S602: Calculate a plurality of different frequency band information of the initial feature according to the sampling result as the image feature of the original image.

In an implementation of this embodiment, the frequency band information calculation process may be completed by the following formulas 7 to 10:
LL＝p1+p2+p3+p4 Formula 7
HL＝-p1-p2+p3+p4 Formula 8
LH＝-p1+p2-p3+p4 Formula 9
HH＝p1-p2-p3+p4 Formula 10

Among them, LL is low-frequency information, HL is high-frequency information in the vertical direction, LH is high-frequency information in the horizontal direction, and HH is high-frequency information in the diagonal direction. Since low frequency reflects the image overview and high frequency reflects the image details, the image features can be better preserved through wavelet transform. As shown in Figure 7, the image on the left is the original input image, and the image on the right is a schematic diagram after a wavelet decomposition. After wavelet transform, 4 different frequency band information are obtained, and the horizontal and vertical coordinates in the image on the right represent the image size after wavelet transform.

Through the above steps S601 and S602, image features including multiple different frequency band information can be obtained losslessly. It is helpful to obtain details such as hair texture and direction.

Correspondingly, the process of reconstructing the enhanced image by fusion features of the residual module is to perform multiple upsampling and feature fusion calculations on the fusion features of the residual module based on multiple sequentially connected upsampling modules to obtain an enhanced image, wherein the number of upsampling modules corresponds to the number of downsampling modules one by one, and in two adjacent upsampling modules, the input features of the subsequent upsampling module are determined based on the output features of the previous upsampling module and the output features of the target downsampling module, and the target downsampling module refers to the downsampling module corresponding to the subsequent upsampling module. Among them, for the output features of the previous upsampling module and the output features of the target downsampling module, the input features of the subsequent upsampling module are obtained by elementwise add element by element. In the case where the downsampling is a wavelet transform, the upsampling corresponds to an inverse wavelet transform (Inverse Wavelet Transform, referred to as IWT) to reduce the loss of details in the original image.

In one implementation of this embodiment, the obtained LL, HL, LH, and HH components are first concatenated in the channel dimension and then restored. The specific calculation formula of IWT is shown in Formulas 11 to 18:
p1＝LL/2 Formula 11
p2＝HL/2 Formula 12
p3＝LH/2 Formula 13
p4＝HH/2 Formula 14
Rlt[:,:,0::2,0::2]＝p1-p2-p3+p4 Formula 15
Rlt[:,:,1::2,0::2]＝p1-p2+p3-p4 Formula 16
Rlt[:,:,0::2,1::2]＝p1+p2-p3-p4 Formula 17
Rlt[:,:,1::2,1::2]＝p1+p2+p3+p4 Formula 18

Among them, Rlt is the result of the final inverse wavelet transform.

In one implementation of this embodiment, feature reconstruction and step-by-step synthetic upsampling include a total of three upsampling modules, each of which includes a convolution layer and an IWT layer. The residual module fusion feature is regarded as the input y3 of the first upsampling module. The first upsampling module first convolves y3 output by the bottom multi-scale residual module to increase the number of channels, followed by a ReLU operation to enhance nonlinearity, and then uses IWT to obtain the feature y3', which is added to x2 obtained in the downsampling process to obtain the input feature y2 of the second upsampling module. The second upsampling module reconstructs the feature of y2 through convolution, ReLU operation and IWT to obtain y2', which is added to x1 to obtain the input feature y1 of the third upsampling module. The third upsampling module also uses convolution, ReLU operation and IWT to reconstruct the feature of y1 to obtain y1', which is added to x0 to obtain the feature y0. After completing the upsampling process, y0 is calculated through a 3×3 convolution layer to obtain the final output feature y as an enhanced image. The addition process in this embodiment may be an elementwise add to perform element-by-element addition, so that more information in the original image can be retained, thereby ensuring that the texture details of the enhanced image are consistent with the direction information of the hair in the original image.

In some embodiments, the present application implements the above-mentioned hair enhancement method based on a neural network. When training the neural network, corresponding sample image pairs are required. FIG8 is a flow chart of a method for obtaining sample image pairs according to an embodiment of the present application. The method includes the following steps:

Step S801: acquiring a first sample image, wherein the image quality of the first sample image meets a preset image quality threshold.

In one implementation of this embodiment, a first sample image of high-definition pet hair or human hair can be collected by a high-definition image acquisition device such as a SLR camera. For the collected first sample image, the collected hair is required to be smooth, the texture is clear, the detail resolution is high, and the hair direction consistency is good. Based on this, a corresponding image quality threshold can be set to screen the first sample image.

Step S802: performing image degradation on the first sample image to obtain a second sample image, wherein the image quality of the second sample image is lower than that of the first sample image.

Among them, degradation refers to the process of reducing image quality, which can be simulated through JPEG compression, raw noise, lens blur, zoom and other operations, and finally a low-quality pet hair image is obtained after the actual image is degraded.

Step S803: taking the first sample image and the second sample image as a sample image pair.

Through the above steps S801 to S803, the training set is all acquired by real shooting with an image acquisition device that can acquire high-quality images, and high-definition hair images under different light, different environments, and different angles are collected, requiring the hair in the hair image to be smooth, with clear texture, high detail resolution, and good consistency of hair direction. Then, paired low-quality images are obtained through degradation to simulate low-quality hair images taken in real scenes, and finally sample image pairs are obtained, ensuring that the input and output are strictly aligned, and there is no pixel misalignment problem, so that the training results of the neural network are better.

In one implementation of this embodiment, the training of the neural network may include the following steps:

S1, obtain multiple sample image pairs;

S2, input the training set consisting of multiple sample image pairs into the neural network to be trained based on the multi-scale residual network structure for training. The loss function of the neural network in this embodiment is shown in Formula 19:

In order to improve the authenticity of the generation, the loss function in this embodiment is obtained by weighted summation of multiple sub-loss functions. Wherein, L represents the final loss function, n represents the number of sample image pairs, _L1 is the pixel-by-pixel calculation loss, L _SSIM is the structural similarity loss, L _VGG is the perceptual loss, L _GAN is the loss of the generative adversarial network, and the weights λ ₁ , λ ₂ , λ ₃ , and λ ₄ can be set according to requirements. The loss is calculated based on the output results of the neural network and the real training set. When the value of the loss function reaches the minimum or the number of iterations exceeds the preset threshold, the training ends.

S3, saves the neural network model that meets the convergence condition for hair enhancement.

An embodiment is given below by taking the application scenario of pet hair enhancement as an example.

In this embodiment, the structure of the neural network is shown in Figure 9, including an initial feature extraction module, multiple downsampling modules, multiple residual modules, a fusion module, multiple upsampling modules and a repair enhancement module, wherein the initial feature extraction module is configured to extract the basic features of the original image, multiple downsampling modules, multiple residual modules, a fusion module and multiple upsampling modules are configured to mine more feature information of the image, and the repair enhancement module is configured to achieve the final pet hair repair enhancement.

The initial feature extraction module is implemented by a 3×3 convolution layer, which is responsible for extracting the underlying pixel-level features x0 from the low-quality pet hair image x input to the neural network, and using more output channels to represent the feature information of x. The size of the convolution kernel can be 3×3, which can avoid the increase in network parameters caused by too large a convolution kernel and reduce the computing performance consumed in the network inference stage.

The three downsampling modules decompose and downsample x0 step by step, and obtain the output features x1, x2, and x3 in turn through the DWT decomposition layer and the convolution layer.

In this embodiment, the multiple residual modules are exemplified as 4 identical multi-scale residual modules, and the fusion module is a 1×1 convolution layer, which is used to change the number of output channels and increase the correlation between each feature at different receptive field depths. Each residual module consists of multiple 3×3 convolution layers and 1 1×1 convolution layer, and the residual layer performs residual calculation. The final fusion module concatenates the output features of each of the four residual modules in the channel dimension and then convolves them to obtain the underlying feature extraction result, that is, the residual module fusion feature.

The multiple upsampling modules are exemplified as three upsampling modules, each of which includes a convolutional layer and an IWT reconstruction layer. y1’ is obtained through calculation by the three upsampling modules, and the output feature y0 is obtained by adding y1’ and x0.

The final repair enhancement module is implemented by a deconvolution layer with a convolution kernel size of 3×3 and a step size of 2. Deconvolution is performed on y0 to obtain the final repair reconstruction result y.

It can be seen that the convolution layers of the initial feature extraction module, downsampling module, upsampling module and repair enhancement module in this embodiment are all 3×3, which can reduce parameter calculation and reduce the amount of calculation of the neural network, which is conducive to deployment on the mobile terminal. The convolution layers of the fusion module are all 1×1, which can increase the correlation between each feature at different receptive field depths.

The “⊕” in Figure 9 represents addition. In this embodiment, the addition process can be elementwise add to achieve element-by-element addition, so that more information in the original image can be retained, ensuring that the texture details of the enhanced image are consistent with the direction information of the hair in the original image.

In one implementation of this embodiment, this embodiment uses DWT and IWT to implement step-by-step decomposition downsampling and step-by-step decomposition downsampling. Compared with the traditional convolution and deconvolution to achieve downsampling and upsampling, the use of DWT and IWT has the following two advantages: 1. It can reduce parameters and calculations. DWT and IWT are parameter-free operations with simple calculations, avoiding the performance consumption caused by parameterized up and down sampling; 2. After the original image is represented by the four components of HH, HL, LH, and LL, the high-frequency detail information of the image can be effectively mined, and DWT and IWT are a pair of lossless conversion operations, which can ensure that the content of the original image is restored without losing details.

As shown in Figure 10, Src represents the original image, and Rlt represents the enhanced image after restoration. The texture of the restored and reconstructed pet hair is clearer, and the direction is consistent with the original image, which can significantly enhance the hair resolution of the original image and improve the visual effect of the human eye.

Therefore, the hair enhancement method based on the multi-scale residual network structure in this embodiment can solve the problems of blur, noise, out-of-focus, etc. in the hair area of the image. The multi-scale residual structure can not only obtain the characteristics of different receptive fields and better mine the missing high-frequency detail information, but also the residual structure is convenient for training, ensuring the stability of the training process, and ultimately achieving the repair and enhancement of low-quality hair areas.

The steps shown in the above process or the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and, although a logical order is shown in the flowchart, in some cases, the steps shown or described can be performed in an order different from that shown here.

In this embodiment, a neural network is also provided, which is used to implement the above embodiments and implementation methods, and the descriptions that have been made will not be repeated. The terms "module", "unit", "sub-unit", etc. used below can be a combination of software and/or hardware that implements the predetermined functions. Although the devices described in the following embodiments are preferably implemented in software, the implementation of hardware, or a combination of software and hardware, is also possible and conceivable.

FIG. 11 is a block diagram of a neural network according to an embodiment of the present application. As shown in FIG. 11 , the neural network is used for hair enhancement, and includes an acquisition module 1101, a plurality of sequentially connected residual modules 1102, and a reconstruction module 1103;

An acquisition module 1101 is configured to acquire image features of an original image;

A plurality of residual modules 1102 are configured to sequentially perform residual calculation and feature fusion on image features of the original image to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module;

The reconstruction module 1103 is configured to perform feature reconstruction based on the residual module fusion features to obtain an enhanced image corresponding to the original image.

Through the above steps of the neural network, in this embodiment, the image features of the original image are calculated based on multiple residual modules 1102, so as to obtain the residual module fusion features including details such as direction and texture in the original image, and the reconstruction module 1103 The enhanced image obtained by feature reconstruction based on the residual module fusion feature has higher resolution and richer details than the original image, which can improve the processing effect of the hair texture details in the pet hair or portrait, and enhance the texture details in the image.

In one implementation of the present embodiment, a plurality of sequentially connected residual modules include a first residual module and a second residual module connected in sequence; residual calculation and feature fusion are performed on the image features of the original image through a plurality of sequentially connected residual modules to obtain residual module fusion features, including: the first residual module performs convolution fusion on the image features to obtain a first feature; the second residual module performs convolution fusion on the first feature to obtain a second feature; the fusion module fuses the first feature and the second feature to obtain a residual module fusion feature.

In one implementation of the present embodiment, a plurality of sequentially connected residual modules include a first residual module, a second residual module, a third residual module and a fourth residual module that are sequentially connected; the residual calculation and feature fusion are performed on the image features of the original image through a plurality of sequentially connected residual modules to obtain the residual module fusion features, including: performing convolution fusion on the image features based on the first residual module to obtain the first feature; performing convolution fusion on the first feature based on the second residual module to obtain the second feature; performing convolution fusion on the second feature based on the third residual module to obtain the third feature; performing convolution fusion on the third feature based on the fourth residual module to obtain the fourth feature; and fusing the first feature, the second feature, the third feature and the fourth feature to obtain the residual module fusion feature.

In one implementation of this embodiment, for two adjacent residual layers, a convolution output feature is obtained by performing a convolution calculation on the final output feature of the previous residual layer in the subsequent residual layer, and the convolution output feature is added to the final output feature of the previous residual layer as the final output feature of the subsequent residual layer; in the case of multiple residual layers, the fusion layer splices the final output feature of each residual layer and the final output feature of the initial layer of the residual module to obtain a residual layer splicing feature; the input feature of the residual module and the residual layer splicing feature jointly determine the output feature of the residual module.

In an implementation manner of this embodiment, the acquisition module 1101 is further configured to acquire initial features of the original image; and downsample the initial features to obtain image features of the original image.

In one implementation of this embodiment, the acquisition module 1101 downsamples the initial features step by step based on multiple sequentially connected downsampling modules to obtain image features of the original image, wherein, in two adjacent downsampling modules, the input features of the later downsampling module are the output features of the previous downsampling module.

In one implementation of this embodiment, downsampling of the initial features is achieved through wavelet transformation.

In one implementation of this embodiment, the reconstruction module 1103 is further configured to perform multiple upsampling and feature fusion calculations on the residual module fusion features based on a plurality of sequentially connected upsampling modules to obtain an enhanced image; wherein the number of upsampling modules corresponds to the number of downsampling modules one by one, and in two adjacent upsampling modules, the input of the subsequent upsampling module is The input features are determined based on the output features of the preceding upsampling module and the output features of the target downsampling module. The target downsampling module refers to the downsampling module corresponding to the succeeding upsampling module.

In one implementation of this embodiment, the wavelet transform includes performing interval sampling on the initial features of the original image in rows and columns according to a preset step size to obtain sampling results; and calculating multiple different frequency band information of the initial features as image features of the original image based on the sampling results.

In one implementation of this embodiment, a method for acquiring a sample image pair for training a neural network may include: acquiring a first sample image, where the image quality of the first sample image meets a preset image quality threshold; performing image degradation on the first sample image to obtain a second sample image, where the image quality of the second sample image is lower than that of the first sample image; and treating the first sample image and the second sample image as a sample image pair.

This embodiment uses the acquisition method to acquire a large number of sample image pairs for training a neural network.

The hair enhancement method provided by the present application performs residual calculation and feature fusion on the image features of the original image through multiple sequentially connected residual modules to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module; feature reconstruction is performed based on the residual module fusion features to obtain an enhanced image corresponding to the original image, thereby improving the processing effect of hair texture details in pet hair or portraits and enhancing the texture details in the image.

Each of the above modules may be a functional module or a program module, and may be implemented by software or hardware. For modules implemented by hardware, each of the above modules may be located in the same processor; or each of the above modules may be located in different processors in any combination.

This embodiment also provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.

In one implementation of this embodiment, the electronic device may include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

In one implementation of this embodiment, the processor may be configured to perform the following steps through a computer program:

S1, obtain the image features of the original image.

S2, performing residual calculation and feature fusion on the image features of the original image through a plurality of sequentially connected residual modules to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module.

S3, performing feature reconstruction based on the residual module fusion features to obtain an enhanced image corresponding to the original image.

For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementation modes, which will not be described in detail in this embodiment.

In addition, in combination with the method for obtaining sample labels provided in the above embodiments, a computer-readable storage medium may be provided in this embodiment for implementation. The storage medium stores one or more programs, and the one or more programs may be executed by one or more processors; when the program is executed by the processor, any one of the methods in the above embodiments is implemented.

It should be understood that the specific embodiments described herein are only used to explain the application, rather than to limit it. Based on the embodiments provided in this application, all other embodiments obtained by ordinary technicians in this field without creative work are within the protection scope of this application.

The accompanying drawings are only some examples or embodiments of the present application. For ordinary technicians in the field, the present application can be applied to other similar situations based on these drawings without creative work. In addition, it is understandable that although the work done in this development process may be complicated and lengthy, for ordinary technicians in the field, certain changes in design, manufacturing or production based on the technical content disclosed in this application are only conventional technical means and should not be regarded as insufficient content disclosed in this application.

The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

The term "embodiment" in this application refers to a specific feature, structure or characteristic described in conjunction with the embodiment that can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily mean the same embodiment, nor does it mean that it is mutually exclusive with other embodiments and is independent or optional. It can be clearly or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.

The above-mentioned embodiments only express several implementation methods of the present application, and the descriptions thereof are relatively specific and detailed, but they cannot be understood as limiting the scope of patent protection. It should be pointed out that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all belong to the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the attached claims.

It will be appreciated by those skilled in the art that all or some of the steps, systems, and functional modules/units in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In hardware implementations, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium). As known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, it is well known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims

A method for hair enhancement, comprising:

Obtain image features of the original image;

Performing residual calculation and feature fusion on the image features of the original image through a plurality of sequentially connected residual modules to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module;

Feature reconstruction is performed based on the residual module fusion features to obtain an enhanced image corresponding to the original image.
The hair enhancement method according to claim 1, wherein the plurality of sequentially connected residual modules include a first residual module and a second residual module that are sequentially connected; and performing residual calculation and feature fusion on the image features of the original image through the plurality of sequentially connected residual modules to obtain the residual module fusion features includes:

Performing convolution fusion on the image features based on the first residual module to obtain a first feature;

Performing convolution fusion on the first features based on a second residual module to obtain a second feature;

The first feature and the second feature are fused to obtain the residual module fusion feature.
The hair enhancement method according to claim 1, wherein the plurality of sequentially connected residual modules include a first residual module, a second residual module, a third residual module, and a fourth residual module that are sequentially connected; and the residual module fusion features obtained by performing residual calculation and feature fusion on the image features of the original image through the plurality of sequentially connected residual modules include:

Performing convolution fusion on the image features based on the first residual module to obtain a first feature;

Performing convolution fusion on the first features based on a second residual module to obtain a second feature;

Performing convolution fusion on the second features based on a third residual module to obtain a third feature;

Performing convolution fusion on the third feature based on a fourth residual module to obtain a fourth feature;

The first feature, the second feature, the third feature and the fourth feature are fused to obtain the residual module fusion feature.
The hair enhancement method according to claim 1 or 2, wherein each of the residual modules comprises an initial layer and a plurality of sequentially connected residual layers, and a method for acquiring output features of the residual module comprises:

For two adjacent residual layers, the final output features of the previous residual layer are convolved in the subsequent residual layer to obtain the convolution output features, and the convolution output features are added to the final output features of the previous residual layer. As the final output feature of the subsequent residual layer;

In the case of multiple residual layers, the final output feature of each residual layer and the final output feature of the initial layer of the residual module are concatenated to obtain a residual layer concatenated feature;

The output feature of the residual module is determined according to the input feature of the residual module and the residual layer concatenation feature.
The hair enhancement method according to claim 1, wherein the acquiring the image features of the original image comprises:

Obtain the initial features of the original image;

The initial features are downsampled to obtain image features of the original image.
The hair enhancement method according to claim 5, wherein downsampling the initial features to obtain the image features of the original image comprises:

The initial features are downsampled step by step based on a plurality of sequentially connected downsampling modules to obtain image features of the original image, wherein, in two adjacent downsampling modules, the input features of the latter downsampling module are the output features of the former downsampling module.
The hair enhancement method according to claim 5 or 6, wherein the downsampling of the initial features is achieved by wavelet transform.
The hair enhancement method according to claim 7, wherein the step of performing feature reconstruction based on the residual module fusion feature to obtain an enhanced image corresponding to the original image comprises:

The enhanced image is obtained by performing multiple upsampling and feature fusion calculations on the fusion features of the residual module based on multiple upsampling modules connected in sequence; wherein the number of the upsampling modules corresponds one-to-one to the number of the downsampling modules, and in two adjacent upsampling modules, the input features of the subsequent upsampling module are jointly determined according to the output features of the previous upsampling module and the output features of the target downsampling module, and the target downsampling module refers to the downsampling module corresponding to the subsequent upsampling module.
The hair enhancement method according to claim 7, wherein the wavelet transform comprises:

The initial features of the original image are sampled at intervals in rows and columns according to a preset step size to obtain sampling results;

A plurality of different frequency band information of the initial feature is calculated according to the sampling result as the image feature of the original image.
The hair enhancement method according to claim 1, wherein the hair enhancement method is implemented based on a neural network. The method for obtaining the sample image pairs for training the neural network includes:

Acquire a first sample image, where the image quality of the first sample image meets a preset image quality threshold;

Performing image degradation on the first sample image to obtain a second sample image, wherein the image quality of the second sample image is lower than the image quality of the first sample image;

The first sample image and the second sample image are regarded as a sample image pair.
A neural network comprises an acquisition module, a plurality of sequentially connected residual modules and a reconstruction module;

The acquisition module is configured to acquire image features of the original image;

The plurality of residual modules are configured to sequentially perform residual calculation and feature fusion on the image features of the original image to obtain residual module fusion features, wherein, in two adjacent residual modules, the input features of the latter residual module are the output features of the former residual module;

The reconstruction module is configured to perform feature reconstruction based on the fusion features of the residual module to obtain an enhanced image corresponding to the original image.
An electronic device comprises a processor and a memory, wherein the memory is configured to store executable instructions of the processor; and the processor is configured to execute the hair enhancement method according to any one of claims 1 to 10 by executing the executable instructions.
A computer-readable storage medium storing one or more programs, wherein the one or more programs can be executed by one or more processors to implement the hair enhancement method according to any one of claims 1 to 10.