WO2022110638A1 - Portrait restoration method and apparatus, electronic device, storage medium and program product - Google Patents
Portrait restoration method and apparatus, electronic device, storage medium and program product
- Publication number
- WO2022110638A1, PCT/CN2021/090296 (CN2021090296W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords: image, face, face image, feature map, network
- Prior art date
Classifications
- G06T5/77: Retouching; Inpainting; Scratch removal
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods (neural networks)
- G06T3/4046: Scaling of whole images or parts thereof using neural networks
- G06T5/73: Deblurring; Sharpening
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30196: Human being; Person
- G06T2207/30201: Face
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the technical field of image processing, and in particular, to a portrait restoration method, apparatus, electronic device, storage medium and program product.
- when existing camera equipment collects images, it is subject to factors such as its own design, the environment, and the operation of the photographer, so the imaging effect may be unsatisfactory; this is especially true when shooting portraits, where noise, blur, and local deformation of the portrait are common problems.
- the present application provides a portrait restoration method, apparatus, electronic device, storage medium and program product.
- a first aspect of the embodiments of the present application provides a portrait restoration method, the method including: acquiring a face image to be repaired; extracting a brightness channel of the face image to be repaired, and performing portrait restoration based on the brightness channel to obtain a target face image; fusing the color channels of the target face image and the face image to be repaired to obtain a first face repair image; and performing image transformation processing on the first face repair image to obtain a second face repair image.
- the extracting of the luminance channel of the face image to be repaired includes: when the format of the face image to be repaired is the first format, extracting the brightness channel of the face image to be repaired directly; or, when the format of the face image to be repaired is the second format, converting the format of the face image to be repaired to the first format and extracting the luminance channel of the converted image.
- the brightness channel can be extracted directly from a face image in the first format; for a face image in the second format, from which the brightness channel cannot be extracted directly, the image is first converted to the first format and the brightness channel is then extracted. This ensures that face images in various formats can be repaired based on the brightness channel, improving applicability across image formats.
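As a concrete illustration of the channel handling described above, the sketch below extracts a luminance channel in both cases; it assumes the first format is YUV and the second is RGB (as stated later in the description) and uses BT.601 conversion coefficients, which the application does not specify:

```python
import numpy as np

def rgb_to_y(rgb):
    """Convert an RGB image (H, W, 3, float in [0, 1]) to its luminance (Y)
    channel using BT.601 coefficients (an assumption; the application does
    not specify the conversion matrix)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def extract_luminance(image, fmt):
    """If the image is already YUV (the 'first format'), take channel 0;
    otherwise convert from RGB (the 'second format') first."""
    if fmt == "yuv":
        return image[..., 0]
    return rgb_to_y(image)

# A pure white RGB pixel has luminance 1.
white = np.ones((1, 1, 3))
print(round(float(extract_luminance(white, "rgb")[0, 0]), 6))  # → 1.0
```

The Y channel alone then feeds the restoration network, while the color channels are merged back afterwards.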
- performing portrait restoration based on the luminance channel to obtain a target face image includes: inputting the luminance channel into a trained neural network model to perform portrait restoration, and obtaining the target face image.
- the trained neural network model is used for portrait restoration, which helps repair face images that are noisy, blurred, or deformed due to problems such as poor illumination, shake, defocus, and digital zoom, and improves the clarity and texture detail of facial features, hair, and skin.
- the neural network model includes a first network, a second network, a third network, and a fourth network
- the second network includes N fuzzy upsampling modules
- the fuzzy upsampling in at least one of the N fuzzy upsampling modules includes a blur convolution, where the weights of the blur convolution kernel are preset fixed values and N is an integer greater than 1
- the neural network model has shortcut connections at the input of the first network, the output of the second network, and the output of the third network, as well as a shortcut connection between the output of the first network and the output of the fourth network.
- the input of the first network, the output of the second network, and the output of the third network are at the highest resolution scale
- the output of the first network and the output of the fourth network are at the lowest resolution scale
- shortcut connections at both the highest and lowest resolution scales help prevent the neural network model from overfitting and speed up iteration during training
- fuzzy upsampling involves a blur convolution whose kernel weights are fixed from the start of training of the neural network model; the convolution acts as a low-pass filter, which helps generate smooth, natural contours and hair during image restoration.
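A minimal sketch of what a fixed-weight blur convolution does during upsampling; the 3x3 binomial kernel and the nearest-neighbour pre-upsampling are assumptions, since the application only states that the kernel weights are preset fixed values acting as a low-pass filter:

```python
import numpy as np

# Fixed (non-learned) blur kernel: outer product of [1, 2, 1] with itself,
# normalized to sum to 1, so it behaves as a low-pass filter. The exact
# kernel is an assumption; the application only says the weights are fixed.
K = np.outer([1, 2, 1], [1, 2, 1]).astype(float)
K /= K.sum()

def blur_upsample(x):
    """Fuzzy-upsampling sketch: nearest-neighbour 2x upsample followed by
    the fixed blur convolution (zero padding, 'same' output size)."""
    up = x.repeat(2, axis=0).repeat(2, axis=1)   # 2x nearest upsample
    padded = np.pad(up, 1)                       # zero-pad by one pixel
    out = np.zeros_like(up)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (padded[i:i + 3, j:j + 3] * K).sum()
    return out

x = np.array([[0.0, 1.0], [1.0, 0.0]])
y = blur_upsample(x)
print(y.shape)  # → (4, 4)
```

Because the kernel is fixed rather than learned, the smoothing behaviour is stable throughout training, which matches the low-pass-filter role described above.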
- inputting the luminance channel into a trained neural network model to perform portrait restoration to obtain the target face image includes: encoding the luminance channel using the first network to obtain a target feature map; and decoding the target feature map using the second network and the third network to obtain the target face image.
- the first network encodes the input luminance channel, reducing its size and extracting the target feature map; the second network restores the size of the luminance channel during decoding; and the third network helps ensure the stability of the neural network model during decoding, finally yielding the target face image with a restored brightness channel.
- the encoding operation on the luminance channel using the first network to obtain a target feature map includes: inputting the luminance channel into the first network for downsampling to obtain a first feature map; using the fourth network to perform high-level feature extraction on the first feature map to obtain a high-level feature map; and superimposing the first feature map and the high-level feature map to obtain the target feature map.
- the fourth network adopts a residual-block structure, which is well suited to extracting high-level features; superimposing the output of the first network and the output of the fourth network through a shortcut connection prevents the neural network model from overfitting on the one hand and enriches the feature information on the other.
- performing a decoding operation on the target feature map using the second network and the third network to obtain the target face image includes: inputting the target feature map into the N fuzzy upsampling modules in the second network for fuzzy upsampling to obtain a second feature map; inputting the feature maps output by the first to (N-1)th fuzzy upsampling modules into the third network for upsampling to obtain a third feature map; and superimposing the luminance channel, the second feature map, and the third feature map to obtain the target face image.
- the second network adopts N fuzzy upsampling modules, which helps generate smooth, natural contours and hair while restoring the size of the target feature map; upsampling the feature map output by the (N-1)th fuzzy upsampling module in the third network helps ensure the stability of the neural network model; and superimposing the input of the first network with the outputs of the second and third networks through shortcut connections prevents the neural network model from overfitting on the one hand and, on the other, enriches the feature information and improves the restoration quality of the target face image.
- the third network includes (N-1) upsampling modules; inputting the feature maps output by the first to (N-1)th fuzzy upsampling modules into the third network for upsampling to obtain the third feature map includes: inputting the feature map output by the first of the N fuzzy upsampling modules into the first upsampling module; compressing the number of channels of the feature map output by the i-th fuzzy upsampling module to obtain a second compressed feature map, where i is an integer greater than 1 and less than N; superimposing the feature map output by the (i-1)th upsampling module of the (N-1) upsampling modules with the second compressed feature map, and inputting the superimposed feature map into the i-th upsampling module for upsampling; and obtaining the third feature map after processing by the (N-1) upsampling modules.
- compressing the number of channels of the feature maps output by the first to (N-1)th fuzzy upsampling modules helps keep the number of input channels of at least one upsampling module in the third network consistent, which improves the stability of the neural network model.
- obtaining the face image to be repaired includes: performing face detection on the collected original image; cropping a face image based on the position of the detected face in the original image; and scaling the face image to obtain the face image to be repaired.
- face detection is performed first, the face image is then cropped, and the cropped image is scaled to a fixed size, which facilitates the restoration of larger face images.
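The detect-crop-scale step can be sketched as follows; the box format and the nearest-neighbour resize are illustrative assumptions, with the 896-pixel default following the resolution mentioned later in the description:

```python
import numpy as np

def crop_and_resize(original, box, out_size=896):
    """Crop a square face region given a detection box (x, y, side) and
    rescale it to a fixed side length with nearest-neighbour sampling.
    The box format is an assumption; real pipelines would use a detector
    such as Faster R-CNN or YOLO to produce it."""
    x, y, side = box
    face = original[y:y + side, x:x + side]
    # Nearest-neighbour resize: index the crop on a regular grid.
    idx = (np.arange(out_size) * side / out_size).astype(int)
    return face[np.ix_(idx, idx)]

img = np.arange(100.0).reshape(10, 10)       # stand-in "original image"
face = crop_and_resize(img, (2, 2, 4), out_size=8)
print(face.shape)  # → (8, 8)
```

Fixing the network input size this way is what lets a single model handle faces cropped at many different scales.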
- the method further includes: performing portrait segmentation on the original image to obtain a portrait mask; and, after obtaining the second face restoration image, applying Gaussian blur to the edges of the portrait mask and pasting the face in the second face restoration image back into the original image based on the portrait mask and the position from which the face image was cropped, thereby completing the restoration of the original image.
- the position of the face in the original image can be determined from the cropping position of the face image and the portrait mask, so the repaired face in the second face restoration image can be pasted back while the background part still uses the background of the original image; Gaussian-blurring the edges of the portrait mask before pasting the face back makes the final repaired image smoother and more natural.
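The paste-back blend can be sketched as below; a box blur stands in for the Gaussian blur of the mask edge, and the blend formula output = M * repaired + (1 - M) * original is an assumed reading of how the soft mask is applied:

```python
import numpy as np

def box_blur(mask, k=3):
    """Simple box blur standing in for the Gaussian blur of the mask edge
    (a simplification; the application specifies Gaussian blur)."""
    padded = np.pad(mask.astype(float), k // 2, mode="edge")
    out = np.zeros(mask.shape, dtype=float)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def paste_back(original, repaired, mask):
    """Blend the repaired face into the original with the softened mask:
    output = M * repaired + (1 - M) * original."""
    m = box_blur(mask)
    return m * repaired + (1.0 - m) * original

orig = np.zeros((4, 4))                    # stand-in background
rep = np.ones((4, 4))                      # stand-in repaired face
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1   # portrait = 1, background = 0
out = paste_back(orig, rep, mask)
print(out.shape)  # → (4, 4)
```

Softening the hard 0/1 mask is what avoids a visible seam at the face boundary.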
- performing image transformation processing on the first face restoration image to obtain a second face restoration image includes: performing color correction on the first face restoration image; determining a zoom ratio; and, if the zoom ratio is greater than a preset ratio, scaling the color-corrected first face restoration image using super-resolution technology to obtain the second face restoration image.
- color correction is performed on the first face restoration image, and the color-corrected image is scaled so that its size is restored to that of the cropped face image, yielding a higher-quality second face restoration image; when the zoom ratio exceeds the preset ratio, super-resolution technology is used for scaling, which improves the resolution of the second face restoration image.
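The zoom decision might look like the following sketch, where `sr_fn` is a hypothetical caller-supplied super-resolution routine and plain nearest-neighbour resampling is the fallback:

```python
import numpy as np

def resize_back(repaired, target_side, preset_ratio=1.0, sr_fn=None):
    """Scale the colour-corrected restoration back to the cropped face
    size. If the required zoom ratio exceeds the preset threshold, a
    super-resolution routine (hypothetical, supplied by the caller) is
    used instead of plain interpolation."""
    ratio = target_side / repaired.shape[0]
    if ratio > preset_ratio and sr_fn is not None:
        return sr_fn(repaired, target_side)
    # Plain nearest-neighbour resampling for modest zoom ratios.
    idx = (np.arange(target_side) * repaired.shape[0] / target_side).astype(int)
    return repaired[np.ix_(idx, idx)]

img = np.eye(4)
small = resize_back(img, 2)   # ratio 0.5: plain resampling path
print(small.shape)  # → (2, 2)
```

Reserving super-resolution for large upscaling factors keeps the common case cheap while protecting quality when the face must be enlarged substantially.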
- before acquiring the face image to be repaired, the method further includes: constructing a sample image pair, the sample image pair including a first face image and a second face image obtained from the first face image; inputting the sample image pair into a neural network for training and outputting a repaired image of the second face image; determining a target loss from the repaired image and the first face image; and adjusting the parameters of the neural network to minimize the target loss, thereby obtaining the neural network model.
- the sample image pair used to train the neural network contains a degraded image, which improves the generalization of the neural network model; minimizing the target loss brings the repaired image output by the neural network model as close as possible to the quality of the first face image and improves the handling of details such as contours and hair.
- the target loss includes at least one of regression loss, perceptual loss, generative adversarial loss, and context loss.
- training the neural network model with a target loss that includes at least one of regression loss, perceptual loss, generative adversarial loss, and context loss allows the model to repair the various degradation problems in an image as a whole, improving the quality of portrait restoration.
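A hedged sketch of such a combined loss, using only a regression (L1) term and a stand-in "perceptual" term; the feature extractor, the weights, and the omission of the adversarial and context terms are all simplifying assumptions:

```python
import numpy as np

def target_loss(repaired, reference, feat_fn=lambda x: x.mean(axis=0),
                w_reg=1.0, w_perc=0.1):
    """Combined target loss sketch: a pixel-wise regression (L1) term plus
    a 'perceptual' term computed on features. feat_fn stands in for a
    pretrained feature extractor, and the weights are illustrative; the
    application lists regression, perceptual, generative adversarial, and
    context losses without specifying their form or weighting."""
    reg = np.abs(repaired - reference).mean()
    perc = np.abs(feat_fn(repaired) - feat_fn(reference)).mean()
    return w_reg * reg + w_perc * perc

a = np.zeros((4, 4))   # stand-in repaired output
b = np.ones((4, 4))    # stand-in ground-truth first face image
print(round(target_loss(a, b), 3))  # → 1.1
```

Weighting several complementary terms this way is the standard route to covering both pixel fidelity and visual plausibility.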
- constructing a sample image pair includes: acquiring the preset first face image; if the image quality of the first face image is not degraded, performing atmospheric disturbance degradation on the first face image to obtain a first degraded image, downsampling the first degraded image to obtain a target degraded image, upsampling the target degraded image to obtain a second degraded image, obtaining a third degraded image from the second degraded image, compressing the third degraded image with a preset compression quality parameter to obtain a fourth degraded image, determining a rectangular area in the fourth degraded image and the corresponding target area in the first face image, replacing the pixel values in the rectangular area with those in the target area to obtain the second face image, and constructing the sample image pair from the first face image and the second face image; or, if the image quality of the first face image is already degraded, using two copies of the first face image to construct the sample image pair.
- the image quality of the first face image is judged first: if the first face image is relatively clear and not degraded, a series of degradation processes is applied to it to synthesize a second face image with degradation problems, so that the second face image resembles a degraded image collected in practice and simulates the scenario of repairing a real degraded image; if the first face image itself has degradation problems, no further degradation is needed, and two copies of the first face image can directly form the sample image pair.
- obtaining a third degraded image from the second degraded image includes: adding noise to the luminance channel of the second degraded image and performing non-local means denoising on it to obtain the third degraded image; or performing a blurring operation on the second degraded image to obtain a fifth degraded image, adding noise to the luminance channel of the fifth degraded image, and performing non-local means denoising on it to obtain the third degraded image.
- applying various degradation processes to the second degraded image, such as blurring, noise superposition, and non-local means denoising, gives the third degraded image a variety of degradation problems, so that images with more degradation problems are subsequently available to train the neural network model.
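The degradation pipeline can be approximated as below; atmospheric disturbance, non-local means denoising, and the rectangular-region replacement are omitted, and coarse quantisation stands in for JPEG compression, so this is only an illustrative subset of the steps the application describes:

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(clean, scale=2, noise_std=0.05, levels=32):
    """Synthesise a degraded training counterpart of a clean (single
    channel, [0, 1]) face image: downsample, upsample back, add noise,
    then coarsely quantise as a stand-in for lossy compression."""
    h, w = clean.shape
    low = clean[::scale, ::scale]                                 # downsample
    up = low.repeat(scale, axis=0).repeat(scale, axis=1)[:h, :w]  # upsample
    noisy = up + rng.normal(0.0, noise_std, up.shape)             # add noise
    quant = np.round(noisy * levels) / levels   # compression-like artefacts
    return np.clip(quant, 0.0, 1.0)

clean = np.linspace(0.0, 1.0, 64).reshape(8, 8)
degraded = degrade(clean)
print(degraded.shape)  # → (8, 8)
```

The (clean, degraded) pair then plays the role of the (first face image, second face image) sample pair during training.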
- a second aspect of the embodiments of the present application provides a portrait restoration device, the device comprising:
- an image acquisition module configured to acquire the face image to be repaired
- a portrait restoration module configured to extract the brightness channel of the face image to be repaired, perform portrait restoration based on the brightness channel, and obtain a target face image
- an image fusion module configured to fuse the color channel of the target face image and the face image to be repaired to obtain a first face repair image
- An image adjustment module configured to perform image transformation processing on the first face restoration image to obtain a second face restoration image.
- a third aspect of the embodiments of the present application provides an electronic device, the electronic device including an input device, an output device, a processor adapted to implement one or more instructions, and a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the steps in any embodiment of the first aspect above.
- a fourth aspect of the embodiments of the present application provides a computer storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by a processor to perform the steps in any embodiment of the foregoing first aspect.
- a fifth aspect of the embodiments of the present application provides a computer program product including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device performs the steps in any embodiment of the first aspect.
- in the embodiments of the present application, the face image to be repaired is acquired; the brightness channel of the face image to be repaired is extracted, portrait restoration is performed based on the brightness channel, and the target face image is obtained; the target face image is fused with the color channel of the face image to be repaired to obtain a first face repair image; and image transformation processing is performed on the first face repair image to obtain a second face repair image.
- extracting the brightness channel of the face image to be repaired and performing portrait restoration based on it yields a target face image with a repaired brightness channel; the color channels are then merged back in to obtain the first face repair image.
- FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application.
- FIG. 2 is a schematic flowchart of a method for restoring a portrait according to an embodiment of the present application
- FIG. 3 is a schematic structural diagram of a neural network model provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of decoding a feature map according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of constructing a sample image pair according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of a replacement pixel value provided by an embodiment of the present application.
- FIG. 7 is a schematic flowchart of another portrait restoration method provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a portrait restoration device provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of another portrait restoration device provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the embodiment of the present application proposes a solution for performing portrait restoration on a face image, which is beneficial to improve the quality of the restored face image and improve the overall restoration effect of the face image.
- the application environment includes an image acquisition device and a server.
- the image acquisition device can be a mobile phone, a tablet, a camera, a video camera, etc.
- the server can be an independent physical server, a server cluster or distributed system, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big-data and artificial-intelligence platforms.
- the image capture device is used to capture images; an image may be a single photo or a video, such as a user's selfie or footage captured in a video-recording scene. Because the capture may suffer from poor lighting, shake, defocus, digital zoom, and other conditions, the faces in the image may exhibit problems such as noise, blur, and deformation.
- the user can send a portrait restoration request to the server through the image capture device, asking the server to repair the faces in the image.
- after receiving the image collected by the image capture device, the server performs a series of operations such as face detection, portrait segmentation, and face repair, and finally outputs the image with the faces repaired.
- models such as face detection, face segmentation, and face restoration can be deployed on the server, and the server can implement the entire process of face restoration by invoking these models.
- the portrait restoration method proposed by the embodiments of the present application may be executed by a server or an image acquisition device, for example, a model such as face restoration is deployed on the image acquisition device.
- FIG. 2 is a schematic flowchart of a portrait restoration method provided by an embodiment of the present application, applied to a server, as shown in FIG. 2, including steps S21-S24:
- the face image to be repaired refers to the face image that is obtained from a poorly imaged original image and is used directly for restoration.
- a face detection algorithm, such as Faster R-CNN (Faster Region-based Convolutional Neural Network) or YOLO (You Only Look Once), is used to detect faces in the original image; a square face image is cropped based on the face detection box and scaled to a preset size, such as a resolution of 896*896, so that larger face images can be repaired.
- portrait segmentation technology is then used to segment the portrait and background masks from the original image collected by the image capture device; the mask matrix is denoted M, where the portrait region is represented as 1 and the background as 0.
- the target face image refers to an image obtained by performing brightness channel repair on the face image to be repaired.
- when the format of the face image to be repaired is the first format, the brightness channel of the face image to be repaired is extracted and portrait restoration is performed based on the brightness channel to obtain the target face image; when the format is the second format, the image is first converted to the first format, the brightness channel of the converted image is extracted, and portrait restoration is performed based on the brightness channel to obtain the target face image.
- the first format refers to the YUV format
- the second format refers to the RGB format.
- for a face image to be repaired in YUV format, the luminance channel can be extracted directly; for one in RGB format, the luminance channel can be extracted after conversion to the first format. This ensures that face images in various formats can be repaired based on the luminance channel, giving wider applicability across image formats.
- the above-mentioned performing portrait restoration based on the brightness channel to obtain a target face image includes: inputting the brightness channel into a trained neural network model to perform portrait restoration to obtain the target face image .
- the trained neural network model is used for portrait restoration.
- the structure of the neural network model is shown in Figure 3, which mainly includes a first network, a second network, a third network and a fourth network.
- the input layer extracts the brightness channel; the first network uses multiple downsampling modules for encoding; the fourth network performs high-level feature extraction on the output of the first network; the second network and the third network decode the superposition of the outputs of the first and fourth networks; and the superposition of the input of the first network (the luminance channel) with the outputs of the second and third networks is processed by the output layer to obtain a target face image of the same size as the face image to be repaired.
- the target face image refers to the face image whose luminance channel has been repaired; the first face repair image is output by fusing the target face image with the color channels of the face image to be repaired.
- the second network includes N fuzzy upsampling modules, the fuzzy upsampling in at least one of the N fuzzy upsampling modules includes fuzzy convolution, and the weight of the convolution kernel of the fuzzy convolution is preset Fixed value, the neural network model has shortcut connections at the input of the first network, the output of the second network and the output of the third network, and there are shortcut connections at the output of the first network and the output of the fourth network.
- the input of the first network, the output of the second network, and the output of the third network are the highest resolution scales, and the output of the first network and the output of the fourth network are the lowest resolution scales.
- Shortcut connections are used to prevent over-fitting of the neural network model, and the iteration speed can be faster during training;
- fuzzy upsampling has a fuzzy convolution operation, and the weight of the convolution kernel used is from the training of the neural network model. It is fixed at the beginning, and its function is equivalent to a low-pass filter, which is conducive to generating smooth and natural contours and hair in the process of image restoration.
- Such a neural network model helps repair face images that are noisy, blurred, or deformed due to problems such as poor lighting, jitter, out-of-focus capture, and digital zoom, improving the clarity and texture detail of facial features, hair, and skin.
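The fixed-weight fuzzy (blur) convolution described above can be sketched as a small low-pass filter whose kernel is frozen rather than learned. The binomial [1, 2, 1] kernel below is an assumption for illustration; the patent does not give the exact preset weights:

```python
import numpy as np

def blur_conv2d(x: np.ndarray) -> np.ndarray:
    """Apply a fixed 3x3 low-pass (blur) kernel to a 2D feature map.

    The kernel weights are preset and never updated during training,
    so the operation behaves like a low-pass filter that smooths
    contours and hair during restoration.
    """
    k = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k, k)
    kernel /= kernel.sum()  # normalize so overall brightness is preserved
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out
```

Because the kernel is normalized, a constant input passes through unchanged, while high-frequency noise is attenuated, which is the low-pass behavior the text attributes to fuzzy convolution.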
- inputting the brightness channel into a trained neural network model to perform portrait restoration to obtain the target face image includes: using the first network to encode the brightness channel to obtain the target feature map; and using the second network and the third network to decode the target feature map to obtain the target face image.
- using the first network to perform an encoding operation on the brightness channel to obtain a target feature map includes: inputting the brightness channel into the first network for downsampling to obtain a first feature map; using the fourth network to perform high-level feature extraction on the first feature map to obtain a high-level feature map; and superimposing the first feature map and the high-level feature map to obtain the target feature map.
- the first feature map refers to the low-resolution feature map obtained after downsampling by multiple downsampling modules in the first network
- the high-level feature map refers to the feature map obtained after deep feature extraction using the fourth network.
- the first feature map and the high-level feature map are superimposed through shortcut connections to obtain the target feature map. It should be understood that superimposing the output of the first network and the output of the fourth network with a shortcut connection, on the one hand, prevents the neural network model from overfitting and, on the other hand, enriches the feature information. The fourth network can be a residual block; the residual block is a standard component of residual networks and performs well at extracting deep or high-level features.
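The residual-block-plus-shortcut structure above can be sketched as follows. The two-matrix layout and ReLU below follow the standard residual-network design as an assumption; the patent does not specify the block's internal layers:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Standard residual block on flattened features: y = x + W2 @ relu(W1 @ x).

    The identity shortcut lets the block refine high-level features while
    preserving the encoder's information, which also eases optimization
    and helps prevent overfitting, as the text notes.
    """
    return x + w2 @ relu(w1 @ x)
```

With all-zero weights the block reduces to an exact identity mapping, which illustrates why shortcut connections keep early training stable.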
- the above-mentioned use of the second network and the third network to decode the target feature map to obtain the target face image includes:
- the feature maps output by the first to (N-1)th fuzzy upsampling modules in the N fuzzy upsampling modules are input into the third network for upsampling to obtain a third feature map;
- the fuzzy upsampling module in the second network and the downsampling module in the first network have a symmetrical structure and are used to restore the size of the target feature map.
- the second feature map refers to the feature map obtained after fuzzy upsampling through the N fuzzy upsampling modules.
- the convolution layer in at least one fuzzy up-sampling module performs convolution processing in the manner of standard convolution-fuzzy convolution-standard convolution. As shown in Figure 3, the processing order of the N fuzzy upsampling modules is the first fuzzy upsampling module, the second fuzzy upsampling module, the third fuzzy upsampling module...the Nth fuzzy upsampling module from left to right.
- the feature maps output by the 1st to (N-1)th fuzzy upsampling modules among the N fuzzy upsampling modules are input into the third network for upsampling, and the third feature map is output by the third network.
- the target face image can be obtained by superimposing the high-resolution luminance channel, the second feature map and the third feature map.
- the third network upsamples the feature maps output by the 1st to (N-1)th fuzzy upsampling modules in the second network, which is beneficial to ensure the stability of the neural network model.
- the output of the second network and the output of the third network are superimposed, which on the one hand can prevent the neural network model from overfitting, and on the other hand is conducive to enriching feature information and improving the restoration quality of the target face image.
- the third network includes (N-1) upsampling modules; the above-mentioned 1st to (N-1)th fuzzy upsampling modules among the N fuzzy upsampling modules
- the output feature map is input into the third network for upsampling to obtain a third feature map, including: compressing the number of channels of the feature map output by the first fuzzy upsampling module in the N fuzzy upsampling modules to obtain the first compressed feature map; inputting the first compressed feature map into the first upsampling module in the (N-1) upsampling modules for upsampling;
- the number of channels of the feature map output by the i-th fuzzy upsampling module among the N fuzzy upsampling modules is compressed to obtain a second compressed feature map, where i is an integer greater than 1 and less than N;
- the feature map output by the (i-1)th upsampling module among the (N-1) upsampling modules is superimposed with the second compressed feature map, and the superimposed feature map is input into the i-th upsampling module among the (N-1) upsampling modules for upsampling.
- the processing order of the (N-1) upsampling modules in the third network is the first upsampling module, the second upsampling module...the (N-1)th upsampling module; the upsampling in at least one upsampling module is completed by standard convolution, and the feature maps output by the 1st to (N-1)th fuzzy upsampling modules have their channel numbers compressed before being input into the third network's upsampling modules, so that the feature maps input to each upsampling module have the same number of channels.
- the first compressed feature map is the feature map obtained by compressing the feature map output by the first fuzzy upsampling module by channel number
- the second compressed feature map is obtained by compressing the number of channels of the feature map output by the i-th fuzzy upsampling module.
- the input of the i-th upsampling module is the superposition of the feature map output by the (i-1)th upsampling module and the channel-compressed feature map output by the i-th fuzzy upsampling module.
- the input of the second upsampling module is the superposition of the feature map output by the first upsampling module and the channel-compressed feature map output by the second fuzzy upsampling module;
- the input of the third upsampling module is the superposition of the feature map output by the second upsampling module and the channel-compressed feature map output by the third fuzzy upsampling module;
- the input of the (N-1)th upsampling module is the superposition of the feature map output by the (N-2)th upsampling module and the channel-compressed feature map output by the (N-1)th fuzzy upsampling module.
- the third feature map is output through the upsampling processing of (N-1) upsampling modules in the third network.
- Compressing the number of channels of the feature maps output by the 1st to (N-1)th fuzzy upsampling modules helps ensure that the inputs to the upsampling modules in the third network have the same number of channels, which helps improve the stability of the neural network model.
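The channel compression applied to each skip feature map before it enters the third network can be sketched as a 1x1 convolution, i.e. a per-pixel linear map over channels. The shapes below are arbitrary illustrations, not the patent's actual dimensions:

```python
import numpy as np

def compress_channels(feat: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution reducing channels: (C_in, H, W) -> (C_out, H, W).

    Applying this to the outputs of the 1st..(N-1)th fuzzy upsampling
    modules makes every feature map entering the third network's
    upsampling modules carry the same number of channels.
    `w` has shape (C_out, C_in); a 1x1 conv mixes channels per pixel.
    """
    return np.einsum("oc,chw->ohw", w, feat)
```

After compression, the skip feature map can be superimposed elementwise with the previous upsampling module's output, since both now share the same channel count.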
- the first face restoration image refers to the face image obtained by the neural network model restoration and color channel fusion.
- For the target face image whose brightness channel has been restored, the ratio between the target face image and the face image to be restored is calculated.
- the color channels of the face image to be repaired are fused with the target face image according to the calculated ratio information to achieve image enhancement, and the first face restoration image is output.
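One way to read this ratio-based fusion step, offered here as an interpretation since the patent names the ratio but not its exact formula, is to keep the restored luminance and rescale the chroma channels by the per-pixel luminance ratio:

```python
import numpy as np

def fuse_luma_chroma(y_restored, y_orig, u, v, eps=1e-6):
    """Fuse the restored luminance with the original color channels.

    The per-pixel ratio between restored and original luminance rescales
    the chroma channels (U, V stored as offsets around 0.5), so color
    saturation keeps pace with the enhanced brightness. Interpretation
    of the patent's ratio-based enhancement, not its exact formula.
    """
    ratio = y_restored / (y_orig + eps)
    u_out = 0.5 + (u - 0.5) * ratio
    v_out = 0.5 + (v - 0.5) * ratio
    return y_restored, np.clip(u_out, 0.0, 1.0), np.clip(v_out, 0.0, 1.0)
```

When the restored luminance equals the original, the ratio is 1 and the chroma channels pass through unchanged, so the fusion is a no-op on already-clean regions.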
- S24 Perform image transformation processing on the first face restoration image to obtain a second face restoration image.
- the color-corrected first face restoration image is scaled so that its size is restored to that of the face image cropped in step S21; that is, a second face restoration image with better quality is obtained, which helps improve the resolution of the second face restoration image.
- the required zoom ratio is determined first; if the zoom ratio exceeds 1.5, super-resolution technology is used to perform 2x scaling to restore the size of the first face restoration image. For example, the scaling can be performed by SRCNN (Super-Resolution Convolutional Neural Network).
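The zoom decision above can be sketched as a simple branch: below the 1.5x threshold, ordinary interpolation restores the size; above it, a 2x super-resolution pass (e.g. SRCNN) is applied first. The threshold comes from the text; the exact control flow and return labels are assumptions:

```python
def choose_scaling_path(target_size: int, current_size: int,
                        threshold: float = 1.5) -> str:
    """Decide how to restore the repaired face to its original size.

    If the required zoom ratio exceeds the threshold, run a 2x
    super-resolution model (e.g. SRCNN) before the final resize;
    otherwise ordinary interpolation is enough.
    """
    ratio = target_size / current_size
    if ratio > threshold:
        return "super_resolution_2x_then_resize"
    return "interpolate"
```

Using super-resolution only for large upscales keeps small adjustments cheap while avoiding the blur that plain interpolation introduces at 2x and beyond.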
- the method further includes:
- Gaussian blur is performed on the edge of the portrait mask; based on the position where the face image is cropped in the original image and the portrait mask, the face in the second face restoration image is pasted back after the cropping to complete the restoration of the original image.
- the position of the face in the original image can be determined based on the cropped position of the face image in the original image and the portrait mask, so that the restored face in the second face restoration image can be pasted back into the cropped original image while the background part still uses the background of the original image. Before the face is pasted back, the edge of the portrait mask is Gaussian-blurred based on the mask matrix M, which makes the final repaired image smoother and more natural.
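The paste-back step can be sketched as alpha compositing with the softened mask: mask values in [0, 1] blend the restored face with the original background so the seam at the crop boundary stays smooth. The crop coordinates and mask shape below are illustrative assumptions:

```python
import numpy as np

def paste_back(original, restored_face, mask, top, left):
    """Composite the restored face back into the original image.

    `mask` is the portrait mask after its edges have been Gaussian-blurred;
    values in [0, 1] blend the restored face with the original background,
    while the rest of the original image is left untouched.
    """
    out = original.astype(float).copy()
    h, w = restored_face.shape
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = mask * restored_face + (1.0 - mask) * region
    return out
```

With a hard binary mask the seam would be visible; blurring the mask edge first is what produces the smooth, natural transition the text describes.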
- before acquiring the face image to be repaired, the method further includes: constructing a sample image pair, the sample image pair including a first face image and a second face image obtained based on the first face image; inputting the sample image pair into a neural network for training, and outputting a repaired image of the second face image; determining a target loss according to the repaired image and the first face image; and adjusting the parameters of the neural network to minimize the target loss to obtain the neural network model.
- the training of the neural network model uses paired images, that is, the first face image and the second face image in the sample image pair; the first face image refers to a pre-prepared image, and the second face image refers to a degraded image with image-quality problems obtained based on the first face image, such as out-of-focus blur, noise, compression loss, sampling aliasing, or ISP (Image Signal Processor) denoising residue.
- the second face image can be the same face image as the first face image; it can also be a face image obtained by degrading the first face image.
- the second face image can be synthesized based on the first face image.
- the method shown in the above steps S22 and S23 is used to repair the face in the second face image to obtain the repaired image of the second face image, and then
- the target loss is calculated based on the repaired image and the first face image.
- the target loss includes at least one of regression loss, perceptual loss, generative adversarial loss and context loss.
- the parameters of the neural network are adjusted based on the target loss, and the trained neural network model is obtained by minimizing the target loss.
- the regression loss is L1 = ||X - Y||1
- X represents the repaired image output by the neural network
- Y represents the first face image
- the regression loss is used to minimize the L1 distance between corresponding pixels of the repaired image and the first face image, which helps suppress noise and maintain the color of the final restored image.
- the perceptual loss is used to minimize the L1 distance between the inpainted image and the first face image in the depth feature space, which can make the inpainted image more realistic and natural visually.
- the depth feature space can be extracted by a trained VGG (Visual Geometry Group) network, where l represents the number of layers of VGG features.
- the generative adversarial loss is L_GAN = F^-(a_real, D(X)) + F^+(a_fake, D(X)); the generative adversarial loss uses the discriminator to discriminate between the repaired image and the first face image,
- where F^- and F^+ represent a pair of metric functions, one negative and one positive
- a real and a fake are two fixed anchor values
- D is the discriminator
- D(X) represents the discriminator's discrimination result on the repaired image
- the context loss is L_CX(X, Y) = Σ_{l∈S} -log CX(φ_l(X), φ_l(Y)); the context loss is used to calculate the cosine distance between the repaired image and the first face image in the feature space and to minimize that cosine distance, using the loss against the first face image to ensure the content consistency of the final restored image.
- CX represents the calculated cosine distance
- φ represents the feature extraction network, which can be a VGG network
- l also represents the number of feature layers
- a pair of sample images is used to train the neural network.
- the target loss includes at least one of regression loss, perceptual loss, generative adversarial loss, and context loss, so that the trained neural network model can repair the various problems present in degraded images as a whole and improve the quality of portrait restoration.
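Two of the listed terms are simple enough to sketch directly: the regression loss is a pixelwise L1 distance, and the context loss penalizes the negative log of a cosine similarity between feature vectors. The feature extractor is left abstract here; in the text it is a VGG network:

```python
import numpy as np

def regression_loss(x: np.ndarray, y: np.ndarray) -> float:
    """L1 = ||X - Y||_1, averaged per pixel; suppresses noise, keeps color."""
    return float(np.mean(np.abs(x - y)))

def context_loss(fx: np.ndarray, fy: np.ndarray, eps: float = 1e-8) -> float:
    """-log of cosine similarity between feature vectors of X and Y.

    Cosine similarity near 1 (identical content) drives the loss to 0,
    which is what enforces content consistency with the first face image.
    """
    cos = np.dot(fx, fy) / (np.linalg.norm(fx) * np.linalg.norm(fy) + eps)
    return float(-np.log(np.clip(cos, eps, 1.0)))
```

In practice the full target loss would sum these with the perceptual and adversarial terms over VGG feature layers; the weights of that sum are not specified in the text.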
- the constructing a sample image pair includes:
- it is judged whether the image quality of the first face image is degraded; if yes, go to step S503; if not, go to step S504.
- the image quality is judged first to determine whether it is degraded; if the image quality is already degraded, no degradation processing is required.
- two copies of the first face image are used to construct a sample image pair, and either one of the two is determined as the second face image; if the image quality is not degraded, the image is degraded. Specifically, this can be realized by a preset algorithm: the first face image is input, and if the algorithm finally returns the original first face image, it means that the image quality of the first face image itself is degraded.
- the first degraded image is downsampled by a factor of 0-8 to obtain a low-resolution target degraded image
- perform corresponding up-sampling on the target degraded image to obtain a second degraded image with the same resolution as the first degraded image
- noise is added to the luminance channel of the second degraded image, and non-local average denoising is performed to obtain a third degraded image.
- the second degraded image can also be blurred to obtain a corresponding degraded image (that is, the fifth degraded image).
- Noise is added to the luminance channel of the fifth degraded image, and non-local average denoising is performed to obtain the third degraded image.
- the second degraded image is subjected to various degradation processing by means of blur operations, noise superposition, non-local average denoising, and so on, which helps give the third degraded image a variety of degradation problems, so that images with various degradation problems can subsequently be used to train the neural network model.
- For the third degraded image, a preset compression quality parameter is used to perform a JPEG compression operation, where the compression quality parameter can be set according to the actual situation.
- a rectangular area is randomly selected, the target area corresponding to this area is selected in the first face image, and the pixel values in the rectangular area are replaced with the pixel values in the target area; the synthesis of the degraded image is thus completed and the second face image is obtained, and
- the first face image and the second face image constitute a sample image pair.
- the obtained second face image is closer to the actual degraded image.
- the image quality of the first face image is judged. If the first face image itself is relatively clear and its image quality is not degraded, a series of degradation processes is performed on the first face image to synthesize a second face image with degradation problems, making the second face image similar to an actually collected degraded image in order to simulate the scene of repairing a real degraded image; if the first face image itself has degradation problems, there is no need to degrade it, and two copies of the first face image can directly form a sample image pair to simulate the scene of repairing a real degraded image.
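The degradation pipeline above (downsample, upsample back, add noise, compress, patch-replace a rectangle) can be sketched end-to-end. Nearest-neighbor resampling and Gaussian noise are stand-ins for the unspecified operators, and the JPEG compression step is omitted for brevity:

```python
import numpy as np

def synthesize_degraded(first_face: np.ndarray, factor: int = 2,
                        noise_std: float = 0.05, seed: int = 0) -> np.ndarray:
    """Build a second (degraded) face image from a clean first face image.

    Steps mirror the text: downsample, upsample back to full resolution
    (sampling aliasing), add noise, then copy a random rectangle from the
    clean image back in so the pair also covers locally clean regions.
    """
    rng = np.random.default_rng(seed)
    # downsample, then nearest-neighbor upsample back to full resolution
    low = first_face[::factor, ::factor]
    degraded = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    degraded = degraded[:first_face.shape[0], :first_face.shape[1]]
    # add noise to simulate sensor / ISP denoising residue
    degraded = degraded + rng.normal(0.0, noise_std, size=degraded.shape)
    # replace a random rectangle with the corresponding clean pixels
    h, w = first_face.shape
    top, left = rng.integers(0, h // 2), rng.integers(0, w // 2)
    degraded[top:top + h // 4, left:left + w // 4] = \
        first_face[top:top + h // 4, left:left + w // 4]
    return np.clip(degraded, 0.0, 1.0)
```

Pairing each clean image with such a synthesized counterpart gives the (first face image, second face image) training pairs the text describes.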
- the embodiment of the present application obtains the face image to be repaired; extracts the brightness channel of the face image to be repaired and performs portrait repair based on the brightness channel to obtain the target face image; fuses the target face image with the color channels of the face image to be repaired to obtain a first face restoration image; and performs image transformation processing on the first face restoration image to obtain a second face restoration image.
- FIG. 7 is a schematic flowchart of another portrait restoration method provided by an embodiment of the present application, as shown in FIG. 7, including steps S71-S76:
- step S72 when the format of the face image to be repaired is the first format, extract the brightness channel of the face image to be repaired, and perform step S74;
- S76 Perform image transformation processing on the first face restoration image to obtain a second face restoration image.
- FIG. 8 is a schematic structural diagram of a portrait restoration device provided by an embodiment of the present application. As shown in Figure 8, the device includes:
- An image acquisition module 81 configured to acquire a face image to be repaired
- a portrait repair module 82 configured to extract the brightness channel of the face image to be repaired, perform portrait repair based on the brightness channel, and obtain a target face image
- An image fusion module 83 configured to fuse the color channel of the target face image and the face image to be repaired to obtain a first face repair image
- the image adjustment module 84 is configured to perform image transformation processing on the first face restoration image to obtain a second face restoration image.
- the portrait repair module 82 is specifically used for:
- when the format of the face image to be repaired is the first format, extract the luminance channel of the face image to be repaired; or, when the format of the face image to be repaired is the second format, convert the format of the face image to be repaired into the first format and extract the luminance channel of the format-converted face image to be repaired.
- the portrait restoration module 82 is specifically used for:
- the neural network model includes a first network, a second network, a third network and a fourth network
- the second network includes N fuzzy upsampling modules
- the fuzzy upsampling in at least one of the N fuzzy upsampling modules includes fuzzy convolution, and the weight of the convolution kernel of the fuzzy convolution is a preset fixed value, where N is an integer greater than 1; the neural network model has shortcut connections at the input of the first network, the output of the second network and the output of the third network, and shortcut connections at the output of the first network and the output of the fourth network.
- the portrait repair module 82 is specifically used for:
- the first network is used to encode the luminance channel to obtain a target feature map; the second network and the third network are used to decode the target feature map to obtain the target face image.
- the portrait restoration module 82 is specifically configured to:
- the portrait restoration module 82 is specifically configured to:
- the third network includes (N-1) upsampling modules; for inputting the feature maps output by the 1st to (N-1)th fuzzy upsampling modules among the N fuzzy upsampling modules into the third network for upsampling to obtain the third feature map, the portrait restoration module 82 is specifically used for:
- compressing the number of channels of the feature map output by the first fuzzy upsampling module among the N fuzzy upsampling modules to obtain a first compressed feature map; inputting the first compressed feature map into the first upsampling module among the (N-1) upsampling modules for upsampling; compressing the number of channels of the feature map output by the i-th fuzzy upsampling module among the N fuzzy upsampling modules to obtain a second compressed feature map;
- i is an integer greater than 1 and less than N;
- the feature map output by the (i-1)th upsampling module in the (N-1) upsampling modules is superimposed with the second compressed feature map, and input the feature map obtained after superposition into the i-th up-sampling module in the (N-1) up-sampling modules for up-sampling; after processing by the (N-1) up-sampling modules, the Three feature maps.
- the image acquiring module 81 is specifically used for:
- the image acquisition module 81 is further configured to: perform portrait segmentation on the original image to obtain a portrait mask;
- the image adjustment module 84 is also used for:
- Gaussian blur is performed on the edge of the portrait mask; based on the position where the face image is cropped in the original image and the portrait mask, the face in the second face restoration image is pasted back after the cropping to complete the restoration of the original image.
- the image adjustment module 84 is specifically used for:
- the apparatus further includes a model building module 85, and the model building module 85 is used for:
- the sample image pair includes a first face image and a second face image obtained based on the first face image; inputting the sample image pair into a neural network for training, and outputting the second face image A repaired image of a face image; a target loss is determined according to the repaired image and the first face image; the parameters of the neural network are adjusted to minimize the target loss to obtain the neural network model.
- the target loss includes at least one of regression loss, perceptual loss, generative adversarial loss, and context loss.
- model building module 85 is specifically used to:
- acquire the preset first face image; if the image quality of the first face image is not degraded, perform atmospheric disturbance degradation on the first face image to obtain a first degraded image;
- the first degraded image is down-sampled to obtain a target degraded image;
- the target degraded image is up-sampled to obtain a second degraded image;
- a third degraded image is obtained according to the second degraded image; the third degraded image is compressed to obtain a fourth degraded image; a rectangular area is determined in the fourth degraded image, and the target area corresponding to the rectangular area is determined in the first face image; the pixel values in the target area are used to replace the corresponding pixel values in the rectangular area to obtain the second face image, and the sample image pair is constructed with the first face image and the second face image; or, if the picture quality of the first face image is degraded, the sample image pair is constructed with two copies of the first face image, and either one of the two is determined as the second face image.
- the model building module 85 is specifically configured to:
- Noise is added to the luminance channel of the second degraded image, and non-local average denoising is performed on the second degraded image to obtain the third degraded image; or, a blurring operation is performed on the second degraded image to obtain A fifth degraded image; adding noise to the luminance channel of the fifth degraded image, and performing non-local average denoising on the fifth degraded image to obtain the third degraded image.
- each unit in the portrait restoration device shown in FIG. 8 or FIG. 9 may be separately or entirely combined into one or several other units, or one or more of the units may be further divided into multiple functionally smaller units, which can realize the same operations without affecting the technical effects of the embodiments of the present application.
- the above-mentioned units are divided based on logical functions.
- the function of one unit may also be implemented by multiple units, or the functions of multiple units may be implemented by one unit.
- the portrait restoration device may also include other units; in practical applications, these functions may also be realized with the assistance of other units, and may be realized by the cooperation of multiple units.
- the portrait restoration apparatus may be implemented on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM).
- A computer program capable of executing the steps involved in the corresponding method shown in FIG. 2 or FIG. 7 is run on the computing device to construct the portrait restoration apparatus shown in FIG. 8 or FIG. 9 and to realize the portrait restoration method of the embodiments of the present application. The computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above-mentioned computing device through the computer-readable recording medium, and executed therein.
- the embodiments of the present application further provide an electronic device.
- the electronic device includes at least a processor 1001 , an input device 1002 , an output device 1003 and a computer storage medium 1004 .
- the processor 1001 , the input device 1002 , the output device 1003 and the computer storage medium 1004 in the electronic device may be connected through a bus or other means.
- the computer storage medium 1004 can be stored in the memory of the electronic device, the computer storage medium 1004 is used for storing a computer program, the computer program includes program instructions, and the processor 1001 is used for executing the program stored in the computer storage medium 1004 instruction.
- the processor 1001 (or CPU (Central Processing Unit)) is the computing core and the control core of the electronic device; it is suitable for implementing one or more instructions, and specifically suitable for loading and executing one or more instructions to realize the corresponding method flow or corresponding function.
- the computer storage medium may be a volatile storage medium or a non-volatile storage medium.
- the processor 1001 of the electronic device provided in this embodiment of the present application may be configured to perform a series of portrait restoration processing: acquiring a face image to be restored; extracting a luminance channel of the face image to be restored, based on the Performing portrait restoration on the brightness channel to obtain a target face image; fusing the target face image with the color channel of the face image to be restored to obtain a first face restoration image; performing the first face restoration image on the first face restoration image Image transformation processing to obtain a second face restoration image.
- the processor 1001 performing the extracting the luminance channel of the face image to be repaired includes: in the case that the format of the face image to be repaired is the first format, extracting the face image to be repaired. the brightness channel of the face image; or when the format of the face image to be repaired is the second format, convert the format of the face image to be repaired to the first format, and extract the format converted the luminance channel of the face image to be repaired.
- the processor 1001 executes the performing portrait restoration based on the luminance channel to obtain a target face image, including: inputting the luminance channel into a trained neural network model to perform portrait restoration to obtain the target face image. face image.
- the neural network model includes a first network, a second network, a third network, and a fourth network
- the second network includes N fuzzy upsampling modules, among the N fuzzy upsampling modules.
- the fuzzy upsampling in at least one fuzzy upsampling module includes fuzzy convolution, and the weight of the convolution kernel of the fuzzy convolution is a preset fixed value, wherein N is an integer greater than 1, and the neural network model is There are shortcut connections at the input of the first network, the output of the second network, and the output of the third network, and there are shortcut connections at the output of the first network and the output of the fourth network.
- the processor 1001 executes the inputting of the brightness channel into the trained neural network model to perform portrait restoration to obtain the target face image, which includes: using the first network to perform an encoding operation on the brightness channel to obtain a target feature map; and using the second network and the third network to perform a decoding operation on the target feature map to obtain the target face image.
- the processor 1001 performs the encoding operation on the luminance channel using the first network to obtain a target feature map, including: inputting the luminance channel into the first network for downsampling to obtain the target feature map. the first feature map; using the fourth network to perform high-level feature extraction on the first feature map to obtain a high-level feature map; superimposing the first feature map and the high-level feature map to obtain the target feature map .
- the processor 1001 performs the decoding operation on the target feature map using the second network and the third network to obtain the target face image, including:
- the third network includes (N-1) upsampling modules; the processor 1001 executes the step of performing the fuzzy upsampling of the 1st to (N-1)th fuzzy upsampling modules in the N fuzzy upsampling modules.
- the feature map output by the sampling module is input into the third network for up-sampling to obtain a third feature map, including: compressing the number of channels of the feature map output by the first fuzzy up-sampling module in the N fuzzy up-sampling modules , obtain the first compressed feature map; input the first compressed feature map into the first upsampling module in the (N-1) upsampling modules for upsampling; put the N fuzzy upsampling modules in the The number of channels of the feature map output by the ith fuzzy upsampling module is compressed to obtain the second compressed feature map; wherein, i is an integer greater than 1 and less than N;
- the feature map output by the (i-1)th upsampling module among the (N-1) upsampling modules is superimposed with the second compressed feature map, and the superimposed feature map is input into the i-th upsampling module among the (N-1) upsampling modules for upsampling; the third feature map is obtained after processing by the (N-1) upsampling modules.
- the processor 1001 performs the acquiring of the face image to be repaired, including: performing face detection on the collected original image; cropping out the face image based on the position of the detected face in the original image; and scaling the face image to obtain the face image to be repaired.
- the processor 1001 is further configured to perform: portrait segmentation on the original image to obtain a portrait mask; after the second face restoration image is obtained, the processor 1001 is further configured to perform: Gaussian blurring of the edge of the portrait mask; and, based on the position where the face image was cropped in the original image and the portrait mask, pasting the face in the second face restoration image back into the cropped original image to complete the restoration of the original image.
- the processor 1001 executes the image transformation processing on the first face restoration image to obtain a second face restoration image, including: performing color correction on the first face restoration image; determining The zoom ratio; if the zoom ratio is greater than the preset ratio, the super-resolution technology is used to zoom the first face restoration image after color correction to obtain the second face restoration image.
- the processor 1001 before acquiring the face image to be repaired, is further configured to execute: constructing a sample image pair; the sample image pair includes a first face image and an image obtained based on the first face image. the second face image; train the sample image to the input neural network, and output the repaired image of the second face image; determine the target loss according to the repaired image and the first face image; The parameters of the neural network are adjusted to minimize the objective loss to obtain the neural network model.
- the target loss includes at least one of regression loss, perceptual loss, generative adversarial loss, and context loss.
- the processor 1001 performs the construction of the sample image pair by: acquiring the preset first face image; if the image quality of the first face image is not degraded, applying atmospheric-disturbance degradation to the first face image to obtain a first degraded image; downsampling the first degraded image to obtain a target degraded image; upsampling the target degraded image to obtain a second degraded image; obtaining a third degraded image from the second degraded image; compressing the third degraded image with a preset compression-quality parameter to obtain a fourth degraded image; determining a rectangular region in the fourth degraded image and the corresponding target region in the first face image; and replacing the pixel values in the rectangular region with the pixel values in the target region to obtain the second face image, so that the sample image pair is constructed from the first face image and the second face image. If the image quality of the first face image is already degraded, the sample image pair is constructed from two copies of the first face image, and
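The shape of that degradation chain can be sketched as follows. Every individual operation is a cheap stand-in: stride-2 slicing for downsampling, pixel repetition for upsampling, coarse quantisation for lossy compression, and an illustrative rectangle for the clean-region paste; the application's actual atmospheric disturbance, sampling, and compression steps are not reproduced here.

```python
import numpy as np

def make_degraded_image(clean: np.ndarray, rect=(8, 8, 16, 16)) -> np.ndarray:
    """Build the degraded half of a training pair from a clean face image.

    The final step pastes clean pixels back into a rectangle, so the network
    also sees regions that need no repair.
    """
    # Downsample by striding, then upsample by pixel repetition.
    down = clean[::2, ::2]
    up = down.repeat(2, axis=0).repeat(2, axis=1)
    # Coarse quantisation as a stand-in for lossy compression artefacts.
    degraded = (up // 32) * 32
    x, y, w, h = rect
    degraded[y:y + h, x:x + w] = clean[y:y + h, x:x + w]  # keep a clean region
    return degraded
```

The (clean, degraded) pair then serves as (first face image, second face image) for supervised training.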
- the processor 1001 obtains the third degraded image from the second degraded image by: adding noise to the luminance channel of the second degraded image and performing non-local means denoising on the second degraded image to obtain the third degraded image; or, blurring the second degraded image to obtain a fifth degraded image, adding noise to the luminance channel of the fifth degraded image, and performing non-local means denoising on the fifth degraded image to obtain the third degraded image.
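The noise-then-denoise trick can be sketched on a single luminance channel. Non-local means denoising (e.g. OpenCV's `fastNlMeansDenoising`) is what the text describes; a 3x3 mean filter stands in for it here so the sketch stays dependency-free. Whatever noise survives the denoiser is exactly the realistic residue the training pair is meant to contain.

```python
import numpy as np

def noise_then_denoise(luma: np.ndarray, sigma: float = 10.0, seed: int = 0) -> np.ndarray:
    """Add Gaussian noise to a luminance channel, then partially remove it.

    The mean filter is an assumed stand-in for non-local means denoising.
    """
    rng = np.random.default_rng(seed)
    noisy = luma.astype(np.float64) + rng.normal(0.0, sigma, luma.shape)
    # 3x3 mean filter via shifted sums (edges handled by edge padding).
    padded = np.pad(noisy, 1, mode="edge")
    out = np.zeros_like(noisy)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + noisy.shape[0], dx:dx + noisy.shape[1]]
    return np.clip(out / 9.0, 0, 255).astype(luma.dtype)
```
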
- the above-mentioned electronic device may be a computer, a computer host, a server, a cloud server, or a server cluster, or an image acquisition device such as a camera or a video camera.
- the electronic device may include, but is not limited to, the processor 1001, the input device 1002, an output device 1003, and a computer storage medium 1004.
- the input device 1002 can be a keyboard, a touch screen, etc.
- the output device 1003 can be a speaker, a display, a radio frequency transmitter, and the like.
- the schematic diagram is only an example of an electronic device and does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or use different components.
- the processor 1001 of the electronic device implements the steps of the above portrait restoration method when executing the computer program; the above embodiments of the portrait restoration method all apply to this electronic device and can achieve the same or similar beneficial effects.
- Embodiments of the present application further provide a computer storage medium (Memory), where the computer storage medium is a memory device in an electronic device and is used to store programs and data.
- the computer storage medium here may include both a storage medium built into the terminal and an extended storage medium supported by the terminal.
- the computer storage medium provides storage space, and the storage space stores the operating system of the terminal.
- one or more instructions suitable for being loaded and executed by the processor 1001 are also stored in the storage space, and these instructions may be one or more computer programs (including program codes).
- the computer storage medium here may be a high-speed RAM memory, or a non-volatile memory such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor 1001. The instructions stored in the computer storage medium can be loaded and executed by the processor 1001 to implement the corresponding steps of the above portrait restoration method.
- the computer program of the computer storage medium includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.
- the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Claims (19)
- A portrait restoration method, the method comprising: acquiring a face image to be repaired; extracting a luminance channel of the face image to be repaired, and performing portrait restoration based on the luminance channel to obtain a target face image; fusing the target face image with the color channels of the face image to be repaired to obtain a first restored face image; and performing image transformation processing on the first restored face image to obtain a second restored face image.
- The method according to claim 1, wherein the extracting of the luminance channel of the face image to be repaired comprises: if the format of the face image to be repaired is a first format, extracting the luminance channel of the face image to be repaired; or, if the format of the face image to be repaired is a second format, converting the format of the face image to be repaired into the first format and extracting the luminance channel of the format-converted face image to be repaired.
- The method according to claim 1 or 2, wherein the performing of portrait restoration based on the luminance channel to obtain the target face image comprises: inputting the luminance channel into a trained neural network model for portrait restoration to obtain the target face image.
- The method according to claim 3, wherein the neural network model comprises a first network, a second network, a third network, and a fourth network; the second network comprises N blur-upsampling modules, the blur upsampling in at least one of the N blur-upsampling modules includes a blur convolution whose kernel weights are preset fixed values, and N is an integer greater than 1; the neural network model has shortcut connections among the input of the first network, the output of the second network, and the output of the third network, and shortcut connections between the output of the first network and the output of the fourth network.
- The method according to claim 4, wherein the inputting of the luminance channel into the trained neural network model for portrait restoration to obtain the target face image comprises: encoding the luminance channel with the first network to obtain a target feature map; and decoding the target feature map with the second network and the third network to obtain the target face image.
- The method according to claim 5, wherein the encoding of the luminance channel with the first network to obtain the target feature map comprises: inputting the luminance channel into the first network for downsampling to obtain a first feature map; performing high-level feature extraction on the first feature map with the fourth network to obtain a high-level feature map; and superimposing the first feature map and the high-level feature map to obtain the target feature map.
- The method according to claim 5 or 6, wherein the decoding of the target feature map with the second network and the third network to obtain the target face image comprises: inputting the target feature map into the N blur-upsampling modules of the second network for blur upsampling to obtain a second feature map; inputting the feature maps output by the 1st to (N-1)-th blur-upsampling modules into the third network for upsampling to obtain a third feature map; and superimposing the luminance channel, the second feature map, and the third feature map to obtain the target face image.
- The method according to claim 7, wherein the third network comprises (N-1) upsampling modules, and the inputting of the feature maps output by the 1st to (N-1)-th blur-upsampling modules into the third network for upsampling to obtain the third feature map comprises: compressing the number of channels of the feature map output by the 1st blur-upsampling module to obtain a first compressed feature map; inputting the first compressed feature map into the 1st of the (N-1) upsampling modules for upsampling; compressing the number of channels of the feature map output by the i-th blur-upsampling module to obtain a second compressed feature map, where i is an integer greater than 1 and less than N; superimposing the feature map output by the (i-1)-th upsampling module with the second compressed feature map, and inputting the superimposed feature map into the i-th upsampling module for upsampling; and obtaining the third feature map after the processing of the (N-1) upsampling modules.
- The method according to any one of claims 1-8, wherein the acquiring of the face image to be repaired comprises: performing face detection on a captured original image; cropping out a face image based on the position of the detected face in the original image; and scaling the face image to obtain the face image to be repaired.
- The method according to claim 9, wherein after scaling the face image to obtain the face image to be repaired, the method further comprises: performing portrait segmentation on the original image to obtain a portrait mask; and after obtaining the second restored face image, the method further comprises: applying Gaussian blur to the edge of the portrait mask; and pasting the face in the second restored face image back into the cropped original image based on the cropping position of the face image in the original image and the portrait mask, completing the restoration of the original image.
- The method according to any one of claims 1-8, wherein the performing of image transformation processing on the first restored face image to obtain the second restored face image comprises: performing color correction on the first restored face image; determining a scaling ratio; and, if the scaling ratio is greater than a preset ratio, scaling the color-corrected first restored face image with super-resolution to obtain the second restored face image.
- The method according to any one of claims 3-8, wherein before acquiring the face image to be repaired, the method further comprises: constructing a sample image pair, the sample image pair comprising a first face image and a second face image obtained from the first face image; inputting the sample image pair into a neural network for training and outputting a restored image of the second face image; determining a target loss from the restored image and the first face image; and adjusting the parameters of the neural network to minimize the target loss, thereby obtaining the neural network model.
- The method according to claim 12, wherein the target loss comprises at least one of a regression loss, a perceptual loss, a generative adversarial loss, and a context loss.
- The method according to claim 12, wherein the constructing of the sample image pair comprises: acquiring the preset first face image; if the image quality of the first face image is not degraded, applying atmospheric-disturbance degradation to the first face image to obtain a first degraded image; downsampling the first degraded image to obtain a target degraded image; upsampling the target degraded image to obtain a second degraded image; obtaining a third degraded image from the second degraded image; compressing the third degraded image with a preset compression-quality parameter to obtain a fourth degraded image; determining a rectangular region in the fourth degraded image and the corresponding target region in the first face image; and replacing the pixel values in the rectangular region with the pixel values in the target region to obtain the second face image, so that the sample image pair is constructed from the first face image and the second face image; or, if the image quality of the first face image is degraded, constructing the sample image pair from two copies of the first face image and determining either of the two as the second face image.
- The method according to claim 14, wherein the obtaining of the third degraded image from the second degraded image comprises: adding noise to the luminance channel of the second degraded image and performing non-local means denoising on the second degraded image to obtain the third degraded image; or, blurring the second degraded image to obtain a fifth degraded image, adding noise to the luminance channel of the fifth degraded image, and performing non-local means denoising on the fifth degraded image to obtain the third degraded image.
- A portrait restoration apparatus, the apparatus comprising: an image acquisition module configured to acquire a face image to be repaired; a portrait restoration module configured to extract a luminance channel of the face image to be repaired and perform portrait restoration based on the luminance channel to obtain a target face image; an image fusion module configured to fuse the target face image with the color channels of the face image to be repaired to obtain a first restored face image; and an image adjustment module configured to perform image transformation processing on the first restored face image to obtain a second restored face image.
- An electronic device, comprising an input device and an output device, and further comprising: a processor adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor to execute the method according to any one of claims 1-15.
- A computer storage medium storing one or more instructions adapted to be loaded by a processor to execute the method according to any one of claims 1-15.
- A computer program product comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1-15.
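Claim 1's split-restore-fuse pipeline can be sketched with a standard RGB-to-luminance decomposition. The BT.601 luma weights are an assumption (the claims do not fix a colour space), and the restoration network is replaced by an identity placeholder passed in as `restore_luma`:

```python
import numpy as np

# BT.601 luma weights; the claims do not mandate a specific colour space.
_W = np.array([0.299, 0.587, 0.114])

def restore_portrait(rgb: np.ndarray, restore_luma=lambda y: y) -> np.ndarray:
    """Claim 1 as code: restore only the luminance, keep the colour channels."""
    rgb_f = rgb.astype(np.float64)
    y = rgb_f @ _W                     # luminance channel
    cb = rgb_f[..., 2] - y             # blue-difference chroma
    cr = rgb_f[..., 0] - y             # red-difference chroma
    y2 = restore_luma(y)               # the neural network would run here
    # Fuse the restored luminance back with the original colour channels.
    r = y2 + cr
    b = y2 + cb
    g = (y2 - 0.299 * r - 0.114 * b) / 0.587
    out = np.stack([r, g, b], axis=-1)
    return np.clip(np.round(out), 0, 255).astype(rgb.dtype)
```

Restoring only the luminance keeps the colour information of the input untouched, which is why the fused result avoids colour shifts even if the network alters brightness structure.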
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237009537A KR102697331B1 (ko) | 2020-11-30 | 2021-04-27 | Person image restoration method, apparatus, electronic device, storage medium and program product |
JP2023537450A JP7542156B2 (ja) | 2020-11-30 | 2021-04-27 | Person image restoration method, apparatus, electronic device, storage medium and program product |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011386894.4 | 2020-11-30 | ||
CN202011386894.4A CN112330574B (zh) | 2020-11-30 | 2020-11-30 | Portrait restoration method and apparatus, electronic device and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022110638A1 true WO2022110638A1 (zh) | 2022-06-02 |
Family
ID=74308400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/090296 WO2022110638A1 (zh) | 2020-11-30 | 2021-04-27 | Portrait restoration method and apparatus, electronic device, storage medium and program product |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP7542156B2 (zh) |
KR (1) | KR102697331B1 (zh) |
CN (1) | CN112330574B (zh) |
WO (1) | WO2022110638A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114782291A (zh) * | 2022-06-23 | 2022-07-22 | 中国科学院自动化研究所 | Training method and apparatus for an image generator, electronic device and readable storage medium |
CN115376188A (zh) * | 2022-08-17 | 2022-11-22 | 天翼爱音乐文化科技有限公司 | Video call processing method and system, electronic device and storage medium |
CN115760646A (zh) * | 2022-12-09 | 2023-03-07 | 中山大学·深圳 | Multi-modal face image restoration method and system for irregular holes |
CN116782041A (zh) * | 2023-05-29 | 2023-09-19 | 武汉工程大学 | Image quality improvement method and system based on a liquid crystal microlens array |
CN117593462A (zh) * | 2023-11-30 | 2024-02-23 | 约翰休斯(宁波)视觉科技有限公司 | Fusion method and system for three-dimensional spatial scenes |
US20240143835A1 (en) * | 2022-11-02 | 2024-05-02 | Adobe Inc. | Anonymizing digital images utilizing a generative adversarial neural network |
CN118429505A (zh) * | 2024-07-02 | 2024-08-02 | 中山大学 | Three-dimensional character model generation method based on Gaussian splatting |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112330574B (zh) * | 2020-11-30 | 2022-07-12 | 深圳市慧鲤科技有限公司 | Portrait restoration method and apparatus, electronic device and computer storage medium |
CN112862852A (zh) * | 2021-02-24 | 2021-05-28 | 深圳市慧鲤科技有限公司 | Image processing method and apparatus, electronic device and computer-readable storage medium |
CN113034393A (zh) * | 2021-03-25 | 2021-06-25 | 北京百度网讯科技有限公司 | Photo restoration method, apparatus, device and storage medium |
CN115222606A (zh) * | 2021-04-16 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, computer-readable medium and electronic device |
CN113222874B (zh) * | 2021-06-01 | 2024-02-02 | 平安科技(深圳)有限公司 | Data augmentation method, apparatus, device and storage medium applied to target detection |
CN113706402A (zh) * | 2021-07-23 | 2021-11-26 | 维沃移动通信(杭州)有限公司 | Neural network training method and apparatus, and electronic device |
CN113763268B (zh) * | 2021-08-26 | 2023-03-28 | 中国科学院自动化研究所 | Blind face image restoration method and system |
CN113793286B (zh) * | 2021-11-18 | 2022-05-10 | 成都索贝数码科技股份有限公司 | Media image watermark removal method based on a multi-order attention neural network |
CN115294055A (zh) * | 2022-08-03 | 2022-11-04 | 维沃移动通信有限公司 | Image processing method and apparatus, electronic device and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105469407A (zh) * | 2015-11-30 | 2016-04-06 | 华南理工大学 | Face image layer decomposition method based on an improved guided filter |
US20180315196A1 (en) * | 2017-04-27 | 2018-11-01 | Intel Corporation | Fast color based and motion assisted segmentation of video into region-layers |
CN111402135A (zh) * | 2020-03-17 | 2020-07-10 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device and computer-readable storage medium |
CN112330574A (zh) * | 2020-11-30 | 2021-02-05 | 深圳市慧鲤科技有限公司 | Portrait restoration method and apparatus, electronic device and computer storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000004372A (ja) * | 1998-06-17 | 2000-01-07 | Konica Corp | Image restoration device and image output device |
JP2006338377A (ja) * | 2005-06-02 | 2006-12-14 | Fujifilm Holdings Corp | Image correction method, apparatus and program |
US9262684B2 (en) * | 2013-06-06 | 2016-02-16 | Apple Inc. | Methods of image fusion for image stabilization |
CN105931211A (zh) * | 2016-04-19 | 2016-09-07 | 中山大学 | Face image beautification method |
CN107301625B (zh) * | 2017-06-05 | 2021-06-01 | 天津大学 | Image dehazing method based on a luminance fusion network |
KR102174777B1 (ko) * | 2018-01-23 | 2020-11-06 | 주식회사 날비컴퍼니 | Method and apparatus for processing an image to improve image quality |
CN109389562B (zh) | 2018-09-29 | 2022-11-08 | 深圳市商汤科技有限公司 | Image inpainting method and apparatus |
- 2020
- 2020-11-30: CN application CN202011386894.4A filed; granted as CN112330574B (active)
- 2021
- 2021-04-27: WO application PCT/CN2021/090296 filed as WO2022110638A1 (application filing)
- 2021-04-27: JP application JP2023537450A filed; granted as JP7542156B2 (active)
- 2021-04-27: KR application KR1020237009537A filed; granted as KR102697331B1 (IP right grant)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114782291A (zh) * | 2022-06-23 | 2022-07-22 | 中国科学院自动化研究所 | Training method and apparatus for an image generator, electronic device and readable storage medium |
CN114782291B (zh) * | 2022-06-23 | 2022-09-06 | 中国科学院自动化研究所 | Training method and apparatus for an image generator, electronic device and readable storage medium |
CN115376188A (zh) * | 2022-08-17 | 2022-11-22 | 天翼爱音乐文化科技有限公司 | Video call processing method and system, electronic device and storage medium |
CN115376188B (zh) * | 2022-08-17 | 2023-10-24 | 天翼爱音乐文化科技有限公司 | Video call processing method and system, electronic device and storage medium |
US20240143835A1 (en) * | 2022-11-02 | 2024-05-02 | Adobe Inc. | Anonymizing digital images utilizing a generative adversarial neural network |
CN115760646A (zh) * | 2022-12-09 | 2023-03-07 | 中山大学·深圳 | Multi-modal face image restoration method and system for irregular holes |
CN115760646B (zh) * | 2022-12-09 | 2024-03-15 | 中山大学·深圳 | Multi-modal face image restoration method and system for irregular holes |
CN116782041A (zh) * | 2023-05-29 | 2023-09-19 | 武汉工程大学 | Image quality improvement method and system based on a liquid crystal microlens array |
CN116782041B (zh) * | 2023-05-29 | 2024-01-30 | 武汉工程大学 | Image quality improvement method and system based on a liquid crystal microlens array |
CN117593462A (zh) * | 2023-11-30 | 2024-02-23 | 约翰休斯(宁波)视觉科技有限公司 | Fusion method and system for three-dimensional spatial scenes |
CN117593462B (zh) * | 2023-11-30 | 2024-06-07 | 约翰休斯(宁波)视觉科技有限公司 | Fusion method and system for three-dimensional spatial scenes |
CN118429505A (zh) * | 2024-07-02 | 2024-08-02 | 中山大学 | Three-dimensional character model generation method based on Gaussian splatting |
Also Published As
Publication number | Publication date |
---|---|
KR102697331B1 (ko) | 2024-08-20 |
CN112330574B (zh) | 2022-07-12 |
JP2023539691A (ja) | 2023-09-15 |
CN112330574A (zh) | 2021-02-05 |
JP7542156B2 (ja) | 2024-08-29 |
KR20230054432A (ko) | 2023-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022110638A1 (zh) | Portrait restoration method and apparatus, electronic device, storage medium and program product | |
US20220222786A1 (en) | Image processing method, smart device, and computer readable storage medium | |
CN109493350B (zh) | Portrait segmentation method and apparatus | |
Yu et al. | A unified learning framework for single image super-resolution | |
CN108537754B (zh) | Face image restoration system based on a deformation guidance map | |
WO2022206202A1 (zh) | Image beautification processing method and apparatus, storage medium, and electronic device | |
WO2023284401A1 (zh) | Image beautification processing method and apparatus, storage medium, and electronic device | |
US11677897B2 (en) | Generating stylized images in real time on mobile devices | |
KR20240089729A (ko) | Image processing method, apparatus, storage medium and electronic device | |
CN116645598A (zh) | Remote-sensing image semantic segmentation method based on channel-attention feature fusion | |
CN116612015A (zh) | Model training method, image demoiréing method and apparatus, and electronic device | |
Liu et al. | Single image super-resolution using a deep encoder–decoder symmetrical network with iterative back projection | |
US20220398704A1 (en) | Intelligent Portrait Photography Enhancement System | |
CN112200817A (zh) | Image-based sky region segmentation and special-effect processing method, apparatus and device | |
CN114511487A (zh) | Image fusion method and apparatus, computer-readable storage medium, and terminal | |
CN116703777A (zh) | Image processing method, system, storage medium and electronic device | |
CN111836058B (zh) | Method, apparatus, device and storage medium for real-time video playback | |
CN115294055A (zh) | Image processing method and apparatus, electronic device and readable storage medium | |
CN113658073B (zh) | Image denoising processing method and apparatus, storage medium and electronic device | |
CN113628115A (zh) | Image reconstruction processing method and apparatus, electronic device and storage medium | |
CN117830077A (zh) | Image processing method and apparatus, and electronic device | |
CN109447900A (zh) | Image super-resolution reconstruction method and apparatus | |
CN114299105A (zh) | Image processing method and apparatus, computer device and storage medium | |
CN115222606A (zh) | Image processing method and apparatus, computer-readable medium and electronic device | |
CN111080543A (zh) | Image processing method and apparatus, electronic device and computer-readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21896153; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2023537450; Country of ref document: JP; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 20237009537; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.09.2023) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21896153; Country of ref document: EP; Kind code of ref document: A1 |