CN112200152A

CN112200152A - Super-resolution method for aligning face images based on residual back-projection neural network

Info

Publication number: CN112200152A
Application number: CN202011281052.2A
Authority: CN
Inventors: 陆耀; 王学博; 陈晓珍; 王子建; 李玮琪; 李公平; 吴紫薇
Original assignee: China Media Group
Current assignee: China Media Group
Priority date: 2019-12-06
Filing date: 2020-11-16
Publication date: 2021-01-08
Anticipated expiration: 2040-11-16
Also published as: CN112200152B; CN110991355A

Abstract

The invention relates to a super-resolution method for aligning face images based on a residual back-projection neural network, and belongs to the technical field of image processing. The method combines iterative back projection with a deep neural network and magnifies an ultra-low-resolution face image by a factor of 8 in three steps. (1) The ultra-low-resolution face image is input into the neural network, depth features are extracted, and the low-resolution feature map is enlarged to 128 × 128 size by a deconvolution network. (2) The 128 × 128 feature map obtained in step (1) is input into the residual back-projection unit of the neural network, and a compensated 128 × 128 high-resolution feature map is obtained through repeated iteration. (3) The high-resolution feature map obtained in step (2) is passed through a convolution layer to generate the final 128 × 128 high-resolution image. The method has clearly separated modules and simple steps, and its super-resolution quality and efficiency meet the requirements of practical low-resolution face images.

Description

Super-resolution method for aligning face images based on residual back-projection neural network

Technical Field

The invention relates to a super-resolution method for aligning a face image based on a residual back projection neural network, belonging to the technical field of image processing.

Background Art

In computer vision research, face image super-resolution is an important sub-topic: it has many practical applications and also underpins other research topics.

From a practical standpoint, many intelligent applications depend on face image super-resolution, the most important being urban surveillance systems. With rapid economic development, more and more video surveillance cameras are deployed in everyday surroundings; they are mainly used to build urban video surveillance systems and play an important role in the criminal investigation work of public security organs. However, faces captured by real cameras are often difficult to identify directly, for two main reasons. On one hand, the camera is usually far from the face, and the resulting low-resolution target image cannot provide enough detail for recognition. On the other hand, blur from the optical devices, interference from the on-site environment, transmission and compression noise, and similar factors corrupt the detail of the target object, so the feature information required for face identification is hard to obtain. Enhancing the resolution of real low-resolution face images, and thereby making the target image more identifiable, is therefore a core technical requirement of video surveillance services.

From a research standpoint, computer vision is one of the core areas of the rapidly developing field of artificial intelligence. Classical vision tasks such as image classification, object detection, and face recognition are maturing, but they presuppose high-resolution input images, of which face images are a subset. Face image super-resolution can therefore be regarded as a foundation for these higher-level vision tasks: given sharper images, image classification becomes more accurate, object detection achieves higher detection accuracy, and face recognition achieves a higher recognition rate. For real images, image quality is improved by a super-resolution method before other vision tasks are carried out.

Face super-resolution (FSR) is a domain-specific case of single image super-resolution (SISR). It aims to process a low-resolution (LR) face image with an algorithm that increases its resolution, yielding a clear high-resolution (HR) face image. Super-resolution solves an inverse problem, from a low-resolution image back to a high-resolution image. Because high-frequency information is lost during image degradation, the prior information in the image is insufficient, which makes super-resolution an ill-posed problem. Solving ill-posed problems is a long-standing difficulty and a focus of attention in many research fields, so super-resolution has high academic research value.

Disclosure of Invention

The invention aims to provide a super-resolution method for aligning face images based on a residual back-projection neural network. It addresses the problems that existing aligned low-resolution face images have poor visual quality and are difficult to use in existing face analysis systems, and its goal is to magnify ultra-low-resolution face images.

The invention is realized by the following technical scheme.

The super-resolution method for aligning the face image based on the residual back projection neural network comprises the following steps:

step 1, cutting a low-resolution face image to obtain a face image with a cut face area;

the size of a face area in the face image is 16 pixels;

step 2, precisely aligning the face image of the face area cut out in step 1 so that the eyes of the face image lie on a horizontal straight line, to obtain the highly aligned face image;

step 3, extracting the edge image of the highly aligned face image obtained in step 2 by using a Sobel operator;

step 4, channel merging is carried out on the edge image extracted in the step 3 and the highly aligned face image, and an image after channel merging is obtained;

step 5, extracting the depth features of the image after channel merging in the step 4, and amplifying the feature map of the low-resolution image to 128 × 128 size by using an iterative back projection mode to obtain a 128 × 128 feature map, which specifically comprises the following substeps:

step 5.1, extracting 256-dimensional depth features from the channel-merged image of step 4 by using a 3 × 3 convolution layer of the neural network;

step 5.2, mapping the 256-dimensional depth features extracted in step 5.1 into 64-dimensional features by using a 1 × 1 convolution layer;

step 5.3, enlarging the 64-dimensional features to 128 × 128 size by using a deconvolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, to obtain the 128 × 128 feature map;

step 6, down-sampling the 128 × 128 feature map to 16 × 16 size by using a convolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, and taking the difference between the down-sampled feature map and the 64-dimensional features extracted in step 5.2 to obtain a residual feature map;

step 7, enlarging the residual feature map to 128 × 128 size by using a deconvolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, and adding it to the 128 × 128 feature map obtained in step 5 to obtain a compensated feature map, the compensated feature map being referred to as the residual iterative back projection;

step 8, extracting the edge map of the residual iterative back projection obtained in step 7 and adding the edge map to the super-resolution reconstruction, specifically: extracting the edge map of the residual iterative back projection by using a 3 × 3 convolution kernel, and supervising the generation of the edge map by using a label image of the edge map;

step 9, combining the edge image generated in the step 8 with the residual iterative back projection generated in the step 7, generating a final high-resolution face image by using a convolution layer, and performing supervised training by using a high-resolution face label image;

Thus, through steps 1 to 9, the super-resolution method for aligning the face image based on the residual back-projection neural network is completed.
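The layer parameters given in steps 5.3, 6 and 7 can be sanity-checked with the usual output-size formulas for convolution and transposed convolution (deconvolution). The sketch below is illustrative only: the function names are hypothetical, and it assumes no dilation and no output padding, neither of which is specified in the text.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a standard convolution (no dilation assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_output_size(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution
    (no dilation and no output padding assumed)."""
    return (size - 1) * stride - 2 * padding + kernel


# Steps 5.3 and 7: deconvolution with kernel 12, stride 8, padding 2.
up = deconv_output_size(16, kernel=12, stride=8, padding=2)

# Step 6: convolution with the same kernel, stride and padding.
down = conv_output_size(up, kernel=12, stride=8, padding=2)

print(up, down)  # 128 16
```

Under these assumptions the deconvolution maps 16 × 16 to 128 × 128 and the convolution maps 128 × 128 back to 16 × 16, which is what allows the subtraction in step 6 and the addition in step 7 to be performed element-wise.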

Advantageous effects

Compared with the existing super-resolution method for aligning the face images, the super-resolution method for aligning the face images based on the residual back projection neural network has the following beneficial effects:

1. the peak signal-to-noise ratio (PSNR) of the high-resolution face image generated by the invention is higher;

2. the high-resolution face image generated by the invention has higher Structural Similarity (SSIM);

3. the high-resolution face image generated by the invention has better visualization effect.

Drawings

FIG. 1 is a flow chart of a super resolution method for aligning face images based on a residual back projection neural network according to an embodiment of the present invention;

FIG. 2 shows super-resolution visualization results for face images.

Detailed Description

The super-resolution method for aligning a face image based on a residual back-projection neural network according to the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Example 1

This example illustrates the specific implementation of the super-resolution method for aligning face images based on the residual back-projection neural network.

When implementing the super-resolution method for aligning face images, experiments are carried out on CelebA, an open-source face image data set containing 200,000 frontal face images. 5000 face images are randomly sampled as the validation set, 1000 face images as the test set, and the remaining face images serve as the training set. The training, validation, and testing procedures of the neural network are identical except for the data set used. The experimental environment is as follows: the hardware is a Titan X discrete graphics card with 12 GB of video memory, the software system is Ubuntu 14.04, and the Python PyTorch framework is used. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as the super-resolution evaluation metrics.
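For reference, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference image and the reconstructed image. The following is a minimal sketch of that standard definition, not the evaluation code used in the experiments; the function name `psnr`, the nested-list grayscale image representation, and the peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Tiny hypothetical 2 x 2 images, used only to exercise the function.
reference = [[52, 55], [61, 59]]
estimate = [[54, 55], [61, 57]]
score = psnr(reference, estimate)
print(round(score, 2))
```

A higher PSNR means a smaller pixel-wise error with respect to the high-resolution reference; SSIM additionally compares local structure and is typically taken from an image-processing library rather than written by hand.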

The specific implementation steps of the super-resolution method for aligning face images according to the invention are shown in FIG. 1.

As can be seen from fig. 1, the super-resolution method includes the following steps:

step 1) uniformly preprocessing the face image data set and cropping out the 128 × 128 face region image of each face;

step 2) precisely aligning the face image of the face region cropped out in step 1) so that the eyes of the face image lie on a horizontal straight line, to obtain the highly aligned face image;

step 3) extracting the edge map of the highly aligned face image obtained in step 2) by using a Sobel operator;

step 4) down-sampling both the edge map extracted in step 3) and the highly aligned face image to 16 × 16 by bicubic interpolation;

step 5) channel-merging the down-sampled edge map and the down-sampled highly aligned face image obtained in step 4);

step 6) extracting the depth features of the channel-merged image of step 5) and enlarging the feature map of the low-resolution image to 128 × 128 size by iterative back projection. Specifically, 256-dimensional depth features of the input image are extracted by a 3 × 3 convolution layer of the neural network, and the 256-dimensional features are mapped to 64-dimensional features by a 1 × 1 convolution layer. The features are then enlarged to 128 × 128 size by a deconvolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2;

step 7) down-sampling the 128 × 128 feature map back to 16 × 16 size by a convolution layer with the same parameters, and subtracting it from the input feature map to obtain a residual feature map;

step 8) enlarging the residual feature map to 128 × 128 size by the deconvolution layer and adding it to the 128 × 128 feature map obtained in the previous step to obtain a compensated feature map. This is called residual iterative back projection. The same iteration process is carried out 7 times in total;

step 9) extracting the edge map of the high-resolution feature map obtained in step 8) and adding the edge map to the super-resolution reconstruction process. Specifically, a 3 × 3 convolution kernel is used to extract the edge map of the 128 × 128 feature map, and a label image of the edge map is used to supervise the generation of the edge map;

step 10) combining the edge map generated in step 9) with the feature map generated in step 8), and generating the final high-resolution face image by a convolution layer.
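As a rough illustration of steps 6) to 8), the following PyTorch sketch shows one possible reading of the feature extraction and residual back-projection stages. It is not the patented implementation. The class and attribute names, the four input channels (three color channels plus one edge channel), the padding of 1 on the 3 × 3 convolution, the reuse of the same layers across the 7 iterations, the sign of the residual, and the absence of activation functions are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class ResidualBackProjection(nn.Module):
    """Hypothetical sketch of steps 6) to 8); see the assumptions listed above."""

    def __init__(self, in_channels=4, iterations=7):
        super().__init__()
        # Step 6): 3x3 convolution to 256 features, then 1x1 convolution to 64.
        # padding=1 is an assumption that keeps the 16x16 spatial size.
        self.extract = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.reduce = nn.Conv2d(256, 64, kernel_size=1)
        # Step 6): deconvolution with kernel 12, stride 8, padding 2 (16 -> 128).
        self.up = nn.ConvTranspose2d(64, 64, kernel_size=12, stride=8, padding=2)
        # Step 7): convolution with the same parameters (128 -> 16).
        self.down = nn.Conv2d(64, 64, kernel_size=12, stride=8, padding=2)
        # Step 8): deconvolution that enlarges the residual (16 -> 128).
        self.up_residual = nn.ConvTranspose2d(
            64, 64, kernel_size=12, stride=8, padding=2
        )
        self.iterations = iterations

    def forward(self, x):
        low = self.reduce(self.extract(x))    # (N, 64, 16, 16)
        high = self.up(low)                   # (N, 64, 128, 128)
        for _ in range(self.iterations):
            # Residual between the input features and the re-projected features.
            residual = low - self.down(high)  # (N, 64, 16, 16)
            # Compensate the high-resolution features with the enlarged residual.
            high = high + self.up_residual(residual)
        return high


model = ResidualBackProjection()
with torch.no_grad():
    features = model(torch.zeros(1, 4, 16, 16))
print(tuple(features.shape))  # (1, 64, 128, 128)
```

The edge-map branch of step 9) and the final reconstruction layer of step 10) are left out; they would take the 64-channel 128 × 128 output of this module as input.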

Super-resolution results:

We performed tests on a test set of 1000 low-resolution face images and compared the results with the current best super-resolution methods LapSRN, DBPN, URDGN, and CBN; the results are shown in Table 1 below.

TABLE 1 super-resolution test results of face images

Methods	Bicubic	LapSRN	DBPN	URDGN	CBN	Ours
PSNR (dB)	22.2025	23.9884	24.0100	23.6326	23.8004	24.2391
SSIM	0.5653	0.6810	0.6812	0.6710	0.6723	0.6921

As the quantitative metrics in Table 1 show, the super-resolution method for aligning face images based on the residual back-projection neural network outperforms the current best super-resolution methods in both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Its PSNR is 2.03 dB higher than that of the traditional bicubic method and 0.22 dB higher than that of DBPN, the best of the compared methods; its SSIM is 0.13 higher than that of the bicubic method and 0.011 higher than that of DBPN.
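The differences quoted above can be recomputed directly from the values in Table 1. The short sketch below does so; the dictionary and function names are hypothetical, and the numbers are copied from the table.

```python
# Values copied from Table 1.
psnr_db = {
    "Bicubic": 22.2025, "LapSRN": 23.9884, "DBPN": 24.0100,
    "URDGN": 23.6326, "CBN": 23.8004, "Ours": 24.2391,
}
ssim = {
    "Bicubic": 0.5653, "LapSRN": 0.6810, "DBPN": 0.6812,
    "URDGN": 0.6710, "CBN": 0.6723, "Ours": 0.6921,
}


def gain_over(scores, baseline, method="Ours"):
    """Score of `method` minus score of `baseline`, rounded to 4 decimals."""
    return round(scores[method] - scores[baseline], 4)


print(gain_over(psnr_db, "Bicubic"))  # 2.0366
print(gain_over(psnr_db, "DBPN"))     # 0.2291
print(gain_over(ssim, "Bicubic"))     # 0.1268
print(gain_over(ssim, "DBPN"))        # 0.0109
```

These differences agree with the quoted figures up to truncation or rounding to the stated number of decimals.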

In addition to the quantitative evaluation, a qualitative visual comparison was made with the current best super-resolution methods LapSRN, DBPN, URDGN, and CBN. As shown in FIG. 2, "super-resolution visualization of face images", the high-resolution face images generated by the super-resolution method for aligning face images based on the residual back-projection neural network are structurally more consistent with the original images and contain richer detail.

While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims

1. The super-resolution method for aligning the face image based on the residual back projection neural network is characterized in that: the method comprises the following steps:

step 5, extracting the depth features of the image after channel merging in the step 4, and amplifying the feature map of the low-resolution image to 128 × 128 size by using an iterative back projection mode to obtain a 128 × 128 feature map;

step 6, down-sampling the 128 × 128 feature map to 16 × 16 size by using a convolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, to obtain a residual feature map;

step 8, extracting the edge map of the residual iterative back projection obtained in the step 7, and adding the edge map to super-resolution reconstruction to generate an edge map;

step 9, combining the edge image generated in step 8 with the residual iterative back projection generated in step 7, generating a final high-resolution face image by using a convolution layer, and performing supervised training by using the high-resolution face label image.

2. The super-resolution method for aligning face images based on residual back-projection neural network as claimed in claim 1, wherein: in step 1, the size of the face area in the face image is 16 pixels.

3. The super-resolution method for aligning face images based on residual back-projection neural network as claimed in claim 1, wherein: step 5, specifically comprising the following substeps:

step 5.3, enlarging the 64-dimensional features to 128 × 128 size by using a deconvolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, to obtain the 128 × 128 feature map.

4. The super-resolution method for aligning face images based on residual back-projection neural network as claimed in claim 1, wherein: the residual feature map in step 6 is obtained by down-sampling the 128 × 128 feature map to 16 × 16 size using a convolution layer with a kernel size of 12 × 12, a stride of 8, and a padding of 2 × 2, and taking the difference between the down-sampled feature map and the 64-dimensional features extracted in step 5.2.

5. The super-resolution method for aligning face images based on residual back-projection neural network as claimed in claim 1, wherein: step 8, specifically: extracting residual iterative back-projected edge maps using a 3 x 3 convolution kernel, and using labeled images of the edge maps to supervise the generation of the edge maps.