CN113011366A - Method, apparatus, electronic device, and medium for improving face recognition accuracy - Google Patents

Method, apparatus, electronic device, and medium for improving face recognition accuracy

Info

Publication number
CN113011366A
Authority
CN
China
Prior art keywords: feature map, face image, sub, feature, generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110346376.8A
Other languages
Chinese (zh)
Inventor
刘宗帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202110346376.8A priority Critical patent/CN113011366A/en
Publication of CN113011366A publication Critical patent/CN113011366A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Abstract

The embodiment of the disclosure discloses a method, an apparatus, an electronic device and a medium for improving face recognition accuracy. One embodiment of the method comprises: acquiring a target face image; performing feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map in terms of resolution; and performing feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image. The embodiment improves the resolution of the face image and the accuracy of face recognition, thereby ensuring accurate identity recognition of the consignee.

Description

Method, apparatus, electronic device, and medium for improving face recognition accuracy
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method, an apparatus, an electronic device, and a medium for improving face recognition accuracy.
Background
With the development of unmanned driving technology, unmanned cargo vehicles are gradually entering public view. To prevent goods from being collected by the wrong person, the identity of the consignee needs to be verified. In the prior art, a face image of the consignee is captured by a vehicle-mounted sensor (for example, a camera), and the identity of the consignee is determined from the face image.
However, when the above-described technique is employed, the following technical problem often arises:
because the environment around an unmanned cargo vehicle is uncertain (for example, light may be insufficient on a rainy day), the face image captured by the vehicle-mounted sensor may be blurred, which reduces the accuracy of face recognition and, in turn, the accuracy of identifying the consignee.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose methods, apparatuses, electronic devices, and media for improving face recognition accuracy to solve one of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method for improving face recognition accuracy, the method including: acquiring a target face image; performing feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map in terms of resolution; and performing feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
Optionally, the feature extraction network includes: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer; and performing feature extraction on the target face image through the pre-trained feature extraction network to generate the feature map includes: inputting the target face image into the first convolutional layer to generate a first sub-feature map, wherein the first convolutional layer is used for extracting, block by block, feature vectors corresponding to the sub-images of each block in the target face image and generating the first sub-feature map from the at least one obtained feature vector; inputting the first sub-feature map into the second convolutional layer to generate a second sub-feature map; inputting the second sub-feature map into the third convolutional layer to generate a third sub-feature map; inputting the third sub-feature map into the fourth convolutional layer to generate a fourth sub-feature map; inputting the fourth sub-feature map into the fifth convolutional layer to generate a fifth sub-feature map; and inputting the fifth sub-feature map into the sixth convolutional layer to generate the feature map.
Optionally, the second convolutional layer may generate the second sub-feature map by performing high-dimensional mapping processing on the first sub-feature map, the third convolutional layer may generate the third sub-feature map by performing high-dimensional mapping processing on the second sub-feature map, the fourth convolutional layer may generate the fourth sub-feature map by performing high-dimensional mapping processing on the third sub-feature map, and the fifth convolutional layer may generate the fifth sub-feature map by performing high-dimensional mapping processing on the fourth sub-feature map.
Optionally, the performing, by a deconvolution network, feature amplification processing on the feature map to generate a reconstructed face image includes: performing deconvolution processing on the feature map through a target convolution kernel to generate the reconstructed face image.
Optionally, the feature extraction network is generated by: generating a candidate feature extraction network based on a training data set, wherein the training data in the training data set are face images with target resolution; testing the candidate feature extraction network through a test sample set to generate a test result; and in response to determining that the test result meets a preset condition, determining the candidate feature extraction network as the feature extraction network.
Optionally, the test sample set is obtained by: randomly intercepting each sample face image in the sample face image set to generate a candidate face image to obtain a candidate face image set; and performing down-sampling processing on the candidate face images in the candidate face image set to generate a test sample, so as to obtain the test sample set.
In a second aspect, some embodiments of the present disclosure provide an apparatus for improving face recognition accuracy, the apparatus comprising: an acquisition unit configured to acquire a target face image; a feature extraction unit configured to perform feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map in terms of resolution; and a feature amplification processing unit configured to perform feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
Optionally, the feature extraction network includes: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer; and the feature extraction unit is further configured to: inputting the target face image into the first convolution layer to generate a first sub-feature map, wherein the first convolution layer is used for extracting feature vectors corresponding to sub-images of each block in the target face image block by block and generating the first sub-feature map according to at least one obtained feature vector; inputting the first sub-feature map into the second convolution layer to generate a second sub-feature map; inputting the second sub-feature map into the third convolutional layer to generate a third sub-feature map; inputting the third sub-feature map into the fourth convolutional layer to generate a fourth sub-feature map; inputting the fourth sub-feature map into the fifth convolutional layer to generate a fifth sub-feature map; inputting the fifth sub-feature map into the sixth convolutional layer to generate the feature map.
Optionally, the feature amplification processing unit is further configured to: perform deconvolution processing on the feature map through a target convolution kernel to generate the reconstructed face image.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following beneficial effects: with the method for improving face recognition accuracy of some embodiments of the present disclosure, the resolution of the face image is improved, thereby ensuring accurate identification of the consignee. Specifically, the reason why the consignee is not identified accurately is that the resolution of the captured face image is low due to environmental factors such as weather. For example, on a rainy day, the captured face image may have a low resolution because of insufficient light. For another example, at night, or indoors where the light available to the unmanned cargo vehicle is insufficient, the captured face image may also have a low resolution. Based on this, the method for improving face recognition accuracy of some embodiments of the present disclosure first acquires a target face image. Secondly, feature extraction is performed on the target face image through a pre-trained feature extraction network to generate a feature map. The feature extraction network can represent the mapping relation between the target face image and the feature map in terms of resolution, so that the face feature information contained in the feature map has a mapping relation, in terms of resolution, with the face feature information in the target face image. Finally, feature amplification processing is performed on the feature map through a deconvolution network to generate a reconstructed face image. A low-resolution image can thereby be mapped to a high-resolution image, while the size of the reconstructed face image remains consistent with that of the target face image. In this way, the resolution of the face image is improved, the accuracy of face recognition is improved, and accurate identification of the consignee is ensured.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
Fig. 1 is a schematic diagram of an application scenario of a method for improving face recognition accuracy according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a method for improving face recognition accuracy according to the present disclosure;
FIG. 3 is a schematic view of a target unmanned vehicle;
FIG. 4 is a schematic diagram of a network architecture of a feature extraction network;
FIG. 5 is a schematic diagram of a network structure of a candidate feature extraction network;
FIG. 6 is a schematic diagram of the comparison result of the generated reconstructed face image;
FIG. 7 is a schematic diagram of a network structure of a deconvolution network;
FIG. 8 is a flow diagram of further embodiments of a method for improving face recognition accuracy according to the present disclosure;
FIG. 9 is a schematic diagram of a deconvolution process performed on a feature map by a target convolution kernel;
FIG. 10 is a schematic block diagram of some embodiments of an apparatus for improving face recognition accuracy according to the present disclosure;
FIG. 11 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of a method for improving face recognition accuracy according to some embodiments of the present disclosure.
In the application scenario of fig. 1, first, the computing device 101 may acquire a target face image 102. Next, the computing device 101 may perform feature extraction on the target face image 102 through a pre-trained feature extraction network 103 to generate a feature map 104, where the feature extraction network 103 is configured to characterize a mapping relationship between the target face image 102 and the feature map 104 in terms of resolution. Finally, the computing device 101 may perform feature amplification processing on the feature map 104 through the deconvolution network 105 to generate a reconstructed face image 106.
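By way of illustration only, a minimal sketch of this three-step flow is given below, assuming a PyTorch-style implementation (the patent does not name a framework); the saved-model file names and the random tensor standing in for the acquired image are placeholders, not artifacts described by the patent.

```python
import torch

# Hypothetical, pre-trained modules standing for the feature extraction
# network 103 and the deconvolution network 105; the file names are placeholders.
feature_extraction_net = torch.load("feature_extraction_net.pt")
deconv_net = torch.load("deconv_net.pt")

target_face_image = torch.rand(1, 3, 100, 100)            # stands in for the target face image 102
feature_map = feature_extraction_net(target_face_image)   # feature map 104
reconstructed_face_image = deconv_net(feature_map)        # reconstructed face image 106
```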
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of a method for improving face recognition accuracy in accordance with the present disclosure is shown. The method for improving the face recognition precision comprises the following steps:
step 201, obtaining a target face image.
In some embodiments, the executing subject (e.g., the computing device 101 shown in fig. 1) of the method for improving the accuracy of face recognition may acquire the target face image by means of a wired connection or a wireless connection. The target face image may be captured by a camera mounted on the target unmanned vehicle. The target unmanned vehicle may be an unmanned transport vehicle for transporting the article.
As an example, the target unmanned vehicle may be as shown in fig. 3. The target unmanned vehicle may include an environment sensing device 301 and an inventory device 302. The environment sensing device 301 may be used to sense the environment (e.g., obstacles) around the target unmanned vehicle. The inventory device 302 may be used to hold items (e.g., express parcels). The environment sensing device 301 may be a camera.
Step 202, performing feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map.
In some embodiments, the executing subject may perform feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map. The feature extraction network is used for extracting features of the target face image. The feature extraction network is used for representing the mapping relation between the target face image and the feature image in the resolution. The feature extraction network may be, but is not limited to, any of the following: CNN (Convolutional Neural Networks) model, RNN (Recurrent Neural Networks) model, and LSTM (Long Short-Term Memory) model.
Optionally, the feature extraction network may include: the first convolution processing layer, the second convolution processing layer, the first pooling layer, the third convolution processing layer, the fourth convolution processing layer, the second pooling layer, the fifth convolution processing layer, the sixth convolution processing layer and the seventh convolution processing layer. The number of convolution kernels of the seventh convolution processing layer may be 256.
As an example, the network structure of the above-described feature extraction network may be as shown in fig. 4. The feature extraction network may include: a first convolution processing layer 401, a second convolution processing layer 402, a first pooling layer 403, a third convolution processing layer 404, a fourth convolution processing layer 405, a second pooling layer 406, a fifth convolution processing layer 407, a sixth convolution processing layer 408 and a seventh convolution processing layer 409. Here, the convolution kernel size of each of the convolution processing layers 401, 402, 404, 405, 407, 408 and 409 may be 3 × 3. The number of convolution kernels of the first convolution processing layer 401 may be 64. The number of convolution kernels of the second convolution processing layer 402 may be 64. The number of convolution kernels of the third convolution processing layer 404 may be 128. The number of convolution kernels of the fourth convolution processing layer 405 may be 128. The number of convolution kernels of the fifth convolution processing layer 407 may be 256. The number of convolution kernels of the sixth convolution processing layer 408 may be 256. The number of convolution kernels of the seventh convolution processing layer 409 may be 256.
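For illustration, a minimal sketch of the fig. 4 structure is given below, assuming a PyTorch implementation; the three-channel input, ReLU activations, 2 × 2 max pooling and unit padding are assumptions added here, while the kernel sizes and kernel counts are those listed above.

```python
import torch
import torch.nn as nn

feature_extraction_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),    # first convolution processing layer 401
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),   # second convolution processing layer 402
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),                   # first pooling layer 403
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # third convolution processing layer 404
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), # fourth convolution processing layer 405
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),                   # second pooling layer 406
    nn.Conv2d(128, 256, kernel_size=3, padding=1), # fifth convolution processing layer 407
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), # sixth convolution processing layer 408
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), # seventh convolution processing layer 409
)

feature_map = feature_extraction_net(torch.randn(1, 3, 100, 100))  # e.g. a 100x100 face image
print(feature_map.shape)  # torch.Size([1, 256, 25, 25]) after the two 2x poolings
```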
Optionally, the feature extraction network may include: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer. The first convolution layer may be configured to extract feature vectors corresponding to sub-images of each block in the target face image block by block, and generate the first sub-feature map according to at least one obtained feature vector. The second convolutional layer may generate the second sub-feature map by performing a high-dimensional mapping process on the first sub-feature map. The third convolutional layer may generate the third sub-feature map by performing a high-dimensional mapping process on the second sub-feature map. The fourth convolution layer generates the fourth sub-feature map by performing a high-dimensional mapping process on the third sub-feature map. The fifth convolutional layer may generate the fifth sub-feature map by performing a high-dimensional mapping process on the fourth sub-feature map.
Alternatively, the feature extraction network may be generated by:
firstly, generating a candidate feature extraction network based on a training data set.
The training data in the training data set is a face image with a target resolution.
As an example, the above target resolution may be 100 × 100.
As yet another example, the network structure of the candidate feature extraction network described above may be as shown in fig. 5. The candidate feature extraction network may include: a first candidate convolutional layer 501, a second candidate convolutional layer 502, a third candidate convolutional layer 503, a fourth candidate convolutional layer 504, a fifth candidate convolutional layer 505, and a sixth candidate convolutional layer 506.
And secondly, testing the candidate characteristic extraction network through a test sample set to generate a test result.
Optionally, the test results may characterize the accuracy of the candidate feature extraction network. The test sample can be obtained by the following substeps:
the first substep is to randomly intercept each sample face image in the sample face image set to generate a candidate face image, and obtain a candidate face image set.
The sample face image set may be composed of face images from the LFW Faces Database and the ORL Faces Database.
As an example, the resolution of the sample face image in the sample face image set described above may be 250 × 250. The resolution of the candidate face images in the candidate face image set may be 100 × 100.
And a second sub-step of performing down-sampling processing on the candidate face images in the candidate face image set to generate a test sample, so as to obtain the test sample set.
As an example, the test sample in the above-described test sample set may be an image having a resolution of 50 × 50.
And thirdly, in response to the fact that the test result meets the preset condition, determining the candidate feature extraction network as the feature extraction network.
The preset condition may be that the test result is greater than a preset value.
As an example, the above-mentioned preset value may be 99%.
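As an illustration of the first and second sub-steps above, the following is a minimal sketch of constructing a single test sample, assuming PIL-style image handling; the file name is a placeholder, and the 250 × 250, 100 × 100 and 50 × 50 sizes are those from the examples above.

```python
import random
from PIL import Image

def make_test_sample(sample_path):
    # e.g. a 250x250 sample face image from the LFW / ORL face databases
    sample = Image.open(sample_path)
    w, h = sample.size
    # first sub-step: randomly intercept (crop) a 100x100 candidate face image
    left = random.randint(0, w - 100)
    top = random.randint(0, h - 100)
    candidate = sample.crop((left, top, left + 100, top + 100))
    # second sub-step: down-sample the candidate to a 50x50 low-resolution test sample
    return candidate.resize((50, 50), Image.BICUBIC)

# "face_0001.jpg" is a placeholder path, not a file referenced by the patent
test_sample = make_test_sample("face_0001.jpg")
```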
As yet another example, fig. 6 shows a comparison of reconstruction results. The images in the image set 601 are low-resolution images. The images in the image set 602 may be images obtained by restoring the images in the image set 601 using a super-resolution convolutional neural network (SRCNN) model. The images in the image set 603 may be images obtained by restoring the images in the image set 601 through the feature extraction network and the deconvolution network.
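For reference, a sketch of an SRCNN-style baseline of the kind mentioned above is shown below, assuming the commonly published 9-1-5 layer configuration with 64 and 32 filters and a single-channel input; none of these hyper-parameters are taken from the patent.

```python
import torch.nn as nn

# SRCNN-style baseline; it operates on an image that has already been up-scaled
# (e.g. bicubically) to the target resolution. All hyper-parameters are assumptions.
srcnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=9, padding=4),  # patch extraction and representation
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 32, kernel_size=1),            # non-linear mapping
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 1, kernel_size=5, padding=2),  # reconstruction
)
```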
And step 203, performing feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
In some embodiments, the execution subject may perform feature amplification processing on the feature map through the deconvolution network to generate a reconstructed face image. The reconstructed face image may be an image obtained by improving the resolution of the target face image.
As an example, the above-mentioned deconvolution network may be as shown in fig. 7, wherein the above-mentioned deconvolution network may include: first deconvolution layer 701, second deconvolution layer 702, third deconvolution layer 703, first deconvolution layer 704, fourth deconvolution layer 705, fifth deconvolution layer 706, second deconvolution layer 707, sixth deconvolution layer 708, and seventh deconvolution layer 709. Here, the convolution kernel size of the first deconvolution layer 701 may be 3 × 3. The convolution kernel size of the second deconvolution layer 702 may be 3 × 3. The convolution kernel size of the third deconvolution layer 703 may be 3 × 3. The convolution kernel size of the fourth deconvolution layer 705 may be 3 × 3. The convolution kernel size of the fifth deconvolution layer 706 may be 3 × 3. The convolution kernel size of sixth deconvolution layer 708 and seventh deconvolution layer 709 may be 3 × 3. The number of convolution kernels of the first deconvolution layer 701 may be 256, and the number of convolution kernels of the second deconvolution layer 702 may be 256. The number of convolution kernels of the third deconvolution layer 703 may be 256. The number of convolution kernels of the fourth deconvolution layer 705 may be 128. The number of convolution kernels of the fifth deconvolution layer 706 may be 128. The number of convolution kernels of the sixth deconvolution layer 708 may be 64. The number of convolution kernels of the seventh deconvolution layer 709 may be 64.
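A minimal sketch of a deconvolution network with the kernel counts listed above (256, 256, 256, 128, 128, 64, 64) is given below, assuming a PyTorch implementation; layers 704 and 707, for which no kernel parameters are stated, are treated here as 2 × up-sampling stages mirroring the two pooling layers of fig. 4, which is an assumption rather than something the patent specifies.

```python
import torch
import torch.nn as nn

deconv_net = nn.Sequential(
    nn.ConvTranspose2d(256, 256, kernel_size=3, padding=1),  # first deconvolution layer 701
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(256, 256, kernel_size=3, padding=1),  # second deconvolution layer 702
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(256, 256, kernel_size=3, padding=1),  # third deconvolution layer 703
    nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2),                              # layer 704 (assumed up-sampling stage)
    nn.ConvTranspose2d(256, 128, kernel_size=3, padding=1),  # fourth deconvolution layer 705
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 128, kernel_size=3, padding=1),  # fifth deconvolution layer 706
    nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2),                              # layer 707 (assumed up-sampling stage)
    nn.ConvTranspose2d(128, 64, kernel_size=3, padding=1),   # sixth deconvolution layer 708
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 64, kernel_size=3, padding=1),    # seventh deconvolution layer 709
)

# A 25x25 feature map (e.g. from the fig. 4 network applied to a 100x100 face)
# is enlarged back to 100x100; a final layer mapping the 64 channels to image
# channels would typically follow, but is not specified in the patent.
reconstructed = deconv_net(torch.randn(1, 256, 25, 25))
print(reconstructed.shape)  # torch.Size([1, 64, 100, 100])
```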
The above embodiments of the present disclosure have the following beneficial effects: with the method for improving face recognition accuracy of some embodiments of the present disclosure, the resolution of the face image is improved, thereby ensuring accurate identification of the consignee. Specifically, the reason why the consignee is not identified accurately is that the resolution of the captured face image is low due to environmental factors such as weather. For example, on a rainy day, the captured face image may have a low resolution because of insufficient light. For another example, at night, or indoors where the light available to the unmanned cargo vehicle is insufficient, the captured face image may also have a low resolution. Based on this, the method for improving face recognition accuracy of some embodiments of the present disclosure first acquires a target face image. Secondly, feature extraction is performed on the target face image through a pre-trained feature extraction network to generate a feature map. The feature extraction network can represent the mapping relation between the target face image and the feature map in terms of resolution, so that the face feature information contained in the feature map has a mapping relation, in terms of resolution, with the face feature information in the target face image. Finally, feature amplification processing is performed on the feature map through a deconvolution network to generate a reconstructed face image. A low-resolution image can thereby be mapped to a high-resolution image, while the size of the reconstructed face image remains consistent with that of the target face image. In this way, the resolution of the face image is improved, the accuracy of face recognition is improved, and accurate identification of the consignee is ensured.
With further reference to fig. 8, a flow 800 of further embodiments of methods for improving face recognition accuracy is illustrated. The process 800 of the method for improving the face recognition accuracy comprises the following steps:
step 801, acquiring a target face image.
In some embodiments, the specific implementation of step 801 and the technical effect thereof may refer to step 201 in those embodiments corresponding to fig. 2, and are not described herein again.
Step 802, inputting the target face image into the first convolution layer to generate a first sub-feature map.
In some embodiments, the execution subject may input the target face image into the first convolutional layer to generate the first sub-feature map, thereby performing a first round of feature extraction on the target face image.
As an example, the convolution kernel size of the first convolution layer may be 3 × 3. The number of convolution kernels of the first convolution layer may be 16.
Step 803, input the first sub-feature map into the second convolution layer to generate a second sub-feature map.
In some embodiments, the execution subject may input the first sub-feature map into the second convolutional layer to generate the second sub-feature map, so as to further extract features of the target face image.
As an example, the convolution kernel size of the second convolution layer may be 3 × 3. The number of convolution kernels of the second convolution layer may be 32.
Step 804, inputting the second sub-feature map into the third convolution layer to generate a third sub-feature map.
In some embodiments, the execution subject may input the second sub-feature map into the third convolutional layer to generate the third sub-feature map, so as to further extract features of the target face image.
As an example, the convolution kernel size of the third convolution layer may be 3 × 3. The number of convolution kernels of the third convolution layer may be 64.
Step 805, the third sub-feature map is input into the fourth convolution layer to generate a fourth sub-feature map.
In some embodiments, the execution subject may input the third sub-feature map into the fourth convolutional layer to generate the fourth sub-feature map, so as to further extract features of the target face image.
As an example, the convolution kernel size of the fourth convolution layer may be 3 × 3. The number of convolution kernels of the fourth convolution layer may be 128.
Step 806, input the fourth sub-feature map into the fifth convolutional layer to generate a fifth sub-feature map.
In some embodiments, the execution subject may input the fourth sub-feature map into the fifth convolutional layer to generate the fifth sub-feature map, so as to further extract features of the target face image.
As an example, the convolution kernel size of the fifth convolutional layer may be 3 × 3. The number of convolution kernels of the fifth convolutional layer may be 128.
Step 807, input the fifth sub-feature map into the sixth convolution layer to generate the feature map.
In some embodiments, the execution subject may input the fifth sub-feature map into the sixth convolutional layer to generate the feature map, so as to further extract features of the target face image.
As an example, the convolution kernel size of the sixth convolution layer may be 3 × 3. The number of convolution kernels of the sixth convolution layer may be 256.
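Putting steps 802 to 807 together, a minimal sketch of this six-layer feature extraction network is given below, assuming a PyTorch implementation; the three-channel input, ReLU activations and unit padding are assumptions, while the 3 × 3 kernels and the kernel counts (16, 32, 64, 128, 128, 256) are those stated in the steps above.

```python
import torch
import torch.nn as nn

feature_extraction_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),    # first convolutional layer (step 802)
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # second convolutional layer (step 803)
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),   # third convolutional layer (step 804)
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # fourth convolutional layer (step 805)
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), # fifth convolutional layer (step 806)
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 256, kernel_size=3, padding=1), # sixth convolutional layer (step 807)
)

feature_map = feature_extraction_net(torch.randn(1, 3, 50, 50))  # e.g. a 50x50 low-resolution face
print(feature_map.shape)  # torch.Size([1, 256, 50, 50])
```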
And 808, performing deconvolution processing on the feature map through a target convolution kernel to generate a reconstructed face image.
In some embodiments, the execution subject may perform deconvolution processing on the feature map through a target convolution kernel to generate a reconstructed face image.
As an example, as shown in fig. 9, the size of the target convolution kernel 901 may be 4 × 4. The execution subject may perform deconvolution processing on the feature map 104 with a step size of 2 to generate the reconstructed face image 106.
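Continuing the sketch above, step 808 can be illustrated as a single transposed convolution with the 4 × 4 target convolution kernel and a step size of 2, which doubles the spatial resolution of the feature map; the padding and the three output channels are assumptions, not values stated in the patent.

```python
import torch
import torch.nn as nn

# Transposed convolution with a 4x4 kernel and stride 2; padding=1 makes the
# output exactly twice the spatial size of the input feature map.
deconv = nn.ConvTranspose2d(in_channels=256, out_channels=3,
                            kernel_size=4, stride=2, padding=1)

feature_map = torch.randn(1, 256, 50, 50)   # feature map from the six convolutional layers
reconstructed_face_image = deconv(feature_map)
print(reconstructed_face_image.shape)       # torch.Size([1, 3, 100, 100])
```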
As can be seen from fig. 8, compared with the embodiments corresponding to fig. 2, the flow 800 of the method for improving face recognition accuracy takes into account that the feature extraction network is only used for extracting features from the target face image, and therefore simplifies the model structure by optimizing the network structure of the feature extraction network. In this way, the resolution of the generated reconstructed face image can be improved while keeping the model structure simple.
With further reference to fig. 10, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an apparatus for improving face recognition accuracy, which correspond to those shown in fig. 2, and which may be applied in various electronic devices.
As shown in fig. 10, an apparatus 1000 for improving face recognition accuracy of some embodiments includes: an acquisition unit 1001 configured to acquire a target face image; a feature extraction unit 1002, configured to perform feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, where the feature extraction network is used to represent a mapping relationship between the target face image and the feature map in terms of resolution; and a feature amplification processing unit 1003 configured to perform feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
In some optional implementations of some embodiments, the feature extraction network includes: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer; and the above-mentioned feature extraction unit 1002 is further configured to: inputting the target face image into the first convolution layer to generate a first sub-feature map, wherein the first convolution layer is used for extracting feature vectors corresponding to sub-images of each block in the target face image block by block and generating the first sub-feature map according to at least one obtained feature vector; inputting the first sub-feature map into the second convolution layer to generate a second sub-feature map; inputting the second sub-feature map into the third convolutional layer to generate a third sub-feature map; inputting the third sub-feature map into the fourth convolutional layer to generate a fourth sub-feature map; inputting the fourth sub-feature map into the fifth convolutional layer to generate a fifth sub-feature map; inputting the fifth sub-feature map into the sixth convolutional layer to generate the feature map.
In some optional implementations of some embodiments, the second convolutional layer generates the second sub-feature map by performing a high-dimensional mapping process on the first sub-feature map, the third convolutional layer generates the third sub-feature map by performing a high-dimensional mapping process on the second sub-feature map, the fourth convolutional layer generates the fourth sub-feature map by performing a high-dimensional mapping process on the third sub-feature map, and the fifth convolutional layer generates the fifth sub-feature map by performing a high-dimensional mapping process on the fourth sub-feature map.
In some optional implementations of some embodiments, the feature amplification processing unit 1003 is further configured to: and carrying out deconvolution processing on the characteristic graph through a target convolution kernel to generate the reconstructed face image.
In some optional implementations of some embodiments, the feature extraction network is generated by: generating a candidate feature extraction network based on a training data set, wherein the training data in the training data set are face images with target resolution; testing the candidate feature extraction network through a test sample set to generate a test result; and in response to determining that the test result meets a preset condition, determining the candidate feature extraction network as the feature extraction network.
In some optional implementations of some embodiments, the set of test samples is obtained by: randomly intercepting each sample face image in the sample face image set to generate a candidate face image to obtain a candidate face image set; and performing down-sampling processing on the candidate face images in the candidate face image set to generate a test sample, so as to obtain the test sample set.
It will be understood that the units described in the apparatus 1000 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 1000 and the units included therein, and are not described herein again.
Referring now to FIG. 11, shown is a schematic block diagram of an electronic device 1100 (such as the computing device 101 shown in FIG. 1) suitable for use in implementing some embodiments of the present disclosure. The electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1101 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage means 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are also stored. The processing device 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Generally, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 1107 including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices 1108, including, for example, magnetic tape, hard disk, etc.; and a communication device 1109. The communication means 1109 may allow the electronic device 1100 to communicate wirelessly or wiredly with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 11 may represent one device or may represent a plurality of devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via the communication device 1109, or installed from the storage device 1108, or installed from the ROM 1102. The computer program, when executed by the processing apparatus 1101, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target face image; extracting the features of the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map on the resolution; and carrying out feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a feature extraction unit, and a feature amplification processing unit. The names of these units do not in some cases constitute a limitation on the unit itself; for example, the feature extraction unit may also be described as a "unit that generates a feature map by inputting a target face image into a feature extraction network trained in advance".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method for improving face recognition accuracy, comprising:
acquiring a target face image;
performing feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map on the resolution;
and carrying out feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
2. The method of claim 1, wherein the feature extraction network comprises: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer; and
the feature extraction of the target face image through a pre-trained feature extraction network to generate a feature map comprises the following steps:
inputting the target face image into the first convolution layer to generate a first sub-feature map, wherein the first convolution layer is used for extracting feature vectors corresponding to sub-images of each block in the target face image block by block and generating the first sub-feature map according to at least one obtained feature vector;
inputting the first sub-feature map into the second convolution layer to generate a second sub-feature map;
inputting the second sub-feature map into the third convolutional layer to generate a third sub-feature map;
inputting the third sub-feature map into the fourth convolutional layer to generate a fourth sub-feature map;
inputting the fourth sub-feature map into the fifth convolutional layer to generate a fifth sub-feature map;
inputting the fifth sub-feature map into the sixth convolution layer to generate the feature map.
3. The method according to claim 2, wherein the second convolutional layer generates the second sub-feature map by performing a high-dimensional mapping process on the first sub-feature map, the third convolutional layer generates the third sub-feature map by performing a high-dimensional mapping process on the second sub-feature map, the fourth convolutional layer generates the fourth sub-feature map by performing a high-dimensional mapping process on the third sub-feature map, and the fifth convolutional layer generates the fifth sub-feature map by performing a high-dimensional mapping process on the fourth sub-feature map.
4. The method of claim 1, wherein the feature enlarging the feature map through a deconvolution network to generate a reconstructed face image, comprises:
and carrying out deconvolution processing on the feature map through a target convolution kernel to generate the reconstructed face image.
5. The method of claim 1, wherein the feature extraction network is generated by:
generating a candidate feature extraction network based on a training data set, wherein the training data in the training data set are face images with target resolution;
testing the candidate feature extraction network through a test sample set to generate a test result;
and in response to determining that the test result meets a preset condition, determining the candidate feature extraction network as the feature extraction network.
6. The method of claim 5, wherein the set of test samples is obtained by:
randomly intercepting each sample face image in the sample face image set to generate a candidate face image to obtain a candidate face image set;
and performing down-sampling processing on the candidate face images in the candidate face image set to generate a test sample, so as to obtain the test sample set.
7. An apparatus for improving face recognition accuracy, comprising:
an acquisition unit configured to acquire a target face image;
a feature extraction unit configured to perform feature extraction on the target face image through a pre-trained feature extraction network to generate a feature map, wherein the feature extraction network is used for representing the mapping relation between the target face image and the feature map in terms of resolution;
and a feature amplification processing unit configured to perform feature amplification processing on the feature map through a deconvolution network to generate a reconstructed face image.
8. The apparatus of claim 7, wherein the feature amplification processing unit is further configured to:
and carrying out deconvolution processing on the feature map through a target convolution kernel to generate the reconstructed face image.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202110346376.8A 2021-03-31 2021-03-31 Method, apparatus, electronic device, and medium for improving face recognition accuracy Pending CN113011366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110346376.8A CN113011366A (en) 2021-03-31 2021-03-31 Method, apparatus, electronic device, and medium for improving face recognition accuracy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110346376.8A CN113011366A (en) 2021-03-31 2021-03-31 Method, apparatus, electronic device, and medium for improving face recognition accuracy

Publications (1)

Publication Number Publication Date
CN113011366A true CN113011366A (en) 2021-06-22

Family

ID=76409584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110346376.8A Pending CN113011366A (en) 2021-03-31 2021-03-31 Method, apparatus, electronic device, and medium for improving face recognition accuracy

Country Status (1)

Country Link
CN (1) CN113011366A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204449A (en) * 2016-07-06 2016-12-07 安徽工业大学 A kind of single image super resolution ratio reconstruction method based on symmetrical degree of depth network
CN106952229A (en) * 2017-03-15 2017-07-14 桂林电子科技大学 Image super-resolution rebuilding method based on the enhanced modified convolutional network of data
CN108765511A (en) * 2018-05-30 2018-11-06 重庆大学 Ultrasonoscopy super resolution ratio reconstruction method based on deep learning
CN109190520A (en) * 2018-08-16 2019-01-11 广州视源电子科技股份有限公司 A kind of super-resolution rebuilding facial image method and device
CN110647820A (en) * 2019-08-28 2020-01-03 电子科技大学 Low-resolution face recognition method based on feature space super-resolution mapping
CN111127317A (en) * 2019-12-02 2020-05-08 深圳供电局有限公司 Image super-resolution reconstruction method and device, storage medium and computer equipment
CN112529825A (en) * 2020-12-11 2021-03-19 平安科技(深圳)有限公司 Face image resolution reconstruction method, device and equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
苑希民 et al.: "Application of neural networks and genetic algorithms in the field of water science", pages 19-20 *


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination