CN114612972A - Face recognition method and system of light field camera - Google Patents

Face recognition method and system of light field camera

Info

Publication number
CN114612972A
Authority
CN
China
Prior art keywords
image
light field
network
field camera
face recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210215530.2A
Other languages
Chinese (zh)
Inventor
温建伟 (Wen Jianwei)
The other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhuohe Technology Co Ltd
Original Assignee
Beijing Zhuohe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuohe Technology Co Ltd filed Critical Beijing Zhuohe Technology Co Ltd
Priority to CN202210215530.2A priority Critical patent/CN114612972A/en
Publication of CN114612972A publication Critical patent/CN114612972A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

This document provides a face recognition method and system for a light field camera. First, an original face image is captured with a light field camera to obtain a light field image; a multi-view array is created from the light field image; an SA image sequence is selected from the multi-view array based on a preset topological graph; the SA image sequence is input into a first depth network to generate spatial feature information of the SA images; the spatial feature information of the SA images is input into a second depth network to generate an image output set; and the image output set is provided to a softmax classifier for classification, which outputs a target image. A dual-depth spatial-angular learning model is provided, comprising the deep convolutional neural network VGG-16 and a long short-term memory network LSTM, which considers both intra-view and inter-view angle information of the light field image. The model improves face recognition accuracy even in extreme scenarios such as occlusion and aging.

Description

Face recognition method and system of light field camera
Technical Field
This document relates to the field of image recognition, and more particularly, to a method, system, medium, and apparatus for face recognition for a light field camera.
Background
In recent years, with the development of deep learning solutions and the improvement of computing power, various visual recognition tasks, including face recognition, have progressed rapidly. At present, deep neural networks dominate the field of face recognition. However, even with such complex networks, some conditions still prevent sufficiently accurate face recognition, especially in less constrained acquisition scenarios involving significant changes in lighting, occlusion, or aging.
The advent of new imaging sensors, such as depth, near infrared, thermal imaging, and lenslet light field cameras, opened up new areas for face recognition systems. Of course, richer scene representations captured by these emerging imaging sensors may help improve face recognition performance.
At present, practical deep-learning-based schemes perform recognition on static or simple scenes; no scheme has been provided for improving face recognition efficiency or accuracy in complex scenes.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a face recognition method, system, medium, and device for a light field camera, and proposes a dual-depth spatial-angular learning model comprising the deep convolutional neural network VGG-16 and a long short-term memory network LSTM, which considers both intra-view and inter-view angle information of the light field image. The model improves face recognition accuracy even in extreme scenarios such as occlusion and aging.
According to a first aspect herein, there is provided a face recognition method for a light field camera, comprising: capturing an original face image by using a light field camera to obtain a light field image; creating a multi-view array using the light field images, the multi-view array consisting of a set of sub-aperture SA images, each of the SA images forming a view; selecting a SA image sequence from the multi-view array based on a preset topological graph; inputting the SA image sequence into a first depth network to generate spatial feature information of the SA image, wherein the spatial feature information comprises angle information in a view and angle information between views; inputting the spatial feature information of the SA image into a second depth network to generate an image output set; and providing the image output set for a softmax classifier to classify, and outputting a target image.
Based on the foregoing solution, the creating a multi-view array using the light field image further includes preprocessing the multi-view array.
Based on the foregoing, the pre-processing includes cropping a face region image in an SA image in the multi-view array; the face region image is adjusted to 224 × 224 pixels.
Based on the scheme, the first deep network is a deep convolutional neural network VGG-16 network; the second deep network comprises a long short term memory network LSTM.
Based on the scheme, the preset topological graph is divided into a horizontal image topological graph, a vertical image topological graph and horizontal and vertical image topological graphs.
Based on the foregoing scheme, the providing the image output set to a softmax classifier for classification and outputting a target image includes: a final output target image is determined based on the highest value of the probabilities of the light-field images in the image output set.
According to another aspect herein, there is provided a face recognition system of a light field camera, comprising: an acquisition unit for capturing an original face image using a light field camera to obtain a light field image; a creation unit for creating a multi-view array using the light field image; a selecting unit for selecting an SA image sequence from the multi-view array based on a preset topological graph; a first generation unit for inputting the SA image sequence into a first depth network to generate spatial feature information of the SA images; a second generation unit for inputting the spatial feature information of the SA images into a second depth network to generate an image output set; and an output unit for providing the image output set to a softmax classifier for classification and outputting a target image.
According to another aspect herein, there is provided a computer readable storage medium having stored thereon a computer program which, when executed, implements the steps of a face recognition method for a light field camera.
According to another aspect herein, there is provided a computer device comprising a processor, a memory and a computer program stored on the memory, the processor when executing the computer program implementing the steps of the face recognition method of a light field camera.
This document provides a face recognition method and system for a light field camera. First, an original face image is captured with a light field camera to obtain a light field image; a multi-view array is created from the light field image; an SA image sequence is selected from the multi-view array based on a preset topological graph; the SA image sequence is input into a first depth network to generate spatial feature information of the SA images; the spatial feature information of the SA images is input into a second depth network to generate an image output set; and the image output set is provided to a softmax classifier for classification, which outputs a target image. The dual-depth spatial-angular learning model, comprising the deep convolutional neural network VGG-16 and a long short-term memory network LSTM, considers both intra-view and inter-view angle information of the light field image, and improves face recognition accuracy even in extreme scenarios such as occlusion and aging.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the disclosure. In the drawings:
fig. 1 is a flowchart illustrating a face recognition method of a light field camera according to an exemplary embodiment.
Fig. 2 is a block diagram illustrating a face recognition system for a light field camera according to an example embodiment.
Fig. 3 is a block diagram illustrating a face recognition device of a light field camera according to an example embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some but not all of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection. It should be noted that the embodiments and features of the embodiments in the present disclosure may be arbitrarily combined with each other without conflict.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified.
The invention provides a face recognition method and system for a light field camera, and proposes a dual-depth spatial-angular learning model comprising the deep convolutional neural network VGG-16 and a long short-term memory network LSTM, which considers both intra-view and inter-view angle information of the light field image. The model improves face recognition accuracy even in extreme scenarios such as occlusion and aging.
Fig. 1 is a flowchart illustrating a face recognition method of a light field camera according to an exemplary embodiment. Referring to fig. 1, the face recognition method at least includes:
step 101: and capturing an original face image by using a light field camera to obtain a light field image.
Specifically, the light field camera may be a commercial light field camera such as a Lytro or Raytrix. A conventional camera captures an image by integrating, at each sensor element, the intensity of light arriving from all directions; in a light field camera, by contrast, each pixel collects light from a single ray in a given angular direction, converging through a particular microlens in the array. The light field camera thus uses a microlens array in front of the image sensor to capture light information from different angles of incidence, forming a light field image.
Step 102: a multi-view array is created using the light field images.
In particular, the multi-view array is composed of a set of sub-aperture (SA) images, each SA image forming a view.
The created multi-view array is represented as L(u, v, x, y), where (u, v) is defined as the number of SA image units in the horizontal and vertical directions of the multi-view array, and (x, y) is defined as the spatial resolution of an SA image.
In one embodiment herein, the multi-view array includes 15 × 15 SA images, each having a spatial resolution of 625 × 434 pixels. Each SA image corresponds to a different viewpoint and thus "sees" the visual scene from a different angle.
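For illustration only (the array dimensions are taken from this embodiment, but the light field data itself is a synthetic placeholder), the 4-D parameterization L(u, v, x, y) can be sketched in Python with NumPy, with an SA view read out by fixing (u, v):

```python
import numpy as np

# Synthetic light field L(u, v, x, y): a 15 x 15 angular grid of views,
# each SA image having a spatial resolution of 625 x 434 pixels.
U, V, X, Y = 15, 15, 625, 434
lightfield = np.zeros((U, V, X, Y), dtype=np.float32)

def sa_image(lf, u, v):
    """Return the sub-aperture (SA) image for angular position (u, v)."""
    return lf[u, v]

view = sa_image(lightfield, 7, 7)  # central view of the 15 x 15 array
```

Each of the 225 such views is a candidate member of the SA image sequences selected in the next step.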
In order to meet the input requirements of the deep network, the multi-view array needs to be preprocessed. The preprocessing includes cropping the face region image in each SA image of the multi-view array and resizing the face region image to 224 × 224 pixels.
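A minimal sketch of this preprocessing follows. The crop coordinates are hypothetical (in practice they would come from a face detector, which the text does not specify), and nearest-neighbour resizing stands in for whatever resampling the implementation actually uses:

```python
import numpy as np

def crop_face(sa_image, top, left, height, width):
    """Crop a face region from one SA image (coordinates are assumed given)."""
    return sa_image[top:top + height, left:left + width]

def resize_nn(img, out_h=224, out_w=224):
    """Nearest-neighbour resize to the 224 x 224 input expected by VGG-16."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row index per output row
    cols = np.arange(out_w) * w // out_w   # source column index per output column
    return img[rows][:, cols]

sa = np.zeros((625, 434), dtype=np.float32)            # one synthetic SA image
face = crop_face(sa, top=100, left=80, height=300, width=260)
face_224 = resize_nn(face)                             # network-ready 224 x 224
```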
Step 103: a sequence of SA images is selected from the multi-view array based on a preset topology map.
Specifically, the preset topological graph may be a horizontal image topology, a vertical image topology, or a combined horizontal and vertical image topology. The topologies can also be classified as high-density, medium-density, or low-density image topologies. Herein, at least one topology may be selected as needed for subsequent deep network training.
It will be appreciated that the selected SA image sequence should be a subset of the multi-view array, which includes a plurality of SA image sequences, each of which may be selected by a different topology.
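The three basic topologies can be sketched as index sequences over the 15 × 15 view grid (a minimal illustration only; the choice of the central row/column and the handling of the density variants are assumptions, as the text does not fix them):

```python
# Index sequences over a 15 x 15 multi-view array; (7, 7) is the central view.
N, MID = 15, 7

def horizontal_sequence(row=MID):
    """All views along one row (horizontal image topology)."""
    return [(row, v) for v in range(N)]

def vertical_sequence(col=MID):
    """All views along one column (vertical image topology)."""
    return [(u, col) for u in range(N)]

def cross_sequence():
    """Union of the central row and column (horizontal + vertical topology),
    keeping order and dropping the duplicated central view."""
    seen = dict.fromkeys(horizontal_sequence() + vertical_sequence())
    return list(seen)
```

Each returned sequence is a subset of the multi-view array, as the text notes, and a different topology yields a different SA image sequence.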
Step 104: inputting the SA image sequence into a first depth network to generate spatial feature information of the SA image.
Specifically, the first depth network is a deep learning network and may be the deep convolutional neural network VGG-16. Herein, the VGG-16 network is pre-trained and receives no additional training or learning, so no training parameters need to be set. The SA image sequence passes through the trained VGG-16 network to generate spatial feature information of the SA images, comprising intra-view and inter-view angle information.
in one embodiment, the deep convolutional neural network VGG-16 network represents a deep convolutional neural network with superior discriminative power. The network model is divided into six parts, the first five parts represent convolution networks, and the last part is a fully-connected network. The input image is resized to 224 x 224 pixels. The size of the convolution kernel is set to 3 × 3 for all convolution layers. Information of the SA image is extracted through a full connection layer of the VGG-16 network, a feature vector with a fixed length is extracted, and 4096 elements are extracted; the fully connected layer contains 4096 elements, i.e., 4096 SA images.
Specifically, the first part consists of two conv3-64 layers and one max-pool layer; in this part, the image changes from 224 × 224 to 112 × 112 × 64, which serves as input to the second part. The second part, similar to the first, consists of two conv3-128 layers and one max-pool layer, and adjusts the size to 56 × 56 × 128. The third part consists of three conv3-256 layers and one max-pool layer, changing the size from 56 × 56 × 128 to 28 × 28 × 256. The fourth part, like the third, consists of three conv3-512 layers and one max-pool layer, changing the size from 28 × 28 × 256 to 14 × 14 × 512. The fifth part, also three conv3-512 layers and one max-pool layer, yields a 7 × 7 × 512 output, which is flattened into a one-dimensional vector of 25088 values and sent through two fully connected layers of 4096 neurons each and a dropout layer. The result is a 4096-element feature vector per SA image, containing intra-view and inter-view angle information.
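The stage-by-stage sizes above can be checked with a short arithmetic sketch. The stage list follows the standard VGG-16 configuration (an assumption consistent with this text): each 3 × 3 convolution with padding 1 preserves spatial size, and each 2 × 2 max-pool halves it.

```python
# VGG-16 convolutional stages: (number of 3x3 conv layers, output channels).
stages = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, shapes = 224, []
for n_convs, channels in stages:
    # 3x3 convs with padding 1 keep the spatial size; the 2x2 max-pool halves it.
    size //= 2
    shapes.append((size, size, channels))

# The fifth-stage output is flattened before the 4096-neuron FC layers.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]  # 7 * 7 * 512 = 25088
```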
Step 105: and inputting the spatial feature information of the SA image into a second depth network to generate an image output set.
Specifically, the second depth network may be a long short-term memory (LSTM) network. The SA image features generated by the VGG-16 network, comprising intra-view and inter-view angle information, are input into the LSTM network and trained to generate the SA image output set.
The long short-term memory (LSTM) network model mainly addresses the vanishing-gradient and exploding-gradient problems that arise when training on long sequences, so LSTM networks perform better on longer sequences. The SA image sequences herein achieve good results through LSTM network training.
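As a hedged, from-scratch sketch of the recurrence only (not the trained network of this invention: the weights below are random, the hidden size is arbitrary, and the i, f, o, g gate layout is one common convention), an LSTM step consuming a sequence of 4096-element VGG-16 feature vectors, one per SA view, might look like:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: z = W x + U h + b, split into gates i, f, o, g."""
    z = W @ x + U @ h + b
    H = h.size
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * H:(k + 1) * H])) for k in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g            # cell state: gated memory update
    h_new = o * np.tanh(c_new)       # hidden state: gated output
    return h_new, c_new

rng = np.random.default_rng(0)
D, H, T = 4096, 128, 15              # feature dim, hidden dim, sequence length
W = rng.normal(0, 0.01, (4 * H, D))  # input weights (random stand-ins)
U = rng.normal(0, 0.01, (4 * H, H))  # recurrent weights
b = np.zeros(4 * H)

h = c = np.zeros(H)
for t in range(T):                   # one VGG-16 feature vector per SA view
    x = rng.normal(size=D)
    h, c = lstm_step(x, h, c, W, U, b)
```

The final hidden state summarizes the whole SA sequence, which is what the softmax classifier then consumes.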
Step 106: and providing the image output set for a softmax classifier to classify, and outputting a target image.
Herein, the highest probability value generated by the Softmax classifier determines the final output target image.
It should be noted that the Softmax classifier is adopted in the present invention to solve the multi-classification problem, and is defined as follows:
P(y = j | θ) = exp(θ_j) / Σ_{k=1}^{K} exp(θ_k)
the output of the Softmax classifier P (-) is a normalized classification probability, so the value of P (-) is as high as 1.θ represents a plurality of inputs. And finally, when the output node is selected, selecting the node with the highest probability value as a prediction target, namely the output target image.
Fig. 2 illustrates a light field camera face recognition system 20 according to an exemplary embodiment. The system 20 comprises:
an acquisition unit 201, configured to capture an original face image using a light field camera to obtain a light field image;
a creating unit 202 for creating a multi-view array using the light field image;
a selecting unit 203 for selecting a SA image sequence from the multi-view array based on a preset topology map;
a first generating unit 204, configured to input the SA image sequence into a first depth network to generate spatial feature information of the SA image;
a second generating unit 205, configured to input the spatial feature information of the SA image into a second depth network to generate an image output set;
and the output unit 206 is used for providing the image output set for the softmax classifier to classify and outputting the target image.
It should be understood that this embodiment is a system example corresponding to the first embodiment, and may be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements not closely related to solving the technical problems proposed herein are not introduced in this embodiment, but this does not indicate that other elements are absent from this embodiment.
Fig. 3 is a block diagram illustrating a computer device 30 according to an example embodiment. Referring to fig. 3, the device 30 includes a processor 301; the number of processors may be set to one or more as necessary. The device 30 further comprises a memory 302 for storing instructions executable by the processor 301, such as an application program; the number of memories may likewise be one or more, and each may store one or more application programs. The processor 301 is configured to execute the instructions to perform the above-described method.
The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 may optionally include memory located remotely from processor 301, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
As will be appreciated by one of skill in the art, the embodiments herein may be provided as a method, apparatus (device), or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer, and the like. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments herein have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of this disclosure.
It will be apparent to those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope thereof. Thus, it is intended that such changes and modifications be included herein, provided they come within the scope of the appended claims and their equivalents.

Claims (10)

1. A face recognition method of a light field camera is characterized by comprising the following steps:
capturing an original face image by using a light field camera to obtain a light field image;
creating a multi-view array using the light field images, the multi-view array consisting of a set of sub-aperture SA images, each of the SA images forming a view;
selecting a SA image sequence from the multi-view array based on a preset topological graph;
inputting the SA image sequence into a first depth network to generate spatial feature information of the SA image, wherein the spatial feature information comprises angle information in a view and angle information between views;
inputting the spatial feature information of the SA image into a second depth network to generate an image output set;
and providing the image output set for a softmax classifier to classify, and outputting a target image.
2. The method of claim 1, wherein the creating a multi-view array using the light field image further comprises preprocessing the multi-view array.
3. A method for face recognition in a light field camera as claimed in claim 2 wherein the pre-processing comprises cropping a facial region image in the SA image in the multi-view array; the face region image is adjusted to 224 × 224 pixels.
4. The face recognition method of a light field camera as claimed in claim 1 wherein the first depth network is a deep convolutional neural network VGG-16 network; the second deep network comprises a long short term memory network LSTM.
5. The face recognition method of a light field camera according to claim 1, wherein the preset topological graph is divided into a horizontal image topological graph, a vertical image topological graph, and horizontal and vertical image topological graphs.
6. The face recognition method of a light field camera according to claim 1, wherein the providing the image output set to a softmax classifier for classification and outputting a target image comprises: a final output target image is determined based on the highest value of the probabilities of the light-field images in the image output set.
7. A face recognition system for a light field camera, comprising:
the system comprises an acquisition unit, a display unit and a processing unit, wherein the acquisition unit is used for capturing an original face image by using a light field camera to obtain a light field image;
a creation unit for creating a multi-view array using the light field image;
a selecting unit for selecting a SA image sequence from the multi-view array based on a preset topological graph;
a first generation unit, configured to input the SA image sequence to a first depth network to generate spatial feature information of the SA image;
the second generation unit is used for inputting the spatial feature information of the SA image into a second depth network to generate an image output set;
and the output unit is used for providing the image output set for the softmax classifier to classify and outputting the target image.
8. The face recognition system of a light field camera of claim 7, wherein the first depth network is a deep convolutional neural network VGG-16 network; the second deep network comprises a long short term memory network LSTM.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1-6.
10. A computer arrangement comprising a processor, a memory and a computer program stored on the memory, characterized in that the steps of the method according to any of claims 1-6 are implemented when the computer program is executed by the processor.
CN202210215530.2A 2022-03-07 2022-03-07 Face recognition method and system of light field camera Pending CN114612972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210215530.2A CN114612972A (en) 2022-03-07 2022-03-07 Face recognition method and system of light field camera


Publications (1)

Publication Number Publication Date
CN114612972A true CN114612972A (en) 2022-06-10

Family

ID=81861999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210215530.2A Pending CN114612972A (en) 2022-03-07 2022-03-07 Face recognition method and system of light field camera

Country Status (1)

Country Link
CN (1) CN114612972A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304823A (en) * 2018-02-24 2018-07-20 Chongqing University of Posts and Telecommunications Expression recognition method based on a dual-convolution CNN and a long short-term memory network
CN110866426A (en) * 2018-08-28 2020-03-06 Tianjin University of Technology Pedestrian identification method based on light field camera and deep learning
CN111028273A (en) * 2019-11-27 2020-04-17 Shandong University Light field depth estimation method based on multi-stream convolutional neural network and implementation system thereof
US20200151436A1 (en) * 2018-08-01 2020-05-14 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and Apparatus for Processing Data, and Computer Readable Storage Medium
CN111967288A (en) * 2019-05-20 2020-11-20 万维数码智能有限公司 Intelligent three-dimensional object identification and positioning system and method
CN113011281A (en) * 2021-02-26 2021-06-22 Huaqiao University Light field image quality identification method based on 3D-DOG features
KR102294806B1 (en) * 2021-03-31 2021-08-27 Inha University Industry-Academic Cooperation Foundation Method and Apparatus for Synthesis of Face Light Field On-Device
CN113723204A (en) * 2021-08-04 2021-11-30 安徽中科有智科技有限公司 Video sequence facial expression recognition algorithm based on bidirectional temporal convolutional network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
R. Raghavendra; Kiran B. Raja; Christoph Busch: "Presentation Attack Detection for Face Recognition Using Light Field Camera", IEEE Transactions on Image Processing *
Zhang Wenzhe: "Target recognition and detection based on light field image super-resolution reconstruction", China Master's Theses Full-text Database, Information Science and Technology *
Jia Qi: "Sub-aperture image extraction based on a light field camera and its application to face detection", China Master's Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
US20210183022A1 (en) Image inpainting method and apparatus, computer device, and storage medium
US10002313B2 (en) Deeply learned convolutional neural networks (CNNS) for object localization and classification
CN108710847B (en) Scene recognition method and device and electronic equipment
Chen et al. Surrounding vehicle detection using an FPGA panoramic camera and deep CNNs
TW202201944A (en) Maintaining fixed sizes for target objects in frames
CN111402130B (en) Data processing method and data processing device
CN110532871A (en) The method and apparatus of image procossing
CN111723611A (en) Pedestrian re-identification method and device and storage medium
US11908160B2 (en) Method and apparatus for context-embedding and region-based object detection
US10674139B2 (en) Methods and systems for human action recognition using 3D integral imaging
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN107566688B (en) Convolutional neural network-based video anti-shake method and device and image alignment device
US10878590B2 (en) Fusing disparity proposals in stereo matching
US8903139B2 (en) Method of reconstructing three-dimensional facial shape
CN111444744A (en) Living body detection method, living body detection device, and storage medium
CN110674759A (en) Monocular face in-vivo detection method, device and equipment based on depth map
EP2795575B1 (en) Integrated three-dimensional vision sensor
CN116310105B (en) Object three-dimensional reconstruction method, device, equipment and storage medium based on multiple views
Dong et al. Mobilexnet: An efficient convolutional neural network for monocular depth estimation
US20220215617A1 (en) Viewpoint image processing method and related device
Haggui et al. Human detection in moving fisheye camera using an improved YOLOv3 framework
CN112580434A (en) Face false detection optimization method and system based on depth camera and face detection equipment
Fan et al. Sir: Self-supervised image rectification via seeing the same scene from multiple different lenses
Hu et al. CNN-based deghosting in high dynamic range imaging
CN115601791B (en) Unsupervised pedestrian re-identification method based on multi-former and outlier sample re-distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220610