CN110956056A

CN110956056A - Face living body detection method and system

Info

Publication number: CN110956056A
Application number: CN201811126768.8A
Authority: CN
Inventors: 李海青; 侯广琦
Original assignee: Beijing Zhongke Hongxing Technology Co Ltd
Current assignee: Beijing Zhongke Hongxing Technology Co Ltd
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2020-04-03

Abstract

The invention relates to a human face living body detection method and a system, comprising the following steps: acquiring a picture to be detected; extracting the characteristics of the picture to be detected by utilizing a pre-constructed convolutional neural network model; converting the features into feature vectors using a full-connectivity layer; and determining the category label of the picture to be detected according to the feature vector. According to the technical scheme provided by the invention, a new loss function is introduced, and the advantages of the two-classification loss function and the multi-classification loss function are integrated, so that the living body and the prosthesis picture can be better distinguished, and a better living body detection effect is achieved.

Description

Face living body detection method and system

Technical Field

The invention belongs to the field of face recognition, and particularly relates to a face in-vivo detection method and system.

Background

The face living body detection is to distinguish a real face picture and a false face picture by using a device or a model, wherein the false face picture can comprise a printed face picture, a face mask picture and the like. In-vivo detection of human faces is very challenging due to the diversity and high simulation of the images of the prosthesis and the difference between human faces.

The existing in vivo detection technology based on deep learning only uses the traditional deep convolutional network, utilizes softmax to calculate a loss function, and updates network parameters, but because the prosthesis is various and has larger characteristic difference, the classification interface is difficult to learn by directly utilizing the traditional softmax to calculate the two-classification loss function, so that the original two-classification problem is changed into a multi-classification problem, the essence of the in vivo detection problem is changed, and the human face in vivo detection effect is poor.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a human face in-vivo detection method and a human face in-vivo detection system.

A face in-vivo detection method comprises the following steps:

acquiring a picture to be detected;

extracting the characteristics of the picture to be detected by utilizing a pre-constructed convolutional neural network model;

converting the features into feature vectors using a full-connectivity layer;

and determining the category label of the picture to be detected according to the feature vector.

Further, the building of the convolutional neural network model comprises:

acquiring a plurality of picture samples, wherein each sample uniquely corresponds to one type label, the plurality of picture samples comprise a plurality of type labels, and the plurality of type labels at least comprise more than two prosthesis type labels;

extracting features of the picture sample by using a convolutional neural network based on deep learning;

converting the features into feature vectors by utilizing a full connection layer based on the number of all prosthesis type labels corresponding to the multiple image samples;

training the convolutional neural network according to a loss function to obtain optimized network parameters;

and constructing a convolutional neural network model according to the optimized network parameters.

Further, the converting the features into feature vectors using a full-connected layer includes:

converting the features into a high-dimensional vector through a full connection layer;

multiplying the high-dimensional vector by an N + 1-dimensional parameter matrix and a 2-dimensional parameter matrix respectively to obtain an N + 1-dimensional fractional eigenvector and a 2-dimensional fractional eigenvector;

wherein N is the number of prosthesis type tags.

Further, training the convolutional neural network according to a loss function to obtain optimized network parameters, including:

the loss was calculated as follows:

L＝L₁+λL₂

when y is_iWhen the number is equal to 1, the alloy is put into a container,

when y is_iWhen the number is not equal to 1, the content is determined,

wherein L is the overall loss; l is₁Loss of the first type; l is₂Loss of the second type; λ is a hyper-parameter; n is the number of pictures with labels; l is_1,iThe first type loss corresponding to the ith picture; l is_2,iThe second type loss corresponding to the ith picture; y is_iIs the type of the ith picture, y_iThe type of the picture is 1, namely, a face living body picture, y_iNot equal to 1 represents that the type of the picture is other types of pictures; s is a parameter; theta_j,iIs composed of

And x_iThe included angle therebetween;

is an N +1 dimensional parameter matrix W¹Column j of (1); x is the number of_iMulti-dimensional characteristic vector output for ith picture full connection layer α_j,iIs composed of

And x_iThe included angle therebetween;

is a 2-dimensional parameter matrix W²Column j of (1); m is a hyper-parameter threshold; w¹And W²Are respectively x_iAnd a parameter matrix between the N + 1-dimensional fractional eigenvector and the 2-dimensional fractional eigenvector.

Further, determining the category label of the picture to be detected according to the feature vector includes:

when the first feature vector of the N + 1-dimensional fractional feature vector is larger than the sum of other feature vectors and the hyper-parameter threshold, and the first feature vector of the 2-dimensional fractional feature vector is larger than the sum of the second feature vector and the hyper-parameter threshold, the type label of the picture to be detected is a living body type label;

otherwise, the type label of the picture to be detected is a prosthesis type label.

A face liveness detection system, comprising:

the acquisition module is used for acquiring a picture to be detected;

the extraction module is used for extracting the characteristics of the picture to be detected by utilizing a pre-constructed convolutional neural network model;

a conversion module for converting the features into feature vectors using a full connection layer;

and the determining module is used for determining the category label of the picture to be detected according to the feature vector.

Further, the extraction module comprises:

the obtaining submodule is used for obtaining a plurality of picture samples, each sample uniquely corresponds to one kind label, the plurality of picture samples comprise a plurality of kinds labels, and the plurality of kinds labels at least comprise more than two prosthesis kind labels;

the extraction submodule is used for extracting the characteristics of the picture sample by utilizing a convolutional neural network based on deep learning;

the conversion submodule is used for converting the features into feature vectors by utilizing a full connection layer based on the number of all the prosthesis type labels corresponding to the multiple image samples;

the parameter optimization submodule is used for training the convolutional neural network according to a loss function to obtain optimized network parameters;

and the model construction submodule is used for constructing a convolutional neural network model according to the optimized network parameters.

Further, the conversion sub-module is configured to,

wherein N is the number of prosthesis type tags.

Further, the optimization parameter sub-module is configured to,

the loss was calculated as follows:

L＝L₁+λL₂

when y is_iWhen the number is equal to 1, the alloy is put into a container,

when y is_iWhen the number is not equal to 1, the content is determined,

And x_iThe included angle therebetween;

And x_iThe included angle therebetween;

Further, the determining module is configured to,

The technical scheme provided by the invention has the following beneficial effects:

the technical scheme provided by the invention comprises the steps of firstly extracting the characteristics of the picture to be detected by using a convolutional neural network based on deep learning, converting the characteristics into characteristic vectors by using a full connection layer, then calculating the loss vectors by using a loss function, and finally determining the category label corresponding to the calculated loss amount, namely the category label of the picture to be detected according to the corresponding relation between the preset loss amount and the category label. The new loss function is introduced, and the advantages of the two-classification loss function and the multi-classification loss function are integrated, so that the living body and the prosthesis picture can be better distinguished, and a better living body detection effect is achieved.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

fig. 2 is a schematic diagram of a model building process provided in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

As shown in fig. 1, an embodiment of the present invention provides a face live detection method, including:

s1, acquiring a picture to be detected;

s2, extracting the characteristics of the picture to be detected by utilizing a pre-constructed convolutional neural network model;

s3, converting the features into feature vectors by utilizing a full connection layer;

and S4, determining the category label of the picture to be detected according to the feature vector.

In the embodiment of the application, the characteristics of the picture to be detected are firstly extracted by using a pre-constructed convolutional neural network model, the characteristics are converted into characteristic vectors by using a full connection layer, and finally the type label of the picture to be detected is determined according to the characteristic vectors. By introducing a new loss function and integrating the advantages of the two-classification loss function and the multi-classification loss function, the living body and the prosthesis picture can be better distinguished, and a better living body detection effect is achieved.

In some embodiments of the present application, as shown in fig. 2, the construction of the convolutional neural network model comprises:

s21, obtaining a plurality of picture samples, wherein each sample uniquely corresponds to one kind label, the plurality of picture samples comprise a plurality of kinds labels, and the plurality of kinds labels at least comprise more than two prosthesis kind labels;

the input pictures are divided into two categories, one is a real face picture, and the other is a false body picture, wherein the false body can be various, such as a printed photo, a plastic face mask and the like, and N kinds of pictures of the false body are assumed.

S22, extracting the characteristics of the picture sample by using a convolutional neural network based on deep learning;

and (3) extracting the features of the picture by using Resnet-101 without the last softmax operation as a network, namely, taking the picture matrix as network input, performing operations such as convolution, pooling and the like layer by using a convolutional neural network, and expanding the picture matrix into a 1000-dimensional vector by the last layer of the network, wherein the vector is the features of the input picture matrix extracted by the network.

S23, converting the features into feature vectors by utilizing a full connection layer based on the number of all prosthesis type labels corresponding to the multiple picture samples;

in the convolutional layer, the final output is a 1000-dimensional high-dimensional vector, and the high-dimensional vector is multiplied by an N + 1-dimensional parameter matrix and a 2-dimensional parameter matrix respectively to obtain an N + 1-dimensional fractional eigenvector and a 2-dimensional fractional eigenvector.

S24, training the convolutional neural network according to a loss function to obtain optimized network parameters;

the loss was calculated as follows:

L＝L₁+λL₂

when y is_iWhen the number is equal to 1, the alloy is put into a container,

when y is_iWhen the number is not equal to 1, the content is determined,

And x_iAngle between them；

And x_iThe included angle therebetween;

is a 2-dimensional parameter matrix W²Column j of (1); m is a hyper-parameter threshold; w¹And W²Are respectively x_iA parameter matrix between the N + 1-dimensional fractional eigenvector and the 2-dimensional fractional eigenvector; s and m are both to enlarge the difference between the picture of the real face and the picture of the prosthesis.

And S25, constructing a convolutional neural network model according to the optimized network parameters.

And (5) bringing the optimized network parameters back to the convolutional neural network to obtain a convolutional neural network model.

That is to say, in some embodiments of the present application, a plurality of labeled pictures are used for sexual training, wherein each sample uniquely corresponds to one type label, a convolutional neural network based on deep learning is used to extract features of the picture samples, the features are converted into feature vectors through a full-link layer, the convolutional neural network is trained according to a loss function to obtain optimized network parameters, and finally, the optimized network parameters are brought back to the convolutional neural network to obtain an optimized convolutional neural network model.

In some embodiments of the present application, the determining, by the feature vector, the category label of the picture to be detected includes:

For example, for a piece of test picture data X, it is assumed that the (N +1) score vector and the 2-dimensional score vector, which are output after the features obtained by the convolutional layer pass through the full-link layer, are X₁And X₂Then if for j 2,3, …, N +1 there is X_1，1＞X_1,j+ m, and X_2，1＞X_2,2+ m, where m is the hyper-parameter threshold, then the input test picture is considered to be a live face picture; and in other cases, the input test picture is considered to be a prosthesis picture. Where m is a threshold parameter set by the designer, which determines the inter-class difference between the living body class and the prosthesis class, the larger m is, the more strict the requirements for classification of living body pictures are, and the specific value of m needs to be determined by experiments and the characteristics of specific problems.

Based on the same inventive concept, the invention also provides a face living body detection system, which comprises:

the acquisition module is used for acquiring a picture to be detected;

Preferably, the extraction module comprises:

Preferably, the conversion sub-module is configured to,

wherein N is the number of prosthesis type tags.

Preferably, the optimization parameter submodule is configured to,

the loss was calculated as follows:

L＝L₁+λL₂

when y is_iWhen the number is equal to 1, the alloy is put into a container,

when y is_iWhen the number is not equal to 1, the content is determined,

And x_iThe included angle therebetween;

And x_iThe included angle therebetween;

Preferably, the determining module is configured to,

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or make a contribution to the prior art, or may be implemented in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A face living body detection method is characterized by comprising the following steps:

acquiring a picture to be detected;

converting the features into feature vectors using a full-connectivity layer;

2. The method for detecting the living human face according to claim 1, wherein the constructing of the convolutional neural network model comprises:

3. The method for detecting the living human face according to claim 1 or 2, wherein the converting the features into feature vectors by using a full connection layer comprises:

wherein N is the number of prosthesis type tags.

4. The method of claim 3, wherein the training of the convolutional neural network according to the loss function to obtain the optimized network parameters comprises:

the loss was calculated as follows:

L＝L₁+λL₂