CN107992842B

CN107992842B - Living body detection method, computer device, and computer-readable storage medium

Info

Publication number: CN107992842B
Application number: CN201711330349.1A
Authority: CN
Inventors: 余梓彤; 严蕤; 牟永强
Original assignee: Shenzhen Lifei Technologies Co ltd
Current assignee: Shenzhen Li Fei Technology Co., Ltd.
Priority date: 2017-12-13
Filing date: 2017-12-13
Publication date: 2020-08-11
Anticipated expiration: 2037-12-13
Also published as: CN107992842A; WO2019114580A1

Abstract

The invention discloses a living body detection method, a computer device and a computer readable storage medium. In the invention, a multilayer perceptron model is determined by training the multilayer perceptron through a preset training set, continuous N frames of face images to be detected are obtained, intermediate frame face images of the continuous N frames of face images are converted into a second color space from a first color space, the texture characteristics of the converted intermediate frame face images and the dynamic mode characteristics of the continuous N frames of face images are extracted, the texture characteristics and the dynamic mode characteristics are fused to obtain fusion characteristics, the multilayer perceptron model is used for carrying out feature mapping on the fusion characteristics, the mapping characteristics are output and normalized, the prediction probability value of a living body label and the prediction probability value of a non-living body label are obtained, and then the continuous N frames of face images are determined to be living body or non-living body face images. The fusion features in the invention comprise texture features and dynamic mode features, so that the identification accuracy and safety of the living body detection can be improved.

Description

Living body detection method, computer device, and computer-readable storage medium

Technical Field

The invention belongs to the field of human face anti-counterfeiting, and particularly relates to a living body detection method, a computer device and a computer readable storage medium.

Background

In a face recognition or face anti-counterfeiting system, a living body detection technology is generally required to prevent lawbreakers from attacking with images or video information of others. The existing in-vivo detection technology is generally divided into an interactive method and a non-interactive method. The interactive living body detection technology requires the user to complete corresponding actions, such as blinking, shaking, smiling, etc., which results in poor user experience and unsatisfactory recognition effect. Non-interactive in-vivo detection techniques are generally classified into two types, detection based on color texture information and detection based on image motion information. The basic idea of the living body detection technology based on color texture information is to use face color texture information for classification and identification, but the method lacks face action information and is easy to be attacked by high-definition pictures or videos. In addition, the basic idea of the image motion information-based living body detection technology is to use micro motion information of a human face and simple human face texture information, but the method lacks deep extraction of human face distinguishing characteristics and is easy to be attacked by high-definition pictures or video information. Therefore, the existing living body detection system has low identification accuracy and poor safety.

Therefore, the existing living body detection system has the problems of low identification accuracy and poor safety.

Disclosure of Invention

The invention provides a living body detection method, a computer device and a computer readable storage medium, and aims to solve the problems of low identification accuracy and poor safety of the existing living body detection system.

The first aspect of the present invention provides a living body detection method, including:

training a multilayer perceptron by utilizing a preset training set, and determining a multilayer perceptron model;

acquiring face images of continuous N frames to be detected, wherein N is a positive integer greater than 3;

converting the face image of the intermediate frame in the face images of the continuous N frames from a first color space to a second color space, wherein when N is an odd number, the face image of the intermediate frame is the face image of the (N +1)/2 th frame, and when N is an even number, the face image of the intermediate frame is the face image of the N/2 th frame or the N/2+1 th frame;

extracting the texture features of the face image of the intermediate frame converted into the second color space;

extracting the dynamic mode characteristics of the continuous N frames of face images;

fusing the texture features and the dynamic mode features to obtain fused features;

inputting the fusion features into the multilayer perceptron model to obtain a prediction probability value of a living body label and a prediction probability value of a non-living body label;

when the prediction probability value of the living body label is larger than the prediction probability value of the non-living body label, determining the face image of the continuous N frames as a living body face image;

and when the prediction probability value of the non-living body label is larger than the prediction probability value of the non-living body label, determining that the face image of the continuous N frames is a non-living body face image.

In a preferred embodiment, the first color space is an RGB color space, the second color space is a Lab color space, and the extracting the texture feature of the face image converted into the intermediate frame of the second color space includes:

and extracting the local phase quantization texture characteristics of the preset neighborhood of the face image converted into the intermediate frame of the Lab color space.

In a preferred embodiment, the extracting the local phase quantization texture feature of the preset neighborhood of the face image converted into the intermediate frame of the Lab color space includes:

extracting the multilevel local phase quantization textural features of the preset neighborhood of the face image converted into the intermediate frame of the Lab color space;

the fusing the texture features and the dynamic mode features, and acquiring fused fusion features comprises:

and fusing the multilevel local phase quantization texture feature of the preset neighborhood with the dynamic mode feature to obtain a fused feature.

In a preferred embodiment, the extracting the dynamic mode feature of the consecutive N frames of face images includes:

and extracting the dynamic mode feature with the maximum energy in the dynamic mode features of the continuous N frames of face images.

In a preferred embodiment, the extracting of the most energetic dynamic mode feature of the dynamic mode features of the N consecutive frames of face images includes:

adopting (m x N) 1 column vectors to represent m x N gray value data contained in the face image, and acquiring a first data matrix consisting of N-1 column vectors corresponding to the front N-1 frames of face images and a second data matrix consisting of N-1 column vectors corresponding to the back N-1 frames of face images, wherein m and N are positive integers;

acquiring an adjoint matrix of a linear mapping matrix according to the first data matrix and the second data matrix, wherein the linear mapping matrix is a matrix obtained by multiplying the first data matrix and an inverse matrix of the second data matrix;

obtaining an eigenvector and an eigenvalue of the adjoint matrix through eigenvalue decomposition;

determining a feature vector corresponding to the feature value with the maximum absolute value in the feature values;

and multiplying the first data matrix by the eigenvector corresponding to the eigenvalue with the maximum absolute value, and taking an absolute value of a result of the multiplication to obtain the dynamic mode characteristic with the maximum energy in the dynamic mode characteristics of the face images of the continuous N frames.

In a preferred embodiment, the obtaining a adjoint matrix of a linear mapping matrix according to the first data matrix and the second data matrix includes:

performing triangular decomposition on the first data matrix, and respectively obtaining an upper triangular matrix and a lower triangular matrix of the first data matrix;

acquiring an inverse matrix of the upper triangular matrix and a pseudo-inverse matrix of the lower triangular matrix;

and multiplying the inverse matrix of the upper triangular matrix, the pseudo-inverse matrix of the lower triangular matrix and the second data matrix to obtain an adjoint matrix of the linear mapping matrix.

In a preferred embodiment, the multi-layered sensor includes at least a first fully-connected layer and a second fully-connected layer, the training of the multi-layered sensor using the preset training set includes:

randomly extracting a first sample and a second sample from a preset training set, wherein each sample in the preset training set comprises at least N continuous frames of face images;

respectively extracting the fusion characteristics of the first sample and the fusion characteristics of the second sample;

inputting the fusion characteristics of the first sample and the fusion characteristics of the second sample into the multilayer perceptron respectively, and acquiring the Softmax loss of the first sample and the Softmax loss of the second sample;

determining a loss of contrast for the first sample and the second sample;

determining a total loss from the Softmax loss of the first sample, the Softmax loss of the second sample, and the contrast loss;

when the total loss does not meet the preset condition of loss convergence, adjusting the parameters of a first full-connection layer and the parameters of a second full-connection layer in the multilayer sensor through a back propagation process by using a random gradient descent method;

repeating the above process until the total loss meets the preset condition of loss convergence;

and taking the parameters of the first full connection layer and the parameters of the second full connection layer in the last iteration process before the preset condition of loss convergence is met as the parameters of the first full connection layer and the parameters of the second full connection layer of the multilayer sensor model, and determining the multilayer sensor model.

In a preferred embodiment, the preset condition includes that the total loss is calculated by a number equal to a preset number threshold or the total loss is smaller than or equal to a preset loss threshold.

A second aspect of the present invention provides a living body detecting system including:

the training module is used for training the multilayer perceptron by utilizing a preset training set and determining a multilayer perceptron model;

the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring face images of continuous N frames to be detected, and N is a positive integer greater than 3;

the conversion module is used for converting the face image of the intermediate frame in the face images of the continuous N frames from a first color space to a second color space, wherein when N is an odd number, the face image of the intermediate frame is the face image of the (N +1)/2 th frame, and when N is an even number, the face image of the intermediate frame is the face image of the N/2 th frame or the N/2+1 th frame;

the texture feature extraction module is used for extracting the texture features of the face image of the intermediate frame converted into the second color space;

the dynamic mode feature extraction module is used for extracting dynamic mode features of the continuous N frames of face images;

the fusion module is used for fusing the texture feature and the dynamic mode feature to obtain fused features;

the probability acquisition module is used for inputting the fusion characteristics into the multilayer perceptron model to acquire a prediction probability value of a living body label and a prediction probability value of a non-living body label;

the determining module is used for determining the face images of the continuous N frames as the living face images when the prediction probability value of the living label is greater than the prediction probability value of the non-living label;

the determining module is further configured to determine that the face images of the consecutive N frames are non-living face images when the prediction probability value of the non-living label is smaller than the prediction probability value of the non-living label.

A third aspect of the invention provides a computer arrangement comprising a processor for implementing the method of living body detection of any of the above embodiments when executing a computer program stored in a memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the living body detecting method of any of the above-described embodiments.

In the invention, the fusion characteristics of the face images of the continuous N frames and a trained multilayer perceptron model are utilized to detect the face images of the continuous N frames, and further the face images of the continuous N frames are determined to be living body face images or non-living body face images.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flow chart of an implementation of a method for detecting a living body according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating the implementation of step S105 in the method for detecting a living body according to an embodiment of the present invention;

FIG. 3 is a flowchart of an implementation of step S101 in the method for detecting a living body according to an embodiment of the present invention;

FIG. 4 is a functional block diagram of a biopsy system provided in accordance with an embodiment of the present invention;

FIG. 5 is a block diagram illustrating the structure of the dynamic mode feature extraction module 105 in the in-vivo detection system according to the embodiment of the present invention;

FIG. 6 is a block diagram of a training module 101 in the in-vivo detection system according to the embodiment of the present invention;

fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 shows a flow of implementing the living body detecting method provided by the embodiment of the invention, and the sequence of steps in the flow chart can be changed and some steps can be omitted according to different requirements. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and detailed as follows:

as shown in fig. 1, the living body detecting method includes:

and S101, training the multilayer perceptron by using a preset training set, and determining a multilayer perceptron model.

The preset training set is a preset training set, and the training set comprises a large number of human face pictures used for training the multilayer perceptron model. The multi-layer perceptron is a feedforward artificial Neural network (FF-ANN) model that maps multiple input data sets onto a single output data set. In the embodiment of the invention, a plurality of face pictures contained in the preset training set are used for training the multilayer perceptron, and the trained multilayer perceptron model is determined, so that the multilayer perceptron model is used for detecting the face pictures to judge that the face pictures are living body face pictures or non-living body face pictures.

Step S102, acquiring continuous N frames of face images to be detected, wherein N is a positive integer greater than 3.

In order to detect whether the face image is a living body face image or a non-living body face image, it is first necessary to acquire N consecutive frames of face images by an image acquisition apparatus. For example, a camera of a mobile phone, an access control recognition system, or an image acquisition device of a face anti-counterfeiting system, such as a camera, may be used to acquire N consecutive frames of face images within a certain time; or capturing a scene image through a monocular camera, detecting a human face image in real time by using a human face detection algorithm, and intercepting continuous multiframe human face images. Wherein N is a positive integer greater than 3. For example, a camera of the face anti-counterfeiting system acquires face images of 60 continuous frames within 1-2 seconds, so as to detect whether the face images of the 60 continuous frames are live face images or non-live face images in the following process.

In a preferred embodiment, in order to further improve the identification accuracy and safety of the in-vivo detection, the in-vivo detection method further includes: and carrying out gray processing and/or normalization processing on the face images of the continuous N frames.

After the face images of the continuous N frames are acquired, the acquired face images can be preprocessed. For example, the acquired face image is subjected to a graying process or a normalization process. In addition, preprocessing such as smoothing, filtering, segmentation and the like can be performed on the acquired face image, and details are not repeated here. In addition, when the normalization processing is performed on the face images of the continuous N frames, the normalization processing can be performed on the face images of the continuous N frames according to face key point detection and face alignment.

Step S103, converting the face image of the intermediate frame in the face images of the continuous N frames from a first color space to a second color space, wherein when N is an odd number, the face image of the intermediate frame is the face image of the (N +1)/2 th frame, and when N is an even number, the face image of the intermediate frame is the face image of the N/2 th frame or the N/2+1 th frame.

After the face images of the continuous N frames are acquired, the face images of the intermediate frames are converted from a first color space to a second color space. For the determination of the face image of the intermediate frame, when N is an odd number, the face image of the intermediate frame is the face image of the (N +1)/2 th frame, and assuming that N is 61 as an example, the 31 st frame face image is the face image of the intermediate frame; when N is an even number, assuming that N is 60 as an example, the face image of the intermediate frame is the 30 th frame face image or the 31 st frame face image.

Since the RGB color space is the most commonly used color space and the Lab color space can better simulate human perception of color and highlight the strong selectivity of the opposite color space (i.e. the red color channel a and the blue-yellow color channel b) compared to the RGB color space, in a preferred embodiment, the first color space is the RGB color space and the second color space is the Lab color space. The RGB color space includes a red channel R, a green channel G, and a blue channel B. The Lab color space includes a luminance channel L representing the luminance of the pixel over a range of [0,100], and two opposite color channels, a green-red color channel a representing the range from red to green over a range of [127, -128], and a blue-yellow color channel b representing the range from yellow to blue over a range of [127, -128 ]. Specifically, the face image of the intermediate frame in the face images of the consecutive N frames may be converted from an RGB color space to an Lab color space according to the following transformation:

L＝0.2126*R+0.7152*G+0.0722*B；

a＝1.4749*(2.2213*R-0.339*G+0.1177*B)+128；

b＝0.6245*(0.1949*R+0.6057*G-0.8006*B)+128；

wherein L, a and B are the values of the Lab color space luminance channel, the green-red color channel, and the blue-yellow color channel, respectively, and R, G and B are the values of the RGB color space red channel, the green channel, and the blue channel, respectively.

In addition, the conversion of the face image of the intermediate frame in the face images of the consecutive N frames from the RGB color space to the Lab color space is not limited to the above conversion, and the face image may be converted from the RGB color space to the XYZ color space and then from the XYZ color space to the Lab color space, which is not described in detail herein. The XYZ color space is a new colorimetric system established by using three hypothetical primaries X, Y and Z through a large number of visual measurements and statistics of normal people based on the RGB color space by the international commission on illumination, and is not described in detail herein.

And step S104, extracting the texture features of the face image of the intermediate frame converted into the second color space.

The texture feature is a visual feature that reflects the homogeneity phenomenon in an image, and is expressed by the gray level distribution of pixels and their surrounding spatial neighborhoods, i.e., local texture information. Texture features describe the spatial color distribution and light intensity distribution of an image or a small region therein. And after the color space conversion is carried out on the face image of the intermediate frame, extracting the texture features of the face image of the intermediate frame of the second color space after the conversion, namely extracting the texture features of the face image of the intermediate frame of the Lab color space.

In a preferred embodiment, in order to improve the efficiency of the live body detection and further improve the identification accuracy and safety of the live body detection, the step S104 of extracting the texture feature of the face image converted into the intermediate frame of the second color space includes: and extracting the local phase quantization texture characteristics of the preset neighborhood of the face image converted into the intermediate frame of the Lab color space.

When extracting the texture features of the face image, Local Binary Pattern (LBP) texture features (LBP texture features) of the face image may be extracted in a spatial domain, or Local Phase Quantization (LPQ) texture features (LPQ texture features) of the face image may be extracted in a frequency domain. In order to improve the efficiency and accuracy of living body detection, in the embodiment of the invention, the LPQ texture features of the preset neighborhood on the frequency domain of the face image of the intermediate frame converted into the Lab color space are extracted. In other embodiments, the LBP texture feature of the face image converted into the intermediate frame of the Lab color space in the spatial domain may also be extracted, which is not described in detail herein.

The preset neighborhood is a preset neighborhood, and is not particularly limited herein. In a preferred embodiment, the predetermined neighborhood is a 3 × 3 neighborhood, or a 5 × 5 neighborhood, or a 7 × 7 neighborhood. In addition, in a preferred embodiment, in order to further improve the accuracy of the in-vivo detection, for the face image converted into the intermediate frame of the Lab color space, multi-level LPQ texture features of preset neighborhoods are extracted, for example, LPQ texture features of 3 × 3 neighborhoods, 5 × 5 neighborhoods, and 7 × 7 neighborhoods of the face image converted into the intermediate frame of the Lab color space are respectively extracted, and the extracted multi-level LPQ texture features are spliced and fused, in an embodiment of the present invention, the LPQ texture features are texture features expressed in a vector form, where the fusing of the multi-level LPQ texture features refers to: and splicing the vectors of the LPQ textural features of the 3 × 3 neighborhoods, the 5 × 5 neighborhoods and the 7 × 7 neighborhoods to form a spliced vector with the vector dimension being the sum of the three vectors, namely the fused multilevel LPQ textural features, which are used as the final textural features of the face image converted into the intermediate frame of the second color space. In addition, the order of the LPQ texture feature concatenations of the 3 × 3 neighborhood, the 5 × 5 neighborhood, and the 7 × 7 neighborhood and the positions of the vectors are not particularly limited, and may be freely combined and arranged. For example, the LPQ texture features of the 3 × 3 neighborhood, the 5 × 5 neighborhood, and the 7 × 7 neighborhood are sequentially spliced, or the LPQ texture features of the 5 × 5 neighborhood, the 3 × 3 neighborhood, and the 7 × 7 neighborhood are sequentially spliced, and the like.

And step S105, extracting the dynamic mode characteristics of the continuous N frames of face images.

In view of the fact that the dynamic mode features of the continuous frames contain dynamic motion information between the continuous frames, the dynamic mode features of the continuous frames are extracted during the living body detection, and therefore image attacks can be better detected. In the embodiment of the present invention, the dynamic mode features are also expressed in the form of vectors.

And step S106, fusing the texture features and the dynamic mode features to obtain fused features.

After the texture features and the dynamic mode features of the face image are respectively obtained, the texture features and the dynamic mode features can be fused to obtain fused features. The fusion characteristics simultaneously contain abundant textural characteristics and dynamic motion information of the face images of the continuous N frames, so that the accuracy of the living body detection can be improved.

In a preferred embodiment, the step S106 of fusing the texture feature and the dynamic mode feature, and acquiring a fused feature includes: and fusing the multilevel local phase quantization texture feature of the preset neighborhood with the dynamic mode feature to obtain a fused feature.

The multi-stage LPQ texture features and the dynamic mode features are expressed in a vector mode, and the multi-stage LPQ texture features and the dynamic mode features are spliced to obtain fused features. In addition, when performing splicing and fusion, the splicing and fusion can be performed in the order of the front dynamic mode features and the rear dynamic mode features or in the order of the front dynamic mode features and the rear dynamic mode features.

And S107, inputting the fusion characteristics into the multilayer perceptron model to obtain the prediction probability value of the living body label and the prediction probability value of the non-living body label.

After the fusion features are obtained, inputting the fusion features into the trained multilayer perceptron model to perform feature mapping and normalization on the fusion features, and obtaining the prediction probability value of the corresponding living body label and the prediction probability value of the non-living body label. The prediction probability value of the living body label represents the prediction probability that the face picture to be detected is the living body face picture, and the prediction probability value of the non-living body label represents the prediction probability that the face picture to be detected is the non-living body face picture. For performing feature mapping and normalization on the fusion features by using the multi-layer perceptron model, reference may be made to the related contents of the multi-layer perceptron training hereinafter, and details are not repeated here.

And when the prediction probability value of the living body label is larger than the prediction probability value of the non-living body label, executing step S108, and determining the face image of the continuous N frames as a living body face image.

And when the prediction probability value of the non-living body label is greater than the prediction probability value of the non-living body label, executing step S109, and determining that the face image of the continuous N frames is a non-living body face image.

Comparing the obtained prediction probability value of the living body label with the prediction probability value of the non-living body label, and determining the face image of the continuous N frames as a living body face image when the prediction probability value of the living body label is larger than the prediction probability value of the non-living body label; and when the prediction probability value of the non-living body label is larger than the prediction probability value of the non-living body label, determining the face image of the continuous N frames as a non-living body face image.

In a preferred embodiment, in order to further improve the identification accuracy and safety of the in-vivo detection, the in-vivo detection method further includes: and carrying out normalization processing on the dynamic mode characteristics to obtain the dynamic mode characteristics after the normalization processing.

In a preferred embodiment, in order to further improve the recognition efficiency of the living body detection, in step S105, the extracting the dynamic mode feature of the consecutive N frames of face images includes:

Considering that the dynamic mode features of the face images of the N continuous frames comprise a plurality of dynamic mode features, wherein the dynamic mode feature with the maximum energy comprises the maximum dynamic structural information and the most abundant texture information between the continuous frames. Therefore, in order to improve the efficiency of the living body detection, in the embodiment of the present invention, the dynamic mode feature with the largest energy among the dynamic mode features of the consecutive N frames of face images is extracted.

In the embodiment of the invention, the trained multilayer perceptron model is utilized to detect the face images of the continuous N frames according to the fusion characteristics of the face images of the continuous N frames, and the prediction probability value of the face images of the continuous N frames as the living body face images or the non-living body labels is determined.

Fig. 2 shows steps S105 included in the living body detecting method provided by the embodiment of the present invention: and according to different requirements, the sequence of steps in the flow chart can be changed, and some steps can be omitted. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and detailed as follows:

in a preferred embodiment, as shown in fig. 2, step S105 includes: the extracting of the dynamic mode feature with the largest energy in the dynamic mode features of the continuous N frames of face images comprises:

step S1051, using (m x N) x 1 column vector to represent m x N gray value data contained in the face image, and obtaining a first data matrix composed of N-1 column vectors corresponding to the front N-1 frame face image and a second data matrix composed of N-1 column vectors corresponding to the back N-1 frame face image, wherein m and N are positive integers.

And representing m × N gray value data contained in each face image of the continuous N frames by using a column vector of (m × N) × 1, wherein m and N are positive integers. I.e. a column vector p representing the face image of the r-th frame as (m x n) 1_rWherein r is a positive integer less than or equal to N. Then sequentially forming a first data matrix P by the column vectors corresponding to the face images of the first N-1 frames₁Sequentially forming a second data matrix P by the column vectors corresponding to the human face images of the next N-1 frames₂I.e. obtaining a first data matrix P₁And a second data matrix P₂。

Step S1052, according to the first data matrix P₁And said second data matrix P₂Obtaining a companion matrix H of a linear mapping matrix A, wherein the linear mapping matrix A is the first data matrix P₁And said second data matrix P₂Inverse matrix of

The multiplied matrix.

Obtaining the first data matrix P₁And said second data matrix P₂Then, the data can be obtained according to the first data matrix P₁And said second data matrix P₂Obtaining a companion matrix H of the linear mapping matrix A, wherein the linear mapping matrix A is the first data matrix P₁And said second data matrix P₂Inverse matrix of

The multiplied matrix, i.e.:

the linear mapping matrix A contains global visual dynamic information in the face images of the continuous N frames, and dynamic mode characteristics of the face images of the continuous N frames can be obtained through the linear mapping matrix A.

In a preferred embodiment, in order to improve the identification efficiency of the living body detecting method, the step S1052 of acquiring a companion matrix of a linear mapping matrix according to the first data matrix and the second data matrix includes:

and carrying out triangular decomposition on the first data matrix, and respectively obtaining an upper triangular matrix and a lower triangular matrix of the first data matrix.

In view of the fact that the trigonometric decomposition is mainly used for simplifying the calculation process of the matrix, particularly the matrix with a large dimension, and the calculation efficiency can be improved, and further the efficiency of the in vivo detection is improved, in the embodiment of the present invention, the adjoint matrix H of the linear mapping matrix a is solved by using the trigonometric decomposition. Triangle Decomposition (i.e., LU Decomposition) is one of matrix decompositions that can decompose a matrix into the product of a triangle matrix in unity and a triangle matrix in unity. In the embodiment of the invention, the first data matrix P is subjected to₁Performing triangle decomposition to obtain the first data matrix P₁The upper triangular matrix U and the lower triangular matrix L, i.e.: p₁＝L*U。

In addition, when the companion matrix H of the linear mapping matrix a is obtained, it may be obtained by other matrix decomposition methods, such as orthogonal triangular decomposition (i.e., QR decomposition) and singular value decomposition, which are not described in detail here.

Obtaining an inverse matrix U of the upper triangular matrix U^-1And a pseudo-inverse L of the lower triangular L matrix⁺。

After obtaining the first data matrix P₁After the upper triangular matrix U is obtained, the inverse matrix U of the upper triangular matrix U is obtained^-1(ii) a In additionAnd acquiring a pseudo-inverse matrix L of the lower triangular matrix L according to the lower triangular matrix L⁺. The pseudo-inverse matrix is a generalized form of an inverse matrix, also referred to as a generalized inverse matrix. When there is an inverse matrix K to the matrix K^-1Matrices X of the same type satisfy K X K ═ K, and X K ═ X, and are referred to as matrices X in this case as pseudo-inverse matrices of matrices K.

Inverse matrix U of the upper triangular matrix U^-1A pseudo inverse matrix L of the lower triangular matrix L⁺And said second data matrix multiplication P₂And acquiring a adjoint matrix H of the linear mapping matrix A.

Obtaining an inverse matrix U of the upper triangular matrix U^-1A pseudo inverse matrix L of the lower triangular matrix L⁺And said second data matrix P₂Inverse matrix U^-1Pseudo-inverse matrix L⁺And a second data matrix P₂Multiplying to obtain a adjoint matrix H of the linear mapping matrix A, namely: h ═ U^-1*L⁺*P₂。

Step S1053, obtaining the eigenvector E of the adjoint matrix H through eigenvalue decomposition_vecAnd a characteristic value E_val。

Eigenvalue decomposition, also called spectral decomposition, is a method of decomposing a matrix into the product of matrices represented by eigenvalues and eigenvectors of the matrix. Typically, the matrix contains a plurality of eigenvalues and eigenvectors. After obtaining the adjoint matrix H, the eigenvector E of the adjoint matrix H can be obtained by eigenvalue decomposition_vecAnd a characteristic value E_val。

Step S1054, determining the characteristic value E_valCharacteristic value E with maximum medium absolute value_val(K) Corresponding feature vector E_vec(K)。

Considering that the adjoint matrix H includes a plurality of eigenvalues, and the eigenvalue with the largest absolute value among the eigenvalues corresponds to the dynamic mode feature with the largest energy among the dynamic mode features, here, the eigenvalue E included in the adjoint matrix H is calculated separately_valAnd comparing all the absolute values to determine the feature direction corresponding to the feature value with the maximum absolute value among the feature valuesAmount of the compound (A). For example, the eigenvalues E of the adjoint matrix may be labeled_valAnd associating the feature values with the corresponding feature vectors. Assuming that the eigenvalue with the maximum absolute value is the eigenvalue E with the index position K_val(K) Then, in determining the characteristic value E with the index position K_val(K) Then, the characteristic value E with the index position being K is determined_val(K) Corresponding feature vector E_vec(K)。

Step S1055, the first data matrix P₁And the eigenvalue E with the maximum absolute value among the eigenvalues_val(K) Corresponding feature vector E_vec(K) And multiplying, and taking an absolute value of a multiplied result to obtain the dynamic mode feature with the maximum energy in the dynamic mode features of the face images of the continuous N frames.

Determining the characteristic value E with the index position with the maximum absolute value as K_val(K) Corresponding feature vector E_vec(K) Then, the first data matrix P is divided into₁Characteristic value E having the maximum absolute value_val(K) Corresponding feature vector E_vec(K) Multiplying, and taking an absolute value of an element in the multiplied vector, and assuming that the dynamic mode characteristic with the maximum energy is DM, the following steps are provided: DM ═ abs (P)₁*E_vec(K) And then, the dynamic mode feature with the largest energy in the dynamic mode features of the face images of the continuous N frames can be obtained.

In the embodiment of the present invention, the adjoint matrix H of the linear mapping matrix a is obtained by triangular decomposition, the eigenvalue and the eigenvector of the adjoint matrix H are obtained by eigenvalue decomposition, the eigenvector corresponding to the eigenvalue with the largest absolute value in the eigenvalue is determined, and the dynamic mode feature with the largest energy in the dynamic mode features of the face images of the consecutive N frames is further obtained.

Fig. 3 shows a flow of implementing step S101 in the living body detecting method provided by the embodiment of the invention, and the order of the steps in the flow chart can be changed and some steps can be omitted according to different requirements. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and detailed as follows:

in a preferred embodiment, in order to improve the identification accuracy and safety of the in-vivo detection, as shown in fig. 3, the multi-layered sensor includes at least a first fully-connected layer and a second fully-connected layer, and the step S101 of training the multi-layered sensor by using a preset training set, and determining the multi-layered sensor model includes:

step S1011, randomly extracting a first sample and a second sample from a preset training set, where each sample in the preset training set includes at least N consecutive frames of face images.

The preset training set is a preset training set and comprises a large number of face pictures, namely samples, and each sample in the preset training set comprises at least N continuous frames of face images. And randomly drawing a first sample and a second sample from the preset training set for training.

Step S1012, extracting the fusion feature of the first sample and the fusion feature of the second sample, respectively.

After the first sample and the second sample are extracted from the preset training set, the fusion features of the first sample and the second sample are respectively extracted, which may specifically refer to the contents of step S102 to step S106, and details are not repeated here.

And S1013, respectively inputting the fusion characteristics of the first sample and the fusion characteristics of the second sample into the multilayer perceptron, and acquiring the Softmax loss of the first sample and the Softmax loss of the second sample.

In this embodiment of the present invention, the multi-layer sensor at least includes a first full connection layer and a second full connection layer, where the first full connection layer and the second full connection layer are used to perform feature mapping on the fused feature, and specifically, both the first full connection layer and the second full connection layer use an activation function to perform feature mapping transformation on any fused feature vector. In view of the fact that a Relu (modified linear unit, called a redirected linear unit for short) activation function can accelerate convergence of a regression model and improve the speed and efficiency of training the regression model, in a preferred embodiment, the first full connection layer and the second full connection layer both use the Relu activation function to perform feature mapping transformation on any fused feature vector. The multilayer perceptron further comprises a Softmax layer, the preset training set further comprises label categories of the samples, the label categories comprise two types of living body labels and non-living body labels, and before training, the label category of each sample in the preset training set is known and determined.

After feature mapping of a fully-connected layer of the multilayer sensor, inputting an output of a second fully-connected layer to a Softmax layer of the multilayer sensor, where the Softmax layer of the multilayer sensor is mainly used to perform normalization processing on the input features, and specifically, the normalization processing may be performed according to the following formula:

and

wherein, f (z)_i) And f (z)_j) Respectively representing the predicted probability, z, of the label of the first and second sample after passing through the Softmax layer of the multi-layer perceptron_iAnd z_jRespectively representing the outputs of the first and second samples after passing through the second fully-connected layer of the multilayer sensor, i and j respectively representing the label categories, and k representing the number of label categories, where there are only live labels and non-live labels, so that in an embodiment of the invention k is 2.

Determining the output f (z) of the first and second samples_i) And f (z)_j) Thereafter, the Softmax loss for the first sample and the Softmax loss for the second sample may be determined. Assume that the preset training set includes 2M samples, and each sample of the 2M samples includes at least N consecutive frames of face images, where M is a positive integer. Specifically, the Softmax loss of the first sample and the Softmax loss of the second sample may be determined according to the following formula:

and

wherein L is_s(i) And L_s(j) Respectively representing Softmax loss of the first sample and the second sample, M representing the number of pairs of samples in the batch in the preset training set, y_iAnd y_jThe true label classes representing the first and second samples, respectively, i.e. when determining the Softmax loss of the first sample, y for the first sample_iIs 1, and for samples other than the first, y is_iAre all zero; in determining the Softmax loss of the second sample, y for the second sample_iIs 1, and for samples other than the second sample, y thereof_iAre all zero. At this point, the Softmax loss for the first sample and the second sample, respectively, may be determined.

Step S1014, determining a contrast loss of the first sample and the second sample.

The contrast Loss (English full name: contrast Loss) can well express the matching degree of paired samples, and can also be well used for training a model for extracting features, and the method is mainly used for dimension reduction. In an embodiment of the present invention, the loss of contrast of the first sample and the second sample may be determined according to the following formula:

wherein L is_cRepresenting the contrast loss of the first sample and the second sample, M representing the number of sample pairs in the preset training set batch, y_n1 when the first and second samples are of the same label class and zero when the first and second samples are of different label classes, i.e. y_nMay represent whether the first sample and the second sample match, d represents the euclidean distance of the first sample and the second sample, and the calculation of the euclidean distance of the particular first sample and the second sample, whereM is not described in detail_ijIn order to preset a distance threshold, i.e. a preset distance threshold, which can affect the convergence speed and performance of the multi-layer perceptron model training, in a preferred embodiment, the preset distance threshold m_ijIn the range of 0.01 to 0.1.

Step S1015, determining a total loss by the Softmax loss of the first sample, the Softmax loss of the second sample, and the contrast loss.

The Softmax loss L of the first sample and the second sample obtained respectively_s(i)、L_s(j) And the loss of contrast L of the first and second samples_cThe total loss of the first and second samples may then be determined according to the following equation:

L＝L_s(i)+L_s(j)+weight*L_c；

where L is the total loss of the first sample and the second sample, and weight is a preset weight parameter, that is, a preset weight parameter, and in a preferred embodiment, weight is 0.003.

When the total loss does not satisfy the preset condition of loss convergence, step S1016 is executed, and the parameters of the first fully-connected layer and the parameters of the second fully-connected layer in the multi-layer sensor are adjusted through a back propagation process by using a random gradient descent method. The process goes to step S1011 and goes to step S1011 to step S1015.

The stochastic gradient descent is mainly used for weight updating in a neural network model, and parameters of the model are updated and adjusted in one direction to minimize a loss function. In the back propagation, the products of the input signals and the corresponding weights are calculated in the forward propagation, then the activation function is applied to the sum of the products, then the related errors are returned in the back propagation process of the network model, the weight values are updated by using random gradient descent, and the weight parameters are updated in the opposite direction of the gradient of the loss function by calculating the gradient of the error function relative to the weight parameters. The following formula is satisfied at the first fully-connected layer and the second fully-connected layer of the multilayer sensor: S-W T + B, where S denotes output features, T denotes input features, W denotes weights of neurons in the fully connected layer, and B denotes an offset term. Therefore, in the embodiment of the present invention, when the total loss L does not satisfy the preset condition of loss convergence, the parameters of the first full-link layer and the parameters of the second full-link layer of the regression model, that is, the weight W and the bias term of the full-link layer neuron, are adjusted through a back propagation process by using a stochastic gradient descent method. After adjusting the parameters of the first full link layer and the parameters of the second full link layer of the regression model, go to step S1011, and execute steps S1011 to S1015.

The preset condition of loss convergence is a preset condition of loss convergence, and in a preferred embodiment, in order to further improve the identification efficiency of the in-vivo detection, the preset condition includes: the calculation times of the total loss are equal to a preset time threshold or the total loss is less than or equal to a preset loss threshold.

In setting the condition for loss convergence, the number of times of calculation of the total loss, that is, the number of times of iteration of the above-described process may be used as the condition for loss convergence. For example, when the number of times of calculation of the total loss is equal to a preset number threshold, the training of the multi-layer sensor is stopped when the total loss is considered to satisfy the preset condition of loss convergence, where the preset number threshold is a preset number threshold, and is not limited herein. Or when the total loss is less than or equal to a preset loss threshold, the total loss is considered to satisfy a preset condition of loss convergence, where the preset loss threshold is a preset loss threshold, and is not particularly limited herein.

When the total loss satisfies the preset condition of loss convergence, step S1017 is executed, and the parameters of the first fully-connected layer and the parameters of the second fully-connected layer in the last calculation process before the preset condition of loss convergence is satisfied are used as the parameters of the first fully-connected layer and the parameters of the second fully-connected layer of the multi-layer sensor model, so as to determine the multi-layer sensor model.

And when the total loss meets the preset condition of loss convergence, stopping training the multilayer perceptron, and taking the parameters of the first full-connection layer and the parameters of the second full-connection layer in the last calculation process before the preset condition of loss convergence is met as the parameters of the first full-connection layer and the parameters of the second full-connection layer of the multilayer perceptron model so as to determine the trained multilayer perceptron model.

In a preferred embodiment, in order to further improve the identification accuracy and safety of the living body detection, the step S1012 of respectively extracting the fusion feature of the first sample and the fusion feature of the second sample includes:

and respectively extracting the local phase quantization texture feature of the intermediate frame of the first sample and the dynamic mode feature with the maximum energy in the dynamic mode features of the first sample.

And fusing the local phase quantization texture feature of the intermediate frame of the first sample and the dynamic mode feature with the maximum energy in the dynamic mode features of the first sample to obtain the fusion feature of the first sample.

And respectively extracting the local phase quantization texture feature of the intermediate frame of the second sample and the dynamic mode feature with the maximum energy in the dynamic mode features of the second sample.

And fusing the local phase quantization texture feature of the intermediate frame of the second sample and the dynamic mode feature with the maximum energy in the dynamic mode features of the second sample to obtain the fusion feature of the second sample.

For the local phase quantization texture feature of the intermediate frame from which the first sample or the second sample is extracted, the content related to the step S104 may be specifically referred to; for extracting the dynamic mode feature with the largest energy from among the dynamic mode features of the first sample or the second sample, reference may be specifically made to the content related to step S105 above; for fusing the local phase quantization texture feature of the intermediate frame of the first sample or the second sample with the dynamic mode feature with the largest energy in the dynamic mode features of the first sample or the second sample, obtaining the fused feature of the first sample or the second sample, please refer to step S106 above specifically, which is not repeated here in detail.

In the embodiment of the invention, the multi-layer perceptron is trained by utilizing the fusion characteristics of the samples, the parameters of the full-connection layer of the multi-layer perceptron are adjusted by adopting a random gradient descent method through a back propagation process, and when the total loss meets the preset condition of loss convergence, the trained multi-layer perceptron model is determined. In view of the fact that the fusion features of the samples in the embodiment of the invention include the multi-level texture features and the dynamic mode features with the maximum energy of the samples, the identification accuracy and safety of the living body detection can be improved. In addition, compared with other gradient descent methods, the random gradient descent method has higher operation speed and can achieve the purpose of fast convergence, so that the method and the device can also improve the efficiency of in-vivo detection.

Fig. 4 shows functional modules of a living body detecting system provided by an embodiment of the invention, and for convenience of explanation, only the parts related to the embodiment of the invention are shown, and the details are as follows:

referring to fig. 4, each module included in the in-vivo detection system 10 is used to execute each step in the embodiment corresponding to fig. 1, and specific reference is made to fig. 1 and the related description in the embodiment corresponding to fig. 1, which are not repeated herein. In a preferred embodiment, the living body detecting system 10 includes a training module 101, an obtaining module 102, a converting module 103, a texture feature extracting module 104, a dynamic mode feature extracting module 105, a fusing module 106, a probability obtaining module 107, and a determining module 108.

The training module 101 is configured to train the multi-layered sensor by using a preset training set, and determine a multi-layered sensor model.

The acquiring module 102 is configured to acquire N consecutive frames of face images to be detected, where N is a positive integer greater than 3.

The conversion module 103 is configured to convert the face image of the intermediate frame in the consecutive N frames of face images from a first color space to a second color space, where when N is an odd number, the face image of the intermediate frame is the face image of the (N +1)/2 th frame, and when N is an even number, the face image of the intermediate frame is the face image of the N/2 th frame or the N/2+1 th frame.

The texture feature extraction module 104 is configured to extract the texture feature of the face image of the intermediate frame converted into the second color space.

The dynamic mode feature extraction module 105 is configured to extract dynamic mode features of the consecutive N frames of face images.

The fusion module 106 is configured to fuse the texture feature and the dynamic mode feature to obtain a fused feature.

The probability obtaining module 107 is configured to input the fusion feature to the multilayer perceptron model, and obtain a predicted probability value of a live tag and a predicted probability value of a non-live tag.

The determining module 108 is configured to determine that the face images of the consecutive N frames are live face images when the predicted probability value of the live body label is greater than the predicted probability value of the non-live body label.

The determining module 108 is further configured to determine that the face images of the consecutive N frames are non-living face images when the predicted probability value of the non-living label is smaller than the predicted probability value of the non-living label.

In the embodiment of the present invention, a trained multilayer perceptron model is used to detect face images of consecutive N frames according to fusion features of the face images of the consecutive N frames, and the determining module 108 further determines a prediction probability value that the face images of the consecutive N frames are live face images or non-live labels.

Fig. 5 shows a block diagram of the dynamic mode feature extraction module 105 in the living body detecting system according to the embodiment of the present invention, which only shows the relevant parts according to the embodiment of the present invention for convenience of description, and the following details are described below:

referring to fig. 5, each unit included in the dynamic mode feature extraction module 105 is configured to execute each step in the embodiment corresponding to fig. 2, and please refer to fig. 2 and the related description in the embodiment corresponding to fig. 2 for details, which are not described herein again. In a preferred embodiment, the dynamic mode feature extraction module 105 includes a data matrix obtaining unit 1051, an accompanying matrix obtaining unit 1052, a feature value decomposition unit 1053, a feature vector determination unit 1054, and a dynamic mode feature obtaining unit 1055.

The data matrix obtaining unit 1051 is configured to use the column vector of (m × N) × 1 to represent m × N gray-level data included in the face image, and obtain a first data matrix composed of N-1 column vectors corresponding to the first N-1 frames of face images and a second data matrix composed of N-1 column vectors corresponding to the last N-1 frames of face images.

The adjoint matrix obtaining unit 1052 is configured to obtain an adjoint matrix of a linear mapping matrix according to the first data matrix and the second data matrix, where the linear mapping matrix is a matrix obtained by multiplying the first data matrix by an inverse matrix of the second data matrix, and m and n are positive integers.

The eigenvalue decomposition unit 1053 is configured to obtain an eigenvector and an eigenvalue of the adjoint matrix through eigenvalue decomposition.

The feature vector determining unit 1054 is configured to determine a feature vector corresponding to a feature value with a largest absolute value in the feature values.

The dynamic mode feature obtaining unit 1055 is configured to multiply the first data matrix with a feature vector corresponding to a feature value closest to zero in the phase angle values, and obtain an absolute value of a result of the multiplication, so as to obtain a dynamic mode feature with the largest energy in the dynamic mode features of the face images of the consecutive N frames.

In the embodiment of the present invention, the adjoint matrix obtaining unit 1052 first obtains the adjoint matrix of the linear mapping matrix, the eigenvalue decomposition unit 1053 obtains the eigenvalue and the eigenvector of the adjoint matrix through eigenvalue decomposition, determines the eigenvector corresponding to the eigenvalue with the largest absolute value in the eigenvalue, and further obtains the dynamic mode feature with the largest energy in the dynamic mode features of the face images of the consecutive N frames, and the dynamic mode feature in the embodiment of the present invention is the dynamic mode feature with the largest energy.

Fig. 6 shows a block diagram of a training module 101 in the biopsy system according to the embodiment of the present invention, and for convenience of illustration, only the parts related to the embodiment of the present invention are shown, and the details are as follows:

referring to fig. 6, each unit included in the training module 101 is configured to execute each step in the embodiment corresponding to fig. 3, and please refer to fig. 3 and the related description in the embodiment corresponding to fig. 3 for details, which are not described herein again. In a preferred embodiment, the training module 101 comprises: a sample extraction unit 1011, a fusion feature extraction unit 1012, a Softmax loss determination unit 1013, a contrast loss determination unit 1014, a total loss determination unit 1015, a parameter adjustment unit 1016, and a model determination unit 1017.

The sample extraction unit 1011 is configured to randomly extract a first sample and a second sample from a preset training set, where each sample in the preset training set includes at least N consecutive frames of face images.

The fused feature extracting unit 1012 is configured to extract a fused feature of the first sample and a fused feature of the second sample, respectively.

The Softmax loss determining unit 1013 is configured to input the fusion characteristic of the first sample and the fusion characteristic of the second sample into the multilayer perceptron, and obtain a Softmax loss of the first sample and a Softmax loss of the second sample.

The contrast loss determination unit 1014 is configured to determine a contrast loss of the first sample and the second sample.

The total loss determining unit 1015 is configured to determine a total loss by the Softmax loss of the first sample, the Softmax loss of the second sample, and the contrast loss.

The parameter adjusting unit 1016 is configured to adjust a parameter of the first fully-connected layer and a parameter of the second fully-connected layer in the multi-layer sensor through a back propagation process by using a stochastic gradient descent method when the total loss does not satisfy a preset condition of loss convergence.

The model determining unit 1017 is configured to, when the total loss satisfies a preset condition of loss convergence, determine the multilayer sensor model by using a parameter of the first fully-connected layer and a parameter of the second fully-connected layer in a last calculation process before the preset condition of loss convergence is satisfied as a parameter of the first fully-connected layer and a parameter of the second fully-connected layer of the multilayer sensor model.

In the embodiment of the present invention, the multi-layer sensor is trained by using the fusion characteristics of the samples, the parameter adjusting unit 1016 adjusts the parameters of the fully-connected layers of the multi-layer sensor through a back propagation process by using a random gradient descent method, and the model determining unit 1017 determines the trained multi-layer sensor model when the total loss satisfies the preset condition of loss convergence. In the embodiment of the invention, the fusion characteristics of the samples comprise the multi-level texture characteristics and the dynamic mode characteristics with the maximum energy of the samples, so that the identification accuracy and safety of the living body detection can be improved. In addition, compared with other gradient descent methods, the random gradient descent method has higher operation speed and can achieve the purpose of fast convergence, so that the method and the device can also improve the efficiency of in-vivo detection.

Fig. 7 is a schematic structural diagram of a computer device 1 according to a preferred embodiment of the method for detecting a living body according to an embodiment of the present invention. As shown in fig. 7, the computer apparatus 1 includes a memory 11, a processor 12, and an input/output device 13.

The computer device 1 is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device 1 may be any electronic product capable of performing human-computer interaction with a user, such as a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive web Television (IPTV), and a smart wearable device. The computer apparatus 1 may be a server including, but not limited to, a single web server, a server group consisting of a plurality of web servers, or a Cloud based Computing (Cloud Computing) consisting of a large number of hosts or web servers, wherein Cloud Computing is one of distributed Computing, a super virtual computer consisting of a collection of loosely coupled computers. The Network where the computer device 1 is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.

The memory 11 is used for storing programs and various data of the living body detecting method and realizing high-speed and automatic access of the programs or the data during the operation of the computer device 1. The memory 11 may be an external storage device and/or an internal storage device of the computer apparatus 1. Further, the Memory 11 may be a circuit having a storage function without a physical form In an integrated circuit, such as a RAM (Random-Access Memory), a FIFO (First In First Out), and the like, or the Memory 11 may also be a storage device with a physical form, such as a Memory stick, a TF card (Trans-flash card), and the like.

The processor 12 may be a Central Processing Unit (CPU). The CPU is an ultra-large scale integrated circuit, and is an arithmetic Core (Core) and a Control Core (Control Unit) of the computer device 1. The processor 12 may execute an operating system of the computer device 1 and various types of application programs, program codes, and the like installed, for example, execute the operating system in each module or unit in the living body detecting system 10 and various types of application programs, program codes, and the like installed, to realize the living body detecting method.

The input/output device 13 is mainly used for implementing input/output functions of the computer apparatus 1, such as transceiving input numeric or character information, or displaying information input by a user or information provided to a user and various menus of the computer apparatus 1.

The modules/units integrated with the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.

The characteristic means of the present invention described above may be realized by an integrated circuit and control functions of realizing the living body detecting method described in any of the above embodiments. That is, the integrated circuit according to the present invention is mounted on the computer device 1, and causes the computer device 1 to function as follows:

The functions that can be realized by the biopsy method in any of the embodiments can be implemented in the computer device 1 by the integrated circuit of the present invention, so that the computer device 1 can perform the functions that can be realized by the biopsy method in any of the embodiments, and the details thereof will not be described here.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of modules or means recited in the system claims may also be implemented by one module or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method of in vivo detection, the method comprising:

extracting the dynamic mode feature with the maximum energy in the dynamic mode features of the continuous N frames of face images, wherein the extracting comprises the following steps: adopting (m x N) 1 column vectors to represent m x N gray value data contained in the face image, and acquiring a first data matrix consisting of N-1 column vectors corresponding to the front N-1 frames of face images and a second data matrix consisting of N-1 column vectors corresponding to the back N-1 frames of face images, wherein m and N are positive integers; acquiring an adjoint matrix of a linear mapping matrix according to the first data matrix and the second data matrix, wherein the linear mapping matrix is a matrix obtained by multiplying the first data matrix and an inverse matrix of the second data matrix; obtaining an eigenvector and an eigenvalue of the adjoint matrix through eigenvalue decomposition; determining a feature vector corresponding to the feature value with the maximum absolute value in the feature values; multiplying the first data matrix by the eigenvector corresponding to the eigenvalue with the maximum absolute value, and taking an absolute value of a result after multiplication to obtain the dynamic mode characteristic with the maximum energy in the dynamic mode characteristics of the face images of the continuous N frames;

2. The live body detection method according to claim 1, wherein the first color space is an RGB color space, the second color space is a Lab color space, and the extracting the texture feature of the face image converted into the intermediate frame of the second color space comprises:

3. The in-vivo detection method as set forth in claim 2, wherein the extracting the local phase quantization texture feature of the preset neighborhood of the face image converted into the intermediate frame of the Lab color space comprises:

4. The liveness detection method of claim 1 wherein the obtaining a companion matrix of a linear mapping matrix from the first data matrix and the second data matrix comprises:

5. The liveness detection method of claim 1 wherein the multi-layered sensor comprises at least a first fully-connected layer and a second fully-connected layer, the training of the multi-layered sensor with the preset training set, the determining the multi-layered sensor model comprising:

determining a loss of contrast for the first sample and the second sample;

and until the total loss meets the preset condition of loss convergence, taking the parameters of the first full-connection layer and the parameters of the second full-connection layer in the last calculation process before the preset condition of loss convergence are met as the parameters of the first full-connection layer and the parameters of the second full-connection layer of the multilayer sensor model, and determining the multilayer sensor model.

6. The in-vivo detection method according to claim 5, wherein the preset condition includes that the total loss is calculated a number of times equal to a preset number threshold or that the total loss is less than or equal to a preset loss threshold.

7. A computer arrangement, characterized in that the computer arrangement comprises a processor for implementing the liveness detection method according to any one of claims 1 to 6 when executing a computer program stored in a memory.

8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the living body detecting method according to any one of claims 1 to 6.