
CN115909467A - Human face living body detection method, device, equipment and medium in motion state scene

Info

Publication number: CN115909467A
Application number: CN202310015734.6A
Authority: CN (China)
Prior art keywords: face, feature, texture, standardized, pixel
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: 罗燕
Current assignee: Shenzhen Yihuitong Technology Co ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Shenzhen Yihuitong Technology Co ltd
Application filed by Shenzhen Yihuitong Technology Co ltd
Priority to CN202310015734.6A
Publication of CN115909467A

Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02D — Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Collating Specific Patterns (AREA)

Abstract

The invention relates to the field of intelligent decision-making and discloses a method, a device, equipment and a medium for detecting a living human face in a motion state scene. The method comprises the following steps: acquiring a face image in a motion state scene, performing face detection on the face image to obtain a detected face, and performing format standardization on the detected face to obtain a standardized face; performing texture coding on the standardized face to obtain the coding texture of the standardized face, extracting texture information based on the coding texture, calculating pixel change information, and constructing three-dimensional structure information of the standardized face; performing feature extraction on the texture information, the pixel change information and the three-dimensional structure information respectively to obtain texture features, pixel change features and three-dimensional structure features, and performing feature fusion on the three to obtain fusion features; and calculating the living body detection score of the standardized face and determining the face living body detection result of the face image from that score. The invention can improve the comprehensiveness of human face living body detection in motion state scenes.

Description

Human face living body detection method, device, equipment and medium in motion state scene
Technical Field
The invention relates to the field of intelligent decision-making, and in particular to a method, a device, equipment and a medium for human face living body detection in a motion state scene.
Background
Human face living body detection in a motion state scene refers to the process of distinguishing whether a currently acquired face image comes from a living human face or a fake one.
At present, for the human face living body detection problem, an attention-based fusion method has been proposed to fuse RGB and MSR features. However, in a deep neural network, the feature maps from deep layers express high-level semantic information, which becomes a problem when the spoofing cues lie in low-level image pixels. Moreover, regarding the limitation of relying on a small amount of feature information, serial or parallel fusion of multiple feature vectors treats each feature vector independently and equally, without exploiting the correlation among features. Therefore, because low-level information and the correlation between features in the face image are neglected, human face living body detection in motion state scenes is not comprehensive enough.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method, an apparatus, a device, and a medium for detecting a living human face in a motion state scene, which can enhance the attention paid to low-level information in the face image and to the correlation between features, and improve the comprehensiveness of living human face detection in motion state scenes.
In a first aspect, the present invention provides a method for detecting a living human face in a motion state scene, including:
acquiring a face image in a motion state scene, performing face detection on the face image to obtain a detected face, and performing format standardization on the detected face to obtain a standardized face;
performing texture coding on the standardized face to obtain coding texture of the standardized face, extracting texture information in the standardized face based on the coding texture, calculating pixel change information in the standardized face, and constructing three-dimensional structure information of the standardized face;
respectively extracting the texture information, the pixel change information and the three-dimensional structure information to obtain texture characteristics, pixel change characteristics and three-dimensional structure characteristics, and performing characteristic fusion on the texture characteristics, the pixel change characteristics and the three-dimensional structure characteristics to obtain fusion characteristics;
and calculating the living body detection score of the standardized human face according to the fusion characteristics, and determining the human face living body detection result of the human face image by using the living body detection score.
In a possible implementation manner of the first aspect, the format normalizing the detected face to obtain a normalized face includes:
acquiring a face key point corresponding to the detected face;
calculating a similarity transformation matrix between the face key points and preset standard face key points;
and carrying out standard space transformation on the detected face by using the similarity transformation matrix to obtain the standardized face.
In a possible implementation manner of the first aspect, the texture coding the standardized face to obtain a coded texture of the standardized face includes:
carrying out region division on the standardized face to obtain a divided face region;
calculating pixel texture values in the divided face regions using the following formula:
$$L(x_c, y_c) = \sum_{n=0}^{p-1} s(i_n - i_c)\, 2^n$$

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

wherein L(x_c, y_c) represents the pixel texture value, (x_c, y_c) represents the coordinates of a pixel point in the divided face region, p represents the number of neighborhood pixels of the pixel point at (x_c, y_c) (here p = 8), i_c represents the gray value of the pixel at (x_c, y_c), i_n represents the gray value of the n-th neighborhood pixel point, and s(x) represents the sign function;
carrying out frequency statistics on the pixel texture value to obtain the statistical frequency of the pixel texture value;
carrying out normalization processing on the statistical frequency to obtain normalized frequency;
and carrying out vector coding on the normalized frequency to obtain the coding texture.
In one possible implementation manner of the first aspect, the calculating pixel variation information in the normalized face includes:
calculating pixel intensity difference values in the normalized face using the following formula:
$$\varepsilon(y_c) = \frac{1}{q} \sum_{j=1}^{q} \left| y_j - y_c \right|$$

wherein ε(y_c) represents the pixel intensity difference value, q represents the total number of neighborhood points of a pixel point randomly selected from the standardized face (here q = 8), y_c represents the gray value of the selected pixel point in the standardized face, y_j represents the gray value of the j-th neighborhood pixel point of y_c, and j represents the serial number of the neighborhood pixel point;
calculating a pixel gradient direction value in the standardized human face;
performing histogram transformation on the pixel intensity difference value and the pixel gradient direction value to obtain an intensity difference image and a gradient direction image;
and carrying out image merging processing on the intensity difference image and the gradient direction image to obtain the pixel change information.
In a possible implementation manner of the first aspect, the constructing three-dimensional structure information of the standardized human face includes:
calculating a three-dimensional structure value of the standardized face using the following formula:
$$I(x, y) = \rho(x, y)\, n^T(x, y)\, s$$

wherein I(x, y) represents the three-dimensional structure value of the standardized face, ρ(x, y) represents the albedo of the face image corresponding to the standardized face, n^T(x, y) represents the surface normal of the standardized face, which varies over a 3D shape, and s represents the point light source illuminating the face image; since the screen of a photograph or of a replayed video is a two-dimensional planar structure, n^T(x, y) is a constant for such media;
constructing a three-dimensional structure vector of the three-dimensional structure value;
and carrying out image form conversion on the three-dimensional structure vector to obtain the three-dimensional structure information.
In a possible implementation manner of the first aspect, the performing feature fusion on the texture feature, the pixel variation feature, and the three-dimensional structure feature to obtain a fusion feature includes:
calculating the texture feature importance, the change feature importance and the three-dimensional feature importance of the texture feature, the pixel change feature and the three-dimensional structure feature by using the following formulas:
$$d_i = q^T f_i$$

wherein d_1 represents the texture feature importance, d_2 represents the change feature importance, d_3 represents the three-dimensional feature importance, d_i represents the importance for i = 1, 2, 3, q represents the query vector, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, and T represents the transpose symbol;
calculating the texture feature weight, the change feature weight and the three-dimensional feature weight of the texture feature, the pixel change feature and the three-dimensional structure feature according to the texture feature importance, the change feature importance and the three-dimensional feature importance by using the following formulas:
$$\omega_i = \frac{\exp(d_i)}{\sum_{k=1}^{3} \exp(d_k)}$$

wherein ω_1 represents the texture feature weight, ω_2 represents the change feature weight, ω_3 represents the three-dimensional feature weight, and d_i represents the importance for i = 1, 2, 3;
calculating the fusion feature according to the texture feature weight, the change feature weight and the three-dimensional feature weight by using the following formula:
$$v = \sum_{i=1}^{3} \omega_i f_i$$

wherein v represents the fusion feature, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, and ω_1, ω_2, ω_3 represent the texture feature weight, the change feature weight and the three-dimensional feature weight, respectively.
In one possible implementation manner of the first aspect, the calculating a living body detection score of the normalized human face according to the fusion feature includes:
performing linear fitting on the fusion characteristics by using the following formula to obtain fitting characteristics:
$$Z = b + uv$$

wherein Z represents the fitting feature, v represents the fusion feature, b represents the bias parameter of the living body detection model, and u represents the weight parameter of the living body detection model;
calculating the in-vivo detection score according to the fitted feature by using the following formula:
$$y' = \frac{1}{1 + e^{-Z}}$$

wherein y' represents the living body detection score and Z represents the fitting feature.
In a second aspect, the present invention provides a living human face detection apparatus in a motion state scene, the apparatus comprising:
the face standardization module is used for acquiring a face image in a motion state scene, carrying out face detection on the face image to obtain a detected face, and carrying out format standardization on the detected face to obtain a standardized face;
the three-dimensional construction module is used for carrying out texture coding on the standardized human face to obtain coding texture of the standardized human face, extracting texture information in the standardized human face based on the coding texture, calculating pixel change information in the standardized human face and constructing three-dimensional structure information of the standardized human face;
the feature fusion module is used for respectively extracting features of the texture information, the pixel change information and the three-dimensional structure information to obtain texture features, pixel change features and three-dimensional structure features, and performing feature fusion on the texture features, the pixel change features and the three-dimensional structure features to obtain fusion features;
and the result determining module is used for calculating the living body detection score of the standardized human face according to the fusion characteristics and determining the human face living body detection result of the human face image by using the living body detection score.
In a third aspect, the present invention provides an electronic device comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for live human face detection in a motion state scene as described in any one of the above first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the method for detecting a living human face in a motion state scene as set forth in any one of the above first aspects.
Compared with the prior art, the technical principle and the beneficial effects of the scheme are as follows:
the method comprises the steps of firstly collecting a face image in a motion state scene to be used for carrying out living body detection on face information in the motion state scene, further carrying out face detection on the face image to be used for extracting a face part in the image and neglecting information irrelevant to the face part, further standardizing the format of the detected face to be used for extracting an accurate face image and reserving some face background images to improve the detection performance of subsequent feature extraction, and fixing the detected face into the size of a standard format; secondly, the embodiment of the invention performs texture coding on the standardized face to ensure the stability of face information, and since the non-living face information can show phenomena of local highlight, image blur and the like due to poor exposure and noise interference, the embodiment of the invention performs coding on texture details, so as to ensure that the information is kept stable in the non-living face information and easy for subsequent feature extraction, further, the embodiment of the invention extracts texture information in the standardized face based on the coding texture to be used for extracting feature information of the standardized face, further, the embodiment of the invention calculates pixel change information in the standardized face to make up the defect that the texture information cannot sufficiently reflect space distribution structure information of gray change in a local window and is difficult to embody inherent change features of the texture, enhances the mutual relationship between the texture and the pixel change features, and further, the embodiment of the invention constructs three-dimensional structure information of the standardized face to be used for displaying three-dimensional structure information based on living features of the non-living face mainly existing in two-dimensional plane structures such as paper, screens of video equipment and photos; further, the embodiment of the present invention performs feature extraction on the texture information, the pixel change information, and the three-dimensional structure information, respectively, so as to utilize the advantage that a residual error network has a higher output accuracy, improve the accuracy of feature extraction of low-level information in a face image, and ensure that a subsequent feature fusion result is not affected by the accuracy of the feature extraction, and further, the embodiment of the present invention performs feature fusion on the texture feature, the pixel change feature, and the three-dimensional structure feature, so as to correlate relationships among different features in the face image, so as to improve the comprehensiveness of face in-vivo detection; furthermore, according to the embodiment of the invention, the living body detection score of the standardized human face is calculated according to the fusion feature so as to be used for converging the recognition result of the features in the human face image into a score value, so that the living body detection result can be conveniently judged according to the numerical value. 
Therefore, the method, the device, the equipment and the medium for detecting the living human face in a motion state scene provided by the embodiments of the invention can enhance the attention paid to low-level information in the face image and to the correlation between features, and improve the comprehensiveness of living human face detection in motion state scenes.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a human face live detection method in a motion state scene according to an embodiment of the present invention;
fig. 2 is a schematic flowchart illustrating a step of the method for detecting a living human face in a motion state scene according to an embodiment of the present invention;
fig. 3 is a schematic flowchart illustrating another step of the method for detecting a living human face in a motion state scene in accordance with an embodiment of the present invention;
fig. 4 is a schematic block diagram of a human face live detection apparatus in a motion state scene according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an internal structure of an electronic device for implementing a face live detection method in a motion state scene according to an embodiment of the present invention.
Detailed Description
It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a face living body detection method in a motion state scene. The execution subject of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the invention. In other words, the method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flow chart of a human face live detection method in a motion state scene according to an embodiment of the present invention. The method for detecting the living human face in the motion state scene depicted in fig. 1 includes:
s1, acquiring a face image in a motion state scene, carrying out face detection on the face image to obtain a detected face, and carrying out format standardization on the detected face to obtain a standardized face.
The embodiment of the invention is used for carrying out living body detection on the face information in the motion state scene by acquiring the face image in the motion state scene.
Further, the embodiment of the invention performs face detection on the face image to extract a face part in the image and ignores information irrelevant to the face part.
In an embodiment of the present invention, the performing face detection on the face image to obtain a detected face includes: identifying a face candidate window in the face image; carrying out window correction on the face candidate window to obtain a corrected candidate window; and determining face key points of the face image based on the correction candidate window, and taking face key parts corresponding to the face key points as the detected face.
Illustratively, the process of recognizing the face candidate window in the face image is realized using the P-Net network structure of the multitask cascaded CNN face detection deep learning model, where the P-Net structure comprises three convolution + max-pooling operations; the process of correcting the face candidate window to obtain the corrected candidate window is realized using the R-Net network structure of the same model, where the R-Net structure comprises three convolution + max-pooling operations, a fully connected layer follows the feature map of the last layer, and fully connected operations are used when connecting the three different tasks; the process of determining the face key points of the face image and taking the face key parts corresponding to the face key points as the detected face is realized using the O-Net network structure of the same model, where the O-Net structure is similar to the R-Net structure and is not described further here.
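For illustration only, a minimal sketch of this detection stage is given below. It assumes the open-source facenet-pytorch implementation of the multitask cascaded CNN; the patent does not prescribe any particular library, so the package choice is an assumption.

```python
# Hedged sketch: assumes the facenet-pytorch package; the patent itself
# does not name a specific MTCNN implementation.
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=False)  # P-Net -> R-Net -> O-Net cascade

img = Image.open("frame.jpg")  # a frame captured in the motion state scene

# boxes: corrected face candidate windows; points: five face key points
boxes, probs, points = mtcnn.detect(img, landmarks=True)
```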
Furthermore, the embodiment of the invention performs format standardization on the detected face to extract an accurate face image while retaining some face background, so as to improve the detection performance of subsequent feature extraction, and fixes the detected face to a standard-format size.
In an embodiment of the present invention, referring to fig. 2, the performing format normalization on the detected face to obtain a normalized face includes:
s201, obtaining face key points corresponding to the detected face;
s202, calculating a similarity transformation matrix between the face key points and preset standard face key points;
and S203, performing standard space transformation on the detected face by using the similarity transformation matrix to obtain the standardized face.
Illustratively, the least squares method is used to solve the spatial transformation matrix between the key points of the current face image and the predefined standard face key points, and the spatial transformation corresponding to that matrix is then applied to the face image, so as to obtain the transformed face image of standard size.
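A minimal sketch of this standardization step, assuming scikit-image for the least-squares similarity estimate and OpenCV for the warp; the five reference coordinates below are illustrative placeholders, not values from the patent.

```python
# Hedged sketch: the standard key-point coordinates are assumed placeholders
# for a 112x112 output face, not values specified by the patent.
import cv2
import numpy as np
from skimage.transform import SimilarityTransform

STANDARD_KEYPOINTS = np.array([
    [38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2]
], dtype=np.float32)

def standardize_face(image, keypoints, size=(112, 112)):
    """Estimate the similarity transform by least squares and warp the face."""
    tform = SimilarityTransform()
    tform.estimate(np.asarray(keypoints, dtype=np.float32), STANDARD_KEYPOINTS)
    return cv2.warpAffine(image, tform.params[:2], size)
```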
S2, carrying out texture coding on the standardized human face to obtain coding textures of the standardized human face, extracting texture information in the standardized human face based on the coding textures, calculating pixel change information in the standardized human face, and constructing three-dimensional structure information of the standardized human face.
The embodiment of the invention performs texture coding on the standardized face to ensure the stability of the face information. Because non-living face information exhibits phenomena such as local highlights and image blur due to poor exposure and noise interference, coding the texture details ensures that this information remains stable in the non-living face information and is easy to use for subsequent feature extraction.
In an embodiment of the present invention, the texture coding on the standardized human face to obtain the coding texture of the standardized human face includes: carrying out region division on the standardized human face to obtain a divided human face region; calculating pixel texture values in the divided face regions by using the following formula:
$$L(x_c, y_c) = \sum_{n=0}^{p-1} s(i_n - i_c)\, 2^n$$

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

wherein L(x_c, y_c) represents the pixel texture value, (x_c, y_c) represents the coordinates of a pixel point in the divided face region, p represents the number of neighborhood pixels of the pixel point at (x_c, y_c) (here p = 8), i_c represents the gray value of the pixel at (x_c, y_c), i_n represents the gray value of the n-th neighborhood pixel point, and s(x) represents the sign function;
carrying out frequency statistics on the pixel texture value to obtain the statistical frequency of the pixel texture value; carrying out normalization processing on the statistical frequency to obtain normalized frequency; and carrying out vector coding on the normalized frequency to obtain the coding texture.
Optionally, the process of performing frequency statistics on the pixel texture values to obtain the statistical frequency refers to calculating, for each pixel point in each divided face region, the frequency with which its pixel texture value appears in that region; the process of vector-coding the normalized frequencies to obtain the coding texture refers to concatenating the frequencies of the pixels in each divided face region into a single vector.
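The texture-coding pipeline above could look roughly as follows in NumPy; the 4x4 region grid and the clockwise 8-neighborhood ordering are illustrative assumptions.

```python
import numpy as np

NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]  # p = 8 neighbors

def lbp_codes(gray):
    """Pixel texture values L(x_c, y_c) via the sign-function comparison."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = gray[1:-1, 1:-1]
    for n, (dy, dx) in enumerate(NEIGHBOR_OFFSETS):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << n
    return codes

def coding_texture(gray, grid=(4, 4)):
    """Per-region frequency statistics, normalization, and vector coding."""
    codes = lbp_codes(np.asarray(gray))   # gray: 2-D uint8 face image
    gh, gw = grid
    h, w = codes.shape
    hists = []
    for i in range(gh):
        for j in range(gw):
            region = codes[i * h // gh:(i + 1) * h // gh,
                           j * w // gw:(j + 1) * w // gw]
            hist = np.bincount(region.ravel(), minlength=256).astype(np.float64)
            hists.append(hist / max(hist.sum(), 1.0))  # normalized frequency
    return np.concatenate(hists)                        # the coding texture vector
```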
Further, the embodiment of the present invention extracts texture information in the standardized human face based on the coding texture, so as to extract feature information of the standardized human face.
Optionally, the process of extracting texture information in the standardized human face based on the coding texture is implemented by performing feature extraction on the coding texture by using a convolutional neural network.
Furthermore, the embodiment of the invention calculates the pixel change information in the standardized face to make up for the deficiency that texture information cannot sufficiently reflect the spatial distribution of gray-level changes within a local window and can hardly embody the inherent variation of the texture, thereby strengthening the correlation between the texture and the pixel change features. The pixel change information refers to the change information of the luminance intensity of pixels.
In an embodiment of the present invention, the calculating pixel variation information in the normalized face includes: calculating pixel intensity difference values in the normalized face using the following formula:
$$\varepsilon(y_c) = \frac{1}{q} \sum_{j=1}^{q} \left| y_j - y_c \right|$$

wherein ε(y_c) represents the pixel intensity difference value, q represents the total number of neighborhood points of a pixel point randomly selected from the standardized face (here q = 8), y_c represents the gray value of the selected pixel point in the standardized face, y_j represents the gray value of the j-th neighborhood pixel point of y_c, and j represents the serial number of the neighborhood pixel point;
calculating a pixel gradient direction value in the standardized human face; performing histogram transformation on the pixel intensity difference value and the pixel gradient direction value to obtain an intensity difference image and a gradient direction image; and carrying out image combination processing on the intensity difference image and the gradient direction image to obtain the pixel change information.
Optionally, the process of performing image merging processing on the intensity difference image and the gradient direction image to obtain the pixel change information refers to a process of converting a two-dimensional array formed by combining the intensity difference image and the gradient direction image into a one-dimensional array.
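A hedged NumPy/OpenCV sketch of this step; the 32-bin histograms and the use of Sobel derivatives for the gradient direction are assumptions for illustration.

```python
import cv2
import numpy as np

def pixel_change_information(gray, bins=32):
    """Intensity-difference and gradient-direction histograms, merged to 1-D."""
    g = gray.astype(np.float32)

    # Mean absolute difference to the 8 neighbors (the intensity difference value)
    diff = np.zeros_like(g)
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        shifted = np.roll(np.roll(g, dy, axis=0), dx, axis=1)
        diff[1:-1, 1:-1] += np.abs(shifted[1:-1, 1:-1] - g[1:-1, 1:-1])
    diff /= 8.0

    # Pixel gradient direction values from Sobel derivatives
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    direction = np.arctan2(gy, gx)

    # Histogram transformation, then merge the two images into one 1-D array
    diff_hist, _ = np.histogram(diff, bins=bins)
    dir_hist, _ = np.histogram(direction, bins=bins, range=(-np.pi, np.pi))
    return np.concatenate([diff_hist, dir_hist]).astype(np.float64)
```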
Further, the embodiment of the invention constructs the three-dimensional structure information of the standardized face to highlight the three-dimensional structure information of a living face, based on the characteristic that non-living faces mainly exist on media with a two-dimensional planar structure, such as paper, the screens of video devices, and photos.
In an embodiment of the present invention, the constructing the three-dimensional structure information of the standardized human face includes: calculating a three-dimensional structure value of the standardized face using the following formula:
$$I(x, y) = \rho(x, y)\, n^T(x, y)\, s$$

wherein I(x, y) represents the three-dimensional structure value of the standardized face, ρ(x, y) represents the albedo of the face image corresponding to the standardized face, n^T(x, y) represents the surface normal of the standardized face, which varies over a 3D shape, and s represents the point light source illuminating the face image; since the screen of a photograph or of a replayed video is a two-dimensional planar structure, n^T(x, y) is a constant for such media;
constructing a three-dimensional structure vector of the three-dimensional structure value; and carrying out image form conversion on the three-dimensional structure vector to obtain the three-dimensional structure information.
Optionally, the process of constructing the three-dimensional structure vector of the three-dimensional structure value is implemented by first converting a face image composed of the three-dimensional structure value into a histogram image, and then performing vector combination on pixel values in the histogram image.
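A hedged sketch of this construction; the albedo map, surface normals and light direction are assumed to come from some upstream estimator, which the patent does not specify.

```python
import numpy as np

def three_dimensional_structure_information(albedo, normals, light, bins=64):
    """I(x, y) = rho(x, y) * n^T(x, y) * s, then histogram and vectorization.

    albedo:  (H, W) array of rho(x, y)           -- assumed given
    normals: (H, W, 3) array of normals n(x, y)  -- assumed given
    light:   (3,) point light source direction s -- assumed given
    """
    structure = albedo * (normals @ light)         # three-dimensional structure values
    hist, _ = np.histogram(structure, bins=bins)   # convert to a histogram image
    return hist.astype(np.float64).ravel()         # the three-dimensional structure vector
```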
And S3, respectively extracting the texture information, the pixel change information and the three-dimensional structure information to obtain texture characteristics, pixel change characteristics and three-dimensional structure characteristics, and performing characteristic fusion on the texture characteristics, the pixel change characteristics and the three-dimensional structure characteristics to obtain fusion characteristics.
The embodiment of the invention performs feature extraction on the texture information, the pixel change information and the three-dimensional structure information respectively, so as to exploit the higher output accuracy of a residual network, improve the accuracy of feature extraction of low-level information in the face image, and ensure that the subsequent feature fusion result is not affected by the extraction accuracy. The residual network is a deep residual network composed of multiple convolutional layers and is mainly used for feature extraction.
In an embodiment of the present invention, the extracting the texture information, the pixel change information, and the three-dimensional structure information to obtain the texture feature, the pixel change feature, and the three-dimensional structure feature is implemented by a residual error network.
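A minimal sketch of such a residual-network feature extractor using torchvision; the choice of ResNet-18, the per-stream backbones, and rendering each information stream as an image-like tensor are assumptions, since the patent only specifies "a residual network".

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Hedged sketch: one ResNet-18 backbone per information stream is an assumption.
def make_extractor():
    backbone = resnet18(weights=None)
    return nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head

texture_net, change_net, structure_net = (make_extractor() for _ in range(3))

def extract(net, image_tensor):
    """image_tensor: (N, 3, H, W) rendering of one information stream."""
    with torch.no_grad():
        return net(image_tensor).flatten(1)  # (N, 512) feature vector
```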
Furthermore, the embodiment of the invention performs feature fusion on the texture feature, the pixel change feature and the three-dimensional structure feature so as to correlate the relationships among different features in the face image and improve the comprehensiveness of face living body detection.
In an embodiment of the present invention, the performing feature fusion on the texture feature, the pixel variation feature, and the three-dimensional structure feature to obtain a fusion feature includes: calculating the texture feature importance, the change feature importance and the three-dimensional feature importance of the texture feature, the pixel change feature and the three-dimensional structure feature by using the following formulas:
$$d_i = q^T f_i$$

wherein d_1 represents the texture feature importance, d_2 represents the change feature importance, d_3 represents the three-dimensional feature importance, d_i represents the importance for i = 1, 2, 3, q represents the query vector, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, and T represents the transpose symbol;
calculating the texture feature weight, the change feature weight and the three-dimensional feature weight of the texture feature, the pixel change feature and the three-dimensional structure feature according to the texture feature importance, the change feature importance and the three-dimensional feature importance by using the following formulas:
$$\omega_i = \frac{\exp(d_i)}{\sum_{k=1}^{3} \exp(d_k)}$$

wherein ω_1 represents the texture feature weight, ω_2 represents the change feature weight, ω_3 represents the three-dimensional feature weight, and d_i represents the importance for i = 1, 2, 3;
according to the texture feature weight, the change feature weight and the three-dimensional feature weight, calculating the fusion feature by using the following formula:
$$v = \sum_{i=1}^{3} \omega_i f_i$$

wherein v represents the fusion feature, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, and ω_1, ω_2, ω_3 represent the texture feature weight, the change feature weight and the three-dimensional feature weight, respectively.
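The attention-weighted fusion described above fits in a few lines of NumPy; in practice the query vector q would be a learned parameter, so the one passed in here is a placeholder.

```python
import numpy as np

def fuse_features(q, f_texture, f_change, f_structure):
    """d_i = q^T f_i, softmax weights omega_i, weighted-sum fusion feature v."""
    feats = np.stack([f_texture, f_change, f_structure])  # (3, D)
    d = feats @ q                                         # importances d_1..d_3
    w = np.exp(d - d.max())                               # numerically stable softmax
    w /= w.sum()                                          # omega_1..omega_3
    return w @ feats                                      # v = sum_i omega_i * f_i
```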
And S4, calculating the living body detection score of the standardized face according to the fusion characteristics, and determining the face living body detection result of the face image by using the living body detection score.
According to the embodiment of the invention, the living body detection score of the standardized face is calculated from the fusion features so as to converge the recognition results of the features in the face image into a single score value, which makes it convenient to judge the living body detection result from the numerical value.
In an embodiment of the present invention, the calculating the live-body detection score of the normalized face according to the fusion feature includes: performing linear fitting on the fusion characteristics by using the following formula to obtain fitting characteristics:
$$Z = b + uv$$

wherein Z represents the fitting feature, v represents the fusion feature, b represents the bias parameter of the living body detection model, and u represents the weight parameter of the living body detection model;
calculating the in-vivo detection score according to the fitted feature by using the following formula:
$$y' = \frac{1}{1 + e^{-Z}}$$

wherein y' represents the living body detection score and Z represents the fitting feature.
In an embodiment of the present invention, referring to fig. 3, the determining the living human face detection result of the human face image by using the living human face detection score includes:
s301, setting a detection score threshold value of the living body detection score;
s302, when the living body detection score is not larger than the detection score threshold value, taking the failure of the living body detection of the face image as the face living body detection result;
and S303, when the living body detection score is larger than the detection score threshold value, successfully detecting the living body of the face image to be used as the face living body detection result.
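Putting the scoring and decision steps together, a hedged sketch follows; the parameters b and u and the 0.5 threshold are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def liveness_result(v, u, b, threshold=0.5):
    """Linear fit Z = b + u.v, sigmoid score y', and threshold decision."""
    z = b + float(np.dot(u, v))          # fitting feature Z
    score = 1.0 / (1.0 + np.exp(-z))     # living body detection score y'
    return ("live" if score > threshold else "spoof"), score
```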
The embodiment of the invention first collects a face image in a motion state scene so as to perform living body detection on the face information in that scene; face detection is then performed on the face image to extract the face part and ignore the information irrelevant to it, and the detected face is format-standardized to extract an accurate face image while retaining some face background, improving the detection performance of subsequent feature extraction and fixing the detected face to a standard-format size. Secondly, texture coding is performed on the standardized face to ensure the stability of the face information: because non-living face information exhibits phenomena such as local highlights and image blur due to poor exposure and noise interference, coding the texture details keeps this information stable in the non-living face information and easy to use for subsequent feature extraction. Further, feature extraction is performed on the texture information, the pixel change information and the three-dimensional structure information respectively, exploiting the higher output accuracy of a residual network to improve the accuracy of feature extraction of low-level information in the face image and to ensure that the subsequent feature fusion result is not affected by the extraction accuracy; feature fusion is then performed on the texture feature, the pixel change feature and the three-dimensional structure feature to correlate the relationships among different features in the face image and improve the comprehensiveness of face living body detection. Finally, the living body detection score of the standardized face is calculated from the fusion features so as to converge the recognition results of the features in the face image into a single score value, making it convenient to judge the living body detection result from that value. Therefore, the living human face detection method in a motion state scene provided by the embodiment of the invention can enhance the attention paid to low-level information in the face image and to the correlation between features, and improve the comprehensiveness of living human face detection in motion state scenes.
Fig. 4 is a functional block diagram of the living human face detection device in a motion state scene according to the invention.
The human face living body detection device 400 in the motion state scene can be installed in an electronic device. According to the realized functions, the living human face detection device in the motion state scene can comprise a human face standardization module 401, a three-dimensional construction module 402, a feature fusion module 403 and a result determination module 404. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the embodiment of the present invention, the functions of the modules/units are as follows:
the face standardization module 401 is configured to collect a face image in a motion state scene, perform face detection on the face image to obtain a detected face, and perform format standardization on the detected face to obtain a standardized face;
the three-dimensional construction module 402 is configured to perform texture coding on the standardized human face to obtain a coding texture of the standardized human face, extract texture information in the standardized human face based on the coding texture, calculate pixel change information in the standardized human face, and construct three-dimensional structure information of the standardized human face;
the feature fusion module 403 is configured to perform feature extraction on the texture information, the pixel change information, and the three-dimensional structure information respectively to obtain a texture feature, a pixel change feature, and a three-dimensional structure feature, and perform feature fusion on the texture feature, the pixel change feature, and the three-dimensional structure feature to obtain a fusion feature;
the result determining module 404 is configured to calculate a living body detection score of the standardized face according to the fusion feature, and determine a living body detection result of the face image by using the living body detection score.
In detail, when the modules in the human face living body detection apparatus 400 in the motion state scene in the embodiment of the present invention are used, the same technical means as the above-mentioned human face living body detection method in the motion state scene described in fig. 1 to fig. 3 are adopted, and the same technical effects can be produced, which is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device for implementing a face live detection method in a motion state scene according to the present invention.
The electronic device may include a processor 50, a memory 51, a communication bus 52, and a communication interface 53, and may further include a computer program stored in the memory 51 and operable on the processor 50, such as a living human face detection program in a motion state scene.
In some embodiments, the processor 50 may be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, and may include one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 50 is the control unit of the electronic device: it connects the various components of the whole electronic device by means of various interfaces and lines, and executes the functions of the electronic device and processes its data by running or executing the programs or modules stored in the memory 51 (for example, the living human face detection program in a motion state scene) and calling the data stored in the memory 51.
The memory 51 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 51 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 51 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device. Further, the memory 51 may also include both an internal storage unit and an external storage device of the electronic device. The memory 51 may be used to store not only application software installed in the electronic device and various types of data, such as codes of a database configuration connection program, but also temporarily store data that has been output or will be output.
The communication bus 52 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 51 and at least one processor 50 or the like.
The communication interface 53 is used for communication between the electronic device 5 and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 5 shows only an electronic device with components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 50 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the embodiments described are for illustrative purposes only and that the scope of the claimed invention is not limited to this configuration.
The living human face detection program in a motion state scene stored in the memory 51 of the electronic device is a combination of computer programs which, when run on the processor 50, can realize:
acquiring a face image in a motion state scene, performing face detection on the face image to obtain a detected face, and performing format standardization on the detected face to obtain a standardized face;
performing texture coding on the standardized face to obtain coding texture of the standardized face, extracting texture information in the standardized face based on the coding texture, calculating pixel change information in the standardized face, and constructing three-dimensional structure information of the standardized face;
respectively extracting the texture information, the pixel change information and the three-dimensional structure information to obtain texture characteristics, pixel change characteristics and three-dimensional structure characteristics, and performing characteristic fusion on the texture characteristics, the pixel change characteristics and the three-dimensional structure characteristics to obtain fusion characteristics;
and calculating the living body detection score of the standardized human face according to the fusion characteristics, and determining the human face living body detection result of the human face image by using the living body detection score.
Specifically, the processor 50 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, if the integrated modules/units of the electronic device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium. The storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a storage medium, which is readable and stores a computer program that, when executed by a processor of an electronic device, can implement:
acquiring a face image in a motion state scene, performing face detection on the face image to obtain a detected face, and performing format standardization on the detected face to obtain a standardized face;
performing texture coding on the standardized face to obtain coding texture of the standardized face, extracting texture information in the standardized face based on the coding texture, calculating pixel change information in the standardized face, and constructing three-dimensional structure information of the standardized face;
respectively extracting the texture information, the pixel change information and the three-dimensional structure information to obtain texture characteristics, pixel change characteristics and three-dimensional structure characteristics, and performing characteristic fusion on the texture characteristics, the pixel change characteristics and the three-dimensional structure characteristics to obtain fusion characteristics;
and calculating the living body detection score of the standardized human face according to the fusion characteristics, and determining the human face living body detection result of the human face image by using the living body detection score.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above description is merely illustrative of particular embodiments of the invention that enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A human face living body detection method under a motion state scene is characterized by comprising the following steps:
acquiring a face image in a motion state scene, performing face detection on the face image to obtain a detected face, and performing format standardization on the detected face to obtain a standardized face;
performing texture coding on the standardized face to obtain coding texture of the standardized face, extracting texture information in the standardized face based on the coding texture, calculating pixel change information in the standardized face, and constructing three-dimensional structure information of the standardized face;
respectively extracting the texture information, the pixel change information and the three-dimensional structure information to obtain texture characteristics, pixel change characteristics and three-dimensional structure characteristics, and performing characteristic fusion on the texture characteristics, the pixel change characteristics and the three-dimensional structure characteristics to obtain fusion characteristics;
and calculating the living body detection score of the standardized human face according to the fusion characteristics, and determining the human face living body detection result of the human face image by using the living body detection score.
2. The method of claim 1, wherein the format normalizing the detected face to obtain a normalized face comprises:
acquiring a face key point corresponding to the detected face;
calculating a similarity transformation matrix between the face key points and preset standard face key points;
and carrying out standard space transformation on the detected face by using the similarity transformation matrix to obtain the standardized face.
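As a rough illustration of this claim, the sketch below estimates the similarity transformation with OpenCV and warps the detected face into the standard space; the five-point canonical template and the 112×112 output size are assumptions, since the claim does not specify the preset standard face key points.

```python
import cv2
import numpy as np

# Assumed canonical template (eyes, nose tip, mouth corners) for a 112x112
# aligned face; the claim leaves the standard key points unspecified.
STANDARD_5PTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                            [41.5, 92.4], [70.7, 92.2]])

def standardize_face(image, landmarks, size=(112, 112)):
    """Warp the detected face into the standard space via a similarity transform."""
    src = np.float32(landmarks)          # face key points of the detected face
    # estimateAffinePartial2D fits rotation + uniform scale + translation,
    # i.e. the similarity transformation matrix between the two point sets
    matrix, _ = cv2.estimateAffinePartial2D(src, STANDARD_5PTS)
    return cv2.warpAffine(image, matrix, size)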
3. The method of claim 1, wherein said texture coding the standardized face to obtain the coded texture of the standardized face comprises:
carrying out region division on the standardized face to obtain a divided face region;
calculating pixel texture values in the divided face regions using the following formula:
L(x_c, y_c) = Σ_{n=0}^{p−1} s(i_n − i_c) · 2^n

s(x) = 1 if x ≥ 0; s(x) = 0 if x < 0

wherein L(x_c, y_c) represents the pixel texture value, (x_c, y_c) represents the coordinates of a pixel point in the divided face region, p represents the number of neighborhood pixel points of the (x_c, y_c) pixel point, p = 8, i_c represents the gray value of the (x_c, y_c) pixel point, i_n represents the gray value of a neighborhood pixel point, and s(x) represents the sign function;
carrying out frequency statistics on the pixel texture value to obtain the statistical frequency of the pixel texture value;
carrying out normalization processing on the statistical frequency to obtain normalized frequency;
and carrying out vector coding on the normalized frequency to obtain the coding texture.
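The formula above is the classic local binary pattern (LBP). The sketch below implements the full claim 3 chain of region division, LBP values, frequency statistics, normalization, and vector coding; the 4×4 region grid is an assumption the claim leaves open.

```python
import numpy as np

def lbp_texture_code(gray, grid=(4, 4), p=8):
    """Region-wise LBP histograms, normalized and concatenated (coding texture)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.int32), 1, mode='edge')
    center = padded[1:-1, 1:-1]
    # 8-neighborhood offsets around (x_c, y_c), one bit per neighbor
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros_like(center)
    for n, (dy, dx) in enumerate(offsets):
        neighbor = padded[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
        lbp += ((neighbor - center) >= 0).astype(np.int32) << n  # s(i_n - i_c) * 2^n
    features = []
    for rows in np.array_split(lbp, grid[0], axis=0):
        for block in np.array_split(rows, grid[1], axis=1):
            hist, _ = np.histogram(block, bins=2 ** p, range=(0, 2 ** p))
            features.append(hist / max(hist.sum(), 1))  # normalized frequency
    return np.concatenate(features)                     # coded texture vector
```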
4. The method of claim 1, wherein the calculating pixel variation information in the normalized face comprises:
calculating pixel intensity difference values in the normalized face using the following formula:
ε(y_c) = (1/q) Σ_{j=1}^{q} (y_j − y_c)

wherein ε(y_c) represents the pixel intensity difference value, q represents the total number of neighborhood pixel points of a pixel point randomly selected from the standardized face, q = 8, y_c represents the gray value of the randomly selected pixel point in the standardized face, y_j represents the gray value of the j-th neighborhood pixel point of y_c, and j represents the serial number of the neighborhood pixel point;
calculating a pixel gradient direction value in the standardized human face;
performing histogram transformation on the pixel intensity difference value and the pixel gradient direction value to obtain an intensity difference image and a gradient direction image;
and carrying out image merging processing on the intensity difference image and the gradient direction image to obtain the pixel change information.
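A hedged sketch of this claim under stated assumptions: the mean 8-neighborhood intensity difference (matching the reconstruction of ε(y_c) above) is computed with a convolution kernel and the gradient direction with Sobel filters; "histogram transformation" is read as histogram equalization and the merge is done channel-wise, both of which are assumptions.

```python
import cv2
import numpy as np

def pixel_change_info(gray):
    """Intensity-difference and gradient-direction maps, merged channel-wise."""
    g = gray.astype(np.float32)
    # mean difference between each pixel and its 8 neighbors, i.e. epsilon(y_c)
    kernel = np.float32([[1, 1, 1], [1, -8, 1], [1, 1, 1]]) / 8.0
    intensity_diff = cv2.filter2D(g, -1, kernel)
    # per-pixel gradient direction from Sobel derivatives
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    direction = cv2.phase(gx, gy)               # angle in radians, [0, 2*pi)
    # histogram-equalize both maps, then merge into one two-channel image
    diff_img = cv2.equalizeHist(cv2.normalize(intensity_diff, None, 0, 255,
                                              cv2.NORM_MINMAX).astype(np.uint8))
    dir_img = cv2.equalizeHist(cv2.normalize(direction, None, 0, 255,
                                             cv2.NORM_MINMAX).astype(np.uint8))
    return np.dstack([diff_img, dir_img])       # pixel change information
```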
5. The method of claim 1, wherein the constructing the three-dimensional structure information of the standardized face comprises:
calculating a three-dimensional structure value of the standardized face using the following formula:
I(x, y) = ρ(x, y) · n^T(x, y) · s

wherein I(x, y) represents the three-dimensional structure value of the standardized face, ρ(x, y) represents the albedo of the face image corresponding to the standardized face, n^T(x, y) represents the surface normal of the standardized face, the surface normal only appearing in a 3D shape, and s represents the point light source illuminating the face image; since the screen of a photograph or of a replayed video is a two-dimensional planar structure, n^T(x, y) is a constant;
constructing a three-dimensional structure vector of the three-dimensional structure value;
and carrying out image form conversion on the three-dimensional structure vector to obtain the three-dimensional structure information.
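To make the flatness cue concrete, here is a hedged sketch of the Lambertian relation above: with an assumed light direction s and a gradient-based surface-normal proxy (both illustrative stand-ins, since the claim does not state how n or s is obtained), a real 3D face yields a spatially varying n^T(x, y)·s, while a flat photo or replay screen yields a near-constant one.

```python
import numpy as np

def three_dim_structure(gray, light=np.float32([0.2, 0.3, 0.93])):
    """Per-pixel shading n^T(x, y) s and albedo rho(x, y) under I = rho * n^T s."""
    I = gray.astype(np.float32) / 255.0
    # crude surface-normal proxy from image gradients: n ~ (-dI/dx, -dI/dy, 1)
    gy, gx = np.gradient(I)
    n = np.dstack([-gx, -gy, np.ones_like(I)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    shading = n @ light                         # n^T(x, y) s at every pixel
    albedo = I / np.clip(shading, 1e-3, None)   # rho(x, y) = I / (n^T s)
    # a flat replay surface yields near-constant shading across the face
    return np.dstack([albedo, shading])         # three-dimensional structure maps
```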
6. The method according to claim 1, wherein the feature fusing the texture feature, the pixel variation feature and the three-dimensional structure feature to obtain a fused feature comprises:
calculating the texture feature importance, the change feature importance and the three-dimensional feature importance of the texture feature, the pixel change feature and the three-dimensional structure feature by using the following formulas:
d_i = q^T f_i

wherein d_1 represents the texture feature importance, d_2 represents the change feature importance, d_3 represents the three-dimensional feature importance, d_i represents the importance, i = 1, 2, 3, q represents the query vector, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, and T represents the transpose symbol;
calculating the texture feature weight, the change feature weight and the three-dimensional feature weight of the texture feature, the pixel change feature and the three-dimensional structure feature according to the texture feature importance, the change feature importance and the three-dimensional feature importance by using the following formulas:
ω_i = exp(d_i) / Σ_{i=1}^{3} exp(d_i)

wherein ω_1 represents the texture feature weight, ω_2 represents the change feature weight, ω_3 represents the three-dimensional feature weight, d_1 represents the texture feature importance, d_2 represents the change feature importance, d_3 represents the three-dimensional feature importance, and d_i represents the importance, i = 1, 2, 3;
calculating the fusion feature according to the texture feature weight, the change feature weight and the three-dimensional feature weight by using the following formula:
v = Σ_{i=1}^{3} ω_i f_i

wherein v represents the fusion feature, i = 1, 2, 3, f_1 represents the texture feature, f_2 represents the pixel change feature, f_3 represents the three-dimensional structure feature, ω_1 represents the texture feature weight, ω_2 represents the change feature weight, and ω_3 represents the three-dimensional feature weight.
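The three formulas of this claim amount to a small attention step over the three modality features; a direct transcription follows, assuming the three feature vectors share the query vector's dimensionality.

```python
import numpy as np

def fuse_features(f_texture, f_pixel, f_struct, q):
    """d_i = q^T f_i, omega_i = softmax(d)_i, v = sum_i omega_i f_i."""
    feats = np.stack([f_texture, f_pixel, f_struct])  # shape (3, dim)
    d = feats @ q                                     # importances d_1..d_3
    w = np.exp(d - d.max())                           # numerically stable softmax
    w /= w.sum()                                      # weights omega_1..omega_3
    return w @ feats                                  # fusion feature v
```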
7. The method of claim 1, wherein calculating the liveness detection score for the normalized face based on the fused features comprises:
performing linear fitting on the fusion characteristics by using the following formula to obtain fitting characteristics:
Z=b+uv
wherein Z represents the fitting feature, v represents the fusion feature, b represents the bias parameter of the living body detection model, and u represents the weight parameter of the living body detection model;
calculating the in-vivo detection score according to the fitted feature using the following formula:
y′ = 1 / (1 + e^(−Z))

wherein y′ represents the living body detection score, and Z represents the fitting feature.
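Claim 7 transcribes directly to code: a linear fit Z = b + uv followed by a sigmoid. The 0.5 decision threshold used to turn the score into the detection result of claim 1 is an assumption; the claim only defines the score itself.

```python
import numpy as np

def liveness_score(v, u, b, threshold=0.5):
    """Z = b + u.v, then y' = sigmoid(Z); threshold is an assumed cut-off."""
    z = b + u @ v                             # fitting feature Z = b + uv
    score = 1.0 / (1.0 + np.exp(-z))          # living body detection score y'
    return score, bool(score >= threshold)    # (score, live-face decision)
```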
8. A living human face detection device in a motion state scene, the device comprising:
the face standardization module is used for acquiring a face image in a motion state scene, carrying out face detection on the face image to obtain a detected face, and carrying out format standardization on the detected face to obtain a standardized face;
the three-dimensional construction module is used for carrying out texture coding on the standardized human face to obtain coding texture of the standardized human face, extracting texture information in the standardized human face based on the coding texture, calculating pixel change information in the standardized human face and constructing three-dimensional structure information of the standardized human face;
the feature fusion module is used for respectively extracting features of the texture information, the pixel change information and the three-dimensional structure information to obtain texture features, pixel change features and three-dimensional structure features, and performing feature fusion on the texture features, the pixel change features and the three-dimensional structure features to obtain fusion features;
and the result determining module is used for calculating the living body detection score of the standardized human face according to the fusion characteristics and determining the human face living body detection result of the human face image by using the living body detection score.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for detecting a living human face in a motion state scene as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements a method for detecting a living human face in a motion state scene according to any one of claims 1 to 7.
CN202310015734.6A 2023-01-06 2023-01-06 Human face living body detection method, device, equipment and medium in motion state scene Pending CN115909467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310015734.6A CN115909467A (en) 2023-01-06 2023-01-06 Human face living body detection method, device, equipment and medium in motion state scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310015734.6A CN115909467A (en) 2023-01-06 2023-01-06 Human face living body detection method, device, equipment and medium in motion state scene

Publications (1)

Publication Number Publication Date
CN115909467A true CN115909467A (en) 2023-04-04

Family

ID=86488272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310015734.6A Pending CN115909467A (en) 2023-01-06 2023-01-06 Human face living body detection method, device, equipment and medium in motion state scene

Country Status (1)

Country Link
CN (1) CN115909467A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422617A (en) * 2023-10-12 2024-01-19 华能澜沧江水电股份有限公司 Method and system for realizing image stitching of video conference system
CN117422617B (en) * 2023-10-12 2024-04-09 华能澜沧江水电股份有限公司 Method and system for realizing image stitching of video conference system

Similar Documents

Publication Publication Date Title
CN112137591B (en) Target object position detection method, device, equipment and medium based on video stream
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN114758249B (en) Target object monitoring method, device, equipment and medium based on field night environment
CN112419326B (en) Image segmentation data processing method, device, equipment and storage medium
CN113705462A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN112528908A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112507923A (en) Certificate copying detection method and device, electronic equipment and medium
CN115909467A (en) Human face living body detection method, device, equipment and medium in motion state scene
CN115294483A (en) Small target identification method and system for complex scene of power transmission line
CN114241338A (en) Building measuring method, device, equipment and storage medium based on image recognition
CN108648189A (en) Image fuzzy detection method, apparatus, computing device and readable storage medium storing program for executing
CN112906671B (en) Method and device for identifying false face-examination picture, electronic equipment and storage medium
CN114049568A (en) Object shape change detection method, device, equipment and medium based on image comparison
CN112329666A (en) Face recognition method and device, electronic equipment and storage medium
CN115760854A (en) Deep learning-based power equipment defect detection method and device and electronic equipment
CN114627435B (en) Intelligent light adjusting method, device, equipment and medium based on image recognition
CN113610934B (en) Image brightness adjustment method, device, equipment and storage medium
CN114267064A (en) Face recognition method and device, electronic equipment and storage medium
CN113888086A (en) Article signing method, device and equipment based on image recognition and storage medium
CN113792801A (en) Method, device and equipment for detecting dazzling degree of human face and storage medium
CN113140292A (en) Image abnormal area browsing method and device, mobile terminal equipment and storage medium
CN113255456A (en) Non-active living body detection method, device, electronic equipment and storage medium
CN113487630B (en) Matting method, device, equipment and storage medium based on material analysis technology
CN115937145B (en) Skin health visualization method, device and equipment based on big data analysis
CN114359645B (en) Image expansion method, device, equipment and storage medium based on characteristic area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination