WO2020249054A1 - Long-baseline binocular face liveness detection method and system - Google Patents

Long-baseline binocular face liveness detection method and system

Info

Publication number
WO2020249054A1
WO2020249054A1 (PCT/CN2020/095663; CN2020095663W)
Authority
WO
WIPO (PCT)
Prior art keywords
face, face image, image, long, camera
Prior art date
Application number
PCT/CN2020/095663
Other languages
English (en)
French (fr)
Inventor
冀怀远
刘澍
杨现
徐兆坤
许艳茹
Original Assignee
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁云计算有限公司 (Suning Cloud Computing Co., Ltd.)
Priority to CA3147418A priority Critical patent/CA3147418A1/en
Publication of WO2020249054A1 publication Critical patent/WO2020249054A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Definitions

  • The invention relates to the technical field of face recognition, and in particular to a long-baseline binocular face liveness detection method and system.
  • Liveness detection methods using ordinary cameras fall roughly into three categories.
  • The first is purely software-based image liveness detection, which judges liveness from image characteristics such as texture, background, and illumination. Its drawback is that it is sensitive to the surrounding environment, its detection performance is unstable, and its applicability is poor.
  • The second is video liveness detection based on interaction with the user, which judges whether the current face is live by having the user perform a sequence of actions. Its drawback is that the result depends heavily on how well the user performs the actions; the user experience is poor, and it can be defeated by recorded video.
  • The third is liveness detection that relies on additional hardware to collect information.
  • This approach usually uses a short-baseline binocular camera to capture face images, and achieves liveness detection with the extra information obtained by the auxiliary camera.
  • Its drawback is that the actual 3D reconstruction of a short-baseline binocular camera is unstable, the computation is relatively complex, and the recognition efficiency is low.
  • In order to solve the problems of the prior art, embodiments of the present invention provide a long-baseline binocular face liveness detection method and system.
  • The technical solution is as follows:
  • A long-baseline binocular face liveness detection method includes:
  • if the auxiliary camera does not capture the second face image, determining that the current face is a non-living face; if the auxiliary camera can capture the second face image, normalizing the first face image and the second face image to preset pixel sizes;
  • judging whether the liveness detection score meets the preset score standard; if it does, the current face is determined to be a living face, and if it does not, a non-living face.
  • There are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera.
  • The step of feeding the normalized first face image and second face image through a neural network model to obtain a liveness detection score includes:
  • obtaining the liveness detection score from the fused features.
  • The image quality features include facial sharpness, noise, lighting performance, and spectral features;
  • the frame structure features include line structure features and texture features of the image.
  • The step of performing weighted fusion of the image quality features and the frame structure features to obtain the fused features includes:
  • multiplying the image quality features and the frame structure features by their respective learnable parameters, the learnable parameters being obtained by training the neural network model on living face samples.
  • The neural network model is a twin deep neural network model, which comprises two feature extractors and one fully connected classifier.
  • A long-baseline binocular face liveness detection system includes an image acquisition device and a detection system.
  • The image acquisition device includes:
  • a main camera, located at one end of the long baseline, facing the face to be detected, for capturing the first face image;
  • an auxiliary camera, located at the other end of the long baseline, for capturing the second face image.
  • The detection system includes:
  • a face detection module, for detecting whether the main camera has captured the first face image and whether the auxiliary camera has captured the second face image, and for judging whether the size of the first face image meets a preset size standard;
  • a face image processing module, configured to normalize the first face image and the second face image to preset pixel sizes;
  • a face liveness discrimination module, which contains a neural network model, for processing the normalized first and second face images to obtain a liveness detection score and judging whether the score meets the preset score standard; if it does, the current face is determined to be a living face, and if it does not, a non-living face.
  • There are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera.
  • The main camera includes a camera and a filter that blocks non-visible light;
  • the auxiliary camera is any one or more of an infrared camera, a wide-angle camera, and a visible-light camera.
  • The face image processing module includes a twin deep neural network model, which comprises two feature extractors and one fully connected classifier.
  • The present invention uses a long-baseline binocular camera to capture face images and a twin neural network model to extract image features and obtain a liveness detection score, enabling accurate and efficient detection and recognition of living face images and overcoming the defects of existing face recognition technology: unstable recognition, high hardware requirements, and heavy image-processing computation.
  • The long-baseline binocular camera disclosed in the present invention can include main and auxiliary camera devices that capture images simultaneously.
  • Ordinary non-living faces can be rejected immediately, and harder cases can be identified quickly after brief processing by the neural network model, so recognition efficiency is high.
  • The present invention simultaneously extracts, from the main and auxiliary cameras, image quality features that are sensitive to the imaging material and highly discriminative, together with frame structure features that are not easily disturbed by noise, ambient lighting, and similar factors, as the features for recognizing non-living face images; it thus combines the high accuracy provided by the image quality features with the high robustness provided by the frame structure features.
  • Fig. 1 is a flowchart of a long-baseline binocular face liveness detection method provided by an embodiment of the present invention;
  • Fig. 2 is a schematic diagram of the arrangement of a main camera and an auxiliary camera provided by an embodiment of the present invention;
  • Fig. 3 is a schematic diagram of the modules of a long-baseline binocular face liveness detection system according to an embodiment of the present invention.
  • The embodiments of the present invention disclose a long-baseline binocular face liveness detection method and system.
  • The specific technical solutions are as follows.
  • A long-baseline binocular face liveness detection method includes:
  • if the auxiliary camera does not capture the second face image, determining that the current face is a non-living face; if the auxiliary camera can capture the second face image, normalizing the first face image and the second face image to preset pixel sizes;
  • judging whether the liveness detection score meets the preset score standard; if it does, the current face is determined to be a living face, and if it does not, a non-living face.
  • The baseline in the above method refers to the straight-line distance between the cameras, and a long baseline is simply one that is longer than a short baseline.
  • The main camera is used to capture the first face image from the front, so the first face image is a frontal image of the face.
  • The preset size standard is related to the distance between the face and the main camera: when setting it, the distance between the face and the main camera can be specified in advance and the size standard set accordingly.
  • When the main camera captures the first face image, the user can be prompted to stand at a designated position, or to place the face inside a prompt box on the display screen, so that the size of the first face image can be measured.
  • If the size of the first face image is smaller than the preset size standard, the current face is determined to be a non-living face; if it meets the standard, the next judgment is performed.
  • If the currently detected face is a living face, the auxiliary camera at the other end of the long baseline can capture a partial face image.
  • If the face is non-living, the face image is flat and the straight-line distance between the auxiliary camera and the main camera is long, so the auxiliary camera usually cannot capture a partial face image.
  • On this principle, it is judged whether the auxiliary camera can capture the second face image; if it cannot, the currently detected face image is determined to be a non-living face image, where the second face image is usually a partial face image.
  • Normalization refers to applying a series of standard processing transformations to an image to convert it into a fixed standard form.
  • The first face image is preferably normalized to 128*128 pixels,
  • and the second face image to 64*64 pixels.
  • The normalized first and second face images are fed into the neural network model for feature extraction and processing, yielding the liveness detection score.
  • The liveness detection score is affected by the image quality of the first face image and the frame structure features of the second face image.
  • The liveness detection score is then compared with the preset score standard.
  • The preset score standard is obtained by training the neural network model on a large number of living face images as samples.
  • The standard is usually a threshold: if the liveness detection score falls within the threshold, the currently detected face is a living face; if it does not, the face is non-living.
  • Fig. 2 shows a possible arrangement of the main camera and the auxiliary camera.
  • Main camera 1 and auxiliary camera 2 lie in the same plane, which ensures that the perpendicular distances from the main and auxiliary cameras to the face are equal.
  • The line between the main camera and the auxiliary camera is the long baseline 3.
  • If the currently detected face is a living face, each auxiliary camera can capture a second face image; therefore, if one or more of several auxiliary cameras fails to capture a second face image, the current face image can be directly determined to be non-living.
  • The step of feeding the normalized first and second face images through the neural network model to obtain the liveness detection score includes:
  • obtaining the liveness detection score from the fused features.
  • The image quality features are image features extracted from the first face image; since the first face image is a frontal image of the face, the image quality of the first face image needs to be measured.
  • The image quality features include the sharpness of the face in the picture, the noise level of the image, the lighting performance, and spectral features, and may also include wavelet features and the like. If the currently detected face image is non-living, the material of the non-living face differs greatly from the skin of a living face, and these differences inevitably make the captured non-living face image differ from a living face image in many respects, such as texture sharpness, noise content, lighting performance, and spectral behavior, all of which reflect the material of the photographed subject.
  • Non-living faces are usually electronic or paper photographs.
  • The imaging sharpness of these two materials is necessarily lower than that of a real face, their noise content is higher, and reflections and moiré patterns appear. It can therefore be determined whether a face picture is a living face picture by detecting the above image quality features.
  • The method disclosed in the embodiments of the present invention combines the first face image and the second face image to judge whether the detected face image is a living face image.
  • As for the frame structure features of the second face image: a non-living face image may contain the frame of a picture, or the boundary of the face may not blend with the surrounding background as naturally as in a living face image. The frame structure features therefore reflect how well the image boundary merges with the background, and include line structure features such as texture lines and object boundaries, as well as texture structure features in the image.
  • The two kinds of features are weighted and fused to obtain the fused features.
  • Specifically, each is multiplied by its own learnable parameter.
  • The learnable parameters are the weight values of the two kinds of features, obtained by training the neural network model on living face samples.
  • The neural network model in the above method is a twin deep neural network model.
  • The twin deep neural network model contains two feature extractors and one fully connected classifier.
  • The feature extractors can be those of an existing neural network model. Taking the ResNet-50 model as an example, the feature extractor adopts the input and feature extraction layer structure of ResNet-50, and the fully connected classifier, placed after the feature extractor, comprises in turn an Average Pooling layer, an FC fully connected layer, and a Softmax layer.
  • ResNet-50 is a deep neural network model that uses "shortcut connections", which improve processing efficiency.
  • The feature extraction structure of ResNet-50 consists of one 7x7 convolutional layer, one 3x3 max-pool layer, and 16 residual blocks; each residual block consists of three convolutional layers, with a 1x1 convolutional layer before and after and a 3x3 convolutional layer in the middle.
  • The whole feature extraction structure thus contains 49 convolutional layers. After the data enter the feature extractor, they first pass through the 7x7 convolutional layer and the 3x3 max-pool layer, then through the 16 residual blocks in turn, finally yielding the extracted feature map.
  • The twin deep neural network model disclosed in the embodiments of the present invention improves on the structure of ResNet-50 and suits the need, in the technical solution of the present invention, to extract features from the first and second face images separately; it can process the two face images at the same time.
  • The embodiments of the present invention also disclose a long-baseline binocular face liveness detection system, which includes an image acquisition device and a detection system.
  • The image acquisition device includes: a main camera, located at one end of the long baseline, facing the face to be detected, for capturing the first face image; and an auxiliary camera, located at the other end of the long baseline, for capturing the second face image.
  • The detection system includes: a face detection module, for detecting whether the main camera has captured the first face image and whether the auxiliary camera has captured the second face image, and for judging whether the size of the first face image meets a preset size standard; a face image processing module, for normalizing the first and second face images to preset pixel sizes; and a face liveness discrimination module, which contains a neural network model, for processing the normalized first and second face images to obtain a liveness detection score and judging whether the score meets the preset score standard: if it does, the current face is determined to be a living face, and if it does not, a non-living face.
  • There are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera.
  • The main camera includes a camera and a filter that blocks non-visible light;
  • the auxiliary camera is any one or more of an infrared camera, a wide-angle camera, and a visible-light camera.
  • The face liveness discrimination module is specifically used to extract, with the neural network model, the image quality features of the first face image and the frame structure features of the second face image, and to reduce the first and second face images to the same dimension; the image quality features and the frame structure features are weighted and fused into fused features, and the liveness detection score is obtained from the fused features.
  • The image quality features include facial sharpness, noise, lighting performance, and spectral features; the frame structure features include line structure features and texture features of the image.
  • The aforementioned neural network model is a twin deep neural network model comprising two feature extractors and one fully connected classifier.
  • The feature extractor adopts the input and feature extraction layer structure of the ResNet-50 model,
  • and the fully connected classifier, placed after the feature extractor, comprises in turn the Average Pooling layer, the FC fully connected layer, and the Softmax layer of the ResNet-50 model.
  • The Average Pooling layer is used to reduce the dimensionality of the fused features,
  • and the FC fully connected layer and the Softmax layer are used to obtain the face liveness detection score.
  • The present invention uses a long-baseline binocular camera to capture face images and a twin neural network model to extract image features and obtain a liveness detection score, enabling accurate and efficient detection and recognition of living face images and overcoming the defects of existing face recognition technology: unstable recognition, high hardware requirements, and heavy image-processing computation.
  • The long-baseline binocular camera disclosed in the present invention can include main and auxiliary camera devices that capture images simultaneously.
  • Ordinary non-living faces can be rejected immediately, and harder cases can be identified quickly after brief processing by the neural network model, so recognition efficiency is high.
  • The present invention simultaneously extracts, from the main and auxiliary cameras, image quality features that are sensitive to the imaging material and highly discriminative, together with frame structure features that are not easily disturbed by noise, ambient lighting, and similar factors, as the features for recognizing non-living face images; it thus combines the high accuracy provided by the image quality features with the high robustness provided by the frame structure features.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The present invention discloses a long-baseline binocular face liveness detection method and system, comprising: capturing a first face image from the front with a main camera and checking whether the size of the first face image meets a preset size standard; if it does, judging whether an auxiliary camera can capture a second face image; if not, determining that the current face is a non-living face; if it is captured, normalizing the first face image and the second face image to preset pixel sizes respectively; feeding the normalized face images through a neural network model to obtain a liveness detection score; and judging whether the liveness detection score meets a preset score standard, determining the current face to be a living face if it does and a non-living face if it does not. The present invention can detect and recognize living face images accurately and efficiently, overcoming the defects of unstable recognition, high hardware requirements, and heavy image-processing computation in existing face recognition technology.

Description

Long-baseline binocular face liveness detection method and system
Technical Field
The present invention relates to the technical field of face recognition, and in particular to a long-baseline binocular face liveness detection method and system.
Background Art
With the continuous development of human identity verification technology and intelligent image detection and recognition technology, face recognition has matured. At the same time, non-living spoofing attacks on face recognition and verification systems have emerged in endless variety, posing a great threat to the reliability and security of such systems. Living face detection is a practical way to rule out non-living spoofing attacks and safeguard the security of face recognition and verification systems.
At present, liveness detection methods using ordinary cameras fall roughly into three categories. The first is purely software-based image liveness detection, which judges liveness from image characteristics such as texture, background, and illumination; its defect is that it is sensitive to the surrounding environment, its detection performance is unstable, and its applicability is poor. The second is video liveness detection based on interaction with the user, which judges whether the current face is live by having the user perform a sequence of actions; its defect is that the result depends heavily on how well the user performs the actions, the user experience is poor, and it can be defeated by recorded video. The third is liveness detection based on additional hardware, which usually uses a short-baseline binocular camera to capture face images and uses the extra information from the auxiliary camera for liveness detection; its defect is that the actual 3D reconstruction of a short-baseline binocular camera is unstable, the computation is relatively complex, and the recognition efficiency is low.
Summary of the Invention
In order to solve the problems of the prior art, embodiments of the present invention provide a long-baseline binocular face liveness detection method and system. The technical solution is as follows:
In one aspect, a long-baseline binocular face liveness detection method is provided, the method comprising:
capturing a first face image from the front with a main camera located at one end of a long baseline, and checking whether the size of the first face image meets a preset size standard;
if the size of the first face image meets the preset size standard, judging whether an auxiliary camera located at the other end of the long baseline can capture a second face image;
if the auxiliary camera does not capture the second face image, determining that the current face is a non-living face; if the auxiliary camera can capture the second face image, normalizing the first face image and the second face image to preset pixel sizes respectively;
feeding the normalized first face image and second face image through a neural network model to obtain a liveness detection score;
judging whether the liveness detection score meets a preset score standard; if it does, determining that the current face is a living face, or if it does not, determining that the current face is a non-living face.
Further, there are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera.
Further, the step of feeding the normalized first face image and second face image through the neural network model to obtain the liveness detection score comprises:
extracting image quality features of the first face image and frame structure features of the second face image, and reducing the first face image and the second face image to the same dimension;
performing weighted fusion of the image quality features and the frame structure features to obtain fused features;
obtaining the liveness detection score from the fused features.
Further, the image quality features include facial sharpness, noise, lighting performance, and spectral features; the frame structure features include line structure features and texture features of the image.
Further, the step of performing weighted fusion of the image quality features and the frame structure features to obtain the fused features comprises:
multiplying the image quality features and the frame structure features by their respective learnable parameters, the learnable parameters being obtained by training the neural network model on living face samples.
Further, the neural network model is a twin deep neural network model, and the twin deep neural network model comprises two feature extractors and one fully connected classifier.
In another aspect, a long-baseline binocular face liveness detection system is provided, the system comprising an image acquisition device and a detection system;
wherein the image acquisition device comprises:
a main camera, located at one end of the long baseline, facing the face to be detected, for capturing a first face image;
an auxiliary camera, located at the other end of the long baseline, for capturing a second face image;
the detection system comprises:
a face detection module, for detecting whether the main camera has captured the first face image and whether the auxiliary camera has captured the second face image, and for judging whether the size of the first face image meets a preset size standard;
a face image processing module, for normalizing the first face image and the second face image to preset pixel sizes respectively;
a face liveness discrimination module, containing a neural network model, for processing the normalized first face image and second face image to obtain a liveness detection score and judging whether the liveness detection score meets a preset score standard; if it does, the current face is determined to be a living face, or if it does not, a non-living face.
Further, there are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera.
Further, the main camera comprises a camera and a filter that blocks non-visible light; the auxiliary camera is any one or more of an infrared camera, a wide-angle camera, and a visible-light camera.
Further, the face image processing module comprises a twin deep neural network model, the twin deep neural network model comprising two feature extractors and one fully connected classifier.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
1. The present invention uses a long-baseline binocular camera to capture face images and a twin neural network model to extract image features and obtain a liveness detection score, enabling accurate and efficient detection and recognition of living face images and overcoming the defects of existing face recognition technology: unstable recognition, high hardware requirements, and heavy image-processing computation.
2. The long-baseline binocular camera disclosed in the present invention can include main and auxiliary camera devices that capture images simultaneously; ordinary non-living faces can be rejected immediately, while harder cases can be identified quickly after brief processing by the neural network model, so recognition efficiency is high.
3. The present invention simultaneously extracts, from the main and auxiliary cameras, image quality features that are sensitive to the imaging material and highly discriminative, together with frame structure features that are not easily disturbed by noise, ambient lighting, and similar factors, as the features for recognizing non-living face images; it thus combines the high accuracy provided by the image quality features with the high robustness provided by the frame structure features.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a long-baseline binocular face liveness detection method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the arrangement of a main camera and an auxiliary camera provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the modules of a long-baseline binocular face liveness detection system provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Since existing living face detection methods all suffer from unstable detection and relatively complex computation, the embodiments of the present invention disclose a long-baseline binocular face liveness detection method and system; the specific technical solutions are as follows.
As shown in Fig. 1, a long-baseline binocular face liveness detection method comprises:
capturing a first face image from the front with a main camera located at one end of a long baseline, and checking whether the size of the first face image meets a preset size standard;
if the size of the first face image meets the preset size standard, judging whether an auxiliary camera located at the other end of the long baseline can capture a second face image;
if the auxiliary camera does not capture the second face image, determining that the current face is a non-living face; if the auxiliary camera can capture the second face image, normalizing the first face image and the second face image to preset pixel sizes respectively;
feeding the normalized first face image and second face image through a neural network model to obtain a liveness detection score;
judging whether the liveness detection score meets a preset score standard; if it does, determining that the current face is a living face, or if it does not, determining that the current face is a non-living face.
It should be noted that in the above method the baseline refers to the straight-line distance between the cameras, and a long baseline is one that is longer than a short baseline. The main camera is mainly used to capture the first face image from the front, so the first face image is a frontal image of the face. The preset size standard is related to the distance between the face and the main camera; when setting it, the distance between the face and the main camera can be specified in advance and the size standard set accordingly. When the main camera captures the first face image, the user can be prompted to stand at a designated position, or to place the face inside a prompt box on the display screen, so that the size of the first face image can be measured. If the size of the first face image is smaller than the preset size standard, the current face is determined to be a non-living face; if it meets the standard, the next judgment is performed. In general, if the currently detected face is a living face, the auxiliary camera at the other end of the long baseline can capture a partial face image; if it is a non-living face, then because the non-living face image is flat and the straight-line distance between the auxiliary camera and the main camera is long, the auxiliary camera usually cannot capture a partial face image. On this principle, provided the size of the first face image meets the preset size standard, it is judged whether the auxiliary camera can capture the second face image; if it cannot, the currently detected face image is determined to be a non-living face image, where the second face image is usually a partial face image.
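The early-exit logic of the paragraph above can be sketched in a few lines of Python. The function name, the tuple-based size check, and the example thresholds are illustrative assumptions, not values specified by the patent:

```python
def liveness_precheck(first_img_size, min_size, second_img_captured):
    """Early-exit checks that run before the neural network.

    first_img_size: (width, height) of the face box from the main camera.
    min_size: preset size standard (width, height) for the specified distance.
    second_img_captured: whether any auxiliary camera returned a face image.
    Returns "non-living" for an early rejection, or "continue" to proceed
    to normalization and the neural-network score.
    """
    w, h = first_img_size
    min_w, min_h = min_size
    if w < min_w or h < min_h:     # face smaller than the size standard: reject
        return "non-living"
    if not second_img_captured:    # flat spoof: auxiliary camera sees no face
        return "non-living"
    return "continue"              # hand over to the twin network

print(liveness_precheck((200, 220), (180, 180), True))   # continue
print(liveness_precheck((200, 220), (180, 180), False))  # non-living
```

The cascade rejects obvious spoofs cheaply, which matches the patent's claim that "ordinary non-living faces can be recognized immediately".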
If the auxiliary camera does capture the second face image, the image features of the first face image and the second face image must be combined to judge whether the currently detected face image is a living face image. In the above method, normalization refers to applying a series of standard processing transformations to an image to convert it into a fixed standard form. In the embodiments of the present invention, the first face image is preferably normalized to 128*128 pixels and the second face image to 64*64 pixels. The normalized first and second face images are fed into the neural network model for feature extraction and processing to obtain the liveness detection score. The liveness detection score is affected by the image quality of the first face image and the frame structure features of the second face image. Finally, the liveness detection score is compared with the preset score standard, which is obtained by training the neural network model on a large number of living face images as samples; the standard is usually a threshold. If the liveness detection score falls within the threshold, the currently detected face is a living face; if it does not, the currently detected face is a non-living face.
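A minimal stand-in for the normalization step, assuming grayscale images stored as lists of rows. A real implementation would use a library resize with proper interpolation; the nearest-neighbour sampling here is only for illustration of mapping arbitrary inputs to the fixed 128*128 and 64*64 sizes:

```python
def normalize(img, out_w, out_h):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows)
    to a fixed pixel size: a minimal stand-in for the normalization step."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]

first = [[i + j for j in range(256)] for i in range(256)]   # mock main image
second = [[i * j for j in range(100)] for i in range(80)]   # mock auxiliary image

first_n = normalize(first, 128, 128)     # main image -> 128x128
second_n = normalize(second, 64, 64)     # auxiliary image -> 64x64
print(len(first_n), len(first_n[0]))     # 128 128
print(len(second_n), len(second_n[0]))   # 64 64
```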
Fig. 2 shows a possible arrangement of the main camera and the auxiliary camera. Main camera 1 and auxiliary camera 2 lie in the same plane, which ensures that the perpendicular distances from the main and auxiliary cameras to the face are equal. The line between the main camera and the auxiliary camera is the long baseline 3. There may be a single auxiliary camera placed at any one of the positions above, below, to the left, or to the right of the main camera, or several auxiliary cameras placed at any several of those positions.
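The benefit of the long baseline can be illustrated with elementary geometry. The baseline lengths and working distance below are made-up numbers, not values from the patent; the point is only that a longer baseline gives the auxiliary camera a much more oblique view of a flat photo held frontal to the main camera:

```python
import math

def viewing_angle_offset(baseline_m, distance_m):
    """Angle (degrees) between the main and auxiliary cameras' lines of
    sight to the subject, for coplanar cameras facing the face. A flat
    photo aimed at the main camera is seen at this oblique angle by the
    auxiliary camera; the longer the baseline, the stronger the
    foreshortening, so the auxiliary camera fails to see a face."""
    return math.degrees(math.atan2(baseline_m, distance_m))

short = viewing_angle_offset(0.06, 0.5)   # hypothetical 6 cm short baseline
long_ = viewing_angle_offset(0.40, 0.5)   # hypothetical 40 cm long baseline
print(round(short, 1), round(long_, 1))   # 6.8 38.7
```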
It should be noted that if there are multiple auxiliary cameras, the images they capture are partial face images taken from multiple angles. If the currently detected face is a living face, every auxiliary camera can capture a second face image. Therefore, if one or more of the auxiliary cameras fails to capture a second face image, the current face image can be directly determined to be a non-living face image.
Specifically, in the above method the step of feeding the normalized first face image and second face image through the neural network model to obtain the liveness detection score comprises:
extracting image quality features of the first face image and frame structure features of the second face image, and reducing the first face image and the second face image to the same dimension;
performing weighted fusion of the image quality features and the frame structure features to obtain fused features;
obtaining the liveness detection score from the fused features.
It should be noted that the image quality features are image features extracted from the first face image; since the first face image is a frontal image of the face, the image quality of the first face image needs to be measured. The image quality features include the sharpness of the face in the picture, the noise level of the image, the lighting performance, and spectral features, and may also include wavelet features and the like. If the currently detected face image is a non-living face image, then because the material of a non-living face differs greatly from the skin of a living face, these differences will inevitably cause the captured non-living face image to differ from a living face image in many respects, such as the sharpness of the facial texture, the noise content, the lighting performance, and the spectral behavior, all of which reflect the material of the photographed subject. Moreover, non-living faces are usually electronic or paper photographs; the imaging sharpness of these two materials is necessarily lower than that of a real face, their noise content is higher, and reflections and moiré patterns appear. Therefore, by detecting the above image quality features it can be judged whether a face picture is a living face picture.
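Among the listed image quality cues, facial sharpness is often approximated in practice by the variance of a Laplacian response. The patent leaves feature extraction to the neural network, so the following hand-crafted metric is only an illustration of why re-photographed (blurred) material tends to score low:

```python
def laplacian_variance(img):
    """Variance of a 4-neighbour Laplacian response over a 2-D grayscale
    image (list of rows). Low values indicate blur, as typically seen
    when a screen or paper photo is re-photographed."""
    h, w = len(img), len(img[0])
    resp = [4 * img[y][x] - img[y-1][x] - img[y+1][x]
            - img[y][x-1] - img[y][x+1]
            for y in range(1, h - 1) for x in range(1, w - 1)]
    mean = sum(resp) / len(resp)
    return sum((r - mean) ** 2 for r in resp) / len(resp)

sharp = [[(x % 2) * 255 for x in range(8)] for _ in range(8)]   # high-contrast
blurry = [[128 for _ in range(8)] for _ in range(8)]            # featureless
print(laplacian_variance(sharp) > laplacian_variance(blurry))   # True
```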
To further improve the accuracy of the judgment, the method disclosed in the embodiments of the present invention combines the first face image and the second face image to judge whether the detected face image is a living face image. As for the frame structure features of the second face image: if the currently detected face image is a non-living face image, it may contain the frame of a picture, or the boundary of the face image may not blend with the surrounding background as naturally as a living face image does. The frame structure features therefore reflect how well the image boundary merges with the background, and include line structure features such as texture lines and object boundaries, as well as texture structure features in the image.
After the image quality features and the frame structure features are obtained, the two are weighted and fused to obtain the fused features; specifically, each is multiplied by its own learnable parameter. The learnable parameters are the weight values of the two kinds of features, obtained by training the neural network model on living face samples.
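The weighted fusion step can be sketched as follows. The patent does not spell out how the two weighted feature vectors are combined, so concatenation is assumed here, and the fixed weights stand in for the parameters that would be learned from living-face samples:

```python
def weighted_fuse(quality_feat, frame_feat, w_q, w_f):
    """Weighted fusion of the two feature vectors: each vector is scaled
    by its weight and the results are concatenated. In the patent w_q and
    w_f are learnable parameters; here they are fixed for illustration."""
    return [w_q * v for v in quality_feat] + [w_f * v for v in frame_feat]

fused = weighted_fuse([0.9, 0.2, 0.4], [0.1, 0.8], w_q=0.7, w_f=0.3)
print([round(v, 2) for v in fused])   # [0.63, 0.14, 0.28, 0.03, 0.24]
```

In the actual model the fused vector would then pass through the fully connected classifier to yield the liveness detection score.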
The neural network model in the above method is a twin deep neural network model, which comprises two feature extractors and one fully connected classifier; the feature extractors can be those of an existing neural network model. Taking the ResNet-50 model as an example, the feature extractor adopts the input and feature extraction layer structure of ResNet-50, and the fully connected classifier, placed after the feature extractor, comprises in turn an Average Pooling layer, an FC fully connected layer, and a Softmax layer.
It should be noted that ResNet-50 is a deep neural network model that uses "shortcut connections", which can improve processing efficiency. The feature extraction structure of the ResNet-50 model consists of one 7x7 convolutional layer, one 3x3 max-pool layer, and 16 residual blocks; each residual block consists of three convolutional layers, with a 1x1 convolutional layer before and after and a 3x3 convolutional layer in the middle. The whole feature extraction structure thus contains 49 convolutional layers. After the data enter the feature extractor, they first pass through the 7x7 convolutional layer and the 3x3 max-pool layer, then through the 16 residual blocks in turn, finally yielding the extracted feature map. The twin deep neural network model disclosed in the embodiments of the present invention improves on the structure of ResNet-50 and suits the need, in the technical solution of the present invention, to extract features from the first face image and the second face image separately; it can process the two face images at the same time.
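The layer count stated above can be checked by simple arithmetic: one 7x7 stem convolution plus 16 residual blocks of three convolutions each, while the 3x3 max-pool layer contributes no convolution:

```python
def conv_layer_count(stem_convs=1, residual_blocks=16, convs_per_block=3):
    """Count the convolutional layers in the feature extractor described
    above: one 7x7 stem convolution plus 16 residual blocks of three
    convolutions each (1x1, 3x3, 1x1); the max-pool layer adds none."""
    return stem_convs + residual_blocks * convs_per_block

print(conv_layer_count())   # 49
```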
In another aspect, as shown in Fig. 3, on the basis of the above method the embodiments of the present invention also disclose a long-baseline binocular face liveness detection system, comprising an image acquisition device and a detection system.
The image acquisition device comprises: a main camera, located at one end of the long baseline, facing the face to be detected, for capturing the first face image; and an auxiliary camera, located at the other end of the long baseline, for capturing the second face image.
The detection system comprises: a face detection module, for detecting whether the main camera has captured the first face image and whether the auxiliary camera has captured the second face image, and for judging whether the size of the first face image meets a preset size standard; a face image processing module, for normalizing the first face image and the second face image to preset pixel sizes respectively; and a face liveness discrimination module, containing a neural network model, for processing the normalized first face image and second face image to obtain a liveness detection score and judging whether the liveness detection score meets a preset score standard; if it does, the current face is determined to be a living face, or if it does not, a non-living face.
In the above image acquisition device, there are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left, or to the right of the main camera. The main camera comprises a camera and a filter that blocks non-visible light; the auxiliary camera is any one or more of an infrared camera, a wide-angle camera, and a visible-light camera.
In the above detection system, the face liveness discrimination module is specifically used to extract, with the neural network model, the image quality features of the first face image and the frame structure features of the second face image, reduce the first and second face images to the same dimension, perform weighted fusion of the image quality features and the frame structure features to obtain the fused features, and obtain the liveness detection score from the fused features. The image quality features include facial sharpness, noise, lighting performance, and spectral features; the frame structure features include line structure features and texture features of the image.
The above neural network model is a twin deep neural network model comprising two feature extractors and one fully connected classifier. For example, the feature extractor adopts the input and feature extraction layer structure of the ResNet-50 model, and the fully connected classifier, placed after the feature extractor, comprises in turn the Average Pooling layer, the FC fully connected layer, and the Softmax layer of the ResNet-50 model. The Average Pooling layer is used to reduce the dimensionality of the fused features, and the FC fully connected layer and the Softmax layer are used to obtain the face liveness detection score.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
1. The present invention uses a long-baseline binocular camera to capture face images and a twin neural network model to extract image features and obtain a liveness detection score, enabling accurate and efficient detection and recognition of living face images and overcoming the defects of existing face recognition technology: unstable recognition, high hardware requirements, and heavy image-processing computation.
2. The long-baseline binocular camera disclosed in the present invention can include main and auxiliary camera devices that capture images simultaneously; ordinary non-living faces can be rejected immediately, while harder cases can be identified quickly after brief processing by the neural network model, so recognition efficiency is high.
3. The present invention simultaneously extracts, from the main and auxiliary cameras, image quality features that are sensitive to the imaging material and highly discriminative, together with frame structure features that are not easily disturbed by noise, ambient lighting, and similar factors, as the features for recognizing non-living face images; it thus combines the high accuracy provided by the image quality features with the high robustness provided by the frame structure features.
All of the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present invention, which are not described here one by one. The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within its protection scope.

Claims (10)

  1. A long-baseline binocular face liveness detection method, characterized by comprising:
    capturing a first face image from the front with a main camera located at one end of a long baseline, and checking whether the size of the first face image meets a preset size standard;
    if the size of the first face image meets the preset size standard, judging whether an auxiliary camera located at the other end of the long baseline can capture a second face image;
    if the auxiliary camera does not capture the second face image, determining that the current face is a non-living face; if the auxiliary camera can capture the second face image, normalizing the first face image and the second face image to preset pixel sizes respectively;
    feeding the normalized first face image and second face image through a neural network model to obtain a liveness detection score;
    judging whether the liveness detection score meets a preset score standard; if it does, determining that the current face is a living face, or if it does not, determining that the current face is a non-living face.
  2. The long-baseline binocular face liveness detection method according to claim 1, wherein there are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left of, or to the right of the main camera.
  3. The long-baseline binocular face liveness detection method according to claim 1, wherein the step of processing the normalized first face image and second face image with the neural network model to obtain the liveness detection score comprises:
    extracting image-quality features of the first face image and frame-structure features of the second face image, and reducing the first face image and the second face image to the same dimensionality;
    performing weighted fusion of the image-quality features and the frame-structure features to obtain fused features;
    deriving the liveness detection score from the fused features.
  4. The long-baseline binocular face liveness detection method according to claim 3, wherein the image-quality features include face sharpness, noise, illumination behavior, and spectral features; and the frame-structure features include line-structure features and texture features of the image.
  5. The long-baseline binocular face liveness detection method according to claim 3, wherein the step of performing weighted fusion of the image-quality features and the frame-structure features to obtain the fused features comprises:
    multiplying the image-quality features and the frame-structure features by respective learnable parameters, the learnable parameters being obtained by training the neural network model on live-face samples.
  6. The long-baseline binocular face liveness detection method according to claim 1, wherein the neural network model is a twin deep neural network model comprising two feature extractors and one fully connected classifier.
  7. A long-baseline binocular face liveness detection system built on the basis of the method of any one of claims 1 to 6, characterized in that it comprises an image acquisition apparatus and a detection system;
    wherein the image acquisition apparatus comprises:
    a main camera, located at one end of a long baseline and arranged to directly face the face to be detected, for capturing a first face image;
    an auxiliary camera, located at the other end of the long baseline, for capturing a second face image;
    and the detection system comprises:
    a face detection module, for detecting whether the main camera has captured the first face image and whether the auxiliary camera has captured the second face image, and for judging whether the size of the first face image meets a preset size standard;
    a face image processing module, for normalizing the first face image and the second face image to a preset pixel size, respectively;
    a face liveness discrimination module containing a neural network model, for processing the normalized first face image and second face image with the model to obtain a liveness detection score and judging whether the liveness detection score meets a preset score standard; if the standard is met, the current face is judged to be a live face, and if not, the current face is judged to be a non-live face.
  8. The long-baseline binocular face liveness detection system according to claim 7, wherein there are one or more auxiliary cameras, located in the same plane as the main camera and arranged at any one or more positions above, below, to the left of, or to the right of the main camera.
  9. The long-baseline binocular face liveness detection system according to claim 7, wherein the main camera comprises a camera and an optical filter that blocks non-visible light, and the auxiliary camera is any one or more of an infrared camera, a wide-angle camera, and a visible-light camera.
  10. The long-baseline binocular face liveness detection system according to claim 7, wherein the face image processing module comprises a twin deep neural network model, the twin deep neural network model comprising two feature extractors and one fully connected classifier.
PCT/CN2020/095663 2019-06-12 2020-06-11 Long-baseline binocular face liveness detection method and system WO2020249054A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3147418A CA3147418A1 (en) 2019-06-12 2020-06-11 Living body detection method and system for human face by using two long-baseline cameras

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910505346.X 2019-06-12
CN201910505346.XA CN110363087B (zh) 2019-06-12 2019-06-12 Long-baseline binocular face liveness detection method and system

Publications (1)

Publication Number Publication Date
WO2020249054A1 (zh) 2020-12-17

Family

ID=68215679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095663 WO2020249054A1 (zh) 2019-06-12 2020-06-11 Long-baseline binocular face liveness detection method and system

Country Status (3)

Country Link
CN (1) CN110363087B (zh)
CA (1) CA3147418A1 (zh)
WO (1) WO2020249054A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363087B (zh) 2019-06-12 2022-02-25 Suning Cloud Computing Co., Ltd. Long-baseline binocular face liveness detection method and system
TWI731503B (zh) 2019-12-10 2021-06-21 Wistron Corporation Live facial recognition system and method
CN111126216A (zh) 2019-12-13 2020-05-08 Alipay (Hangzhou) Information Technology Co., Ltd. Risk detection method, apparatus and device
CN112488018A (zh) 2020-12-09 2021-03-12 Xunteng (Guangdong) Technology Co., Ltd. Binocular liveness detection method, apparatus, device and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107590430A (zh) * 2017-07-26 2018-01-16 Baidu Online Network Technology (Beijing) Co., Ltd. Liveness detection method, apparatus, device and storage medium
CN107862299A (zh) * 2017-11-28 2018-03-30 University of Electronic Science and Technology of China Live face detection method based on near-infrared and visible-light binocular cameras
CN108229362A (zh) * 2017-12-27 2018-06-29 Hangzhou Xi'er Technology Co., Ltd. Binocular face recognition liveness detection method based on an access control system
US20180307895A1 (en) * 2014-08-12 2018-10-25 Microsoft Technology Licensing, Llc False face representation identification
CN110363087A (zh) * 2019-06-12 2019-10-22 Suning Cloud Computing Co., Ltd. Long-baseline binocular face liveness detection method and system

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US11023709B2 (en) * 2017-04-28 2021-06-01 ID R&D, Inc. System, method and apparatus for multi-modal biometric authentication and liveness detection
CN107167077B (zh) * 2017-07-07 2021-05-14 BOE Technology Group Co., Ltd. Stereo vision measurement system and stereo vision measurement method
CN109325933B (zh) * 2017-07-28 2022-06-21 Alibaba Group Holding Ltd. Recaptured-image recognition method and apparatus
CN109034102B (zh) * 2018-08-14 2023-06-16 Tencent Technology (Shenzhen) Co., Ltd. Face liveness detection method, apparatus, device and storage medium
CN109359634B (zh) * 2018-12-11 2021-11-16 Xi'an Glasssix Network Technology Co., Ltd. Face liveness detection method based on binocular cameras


Also Published As

Publication number Publication date
CA3147418A1 (en) 2020-12-17
CN110363087B (zh) 2022-02-25
CN110363087A (zh) 2019-10-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20823114

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3147418

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20823114

Country of ref document: EP

Kind code of ref document: A1
