WO2018133791A1 - 一种基于视频分析的活体判别方法、系统及存储介质 - Google Patents

一种基于视频分析的活体判别方法、系统及存储介质 Download PDF

Info

Publication number
WO2018133791A1
WO2018133791A1 · PCT/CN2018/072973
Authority
WO
WIPO (PCT)
Prior art keywords
video
feature information
information
calculation
probability
Prior art date
Application number
PCT/CN2018/072973
Other languages
English (en)
French (fr)
Inventor
赵凌
李季檩
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2018133791A1 publication Critical patent/WO2018133791A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Definitions

  • the present application relates to the field of information processing technologies, and in particular, to a living body discrimination method, system, and storage medium based on video analysis.
  • Living body discrimination technology can be applied in more and more fields, such as face access control, turnstiles (gates), and remote bank account opening. Specifically, in face access control and turnstile applications, the system must verify that the current user is indeed a legitimate user and effectively resist attacks in which an illegal user presents a photo of a legitimate user to pass the system's check.
  • One existing living body discrimination method requires some interaction in the actual application scenario, such as shaking the head or blinking; only when the user performs the correct interaction as prompted can the liveness check be passed. The whole process is cumbersome, and users sometimes do not cooperate with the interaction, resulting in a low pass rate and a poor user experience.
  • Another living body discrimination method, based on binocular vision, reconstructs the subject in the video with a dual camera and checks whether the reconstructed three-dimensional model lies in a plane to decide whether it is a living body. This method requires a dual camera and is computationally expensive, so it is not suitable for living body discrimination on embedded and mobile devices.
  • The embodiments of the present application provide a living body discrimination method, system, and storage medium based on video analysis, which determine whether a video to be analyzed is a live video according to a trained machine learning model.
  • the embodiment of the present application provides a living body discrimination method based on video analysis, including:
  • the server extracts first feature information of the video to be analyzed according to the preset feature extraction model
  • the server calculates a type discrimination parameter value corresponding to the video to be analyzed according to the preset classification model and the first feature information, where the classification model includes calculation information of a type discrimination parameter based on the feature information corresponding to live video and non-live video respectively;
  • the server determines, according to the type discriminating parameter value, whether the video to be analyzed belongs to a live video.
  • An embodiment of the present application provides a living body discrimination system based on video analysis, comprising: a processor and a memory connected to the processor; wherein the memory stores computer instructions executable by the processor, the computer Instructions include:
  • a feature extraction unit configured to extract first feature information of the video to be analyzed according to the preset feature extraction model
  • a parameter value calculation unit configured to calculate, according to the preset classification model and the first feature information, a type discrimination parameter value corresponding to the video to be analyzed, where the classification model includes calculation information of a type discrimination parameter based on the feature information corresponding to live video and non-live video respectively;
  • a type determining unit configured to determine, according to the type discriminating parameter value, whether the video to be analyzed belongs to a live video.
  • Another aspect of the present application provides a storage medium having stored thereon a computer program executable by a processor and implementing the living body discrimination method of any of the above embodiments.
  • In the method of this embodiment, the living body discrimination system based on video analysis obtains the type discrimination parameter value of the video to be analyzed from its first feature information and the preset classification model, and then determines from that value whether the video to be analyzed is a live video. No interaction with the user and no dual camera are required; only a recorded video is needed, and the system determines whether that video is a live video according to the preset machine learning models (including the classification model and the feature extraction model). This simplifies the living body discrimination process and facilitates the application of the method in various fields.
  • FIG. 1 is a schematic diagram of an application scenario of a living body discrimination method based on video analysis according to an embodiment of the present application
  • FIG. 2 is a flowchart of a living body discrimination method based on video analysis according to an embodiment of the present application
  • FIG. 3 is a flowchart of a method for extracting first feature information of a video to be analyzed in an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a feature extraction model and a classification model extracted in an application embodiment of the present application
  • FIG. 5 is a schematic structural diagram of a living body discrimination system based on video analysis according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another living body discrimination system based on video analysis according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • FIG. 1 is an application scenario of a living body discrimination method based on video analysis according to an embodiment of the present application.
  • In this living body discrimination method based on video analysis, when a user passes the face access control 3, the camera 1 captures a video of the user and provides it to the server 2 as the video to be analyzed; the server 2 then verifies whether the current user is a legitimate user, so as to effectively resist illegal users who present photos of legitimate users to pass the check of the access control system 3.
  • the server 2 may extract the first feature information of the video to be analyzed according to the preset feature extraction model, and calculate a type discriminant parameter value corresponding to the to-be-analyzed video according to the preset classification model and the first feature information. And determining, according to the type discriminating parameter value, whether the video to be analyzed belongs to a live video.
  • the classification model includes calculation information of a type discriminant parameter based on feature information corresponding to the live video and the non-living video, respectively.
  • the embodiment of the present application provides a living body discrimination method based on video analysis, which is mainly a method performed by a living body discrimination system based on video analysis.
  • the flowchart is as shown in FIG. 2, and includes:
  • Step 101 Extract the first feature information of the video to be analyzed according to the preset feature extraction model. The first feature information may include temporal feature information and spatial feature information, where the spatial feature information is specifically the pixel feature information of the multi-frame images contained in the video to be analyzed.
  • Step 102 Calculate a type discriminant parameter value corresponding to the video to be analyzed according to the preset classification model and the first feature information, where the classification model includes calculation information of the type discriminant parameter based on the feature information corresponding to the live video and the non-living video respectively.
  • The calculation information here may refer to the mathematical formulas, fixed parameter values, and the like used in the process of computing the type discrimination parameter value with the feature information as input.
  • In this embodiment, the preset feature extraction model and classification model may be obtained by the living body discrimination system based on video analysis by training on multiple video training samples labelled as live video or non-live video, and then stored in the system.
  • Specifically, the feature extraction model may adopt a deep learning network including multiple parameter calculation layers (such as convolution layers and fully connected layers). The data of the feature extraction model stored in the system may include the calculation parameter values of each parameter calculation layer (such as convolution kernel information) and relation information (such as the connection relationship between the parameter calculation layers). The convolution layers convolve the temporal information and pixel information of the multi-frame images contained in the video to obtain the temporal feature information and pixel feature information of the video, and the fully connected layers capture the association relationship between the pieces of feature information produced by the convolution layers.
  • the classification model may be a two-classifier.
  • In one case, the classification model stored in the system may include probability calculation information based on feature information corresponding to live video and non-live video respectively; that is, the calculation information of the type discrimination parameter based on the first feature information is probability calculation information based on the first feature information for live video and non-live video respectively, including a probability calculation formula and fixed parameter values. When step 102 is performed, the first probability that the video to be analyzed is a live video and the second probability that it is a non-live video can then be calculated according to the first feature information and the probability calculation information.
  • The classification model may specifically be a softmax classifier or the like. The softmax classifier takes the first feature information as input and computes, via the softmax function, the first probability that the video to be analyzed is a live video and the second probability that it is a non-live video, where the sum of the first probability and the second probability is 1, as sketched below.
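  • As an illustration of how a two-class softmax output could be computed from the extracted feature vector, the following is a minimal sketch; the weight matrix W and bias b are hypothetical stand-ins for the classifier's fixed parameter values, not values given in the patent:

```python
import numpy as np

def softmax_liveness_probs(features, W, b):
    """Compute [p_live, p_non_live] for one feature vector.

    features: (d,) first feature information of the video to be analyzed
    W: (d, 2) weight matrix, b: (2,) bias -- the classifier's fixed parameters
    """
    logits = features @ W + b          # one logit per class (live, non-live)
    logits = logits - logits.max()     # subtract max for numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum()            # first and second probabilities sum to 1
    return probs
```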
  • In another case, the classification model stored in the system may include distance calculation information between feature information and the feature information of live video and of non-live video respectively; that is, the calculation information of the type discrimination parameter based on the first feature information is such distance calculation information, including a distance calculation formula (for example, a Euclidean distance formula) and the feature information corresponding to live video and non-live video respectively. When step 102 is performed, the first distance between the first feature information and the feature information of live video and the second distance between the first feature information and the feature information of non-live video can then be calculated according to the distance calculation information.
  • the classification model may specifically adopt a Support Vector Machines (SVM) classifier or the like.
  • Step 103 Determine, according to the type discriminating parameter value obtained in step 102 above, whether the video to be analyzed belongs to a live video.
  • In one case, if the type discrimination parameter values calculated in step 102 above are the first probability that the video to be analyzed is a live video and the second probability that it is a non-live video, the video type with the larger of the first probability and the second probability is taken as the video type of the video to be analyzed; for example, if the first probability (live video) is larger, the video to be analyzed is determined to be a live video.
  • In the other case, if the type discrimination parameter values calculated in step 102 above are the first distance between the first feature information and the feature information of live video and the second distance between the first feature information and the feature information of non-live video, the video type corresponding to the smaller of the first distance and the second distance is determined as the type of the video to be analyzed; for example, if the first distance (to the feature information of live video) is smaller, the video to be analyzed is a live video. Both decision rules are sketched below.
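  • A small sketch of the two decision rules described above, assuming the prototype feature vectors for the live and non-live classes come from the stored classification model (the prototype vectors here are illustrative placeholders):

```python
import numpy as np

def decide_by_probability(p_live, p_non_live):
    # The class with the larger probability is taken as the video type.
    return "live" if p_live > p_non_live else "non_live"

def decide_by_distance(features, live_prototype, non_live_prototype):
    # The class whose stored feature information is closer (Euclidean distance) wins.
    d_live = np.linalg.norm(features - live_prototype)
    d_non_live = np.linalg.norm(features - non_live_prototype)
    return "live" if d_live < d_non_live else "non_live"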
  • It can be seen that, in the method of this embodiment, the living body discrimination system based on video analysis obtains the type discrimination parameter value of the video to be analyzed from its first feature information and the preset classification model, and then determines from that value whether the video to be analyzed is a live video. No interaction with the user and no dual camera are required; only a recorded video is needed, and the system determines whether that video is a live video according to the preset machine learning models (including the classification model and the feature extraction model). This simplifies the living body discrimination process and facilitates the application of the method in various fields.
  • Referring to FIG. 3, in a specific embodiment, the living body discrimination system based on video analysis may extract the first feature information in step 101 above through the following steps:
  • Step 201 Divide the video to be analyzed into multiple n-frame sub-videos, with m overlapping frames between two adjacent sub-videos, where n is a natural number greater than m.
  • It can be understood that each video contains multiple frames of images, each frame representing the image at a certain point in time. Among the sub-videos obtained by dividing the video to be analyzed in this embodiment, every two adjacent sub-videos share overlapping frames, so that the sub-videos are correlated with each other (see the sketch below).
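  • A minimal sketch, assuming the video has already been decoded into a list of frames, of splitting it into n-frame sub-videos with an m-frame overlap between adjacent sub-videos (the default values 16 and 8 mirror the application example later in the description):

```python
def split_into_subvideos(frames, n=16, m=8):
    """Split a list of frames into n-frame sub-videos overlapping by m frames."""
    assert n > m, "n must be a natural number greater than m"
    step = n - m
    subvideos = []
    for start in range(0, len(frames) - n + 1, step):
        subvideos.append(frames[start:start + n])
    return subvideos

# e.g. 40 frames with n=16, m=8 -> sub-videos starting at frames 0, 8, 16, 24
```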
  • Step 202 Extract feature information of the multi-segment sub-video according to the feature extraction model.
  • If the feature extraction model includes a convolution layer, a pooling layer, and a fully connected layer, the living body discrimination system based on video analysis may extract the feature information of a sub-video through the following steps:
  • Step A Perform convolution on the temporal information and pixel information of a sub-video through the convolution layer to obtain temporal feature information and pixel feature information of t dimensions. Specifically, multiple convolution kernels are each multiplied with the element values (including time and pixel values) at the corresponding positions of the multi-frame images contained in the sub-video, and the products are summed, yielding the t-dimensional temporal feature information and pixel feature information.
  • Step B Reduce the t-dimensional temporal feature information and pixel feature information through the pooling layer to obtain temporal feature information and pixel feature information of p dimensions, where p is a natural number less than t.
  • Step C Determine, through the fully connected layer, the association relationship between the p-dimensional temporal feature information and pixel feature information; this relationship may be represented by weight values over the temporal feature information and pixel feature information of each dimension. The feature information of the sub-video then includes the p-dimensional temporal feature information and pixel feature information together with this association relationship. A sketch of steps A to C is given below.
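  • To make steps A to C concrete, the following is a hedged PyTorch sketch of extracting a sub-video's feature vector with one 3D convolution layer (step A), one pooling layer (step B), and one fully connected layer (step C); the channel counts, adaptive pooling size, and output dimension are illustrative assumptions, not the patent's actual configuration:

```python
import torch
import torch.nn as nn

class SubVideoFeatureExtractor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Step A: 3D convolution over (time, height, width) of the sub-video
        self.conv = nn.Conv3d(3, 32, kernel_size=3, padding=1)
        # Step B: pooling reduces the dimensionality of the convolution output
        self.pool = nn.AdaptiveAvgPool3d((4, 7, 7))
        # Step C: fully connected layer models the association between the features
        self.fc = nn.Linear(32 * 4 * 7 * 7, feat_dim)

    def forward(self, clip):             # clip: (batch, 3, frames, H, W)
        x = torch.relu(self.conv(clip))
        x = self.pool(x)
        return self.fc(x.flatten(1))     # p-dimensional feature of the sub-video
```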
  • Step 203 Calculate an average value of the feature information of the multi-segment sub-video extracted in step 202 as the first feature information of the video to be analyzed.
  • Further, the preset feature extraction model may be trained as follows: the element values (including time information and pixel information) of the images contained in multiple video training samples are respectively input into a computing network to calculate the corresponding feature information.
  • The computing network includes multiple parameter calculation layers connected in series; each parameter calculation layer obtains a calculation result from its input information and the corresponding calculation parameter values and feeds that result to the next parameter calculation layer. The parameter calculation layers include convolution layers, pooling layers, and fully connected layers.
  • In this process, after the feature information corresponding to one video training sample is obtained, the calculation parameter values of each parameter calculation layer in the computing network are adjusted, and the feature information of the next video training sample is obtained with the adjusted network; this continues until the feature information of the training samples satisfies the convergence condition, at which point the adjusted computing network is taken as the feature extraction model.
  • After the feature extraction model is trained, the living body discrimination system stores the structural information of the computing network and the finally adjusted calculation parameter values of each parameter calculation layer.
  • The computing network may have any structure, and its specific structure is not limited here; the training process described above trains the calculation parameter values of each parameter calculation layer in the computing network. A simplified training-loop sketch is given below.
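  • A simplified sketch, under stated assumptions, of the described training procedure: labelled video training samples are fed through the computing network and the calculation parameter values of every layer are adjusted after each sample until a convergence criterion is met. The optimizer, loss function, and convergence test are assumptions for illustration only:

```python
import torch
import torch.nn as nn

def train_computing_network(network, samples, labels, epochs=10, lr=1e-3):
    """samples: tensors of video element values; labels: 0 = non-live, 1 = live."""
    optimizer = torch.optim.SGD(network.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    prev_loss = float("inf")
    for _ in range(epochs):
        total = 0.0
        for clip, label in zip(samples, labels):
            logits = network(clip.unsqueeze(0))           # forward pass through all layers
            loss = criterion(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()                               # adjust calculation parameter values
            optimizer.step()
            total += loss.item()
        if abs(prev_loss - total) < 1e-4:                 # simple convergence check (assumption)
            break
        prev_loss = total
    return network
```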
  • After the feature extraction model is trained, the living body discrimination system can continue training to obtain the classification model.
  • Specifically, the feature-information-based first probability calculation information for live video is determined according to the second feature information corresponding to the first video training samples (those belonging to live video) among the multiple video training samples, such that the probability obtained with the determined first probability calculation information is greater than 0.5; or the feature-information-based second probability calculation information for non-live video is determined according to the third feature information corresponding to the second video training samples (those belonging to non-live video) among the multiple video training samples, such that the probability obtained with the determined second probability calculation information is greater than 0.5.
  • Any of the probability calculation information (the first probability calculation information or the second probability calculation information) may include information such as a probability calculation formula and a fixed parameter.
  • The following describes the living body discrimination method based on video analysis of this embodiment with a specific application example, which includes two processes, namely an offline training process and an online prediction process, specifically:
  • (1) The offline training process mainly trains on multiple video training samples labelled as live video or non-live video to obtain the feature extraction model and the classification model, and includes a pre-training (train) process and a fine-tuning (fine tune) process.
  • the living body discriminating system trains a plurality of video training samples to obtain calculation information of each parameter calculation layer in the computing network as shown in FIG. 4.
  • The element values (including time information and pixel information) of the images contained in the multiple video training samples are respectively input into the computing network to calculate the corresponding first feature information. The computing network includes multiple parameter calculation layers connected in series; each parameter calculation layer obtains a calculation result from its input information and the corresponding calculation parameter values and feeds it to the next layer. The parameter calculation layers include convolution layers 310, pooling layers 320, and fully connected layers 330.
  • the computing network includes eight three-dimensional (3D) convolution layers 310, five pooling layers 320 and two fully connected layers 330, and a Softmax classifier 340.
  • the Softmax classifier 340 belongs to the classification model, and the convolution layer 310, the pooling layer 320, and the fully connected layer 330 belong to the feature extraction model.
  • Each convolution layer 310 uses 3×3×3 convolution kernels with a convolution stride of 1 in both the spatial and temporal dimensions. Convolution layer 1a 311 has 64 kernels, convolution layer 2a 312 has 128 kernels, convolution layers 3a 313 and 3b 314 each have 256 kernels, and the four convolution layers 4a 315, 4b 316, 5a 317, and 5b 318 each have 512 kernels. Among the pooling layers 320, the kernel size of pooling layer 1 321 is 1×2×2, while the kernel sizes of pooling layer 2 322, pooling layer 3 323, pooling layer 4 324, and pooling layer 5 325 are all 2×2×2. The output dimensions of the fully connected layers 330 are both 4096. A hedged sketch of a network with this layout is given below.
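  • The following is a hedged PyTorch sketch of a computing network with the layer counts and kernel numbers listed above (8 three-dimensional convolution layers, 5 pooling layers, 2 fully connected layers of 4096 dimensions, and a two-class softmax). The input clip size of 3×16×112×112, the ReLU activations, the use of max pooling, and the resulting fc6 input size of 4608 are assumptions not stated in the patent:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3x3 kernel, stride 1 in both the spatial and temporal dimensions
    return nn.Sequential(nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                         nn.ReLU(inplace=True))

class C3DLiveness(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64),                            # conv1a: 64 kernels
            nn.MaxPool3d((1, 2, 2)),                      # pool1: 1x2x2
            conv_block(64, 128),                          # conv2a: 128 kernels
            nn.MaxPool3d((2, 2, 2)),                      # pool2: 2x2x2
            conv_block(128, 256), conv_block(256, 256),   # conv3a/3b: 256 kernels each
            nn.MaxPool3d((2, 2, 2)),                      # pool3
            conv_block(256, 512), conv_block(512, 512),   # conv4a/4b: 512 kernels each
            nn.MaxPool3d((2, 2, 2)),                      # pool4
            conv_block(512, 512), conv_block(512, 512),   # conv5a/5b: 512 kernels each
            nn.MaxPool3d((2, 2, 2)),                      # pool5
        )
        # 4608 = 512 * 1 * 3 * 3 for an assumed 3 x 16 x 112 x 112 input clip
        self.fc6 = nn.Linear(4608, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.classifier = nn.Linear(4096, 2)              # followed by softmax

    def forward(self, clip):                              # clip: (batch, 3, 16, 112, 112)
        x = self.features(clip).flatten(1)
        x = nn.functional.relu(self.fc6(x))
        x = nn.functional.relu(self.fc7(x))
        return self.classifier(x).softmax(dim=1)          # [p_live, p_non_live]
```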
  • During the training process, after the first feature information corresponding to one video training sample is obtained, the server adjusts the calculation parameter values of each parameter calculation layer in the computing network and obtains the first feature information of the next video training sample with the adjusted network, until the first feature information of the training samples satisfies the convergence condition; the adjusted computing network is then taken as the feature extraction model.
  • After the feature extraction model is trained, the living body discrimination system stores the structural information of the computing network and the finally adjusted calculation parameter values of each parameter calculation layer.
  • It should be noted that, in the pre-training process, after the feature information corresponding to the multiple video training samples is extracted, the classifier 340 is used to classify the videos on a general task: the training samples may be divided into multiple types that are not limited to the two types of live video and non-live video. In this way, the pre-training process produces the initial calculation information of each parameter calculation layer in the computing network shown in FIG. 4.
  • In the fine-tuning process, the initial calculation information of each parameter calculation layer obtained in the pre-training process is adjusted so that the classifier 340 is trained only on the two types, live video and non-live video; the final calculation information of each parameter calculation layer obtained through fine-tuning is used as the parameters of the subsequent online prediction process.
  • Practice has shown that the pre-training process yields better initial calculation information, so that the resulting final calculation information performs better in application; that is, determining the video type (live video or non-live video) of the video to be analyzed according to the final calculation information gives better results.
  • It should be noted that the computing network may have any structure and is not limited to the structure shown in FIG. 4; the training process described above trains the calculation parameter values of each parameter calculation layer in the computing network.
  • (2) The online prediction process mainly uses the calculation information of each parameter calculation layer in the fine-tuned computing network to determine whether the video to be analyzed is a live video.
  • Specifically, the living body discrimination system first decomposes the video to be analyzed into multiple 16-frame sub-videos with 8 overlapping frames between adjacent sub-videos. The element values of the 16 frames of each sub-video are then input into the computing network obtained by the above training: the 4096-dimensional feature vector corresponding to each sub-video is obtained through fully connected layer 6, and these feature vectors are averaged at fully connected layer 7 to obtain the feature vector, i.e. the feature information, of the video to be analyzed. Finally, the probabilities that the video to be analyzed is a live video and a non-live video are calculated with the Softmax classifier from this feature information, and the video type with the larger probability is determined as the video type of the video to be analyzed (see the sketch below).
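  • Putting the online prediction steps together, the following is a hedged sketch of the pipeline; `extract_features` and `classify` are hypothetical hooks into the trained network (e.g. its fc6/fc7 stage and its softmax classifier stage), and the frame preprocessing and tensor shapes are illustrative assumptions:

```python
import torch

def predict_liveness(frames, extract_features, classify, n=16, m=8):
    """frames: list of preprocessed frame tensors of shape (3, H, W).

    extract_features / classify: hypothetical callables wrapping the trained
    network's feature stage and its softmax classifier stage.
    """
    step = n - m
    clips = [torch.stack(frames[i:i + n], dim=1)               # (3, n, H, W) sub-video
             for i in range(0, len(frames) - n + 1, step)]
    with torch.no_grad():
        feats = [extract_features(c.unsqueeze(0)) for c in clips]  # 4096-dim vectors
        video_feat = torch.stack(feats).mean(dim=0)             # average over sub-videos
        p_live, p_non_live = classify(video_feat)[0]            # softmax probabilities
    return "live" if p_live > p_non_live else "non_live"
```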
  • the embodiment of the present application further provides a living body discriminating system based on video analysis, and a schematic structural diagram thereof is shown in FIG. 5, which may specifically include:
  • the feature extraction unit 10 is configured to extract first feature information of the video to be analyzed according to the preset feature extraction model
  • The parameter value calculation unit 11 is configured to calculate the type discrimination parameter value corresponding to the video to be analyzed according to the preset classification model and the first feature information extracted by the feature extraction unit 10, where the classification model includes calculation information of a type discrimination parameter based on the feature information corresponding to live video and non-live video respectively;
  • the type determining unit 12 is configured to determine, according to the type discriminating parameter value calculated by the parameter value calculating unit 11, whether the video to be analyzed belongs to a live video.
  • In one case, the parameter value calculation unit 11 is specifically configured to: when the classification model includes feature-information-based probability calculation information corresponding to live video and non-live video respectively, that is, when the calculation information of the feature-information-based type discrimination parameter corresponding to live video and non-live video is such probability calculation information, calculate, according to the first feature information and the probability calculation information, the first probability that the video to be analyzed is a live video and the second probability that it is a non-live video; the type determining unit 12 is specifically configured to determine the video type with the larger of the first probability and the second probability as the type of the video to be analyzed.
  • In the other case, the parameter value calculation unit 11 is specifically configured to: when the data of the classification model includes distance calculation information between feature information and the feature information of live video and of non-live video respectively, that is, when the calculation information of the feature-information-based type discrimination parameter corresponding to live video and non-live video is such distance calculation information, calculate, according to the distance calculation information, the first distance between the first feature information and the feature information of live video and the second distance between the first feature information and the feature information of non-live video; the type determining unit 12 is specifically configured to determine the video type corresponding to the smaller of the first distance and the second distance as the type of the video to be analyzed.
  • It can be seen that, in the system of this embodiment, the parameter value calculation unit 11 obtains the type discrimination parameter value of the video to be analyzed from its first feature information and the preset classification model, and the type determining unit 12 then determines from that value whether the video to be analyzed is a live video. No interaction with the user and no dual camera are required; only a recorded video is needed, and the living body discrimination system based on video analysis determines whether that video is a live video according to the preset machine learning models (including the classification model and the feature extraction model). This simplifies the living body discrimination process and facilitates the application of the method in various fields.
  • Referring to FIG. 6, in a specific embodiment, in addition to the structure shown in FIG. 5, the discrimination system may include an extraction model training unit 13 and a classification model training unit 14, and the feature extraction unit 10 in the system may be implemented by a dividing unit 110, an extracting unit 120, and a determining unit 130, specifically:
  • the dividing unit 110 is configured to divide the video to be analyzed into sub-videos of a plurality of n frames, and between the two adjacent sub-videos, there is an overlapping image of m frames, where n is a natural number greater than m;
  • the extracting unit 120 is configured to extract feature information of the multi-segment sub-video divided by the dividing unit 110 according to the feature extraction model, respectively;
  • the determining unit 130 is configured to calculate an average value of the feature information of the multi-segment sub-video obtained by the extracting unit 120 as the first feature information.
  • The extracting unit 120 is specifically configured to: for each sub-video of the multiple sub-videos, when the feature extraction model includes a convolution layer, a pooling layer, and a fully connected layer, perform convolution on the temporal information and pixel information of the sub-video through the convolution layer to obtain temporal feature information and pixel feature information of t dimensions; reduce the t-dimensional temporal feature information and pixel feature information through the pooling layer to obtain temporal feature information and pixel feature information of p dimensions; and determine, through the fully connected layer, the association relationship between the p-dimensional temporal feature information and pixel feature information, so that the feature information of the sub-video includes the p-dimensional temporal feature information and pixel feature information with this association relationship.
  • The extraction model training unit 13 is configured to input the element values of the images contained in the multiple video training samples into the computing network to calculate the corresponding feature information, where the computing network includes multiple parameter calculation layers connected in series and each parameter calculation layer obtains a calculation result from its input information and the corresponding calculation parameter values and feeds it to the next parameter calculation layer; after the feature information corresponding to one video training sample is obtained, the calculation parameter values of each parameter calculation layer in the computing network are adjusted, and the feature information of the next video training sample is obtained with the adjusted network, until the feature information of the training samples satisfies the convergence condition, the adjusted computing network being the feature extraction model. In this way, when extracting the feature information of any sub-video, the extracting unit 120 included in the feature extraction unit 10 performs extraction according to the feature extraction model trained by the extraction model training unit 13.
  • The classification model training unit 14 is configured to determine the feature-information-based first probability calculation information for live video according to the second feature information corresponding to the first video training samples (those belonging to live video) among the multiple video training samples, such that the probability obtained with the first probability calculation information is greater than 0.5; or to determine the feature-information-based second probability calculation information for non-live video according to the third feature information corresponding to the second video training samples (those belonging to non-live video), such that the probability obtained with the second probability calculation information is greater than 0.5.
  • the parameter value calculation unit 11 calculates the type discrimination parameter value of the video to be analyzed according to the classification model trained by the classification model training unit 14 and the first feature information determined by the determination unit 130 included in the feature extraction unit 10.
  • the embodiment of the present application further provides a terminal device, which is shown in FIG. 7.
  • The terminal device may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 20 (e.g., one or more processors), a memory 21, and one or more storage media 22 (e.g., one or more mass storage devices) storing an application program 221 or data 222.
  • the memory 21 and the storage medium 22 may be short-term storage or persistent storage.
  • the program stored on the storage medium 22 may include one or more modules (not shown), each of which may include a series of instruction operations in the terminal device.
  • central processor 20 may be arranged to communicate with storage medium 22 to perform a series of instruction operations in storage medium 22 on the terminal device.
  • Specifically, the application program 221 stored in the storage medium 22 includes an application for living body discrimination based on video analysis, and this program may include the feature extraction unit 10, the parameter value calculation unit 11, the type determining unit 12, the extraction model training unit 13, and the classification model training unit 14 in the above-described living body discrimination system based on video analysis, which are not described again here.
  • the central processing unit 20 may be arranged to communicate with the storage medium 22 to perform a series of operations corresponding to the application of the live analysis based on the video analysis stored in the storage medium 22 on the terminal device.
  • The terminal device may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input/output interfaces 25, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
  • The steps performed by the living body discrimination system based on video analysis described in the above method embodiments may be based on the structure of the terminal device shown in FIG. 7.
  • Persons of ordinary skill in the art can understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
  • the computer instructions when executed by the processor, can implement the video analysis-based living body discrimination method described in the above embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a living body discrimination method, system, and storage medium based on video analysis, applied in the field of information processing technology. In the method of the embodiments, the living body discrimination system based on video analysis obtains the type discrimination parameter value of the video to be analyzed from its first feature information and a preset classification model, and then determines from that value whether the video to be analyzed is a live video. No interaction with the user and no dual camera are required; only a recorded video is needed, and the system determines whether that video is a live video according to the preset machine learning models (including the classification model and the feature extraction model). This simplifies the living body discrimination process and facilitates the application of the living body discrimination method in various fields.

Description

一种基于视频分析的活体判别方法、系统及存储介质
本申请要求于2017年1月19日提交中国专利局、申请号为201710044150.6、发明名称为“一种基于视频分析的活体判别方法及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及信息处理技术领域,特别涉及一种基于视频分析的活体判别方法、系统及存储介质。
技术背景
活体判别技术可以应用于越来越多的领域,比如人脸门禁、闸机、网络银行远程开户等领域的应用。具体地,在人脸门禁、闸机的应用中,需要验证当前用户确实是合法用户,能够有效抵挡非法用户借用合法用户的照片通过系统的检测。
一种现有的活体判别方法,需要在实际应用场景中结合一定的交互,如摇头、眨眼等,当用户按照提示做出正确的交互后,才能通过活体检测,整个活体判别过程较繁琐,且存在用户不配合交互的情况,导致通过率较低,影响用户体验。而另一种基于双目视觉的活体判别方法,是通过双摄像头重建视频中的活体,计算重建三维模型是否在一个平面内,从而判断是否为活体,该方法需要配备双摄像头,且计算量大,不适用于嵌入式和移动端的活体判别。
技术内容
本申请实施例提供一种基于视频分析的活体判别方法、系统及存储介质,实现了根据训练的机器学习模型确定待分析视频是否为活体视 频。
本申请实施例提供一种基于视频分析的活体判别方法,包括:
服务器根据预置的特征提取模型提取待分析视频的第一特征信息;
所述服务器根据预置的分类模型及所述第一特征信息,计算所述待分析视频对应的类型判别参数值,所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息;
所述服务器根据所述类型判别参数值确定所述待分析视频是否属于活体视频。
本申请实施例提供一种基于视频分析的活体判别系统,包括:处理器和与所述处理器相连的存储器;其中,所述存储器存储有可被所述处理器执行的计算机指令,所述计算机指令包括:
特征提取单元,用于根据预置的特征提取模型提取待分析视频的第一特征信息;
参数值计算单元,用于根据预置的分类模型及所述第一特征信息,计算所述待分析视频对应的类型判别参数值,所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息;
类型确定单元,用于根据所述类型判别参数值确定所述待分析视频是否属于活体视频。
本申请另一方面提供了一种存储介质,其上存储有计算机程序,所述计算机程序能够被一处理器执行并实现上述任一实现方式的活体判别方法。
可见,在本实施例的方法中,基于视频分析的活体判别系统会通过待分析视频的第一特征信息及预置的分类模型得到待分析视频的类型判别参数值,然后根据类型判别参数值确定待分析视频是否属于活体视频。这样不需要与用户进行交互,也不需要配备双摄像头,只需录制一 段视频,则基于视频分析的活体判别系统就会根据预置的机器学习模型(包括分类模型和特征提取模型)确定该段视频是否属于活体视频,简化了活体判别过程,方便了活体判别方法在各个领域的应用。
附图简要说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种基于视频分析的活体判别方法的应用场景示意图;
图2是本申请实施例提供的一种基于视频分析的活体判别方法的流程图;
图3是本申请实施例中提取待分析视频的第一特征信息的方法流程图;
图4是本申请应用实施例中提取的特征提取模型和分类模型的结构示意图;
图5是本申请实施例提供的一种基于视频分析的活体判别系统的结构示意图;
图6是本申请实施例提供的另一种基于视频分析的活体判别系统的结构示意图;
图7是本申请实施例提供的一种终端设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排它的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
图1为本申请实施例提供的一种基于视频分析的活体判别方法的应用场景。该基于视频分析的活体判别方法在用户通过人脸门禁3时,通过摄像头1获取用户的视频,将所获取的视频作为待分析视频提供给服务器2,由服务器2进一步验证当前用户是否为合法用户,以能够有效抵挡非法用户借用合法用户的照片通过门禁系统3的检测。
具体地,服务器2可根据预置的特征提取模型提取待分析视频的第一特征信息;并根据预置的分类模型及所述第一特征信息,计算所述待分析视频对应的类型判别参数值,以根据所述类型判别参数值确定所述待分析视频是否属于活体视频。其中,所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息。
本申请实施例提供一种基于视频分析的活体判别方法,主要是基于视频分析的活体判别系统所执行的方法,流程图如图2所示,包括:
步骤101,根据预置的特征提取模型提取待分析视频的第一特征信息,这里的第一特征信息可以包括时间特征信息和空间特征信息,其中空间特征信息具体是待分析视频包含的多帧图像的像素特征信息。
步骤102,根据预置的分类模型及第一特征信息,计算待分析视频对应的类型判别参数值,其中,分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息,这里的计算信息可以是指在将特征信息作为输入计算类型判别参数值的过程中所用到的数学公式和固定参数值等。
在本实施例中,预置的特征提取模型和分类模型可以是基于视频分析的活体判别系统对多个已标记活体视频和非活体视频的视频训练样本进行训练得到并储存在系统中的。具体地,特征提取模型可以采用深度学习网络,包括多个参数计算层(比如卷积层,全连接层等),在系统中可以储存该特征提取模型的数据包括各个参数计算层的计算参数值(比如卷积核信息等)及关系信息(比如参数计算层之间的连接关系),其中,卷积层可以对视频所包含的多帧图像的时间信息和像素信息进行卷积运算,从而可以得到视频的时间特征信息和像素特征信息,全连接层可以得到卷积层获取的特征信息之间的关联关系。
分类模型可以是二分类器,在一种情况下,系统中储存的分类模型可以包括活体视频和非活体视频分别对应的基于特征信息的概率计算信息,即所述活体视频和非活体视频分别对应的基于所述第一特征信息的类型判别参数的计算信息为活体视频和非活体视频分别对应的基于所述第一特征信息的概率计算信息,包括概率计算公式和固定参数值,这样在执行本步骤102时,可以根据第一特征信息及概率计算信息计算 待分析视频属于活体视频的第一概率和属于非活体视频的第二概率。该分类模型具体可以是softmax分类器等,其中,softmax分类器主要是将上述第一特征信息作为输入,并通过softmax函数计算待分析视频属于活体视频的第一概率和属于非活体视频的第二概率,且第一概率与第二概率之和为1。
在另一种情况下,系统中储存的分类模型可以包括分别与活体视频和非活体视频的特征信息之间的距离计算信息,即所述活体视频和非活体视频分别对应的基于所述第一特征信息的类型判别参数的计算信息为分别与活体视频和非活体视频的特征信息之间的距离计算信息,包括距离计算公式(可以是欧式距离计算公式等)及活体视频和非活体视频分别对应的特征信息等,这样在执行本步骤102时,可以根据距离计算信息计算上述第一特征信息分别与活体视频的特征信息的第一距离和非活体视频的特征信息的第二距离。该分类模型具体可以采用支持向量机(Support Vector Machines,SVM)分类器等。
步骤103,根据上述步骤102得到的类型判别参数值确定待分析视频是否属于活体视频。
一种情况下,如果上述步骤102计算的类型判别参数值为待分析视频属于活体视频的第一概率和属于非活体视频的第二概率,将第一概率和第二概率中较大概率的视频类型(活体视频或非活体视频)作为待分析视频的视频类型,比如属于活体视频的第一概率较大,则该待分析视频属于活体视频。另一种情况下,如果上述步骤102计算的类型判别参数值为第一特征信息分别与活体视频的特征信息的第一距离和非活体视频的特征信息的第二距离,则可以将第一距离和第二距离中较小距离对应的视频类型确定为待分析视频的类型,比如第一特征信息与活体视频的特征信息之间的第一距离较小,则该待分析视频属于活体视频。
可见,在本实施例的方法中,基于视频分析的活体判别系统会通过待分析视频的第一特征信息及预置的分类模型得到待分析视频的类型判别参数值,然后根据类型判别参数值确定待分析视频是否属于活体视频。这样不需要与用户进行交互,也不需要配备双摄像头,只需录制一段视频,则基于视频分析的活体判别系统就会根据预置的机器学习模型(包括分类模型和特征提取模型)确定该段视频是否属于活体视频,简化了活体判别过程,方便了活体判别方法在各个领域的应用。
参考图3所示,在一个具体的实施例中,基于视频分析的活体判别系统可以通过如下步骤来执行上述步骤101中的提取第一特征信息,具体包括:
步骤201,将待分析视频分为多段n帧的子视频,两段相邻的子视频之间有m帧的重叠图像,这里n为大于m的自然数。
可以理解,每段视频都包含多个帧的图像,每一帧的图像表示某个时间点的图像,本实施例中待分析视频划分的多段子视频中,每相邻的两段子视频之间具有重叠图像,这样使得子视频之间具有关联性。
步骤202,分别根据特征提取模型提取多段子视频的特征信息。
其中,如果特征提取模型可以包括卷积层,池化层和全连接层,则基于视频分析的活体判别系统在提取某一个子视频的特征信息时,可以通过如下步骤来实现,具体包括:
步骤A,通过卷积层对某一子视频的时间信息和像素信息进行卷积计算得到t个维度的时间特征信息和像素特征信息,具体是通过多个卷积核分别与子视频包含的多帧图像中相应位置的元素值(包括时间和像素)进行相乘,再将相乘的结果相加得到t个维度的时间特征信息和像素特征信息。
步骤B,通过池化层将t个维度的时间特征信息和像素特征信息进行 降维处理得到p个维度的时间特征信息和像素特征信息,p为小于t的自然数。
步骤C,通过全连接层确定p个维度的时间特征信息和像素特征信息之间的关联关系,具体可以通过各个维度的时间特征信息和像素特征信息的权重值来表示该关联关系,则某一子视频的特征信息包括具有关联关系的p个维度的时间特征信息和像素特征信息。
步骤203,计算步骤202提取的多段子视频的特征信息的平均值作为待分析视频的第一特征信息。
进一步地,上述预置的特征提取模型可以采用如下方法进行训练得到:分别将多个视频训练样本包含的图像的元素值(包括时间信息和像素信息)输入到计算网络中计算得到对应的特征信息,这里计算网络包括多个串联的参数计算层,任一参数计算层根据输入信息与对应的计算参数值得到计算结果,并将计算结果输入到下一参数计算层,参数计算层包括卷积层,池化层和全连接层。在这个过程中,当得到一个视频训练样本对应的特征信息后,都会调整计算网络中的各个参数计算层对应的计算参数值,并基于调整后的计算网络得到另一视频训练样本的特征信息,使得另一视频训练样本的特征信息满足收敛条件,则特征提取模型为进行调整后的计算网络。在训练得到特征提取模型后,活体判别系统会储存该计算网络的结构信息及最终调整得到的各个参数计算层对应的计算参数值。
其中,计算网络可以是任意结构的计算网络,这里并不对该计算网络的具体结构进行限定,上述训练的过程是多计算网络中各个参数计算层的计算参数值进行训练。
在训练得到特征提取模型后,活体判别系统可以继续训练得到分类模型,具体地,根据多个视频训练样本中属于活体视频的第一视频训练 样本对应的第二特征信息确定活体视频的基于特征信息的第一概率计算信息,使得根据该确定的第一概率计算信息得到的概率大于0.5;或者,根据多个视频训练样本中属于非活体视频的第二视频训练样本对应的第三特征信息确定非活体视频的基于特征信息的第二概率计算信息,使得根据该确定的第二概率计算信息得到的概率大于0.5。其中任一概率计算信息(第一概率计算信息或第二概率计算信息)可以包括概率计算公式和固定参数等信息。
以下以一个具体的应用实例说明本实施例的基于视频分析的活体判别方法,本实施例可以包括两个过程,即离线训练过程和在线预测过程,具体地:
(1)离线训练过程,主要是对多个已标记活体视频和非活体视频的视频训练样本进行训练得到特征提取模型和分类模型,包括前期训练(train)过程和微调(fine tune)过程。
具体地,活体判别系统会对多个视频训练样本进行训练,得到如图4所示的计算网络中各个参数计算层的计算信息。
分别将多个视频训练样本包含的图像的元素值(包括时间信息和像素信息)输入到计算网络中计算得到对应的第一特征信息,这里计算网络包括多个串联的参数计算层,任一参数计算层根据输入信息与对应的计算参数值得到计算结果,并将计算结果输入到下一参数计算层,参数计算层包括卷积层310,池化层320和全连接层330。
本申请实施例中,该计算网络包括8个三维(three-dimensional,3D)卷积层310,5个池化(pooling)层320和2个全连接层330,以及Softmax分类器340。其中,Softmax分类器340属于分类模型,卷积层310、池化层320及全连接层330属于特征提取模型。各个卷积层310包括3×3×3的卷积核,卷积跨度(stride)在空间和时间序列维度均为1,卷积层1a 311 包括64个卷积核,卷积层2a 312的卷积核为128个,卷积层3a 313和卷积层3b 314的卷积核数量均为256,卷积层4a 315,卷积层4b 316,卷积层5a 317和卷积层5b 318四个卷积层的卷积核数量均为512;池化层320中,池化层1 321的核大小为1×2×2,池化层2 322、池化层3 323、池化层4324及池化层5 325的核大小均为2×2×2;全连接层330的输出维度均为4096维。
在训练过程中,得到一个视频训练样本对应的第一特征信息后,服务器会调整计算网络中的各个参数计算层对应的计算参数值,并基于调整后的计算网络得到另一视频训练样本的第一特征信息,使得另一视频训练样本的第一特征信息满足收敛条件,则特征提取模型为进行调整后的计算网络。在训练得到特征提取模型后,活体判别系统会储存该计算网络的结构信息及最终调整得到的各个参数计算层对应的计算参数值。
需要说明的是,在前期训练过程中,在提取得到多个视频训练样本对应的特征信息后,会使用分类器340对视频的通用问题进行分类,即可将这多个视频训练样本分为多个类型,不限定于活体视频和非活体视频两种类型,这样通过前提训练过程训练出如图4所示的计算网络中各个参数计算层的初始计算信息。而在微调过程中,对前期训练过程得到的计算网络中各个参数计算层的初始计算信息进行调整,使得分类器340只对视频属于活体视频和非活体视频的两个类型进行训练,且通过微调过程训练得到的计算网络中各个参数计算层的最终计算信息作为以后在线预测过程的参数。实践证明,通过前期训练过程可以得到较好的初始计算信息,从而使得得到的最终计算信息在应用中效果比较好,即根据最终计算信息确定待分析视频的视频类型(活体视频或非活体视频)的效果比较好。
需要说明的是,计算网络可以是任意结构,并不限定于如图4所示 的结构,上述训练的过程是多计算网络中各个参数计算层的计算参数值进行训练。
(2)在线预测过程,主要是使用微调后得到的计算网络中各个参数计算层的计算信息对待分析视频是否属于活体视频。
具体地,活体判别系统会先将待分析视频分解为多个16帧的子视频,相邻的两段子视频之间有8帧重叠图像;然后将分解后的各个子视频所包含的16帧图像的元素值输入到上述训练得到的计算网络,通过全连接层6得到每段子视频分别对应的4096维特向量,通过全连接层7将这些特征向量进行平均,即得到得分析视频的特征向量,即待分析视频的特征信息;最后根据Softmax分类器和待分析视频的特征信息分别计算得到待分析视频属于活体视频和非活体视频的概率,并将较大概率对应的视频类型确定为待分析视频的视频类型。
本申请实施例还提供一种基于视频分析的活体判别系统,其结构示意图如图5所示,具体可以包括:
特征提取单元10,用于根据预置的特征提取模型提取待分析视频的第一特征信息;
参数值计算单元11,用于根据预置的分类模型及所述特征提取单元10提取的第一特征信息,计算所述待分析视频对应的类型判别参数值,所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息;
类型确定单元12,用于根据所述参数值计算单元11计算的类型判别参数值确定所述待分析视频是否属于活体视频。
在一种情况下,所述参数值计算单元11,具体用于如果所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的概率计算信息,即所述活体视频和非活体视频分别对应的基于特征信息的类型判别 参数的计算信息为活体视频和非活体视频分别对应的基于特征信息的概率计算信息时,根据所述第一特征信息及所述概率计算信息计算所述待分析视频属于活体视频的第一概率和属于非活体视频的第二概率;所述类型确定单元12,具体用于将所述第一概率和第二概率中较大概率的视频类型确定为所述待分析视频的类型。
在另一种情况下,所述参数值计算单元11,具体用于如果所述分类模型的数据包括分别与活体视频和非活体视频的特征信息之间的距离计算信息,即所述活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息为分别与活体视频和非活体视频的特征信息之间的距离计算信息时,根据所述距离计算信息计算所述第一特征信息分别与活体视频的特征信息的第一距离和非活体视频的特征信息的第二距离;所述类型确定单元12,具体用于将所述第一距离和第二距离中较小距离对应的视频类型确定为所述待分析视频的类型。
可见,在本实施例的系统中,参数值计算单元11会通过待分析视频的第一特征信息及预置的分类模型得到待分析视频的类型判别参数值,然后类型确定单元12根据类型判别参数值确定待分析视频是否属于活体视频。这样不需要与用户进行交互,也不需要配备双摄像头,只需录制一段视频,则基于视频分析的活体判别系统就会根据预置的机器学习模型(包括分类模型和特征提取模型)确定该段视频是否属于活体视频,简化了活体判别过程,方便了活体判别方法在各个领域的应用。
参考图6所示,在一个具体的实施例中,判别系统除了可以包括如图5所示的结构外,还可以包括提取模型训练单元13和分类模型训练单元14,且系统中的特征提取单元10可以通过划分单元110,提取单元120和确定单元130来实现,具体地:
划分单元110,用于将所述待分析视频分为多段n帧的子视频,两段 相邻的所述子视频之间有m帧的重叠图像,所述n为大于m的自然数;
提取单元120,用于分别根据所述特征提取模型提取所述划分单元110划分的多段子视频的特征信息;
确定单元130,用于计算所述提取单元120得到的多段子视频的特征信息的平均值作为所述第一特征信息。这样参数计算单元11会根据确定单元130确定的第一特征信息
其中,所述提取单元120,具体用于针对所述多段子视频中的每一子视频,如果所述特征提取模型包括卷积层,池化层和全连接层,通过所述卷积层对所述子视频的时间信息和像素信息进行卷积计算得到t个维度的时间特征信息和像素特征信息;通过所述池化层将所述t个维度的时间特征信息和像素特征信息进行降维处理得到p个维度的时间特征信息和像素特征信息;通过所述全连接层确定所述p个维度的时间特征信息和像素特征信息之间的关联关系,则所述子视频的特征信息包括具有所述关联关系的p个维度的时间特征信息和像素特征信息。
提取模型训练单元13,用于分别将多个视频训练样本包含的图像的元素值输入到计算网络中计算得到对应的特征信息,所述计算网络包括多个串联的参数计算层,任一参数计算层根据输入信息与对应的计算参数值得到计算结果,并将计算结果输入到下一参数计算层;其中,在得到一个视频训练样本对应的特征信息后,调整所述计算网络中的各个参数计算层对应的计算参数值,并基于调整后的计算网络得到另一视频训练样本的特征信息,使得另一视频训练样本的特征信息满足收敛条件,则所述特征提取模型为进行所述调整后的计算网络。这样特征提取单元10所包括的提取单元120在提取任一子视频的特征信息时,会根据该提取模型训练单元13训练得到的特征提取模型进行提取。
分类模型训练单元14,用于根据所述多个视频训练样本中属于活体 视频的第一视频训练样本对应的第二特征信息确定活体视频的基于特征信息的第一概率计算信息,使得根据所述第一概率计算信息得到的概率大于0.5;或,根据所述多个视频训练样本中属于非活体视频的第二视频训练样本对应的第三特征信息确定非活体视频的基于特征信息的第二概率计算信息,使得根据所述第二概率计算信息得到的概率大于0.5。这样参数值计算单元11会根据该分类模型训练单元14训练得到的分类模型及上述特征提取单元10所包括的确定单元130确定的第一特征信息,计算待分析视频的类型判别参数值。
本申请实施例还提供一种终端设备,其结构示意图如图7所示,该终端设备可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)20(例如,一个或一个以上处理器)和存储器21,一个或一个以上存储应用程序221或数据222的存储介质22(例如一个或一个以上海量存储设备)。其中,存储器21和存储介质22可以是短暂存储或持久存储。存储在存储介质22的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对终端设备中的一系列指令操作。更进一步地,中央处理器20可以设置为与存储介质22通信,在终端设备上执行存储介质22中的一系列指令操作。
具体地,在存储介质22中储存的应用程序221包括基于视频分析的活体判别的应用程序,且该程序可以包括上述基于视频分析的活体判别系统中的特征提取单元10,参数值计算单元11,类型确定单元12,提取模型训练单元13和分类模型训练单元14,在此不进行赘述。更进一步地,中央处理器20可以设置为与存储介质22通信,在终端设备上执行存储介质22中储存的基于视频分析的活体判别的应用程序对应的一系列操作。
终端设备还可以包括一个或一个以上电源23,一个或一个以上有线 或无线网络接口24,一个或一个以上输入输出接口25,和/或,一个或一个以上操作系统223,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述方法实施例中所述的由基于视频分析的活体判别系统所执行的步骤可以基于该图7所示的终端设备的结构。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:只读存储器(ROM)、随机存取存储器RAM)、磁盘或光盘等。
本申请另一方面提供了一种活体判别系统,所述活体判别系统包括存储有可被所述处理器执行的计算机指令的存储器、以及与所述存储器连接的处理器。所述计算机指令在被所述处理器执行时能够实现本申请上述实施例中所描述的基于视频分析的活体判别方法。
本领域普通技术人员可以意识到,结合本申请实施例中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
以上对本申请实施例所提供的基于视频分析的活体判别方法及系统进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核 心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (16)

  1. 一种基于视频分析的活体判别方法,其特征在于,包括:
    服务器根据预置的特征提取模型提取待分析视频的第一特征信息;
    所述服务器根据预置的分类模型及所述第一特征信息,计算所述待分析视频对应的类型判别参数值,所述分类模型包括活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息;
    所述服务器根据所述类型判别参数值确定所述待分析视频是否属于活体视频。
  2. 如权利要求1所述的方法,其特征在于,所述服务器根据预置的特征提取模型提取待分析视频的第一特征信息,具体包括:
    所述服务器将所述待分析视频分为多段n帧的子视频,两段相邻的所述子视频之间有m帧的重叠图像,所述n为大于m的自然数;
    所述服务器分别根据所述特征提取模型提取所述多段子视频的特征信息;
    所述服务器计算所述多段子视频的特征信息的平均值作为所述第一特征信息。
  3. 如权利要求2所述的方法,其特征在于,所述特征提取模型包括卷积层,池化层和全连接层,所述服务器分别根据所述特征提取模型提取所述多段子视频的特征信息,具体包括:
    针对所述多段子视频中的每个子视频,所述服务器通过所述卷积层对所述子视频的时间信息和像素信息进行卷积计算得到t个维度的时间特征信息和像素特征信息;
    所述服务器通过所述池化层将所述t个维度的时间特征信息和像素特征信息进行降维处理得到p个维度的时间特征信息和像素特征信息;
    所述服务器通过所述全连接层确定所述p个维度的时间特征信息和像素特征信息之间的关联关系,则所述子视频的特征信息包括具有所述关联关系的p个维度的时间特征信息和像素特征信息。
  4. 如权利要求1至3任一项所述的方法,其特征在于,所述活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息为:活体视频和非活体视频分别对应的基于特征信息的概率计算信息;则所述服务器根据预置的分类模型及第一特征信息,计算所述待分析视频对应的类型判别参数值,具体包括:
    所述服务器根据所述第一特征信息及所述概率计算信息计算所述待分析视频属于活体视频的第一概率和属于非活体视频的第二概率;
    所述服务器根据所述类型判别参数值确定所述待分析视频是否属于活体视频,具体包括:所述服务器将所述第一概率和第二概率中较大概率的视频类型确定为所述待分析视频的类型。
  5. 如权利要求4所述的方法,其特征在于,在所述服务器根据预置的特征提取模型提取待分析视频的第一特征信息之前,所述方法还包括:
    所述服务器分别将多个视频训练样本包含的图像的元素值输入到计算网络中计算得到对应的特征信息,所述计算网络包括多个串联的参数计算层,任一参数计算层根据输入信息与对应的计算参数值得到计算结果,并输入到下一参数计算层;
    其中,在得到一个视频训练样本对应的特征信息后,调整所述计算网络中的各个参数计算层对应的计算参数值,并基于调整后的计算网络得到另一视频训练样本的特征信息,使得另一视频训练样本的特征信息满足收敛条件,则所述特征提取模型为进行所述调整后的计算网络。
  6. 如权利要求5所述的方法,其特征在于,所述方法还包括:
    所述服务器根据所述多个视频训练样本中属于活体视频的第一视频训练样本对应的第二特征信息确定活体视频的基于特征信息的第一概率计算信息,使得根据所述第一概率计算信息得到的概率大于0.5。
  7. 如权利要求5所述的方法,其特征在于,所述方法还包括:
    所述服务器根据所述多个视频训练样本中属于非活体视频的第二视频训练样本对应的第三特征信息确定非活体视频的基于特征信息的第二概率计算信息,使得根据所述第二概率计算信息得到的概率大于0.5。
  8. 如权利要求1至3任一项所述的方法,其特征在于,所述活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息为:分别与活体视频和非活体视频的特征信息之间的距离计算信息,则所述服务器根据预置的分类模型及第一特征信息,计算所述待分析视频对应的类型判别参数值,具体包括:
    所述服务器根据所述距离计算信息计算所述第一特征信息分别与活体视频的特征信息的第一距离和非活体视频的特征信息的第二距离;
    所述服务器根据所述类型判别参数值确定所述待分析视频是否属于活体视频,具体包括:所述服务器将所述第一距离和第二距离中较小距离对应的视频类型确定为所述待分析视频的类型。
  9. 一种基于视频分析的活体判别系统,其特征在于,包括:处理器和与所述处理器相连的存储器;其中,所述存储器存储有可被所述处理器执行的计算机指令,所述计算机指令包括:
    特征提取单元,用于根据预置的特征提取模型提取待分析视频的第一特征信息;
    参数值计算单元,用于根据预置的分类模型及所述第一特征信息,计算所述待分析视频对应的类型判别参数值,所述分类模型包括活体视 频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息;
    类型确定单元,用于根据所述类型判别参数值确定所述待分析视频是否属于活体视频。
  10. 如权利要求9所述的系统,其特征在于,所述特征提取单元包括:
    划分单元,用于将所述待分析视频分为多段n帧的子视频,两段相邻的所述子视频之间有m帧的重叠图像,所述n为大于m的自然数;
    提取单元,用于分别根据所述特征提取模型提取所述多段子视频的特征信息;
    确定单元,用于计算所述多段子视频的特征信息的平均值作为所述第一特征信息。
  11. 如权利要求10所述的系统,其特征在于,
    所述提取单元,具体用于针对所述多段子视频中的每一子视频,如果所述特征提取模型包括卷积层,池化层和全连接层,通过所述卷积层对所述子视频的时间信息和像素信息进行卷积计算得到t个维度的时间特征信息和像素特征信息;通过所述池化层将所述t个维度的时间特征信息和像素特征信息进行降维处理得到p个维度的时间特征信息和像素特征信息;通过所述全连接层确定所述p个维度的时间特征信息和像素特征信息之间的关联关系,则所述子视频的特征信息包括具有所述关联关系的p个维度的时间特征信息和像素特征信息。
  12. 如权利要求9至11任一项所述的系统,其特征在于,
    所述参数值计算单元,具体用于在所述活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息为活体视频和非活体视频分别对应的基于特征信息的概率计算信息时,根据所述第一特征信息及所述概率计算信息计算所述待分析视频属于活体视频的第一概率 和属于非活体视频的第二概率;
    所述类型确定单元,具体用于将所述第一概率和第二概率中较大概率的视频类型确定为所述待分析视频的类型。
  13. 如权利要求12所述的系统,其特征在于,所述计算机指令还包括:
    提取模型训练单元,用于分别将多个视频训练样本包含的图像的元素值输入到计算网络中计算得到对应的特征信息,所述计算网络包括多个串联的参数计算层,任一参数计算层根据输入信息与对应的计算参数值得到计算结果,并输入下一参数计算层;
    其中,在得到一个视频训练样本对应的特征信息后,调整所述计算网络中的各个参数计算层对应的计算参数值,并基于调整后的计算网络得到另一视频训练样本的特征信息,使得另一视频训练样本的特征信息满足收敛条件,则所述特征提取模型为进行所述调整后的计算网络。
  14. 如权利要求13所述的系统,其特征在于,所述计算机指令还包括:
    分类模型训练单元,用于根据所述多个视频训练样本中属于活体视频的第一视频训练样本对应的第二特征信息确定活体视频的基于特征信息的第一概率计算信息,使得根据所述第一概率计算信息得到的概率大于0.5;或,根据所述多个视频训练样本中属于非活体视频的第二视频训练样本对应的第三特征信息确定非活体视频的基于特征信息的第二概率计算信息,使得根据所述第二概率计算信息得到的概率大于0.5。
  15. 如权利要求9至11任一项所述的系统,其特征在于,
    所述参数值计算单元,具体用于在所述活体视频和非活体视频分别对应的基于特征信息的类型判别参数的计算信息为:分别与活体视频和非活体视频的特征信息之间的距离计算信息时,根据所述距离计算信息 计算所述第一特征信息分别与活体视频的特征信息的第一距离和非活体视频的特征信息的第二距离;
    所述类型确定单元,具体用于将所述第一距离和第二距离中较小距离对应的视频类型确定为所述待分析视频的类型。
  16. 一种计算机可读存储介质,其上存储有计算机程序;其特征在于,所述计算机程序能够被一处理器执行并实现如权利要求1至8中任一项所述的基于视频分析的活体判别方法。
PCT/CN2018/072973 2017-01-19 2018-01-17 一种基于视频分析的活体判别方法、系统及存储介质 WO2018133791A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710044150.6A CN106874857B (zh) 2017-01-19 2017-01-19 一种基于视频分析的活体判别方法及系统
CN201710044150.6 2017-01-19

Publications (1)

Publication Number Publication Date
WO2018133791A1 true WO2018133791A1 (zh) 2018-07-26

Family

ID=59159164

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072973 WO2018133791A1 (zh) 2017-01-19 2018-01-17 一种基于视频分析的活体判别方法、系统及存储介质

Country Status (2)

Country Link
CN (1) CN106874857B (zh)
WO (1) WO2018133791A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858381A (zh) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 活体检测方法、装置、计算机设备和存储介质
CN110110660A (zh) * 2019-05-07 2019-08-09 广东工业大学 手部操作行为的分析方法、装置及设备
CN110147711A (zh) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
CN110383288A (zh) * 2019-06-06 2019-10-25 深圳市汇顶科技股份有限公司 人脸识别的方法、装置和电子设备
CN111178204A (zh) * 2019-12-20 2020-05-19 深圳大学 一种视频数据编辑识别方法、装置、智能终端及存储介质
CN112215133A (zh) * 2020-10-10 2021-01-12 中国平安人寿保险股份有限公司 基于人工智能的学员态度识别方法、装置、计算机设备
CN113128258A (zh) * 2019-12-30 2021-07-16 杭州海康威视数字技术股份有限公司 活体检测方法、装置、电子设备及存储介质

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874857B (zh) * 2017-01-19 2020-12-01 腾讯科技(上海)有限公司 一种基于视频分析的活体判别方法及系统
CN107992842B (zh) * 2017-12-13 2020-08-11 深圳励飞科技有限公司 活体检测方法、计算机装置及计算机可读存储介质
CN108133020A (zh) * 2017-12-25 2018-06-08 上海七牛信息技术有限公司 视频分类方法、装置、存储介质及电子设备
CN108182409B (zh) * 2017-12-29 2020-11-10 智慧眼科技股份有限公司 活体检测方法、装置、设备及存储介质
CN108509803B (zh) * 2018-03-15 2019-06-07 平安科技(深圳)有限公司 一种应用图标的显示方法及终端设备
CN108399401B (zh) * 2018-03-27 2022-05-03 百度在线网络技术(北京)有限公司 用于检测人脸图像的方法和装置
CN110443102B (zh) * 2018-05-04 2022-05-24 北京眼神科技有限公司 活体人脸检测方法及装置
CN109308719B (zh) * 2018-08-31 2022-03-15 电子科技大学 一种基于三维卷积的双目视差估计方法
CN110378219B (zh) * 2019-06-13 2021-11-19 北京迈格威科技有限公司 活体检测方法、装置、电子设备及可读存储介质
CN111091047B (zh) * 2019-10-28 2021-08-27 支付宝(杭州)信息技术有限公司 活体检测方法、装置、服务器和人脸识别设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074259A1 (en) * 2005-07-29 2009-03-19 Madalina Baltatu Automatic biometric identification based on face recognition and support vector machines
CN104933414A (zh) * 2015-06-23 2015-09-23 中山大学 一种基于wld-top的活体人脸检测方法
CN105956572A (zh) * 2016-05-15 2016-09-21 北京工业大学 一种基于卷积神经网络的活体人脸检测方法
CN106709458A (zh) * 2016-12-27 2017-05-24 深圳市捷顺科技实业股份有限公司 一种人脸活体检测方法及装置
CN106874857A (zh) * 2017-01-19 2017-06-20 腾讯科技(上海)有限公司 一种基于视频分析的活体判别方法及系统

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103310590A (zh) * 2012-03-06 2013-09-18 上海骏聿数码科技有限公司 一种驾驶员疲劳度分析及预警系统及方法
CN103593598B (zh) * 2013-11-25 2016-09-21 上海骏聿数码科技有限公司 基于活体检测和人脸识别的用户在线认证方法及系统
CN104182735A (zh) * 2014-08-18 2014-12-03 厦门美图之家科技有限公司 训练优化的基于卷积神经网络的色情图像或视频检测方法
CN105095867A (zh) * 2015-07-21 2015-11-25 哈尔滨多智科技发展有限公司 基于深度学习的快速动态人脸提取、识别方法
CN105335716B (zh) * 2015-10-29 2019-03-26 北京工业大学 一种基于改进udn提取联合特征的行人检测方法
CN105930710B (zh) * 2016-04-22 2019-11-12 北京旷视科技有限公司 活体检测方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074259A1 (en) * 2005-07-29 2009-03-19 Madalina Baltatu Automatic biometric identification based on face recognition and support vector machines
CN104933414A (zh) * 2015-06-23 2015-09-23 中山大学 一种基于wld-top的活体人脸检测方法
CN105956572A (zh) * 2016-05-15 2016-09-21 北京工业大学 一种基于卷积神经网络的活体人脸检测方法
CN106709458A (zh) * 2016-12-27 2017-05-24 深圳市捷顺科技实业股份有限公司 一种人脸活体检测方法及装置
CN106874857A (zh) * 2017-01-19 2017-06-20 腾讯科技(上海)有限公司 一种基于视频分析的活体判别方法及系统

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858381A (zh) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 活体检测方法、装置、计算机设备和存储介质
CN110147711A (zh) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
CN110147711B (zh) * 2019-02-27 2023-11-14 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
CN110110660A (zh) * 2019-05-07 2019-08-09 广东工业大学 手部操作行为的分析方法、装置及设备
CN110383288A (zh) * 2019-06-06 2019-10-25 深圳市汇顶科技股份有限公司 人脸识别的方法、装置和电子设备
CN110383288B (zh) * 2019-06-06 2023-07-14 深圳市汇顶科技股份有限公司 人脸识别的方法、装置和电子设备
CN111178204A (zh) * 2019-12-20 2020-05-19 深圳大学 一种视频数据编辑识别方法、装置、智能终端及存储介质
CN111178204B (zh) * 2019-12-20 2023-05-09 深圳大学 一种视频数据编辑识别方法、装置、智能终端及存储介质
CN113128258A (zh) * 2019-12-30 2021-07-16 杭州海康威视数字技术股份有限公司 活体检测方法、装置、电子设备及存储介质
CN113128258B (zh) * 2019-12-30 2022-10-04 杭州海康威视数字技术股份有限公司 活体检测方法、装置、电子设备及存储介质
CN112215133A (zh) * 2020-10-10 2021-01-12 中国平安人寿保险股份有限公司 基于人工智能的学员态度识别方法、装置、计算机设备
CN112215133B (zh) * 2020-10-10 2023-09-08 中国平安人寿保险股份有限公司 基于人工智能的学员态度识别方法、装置、计算机设备

Also Published As

Publication number Publication date
CN106874857B (zh) 2020-12-01
CN106874857A (zh) 2017-06-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18741864

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18741864

Country of ref document: EP

Kind code of ref document: A1