WO2021051602A1 - Lip password-based face recognition method and system, device, and storage medium - Google Patents

Lip password-based face recognition method and system, device, and storage medium Download PDF

Info

Publication number
WO2021051602A1
Authority
WO
WIPO (PCT)
Prior art keywords
lip
password
lip language
language
video
Prior art date
Application number
PCT/CN2019/118281
Other languages
French (fr)
Chinese (zh)
Inventor
张国辉
董洪涛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051602A1 publication Critical patent/WO2021051602A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Definitions

  • This application relates to the field of biometric recognition technology, and in particular to a face recognition method, system, device, and storage medium based on lip language passwords.
  • In traditional identification, the password is a combination of numbers, characters, or other symbols.
  • The user must remember the password exactly and enter it correctly to be identified, so traditional password identification suffers from passwords being forgotten or misremembered and from cumbersome operation.
  • Fingerprint recognition, face recognition, and iris recognition avoid the need to memorize and enter a password one character at a time, but they have their own shortcomings: 1. Plain fingerprint and face recognition face the risk of spoofing attacks; copied fingerprints and static photos can deceive them. 2. Iris recognition is more secure, but the equipment is expensive and the economic cost is higher.
  • This application provides a face recognition method, system, electronic device, and computer-readable storage medium based on the lip language password, which mainly use a dlib+Resnet face key point detection model to screen lip images and a bidirectional-LSTM-based lip recognition model to classify them, realizing the technical effect of using lip language as a password for face recognition.
  • the present application provides a face recognition method based on lip language password, which is applied to an electronic device.
  • The method includes: S110, obtaining a video of the subject to be tested reading a password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determining a lip feature set for the time period from those consecutive-frame lip images; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip recognition model to obtain a lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirming that the subject to be tested passes the lip language password recognition.
  • The present application also provides a face recognition system based on the lip language password, including a lip feature set acquisition unit, a lip language password prediction value acquisition unit, and a password determination unit. The lip feature set acquisition unit is used to obtain the password-reading video of the subject to be tested, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish, and determine the lip feature set for the time period from those images; the lip language password prediction value acquisition unit is configured to input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; the password determination unit is used to compare the prediction with the password stored in the lip recognition model and, if they are consistent, confirm that the subject to be tested passes the lip language password recognition.
  • The present application also provides an electronic device comprising a memory and a processor; the memory stores a lip-password-based face recognition program which, when executed, implements the following steps: S110, obtain the password-reading video of the subject to be tested; S120, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • The present application also provides a computer-readable storage medium storing a computer program; the computer program includes a lip-password-based face recognition program which, when executed by a processor, realizes the steps of the above face recognition method based on the lip language password.
  • The face recognition method, system, electronic device, and computer-readable storage medium based on the lip language password proposed in this application classify lip images and train the model with a 2D convolution model + bidirectional LSTM model + Softmax layer + optimization network layer. The 2D convolution model extracts lip features; the bidirectional LSTM model connects the successive lip images in the video in series to extract temporal features; the Softmax layer outputs the lip language password prediction; and the optimization network layer inputs the prediction into a loss function for iterative training until the value of the loss function reaches the set threshold, finally yielding a face recognition model based on the lip language password. Performing lip language password recognition with this model achieves the technical effects of low cost and high recognition accuracy.
  • FIG. 1 is a flowchart of a preferred embodiment of a face recognition method based on lip language password according to this application;
  • FIG. 2 is a flowchart of a preferred embodiment of a method for constructing a lip language recognition model based on a two-way LSTM according to the present application;
  • FIG. 3 is a schematic diagram of the principle of a face recognition method based on lip language passwords in this application;
  • FIG. 4 is a schematic structural diagram of a preferred embodiment of a face recognition system based on lip language passwords according to this application;
  • FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device of this application.
  • FIG. 1 shows a flowchart of a preferred embodiment of a face recognition method based on lip language passwords according to an embodiment of the present application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • This application uses the dlib+Resnet face key point detection model to screen lip language images; the two-way LSTM model completes the classification of lip language images, thereby realizing the use of lip language as a password for face recognition.
  • Resnet (residual network)
  • The Resnet network is a powerful network structure that can be seen as a combination of parallel and serial modules. Resnet performed strongly on the ImageNet classification competition and comes in several structural forms, including Resnet-34, Resnet-50, Resnet-101, and Resnet-152. The design of the Resnet structure follows two rules: (1) for the same output feature map size, the layers have the same number of filters; (2) if the feature map size is halved, the number of filters is doubled, in order to maintain the time complexity of each layer. Resnet also relies on two key techniques: the first is the skip connection, and the second is the Batch Normalization layer.
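The skip-connection idea mentioned above can be illustrated with a minimal sketch (plain numpy, not the patent's actual network): a residual block computes y = F(x) + x, so the input passes through unchanged alongside the learned transformation F.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal residual block: y = relu(F(x) + x), where F is two linear layers."""
    f = relu(x @ w1)        # first transformation
    f = f @ w2              # second transformation (same width as x)
    return relu(f + x)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (1, 8) -- the output keeps the input width
```

Because the identity path bypasses the weights, gradients can flow directly to earlier layers, which is what makes very deep variants such as Resnet-152 trainable.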
  • The Resnet-based face detection algorithm slides windows of different sizes and positions over the image and then determines whether each window contains a face.
  • dlib provides both a HOG (histogram of oriented gradients) based face detector and a convolutional neural network based face detector.
  • The face recognition method based on the lip language password in this application uses the interface of the trained Resnet model in dlib, and this interface returns a 128-dimensional face feature vector.
  • the face recognition method based on lip language password includes step S110-step S150.
  • the subject to be tested is the user who wants to perform password detection.
  • the user who needs to perform password detection needs to read out the preset voice password to pass the password detection.
  • the detection device needs to obtain the action video of the subject to be tested reading the password.
  • During enrollment, the user randomly selects 4-6 digits as the password.
  • The user faces the camera and reads the digits at a constant speed.
  • The speed is about one digit per second.
  • The password may be read in Mandarin, a dialect, English, and so on. When collecting information, the reading can be repeated 4-10 times.
  • The user must remember the 4-6 digits set as the password; at the next authentication, reading out the password is sufficient to complete identification.
  • S120 Perform frame-by-frame detection on the video by using a Resnet-based face detection model, and obtain lip language images of consecutive frames in the time period from the starting point to the ending point when the subject to be tested reads the password in the video.
  • The lip images are parsed with the feature point model of the dlib library to obtain lip feature information, and the time period from the moment the subject starts reading the password to the moment they finish is obtained by analyzing the sound waveform of the password-reading video.
  • Specifically, the video is detected frame by frame with the Resnet-based face detection model, the lip position is located with the dlib feature point model, and several frames of lip images are cut out; the start time and end time of the speech are obtained by analyzing the sound waveform of the password video, and the consecutive-frame lip images are filtered according to those times.
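The patent does not specify how the sound waveform is analyzed; one common approach is a short-term energy threshold. The sketch below (illustrative only, with a synthetic signal and an arbitrary threshold) returns the first and last samples whose frame energy exceeds the threshold:

```python
import numpy as np

def speech_endpoints(waveform, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the voiced segment,
    using per-frame mean energy against a fixed threshold."""
    n_frames = len(waveform) // frame_len
    frames = waveform[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    voiced = np.flatnonzero(energy > threshold)
    if voiced.size == 0:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len

# Synthetic example: 1 s of silence, a 1 s "spoken" tone, 1 s of silence (16 kHz).
sr = 16000
sig = np.concatenate([np.zeros(sr),
                      0.5 * np.sin(np.linspace(0, 440 * 2 * np.pi, sr)),
                      np.zeros(sr)])
start, end = speech_endpoints(sig)
print(start / sr, end / sr)  # 1.0 2.0 -- the voiced segment in seconds
```

The resulting start and end times then select which video frames contribute lip images.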
  • the face detection model detects the face of the video frame by frame, and then confirms the position of the lips according to the feature point model of dlib, thereby cutting out the lip picture from the video.
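In dlib's standard 68-point face landmark model, points 48-67 outline the mouth, so a lip crop can be taken as the bounding box of those points. A sketch, assuming the landmarks are already available as (x, y) pairs (the margin value is an arbitrary choice):

```python
import numpy as np

MOUTH_IDX = range(48, 68)  # mouth landmarks in dlib's 68-point model

def lip_bbox(landmarks, margin=10):
    """Bounding box (x0, y0, x1, y1) around the mouth landmarks."""
    pts = np.asarray([landmarks[i] for i in MOUTH_IDX])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    return int(x0), int(y0), int(x1), int(y1)

# Synthetic 68 landmarks: mouth points placed in a small box near (100, 150).
landmarks = [(0, 0)] * 48 + [(100 + (i % 10) * 4, 150 + (i // 10) * 20)
                             for i in range(20)]
print(lip_bbox(landmarks))  # (90, 140, 146, 180)
```

Cropping each detected frame to this box produces the per-frame lip pictures used as samples.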
  • An exemplary description is as follows: if the user reads 1234 in the video, the samples are the frames of lip images that are cut out, and the label is the pinyin corresponding to 1234, yi er san si, with the pinyin of different digits separated by spaces.
  • The lip recognition model examines the consecutive frames of the face in the video, recognizes the digits from the lip movement, and compares the recognized digits with the password stored in the background; if they are consistent, the subject passes.
  • The method for preprocessing the lip images in step S130 includes: S310, normalizing the lip images; S320, storing the normalized lip images as data set samples in a prescribed format.
  • The storage format of the data set samples is [number of samples, data sequence length, image length, image width, image depth].
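With numpy, the normalization and the prescribed [number of samples, data sequence length, image length, image width, image depth] layout might look like this sketch (the sizes are illustrative, not taken from the patent):

```python
import numpy as np

def build_dataset(videos):
    """Normalize uint8 lip frames to [0, 1] and stack into
    [num_samples, seq_len, height, width, depth]."""
    return np.stack([np.asarray(v, dtype=np.float32) / 255.0 for v in videos])

# Illustrative: 4 samples of 20 frames each, 32x32 RGB lip crops.
videos = [np.random.randint(0, 256, (20, 32, 32, 3), dtype=np.uint8)
          for _ in range(4)]
data = build_dataset(videos)
print(data.shape)  # (4, 20, 32, 32, 3)
```

Each sample is then one password-reading clip, ready to feed into the sequence model.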
  • S140 Input the lip language feature set into a trained lip language recognition model based on two-way LSTM to obtain a lip language password prediction value.
  • Fig. 2 is a flowchart of a preferred embodiment of a method for constructing a bidirectional-LSTM-based lip recognition model according to an embodiment of the present application. As shown in Fig. 2, the construction method includes steps S210-S240.
  • S210 Construct an initial network layer for acquiring lip features of the subject to be tested; the initial network layer is a 2D convolutional network.
  • S220 Construct a bidirectional LSTM layer on the initial network layer for extracting temporal features in the training set data.
  • Mouth movement during speech is a dynamic process, and the frames before and after the current frame must be considered to recover the hidden-layer information of the current frame accurately. In other words, the bidirectional LSTM layer considers the influence of both the "past" and the "future" on the current frame.
  • the step size is the number of lip images extracted from the video.
  • S230, construct a Softmax layer on the bidirectional LSTM layer for outputting the lip language password prediction; the Softmax layer classifies a segment of lip-motion images into one of 10 categories, the digits 0-9.
  • The output layer using Softmax has one unit per category: with 10 categories in this example, there are 10 neural units, one representing each category. Under the action of Softmax, each unit computes the probability that the current sample belongs to its category.
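The Softmax computation itself is standard; it turns the 10 output scores into one probability per digit class. A minimal sketch with made-up logits:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Made-up scores for digit classes 0-9 from the network's output layer.
logits = np.array([0.1, 2.0, 0.3, 0.2, 5.0, 0.0, 0.1, 0.4, 0.2, 0.3])
probs = softmax(logits)
print(round(float(probs.sum()), 6))  # 1.0 -- probabilities over the 10 digits
print(int(probs.argmax()))           # 4  -- the predicted digit
```

The node with the largest probability (here digit 4) is taken as that segment's prediction.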
  • The neural network of the lip recognition model includes: a three-layer 2D convolutional network, a two-layer bidirectional LSTM network, a fully connected layer with a Softmax activation function, and finally a logical prediction layer (that is, the optimization network layer).
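A sketch of such a network in PyTorch; the layer widths, kernel sizes, and pooling are illustrative assumptions, not values taken from the patent, but the structure (three 2D convolutions per frame, a two-layer bidirectional LSTM over the frame sequence, and a fully connected layer over 10 digit classes) follows the description above:

```python
import torch
import torch.nn as nn

class LipNet(nn.Module):
    """Sketch: per-frame 2D conv features -> 2-layer bidirectional LSTM
    (temporal features) -> fully connected layer scoring digits 0-9."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 96, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(96, 128, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, num_classes)  # forward + backward halves

    def forward(self, x):                 # x: [N, T, H, W, C], as stored above
        n, t, h, w, c = x.shape
        frames = x.permute(0, 1, 4, 2, 3).reshape(n * t, c, h, w)
        feats = self.conv(frames).reshape(n, t, 96)  # per-frame lip features
        seq, _ = self.lstm(feats)                    # bidirectional temporal features
        return self.fc(seq[:, -1])                   # scores for digits 0-9

model = LipNet()
logits = model(torch.zeros(2, 12, 64, 64, 3))  # 2 clips, 12 frames, 64x64 RGB
print(tuple(logits.shape))  # (2, 10)
```

A Softmax over these logits (applied inside the loss during training, or explicitly at inference) gives the per-digit probabilities described above.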
  • S240 Construct an optimized network layer on the Softmax layer; wherein the optimized network layer is used to input the predicted value of the lip password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  • The training set data is input to the neural network and the lip language password prediction is obtained through layer-by-layer calculation; the prediction and the true label are input into the loss function, the Loss is calculated, and the model parameters are revised by back-propagation.
  • the loss function is used to iteratively train the above neural network model until Loss reaches the set threshold.
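The stopping rule in the optimization layer, iterating gradient descent until the loss falls below a set threshold, can be sketched on a toy problem (the data, learning rate, and threshold here are made up; the patent's model uses the full network above, not this two-weight classifier):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy separable problem: learn w and b by gradient descent on cross-entropy,
# stopping once the Loss reaches the set threshold.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr, threshold = np.zeros(2), 0.0, 0.5, 0.1
for step in range(10000):
    p = sigmoid(X @ w + b)                      # forward pass
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    if loss < threshold:                        # Loss reached the set threshold
        break
    grad = p - y                                # d(loss)/d(logits)
    w -= lr * (X.T @ grad) / len(y)             # adjust w along the gradient
    b -= lr * grad.mean()                       # adjust b along the gradient
print(round(float(loss), 3))
```

The same pattern scales up: compute the loss on a batch, stop if it is below the threshold, otherwise step the parameters along the negative gradient.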
  • the bidirectional LSTM-based lip language recognition model of the present application is a bidirectional cyclic neural network.
  • the Forward layer and the Backward layer are jointly connected to the output layer, which contains 6 shared weights w1-w6.
  • The parameters are adjusted with the gradient descent algorithm, which moves them along the gradient direction.
  • The greater the gradient of the activation function, the faster w and b are adjusted, and the faster training converges.
  • the activation function commonly used in neural networks is the sigmoid function.
  • the forward calculation is performed from time 1 to time t, and the output of the forward hidden layer at each time is obtained and saved.
  • the backward calculation is performed from time t to time 1, and the output of the backward hidden layer at each time is obtained and saved. Finally, at each moment, the final output is obtained by combining the output results of the Forward layer and the Backward layer at the corresponding time.
  • The bidirectional LSTM network performs better on time-series classification tasks: it uses both the history and the future of the sequence, combines this context information, and judges comprehensively, so its judgments are more accurate.
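The forward/backward scheme described above can be sketched with a minimal bidirectional recurrence (plain numpy, simple tanh cells rather than full LSTM cells, all weights made up): one pass runs from time 1 to t, the other from t to 1, and their outputs are combined at each moment.

```python
import numpy as np

def rnn_pass(xs, w_x, w_h):
    """Simple recurrent pass: h_t = tanh(x_t @ w_x + h_{t-1} @ w_h)."""
    h = np.zeros(w_h.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(x @ w_x + h @ w_h)
        outs.append(h)
    return outs

def bidirectional(xs, w_x, w_h):
    fwd = rnn_pass(xs, w_x, w_h)              # forward: time 1 .. t
    bwd = rnn_pass(xs[::-1], w_x, w_h)[::-1]  # backward: time t .. 1, re-aligned
    # At each moment, combine the Forward and Backward layer outputs.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(6)]   # 6 frames of 4-dim lip features
w_x, w_h = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
outs = bidirectional(xs, w_x, w_h)
print(len(outs), outs[0].shape)  # 6 (6,) -- per step, 3 forward + 3 backward units
```

An LSTM replaces the tanh cell with gated state updates, but the forward/backward combination at each time step is the same.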
  • Fig. 3 is a schematic diagram of the principle of a face recognition method based on a lip language password according to an embodiment of the present application.
  • the prescribed format is the storage format of the data set sample.
  • The prescribed storage format is [number of samples, data sequence length, image length, image width, image depth].
  • the above data set is divided into two parts at a ratio of eight to two, one is used as a training set, and the other is used as a test set; among them, the training set is used to train a lip recognition model based on two-way LSTM.
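The eight-to-two split can be done by shuffling sample indices; a minimal sketch (the dataset here is a placeholder array, and the seed is arbitrary):

```python
import numpy as np

def train_test_split(data, train_ratio=0.8, seed=0):
    """Shuffle sample indices and split eight-to-two into train and test sets."""
    idx = np.random.default_rng(seed).permutation(len(data))
    cut = int(len(data) * train_ratio)
    return data[idx[:cut]], data[idx[cut:]]

data = np.arange(100).reshape(100, 1)   # placeholder: 100 samples
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The training portion drives the iterative loss-based training; the held-out portion measures the model's recognition accuracy.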
  • The steps of constructing the bidirectional-LSTM-based lip recognition model are: first design the number of convolution kernels and construct the 2D convolutional network; then design the number of hidden-layer cells and the step size and construct the bidirectional LSTM network; then construct the Softmax layer, which uses the Softmax function to classify the temporal features obtained from the password-reading video of the subject to be tested and finally selects the node with the largest probability as the lip language password prediction output.
  • the last step is to use the loss function to optimize the network, and finally through the iterative training of the loss function, the lip recognition model based on two-way LSTM with the best recognition effect is obtained.
  • the lip language recognition model based on two-way LSTM can predict the lip language video.
  • FIG. 4 shows the structure of a preferred embodiment of a face recognition system based on lip language passwords according to the present application.
  • a face recognition system 400 based on a lip language password includes a lip language feature set acquisition unit 410, a lip language password prediction value acquisition unit 420, and a password determination unit 430; wherein,
  • The lip feature set obtaining unit 410 is used to obtain the password-reading video of the subject to be tested; the video is detected frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; and the lip feature set for the time period is determined from those consecutive-frame lip images.
  • the lip language password prediction value obtaining unit 420 is configured to input the lip language feature set into a trained lip language recognition model based on two-way LSTM to obtain a lip language password prediction value.
  • the password determination unit 430 is configured to compare the predicted value of the lip language password with the password stored in the lip language recognition model, and if they are consistent, confirm that the subject to be tested is recognized by the lip language password.
  • Specifically, the lip feature set acquisition unit 410 includes a video acquisition subunit 411, a lip image acquisition subunit 412, and a lip feature set acquisition subunit 413. The video acquisition subunit is used to obtain the password-reading video of the subject to be tested; the lip image acquisition subunit is used to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; the lip feature set acquisition subunit is used to determine the lip feature set for the time period from those consecutive-frame lip images.
  • the lip language feature set acquisition subunit 413 includes an image normalization module and a data set sample acquisition module; wherein the image normalization module is used to normalize the lip language image;
  • the data set sample acquisition module is used to store the normalized lip language image as a data set sample in a prescribed format.
  • The lip image acquisition subunit 412 includes a video detection module and a lip image acquisition module. The video detection module is used to detect the video frame by frame through the Resnet-based face detection model; the lip image acquisition module is used to acquire the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish.
  • The lip image acquisition module includes a lip feature information acquisition submodule and a time point acquisition submodule. The lip feature information acquisition submodule is used to parse the lip images with the feature point model of the dlib library to obtain lip feature information; the time point acquisition submodule is used to obtain, by parsing the sound waveform of the password-reading video, the time period from the moment the subject starts reading the password to the moment they finish.
  • The lip language password prediction value acquisition unit 420 includes a bidirectional-LSTM-based lip recognition model building module 421, which in turn includes an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module. The initial network layer construction submodule is used to construct the initial network layer, a 2D convolutional network, for obtaining the lip features of the subject to be tested; the bidirectional LSTM layer construction submodule is used to construct, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data; the Softmax layer construction module is used to construct, on the bidirectional LSTM layer, the Softmax layer for outputting the lip language password prediction; the optimization network layer construction module is used to construct an optimization network layer on the Softmax layer, which inputs the lip language password prediction into the loss function for iterative training until the value of the loss function reaches the set threshold.
  • the Softmax layer construction module includes a lip password prediction value acquisition sub-module; the lip password prediction value acquisition sub-module is used to obtain a lip password prediction value through the temporal feature data extracted by the two-way LSTM layer.
  • the Softmax layer construction module also includes a parameter adjustment sub-module, the parameter adjustment sub-module is used to adjust the weight W and the offset b along the gradient direction by using the activation function sigmoid and the gradient descent algorithm.
  • Fig. 5 shows a schematic diagram of an application environment of a preferred embodiment of a face recognition method based on lip language passwords according to the present application.
  • The electronic device 5 may be a terminal device with computing capability, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 5 includes a processor 52, a memory 51, a communication bus 53 and a network interface 54.
  • the memory 51 includes at least one type of readable storage medium.
  • The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 5, such as a hard disk of the electronic device 5.
  • The readable storage medium may also be an external memory of the electronic device 5, such as a plug-in hard disk equipped on the electronic device 5, a smart media card (SMC), a Secure Digital (SD) card, a flash card, and the like.
  • the readable storage medium of the memory 51 is generally used to store the lip-password-based face recognition program 50 and the like installed in the electronic device 5.
  • the memory 51 can also be used to temporarily store data that has been output or will be output.
  • The processor 52 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 51, for example to execute the lip-password-based face recognition program 50.
  • the communication bus 53 is used to realize the connection and communication between these components.
  • the network interface 54 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 5 and other electronic devices.
  • FIG. 5 only shows the electronic device 5 with the components 51-54, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 5 may also include a user interface.
  • The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition functions, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 5 may also include a display, and the display may also be referred to as a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, and an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch device, etc.
  • the display is used to display the information processed in the electronic device 5 and to display a visualized user interface.
  • the electronic device 5 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • The memory 51, as a computer storage medium, can store an operating system and the lip-password-based face recognition program 50. When the processor 52 executes the lip-password-based face recognition program 50 stored in the memory 51, the following steps are implemented: S110, obtain the password-reading video of the subject to be tested; S120, detect the obtained video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • The lip-password-based face recognition program 50 may also be divided into one or more modules, and the one or more modules are stored in the memory 51 and executed by the processor 52 to complete this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the face recognition program 50 based on the lip language password can be divided into: a lip language feature set acquisition unit, a lip language password prediction value acquisition unit, and a password determination unit.
  • the realized functions or operation steps are similar to the above, and will not be described in detail here.
  • In addition, an embodiment of the present application also proposes a computer-readable storage medium that includes a lip-password-based face recognition program; when this program is executed by a processor, the following operations are realized: S110, obtain the password-reading video of the subject to be tested; S120, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the above lip-password-based face recognition method and electronic device, and will not be repeated here.

Abstract

The present application belongs to the field of biometric recognition technology and provides a lip-password-based face recognition method and system, a device, and a storage medium. The method comprises: obtaining a video in which a subject to be detected reads a password; detecting the video frame by frame with a Resnet-based face detection model to acquire the consecutive lip images in the video within the time period from the start to the end of the subject's password reading; determining a lip feature set for that time period from the consecutive lip images; inputting the lip feature set into a trained bidirectional-LSTM-based lip recognition model to obtain a lip-password prediction; and, if the lip-password prediction is consistent with a password stored in the lip recognition model, determining that the subject passes lip-password recognition. The lip-password-based face recognition method of the present application achieves fast detection and high detection accuracy.

Description

Lip password-based face recognition method and system, device, and storage medium
This application claims priority to the patent application with application number 201910885930.2, filed on September 19, 2019 and entitled "Lip-Password-Based Face Recognition Method, Device and Storage Medium".
Technical Field
The present application relates to the field of biometric recognition technology, and in particular to a lip-password-based face recognition method, system, device, and storage medium.
Background
In traditional password recognition methods, the password is a combination of digits, characters, or other symbols; the user must accurately remember the password and enter it correctly to be successfully recognized. Traditional password recognition therefore suffers from drawbacks such as forgotten or misremembered passwords and cumbersome operation.
The applicant has realized that fingerprint, face, and iris recognition can overcome the above shortcoming of traditional passwords, which must be memorized and entered character by character, but they have defects of their own: 1. fingerprint recognition and face recognition alone face the risk of copied-fingerprint and photo attacks, since a copied fingerprint or a static photo can deceive them; 2. iris recognition is more secure, but its equipment is expensive and requires a high economic cost.
In view of the above problems, there is an urgent need for a secure recognition method with low detection cost and no loss of detection accuracy.
Summary of the Invention
The present application provides a lip-password-based face recognition method, system, electronic device, and computer-readable storage medium. It screens lip images mainly through a dlib + Resnet facial key-point detection model, and classifies the lip images through a bidirectional-LSTM-based lip-language recognition model, thereby achieving the technical effect of face recognition that uses lip language as a password.
To achieve the above objective, the present application provides a lip-password-based face recognition method applied to an electronic device. The method includes: S110, obtaining a video of the subject to be tested reading a password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment it finishes; S130, determining the lip feature set for the time period from the lip images of the consecutive frames in that period; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction; S150, if the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirming that the subject passes lip-password recognition.
To achieve the above objective, the present application provides a lip-password-based face recognition system, including a lip feature set acquisition unit, a lip-password prediction acquisition unit, and a password determination unit. The lip feature set acquisition unit is used to obtain the video of the subject to be tested reading the password, to detect the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading, and to determine the lip feature set for the time period from the lip images of the consecutive frames in that period. The lip-password prediction acquisition unit is used to input the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction. The password determination unit is used to compare the lip-password prediction with the password stored in the lip-language recognition model; if they are consistent, it confirms that the subject passes lip-password recognition.
To achieve the above objective, the present application also provides an electronic device comprising a memory and a processor, the memory storing a lip-password-based face recognition program which, when executed by the processor, implements the following steps: S110, obtaining a video of the subject to be tested reading the password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; S130, determining the lip feature set for the time period from the lip images of the consecutive frames in that period; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction; S150, if the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirming that the subject passes lip-password recognition.
In addition, to achieve the above objective, the present application also provides a computer-readable storage medium storing a computer program that includes a lip-password-based face recognition program; when the lip-password-based face recognition program is executed by a processor, the steps of the above lip-password-based face recognition method are realized.
The lip-password-based face recognition method, system, electronic device, and computer-readable storage medium proposed in the present application classify lip images and train a model using a 2D convolution model + bidirectional LSTM model + Softmax layer + optimization network layer. The 2D convolution model extracts the lip features; the bidirectional LSTM model links the successive lip images of the video in series to extract temporal features; the Softmax layer outputs the lip-password prediction; and the optimization network layer feeds the lip-password prediction into a loss function for iterative training until the value of the loss function reaches a set threshold, finally yielding a lip-password-based face recognition model. Using this model for lip-password recognition achieves the technical effects of low usage cost and high recognition accuracy.
Description of the Drawings
FIG. 1 is a flowchart of a preferred embodiment of the lip-password-based face recognition method of the present application;
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional-LSTM-based lip-language recognition model of the present application;
FIG. 3 is a schematic diagram of the principle of the lip-password-based face recognition method of the present application;
FIG. 4 is a schematic structural diagram of a preferred embodiment of the lip-password-based face recognition system of the present application;
FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The present application provides a lip-password-based face recognition method. FIG. 1 shows a flowchart of a preferred embodiment of the lip-password-based face recognition method according to an embodiment of the present application. The method can be executed by a device, and the device can be implemented by software and/or hardware.
The present application screens lip images through a dlib + Resnet facial key-point detection model and classifies the lip images with a bidirectional LSTM model, thereby achieving face recognition that uses lip language as a password.
It should be noted that Resnet (Residual Network) is a powerful network architecture that can be viewed as a combination of parallel and serial modules. Resnet performed well on the classification task of the ImageNet competition and comes in several structural variants: Resnet-34, Resnet-50, Resnet-101, and Resnet-152. The design of the Resnet architecture follows two rules: (1) for the same output feature-map size, the layers have the same number of filters; (2) if the feature-map size is halved, the number of filters is doubled in order to preserve the time complexity of each layer. Its two key elements are the skip-connection method and the use of Batch Normalization layers. A Resnet-based face detection algorithm slides windows of different sizes and positions across the image and then determines whether a face is present in each window.
In addition, dlib uses a HOG (histogram of oriented gradients) + regression-tree method, and detection with dlib's pre-trained models performs much better. dlib also uses convolutional neural networks for face detection.
The lip-password-based face recognition method of the present application uses the interface of the Resnet model already trained in dlib; this interface returns a 128-dimensional face feature vector.
As shown in FIG. 1, in this embodiment, the lip-password-based face recognition method includes steps S110 to S150.
S110: Obtain a video of the subject to be tested reading the password.
The subject to be tested is the user who wants to undergo password detection; this user must read out the preset spoken password to pass. Specifically, the detection device acquires a video of the subject's action of reading the password.
In a specific embodiment, the user randomly selects 4-6 digits as the password, faces the camera, and reads the digits out at a constant speed of roughly one digit per second; the reading may be in Mandarin, a dialect, English, etc. When collecting the enrollment data, the reading can be repeated 4-10 times. The user must remember the 4-6 digit number set as the password; on the next entry, reading out the password completes the recognition.
S120: Detect the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment it finishes.
In a specific embodiment, the lip images are parsed with the feature-point model of the dlib library to obtain lip feature information, and the time period from the start to the end of the subject's password reading is obtained by analyzing the sound waveform of the password-reading video.
Specifically, the video is detected frame by frame with the Resnet-based face detection model; the lip position is determined with the feature-point model of the dlib library, and several frames of lip images are cropped out. The sound start time and sound end time of the digit-password video are obtained by analyzing the sound waveform, and the lip images of the consecutive frames are filtered out according to these start and end times.
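The patent does not specify how the sound waveform is analyzed; one common way to locate the start and end of speech is a short-term energy threshold. A minimal sketch under that assumption (the function name, frame length, and threshold value are illustrative, not taken from the source):

```python
def speech_bounds(samples, frame_len=400, threshold=0.1):
    """Return (start, end) sample indices of the voiced region,
    found by thresholding the mean energy of fixed-size frames."""
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            voiced.append(i)
    if not voiced:
        return None
    return voiced[0], voiced[-1] + frame_len

# Silence, then a loud segment, then silence again.
wave = [0.0] * 800 + [0.5, -0.5] * 400 + [0.0] * 800
print(speech_bounds(wave))  # → (800, 1600)
```

In practice the threshold would be set relative to the recording's noise floor rather than fixed.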
In a specific embodiment, the face detection model detects the face in the video frame by frame and then confirms the position of the lips according to dlib's feature-point model, so that lip pictures can be cropped from the video. An illustrative example: if the user reads 1234 in the video, the sample is the set of cropped lip-image frames, and the label is the pinyin corresponding to 1234 — "yi er san si" — with the different pinyin syllables separated by spaces.
For example, if the user reads the digit 6 in the video, the sample is the cropped lip-image frames and the label is the pinyin corresponding to 6: "liu". The start and end times of each digit's sound are obtained from the sound waveform; based on these times, the bidirectional-LSTM-based lip-language recognition model determines a segment of consecutive face frames in the video, recognizes the digits from the lip movement, and compares the recognized digits with the password pre-stored in the background. If they are consistent, the check passes.
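The digit-to-pinyin labelling described in the two examples above can be sketched as follows (the helper name is hypothetical; the mapping itself is standard pinyin for the digits 0-9):

```python
# Pinyin labels for the ten digits, as used for the training labels above.
DIGIT_PINYIN = {
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}

def digits_to_label(password: str) -> str:
    """Join the pinyin of each digit with spaces, e.g. "1234" -> "yi er san si"."""
    return " ".join(DIGIT_PINYIN[d] for d in password)

print(digits_to_label("1234"))  # → yi er san si
print(digits_to_label("6"))     # → liu
```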
S130: Determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
The method of preprocessing the lip images in step S130 includes: S310, normalizing the lip images; S320, storing the normalized lip images as dataset samples in a prescribed format. The storage format of the dataset samples is [number of samples, sequence length, image length, image width, image depth].
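A minimal numpy sketch of steps S310-S320, assuming normalization to [0, 1] and zero-padding of short clips (the patent fixes only the storage format; the padding and scaling choices here are illustrative assumptions):

```python
import numpy as np

def build_dataset(clips, seq_len, h, w, depth=1):
    """Stack variable-length lip clips into the prescribed
    [samples, sequence length, image length, image width, image depth]
    array, zero-padding short clips (these zeros are what a Masking
    layer can later filter out)."""
    data = np.zeros((len(clips), seq_len, h, w, depth), dtype=np.float32)
    for i, clip in enumerate(clips):
        frames = np.asarray(clip, dtype=np.float32) / 255.0  # normalize to [0, 1]
        data[i, :len(frames)] = frames
    return data

# Two clips of 3 and 5 lip frames, each frame 8x16 grayscale.
clips = [np.random.randint(0, 256, (3, 8, 16, 1)),
         np.random.randint(0, 256, (5, 8, 16, 1))]
data = build_dataset(clips, seq_len=5, h=8, w=16)
print(data.shape)  # → (2, 5, 8, 16, 1)
```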
S140: Input the lip feature set into the trained bidirectional-LSTM-based lip-language recognition model to obtain the lip-password prediction.
S150: If the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirm that the subject passes lip-password recognition.
In summary, using a lip-password-based face recognition model for lip-password recognition achieves the technical effects of low usage cost and high recognition accuracy.
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional-LSTM-based lip-language recognition model according to an embodiment of the present application. As shown in FIG. 2, the construction method includes steps S210 to S240.
S210: Construct an initial network layer for acquiring the lip features of the subject to be tested; the initial network layer is a 2D convolutional network.
Specifically, a Masking layer is first established to filter out the data padded with zeros in the input samples; then the number of convolution kernels is designed and the initial 2D convolutional network layer is built, which extracts the feature values of each frame of the input data. The convolution kernels are sized according to the lip size of the samples in the training set. Each frame of a training-set sample is multiplied and summed with the convolution kernels of the network, and the feature values are obtained after three layers of convolution.
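The per-frame "multiply with the convolution kernel and sum" operation can be illustrated in isolation (a single valid-padding 2D convolution with one kernel; the kernel size and values here are arbitrary, whereas the text above sizes the kernels to the lip size of the training samples):

```python
import numpy as np

def conv2d(frame, kernel):
    """Slide the kernel over a single-channel frame (valid padding,
    stride 1), multiplying and summing at each position."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

frame = np.arange(25, dtype=float).reshape(5, 5)   # a toy 5x5 lip frame
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
features = conv2d(frame, kernel)
print(features.shape)  # → (3, 3)
```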
S220: On the initial network layer, construct a bidirectional LSTM layer for extracting the temporal features of the training-set data.
Mouth articulation is a dynamic process; the frames before and after the current frame must be considered to derive the hidden-layer information of the current frame more accurately. In other words, the bidirectional LSTM layer takes into account the influence of both the "past" and the "future" on the current frame.
In the bidirectional LSTM layer, the number of time steps equals the number of lip images extracted from the video.
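The structural idea of the bidirectional layer — combining a forward pass over time 1..t with a backward pass over t..1 and joining both hidden states per time step — can be sketched as follows, with a plain tanh RNN cell standing in for the LSTM cell for brevity (all dimensions are illustrative):

```python
import numpy as np

def bidirectional_pass(xs, w_in, w_rec, hidden):
    """Run a simple recurrent cell over the sequence forward and
    backward, then concatenate both hidden states at each time step."""
    def run(seq):
        h = np.zeros(hidden)
        out = []
        for x in seq:
            h = np.tanh(x @ w_in + h @ w_rec)
            out.append(h)
        return out

    fwd = run(xs)                 # "past" context, time 1 .. t
    bwd = run(xs[::-1])[::-1]     # "future" context, time t .. 1
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
T, feat, hidden = 6, 4, 8         # 6 lip frames -> 6 time steps
xs = rng.normal(size=(T, feat))
outs = bidirectional_pass(xs, rng.normal(size=(feat, hidden)),
                          rng.normal(size=(hidden, hidden)), hidden)
print(len(outs), outs[0].shape)   # 6 time steps, 16-dim each
```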
S230: On the bidirectional LSTM layer, construct a Softmax layer for outputting the lip-password prediction.
Specifically, the Softmax layer classifies a segment of lip-movement images into the 10 classes 0-9. It should be noted that an output layer using Softmax has multiple units — in fact, as many units as there are classes. In this example there are 10 classes, so there are 10 neural units representing them. Under the action of Softmax, each unit computes the probability that the current sample belongs to its class.
The temporal value X produced by the neural network is passed through the Softmax layer to obtain the sample's lip-password prediction; in other words, the Softmax layer computes the lip-password prediction from the temporal feature data extracted by the bidirectional LSTM layer using the formula P = W*X + b, where P is the lip-password prediction, X is the temporal feature data, W is the weight, and b is the bias.
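A minimal numpy rendering of the formula above, with the Softmax normalization that turns P = W*X + b into the per-class probabilities described earlier (the dimensions are illustrative assumptions):

```python
import numpy as np

def softmax_layer(x, w, b):
    """Affine map P = W*X + b followed by Softmax, so each of the ten
    digit classes gets a probability and the probabilities sum to 1."""
    logits = w @ x + b
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
x = rng.normal(size=32)                 # temporal features from the BiLSTM
w = rng.normal(size=(10, 32))           # 10 classes: digits 0-9
b = np.zeros(10)
p = softmax_layer(x, w, b)
print(int(p.argmax()), round(float(p.sum()), 6))  # predicted digit, total 1.0
```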
In general, the neural network of the lip-language recognition model comprises: three 2D convolutional layers, two bidirectional LSTM layers, a fully connected layer whose activation function is Softmax, and finally a logical prediction layer (i.e., the optimization network layer).
S240: Construct an optimization network layer on the Softmax layer; the optimization network layer feeds the lip-password prediction into a loss function for iterative training until the value of the loss function reaches a set threshold.
During training of the neural network, the training-set data are fed into the network and, after layer-by-layer computation, yield the lip-password prediction. The prediction and the true label are fed into the loss function to compute the loss, and the model parameters are corrected through backpropagation. In other words, the loss function is used to iteratively train the above neural network model until the loss reaches the set threshold.
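The loop described above — predict, compute the loss against the true label, correct the parameters, stop at the threshold — can be sketched on a toy single-sample softmax classifier (the real model backpropagates through the convolutional and LSTM layers as well; learning rate, threshold, and dimensions here are illustrative):

```python
import numpy as np

def train_until_threshold(x, w, b, label, lr=0.5, threshold=0.05, max_iter=500):
    """Iterate: predict with Softmax, compute the cross-entropy loss
    against the true label, correct W and b from the gradient, and stop
    once the loss reaches the set threshold."""
    for step in range(max_iter):
        logits = w @ x + b
        e = np.exp(logits - logits.max())
        p = e / e.sum()
        loss = -np.log(p[label])
        if loss <= threshold:
            return loss, step
        grad = p.copy()
        grad[label] -= 1.0            # d(loss)/d(logits) for softmax + CE
        w -= lr * np.outer(grad, x)   # backpropagated parameter correction
        b -= lr * grad
    return loss, max_iter

rng = np.random.default_rng(2)
x = rng.normal(size=32)
loss, steps = train_until_threshold(x, rng.normal(size=(10, 32)) * 0.01,
                                    np.zeros(10), label=6)
print(loss <= 0.05)  # → True: training stopped at the threshold
```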
In general, the bidirectional-LSTM-based lip-language recognition model of the present application is a bidirectional recurrent neural network in which the Forward layer and the Backward layer are jointly connected to the output layer, involving six shared weights w1-w6: the connection weights w between neurons and the bias b of each neuron. The parameters are tuned with the gradient descent algorithm, adjusting them along the gradient direction. The larger the gradient of the activation function, the faster w and b are adjusted and the faster training converges. A commonly used activation function in neural networks is the sigmoid function.
The Forward layer computes forward from time 1 to time t, obtaining and saving the output of the forward hidden layer at each time step. The Backward layer computes in reverse from time t to time 1, obtaining and saving the output of the backward hidden layer at each time step. Finally, at each time step, the outputs of the Forward and Backward layers at the corresponding time are combined to obtain the final output.
In summary, a bidirectional LSTM network performs better on time-series classification tasks: it uses both the history and the future of the time series together with contextual information, and the combined judgment it produces is more accurate.
FIG. 3 is a schematic diagram of the principle of the lip-password-based face recognition method according to an embodiment of the present application. As shown in FIG. 3, the lip images of consecutive frames are first obtained from the video of the subject reading the password; the obtained lip images are then normalized, and the normalized lip images are stored as dataset samples in a prescribed format. The prescribed format is the storage format of the dataset samples; in a specific embodiment of the present application, it is [number of samples, sequence length, image length, image width, image depth].
The above dataset is split 8:2 into two parts, one used as the training set and the other as the test set, the training set being used to train the bidirectional-LSTM-based lip-language recognition model. The steps for constructing the model are: first design the number of convolution kernels and build the 2D convolutional network; then design the number of hidden-layer cells and the number of time steps and build the bidirectional LSTM network; then build the Softmax layer, which uses the Softmax function to classify the temporal features of the obtained password-reading video and finally selects the node with the largest probability (i.e., the largest value) as the output lip-password prediction. In the last step, the loss function is used to optimize the network, and iterative training with the loss function finally yields the bidirectional-LSTM-based lip-language recognition model with the best recognition performance.
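The 8:2 split can be sketched as follows (the helper name and the fixed shuffle seed are illustrative):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle the dataset samples and split them 8:2 into a
    training set and a test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

train, test = split_dataset(range(100))
print(len(train), len(test))  # → 80 20
```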
The samples of the test set are used to test the bidirectional-LSTM-based lip-language recognition model. Once tested, the model can make predictions on lip-language videos.
The present application provides a lip-password-based face recognition system 400. FIG. 4 shows the structure of a preferred embodiment of the lip-password-based face recognition system according to the present application.
As shown in FIG. 4, a lip-password-based face recognition system 400 includes a lip feature set acquisition unit 410, a lip-password prediction acquisition unit 420, and a password determination unit 430, wherein:
The lip feature set acquisition unit 410 is used to obtain the video of the subject reading the password; to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; and to determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
The lip-password prediction acquisition unit 420 is used to input the lip feature set into the trained bidirectional-LSTM-based lip-language recognition model to obtain the lip-password prediction.
The password determination unit 430 is used to compare the lip-password prediction with the password stored in the lip-language recognition model; if they are consistent, it confirms that the subject passes lip-password recognition.
In a specific embodiment, the lip feature set acquisition unit 410 includes a video acquisition subunit 411, a lip image acquisition subunit 412, and a lip feature set acquisition subunit 413. The video acquisition subunit is used to obtain the video of the subject reading the password; the lip image acquisition subunit is used to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; and the lip feature set acquisition subunit is used to determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
Specifically, the lip language feature set acquisition subunit 413 includes an image normalization module, which normalizes the lip images, and a data set sample acquisition module, which stores the normalized lip images as data set samples in a prescribed format.
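The claims later spell the prescribed sample format out as number of samples, data sequence length, image height, image width, and image depth. A minimal sketch of that normalization and packing step follows; the 64x64 target size, the [0, 1] scaling, and the nearest-neighbour resize are assumptions, not details taken from the patent.

```python
import numpy as np

def normalize_clip(frames, size=(64, 64)):
    """Normalize a list of lip-region crops into one fixed-shape sequence.

    Returns an array of shape (T, H, W, C) with values scaled to [0, 1].
    The 64x64 target size is an assumed choice, not taken from the patent.
    """
    out = []
    for f in frames:
        # Nearest-neighbour resize via index sampling (a stand-in for a real
        # image-resize routine such as cv2.resize).
        h, w = f.shape[:2]
        ys = np.linspace(0, h - 1, size[0]).astype(int)
        xs = np.linspace(0, w - 1, size[1]).astype(int)
        out.append(f[np.ix_(ys, xs)].astype(np.float32) / 255.0)
    return np.stack(out)

def pack_dataset(clips):
    """Stack per-clip sequences into one data set sample tensor of shape
    (num_samples, sequence_length, height, width, depth)."""
    return np.stack([normalize_clip(c) for c in clips])

clips = [[np.full((30, 40, 3), 128, dtype=np.uint8)] * 25] * 4  # 4 clips of 25 frames
data = pack_dataset(clips)
print(data.shape)  # (4, 25, 64, 64, 3)
```

The resulting five-dimensional tensor matches the sample layout the claims describe and can be fed directly into a 2D-convolution-plus-LSTM pipeline.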
It should be noted that the lip image acquisition subunit 412 includes a video detection module, which detects the video frame by frame with the ResNet-based face detection model, and a lip image acquisition module, which obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading.
Specifically, the lip image acquisition module includes a lip feature information acquisition submodule, which parses the lip images with the feature point model of the dlib library to obtain lip feature information, and a time point acquisition submodule, which obtains the period from the start to the end of the subject's password reading by parsing the audio waveform of the password-reading video.
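The audio-based time point acquisition can be pictured with a short-time energy threshold over the waveform. This is only one plausible reading of "parsing the audio waveform"; the frame length and the threshold ratio below are assumed values.

```python
import numpy as np

def speech_interval(waveform, rate, frame_len=400, thresh_ratio=0.1):
    """Return (start_sec, end_sec) of the high-energy region of a waveform.

    waveform     -- 1-D float array of audio samples
    rate         -- sample rate in Hz
    frame_len    -- samples per analysis frame (assumed: 25 ms at 16 kHz)
    thresh_ratio -- energy threshold as a fraction of the peak frame energy
    """
    n = len(waveform) // frame_len
    frames = waveform[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = np.where(energy > thresh_ratio * energy.max())[0]
    start = active[0] * frame_len / rate
    end = (active[-1] + 1) * frame_len / rate
    return float(start), float(end)

# Synthetic example: silence, a 1-second tone burst, silence, at 16 kHz.
rate = 16000
wave = np.zeros(3 * rate)
wave[rate : 2 * rate] = np.sin(2 * np.pi * 220 * np.arange(rate) / rate)
print(speech_interval(wave, rate))  # approximately (1.0, 2.0)
```

The returned interval would then delimit which video frames contribute lip images to the feature set.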
The lip language password prediction unit 420 includes a building module 421 for the bidirectional-LSTM-based lip language recognition model. The building module 421 includes an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module. The initial network layer construction submodule constructs the initial network layer, a 2D convolutional network, used to extract the lip features of the subject under test. The bidirectional LSTM layer construction submodule constructs, on the initial network layer, the bidirectional LSTM layer used to extract temporal features from the training set data. The Softmax layer construction module constructs, on the bidirectional LSTM layer, the Softmax layer that outputs the predicted lip language password. The optimization network layer construction module constructs an optimization network layer on the Softmax layer; the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
The Softmax layer construction module includes a lip language password prediction submodule, which obtains the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer, and a parameter adjustment submodule, which adjusts the weight W and the offset b along the gradient direction using the sigmoid activation function and a gradient descent algorithm.
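A toy version of the output layer may make the prediction and parameter-adjustment submodules concrete: the temporal feature vector X from the bidirectional LSTM is mapped to class scores by P = W*X + b, a softmax turns the scores into password-token probabilities, and W and b are moved along the negative gradient. The dimensions, the cross-entropy loss, and the fixed learning rate are assumptions for illustration; the embodiment's sigmoid-dependent step size is replaced here by a constant rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy dimensions (assumed): a 32-d temporal feature vector from the
# bidirectional LSTM, and 10 possible password tokens.
dim, classes = 32, 10
W = rng.normal(0.0, 0.1, (classes, dim))   # weight W
b = np.zeros(classes)                      # offset b
x = rng.normal(size=dim)                   # temporal feature data X
x = x / np.linalg.norm(x)                  # normalized for a stable toy step size
target = 3                                 # index of the true password token

losses = []
for _ in range(200):
    p = softmax(W @ x + b)                    # P = W*X + b, then softmax
    losses.append(float(-np.log(p[target])))  # cross-entropy loss (assumed)
    grad = p.copy()                           # gradient of the loss w.r.t. scores
    grad[target] -= 1.0
    W -= 0.1 * np.outer(grad, x)              # descend along the gradient direction
    b -= 0.1 * grad

print(losses[0], losses[-1])  # the loss shrinks as the iterative training runs
```

In the patent, this iteration is what the optimization network layer performs until the loss value reaches the set threshold.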
The present application further provides a face recognition method based on a lip language password, applied to an electronic device 5. Fig. 5 is a schematic diagram of the application environment of a preferred embodiment of the method.
In this embodiment, the electronic device 5 may be any terminal device with computing capability, such as a server, a smartphone, a tablet computer, a laptop, or a desktop computer.
As shown in Fig. 5, the electronic device 5 includes a processor 52, a memory 51, a communication bus 53, and a network interface 54.
The memory 51 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium is an internal storage unit of the electronic device 5, such as its hard disk; in other embodiments, it is external storage fitted to the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
In this embodiment, the readable storage medium of the memory 51 is generally used to store the lip-language-password-based face recognition program 50 installed on the electronic device 5; the memory 51 may also temporarily hold data that has been or is about to be output.
In some embodiments, the processor 52 may be a central processing unit (CPU), a microprocessor, or another data processing chip that runs the program code or processes the data stored in the memory 51, for example executing the face recognition program 50.
The communication bus 53 provides the connections and communication between these components.
The network interface 54 may include a standard wired interface and/or a wireless interface (such as a Wi-Fi interface) and is generally used to establish communication connections between the electronic device 5 and other electronic devices.
Fig. 5 shows only an electronic device 5 with the components 51-54, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
Optionally, the electronic device 5 may further include a user interface, which may comprise an input unit such as a keyboard, a voice input device with speech recognition such as a microphone, and a voice output device such as a speaker or earphones; the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 5 may also include a display, also called a display screen or display unit, which in some embodiments may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch display. The display presents the information processed in the electronic device 5 as well as a visual user interface.
Optionally, the electronic device 5 may also include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described further here.
In the device embodiment shown in Fig. 5, the memory 51, as a computer storage medium, may store an operating system and the lip-language-password-based face recognition program 50. When the processor 52 executes the program 50, the following steps are carried out: S110, obtain a password-reading video of the subject under test; S120, detect the video frame by frame with the ResNet-based face detection model and obtain the lip images of consecutive frames during the period from the start to the end of the subject's password reading; S130, determine the lip language feature set for that period from the lip images of the consecutive frames; S140, input the feature set into the trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password; S150, if the predicted password matches the password stored in the recognition model, confirm that the subject has passed lip language password recognition.
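Step S150 reduces to an exact comparison between the predicted and the stored password. A minimal sketch of that decision step, assuming the passwords are token sequences:

```python
def verify(predicted_tokens, stored_tokens):
    """Step S150 as a function: the subject passes lip language password
    recognition only when the predicted password exactly matches the one
    stored with the recognition model."""
    return list(predicted_tokens) == list(stored_tokens)

print(verify([3, 1, 4, 1], [3, 1, 4, 1]))  # True
print(verify([3, 1, 4, 2], [3, 1, 4, 1]))  # False
```

Any mismatch, even in a single token, fails the check, which is the strict equality the method describes.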
In other embodiments, the lip-language-password-based face recognition program 50 may also be divided into one or more modules, which are stored in the memory 51 and executed by the processor 52 to carry out this application. A module, as referred to in this application, is a series of computer program instruction segments that accomplish a specific function.
The program 50 may be divided into a lip language feature set acquisition unit, a lip language password prediction unit, and a password determination unit; the functions and operation steps they implement are similar to those described above and are not detailed again here.
In addition, an embodiment of the present application further provides a computer-readable storage medium containing a lip-language-password-based face recognition program which, when executed by a processor, carries out the following operations: S110, obtain a password-reading video of the subject under test; S120, detect the video frame by frame with the ResNet-based face detection model and obtain the lip images of consecutive frames during the period from the start to the end of the subject's password reading; S130, determine the lip language feature set for that period from the lip images of the consecutive frames; S140, input the feature set into the trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password; S150, if the predicted password matches the password stored in the recognition model, confirm that the subject has passed lip language password recognition.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the face recognition method and electronic device described above and is not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and their variants are intended to cover non-exclusive inclusion, so that a process, device, article, or method comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, device, article, or method. Absent further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, device, article, or method that comprises it.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. From the description of the above implementations, a person skilled in the art will clearly understand that the methods of the above embodiments can be realized by software together with a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of this application, or in essence the part that contributes beyond the prior art, can be embodied as a software product stored on a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, and so on) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the content of this specification and the drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A face recognition method based on a lip language password, applied to an electronic device, wherein the method comprises:
    S110, obtaining a password-reading video of a subject under test;
    S120, detecting the video frame by frame with a ResNet-based face detection model, and obtaining lip images of consecutive frames during the period from the moment the subject starts reading the password to the moment it finishes;
    S130, determining a lip language feature set for the period from the lip images of the consecutive frames;
    S140, inputting the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    S150, if the predicted lip language password matches the password stored in the lip language recognition model, confirming that the subject has passed lip language password recognition.
  2. The face recognition method based on a lip language password according to claim 1, wherein constructing the bidirectional-LSTM-based lip language recognition model comprises:
    S210, constructing an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    S240, constructing an optimization network layer on the Softmax layer, wherein the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  3. The face recognition method based on a lip language password according to claim 2, wherein step S140 comprises:
    in the Softmax layer, computing the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer with the prediction formula P = W*X + b,
    where P is the predicted lip language password, X is the temporal feature data, W is the weight, and b is the offset.
  4. The face recognition method based on a lip language password according to claim 1, wherein step S130, determining the lip language feature set for the period, comprises:
    S310, normalizing the lip images;
    S320, storing the normalized lip images as data set samples in a prescribed format, the data set samples comprising the number of samples, data sequence length, image height, image width, and image depth.
  5. The face recognition method based on a lip language password according to claim 1, wherein in step S120 the lip images are parsed with the feature point model of the dlib library to obtain lip feature information, and the period from the start to the end of the subject's password reading is obtained by parsing the audio waveform of the password-reading video.
  6. The face recognition method based on a lip language password according to claim 3, wherein
    in the Softmax layer, the sigmoid activation function and a gradient descent algorithm are used to adjust the weight W and the offset b along the gradient direction, such that the larger the gradient of the sigmoid activation function, the faster W and b are adjusted.
  7. A face recognition system based on a lip language password, comprising a lip language feature set acquisition unit, a lip language password prediction unit, and a password determination unit, wherein
    the lip language feature set acquisition unit obtains a password-reading video of a subject under test, detects the video frame by frame with a ResNet-based face detection model, obtains lip images of consecutive frames during the period from the start to the end of the subject's password reading, and determines a lip language feature set for the period from the lip images of the consecutive frames;
    the lip language password prediction unit inputs the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    the password determination unit compares the predicted lip language password with the password stored in the lip language recognition model and, if they match, confirms that the subject has passed lip language password recognition.
  8. The face recognition system according to claim 7, wherein the lip language feature set acquisition unit includes a video acquisition subunit, a lip image acquisition subunit, and a lip language feature set acquisition subunit, wherein
    the video acquisition subunit obtains the password-reading video of the subject under test;
    the lip image acquisition subunit detects the video frame by frame with the ResNet-based face detection model and obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading;
    the lip language feature set acquisition subunit determines the lip language feature set for the period from the lip images of the consecutive frames.
  9. The face recognition system according to claim 8, wherein the lip language feature set acquisition subunit includes an image normalization module, which normalizes the lip images, and a data set sample acquisition module, which stores the normalized lip images as data set samples in a prescribed format.
  10. The face recognition system according to claim 8, wherein the lip image acquisition subunit includes a video detection module, which detects the video frame by frame with the ResNet-based face detection model, and a lip image acquisition module, which obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading.
  11. The face recognition system according to claim 10, wherein the lip image acquisition module includes a lip feature information acquisition submodule, which parses the lip images with the feature point model of the dlib library to obtain lip feature information, and a time point acquisition submodule, which obtains the period from the start to the end of the subject's password reading by parsing the audio waveform of the password-reading video.
  12. The face recognition system according to claim 7, wherein the lip language password prediction unit includes a building module for the bidirectional-LSTM-based lip language recognition model, the building module including an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module, wherein
    the initial network layer construction submodule constructs an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    the bidirectional LSTM layer construction submodule constructs, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    the Softmax layer construction module constructs, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    the optimization network layer construction module constructs an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  13. The face recognition system according to claim 12, wherein the Softmax layer construction module includes a lip language password prediction submodule, which obtains the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer.
  14. The face recognition system according to claim 12 or 13, wherein the Softmax layer construction module further includes a parameter adjustment submodule, which adjusts the weight W and the offset b along the gradient direction using the sigmoid activation function and a gradient descent algorithm.
  15. An electronic device, comprising a memory and a processor, the memory storing a lip-language-password-based face recognition program which, when executed by the processor, implements the following steps:
    S110, obtaining a password-reading video of a subject under test;
    S120, detecting the video frame by frame with a ResNet-based face detection model, and obtaining lip images of consecutive frames during the period from the start to the end of the subject's password reading;
    S130, determining a lip language feature set for the period from the lip images of the consecutive frames;
    S140, inputting the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    S150, if the predicted lip language password matches the password stored in the lip language recognition model, confirming that the subject has passed lip language password recognition.
  16. The electronic device according to claim 15, wherein constructing the bidirectional-LSTM-based lip language recognition model comprises:
    S210, constructing an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    S240, constructing an optimization network layer on the Softmax layer, wherein the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  17. The electronic device according to claim 16, wherein step S140 comprises: in the Softmax layer, computing the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer with the prediction formula:
    P = W*X + b,
    where P is the predicted lip language password, X is the temporal feature data, W is the weight, and b is the offset.
  18. 根据权利要求15所述的电子装置,其特征在于,所述步骤S120中,通过dlib数据库的特征点模型解析所述唇语图像获取嘴唇特征信息;所述视频中待测主体读密码的起点时刻至终点时刻的时间段通过解析所述读密码视频的声音波形获取。The electronic device according to claim 15, characterized in that, in the step S120, the lip language image is analyzed through the feature point model of the dlib database to obtain lip feature information; the starting time of the subject to be tested in the video to read the password The time period to the end point is obtained by analyzing the sound waveform of the password-reading video.
  19. 根据权利要求15所述的电子装置,其特征在于,所述确定所述时间段内的唇语特征集步骤S130包括:The electronic device according to claim 15, wherein the step S130 of determining the lip language feature set in the time period comprises:
    S310、将所述唇语图像进行归一化;S310: Normalize the lip language image;
    S320、将归一化后的唇语图像存储为规定格式的数据集样本;所述数据集样本包括:样本数、数据序列长度、图像长、图像宽、图像深度。S320. Store the normalized lip language image as a data set sample in a prescribed format; the data set sample includes: number of samples, data sequence length, image length, image width, and image depth.
  20. A computer-readable storage medium storing a computer program, the computer program comprising a lip-password-based face recognition program which, when executed by a processor, implements the steps of the lip-password-based face recognition method according to any one of claims 1 to 6.
PCT/CN2019/118281 2019-09-19 2019-11-14 Lip password-based face recognition method and system, device, and storage medium WO2021051602A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910885930.2 2019-09-19
CN201910885930.2A CN110717407A (en) 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password

Publications (1)

Publication Number Publication Date
WO2021051602A1 true WO2021051602A1 (en) 2021-03-25

Family

ID=69209940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118281 WO2021051602A1 (en) 2019-09-19 2019-11-14 Lip password-based face recognition method and system, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110717407A (en)
WO (1) WO2021051602A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132637A (en) * 2023-02-15 2023-05-16 武汉博晟安全技术股份有限公司 Online examination monitoring system and method, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401134A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112089595A (en) * 2020-05-22 2020-12-18 未来穿戴技术有限公司 Login method of neck massager, neck massager and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201055A (en) * 2010-03-25 2011-09-28 索尼公司 Information processing device, information processing method, and program
US9721079B2 (en) * 2014-01-15 2017-08-01 Steve Y Chen Image authenticity verification using speech
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
CN107977559A (en) * 2017-11-22 2018-05-01 杨晓艳 A kind of identity identifying method, device, equipment and computer-readable recording medium
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN109726624A (en) * 2017-10-31 2019-05-07 百度(美国)有限责任公司 Identity identifying method, terminal device and computer readable storage medium
CN110163156A (en) * 2019-05-24 2019-08-23 南京邮电大学 It is a kind of based on convolution from the lip feature extracting method of encoding model

Also Published As

Publication number Publication date
CN110717407A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2019120115A1 (en) Facial recognition method, apparatus, and computer apparatus
WO2019200781A1 (en) Receipt recognition method and device, and storage medium
WO2019109526A1 (en) Method and device for age recognition of face image, storage medium
US9477685B1 (en) Finding untagged images of a social network member
WO2019033525A1 (en) Au feature recognition method, device and storage medium
WO2021051602A1 (en) Lip password-based face recognition method and system, device, and storage medium
WO2010103736A1 (en) Face authentification device, person image search system, face authentification device control program, computer readable recording medium, and method of controlling face authentification device
US11641352B2 (en) Apparatus, method and computer program product for biometric recognition
WO2019085331A1 (en) Fraud possibility analysis method, device, and storage medium
CN108197592B (en) Information acquisition method and device
TWI712980B (en) Claim information extraction method and device, and electronic equipment
US9824313B2 (en) Filtering content in an online system based on text and image signals extracted from the content
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN111626371A (en) Image classification method, device and equipment and readable storage medium
US20220046012A1 (en) Method and System for Verifying the Identity of a User
US9378406B2 (en) System for estimating gender from fingerprints
US20230410220A1 (en) Information processing apparatus, control method, and program
US10755074B2 (en) Latent fingerprint pattern estimation
KR20220016217A (en) Systems and methods for using human recognition in a network of devices
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
US20220004652A1 (en) Providing images with privacy label
US11113838B2 (en) Deep learning based tattoo detection system with optimized data labeling for offline and real-time processing
Nahar et al. Twins and Similar Faces Recognition Using Geometric and Photometric Features with Transfer Learning
CN107341457A (en) Method for detecting human face and device
JP2013069187A (en) Image processing system, image processing method, server and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946025

Country of ref document: EP

Kind code of ref document: A1