WO2021051602A1 - Lip password-based face recognition method and system, device, and storage medium - Google Patents

Lip password-based face recognition method and system, device, and storage medium Download PDF

Info

Publication number
WO2021051602A1
Authority
WO
WIPO (PCT)
Prior art keywords
lip
password
lip language
language
video
Prior art date
Application number
PCT/CN2019/118281
Other languages
French (fr)
Chinese (zh)
Inventor
张国辉
董洪涛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051602A1 publication Critical patent/WO2021051602A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Definitions

  • This application relates to the field of biometric recognition technology, and in particular to a face recognition method, system, device, and storage medium based on lip language passwords.
  • In traditional identification, the password is a combination of numbers, characters, or other symbols.
  • The user must remember the password exactly and enter it correctly to be identified, so traditional password identification suffers from passwords being forgotten or misremembered and from cumbersome operation.
  • Fingerprint recognition, face recognition, and iris recognition avoid the need to memorize and enter a password one character at a time, but they have their own shortcomings: 1. Plain fingerprint and face recognition face the risk of spoofing attacks; copied fingerprints and static photos can deceive them. 2. Iris recognition is more secure, but the equipment is expensive and the economic cost is higher.
  • This application provides a face recognition method, system, electronic device, and computer-readable storage medium based on the lip language password, which mainly use a dlib+Resnet face key point detection model to screen lip images and a bidirectional-LSTM-based lip recognition model to classify them, realizing the technical effect of using lip language as a password for face recognition.
  • the present application provides a face recognition method based on lip language password, which is applied to an electronic device.
  • The method includes: S110, obtaining a video of the subject to be tested reading a password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determining a lip feature set for the time period from those consecutive-frame lip images; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip recognition model to obtain a lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirming that the subject to be tested passes the lip language password recognition.
  • The present application also provides a face recognition system based on the lip language password, including a lip feature set acquisition unit, a lip language password prediction value acquisition unit, and a password determination unit. The lip feature set acquisition unit is used to obtain the password-reading video of the subject to be tested, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish, and determine the lip feature set for the time period from those images; the lip language password prediction value acquisition unit is configured to input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; the password determination unit is used to compare the prediction with the password stored in the lip recognition model and, if they are consistent, confirm that the subject to be tested passes the lip language password recognition.
  • The present application also provides an electronic device comprising a memory and a processor; the memory stores a lip-password-based face recognition program which, when executed, implements the following steps: S110, obtain the password-reading video of the subject to be tested; S120, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • The present application also provides a computer-readable storage medium storing a computer program; the computer program includes a lip-password-based face recognition program which, when executed by a processor, realizes the steps of the above face recognition method based on the lip language password.
  • The face recognition method, system, electronic device, and computer-readable storage medium based on the lip language password proposed in this application classify lip images and train the model with a 2D convolution model + bidirectional LSTM model + Softmax layer + optimization network layer. The 2D convolution model extracts lip features; the bidirectional LSTM model connects the successive lip images in the video in series to extract temporal features; the Softmax layer outputs the lip language password prediction; and the optimization network layer inputs the prediction into a loss function for iterative training until the value of the loss function reaches the set threshold, finally yielding a face recognition model based on the lip language password. Performing lip language password recognition with this model achieves the technical effects of low cost and high recognition accuracy.
  • FIG. 1 is a flowchart of a preferred embodiment of a face recognition method based on lip language password according to this application;
  • FIG. 2 is a flowchart of a preferred embodiment of a method for constructing a lip language recognition model based on a two-way LSTM according to the present application;
  • FIG. 3 is a schematic diagram of the principle of a face recognition method based on lip language passwords in this application;
  • FIG. 4 is a schematic structural diagram of a preferred embodiment of a face recognition system based on lip language passwords according to this application;
  • FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device of this application.
  • FIG. 1 shows a flowchart of a preferred embodiment of a face recognition method based on lip language passwords according to an embodiment of the present application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • This application uses the dlib+Resnet face key point detection model to screen lip language images; the two-way LSTM model completes the classification of lip language images, thereby realizing the use of lip language as a password for face recognition.
  • Resnet (residual network)
  • The Resnet network is a powerful network structure that can be seen as a combination of parallel and serial modules. Resnet performed strongly on the ImageNet classification competition and comes in several structural forms, including Resnet-34, Resnet-50, Resnet-101, and Resnet-152. The design of the Resnet structure follows two rules: (1) for the same output feature map size, the layers have the same number of filters; (2) if the feature map size is halved, the number of filters is doubled, in order to maintain the time complexity of each layer. Resnet also relies on two key techniques: the first is the skip connection, and the second is the Batch Normalization layer.
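The skip-connection idea mentioned above can be illustrated with a minimal sketch (plain numpy, not the patent's actual network): a residual block computes y = F(x) + x, so the input passes through unchanged alongside the learned transformation F.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal residual block: y = relu(F(x) + x), where F is two linear layers."""
    f = relu(x @ w1)        # first transformation
    f = f @ w2              # second transformation (same width as x)
    return relu(f + x)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (1, 8) -- the output keeps the input width
```

Because the identity path bypasses the weights, gradients can flow directly to earlier layers, which is what makes very deep variants such as Resnet-152 trainable.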
  • The Resnet-based face detection algorithm slides windows of different sizes and positions over the image and then determines whether each window contains a face.
  • dlib provides both a HOG (histogram of oriented gradients) based face detector and a convolutional neural network based face detector.
  • The face recognition method based on the lip language password in this application uses the interface of the trained Resnet model in dlib, and this interface returns a 128-dimensional face feature vector.
  • the face recognition method based on lip language password includes step S110-step S150.
  • the subject to be tested is the user who wants to perform password detection.
  • the user who needs to perform password detection needs to read out the preset voice password to pass the password detection.
  • the detection device needs to obtain the action video of the subject to be tested reading the password.
  • During enrollment, the user randomly selects 4-6 digits as the password.
  • The user faces the camera and reads the digits at a constant speed.
  • The speed is about one digit per second.
  • The password may be read in Mandarin, a dialect, English, and so on. When collecting information, the reading can be repeated 4-10 times.
  • The user must remember the 4-6 digits set as the password; at the next authentication, reading out the password is sufficient to complete identification.
  • S120 Perform frame-by-frame detection on the video by using a Resnet-based face detection model, and obtain lip language images of consecutive frames in the time period from the starting point to the ending point when the subject to be tested reads the password in the video.
  • The lip images are parsed with the feature point model of the dlib library to obtain lip feature information, and the time period from the moment the subject starts reading the password to the moment they finish is obtained by analyzing the sound waveform of the password-reading video.
  • Specifically, the video is detected frame by frame with the Resnet-based face detection model, the lip position is located with the dlib feature point model, and several frames of lip images are cut out; the start time and end time of the speech are obtained by analyzing the sound waveform of the password video, and the consecutive-frame lip images are filtered according to those times.
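The patent does not specify how the sound waveform is analyzed; one common approach is a short-term energy threshold. The sketch below (illustrative only, with a synthetic signal and an arbitrary threshold) returns the first and last samples whose frame energy exceeds the threshold:

```python
import numpy as np

def speech_endpoints(waveform, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the voiced segment,
    using per-frame mean energy against a fixed threshold."""
    n_frames = len(waveform) // frame_len
    frames = waveform[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    voiced = np.flatnonzero(energy > threshold)
    if voiced.size == 0:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len

# Synthetic example: 1 s of silence, a 1 s "spoken" tone, 1 s of silence (16 kHz).
sr = 16000
sig = np.concatenate([np.zeros(sr),
                      0.5 * np.sin(np.linspace(0, 440 * 2 * np.pi, sr)),
                      np.zeros(sr)])
start, end = speech_endpoints(sig)
print(start / sr, end / sr)  # 1.0 2.0 -- the voiced segment in seconds
```

The resulting start and end times then select which video frames contribute lip images.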
  • the face detection model detects the face of the video frame by frame, and then confirms the position of the lips according to the feature point model of dlib, thereby cutting out the lip picture from the video.
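In dlib's standard 68-point face landmark model, points 48-67 outline the mouth, so a lip crop can be taken as the bounding box of those points. A sketch, assuming the landmarks are already available as (x, y) pairs (the margin value is an arbitrary choice):

```python
import numpy as np

MOUTH_IDX = range(48, 68)  # mouth landmarks in dlib's 68-point model

def lip_bbox(landmarks, margin=10):
    """Bounding box (x0, y0, x1, y1) around the mouth landmarks."""
    pts = np.asarray([landmarks[i] for i in MOUTH_IDX])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    return int(x0), int(y0), int(x1), int(y1)

# Synthetic 68 landmarks: mouth points placed in a small box near (100, 150).
landmarks = [(0, 0)] * 48 + [(100 + (i % 10) * 4, 150 + (i // 10) * 20)
                             for i in range(20)]
print(lip_bbox(landmarks))  # (90, 140, 146, 180)
```

Cropping each detected frame to this box produces the per-frame lip pictures used as samples.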
  • An exemplary description is as follows: if the user reads 1234 in the video, the samples are the frames of lip images that are cut out, and the label is the pinyin corresponding to 1234, yi er san si, with the pinyin of different digits separated by spaces.
  • The lip recognition model examines the consecutive frames of the face in the video, recognizes the digits from the lip movement, and compares the recognized digits with the password stored in the background; if they are consistent, the subject passes.
  • The method for preprocessing the lip images in step S130 includes: S310, normalizing the lip images; S320, storing the normalized lip images as data set samples in a prescribed format.
  • The storage format of the data set samples is [number of samples, data sequence length, image length, image width, image depth].
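With numpy, the normalization and the prescribed [number of samples, data sequence length, image length, image width, image depth] layout might look like this sketch (the sizes are illustrative, not taken from the patent):

```python
import numpy as np

def build_dataset(videos):
    """Normalize uint8 lip frames to [0, 1] and stack into
    [num_samples, seq_len, height, width, depth]."""
    return np.stack([np.asarray(v, dtype=np.float32) / 255.0 for v in videos])

# Illustrative: 4 samples of 20 frames each, 32x32 RGB lip crops.
videos = [np.random.randint(0, 256, (20, 32, 32, 3), dtype=np.uint8)
          for _ in range(4)]
data = build_dataset(videos)
print(data.shape)  # (4, 20, 32, 32, 3)
```

Each sample is then one password-reading clip, ready to feed into the sequence model.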
  • S140 Input the lip language feature set into a trained lip language recognition model based on two-way LSTM to obtain a lip language password prediction value.
  • Fig. 2 is a flowchart of a preferred embodiment of a method for constructing a bidirectional-LSTM-based lip recognition model according to an embodiment of the present application. As shown in Fig. 2, the construction method includes steps S210-S240.
  • S210 Construct an initial network layer for acquiring lip features of the subject to be tested; the initial network layer is a 2D convolutional network.
  • S220 Construct a bidirectional LSTM layer on the initial network layer for extracting temporal features in the training set data.
  • Mouth movement during speech is a dynamic process, and the frames before and after the current frame must be considered to recover the hidden-layer information of the current frame accurately. In other words, the bidirectional LSTM layer considers the influence of both the "past" and the "future" on the current frame.
  • the step size is the number of lip images extracted from the video.
  • S230, construct a Softmax layer on the bidirectional LSTM layer for outputting the lip language password prediction; the Softmax layer classifies a segment of lip-motion images into one of 10 categories, the digits 0-9.
  • The output layer using Softmax has one unit per category: with 10 categories in this example, there are 10 neural units, one representing each category. Under the action of Softmax, each unit computes the probability that the current sample belongs to its category.
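The Softmax computation itself is standard; it turns the 10 output scores into one probability per digit class. A minimal sketch with made-up logits:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Made-up scores for digit classes 0-9 from the network's output layer.
logits = np.array([0.1, 2.0, 0.3, 0.2, 5.0, 0.0, 0.1, 0.4, 0.2, 0.3])
probs = softmax(logits)
print(round(float(probs.sum()), 6))  # 1.0 -- probabilities over the 10 digits
print(int(probs.argmax()))           # 4  -- the predicted digit
```

The node with the largest probability (here digit 4) is taken as that segment's prediction.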
  • The neural network of the lip recognition model includes: a three-layer 2D convolutional network, a two-layer bidirectional LSTM network, a fully connected layer with a Softmax activation function, and finally a logical prediction layer (that is, the optimization network layer).
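A sketch of such a network in PyTorch; the layer widths, kernel sizes, and pooling are illustrative assumptions, not values taken from the patent, but the structure (three 2D convolutions per frame, a two-layer bidirectional LSTM over the frame sequence, and a fully connected layer over 10 digit classes) follows the description above:

```python
import torch
import torch.nn as nn

class LipNet(nn.Module):
    """Sketch: per-frame 2D conv features -> 2-layer bidirectional LSTM
    (temporal features) -> fully connected layer scoring digits 0-9."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 96, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(96, 128, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, num_classes)  # forward + backward halves

    def forward(self, x):                 # x: [N, T, H, W, C], as stored above
        n, t, h, w, c = x.shape
        frames = x.permute(0, 1, 4, 2, 3).reshape(n * t, c, h, w)
        feats = self.conv(frames).reshape(n, t, 96)  # per-frame lip features
        seq, _ = self.lstm(feats)                    # bidirectional temporal features
        return self.fc(seq[:, -1])                   # scores for digits 0-9

model = LipNet()
logits = model(torch.zeros(2, 12, 64, 64, 3))  # 2 clips, 12 frames, 64x64 RGB
print(tuple(logits.shape))  # (2, 10)
```

A Softmax over these logits (applied inside the loss during training, or explicitly at inference) gives the per-digit probabilities described above.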
  • S240 Construct an optimized network layer on the Softmax layer; wherein the optimized network layer is used to input the predicted value of the lip password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  • The training set data is input to the neural network and the lip language password prediction is obtained through layer-by-layer calculation; the prediction and the true label are input into the loss function, the Loss is calculated, and the model parameters are revised by back-propagation.
  • the loss function is used to iteratively train the above neural network model until Loss reaches the set threshold.
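The stopping rule in the optimization layer, iterating gradient descent until the loss falls below a set threshold, can be sketched on a toy problem (the data, learning rate, and threshold here are made up; the patent's model uses the full network above, not this two-weight classifier):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy separable problem: learn w and b by gradient descent on cross-entropy,
# stopping once the Loss reaches the set threshold.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr, threshold = np.zeros(2), 0.0, 0.5, 0.1
for step in range(10000):
    p = sigmoid(X @ w + b)                      # forward pass
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    if loss < threshold:                        # Loss reached the set threshold
        break
    grad = p - y                                # d(loss)/d(logits)
    w -= lr * (X.T @ grad) / len(y)             # adjust w along the gradient
    b -= lr * grad.mean()                       # adjust b along the gradient
print(round(float(loss), 3))
```

The same pattern scales up: compute the loss on a batch, stop if it is below the threshold, otherwise step the parameters along the negative gradient.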
  • the bidirectional LSTM-based lip language recognition model of the present application is a bidirectional cyclic neural network.
  • the Forward layer and the Backward layer are jointly connected to the output layer, which contains 6 shared weights w1-w6.
  • The parameters are adjusted with the gradient descent algorithm, which moves them along the gradient direction.
  • The greater the gradient of the activation function, the faster w and b are adjusted, and the faster training converges.
  • the activation function commonly used in neural networks is the sigmoid function.
  • the forward calculation is performed from time 1 to time t, and the output of the forward hidden layer at each time is obtained and saved.
  • the backward calculation is performed from time t to time 1, and the output of the backward hidden layer at each time is obtained and saved. Finally, at each moment, the final output is obtained by combining the output results of the Forward layer and the Backward layer at the corresponding time.
  • The bidirectional LSTM network performs better on time-series classification tasks: it uses both the history and the future of the sequence, combines this context information, and judges comprehensively, so its judgments are more accurate.
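The forward/backward scheme described above can be sketched with a minimal bidirectional recurrence (plain numpy, simple tanh cells rather than full LSTM cells, all weights made up): one pass runs from time 1 to t, the other from t to 1, and their outputs are combined at each moment.

```python
import numpy as np

def rnn_pass(xs, w_x, w_h):
    """Simple recurrent pass: h_t = tanh(x_t @ w_x + h_{t-1} @ w_h)."""
    h = np.zeros(w_h.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(x @ w_x + h @ w_h)
        outs.append(h)
    return outs

def bidirectional(xs, w_x, w_h):
    fwd = rnn_pass(xs, w_x, w_h)              # forward: time 1 .. t
    bwd = rnn_pass(xs[::-1], w_x, w_h)[::-1]  # backward: time t .. 1, re-aligned
    # At each moment, combine the Forward and Backward layer outputs.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(6)]   # 6 frames of 4-dim lip features
w_x, w_h = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
outs = bidirectional(xs, w_x, w_h)
print(len(outs), outs[0].shape)  # 6 (6,) -- per step, 3 forward + 3 backward units
```

An LSTM replaces the tanh cell with gated state updates, but the forward/backward combination at each time step is the same.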
  • Fig. 3 is a schematic diagram of the principle of a face recognition method based on a lip language password according to an embodiment of the present application.
  • the prescribed format is the storage format of the data set sample.
  • The prescribed storage format is [number of samples, data sequence length, image length, image width, image depth].
  • the above data set is divided into two parts at a ratio of eight to two, one is used as a training set, and the other is used as a test set; among them, the training set is used to train a lip recognition model based on two-way LSTM.
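The eight-to-two split can be done by shuffling sample indices; a minimal sketch (the dataset here is a placeholder array, and the seed is arbitrary):

```python
import numpy as np

def train_test_split(data, train_ratio=0.8, seed=0):
    """Shuffle sample indices and split eight-to-two into train and test sets."""
    idx = np.random.default_rng(seed).permutation(len(data))
    cut = int(len(data) * train_ratio)
    return data[idx[:cut]], data[idx[cut:]]

data = np.arange(100).reshape(100, 1)   # placeholder: 100 samples
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The training portion drives the iterative loss-based training; the held-out portion measures the model's recognition accuracy.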
  • The steps of constructing the bidirectional-LSTM-based lip recognition model are: first design the number of convolution kernels and construct the 2D convolutional network; then design the number of hidden-layer cells and the step size and construct the bidirectional LSTM network; then construct the Softmax layer, which uses the Softmax function to classify the temporal features obtained from the password-reading video of the subject to be tested and finally selects the node with the largest probability as the lip language password prediction output.
  • the last step is to use the loss function to optimize the network, and finally through the iterative training of the loss function, the lip recognition model based on two-way LSTM with the best recognition effect is obtained.
  • the lip language recognition model based on two-way LSTM can predict the lip language video.
  • FIG. 4 shows the structure of a preferred embodiment of a face recognition system based on lip language passwords according to the present application.
  • a face recognition system 400 based on a lip language password includes a lip language feature set acquisition unit 410, a lip language password prediction value acquisition unit 420, and a password determination unit 430; wherein,
  • The lip feature set obtaining unit 410 is used to obtain the password-reading video of the subject to be tested; the video is detected frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; and the lip feature set for the time period is determined from those consecutive-frame lip images.
  • the lip language password prediction value obtaining unit 420 is configured to input the lip language feature set into a trained lip language recognition model based on two-way LSTM to obtain a lip language password prediction value.
  • the password determination unit 430 is configured to compare the predicted value of the lip language password with the password stored in the lip language recognition model, and if they are consistent, confirm that the subject to be tested is recognized by the lip language password.
  • Specifically, the lip feature set acquisition unit 410 includes a video acquisition subunit 411, a lip image acquisition subunit 412, and a lip feature set acquisition subunit 413. The video acquisition subunit is used to obtain the password-reading video of the subject to be tested; the lip image acquisition subunit is used to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; the lip feature set acquisition subunit is used to determine the lip feature set for the time period from those consecutive-frame lip images.
  • the lip language feature set acquisition subunit 413 includes an image normalization module and a data set sample acquisition module; wherein the image normalization module is used to normalize the lip language image;
  • the data set sample acquisition module is used to store the normalized lip language image as a data set sample in a prescribed format.
  • The lip image acquisition subunit 412 includes a video detection module and a lip image acquisition module. The video detection module is used to detect the video frame by frame through the Resnet-based face detection model; the lip image acquisition module is used to acquire the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish.
  • The lip image acquisition module includes a lip feature information acquisition submodule and a time point acquisition submodule. The lip feature information acquisition submodule is used to parse the lip images with the feature point model of the dlib library to obtain lip feature information; the time point acquisition submodule is used to obtain, by parsing the sound waveform of the password-reading video, the time period from the moment the subject starts reading the password to the moment they finish.
  • The lip language password prediction value acquisition unit 420 includes a bidirectional-LSTM-based lip recognition model building module 421, which in turn includes an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module. The initial network layer construction submodule is used to construct the initial network layer, a 2D convolutional network, for obtaining the lip features of the subject to be tested; the bidirectional LSTM layer construction submodule is used to construct, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data; the Softmax layer construction module is used to construct, on the bidirectional LSTM layer, the Softmax layer for outputting the lip language password prediction; the optimization network layer construction module is used to construct an optimization network layer on the Softmax layer, which inputs the lip language password prediction into the loss function for iterative training until the value of the loss function reaches the set threshold.
  • the Softmax layer construction module includes a lip password prediction value acquisition sub-module; the lip password prediction value acquisition sub-module is used to obtain a lip password prediction value through the temporal feature data extracted by the two-way LSTM layer.
  • the Softmax layer construction module also includes a parameter adjustment sub-module, the parameter adjustment sub-module is used to adjust the weight W and the offset b along the gradient direction by using the activation function sigmoid and the gradient descent algorithm.
  • Fig. 5 shows a schematic diagram of an application environment of a preferred embodiment of a face recognition method based on lip language passwords according to the present application.
  • The electronic device 5 may be a terminal device with computing capability, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 5 includes a processor 52, a memory 51, a communication bus 53 and a network interface 54.
  • the memory 51 includes at least one type of readable storage medium.
  • The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 5, such as a hard disk of the electronic device 5.
  • The readable storage medium may also be an external memory of the electronic device 5, such as a plug-in hard disk equipped on the electronic device 5, a smart media card (SMC), a Secure Digital (SD) card, a flash card, and the like.
  • the readable storage medium of the memory 51 is generally used to store the lip-password-based face recognition program 50 and the like installed in the electronic device 5.
  • the memory 51 can also be used to temporarily store data that has been output or will be output.
  • The processor 52 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 51, for example to execute the lip-password-based face recognition program 50.
  • the communication bus 53 is used to realize the connection and communication between these components.
  • the network interface 54 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 5 and other electronic devices.
  • FIG. 5 only shows the electronic device 5 with the components 51-54, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 5 may also include a user interface.
  • The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition functions, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 5 may also include a display, and the display may also be referred to as a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, and an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch device, etc.
  • the display is used to display the information processed in the electronic device 5 and to display a visualized user interface.
  • the electronic device 5 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • The memory 51, as a computer storage medium, can store an operating system and the lip-password-based face recognition program 50. When the processor 52 executes the lip-password-based face recognition program 50 stored in the memory 51, the following steps are implemented: S110, obtain the password-reading video of the subject to be tested; S120, detect the obtained video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • The lip-password-based face recognition program 50 may also be divided into one or more modules, and the one or more modules are stored in the memory 51 and executed by the processor 52 to complete this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the face recognition program 50 based on the lip language password can be divided into: a lip language feature set acquisition unit, a lip language password prediction value acquisition unit, and a password determination unit.
  • the realized functions or operation steps are similar to the above, and will not be described in detail here.
  • In addition, an embodiment of the present application also proposes a computer-readable storage medium that includes a lip-password-based face recognition program; when this program is executed by a processor, the following operations are realized: S110, obtain the password-reading video of the subject to be tested; S120, detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment they finish; S130, determine the lip feature set for the time period from those consecutive-frame lip images; S140, input the lip feature set into the trained bidirectional-LSTM-based lip recognition model to obtain the lip language password prediction; S150, if the prediction is consistent with the password stored in the lip recognition model, confirm that the subject to be tested passes the lip language password recognition.
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the above lip-password-based face recognition method and electronic device, and will not be repeated here.

Abstract

The present application belongs to the field of biometric recognition technology and provides a lip-password-based face recognition method and system, a device, and a storage medium. The method comprises: obtaining a video in which a subject to be detected reads a password; detecting the video frame by frame with a Resnet-based face detection model to acquire the consecutive lip images in the video within the time period from the start to the end of the subject's password reading; determining a lip feature set for that time period from the consecutive lip images; inputting the lip feature set into a trained bidirectional-LSTM-based lip recognition model to obtain a lip-password prediction; and, if the lip-password prediction is consistent with a password stored in the lip recognition model, determining that the subject passes lip-password recognition. The lip-password-based face recognition method of the present application achieves fast detection and high detection accuracy.

Description

Lip password-based face recognition method and system, device, and storage medium
This application claims priority to the patent application with application number 201910885930.2, filed on September 19, 2019 and entitled "Lip-Password-Based Face Recognition Method, Device and Storage Medium".
Technical Field
The present application relates to the field of biometric recognition technology, and in particular to a lip-password-based face recognition method, system, device, and storage medium.
Background
In traditional password recognition methods, the password is a combination of digits, characters, or other symbols; the user must accurately remember the password and enter it correctly to be successfully recognized. Traditional password recognition therefore suffers from drawbacks such as forgotten or misremembered passwords and cumbersome operation.
The applicant has realized that fingerprint, face, and iris recognition can overcome the above shortcoming of traditional passwords, which must be memorized and entered character by character, but they have defects of their own: 1. fingerprint recognition and face recognition alone face the risk of copied-fingerprint and photo attacks, since a copied fingerprint or a static photo can deceive them; 2. iris recognition is more secure, but its equipment is expensive and requires a high economic cost.
In view of the above problems, there is an urgent need for a secure recognition method with low detection cost and no loss of detection accuracy.
Summary of the Invention
The present application provides a lip-password-based face recognition method, system, electronic device, and computer-readable storage medium. It screens lip images mainly through a dlib + Resnet facial key-point detection model, and classifies the lip images through a bidirectional-LSTM-based lip-language recognition model, thereby achieving the technical effect of face recognition that uses lip language as a password.
To achieve the above objective, the present application provides a lip-password-based face recognition method applied to an electronic device. The method includes: S110, obtaining a video of the subject to be tested reading a password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment it finishes; S130, determining the lip feature set for the time period from the lip images of the consecutive frames in that period; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction; S150, if the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirming that the subject passes lip-password recognition.
To achieve the above objective, the present application provides a lip-password-based face recognition system, including a lip feature set acquisition unit, a lip-password prediction acquisition unit, and a password determination unit. The lip feature set acquisition unit is used to obtain the video of the subject to be tested reading the password, to detect the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading, and to determine the lip feature set for the time period from the lip images of the consecutive frames in that period. The lip-password prediction acquisition unit is used to input the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction. The password determination unit is used to compare the lip-password prediction with the password stored in the lip-language recognition model; if they are consistent, it confirms that the subject passes lip-password recognition.
To achieve the above objective, the present application also provides an electronic device comprising a memory and a processor, the memory storing a lip-password-based face recognition program which, when executed by the processor, implements the following steps: S110, obtaining a video of the subject to be tested reading the password; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; S130, determining the lip feature set for the time period from the lip images of the consecutive frames in that period; S140, inputting the lip feature set into a trained bidirectional-LSTM-based lip-language recognition model to obtain a lip-password prediction; S150, if the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirming that the subject passes lip-password recognition.
In addition, to achieve the above objective, the present application also provides a computer-readable storage medium storing a computer program that includes a lip-password-based face recognition program; when the lip-password-based face recognition program is executed by a processor, the steps of the above lip-password-based face recognition method are realized.
The lip-password-based face recognition method, system, electronic device, and computer-readable storage medium proposed in the present application classify lip images and train a model using a 2D convolution model + bidirectional LSTM model + Softmax layer + optimization network layer. The 2D convolution model extracts the lip features; the bidirectional LSTM model links the successive lip images of the video in series to extract temporal features; the Softmax layer outputs the lip-password prediction; and the optimization network layer feeds the lip-password prediction into a loss function for iterative training until the value of the loss function reaches a set threshold, finally yielding a lip-password-based face recognition model. Using this model for lip-password recognition achieves the technical effects of low usage cost and high recognition accuracy.
Description of the Drawings
FIG. 1 is a flowchart of a preferred embodiment of the lip-password-based face recognition method of the present application;
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional-LSTM-based lip-language recognition model of the present application;
FIG. 3 is a schematic diagram of the principle of the lip-password-based face recognition method of the present application;
FIG. 4 is a schematic structural diagram of a preferred embodiment of the lip-password-based face recognition system of the present application;
FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The present application provides a lip-password-based face recognition method. FIG. 1 shows a flowchart of a preferred embodiment of the lip-password-based face recognition method according to an embodiment of the present application. The method can be executed by a device, and the device can be implemented by software and/or hardware.
The present application screens lip images through a dlib + Resnet facial key-point detection model and classifies the lip images with a bidirectional LSTM model, thereby achieving face recognition that uses lip language as a password.
It should be noted that Resnet (Residual Network) is a powerful network architecture that can be viewed as a combination of parallel and serial modules. Resnet performed well on the classification task of the ImageNet competition and comes in several structural variants: Resnet-34, Resnet-50, Resnet-101, and Resnet-152. The design of the Resnet architecture follows two rules: (1) for the same output feature-map size, the layers have the same number of filters; (2) if the feature-map size is halved, the number of filters is doubled in order to preserve the time complexity of each layer. Its two key elements are the skip-connection method and the use of Batch Normalization layers. A Resnet-based face detection algorithm slides windows of different sizes and positions across the image and then determines whether a face is present in each window.
In addition, dlib uses a HOG (histogram of oriented gradients) + regression-tree method, and detection with dlib's pre-trained models performs much better. dlib also uses convolutional neural networks for face detection.
The lip-password-based face recognition method of the present application uses the interface of the Resnet model already trained in dlib; this interface returns a 128-dimensional face feature vector.
As shown in FIG. 1, in this embodiment, the lip-password-based face recognition method includes steps S110 to S150.
S110: Obtain a video of the subject to be tested reading the password.
The subject to be tested is the user who wants to undergo password detection; this user must read out the preset spoken password to pass. Specifically, the detection device acquires a video of the subject's action of reading the password.
In a specific embodiment, the user randomly selects 4-6 digits as the password, faces the camera, and reads the digits out at a constant speed of roughly one digit per second; the reading may be in Mandarin, a dialect, English, etc. When collecting the enrollment data, the reading can be repeated 4-10 times. The user must remember the 4-6 digit number set as the password; on the next entry, reading out the password completes the recognition.
S120: Detect the video frame by frame with a Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the moment the subject starts reading the password to the moment it finishes.
In a specific embodiment, the lip images are parsed with the feature-point model of the dlib library to obtain lip feature information, and the time period from the start to the end of the subject's password reading is obtained by analyzing the sound waveform of the password-reading video.
Specifically, the video is detected frame by frame with the Resnet-based face detection model; the lip position is determined with the feature-point model of the dlib library, and several frames of lip images are cropped out. The sound start time and sound end time of the digit-password video are obtained by analyzing the sound waveform, and the lip images of the consecutive frames are filtered out according to these start and end times.
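The patent does not specify how the sound waveform is analyzed; one common way to locate the start and end of speech is a short-term energy threshold. A minimal sketch under that assumption (the function name, frame length, and threshold value are illustrative, not taken from the source):

```python
def speech_bounds(samples, frame_len=400, threshold=0.1):
    """Return (start, end) sample indices of the voiced region,
    found by thresholding the mean energy of fixed-size frames."""
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            voiced.append(i)
    if not voiced:
        return None
    return voiced[0], voiced[-1] + frame_len

# Silence, then a loud segment, then silence again.
wave = [0.0] * 800 + [0.5, -0.5] * 400 + [0.0] * 800
print(speech_bounds(wave))  # → (800, 1600)
```

In practice the threshold would be set relative to the recording's noise floor rather than fixed.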
In a specific embodiment, the face detection model detects the face in the video frame by frame and then confirms the position of the lips according to dlib's feature-point model, so that lip pictures can be cropped from the video. An illustrative example: if the user reads 1234 in the video, the sample is the set of cropped lip-image frames, and the label is the pinyin corresponding to 1234 — "yi er san si" — with the different pinyin syllables separated by spaces.
For example, if the user reads the digit 6 in the video, the sample is the cropped lip-image frames and the label is the pinyin corresponding to 6: "liu". The start and end times of each digit's sound are obtained from the sound waveform; based on these times, the bidirectional-LSTM-based lip-language recognition model determines a segment of consecutive face frames in the video, recognizes the digits from the lip movement, and compares the recognized digits with the password pre-stored in the background. If they are consistent, the check passes.
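The digit-to-pinyin labelling described in the two examples above can be sketched as follows (the helper name is hypothetical; the mapping itself is standard pinyin for the digits 0-9):

```python
# Pinyin labels for the ten digits, as used for the training labels above.
DIGIT_PINYIN = {
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}

def digits_to_label(password: str) -> str:
    """Join the pinyin of each digit with spaces, e.g. "1234" -> "yi er san si"."""
    return " ".join(DIGIT_PINYIN[d] for d in password)

print(digits_to_label("1234"))  # → yi er san si
print(digits_to_label("6"))     # → liu
```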
S130: Determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
The method of preprocessing the lip images in step S130 includes: S310, normalizing the lip images; S320, storing the normalized lip images as dataset samples in a prescribed format. The storage format of the dataset samples is [number of samples, sequence length, image length, image width, image depth].
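A minimal numpy sketch of steps S310-S320, assuming normalization to [0, 1] and zero-padding of short clips (the patent fixes only the storage format; the padding and scaling choices here are illustrative assumptions):

```python
import numpy as np

def build_dataset(clips, seq_len, h, w, depth=1):
    """Stack variable-length lip clips into the prescribed
    [samples, sequence length, image length, image width, image depth]
    array, zero-padding short clips (these zeros are what a Masking
    layer can later filter out)."""
    data = np.zeros((len(clips), seq_len, h, w, depth), dtype=np.float32)
    for i, clip in enumerate(clips):
        frames = np.asarray(clip, dtype=np.float32) / 255.0  # normalize to [0, 1]
        data[i, :len(frames)] = frames
    return data

# Two clips of 3 and 5 lip frames, each frame 8x16 grayscale.
clips = [np.random.randint(0, 256, (3, 8, 16, 1)),
         np.random.randint(0, 256, (5, 8, 16, 1))]
data = build_dataset(clips, seq_len=5, h=8, w=16)
print(data.shape)  # → (2, 5, 8, 16, 1)
```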
S140: Input the lip feature set into the trained bidirectional-LSTM-based lip-language recognition model to obtain the lip-password prediction.
S150: If the lip-password prediction is consistent with the password stored in the lip-language recognition model, confirm that the subject passes lip-password recognition.
In summary, using a lip-password-based face recognition model for lip-password recognition achieves the technical effects of low usage cost and high recognition accuracy.
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional-LSTM-based lip-language recognition model according to an embodiment of the present application. As shown in FIG. 2, the construction method includes steps S210 to S240.
S210: Construct an initial network layer for acquiring the lip features of the subject to be tested; the initial network layer is a 2D convolutional network.
Specifically, a Masking layer is first established to filter out the data padded with zeros in the input samples; then the number of convolution kernels is designed and the initial 2D convolutional network layer is built, which extracts the feature values of each frame of the input data. The convolution kernels are sized according to the lip size of the samples in the training set. Each frame of a training-set sample is multiplied and summed with the convolution kernels of the network, and the feature values are obtained after three layers of convolution.
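The per-frame "multiply with the convolution kernel and sum" operation can be illustrated in isolation (a single valid-padding 2D convolution with one kernel; the kernel size and values here are arbitrary, whereas the text above sizes the kernels to the lip size of the training samples):

```python
import numpy as np

def conv2d(frame, kernel):
    """Slide the kernel over a single-channel frame (valid padding,
    stride 1), multiplying and summing at each position."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

frame = np.arange(25, dtype=float).reshape(5, 5)   # a toy 5x5 lip frame
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
features = conv2d(frame, kernel)
print(features.shape)  # → (3, 3)
```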
S220: On the initial network layer, construct a bidirectional LSTM layer for extracting the temporal features of the training-set data.
Mouth articulation is a dynamic process; the frames before and after the current frame must be considered to derive the hidden-layer information of the current frame more accurately. In other words, the bidirectional LSTM layer takes into account the influence of both the "past" and the "future" on the current frame.
In the bidirectional LSTM layer, the number of time steps equals the number of lip images extracted from the video.
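The structural idea of the bidirectional layer — combining a forward pass over time 1..t with a backward pass over t..1 and joining both hidden states per time step — can be sketched as follows, with a plain tanh RNN cell standing in for the LSTM cell for brevity (all dimensions are illustrative):

```python
import numpy as np

def bidirectional_pass(xs, w_in, w_rec, hidden):
    """Run a simple recurrent cell over the sequence forward and
    backward, then concatenate both hidden states at each time step."""
    def run(seq):
        h = np.zeros(hidden)
        out = []
        for x in seq:
            h = np.tanh(x @ w_in + h @ w_rec)
            out.append(h)
        return out

    fwd = run(xs)                 # "past" context, time 1 .. t
    bwd = run(xs[::-1])[::-1]     # "future" context, time t .. 1
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
T, feat, hidden = 6, 4, 8         # 6 lip frames -> 6 time steps
xs = rng.normal(size=(T, feat))
outs = bidirectional_pass(xs, rng.normal(size=(feat, hidden)),
                          rng.normal(size=(hidden, hidden)), hidden)
print(len(outs), outs[0].shape)   # 6 time steps, 16-dim each
```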
S230: On the bidirectional LSTM layer, construct a Softmax layer for outputting the lip-password prediction.
Specifically, the Softmax layer classifies a segment of lip-movement images into the 10 classes 0-9. It should be noted that an output layer using Softmax has multiple units — in fact, as many units as there are classes. In this example there are 10 classes, so there are 10 neural units representing them. Under the action of Softmax, each unit computes the probability that the current sample belongs to its class.
The temporal value X produced by the neural network is passed through the Softmax layer to obtain the sample's lip-password prediction; in other words, the Softmax layer computes the lip-password prediction from the temporal feature data extracted by the bidirectional LSTM layer using the formula P = W*X + b, where P is the lip-password prediction, X is the temporal feature data, W is the weight, and b is the bias.
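A minimal numpy rendering of the formula above, with the Softmax normalization that turns P = W*X + b into the per-class probabilities described earlier (the dimensions are illustrative assumptions):

```python
import numpy as np

def softmax_layer(x, w, b):
    """Affine map P = W*X + b followed by Softmax, so each of the ten
    digit classes gets a probability and the probabilities sum to 1."""
    logits = w @ x + b
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
x = rng.normal(size=32)                 # temporal features from the BiLSTM
w = rng.normal(size=(10, 32))           # 10 classes: digits 0-9
b = np.zeros(10)
p = softmax_layer(x, w, b)
print(int(p.argmax()), round(float(p.sum()), 6))  # predicted digit, total 1.0
```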
In general, the neural network of the lip-language recognition model comprises: three 2D convolutional layers, two bidirectional LSTM layers, a fully connected layer whose activation function is Softmax, and finally a logical prediction layer (i.e., the optimization network layer).
S240: Construct an optimization network layer on the Softmax layer; the optimization network layer feeds the lip-password prediction into a loss function for iterative training until the value of the loss function reaches a set threshold.
During training of the neural network, the training-set data are fed into the network and, after layer-by-layer computation, yield the lip-password prediction. The prediction and the true label are fed into the loss function to compute the loss, and the model parameters are corrected through backpropagation. In other words, the loss function is used to iteratively train the above neural network model until the loss reaches the set threshold.
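The loop described above — predict, compute the loss against the true label, correct the parameters, stop at the threshold — can be sketched on a toy single-sample softmax classifier (the real model backpropagates through the convolutional and LSTM layers as well; learning rate, threshold, and dimensions here are illustrative):

```python
import numpy as np

def train_until_threshold(x, w, b, label, lr=0.5, threshold=0.05, max_iter=500):
    """Iterate: predict with Softmax, compute the cross-entropy loss
    against the true label, correct W and b from the gradient, and stop
    once the loss reaches the set threshold."""
    for step in range(max_iter):
        logits = w @ x + b
        e = np.exp(logits - logits.max())
        p = e / e.sum()
        loss = -np.log(p[label])
        if loss <= threshold:
            return loss, step
        grad = p.copy()
        grad[label] -= 1.0            # d(loss)/d(logits) for softmax + CE
        w -= lr * np.outer(grad, x)   # backpropagated parameter correction
        b -= lr * grad
    return loss, max_iter

rng = np.random.default_rng(2)
x = rng.normal(size=32)
loss, steps = train_until_threshold(x, rng.normal(size=(10, 32)) * 0.01,
                                    np.zeros(10), label=6)
print(loss <= 0.05)  # → True: training stopped at the threshold
```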
In general, the bidirectional-LSTM-based lip-language recognition model of the present application is a bidirectional recurrent neural network in which the Forward layer and the Backward layer are jointly connected to the output layer, involving six shared weights w1-w6: the connection weights w between neurons and the bias b of each neuron. The parameters are tuned with the gradient descent algorithm, adjusting them along the gradient direction. The larger the gradient of the activation function, the faster w and b are adjusted and the faster training converges. A commonly used activation function in neural networks is the sigmoid function.
The Forward layer computes forward from time 1 to time t, obtaining and saving the output of the forward hidden layer at each time step. The Backward layer computes in reverse from time t to time 1, obtaining and saving the output of the backward hidden layer at each time step. Finally, at each time step, the outputs of the Forward and Backward layers at the corresponding time are combined to obtain the final output.
In summary, a bidirectional LSTM network performs better on time-series classification tasks: it uses both the history and the future of the time series together with contextual information, and the combined judgment it produces is more accurate.
FIG. 3 is a schematic diagram of the principle of the lip-password-based face recognition method according to an embodiment of the present application. As shown in FIG. 3, the lip images of consecutive frames are first obtained from the video of the subject reading the password; the obtained lip images are then normalized, and the normalized lip images are stored as dataset samples in a prescribed format. The prescribed format is the storage format of the dataset samples; in a specific embodiment of the present application, it is [number of samples, sequence length, image length, image width, image depth].
The above dataset is split 8:2 into two parts, one used as the training set and the other as the test set, the training set being used to train the bidirectional-LSTM-based lip-language recognition model. The steps for constructing the model are: first design the number of convolution kernels and build the 2D convolutional network; then design the number of hidden-layer cells and the number of time steps and build the bidirectional LSTM network; then build the Softmax layer, which uses the Softmax function to classify the temporal features of the obtained password-reading video and finally selects the node with the largest probability (i.e., the largest value) as the output lip-password prediction. In the last step, the loss function is used to optimize the network, and iterative training with the loss function finally yields the bidirectional-LSTM-based lip-language recognition model with the best recognition performance.
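The 8:2 split can be sketched as follows (the helper name and the fixed shuffle seed are illustrative):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle the dataset samples and split them 8:2 into a
    training set and a test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

train, test = split_dataset(range(100))
print(len(train), len(test))  # → 80 20
```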
The samples of the test set are used to test the bidirectional-LSTM-based lip-language recognition model. Once tested, the model can make predictions on lip-language videos.
The present application provides a lip-password-based face recognition system 400. FIG. 4 shows the structure of a preferred embodiment of the lip-password-based face recognition system according to the present application.
As shown in FIG. 4, a lip-password-based face recognition system 400 includes a lip feature set acquisition unit 410, a lip-password prediction acquisition unit 420, and a password determination unit 430, wherein:
The lip feature set acquisition unit 410 is used to obtain the video of the subject reading the password; to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; and to determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
The lip-password prediction acquisition unit 420 is used to input the lip feature set into the trained bidirectional-LSTM-based lip-language recognition model to obtain the lip-password prediction.
The password determination unit 430 is used to compare the lip-password prediction with the password stored in the lip-language recognition model; if they are consistent, it confirms that the subject passes lip-password recognition.
In a specific embodiment, the lip feature set acquisition unit 410 includes a video acquisition subunit 411, a lip image acquisition subunit 412, and a lip feature set acquisition subunit 413. The video acquisition subunit is used to obtain the video of the subject reading the password; the lip image acquisition subunit is used to detect the video frame by frame with the Resnet-based face detection model to obtain the lip images of the consecutive frames within the time period from the start to the end of the subject's password reading; and the lip feature set acquisition subunit is used to determine the lip feature set for the time period from the lip images of the consecutive frames in that period.
Specifically, the lip language feature set acquisition subunit 413 includes an image normalization module, which normalizes the lip images, and a data set sample acquisition module, which stores the normalized lip images as data set samples in a prescribed format.
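The claims later spell the prescribed sample format out as number of samples, data sequence length, image height, image width, and image depth. A minimal sketch of that normalization and packing step follows; the 64x64 target size, the [0, 1] scaling, and the nearest-neighbour resize are assumptions, not details taken from the patent.

```python
import numpy as np

def normalize_clip(frames, size=(64, 64)):
    """Normalize a list of lip-region crops into one fixed-shape sequence.

    Returns an array of shape (T, H, W, C) with values scaled to [0, 1].
    The 64x64 target size is an assumed choice, not taken from the patent.
    """
    out = []
    for f in frames:
        # Nearest-neighbour resize via index sampling (a stand-in for a real
        # image-resize routine such as cv2.resize).
        h, w = f.shape[:2]
        ys = np.linspace(0, h - 1, size[0]).astype(int)
        xs = np.linspace(0, w - 1, size[1]).astype(int)
        out.append(f[np.ix_(ys, xs)].astype(np.float32) / 255.0)
    return np.stack(out)

def pack_dataset(clips):
    """Stack per-clip sequences into one data set sample tensor of shape
    (num_samples, sequence_length, height, width, depth)."""
    return np.stack([normalize_clip(c) for c in clips])

clips = [[np.full((30, 40, 3), 128, dtype=np.uint8)] * 25] * 4  # 4 clips of 25 frames
data = pack_dataset(clips)
print(data.shape)  # (4, 25, 64, 64, 3)
```

The resulting five-dimensional tensor matches the sample layout the claims describe and can be fed directly into a 2D-convolution-plus-LSTM pipeline.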
It should be noted that the lip image acquisition subunit 412 includes a video detection module, which detects the video frame by frame with the ResNet-based face detection model, and a lip image acquisition module, which obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading.
Specifically, the lip image acquisition module includes a lip feature information acquisition submodule, which parses the lip images with the feature point model of the dlib library to obtain lip feature information, and a time point acquisition submodule, which obtains the period from the start to the end of the subject's password reading by parsing the audio waveform of the password-reading video.
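The audio-based time point acquisition can be pictured with a short-time energy threshold over the waveform. This is only one plausible reading of "parsing the audio waveform"; the frame length and the threshold ratio below are assumed values.

```python
import numpy as np

def speech_interval(waveform, rate, frame_len=400, thresh_ratio=0.1):
    """Return (start_sec, end_sec) of the high-energy region of a waveform.

    waveform     -- 1-D float array of audio samples
    rate         -- sample rate in Hz
    frame_len    -- samples per analysis frame (assumed: 25 ms at 16 kHz)
    thresh_ratio -- energy threshold as a fraction of the peak frame energy
    """
    n = len(waveform) // frame_len
    frames = waveform[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = np.where(energy > thresh_ratio * energy.max())[0]
    start = active[0] * frame_len / rate
    end = (active[-1] + 1) * frame_len / rate
    return float(start), float(end)

# Synthetic example: silence, a 1-second tone burst, silence, at 16 kHz.
rate = 16000
wave = np.zeros(3 * rate)
wave[rate : 2 * rate] = np.sin(2 * np.pi * 220 * np.arange(rate) / rate)
print(speech_interval(wave, rate))  # approximately (1.0, 2.0)
```

The returned interval would then delimit which video frames contribute lip images to the feature set.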
The lip language password prediction unit 420 includes a building module 421 for the bidirectional-LSTM-based lip language recognition model. The building module 421 includes an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module. The initial network layer construction submodule constructs the initial network layer, a 2D convolutional network, used to extract the lip features of the subject under test. The bidirectional LSTM layer construction submodule constructs, on the initial network layer, the bidirectional LSTM layer used to extract temporal features from the training set data. The Softmax layer construction module constructs, on the bidirectional LSTM layer, the Softmax layer that outputs the predicted lip language password. The optimization network layer construction module constructs an optimization network layer on the Softmax layer; the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
The Softmax layer construction module includes a lip language password prediction submodule, which obtains the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer, and a parameter adjustment submodule, which adjusts the weight W and the offset b along the gradient direction using the sigmoid activation function and a gradient descent algorithm.
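A toy version of the output layer may make the prediction and parameter-adjustment submodules concrete: the temporal feature vector X from the bidirectional LSTM is mapped to class scores by P = W*X + b, a softmax turns the scores into password-token probabilities, and W and b are moved along the negative gradient. The dimensions, the cross-entropy loss, and the fixed learning rate are assumptions for illustration; the embodiment's sigmoid-dependent step size is replaced here by a constant rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy dimensions (assumed): a 32-d temporal feature vector from the
# bidirectional LSTM, and 10 possible password tokens.
dim, classes = 32, 10
W = rng.normal(0.0, 0.1, (classes, dim))   # weight W
b = np.zeros(classes)                      # offset b
x = rng.normal(size=dim)                   # temporal feature data X
x = x / np.linalg.norm(x)                  # normalized for a stable toy step size
target = 3                                 # index of the true password token

losses = []
for _ in range(200):
    p = softmax(W @ x + b)                    # P = W*X + b, then softmax
    losses.append(float(-np.log(p[target])))  # cross-entropy loss (assumed)
    grad = p.copy()                           # gradient of the loss w.r.t. scores
    grad[target] -= 1.0
    W -= 0.1 * np.outer(grad, x)              # descend along the gradient direction
    b -= 0.1 * grad

print(losses[0], losses[-1])  # the loss shrinks as the iterative training runs
```

In the patent, this iteration is what the optimization network layer performs until the loss value reaches the set threshold.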
The present application further provides a face recognition method based on a lip language password, applied to an electronic device 5. Fig. 5 is a schematic diagram of the application environment of a preferred embodiment of the method.
In this embodiment, the electronic device 5 may be any terminal device with computing capability, such as a server, a smartphone, a tablet computer, a laptop, or a desktop computer.
As shown in Fig. 5, the electronic device 5 includes a processor 52, a memory 51, a communication bus 53, and a network interface 54.
The memory 51 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium is an internal storage unit of the electronic device 5, such as its hard disk; in other embodiments, it is external storage fitted to the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
In this embodiment, the readable storage medium of the memory 51 is generally used to store the lip-language-password-based face recognition program 50 installed on the electronic device 5; the memory 51 may also temporarily hold data that has been or is about to be output.
In some embodiments, the processor 52 may be a central processing unit (CPU), a microprocessor, or another data processing chip that runs the program code or processes the data stored in the memory 51, for example executing the face recognition program 50.
The communication bus 53 provides the connections and communication between these components.
The network interface 54 may include a standard wired interface and/or a wireless interface (such as a Wi-Fi interface) and is generally used to establish communication connections between the electronic device 5 and other electronic devices.
Fig. 5 shows only an electronic device 5 with the components 51-54, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
Optionally, the electronic device 5 may further include a user interface, which may comprise an input unit such as a keyboard, a voice input device with speech recognition such as a microphone, and a voice output device such as a speaker or earphones; the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 5 may also include a display, also called a display screen or display unit, which in some embodiments may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch display. The display presents the information processed in the electronic device 5 as well as a visual user interface.
Optionally, the electronic device 5 may also include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described further here.
In the device embodiment shown in Fig. 5, the memory 51, as a computer storage medium, may store an operating system and the lip-language-password-based face recognition program 50. When the processor 52 executes the program 50, the following steps are carried out: S110, obtain a password-reading video of the subject under test; S120, detect the video frame by frame with the ResNet-based face detection model and obtain the lip images of consecutive frames during the period from the start to the end of the subject's password reading; S130, determine the lip language feature set for that period from the lip images of the consecutive frames; S140, input the feature set into the trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password; S150, if the predicted password matches the password stored in the recognition model, confirm that the subject has passed lip language password recognition.
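Step S150 reduces to an exact comparison between the predicted and the stored password. A minimal sketch of that decision step, assuming the passwords are token sequences:

```python
def verify(predicted_tokens, stored_tokens):
    """Step S150 as a function: the subject passes lip language password
    recognition only when the predicted password exactly matches the one
    stored with the recognition model."""
    return list(predicted_tokens) == list(stored_tokens)

print(verify([3, 1, 4, 1], [3, 1, 4, 1]))  # True
print(verify([3, 1, 4, 2], [3, 1, 4, 1]))  # False
```

Any mismatch, even in a single token, fails the check, which is the strict equality the method describes.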
In other embodiments, the lip-language-password-based face recognition program 50 may also be divided into one or more modules, which are stored in the memory 51 and executed by the processor 52 to carry out this application. A module, as referred to in this application, is a series of computer program instruction segments that accomplish a specific function.
The program 50 may be divided into a lip language feature set acquisition unit, a lip language password prediction unit, and a password determination unit; the functions and operation steps they implement are similar to those described above and are not detailed again here.
In addition, an embodiment of the present application further provides a computer-readable storage medium containing a lip-language-password-based face recognition program which, when executed by a processor, carries out the following operations: S110, obtain a password-reading video of the subject under test; S120, detect the video frame by frame with the ResNet-based face detection model and obtain the lip images of consecutive frames during the period from the start to the end of the subject's password reading; S130, determine the lip language feature set for that period from the lip images of the consecutive frames; S140, input the feature set into the trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password; S150, if the predicted password matches the password stored in the recognition model, confirm that the subject has passed lip language password recognition.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the face recognition method and electronic device described above and is not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and their variants are intended to cover non-exclusive inclusion, so that a process, device, article, or method comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, device, article, or method. Absent further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, device, article, or method that comprises it.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. From the description of the above implementations, a person skilled in the art will clearly understand that the methods of the above embodiments can be realized by software together with a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of this application, or in essence the part that contributes beyond the prior art, can be embodied as a software product stored on a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, and so on) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the content of this specification and the drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A face recognition method based on a lip language password, applied to an electronic device, wherein the method comprises:
    S110, obtaining a password-reading video of a subject under test;
    S120, detecting the video frame by frame with a ResNet-based face detection model, and obtaining lip images of consecutive frames during the period from the moment the subject starts reading the password to the moment it finishes;
    S130, determining a lip language feature set for the period from the lip images of the consecutive frames;
    S140, inputting the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    S150, if the predicted lip language password matches the password stored in the lip language recognition model, confirming that the subject has passed lip language password recognition.
  2. The face recognition method based on a lip language password according to claim 1, wherein constructing the bidirectional-LSTM-based lip language recognition model comprises:
    S210, constructing an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    S240, constructing an optimization network layer on the Softmax layer, wherein the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  3. The face recognition method based on a lip language password according to claim 2, wherein step S140 comprises:
    in the Softmax layer, computing the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer with the prediction formula P = W*X + b,
    where P is the predicted lip language password, X is the temporal feature data, W is the weight, and b is the offset.
  4. The face recognition method based on a lip language password according to claim 1, wherein step S130, determining the lip language feature set for the period, comprises:
    S310, normalizing the lip images;
    S320, storing the normalized lip images as data set samples in a prescribed format, the data set samples comprising the number of samples, data sequence length, image height, image width, and image depth.
  5. The face recognition method based on a lip language password according to claim 1, wherein in step S120 the lip images are parsed with the feature point model of the dlib library to obtain lip feature information, and the period from the start to the end of the subject's password reading is obtained by parsing the audio waveform of the password-reading video.
  6. The face recognition method based on a lip language password according to claim 3, wherein
    in the Softmax layer, the sigmoid activation function and a gradient descent algorithm are used to adjust the weight W and the offset b along the gradient direction, such that the larger the gradient of the sigmoid activation function, the faster W and b are adjusted.
  7. A face recognition system based on a lip language password, comprising a lip language feature set acquisition unit, a lip language password prediction unit, and a password determination unit, wherein
    the lip language feature set acquisition unit obtains a password-reading video of a subject under test, detects the video frame by frame with a ResNet-based face detection model, obtains lip images of consecutive frames during the period from the start to the end of the subject's password reading, and determines a lip language feature set for the period from the lip images of the consecutive frames;
    the lip language password prediction unit inputs the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    the password determination unit compares the predicted lip language password with the password stored in the lip language recognition model and, if they match, confirms that the subject has passed lip language password recognition.
  8. The face recognition system according to claim 7, wherein the lip language feature set acquisition unit includes a video acquisition subunit, a lip image acquisition subunit, and a lip language feature set acquisition subunit, wherein
    the video acquisition subunit obtains the password-reading video of the subject under test;
    the lip image acquisition subunit detects the video frame by frame with the ResNet-based face detection model and obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading;
    the lip language feature set acquisition subunit determines the lip language feature set for the period from the lip images of the consecutive frames.
  9. The face recognition system according to claim 8, wherein the lip language feature set acquisition subunit includes an image normalization module, which normalizes the lip images, and a data set sample acquisition module, which stores the normalized lip images as data set samples in a prescribed format.
  10. The face recognition system according to claim 8, wherein the lip image acquisition subunit includes a video detection module, which detects the video frame by frame with the ResNet-based face detection model, and a lip image acquisition module, which obtains the lip images of consecutive frames during the period from the start to the end of the subject's password reading.
  11. The face recognition system according to claim 10, wherein the lip image acquisition module includes a lip feature information acquisition submodule, which parses the lip images with the feature point model of the dlib library to obtain lip feature information, and a time point acquisition submodule, which obtains the period from the start to the end of the subject's password reading by parsing the audio waveform of the password-reading video.
  12. The face recognition system according to claim 7, wherein the lip language password prediction unit includes a building module for the bidirectional-LSTM-based lip language recognition model, the building module including an initial network layer construction submodule, a bidirectional LSTM layer construction submodule, a Softmax layer construction module, and an optimization network layer construction module, wherein
    the initial network layer construction submodule constructs an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    the bidirectional LSTM layer construction submodule constructs, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    the Softmax layer construction module constructs, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    the optimization network layer construction module constructs an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  13. The face recognition system according to claim 12, wherein the Softmax layer construction module includes a lip language password prediction submodule, which obtains the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer.
  14. The face recognition system according to claim 12 or 13, wherein the Softmax layer construction module further includes a parameter adjustment submodule, which adjusts the weight W and the offset b along the gradient direction using the sigmoid activation function and a gradient descent algorithm.
  15. An electronic device, comprising a memory and a processor, the memory storing a lip-language-password-based face recognition program which, when executed by the processor, implements the following steps:
    S110, obtaining a password-reading video of a subject under test;
    S120, detecting the video frame by frame with a ResNet-based face detection model, and obtaining lip images of consecutive frames during the period from the start to the end of the subject's password reading;
    S130, determining a lip language feature set for the period from the lip images of the consecutive frames;
    S140, inputting the lip language feature set into a trained bidirectional-LSTM-based lip language recognition model to obtain a predicted lip language password;
    S150, if the predicted lip language password matches the password stored in the lip language recognition model, confirming that the subject has passed lip language password recognition.
  16. The electronic device according to claim 15, wherein constructing the bidirectional-LSTM-based lip language recognition model comprises:
    S210, constructing an initial network layer for extracting lip features of the subject under test, the initial network layer being a 2D convolutional network;
    S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
    S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted lip language password;
    S240, constructing an optimization network layer on the Softmax layer, wherein the optimization network layer feeds the predicted lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
  17. The electronic device according to claim 16, wherein step S140 comprises: in the Softmax layer, computing the predicted lip language password from the temporal feature data extracted by the bidirectional LSTM layer with the prediction formula:
    P = W*X + b,
    where P is the predicted lip language password, X is the temporal feature data, W is the weight, and b is the offset.
  18. 根据权利要求15所述的电子装置,其特征在于,所述步骤S120中,通过dlib数据库的特征点模型解析所述唇语图像获取嘴唇特征信息;所述视频中待测主体读密码的起点时刻至终点时刻的时间段通过解析所述读密码视频的声音波形获取。The electronic device according to claim 15, characterized in that, in the step S120, the lip language image is analyzed through the feature point model of the dlib database to obtain lip feature information; the starting time of the subject to be tested in the video to read the password The time period to the end point is obtained by analyzing the sound waveform of the password-reading video.
  19. 根据权利要求15所述的电子装置,其特征在于,所述确定所述时间段内的唇语特征集步骤S130包括:The electronic device according to claim 15, wherein the step S130 of determining the lip language feature set in the time period comprises:
    S310、将所述唇语图像进行归一化;S310: Normalize the lip language image;
    S320、将归一化后的唇语图像存储为规定格式的数据集样本;所述数据集样本包括:样本数、数据序列长度、图像长、图像宽、图像深度。S320. Store the normalized lip language image as a data set sample in a prescribed format; the data set sample includes: number of samples, data sequence length, image length, image width, and image depth.
  20. A computer-readable storage medium storing a computer program, the computer program comprising a lip-password-based face recognition program which, when executed by a processor, implements the steps of the lip-password-based face recognition method according to any one of claims 1 to 6.
PCT/CN2019/118281 2019-09-19 2019-11-14 Lip password-based face recognition method and system, device, and storage medium WO2021051602A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910885930.2 2019-09-19
CN201910885930.2A CN110717407A (en) 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password

Publications (1)

Publication Number Publication Date
WO2021051602A1 true WO2021051602A1 (en) 2021-03-25

Family

ID=69209940

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118281 WO2021051602A1 (en) 2019-09-19 2019-11-14 Lip password-based face recognition method and system, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110717407A (en)
WO (1) WO2021051602A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132637A (en) * 2023-02-15 2023-05-16 武汉博晟安全技术股份有限公司 Online examination monitoring system and method, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401134A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112089595A (en) * 2020-05-22 2020-12-18 未来穿戴技术有限公司 Login method of neck massager, neck massager and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201055A (en) * 2010-03-25 2011-09-28 索尼公司 Information processing device, information processing method, and program
US9721079B2 (en) * 2014-01-15 2017-08-01 Steve Y Chen Image authenticity verification using speech
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
CN107977559A (en) * 2017-11-22 2018-05-01 杨晓艳 A kind of identity identifying method, device, equipment and computer-readable recording medium
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN109726624A (en) * 2017-10-31 2019-05-07 百度(美国)有限责任公司 Identity identifying method, terminal device and computer readable storage medium
CN110163156A (en) * 2019-05-24 2019-08-23 南京邮电大学 It is a kind of based on convolution from the lip feature extracting method of encoding model

Also Published As

Publication number Publication date
CN110717407A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2019120115A1 (en) Facial recognition method, apparatus, and computer apparatus
WO2019200781A1 (en) Receipt recognition method and device, and storage medium
WO2019109526A1 (en) Method and device for age recognition of face image, storage medium
US9477685B1 (en) Finding untagged images of a social network member
WO2019033525A1 (en) Au feature recognition method, device and storage medium
WO2021051602A1 (en) Lip password-based face recognition method and system, device, and storage medium
WO2010103736A1 (en) Face authentification device, person image search system, face authentification device control program, computer readable recording medium, and method of controlling face authentification device
US11641352B2 (en) Apparatus, method and computer program product for biometric recognition
WO2019085331A1 (en) Fraud possibility analysis method, device, and storage medium
CN108197592B (en) Information acquisition method and device
TWI712980B (en) Claim information extraction method and device, and electronic equipment
US9824313B2 (en) Filtering content in an online system based on text and image signals extracted from the content
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN111626371A (en) Image classification method, device and equipment and readable storage medium
US20220046012A1 (en) Method and System for Verifying the Identity of a User
US9378406B2 (en) System for estimating gender from fingerprints
US20230410220A1 (en) Information processing apparatus, control method, and program
US10755074B2 (en) Latent fingerprint pattern estimation
KR20220016217A (en) Systems and methods for using human recognition in a network of devices
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
US20220004652A1 (en) Providing images with privacy label
US11113838B2 (en) Deep learning based tattoo detection system with optimized data labeling for offline and real-time processing
Nahar et al. Twins and Similar Faces Recognition Using Geometric and Photometric Features with Transfer Learning
CN107341457A (en) Method for detecting human face and device
JP2013069187A (en) Image processing system, image processing method, server and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946025

Country of ref document: EP

Kind code of ref document: A1