CN110717407A - Human face recognition method, device and storage medium based on lip language password

Human face recognition method, device and storage medium based on lip language password

Info

Publication number
CN110717407A
Authority
CN
China
Prior art keywords
password
lip
lip language
language
predicted value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910885930.2A
Other languages
Chinese (zh)
Inventor
张国辉 (Zhang Guohui)
董洪涛 (Dong Hongtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910885930.2A priority Critical patent/CN110717407A/en
Priority to PCT/CN2019/118281 priority patent/WO2021051602A1/en
Publication of CN110717407A publication Critical patent/CN110717407A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The invention belongs to the technical field of biometric recognition and provides a face recognition method, device and storage medium based on a lip language password. The method comprises the following steps: obtaining a password-reading video of a subject to be verified; detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video; determining a lip language feature set for the time period from the consecutive frames of lip language images; inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password; and, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition. The face recognition method based on the lip language password achieves both fast detection and high detection accuracy.

Description

Human face recognition method, device and storage medium based on lip language password
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to a face recognition method, device and storage medium based on a lip language password.
Background
In traditional password identification, the password is a combination of digits, characters or other symbols; the user must remember it accurately and enter it correctly for identification to succeed. Traditional passwords are therefore easy to forget, easy to mis-enter, and cumbersome to use.
The fingerprint, face and iris recognition methods that have become popular in recent years remove the need to memorize and key in a traditional password, but have the following defects: 1. plain fingerprint recognition and face recognition are exposed to fingerprint-copying and photo attacks, and can be fooled by a copied fingerprint or a static photo; 2. iris recognition is highly secure, but its equipment is expensive and the investment cost is high.
In view of the above problems, a secure identification method with low detection cost and no loss of detection accuracy is needed.
Disclosure of Invention
The invention provides a face recognition method based on a lip language password, an electronic device and a computer-readable storage medium. Lip language images are screened with a dlib + Resnet face key point detection model and classified with a bidirectional LSTM-based lip language recognition model, achieving face recognition that uses lip language as a password.
To achieve the above object, the invention provides a face recognition method based on a lip language password, applied to an electronic device, the method comprising: S110, obtaining a password-reading video of a subject to be verified; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video; S130, determining a lip language feature set for the time period from the consecutive frames of lip language images; S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password; S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
Preferably, the method for constructing the bidirectional LSTM-based lip language recognition model comprises:
S210, constructing an initial network layer for extracting lip features of the subject to be verified, the initial network layer being a 2D convolutional network; S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data; S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted value of the lip language password; S240, constructing an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
Preferably, step S140 comprises: in the Softmax layer, converting the temporal feature data extracted by the bidirectional LSTM layer into the predicted value of the lip language password by the prediction formula P = W·X + b, where P is the predicted value of the lip language password, X is the temporal feature data, W is a weight, and b is a bias.
Preferably, step S130 comprises: S310, normalizing the lip language images; S320, storing the normalized lip language images as data set samples in a specified format, a data set sample comprising: the number of samples, the data sequence length, the image width and the image depth.
Preferably, in step S120, the lip language images are parsed with the feature point model of the dlib database to obtain lip feature information, and the time period from the start time to the end time of the subject's password reading in the video is obtained by analyzing the sound waveform of the password-reading video.
To achieve the above object, the invention also provides an electronic device comprising a memory and a processor, the memory storing a face recognition program based on a lip language password which, when executed by the processor, implements the following steps: S110, obtaining a password-reading video of a subject to be verified; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video; S130, determining a lip language feature set for the time period from the consecutive frames of lip language images; S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password; S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
Preferably, the method for constructing the bidirectional LSTM-based lip language recognition model comprises:
S210, constructing an initial network layer for extracting lip features of the subject to be verified, the initial network layer being a 2D convolutional network; S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data; S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted value of the lip language password; S240, constructing an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
Preferably, step S140 comprises: in the Softmax layer, converting the temporal feature data extracted by the bidirectional LSTM layer into the predicted value of the lip language password by the prediction formula P = W·X + b, where P is the predicted value of the lip language password, X is the temporal feature data, W is a weight, and b is a bias.
Preferably, in step S120, the lip language images are parsed with the feature point model of the dlib database to obtain lip feature information, and the time period from the start time to the end time of the subject's password reading in the video is obtained by analyzing the sound waveform of the password-reading video.
In addition, to achieve the above object, the invention further provides a computer-readable storage medium storing a computer program that includes a face recognition program based on a lip language password; when the face recognition program is executed by a processor, the steps of the face recognition method based on a lip language password are implemented.
In the face recognition method based on a lip language password, the electronic device and the computer-readable storage medium, a 2D convolution model, a bidirectional LSTM model, a Softmax layer and an optimization network layer are used to classify the lip language images, and these models are trained: the 2D convolution model extracts the person's lip features; the bidirectional LSTM model links the successive lip images in a video and extracts temporal features; the Softmax layer outputs the predicted value of the lip language password; and the optimization network layer feeds the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold. The result is a face recognition model based on a lip language password. Performing lip language password recognition with this model achieves low cost of use and high recognition accuracy.
Drawings
FIG. 1 is a flowchart illustrating a preferred embodiment of a face recognition method based on a lip-language password according to the present invention;
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional LSTM-based lip language recognition model according to the present invention;
FIG. 3 is a schematic diagram illustrating the principle of a face recognition method based on a lip language password according to the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a preferred embodiment of the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a face recognition method based on a lip language password. FIG. 1 is a flowchart of a preferred embodiment of the face recognition method based on a lip language password according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
Lip language images are screened with a dlib + Resnet face key point detection model and classified with a bidirectional LSTM model, so that lip language can be used as a password for face recognition.
It should be noted that Resnet (Residual Network) is a network structure that can be regarded as a combination of parallel and serial modules. Resnet performed strongly on the classification task of the ImageNet competition and comes in several structural forms, including Resnet-34, Resnet-50, Resnet-101 and Resnet-152. The Resnet architecture follows two design rules: (1) for the same output feature map size, the layers have the same number of filters; (2) if the feature map size is halved, the number of filters is doubled in order to preserve the time complexity of each layer. Its two key techniques are skip connections and the Batch Normalization layer. A Resnet-based face detection algorithm slides windows of different sizes and positions over the image and then judges whether a face is present in each window.
In addition, dlib offers HOG (Histogram of Oriented Gradients) + regression tree detection, and detection with dlib's pretrained models performs considerably better. dlib also provides convolutional neural network based face detection.
The face recognition method based on a lip language password uses the interface of the trained Resnet model in dlib, which returns a 128-dimensional face feature vector.
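For illustration, a minimal sketch of how this dlib interface is typically called; the model file names below are dlib's standard published pretrained files (not specified by the patent), and "frame.jpg" is a placeholder:

```python
import dlib

# dlib's standard pretrained model files (published by dlib, assumed available locally)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

img = dlib.load_rgb_image("frame.jpg")
for rect in detector(img, 1):                    # detect faces in the frame
    shape = predictor(img, rect)                 # 68 facial landmarks
    vec = face_rec.compute_face_descriptor(img, shape)
    print(len(vec))                              # 128-dimensional face feature vector
```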
As shown in fig. 1, in the present embodiment, the face recognition method based on the lip language password includes steps S110 to S150.
S110, obtaining a password-reading video of the subject to be verified.
The subject to be verified is the user undergoing password verification; the user must read out a preset voice password so that password detection can be performed. Specifically, the detection device captures a video of the subject reading the password.
In a specific embodiment, the user chooses 4-6 digits as the password, faces the camera, and reads the digits at a constant speed of roughly one digit per second; the reading may be in Mandarin, a dialect, English, and so on. The information may be collected 4-10 times. The user memorizes the 4-6 digit number as the password and, on the next entry, reads the password aloud to complete recognition.
S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video.
In a specific embodiment, the lip feature information is obtained by parsing the lip language images with the feature point model of the dlib database, and the time period from the start time to the end time of the subject's password reading in the video is obtained by analyzing the sound waveform of the password-reading video, as sketched below.
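The patent does not fix the waveform-segmentation algorithm; the short-window energy threshold below is an assumption, and "password.wav" (a mono 16-bit extraction of the video's audio track) is a placeholder:

```python
import wave
import numpy as np

# assumption: the password audio has been extracted to a mono 16-bit WAV file
with wave.open("password.wav", "rb") as wf:
    rate = wf.getframerate()
    samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

win = int(0.02 * rate)                            # 20 ms analysis windows
energy = np.array([np.abs(samples[i:i + win].astype(np.float32)).mean()
                   for i in range(0, len(samples) - win, win)])
voiced = energy > 0.1 * energy.max()              # tunable energy threshold

t_start = np.argmax(voiced) * win / rate                       # first voiced window
t_end = (len(voiced) - np.argmax(voiced[::-1])) * win / rate   # last voiced window
```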
Specifically, the video is detected frame by frame with the Resnet-based face detection model, the lip position is located with the feature point model of the dlib database, and a lip image is cropped from each frame; the consecutive frames of lip language images are then selected according to the start and end times of the sound in the digital-password video (see the sketch below).
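A minimal sketch of this frame-by-frame lip cropping, using dlib's 68-point landmark model, in which points 48-67 outline the mouth; the video file name, the padding value, and the t_start/t_end values from the audio sketch above are illustrative assumptions:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_lips(rgb_frame, pad=5):
    """Crop the lip region of the first detected face (None if no face)."""
    rects = detector(rgb_frame, 1)
    if not rects:
        return None
    shape = predictor(rgb_frame, rects[0])
    # points 48-67 of the 68-point model outline the mouth
    xs = [shape.part(i).x for i in range(48, 68)]
    ys = [shape.part(i).y for i in range(48, 68)]
    return rgb_frame[min(ys) - pad:max(ys) + pad, min(xs) - pad:max(xs) + pad]

cap = cv2.VideoCapture("password_video.mp4")   # placeholder file name
fps = cap.get(cv2.CAP_PROP_FPS)
lip_frames, t = [], 0.0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if t_start <= t <= t_end:                  # only frames inside the spoken span
        lips = crop_lips(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if lips is not None:
            lip_frames.append(lips)
    t += 1.0 / fps
cap.release()
```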
In a specific embodiment, the face detection model detects the face in each frame of the video, the lip position is then confirmed with dlib's feature point model, and the lip pictures are cropped from the video. For example, if the user reads 1234 in the video, the samples are the cropped lip pictures and the label is the pinyin corresponding to 1234: yi er san si, with different pinyin syllables separated by spaces.
Likewise, if the digit 6 is read in the user's video, the samples are the cropped lip pictures and the label is the pinyin corresponding to 6: liu. The start and end times of each digit's sound are obtained from the sound waveform; the bidirectional LSTM-based lip language recognition model takes the consecutive frames of face images delimited by those start and end times, recognizes the digit from the lip movement, and compares the recognized digits with the password stored in advance in the background; if they match, recognition passes.
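For illustration, the digit-to-pinyin labelling used in these examples can be generated with a small lookup table; this is a sketch, and the table itself is simply standard Mandarin pinyin for the ten digits, not quoted from the patent:

```python
# standard pinyin for the ten digits; matches the labelling in the examples above
PINYIN = {"0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
          "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu"}

def password_label(password: str) -> str:
    """Build the space-separated pinyin label for a digit password."""
    return " ".join(PINYIN[d] for d in password)

print(password_label("1234"))   # -> "yi er san si"
print(password_label("6"))      # -> "liu"
```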
S130, determining a lip language feature set for the time period from the consecutive frames of lip language images.
The preprocessing of the lip language images in step S130 comprises: S310, normalizing the lip language images; S320, storing the normalized lip language images as data set samples in a specified format, the storage format of a data set sample being [number of samples, data sequence length, image width, image depth].
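A minimal sketch of this preprocessing, assuming grayscale lip crops, zero-padding to a fixed sequence length, and illustrative image sizes; none of these values, nor the all_lip_sequences variable, are specified by the patent:

```python
import cv2
import numpy as np

SEQ_LEN, H, W = 40, 50, 100          # illustrative values, not fixed by the patent

def to_sample(lip_frames):
    """Normalize one sequence of lip crops into a fixed-shape sample."""
    seq = np.zeros((SEQ_LEN, H, W, 1), dtype=np.float32)    # zero-padded tail; the
    # Masking layer described later filters these all-zero frames
    for i, img in enumerate(lip_frames[:SEQ_LEN]):
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        seq[i, :, :, 0] = cv2.resize(gray, (W, H)) / 255.0  # scale pixels to [0, 1]
    return seq

# stacking all sequences gives the stored sample array:
# [number of samples, data sequence length, image height, image width, image depth]
dataset = np.stack([to_sample(f) for f in all_lip_sequences])  # all_lip_sequences: assumed
```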
S140, inputting the lip language feature set into the trained bidirectional LSTM-based lip language recognition model to obtain the predicted value of the lip language password.
S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
In conclusion, performing lip language password recognition with the face recognition model based on a lip language password achieves low cost of use and high recognition accuracy.
FIG. 2 is a flowchart of a preferred embodiment of the method for constructing the bidirectional LSTM-based lip language recognition model according to an embodiment of the present invention. As shown in FIG. 2, the construction method comprises steps S210 to S240.
S210, constructing an initial network layer for extracting lip features of the subject to be verified; the initial network layer is a 2D convolutional network.
Specifically, a Masking layer is first established to filter out the all-zero padding in the input samples. The number of convolution kernels is then designed and the initial network layer of the 2D convolutional network is constructed; the 2D convolutional network extracts a feature value from each frame of the input data. The convolution kernels are sized according to the size of the sample lips in the training set data. Each frame of a training set sample is multiplied with and summed against the convolution kernels in the network, and a feature value is obtained after three layers of convolution.
And S220, constructing a bidirectional LSTM layer for extracting the time features in the training set data on the initial network layer.
Speech is a dynamic process of the mouth, and the hidden-layer information of the current frame can be obtained accurately only by also looking at the frames before and after it; that is, the bidirectional LSTM layer takes into account the influence of both the "past" and the "future" on the current frame.
In the bidirectional LSTM layer, the step length equals the number of lip images extracted from the video.
S230, constructing a Softmax layer for outputting the predicted value of the lip language password on the bidirectional LSTM layer.
Specifically, the Softmax layer classifies the lip motion images into 10 classes, corresponding to the digits 0-9. An output layer with Softmax has as many units as there are classes; in this example there are 10 classes, so there are 10 neural units, one per class. Under the action of Softmax, each unit computes the probability that the current sample belongs to its class.
The temporal value X produced by the neural network passes through the Softmax layer to yield the sample's predicted lip language password. In other words, the Softmax layer converts the temporal feature data extracted by the bidirectional LSTM layer into the predicted value of the lip language password by the formula P = W·X + b, where P is the predicted value of the lip language password, X is the temporal feature data, W is a weight, and b is a bias.
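As a worked illustration of this formula (a sketch with assumed shapes; the Softmax function turns W·X + b into class probabilities):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

X = np.random.randn(128)         # temporal feature data from the BiLSTM (assumed size)
W = np.random.randn(10, 128)     # learned weights, one row per class (digits 0-9)
b = np.random.randn(10)          # learned bias

P = softmax(W @ X + b)           # predicted probability of each lip password class
print(P.argmax(), P.max())       # most probable digit and its probability
```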
Overall, the neural network of the lip language recognition model comprises three layers of 2D convolution, two bidirectional LSTM layers, a fully connected layer with a Softmax activation function, and finally an added logic prediction layer (the optimization network layer).
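A minimal Keras sketch of such a network, under several assumptions: the layer sizes are illustrative, the convolutions are wrapped in TimeDistributed so they apply per frame, and the Masking layer is placed just before the recurrent layers (a simplification of the Masking-first design described above):

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 10                      # one class per digit 0-9
SEQ_LEN, H, W = 40, 50, 100           # illustrative sizes, as above

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 1)),
    # three 2D convolution layers, applied to every frame in the sequence
    layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),
    layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),
    layers.TimeDistributed(layers.Conv2D(96, 3, activation="relu")),
    # collapse each frame's feature map into one feature vector
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    # mask (approximately) the zero-padded timesteps before the LSTMs
    layers.Masking(mask_value=0.0),
    # two bidirectional LSTM layers extract the temporal features
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(128)),
    # fully connected layer with Softmax outputs the lip password prediction
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()
```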
S240, constructing an optimized network layer on the Softmax layer; and the optimization network layer is used for inputting the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold value.
During training, the training set data is fed into the neural network, the predicted value of the lip language password is computed layer by layer, the prediction and the true label are fed into the loss function to compute the Loss, and the model parameters are corrected by back propagation. In other words, the neural network model is trained iteratively with the loss function until the Loss reaches the set threshold.
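A sketch of this training loop in Keras; "train until the loss reaches a set threshold" is approximated here with early stopping, and digit_labels is an assumed integer label array aligned with the dataset from the preprocessing sketch:

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical

model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # Loss computed against the true labels
              metrics=["accuracy"])

# approximates "iterate until the loss reaches the set threshold":
# stop once the loss stops improving by more than min_delta
stop = EarlyStopping(monitor="loss", min_delta=1e-4, patience=5)

labels = to_categorical(np.asarray(digit_labels), num_classes=10)  # digit_labels: assumed
model.fit(dataset, labels, epochs=100, batch_size=16, callbacks=[stop])
```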
Overall, the bidirectional LSTM-based lip language recognition model is a bidirectional recurrent neural network in which a Forward layer and a Backward layer are jointly connected to the output layer, with six shared weights w1-w6. The parameters are the connection weights w between neurons and the bias b of each neuron, and they are adjusted along the gradient direction by the gradient descent algorithm. The larger the gradient of the activation function, the faster w and b are adjusted and the faster training converges; the usual activation function for such networks is the sigmoid function.
The Forward layer computes forward from time 1 to time t and stores the output of the forward hidden layer at each time step. The Backward layer computes backward from time t to time 1 and stores the output of the backward hidden layer at each time step. Finally, the outputs of the Forward and Backward layers at each corresponding time step are combined to obtain the final output.
In conclusion, the bidirectional LSTM network performs better on time-series classification tasks: it uses both the history and the future of the sequence, combines the context, and judges the result comprehensively, making the judgment more accurate.
FIG. 3 is a schematic diagram of the principle of the face recognition method based on a lip language password according to an embodiment of the present invention. As shown in FIG. 3, lip language images of consecutive frames are first obtained from the video of the subject to be verified; the obtained images are then normalized, and the normalized lip language images are stored as data set samples in a specified format. In one embodiment of the invention, the specified storage format is [number of samples, length of data sequence, length of image, width of image, depth of image].
The data set is divided eight to two into two parts, one used as the training set and the other as the test set, the training set being used to train the bidirectional LSTM-based lip language recognition model. The model is constructed as follows: design the number of convolution kernels and build the 2D convolutional network; design the number and step length of hidden-layer cells and build the bidirectional LSTM network; then build the Softmax layer, which classifies the temporal features obtained from the subject's password-reading video with the Softmax function and outputs the node with the maximum probability as the predicted value of the lip language password; finally, optimize the network with the loss function, training iteratively until the bidirectional LSTM-based lip language recognition model with the best recognition performance is obtained.
The bidirectional LSTM-based lip language recognition model is then tested with the samples of the test set; the tested model can make predictions on lip language videos, as sketched below.
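A sketch of the eight-to-two split and the test step, assuming the dataset, labels, model and stop callback from the sketches above:

```python
from sklearn.model_selection import train_test_split

# the eight-to-two split described above
X_train, X_test, y_train, y_test = train_test_split(
    dataset, labels, test_size=0.2, random_state=42)

model.fit(X_train, y_train, epochs=100, batch_size=16, callbacks=[stop])
test_loss, test_acc = model.evaluate(X_test, y_test)   # held-out test of the model

probs = model.predict(X_test)          # per-sample class probabilities
predicted = probs.argmax(axis=1)       # node with maximum probability -> digit
```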
The invention provides a face recognition method based on a lip language password, which is applied to an electronic device 4. FIG. 4 is a schematic diagram of an application environment of a preferred embodiment of the face recognition method based on lip language passwords according to the present invention.
In the present embodiment, the electronic device 4 may be a terminal device with computing capability, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
As shown in fig. 4, the electronic apparatus 4 includes: a processor 42, a memory 41, a communication bus 43, and a network interface 44.
The memory 41 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 4, such as its hard disk. In other embodiments, it may be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card.
In the present embodiment, the readable storage medium of the memory 41 is generally used for storing the lip-password-based face recognition program 40 and the like installed in the electronic device 4. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The processor 42 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor, or another data processing chip, used to execute program code stored in the memory 41 or to process data, such as executing the face recognition program 40 based on a lip language password.
The communication bus 43 is used to realize connection communication between these components.
The network interface 44 may optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication link between the electronic device 4 and other electronic devices.
FIG. 4 only shows the electronic device 4 with components 41-44; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 4 may further include a user interface, which may include an input unit such as a Keyboard (Keyboard), a voice input device such as a microphone (microphone) or other equipment with voice recognition function, a voice output device such as a sound box, a headset, etc., and optionally may also include a standard wired interface or a wireless interface.
Optionally, the electronic device 4 may further include a display, which may also be called a display screen or display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display shows the information processed in the electronic device 4 and a visualized user interface.
Optionally, the electronic device 4 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the device embodiment shown in FIG. 4, the memory 41, as a computer storage medium, may store an operating system and the face recognition program 40 based on a lip language password; when the processor 42 executes the program 40 stored in the memory 41, the following steps are implemented: S110, obtaining a password-reading video of a subject to be verified; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video; S130, determining a lip language feature set for the time period from the consecutive frames of lip language images; S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password; S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
In other embodiments, the face recognition program 40 based on a lip language password may also be divided into one or more modules, stored in the memory 41 and executed by the processor 42 to implement the invention. A module here is a series of computer program instruction segments capable of performing a specified function.
The program 40 may be divided into: a password video acquisition unit, a lip language feature set acquisition unit, a bidirectional LSTM-based lip language recognition model training unit, and a bidirectional LSTM-based lip language recognition model test unit. The password video acquisition unit obtains the password-reading video of the subject to be verified. The lip language feature set acquisition unit detects the video frame by frame with the Resnet-based face detection model, obtains the consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video, and determines the lip language feature set for the time period from those frames. The training unit trains the neural network model with the acquired lip language feature sets arranged in time series. The test unit tests the trained bidirectional LSTM-based lip language recognition model with the acquired lip language feature sets arranged in time series. The functions and operation steps are similar to those described above and are not detailed here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium that includes a face recognition program based on a lip language password; when executed by a processor, the program implements the following operations: S110, obtaining a password-reading video of a subject to be verified; S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video; S130, determining a lip language feature set for the time period from the consecutive frames of lip language images; S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password; S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as that of the face recognition method and electronic device based on a lip language password described above, and is not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A face recognition method based on a lip language password, applied to an electronic device, characterized in that the method comprises the following steps:
S110, obtaining a password-reading video of a subject to be verified;
S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video;
S130, determining a lip language feature set for the time period from the consecutive frames of lip language images;
S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password;
S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
2. The face recognition method based on a lip language password according to claim 1, characterized in that the method for constructing the bidirectional LSTM-based lip language recognition model comprises:
S210, constructing an initial network layer for extracting lip features of the subject to be verified, the initial network layer being a 2D convolutional network;
S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted value of the lip language password;
S240, constructing an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
3. The face recognition method based on a lip language password according to claim 2, characterized in that step S140 comprises:
in the Softmax layer, converting the temporal feature data extracted by the bidirectional LSTM layer into the predicted value of the lip language password by the prediction formula P = W·X + b,
where P is the predicted value of the lip language password, X is the temporal feature data, W is a weight, and b is a bias.
4. The face recognition method based on a lip language password according to claim 1, characterized in that determining the lip language feature set for the time period in step S130 comprises:
S310, normalizing the lip language images;
S320, storing the normalized lip language images as data set samples in a specified format, a data set sample comprising: the number of samples, the data sequence length, the image width and the image depth.
5. The face recognition method based on a lip language password according to claim 1, characterized in that, in step S120, the lip language images are parsed with the feature point model of the dlib database to obtain lip feature information, and the time period from the start time to the end time of the subject's password reading in the video is obtained by analyzing the sound waveform of the password-reading video.
6. An electronic device, comprising a memory and a processor, characterized in that the memory stores a face recognition program based on a lip language password which, when executed by the processor, implements the following steps:
S110, obtaining a password-reading video of a subject to be verified;
S120, detecting the video frame by frame with a Resnet-based face detection model to obtain consecutive frames of lip language images within the time period from the start time to the end time of the subject's password reading in the video;
S130, determining a lip language feature set for the time period from the consecutive frames of lip language images;
S140, inputting the lip language feature set into a trained bidirectional LSTM-based lip language recognition model to obtain a predicted value of the lip language password;
S150, if the predicted value of the lip language password matches the password stored in the lip language recognition model, confirming that the subject to be verified passes lip language password recognition.
7. The electronic device according to claim 6, characterized in that the method for constructing the bidirectional LSTM-based lip language recognition model comprises:
S210, constructing an initial network layer for extracting lip features of the subject to be verified, the initial network layer being a 2D convolutional network;
S220, constructing, on the initial network layer, a bidirectional LSTM layer for extracting temporal features from the training set data;
S230, constructing, on the bidirectional LSTM layer, a Softmax layer for outputting the predicted value of the lip language password;
S240, constructing an optimization network layer on the Softmax layer, the optimization network layer feeding the predicted value of the lip language password into a loss function for iterative training until the value of the loss function reaches a set threshold.
8. The electronic device according to claim 7, characterized in that step S140 comprises: in the Softmax layer, converting the temporal feature data extracted by the bidirectional LSTM layer into the predicted value of the lip language password by the prediction formula P = W·X + b,
where P is the predicted value of the lip language password, X is the temporal feature data, W is a weight, and b is a bias.
9. The electronic device according to claim 6, characterized in that, in step S120, the lip feature information is obtained by parsing the lip language images with the feature point model of the dlib database, and the time period from the start time to the end time of the subject's password reading in the video is obtained by analyzing the sound waveform of the password-reading video.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising a face recognition program based on a lip language password which, when executed by a processor, implements the steps of the face recognition method based on a lip language password according to any one of claims 1 to 5.
CN201910885930.2A 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password Pending CN110717407A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910885930.2A CN110717407A (en) 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password
PCT/CN2019/118281 WO2021051602A1 (en) 2019-09-19 2019-11-14 Lip password-based face recognition method and system, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885930.2A CN110717407A (en) 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password

Publications (1)

Publication Number Publication Date
CN110717407A true CN110717407A (en) 2020-01-21

Family

ID=69209940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885930.2A Pending CN110717407A (en) 2019-09-19 2019-09-19 Human face recognition method, device and storage medium based on lip language password

Country Status (2)

Country Link
CN (1) CN110717407A (en)
WO (1) WO2021051602A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401134A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112089595A (en) * 2020-05-22 2020-12-18 未来穿戴技术有限公司 Login method of neck massager, neck massager and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132637A (en) * 2023-02-15 2023-05-16 武汉博晟安全技术股份有限公司 Online examination monitoring system and method, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011203992A (en) * 2010-03-25 2011-10-13 Sony Corp Information processing apparatus, information processing method, and program
US9721079B2 (en) * 2014-01-15 2017-08-01 Steve Y Chen Image authenticity verification using speech
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
US10635893B2 (en) * 2017-10-31 2020-04-28 Baidu Usa Llc Identity authentication method, terminal device, and computer-readable storage medium
CN107977559A (en) * 2017-11-22 2018-05-01 杨晓艳 A kind of identity identifying method, device, equipment and computer-readable recording medium
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN110163156A (en) * 2019-05-24 2019-08-23 南京邮电大学 It is a kind of based on convolution from the lip feature extracting method of encoding model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401134A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112089595A (en) * 2020-05-22 2020-12-18 未来穿戴技术有限公司 Login method of neck massager, neck massager and storage medium

Also Published As

Publication number Publication date
WO2021051602A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
JP7193252B2 (en) Captioning image regions
CN109002766B (en) Expression recognition method and device
CN111126069B (en) Social media short text named entity identification method based on visual object guidance
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN112784670A (en) Object detection based on pixel differences
US20170076152A1 (en) Determining a text string based on visual features of a shred
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN115443490A (en) Image auditing method and device, equipment and storage medium
CN111694954B (en) Image classification method and device and electronic equipment
CN110287311A (en) File classification method and device, storage medium, computer equipment
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
US10755074B2 (en) Latent fingerprint pattern estimation
CN107944363A (en) Face image processing process, system and server
CN113221918B (en) Target detection method, training method and device of target detection model
CN114428860A (en) Pre-hospital emergency case text recognition method and device, terminal and storage medium
CN112241470A (en) Video classification method and system
JP2019160240A (en) Information processing device and information processing method
CN114511715A (en) Driving scene data mining method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination