CN110929239A - Terminal unlocking method based on lip language instruction - Google Patents

Terminal unlocking method based on lip language instruction Download PDF

Info

Publication number
CN110929239A
CN110929239A
Authority
CN
China
Prior art keywords
lip
image
representing
frame
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911045860.6A
Other languages
Chinese (zh)
Other versions
CN110929239B (en)
Inventor
兰星
胡庆浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences
Institute of Automation of Chinese Academy of Science
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences, Institute of Automation of Chinese Academy of Science filed Critical Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences
Priority to CN201911045860.6A priority Critical patent/CN110929239B/en
Publication of CN110929239A publication Critical patent/CN110929239A/en
Application granted granted Critical
Publication of CN110929239B publication Critical patent/CN110929239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a terminal unlocking method based on a lip language instruction. During verification, the key feature points required for face recognition are extracted in the same way as during collection, the Euclidean distance between face features is calculated with the FaceNet network, and the result is compared against a threshold. The user can design the instruction action freely during collection and only needs to repeat the same action during identification, so the action instruction is not easily stolen by others and authentication security is improved. Meanwhile, the lip language instruction unlocking method does not require large-scale computation on the terminal, which greatly reduces the hardware performance requirements and increases the recognition speed. The invention avoids the problem of an excessively large gradient caused by accumulation in a certain quadrant of the space, improves network learning and training efficiency, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.

Description

Terminal unlocking method based on lip language instruction
Technical Field
The invention relates to a terminal unlocking method based on a lip language instruction, and belongs to the technical field of image information processing.
Background
At present, terminals are mainly unlocked by face, fingerprint or iris recognition. However, such information is easy to forge and these static identification methods are easy to crack, so security is poor and private information is easily leaked. The invention adopts a lip language instruction unlocking method to realize dynamic unlocking and improve authentication security.
Existing lip language unlocking technology depends heavily on deep learning: a specific single-instruction model must be trained on a PC (personal computer) and then deployed on the terminal, and the user must reproduce a fixed instruction action. This approach performs poorly, does not adapt to the user's own data, supports only fixed command actions, and the commands are easily exposed.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the existing unlocking technology, a terminal unlocking method based on a lip language instruction is provided.
The technical scheme is as follows: a terminal unlocking method based on a lip language instruction comprises the following steps:
step 1, the terminal camera collects video frames of the user's unlocking lip language instruction, the terminal performs face detection and extracts face features, and lip region video frames are extracted at the same time;
step 2, extracting characteristic points of the lip video frame data set, matching the characteristic points of adjacent frames and marking position coordinates;
step 3, extracting the change characteristics of the positions of the characteristic points by using a frame difference method, namely the algebraic characteristics of the lip movement;
step 4, matching the human face in a database;
step 5, if matching is successful, the person to be identified makes the same lip language instruction action towards the terminal camera; the terminal likewise extracts lip feature points, calculates the algebraic features of lip movement, and judges whether they match the unlocking instruction;
and step 6, when face matching or instruction matching is unsuccessful, a matching failure is prompted and the process jumps to step 4.
In a further embodiment, the step 1 is further:
step 1-1, calculating a color histogram of an RGB space of each frame of a video segment, dividing each channel into 32 intervals according to pixel values, and carrying out normalization processing to obtain 96-dimensional features; forming a matrix by the characteristic vectors of each frame, performing dimensionality reduction on the matrix, and calculating an initialization clustering center:
[formula image BDA0002254119410000011 in the original, not reproduced]
In the formula, C_n represents the cluster center of the n-th segment, f_n represents the feature vector of the n-th frame, and f_{n+1} represents the feature vector of the (n+1)-th frame.
The similarity of each new frame to the current cluster center is calculated and a threshold σ is defined. When the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n; f_n is then added to C_n and the updated cluster center C_n′ is obtained:
[formula image BDA0002254119410000021 in the original, not reproduced]
In the formula, f_n represents the feature vector of the n-th frame, C_n represents the cluster center of the n-th segment, and C_n′ represents the updated cluster center.
When the similarity is smaller than the threshold, f_n is judged to belong to a new cluster center, and f_n is used to initialize the new cluster center C_n′:
C_n′ = f_n
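As a rough illustration of this keyframe-clustering step, the following Python sketch (OpenCV and NumPy) builds the 96-dimensional histogram feature and groups frames around a running cluster center. The cosine-similarity measure and the running-mean update are assumptions of this sketch; the patent's own initialization and update formulas are given only as images above.

import cv2
import numpy as np

def frame_histogram(frame_bgr):
    """96-dim feature: 32-bin histogram per B, G, R channel, normalized."""
    chans = cv2.split(frame_bgr)
    hist = [cv2.calcHist([c], [0], None, [32], [0, 256]).ravel() for c in chans]
    feat = np.concatenate(hist)
    return feat / (feat.sum() + 1e-9)

def cluster_keyframes(frames, sigma=0.9):
    """Group consecutive frames whose histograms stay close to the running
    cluster center; start a new cluster (new segment) otherwise."""
    centers, members = [], []
    for f in frames:
        feat = frame_histogram(f)
        if centers:
            c = centers[-1]
            sim = float(np.dot(feat, c) /
                        (np.linalg.norm(feat) * np.linalg.norm(c) + 1e-9))
            if sim > sigma:                                  # f_n belongs to the current center C_n
                members[-1].append(feat)
                centers[-1] = np.mean(members[-1], axis=0)   # updated center C_n'
                continue
        centers.append(feat)                                 # C_n' = f_n: start a new cluster
        members.append([feat])
    return centers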
Step 1-2, firstly, recognizing the contour of a human face, removing a background, carrying out lip cutting on the human face in a video frame, positioning the position of facial feature contour points in the human face, including the coordinates of a nose tip, the leftmost coordinates of the lips, the rightmost coordinates of the lips and the coordinates of a central point of a mouth, cutting an image containing lip details according to the coordinates, and calculating the cutting size according to a formula:
[formula image BDA0002254119410000022 in the original, not reproduced]
In the formula, L_MN represents the distance between the nose-tip coordinates and the mouth-center coordinates, x_right and y_right represent the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left represent the abscissa and ordinate of the leftmost lip feature point;
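The lip crop can be sketched from the four landmark coordinates named above. Since the patent's crop-size formula is shown only as an image, the proportions used below (width_margin, height_ratio) are illustrative assumptions, not the patented formula.

import numpy as np

def crop_lip_region(image, nose_tip, mouth_left, mouth_right, mouth_center,
                    width_margin=1.4, height_ratio=0.8):
    """Cut a patch containing the lip details from facial landmark coordinates.
    The proportions are illustrative choices, not the patent's formula."""
    l_mn = np.linalg.norm(np.asarray(nose_tip, float) - np.asarray(mouth_center, float))
    half_w = 0.5 * width_margin * abs(mouth_right[0] - mouth_left[0])
    half_h = 0.5 * height_ratio * l_mn
    cx, cy = mouth_center
    h, w = image.shape[:2]
    x0, x1 = int(max(0, cx - half_w)), int(min(w, cx + half_w))
    y0, y1 = int(max(0, cy - half_h)), int(min(h, cy + half_h))
    return image[y0:y1, x0:x1]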
step 1-3, performing deviation correction on the cut lip image, training the lip image based on a binary model of a convolutional neural network, and judging whether the extracted lip image is an effective image:
[formula image BDA0002254119410000023 in the original, not reproduced]
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j represents the local perception field of the input, β denotes the output parameter, and down(·) denotes the pooling function.
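A small binary classifier of this kind can be sketched in PyTorch. The layer sizes and the 64x64 input below are assumptions; the patent only specifies a generic convolution plus pooling ("down()") structure for judging whether a cropped lip image is valid.

import torch
import torch.nn as nn

class LipValidityNet(nn.Module):
    """Tiny CNN that classifies a cropped lip image as valid (1) or invalid (0)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2)
        )

    def forward(self, x):
        # x: (N, 3, H, W) batch of cropped lip images
        return self.classifier(self.features(x))

# usage sketch: logits = LipValidityNet()(torch.randn(1, 3, 64, 64))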
In a further embodiment, the step 2 is further:
step 2-1, aiming at the cropped images extracted in the step 1, a D3D model is constructed to accelerate network convergence, and a loss function correction model is introduced:
[formula image BDA0002254119410000031 in the original, not reproduced]
In the formula, the first term (shown as image BDA0002254119410000032 in the original) is the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling factor;
where P({Z|X}) = Σ_k P(π|X) is the sum of the probabilities formed by all paths after merging;
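The corrected loss itself appears only as an image; the description names a cross-entropy term, an indicator function and a sum of path probabilities P({Z|X}) = Σ P(π|X), which resembles a CTC-style term. Purely as an assumption, one way such terms could be combined in PyTorch is sketched below; the weighting sigma and the blank index are placeholders.

import torch
import torch.nn as nn

class CombinedLipLoss(nn.Module):
    """Illustrative mix of a cross-entropy term and a CTC-style
    path-probability term; not the patent's exact corrected loss."""
    def __init__(self, sigma=0.5, blank=0):
        super().__init__()
        self.sigma = sigma
        self.ce = nn.CrossEntropyLoss()
        self.ctc = nn.CTCLoss(blank=blank, zero_infinity=True)

    def forward(self, class_logits, class_targets,
                seq_log_probs, seq_targets, input_lengths, target_lengths):
        # class_logits: (N, C); seq_log_probs: (T, N, C) log-softmax outputs
        ce_term = self.ce(class_logits, class_targets)
        ctc_term = self.ctc(seq_log_probs, seq_targets, input_lengths, target_lengths)
        return self.sigma * ce_term + (1.0 - self.sigma) * ctc_term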
step 2-2, respectively extracting feature points from the images of two adjacent frames and obtaining two sets of feature point sets:
p = {p_1, p_2, p_3, …, p_n}
p′ = {p_1′, p_2′, p_3′, …, p_n′}
Taking the two adjacent sets of feature points as centers, the pixel values within a neighborhood window W around each feature point are used as that point's descriptor, and the pixel interpolation over the two sets of feature-point neighborhoods is calculated:
[formula image BDA0002254119410000033 in the original, not reproduced]
In the formula, S represents the pixel interpolation over the neighborhoods of the two sets of feature points, x represents the abscissa of a pixel, y represents the ordinate of the pixel, W represents the neighborhood window used as the descriptor, p represents the previous frame image, and p′ represents the next frame image;
step 2-3, according to the pixel interpolation obtained in the step 2-2, finding a matching point according to a matching coefficient between the feature point and a neighborhood window:
[formula image BDA0002254119410000034 in the original, not reproduced]
in the formula, G represents the gray value of the previous frame image, G' represents the gray value of the next frame image, C represents the matching coefficient, and the other symbols have the same meanings as above.
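A compact sketch of steps 2-2 and 2-3: corner points are detected in two adjacent gray frames, a raw pixel window W around each point serves as its descriptor, and candidate points are paired by a normalized cross-correlation score. That score stands in for the patent's matching coefficient C, whose exact formula is shown only as an image.

import cv2
import numpy as np

def window_descriptor(gray, pt, half=5):
    """Raw pixel window W around feature point pt = (x, y), used as its descriptor."""
    x, y = int(pt[0]), int(pt[1])
    patch = gray[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1]
    return patch.astype(np.float32).ravel()

def match_points(prev_gray, next_gray, max_points=30, half=5):
    """Detect corners in two adjacent frames and match them by normalized
    cross-correlation of their neighborhood windows (assumed form of C)."""
    p = cv2.goodFeaturesToTrack(prev_gray, max_points, 0.01, 5)
    q = cv2.goodFeaturesToTrack(next_gray, max_points, 0.01, 5)
    if p is None or q is None:
        return []
    matches = []
    for pi in p.reshape(-1, 2):
        d1 = window_descriptor(prev_gray, pi, half)
        best, best_c = None, -1.0
        for qi in q.reshape(-1, 2):
            d2 = window_descriptor(next_gray, qi, half)
            if d1.size != d2.size:
                continue
            c = float(np.dot(d1, d2) /
                      (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-9))
            if c > best_c:
                best, best_c = qi, c
        if best is not None:
            matches.append((tuple(pi), tuple(best), best_c))
    return matches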
In a further embodiment, the step 3 is further:
step 3-1, three adjacent independent frames are recorded as f(n+1), f(n) and f(n-1), and the gray values of the three frames are recorded as G(n+1)_{x,y}, G(n)_{x,y} and G(n-1)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}| ∩ |G(n)_{x,y} - G(n-1)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image BDA0002254119410000035 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, and T is the threshold.
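The three-frame difference of step 3 maps directly onto OpenCV primitives, as in the sketch below. The fixed binarization threshold stands in for the patent's comparison condition (involving N, τ and A), which is given only as an image.

import cv2

def three_frame_difference(f_prev, f_curr, f_next, T=25):
    """Frame-difference motion mask P' = |G(n+1)-G(n)| ∩ |G(n)-G(n-1)|,
    binarized with a preset threshold T."""
    g_prev = cv2.cvtColor(f_prev, cv2.COLOR_BGR2GRAY)
    g_curr = cv2.cvtColor(f_curr, cv2.COLOR_BGR2GRAY)
    g_next = cv2.cvtColor(f_next, cv2.COLOR_BGR2GRAY)
    d1 = cv2.absdiff(g_next, g_curr)
    d2 = cv2.absdiff(g_curr, g_prev)
    motion = cv2.bitwise_and(d1, d2)                  # intersection of the two differences
    _, mask = cv2.threshold(motion, T, 255, cv2.THRESH_BINARY)
    return mask                                       # non-zero pixels mark the moving lip target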
In a further embodiment, the step 4 is further:
step 4-1, on a multi-user terminal, such as a safe or a door lock, face recognition is required to match whether the user's face exists in the database; on a single-user private terminal, such as a mobile phone or a tablet, face recognition is not needed and face verification suffices: the FaceNet network is used to calculate the Euclidean distance between face features, which is compared against a threshold:
||f(x_i^a) - f(x_i^p)||_2^2 + α < ||f(x_i^a) - f(x_i^n)||_2^2, for all (x_i^a, x_i^p, x_i^n) ∈ Φ
In the formula, (x_i^a, x_i^p) represents a positive sample pair, (x_i^a, x_i^n) represents a negative sample pair, x_i^a represents the anchor sample, α represents the margin constraint between positive and negative sample pairs, and Φ represents the set of triples;
introducing a neuron model:
h_{W,b}(x) = f(W^T x)
where W represents the weight vector of the neuron, W^T x denotes the transformation of the input vector x, and f(W^T x) represents the activation-function transformation;
Writing the input vector x in terms of its components x_i, W^T x expands to:
h_{W,b}(x) = f( Σ_{i=1}^{n} W_i x_i + b )
In the formula, n represents the number of stages of the neural network, and b represents the bias.
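Step 4's verification reduces to an embedding distance check plus the basic neuron model. The sketch below assumes some external model (for example a FaceNet implementation) has already produced the embeddings; the acceptance threshold and the sigmoid activation are illustrative choices, not values from the patent.

import numpy as np

def euclidean_distance(emb_a, emb_b):
    """Euclidean distance between two face embeddings (e.g. FaceNet outputs)."""
    return float(np.linalg.norm(np.asarray(emb_a) - np.asarray(emb_b)))

def verify_face(embedding_live, embedding_enrolled, threshold=1.1):
    """Accept the face when the embedding distance is below the comparison threshold."""
    return euclidean_distance(embedding_live, embedding_enrolled) < threshold

def neuron(x, w, b):
    """Single neuron h_{W,b}(x) = f(W·x + b) with a sigmoid activation."""
    z = float(np.dot(w, x) + b)
    return 1.0 / (1.0 + np.exp(-z))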
In a further embodiment, the step 5 is further: during collection, a coordinate system is established with the lip center as the origin, and the inner-lip region in the lip gray image is fitted as a combination of two half-ellipses, the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse; the change of the corresponding feature-point positions, i.e. the algebraic feature of inter-frame lip motion, is then extracted with the frame difference method:
Two adjacent independent frames are recorded as f(n+1) and f(n), and the gray values of the two frames are recorded as G(n+1)_{x,y} and G(n)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image BDA0002254119410000046 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, and T is the threshold.
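For step 5, the inner lip is fitted as two half-ellipses and the resulting motion feature is compared with the enrolled instruction. The sketch below approximates each half with cv2.fitEllipse over the corresponding contour points and uses a plain Euclidean distance on frame-to-frame parameter changes; both the feature encoding and the acceptance threshold are assumptions, not the patent's exact algebraic feature.

import cv2
import numpy as np

def half_ellipse_params(points):
    """Fit one half of the inner lip (at least 5 contour points, lip-centered
    coordinates) with an ellipse and return (major, minor, angle)."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    (_, _), (major, minor), angle = cv2.fitEllipse(pts)
    return np.array([major, minor, angle], dtype=np.float32)

def lip_motion_feature(upper_pts_seq, lower_pts_seq):
    """Per-frame ellipse parameters of upper/lower inner lip, differenced
    between adjacent frames (a stand-in for the algebraic lip-motion feature)."""
    per_frame = [np.concatenate([half_ellipse_params(u), half_ellipse_params(l)])
                 for u, l in zip(upper_pts_seq, lower_pts_seq)]
    return np.diff(np.stack(per_frame), axis=0).ravel()

def matches_unlock_instruction(live_feature, enrolled_feature, threshold=50.0):
    """Accept when the live lip-motion feature is close enough to the enrolled template."""
    n = min(live_feature.size, enrolled_feature.size)
    return float(np.linalg.norm(live_feature[:n] - enrolled_feature[:n])) < threshold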
Beneficial effects: the invention provides a terminal unlocking method based on a lip language instruction in which the user can design the instruction action freely during collection and only needs to repeat the same action during identification, so the action instruction is not easily stolen by others and authentication security is improved. Meanwhile, the lip language instruction unlocking method does not require large-scale computation on the terminal, which greatly reduces hardware performance requirements and increases recognition speed. By performing matrix dimensionality reduction, extracting feature points, initializing cluster centers and using the FaceNet network to calculate the Euclidean distance between face features, the invention avoids the problem of an excessively large gradient caused by accumulation in a certain quadrant of the space, improves network learning and training efficiency, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of establishing a coordinate system for lips according to the present invention.
FIG. 3 is a diagram illustrating an image containing details of a lip cut out from a lip unlock command according to the present invention.
FIG. 4 is a schematic diagram of the introduction of a neuron model according to the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
The applicant believes that in the field of lip language unlocking, the prior art extremely depends on deep learning, a specific single instruction model needs to be trained at a PC (personal computer) end and then deployed at a terminal for use, and a user needs to match a fixed instruction action. The method has poor effect, does not adapt to the data of the user, can only adapt to fixed command actions, and the commands are easy to expose, so that how to construct the lip language model and continuously improve the active learning of the machine are very important.
In order to solve the problems in the prior art, the invention provides a terminal unlocking method based on a lip language instruction, a user can design instruction actions by himself during collection and only needs to make the same actions during identification, so that the action instructions are not easy to steal by others, and the authentication safety is improved.
The technical scheme of the invention is further explained by the embodiment and the corresponding attached drawings.
Firstly, a terminal camera collects a lip language instruction video frame unlocked by a user, the terminal carries out face detection and extracts face features, and meanwhile, a lip region video frame is extracted; calculating a color histogram of an RGB space of each frame of a video clip, dividing each channel into 32 intervals according to pixel values, and carrying out normalization processing to obtain 96-dimensional features; forming a matrix by the characteristic vectors of each frame, performing dimensionality reduction on the matrix, and calculating an initialization clustering center:
[formula image BDA0002254119410000061 in the original, not reproduced]
In the formula, C_n represents the cluster center of the n-th segment, f_n represents the feature vector of the n-th frame, and f_{n+1} represents the feature vector of the (n+1)-th frame.
The similarity of each new frame to the current cluster center is calculated and a threshold σ is defined. When the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n; f_n is then added to C_n and the updated cluster center C_n′ is obtained:
[formula image BDA0002254119410000062 in the original, not reproduced]
In the formula, f_n represents the feature vector of the n-th frame, C_n represents the cluster center of the n-th segment, and C_n′ represents the updated cluster center.
When the similarity is smaller than the threshold, f_n is judged to belong to a new cluster center, and f_n is used to initialize the new cluster center C_n′:
C_n′ = f_n
Recognizing the outline of the face, removing the background, cutting the lips of the face in a video frame, positioning the positions of facial feature outline points in the face, including the coordinates of the nose tip, the leftmost coordinates of the lips, the rightmost coordinates of the lips and the coordinates of the center point of the mouth, cutting an image containing the details of the lips according to the coordinates, and calculating the cutting size according to a formula:
[formula image BDA0002254119410000063 in the original, not reproduced]
In the formula, L_MN represents the distance between the nose-tip coordinates and the mouth-center coordinates, x_right and y_right represent the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left represent the abscissa and ordinate of the leftmost lip feature point;
carrying out deviation correction on the cut lip images, training the lip images based on a binary model of a convolutional neural network, and judging whether the extracted lip images are effective images:
[formula image BDA0002254119410000064 in the original, not reproduced]
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j represents the local perception field of the input, β denotes the output parameter, and down(·) denotes the pooling function.
Then, extracting characteristic points of the lip video frame data set, matching the characteristic points of adjacent frames, and marking position coordinates;
for the extracted cropped images, a D3D model is constructed to accelerate network convergence, and a loss function correction model is introduced:
[formula image BDA0002254119410000071 in the original, not reproduced]
In the formula, the first term (shown as image BDA0002254119410000076 in the original) is the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling factor;
where P({Z|X}) = Σ_k P(π|X) is the sum of the probabilities formed by all paths after merging;
respectively extracting feature points from the images of two adjacent frames and obtaining two groups of feature point sets:
p = {p_1, p_2, p_3, …, p_n}
p′ = {p_1′, p_2′, p_3′, …, p_n′}
Taking the two adjacent sets of feature points as centers, the pixel values within a neighborhood window W around each feature point are used as that point's descriptor, and the pixel interpolation over the two sets of feature-point neighborhoods is calculated:
[formula image BDA0002254119410000073 in the original, not reproduced]
In the formula, S represents the pixel interpolation over the neighborhoods of the two sets of feature points, x represents the abscissa of a pixel, y represents the ordinate of the pixel, W represents the neighborhood window used as the descriptor, p represents the previous frame image, and p′ represents the next frame image;
according to the pixel interpolation obtained above, finding a matching point according to the matching coefficient between the feature point and the neighborhood window:
[formula image BDA0002254119410000074 in the original, not reproduced]
in the formula, G represents the gray value of the previous frame image, G' represents the gray value of the next frame image, C represents the matching coefficient, and the other symbols have the same meanings as above.
Then, the change of the feature-point positions, i.e. the algebraic feature of lip motion, is extracted with the frame difference method: three adjacent independent frames are recorded as f(n+1), f(n) and f(n-1), and the gray values of the three frames are recorded as G(n+1)_{x,y}, G(n)_{x,y} and G(n-1)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}| ∩ |G(n)_{x,y} - G(n-1)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image BDA0002254119410000075 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, and T is the threshold.
Step 4, the face is matched in a database: on a multi-user terminal, such as a safe or a door lock, face recognition is required to match whether the user's face exists in the database; on a single-user private terminal, such as a mobile phone or a tablet, face recognition is not needed and face verification suffices: the FaceNet network is used to calculate the Euclidean distance between face features, which is compared against a threshold:
||f(x_i^a) - f(x_i^p)||_2^2 + α < ||f(x_i^a) - f(x_i^n)||_2^2, for all (x_i^a, x_i^p, x_i^n) ∈ Φ
In the formula, (x_i^a, x_i^p) represents a positive sample pair, (x_i^a, x_i^n) represents a negative sample pair, x_i^a represents the anchor sample, α represents the margin constraint between positive and negative sample pairs, and Φ represents the set of triples;
introducing a neuron model:
h_{W,b}(x) = f(W^T x)
where W represents the weight vector of the neuron, W^T x denotes the transformation of the input vector x, and f(W^T x) represents the activation-function transformation;
Writing the input vector x in terms of its components x_i, W^T x expands to:
h_{W,b}(x) = f( Σ_{i=1}^{n} W_i x_i + b )
In the formula, n represents the number of stages of the neural network, and b represents the bias.
Step 5, if matching is successful, people need to be identified to make the same lip language instruction action towards the terminal camera, the terminal extracts lip feature points similarly, and calculates algebraic features of lip movement, and whether matching is an unlocking instruction or not; the method comprises the following steps of establishing a coordinate axis by taking the center of a lip as a coordinate origin in an acquisition process, fitting an inner lip region in a lip gray image into two semi-ellipse combinations, enabling an upper inner lip to correspond to an upper ellipse, enabling a lower inner lip to correspond to a lower ellipse, and extracting change characteristics of corresponding characteristic point positions by using a frame difference method, namely algebraic characteristics of interframe lip motion:
Two adjacent independent frames are recorded as f(n+1) and f(n), and the gray values of the two frames are recorded as G(n+1)_{x,y} and G(n)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image BDA0002254119410000086 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, and T is the threshold.
When face matching or instruction matching is unsuccessful, a matching failure is prompted, face matching in the database continues and the above steps are repeated; when matching fails more than three times, the terminal equipment is temporarily locked.
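The retry behaviour just described can be pictured as a small control loop; the callables, prompt messages and lock duration below are placeholders for whatever the terminal actually implements, not values taken from the patent.

import time

MAX_FAILURES = 3
LOCK_SECONDS = 60          # length of the temporary lock; an illustrative value

def unlock_terminal(match_face, match_lip_instruction, prompt):
    """Schematic retry loop: face match, then lip-instruction match; prompt on
    failure and temporarily lock after more than MAX_FAILURES failed attempts."""
    failures = 0
    while True:
        if match_face() and match_lip_instruction():
            return True                      # unlocked
        failures += 1
        prompt("Matching failed, please try again.")
        if failures > MAX_FAILURES:
            prompt("Too many failures, terminal temporarily locked.")
            time.sleep(LOCK_SECONDS)         # temporary lock of the terminal device
            failures = 0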
In summary, aiming at the defects of the prior art, the invention provides a terminal unlocking method based on a lip language instruction in which several frames of images are taken during collection to acquire the face and extract some key feature points. During verification, the key feature points required for face recognition are extracted in the same way, the Euclidean distance between face features is calculated with the FaceNet network, and the result is compared against a threshold. During collection, a coordinate system is established with the lip center as the origin, the inner-lip region in the lip gray image is fitted as a combination of two half-ellipses (the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse), the change of the corresponding feature-point positions, i.e. the algebraic feature of inter-frame lip motion, is extracted with the frame difference method, and a judgment threshold is calculated. During verification, the lip motion features are extracted in the same way and compared against this threshold. By performing matrix dimensionality reduction, extracting feature points, initializing cluster centers and using the FaceNet network to calculate the Euclidean distance between face features, the invention avoids the problem of an excessively large gradient caused by accumulation in a certain quadrant of the space, improves network learning and training efficiency, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A terminal unlocking method based on a lip language instruction is characterized by comprising the following steps:
step 1, the terminal camera collects video frames of the user's unlocking lip language instruction, the terminal performs face detection and extracts face features, and lip region video frames are extracted at the same time;
step 2, extracting characteristic points of the lip video frame data set, matching the characteristic points of adjacent frames and marking position coordinates;
step 3, extracting the change characteristics of the positions of the characteristic points by using a frame difference method, namely the algebraic characteristics of the lip movement;
step 4, matching the human face in a database;
step 5, if matching is successful, the person to be identified makes the same lip language instruction action towards the terminal camera; the terminal likewise extracts lip feature points, calculates the algebraic features of lip movement, and judges whether they match the unlocking instruction;
and step 6, when face matching or instruction matching is unsuccessful, a matching failure is prompted and the process jumps to step 4.
2. The method for unlocking a terminal based on a lip language command according to claim 1, wherein the step 1 further comprises:
step 1-1, calculating a color histogram of an RGB space of each frame of a video segment, dividing each channel into 32 intervals according to pixel values, and carrying out normalization processing to obtain 96-dimensional features; forming a matrix by the characteristic vectors of each frame, performing dimensionality reduction on the matrix, and calculating an initialization clustering center:
[formula image FDA0002254119400000011 in the original, not reproduced]
In the formula, C_n represents the cluster center of the n-th segment, f_n represents the feature vector of the n-th frame, and f_{n+1} represents the feature vector of the (n+1)-th frame;
the similarity of each new frame to the current cluster center is calculated and a threshold σ is defined; when the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n, f_n is then added to C_n, and the updated cluster center C_n′ is obtained:
[formula image FDA0002254119400000012 in the original, not reproduced]
In the formula, f_n represents the feature vector of the n-th frame, C_n represents the cluster center of the n-th segment, and C_n′ represents the updated cluster center;
when the similarity is smaller than the threshold, f_n is judged to belong to a new cluster center, and f_n is used to initialize the new cluster center C_n′:
C_n′ = f_n
Step 1-2, firstly, recognizing the contour of a human face, removing a background, carrying out lip cutting on the human face in a video frame, positioning the position of facial feature contour points in the human face, including the coordinates of a nose tip, the leftmost coordinates of the lips, the rightmost coordinates of the lips and the coordinates of a central point of a mouth, cutting an image containing lip details according to the coordinates, and calculating the cutting size according to a formula:
[formula image FDA0002254119400000021 in the original, not reproduced]
In the formula, L_MN represents the distance between the nose-tip coordinates and the mouth-center coordinates, x_right and y_right represent the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left represent the abscissa and ordinate of the leftmost lip feature point;
step 1-3, performing deviation correction on the cut lip image, training the lip image based on a binary model of a convolutional neural network, and judging whether the extracted lip image is an effective image:
[formula image FDA0002254119400000022 in the original, not reproduced]
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j represents the local perception field of the input, β denotes the output parameter, and down(·) denotes the pooling function.
3. The method for unlocking a terminal based on a lip language command according to claim 1, wherein the step 2 further comprises:
step 2-1, aiming at the cropped images extracted in the step 1, a D3D model is constructed to accelerate network convergence, and a loss function correction model is introduced:
[formula image FDA0002254119400000023 in the original, not reproduced]
In the formula, the first term (shown as image FDA0002254119400000024 in the original) is the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling factor;
where P({Z|X}) = Σ_k P(π|X) is the sum of the probabilities formed by all paths after merging;
step 2-2, respectively extracting feature points from the images of two adjacent frames and obtaining two sets of feature point sets:
p = {p_1, p_2, p_3, …, p_n}
p′ = {p_1′, p_2′, p_3′, …, p_n′}
Taking the two adjacent sets of feature points as centers, the pixel values within a neighborhood window W around each feature point are used as that point's descriptor, and the pixel interpolation over the two sets of feature-point neighborhoods is calculated:
[formula image FDA0002254119400000031 in the original, not reproduced]
In the formula, S represents the pixel interpolation over the neighborhoods of the two sets of feature points, x represents the abscissa of a pixel, y represents the ordinate of the pixel, W represents the neighborhood window used as the descriptor, p represents the previous frame image, and p′ represents the next frame image;
step 2-3, according to the pixel interpolation obtained in the step 2-2, finding a matching point according to a matching coefficient between the feature point and a neighborhood window:
[formula image FDA0002254119400000032 in the original, not reproduced]
in the formula, G represents the gray value of the previous frame image, G' represents the gray value of the next frame image, C represents the matching coefficient, and the other symbols have the same meanings as above.
4. The method for unlocking a terminal based on a lip language command according to claim 1, wherein the step 3 further comprises:
step 3-1, three adjacent independent frames are recorded as f(n+1), f(n) and f(n-1), and the gray values of the three frames are recorded as G(n+1)_{x,y}, G(n)_{x,y} and G(n-1)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}| ∩ |G(n)_{x,y} - G(n-1)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image FDA0002254119400000033 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, and T is the threshold.
5. The method for unlocking a terminal according to claim 1, wherein the step 4 further comprises:
step 4-1, on a multi-user terminal, such as a safe or a door lock, face recognition is required to match whether the user's face exists in the database; on a single-user private terminal, such as a mobile phone or a tablet, face recognition is not needed and face verification suffices: the FaceNet network is used to calculate the Euclidean distance between face features, which is compared against a threshold:
||f(x_i^a) - f(x_i^p)||_2^2 + α < ||f(x_i^a) - f(x_i^n)||_2^2, for all (x_i^a, x_i^p, x_i^n) ∈ Φ
In the formula, (x_i^a, x_i^p) represents a positive sample pair, (x_i^a, x_i^n) represents a negative sample pair, x_i^a represents the anchor sample, α represents the margin constraint between positive and negative sample pairs, and Φ represents the set of triples;
introducing a neuron model:
h_{W,b}(x) = f(W^T x)
where W represents the weight vector of the neuron, W^T x denotes the transformation of the input vector x, and f(W^T x) represents the activation-function transformation;
Writing the input vector x in terms of its components x_i, W^T x expands to:
h_{W,b}(x) = f( Σ_{i=1}^{n} W_i x_i + b )
In the formula, n represents the number of stages of the neural network, and b represents the bias.
6. The method for unlocking a terminal according to claim 1, wherein the step 5 further comprises: during collection, a coordinate system is established with the lip center as the origin, and the inner-lip region in the lip gray image is fitted as a combination of two half-ellipses, the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse; the change of the corresponding feature-point positions, i.e. the algebraic feature of inter-frame lip motion, is then extracted with the frame difference method:
Two adjacent independent frames are recorded as f(n+1) and f(n), and the gray values of the two frames are recorded as G(n+1)_{x,y} and G(n)_{x,y}; the image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} - G(n)_{x,y}|
The image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
[formula image FDA0002254119400000042 in the original, not reproduced]
In the formula, N represents the total number of pixels in the region to be detected, τ represents the illumination suppression coefficient, A represents the image of the entire frame, T is the threshold, G(n)_{x,y} represents the gray value of the n-th frame image, and G(n+1)_{x,y} represents the gray value of the (n+1)-th frame image.
CN201911045860.6A 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction Active CN110929239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911045860.6A CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911045860.6A CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Publications (2)

Publication Number Publication Date
CN110929239A true CN110929239A (en) 2020-03-27
CN110929239B CN110929239B (en) 2021-11-19

Family

ID=69849882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911045860.6A Active CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Country Status (1)

Country Link
CN (1) CN110929239B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733807A (en) * 2021-02-22 2021-04-30 佳都新太科技股份有限公司 Face comparison graph convolution neural network training method and device
CN114220142A (en) * 2021-11-24 2022-03-22 慧之安信息技术股份有限公司 Face feature recognition method of deep learning algorithm
CN114220177A (en) * 2021-12-24 2022-03-22 湖南大学 Lip syllable recognition method, device, equipment and medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226587A1 (en) * 2012-02-27 2013-08-29 Hong Kong Baptist University Lip-password Based Speaker Verification System
US20150279364A1 (en) * 2014-03-29 2015-10-01 Ajay Krishnan Mouth-Phoneme Model for Computerized Lip Reading
WO2016061780A1 (en) * 2014-10-23 2016-04-28 Intel Corporation Method and system of facial expression recognition using linear relationships within landmark subsets
WO2016201679A1 (en) * 2015-06-18 2016-12-22 华为技术有限公司 Feature extraction method, lip-reading classification method, device and apparatus
CN106570461A (en) * 2016-10-21 2017-04-19 哈尔滨工业大学深圳研究生院 Video frame image extraction method and system based on lip movement identification
KR101767234B1 (en) * 2016-03-21 2017-08-10 양장은 System based on pattern recognition of blood vessel in lips
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device
CN108960103A (en) * 2018-06-25 2018-12-07 西安交通大学 The identity identifying method and system that a kind of face and lip reading blend
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium
CN110276230A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 The method, apparatus and electronic equipment that user authentication, lip reading identify

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226587A1 (en) * 2012-02-27 2013-08-29 Hong Kong Baptist University Lip-password Based Speaker Verification System
US20150279364A1 (en) * 2014-03-29 2015-10-01 Ajay Krishnan Mouth-Phoneme Model for Computerized Lip Reading
WO2016061780A1 (en) * 2014-10-23 2016-04-28 Intel Corporation Method and system of facial expression recognition using linear relationships within landmark subsets
WO2016201679A1 (en) * 2015-06-18 2016-12-22 华为技术有限公司 Feature extraction method, lip-reading classification method, device and apparatus
KR101767234B1 (en) * 2016-03-21 2017-08-10 양장은 System based on pattern recognition of blood vessel in lips
CN106570461A (en) * 2016-10-21 2017-04-19 哈尔滨工业大学深圳研究生院 Video frame image extraction method and system based on lip movement identification
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device
CN110276230A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 The method, apparatus and electronic equipment that user authentication, lip reading identify
CN108960103A (en) * 2018-06-25 2018-12-07 西安交通大学 The identity identifying method and system that a kind of face and lip reading blend
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
FLORIAN SCHROFF: "FaceNet: A Unified Embedding for Face Recognition and Clustering", 《2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
KEVIN LAI: "A Large-Scale Hierarchical Multi-View RGB-D Object Dataset", 《PROCEEDINGS - IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION·MAY 2011》 *
WEIXIN_30596343: "声纹识别(说话人识别)", 《HTTPS://BLOG.CSDN.NET/WEIXIN_30596343/DETAILS/99112089》 *
邱道尹: "帧差法在运动目标实时跟踪中的应用", 《华北水利水电学院学报》 *
郝会芬: "视频镜头分割和关键帧提取关键技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
陈春雨: "基于帧差法和边缘检测法的视频分割算法", 《济南大学学报》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733807A (en) * 2021-02-22 2021-04-30 佳都新太科技股份有限公司 Face comparison graph convolution neural network training method and device
CN114220142A (en) * 2021-11-24 2022-03-22 慧之安信息技术股份有限公司 Face feature recognition method of deep learning algorithm
CN114220142B (en) * 2021-11-24 2022-08-23 慧之安信息技术股份有限公司 Face feature recognition method of deep learning algorithm
CN114220177A (en) * 2021-12-24 2022-03-22 湖南大学 Lip syllable recognition method, device, equipment and medium
CN114220177B (en) * 2021-12-24 2024-06-25 湖南大学 Lip syllable recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN110929239B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN111401257B (en) Face recognition method based on cosine loss under non-constraint condition
CN110929239B (en) Terminal unlocking method based on lip language instruction
CN111368683B (en) Face image feature extraction method and face recognition method based on modular constraint CenterFace
CN109949278B (en) Hyperspectral anomaly detection method based on antagonistic self-coding network
Shen et al. Finger vein recognition algorithm based on lightweight deep convolutional neural network
CN108764041B (en) Face recognition method for lower shielding face image
CN111639558B (en) Finger vein authentication method based on ArcFace Loss and improved residual error network
Tian et al. Ear recognition based on deep convolutional network
CN108268859A (en) A kind of facial expression recognizing method based on deep learning
CN105069434B (en) A kind of human action Activity recognition method in video
CN112036383B (en) Hand vein-based identity recognition method and device
CN109325440B (en) Human body action recognition method and system
CN107784263B (en) Planar rotation face detection method based on improved accelerated robust features
CN109325472B (en) Face living body detection method based on depth information
CN113591747A (en) Multi-scene iris recognition method based on deep learning
Huang et al. Human emotion recognition based on face and facial expression detection using deep belief network under complicated backgrounds
Chen et al. Robust gender recognition for uncontrolled environment of real-life images
CN111950454B (en) Finger vein recognition method based on bidirectional feature extraction
Tao et al. Design of face recognition system based on convolutional neural network
Ren et al. Alignment free and distortion robust iris recognition
CN112069898A (en) Method and device for recognizing human face group attribute based on transfer learning
CN111428643A (en) Finger vein image recognition method and device, computer equipment and storage medium
CN111160121A (en) Portrait recognition system, method and device based on deep learning
CN112800959B (en) Difficult sample mining method for data fitting estimation in face recognition
CN114998966A (en) Facial expression recognition method based on feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 211000 floor 3, building 3, Qilin artificial intelligence Industrial Park, 266 Chuangyan Road, Nanjing, Jiangsu

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Address before: 211000 3rd floor, building 3, 266 Chuangyan Road, Jiangning District, Nanjing City, Jiangsu Province

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

GR01 Patent grant
GR01 Patent grant