CN110929239B - Terminal unlocking method based on lip language instruction - Google Patents

Terminal unlocking method based on lip language instruction

Info

Publication number
CN110929239B
Authority
CN
China
Prior art keywords
representing
lip
image
frame
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911045860.6A
Other languages
Chinese (zh)
Other versions
CN110929239A (en)
Inventor
兰星
胡庆浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute, Institute of Automation of Chinese Academy of Science filed Critical Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority to CN201911045860.6A priority Critical patent/CN110929239B/en
Publication of CN110929239A publication Critical patent/CN110929239A/en
Application granted granted Critical
Publication of CN110929239B publication Critical patent/CN110929239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a terminal unlocking method based on a lip-language instruction. During verification, the key feature points needed for face recognition are extracted in the same way as during collection, the Euclidean distance between face features is computed with a FaceNet network, and the result is compared against a threshold. The user designs the instruction action at collection time and simply repeats the same action at recognition time, so the action instruction is hard for others to steal and authentication security is improved. Moreover, the lip-language instruction unlocking method requires no large-scale computation on the terminal, which greatly reduces the hardware requirements and speeds up recognition. The invention avoids excessively large gradients caused by accumulation in one quadrant of the feature space, improves the efficiency of network learning and training, achieves active learning of the training model, and solves the problem that a conventional fixed instruction action is easily exposed.

Description

Terminal unlocking method based on lip language instruction
Technical Field
The invention relates to a terminal unlocking method based on a lip language instruction, and belongs to the technical field of image information processing.
Background
At present, terminals are mainly unlocked by face, fingerprint, or iris. However, such information is easy to forge, and these static recognition methods are easy to crack, so security is poor and private information is easily leaked. The invention adopts a lip-language instruction unlocking method to achieve dynamic unlocking and improve authentication security.
Existing lip-language unlocking technology depends heavily on deep learning: a specific single-instruction model must be trained on a PC (personal computer) and then deployed to the terminal, and the user must follow a fixed instruction action. This approach performs poorly, does not adapt to the user's own data, supports only fixed instruction actions, and the instructions are easily exposed.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the existing unlocking technology, a terminal unlocking method based on a lip language instruction is provided.
The technical scheme is as follows: a terminal unlocking method based on a lip language instruction comprises the following steps:
step 1, a terminal camera captures video frames of the user's unlocking lip-language instruction; the terminal performs face detection and extracts face features, and lip-region video frames are extracted at the same time;
step 2, extracting characteristic points of the lip video frame data set, matching the characteristic points of adjacent frames and marking position coordinates;
step 3, using a frame difference method to extract the change of the feature point positions, i.e., the algebraic features of lip motion;
step 4, matching the human face in a database;
step 5, if the match succeeds, the person to be identified makes the same lip-language instruction action toward the terminal camera; the terminal again extracts lip feature points, computes the algebraic features of lip motion, and checks whether they match the unlocking instruction;
step 6, if either the face match or the instruction match fails, a match-failure prompt is shown and the method returns to step 4.
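Read end to end, steps 1 to 6 form a capture, feature-extraction, face-match, instruction-match loop. A minimal flow sketch is given below; all helper callables (detect_face, match_face, extract_lip_motion, match_instruction) and the retry limit are hypothetical stand-ins for the components detailed in the embodiments that follow, not names taken from the patent.

```python
# Hypothetical sketch of the step 1-6 unlocking flow described above.
# The helper callables are assumptions, not part of the patent text.

def try_unlock(frames, detect_face, match_face,
               extract_lip_motion, match_instruction, max_attempts=3):
    """Return True if both the face and the lip-language instruction match."""
    for _ in range(max_attempts):
        face_feat, lip_frames = detect_face(frames)       # step 1
        lip_motion = extract_lip_motion(lip_frames)        # steps 2-3
        if not match_face(face_feat):                      # step 4
            continue                                       # step 6: retry
        if match_instruction(lip_motion):                  # step 5
            return True
    return False                                           # prompt match failure
```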
In a further embodiment, step 1 further comprises:
step 1-1, calculating a color histogram in RGB space for each frame of the video segment, dividing each channel into 32 intervals by pixel value, and normalizing to obtain a 96-dimensional feature; forming a matrix from the feature vectors of the frames, reducing its dimensionality, and computing the initial cluster center:
(formula given as an image in the original and not reproduced here)
where C_n denotes the cluster center of the n-th segment, f_n denotes the feature vector of the n-th frame, and f_{n+1} denotes the feature vector of the (n+1)-th frame;
calculating the similarity of each new frame to the current cluster center and defining a threshold σ; when the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n, f_n is added to C_n, and the cluster center is updated to a new center C_n′:
(formula given as an image in the original and not reproduced here)
where f_n denotes the feature vector of the n-th frame, C_n denotes the cluster center of the n-th segment, and C_n′ denotes the updated cluster center;
when the similarity is smaller than the threshold, f_n is judged to belong to a new cluster, and f_n is used to initialize a new cluster center C_n′:
C_n′ = f_n
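The histogram feature and the threshold-based clustering of step 1-1 can be sketched as follows. This is a minimal illustration, assuming cosine similarity and a running-mean center update, since the patent's similarity measure and update formula are given only as images; the function names are hypothetical.

```python
import numpy as np

def frame_histogram(frame):
    """96-D feature for step 1-1: a 32-bin histogram per RGB channel, normalised."""
    hists = [np.histogram(frame[..., c], bins=32, range=(0, 256))[0]
             for c in range(3)]
    feat = np.concatenate(hists).astype(np.float64)
    return feat / (feat.sum() + 1e-9)

def cluster_frames(features, sigma=0.9):
    """Greedy clustering: similarity of each new frame to the current centre;
    above sigma the frame joins the cluster (centre updated by a running mean,
    an assumption), otherwise the frame initialises a new centre (C_n' = f_n)."""
    centers, counts = [], []
    for f in features:
        if centers:
            c = centers[-1]
            sim = f @ c / (np.linalg.norm(f) * np.linalg.norm(c) + 1e-9)
            if sim > sigma:
                counts[-1] += 1
                centers[-1] = c + (f - c) / counts[-1]   # update C_n -> C_n'
                continue
        centers.append(f.copy())                          # C_n' = f_n
        counts.append(1)
    return centers
```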
Step 1-2, firstly, recognizing the contour of a human face, removing a background, carrying out lip cutting on the human face in a video frame, positioning the position of facial feature contour points in the human face, including the coordinates of a nose tip, the leftmost coordinates of the lips, the rightmost coordinates of the lips and the coordinates of a central point of a mouth, cutting an image containing lip details according to the coordinates, and calculating the cutting size according to a formula:
Figure BDA0002254119410000022
in the formula, LMNDistance, x, between coordinates representing nose tip and coordinates of center point of mouthRight sideAbscissa, y, representing the rightmost feature point of the lipRight sideOrdinate, x, representing the rightmost feature point of the lipLeft side ofAbscissa, y, representing the leftmost feature point of the lipLeft side ofA vertical coordinate representing a feature point at the leftmost side of the lip;
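A minimal sketch of the landmark-based lip crop of step 1-2 is given below. The patent's exact size formula is an image in the source, so the patch dimensions used here (mouth-corner distance for width, nose-to-mouth distance L_MN for height, plus a padding factor) are assumptions, and the landmark argument names are hypothetical.

```python
import numpy as np

def crop_lip(frame, nose_tip, mouth_left, mouth_right, mouth_center, pad=0.4):
    """Crop a lip patch from detected landmarks, each given as (x, y)."""
    width = np.hypot(mouth_right[0] - mouth_left[0],
                     mouth_right[1] - mouth_left[1])     # lip corner distance
    l_mn = np.hypot(mouth_center[0] - nose_tip[0],
                    mouth_center[1] - nose_tip[1])        # L_MN in the text
    half_w = (1 + pad) * width / 2
    half_h = (1 + pad) * l_mn / 2
    cx, cy = mouth_center
    x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, frame.shape[1]))
    y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, frame.shape[0]))
    return frame[y0:y1, x0:x1]
```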
step 1-3, correcting the deviation of the cropped lip images, training a binary classification model based on a convolutional neural network on the lip images, and judging whether an extracted lip image is a valid image:
(formula given as an image in the original and not reproduced here)
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j denotes the local perception value of the input, β denotes the output parameter, and down() denotes the pooling function.
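A toy version of such a convolutional binary classifier (valid lip crop vs. not) might look as follows; the layer sizes and the 64x64 grayscale input are illustrative assumptions, since the patent specifies only convolution, pooling, and a binary decision.

```python
import torch
import torch.nn as nn

class LipValidNet(nn.Module):
    """Tiny CNN deciding whether a crop is a valid lip image (step 1-3 sketch)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)   # assumes 64x64 gray input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# usage sketch: logits = LipValidNet()(torch.randn(1, 1, 64, 64))
```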
In a further embodiment, step 2 further comprises:
step 2-1, for the cropped images extracted in step 1, constructing a D3D model to accelerate network convergence and introducing a loss function to correct the model:
(formula given as an image in the original and not reproduced here)
where the first (omitted) term denotes the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling coefficient;
where P({Z|X}) = Σ_k P(π_k|X), i.e., the sum of the probabilities formed by all paths after merging;
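The combination of a cross-entropy term with a path-probability term P({Z|X}) suggests a loss of the following shape; because the patent's formula is given only as an image, the exact weighting and the use of a CTC criterion for the path sum are assumptions, and the function name is hypothetical.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def d3d_loss(clip_logits, clip_labels, seq_log_probs, targets,
             input_lengths, target_lengths, sigma=0.5):
    """Sketch of a combined loss: per-clip cross entropy plus a CTC term
    (sum over merged paths), weighted by a coefficient sigma (an assumption).
    clip_logits: (N, C); seq_log_probs: (T, N, C) log-softmax outputs."""
    loss_ce = ce(clip_logits, clip_labels)
    loss_ctc = ctc(seq_log_probs, targets, input_lengths, target_lengths)
    return loss_ce + sigma * loss_ctc
```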
step 2-2, extracting feature points from the images of two adjacent frames to obtain two sets of feature points:
p = {p1, p2, p3, …, pn}
p′ = {p1′, p2′, p3′, …, pn′}
taking each of the two sets of feature points as centers and the pixel values of their neighborhood windows W as the feature-point descriptors, computing the pixel difference over the neighborhoods of the two sets of feature points:
(formula given as an image in the original and not reproduced here)
where S denotes the pixel difference over the neighborhoods of the two sets of feature points, x denotes the abscissa of a pixel, y denotes the ordinate of the pixel, W denotes the neighborhood window serving as the descriptor in this formula, p denotes the previous frame image, and p′ denotes the next frame image;
step 2-3, using the pixel difference obtained in step 2-2, finding matching points according to the matching coefficient between a feature point and a neighborhood window:
(formula given as an image in the original and not reproduced here)
where G denotes the gray value of the previous frame image, G′ denotes the gray value of the next frame image, C denotes the matching coefficient, and the remaining symbols have the same meanings as above.
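The window-descriptor matching of steps 2-2 and 2-3 can be sketched as below. The patent's matching-coefficient formula is an image in the source, so the normalized cross-correlation score used here is an assumption; function and argument names are hypothetical.

```python
import numpy as np

def match_points(prev_gray, next_gray, pts_prev, pts_next, w=7):
    """Match feature points between adjacent frames by comparing pixel values
    inside a (2w+1)x(2w+1) neighbourhood window around each point."""
    def window(img, p):
        x, y = int(p[0]), int(p[1])
        return img[y - w:y + w + 1, x - w:x + w + 1].astype(np.float64)

    matches = []
    for i, p in enumerate(pts_prev):
        wp = window(prev_gray, p)
        best_j, best_c = -1, -1.0
        for j, q in enumerate(pts_next):
            wq = window(next_gray, q)
            if wp.shape != wq.shape or wp.size == 0:
                continue                      # skip windows clipped at the border
            a, b = wp - wp.mean(), wq - wq.mean()
            c = (a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-9)
            if c > best_c:                    # keep the highest matching coefficient
                best_j, best_c = j, c
        matches.append((i, best_j, best_c))
    return matches
```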
In a further embodiment, step 3 further comprises:
step 3-1, taking three adjacent independent frames, denoted f(n+1), f(n), and f(n−1), with corresponding gray values G(n+1)_{x,y}, G(n)_{x,y}, and G(n−1)_{x,y}, and obtaining an image P′ by the frame difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}| ∩ |G(n)_{x,y} − G(n−1)_{x,y}|
comparing the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition as follows:
(formula given as an image in the original and not reproduced here)
where N denotes the total number of pixels in the region to be detected, τ denotes the illumination suppression coefficient, A denotes the whole-frame image, and T is the threshold.
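A minimal three-frame-difference sketch for step 3-1 follows; the intersection of the two absolute differences is taken as their element-wise minimum, and the threshold value is an assumption since the patent's comparison formula is given only as an image.

```python
import numpy as np

def three_frame_diff(f_prev, f_cur, f_next, t=25):
    """Three-frame difference: intersect |f(n+1)-f(n)| and |f(n)-f(n-1)|,
    then threshold to obtain the moving-target mask."""
    d1 = np.abs(f_next.astype(np.int16) - f_cur.astype(np.int16))
    d2 = np.abs(f_cur.astype(np.int16) - f_prev.astype(np.int16))
    motion = np.minimum(d1, d2)            # the "intersection" of the two diffs
    return (motion > t).astype(np.uint8)   # 1 where motion is detected
```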
In a further embodiment, step 4 further comprises:
step 4-1, on a multi-user terminal such as a safe or a door lock, face recognition is required, matching whether the user's face exists in the database; on a single-user private terminal such as a mobile phone or tablet, face recognition is not needed and face verification is sufficient; the FaceNet network is used to compute the Euclidean distance between face features, which is then compared against a threshold:
(formula given as an image in the original and not reproduced here)
where the three omitted symbols denote the positive sample pair, the negative sample pair, and the anchor (reference) sample pair, respectively, α denotes the margin constraint between the positive and negative sample pairs, and Φ denotes the set of triplets;
introducing a neuron model:
h_{W,b}(x) = f(W^T x)
where W denotes the weight vector of a neuron, W^T x denotes the weighted transformation of the input vector x, and f(W^T x) denotes the activation-function transformation;
assigning the input vector x components x_i and substituting into W^T x:
(formula given as an image in the original and not reproduced here)
where n denotes the number of stages of the neural network and b denotes the bias.
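A sketch of the verification comparison is shown below, where `embed` stands in for a FaceNet-style embedding function; the threshold value and the assumption of L2-normalized embeddings are illustrative and not taken from the patent.

```python
import numpy as np

def verify_face(embed, probe_img, enrolled_embedding, threshold=1.1):
    """Compare the squared Euclidean distance between the probe embedding and
    the enrolled embedding against a threshold (step 4-1 sketch)."""
    e = embed(probe_img)
    dist = np.sum((e - enrolled_embedding) ** 2)
    return dist < threshold
```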
In a further embodiment, step 5 further comprises: during acquisition, establishing coordinate axes with the lip center as the coordinate origin, fitting the inner-lip region in the lip gray image as a combination of two half-ellipses, with the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse, and using the frame difference method to extract the change of the corresponding feature point positions, i.e., the algebraic features of inter-frame lip motion:
taking two adjacent independent frames, denoted f(n+1) and f(n), with corresponding gray values G(n+1)_{x,y} and G(n)_{x,y}, and obtaining an image P′ by the frame difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}|
comparing the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition as follows:
(formula given as an image in the original and not reproduced here)
where N denotes the total number of pixels in the region to be detected, τ denotes the illumination suppression coefficient, A denotes the whole-frame image, and T is the threshold.
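The two half-ellipse fits of step 5 can be obtained with a small least-squares routine such as the sketch below; the assumption is that the inner-lip contour points have already been shifted so that the lip center is the coordinate origin, and the function name is hypothetical.

```python
import numpy as np

def fit_half_ellipse(points):
    """Fit contour points to x^2/a^2 + y^2/b^2 = 1 by linear least squares and
    return the semi-axes (a, b). Calling this once for the upper inner-lip
    points and once for the lower gives the two half-ellipses of step 5."""
    pts = np.asarray(points, dtype=np.float64)
    A = np.column_stack([pts[:, 0] ** 2, pts[:, 1] ** 2])    # [x^2, y^2]
    coeff, *_ = np.linalg.lstsq(A, np.ones(len(pts)), rcond=None)
    a = 1.0 / np.sqrt(coeff[0])    # coeff = [1/a^2, 1/b^2]
    b = 1.0 / np.sqrt(coeff[1])
    return a, b

# usage: a_up, b_up = fit_half_ellipse(upper_inner_lip_pts)
```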
Advantageous effects: in the terminal unlocking method based on a lip-language instruction of the invention, the user designs the instruction action at collection time and only needs to repeat the same action at recognition time, so the action instruction is hard for others to steal and authentication security is improved. Moreover, the lip-language instruction unlocking method requires no large-scale computation on the terminal, which greatly reduces the hardware requirements and speeds up recognition. By reducing the dimensionality of the feature matrix, extracting feature points, initializing the cluster centers, and using the FaceNet network to compute the Euclidean distance between face features, the invention avoids excessively large gradients caused by accumulation in one quadrant of the feature space, improves the efficiency of network learning and training, achieves active learning of the training model, and solves the problem that a conventional fixed instruction action is easily exposed.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of establishing a coordinate system for lips according to the present invention.
FIG. 3 is a diagram illustrating an image containing details of a lip cut out from a lip unlock command according to the present invention.
FIG. 4 is a schematic diagram of the introduction of a neuron model according to the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
The applicant believes that, in the field of lip-language unlocking, the prior art depends heavily on deep learning: a specific single-instruction model must be trained on a PC and then deployed to the terminal, and the user must follow a fixed instruction action. This approach performs poorly, does not adapt to the user's own data, supports only fixed instruction actions, and the instructions are easily exposed; therefore, how to construct the lip-language model and continuously improve the machine's active learning is of great importance.
To solve the problems in the prior art, the invention provides a terminal unlocking method based on a lip-language instruction, in which the user designs the instruction action at collection time and only needs to repeat the same action at recognition time, so the action instruction is hard for others to steal and authentication security is improved.
The technical scheme of the invention is further explained by the embodiment and the corresponding attached drawings.
First, the terminal camera captures video frames of the user's unlocking lip-language instruction; the terminal performs face detection and extracts face features, and lip-region video frames are extracted at the same time. A color histogram in RGB space is calculated for each frame of the video clip, each channel is divided into 32 intervals by pixel value, and the result is normalized to obtain a 96-dimensional feature; the feature vectors of the frames form a matrix, the matrix is reduced in dimensionality, and the initial cluster center is computed:
(formula given as an image in the original and not reproduced here)
where C_n denotes the cluster center of the n-th segment, f_n denotes the feature vector of the n-th frame, and f_{n+1} denotes the feature vector of the (n+1)-th frame;
the similarity of each new frame to the current cluster center is calculated and a threshold σ is defined; when the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n, f_n is added to C_n, and the cluster center is updated to a new center C_n′:
(formula given as an image in the original and not reproduced here)
where f_n denotes the feature vector of the n-th frame, C_n denotes the cluster center of the n-th segment, and C_n′ denotes the updated cluster center;
when the similarity is smaller than the threshold, f_n is judged to belong to a new cluster, and f_n is used to initialize a new cluster center C_n′:
C_n′ = f_n
The face contour is recognized and the background removed; the lips are cropped from the face in the video frame by locating the facial feature contour points, including the nose-tip coordinates, the leftmost and rightmost lip coordinates, and the mouth center coordinates; an image containing the lip details is cropped according to these coordinates, with the crop size calculated according to the formula:
(formula given as an image in the original and not reproduced here)
where L_MN denotes the distance between the nose-tip coordinates and the mouth center coordinates, x_right and y_right denote the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left denote the abscissa and ordinate of the leftmost lip feature point;
the deviation of the cropped lip images is corrected, a binary classification model based on a convolutional neural network is trained on the lip images, and it is judged whether an extracted lip image is a valid image:
(formula given as an image in the original and not reproduced here)
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j denotes the local perception value of the input, β denotes the output parameter, and down() denotes the pooling function.
Then, feature points are extracted from the lip video frame data set, the feature points of adjacent frames are matched, and their position coordinates are marked.
For the extracted cropped images, a D3D model is constructed to accelerate network convergence, and a loss function is introduced to correct the model:
(formula given as an image in the original and not reproduced here)
where the first (omitted) term denotes the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling coefficient;
where P({Z|X}) = Σ_k P(π_k|X), i.e., the sum of the probabilities formed by all paths after merging.
Feature points are extracted from the images of two adjacent frames to obtain two sets of feature points:
p = {p1, p2, p3, …, pn}
p′ = {p1′, p2′, p3′, …, pn′}
taking each of the two sets of feature points as centers and the pixel values of their neighborhood windows W as the feature-point descriptors, the pixel difference over the neighborhoods of the two sets of feature points is computed:
(formula given as an image in the original and not reproduced here)
where S denotes the pixel difference over the neighborhoods of the two sets of feature points, x denotes the abscissa of a pixel, y denotes the ordinate of the pixel, W denotes the neighborhood window serving as the descriptor in this formula, p denotes the previous frame image, and p′ denotes the next frame image;
using the pixel difference obtained above, matching points are found according to the matching coefficient between a feature point and a neighborhood window:
(formula given as an image in the original and not reproduced here)
where G denotes the gray value of the previous frame image, G′ denotes the gray value of the next frame image, C denotes the matching coefficient, and the remaining symbols have the same meanings as above.
Next, the frame difference method is used to extract the change of the feature point positions, i.e., the algebraic features of lip motion. Three adjacent independent frames are taken, denoted f(n+1), f(n), and f(n−1), with corresponding gray values G(n+1)_{x,y}, G(n)_{x,y}, and G(n−1)_{x,y}, and an image P′ is obtained by the frame difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}| ∩ |G(n)_{x,y} − G(n−1)_{x,y}|
the image P′ is compared with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition as follows:
(formula given as an image in the original and not reproduced here)
where N denotes the total number of pixels in the region to be detected, τ denotes the illumination suppression coefficient, A denotes the whole-frame image, and T is the threshold.
Step 4, matching the human face in a database: on a multi-user terminal, such as a safe case and a door lock, face recognition is required to be carried out, and whether the face of the user exists in a database or not is matched; on a single user private terminal, such as a mobile phone and a tablet, face recognition is not needed, face verification can be performed, the facenet network is adopted to calculate the Euclidean distance of face features, and comparison threshold judgment is performed:
Figure BDA0002254119410000081
in the formula,
Figure BDA0002254119410000082
a pair of positive samples is represented, and,
Figure BDA0002254119410000083
a pair of negative samples is represented, and,
Figure BDA0002254119410000084
representing a flat sample pair, alpha representing the constraint range between the positive sample pair and the negative sample pair, phi representing the set of triples;
introducing a neuron model:
hW,b(x)=f(WTx)
wherein W represents a weight vector of a neuron, WTx denotes the nonlinear transformation of the input vector x, f (W)Tx) represents the activation function transformation of the weight vector;
assigning an input vector x to xiInto WTx:
Figure BDA0002254119410000085
In the formula, n represents the number of stages of the neural network, and b represents an offset.
Step 5, if matching is successful, people need to be identified to make the same lip language instruction action towards the terminal camera, the terminal extracts lip feature points similarly, and calculates algebraic features of lip movement, and whether matching is an unlocking instruction or not; the method comprises the following steps of establishing a coordinate axis by taking the center of a lip as a coordinate origin in an acquisition process, fitting an inner lip region in a lip gray image into two semi-ellipse combinations, enabling an upper inner lip to correspond to an upper ellipse, enabling a lower inner lip to correspond to a lower ellipse, and extracting change characteristics of corresponding characteristic point positions by using a frame difference method, namely algebraic characteristics of interframe lip motion:
recording images of two adjacent independent frames, respectively recording the images as f (n +1) and f (n), and respectively recording the gray values corresponding to the two frames of images as G (n +1)x,y、G(n)x,yObtaining an image P' by adopting a frame difference method:
P′=|G(n+1)x,y-G(n)x,y|
comparing the image P' with a preset threshold value T to analyze the liquidity, and extracting a moving target, wherein the comparison conditions are as follows:
Figure BDA0002254119410000086
in the formula, N represents the total number of pixels in the region to be detected, τ represents the suppression coefficient of illumination, a represents the image of the entire frame, and T is a threshold.
And when the face matching or the matching instruction is unsuccessful, prompting that the matching is failed, continuing matching the face in the database, repeating the steps, and temporarily locking the terminal equipment when the matching is failed for more than three times.
In summary, to address the deficiencies of the prior art, the invention provides a terminal unlocking method based on a lip-language instruction. During collection, several frames of images are taken to acquire the face and some key feature points are extracted. During verification, the key feature points needed for face recognition are extracted in the same way, the FaceNet network is used to compute the Euclidean distance between face features, and the result is compared against a threshold. During collection, coordinate axes are established with the lip center as the coordinate origin, the inner-lip region in the lip gray image is fitted as a combination of two half-ellipses (the upper inner lip corresponds to the upper ellipse and the lower inner lip to the lower ellipse), the frame difference method is used to extract the change of the corresponding feature point positions, i.e., the algebraic features of inter-frame lip motion, and a decision threshold is computed. During verification, the lip motion features are extracted in the same way and compared against this threshold. By reducing the dimensionality of the feature matrix, extracting feature points, initializing the cluster centers, and using the FaceNet network to compute the Euclidean distance between face features, the method avoids excessively large gradients caused by accumulation in one quadrant of the feature space, improves the efficiency of network learning and training, achieves active learning of the training model, and solves the problem that a conventional fixed instruction action is easily exposed.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A terminal unlocking method based on a lip language instruction, characterized by comprising the following steps:
step 1, a terminal camera captures video frames of the user's unlocking lip-language instruction; the terminal performs face detection and extracts face features, and lip-region video frames are extracted at the same time;
step 1-1, calculating a color histogram in RGB space for each frame of the video segment, dividing each channel into 32 intervals by pixel value, and normalizing to obtain a 96-dimensional feature; forming a matrix from the feature vectors of the frames, reducing its dimensionality, and computing the initial cluster center:
(formula given as an image in the original and not reproduced here)
where C_n denotes the cluster center of the n-th segment, f_n denotes the feature vector of the n-th frame, and f_{n+1} denotes the feature vector of the (n+1)-th frame;
calculating the similarity of each new frame to the current cluster center and defining a threshold σ; when the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n, f_n is added to C_n, and the cluster center is updated to a new center C_n′:
(formula given as an image in the original and not reproduced here)
where f_n denotes the feature vector of the n-th frame, C_n denotes the cluster center of the n-th segment, and C_n′ denotes the updated cluster center;
when the similarity is smaller than the threshold, f_n is judged to belong to a new cluster, and f_n is used to initialize a new cluster center C_n′:
C_n′ = f_n
step 1-2, first recognizing the face contour and removing the background; cropping the lips from the face in the video frame by locating the facial feature contour points, including the nose-tip coordinates, the leftmost and rightmost lip coordinates, and the mouth center coordinates; cropping an image containing the lip details according to these coordinates, with the crop size calculated according to the formula:
(formula given as an image in the original and not reproduced here)
where L_MN denotes the distance between the nose-tip coordinates and the mouth center coordinates, x_right and y_right denote the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left denote the abscissa and ordinate of the leftmost lip feature point;
step 1-3, correcting the deviation of the cropped lip images, training a binary classification model based on a convolutional neural network on the lip images, and judging whether an extracted lip image is a valid image:
(formula given as an image in the original and not reproduced here)
where l denotes the number of convolution layers, k denotes the convolution kernel, b denotes the convolution bias, M_j denotes the local perception value of the input, β denotes the output parameter, and down() denotes the pooling function;
step 2, extracting feature points from the lip video frame data set, matching the feature points of adjacent frames, and marking their position coordinates;
step 2-1, for the cropped images extracted in step 1, constructing a D3D model to accelerate network convergence and introducing a loss function to correct the model:
(formula given as an image in the original and not reproduced here)
where the first (omitted) term denotes the cross-entropy loss, {y_i = k} is an indicator function, local(pre) denotes the network output probability, and σ is a scaling coefficient;
where P({Z|X}) = Σ_k P(π_k|X), i.e., the sum of the probabilities formed by all paths after merging;
step 2-2, extracting feature points from the images of two adjacent frames to obtain two sets of feature points:
p = {p1, p2, p3, …, pn}
p′ = {p1′, p2′, p3′, …, pn′}
taking each of the two sets of feature points as centers and the pixel values of their neighborhood windows W as the feature-point descriptors, computing the pixel difference over the neighborhoods of the two sets of feature points:
(formula given as an image in the original and not reproduced here)
where S denotes the pixel difference over the neighborhoods of the two sets of feature points, x denotes the abscissa of a pixel, y denotes the ordinate of the pixel, W denotes the neighborhood window serving as the descriptor in this formula, p denotes the previous frame image, and p′ denotes the next frame image;
step 2-3, using the pixel difference obtained in step 2-2, finding matching points according to the matching coefficient between a feature point and a neighborhood window:
(formula given as an image in the original and not reproduced here)
where G denotes the gray value of the previous frame image, G′ denotes the gray value of the next frame image, C denotes the matching coefficient, and the remaining symbols have the same meanings as above;
step 3, using the frame difference method to extract the change of the feature point positions, i.e., the algebraic features of lip motion;
step 3-1, taking three adjacent independent frames, denoted f(n+1), f(n), and f(n−1), with corresponding gray values G(n+1)_{x,y}, G(n)_{x,y}, and G(n−1)_{x,y}, and obtaining an image P′ by the frame difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}| ∩ |G(n)_{x,y} − G(n−1)_{x,y}|
comparing the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition as follows:
(formula given as an image in the original and not reproduced here)
where N denotes the total number of pixels in the region to be detected, τ denotes the illumination suppression coefficient, A denotes the whole-frame image, and T is the threshold;
step 4, matching the face in a database;
step 4-1, on a multi-user terminal such as a safe or a door lock, face recognition is required, matching whether the user's face exists in the database; on a single-user private terminal such as a mobile phone or tablet, face recognition is not needed and face verification is sufficient; the FaceNet network is used to compute the Euclidean distance between face features, which is then compared against a threshold:
(formula given as an image in the original and not reproduced here)
where the three omitted symbols denote the positive sample pair, the negative sample pair, and the anchor (reference) sample pair, respectively, α denotes the margin constraint between the positive and negative sample pairs, and Φ denotes the set of triplets;
introducing a neuron model:
h_{W,b}(x) = f(W^T x)
where W denotes the weight vector of a neuron, W^T x denotes the weighted transformation of the input vector x, and f(W^T x) denotes the activation-function transformation;
assigning the input vector x components x_i and substituting into W^T x:
(formula given as an image in the original and not reproduced here)
where n denotes the number of stages of the neural network and b denotes the bias;
step 5, if the match succeeds, the person to be identified makes the same lip-language instruction action toward the terminal camera; the terminal again extracts lip feature points, computes the algebraic features of lip motion, and checks whether they match the unlocking instruction;
during acquisition, establishing coordinate axes with the lip center as the coordinate origin, fitting the inner-lip region in the lip gray image as a combination of two half-ellipses, with the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse, and using the frame difference method to extract the change of the corresponding feature point positions, i.e., the algebraic features of inter-frame lip motion:
taking two adjacent independent frames, denoted f(n+1) and f(n), with corresponding gray values G(n+1)_{x,y} and G(n)_{x,y}, and obtaining an image P′ by the frame difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}|
comparing the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition as follows:
(formula given as an image in the original and not reproduced here)
where N denotes the total number of pixels in the region to be detected, τ denotes the illumination suppression coefficient, A denotes the whole-frame image, T is the threshold, G(n)_{x,y} denotes the gray value of the n-th frame image, and G(n+1)_{x,y} denotes the gray value of the (n+1)-th frame image;
step 6, when the face match or the instruction match is unsuccessful, a match-failure prompt is shown and the method returns to step 4.
CN201911045860.6A 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction Active CN110929239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911045860.6A CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911045860.6A CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Publications (2)

Publication Number Publication Date
CN110929239A CN110929239A (en) 2020-03-27
CN110929239B true CN110929239B (en) 2021-11-19

Family

ID=69849882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911045860.6A Active CN110929239B (en) 2019-10-30 2019-10-30 Terminal unlocking method based on lip language instruction

Country Status (1)

Country Link
CN (1) CN110929239B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733807A (en) * 2021-02-22 2021-04-30 佳都新太科技股份有限公司 Face comparison graph convolution neural network training method and device
CN114220142B (en) * 2021-11-24 2022-08-23 慧之安信息技术股份有限公司 Face feature recognition method of deep learning algorithm
CN114220177B (en) * 2021-12-24 2024-06-25 湖南大学 Lip syllable recognition method, device, equipment and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016061780A1 (en) * 2014-10-23 2016-04-28 Intel Corporation Method and system of facial expression recognition using linear relationships within landmark subsets
WO2016201679A1 (en) * 2015-06-18 2016-12-22 华为技术有限公司 Feature extraction method, lip-reading classification method, device and apparatus
CN106570461A (en) * 2016-10-21 2017-04-19 哈尔滨工业大学深圳研究生院 Video frame image extraction method and system based on lip movement identification
KR101767234B1 (en) * 2016-03-21 2017-08-10 양장은 System based on pattern recognition of blood vessel in lips
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device
CN108960103A (en) * 2018-06-25 2018-12-07 西安交通大学 The identity identifying method and system that a kind of face and lip reading blend
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN110276230A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 The method, apparatus and electronic equipment that user authentication, lip reading identify
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9159321B2 (en) * 2012-02-27 2015-10-13 Hong Kong Baptist University Lip-password based speaker verification system
US20150279364A1 (en) * 2014-03-29 2015-10-01 Ajay Krishnan Mouth-Phoneme Model for Computerized Lip Reading

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016061780A1 (en) * 2014-10-23 2016-04-28 Intel Corporation Method and system of facial expression recognition using linear relationships within landmark subsets
WO2016201679A1 (en) * 2015-06-18 2016-12-22 华为技术有限公司 Feature extraction method, lip-reading classification method, device and apparatus
KR101767234B1 (en) * 2016-03-21 2017-08-10 양장은 System based on pattern recognition of blood vessel in lips
CN106570461A (en) * 2016-10-21 2017-04-19 哈尔滨工业大学深圳研究生院 Video frame image extraction method and system based on lip movement identification
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device
CN110276230A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 The method, apparatus and electronic equipment that user authentication, lip reading identify
CN108960103A (en) * 2018-06-25 2018-12-07 西安交通大学 The identity identifying method and system that a kind of face and lip reading blend
CN109409195A (en) * 2018-08-30 2019-03-01 华侨大学 A kind of lip reading recognition methods neural network based and system
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A Large-Scale Hierarchical Multi-View RGB-D Object Dataset; Kevin Lai; Proceedings - IEEE International Conference on Robotics and Automation, May 2011; 20140528; full text *
FaceNet: A Unified Embedding for Face Recognition and Clustering; Florian Schroff; 2015 IEEE Conference on Computer Vision and Pattern Recognition; 20151015; full text *
Video segmentation algorithm based on the frame difference method and edge detection; Chen Chunyu (陈春雨); Journal of University of Jinan; 20120316; vol. 26, no. 1, pp. 31-36 *
Voiceprint recognition (speaker recognition); weixin_30596343; https://blog.csdn.net/weixin_30596343/details/99112089; 20180726; full text *
Application of the frame difference method in real-time tracking of moving targets; Qiu Daoyin (邱道尹); Journal of North China Institute of Water Conservancy and Hydroelectric Power; 20100111; vol. 30, no. 3, pp. 45-46, 64 *
Research on key technologies of video shot segmentation and key frame extraction; Hao Huifen (郝会芬); China Masters' Theses Full-text Database, Information Science & Technology; 20160215; chapters 2-4 *

Also Published As

Publication number Publication date
CN110929239A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN111401257B (en) Face recognition method based on cosine loss under non-constraint condition
CN108009520B (en) Finger vein identification method and system based on convolution variational self-encoder network
CN110929239B (en) Terminal unlocking method based on lip language instruction
Shen et al. Finger vein recognition algorithm based on lightweight deep convolutional neural network
CN109949278B (en) Hyperspectral anomaly detection method based on antagonistic self-coding network
CN111160533B (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN111368683B (en) Face image feature extraction method and face recognition method based on modular constraint CenterFace
Tian et al. Ear recognition based on deep convolutional network
CN108268859A (en) A kind of facial expression recognizing method based on deep learning
CN107103281A (en) Face identification method based on aggregation Damage degree metric learning
Liu et al. Gait recognition based on outermost contour
CN112036383B (en) Hand vein-based identity recognition method and device
CN106355138A (en) Face recognition method based on deep learning and key features extraction
CN107239741B (en) Single-sample face recognition method based on sparse reconstruction
CN110633655A (en) Attention-attack face recognition attack algorithm
CN111666845A (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN113591747A (en) Multi-scene iris recognition method based on deep learning
CN109325472B (en) Face living body detection method based on depth information
CN108875645A (en) A kind of face identification method under the conditions of underground coal mine complex illumination
CN103745242A (en) Cross-equipment biometric feature recognition method
Huang et al. Human emotion recognition based on face and facial expression detection using deep belief network under complicated backgrounds
Chen et al. Robust gender recognition for uncontrolled environment of real-life images
Fang et al. Deep belief network based finger vein recognition using histograms of uniform local binary patterns of curvature gray images
CN114998966B (en) Facial expression recognition method based on feature fusion
CN112800959B (en) Difficult sample mining method for data fitting estimation in face recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 211000 floor 3, building 3, Qilin artificial intelligence Industrial Park, 266 Chuangyan Road, Nanjing, Jiangsu

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Address before: 211000 3rd floor, building 3, 266 Chuangyan Road, Jiangning District, Nanjing City, Jiangsu Province

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant