CN110929239B - Terminal unlocking method based on lip language instruction - Google Patents
Terminal unlocking method based on lip language instruction
- Publication number
- CN110929239B (application CN201911045860.6A)
- Authority
- CN
- China
- Prior art keywords
- lip
- image
- frame
- formula
- face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 39
- 230000009471 action Effects 0.000 claims abstract description 19
- 230000008569 process Effects 0.000 claims abstract description 9
- 238000012795 verification Methods 0.000 claims abstract description 6
- 239000000284 extract Substances 0.000 claims description 23
- 239000013598 vector Substances 0.000 claims description 22
- 230000006870 function Effects 0.000 claims description 12
- 239000011159 matrix material Substances 0.000 claims description 8
- 230000008859 change Effects 0.000 claims description 7
- 210000002569 neuron Anatomy 0.000 claims description 7
- 238000005286 illumination Methods 0.000 claims description 6
- 230000001629 suppression Effects 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 6
- 230000004913 activation Effects 0.000 claims description 3
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 238000013527 convolutional neural network Methods 0.000 claims description 3
- 230000001815 facial effect Effects 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 238000011946 reduction process Methods 0.000 claims description 3
- 238000012937 correction Methods 0.000 claims description 2
- 238000001514 detection method Methods 0.000 claims description 2
- 239000012634 fragment Substances 0.000 claims 2
- 238000012549 training Methods 0.000 abstract description 6
- 230000000694 effects Effects 0.000 abstract description 5
- 238000009825 accumulation Methods 0.000 abstract description 3
- 238000013461 design Methods 0.000 abstract description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 230000007812 deficiency Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Security & Cryptography (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a terminal unlocking method based on lip-language instructions. During acquisition, several frames of video are captured to obtain the face, and a set of key feature points is extracted. During verification, the key feature points of the face to be recognized are extracted in the same way, a FaceNet network computes the Euclidean distance between face features, and the result is compared against a threshold. At acquisition time the user can design the instruction action freely and only needs to repeat the same action at recognition time, so the action instruction is difficult for others to steal, which improves the security of authentication. In addition, the lip-language unlocking method requires no large-scale computation on the terminal, which greatly lowers the hardware requirements and speeds up recognition. The invention avoids the problem of excessive gradients caused by samples piling up in one quadrant of the feature space, improves the efficiency of network learning and training, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.
Description
Technical Field
The invention relates to a terminal unlocking method based on lip-language instructions, and belongs to the technical field of image information processing.
Background
Current terminal unlocking methods mainly include face, fingerprint, and iris recognition. However, this information is easily forged, static recognition methods are easily defeated, security is poor, and private information can easily be leaked. The invention adopts a lip-language instruction unlocking method that realizes dynamic unlocking and can improve the security of authentication.
Existing lip-language unlocking technology relies heavily on deep learning: a specific, single-instruction model has to be trained on a PC and then deployed on the terminal, and the user must match a fixed instruction action. This approach performs poorly, does not adapt to the user's own data, can only handle fixed instruction actions, and the instructions are easily exposed.
Summary of the Invention
Purpose of the invention: in view of the deficiencies of existing unlocking technology, a terminal unlocking method based on lip-language instructions is provided.
Technical solution: a terminal unlocking method based on lip-language instructions, comprising the following steps:
Step 1. The terminal camera captures video frames of the user's lip-language unlocking instruction; the terminal performs face detection, extracts face features, and at the same time extracts video frames of the lip region.
Step 2. Feature points are extracted from the lip video frames, the feature points of adjacent frames are matched, and their position coordinates are recorded.
Step 3. The frame-difference method is used to extract the change in feature-point positions, i.e. the algebraic features of the lip motion.
Step 4. The face is matched against the database.
Step 5. If the match succeeds, the person to be recognized performs the same lip-language instruction action in front of the terminal camera; the terminal again extracts lip feature points, computes the algebraic features of the lip motion, and checks whether the action matches the unlocking instruction.
Step 6. If the face match or the instruction match fails, a matching failure is reported and the method returns to Step 4.
In a further embodiment, step 1 further comprises:
Step 1-1. For each frame of the video clip, compute its color histogram in RGB space, with each channel divided into 32 bins by pixel value, and normalize it to obtain a 96-dimensional feature. The feature vectors of all frames form a matrix; perform dimensionality reduction on this matrix and compute the initialized cluster centers:
where C_n denotes the cluster center of the n-th segment, f_n the feature vector of the n-th frame, and f_{n+1} the feature vector of frame n+1.
Compute the similarity of each new frame to the current cluster center and specify a threshold σ. If the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n; f_n is then added to C_n and the cluster center is updated to a new center C_{n'}:
where f_n denotes the feature vector of the n-th frame, C_n the cluster center of the n-th segment, and C_{n'} the updated cluster center.
If the similarity is less than the threshold, f_n is judged to belong to a new cluster center, and f_n is used to initialize the new cluster center C_{n'}:
C_{n'} = f_n
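As an illustration only (not the patented implementation), the following Python sketch builds the 96-dimensional RGB histogram feature per frame and assigns frames to clusters with a similarity threshold σ; the cosine similarity measure and the running-mean center update are assumptions, since the source does not reproduce the exact formulas.

```python
import cv2
import numpy as np

def frame_feature(frame_bgr):
    """96-dim feature: 32-bin histogram per RGB channel, normalized."""
    chans = cv2.split(frame_bgr)
    hists = [cv2.calcHist([c], [0], None, [32], [0, 256]).flatten() for c in chans]
    feat = np.concatenate(hists)
    return feat / (np.linalg.norm(feat) + 1e-8)

def cluster_frames(frames, sigma=0.9):
    """Assign frames to segments; start a new cluster when similarity drops below sigma."""
    centers, counts, labels = [], [], []
    for frame in frames:
        f = frame_feature(frame)
        if centers:
            sim = float(np.dot(f, centers[-1]))      # cosine similarity (assumed measure)
            if sim > sigma:
                # update the current center as a running mean (assumed update rule)
                c = (centers[-1] * counts[-1] + f) / (counts[-1] + 1)
                centers[-1] = c / (np.linalg.norm(c) + 1e-8)
                counts[-1] += 1
                labels.append(len(centers) - 1)
                continue
        centers.append(f)                            # C_{n'} = f_n: new cluster from this frame
        counts.append(1)
        labels.append(len(centers) - 1)
    return centers, labels
```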
Step 1-2. First, recognize the face contour and remove the background; crop the lip region of the face in each video frame. Locate the contour points of the facial features, including the coordinates of the nose tip, of the leftmost point of the lips, of the rightmost point of the lips, and of the mouth center, and crop an image containing the lip details according to these coordinates. The crop size is calculated by the following formula:
where L_MN denotes the distance between the nose-tip coordinates and the mouth-center coordinates, x_right and y_right the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left the abscissa and ordinate of the leftmost lip feature point.
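The sketch below crops a lip region from assumed landmark coordinates (nose tip, mouth center, left and right lip corners). Because the crop-size formula is not reproduced in the source, the crop height here is simply proportional to the nose-to-mouth distance L_MN, which is an assumption made for illustration.

```python
import numpy as np

def crop_lip_region(frame, nose, mouth_center, lip_left, lip_right, pad=0.2):
    """Crop a lip ROI from landmark coordinates given as (x, y) tuples.

    The width spans the lip corners; the height is proportional to the
    nose-to-mouth distance L_MN (assumed stand-in for the patent's formula).
    """
    l_mn = np.hypot(nose[0] - mouth_center[0], nose[1] - mouth_center[1])
    x0 = int(min(lip_left[0], lip_right[0]) - pad * l_mn)
    x1 = int(max(lip_left[0], lip_right[0]) + pad * l_mn)
    y0 = int(mouth_center[1] - 0.6 * l_mn)
    y1 = int(mouth_center[1] + 0.6 * l_mn)
    h, w = frame.shape[:2]
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, w), min(y1, h)
    return frame[y0:y1, x0:x1]
```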
Step 1-3. Correct the deviation of the cropped lip image and train a convolutional-neural-network binary classifier on the lip images to judge whether the extracted lip image is valid:
where l denotes the convolutional layer index, k the convolution kernel, b the convolution bias, M_j the local receptive field of the input, β the output parameter, and down() the pooling function.
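A minimal PyTorch sketch of such a two-class convolutional classifier follows; the layer sizes and activation are assumptions, since the source only specifies a CNN with convolution, bias and pooling as in the formula above.

```python
import torch
import torch.nn as nn

class LipValidityNet(nn.Module):
    """Binary classifier: is the cropped image a usable lip image?"""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution kernel k with bias b
            nn.ReLU(),
            nn.MaxPool2d(2),                             # down() pooling as in the formula
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 2),                            # two classes: valid / invalid lip crop
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# usage: logits = LipValidityNet()(torch.randn(1, 3, 64, 64))
```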
In a further embodiment, step 2 further comprises:
Step 2-1. For the cropped images extracted in step 1, build a D3D model to accelerate network convergence and introduce a loss function to correct the model:
where the loss term is the cross-entropy loss, {y_i = k} is an indicator function, logit(pre) is the network output probability, and σ is a scale coefficient;
and P({Z|X}) = Σ_{k=1} P(π|X), i.e. the sum of the probabilities of all paths after merging.
Step 2-2. Extract feature points from the images of two adjacent frames to obtain two sets of feature points:
p = {p_1, p_2, p_3, …, p_n}
p′ = {p_1′, p_2′, p_3′, …, p_n′}
Taking each feature point in the two adjacent sets as a center, use the pixel values of a window W in its neighborhood as the descriptor of that feature point, and compute the pixel interpolation of the neighborhood of each set of feature points:
where S denotes the pixel interpolation over the two feature-point neighborhoods, x and y the abscissa and ordinate of a pixel, W the neighborhood window serving as the descriptor in this formula, p a point in the previous frame, and p′ a point in the following frame.
Step 2-3. Using the pixel interpolation obtained in step 2-2, find matching points according to the matching coefficient between a feature point and the neighborhood window:
where G denotes the gray value of the previous frame, G′ the gray value of the following frame, and C the matching coefficient; the remaining symbols are as above.
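A minimal sketch of matching feature points between adjacent frames by comparing gray-level windows around each point; normalized cross-correlation is used here as the matching coefficient C, which is an assumption since the source formula is not reproduced.

```python
import numpy as np

def patch(gray, pt, half=4):
    """Extract a (2*half+1)^2 neighborhood window W around point pt = (x, y)."""
    x, y = int(pt[0]), int(pt[1])
    return gray[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)

def match_coefficient(patch_a, patch_b):
    """Normalized cross-correlation between two windows (assumed form of C)."""
    if patch_a.shape != patch_b.shape:
        return -1.0                                   # window clipped at the image border
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-8
    return float((a * b).sum() / denom)

def match_points(gray_prev, gray_next, pts_prev, pts_next, half=4, min_c=0.8):
    """For each point p in the previous frame, find the best-matching p' in the next frame."""
    matches = []
    for p in pts_prev:
        scores = [match_coefficient(patch(gray_prev, p, half),
                                    patch(gray_next, q, half)) for q in pts_next]
        best = int(np.argmax(scores))
        if scores[best] >= min_c:
            matches.append((p, pts_next[best]))
    return matches
```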
In a further embodiment, step 3 further comprises:
Step 3-1. Record the images of three adjacent frames, denoted f(n+1), f(n) and f(n-1), with corresponding gray values G(n+1)_{x,y}, G(n)_{x,y} and G(n-1)_{x,y}, and obtain the image P′ by the frame-difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}| ∩ |G(n)_{x,y} − G(n-1)_{x,y}|
Compare the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
where N denotes the total number of pixels in the region to be detected, τ the illumination suppression coefficient, A the complete-frame image, and T the threshold.
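A short sketch of the three-frame difference: two absolute difference images are intersected and thresholded. A fixed threshold t is used here in place of the patent's condition involving N, τ and A, which is not reproduced in the source.

```python
import cv2

def three_frame_difference(g_prev, g_curr, g_next, t=25):
    """Three-frame difference on grayscale frames G(n-1), G(n), G(n+1)."""
    d1 = cv2.absdiff(g_next, g_curr)
    d2 = cv2.absdiff(g_curr, g_prev)
    motion = cv2.bitwise_and(d1, d2)              # P' = |G(n+1)-G(n)| ∩ |G(n)-G(n-1)|
    _, mask = cv2.threshold(motion, t, 255, cv2.THRESH_BINARY)
    return mask

# usage: mask = three_frame_difference(gray0, gray1, gray2)
```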
In a further embodiment, step 4 further comprises:
Step 4-1. On multi-user terminals such as safes and door locks, face recognition is required to check whether the user's face exists in the database; on single-user private terminals such as mobile phones and tablets, face recognition is not required and face verification is sufficient. A FaceNet network is used to compute the Euclidean distance between face features, which is compared against a threshold:
where the three terms denote the positive sample pair, the negative sample pair and the flat sample pair respectively, α is the constraint range between positive and negative sample pairs, and Φ is the set of triplets;
Introduce the neuron model:
h_{W,b}(x) = f(W^T x)
where W denotes the weight vector of the neuron, W^T x the nonlinear transformation of the input vector x, and f(W^T x) the activation-function transformation of that weighted input;
Assign the input vector x the components x_i and substitute into W^T x:
where n denotes the number of levels of the neural network and b the bias.
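An illustrative sketch of the verification and identification decisions: given face embeddings (e.g. vectors produced by a FaceNet-style network, assumed here to already be available), compare Euclidean distances against a threshold. The threshold value and normalization are assumptions.

```python
import numpy as np

def face_verified(emb_probe, emb_enrolled, threshold=1.1):
    """Single-user case: accept if the Euclidean distance between the two
    L2-normalized embeddings is below the decision threshold (illustrative value)."""
    emb_probe = emb_probe / (np.linalg.norm(emb_probe) + 1e-8)
    emb_enrolled = emb_enrolled / (np.linalg.norm(emb_enrolled) + 1e-8)
    return np.linalg.norm(emb_probe - emb_enrolled) < threshold

def identify(emb_probe, database, threshold=1.1):
    """Multi-user case: return the enrolled identity with the smallest distance,
    or None if no distance falls below the threshold."""
    best_name, best_dist = None, np.inf
    for name, emb in database.items():
        d = np.linalg.norm(emb_probe - emb)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None
```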
In a further embodiment, step 5 further comprises: during acquisition, a coordinate system is established with the lip center as the origin, and the inner-lip region in the lip grayscale image is fitted as a combination of two half-ellipses, the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse; the frame-difference method is used to extract the change of the corresponding feature-point positions, i.e. the algebraic features of inter-frame lip motion:
Record the images of two adjacent frames, denoted f(n+1) and f(n), with corresponding gray values G(n+1)_{x,y} and G(n)_{x,y}, and obtain the image P′ by the frame-difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}|
Compare the image P′ with the preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
where N denotes the total number of pixels in the region to be detected, τ the illumination suppression coefficient, A the complete-frame image, and T the threshold.
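To illustrate how the lip motion of an enrolled instruction can be compared with a new attempt, the following sketch reduces each pair of consecutive lip frames to a single motion value (mean absolute gray difference) and compares the resulting sequences by correlation after resampling. Both the motion measure and the matching score are assumptions; the patent does not fix them.

```python
import numpy as np

def motion_signature(gray_lip_frames):
    """Per-pair motion value: mean absolute difference of consecutive lip frames."""
    frames = [f.astype(np.float64) for f in gray_lip_frames]
    return np.array([np.abs(b - a).mean() for a, b in zip(frames, frames[1:])])

def instruction_matches(sig_enrolled, sig_probe, num=32, min_corr=0.8):
    """Resample both motion signatures to a common length and compare by correlation."""
    def resample(sig):
        xs = np.linspace(0, len(sig) - 1, num)
        return np.interp(xs, np.arange(len(sig)), sig)
    a, b = resample(sig_enrolled), resample(sig_probe)
    return np.corrcoef(a, b)[0, 1] >= min_corr
```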
Beneficial effects: the invention relates to a terminal unlocking method based on lip-language instructions. At acquisition time the user can design the instruction action freely and only needs to repeat the same action at recognition time, so the action instruction is difficult for others to steal, which improves the security of authentication. In addition, the lip-language unlocking method requires no large-scale computation on the terminal, which greatly lowers the hardware requirements and speeds up recognition. By reducing the dimensionality of the matrix, extracting feature points, initializing cluster centers, and computing the Euclidean distance of face features with a FaceNet network, the invention avoids the problem of excessive gradients caused by samples piling up in one quadrant of the feature space, improves the efficiency of network learning and training, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.
Description of the Drawings
Figure 1 is a flow chart of the invention.
Figure 2 is a schematic diagram of the coordinate system established for the lips in the invention.
Figure 3 shows the image containing lip details cropped during the lip-language unlocking instruction of the invention.
Figure 4 is a schematic diagram of the neuron model introduced in the invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without one or more of these details. In other instances, some technical features known in the art are not described in order to avoid obscuring the invention.
The applicant believes that, in the field of lip-language unlocking, the existing technology relies heavily on deep learning: a specific, single-instruction model has to be trained on a PC and then deployed on the terminal, and the user must match fixed instruction actions. This approach performs poorly, does not adapt to the user's own data, can only handle fixed instruction actions, and the instructions are easily exposed. How to build the lip-language model and continuously improve the machine's active learning is therefore crucial.
To solve the above problems of the prior art, the invention proposes a terminal unlocking method based on lip-language instructions. At acquisition time the user can design the instruction action freely and only needs to repeat the same action at recognition time, so the action instruction is difficult for others to steal, which improves the security of authentication.
The technical solution of the invention is further described below through an embodiment and with reference to the corresponding drawings.
First, the terminal camera captures video frames of the user's lip-language unlocking instruction; the terminal performs face detection, extracts face features, and at the same time extracts video frames of the lip region. For each frame of the video clip, compute its color histogram in RGB space, with each channel divided into 32 bins by pixel value, and normalize it to obtain a 96-dimensional feature. The feature vectors of all frames form a matrix; perform dimensionality reduction on this matrix and compute the initialized cluster centers:
where C_n denotes the cluster center of the n-th segment, f_n the feature vector of the n-th frame, and f_{n+1} the feature vector of frame n+1.
Compute the similarity of each new frame to the current cluster center and specify a threshold σ. If the similarity is greater than the threshold, f_n is judged to belong to the cluster center C_n; f_n is then added to C_n and the cluster center is updated to a new center C_{n'}:
where f_n denotes the feature vector of the n-th frame, C_n the cluster center of the n-th segment, and C_{n'} the updated cluster center.
If the similarity is less than the threshold, f_n is judged to belong to a new cluster center, and f_n is used to initialize the new cluster center C_{n'}:
C_{n'} = f_n
Recognize the face contour and remove the background; crop the lip region of the face in each video frame. Locate the contour points of the facial features, including the coordinates of the nose tip, of the leftmost point of the lips, of the rightmost point of the lips, and of the mouth center, and crop an image containing the lip details according to these coordinates. The crop size is calculated by the following formula:
where L_MN denotes the distance between the nose-tip coordinates and the mouth-center coordinates, x_right and y_right the abscissa and ordinate of the rightmost lip feature point, and x_left and y_left the abscissa and ordinate of the leftmost lip feature point.
Correct the deviation of the cropped lip image and train a convolutional-neural-network binary classifier on the lip images to judge whether the extracted lip image is valid:
where l denotes the convolutional layer index, k the convolution kernel, b the convolution bias, M_j the local receptive field of the input, β the output parameter, and down() the pooling function.
Next, feature points are extracted from the lip video frames, the feature points of adjacent frames are matched, and their position coordinates are recorded.
For the extracted cropped images, build a D3D model to accelerate network convergence and introduce a loss function to correct the model:
where the loss term is the cross-entropy loss, {y_i = k} is an indicator function, logit(pre) is the network output probability, and σ is a scale coefficient;
and P({Z|X}) = Σ_{k=1} P(π|X), i.e. the sum of the probabilities of all paths after merging.
Extract feature points from the images of two adjacent frames to obtain two sets of feature points:
p = {p_1, p_2, p_3, …, p_n}
p′ = {p_1′, p_2′, p_3′, …, p_n′}
Taking each feature point in the two adjacent sets as a center, use the pixel values of a window W in its neighborhood as the descriptor of that feature point, and compute the pixel interpolation of the neighborhood of each set of feature points:
where S denotes the pixel interpolation over the two feature-point neighborhoods, x and y the abscissa and ordinate of a pixel, W the neighborhood window serving as the descriptor in this formula, p a point in the previous frame, and p′ a point in the following frame.
Using the pixel interpolation obtained above, find matching points according to the matching coefficient between a feature point and the neighborhood window:
where G denotes the gray value of the previous frame, G′ the gray value of the following frame, and C the matching coefficient; the remaining symbols are as above.
Next, the frame-difference method is used to extract the change in feature-point positions, i.e. the algebraic features of the lip motion. Record the images of three adjacent frames, denoted f(n+1), f(n) and f(n-1), with corresponding gray values G(n+1)_{x,y}, G(n)_{x,y} and G(n-1)_{x,y}, and obtain the image P′ by the frame-difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}| ∩ |G(n)_{x,y} − G(n-1)_{x,y}|
Compare the image P′ with a preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
where N denotes the total number of pixels in the region to be detected, τ the illumination suppression coefficient, A the complete-frame image, and T the threshold.
Step 4. Match the face in the database: on multi-user terminals such as safes and door locks, face recognition is required to check whether the user's face exists in the database; on single-user private terminals such as mobile phones and tablets, face recognition is not required and face verification is sufficient. A FaceNet network is used to compute the Euclidean distance between face features, which is compared against a threshold:
where the three terms denote the positive sample pair, the negative sample pair and the flat sample pair respectively, α is the constraint range between positive and negative sample pairs, and Φ is the set of triplets;
Introduce the neuron model:
h_{W,b}(x) = f(W^T x)
where W denotes the weight vector of the neuron, W^T x the nonlinear transformation of the input vector x, and f(W^T x) the activation-function transformation of that weighted input;
Assign the input vector x the components x_i and substitute into W^T x:
where n denotes the number of levels of the neural network and b the bias.
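As a small illustration of the neuron model h_{W,b}(x) = f(W^T x) with bias b, assuming a sigmoid activation (the source does not name the activation function f):

```python
import numpy as np

def neuron(x, w, b):
    """h_{W,b}(x) = f(W^T x + b) with a sigmoid activation (assumed choice of f)."""
    z = np.dot(w, x) + b          # weighted sum of the input components x_i plus bias
    return 1.0 / (1.0 + np.exp(-z))

# usage: neuron(np.array([0.2, 0.5, 0.1]), np.array([0.7, -0.3, 1.2]), b=0.05)
```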
Step 5. If the match succeeds, the person to be recognized performs the same lip-language instruction action in front of the terminal camera; the terminal again extracts lip feature points, computes the algebraic features of the lip motion, and checks whether the action matches the unlocking instruction. During acquisition, a coordinate system is established with the lip center as the origin, and the inner-lip region in the lip grayscale image is fitted as a combination of two half-ellipses, the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse; the frame-difference method is used to extract the change of the corresponding feature-point positions, i.e. the algebraic features of inter-frame lip motion:
Record the images of two adjacent frames, denoted f(n+1) and f(n), with corresponding gray values G(n+1)_{x,y} and G(n)_{x,y}, and obtain the image P′ by the frame-difference method:
P′ = |G(n+1)_{x,y} − G(n)_{x,y}|
Compare the image P′ with the preset threshold T to analyze the motion and extract the moving target, with the comparison condition:
where N denotes the total number of pixels in the region to be detected, τ the illumination suppression coefficient, A the complete-frame image, and T the threshold.
If the face match or the instruction match fails, a matching failure is reported and face matching against the database continues; the above steps are repeated, and after more than three consecutive failures the terminal device is temporarily locked.
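A control-flow sketch of this unlocking loop, with hypothetical capture_frames, match_face, match_instruction and lock_terminal helpers standing in for the steps described above:

```python
def try_unlock(capture_frames, match_face, match_instruction, lock_terminal, max_failures=3):
    """Repeat face matching and instruction matching; temporarily lock the terminal
    after repeated failures. All helper callables are hypothetical placeholders."""
    failures = 0
    while failures < max_failures:
        frames = capture_frames()                 # lip-language instruction video frames
        if match_face(frames) and match_instruction(frames):
            return True                           # unlock
        failures += 1
        print("Matching failed, please try again")
    lock_terminal()                               # temporary lockout after repeated failures
    return False
```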
In summary, in view of the deficiencies of the prior art, the invention proposes a terminal unlocking method based on lip-language instructions. During acquisition, several frames of video are taken to obtain the face, and a set of key feature points is extracted. During verification, the key feature points of the face to be recognized are extracted in the same way, a FaceNet network computes the Euclidean distance between face features, and the result is compared against a threshold. During acquisition, a coordinate system is established with the lip center as the origin, the inner-lip region of the lip grayscale image is fitted as a combination of two half-ellipses (the upper inner lip corresponding to the upper ellipse and the lower inner lip to the lower ellipse), the frame-difference method is used to extract the change of the corresponding feature-point positions, i.e. the algebraic features of inter-frame lip motion, and a decision threshold is computed. During verification, the lip-motion features are extracted in the same way and compared. By reducing the dimensionality of the matrix, extracting feature points, initializing cluster centers, and computing the Euclidean distance of face features with a FaceNet network, the method avoids the problem of excessive gradients caused by samples piling up in one quadrant of the feature space, improves the efficiency of network learning and training, achieves the effect of actively learning the training model, and solves the problem that traditional fixed instruction actions are easily exposed.
As described above, although the invention has been shown and described with reference to specific preferred embodiments, this should not be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911045860.6A CN110929239B (en) | 2019-10-30 | 2019-10-30 | Terminal unlocking method based on lip language instruction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911045860.6A CN110929239B (en) | 2019-10-30 | 2019-10-30 | Terminal unlocking method based on lip language instruction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110929239A CN110929239A (en) | 2020-03-27 |
CN110929239B true CN110929239B (en) | 2021-11-19 |
Family
ID=69849882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911045860.6A Active CN110929239B (en) | 2019-10-30 | 2019-10-30 | Terminal unlocking method based on lip language instruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110929239B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733807A (en) * | 2021-02-22 | 2021-04-30 | 佳都新太科技股份有限公司 | Face comparison graph convolution neural network training method and device |
CN114220142B (en) * | 2021-11-24 | 2022-08-23 | 慧之安信息技术股份有限公司 | Face feature recognition method of deep learning algorithm |
CN114220177B (en) * | 2021-12-24 | 2024-06-25 | 湖南大学 | Lip syllable recognition method, device, equipment and medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016061780A1 (en) * | 2014-10-23 | 2016-04-28 | Intel Corporation | Method and system of facial expression recognition using linear relationships within landmark subsets |
WO2016201679A1 (en) * | 2015-06-18 | 2016-12-22 | 华为技术有限公司 | Feature extraction method, lip-reading classification method, device and apparatus |
CN106570461A (en) * | 2016-10-21 | 2017-04-19 | 哈尔滨工业大学深圳研究生院 | Video frame image extraction method and system based on lip movement identification |
KR101767234B1 (en) * | 2016-03-21 | 2017-08-10 | 양장은 | System based on pattern recognition of blood vessel in lips |
CN107358085A (en) * | 2017-07-28 | 2017-11-17 | 惠州Tcl移动通信有限公司 | A kind of unlocking terminal equipment method, storage medium and terminal device |
CN108960103A (en) * | 2018-06-25 | 2018-12-07 | 西安交通大学 | The identity identifying method and system that a kind of face and lip reading blend |
CN109409195A (en) * | 2018-08-30 | 2019-03-01 | 华侨大学 | A kind of lip reading recognition methods neural network based and system |
CN110276230A (en) * | 2018-03-14 | 2019-09-24 | 阿里巴巴集团控股有限公司 | The method, apparatus and electronic equipment that user authentication, lip reading identify |
CN110276259A (en) * | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9159321B2 (en) * | 2012-02-27 | 2015-10-13 | Hong Kong Baptist University | Lip-password based speaker verification system |
US20150279364A1 (en) * | 2014-03-29 | 2015-10-01 | Ajay Krishnan | Mouth-Phoneme Model for Computerized Lip Reading |
-
2019
- 2019-10-30 CN CN201911045860.6A patent/CN110929239B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016061780A1 (en) * | 2014-10-23 | 2016-04-28 | Intel Corporation | Method and system of facial expression recognition using linear relationships within landmark subsets |
WO2016201679A1 (en) * | 2015-06-18 | 2016-12-22 | 华为技术有限公司 | Feature extraction method, lip-reading classification method, device and apparatus |
KR101767234B1 (en) * | 2016-03-21 | 2017-08-10 | 양장은 | System based on pattern recognition of blood vessel in lips |
CN106570461A (en) * | 2016-10-21 | 2017-04-19 | 哈尔滨工业大学深圳研究生院 | Video frame image extraction method and system based on lip movement identification |
CN107358085A (en) * | 2017-07-28 | 2017-11-17 | 惠州Tcl移动通信有限公司 | A kind of unlocking terminal equipment method, storage medium and terminal device |
CN110276230A (en) * | 2018-03-14 | 2019-09-24 | 阿里巴巴集团控股有限公司 | The method, apparatus and electronic equipment that user authentication, lip reading identify |
CN108960103A (en) * | 2018-06-25 | 2018-12-07 | 西安交通大学 | The identity identifying method and system that a kind of face and lip reading blend |
CN109409195A (en) * | 2018-08-30 | 2019-03-01 | 华侨大学 | A kind of lip reading recognition methods neural network based and system |
CN110276259A (en) * | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium |
Non-Patent Citations (6)
Title |
---|
Kevin Lai; "A Large-Scale Hierarchical Multi-View RGB-D Object Dataset"; Proceedings of the IEEE International Conference on Robotics and Automation, May 2011; 2014-05-28; full text *
Florian Schroff; "FaceNet: A Unified Embedding for Face Recognition and Clustering"; 2015 IEEE Conference on Computer Vision and Pattern Recognition; 2015-10-15; full text *
Chen Chunyu; "Video segmentation algorithm based on frame difference and edge detection"; Journal of University of Jinan; 2012-03-16; Vol. 26, No. 1; pp. 31-36 *
weixin_30596343; "Voiceprint recognition (speaker recognition)"; https://blog.csdn.net/weixin_30596343/details/99112089; 2018-07-26; full text *
Qiu Daoyin; "Application of the frame difference method to real-time tracking of moving targets"; Journal of North China Institute of Water Conservancy and Hydroelectric Power; 2010-01-11; Vol. 30, No. 3; pp. 45-46, 64 *
Hao Huifen; "Research on key technologies of video shot segmentation and key frame extraction"; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15; Chapters 2-4 *
Also Published As
Publication number | Publication date |
---|---|
CN110929239A (en) | 2020-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Cosface: Large margin cosine loss for deep face recognition | |
CN111368683B (en) | Face Image Feature Extraction Method and Face Recognition Method Based on Modular Constraint CenterFace | |
CN102194131B (en) | Fast human face recognition method based on geometric proportion characteristic of five sense organs | |
US8655029B2 (en) | Hash-based face recognition system | |
CN108268859A (en) | A kind of facial expression recognizing method based on deep learning | |
CN107330397B (en) | A Pedestrian Re-identification Method Based on Large-Interval Relative Distance Metric Learning | |
US11594074B2 (en) | Continuously evolving and interactive Disguised Face Identification (DFI) with facial key points using ScatterNet Hybrid Deep Learning (SHDL) network | |
CN111027464B (en) | Iris Recognition Method Jointly Optimized for Convolutional Neural Network and Sequential Feature Coding | |
CN110929239B (en) | Terminal unlocking method based on lip language instruction | |
US10885171B2 (en) | Authentication verification using soft biometric traits | |
CN107977609A (en) | A kind of finger vein identity verification method based on CNN | |
CN107729820B (en) | A Finger Vein Recognition Method Based on Multi-scale HOG | |
CN107103281A (en) | Face identification method based on aggregation Damage degree metric learning | |
CN106355138A (en) | Face recognition method based on deep learning and key features extraction | |
CN108446601A (en) | A kind of face identification method based on sound Fusion Features | |
CN109344856B (en) | Offline signature identification method based on multilayer discriminant feature learning | |
CN110659586B (en) | Gait recognition method based on identity-preserving cyclic generation type confrontation network | |
CN106650606A (en) | Matching and processing method of face image and face image model construction system | |
CN111898533B (en) | A gait classification method based on spatiotemporal feature fusion | |
CN112464730A (en) | Pedestrian re-identification method based on domain-independent foreground feature learning | |
CN110119695A (en) | A kind of iris activity test method based on Fusion Features and machine learning | |
CN111832405A (en) | A face recognition method based on HOG and deep residual network | |
CN110555386A (en) | Face recognition identity authentication method based on dynamic Bayes | |
CN110880010A (en) | Visual SLAM closed loop detection algorithm based on convolutional neural network | |
CN110633655A (en) | Attention-attack face recognition attack algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
CB02 | Change of applicant information |
Address after: 211000 floor 3, building 3, Qilin artificial intelligence Industrial Park, 266 Chuangyan Road, Nanjing, Jiangsu
Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute
Applicant after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Address before: 211000 3rd floor, building 3, 266 Chuangyan Road, Jiangning District, Nanjing City, Jiangsu Province
Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Applicant before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
|
GR01 | Patent grant | ||
GR01 | Patent grant |