WO2015078007A1 - Fast face alignment method - Google Patents

Fast face alignment method

Info

Publication number
WO2015078007A1
WO2015078007A1 PCT/CN2013/088224 CN2013088224W WO2015078007A1 WO 2015078007 A1 WO2015078007 A1 WO 2015078007A1 CN 2013088224 W CN2013088224 W CN 2013088224W WO 2015078007 A1 WO2015078007 A1 WO 2015078007A1
Authority
WO
WIPO (PCT)
Prior art keywords
regression
feature
sample
face
human face
Prior art date
Application number
PCT/CN2013/088224
Other languages
English (en)
French (fr)
Inventor
徐勇
钟左峰
Original Assignee
徐勇
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 徐勇
Priority to PCT/CN2013/088224
Publication of WO2015078007A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Definitions

  • The present invention relates to the field of general image data processing, and in particular to a fast face alignment method.
  • With the development of technology, face recognition has gradually become one of the hot topics in biometrics. It has broad application prospects in video surveillance, information security, and social networking. With the introduction of the Safe City concept, the role of face recognition has become more prominent still.
  • Existing face recognition methods, however, all work on digital images. These face images are usually acquired in unconstrained environments and are affected by illumination, expression, and pose. If the face alignment step is missing or its result is coarse, the recognition algorithm itself must adapt well to illumination, expression, and pose; in the feature extraction stage in particular, it must extract locally invariant features. This increases the difficulty and complexity of the recognition algorithm and reduces its generality. A coarse alignment result can even interfere with the recognition algorithm and lower its accuracy.
  • Existing face alignment methods fall into two main categories.
  • Parameter optimization methods convert the problem of converging the estimated face shape to the true shape during alignment into the problem of solving an objective function that is easy to optimize.
  • The AAM model is one such method: it builds the whole face shape with a shape model and estimates the face shape by minimizing the residual error.
  • However, the learned model does not extend well, and it is difficult to estimate the face shape accurately for face images affected by pose, expression, and illumination.
  • The AAM model also depends on parameter initialization, which limits its applicability.
  • Shape regression methods learn a regression function that maps the face shape directly to the target result.
  • A large number of training samples can train a good objective function.
  • Another form of shape regression uses only a single face shape landmark during learning. This learns only local features and ignores the global correlation of the whole face, so the learned regressor performs poorly and does not generalize well. Since most real-world face images are collected in non-ideal environments, they are affected by the following factors:
  • Illumination: face images acquired under non-uniform illumination show strong contrast between light and dark regions of the face.
  • After grayscale conversion, the dark regions have unclear contours.
  • The technical problem to be solved by the present invention is to provide a method that adapts well and computes quickly, specifically a method that uses a cascaded (series) regression structure and a fast feature selection method to align the face shape in a face image quickly.
  • The method performs fast face shape regression while preserving the shape constraint of the face.
  • The technical solution adopted by the present invention is a fast face alignment method comprising a regressor training module and a regressor application module.
  • The regressor training module includes the following steps: Step 1: for face image samples under various conditions, extract the true shape landmarks of the face as training samples, then initialize the face shape landmarks for the training samples; the face shape is labeled by manually marking the main feature points of the eyes, nose, and mouth in the face image;
  • Step 2: perform feature extraction on the training samples;
  • Step 3: compute the regression target of each training sample, that is, the absolute value of the difference between the true face shape and the currently estimated face shape;
  • Step 4: learn the regressor;
  • Step 5: update the current face shape with the regressor output;
  • Step 6: compute the l2 norm of the current regression target of the training samples and compare it with the threshold;
  • Step 7: train a set of regressors in sequence so that they form a cascade, and record the positions in the cascade at which the regressors require feature computation.
  • The regressor application module includes the following steps:
  • Step 10: input the test sample into the trained set of regressors;
  • Step 20: extract features from the test sample as in Step 2;
  • Step 30: compare the test sample features with the thresholds generated when each regressor was trained, and assign the sample to one of that regressor's existing bins;
  • Step 40: for the samples assigned to each bin, update the estimated face shape by adding the bin's average regression target to the current face shape of the sample;
  • Step 50: perform feature extraction at the regressor positions determined in Step 7; if feature extraction is required, return to Step 20, otherwise return to Step 30, until all regressors have been evaluated;
  • Step 60: add each regressor's test result to the current face shape of the test sample to form the final estimated face shape.
  • The feature extraction method of Step 2 is as follows:
  • Step 21: select the coordinates of each face shape landmark in order;
  • Step 22: randomly generate local coordinates within a local neighborhood of each face shape landmark;
  • Step 23: add each local coordinate to the coordinates of its landmark to obtain a feature point, and read its pixel value;
  • Step 24: take the pairwise differences of the pixel values of all extracted feature points as the features.
  • In Step 6, the l2 norm is compared with the threshold as follows: when the l2 norm is less than the threshold, return to Step 3; when it is greater than the threshold, return to Step 2 and re-extract the pixel-difference features.
  • The maximum-correlation feature selection method is applied repeatedly to select features and to extract multiple thresholds.
  • The maximum-correlation feature selection method is as follows: compute the correlation coefficient between each pixel-difference feature and the random projection of the regression target, and take the feature with the largest correlation coefficient as the selected feature.
  • The random projection of the regression target is the inner product of the regression target vector and a randomly generated projection vector.
  • In Step 4 the regressors are learned as follows: the pixel-difference features extracted from each test sample are all input into each regressor, and each regressor in the cascade is trained in turn with the regression target until the required number of regressors has been trained.
  • Regressor learning means that the pixel-difference features extracted from each test sample are all input into each regressor, and each regressor is then trained in turn, following the cascade structure and the regression target, until the required number of regressors has been trained.
  • The threshold lies in the range [70, 40].
  • The regressor positions that require feature extraction, referred to in Step 60, are used to divide the cascade into several small sub-structures that need no feature extraction of their own; each sub-structure shares the same features.
  • The regressor output in Step 5 is the sum of the current face shape of each training sample and each regressor's training result.
  • The beneficial effects of the present invention are as follows.
  • The method is parameter-free: neither training nor testing of the regressors requires parameters to be set or tuned, which greatly increases the flexibility of the model. At test time the method preserves the face shape constraint well and regresses the face shape quickly and accurately. The algorithm has low complexity; training and testing both use pixel-difference features, so the computation is small. The method generalizes well: a sufficiently trained regression model adapts to face images under a wide range of conditions, saves a large amount of running time, and improves the efficiency of subsequent recognition, tracking, and similar methods.
  • FIG. 1 is a framework diagram of the fast face alignment method of the present invention.
  • FIG. 7 is a flow chart of the test phase of the fast face alignment method of the present invention.
  • The method uses a cascaded regression structure and a fast feature selection method to align the face shape in a face image quickly.
  • The system is divided into two phases.
  • In the learning and training phase, the annotated true face shape samples are first given an initial face shape; the pixel-difference features computed from each sample's current face shape, together with the regression targets, are then used to train multiple regressors, which are assembled into a cascade. Finally, the current face shape of each sample is added to each regressor's training result to give the final output.
  • In the test phase, the test sample is first given an arbitrary initial face shape; the pixel-difference features computed from the current face shape, together with the regression target, are input into the trained regressors, and the regressor results are added to the shape to give the final estimated face shape.
  • The method obtains the face shape quickly and accurately for face images under various conditions (illumination, expression, pose).
  • The method is a fast face alignment method comprising a regressor training module 100 and a regressor application module 200.
  • The regressor training module 100 carries out training. Its input is the training samples of the training set, each with a manually produced true face shape, and it initializes a face shape for each training sample. One regressor at a time is trained on all samples until the specified number of regressors has been trained. When each regressor is trained, pixel-difference features are generated for every sample, and the N features whose correlation coefficient with the projected regression target is largest are selected. Here the regression target is the absolute value of the difference between the current face shape and the true face shape.
  • Training a regressor amounts to dividing the sample space into several small bins with random thresholds.
  • The average regression target of each bin is used as the increment that updates the estimated face shape of the samples in that bin.
  • Each fully trained regressor therefore consists of three parts: N random thresholds, the indices of the points from which features are extracted, and the average regression target of each bin.
  • The regressor training module 100 includes the following steps:
  • Step 1: for face image samples under various conditions, extract the true shape landmarks of the face as training samples.
  • Face image samples under various conditions are collected first; the sample set should contain at least ten thousand images.
  • The samples should cover a sufficiently rich range of conditions: face images with varied expressions and poses, captured under varied illumination and in varied scenes, both indoors and outdoors, so that the regressors can learn fully.
  • The true shape landmarks are extracted manually from the collected face samples. The face shape is labeled by manually marking the main feature points of the eyes, nose, and mouth in the face image; FIG. 2 shows an example with 7 points marked on the eyes, nose, and mouth for use in training.
  • The face shape landmarks are then initialized for the training samples; the initialization includes performing on the same sample …
  • Step 2: perform feature extraction on the training samples: randomly sample pixels in the local neighborhood of each landmark of the currently estimated face shape of each training sample, then take the pairwise differences of the sampled pixel values as candidate features;
  • Step 3: compute the regression target of each training sample (the absolute value of the difference between the true face shape and the currently estimated face shape);
  • Step 4: learn the regressors and assemble them into a cascade. The pixel-difference features extracted from each test sample are all input into each regressor, and each regressor in the cascade is trained in turn with the regression target until the required number of regressors has been trained.
  • Each regressor trained with the extracted pixel-difference features and the regression targets consists of three parts: the N random thresholds obtained, the indices of the feature points used, and the average regression target of each bin.
  • Training a single regressor generates K random thresholds, one for each of the K features of the input samples, and uses these thresholds to divide the samples into 2^K bins.
  • The method builds a binary decision tree from the K random thresholds.
  • The decision tree has K layers, and all nodes in a layer share the same threshold. At each node, a feature value greater than the threshold sends the sample to the right subtree, and a value less than the threshold sends it to the left subtree. This divides the samples into 2^K bins.
  • The average regression target of the samples in each bin is then computed. For the samples assigned to each bin, the estimated face shape is updated by adding the bin's average regression target to the sample's current face shape. Finally, the l2 norm of the updated regression target is computed.
  • If the norm is greater than the threshold, features are extracted and selected again before the regression target is computed; if it is less, this is unnecessary and the regression target is computed directly.
  • This completes the training of one regressor.
  • The current face shape after this training round is passed to the next regressor, which is trained in the same way. This greatly reduces the amount of feature extraction, because not every regressor needs to extract features: during fine refinement the estimated shape position does not deviate greatly. This speeds up the algorithm and also ensures that alignment proceeds from coarse to fine.
  • The specific process is as follows, where $p_m$ and $p_n$ are the pixel values of two feature points and $m_f$, $n_f$ are the indices of the feature extraction points.
  • Step 3: divide the training samples into $2^K$ bins $\Omega_b$.
  • Step 4: compute the average regression target of the samples in each bin.
  • Step 5: construct the trained regressor $r = \{\{m_f, n_f\},\ \{T_f\},\ \{\bar{y}_b\}\}$.
  • Step 6: update the regression target.
  • Step 5: update the current face shape with the regressor output, that is, add the regressor output to the current face shape vector;
  • Step 6: compute the l2 norm of the current regression target of the training samples and compare it with the threshold; when the norm is less than the threshold, return to Step 3; when the norm is greater than the threshold, return to Step 2 and re-extract the pixel-difference features.
  • This finally yields a set of trained regressors and the positions at which the regressors require feature computation. The l2 norm is defined as the Euclidean distance between two points, that is, the square root of the sum of squared differences between the two point vectors.
  • In this method it is the distance between the current face shape vector and the true face shape vector. Once all samples have trained one regressor, the current estimated face shape is updated and passed to the next regressor for training.
  • To speed up training, features need not be extracted in every training round, because during fine refinement the current face shape does not deviate greatly from the true face shape. To decide whether features must be extracted, the method takes the l2 norm of the current face regression target and compares it with the threshold. A norm greater than the threshold indicates a large alignment error, so new features are extracted; when the norm is less than the threshold, no features are extracted. This saves a great deal of training time.
  • Step 7: train a set of regressors in sequence so that they form a cascade, and record the positions at which the regressors require feature computation.
  • The cascade is divided according to the positions of the regressors that require feature extraction, giving several small sub-structures that need no feature extraction of their own.
  • The cascade is divided at the positions of the regressors that require feature extraction. Because the threshold decides whether training the next regressor requires feature extraction, some regressors in the overall cascade extract features and some do not, and those that do mark the divisions of the structure.
  • The threshold is set empirically from experiments. Different initializations lead to different thresholds, but the value generally lies within the range [70, 40]. For example, when the method uses the mean of the true face shapes of all samples as the initial face shape, the threshold used is 50.
  • The regressor application module 200 is the test phase: starting from the initialized face shape, it adds the regressor outputs to the currently estimated face shape, and includes the following steps:
  • Step 10: input the test sample into the trained set of regressors; the face shape of the test sample is initialized arbitrarily;
  • Step 20: extract features from the test sample as in Step 2;
  • Step 30: compare the test sample features with the threshold of each regressor's l2 norm;
  • Step 40: for the samples assigned to each bin, update the estimated face shape by adding the bin's average regression target to the current face shape of the sample;
  • Step 50: perform feature extraction at the regressor positions determined in Step 7; if the features differ, return to Step 20, or else return to Step 30, until all regressors have been evaluated. Once training is complete, the positions at which regressors require feature extraction are fixed, so testing simply extracts features at those positions;
  • Step 60: add each regressor's test result to the current face shape of the test sample to form the final estimated face shape.
  • To increase computation speed, the method uses pixel differences as features, because they are convenient to extract and fast to compute.
  • Gray values of multiple pixels are sampled at random in the local neighborhood of each landmark of the current face shape, and the pairwise differences of these gray values are used as features.
  • In the case shown in FIG. 2, points are sampled at random within an 8-pixel neighborhood of each of the 7 landmarks, 400 points in total. These 400 points form 160000 pixel-difference features.
  • The feature extraction method of Step 2 is as follows:
  • Step 21: select the coordinates of each face shape landmark in order;
  • Step 22: randomly generate local coordinates within a local neighborhood of each face shape landmark;
  • Step 23: add each local coordinate to the coordinates of its landmark to obtain a feature point, and read its pixel value;
  • Step 24: take the pairwise differences of the pixel values of all extracted feature points as the features.
  • To train good regressors, the best features must be selected from the extracted features.
  • The n-best method is commonly used for this.
  • However, that method has to search for optimal features in a huge feature space, which is computationally very expensive. The present method therefore repeatedly applies the maximum-correlation feature selection method to select features and extract multiple thresholds. It computes the correlation coefficient between each pixel-difference feature and the projected regression target, and takes the feature with the largest correlation coefficient as the selected feature.
  • The method repeats this process to select multiple features; the projection of the regression target is the inner product of the regression target vector and a randomly generated projection vector.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

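The present invention is a fast face alignment method. The method uses a cascaded (series) regression structure and a fast feature selection method to align the face shape in a face image quickly. The system is divided into two phases. In the learning and training phase, the annotated true face shape samples are first given an initial face shape; the pixel-difference features computed from each sample's current face shape, together with the regression targets, are then used to train multiple regressors, which are assembled into a cascade. Finally, the current face shape of each sample is added to each regressor's training result to give the final output. In the test phase, the test sample is first given an arbitrary initial face shape; the pixel-difference features computed from the current face shape, together with the regression target, are input into the trained regressors, and the regressor results are added to the shape to give the final estimated face shape. The method obtains the face shape quickly and accurately for face images under various conditions (illumination, expression, pose).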

Description

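Fast Face Alignment Method
Technical Field
The present invention relates to the field of general image data processing, and in particular to a fast face alignment method.
Background
With the development of technology, face recognition has gradually become one of the hot topics in biometrics. It has broad application prospects in video surveillance, information security, and social networking. With the introduction of the Safe City concept, the role of face recognition has become more prominent still. Existing face recognition methods, however, all work on digital images. These face images are usually acquired in unconstrained environments and are affected by illumination, expression, and pose. If the face alignment step is missing or its result is coarse, the recognition algorithm itself must adapt well to illumination, expression, and pose; in the feature extraction stage in particular, it must extract locally invariant features. This increases the difficulty and complexity of the recognition algorithm and reduces its generality. A coarse alignment result can even interfere with the recognition algorithm and lower its accuracy. Existing face alignment methods fall into two main categories.
(1) Methods based on parameter optimization
Parameter optimization methods convert the problem of converging the estimated face shape to the true shape during alignment into the problem of solving an objective function, one that is easy to optimize. The AAM model is one such method: it builds the whole face shape with a shape model and estimates the face shape by minimizing the residual error. However, the learned model does not extend well, and it is difficult to estimate the face shape accurately for face images affected by pose, expression, and illumination. The AAM model also depends on parameter initialization, which limits its applicability.
(2) Methods based on shape regression
Shape regression methods learn a regression function that maps the face shape directly to the target result. A large number of training samples can train a good objective function. These methods still have drawbacks: solving the function requires finding a set of minimizing parameters, but minimizing the function parameters does not mean minimizing the alignment error, so the solved function does not represent the face shape well. Another form of shape regression uses only a single face shape landmark during learning. This learns only local features and ignores the global correlation of the whole face, so the learned regressor performs poorly and does not generalize well.
Since most real-world face images are collected in non-ideal environments, they are affected by the following factors:
1. Illumination: face images acquired under non-uniform illumination show strong contrast between light and dark regions of the face. After grayscale conversion, the dark regions have unclear contours.
2. Expression: most face images captured in natural conditions show changes of expression, which shift the facial features from their standard positions.
3. Pose: most face images are captured without the active cooperation of the subject, so the pose varies, for example with the face turned to the left or right.
Besides these three main factors, several factors often act together. A good face alignment method must therefore adapt well to these factors. Existing face alignment methods either perform well at a high computational cost or compute quickly but adapt poorly.
Summary of the Invention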
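In view of the defects or deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a method that adapts well and computes quickly, specifically a method that uses a cascaded (series) regression structure and a fast feature selection method to align the face shape in a face image quickly. The method performs fast face shape regression while preserving the shape constraint of the face.
The technical solution adopted by the present invention is a fast face alignment method comprising a regressor training module and a regressor application module.
The regressor training module includes the following steps:
Step 1: for face image samples under various conditions, extract the true shape landmarks of the face as training samples, then initialize the face shape landmarks for the training samples; the face shape is labeled by manually marking the main feature points of the eyes, nose, and mouth in the face image;
Step 2: perform feature extraction on the training samples;
Step 3: compute the regression target of each training sample, that is, the absolute value of the difference between the true face shape and the currently estimated face shape;
Step 4: learn the regressor;
Step 5: update the current face shape with the regressor output;
Step 6: compute the l2 norm of the current regression target of the training samples and compare it with the threshold;
Step 7: train a set of regressors in sequence so that they form a cascade, and record the positions at which the regressors require feature computation.
As a further improvement of the present invention, the regressor application module includes the following steps:
Step 10: input the test sample into the trained set of regressors;
Step 20: extract features from the test sample as in Step 2;
Step 30: compare the test sample features with the thresholds generated when each regressor was trained, and assign the sample to one of that regressor's existing bins;
Step 40: for the samples assigned to each bin, update the estimated face shape by adding the bin's average regression target to the current face shape of the sample;
Step 50: perform feature extraction at the regressor positions determined in Step 7; if feature extraction is required, return to Step 20, otherwise return to Step 30, until all regressors have been evaluated;
Step 60: add each regressor's test result to the current face shape of the test sample to form the final estimated face shape.
As a further improvement of the present invention, the feature extraction method of Step 2 is as follows:
Step 21: select the coordinates of each face shape landmark in order;
Step 22: randomly generate local coordinates within a local neighborhood of each face shape landmark;
Step 23: add each local coordinate to the coordinates of its landmark to obtain a feature point, and read its pixel value;
Step 24: take the pairwise differences of the pixel values of all extracted feature points as the features.
As a further improvement of the present invention, in Step 6 the l2 norm is compared with the threshold as follows: when the l2 norm is less than the threshold, return to Step 3; when the l2 norm is greater than the threshold, return to Step 2 and re-extract the pixel-difference features.
As a further improvement of the present invention, the maximum-correlation feature selection method is applied repeatedly to select features and to extract multiple thresholds.
As a further improvement of the present invention, the maximum-correlation feature selection method is as follows: compute the correlation coefficient between each pixel-difference feature and the random projection of the regression target, and take the feature with the largest correlation coefficient as the selected feature. The random projection of the regression target is the inner product of the regression target vector and a randomly generated projection vector.
As a further improvement of the present invention, the regressors are learned in Step 4 as follows: the pixel-difference features extracted from each test sample are all input into each regressor, and each regressor in the cascade is trained in turn with the regression target until the required number of regressors has been trained.
As a further improvement of the present invention, regressor learning means that the pixel-difference features extracted from each test sample are all input into each regressor, and each regressor is then trained in turn, following the cascade structure and the regression target, until the required number of regressors has been trained.
As a further improvement of the present invention, the threshold lies in the range [70, 40].
As a further improvement of the present invention, the regressor positions that require feature extraction, referred to in Step 60, are used to divide the cascade into several small sub-structures that need no feature extraction of their own; each sub-structure shares the same features.
As a further improvement of the present invention, the regressor output in Step 5 is the sum of the current face shape of each training sample and each regressor's training result.
The beneficial effects of the present invention are as follows. The method is parameter-free: neither training nor testing of the regressors requires parameters to be set or tuned, which greatly increases the flexibility of the model. At test time the method preserves the face shape constraint well and regresses the face shape quickly and accurately. The algorithm has low complexity; training and testing both use pixel-difference features, so the computation is small. The method generalizes well: a sufficiently trained regression model adapts to face images under a wide range of conditions, saves a large amount of running time, and improves the efficiency of subsequent recognition, tracking, and similar methods.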
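Brief Description of the Drawings
FIG. 1 is a framework diagram of the fast face alignment method of the present invention;
FIG. 2 is an example of face shape labeling for the fast face alignment method of the present invention;
FIG. 3 is a first example of test results of the fast face alignment method of the present invention;
FIG. 4 is a second example of test results of the fast face alignment method of the present invention;
FIG. 5 is a third example of test results of the fast face alignment method of the present invention;
FIG. 6 is a fourth example of test results of the fast face alignment method of the present invention;
FIG. 7 is a flow chart of the test phase of the fast face alignment method of the present invention.
Detailed Description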
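The present invention is further described below with reference to the drawings and specific embodiments.
As shown in FIG. 1, the method uses a cascaded regression structure and a fast feature selection method to align the face shape in a face image quickly. The system is divided into two phases. In the learning and training phase, the annotated true face shape samples are first given an initial face shape; the pixel-difference features computed from each sample's current face shape, together with the regression targets, are then used to train multiple regressors, which are assembled into a cascade. Finally, the current face shape of each sample is added to each regressor's training result to give the final output. In the test phase, the test sample is first given an arbitrary initial face shape; the pixel-difference features computed from the current face shape, together with the regression target, are input into the trained regressors, and the regressor results are added to the shape to give the final estimated face shape. The method obtains the face shape quickly and accurately for face images under various conditions (illumination, expression, pose).
The method is a fast face alignment method comprising a regressor training module 100 and a regressor application module 200. The regressor training module 100 carries out training. Its input is the training samples of the training set, each with a manually produced true face shape, and it initializes a face shape for each training sample. One regressor at a time is trained on all samples until the specified number of regressors has been trained. When each regressor is trained, pixel-difference features are generated for every sample, and the N features whose correlation coefficient with the projected regression target is largest are selected. Here the regression target is the absolute value of the difference between the current face shape and the true face shape. Training a regressor amounts to dividing the sample space into several small bins with random thresholds; the average regression target of each bin is used as the increment that updates the estimated face shape of the samples in that bin. Each fully trained regressor therefore consists of three parts: N random thresholds, the indices of the points from which features are extracted, and the average regression target of each bin.
The regressor training module 100 includes the following steps:
Step 1: for face image samples under various conditions, extract the true shape landmarks of the face as training samples. Face image samples under various conditions are collected first; the sample set should contain at least ten thousand images. The samples should cover a sufficiently rich range of conditions: face images with varied expressions and poses, captured under varied illumination and in varied scenes, both indoors and outdoors, so that the regressors can learn fully. The true shape landmarks are extracted manually from the collected face samples. The face shape is labeled by manually marking the main feature points of the eyes, nose, and mouth in the face image; FIG. 2 shows an example with 7 points marked on the eyes, nose, and mouth for use in training. The face shape landmarks are then initialized for the training samples; the initialization includes performing on the same sample …
Step 2: perform feature extraction on the training samples: randomly sample pixels in the local neighborhood of each landmark of the currently estimated face shape of each training sample, then take the pairwise differences of the sampled pixel values as candidate features;
Step 3: compute the regression target of each training sample (the absolute value of the difference between the true face shape and the currently estimated face shape);
Step 4: learn the regressors and assemble them into a cascade. The pixel-difference features extracted from each test sample are all input into each regressor, and each regressor in the cascade is trained in turn with the regression target until the required number of regressors has been trained. Each regressor trained with the extracted pixel-difference features and the regression targets consists of three parts: the N random thresholds obtained, the indices of the feature points used, and the average regression target of each bin.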
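Training a single regressor generates $K$ random thresholds, one for each of the $K$ features of the input samples, and uses these thresholds to divide the samples into $2^K$ bins. Here the method builds a binary decision tree from the $K$ random thresholds. The decision tree has $K$ layers, and all nodes in a layer share the same threshold. At each node, a feature value greater than the threshold sends the sample to the right subtree, and a value less than the threshold sends it to the left subtree. This divides the samples into $2^K$ bins. The average regression target of the samples in each bin is then computed. For the samples assigned to each bin, the estimated face shape is updated by adding the bin's average regression target to the sample's current face shape. Finally, the l2 norm of the updated regression target is computed. If the norm is greater than the threshold, features are extracted and selected again before the regression target is computed; if it is less, this is unnecessary and the regression target is computed directly. This completes the training of one regressor. The current face shape after this training round is passed to the next regressor, which is trained in the same way. This greatly reduces the amount of feature extraction, because not every regressor needs to extract features: during fine refinement the estimated shape position does not deviate greatly. This speeds up the algorithm and also ensures that alignment proceeds from coarse to fine.
The specific process is as follows, where $p_m$ and $p_n$ are the pixel values of two feature points and $m_f$, $n_f$ are the indices of the feature extraction points.
Step 1: the $K$ features extracted for each sample are $\{p_{m_f} - p_{n_f}\}_{f=1}^{K}$, with index pairs $\{m_f, n_f\}_{f=1}^{K}$.
Step 2: (formula image imgf000008_0001)
Step 3: divide the training samples into $2^K$ bins $\Omega_b$.
Step 4: compute the average regression target of the samples in each bin:
$$\bar{y}_b = \frac{1}{1 + \beta / |\Omega_b|} \cdot \frac{\sum_{i \in \Omega_b} y_i}{|\Omega_b|}$$
where $\beta$ is the shrinkage coefficient.
Step 5: construct the trained regressor $r = \{\{m_f, n_f\},\ \{T_f\},\ \{\bar{y}_b\}\}$.
Step 6: update the regression target. (formula image imgf000009_0001)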
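The training of a single regressor described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, thresholds are drawn uniformly inside each feature's observed range (the source only says they are random), empty bins output a zero update, and the shrinkage factor `1 / (1 + beta / |bin|)` is an assumed form for the "shrinkage coefficient" the text mentions.

```python
import random


def random_thresholds(features, rng):
    """Draw one random threshold per feature column.

    Assumption: each threshold is uniform inside the observed range of
    its feature; the source only states that thresholds are random.
    """
    num_features = len(features[0])
    thresholds = []
    for f in range(num_features):
        column = [row[f] for row in features]
        thresholds.append(rng.uniform(min(column), max(column)))
    return thresholds


def bin_index(row, thresholds):
    """Walk the K-layer tree: go right (bit 1) when a feature exceeds its layer threshold."""
    index = 0
    for value, threshold in zip(row, thresholds):
        index = 2 * index + (1 if value > threshold else 0)
    return index


def train_regressor(features, targets, thresholds, beta=0.0):
    """Return the (shrunk) average regression target of each of the 2^K bins."""
    num_bins = 2 ** len(thresholds)
    dim = len(targets[0])
    sums = [[0.0] * dim for _ in range(num_bins)]
    counts = [0] * num_bins
    for row, target in zip(features, targets):
        b = bin_index(row, thresholds)
        counts[b] += 1
        for d in range(dim):
            sums[b][d] += target[d]
    outputs = []
    for b in range(num_bins):
        if counts[b] == 0:
            outputs.append([0.0] * dim)  # empty bin: no update (assumption)
            continue
        shrink = 1.0 / (1.0 + beta / counts[b])  # assumed shrinkage form
        outputs.append([shrink * s / counts[b] for s in sums[b]])
    return outputs
```

With `K` features the regressor stores `K` thresholds and `2 ** K` bin outputs, so evaluating it on a new sample costs only `K` comparisons and one table lookup.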
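Step 5: update the current face shape with the regressor output, that is, add the regressor output to the current face shape vector;
Step 6: compute the l2 norm of the current regression target of the training samples and compare it with the threshold; when the norm is less than the threshold, return to Step 3; when the norm is greater than the threshold, return to Step 2 and re-extract the pixel-difference features. This finally yields a set of trained regressors and the positions at which the regressors require feature computation. The l2 norm is defined as the Euclidean distance between two points, that is, the square root of the sum of squared differences between the two point vectors. In this method it is the distance between the current face shape vector and the true face shape vector. Once all samples have trained one regressor, the current estimated face shape is updated and passed to the next regressor for training. To speed up training, features need not be extracted in every training round, because during fine refinement the current face shape does not deviate greatly from the true face shape. To decide whether features must be extracted, the method takes the l2 norm of the current face regression target and compares it with the threshold. A norm greater than the threshold indicates a large alignment error, so new features are extracted; when the norm is less than the threshold, no features are extracted. This saves a great deal of training time.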
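The Step 6 decision can be illustrated with a short Python sketch. The names are hypothetical, and averaging the per-sample norms over the training set is an assumption: the source does not say how the norm is aggregated across samples.

```python
import math


def l2_norm(current_shape, true_shape):
    """Euclidean distance between two shapes given as lists of (x, y) landmarks."""
    return math.sqrt(
        sum((cx - tx) ** 2 + (cy - ty) ** 2
            for (cx, cy), (tx, ty) in zip(current_shape, true_shape))
    )


def needs_feature_extraction(current_shapes, true_shapes, threshold):
    """Step 6: re-extract features (Step 2) only while the alignment error is large.

    Assumption: the per-sample norms are averaged over the training set.
    """
    norms = [l2_norm(c, t) for c, t in zip(current_shapes, true_shapes)]
    return sum(norms) / len(norms) > threshold
```

For example, with the threshold of 50 quoted in the text, a mean error of 60 triggers a new round of feature extraction while a mean error of 5 reuses the existing features.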
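Step 7: train a set of regressors in sequence so that they form a cascade, and record the positions at which the regressors require feature computation. The cascade is divided according to the positions of the regressors that require feature extraction, giving several small sub-structures that need no feature extraction of their own. The cascade is divided at the positions of the regressors that require feature extraction: because the threshold decides whether training the next regressor requires feature extraction, some regressors in the overall cascade extract features and some do not, and those that do mark the divisions of the structure. The threshold is set empirically from experiments. Different initializations lead to different thresholds, but the value generally lies within the range [70, 40]. For example, when the method uses the mean of the true face shapes of all samples as the initial face shape, the threshold used is 50.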
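The division of the cascade into sub-structures can be pictured with a small Python sketch. It assumes, hypothetically, that training leaves one boolean flag per regressor saying whether that regressor extracts new features; the function name and the flag representation are not from the source.

```python
def split_cascade(extract_flags):
    """Group regressor indices into sub-structures that share one set of features.

    A new group starts at every regressor that extracts features; the
    regressors after it reuse those features until the next extraction.
    """
    groups = []
    for index, extracts in enumerate(extract_flags):
        if extracts or not groups:
            groups.append([index])
        else:
            groups[-1].append(index)
    return groups
```

For instance, flags `[True, False, False, True, False]` give two sub-structures, regressors 0 to 2 and regressors 3 to 4, so features are computed twice rather than five times.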
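As shown in FIG. 7, the regressor application module 200 is the test phase: starting from the initialized face shape, it adds the regressor outputs to the currently estimated face shape, and includes the following steps:
Step 10: input the test sample into the trained set of regressors; the face shape of the test sample is initialized arbitrarily;
Step 20: extract features from the test sample as in Step 2;
Step 30: compare the test sample features with the threshold of each regressor's l2 norm;
Step 40: for the samples assigned to each bin, update the estimated face shape by adding the bin's average regression target to the current face shape of the sample;
Step 50: perform feature extraction at the regressor positions determined in Step 7; if the features differ, return to Step 20, or else return to Step 30, until all regressors have been evaluated. Once training is complete, the positions at which regressors require feature extraction are fixed, so testing simply extracts features at those positions;
Step 60: add each regressor's test result to the current face shape of the test sample to form the final estimated face shape.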
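The test-phase loop (Steps 10 to 60) can be sketched as follows. This is a hedged illustration only: the dictionary layout of a trained regressor (`pairs`, `thresholds`, `outputs`, `extract`) and the `extract_features` callback are hypothetical names chosen for the example, not structures defined in the source.

```python
def apply_cascade(initial_shape, regressors, extract_features):
    """Run a trained cascade on one test sample and return the final shape.

    Each regressor is assumed to be a dict with:
      'extract'    - True if features must be re-extracted first (Step 50)
      'pairs'      - (m, n) index pairs of the selected pixel differences
      'thresholds' - one threshold per pair
      'outputs'    - per-bin list of (dx, dy) updates, one per landmark
    """
    shape = [list(point) for point in initial_shape]
    values = None
    for regressor in regressors:
        if regressor["extract"] or values is None:
            values = extract_features(shape)  # Step 20
        bin_id = 0
        for (m, n), threshold in zip(regressor["pairs"], regressor["thresholds"]):
            # Step 30: compare each pixel difference with its threshold
            bin_id = 2 * bin_id + (1 if values[m] - values[n] > threshold else 0)
        for k, (dx, dy) in enumerate(regressor["outputs"][bin_id]):
            # Steps 40 and 60: accumulate the bin's update into the shape
            shape[k][0] += dx
            shape[k][1] += dy
    return [tuple(point) for point in shape]
```

Because regressors with `extract` set to `False` reuse the last pixel values, the cost of a pass is dominated by the few regressors that do extract features.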
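To increase computation speed, the method uses pixel differences as features, because they are convenient to extract and fast to compute. Gray values of multiple pixels are sampled at random in the local neighborhood of each landmark of the current face shape, and the pairwise differences of these gray values are used as features. In the case shown in FIG. 2, points are sampled at random within an 8-pixel neighborhood of each of the 7 landmarks, 400 points in total. These 400 points form 160000 pixel-difference features. The feature extraction method of Step 2 is as follows:
Step 21: select the coordinates of each face shape landmark in order;
Step 22: randomly generate local coordinates within a local neighborhood of each face shape landmark;
Step 23: add each local coordinate to the coordinates of its landmark to obtain a feature point, and read its pixel value;
Step 24: take the pairwise differences of the pixel values of all extracted feature points as the features.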
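Steps 21 to 24 can be sketched in plain Python as below. The function names are hypothetical, and several details are assumptions made for the example: the image is a list of rows of gray values, the "8-pixel range" is treated as a square of radius 8, points are assigned to landmarks in round-robin order, and sampled coordinates are clamped to the image.

```python
import random


def sample_offsets(num_landmarks, total_points, radius, rng):
    """Steps 21-22: visit the landmarks in order and draw a random local offset per point."""
    offsets = []
    for i in range(total_points):
        landmark = i % num_landmarks  # round-robin assignment (assumption)
        dx = rng.randint(-radius, radius)
        dy = rng.randint(-radius, radius)
        offsets.append((landmark, dx, dy))
    return offsets


def sample_pixels(image, shape, offsets):
    """Step 23: add each offset to its landmark and read the gray value there."""
    height, width = len(image), len(image[0])
    values = []
    for landmark, dx, dy in offsets:
        x = int(round(shape[landmark][0])) + dx
        y = int(round(shape[landmark][1])) + dy
        x = min(max(x, 0), width - 1)   # clamp to the image (assumption)
        y = min(max(y, 0), height - 1)
        values.append(image[y][x])
    return values


def pixel_differences(values):
    """Step 24: all pairwise differences p_m - p_n over ordered pairs (m, n)."""
    return [[pm - pn for pn in values] for pm in values]
```

With 400 sampled points the table has 400 × 400 = 160000 entries, which matches the feature count quoted in the text. The offsets are drawn once and reused, so the same point indices refer to the same relative positions for every sample.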
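To train good regressors, the best features must be selected from the extracted features. The n-best method is commonly used for this. However, that method has to search for optimal features in a huge feature space, which is computationally very expensive. The present method therefore repeatedly applies the maximum-correlation feature selection method to select features and extract multiple thresholds. It computes the correlation coefficient between each pixel-difference feature and the projected regression target, and takes the feature with the largest correlation coefficient as the selected feature. The method repeats this process to select multiple features; the projection of the regression target is the inner product of the regression target vector and a randomly generated projection vector.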
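Let $p_m$ and $p_n$ be the pixel values of two feature points; then $p_m - p_n$ is a pixel-difference feature. The regression target is $y_i = \hat{S}_i - S_i$, where $\hat{S}_i$ is the true face shape of sample $i$ and $S_i$ is its currently estimated face shape. Projecting the regression target onto a random direction gives $Y_p$, and the correlation coefficient between a pixel-difference feature and the projected regression target is

$$\mathrm{corr}(Y_p,\ p_m - p_n) = \frac{\mathrm{cov}(Y_p, p_m) - \mathrm{cov}(Y_p, p_n)}{\sqrt{\sigma(Y_p)\,\sigma(p_m - p_n)}}$$

where $\sigma(p_m - p_n) = \mathrm{cov}(p_m, p_m) + \mathrm{cov}(p_n, p_n) - 2\,\mathrm{cov}(p_m, p_n)$. According to this formula, each correlation coefficient requires the covariances between the projected regression target and the individual pixels of the pair, $\mathrm{cov}(Y_p, p_m)$ and $\mathrm{cov}(Y_p, p_n)$, the variances of the pixels themselves, $\mathrm{cov}(p_m, p_m)$ and $\mathrm{cov}(p_n, p_n)$, and their mutual covariance $\mathrm{cov}(p_m, p_n)$. The algorithm proceeds as follows:

Input: regression targets $\{y_i\}$, number of sampled points $P$, number of features to select $K$
Output: pixel-difference features $\{p_{m_f} - p_{n_f}\}_{f=1}^{K}$
Procedure:
for f from 1 to K
    generate a random vector $v$ and project the regression targets onto it to obtain $Y_p$
    compute the covariances of the projected regression target with the pixels, $\mathrm{cov}(Y_p, p)$, and the variance of the projected regression target, $\sigma(Y_p)$
    $m_f = 1$, $n_f = 1$
    for m from 1 to P
        for n from 1 to P
            compute the correlation coefficient $\mathrm{corr}(Y_p,\ p_m - p_n)$
            if $\mathrm{corr}(Y_p,\ p_m - p_n) > \mathrm{corr}(Y_p,\ p_{m_f} - p_{n_f})$ then $m_f = m$, $n_f = n$
return $\{p_{m_f} - p_{n_f}\}_{f=1}^{K}$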
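The selection loop above can be sketched in plain Python. This is an illustrative sketch, not the patent's code: the function names are hypothetical, the random projection vector is drawn from a standard normal distribution (the source only says it is random), and pairs with `m == n` or zero variance are skipped. The point of the decomposition is that the covariances with single pixels are computed once per projection, so each of the `P * P` candidate pairs costs only a few arithmetic operations.

```python
import math
import random


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def select_pair(projected, pixels):
    """Return ((m, n), corr) maximizing corr(Y_p, p_m - p_n).

    projected[i] is the projected target of sample i; pixels[i][m] is the
    gray value of sampled point m in sample i.
    """
    num_points = len(pixels[0])
    columns = [[row[m] for row in pixels] for m in range(num_points)]
    cov_y = [covariance(projected, column) for column in columns]
    cov_pp = [[covariance(a, b) for b in columns] for a in columns]
    var_y = covariance(projected, projected)
    best_pair, best_corr = None, -math.inf
    for m in range(num_points):
        for n in range(num_points):
            var_diff = cov_pp[m][m] + cov_pp[n][n] - 2.0 * cov_pp[m][n]
            if m == n or var_diff <= 0.0 or var_y <= 0.0:
                continue  # degenerate pair: correlation undefined
            corr = (cov_y[m] - cov_y[n]) / math.sqrt(var_y * var_diff)
            if corr > best_corr:
                best_pair, best_corr = (m, n), corr
    return best_pair, best_corr


def select_features(targets, pixels, num_features, rng):
    """Repeat the selection K times, each with a fresh random projection."""
    pairs = []
    for _ in range(num_features):
        direction = [rng.gauss(0.0, 1.0) for _ in targets[0]]  # assumed Gaussian
        projected = [sum(t * w for t, w in zip(target, direction))
                     for target in targets]
        pair, _ = select_pair(projected, pixels)
        pairs.append(pair)
    return pairs
```

A quick sanity check is to make the projected target equal to one pixel difference exactly: that pair should be selected with a correlation of 1.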
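The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the scope of protection of the present invention.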

Claims

权利要求书
1. 一种快速人脸对齐方法, 其特征在于: 包括回归器训练模块和回归器运用模 块, 所述回归器训练模块包括以下步骤: 步骤 1, 对各种情况下的人脸图片样本提取人脸的真实形状标注点作为训练 样本, 再对训练样本进行人脸形状标注点初始化; 步骤 2, 对训练样本进行特征提取; 步骤 3, 计算每个训练样本的回归目标函数; 步骤 4, 对回归器进行学习; 步骤 5, 用回归器输出的结果更新当前人脸形状; 步骤 6, 计算训练样本当前回归目标 /2范数, 用该 /2范数与阈值相比较; 步骤 7, 依次训练出一组回归器, 且回归器组成串联结构, 同时获得回归器 需要特征计算的位置。
2. 根据权利要求 1所述快速人脸对齐方法, 其特征在于: 所述回归器运用模块包括以下步骤: 步骤 10, 将测试样本输入训练好的一组回归器; 步骤 20, 测试样本依据步骤 2进行特征提取; 步骤 30, 将测试样本特征与每个回归器训练生成的阈值做比较, 将测试样 本分配到每个回归器已有的某个块中; 步骤 40, 对分配到每个块中的样本用本块的平均回归目标与本块中样本当 前人脸形状相加来更新估计人脸形状; 步骤 50, 根据步骤 7 中确定的回归器需要特征提取的位置进行特征提取, 如需要特征提取, 则返回步骤 20, 否者返回步骤 30, 直到所有的回归器计 算完成; 步骤 60, 用测试样本的当前人脸形状与每个回归器测试结果相加, 形成最 终人脸估计形状。
3. 根据权利要求 1或 2所述快速人脸对齐方法, 其特征在于: 所述特征提取 方法: ¾口下:
步骤 21 : 按顺序选取每个人脸形状标注点的坐标值;
步骤 22: 在每个人脸形状标注点的局部范围内随机生成局部坐标值; 步骤 23: 将每个局部坐标值与对应的每个人脸形状标注点坐标值相加即为 生成的特征点, 提取其像素值;
步骤 24: 将所有提取的特征点的像素值两两做差即为特征。
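4. The fast face alignment method according to claim 1, characterized in that the comparison of the L2 norm with the threshold in step 6 is specifically: when the L2 norm is less than a certain threshold, return to step 3; when the L2 norm is greater than a certain threshold, return to step 2 and re-extract the pixel-difference features.
5. The fast face alignment method according to claim 3, characterized in that feature extraction is repeated multiple times, using a feature selection method based on maximum correlation.
6. The fast face alignment method according to claim 5, characterized in that the feature selection method based on maximum correlation is as follows: compute the correlation coefficient between each pixel-difference feature and the randomly projected value of the regression target, and take the feature with the largest correlation coefficient as the selected feature.
7. The fast face alignment method according to claim 1, characterized in that the training of the regressors in step 4 is as follows: all pixel-difference features extracted from each test sample are input into each regressor, and each regressor of the cascade structure is trained in turn with the regression target function, until the required number of regressors has been trained.
8. The fast face alignment method according to claim 1, characterized in that the certain threshold is [70, 40].
9. The fast face alignment method according to claim 1, characterized in that the regressor positions requiring feature extraction in step 60 are used as the basis for partitioning the cascade structure into several small structures that do not require feature extraction, each small structure sharing the same features.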
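10. The fast face alignment method according to claim 1, characterized in that the result output by the regressors in step 5 [text garbled in source] the regressor training results added together.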
PCT/CN2013/088224 2013-11-29 2013-11-29 Fast face alignment method WO2015078007A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/088224 WO2015078007A1 (zh) 2013-11-29 2013-11-29 Fast face alignment method

Publications (1)

Publication Number Publication Date
WO2015078007A1 (zh) 2015-06-04

Family

ID=53198248

Country Status (1)

Country Link
WO (1) WO2015078007A1 (zh)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831447A (zh) * 2012-08-30 2012-12-19 Beijing Institute of Technology High-precision multi-class facial expression recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAO, XUDONG ET AL.: "Face Alignment by Explicit Shape Regression", 2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), PROVIDENCE, 21 June 2012 (2012-06-21), RI USA, pages 2887 - 2894 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188534A1 (zh) * 2017-04-14 2018-10-18 Shenzhen Sensetime Technology Co., Ltd. Face image processing method and apparatus, and electronic device
US11132824B2 (en) 2017-04-14 2021-09-28 Shenzhen Sensetime Technology Co., Ltd. Face image processing method and apparatus, and electronic device
US11250241B2 (en) 2017-04-14 2022-02-15 Shenzhen Sensetime Technology Co., Ltd. Face image processing methods and apparatuses, and electronic devices
CN109977738A (zh) * 2017-12-28 2019-07-05 Shenzhen TCL New Technology Co., Ltd. Video scene segmentation determination method, intelligent terminal and storage medium
CN112597973A (zh) * 2021-01-29 2021-04-02 秒影工场(北京)科技有限公司 Face alignment method for high-definition video based on a convolutional neural network

Similar Documents

Publication Publication Date Title
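CN107025668B (zh) Design method for a visual odometer based on a depth camera
CN106778604B (zh) Pedestrian re-identification method based on a matching convolutional neural network
CN106845357B (zh) Video face detection and recognition method based on a multi-channel network
CN109919977B (zh) Method for tracking and identifying moving persons in video based on temporal features
CN108038420B (zh) Human behavior recognition method based on depth video
WO2021051526A1 (zh) Multi-view 3D human pose estimation method and related apparatus
CN108256439A (zh) Pedestrian image generation method and system based on a cycle generative adversarial network
CN109974743B (zh) Visual odometer based on GMS feature matching and sliding-window pose graph optimization
CN110717411A (zh) Pedestrian re-identification method based on deep feature fusion
CN109635695B (zh) Pedestrian re-identification method based on a triplet convolutional neural network
CN107424161B (zh) Coarse-to-fine layout estimation method for indoor scene images
CN104517095B (zh) Human head segmentation method based on depth images
CN103854283A (zh) Mobile augmented reality tracking and registration method based on online learning
CN102982334B (zh) Sparse disparity acquisition method based on target edge features and gray-level similarity
CN112465021B (zh) Pose trajectory estimation method based on image frame interpolation
US11361534B2 (en) Method for glass detection in real scenes
WO2015078007A1 (zh) Fast face alignment method
CN111382613A (zh) Image processing method, apparatus, device and medium
CN107292272B (zh) Method and system for face recognition in real-time transmitted video
CN110390685A (zh) Feature point tracking method based on an event camera
CN104182968A (zh) Blurred moving target segmentation method for a wide-baseline multi-array optical detection system
CN115601791B (zh) Unsupervised pedestrian re-identification method based on Multiformer and outlier sample reassignment
CN110516707A (zh) Image annotation method, apparatus therefor, and storage medium
CN112767546A (zh) Method for generating a visual map from binocular images for a mobile robot
CN114187447A (zh) Semantic SLAM method based on instance segmentation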

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13898135

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 26/10/2016)

122 Ep: pct application non-entry in european phase

Ref document number: 13898135

Country of ref document: EP

Kind code of ref document: A1