CN109948560B

CN109948560B - Mobile robot target tracking system fusing bone recognition and IFace-TLD

Info

Publication number: CN109948560B
Application number: CN201910227611.2A
Authority: CN
Inventors: 苑晶; 蔡晶鑫; 高远兮
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2023-04-07
Anticipated expiration: 2039-03-25
Also published as: CN109948560A

Abstract

A mobile robot target tracking system integrating skeleton recognition and IFace-TLD comprises an original color picture of a human body and a skeleton picture of an upper limb which are obtained through a Kinect sensor, an IFace-TLD unit used for tracking and positioning a target on the color picture and a skeleton recognition unit used for tracking and positioning the target on the skeleton picture, wherein a region frame where the target is located is obtained and sent to an image target positioning unit, the image target positioning unit marks a target region on the original color picture according to the obtained region frame where the target is located, and the target region is fed back to the IFace-TLD unit. The invention effectively solves the problem of short sequence tracking, and can realize better tracking on the target face no matter the length of the tracking sequence. The invention can realize stable recognition effect even under the condition that the human face faces back to the camera. The online processing based on the bone recognition is realized, and the tracking accuracy and robustness are improved.

Description

Mobile robot target tracking system integrating skeleton recognition and IFace-TLD

技术领域Technical Field

本发明涉及一种移动机器人目标跟踪系统。特别是涉及一种融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统。The present invention relates to a mobile robot target tracking system, and more particularly to a mobile robot target tracking system integrating skeleton recognition and IFace-TLD.

背景技术Background Art

目标跟踪在安防、机器人、人机交互等领域有着广泛的应用。在实际的跟踪过程中，由于目标的快速运动、光照的变化以及遮挡等因素，实现鲁棒而高效的跟踪是一个极具挑战性的工作。Object tracking has a wide range of applications in security, robotics, human-computer interaction, etc. In the actual tracking process, due to factors such as the rapid movement of the target, changes in lighting, and occlusion, achieving robust and efficient tracking is a very challenging task.

人脸具有较大的区分性，为了达到较好的跟踪效果，我们选择对目标人脸进行跟踪。基于人脸的跟踪-检测-学习算法(Face-TLD)可以实现长时间地对人脸进行跟踪。但是，因为这是一个长跟踪算法，当跟踪序列较短时，由于Face-TLD中学习部分的训练不够充分，跟踪效果较差，甚至会发生较大的漂移。另外，在实际的应用场景中，人脸的转动有较大的随机性，不能保证人脸每时每刻都正对着摄像头，在有些情况下，目标人脸可能完全背对着摄像头，这时，基于图像外观的跟踪算法都会失效。Faces are highly distinguishable. In order to achieve better tracking results, we choose to track the target face. The face-based tracking-detection-learning algorithm (Face-TLD) can track faces for a long time. However, because this is a long tracking algorithm, when the tracking sequence is short, the learning part of Face-TLD is not fully trained, the tracking effect is poor, and even large drift may occur. In addition, in actual application scenarios, the rotation of the face is highly random, and it cannot be guaranteed that the face is facing the camera at all times. In some cases, the target face may be completely facing away from the camera. At this time, the tracking algorithm based on image appearance will fail.

人脸具有独特的生物学特性，已在许多场合得到应用。然而，大多数高精度的人脸识别算法都是很耗时的，这些耗时的算法不能应用到对实时性要求较高的移动机器人目标跟踪中。Human face has unique biological characteristics and has been used in many occasions. However, most high-precision face recognition algorithms are time-consuming, and these time-consuming algorithms cannot be applied to mobile robot target tracking, which has high real-time requirements.

基于TLD算法，Face-TLD能够长时间、鲁棒地跟踪人脸。原始的TLD是一个单目标跟踪算法，可以在视频流中长时间地跟踪一个未知的物体。原始的TLD可以被分为三个部分，即，跟踪部分、学习部分和检测部分。因为其跟踪效果良好，有很多基于此算法的改进工作。Face-TLD就是其中一个。Face-TLD将人脸检测和TLD结合起来，从而实现对人脸的长时间跟踪。在原始的Face-TLD中，检测器可以被分为两个部分，即，人脸检测部分和验证器部分。人脸检测部分对所有的图像块进行处理，验证器部分输出包含特定人脸的图像块的置信系数。然而，当跟踪序列较短时，由于学习部分的训练不够充分，Face-TLD无法达到一个令人满意的跟踪效果。具体来说，由于要处理各种不确定性，从而引入了学习部分。但是，为了保证精度，学习部分需要充分多的训练数据，同时，这是一个耗时的过程。短序列无法提供充足的图片来训练学习部分。更严重的是，在跟踪初期时，如果目标之间的外观比较相似的话，原始的Face-TLD很有可能会跟丢目标。Based on the TLD algorithm, Face-TLD can track faces for a long time and robustly. The original TLD is a single target tracking algorithm that can track an unknown object for a long time in a video stream. The original TLD can be divided into three parts, namely, the tracking part, the learning part, and the detection part. Because of its good tracking effect, there are many improvements based on this algorithm. Face-TLD is one of them. Face-TLD combines face detection and TLD to achieve long-term tracking of faces. In the original Face-TLD, the detector can be divided into two parts, namely, the face detection part and the verifier part. The face detection part processes all image blocks, and the verifier part outputs the confidence coefficient of the image block containing a specific face. However, when the tracking sequence is short, Face-TLD cannot achieve a satisfactory tracking effect due to insufficient training of the learning part. Specifically, the learning part is introduced to deal with various uncertainties. However, in order to ensure accuracy, the learning part requires sufficient training data, and this is a time-consuming process. Short sequences cannot provide enough pictures to train the learning part. What’s more serious is that in the early stage of tracking, if the targets have similar appearances, the original Face-TLD is likely to lose the target.

微软Kinect传感器可以直接采集人体骨骼信息，对人体运动较为鲁棒，即使人体完全背对着Kinect，都可以采集到较为稳定的骨骼。由于传感器的优势，基于骨骼的人体识别对光照、运动以及目标外观变化有较好的鲁棒性。但是，现有的基于骨骼的人体识别算法都是采集一定数量的数据，然后离线处理，需要手动设置训练标签，难以实现在线识别。这显然不能满足移动机器人在线跟踪的要求。Microsoft Kinect sensor can directly collect human skeleton information and is relatively robust to human movement. Even if the human body is completely facing away from Kinect, relatively stable skeletons can be collected. Due to the advantages of the sensor, skeleton-based human recognition has good robustness to changes in lighting, movement, and target appearance. However, existing skeleton-based human recognition algorithms all collect a certain amount of data and then process it offline. They need to manually set training labels, making it difficult to achieve online recognition. This obviously cannot meet the requirements of online tracking of mobile robots.

发明内容Summary of the invention

本发明所要解决的技术问题是，提供一种能够对目标人脸实现较好的跟踪，并能提高跟踪准确度以及鲁棒性的融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统。The technical problem to be solved by the present invention is to provide a mobile robot target tracking system which integrates skeleton recognition and IFace-TLD and can achieve better tracking of a target face and improve tracking accuracy and robustness.

本发明所采用的技术方案是：一种融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统，包括通过Kinect传感器获取人体的原始彩色图片和上肢的骨骼图片，分别通过用于在彩色图片上对目标进行跟踪定位的IFace-TLD单元，以及用于在骨骼图片上对目标进行跟踪定位的骨骼识别单元得到目标所在区域框，并送入图像目标定位单元，图像目标定位单元根据得到的目标所在区域框在原始彩色图片标出目标区域，并将目标区域反馈给IFace-TLD单元。The technical solution adopted by the present invention is: a mobile robot target tracking system integrating skeleton recognition and IFace-TLD, comprising obtaining an original color picture of a human body and a skeleton picture of an upper limb through a Kinect sensor, obtaining a target area frame through an IFace-TLD unit for tracking and locating the target on the color picture, and a skeleton recognition unit for tracking and locating the target on the skeleton picture, and sending the target area frame to an image target positioning unit, the image target positioning unit marks a target area in the original color picture according to the obtained target area frame, and feeds the target area back to the IFace-TLD unit.

所述IFace-TLD单元包括有分别获取原始彩色图片的跟踪部分、学习部分、检测部分，以及集成器，其中，所述的跟踪部分使用光流法跟踪器来估计获取的原始彩色图片中的目标在相邻两帧之间的运动轨迹，并分别送入学习部分和集成器；所述的检测部分对获取的第一帧原始彩色图片中所有的图像块进行独立地扫描和处理，将目标人脸和背景分离出来，并将目标人脸分别送入学习部分和集成器，对获取的第一帧之后的原始彩色图片只对图像目标定位单元所反馈的目标区域及周边进行扫描和处理，将目标人脸和背景分离出来，并将目标人脸分别送入学习部分和集成器；集成器根据得到的目标在相邻两帧之间的运动轨迹和目标人脸计算原始彩色图片中最可能包含目标位置的置信系数，将计算结果送入学习部分，以及骨骼识别单元或者图像目标定位单元；所述的学习部分根据原始彩色图片以及从跟踪部分、检测部分和集成器得到的结果进行训练，根据训练结果对跟踪部分和检测部分出现的错误进行更新和修正。The IFace-TLD unit includes a tracking part, a learning part, a detection part, and an integrator for respectively acquiring the original color picture, wherein the tracking part uses an optical flow tracker to estimate the motion trajectory of the target in the acquired original color picture between two adjacent frames, and sends the trajectory to the learning part and the integrator respectively; the detection part independently scans and processes all image blocks in the first frame of the acquired original color picture, separates the target face and the background, and sends the target face to the learning part and the integrator respectively, and scans and processes only the target area and the surrounding area fed back by the image target positioning unit for the original color picture after the first frame, separates the target face and the background, and sends the target face to the learning part and the integrator respectively; the integrator calculates the confidence coefficient of the most likely target position in the original color picture according to the obtained motion trajectory of the target and the target face between two adjacent frames, and sends the calculation result to the learning part, and the skeleton recognition unit or the image target positioning unit; the learning part is trained according to the original color picture and the results obtained from the tracking part, the detection part and the integrator, and updates and corrects the errors in the tracking part and the detection part according to the training results.

所述的检测部分包括有用于根据获取的原始彩色图片、学习部分的目标区域和图像目标定位单元所反馈的目标区域信息检测出人脸在原始彩色图片的区域的人脸检测部分，根据从人脸检测部分得到的人脸在原始彩色图片的区域识别出目标人脸区域的人脸识别部分，以及用于判断人脸识别部分识别出的目标人脸区域是否正确的验证器部分，所述验证器部分的验证结果分别送入学习部分和集成器。The detection part includes a face detection part for detecting the area of the face in the original color picture according to the acquired original color picture, the target area of the learning part and the target area information fed back by the image target positioning unit, a face recognition part for identifying the target face area according to the area of the face in the original color picture obtained from the face detection part, and a verifier part for judging whether the target face area recognized by the face recognition part is correct. The verification results of the verifier part are respectively sent to the learning part and the integrator.

所述骨骼识别单元包括有运动周期提取部分、骨骼特征提取部分和支持向量数据描述部分，其中，所述的运动周期提取部分根据获取的骨骼图片计算人体的运动周期，所述的骨骼特征提取部分是在得到的人体的运动周期内计算骨骼特征，当IFace-TLD单元中的集成器输出的结果为目标所在区域框时，骨骼特征提取部分得到的骨骼特征送入支持向量数据描述部分中的训练部分进行训练，当IFace-TLD单元中的集成器输出的结果为空时，骨骼特征提取部分得到的骨骼特征送入支持向量数据描述部分中的预测部分，所述预测部分根据训练部分的训练结果预测目标所在区域框，并将预测的目标所在区域框送入图像目标定位单元。The skeleton recognition unit includes a motion cycle extraction part, a skeleton feature extraction part and a support vector data description part, wherein the motion cycle extraction part calculates the motion cycle of the human body according to the acquired skeleton image, and the skeleton feature extraction part calculates the skeleton features within the obtained motion cycle of the human body. When the result output by the integrator in the IFace-TLD unit is the target area frame, the skeleton features obtained by the skeleton feature extraction part are sent to the training part in the support vector data description part for training. When the result output by the integrator in the IFace-TLD unit is empty, the skeleton features obtained by the skeleton feature extraction part are sent to the prediction part in the support vector data description part. The prediction part predicts the target area frame according to the training result of the training part, and sends the predicted target area frame to the image target positioning unit.

所述运动周期提取部分根据获取的骨骼图片计算人体的运动周期是采用如下公式：The motion cycle extraction part calculates the motion cycle of the human body according to the acquired skeleton image using the following formula:

其中，dist_k为在Kinect坐标系下，第k帧图像左手腕和肩膀中心之间的距离；

和

表示第k帧图像左手腕和肩膀中心的三维点坐标；N表示序列中总的图像总帧数。Where dist _k is the distance between the left wrist and shoulder center of the kth frame image in the Kinect coordinate system;

and

Represents the 3D point coordinates of the center of the left wrist and shoulder of the kth frame image; N represents the total number of image frames in the sequence.

所述骨骼特征提取部分在得到的人体的运动周期内计算骨骼特征，包括：The bone feature extraction part calculates the bone features within the obtained motion cycle of the human body, including:

首先定义步态半周期为T_w，则，基于人体上肢的骨骼特征分别表示为：First, the gait half-cycle is defined as T _w , then the skeletal features based on the upper limbs of the human body are expressed as:

轨迹特征：将肩膀中心点选为固定点，通过下式计算其他上肢骨骼点相对于固定点的相对位置，得到一个9维的特征P：Trajectory feature: The center point of the shoulder is selected as the fixed point, and the relative positions of other upper limb bone points relative to the fixed point are calculated by the following formula to obtain a 9-dimensional feature P:

其中，

表示第j个人体骨骼在步态半周期T_w中的第t帧图像的位置，

被表示为：in,

represents the position of the jth human skeleton in the tth frame image in the gait half cycle _Tw ,

is represented as:

其中，

表示相机坐标系下肩膀中心点的位置，上肢骨骼中其余坐标点的位置表示为

用P的协方差矩阵来表示包含人体行走习惯的轨迹特征矩阵F_T，定义in,

represents the position of the center point of the shoulder in the camera coordinate system, and the positions of the remaining coordinate points in the upper limb bones are expressed as

The covariance matrix of P is used to represent the trajectory feature matrix F _T containing the walking habits of human beings, and the definition is

其中，

和

分别表示测试数据和训练数据的轨迹特征矩阵；

是

和

之间的广义特征值，满足

x是相应的广义右特征向量；in,

and

Represent the trajectory feature matrices of test data and training data respectively;

yes

and

The generalized eigenvalues between

x is the corresponding generalized right eigenvector;

面积和距离特征：面积特征F_A表示人体上肢部分围成封闭区域的面积，距离特征F_D由不同人体中心之间的距离来表示，F_A表示为：Area and distance features: The area feature _FA represents the area of the closed area enclosed by the upper limbs of the human body. The distance feature _FD is represented by the distance between different human body centers. _FA is expressed as:

其中，

和

分别表示肩膀中心点、头部、左肩和右肩在步态半周期T_w中的第t帧的位置；in,

and

denote the positions of the shoulder center, head, left shoulder, and right shoulder in the tth frame in the gait half cycle Tw _, respectively;

为了计算距离特征F_D，首先计算上肢区域的三个封闭多边形的中心，三个中心点即头部中心

右手中心

和左手中心

通过以下公式计算得到：To calculate the distance feature _FD , we first calculate the centers of the three closed polygons in the upper limb area. The three center points are the center of the head.

Right hand center

and left hand center

Calculated by the following formula:

头部中心

是肩膀中心点和头部点围成多边形的中心；右手中心

是右肩点，右肘点和右手腕点围成多边形的中心；左手中心

是左肩点，左肘点和左手腕点围成多边形的中心；右手中心

分别与头部中心

和左手中心

之间的欧几里得距离f_t ^d1和f_t ^d2写为：Head Center

It is the center of the polygon formed by the center point of the shoulder and the head; the center of the right hand

It is the center of the polygon formed by the right shoulder point, right elbow point and right wrist point; the left hand center

It is the center of the polygon formed by the left shoulder point, left elbow point and left wrist point; the center of the right hand

Head center

and left hand center

The Euclidean distance between f _t ^d1 and f _t ^d2 is written as:

令

这样，整个距离特征表示为make

In this way, the entire distance feature is expressed as

其中，

和

分别表示f^di的均值、方差和最大值，其中,i＝1,2；in,

and

Respectively represent the mean, variance and maximum value of f ^di , where i = 1, 2;

静态特征：静态特征由一个5维向量F_S＝[f_h,f_lua,f_rua,f_lf,f_rf]^T来表示，其中f_h表示目标的高度，f_lua，f_rua，f_lf和f_rf分别表示左上臂长度，右上臂长度，左前臂长度和右前臂长度，具体由下各式得到：Static features: Static features are represented by a 5-dimensional vector F _S = [f _h ,f _lua ,f _rua ,f _lf ,f _rf ] ^T , where f _h represents the height of the target, and f _lua ,f _rua ,f _lf and f _rf represent the length of the left upper arm, the length of the right upper arm, the length of the left forearm and the length of the right forearm, respectively. They are obtained by the following formulas:

其中，

分别表示在相机坐标系下，肩部中心点、左肩点、左肘点、左腕点、左手点、右肩点、右肘点、右腕点和右手点的-位置；in,

Respectively represent the positions of the shoulder center point, left shoulder point, left elbow point, left wrist point, left hand point, right shoulder point, right elbow point, right wrist point and right hand point in the camera coordinate system;

频率特征和振幅特征：频率特征F_Fre是步态半周期内骨骼图像帧的数量，相邻的局部最大和局部最小之间的差值为振幅特征F_Amp；Frequency feature and amplitude feature: The frequency feature F _Fre is the number of skeleton image frames in a gait half cycle, and the difference between adjacent local maximum and local minimum is the amplitude feature F _Amp ;

最后，得到一个23维的混合特征

构成基于人体上肢的骨骼特征。Finally, we get a 23-dimensional mixed feature

The composition is based on the skeletal features of the human upper limbs.

本发明的融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统，在原Face-TLD算法上加入了基于主成分分析(PCA)的人脸识别，有效地解决了短序列跟踪问题，本发明称之为IFace-TLD。本发明，不论跟踪序列的长短，IFace-TLD都能够对目标人脸实现较好的跟踪。同时，SIFace-TLD将IFace-TLD与基于骨骼的人体识别无缝融合起来，当IFace-TLD跟踪成功时，用提取到的骨骼特征去训练支持向量数据描述(SVDD)，当其跟踪失效时，新提取的骨骼特征可以送入训练好的SVDD中进行识别。这样，即使是人脸背对着摄像头的情况下，也能够实现稳定的识别效果。不仅实现了基于骨骼识别的在线处理，还提高了跟踪的准确度以及鲁棒性。The mobile robot target tracking system of the present invention, which integrates skeleton recognition and IFace-TLD, adds face recognition based on principal component analysis (PCA) to the original Face-TLD algorithm, effectively solves the short sequence tracking problem, and the present invention is referred to as IFace-TLD. In the present invention, regardless of the length of the tracking sequence, IFace-TLD can achieve better tracking of the target face. At the same time, SIFace-TLD seamlessly integrates IFace-TLD with skeleton-based human body recognition. When IFace-TLD tracks successfully, the extracted skeleton features are used to train support vector data description (SVDD). When its tracking fails, the newly extracted skeleton features can be sent to the trained SVDD for identification. In this way, even when the face is facing away from the camera, a stable recognition effect can be achieved. Not only online processing based on skeleton recognition is achieved, but also the accuracy and robustness of tracking are improved.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统的构成框图；FIG1 is a block diagram of a mobile robot target tracking system integrating skeleton recognition and IFace-TLD according to the present invention;

图2是含有20个关节的人体骨骼示意图；Fig. 2 is a schematic diagram of a human skeleton containing 20 joints;

图3是dist_k的曲线图。FIG3 is a graph of dist _k .

图中In the figure

101：IFace-TLD单元 101.1：跟踪部分101: IFace-TLD Unit 101.1: Tracking Section

101.2：学习部分 101.3：检测部分101.2: Learning part 101.3: Detection part

101.31人脸检测部分 101.32：人脸识别部分101.31 Face detection part 101.32: Face recognition part

101.33：验证器部分 101.4：集成器101.33: Validator Section 101.4: Integrator

102：骨骼识别单元 102.1：运动周期提取部分102: Skeleton Recognition Unit 102.1: Motion Cycle Extraction Part

102.2：骨骼特征提取部分 102.3：支持向量数据描述部分102.2: Skeleton feature extraction part 102.3: Support vector data description part

102.31：训练部分 102.32：预测部分102.31: Training part 102.32: Prediction part

103：图像目标定位单元103: Image target positioning unit

具体实施方式DETAILED DESCRIPTION

下面结合实施例和附图对本发明的融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统做出详细说明。The mobile robot target tracking system integrating skeleton recognition and IFace-TLD of the present invention is described in detail below in conjunction with the embodiments and drawings.

如图1所示，本发明的融合骨骼识别和IFace-TLD的移动机器人目标跟踪系统，包括通过Kinect传感器获取人体的原始彩色图片A和上肢的骨骼图片B，分别通过用于在彩色图片A上对目标进行跟踪定位的IFace-TLD单元101，以及用于在骨骼图片B上对目标进行跟踪定位的骨骼识别单元102得到目标所在区域框，并送入图像目标定位单元103，图像目标定位单元103根据得到的目标所在区域框在原始彩色图片A中标出目标区域，并将目标区域反馈给IFace-TLD单元101。As shown in FIG1 , the mobile robot target tracking system integrating skeleton recognition and IFace-TLD of the present invention comprises obtaining an original color image A of a human body and a skeleton image B of an upper limb through a Kinect sensor, obtaining a target region frame through an IFace-TLD unit 101 for tracking and locating the target on the color image A, and a skeleton recognition unit 102 for tracking and locating the target on the skeleton image B, and sending the frame to an image target positioning unit 103. The image target positioning unit 103 marks a target region in the original color image A according to the obtained target region frame, and feeds the target region back to the IFace-TLD unit 101.

所述IFace-TLD单元101包括有分别获取原始彩色图片A的跟踪部分101.1、学习部分101.2、检测部分101.3，以及集成器101.4，其中，所述的跟踪部分101.1使用光流法跟踪器来估计获取原始彩色图片A中的目标在相邻两帧之间的运动轨迹，并分别送入学习部分101.2和集成器101.4；所述的检测部分101.3对获取的第一帧原始彩色图片A中所有的图像块进行独立地扫描和处理，将目标人脸和背景分离出来，并将目标人脸分别送入学习部分101.2和集成器101.4，对获取的第一帧之后的原始彩色图片A只对图像目标定位单元103所反馈的目标区域及周边进行扫描和处理，将目标人脸和背景分离出来，并将目标人脸分别送入学习部分101.2和集成器101.4；集成器101.4根据得到的目标在相邻两帧之间的运动轨迹和目标人脸计算原始彩色图片A中最可能包含目标位置的置信系数，将计算结果送入学习部分101.2，以及骨骼识别单元102或者图像目标定位单元103；所述的学习部分101.2根据原始彩色图片A以及从跟踪部分101.1、检测部分101.3和集成器101.4得到的结果进行训练，根据训练结果对跟踪部分101.1和检测部分101.3出现的错误进行更新和修正。The IFace-TLD unit 101 includes a tracking part 101.1, a learning part 101.2, a detection part 101.3, and an integrator 101.4 for respectively acquiring the original color image A, wherein the tracking part 101.1 uses an optical flow tracker to estimate the motion trajectory of the target in the original color image A between two adjacent frames, and sends the estimates to the learning part 101.2 and the integrator 101.4 respectively; the detection part 101.3 independently scans and processes all image blocks in the first frame of the original color image A acquired, separates the target face and the background, and sends the target face to the learning part 101.2 and the integrator 101.4 respectively, and only scans and processes the original color image A after the first frame acquired. 03, scan and process the target area and the surrounding area fed back by the tracking part 101.1, separate the target face from the background, and send the target face to the learning part 101.2 and the integrator 101.4 respectively; the integrator 101.4 calculates the confidence coefficient of the most likely target position in the original color picture A according to the obtained motion trajectory of the target between two adjacent frames and the target face, and sends the calculation result to the learning part 101.2, and the skeleton recognition unit 102 or the image target positioning unit 103; the learning part 101.2 is trained according to the original color picture A and the results obtained from the tracking part 101.1, the detection part 101.3 and the integrator 101.4, and updates and corrects the errors in the tracking part 101.1 and the detection part 101.3 according to the training results.

所述的检测部分101.3包括有用于根据获取的原始彩色图片A、学习部分101.2的目标区域和图像目标定位单元103所反馈的目标区域信息检测出人脸在原始彩色图片A的区域的人脸检测部分101.31，根据从人脸检测部分101.31得到的人脸在原始彩色图片A的区域识别出目标人脸区域的人脸识别部分101.32，以及用于判断人脸识别部分101.32识别出的目标人脸区域是否正确的验证器部分101.33，所述验证器部分101.33的验证结果分别送入学习部分101.2和集成器101.4。The detection part 101.3 includes a face detection part 101.31 for detecting the area of the face in the original color picture A based on the acquired original color picture A, the target area of the learning part 101.2 and the target area information fed back by the image target positioning unit 103, a face recognition part 101.32 for identifying the target face area in the original color picture A based on the face obtained from the face detection part 101.31, and a verifier part 101.33 for judging whether the target face area recognized by the face recognition part 101.32 is correct. The verification results of the verifier part 101.33 are respectively sent to the learning part 101.2 and the integrator 101.4.

本发明的IFace-TLD融合了Face-TLD和基于主成分分析的人脸识别。所述的Face-TLD由ZdenekKalal，Krystian Mikolajczyk以及Jiri Matas在2010年一篇名为《Face-TLD：Tracking-Learning-Detection Appliedto Faces》的文章中提出。与原始Face-TLD相比，IFace-TLD多了人脸检测部分，从而跟踪性能得到了加强。在IFace-TLD单元101中，基于主成分分析的人脸识别被添加在人脸检测的后面，从而用于区分形似的图像块。为了减少图像块的数量，提高处理速度，本发明只考虑反馈的目标区域，反馈的目标区域的大小是之前得到的包含目标人脸的包围框的两倍。人脸检测用于检测所有包含人脸的图像块，之后，人脸识别部分将不是目标的人脸滤除。最后，所有剩下的图像块被送到验证器中进一步确定是否包含目标人脸。The IFace-TLD of the present invention combines Face-TLD and face recognition based on principal component analysis. The Face-TLD was proposed by Zdenek Kalal, Krystian Mikolajczyk and Jiri Matas in an article entitled "Face-TLD: Tracking-Learning-Detection Applied to Faces" in 2010. Compared with the original Face-TLD, IFace-TLD has an additional face detection part, so that the tracking performance is enhanced. In the IFace-TLD unit 101, face recognition based on principal component analysis is added after face detection, so as to distinguish similar image blocks. In order to reduce the number of image blocks and improve the processing speed, the present invention only considers the target area of feedback, and the size of the target area of feedback is twice the size of the bounding box containing the target face obtained before. Face detection is used to detect all image blocks containing faces, and then the face recognition part filters out faces that are not the target. Finally, all the remaining image blocks are sent to the verifier to further determine whether they contain the target face.

所述骨骼识别单元102包括有运动周期提取部分102.1、骨骼特征提取部分102.2和支持向量数据描述部分102.3，其中，所述的运动周期提取部分102.1根据获取的骨骼图片B计算人体的运动周期，所述的骨骼特征提取部分102.2是在得到的人体的运动周期内计算骨骼特征，当IFace-TLD单元101中的集成器101.4输出的结果为目标所在区域框时，骨骼特征提取部分102.2得到的骨骼特征送入支持向量数据描述部分102.3中的训练部分102.31进行训练，当IFace-TLD单元101中的集成器101.4输出的结果为空时，骨骼特征提取部分102.2得到的骨骼特征送入支持向量数据描述部分102.3中的预测部分102.32，所述预测部分102.32根据训练部分102.31的训练结果预测目标所在区域框，并将预测的目标所在区域框送入图像目标定位单元103。The skeleton recognition unit 102 includes a motion cycle extraction part 102.1, a skeleton feature extraction part 102.2 and a support vector data description part 102.3, wherein the motion cycle extraction part 102.1 calculates the motion cycle of the human body according to the acquired skeleton image B, and the skeleton feature extraction part 102.2 calculates the skeleton features within the obtained motion cycle of the human body. When the result output by the integrator 101.4 in the IFace-TLD unit 101 is the target area frame, the skeleton feature extraction part 102.2 obtains The skeletal features are sent to the training part 102.31 in the support vector data description part 102.3 for training. When the result output by the integrator 101.4 in the IFace-TLD unit 101 is empty, the skeletal features obtained by the skeletal feature extraction part 102.2 are sent to the prediction part 102.32 in the support vector data description part 102.3. The prediction part 102.32 predicts the target area box according to the training result of the training part 102.31, and sends the predicted target area box to the image target positioning unit 103.

含有20个关节的人体骨骼如图2所示，图中的编号如表1所示：The human skeleton with 20 joints is shown in Figure 2, and the numbers in the figure are shown in Table 1:

表1Table 1

11 髋关节中心点Hip center 1111 右腕点Right wrist point 22 脊点Ridge Point 1212 右手点Right hand point 33 肩部中心点Shoulder center point 1313 左髋关节点Left hip joint 44 头部点Head Point 1414 左膝点Left knee point 55 左肩点Left shoulder point 1515 左脚踝点Left ankle point 66 左肘点Left elbow point 1616 左脚点Left foot point 77 左腕点Left wrist point 1717 右髋关节点Right hip joint 88 左手点Left hand point 1818 右膝点Right knee point 99 右肩点Right shoulder point 1919 右脚踝点Right ankle point 1010 右肘点Right elbow point 2020 右脚点Right foot point

本发明用上肢的十个骨骼点实现在线的人体识别，这十个骨骼点用黑色实线连接。剩下的十个骨骼点用黑色虚线连接。通过计算在Kinect坐标系下左手腕和肩膀中心之间的距离来获得步态周期。The present invention realizes online human body recognition using ten skeleton points of the upper limbs, which are connected by black solid lines. The remaining ten skeleton points are connected by black dotted lines. The gait cycle is obtained by calculating the distance between the left wrist and the center of the shoulder in the Kinect coordinate system.

所述运动周期提取部分102.1根据获取的骨骼图片B计算人体的运动周期是采用如下公式：The motion cycle extraction part 102.1 calculates the motion cycle of the human body according to the acquired skeleton image B using the following formula:

其中，dist_k为在Kinect坐标系下，第k帧图像左腕点和肩部中心点之间的距离；

和

表示第k帧图像左腕点和肩部中心点的三维点坐标；N表示序列中总的图像总帧数，dist_k曲线如图3所示。点线表示最原始的距离曲线。为了降低噪声的干扰，将原始数据进行均值滤波。滤波后的曲线由实线表示。本发明定义步态全周期是相邻局部最大(或最小)的帧数间隔。相邻局部最大和局部最小之间的帧数数目定义为步态半周期。因为骨骼特征是在步态周期内提取的，为了得到更多的骨骼特征，本发明在步态半周期内进行提取。Wherein, dist _k is the distance between the left wrist point and the shoulder center point of the kth frame image in the Kinect coordinate system;

and

Represents the three-dimensional point coordinates of the left wrist point and the shoulder center point of the k-th frame image; N represents the total number of image frames in the sequence, and the dist _k curve is shown in Figure 3. The dotted line represents the most original distance curve. In order to reduce the interference of noise, the original data is mean filtered. The filtered curve is represented by a solid line. The present invention defines the full gait cycle as the frame interval between adjacent local maximums (or minimums). The number of frames between adjacent local maximums and local minimums is defined as the gait half cycle. Because the skeletal features are extracted within the gait cycle, in order to obtain more skeletal features, the present invention extracts them within the gait half cycle.

所述骨骼特征提取部分102.2在得到的人体的运动周期内计算骨骼特征，包括：The bone feature extraction part 102.2 calculates the bone features within the obtained human body motion cycle, including:

轨迹特征：将肩部中心点选为固定点，通过下式计算其他上肢骨骼点相对于固定点的相对位置，得到一个9维的特征P：Trajectory feature: The center point of the shoulder is selected as the fixed point, and the relative positions of other upper limb bone points relative to the fixed point are calculated by the following formula to obtain a 9-dimensional feature P:

其中，

表示第j个人体骨骼在步态半周期T_w中的第t帧图像的位置，

被表示为：in,

is represented as:

其中，

表示相机坐标系下肩部中心点的位置，上肢骨骼中其余坐标点的位置表示为

其中，

和

分别表示测试数据和训练数据的轨迹特征矩阵；

是

和

之间的广义特征值，满足

x是相应的广义右特征向量；in,

and

yes

and

The generalized eigenvalues between

x is the corresponding generalized right eigenvector;

其中，

和

分别表示肩部中心点、头部点、左肩点和右肩点在步态半周期T_w中的第t帧的位置；in,

and

denote the positions of the shoulder center point, head point, left shoulder point, and right shoulder point in the tth frame in the gait half cycle Tw _, respectively;

右手中心

和左手中心

Right hand center

and left hand center

Calculated by the following formula:

头部中心

是肩部中心点和头部点围成多边形的中心；右手中心

是右肩点，右肘点和右腕点围成多边形的中心；左手中心

是左肩点，左肘点和左腕点围成多边形的中心；右手中心

分别与头部中心

和左手中心

之间的欧几里得距离f_t ^d1和f_t ^d2写为：Head Center

Head center

and left hand center

The Euclidean distance between f _t ^d1 and f _t ^d2 is written as:

令

这样，整个距离特征表示为make

In this way, the entire distance feature is expressed as

其中，

和

分别表示f^di的均值、方差和最大值，其中,i＝1,2；in,

and

其中，

分别表示在相机坐标系下，肩部中心点、左肩点、左肘点、左腕点、左手点、右肩点、右肘点、右腕点和右手点的-位置。in,

They respectively represent the positions of the shoulder center point, left shoulder point, left elbow point, left wrist point, left hand point, right shoulder point, right elbow point, right wrist point and right hand point in the camera coordinate system.

最后，得到一个23维的混合特征

The composition is based on the skeletal features of the human upper limbs.

Claims

1. A mobile robot target tracking system integrating skeleton recognition and IFace-TLD, characterized in that it comprises obtaining an original color picture (A) of a human body and a skeleton picture (B) of an upper limb through a Kinect sensor, obtaining a target area frame through an IFace-TLD unit (101) for tracking and locating the target on the color picture (A), and a skeleton recognition unit (102) for tracking and locating the target on the skeleton picture (B), and sending the frame to an image target positioning unit (103); the image target positioning unit (103) marks a target area on the original color picture (A) according to the obtained target area frame, and feeds the target area back to the IFace-TLD unit (101);

The IFace-TLD unit (101) comprises a tracking part (101.1), a learning part (101.2), a detection part (101.3), and an integrator (101.4) for respectively acquiring an original color image (A), wherein the tracking part (101.1) uses an optical flow tracker to estimate the motion trajectory of a target in the acquired original color image (A) between two adjacent frames, and sends the estimated motion trajectory to the learning part (101.2) and the integrator (101.4) respectively; the detection part (101.3) independently scans and processes all image blocks in the first frame of the acquired original color image (A), separates the target face from the background, and sends the target face to the learning part (101.2) and the integrator (101.4) respectively, and only processes the image target positioning unit (101.3) for the original color images (A) acquired after the first frame. The target area and the surrounding area fed back by the tracking part (101.1) and the detection part (101.3) are scanned and processed to separate the target face and the background, and the target face is sent to the learning part (101.2) and the integrator (101.4) respectively; the integrator (101.4) calculates the confidence coefficient of the most likely target position in the original color image (A) according to the obtained target motion trajectory between two adjacent frames and the target face, and sends the calculation result to the learning part (101.2), and the skeleton recognition unit (102) or the image target positioning unit (103); the learning part (101.2) is trained according to the original color image (A) and the results obtained from the tracking part (101.1), the detection part (101.3) and the integrator (101.4), and the errors in the tracking part (101.1) and the detection part (101.3) are updated and corrected according to the training results;

The skeleton recognition unit (102) comprises a motion cycle extraction part (102.1), a skeleton feature extraction part (102.2) and a support vector data description part (102.3), wherein the motion cycle extraction part (102.1) calculates the motion cycle of the human body according to the acquired skeleton image (B), and the skeleton feature extraction part (102.2) calculates the skeleton features within the obtained motion cycle of the human body. When the result output by the integrator (101.4) in the IFace-TLD unit (101) is the target area frame, the skeleton feature extraction part (102.2) obtains The skeletal features are sent to the training part (102.31) in the support vector data description part (102.3) for training. When the result output by the integrator (101.4) in the IFace-TLD unit (101) is empty, the skeletal features obtained by the skeletal feature extraction part (102.2) are sent to the prediction part (102.32) in the support vector data description part (102.3). The prediction part (102.32) predicts the target area frame according to the training result of the training part (102.31), and sends the predicted target area frame to the image target positioning unit (103).

2. According to the mobile robot target tracking system integrating skeleton recognition and IFace-TLD as described in claim 1, it is characterized in that the detection part (101.3) includes a face detection part (101.31) for detecting the area of the face in the original color picture (A) according to the acquired original color picture (A), the target area of the learning part (101.2) and the target area information fed back by the image target positioning unit (103), a face recognition part (101.32) for identifying the target face area according to the area of the face in the original color picture (A) obtained from the face detection part (101.31), and a verifier part (101.33) for judging whether the target face area recognized by the face recognition part (101.32) is correct, and the verification results of the verifier part (101.33) are respectively sent to the learning part (101.2) and the integrator (101.4).

3. The mobile robot target tracking system integrating skeleton recognition and IFace-TLD according to claim 1, characterized in that the motion cycle extraction part (102.1) calculates the motion cycle of the human body according to the acquired skeleton image (B) by using the following formula:

Where dist _k is the distance between the left wrist and shoulder center of the kth frame image in the Kinect coordinate system;

and

4. The mobile robot target tracking system integrating skeleton recognition and IFace-TLD according to claim 1, characterized in that the skeleton feature extraction part (102.2) calculates the skeleton features within the obtained motion cycle of the human body, including:

First, the gait half-cycle is defined as T _w , then the skeletal features based on the upper limbs of the human body are expressed as:

Trajectory feature: The center point of the shoulder is selected as the fixed point, and the relative positions of other upper limb bone points relative to the fixed point are calculated by the following formula to obtain a 9-dimensional feature P:

in,

is represented as:

in,

in,

and

yes

and

The generalized eigenvalues between

x is the corresponding generalized right eigenvector;

Area and distance features: The area feature _FA represents the area of the closed area enclosed by the upper limbs of the human body. The distance feature _FD is represented by the distance between different human body centers. _FA is expressed as:

in,

and

To calculate the distance feature _FD , we first calculate the centers of the three closed polygons in the upper limb area. The three center points are the center of the head.

Right hand center

and left hand center

Calculated by the following formula:

Head Center

Head center

and left hand center

The Euclidean distance between f _t ^d1 and f _t ^d2 is written as:

make

In this way, the entire distance feature is expressed as

in,

and

Static features: Static features are represented by a 5-dimensional vector F _S = [f _h ,f _lua ,f _rua ,f _lf ,f _rf ] ^T , where f _h represents the height of the target, and f _lua ,f _rua ,f _lf and f _rf represent the length of the left upper arm, the length of the right upper arm, the length of the left forearm and the length of the right forearm, respectively. They are obtained by the following formulas:

in,

Frequency feature and amplitude feature: The frequency feature F _Fre is the number of skeleton image frames in a gait half cycle, and the difference between adjacent local maximum and local minimum is the amplitude feature F _Amp ;

Finally, we get a 23-dimensional mixed feature

The composition is based on the skeletal features of the human upper limbs.