CN109948560B - Mobile robot target tracking system fusing bone recognition and IFace-TLD

Mobile robot target tracking system fusing bone recognition and IFace-TLD

Info

Publication number
CN109948560B
CN109948560B (application CN201910227611.2A)
Authority
CN
China
Prior art keywords
target
point
bone
tld
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910227611.2A
Other languages
Chinese (zh)
Other versions
CN109948560A (en)
Inventor
苑晶
蔡晶鑫
高远兮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN201910227611.2A priority Critical patent/CN109948560B/en
Publication of CN109948560A publication Critical patent/CN109948560A/en
Application granted granted Critical
Publication of CN109948560B publication Critical patent/CN109948560B/en

Landscapes

  • Image Analysis (AREA)

Abstract

A mobile robot target tracking system fusing skeleton recognition and IFace-TLD. A Kinect sensor provides an original color picture of a human body and a skeleton picture of the upper limbs. An IFace-TLD unit tracks and positions the target on the color picture, and a skeleton recognition unit tracks and positions the target on the skeleton picture; either unit yields a region frame where the target is located, which is sent to an image target positioning unit. The image target positioning unit marks the target region on the original color picture according to the received region frame and feeds the target region back to the IFace-TLD unit. The invention effectively solves the problem of short-sequence tracking and achieves good tracking of the target face regardless of the length of the tracking sequence. A stable recognition effect is achieved even when the face is turned completely away from the camera. Online skeleton-based recognition is realized, improving tracking accuracy and robustness.

Description

Mobile robot target tracking system fusing bone recognition and IFace-TLD
Technical Field
The invention relates to target tracking systems for mobile robots, and in particular to a mobile robot target tracking system fusing bone recognition and IFace-TLD.
Background
Target tracking is widely applied in fields such as security, robotics and human-computer interaction. In practice, robust and efficient tracking is a very challenging task because of factors such as rapid target motion, illumination changes and occlusion.
The human face is highly distinctive, so to achieve a better tracking effect the face is chosen as the tracking target. The face-based Tracking-Learning-Detection algorithm (Face-TLD) can track a face over a long period. However, because it is a long-term tracking algorithm, when the tracking sequence is short the learning part of Face-TLD is insufficiently trained, so the tracking effect is poor and large drift may even occur. Moreover, in practical application scenarios the rotation of the face is highly random and the face cannot be guaranteed to face the camera at all times; in some cases the target face may turn completely away from the camera, and tracking algorithms based on image appearance then fail.
Human faces have unique biological characteristics and are used for identification in many situations. However, most high-precision face recognition algorithms are time-consuming and cannot be applied to mobile robot target tracking, which has strict real-time requirements.
Face-TLD, which is based on the TLD algorithm, can robustly track a human face over a long period. The original TLD is a single-target tracking algorithm that can track an unknown object in a video stream for a long time, and it can be divided into three parts: a tracking part, a learning part and a detection part. Because of its good tracking performance, many improvements have been built on this algorithm, and Face-TLD is one of them. Face-TLD combines face detection with TLD to achieve long-term face tracking. In the original Face-TLD, the detector can be split into two parts: a face detection part and a verifier part. The face detection part processes all image blocks, and the verifier part outputs a confidence that an image block contains the specific face. However, when the tracking sequence is short, Face-TLD cannot achieve a satisfactory tracking effect because the learning part is insufficiently trained. Specifically, the learning part is introduced to handle various uncertainties, but to ensure accuracy it requires a sufficient amount of training data, and training is itself time-consuming. Short sequences do not provide enough pictures to train the learning part. More seriously, at the beginning of tracking, if several targets have similar appearance, the original Face-TLD is likely to lose the target.
The Microsoft Kinect sensor can directly collect skeleton information of a human body. It is relatively robust to human motion and can collect a relatively stable skeleton even when the person is turned completely away from the Kinect. Owing to these sensor advantages, skeleton-based human recognition is robust to illumination, motion and changes in appearance. However, existing skeleton-based human recognition algorithms first collect a certain amount of data and then process it offline, require manually set training labels, and are difficult to run online. This clearly does not meet the requirements of online tracking on a mobile robot.
Disclosure of Invention
The invention aims to solve the technical problem of providing a mobile robot target tracking system that fuses skeleton recognition and IFace-TLD, achieves good tracking of the target face, and improves tracking accuracy and robustness.
The technical scheme adopted by the invention is as follows: a mobile robot target tracking system fusing skeleton recognition and IFace-TLD comprises an original color picture of a human body and a skeleton picture of the upper limbs, both obtained through a Kinect sensor; an IFace-TLD unit for tracking and positioning the target on the color picture; and a skeleton recognition unit for tracking and positioning the target on the skeleton picture. A region frame where the target is located is obtained and sent to an image target positioning unit, which marks the target region on the original color picture according to the received region frame and feeds the target region back to the IFace-TLD unit.
The IFace-TLD unit comprises a tracking part, a learning part, a detection part and an integrator. The tracking part acquires the original color picture and uses an optical flow tracker to estimate the motion trajectory of the target between two adjacent frames; the trajectory is sent to the learning part and the integrator. The detection part independently scans and processes all image blocks in the acquired first frame of the original color picture, separates the target face from the background, and sends the target face to the learning part and the integrator. The integrator calculates, from the obtained inter-frame motion trajectory and the target face, a confidence for the position in the original color picture most likely to contain the target, and sends the result to the learning part and to the skeleton recognition unit or the image target positioning unit. The learning part trains on the original color picture and on the results obtained from the tracking part, the detection part and the integrator, and updates and corrects errors occurring in the tracking part and the detection part according to the training results.
The detection part comprises a face detection part that detects faces in the region of the original color picture according to the acquired original color picture, the target region from the learning part and the target region information fed back by the image target positioning unit; a face recognition part that identifies the target face region among the faces obtained from the face detection part; and a verifier part that judges whether the target face region identified by the face recognition part is correct. The verification results of the verifier part are sent to the learning part and the integrator.
The skeleton recognition unit comprises a motion cycle extraction part, a skeleton feature extraction part and a support vector data description part. The motion cycle extraction part calculates the motion cycle of the human body from the acquired skeleton pictures, and the skeleton feature extraction part calculates skeleton features within the obtained motion cycle. When the result output by the integrator in the IFace-TLD unit is a region frame where the target is located, the skeleton features obtained by the skeleton feature extraction part are sent to the training part of the support vector data description part for training; when the result output by the integrator is empty, the skeleton features are sent to the prediction part of the support vector data description part, which predicts the region frame where the target is located according to the training result of the training part and sends the predicted region frame to the image target positioning unit.
The motion cycle extraction part calculates the motion cycle of the human body from the obtained skeleton pictures using the following formula:

    dist_k = ‖ p_lw^k − p_sc^k ‖,  k = 1, 2, …, N

where dist_k is the distance between the left wrist point and the shoulder center point of the k-th frame image in the Kinect coordinate system; p_lw^k and p_sc^k are the three-dimensional coordinates of the left wrist point and the shoulder center point in the k-th frame image; and N is the total number of image frames in the sequence.
The bone feature extraction part calculates bone features in the obtained motion cycle of the human body as follows:

Firstly, the gait half period is defined as T_w. The bone features of the human upper limbs are then expressed as follows:

Trajectory features: the shoulder center point is selected as a fixed point, and the relative positions of the other upper-limb bone points with respect to this fixed point are calculated by the following formula to obtain a 9-dimensional feature P (one relative position per remaining upper-limb bone point):

    P = [ p_t^1 − p_t^sc, p_t^2 − p_t^sc, …, p_t^9 − p_t^sc ],  t = 1, …, T_w

where p_t^j represents the position of the j-th upper-limb bone point of the human skeleton in the t-th frame image of the gait half period T_w, and p_t^sc represents the position of the shoulder center point in the camera coordinate system. The trajectory feature matrix F_T, which captures the person's walking habits, is represented by the covariance matrix of P:

    F_T = cov(P)

Let F_T^(1) and F_T^(2) respectively represent the trajectory feature matrices of the test data and the training data; λ is a generalized eigenvalue between F_T^(1) and F_T^(2), satisfying

    F_T^(1) x = λ F_T^(2) x

where x is the corresponding generalized right eigenvector.

Area and distance features: the area feature F_A represents the area of the closed region enclosed by the upper limb of the human body, and the distance feature F_D is expressed by the distances between the centers of different body parts. F_A is expressed as

    F_A = (1 / T_w) Σ_{t=1}^{T_w} S( p_t^sc, p_t^h, p_t^ls, p_t^rs )

where S(·) denotes the area of the closed polygon formed by its arguments, and p_t^sc, p_t^h, p_t^ls and p_t^rs respectively represent the positions of the shoulder center point, head point, left shoulder point and right shoulder point in the t-th frame of the gait half period T_w.

To calculate the distance feature F_D, the centers of three closed polygons of the upper-limb area are first calculated. The three center points are the head center c_t^h, the right-hand center c_t^rh and the left-hand center c_t^lh, calculated as the centroids of the corresponding bone points:

    c_t^h  = ( p_t^sc + p_t^head ) / 2
    c_t^rh = ( p_t^rs + p_t^re + p_t^rw ) / 3
    c_t^lh = ( p_t^ls + p_t^le + p_t^lw ) / 3

The head center c_t^h is the center of the polygon enclosed by the shoulder center point and the head point; the right-hand center c_t^rh is the center of the polygon enclosed by the right shoulder point, the right elbow point and the right wrist point; the left-hand center c_t^lh is the center of the polygon enclosed by the left shoulder point, the left elbow point and the left wrist point. The Euclidean distances f_t^d1 and f_t^d2 between the right-hand center c_t^rh and, respectively, the head center c_t^h and the left-hand center c_t^lh are written as

    f_t^d1 = ‖ c_t^rh − c_t^h ‖,  f_t^d2 = ‖ c_t^rh − c_t^lh ‖

Let f^di = [ f_1^di, …, f_{T_w}^di ], i = 1, 2. The entire distance feature is then expressed as

    F_D = [ μ_d1, σ_d1, μ_d2, σ_d2 ]^T

where μ_di and σ_di respectively represent the mean and the standard deviation of f^di, i = 1, 2.

Static features: the static features are represented by a 5-dimensional vector F_S = [ f_h, f_lua, f_rua, f_lf, f_rf ]^T, where f_h represents the height of the target, and f_lua, f_rua, f_lf and f_rf respectively represent the lengths of the left upper arm, the right upper arm, the left forearm and the right forearm, obtained by averaging the distances between the corresponding bone points over the gait half period:

    f_lua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^ls − p_t^le ‖
    f_rua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^rs − p_t^re ‖
    f_lf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^le − p_t^lw ‖
    f_rf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^re − p_t^rw ‖

where p^sc, p^ls, p^le, p^lw, p^lh, p^rs, p^re, p^rw and p^rh respectively represent the positions of the shoulder center point, left shoulder point, left elbow point, left wrist point, left hand point, right shoulder point, right elbow point, right wrist point and right hand point in the camera coordinate system.

Frequency and amplitude features: the frequency feature F_Fre is the number of skeleton image frames in a gait half period, and the amplitude feature F_Amp is the difference between adjacent local maxima and local minima of the distance curve dist_k.

Finally, a 23-dimensional mixed feature

    F = [ F_T, F_A, F_D, F_S, F_Fre, F_Amp ]

is obtained, constituting the bone features of the human upper limbs.
The mobile robot target tracking system fusing bone recognition and IFace-TLD adds face recognition based on principal component analysis (PCA) to the original Face-TLD algorithm, effectively solving the short-sequence tracking problem; the resulting tracker is called IFace-TLD. In this way, IFace-TLD can track the target face well regardless of the length of the tracking sequence. Meanwhile, SIFA-TLD seamlessly integrates IFace-TLD with skeleton-based human recognition: when IFace-TLD tracking succeeds, the extracted skeleton features are used to train a Support Vector Data Description (SVDD); when IFace-TLD tracking fails, the newly extracted skeleton features are sent to the trained SVDD for recognition. Thus a stable recognition effect can be achieved even when the face is turned away from the camera. Online skeleton-based recognition is realized, and tracking accuracy and robustness are improved.
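For illustration, the train-on-success / predict-on-failure switching described above can be summarized in a minimal Python sketch. Every name below (kinect.frames, iface_tld.track/feedback, extract_half_cycle_features, skeleton_box, the SVDD wrapper, locator.mark) is a hypothetical placeholder, not an API defined by the patent:

```python
# Minimal sketch of the SIFA-TLD fusion loop (all helper names hypothetical).
def fusion_loop(kinect, iface_tld, svdd, locator):
    for color_img, skeleton_img in kinect.frames():
        box = iface_tld.track(color_img)                   # region frame, or None on failure
        feats = extract_half_cycle_features(skeleton_img)  # 23-dim skeleton feature, or None
        if box is not None:                                # IFace-TLD succeeded:
            if feats is not None:
                svdd.train(feats)                          #   train SVDD online
            target_box = box
        elif feats is not None and svdd.predict(feats):    # IFace-TLD failed:
            target_box = skeleton_box(skeleton_img)        #   recognize target by skeleton
        else:
            continue                                       # target not found this frame
        locator.mark(color_img, target_box)                # image target positioning unit
        iface_tld.feedback(target_box)                     # feed target region back
```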
Drawings
FIG. 1 is a block diagram of the mobile robot target tracking system of the present invention incorporating bone recognition and IFace-TLD;
FIG. 2 is a schematic representation of a human skeleton containing 20 joints;
FIG. 3 is a graph of dist_k.
In the drawings
101: IFace-TLD unit                     101.1: tracking part
101.2: learning part                    101.3: detection part
101.31: face detection part             101.32: face recognition part
101.33: verifier part                   101.4: integrator
102: bone recognition unit              102.1: motion cycle extraction part
102.2: bone feature extraction part     102.3: support vector data description part
102.31: training part                   102.32: prediction part
103: image target positioning unit
Detailed Description
The mobile robot target tracking system fusing bone recognition and IFace-TLD of the present invention is described in detail with reference to the following embodiments and the accompanying drawings.
As shown in FIG. 1, the mobile robot target tracking system fusing bone recognition and IFace-TLD of the invention comprises an original color picture A of a human body and a skeleton picture B of the upper limbs, both obtained through a Kinect sensor; an IFace-TLD unit 101 for tracking and positioning the target on the color picture A; and a bone recognition unit 102 for tracking and positioning the target on the skeleton picture B. A region frame where the target is located is obtained and sent to an image target positioning unit 103, which marks the target region on the original color picture A according to the received region frame and feeds the target region back to the IFace-TLD unit 101.
The IFace-TLD unit 101 comprises a tracking part 101.1, a learning part 101.2, a detection part 101.3 and an integrator 101.4. The tracking part 101.1 acquires the original color picture A and uses an optical flow tracker to estimate the motion trajectory of the target between two adjacent frames; the trajectory is sent to the learning part 101.2 and the integrator 101.4. The detection part 101.3 independently scans and processes all image blocks in the first acquired frame of the original color picture A, separates the target face from the background, and sends the target face to the learning part 101.2 and the integrator 101.4; for subsequently acquired frames of the original color picture A it scans and processes only the target area fed back by the image target positioning unit 103 and its surroundings, again separating the target face from the background and sending it to the learning part 101.2 and the integrator 101.4. The integrator 101.4 calculates, from the obtained inter-frame motion trajectory and the target face, a confidence for the position in the original color picture A most likely to contain the target, and sends the result to the learning part 101.2 and to the bone recognition unit 102 or the image target positioning unit 103. The learning part 101.2 trains on the original color picture A and on the results obtained from the tracking part 101.1, the detection part 101.3 and the integrator 101.4, and updates and corrects errors occurring in the tracking part 101.1 and the detection part 101.3 according to the training results.
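The patent specifies only that the tracking part uses an optical flow tracker. As one common realization, the following hedged Python sketch estimates the inter-frame motion of a bounding box with OpenCV's pyramidal Lucas-Kanade optical flow and a robust median displacement (the function name and the (x, y, w, h) box format are assumptions):

```python
import cv2
import numpy as np

def track_box_lk(prev_gray, cur_gray, box):
    """Estimate the new position of `box` = (x, y, w, h) between two adjacent
    grayscale frames via pyramidal Lucas-Kanade optical flow."""
    x, y, w, h = box
    # Sample a uniform grid of points inside the current bounding box.
    xs, ys = np.meshgrid(np.linspace(x, x + w, 10), np.linspace(y, y + h, 10))
    pts = np.float32(np.stack([xs.ravel(), ys.ravel()], axis=1)).reshape(-1, 1, 2)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good_old = pts[status.ravel() == 1].reshape(-1, 2)
    good_new = nxt[status.ravel() == 1].reshape(-1, 2)
    if len(good_new) < 10:
        return None                                  # tracking failure: too few points
    dx, dy = np.median(good_new - good_old, axis=0)  # robust median motion
    return (x + dx, y + dy, w, h)
```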
The detection part 101.3 comprises a face detection part 101.31 that detects faces in the region of the original color picture A according to the acquired original color picture A, the target region from the learning part 101.2 and the target region information fed back by the image target positioning unit 103; a face recognition part 101.32 that identifies the target face region among the faces obtained from the face detection part 101.31; and a verifier part 101.33 that judges whether the target face region identified by the face recognition part 101.32 is correct. The verification results of the verifier part 101.33 are sent to the learning part 101.2 and the integrator 101.4.
The IFace-TLD of the invention integrates Face-TLD with face recognition based on principal component analysis. Face-TLD was proposed in 2010 by Zdenek Kalal, Krystian Mikolajczyk and Jiri Matas in the article "Face-TLD: Tracking-Learning-Detection Applied to Faces". Compared with the original Face-TLD, IFace-TLD adds a face recognition part after face detection in the IFace-TLD unit 101, which serves to distinguish image blocks of similar appearance and thereby enhances tracking performance. To reduce the number of image blocks and increase processing speed, only the fed-back target area is considered, whose size is twice that of the previously obtained bounding box containing the target face. Face detection finds all image blocks containing faces, the face recognition part then filters out faces that are not the target, and finally all remaining image blocks are sent to the verifier to further determine whether they contain the target face.
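The PCA-based recognition step can be sketched as eigenface projection plus a reconstruction-error test. The minimal Python example below assumes aligned, flattened grayscale face crops of a fixed size; the class name, component count and threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

class PCAFaceFilter:
    """Eigenface-style filter: keep only face crops close to the target subspace."""
    def __init__(self, n_components=16, threshold=2500.0):
        self.k = n_components
        self.threshold = threshold  # reconstruction-error threshold (illustrative)

    def fit(self, target_faces):
        # target_faces: (n, h*w) array of flattened, aligned target face crops
        X = np.asarray(target_faces, dtype=np.float64)
        self.mean = X.mean(axis=0)
        _u, _s, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.axes = vt[: self.k]            # top-k principal axes in pixel space

    def is_target(self, face):
        v = np.asarray(face, dtype=np.float64).ravel() - self.mean
        recon = self.axes.T @ (self.axes @ v)  # projection onto the PCA subspace
        return np.linalg.norm(v - recon) < self.threshold
```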
The bone recognition unit 102 comprises a motion cycle extraction part 102.1, a bone feature extraction part 102.2 and a support vector data description part 102.3. The motion cycle extraction part 102.1 calculates the motion cycle of the human body from the obtained skeleton picture B, and the bone feature extraction part 102.2 calculates bone features within the obtained motion cycle. When the result output by the integrator 101.4 in the IFace-TLD unit 101 is a region frame where the target is located, the bone features obtained by the bone feature extraction part 102.2 are sent to the training part 102.31 of the support vector data description part 102.3 for training; when the result output by the integrator 101.4 is empty, the bone features are sent to the prediction part 102.32 of the support vector data description part 102.3, which predicts the region frame where the target is located according to the training result of the training part 102.31 and sends the predicted region frame to the image target positioning unit 103.
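Support Vector Data Description fits a minimal enclosing boundary around the target's feature vectors. The patent does not specify an implementation; as a hedged stand-in sketch, scikit-learn's OneClassSVM with an RBF kernel (mathematically closely related to SVDD) can play the roles of the training and prediction parts. The class name, nu value and minimum sample count below are assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

class SVDDRecognizer:
    """One-class model over 23-dim skeleton features (OneClassSVM as SVDD stand-in)."""
    def __init__(self, nu=0.1, gamma="scale", min_samples=5):
        self.model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        self.buffer = []          # features collected while IFace-TLD succeeds
        self.min_samples = min_samples
        self.fitted = False

    def train(self, feature):
        self.buffer.append(np.asarray(feature))
        if len(self.buffer) >= self.min_samples:
            self.model.fit(np.stack(self.buffer))  # refit the enclosing boundary
            self.fitted = True

    def predict(self, feature):
        if not self.fitted:
            return False
        # +1: the new skeleton feature falls inside the learned target boundary
        return self.model.predict(np.asarray(feature).reshape(1, -1))[0] == 1
```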
The human skeleton containing 20 joints is shown in FIG. 2; the joints are numbered as in Table 1:

TABLE 1

1   Hip center point         11  Right wrist point
2   Spine point              12  Right hand point
3   Shoulder center point    13  Left hip point
4   Head point               14  Left knee point
5   Left shoulder point      15  Left ankle point
6   Left elbow point         16  Left foot point
7   Left wrist point         17  Right hip point
8   Left hand point          18  Right knee point
9   Right shoulder point     19  Right ankle point
10  Right elbow point        20  Right foot point
The invention realizes online human body recognition using the ten skeleton points of the upper limbs, which are connected by solid black lines in FIG. 2; the remaining ten skeleton points are connected by dashed black lines. The gait cycle is obtained by calculating the distance between the left wrist point and the shoulder center point in the Kinect coordinate system.
The motion cycle extraction part 102.1 calculates the motion cycle of the human body from the obtained skeleton picture B using the following formula:

    dist_k = ‖ p_lw^k − p_sc^k ‖,  k = 1, 2, …, N

where dist_k is the distance between the left wrist point and the shoulder center point of the k-th frame image in the Kinect coordinate system; p_lw^k and p_sc^k are the three-dimensional coordinates of the left wrist point and the shoulder center point in the k-th frame image; and N is the total number of image frames in the sequence. The curve of dist_k is shown in FIG. 3: the dotted line is the raw distance curve, and to reduce noise interference the raw data is mean-filtered, giving the curve shown as a solid line. The invention defines the full gait cycle as the number of frames between adjacent local maxima (or minima), and the gait half cycle as the number of frames between an adjacent local maximum and local minimum. Although skeleton features are conventionally extracted over a full gait cycle, the invention extracts them over the gait half cycle so as to obtain more feature samples.
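A minimal Python sketch of this step (NumPy/SciPy), assuming the skeleton sequence is given as per-frame 3D positions of the left wrist and shoulder center; the smoothing window is an illustrative assumption:

```python
import numpy as np
from scipy.signal import find_peaks

def gait_half_cycles(p_lw, p_sc, win=5):
    """p_lw, p_sc: (N, 3) arrays of left-wrist / shoulder-center positions.
    Returns the mean-filtered distance curve and the frame-index pairs that
    delimit gait half cycles (adjacent local maxima and minima)."""
    dist = np.linalg.norm(p_lw - p_sc, axis=1)        # dist_k, k = 1..N
    kernel = np.ones(win) / win
    smooth = np.convolve(dist, kernel, mode="same")   # mean filtering
    maxima, _ = find_peaks(smooth)                    # local maxima
    minima, _ = find_peaks(-smooth)                   # local minima
    extrema = np.sort(np.concatenate([maxima, minima]))
    # each pair of adjacent extrema spans one gait half cycle (T_w frames)
    return smooth, list(zip(extrema[:-1], extrema[1:]))
```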
The bone feature extraction part 102.2 calculates bone features in the obtained motion cycle of the human body as follows:

Firstly, the gait half period is defined as T_w. The bone features of the human upper limbs are then expressed as follows:

Trajectory features: the shoulder center point is selected as a fixed point, and the relative positions of the other upper-limb bone points with respect to this fixed point are calculated by the following formula to obtain a 9-dimensional feature P (one relative position per remaining upper-limb bone point):

    P = [ p_t^1 − p_t^sc, p_t^2 − p_t^sc, …, p_t^9 − p_t^sc ],  t = 1, …, T_w

where p_t^j represents the position of the j-th upper-limb bone point of the human skeleton in the t-th frame image of the gait half period T_w, and p_t^sc represents the position of the shoulder center point in the camera coordinate system. The trajectory feature matrix F_T, which captures the person's walking habits, is represented by the covariance matrix of P:

    F_T = cov(P)

Let F_T^(1) and F_T^(2) respectively represent the trajectory feature matrices of the test data and the training data; λ is a generalized eigenvalue between F_T^(1) and F_T^(2), satisfying

    F_T^(1) x = λ F_T^(2) x

where x is the corresponding generalized right eigenvector.
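The trajectory comparison thus reduces to a generalized eigenvalue problem between two covariance matrices, which SciPy solves directly. In this hedged sketch the matrix shapes and the small regularization term are assumptions consistent with the definitions above:

```python
import numpy as np
from scipy.linalg import eigh

def trajectory_eigenvalues(P_test, P_train):
    """P_test, P_train: (T_w, d) matrices of relative upper-limb positions.
    Returns the generalized eigenvalues lambda solving F1 x = lambda F2 x,
    where F1, F2 are the covariance (trajectory feature) matrices."""
    F1 = np.cov(P_test, rowvar=False)    # trajectory feature matrix, test data
    F2 = np.cov(P_train, rowvar=False)   # trajectory feature matrix, training data
    F2 = F2 + 1e-6 * np.eye(F2.shape[0])  # regularize so F2 is positive definite
    return eigh(F1, F2, eigvals_only=True)
```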
Area and distance features: the area feature F_A represents the area of the closed region enclosed by the upper limb of the human body, and the distance feature F_D is expressed by the distances between the centers of different body parts. F_A is expressed as

    F_A = (1 / T_w) Σ_{t=1}^{T_w} S( p_t^sc, p_t^h, p_t^ls, p_t^rs )

where S(·) denotes the area of the closed polygon formed by its arguments, and p_t^sc, p_t^h, p_t^ls and p_t^rs respectively represent the positions of the shoulder center point, head point, left shoulder point and right shoulder point in the t-th frame of the gait half period T_w.

To calculate the distance feature F_D, the centers of three closed polygons of the upper-limb area are first calculated. The three center points are the head center c_t^h, the right-hand center c_t^rh and the left-hand center c_t^lh, calculated as the centroids of the corresponding bone points:

    c_t^h  = ( p_t^sc + p_t^head ) / 2
    c_t^rh = ( p_t^rs + p_t^re + p_t^rw ) / 3
    c_t^lh = ( p_t^ls + p_t^le + p_t^lw ) / 3

The head center c_t^h is the center of the polygon enclosed by the shoulder center point and the head point; the right-hand center c_t^rh is the center of the polygon enclosed by the right shoulder point, the right elbow point and the right wrist point; the left-hand center c_t^lh is the center of the polygon enclosed by the left shoulder point, the left elbow point and the left wrist point. The Euclidean distances f_t^d1 and f_t^d2 between the right-hand center c_t^rh and, respectively, the head center c_t^h and the left-hand center c_t^lh are written as

    f_t^d1 = ‖ c_t^rh − c_t^h ‖,  f_t^d2 = ‖ c_t^rh − c_t^lh ‖

Let f^di = [ f_1^di, …, f_{T_w}^di ], i = 1, 2. The entire distance feature is then expressed as

    F_D = [ μ_d1, σ_d1, μ_d2, σ_d2 ]^T

where μ_di and σ_di respectively represent the mean and the standard deviation of f^di, i = 1, 2.
Static features: the static features are represented by a 5-dimensional vector F_S = [ f_h, f_lua, f_rua, f_lf, f_rf ]^T, where f_h represents the height of the target, and f_lua, f_rua, f_lf and f_rf respectively represent the lengths of the left upper arm, the right upper arm, the left forearm and the right forearm, obtained by averaging the distances between the corresponding bone points over the gait half period:

    f_lua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^ls − p_t^le ‖
    f_rua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^rs − p_t^re ‖
    f_lf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^le − p_t^lw ‖
    f_rf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^re − p_t^rw ‖

where p^sc, p^ls, p^le, p^lw, p^lh, p^rs, p^re, p^rw and p^rh respectively represent the positions of the shoulder center point, left shoulder point, left elbow point, left wrist point, left hand point, right shoulder point, right elbow point, right wrist point and right hand point in the camera coordinate system.

Frequency and amplitude features: the frequency feature F_Fre is the number of skeleton image frames in a gait half period, and the amplitude feature F_Amp is the difference between adjacent local maxima and local minima of the distance curve dist_k.

Finally, a 23-dimensional mixed feature

    F = [ F_T, F_A, F_D, F_S, F_Fre, F_Amp ]

is obtained, constituting the bone features of the human upper limbs.
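Putting the pieces together, the mixed feature can be assembled as in the hedged Python sketch below. The exact dimension split among the components is not fully specified by the text, so the concatenation order and the choice of generalized eigenvalues as the trajectory summary are assumptions:

```python
import numpy as np

def mixed_feature(traj_vec, area_seq, f_d1, f_d2, statics, T_w, dist):
    """Assemble F = [F_T, F_A, F_D, F_S, F_Fre, F_Amp].
    traj_vec: trajectory-feature summary (e.g. generalized eigenvalues);
    area_seq: per-frame upper-limb polygon areas; f_d1, f_d2: per-frame
    center distances; statics: [f_h, f_lua, f_rua, f_lf, f_rf];
    dist: smoothed wrist-shoulder distance curve over the half cycle."""
    F_A = np.mean(area_seq)                            # area feature
    F_D = [np.mean(f_d1), np.std(f_d1), np.mean(f_d2), np.std(f_d2)]
    F_Fre = T_w                                        # frames per gait half cycle
    F_Amp = np.max(dist) - np.min(dist)                # amplitude feature
    return np.concatenate([np.ravel(traj_vec), [F_A], F_D, statics, [F_Fre, F_Amp]])
```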

Claims (4)

1. A mobile robot target tracking system fusing bone recognition and IFace-TLD, characterized by comprising an original color picture (A) of a human body and a skeleton picture (B) of the upper limbs, both obtained through a Kinect sensor; an IFace-TLD unit (101) for tracking and positioning the target on the color picture (A); and a bone recognition unit (102) for tracking and positioning the target on the skeleton picture (B); a region frame where the target is located is obtained and sent to an image target positioning unit (103), which marks the target region on the original color picture (A) according to the received region frame and feeds the target region back to the IFace-TLD unit (101);
the IFace-TLD unit (101) comprises a tracking part (101.1), a learning part (101.2), a detection part (101.3) and an integrator (101.4); the tracking part (101.1) acquires the original color picture (A) and uses an optical flow tracker to estimate the motion trajectory of the target between two adjacent frames, the trajectory being sent to the learning part (101.2) and the integrator (101.4); the detection part (101.3) independently scans and processes all image blocks in the first acquired frame of the original color picture (A), separates the target face from the background, and sends the target face to the learning part (101.2) and the integrator (101.4), while for subsequently acquired frames of the original color picture (A) it scans and processes only the target area fed back by the image target positioning unit (103) and its surroundings, again separating the target face from the background and sending it to the learning part (101.2) and the integrator (101.4); the integrator (101.4) calculates, from the obtained inter-frame motion trajectory and the target face, a confidence for the position in the original color picture (A) most likely to contain the target, and sends the result to the learning part (101.2) and to the bone recognition unit (102) or the image target positioning unit (103); the learning part (101.2) trains on the original color picture (A) and on the results obtained from the tracking part (101.1), the detection part (101.3) and the integrator (101.4), and updates and corrects errors occurring in the tracking part (101.1) and the detection part (101.3) according to the training results;
the bone recognition unit (102) comprises a motion cycle extraction part (102.1), a bone feature extraction part (102.2) and a support vector data description part (102.3); the motion cycle extraction part (102.1) calculates the motion cycle of the human body from the acquired skeleton picture (B), and the bone feature extraction part (102.2) calculates bone features within the obtained motion cycle; when the result output by the integrator (101.4) in the IFace-TLD unit (101) is a region frame where the target is located, the bone features obtained by the bone feature extraction part (102.2) are sent to the training part (102.31) of the support vector data description part (102.3) for training; when the result output by the integrator (101.4) is empty, the bone features are sent to the prediction part (102.32) of the support vector data description part (102.3), which predicts the region frame where the target is located according to the training result of the training part (102.31) and sends the predicted region frame to the image target positioning unit (103).
2. The mobile robot target tracking system fusing bone recognition and IFace-TLD according to claim 1, wherein the detection part (101.3) comprises a face detection part (101.31) for detecting faces in the region of the original color picture (A) according to the acquired original color picture (A), the target region from the learning part (101.2) and the target region information fed back by the image target positioning unit (103); a face recognition part (101.32) for identifying the target face region among the faces obtained from the face detection part (101.31); and a verifier part (101.33) for judging whether the target face region identified by the face recognition part (101.32) is correct, the verification results of the verifier part (101.33) being sent to the learning part (101.2) and the integrator (101.4).
3. The mobile robot target tracking system fusing bone recognition and IFace-TLD according to claim 1, wherein the motion cycle extraction part (102.1) calculates the motion cycle of the human body from the obtained skeleton picture (B) using the following formula:

    dist_k = ‖ p_lw^k − p_sc^k ‖,  k = 1, 2, …, N

wherein dist_k is the distance between the left wrist point and the shoulder center point of the k-th frame image in the Kinect coordinate system; p_lw^k and p_sc^k are the three-dimensional coordinates of the left wrist point and the shoulder center point in the k-th frame image; and N is the total number of image frames in the sequence.
4. The mobile robot target tracking system fusing bone recognition and IFace-TLD according to claim 1, wherein the bone feature extraction part (102.2) calculates bone features in the obtained motion cycle of the human body as follows:

firstly, the gait half period is defined as T_w, and the bone features of the human upper limbs are then expressed as follows:

trajectory features: the shoulder center point is selected as a fixed point, and the relative positions of the other upper-limb bone points with respect to this fixed point are calculated by the following formula to obtain a 9-dimensional feature P (one relative position per remaining upper-limb bone point):

    P = [ p_t^1 − p_t^sc, p_t^2 − p_t^sc, …, p_t^9 − p_t^sc ],  t = 1, …, T_w

wherein p_t^j represents the position of the j-th upper-limb bone point of the human skeleton in the t-th frame image of the gait half period T_w, and p_t^sc represents the position of the shoulder center point in the camera coordinate system; the trajectory feature matrix F_T containing the person's walking habits is represented by the covariance matrix of P:

    F_T = cov(P)

F_T^(1) and F_T^(2) respectively represent the trajectory feature matrices of the test data and the training data; λ is a generalized eigenvalue between F_T^(1) and F_T^(2), satisfying

    F_T^(1) x = λ F_T^(2) x

wherein x is the corresponding generalized right eigenvector;

area and distance features: the area feature F_A represents the area of the closed region enclosed by the upper limb of the human body, and the distance feature F_D is expressed by the distances between the centers of different body parts; F_A is expressed as

    F_A = (1 / T_w) Σ_{t=1}^{T_w} S( p_t^sc, p_t^h, p_t^ls, p_t^rs )

wherein S(·) denotes the area of the closed polygon formed by its arguments, and p_t^sc, p_t^h, p_t^ls and p_t^rs respectively represent the positions of the shoulder center point, head point, left shoulder point and right shoulder point in the t-th frame of the gait half period T_w;

to calculate the distance feature F_D, the centers of three closed polygons of the upper-limb area are first calculated; the three center points are the head center c_t^h, the right-hand center c_t^rh and the left-hand center c_t^lh, calculated as the centroids of the corresponding bone points:

    c_t^h  = ( p_t^sc + p_t^head ) / 2
    c_t^rh = ( p_t^rs + p_t^re + p_t^rw ) / 3
    c_t^lh = ( p_t^ls + p_t^le + p_t^lw ) / 3

the head center c_t^h is the center of the polygon enclosed by the shoulder center point and the head point; the right-hand center c_t^rh is the center of the polygon enclosed by the right shoulder point, the right elbow point and the right wrist point; the left-hand center c_t^lh is the center of the polygon enclosed by the left shoulder point, the left elbow point and the left wrist point; the Euclidean distances f_t^d1 and f_t^d2 between the right-hand center c_t^rh and, respectively, the head center c_t^h and the left-hand center c_t^lh are written as

    f_t^d1 = ‖ c_t^rh − c_t^h ‖,  f_t^d2 = ‖ c_t^rh − c_t^lh ‖

let f^di = [ f_1^di, …, f_{T_w}^di ], i = 1, 2; the entire distance feature is then expressed as

    F_D = [ μ_d1, σ_d1, μ_d2, σ_d2 ]^T

wherein μ_di and σ_di respectively represent the mean and the standard deviation of f^di, i = 1, 2;

static features: the static features are represented by a 5-dimensional vector F_S = [ f_h, f_lua, f_rua, f_lf, f_rf ]^T, wherein f_h represents the height of the target, and f_lua, f_rua, f_lf and f_rf respectively represent the lengths of the left upper arm, the right upper arm, the left forearm and the right forearm, obtained by averaging the distances between the corresponding bone points over the gait half period:

    f_lua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^ls − p_t^le ‖
    f_rua = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^rs − p_t^re ‖
    f_lf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^le − p_t^lw ‖
    f_rf  = (1 / T_w) Σ_{t=1}^{T_w} ‖ p_t^re − p_t^rw ‖

wherein p^sc, p^ls, p^le, p^lw, p^lh, p^rs, p^re, p^rw and p^rh respectively represent the positions of the shoulder center point, left shoulder point, left elbow point, left wrist point, left hand point, right shoulder point, right elbow point, right wrist point and right hand point in the camera coordinate system;

frequency and amplitude features: the frequency feature F_Fre is the number of skeleton image frames in a gait half period, and the amplitude feature F_Amp is the difference between adjacent local maxima and local minima of the distance curve dist_k;

finally, a 23-dimensional mixed feature

    F = [ F_T, F_A, F_D, F_S, F_Fre, F_Amp ]

is obtained, constituting the bone features of the human upper limbs.
CN201910227611.2A 2019-03-25 2019-03-25 Mobile robot target tracking system fusing bone recognition and IFace-TLD Active CN109948560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910227611.2A CN109948560B (en) 2019-03-25 2019-03-25 Mobile robot target tracking system fusing bone recognition and IFace-TLD

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910227611.2A CN109948560B (en) 2019-03-25 2019-03-25 Mobile robot target tracking system fusing bone recognition and IFace-TLD

Publications (2)

Publication Number Publication Date
CN109948560A CN109948560A (en) 2019-06-28
CN109948560B true CN109948560B (en) 2023-04-07

Family

ID=67010640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910227611.2A Active CN109948560B (en) 2019-03-25 2019-03-25 Mobile robot target tracking system fusing bone recognition and IFace-TLD

Country Status (1)

Country Link
CN (1) CN109948560B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945375A (en) * 2012-11-20 2013-02-27 天津理工大学 Multi-view monitoring video behavior detection and recognition method under multiple constraints
CN105469113A (en) * 2015-11-19 2016-04-06 广州新节奏智能科技有限公司 Human body bone point tracking method and system in two-dimensional video stream
CN105760832A (en) * 2016-02-14 2016-07-13 武汉理工大学 Escaped prisoner recognition method based on Kinect sensor
CN106652291A (en) * 2016-12-09 2017-05-10 华南理工大学 Indoor simple monitoring and alarming system and method based on Kinect
CN108805093A (en) * 2018-06-19 2018-11-13 华南理工大学 Escalator passenger based on deep learning falls down detection algorithm


Also Published As

Publication number Publication date
CN109948560A (en) 2019-06-28

Similar Documents

Publication Title
CN111144217B (en) Motion evaluation method based on human body three-dimensional joint point detection
US9235753B2 (en) Extraction of skeletons from 3D maps
US9002099B2 (en) Learning-based estimation of hand and finger pose
US8824781B2 (en) Learning-based pose estimation from depth maps
US7239718B2 (en) Apparatus and method for high-speed marker-free motion capture
CN111881887A (en) Multi-camera-based motion attitude monitoring and guiding method and device
Del Rincón et al. Tracking human position and lower body parts using Kalman and particle filters constrained by human biomechanics
CN109377513B (en) Global three-dimensional human body posture credible estimation method for two views
Cerveri et al. Robust recovery of human motion from video using Kalman filters and virtual humans
CN109344694B (en) Human body basic action real-time identification method based on three-dimensional human body skeleton
CN111784775B (en) Identification-assisted visual inertia augmented reality registration method
CN111027432B (en) Gait feature-based visual following robot method
CN117671738B (en) Human body posture recognition system based on artificial intelligence
CN114120188B (en) Multi-row person tracking method based on joint global and local features
CN111596767A (en) Gesture capturing method and device based on virtual reality
CN107341179B (en) Standard motion database generation method and device and storage device
CN110910426A (en) Action process and action trend identification method, storage medium and electronic device
CN117238031B (en) Motion capturing method and system for virtual person
CN109948560B (en) Mobile robot target tracking system fusing bone recognition and IFace-TLD
CN113487674A (en) Human body pose estimation system and method
CN113221815A (en) Gait identification method based on automatic detection technology of skeletal key points
CN113592898A (en) Method for reconstructing missing mark in motion capture
Sadeghzadehyazdi et al. Glidar3dj: a view-invariant gait identification via flash lidar data correction
CN116503540A (en) Human body motion capturing, positioning and environment mapping method based on sparse sensor
Bonnet et al. Toward an affordable and user-friendly visual motion capture system

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant