CN110826495A - Body left and right limb consistency tracking and distinguishing method and system based on face orientation - Google Patents

Body left and right limb consistency tracking and distinguishing method and system based on face orientation

Info

Publication number
CN110826495A
Authority
CN
China
Prior art keywords
limb
face
region
tracked
face orientation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911082183.5A
Other languages
Chinese (zh)
Inventor
田京兰
王政元
蒋彦
冯志全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN201911082183.5A
Publication of CN110826495A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Abstract

The invention discloses a method and a system for consistency tracking and discrimination of the left and right limbs of a body based on face orientation. A video to be tracked is acquired and input into a part detector, which outputs a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result; a face region is selected according to the confidence, and the face orientation of the tracked target is estimated; the facing direction of the human body is determined from the face orientation; according to the face orientation, the recognition results of each limb part of the tracked target are screened in different priority orders to obtain a candidate set for each limb; finally, inference is performed on the candidate sets with the pictorial structure model PS to obtain the final limb poses.

Description

Body left and right limb consistency tracking and distinguishing method and system based on face orientation
Technical Field
The disclosure relates to the technical field of human pose tracking, and in particular to a method and a system for consistency tracking and discrimination of the left and right limbs of a body based on face orientation.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Two-dimensional human pose tracking is an important and still challenging problem in computer vision. Current human pose tracking methods tend to be detection-based, i.e. the human pose is first estimated in each video frame, and the temporal and spatial correlations between poses in successive frames are then integrated. To obtain better tracking performance, many studies improve pose estimation by constructing robust human appearance models or by modeling more complex human structures, and researchers have also suggested modeling more realistic body-part dependencies through hierarchical human structures.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
although these methods are effective for pose estimation, an important problem affects the tracking process: the detection-based tracking methods usually use a pictorial structure model (PS), which neglects the consistency of the left and right limb parts across consecutive video frames and relies only on the relative positions of the limb parts in the image coordinate system. This often causes the left and right limbs to be confused during tracking. For example, over a sequence of video frames, even when all body parts are visible and all body-part poses can be accurately detected, the left and right limbs are often confused; this is especially likely when the body moves sideways in the video sequence. This problem is not addressed in current human pose estimation and tracking.
In video sequences, two-dimensional human pose tracking is also greatly challenged by self-occlusion of the human body. Since the human body is a bilaterally symmetric structure, its different parts often occlude each other in the two-dimensional image, for example when limb parts are occluded by the torso or by their symmetric counterparts. Under occlusion, the problem of "double counting" also occurs in pose detection, i.e. the same image region is used to determine the positions of two symmetric limb parts. In the prior art, occlusion-time reasoning has been integrated into the tracking framework: when occlusion occurs, the tracker is forced to search for image evidence that supports a smooth path, which achieves some improvement on the double-counting problem.
Disclosure of Invention
In order to overcome the defects of the prior art, the present disclosure provides a method and a system for consistency tracking and discrimination of the left and right limbs of a body based on face orientation;
in a first aspect, the present disclosure provides a body left and right limb consistency tracking discrimination method based on face orientation;
the body left and right limb consistency tracking and distinguishing method based on the face orientation comprises the following steps:
acquiring a video to be tracked, inputting it into a part detector, and outputting a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
selecting the face region according to the confidence, and estimating the face orientation of the tracked target; determining the facing direction of the human body from the face orientation;
screening, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb;
finally, performing inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
In a second aspect, the present disclosure also provides a body left and right limb consistency tracking discrimination system based on face orientation;
facial orientation-based body left and right limb consistency tracking and discriminating system comprises:
a limb part detection module configured to: acquire a video to be tracked, input it into a part detector, and output a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
a face orientation estimation module configured to: select the face region according to the confidence, and estimate the face orientation of the tracked target; determine the facing direction of the human body from the face orientation;
a human pose discrimination module configured to: screen, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb; and finally perform inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
In a third aspect, the present disclosure also provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are:
The method provides a framework for consistency tracking and discrimination of the left and right body parts based on face orientation, which simply and effectively improves two-dimensional human pose tracking.
The proposed face-orientation estimation is well integrated into the pictorial structure model, so that inconsistent left/right limb assignments caused by occlusion and the like can be effectively avoided during tracking;
In the pose-inference stage, because face-orientation estimation is integrated, the limbs on the two sides can be discriminated separately, so that left/right discrimination errors of the limbs in the tree-structured model can be corrected and the double-counting problem of left-right symmetric limbs can be avoided;
The proposed method especially improves the tracking accuracy of small body parts such as arms and legs, maintaining a high detection rate on complex video sequences with large changes in motion and orientation.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of a method of the first embodiment;
FIG. 2(a), FIG. 2(e), FIG. 2(i), FIG. 2(m) illustrate different orientations of the head rectangular frame region of the tracked target;
FIG. 2(b), FIG. 2(f), FIG. 2(j), FIG. 2(n) illustrate skin tone regions where the head region is obtained by a skin tone detector;
fig. 2(c), fig. 2(g), fig. 2(k), fig. 2(o) illustrate binarization results of skin tone/non-skin tone regions of the head region;
FIG. 2(d), FIG. 2(h), FIG. 2(l), FIG. 2(p) illustrate a set of templates preset for different face orientations; fig. 2(d) shows the face facing forward, fig. 2(h) shows the face facing backward, fig. 2(l) shows the face facing left, and fig. 2(p) shows the face facing right.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The first embodiment provides a body left and right limb consistency tracking and distinguishing method based on face orientation;
as shown in fig. 1, the method for tracking and discriminating the consistency of left and right limbs of a body based on the orientation of a face includes:
S1: acquiring a video to be tracked, inputting it into a part detector, and outputting a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
S2: selecting the face region according to the confidence, and estimating the face orientation of the tracked target; determining the facing direction of the human body from the face orientation;
S3: screening, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb;
finally, performing inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
As one or more embodiments, in S1, the limb parts include: the head, torso, left thigh, left calf, right thigh, right calf, left upper arm, left lower arm, right upper arm, and right lower arm;
In one or more embodiments, in S1, the part detector is an appearance model trained by machine-learning methods for detecting all the limb parts.
The training process of the part detector comprises the following steps:
constructing a classifier and a training set, wherein the training set consists of images with known limb-part labels;
and inputting the training set into the classifier and training it to obtain the trained classifier.
The application process of the part detector is as follows: an image with unknown limb-part labels is input into the classifier, and the classifier outputs recognition results for the limb parts together with a confidence for each recognition result.
The detector output is modeled by the likelihood function p(l_i | d_i), i = 1, ..., N, where i denotes the limb-part index and N denotes the total number of limb parts (N = 10 in this example); l_i is the parametric representation of the i-th limb part, given as a triple (x, y, θ) of center-point position coordinates and rotation angle, and d_i denotes the image features of the region corresponding to l_i;
as one or more embodiments, in S1, the confidence level corresponding to each recognition result refers to all detection results given by each component detector after performing a sliding window scan on each frame.
As one or more embodiments, the specific step of estimating the face orientation of the tracked target in S2 includes:
S201: marking the head region of the tracked target in each frame of the video with a rectangular frame; the rectangular frame is the head-region bounding box;
S202: identifying a skin-color region within the head-region bounding box;
S203: determining the position of the skin-color region in the head bounding box according to templates for different face orientations;
and judging the face orientation according to the position of the skin-color region in the head bounding box: if the skin-color region is located in the lower-left region of the head bounding box, the face is considered to be facing the right side; if the skin-color region is located in the lower-right region, the face is considered to be facing the left side.
As one or more embodiments, in S201, the head region of the tracked target in each frame of the video is marked with a rectangular frame, i.e. the head-region bounding box. Specifically, the position with the highest confidence among the head detection results output in S1 is taken as the head region; the bounding box is drawn at a preset resolution scale, e.g. if the height of the person is preset to 220 pixels, the head region adopts a 33 × 28 resolution.
As one or more embodiments, in S202, a skin-color region is identified within the head-region bounding box; specifically, a skin-color detector based on the YCrCb color space is employed to generate a binary image of the skin region contained in the bounding box.
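For illustration, a minimal sketch of such a skin-color detector using OpenCV follows; the Cr/Cb thresholds below are common defaults, not values taken from this disclosure:

```python
import cv2
import numpy as np

def skin_mask(head_bgr):
    """Binarize a head-region crop (BGR image) into skin / non-skin pixels."""
    ycrcb = cv2.cvtColor(head_bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bounds
    upper = np.array([255, 173, 127], dtype=np.uint8)  # Y, Cr, Cb upper bounds
    return cv2.inRange(ycrcb, lower, upper)            # 255 = skin, 0 = non-skin

# Example: crop the highest-confidence head detection and resize it to the
# preset 33 x 28 head resolution (height x width) before thresholding.
# head = cv2.resize(frame[y0:y0 + h, x0:x0 + w], (28, 33))
# mask = skin_mask(head)
```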
As one or more embodiments, in S203, the templates for different face orientations specifically refer to a preset set of binary region templates capable of indicating the face orientation, as illustrated in FIG. 2:
FIG. 2(a), FIG. 2(e), FIG. 2(i), FIG. 2(m) illustrate different orientations of the head rectangular frame region of the tracked target;
FIG. 2(b), FIG. 2(f), FIG. 2(j), FIG. 2(n) illustrate skin tone regions where the head region is obtained by a skin tone detector;
fig. 2(c), fig. 2(g), fig. 2(k), fig. 2(o) illustrate binarization results of skin tone/non-skin tone regions of the head region;
FIG. 2(d), FIG. 2(h), FIG. 2(l), FIG. 2(p) illustrate a set of templates preset for different face orientations;
fig. 2(d) shows the face facing forward, fig. 2(h) shows the face facing backward, fig. 2(l) shows the face facing left, and fig. 2(p) shows the face facing right.
As one or more embodiments, the step of determining the facing direction of the human body from the face orientation in S2 may include: if the face faces the left side, the human body faces the left side; if the face faces the right side, the human body faces the right side.
As one or more embodiments, in S3, the recognition results of each limb part of the tracked target are screened in different priority orders according to the face orientation to obtain a candidate set for each limb; the specific steps are:
S301: if it is judged from the face orientation that the human body faces the right side, the left limb parts are given high priority, i.e. the detection results of the left limb parts are selected first to obtain the candidate sets of the left limb parts; the right limb parts are then selected on the basis of the left-limb candidates to obtain the candidate sets of the right limb parts;
S302: if it is judged from the face orientation that the human body faces the left side, the right limb parts are given high priority, i.e. the detection results of the right limb parts are selected first to obtain the candidate sets of the right limb parts; the left limb parts are then selected on the basis of the right-limb candidates to obtain the candidate sets of the left limb parts.
As one or more embodiments, in S301 and S302, a candidate set is selected from the detection results of a limb part as follows: all sliding-window detection results of each limb part from S1 are sampled according to a Gaussian prior and the estimated limb orientation. The Gaussian prior means that, based on the detection result of the previous frame, more pixel positions near the center of the tracked limb part are sampled as candidate-set elements, in descending order of confidence; for example, the top a detections by confidence for each limb part can be taken as the elements of that part's candidate set, where a can be adjusted for the application and is generally less than 30.
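A minimal sketch of this candidate selection, assuming detections given as (x, y, theta, confidence) tuples and a Gaussian prior centered on the part's position in the previous frame (the default sigma is illustrative):

```python
import math

def select_candidates(detections, prev_center, sigma=15.0, a=30):
    """detections: list of (x, y, theta, conf) for one limb part;
    prev_center: the part's (x, y) center in the previous frame."""
    px, py = prev_center
    weighted = []
    for x, y, theta, conf in detections:
        d2 = (x - px) ** 2 + (y - py) ** 2
        prior = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian spatial prior
        weighted.append((x, y, theta, conf * prior))
    weighted.sort(key=lambda c: c[-1], reverse=True)
    return weighted[:a]  # the top-a re-weighted detections form the candidate set
```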
As one or more embodiments, in S3, inference is performed on the candidate sets with the pictorial structure model PS (pictorial structure model) to obtain the final limb poses, specifically:

p(L | D) ∝ exp( Σ_{i=1..N} E_u(l_i, d_i) + Σ_{i~j} E_p(l_i, l_j) )

wherein E_u represents the appearance model of a detected limb part and E_p represents the relation model between limb parts; the articulated human body structure is expressed as L = [l_1, ..., l_N], the corresponding image information is described as D = [d_1, ..., d_N], and N is the number of defined limb parts. At time t, the body structure is denoted L_t and the image information D_t. Each limb part is given by

l_i = (x_i, y_i, θ_i)

In this example, i denotes the limb-part index, N denotes the total number of limb parts, and N = 10; i~j denotes two directly connected limb parts.
The pictorial structure model PS is used to model the human body pose during tracking, but the definition of the left and right limb parts in the model depends only on their relative positions in the image coordinate system, so the left and right limb parts are often confused as the position of the moving body changes over the video sequence. On the other hand, in the general pictorial structure model the body pose is always inferred using the candidates from all body-part detectors, so when occlusion exists, such as in a lateral pose, the "double counting" problem occurs, i.e. the same image region is used to determine the positions of two symmetric limb parts. To solve these two problems, a face-orientation-based method for consistency tracking, discrimination and correction of the left and right body parts is proposed, in which the head orientation provides guiding information during tracking. A face-orientation estimation step is introduced into the tracking framework as a tool to assist human pose estimation. Face orientation is an important form of head pose; in human pose tracking we need an indicator of the orientation of the human body (left or right), which is obtained here by designing a skin-color detector. In general, occlusion occurs more easily when the tracked target is in a lateral pose. During tracking, the face orientation is determined first, so the system can decide whether the non-occluded, fully visible side of the body is the left or the right side, and then infer the pose of the limbs on the other side from the image information. For example, for a certain lateral pose, if the head-pose detector indicates that the face is facing right, the system can conclude that the left side of the body is visible; accordingly, it gives higher priority to the left limb parts and assigns the higher-scoring limb candidates to the left side first. The posterior candidates of the right limbs are then searched and assigned with reference to the corresponding left limbs. A right body part is considered occluded if all posterior scores of that part are low. In this way the method effectively reduces confusion between the left and right limb parts and avoids the double-counting problem, thereby achieving consistent tracking of the left and right limb parts of the human body.
(1) The articulated body structure with limb parts can be expressed as L = [l_1, ..., l_N], and the corresponding image information is described as D = [d_1, ..., d_N], where N is the number of defined limb parts. At time t, the body structure is denoted L_t and the image information D_t. Given an image sequence, human pose tracking is the inference of the posterior probability p(L_t | D_t) over all frames, i.e. estimating the optimal trajectory of each limb part by finding the maximum a posteriori:

L_t* = argmax_{L_t} p(L_t | D_t)
as previously mentioned, this framework has two problems: double counting, and inconsistent left/right limb discrimination during tracking. To improve tracking, we propose that during pose estimation the head orientation is first estimated by the head-orientation detector, different priorities are then given to the limb parts on the two sides according to the estimate, and the human body pose is inferred afterwards.
In the body-structure model, the pose of each individual limb part is determined by a position (x, y) and an orientation θ, and N is the number of defined body parts. In our tracking platform N = 10, i.e. the body is divided into 10 limb parts: head, torso, left thigh, left calf, right thigh, right calf, left upper arm, left lower arm, right upper arm, and right lower arm. The pictorial structure model comprises two parts, an appearance model of the limb parts (unary terms) and an association model between limbs (binary terms); for each frame:

p(L | D) ∝ exp( Σ_{i=1..N} E_u(l_i, d_i) + Σ_{i~j} E_p(l_i, l_j) )

where i~j denotes two directly connected limb parts. The unary terms are a set of pre-trained appearance models of all limb parts, and the binary terms encode the association relations between the limb parts.
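Because the model is tree-structured, exact inference can be carried out by max-sum message passing from the leaves to the root followed by backtracking. A minimal sketch under these assumptions (unary and pairwise are hypothetical log-domain callables standing in for E_u and E_p):

```python
def infer_ps_tree(tree, candidates, unary, pairwise, root="torso"):
    """Exact max-sum inference on a tree-structured pictorial structure model.

    tree:       dict part -> list of child parts (kinematic tree, rooted)
    candidates: dict part -> list of candidate poses (x, y, theta)
    unary(part, cand):               log-domain appearance score (E_u)
    pairwise(parent, child, pc, cc): log-domain articulation score (E_p)
    Returns a dict part -> selected candidate pose.
    """
    choice = {}  # (parent, child, parent_candidate_index) -> best child index

    def pass_up(part):
        # best achievable subtree score for each candidate of `part`
        scores = [unary(part, c) for c in candidates[part]]
        for child in tree.get(part, []):
            child_scores = pass_up(child)
            for i, pc in enumerate(candidates[part]):
                best_k = max(
                    range(len(candidates[child])),
                    key=lambda k: child_scores[k]
                    + pairwise(part, child, pc, candidates[child][k]))
                choice[(part, child, i)] = best_k
                scores[i] += child_scores[best_k] + pairwise(
                    part, child, pc, candidates[child][best_k])
        return scores

    root_scores = pass_up(root)
    best = {root: max(range(len(root_scores)), key=root_scores.__getitem__)}

    def pass_down(part):
        # backtrack the stored argmax choices from the root to the leaves
        for child in tree.get(part, []):
            best[child] = choice[(part, child, best[part])]
            pass_down(child)

    pass_down(root)
    return {part: candidates[part][idx] for part, idx in best.items()}
```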
(2) Face-orientation estimation is used to determine the orientation (left/right/front/back) of the tracked target, i.e. before the body pose is inferred, a face-orientation estimator identifies whether the tracked target faces front, back, left or right. Given the head bounding box, we note that the absolute position of the face region, or the relative position of the face and hair regions, is a very important cue for head-orientation estimation. Since the human head region is generally small in image-sequence pose-tracking tasks, a skin-color detector is adopted to determine the face region for the sake of efficiency, and the detected face region is binarized. The head image is first transformed from the RGB color space to the YCbCr color space; the resulting image consists of an intensity component (Y) and chrominance components (Cb and Cr). The YCbCr color space separates the pixels of a color image effectively and is suitable for complex color images with uneven illumination.
The standard conversion from RGB is:

Y = 0.299·R + 0.587·G + 0.114·B
Cb = 128 − 0.169·R − 0.331·G + 0.500·B
Cr = 128 + 0.500·R − 0.419·G − 0.081·B
To determine the face orientation, a set of templates for the different face orientations is preset, as shown in FIG. 2(d), 2(h), 2(l) and 2(p). The boundary of the skin-color region is designed as an ellipse, and the region where it intersects the head bounding box is regarded as the skin-color region (marked in green). If the skin-color region is located only in the lower-right or lower-left region of the bounding box, the head is considered to be looking left or right, respectively, and the pose is assumed to be a left or right lateral pose. The algorithm is as follows:
[The pseudocode of the face-orientation estimation algorithm appears as an image in the original publication.]
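As an illustrative stand-in for that listing, a minimal sketch of the template-matching step is given below; it scores the binarized skin mask against the four preset orientation templates by pixelwise agreement:

```python
import numpy as np

ORIENTATIONS = ("front", "back", "left", "right")

def estimate_orientation(skin_mask, templates):
    """skin_mask: H x W binary array from the skin-color detector;
    templates: dict orientation -> H x W binary template
    (cf. FIG. 2(d), 2(h), 2(l), 2(p))."""
    best_name, best_score = None, -1.0
    for name in ORIENTATIONS:
        template = templates[name]
        agree = np.mean((skin_mask > 0) == (template > 0))  # pixelwise agreement
        if agree > best_score:
            best_name, best_score = name, agree
    return best_name
```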
in the method, for all limb parts, a pre-trained body-part detector is first used to obtain confidences for all pixel positions of every limb part in each frame; each limb part is then sampled according to the Gaussian prior and the estimated limb orientation, with more candidates sampled near the center of the tracked limb part according to the detection result of the previous frame. If, based on the head orientation, the body pose is determined not to be front or back, the system first determines which side of the limbs is fully visible and gives that side higher priority for preferential sampling. The next step is to infer the posterior of the limbs on the other side and assign them to locations different from those of the fully visible limbs. In particular, if all posterior values of a limb part on the other side are relatively small, that limb part is considered occluded. If the body is determined to face front or back, the system directly performs model inference with the complete 10-part pictorial structure model to obtain the human body pose.
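A minimal sketch of the side-priority rule and the occlusion test described above (the mapping follows S301/S302; the threshold tau is illustrative):

```python
def side_priority(face_orientation):
    """Map the estimated face orientation to
    (fully visible side, possibly occluded side)."""
    if face_orientation == "right":   # body faces right -> left side visible
        return "left", "right"
    if face_orientation == "left":    # body faces left -> right side visible
        return "right", "left"
    return None, None                 # front/back: use the full 10-part model

def occluded(posterior_scores, tau=0.2):
    """A limb part is treated as occluded when all its posterior scores are low."""
    return max(posterior_scores) < tau
```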
The second embodiment provides a body left and right limb consistency tracking and distinguishing system based on face orientation;
facial orientation-based body left and right limb consistency tracking and discriminating system comprises:
a limb part detection module configured to: acquire a video to be tracked, input it into a part detector, and output a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
a face orientation estimation module configured to: select the face region according to the confidence, and estimate the face orientation of the tracked target; determine the facing direction of the human body from the face orientation;
a human pose discrimination module configured to: screen, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb; and finally perform inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
In a third embodiment, an electronic device is provided, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first embodiment.
In a fourth embodiment, a computer-readable storage medium is provided for storing computer instructions which, when executed by a processor, perform the steps of the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A face-orientation-based method for consistency tracking and discrimination of the left and right limbs of a body, characterized by comprising the following steps:
acquiring a video to be tracked, inputting it into a part detector, and outputting a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
selecting the face region according to the confidence, and estimating the face orientation of the tracked target; determining the facing direction of the human body from the face orientation;
screening, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb;
finally, performing inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
2. The method of claim 1, wherein the confidence corresponding to each recognition result refers to the score that each part detector assigns to every detection result after performing a sliding-window scan of each frame.
3. The method of claim 1, wherein the step of estimating the orientation of the face of the tracked object comprises:
S201: marking the head region of the tracked target in each frame of the video with a rectangular frame; the rectangular frame is the head-region bounding box;
S202: identifying a skin-color region within the head-region bounding box;
S203: determining the position of the skin-color region in the head bounding box according to templates for different face orientations;
and judging the face orientation according to the position of the skin-color region in the head bounding box: if the skin-color region is located in the lower-left region of the head bounding box, the face is considered to be facing the right side; if the skin-color region is located in the lower-right region, the face is considered to be facing the left side.
4. The method according to claim 3, wherein in S202 a skin-color region is identified within the head-region bounding box; specifically, a skin-color detector based on the YCrCb color space is employed to generate a binary image of the skin region contained in the bounding box.
5. The method as claimed in claim 1, wherein the step of determining the facing direction of the human body from the face orientation comprises: if the face faces the left side, the human body faces the left side; if the face faces the right side, the human body faces the right side.
6. The method as claimed in claim 1, wherein the recognition results of each limb part of the tracked target are screened in different priority orders according to the face orientation to obtain a candidate set for each limb; the specific steps are:
S301: if it is judged from the face orientation that the human body faces the right side, the left limb parts are given high priority, i.e. the detection results of the left limb parts are selected first to obtain the candidate sets of the left limb parts; the right limb parts are then selected on the basis of the left-limb candidates to obtain the candidate sets of the right limb parts;
S302: if it is judged from the face orientation that the human body faces the left side, the right limb parts are given high priority, i.e. the detection results of the right limb parts are selected first to obtain the candidate sets of the right limb parts; the left limb parts are then selected on the basis of the right-limb candidates to obtain the candidate sets of the left limb parts.
7. The method of claim 1, wherein inference is performed on the candidate sets using the pictorial structure model PS to obtain the final limb poses, specifically:

p(L | D) ∝ exp( Σ_{i=1..N} E_u(l_i, d_i) + Σ_{i~j} E_p(l_i, l_j) )

wherein E_u represents the appearance model of a detected limb part and E_p represents the relation model between limb parts; the articulated human body structure is expressed as L = [l_1, ..., l_N], the corresponding image information is described as D = [d_1, ..., d_N], and N is the number of defined limb parts; at time t, the body structure is denoted L_t and the image information D_t; each limb part is given by

l_i = (x_i, y_i, θ_i)

where i denotes the limb-part index, N denotes the total number of limb parts, and i~j denotes two directly connected limb parts.
8. A face-orientation-based body left and right limb consistency tracking and discrimination system, comprising:
a limb part detection module configured to: acquire a video to be tracked, input it into a part detector, and output a plurality of recognition results for each limb part of the tracked target and a confidence corresponding to each recognition result;
a face orientation estimation module configured to: select the face region according to the confidence, and estimate the face orientation of the tracked target; determine the facing direction of the human body from the face orientation;
a human pose discrimination module configured to: screen, according to the face orientation, the recognition results of each limb part of the tracked target in different priority orders to obtain a candidate set for each limb; and finally perform inference on the candidate sets with the pictorial structure model PS to obtain the final limb poses.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN201911082183.5A 2019-11-07 2019-11-07 Body left and right limb consistency tracking and distinguishing method and system based on face orientation Pending CN110826495A (en)

Priority Applications (1)

Application Number: CN201911082183.5A (published as CN110826495A) · Priority date: 2019-11-07 · Filing date: 2019-11-07 · Title: Body left and right limb consistency tracking and distinguishing method and system based on face orientation

Publications (1)

Publication Number: CN110826495A · Publication Date: 2020-02-21

Family

ID=69553177

Family Applications (1)

Application Number: CN201911082183.5A (CN110826495A, Pending) · Title: Body left and right limb consistency tracking and distinguishing method and system based on face orientation

Country Status (1)

Country Link
CN (1) CN110826495A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102971768A (en) * 2011-01-24 2013-03-13 松下电器产业株式会社 State-of-posture estimation device and state-of-posture estimation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jonathan Tompson et al., "Joint training of a convolutional network and a graphical model for human pose estimation", 28th Conference on Neural Information Processing Systems *
Meng Fanhui, "Research on human pose estimation in static images", China Master's Theses Full-text Database, Information Science and Technology *
Guo Song et al., "Face Detection Techniques and Methods", 30 July 2017 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353490A (en) * 2020-02-28 2020-06-30 创新奇智(重庆)科技有限公司 Quality analysis method and device for engine number plate, electronic device and storage medium
CN111353490B (en) * 2020-02-28 2023-10-31 创新奇智(重庆)科技有限公司 Engine number plate quality analysis method and device, electronic equipment and storage medium
CN113205557A (en) * 2021-05-20 2021-08-03 上海曼恒数字技术股份有限公司 Whole body posture reduction method and system
CN113205557B (en) * 2021-05-20 2022-07-15 上海曼恒数字技术股份有限公司 Whole body posture reduction method and system

Similar Documents

Publication Publication Date Title
US8620024B2 (en) System and method for dynamic gesture recognition using geometric classification
US7593552B2 (en) Gesture recognition apparatus, gesture recognition method, and gesture recognition program
US10216979B2 (en) Image processing apparatus, image processing method, and storage medium to detect parts of an object
KR100647322B1 (en) Apparatus and method of generating shape model of object and apparatus and method of automatically searching feature points of object employing the same
US8837773B2 (en) Apparatus which detects moving object from image and method thereof
JP7230939B2 (en) Information processing device, information processing method and information processing program
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
JP2015522200A (en) Human face feature point positioning method, apparatus, and storage medium
CN111160291B (en) Human eye detection method based on depth information and CNN
US11074713B2 (en) Recognition device, recognition system, recognition method, and non-transitory computer readable recording medium
US10496874B2 (en) Facial detection device, facial detection system provided with same, and facial detection method
US20190066311A1 (en) Object tracking
JP7422456B2 (en) Image processing device, image processing method and program
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN110826495A (en) Body left and right limb consistency tracking and distinguishing method and system based on face orientation
Loutas et al. Probabilistic multiple face detection and tracking using entropy measures
US20240104769A1 (en) Information processing apparatus, control method, and non-transitory storage medium
KR20190009006A (en) Real time multi-object tracking device and method by using global motion
CN108694348B (en) Tracking registration method and device based on natural features
JP2011232845A (en) Feature point extracting device and method
JP5688514B2 (en) Gaze measurement system, method and program
JP2022019339A (en) Information processing apparatus, information processing method, and program
Li et al. Real-time human tracking based on switching linear dynamic system combined with adaptive Meanshift tracker
Jacques et al. Self-occlusion and 3D pose estimation in still images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-02-21