WO2017084204A1

WO2017084204A1 - Method and system for tracking human body skeleton point in two-dimensional video stream

Info

Publication number: WO2017084204A1
Application number: PCT/CN2016/070898
Authority: WO
Inventors: 陈勇杰
Original assignee: 广州新节奏智能科技有限公司
Priority date: 2015-11-19
Filing date: 2016-01-14
Publication date: 2017-05-26
Also published as: CN105469113A; CN105469113B

Abstract

A method and a system for tracking a human body skeleton point in a two-dimensional video stream, the method comprising: a camera acquiring a two-dimensional video stream, acquiring, by means of a foreground extraction module, a foreground image, and acquiring, by means of a face detection module, the coordinates of a head point and a neck point, the system determining whether the head point is in a screen, if the head point is not in the screen, continuing the face detection, and if the head point is in the screen, dividing the human body into a left part ROI and a right part ROI so as to respectively detect other key points, acquiring, by means of a shoulder point detection module, the coordinates of a left shoulder point and a right shoulder point, acquiring, by means of a hand point detection module, the coordinates of a left hand point and a right hand point, and acquiring, by means of an elbow point detection module, the coordinates of a left elbow point and a right elbow point; and finally summarizing the credibility of each point and displaying credible points.

Description

Human bone point tracking method and system in two-dimensional video stream

Technical field

The invention relates to the research field of image processing, and in particular to a human bone point tracking method and system in a two-dimensional video stream.

Background technique

Human-computer interaction technology refers to the technology of realizing the effective communication between humans and computers through the input and output devices of computers to facilitate the way people use them. Human skeleton point tracking technology is an important technology in the field of human-computer interaction. It can identify the movement of the human body by means of infrared rays, and can track multiple parts of the human body in real time without any external equipment for action capture. It has a wide application prospect in the machine interaction environment. The prior art human bone tracking technology is generally a framework of Kinect and a PC host system. Kinect is mainly responsible for acquiring images, deep data streams and bone information, and the host is responsible for acquiring image and depth data through the database for bone trajectory tracking and three-dimensional data. The world coordinate system is transformed into the image pixel coordinate system of the two-dimensional data, and then each bone data is filtered by the noise reduction to obtain the bone tracking information of the human body, and the most important one in the technology is to identify the user's bone information, and the existing In the technology, the infrared sensor is used to perceive the environment at a speed of 30 frames per second by using a black-and-white spectrum to generate a depth of field image stream, and then the infrared sensor will detect the 3D depth image to find a moving object that may be a human body in the image. By identifying the different parts of the body by image-by-image distribution, the segmentation strategy is used to distinguish the human body from the background environment, and the useful signals are extracted from the noise. Finally, the random decision tree and the forest identify the pixel information through the body component recognition. All pixel information is assembled to form a 3D skeleton joint Reliable prediction opposed, given the possibility of a particular pixel belongs to which body parts. However, this method is sensitive to the surrounding lighting environment, and poor lighting conditions may affect the accuracy of tracking; Obstructions such as ornaments on the human body will reduce some local features of the human body, affecting the tracking of the bones, or even tracking, resulting in low recognition accuracy and reducing the efficiency and naturalness of human-computer interaction.

Summary of the invention

The main object of the present invention is to overcome the shortcomings and shortcomings of the prior art, and to provide a human bone point tracking method and system in a two-dimensional video stream, which is to establish the coordinates of each joint of the human body by processing the depth data, and determine the human body by using bone tracking. Various parts.

In order to achieve the above object, the present invention adopts the following technical solutions:

The invention provides a human skeleton point tracking method in a two-dimensional video stream, the method comprising the following steps:

The camera acquires a two-dimensional video stream, reconstructs the background, and extracts the foreground mask by using a subtractive background method, and outputs a foreground image after denoising processing;

Detecting a human face on the foreground image of the output, and obtaining coordinates of the rectangular area of the face, the head point, and the neck point;

Determining whether the head point is on the screen, if not, proceeding to the face detection module; if yes, dividing the human body into a left part ROI and a right part ROI to perform detection of other key points;

Use a specific position to scan and return a pixel value point to achieve shoulder point detection, and obtain left shoulder and right shoulder coordinates;

Utilizing the near-end point of the minimum circumscribed rectangle of the skin color region and returning to realize the hand point detection, and acquiring the coordinates of the left-hand point and the right-hand point;

By dividing the hand ROI into three regions, each region uses different scanning mode return points to realize elbow point detection, and obtain left elbow point and right elbow point coordinates;

Finally, the credibility of each point is counted and the credible points are displayed.

Preferably, in the step of acquiring the two-dimensional video stream by the camera, reconstructing the background and extracting the foreground mask by using the subtractive background method, and outputting the foreground image after the denoising processing, the specific method for outputting the foreground image is:

Obtain the face center position HEAD(x, y) by the face detection algorithm

Set two parameters: left composition threshold left_, right composition threshold right_, left composition indicator left_get=0, right composition indicator right_get=0;

Prompt the user to move to the left. When the horizontal position of the face is x<left_, left_get=1, the image of the right half of the current screen is saved and recorded as image_right;

Continue to prompt the user to move to the right, when the face center position abscissa x>right_, right_get=1, save the image of the left half of the current screen, recorded as image_left;

When left_get=1 and right_get=1, put image_left and image_right together to get the background image BACKGROUND.

BACKGROUND=image_left+LD(image_right,image_left.clos)

Where LD(a,b) represents shifting the image a as a whole to the right by b pixels;

Then, after inputting an image IMAGE, the IMAGE and the BACKGROUND are subtracted and denoised to obtain the foreground mask foreground_mask, and the foreground_mask is binarized to obtain the MASK;

The IMAGE and MASK are processed and processed to output the foreground image FOREGROUND.

Preferably, in the step of detecting the face of the output foreground image and acquiring the coordinates of the face rectangular area, the head point and the neck point, the Haar classifier is used for face detection, and the specific method is:

Convert the color map to a grayscale image;

Histogram equalization of grayscale images to enhance contrast;

The Haar classifier is used to detect the positive face, and when the positive face is detected, the coordinates of the center point of the face and the length and width of the face rectangle are returned;

If the positive face is not detected, use the Haar classifier to detect the side face and return the coordinates of the face center point and The face is rectangular in length and width.

Preferably, in the step of performing shoulder point detection by using a specific position scanning and returning a pixel value point, and acquiring the coordinates of the left shoulder point and the right shoulder point, the specific method for implementing the shoulder point detection is:

Image preprocessing to obtain the contour of the human body;

Take the left shoulder point ROI, the size is recorded as (ROI_HEIGHT, ROI_WIDTH);

Set SCAN_X, SCAN_X is n1 times the width of the input image, where 0<n1<1, ie SCAN_X=n 1*ROI_WIDTH;

Sweep the left shoulder ROI from the top with the width SCAN_X, and return the coordinates if the value is greater than the set value M;

If the sweep cannot have a value greater than M, sweep the left shoulder ROI from right to left with the length SCAN_Y, where SCAN_Y is n2 times the length of the input image, where 0<n1<1, ie SCAN_Y=n2*ROI_HEIGHT if the value is greater than M, return the coordinates of the point;

The coordinates of the left shoulder point are obtained by the same identification method as described above.

Preferably, in the step of using the minimum circumscribed rectangle near the end point of the skin color region and returning to realize the hand point detection, and acquiring the coordinates of the left hand point and the right hand point, the specific method for implementing the hand point detection is:

Convert RGB into YCrCb coordinate system and store it in YUU;

Separate the YUU three channels and extract the special information in each channel of the YUU to form a new map, which is stored in the BW;

Open the BW, except to remove noise, smooth the image and extract the outer contour;

Traversing the outer contour and extracting the contour L corresponding to the largest area, creating a minimum circumscribed rectangle K of L;

K returns directly to the center point when the following conditions are met: the rectangle width is less than X times the rectangle height and the rectangle height is less than X times the rectangle width, where 1<X<2;

If not satisfied:

Create a new point container ptt for the vertices of the minimum bounding rectangle K;

Detect the left hand, find the leftmost point, define it as ptt[0], determine the next left point, define it as ptt[1], define p1 as the midpoint of K, and define p2 as ptt[0] and ptt[1 Midpoint of

The geometric position of p1 and p2 is used to determine the general position of the hand, and the coordinates are assigned to p2. When p2 is in the edge part, the value is (0, 0), and the value of (0, 0) is not displayed;

Return p2;

The coordinates of the right hand are identified using the same method as described above.

Preferably, in the step of dividing the hand ROI into three regions, each region uses different scanning mode return points to realize elbow point detection, and acquiring left elbow point and right elbow point coordinates, and implementing elbow point The specific method of detection is:

Image preprocessing to obtain the contour of the human body;

Take the ROI of the left elbow and divide the ROI into three areas, which respectively correspond to the three postures of raising the hand, 45 degrees, and the hips down;

When the difference between the abscissa of the shoulder point and the abscissa of the hand point is greater than IMAGE_HEIGHT/50:

Raise your hand: When the difference between the ordinate of the hand point and the ordinate of the shoulder point is less than the threshold IMAGE_HEIGHT/5, sweep the point from the bottom up and swipe to return;

45 degrees downward: When the difference between the ordinate of the hand point and the ordinate of the shoulder point is greater than the threshold IMAGE_HEIGHT/5, the point is swept from right to left, and the pixel value of the swept point is greater than the return value;

Forked waist movement: When the difference between the shoulder coordinate of the shoulder point and the horizontal coordinate of the hand point is less than IMAGE_HEIGHT/50, the point is swept from left to right, and the coordinates of the point with the first pixel value greater than 50 are returned.

Preferably, the method further comprises the steps of:

The foot point detection is implemented by using the near-end point of the minimum circumscribed rectangle of the foreground area of the lower body and returning. The specific method of the foot point detection is:

In the whole body mode, the human body lower body ROI of the foreground image is taken out by half of the screen;

Extracting the outer contour, traversing the outer contour and extracting the contour L corresponding to the largest area, creating a minimum circumscribed rectangle K of L;

K returns directly to the center point when the following conditions are met: the rectangle width is less than Y times the rectangle height and the rectangle height is less than Y times the rectangle width, where 1<Y<2;

If not satisfied:

Create a new point container ptfoot to hold the vertices of the minimum bounding rectangle K;

Detect the left foot, find the leftmost point, define it as ptfoot[0], determine the next left point, define it as ptfoot[1], define p1 as the midpoint of K, and define p2 as ptfoot[0] and ptfoot[ The midpoint of 1];

Determine the general position of the foot by the geometric relationship of p1 and p2, and assign the coordinates to p2;

When p2 is in the edge part, the value is (0, 0), and the value of (0, 0) is not displayed;

Return p2;

The coordinates of the right foot point are obtained by the same recognition method described above.

Preferably, the method further comprises the steps of:

The knee point detection is realized by scanning and returning the distance from the foot point to the set height. The specific method of the knee point detection is:

The background reconstruction module acquires the foreground of the human body, and in the whole body mode, removes the body ROI of the lower body;

Get human height BODY_HEIGHT, BODY_HEIGHT=FOOT_LEFT_Y–FACE_Y+FACE_HEIGHT/2;

Take the left foot ROI, the size is recorded as (ROI_HEIGHT, ROI_WIDTH);

Set SCAN_Y, SCAN_Y is 0.2 times the height of the user, ie SCAN_Y=0.2*BODY_HEIGHT;

Sweep the left foot ROI from left to right with the height of FOOT_LEFT_Y above SCAN_Y, if there is a value If it is greater than 50, the coordinate of the point (x+12, y) is returned, where x+12 represents a 12-pixel offset processing on the abscissa, so that the knee point is at the center of the knee;

If the value cannot be greater than 50, it returns (0,0) and is set as an untrusted point;

The coordinates of the right knee point are obtained by the same recognition method described above.

The invention also provides a human skeleton point tracking system in a two-dimensional video stream, the system comprising:

The foreground extraction module is configured to acquire a two-dimensional video stream, reconstruct a background, and extract a foreground mask by using a subtractive background method, and output a foreground image after denoising processing;

a face detection module, configured to detect a face of the output foreground image, and obtain a rectangle of the face, a head point, and a neck point coordinate;

a judging module, configured to determine whether the head point is in the screen; if not, proceeding to the face detecting module; if yes, dividing the human body into a left part ROI and a right part ROI to perform other key points respectively;

a shoulder point detection module for performing shoulder point detection by scanning and returning a pixel value point using a specific position, and acquiring left shoulder point and right shoulder point coordinates;

The hand detection module is configured to perform the hand point detection by using the minimum circumscribed rectangle near the end point of the skin color region, and obtain the coordinates of the left hand point and the right hand point;

The elbow detection module is configured to divide the hand ROI into three regions, and each region uses different scanning mode return points to realize elbow point detection, and obtain left elbow point and right elbow point coordinates;

The statistics module finally counts the credibility of each point and displays the trusted points.

Preferably, the system further comprises a foot point detection module and a knee point detection module;

The foot point detecting module is configured to perform a foot point detection by using a minimum circumscribed rectangle near end point of the lower body foreground area and returning;

The knee point detecting module is configured to scan and return by using a foot point to take a set height distance The method of returning implements knee point detection.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. The invention does not need to use depth information, and can directly realize the human body skeleton point recognition by using an ordinary camera, and has universal applicability.

2. The algorithm of the invention is simple, occupies less computing resources, has low hardware requirements, and has strong real-time performance;

3. The invention is not limited by the development platform, and can be applied to mobile terminals (such as mobile phones, tablets, etc.) to meet cross-platform requirements and has strong portability.

4. The invention can cope with the complicated background and uneven illumination in the general scene, and has strong robustness.

DRAWINGS

Figure 1 is a skeleton diagram of a human body as defined in the present invention;

2 is a flow chart of a method for tracking a human skeleton point in a two-dimensional video stream of the present invention;

Figure 3 is an original image input by the present invention;

Figure 4 is a background view of the present invention;

Figure 5 is a mask binary diagram of the present invention;

Figure 6 is a foreground view of the present invention;

7 is a schematic diagram of a face detection area of the present invention;

Figure 8 is a schematic view of a head point and a neck point obtained by face detection according to the present invention;

Figure 9 is a schematic view of a region of a shoulder point of the present invention;

Figure 10 is a schematic view showing the area of the hand point of the present invention;

Figure 11 is a schematic view showing the division of the area of the present invention;

Figure 12 is a schematic view showing the area of the elbow point of the present invention;

Figure 13 is a diagram showing the recognition effect of the overall key points of the present invention.

detailed description

The present invention will be further described in detail below with reference to the embodiments and drawings, but the embodiments of the present invention are not limited thereto.

Example

Currently, depth-based bone tracking technology creates depth coordinates for each joint of the human body by processing depth data, and bone tracking can determine various parts of the body, such as the hand, head, and body, and determine where they are. However, the ordinary camera can only obtain two-dimensional information in space. The goal of this algorithm is to realize the tracking of human skeleton points in the two-dimensional video stream.

First, as shown in Figure 1, the relevant detection points and related diagrams of the human body are defined, as shown in Table 1 and Table 2 below;

Table 1

11	头部点HEADHead point HEAD	22	颈部点SHOULDER_CENTERNeck point SHOULDER_CENTER
33	左肩点SHOULDER_LEFTLeft shoulder point SHOULDER_LEFT	44	右肩点SHOULDER_RIGHRight shoulder point SHOULDER_RIGH
55	左手点HAND_LEFTLeft hand point HAND_LEFT	66	右手点HAND_RIGHTRight hand point HAND_RIGHT
77	左肘点ELBOW_LEFTLeft elbow point ELBOW_LEFT	88	右肘点ELBOW_RIGHTRight elbow point ELBOW_RIGHT
99	臀部点HIP_CENTER Hip point HIP_CENTER	1010	左脚点FOOT_LEFTLeft foot point FOOT_LEFT
1111	右脚点FOOT_RIGHTRight foot point FOOT_RIGHT	1212	左膝点KNEE_LEFTLeft knee point KNEE_LEFT
1313	右膝点KNEE_RIGHTRight knee point KNEE_RIGHT

Table 2

原始图original image	IMAGEIMAGE	背景图Background image	BACKGROUNDBACKGROUND
原图宽度Original width	IMAGE_WIDTHIMAGE_WIDTH	前景掩膜Foreground mask	MASKMASK
原图长度Original length	IMAGE_HEIGHTIMAGE_HEIGHT	前景图Foreground map	FOREGROUNDFOREGROUND

As shown in FIG. 2, the present invention provides a method for tracking a human skeleton point in a two-dimensional video stream, the method comprising the following steps:

Step S1: The camera acquires a two-dimensional video stream, reconstructs a background, and extracts a foreground mask by using a subtractive background method, and outputs a foreground image after denoising processing;

As shown in Figure 3-6, the specific method of outputting the foreground map is:

S11. Obtain a face center position HEAD(x, y) by using a face detection algorithm.

S12, setting two parameters: left composition threshold left_, right composition threshold right_, left composition indicator left_get=0, right composition indicator right_get=0;

S13, prompting the user to move to the left. When the horizontal position of the face is x<left_, left_get=1, the image of the right half of the current screen is saved, and is recorded as image_right;

S14, continue to prompt the user to move to the right, when the face center position abscissa x>right_, right_get=1, save the image of the left half of the current screen, recorded as image_left;

S15. When left_get=1 and right_get=1, the image_left and the image_right are spelled together to obtain the background image BACKGROUND,

BACKGROUND=image_left+LD(image_right,image_left.clos)

S16, after each input of an image IMAGE, IMAGE and BACKGROUND are subtracted and denoised to obtain the foreground mask foreground_mask, and the foreground_mask is binarized to obtain MASK

Foreground_mask=abs(IMAGE–BACKGROUND);

Where abs(a) denotes an absolute value for a;

MASK=threshold(foreground_mask, 55);

Where threshold(a, T) indicates that the image a is binarized with the threshold b, the point where the pixel value is higher than T is set to 255, and the point where the pixel value is lower than T is set to 0.

S17. The IMAGE and the MASK are processed and processed to output a foreground image FOREGROUND.

Step S2: detecting a human face on the foreground image of the output, and acquiring coordinates of the rectangular area of the face, the head point, and the neck point;

In this embodiment, as shown in FIG. 7 and FIG. 8, the Haar classifier is used for face detection, and the specific method is:

S21, converting the color map to a grayscale image;

S22, performing histogram equalization on the grayscale image to enhance contrast;

S23. Using a Haar classifier to detect a positive face, and if a positive face is detected, returning a face center point coordinate and a face rectangle length and width (HEAD_HEIGHT, HEAD_WIDTH);

S24. If the positive face is not detected, the Haar classifier is used to detect the side face, and the coordinates of the center point of the face and the length and width of the face rectangle are returned;

S25, wherein the coordinates of the center point of the face are used as the head point, and the length of the face rectangle of 0.75 times is taken downward from the head point, that is,

(HEAD.X, HEAD.Y+0.75*HEAD_HEIGHT) is determined as the neck point

S26. Take the length of the face rectangle 3 times from the head point, and determine (HEAD.X, HEAD.Y+3*HEAD_HEIGHT) as the neck point.

The Harr eigenvalue is calculated as (window size N*N):

For a given N*N window I, the integral graph is calculated as follows:

The summation of pixels in a square in a window image is calculated as follows:

Step S3, determining whether the head point is in the screen, if not, proceeding to the face detection module; if yes, dividing the human body into a left part ROI and a right part ROI to perform detection of other key points;

Step S4, using a specific position scanning and returning a pixel value point method to achieve shoulder point detection, and acquiring left shoulder point and right shoulder point coordinates;

As shown in Figure 9, the specific method for implementing shoulder point detection is:

S41. Image preprocessing obtains a human body contour;

S42, taking the left shoulder point ROI, the size of which is recorded as (ROI_HEIGHT, ROI_WIDTH);

S43, setting SCAN_X, SCAN_X is 0.35 times of the input image width, that is, SCAN_X=0.35*ROI_WIDTH;

S44, sweeping the left shoulder ROI from the top with a width of SCAN_X, if the value is greater than 50, returning the coordinates of the point;

S45. If the value is greater than 50, the left shoulder ROI is swept from right to left by SCAN_Y, wherein SCAN_Y is 0.7 times the length of the input image, that is, SCAN_Y=0.7*ROI_HEIGHT, if the value is greater than 50, then return Point coordinates

Step S5, using the minimum circumscribed rectangle near the end point of the skin color region and returning to realize the hand point detection, and acquiring the coordinates of the left hand point and the right hand point;

As shown in Figure 10, the specific method for implementing hand point detection is:

S51, converting RGB into YCrCb coordinate system and storing in YUU;

S52. Separating the three channels of the YUU and separately extracting special information combinations (Y<=173, Cr<=127, Cb>=77) in each channel of the YUU into a new picture, and storing in the BW;

S53, performing an open operation on the BW (5*5 processing window): removing noise;

S54, expansion 2 times (3*3 processing window): smooth the image;

S55, extracting an outer contour;

S56, traversing the outer contour and extracting a contour L corresponding to the largest area;

S57, the minimum external rectangle K of the newly created L;

S58, K directly return to the center point when the following conditions are met: the rectangle width is less than 1.5 times the rectangle height and the rectangle height is less than 1.5 times the rectangle width;

S59, if not satisfied:

Detect the left hand and find the leftmost point, defined as ptt[0];

Determine the next left point, defined as ptt[1];

Define p1 as the midpoint of K and define p2 as the midpoint of ptt[0] and ptt[1];

Determine the general position of the hand by the geometric relationship of p1 and p2, and assign coordinates to p2;

Return p2;

The treatment of the right hand is the same as the left hand;

The YCbCr format can be obtained by linearly changing from the RGB format. The conversion formula is as follows:

Through statistical analysis of a large number of skin pixels, it can be seen that the skin color cluster is in a small range in the chromaticity space, and the following calculation formula determines whether it belongs to the skin area:

(Cb>77And Cb<127) And(Cr>133And Cr<173).

Step S6, the hand ROI is divided into three regions, each region uses different scanning mode return points to realize elbow point detection, and obtain left elbow point and right elbow point coordinates;

As shown in Figures 11-12, the method for achieving elbow point detection is:

S61. The hand ROI is divided into three regions, and the three regions are as shown in FIG. 11, and each region uses different scanning manner return points to realize elbow point recognition;

S62, image preprocessing to obtain the contour of the human body

S63, take the left elbow ROI

S64, the ROI is divided into three regions, respectively corresponding to raising the hand, 45 degrees, and the hips are downward.

S65. When the difference between the abscissa of the shoulder point and the abscissa of the hand point is greater than IMAGE_HEIGHT/50:

Raise hand movement (Zone 1): When the difference between the ordinate of the hand point and the ordinate of the shoulder point is less than the threshold IMAGE_HEIGHT/5

That is, HAND.y-SHOULDER.y<IMAGE_HEIGHT/5, then sweep from the bottom up, sweep back and return

45 degrees downward (zone 2): when the difference between the ordinate of the hand point and the ordinate of the shoulder point is greater than the threshold IMAGE_HEIGHT/5

That is, HAND.y-SHOULDER.y>IMAGE_HEIGHT/5, sweep the point from right to left (swipe horizontally with the ROI down to 8 pixels), and sweep back to the point where the pixel value is greater than return

Forklift action (Zone 3): When the difference between the abscissa of the shoulder point and the abscissa of the hand point is less than IMAGE_HEIGHT/50:

That is, when SHOULDER.x-HAND.x<IMAGE_HEIGHT/50, the point is swept from left to right, and the coordinates of the point where the first pixel value is greater than 50 are returned;

The identification of the right shoulder point is the same as the left shoulder.

In step S7, the credibility of each point is finally counted and the trusted point is displayed.

As an optimization scheme of the foregoing embodiment, the human skeleton point chase in the two-dimensional video stream in this embodiment Tracking method, the method further comprises the following steps:

S8, using the near-end point of the minimum circumscribed rectangle of the foreground area of the lower body and returning to realize the foot point detection, and the specific method of the foot point detection is:

S81 in the whole body mode, take the foreground half of the human body lower body ROI with half of the screen;

S82, extracting the outer contour, traversing the outer contour and extracting the contour L corresponding to the largest area, and newly creating a minimum circumscribed rectangle K of L;

S83, K directly return to the center point when the following conditions are satisfied: the rectangle width is less than 1.5 times the rectangle height and the rectangle height is less than 1.5 times the rectangle width

S84, if not satisfied:

Return p2;

S9: Performing a knee point detection by scanning and returning a distance of 0.2 times the height of the human body from the foot point, and the specific method of the knee point detection is:

S91. The background reconstruction module acquires the foreground of the human body, and in the whole body mode, removes the body ROI of the lower body;

S92. Obtain the height of the human body BODY_HEIGHT, BODY_HEIGHT=FOOT_LEFT_Y–FACE_Y+FACE_HEIGHT/2;

S93, taking the left foot ROI, the size of which is recorded as (ROI_HEIGHT, ROI_WIDTH);

S94, set SCAN_Y, SCAN_Y is 0.2 times the height of the user, ie SCAN_Y= 0.2*BODY_HEIGHT;

S95. Scan the left foot ROI from left to right with the height of FOOT_LEFT_Y and above SCAN_Y. If the value is greater than 50, return the coordinate of the point (x+12, y), where x+12 indicates that the horizontal coordinate is 12 pixels. Offset processing so that the knee point is at the center of the knee;

S96. If the value cannot be swept, the value returns to (0, 0), and is set as an untrusted point;

After the above steps S1-S9, the identification of all the overall key points is completed, as shown in FIG.

In this embodiment, in the step of S1, due to the problem of uneven illumination and the shadow of the mask in the real scene, the mask obtained in the foreground extraction module needs to be optimized, so that it can adapt to uneven illumination. Case. The GI filter function is mainly used for mask optimization, and the specific method is as follows:

Gaussian filtering is performed on the input mask to eliminate Gaussian noise. The Gaussian filtering preset parameters are: processing window size is 15x15, sigma is 20;

Applying GI filtering to the mask image after denoising, the 0-1 transition image is obtained. The preset parameters of the GI filter are: the processing window size is 8x8, and the penalty parameter is 51;

The GI filtering algorithm inputs the color map I and the initial mask P, and the output is an optimized mask for complementing the edge information of the color map. The process is as follows:

Algorithm 1.Guided Filter.

Input:filtering input image p,guidance image I,radius r,regularization∈

Output:filtering output q.

1:mean _I =f _mean (I)

Mean _p =f _mean (p)

Corr _I =f _mean (I.*I)

Corr _Ip =f _mean (I.*p)

2:var _I =corr _I -mean _I .*mean _I

Cov _Ip =corr _Ip -mean _I .*mean _p

3: a=cov _Ip ./(var _I +∈)

b=mean _p -a.*mean _I

4: mean _a = f _mean (a)

Mean _b =f _mean (b)

5:q=mean _a .*I+mean _b

/ ^* f _mean is a mean filter with a wide variety of O(N)time methods. ^* /

Among them, mean means to obtain the mean value of the image, corr means to find the mean value of the second moment; the second step to find the local variance of the image; the third step to calculate the linear coefficients a and b; the fourth step to calculate the mean value of the coefficient; the fifth step to achieve information completion.

Use the 3x3 processing window to open the operation to further eliminate the void points and discrete points;

Find the maximum connected domain of the mask, and then perform Gaussian filtering to obtain an optimized mask. The Gaussian filter preset parameters are: processing window size is 15x15, sigma is 20.

The invention also discloses a human skeleton point tracking system in a two-dimensional video stream, the system comprising:

a judging module, configured to determine whether the head point is in the screen, and if not, proceeding to the face detecting module; if yes, dividing the human body into a left part ROI and a right part ROI for performing other key points respectively Detection

In addition to the above main modules, the system further includes a foot point detection module and a knee point detection module;

The knee point detecting module is configured to perform knee point detection by scanning and returning with a distance of 0.2 times the height of the human body.

The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and combinations thereof may be made without departing from the spirit and scope of the invention. Simplifications should all be equivalent replacements and are included in the scope of the present invention.

Claims

A human skeleton point tracking method in a two-dimensional video stream, characterized in that the method comprises the following steps:

The camera acquires a two-dimensional video stream, reconstructs the background, and extracts the foreground mask by using a subtractive background method, and outputs a foreground image after denoising processing;

Detecting a human face on the foreground image of the output, and obtaining coordinates of the rectangular area of the face, the head point, and the neck point;

Determining whether the head point is on the screen, if not, proceeding to the face detection module; if yes, dividing the human body into a left part ROI and a right part ROI to perform detection of other key points;

Use a specific position to scan and return a pixel value point to achieve shoulder point detection, and obtain left shoulder and right shoulder coordinates;

Utilizing the near-end point of the minimum circumscribed rectangle of the skin color region and returning to realize the hand point detection, and acquiring the coordinates of the left-hand point and the right-hand point;

By dividing the hand ROI into three regions, each region uses different scanning mode return points to realize elbow point detection, and obtain left elbow point and right elbow point coordinates;

Finally, the credibility of each point is counted and the credible points are displayed.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, wherein the camera acquires a two-dimensional video stream, reconstructs a background, and extracts a foreground mask by using a subtractive background method, and outputs the denoising process. In the steps of the foreground map, the specific method of outputting the foreground map is:

Obtaining the face center position HEAD(x, y) by the face detection algorithm;

Set two parameters: left composition threshold left_, right composition threshold right_, left composition indicator left_get=0, right composition indicator right_get=0;

Prompt the user to move to the left. When the horizontal position of the face is x<left_, left_get=1, the image of the right half of the current screen is saved and recorded as image_right;

Continue to prompt the user to move to the right, when the face center position abscissa x>right_, right_get=1, save the image of the left half of the current screen, recorded as image_left;

When left_get=1 and right_get=1, put image_left and image_right together to get the background image BACKGROUND.

BACKGROUND=image_left+LD(image_right,image_left.clos)

Where LD(a,b) represents shifting the image a as a whole to the right by b pixels;

Then, after inputting an image IMAGE, the IMAGE and the BACKGROUND are subtracted and denoised to obtain the foreground mask foreground_mask, and the foreground_mask is binarized to obtain the MASK;

The IMAGE and MASK are processed and processed to output the foreground image FOREGROUND.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, wherein in the step of detecting a face of the output foreground image and acquiring coordinates of the face rectangular region, the head point, and the neck point , using the Haar classifier for face detection, the specific method is:

Convert the color map to a grayscale image;

Histogram equalization of grayscale images to enhance contrast;

The Haar classifier is used to detect the positive face, and when the positive face is detected, the coordinates of the center point of the face and the length and width of the face rectangle are returned;

If a positive face is not detected, the side face is detected using the Haar classifier, and the coordinates of the center point of the face and the length and width of the face rectangle are returned.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, wherein the shoulder point detection is implemented by scanning and returning a pixel value point using a specific position, and acquiring a left shoulder point and a right shoulder point. In the steps of coordinates, the specific method for implementing shoulder point detection is:

Image preprocessing to obtain the contour of the human body;

Take the left shoulder point ROI, the size is recorded as (ROI_HEIGHT, ROI_WIDTH);

Set SCAN_X, SCAN_X is n1 times the width of the input image, where 0<n1<1, ie SCAN_X=n1*ROI_WIDTH;

Sweep the left shoulder ROI from the top with the width SCAN_X, and return the coordinates if the value is greater than the set value M;

If the sweep cannot have a value greater than M, sweep the left shoulder ROI from right to left with the length SCAN_Y, where SCAN_Y is n2 times the length of the input image, where 0<n1<1, ie SCAN_Y=n2*ROI_HEIGHT if the value is greater than M, return the coordinates of the point;

The coordinates of the left shoulder point are obtained by the same identification method as described above.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, characterized in that In the step of using the minimum circumscribed rectangle near the end point of the skin color region and returning to realize the hand point detection, and acquiring the coordinates of the left hand point and the right hand point, the specific method for realizing the hand point detection is:

Convert RGB into YCrCb coordinate system and store it in YUU;

Separate the YUU three channels and extract the special information in each channel of the YUU to form a new map, which is stored in the BW;

Open the BW, except to remove noise, smooth the image and extract the outer contour;

Traversing the outer contour and extracting the contour L corresponding to the largest area, creating a minimum circumscribed rectangle K of L;

K returns directly to the center point when the following conditions are met: the rectangle width is less than X times the rectangle height and the rectangle height is less than X times the rectangle width, where 1<X<2;

If not satisfied:

Create a new point container ptt for the vertices of the minimum bounding rectangle K;

Detect the left hand, find the leftmost point, define it as ptt[0], determine the next left point, define it as ptt[1], define p1 as the midpoint of K, and define p2 as ptt[0] and ptt[1 Midpoint of

The geometric position of p1 and p2 is used to determine the general position of the hand, and the coordinates are assigned to p2. When p2 is in the edge part, the value is (0, 0), and the value of (0, 0) is not displayed;

Return p2;

The coordinates of the right hand are identified using the same method as described above.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, wherein the elbow point detection is realized by dividing the hand ROI into three regions, and each region respectively uses different scanning manner return points. And in the steps of obtaining the coordinates of the left elbow point and the right elbow point, the specific method for implementing the elbow point detection is:

Image preprocessing to obtain the contour of the human body;

Take the ROI of the left elbow and divide the ROI into three areas, which respectively correspond to the three postures of raising the hand, 45 degrees, and the hips down;

When the difference between the abscissa of the shoulder point and the abscissa of the hand point is greater than IMAGE_HEIGHT/50:

Raise your hand: When the difference between the ordinate of the hand point and the ordinate of the shoulder point is less than the threshold IMAGE_HEIGHT/5, sweep the point from the bottom up and swipe to return;

45 degrees downward: When the difference between the ordinate of the hand point and the ordinate of the shoulder point is greater than the threshold IMAGE_HEIGHT/5, the point is swept from right to left, and the pixel value of the swept point is greater than the return value;

Forked waist movement: When the difference between the shoulder coordinate of the shoulder point and the horizontal coordinate of the hand point is less than IMAGE_HEIGHT/50, the point is swept from left to right, and the coordinates of the point with the first pixel value greater than 50 are returned.
The human skeleton point tracking method in the two-dimensional video stream according to claim 1, wherein the method further comprises the following steps:

The foot point detection is implemented by using the near-end point of the minimum circumscribed rectangle of the foreground area of the lower body and returning. The specific method of the foot point detection is:

In the whole body mode, the human body lower body ROI of the foreground image is taken out by half of the screen;

Extracting the outer contour, traversing the outer contour and extracting the contour L corresponding to the largest area, creating a minimum circumscribed rectangle K of L;

K returns directly to the center point when the following conditions are met: the rectangle width is less than Y times the rectangle height and the rectangle height is less than Y times the rectangle width, where 1<Y<2;

If not satisfied:

Create a new point container ptfoot to hold the vertices of the minimum bounding rectangle K;

Detect the left foot, find the leftmost point, define it as ptfoot[0], determine the next left point, define it as ptfoot[1], define p1 as the midpoint of K, and define p2 as ptfoot[0] and ptfoot[ The midpoint of 1];

Determine the general position of the foot by the geometric relationship of p1 and p2, and assign the coordinates to p2;

When p2 is in the edge part, the value is (0, 0), and the value of (0, 0) is not displayed;

Return p2;

The coordinates of the right foot point are obtained by the same recognition method described above.
The human skeleton point tracking method in the two-dimensional video stream according to claim 7, wherein the method further comprises the following steps:

The knee point detection is realized by scanning and returning the distance from the foot point to the set height. The specific method of the knee point detection is:

The background reconstruction module acquires the foreground of the human body, and in the whole body mode, removes the body ROI of the lower body;

Get body height BODY_HEIGHT, BODY_HEIGHT=FOOT_LEFT_Y– FACE_Y+FACE_HEIGHT/2;

Take the left foot ROI, the size is recorded as (ROI_HEIGHT, ROI_WIDTH);

Set SCAN_Y, SCAN_Y is 0.2 times the height of the user, ie SCAN_Y=0.2*BODY_HEIGHT;

Sweep the left foot ROI from left to right with the height of FOOT_LEFT_Y above SCAN_Y. If there is a value greater than 50, return the coordinates of the point (x+12, y), where x+12 indicates a 12-pixel offset to the abscissa. Processing so that the knee point is at the center of the knee;

If the value cannot be greater than 50, it returns (0,0) and is set as an untrusted point;

The coordinates of the right knee point are obtained by the same recognition method described above.
A human skeleton point tracking system in a two-dimensional video stream, characterized in that the system comprises:

The foreground extraction module is configured to acquire a two-dimensional video stream, reconstruct a background, and extract a foreground mask by using a subtractive background method, and output a foreground image after denoising processing;

a face detection module, configured to detect a face of the output foreground image, and obtain a rectangle of the face, a head point, and a neck point coordinate;

a judging module, configured to determine whether the head point is in the screen; if not, proceeding to the face detecting module; if yes, dividing the human body into a left part ROI and a right part ROI to perform other key points respectively;

a shoulder point detection module for performing shoulder point detection by scanning and returning a pixel value point using a specific position, and acquiring left shoulder point and right shoulder point coordinates;

The hand detection module is configured to perform the hand point detection by using the minimum circumscribed rectangle near the end point of the skin color region, and obtain the coordinates of the left hand point and the right hand point;

The elbow detection module is configured to divide the hand ROI into three regions, and each region uses different scanning mode return points to realize elbow point detection, and obtain left elbow point and right elbow point coordinates;

The statistics module finally counts the credibility of each point and displays the trusted points.
The human skeleton point tracking system in the two-dimensional video stream according to claim 9, wherein the system further comprises a foot point detecting module and a knee point detecting module;

The foot point detecting module is configured to use a minimum circumscribed rectangle near the end point of the lower body foreground area and return Back to achieve foot detection;

The knee point detecting module is configured to perform knee point detection by scanning and returning the distance of the set height by the foot point.