Background Art
Human pose recognition can be applied in fields such as physical activity analysis, human-computer interaction, and visual surveillance, and has recently become a popular problem in computer vision. Human pose recognition refers to detecting the positions of the parts of the human body in an image and computing their orientation and scale. The recognition result may be two-dimensional or three-dimensional, and the estimation methods fall into two categories: model-based and model-free.
Chinese patent application publication No. CN101350064A discloses a method and device for estimating two-dimensional human pose. The method first detects the human region in a two-dimensional image and determines the search ranges of the human body parts within that image. Then, within those search ranges, it computes the matching similarity against templates of the torso, head, hands, legs, and feet to identify each part, and combines the constraint relations between adjacent parts to obtain the two-dimensional human pose. The implementation steps are as follows:
Step 1: Detect the human region in the two-dimensional image using an existing method such as optical flow, frame differencing, or background subtraction.
Step 2: Determine the search ranges of the individual body parts within the human region.
(1) Perform face detection in the human region and take the position of the detected face as the search range of the head;
(2) Determine the search ranges of the left and right hands using the skin-color features of the detected face, and then determine the search ranges of the torso, left arm, and right arm;
(3) Define the remainder of the human region as the search range of the left leg, left foot, right leg, and right foot.
Step 3: Compute the matching similarity of each body-part template within the corresponding search range, determine the optimal position of each body part, and combine the constraint relations between adjacent body parts to obtain the two-dimensional human pose.
The above method of estimating human pose has the following shortcomings:
First, detecting the human region in the two-dimensional image with existing methods such as optical flow, frame differencing, or background subtraction is affected by illumination changes, dynamic background changes, and the speed of multi-scale optical flow computation. The detected human region therefore often has a large error, which creates hidden risks for the subsequent body-part detection and can cause the overall algorithm to fail;
Second, locating the head region by face detection fails when the face is partly or fully occluded. Moreover, face detection algorithms usually achieve high accuracy only on frontal faces and perform poorly on profile faces;
Third, identifying and locating body parts by template matching has limited precision. Body parts in real video images vary in scale, clothing, and other factors, which degrades the matching accuracy, leads to body parts being located incorrectly, and causes the whole algorithm to fail.
Summary of the Invention
An object of the invention is to provide a method for recognizing human pose in two-dimensional video images with high recognition accuracy and high recognition speed.
To achieve the above object, the invention adopts the following technical solution:
A method for recognizing human pose in two-dimensional video images, the method comprising the following steps:
a. Divide the original video image into groups according to the scale-space layering principle, where the number of groups is determined from the resolution of the original video image;
b. For each group of video images, compute the sampled image at one scale by applying the sampling function to that group. The scale is one of the scales of the group; the scales of a group are defined in terms of the resolution of the original video image and a preset natural number greater than 1, which gives the number of sampled video images contained in each group;
c. Compute the HOG low-level feature descriptor of the sampled image in each group;
d. Taking the HOG low-level feature descriptor of the one sampled image obtained in step c for each group as the basis, compute by the prediction formula the HOG low-level feature descriptors of the sampled video images at the remaining scales of that group. The prediction formula relates the descriptors of two sampled images through their respective scales and an exponent that is a preset value;
e. Using the HOG low-level feature descriptors of all sampled video images at the different scales from steps c and d, together with a trained SVM, detect the human target region in the original video image;
f. Classify the pixels in the human target region detected in step e with a trained random forest classifier to determine the body part regions within the human target region;
g. Connect the body parts determined in step f to form the human contour, thereby recognizing the human pose.
Preferably, in step b, each group of video images is sampled at the last scale of the group, and the sampled image corresponding to that last scale is computed.
In the method for recognizing human pose in two-dimensional video images described above, the random forest classifier in step f is preferably trained as follows:
Acquire synthetic video images containing human poses and real video images from the target detection scene, each video image serving as one training sample;
Label the background region and the human target region in each training sample according to the set body parts;
Compute the pixel features of each labeled region with the SURF operator; all labeled regions and their pixel feature data form the training data set;
Train the random forest classifier using the training data set and the objective function;
Here the objective function is defined at a decision-tree split node of the random forest. Its terms are: a weight; an information-entropy function; the pixel features of the labeled regions in the synthetic training samples; the pixel features of the labeled regions in the real training samples; the statistical descriptor of the pixel features of each labeled body part in the synthetic training samples; the statistical descriptor of all pixel features in all labeled regions of the synthetic training samples; the statistical descriptor of all pixel features in all labeled regions of the real training samples; and a distance between statistical descriptors.
Compared with the prior art, the advantages and positive effects of the invention are:
(1) When human targets are detected in the original video image with the HOG multi-scale low-level feature extraction method, only the HOG low-level feature descriptor of one sampled image needs to be computed in each group after grouping; the descriptors of the remaining sampled images are obtained by feature prediction. This speeds up multi-scale low-level feature computation without reducing detection accuracy, and fundamentally solves the heavy computation and insufficient real-time performance that keep multi-scale human target detection from practical use.
(2) Body parts are classified with a random forest classifier whose decision-tree nodes are trained with a new objective function, so that the weak classifiers keep a consistent spatial activation pattern when generalizing from the training sample space to the test sample space. The classifier can therefore be trained mainly on human pose video image samples synthesized by computer graphics, combined with a small number of labeled real human pose videos. This achieves generalization from synthetic human pose samples to real human pose features and reduces the requirements on training samples.
Other features and advantages of the invention will become clearer after reading the specific embodiments in conjunction with the drawings.
Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
First, the general approach by which the invention recognizes human pose is briefly explained:
Recognizing human pose from two-dimensional video images is divided into two steps. The first step detects the human target region in the original video image. The second step classifies the human target region to identify the body parts, namely joint parts such as the head, hands, elbows, shoulders, hips, knees, and feet, and connects the body parts to form the human contour, thereby recognizing the human pose. In the invention, the first step uses the HOG multi-scale low-level feature extraction method, which reduces the influence of background, illumination, and the like and maintains scale invariance; the low-level feature extraction method is also improved to raise real-time performance. The second step identifies body parts with random forest classification trees to improve classification accuracy; the objective function of the classification trees is also improved, which raises the generalization ability of the classifier and reduces the complexity of the training samples required for classifier training. The more specific implementation is described below.
Referring to Fig. 1, the figure shows a flow chart of one embodiment of the method for recognizing human pose in two-dimensional video images according to the invention.
As shown in Fig. 1, this embodiment recognizes human pose through the following steps:
Step 101: Divide the original video image into multiple groups of images according to the scale-space layering principle.
The original video image is divided into groups according to the scale-space layering principle, where the number of groups is determined from the resolution of the original video image. The principle and method of layering a video image according to scale space are prior art and are not described in detail here.
Step 102: Compute the sampled image at one particular scale in each group, and compute the HOG low-level feature descriptor of that sampled image.
Each group of video images is sampled to compute the sampled image at one scale. This scale is a particular scale, specifically one of the scales of the group; preferably, it is the last scale of the group. The sampled image is obtained by applying the sampling function to the group of video images. The scales of a group are defined in terms of the resolution of the original video image and a preset natural number greater than 1, which gives the number of sampled video images contained in each group. Typically this number is 5 to 8, meaning that each group of video images contains 5 to 8 layers of sampled video images.
Then the HOG (Histogram of Oriented Gradients) low-level feature descriptor of the sampled image at the selected scale is computed in each group. The HOG low-level feature descriptor can be computed with prior-art methods, which are not described in detail here.
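As a rough illustration of what a prior-art HOG computation involves, the following minimal sketch builds the orientation histogram of a single cell and normalizes it. It is not the implementation used in this embodiment; the function names, the 9 unsigned orientation bins, and the central-difference gradient are assumptions chosen for brevity.

```python
import math

def cell_histogram(patch, bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude.

    patch is a 2-D list of gray values; border pixels are skipped so that
    central differences are always defined (an assumption of this sketch).
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % bins] += magnitude
    return hist

def l2_normalize(vec, eps=1e-6):
    """L2 normalization, as commonly applied to HOG blocks."""
    norm = math.sqrt(sum(v * v for v in vec) + eps)
    return [v / norm for v in vec]

# A horizontal intensity ramp has all of its gradient energy at 0 degrees.
ramp = [[x for x in range(4)] for _ in range(4)]
descriptor = l2_normalize(cell_histogram(ramp))
```

A full descriptor would concatenate such normalized histograms over all cells and blocks of the detection window.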
Step 103: Compute, by a prediction algorithm, the HOG low-level feature descriptors of the sampled video images at the other particular scales in each group.
For each group of video images, the HOG low-level feature descriptor of one sampled image has been computed in step 102. Based on this computed descriptor, the HOG low-level feature descriptors of the sampled video images at the other particular scales are then predicted.
Specifically, the other particular scales are the remaining scales of the group, excluding the scale whose HOG low-level feature descriptor was computed in step 102. The HOG low-level feature descriptors of the sampled video images at these other scales are predicted with the prediction formula.
In the prediction formula, the two scale terms are the scales of the two sampled images, and the two feature terms are the HOG low-level feature descriptors of those two sampled images.
The power exponent is a preset value that can be determined empirically by fitting and verification. In this embodiment, its preferred value is 0.0042.
In the above formula, the power exponent is a fixed value, and one scale and its corresponding HOG low-level feature descriptor have been computed in step 102. For any other specified scale, the corresponding HOG low-level feature descriptor can then be computed directly from the formula. In the same way, the HOG low-level feature descriptors for the remaining scales in the group can be computed, and thus the HOG low-level feature descriptors of all sampled video images contained in all groups.
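The prediction step can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes a power law in the ratio of the two scales, with a negative sign on the exponent; both the form and the sign are assumptions, as are the function names and the example scales. Only the exponent value 0.0042 comes from the embodiment.

```python
def predict_descriptor(base_desc, base_scale, target_scale, lam=0.0042):
    """Predict a HOG descriptor at target_scale from one computed at base_scale.

    Assumed form: F(target) = F(base) * (target_scale / base_scale) ** (-lam).
    """
    ratio = (target_scale / base_scale) ** (-lam)
    return [value * ratio for value in base_desc]

def predict_group(base_desc, base_scale, scales, lam=0.0042):
    """Predict descriptors for every scale of a group except the computed one."""
    return {
        scale: predict_descriptor(base_desc, base_scale, scale, lam)
        for scale in scales
        if scale != base_scale
    }

# Hypothetical group with three scales; only the first was computed directly.
computed = [0.2, 0.5, 0.3]
predicted = predict_group(computed, 1.0, [1.0, 0.5, 0.25])
```

Each predicted descriptor costs one multiplication per element, which is where the saving over recomputing gradients and histograms at every scale comes from.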
Step 104: Detect the human target region in the video image using the HOG low-level feature descriptors of all sampled video images at the different scales together with a trained SVM.
With the HOG low-level feature descriptors of the sampled video images contained in all groups, computed in steps 102 and 103, the human target region can be detected at the different scales. The specific method of detecting the human target region with HOG low-level feature descriptors and a trained SVM can be implemented with the prior art and is not described further here.
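For orientation only, the sketch below shows one common prior-art arrangement: a linear SVM scores the descriptor of each candidate window, and windows scoring above a threshold are kept. The linear kernel, the candidate tuple layout, and all names are assumptions of this sketch, not details given by the embodiment.

```python
def svm_score(weights, bias, descriptor):
    """Decision value of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, descriptor)) + bias

def detect(candidates, weights, bias, threshold=0.0):
    """Keep candidate windows whose score exceeds the threshold.

    candidates is a list of (scale, box, descriptor) tuples, where box is a
    hypothetical (x, y, width, height) window at that scale.
    Returns (score, scale, box) tuples, best score first.
    """
    hits = []
    for scale, box, descriptor in candidates:
        score = svm_score(weights, bias, descriptor)
        if score > threshold:
            hits.append((score, scale, box))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

# Toy weights and two candidate windows at different scales.
weights, bias = [1, -1], 0
candidates = [
    (1.0, (0, 0, 8, 16), [2, 1]),
    (0.5, (4, 4, 8, 16), [0, 3]),
]
regions = detect(candidates, weights, bias)
```

In practice overlapping detections across scales would also be merged, for example by non-maximum suppression, which this sketch leaves out.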
Step 105: Classify the pixels in the human target region with the random forest classifier to determine the body part regions.
After step 104 has determined the human target region, the pixels in that region are classified with the trained random forest classifier to determine the body part regions. The input of the random forest classifier is the feature of a pixel. The classifier parameters are selected first, including the number of decision trees in the forest, the number of attributes randomly selected at each internal node, and the minimum number of samples at a terminal node. The pixel features of the human target region are then fed to the classifier, which outputs the body part region to which each pixel belongs, thereby determining the body part regions. In this embodiment, pixel features are computed with the SURF (Speeded Up Robust Features) operator, and each pixel feature is constructed as a 128-dimensional descriptor. The body part regions comprise seven joint parts of the human body: foot, knee, hip, shoulder, elbow, hand, and head.
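How a forest assigns a body part to one pixel feature can be sketched as a majority vote over its decision trees. The tree encoding below (tuples for split nodes, strings for leaves), the two-dimensional toy features, and the thresholds are all hypothetical; a real forest would operate on the 128-dimensional SURF descriptors.

```python
from collections import Counter

def tree_predict(node, feature):
    """Walk one decision tree down to a leaf.

    A split node is (feature_index, threshold, left, right); a leaf is the
    body part name as a string.
    """
    while not isinstance(node, str):
        index, threshold, left, right = node
        node = left if feature[index] <= threshold else right
    return node

def forest_predict(trees, feature):
    """Majority vote of all trees for one pixel feature."""
    votes = Counter(tree_predict(tree, feature) for tree in trees)
    return votes.most_common(1)[0][0]

# Three toy trees with a single split each.
trees = [
    (0, 0.5, "head", "hand"),
    (1, 0.5, "head", "foot"),
    (0, 0.2, "hand", "head"),
]
label = forest_predict(trees, [0.3, 0.1])
```

Applying `forest_predict` to every pixel of the human target region and grouping pixels by label gives the body part regions.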
Step 106: Connect the body parts to form the human contour, thereby recognizing the human pose.
After step 105 has determined the body parts, they are connected in the order head, shoulder, hip, knee, foot to form the trunk, and the elbows and hands are then connected on both sides. The human contour can thus be identified, achieving human pose recognition based on a human joint model.
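The connection order can be sketched as two chains of part names joined by line segments between region centroids. Representing each body part by the centroid of its pixels, and treating only one side of the body, are simplifications assumed here for illustration.

```python
TRUNK_CHAIN = ["head", "shoulder", "hip", "knee", "foot"]
ARM_CHAIN = ["shoulder", "elbow", "hand"]

def centroid(pixels):
    """Mean position of the (x, y) pixels classified as one body part."""
    count = len(pixels)
    return (sum(p[0] for p in pixels) / count, sum(p[1] for p in pixels) / count)

def connect_parts(centroids):
    """Build skeleton segments from a mapping of part name to centroid.

    Consecutive parts in each chain are joined; a segment is skipped when
    either of its endpoints was not detected.
    """
    segments = []
    for chain in (TRUNK_CHAIN, ARM_CHAIN):
        for start, end in zip(chain, chain[1:]):
            if start in centroids and end in centroids:
                segments.append((centroids[start], centroids[end]))
    return segments

# Hypothetical centroids for one side of an upright figure.
parts = {
    "head": (50, 10), "shoulder": (50, 30), "elbow": (35, 45), "hand": (30, 60),
    "hip": (50, 60), "knee": (50, 85), "foot": (50, 110),
}
skeleton = connect_parts(parts)
```

With all seven parts present this yields six segments: four along the trunk chain and two along the arm.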
In this embodiment, although HOG low-level feature descriptors are used to detect the human target region, the original video image is only grouped and the number of sampled video images per group, that is, the number of layers per group, is fixed. In each group, the low-level feature function is evaluated for only one sampled image; the HOG low-level feature descriptors of the sampled images at the other scales in the group are obtained with the prediction algorithm of step 103, whose computational complexity and cost are much lower than evaluating the low-level feature function. Moreover, with the prediction algorithm the sampled video image at each scale need not be computed; its HOG low-level feature descriptor is obtained directly, which further reduces computation. This improves the speed and real-time performance of HOG-based human target detection and fundamentally solves the heavy computation and insufficient real-time performance that keep multi-scale human target detection from practical use.
In machine learning, a random forest is a classifier comprising multiple decision trees. The main reason for using it in pose recognition is its high classification accuracy. There are four further factors: first, its learning process is very fast; second, the complexity of the algorithm can be controlled adaptively through the depth of the internal decision trees; third, while the forest is being built it produces an unbiased internal estimate of the generalization error; fourth, it tolerates outliers and noise well and is not prone to overfitting. Its main drawback is that it requires the training data to be similar to the test data, that is, to share the same distribution, which limits the generalization ability of the classifier. Obtaining a high-accuracy random forest classifier therefore requires the training samples to cover all possible variations of future test data. In real test scenes, however, factors such as viewpoint changes, twisting of limbs, clothing texture changes, and illumination changes make it impossible to obtain sufficient training samples.
To address this drawback of the random forest classifier, the above embodiment improves the objective function used to train the decision-tree nodes of the classifier, so that the weak classifiers keep a consistent spatial activation pattern when generalizing from the training sample space to the test sample space. Training sample selection then needs only a small number of weakly labeled samples from the target detection space; the remaining training data can be human pose video image samples synthesized by computer graphics, which reduces the requirements on training samples. The specific training process is as follows:
Acquire synthetic video images containing human poses and real video images from the target detection scene, each video image serving as one training sample. The synthetic video images form the main body, combined with a small number of real video images from the target detection scene in which the body parts and background have been labeled.
Label the background region and the human target region in each training sample according to the set body parts. Specifically, each training sample is labeled into eight parts according to the human joint positions: one part is the background, and the remaining seven are foot, knee, hip, shoulder, elbow, hand, and head.
Compute each pixel feature in each labeled region with the SURF operator; all labeled regions and their corresponding pixel feature data form the training data set. Specifically, the SURF operator is used to compute each pixel feature in each labeled region of the synthetic training samples and of the real training samples, and each pixel feature is constructed as a 128-dimensional descriptor. The pixel features of the labeled regions in the synthetic training samples and those in the real training samples together form the training data set at a split node of a decision tree in the random forest. In addition, a statistical descriptor is computed over all 128-dimensional SURF descriptors in all labeled regions of the synthetic training samples, and another over all 128-dimensional SURF descriptors in all labeled regions of the real training samples.
Finally, the random forest classifier is trained using the above training data set and the improved objective function. In the improved objective function, the weight is a fixed value determined by experiment, chosen as the value at which the recognition performance of the classifier is best. The information-entropy function uses the prior-art expression. The function also involves the statistical descriptor of all pixel features in each labeled body part of the synthetic training samples, and a distance between statistical descriptors.
The objective function thus considers both the entropy of the training samples and the information difference between the training data and the target detection data; their weighted sum serves as the objective function for training the decision trees, which improves the generalization ability of the trained classifier. Higher recognition accuracy can be obtained when the trained classifier is used to identify body parts.
The above objective function uses one particular distance to represent the information difference between the training data and the target detection data, but it is not limited to this; the Euclidean distance or other distances may also be used to represent the difference between the two.
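The structure of such an objective, an entropy term plus a weighted distance term, can be sketched as below. The exact expression is not reproduced in this text, so several choices are assumptions of this sketch: the statistical descriptor is taken to be the per-dimension mean, the distance is the Euclidean alternative mentioned above, and the entropy is taken over the labels of the synthetic samples at the node. The function names and the weight value are likewise hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the label distribution at a node."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def mean_descriptor(features):
    """Per-dimension mean, used here as the statistical descriptor (assumed)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]

def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def node_objective(synth_feats, synth_labels, real_feats, weight):
    """Weighted sum of sample entropy and synthetic-to-real feature distance."""
    gap = euclidean(mean_descriptor(synth_feats), mean_descriptor(real_feats))
    return entropy(synth_labels) + weight * gap

# Toy 2-D features standing in for 128-dimensional SURF descriptors.
value = node_objective(
    synth_feats=[[0, 0], [2, 0], [0, 0], [2, 0]],
    synth_labels=["head", "head", "hand", "hand"],
    real_feats=[[4, 4]],
    weight=0.5,
)
```

During training, a candidate split would be evaluated by computing this value on the samples routed to each child node and preferring splits that lower it, so that splits are rewarded both for separating body parts and for keeping synthetic and real features close.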
The above embodiments merely illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions claimed by the invention.