CN102682302B

CN102682302B - Human body posture identification method based on multi-characteristic fusion of key frame

Info

Publication number: CN102682302B
Application number: CN201210063893.5A
Authority: CN
Inventors: 黄鲜萍; 郑莉莉; 梁荣华
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2012-03-12
Filing date: 2012-03-12
Publication date: 2014-03-26
Anticipated expiration: 2032-03-12
Also published as: CN102682302A

Abstract

The invention provides a human body posture identification method based on multi-characteristic fusion of a key frame, comprising the following steps of: (1) extracting Hu invariant moment characteristics from a video image; calculating a covering rate of an image sequence; extracting the highest covering percentage of the covering rate as a candidate key frame; then calculating a distortion rate of the candidate key frame and extracting the minimum distortion percentage as the key frame; (2) carrying out extraction of a foreground image on the key frame to obtain the foreground image of a moving human body; (3) extracting characteristic information of the key frame, wherein the characteristic information comprises a six-planet model, a six-planet angle and eccentricity; obtaining a multi-characteristic fused image characteristic vector; and (4) utilizing a one-to-one trained classification model, wherein the classification model is a posture classifier based on an SVM (Secure Virtual Machine); and identifying a posture. The human body posture identification method has the advantages of simplified calculation, good stability and good robustness.

Description

A kind of human posture recognition method of the many Fusion Features based on key frame

Technical field

The present invention relates to a kind of human posture recognition method.

Background technology

In recent years, along with China and other emerging economy Urbanization Construction accelerate, the city management problems such as a series of traffic that the simultaneous urbanization that movement of population increases causes, public security, video monitoring is more and more universal, also more and more higher to the intelligentized demand of video monitoring.People wish to extract from video by intellectual analysis each application in the live and work that more information is applied to people, such as safety monitoring, Smart Home, man-machine interaction, sportsman's supplemental training etc.Security protection industry, in the places such as bank, rail, warehouse, need to realize the detection and tracking of moving target automatically, and abnormal alarm situation, thereby reduces all kinds of losses; In house security system and medical monitoring system, can detect in time become a Buddhist monk or nun middle old man or patient whether abnormal conditions occur; In field in intelligent robotics, the analysis that hope can be to the attitude of people in video, gesture, language, carries out mutual exchange and effect with people; In some trainings such as sports, dance training, system can be analyzed athletic joint kinematic parameter, for improvement of athletic training patterns.

Existing human body attitude identification adopts two kinds of methods conventionally: template matching method and state-space method.Template matching method is that image sequence is converted to one group of static in shape pattern, then in identifying, with pre-stored behavior sample, carry out the motion of people in interpretation of images sequence, the advantage of this algorithm is that computation complexity is low, realizes simple, but responsive to the time interval of behavior, poor robustness.State-space method defines each static posture with state-space model and as a state, between these states, by certain probability, connect, each motion sequence can be seen the once traversal between these static posture different conditions as, and calculate the process of its joint probability, this algorithm can be avoided the problem to time interval modeling, but need training sample large, computation complexity is high.

Key frame refers to quantity of information maximum, a representational pair or multiple image in video sequence, the content outline that can reflect one section of video, to accomplish simultaneously as far as possible succinct, data volume is few, therefore in video analysis process, extract key frame and carry out gesture recognition and improve significantly video analysis efficiency.Conventional key-frame extraction technology roughly can be divided into: the method based on shot boundary, that video is divided into several independent camera lenses according to scene, get the first frame, last frame or the intermediate frame of every arrangement of mirrors head as key frame, this algorithm is simple, quick, but do not consider the complicacy of vision content, cannot represent the video segment that content is longer; Method based on video content analysis, to utilize characteristics of image, the feature difference degree of judgement video frame image and present frame, difference is more key frame, this class algorithm can obtain good effect for the video of different length and content, but for camera motion and video content, change when more violent, the key frame of extraction will be very unstable; Method based on motion analysis, the Typical Representative of this class algorithm has optical flow method, and the calculated amount of such algorithm is larger, strong for local movable information dependence; Method based on cluster, such algorithm is by cluster analysis, extracts cluster centre as key frame, and its advantage is reflecting video content preferably, but its algorithm is complicated, and poor stability.

Summary of the invention

In order to overcome algorithm complexity, the poor stability of existing human posture recognition method, the deficiency of poor robustness, the invention provides a kind ofly simplify calculating, have good stability, the human posture recognition method of many Fusion Features based on key frame that robustness is good.

The technical solution adopted for the present invention to solve the technical problems is:

A human posture recognition method for many Fusion Features based on key frame, described human posture recognition method comprises the following steps:

(1) video image is extracted to Hu invariant moment features, the coverage rate of sequence of computed images, extracting setting that coverage rate is the highest, to cover percentage be candidate's key frame, the distortion rate of calculated candidate key frame then take that to extract the wherein distortion percentage of minimum be key frame;

(2) key frame is carried out to the extraction of foreground image, obtain after the foreground image of movement human;

(3) extract the characteristic information of key frame, described characteristic information is six star models, six star angle and eccentricities; Obtain the image feature vector of many Fusion Features;

(4) use the man-to-man disaggregated model training, described disaggregated model is the attitude sorter based on SVM, and attitude is identified.

Further, in described step (4), for N class attitude, between any two class samples, design a SVM, therefore just need to design N* (N-1)/2 SVM.

Further again, in described step (4), the training process of disaggregated model is as follows:

(4.1) first the foreground image of human body is carried out to pre-service, for different behavior video segments, first it is carried out to the background training based on code book, by the image difference of video sequence frame and background model, obtain the foreground image of movement human, and carry out morphologic image processing, remove picture noise;

(4.2) extract the feature of the training data that comprises 11 kinds of attitudes, the feature database that forms master sample, use the method that multi-pose merges to carry out posture feature description to movement human, described 11 kinds of attitudes are respectively walking, little jump, jump greatly, sidle, squat down, bend over, creep, push-up, sit-ups and sit down;

(4.3), by the study to master sample, build the multicategory classification model based on SVM;

(4.4) utilize test sample book to verify model, if lower than the accuracy of estimating, adjust training sample, turn back to (4.1), until higher than the accuracy of estimating, the disaggregated model after being trained.

Further, in described step (3), the leaching process of described six star models: extract after the barycenter of human body foreground image, the human body silhouettes point obtaining and the distance between center of mass point, human body outline profile is divided into left and right two parts, calculates respectively the distance of the going up most of its two parts silhouette, the most lower and Far Left point, rightmost point and barycenter; At six star models, obtained, after the axis of the heart point that six point arrive, calculating the angle information between axis and adjacent axis; Calculate the eccentricity of human body silhouette.

In described step (1), calculate the process of coverage rate: first calculate the likeness coefficient of every two frames, the similarity of other frames in present frame and video is greater than in the associated frame set of listing present frame in of likeness coefficient mean value of present frame and other frames; In associated frame set, in frame and video, the ratio of all frame numbers is the coverage rate of present frame, asks for the coverage rate of each two field picture in video, and gets 30% frame that coverage rate is the highest as candidate's key frame;

The process of calculated distortion rate: the estimated probability of the gray-scale value of computed image first, and the gray level average of image, then calculate single order, second order, the third moment of each two field picture, submeter represents average, variance and measure of skewness, by the feature of this tri-vector presentation video, and calculate the average square of all frames in the average square of associated frame set of key frame and video; The last average square of associated frame set of calculated candidate key frame and the objective function maximal value of the average square of all frames, thus the distortion rate of candidate's key frame obtained, get the 50% final key frame as this video of minimum.

Beneficial effect of the present invention is mainly manifested in: (1) adopts the attitude of many Fusion Features to describe operator in algorithm, human body attitude can better be described, strengthen describing the embodiment ability to attitude of operator, can set up more accurate attitude mode and recognition result; (2) in video analysis process, added the concept of key frame, by key frame, described the whole content of video, to the posture analysis of key frame, can make video analysis more efficient.

Accompanying drawing explanation

Fig. 1 is the process flow diagram of the human posture recognition method of the many Fusion Features based on key frame.

Fig. 2 is the process flow diagram of the extraction method of key frame based on video content.

Fig. 3 is six star aspect of model schematic diagram of many characteristic models.

Fig. 4 is six star angle character schematic diagram of many characteristic models.

Fig. 5 is the eccentricity schematic diagram of many characteristic models.

Fig. 6 is the schematic diagram of the SVM that designs one to one.

Embodiment

Below in conjunction with accompanying drawing, the invention will be further described.

With reference to Fig. 1～Fig. 6, a kind of human posture recognition method of the many Fusion Features based on key frame, described human posture recognition method comprises the following steps:

In the present embodiment, the training process of attitude mode comprises: the study of the expression that many feature extractions of attitude master sample, attitude are described operator and the multicategory classification device based on support vector machine (SVM:Support Vector Machine); The process of gesture recognition comprises: expression and gesture recognition that many feature extractions of content-based key-frame extraction, key frame, attitude are described operator, as shown in Figure 1, wherein content-based key-frame extraction, attitude based on multi-feature fusion represent and the gesture recognition based on SVM is the gordian technique in this method method structure.

Content-based key-frame extraction: complicacy and the stability of considering algorithm, the present invention proposes a kind of Key-frame Extraction Algorithm based on video content analysis, by calculating coverage rate and the distortion rate of sequence of frames of video, selects the maximum keyframe sequence of inclusion information in video sequence.

Because video image can exist certain geometric distortion in shooting process, and geometric distortion will be brought very large impact for image recognition, therefore need a kind of method with Invariant to rotation and scale, in this method with Hu not the characteristics of image of bending moment extract key frame, the Key-frame Extraction Algorithm flow process based on bending moment is not as shown in Figure 2.

(1) Hu bending moment not

Hu not bending moment is to put forward for 1962, has translation, rotation and yardstick indeformable, can in enough squares space, convert and express and analysis image coordinate transform accordingly.Under image discrete state, ask the multistage common Ju He of image center square.When image changes, common square can change, still responsive to rotation but centre distance has translation invariance.Directly with common Ju Huo center square, carry out character representation, can not make feature there is translation, Invariant to rotation and scale simultaneously.If utilize normalization center square, feature not only has translation invariance, but also has constant rate.

Hu utilizes second order and third central moment to construct seven not bending moments, and they can keep translation, convergent-divergent and invariable rotary under consecutive image condition.Video image is extracted to not bending moment of Hu, and the feature that obtains image is described operator.

(2) coverage rate

Calculate after the Hu invariant moment features of image, utilize the coverage rate of the every sub-picture of this feature calculation, extract candidate's key frame of video.Calculate the process of coverage rate, first calculate the likeness coefficient of every two frames, the similarity of other frames in present frame and video is greater than in the associated frame set of listing present frame in of likeness coefficient mean value of present frame and other frames.In associated frame set, in frame and video, the ratio of all frame numbers is the coverage rate of present frame, asks for the coverage rate of each two field picture in video, and gets 30% frame that coverage rate is the highest as candidate's key frame.

(3) distortion rate

Coverage rate has been calculated the probability that each frame can represent the information of other frames, but cannot from candidate's key frame, tell which frame is best key frame, therefore we are by calculating the distortion rate between other frames in each candidate frame and video, and distortion rate is lower is chosen as key frame.

First the estimated probability of the gray-scale value of computed image, and the gray level average of image, then calculate single order, second order, the third moment of each two field picture, submeter represents average, variance and measure of skewness, by the feature of this tri-vector presentation video, and calculate the average square of all frames in the average square of associated frame set of key frame and video.The last average square of associated frame set of calculated candidate key frame and the objective function maximal value of the average square of all frames, thus the distortion rate of candidate's key frame obtained, get the 50% final key frame as this video of minimum.

Attitude based on multi-feature fusion represents: by calculating above after the key frame of video sequence, extract the characteristic information of key frame, and carry out gesture recognition.The characteristic information adopting is herein the human body attitude feature of six star models, six star angles and the many Fusion Features of eccentricity, thereby obtains the image feature vector of many Fusion Features.

With the algorithm that attitude is described and template matches combines of many Fusion Features, the attitude of many Fusion Features is described algorithm can describe out human body attitude comparatively exactly, has overcome the sensitivity of template matching method to the time interval, has strengthened the robustness of method.

(1) six star model

For the different behaviors of human body, in its profile, comprised the abundantest information that can describe the behavior, so this method selects six star models as one of them feature.Six star models extract after the barycenter of human body foreground image, the human body silhouettes point obtaining and the distance between center of mass point.Human body outline profile is divided into left and right two parts, calculates respectively the distance of the going up most of its two parts silhouette, the most lower and the most left (right side) edge point and barycenter, as shown in Figure 3.

(2) six star angles

Second angle that is characterized as six stars in many features, has obtained, after the axis of the heart point that six point arrive, calculating the angle information between axis and adjacent axis at six star models, obtains six angles, as shown in Figure 4.

(3) eccentricity

This method is extracted the eccentricity of human body contour outline as one of feature, calculates human body silhouette eccentricity, thereby has obtained the image feature vector of many Fusion Features, as shown in Figure 5 by formula.

Attitude sorter based on SVM: this method selects the multicategory classification device of support vector to carry out modeling and classification to human body behavior.Support vector machine (Support Vector Machine, SVM) can, the in the situation that of small sample, obtain the good sorter of generalization ability.SVM can be transformed into a linear space by nonlinear feature space by kernel function, then in the feature space after conversion, constructs lineoid, obtains the optimization model between class and class.Due to what extract herein, being the many features that merge, is therefore a nonlinear space, and choice for use radial basis kernel function (Radial Basis Function, RBF) is carried out feature space conversion, finally obtains the optimal classification face between all kinds of.

Because the training sample gathering can be because the error in gatherer process exists certain noise, therefore in support vector machine, define penalty factor and nuclear parameter γ and solved study and the inseparable problem excessively causing due to noise, in training process, by the method for parameter tuning, determine penalty factor and nuclear parameter, thereby obtain an optimal classification face, finally structure obtains SVM multi-class classifier.

By kernel function, proper vector is carried out after higher-dimension conversion, to the training sample study that exercises supervision, with the function in lower dimensional space, calculate sorter.In the present invention, use the multicategory classification device method for designing of (one-versus-one) one to one, for N class attitude, between any two class samples, design a SVM, therefore just need to design N* (N-1)/2 SVM, as shown in Figure 6, there are three kinds of attitude classes, between attitude class 1 and attitude class 2, design a sorter f _1,2(x), design category device f between attitude class 2 and attitude class 3 _2,3(x), the sorter f between attitude class 1 and attitude class 3 _1,3(x), totally 3 sorters.When a unknown sample is classified, for each sorter, obtain a classification results, finally divide who gets the most votes's classification to be the classification of this unknown sample.

Gathered altogether more than 400 video segment (720*576,30fps), movement human video data and the Weizmann human body behavior public database certainly taken have been comprised, and video data is divided into training data and test data set, its ratio is 2:1, altogether comprised 11 kinds of attitudes, be respectively walking (walk), jump 1(jump1), jump 2(jump2), sidle (sidewalk), squat down (squat), bend over (stoop), creep (crawl), push-up (push-up), sit-ups (sit-up), (sit) sits down.

The training stage of disaggregated model:

(1) first human body prospect is carried out to pre-service, for different behavior video segments, first it is carried out to the background training based on code book, by the image difference of video sequence frame and background model, obtain the foreground image of movement human, and carry out morphologic image processing, remove picture noise;

(2) extract the feature (as shown in Fig. 3,4,5) of the training data that comprises 11 kinds of attitudes, form the feature database of master sample, use the method that multi-pose merges to carry out posture feature description to movement human;

(3), by the study to master sample, build the multicategory classification model based on SVM;

(4) utilize test sample book to verify its correctness to model, lower than the accuracy of estimating, adjust training sample.

To identifying in input video, comprise four actions, have 214 two field pictures, algorithm extracts 32 key frames, realizes the identification of attitude sequence, and as shown in Figure 8, detailed process is as follows:

(1) image is extracted to Hu invariant moment features, for calculating the coverage rate of video image sequence, extract coverage rate the highest 30% be candidate's key frame, the distortion rate of calculated candidate key frame then, take extract wherein 50% be key frame;

(3) in algorithm, use multi-pose to merge human body attitude and describe operator;

(4) with the disaggregated model training, attitude is identified.

Claims

1. a human posture recognition method for the many Fusion Features based on key frame, is characterized in that: described human posture recognition method comprises the following steps:

(1) video image is extracted to Hu invariant moment features, the coverage rate of sequence of computed images, extracting setting that coverage rate is the highest, to cover percentage be candidate's key frame, the distortion rate of calculated candidate key frame then take that to extract the wherein distortion percentage of minimum be key frame; In described step (1), calculate the process of coverage rate: first calculate the likeness coefficient of every two frames, the similarity of other frames in present frame and video is greater than in the associated frame set of listing present frame in of likeness coefficient mean value of present frame and other frames; In associated frame set, in frame and video, the ratio of all frame numbers is the coverage rate of present frame, asks for the coverage rate of each two field picture in video, and gets 30% frame that coverage rate is the highest as candidate's key frame;

The process of calculated distortion rate: the estimated probability of the gray-scale value of computed image first, and the gray level average of image, then calculate single order, second order, the third moment of each two field picture, submeter represents average, variance and measure of skewness, by the feature of this tri-vector presentation video, and calculate the average square of all frames in the average square of associated frame set of key frame and video; The last average square of associated frame set of calculated candidate key frame and the objective function maximal value of the average square of all frames, thus the distortion rate of candidate's key frame obtained, get the 50% final key frame as this video of minimum;

(3) extract the characteristic information of key frame, described characteristic information is six star models, six star angle and eccentricities; Obtain the image feature vector of many Fusion Features; In described step (3), the leaching process of described six star models: extract after the barycenter of human body foreground image, the human body silhouettes point obtaining and the distance between center of mass point, human body outline profile is divided into left and right two parts, calculates respectively the distance of the going up most of its two parts silhouette, the most lower and Far Left point, rightmost point and barycenter; At six star models, obtained, after the axis of the heart point that six point arrive, calculating the angle information between axis and adjacent axis; Calculate the eccentricity of human body silhouette;

2. the human posture recognition method of the many Fusion Features based on key frame as claimed in claim 1, it is characterized in that: in described step (4), for N class attitude, between any two class samples, design a SVM, therefore just need to design N* (N-1)/2 SVM.

3. the human posture recognition method of the many Fusion Features based on key frame as claimed in claim 1 or 2, is characterized in that: in described step (4), the training process of disaggregated model is as follows: