CN102682302A - Human body posture identification method based on multi-characteristic fusion of key frame - Google Patents

Human body posture identification method based on multi-characteristic fusion of key frame Download PDF

Info

Publication number
CN102682302A
CN102682302A CN2012100638935A CN201210063893A
Authority
CN
China
Prior art keywords
key frame
human body
frame
video
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100638935A
Other languages
Chinese (zh)
Other versions
CN102682302B (en)
Inventor
黄鲜萍
郑莉莉
梁荣华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201210063893.5A priority Critical patent/CN102682302B/en
Publication of CN102682302A publication Critical patent/CN102682302A/en
Application granted granted Critical
Publication of CN102682302B publication Critical patent/CN102682302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a human body posture recognition method based on multi-feature fusion of key frames, comprising the following steps: (1) extracting Hu invariant moment features from the video images, computing the coverage rate of the image sequence, taking the frames with the highest coverage rates as candidate key frames, then computing the distortion rate of each candidate key frame and taking the candidates with the lowest distortion rates as the key frames; (2) performing foreground extraction on each key frame to obtain the foreground image of the moving human body; (3) extracting the feature information of each key frame, the feature information comprising the six-star model, the six-star angles and the eccentricity, and fusing them into a multi-feature image feature vector; (4) recognizing the posture with a trained one-versus-one classification model, the classification model being a posture classifier based on an SVM (Support Vector Machine). The method has the advantages of simplified computation, good stability and good robustness.

Description

A human body posture recognition method based on multi-feature fusion of key frames
Technical field
The present invention relates to a human body posture recognition method.
Background technology
In recent years, as urbanization has accelerated in China and other emerging economies, population mobility has brought a series of urban management problems in traffic, public security and related areas. Video surveillance has therefore become increasingly widespread, and the demand for intelligent video surveillance keeps growing. People hope to extract more information from video through intelligent analysis and apply it in daily life and work, for example in safety monitoring, smart homes, human-computer interaction and athlete training. In the security industry, sites such as banks, railways and warehouses need automatic detection and tracking of moving targets and automatic alarms for abnormal situations, so as to reduce losses of all kinds. In home security and medical monitoring systems, abnormal conditions of elderly people or patients can be detected in time. In intelligent robotics, it is hoped that robots can analyze the postures, gestures and language of people in video so as to communicate and interact with them. In sports and dance training, a system can analyze an athlete's joint motion parameters and use them to improve training methods.
Existing human body posture recognition usually adopts one of two approaches: template matching or state-space methods. Template matching converts the image sequence into a set of static shape patterns and then, during recognition, interprets the motion of the person in the image sequence by comparison against pre-stored behavior samples. Its advantages are low computational complexity and simple implementation, but it is sensitive to the time intervals of behaviors and has poor robustness. State-space methods define each static posture as a state in a state-space model and connect the states by transition probabilities; each motion sequence can then be seen as one traversal over these states, and recognition is the process of computing its joint probability. This approach avoids modeling the time intervals, but it requires large training samples and has high computational complexity.
A key frame is one image or a small set of images in a video sequence that carries the most information and is most representative; it should summarize the content of a video segment while remaining as concise and compact as possible. Extracting key frames for posture recognition therefore significantly improves the efficiency of video analysis. Common key frame extraction techniques can be roughly divided into four categories. Shot-boundary methods divide the video into several independent shots according to the scene and take the first, last or middle frame of each shot as the key frame; they are simple and fast, but ignore the complexity of the visual content and cannot represent long video segments. Content-analysis methods use image features to measure the difference between each video frame and the current frame, and frames with large differences are taken as key frames; they obtain good results for videos of different lengths and contents, but the extracted key frames become unstable when the camera motion or video content changes violently. Motion-analysis methods, typified by optical flow, involve large amounts of computation and depend strongly on local motion information. Clustering methods extract cluster centers as key frames through cluster analysis; they reflect the video content well, but the algorithms are complex and unstable.
Summary of the invention
To overcome the algorithmic complexity, poor stability and poor robustness of existing human body posture recognition methods, the present invention provides a human body posture recognition method based on multi-feature fusion of key frames that simplifies computation and has good stability and robustness.
The technical solution adopted for the present invention to solve the technical problems is:
A human body posture recognition method based on multi-feature fusion of key frames, said method comprising the following steps:
(1) extracting Hu invariant moment features from the video images, computing the coverage rate of the image sequence, taking a set percentage of the frames with the highest coverage rates as candidate key frames, then computing the distortion rate of each candidate key frame and taking the candidates with the lowest distortion rates as the key frames;
(2) performing foreground extraction on each key frame to obtain the foreground image of the moving human body;
(3) extracting the feature information of each key frame, said feature information being the six-star model, the six-star angles and the eccentricity, and fusing them into a multi-feature image feature vector;
(4) recognizing the posture with a trained one-versus-one classification model, said classification model being a posture classifier based on SVM.
Further, in said step (4), one SVM is designed between every two classes of samples, so for N posture classes a total of N*(N-1)/2 SVMs are needed.
Further, in said step (4), the training process of the classification model is as follows:
(4.1) first preprocess the foreground images of the human body: for each behavior video segment, perform codebook-based background training, obtain the foreground image of the moving human body from the difference between each video frame and the background model, and apply morphological image processing to remove image noise;
(4.2) extract the features of the training data covering 11 kinds of postures to form a standard-sample feature library, and describe the posture features of the moving human body with the multi-feature fusion method; said postures are walking, small jump, big jump, sidling, squatting, stooping, crawling, push-up, sit-up and sitting down;
(4.3) build a multi-class classification model based on SVM by learning from the standard samples;
(4.4) verify the model with test samples; if the accuracy is lower than expected, adjust the training samples and return to (4.1) until the accuracy exceeds the expectation, yielding the trained classification model.
Further, in said step (3), the extraction process of said six-star model is as follows: after the centroid of the human body foreground image is extracted, the distances between the silhouette points and the centroid are obtained; the outline of the human body is divided into a left part and a right part, and for each part the distances from the centroid to the topmost, bottommost and outermost (leftmost or rightmost) silhouette points are computed. After the six axes from the centroid to the six points of the six-star model are obtained, the angle between each pair of adjacent axes is computed, and the eccentricity of the human silhouette is computed.
In said step (1), the coverage rate is computed as follows: first compute the similarity coefficient of every pair of frames; frames in the video whose similarity to the current frame is greater than the mean similarity coefficient between the current frame and all other frames are put into the related-frame set of the current frame; the coverage rate of the current frame is then the ratio of the number of frames in its related-frame set to the total number of frames in the video. The coverage rate of every frame in the video is computed, and the 30% of frames with the highest coverage rates are taken as candidate key frames.
The distortion rate is computed as follows: first compute the estimated probability of each gray value of the image and the gray-level mean of the image; then compute the first-, second- and third-order moments of each frame, which respectively represent the mean, variance and skewness, and represent the frame by this three-dimensional vector; compute the mean moment vector of each candidate key frame's related-frame set and the mean moment vector of all frames in the video; finally evaluate the objective function of these two mean moment vectors to obtain the distortion rate of each candidate key frame, and take the 50% of candidates with the lowest distortion rates as the final key frames of the video.
The beneficial effects of the present invention are mainly: (1) the multi-feature fusion posture descriptor used in the algorithm describes the human posture better, strengthens the descriptor's ability to characterize postures, and yields a more accurate posture model and recognition result; (2) the concept of the key frame is introduced into the video analysis process, so that posture analysis of the key frames describes the video as a whole and makes video analysis more efficient.
Description of drawings
Fig. 1 is a flowchart of the human body posture recognition method based on multi-feature fusion of key frames.
Fig. 2 is a flowchart of the key frame extraction method based on video content.
Fig. 3 is a schematic diagram of the six-star model feature of the multi-feature model.
Fig. 4 is a schematic diagram of the six-star angle feature of the multi-feature model.
Fig. 5 is a schematic diagram of the eccentricity of the multi-feature model.
Fig. 6 is a schematic diagram of the one-versus-one SVM design.
Embodiment
The present invention is further described below with reference to the drawings.
Referring to Figs. 1-6, a human body posture recognition method based on multi-feature fusion of key frames comprises the following steps:
(1) extracting Hu invariant moment features from the video images, computing the coverage rate of the image sequence, taking a set percentage of the frames with the highest coverage rates as candidate key frames, then computing the distortion rate of each candidate key frame and taking the candidates with the lowest distortion rates as the key frames;
(2) performing foreground extraction on each key frame to obtain the foreground image of the moving human body;
(3) extracting the feature information of each key frame, said feature information being the six-star model, the six-star angles and the eccentricity, and fusing them into a multi-feature image feature vector;
(4) recognizing the posture with a trained one-versus-one classification model, said classification model being a posture classifier based on SVM.
In the present embodiment, the training of the posture model comprises: multi-feature extraction from standard posture samples, construction of the posture descriptor, and learning of a multi-class classifier based on the Support Vector Machine (SVM). The posture recognition process comprises: content-based key frame extraction, multi-feature extraction from the key frames, construction of the posture descriptor, and posture recognition. The structure of the method is shown in Fig. 1; content-based key frame extraction, multi-feature fusion posture representation and SVM-based posture recognition are the key techniques of the method.
Content-based key frame extraction: considering algorithmic complexity and stability, the present invention proposes a key frame extraction algorithm based on video content analysis, which selects the key frame sequence carrying the most information in the video sequence by computing the coverage rate and distortion rate of the video frames.
Because video images may suffer certain geometric distortions during shooting, and geometric distortion greatly affects image recognition, a method with rotation and scale invariance is needed. This method therefore uses Hu invariant moment features to extract key frames; the flow of the invariant-moment-based key frame extraction algorithm is shown in Fig. 2.
(1) Hu invariant moments
The Hu invariant moments were proposed in 1962. They are invariant to translation, rotation and scale, and the moment space can express and analyze coordinate transforms of the image. For a discrete image, the ordinary moments and central moments of each order are computed. When the image translates, the ordinary moments change, while the central moments are translation invariant but still sensitive to rotation and scale. Representing features directly with ordinary or central moments therefore cannot make the features simultaneously invariant to translation, rotation and scale. If the normalized central moments are used instead, the features are invariant to both translation and scale.
Hu constructed seven invariant moments from the second- and third-order central moments; for continuous images they remain invariant under translation, scaling and rotation. Extracting the Hu invariant moments of each video image yields its feature descriptor.
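As an illustrative sketch (the patent itself gives no code), the seven Hu invariants can be computed from the normalized central moments using the standard textbook definitions; the NumPy implementation below follows those definitions directly:

```python
import numpy as np

def hu_moments(img):
    """Compute Hu's seven invariant moments of a 2-D grayscale/binary array."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def mu(p, q):   # central moment mu_pq (translation invariant)
        return ((x - xc) ** p * (y - yc) ** q * img).sum()

    def eta(p, q):  # normalized central moment (adds scale invariance)
        return mu(p, q) / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    h5 = ((n30 - 3 * n12) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    h6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03))
    h7 = ((3 * n21 - n03) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])
```

In practice a library routine (for example OpenCV's `cv2.HuMoments`) would normally be used; the hand-rolled version above only serves to show where the translation, scale and rotation invariance comes from.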
(2) coverage rate
After the Hu invariant moment features of the images are computed, the coverage rate of each image is computed from these features to extract the candidate key frames of the video. The coverage rate is computed as follows: first compute the similarity coefficient of every pair of frames; frames whose similarity to the current frame is greater than the mean similarity coefficient between the current frame and all other frames are put into the related-frame set of the current frame; the coverage rate of the current frame is then the ratio of the number of frames in its related-frame set to the total number of frames in the video. The coverage rate of every frame is computed, and the 30% of frames with the highest coverage rates are taken as candidate key frames.
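The coverage-rate procedure above can be sketched as follows. The patent does not specify the similarity coefficient, so a negative-exponential of Euclidean distance between Hu-moment vectors is used here as an illustrative assumption:

```python
import numpy as np

def coverage_rates(features):
    """features: (n_frames, d) array of per-frame Hu-moment vectors.
    Similarity is exp(-distance) -- an illustrative choice, since the
    patent does not give the exact similarity coefficient."""
    n = len(features)
    d = np.linalg.norm(features[:, None] - features[None, :], axis=2)
    sim = np.exp(-d)                  # pairwise similarity coefficients
    np.fill_diagonal(sim, 0.0)        # a frame is not compared with itself
    cover = np.empty(n)
    for i in range(n):
        mean_sim = sim[i].sum() / (n - 1)            # mean similarity of frame i
        related = np.flatnonzero(sim[i] > mean_sim)  # related-frame set
        cover[i] = len(related) / n                  # coverage rate
    return cover

def candidate_keyframes(features, top_frac=0.30):
    """Indices of the top 30% of frames by coverage rate."""
    cover = coverage_rates(features)
    k = max(1, int(round(top_frac * len(features))))
    return np.argsort(cover)[::-1][:k]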
(3) distortion rate
The coverage rate measures the probability that each frame can represent the information of the other frames, but it cannot tell which candidate key frame is the best one. We therefore compute the distortion rate between each candidate frame and the other frames in the video, and the candidates with the lowest distortion rates are chosen as key frames.
First compute the estimated probability of each gray value of the image and the gray-level mean of the image; then compute the first-, second- and third-order moments of each frame, which respectively represent the mean, variance and skewness, and represent the frame by this three-dimensional vector; compute the mean moment vector of each candidate key frame's related-frame set and the mean moment vector of all frames in the video; finally evaluate the objective function of these two mean moment vectors to obtain the distortion rate of each candidate, and take the 50% of candidates with the lowest distortion rates as the final key frames of the video.
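A sketch of the distortion-rate selection, under stated assumptions: the patent does not give the objective function, so the distance between the related-set mean moment vector and the global mean moment vector stands in for it here; smaller distance is treated as lower distortion:

```python
import numpy as np

def color_moments(frame):
    """First three gray-level moments of one frame: mean, standard
    deviation and (cube-rooted) skewness."""
    p = frame.astype(float).ravel()
    mean = p.mean()
    std = np.sqrt(((p - mean) ** 2).mean())
    skew = np.cbrt(((p - mean) ** 3).mean())
    return np.array([mean, std, skew])

def select_keyframes(frames, related_sets, candidates, keep_frac=0.50):
    """Keep the 50% of candidates whose related-frame-set mean moment
    vector is closest to the mean moment vector of all frames.  This
    distance is one plausible reading of the patent's (unspecified)
    objective function."""
    moments = np.array([color_moments(f) for f in frames])
    global_mean = moments.mean(axis=0)
    distortion = {c: np.linalg.norm(moments[list(related_sets[c])].mean(axis=0)
                                    - global_mean)
                  for c in candidates}
    k = max(1, int(round(keep_frac * len(candidates))))
    return sorted(candidates, key=lambda c: distortion[c])[:k]
```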
Multi-feature fusion posture representation: after the key frames of the video sequence are computed as above, the feature information of each key frame is extracted for posture recognition. The features adopted here are the six-star model, the six-star angles and the eccentricity, which are fused into a multi-feature image feature vector describing the human posture.
The multi-feature fusion posture description is combined with template matching: the fused features describe the human posture fairly accurately, overcoming the sensitivity of template matching to time intervals and strengthening the robustness of the method.
(1) six star model
For different human behaviors, the silhouette contains the richest information describing the behavior, so this method selects the six-star model as one of the features. The six-star model is obtained as follows: after the centroid of the human body foreground image is extracted, the distances between the silhouette points and the centroid are computed; the outline is divided into a left part and a right part, and for each part the distances from the centroid to the topmost, bottommost and outermost (leftmost or rightmost) silhouette points are computed, as shown in Fig. 3.
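A minimal sketch of the six-star extraction, assuming a binary silhouette mask as input (the patent works on the foreground image; exact point-selection rules for ties are not specified, so first occurrences are taken here):

```python
import numpy as np

def six_star(mask):
    """Six-star distances of a binary silhouette: split the silhouette
    points at the centroid into left/right halves, take each half's
    topmost, bottommost and outermost point, and return the six distances
    from the centroid together with the six points and the centroid."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                 # silhouette centroid
    pts = np.column_stack([ys, xs]).astype(float)
    left, right = pts[pts[:, 1] <= cx], pts[pts[:, 1] > cx]
    stars = np.array([left[left[:, 0].argmin()],    # left-top
                      left[left[:, 0].argmax()],    # left-bottom
                      left[left[:, 1].argmin()],    # leftmost
                      right[right[:, 0].argmin()],  # right-top
                      right[right[:, 0].argmax()],  # right-bottom
                      right[right[:, 1].argmax()]]) # rightmost
    dists = np.linalg.norm(stars - [cy, cx], axis=1)
    return dists, stars, (cy, cx)
```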
(2) six star angles
The second feature of the fusion is the six-star angles: after the six-star model has given the six axes from the centroid to the six points, the angle between each pair of adjacent axes is computed, yielding six angles, as shown in Fig. 4.
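The adjacent-axis angles can be sketched as follows, assuming the six star points are given as (row, col) pairs around the centroid (the output format of the illustrative `six_star` sketch above is one such form; any six points work):

```python
import numpy as np

def six_star_angles(stars, centroid):
    """Angles between adjacent centroid-to-star axes, in degrees.
    The six gaps around the centroid always sum to 360."""
    cy, cx = centroid
    ang = np.arctan2(stars[:, 0] - cy, stars[:, 1] - cx)  # polar angle of each axis
    ang = np.sort(ang)
    gaps = np.diff(np.concatenate([ang, [ang[0] + 2 * np.pi]]))
    return np.degrees(gaps)
```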
(3) eccentricity
This method extracts the eccentricity of the human contour as another feature: the eccentricity of the silhouette is computed by formula, completing the multi-feature fusion image feature vector, as shown in Fig. 5.
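The patent does not spell out its eccentricity formula; a standard choice, used here as an illustrative assumption, is the eccentricity of the silhouette's equivalent ellipse computed from the second-order moments (eigenvalues of the pixel-coordinate covariance):

```python
import numpy as np

def silhouette_eccentricity(mask):
    """Eccentricity of the silhouette's equivalent ellipse: 0 for a
    circular blob, approaching 1 for an elongated one."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.vstack([xs, ys]).astype(float))
    lam = np.sort(np.linalg.eigvalsh(cov))      # lam[0] <= lam[1]
    return np.sqrt(max(0.0, 1.0 - lam[0] / lam[1]))
```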
SVM-based posture classifier: this method models and classifies human behavior with a multi-class support vector classifier. The Support Vector Machine (SVM) obtains a classifier with good generalization ability under small-sample conditions. An SVM maps a nonlinear feature space into a linear space through a kernel function and then constructs a hyperplane in the transformed feature space to obtain the optimal model between classes. Because the fused multi-features extracted here form a nonlinear space, the radial basis function (RBF) kernel is selected for the feature-space transform, and the optimal classification surface between the classes is finally obtained.
Because the collected training samples contain a certain amount of noise from errors in the collection process, a penalty factor and a kernel parameter γ are defined in the support vector machine to solve the over-fitting and non-separability problems caused by noise. During training, the penalty factor and kernel parameter are determined by parameter tuning, so that an optimal classification surface is obtained and the multi-class SVM classifier is finally constructed.
After the feature vectors are transformed into a higher dimension through the kernel function, supervised learning on the training samples computes the classifier with functions in the lower-dimensional space. The present invention uses the one-versus-one multi-class classifier design: for N posture classes, one SVM is designed between every two classes of samples, so N*(N-1)/2 SVMs are needed. As shown in Fig. 6, for three posture classes, a classifier f1,2(x) is designed between classes 1 and 2, a classifier f2,3(x) between classes 2 and 3, and a classifier f1,3(x) between classes 1 and 3, three classifiers in total. When an unknown sample is classified, each classifier produces one result, and the class receiving the most votes is the class of the unknown sample.
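The one-versus-one scheme above can be sketched independently of any SVM library: enumerate all class pairs (N*(N-1)/2 of them, so 55 for the 11 postures) and classify by majority vote. The dummy `pair_classifiers` below stand in for trained binary SVMs:

```python
from itertools import combinations

def ovo_pairs(n_classes):
    """All class pairs for one-versus-one training: N*(N-1)/2 binary SVMs."""
    return list(combinations(range(n_classes), 2))

def ovo_predict(sample, pair_classifiers):
    """pair_classifiers maps (i, j) -> a callable returning i or j for a
    sample; the class with the most votes across all pairs wins."""
    votes = {}
    for clf in pair_classifiers.values():
        winner = clf(sample)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

In practice each pairwise callable would be a trained RBF-kernel SVM (e.g. scikit-learn's `SVC` implements this voting internally with `decision_function_shape='ovo'`); the sketch only shows the pair enumeration and voting logic.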
More than 400 video segments (720*576, 30 fps) were collected, comprising self-recorded videos of moving human bodies and the public Weizmann human behavior database. The video data were divided into a training set and a test set at a ratio of 2:1, covering 11 kinds of postures: walking (walk), jump 1 (jump1), jump 2 (jump2), sidling (sidewalk), squatting (squat), stooping (stoop), crawling (crawl), push-up (push-up), sit-up (sit-up) and sitting down (sit).
Training stage of the classification model:
(1) first preprocess the human body foreground: for each behavior video segment, perform codebook-based background training, obtain the foreground image of the moving human body from the difference between each video frame and the background model, and apply morphological image processing to remove image noise;
(2) extract the features of the training data covering the 11 kinds of postures (as shown in Figs. 3, 4 and 5) to form the standard-sample feature library, and describe the posture features of the moving human body with the multi-feature fusion method;
(3) build the multi-class classification model based on SVM by learning from the standard samples;
(4) verify the correctness of the model with the test samples; if the accuracy is lower than expected, adjust the training samples.
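The preprocessing in step (1) above can be sketched as follows. The patent uses a trained codebook background model; here a single static background frame and a fixed threshold stand in for it, and a simple morphological opening (3x3 cross) stands in for the patent's unspecified morphological processing:

```python
import numpy as np

def foreground_mask(frame, background, thresh=30):
    """Binary foreground from the absolute frame/background difference.
    `thresh` is an illustrative value, not taken from the patent."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def binary_open(mask, iterations=1):
    """Morphological opening (erosion then dilation with a 3x3 cross)
    to drop isolated noise pixels while keeping the body blob."""
    def shifts(m):
        p = np.pad(m, 1)
        return [p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                p[1:-1, :-2], p[1:-1, 2:]]
    for _ in range(iterations):
        mask = np.logical_and.reduce(shifts(mask))  # erosion
    for _ in range(iterations):
        mask = np.logical_or.reduce(shifts(mask))   # dilation
    return mask
```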
For an input video to be recognized, containing four actions and 214 frames in total, the algorithm extracts 32 key frames and recognizes the posture sequence, as shown in Figure 8. The detailed process is as follows:
(1) extract the Hu invariant moment features of the images and use them to compute the coverage rates of the video image sequence; take the 30% of frames with the highest coverage rates as candidate key frames, then compute the distortion rates of the candidates and take the 50% with the lowest distortion rates as key frames;
(2) perform foreground extraction on each key frame to obtain the foreground image of the moving human body;
(3) describe the human posture in the key frames with the multi-feature fusion posture descriptor;
(4) recognize the posture with the trained classification model.

Claims (5)

1. A human body posture recognition method based on multi-feature fusion of key frames, characterized in that said method comprises the following steps:
(1) extracting Hu invariant moment features from the video images, computing the coverage rate of the image sequence, taking a set percentage of the frames with the highest coverage rates as candidate key frames, then computing the distortion rate of each candidate key frame and taking the candidates with the lowest distortion rates as the key frames;
(2) performing foreground extraction on each key frame to obtain the foreground image of the moving human body;
(3) extracting the feature information of each key frame, said feature information being the six-star model, the six-star angles and the eccentricity, and fusing them into a multi-feature image feature vector;
(4) recognizing the posture with a trained one-versus-one classification model, said classification model being a posture classifier based on SVM.
2. The human body posture recognition method based on multi-feature fusion of key frames as claimed in claim 1, characterized in that: in said step (4), one SVM is designed between every two classes of samples, so for N posture classes a total of N*(N-1)/2 SVMs are needed.
3. The human body posture recognition method based on multi-feature fusion of key frames as claimed in claim 1 or 2, characterized in that: in said step (4), the training process of the classification model is as follows:
(4.1) first preprocess the foreground images of the human body: for each behavior video segment, perform codebook-based background training, obtain the foreground image of the moving human body from the difference between each video frame and the background model, and apply morphological image processing to remove image noise;
(4.2) extract the features of the training data covering 11 kinds of postures to form a standard-sample feature library, and describe the posture features of the moving human body with the multi-feature fusion method; said postures are walking, small jump, big jump, sidling, squatting, stooping, crawling, push-up, sit-up and sitting down;
(4.3) build a multi-class classification model based on SVM by learning from the standard samples;
(4.4) verify the model with test samples; if the accuracy is lower than expected, adjust the training samples and return to (4.1) until the accuracy exceeds the expectation, yielding the trained classification model.
4. The human body posture recognition method based on multi-feature fusion of key frames as claimed in claim 1 or 2, characterized in that: in said step (3), the extraction process of said six-star model is as follows: after the centroid of the human body foreground image is extracted, the distances between the silhouette points and the centroid are obtained; the outline of the human body is divided into a left part and a right part, and for each part the distances from the centroid to the topmost, bottommost and outermost (leftmost or rightmost) silhouette points are computed; after the six axes from the centroid to the six points of the six-star model are obtained, the angle between each pair of adjacent axes is computed, and the eccentricity of the human silhouette is computed.
5. The human body posture recognition method based on multi-feature fusion of key frames as claimed in claim 1 or 2, characterized in that: in said step (1), the coverage rate is computed as follows: first compute the similarity coefficient of every pair of frames; frames in the video whose similarity to the current frame is greater than the mean similarity coefficient between the current frame and all other frames are put into the related-frame set of the current frame; the coverage rate of the current frame is then the ratio of the number of frames in its related-frame set to the total number of frames in the video; the coverage rate of every frame in the video is computed, and the 30% of frames with the highest coverage rates are taken as candidate key frames;
the distortion rate is computed as follows: first compute the estimated probability of each gray value of the image and the gray-level mean of the image; then compute the first-, second- and third-order moments of each frame, which respectively represent the mean, variance and skewness, and represent the frame by this three-dimensional vector; compute the mean moment vector of each candidate key frame's related-frame set and the mean moment vector of all frames in the video; finally evaluate the objective function of these two mean moment vectors to obtain the distortion rate of each candidate key frame, and take the 50% of candidates with the lowest distortion rates as the final key frames of the video.
CN201210063893.5A 2012-03-12 2012-03-12 Human body posture identification method based on multi-characteristic fusion of key frame Active CN102682302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210063893.5A CN102682302B (en) 2012-03-12 2012-03-12 Human body posture identification method based on multi-characteristic fusion of key frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210063893.5A CN102682302B (en) 2012-03-12 2012-03-12 Human body posture identification method based on multi-characteristic fusion of key frame

Publications (2)

Publication Number Publication Date
CN102682302A true CN102682302A (en) 2012-09-19
CN102682302B CN102682302B (en) 2014-03-26

Family

ID=46814197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210063893.5A Active CN102682302B (en) 2012-03-12 2012-03-12 Human body posture identification method based on multi-characteristic fusion of key frame

Country Status (1)

Country Link
CN (1) CN102682302B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576953A (en) * 2009-06-10 2009-11-11 北京中星微电子有限公司 Classification method and device of human body posture
CN101794384A (en) * 2010-03-12 2010-08-04 浙江大学 Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry
US20110228976A1 (en) * 2010-03-19 2011-09-22 Microsoft Corporation Proxy training data for human body tracking
CN102289672A (en) * 2011-06-03 2011-12-21 天津大学 Infrared gait identification method adopting double-channel feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sun Bin: "Research on a moving-object recognition method for video surveillance based on support vector machines", Modern Computer *
Xie Fei et al.: "Recognition of multiple human postures based on support vector machines", Journal of Chongqing Institute of Technology (Natural Science) *

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218831B (en) * 2013-04-21 2015-11-18 北京航空航天大学 A kind of video frequency motion target classifying identification method based on profile constraint
CN103218831A (en) * 2013-04-21 2013-07-24 北京航空航天大学 Video moving target classification and identification method based on outline constraint
WO2014176790A1 (en) * 2013-05-03 2014-11-06 Nokia Corporation A method and technical equipment for people identification
CN105164696A (en) * 2013-05-03 2015-12-16 诺基亚技术有限公司 A method and technical equipment for people identification
CN104274182A (en) * 2013-07-01 2015-01-14 株式会社东芝 Motion information processing apparatus and method
WO2015078134A1 (en) * 2013-11-29 2015-06-04 华为技术有限公司 Video classification method and device
US10002296B2 (en) 2013-11-29 2018-06-19 Huawei Technologies Co., Ltd. Video classification method and apparatus
CN104331712B (en) * 2014-11-24 2017-08-25 齐齐哈尔格林环保科技开发有限公司 A kind of alga cells classification of images method
CN104331712A (en) * 2014-11-24 2015-02-04 齐齐哈尔格林环保科技开发有限公司 Automatic classifying method for algae cell images
CN105184767A (en) * 2015-07-22 2015-12-23 北京工业大学 Moving human body attitude similarity measuring method
CN105184767B (en) * 2015-07-22 2018-04-06 北京工业大学 A kind of movement human posture method for measuring similarity
CN105184257B (en) * 2015-09-08 2018-08-07 北京航空航天大学 Object detection method and device
CN105184257A (en) * 2015-09-08 2015-12-23 北京航空航天大学 Target detection method and device
CN106295532A (en) * 2016-08-01 2017-01-04 河海大学 A kind of human motion recognition method in video image
CN106295532B (en) * 2016-08-01 2019-09-24 河海大学 A kind of human motion recognition method in video image
CN106599907B (en) * 2016-11-29 2019-11-29 北京航空航天大学 The dynamic scene classification method and device of multiple features fusion
CN106599907A (en) * 2016-11-29 2017-04-26 北京航空航天大学 Multi-feature fusion-based dynamic scene classification method and apparatus
CN107239728A (en) * 2017-01-04 2017-10-10 北京深鉴智能科技有限公司 Unmanned plane interactive device and method based on deep learning Attitude estimation
CN106980815A (en) * 2017-02-07 2017-07-25 王俊 Facial paralysis objective evaluation method under being supervised based on H B rank scores
CN107087211A (en) * 2017-03-30 2017-08-22 北京奇艺世纪科技有限公司 A kind of anchor shots detection method and device
CN107203753A (en) * 2017-05-25 2017-09-26 西安工业大学 A kind of action identification method based on fuzzy neural network and graph model reasoning
CN107203753B (en) * 2017-05-25 2020-09-08 西安工业大学 Action recognition method based on fuzzy neural network and graph model reasoning
CN107330414A (en) * 2017-07-07 2017-11-07 郑州轻工业学院 Act of violence monitoring method
CN107483887A (en) * 2017-08-11 2017-12-15 中国地质大学(武汉) The early-warning detection method of emergency case in a kind of smart city video monitoring
CN107483887B (en) * 2017-08-11 2020-05-22 中国地质大学(武汉) Early warning detection method for emergency in smart city video monitoring
CN109670520A (en) * 2017-10-13 2019-04-23 杭州海康威视数字技术股份有限公司 A kind of targeted attitude recognition methods, device and electronic equipment
CN107832713B (en) * 2017-11-13 2021-11-16 南京邮电大学 Human body posture recognition method based on OptiTrack
CN107832713A (en) * 2017-11-13 2018-03-23 南京邮电大学 A kind of human posture recognition method based on OptiTrack
CN107798313A (en) * 2017-11-22 2018-03-13 杨晓艳 A kind of human posture recognition method, device, terminal and storage medium
WO2019114405A1 (en) * 2017-12-13 2019-06-20 北京市商汤科技开发有限公司 Video recognition and training method and apparatus, electronic device and medium
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video identification and training method and device, electronic equipment, program and medium
US10909380B2 (en) 2017-12-13 2021-02-02 Beijing Sensetime Technology Development Co., Ltd Methods and apparatuses for recognizing video and training, electronic device and medium
CN108256433A (en) * 2017-12-22 2018-07-06 银河水滴科技(北京)有限公司 A kind of athletic posture appraisal procedure and system
CN108256433B (en) * 2017-12-22 2020-12-25 银河水滴科技(北京)有限公司 Motion attitude assessment method and system
CN108681740A (en) * 2018-04-04 2018-10-19 儒安科技有限公司 Vehicle type classification method based on multi-category support vector machines
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment
CN110400332B (en) * 2018-04-25 2021-11-05 杭州海康威视数字技术股份有限公司 Target detection tracking method and device and computer equipment
CN108615241A (en) * 2018-04-28 2018-10-02 四川大学 A kind of quick estimation method of human posture based on light stream
CN109190474B (en) * 2018-08-01 2021-07-20 南昌大学 Human body animation key frame extraction method based on gesture significance
CN109190474A (en) * 2018-08-01 2019-01-11 南昌大学 Human body animation extraction method of key frame based on posture conspicuousness
CN108965920A (en) * 2018-08-08 2018-12-07 北京未来媒体科技股份有限公司 A kind of video content demolition method and device
CN109583340A (en) * 2018-11-15 2019-04-05 中山大学 A kind of video object detection method based on deep learning
CN109508684B (en) * 2018-11-21 2022-12-27 中山大学 Method for recognizing human behavior in video
CN109508684A (en) * 2018-11-21 2019-03-22 中山大学 A kind of method of Human bodys' response in video
CN109858406B (en) * 2019-01-17 2023-04-07 西北大学 Key frame extraction method based on joint point information
CN109858406A (en) * 2019-01-17 2019-06-07 西北大学 A kind of extraction method of key frame based on artis information
CN110309720A (en) * 2019-05-27 2019-10-08 北京奇艺世纪科技有限公司 Video detecting method, device, electronic equipment and computer-readable medium
CN110457999B (en) * 2019-06-27 2022-11-04 广东工业大学 Animal posture behavior estimation and mood recognition method based on deep learning and SVM
CN110457999A (en) * 2019-06-27 2019-11-15 广东工业大学 A kind of animal posture behavior estimation based on deep learning and SVM and mood recognition methods
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN111368810B (en) * 2020-05-26 2020-08-25 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
CN111368810A (en) * 2020-05-26 2020-07-03 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
US20210390713A1 (en) * 2020-06-12 2021-12-16 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
WO2021248432A1 (en) * 2020-06-12 2021-12-16 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
US11830204B2 (en) * 2020-06-12 2023-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
CN111797714B (en) * 2020-06-16 2022-04-26 浙江大学 Multi-view human motion capture method based on key point clustering
CN111797714A (en) * 2020-06-16 2020-10-20 浙江大学 Multi-view human motion capture method based on key point clustering
CN111783650A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Model training method, action recognition method, device, equipment and storage medium
CN112090053A (en) * 2020-09-14 2020-12-18 成都拟合未来科技有限公司 3D interactive fitness training method, device, equipment and medium
CN112932470A (en) * 2021-01-27 2021-06-11 上海萱闱医疗科技有限公司 Push-up training evaluation method and device, equipment and storage medium
CN112932470B (en) * 2021-01-27 2023-12-29 上海萱闱医疗科技有限公司 Assessment method and device for push-up training, equipment and storage medium
CN112926522A (en) * 2021-03-30 2021-06-08 广东省科学院智能制造研究所 Behavior identification method based on skeleton attitude and space-time diagram convolutional network
CN112926522B (en) * 2021-03-30 2023-11-24 广东省科学院智能制造研究所 Behavior recognition method based on skeleton gesture and space-time diagram convolution network
WO2023197390A1 (en) * 2022-04-15 2023-10-19 北京航空航天大学杭州创新研究院 Posture tracking method and apparatus, electronic device, and computer readable medium
CN116310015A (en) * 2023-03-15 2023-06-23 杭州若夕企业管理有限公司 Computer system, method and medium

Also Published As

Publication number Publication date
CN102682302B (en) 2014-03-26

Similar Documents

Publication Publication Date Title
CN102682302B (en) Human body posture identification method based on multi-characteristic fusion of key frame
Singh et al. Video benchmarks of human action datasets: a review
Zhu et al. Fusing spatiotemporal features and joints for 3d action recognition
Song et al. Tracking revisited using RGBD camera: Unified benchmark and baselines
Kuo et al. How does person identity recognition help multi-person tracking?
CN103295016B (en) Behavior recognition method based on depth and RGB information and multi-scale and multidirectional rank and level characteristics
Kadkhodamohammadi et al. A multi-view RGB-D approach for human pose estimation in operating rooms
CN105095884B (en) A kind of pedestrian's identifying system and processing method based on random forest support vector machines
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN106295568A (en) The mankind's naturalness emotion identification method combined based on expression and behavior bimodal
CN103942577A (en) Identity identification method based on self-established sample library and composite characters in video monitoring
CN107944431A (en) A kind of intelligent identification Method based on motion change
CN109711366A (en) A kind of recognition methods again of the pedestrian based on group information loss function
CN104281572B (en) A kind of target matching method and its system based on mutual information
CN104517097A (en) Kinect-based moving human body posture recognition method
CN103186775A (en) Human body motion recognition method based on mixed descriptor
CN104809469A (en) Indoor scene image classification method facing service robot
Chen et al. TriViews: A general framework to use 3D depth data effectively for action recognition
CN103955680A (en) Action recognition method and device based on shape context
Tanisik et al. Facial descriptors for human interaction recognition in still images
CN102855488A (en) Three-dimensional gesture recognition method and system
Wei et al. Human Activity Recognition using Deep Neural Network with Contextual Information.
CN101826155A (en) Method for identifying act of shooting based on Haar characteristic and dynamic time sequence matching
Batool et al. Telemonitoring of daily activities based on multi-sensors data fusion
Yan et al. Human-object interaction recognition using multitask neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant