CN106156775A - Human body feature extraction method based on video, human body recognition method and device - Google Patents


Info

Publication number: CN106156775A
Application number: CN201510148613.4A
Authority: CN (China)
Prior art keywords: video, image block, pixel, training, space
Legal status: Granted (Active)
Other versions: CN106156775B (granted publication)
Other languages: Chinese (zh)
Inventor: 黄锐
Assignee (original and current): NEC Corp
Application filed by NEC Corp; priority to CN201510148613.4A; publication of application CN106156775A; application granted; publication of granted patent CN106156775B


Abstract

The present invention relates to a video-based human body feature extraction method, a human body recognition method, and corresponding devices. The video-based human body feature extraction method extracts, from a video, human body features that represent the appearance characteristics of an individual, and includes: segmenting the video by the individual's step cycle to obtain at least one video segment; dividing each video segment into blocks by the individual's body parts to obtain at least one image block set; and using a trained Gaussian mixture model to extract one spatio-temporal feature vector from each image block set. The video of the stepping individual can thereby be divided in a physically meaningful way in both time and space, and the resulting image block sets can be effectively aligned in space and time, so that human body features are extracted accurately.

Description

Human body feature extraction method based on video, human body recognition method and device
Technical field
The present invention relates to the fields of video content analysis and video surveillance, and in particular to a video-based human body feature extraction method, a human body recognition method, and corresponding devices.
Background art
In video content analysis and video surveillance, it is often necessary to identify the identity of people appearing in a video. Commonly used recognition methods include face recognition and human body recognition. Human body recognition identifies a person's identity using the appearance characteristics of the whole body, and is often used to find specific people in large amounts of surveillance video; for example, public security departments can use appearance characteristics such as clothing to narrow the scope of a manual search and investigation for a suspect or a missing person. Another application of human body recognition is to identify personnel wearing a specific uniform, for convenient access control or people counting.
The most common human body recognition methods are based on (single or multiple) still images. Similar to face recognition, features are extracted from an image that includes the whole body, and the identity of an unknown person is determined by comparing the features of the unknown person with those of known people. Because still images can hardly fully describe a person's appearance under different postures, a small amount of research on video-based human body recognition has appeared. The method in reference [1] works as follows: the video sequence containing a walking person is first divided into short videos, each of which spatially contains essentially only the human body and a small amount of background, and temporally contains roughly one step cycle. Each such video is treated as a three-dimensional data block, from which a three-dimensional feature such as HOG (Histogram of Oriented Gradients) is extracted to represent the appearance of the body; the identity of an unknown person is then determined by comparing the features of the unknown person with those of known people.
However, the three-dimensional HOG feature extracted by this method is a histogram feature based on fixed blocks, and there is no way to ensure that body parts are spatially aligned between successive frames of the video. The features of the human body therefore cannot be extracted accurately, which easily leads to inaccurate recognition results.
List of references:
[1] Wang et al., "Person Re-identification by Video Ranking", ECCV 2014.
Summary of the invention
Technical problem
In view of this, the technical problem to be solved by the present invention is how to extract, from a video, human body features that represent a person's appearance characteristics, so as to improve as far as possible the accuracy of video-based human body recognition.
Solution
To solve the above problem, an embodiment of the present invention provides a video-based human body feature extraction method for extracting, from a video, human body features that represent the appearance characteristics of an individual, including:
segmenting the video by the individual's step cycle to obtain at least one video segment;
dividing each video segment into blocks by the individual's body parts to obtain at least one image block set, where one image block set includes the image blocks for the same body part in all frames of the video segment;
using a trained Gaussian mixture model to extract one spatio-temporal feature vector from each image block set, where the spatio-temporal feature vector represents the appearance characteristics of the body part targeted by the image block set in the video segment associated with that image block set.
In one possible implementation, segmenting the video by the individual's step cycle to obtain at least one video segment includes:
calculating the optical flow intensity signal of each frame of the video, and obtaining the actual optical flow intensity curve of the video from the optical flow intensity signals;
applying a Fourier transform to the optical flow intensity signals of the frames to obtain a regularized signal, and obtaining the fundamental frequency of the regularized signal in the frequency domain;
applying an inverse Fourier transform to the regularized signal according to the fundamental frequency, to obtain the ideal optical flow intensity curve of the video;
segmenting the video according to the extreme values of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.
In one possible implementation, the Gaussian mixture model is trained by the following steps:
segmenting a video sample by the step cycle to obtain at least one training video segment;
dividing each training video segment into blocks by body part to obtain at least one training image block set, where one training image block set includes the image blocks for the same body part in all frames of the training video segment;
for each training image block set, classifying the pixels of each image block in the training image block set according to their low-level features, and training for each pixel class a Gaussian model as in formula 1, where the Gaussian mixture model includes the Gaussian model corresponding to each pixel class of the training image block set;
Θ = {(μ_k, σ_k, π_k) : k = 1, …, K}    (formula 1)
where Θ is the set of Gaussian models, one per pixel class, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
In one possible implementation, using the trained Gaussian mixture model to extract one spatio-temporal feature vector from each image block set includes:
for each image block set, extracting the low-level feature of each pixel;
obtaining the feature vector of each pixel from the low-level features of the pixels of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model;
averaging the computed feature vectors of the pixels to obtain the spatio-temporal feature vector of the image block set.
To solve the above problem, an embodiment of the present invention provides a video-based human body recognition method, including:
extracting the spatio-temporal feature vector of a known person from a video involving the known person, according to any video-based human body feature extraction method of the embodiments of the present invention;
extracting the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified, according to any video-based human body feature extraction method of the embodiments of the present invention;
comparing the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.
To solve the above problem, an embodiment of the present invention provides a video-based human body feature extraction device for extracting, from a video, human body features that represent the appearance characteristics of an individual, including:
a temporal division module, configured to segment the video by the individual's step cycle to obtain at least one video segment;
a spatial division module, configured to divide each video segment into blocks by the individual's body parts to obtain at least one image block set, where one image block set includes the image blocks for the same body part in all frames of the video segment;
a feature extraction module, configured to use a trained Gaussian mixture model to extract one spatio-temporal feature vector from each image block set, where the spatio-temporal feature vector represents the appearance characteristics of the body part targeted by the image block set in the video segment associated with that image block set.
In one possible implementation, the temporal division module includes:
an optical flow computation submodule, configured to calculate the optical flow intensity signal of each frame of the video and obtain the actual optical flow intensity curve of the video from the optical flow intensity signals;
a Fourier transform submodule, configured to apply a Fourier transform to the optical flow intensity signals of the frames to obtain a regularized signal, and to obtain the fundamental frequency of the regularized signal in the frequency domain;
an inverse Fourier transform submodule, configured to apply an inverse Fourier transform to the regularized signal according to the fundamental frequency, to obtain the ideal optical flow intensity curve of the video;
a segmentation submodule, configured to segment the video according to the extreme values of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.
In one possible implementation:
the temporal division module is further configured to segment a video sample by the step cycle to obtain at least one training video segment;
the spatial division module is further configured to divide each training video segment into blocks by body part to obtain at least one training image block set, where one training image block set includes the image blocks for the same body part in all frames of the training video segment;
the device further includes:
a training module, configured to, for each training image block set, classify the pixels of each image block in the training image block set according to their low-level features, and to train for each pixel class a Gaussian model as in formula 1, where the Gaussian mixture model includes the Gaussian model corresponding to each pixel class of the training image block set;
Θ = {(μ_k, σ_k, π_k) : k = 1, …, K}    (formula 1)
where Θ is the set of Gaussian models, one per pixel class, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
In one possible implementation, the feature extraction module includes:
a low-level feature extraction submodule, configured to extract, for each image block set, the low-level feature of each pixel;
a feature vector extraction submodule, configured to obtain the feature vector of each pixel from the low-level features of the pixels of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model;
a feature vector averaging submodule, configured to average the computed feature vectors of the pixels to obtain the spatio-temporal feature vector of the image block set.
To solve the above problem, an embodiment of the present invention provides a video-based human body recognition device, including the video-based human body feature extraction device of any structure in the embodiments of the present invention,
where the video-based human body feature extraction device is configured to extract the spatio-temporal feature vector of a known person from a video involving the known person, and to extract the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified;
the video-based human body recognition device further includes:
a comparison module, configured to compare the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.
Beneficial effect
According to the embodiments of the present invention, a video can be segmented by the step cycle, and each video segment can be divided into blocks by body part to obtain image block sets. The video of a stepping individual is thereby divided in a physically meaningful way in both time and space, and the resulting image block sets can be effectively aligned in space and time, so that human body features can be extracted accurately from the image block sets.
Other features and aspects of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the present invention together with the specification, and serve to explain the principles of the present invention.
Fig. 1 is a flowchart of a video-based human body feature extraction method according to an embodiment of the present invention;
Fig. 2 is a flowchart of video segmentation in a video-based human body feature extraction method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of video segmentation in a video-based human body feature extraction method according to an embodiment of the present invention;
Fig. 4 is a flowchart of training a Gaussian mixture model in a video-based human body feature extraction method according to an embodiment of the present invention;
Fig. 5 is a flowchart of feature extraction in a video-based human body feature extraction method according to an embodiment of the present invention;
Fig. 6 is a flowchart of a video-based human body recognition method according to an embodiment of the present invention;
Fig. 7 is a structural diagram of a video-based human body feature extraction device according to an embodiment of the present invention;
Fig. 8 is a structural diagram of a video-based human body recognition device according to an embodiment of the present invention.
Detailed description of the invention
Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" used here means "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" should not be construed as preferred to or advantageous over other embodiments.
In addition, numerous specific details are given in the following detailed description in order to better explain the present invention. Those skilled in the art will understand that the present invention can equally be implemented without some of these specific details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present invention.
Fig. 1 is a flowchart of a video-based human body feature extraction method according to an embodiment of the present invention. The method can be used to extract, from a video, human body features that represent the appearance characteristics of an individual. As shown in Fig. 1, the method mainly includes:
Step 101: segmenting the video by the individual's step cycle, to obtain at least one video segment.
Specifically, segmenting a video by the individual's step cycle can involve two steps. First, a video of a stepping person can be divided into video segments, each covering one complete step cycle of the individual. The resulting video segments may have different lengths, but each usually includes one complete step cycle, for example from the left foot lifting off to the right foot landing. Then, a video segment covering one complete step cycle can be further subdivided according to the motion states of the body parts, for example into four sub-segments: lifting the left leg, lowering the left leg, lifting the right leg, and lowering the right leg. As another example, after extracting a video segment covering one complete step cycle from the video, body part localization can be performed first; since the head and torso may change little over the whole step cycle, the video segments for the head and torso need not be further subdivided (see Fig. 3). The above division is only an example; the lengths of the video segments can be chosen flexibly according to the duration of the actual video, the body motion states, and so on, and the present invention does not limit the specific method of video segmentation.
In one possible implementation, as shown in Fig. 2, the specific method of video segmentation may include:
Step 201: calculating the optical flow intensity signal of each frame of the video, and obtaining the actual optical flow intensity curve of the video from the optical flow intensity signals;
Step 202: applying a Fourier transform (FFT) to the optical flow intensity signals of the frames to obtain a regularized signal;
Step 203: obtaining the fundamental frequency of the regularized signal in the frequency domain;
Step 204: applying an inverse Fourier transform (IFFT) to the regularized signal according to the fundamental frequency, to obtain the ideal optical flow intensity curve of the video;
Step 205: segmenting the video according to the extreme values (local optima) of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.
For example, as shown in Figs. 2 and 3, optical flow can first be computed for each frame of the video V = (I_1, …, I_t) to obtain the optical flow intensity of each frame; linking the optical flow intensities of all frames gives a one-dimensional signal M = (m_1, …, m_t) (the motion profile). The dotted curve in Fig. 3 is the actual optical flow intensity curve obtained from the optical flow intensity signal M. The motion curve is then regularized based on an ideal curve with constant frequency, turning the irregular actual optical flow intensity curve into a regular one. One way to implement this is to apply a Fourier transform (FFT) to the signal M, obtain the fundamental frequency M' of the signal in the frequency domain, and perform an inverse Fourier transform (IFFT) according to this fundamental frequency M' to obtain the solid curve, which is the ideal optical flow intensity curve M*. Then, the extreme points of the dotted curve are found from the extreme points of the solid curve, and according to these extreme points the whole video V is divided into short video segments S_i; each video segment covers a process from a minimum to a maximum or from a maximum to a minimum. In general, the process from a minimum to a maximum corresponds to one leg stepping forward, the process from that maximum to the next minimum corresponds to the leg landing, and the other leg follows; a complete step cycle therefore typically includes four extreme-value transitions. Each video segment S_i can be further subdivided according to the motion of the body parts.
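Steps 202-205 can be sketched as follows. This is a simplified illustration, not part of the disclosure: it segments at the extrema of the ideal curve M* itself rather than mapping them back to the nearest extrema of the actual curve M, and the function name and synthetic test signal are our own.

```python
import numpy as np

def segment_by_step_cycle(motion):
    """Split a per-frame optical-flow intensity signal M at the extrema of
    its ideal single-frequency reconstruction M* (steps 202-205)."""
    m = np.asarray(motion, dtype=float) - np.mean(motion)
    spectrum = np.fft.rfft(m)                       # step 202: FFT
    k = int(np.argmax(np.abs(spectrum[1:])) + 1)    # step 203: fundamental frequency
    ideal = np.zeros_like(spectrum)
    ideal[k] = spectrum[k]                          # keep only the fundamental
    m_star = np.fft.irfft(ideal, n=len(m))          # step 204: IFFT -> ideal curve M*
    d = np.diff(m_star)                             # step 205: extrema = sign changes
    extrema = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0] + 1
    return [(int(a), int(b)) for a, b in zip(extrema[:-1], extrema[1:])]
```

On a synthetic motion profile with four step cycles, the eight extrema yield seven half-step segments of roughly equal length.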
Step 102: dividing each video segment into blocks by the individual's body parts, to obtain at least one image block set, where one image block set includes the image blocks for the same body part in all frames of the video segment.
Specifically, each frame of the above video segment S_i can be divided into background and foreground body parts, for example six parts: head, torso, left arm, right arm, left leg and right leg. Any body part segmentation algorithm can be used, for example a Deformable Part Model, to divide the frame into image blocks; alternatively, a template learned from many aligned examples can be applied directly to the frame, dividing each frame to obtain the position and extent of the six parts (head, torso, left arm, right arm, left leg and right leg). In this way, within the video segment, for each body part, one image block set collects the image blocks of that part across all frames of the video segment.
Combining the temporal segmentation and the spatial division yields small blocks corresponding to the concrete motion of concrete body parts; each image block set can thus be regarded as a spatio-temporally aligned block S_ijk. Here, i indexes the step cycle within the whole video; j indexes the time period within a step cycle — for example, if a person's complete step cycle (each leg stepping once) is divided into 4 periods, then j = 1, …, 4; and k indexes the body part — for example, k = 1, …, 6 when the whole body is divided into head, torso, left arm, right arm, left leg and right leg. The video of this individual's walking can thus be divided into 4 × 6 = 24 spatio-temporally aligned (temporal alignment and spatial alignment) blocks (video blocks). Each spatio-temporally aligned block has physical meaning in both time and space, for example the left-arm block S_113 during the first quarter of step cycle 1, or the right-leg block S_246 during the last quarter of step cycle 2. Compared with HOG3D, which uses a regular grid to divide a video segment into small blocks of fixed height, width and length, the spatio-temporally aligned blocks (image block sets) of this embodiment can flexibly adapt to different spatial and temporal configurations. For example, referring to Fig. 3, according to the temporal and spatial division, a video segment S_i covering one complete step cycle can be divided into one head block, one torso block, two right-arm blocks RA1H and RA2H, two left-arm blocks LA1H and LA2H, two right-leg blocks RL1H and RL2H, and two left-leg blocks LL1H and LL2H.
It should be noted that although this embodiment illustrates the temporal and spatial division of a video with the above example, other temporal and spatial divisions can be made; any method that divides the complete stepping process of an individual into small blocks that have physical meaning in both time and space can be used. Then, when comparing features between people, the comparison can be made between temporally and spatially corresponding aligned blocks.
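As a toy illustration of the 4 × 6 division described above: the fixed rectangular part template below is our stand-in for a real body-part localizer (such as a Deformable Part Model), and the region fractions are arbitrary assumptions, not values from the disclosure.

```python
import numpy as np

def spatiotemporal_blocks(video, n_time=4):
    """Split a one-step-cycle clip of shape (T, H, W) into 4 temporal
    segments x 6 body-part regions = 24 spatio-temporally aligned blocks."""
    T, H, W = video.shape[:3]
    t_edges = np.linspace(0, T, n_time + 1, dtype=int)
    regions = {                      # (y0, y1, x0, x1) as fractions of the frame
        "head":      (0.00, 0.15, 0.25, 0.75),
        "torso":     (0.15, 0.55, 0.25, 0.75),
        "left_arm":  (0.15, 0.55, 0.00, 0.25),
        "right_arm": (0.15, 0.55, 0.75, 1.00),
        "left_leg":  (0.55, 1.00, 0.00, 0.50),
        "right_leg": (0.55, 1.00, 0.50, 1.00),
    }
    blocks = {}
    for j in range(n_time):
        seg = video[t_edges[j]:t_edges[j + 1]]
        for name, (y0, y1, x0, x1) in regions.items():
            blocks[(j, name)] = seg[:, int(y0 * H):int(y1 * H),
                                    int(x0 * W):int(x1 * W)]
    return blocks
```

Unlike a fixed HOG3D grid, the temporal edges and part regions here could be set per segment and per frame from the detected cycle and part locations.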
Step 103: using the trained Gaussian mixture model to extract one spatio-temporal feature vector from each image block set, where the spatio-temporal feature vector represents the appearance characteristics of the body part targeted by the image block set in the video segment associated with that image block set.
In one possible implementation, the Gaussian mixture model (GMM) of this embodiment refers to the combination of the multiple Gaussian models corresponding to each image block set. As shown in Fig. 4, the Gaussian mixture model can be trained by the following steps:
Step 401: segmenting a video sample by the step cycle, to obtain at least one training video segment;
Step 402: dividing each training video segment into blocks by body part, to obtain at least one training image block set, where one training image block set includes the image blocks for the same body part in all frames of the training video segment;
Step 403: for each training image block set, classifying the pixels of each image block in the training image block set according to their low-level features, and training for each pixel class a Gaussian model as in formula 1, where the Gaussian mixture model includes the Gaussian model corresponding to each pixel class of the training image block set;
Θ = {(μ_k, σ_k, π_k) : k = 1, …, K}    (formula 1)
where Θ is the set of Gaussian models, one per pixel class, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
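Step 403 classifies pixels by low-level feature and fits one Gaussian per class. A minimal numpy sketch under our own simplifications — hard k-means-style assignment instead of full EM, per-dimension (diagonal) statistics, and a deterministic initialization — rather than the disclosure's exact training procedure:

```python
import numpy as np

def train_pixel_gmm(features, K=3, n_iter=20):
    """Fit Theta = {(mu_k, sigma_k, pi_k) : k = 1..K} over the pooled
    per-pixel low-level features of one training image-block set (formula 1)."""
    features = np.asarray(features, dtype=float)
    # deterministic init: spread the K initial means along the first dimension
    order = np.argsort(features[:, 0])
    mu = features[order[np.linspace(0, len(features) - 1, K, dtype=int)]]
    for _ in range(n_iter):
        # classify each pixel feature to its nearest class mean
        labels = np.argmin(((features[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        mu = np.array([features[labels == k].mean(0) for k in range(K)])
    sigma = np.array([features[labels == k].std(0) + 1e-6 for k in range(K)])
    pi = np.bincount(labels, minlength=K) / len(features)
    return mu, sigma, pi
```

On well-separated pixel classes this recovers the class means, spreads and weights; a production version would use full EM (soft assignments) as in standard GMM training.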
It should be noted that steps 401 to 403 and steps 101 to 103 can be performed serially or in parallel. Specifically, a serial algorithm can perform body part localization on the basis of the step cycle, using prior knowledge of the human body at different phases of the step cycle to help locate each body part, and then further subdivide a video segment covering a complete step cycle by body part. A parallel algorithm, using an existing body part localization technique (which does not depend on estimating the step cycle), can make full use of computing resources (for example, when multi-core computing resources are available) by executing the two modules in parallel, reducing the computation time.
In one possible implementation, as shown in Fig. 5, step 103 may specifically include:
Step 501: for each image block set, extracting the low-level feature of each pixel, for example based on the following formula 2;
f(x, y) = [x̃, ỹ, I(x, y), ∂I/∂x, ∂I/∂y, ∂²I/∂x², ∂²I/∂y²]    (formula 2)
where (x̃, ỹ) is the relative position of the pixel in the image block set, I(x, y) is the color of the pixel, ∂I/∂x, ∂I/∂y, ∂²I/∂x² and ∂²I/∂y² are the gradients of the pixel, and f(x, y) is the low-level feature of the pixel.
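Formula 2 can be computed per pixel as sketched below, for a single grayscale image block; `np.gradient` is our choice of discrete derivative, and normalizing the relative position by the block size is our assumption.

```python
import numpy as np

def pixel_low_level_features(block):
    """7-D low-level feature f(x, y) per pixel of one grayscale image block:
    relative position, intensity, first and second derivatives (formula 2)."""
    block = np.asarray(block, dtype=float)
    H, W = block.shape
    ys, xs = np.mgrid[0:H, 0:W]
    dIdx = np.gradient(block, axis=1)             # dI/dx
    dIdy = np.gradient(block, axis=0)             # dI/dy
    d2Idx2 = np.gradient(dIdx, axis=1)            # d2I/dx2
    d2Idy2 = np.gradient(dIdy, axis=0)            # d2I/dy2
    feats = np.stack([xs / W, ys / H, block, dIdx, dIdy, d2Idx2, d2Idy2], axis=-1)
    return feats.reshape(-1, 7)                   # one row per pixel
```

For a color block, I(x, y) and its derivatives would contribute one group of columns per channel.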
Step 502: from the low-level feature of each pixel of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model, obtain the feature vector of each pixel as in the following formula 3.

Φ = [u_1, v_1, w_1, …, u_K, v_K, w_K]   (formula 3)

where, from the low-level features of the pixels in the image block set and the k-th Gaussian model,

u_k = (1 / (|W| √π_k)) Σ_{i ∈ idx(W)} w_ik (f_i − μ_k) / σ_k,

v_k = (1 / (|W| √(2 π_k))) Σ_{i ∈ idx(W)} w_ik [((f_i − μ_k) / σ_k)² − 1];

further, w_ik = π_k p_k(f_i) / Σ_{j=1}^{K} π_j p_j(f_i), where f_i is the low-level feature of pixel i, w_ik is the posterior probability that f_i belongs to the k-th Gaussian model (it is also the weight vector of the Fisher vector), and p is the probability distribution function.
The above is only one example based on the Fisher vector; other methods of computing the Fisher vector, or algorithms similar to the Fisher vector, may also be used to perform feature extraction from the pixel low-level features and the Gaussian mixture model.
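One possible realization of formula 3 is sketched below in numpy. It assumes a diagonal-covariance mixture (the patent allows other variance shapes) and directly computes the block-level Fisher vector averaged over all pixels, i.e. it folds the averaging of step 503 into the sums.

```python
import numpy as np

def fisher_vector(F, priors, mu, var):
    """Averaged Fisher vector of one image block set under a diagonal GMM.
    F: N x D pixel features; priors: (K,) weights; mu, var: K x D.
    Returns the (2D+1)*K vector [u_1, v_1, w_1, ..., u_K, v_K, w_K]."""
    N, D = F.shape
    K = priors.shape[0]
    sigma = np.sqrt(var)
    # posterior w[i, k] that pixel i belongs to Gaussian k (log-space for stability)
    logp = (-0.5 * (((F[:, None, :] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(axis=2)
            + np.log(priors))
    logp -= logp.max(axis=1, keepdims=True)
    w = np.exp(logp)
    w /= w.sum(axis=1, keepdims=True)
    parts = []
    for k in range(K):
        d = (F - mu[k]) / sigma[k]                                    # N x D
        u_k = (w[:, k, None] * d).sum(0) / (N * np.sqrt(priors[k]))
        v_k = (w[:, k, None] * (d ** 2 - 1)).sum(0) / (N * np.sqrt(2 * priors[k]))
        w_k = w[:, k].mean()                                          # averaged posterior
        parts.append(np.concatenate([u_k, v_k, [w_k]]))
    return np.concatenate(parts)
```

With D = 7 and K = 16 this yields the (2 × 7 + 1) × 16 = 240-dimensional vector discussed in the example below.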
Step 503: average the feature vectors of the pixels computed by formula 3 to obtain the spatio-temporal feature vector of the image block set.

Then, the spatio-temporal feature vectors of all the image block sets of all the video segments of the video can be concatenated to obtain the spatio-temporal feature vector of the target person represented by the video (Spatio-temporal Fisher Vectors, STFVs).

Specifically, the complete description of an individual's walking is one long feature vector, formed by concatenating the spatio-temporal feature vectors of the small blocks (image block sets) described above. Following the example above, the spatio-temporal feature vector of the video is formed by concatenating the spatio-temporal feature vectors of the 24 image block sets, each of which is extracted from its own image block set.

With the video-based human body feature extraction method of the embodiment of the present invention, the video can be segmented by gait cycle and the video segments partitioned into blocks by body part to obtain image block sets, thereby dividing the video of a walking individual in a way that is physically meaningful in both time and space. Thus even for individuals with different postures in different videos, for example with inconsistent paces or misaligned body parts, the image block sets obtained from the division can be aligned effectively, so that the features of unknown and known persons can be extracted accurately through the image block sets.
Fig. 6 is a flowchart of a video-based human body recognition method according to an embodiment of the present invention. As shown in Fig. 6, the method mainly performs feature extraction using the video-based human body feature extraction method of the above embodiment, and specifically includes:

Step 601: according to the above video-based human body feature extraction method, extract the spatio-temporal feature vector of a known person from a video involving the known person;

Step 602: according to the above video-based human body feature extraction method, extract the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified;

Step 603: compare the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.

Specifically, any vector distance function can be used in the comparison to compute the distance between the spatio-temporal feature vector of the known person and that of the person to be identified. Then, depending on the task, it is determined either whether the person to be identified is one specific person (1:1 verification), or which of multiple known persons the person to be identified corresponds to (1:N identification).
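The two tasks can be sketched as follows. The Euclidean distance and the threshold-based acceptance rule are illustrative choices; as stated above, any vector distance function may be used.

```python
import numpy as np

def identify(query, gallery):
    """1:N identification: return the known person whose STFV descriptor is
    closest to the query descriptor under the Euclidean distance."""
    names = list(gallery)
    dists = [np.linalg.norm(query - gallery[n]) for n in names]
    return names[int(np.argmin(dists))]

def verify(query, claimed, threshold):
    """1:1 verification: accept the claimed identity if the distance is small."""
    return bool(np.linalg.norm(query - claimed) < threshold)
```

Here `gallery` is a mapping from person name to stored descriptor; the threshold would in practice be tuned on validation data.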
The following uses a single image block set as an example to illustrate how to extract a spatio-temporal feature vector from an image block set and how to perform human body recognition based on the extracted features; in this example, the Fisher vector is employed.

First, the low-level features of all the pixels in the image block set are computed. The low-level feature of a pixel includes its relative position, color, gradients, etc., and the specific low-level features selected may vary. The relative position may be computed in a relative coordinate system whose origin is the top-left corner of the first frame in the image block set; of course, other selected locations may also serve as the origin.

As seen from formula 2, the low-level feature of a pixel is 7-dimensional, i.e., each pixel corresponds to a 7-dimensional vector; if the image block set includes 1000 points, there are 1000 low-level feature vectors.

It should be noted that, in the training stage, the same image block set (e.g., left arm raised forward) may include multiple video samples (video samples from different gait cycles of the same person, or video samples from different people). If, say, 10 video samples are used as training data, 1000 × 10 low-level feature vectors are obtained. These 10000 vectors can then be used to train the Gaussian mixture model of this image block set. For example, there may be 16 different 7-dimensional Gaussian models in this image block set, in which case formula 1 is Θ = {(μ_k, σ_k, π_k): k = 1, …, 16}.

Since each video is divided into 24 image block sets, 24 Gaussian mixture models can be obtained, each containing 16 Gaussian models. As seen from formula 1, each Gaussian model is described by 3 parameters, where the mean μ_k is 7-dimensional like the low-level feature, the weight π_k is 1-dimensional, and the variance σ_k can be 1-dimensional, 7-dimensional, 49-dimensional, etc.
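Training such a mixture can be sketched with a small EM loop in numpy. This is a minimal stand-in for any standard GMM trainer; it assumes diagonal covariances and random initialization, and omits k-means initialization and convergence checks.

```python
import numpy as np

def fit_gmm(X, K=16, iters=50, seed=0):
    """Fit a diagonal-covariance GMM to pixel low-level features X (N x D) by EM.
    Returns (weights pi, means mu, diagonal variances var) as in formula 1."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)]            # means: K x D
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))        # variances: K x D
    pi = np.full(K, 1.0 / K)                           # mixture weights
    for _ in range(iters):
        # E-step: posterior w[i, k] that pixel i belongs to Gaussian k
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(axis=2)
                + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        w = np.exp(logp)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        Nk = w.sum(axis=0) + 1e-10
        pi = Nk / N
        mu = (w.T @ X) / Nk[:, None]
        var = (w.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var
```

In the worked example above, this would be run once per image block set on its pooled 10000 training vectors, yielding the 24 Gaussian mixture models.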
In the test stage, if the videos of two people need to be compared, the process of the above steps 101 to 103 can be applied to obtain the 24 image block sets (video blobs) of each person. For each image block set, its corresponding Gaussian mixture model is used to compute the Fisher vector as in formula 3: Φ = [u_1, v_1, w_1, …, u_K, v_K, w_K]. Here u_k and v_k are 7-dimensional like the low-level feature and w_k is 1-dimensional, so each computed Fisher vector can be a (2 × 7 + 1) × 16 = 240-dimensional vector (w_k may also be omitted). Since each image block set may contain many pixels (e.g., the aforementioned 1000), each pixel yields one 240-dimensional vector, and the Fisher vectors of all the pixels in the same image block set can be averaged to obtain a single vector. Finally, the Fisher vectors of the 24 image block sets of one person within one gait cycle are concatenated, yielding a 24 × 240 = 5760-dimensional vector that constitutes the feature representation of the person.
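The dimension arithmetic of this example can be checked directly (the block descriptors are zero placeholders; only the shapes matter here):

```python
import numpy as np

D, K, BLOCKS = 7, 16, 24            # feature dims, Gaussians per model, image block sets
per_block = (2 * D + 1) * K         # u_k and v_k are D-dim, w_k is 1-dim, per Gaussian
descriptor = np.concatenate([np.zeros(per_block) for _ in range(BLOCKS)])
print(per_block, descriptor.shape)  # 240 (5760,)
```
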
When performing human body feature recognition, then, what is computed is precisely the distance between two such 5760-dimensional vectors; by comparing the extracted feature representations, the identity of a person can be determined.

The video-based human body recognition method of the embodiment of the present invention employs the above video-based human body feature extraction method, and can align human bodies in both time and space through the image block sets, so as to accurately extract the features of unknown and known persons, thereby improving the accuracy of human body recognition.
Fig. 7 is a structural diagram of a video-based human body feature extraction device according to an embodiment of the present invention. As shown in Fig. 7, the device is for extracting, from a video, human body features that can represent the appearance characteristics of an individual, and may specifically include: a time division module 71, for segmenting the video by the gait cycle of the individual to obtain at least one video segment; a space division module 73, for partitioning each video segment into blocks by the body parts of the individual to obtain at least one image block set, where one image block set includes the image blocks for the same body part in all the frame images of the video segment; and a feature extraction module 75, for extracting, using the trained Gaussian mixture models, one spatio-temporal feature vector from each image block set, where the spatio-temporal feature vector can represent the appearance characteristics of the body part targeted by the image block set in the video segment associated with the image block set.

In one possible implementation, the time division module 71 includes:

an optical flow computation submodule 711, for computing the optical flow intensity signal of each frame image of the video, and obtaining the actual optical flow intensity curve of the video from the optical flow intensity signal;

a Fourier transform submodule 713, for performing a Fourier transform on the optical flow intensity signal of each frame image to obtain a regularized signal, and obtaining the fundamental frequency of the regularized signal in the frequency domain;

an inverse Fourier transform submodule 715, for performing an inverse Fourier transform on the regularized signal according to the fundamental frequency to obtain the ideal optical flow intensity curve of the video;

a segmentation submodule 717, for segmenting the video according to the extreme values of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.

For the specific method by which the submodules of the time division module 71 perform the temporal segmentation of the video, see Fig. 2 and the related description of the above video-based human body feature extraction method.
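The cooperation of these submodules can be sketched as follows, assuming the per-frame optical flow intensity signal has already been computed (for example from dense optical flow magnitudes). Using only the minima of the ideal curve as segment boundaries is a simplifying assumption; the device also consults the extreme values of the actual curve.

```python
import numpy as np

def segment_by_gait_cycle(flow_intensity):
    """Sketch of the time division pipeline: regularize the per-frame optical
    flow intensity signal, find its fundamental frequency via the Fourier
    transform, reconstruct the ideal curve by inverse transform, and use the
    minima of the ideal curve as gait-cycle segment boundaries."""
    s = np.asarray(flow_intensity, float)
    s = s - s.mean()                              # regularization: remove the DC offset
    spec = np.fft.rfft(s)
    k0 = 1 + int(np.argmax(np.abs(spec[1:])))     # fundamental frequency bin
    ideal_spec = np.zeros_like(spec)
    ideal_spec[k0] = spec[k0]
    ideal = np.fft.irfft(ideal_spec, n=len(s))    # ideal optical flow intensity curve
    minima = [i for i in range(1, len(s) - 1)
              if ideal[i] < ideal[i - 1] and ideal[i] <= ideal[i + 1]]
    return ideal, minima
```

On a synthetic signal with a 30-frame period, the detected minima are spaced one gait cycle apart.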
In one possible implementation, the time division module 71 is further for segmenting video samples by the gait cycle to obtain at least one training video segment; the space division module 73 is further for partitioning each training video segment into blocks by body part to obtain at least one training image block set, where one training image block set includes the image blocks for the same body part in all the frame images of the training video segment;

the device further includes: a training module 77, for classifying, for each training image block set, the pixels of each image block in the training image block set according to their low-level features, and training, for each class of pixels, a Gaussian model as in the following formula 1, the Gaussian mixture model including the Gaussian models corresponding to the classes of pixels of the training image block set;

Θ = {(μ_k, σ_k, π_k): k = 1, …, K}   (formula 1)

where Θ is the set of Gaussian model parameters corresponding to the classes of pixels, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
In one possible implementation, the feature extraction module 75 includes:

a low-level feature extraction submodule 751, for extracting, for each image block set, the low-level feature of each pixel;

a feature vector extraction submodule 753, for obtaining the feature vector of each pixel from the low-level feature of each pixel of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model;

a feature vector averaging submodule 755, for averaging the computed feature vectors of the pixels to obtain the spatio-temporal feature vector of the image block set.

For the spatial partitioning process of the space division module 73, see Fig. 3 and the related description of the above embodiment of the video-based human body feature extraction method. For an example of the feature extraction module 75 using the Gaussian mixture models to extract spatio-temporal feature vectors such as Fisher vectors, see formula 1 to formula 3 and the related description.

With the video-based human body feature extraction device of this embodiment, the video can be segmented by gait cycle and the video segments partitioned into blocks by body part to obtain image block sets, thereby dividing the video of a walking individual in a way that is physically meaningful in both time and space; effective spatio-temporal alignment is performed according to the image block sets obtained from the division, so that human body features can be extracted accurately through the image block sets.
Fig. 8 is a structural diagram of a video-based human body recognition device according to an embodiment of the present invention. As shown in Fig. 8, the video-based human body recognition device mainly includes a video-based human body feature extraction device 81 of any of the structures in the above embodiments. Specifically, the video-based human body feature extraction device 81 is for extracting the spatio-temporal feature vector of a known person from a video involving the known person, and extracting the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified;

in addition, the video-based human body recognition device may further include: a comparison module 83, for comparing the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.

For the specific process by which the video-based human body recognition device performs human body recognition, see the related description in the embodiment of the above video-based human body recognition method.

This embodiment can use the video-based human body feature extraction device to align human bodies in both time and space through the image block sets and to extract human body features accurately; the known person and the unknown person are compared according to the extracted human body features, and the recognition result obtained is more accurate.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or replacements within the technical scope disclosed by the present invention, and all such changes and replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A video-based human body feature extraction method for extracting, from a video, human body features that can represent the appearance characteristics of an individual, characterized by comprising:

segmenting the video by the gait cycle of the individual to obtain at least one video segment;

partitioning each video segment into blocks by the body parts of the individual to obtain at least one image block set, wherein one image block set comprises the image blocks for the same body part in all the frame images of the video segment;

extracting, using trained Gaussian mixture models, one spatio-temporal feature vector from each image block set, wherein the spatio-temporal feature vector can represent the appearance characteristics of the body part targeted by the image block set in the video segment associated with the image block set.
2. The method according to claim 1, characterized in that segmenting the video by the gait cycle of the individual to obtain at least one video segment comprises:

computing the optical flow intensity signal of each frame image of the video, and obtaining the actual optical flow intensity curve of the video from the optical flow intensity signal;

performing a Fourier transform on the optical flow intensity signal of each frame image to obtain a regularized signal, and obtaining the fundamental frequency of the regularized signal in the frequency domain;

performing an inverse Fourier transform on the regularized signal according to the fundamental frequency to obtain the ideal optical flow intensity curve of the video;

segmenting the video according to the extreme values of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.
3. The method according to claim 1 or 2, characterized in that the Gaussian mixture models are trained by the following steps:

segmenting video samples by the gait cycle to obtain at least one training video segment;

partitioning each training video segment into blocks by body part to obtain at least one training image block set, wherein one training image block set comprises the image blocks for the same body part in all the frame images of the training video segment;

for each training image block set, classifying the pixels of each image block in the training image block set according to their low-level features, and, for each class of pixels, training a Gaussian model as in the following formula 1, wherein the Gaussian mixture model comprises the Gaussian models corresponding to the classes of pixels of the training image block set;

Θ = {(μ_k, σ_k, π_k): k = 1, …, K}   (formula 1)

where Θ is the set of Gaussian model parameters corresponding to the classes of pixels, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
4. The method according to claim 3, characterized in that extracting, using the trained Gaussian mixture models, one spatio-temporal feature vector from each image block set comprises:

for each image block set, extracting the low-level feature of each pixel;

obtaining the feature vector of each pixel from the low-level feature of each pixel of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model;

averaging the computed feature vectors of the pixels to obtain the spatio-temporal feature vector of the image block set.
5. A video-based human body recognition method, characterized by comprising:

extracting, according to the video-based human body feature extraction method of any one of claims 1 to 4, the spatio-temporal feature vector of a known person from a video involving the known person;

extracting, according to the video-based human body feature extraction method of any one of claims 1 to 4, the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified;

comparing the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.
6. A video-based human body feature extraction device for extracting, from a video, human body features that can represent the appearance characteristics of an individual, characterized by comprising:

a time division module, for segmenting the video by the gait cycle of the individual to obtain at least one video segment;

a space division module, for partitioning each video segment into blocks by the body parts of the individual to obtain at least one image block set, wherein one image block set comprises the image blocks for the same body part in all the frame images of the video segment;

a feature extraction module, for extracting, using trained Gaussian mixture models, one spatio-temporal feature vector from each image block set, wherein the spatio-temporal feature vector can represent the appearance characteristics of the body part targeted by the image block set in the video segment associated with the image block set.
7. The device according to claim 6, characterized in that the time division module comprises:

an optical flow computation submodule, for computing the optical flow intensity signal of each frame image of the video, and obtaining the actual optical flow intensity curve of the video from the optical flow intensity signal;

a Fourier transform submodule, for performing a Fourier transform on the optical flow intensity signal of each frame image to obtain a regularized signal, and obtaining the fundamental frequency of the regularized signal in the frequency domain;

an inverse Fourier transform submodule, for performing an inverse Fourier transform on the regularized signal according to the fundamental frequency to obtain the ideal optical flow intensity curve of the video;

a segmentation submodule, for segmenting the video according to the extreme values of the actual optical flow intensity curve and the ideal optical flow intensity curve, to obtain at least one video segment.
8. The device according to claim 6 or 7, characterized in that

the time division module is further for segmenting video samples by the gait cycle to obtain at least one training video segment;

the space division module is further for partitioning each training video segment into blocks by body part to obtain at least one training image block set, wherein one training image block set comprises the image blocks for the same body part in all the frame images of the training video segment;

the device further comprises:

a training module, for classifying, for each training image block set, the pixels of each image block in the training image block set according to their low-level features, and training, for each class of pixels, a Gaussian model as in the following formula 1, wherein the Gaussian mixture model comprises the Gaussian models corresponding to the classes of pixels of the training image block set;

Θ = {(μ_k, σ_k, π_k): k = 1, …, K}   (formula 1)

where Θ is the set of Gaussian model parameters corresponding to the classes of pixels, K is the number of classes, μ_k is the mean of the low-level features of the pixels of the k-th class, σ_k is the variance of the low-level features of the pixels of the k-th class, and π_k is the weight of the low-level features of the pixels of the k-th class.
9. The device according to claim 8, characterized in that the feature extraction module comprises:

a low-level feature extraction submodule, for extracting, for each image block set, the low-level feature of each pixel;

a feature vector extraction submodule, for obtaining the feature vector of each pixel from the low-level feature of each pixel of the image block set, according to the relation between the low-level features and the trained Gaussian mixture model;

a feature vector averaging submodule, for averaging the computed feature vectors of the pixels to obtain the spatio-temporal feature vector of the image block set.
10. A video-based human body recognition device, characterized by comprising the video-based human body feature extraction device of any one of claims 6 to 9,

wherein the video-based human body feature extraction device is for extracting the spatio-temporal feature vector of a known person from a video involving the known person, and extracting the spatio-temporal feature vector of a person to be identified from a video involving the person to be identified;

the video-based human body recognition device further comprises:

a comparison module, for comparing the spatio-temporal feature vector of the known person with the spatio-temporal feature vector of the person to be identified, to determine the identity of the person to be identified.
CN201510148613.4A 2015-03-31 2015-03-31 Video-based human body feature extraction method, human body identification method and device Active CN106156775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510148613.4A CN106156775B (en) 2015-03-31 2015-03-31 Video-based human body feature extraction method, human body identification method and device


Publications (2)

Publication Number Publication Date
CN106156775A true CN106156775A (en) 2016-11-23
CN106156775B CN106156775B (en) 2020-04-03

Family

ID=57337276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510148613.4A Active CN106156775B (en) 2015-03-31 2015-03-31 Video-based human body feature extraction method, human body identification method and device

Country Status (1)

Country Link
CN (1) CN106156775B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657534A (en) * 2018-10-30 2019-04-19 百度在线网络技术(北京)有限公司 The method, apparatus and electronic equipment analyzed human body in image
CN110059715A (en) * 2019-03-12 2019-07-26 平安科技(深圳)有限公司 Floristic recognition methods and device, storage medium, computer equipment
CN110929628A (en) * 2019-11-18 2020-03-27 北京三快在线科技有限公司 Human body identification method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866421A (en) * 2010-01-08 2010-10-20 苏州市职业大学 Method for extracting characteristic of natural image based on dispersion-constrained non-negative sparse coding
CN102024152A (en) * 2010-12-14 2011-04-20 浙江大学 Method for recognizing traffic sings based on sparse expression and dictionary study
CN102521565A (en) * 2011-11-23 2012-06-27 浙江晨鹰科技有限公司 Garment identification method and system for low-resolution video
CN102799873A (en) * 2012-07-23 2012-11-28 青岛科技大学 Human body abnormal behavior recognition method
CN103077383A (en) * 2013-01-09 2013-05-01 西安电子科技大学 Method for identifying human body movement of parts based on spatial and temporal gradient characteristics
CN103902989A (en) * 2014-04-21 2014-07-02 西安电子科技大学 Human body motion video recognition method based on non-negative matrix factorization


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QI Xingbin et al.: "Unsupervised emotional scene detection based on GMM clustering in video retrieval", Video Engineering (《电视技术》) *


Also Published As

Publication number Publication date
CN106156775B (en) 2020-04-03


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant