CN102521582B

CN102521582B - Human upper body detection and segmentation method for low-contrast video

Info

Publication number: CN102521582B
Application number: CN2011104465964A
Authority: CN
Inventors: 谢迪; 童若锋
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2011-12-28
Filing date: 2011-12-28
Publication date: 2013-09-25
Anticipated expiration: 2031-12-28
Also published as: CN102521582A

Abstract

The invention relates to a human upper body detection and segmentation method for low-contrast video. The method consists of two main processes. In the first, connected regions representing foreground objects are extracted from the current frame by background subtraction and morphological methods; for each foreground region, a shape feature based on a polar-coordinate two-dimensional histogram is extracted and fed to a pre-trained support-vector-machine classifier, which outputs a class label for either the human upper body class or the non-human-upper-body class. In the second, when a region previously identified as a human region is misclassified as a non-human region, the region is characterized by an energy function and the erroneous contour is corrected by minimizing that energy function; finally, the background frame is updated on the basis of the corrected foreground human contour. The method can process low-contrast, low-resolution video in real time, and both its detection accuracy and its segmentation results meet application requirements.

Description

A human upper body detection and segmentation method for low-contrast video

Technical field:

The present invention relates to the technical field of video processing, in particular to the detection and extraction of human upper body regions, and specifically to a human upper body detection and segmentation method for low-contrast video.

Background art:

Automatic detection and segmentation of human regions in video are two key steps in different surveillance applications. Human detection methods typically find foreground objects in the video and label them as human or non-human regions based on shape, color, and other features. Background subtraction is a common preprocessing technique for extracting foreground regions. Another class of methods is based on machine learning and uses many new features suited to machine learning, of which gradient-based features are the most representative. Although these methods do not require background subtraction as preprocessing, they come at a high computational cost, which limits their use in real-time systems. Video segmentation methods are likewise based on background subtraction, and additionally integrate probabilistic frameworks such as Bayesian theory and Markov chain Monte Carlo models.

Because many methods require a reasonably good background subtraction result, they fail once illumination changes alter the ambient lighting. Although some improved background subtraction algorithms can address this problem, a foreground object that stays motionless in front of the camera for a long time will gradually be absorbed into the background. In addition, the CCD chips in the cameras of many surveillance systems are of low quality, so the captured video has low contrast, which makes it even harder for existing methods to process.

Summary of the invention:

(1) Foreground extraction: first, the first frame is designated as the background frame and converted from the RGB color space to the Lab color space; each subsequent input frame is color-converted in the same way. Background subtraction is applied to the converted frame and the background frame to extract foreground object regions. For each extracted region, a morphological dilation-erosion operation filters out noise and holes. Finally, a breadth-first connected-region search labels the foreground and background regions and generates the foreground region mask;

(2) Shape feature extraction: first, the contour of the foreground region is extracted by a contour detection algorithm and sampled; then a polar coordinate system is established with the region centroid as its origin, and each sampled contour point is mapped onto a two-dimensional plane, so that all sampled points together form a two-dimensional histogram; finally, the histogram is normalized and unrolled into a high-dimensional vector;

(3) Training of the human upper body model based on a support vector machine: the vectors obtained in the previous step are used as samples; a nonlinear support vector machine with a radial basis function kernel is applied to all training samples with K-fold cross-validation, ultimately producing a nonlinear decision hyperplane that serves as the classifier separating upper-body regions from non-upper-body regions;

(4) Classification with the human upper body model based on a support vector machine: the vector obtained in the same way as in step (2) is fed to the classifier trained in step (3), which outputs a class label after the decision mapping;

(5) Energy function minimization: for a foreground region that was initially regarded as a human region, when the classifier labels it a non-human region during processing, the contour curve is modeled by an energy function, initialized with the correct contour curve from the previous frame, and solved by the Euler-Lagrange method.

The method of the present invention consists of two main processes. In the first, connected regions representing foreground objects are extracted from the current frame by background subtraction and morphological methods; then, for each foreground region, the corresponding shape feature based on a polar-coordinate two-dimensional histogram is extracted and fed to a pre-trained support-vector-machine classifier, which outputs a class label for the upper-body or non-upper-body class. In the second, when a region that has been identified as human is misjudged as a non-human region, the present invention characterizes the region with an energy function and corrects the erroneous contour by minimizing that energy function. Finally, the background frame is updated on the basis of the corrected foreground human contour. The present invention can process low-contrast, low-resolution video in real time, and both detection accuracy and segmentation results meet application requirements.

Brief description of the drawings:

Fig. 1 is a flowchart of the present invention.

Detailed description of the embodiments:

Each part of the flowchart in Fig. 1 is described in detail below:

1. Foreground extraction

First, the first frame is designated as the background frame and converted from the RGB color space to the Lab color space. Each subsequent input frame is color-converted in the same way, and background subtraction is applied to the converted frame and the background frame to extract foreground object regions (that is, the pixel-wise absolute difference of the two frames is computed; a pixel whose difference exceeds a threshold is regarded as a foreground pixel, and otherwise as a background pixel). For each extracted region, a morphological dilation-erosion operation filters out noise and holes. Finally, a breadth-first connected-region search labels the foreground and background regions and generates the foreground region mask.
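The thresholding and breadth-first labeling described above can be sketched as follows. This is a minimal, dependency-free illustration and not the patented implementation: frames are assumed to be already converted to Lab and stored as nested lists of (L, a, b) tuples, the per-channel differences are summed (an assumption, since the text does not say how channels are combined), 4-connectivity is assumed, and the dilation-erosion step is omitted for brevity. The names `foreground_mask` and `label_regions` are hypothetical.

```python
from collections import deque


def foreground_mask(frame, background, threshold):
    """Threshold the per-pixel absolute difference against the background.

    frame, background: 2-D lists of (L, a, b) tuples, already in Lab space.
    Summing the channel differences is an assumption of this sketch.
    """
    return [
        [sum(abs(p - q) for p, q in zip(px, bg)) > threshold
         for px, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


def label_regions(mask):
    """Label 4-connected foreground regions using breadth-first search.

    Returns (labels, count); labels uses 0 for background and 1..count
    for the foreground regions.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

A real-time system would more likely rely on an array library for the difference image and the morphology; the nested-list form is used here only to keep the sketch self-contained.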

2. Shape feature extraction

Compared with existing local gradient histogram features, the feature proposed by the present invention describes the shape of the human upper body better, and therefore has greater discriminative power while also having lower computational complexity.

The silhouette of a person, and of the upper body in particular, can be regarded as a star-convex set. If there exists a point x_0 in a set S such that the line segment from x_0 to any point in S lies entirely in S, then S is called a star domain or star-convex set. The shape feature of the present invention is designed on this basis.

For a given foreground region, the present invention finds the centroid of the region by breadth-first search (BFS) and then finds the boundary contour of the same region by a border-following algorithm. The contour is then sampled counterclockwise at equal angular intervals. Taking the centroid of the foreground region as the origin of a polar coordinate system, each sampled point on the contour can be expressed as a pair of polar coordinates (θ_i, r_i), i = 1, 2, ..., N, where r_i is the Euclidean distance from the region centroid to the point, θ_i is the polar angle of the point, and N is the total number of sampled points. These polar coordinates are then projected onto a two-dimensional plane whose x axis represents θ and whose y axis represents r; the two dimensions are quantized separately into m and n bins. When a polar coordinate pair (θ_i, r_i) satisfies the following condition:

θ_k ≤ θ_i ≤ θ_{k+1}, r_l ≤ r_i ≤ r_{l+1}, k = 0, ..., m-1, l = 0, ..., n-1

the value of the corresponding cell (k, l) is incremented. Once all points have been traversed in this way, a two-dimensional histogram with a specific pattern is formed, and this pattern characterizes the particular shape of the corresponding contour. Finally, the cell values of the histogram are unrolled row by row and normalized, yielding an (m × n)-dimensional vector f. The shape feature obtained in this way is clearly independent of the position and size of the object.
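The histogram construction can be illustrated with a short sketch. It assumes the contour has already been sampled into (x, y) points and the centroid is known. The text does not say how the r axis is scaled, so the sketch divides each radius by the largest radius to make the bins independent of object size; that scaling, the sum-to-one normalization, and the name `polar_histogram` are assumptions made for illustration.

```python
import math


def polar_histogram(contour, centroid, m, n):
    """Build the m x n polar histogram of contour samples and unroll it.

    contour: list of (x, y) sampled contour points.
    centroid: (cx, cy), used as the origin of the polar coordinate system.
    Returns a list of m * n values that sum to 1.
    """
    cx, cy = centroid
    polar = []
    for x, y in contour:
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)  # polar angle in [0, 2*pi)
        r = math.hypot(x - cx, y - cy)                      # distance to the centroid
        polar.append((theta, r))

    # Assumed scaling: radii are expressed relative to the largest radius.
    r_max = max(r for _, r in polar) or 1.0

    hist = [[0] * n for _ in range(m)]
    for theta, r in polar:
        k = min(int(theta / (2 * math.pi) * m), m - 1)  # angle bin index
        l = min(int(r / r_max * n), n - 1)              # radius bin index
        hist[k][l] += 1

    total = len(polar)
    # Unroll row by row and normalize by the number of samples.
    return [cell / total for row in hist for cell in row]
```

Because only centroid-relative angles and relative radii enter the bins, translating or uniformly scaling the contour leaves the vector unchanged, which matches the invariance the text claims.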

3. Training and detection with the human upper body model based on a support vector machine

In the training stage, a large number of human upper body images and non-upper-body images are collected, and the foreground regions are marked manually so that the shape features of the foreground can be extracted. The set of high-dimensional vectors corresponding to these shape features forms the sample set used for training. The present invention uses a support vector machine as the training algorithm, with a Gaussian radial basis function as its kernel:

K(x_i, x_j) = exp(-γ ||x_i - x_j||^2)

where x_i and x_j are feature vectors and γ is a normalization constant.
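As a small worked illustration of the kernel formula, the function below evaluates K(x_i, x_j) for two feature vectors given as plain lists. The name `rbf_kernel` is hypothetical; a practical system would use the kernel built into an SVM library rather than this hand-written version.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart, at a rate controlled by γ.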

To train a classifier with optimal performance, the present invention uses K-fold cross-validation to determine the two parameters γ and C of the support vector machine classifier. That is, all the data are divided into K subsets; one subset is held out as validation data and the other K-1 subsets are used for training. This process is repeated K times, each time with a different combination of subsets as validation and training data, and the results are finally averaged. The parameter combination with the best classification performance determined in this way is γ = 0.25 and C = 2.0, at which the classification accuracy is about 98%.
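The cross-validation loop can be sketched independently of any particular SVM library. In the sketch below, `train_and_score` and `make_scorer` are hypothetical callbacks standing in for the actual training code, and the interleaved fold assignment is an assumption, since the text does not say how the K subsets are drawn.

```python
def k_fold_score(samples, labels, k, train_and_score):
    """Return the validation score averaged over K folds.

    train_and_score(train_x, train_y, val_x, val_y) -> float is a
    hypothetical callback that trains a classifier and returns its accuracy.
    """
    n = len(samples)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved split into K subsets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in held_out], [labels[i] for i in held_out],
        ))
    return sum(scores) / k


def grid_search(samples, labels, k, gammas, costs, make_scorer):
    """Pick the (gamma, C) pair with the best mean K-fold score.

    make_scorer(gamma, C) is a hypothetical factory returning a
    train_and_score callback for that parameter pair.
    """
    return max(
        ((g, c) for g in gammas for c in costs),
        key=lambda gc: k_fold_score(samples, labels, k, make_scorer(*gc)),
    )
```

Every sample is used for validation exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single hold-out set would be.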

In the detection stage, for each frame, if a foreground region exists, its contour shape feature is extracted in the same way and fed to the previously trained classifier, which outputs a Boolean value indicating whether the current region is a human upper body region.

4. Energy function minimization

Once an upper-body region fails to be recognized as the human class by the support vector machine classifier, resulting in a classification error, the present invention performs an energy minimization on the erroneous foreground contour. This eliminates the error caused by contour expansion under ambient light changes and thereby ensures the correctness of the foreground contour region. A complete closed contour is characterized by an energy functional E_c(s):

E_c(s) = ∮ (E_int(s) + η(s) E_ext(s)) ds

where E_int(s) is the internal potential energy of the contour and E_ext(s) provides the image-based external constraint. η(s) is the weight corresponding to each sampled point, defined as:

η(s_i) = \frac{\|\nabla I(x(s_i), y(s_i))\|^2}{\sum_{j=1}^{N} \|\nabla I(x(s_j), y(s_j))\|^2}

where \nabla I denotes the image gradient and N is the total number of sampled points. The goal of the optimization is to find the curve function v(s) = (x(s), y(s)) that minimizes the energy functional E_c(s). The present invention uses the Euler-Lagrange multiplier method to convert the functional into a partial differential equation problem, which is then discretized to obtain a linear system Ax = b, where A is a matrix whose nonzero elements lie on exactly five diagonals. This linear system can be solved by Cholesky decomposition.
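The weight η(s_i) is simply each sample's squared gradient magnitude divided by the sum over all samples. The sketch below computes it for a single-channel image stored as a nested list. The central-difference gradient, the restriction to interior pixels, and the uniform fallback for a completely flat image are assumptions of this illustration, and the function names are hypothetical.

```python
def gradient_sq(image, x, y):
    """Squared gradient magnitude at (x, y) by central differences.

    image: 2-D list of intensities indexed as image[y][x]; (x, y) must be
    an interior pixel so that all four neighbors exist.
    """
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx * gx + gy * gy


def contour_weights(image, points):
    """eta(s_i): squared gradient magnitude normalized over all samples."""
    mags = [gradient_sq(image, x, y) for x, y in points]
    total = sum(mags)
    if total == 0:
        # Assumed fallback: equal weights when the image is flat everywhere.
        return [1.0 / len(points)] * len(points)
    return [mag / total for mag in mags]
```

Samples that lie on strong edges therefore give the external energy term more influence, while samples in flat regions are governed mainly by the internal energy.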

5. Background region update

On the basis of the correct foreground region, the present invention updates the background region by linear interpolation:

I_B(x, y) = α I_B^{(t)}(x, y) + (1 - α) I_B^{*}(x, y)

where I_B(x, y) is the background frame pixel value at position (x, y) after the update, I_B^{(t)}(x, y) is the pixel value at the same position before the update, and I_B^{*}(x, y) is the corresponding pixel value belonging to the background region in the current frame. For the foreground region, the pixel value at the corresponding position is simply copied.
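The update rule can be sketched for a single-channel frame as follows. The text does not state which image the foreground pixels are copied from; this sketch assumes they keep the previous background value, so that a person standing still is not absorbed into the background. The name `update_background` and the value of α used in any call are illustrative assumptions.

```python
def update_background(background, frame, fg_mask, alpha):
    """Blend background pixels and leave foreground-covered pixels unchanged.

    background, frame: 2-D lists of scalar pixel values (one channel).
    fg_mask: 2-D list of booleans, True where the pixel is foreground.
    alpha: weight of the previous background value, between 0 and 1.
    """
    return [
        [bg if is_fg else alpha * bg + (1 - alpha) * cur
         for bg, cur, is_fg in zip(bg_row, cur_row, mask_row)]
        for bg_row, cur_row, mask_row in zip(background, frame, fg_mask)
    ]
```

A value of α close to 1 makes the background adapt slowly to lighting changes, while a smaller α adapts faster but is more sensitive to segmentation errors.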

It should be understood that the above embodiment merely illustrates the present invention and does not limit it; any invention-creation that does not go beyond the spirit and scope of the present invention falls within its scope of protection.

Claims

1. A human upper body detection and segmentation method for low-contrast video, characterized in that the method comprises the following steps:

(2) Shape feature extraction: first, the contour of the foreground region is extracted by a contour detection algorithm and sampled; then a polar coordinate system is established with the region centroid as its origin, and each sampled contour point is mapped onto a two-dimensional plane, so that all sampled points together form a two-dimensional histogram; finally, the histogram is normalized and unrolled into a high-dimensional vector;

(4) Classification with the human upper body model based on a support vector machine: the vector obtained in the same way as in step (2) is fed to the classifier trained in step (3), which outputs a class label after the decision mapping;

(5) Energy function minimization: for a foreground region that was initially regarded as a human region, when the classifier labels it a non-human region during processing, the contour curve is modeled by an energy function, initialized with the correct contour curve from the previous frame, and solved by the Euler-Lagrange method, and the background region is updated with the final result.

2. The human upper body detection and segmentation method for low-contrast video according to claim 1, characterized in that the background subtraction described in step (1) extracts foreground object regions as follows: the pixel-wise absolute difference of the two frames is computed; a pixel whose difference exceeds a threshold is regarded as a foreground pixel, and otherwise as a background pixel.

3. The human upper body detection and segmentation method for low-contrast video according to claim 1, characterized in that step (2) proceeds as follows:

For a given foreground region, the centroid of the region is found by breadth-first search (BFS), and the boundary contour of the same region is then found by a border-following algorithm; the contour is then sampled counterclockwise at equal angular intervals, and the centroid of the foreground region is taken as the origin of a polar coordinate system, so that each sampled point on the contour can be expressed as a pair of polar coordinates (θ_i, r_i), i = 1, 2, ..., N, where r_i is the Euclidean distance from the region centroid to the point, θ_i is the polar angle of the point, and N is the total number of sampled points; these polar coordinates are then projected onto a two-dimensional plane whose x axis represents θ and whose y axis represents r, and the two dimensions are quantized separately into m and n bins; when a polar coordinate pair (θ_i, r_i) satisfies the following condition:

θ_k ≤ θ_i ≤ θ_{k+1}, r_l ≤ r_i ≤ r_{l+1}, k = 0, ..., m-1, l = 0, ..., n-1

the value of the corresponding cell (k, l) is incremented; once all points have been traversed in this way, a two-dimensional histogram with a specific pattern is formed, and this pattern characterizes the particular shape of the corresponding contour; finally, the cell values of the histogram are unrolled row by row and normalized, yielding an (m × n)-dimensional vector f.

4. The human upper body detection and segmentation method for low-contrast video according to claim 1, characterized in that step (3) proceeds as follows:

In the training stage, a large number of human upper body images and non-upper-body images are collected, and the foreground regions are marked manually so that the shape features of the foreground can be extracted; the set of high-dimensional vectors corresponding to these shape features forms the training sample set; a support vector machine is used as the training algorithm, with a Gaussian radial basis function as its kernel:

K(x_i, x_j) = exp(-γ ||x_i - x_j||^2)

where x_i and x_j are feature vectors and γ is a normalization constant;

K-fold cross-validation is used to determine the two parameters γ and C of the support vector machine classifier: all the data are divided into K subsets, one subset is held out as validation data and the other K-1 subsets are used for training; this process is repeated K times, each time with a different combination of subsets as validation and training data, and the results are finally averaged; in the detection stage, for each frame, if a foreground region exists, its contour shape feature is extracted in the same way and fed to the previously trained classifier, which outputs a Boolean value indicating whether the current region is a human upper body region.

5. The human upper body detection and segmentation method for low-contrast video according to claim 1, characterized in that step (5) proceeds as follows:

A complete closed contour is characterized by an energy functional E_c(s):

E_c(s) = ∮ (E_int(s) + η(s) E_ext(s)) ds

where E_int(s) is the internal potential energy of the contour, E_ext(s) provides the image-based external constraint, and η(s) is the weight corresponding to each sampled point, defined as:

η(s_i) = \frac{\|\nabla I(x(s_i), y(s_i))\|^2}{\sum_{j=1}^{N} \|\nabla I(x(s_j), y(s_j))\|^2}

where \nabla I denotes the image gradient and N is the total number of sampled points;

The Euler-Lagrange multiplier method is used to convert the functional into a partial differential equation problem, which is then discretized to obtain a linear system Ax = b, where A is a matrix whose nonzero elements lie on exactly five diagonals; this linear system can be solved by Cholesky decomposition;

On the basis of the correct foreground region, the background region is updated by linear interpolation:

I_B(x, y) = α I_B^{(t)}(x, y) + (1 - α) I_B^{*}(x, y)

where I_B^{(t)}(x, y) is the pixel value at the same position before the update, and I_B^{*}(x, y) is the corresponding pixel value belonging to the background region in the current frame; for the foreground region, the pixel value at the corresponding position is simply copied.