CN101477626A - Method for detecting human head and shoulder in video of complicated scene

Info

Publication number: CN101477626A
Application number: CNA200910077108XA
Authority: CN
Inventors: 孙立峰; 丁锡锋; 徐辉; 崔鹏; 杨士强
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2009-01-16
Filing date: 2009-01-16
Publication date: 2009-07-08
Anticipated expiration: 2029-01-16
Also published as: CN101477626B

Abstract

The invention relates to a method for detecting the head and shoulders of a human body in video of complex scenes, and belongs to the technical field of computer information mining. The method comprises the following steps: manually marking head-shoulder pictures, background pictures, and pictures of other body parts in the frames of a video as positive and negative samples, and mirroring the pictures; extracting the gradient vectors of the positive and negative samples and training a first classifier; taking the head-shoulder pictures as the new positive samples and the pictures of other body parts as the new negative samples; extracting the gradient vectors of the new positive and negative samples and training a second classifier; determining the position and size of a window to be detected in a frame of the video to be detected, and extracting its gradient vector; classifying the gradient vector with the first classifier, and ending detection of the window if the result is negative; if the first result is positive, classifying the gradient vector with the second classifier, and determining that the window contains a head and shoulders if that result is also positive; and changing the position and size of the window to detect new windows. The method can improve both accuracy and detection speed.

Description

A method for detecting human head and shoulders in video of complex scenes

Technical field

The invention belongs to the technical field of computer information mining. It relates to a method for detecting the head and shoulders of a human body in pictures of complex scenes, and in particular to detecting the head and shoulders of pedestrians in frames of real-world surveillance video.

Background art

In recent years, detecting humans in video has been a popular research direction in computer video analysis. Among the various approaches to human detection, detecting a person by detecting individual body parts is an important auxiliary technique. Of these body parts, the head-shoulder region is a highly distinctive feature. People in video are frequently partially occluded, which makes detection difficult, yet in such cases the head and shoulders still have a high probability of being detected, so head-shoulder detection is a useful aid to human detection. In addition, in video event detection, actions near a person's head and shoulders often carry implicit event information, such as waving or making a phone call. Head-shoulder detection against complex backgrounds is therefore of great significance.

Head-shoulder detection is a form of object detection. Object detection methods fall into two classes: the first extracts or segments the background and takes the separated foreground objects as the detection result; the second searches for the object directly in the image. Background extraction in video applies only to stationary cameras and has great difficulty detecting motionless objects in the scene, which limits its range of application. Direct search in the image is therefore now generally adopted. These methods typically use a classifier to classify objects according to their features. An object's features are the characteristic information the object itself carries, such as the color histogram, texture, or gradient of the region it occupies. Once the features are extracted, the classifier determines the object's class from them. The classifier most widely used internationally at present is the support vector machine (hereinafter SVM), but a single-stage SVM classifier performs only one classification pass, and its accuracy is often not high.

Summary of the invention

The object of the invention is to overcome the shortcomings of the prior art by proposing a method for detecting human head and shoulders in video of complex scenes, using the gradient orientation histogram as the feature that describes the object. Using a two-stage SVM as the classifier can improve accuracy while also improving detection speed.

The invention takes a number of head-shoulder pictures and background pictures as the positive and negative sample sets and trains an SVM as the first-stage classifier. It then takes head-shoulder pictures and pictures of non-head-shoulder regions of the body as positive and negative samples and trains an SVM as the second-stage classifier. Together these form a two-stage cascade classifier. A detection region passes through the two SVM stages in turn, and the outcome is taken as the final result.
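
The two sample sets described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `mirror`, `with_mirrors`, and `build_stage_sets` are hypothetical, pictures are assumed to be grayscale lists of rows, the labels +1 and -1 are an assumed encoding, and mirroring the pictures of other body parts follows the abstract rather than an explicit step.

```python
from typing import List, Tuple

Image = List[List[int]]            # grayscale picture as rows of brightness values
Labeled = List[Tuple[Image, int]]  # (picture, label) pairs, label is +1 or -1


def mirror(picture: Image) -> Image:
    """Return the left-right mirror image of a picture."""
    return [list(reversed(row)) for row in picture]


def with_mirrors(pictures: List[Image]) -> List[Image]:
    """Augment a sample set with the mirror image of every picture."""
    return pictures + [mirror(p) for p in pictures]


def build_stage_sets(
    head_shoulder: List[Image],
    background: List[Image],
    other_body: List[Image],
) -> Tuple[Labeled, Labeled]:
    """Build the labeled sample sets for the two classifier stages."""
    positives = [(p, +1) for p in with_mirrors(head_shoulder)]
    # Stage 1: head-shoulder pictures against background pictures.
    stage1 = positives + [(p, -1) for p in with_mirrors(background)]
    # Stage 2: head-shoulder pictures against pictures of other body parts.
    stage2 = positives + [(p, -1) for p in with_mirrors(other_body)]
    return stage1, stage2
```

Both stages share the same positive samples; only the negative samples differ, which is what lets the second stage specialize in rejecting body regions that the first stage lets through.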

The SVM used in the invention is the LibSVM classifier, which is currently popular internationally; it is used without modification.

The method proposed by the invention for detecting human head and shoulders in video of complex scenes mainly comprises the following steps:

(1) select one video from a class of videos to be detected. From the frames of this video, manually mark a number (at least 1000) of head-shoulder pictures, a number (at least 1000) of background pictures, and a number (at least 1000) of pictures of other body parts, where each picture must have a side length of at least 1 centimeter. Take the head-shoulder pictures as positive sample pictures and the background pictures as negative sample pictures;

(2) mirror the positive and negative sample pictures left to right to increase the number of samples;

(3) extract the gradient orientation histogram of each positive and negative sample picture and convert it into vector form, as the gradient vector of the sample picture;

(4) train the first-stage support vector machine (SVM) with the gradient vectors extracted from the positive and negative samples, generating a first-stage classification model;

(5) take the head-shoulder pictures as the new positive samples, and replace the background pictures with the pictures of other body parts as the new negative samples;

(6) extract the gradient orientation histogram of each new positive and negative sample picture and convert it into a 1 x N vector, where N is a positive integer, as the gradient vector of the new sample;

(7) train the second-stage support vector machine (SVM) with the gradient vectors extracted from the new positive and negative samples, generating a second-stage classification model;

(8) read in a video to be detected and extract one frame image from it;

(9) determine the position and size of a window to be detected on this frame image, extract the gradient orientation histogram of the window using the method of step (3), and obtain the gradient vector of the window;

(10) classify this gradient vector with the first-stage classifier; if the result is negative (that is, the window does not contain a head-shoulder image), end detection of this window and go to step (11); if the first-stage result is positive (that is, the first-stage classifier judges that the window contains a head-shoulder image), classify the gradient vector with the second-stage classifier; if the second-stage result is negative, go to step (11); if it is positive, confirm that the window contains a head and shoulders and save the coordinates of the window as the detection result for this window;

(11) change the position and size of the window, extract the gradient orientation histogram of the new window using the method of step (3), obtain its gradient vector, and return to step (10) to classify it, finally obtaining the detection result for every window.
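
The detection logic of steps (9) to (11) can be sketched as a short loop. This is an illustrative sketch under stated assumptions: `cascade_detect`, `extract`, `stage1`, and `stage2` are hypothetical names, and the two classifiers are modeled as plain callables that return `True` for a positive result rather than as actual LibSVM models.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]              # (x, y, width, height)
Classifier = Callable[[Sequence[float]], bool]  # True means "positive"


def cascade_detect(
    windows: Sequence[Window],
    extract: Callable[[Window], Sequence[float]],
    stage1: Classifier,
    stage2: Classifier,
) -> List[Window]:
    """Return the windows accepted by both classifier stages."""
    detections: List[Window] = []
    for window in windows:
        vector = extract(window)  # gradient vector of the window
        if not stage1(vector):
            continue              # negative at stage 1: detection of this window ends
        if stage2(vector):
            detections.append(window)  # positive at both stages: save coordinates
    return detections
```

The gradient vector is extracted once per window and reused by both stages, and the second classifier runs only on windows that the first one accepts.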

Step (3) of the above method specifically comprises the following steps:

(31) treat each sample picture as a window and divide the window into M x N blocks, with 30%-50% overlap between blocks, where M and N are positive integers;

(32) divide each block equally into a number of cells;

(33) compute the gradient direction and magnitude of the pixels in each cell;

(34) accumulate the gradients of the pixels in each cell into a histogram by direction, and convert the gradient orientation histogram into vector form, as the gradient vector of the cell;

(35) concatenate the gradient vectors of the cells into one long vector, as the gradient vector of the block;

(36) concatenate the gradient vectors of the blocks in the window into one long vector, as the gradient vector of the window;

(37) normalize the gradient vector of the window, as the gradient vector of the sample picture.

Step (11) of the above method, which changes the position and size of the window and classifies it, specifically comprises the following steps:

(111) move the coordinates of the window while keeping 30%-80% overlap, and classify the window;

(112) change the size of the window (the range of sizes is determined by the size of the head and shoulders in the video), move the window position in turn, and classify the window.
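
The window scan of steps (111) and (112) can be sketched as a generator. This is a sketch under assumptions: `scan_windows` and its parameters are hypothetical names, the step between neighboring windows is taken as the window side multiplied by (1 - overlap), and rejecting overlaps outside the 30%-80% range is a design choice of this sketch.

```python
from typing import Iterator, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def scan_windows(frame_w: int, frame_h: int,
                 sizes: Sequence[Tuple[int, int]],
                 overlap: float = 0.5) -> Iterator[Window]:
    """Yield every window position for every window size, in scan order."""
    if not 0.3 <= overlap <= 0.8:
        raise ValueError("overlap must lie between 0.3 and 0.8")
    for w, h in sizes:                       # (112) change the window size
        step_x = max(1, round(w * (1 - overlap)))
        step_y = max(1, round(h * (1 - overlap)))
        for y in range(0, frame_h - h + 1, step_y):
            for x in range(0, frame_w - w + 1, step_x):
                yield (x, y, w, h)           # (111) move the window
```

A higher overlap gives a smaller step and therefore more windows per frame, so the overlap trades detection speed against how finely positions are sampled.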

Features and effects of the invention:

The method proposed by the invention for detecting human head and shoulders in video of complex scenes is used to detect the head and shoulders of people in real-world surveillance video. The gradient orientation histogram is chosen as the feature representation of the head and shoulders. Gradient orientation histograms have come into use in object detection in recent years; representing head-shoulder images with them preserves the edge-orientation characteristics of the object, and experiments have shown that it is a robust feature that improves detection performance. In addition, two support vector machines trained on different samples are combined into a cascade classifier. During detection, the first-stage classifier removes regions that clearly do not contain a head and shoulders, and windows that pass the first stage are then examined by the second-stage classifier. Because a two-stage classifier is used, accuracy can be improved and detection speed can be improved at the same time. Detecting people's head and shoulders in video can be used for human tracking and event detection, and is significant for realizing automatic surveillance.

Embodiment

The method proposed by the invention for detecting human head and shoulders in video of complex scenes is described in detail below with reference to an embodiment:

Detecting human head and shoulders in video is, in essence, still detection on pictures. This embodiment adopts direct search on the picture and specifically comprises the following steps:

(1) select one video from a class of videos to be detected; from the frames of this video, manually mark a number (at least 1000) of head-shoulder pictures, a number (at least 1000) of background pictures, and a number (at least 1000) of pictures of other body parts, where each picture must have a side length of at least 1 centimeter; take the head-shoulder pictures as positive sample pictures and the background pictures as negative sample pictures;

(3) extract the gradient orientation histogram of each positive and negative sample picture and convert it into vector form, as the gradient vector of the sample picture; specifically:

(31) treat each picture as a window and divide the window into 3 x 3 small blocks, with 50% overlap between blocks;

(32) divide each small block equally into four cells;

(33) compute the gradient direction and magnitude of the pixels in each cell; in this embodiment the gradient is computed with the template [-1, 0, 1], and the direction is:

{grad}_{(x, y)} = \arctan \frac{I_{(x, y + 1)} - I_{(x, y - 1)}}{I_{(x + 1, y)} - I_{(x - 1, y)}}

The magnitude is:

{value}_{(x, y)} = \sqrt{{[I_{(x + 1, y)} - I_{(x - 1, y)}]}^{2} + {[I_{(x, y + 1)} - I_{(x, y - 1)}]}^{2}}

where {grad}_{(x, y)} is the gradient direction at the pixel, I_{(x, y)} is the brightness value of the pixel, and {value}_{(x, y)} is the gradient magnitude at the pixel;

(34) accumulate the gradient magnitudes of the pixels in each cell into a histogram by gradient direction and store it as a vector, which serves as the gradient vector of the cell;

(35) concatenate the gradient vectors of the cells, so that each small block is represented by a normalized vector of 4 x 9 = 36 dimensions, as the gradient vector of the block;

(36) concatenate the gradient vectors of the blocks, so that the gradient orientation histogram of each window is represented by a vector of 9 x 36 = 324 dimensions, as the gradient vector of the window;
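
The dimensions quoted in steps (35) and (36) can be checked with a short derivation. It assumes that the 3 x 3 blocks are equal in size and exactly span the window, with block side b and window side W along one axis, and it reads the factor 9 in 4 x 9 = 36 as the length of each cell's histogram vector.

```latex
\begin{aligned}
W &= b + (3 - 1)(1 - 0.5)\,b = 2b
  && \text{three blocks per axis, 50\% overlap} \\
\dim(\text{cell}) &= 9
  && \text{histogram vector of one cell} \\
\dim(\text{block}) &= 4 \times 9 = 36
  && \text{four cells per block} \\
\dim(\text{window}) &= (3 \times 3) \times 36 = 324
  && \text{nine blocks per window}
\end{aligned}
```

Under these assumptions each block covers half the window side, and neighboring blocks are offset by a quarter of the window side.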

(4) train the first-stage support vector machine (LibSVM) with the gradient vectors extracted from the positive and negative samples, generating a first-stage classification model;

(6) extract the gradient orientation histogram of each new positive and negative sample picture and convert it into a 1 x 324 vector, as the gradient vector of the new sample;

(7) train the second-stage support vector machine (LibSVM) with the gradient vectors extracted from the new positive and negative samples, generating a second-stage classification model;

(8) call the OpenCV library to read in a video to be detected and parse out a frame of the video;

(9) determine the position and size of a window to be detected on this frame image, extract the gradient orientation histogram of the window using the method of steps (31) to (37), and obtain the gradient vector of the window;

(10) classify this gradient vector with the first-stage LibSVM classifier; if the result is negative (that is, the window does not contain a head-shoulder image), end detection of this window and go to step (13); if the first-stage result is positive (that is, the first-stage classifier judges that the window contains a head-shoulder image), classify the gradient vector with the second-stage LibSVM classifier; if the second-stage result is negative, go to step (13); if it is positive, confirm that the window contains a head and shoulders and save the coordinates of the window as the detection result for this window;

(11) move the coordinates of the window while keeping 30%-80% overlap, and classify the window to be detected;

(12) change the size of the window (the range of sizes is determined by the size of the head and shoulders in the video), move the window position in turn, and classify the window, finally obtaining the detection result for every window.

Claims

1. A method for detecting human head and shoulders in video of complex scenes, characterized in that it mainly comprises the following steps:

(1) select one video from a class of videos to be detected. From the frames of this video, manually mark a number of head-shoulder pictures, a number of background pictures, and a number of pictures of other body parts, where each picture must have a side length of at least 1 centimeter. Take the head-shoulder pictures as positive sample pictures and the background pictures as negative sample pictures;

(4) train the first-stage support vector machine with the gradient vectors extracted from the positive and negative samples, generating a first-stage classification model;

(7) train the second-stage support vector machine with the gradient vectors extracted from the new positive and negative samples, generating a second-stage classification model;

(8) read in a video to be detected and extract one frame image from it;

(10) classify this gradient vector with the first-stage classifier; if the result is negative, end detection of this window and go to step (11); if the first-stage result is positive, classify the gradient vector with the second-stage classifier; if the second-stage result is negative, go to step (11); if it is positive, confirm that the window contains a head and shoulders and save the coordinates of the window as the detection result for this window;

2. The method according to claim 1, characterized in that step (3) of the method specifically comprises the following steps:

(31) treat each sample picture as a window and divide the window into M x N blocks, with 30%-50% overlap between blocks, where M and N are positive integers;

(32) divide each block equally into a number of cells;

(33) compute the gradient direction and magnitude of the pixels in each cell;

(37) normalize the gradient vector of the window, as the gradient vector of the sample picture.

3. The method according to claim 1, characterized in that step (11) of the method, which changes the position and size of the window and classifies it, specifically comprises the following steps:

(112) change the size of the window, move the window position in turn, and classify the window.