CN105138979A

CN105138979A - Method for detecting the head of a moving human body based on stereo vision

Info

Publication number: CN105138979A
Application number: CN201510512540.2A
Authority: CN
Inventors: 孙爱娟; 顾国华; 周玉蛟
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2015-08-19
Filing date: 2015-08-19
Publication date: 2015-12-09

Abstract

The invention provides a method for detecting the head of a moving human body based on stereo vision. The method comprises the following steps: building a hardware platform, in which two calibrated cameras of the same model are mounted in parallel, one on the left and one on the right at the same height, directly above the target scene to be shot; calculating the disparity between the two binocular stereo images with a window-based stereo matching algorithm; obtaining the distance from the cameras to the target scene by disparity-based triangulation, thereby obtaining the original depth image of the target scene; and segmenting the head target in the original depth image according to the gray level and geometric characteristics of the human head target, thereby identifying the human head.

Description

Method for detecting the head of a moving human body based on stereo vision

Technical field

The present invention relates to techniques for detecting and tracking moving targets, and in particular to a method for detecting the head of a moving human body based on stereo vision.

Background technology

With the rapid improvement of computer storage and computing performance, computers are increasingly applied to sophisticated functions such as scene reconstruction, target recognition and human-computer interaction. This has not only broadened the scale and research directions of computer applications but also promoted the rapid development of related disciplines. Computer vision is currently an active research field. In essence, it uses cameras in place of the human eye and computers in place of the human brain to recognize and track targets, perform the corresponding image analysis and processing, and generate images suitable for instrument detection or human observation.

Recognition of moving human targets is a prerequisite for tracking and locking onto human bodies in video and for understanding and describing human behavior. Human target recognition based on two-dimensional image processing is a relatively new technique that has made considerable progress in recent years. However, because it processes visible-light images, it places high demands on illumination, so both recognition accuracy and speed are easily affected by lighting.

Summary of the invention

The object of the present invention is to provide a method for detecting the head of a moving human body based on stereo vision, comprising the following steps:

Step S101, building the hardware platform: two calibrated cameras of the same model are placed in parallel, one on the left and one on the right at the same height, directly above the target scene to be shot;

Step S102, calculating the disparity between the binocular stereo images with a window-based stereo matching algorithm;

Step S103, obtaining the distance from the cameras to the target scene by disparity-based triangulation, thereby obtaining the original depth image of the target scene;

Step S104, segmenting the head target in the original depth image according to the gray level and geometric features of the human head target, thereby identifying the human head.

In the above method, the stereo matching algorithm comprises:

Step S1021, the two cameras each capture a background image;

Step S1022, the image captured by each camera is differenced with that camera's background image, yielding two foreground images;

Step S1023, taking one of the foreground images as the reference, a feature point is selected in the reference foreground image and a window of size m*m is set up centered on this feature point;

Step S1024, a window of size m*m is set up in the other foreground image and moved pixel by pixel, and the gray-level differences between the two windows are calculated at each given disparity;

Step S1025, the disparity at which the sum of gray-level differences is smallest is taken as the disparity of this pixel.
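The background differencing of steps S1021 and S1022 can be sketched as follows. This is a minimal illustration only: the images are assumed to be gray-level arrays stored as nested lists, and the function name `foreground` and the noise tolerance `min_diff` are hypothetical, since the text does not say how small differences are treated.

```python
def foreground(frame, background, min_diff=15):
    """Keep pixels whose gray level differs from the background image.

    frame, background: nested lists of gray levels with identical shape.
    min_diff: hypothetical noise tolerance (not specified in the text).
    Pixels that match the background are set to 0.
    """
    rows, cols = len(frame), len(frame[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if abs(frame[r][c] - background[r][c]) >= min_diff:
                out[r][c] = frame[r][c]
    return out


background = [[10, 10], [10, 10]]
frame = [[10, 200], [12, 10]]
left_foreground = foreground(frame, background)
```

The same function would be applied once per camera, giving the two foreground images used in steps S1023 to S1025.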

In the above method, the human head recognition method comprises:

Step S1041, computing the histogram of the depth image and selecting the region where a local maximum is located as the target region;

Step S1042, selecting a threshold for segmenting the image; within the target region, pixels not less than the threshold form suspected head regions and pixels below the threshold form non-head regions;

Step S1043, since the average gray value of the head is the largest in the target region, part of the non-target regions is filtered out using the average gray level and the gray variance;

Step S1044, for the suspected head regions remaining after filtering, the human head is determined according to the geometric features of the head.
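Step S1041 does not spell out how the local maximum of the histogram is located. One simple reading, shown here purely as an assumption, is to count the pixels at each gray level of the depth image and report the gray levels whose count exceeds that of their neighbours. The names `histogram`, `local_maxima` and the parameter `min_count` are hypothetical.

```python
def histogram(depth_image, levels=256):
    """Count how many pixels fall on each gray level of the depth image."""
    counts = [0] * levels
    for row in depth_image:
        for g in row:
            counts[g] += 1
    return counts


def local_maxima(counts, min_count=1):
    """Gray levels whose count is a local peak of the histogram.

    A level is a peak if its count is greater than the count on its left
    and not smaller than the count on its right, so a flat plateau is
    reported once. min_count is a hypothetical floor that drops tiny peaks.
    """
    peaks = []
    for g in range(len(counts)):
        left = counts[g - 1] if g > 0 else 0
        right = counts[g + 1] if g + 1 < len(counts) else 0
        if counts[g] >= min_count and counts[g] > left and counts[g] >= right:
            peaks.append(g)
    return peaks


depth = [[0, 0, 0], [0, 200, 200], [0, 200, 120]]
peaks = local_maxima(histogram(depth), min_count=2)
```

In this toy image the peak at gray level 0 corresponds to the background and the peak at 200 to the brightest object; which peak defines the target region is left to the method as described.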

Compared with the prior art, the present invention has the following advantages: (1) the method is based on stereo vision: a depth image of the target scene is generated by stereo vision techniques, showing the three-dimensional information of the target human body and overcoming the limitation that two-dimensional images are affected by illumination; (2) the present invention shoots from the top down, so that even in crowded conditions some space remains between heads, which effectively avoids recognition errors caused by occlusion and overlap in the flow of people.

The present invention is further described below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention.

Fig. 2 is a diagram of the optimal placement of the two cameras of the present invention.

Fig. 3 is a schematic diagram of the stereo matching algorithm process.

Embodiment

With reference to Fig. 1, a method for detecting the head of a moving human body based on stereo vision comprises the following steps:

The calibration in step S101 comprises calibrating the intrinsic parameter matrix and the extrinsic parameter matrix of each camera, implemented as follows:

Step S1011, the two MTV-1881EX-3 cameras used in the experiment are calibrated with the MATLAB calibration toolbox. A mapping matrix H is obtained for each image, according to the principle of formula (1):

s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \qquad (1)

It is assumed that the calibration template plane lies in the plane Z = 0 of the world coordinate system. Here s is an unknown scale factor, [u v 1]^T are the homogeneous coordinates of the image-plane projection of a point on the template plane, K is the camera intrinsic matrix, [r_1 r_2 r_3] is the rotation matrix of the camera coordinate system relative to the world coordinate system, t is the translation vector of the camera coordinate system relative to the world coordinate system, and [X Y 1]^T are the homogeneous coordinates of the point on the template. Rearranging formula (1) yields a 3*3 homography matrix H:

H = [h_1, h_2, h_3] = \lambda K [r_1, r_2, t] \qquad (2)

where \lambda is the scale coefficient that results from the rearrangement. From formula (2):

r_1 = \frac{1}{\lambda} K^{-1} h_1, \quad r_2 = \frac{1}{\lambda} K^{-1} h_2 \qquad (3)

The rotation matrix has the properties r_1^T r_2 = 0 and \|r_1\| = \|r_2\| = 1.

From formulas (2) and (3) and the properties of the rotation matrix, two basic constraints on the camera intrinsic matrix K are obtained:

h_1^T K^{-T} K^{-1} h_2 = 0 \qquad (4)

h_1^T K^{-T} K^{-1} h_1 = h_2^T K^{-T} K^{-1} h_2 \qquad (5)
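For reference, constraints (4) and (5) follow by substituting formula (3) into the two rotation-matrix properties. The intermediate steps below are a reconstruction for the reader and do not appear in the original text; they assume \lambda is nonzero.

```latex
\begin{aligned}
r_1^{T} r_2 = 0
  &\;\Rightarrow\; \frac{1}{\lambda^{2}}\, h_1^{T} K^{-T} K^{-1} h_2 = 0
  \;\Rightarrow\; h_1^{T} K^{-T} K^{-1} h_2 = 0, \\
\lVert r_1 \rVert^{2} = \lVert r_2 \rVert^{2}
  &\;\Rightarrow\; \frac{1}{\lambda^{2}}\, h_1^{T} K^{-T} K^{-1} h_1
   = \frac{1}{\lambda^{2}}\, h_2^{T} K^{-T} K^{-1} h_2
  \;\Rightarrow\; h_1^{T} K^{-T} K^{-1} h_1 = h_2^{T} K^{-T} K^{-1} h_2 .
\end{aligned}
```

Each calibration image therefore contributes two equations in the entries of K^{-T} K^{-1}, which is why several views of the template are needed to solve for K.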

K can be calculated from (4) and (5). The extrinsic parameters of each image relative to the planar template, namely the rotation matrix R and the translation vector t, are then calculated from K and the mapping matrix H:

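r_1 = \frac{1}{\lambda} K^{-1} h_1 \qquad (6)

r_2 = \frac{1}{\lambda} K^{-1} h_2 \qquad (7)

r_3 = r_1 \times r_2 \qquad (8)

t = \frac{1}{\lambda} K^{-1} h_3 \qquad (9)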
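The recovery of the extrinsic parameters from K and H can be sketched in a few lines. This is an illustration under stated assumptions rather than the calibration toolbox's implementation: matrices are plain nested lists, the scale factor is fixed by normalising K^{-1} h_1 to unit length (which follows from \|r_1\| = 1 and assumes a positive scale), and all function names are hypothetical.

```python
import math


def inv3(m):
    """Inverse of a 3x3 matrix given as nested lists (adjugate formula)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def extrinsics_from_homography(K, H):
    """Recover r1, r2, r3 and t from the intrinsic matrix K and homography H."""
    K_inv = inv3(K)
    # Columns h1, h2, h3 of H.
    h1, h2, h3 = ([H[r][c] for r in range(3)] for c in range(3))
    a1 = matvec(K_inv, h1)
    # Scale factor: the norm of K^-1 h1, because r1 must have unit length.
    scale = math.sqrt(sum(x * x for x in a1))
    r1 = [x / scale for x in a1]
    r2 = [x / scale for x in matvec(K_inv, h2)]
    r3 = cross(r1, r2)  # third rotation column is orthogonal to r1 and r2
    t = [x / scale for x in matvec(K_inv, h3)]
    return r1, r2, r3, t
```

With noisy image data the recovered r_1, r_2, r_3 are generally not exactly orthonormal, so practical calibration code usually re-orthogonalises the rotation afterwards; that refinement is outside this sketch.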

Step S1012, the two calibrated cameras are placed above the scene to be shot at the same height, one on the left and one on the right. The distance between the left and right cameras is adjusted according to the height of the field of view so as to enlarge the field of view as far as possible while keeping the captured images sharp. As verified by experiment, the camera mounting height is 2.5 meters and the distance between the cameras is 0.8 meters (as shown in Fig. 2).

With reference to Fig. 3, in step S102, the stereo matching algorithm comprises:

Step S1021, the two cameras each capture a background image;

Step S1022, the image captured by each camera is differenced with that camera's background image, yielding two foreground images;

Step S1023, taking one of the foreground images as the reference, a feature point is selected in the reference foreground image and a window of size m*m, for example 5*5 pixels, is set up centered on this feature point;

Step S1024, a window of size m*m is set up in the other foreground image and moved pixel by pixel, and the gray-level differences between the two windows at a given disparity are calculated according to formula (10):

\sum_{p = -\frac{m}{2}}^{\frac{m}{2}} \sum_{q = -\frac{m}{2}}^{\frac{m}{2}} \left| I_{right}[x + p][y + q] - I_{left}[x + p + d][y + q] \right| \qquad (10)

where m is the window size in pixels, I_left and I_right are the pixel gray values of the left and right images respectively, p and q are the horizontal and vertical offsets within the window, and d is the disparity value being tested.

Step S1025, the disparity at which the sum of gray-level differences is smallest is taken as the disparity of this pixel; that is, the value of d that minimizes formula (10) is the disparity.
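Steps S1023 to S1025 amount to a sum-of-absolute-differences search along one image row. The sketch below is illustrative only: images are assumed to be nested lists indexed as `image[y][x]`, the window size m is assumed odd so that the window is centred on the pixel, and the search limit `max_d` and both function names are hypothetical. The caller is assumed to choose (x, y) far enough from the image border for the window to fit.

```python
def sad_cost(right, left, x, y, d, m=5):
    """Formula (10): sum of absolute gray differences at disparity d."""
    half = m // 2
    total = 0
    for q in range(-half, half + 1):
        for p in range(-half, half + 1):
            total += abs(right[y + q][x + p] - left[y + q][x + p + d])
    return total


def best_disparity(right, left, x, y, max_d, m=5):
    """Return the d in [0, max_d] that minimises the cost at pixel (x, y)."""
    half = m // 2
    width = len(left[0])
    best_d, best_cost = 0, None
    for d in range(max_d + 1):
        if x + half + d >= width:
            break  # the shifted window would leave the left image
        cost = sad_cost(right, left, x, y, d, m)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic pair: the left image is the right image shifted by 3 columns.
right = [[(5 * c * c + 11 * r) % 97 for c in range(12)] for r in range(5)]
left = [[right[r][c - 3] if c >= 3 else 0 for c in range(12)] for r in range(5)]
d_found = best_disparity(right, left, x=4, y=2, max_d=5, m=3)
```

Here the reference window is taken from the right image and the left image is searched, matching the index order in formula (10).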

In step S103, given that the focal length of the cameras is f and the distance between the two cameras is B, the depth information Z of the scene is calculated according to formula (11):

Z = \frac{f \cdot B}{d} \qquad (11)

The image of the target scene formed from this depth information is called the depth image of the scene.
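As a rough illustration of step S103, the sketch below converts a disparity map to depth with the standard triangulation relation for two parallel, rectified cameras, Z = f * B / d. That relation, the focal length of 800 pixels and the function names are assumptions made for the example; only the 0.8 m camera spacing and the 2.5 m mounting height come from the text.

```python
def depth_from_disparity(d_pixels, focal_px, baseline_m):
    """Depth in metres for one pixel, assuming Z = f * B / d.

    focal_px: focal length expressed in pixels (hypothetical value below).
    baseline_m: distance between the two cameras in metres.
    Returns None when the disparity is not positive (no valid match).
    """
    if d_pixels <= 0:
        return None
    return focal_px * baseline_m / d_pixels


def depth_image(disparity_map, focal_px, baseline_m):
    """Apply depth_from_disparity to every pixel of a disparity map."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]


# With B = 0.8 m and an assumed f of 800 px, a disparity of 256 px
# corresponds to the 2.5 m mounting height (the floor plane).
floor_depth = depth_from_disparity(256, focal_px=800, baseline_m=0.8)
```

Heads are closer to the cameras than the floor, so they produce larger disparities and smaller Z; how Z is mapped to gray levels in the depth image is not specified here.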

In step S104, the human head recognition method comprises:

Step S1041, computing the histogram of the depth image and selecting the region where a local maximum is located as the target region;

Step S1042, a threshold for segmenting the image is selected (the threshold range is [25, 30]); within the target region, pixels not less than the threshold form suspected head regions and pixels below the threshold form non-head regions. According to statistics, the pixels of the target region are concentrated in the head and shoulders. The head and shoulder regions are segmented with gray level t as the threshold: pixels in the region above gray level t form the head region, and pixels below gray level t form the non-head region. The entropies of the non-head region and the head region are then calculated as:

H_B = -\sum_{i} \frac{p_i}{p_t} \lg \frac{p_i}{p_t}

H_O = -\sum_{i} \frac{p_i}{1 - p_t} \lg \frac{p_i}{1 - p_t}

H_t = -\sum_{i} p_i \lg p_i

H_L = -\sum_{i} p_i \lg p_i

where p_i is the proportion of pixels in the image whose gray value is i, t is the threshold for segmenting the image, H_B is the one-dimensional gray-level entropy of the non-head region in the image, and H_O is the one-dimensional gray-level entropy of the head region in the image. H_B + H_O is the sum of the two entropy functions; the gray level t at which this sum reaches its maximum is taken as the threshold for segmenting the image.
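The threshold search can be sketched as below. Because the text does not define p_t or the summation ranges, the sketch makes the following assumptions: p_t is the total proportion of pixels with gray level below t, the sum in H_B runs over gray levels below t (non-head), the sum in H_O runs over gray levels not below t (head), and lg denotes the base-10 logarithm. The function name is hypothetical.

```python
import math


def max_entropy_threshold(counts):
    """Return the gray level t that maximises H_B(t) + H_O(t).

    counts: histogram of the target region, counts[i] = pixels at level i.
    Returns None if no threshold separates the pixels into two classes.
    """
    best_t, best_sum = None, -math.inf
    for t in range(1, len(counts)):
        below, above = counts[:t], counts[t:]
        n_b, n_o = sum(below), sum(above)
        if n_b == 0 or n_o == 0:
            continue  # one of the two classes would be empty
        # c / n_b equals p_i / p_t, and c / n_o equals p_i / (1 - p_t).
        h_b = -sum((c / n_b) * math.log10(c / n_b) for c in below if c > 0)
        h_o = -sum((c / n_o) * math.log10(c / n_o) for c in above if c > 0)
        if h_b + h_o > best_sum:
            best_t, best_sum = t, h_b + h_o
    return best_t


# Two clusters of gray levels: {0, 1} and {6, 7}.
t_star = max_entropy_threshold([4, 4, 0, 0, 0, 0, 4, 4])
```

When several thresholds give the same entropy sum, as for every t between the two clusters in this example, the sketch keeps the smallest one; the text does not say how ties are resolved.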

Step S1043, since the average gray value of the head is the largest in the target region, part of the non-target regions is filtered out using the average gray level and the gray variance. Repeated simulation shows that, under the viewing conditions of this experiment, the aspect ratio w/h of the head in pixels lies in the range [0.65, 1.5], where w is the width of the head in pixels and h is the height of the head in pixels. Specifically: the range [0.65, 1.5] is selected as the aspect-ratio threshold; for each suspected head region, if its gray variance is greater than the set threshold, the suspected head region is filtered out; when the ratio of the number of columns to the number of rows (the aspect ratio) of a suspected head region falls within [0.65, 1.5], the region is a human head region. The average gray level and the gray variance are calculated as:

\bar{g} = \frac{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} f(i, j)}{M \times N}

var = \frac{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} \left( f(i, j) - \bar{g} \right)^{2}}{M \times N}

where M and N are the numbers of rows and columns of the suspected head region, f(i, j) is the frequency with which the gray feature pair (i, j) occurs, \bar{g} is the average gray level of the suspected head region of size M*N, and var is the gray variance.
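The variance filter and the aspect-ratio test can be illustrated as follows. The sketch treats f(i, j) as the gray value of the pixel at row i and column j of a rectangular suspected head region, which is an assumption about the intended meaning; the variance limit `max_variance` is hypothetical because the text gives no value for it, and the function names are illustrative.

```python
def mean_and_variance(region):
    """Average gray level and gray variance of an M x N region."""
    rows, cols = len(region), len(region[0])
    n = rows * cols
    mean = sum(sum(row) for row in region) / n
    var = sum((g - mean) ** 2 for row in region for g in row) / n
    return mean, var


def is_head(region, max_variance, ratio_range=(0.65, 1.5)):
    """Variance filter followed by the aspect-ratio test (columns / rows)."""
    _, var = mean_and_variance(region)
    if var > max_variance:
        return False  # gray levels vary too much for a head
    ratio = len(region[0]) / len(region)
    return ratio_range[0] <= ratio <= ratio_range[1]


square_blob = [[200] * 4 for _ in range(4)]    # uniform, aspect ratio 1.0
long_blob = [[200] * 8 for _ in range(2)]      # uniform, aspect ratio 4.0
noisy_blob = [[0, 200], [200, 0]]              # variance 10000
results = [is_head(r, max_variance=50) for r in (square_blob, long_blob, noisy_blob)]
```

Only the square, uniform region passes both tests; the elongated region fails the aspect-ratio range and the noisy region fails the variance filter.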

Claims

1. A method for detecting the head of a moving human body based on stereo vision, characterized by comprising:

building a hardware platform: two calibrated cameras of the same model are placed in parallel, one on the left and one on the right at the same height, directly above the target scene to be shot;

calculating the disparity between the binocular stereo images with a window-based stereo matching algorithm;

obtaining the distance from the cameras to the target scene by disparity-based triangulation, thereby obtaining the original depth image of the target scene;

segmenting the head target in the original depth image according to the gray level and geometric features of the human head target, thereby identifying the human head.

2. The method for detecting the head of a moving human body based on stereo vision according to claim 1, characterized in that the calibration comprises calibrating the intrinsic parameter matrix and the extrinsic parameter matrix of each camera.

3. The method for detecting the head of a moving human body based on stereo vision according to claim 1, characterized in that the stereo matching algorithm comprises:

Step 1, the two cameras each capture a background image;

Step 2, the image captured by each camera is differenced with that camera's background image, yielding two foreground images;

Step 3, taking one of the foreground images as the reference, a feature point is selected in the reference foreground image and a window of size m*m is set up centered on this feature point;

Step 4, a window of size m*m is set up in the other foreground image and moved pixel by pixel, and the gray-level differences between the two windows are calculated at each given disparity;

Step 5, the disparity at which the sum of gray-level differences is smallest is taken as the disparity of this pixel.

4. The method for detecting the head of a moving human body based on stereo vision according to claim 1, characterized in that the human head recognition method comprises:

computing the histogram of the depth image and selecting the region where a local maximum is located as the target region;

selecting a threshold for segmenting the image, wherein, within the target region, pixels not less than the threshold form suspected head regions and pixels below the threshold form non-head regions;

since the average gray value of the head is the largest in the target region, filtering out part of the non-target regions using the average gray level and the gray variance;

for the suspected head regions remaining after filtering, determining the human head according to the geometric features of the head.

5. The method for detecting the head of a moving human body based on stereo vision according to claim 4, characterized in that the threshold for segmenting the image is obtained from the following formulas:

H_B = -\sum_{i} \frac{p_i}{p_t} \lg \frac{p_i}{p_t}

H_O = -\sum_{i} \frac{p_i}{1 - p_t} \lg \frac{p_i}{1 - p_t}

H_t = -\sum_{i} p_i \lg p_i

H_L = -\sum_{i} p_i \lg p_i

6. The method for detecting the head of a moving human body based on stereo vision according to claim 4, characterized in that filtering out part of the non-target regions is specifically: a threshold is selected and, for each suspected head region, if its gray variance is greater than the set threshold, the suspected head region is filtered out,

\bar{g} = \frac{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} f(i, j)}{M \times N}

var = \frac{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} \left( f(i, j) - \bar{g} \right)^{2}}{M \times N}

7. The method for detecting the head of a moving human body based on stereo vision according to claim 4, characterized in that, when the ratio of the number of columns to the number of rows of a suspected head region falls within [0.65, 1.5], the region is a human head region.

8. The method for detecting the head of a moving human body based on stereo vision according to claim 1, characterized in that the camera mounting height is 2.5 meters and the distance between the cameras is 0.8 meters.