CN104166841A - Rapid detection identification method for specified pedestrian or vehicle in video monitoring network - Google Patents


Info

Publication number
CN104166841A
CN104166841A (application CN201410356465.0A)
Authority
CN
China
Prior art keywords
image
pixel
histogram
edge
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410356465.0A
Other languages
Chinese (zh)
Other versions
CN104166841B (en)
Inventor
于慧敏
谢奕
郑伟伟
汪东旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201410356465.0A priority Critical patent/CN104166841B/en
Publication of CN104166841A publication Critical patent/CN104166841A/en
Application granted granted Critical
Publication of CN104166841B publication Critical patent/CN104166841B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network. Given a target template supplied by the user, the specified target is detected and identified accurately and quickly in the videos of the surveillance network. The method first applies a normalizing scale change to the target template image and computes several mixed features of the template image; a Gaussian mixture model is used to build a background model of each surveillance video, the videos are processed in parallel, and area filtering together with morphological post-processing extracts the motion foregrounds; every motion foreground that passes a preliminary screening is scale-normalized, its mixed features are computed, and its weighted similarity distance to the target template is evaluated; the first N objects whose weighted similarity distance to the target template image falls below a decision threshold and is smallest are returned as the detection and identification result. While preserving detection and identification accuracy, the method markedly raises processing speed through parallel processing, frame skipping and related techniques.

Description

Rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network
Technical field
The present invention relates to the technical field of video image processing, specifically to a rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network.
Background technology
In the field of intelligent surveillance, detecting and identifying a specified pedestrian or vehicle in a surveillance video network can help public security organs quickly determine the time and place at which a suspect or suspected vehicle appeared, accelerating case investigation. In practical scenes, however, illumination conditions change, camera parameters differ between surveillance cameras, moving objects (pedestrians, vehicles and so on) change pose and occlude one another, and pedestrians or vehicles are usually small in the video, so detailed information such as faces or licence plates cannot be relied on. How to detect and identify a specified pedestrian or vehicle quickly and accurately in a multi-camera surveillance network has therefore long been a focus and difficulty of computer vision research.
In 2010, Oreifej et al. published "Human identity recognition in aerial images" (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 709-716), proposing a voting-based pedestrian recognition method for aerial video with complex backgrounds. The technique first detects pedestrian regions in the video with HoG (Histograms of Oriented Gradients) features, forming the set of pedestrians appearing in the video sequence, called the candidate set; a pre-prepared feature set of the target pedestrian then votes on the candidates by feature similarity, and the candidate with the most votes is taken as the suspected target. The algorithm copes well with complex conditions such as low image resolution and variable pedestrian pose. However, because it must repeatedly scan the video image at multiple scales to detect pedestrian regions one by one, and needs several target template images to act as voters, it falls short in both speed and practical feasibility.
Because surveillance video usually has a fixed viewpoint, segmenting the motion foreground by background modelling can be applied effectively: compared with multi-scale scanning of the video image with HoG features, such methods detect moving objects in fixed-viewpoint scenes quickly and efficiently, although they no longer apply when the camera moves.
Reducing the training stage is another difficulty in target detection and identification. Some current detection and recognition algorithms based on artificial neural networks or support vector machines achieve high recognition accuracy, but their computational cost is large and they require many target template images in advance for training, so their practical value is limited in fields such as criminal investigation where little target template information is available.
Each of the above methods has strengths and weaknesses, and research on how to combine them effectively for a specific application scene remains insufficient. This motivates a method framework suited to criminal investigation work in a surveillance network that raises method speed to real-time processing while preserving method accuracy.
Summary of the invention
The invention provides a rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network, to overcome the slow detection speed of the prior art and its frequent need for advance training on a training set.
To achieve the above object, the present invention adopts the following technical scheme: a rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network, comprising the following steps:
Step 1: from a picture or video frame containing the specified pedestrian or vehicle, mark the identification target to be detected by rectangular-frame calibration;
Step 2: apply a normalizing scale change to the selected target template image and compute its several mixed features;
Step 3: assign an independent thread to the surveillance video of each camera so that the cameras of the network are processed in parallel, and build a background model of each camera's monitored scene with a Gaussian mixture model;
Step 4: extract the motion foreground targets of each camera with area filtering and morphological post-processing; filter out moving targets whose area is too small, by setting a moving-target area threshold, to remove moving objects in the scene that are neither pedestrians nor vehicles;
Step 5: extract the motion foreground of each surveillance video with frame skipping; apply the normalizing scale change to every moving object in the surveillance network that passes the preliminary screening, compute its several mixed features, and then measure its weighted similarity to the target template image;
Step 6: according to similarity to the template image, dynamically place the information of the moving objects detected in each surveillance video of the network into a result vector V_result of specified length; after all surveillance videos have been traversed, return this result vector as the final detection and identification result.
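The per-camera control flow of steps 3-5 can be sketched as follows. This is a minimal illustration only: the patent builds its background model with a Gaussian mixture (via OpenCV in the embodiment), whereas the toy worker below uses a running-average background and 1-D "frames" of four pixels so the sketch stays self-contained; the threshold of 20 and the update rate are arbitrary placeholder values.

```python
import threading

def process_camera(cam_id, frames, skip_f, results, lock):
    """Worker thread for one camera: a crude running-average background
    model stands in for the patent's Gaussian mixture; a pixel deviating
    from the background by more than a threshold counts as foreground."""
    background = None
    for idx, frame in enumerate(frames):
        if background is None:
            background = list(frame)       # first frame seeds the model
            continue
        if idx % skip_f != 0:              # frame-skipping (step 5)
            continue
        fg = [abs(p - b) > 20 for p, b in zip(frame, background)]
        if any(fg):
            with lock:                     # shared result list
                results.append((cam_id, idx))
        # slow background update
        background = [0.95 * b + 0.05 * p for p, b in zip(frame, background)]

# toy "videos": each camera yields 1-D frames of 4 pixels
videos = {0: [[10, 10, 10, 10]] * 3 + [[200, 10, 10, 10]],
          1: [[50, 50, 50, 50]] * 4}
results, lock = [], threading.Lock()
threads = [threading.Thread(target=process_camera, args=(c, f, 1, results, lock))
           for c, f in videos.items()]
for t in threads: t.start()
for t in threads: t.join()
```

Only camera 0's last frame deviates from its background, so it is the only detection collected.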
Further, the several mixed features of step 2 are the main colour feature, the block edge direction histogram feature, the blockwise HSV histogram feature, the HoG feature and the rotation-invariant uniform LBP feature. They are generated as follows:
Step 2.1: the main colour feature is generated as follows:
1. Transform the scale-normalized target template image from RGB space to HSV space and extract only the hue component of the image;
2. Divide the value range of the hue component into 8 intervals and project the hue component of the target template image onto these 8 intervals to form an 8-dimensional hue histogram chist, whose i-th entry $\text{chist}_i$ is obtained by:

$$\text{chist}_i = \sum_{(x,y)\in Rect} \delta_i(h_{x,y})$$

where $h_{x,y}$ is the hue value of the pixel at coordinate (x, y), Rect is the foreground region of the target template image or of the detected motion foreground, and $\delta_i(h_{x,y})$ is defined as:

$$\delta_i(h_{x,y}) = \begin{cases} 1, & h_{x,y} \in [45i,\ 45(i+1)) \\ 0, & \text{otherwise} \end{cases}$$

3. Compute the mean hue value $c_i$ of each of the 8 hue intervals and normalize the 8-dimensional histogram to obtain the percentages $p_i$; from the normalized histogram choose the 3 dimensions with the largest $p_i$ and save their $c_i$ and $p_i$ in a vector $v_{dc}$, which forms the main colour feature of the target. The main colour feature is therefore 6-dimensional, holding the hue values and percentages of the 3 colour components with the largest share of the target image: $v_{dc} = [c_1, p_1, c_2, p_2, c_3, p_3]$;
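Step 2.1 can be sketched as below. This is a minimal illustration under assumed inputs (a flat list of hue values in [0, 360) rather than an HSV image), not the patent's C/C++/OpenCV implementation:

```python
def dominant_color_feature(hues):
    """6-D main-colour feature: bin hues into 8 intervals of 45 degrees,
    keep the 3 largest bins as (mean hue c_i, fraction p_i) pairs."""
    bins = [[] for _ in range(8)]
    for h in hues:
        bins[min(int(h // 45), 7)].append(h)
    n = float(len(hues))
    # (fraction p_i, mean hue c_i) per bin; empty bins contribute 0
    stats = [(len(b) / n, sum(b) / len(b) if b else 0.0) for b in bins]
    top3 = sorted(stats, reverse=True)[:3]   # 3 largest fractions
    v_dc = []
    for p, c in top3:
        v_dc.extend([c, p])                  # [c1, p1, c2, p2, c3, p3]
    return v_dc

feat = dominant_color_feature([10, 12, 14, 100, 102, 350])
```

Here half the pixels fall in the first hue interval, so the feature leads with its mean hue 12 and fraction 0.5.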
Step 2.2: the block edge direction histogram is generated as follows:
1. Convert the scale-normalized target image from an RGB colour image to a grayscale image and divide it into 4 × 4 = 16 blocks;
2. Filter the grayscale image with the Sobel horizontal and vertical edge detection operators to obtain the horizontal and vertical edge intensity of each pixel, and from them the edge direction $e_{x,y}$ and edge strength $|e_{x,y}|$ of each pixel;
3. Classify the pixels into five types by edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge and directionless edge; build the edge direction histogram from the first four edge types;
4. Set a directionless-edge threshold $T_e$; in each block, accumulate the pixels whose edge strength exceeds $T_e$ into a histogram by edge direction, generating a 4-dimensional edge direction histogram per block. The 4 × 4 blocks together yield a 64-dimensional edge direction histogram, saved as the local edge direction feature $ehist^l$; summing the 4-dimensional histograms of the 16 blocks gives the 4-dimensional global edge direction feature $ehist^g$ of the whole target image. Merged, they form a 68-dimensional vector saved as the block edge direction histogram feature $v_{ed}$;
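The per-pixel classification of steps 2-4 can be sketched as follows. The ±22.5° bin boundaries around each canonical direction and the use of the gradient orientation as the edge direction are assumptions, since the text does not fix them; $T_e = 100$ follows the embodiment:

```python
import math

T_E = 100.0  # directionless-edge threshold from the embodiment

def edge_type(ex, ey):
    """Classify one pixel from its Sobel horizontal/vertical responses.
    Returns a bin index (0=horizontal, 1=vertical, 2=45 deg, 3=135 deg)
    or None when edge strength is at or below T_E (directionless)."""
    if math.hypot(ex, ey) <= T_E:
        return None
    angle = math.degrees(math.atan2(ey, ex)) % 180.0  # unsigned direction
    if angle < 22.5 or angle >= 157.5:
        return 0
    if 67.5 <= angle < 112.5:
        return 1
    if 22.5 <= angle < 67.5:
        return 2
    return 3

def block_histogram(gradients):
    """4-bin edge-direction histogram of one block, given (ex, ey) pairs."""
    hist = [0, 0, 0, 0]
    for ex, ey in gradients:
        t = edge_type(ex, ey)
        if t is not None:
            hist[t] += 1
    return hist
```

Sixteen such block histograms concatenated, plus their element-wise sum, would give the 64 + 4 = 68 dimensions described above.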
Step 2.3: the blockwise HSV histogram is generated as follows:
1. Project the scale-normalized target template image or motion foreground image from RGB colour space to HSV colour space, and divide the image horizontally from top to bottom into 4 stripes;
2. Compute an HSV histogram for each of the 4 stripes, dividing the hue component into 16 intervals and the saturation and value components into 4 intervals each, so that each stripe yields a 24-dimensional histogram feature; splicing the four 24-dimensional histograms forms the 96-dimensional blockwise HSV histogram of the whole image;
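A minimal sketch of step 2.3, assuming each stripe is given as a list of (h, s, v) tuples with hue in [0, 360) and saturation/value in [0, 1]; the patent works on image data, so this input convention is an assumption for illustration:

```python
def stripe_hsv_histogram(pixels):
    """24-D histogram of one horizontal stripe: 16 hue bins, 4 saturation
    bins and 4 value bins, concatenated in that order."""
    hist = [0] * 24
    for h, s, v in pixels:
        hist[min(int(h / 22.5), 15)] += 1        # 16 hue bins
        hist[16 + min(int(s * 4), 3)] += 1       # 4 saturation bins
        hist[20 + min(int(v * 4), 3)] += 1       # 4 value bins
    return hist

def block_hsv_feature(stripes):
    """Concatenate the 4 stripe histograms into the 96-D feature."""
    feat = []
    for stripe in stripes:
        feat.extend(stripe_hsv_histogram(stripe))
    return feat

stripes = [[(0.0, 0.0, 0.0)], [(90.0, 0.5, 0.5)],
           [(180.0, 0.9, 0.9)], [(359.0, 0.3, 0.2)]]
feat = block_hsv_feature(stripes)
```

Each one-pixel stripe contributes one count to a hue, a saturation and a value bin, giving 12 counts over the 96 dimensions.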
Step 2.4: the HoG feature is generated as follows:
1. Convert the scale-normalized target template image to a grayscale image and apply Gamma correction to reduce the influence of local shadows and illumination change on feature extraction;
2. Convolve the image with the [-1, 0, 1] gradient operator to obtain the horizontal gradient $G_x(x, y)$ of each pixel, then with the $[-1, 0, 1]^T$ gradient operator to obtain the vertical gradient $G_y(x, y)$; compute the gradient magnitude and gradient direction from them:

$$G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$$

$$dir(x,y) = \tan^{-1}\!\left(\frac{G_y(x,y)}{G_x(x,y)}\right)$$

3. Divide the image into cells of 8 × 8 pixels, divide the gradient direction range evenly into 9 intervals, and generate a 9-dimensional gradient histogram from the gradient directions of the pixels in each cell;
4. Let every 4 adjacent cells form a block of 16 × 16 pixels; blocks may overlap. Concatenating the 9-dimensional gradient histograms of the 4 cells of a block gives a 36-dimensional block descriptor;
5. Set a 64 × 64 window and slide it vertically from top to bottom with a stride of 8 pixels; concatenate the descriptors of all blocks contained in the window to form the window descriptor, then concatenate the descriptors of all windows to finally obtain the HoG feature vector $v_{hog}$;
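Steps 2-3 above can be sketched on a tiny grayscale grid. This is a simplified illustration: it computes the [-1, 0, 1] gradients, magnitudes and unsigned directions for interior pixels and builds one magnitude-weighted 9-bin cell histogram; the border handling and the magnitude weighting are standard HoG practice assumed here, not spelled out in the text:

```python
import math

def gradients(img):
    """[-1, 0, 1] central-difference gradients: (magnitude, direction)
    pairs, direction unsigned in [0, 180), for interior pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            out.append((mag, ang))
    return out

def cell_histogram(grads, bins=9):
    """9-bin orientation histogram of one cell, votes weighted by
    gradient magnitude."""
    hist = [0.0] * bins
    for mag, ang in grads:
        hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return hist

# vertical step edge -> strong horizontal gradient, direction 0 degrees
img = [[0, 0, 10, 10]] * 4
hist = cell_histogram(gradients(img))
```

The vertical step edge puts all its gradient energy into the first orientation bin.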
Step 2.5: the rotation-invariant uniform LBP feature is generated as follows:
1. Convert the scale-normalized target template image or motion foreground image to a grayscale image; for each pixel, compare its gray value with those of its 8 surrounding neighbours: a neighbour whose gray value is not less than the pixel's is set to 1, otherwise to 0. Reading the 8 digits clockwise from the 12 o'clock position unifies them into one 8-bit binary number, computed as:

$$LBP_{P,R} = \sum_{y=0}^{P-1} s(g_y - g_c)\, 2^y$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where P is the number of neighbours, R is the radius, and $g_y$ and $g_c$ are the gray values of the neighbouring pixel and the centre pixel respectively;
2. Join the 8 bits of each binary number end to end into a ring and rotate the ring clockwise 7 times, yielding 8 binary numbers in total; choose the smallest of these 8 binary numbers as the rotation-invariant LBP value of the pixel;
3. Divide the rotation-invariant LBP values obtained in step 2 into two classes: those with no more than 2 transitions between 0 and 1 form one class, called uniform LBP patterns; all remaining binary numbers form the other class. Under this classification there are 9 uniform rotation-invariant LBP patterns and 1 non-uniform class;
4. Divide the image into small blocks of 16 × 16 pixels, compute the LBP value of each pixel in each block, and form a 10-dimensional rotation-invariant uniform LBP histogram per block; concatenating the LBP histograms of all blocks finally yields the 840-dimensional rotation-invariant uniform LBP feature $v_{LBP}$;
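The bit manipulations of steps 1-3 can be sketched as below; a minimal sketch of the standard rotation-invariant uniform LBP scheme the text describes, with the neighbour ordering assumed rather than taken from the patent:

```python
def lbp_code(neighbors, center):
    """8-bit LBP code: bit y is set when neighbour y >= centre value."""
    code = 0
    for y, g in enumerate(neighbors):
        if g >= center:
            code |= 1 << y
    return code

def rotation_invariant(code):
    """Minimum value over the 8 circular rotations of the 8-bit code."""
    best = code
    for _ in range(7):
        code = ((code >> 1) | ((code & 1) << 7)) & 0xFF
        best = min(best, code)
    return best

def is_uniform(code):
    """Uniform pattern: at most two 0-1 transitions around the ring."""
    rotated = ((code >> 1) | ((code & 1) << 7)) & 0xFF
    return bin(code ^ rotated).count("1") <= 2

def lbp_bin(code):
    """Map a code to one of 10 histogram bins: 9 uniform rotation-
    invariant patterns (0..8 set bits) plus one non-uniform bin."""
    return bin(code).count("1") if is_uniform(code) else 9
```

For instance, the pattern 11100000 rotates to the canonical minimum 00000111, and the alternating pattern 01010101 has 8 transitions and lands in the non-uniform bin.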
Further, the extraction in step 4 of the foreground targets of each camera with area filtering and morphological post-processing comprises the following sub-steps:
Step 4.1: apply one area filtering pass to the binary image produced by Gaussian mixture background modelling and thresholding;
Step 4.2: apply 3 dilation operations to the binary image with a 3 × 3 template;
Step 4.3: apply 1 erosion operation to the binary image with the same 3 × 3 template;
Step 4.4: finally apply one more area filtering pass to the binary image.
The area filtering of step 4 divides the binary image into small blocks of 4 × 4 pixels; if a block contains 3 or fewer foreground pixels, all pixels of the block are set to background pixels, otherwise the foreground pixels of the block are retained;
The filtering of over-small moving targets by the moving-target area threshold of step 4 proceeds as follows: find all connected regions of the processed binary image and count the foreground pixels of each connected region; set the area threshold $Th_c$ to 1/400 of the video frame area. If the area of a connected region exceeds $Th_c$, retain the region and return the smallest rectangular frame containing it as a motion foreground region; if the area is at most $Th_c$, set the region to background.
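The speckle-removing area filter of steps 4.1 and 4.4 can be sketched directly; a minimal sketch operating on a 0/1 mask whose dimensions are assumed divisible by 4 (the patent does not state how partial tiles are handled):

```python
def area_filter(mask):
    """Zero every 4x4 tile containing 3 or fewer foreground pixels.
    `mask` is a list of lists of 0/1; modified in place and returned."""
    h, w = len(mask), len(mask[0])
    for by in range(0, h, 4):
        for bx in range(0, w, 4):
            count = sum(mask[y][x]
                        for y in range(by, by + 4)
                        for x in range(bx, bx + 4))
            if count <= 3:                 # speckle: wipe the tile
                for y in range(by, by + 4):
                    for x in range(bx, bx + 4):
                        mask[y][x] = 0
    return mask

# left tile: 2 isolated pixels (removed); right tile: 5 pixels (kept)
m = [[0] * 8 for _ in range(4)]
m[0][0] = m[1][1] = 1
for x in range(4, 8):
    m[0][x] = 1
m[1][4] = 1
m = area_filter(m)
```

After filtering, the two-pixel speckle is gone while the five-pixel blob survives, after which the connected-region threshold $Th_c$ would be applied.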
Further, the frame-skipping extraction of the motion foreground of each surveillance video in step 5 proceeds as follows: according to the frame-skip count F set by the user, the motion foreground objects extracted in step 4 undergo the weighted similarity measurement against the target template image, one by one, once every F frames;
The weighted similarity measurement of step 5 proceeds as follows: first compute the main colour feature distance $D_{dc}$, the block edge direction histogram feature distance $D_{ed}$, the blockwise HSV histogram feature distance $D_{hsv}$, the HoG feature distance $D_{hog}$ and the rotation-invariant uniform LBP feature distance $D_{LBP}$ between the motion foreground object and the target template image, then fuse the five feature distances by normalized weighting to obtain the weighted similarity distance $D_{all}$:

$$D_{all} = \alpha D_{dc} + \beta D_{ed} + \gamma D_{hsv} + \lambda D_{hog} + \mu D_{LBP}$$

where $\alpha, \beta, \gamma, \lambda, \mu$ are normalization weighting coefficients that balance the influence of each feature on the overall similarity measure; they are tuned in advance by experiment for the specific environment.
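The fusion of the five per-feature distances can be sketched as a weighted sum; the weight values below are placeholders, since the patent leaves them to per-environment tuning:

```python
def weighted_similarity(dists, weights):
    """Fuse five per-feature distances into one score: the smaller the
    fused distance, the more similar the object is to the template."""
    assert len(dists) == len(weights) == 5
    return sum(w * d for w, d in zip(weights, dists))

# placeholder weights (alpha, beta, gamma, lambda, mu) - to be tuned
W = (0.25, 0.2, 0.25, 0.15, 0.15)
d_all = weighted_similarity((0.1, 0.2, 0.3, 0.4, 0.5), W)
```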
Further, the dynamic placement in step 6 of the information of the moving objects detected in each surveillance video of the network into the result vector $V_{result}$ of specified length comprises the following sub-steps:
Step 6.1: create a vector $V_{result}$ of specified length N, where N is the number of objects in the surveillance video network closest to the specified pedestrian or vehicle template image that need to be returned;
Step 6.2: whenever a new motion foreground is detected in the surveillance video network, first check how many objects $V_{result}$ holds;
Step 6.3: if $V_{result}$ holds fewer than N objects, then whenever the weighted similarity distance $D_{all}$ between an object and the target template image meets the predefined similarity threshold $Th_{all}$, store in $V_{result}$ the object's information: the number of its camera, the frame number of its appearance, the coordinates of its rectangular frame within that frame, and the object image after the normalizing scale change;
Step 6.4: if $V_{result}$ already holds N objects, then whenever the weighted similarity distance $D_{all}$ between an object and the target template image meets the predefined similarity threshold $Th_{all}$, compare the object's $D_{all}$ with the largest weighted similarity distance to the target template held in $V_{result}$; if smaller, replace the object in $V_{result}$ farthest from the target template with this object and re-sort the N objects of $V_{result}$ by ascending similarity distance to the target template;
Step 6.5: after the surveillance videos taken by all cameras of the network have been traversed, the returned vector $V_{result}$ holds the N objects of the whole surveillance network closest to the specified pedestrian or vehicle template image, together with each one's camera number, frame number of appearance, coordinates of its rectangular frame within that frame, and object image after the normalizing scale change.
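Steps 6.1-6.4 amount to maintaining a bounded, sorted best-N list, which can be sketched as below; the opaque `info` payload stands in for the camera number, frame number, frame coordinates and object image listed above:

```python
import bisect

def update_results(results, n, threshold, distance, info):
    """Maintain the N best detections sorted by ascending weighted
    distance: accept a candidate only under the similarity threshold,
    and once full only when it beats the current worst entry."""
    if distance > threshold:
        return results                      # fails Th_all: discard
    if len(results) < n:
        bisect.insort(results, (distance, info))
    elif distance < results[-1][0]:
        results.pop()                       # evict the farthest object
        bisect.insort(results, (distance, info))
    return results

V = []
for d, tag in [(0.9, "a"), (0.3, "b"), (0.5, "c"), (0.2, "d"), (2.0, "e")]:
    V = update_results(V, 3, 1.0, d, tag)
```

Candidate "e" fails the threshold and "d" evicts "a", leaving the three closest objects in ascending order.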
The principle of the present invention is as follows. A pedestrian or vehicle in a surveillance video network must move during some period of time in order to enter and leave the monitored area, and surveillance cameras have fixed viewing angles. To rapidly detect and identify a specified pedestrian or vehicle it therefore suffices to build a background model for each camera of the network, extract the moving foreground objects of each camera and, after preliminary screening, give each camera its own thread; the screened moving objects then undergo the multi-feature weighted fusion similarity measurement against the template image of the specified pedestrian or vehicle, the similarity results of all cameras are gathered and sorted, and the top N moving objects of the whole surveillance network most similar to the specified pedestrian or vehicle are obtained as the result of rapid detection and identification. Because the motion foreground generally changes little over a few adjacent frames of surveillance video, the present invention applies frame skipping during background modelling and similarity measurement, performing the motion foreground similarity measurement once every F frames to further accelerate the system.
The present invention has the following advantages and beneficial effects:
1) Compared with the prior art, the invention exploits two characteristics: surveillance cameras have fixed viewing angles, and a pedestrian or vehicle must undergo a motion process. Using foreground analysis to delimit the target detection range in advance, rather than a multi-scale search algorithm, effectively narrows the range of detection and identification and raises its efficiency;
2) The invention combines multiple features, considering simultaneously the similarity of the specified pedestrian or vehicle to each motion foreground in colour, shape, texture and other aspects, and fuses the multi-aspect information by weighting;
3) The invention needs only a simple target template as input, without a database of the detected target for training, which greatly increases the feasibility of the method in practical applications;
4) The invention introduces multi-camera parallel processing, frame-skipping and related techniques, so that it both completes the detection and identification of the specified pedestrian or vehicle accurately and robustly in complex scenes and effectively reduces the running time of the method.
Brief description of the drawings
Fig. 1 is the overall flow diagram of the present invention;
Fig. 2(a) is a schematic diagram of a monitored scene of the present invention;
Fig. 2(b) shows a Gaussian mixture background modelling result of the present invention;
Fig. 2(c) shows a morphological post-processing result of the present invention;
Fig. 2(d) shows a final motion foreground segmentation result of the present invention;
Fig. 3 is a schematic diagram of a specified-target template image of the present invention;
Fig. 4 shows a specified-target rapid detection and identification result of the present invention;
Detailed description of the embodiments
The invention is further elaborated below in conjunction with a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention and gives a detailed implementation and concrete operating process, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment
The software environment selected for the present invention: the Windows 7 operating system, the VS2010 development platform, the C/C++ development language and the OpenCV 2.3.1 toolkit.
The present embodiment processes a certain surveillance network whose video scenes are comparatively complex: there are many moving objects that occlude one another, and illumination variation and colour differences exist between the cameras owing to their parameters and positional relationships.
Referring to Fig. 1, the technical solution adopted by the present invention is a rapid detection and identification method for a specified pedestrian or vehicle in a video surveillance network, comprising the following steps:
Step 1: from a picture or video frame containing the specified pedestrian or vehicle, mark the identification target to be detected by rectangular-frame calibration;
The user supplies, through the man-machine interface, picture or surveillance video data containing the specified pedestrian or vehicle; the position of the pedestrian or vehicle is determined by rectangular-frame calibration on the picture or video frame image, and the region enclosed by the rectangular frame is cut out as a separate image.
Step 2: apply a normalizing scale change to the selected target template image and compute its several mixed features;
The target template image is scale-normalized to 64 × 128 pixels, and its main colour feature, block edge direction histogram feature, blockwise HSV histogram feature, HoG feature and rotation-invariant uniform LBP feature are computed respectively, expressing the specified pedestrian or vehicle in colour, shape, texture and other aspects.
Step 2.1: the main colour feature is generated as follows:
1. Transform the scale-normalized target template image from RGB space to HSV space and extract only the hue component of the image; computing the main colour feature from the hue component alone amounts to considering only the colour composition of the target, which makes the feature insensitive to illumination variation and to the saturation differences between cameras, enhancing its robustness.
2. Divide the value range of the hue component into 8 intervals and project the hue component of the target template image onto these 8 intervals to form an 8-dimensional hue histogram chist, whose i-th entry $\text{chist}_i$ is obtained by:

$$\text{chist}_i = \sum_{(x,y)\in Rect} \delta_i(h_{x,y})$$

where $h_{x,y}$ is the hue value of the pixel at coordinate (x, y), Rect is the foreground region of the target template image or of the detected motion foreground, and $\delta_i(h_{x,y})$ is defined as:

$$\delta_i(h_{x,y}) = \begin{cases} 1, & h_{x,y} \in [45i,\ 45(i+1)) \\ 0, & \text{otherwise} \end{cases}$$

3. Compute the mean hue value $c_i$ of each of the 8 hue intervals and normalize the 8-dimensional histogram to obtain the percentages $p_i$; from the normalized histogram choose the 3 dimensions with the largest $p_i$ and save their $c_i$ and $p_i$ in a vector $v_{dc}$, which forms the main colour feature of the target. The main colour feature is therefore 6-dimensional, holding the hue values and percentages of the 3 colour components with the largest share of the target image: $v_{dc} = [c_1, p_1, c_2, p_2, c_3, p_3]$.
Given two main colour feature vectors, the distance $D_{dc}$ between them is defined as:

$$D_{dc}(v_{dc}, v'_{dc}) = \sum_{i=1}^{3} p_i^2 + \sum_{j=1}^{3} p_j'^2 - \sum_{i=1}^{3}\sum_{j=1}^{3} 2\, a_{i,j}\, p_i\, p'_j$$

where $a_{i,j}$ characterizes the degree of similarity of the colour components $c_i$ and $c'_j$, defined by:

$$a_{i,j} = \begin{cases} 1 - \dfrac{d_{i,j}}{d_{max}}, & \text{if } d_{i,j} \le T_d \\ 0, & \text{otherwise} \end{cases}$$

In this example $T_d = 25$ and $d_{max} = 50$.
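The distance $D_{dc}$ can be sketched directly from the formula above. The interpretation of $d_{i,j}$ as the circular hue difference between $c_i$ and $c'_j$ is an assumption (the text does not spell it out); $T_d = 25$ and $d_{max} = 50$ follow the embodiment:

```python
T_D, D_MAX = 25.0, 50.0   # thresholds given in the embodiment

def color_similarity(c_i, c_j):
    """a_ij: similarity of two hue values; d_ij taken here as the
    circular hue difference, an assumption for illustration."""
    d = abs(c_i - c_j)
    d = min(d, 360.0 - d)
    return 1.0 - d / D_MAX if d <= T_D else 0.0

def dominant_color_distance(v, w):
    """D_dc over two 6-D features laid out as [c1, p1, c2, p2, c3, p3]."""
    c1, p1 = v[0::2], v[1::2]
    c2, p2 = w[0::2], w[1::2]
    d = sum(p * p for p in p1) + sum(q * q for q in p2)
    for i in range(3):
        for j in range(3):
            d -= 2.0 * color_similarity(c1[i], c2[j]) * p1[i] * p2[j]
    return d

v = [10.0, 0.5, 100.0, 0.3, 200.0, 0.2]
d0 = dominant_color_distance(v, v)                       # identical -> 0
dw = dominant_color_distance(v, [300.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

Identical features yield distance 0; features with no hue within $T_d$ of each other yield the sum of the squared percentages.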
Step 2.2: the block edge direction histogram is generated as follows:
1. Convert the scale-normalized target image from an RGB colour image to a grayscale image and divide it into 4 × 4 = 16 blocks.
2. Filter the grayscale image with the Sobel horizontal and vertical edge detection operators to obtain the horizontal and vertical edge intensity of each pixel, and from them the edge direction $e_{x,y}$ and edge strength $|e_{x,y}|$ of each pixel.
3. Classify the pixels into five types by edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge and directionless edge, where a directionless edge is a pixel whose edge strength is below a certain specific threshold; because directionless edge pixels usually account for a significant proportion of the whole image, only the first four edge types are used in the present invention to form the edge direction histogram.
4. Set the directionless-edge threshold $T_e$; in this example $T_e = 100$. In each block, accumulate the pixels whose edge strength exceeds $T_e$ into a histogram by edge direction, generating a 4-dimensional edge direction histogram per block. The 4 × 4 blocks together yield a 64-dimensional edge direction histogram, saved as the local edge direction feature $ehist^l$; summing the 4-dimensional histograms of the 16 blocks gives the 4-dimensional global edge direction feature $ehist^g$ of the whole target image. Merged, they form a 68-dimensional vector saved as the block edge direction histogram feature $v_{ed}$.
Given two block edge direction histogram features, the distance $D_{ed}$ between them is defined as:

$$D_{ed}(v_{ed}, v'_{ed}) = \sum_{i=1}^{64} \left|ehist_i^l - ehist_i'^l\right| + 5 \times \sum_{i=1}^{4} \left|ehist_i^g - ehist_i'^g\right|$$

Relative to the local components, the present invention gives the global components a larger weight.
Step 2.3: the blockwise HSV histogram is generated as follows:
1. Project the scale-normalized target template image or motion foreground image from RGB colour space to HSV colour space, and divide the image horizontally from top to bottom into 4 stripes.
2. Compute an HSV histogram for each of the 4 stripes, dividing the hue component into 16 intervals and the saturation and value components into 4 intervals each, so that each stripe yields a 24-dimensional histogram feature. Splicing the four 24-dimensional histograms forms the 96-dimensional blockwise HSV histogram of the whole image.
To suit the application scene of vehicle and pedestrian detection and identification, in which the top and bottom of a target template or motion foreground image mostly contain background elements while the middle part holds the important information such as the vehicle body or the pedestrian's clothing, this example gives the middle two stripes a larger weight when measuring the distance between two blockwise HSV histogram features.
Given two blockwise HSV histogram features, the distance $D_{hsv}$ between them is defined as:

$$D_{hsv}(v_{hsv}, v'_{hsv}) = 0.8\, d(hist_1, hist'_1) + 1.2\, d(hist_2, hist'_2) + 1.0\, d(hist_3, hist'_3) + 0.8\, d(hist_4, hist'_4)$$

where $d(hist, hist')$ is defined in this example as the chi-square distance:

$$d(hist, hist') = \sum_{i=1}^{24} \frac{(hist_i - hist'_i)^2}{hist_i + hist'_i}$$
Step 2.4: the HoG feature is generated as follows:
1. The target template image after normalized scale conversion is converted to a grayscale image, and Gamma correction is applied to the grayscale image to reduce the influence of local shadows and illumination changes on the feature extraction process.
2. Convolve the image with the [−1, 0, 1] gradient operator to obtain the horizontal gradient G_x(x, y) of each pixel, then convolve the image with the [−1, 0, 1]^T gradient operator to obtain the vertical gradient G_y(x, y) of each pixel; the gradient magnitude and gradient direction are then computed from the horizontal and vertical gradients:
G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²)
dir(x, y) = tan⁻¹(G_y(x, y) / G_x(x, y))
3. Divide the image into cells of 8 × 8 pixels; in this example every image is divided into 128 cells. The gradient direction range is evenly divided into 9 intervals, and a 9-dimensional gradient histogram is generated from the gradient directions of the pixels in each cell.
4. Every 4 adjacent cells form a block of 16 × 16 pixels; blocks may overlap. Concatenating the 9-dimensional gradient histograms of the 4 cells in each block gives a 36-dimensional block description vector.
5. Set a 64 × 64 window and slide it vertically from top to bottom with a step of 8 pixels. The description vectors of all blocks contained in a window are concatenated to form the description vector of the whole window, and the description vectors of all windows are concatenated in turn to finally obtain the HoG feature vector v_hog. The HoG feature generated in this embodiment has 15876 dimensions, and the distance between HoG feature vectors is measured with the chi-square distance:
D_hog(v_hog, v′_hog) = Σ_{i=1}^{15876} (v_hog(i) − v′_hog(i))² / (v_hog(i) + v′_hog(i))
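The 15876-dimension figure can be checked arithmetically from the layout described above (9 vertical window positions, 7 × 7 overlapping blocks per 64 × 64 window, 36 dimensions per block); the helper below is our own sanity check, not part of the patent:

```python
def hog_dim(img_h=128, win=64, win_stride=8, block=16, block_stride=8):
    """Recompute the HoG dimensionality implied by step 2.4: a 64x64
    window slid vertically over a 64x128 image in 8-px steps, with
    16x16-px blocks of four 8x8 cells at an 8-px stride, 9 bins/cell."""
    n_windows = (img_h - win) // win_stride + 1          # 9 positions
    blocks_per_row = (win - block) // block_stride + 1   # 7 per row/column
    dims_per_block = 4 * 9                               # 4 cells x 9 bins
    return n_windows * blocks_per_row ** 2 * dims_per_block
```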
Step 2.5: the rotation-invariant uniform LBP feature is generated as follows:
1. The target template image or moving-foreground image after normalized scale conversion is converted to a grayscale image. For each pixel, its gray value is compared with those of its 8 neighboring pixels: a neighbor whose gray value is not smaller than that of the pixel is set to 1, otherwise to 0. The 8 digits are then concatenated clockwise, starting from the 12 o'clock position, into an 8-bit binary number. The computation is as follows:
LBP_{P,R} = Σ_{y=0}^{P−1} s(g_y − g_c) · 2^y
s(x) = 1 if x ≥ 0; s(x) = 0 if x < 0
where P is the number of neighbors (set to 8 in this example), R is the radius (set to 1 in this example), and g_y and g_c are the gray values of the neighboring pixel and of the center pixel respectively.
2. Join the 8 bits of the obtained binary number end to end to form a ring, and rotate the ring clockwise 7 times, giving 8 8-bit binary numbers in total; choose the one with the smallest value as the rotation-invariant LBP value of the pixel. This step reduces the 2^8 = 256 possible binary outputs to 36, and the resulting LBP value is robust to image rotation.
3. Divide the rotation-invariant LBP values obtained in step 2. into two classes: values with no more than 2 transitions between 0 and 1 form one class, called uniform LBP operators; all remaining binary numbers form the other class. Under this classification there are 9 uniform rotation-invariant LBP operators, and the non-uniform LBP operators form a single class.
4. Divide the image into blocks of 16 × 16 pixels and compute the LBP value of every pixel in each block, forming a 10-dimensional rotation-invariant uniform LBP histogram per block. The LBP histograms of all blocks are concatenated to finally obtain the 840-dimensional rotation-invariant uniform LBP feature v_LBP. The distance D_LBP between rotation-invariant uniform LBP features is likewise measured with the chi-square distance:
D_LBP(v_LBP, v′_LBP) = Σ_{i=1}^{840} (v_LBP(i) − v′_LBP(i))² / (v_LBP(i) + v′_LBP(i))
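The per-pixel binning of steps 1.–3. can be sketched as follows. The patent classifies via the minimum rotation; the sketch uses the standard equivalent shortcut of counting circular 0-1 transitions directly and mapping each uniform pattern to its number of ones, which produces the same 10 bins (9 uniform classes plus one non-uniform class); the function name is ours:

```python
def ri_uniform_lbp_bin(neighbors, center):
    """Map one pixel to one of the 10 rotation-invariant uniform LBP bins.
    neighbors: 8 gray values sampled clockwise from 12 o'clock;
    center: the pixel's own gray value. Uniform patterns (<= 2 circular
    0-1 transitions) map to bins 0..8 = number of ones; every
    non-uniform pattern shares bin 9."""
    bits = [1 if g >= center else 0 for g in neighbors]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    if transitions <= 2:
        return sum(bits)   # 9 uniform classes, indexed by their 1-count
    return 9               # single non-uniform class
```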
Step 3: create a separate thread for the surveillance video captured by each camera, so that all cameras in the surveillance network are processed in parallel, and use a Gaussian mixture model to perform background modeling on the monitored scene of each camera;
Each camera in the surveillance network has a fixed video scene, and the gray value of each pixel in the scene can be described with a Gaussian mixture model:
P(X_t) = Σ_{k=1}^{K} w_k^t · η(X_t, μ_k^t, Σ_k^t)
where η is the Gaussian probability density function, K is the upper limit on the number of Gaussian functions in the mixture model (K is set to 5 in this example), and w_k^t, μ_k^t and Σ_k^t are respectively the weight, mean and variance of the k-th Gaussian at frame t.
In the present invention, the mixture-of-Gaussians background modeling of each camera runs in its own thread, realizing parallel computation. The concrete steps of background modeling with the Gaussian mixture model are as follows:
1. Initialize the mixture model of each pixel with the pixel's gray value in the first frame of the surveillance video. At this point only one Gaussian of the mixture model is initialized: its mean is the gray value of the current pixel, its variance is set to the fixed value σ² = 30, and its weight is set to 0.05.
2. When a new frame is read, check the Gaussians in order of decreasing weight to see whether any of them matches the gray value of the pixel, a match meaning that the difference between the pixel gray value and the Gaussian mean does not exceed Th_d = 2.5σ ≈ 13.69. If a matching Gaussian is found, jump to step 3. If the gray value matches none of the Gaussians, initialize a new Gaussian according to step 1.: while uninitialized Gaussians remain in the mixture, use the new Gaussian directly; once all K Gaussians are in use, replace the Gaussian with the smallest weight in the current mixture by the new one.
3. After the Gaussian corresponding to the current pixel gray value is determined, the weight, mean and variance of every Gaussian already in use in the mixture must be updated. Modeling and updating the background require a certain accumulation time; the time window length is set to L = 200. While the number of frames read is less than 200, the Gaussian update formulas are:
ŵ_k^{N+1} = ŵ_k^N + (1/(N+1)) · (p̂(ω_k|X_{N+1}) − ŵ_k^N)
μ̂_k^{N+1} = μ̂_k^N + (p̂(ω_k|X_{N+1}) / Σ_{i=1}^{N+1} p̂(ω_k|X_i)) · (X_{N+1} − μ̂_k^N)
Σ̂_k^{N+1} = Σ̂_k^N + (p̂(ω_k|X_{N+1}) / Σ_{i=1}^{N+1} p̂(ω_k|X_i)) · ((X_{N+1} − μ̂_k^N)(X_{N+1} − μ̂_k^N)^T − Σ̂_k^N)
where N is the frame number and ω_k is the index of the k-th Gaussian when the Gaussians are sorted by weight in descending order; p̂(ω_k|X) is a binary function whose value is 1 when ω_k equals the index of the matched Gaussian, and 0 otherwise.
Once the frame number exceeds the window length L, the update formulas become:
ŵ_k^{N+1} = ŵ_k^N + (1/L) · (p̂(ω_k|X_{N+1}) − ŵ_k^N)
μ̂_k^{N+1} = μ̂_k^N + (1/L) · (p̂(ω_k|X_{N+1}) X_{N+1} / ŵ_k^{N+1} − μ̂_k^N)
Σ̂_k^{N+1} = Σ̂_k^N + (1/L) · (p̂(ω_k|X_{N+1}) (X_{N+1} − μ̂_k^N)(X_{N+1} − μ̂_k^N)^T / ŵ_k^{N+1} − Σ̂_k^N)
After the update, the weights of the Gaussians in the mixture model are normalized.
4. Sort the updated Gaussians by weight in descending order, and take the first B Gaussians whose cumulative weight exceeds Th_w = 0.7 as the Gaussians describing the background. If the Gaussian matching the current pixel ranks within the first B, the pixel is judged to be a background pixel; otherwise it is a foreground pixel.
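The steps above can be sketched for a single grayscale pixel. This is a minimal illustration of the windowed phase (frame count above L) of steps 2.–4., with the mixture held as a list of [weight, mean, variance] entries; the function, its data layout, and the in-place handling of the no-match case are our assumptions, not the patent's implementation:

```python
def gmm_update(mix, x, L=200, var0=30.0, w0=0.05, thr=2.5, bg_thr=0.7):
    """One windowed update of a pixel's Gaussian mixture; returns True
    if the gray value x is classified as background."""
    mix.sort(key=lambda g: -g[0])                 # weight-descending order
    match = next((g for g in mix
                  if abs(x - g[1]) <= thr * g[2] ** 0.5), None)
    if match is None:                             # no Gaussian matches:
        mix[-1] = [w0, float(x), var0]            # recycle the weakest one
        match = mix[-1]
    for g in mix:
        p = 1.0 if g is match else 0.0            # binary ownership p-hat
        g[0] += (p - g[0]) / L                    # weight update
        if p:
            g[1] += (x / g[0] - g[1]) / L         # mean update
            g[2] += ((x - g[1]) ** 2 / g[0] - g[2]) / L  # variance update
    total = sum(g[0] for g in mix)
    for g in mix:                                 # renormalize weights
        g[0] /= total
    mix.sort(key=lambda g: -g[0])
    cum = 0.0
    for g in mix:                                 # background = first B
        cum += g[0]                               # Gaussians with cum > 0.7
        if g is match:
            return True
        if cum > bg_thr:
            return False
    return False
```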
Step 4: extract the moving-foreground targets of each camera using area filtering and morphological post-processing; filter out moving targets whose area is too small by setting a moving-target area threshold, so as to remove moving objects in the scene that are neither pedestrians nor vehicles;
Applying area filtering and morphological processing to the binary image produced by the Gaussian mixture model removes noise and holes; the concrete implementation comprises the following sub-steps:
Step 4.1: apply one pass of area filtering to the binary image obtained from mixture-of-Gaussians background modeling and thresholding;
Step 4.2: apply 3 dilation operations to the binary image with a 3 × 3 template;
Step 4.3: then apply 1 erosion operation to the binary image with the 3 × 3 template;
Step 4.4: finally apply one more pass of area filtering to the binary image.
Here area filtering means dividing the binary image into tiles of 4 × 4 pixels: if the number of foreground pixels in a tile is less than or equal to 3, all pixels in that tile are set to background; otherwise the foreground pixels in the tile are retained;
The filtering of too-small moving targets by a moving-target area threshold described in step 4 is implemented as follows: find all connected regions in the processed binary image and count the foreground pixels of each connected region; set the area threshold Th_c to 1/400 of the video frame area. If the area of a connected region is greater than Th_c, retain the region and return the smallest rectangle enclosing it as a moving-foreground region; if the area is less than or equal to Th_c, set the region to background.
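The 4 × 4-tile area filter can be sketched directly, assuming the binary mask is a list of rows of 0/1 values with dimensions divisible by 4; the function name is illustrative:

```python
def area_filter(mask):
    """Area filtering from step 4: tiles of 4x4 pixels containing 3 or
    fewer foreground pixels are cleared entirely; all other tiles are
    kept unchanged. Returns a new mask."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for ty in range(0, h, 4):
        for tx in range(0, w, 4):
            n = sum(mask[y][x] for y in range(ty, ty + 4)
                               for x in range(tx, tx + 4))
            if n <= 3:                        # too few foreground pixels:
                for y in range(ty, ty + 4):   # clear the whole tile
                    for x in range(tx, tx + 4):
                        out[y][x] = 0
    return out
```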
Step 5: extract the moving foreground from each surveillance video with frame skipping; apply normalized scale conversion to every moving object in the surveillance network that passes the preliminary screening, compute its multiple hybrid features, and then perform a weighted similarity measurement against the target template image;
Since each second of a video sequence in the surveillance network contains 25 frames, a pedestrian or vehicle appears repeatedly in several adjacent frames. To speed up the algorithm, the present invention introduces a frame-skipping scheme: according to the frame-skip count F set by the user, the moving-foreground objects extracted in step 4 are compared one by one against the target template image by weighted similarity measurement only every F frames. In this example F = 3.
Each moving-foreground image undergoes normalized scale conversion into a 64 × 128 pixel image, and its dominant color feature, blocked edge direction histogram feature, blocked HSV histogram feature, HoG feature and rotation-invariant uniform LBP feature are computed according to the methods of step 2.
The weighted similarity measurement is implemented as follows: first, using the distance definitions of the individual features, compute the dominant color feature distance D_dc, the blocked edge direction histogram feature distance D_ed, the blocked HSV histogram feature distance D_hsv, the HoG feature distance D_hog and the rotation-invariant uniform LBP feature distance D_LBP between the moving-foreground object and the target template image; then combine the five feature distances by normalized weighting to obtain the weighted similarity distance D_all:
D_all = α·D_dc + β·D_ed + γ·D_hsv + λ·D_hog + μ·D_LBP
where α, β, γ, λ and μ are normalization weighting coefficients that balance the influence of each feature on the overall similarity measure and are tuned in advance by experiments for the specific environment. In this example their values are respectively 10, 1, 5, 1.2 and 1.
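A one-line fusion of the five distances with the example's coefficients; the function name and tuple encoding are ours:

```python
def weighted_distance(d_dc, d_ed, d_hsv, d_hog, d_lbp,
                      coeffs=(10, 1, 5, 1.2, 1)):
    """Combine the five per-feature distances into D_all using the
    normalization weighting coefficients of this example."""
    return sum(c * d for c, d in
               zip(coeffs, (d_dc, d_ed, d_hsv, d_hog, d_lbp)))
```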
Step 6: according to similarity with the template image, dynamically insert the information of the moving objects detected in each surveillance video of the network into a result vector V_result of specified length; after all surveillance videos have been traversed, return this result vector as the final detection and recognition result.
Further, the dynamic insertion of detected moving-object information into the result vector V_result of specified length described in step 6 comprises the following sub-steps:
Step 6.1: create a vector V_result of specified length N, where N is the number of objects closest to the specified pedestrian or vehicle template image that need to be returned from the surveillance network;
Step 6.2: whenever a new moving foreground is detected in the surveillance network, first check the number of objects saved in V_result;
Step 6.3: if the number of objects saved in V_result is less than N, then when the weighted similarity distance D_all between the object and the target template image satisfies the preset similarity threshold Th_all, save in V_result the object's information: the number of the camera it belongs to, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the object image after normalized scale conversion;
Step 6.4: if the number of objects saved in V_result has reached N, then when the weighted similarity distance D_all between the object and the target template image satisfies the preset similarity threshold Th_all, compare the object's D_all with the largest weighted similarity distance to the target template among the objects in V_result; if the new object is closer, substitute it for the object in V_result farthest from the target template, and re-sort the N objects saved in V_result in ascending order of similarity distance to the target template;
Step 6.5: after the surveillance videos captured by all cameras in the network have been traversed, the returned vector V_result holds the N objects in the whole surveillance network closest to the specified pedestrian or vehicle template image, together with, for each of them, the number of the camera it belongs to, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the object image after normalized scale conversion.
In this example N = 4 and Th_all = 50.
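Steps 6.1–6.5 amount to maintaining a sorted top-N list under a distance threshold, which can be sketched as follows; the (distance, info) tuple representation is an assumption of this sketch:

```python
import bisect

def update_results(results, dist, info, n=4, th_all=50.0):
    """Maintain the result vector of step 6: a list of (distance, info)
    tuples kept sorted by ascending distance, holding at most the n best
    candidates whose weighted distance is below th_all."""
    if dist >= th_all:
        return                       # fails the similarity threshold
    if len(results) < n:
        bisect.insort(results, (dist, info))
    elif dist < results[-1][0]:
        results.pop()                # displace the current worst match
        bisect.insort(results, (dist, info))
```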
Implementation results
Following the above steps, fast detection and recognition of a specified pedestrian or vehicle is performed on the videos of the surveillance network. Fig. 2 shows the results of each stage of moving-foreground segmentation. From Fig. 2(c) it can be seen that the moving-foreground regions are estimated fairly accurately by mixture-of-Gaussians background modeling, area filtering and morphological post-processing; in Fig. 2(d), after connected regions of too-small area are filtered out, the final result accurately returns the bounding rectangles of all moving-foreground objects in the scene.
Fig. 3 shows the template image of a specified pedestrian in this example, and Fig. 4 shows the retrieval results returned by the present invention for this template image. All 4 returned results are correct matches of the target template image; even though the person's pose and background have changed, the present invention still finds the correct results in a complex scene.
In this example all experiments were carried out on an ordinary PC with an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz and 4 GB of memory. The processing speed on a surveillance video depends on the density of moving objects in the scene; in this example the average processing speed is 29 ms per frame, and with the frame-skipping and parallel-processing schemes the specified pedestrian or vehicle in the surveillance network can be detected and recognized quickly in real time.

Claims (5)

1. A fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network, characterized in that it comprises the following steps:
Step 1: select the target to be detected and recognized by marking a rectangle in a picture or video frame containing the specified pedestrian or vehicle;
Step 2: apply normalized scale conversion to the selected target template image and generate multiple hybrid features of the target template image;
Step 3: create a separate thread for the surveillance video captured by each camera, process all cameras in the surveillance network in parallel, and use a Gaussian mixture model to perform background modeling on the monitored scene of each camera;
Step 4: extract the moving-foreground targets of each camera using area filtering and morphological post-processing; filter out moving targets whose area is too small by setting a moving-target area threshold, so as to remove moving objects in the scene that are neither pedestrians nor vehicles;
Step 5: extract the moving foreground from each surveillance video with frame skipping; apply normalized scale conversion to every moving object in the surveillance network that passes the preliminary screening, compute its multiple hybrid features, and then perform a weighted similarity measurement against the target template image;
Step 6: according to similarity with the template image, dynamically insert the information of the moving objects detected in each surveillance video of the network into a result vector V_result of specified length; after all surveillance videos have been traversed, return this result vector as the final detection and recognition result.
2. The fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that the multiple hybrid features in step 2 are the dominant color feature, the blocked edge direction histogram feature, the blocked HSV histogram feature, the HoG feature and the rotation-invariant uniform LBP feature, generated as follows:
Step 2.1: the dominant color feature is generated as follows:
1. Convert the target template image after normalized scale conversion from RGB space to HSV space, and extract only the hue component of the image in HSV space;
2. Divide the value range of the hue component into 8 intervals, and project the hue component of the target template image onto these 8 intervals to form the 8-dimensional hue histogram chist, whose i-th entry chist_i is obtained by the following formula:
chist_i = Σ_{(x,y)∈Rect} δ_i(h_{x,y})
where h_{x,y} is the hue value of the pixel at coordinates (x, y), Rect is the foreground region of the target template image or of the detected moving foreground, and δ_i(h_{x,y}) is defined as follows: δ_i(h_{x,y}) = 1 if h_{x,y} falls in the i-th interval, and δ_i(h_{x,y}) = 0 otherwise;
3. Compute the average hue value c_i of each of the 8 hue intervals, and normalize the 8-dimensional histogram to obtain the proportions p_i; choose the 3 dimensions with the largest p_i and save in the vector v_dc the corresponding c_i and p_i to form the dominant color feature of the target. The dominant color feature v_dc thus consists of 6 dimensions, the hue values and percentages of the 3 color components with the largest proportions in the target image: v_dc = [c_1, p_1, c_2, p_2, c_3, p_3];
Step 2.2: the blocked edge direction histogram is generated as follows:
1. First convert the target image after normalized scale conversion from an RGB color image to a grayscale image, and divide it into 4 × 4 = 16 blocks;
2. Filter the grayscale image with the Sobel horizontal edge detection operator and the Sobel vertical edge detection operator to obtain the horizontal edge strength and vertical edge strength of each pixel, and use them to obtain the edge direction e_{x,y} and edge strength |e_{x,y}| of each pixel;
3. Classify the pixels into five types according to edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge and non-directional edge; the first four edge types are used to build the edge direction histogram;
4. Set the non-directional edge threshold T_e; in each block, perform a histogram count over edge direction for all pixels whose edge strength exceeds T_e, generating a 4-dimensional edge direction histogram; the 4 × 4 blocks together yield a 64-dimensional edge direction histogram, saved as the local edge direction feature ehist^l; the 4-dimensional edge direction histograms of the 16 blocks are then summed to form the 4-dimensional global edge direction feature ehist^g of the whole target image; the two are concatenated into a 68-dimensional vector and saved as the blocked edge direction histogram feature v_ed;
Step 2.3: the blocked HSV histogram is generated as follows:
1. Project the target template image or moving-foreground image after normalized scale conversion from RGB color space into HSV color space, and divide the image horizontally from top to bottom into 4 strips;
2. Compute an HSV histogram for each of the 4 strips, in which the hue component is divided into 16 intervals and the saturation and value components into 4 intervals each, so every strip finally yields a 24-dimensional histogram feature; concatenate the four 24-dimensional histograms to form the 96-dimensional blocked HSV histogram of the whole image;
Step 2.4: the HoG feature is generated as follows:
1. Convert the target template image after normalized scale conversion to a grayscale image and apply Gamma correction to it, reducing the influence of local shadows and illumination changes on the feature extraction process;
2. Convolve the image with the [−1, 0, 1] gradient operator to obtain the horizontal gradient G_x(x, y) of each pixel, then convolve the image with the [−1, 0, 1]^T gradient operator to obtain the vertical gradient G_y(x, y) of each pixel; compute the gradient magnitude G(x, y) and gradient direction dir(x, y) from the horizontal and vertical gradients;
3. Divide the image into cells of 8 × 8 pixels; evenly divide the gradient direction range into 9 intervals and generate a 9-dimensional gradient histogram from the gradient directions of the pixels in each cell;
4. Every 4 adjacent cells form a block of 16 × 16 pixels; blocks may overlap; concatenate the 9-dimensional gradient histograms of the 4 cells in each block to obtain a 36-dimensional block description vector;
5. Set a 64 × 64 window and slide it vertically from top to bottom with a step of 8 pixels; concatenate the description vectors of all blocks contained in a window to form the description vector of the whole window, then concatenate the description vectors of all windows to finally obtain the HoG feature vector v_hog;
Step 2.5: the rotation-invariant uniform LBP feature is generated as follows:
1. Convert the target template image or moving-foreground image after normalized scale conversion to a grayscale image; for each pixel, compare its gray value with those of its 8 neighboring pixels: a neighbor whose gray value is not smaller than that of the pixel is set to 1, otherwise to 0; then concatenate the 8 digits clockwise, starting from the 12 o'clock position, into an 8-bit binary number, computed as LBP_{P,R} = Σ_{y=0}^{P−1} s(g_y − g_c) · 2^y with s(x) = 1 for x ≥ 0 and s(x) = 0 for x < 0;
where P is the number of neighbors, R is the radius, and g_y and g_c are the gray values of the neighboring pixel and of the center pixel respectively;
2. Join the 8 bits of the obtained binary number end to end to form a ring and rotate the ring clockwise 7 times, giving 8 8-bit binary numbers in total; choose the one with the smallest value as the rotation-invariant LBP value of the pixel;
3. Divide the rotation-invariant LBP values obtained in step 2. into two classes: values with no more than 2 transitions between 0 and 1 form one class, called uniform LBP operators; all remaining binary numbers form the other class; under this classification there are 9 uniform rotation-invariant LBP operators, and the non-uniform LBP operators form a single class;
4. Divide the image into blocks of 16 × 16 pixels and compute the LBP value of every pixel in each block, forming a 10-dimensional rotation-invariant uniform LBP histogram per block; concatenate the LBP histograms of all blocks to finally obtain the 840-dimensional rotation-invariant uniform LBP feature v_LBP.
3. The fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that the extraction of the foreground targets of each camera by area filtering and morphological post-processing described in step 4 comprises the following sub-steps:
Step 4.1: apply one pass of area filtering to the binary image obtained from mixture-of-Gaussians background modeling and thresholding;
Step 4.2: apply 3 dilation operations to the binary image with a 3 × 3 template;
Step 4.3: then apply 1 erosion operation to the binary image with the 3 × 3 template;
Step 4.4: finally apply one more pass of area filtering to the binary image;
The area filtering described in step 4 means dividing the binary image into tiles of 4 × 4 pixels: if the number of foreground pixels in a tile is less than or equal to 3, all pixels in that tile are set to background; otherwise the foreground pixels in the tile are retained;
The filtering of too-small moving targets by a moving-target area threshold described in step 4 is implemented as follows: find all connected regions in the processed binary image and count the foreground pixels of each connected region; set the area threshold Th_c to 1/400 of the video frame area; if the area of a connected region is greater than Th_c, retain the region and return the smallest rectangle enclosing it as a moving-foreground region; if the area is less than or equal to Th_c, set the region to background.
4. The fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that the frame-skipping extraction of the moving foreground from each surveillance video described in step 5 is implemented as follows: according to the frame-skip count F set by the user, the moving-foreground objects extracted in step 4 are compared one by one against the target template image by weighted similarity measurement every F frames;
The weighted similarity measurement described in step 5 is implemented as follows: first compute the dominant color feature distance D_dc, the blocked edge direction histogram feature distance D_ed, the blocked HSV histogram feature distance D_hsv, the HoG feature distance D_hog and the rotation-invariant uniform LBP feature distance D_LBP between the moving-foreground object and the target template image, then combine the five feature distances by normalized weighting to obtain the weighted similarity distance D_all:
where α, β, γ, λ and μ are normalization weighting coefficients that balance the influence of each feature on the overall similarity measure and are tuned in advance by experiments for the specific environment.
5. The fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that the dynamic insertion of detected moving-object information into the result vector V_result of specified length described in step 6 comprises the following sub-steps:
Step 6.1: create a vector V_result of specified length N, where N is the number of objects closest to the specified pedestrian or vehicle template image that need to be returned from the surveillance network;
Step 6.2: whenever a new moving foreground is detected in the surveillance network, first check the number of objects saved in V_result;
Step 6.3: if the number of objects saved in V_result is less than N, then when the weighted similarity distance D_all between the object and the target template image satisfies the preset similarity threshold Th_all, save in V_result the object's information: the number of the camera it belongs to, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the object image after normalized scale conversion;
Step 6.4: if the number of objects saved in V_result has reached N, then when the weighted similarity distance D_all between the object and the target template image satisfies the preset similarity threshold Th_all, compare the object's D_all with the largest weighted similarity distance to the target template among the objects in V_result; if the new object is closer, substitute it for the object in V_result farthest from the target template, and re-sort the N objects saved in V_result in ascending order of similarity distance to the target template;
Step 6.5: after the surveillance videos captured by all cameras in the network have been traversed, the returned vector V_result holds the N objects in the whole surveillance network closest to the specified pedestrian or vehicle template image, together with, for each of them, the number of the camera it belongs to, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the object image after normalized scale conversion.
CN201410356465.0A 2014-07-24 2014-07-24 The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network Active CN104166841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410356465.0A CN104166841B (en) 2014-07-24 2014-07-24 The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network


Publications (2)

Publication Number Publication Date
CN104166841A true CN104166841A (en) 2014-11-26
CN104166841B CN104166841B (en) 2017-06-23

Family

ID=51910645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410356465.0A Active CN104166841B (en) 2014-07-24 2014-07-24 The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network

Country Status (1)

Country Link
CN (1) CN104166841B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100142809A1 (en) * 2008-12-08 2010-06-10 Electronics And Telecommunications Research Institute Method for detecting multi moving objects in high resolution image sequences and system thereof
CN102012213A (en) * 2010-08-31 2011-04-13 吉林大学 Method for measuring foreground height through single image
CN102592454A (en) * 2012-02-29 2012-07-18 北京航空航天大学 Intersection vehicle movement parameter measuring method based on detection of vehicle side face and road intersection line
CN102708367A (en) * 2012-03-11 2012-10-03 无锡派图半导体设备有限公司 Image identification method based on target contour features
CN103049751A (en) * 2013-01-24 2013-04-17 苏州大学 Improved weighting region matching high-altitude video pedestrian recognizing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
芮挺 (Rui Ting) et al.: "Multi-scale HOG pedestrian head-shoulder feature detection based on Gaussian mixture modeling", Journal of Shandong University of Science and Technology (Natural Science Edition) *

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463151A (en) * 2015-01-05 2015-03-25 北京邮电大学 Rapid target matching method based on fusion of main color histograms and spatial position information
CN107111756A (en) * 2015-01-14 2017-08-29 萨罗尼科斯贸易与服务人有限公司 Apparatus and method for the entity in detection image
CN104796664A (en) * 2015-03-26 2015-07-22 成都市斯达鑫辉视讯科技有限公司 Video monitoring device
CN106295466A (en) * 2015-05-18 2017-01-04 佳能株式会社 Image processing method and device
CN104866616A (en) * 2015-06-07 2015-08-26 中科院成都信息技术股份有限公司 Method for searching monitor video target
CN104866616B (en) * 2015-06-07 2019-01-22 中科院成都信息技术股份有限公司 Monitor video Target Searching Method
CN105005773A (en) * 2015-07-24 2015-10-28 成都市高博汇科信息科技有限公司 Pedestrian detection method with integration of time domain information and spatial domain information
CN106559645A (en) * 2015-09-25 2017-04-05 杭州海康威视数字技术股份有限公司 Based on the monitoring method of video camera, system and device
CN106559645B (en) * 2015-09-25 2020-01-17 杭州海康威视数字技术股份有限公司 Monitoring method, system and device based on camera
CN105447448A (en) * 2015-11-12 2016-03-30 江南大学 Bayonet vehicle positioning method based on Gaussian color distance
CN105447448B (en) * 2015-11-12 2019-01-11 江南大学 A kind of bayonet vehicle localization method based on Gauss color distance
CN105447457A (en) * 2015-11-15 2016-03-30 安徽清新互联信息科技有限公司 License plate character identification method based on adaptive characteristic
CN105869148A (en) * 2016-03-24 2016-08-17 北京小米移动软件有限公司 Target detection method and device
CN105957093B (en) * 2016-06-07 2018-12-25 浙江树人大学 A kind of texture distinguishes the ATM delay object detecting method of optimization HOG operator
CN105957093A (en) * 2016-06-07 2016-09-21 浙江树人大学 ATM retention detection method of texture discrimination optimization HOG operator
CN107645633A (en) * 2016-07-21 2018-01-30 三菱电机大楼技术服务株式会社 Image processing apparatus
CN106504264A (en) * 2016-10-27 2017-03-15 锐捷网络股份有限公司 Video foreground image extraction method and device
CN106504264B (en) * 2016-10-27 2019-09-20 锐捷网络股份有限公司 Video foreground image extraction method and device
CN106845359A (en) * 2016-12-26 2017-06-13 四川农业大学 Tunnel portal driving prompt apparatus and method based on infrared emission
CN107239780A (en) * 2017-04-29 2017-10-10 安徽慧视金瞳科技有限公司 A kind of image matching method of multiple features fusion
CN107292892A (en) * 2017-06-23 2017-10-24 北京奇艺世纪科技有限公司 The dividing method and device of video frame images
CN107292892B (en) * 2017-06-23 2020-04-21 北京奇艺世纪科技有限公司 Video frame image segmentation method and device
CN107704797A (en) * 2017-08-08 2018-02-16 深圳市安软慧视科技有限公司 Real-time detection method and system and equipment based on pedestrian in security protection video and vehicle
CN107704797B (en) * 2017-08-08 2020-06-23 深圳市安软慧视科技有限公司 Real-time detection method, system and equipment based on pedestrians and vehicles in security video
CN107578045A (en) * 2017-09-19 2018-01-12 北京工业大学 A kind of Underwater targets recognition based on machine vision
CN107918767B (en) * 2017-11-27 2019-11-08 北京旷视科技有限公司 Object detection method, device, electronic equipment and computer-readable medium
CN107918767A (en) * 2017-11-27 2018-04-17 北京旷视科技有限公司 Object detection method, device, electronic equipment and computer-readable medium
CN107895376A (en) * 2017-12-11 2018-04-10 福州大学 Based on the solar panel recognition methods for improving Canny operators and contour area threshold value
CN110110111B (en) * 2018-02-02 2021-12-31 兴业数字金融服务(上海)股份有限公司 Method and device for monitoring screen
CN108921083A (en) * 2018-06-28 2018-11-30 浙江工业大学 Illegal flowing street pedlar recognition methods based on deep learning target detection
CN109389031A (en) * 2018-08-27 2019-02-26 浙江大丰实业股份有限公司 Performance personnel's automatic positioning mechanism
CN109389031B (en) * 2018-08-27 2021-12-03 浙江大丰实业股份有限公司 Automatic positioning mechanism for performance personnel
CN109214996B (en) * 2018-08-29 2021-11-12 深圳市元征科技股份有限公司 Image processing method and device
CN109214996A (en) * 2018-08-29 2019-01-15 深圳市元征科技股份有限公司 A kind of image processing method and device
CN110529788A (en) * 2018-10-30 2019-12-03 永康市道可道科技有限公司 The explosion-proof pendent lamp of automatic projection formula
US10922564B2 (en) 2018-11-09 2021-02-16 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for detecting in-vehicle conflicts
US11615545B2 (en) 2018-11-09 2023-03-28 Bejing Didi Infinity Technology And Development Co., Ltd. System and method for detecting in-vehicle conflicts
TWI716788B (en) * 2018-11-09 2021-01-21 大陸商北京嘀嘀無限科技發展有限公司 System and method for detecting in-vehicle conflicts
CN109635720A (en) * 2018-12-10 2019-04-16 江南大学 The illegal road occupying real-time detection method actively monitored based on video
CN110232314A (en) * 2019-04-28 2019-09-13 广东工业大学 A kind of image pedestrian's detection method based on improved Hog feature combination neural network
CN110263788B (en) * 2019-07-17 2021-07-30 广西盖德科技有限公司 Method and system for quickly identifying vehicle passing
CN110263788A (en) * 2019-07-17 2019-09-20 广西盖德科技有限公司 Vehicle pass-through method for quickly identifying and system
CN110378341A (en) * 2019-07-24 2019-10-25 西南交通大学 A kind of binocular vision pedestrian distance detection method
CN110517183A (en) * 2019-07-26 2019-11-29 电子科技大学 A kind of high-speed low-power-consumption image processor based on retinal mechanisms
CN110490253A (en) * 2019-08-19 2019-11-22 浙江工业大学 A kind of image similarity calculation method based on spatial color EMD
CN110516649A (en) * 2019-09-02 2019-11-29 南京微小宝信息技术有限公司 Alumnus's authentication method and system based on recognition of face
CN110516649B (en) * 2019-09-02 2023-08-22 南京微小宝信息技术有限公司 Face recognition-based alumni authentication method and system
CN110765926A (en) * 2019-10-21 2020-02-07 杭州易现先进科技有限公司 Drawing book identification method and device, electronic equipment and storage medium
CN110956200A (en) * 2019-11-05 2020-04-03 哈尔滨工程大学 Tire pattern similarity detection method
CN111062971B (en) * 2019-12-13 2023-09-19 深圳龙岗智能视听研究院 Deep learning multi-mode-based mud head vehicle tracking method crossing cameras
CN111062971A (en) * 2019-12-13 2020-04-24 深圳龙岗智能视听研究院 Cross-camera mud head vehicle tracking method based on deep learning multi-mode
CN111179302A (en) * 2019-12-27 2020-05-19 普联国际有限公司 Moving target detection method and device, storage medium and terminal equipment
CN111179302B (en) * 2019-12-27 2023-03-24 普联国际有限公司 Moving target detection method and device, storage medium and terminal equipment
CN111461242A (en) * 2020-04-08 2020-07-28 北京航天新风机械设备有限责任公司 Multi-material rapid comparison and matching method for production line
CN112149581A (en) * 2020-09-25 2020-12-29 北京嘀嘀无限科技发展有限公司 Working condition monitoring method, system, device, electronic equipment and readable storage medium
CN112232153A (en) * 2020-09-30 2021-01-15 广东职业技术学院 Method and system for acquiring track of target person
CN113255705B (en) * 2021-07-15 2021-11-05 广东省大湾区集成电路与系统应用研究院 Intelligent identification device for protective color animal
CN113255705A (en) * 2021-07-15 2021-08-13 广东省大湾区集成电路与系统应用研究院 Intelligent identification device and identification method for protective color animals
CN113807328A (en) * 2021-11-18 2021-12-17 济南和普威视光电技术有限公司 Target detection method, device and medium based on algorithm fusion
CN117079219A (en) * 2023-10-08 2023-11-17 山东车拖车网络科技有限公司 Vehicle running condition monitoring method and device applied to trailer service
CN117079219B (en) * 2023-10-08 2024-01-09 山东车拖车网络科技有限公司 Vehicle running condition monitoring method and device applied to trailer service

Also Published As

Publication number Publication date
CN104166841B (en) 2017-06-23

Similar Documents

Publication Publication Date Title
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
Li et al. Automatic pavement crack detection by multi-scale image fusion
CN108875600A Vehicle information detection and tracking method, apparatus and computer storage medium based on YOLO
CN102542289B (en) Pedestrian volume statistical method based on plurality of Gaussian counting models
CN105404886B (en) Characteristic model generation method and characteristic model generating means
Siriborvornratanakul An automatic road distress visual inspection system using an onboard in‐car camera
CN102867188B (en) Method for detecting seat state in meeting place based on cascade structure
CN104978567B (en) Vehicle checking method based on scene classification
CN105574550A (en) Vehicle identification method and device
CN106682586A (en) Method for real-time lane line detection based on vision under complex lighting conditions
CN103310194A (en) Method for detecting head and shoulders of pedestrian in video based on overhead pixel gradient direction
CN102521565A (en) Garment identification method and system for low-resolution video
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN104517095B Head segmentation method based on depth image
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN101142584A (en) Method for facial features detection
CN107730515A (en) Panoramic picture conspicuousness detection method with eye movement model is increased based on region
CN109711322A Person-vehicle separation method based on RFCN
CN105426903A (en) Cloud determination method and system for remote sensing satellite images
CN103577875A (en) CAD (computer-aided design) people counting method based on FAST (features from accelerated segment test)
CN102915433A (en) Character combination-based license plate positioning and identifying method
US8094971B2 (en) Method and system for automatically determining the orientation of a digital image
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN109784171A (en) Car damage identification method for screening images, device, readable storage medium storing program for executing and server
CN114049572A (en) Detection method for identifying small target

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant