CN104166841B - Fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network - Google Patents


Info

Publication number
CN104166841B
CN104166841B, CN201410356465.0A, CN104166841A
Authority
CN
China
Prior art keywords
image
pixel
edge
lbp
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410356465.0A
Other languages
Chinese (zh)
Other versions
CN104166841A (en)
Inventor
于慧敏 (Yu Huimin)
谢奕 (Xie Yi)
郑伟伟 (Zheng Weiwei)
汪东旭 (Wang Dongxu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University
Priority to CN201410356465.0A
Publication of CN104166841A
Application granted
Publication of CN104166841B
Status: Active

Links

Abstract

The present invention relates to a fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network. Given a target template supplied by the user, the specified target is detected and identified accurately and quickly in networked surveillance video. The invention first applies normalized scale conversion to the target template image and computes several complementary features from it. Background modeling is then performed for each surveillance video using a Gaussian mixture model, and moving foregrounds are extracted in parallel using area filtering and morphological post-processing. Every moving foreground that satisfies the preliminary screening conditions is scale-normalized and the same features are computed, after which the weighted similarity distance between each moving foreground and the target template is calculated. The N objects whose weighted similarity distance to the template image is below the decision threshold and smallest overall are returned as the detection and recognition result. Through parallel processing, frame skipping, and related techniques, the invention substantially increases processing speed while preserving detection and recognition accuracy.

Description

Fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network
Technical field
The present invention relates to the technical field of video image processing, and specifically to a fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network.
Background technology
In the field of intelligent surveillance, detecting and recognizing a specified pedestrian or vehicle across a surveillance video network can help public security organs quickly determine when and where a suspect or suspected vehicle appeared, accelerating case investigation. In practical scenarios, however, illumination conditions change, camera parameters differ between views, moving objects (pedestrians, vehicles, etc.) change pose and occlude one another, and pedestrians or vehicles in surveillance video are usually small, so detailed cues such as faces or license plates cannot be relied on. How to detect and identify a specified pedestrian or vehicle quickly and accurately in a multi-camera surveillance network has therefore long been both a focus and a difficulty of computer vision research.
In "Human identity recognition in aerial images", published on pages 709 to 716 of the proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Oreifej et al. proposed a voting-based pedestrian recognition method for the complex backgrounds of cross-view aerial video. The technique first detects pedestrian regions in the video using HoG (Histogram of Oriented Gradients) features, forming the set of pedestrians appearing in the video sequence, called the candidate set; a pre-prepared set of target pedestrian features then votes on the candidates by feature similarity, and the candidate receiving the most votes is taken as the most suspicious target. The algorithm handles target pedestrian recognition well under difficult conditions such as low image resolution and variable pedestrian pose. However, because it must repeatedly scan each video frame at multiple scales to detect pedestrian regions one by one, and requires multiple target template images to act as voters, it falls short in both speed and practical feasibility.
Because surveillance video usually has a fixed viewpoint, methods that segment moving foregrounds by background modeling can be applied effectively. Compared with multi-scale scanning of video images with an HoG classifier, such methods detect and segment moving objects quickly and efficiently in fixed-viewpoint scenes, but they no longer apply once the camera moves.
Reducing the training stage is another difficulty in target detection and recognition. Detection and recognition algorithms based on artificial neural networks and support vector machines achieve high recognition accuracy, but their computational cost is large and they require many target template images in advance for training, which limits their value in fields such as criminal investigation where target template information is scarce.
Each of the above methods has advantages and disadvantages, and how to combine them effectively for a specific application scenario remains insufficiently studied. This motivates the search for a sound method framework suited to surveillance networks used in criminal investigation work, one that raises the method's speed to real-time processing while guaranteeing its accuracy.
The content of the invention
The present invention provides a fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network, to solve the slow detection speed of the prior art and its frequent need for advance training on a training set.
To achieve the above purpose, the present invention adopts the following technical scheme: a fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network, comprising the following steps:
Step 1: first select the target to be detected and identified from a picture or video frame containing the specified pedestrian or vehicle, by marking it with a rectangular box;
Step 2: apply normalized scale conversion to the selected target template image, and compute several complementary features of the target template image;
Step 3: open an independent thread for the surveillance video shot by each camera so that the cameras in the surveillance network are processed in parallel, and perform background modeling of each camera's monitored scene with a Gaussian mixture model;
Step 4: extract the moving foreground objects of each camera using area filtering and morphological post-processing; by setting a moving-target area threshold, filter out moving targets whose area is too small, so as to remove moving objects in the scene that are not pedestrians or vehicles;
Step 5: extract moving foregrounds from each surveillance video with frame skipping; apply normalized scale conversion to every moving object in the surveillance network that satisfies the preliminary screening requirements, compute the same features, and then perform a weighted similarity measurement against the target template image;
Step 6: according to similarity with the template image, dynamically insert the information of the detected moving objects in each surveillance video of the network into a result vector V_result of designated length; after all surveillance videos have been traversed, return this result vector as the final detection and recognition result.
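As an illustration of how steps 3 to 6 interlock, the loop below sketches the network scan in Python. The patent's own implementation is C/C++ with OpenCV and runs one thread per camera; here the threads are reduced to a sequential loop, and the stream, feature, and similarity types are hypothetical stand-ins:

```python
def detect_in_network(streams, template, similarity, threshold, n, skip=5):
    """Scan each camera stream, compare every `skip`-th frame's foregrounds
    against the template, and keep the n closest matches (steps 3-6)."""
    results = []  # V_result entries: (distance, camera_id, frame_no, feature)
    for cam_id, frames in streams.items():       # one thread per camera in the patent
        for frame_no, foregrounds in enumerate(frames):
            if frame_no % skip:                  # frame skipping (step 5)
                continue
            for feat in foregrounds:             # motion foregrounds (step 4)
                d = similarity(feat, template)
                if d >= threshold:               # decision threshold
                    continue
                results.append((d, cam_id, frame_no, feat))
                results.sort(key=lambda r: r[0])
                del results[n:]                  # keep only the n best (step 6)
    return results
```

With toy scalar "features" and an absolute-difference similarity, two cameras and a frame-skip of 2 return the closest foregrounds in distance order.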
Further, the several features described in step 2 are the dominant color feature, the block edge direction histogram feature, the block HSV histogram feature, the HoG feature, and the equivalent rotation-invariant LBP feature. They are computed as follows:
Step 2.1: the dominant color feature is generated as follows:
1. Convert the scale-normalized target template image from RGB space to HSV space, and extract only the image's hue component;
2. Divide the value range of the hue component into 8 intervals and project the hue component of the target template image onto these 8 intervals, forming an 8-dimensional hue histogram chist whose i-th bin chist_i is obtained by
chist_i = Σ_{(x,y)∈Rect} δ_i(h_{x,y})
where h_{x,y} is the hue component value of the pixel at coordinate (x, y), Rect is the target template image or the detected moving-foreground region, and δ_i(h_{x,y}) is 1 when h_{x,y} falls in the i-th interval and 0 otherwise;
3. Compute the mean hue value c_i of each of the 8 intervals and normalize the 8-dimensional histogram to obtain p_i; select the 3 dimensions with the largest p_i from the normalized histogram and store the corresponding c_i and p_i in the vector v_dc, which constitutes the target's dominant color feature. The dominant color feature v_dc therefore has 6 dimensions in all, corresponding to the hue values and percentages of the 3 color components with the largest share of the target image: v_dc = [c1, p1, c2, p2, c3, p3];
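A minimal NumPy sketch of step 2.1, assuming OpenCV's hue range of [0, 180); the function name and the handling of empty bins are illustrative, not from the patent:

```python
import numpy as np

def dominant_color_feature(hue, n_bins=8, top_k=3, hue_max=180.0):
    """Build the 8-bin hue histogram chist over the region, then return
    the 6-dim v_dc = [c1, p1, c2, p2, c3, p3] from the top-3 bins."""
    hue = np.asarray(hue, dtype=float).ravel()
    edges = np.linspace(0.0, hue_max, n_bins + 1)
    idx = np.clip(np.digitize(hue, edges) - 1, 0, n_bins - 1)
    chist = np.bincount(idx, minlength=n_bins).astype(float)
    p = chist / chist.sum()                    # normalized histogram p_i
    c = np.array([hue[idx == i].mean() if chist[i] else 0.0
                  for i in range(n_bins)])     # mean hue c_i per interval
    top = np.argsort(p)[::-1][:top_k]          # top-3 bins by share
    return np.array([v for i in top for v in (c[i], p[i])])
```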
Step 2.2: the block edge direction histogram is generated as follows:
1. First convert the scale-normalized target image from an RGB color image to a grayscale image, and split it into 4 × 4 = 16 blocks;
2. Filter the grayscale image with the Sobel horizontal edge detection operator and the Sobel vertical edge detection operator to obtain the horizontal edge strength and vertical edge strength of each pixel, and use them to obtain each pixel's edge direction e_{x,y} and edge strength |e_{x,y}|;
3. Divide pixels into five types according to edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge, and directionless edge; build the edge direction histogram from the four directed edge types;
4. Set the directionless-edge threshold T_e; within each block, compute a histogram over edge direction for all pixels whose edge strength exceeds T_e, generating a 4-dimensional edge direction histogram. The 4 × 4 blocks together produce a 64-dimensional edge direction histogram, which is saved as the local edge direction feature ehist_l; the 4-dimensional edge direction histograms of the 16 blocks are also accumulated into the 4-dimensional global edge direction feature ehist_g of the whole target image. The two are merged into a 68-dimensional vector and saved as the block edge direction histogram feature v_ed.
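A plain-NumPy sketch of the four sub-steps above; the Sobel kernels are the standard ones, while the mapping from gradient angle to the four directed-edge bins and the block assignment are assumptions the patent leaves implicit:

```python
import numpy as np

def edge_direction_histogram(gray, grid=4, thresh=100.0):
    """Sobel filtering, 4-way direction binning of strong-edge pixels per
    grid block (64 local dims), plus the 4-dim global sum -> 68 dims."""
    g = np.asarray(gray, dtype=float)
    kx = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.T
    h, w = g.shape
    gx, gy = np.zeros_like(g), np.zeros_like(g)
    for i in range(1, h - 1):                     # interior pixels only
        for j in range(1, w - 1):
            win = g[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)                        # edge strength |e_xy|
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # edge direction e_xy
    bins = np.round(ang / 45.0).astype(int) % 4   # 0/45/90/135 degree bins
    local = np.zeros((grid, grid, 4))
    for i in range(h):
        for j in range(w):
            if mag[i, j] > thresh:                # drop directionless edges
                local[min(i * grid // h, grid - 1),
                      min(j * grid // w, grid - 1), bins[i, j]] += 1
    return np.concatenate([local.ravel(), local.sum(axis=(0, 1))])
```

On an 8 × 8 image containing a single vertical step edge, every strong-edge pixel lands in one direction bin, so the 4-dimensional global part has a single nonzero entry.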
Step 2.3: the block HSV histogram is generated as follows:
1. Project the scale-normalized target template image or moving foreground image from RGB color space to HSV color space, and divide the image horizontally into 4 blocks from top to bottom;
2. Compute the HSV histograms of the 4 blocks separately, dividing the hue component into 16 intervals and the saturation and value components into 4 intervals each, so that each block finally yields a 24-dimensional histogram feature; the four 24-dimensional histograms are concatenated into the 96-dimensional block HSV histogram of the whole image;
Step 2.4: the HoG feature is generated as follows:
1. Convert the scale-normalized target template image to grayscale and apply gamma correction to the grayscale image, reducing the influence of local shadow and illumination variation on the feature extraction process;
2. Convolve the image with the [-1, 0, 1] gradient operator to obtain each pixel's horizontal gradient G_x(x, y), then with the [1, 0, -1]^T gradient operator to obtain each pixel's vertical gradient G_y(x, y); from the horizontal and vertical gradients compute the gradient magnitude and gradient direction:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2), θ(x, y) = arctan(G_y(x, y) / G_x(x, y));
3. Divide the image into cell units of 8 × 8 pixels; divide the gradient direction range evenly into 9 intervals and, according to the gradient direction of each pixel in a cell unit, generate a 9-dimensional gradient histogram;
4. Let every 4 adjacent cell units form one block, so that each block covers 16 × 16 pixels, and blocks may overlap; concatenating the four 9-dimensional gradient histograms of the cell units in a block yields a 36-dimensional block descriptor;
5. Slide a 64 × 64 window vertically from top to bottom at an interval of 8 pixels; the descriptors of all blocks contained in a window are concatenated into the descriptor of the whole window, and the descriptors of all windows are then concatenated to finally obtain the HoG feature vector v_hog;
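The block and window arithmetic in sub-steps 3 to 5 fixes the length of v_hog. For the 64 × 128 scale-normalized template used in the embodiment, and assuming the 8-pixel stride applies to blocks as well as windows, the count can be checked directly:

```python
def hog_dims(img_h=128, win=64, win_stride=8, cell=8, block_cells=2,
             block_stride=8, nbins=9):
    """Count HoG dimensions: 64x64 windows slid vertically over a 64x128
    image, each holding overlapping 16x16 blocks of 2x2 cells x 9 bins."""
    block = cell * block_cells                    # 16-pixel blocks
    blocks_per_row = (win - block) // block_stride + 1
    blocks_per_win = blocks_per_row ** 2          # 7 x 7 = 49
    block_dims = block_cells ** 2 * nbins         # 4 cells x 9 bins = 36
    n_windows = (img_h - win) // win_stride + 1   # 9 window positions
    return n_windows * blocks_per_win * block_dims
```

This reproduces the 15876-dimensional HoG feature reported in the embodiment.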
Step 2.5: the equivalent rotation-invariant LBP feature is generated as follows:
1. Convert the scale-normalized target template image or moving foreground image to grayscale. For each pixel in the image, compare its gray value with those of its 8 surrounding neighbor pixels: a neighbor whose gray value is at least the pixel's is set to 1, otherwise to 0. Starting from the 12 o'clock position and proceeding clockwise, the 8 digits are concatenated into an 8-bit binary number, computed as
LBP_{P,R} = Σ_{p=0}^{P-1} s(g_p - g_c) · 2^p, with s(x) = 1 for x ≥ 0 and 0 otherwise,
where P is the number of neighbors, R is the radius, and g_p and g_c are the gray values of the neighbor pixel and the center pixel respectively;
2. Join each resulting 8-bit binary number end to end into a ring and rotate the ring clockwise 7 times, obtaining 8 binary numbers in total; the minimum of the 8 is the pixel's rotation-invariant LBP value;
3. Divide the rotation-invariant LBP values obtained in step 2 into two classes: values whose 0-1 transition count is at most 2 form one class, called uniform LBP operators; all remaining binary numbers form the other class. Under this classification there are 9 kinds of uniform rotation-invariant LBP operators and 1 kind of non-uniform operator;
4. Divide the image into small blocks of 16 × 16 pixels and compute the LBP values of the pixels in each block, forming a 10-dimensional equivalent rotation-invariant LBP histogram per block; concatenating the LBP histograms of all blocks finally yields the 840-dimensional equivalent rotation-invariant LBP feature v_LBP.
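For a single pixel, sub-steps 1 to 3 can be sketched as follows (>= is used for the neighbor comparison, the usual LBP convention):

```python
def riu2_lbp(center, neighbors):
    """Threshold the 8 clockwise neighbors against the center, take the
    rotation-minimal 8-bit code, and map the pattern to one of 10 bins:
    uniform patterns -> their count of 1-bits (0..8), all others -> 9."""
    bits = [1 if n >= center else 0 for n in neighbors]
    code = min(sum(b << i for i, b in enumerate(bits[r:] + bits[:r]))
               for r in range(8))                      # min over 8 rotations
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    bin_index = sum(bits) if transitions <= 2 else 9   # riu2 histogram bin
    return code, bin_index
```

A half-bright ring (four 1s in a row) is uniform and lands in bin 4; an alternating ring has 8 transitions and falls into the single non-uniform bin 9.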
Further, the extraction of moving foreground targets in each camera by area filtering and morphological post-processing described in step 4 comprises the following sub-steps:
Step 4.1: apply area filtering to the binary image produced by Gaussian-mixture background modeling and thresholding;
Step 4.2: apply 3 dilation operations with a 3 × 3 template to the binary image;
Step 4.3: then apply 1 erosion operation with a 3 × 3 template to the binary image;
Step 4.4: finally apply area filtering to the binary image once more.
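Steps 4.2-4.3 correspond to cv2.dilate / cv2.erode with a 3 × 3 kernel; the plain-NumPy stand-in below shows the effect (border handling here is a simplification of OpenCV's defaults):

```python
import numpy as np

def _morph(m, op):
    """One 3x3 pass: op=np.any dilates, op=np.all erodes."""
    p = np.pad(m, 1, constant_values=(op is np.all))
    out = np.zeros_like(m)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            out[i, j] = op(p[i:i + 3, j:j + 3])
    return out

def postprocess(mask, dilations=3, erosions=1):
    """Steps 4.2-4.3: three 3x3 dilations, then one 3x3 erosion."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(dilations):
        m = _morph(m, np.any)
    for _ in range(erosions):
        m = _morph(m, np.all)
    return m.astype(np.uint8)
```

A single foreground pixel grows to a 7 × 7 square after three dilations and shrinks back to 5 × 5 after the erosion, which is how small gaps in a real foreground mask get bridged.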
The area filtering described in step 4 divides the binary image into small blocks of 4 × 4 pixels; if the number of foreground pixels in a block is less than or equal to 3, all pixels in the block are set to background pixels, otherwise the foreground pixels in the block are retained;
Filtering out too-small moving targets by the moving-target area threshold described in step 4 is implemented as follows: find all connected regions in the processed binary image and count the foreground pixels of each connected region; set the area threshold Th_c to 1/400 of the video frame image area. If a connected region's area exceeds Th_c, the region is retained and the minimal rectangle enclosing it is returned as a moving foreground region; if the connected region's area is less than or equal to Th_c, the region is set to background.
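The connected-region filtering can be sketched as follows; 4-connectivity and the (top, left, bottom, right) box format are assumptions, as the patent specifies only the 1/400 area threshold Th_c and the minimal enclosing rectangle:

```python
def filter_regions(mask, ratio=1.0 / 400):
    """Keep connected foreground regions with area > Th_c = ratio * frame
    area; return each kept region's minimal bounding rectangle."""
    h, w = len(mask), len(mask[0])
    thc = ratio * h * w                                # area threshold Th_c
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for si in range(h):
        for sj in range(w):
            if not mask[si][sj] or seen[si][sj]:
                continue
            stack, region = [(si, sj)], []
            seen[si][sj] = True
            while stack:                               # flood-fill one region
                i, j = stack.pop()
                region.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] and not seen[ni][nj]:
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            if len(region) > thc:                      # keep only large regions
                rs = [p[0] for p in region]; cs = [p[1] for p in region]
                boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return boxes
```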
Further, the frame-skipping extraction of moving foregrounds from each surveillance video described in step 5 is implemented as follows: according to the frame-skip count F set by the user, once every F frames the moving foreground objects extracted in step 4 are compared one by one against the target template image by weighted similarity measurement;
The weighted similarity measurement described in step 5 is implemented as follows: first compute the dominant color feature distance D_dc, the block edge direction histogram feature distance D_ed, the block HSV histogram feature distance D_hsv, the HoG feature distance D_hog, and the equivalent rotation-invariant LBP feature distance D_LBP between the moving foreground object and the target template image; then fuse the five feature distances by normalized weighting to obtain the weighted similarity distance D_all.
The weights α, β, γ, λ and η are normalized influence weights used to balance each feature's contribution to the overall similarity measurement, and are adjusted in advance by experiment for the specific environment.
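The fusion is a weighted sum of the five per-feature distances; the weight values below are placeholders, since the patent leaves them to be tuned by experiment for each deployment:

```python
# Placeholder normalized weights for (D_dc, D_ed, D_hsv, D_hog, D_LBP);
# the patent tunes these per environment, so the values are illustrative.
WEIGHTS = {"dc": 0.25, "ed": 0.20, "hsv": 0.25, "hog": 0.15, "lbp": 0.15}

def weighted_distance(dists, weights=WEIGHTS):
    """D_all: weighted sum of the five normalized feature distances."""
    assert set(dists) == set(weights)
    return sum(weights[k] * dists[k] for k in dists)
```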
Further, dynamically inserting the information of the detected moving objects in each surveillance video of the network into the result vector V_result of designated length, as described in step 6, comprises the following sub-steps:
Step 6.1: create a vector V_result of designated length N, where N is the number of objects closest to the specified pedestrian or vehicle template image to be returned from the surveillance video network;
Step 6.2: whenever a new moving foreground is detected in the surveillance video network, first check the number of objects stored in V_result;
Step 6.3: if V_result holds fewer than N objects and the object's weighted similarity distance D_all to the target template image meets the preset similarity threshold Th_all, retain in V_result the object's information: its camera number, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the scale-normalized object image;
Step 6.4: if V_result already holds N objects and the object's weighted similarity distance D_all meets the preset similarity threshold Th_all, compare D_all with the weighted similarity distance of the object in V_result farthest from the target template; if D_all is smaller, replace that farthest object in V_result with the new object, and re-sort the N objects stored in V_result by their similarity distance to the target template, from smallest to largest;
Step 6.5: after the surveillance videos shot by all cameras in the network have been traversed, the returned vector V_result stores the N objects in the whole surveillance network closest to the specified pedestrian or vehicle template image, together with each object's camera number, the frame number in which it appears, the coordinates of its bounding rectangle in that frame, and the scale-normalized object image.
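Steps 6.1-6.5 amount to maintaining a bounded best-N list sorted by weighted distance; a sketch with the per-object record reduced to one opaque value (camera number, frame number, box, and image would travel together in practice):

```python
import bisect

class ResultVector:
    """V_result: at most n detections below Th_all, sorted by D_all."""
    def __init__(self, n, th_all):
        self.n, self.th_all = n, th_all
        self.items = []                      # sorted (D_all, info) pairs

    def offer(self, d_all, info):
        if d_all >= self.th_all:             # threshold test (steps 6.3/6.4)
            return
        if len(self.items) < self.n or d_all < self.items[-1][0]:
            bisect.insort(self.items, (d_all, info))
            del self.items[self.n:]          # evict the farthest match
```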
The principle of the invention is that any pedestrian or vehicle in the surveillance network must undergo a period of motion in order to enter and leave the monitored area, and the surveillance cameras all have fixed viewpoints. Therefore, to detect and identify a specified pedestrian or vehicle quickly, it suffices to perform background modeling for each camera in the network, extract the moving foreground objects of each camera and, after preliminary screening, open a separate thread per camera to run multi-feature weighted-fusion similarity measurement between the template image of the specified pedestrian or vehicle and the screened moving objects; the similarity results of all cameras are then gathered and sorted to obtain the N moving objects in the whole surveillance network most similar to the specified pedestrian or vehicle, which form the fast detection and recognition result. Because the moving foregrounds in a few adjacent frames of surveillance video usually change little, the invention can skip frames during background modeling and similarity measurement, performing one moving-foreground similarity measurement every F frames, which further speeds up the system.
The present invention has the following beneficial effects:
1) Compared with the prior art, the invention exploits two facts: surveillance camera viewpoints are fixed, and pedestrians and vehicles necessarily undergo motion. Foreground analysis narrows the target detection range up front; compared with multi-scale search algorithms, this effectively reduces the range over which detection and recognition must run and improves their efficiency;
2) The invention makes comprehensive use of several features, simultaneously considering the similarity of the specified pedestrian or vehicle to each moving foreground in color, shape, texture, and other respects, and fuses this multi-faceted information by weighting;
3) The invention requires only a single target template as input and no database of the detected target for training, which considerably increases the method's feasibility in practical applications;
4) The invention introduces multi-camera parallel processing, frame skipping, and related techniques, so that it can complete the detection and recognition of a specified pedestrian or vehicle accurately and robustly in complex scenes while effectively reducing the method's running time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall flow of the invention;
Fig. 2(a) is a schematic diagram of a monitored scene of the invention;
Fig. 2(b) is the Gaussian-mixture background modeling result of the invention;
Fig. 2(c) is the morphological post-processing result of the invention;
Fig. 2(d) is the final moving-foreground segmentation result of the invention;
Fig. 3 is a schematic diagram of the specified target template image of the invention;
Fig. 4 is the fast detection and recognition result for the specified target of the invention;
Specific embodiment
The invention is further described below with reference to a specific embodiment. The embodiment is implemented on the premise of the technical solution of the invention; a detailed implementation method and a concrete operating process are given, but the protection scope of the invention is not limited to the following embodiment.
Embodiment
The software environment selected for the invention: Windows 7 operating system, VS2010 development platform, C/C++ development language, OpenCV 2.3.1 toolkit.
This embodiment processes a certain surveillance network. The video scenes in the network are complex: there are many moving objects which occlude one another, and because of parameter and position differences between the cameras there are also illumination changes and color casts.
Referring to Fig. 1, the technical solution adopted by the invention is a fast detection and recognition method for a specified pedestrian or vehicle in a video surveillance network, comprising the following steps:
Step 1: first select the target to be detected and identified from a picture or video frame containing the specified pedestrian or vehicle, by marking it with a rectangular box;
Through the human-machine interface the user supplies picture or surveillance video data containing the specified pedestrian or vehicle; using rectangular-box calibration, the location of the pedestrian or vehicle is determined in the picture or video frame image, and the region enclosed by the rectangle is cut out as a separate image.
Step 2: apply normalized scale conversion to the selected target template image, and compute several complementary features of the target template image;
The target template image is scale-normalized to 64 × 128 pixels, and its dominant color feature, block edge direction histogram feature, block HSV histogram feature, HoG feature, and equivalent rotation-invariant LBP feature are computed, expressing the specified pedestrian or vehicle in many respects including color, shape, and texture.
Step 2.1: the dominant color feature is generated as follows:
1. Convert the scale-normalized target template image from RGB space to HSV space, and extract only the image's hue component. Computing the dominant color feature from the hue component alone amounts to considering only the target's color composition, which makes the feature insensitive to illumination variation and saturation differences between cameras and thereby enhances its robustness.
2. Divide the value range of the hue component into 8 intervals and project the hue component of the target template image onto these 8 intervals, forming an 8-dimensional hue histogram chist whose i-th bin chist_i is obtained by
chist_i = Σ_{(x,y)∈Rect} δ_i(h_{x,y})
where h_{x,y} is the hue component value of the pixel at coordinate (x, y), Rect is the target template image or the detected moving-foreground region, and δ_i(h_{x,y}) is 1 when h_{x,y} falls in the i-th interval and 0 otherwise.
3. Compute the mean hue value c_i of each of the 8 intervals and normalize the 8-dimensional histogram to obtain p_i; select the 3 dimensions with the largest p_i from the normalized histogram and store the corresponding c_i and p_i in the vector v_dc, which constitutes the target's dominant color feature. The dominant color feature v_dc therefore has 6 dimensions in all, corresponding to the hue values and percentages of the 3 color components with the largest share of the target image: v_dc = [c1, p1, c2, p2, c3, p3].
Given two dominant color feature vectors v_dc and v'_dc, the distance D_dc between them is defined as follows:
D_dc(v_dc, v'_dc) = Σ_i p_i^2 + Σ_j p'_j^2 - 2·Σ_i Σ_j a_{i,j}·p_i·p'_j
where a_{i,j} characterizes the degree of similarity of the color components c_i and c'_j and is defined by the following equation:
a_{i,j} = 1 - |c_i - c'_j| / d_max if |c_i - c'_j| ≤ T_d, and a_{i,j} = 0 otherwise.
In this example T_d = 25 and d_max = 50.
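A runnable sketch of D_dc in the standard MPEG-7 dominant-color form, which is consistent with the a_{i,j} thresholding and the T_d = 25, d_max = 50 values above; it should be read as a reconstruction, since the patent's own equation images are not reproduced here:

```python
def color_similarity(ci, cj, td=25.0, dmax=50.0):
    """a_ij: 1 - |ci - cj| / dmax when the hues are within Td, else 0."""
    d = abs(ci - cj)
    return 1.0 - d / dmax if d <= td else 0.0

def dominant_color_distance(v, w, td=25.0, dmax=50.0):
    """D_dc between two [c1,p1,c2,p2,c3,p3] vectors (MPEG-7 form):
    sum(p_i^2) + sum(q_j^2) - 2 * sum_ij(a_ij * p_i * q_j)."""
    cs1, ps1 = v[0::2], v[1::2]
    cs2, ps2 = w[0::2], w[1::2]
    d = sum(p * p for p in ps1) + sum(q * q for q in ps2)
    for ci, pi in zip(cs1, ps1):
        for cj, qj in zip(cs2, ps2):
            d -= 2.0 * color_similarity(ci, cj, td, dmax) * pi * qj
    return d
```

Identical feature vectors score 0; two single-color targets whose hues differ by more than T_d score the maximum.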
Step 2.2: the block edge direction histogram is generated as follows:
1. First convert the scale-normalized target image from an RGB color image to a grayscale image, and split it into 4 × 4 = 16 blocks.
2. Filter the grayscale image with the Sobel horizontal edge detection operator and the Sobel vertical edge detection operator to obtain the horizontal edge strength and vertical edge strength of each pixel, and use them to obtain each pixel's edge direction e_{x,y} and edge strength |e_{x,y}|.
3. Divide pixels into five types according to edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge, and directionless edge, where a directionless edge is a pixel whose edge strength is below a specific threshold. Because directionless edge pixels usually account for a large proportion of the whole image, only the four directed edge types are used to build the edge direction histogram in the present invention.
4. Set the directionless-edge threshold T_e; in this example T_e = 100. Within each block, compute a histogram over edge direction for all pixels whose edge strength exceeds T_e, generating a 4-dimensional edge direction histogram. The 4 × 4 blocks together produce a 64-dimensional edge direction histogram, which is saved as the local edge direction feature ehist_l; the 4-dimensional edge direction histograms of the 16 blocks are also accumulated into the 4-dimensional global edge direction feature ehist_g of the whole target image. The two are merged into a 68-dimensional vector and saved as the block edge direction histogram feature v_ed.
Given two block edge direction histogram features, the distance D_ed between them is defined as a weighted L1 distance over the local and global components:
D_ed(v_ed, v'_ed) = Σ_k |ehist_l(k) - ehist'_l(k)| + w_g · Σ_k |ehist_g(k) - ehist'_g(k)|, with w_g > 1.
Compared with the local components, the invention assigns a larger weight to the global components.
Step 2.3: the block HSV histogram is generated as follows:
1. Project the scale-normalized target template image or moving foreground image from RGB color space to HSV color space, and divide the image horizontally into 4 blocks from top to bottom.
2. Compute the HSV histograms of the 4 blocks separately, dividing the hue component into 16 intervals and the saturation and value components into 4 intervals each, so that each block finally yields a 24-dimensional histogram feature. The four 24-dimensional histograms are concatenated into the 96-dimensional block HSV histogram of the whole image.
To suit the application scenario of vehicle and pedestrian detection and recognition, in which the top and bottom of a target template or moving foreground image may contain background elements while the middle section mainly carries the important information such as the vehicle body or the pedestrian's clothing, this example gives larger weight to the two middle blocks when measuring the distance between two block HSV histogram features.
Given two block HSV histogram features, the distance D_hsv between them is defined as follows:
D_hsv(v_hsv, v'_hsv) = 0.8·d(hist_1, hist'_1) + 1.2·d(hist_2, hist'_2) + 1.0·d(hist_3, hist'_3) + 0.8·d(hist_4, hist'_4)
where d(hist, hist') is defined in this example as the chi-square distance:
d(hist, hist') = Σ_i (hist_i - hist'_i)^2 / (hist_i + hist'_i).
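The chi-square distance d(hist, hist') used for both the block HSV and the HoG comparisons can be written directly; the small epsilon guarding empty bin pairs is an implementation detail, not part of the patent:

```python
def chi_square(h1, h2, eps=1e-12):
    """d(h, h') = sum_i (h_i - h'_i)^2 / (h_i + h'_i), bins paired by index."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```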
Step 2.4: The generating process of the HoG feature is:
1. The target template image after normalized scale transformation is converted into a grayscale image, and Gamma correction is applied to the grayscale image to reduce the influence of local shadows and illumination changes on the feature extraction process.
2. Convolution of the image with the [-1,0,1] gradient operator gives the horizontal gradient Gx(x,y) of each pixel; convolution with the [1,0,-1]T gradient operator gives the vertical gradient Gy(x,y). The gradient magnitude and gradient direction are then computed from the horizontal and vertical gradients:
3. The image is divided into cells of 8×8 pixels, so each image is divided into 128 cells in this example. The gradient direction range is divided evenly into 9 intervals, and a 9-dimensional gradient histogram is generated from the gradient directions of the pixels within each cell.
4. Every 4 adjacent cells form one block, i.e. each block covers 16×16 pixels, and blocks may overlap one another. The four 9-dimensional gradient histograms of the cells in each block are concatenated into a 36-dimensional block description vector.
5. A 64×64 window is set and slid vertically from top to bottom at an interval of 8 pixels. The description vectors of all blocks contained in the window are concatenated into the description vector of the whole window, and the description vectors of all windows are then concatenated to give the HoG feature vector vhog. The HoG feature generated in this embodiment has 15876 dimensions, and the distance between HoG feature vectors is measured with the chi-square distance:
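The geometry of steps 1-5 can be checked with a sketch: on a 128×64 (height×width) input, 8×8 cells give a 16×8 cell grid, each 64×64 window contains 7×7 overlapping 2×2-cell blocks (49 blocks × 36 = 1764 dims), and the 9 vertical window positions yield exactly 9 × 1764 = 15876 dimensions, matching the embodiment. Magnitude-weighted unsigned 0-180° binning is assumed, and no block normalization is applied since the text does not mention one.

```python
import numpy as np

def hog_feature(gray):
    """HoG as described above: [-1,0,1] gradients, 8x8 cells with 9
    orientation bins, 2x2-cell blocks at one-cell stride, and a 64x64
    window slid vertically by 8 pixels."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]          # [-1, 0, 1] horizontally
    gy[1:-1, :] = g[2:, :] - g[:-2, :]          # [-1, 0, 1]^T vertically
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / 20.0).astype(int), 8)
    h, w = g.shape
    ch, cw = h // 8, w // 8                     # cell grid (16 x 8 for 128x64)
    cells = np.zeros((ch, cw, 9))
    for i in range(ch):
        for j in range(cw):
            b = bins[i*8:(i+1)*8, j*8:(j+1)*8].ravel()
            m = mag[i*8:(i+1)*8, j*8:(j+1)*8].ravel()
            cells[i, j] = np.bincount(b, weights=m, minlength=9)
    feats = []
    for top in range(0, h - 64 + 1, 8):         # 64x64 window, vertical stride 8
        c0 = top // 8
        for bi in range(c0, c0 + 7):            # 7x7 overlapping 2x2-cell blocks
            for bj in range(cw - 1):
                feats.append(cells[bi:bi + 2, bj:bj + 2].ravel())
    return np.concatenate(feats)
```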
Step 2.5: The generating process of the rotation-invariant uniform LBP feature is:
1. The target template image or moving foreground image after normalized scale transformation is converted into a grayscale image. For each pixel in the image, its gray value is compared with those of its 8 neighboring pixels: a neighbor whose gray value is not less than that of the pixel is set to 1, otherwise to 0. The 8 digits are then concatenated clockwise, starting from the 12 o'clock position, into one 8-bit binary number. The computing formula is as follows:
where P is the number of neighbors, set to 8 in this example; R is the radius, set to 1 in this example; and gy and gc are respectively the gray values of the neighboring pixel and of the center pixel.
2. Each 8-bit binary number obtained is joined end to end to form a ring, and the ring is rotated clockwise 7 times, giving 8 binary numbers in total; the minimum of these 8 values is chosen as the rotation-invariant LBP value of the pixel. This step reduces the possible binary outputs from 2^8 = 256 kinds to 36 kinds, and the resulting LBP values are robust to image rotation.
3. The rotation-invariant LBP values obtained in step 2. are divided into two classes: those with no more than 2 transitions between 0 and 1 form one class, called uniform LBP operators; all remaining binary numbers are grouped into the other class. Under this classification there are 9 kinds of uniform rotation-invariant LBP operators and 1 kind of non-uniform LBP operator.
4. The image is divided into blocks of 16×16 pixels, and the LBP values of the pixels within each block are counted to form a 10-dimensional rotation-invariant uniform LBP histogram. The LBP histograms of all blocks are concatenated to give the 840-dimensional rotation-invariant uniform LBP feature vLBP. The distance DLBP between rotation-invariant uniform LBP features is likewise measured with the chi-square distance:
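Steps 1-4 can be sketched with the standard equivalence that, for uniform patterns, the minimum-rotation code is fully determined by the number of set bits (Ojala's "riu2" operator), which gives the same 9+1 bins without enumerating all 8 rotations. Skipping the one-pixel image border inside each block is an implementation choice, not from the text.

```python
import numpy as np

def lbp_riu2_value(patch3):
    """Rotation-invariant uniform LBP bin (0..9) for the centre of a 3x3
    patch: 9 uniform rotation-invariant patterns indexed by their number of
    1-bits, plus one shared bin for all non-uniform patterns."""
    c = patch3[1, 1]
    # 8 neighbours clockwise starting from 12 o'clock
    nb = [patch3[0, 1], patch3[0, 2], patch3[1, 2], patch3[2, 2],
          patch3[2, 1], patch3[2, 0], patch3[1, 0], patch3[0, 0]]
    bits = [1 if v >= c else 0 for v in nb]          # s(gy - gc)
    # uniformity: 0-1 / 1-0 transitions around the closed ring
    u = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    if u <= 2:
        return sum(bits)                             # uniform: bin = #ones
    return 9                                         # non-uniform patterns

def lbp_histogram(gray, block=16):
    """Concatenated 10-D riu2-LBP histograms over 16x16 blocks."""
    h, w = gray.shape
    hists = []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            hist = np.zeros(10)
            for i in range(max(r, 1), min(r + block, h - 1)):
                for j in range(max(c, 1), min(c + block, w - 1)):
                    hist[lbp_riu2_value(gray[i-1:i+2, j-1:j+2])] += 1
            hists.append(hist)
    return np.concatenate(hists)
```

Note that on a 64×128 image this sketch produces 32 blocks (320 dimensions); the 840 dimensions of the embodiment imply a different block layout (e.g. overlapping blocks) that the text does not fully specify.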
Step 3: An independent thread is opened for the monitor video captured by each camera, so that all cameras in the surveillance network are processed in parallel, and background modeling is performed with a Gaussian mixture model on the monitoring scene of each camera.
Each camera in the surveillance network has a fixed video scene, and the gray value of each pixel in the video scene can be described with a Gaussian mixture model:
where η is the Gaussian probability density function; K is the upper limit on the number of Gaussian functions in the mixture model, set to 5 in this example; and the weight, mean and variance of the k-th Gaussian model at frame t are denoted ωk,t, μk,t and σ²k,t respectively.
In the present invention, the Gaussian mixture background modeling of each camera is parallelized by opening a separate thread for it. The specific steps of background modeling with the Gaussian mixture model are as follows:
1. The mixture model of each pixel is first initialized with that pixel's gray value in the first frame of the monitor video. At this point only one Gaussian function of the mixture model is initialized: its mean is the gray value of the current pixel, its variance is set to the fixed value σ² = 30, and its weight is set to 0.05.
2. When a new frame is read in, the Gaussian functions are checked in descending order of weight to see whether one matches the gray value of the pixel, i.e. whether the difference between the pixel gray value and the mean of the Gaussian does not exceed Thd = 2.5σ = 13.69. If a matching Gaussian is found, jump to step 3. If the gray value matches none of the Gaussians, a new Gaussian function is reinitialized according to step 1: when the mixture model still contains uninitialized Gaussians, the new Gaussian is initialized directly; when all K Gaussians are already in use, the new Gaussian replaces the one with the smallest weight in the current mixture model.
3. After the Gaussian function matching the current pixel gray value has been determined, the weight, mean and variance of every Gaussian function in use in the mixture model must be updated. Building and updating the background needs a certain time to accumulate, for which a time window of length L = 200 is specified. While the number of frames read from the video is below 200, the Gaussian update formula is:
where N is the frame number and ωk records the ordinal position of the k-th Gaussian function in the descending-weight ordering. The matching indicator is a binary function whose value is 1 when ωk equals the ordinal of the matched Gaussian function, and 0 otherwise.
After the frame count exceeds the time window length L, the update formula becomes:
After the update is finished, the weights of the Gaussian functions in the mixture model are renormalized.
4. The Gaussian functions are sorted in descending order of their updated weights, and the first B Gaussians whose cumulative weight exceeds Thw = 0.7 are taken as the Gaussians describing the background. If the Gaussian matched by the current pixel ranks within the first B, the pixel is judged to be a background pixel; otherwise it is a foreground pixel.
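Steps 1-4 for a single pixel can be sketched as below, under stated assumptions: the learning rate switches from a running average (1/N) to 1/L after the window fills, as the two-phase update describes, and a small variance floor is added (an implementation choice not in the text) so a freshly matched component stays matchable.

```python
import numpy as np

class PixelGMM:
    """Simplified per-pixel Gaussian mixture background model (steps 1-4)."""
    def __init__(self, k_max=5, var0=30.0, w0=0.05, L=200, var_floor=1.0):
        self.k_max, self.var0, self.w0, self.L = k_max, var0, w0, L
        self.var_floor = var_floor
        self.w, self.mu, self.var = [], [], []
        self.n = 0                                   # frames seen so far

    def _matches(self, k, x):
        return abs(x - self.mu[k]) <= 2.5 * np.sqrt(self.var[k])

    def update(self, x):
        self.n += 1
        order = sorted(range(len(self.w)), key=lambda j: -self.w[j])
        match = next((k for k in order if self._matches(k, x)), None)
        if match is None:                            # no match: (re)initialise
            if len(self.w) < self.k_max:
                self.w.append(self.w0); self.mu.append(x); self.var.append(self.var0)
                match = len(self.w) - 1
            else:                                    # replace weakest component
                match = int(np.argmin(self.w))
                self.w[match], self.mu[match], self.var[match] = self.w0, x, self.var0
        a = 1.0 / self.n if self.n <= self.L else 1.0 / self.L
        for k in range(len(self.w)):
            m = 1.0 if k == match else 0.0
            self.w[k] = (1 - a) * self.w[k] + a * m
            if m:
                self.mu[k] = (1 - a) * self.mu[k] + a * x
                self.var[k] = max((1 - a) * self.var[k] + a * (x - self.mu[k]) ** 2,
                                  self.var_floor)
        s = sum(self.w)
        self.w = [wk / s for wk in self.w]           # renormalise the weights

    def is_background(self, x, thw=0.7):
        """Background test of step 4: x must match one of the top-weighted
        components whose cumulative weight first exceeds thw."""
        acc = 0.0
        for k in sorted(range(len(self.w)), key=lambda j: -self.w[j]):
            acc += self.w[k]
            if self._matches(k, x):
                return True
            if acc > thw:
                return False
        return False
```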
Step 4: The moving foreground objects of each camera are extracted using area filtering and morphological post-processing. A moving-target area threshold is set to filter out moving targets whose area is too small, removing moving objects in the scene that are neither pedestrians nor vehicles.
Area filtering and morphological processing of the binary image produced by the Gaussian mixture model remove noise and holes; the specific implementation comprises the following sub-steps:
Step 4.1: Area filtering is applied to the binary image obtained after Gaussian mixture background modeling and thresholding;
Step 4.2: The binary image is dilated 3 times with a 3×3 template;
Step 4.3: The binary image is then eroded once with a 3×3 template;
Step 4.4: Finally, area filtering is applied to the binary image once more.
Here area filtering means dividing the binary image into blocks of 4×4 pixels: if the number of foreground pixels in a block is less than or equal to 3, all pixels in that block are set to background pixels; otherwise the foreground pixels in the block are retained.
Filtering out small-area moving targets by setting the moving-target area threshold described in step 4 is implemented as follows: after the above processing, all connected regions in the binary image are found and the number of foreground pixels in each connected region is counted. The area threshold Thc is set to 1/400 of the video frame area. If the area of a connected region exceeds Thc, the region is retained and the minimal rectangle enclosing it is returned as a moving foreground region; if the area of a connected region is less than or equal to Thc, the region is set to background.
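The 4×4 block area filter and the connected-region threshold above can be sketched in plain NumPy; 4-connectivity for the flood fill is an assumption, as the text does not state the connectivity used.

```python
import numpy as np

def block_area_filter(mask, block=4, min_fg=4):
    """Zero every 4x4 block containing <= 3 foreground pixels (step 4)."""
    h, w = mask.shape
    out = mask.copy()
    for r in range(0, h, block):
        for c in range(0, w, block):
            blk = out[r:r + block, c:c + block]
            if blk.sum() < min_fg:
                blk[:] = 0
    return out

def keep_large_components(mask, min_area):
    """Flood-fill connected components (4-connectivity assumed); drop those
    with area <= min_area; return the surviving mask and bounding boxes
    (top, left, bottom, right) of the kept regions."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    out = np.zeros_like(mask)
    boxes = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                stack, comp = [(i, j)], []
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(comp) > min_area:
                    ys = [p[0] for p in comp]
                    xs = [p[1] for p in comp]
                    boxes.append((min(ys), min(xs), max(ys), max(xs)))
                    for y, x in comp:
                        out[y, x] = 1
    return out, boxes
```

In practice Thc would be `frame_height * frame_width // 400`, per the threshold definition above.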
Step 5: Moving foregrounds are extracted from each monitor video with frame skipping. Every moving object in the video surveillance network that meets the preliminary screening requirements undergoes normalized scale transformation and computation of the various composite features, after which a weighted similarity measurement against the target template image is performed.
Because each video sequence in the surveillance network contains 25 frames per second, a pedestrian or vehicle necessarily reappears across several adjacent frames. To speed up the algorithm, the present invention introduces a frame-skipping strategy: according to the user-set frame-skipping number F, the moving foreground objects extracted in step 4 are compared one by one against the target template image with the weighted similarity measurement only once every F frames. F = 3 in this example.
The moving foreground image is scale-normalized into a 64×128-pixel image, and its dominant color feature, block edge orientation histogram feature, block HSV histogram feature, HoG feature and rotation-invariant uniform LBP feature are computed according to the methods of step 2.
The weighted similarity measurement is implemented as follows: the distance definitions of the individual features are first used to compute the dominant color feature distance Ddc, block edge orientation histogram feature distance Ded, block HSV histogram feature distance Dhsv, HoG feature distance Dhog and rotation-invariant uniform LBP feature distance DLBP between the moving foreground object and the target template image; the five feature distances are then combined in a normalized weighted similarity measurement to obtain the weighted similarity distance Dall:
where α, β, γ, λ and the fifth coefficient are normalized weighting factors that balance the influence of each feature on the overall similarity measurement; they are tuned in advance through experiments for the specific environment. In this example their values are 10, 1, 5, 1.2 and 1 respectively.
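The combination of the five distances can be sketched as a single weighted sum using the example weights from the text; the sketch assumes the individual distances have already been normalized, as the weighted similarity measurement requires.

```python
def weighted_similarity(d_dc, d_ed, d_hsv, d_hog, d_lbp,
                        weights=(10.0, 1.0, 5.0, 1.2, 1.0)):
    """Weighted similarity distance Dall over the five (pre-normalized)
    feature distances; the default weights are the example values from the
    text and would be re-tuned per deployment."""
    a, b, g, l, m = weights
    return a * d_dc + b * d_ed + g * d_hsv + l * d_hog + m * d_lbp
```

A smaller Dall means a closer match; in step 6 this value is compared against the threshold Thall and against the distances already stored in the result vector.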
Step 6: According to the degree of similarity to the template image, the information of each detected moving object in every monitor video of the network is dynamically inserted into a result vector Vresult of designated length; after all monitor videos have been traversed, the result vector is returned as the final detection and recognition result.
Further, dynamically inserting the information of the detected moving objects of each monitor video in the network into the result vector Vresult of designated length, as described in step 6, comprises the following sub-steps:
Step 6.1: A vector Vresult of designated length N is created, where N is the number of objects closest to the specified pedestrian or vehicle template image that are to be returned from the surveillance network;
Step 6.2: Whenever a new moving foreground is detected in the surveillance network, the number of objects saved in Vresult is first checked;
Step 6.3: If Vresult holds fewer than N objects, and the weighted similarity distance Dall between the object and the target template image satisfies the preset similarity threshold Thall, then information such as the object's camera number, the frame number in which it appears, its rectangle coordinates within that frame, and the scale-normalized object image is saved in the vector Vresult;
Step 6.4: If Vresult already holds N objects, and the weighted similarity distance Dall between the object and the target template image satisfies the preset similarity threshold Thall, then Dall is compared with the largest weighted similarity distance to the target template stored in Vresult; if Dall is smaller, the object replaces the stored object farthest from the target template, and the N objects saved in Vresult are re-sorted in ascending order of their similarity distance to the target template;
Step 6.5: After the monitor videos captured by all cameras in the surveillance network have been traversed, the returned vector Vresult holds the N objects in the whole surveillance network closest to the specified pedestrian or vehicle template image, together with each object's camera number, the frame number in which it appears, its rectangle coordinates within that frame, and the scale-normalized object image.
In this example N = 4 and Thall = 50.
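Steps 6.1-6.4 amount to maintaining a bounded, sorted top-N list under a distance threshold; a minimal sketch follows. The `info` payload stands in for the camera number, frame number, rectangle and image of the text; the sketch assumes distinct distance values (tied tuples would otherwise compare the payloads).

```python
import bisect

class ResultVector:
    """Keeps the N detections closest to the template (steps 6.1-6.4):
    accept while fewer than N are stored; afterwards replace the current
    worst whenever a new distance beats it. Always sorted ascending."""
    def __init__(self, n, th_all=50.0):
        self.n, self.th_all = n, th_all
        self.items = []                       # (distance, info), sorted

    def offer(self, d_all, info):
        if d_all >= self.th_all:              # fails the similarity threshold
            return False
        if len(self.items) < self.n:
            bisect.insort(self.items, (d_all, info))
            return True
        if d_all < self.items[-1][0]:         # better than the worst kept
            self.items.pop()
            bisect.insort(self.items, (d_all, info))
            return True
        return False
```

With the example values N = 4 and Thall = 50, traversing every camera's detections through `offer` and reading `items` at the end reproduces the returned result vector of step 6.5.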
Implementation results
Following the above steps, rapid detection and recognition of the specified pedestrian or vehicle is performed on the videos in the surveillance network. Fig. 2 shows the result of each stage of moving foreground segmentation. Fig. 2(c) shows that, through Gaussian mixture background modeling, area filtering and morphological post-processing, the moving foreground regions are estimated accurately; in Fig. 2(d), after connected regions of too-small area have been filtered out, the final result accurately returns the rectangles around all moving foreground objects in the scene.
Fig. 3 is a schematic of a specified pedestrian template image in this example, and Fig. 4 shows the retrieval results returned by the present invention for that template image. All 4 returned results are correct matches to the target template image: even though the person's pose and background have changed, the present invention can still find the correct results in complex scenes.
All experiments in this example were run on an ordinary PC with an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz and 4 GB of memory. The processing speed on monitor video depends on the density of moving objects in the scene; in this example the average processing time is 29 ms per frame. With the frame-skipping operation and parallel processing added, specified pedestrians or vehicles in the surveillance network can be rapidly detected and recognized in real time.

Claims (5)

1. A quick detection and recognition method for a specified pedestrian or vehicle in a video surveillance network, characterized by comprising the following steps:
Step 1: The target to be detected and recognized is first selected, by drawing a calibration rectangle, from a picture or video containing the specified pedestrian or vehicle;
Step 2: The framed target template image undergoes normalized scale transformation, and the various composite features of the target template image are generated;
Step 3: An independent thread is opened for the monitor video captured by each camera, so that all cameras in the surveillance network are processed in parallel, and background modeling is performed with a Gaussian mixture model on the monitoring scene of each camera;
Step 4: The moving foreground objects of each camera are extracted using area filtering and morphological post-processing; a moving-target area threshold is set to filter out moving targets whose area is too small, removing moving objects in the scene that are neither pedestrians nor vehicles;
Step 5: Moving foregrounds are extracted from each monitor video with frame skipping; every moving object in the video surveillance network that meets the preliminary screening requirements undergoes normalized scale transformation and computation of the various composite features, after which a weighted similarity measurement against the target template image is performed;
Step 6: According to the degree of similarity to the template image, the information of each detected moving object in every monitor video of the network is dynamically inserted into a result vector Vresult of designated length; after all monitor videos have been traversed, the result vector is returned as the final detection and recognition result.
2. The quick detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that the various composite features in step 2 are the dominant color feature, block edge orientation histogram feature, block HSV histogram feature, HoG feature and rotation-invariant uniform LBP feature, whose specific generating processes are:
Step 2.1: The generating process of the dominant color feature is:
1. The target template image after normalized scale transformation is converted from RGB space into HSV space, and only the hue component of the image in HSV space is extracted;
2. The value range of the hue component is divided into 8 intervals, and the hue components of the target template image are projected onto these 8 intervals to form an 8-dimensional hue histogram chist, whose value chisti in the i-th interval is obtained from the following formula:
chisti = Σ(x,y)∈Rect δi(hx,y)
where hx,y is the hue component value of the pixel at coordinate (x,y) of the image, Rect is the target template image or the foreground region detected by moving foreground detection, and δi(hx,y) is defined as follows:
δi(hx,y) = 1 if hx,y ∈ [45·i, 45·(i+1)), and 0 otherwise
3. The average hue value ci of each of the 8 hue intervals is computed, and the 8-dimensional histogram is normalized to obtain pi; the 3 dimensions with the largest pi are chosen from the 8-dimensional normalized histogram, and the ci and pi corresponding to these 3 dimensions are saved in a vector vdc as the dominant color feature of the target template image; the dominant color feature vdc thus consists of 6 dimensions in total, corresponding to the hue values and percentages of the 3 color components with the largest shares in the target image: vdc = [c1, p1, c2, p2, c3, p3];
Step 2.2: The generating process of the block edge orientation histogram is:
1. The target image after normalized scale transformation is first converted from an RGB color image into a grayscale image and divided into 4×4 = 16 blocks;
2. The grayscale image is filtered with the Sobel horizontal-edge detection operator and the Sobel vertical-edge detection operator to obtain the horizontal edge strength and vertical edge strength of each pixel, from which the edge direction ex,y and edge strength |ex,y| of each pixel are derived;
3. The pixels are divided into five types according to edge direction and strength: horizontal edge, vertical edge, 45° edge, 135° edge and no-direction edge; the edge orientation histogram is built from the first four edge types;
4. A no-direction edge threshold Te is set; within each block, all pixels whose edge strength exceeds Te are accumulated into a histogram according to edge direction, generating a 4-dimensional edge orientation histogram; the 4×4 blocks together yield a 64-dimensional edge orientation histogram, and this 64-dimensional vector is saved as the local edge direction feature ehistl; the 4-dimensional edge orientation histograms of the 16 blocks are then summed to form the 4-dimensional global edge direction feature ehistg of the whole target image; the two are merged into a 68-dimensional vector and saved as the block edge orientation histogram feature ved;
Step 2.3: The generating process of the block HSV histogram is:
1. The target template image or moving foreground image after normalized scale transformation is converted from the RGB color space to the HSV color space, and the image is divided horizontally into 4 strips from top to bottom;
2. The HSV histogram within each of the 4 strips is computed, with the hue component divided into 16 intervals and the saturation and value components divided into 4 intervals each, so each strip finally yields a 24-dimensional histogram feature; the four 24-dimensional histograms are concatenated to form the 96-dimensional block HSV histogram of the whole image;
Step 2.4: The generating process of the HoG feature is:
1. The target template image after normalized scale transformation is converted into a grayscale image, and Gamma correction is applied to the grayscale image to reduce the influence of local shadows and illumination changes on the feature extraction process;
2. Convolution of the image with the [-1,0,1] gradient operator gives the horizontal gradient Gx(x,y) of each pixel; convolution with the [1,0,-1]T gradient operator gives the vertical gradient Gy(x,y); the gradient magnitude and gradient direction are then computed from the horizontal and vertical gradients:
G(x,y) = √(Gx(x,y)² + Gy(x,y)²)
dir(x,y) = tan⁻¹(Gy(x,y) / Gx(x,y))
3. The image is divided into cells of 8×8 pixels; the gradient direction range is divided evenly into 9 intervals, and a 9-dimensional gradient histogram is generated from the gradient directions of the pixels within each cell;
4. Every 4 adjacent cells form one block, i.e. each block covers 16×16 pixels, and blocks may overlap one another; the four 9-dimensional gradient histograms of the cells in each block are concatenated into a 36-dimensional block description vector;
5. A 64×64 window is set and slid vertically from top to bottom at an interval of 8 pixels; the description vectors of all blocks contained in the window are concatenated into the description vector of the whole window, and the description vectors of all windows are then concatenated to give the HoG feature vector vhog;
Step 2.5: The generating process of the rotation-invariant uniform LBP feature is:
1. The target template image or moving foreground image after normalized scale transformation is converted into a grayscale image; for each pixel in the image, its gray value is compared with those of its 8 neighboring pixels: a neighbor whose gray value is not less than that of the pixel is set to 1, otherwise to 0; the 8 digits are then concatenated clockwise, starting from the 12 o'clock position, into one 8-bit binary number, with the computing formula as follows:
LBPP,R = Σy=0..P−1 s(gy − gc)·2^y
s(x) = 1 if x ≥ 0, and 0 if x < 0
where P is the number of neighbors, R is the radius, and gy and gc are respectively the gray values of the neighboring pixel and of the center pixel;
2. Each 8-bit binary number obtained is joined end to end to form a ring, and the ring is rotated clockwise 7 times, giving 8 binary numbers in total; the minimum of these 8 values is chosen as the rotation-invariant LBP value of the pixel;
3. The rotation-invariant LBP values obtained in step 2. are divided into two classes: those with no more than 2 transitions between 0 and 1 form one class, called uniform LBP operators; all remaining binary numbers are grouped into the other class; under this classification there are 9 kinds of uniform rotation-invariant LBP operators and 1 kind of non-uniform LBP operator;
4. The image is divided into blocks of 16×16 pixels, and the LBP values of the pixels within each block are counted to form a 10-dimensional rotation-invariant uniform LBP histogram; the LBP histograms of all blocks are concatenated to give the 840-dimensional rotation-invariant uniform LBP feature vLBP.
3. The quick detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that extracting the foreground targets of each camera using area filtering and morphological post-processing in step 4 comprises the following sub-steps:
Step 4.1: Area filtering is applied to the binary image obtained after Gaussian mixture background modeling and thresholding;
Step 4.2: The binary image is dilated 3 times with a 3×3 template;
Step 4.3: The binary image is then eroded once with a 3×3 template;
Step 4.4: Finally, area filtering is applied to the binary image once more;
The area filtering described in step 4 means dividing the binary image into blocks of 4×4 pixels: if the number of foreground pixels in a block is less than or equal to 3, all pixels in that block are set to background pixels; otherwise the foreground pixels in the block are retained;
Filtering out small-area moving targets by setting the moving-target area threshold described in step 4 is implemented as follows: after the above processing, all connected regions in the binary image are found and the number of foreground pixels in each connected region is counted; the area threshold Thc is set to 1/400 of the video frame area; if the area of a connected region exceeds Thc, the region is retained and the minimal rectangle enclosing it is returned as a moving foreground region; if the area of a connected region is less than or equal to Thc, the region is set to background.
4. The quick detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that extracting moving foregrounds from each monitor video with frame skipping as described in step 5 is implemented as follows: according to the user-set frame-skipping number F, the moving foreground objects extracted in step 4 are compared one by one against the target template image with the weighted similarity measurement only once every F frames;
The weighted similarity measurement described in step 5 is implemented as follows: the dominant color feature distance Ddc, block edge orientation histogram feature distance Ded, block HSV histogram feature distance Dhsv, HoG feature distance Dhog and rotation-invariant uniform LBP feature distance DLBP between the moving foreground object and the target template image are first computed; the five feature distances are then combined in a normalized weighted similarity measurement to obtain the weighted similarity distance Dall:
where α, β, γ, λ and the fifth coefficient are normalized weighting factors that balance the influence of each feature on the overall similarity measurement and are tuned in advance through experiments for the specific environment.
5. The quick detection and recognition method for a specified pedestrian or vehicle in a video surveillance network according to claim 1, characterized in that dynamically inserting the information of the detected moving objects of each monitor video in the network into the result vector Vresult of designated length, as described in step 6, comprises the following sub-steps:
Step 6.1: A vector Vresult of designated length N is created, where N is the number of objects closest to the specified pedestrian or vehicle template image that are to be returned from the surveillance network;
Step 6.2: Whenever a new moving foreground is detected in the surveillance network, the number of objects saved in Vresult is first checked;
Step 6.3: If Vresult holds fewer than N objects, and the weighted similarity distance Dall between the object and the target template image satisfies the preset similarity threshold Thall, then information such as the object's camera number, the frame number in which it appears, its rectangle coordinates within that frame, and the scale-normalized object image is saved in the vector Vresult;
Step 6.4: If Vresult already holds N objects, and the weighted similarity distance Dall between the object and the target template image satisfies the preset similarity threshold Thall, then Dall is compared with the largest weighted similarity distance to the target template stored in Vresult; if Dall is smaller, the object replaces the stored object farthest from the target template, and the N objects saved in Vresult are re-sorted in ascending order of their similarity distance to the target template;
Step 6.5: After the monitor videos captured by all cameras in the surveillance network have been traversed, the returned vector Vresult holds the N objects in the whole surveillance network closest to the specified pedestrian or vehicle template image, together with each object's camera number, the frame number in which it appears, its rectangle coordinates within that frame, and the scale-normalized object image.
CN201410356465.0A 2014-07-24 2014-07-24 The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network Active CN104166841B (en)

Publications (2)

Publication Number Publication Date
CN104166841A CN104166841A (en) 2014-11-26
CN104166841B true CN104166841B (en) 2017-06-23

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463151A (en) * 2015-01-05 2015-03-25 北京邮电大学 Rapid target matching method based on fusion of main color histograms and spatial position information
ES2744924T3 (en) * 2015-01-14 2020-02-26 Saronikos Trading & Services Unipessoal Lda Apparatus and procedure for detecting entities in an image
CN104796664A (en) * 2015-03-26 2015-07-22 成都市斯达鑫辉视讯科技有限公司 Video monitoring device
CN106295466A (en) * 2015-05-18 2017-01-04 佳能株式会社 Image processing method and device
CN104866616B (en) * 2015-06-07 2019-01-22 中科院成都信息技术股份有限公司 Surveillance video target search method
CN105005773A (en) * 2015-07-24 2015-10-28 成都市高博汇科信息科技有限公司 Pedestrian detection method integrating temporal and spatial information
CN106559645B (en) * 2015-09-25 2020-01-17 杭州海康威视数字技术股份有限公司 Camera-based monitoring method, system and device
CN105447448B (en) * 2015-11-12 2019-01-11 江南大学 Checkpoint vehicle localization method based on Gaussian color distance
CN105447457A (en) * 2015-11-15 2016-03-30 安徽清新互联信息科技有限公司 License plate character recognition method based on adaptive features
CN105869148A (en) * 2016-03-24 2016-08-17 北京小米移动软件有限公司 Target detection method and device
CN105957093B (en) * 2016-06-07 2018-12-25 浙江树人大学 ATM left-behind object detection method based on a texture-discrimination-optimized HOG operator
JP6598741B2 (en) * 2016-07-21 2019-10-30 三菱電機ビルテクノサービス株式会社 Image processing apparatus and program
CN106504264B (en) * 2016-10-27 2019-09-20 锐捷网络股份有限公司 Video foreground image extraction method and device
CN106845359A (en) * 2016-12-26 2017-06-13 四川农业大学 Tunnel portal driving prompt apparatus and method based on infrared emission
CN107239780A (en) * 2017-04-29 2017-10-10 安徽慧视金瞳科技有限公司 Image matching method based on multi-feature fusion
CN107292892B (en) * 2017-06-23 2020-04-21 北京奇艺世纪科技有限公司 Video frame image segmentation method and device
CN107704797B (en) * 2017-08-08 2020-06-23 深圳市安软慧视科技有限公司 Real-time pedestrian and vehicle detection method, system and device for security video
CN107578045A (en) * 2017-09-19 2018-01-12 北京工业大学 Underwater target recognition method based on machine vision
CN107918767B (en) * 2017-11-27 2019-11-08 北京旷视科技有限公司 Object detection method, device, electronic equipment and computer-readable medium
CN107895376A (en) * 2017-12-11 2018-04-10 福州大学 Solar panel recognition method based on improved Canny operator and contour area threshold
CN110110111B (en) * 2018-02-02 2021-12-31 兴业数字金融服务(上海)股份有限公司 Method and device for monitoring screen
CN108921083B (en) * 2018-06-28 2021-07-27 浙江工业大学 Illegal mobile vendor identification method based on deep learning target detection
CN109389031B (en) * 2018-08-27 2021-12-03 浙江大丰实业股份有限公司 Automatic positioning mechanism for performance personnel
CN109214996B (en) * 2018-08-29 2021-11-12 深圳市元征科技股份有限公司 Image processing method and device
CN110529788A (en) * 2018-10-30 2019-12-03 永康市道可道科技有限公司 Automatic-projection explosion-proof pendant lamp
BR112021004557B1 (en) 2018-11-09 2023-09-26 Beijing DIDI Infinity Technology and Development Co., Ltd SYSTEM AND METHOD FOR DETECTING CONFLICTS IN VEHICLES
CN109635720A (en) * 2018-12-10 2019-04-16 江南大学 Real-time detection method for illegal road occupation based on active video monitoring
CN110232314A (en) * 2019-04-28 2019-09-13 广东工业大学 Pedestrian detection method based on improved HOG features combined with a neural network
CN110263788B (en) * 2019-07-17 2021-07-30 广西盖德科技有限公司 Method and system for quickly identifying vehicle passing
CN110378341A (en) * 2019-07-24 2019-10-25 西南交通大学 Binocular-vision pedestrian distance detection method
CN110517183B (en) * 2019-07-26 2020-09-25 电子科技大学 High-speed low-power-consumption image processor based on retina mechanism
CN110490253A (en) * 2019-08-19 2019-11-22 浙江工业大学 Image similarity calculation method based on spatial color EMD
CN110516649B (en) * 2019-09-02 2023-08-22 南京微小宝信息技术有限公司 Face recognition-based alumni authentication method and system
CN110765926B (en) * 2019-10-21 2023-05-09 杭州易现先进科技有限公司 Picture book identification method, device, electronic equipment and storage medium
CN110956200A (en) * 2019-11-05 2020-04-03 哈尔滨工程大学 Tire pattern similarity detection method
CN111062971B (en) * 2019-12-13 2023-09-19 深圳龙岗智能视听研究院 Cross-camera muck truck tracking method based on multi-modal deep learning
CN111179302B (en) * 2019-12-27 2023-03-24 普联国际有限公司 Moving target detection method and device, storage medium and terminal equipment
CN111461242A (en) * 2020-04-08 2020-07-28 北京航天新风机械设备有限责任公司 Multi-material rapid comparison and matching method for production line
CN112149581A (en) * 2020-09-25 2020-12-29 北京嘀嘀无限科技发展有限公司 Working condition monitoring method, system, device, electronic equipment and readable storage medium
CN112232153A (en) * 2020-09-30 2021-01-15 广东职业技术学院 Method and system for acquiring track of target person
CN113255705B (en) * 2021-07-15 2021-11-05 广东省大湾区集成电路与系统应用研究院 Intelligent identification device for camouflage-colored animals
CN113807328B (en) * 2021-11-18 2022-03-18 济南和普威视光电技术有限公司 Target detection method, device and medium based on algorithm fusion
CN117079219B (en) * 2023-10-08 2024-01-09 山东车拖车网络科技有限公司 Vehicle running condition monitoring method and device applied to trailer service

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102012213A (en) * 2010-08-31 2011-04-13 吉林大学 Method for measuring foreground height through single image
CN102592454A (en) * 2012-02-29 2012-07-18 北京航空航天大学 Intersection vehicle movement parameter measuring method based on detection of vehicle side face and road intersection line
CN102708367A (en) * 2012-03-11 2012-10-03 无锡派图半导体设备有限公司 Image identification method based on target contour features
CN103049751A (en) * 2013-01-24 2013-04-17 苏州大学 High-altitude video pedestrian recognition method based on improved weighted region matching

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR20100065677A (en) * 2008-12-08 2010-06-17 한국전자통신연구원 Method for detection of multi moving objects in the high resolution image sequences and system thereof


Non-Patent Citations (1)

Title
Multi-scale HOG pedestrian head-shoulder feature detection based on Gaussian mixture modeling; Rui Ting et al.; Journal of Shandong University of Science and Technology (Natural Science Edition); 2013-04-30; Vol. 32, No. 2; pp. 90-93 *

Also Published As

Publication number Publication date
CN104166841A (en) 2014-11-26

Similar Documents

Publication Publication Date Title
CN104166841B (en) Fast detection and recognition method for specified pedestrians or vehicles in a video surveillance network
CN105518709B (en) Method, system and computer program product for face recognition
Luo et al. Traffic sign recognition using a multi-task convolutional neural network
CN103942577B (en) Person identification method based on self-built sample database and composite features in video surveillance
CN108875600A (en) YOLO-based vehicle information detection and tracking method, apparatus and computer storage medium
CN103824070B (en) Rapid pedestrian detection method based on computer vision
Rachmadi et al. Vehicle color recognition using convolutional neural network
Wang et al. Detection and localization of image forgeries using improved mask regional convolutional neural network
CN109819208A (en) Dense-crowd security monitoring and management method based on artificial intelligence dynamic monitoring
CN105404886B (en) Feature model generation method and feature model generation apparatus
CN109447169A (en) Image processing method and model training method, device and electronic system
CN105574550A (en) Vehicle identification method and device
CN106372666B (en) Target identification method and device
CN106096602A (en) Chinese license plate recognition method based on convolutional neural networks
CN107273832B (en) License plate recognition method and system based on integral channel features and convolutional neural network
CN109815874A (en) Person identity recognition method, device, equipment and readable storage medium
CN109190475B (en) Collaborative training method for face recognition network and pedestrian re-identification network
CN105893946A (en) Front face image detection method
CN106295600A (en) Real-time driver status detection method and device
CN107085696A (en) Vehicle localization and type identification method based on checkpoint images
CN106610969A (en) Multimodal information-based video content auditing system and method
CN104504395A (en) Method and system for pedestrian and vehicle classification based on neural network
CN107730515A (en) Panoramic image saliency detection method based on region growing and eye movement model
CN109711322A (en) Pedestrian-vehicle separation method based on RFCN
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant