CN110321869A - Person detection and extraction method based on a multi-scale fusion network - Google Patents
- Publication number
- CN110321869A CN110321869A CN201910617365.1A CN201910617365A CN110321869A CN 110321869 A CN110321869 A CN 110321869A CN 201910617365 A CN201910617365 A CN 201910617365A CN 110321869 A CN110321869 A CN 110321869A
- Authority
- CN
- China
- Prior art keywords
- image
- detection
- pedestrian
- density estimation
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a person detection and extraction method based on a multi-scale fusion network, which uses a computer to detect pedestrian targets with a detection method based on crowd density estimation. Using a computer as the analysis platform, the method extracts one frame from a fire video, inputs it into a trained crowd density estimation model, and outputs a crowd density estimation image of the same size. The density estimation image reflects whether the image contains pedestrian targets and where each pedestrian target is located, thereby completing the detection and extraction of pedestrian targets in the image. The method applies a deep learning network directly to the original image without extra processing; compared with traditional feature extraction methods, it improves detection efficiency and adapts better to the perspective distortion of different camera views, enabling effective detection and rapid assessment of pedestrian targets in fire videos.
Description
Technical field
The present invention relates to a person detection and extraction method based on a multi-scale fusion network, belongs to the field of computer vision, and is a video person detection and extraction method for fire cause investigation.
Background technique
During case investigation, pedestrians around the fire scene are screened to judge whether their behavior is related to the fire. In today's society, pedestrian target detection is an important research topic, and detecting pedestrian targets in images plays an important role in many fields. With technological progress in recent years, especially in artificial intelligence, pedestrian target detection has advanced significantly, evolving from methods based on extracting pedestrian target features to methods based on deep learning. Deep learning methods greatly outperform earlier methods in both detection accuracy and detection efficiency.
Traditional pedestrian target detection relies mainly on the features of the pedestrian target. Facing massive volumes of video data, investigators need to spend a long time observing images, and these traditional detection methods have drawbacks and limitations. First, conventional methods use feature extraction: they extract various features of pedestrian targets, match them against the image to be detected, and take the number of matched feature points as corresponding to the total number of pedestrians in the image. This approach depends on the invariance of each pedestrian's feature points; when a pedestrian is partially occluded (for example, only a leg or a foot appears in the picture), it cannot be judged as a pedestrian, the corresponding pedestrian feature points cannot be extracted well, and the detection cannot be completed. Second, video surveillance platforms are now installed in more and more places, each camera shoots from a different angle, and different shooting angles appear as different perspective distortions in the image, so the images captured by different cameras differ. Feature-extraction methods are sensitive to perspective distortion: they achieve relatively high detection rates for a given shooting angle, but once the scene changes, pedestrian detection performance drops sharply, so detection results vary greatly across shooting scenes.
Summary of the invention
In view of the state of the prior art and its existing deficiencies, the present invention provides a person detection and extraction method based on a multi-scale fusion network. When pedestrians appear in a fire video scene, the method uses a computer to detect pedestrian targets with a detection method based on crowd density estimation. Using a computer as the analysis platform, the method extracts one frame from the fire video, inputs it into a trained crowd density estimation model, and outputs a crowd density estimation image of the same size. The density estimation image reflects whether there are pedestrian targets in the image and where each pedestrian target is located, thereby completing the detection and extraction of pedestrian targets in the image.
To achieve the above object, the technical solution adopted by the present invention is: a person detection and extraction method based on a multi-scale fusion network, characterized in that: using a computer as the detection platform and image processing methods, the person targets in the original image are detected with deep-learning-based technology; the fire video images to be analyzed are chosen in advance, and the detection of pedestrian targets in the fire video images is completed; the specific steps are as follows:
1. Training of the crowd density estimation model:
1) Preliminarily process the training fire video images. The preliminary processing includes marking the regions of the image most likely to contain pedestrian targets and annotating the pedestrian positions in the picture. The pedestrians in the image are annotated using Gaussian blurring: every head in the picture is marked, and the marked head coordinates are then converted into the corresponding density map. A simple conversion is used: a normally distributed patch is generated at the head position of each pedestrian, and the values within the patch sum to 1. The Gaussian blur formula is as follows:
If a mark point is at image position x_i, it can be represented as δ(x − x_i); the corresponding ground-truth density map Y is then obtained by convolution with a normalized Gaussian kernel G_σ:

Y(x) = Σ_{x_i ∈ S} δ(x − x_i) * G_σ(x)

where S denotes the set of mark points; since each Gaussian kernel integrates to 1, the integral of the entire density map equals the total number of people in the image;
2) Input the annotated images into the crowd density estimation network and train the model. The annotated images need image augmentation, i.e., flipping, scaling, and random cropping; the processed images are input into the network as the training set data to complete the training of the network parameters;
3) After the same preprocessing, input the test fire video images into the crowd density estimation model and test the model's performance. Likewise, one tenth of the entire dataset is set aside as the validation set; after training is complete, the performance is validated, and the training effect on the test set is assessed through the performance on the validation set;
4) Finally, based on the difference between the test results and the actual results, keep optimizing the network's loss function until the result is optimal. The loss function formula is as follows:

L(θ) = (1 / 2N) Σ_{i=1}^{N} ‖F(X_i; θ) − F_i‖²

where θ is the parameter set of the model, N is the number of training samples, X_i is the i-th training sample, F_i is the ground-truth density map corresponding to X_i, F(X_i; θ) is the model's prediction, and L, the difference between the two, represents the loss;
2. Application of the crowd density estimation model:
Using a computer, preliminarily process the fire video to be detected and mark the regions of the image where pedestrians are likely to appear. Input the processed image to be detected into the crowd density estimation model for prediction; the output is a crowd density estimation map, which completes the detection of whether the image contains pedestrian targets.
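The application step above reduces to "integrate the predicted density map and threshold the count". A minimal sketch under stated assumptions: the patent publishes no code, and `estimate_density` below is a hypothetical stand-in for the trained model that fakes a map containing two unit-mass heads for illustration.

```python
import numpy as np

def estimate_density(frame):
    # Hypothetical stand-in for the trained crowd density estimation
    # model: it must return a density map the same size as the input
    # frame. Here we fake a map with two unit-mass "pedestrians".
    h, w = frame.shape[:2]
    density = np.zeros((h, w), dtype=np.float64)
    density[h // 4, w // 4] = 1.0
    density[h // 2, w // 2] = 1.0
    return density

def detect_pedestrians(frame, count_threshold=0.5):
    """Decide whether a frame contains pedestrian targets: the integral
    (sum) of the density map approximates the number of people."""
    density = estimate_density(frame)
    count = float(density.sum())
    return count >= count_threshold, count

frame = np.zeros((240, 320, 3), dtype=np.uint8)  # one extracted video frame
has_people, count = detect_pedestrians(frame)
```

With a real model in place of the stand-in, the same thresholding on the integrated map gives the presence decision described in the text.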
The beneficial effects of the present invention are: the present invention uses a pedestrian target detection method based on density estimation; it is a video person detection and extraction method applied to fire cause investigation that performs crowd density analysis on fire video images to judge whether the video images contain pedestrian targets, and it is a method based on deep learning. The present invention trains the crowd density estimation model with existing fire video images and optimizes the model by adjusting its parameters; the image to be detected is then input into the model to generate the corresponding crowd density map, from which it can be directly determined whether there are pedestrian targets in the image and where each pedestrian target is located.
The present invention uses deep learning technology for assisted analysis of video images and has a great advantage in detecting pedestrian targets in images. Using deep learning, the present invention generates the crowd density distribution map of the corresponding scene; by analyzing the generated crowd density distribution map, the pedestrian targets in the original image can be detected, improving the efficiency and accuracy of pedestrian target detection.
First, this method is based on video image analysis and is accurate and intuitive.
Second, unlike previous detection methods based on pedestrian features, this method is insensitive to occlusion by flames or objects in the image and to camera perspective distortion.
Third, in view of the extensive deployment of surveillance everywhere today and its future growth, this method is broadly applicable and has wide room for popularization.
In short, the pedestrian target detection method based on crowd density estimation has many advantages over traditional feature-extraction-based detection: the deep learning network can be applied directly to the original image without extra processing; compared with traditional feature extraction, detection efficiency is improved and adaptability to the perspective distortion of different images is better; the method can effectively detect and quickly assess pedestrian targets in fire videos.
Detailed description of the invention
Fig. 1 is the crowd picture to be detected by the present invention;
Fig. 2 is the video image, obtained after preprocessing by the present invention, of the regions where pedestrians may appear;
Fig. 3 is a schematic diagram of pedestrian annotation in the present invention;
Fig. 4 is the structure of the crowd density estimation model of the present invention;
Fig. 5 is a schematic diagram of the image generated by the crowd density estimation model of the present invention;
Fig. 6 is the flow chart of the present invention.
Specific embodiment
As shown in Figures 1 to 6, the person detection and extraction method based on a multi-scale fusion network is a video person detection and extraction method applied to fire cause investigation. Using a computer as the detection platform and image processing methods, the person targets in the original image are detected with deep-learning-based technology; a fire video image to be analyzed is chosen in advance, and the detection of pedestrian targets in the fire video image can be completed; the specific steps are as follows:
1. Training of the crowd density estimation model:
1) Preliminarily process the training fire video images. The preliminary processing includes marking the regions of the image most likely to contain pedestrian targets and annotating the pedestrian positions in the picture. The pedestrians in the image are annotated using Gaussian blurring: every head in the picture is marked, and the marked head coordinates are then converted into the corresponding density map. We use a simple conversion: a normally distributed patch is generated at the head position of each pedestrian, and the values within the patch sum to 1. The Gaussian blur formula is as follows:

If a mark point is at image position x_i, it can be represented as δ(x − x_i). The corresponding ground-truth density map Y is then obtained by convolution with a normalized Gaussian kernel G_σ:

Y(x) = Σ_{x_i ∈ S} δ(x − x_i) * G_σ(x)

where S denotes the set of mark points. Since each Gaussian kernel integrates to 1, the integral of the entire density map equals the total number of people in the image.
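The annotation step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the head coordinates are already annotated, and it uses `scipy.ndimage.gaussian_filter`, whose normalized kernel reproduces the sum-to-one Gaussian patches described in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(shape, head_points, sigma=4.0):
    """Convert annotated head coordinates into a ground-truth density map.
    Each head contributes a normalized Gaussian, so the integral of the
    map equals the number of annotated people."""
    density = np.zeros(shape, dtype=np.float64)
    for y, x in head_points:
        density[int(y), int(x)] = 1.0       # delta(x - x_i) at each mark point
    return gaussian_filter(density, sigma)  # convolve with a normalized Gaussian

heads = [(30, 40), (80, 120), (100, 200)]   # assumed annotations for one frame
dmap = make_density_map((240, 320), heads)
total = dmap.sum()                          # integral of the map, about 3
```

The sigma of 4 pixels is an assumption; the patent does not state the kernel width used.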
2) Input the annotated images into the crowd density estimation network and train the model. The annotated images need image augmentation: flipping, scaling, random cropping, and similar operations. The processed images are input into the network as the training set data to complete the training of the network parameters.
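The augmentation step can be sketched like this (an illustrative sketch, not the patent's code: scaling is omitted for brevity, and the crop size of half the image is an arbitrary assumption). The point is that image and density map must be transformed together so they stay aligned.

```python
import numpy as np

def augment(image, density, rng):
    """Flip and randomly crop an image together with its density map,
    keeping the two aligned, as in the augmentation step above."""
    if rng.random() < 0.5:                 # horizontal flip, half the time
        image = image[:, ::-1]
        density = density[:, ::-1]
    h, w = image.shape[:2]
    ch, cw = h // 2, w // 2                # assumed crop size: half the image
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return (image[top:top + ch, left:left + cw],
            density[top:top + ch, left:left + cw])

rng = np.random.default_rng(0)
img = np.zeros((240, 320, 3), dtype=np.uint8)
den = np.zeros((240, 320), dtype=np.float64)
aug_img, aug_den = augment(img, den, rng)
```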
3) After the same preprocessing, input the test fire video images into the crowd density estimation model and test the model's performance. Likewise, we set aside roughly one tenth of the entire dataset as the validation set; after training is complete, the performance is validated, and the training effect on the test set is assessed through the performance on the validation set.
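The hold-out described in step 3) can be sketched as follows. The one-tenth fraction follows the text; the seed and the index-based bookkeeping are assumptions for illustration.

```python
import numpy as np

def split_dataset(n_samples, val_fraction=0.1, seed=0):
    """Hold out roughly one tenth of the dataset as a validation set;
    the remaining indices form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)           # shuffle sample indices
    n_val = max(1, int(n_samples * val_fraction))
    return idx[n_val:], idx[:n_val]            # (train indices, validation indices)

train_idx, val_idx = split_dataset(500)        # 450 training, 50 validation samples
```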
4) Finally, based on the difference between the test results and the actual results, keep optimizing the network's loss function until the result is optimal. The loss function formula is as follows:

L(θ) = (1 / 2N) Σ_{i=1}^{N} ‖F(X_i; θ) − F_i‖²

where θ is the parameter set of the model, N is the number of training samples, X_i is the i-th training sample, F_i is the ground-truth density map corresponding to X_i, F(X_i; θ) is the model's prediction, and L, the difference between the two, represents the loss.
2. Application of the crowd density estimation model:
Using a computer, preliminarily process the fire video to be detected and mark the regions of the image where pedestrians are likely to appear. Input the processed image to be detected into the crowd density estimation model for prediction; the output is a crowd density estimation map. This completes the detection of whether the image contains pedestrian targets.
Fig. 1 is the crowd picture to be detected. To reject interference from regions of the image that cannot contain pedestrian targets, the regions where no pedestrians will appear are blurred out, and only the regions where pedestrians may appear are used for training. Fig. 2 is the processed crowd picture.
Fig. 3 is a schematic diagram of pedestrian annotation in the present invention, i.e., the image produced after the neural counting network: a single pedestrian target is indicated by a dot, multiple clustered pedestrians are indicated by stacked dots, and the person count of the whole image is finally completed by integration.
Fig. 4 is the structure of the crowd density estimation model, i.e., the network structure of the present invention: a two-column network with convolution kernels of different sizes, used in order to extract image features at different scales. The first column extracts larger-scale image features with convolution kernels ranging from 11*11 to 7*7; the second column extracts smaller-scale image features using only 3*3 convolution kernels. The two columns are then fused. Because each of the two front columns contains two max-pooling layers, the output images have only one quarter of the original resolution, so after fusion a column of deconvolution layers restores the output to the original image size, avoiding the loss of too much information from the original image.
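The resolution bookkeeping in this description can be checked with a small sketch. The exact layer configuration is not published, so this assumes the convolutions are 'same'-padded (they preserve spatial size) and that only the two 2x2 max-pooling layers and the two stride-2 deconvolution layers change the resolution.

```python
def column_output_size(size, n_pools=2):
    """Spatial size after one column: 'same'-padded convolutions keep the
    size, and each of the n_pools 2x2 max-pooling layers halves it."""
    for _ in range(n_pools):
        size //= 2
    return size

def deconv_recover(size, n_up=2):
    """A column of stride-2 deconvolution layers doubles the size per
    layer, restoring the original resolution after fusion."""
    for _ in range(n_up):
        size *= 2
    return size

h = 256
fused = column_output_size(h)     # 64: one quarter of the original resolution
restored = deconv_recover(fused)  # 256: back to the input size
```

This is why the deconvolution column is needed: without it, the fused density map would be a quarter of the input size in each dimension.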
Fig. 5 is a schematic diagram of the image generated by the crowd density estimation model of the present invention, i.e., the density image generated after the image passes through the neural network model; the total person count is the integral of the image.
Fig. 6 is the flow chart of the present invention, i.e., the flow chart of the algorithm of the invention. The fire video is first split by processing into multiple video frames; after denoising and enhancement, each frame is input into the deep learning network model, which generates a corresponding person-density image of the same size as the original video frame. The person count is completed by integrating the generated density image, and the person targets are obtained.
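The Fig. 6 flow can be sketched end to end. This is illustrative only: `run_model` is a hypothetical stand-in for the trained network, and the denoising and enhancement steps are left as a comment.

```python
import numpy as np

def run_model(frame):
    # Hypothetical stand-in for the trained deep learning network:
    # returns a density map the same size as the frame, here with a
    # single unit-mass "person" at the centre for illustration.
    h, w = frame.shape[:2]
    density = np.zeros((h, w), dtype=np.float64)
    density[h // 2, w // 2] = 1.0
    return density

def count_people_in_video(frames):
    """Per-frame person counts following the Fig. 6 flow: each frame is
    fed to the model and its density map is integrated to get a count."""
    counts = []
    for frame in frames:
        # (denoising and enhancement preprocessing would go here)
        density = run_model(frame)
        counts.append(round(float(density.sum())))
    return counts

frames = [np.zeros((120, 160, 3), dtype=np.uint8) for _ in range(3)]
counts = count_people_in_video(frames)
```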
Claims (1)
1. A person detection and extraction method based on a multi-scale fusion network, characterized in that: using a computer as the detection platform and image processing methods, the person targets in the original image are detected with deep-learning-based technology; the fire video images to be analyzed are chosen in advance, and the detection of the pedestrian targets in the fire video images can be completed; the specific steps are as follows:
1. Training of the crowd density estimation model:
1) preliminarily process the training fire video images; the preliminary processing includes marking the regions of the image most likely to contain pedestrian targets and annotating the pedestrian positions in the picture; the pedestrians in the image are annotated using Gaussian blurring, every head in the picture is marked, and the marked head coordinates are then converted into the corresponding density map using a simple conversion, i.e., a normally distributed patch is generated at the head position of each pedestrian, with the values within the patch summing to 1; the Gaussian blur formula is as follows:
if a mark point is at image position x_i, it can be represented as δ(x − x_i); the corresponding ground-truth density map Y is then obtained by convolution with a normalized Gaussian kernel G_σ, Y(x) = Σ_{x_i ∈ S} δ(x − x_i) * G_σ(x), where S denotes the set of mark points; the integral of the entire density map equals the total number of people in the image;
2) input the annotated images into the crowd density estimation network and train the model; the annotated images need image augmentation, i.e., flipping, scaling, and random cropping; the processed images are input into the network as the training set data to complete the training of the network parameters;
3) after the same preprocessing, input the test fire video images into the crowd density estimation model and test the model's performance; likewise, one tenth of the entire dataset is set aside as the validation set; after training is complete, the performance is validated, and the training effect on the test set is assessed through the performance on the validation set;
4) finally, based on the difference between the test results and the actual results, keep optimizing the network's loss function until the result is optimal; the loss function formula is as follows:

L(θ) = (1 / 2N) Σ_{i=1}^{N} ‖F(X_i; θ) − F_i‖²

where θ is the parameter set of the model, N is the number of training samples, X_i is the i-th training sample, F_i is the ground-truth density map corresponding to X_i, F(X_i; θ) is the model's prediction, and L, the difference between the two, represents the loss;
2. Application of the crowd density estimation model:
using a computer, preliminarily process the fire video to be detected and mark the regions of the image where pedestrians are likely to appear; input the processed image to be detected into the crowd density estimation model for prediction; the output is the crowd density estimation map, which completes the detection of whether the image contains pedestrian targets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910617365.1A CN110321869A (en) | 2019-07-10 | 2019-07-10 | Person detection and extraction method based on a multi-scale fusion network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910617365.1A CN110321869A (en) | 2019-07-10 | 2019-07-10 | Person detection and extraction method based on a multi-scale fusion network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110321869A true CN110321869A (en) | 2019-10-11 |
Family
ID=68123160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910617365.1A Pending CN110321869A (en) | 2019-07-10 | 2019-07-10 | Person detection and extraction method based on a multi-scale fusion network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321869A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767881A (en) * | 2020-07-06 | 2020-10-13 | 中兴飞流信息科技有限公司 | Self-adaptive crowd density estimation device based on AI technology |
CN112115862A (en) * | 2020-09-18 | 2020-12-22 | 广东机场白云信息科技有限公司 | Crowded scene pedestrian detection method combined with density estimation |
CN113762219A (en) * | 2021-11-03 | 2021-12-07 | 恒林家居股份有限公司 | Method, system and storage medium for identifying people in mobile conference room |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2709065A1 (en) * | 2012-09-17 | 2014-03-19 | Lakeside Labs GmbH | Concept for counting moving objects passing a plurality of different areas within a region of interest |
CN104574352A (en) * | 2014-09-02 | 2015-04-29 | 重庆大学 | Crowd density grade classification method based on foreground image |
CN107301387A (en) * | 2017-06-16 | 2017-10-27 | 华南理工大学 | A kind of image Dense crowd method of counting based on deep learning |
CN108717528A (en) * | 2018-05-15 | 2018-10-30 | 苏州平江历史街区保护整治有限责任公司 | A kind of global population analysis method of more strategies based on depth network |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2709065A1 (en) * | 2012-09-17 | 2014-03-19 | Lakeside Labs GmbH | Concept for counting moving objects passing a plurality of different areas within a region of interest |
CN104574352A (en) * | 2014-09-02 | 2015-04-29 | 重庆大学 | Crowd density grade classification method based on foreground image |
CN107301387A (en) * | 2017-06-16 | 2017-10-27 | 华南理工大学 | A kind of image Dense crowd method of counting based on deep learning |
CN108717528A (en) * | 2018-05-15 | 2018-10-30 | 苏州平江历史街区保护整治有限责任公司 | A kind of global population analysis method of more strategies based on depth network |
Non-Patent Citations (1)
Title |
---|
杨杰, 张翔: 《视频目标检测和跟踪及其应用》 (Video Object Detection, Tracking and Their Applications), page 2 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767881A (en) * | 2020-07-06 | 2020-10-13 | 中兴飞流信息科技有限公司 | Self-adaptive crowd density estimation device based on AI technology |
CN112115862A (en) * | 2020-09-18 | 2020-12-22 | 广东机场白云信息科技有限公司 | Crowded scene pedestrian detection method combined with density estimation |
CN112115862B (en) * | 2020-09-18 | 2023-08-29 | 广东机场白云信息科技有限公司 | Congestion scene pedestrian detection method combined with density estimation |
CN113762219A (en) * | 2021-11-03 | 2021-12-07 | 恒林家居股份有限公司 | Method, system and storage medium for identifying people in mobile conference room |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110348319B (en) | Face anti-counterfeiting method based on face depth information and edge image fusion | |
CN105809693B (en) | SAR image registration method based on deep neural network | |
CN104318548B (en) | Rapid image registration implementation method based on space sparsity and SIFT feature extraction | |
CN113065558A (en) | Lightweight small target detection method combined with attention mechanism | |
CN110059558A (en) | A kind of orchard barrier real-time detection method based on improvement SSD network | |
CN103345758B (en) | Jpeg image region duplication based on DCT statistical nature distorts blind checking method | |
CN111611874B (en) | Face mask wearing detection method based on ResNet and Canny | |
CN109409190A (en) | Pedestrian detection method based on histogram of gradients and Canny edge detector | |
Yang et al. | Traffic sign recognition in disturbing environments | |
CN110232387B (en) | Different-source image matching method based on KAZE-HOG algorithm | |
CN110321869A (en) | Person detection and extraction method based on a multi-scale fusion network | |
CN107808161A (en) | A kind of Underwater targets recognition based on light vision | |
CN111445459A (en) | Image defect detection method and system based on depth twin network | |
CN106157308A (en) | Rectangular target object detecting method | |
Zhang et al. | Improved Fully Convolutional Network for Digital Image Region Forgery Detection. | |
CN106530271A (en) | Infrared image significance detection method | |
CN108257151A (en) | PCANet image change detection methods based on significance analysis | |
CN109800755A (en) | A kind of remote sensing image small target detecting method based on Analysis On Multi-scale Features | |
TWI765442B (en) | Method for defect level determination and computer readable storage medium thereof | |
CN110287862A (en) | Anti- detection method of taking on the sly based on deep learning | |
CN109685774A (en) | Varistor open defect detection method based on depth convolutional neural networks | |
CN110110618A (en) | A kind of SAR target detection method based on PCA and global contrast | |
Zhu et al. | Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks | |
CN105069459B (en) | One kind is directed to High Resolution SAR Images type of ground objects extracting method | |
CN110930384A (en) | Crowd counting method, device, equipment and medium based on density information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||