CN108921083A - Method for identifying illegal mobile street vendors based on deep-learning object detection - Google Patents

Method for identifying illegal mobile street vendors based on deep-learning object detection

Info

Publication number
CN108921083A
CN108921083A (application No. CN201810688380.0A; granted as CN108921083B)
Authority
CN
China
Prior art keywords
pedestrian
stand
count
database
street pedlar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810688380.0A
Other languages
Chinese (zh)
Other versions
CN108921083B (en)
Inventor
陈晋音
龚鑫
方航
俞露
王诗铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810688380.0A priority Critical patent/CN108921083B/en
Publication of CN108921083A publication Critical patent/CN108921083A/en
Application granted granted Critical
Publication of CN108921083B publication Critical patent/CN108921083B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The object of the present invention is to provide a method for identifying illegal mobile street vendors based on deep-learning object detection, comprising the following steps: obtaining road surveillance video and cutting it into frame images; detecting the positions of stalls and pedestrians in each frame image with an object detection model; according to the stall positions, filtering out the stalls that move within the images and retaining the fixed stalls; based on the positions and number of the fixed stalls, clustering the pedestrians with the K-means method to obtain the pedestrians associated with each fixed stall; distinguishing different pedestrians and stalls with a pedestrian recognition model and a stall recognition model, respectively; and determining whether the pedestrians assigned to the cluster of a given fixed stall are street vendors. The method provided by the invention collects evidence automatically on illegal mobile street vendors present within the coverage of road surveillance, effectively improving the efficiency of the city-management department and reducing labor costs.

Description

Method for identifying illegal mobile street vendors based on deep-learning object detection
Technical field
The invention belongs to the field of intelligent city management, and in particular relates to a method for identifying illegal mobile street vendors based on deep-learning object detection.
Background technique
Mobile street vendors are merchants or peddlers who have no fixed place of business and sell goods in the city in an itinerant manner. Most such vendors operate without a business license, so the quality of the goods they sell cannot be guaranteed. Moreover, mobile vendors often engage in open-fire grilling, deep-frying and similar activities, which generate large amounts of waste, harm the city's appearance and cause pollution. The goods they sell are mostly food such as breakfast items, cooked dishes and fruit; when hygiene and food quality cannot be assured, they pose a certain health hazard.
Mobile street vendors have therefore become one of the main targets of city-management regulation. Because they are highly mobile and active over a wide area, the relevant departments find them difficult to manage. With the rapid development of artificial-intelligence technology, mobile vendors can be identified with the relevant techniques, so that evidence can be captured automatically. A deep-learning-based system for identifying illegal mobile vendors can detect them automatically in surveillance-camera footage, saving manpower for the city-management department and improving management efficiency.
During identification of illegal mobile vendors, pedestrians and stalls must be detected in the image; from the relative positions and motion trajectories of pedestrians and stalls, the system analyzes which pedestrians are mobile vendors and then captures images as evidence. This requires an object detection method that locates objects of interest in an image and recognizes them. Current mainstream object detection methods are based on deep learning and include Faster R-CNN, YOLO and SSD.
The disclosure of publication No. CN107679078A relates to a deep-learning-based rapid retrieval method and system for checkpoint vehicle images. It extracts vehicle feature information with a deep neural network based on the Inception-ResNet-v2 architecture, sharing network weights to avoid a large amount of repeated computation; its loss function is trained on triplet samples and directly produces 128-dimensional vectors. In the image-retrieval stage, the features are clustered and indexed, which raises query speed. That method accelerates the extraction of image features and, while responding quickly in real time, can effectively detect and pursue vehicles with counterfeit or cloned license plates.
Summary of the invention
The object of the present invention is to provide a method for identifying illegal mobile street vendors based on deep-learning object detection, so as to collect evidence automatically on illegal mobile vendors present within the coverage of road surveillance, effectively improve the efficiency of the city-management department and reduce labor costs.
The method for identifying illegal mobile street vendors based on deep-learning object detection comprises the following steps:
(1) Obtain road surveillance video and cut it into frame images;
(2) Detect the positions of stalls and pedestrians in each frame image with an object detection model;
(3) According to the stall positions, filter out the stalls that move within the images and retain the fixed stalls;
(4) Based on the positions and number of the fixed stalls, cluster the pedestrians with the K-means method to obtain the pedestrians associated with each fixed stall;
(5) Using a pedestrian recognition model and a stall recognition model, determine whether pedestrians or stalls appearing in different frame images are the same pedestrian or stall;
(6) Determine whether the pedestrians assigned to the cluster of a given fixed stall are street vendors.
The object detection model is obtained by training a learning network composed of an Inception-ResNet-v2 network and a Faster R-CNN network; the pedestrian recognition model and the stall recognition model are obtained by training Inception-ResNet-v2 networks.
The learning network corresponding to the object detection model comprises:
an Inception-ResNet-v2 network, which extracts features from the input frame image and outputs a feature map to the RPN network and the RoI pooling layer;
an RPN network, which receives the feature map output by the Inception-ResNet-v2 network, extracts rectangular candidate regions that may contain targets, and outputs them to the RoI pooling layer;
an RoI pooling layer, which receives the feature map output by the Inception-ResNet-v2 network and the rectangular candidate regions output by the RPN network, maps the candidate regions onto the feature map, and outputs the resulting feature map to the fully connected layer;
a fully connected layer, which receives the feature map output by the RoI pooling layer, outputs the class of the object in each rectangular candidate region together with its classification confidence, refines the object boundary within each candidate region, and outputs the coordinate information.
The pedestrians and stalls in the images are each labeled with their respective category to form the training samples with which the object detection model is trained.
The Inception-ResNet-v2 network used for the pedestrian recognition model and the stall recognition model comprises:
a first layer, which is a Reshape layer;
second and third layers, which are 3*3 convolutional layers;
a fourth layer, which is a max-pooling layer;
fifth and sixth layers, which are 3*3 convolutional layers;
a seventh layer, which is a max-pooling layer;
eighth to thirteenth layers, which are alternately connected Reduction and Inception network modules;
a fourteenth layer, which is a 3*3 convolutional layer;
a fifteenth layer, which is an average-pooling layer;
a sixteenth layer, which is an output layer;
a seventeenth layer, which is a 1*1*1024 fully connected layer that outputs the feature map and a 1*1*1024-dimensional vector;
an eighteenth layer, which is a 1*1*N fully connected layer that classifies the object represented by the 1*1*1024-dimensional vector and outputs the object class and classification confidence, N being the number of classes.
The eighth to thirteenth layers of the Inception-ResNet-v2 network are, in order: a Reduction-A module, 5 concatenated Inception-A modules, a Reduction-B module, 10 concatenated Inception-B modules, a Reduction-C module and 5 concatenated Inception-C modules.
The Reduction-A module consists of four parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the fourth is a 1*1 convolutional layer followed by an average-pooling layer; the four branches are output in parallel. The Reduction-B module consists of three parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is an average-pooling layer; the three branches are joined by a Concat layer and output after concatenation. The Reduction-C module consists of four parallel branches: the first is a 1*1 convolutional layer followed by a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the fourth is an average-pooling layer; the four branches are joined by a Concat layer and output after concatenation.
The Inception-A module consists of three parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the three branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection. The Inception-B module consists of two parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the two branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection. The Inception-C module consists of two parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the two branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection.
Pedestrian images are extracted and labeled so that images of the same pedestrian share one label while different pedestrians receive different labels; these are used to train the pedestrian recognition model. Stall images are extracted and labeled in the same way, images of the same stall sharing one label and different stalls receiving different labels; these are used to train the stall recognition model.
In step (3), the method for retaining fixed stalls is:
The position and feature vector of each detected stall are stored in a database together with a counting variable COUNT. Whenever a new stall is detected, its feature vector is compared with the stored targets. If the same target is already stored in the database and its coordinate change is smaller than a preset value, its count is increased (COUNT + n1) and the target's record in the database is updated; if the target is not stored, it is added to the database. If a target in the database does not appear in a given frame, its count is decreased (COUNT − n2). A maximum threshold COUNT_MAX and a minimum threshold COUNT_MIN are given: if COUNT exceeds COUNT_MAX, it is capped at COUNT_MAX; if COUNT falls below COUNT_MIN, the current target is deleted.
The preset value for the coordinate change is adjusted according to the actual situation.
Capping COUNT at COUNT_MAX prevents the count from overflowing its bounds, which would otherwise make entries with excessively large counts impossible to delete from the database.
The method for obtaining the pedestrians associated with each fixed stall in step (4) is: according to the number n of fixed stalls, take the center points of the n fixed stalls as the initial sample points; based on the distance between each pedestrian's center and the centroid of each cluster, K-means assigns the pedestrians to classes, finally producing n classes corresponding to the n fixed stalls.
In step (5), the pedestrian recognition model and the stall recognition model are used to decide whether pedestrians or stalls in different frame images are the same pedestrian or stall, as follows: the pedestrian recognition model extracts features from each pedestrian image to obtain the pedestrian's feature vector; the stall recognition model extracts features from each stall image to obtain the stall's feature vector; each newly obtained feature vector is then compared with the saved pedestrian and stall feature vectors.
The feature distance D is computed as the Euclidean distance between feature vectors. Given a threshold T: if D > T, the pedestrians or stalls in the different frame images are not the same; if D ≤ T, they are the same pedestrian or stall.
The Euclidean feature distance is:

D = √( Σᵢ₌₁ⁿ (aᵢ − bᵢ)² )

where D denotes the Euclidean distance, n = 1024 is the feature-vector dimension, aᵢ is the i-th component of feature vector a, bᵢ is the i-th component of feature vector b, and a and b represent pedestrians or stalls in different frame images.
In step (6), whether the pedestrians assigned to the cluster of a given fixed stall are street vendors is determined as follows: a database is established for pedestrians, storing each pedestrian's feature information, history classification information and counting variable COUNT. The history classification information is the record of the clusters assigned to a pedestrian by the K-means method over multiple frames. Whenever a pedestrian is detected, it is compared with the pedestrians in the database: if the same pedestrian is found, its count is increased (COUNT + n1) and the current cluster label is appended to its history classification information; if not found, the pedestrian is added to the database. If a pedestrian in the database does not appear in the current frame, its count is decreased (COUNT − n2). Given a count threshold C_THRESHOLD and a percentage threshold P_THRESHOLD: if a pedestrian's history classification information contains more than C_THRESHOLD entries and the fraction assigned to a single cluster exceeds P_THRESHOLD, the pedestrian is identified as a mobile street vendor. A maximum threshold COUNT_MAX and a minimum threshold COUNT_MIN are also given: if COUNT exceeds COUNT_MAX, it is capped at COUNT_MAX; if COUNT falls below COUNT_MIN, the corresponding pedestrian is deleted from the database.
The present invention uses Faster R-CNN (Faster Region-based Convolutional Neural Network), a mainstream deep-learning framework for object detection whose advantage is higher recognition accuracy than alternative methods. Analyzing the positions of pedestrians and stalls requires a clustering algorithm; K-means is a simple and effective unsupervised clustering algorithm that picks initial sample points and assigns samples to classes according to their distances in feature space.
The method provided by the invention obtains the positions of pedestrians and stalls from road surveillance video, analyzes and filters the target features to obtain the positions and number of the fixed stalls, and identifies the street vendors among the pedestrians by K-means clustering, so as to collect evidence automatically.
The practical benefit of the invention is mainly that, by combining deep-learning technology with the existing urban road-surveillance network, evidence on illegal mobile street vendors can be collected fully automatically, effectively improving the efficiency of the city-management department and reducing labor costs.
Description of the drawings
Fig. 1 is the flow chart of the illegal mobile street vendor identification method provided by the invention;
Fig. 2 is the structure of the Inception-ResNet-v2 network provided by the invention;
Fig. 3 shows the Reduction network modules in the Inception-ResNet-v2 network;
Fig. 4 shows the Inception network modules in the Inception-ResNet-v2 network;
Fig. 5 shows the Inception-C network module in the Inception-ResNet-v2 network;
Fig. 6 is the network structure of the object detection model provided by the invention.
Specific embodiment
In order to describe the present invention more specifically, the technical solution of the invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the method for identifying illegal mobile street vendors based on deep-learning object detection comprises the following steps:
(1) Obtain road surveillance video and cut it into frame images.
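Step (1) can be sketched as follows. The patent does not name a video library or a sampling rate, so this is a minimal illustration under assumptions: a reader such as OpenCV's `cv2.VideoCapture` would supply the decoded frames, and the helper `frame_indices` (an illustrative name, not from the patent) isolates the sampling logic as a pure function.

```python
def frame_indices(total_frames: int, fps: float, sample_hz: float = 1.0):
    """Indices of the frames to keep when sampling `sample_hz` frames per
    second from a video recorded at `fps` frames per second."""
    step = max(1, round(fps / sample_hz))
    return list(range(0, total_frames, step))

# With a video reader (e.g. cv2.VideoCapture), the selected indices would be
# decoded into frame images; the decoding itself is omitted in this sketch.
```

For a 25 fps stream sampled at 1 Hz, for example, every 25th frame is kept.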
(2) Detect the positions of stalls and pedestrians in each frame image with an object detection model.
The object detection model is obtained by training a learning network composed of an Inception-ResNet-v2 network and a Faster R-CNN network; the pedestrian recognition model and the stall recognition model are obtained by training Inception-ResNet-v2 networks.
As shown in Fig. 6, the learning network corresponding to the object detection model comprises:
an Inception-ResNet-v2 network, which extracts features from the input frame image and outputs a feature map to the RPN network and the RoI pooling layer;
an RPN network, which receives the feature map output by the Inception-ResNet-v2 network, extracts rectangular candidate regions that may contain targets, and outputs them to the RoI pooling layer;
an RoI pooling layer, which receives the feature map output by the Inception-ResNet-v2 network and the rectangular candidate regions output by the RPN network, maps the candidate regions onto the feature map, and outputs the resulting feature map to the fully connected layer;
a fully connected layer, which receives the feature map output by the RoI pooling layer, outputs the class of the object in each rectangular candidate region together with its classification confidence, refines the object boundary within each candidate region, and outputs the coordinate information.
The pedestrians and stalls in the images are each labeled with their respective category to form the training samples with which the object detection model is trained.
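For the rest of the pipeline, the detector's role in step (2) reduces to a list of labeled boxes. Below is a minimal sketch of that post-processing under assumptions: the detection head is taken to yield `(label, confidence, box)` triples, and both the helper name `split_detections` and the confidence cut-off `CONF_MIN` are illustrative, since the patent specifies neither.

```python
CONF_MIN = 0.7  # assumed confidence cut-off, not specified in the patent

def split_detections(raw):
    """Split raw detections (label, confidence, (x1, y1, x2, y2)) into
    stall boxes and pedestrian boxes, dropping low-confidence hits."""
    stalls, pedestrians = [], []
    for label, conf, box in raw:
        if conf < CONF_MIN:
            continue  # discard uncertain detections
        if label == "stand":
            stalls.append(box)
        elif label == "pedestrian":
            pedestrians.append(box)
    return stalls, pedestrians
```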
As shown in Fig. 2, the Inception-ResNet-v2 network used for the pedestrian recognition model and the stall recognition model comprises:
a first layer, which is a Reshape layer;
second and third layers, which are 3*3 convolutional layers;
a fourth layer, which is a max-pooling layer;
fifth and sixth layers, which are 3*3 convolutional layers;
a seventh layer, which is a max-pooling layer;
eighth to thirteenth layers, which are alternately connected Reduction and Inception network modules;
a fourteenth layer, which is a 3*3 convolutional layer;
a fifteenth layer, which is an average-pooling layer;
a sixteenth layer, which is an output layer;
a seventeenth layer, which is a 1*1*1024 fully connected layer that outputs the feature map and a 1*1*1024-dimensional vector;
an eighteenth layer, which is a 1*1*N fully connected layer that classifies the object represented by the 1*1*1024-dimensional vector and outputs the object class and classification confidence, N being the number of classes.
The eighth to thirteenth layers of the Inception-ResNet-v2 network are, in order: a Reduction-A module, 5 concatenated Inception-A modules, a Reduction-B module, 10 concatenated Inception-B modules, a Reduction-C module and 5 concatenated Inception-C modules.
As shown in Fig. 3, the Reduction-A module consists of four parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the fourth is a 1*1 convolutional layer followed by an average-pooling layer; the four branches are output in parallel. The Reduction-B module consists of three parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is an average-pooling layer; the three branches are joined by a Concat layer and output after concatenation. The Reduction-C module consists of four parallel branches: the first is a 1*1 convolutional layer followed by a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the fourth is an average-pooling layer; the four branches are joined by a Concat layer and output after concatenation.
As shown in Fig. 4 and Fig. 5, the Inception-A module consists of three parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the third is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the three branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection. The Inception-B module consists of two parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by two 3*3 convolutional layers; the two branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection. The Inception-C module consists of two parallel branches: the first is a 1*1 convolutional layer; the second is a 1*1 convolutional layer followed by a 3*3 convolutional layer; the two branches are joined by a Concat layer and, after a 3*3 convolutional layer, form the output together with the deep residual connection.
The Inception-ResNet-v2 network also includes a ResNet structure: deep residual connections pass the input directly to the output without passing through the intermediate modules, which resolves the phenomenon that accuracy may instead decline as the network grows deeper.
Pedestrian images are extracted and labeled so that images of the same pedestrian share one label while different pedestrians receive different labels; these are used to train the pedestrian recognition model. Stall images are extracted and labeled in the same way, images of the same stall sharing one label and different stalls receiving different labels; these are used to train the stall recognition model.
(3) According to the stall positions, filter out the stalls that move within the images and retain the fixed stalls.
Analysis of the stall data shows that the object detection network cannot guarantee detecting every pedestrian and stall in every frame: a pedestrian or stall detected in one frame may fail to be detected in the next, which complicates the analysis. It is therefore necessary to filter out the stalls that are moving.
Specifically, the position and feature vector of each detected stall are stored in a database together with a counting variable COUNT. Whenever a new stall is detected, its feature vector is compared with the stored targets. If the same target is already stored in the database and its coordinate change is smaller than a preset value, its count is increased (COUNT + n1) and the target's record in the database is updated; if the target is not stored, it is added to the database. If a target in the database does not appear in a given frame, its count is decreased (COUNT − n2). A maximum threshold COUNT_MAX and a minimum threshold COUNT_MIN are given: if COUNT exceeds COUNT_MAX, it is capped at COUNT_MAX; if COUNT falls below COUNT_MIN, the current target is deleted.
The preset value for the coordinate change is adjusted according to the actual situation.
Capping COUNT at COUNT_MAX prevents the count from overflowing its bounds, which would otherwise make entries with excessively large counts impossible to delete from the database.
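The stall-filtering bookkeeping described above can be sketched as follows, as a hedged illustration rather than the patent's implementation: feature-vector matching is abstracted away by assuming each detection already carries the identifier of its matched target, and the thresholds `COUNT_MAX`, `COUNT_MIN`, the increments `N1`, `N2` and the coordinate tolerance `COORD_EPS` are illustrative values, since the patent leaves them to be tuned.

```python
COUNT_MAX, COUNT_MIN = 30, -5   # illustrative count thresholds
N1, N2 = 2, 1                   # illustrative increment / decrement
COORD_EPS = 20.0                # illustrative coordinate-change tolerance (pixels)

def update_stands(db, detections):
    """One frame of stall bookkeeping.
    db:         {stall_id: {'pos': (x, y), 'count': int}}
    detections: {stall_id: (x, y)} for the current frame (id = matched target).
    Returns the ids currently considered fixed stalls."""
    for sid, pos in detections.items():
        rec = db.get(sid)
        if rec is None:
            db[sid] = {'pos': pos, 'count': 0}  # new target: store it
        elif abs(rec['pos'][0] - pos[0]) + abs(rec['pos'][1] - pos[1]) < COORD_EPS:
            # seen again near the same place: COUNT + n1, capped at COUNT_MAX
            rec['count'] = min(rec['count'] + N1, COUNT_MAX)
            rec['pos'] = pos  # update the stored record
    for sid in list(db):
        if sid not in detections:
            db[sid]['count'] -= N2              # missed this frame: COUNT - n2
            if db[sid]['count'] < COUNT_MIN:
                del db[sid]                     # fell below COUNT_MIN: delete
    return [sid for sid, rec in db.items() if rec['count'] > 0]
```

A stall that keeps reappearing near the same coordinates accumulates a positive count and is reported as fixed; a moving or spurious stall decays and is eventually deleted.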
(4) Based on the positions and number of the fixed stalls, cluster the pedestrians with the K-means method to obtain the pedestrians associated with each fixed stall.
Specifically, according to the number n of fixed stalls, the center points of the n fixed stalls are taken as the initial sample points; based on the distance between each pedestrian's center and the centroid of each cluster, K-means assigns the pedestrians to classes, finally producing n classes corresponding to the n fixed stalls.
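A pure-Python sketch of this seeded clustering follows, with the stall centers as the initial cluster centers. The fixed iteration count and the function name `assign_pedestrians` are illustrative; a library routine such as scikit-learn's `KMeans` with explicit initial centroids would serve equally.

```python
def assign_pedestrians(stalls, pedestrians, iters=10):
    """K-means with k = len(stalls), seeded at the stall centres.
    stalls, pedestrians: lists of (x, y) tuples.
    Returns one list of pedestrians per stall."""
    centers = list(stalls)  # initial sample points = fixed-stall centres
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in pedestrians:
            # assign each pedestrian to the nearest cluster centroid
            j = min(range(len(centers)),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # recompute centroids; keep the old centre for an empty cluster
        centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                   if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters
```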
(5) Using the pedestrian recognition model and the stall recognition model, determine whether pedestrians or stalls appearing in different frame images are the same pedestrian or stall.
Taking pedestrians as an example: the object detection model detects and localizes the pedestrians present in each frame, but it cannot judge whether two pedestrians in consecutive frames are the same person. Therefore, for every processed frame, the pedestrian positions captured by the object detection model are passed to the pedestrian recognition model, which extracts features from each pedestrian image to obtain each pedestrian's feature vector.
Specifically, because the pedestrian recognition model and the stall recognition model produce different feature vectors for different pedestrians and stalls, the distance between targets in feature space is used to judge whether two objects are the same pedestrian or stall.
Specifically, the pedestrian recognition model extracts features from each pedestrian image to obtain the pedestrian's feature vector; the stall recognition model extracts features from each stall image to obtain the stall's feature vector; each newly obtained feature vector is then compared with the saved pedestrian and stall feature vectors.
The feature distance D is computed as the Euclidean distance between feature vectors. Given a threshold T: if D > T, the pedestrians or stalls in the different frame images are not the same; if D ≤ T, they are the same pedestrian or stall.
The Euclidean feature distance is:

D = √( Σᵢ₌₁ⁿ (aᵢ − bᵢ)² )

where D denotes the Euclidean distance, n = 1024 is the feature-vector dimension, aᵢ is the i-th component of feature vector a, bᵢ is the i-th component of feature vector b, and a and b represent pedestrians or stalls in different frame images.
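This distance test can be sketched directly; the threshold value `T` below is illustrative, since the patent leaves the threshold to be chosen, and the short vectors in the usage are stand-ins for the 1024-dimensional features.

```python
import math

T = 0.8  # illustrative threshold, to be chosen for the trained models

def same_object(a, b, threshold=T):
    """True when two feature vectors are within `threshold` Euclidean
    distance, i.e. judged to be the same pedestrian or stall."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return d <= threshold
```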
(6) Determine whether the pedestrians assigned to the cluster of a given fixed stall are street vendors.
Specifically, a database is established for pedestrians, storing each pedestrian's feature information, history classification information and counting variable COUNT. The history classification information is the record of the clusters assigned to a pedestrian by the K-means method over multiple frames. Whenever a pedestrian is detected, it is compared with the pedestrians in the database: if the same pedestrian is found, its count is increased (COUNT + n1) and the current cluster label is appended to its history classification information; if not found, the pedestrian is added to the database. If a pedestrian in the database does not appear in the current frame, its count is decreased (COUNT − n2). Given a count threshold C_THRESHOLD and a percentage threshold P_THRESHOLD: if a pedestrian's history classification information contains more than C_THRESHOLD entries and the fraction assigned to a single cluster exceeds P_THRESHOLD, the pedestrian is identified as a mobile street vendor. A maximum threshold COUNT_MAX and a minimum threshold COUNT_MIN are also given: if COUNT exceeds COUNT_MAX, it is capped at COUNT_MAX; if COUNT falls below COUNT_MIN, the corresponding pedestrian is deleted from the database.
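The decision rule at the end of step (6) can be sketched as follows, with illustrative threshold values; `history` stands for the history classification information stored per pedestrian, i.e. the list of K-means cluster labels accumulated over frames.

```python
C_THRESHOLD = 20    # illustrative: minimum history length before deciding
P_THRESHOLD = 0.6   # illustrative: fraction that must fall in one cluster

def is_vendor(history):
    """True when a pedestrian's cluster history is long enough and
    dominated by a single stall's cluster."""
    if len(history) <= C_THRESHOLD:
        return False            # not enough observations yet
    top = max(set(history), key=history.count)  # most frequent cluster label
    return history.count(top) / len(history) > P_THRESHOLD
```

A pedestrian who merely walks past a stall accumulates labels spread over several clusters and stays below the percentage threshold, while a vendor who lingers at one stall is flagged.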

Claims (7)

1. An illegal mobile vendor identification method based on deep learning target detection, comprising the following steps:
(1) obtaining road surveillance video and cutting the surveillance video into frame images;
(2) detecting the positions of stalls and pedestrians in the frame images with a target detection model;
(3) filtering out stalls that move across the images according to stall position, retaining the fixed stalls;
(4) clustering the pedestrians with the K-means clustering method based on the positions and the number of the fixed stalls, obtaining the pedestrians corresponding to each fixed stall;
(5) distinguishing whether pedestrians or stalls in different frame images are the same pedestrian or stall, using a pedestrian identification model and a stall identification model respectively;
(6) determining whether the pedestrians assigned to the class of a given fixed stall are street vendors;
wherein the target detection model is obtained by training a learning network composed of an Inception Resnet v2 network and a Faster R-CNN network, and the pedestrian identification model and the stall identification model are obtained by training an Inception Resnet v2 network.
2. The illegal mobile vendor identification method based on deep learning target detection according to claim 1, characterized in that the method of retaining fixed stalls in step (3) is: storing the position and feature vector of each detected stall in a database, together with a counting variable COUNT; whenever a new stall is detected, comparing its feature vector with the stored targets; if the same target is stored in the database and the target's change in coordinates is less than a preset value, increasing its count value (COUNT + n1) and updating the target's information in the database; if the target is not stored in the database, depositing it into the database; if a target in the database does not appear in a given frame, decreasing its count value (COUNT − n2).
3. The illegal mobile vendor identification method based on deep learning target detection according to claim 2, characterized in that the method of retaining fixed stalls in step (3) further includes: given a highest threshold COUNT_MAX and a lowest threshold COUNT_MIN, if COUNT exceeds COUNT_MAX, setting COUNT to the maximum value COUNT_MAX; if COUNT falls below COUNT_MIN, deleting the current target.
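Claims 2 and 3 together can be sketched as a small stall tracker. The class name StallDB, the parameter values, and the use of NumPy norms for the feature and coordinate comparisons are illustrative assumptions, not part of the claims:

```python
import numpy as np

# Illustrative parameter values; the patent names n1, n2, COUNT_MAX,
# COUNT_MIN and the coordinate-change preset without fixing them.
N1, N2 = 2, 1
COUNT_MAX, COUNT_MIN = 50, 0
MOVE_EPS = 10.0   # max pixel displacement for a stall to count as "fixed"
SAME_T = 0.5      # feature-distance threshold for "same stall"

class StallDB:
    def __init__(self):
        self.records = []  # each record: {"pos", "feat", "count", "seen"}

    def observe(self, pos, feat):
        """Match a detected stall against stored targets (claim 2)."""
        pos, feat = np.asarray(pos, float), np.asarray(feat, float)
        for r in self.records:
            if (np.linalg.norm(r["feat"] - feat) <= SAME_T      # same target
                    and np.linalg.norm(r["pos"] - pos) < MOVE_EPS):  # barely moved
                r["count"] = min(r["count"] + N1, COUNT_MAX)    # clamp (claim 3)
                r["pos"], r["feat"], r["seen"] = pos, feat, True
                return
        self.records.append({"pos": pos, "feat": feat, "count": 1, "seen": True})

    def end_frame(self):
        """Decrement unseen targets and drop those below COUNT_MIN."""
        for r in self.records:
            if not r["seen"]:
                r["count"] -= N2
            r["seen"] = False
        self.records = [r for r in self.records if r["count"] >= COUNT_MIN]
```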
4. The illegal mobile vendor identification method based on deep learning target detection according to claim 1, characterized in that the method of obtaining the pedestrians corresponding to the fixed stalls in step (4) is: according to the number n of fixed stalls, taking the center points of the n fixed stalls as the initial sample points; classifying the pedestrians by the K-means clustering method according to the distance between each pedestrian's center point and the centroid of each cluster, finally separating n classes corresponding to the n fixed stalls.
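The seeded clustering of claim 4 can be sketched with a minimal NumPy K-means; the function name and the iteration cap are illustrative assumptions:

```python
import numpy as np

def assign_pedestrians_to_stalls(ped_centers, stall_centers, iters=20):
    """Minimal K-means sketch of claim 4: seed the cluster centroids with
    the fixed stalls' center points, so the number of clusters equals the
    number of stalls, then assign each pedestrian to the nearest centroid.
    """
    peds = np.asarray(ped_centers, dtype=float)      # (m, 2) pedestrian centers
    centers = np.asarray(stall_centers, dtype=float) # (n, 2) initial sample points
    for _ in range(iters):
        # distance from every pedestrian to every cluster centroid
        d = np.linalg.norm(peds[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old centroid for an empty cluster
        new = np.array([peds[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```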
5. The illegal mobile vendor identification method based on deep learning target detection according to claim 1, characterized in that the method of distinguishing whether pedestrians or stalls in different frame images are the same pedestrian or stall in step (5) includes:
extracting the features of pedestrian images with the pedestrian identification model to obtain pedestrian feature vectors; extracting the features of stall images with the stall identification model to obtain stall feature vectors; comparing the feature vectors of the pedestrians and stalls with the saved pedestrian and stall feature vectors;
calculating the feature distance D under the Euclidean metric from the feature vectors; given a threshold T, if D > T, the pedestrians or stalls in different frame images are not the same stall or pedestrian; if D ≤ T, the pedestrians or stalls in different frame images are the same stall or pedestrian.
6. The illegal mobile vendor identification method based on deep learning target detection according to claim 1, characterized in that the method of determining whether the pedestrians assigned to the class of a given stall are street vendors in step (6) is:
establishing a database for pedestrians, storing the corresponding feature information, history classification information and a counting variable COUNT; the history classification information refers to the classification results obtained for a pedestrian by the K-means clustering method over the course of multi-frame processing; whenever a pedestrian is detected, comparing it with the pedestrians in the database: if the same pedestrian is found, increasing its count value (COUNT + n1) and appending the current classification information to its history classification information; if the same pedestrian is not found, adding its information to the database; if a pedestrian in the database does not appear in the current frame, decreasing its count value (COUNT − n2);
given a count threshold parameter C_THRESHOLD and a percentage threshold parameter P_THRESHOLD, if a pedestrian's history classification information is sufficiently long, i.e. greater than C_THRESHOLD, and the percentage of frames in which the pedestrian is assigned to a certain class is greater than P_THRESHOLD, the pedestrian can be identified as a mobile street vendor.
7. The illegal mobile vendor identification method based on deep learning target detection according to claim 1, characterized in that the method of determining whether the pedestrians assigned to the class of a given stall are street vendors in step (6) further includes: given a highest threshold COUNT_MAX and a lowest threshold COUNT_MIN, if COUNT exceeds COUNT_MAX, setting COUNT to the maximum value COUNT_MAX; if COUNT falls below COUNT_MIN, deleting the corresponding pedestrian from the database.
CN201810688380.0A 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection Active CN108921083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810688380.0A CN108921083B (en) 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection


Publications (2)

Publication Number Publication Date
CN108921083A true CN108921083A (en) 2018-11-30
CN108921083B CN108921083B (en) 2021-07-27

Family

ID=64422018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810688380.0A Active CN108921083B (en) 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection

Country Status (1)

Country Link
CN (1) CN108921083B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034212A * 2010-06-21 2011-04-27 艾浩军 City management system based on video analysis
CN104166841A * 2014-07-24 2014-11-26 浙江大学 Rapid detection and identification method for specified pedestrians or vehicles in a video surveillance network
CN106845325A * 2015-12-04 2017-06-13 杭州海康威视数字技术股份有限公司 Information detection method and device
CN107679078A * 2017-08-29 2018-02-09 银江股份有限公司 Checkpoint image vehicle rapid retrieval method and system based on deep learning


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345435A * 2018-12-07 2019-02-15 山东晴天环保科技有限公司 Street-occupation business management device and method
CN109726717B * 2019-01-02 2022-03-01 西南石油大学 Vehicle comprehensive information detection system
CN109726717A * 2019-01-02 2019-05-07 西南石油大学 Vehicle comprehensive information detection system
CN109977782A * 2019-02-27 2019-07-05 浙江工业大学 Cross-store operation detection method based on target position information reasoning
CN109977782B * 2019-02-27 2021-01-08 浙江工业大学 Cross-store operation behavior detection method based on target position information reasoning
CN110276254A * 2019-05-17 2019-09-24 恒锋信息科技股份有限公司 UAV-based automatic identification and early-warning method for street vendors in no-vending areas
CN110287207A * 2019-06-30 2019-09-27 北京健康有益科技有限公司 Food quality estimation method based on a density meter
CN110458082A * 2019-08-05 2019-11-15 城云科技(中国)有限公司 Urban management case classification and identification method
CN110458082B * 2019-08-05 2022-05-03 城云科技(中国)有限公司 Urban management case classification and identification method
CN110992645A * 2019-12-06 2020-04-10 江西洪都航空工业集团有限责任公司 Mobile vendor detection and alarm system in dynamic scenes
CN111553321A * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
WO2022063002A1 * 2020-09-23 2022-03-31 中兴通讯股份有限公司 Human-vehicle information association method and apparatus, and device and storage medium
CN113163334A * 2021-02-19 2021-07-23 合肥海赛信息科技有限公司 Intelligent mobile vendor detection method based on video analysis
CN112949510A * 2021-03-08 2021-06-11 香港理工大学深圳研究院 Human detection method based on Faster R-CNN for thermal infrared images

Also Published As

Publication number Publication date
CN108921083B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN108921083A (en) Illegal flowing street pedlar recognition methods based on deep learning target detection
CN108062349B (en) Video monitoring method and system based on video structured data and deep learning
CN105574550B (en) A kind of vehicle identification method and device
CN104933710B (en) Based on the shop stream of people track intelligent analysis method under monitor video
CN108388888B (en) Vehicle identification method and device and storage medium
CN108009473B (en) Video structuralization processing method, system and storage device based on target behavior attribute
CN109344736B (en) Static image crowd counting method based on joint learning
CN106022285A (en) Vehicle type identification method and vehicle type identification device based on convolutional neural network
CN109977782B (en) Cross-store operation behavior detection method based on target position information reasoning
Jiang et al. Automatic soccer video event detection based on a deep neural network combined CNN and RNN
CN104166841B (en) The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network
CN110188807A (en) Tunnel pedestrian target detection method based on cascade super-resolution network and improvement Faster R-CNN
CN109447169A (en) The training method of image processing method and its model, device and electronic system
CN108416250A (en) Demographic method and device
CN108460403A (en) The object detection method and system of multi-scale feature fusion in a kind of image
CN108491797A (en) A kind of vehicle image precise search method based on big data
CN103093212B (en) The method and apparatus of facial image is intercepted based on Face detection and tracking
CN103530638B (en) Method for pedestrian matching under multi-cam
CN107085696A (en) A kind of vehicle location and type identifier method based on bayonet socket image
CN109918969A (en) Method for detecting human face and device, computer installation and computer readable storage medium
CN102117413A (en) Method for automatically filtering defective image based on multilayer feature
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN108764096B (en) Pedestrian re-identification system and method
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
WO2019119515A1 (en) Face analysis and filtering method, device, embedded apparatus, dielectric and integrated circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant