
CN108921083B - Illegal mobile vendor identification method based on deep learning target detection

Info

Publication number
CN108921083B
Authority
CN
China
Prior art keywords
booth
count
pedestrian
pedestrians
booths
Prior art date
Legal status
Active
Application number
CN201810688380.0A
Other languages
Chinese (zh)
Other versions
CN108921083A (en)
Inventor
陈晋音
龚鑫
方航
俞露
王诗铭
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810688380.0A
Publication of CN108921083A
Application granted
Publication of CN108921083B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention aims to provide an illegal mobile vendor identification method based on deep learning target detection, which comprises the following steps: acquiring a road monitoring video and cutting it into frame images; detecting the positions of booths and pedestrians in the frame images with a target detection model; filtering out moving booths according to the booth positions and keeping the fixed booths; clustering the pedestrians with the K-means method, based on the positions and number of the fixed booths, to obtain the pedestrians corresponding to each fixed booth; using a pedestrian recognition model and a booth recognition model to distinguish individual pedestrians and booths; and judging whether the pedestrians classified to the same fixed booth are vendors. The method enables automatic evidence collection against illegal mobile vendors within the road monitoring coverage, effectively improves the efficiency of the urban management department, and reduces labor cost.

Description

Illegal mobile vendor identification method based on deep learning target detection
Technical Field
The invention belongs to the field of intelligent city management applications, and particularly relates to an illegal mobile vendor identification method based on deep learning target detection.
Background
A mobile vendor is a merchant who sells goods in a city in an itinerant fashion, without a fixed place of business. Most mobile vendors hold no operating license, and the quality of the goods they sell cannot be guaranteed. Moreover, mobile stalls often roast and fry food over open flames and generate a large amount of waste, which mars the appearance of the city and causes pollution. The goods sold are typically breakfast items, cooked food, fruit and other foodstuffs; if their sanitary conditions and quality are not assured, they can pose a health hazard.
Mobile vendors have therefore become one of the main targets of urban management enforcement. Because they are highly mobile and range widely, the relevant departments find them difficult to manage. With the rapid development of artificial intelligence, the related techniques can be used to identify mobile vendors and thereby achieve automatic snapshot evidence collection. An illegal mobile vendor identification system based on deep learning can automatically detect whether mobile vendors are present in a surveillance camera's view, saving manpower for the urban management department and improving city management efficiency.
Identifying illegal mobile vendors requires detecting pedestrians and booths in images and, from their relative positions and movement trajectories, inferring which pedestrians are mobile vendors before capturing evidence. This calls for a target detection method that locates and identifies objects of interest in an image. Current mainstream target detection methods are based on deep learning and include Faster R-CNN, YOLO and SSD.
A prior disclosure describes a deep-learning-based method and system for fast retrieval of vehicles in checkpoint images. It extracts vehicle feature information with a deep neural network, using an inception_resnet_v2 network so that network weights are shared and a large amount of repeated computation is avoided; its loss function is trained on triplet samples and directly produces 128-dimensional vectors; and in the retrieval stage it builds indexes over the features by feature clustering, which improves query speed. That method speeds up image feature extraction, responds quickly in real time, and effectively screens illegal vehicles such as those with cloned or fake plates.
Disclosure of Invention
The invention aims to provide an illegal mobile vendor identification method based on deep learning target detection, so as to automatically collect evidence against illegal mobile vendors within the road monitoring coverage, effectively improve the efficiency of the urban management department, and reduce labor cost.
An illegal mobile vendor identification method based on deep learning target detection comprises the following steps:
(1) acquiring a road monitoring video, and cutting the road monitoring video into frame images;
(2) detecting the positions of the booth and the pedestrians from the frame image by using the target detection model;
(3) filtering the moving booth in the image according to the position of the booth, and keeping a fixed booth;
(4) based on the positions and the number of the fixed booths, clustering the pedestrians by using a K-means clustering method to obtain the pedestrians corresponding to each fixed booth;
(5) distinguishing whether pedestrians or booths in different frame images are the same pedestrian or booths by utilizing a pedestrian recognition model and a booth recognition model;
(6) judging whether the pedestrians classified to the same fixed booth are vendors or not;
the target detection model is obtained by training a learning network consisting of an increment Resnet v2 network and a Faster R-CNN network; the pedestrian identification model and the vendor identification model are obtained by network training of the inclusion Resnet v 2.
The learning network corresponding to the target detection model comprises:
the Inception ResNet v2 network is used for extracting features from the input frame image and outputs a feature map to the RPN network and the RoI pooling layer;
the RPN network receives the feature map output by the Inception ResNet v2 network, extracts rectangular candidate regions that may contain targets, and outputs them to the RoI pooling layer;
the RoI pooling layer receives the feature map output by the Inception ResNet v2 network and the rectangular candidate regions output by the RPN network, maps the rectangular candidate regions onto the feature map, and outputs the result to the fully connected layer;
the fully connected layer receives the feature map output by the RoI pooling layer, outputs the category to which the object in each rectangular candidate region belongs together with its classification confidence, adjusts the boundary of the object in the rectangular candidate region, and outputs its coordinate information.
Pedestrians and booths in the images are labeled with their respective class labels to form the training samples used to train the target detection model.
The Inception ResNet v2 network corresponding to the pedestrian recognition model and the booth recognition model comprises:
the first layer is a Reshape function layer;
the second and third layers are 3 × 3 convolution layers;
the fourth layer is a max pooling layer;
the fifth and sixth layers are 3 × 3 convolution layers;
the seventh layer is a max pooling layer;
the eighth to thirteenth layers alternate Reduction network modules and Inception network modules;
the fourteenth layer is a 3 × 3 convolution layer;
the fifteenth layer is an average pooling layer;
the sixteenth layer is an output layer;
the seventeenth layer is a 1 × 1024 fully connected layer, which outputs a feature vector of dimension 1 × 1024;
the eighteenth layer is a 1 × N fully connected layer, which classifies the object represented by the 1 × 1024-dimensional vector and outputs the object class and classification confidence, where N is the number of classes.
The eighth to thirteenth layers of the Inception ResNet v2 network are, in order: a Reduction-A module, 5 Inception-A modules in series, a Reduction-B module, 10 Inception-B modules in series, a Reduction-C module, and 5 Inception-C modules in series.
The Reduction-A module consists of four parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the fourth is a 1 × 1 convolution layer followed by an average pooling layer; the four branches are output in parallel. The Reduction-B module consists of three parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is an average pooling layer; the three branches are joined by a Concat layer and output after splicing. The Reduction-C module consists of four parallel branches: the first is two 1 × 1 convolution layers; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the fourth is an average pooling layer; the four branches are joined by a Concat layer and output after splicing.
The Inception-A module consists of three parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the three branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual (shortcut) connection. The Inception-B module consists of two parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the two branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual connection. The Inception-C module consists of two parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the two branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual connection.
Pedestrian images are extracted and all images of the same pedestrian are given the same class label, with different pedestrians receiving different labels; these samples train the pedestrian recognition model. Likewise, images of mobile booths are extracted and all images of the same booth are given the same class label, with different booths receiving different labels; these samples train the booth recognition model.
In step (3), the method for keeping the fixed booth is as follows:
storing the position and feature vector of each detected booth in a database, together with a counting variable COUNT; each time a booth is detected, comparing its feature vector with the stored targets; if the same target is stored in the database and its coordinate change is smaller than a preset value, increasing the count (COUNT = COUNT + n1) and updating the corresponding target's information in the database; if the target is not stored in the database, storing it in the database; if a target in the database does not appear in a frame, decreasing the count (COUNT = COUNT - n2); giving a highest threshold COUNT_MAX and a lowest threshold COUNT_MIN; if COUNT is greater than COUNT_MAX, setting COUNT to the highest value COUNT_MAX; if COUNT is less than COUNT_MIN, deleting the current target.
The preset value for the coordinate change is adjusted according to the actual situation.
If COUNT is greater than COUNT_MAX, COUNT is set to the highest value COUNT_MAX; this prevents data overflow from an excessively large count value while ensuring that data in the database is not deleted excessively.
The method for obtaining the pedestrians corresponding to each fixed booth in step (4) is as follows: given the number n of fixed booths, the center points of the n fixed booths are taken as the initial sample points; the pedestrians are then classified by the K-means clustering method according to the distance between each pedestrian's center position and the centroid of each cluster, finally yielding n clusters corresponding to the n fixed booths.
In step (5), the method for distinguishing different pedestrians and booths with the pedestrian recognition model and the booth recognition model, i.e. for judging whether pedestrians or booths in different frame images are the same pedestrian or booth, is as follows: the pedestrian recognition model extracts features from the pedestrian image to obtain the pedestrian's feature vector; the booth recognition model extracts features from the booth image to obtain the booth's feature vector; the newly extracted feature vectors are then compared with the stored feature vectors of pedestrians and booths.
The feature distance D under the Euclidean metric is calculated from the feature vectors, and a threshold T is given: if D is greater than T, the pedestrians or booths in the different frame images are not the same booth or pedestrian; if D is less than or equal to T, they are the same booth or pedestrian.
The feature distance under the Euclidean metric is

$$D = \sqrt{\sum_{i=1}^{n}(a_i - b_i)^2}$$

where D denotes the Euclidean distance, n = 1024 is the feature vector dimension, and a_i and b_i denote the value of the i-th dimension of feature vectors a and b, which represent pedestrians or booths in different frame images.
In step (6), the method for determining whether the pedestrians classified to the same fixed booth are vendors is as follows: a database is established for pedestrians, storing each pedestrian's feature information, historical classification information, and a counting variable COUNT. The historical classification information is the record of the clusters to which a pedestrian has been assigned by the K-means clustering method over the processed frames. Each time a pedestrian is detected, it is compared with the pedestrians in the database: if the same pedestrian is found, the count is increased (COUNT = COUNT + n1) and the current classification is appended to the historical classification information; if the same pedestrian is not found, the pedestrian's information is added to the database. If a pedestrian does not appear in the current frame, that pedestrian's count is decreased (COUNT = COUNT - n2). Given a count threshold parameter C_THRESHOLD and a percentage threshold parameter P_THRESHOLD, a pedestrian is determined to be a mobile vendor if the pedestrian's historical classification information is sufficiently long, i.e. greater than C_THRESHOLD, and the percentage of records assigning the pedestrian to one particular cluster is greater than P_THRESHOLD. A highest threshold COUNT_MAX and a lowest threshold COUNT_MIN are also given: if COUNT is greater than COUNT_MAX, COUNT is set to the highest value COUNT_MAX; if COUNT is less than COUNT_MIN, the corresponding pedestrian is deleted from the database.
The invention adopts Faster R-CNN (Faster Region-based Convolutional Neural Network), a mainstream deep learning framework for target detection whose recognition accuracy is higher than that of other methods. Analyzing the positions of pedestrians and booths requires a clustering algorithm; K-means is a simple and effective unsupervised clustering algorithm that randomly selects initial sample points and divides samples into categories according to their distances in feature space.
The method provided by the invention obtains the positions of pedestrians and booths from road monitoring video, analyzes the target features, screens and filters the data to obtain the positions and number of fixed booths, and identifies vendors among the pedestrians by a K-means-clustering-based method, thereby collecting evidence automatically.
The practical benefits of the invention are chiefly these: by combining deep learning techniques, the system automatically collects evidence against illegal mobile vendors using the existing urban road video surveillance network, effectively improving the efficiency of urban management departments and reducing labor cost.
Drawings
FIG. 1 is a flow chart of the illegal mobile vendor identification method provided by the invention;
FIG. 2 is the structure of the Inception ResNet v2 network provided by the invention;
FIG. 3 shows the Reduction network modules in the Inception ResNet v2 network;
FIG. 4 shows the Inception network modules in the Inception ResNet v2 network;
FIG. 5 shows the Inception-C network module in the Inception ResNet v2 network;
FIG. 6 is the network structure of the target detection model provided by the invention.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
As shown in FIG. 1, the method for identifying illegal mobile vendors based on deep learning target detection includes the following steps:
(1) A road monitoring video is acquired and cut into frame images.
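For illustration, the frame cutting in step (1) can be done with OpenCV. The following is a minimal sketch; the video path and the sampling interval every_n are assumptions for the example, since the invention does not fix a sampling rate:

```python
import cv2

def cut_video_to_frames(video_path, every_n=25):
    """Yield every n-th frame of a road monitoring video.

    video_path and every_n are illustrative; the patent does not
    prescribe a container format or a sampling rate.
    """
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            yield idx, frame
        idx += 1
    cap.release()

# Usage sketch: iterate the frames and pass each one to the target
# detection model of step (2).
# for idx, frame in cut_video_to_frames("road_camera.mp4"):
#     ...
```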
(2) The positions of booths and pedestrians are detected from the frame images using the target detection model.
The target detection model is obtained by training a learning network consisting of an Inception ResNet v2 network and a Faster R-CNN network; the pedestrian recognition model and the booth recognition model are obtained by training Inception ResNet v2 networks.
As shown in fig. 6, the learning network corresponding to the target detection model includes:
the Inception ResNet v2 network is used for extracting features from the input frame image and outputs a feature map to the RPN network and the RoI pooling layer;
the RPN network receives the feature map output by the Inception ResNet v2 network, extracts rectangular candidate regions that may contain targets, and outputs them to the RoI pooling layer;
the RoI pooling layer receives the feature map output by the Inception ResNet v2 network and the rectangular candidate regions output by the RPN network, maps the rectangular candidate regions onto the feature map, and outputs the result to the fully connected layer;
the fully connected layer receives the feature map output by the RoI pooling layer, outputs the category to which the object in each rectangular candidate region belongs together with its classification confidence, adjusts the boundary of the object in the rectangular candidate region, and outputs its coordinate information.
Pedestrians and booths in the images are labeled with their respective class labels to form the training samples used to train the target detection model.
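As a rough sketch of how such a detector is driven in code, the snippet below uses torchvision's stock Faster R-CNN (with a ResNet-50 FPN backbone) as a stand-in, since torchvision does not ship an Inception ResNet v2 backbone; the three-class label set (background, pedestrian, booth) and the score threshold are assumptions, and the model would first have to be trained on the labeled samples described above:

```python
import torch
import torchvision

# Stand-in detector: torchvision's Faster R-CNN with a ResNet-50 FPN
# backbone (the patent pairs Faster R-CNN with Inception ResNet v2;
# this substitution is only for illustration).
NUM_CLASSES = 3  # background + pedestrian + booth (assumed label set)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=NUM_CLASSES)
model.eval()  # weights are untrained here; training on labeled frames comes first

def detect(frame_tensor, score_thresh=0.5):
    """Run detection on one frame tensor (C x H x W, float values in [0, 1]).

    Returns the boxes, class labels and confidences above score_thresh.
    """
    with torch.no_grad():
        out = model([frame_tensor])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]
```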
As shown in FIG. 2, the Inception ResNet v2 network corresponding to the pedestrian recognition model and the booth recognition model comprises:
the first layer is a Reshape function layer;
the second and third layers are 3 × 3 convolution layers;
the fourth layer is a max pooling layer;
the fifth and sixth layers are 3 × 3 convolution layers;
the seventh layer is a max pooling layer;
the eighth to thirteenth layers alternate Reduction network modules and Inception network modules;
the fourteenth layer is a 3 × 3 convolution layer;
the fifteenth layer is an average pooling layer;
the sixteenth layer is an output layer;
the seventeenth layer is a 1 × 1024 fully connected layer, which outputs a feature vector of dimension 1 × 1024;
the eighteenth layer is a 1 × N fully connected layer, which classifies the object represented by the 1 × 1024-dimensional vector and outputs the object class and classification confidence, where N is the number of classes.
The eighth to thirteenth layers of the Inception ResNet v2 network are, in order: a Reduction-A module, 5 Inception-A modules in series, a Reduction-B module, 10 Inception-B modules in series, a Reduction-C module, and 5 Inception-C modules in series.
As shown in FIG. 3, the Reduction-A module consists of four parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the fourth is a 1 × 1 convolution layer followed by an average pooling layer; the four branches are output in parallel. The Reduction-B module consists of three parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is an average pooling layer; the three branches are joined by a Concat layer and output after splicing. The Reduction-C module consists of four parallel branches: the first is two 1 × 1 convolution layers; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the fourth is an average pooling layer; the four branches are joined by a Concat layer and output after splicing.
As shown in FIGS. 4 and 5, the Inception-A module consists of three parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the third is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the three branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual (shortcut) connection. The Inception-B module consists of two parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the two branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual connection. The Inception-C module consists of two parallel branches: the first is a 1 × 1 convolution layer; the second is a 1 × 1 convolution layer followed by a 3 × 3 convolution layer; the two branches are joined by a Concat layer and, after a 3 × 3 convolution layer, form the output together with the residual connection.
The Inception ResNet v2 network also incorporates the ResNet structure: residual (shortcut) connections pass the input directly to the output, bypassing the intermediate modules, which counteracts the accuracy degradation that can occur as network depth increases.
Pedestrian images are extracted and all images of the same pedestrian are given the same class label, with different pedestrians receiving different labels; these samples train the pedestrian recognition model. Likewise, images of mobile booths are extracted and all images of the same booth are given the same class label, with different booths receiving different labels; these samples train the booth recognition model.
(3) Moving booths are filtered out of the image according to the booth positions, and the fixed booths are kept.
During the analysis of mobile booths, the target detection network cannot guarantee that every pedestrian and booth is detected in every frame: a pedestrian or booth detected in one frame may be missed in the next, which complicates the analysis. Booths that are in motion therefore need to be removed.
Specifically, the position and feature vector of each detected booth are stored in a database, together with a counting variable COUNT. Each time a booth is detected, its feature vector is compared with the stored targets. If the same target is stored in the database and its coordinate change is smaller than a preset value, the count is increased (COUNT = COUNT + n1) and the corresponding target's information in the database is updated; if the target is not stored in the database, it is added to the database. If a target in the database does not appear in a frame, its count is decreased (COUNT = COUNT - n2). A highest threshold COUNT_MAX and a lowest threshold COUNT_MIN are given: if COUNT is greater than COUNT_MAX, COUNT is set to the highest value COUNT_MAX; if COUNT is less than COUNT_MIN, the current target is deleted.
The preset value for the coordinate change is adjusted according to the actual situation.
If COUNT is greater than COUNT_MAX, COUNT is set to the highest value COUNT_MAX; this prevents data overflow from an excessively large count value while ensuring that data in the database is not deleted excessively.
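A minimal sketch of this bookkeeping follows; the increments n1 and n2, the thresholds, and the matching tolerances are placeholders to be tuned, not values fixed by the invention:

```python
import numpy as np

N1, N2 = 2, 1                   # count increment / decrement per frame (assumed)
COUNT_MAX, COUNT_MIN = 50, 0    # highest / lowest thresholds (assumed)
COORD_EPS = 20.0                # preset max coordinate change, in pixels (tune per scene)
FEAT_T = 0.8                    # feature-distance threshold for "same target" (assumed)

booths = []  # each entry: {"pos": np.ndarray, "feat": np.ndarray, "count": int}

def update_booths(detections):
    """detections: list of (pos, feat) pairs for the booths in the current frame."""
    seen = set()
    for pos, feat in detections:
        match = None
        for i, b in enumerate(booths):
            if (np.linalg.norm(feat - b["feat"]) <= FEAT_T
                    and np.linalg.norm(pos - b["pos"]) < COORD_EPS):
                match = i
                break
        if match is None:                       # target not in database: store it
            booths.append({"pos": pos, "feat": feat, "count": N1})
            seen.add(len(booths) - 1)
        else:                                   # known target: bump and cap COUNT
            b = booths[match]
            b["count"] = min(b["count"] + N1, COUNT_MAX)
            b["pos"], b["feat"] = pos, feat
            seen.add(match)
    for i, b in enumerate(booths):              # decay targets missing this frame
        if i not in seen:
            b["count"] -= N2
    booths[:] = [b for b in booths if b["count"] >= COUNT_MIN]  # drop below COUNT_MIN

# Booths whose COUNT stays high over many frames are treated as fixed booths.
```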
(4) Based on the positions and number of the fixed booths, the pedestrians are clustered by the K-means method to obtain the pedestrians corresponding to each fixed booth.
Specifically, given the number n of fixed booths, the center points of the n fixed booths are taken as the initial sample points; the pedestrians are then classified by the K-means clustering method according to the distance between each pedestrian's center position and the centroid of each cluster, finally yielding n clusters corresponding to the n fixed booths.
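A sketch of this seeded clustering with scikit-learn is shown below; n_init=1 keeps the booth centers as the sole initialization, as the description requires:

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_pedestrians_to_booths(booth_centers, pedestrian_centers):
    """Cluster pedestrian center points with K-means, seeding the initial
    cluster centers at the n fixed booth centers.

    booth_centers: (n, 2) array; pedestrian_centers: (m, 2) array of
    image coordinates (m >= n is assumed).
    """
    init = np.asarray(booth_centers, dtype=float)
    km = KMeans(n_clusters=len(init), init=init, n_init=1)
    labels = km.fit_predict(np.asarray(pedestrian_centers, dtype=float))
    return labels  # labels[i] is the booth-cluster index for pedestrian i
```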
(5) The pedestrian recognition model and the booth recognition model are used to judge whether pedestrians or booths in different frame images are the same pedestrian or booth.
Taking pedestrians as an example: the target detection model detects and locates the pedestrians present in each frame, but cannot judge whether two detections in consecutive frames are the same person. Therefore, each time a frame is processed, the pedestrian positions are obtained with the target detection model and the features of each pedestrian image are extracted with the pedestrian recognition model, yielding a feature vector for each pedestrian.
Specifically, whether two objects are the same pedestrian or booth is judged from the distance in feature space between the feature vectors produced for them by the pedestrian recognition model and the booth recognition model respectively.
Specifically, the pedestrian recognition model extracts features from the pedestrian image to obtain the pedestrian's feature vector, and the booth recognition model extracts features from the booth image to obtain the booth's feature vector; the newly extracted feature vectors are then compared with the stored feature vectors of pedestrians and booths.
The feature distance D under the Euclidean metric is calculated from the feature vectors, and a threshold T is given: if D is greater than T, the pedestrians or booths in the different frame images are not the same booth or pedestrian; if D is less than or equal to T, they are the same booth or pedestrian.
The feature distance under the Euclidean metric is

$$D = \sqrt{\sum_{i=1}^{n}(a_i - b_i)^2}$$

where D denotes the Euclidean distance, n = 1024 is the feature vector dimension, and a_i and b_i denote the value of the i-th dimension of feature vectors a and b, which represent pedestrians or booths in different frame images.
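In code, the comparison reduces to a few lines; the threshold value below is an assumption to be tuned on validation data:

```python
import numpy as np

T = 0.9  # distance threshold; an illustrative value, not fixed by the invention

def same_target(feat_a, feat_b, threshold=T):
    """Return True if two 1024-dimensional feature vectors (from the
    pedestrian or booth recognition model) describe the same target."""
    d = np.sqrt(np.sum((np.asarray(feat_a) - np.asarray(feat_b)) ** 2))
    return d <= threshold
```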
(6) Whether the pedestrians classified to the same fixed booth are vendors is judged.
Specifically, a database is established for pedestrians, storing each pedestrian's feature information, historical classification information, and a counting variable COUNT. The historical classification information is the record of the clusters to which a pedestrian has been assigned by the K-means clustering method over the processed frames. Each time a pedestrian is detected, it is compared with the pedestrians in the database: if the same pedestrian is found, the count is increased (COUNT = COUNT + n1) and the current classification is appended to the historical classification information; if the same pedestrian is not found, the pedestrian's information is added to the database. If a pedestrian does not appear in the current frame, that pedestrian's count is decreased (COUNT = COUNT - n2). Given a count threshold parameter C_THRESHOLD and a percentage threshold parameter P_THRESHOLD, a pedestrian is determined to be a mobile vendor if the pedestrian's historical classification information is sufficiently long, i.e. greater than C_THRESHOLD, and the percentage of records assigning the pedestrian to one particular cluster is greater than P_THRESHOLD. A highest threshold COUNT_MAX and a lowest threshold COUNT_MIN are also given: if COUNT is greater than COUNT_MAX, COUNT is set to the highest value COUNT_MAX; if COUNT is less than COUNT_MIN, the corresponding pedestrian is deleted from the database.
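The final decision rule can be sketched as follows; the two threshold values are placeholders, as the patent leaves them to be configured:

```python
from collections import Counter

C_THRESHOLD = 30    # minimum amount of classification history (assumed value)
P_THRESHOLD = 0.8   # minimum fraction assigned to one booth cluster (assumed value)

def is_mobile_vendor(history):
    """history: the booth-cluster labels assigned to one pedestrian over the
    processed frames (the 'historical classification information').

    The pedestrian is flagged as a mobile vendor once enough history has
    accumulated and a single booth cluster dominates it.
    """
    if len(history) <= C_THRESHOLD:
        return False
    _, freq = Counter(history).most_common(1)[0]
    return freq / len(history) > P_THRESHOLD
```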

Claims (6)

1. An illegal mobile vendor identification method based on deep learning target detection comprises the following steps:
(1) acquiring a road monitoring video, and cutting the road monitoring video into frame images;
(2) detecting the positions of the booth and the pedestrians from the frame image by using the target detection model;
(3) filtering the moving booth in the image according to the position of the booth, and keeping a fixed booth;
(4) based on the positions and the number of the fixed booths, clustering the pedestrians by using a K-means clustering method to obtain the pedestrians corresponding to each fixed booth;
(5) distinguishing whether pedestrians or booths in different frame images are the same pedestrian or booths by utilizing a pedestrian recognition model and a booth recognition model;
(6) judging whether the pedestrians classified to the same fixed booth are vendors or not;
the target detection model is obtained by training a learning network consisting of an Inception ResNet v2 network and a Faster R-CNN network; the pedestrian recognition model and the booth recognition model are obtained by training Inception ResNet v2 networks;
the method for keeping the fixed booths in step (3) comprises: storing the position and feature vector of each detected booth in a database, together with a counting variable COUNT; each time a booth is detected, comparing its feature vector with the stored targets; if the same target is stored in the database and its coordinate change is smaller than a preset value, increasing the count (COUNT = COUNT + n1) and updating the corresponding target's information in the database; if the target is not stored in the database, storing it in the database; if a target in the database does not appear in a frame, decreasing the count (COUNT = COUNT - n2).
2. The method of claim 1, wherein the step (3) of keeping the fixed booths further comprises: giving a highest threshold COUNT_MAX and a lowest threshold COUNT_MIN; if COUNT is greater than COUNT_MAX, setting COUNT to the highest value COUNT_MAX; if COUNT is less than COUNT_MIN, deleting the current target.
3. The method for identifying illegal mobile vendors based on deep learning target detection of claim 1, wherein the step (4) of obtaining the pedestrians corresponding to each fixed booth comprises: according to the number n of fixed booths, taking the center points of the n fixed booths as the initial sample points; classifying the pedestrians by the K-means clustering method according to the distance between each pedestrian's center position and the centroid of each cluster, finally yielding n clusters corresponding to the n fixed booths.
4. The method for identifying illegal mobile vendors based on deep learning target detection of claim 1, wherein the step (5) of distinguishing whether pedestrians or booths in different frame images are the same pedestrian or booth comprises:
extracting features from the pedestrian image with the pedestrian recognition model to obtain the pedestrian's feature vector; extracting features from the booth image with the booth recognition model to obtain the booth's feature vector; and comparing the newly extracted feature vectors with the stored feature vectors of pedestrians and booths;
calculating the feature distance D under the Euclidean metric from the feature vectors; giving a threshold T: if D is greater than T, the pedestrians or booths in the different frame images are not the same booth or pedestrian; if D is less than or equal to T, they are the same booth or pedestrian.
5. The method for identifying illegal mobile vendors based on deep learning target detection of claim 1, wherein the step (6) of determining whether the pedestrians classified to the same fixed booth are vendors comprises:
establishing a database for pedestrians that stores each pedestrian's feature information, historical classification information, and a counting variable COUNT, the historical classification information being the record of the clusters to which a pedestrian has been assigned by the K-means clustering method over the processed frames; each time a pedestrian is detected, comparing it with the pedestrians in the database; if the same pedestrian is found, increasing the count (COUNT = COUNT + n1) and appending the current classification to the historical classification information; if the same pedestrian is not found, adding the pedestrian's information to the database; if a pedestrian does not appear in the current frame, decreasing that pedestrian's count (COUNT = COUNT - n2);
giving a count threshold parameter C_THRESHOLD and a percentage threshold parameter P_THRESHOLD: a pedestrian is determined to be a mobile vendor if the pedestrian's historical classification information is sufficiently long, i.e. greater than C_THRESHOLD, and the percentage of records assigning the pedestrian to one particular cluster is greater than P_THRESHOLD.
6. The method for identifying illegal mobile vendors based on deep learning target detection of claim 1, wherein the step (6) of determining whether the pedestrians classified to the same fixed booth are vendors further comprises: giving a highest threshold COUNT_MAX and a lowest threshold COUNT_MIN; if COUNT is greater than COUNT_MAX, setting COUNT to the highest value COUNT_MAX; and if COUNT is less than COUNT_MIN, deleting the corresponding pedestrian from the database.
CN201810688380.0A 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection Active CN108921083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810688380.0A CN108921083B (en) 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810688380.0A CN108921083B (en) 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection

Publications (2)

Publication Number Publication Date
CN108921083A CN108921083A (en) 2018-11-30
CN108921083B (en) 2021-07-27

Family

ID=64422018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810688380.0A Active CN108921083B (en) 2018-06-28 2018-06-28 Illegal mobile vendor identification method based on deep learning target detection

Country Status (1)

Country Link
CN (1) CN108921083B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345435A (en) * 2018-12-07 2019-02-15 山东晴天环保科技有限公司 Occupy-street-exploit managing device and method
CN109726717B (en) * 2019-01-02 2022-03-01 西南石油大学 Vehicle comprehensive information detection system
CN109977782B (en) * 2019-02-27 2021-01-08 浙江工业大学 Cross-store operation behavior detection method based on target position information reasoning
CN110276254A (en) * 2019-05-17 2019-09-24 恒锋信息科技股份有限公司 No peddler region street pedlar's automatic identification method for early warning based on unmanned plane
CN110287207A (en) * 2019-06-30 2019-09-27 北京健康有益科技有限公司 A kind of quality of food estimating and measuring method based on density meter
CN110458082B (en) * 2019-08-05 2022-05-03 城云科技(中国)有限公司 Urban management case classification and identification method
CN110992645A (en) * 2019-12-06 2020-04-10 江西洪都航空工业集团有限责任公司 Mobile vendor detection and alarm system in dynamic scene
CN111553321A (en) * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
CN114255409A (en) * 2020-09-23 2022-03-29 中兴通讯股份有限公司 Man-vehicle information association method, device, equipment and storage medium
CN112464015A (en) * 2020-12-17 2021-03-09 郑州信大先进技术研究院 Image electronic evidence screening method based on deep learning
CN113163334A (en) * 2021-02-19 2021-07-23 合肥海赛信息科技有限公司 Intelligent mobile vendor detection method based on video analysis
CN112949510A (en) * 2021-03-08 2021-06-11 香港理工大学深圳研究院 Human detection method based on fast R-CNN thermal infrared image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034212A (en) * 2010-06-21 2011-04-27 艾浩军 City management system based on video analysis
CN104166841B (en) * 2014-07-24 2017-06-23 浙江大学 The quick detection recognition methods of pedestrian or vehicle is specified in a kind of video surveillance network
CN106845325B (en) * 2015-12-04 2019-10-22 杭州海康威视数字技术股份有限公司 A kind of information detecting method and device
CN107679078B (en) * 2017-08-29 2020-01-10 银江股份有限公司 Bayonet image vehicle rapid retrieval method and system based on deep learning

Also Published As

Publication number Publication date
CN108921083A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108921083B (en) Illegal mobile vendor identification method based on deep learning target detection
CN108830188B (en) Vehicle detection method based on deep learning
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN104700099B (en) The method and apparatus for recognizing traffic sign
CN103034836B (en) Road sign detection method and road sign checkout equipment
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN110866430B (en) License plate recognition method and device
KR101697161B1 (en) Device and method for tracking pedestrian in thermal image using an online random fern learning
CN105574550A (en) Vehicle identification method and device
CN106295532B (en) A kind of human motion recognition method in video image
CN101980245B (en) Adaptive template matching-based passenger flow statistical method
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
CN110298297A (en) Flame identification method and device
CN104537647A (en) Target detection method and device
CN111274886B (en) Deep learning-based pedestrian red light running illegal behavior analysis method and system
CN104615986A (en) Method for utilizing multiple detectors to conduct pedestrian detection on video images of scene change
Yang et al. Improved lane detection with multilevel features in branch convolutional neural networks
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN103679187A (en) Image identifying method and system
CN111738300A (en) Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN112084890A (en) Multi-scale traffic signal sign identification method based on GMM and CQFL
CN104915642A (en) Method and apparatus for measurement of distance to vehicle ahead
CN109543498B (en) Lane line detection method based on multitask network
CN108073940A (en) A kind of method of 3D object instance object detections in unstructured moving grids
CN101216886A (en) A shot clustering method based on spectral segmentation theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant