CN115272989A - Method, system and storage medium for capturing cross-lane black smoke vehicle - Google Patents

Method, system and storage medium for capturing cross-lane black smoke vehicle

Info

Publication number
CN115272989A
CN115272989A CN202210869316.9A
Authority
CN
China
Prior art keywords
vehicle
black smoke
lane
cross
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210869316.9A
Other languages
Chinese (zh)
Inventor
Kang Yu (康宇)
Chen Jiayi (陈佳艺)
Liu Wenqing (刘文清)
Cao Yang (曹洋)
Zhang Yujun (张玉钧)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Hefei Institutes of Physical Science of CAS
Original Assignee
University of Science and Technology of China USTC
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, Hefei Institutes of Physical Science of CAS filed Critical University of Science and Technology of China USTC
Priority to CN202210869316.9A priority Critical patent/CN115272989A/en
Publication of CN115272989A publication Critical patent/CN115272989A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a cross-lane black smoke vehicle snapshot system, belonging to the field of environmental monitoring. In video-based cross-lane vehicle identification, a complete image sequence is treated as a point set, and the similarity of two image sequences is measured with the Hausdorff distance. Because only the distance between the two sets is considered, metric pairs within the same image sequence are automatically discarded, eliminating the influence of outliers under the same label. Meanwhile, the relaxed Hausdorff distance metric, by relaxing the maximum constraint of the one-way distance, automatically selects the k-th best matching point of the two point sets; this reduces the influence of outliers under the same label and avoids the metric pairs that label noise would cause at the maximum distance, preventing outliers and label noise from adversely affecting the optimization of the network.

Description

Cross-lane black smoke vehicle snapshot method, system and storage medium
Technical Field
The invention relates to the field of environmental monitoring, in particular to a cross-lane black smoke vehicle snapshot system.
Background
Diesel trucks on suburban roads often discharge large amounts of black smoke, causing serious environmental pollution. Traditional smoke detection algorithms rely mainly on the visual blur, semi-transparency and diffusive motion of smoke: they use video smoke detection methods based on color and motion features, such as optical flow, or detect with the help of the frequency-domain variation characteristics of smoke. However, such algorithms are mainly applied to relatively static scenes such as forest fires or chimneys, whereas diesel vehicles discharging black smoke usually travel on suburban roads, where the fast-moving background is complex and changeable, so traditional smoke detection methods are not applicable. In recent years, with the development of artificial intelligence and deep learning, it has become markedly more efficient to acquire video information through computer vision and identify smoke in an image with a deep-learning object detection algorithm. It is therefore desirable to provide an end-to-end black smoke detection network for capturing black smoke vehicles at a traffic crossing efficiently, accurately and in real time.
Meanwhile, during the capture of black smoke diesel vehicles, existing algorithms often cannot determine the license plate of the captured vehicle, because the black smoke discharged from the tail of the diesel vehicle obscures the vehicle information behind it, or because vehicles occlude one another. It is therefore desirable to construct a cross-lane vehicle identification network that uses cross-camera diesel vehicle videos and a re-identification algorithm to obtain the common features of vehicles seen from different viewing angles, so that license plate information can be acquired from other, unobstructed viewing angles.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a cross-lane black smoke vehicle snapshot method, a cross-lane black smoke vehicle snapshot system and a storage medium.
The purpose of the invention can be realized by the following technical scheme:
a cross-lane black smoke vehicle snapshot method comprises the following steps:
step 1: extracting picture data in a data set captured by a multi-lane camera to obtain a black smoke detection data set and a vehicle data set;
step 2: performing feature extraction and fusion on picture data in the black smoke detection data set to construct a black smoke detection model;
step 3: constructing a ResNet-50 network model and extracting common features of the picture sequences in the vehicle data set; defining a set-based Hausdorff distance learning loss as the triplet loss, optimizing the ResNet-50 network model through the triplet loss and an ID loss, and training to obtain a vehicle identification model;
step 4: selecting a to-be-detected picture from the test-set part of the black smoke vehicle data set, inputting it into the black smoke detection model, and judging whether a black smoke vehicle is present; if so, the cropped black smoke diesel vehicle region is used as a query-set picture and is input into the vehicle identification model together with the gallery-set pictures in the cross-lane vehicle data set;
if the cross-lane vehicle gallery set contains the vehicle in the query set, the pictures of that vehicle cropped from the gallery set are returned, and the license plate information of the black-smoke-emitting diesel vehicle is acquired from the pictures whose viewing angle is not obscured by black smoke.
Optionally, a bounding box is used to select the vehicle and the black smoke region it discharges, a black smoke region data set is extracted, and the black smoke region data set is divided for training, validation and testing.
Optionally, videos containing the same vehicle belong to the same vehicle ID; key-frame pictures are captured to form a picture sequence with associated spatiotemporal information; the bounding boxes of each track are labeled, each track's bounding box containing only one vehicle ID; and image sequences containing different bounding boxes are obtained to form the cross-lane vehicle data set.
Optionally, the black smoke detection model includes:
a backbone network;
a feature pyramid module combining pyramid pooling and a path aggregation network; and
a detector, which performs detection on the feature levels extracted by the feature pyramid to obtain prediction boxes, applies non-maximum suppression to the probability that each prediction box contains black smoke, and takes the highest-scoring prediction box as the position of the detection target.
Optionally, in step 3, one image sequence is regarded as a point set, and the similarity between two image sequences is measured by the Hausdorff distance.
Optionally, in step 3, a batch containing P·K images is formed by randomly selecting P vehicle id classes and randomly selecting K images for each class; the triplet loss function is defined as

L_PhD = (1/P) Σ_{i=1}^{P} [ m + max_p d_rH(S_a^i, S_p^i) - min_{j≠i, n} d_rH(S_a^i, S_n^j) ]_+

where d_rH represents the relaxed Hausdorff distance;
and the ID loss function and the center loss function are superposed to obtain the loss function of the vehicle identification model.
Optionally, the ID loss function:

L_ID = -(1/B) Σ_{j=1}^{B} log p(y_j | f_j)

where p(y_j | f_j) is the predicted softmax probability that image j belongs to its true id label y_j, and B is the batch size.
Optionally, the center loss function:

L_C = (1/2) Σ_{j=1}^{B} || f_j - c_{y_j} ||^2

where c_{y_j} is the class center of the deep features for label y_j.
a vehicle identification system comprising:
extracting a main network of the characteristic information of the vehicle picture data;
the BNNeck layer is configured with a set-based Ponber-Housdov distance learning loss as a triple loss function; and
the classifier is fully connected with the layer.
A computer-readable storage medium storing instructions that, when executed, implement any of the methods described above.
The invention has the beneficial effects that:
the snapshot method of the invention adopts an end-to-end black smoke detection network, ensures the real-time performance of the detection system, and is used for capturing the diesel vehicle which discharges black smoke in the monitoring video of the traffic crossing in real time; the cross-lane vehicle identification network based on the Ponber-Hausdorff distance identifies vehicles by extracting the common characteristics of the same vehicle under different camera view angles, uses Ponber-Hausdorff distance measurement learning to reduce the influence of abnormal values and label noise caused by mutual shielding between vehicles, judges the same vehicle crossing a camera scene, and gives the license plate information of the vehicle by other camera view angles which are not shielded by black smoke, thereby capturing diesel vehicles which illegally discharge the black smoke and the information thereof.
Drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a network structure of a black smoke detection model in an example of the present invention;
fig. 2 is a network structure of a cross-lane vehicle recognition model in an example of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In some examples of the present invention, a cross-lane black smoke vehicle snapshot method is disclosed, comprising the following steps S1 to S4:
s1: Traffic-crossing surveillance video from a road section with heavy traffic flow is processed: 4 non-overlapping cameras are deployed at different traffic crossings; the original videos shot by all cameras from 8 a.m. to 10 a.m., when traffic is congested, are obtained; key frames are extracted and bounding boxes are labeled on the video data; and a black smoke detection data set and a cross-lane vehicle data set are produced.
S2: A Darknet53 network based on a cross-stage partial network structure is constructed to extract the features of the picture data in the black smoke detection data set; features of different sizes are then fused through spatial pyramid pooling and a path aggregation network; the fused features are input into a one-stage object detector, black smoke detection is performed on the three extracted feature layers to obtain the position information of the prediction boxes, and a loss function is calculated from the center-point coordinates and the width and height of the prediction boxes and the real boxes and optimized to obtain the final black smoke detection model. The network structure of the black smoke detection model is shown in fig. 1.
S3: A ResNet-50 network model based on a deep residual network is constructed to extract the common features of the picture sequences in the cross-lane vehicle data set. To address the occlusion problem between vehicles, Hausdorff distance metric learning is introduced, and the set-based Hausdorff distance learning loss is defined as the triplet loss:

L_PhD = (1/P) Σ_{i=1}^{P} [ m + max_p d_rH(S_a^i, S_p^i) - min_{j≠i, n} d_rH(S_a^i, S_n^j) ]_+

where P represents the number of vehicle id classes, K represents the number of objects randomly selected from each class, S_a^i is an anchor sequence of the i-th class, S_p^i is a positive sample of the i-th class, S_n^j is a negative sample from a class j other than i, m is the margin, [x]_+ = max(x, 0), and d_rH represents the relaxed Hausdorff distance.
Triplet loss over the hard samples of each batch is adopted, reducing the influence of outliers and label noise caused by mutual occlusion between vehicles. After the feature layer, the extracted features are input into the BNNeck layer structure to balance the optimization of the feature vectors by the different loss functions. The features output from the BNNeck layer are input into the fully connected layer of the classifier for classification, and the ID loss is calculated. The network is optimized through the triplet loss and the ID loss to obtain the trained cross-lane vehicle identification model. The network structure of the cross-lane vehicle identification model is shown in fig. 2.
S4: A to-be-detected picture from the test-set part of the black smoke detection data set is selected and input into the black smoke detection model trained in step S2, and whether a black smoke vehicle is present is judged. If so, the cropped black smoke diesel vehicle region is used as a query-set picture and is input into the cross-lane vehicle identification model together with the gallery-set pictures in the cross-lane vehicle data set. If the cross-lane vehicle gallery set contains the query-set vehicle, the pictures of that vehicle cropped from the gallery set are returned, so that the license plate information of the black-smoke-emitting diesel vehicle is obtained from pictures at viewing angles not obscured by black smoke. This realizes the cross-lane black smoke vehicle snapshot system based on an end-to-end black smoke detection network and the Hausdorff distance.
The step S1 specifically includes the following steps S11 to S14:
s11: Traffic-crossing surveillance cameras deployed on a certain road section are used to obtain the original videos of that section from 8 a.m. to 10 a.m., when traffic is congested. 4 non-overlapping cameras deployed at different traffic crossings are involved; the viewing angles of the cameras must differ, and each vehicle passing through the road section must be captured by at least 2 cameras.
S12: From the videos acquired in S11, those that view the back of the vehicle and contain a diesel vehicle emitting black smoke are screened, and key-frame pictures containing the smoking diesel vehicle are captured at a frequency of 3 frames per second. The diesel vehicle and the black smoke region it emits are labeled in each picture with the LabelImg image annotation tool; the labeled region is called a bounding box, and the xml file obtained from labeling is in the PASCAL VOC data set format, containing the category information of the bounding box, the coordinates of its lower-left and upper-right corners, and so on. A black smoke detection data set is thus obtained, in which the data are divided into a training set, a validation set and a test set according to the ratio 6.
S13: The videos obtained in S11 are classified according to vehicle license plate information; videos containing the same vehicle belong to the same ID. Key-frame pictures are captured at an interval of 3 frames per second to form picture sequences with associated spatiotemporal information. The bounding box of each track is labeled manually with a computer-vision annotation tool, and each track's bounding box contains only one vehicle id. Image sequences of different vehicles containing different bounding boxes are thus obtained, forming the cross-lane vehicle identification data set, in which the training set and the test set are divided according to the proportion of 1. In the test set, one picture of each vehicle is selected as the query set, and the remaining pictures form the gallery set.
The training set is used to train the cross-lane vehicle identification model, and the query set and the gallery set are used to test it. After the model is trained on the training set, features are extracted for the images in the query set and the gallery set and their similarity is calculated, finding for each query the top N most similar images in the gallery set.
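As a concrete illustration of this query-against-gallery matching, the sketch below ranks gallery features by similarity to a query feature and returns the top-N indices. The feature vectors, function names, and the use of cosine similarity here are illustrative assumptions; the patent's own sequence-level matching uses the set-based Hausdorff distance described in the detailed description.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_n(query_feat, gallery_feats, n=3):
    """Return indices of the n gallery features most similar to the query."""
    ranked = sorted(range(len(gallery_feats)),
                    key=lambda i: cosine(query_feat, gallery_feats[i]),
                    reverse=True)
    return ranked[:n]
```

In a full pipeline the returned indices would map back to gallery pictures, from which the unobstructed license plate view is read.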
The step S2 specifically includes the following steps S21 to S23:
s21: a real-time and efficient black smoke vehicle detection model comprising a backbone network, a feature pyramid and a detector is constructed, and is shown in figure 1.
The backbone feature extraction network is a Darknet53 structure with a cross-stage partial network: the cross-stage partial network splits the stack of Darknet53 residual blocks into two parts; the main part continues to stack the original residual blocks, while the other part, like a residual connection, is passed to the end with little processing. This structure greatly improves the computation speed of the network to meet the requirement of real-time detection.
The feature pyramid part of the black smoke detection model combines a spatial pyramid pooling network and a path aggregation network to fuse the feature information of the different-sized feature maps output by the feature extraction network. After the last feature layer of the feature extraction network is convolved three times, the spatial pyramid structure applies four max-pooling operations of different scales, with kernel sizes of 13x13, 9x9, 5x5 and 1x1 respectively, which increases the receptive field and separates out the most salient contextual features. The path aggregation network is used for repeated feature extraction: after the spatial feature pyramid extracts features bottom-up, top-down feature extraction is realized, achieving the purpose of extracting the features repeatedly.
The detector part adopts a one-stage object detector to detect on the three feature layers extracted by the feature pyramid, obtaining the positions of three groups of prediction boxes; non-maximum suppression is then applied to the probability that each prediction box contains black smoke, and the highest-scoring prediction box is taken as the final position of the detection target.
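The suppression step just described can be sketched as plain non-maximum suppression over scored boxes. The (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the helper names are illustrative assumptions, not values taken from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop overlapping lower-scoring boxes, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

The surviving index with the highest score corresponds to the reported detection position.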
The loss function of the network is the average of the bounding-box regression loss, the confidence loss and the classification loss:

Loss = (L_DIOU + L_prob + L_conf) / 3

The bounding-box regression loss is:

L_DIOU = 1 - IOU(A, B) + ρ²(A_ctr, B_ctr) / c²

where IOU(A, B) is the intersection-over-union between the prediction box A and the real box B, c is the diagonal length of the minimum enclosing box of A and B, and ρ²(A_ctr, B_ctr) is the square of the Euclidean distance between the center point A_ctr of the prediction box and the center point B_ctr of the real box.
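A minimal numeric sketch of a DIoU-style regression loss (one minus IoU plus the normalized squared center distance) follows; the box format and function name are assumptions for illustration only.

```python
def diou_loss(pred, gt):
    """1 - IoU + rho^2(centers) / c^2, with boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # squared distance between the two box centers
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    # squared diagonal of the smallest box enclosing both boxes
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return 1.0 - iou + rho2 / c2
```

Identical boxes yield zero loss; disjoint boxes are penalized both through IoU and through the center-distance term.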
The confidence loss is:

L_conf = - Σ_{i=0}^{S²-1} Σ_{j=0}^{M-1} 1_{ij}^{obj} [ C_{ij} log Ĉ_{ij} + (1 - C_{ij}) log(1 - Ĉ_{ij}) ] - Σ_{i=0}^{S²-1} Σ_{j=0}^{M-1} 1_{ij}^{noobj} [ C_{ij} log Ĉ_{ij} + (1 - C_{ij}) log(1 - Ĉ_{ij}) ]

where the parameter S indicates that each picture contains S x S grids in total and each grid generates M candidate boxes; the parameter 1_{ij}^{obj} indicates whether the j-th candidate box of the i-th grid is responsible for predicting the target (1 if it is, 0 otherwise); the parameter 1_{ij}^{noobj} indicates that the j-th candidate box of the i-th grid is not responsible for the target; C_{ij} represents the true value of the confidence, which is 1 when the j-th bounding box of the i-th grid is responsible for predicting some object and 0 otherwise; and Ĉ_{ij} represents the predicted value of the confidence.
The classification loss is:

L_prob = - Σ_{i=0}^{S²-1} 1_{i}^{obj} Σ_{c∈classes} [ p_i(c) log p̂_i(c) + (1 - p_i(c)) log(1 - p̂_i(c)) ]

where p_i(c) is the true value of the class c to which the marked box belongs and p̂_i(c) is the predicted probability that the prediction box belongs to class c.
S22: The backbone feature extraction network, namely the Darknet53 structure with a cross-stage partial network, is pre-trained on the ImageNet data set to obtain a pre-training model, saving a large amount of training time.
S23: The training-set pictures of the black smoke detection data set are normalized in size and mapped to 608 x 608 pictures in the original image coordinate system. Using mosaic data enhancement, several pictures are randomly cropped and spliced into one picture as training data; the pictures and their bounding-box information are input into the backbone network for training, and an applicable black smoke detection model is finally obtained.
After the backbone network extracts the features, the feature pyramid performs feature fusion to obtain three feature layers. For each feature layer, the positions of the points where targets really exist and their corresponding classes are extracted, and the predicted outputs are processed to obtain prediction boxes. For each image, the IOU is computed between all real boxes and prediction boxes, and the DIoU loss is then calculated as the regression loss; the confidence loss and the class-prediction loss are calculated at the same time, yielding the overall loss function. The loss function is optimized to obtain the final black smoke detection model.
The step S3 specifically includes the following steps S31 to S33:
s31: A cross-lane vehicle identification network based on the Hausdorff distance is constructed.
The cross-lane vehicle identification network structure comprises a backbone network, a BNNeck layer and a classifier full-connection layer, as shown in FIG. 2.
And the main network part adopts a ResNet-50 model pre-trained on ImageNet and is used for extracting characteristics in the cross-lane vehicle information matching data set.
The BNNeck structure is a layer of neural network placed after the feature layer and before the fully connected classification layer. The invention uses different loss functions: the ID loss mainly optimizes cosine distances, while the triplet loss optimizes Euclidean distances, enhancing intra-class compactness and inter-class separability. When both losses are used to optimize the feature vectors, one loss may decrease while the other oscillates or even increases. To avoid this, after the backbone extracts the features, they are fed into the BNNeck structure. The feature before the batch-normalization layer is denoted f_t; after batch normalization, f_t yields the feature f_i. The triplet loss is calculated from f_t and the ID loss from f_i.
And inputting the features output from the BNNeck layer into a full connection layer of the classifier, classifying, and calculating the ID loss.
The Hausdorff distance mentioned above is defined as follows. Let S1 and S2 be two non-empty subsets of a Euclidean metric space (G, d_E). Their Hausdorff distance d_H(S1, S2) is the largest of all distances from a point in one set to the nearest point in the other set.
In some examples of the invention, for video-based cross-lane vehicle identification, a complete image sequence is treated as a point set, and the similarity of two image sequences is measured by the Hausdorff distance. Suppose

S1 = {x_1, x_2, ..., x_M} and S2 = {y_1, y_2, ..., y_N}

are two sets consisting of M points and N points, respectively. The bidirectional Hausdorff distance d_H(S1, S2) between the two point sets S1 and S2 can be defined as:

d_H(S1, S2) = max{ d_U(S1, S2), d_U(S2, S1) }

d_U(S1, S2) = max_{x∈S1} min_{y∈S2} d_E(x, y)

d_U(S2, S1) = max_{y∈S2} min_{x∈S1} d_E(x, y)

where d_U(S1, S2) and d_U(S2, S1) are the two one-way Hausdorff distances between the two sets, each being the maximum over the points of one set of the distance to the nearest neighbor in the other set; the bidirectional Hausdorff distance is the larger of the two one-way distances.
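The bidirectional Hausdorff distance can be sketched directly from this definition; the tuple point representation and function names below are illustrative.

```python
import math

def one_way(s1, s2):
    """d_U(S1, S2): max over points of S1 of the distance to the nearest point of S2."""
    return max(min(math.dist(x, y) for y in s2) for x in s1)

def hausdorff(s1, s2):
    """Bidirectional Hausdorff distance: the larger of the two one-way distances."""
    return max(one_way(s1, s2), one_way(s2, s1))
```

In the identification network the "points" would be deep feature vectors of frames, not 2-D coordinates; the metric is the same.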
Consider the two types of noise samples caused by occlusion: outliers and label noise. In existing point-to-point metric learning methods, all images in a mini-batch are used to optimize the metric space, and each sample is processed independently, regardless of the structure of the track; occluded samples then introduce noise, producing two kinds of harmful metric pairs. For example, suppose car A has two labeled images x1 and x2 and car B has two labeled images y1 and y2, where x2 is occluded by car B so that its visual appearance is mostly car B, yet it is still labeled as car A. Under the influence of this outlier, the feature distance between x1 and x2, which differ greatly in appearance, is forced smaller, while the feature distance between x2 and y2 is forced larger, which adversely affects the optimization of the network. In addition, false labels in the data set become label noise, which also affects the optimization of the network.
With the Hausdorff distance metric, because only the distance between the two sets is considered, metric pairs within the same image sequence are automatically discarded, eliminating the influence of outliers under the same label.
When calculating the one-way Hausdorff distance, the metric is very sensitive to label noise because of the maximum operation. To circumvent this difficulty, a relaxed Hausdorff distance metric learning method is proposed by relaxing the maximum constraint of the one-way distance:

d_rH(S1, S2) = max{ kmax_{x∈S1} min_{y∈S2} d_E(x, y), kmax_{y∈S2} min_{x∈S1} d_E(x, y) }

where kmax_{x∈S1} denotes selecting the k-th largest value over the set S1, and likewise for S2. By relaxing the maximum constraint of the one-way distance, the relaxed Hausdorff distance metric automatically selects the k-th best matching point of S1 (respectively S2), mitigating the effect of outliers under the same label; at the same time, it identifies the subset of size k of the full set that minimizes the one-way Hausdorff distance, avoiding the metric pairs that label noise would cause at the maximum distance and preventing outliers and label noise from adversely affecting the optimization of the network.
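A sketch of this relaxation follows: the maximum in the one-way distance is replaced by the k-th largest nearest-neighbour distance, so k = 1 recovers the ordinary one-way Hausdorff distance. Names and the tuple point format are illustrative assumptions.

```python
import math

def relaxed_one_way(s1, s2, k=1):
    """k-th largest nearest-neighbour distance from points of s1 to set s2."""
    nn = sorted((min(math.dist(x, y) for y in s2) for x in s1), reverse=True)
    return nn[min(k, len(nn)) - 1]  # clamp k to the set size

def relaxed_hausdorff(s1, s2, k=1):
    """Bidirectional relaxed Hausdorff distance d_rH."""
    return max(relaxed_one_way(s1, s2, k), relaxed_one_way(s2, s1, k))
```

With an outlier point far from the other set, k = 1 is dominated by the outlier while k = 2 ignores it, which is exactly the robustness the relaxation is meant to provide.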
In some examples of the invention, a Pompe-Hausdorff distance loss function is disclosed, which is proposed based on Batch Hard (BH) triplet loss, and employs an ensemble-based sample selection strategy to accomplish the video-based cross-lane vehicle identification task. Based on the Batch Hard (BH) triplet loss, a Batch containing PK images is formed by randomly selecting P vehicle ids and K images for each type, and the Hard positive and negative samples in the Batch are called Batch Hard. To evaluate video-based distances in a set-to-set manner, a triplet strategy based on the pompe-hausdov relaxation distance was devised. For a particular anchor frame Sa, the pompe-hausdov distance loss equation is as follows:
L_{PhD} = \Big[ \tau + \max_{p = 1, \dots, K} d_k(S_a, S_p) - \min_{\substack{n = 1, \dots, PK \\ y_n \neq y_a}} d_k(S_a, S_n) \Big]_{+}

wherein d_k(\cdot, \cdot) represents the relaxed Pompeiu-Hausdorff distance and \tau is the margin.
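A minimal sketch of this batch-hard selection with the relaxed set distance follows. The function names, the use of a symmetric maximum of the two one-way distances, and the default margin are illustrative assumptions, not details fixed by the patent:

```python
import numpy as np

def phd_triplet_loss(sets, labels, margin=0.3, k=1):
    """Batch-hard triplet loss over image sequences, using a symmetric
    relaxed Pompeiu-Hausdorff distance as the set-to-set metric.

    sets: list of (n_i, d) feature arrays, one per tracklet in the batch.
    labels: vehicle id of each tracklet.
    """
    def one_way(x, y):
        pw = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
        return np.sort(pw.min(axis=1))[-k]  # k-th largest nearest-neighbour dist

    def d(a, b):  # symmetric relaxed distance
        return max(one_way(a, b), one_way(b, a))

    losses = []
    for i, sa in enumerate(sets):
        pos = [d(sa, sets[j]) for j in range(len(sets))
               if labels[j] == labels[i] and j != i]
        neg = [d(sa, sets[j]) for j in range(len(sets))
               if labels[j] != labels[i]]
        if pos and neg:  # hinge on hardest positive vs hardest negative
            losses.append(max(0.0, margin + max(pos) - min(neg)))
    return float(np.mean(losses))
```

For each anchor, the farthest same-id tracklet and the nearest different-id tracklet inside the batch drive the hinge, mirroring the Batch Hard strategy described above.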
The ID loss of the model is defined as:

L_{ID} = \sum_{i=1}^{N} -q_i \log(p_i)

wherein N is the number of classes, q_i is the label-smoothed target for class i, p_i is the predicted probability for class i, and y is the true ID label.
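The text names the smoothed targets q_i but does not spell out the smoothing rule; the common choice of 1 − ε on the true id and ε/(N − 1) elsewhere is assumed in this sketch:

```python
import numpy as np

def id_loss(logits, y, eps=0.1):
    """Label-smoothed cross-entropy over N vehicle ids.

    The smoothing rule (1 - eps on the true id, eps/(N - 1) elsewhere)
    is an assumption; the text only names the smoothed targets q_i.
    logits: (N,) raw scores; y: index of the true id.
    """
    n = logits.shape[0]
    p = np.exp(logits - logits.max())   # softmax probabilities p_i
    p /= p.sum()
    q = np.full(n, eps / (n - 1))       # smoothed target distribution q_i
    q[y] = 1.0 - eps
    return float(-(q * np.log(p)).sum())
```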
The center loss of the model is defined as:

L_C = \frac{1}{2} \sum_{j=1}^{B} \left\| f_{t_j} - c_{y_j} \right\|_2^2

wherein y_j is the label of the j-th image in the mini-batch, f_{t_j} denotes the feature before the batch normalization layer, c_{y_j} is the y_j-th class center of the deep features, and B is the batch size. The center loss increases intra-class compactness.
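The center loss above is a direct sum of squared distances to the per-id centers; a numpy sketch (shapes and names assumed, and the running update of the centers themselves is omitted):

```python
import numpy as np

def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_j ||f_j - c_{y_j}||^2 over the mini-batch.

    features: (B, d) pre-BN features; labels: (B,) int array of ids;
    centers: (num_ids, d) current class centers.
    """
    diff = features - centers[labels]   # each feature minus its own class center
    return 0.5 * float((diff ** 2).sum())
```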
The loss function of the final cross-lane vehicle identification model is:

L = L_{PhD} + L_{ID} + \beta L_{C}

wherein \beta weights the center loss term.
s32: the picture sequences in the cross-lane vehicle data set are pre-amplified in pixel units and adjusted to the same resolution of 256 × 256, and are synchronously enhanced by performing a random horizontal flipping operation on each image sequence.
S33: and randomly extracting P types of vehicle ids in the S32, randomly selecting K images in each type as training samples to form batch-size B containing P x K images, sending the batch-size B to a cross-lane vehicle identification model for training, outputting a characteristic f by the model, calculating loss, and finally obtaining an applicable cross-lane vehicle identification model by optimizing a loss function.
For example, the random sampling may set P = 16 and K = 4.
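A PK sampler of this kind can be sketched as below; the dictionary layout and the with-replacement fallback for ids with fewer than K images are illustrative assumptions:

```python
import random

def pk_batch(ids_to_images, P=16, K=4, rng=None):
    """Sample a PK batch: P distinct vehicle ids, K images per id,
    giving a batch of B = P * K samples (B = 64 for P = 16, K = 4).

    ids_to_images maps a vehicle id to its list of image references;
    ids with fewer than K images are sampled with replacement.
    """
    rng = rng or random.Random(0)
    chosen_ids = rng.sample(sorted(ids_to_images), P)
    batch = []
    for vid in chosen_ids:
        pool = ids_to_images[vid]
        picks = rng.sample(pool, K) if len(pool) >= K else rng.choices(pool, k=K)
        batch.extend((vid, img) for img in picks)
    return batch
```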
The model is optimized with the Adam optimizer. The initial learning rate is set to 3.5 × 10^-4 and is reduced to 1/10 of its current value at the 40th and the 70th epoch respectively, for 120 epochs of training in total.
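The stated step schedule can be written out directly (interpreting the two reductions as cumulative, i.e. 1/100 of the base rate from epoch 70 onward):

```python
def learning_rate(epoch, base_lr=3.5e-4):
    """Step schedule from the training recipe: divide by 10 at epoch 40
    and again at epoch 70; epochs counted from 1, 120 epochs in total."""
    if epoch >= 70:
        return base_lr / 100
    if epoch >= 40:
        return base_lr / 10
    return base_lr
```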
Step S4 is the detection process of the end-to-end cross-lane black smoke vehicle snapshot system based on Pompeiu-Hausdorff distance matching, and specifically includes the following substeps S41 to S43:
s41: selecting a to-be-detected picture of a test set part in the black smoke detection data set, inputting the to-be-detected picture into the black smoke detection model trained in the step 2, and judging whether a black smoke vehicle exists or not;
and inputting a picture containing the black smoke diesel vehicle for the trained black smoke detection model, detecting the black smoke and the diesel vehicle area, and returning the marked picture and the position information of the boundary frame.
S42: if the picture in S41 has black smoke, taking the intercepted black smoke diesel vehicle region as an inquiry set picture, inputting the inquiry set picture and a gallery set picture in a cross-lane vehicle data set into a cross-lane vehicle identification model together, and if the gallery set of the cross-lane vehicle contains the vehicle of the inquiry set, returning the picture of the visual angle, which is intercepted from the gallery set and is not blocked by the black smoke, of the vehicle;
wherein the query set and the gallery set are used for testing a cross-lane vehicle identification model. And training the model on a training set to obtain a trained model, extracting features of all pictures in the query set and the atlas set to calculate similarity, finding out the first N pictures similar to the former N pictures in the atlas set for each query set, and outputting the pictures with the highest similarity as the pictures of the same vehicle from different view angles.
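The ranking step just described amounts to a nearest-neighbour search in feature space; a numpy sketch using cosine similarity (the similarity measure and names are assumptions, as the patent does not fix one):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, top_n=5):
    """Rank gallery tracklets by cosine similarity to a query feature.

    query_feat: (d,) feature of the cropped black-smoke vehicle;
    gallery_feats: (G, d) features of the gallery set.
    Returns the indices of the top-N matches (most similar first)
    and their similarity scores.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                 # cosine similarity of every gallery entry
    order = np.argsort(-sims)    # descending similarity
    return order[:top_n], sims[order[:top_n]]
```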
S43: therefore, the license plate information of the diesel vehicle which discharges the black smoke is obtained by the picture of the visual angle which is not shielded by the black smoke, and the cross-lane black smoke snapshot system based on the end-to-end black smoke detection network and the Ponbe-Housedouv distance is realized.
In some examples, the invention also relates to a computer-readable storage medium storing instructions which, when executed, implement the snapshot method of any one of the above examples. The computer concerned may be a general purpose or a special purpose computing device; in a specific implementation it may be a desktop computer, a laptop computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. The storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. For example, the storage medium may be, but is not limited to, a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
In the description herein, references to the description of "one embodiment," "an example," "a specific example," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the invention as claimed.

Claims (10)

1. A cross-lane black smoke vehicle snapshot method is characterized by comprising the following steps:
step 1: extracting picture data in a data set captured by a multi-lane camera to obtain a black smoke detection data set and a vehicle data set;
step 2: performing feature extraction and fusion on picture data in the black smoke detection data set to construct a black smoke detection model;
and step 3: selecting ResNet-50 as a backbone network to construct a network model, and extracting common features of picture sequences in a vehicle data set; defining a set-based Pompeiu-Hausdorff distance learning loss as a triplet loss, optimizing the ResNet-50 network model through the triplet loss and an ID loss, and obtaining a vehicle identification model through training;
and step 4: selecting a to-be-detected picture of the test set part in the black smoke vehicle data set, inputting it into the black smoke detection model, and judging whether a black smoke vehicle exists; if one exists, taking the cropped black smoke diesel vehicle region as a query set picture and inputting it into the vehicle identification model together with the gallery set pictures in the cross-lane vehicle data set;
if the cross-lane vehicle gallery set contains the vehicle in the query set, returning the picture of the vehicle captured from the gallery set, and acquiring the license plate information of the diesel vehicle emitting black smoke from the picture of the visual angle which is not blocked by the black smoke.
2. The cross-lane black smoke vehicle snapshot method according to claim 1, wherein in step 1, bounding boxes are used to select the vehicle and the black smoke area discharged by the vehicle, a black smoke area data set is extracted, and the black smoke area data set is divided into a training set, a validation set and a test set.
3. The cross-lane black smoke vehicle snapshot method according to claim 1, wherein videos containing the same vehicle belong to the same vehicle ID; key frame pictures are extracted to form picture sequences containing spatio-temporal information associations; the bounding box of each track is labeled, each track's bounding box containing one vehicle ID, so as to obtain image sequences containing different bounding boxes and form the cross-lane vehicle data set.
4. The cross-lane black smoke vehicle snapshot method according to claim 1, wherein the black smoke detection model comprises:
a backbone network;
a feature pyramid module combining pyramid pooling and path aggregation networks, and
and a detector for detecting the feature hierarchy extracted by the feature pyramid to obtain the positions of the prediction boxes; non-maximum suppression is then applied to the probability that each prediction box contains black smoke, and the prediction box with the highest score gives the position of the detection target.
5. The method as claimed in claim 1, wherein in step 3, an image sequence is regarded as a set of points, and the similarity between two image sequences is measured by the Pompeiu-Hausdorff distance.
6. The method according to claim 1, wherein in step 3, P types of vehicle IDs are randomly selected and K images are randomly selected for each type, forming a batch containing P × K images; the triplet loss function is defined as:

L_{PhD} = \Big[ \tau + \max_{p = 1, \dots, K} d_k(S_a, S_p) - \min_{\substack{n = 1, \dots, PK \\ y_n \neq y_a}} d_k(S_a, S_n) \Big]_{+}

wherein \tau represents the margin; the subscripts a, p, and n represent the anchor sequence, the positive samples, and the negative samples, respectively; and d_k(\cdot, \cdot) represents the k-th maximum value in the Pompeiu-Hausdorff distance set;
and the loss function of the vehicle identification model is obtained by superposing the ID loss function and the center loss function on the triplet loss function.
7. The cross-lane black smoke vehicle snapshot method of claim 6, wherein the ID loss function is:

L_{ID} = \sum_{i=1}^{N} -q_i \log(p_i)

wherein N, q_i, p_i and y respectively represent the number of classes, the label-smoothed target for class i, the predicted probability for class i, and the true ID label.
8. The cross-lane black smoke vehicle snapshot method of claim 6, wherein the center loss function is:

L_C = \frac{1}{2} \sum_{j=1}^{B} \left\| f_{t_j} - c_{y_j} \right\|_2^2

wherein y_j is the label of the j-th image in the mini-batch, f_{t_j} represents the feature before the batch normalization layer, c_{y_j} is the y_j-th class center of the depth features, and B is the number of samples selected for one training iteration.
9. A vehicle identification system, comprising:
a backbone network for extracting feature information of vehicle picture data;
a BNNeck layer configured with a set-based Pompeiu-Hausdorff distance learning loss as a triplet loss function; and
a classifier fully-connected layer.
10. A computer-readable storage medium storing instructions that, when executed, are capable of performing the method of any one of claims 1 to 8.
CN202210869316.9A 2022-07-22 2022-07-22 Method, system and storage medium for capturing cross-lane black smoke vehicle Pending CN115272989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210869316.9A CN115272989A (en) 2022-07-22 2022-07-22 Method, system and storage medium for capturing cross-lane black smoke vehicle


Publications (1)

Publication Number Publication Date
CN115272989A true CN115272989A (en) 2022-11-01

Family

ID=83769315



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination