Disclosure of Invention
To address the limitations of the prior art, the invention provides a vehicle detection and counting method, a system, a storage medium and a computer device. The technical solution adopted by the invention is as follows:
A vehicle detection and counting method comprises the following steps:
acquiring a relative state of the vehicle by taking a preset position identification line as a reference standard by using an SSD detector provided with a saliency enhancement unit based on an attention mechanism;
tracking vehicles in preset ranges at two sides of the position identification line according to the relative state;
and counting the vehicles of which the relative states change within the preset ranges at the two sides of the position identification line.
Compared with the prior art, the invention addresses the missed detections and high false detection rate of vehicle detection by converting vehicle target detection, which outputs a quantitative coordinate position, into vehicle state detection, which outputs a qualitative relative state. At the same time, deeper-layer attention is used to optimize the weight information of the shallower layers, and salient feature maps are obtained through a maximum operation. Vehicle counting is completed with a counting algorithm based on the vehicle state, which shrinks the effective counting area of the counting process and removes about 47.36% to 70.53% of over-detected targets while achieving a better overall counting result.
As a preferable aspect, the relative states include up, mid, down, and far; the relative state up represents that the vehicle is located in the preset range of the upper side area of the position identification line, the relative state mid represents that the distance from the vehicle to the position identification line is zero, the relative state down represents that the vehicle is located in the preset range of the lower side area of the position identification line, and the relative state far represents that the vehicle is located outside the preset ranges of the two sides of the position identification line.
As a preferable scheme, the relative state of the vehicle, denoted State, is expressed by the following formula:

$$\mathrm{State} = \begin{cases} \text{mid}, & |D| = 0 \\ \text{up}, & D > 0 \text{ and } |D| \le 0.2H \\ \text{down}, & D < 0 \text{ and } |D| \le 0.2H \\ \text{far}, & |D| > 0.2H \end{cases}$$

where D denotes the distance from the vehicle to the position identification line, D > 0 indicates that the vehicle is located in the area above the position identification line, and D < 0 indicates that the vehicle is located in the area below the position identification line; H denotes the picture height.
Further, the attention-based saliency enhancement unit is used to enhance the saliency of shallower feature maps using deeper feature maps in the SSD detector.
Further, the attention-based saliency enhancement unit enhances the saliency of the shallower feature map using the deeper feature map in the SSD detector through the following steps:
a Sigmoid operation is applied to the deconvolution result of the deeper feature map F_b to obtain an attention feature map S;
a dot product operation is performed between the feature map S and the shallower feature map F_a to obtain an attention-enhanced feature map P;
a linear interpolation operation is applied to the convolution result of the deeper feature map F_b to obtain a feature map L with the same spatial resolution as the feature map P;
the feature map P is added to the feature map L to obtain a fused feature map A;
a maximum operation is performed between the feature map A and the shallower feature map F_a to obtain a saliency-enhanced feature map M.
As a preferable scheme, the SSD detector includes six layers of feature maps for detection, wherein the feature maps of the first three layers from shallow to deep are respectively provided with saliency enhancing units based on attention mechanism, and the saliency enhancing units based on attention mechanism respectively enhance the saliency of the feature map of the current layer by using the feature maps of the adjacent deeper layers.
A vehicle detection counting system, comprising:
the vehicle relative state acquisition module is used for acquiring a relative state of the vehicle by taking a preset position identification line as a reference by using an SSD detector provided with a saliency enhancement unit based on an attention mechanism;
the vehicle tracking module is used for tracking the vehicles in the preset ranges at two sides of the position identification line according to the relative state;
and the vehicle counting module is used for counting the vehicles of which the relative states change within the preset ranges at the two sides of the position identification line.
The present invention also provides the following:
a storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the vehicle detection and counting method as described above.
A computer device comprising a storage medium, a processor and a computer program stored in the storage medium and executable by the processor, the computer program, when executed by the processor, implementing the steps of the aforementioned vehicle detection counting method.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
It should be understood that the embodiments described are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort fall within the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
To address the limitations of the prior art, this embodiment provides a technical solution, which is further described below with reference to the accompanying drawings and embodiments.
Referring to Fig. 1, a vehicle detection and counting method includes the following steps:
S01, acquiring the relative state of the vehicle by taking a preset position identification line as a reference standard, using an SSD detector provided with a saliency enhancement unit based on an attention mechanism;
S02, tracking the vehicles within the preset ranges on both sides of the position identification line according to the relative state;
S03, counting the vehicles whose relative state changes within the preset ranges on both sides of the position identification line.
Compared with the prior art, the invention addresses the missed detections and high false detection rate of vehicle detection by converting vehicle target detection, which outputs a quantitative coordinate position, into vehicle state detection, which outputs a qualitative relative state. At the same time, deeper-layer attention is used to optimize the weight information of the shallower layers, and salient feature maps are obtained through a maximum operation. Vehicle counting is completed with a counting algorithm based on the vehicle state, which shrinks the effective counting area of the counting process and removes about 47.36% to 70.53% of over-detected targets while achieving a better overall counting result.
Specifically, in an optional embodiment, the SSD detector provided with the attention-based saliency enhancement unit serves as the neural network for vehicle state detection. It may be constructed by using the MobileNetV1 network architecture as the backbone and combining a Single-Shot Object Detection (SSD) unit with the Attention-based Saliency Enhancement (ASE) unit provided in this embodiment of the invention. After training on a preset data set, the resulting SSD detector can be used to acquire the relative state of a vehicle with a preset position identification line as the reference. Schemes that use another network architecture as the backbone, such as another version of the MobileNet series, that use another convolutional neural network model in place of SSD, or that only detect the state of the target without performing the subsequent counting step, are to be regarded as modifications or equivalent replacements within the spirit and principle of the invention as long as they are combined with ASE, and fall within the protection scope of this patent.
In an alternative embodiment, in step S02, a Kernel Correlation Filter (KCF) algorithm may be used to track the detected vehicles, i.e. the vehicles within the preset ranges on both sides of the position identification line. Counting is performed when the relative state of a tracked vehicle changes. During counting, the advancing direction of the vehicle is determined from the order in which its relative state changes, and the vehicle is counted under the corresponding category.
As a preferred embodiment, the relative states include up, mid, down, and far; the relative state up represents that the vehicle is located in the preset range of the upper side area of the position identification line, the relative state mid represents that the distance from the vehicle to the position identification line is zero, the relative state down represents that the vehicle is located in the preset range of the lower side area of the position identification line, and the relative state far represents that the vehicle is located outside the preset ranges of the two sides of the position identification line.
Specifically, the "upper region" and "lower region" of the position identification line are only used to distinguish the regions into which the position identification line divides the picture. The concept of "up and down" in this embodiment is described from a fixed viewing angle only; these designations do not limit the solution of the invention and are not to be understood as indicating or implying relative importance. As an alternative embodiment, the position identification line is a horizontal line in the picture; in other embodiments, the position identification line may also be a vertical line in the picture, in which case its two sides may be referred to as the "left region" and the "right region" and the related content adjusted accordingly. Such a modification does not differ substantially from the scheme of this embodiment and is therefore not described further.
As a preferred embodiment, the relative state of the vehicle, denoted State, is expressed by the following formula:

$$\mathrm{State} = \begin{cases} \text{mid}, & |D| = 0 \\ \text{up}, & D > 0 \text{ and } |D| \le 0.2H \\ \text{down}, & D < 0 \text{ and } |D| \le 0.2H \\ \text{far}, & |D| > 0.2H \end{cases}$$

where D denotes the distance from the vehicle to the position identification line, D > 0 indicates that the vehicle is located in the area above the position identification line, and D < 0 indicates that the vehicle is located in the area below the position identification line; H denotes the picture height.
Specifically, Fig. 2 shows a schematic diagram of the relative states of vehicles: the solid line in the middle is the position identification line used to acquire the relative state of a vehicle, and the dotted lines are reference lines that delimit the preset range. The minimum distance from the vehicle to the position identification line is denoted |D|, and the picture height is denoted H. If the vehicle is on the position identification line (|D| = 0), its relative state is mid, as for vehicle c. If the vehicle is in the area above the position identification line and |D| ≤ 0.2 × H, its relative state is up, as for vehicle b. If the vehicle is in the area below the position identification line and |D| ≤ 0.2 × H, its relative state is down, as for vehicle d. If the vehicle is in the area above or below the position identification line but |D| > 0.2 × H, its state is far, as for vehicles a, e and f.
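The rule described above for assigning a relative state can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `relative_state` and the `ratio` parameter are introduced here for illustration only, and the default of 0.2 is the preset range given in this embodiment.

```python
def relative_state(d: float, height: float, ratio: float = 0.2) -> str:
    """Classify a vehicle by its signed distance d to the position identification line.

    d > 0 means the vehicle is above the line, d < 0 means it is below.
    height is the picture height H; ratio is the preset range as a fraction
    of H (0.2 in this embodiment). Both names are illustrative assumptions.
    """
    if d == 0:
        return "mid"
    if abs(d) > ratio * height:
        return "far"
    return "up" if d > 0 else "down"


if __name__ == "__main__":
    # For a 1080-pixel-high picture the preset range is about 216 pixels per side.
    for d in (0, 100, -200, 300, -500):
        print(d, relative_state(d, 1080))
```

Such a rule would typically be applied when labeling training samples; at inference time the detector outputs the state directly as part of the class label.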
The data set may come from existing vehicle surveillance videos. A vehicle surveillance video is preprocessed, a position identification line is added to the video, and the relative state of each vehicle is labeled; after data enhancement is completed, the video can be used to train the neural network.
The vehicle types detected in the data set and in actual detection are car, bus, truck and other: car denotes private cars, bus denotes buses, coaches and the like, truck denotes trucks, and other denotes vehicle types other than these three. Each vehicle type has the four relative states up, mid, down and far. Because vehicles in the far state do not need to be counted in this embodiment, vehicles of every type that are far from the position identification line are labeled directly as the far class without distinguishing the vehicle type. The data set in this embodiment therefore contains thirteen classes of vehicle data: car_up, car_mid, car_down, bus_up, bus_mid, bus_down, truck_up, truck_mid, truck_down, other_up, other_mid, other_down and far.
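The thirteen classes follow from combining four vehicle types with the three counted states and adding the shared far class. A short Python sketch makes the arithmetic explicit; the names `VEHICLE_TYPES`, `COUNTED_STATES` and `build_class_labels` are hypothetical, and the resulting order is illustrative only and does not reproduce the label IDs of Table 1.

```python
VEHICLE_TYPES = ["car", "bus", "truck", "other"]
COUNTED_STATES = ["up", "mid", "down"]


def build_class_labels():
    """Return the 4 x 3 type/state classes plus the single shared 'far' class."""
    labels = [f"{v}_{s}" for v in VEHICLE_TYPES for s in COUNTED_STATES]
    labels.append("far")  # far vehicles are not distinguished by vehicle type
    return labels


if __name__ == "__main__":
    labels = build_class_labels()
    print(len(labels), labels)
```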
Referring to Fig. 3, a schematic diagram of the vehicle data types: (G) a car in the up state, labeled car_up; (H) a bus in the down state, labeled bus_down; (I) a truck in the mid state, labeled truck_mid; (J) an other-type vehicle in the down state, labeled other_down; (K) a car in the far state, labeled far, with the vehicle in the area below the position identification line; (L) a bus in the far state, labeled far, with the vehicle in the area above the position identification line.
More specifically, in the tracking and counting process, counting is performed when the state of a tracked vehicle changes, and the advancing direction of the vehicle is determined from the order in which its relative state changes, so that the vehicle is counted under the corresponding category. For example, if a target is first detected as car_up and later changes to car_down, it is counted as a car moving from top to bottom.
In this embodiment, all vehicle types far from the position identification line are defined as far. Vehicles in the far state not only interfere with the classification of vehicle types but also contribute little to the count, so the region in which they are located is space that can be eliminated. Therefore, only the classes other than far are tracked and counted, which compresses the counting space and reduces the amount of computation.
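The counting logic described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the counting algorithm of the embodiment: the class `StateChangeCounter`, the `track_id` argument (assumed to be supplied by whatever tracker is used) and the direction names are hypothetical, and treating mid as a transitional state that does not itself trigger a count is a choice made here for simplicity.

```python
from collections import Counter


def parse_label(label):
    """Split 'car_up' into ('car', 'up'); the shared 'far' class has no vehicle type."""
    if label == "far":
        return None, "far"
    vehicle_type, state = label.rsplit("_", 1)
    return vehicle_type, state


class StateChangeCounter:
    """Counts a tracked vehicle once it has been seen on both sides of the line."""

    def __init__(self):
        self.first_side = {}      # track_id -> "up" or "down" where it was first seen
        self.counted = set()      # track_ids that have already been counted
        self.counts = Counter()   # (vehicle_type, direction) -> number of vehicles

    def update(self, track_id, label):
        vehicle_type, state = parse_label(label)
        # far targets are ignored entirely; mid is treated as transitional (assumption).
        if state in ("far", "mid") or track_id in self.counted:
            return
        first = self.first_side.setdefault(track_id, state)
        if first != state:
            direction = "top_to_bottom" if first == "up" else "bottom_to_top"
            self.counts[(vehicle_type, direction)] += 1
            self.counted.add(track_id)


if __name__ == "__main__":
    counter = StateChangeCounter()
    for label in ["far", "car_up", "car_mid", "car_down", "car_down"]:
        counter.update(1, label)
    print(dict(counter.counts))
```

Because far detections return immediately, they never create tracking state, which mirrors the reduction in tracked targets that the text attributes to the far class.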
Further, the attention-based saliency enhancement unit is used to enhance the saliency of shallower feature maps using deeper feature maps in the SSD detector.
Further, referring to Figs. 4 and 5, the attention-based saliency enhancement unit enhances the saliency of the shallower feature map using the deeper feature map in the SSD detector through the following steps:
T01, applying a Sigmoid operation to the deconvolution result of the deeper feature map F_b to obtain an attention feature map S;
T02, performing a dot product operation between the feature map S and the shallower feature map F_a to obtain an attention-enhanced feature map P;
T03, applying a linear interpolation operation to the convolution result of the deeper feature map F_b to obtain a feature map L with the same spatial resolution as the feature map P;
T04, adding the feature map P to the feature map L to obtain a fused feature map A;
T05, performing a maximum operation between the feature map A and the shallower feature map F_a to obtain a saliency-enhanced feature map M.
Through the improvement, the receptive field of the shallow feature mapping can be effectively increased and the multi-scale features can be fused.
In Fig. 5, Deconv denotes a deconvolution operation, whose role is to scale the feature map F_b up to the size of F_a with the same number of channels; Conv denotes a convolution operation, whose parameters in this embodiment are set to pad = 1, stride = 1 and kernel_size = 3, with the number of channels set equal to that of the feature map F_a; Max denotes an operation that takes the maximum of the features.
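The data flow of steps T01 to T05 can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the network layer of the embodiment: the outputs of the Deconv branch and of the Conv plus linear interpolation branch are assumed to be computed elsewhere and passed in already matching the shape of F_a, the "dot product" of step T02 is interpreted as an element-wise product, and the function and argument names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ase_fuse(f_a, deconv_f_b, interp_conv_f_b):
    """Combine a shallower map F_a with two precomputed transforms of the deeper map F_b.

    All three arrays must share one shape, e.g. (channels, height, width).
    deconv_f_b:      output of the Deconv branch applied to F_b (assumed given).
    interp_conv_f_b: output of the Conv + linear interpolation branch (assumed given).
    """
    if not (f_a.shape == deconv_f_b.shape == interp_conv_f_b.shape):
        raise ValueError("all inputs must have the same shape")
    s = sigmoid(deconv_f_b)       # T01: attention map S
    p = s * f_a                   # T02: element-wise product gives P (assumption)
    l_map = interp_conv_f_b       # T03: map L at the resolution of P
    a = p + l_map                 # T04: fused map A
    return np.maximum(a, f_a)     # T05: element-wise maximum gives M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_a = rng.normal(size=(4, 8, 8))
    m = ase_fuse(f_a, rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 8)))
    print(m.shape)
```

Because the last step takes the maximum with F_a, every element of M is at least as large as the corresponding element of F_a, so the enhancement never suppresses the original shallow response.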
As a preferred embodiment, the SSD detector includes six layers of feature maps for detection, wherein the first three layers of feature maps from shallow to deep are respectively provided with saliency enhancement units based on attention mechanism, and the saliency enhancement units based on attention mechanism respectively perform saliency enhancement on the current layer feature map by using adjacent deeper layer feature maps.
Specifically, referring to Fig. 6, the SSD has 6 feature maps for target detection, referred to as conv11, conv13, conv14_2, conv15_2, conv16_2 and conv17_2. Because the first three of these feature maps have small receptive fields and relatively weak feature representation, the first three layers, namely conv11, conv13 and conv14_2, are optimized with ASE units; each layer with an integrated ASE unit takes the adjacent deeper layer as input to enhance the saliency of the current layer. The last three layers of the SSD already have large receptive fields; integrating ASE units there would bring no obvious improvement in detection accuracy while increasing the amount of computation, so in this preferred embodiment the last three layers have no integrated ASE units. In this embodiment, the improved vehicle detector may be referred to as SSD-ASE.
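The pairing of each enhanced layer with its adjacent deeper layer can be written down explicitly. The sketch below only enumerates the pairs; the names `DETECTION_LAYERS`, `NUM_ASE_LAYERS` and `ase_pairs` are hypothetical, and the sketch does not settle whether the deeper input is taken before or after its own enhancement, which the text does not specify.

```python
DETECTION_LAYERS = ["conv11", "conv13", "conv14_2", "conv15_2", "conv16_2", "conv17_2"]
NUM_ASE_LAYERS = 3  # only the three shallowest detection layers carry an ASE unit


def ase_pairs(layers=DETECTION_LAYERS, num_ase=NUM_ASE_LAYERS):
    """Return (shallower layer F_a, adjacent deeper layer F_b) for each ASE unit."""
    return [(layers[i], layers[i + 1]) for i in range(num_ase)]


if __name__ == "__main__":
    for shallow, deep in ase_pairs():
        print(f"{shallow} is enhanced using {deep}")
```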
Next, this embodiment further demonstrates the scheme and effects of the invention through experimental tests.
The experimental environment of this embodiment is Ubuntu 16.04, and the deep learning framework is Caffe 1.0. The server hardware is configured as follows: the CPU is an Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz, the GPU is an NVIDIA GeForce GTX 1070 Ti, and the memory is 8 GB.
The data set used in the experimental test of this embodiment is mainly derived from the test video of the red 2016 and from self-collected vehicle driving videos. The self-collected videos have a resolution of 1920x1080, are in color, and have frame rates of 19.9 to 45.04 frames/second. For each sample image, position identification lines for the 4 states are added automatically according to the aforementioned vehicle state formula, and corresponding labels are set for the samples to which the identification lines have been added. Finally, data enhancement is performed on the samples using operations such as horizontal mirroring and Gaussian blur. The sample distribution of the data set is shown in Table 1; it comprises a total of 24794 sample images. The background label is set to 0. Each entry of Table 1 takes the form count/label; for example, the class with vehicle type car and state down has 2336 samples and label 1, recorded as 2336/1. The data set is divided into a training set (14873), a validation set (2482) and a test set (7439) in a ratio of 6:1:3. The vehicle counting test uses two videos of different scenes, Video 1 and Video 2. Video 1 shows a one-way lane, lasts 7 min 45 s and contains 160 vehicles in total. Video 2 shows a two-way lane, lasts 5 min 42 s and contains 240 vehicles in total.
TABLE 1 vehicle State data set sample distribution and tag settings
For the experimental tests performed in this example, the accuracy of vehicle detection can be evaluated using the mAP and the miss rate:
where N is the number of samples in the test set, Pr(i) is the precision after identifying i pictures, ΔRe(i) is the change in recall from the (i-1)-th picture to the i-th picture, and m is the number of categories of all detected samples. TP is the number of detected samples that are correctly identified as positive; FP is the number of detected samples that are misidentified, i.e. negative samples identified as positive; FN is the number of detected samples that are misidentified, i.e. positive samples identified as negative.
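The equations to which these symbols belong are not reproduced in this text. Assuming the standard definitions of average precision, precision, recall and miss rate, which match the symbols described above, they would take the following form; the names AP_j, Re and MR are introduced here for illustration and may differ from the original notation.

```latex
\begin{aligned}
AP_j &= \sum_{i=1}^{N} Pr(i)\,\Delta Re(i), &
mAP &= \frac{1}{m}\sum_{j=1}^{m} AP_j, \\
Pr &= \frac{TP}{TP + FP}, &
Re &= \frac{TP}{TP + FN}, \\
MR &= \frac{FN}{TP + FN} = 1 - Re.
\end{aligned}
```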
Meanwhile, the experimental test of this embodiment uses the log-average miss rate to compare the degree of missed detection among detectors. It is calculated by averaging the miss rates at 9 evenly spaced FPPI (False Positives Per Image) points between 10^-2 and 10^0 in log space. The smaller this value, the higher the detection accuracy of the model.
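One way to compute such a log-average miss rate is sketched below. It is a sketch under stated assumptions, not the evaluation code used in the experiments: the miss rate curve is assumed to be given as two parallel lists sorted by increasing FPPI, the value at each reference point is taken from the largest measured FPPI not exceeding it, and the average is taken in log space as a geometric mean, which is a common convention for this metric. The function name and parameters are hypothetical.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1e0):
    """Average the miss rate at reference FPPI points evenly spaced in log space.

    fppi and miss_rate are parallel lists sorted by increasing fppi.
    """
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    refs = [10 ** (log_lo + k * step) for k in range(num_points)]
    sampled = []
    for ref in refs:
        # Miss rate at the largest measured FPPI not exceeding the reference point;
        # fall back to the first measurement if the curve starts above it.
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        sampled.append(candidates[-1] if candidates else miss_rate[0])
    # Geometric mean; the small floor avoids log(0) for a perfect detector.
    return math.exp(sum(math.log(max(mr, 1e-10)) for mr in sampled) / len(sampled))


if __name__ == "__main__":
    print(log_average_miss_rate([0.001, 0.05, 0.5], [0.6, 0.3, 0.1]))
```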
Vehicle counting is evaluated using a count accuracy Acc, which may be expressed as:
where N_w is the number of vehicles counted incorrectly (including duplicate counts and missed counts), and N_a is the true total number of vehicles.
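The expression for Acc is not reproduced in this text. A plausible form, consistent with the two symbols defined above, is Acc = (1 - N_w / N_a) × 100%; the sketch below assumes that form, and both the function name and the example error count are hypothetical.

```python
def count_accuracy(n_wrong: int, n_actual: int) -> float:
    """Counting accuracy in percent, assuming Acc = (1 - N_w / N_a) * 100."""
    if n_actual <= 0:
        raise ValueError("n_actual must be positive")
    return (1.0 - n_wrong / n_actual) * 100.0


if __name__ == "__main__":
    # 160 is the vehicle total of Video 1; 16 miscounts is a made-up example value.
    print(count_accuracy(16, 160))
```

Under this assumed form, one miscounted vehicle changes Acc by 1/160 (0.625%) for Video 1 and by 1/240 (about 0.417%) for Video 2.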
The amount of computation is compared using the number of vehicle tracking operations N_trace. For the conventional method, which is not based on the relative state, N_trace is the number of all detected target vehicles; for the relative state based method, N_trace is the number of detected targets in non-far states.
First, this embodiment uses MobileNetV1+SSD as the reference network (Baseline Network) and performs an ablation experiment on the ASE units; Table 2 compares the detection performance as ASE units are added. As Table 2 shows, the vehicle detection mAP increases as each new ASE unit is added. When ASE1 is added to the reference network, the mAP improves by 0.33% over the reference network; when ASE1 and ASE2 are added, the mAP improves by 0.62%; when all 3 ASE units are added, the mAP reaches its highest value (95.94%), 1% higher than that of the reference network. Each added ASE unit raises the mAP by about 0.3%.
TABLE 2 ASE ablation experiments
Network Name | Detector | ASE1 | ASE2 | ASE3 | mAP
Baseline | SSD | | | | 94.94%
Baseline+ASE1 | SSD-ASE1 | √ | | | 95.27%
Baseline+ASE1,2 | SSD-ASE1,2 | √ | √ | | 95.56%
Baseline+ASE1,2,3 | SSD-ASE1,2,3 | √ | √ | √ | 95.94%
Referring to Fig. 7, which compares the log-average miss rates of the Baseline network and the improved networks with added ASE units, most classes have a lower miss rate after ASE units are added than with the Baseline network. After ASE1 is added, the miss rates of car_up, car_down and truck_up are reduced by 0.03, 0.02 and 0.02, respectively. With ASE1,2, the miss rates of car_up, car_down and far are reduced by 0.05, 0.02 and 0.04, respectively. After ASE1,2,3 is added, most miss rates become lower still; the miss rates of the car_up, truck_up and far classes drop markedly (0.05, 0.11 and 0.17, respectively). The miss rate of the far class is about 23% lower than that of the Baseline.
Table 3 compares the vehicle detection results of the preferred network of this embodiment with the reference network and several state-of-the-art methods. As Table 3 shows, the preferred network of this embodiment (MobileNetV1+SSD-ASE1,2,3) achieves the best overall balance of speed and accuracy: it meets the real-time requirement of detection (about 56 FPS) and obtains the best mAP value (95.94%). Compared with the Baseline network, the preferred network improves the mAP by 1% while its detection speed remains close to that of the Baseline (56 FPS versus 66 FPS). Compared with VGG16+SSD, both the mAP and the FPS of the preferred network are slightly higher. Compared with the two networks that use Faster RCNN as the detector with ZF and VGG16 as backbones, the preferred network of this embodiment has a large advantage in both detection accuracy and speed.
TABLE 3 test results of different methods
Backbone | Detector | mAP | FPS (f/s)
MobileNetV1 | SSD | 94.94% | 66
VGG16 | SSD | 95.73% | 49
VGG16 | Faster RCNN | 92.11% | 9
ZF | Faster RCNN | 79.52% | 26
MobileNetV1 | SSD-ASE1,2,3 | 95.94% | 56
In the subsequent vehicle counting experiments, this embodiment compares the relative state based method (whose detection network is Baseline+ASE1,2,3) with the conventional method that is not based on the relative state (whose detection network is the Baseline network). The conventional counting method is trained on the data-enhanced data set without the position identification line.
Referring to Figs. 8 and 9, the experiment compares the counting accuracy at different frame intervals (F_inter = 15, 10, 5) for video 1 (one-way lane) and video 2 (two-way lane). When F_inter = 15, the counting accuracy for both video 1 and video 2 is below 90%. As F_inter decreases, the accuracy of both methods improves, because a smaller F_inter reduces the probability of false or missed detections and therefore the occurrence of missed or duplicate counts. This is why the prior art chooses smaller frame intervals for vehicle counting. At the smaller frame intervals (F_inter = 5 or 10), the method of this embodiment has a higher Acc than the conventional method. Fig. 8 shows the counting experiment for video 1: at F_inter = 5 and 10, the Acc of the method of this embodiment is 0.62% and 4.39% higher, respectively, than that of the conventional method. Fig. 9 shows the counting experiment for video 2. Because video 2 is a two-way lane scene, vehicles appear smaller than in video 1 within the same field of view; here the Acc of the method of this embodiment improves by 3.75% and 7.08%, respectively, a more pronounced improvement over the conventional method, which indicates that the SSD-ASE detector of this embodiment has an advantage in small-target detection.
Referring to Figs. 10 and 11, Fig. 10 compares the number of tracking operations (N_trace) for video 1 (one-way lane) at different frame intervals (F_inter = 15, 10, 5), and Fig. 11 compares N_trace for video 2 (two-way lane) at the same frame intervals. As Figs. 10 and 11 show, N_trace increases substantially for both the method of this embodiment and the conventional method as F_inter decreases, but at the same frame interval the N_trace of the state based method of this embodiment is clearly lower than that of the conventional method. For video 1 (Fig. 10), at F_inter = 15, 10 and 5, the N_trace of the method of this embodiment is 70.53%, 70.46% and 69.81% lower, respectively, than that of the conventional method. For video 2 (Fig. 11), the reductions are 47.36%, 58.71% and 56.28%, respectively. Because far class detection compresses the target range of the counting process, the state based counting method greatly reduces the number of tracking operations.
The present invention also provides the following:
A vehicle detection and counting system, referring to Fig. 12, comprises:
a vehicle relative state acquisition module 1, configured to acquire a relative state of the vehicle with reference to a preset position identification line by using an SSD detector provided with a saliency enhancement unit based on an attention mechanism;
the vehicle tracking module 2 is used for tracking the vehicles in the preset ranges at two sides of the position identification line according to the relative state;
and the vehicle counting module 3 is used for counting the vehicles of which the relative states change in the preset ranges at the two sides of the position identification line.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the vehicle detection and counting method as described above.
A computer device comprising a storage medium, a processor and a computer program stored in the storage medium and executable by the processor, the computer program, when executed by the processor, implementing the steps of the aforementioned vehicle detection counting method.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.