CN113158747A - Night snapshot identification method for black smoke vehicle - Google Patents

Night snapshot identification method for black smoke vehicle

Info

Publication number
CN113158747A
Authority
CN
China
Prior art keywords
time
smoke
network
space
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110146700.1A
Other languages
Chinese (zh)
Inventor
李晓斌
李毓勤
何玉龙
周当
刘颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Skyland Information Technology Co ltd
Original Assignee
Guangzhou Skyland Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Skyland Information Technology Co ltd filed Critical Guangzhou Skyland Information Technology Co ltd
Priority to CN202110146700.1A
Publication of CN113158747A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/41Analysis of texture based on statistical description of texture
    • G06T7/44Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for capturing and identifying black smoke vehicles at night, comprising the following steps: acquiring a vehicle video image; extracting texture features from the video image; extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network; removing nighttime interference; and, after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.

Description

Night snapshot identification method for black smoke vehicle
Technical Field
The invention belongs to the field of environmental technology, and particularly relates to a night snapshot identification method for black smoke vehicles.
Background
The exhaust gas discharged by motor vehicles contains toxic and harmful substances. Black smoke vehicles are typical high-pollution vehicles, and more than 80 percent of them are diesel commercial vehicles (trucks and buses). The exhaust discharged by black smoke vehicles not only pollutes the atmosphere but also harms human health. Pollution control of black smoke vehicles has therefore long been a major task in motor vehicle pollution control.
At present, online monitoring by electronic snapshot, with intelligent identification of black smoke vehicles, is a powerful tool for tackling motor vehicle pollution. However, owing to the characteristics of the technique and the equipment, the recognition rate of black smoke snapshots is high only in the daytime. At night the snapshot is easily disturbed by factors such as light, shadows and traffic flow, and street lamps and the high beams of oncoming traffic in particular interfere with black smoke recognition, so the accurate recognition rate of black smoke vehicle snapshots at night is very low.
Disclosure of Invention
To address the problems in the prior art, the invention provides a night snapshot identification method for black smoke vehicles.
The invention is realized by the following technical scheme:
a black smoke vehicle night snapshot identification method comprises the following steps:
S1: acquiring a vehicle video image and preprocessing the video image;
S2: extracting texture features from the preprocessed video image: removing the dynamic video background, extracting texture features with the LBP (Local Binary Pattern) texture classification feature algorithm, and extracting motion features with an optical flow method;
S3: extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network, in which spatial-domain features are extracted based on a spatio-temporal deep neural network strategy and temporal-domain features based on an LSTM strategy;
S4: removing nighttime interference;
S5: after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.
Further, the preprocessing comprises obtaining the video stream from a bullet camera over the network, decoding and restoring the video with video codec technology, and performing brightness adjustment, image correction and denoising on the restored video based on a preset video preprocessing algorithm.
Further, in step S2, removing the dynamic video background comprises the steps of:
S201. each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
S202. if the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground;
S203. the weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights of the models are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
S204. the mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to the following formulas:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
where ρ is the learning rate for the parameters of the matched model and η(X_t | μ_k, σ_k) is the Gaussian probability density of X_t under model k;
S205. if no model matches in step S201, the model with the smallest weight is replaced, i.e., its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value;
S206. the models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first;
S207. the first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
further, in the step S3, in the spatial domain feature extraction based on the spatio-temporal depth neural network policy, after the motion feature extraction, preliminary spatial domain discrimination is performed: when the airspace is judged to be smoke, motion information between a group of continuous frames is accumulated through a time flow network part and a circulating neural network part to distinguish a smoke area from a non-smoke area, and after most of the non-smoke areas are filtered, the space-time domain characteristics of the smoke area are extracted for classification and identification.
Further, the spatio-temporal deep neural network combines a 3D convolutional neural network with DenseNet and decomposes the 3D CNN into two pseudo-3D CNNs: a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3. For an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is: d × w × h × c × d_k × 3 × 3;
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3).
Further, the two-path spatio-temporal 3D residual convolution network comprises a plurality of S-P3D network blocks and a plurality of T-P3D network blocks; the connected T-P3D network blocks and S-P3D network blocks form a two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
Further, in step S4, the nighttime interference is removed based on a super-resolution variation algorithm.
Further, step S5 specifically comprises:
S501. input network: a small video block of size d × w × h × c is taken as the input of the network, its dimension is raised by one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1, and low-level spatio-temporal features are extracted;
S502. temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks;
S503. output network: the spatio-temporal features extracted by the two-path 3D residual convolution network are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64, which are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1; finally a global pooling layer is attached and the final classification result is obtained through softmax layer evaluation;
S504. the road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
where ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for identifying a black-smoke vehicle snapshot at night.
A computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the night snapshot recognition method of the black smoke vehicle.
Compared with the prior art, the invention has the following advantages: the intelligent black smoke snapshot recognition algorithm is optimized on the basis of existing black smoke vehicle snapshot recognition, the nighttime interference problem is solved, the false recognition rate at night is reduced, and all-weather day-and-night snapshotting with intelligent identification of black smoke vehicles is achieved.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings;
FIG. 1 is a diagram of encoding neighboring pixels using a variable radius circle;
FIG. 2 is a diagram of a spatiotemporal depth-based neural network architecture;
FIG. 3 is a transformation diagram of a 3D CNN decomposed into two pseudo 3D CNNs;
FIG. 4 is a diagram of a spatiotemporal two-way network architecture;
FIG. 5 is a diagram of a spatial network architecture of the present invention;
FIG. 6 is an LBP algorithm implementation;
FIG. 7 is a function variation in the output stage;
FIG. 8 is another function variation in the output stage.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention relates to a black smoke vehicle night snapshot identification method. A supplementary lighting device is mounted on a pole above the road; the system transmits the video images captured by a high-definition camera to an inference machine in which an artificial-intelligence image recognition algorithm is deployed; the black smoke vehicle is identified and judged using image processing, texture feature extraction and a nighttime snapshot algorithm, and the data are transmitted to a back-end platform through the network. In a specific embodiment, the method comprises the following steps:
1. Vehicle video image acquisition
A high-definition camera is deployed on the road pole to record passing vehicles. A virtual trigger position covering the full roadway (no more than three lanes) is set 22-24 meters in front of the camera; when a vehicle passes this preset position, the high-definition camera captures the image at that moment.
2. Image pre-processing
The video stream is obtained from the bullet camera over the network, decoded and restored with video codec technology, and the restored video is subjected to brightness adjustment, image correction, denoising and other processing based on existing video preprocessing algorithms.
(1) Image graying
When analyzing image problems, the image needs a certain conversion because of the influence of the environment and the shooting conditions. Using the RGB model, if the values of the three channels are assumed equal, the color information of a point can be represented by a single gray value in the range 0 to 255. The image graying method used here is mainly the weighted average method.
The weighted-average RGB graying formula is:
Gray(x, y) = U_r·R(x, y) + U_g·G(x, y) + U_b·B(x, y)
where U_r, U_g, U_b are the weights of the three channels and sum to 1.
According to the conversion relationship between the YUV and RGB color spaces, with the weights set to 0.3008, 0.5958 and 0.1133 respectively, the simplified formula is:
Gray(x, y) = 0.3008·R(x, y) + 0.5958·G(x, y) + 0.1133·B(x, y)
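As a minimal sketch of this graying step (assuming an 8-bit BGR frame as delivered by OpenCV; the function name is illustrative):

    import numpy as np

    def to_gray(frame_bgr: np.ndarray) -> np.ndarray:
        # Weighted-average graying with the weights given above.
        b = frame_bgr[..., 0].astype(np.float32)
        g = frame_bgr[..., 1].astype(np.float32)
        r = frame_bgr[..., 2].astype(np.float32)
        gray = 0.3008 * r + 0.5958 * g + 0.1133 * b
        return np.clip(gray, 0, 255).astype(np.uint8)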
(2) image de-noising
With the median filtering method, the gray values of all pixels in a window centered on a given pixel (including the center pixel) are sorted, and the middle value of the sorted sequence is assigned to the center pixel. Unlike linear filters that weigh every pixel, median filtering ignores the darkest or brightest pixels in the neighborhood (here a 3 × 3 window) as long as they occupy fewer than half of the window pixels (i.e., fewer than 3^2/2), and thus effectively removes isolated noise points.
The count-based median filtering method sets a threshold T on the absolute gray difference between each pixel in the neighborhood and the center pixel, and thereby divides the pixels of a noisy image into three classes: flat-region points, image edge points and noise points. Taking a 3 × 3 window as an example, let m be the number of pixels in the 8-neighborhood of the center pixel whose absolute gray difference from the center pixel exceeds the threshold T. When m ≤ 2, the center pixel is a flat-region point; when 2 < m < 6, it is an image edge point; and when m ≥ 6, it is a noise point.
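A sketch of this count-based median filter, assuming a grayscale uint8 image; the 3 × 3 window follows the text, while the threshold value T = 20 is an illustrative assumption:

    import numpy as np

    def count_based_median(img: np.ndarray, T: int = 20) -> np.ndarray:
        out = img.copy()
        h, w = img.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                win = img[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
                # m: neighbors whose absolute gray difference from the
                # center exceeds T (the center itself contributes 0).
                m = int((np.abs(win - int(img[y, x])) > T).sum())
                if m >= 6:
                    # Noise point: replace with the window median.
                    out[y, x] = np.uint8(np.median(win))
                # m <= 2 (flat region) and 2 < m < 6 (edge): keep the pixel.
        return out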
3. Video image texture feature extraction
(1) Video dynamic background removal
To find the set of moving pixels in the video, a background removal method based on a mixture-of-Gaussians model is used, in which moving regions are set to white and non-moving regions to a black background. The method comprises the following steps:
1. Each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
2. If the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground.
3. The weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
4. The mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
5. If no model matches in step 1, the model with the smallest weight is replaced: its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value.
6. The models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first.
7. The first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
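A minimal sketch of this background removal using OpenCV's MOG2 implementation of the mixture-of-Gaussians model; the history length, the variance threshold (2.5^2, matching the 2.5σ test above) and the background ratio T = 0.9 are illustrative assumptions:

    import cv2

    def motion_masks(frames):
        mog = cv2.createBackgroundSubtractorMOG2(history=500,
                                                 varThreshold=2.5 ** 2,
                                                 detectShadows=False)
        mog.setBackgroundRatio(0.9)      # proportion T of the background
        masks = []
        for f in frames:
            fg = mog.apply(f)            # moving pixels -> white (255)
            fg = cv2.medianBlur(fg, 3)   # suppress isolated noise points
            masks.append(fg)
        return masks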
(2) Smoke LBP characteristic value
To extract the texture of smoke, the LBP (Local Binary Pattern) texture classification feature algorithm is used.
The LBP algorithm computes the relationship between pixels and is an operator used to describe the local texture features of an image; it reflects the relationship between each pixel and its surrounding pixels. The LBP algorithm divides the image into 3 × 3 sub-regions and extracts an LBP feature for each pixel in each sub-region; the implementation is shown in FIG. 6:
where (x_c, y_c) is the center pixel with brightness i_c, and i_p is the brightness of the p-th neighboring pixel; s is a sign function, and the LBP code is:
LBP(x_c, y_c) = Σ_{p=0}^{P-1} s(i_p - i_c)·2^p,  where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise
This descriptor captures the details in the image well. However, as originally proposed, the fixed neighborhood cannot encode texture at varying scales. An extension therefore encodes the neighboring pixels on a circle of variable radius, as in FIG. 1, so that the following neighbors can be captured:
For a given point (x_c, y_c), its neighbors (x_p, y_p), p = 0, ..., P - 1, can be calculated as follows:
x_p = x_c + R·cos(2πp / P)
y_p = y_c - R·sin(2πp / P)
where R is the radius of the circle and P is the number of sample points.
This is an extension of the original LBP operator and is sometimes referred to as extended LBP (or circular LBP). If a point on the circle does not fall on integer image coordinates, its interpolated value is used; OpenCV uses bilinear interpolation, as follows:
f(x, y) ≈ [1 - x   x] · [ f(0,0)  f(0,1) ; f(1,0)  f(1,1) ] · [ 1 - y ; y ]
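A sketch of the circular LBP operator defined above, with bilinear interpolation for off-grid sample points; R and P follow the notation in the text:

    import numpy as np

    def circular_lbp(img: np.ndarray, R: float = 1.0, P: int = 8) -> np.ndarray:
        img = img.astype(np.float32)
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
        out = np.zeros((h, w), dtype=np.int32)
        for p in range(P):
            # Neighbor coordinates on the circle of radius R.
            xp = xs + R * np.cos(2.0 * np.pi * p / P)
            yp = ys - R * np.sin(2.0 * np.pi * p / P)
            x0 = np.clip(np.floor(xp).astype(int), 0, w - 2)
            y0 = np.clip(np.floor(yp).astype(int), 0, h - 2)
            fx = np.clip(xp - x0, 0.0, 1.0)
            fy = np.clip(yp - y0, 0.0, 1.0)
            # Bilinear interpolation of the neighbor brightness i_p.
            ip = (img[y0, x0] * (1 - fx) * (1 - fy)
                  + img[y0, x0 + 1] * fx * (1 - fy)
                  + img[y0 + 1, x0] * (1 - fx) * fy
                  + img[y0 + 1, x0 + 1] * fx * fy)
            out += ((ip - img) >= 0).astype(np.int32) << p  # s(i_p - i_c) * 2^p
        return out

scikit-image's local_binary_pattern function provides an equivalent ready-made operator.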
(3) Motion vector extraction by the optical flow method
To observe the motion trajectories of moving objects, an optical flow method is adopted to extract motion features.
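The patent does not name a specific optical flow variant, so the following sketch uses OpenCV's Farneback dense optical flow as an illustrative choice; the per-pixel magnitude and direction maps serve as the motion features:

    import cv2

    def motion_features(prev_gray, cur_gray):
        flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                            pyr_scale=0.5, levels=3,
                                            winsize=15, iterations=3,
                                            poly_n=5, poly_sigma=1.2, flags=0)
        # Convert (dx, dy) to per-pixel motion magnitude and direction.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        return mag, ang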
4. Spatio-temporal feature extraction with neural network strategies
(1) Spatial domain feature extraction based on space-time deep neural network strategy
To obtain richer texture and spatial information of the black smoke, a spatio-temporal deep neural network is used. After the features of the motion regions are extracted automatically, a preliminary spatial-domain discrimination is performed: on the regions judged in the spatial domain to contain smoke, the temporal-stream network and recurrent neural network parts further accumulate motion information between a group of consecutive frames to distinguish smoke regions from non-smoke regions.
To further reduce the detection area, after most non-smoke regions are filtered out, the spatio-temporal features of the video smoke are extracted for classification and identification. The 3D convolutional neural network is combined with DenseNet, and to reduce the model parameters, 3 × 1 × 1 and 1 × 3 × 3 convolution kernels are used instead of the original 3 × 3 × 3 convolution kernel, as shown in FIG. 2.
The 3D CNN is decomposed into two pseudo-3D CNNs, realizing the expansion from 2D CNN to 3D CNN, as shown in FIG. 3; this preserves the classification and identification capability of the network while reducing the number of network parameters used for extracting spatial-domain features.
Replacing the single 3D kernel with two pseudo-3D structures, namely a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3, reduces the parameter count of the network while keeping the same classification capability. For an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is:
d × w × h × c × d_k × 3 × 3
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3)
It follows that the pseudo-3D computation is (d_k + 3^2) / (d_k × 3^2) of that of the 3D convolution kernel; for example, with d_k = 3 this is 12/27 ≈ 44%. The number of network parameters is thus greatly reduced while good classification capability is maintained.
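A minimal PyTorch sketch of this pseudo-3D decomposition (a 1 × 3 × 3 spatial convolution followed by a d_k × 1 × 1 temporal convolution in place of one d_k × 3 × 3 kernel); the channel counts and the serial spatial-then-temporal ordering are illustrative assumptions:

    import torch
    import torch.nn as nn

    class PseudoConv3d(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, dk: int = 3):
            super().__init__()
            self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                     padding=(0, 1, 1))         # 1 x 3 x 3
            self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(dk, 1, 1),
                                      padding=(dk // 2, 0, 0))  # d_k x 1 x 1
            self.bn = nn.BatchNorm3d(out_ch)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.relu(self.bn(self.temporal(self.spatial(x))))

    # x: (batch, channels, frames, height, width) video feature block.
    x = torch.randn(1, 16, 16, 112, 112)
    y = PseudoConv3d(16, 32)(x)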
(2) Extraction of time domain features based on LSTM strategy
So that the neural network can classify events at each time step and use previous events to infer the next one, an LSTM neural network is used for long-term memory.
There are three main stages inside the LSTM:
1. Forget stage. This stage selectively forgets the input coming from the previous node; put simply, it forgets the unimportant and remembers the important.
2. Select-memory stage. This stage selectively "remembers" the input of the current step, mainly selecting and memorizing the input x_t: important parts are recorded emphatically, unimportant parts less so. The current input content is represented by the previously computed z, and the selection is controlled by the gating signal z^i (i for information).
3. Output stage. This stage determines what will be output as the current state, controlled mainly by z^o; the cell state c_t obtained in the previous stage is also scaled through a tanh activation function. As shown in FIG. 7:
Here x is the data input at the current step and h is the hidden state received from the previous node;
y is the output at the current node state and h' is the hidden state passed to the next node. As shown in FIG. 8:
When the input feature value x is less than 0, the output is 0; the more neurons that output 0 after training, the sparser the representation, the more discriminative the extracted features and the stronger the generalization capability. When the input feature value x is greater than 0, the output equals the input, which avoids the gradient vanishing problem and converges quickly.
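As an illustrative sketch of the temporal strategy above, per-frame motion features can be accumulated with an LSTM; the feature and hidden dimensions are assumptions:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
    frame_feats = torch.randn(1, 16, 256)    # 16 frames, 256-d features each
    outputs, (h_n, c_n) = lstm(frame_feats)  # h_n summarizes the sequence
    logits = nn.Linear(128, 2)(h_n[-1])      # smoke / non-smoke scores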
The output of the convolutional layers is batch-normalized, so that the normalized output serves as the input of the activation function, alleviating the problems of inconsistent data distributions across layers and gradient dispersion. This yields the basic structure of the two-path spatio-temporal 3D residual convolution network. FIG. 4 shows the two-path 3D residual convolution structure and its evolved variants: the P3D block, the S-P3D network block and the T-P3D network block. Several T-P3D network blocks are connected with several S-P3D network blocks to build the two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
(3) Introducing an attention mechanism to improve classification and identification capability
To improve detection efficiency, an attention mechanism is introduced that automatically learns the importance of the features extracted by the temporal-domain and spatial-domain networks, enhancing the features useful for classification and suppressing the useless ones. The feature channels of the two-path spatio-temporal network are weighted separately, improving the classification and identification capability of the network. The network structure comprises the following three steps:
1) Squeeze operation: all dimensions of the input tensor except the feature-channel dimension c are compressed through a global pooling layer, converting the input into a real-valued vector with as many entries as feature channels; an input of size d × w × h × c becomes a vector of size 1 × c;
2) Excitation operation: the converted feature vector is compressed through a fully connected layer, reducing its dimension to c/r (size 1 × c/r); after function activation, another fully connected layer produces a feature weight vector whose dimension matches the number of input feature channels (size 1 × c);
3) Reweight operation: the weights are normalized by a Sigmoid function, and the weights obtained from the Excitation operation are finally applied to the feature channels, realizing the recalibration of the features.
The normalized feature-channel weights in the Reweight operation lie in the range (0, 1): the closer a weight is to 0, the smaller the influence of that feature on the classification result; the closer to 1, the greater the influence.
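A sketch of this Squeeze-Excitation recalibration applied to a 3D feature map; the reduction ratio r = 4 is an illustrative assumption:

    import torch
    import torch.nn as nn

    class SE3d(nn.Module):
        def __init__(self, c: int, r: int = 4):
            super().__init__()
            self.squeeze = nn.AdaptiveAvgPool3d(1)   # d x w x h x c -> 1 x c
            self.excite = nn.Sequential(
                nn.Linear(c, c // r), nn.ReLU(inplace=True),  # compress to c/r
                nn.Linear(c // r, c), nn.Sigmoid())           # restore, normalize

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c = x.shape[:2]
            w = self.excite(self.squeeze(x).view(b, c))  # channel weights in (0,1)
            return x * w.view(b, c, 1, 1, 1)             # reweight the channels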
5. Nighttime interference removal
In the night environment the snapshot is easily disturbed by factors such as light, shadows and traffic flow, in particular the influence of street lamps and the high beams of oncoming traffic on black smoke recognition. A super-resolution variation algorithm is used to solve the nighttime interference problem and reduce the false recognition rate. The super-resolution variation algorithm is realized as follows:
The candidate motion regions are taken as the input of the spatial network to extract spatial-domain features; on the regions judged to contain smoke, an RNN is further used to accumulate the motion features over time; finally a Softmax loss function is used for classification and identification. The loss function is as follows:
[Target perceptual loss function: equation not legible in the source text]
6. Evaluating the classification result
The temporal-domain and spatial-domain features of the video are extracted respectively, the attention mechanism is introduced to recalibrate the feature channels of the temporal and spatial networks, the two networks are weighted separately to screen out the features that benefit the classification result, and the spatio-temporal features are then fused to obtain the final classification network, which takes a video block as input and directly outputs the classification result. The realization method comprises the following steps:
Input network: a small video block of size d × w × h × c is taken as the input of the network; one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1 raises its dimension and extracts low-level spatio-temporal features.
Temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks.
Output network: the features extracted by the two paths are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64; the fused features are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1, a global pooling layer is attached, and the final classification result is obtained through softmax layer evaluation. A code sketch of this assembly is given below.
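A minimal PyTorch sketch of this assembly under the stated sizes (16-channel stem, two 32-channel paths, 64-channel fusion, global pooling, softmax); the S-P3D/T-P3D internals are simplified to single strided convolutions, and all names are illustrative:

    import torch
    import torch.nn as nn

    class SmokeNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stem = nn.Conv3d(3, 16, kernel_size=1)   # 1x1x1 dimension raise
            self.s_path = nn.Sequential(                  # spatial path (1x3x3)
                nn.Conv3d(16, 32, (1, 3, 3), stride=2, padding=(0, 1, 1)),
                nn.BatchNorm3d(32), nn.ReLU(inplace=True))
            self.t_path = nn.Sequential(                  # temporal path (3x1x1)
                nn.Conv3d(16, 32, (3, 1, 1), stride=2, padding=(1, 0, 0)),
                nn.BatchNorm3d(32), nn.ReLU(inplace=True))
            self.fuse = nn.Conv3d(64, 64, kernel_size=1)  # normalize fused features
            self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                      nn.Linear(64, 2))   # softmax applied in the loss

        def forward(self, x):
            x = self.stem(x)                              # d x w x h x 16
            y = torch.cat([self.s_path(x), self.t_path(x)], dim=1)  # halved, 64 ch
            return self.head(self.fuse(y))

    video_block = torch.randn(1, 3, 16, 112, 112)  # c=3, d=16, w=h=112
    scores = SmokeNet()(video_block)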
The road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
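A sketch of the ACC / TPR / TNR evaluation above, assuming binary labels with 1 = smoke and 0 = non-smoke:

    def evaluate(y_true, y_pred):
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        acc = (tp + tn) / len(y_true)  # overall accuracy
        tpr = tp / max(tp + fn, 1)     # smoke detection rate
        tnr = tn / max(tn + fp, 1)     # non-smoke correct-rejection rate
        return acc, tpr, tnr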
The invention also provides a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the black-smoke vehicle night snapshot recognition method.
The invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the night snapshot identification method of the black smoke vehicle.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention are also within the protection scope of the invention.

Claims (10)

1. A black smoke vehicle night snapshot identification method is characterized by comprising the following steps:
S1: acquiring a vehicle video image and preprocessing the video image;
S2: extracting texture features from the preprocessed video image: removing the dynamic video background, extracting texture features with the LBP (Local Binary Pattern) texture classification feature algorithm, and extracting motion features with an optical flow method;
S3: extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network, in which spatial-domain features are extracted based on a spatio-temporal deep neural network strategy and temporal-domain features based on an LSTM strategy;
S4: removing nighttime interference;
S5: after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.
2. The black smoke vehicle night snapshot identification method of claim 1, wherein the preprocessing comprises obtaining the video stream from a bullet camera over the network, decoding and restoring the video with video codec technology, and performing brightness adjustment, image correction and denoising on the restored video based on a preset video preprocessing algorithm.
3. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S2, removing the dynamic video background comprises the steps of:
S201. each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
S202. if the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground;
S203. the weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights of the models are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
S204. the mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to the following formulas:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
where ρ is the learning rate for the parameters of the matched model and η(X_t | μ_k, σ_k) is the Gaussian probability density of X_t under model k;
S205. if no model matches in step S201, the model with the smallest weight is replaced, i.e., its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value;
S206. the models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first;
S207. the first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
4. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S3, in the spatial-domain feature extraction based on the spatio-temporal deep neural network strategy, a preliminary spatial-domain discrimination is performed after motion feature extraction: when the spatial domain is judged to contain smoke, motion information between a group of consecutive frames is accumulated by the temporal-stream network part and the recurrent neural network part to distinguish smoke regions from non-smoke regions, and after most non-smoke regions are filtered out, the spatio-temporal features of the smoke regions are extracted for classification and identification.
5. The black smoke vehicle night snapshot identification method as claimed in claim 4, wherein the spatio-temporal deep neural network combines a 3D convolutional neural network with DenseNet and decomposes the 3D CNN into two pseudo-3D CNNs: a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3; for an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is: d × w × h × c × d_k × 3 × 3;
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3).
6. The black smoke vehicle night snapshot identification method of claim 5, wherein in step S3, the two-path spatio-temporal 3D residual convolution network comprises a plurality of S-P3D network blocks and a plurality of T-P3D network blocks, the connected T-P3D network blocks being connected with the S-P3D network blocks to build a two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
7. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S4 the nighttime interference is removed based on a super-resolution variation algorithm.
8. The black smoke vehicle night snapshot identification method according to claim 6, wherein step S5 specifically comprises:
S501. input network: a small video block of size d × w × h × c is taken as the input of the network, its dimension is raised by one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1, and low-level spatio-temporal features are extracted;
S502. temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks;
S503. output network: the spatio-temporal features extracted by the two-path 3D residual convolution network are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64, which are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1; finally a global pooling layer is attached and the final classification result is obtained through softmax layer evaluation;
S504. the road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
where ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method for identifying a black-smoke car snapshot at night of any one of claims 1 to 8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for identifying a snapshot of a black-smoke vehicle at night according to any one of claims 1 to 8 when executing the program.
CN202110146700.1A 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle Pending CN113158747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146700.1A CN113158747A (en) 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle


Publications (1)

Publication Number Publication Date
CN113158747A (en) 2021-07-23

Family

ID=76882750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146700.1A Pending CN113158747A (en) 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle

Country Status (1)

Country Link
CN (1) CN113158747A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067435A (en) * 2021-11-15 2022-02-18 山东大学 Sleep behavior detection method and system based on pseudo-3D convolutional network and attention mechanism
CN114924715A (en) * 2022-06-15 2022-08-19 泰州亚东广告传媒有限公司 System and method for accessing API function of step-counting application program
CN114924715B (en) * 2022-06-15 2023-08-22 泰州亚东广告传媒有限公司 Step counting application program API function access system and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination