CN113158747A - Night snapshot identification method for black smoke vehicle - Google Patents

Night snapshot identification method for black smoke vehicle

Info

Publication number
CN113158747A
Authority
CN
China
Prior art keywords
time
smoke
network
space
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110146700.1A
Other languages
Chinese (zh)
Inventor
李晓斌
李毓勤
何玉龙
周当
刘颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Skyland Information Technology Co ltd
Original Assignee
Guangzhou Skyland Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Skyland Information Technology Co ltd filed Critical Guangzhou Skyland Information Technology Co ltd
Priority to CN202110146700.1A
Publication of CN113158747A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/41Analysis of texture based on statistical description of texture
    • G06T7/44Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for capturing and identifying black smoke vehicles at night, comprising the following steps: acquiring a vehicle video image; extracting texture features from the video image; extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network; removing nighttime interference; and, after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.

Description

Night snapshot identification method for black smoke vehicle
Technical Field
The invention belongs to the field of environmental technology, and particularly relates to a night snapshot identification method for black smoke vehicles.
Background
The exhaust gas discharged by motor vehicles contains toxic and harmful substances. Black smoke vehicles are typical high-pollution vehicles, and more than 80 percent of them are diesel commercial vehicles (trucks and buses). The exhaust discharged by black smoke vehicles not only pollutes the atmosphere but also harms human health. Pollution control of black smoke vehicles has therefore long been a major task in motor vehicle pollution control.
At present, online monitoring by electronic snapshot, with intelligent identification of black smoke vehicles, is a powerful tool for tackling motor vehicle pollution. However, owing to the characteristics of the technique and the equipment, the recognition rate of black smoke snapshots is high only in the daytime. At night the snapshot is easily disturbed by factors such as light, shadows and traffic flow, and street lamps and the high beams of oncoming traffic in particular interfere with black smoke recognition, so the accurate recognition rate of black smoke vehicle snapshots at night is very low.
Disclosure of Invention
To address the problems in the prior art, the invention provides a night snapshot identification method for black smoke vehicles.
The invention is realized by the following technical scheme:
a black smoke vehicle night snapshot identification method comprises the following steps:
S1: acquiring a vehicle video image and preprocessing the video image;
S2: extracting texture features from the preprocessed video image: removing the dynamic video background, extracting texture features with the LBP (Local Binary Pattern) texture classification feature algorithm, and extracting motion features with an optical flow method;
S3: extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network, in which spatial-domain features are extracted based on a spatio-temporal deep neural network strategy and temporal-domain features based on an LSTM strategy;
S4: removing nighttime interference;
S5: after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.
Further, the preprocessing comprises obtaining the video stream from a bullet camera over the network, decoding and restoring the video with video codec technology, and performing brightness adjustment, image correction and denoising on the restored video based on a preset video preprocessing algorithm.
Further, in step S2, removing the dynamic video background comprises the steps of:
S201. each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
S202. if the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground;
S203. the weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights of the models are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
S204. the mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to the following formulas:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
where ρ is the learning rate for the parameters of the matched model and η(X_t | μ_k, σ_k) is the Gaussian probability density of X_t under model k;
S205. if no model matches in step S201, the model with the smallest weight is replaced, i.e., its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value;
S206. the models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first;
S207. the first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
further, in the step S3, in the spatial domain feature extraction based on the spatio-temporal depth neural network policy, after the motion feature extraction, preliminary spatial domain discrimination is performed: when the airspace is judged to be smoke, motion information between a group of continuous frames is accumulated through a time flow network part and a circulating neural network part to distinguish a smoke area from a non-smoke area, and after most of the non-smoke areas are filtered, the space-time domain characteristics of the smoke area are extracted for classification and identification.
Further, the spatio-temporal deep neural network combines a 3D convolutional neural network with DenseNet and decomposes the 3D CNN into two pseudo-3D CNNs: a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3. For an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is: d × w × h × c × d_k × 3 × 3;
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3).
Further, the two-path spatio-temporal 3D residual convolution network comprises a plurality of S-P3D network blocks and a plurality of T-P3D network blocks; the connected T-P3D network blocks and S-P3D network blocks form a two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
Further, in step S4, the nighttime interference is removed based on a super-resolution variation algorithm.
Further, step S5 specifically comprises:
S501. input network: a small video block of size d × w × h × c is taken as the input of the network, its dimension is raised by one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1, and low-level spatio-temporal features are extracted;
S502. temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks;
S503. output network: the spatio-temporal features extracted by the two-path 3D residual convolution network are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64, which are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1; finally a global pooling layer is attached and the final classification result is obtained through softmax layer evaluation;
S504. the road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
where ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for identifying a black-smoke vehicle snapshot at night.
A computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the night snapshot recognition method of the black smoke vehicle.
Compared with the prior art, the invention has the following advantages: the intelligent black smoke snapshot recognition algorithm is optimized on the basis of existing black smoke vehicle snapshot recognition, the nighttime interference problem is solved, the false recognition rate at night is reduced, and all-weather day-and-night snapshotting with intelligent identification of black smoke vehicles is achieved.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings;
FIG. 1 is a diagram of encoding neighboring pixels using a variable radius circle;
FIG. 2 is a diagram of a spatiotemporal depth-based neural network architecture;
FIG. 3 is a transformation diagram of a 3D CNN decomposed into two pseudo 3D CNNs;
FIG. 4 is a diagram of a spatiotemporal two-way network architecture;
FIG. 5 is a diagram of a spatial network architecture of the present invention;
FIG. 6 is an LBP algorithm implementation;
FIG. 7 is a function variation in the output stage;
FIG. 8 is another function variation in the output stage.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention relates to a black smoke vehicle night snapshot identification method. A supplementary lighting device is mounted on a pole above the road; the system transmits the video images captured by a high-definition camera to an inference machine in which an artificial-intelligence image recognition algorithm is deployed; the black smoke vehicle is identified and judged using image processing, texture feature extraction and a nighttime snapshot algorithm, and the data are transmitted to a back-end platform through the network. In a specific embodiment, the method comprises the following steps:
1. Vehicle video image acquisition
A high-definition camera is deployed on the road pole to record passing vehicles. A virtual trigger position covering the full roadway (no more than three lanes) is set 22-24 meters in front of the camera; when a vehicle passes this preset position, the high-definition camera captures the image at that moment.
2. Image pre-processing
The video stream is obtained from the bullet camera over the network, decoded and restored with video codec technology, and the restored video is subjected to brightness adjustment, image correction, denoising and other processing based on existing video preprocessing algorithms.
(1) Image graying
When analyzing image problems, the image needs a certain conversion because of the influence of the environment and the shooting conditions. Using the RGB model, if the values of the three channels are assumed equal, the color information of a point can be represented by a single gray value in the range 0 to 255. The image graying method used here is mainly the weighted average method.
The weighted-average RGB graying formula is:
Gray(x, y) = U_r·R(x, y) + U_g·G(x, y) + U_b·B(x, y)
where U_r, U_g, U_b are the weights of the three channels and sum to 1.
According to the conversion relationship between the YUV and RGB color spaces, with the weights set to 0.3008, 0.5958 and 0.1133 respectively, the simplified formula is:
Gray(x, y) = 0.3008·R(x, y) + 0.5958·G(x, y) + 0.1133·B(x, y)
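As a minimal sketch of this graying step (assuming an 8-bit BGR frame as delivered by OpenCV; the function name is illustrative):

    import numpy as np

    def to_gray(frame_bgr: np.ndarray) -> np.ndarray:
        # Weighted-average graying with the weights given above.
        b = frame_bgr[..., 0].astype(np.float32)
        g = frame_bgr[..., 1].astype(np.float32)
        r = frame_bgr[..., 2].astype(np.float32)
        gray = 0.3008 * r + 0.5958 * g + 0.1133 * b
        return np.clip(gray, 0, 255).astype(np.uint8)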
(2) image de-noising
With the median filtering method, the gray values of all pixels in a window centered on a given pixel (including the center pixel) are sorted, and the middle value of the sorted sequence is assigned to the center pixel. Unlike linear filters that weigh every pixel, median filtering ignores the darkest or brightest pixels in the neighborhood (here a 3 × 3 window) as long as they occupy fewer than half of the window pixels (i.e., fewer than 3^2/2), and thus effectively removes isolated noise points.
The count-based median filtering method sets a threshold T on the absolute gray difference between each pixel in the neighborhood and the center pixel, and thereby divides the pixels of a noisy image into three classes: flat-region points, image edge points and noise points. Taking a 3 × 3 window as an example, let m be the number of pixels in the 8-neighborhood of the center pixel whose absolute gray difference from the center pixel exceeds the threshold T. When m ≤ 2, the center pixel is a flat-region point; when 2 < m < 6, it is an image edge point; and when m ≥ 6, it is a noise point.
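A sketch of this count-based median filter, assuming a grayscale uint8 image; the 3 × 3 window follows the text, while the threshold value T = 20 is an illustrative assumption:

    import numpy as np

    def count_based_median(img: np.ndarray, T: int = 20) -> np.ndarray:
        out = img.copy()
        h, w = img.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                win = img[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
                # m: neighbors whose absolute gray difference from the
                # center exceeds T (the center itself contributes 0).
                m = int((np.abs(win - int(img[y, x])) > T).sum())
                if m >= 6:
                    # Noise point: replace with the window median.
                    out[y, x] = np.uint8(np.median(win))
                # m <= 2 (flat region) and 2 < m < 6 (edge): keep the pixel.
        return out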
3. Video image texture feature extraction
(1) Video dynamic background removal
To find the set of moving pixels in the video, a background removal method based on a mixture-of-Gaussians model is used, in which moving regions are set to white and non-moving regions to a black background. The method comprises the following steps:
1. Each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
2. If the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground.
3. The weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
4. The mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
5. If no model matches in step 1, the model with the smallest weight is replaced: its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value.
6. The models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first.
7. The first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
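A minimal sketch of this background removal using OpenCV's MOG2 implementation of the mixture-of-Gaussians model; the history length, the variance threshold (2.5^2, matching the 2.5σ test above) and the background ratio T = 0.9 are illustrative assumptions:

    import cv2

    def motion_masks(frames):
        mog = cv2.createBackgroundSubtractorMOG2(history=500,
                                                 varThreshold=2.5 ** 2,
                                                 detectShadows=False)
        mog.setBackgroundRatio(0.9)      # proportion T of the background
        masks = []
        for f in frames:
            fg = mog.apply(f)            # moving pixels -> white (255)
            fg = cv2.medianBlur(fg, 3)   # suppress isolated noise points
            masks.append(fg)
        return masks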
(2) Smoke LBP characteristic value
To extract the texture of smoke, the LBP (Local Binary Pattern) texture classification feature algorithm is used.
The LBP algorithm computes the relationship between pixels and is an operator used to describe the local texture features of an image; it reflects the relationship between each pixel and its surrounding pixels. The LBP algorithm divides the image into 3 × 3 sub-regions and extracts an LBP feature for each pixel in each sub-region; the implementation is shown in FIG. 6:
where (x_c, y_c) is the center pixel with brightness i_c, and i_p is the brightness of the p-th neighboring pixel; s is a sign function, and the LBP code is:
LBP(x_c, y_c) = Σ_{p=0}^{P-1} s(i_p - i_c)·2^p,  where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise
This descriptor captures the details in the image well. However, as originally proposed, the fixed neighborhood cannot encode texture at varying scales. An extension therefore encodes the neighboring pixels on a circle of variable radius, as in FIG. 1, so that the following neighbors can be captured:
For a given point (x_c, y_c), its neighbors (x_p, y_p), p = 0, ..., P - 1, can be calculated as follows:
x_p = x_c + R·cos(2πp / P)
y_p = y_c - R·sin(2πp / P)
where R is the radius of the circle and P is the number of sample points.
This is an extension of the original LBP operator and is sometimes referred to as extended LBP (or circular LBP). If a point on the circle does not fall on integer image coordinates, its interpolated value is used; OpenCV uses bilinear interpolation, as follows:
f(x, y) ≈ [1 - x   x] · [ f(0,0)  f(0,1) ; f(1,0)  f(1,1) ] · [ 1 - y ; y ]
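A sketch of the circular LBP operator defined above, with bilinear interpolation for off-grid sample points; R and P follow the notation in the text:

    import numpy as np

    def circular_lbp(img: np.ndarray, R: float = 1.0, P: int = 8) -> np.ndarray:
        img = img.astype(np.float32)
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
        out = np.zeros((h, w), dtype=np.int32)
        for p in range(P):
            # Neighbor coordinates on the circle of radius R.
            xp = xs + R * np.cos(2.0 * np.pi * p / P)
            yp = ys - R * np.sin(2.0 * np.pi * p / P)
            x0 = np.clip(np.floor(xp).astype(int), 0, w - 2)
            y0 = np.clip(np.floor(yp).astype(int), 0, h - 2)
            fx = np.clip(xp - x0, 0.0, 1.0)
            fy = np.clip(yp - y0, 0.0, 1.0)
            # Bilinear interpolation of the neighbor brightness i_p.
            ip = (img[y0, x0] * (1 - fx) * (1 - fy)
                  + img[y0, x0 + 1] * fx * (1 - fy)
                  + img[y0 + 1, x0] * (1 - fx) * fy
                  + img[y0 + 1, x0 + 1] * fx * fy)
            out += ((ip - img) >= 0).astype(np.int32) << p  # s(i_p - i_c) * 2^p
        return out

scikit-image's local_binary_pattern function provides an equivalent ready-made operator.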
(3) Motion vector extraction by the optical flow method
To observe the motion trajectories of moving objects, an optical flow method is adopted to extract motion features.
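The patent does not name a specific optical flow variant, so the following sketch uses OpenCV's Farneback dense optical flow as an illustrative choice; the per-pixel magnitude and direction maps serve as the motion features:

    import cv2

    def motion_features(prev_gray, cur_gray):
        flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                            pyr_scale=0.5, levels=3,
                                            winsize=15, iterations=3,
                                            poly_n=5, poly_sigma=1.2, flags=0)
        # Convert (dx, dy) to per-pixel motion magnitude and direction.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        return mag, ang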
4. Spatio-temporal feature extraction with neural network strategies
(1) Spatial domain feature extraction based on space-time deep neural network strategy
To obtain richer texture and spatial information of the black smoke, a spatio-temporal deep neural network is used. After the features of the motion regions are extracted automatically, a preliminary spatial-domain discrimination is performed: on the regions judged in the spatial domain to contain smoke, the temporal-stream network and recurrent neural network parts further accumulate motion information between a group of consecutive frames to distinguish smoke regions from non-smoke regions.
To further reduce the detection area, after most non-smoke regions are filtered out, the spatio-temporal features of the video smoke are extracted for classification and identification. The 3D convolutional neural network is combined with DenseNet, and to reduce the model parameters, 3 × 1 × 1 and 1 × 3 × 3 convolution kernels are used instead of the original 3 × 3 × 3 convolution kernel, as shown in FIG. 2.
The 3D CNN is decomposed into two pseudo-3D CNNs, realizing the expansion from 2D CNN to 3D CNN, as shown in FIG. 3; this preserves the classification and identification capability of the network while reducing the number of network parameters used for extracting spatial-domain features.
Replacing the single 3D kernel with two pseudo-3D structures, namely a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3, reduces the parameter count of the network while keeping the same classification capability. For an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is:
d × w × h × c × d_k × 3 × 3
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3)
It follows that the pseudo-3D computation is (d_k + 3^2) / (d_k × 3^2) of that of the 3D convolution kernel; for example, with d_k = 3 this is 12/27 ≈ 44%. The number of network parameters is thus greatly reduced while good classification capability is maintained.
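A minimal PyTorch sketch of this pseudo-3D decomposition (a 1 × 3 × 3 spatial convolution followed by a d_k × 1 × 1 temporal convolution in place of one d_k × 3 × 3 kernel); the channel counts and the serial spatial-then-temporal ordering are illustrative assumptions:

    import torch
    import torch.nn as nn

    class PseudoConv3d(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, dk: int = 3):
            super().__init__()
            self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                     padding=(0, 1, 1))         # 1 x 3 x 3
            self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(dk, 1, 1),
                                      padding=(dk // 2, 0, 0))  # d_k x 1 x 1
            self.bn = nn.BatchNorm3d(out_ch)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.relu(self.bn(self.temporal(self.spatial(x))))

    # x: (batch, channels, frames, height, width) video feature block.
    x = torch.randn(1, 16, 16, 112, 112)
    y = PseudoConv3d(16, 32)(x)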
(2) Extraction of time domain features based on LSTM strategy
So that the neural network can classify events at each time step and use previous events to infer the next one, an LSTM neural network is used for long-term memory.
There are three main stages inside the LSTM:
1. Forget stage. This stage selectively forgets the input coming from the previous node; put simply, it forgets the unimportant and remembers the important.
2. Select-memory stage. This stage selectively "remembers" the input of the current step, mainly selecting and memorizing the input x_t: important parts are recorded emphatically, unimportant parts less so. The current input content is represented by the previously computed z, and the selection is controlled by the gating signal z^i (i for information).
3. Output stage. This stage determines what will be output as the current state, controlled mainly by z^o; the cell state c_t obtained in the previous stage is also scaled through a tanh activation function. As shown in FIG. 7:
Here x is the data input at the current step and h is the hidden state received from the previous node;
y is the output at the current node state and h' is the hidden state passed to the next node. As shown in FIG. 8:
When the input feature value x is less than 0, the output is 0; the more neurons that output 0 after training, the sparser the representation, the more discriminative the extracted features and the stronger the generalization capability. When the input feature value x is greater than 0, the output equals the input, which avoids the gradient vanishing problem and converges quickly.
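As an illustrative sketch of the temporal strategy above, per-frame motion features can be accumulated with an LSTM; the feature and hidden dimensions are assumptions:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
    frame_feats = torch.randn(1, 16, 256)    # 16 frames, 256-d features each
    outputs, (h_n, c_n) = lstm(frame_feats)  # h_n summarizes the sequence
    logits = nn.Linear(128, 2)(h_n[-1])      # smoke / non-smoke scores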
The output of the convolutional layers is batch-normalized, so that the normalized output serves as the input of the activation function, alleviating the problems of inconsistent data distributions across layers and gradient dispersion. This yields the basic structure of the two-path spatio-temporal 3D residual convolution network. FIG. 4 shows the two-path 3D residual convolution structure and its evolved variants: the P3D block, the S-P3D network block and the T-P3D network block. Several T-P3D network blocks are connected with several S-P3D network blocks to build the two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
(3) Introducing an attention mechanism to improve classification and identification capability
To improve detection efficiency, an attention mechanism is introduced that automatically learns the importance of the features extracted by the temporal-domain and spatial-domain networks, enhancing the features useful for classification and suppressing the useless ones. The feature channels of the two-path spatio-temporal network are weighted separately, improving the classification and identification capability of the network. The network structure comprises the following three steps:
1) Squeeze operation: all dimensions of the input tensor except the feature-channel dimension c are compressed through a global pooling layer, converting the input into a real-valued vector with as many entries as feature channels; an input of size d × w × h × c becomes a vector of size 1 × c;
2) Excitation operation: the converted feature vector is compressed through a fully connected layer, reducing its dimension to c/r (size 1 × c/r); after function activation, another fully connected layer produces a feature weight vector whose dimension matches the number of input feature channels (size 1 × c);
3) Reweight operation: the weights are normalized by a Sigmoid function, and the weights obtained from the Excitation operation are finally applied to the feature channels, realizing the recalibration of the features.
The normalized feature-channel weights in the Reweight operation lie in the range (0, 1): the closer a weight is to 0, the smaller the influence of that feature on the classification result; the closer to 1, the greater the influence.
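A sketch of this Squeeze-Excitation recalibration applied to a 3D feature map; the reduction ratio r = 4 is an illustrative assumption:

    import torch
    import torch.nn as nn

    class SE3d(nn.Module):
        def __init__(self, c: int, r: int = 4):
            super().__init__()
            self.squeeze = nn.AdaptiveAvgPool3d(1)   # d x w x h x c -> 1 x c
            self.excite = nn.Sequential(
                nn.Linear(c, c // r), nn.ReLU(inplace=True),  # compress to c/r
                nn.Linear(c // r, c), nn.Sigmoid())           # restore, normalize

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c = x.shape[:2]
            w = self.excite(self.squeeze(x).view(b, c))  # channel weights in (0,1)
            return x * w.view(b, c, 1, 1, 1)             # reweight the channels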
5. Nighttime interference removal
In the night environment the snapshot is easily disturbed by factors such as light, shadows and traffic flow, in particular the influence of street lamps and the high beams of oncoming traffic on black smoke recognition. A super-resolution variation algorithm is used to solve the nighttime interference problem and reduce the false recognition rate. The super-resolution variation algorithm is realized as follows:
The candidate motion regions are taken as the input of the spatial network to extract spatial-domain features; on the regions judged to contain smoke, an RNN is further used to accumulate the motion features over time; finally a Softmax loss function is used for classification and identification. The loss function is as follows:
[Target perceptual loss function: equation not legible in the source text]
6. Evaluating the classification result
The temporal-domain and spatial-domain features of the video are extracted respectively, the attention mechanism is introduced to recalibrate the feature channels of the temporal and spatial networks, the two networks are weighted separately to screen out the features that benefit the classification result, and the spatio-temporal features are then fused to obtain the final classification network, which takes a video block as input and directly outputs the classification result. The realization method comprises the following steps:
Input network: a small video block of size d × w × h × c is taken as the input of the network; one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1 raises its dimension and extracts low-level spatio-temporal features.
Temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks.
Output network: the features extracted by the two paths are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64; the fused features are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1, a global pooling layer is attached, and the final classification result is obtained through softmax layer evaluation. A code sketch of this assembly is given below.
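A minimal PyTorch sketch of this assembly under the stated sizes (16-channel stem, two 32-channel paths, 64-channel fusion, global pooling, softmax); the S-P3D/T-P3D internals are simplified to single strided convolutions, and all names are illustrative:

    import torch
    import torch.nn as nn

    class SmokeNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stem = nn.Conv3d(3, 16, kernel_size=1)   # 1x1x1 dimension raise
            self.s_path = nn.Sequential(                  # spatial path (1x3x3)
                nn.Conv3d(16, 32, (1, 3, 3), stride=2, padding=(0, 1, 1)),
                nn.BatchNorm3d(32), nn.ReLU(inplace=True))
            self.t_path = nn.Sequential(                  # temporal path (3x1x1)
                nn.Conv3d(16, 32, (3, 1, 1), stride=2, padding=(1, 0, 0)),
                nn.BatchNorm3d(32), nn.ReLU(inplace=True))
            self.fuse = nn.Conv3d(64, 64, kernel_size=1)  # normalize fused features
            self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                      nn.Linear(64, 2))   # softmax applied in the loss

        def forward(self, x):
            x = self.stem(x)                              # d x w x h x 16
            y = torch.cat([self.s_path(x), self.t_path(x)], dim=1)  # halved, 64 ch
            return self.head(self.fuse(y))

    video_block = torch.randn(1, 3, 16, 112, 112)  # c=3, d=16, w=h=112
    scores = SmokeNet()(video_block)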
The road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
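A sketch of the ACC / TPR / TNR evaluation above, assuming binary labels with 1 = smoke and 0 = non-smoke:

    def evaluate(y_true, y_pred):
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        acc = (tp + tn) / len(y_true)  # overall accuracy
        tpr = tp / max(tp + fn, 1)     # smoke detection rate
        tnr = tn / max(tn + fp, 1)     # non-smoke correct-rejection rate
        return acc, tpr, tnr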
The invention also provides a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the black-smoke vehicle night snapshot recognition method.
The invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the night snapshot identification method of the black smoke vehicle.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention are also within the protection scope of the invention.

Claims (10)

1. A black smoke vehicle night snapshot identification method is characterized by comprising the following steps:
S1: acquiring a vehicle video image and preprocessing the video image;
S2: extracting texture features from the preprocessed video image: removing the dynamic video background, extracting texture features with the LBP (Local Binary Pattern) texture classification feature algorithm, and extracting motion features with an optical flow method;
S3: extracting spatio-temporal features with a two-path spatio-temporal 3D residual convolution network, in which spatial-domain features are extracted based on a spatio-temporal deep neural network strategy and temporal-domain features based on an LSTM strategy;
S4: removing nighttime interference;
S5: after interference removal, extracting the temporal-domain and spatial-domain features, introducing an attention mechanism to recalibrate the feature channels of the temporal and spatial networks, weighting the two networks separately to screen out the features that benefit the classification result, and fusing the spatio-temporal features into the final classification network, which takes a video block as input and directly outputs the classification result.
2. The black smoke vehicle night snapshot identification method of claim 1, wherein the preprocessing comprises obtaining the video stream from a bullet camera over the network, decoding and restoring the video with video codec technology, and performing brightness adjustment, image correction and denoising on the restored video based on a preset video preprocessing algorithm.
3. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S2, removing the dynamic video background comprises the steps of:
S201. each new pixel value X_t is compared with the current K models according to the following formula until a distribution model matching the new pixel value is found, i.e., one whose mean deviates from it by no more than 2.5σ:
|X_t - μ_{i,t-1}| ≤ 2.5·σ_{i,t-1}
S202. if the matched model satisfies the background requirement, the pixel belongs to the background; otherwise it belongs to the foreground;
S203. the weight of each model is updated according to the following formula, where α is the learning rate and M_{k,t} = 1 for the matched model and M_{k,t} = 0 otherwise; the weights of the models are then normalized:
w_{k,t} = (1 - α)·w_{k,t-1} + α·M_{k,t}
S204. the mean μ and standard deviation σ of unmatched models are unchanged, and the parameters of the matched model are updated according to the following formulas:
ρ = α·η(X_t | μ_k, σ_k)
μ_t = (1 - ρ)·μ_{t-1} + ρ·X_t
σ_t^2 = (1 - ρ)·σ_{t-1}^2 + ρ·(X_t - μ_t)^2
where ρ is the learning rate for the parameters of the matched model and η(X_t | μ_k, σ_k) is the Gaussian probability density of X_t under model k;
S205. if no model matches in step S201, the model with the smallest weight is replaced, i.e., its mean is set to the current pixel value, its standard deviation to a large initial value and its weight to a small value;
S206. the models are sorted in descending order of w/σ^2, so that models with large weight and small standard deviation come first;
S207. the first B models are selected as the background, where B satisfies the following formula and the parameter T represents the proportion of the background:
B = argmin_b ( Σ_{k=1}^{b} w_k > T )
4. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S3, in the spatial-domain feature extraction based on the spatio-temporal deep neural network strategy, a preliminary spatial-domain discrimination is performed after motion feature extraction: when the spatial domain is judged to contain smoke, motion information between a group of consecutive frames is accumulated by the temporal-stream network part and the recurrent neural network part to distinguish smoke regions from non-smoke regions, and after most non-smoke regions are filtered out, the spatio-temporal features of the smoke regions are extracted for classification and identification.
5. The black smoke vehicle night snapshot identification method as claimed in claim 4, wherein the spatio-temporal deep neural network combines a 3D convolutional neural network with DenseNet and decomposes the 3D CNN into two pseudo-3D CNNs: a temporal convolution kernel of size d_k × 1 × 1 and a spatial convolution kernel of size 1 × 3 × 3; for an input feature map of size d × w × h × c, where d is the video frame length, w the video width, h the video height and c the input feature dimension per frame, the number of computed parameters for a 3D convolution kernel of size d_k × 3 × 3 is: d × w × h × c × d_k × 3 × 3;
while the computed parameters for the decomposed pseudo-3D convolution kernels are:
d × w × h × c × (d_k + 3 × 3).
6. The black smoke vehicle night snapshot identification method of claim 5, wherein in step S3, the two-path spatio-temporal 3D residual convolution network comprises a plurality of S-P3D network blocks and a plurality of T-P3D network blocks, the connected T-P3D network blocks being connected with the S-P3D network blocks to build a two-path spatio-temporal network that extracts the temporal and spatial features of smoke respectively.
7. The black smoke vehicle night snapshot identification method according to claim 1, wherein in step S4 the nighttime interference is removed based on a super-resolution variation algorithm.
8. The black smoke vehicle night snapshot identification method according to claim 6, wherein step S5 specifically comprises:
S501. input network: a small video block of size d × w × h × c is taken as the input of the network, its dimension is raised by one 3D convolution layer with 16 convolution kernels of size 1 × 1 × 1, and low-level spatio-temporal features are extracted;
S502. temporal and spatial networks: from the input of size d × w × h × 16, two feature layers of size d/2 × w/2 × h/2 × 32 (one temporal-domain, one spatial-domain) are obtained through spatio-temporal networks composed of S-P3D network blocks, T-P3D network blocks and 3D pooling blocks;
S503. output network: the spatio-temporal features extracted by the two-path 3D residual convolution network are concatenated along the feature-channel dimension to obtain fused spatio-temporal features of size d/2 × w/2 × h/2 × 64, which are normalized by one 3D convolution layer containing 64 convolution kernels of size 1 × 1 × 1; finally a global pooling layer is attached and the final classification result is obtained through softmax layer evaluation;
S504. the road black smoke classification result is evaluated and judged according to the following formulas:
ACC = (T_P + T_N) / N
TPR = T_P / (T_P + F_N)
TNR = T_N / (T_N + F_P)
where ACC denotes the accuracy and N is the total number of samples; TPR denotes the proportion of smoke samples predicted as smoke, i.e., the detection rate; TNR denotes the proportion of non-smoke samples predicted as non-smoke (its complement is the false detection rate); T_P is the number of smoke regions correctly detected as smoke; F_N the number of actual smoke regions not recognized; F_P the number of non-smoke regions identified as smoke; and T_N the number of non-smoke regions identified as non-smoke.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method for identifying a black-smoke car snapshot at night of any one of claims 1 to 8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for identifying a snapshot of a black-smoke vehicle at night according to any one of claims 1 to 8 when executing the program.
CN202110146700.1A 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle Pending CN113158747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146700.1A CN113158747A (en) 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle


Publications (1)

Publication Number Publication Date
CN113158747A (en) 2021-07-23

Family

ID=76882750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146700.1A Pending CN113158747A (en) 2021-02-03 2021-02-03 Night snapshot identification method for black smoke vehicle

Country Status (1)

Country Link
CN (1) CN113158747A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067435A (en) * 2021-11-15 2022-02-18 山东大学 Sleep behavior detection method and system based on pseudo-3D convolutional network and attention mechanism
CN114924715A (en) * 2022-06-15 2022-08-19 泰州亚东广告传媒有限公司 System and method for accessing API function of step-counting application program
CN114924715B (en) * 2022-06-15 2023-08-22 泰州亚东广告传媒有限公司 Step counting application program API function access system and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination