CN114866784A - Vehicle detection method based on compressed video DCT (discrete cosine transform) coefficients
- Publication number
- CN114866784A (application CN202210411306.0A)
- Authority
- CN
- China
- Prior art keywords
- frame
- dct coefficients
- dct
- vehicle detection
- blocks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a vehicle detection method based on compressed video DCT coefficients, comprising the following steps: extracting a compressed code stream video and obtaining the first DCT coefficients corresponding to it; preprocessing the first DCT coefficients to obtain second DCT coefficients; constructing a vehicle detection model; acquiring an image sample set from the open-source image data set UA-DETRAC and training the vehicle detection model on it to obtain a vehicle detection network; and obtaining a vehicle detection result from the compressed code stream video, the second DCT coefficients and the vehicle detection network. The method exploits the fact that feature information can be obtained from compressed-format data without fully decoding it and combines this with deep learning, which reduces the complexity of the vehicle detection model, lowers the computing power required for vehicle detection, and meets the requirements of edge computing.
Description
Technical Field
The invention relates to the field of vehicle detection, and in particular to a vehicle detection method based on compressed video DCT coefficients.
Background
With the rapid development of the national economy, the number of motor vehicles in China keeps increasing, which brings a series of problems such as traffic congestion and makes the development of intelligent transportation systems necessary. Road vehicle detection, an important technology in intelligent transportation systems, has developed rapidly. Video-based vehicle detection offers rich information content and causes little interference with traffic facilities.
In current video detection pipelines, the video must first be transmitted over a network, and the transmitted video is in a compressed format. Pixel-domain video vehicle detection methods must fully decode the video and then detect vehicles with current mainstream deep learning methods. Pixel-domain deep learning detection can now achieve real-time end-to-end detection with high precision, but the models are complex, the video must be fully decoded, and the consumption of computing resources is high.
Vehicle detection methods that use video compressed-domain information directly do not need to fully decode the video; they only partially decode it to obtain compressed-domain information and detect vehicles from the feature information contained there, but their precision is not high.
Disclosure of Invention
To solve the above problems, the invention provides a vehicle detection method based on compressed video DCT coefficients.
To achieve this purpose, the invention provides a vehicle detection method based on compressed video DCT coefficients, which comprises the following steps:
S1: extracting a compressed code stream video, and obtaining the first DCT coefficients corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficients to obtain second DCT coefficients;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open-source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficients into the vehicle detection network for detection to obtain the position, type and confidence information of each vehicle, and then drawing the detection box, type and confidence of each vehicle in the decoded frames of the compressed code stream video based on that information, the detection box, type and confidence being the vehicle detection result.
Further, the compressed code stream video is an H.264 compressed code stream video.
Further, in step S1, extracting the H.264 compressed code stream video to obtain the first DCT coefficients corresponding to the compressed code stream video specifically includes the following steps:
converting the size of the H.264 compressed code stream video to 416x416;
the image frames of the H.264 compressed code stream video comprise I frames, P frames and B frames;
using the JM decoder (the H.264 reference software) to obtain the residual DCT coefficients of the 4x4 blocks of an I frame of the H.264 compressed code stream video and the predicted values under the I-frame intra prediction mode, then applying a 4x4 block DCT to the predicted values, and finally adding the transform result to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame;
obtaining the residual DCT coefficients of the P frames and B frames of the H.264 compressed code stream video and the DCT coefficients of their respective reference frames; obtaining the position of each reference coding block and its DCT coefficients from the DCT coefficients of the reference frame and the motion vectors of the P frame or B frame; and obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames from the residual DCT coefficients, the positions of the reference coding blocks and the DCT coefficients of the reference coding blocks;
the DCT coefficients of the 4x4 blocks of the I frames, P frames and B frames are collectively called the first DCT coefficients;
converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the spatial relationship between DCT blocks, thereby obtaining the first DCT coefficients corresponding to the compressed code stream video.
Further, in step S1, obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames from the residual DCT coefficients, the positions of the reference coding blocks and the DCT coefficients of the reference coding blocks specifically includes the following steps:
when the reference coding block of a P frame or B frame lies at an integer-pixel position whose coordinates are multiples of 4 in the reference frame, directly adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 block;
when the reference coding block lies at an integer-pixel position whose coordinates are not multiples of 4 in the reference frame, obtaining the DCT coefficients of the reference coding block from the DCT coefficients of the four adjacent blocks that it overlaps and that lie at positions that are multiples of 4, and then adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 block.
Further, in step S2, preprocessing the first DCT coefficients to obtain the second DCT coefficients specifically includes:
removing the DCT coefficients of the Cb and Cr components from the first DCT coefficients and retaining the 416x416 DCT coefficients of the Y component; converting them into a 52x52x64 format; ordering the 64 coefficients of each block in ZigZag order; and finally taking the first 24 DCT coefficients of the ordered result as the second DCT coefficients.
Further, in step S3, constructing the vehicle detection model specifically includes:
constructing a backbone feature extraction network based on the DarkNet-53 model and combining it with a residual network, so that features are extracted by stacking convolutions and residual structures while the size of the feature map is reduced;
constructing a regression detection network based on a feature pyramid, and detecting vehicles on feature maps of three scales: 52x52, 26x26 and 13x13;
determining a loss function, wherein the loss function comprises detection box coordinate loss, confidence loss and classification loss.
Further, in step S4, the image sample set is acquired as follows:
uniformly scaling the pictures in the open-source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images using the libjpeg library, processing the extracted DCT coefficients, and outputting the DCT coefficients of the Y component with a size of 52x52x24 as the image sample set.
Further, in step S4, training the vehicle detection model with the image sample set specifically includes:
initializing the network weights of the vehicle detection model from a normal distribution;
setting the initial learning rate of the vehicle detection model to 1e-4, and using the Adam algorithm to adapt the learning rate during subsequent training;
setting the anchor box sizes from the label data of the image sample set using K-means clustering and, following the idea of YOLOv3, setting anchor boxes of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13;
setting the parameter values of the vehicle detection model: the detection categories, the batch size and the number of iterations;
training the vehicle detection model using the set of image samples.
Compared with the prior art, the invention has the following beneficial technical effects:
By combining pixel-domain deep learning vehicle detection with methods based on video compressed-domain information, the scheme achieves high-precision end-to-end detection without fully decoding the picture, which greatly reduces resource consumption.
Drawings
FIG. 1 is a flowchart of a vehicle detection method based on compressed video DCT coefficients according to an embodiment;
FIG. 2 is a diagram of a reference coding block and the adjacent blocks from which its DCT coefficients are obtained in one embodiment;
FIG. 3 is a diagram illustrating the spatial relationship between a DCT block and its sub-blocks in one embodiment;
FIG. 4 is a schematic illustration of the ZigZag ordering of an embodiment;
FIG. 5 is a diagram of the backbone feature extraction network architecture of one embodiment;
FIG. 6 is a diagram illustrating the overall architecture of the vehicle detection model according to an embodiment;
FIG. 7 is a test video frame of an embodiment;
FIG. 8 shows vehicle detection results according to one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The data set used in the specific implementation of the invention is the UA-DETRAC vehicle detection data set, which contains 8,250 vehicles and 1.21 million labeled detection boxes. The vehicle detection model is trained on DCT feature information extracted from the data set pictures, which enables vehicle detection on H.264 compressed code streams, that is, on compressed-format video.
As shown in FIG. 1, a method for detecting a vehicle based on compressed video DCT coefficients includes the following steps:
S1: extracting a compressed code stream video, and obtaining the first DCT coefficients corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficients to obtain second DCT coefficients;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open-source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficients into the vehicle detection network for detection to obtain the position, type and confidence information of each vehicle, and then drawing the detection box, type and confidence of each vehicle in the decoded frames of the compressed code stream video based on that information, the detection box, type and confidence being the vehicle detection result.
In an embodiment, in step S1, extracting the H.264 compressed code stream video to obtain the first DCT coefficients corresponding to the compressed code stream video specifically includes the following steps:
converting the size of the H.264 compressed code stream video to 416x416;
the image frames of the H.264 compressed code stream video comprise I frames, P frames and B frames;
using the JM decoder (the H.264 reference software) to obtain the residual DCT coefficients of the 4x4 blocks of an I frame of the H.264 compressed code stream video and the predicted values under the I-frame intra prediction mode, then applying a 4x4 block DCT to the predicted values, and finally adding the transform result to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame;
obtaining the residual DCT coefficients of the P frames and B frames of the H.264 compressed code stream video and the DCT coefficients of their respective reference frames; obtaining the position of each reference coding block and its DCT coefficients from the DCT coefficients of the reference frame and the motion vectors of the P frame or B frame; and obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames from the residual DCT coefficients, the positions of the reference coding blocks and the DCT coefficients of the reference coding blocks;
the DCT coefficients of the 4x4 blocks of the I frames, P frames and B frames are collectively called the first DCT coefficients;
converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the spatial relationship between DCT blocks, thereby obtaining the first DCT coefficients corresponding to the compressed code stream video.
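The I-frame step above relies on the linearity of the DCT: transforming the predicted values and adding the residual coefficients gives the same result as transforming the fully reconstructed block. The following is a minimal Python sketch of that identity, assuming an orthonormal floating-point DCT-II in place of the H.264 integer transform and residual coefficients that are already dequantized. The function names are illustrative and are not JM functions.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T (assumed stand-in for the codec transform)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """2-D DCT of a square block: T @ block @ T^T."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))

def iframe_block_dct(prediction, residual_dct):
    """DCT of the reconstructed block = DCT(prediction) + residual DCT."""
    pred_dct = dct2(prediction)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_dct, residual_dct)]
```

Because the sum is taken in the transform domain, no inverse transform of the residual is needed for this step.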
In an embodiment, in step S1, obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames from the residual DCT coefficients, the positions of the reference coding blocks and the DCT coefficients of the reference coding blocks specifically includes the following steps:
when the reference coding block of a P frame or B frame lies at an integer-pixel position whose coordinates are multiples of 4 in the reference frame, directly adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 block;
when the reference coding block lies at an integer-pixel position whose coordinates are not multiples of 4 in the reference frame, obtaining the DCT coefficients of the reference coding block from the DCT coefficients of the four adjacent blocks that it overlaps and that lie at positions that are multiples of 4, and then adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 block.
Let the reference coding block overlap four blocks, and let x_1, x_2, x_3 and x_4 be the DCT coefficients of those four blocks, as shown in FIG. 2. The DCT coefficients of the reference coding block are obtained from x_1, x_2, x_3 and x_4 by a formula in which I_n denotes the identity matrix of order n.
When the reference coding block lies at a fractional-pixel position, a 4x4 block DCT is applied to the extracted predicted values, and the residual DCT coefficients are then added to obtain the DCT coefficients.
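For the non-aligned integer-pixel case, one widely used formulation is DCT-domain inverse motion compensation, in which each of the four overlapped blocks is pre- and post-multiplied by the DCT of a shift matrix built from identity sub-blocks. The sketch below assumes that formulation, an orthonormal DCT-II, and the block layout x1 = top-left, x2 = top-right, x3 = bottom-left, x4 = bottom-right. It is not necessarily the exact formula of the patent, and all names are illustrative.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(x, t):
    """2-D DCT of x with transform matrix t: t @ x @ t^T."""
    return matmul(matmul(t, x), transpose(t))

def shift_pair(d, n=4):
    """Shift matrices holding I_(n-d) (upper) and I_d (lower) sub-blocks."""
    upper = [[1.0 if c == r + d else 0.0 for c in range(n)] for r in range(n)]
    lower = [[1.0 if r == c + n - d else 0.0 for c in range(n)] for r in range(n)]
    return upper, lower

def reference_block_dct(x1, x2, x3, x4, dy, dx, n=4):
    """DCT of the n x n block at offset (dy, dx) inside the 2n x 2n area
    covered by four aligned blocks, computed from their DCT coefficients."""
    t = dct_matrix(n)
    uy, ly = shift_pair(dy, n)
    ux, lx = shift_pair(dx, n)
    total = [[0.0] * n for _ in range(n)]
    for v, x, h in ((uy, x1, ux), (uy, x2, lx), (ly, x3, ux), (ly, x4, lx)):
        # DCT(V) @ X @ DCT(H^T) equals the DCT of V @ pixels @ H^T.
        term = matmul(matmul(dct2(v, t), x), dct2(transpose(h), t))
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total
```

With dy = dx = 0 the result reduces to x1, which matches the aligned case where the reference DCT coefficients are used directly.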
In one embodiment, in step S1, the DCT coefficient conversion process is as follows:
the DCT coefficients extracted from the H.264 code stream are in 4x4 blocks, whereas the vehicle detection model takes 8x8 block DCT coefficients as input, so the 4x4 blocks of DCT coefficients are converted into 8x8 blocks of DCT coefficients according to the spatial relationship between blocks.
The conversion is expressed in terms of the 4x4 blocks of DCT coefficients, the resulting 8x8 block of DCT coefficients, the DCT transform matrix of order 8, and T_4, the DCT transform matrix of order 4. The spatial relationship between DCT blocks of different sizes is shown in FIG. 3.
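One way to carry out such a conversion uses the spatial relationship between a DCT block and its sub-blocks: the 8x8 coefficients equal A Y A^T, where Y is the 2x2 arrangement of the four 4x4 coefficient blocks and A = T_8 · diag(T_4^T, T_4^T). The Python sketch below assumes orthonormal DCT-II matrices for both T_4 and T_8; the names `merge_4x4_to_8x8`, `y11`, `y12`, `y21` and `y22` are hypothetical and not taken from the patent.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(x, t):
    """2-D DCT of x with transform matrix t: t @ x @ t^T."""
    return matmul(matmul(t, x), transpose(t))

def merge_4x4_to_8x8(y11, y12, y21, y22):
    """Combine four 4x4 DCT blocks (top-left, top-right, bottom-left,
    bottom-right) into the DCT of the 8x8 block they cover."""
    t8 = dct_matrix(8)
    t4t = transpose(dct_matrix(4))
    # Block-diagonal matrix diag(T4^T, T4^T): inverse-transforms each sub-block.
    diag = [[t4t[r % 4][c % 4] if r // 4 == c // 4 else 0.0
             for c in range(8)] for r in range(8)]
    a = matmul(t8, diag)
    y = ([top + right for top, right in zip(y11, y12)]
         + [left + bottom for left, bottom in zip(y21, y22)])
    return matmul(matmul(a, y), transpose(a))
```

The matrix A depends only on the block sizes, so in practice it could be precomputed once and reused for every 8x8 block.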
In an embodiment, in step S2, preprocessing the first DCT coefficients to obtain the second DCT coefficients specifically includes:
removing the DCT coefficients of the Cb and Cr components from the first DCT coefficients and retaining the 416x416 DCT coefficients of the Y component; converting them into a 52x52x64 format; and ordering the 64 coefficients of each block in ZigZag order, as shown in FIG. 4. The DC coefficient at the upper left corner and the first 23 AC coefficients in ZigZag order are retained, that is, the first 24 DCT coefficients of the ordered result, which gives DCT coefficients in a 52x52x24 format; these are the second DCT coefficients.
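The preprocessing step can be sketched in Python as follows. The sketch assumes that the Y-component coefficients arrive as a 416x416 plane in which each 8x8 tile holds the coefficients of one block, and that FIG. 4 uses the JPEG-style ZigZag scan; both are assumptions, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in JPEG-style ZigZag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd anti-diagonals are scanned with the row increasing,
    # even anti-diagonals with the column increasing.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def select_coefficients(plane, block=8, keep=24):
    """Turn an H x W coefficient plane into (H/block) x (W/block) x keep:
    the DC coefficient plus the first keep-1 AC coefficients of each block."""
    order = zigzag_order(block)[:keep]
    rows, cols = len(plane) // block, len(plane[0]) // block
    return [[[plane[br * block + r][bc * block + c] for r, c in order]
             for bc in range(cols)]
            for br in range(rows)]
```

For a 416x416 plane this yields a 52x52x24 array, which is the input size stated for the detection network.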
In one embodiment, in step S3, constructing the vehicle detection model includes:
Constructing a backbone feature extraction network: a lightweight backbone feature extraction network is built based on the DarkNet-53 model, with the structure shown in FIG. 5. Following the residual network idea, features are extracted by stacking convolutions and residual structures while the size of the feature map is reduced.
Constructing a regression detection network based on a feature pyramid, and detecting vehicles on feature maps of three scales, 52x52, 26x26 and 13x13: following the anchor box idea, anchor boxes of different sizes are set on feature maps of different sizes, and vehicles are detected by regression relative to the anchor boxes. The overall structure of the vehicle detection model is shown in FIG. 6. The DBL module in FIG. 6 consists of a convolution layer, a batch normalization (BN) layer and an activation function, and performs downsampling and feature extraction. The Resn residual module is a stack of one DBL and several residual components. Each residual component consists of DBLs and a residual (skip) connection, which prevents vanishing gradients and improves learning accuracy. The input is the DCT coefficients of size 52x52x24; after the feature extraction network, vehicles are detected on feature maps of different sizes, and the final outputs have sizes 13x13x18, 26x26x18 and 52x52x18. The outputs contain the positions of the vehicle detection boxes, the vehicle types and the confidence information.
Determining a loss function, the loss function comprising detection box coordinate loss, confidence loss and classification loss.
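The output sizes stated above are consistent with the usual YOLOv3 head layout, in which each grid cell predicts, for every anchor box, four box coordinates, one confidence score and one score per class. The short Python sketch below shows that arithmetic; the values of three anchor boxes per scale and a single vehicle class are assumptions inferred from the channel count 18, and the function name is hypothetical.

```python
def head_output_shapes(grid=52, downsample=(4, 2, 1),
                       anchors_per_scale=3, num_classes=1):
    """Output tensor shapes of the three detection scales.

    grid is the spatial size of the DCT input (52x52); downsample lists the
    assumed reduction factor of each scale relative to that input.
    """
    # Per anchor: 4 box coordinates + 1 confidence + num_classes class scores.
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(grid // d, grid // d, channels) for d in downsample]

print(head_output_shapes())  # [(13, 13, 18), (26, 26, 18), (52, 52, 18)]
```

Under these assumptions the finest scale keeps the 52x52 resolution of the DCT input, so the network needs fewer downsampling stages than a pixel-domain detector with a 416x416 input.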
In one embodiment, in step S4, the image sample set is acquired as follows:
uniformly scaling the pictures in the open-source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images using the libjpeg library, processing the extracted DCT coefficients, and outputting the DCT coefficients of the Y component with a size of 52x52x24 as the image sample set.
In one embodiment, in step S4, training the vehicle detection model with the image sample set includes:
initializing the network weights of the vehicle detection model from a normal distribution;
setting the initial learning rate of the vehicle detection model to 1e-4, and using the Adam algorithm to adapt the learning rate during subsequent training;
setting the anchor box sizes from the label data of the image sample set using K-means clustering and, following the idea of YOLOv3, setting anchor boxes of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13;
setting the parameter values of the vehicle detection model: the detection categories, the batch size and the number of iterations;
training the vehicle detection model using the set of image samples.
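The anchor box clustering in the steps above can be sketched as follows. The text only specifies K-means on the label data; this Python sketch additionally assumes the 1 - IoU distance on (width, height) pairs that is commonly used for YOLO anchors, and a deterministic initialization chosen for reproducibility. The function names are illustrative.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) labels into k anchor sizes, smallest first."""
    boxes = sorted(boxes, key=lambda b: b[0] * b[1])
    # Deterministic start: k boxes spread evenly over the area-sorted list.
    centers = [boxes[(2 * i + 1) * len(boxes) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda j: iou_wh(b, centers[j]))
            groups[best].append(b)
        updated = [(sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
                   if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers, key=lambda c: c[0] * c[1])
```

With k = 9, the nine resulting sizes could be split into three groups of three and assigned by area to the 52x52, 26x26 and 13x13 feature maps, the smallest anchors going to the finest map as in YOLOv3.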
As shown in FIG. 7, in an embodiment, a picture of a test video frame is extracted during decoding and its DCT coefficients are extracted. The DCT coefficients are processed by the trained vehicle detection model, and the prediction result is mapped onto the decoded video frame, as shown in FIG. 8. The detection result includes the detection box, type and confidence of each vehicle. FIG. 8 shows that the invention achieves a good detection effect by combining the information in compressed-format pictures with deep learning.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of the objects. Where permitted, objects distinguished by "first", "second" and "third" may be interchanged in order or sequence, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. A vehicle detection method based on compressed video DCT coefficients is characterized by comprising the following steps:
S1: extracting a compressed code stream video, and obtaining a first DCT coefficient corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficient to obtain a second DCT coefficient;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open source image data set UA-DETRAC, and then training the vehicle detection model by using the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficient into the vehicle detection network for detection to obtain the position, type and confidence information of the vehicle, and then drawing the detection frame, type and confidence of the vehicle in the decoded compressed code stream video frame based on that information, wherein the detection frame, type and confidence of the vehicle constitute the vehicle detection result.
2. The method according to claim 1, wherein the compressed code stream video is an H.264 compressed code stream video.
3. The method according to claim 2, wherein in step S1, the H.264 compressed code stream video is extracted to obtain the first DCT coefficient corresponding to the compressed code stream video, and the specific process includes the following steps:
converting the size of the H.264 compressed code stream video into 416x416;
the image frame of the H.264 compressed code stream video comprises: i, P, and B frames;
obtaining, by using a JM decoder, the residual DCT coefficients of the 4x4 blocks of the I frame of the H.264 compressed code stream video and the predicted values under the I frame intra-frame prediction mode, then carrying out a 4x4 block DCT transformation on the predicted values, and finally adding the transformation result to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame;
obtaining respective residual DCT coefficients of a P frame and a B frame of the H.264 compressed code stream video and DCT coefficients of respective reference frames, obtaining positions of respective reference coding blocks and the DCT coefficients of the respective reference coding blocks according to the DCT coefficients of the respective reference frames of the P frame and the B frame and respective motion vectors of the P frame and the B frame, and obtaining the DCT coefficients of respective 4x4 blocks of the P frame and the B frame based on the obtained respective residual DCT coefficients of the P frame and the B frame, the positions of the respective reference coding blocks and the DCT coefficients of the respective reference coding blocks;
the DCT coefficients of the 4x4 blocks of each of the I frame, the P frame and the B frame are collectively called first DCT coefficients;
and converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the spatial relationship between the blocks, thereby obtaining the first DCT coefficient corresponding to the compressed code stream video.
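The two DCT-domain operations in claim 3 both rest on the linearity of the transform. The following is a minimal sketch, assuming an orthonormal floating-point DCT-II in place of the H.264 integer transform and its quantization scaling; the function names (`dct_matrix`, `dct2`, `intra_block_dct`, `merge_4x4_to_8x8`) are hypothetical and do not come from the JM decoder.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (assumed stand-in transform)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(block):
    """2-D DCT of a square block."""
    t = dct_matrix(block.shape[0])
    return t @ block @ t.T

def intra_block_dct(prediction_pixels, residual_dct):
    """DCT of an intra-coded 4x4 block: DCT(prediction) + residual DCT."""
    return dct2(prediction_pixels) + residual_dct

def merge_4x4_to_8x8(c00, c01, c10, c11):
    """Combine the DCT coefficients of four neighbouring 4x4 blocks
    (top-left, top-right, bottom-left, bottom-right) into one 8x8 DCT block
    without returning to the pixel domain explicitly."""
    t4, t8 = dct_matrix(4), dct_matrix(8)
    # a = T8 @ blockdiag(T4^T, T4^T) maps stacked 4x4 coefficients to 8x8 coefficients.
    a = t8 @ np.kron(np.eye(2), t4.T)
    stacked = np.block([[c00, c01], [c10, c11]])
    return a @ stacked @ a.T
```

Because the matrix `a` depends only on the block sizes, it could be precomputed once and reused for every group of four blocks.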
4. The method as claimed in claim 3, wherein in step S1, the specific process of obtaining the DCT coefficients of the 4x4 blocks of the P frame and the B frame based on the obtained residual DCT coefficients of the P frame and the B frame, the positions of the reference coding blocks and the DCT coefficients of the reference coding blocks comprises the following steps:
when the reference coding blocks of the P frame and the B frame are located at integer pixel positions that are multiples of 4 in the reference frame, directly adding the residual DCT coefficients of the P frame and the B frame to the DCT coefficients of their respective reference coding blocks to obtain the DCT coefficients of the 4x4 blocks of the P frame and the B frame;
when the reference coding blocks of the P frame and the B frame are located at integer pixel positions that are not multiples of 4 in the reference frame, obtaining the DCT coefficients of the reference coding blocks from the DCT coefficients of the four adjacent blocks located at integer pixel positions that are multiples of 4 in the reference frame, and then adding the residual DCT coefficients of the P frame and the B frame to the DCT coefficients of their respective reference coding blocks to obtain the DCT coefficients of the 4x4 blocks of the P frame and the B frame.
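The second case of claim 4 can be illustrated with shift matrices applied in the DCT domain. This is a sketch under stated assumptions: an orthonormal floating-point DCT-II replaces the H.264 integer transform, motion is restricted to integer-pixel offsets `dy`, `dx` in the range 0 to 3 relative to the top-left aligned block, and the names `reference_block_dct` and `inter_block_dct` are hypothetical. The claim does not specify how the four neighbours are combined, so this is one possible realisation rather than the patented procedure.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (assumed stand-in transform)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

T4 = dct_matrix(4)

def reference_block_dct(c00, c01, c10, c11, dy, dx):
    """DCT coefficients of the 4x4 block at offset (dy, dx) inside the 8x8 area
    covered by four aligned 4x4 blocks, given only their DCT coefficients."""
    # Row-selection matrices for the top and bottom neighbours.
    v = (np.eye(4, k=dy), np.eye(4, k=dy - 4))
    # Column-selection matrices for the left and right neighbours.
    h = (np.eye(4, k=-dx), np.eye(4, k=4 - dx))
    neighbours = ((c00, c01), (c10, c11))
    out = np.zeros((4, 4))
    for i in range(2):
        for j in range(2):
            v_dct = T4 @ v[i] @ T4.T  # selection matrix expressed in the DCT domain
            h_dct = T4 @ h[j] @ T4.T
            out += v_dct @ neighbours[i][j] @ h_dct
    return out

def inter_block_dct(residual_dct, c00, c01, c10, c11, dy, dx):
    """DCT of an inter-coded block: residual DCT + DCT of the reference block."""
    return residual_dct + reference_block_dct(c00, c01, c10, c11, dy, dx)
```

With `dy = dx = 0` the selection matrices reduce to the identity for the top-left block and to zero for the others, which corresponds to the aligned case of direct addition.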
5. The method as claimed in claim 4, wherein in step S2, preprocessing the first DCT coefficient to obtain the second DCT coefficient includes:
removing the DCT coefficients of the Cb and Cr components from the first DCT coefficients, retaining the DCT coefficients in the 416x416 format, converting them into a 52x52x64 format, ordering the converted DCT coefficients in ZigZag order, and finally taking the first 24 DCT coefficients of the ordering result as the second DCT coefficients.
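The reshaping in claim 5 can be sketched as follows. The layout of the input is an assumption: the retained luminance coefficients are taken to be a 416x416 array in which each 8x8 tile holds the coefficients of one block, and the ZigZag order is taken to be the conventional JPEG scan. The names `zigzag_indices` and `preprocess_y_dct` are hypothetical.

```python
import numpy as np

def zigzag_indices(n=8):
    """Flat indices of an n x n block in conventional ZigZag scan order."""
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom, even ones bottom to top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return np.array([r * n + c for r, c in order])

def preprocess_y_dct(y_plane, block=8, keep=24):
    """Turn an (H, W) plane of block DCT coefficients into (H/8, W/8, keep)."""
    h, w = y_plane.shape
    bh, bw = h // block, w // block
    # Split into 8x8 tiles, then flatten each tile: (52, 52, 64) for a 416x416 plane.
    tiles = y_plane.reshape(bh, block, bw, block).transpose(0, 2, 1, 3)
    flat = tiles.reshape(bh, bw, block * block)
    # Reorder each tile by ZigZag scan and keep the lowest-frequency coefficients.
    return flat[:, :, zigzag_indices(block)[:keep]]
```

Keeping the first 24 ZigZag coefficients retains the low-frequency content of each block and discards 40 of the 64 coefficients, which reduces the input depth of the detection network.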
6. The method for detecting vehicles according to claim 5, wherein in step S3, the specific process of constructing the vehicle detection model includes:
constructing a backbone feature extraction network based on the DarkNet-53 model, combining the backbone feature extraction network with a residual network, extracting features by stacking convolution and residual structures, and reducing the size of the feature map;
constructing a regression detection network based on a feature pyramid, and detecting vehicles on feature maps of three scales of 52x52, 26x26 and 13x 13;
determining a loss function, wherein the loss function comprises detection frame coordinate loss, confidence loss and classification loss.
7. The method according to claim 6, wherein in step S4, the step of obtaining the image sample set comprises:
uniformly scaling the pictures in the open source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images based on the Libjpeg library, processing the extracted DCT coefficients, and outputting the DCT coefficients of the Y component with a size of 52x52x24 as the image sample set.
8. The method according to claim 7, wherein the step S4 of training the vehicle detection model using the image sample set comprises:
initializing the network weights of the vehicle detection model by using a normal distribution;
setting the initial learning rate of the vehicle detection model to 1e-4, and using the Adam algorithm to adapt the learning rate during subsequent training;
setting the sizes of the anchor frames according to the label data of the image sample set by using a K-means clustering method, and, following the approach of YOLOv3, setting anchor frames of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13;
setting the parameter values of the vehicle detection model: the detection categories, the batch size and the number of iterations;
training the vehicle detection model using the set of image samples.
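The anchor-setting step of claim 8 can be sketched as K-means over the widths and heights of the labelled boxes. The claim only names K-means; using IoU as the similarity measure (as is customary for YOLO-style anchors), choosing nine clusters for three anchors at each of three scales, and the names `iou_wh` and `kmeans_anchors` are all assumptions made for this illustration.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) boxes and (K, 2) anchors given as (width, height),
    with all rectangles aligned at a common corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster box sizes into k anchors, returned sorted by area (ascending)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU.
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Under the YOLOv3 convention, the three smallest anchors returned here would be assigned to the 52x52 feature map, the next three to 26x26, and the three largest to 13x13.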
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210411306.0A CN114866784A (en) | 2022-04-19 | 2022-04-19 | Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114866784A true CN114866784A (en) | 2022-08-05 |
Family
ID=82631816
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210411306.0A Pending CN114866784A (en) | 2022-04-19 | 2022-04-19 | Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114866784A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030194007A1 (en) * | 2002-04-12 | 2003-10-16 | William Chen | Method and apparatus for memory efficient compressed domain video processing |
CN111726633A (en) * | 2020-05-11 | 2020-09-29 | 河南大学 | Compressed video stream recoding method based on deep learning and significance perception |
CN111914664A (en) * | 2020-07-06 | 2020-11-10 | 同济大学 | Vehicle multi-target detection and track tracking method based on re-identification |
Non-Patent Citations (1)
Title |
---|
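Li Xiaogang (李晓港): "Research on Vehicle Detection Methods Based on the Video Compressed Domain" (基于视频压缩域的车辆检测方法研究), Engineering Science and Technology II (《工程科技II辑》), 31 January 2024 (2024-01-31) * |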
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115049936B (en) | High-resolution remote sensing image-oriented boundary enhanced semantic segmentation method | |
CN113343707B (en) | Scene text recognition method based on robustness characterization learning | |
Zhang et al. | Objective video quality assessment combining transfer learning with CNN | |
CN111222513B (en) | License plate number recognition method and device, electronic equipment and storage medium | |
CN106529419B (en) | The object automatic testing method of saliency stacking-type polymerization | |
CN111626293A (en) | Image text recognition method and device, electronic equipment and storage medium | |
CN107046645A (en) | Image coding/decoding method and device | |
CN104661037B (en) | The detection method and system that compression image quantization table is distorted | |
CN111491167B (en) | Image encoding method, transcoding method, device, equipment and storage medium | |
CN112507842A (en) | Video character recognition method and device based on key frame extraction | |
CN113066089B (en) | Real-time image semantic segmentation method based on attention guide mechanism | |
CN111696136A (en) | Target tracking method based on coding and decoding structure | |
CN111429468B (en) | Cell nucleus segmentation method, device, equipment and storage medium | |
CN113505640A (en) | Small-scale pedestrian detection method based on multi-scale feature fusion | |
CN116824694A (en) | Action recognition system and method based on time sequence aggregation and gate control transducer | |
WO2023203509A1 (en) | Image data compression method and device using segmentation and classification | |
CN117877068B (en) | Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method | |
CN110490170B (en) | Face candidate frame extraction method | |
CN114866784A (en) | Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients | |
CN115170807A (en) | Image segmentation and model training method, device, equipment and medium | |
CN113727050B (en) | Video super-resolution processing method and device for mobile equipment and storage medium | |
CN116091862A (en) | Picture quality identification method, device, equipment, storage medium and product | |
CN114092827A (en) | Image data set generation method | |
CN113255646A (en) | Real-time scene text detection method | |
CN114758387B (en) | Lightweight face anti-fraud method and device based on single-frame RGB image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||