CN114866784A - Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients - Google Patents

Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients

Info

Publication number
CN114866784A
CN114866784A
Authority
CN
China
Prior art keywords
frame
dct coefficients
dct
vehicle detection
blocks
Prior art date
2022-04-19
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210411306.0A
Other languages
Chinese (zh)
Inventor
何铁军
李晓港
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-19
Filing date
2022-04-19
Publication date
2022-08-05
Application filed by Southeast University
Priority to CN202210411306.0A
Publication of CN114866784A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems


Abstract

The invention discloses a vehicle detection method based on compressed video DCT coefficients, which comprises the following steps: extracting a compressed code stream video and obtaining first DCT coefficients corresponding to the compressed code stream video; preprocessing the first DCT coefficients to obtain second DCT coefficients; constructing a vehicle detection model; acquiring an image sample set based on the open source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network; and obtaining a vehicle detection result based on the compressed code stream video, the second DCT coefficients and the vehicle detection network. The method exploits the fact that feature information can be obtained from compressed-format data without full decoding and combines this with deep learning, which reduces the complexity of the vehicle detection model and the computing power required for vehicle detection, and meets the requirements of edge computing.

Description

Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients
Technical Field
The invention relates to the field of vehicle detection, and in particular to a vehicle detection method based on compressed video DCT coefficients.
Background
With the rapid development of the national economy, the number of motor vehicles in China keeps increasing, and this growth brings a series of problems such as traffic congestion, so it is necessary to develop intelligent transportation systems. Road vehicle detection, an important technology in intelligent transportation systems, has developed vigorously. Video-based vehicle detection methods are characterized by a large amount of information and little interference with traffic facilities.
Current video detection work first requires the video to be transmitted over a network, and the transmitted video is in a compressed format. Pixel-domain video vehicle detection methods need to fully decode the video and then realize vehicle detection with the current mainstream deep learning methods. Pixel-domain deep learning detection can already achieve real-time end-to-end detection with high accuracy, but the models are complex, the video must be fully decoded, and the consumption of computing resources is high.
Vehicle detection methods that directly use video compressed-domain information do not need to fully decode the video; partial decoding suffices to obtain the compressed-domain information, and vehicles can be detected based on the feature information contained in the compressed domain, but the accuracy is low.
Disclosure of Invention
In order to solve the above problems, the invention provides a vehicle detection method based on compressed video DCT coefficients.
To achieve this purpose, the invention provides a vehicle detection method based on compressed video DCT coefficients, which comprises the following steps:
S1: extracting a compressed code stream video and obtaining first DCT coefficients corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficients to obtain second DCT coefficients;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficients into the vehicle detection network to obtain the position, type and confidence information of each vehicle, and then drawing the detection frame, type and confidence of each vehicle in the decoded compressed code stream video frame based on this information; the drawn detection frames, types and confidences constitute the vehicle detection result.
Further, the compressed code stream video is an H.264 compressed code stream video.
Further, in step S1, the extraction is performed on the H.264 compressed code stream video to obtain the first DCT coefficients corresponding to the compressed code stream video, and the specific process includes the following steps:
converting the size of the H.264 compressed code stream video into 416x416;
the image frames of the H.264 compressed code stream video comprise I frames, P frames and B frames;
obtaining the residual DCT coefficients of the 4x4 blocks of each I frame of the H.264 compressed code stream video and the predicted values under the I-frame intra-frame prediction modes by using the JM reference decoder, then performing a 4x4-block DCT on the predicted values, and finally adding the transform results to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame;
obtaining the residual DCT coefficients of the P frames and B frames of the H.264 compressed code stream video and the DCT coefficients of their reference frames; obtaining the positions and DCT coefficients of the reference coding blocks from the DCT coefficients of the reference frames and the motion vectors of the P frames and B frames; and obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks;
the DCT coefficients of the 4x4 blocks of the I frames, P frames and B frames are collectively called the first DCT coefficients;
and converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the block spatial relation of the DCT coefficients, thereby obtaining the first DCT coefficients corresponding to the compressed code stream video.
Further, in step S1, the specific process of obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks includes the following steps:
when a reference coding block of a P frame or B frame is located in the reference frame at an integer-pixel position that is a multiple of 4, directly adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame;
when a reference coding block of a P frame or B frame is located at an integer-pixel position that is not a multiple of 4, first obtaining the DCT coefficients of the reference coding block from the DCT coefficients of the four adjacent blocks located at multiple-of-4 positions that it overlaps, and then adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame.
Further, in step S2, the specific process of preprocessing the first DCT coefficients to obtain the second DCT coefficients includes:
removing the DCT coefficients of the Cb and Cr components from the first DCT coefficients, keeping the 416x416 Y-component DCT coefficients and converting them into a 52x52x64 format, ordering the format-converted DCT coefficients in ZigZag order, and finally taking the first 24 DCT coefficients of the ordering result, which are the second DCT coefficients.
Further, in step S3, the specific process of constructing the vehicle detection model includes:
constructing a trunk feature extraction network based on the DarkNet-53 model, combining the trunk feature extraction network with a residual network, extracting features through stacked convolution and residual structures, and reducing the size of the feature maps;
constructing a regression detection network based on a feature pyramid and detecting vehicles on feature maps of three scales: 52x52, 26x26 and 13x13;
determining a loss function comprising detection frame coordinate loss, confidence loss and classification loss.
Further, in step S4, the acquisition process of the image sample set includes:
uniformly scaling the pictures in the open source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images based on the Libjpeg library, processing the extracted DCT coefficients and outputting the Y-component DCT coefficients with size 52x52x24, which are the image sample set.
Further, in step S4, the specific process of training the vehicle detection model using the image sample set includes:
initializing the network weights of the vehicle detection model, using a normal distribution for the initial weights;
setting the initial learning rate of the vehicle detection model to 1e-4 and obtaining an adaptive learning rate in subsequent training using the Adam algorithm;
setting the anchor frame sizes according to the label data of the image sample set using the K-means clustering method and, following the idea of YOLOv3, setting anchor frames of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13;
setting the parameter values of the vehicle detection model: detection categories, batch size and number of iterations;
training the vehicle detection model using the image sample set.
Compared with the prior art, the invention has the following beneficial technical effects:
the scheme combines the pixel-domain deep learning vehicle detection method with methods based on video compressed-domain information, so that high-accuracy end-to-end detection can be realized without fully decoding the pictures, greatly reducing resource consumption.
Drawings
FIG. 1 is a flowchart illustrating a method for vehicle detection based on compressed video DCT coefficients according to an embodiment;
FIG. 2 is a diagram of a reference coding block and the four neighboring blocks from which its DCT coefficients are obtained in one embodiment;
FIG. 3 is a diagram illustrating the spatial relationship of a DCT block to sub-blocks of an embodiment;
FIG. 4 is a schematic illustration of a Zigzag arrangement of an embodiment;
FIG. 5 is a diagram of a backbone feature extraction network architecture of one embodiment;
FIG. 6 is a diagram illustrating the overall architecture of the vehicle detection model according to an embodiment;
FIG. 7 is a test video frame of an embodiment;
FIG. 8 shows vehicle detection results according to one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The data set used in the specific implementation of the invention is the UA-DETRAC vehicle detection data set, which contains 8250 vehicles and 1.21 million object detection frames. A vehicle detection model is trained on the DCT feature information extracted from the data set pictures, thereby realizing vehicle detection on the H.264 compressed code stream, that is, vehicle detection on compressed-format video.
As shown in FIG. 1, a vehicle detection method based on compressed video DCT coefficients includes the following steps:
S1: extracting a compressed code stream video and obtaining first DCT coefficients corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficients to obtain second DCT coefficients;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficients into the vehicle detection network to obtain the position, type and confidence information of each vehicle, and then drawing the detection frame, type and confidence of each vehicle in the decoded compressed code stream video frame based on this information; the drawn detection frames, types and confidences constitute the vehicle detection result.
In an embodiment, in step S1, the extraction is performed on the H.264 compressed code stream video to obtain the first DCT coefficients corresponding to the compressed code stream video, and the specific process includes the following steps:
converting the size of the H.264 compressed code stream video into 416x416;
the image frames of the H.264 compressed code stream video comprise I frames, P frames and B frames;
obtaining the residual DCT coefficients of the 4x4 blocks of each I frame of the H.264 compressed code stream video and the predicted values under the I-frame intra-frame prediction modes by using the JM reference decoder, then performing a 4x4-block DCT on the predicted values, and finally adding the transform results to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame (a sketch of this step follows this list);
obtaining the residual DCT coefficients of the P frames and B frames of the H.264 compressed code stream video and the DCT coefficients of their reference frames; obtaining the positions and DCT coefficients of the reference coding blocks from the DCT coefficients of the reference frames and the motion vectors of the P frames and B frames; and obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks;
the DCT coefficients of the 4x4 blocks of the I frames, P frames and B frames are collectively called the first DCT coefficients;
and converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the block spatial relation of the DCT coefficients, thereby obtaining the first DCT coefficients corresponding to the compressed code stream video.
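To make the I-frame step above concrete, here is a minimal numpy sketch of reconstructing the DCT coefficients of one 4x4 I-frame block from its intra prediction and the parsed residual coefficients. It is illustrative only: H.264 actually uses an integer approximation of the 4x4 DCT, while the sketch uses the exact orthonormal DCT matrix, and the function names are ours rather than the JM decoder's API.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal n-point DCT-II matrix T, so that X = T @ x @ T.T."""
    k = np.arange(n)
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)                 # DC row scaling
    return T

T4 = dct_matrix(4)

def i_frame_block_dct(pred: np.ndarray, residual_dct: np.ndarray) -> np.ndarray:
    """DCT coefficients of a reconstructed 4x4 I-frame block: transform the
    intra-prediction block into the DCT domain and, by linearity of the DCT,
    add the residual DCT coefficients parsed from the bitstream."""
    return T4 @ pred @ T4.T + residual_dct
```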
In an embodiment, in step S1, the specific process of obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks includes the following steps:
when a reference coding block of a P frame or B frame is located in the reference frame at an integer-pixel position that is a multiple of 4, directly adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame;
when a reference coding block of a P frame or B frame is located at an integer-pixel position that is not a multiple of 4, first obtaining the DCT coefficients of the reference coding block from the DCT coefficients of the four adjacent blocks located at multiple-of-4 positions that it overlaps, and then adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame.
Let b denote the 4x4 reference coding block, whose top-left corner is offset by (u, v) pixels (0 < u, v < 4) from the aligned block grid, and let x_1, x_2, x_3, x_4 denote the DCT coefficients of the four aligned 4x4 blocks that b intersects (top-left, top-right, bottom-left and bottom-right respectively), as shown in FIG. 2. With I_n representing the n-th order identity matrix, the row extraction matrices are

r_top = [0 I_{4-u}; 0 0],  r_bot = [0 0; I_u 0],

and the column extraction matrices are

c_left = [0 0; I_{4-v} 0],  c_right = [0 I_v; 0 0].

In the pixel domain the reference block is the sum of its four shifted overlaps with the aligned blocks. Because the 4th-order DCT matrix T_4 is orthonormal, the same relation can be evaluated entirely in the transform domain, so the formula for obtaining the DCT coefficients of the reference coding block is

X = D(r_top) x_1 D(c_left) + D(r_top) x_2 D(c_right) + D(r_bot) x_3 D(c_left) + D(r_bot) x_4 D(c_right),  with D(e) = T_4 e T_4^T.

When the reference coding block is located at a fractional (sub-pixel) position, the extracted predicted values are transformed by the 4x4-block DCT and the residual DCT coefficients are then added to obtain the DCT coefficients.
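As a worked illustration of the formula above, the following numpy sketch (reusing T4 from the previous sketch) computes the DCT coefficients of a 4x4 reference block offset by (u, v) pixels from the aligned grid, entirely in the transform domain; the argument layout is our assumption.

```python
def ref_block_dct(X1, X2, X3, X4, u, v):
    """DCT coefficients of a 4x4 reference block whose top-left corner is
    offset by (u, v) pixels (0 < u, v < 4) inside the four aligned blocks
    with DCT coefficients X1 (top-left), X2 (top-right), X3 (bottom-left)
    and X4 (bottom-right), as in FIG. 2."""
    def D(e):                     # move a shift matrix into the DCT domain
        return T4 @ e @ T4.T
    r_top, r_bot = np.eye(4, k=u), np.eye(4, k=u - 4)       # row extraction
    c_left, c_right = np.eye(4, k=-v), np.eye(4, k=4 - v)   # column extraction
    return (D(r_top) @ X1 @ D(c_left) + D(r_top) @ X2 @ D(c_right)
            + D(r_bot) @ X3 @ D(c_left) + D(r_bot) @ X4 @ D(c_right))
```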
In one embodiment, in step S1, the DCT coefficient conversion process is as follows:

The DCT coefficients extracted from the H.264 code stream belong to 4x4 blocks, while the vehicle detection model takes 8x8-block DCT coefficients as input, so the 4x4-block DCT coefficients are converted into 8x8-block DCT coefficients according to their block spatial relation.

Let Y_1 denote the 8x8 array formed by the 4x4-block DCT coefficients of the four sub-blocks of one 8x8 pixel area, let T_4 represent the 4th-order DCT transform matrix and T_8 the 8th-order DCT transform matrix, and let diag(T_4, T_4) denote the block-diagonal matrix with two copies of T_4. The schematic diagram of the DCT transforms of different sizes is shown in FIG. 3. Since diag(T_4^T, T_4^T) Y_1 diag(T_4, T_4) recovers the 8x8 pixel block and T_8 (.) T_8^T transforms it, the conversion of 4x4-block DCT coefficients into the 8x8-block DCT coefficients Y_2 is

Y_2 = A Y_1 A^T,  with A = T_8 diag(T_4^T, T_4^T).
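The conversion can be folded into a single fixed matrix; a numpy sketch, again reusing dct_matrix and T4 from the I-frame sketch:

```python
T8 = dct_matrix(8)
Z = np.zeros((4, 4))
A = T8 @ np.block([[T4.T, Z], [Z, T4.T]])   # fixed 8x8 conversion matrix

def merge_4x4_to_8x8(Y11, Y12, Y21, Y22):
    """8x8-block DCT coefficients of an 8x8 pixel area from the 4x4-block
    DCT coefficients of its four sub-blocks (FIG. 3): the block-diagonal
    T4.T factors undo the four small DCTs and T8 applies the large one,
    all folded into A, so no pixel-domain detour is needed."""
    Y1 = np.block([[Y11, Y12], [Y21, Y22]])
    return A @ Y1 @ A.T
```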
in an embodiment, in the step S2, the specific process of preprocessing the first DCT coefficient to obtain the second DCT coefficient includes:
removing the DCT coefficients of Cb components and Cr components in the first DCT coefficient, reserving the DCT coefficients of a format 416x416 in the first DCT coefficient, converting the DCT coefficients into a format 52x52x64, sequencing the DCT coefficients after the format conversion according to the ZigZag, wherein the sequencing mode is shown in FIG. 4, reserving the DC coefficient at the upper left corner and 23 AC coefficients, namely a DC coefficient and the first 23 AC coefficients sequenced by the ZigZag in (2-3), and obtaining the DCT coefficients of a format 52x52x 24; and finally, taking the first 24 DCT coefficients from the sequencing result, namely the second DCT coefficient.
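A numpy sketch of this preprocessing, assuming the Y-component coefficients have already been arranged as a (52, 52, 8, 8) array of 8x8 blocks; the layout and names are our assumptions:

```python
def zigzag_indices(n: int = 8):
    """(row, col) positions of an n x n block in ZigZag scan order (FIG. 4):
    diagonals of increasing row+col, with alternating traversal direction."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))

def preprocess(dct_y: np.ndarray, keep: int = 24) -> np.ndarray:
    """Reduce (52, 52, 8, 8) Y-component DCT coefficients to (52, 52, keep):
    the DC coefficient plus the first keep-1 AC coefficients in ZigZag order."""
    rows, cols = zip(*zigzag_indices(8)[:keep])
    return dct_y[:, :, list(rows), list(cols)]
```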
In one embodiment, in step S3, the process of constructing the vehicle detection model includes:
constructing a lightweight trunk feature extraction network based on the DarkNet-53 model, whose structure is shown in FIG. 5; following the residual network idea, features are extracted through stacked convolution and residual structures while the size of the feature maps is reduced;
constructing a regression detection network based on a feature pyramid and detecting vehicles on feature maps of three scales: 52x52, 26x26 and 13x13. Anchor frames of different sizes are set on feature maps of different sizes, and vehicles are detected by regression from the anchor frames. The overall structure of the vehicle detection model is shown in FIG. 6: the DBL unit consists of a convolution layer, a batch normalization (BN) layer and an activation function, and realizes downsampling and feature extraction; the Resn residual module is formed by stacking one DBL and several residual components; each residual component consists of DBLs and a residual edge, which prevents vanishing gradients and improves learning accuracy. The 52x52x24 DCT coefficients are input, vehicles are detected on feature maps of different sizes through the feature extraction network, and the final outputs are 13x13x18, 26x26x18 and 52x52x18, which contain the positions of the vehicle detection frames, the vehicle types and the confidence information (a sketch of these building blocks follows).
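The patent does not name a deep learning framework; the following PyTorch sketch shows one plausible reading of the DBL and Resn building blocks of FIG. 6 (layer choices such as the LeakyReLU slope follow DarkNet-53 convention and are assumptions). Note that 18 output channels per scale are consistent with 3 anchors x (4 box coordinates + 1 confidence + 1 class score) per cell.

```python
import torch.nn as nn

class DBL(nn.Module):
    """DBL unit of FIG. 6: convolution + batch normalization + activation."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

class ResUnit(nn.Module):
    """Residual component: a bottleneck of two DBLs plus a residual edge."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(DBL(c, c // 2, k=1), DBL(c // 2, c, k=3))

    def forward(self, x):
        return x + self.body(x)   # residual edge guards against vanishing gradients

class Resn(nn.Module):
    """Resn module: one stride-2 DBL (downsampling) followed by n residual components."""
    def __init__(self, c_in, c_out, n):
        super().__init__()
        self.block = nn.Sequential(DBL(c_in, c_out, s=2),
                                   *[ResUnit(c_out) for _ in range(n)])

    def forward(self, x):
        return self.block(x)
```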
Determining a loss function comprising detection frame coordinate loss, confidence loss and classification loss (a sketch of one possible composition follows).
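The patent lists the three loss terms without giving their exact form or weights; a YOLOv3-style composition, shown purely as an assumption, could look like this:

```python
import torch.nn.functional as F

def detection_loss(pred_box, true_box, pred_obj, true_obj, pred_cls, true_cls, obj_mask):
    """Sum of the three loss terms named above. obj_mask is a boolean tensor
    marking the anchor/cell positions responsible for a ground-truth vehicle;
    per-term weighting factors are omitted in this sketch."""
    box_loss = F.mse_loss(pred_box[obj_mask], true_box[obj_mask], reduction="sum")
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, true_obj, reduction="sum")
    cls_loss = F.binary_cross_entropy_with_logits(
        pred_cls[obj_mask], true_cls[obj_mask], reduction="sum")
    return box_loss + obj_loss + cls_loss
```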
In one embodiment, in step S4, the acquisition of the image sample set includes:
uniformly scaling the pictures in the open source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images based on the Libjpeg library, processing the extracted DCT coefficients and outputting the Y-component DCT coefficients with size 52x52x24, which are the image sample set (one possible realization is sketched below).
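The patent extracts JPEG DCT coefficients through the Libjpeg library; in Python this step could be realized with jpegio, a commonly used binding over libjpeg that exposes the quantized per-channel DCT coefficient planes. The library choice, the assumption that each resized picture is stored as a 416x416 JPEG, and all function names below are ours:

```python
import numpy as np
import jpegio  # assumed Python binding over libjpeg

def sample_from_jpeg(path: str) -> np.ndarray:
    """One training sample: the Y-component DCT coefficients of a 416x416
    JPEG, reduced to 52x52x24 with the ZigZag selection of the S2 sketch.
    coef_arrays holds quantized coefficients; multiplying by the quantization
    table to dequantize is omitted here for brevity."""
    coeffs = jpegio.read(path).coef_arrays[0]                    # Y plane, (416, 416)
    blocks = coeffs.reshape(52, 8, 52, 8).transpose(0, 2, 1, 3)  # -> (52, 52, 8, 8)
    return preprocess(blocks)                                    # from the S2 sketch
```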
In one embodiment, in step S4, training the vehicle detection model using the image sample set includes:
initializing the network weights of the vehicle detection model, using a normal distribution for the initial weights;
setting the initial learning rate of the vehicle detection model to 1e-4 and obtaining an adaptive learning rate in subsequent training using the Adam algorithm;
setting the anchor frame sizes according to the label data of the image sample set using the K-means clustering method and, following the idea of YOLOv3, setting anchor frames of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13 (see the clustering sketch after this list);
setting the parameter values of the vehicle detection model: detection categories, batch size and number of iterations;
training the vehicle detection model using the image sample set.
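A minimal sketch of the anchor clustering referred to above, using the YOLO-style 1 - IoU distance between box sizes; the random seed, iteration count and median centre update are our assumptions, not values given by the patent:

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster ground-truth (width, height) pairs, shape (N, 2), into k anchor
    sizes with K-means under the 1 - IoU distance (boxes compared as if they
    shared the same top-left corner)."""
    rng = np.random.default_rng(0)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
                 * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        iou = inter / (wh.prod(1)[:, None] + anchors.prod(1)[None, :] - inter)
        assign = iou.argmax(1)                     # highest IoU = nearest anchor
        for j in range(k):
            if (assign == j).any():                # keep old centre if cluster empties
                anchors[j] = np.median(wh[assign == j], axis=0)
    return anchors[np.argsort(anchors.prod(1))]    # sorted small -> large
```

The nine resulting sizes would then be split three per scale, the smallest three going to the 52x52 map and the largest three to the 13x13 map, following YOLOv3 practice.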
As shown in FIG. 7, in one embodiment a test video frame is taken during decoding and its DCT coefficients are extracted; the trained vehicle detection model processes the coefficients, and the prediction results are mapped onto the decoded video frame, as shown in FIG. 8. The detection results include the detection frame, type and confidence of each vehicle, and it can be seen from FIG. 8 that the invention achieves a good detection effect by using the information in compressed-format pictures in combination with deep learning.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It should be noted that the terms "first \ second \ third" referred to in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of the objects; it should be understood that, where permitted, "first \ second \ third" objects may be interchanged, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A vehicle detection method based on compressed video DCT coefficients, characterized by comprising the following steps:
S1: extracting a compressed code stream video and obtaining first DCT coefficients corresponding to the compressed code stream video;
S2: preprocessing the first DCT coefficients to obtain second DCT coefficients;
S3: constructing a vehicle detection model;
S4: acquiring an image sample set based on the open source image data set UA-DETRAC, and then training the vehicle detection model with the image sample set to obtain a vehicle detection network;
S5: inputting the second DCT coefficients into the vehicle detection network to obtain the position, type and confidence information of each vehicle, and then drawing the detection frame, type and confidence of each vehicle in the decoded compressed code stream video frame based on this information; the drawn detection frames, types and confidences constitute the vehicle detection result.
2. The method according to claim 1, wherein the compressed code stream video is an H.264 compressed code stream video.
3. The method according to claim 2, wherein in step S1 the H.264 compressed code stream video is extracted to obtain the first DCT coefficients corresponding to the compressed code stream video, and the specific process includes the following steps:
converting the size of the H.264 compressed code stream video into 416x416;
the image frames of the H.264 compressed code stream video comprise I frames, P frames and B frames;
obtaining the residual DCT coefficients of the 4x4 blocks of each I frame of the H.264 compressed code stream video and the predicted values under the I-frame intra-frame prediction modes by using the JM reference decoder, then performing a 4x4-block DCT on the predicted values, and finally adding the transform results to the residual DCT coefficients of the 4x4 blocks of the I frame to obtain the DCT coefficients of the 4x4 blocks of the I frame;
obtaining the residual DCT coefficients of the P frames and B frames of the H.264 compressed code stream video and the DCT coefficients of their reference frames; obtaining the positions and DCT coefficients of the reference coding blocks from the DCT coefficients of the reference frames and the motion vectors of the P frames and B frames; and obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks;
the DCT coefficients of the 4x4 blocks of the I frames, P frames and B frames are collectively called the first DCT coefficients;
and converting the first DCT coefficients of the 4x4 blocks into first DCT coefficients of 8x8 blocks according to the block spatial relation of the DCT coefficients, thereby obtaining the first DCT coefficients corresponding to the compressed code stream video.
4. The method as claimed in claim 3, wherein in step S1 the specific process of obtaining the DCT coefficients of the 4x4 blocks of the P frames and B frames based on the obtained residual DCT coefficients, positions of the reference coding blocks and DCT coefficients of the reference coding blocks includes the following steps:
when a reference coding block of a P frame or B frame is located in the reference frame at an integer-pixel position that is a multiple of 4, directly adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame;
when a reference coding block of a P frame or B frame is located at an integer-pixel position that is not a multiple of 4, first obtaining the DCT coefficients of the reference coding block from the DCT coefficients of the four adjacent blocks located at multiple-of-4 positions that it overlaps, and then adding the residual DCT coefficients of the P frame or B frame to the DCT coefficients of the reference coding block to obtain the DCT coefficients of the 4x4 blocks of the P frame or B frame.
5. The method as claimed in claim 4, wherein in step S2 the preprocessing of the first DCT coefficients to obtain the second DCT coefficients includes:
removing the DCT coefficients of the Cb and Cr components from the first DCT coefficients, keeping the 416x416 Y-component DCT coefficients and converting them into a 52x52x64 format, ordering the format-converted DCT coefficients in ZigZag order, and finally taking the first 24 DCT coefficients of the ordering result, which are the second DCT coefficients.
6. The method according to claim 5, wherein in step S3 the specific process of constructing the vehicle detection model includes:
constructing a trunk feature extraction network based on the DarkNet-53 model, combining the trunk feature extraction network with a residual network, extracting features through stacked convolution and residual structures, and reducing the size of the feature maps;
constructing a regression detection network based on a feature pyramid and detecting vehicles on feature maps of three scales: 52x52, 26x26 and 13x13;
determining a loss function comprising detection frame coordinate loss, confidence loss and classification loss.
7. The method according to claim 6, wherein in step S4 the acquisition of the image sample set comprises:
uniformly scaling the pictures in the open source image data set UA-DETRAC to 416x416, then extracting the DCT coefficients of the compressed-format images based on the Libjpeg library, processing the extracted DCT coefficients and outputting the Y-component DCT coefficients with size 52x52x24, which are the image sample set.
8. The method according to claim 7, wherein in step S4 the training of the vehicle detection model using the image sample set comprises:
initializing the network weights of the vehicle detection model, using a normal distribution for the initial weights;
setting the initial learning rate of the vehicle detection model to 1e-4 and obtaining an adaptive learning rate in subsequent training using the Adam algorithm;
setting the anchor frame sizes according to the label data of the image sample set using the K-means clustering method and, following the idea of YOLOv3, setting anchor frames of three sizes on each of the feature maps of the three scales 52x52, 26x26 and 13x13;
setting the parameter values of the vehicle detection model: detection categories, batch size and number of iterations;
training the vehicle detection model using the image sample set.
CN202210411306.0A (priority date 2022-04-19, filing date 2022-04-19): Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients. Pending. CN114866784A (en)

Priority Applications (1)

Application Number: CN202210411306.0A | Priority Date: 2022-04-19 | Filing Date: 2022-04-19 | Title: Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients

Applications Claiming Priority (1)

Application Number: CN202210411306.0A | Priority Date: 2022-04-19 | Filing Date: 2022-04-19 | Title: Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients

Publications (1)

Publication Number: CN114866784A | Publication Date: 2022-08-05

Family

ID=82631816

Family Applications (1)

Application Number: CN202210411306.0A | Title: Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients | Priority Date: 2022-04-19 | Filing Date: 2022-04-19

Country Status (1)

Country Link
CN (1) CN114866784A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20030194007A1 * | 2002-04-12 | 2003-10-16 | William Chen | Method and apparatus for memory efficient compressed domain video processing
CN111726633A * | 2020-05-11 | 2020-09-29 | 河南大学 (Henan University) | Compressed video stream recoding method based on deep learning and significance perception
CN111914664A * | 2020-07-06 | 2020-11-10 | 同济大学 (Tongji University) | Vehicle multi-target detection and track tracking method based on re-identification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李晓港: "Research on Vehicle Detection Methods Based on the Video Compressed Domain" (基于视频压缩域的车辆检测方法研究), 《工程科技II辑》 (Engineering Science and Technology II), 31 January 2024 *

Similar Documents

Publication Publication Date Title
CN115049936B (en) High-resolution remote sensing image-oriented boundary enhanced semantic segmentation method
CN113343707B (en) Scene text recognition method based on robustness characterization learning
Zhang et al. Objective video quality assessment combining transfer learning with CNN
CN111222513B (en) License plate number recognition method and device, electronic equipment and storage medium
CN106529419B (en) The object automatic testing method of saliency stacking-type polymerization
CN111626293A (en) Image text recognition method and device, electronic equipment and storage medium
CN107046645A (en) Image coding/decoding method and device
CN104661037B (en) The detection method and system that compression image quantization table is distorted
CN111491167B (en) Image encoding method, transcoding method, device, equipment and storage medium
CN112507842A (en) Video character recognition method and device based on key frame extraction
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN111696136A (en) Target tracking method based on coding and decoding structure
CN111429468B (en) Cell nucleus segmentation method, device, equipment and storage medium
CN113505640A (en) Small-scale pedestrian detection method based on multi-scale feature fusion
CN116824694A (en) Action recognition system and method based on time sequence aggregation and gate control transducer
WO2023203509A1 (en) Image data compression method and device using segmentation and classification
CN117877068B (en) Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method
CN110490170B (en) Face candidate frame extraction method
CN114866784A (en) Vehicle detection method based on compressed video DCT (discrete cosine transformation) coefficients
CN115170807A (en) Image segmentation and model training method, device, equipment and medium
CN113727050B (en) Video super-resolution processing method and device for mobile equipment and storage medium
CN116091862A (en) Picture quality identification method, device, equipment, storage medium and product
CN114092827A (en) Image data set generation method
CN113255646A (en) Real-time scene text detection method
CN114758387B (en) Lightweight face anti-fraud method and device based on single-frame RGB image

Legal Events

Code: Title
PB01: Publication
SE01: Entry into force of request for substantive examination