CN115150628A - Coarse-to-fine deep video coding method with super-prior guided mode prediction - Google Patents

Coarse-to-fine deep video coding method with super-prior guided mode prediction

Info

Publication number
CN115150628A
Authority
CN
China
Prior art keywords
motion
compression
features
super
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210727355.5A
Other languages
Chinese (zh)
Inventor
Lu Sheng (盛律)
Zhihao Hu (胡智昊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210727355.5A priority Critical patent/CN115150628A/en
Publication of CN115150628A publication Critical patent/CN115150628A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a coarse-to-fine deep video coding method with super-prior guided mode prediction, which comprises the following steps: the features of an input video frame are extracted, then motion estimation, compression and compensation are performed twice in a coarse-to-fine manner to obtain the predicted features, where the motion compression performed at the fine level uses super-prior guided motion compression. After the motion-compensated features are obtained, the residual information is compressed by super-prior guided residual compression. Finally, the reconstructed residual features are added back to the predicted features, and the reconstructed video frame is obtained through a frame reconstruction module. The invention can better handle complex and large-motion scenes and improves the motion compensation quality at very low bit consumption. The super-prior information is used to predict the resolution of different blocks in motion compression and whether the compression of the current block is skipped in residual compression, greatly reducing the number of bits required for motion and residual compression.

Description

Coarse-to-fine deep video coding method with super-prior guided mode prediction
Technical Field
The invention relates to the technical field of video compression and deep learning, and in particular to a coarse-to-fine deep video coding method with super-prior guided mode prediction.
Background
Video content occupies an ever-growing share of total internet traffic, driven by the year-on-year growth of video websites and the move to higher resolutions and higher frame rates. Most of the video compression algorithms in daily use are the traditional codecs H.264 and H.265, whose hand-crafted modules cannot be optimized jointly end to end. Therefore, in the field of video compression, there is an urgent need for a new deep-learning-based video compression system that can effectively reduce redundant information in a video sequence.
Although existing deep-learning-based video compression algorithms can achieve good video restoration, they use only a single-scale motion estimation and motion compensation strategy; because the motion information in a video is very complex, such single-scale algorithms perform poorly on scenes with large or complex motion. In addition, existing deep-learning-based video compression methods do not use a mode selection strategy, which greatly limits their performance.
Therefore, there is an urgent need in the art for a deep video coding method with super-prior guided mode prediction that can effectively reduce the number of bits consumed and improve compression performance.
Disclosure of Invention
In view of this, the present invention provides a coarse-to-fine deep video coding method with super-prior guided mode prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
A coarse-to-fine deep video coding method with super-prior guided mode prediction, comprising the steps of:
S1, feature acquisition: obtain the current input image frame X_t to be compressed and the reconstructed reference frame X̂_{t-1} obtained by compressing the previous frame, and extract from them the input features F_t and the reference features F̂_{t-1}, respectively;
S2, coarse motion compensation: the input features F_t and the reference features F̂_{t-1} pass through one motion estimation and one motion compression to obtain a coarse offset between the two frames; the coarse offset is applied to the reference features F̂_{t-1} in a single motion compensation to obtain the intermediate prediction features F̄_t;
S3, fine motion compensation: the intermediate prediction features F̄_t and the input features F_t undergo a second motion estimation, a second motion compression and a second motion compensation to generate the final predicted features F̃_t; the second motion compression adopts a super-prior guided adaptive motion compression method in which the super-prior information of the features obtained by the second motion estimation is used as input for resolution mode prediction, and the resulting predicted mode of each feature block guides the encoding and decoding of those features during the second motion compression;
S4, residual feature compression: the residual features R_t between the input features F_t and the final predicted features F̃_t are compressed with a super-prior guided adaptive residual compression method that performs skip/non-skip mode prediction, skipping the feature values whose residual meets a set threshold, to obtain the reconstructed residual features R̂_t; these are added to the final predicted features F̃_t to generate the reconstructed features F̂_t;
S5, the reconstructed features F̂_t are input to a frame reconstruction module to generate the reconstructed frame X̂_t;
S6, the reconstructed frame X̂_t serves as the reference frame for the next frame; steps S1-S5 are repeated until the last frame to obtain the compressed video.
Preferably, S1 further comprises: when t = 1, the reconstructed reference frame X̂_1 is the reconstructed frame obtained by compressing the input image frame X_1 with an image compression algorithm.
Preferably, the S2 includes:
down-sampling the input features F_t and the reference features F̂_{t-1} into two low-resolution features of 1/n the original size;
performing motion estimation and motion compression on the two low-resolution features, then up-sampling the result by a factor of n to obtain the coarse offset between the two frames;
applying the coarse offset to the reference features F̂_{t-1} in a single motion compensation using deformable convolution to generate the intermediate prediction features F̄_t.
Preferably, the down-sampled input features F_t and reference features F̂_{t-1} are input to a coarse motion estimation network, which concatenates the two features and passes them through two convolutional layers.
Preferably, the features after motion estimation are input to a coarse motion compression network for the one motion compression, where the coarse motion compression network is composed of a motion encoding network and a motion decoding network.
Preferably, the S3 includes:
a prediction network pre-learned from the super-prior information, namely a resolution mode prediction network, for outputting the optimal block resolution;
encoding the input features to be compressed with a motion encoder to obtain the motion features M_t, the motion features M_t serving as the input of the super-prior network to obtain the super-prior information;
inputting the super-prior information into the resolution mode prediction network to predict the optimal resolution of each feature block, obtaining a predicted resolution mode;
inputting the motion features M_t to a mode-guided average pooling layer for the corresponding average pooling operation, inputting the average-pooled features to a mode-guided up-sampling layer to restore the original size as the features M̂_t, and inputting M̂_t into a motion decoder for decoding to obtain the compressed motion features.
Preferably, the super-prior information includes the mean and variance of the motion features M_t.
Preferably, the encoded features in the first motion compression, the second motion compression module and the residual feature compression process are all converted into bit streams before the corresponding decoding operations.
Through the above technical scheme, compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a coarse-to-fine deep video compression framework in which motion estimation, motion compression and motion compensation are each performed twice, in a coarse-to-fine manner, so that complex and large-motion scenes can be handled better and the motion compensation quality is improved at very low bit consumption.
2. The invention provides two super-prior guided mode prediction methods, which take the discriminative super-prior information as input to learn two mode prediction networks. The super-prior information in motion and residual compression is used to predict the resolution of different blocks in motion compression and whether the compression of the current block is skipped in residual compression, greatly reducing the number of bits required for motion and residual compression. The super-prior guided mode prediction introduces no extra bit cost, incurs negligible computational cost, and can easily be used to predict the best coding mode (i.e., the best block-resolution mode for motion coding and the 'skip'/'non-skip' modes for residual compression).
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a coarse-to-fine deep video coding method with super-prior guided mode prediction according to an embodiment of the present invention;
fig. 2 is a schematic network structure diagram of a feature extraction module and a frame reconstruction module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a coarse motion compensation branch network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a fine motion compensation branch network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the four basic modes in the resolution mode prediction network and of the mode prediction networks according to an embodiment of the present invention;
FIG. 6 is a flowchart of the super-prior guided adaptive motion compression according to an embodiment of the present invention;
FIG. 7 is a Bpp-PSNR performance comparison of video compression algorithms according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to FIG. 1, the present invention provides a coarse-to-fine deep video coding method with super-prior guided mode prediction, implemented as follows: the features of an input video frame are extracted, then motion estimation, compression and compensation are performed twice in a coarse-to-fine manner to obtain the predicted features, where the motion compression performed at the fine level uses super-prior guided motion compression. After the motion-compensated features are obtained, the residual information is compressed by super-prior guided residual compression. Finally, the reconstructed residual features are added back to the predicted features, and the reconstructed video frame is obtained through a frame reconstruction module. The quantized features in the compression networks are arithmetically entropy coded and stored as a binary file.
The specific execution steps are as follows:
S1, feature acquisition: obtain the current input image frame X_t to be compressed and the reconstructed reference frame X̂_{t-1} obtained by compressing the previous frame, and extract from them the input features F_t and the reference features F̂_{t-1}, respectively.
S2, coarse motion compensation: the input features F_t and the reference features F̂_{t-1} pass through one motion estimation and one motion compression to obtain a coarse offset between the two frames; the coarse offset is applied to the reference features F̂_{t-1} in a single motion compensation to obtain the intermediate prediction features F̄_t.
S3, fine motion compensation: the intermediate prediction features F̄_t and the input features F_t undergo a second motion estimation, a second motion compression and a second motion compensation to generate the final predicted features F̃_t. The second motion compression adopts a super-prior guided adaptive motion compression method: the super-prior information of the features obtained by the second motion estimation is used as input for resolution mode prediction, and the resulting predicted mode of each feature block guides the encoding and decoding of those features during the second motion compression.
S4, residual feature compression: the residual features R_t between the input features F_t and the final predicted features F̃_t are compressed with a super-prior guided adaptive residual compression method that performs skip/non-skip mode prediction, skipping the feature values whose residual meets a set threshold, to obtain the reconstructed residual features R̂_t; these are added to the final predicted features F̃_t to generate the reconstructed features F̂_t.
S5, the reconstructed features F̂_t are input to a frame reconstruction module to generate the reconstructed frame X̂_t.
S6, the reconstructed frame X̂_t serves as the reference frame for the next frame; steps S1-S5 are repeated until the last frame to obtain the compressed video.
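For orientation, the steps S1-S6 can be summarized as the per-frame coding loop sketched below in Python. The module names and interfaces are placeholders (assumptions) for the networks described in the following embodiments, not the literal implementation.

```python
# A structural sketch of the S1-S6 loop; feat/coarse/fine/residual/recon are
# stand-ins for the networks described in the embodiments below.
def compress_video(frames, image_codec, feat, coarse, fine, residual, recon):
    x_ref = image_codec(frames[0])           # t = 1: conventional image compression
    reconstructed = [x_ref]
    for x_t in frames[1:]:
        f_t, f_ref = feat(x_t), feat(x_ref)  # S1: feature acquisition
        f_bar = coarse(f_t, f_ref)           # S2: coarse motion compensation
        f_tilde = fine(f_t, f_bar)           # S3: fine motion compensation
        r_hat = residual(f_t - f_tilde)      # S4: super-prior guided residual coding
        x_ref = recon(f_tilde + r_hat)       # S5: frame reconstruction
        reconstructed.append(x_ref)          # S6: X̂_t is the next reference frame
    return reconstructed
```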
In one embodiment, as shown in FIG. 2(a), the feature extraction module extracts the input features F_t from the video frame, and, as shown in FIG. 2(b), the frame reconstruction module performs the reconstruction step from the reconstructed features F̂_t. ResBlock in FIGS. 2(a) and 2(b) is the basic residual block of the convolutional neural network ResNet and is shown in FIG. 2(c).
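Since FIG. 2 is not reproduced here, the following PyTorch sketch shows one plausible realization of the three sub-networks. The channel width, layer counts and the stride-2 stem are assumptions made for illustration; the exact structure in FIG. 2 may differ.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block as in FIG. 2(c): conv-ReLU-conv with identity skip."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FeatureExtractor(nn.Module):
    """FIG. 2(a): maps an RGB frame X_t into the feature space F_t."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, ch, 5, stride=2, padding=2),
                                 ResBlock(ch), ResBlock(ch), ResBlock(ch))

    def forward(self, x):
        return self.net(x)

class FrameReconstructor(nn.Module):
    """FIG. 2(b): maps reconstructed features back to a frame."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(ResBlock(ch), ResBlock(ch), ResBlock(ch),
                                 nn.ConvTranspose2d(ch, 3, 5, stride=2,
                                                    padding=2, output_padding=1))

    def forward(self, f):
        return self.net(f)
```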
In one embodiment, the video to be compressed is decomposed into individual frames. The first frame is compressed with a conventional image compression algorithm to obtain its reconstructed frame; each subsequent frame is then compressed, from front to back, by repeating the steps S1-S6 described above.
In this embodiment, the first reconstructed frame is produced before S1 as follows: when t = 1, the reconstructed reference frame X̂_1 is the reconstructed frame obtained by compressing the input image frame X_1 with an image compression algorithm.
For the t-th frame (t >= 2), one compression is performed: from the current input image frame X_t and the reconstructed reference frame X̂_{t-1} obtained by compressing the previous frame, the input features F_t and the reference features F̂_{t-1} are extracted.
In one embodiment, to produce more accurate motion compensation results, a two-stage, coarse-to-fine motion compensation module is proposed. As shown in FIG. 3, S2 is performed by the coarse-level motion compensation module and includes:
down-sampling the input features F_t and the reference features F̂_{t-1} into two low-resolution features of 1/n the original size;
performing motion estimation and motion compression on the two low-resolution features, then up-sampling, i.e., applying bilinear interpolation, to scale the result by a factor of n, thereby obtaining the coarse offset between the two frames;
applying the coarse offset to the reference features F̂_{t-1} in a single motion compensation using deformable convolution to generate the intermediate prediction features F̄_t.
Since the bit consumption of motion compression at this stage is small, adaptive motion compression is not used in the coarse-level motion compensation module.
In this embodiment, the down-sampling operation scales the features to 1/4 of their length and width by bilinear interpolation, and the up-sampling operation scales them back to 4 times the length and width, again by bilinear interpolation.
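The sketch below illustrates the coarse branch around the deformable convolution, using torchvision's DeformConv2d. Multiplying the up-sampled motion by n to rescale its magnitude, and deriving the per-kernel offsets with a one-layer head, are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class CoarseMotionCompensation(nn.Module):
    def __init__(self, ch=64, n=4):
        super().__init__()
        self.n = n
        # 2 * 3 * 3 = 18 offset channels for one 3x3 deformable kernel
        self.offset_head = nn.Conv2d(ch, 2 * 3 * 3, 3, padding=1)
        self.deform = DeformConv2d(ch, ch, 3, padding=1)

    def forward(self, f_ref, coarse_motion):
        # coarse_motion comes from motion estimation/compression at 1/n scale;
        # up-sample it n-fold and rescale its magnitude accordingly
        up = F.interpolate(coarse_motion, scale_factor=self.n,
                           mode='bilinear', align_corners=False) * self.n
        offset = self.offset_head(up)      # sampling offsets for the kernel taps
        return self.deform(f_ref, offset)  # intermediate prediction features F̄_t
```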
In one embodiment, the down-sampled input features F_t and reference features F̂_{t-1} are input to the coarse motion estimation network, which concatenates the two features and passes them through two convolutional layers.
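A minimal sketch of that estimation network follows; the channel widths and the ReLU between the two convolutions are assumptions.

```python
import torch
import torch.nn as nn

class CoarseMotionEstimation(nn.Module):
    """Concatenate the two down-sampled features and apply two conv layers."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1),  # fuse the concatenated features
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))      # estimated motion features

    def forward(self, f_t_down, f_ref_down):
        return self.net(torch.cat([f_t_down, f_ref_down], dim=1))
```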
In one embodiment, the features after motion estimation are input to a coarse motion compression network for the one motion compression. The motion compression network is composed of a motion encoding network and a motion decoding network, where the motion encoding network comprises four convolutional layers with stride 2 and four convolutional layers with stride 1, and the motion decoding network comprises four deconvolution (transposed convolution) layers with stride 2 and four convolutional layers with stride 1.
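A sketch of this encoder/decoder pair under stated assumptions (constant channel width, ReLU activations, rounding as test-time quantization; entropy coding omitted):

```python
import torch
import torch.nn as nn

class MotionCoder(nn.Module):
    """Four stride-2 plus four stride-1 layers on each side, per the text."""
    def __init__(self, ch=64):
        super().__init__()
        enc = []
        for _ in range(4):
            enc += [nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(inplace=True)]
        self.encoder = nn.Sequential(*enc)
        dec = []
        for _ in range(4):
            dec += [nn.ConvTranspose2d(ch, ch, 3, stride=2, padding=1,
                                       output_padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(inplace=True)]
        self.decoder = nn.Sequential(*dec[:-1])  # no activation after the last conv

    def forward(self, motion):
        m_t = self.encoder(motion)    # motion features M_t to be transmitted
        m_hat = torch.round(m_t)      # rounding quantization (test time)
        return self.decoder(m_hat)
```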
In one embodiment, in the fine-level motion compensation module, motion estimation, motion compression and motion compensation are performed again at the fine level, on the intermediate prediction features F̄_t and the input features F_t, thereby generating the final predicted features F̃_t. As shown in FIG. 4, S3 is performed by the fine-level motion compensation module, in which the motion estimation network and the motion compensation network are the same as in the coarse-level module.
In the fine-level motion compression module, the newly proposed super-prior guided adaptive motion compression module is adopted. As shown in FIG. 6, it specifically comprises the following steps:
a prediction network pre-learned from the super-prior information, namely the resolution mode prediction network, outputs the optimal block resolution, so that the motion information can be encoded better;
the input features to be compressed are processed by four convolutional layers with stride 2 and four convolutional layers with stride 1 to obtain the encoded motion features M_t to be transmitted; the motion features M_t serve as the input of the super-prior network to obtain the super-prior information;
the super-prior information is input into the resolution mode prediction network to predict the optimal resolution of each feature block and obtain a predicted resolution mode. As shown in FIG. 5(a), there are 4 basic resolution modes, and a resolution mode (i.e., a basic mode in FIG. 5(a)) is predicted for each 2x2 and each 4x4 feature block. As shown in FIG. 5(b), for the current 4x4 feature block, the basic resolution mode of the whole 4x4 block is predicted first; when the prediction result is M0 (i.e., basic mode M0 in FIG. 5(a)), the 4x4 block is divided into four 2x2 sub-blocks. The resolution mode of each 2x2 sub-block is predicted at the same time, and the resolution mode (M0/M1/M2/M3) of each block is selected according to the prediction results. The mode-guided average pooling operation is then performed on each feature block according to the resulting resolution mode: for example, the values of the top-left 2x2 block A of M_t (3, 4, 4, 5) are average-pooled to a single value 4, which is quantized and entropy coded. The decoding side receives this 4 and, since it also holds the resolution mode of each block and therefore knows that block A actually consists of 4 values, mode-guided up-sampling restores the single value to the 4 values of block A (the red block in the upper-left corner of M̂_t in the figure).
The motion features M_t are input to the mode-guided average pooling layer for the corresponding average pooling operation, which reduces the number of motion feature values to be transmitted and thus effectively reduces the number of bits for transmitting the encoded motion features. The average-pooled features are then input to the mode-guided up-sampling layer, which restores them to their original size as the features M̂_t according to the super-prior information; that is, after obtaining the features, the decoding end can likewise restore the average-pooled features to their original size according to the super-prior information. M̂_t is input into the motion decoder for decoding to obtain the decoded fine-level motion features. The motion decoding network comprises four deconvolution layers with stride 2 and four convolutional layers with stride 1.
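The sketch below illustrates mode-guided average pooling and up-sampling on a single 2x2 level; the two-level 4x4/2x2 scheme of FIG. 5 and the exact architecture of the mode prediction head are simplified assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModeGuidedPooling(nn.Module):
    """Per 2x2 block, decide from the super-prior whether to pool to 1 value."""
    def __init__(self, ch=128):
        super().__init__()
        # input: super-prior info (assumed 2*ch channels: mean and variance)
        self.mode_head = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2, 2, stride=2))        # one 2-way decision per 2x2 block

    def forward(self, m_t, hyper_info):
        mode = self.mode_head(hyper_info).argmax(dim=1, keepdim=True)  # 0=pool, 1=keep
        pooled = F.avg_pool2d(m_t, 2)             # e.g. block A (3,4,4,5) -> 4
        up = F.interpolate(pooled, scale_factor=2.0, mode='nearest')   # decoder side
        keep = F.interpolate(mode.float(), scale_factor=2.0, mode='nearest')
        # pooled blocks transmit one value instead of four; kept blocks stay as-is
        return torch.where(keep.bool(), m_t, up)
```

Because the super-prior information is available at both the encoder and the decoder, the predicted modes need not be transmitted, which is why the mode prediction adds no extra bit cost.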
In this embodiment, the super-prior information includes the mean and variance of the encoded motion features M_t predicted by the super-prior network, which are used to assist the arithmetic encoding and arithmetic decoding of the motion features M_t.
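The mean and variance drive a conditional Gaussian entropy model; the expected bit cost of the quantized features can be estimated as below. This is the standard hyperprior-style computation, written out here for illustration; the arithmetic coder consumes the same probabilities.

```python
import torch

def estimated_bits(y_hat, mean, scale):
    """Bits to code quantized values y_hat under N(mean, scale^2), with the
    probability mass integrated over the quantization bin [y_hat-0.5, y_hat+0.5]."""
    gauss = torch.distributions.Normal(mean, scale.clamp(min=1e-6))
    p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return (-torch.log2(p.clamp(min=1e-9))).sum()
```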
In one embodiment, the residual features R_t are compressed by the super-prior guided adaptive residual compression module. The overall network comprises a residual encoding network, a residual decoding network, a super-prior network and a mode prediction network, and its structure is essentially identical to the super-prior guided adaptive motion compression module (motion encoding network, motion decoding network, super-prior network and resolution mode prediction network). The difference is that, based on the super-prior information, the prediction network of the adaptive residual compression module does not predict the optimal resolution of each block; instead, it learns to predict a 'skip'/'non-skip' mode for each feature value that needs to be transmitted in the encoded residual features Y_t (of dimension 128 x h x w, i.e., 128 x h x w feature values in total) obtained after the residual encoding network, as shown in FIG. 5(c). Bits are saved by not transmitting the skipped, insignificant feature values: insignificant features (e.g., values whose residual is 0 and which therefore carry no information) are not transmitted to the decoding side, where the skipped values are filled with 0. This reduces the number of bits required to transmit the encoded residual features, so that the residual compression network can encode the residual features better. Finally, the reconstructed residual features R̂_t are added back to the final predicted features F̃_t to generate the reconstructed features F̂_t.
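A minimal sketch of the skip/non-skip mechanism, assuming a one-layer head over the super-prior information and a hard threshold at logit 0; the real module's head and thresholding are learned design choices.

```python
import torch
import torch.nn as nn

class SkipModeResidual(nn.Module):
    """Skip per-value transmission of encoded residuals, guided by the super-prior."""
    def __init__(self, ch=128):
        super().__init__()
        self.skip_head = nn.Conv2d(2 * ch, ch, 3, padding=1)  # one logit per value

    def encode(self, y_t, hyper_info):
        skip = self.skip_head(hyper_info) > 0   # True => do not transmit this value
        kept = y_t[~skip]                       # only these values are entropy coded
        return kept, skip

    def decode(self, kept, skip):
        y_hat = torch.zeros(skip.shape, device=kept.device)
        y_hat[~skip] = kept                     # skipped positions are filled with 0
        return y_hat
```

As with the resolution modes, the decoder can recompute the skip mask from the shared super-prior information, so the mask itself costs no bits.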
In one embodiment, the second compensation in the fine-level motion compensation module proceeds as follows: the decompressed motion features obtained after the super-prior guided adaptive compression are applied to the intermediate prediction features F̄_t in a second motion compensation using deformable convolution to obtain the final predicted features F̃_t, so that the compensation is carried out at a higher resolution and a more accurate prediction is obtained.
In one embodiment, the encoded features in the first motion compression, the second motion compression module and the residual feature compression are all converted into bit streams before the corresponding decoding operations. As shown in FIG. 6, after arithmetic coding (AC), the feature map is converted into a bitstream for transmission to the decoding end; after receiving the bitstream, the decoding end converts it back into a feature map using arithmetic decoding (AD).
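The contract around the entropy coder can be sketched as follows; arithmetic_encode and arithmetic_decode are hypothetical placeholders for any range coder driven by the same per-symbol probabilities on both sides, not a specific library API.

```python
def transmit(y_hat, mean, scale, arithmetic_encode):
    # encoder side: code the quantized symbols under the shared Gaussian model
    return arithmetic_encode(y_hat, mean, scale)  # -> bitstream (bytes)

def receive(bitstream, mean, scale, arithmetic_decode):
    # decoder side: the super-prior yields the same (mean, scale), so the
    # identical symbols are recovered losslessly from the bitstream
    return arithmetic_decode(bitstream, mean, scale)
```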
Table 1 gives the BDBR results of the method of this embodiment (Ours) compared with the standard reference software H.265 (HM) on multiple data sets, including HEVC Class B, C, D, E, UVG and MCL-JCV. Negative values in the table indicate the percentage of bits saved at the same reconstruction quality. Compared with other deep-learning-based video compression methods (FVC, ELF-VC, DCVC, FVC (re-imp)), the method of this embodiment also achieves the best performance to date.
TABLE 1 BDBR results comparison table (reproduced as an image in the original publication)
Since video compression must consider reconstruction performance at different bit rates, a performance curve is drawn by plotting bpp (the average number of bits consumed per pixel; smaller is better) against PSNR (larger means better reconstruction quality), as shown in FIG. 7.
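For reference, the two axes of FIG. 7 are computed per frame as follows (standard definitions, written out here for illustration):

```python
import math

def bpp(num_bits: int, height: int, width: int) -> float:
    """Average bits per pixel: total bits spent on a frame / its pixel count."""
    return num_bits / (height * width)

def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)
```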
FVC (re-imp) is our baseline method; C2F is our proposed coarse-to-fine video compression framework; C2F+HAMC equips the coarse-to-fine framework with our proposed super-prior guided adaptive motion compression; and C2F+HAMC+HARC further equips it with the super-prior guided adaptive residual compression.
The results show that the proposed coarse-to-fine video compression framework, the super-prior guided resolution-adaptive motion compression and the super-prior guided skip-based adaptive residual compression all improve the performance of existing methods, demonstrating the effectiveness of the proposed algorithms.
Comprehensive experiments on the HEVC, UVG and MCL-JCV data sets show that, with the coarse-to-fine framework and the newly proposed super-prior guided mode prediction methods of this embodiment, video compression performance comparable to H.265 (HM) is achieved in terms of PSNR, and performance generally superior to the latest video compression standard reference VTM is achieved in terms of MS-SSIM.
The coarse-to-fine deep video coding method with super-prior guided mode prediction provided by the present invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A coarse-to-fine deep video coding method with super-prior guided mode prediction, comprising the steps of:
S1, feature acquisition: obtaining the current input image frame X_t to be compressed and the reconstructed reference frame X̂_{t-1} obtained by compressing the previous frame, and extracting from them the input features F_t and the reference features F̂_{t-1}, respectively;
S2, coarse motion compensation: passing the input features F_t and the reference features F̂_{t-1} through one motion estimation and one motion compression to obtain a coarse offset between the two frames, and applying the coarse offset to the reference features F̂_{t-1} in a single motion compensation to obtain the intermediate prediction features F̄_t;
S3, fine motion compensation: subjecting the intermediate prediction features F̄_t and the input features F_t to a second motion estimation, a second motion compression and a second motion compensation to generate the final predicted features F̃_t, wherein the second motion compression adopts a super-prior guided adaptive motion compression method in which the super-prior information of the features obtained by the second motion estimation is used as input for resolution mode prediction, and the resulting predicted mode of each feature block guides the encoding and decoding of those features in the second motion compression;
S4, residual feature compression: compressing the residual features R_t between the input features F_t and the final predicted features F̃_t with a super-prior guided adaptive residual compression method that performs skip/non-skip mode prediction, skipping the feature values whose residual meets a set threshold, to obtain the reconstructed residual features R̂_t, and adding them to the final predicted features F̃_t to generate the reconstructed features F̂_t;
S5, inputting the reconstructed features F̂_t into a frame reconstruction module to generate the reconstructed frame X̂_t;
S6, using the reconstructed frame X̂_t as the reference frame for the next frame, and repeating steps S1-S5 until the last frame to obtain the compressed video.
2. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 1, wherein S1 is preceded by: when t = 1, the reconstructed reference frame X̂_1 is the reconstructed frame obtained by compressing the input image frame X_1 with an image compression algorithm.
3. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 1, wherein the S2 comprises:
down-sampling the input features F_t and the reference features F̂_{t-1} into two low-resolution features of 1/n the original size;
performing motion estimation and motion compression on the two low-resolution features, then up-sampling the result by a factor of n to obtain the coarse offset between the two frames;
applying the coarse offset to the reference features F̂_{t-1} in a single motion compensation using deformable convolution to generate the intermediate prediction features F̄_t.
4. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 3, wherein the down-sampled input features F_t and reference features F̂_{t-1} are input to a coarse motion estimation network, which concatenates the two features and passes them through two convolutional layers.
5. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 3, wherein the features after motion estimation are input to a coarse motion compression network for the motion compression, the coarse motion compression network being composed of a motion encoding network and a motion decoding network.
6. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 1, wherein the S3 comprises:
a prediction network pre-learned from the super-prior information, namely a resolution mode prediction network, for outputting the optimal block resolution;
encoding the input features to be compressed with a motion encoder to obtain the motion features M_t, the motion features M_t serving as the input of the super-prior network to obtain the super-prior information;
inputting the super-prior information into the resolution mode prediction network to predict the optimal resolution of each feature block, obtaining a predicted resolution mode;
inputting the motion features M_t to a mode-guided average pooling layer for the corresponding average pooling operation, inputting the average-pooled features to a mode-guided up-sampling layer to restore the original size as the features M̂_t, and inputting M̂_t into a motion decoder for decoding to obtain the compressed motion features.
7. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 6, wherein the super-prior information comprises the mean and variance of the motion features M_t.
8. The coarse-to-fine deep video coding method with super-prior guided mode prediction according to claim 1, wherein the encoded features in the first motion compression, the second motion compression and the residual feature compression are all converted into bit streams before the corresponding decoding operations.
CN202210727355.5A 2022-05-31 2022-05-31 Coarse-to-fine deep video coding method with super-prior guided mode prediction Pending CN115150628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210727355.5A CN115150628A (en) Coarse-to-fine deep video coding method with super-prior guided mode prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210727355.5A CN115150628A (en) Coarse-to-fine deep video coding method with super-prior guided mode prediction

Publications (1)

Publication Number Publication Date
CN115150628A (en) 2022-10-04

Family

ID=83407729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210727355.5A Pending CN115150628A (en) Coarse-to-fine deep video coding method with super-prior guided mode prediction

Country Status (1)

Country Link
CN (1) CN115150628A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160565A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods And Apparatuses For Learned Image Compression
CN112203093A (en) * 2020-10-12 2021-01-08 苏州天必佑科技有限公司 Signal processing method based on deep neural network
CN113298894A (en) * 2021-05-19 2021-08-24 北京航空航天大学 Video compression method based on deep learning feature space
CN114501013A (en) * 2022-01-14 2022-05-13 上海交通大学 Variable bit rate video compression method, system, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Siwei (马思伟): "Intelligent Video Coding" (智能视频编码), Artificial Intelligence (人工智能), 10 April 2020 (2020-04-10) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116437089A (en) * 2023-06-08 2023-07-14 Beijing Jiaotong University (北京交通大学) Depth video compression algorithm based on key target
CN116437089B (en) * 2023-06-08 2023-09-05 Beijing Jiaotong University (北京交通大学) Depth video compression method based on key target

Similar Documents

Publication Publication Date Title
CN110087092B (en) Low-bit-rate video coding and decoding method based on image reconstruction convolutional neural network
CN112203093B (en) Signal processing method based on deep neural network
CN103607591A (en) Image compression method combining super-resolution reconstruction
EP2168382B1 (en) Method for processing images and the corresponding electronic device
CN107454412B (en) Video image processing method, device and system
EP1397774A1 (en) Method and system for achieving coding gains in wavelet-based image codecs
CN102217314A (en) Methods and apparatus for video imaging pruning
CN105430416A (en) Fingerprint image compression method based on adaptive sparse domain coding
US20170223381A1 (en) Image coding and decoding methods and apparatuses
CN113298894A (en) Video compression method based on deep learning feature space
CN111726614A (en) HEVC (high efficiency video coding) optimization method based on spatial domain downsampling and deep learning reconstruction
CN111669588B (en) Ultra-high definition video compression coding and decoding method with ultra-low time delay
CN105392009A (en) Low bit rate image coding method based on block self-adaptive sampling and super-resolution reconstruction
CN109922339A (en) In conjunction with the image coding framework of multi-sampling rate down-sampling and super-resolution rebuilding technology
Fu et al. An extended hybrid image compression based on soft-to-hard quantification
CN114245989A (en) Encoder and method of encoding a sequence of frames
CN115278262A (en) End-to-end intelligent video coding method and device
CN115150628A (en) Coarse-to-fine deep video coding method with super-prior guided mode prediction
KR100679027B1 (en) Method and apparatus for coding image without DC component loss
CN110677644A (en) Video coding and decoding method and video coding intra-frame predictor
CN111080729B (en) Training picture compression network construction method and system based on Attention mechanism
CN104581173A (en) Soft decoding verification model platform
JP4762486B2 (en) Multi-resolution video encoding and decoding
CN115643406A (en) Video decoding method, video encoding device, storage medium, and storage apparatus
Peng et al. An optimized algorithm based on generalized difference expansion method used for HEVC reversible video information hiding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination