CN113542745A - Rate distortion coding optimization method - Google Patents
Rate distortion coding optimization method
- Publication number
- CN113542745A
- Authority
- CN
- China
- Prior art keywords
- distortion
- coding
- calculating
- coding blocks
- coding block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a rate-distortion coding optimization method comprising the following steps. When an image is coded, the features of the image are first obtained using a preset image feature analysis network. Then, for each coding block, a machine interest value (denoted ROIM) is calculated from the image features; the higher the ROIM, the more likely the block is to be of interest to the machine in a subsequent visual analysis task. Bit rate is allocated to each coding block in the image according to its ROIM. After bit rate allocation, the calculation of the rate-distortion cost is modified so that coding distortion is expressed by a new, machine-analysis-oriented measure based on feature distortion, which ultimately improves the performance of the coded image in visual analysis tasks.
Description
Technical Field
The invention belongs to the field of image and video compression, and particularly relates to a rate-distortion coding optimization method.
Background
Existing rate-distortion coding optimization methods for image/video compression mainly take the following two approaches:
First, in the AVS series and H.26x series video coding standards, most rate-distortion optimization methods for image/video compression are based on the mean square error of the pixel signal. The mean square error mainly estimates the pixel-level consistency between the compressed image and the original image, and the result pursued is that all pixels are, on average, numerically closest to the original image. However, many studies have shown that this measure is easily affected by noise; for example, errors concentrated in certain regions of the image can lead to poor visual quality even if the error in all other regions is zero. In many cases, rate-distortion optimization based on mean square error cannot accurately represent the subjective perception of the human visual system.
Second, to address the mismatch between pixel-level distortion and the human visual system, many newer methods adopt rate-distortion optimization oriented to subjective vision. The commonly used measures are structural similarity (SSIM) and multi-scale structural similarity (MS-SSIM). These methods pay more attention to the structural similarity between the compressed image and the original image and try to restore the same graphic structure as the original image as far as possible. However, they have many limitations when applied to visual analysis tasks.
Summary of the Invention
The technical problem to be solved by the invention is that existing rate-distortion coding algorithms perform poorly in visual analysis tasks.
The invention provides a rate-distortion coding optimization method, which comprises the following steps:
step 1: inputting an image, extracting frames (bounding boxes) using an RPN (Region Proposal Network), and obtaining preset image characteristics of the image;
step 2: calculating the machine interest value of each coding block according to the preset image characteristics, and allocating the number of coding bits according to the machine interest value of each coding block;
step 3: for every two adjacent coding blocks, calculating the correlation index of the adjacent coding blocks according to the preset image characteristics, and limiting the QP calculation in the actual coding according to the correlation index of the adjacent coding blocks;
step 4: for each coding block, extracting features through a convolutional neural network, calculating the cosine distance between the features as the distortion, calculating the rate-distortion loss from the distortion and the bit rate, performing rate-distortion optimization according to the rate-distortion loss, and outputting the optimized image.
Further, the preset image characteristic in step 1 is the frequency with which the frames overlap each coding block, or the size ratio with which a frame overlaps the boundary between two adjacent coding blocks, or any combination thereof.
Further, the machine interest value in step 2 is calculated as follows:
a. defining a machine interest value for each coding block;
b. for each coding block, traversing all frames obtained in step 1 and calculating the proportion F of the coding block's own area that is covered by the overlap between the frames and the coding block;
c. denoting the largest F among all coding blocks as FMAX, then dividing the F of every coding block by FMAX for normalization, and assigning the machine interest value of each coding block according to the normalized result.
Further, the number of coding bits is allocated as follows:
a. initializing the number of bits of the whole image;
b. for each coding block, calculating the number of bits currently available from the number of bits of the whole image and the number of bits already consumed; calculating the weighted sum of the SATD value and the machine interest value of each coding block; calculating the ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks; and allocating the currently available bits according to that ratio;
c. after each allocation, the encoder encodes the block, and the number of bits already consumed is updated with the number of bits the encoder actually consumed.
Further, in step 3, the correlation index of the adjacent coding blocks is calculated as MC = A/B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection between the frames spanning the current adjacent coding blocks and their shared edge, and B is the length of the shared edge of the current adjacent coding blocks.
Further, in step 3, the limit is applied as follows: if the correlation index of the adjacent coding blocks is greater than 0.7, the QP difference between the two adjacent coding blocks cannot exceed 2; otherwise, the QP difference between the two adjacent coding blocks cannot exceed 9. When different coding tree units of the current image are coded, the QP within the current image needs to be set in this way to improve the coding quality.
Further, in step 4, the distortion is a weighted sum of the feature distortion and the pixel distortion.
Further, the feature distortion is calculated as follows: the cosine distance between the feature F1 of the current block extracted by the neural network and the feature F2 of the original block is calculated, and this cosine distance is the feature distortion.
Further, the pixel distortion is calculated as follows: the mean of the squared differences between corresponding pixels of the current block and the original block after extraction by the neural network is calculated, and this mean is the pixel distortion.
Compared with the prior art, the invention has the following advantages and effects:
1. The invention establishes a new bit rate allocation scheme and a new rate-distortion calculation method.
2. The image compressed by the method can obtain better performance in a visual analysis task under the precondition of the same code rate.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1: a method of rate-distortion coding optimization, comprising the steps of:
Step 1: inputting an image and extracting frames (bounding boxes) using a pre-trained RPN (Region Proposal Network), where both the RPN and its pre-training use existing algorithms, for example the algorithm in the article "Faster R-CNN: Towards real-time object detection with region proposal networks" by S. Ren, K. He, R. Girshick and J. Sun (Advances in Neural Information Processing Systems, 2015, pp. 91-99); and defining the frequency with which the frames overlap each coding block, or the size ratio with which a frame overlaps the boundary between two adjacent coding blocks, or any combination thereof, as the preset image characteristic. For example, the preset characteristics are computed as follows:
Step 2.1: defining a machine interest value (hereinafter the "ROIM value") for each coding block. For each coding block, all frames obtained in step 1 are traversed (that is, enumerated), and the proportion F of the coding block's own area that is covered by the overlap between the frames and the coding block is calculated. The largest F among all coding blocks is denoted FMAX; the F of every coding block is then divided by FMAX for normalization, and the machine interest value of each coding block is assigned according to the normalized result. For example, if the coding block size is 128 × 128 and the intersection of the frame and the coding block is 7285 pixels, then F = 7285/(128 × 128) ≈ 0.44; if the largest F over all coding blocks is 0.75, then FMAX = 0.75, and dividing each F by FMAX gives the ROIM value of each coding block (here 0.44/0.75 ≈ 0.59).
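The ROIM calculation of step 2.1 can be sketched as below. This is only an illustration under stated assumptions: frames and coding blocks are taken to be axis-aligned rectangles `(x0, y0, x1, y1)`, overlaps from several frames are summed and capped at 1.0 (the source does not say how multiple frames are combined), and the function names `overlap_area`, `coverage_ratios` and `normalize_to_roim` are hypothetical.

```python
def overlap_area(block, box):
    """Intersection area of two axis-aligned rectangles given as (x0, y0, x1, y1)."""
    width = min(block[2], box[2]) - max(block[0], box[0])
    height = min(block[3], box[3]) - max(block[1], box[1])
    return max(width, 0) * max(height, 0)


def coverage_ratios(blocks, boxes):
    """F for each coding block: overlapped area divided by the block's own area.

    Assumption: overlaps from several frames are summed and capped at 1.0.
    """
    ratios = []
    for block in blocks:
        block_area = (block[2] - block[0]) * (block[3] - block[1])
        covered = sum(overlap_area(block, box) for box in boxes)
        ratios.append(min(covered / block_area, 1.0))
    return ratios


def normalize_to_roim(ratios):
    """Divide every F by FMAX; return all zeros if no block is covered."""
    f_max = max(ratios)
    if f_max == 0:
        return [0.0 for _ in ratios]
    return [f / f_max for f in ratios]


# Two horizontally adjacent 128 x 128 coding blocks and one frame spanning both.
blocks = [(0, 0, 128, 128), (128, 0, 256, 128)]
boxes = [(32, 32, 160, 128)]
ratios = coverage_ratios(blocks, boxes)
roim = normalize_to_roim(ratios)
print(ratios, roim)
```

With the numbers from the text, `normalize_to_roim([7285 / (128 * 128), 0.75])` gives a ROIM of about 0.59 for the first block and 1.0 for the block that defines FMAX.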
Step 2.2: initializing the number of bits of the whole image, that is, setting the bit budget of the whole image to a given value, for example 100. The coding blocks are then traversed in order from top to bottom and from left to right. For each coding block, the number of bits currently available is first obtained from the number of bits of the whole image and the number of bits already consumed. At the same time, the weighted sum of the SATD value (sum of absolute transformed differences, i.e. the sum of absolute values after a Hadamard transform) and the ROIM value of each coding block is calculated, with SATD computed as in the VVC (Versatile Video Coding) standard. The ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks is used as the allocation coefficient for distributing the currently available bits; that is, the existing VTM encoder is used to allocate bit rate to the different coding blocks. After each allocation, the VTM encoder encodes the block, and the number of bits already consumed is updated with the number of bits the VTM encoder actually consumed.
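The allocation loop of step 2.2 might look like the following sketch. It reads the text literally: each block's share is its weighted sum divided by the total over all blocks, applied to the bits still available. The weights `w_satd` and `w_roim` and the `encode_block` callback (standing in for the VTM encoder, returning the bits actually spent) are hypothetical, since the source gives neither the weighting factors nor an encoder interface.

```python
def allocate_bits(total_bits, satd, roim, encode_block, w_satd=1.0, w_roim=1.0):
    """Allocate bits block by block in raster order.

    satd, roim: per-block SATD and ROIM values, in coding order.
    encode_block(index, target_bits) -> bits actually consumed (hypothetical
    stand-in for the real encoder).
    Returns the per-block targets and the total number of bits consumed.
    """
    weights = [w_satd * s + w_roim * r for s, r in zip(satd, roim)]
    total_weight = sum(weights)
    consumed = 0.0
    targets = []
    for index, weight in enumerate(weights):
        available = total_bits - consumed          # bits that can still be used
        target = available * weight / total_weight  # allocation coefficient
        targets.append(target)
        consumed += encode_block(index, target)     # update with actual usage
    return targets, consumed


# Toy run: an "encoder" that spends exactly what it is given.
targets, consumed = allocate_bits(
    total_bits=100,
    satd=[1.0, 1.0],
    roim=[1.0, 0.0],
    encode_block=lambda index, target_bits: target_bits,
)
print(targets, consumed)
```

In practice SATD values are much larger than ROIM values, which lie in [0, 1], so the two weights would have to be chosen to bring them onto a comparable scale; the source does not state how.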
Step 3.1: for every two adjacent coding blocks, calculating the correlation index (MC) of the adjacent coding blocks according to the preset image characteristics, using the formula MC = A/B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection between the frames spanning the current adjacent coding blocks and their shared edge, and B is the length of the shared edge of the current adjacent coding blocks. For example, the default coding block size in the VTM is 128; if, for two adjacent coding blocks, the frame length spanning the two coding blocks is 96, then MC = 96/128 = 0.75.
Step 3.2: limiting, according to MC, the QP calculation of the current coding tree unit during actual coding: if MC is greater than 0.7, the QP difference between the two adjacent coding blocks cannot exceed 2; otherwise, the QP difference between the two adjacent coding blocks cannot exceed 9. For example, if MC = 96/128 = 0.75, then MC is greater than 0.7 and the QP difference cannot exceed 2.
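Steps 3.1 and 3.2 can be illustrated with a short sketch. The thresholds 0.7, 2 and 9 come from the text; clamping a candidate QP toward the neighbouring block's QP is an assumption about how the limit would be enforced, and the names `correlation_index`, `max_qp_gap` and `clamp_qp` are hypothetical.

```python
def correlation_index(span_length, edge_length):
    """MC = A / B: frame length on the shared edge over the edge length."""
    return span_length / edge_length


def max_qp_gap(mc, threshold=0.7, tight_gap=2, loose_gap=9):
    """Largest QP difference allowed between two adjacent coding blocks."""
    return tight_gap if mc > threshold else loose_gap


def clamp_qp(candidate_qp, neighbour_qp, mc):
    """Assumed enforcement: pull the candidate QP into the allowed range."""
    gap = max_qp_gap(mc)
    return max(neighbour_qp - gap, min(candidate_qp, neighbour_qp + gap))


mc = correlation_index(96, 128)   # example from the text
print(mc, max_qp_gap(mc), clamp_qp(40, 32, mc))
```

A strongly correlated pair (MC above 0.7) is therefore kept within 2 QP steps of each other, while a weakly correlated pair may differ by up to 9.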
Step 4: for each coding block, extracting features through a pre-trained convolutional neural network that uses an existing algorithm, for example the sub-network obtained by removing the last pooling layer and the fully connected layers from the VGG-19 network in the article "Very deep convolutional networks for large-scale image recognition" by K. Simonyan and A. Zisserman (arXiv preprint arXiv:1409.1556, 2014). The cosine distance between the feature F1 of the current block extracted by the neural network and the feature F2 of the original block is calculated and defined as the feature distortion; the mean of the squared differences between corresponding pixels of the current block and the original block after extraction by the neural network is calculated and defined as the pixel distortion. The weighted sum of the feature distortion and the pixel distortion is defined as the distortion. The distortion and the bit rate consumed by the current configuration are then used together to calculate the rate-distortion loss R + λD, where R is the bit rate, D is the distortion, and λ is an internal parameter of the encoder. This rate-distortion loss is used for rate-distortion optimization during block partitioning.
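The distortion and loss of step 4 can be sketched in plain Python. Feature extraction itself (the VGG-19 sub-network) is left out: features and pixels are assumed to arrive as flat lists of numbers. The weights `w_feature` and `w_pixel`, the handling of zero-norm features, and all function names are hypothetical, since the source does not specify them.

```python
import math


def cosine_distance(f1, f2):
    """Feature distortion: 1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm1 = math.sqrt(sum(a * a for a in f1))
    norm2 = math.sqrt(sum(b * b for b in f2))
    if norm1 == 0 or norm2 == 0:
        return 1.0  # assumption: treat an all-zero feature as maximally distant
    return 1.0 - dot / (norm1 * norm2)


def mean_squared_error(p1, p2):
    """Pixel distortion: mean of the squared differences of corresponding pixels."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)


def distortion(feat_cur, feat_orig, pix_cur, pix_orig, w_feature=1.0, w_pixel=1.0):
    """Weighted sum of feature distortion and pixel distortion."""
    return (w_feature * cosine_distance(feat_cur, feat_orig)
            + w_pixel * mean_squared_error(pix_cur, pix_orig))


def rd_loss(rate, dist, lam):
    """Rate-distortion loss in the form given in the text: R + lambda * D."""
    return rate + lam * dist


d = distortion([1.0, 0.0], [0.0, 1.0], [1, 2, 3], [1, 2, 5])
print(d, rd_loss(10.0, d, 4.0))
```

An encoder would evaluate `rd_loss` for each candidate partition of a block and keep the candidate with the smallest value.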
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (9)
1. A method for rate-distortion coding optimization, comprising the steps of:
step 1: inputting an image, extracting frames (bounding boxes) using an RPN network, and obtaining preset image characteristics of the image;
step 2: calculating the machine interest value of each coding block according to the preset image characteristics, and allocating the number of coding bits according to the machine interest value of each coding block;
step 3: for every two adjacent coding blocks, calculating the correlation index of the adjacent coding blocks according to the preset image characteristics, and limiting the QP calculation according to the correlation index of the adjacent coding blocks;
step 4: for each coding block, extracting features through a convolutional neural network, calculating the cosine distance between the features as the distortion, calculating the rate-distortion loss from the distortion and the bit rate, performing rate-distortion optimization according to the rate-distortion loss, and outputting the optimized image.
2. The method according to claim 1, wherein the preset image characteristic in step 1 is the frequency with which the frames overlap each coding block, or the size ratio with which a frame overlaps the boundary between two adjacent coding blocks, or any combination thereof.
3. The method of claim 2, wherein the machine interest value in step 2 is calculated by:
a. defining a machine interest value for each coding block;
b. for each coding block, traversing all frames obtained in step 1 and calculating the proportion F of the coding block's own area that is covered by the overlap between the frames and the coding block;
c. denoting the largest F among all coding blocks as FMAX, then dividing the F of every coding block by FMAX for normalization, and assigning the machine interest value of each coding block according to the normalized result.
4. The method of claim 2, wherein the number of coded bits is allocated by:
a. initializing the number of bits of the whole image;
b. for each coding block, calculating the number of bits currently available from the number of bits of the whole image and the number of bits already consumed; calculating the weighted sum of the SATD value and the machine interest value of each coding block; calculating the ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks; and allocating the currently available bits according to that ratio;
c. after each allocation, the encoder encodes the block, and the number of bits already consumed is updated with the number of bits the encoder actually consumed.
5. The method of claim 2, wherein in step 3, the correlation index of the adjacent coding blocks is calculated as MC = A/B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection between the frames spanning the current adjacent coding blocks and their shared edge, and B is the length of the shared edge of the current adjacent coding blocks.
6. The method of claim 2, wherein in step 3, the limit is applied as follows: if the correlation index of the adjacent coding blocks is greater than 0.7, the QP difference between the two adjacent coding blocks cannot exceed 2; otherwise, the QP difference between the two adjacent coding blocks cannot exceed 9.
7. The method of claim 2, wherein in step 4, the distortion is a weighted sum of the feature distortion and the pixel distortion.
8. The method of claim 7, wherein the feature distortion is calculated as follows: the cosine distance between the feature F1 of the current block extracted by the neural network and the feature F2 of the original block is calculated, and this cosine distance is the feature distortion.
9. The method of claim 7, wherein the pixel distortion is calculated as follows: the mean of the squared differences between corresponding pixels of the current block and the original block after extraction by the neural network is calculated, and this mean is the pixel distortion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110588067.1A CN113542745B (en) | 2021-05-27 | 2021-05-27 | Rate distortion coding optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110588067.1A CN113542745B (en) | 2021-05-27 | 2021-05-27 | Rate distortion coding optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113542745A (en) | 2021-10-22 |
CN113542745B (en) | 2024-06-25 |
Family
ID=78124465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110588067.1A Active CN113542745B (en) | 2021-05-27 | 2021-05-27 | Rate distortion coding optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113542745B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118233638A (en) * | 2024-05-24 | 2024-06-21 | 宁波康达凯能医疗科技有限公司 | Machine vision-oriented inter-frame image coding rate distortion optimization method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004044830A1 (en) * | 2002-11-12 | 2004-05-27 | Nokia Corporation | Region-of-interest tracking method and device for wavelet-based video coding |
CN101198058A (en) * | 2007-12-14 | 2008-06-11 | 武汉大学 | Rate aberration optimizing frame refreshing and code rate distribution method for interested area |
AU2008230068A1 (en) * | 2002-04-15 | 2008-11-13 | Godo Kaisha Ip Bridge 1 | Image encoding method and image decoding method |
WO2010036772A2 (en) * | 2008-09-26 | 2010-04-01 | Dolby Laboratories Licensing Corporation | Complexity allocation for video and image coding applications |
KR20100071834A (en) * | 2008-12-19 | 2010-06-29 | 주식회사 케이티 | Apparatus and method for calculating distortion of moving picture |
CN104539962A (en) * | 2015-01-20 | 2015-04-22 | 北京工业大学 | Layered video coding method fused with visual perception features |
EP3151566A1 (en) * | 2012-06-29 | 2017-04-05 | GE Video Compression, LLC | Video data stream concept |
CN109889839A (en) * | 2019-03-27 | 2019-06-14 | 上海交通大学 | ROI Image Coding, decoding system and method based on deep learning |
CN112752102A (en) * | 2019-10-31 | 2021-05-04 | 北京大学 | Video code rate distribution method based on visual saliency |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2008230068A1 (en) * | 2002-04-15 | 2008-11-13 | Godo Kaisha Ip Bridge 1 | Image encoding method and image decoding method |
WO2004044830A1 (en) * | 2002-11-12 | 2004-05-27 | Nokia Corporation | Region-of-interest tracking method and device for wavelet-based video coding |
CN101198058A (en) * | 2007-12-14 | 2008-06-11 | 武汉大学 | Rate aberration optimizing frame refreshing and code rate distribution method for interested area |
WO2010036772A2 (en) * | 2008-09-26 | 2010-04-01 | Dolby Laboratories Licensing Corporation | Complexity allocation for video and image coding applications |
KR20100071834A (en) * | 2008-12-19 | 2010-06-29 | 주식회사 케이티 | Apparatus and method for calculating distortion of moving picture |
EP3151566A1 (en) * | 2012-06-29 | 2017-04-05 | GE Video Compression, LLC | Video data stream concept |
CN104539962A (en) * | 2015-01-20 | 2015-04-22 | 北京工业大学 | Layered video coding method fused with visual perception features |
WO2016115968A1 (en) * | 2015-01-20 | 2016-07-28 | 北京工业大学 | Visual perception feature-fused scaled video coding method |
CN109889839A (en) * | 2019-03-27 | 2019-06-14 | 上海交通大学 | ROI Image Coding, decoding system and method based on deep learning |
CN112752102A (en) * | 2019-10-31 | 2021-05-04 | 北京大学 | Video code rate distribution method based on visual saliency |
Non-Patent Citations (2)
Title |
---|
A. SERDAR TAN: "Rate-Distortion Optimization for Stereoscopic Video Streaming with Unequal Error Protection", EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING * |
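Zhang Zhewei: "Design of a video compression coding communication system based on region-of-interest rate-distortion optimization", China Excellent Master's and Doctoral Theses * |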
Also Published As
Publication number | Publication date |
---|---|
CN113542745B (en) | 2024-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111988611B (en) | Quantization offset information determining method, image encoding device and electronic equipment | |
CN108063944B (en) | Perception code rate control method based on visual saliency | |
CN107454413B (en) | Video coding method with reserved characteristics | |
CN108337515A (en) | A kind of method for video coding and device | |
CN108347612A (en) | A kind of monitored video compression and reconstructing method of view-based access control model attention mechanism | |
CN104992419A (en) | Super pixel Gaussian filtering pre-processing method based on JND factor | |
CN117274820B (en) | Map data acquisition method and system for mapping geographic information | |
CN106056638B (en) | A kind of low complexity compression perceptual image coding method of adaptive texture contrast | |
CN113542745A (en) | Rate distortion coding optimization method | |
CN114173131B (en) | Video compression method and system based on inter-frame correlation | |
JP2004023288A (en) | Preprocessing system for moving image encoding | |
Yang et al. | Fast intra encoding decisions for high efficiency video coding standard | |
CN1151678C (en) | Method and apparatus for encoding contour image of object in video signal | |
CN116723305A (en) | Virtual viewpoint quality enhancement method based on generation type countermeasure network | |
CN110493597A (en) | A kind of efficiently perception video encoding optimization method | |
CN116896638A (en) | Data compression coding technology for transmission operation detection scene | |
CN113542753B (en) | AVS3 video coding method and encoder | |
CN112292860A (en) | System and method for efficiently representing and encoding images | |
CN115802038A (en) | Quantization parameter determination method and device and video coding method and device | |
CN110446040A (en) | A kind of inter-frame encoding methods and system suitable for HEVC standard | |
CN112929663B (en) | Knowledge distillation-based image compression quality enhancement method | |
CN112218083B (en) | Method for estimating intra-frame image code rate of high-efficiency video coding standard | |
KR100196874B1 (en) | Apparatus for selectively approximating contour of image | |
CN106657999A (en) | Rapid selection method for HEVC intra-frame prediction coding units | |
CN112509107A (en) | Point cloud attribute recoloring method, device and encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB03 | Change of inventor or designer information | Inventor after: Ma Siwei, Huang Zhimeng, Jia Chuanmin, Wang Shanshe, Zhao Liping. Inventor before: Ma Siwei |
 | GR01 | Patent grant | |