CN113542745A - Rate distortion coding optimization method - Google Patents

Rate distortion coding optimization method

Info

Publication number
CN113542745A
Authority
CN
China
Prior art keywords
distortion
coding
calculating
coding blocks
coding block
Prior art date
Legal status
Granted
Application number
CN202110588067.1A
Other languages
Chinese (zh)
Other versions
CN113542745B (en)
Inventor
Ma Siwei (马思伟)
Current Assignee
Shaoxing Beida Information Technology Innovation Center
Original Assignee
Shaoxing Beida Information Technology Innovation Center
Priority date
Filing date
Publication date
Application filed by Shaoxing Beida Information Technology Innovation Center
Priority to CN202110588067.1A
Publication of CN113542745A
Application granted
Publication of CN113542745B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a rate-distortion coding optimization method comprising the following steps. When an image is coded, the image is first analyzed by a network to obtain its preset image characteristics. A machine region-of-interest value (denoted ROIM) is then calculated for each coding block from these characteristics; the higher the ROIM, the more likely the block is to interest the machine in a subsequent visual analysis task. Bits are allocated to each coding block in the image according to its ROIM. After bit allocation, the way the rate-distortion error is calculated is modified to use a new, machine-analysis-oriented coding distortion based on feature distortion, which ultimately improves the performance of the coded image in visual analysis tasks.

Description

Rate distortion coding optimization method
Technical Field
The invention belongs to the field of image and video compression, and particularly relates to a rate-distortion coding optimization method.
Background
Existing rate-distortion optimization methods for image/video compression mainly take one of the following two approaches:
First, in the AVS series and the H.26x series of video coding standards, most image/video compression uses rate-distortion optimization based on the mean squared error (MSE) of the pixel signal. MSE mainly estimates how consistent the compressed image is with the original image at the pixel level, and its goal is that all pixels are, on average, numerically as close as possible to the original. However, many studies have shown that this measure is affected by noise and by how the error is distributed: if the error is concentrated in certain regions of the image, clearly visible artifacts can appear even when the error in all other regions is zero. In many cases, MSE-based rate-distortion optimization cannot accurately reflect the subjective perception of the human visual system.
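A toy illustration of why MSE alone can miss where the error lies. Both distorted versions of the flat 16-pixel block below have the same MSE, although one spreads a small error over every pixel and the other concentrates a larger error in four pixels. The pixel values are made up for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


original = [100] * 16                    # flat 16-pixel block
spread = [102] * 16                      # every pixel off by 2
concentrated = [100] * 12 + [104] * 4    # 12 pixels exact, 4 pixels off by 4

print(mse(original, spread))        # 4.0
print(mse(original, concentrated))  # 4.0
```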
Second, to address the mismatch between pixel-level distortion and the human visual system, many newer methods adopt rate-distortion optimization oriented toward subjective visual quality. The most commonly used metrics are structural similarity (SSIM) and multi-scale structural similarity (MS-SSIM). These methods focus on the structural similarity between the compressed image and the original image and try to restore the same image structure as the original as closely as possible. However, they have many limitations when applied to visual analysis tasks.
Summary of the Invention
The technical problem addressed by the invention is that existing rate-distortion coding algorithms perform poorly in visual analysis tasks.
The invention provides a rate-distortion coding optimization method, which comprises the following steps:
Step 1: input an image, extract bounding boxes using an RPN (Region Proposal Network), and obtain the preset image characteristics of the image;
Step 2: calculate the machine interest value of each coding block from the preset image characteristics, and allocate the number of coding bits according to the machine interest value of each coding block;
Step 3: for every two adjacent coding blocks, calculate their correlation index from the preset image characteristics, and use this correlation index to constrain the QP calculation during actual coding;
Step 4: for each coding block, extract features with a convolutional neural network, take the cosine distance between the features as the distortion, calculate the rate-distortion loss from the distortion and the bit rate, perform rate-distortion optimization based on this loss, and output the optimized image.
Further, the preset image characteristic in step 1 is the frequency with which the bounding boxes overlap each coding block, or the proportion of the boundary between two adjacent coding blocks that the bounding boxes overlap, or a combination thereof.
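One possible reading of the first characteristic, the frequency with which bounding boxes overlap each coding block, is a per-block count of overlapping boxes. The sketch below is hypothetical: boxes are axis-aligned (x0, y0, x1, y1) pixel tuples with exclusive right and bottom edges, and the image is tiled with a regular grid of 128 × 128 blocks. The text does not specify these details.

```python
def overlap_counts(boxes, width, height, block=128):
    """Count, for each coding block on a regular grid, how many boxes overlap it.

    boxes: list of (x0, y0, x1, y1) in pixels, x1/y1 exclusive (assumed format).
    Returns a dict mapping (block_row, block_col) -> number of overlapping boxes.
    """
    counts = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            bx1, by1 = min(bx + block, width), min(by + block, height)
            n = 0
            for x0, y0, x1, y1 in boxes:
                # Half-open rectangles overlap iff both their x and y intervals overlap.
                if x0 < bx1 and bx < x1 and y0 < by1 and by < y1:
                    n += 1
            counts[(by // block, bx // block)] = n
    return counts


boxes = [(10, 10, 200, 100), (150, 150, 250, 250)]
print(overlap_counts(boxes, 256, 256))
# {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}
```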
Further, the machine interest value in step 2 is calculated as follows:
a. Define a machine interest value for each coding block;
b. For each coding block, traverse all bounding boxes obtained in step 1 and calculate F, the proportion of the coding block's own area covered by the bounding boxes;
c. Record the largest F among all coding blocks as FMAX, normalize by dividing the F of every coding block by FMAX, and assign each coding block its machine interest value from the normalized result.
Further, the method for allocating the number of coding bits is as follows:
a. Initialize the bit budget of the whole image;
b. For each coding block, calculate the number of bits currently available from the bit budget of the whole image and the number of bits already consumed, and calculate the weighted sum of the SATD value and the machine interest value of every coding block; the ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks is used to allocate the currently available bits;
c. After each allocation, the encoder encodes the block, and the number of bits consumed is updated with the number of bits the encoder actually used.
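The allocation loop in steps a to c can be sketched as follows. The weights `w_satd` and `w_roim`, the `encode` callback and the toy numbers are all hypothetical. The text normalizes by the weighted sums of "all the coding blocks"; this sketch instead assumes the sum runs over the blocks not yet coded, as is common in block-level rate control, so that the whole budget gets distributed.

```python
def allocate_bits(total_bits, satd, roim, encode, w_satd=1.0, w_roim=1.0):
    """Sequentially split a picture bit budget over coding blocks in raster order.

    satd, roim: per-block SATD and machine-interest (ROIM) values.
    encode(i, target) -> bits actually consumed when coding block i (hypothetical).
    w_satd, w_roim: hypothetical weights of the weighted sum.
    """
    weights = [w_satd * s + w_roim * r for s, r in zip(satd, roim)]
    remaining_weight = sum(weights)   # assumption: weights of blocks not yet coded
    consumed = 0.0
    targets = []
    for i, w in enumerate(weights):
        available = total_bits - consumed                  # bits still usable
        share = w / remaining_weight if remaining_weight > 0 else 0.0
        target = available * share                         # allocate by weight ratio
        targets.append(target)
        consumed += encode(i, target)                      # update with actual usage
        remaining_weight -= w
    return targets, consumed


# Toy encoder stand-in that consumes exactly its target.
targets, used = allocate_bits(100, satd=[2.0, 1.0, 1.0], roim=[1.0, 0.5, 0.5],
                              encode=lambda i, t: t)
print(targets, used)  # [50.0, 25.0, 25.0] 100.0
```

Raw SATD values are typically much larger than ROIM values in [0, 1], so in practice the two weights (or a normalization of SATD) decide how strongly machine interest steers the allocation.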
Further, in step 3, the correlation index of adjacent coding blocks is calculated as MC = A / B, where MC is the correlation index of the adjacent coding blocks, A is the length of the shared edge covered by the bounding boxes that span the current pair of adjacent coding blocks, and B is the length of the shared edge of the current pair of adjacent coding blocks.
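A geometric sketch of MC = A / B for two horizontally adjacent blocks that share a vertical edge. Boxes are hypothetical (x0, y0, x1, y1) tuples with exclusive right and bottom edges. A box is taken to "span" the pair if it crosses the shared edge. When several boxes cross it, this sketch takes A as the length of the union of their overlaps with the edge. That choice is an assumption, since the text does not say how multiple boxes combine.

```python
def correlation_index(boxes, edge_x, edge_y0, edge_len):
    """MC = A / B for two blocks sharing the vertical edge at x = edge_x.

    The shared edge covers edge_y0 <= y < edge_y0 + edge_len, so B = edge_len.
    A is the length of that edge covered by boxes crossing it (union of overlaps).
    """
    edge_y1 = edge_y0 + edge_len
    spans = []
    for x0, y0, x1, y1 in boxes:
        if x0 < edge_x < x1:                       # box lies on both sides of the edge
            lo, hi = max(y0, edge_y0), min(y1, edge_y1)
            if lo < hi:
                spans.append((lo, hi))
    # Merge overlapping spans so that shared coverage is not double-counted.
    covered, cur_lo, cur_hi = 0, None, None
    for lo, hi in sorted(spans):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return covered / edge_len


# Blocks [0, 128) and [128, 256) share the edge x = 128; one box covers 96 rows of it.
print(correlation_index([(60, 16, 200, 112)], edge_x=128, edge_y0=0, edge_len=128))  # 0.75
```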
Further, in step 3, the constraint is as follows: if the correlation index of the adjacent coding blocks is greater than 0.7, the QP difference between the two adjacent coding blocks must not exceed 2; otherwise, it must not exceed 9. This constraint is applied when setting the QP of the different coding tree units of the current image, in order to improve coding quality.
Further, in step 4, the distortion is a weighted sum of the feature distortion and the pixel distortion.
Further, the feature distortion is calculated as follows: calculate the cosine distance between the feature F1 of the current block and the feature F2 of the original block, both extracted by the neural network; this cosine distance is the feature distortion.
Further, the pixel distortion is calculated as follows: calculate the mean of the squared differences between corresponding pixels of the current block and the original block; this mean is the pixel distortion.
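The feature distortion, the pixel distortion and their weighted sum might be computed as below. This is a sketch under stated assumptions. F1 and F2 are treated as flattened feature vectors, and the cosine distance is taken as 1 minus the cosine similarity. The weights `w_feat` and `w_pix` are hypothetical, since the text gives no values. The two terms live on very different scales (the cosine distance lies in [0, 2], while the pixel MSE is in squared sample units), so the weights would need tuning.

```python
import math


def pixel_distortion(cur, orig):
    """Mean of squared differences between corresponding pixels (MSE)."""
    return sum((c - o) ** 2 for c, o in zip(cur, orig)) / len(cur)


def feature_distortion(f1, f2):
    """Cosine distance 1 - cos(theta) between flattened feature vectors F1 and F2."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return 1.0 - dot / norm if norm > 0 else 1.0   # all-zero convention is hypothetical


def block_distortion(cur, orig, f1, f2, w_feat=0.5, w_pix=0.5):
    """Weighted sum of feature distortion and pixel distortion (weights hypothetical)."""
    return w_feat * feature_distortion(f1, f2) + w_pix * pixel_distortion(cur, orig)


# Toy values: orthogonal features (distance 1.0) and a pixel MSE of 2.0.
print(block_distortion([10, 12], [10, 10], [1.0, 0.0], [0.0, 1.0]))  # 1.5
```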
Compared with the prior art, the invention has the following advantages and effects:
1. The invention establishes a new bit allocation scheme and a new rate-distortion calculation method.
2. At the same bit rate, images compressed with this method achieve better performance in visual analysis tasks.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1: a method of rate-distortion coding optimization, comprising the steps of:
Step 1: Input an image and extract bounding boxes using a pre-trained RPN (Region Proposal Network). The pre-trained RPN uses an existing algorithm, for example the one described in "Faster R-CNN: Towards real-time object detection with region proposal networks" by S. Ren, K. He, R. Girshick and J. Sun (Advances in Neural Information Processing Systems, 2015, pp. 91-99). The frequency with which the bounding boxes overlap each coding block, or the proportion of the boundary between two adjacent coding blocks that the bounding boxes overlap, or a combination thereof, is defined as the preset image characteristic; examples are given in the following steps.
Step 2.1: Define a machine interest value (hereinafter the "ROIM value") for each coding block. For each coding block, traverse all bounding boxes from step 1 (that is, enumerate every bounding box obtained in step 1) and calculate F, the proportion of the coding block's own area covered by the bounding boxes. Record the largest F among all coding blocks as FMAX, normalize by dividing the F of every coding block by FMAX, and assign each coding block its ROIM value from the normalized result. For example, if a coding block is 128 × 128 and the intersection of the bounding boxes with the block covers 7285 pixels, then F = 7285/(128 × 128) ≈ 0.44. If the largest F over all coding blocks is 0.75, then FMAX = 0.75, and dividing each block's F by FMAX gives its ROIM value (about 0.44/0.75 ≈ 0.59 for this block).
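Step 2.1 can be sketched as follows. Boxes are hypothetical (x0, y0, x1, y1) pixel tuples with exclusive right and bottom edges. F is measured as the fraction of the block covered by the union of all boxes, which is one reading of "the proportion of the coding block's area" when boxes overlap. The normalized F is used directly as the ROIM value, which is also an assumption.

```python
def coverage_fraction(boxes, bx, by, size=128):
    """F for the block with top-left corner (bx, by): covered area / block area.

    Pixels covered by several boxes are counted once (union of intersections).
    """
    covered = set()
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0, by), min(y1, by + size)):
            for x in range(max(x0, bx), min(x1, bx + size)):
                covered.add((x, y))
    return len(covered) / (size * size)


def roim_values(f_values):
    """Normalize every block's F by FMAX, the largest F over all blocks."""
    f_max = max(f_values)
    return [f / f_max if f_max > 0 else 0.0 for f in f_values]


# Two overlapping boxes together cover columns 0..95 of the first 128 x 128 block.
print(coverage_fraction([(0, 0, 64, 128), (32, 0, 96, 128)], 0, 0))  # 0.75
print(round(7285 / (128 * 128), 2))      # 0.44, the F from the text's example
print(roim_values([0.75, 0.375, 0.0]))   # [1.0, 0.5, 0.0]
```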
Step 2.2: Initialize the bit budget of the whole image, that is, set the number of bits for the whole image to a given value, for example 100. Then traverse the coding blocks in raster order (top to bottom, left to right). For each coding block, first obtain the number of bits currently available from the bit budget of the whole image and the number of bits already consumed. At the same time, calculate the weighted sum of the SATD value (Sum of Absolute Transformed Differences, the sum of absolute values after a Hadamard transform) and the ROIM value of each coding block, where SATD is calculated as in the VVC (Versatile Video Coding) standard. The ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks is used as the allocation coefficient for the currently available bits; that is, the existing VTM encoder is used to allocate bit rates to the different coding blocks. After each allocation, the VTM encoder encodes the block, and the number of bits consumed is updated with the number of bits the VTM encoder actually used.
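For reference, here is a minimal 4 × 4 SATD. The difference block is Hadamard-transformed along rows and then columns, and the absolute values of the coefficients are summed. This sketches the general technique only. VTM's own implementation handles larger block sizes and applies its own scaling. The text also does not say which difference the SATD is taken over (for example, the block minus its prediction), so the `ref` argument is an assumption.

```python
def hadamard4(rows):
    """Apply an unnormalized 4-point Hadamard transform to each row of a 4x4 matrix."""
    out = []
    for a, b, c, d in rows:
        s0, s1, d0, d1 = a + b, c + d, a - b, c - d
        out.append([s0 + s1, d0 + d1, s0 - s1, d0 - d1])
    return out


def satd4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences for a 4x4 block."""
    diff = [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur, ref)]
    rows = hadamard4(diff)                                # transform each row
    cols = hadamard4([list(col) for col in zip(*rows)])   # then each column
    return sum(abs(v) for row in cols for v in row)


zero = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
print(satd4x4(ones, zero))  # 16: a flat residual compacts into one DC coefficient
```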
Step 3.1: For every two adjacent coding blocks, calculate the correlation index (MC) of the adjacent coding blocks from the preset image characteristics as MC = A / B, where A is the length of the shared edge covered by the bounding boxes that span the current pair of adjacent coding blocks, and B is the length of the shared edge. For example, the default coding block size in VTM is 128 (128 × 128 blocks). If the bounding boxes spanning two adjacent coding blocks cover a length of 96 along their shared edge, then MC = 96/128 = 0.75.
Step 3.2: Use MC to constrain the QP calculation of the current coding tree unit during subsequent actual coding. If MC is greater than 0.7, the QP difference between the two adjacent coding blocks must not exceed 2; otherwise, it must not exceed 9. For example, with MC = 96/128 = 0.75 > 0.7, the QP difference must not exceed 2.
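The QP rule can be sketched as a clamp. The text only states the maximum allowed difference. Clamping a proposed QP into the allowed window around an already chosen neighbor QP is one hypothetical way to enforce it. The sketch does not enforce the codec's legal QP range.

```python
def max_qp_gap(mc, threshold=0.7):
    """Largest allowed QP difference between two adjacent blocks, given their MC."""
    return 2 if mc > threshold else 9


def constrain_qp(proposed_qp, neighbor_qp, mc):
    """Clamp proposed_qp into [neighbor_qp - gap, neighbor_qp + gap]."""
    gap = max_qp_gap(mc)
    return max(neighbor_qp - gap, min(proposed_qp, neighbor_qp + gap))


print(constrain_qp(37, 32, mc=96 / 128))  # 34: MC = 0.75 > 0.7, so the gap is at most 2
print(constrain_qp(45, 32, mc=0.5))       # 41: weakly correlated blocks, gap at most 9
```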
Step 4: For each coding block, extract features with a pre-trained convolutional neural network using an existing algorithm. One example is the sub-network obtained by removing the last pooling layer and the fully connected layers from the VGG-19 network in "Very deep convolutional networks for large-scale image recognition" by K. Simonyan and A. Zisserman (arXiv preprint arXiv:1409.1556, 2014). Calculate the cosine distance between the feature F1 of the current block and the feature F2 of the original block, both extracted by the neural network (defined as the feature distortion). Also calculate the mean of the squared differences between corresponding pixels of the current block and the original block (defined as the pixel distortion). The weighted sum of the feature distortion and the pixel distortion is defined as the distortion. This distortion is combined with the bit rate consumed by the current configuration to compute the rate-distortion loss as R + λD, where R is the bit rate, D is the distortion, and λ is an internal parameter of the encoder. This rate-distortion loss is used for rate-distortion optimization during block partitioning.
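The following sketch shows how the loss might drive a partition decision. Each candidate has a rate R in bits and a distortion D (in this method, the weighted sum of feature and pixel distortion), and the candidate with the smallest R + λD is chosen. The candidate names and numbers are made up. Encoders such as VTM conventionally write the cost as D + λR. Minimizing R + λD is equivalent to minimizing D + (1/λ)R, so the λ in the text's form plays the role of the reciprocal of that conventional multiplier.

```python
def rd_loss(rate_bits, distortion, lam):
    """Rate-distortion loss in the form given in the text: R + lambda * D."""
    return rate_bits + lam * distortion


def best_partition(candidates, lam):
    """Return the (name, rate_bits, distortion) candidate with the lowest loss."""
    return min(candidates, key=lambda c: rd_loss(c[1], c[2], lam))


candidates = [("no_split", 120, 0.30), ("quad_split", 200, 0.10)]
print(best_partition(candidates, lam=500)[0])  # quad_split: 250 < 270
print(best_partition(candidates, lam=100)[0])  # no_split: 150 < 210
```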
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A method for rate-distortion coding optimization, comprising the steps of:
step 1: inputting an image, extracting bounding boxes by using an RPN, and obtaining preset image characteristics of the image;
step 2: calculating the machine interest value of each coding block according to the preset image characteristics, and allocating the number of coding bits according to the machine interest value of each coding block;
step 3: for every two adjacent coding blocks, calculating the correlation index of the adjacent coding blocks according to the preset image characteristics, and constraining the QP calculation according to the correlation index of the adjacent coding blocks;
step 4: for each coding block, extracting features through a convolutional neural network, calculating the cosine distance between the features as the distortion, calculating the rate-distortion loss according to the distortion and the bit rate, performing rate-distortion optimization according to the rate-distortion loss, and outputting an optimized image.
2. The method according to claim 1, wherein the preset image characteristic in step 1 is the frequency with which the bounding boxes coincide with each coding block, or the proportion of the boundary between two adjacent coding blocks with which the bounding boxes coincide, or any combination thereof.
3. The method of claim 2, wherein the machine interest value in step 2 is calculated by:
a. defining a machine interest value of each coding block;
b. for each coding block, traversing all bounding boxes from step 1 and calculating F, the proportion of the coding block's own area covered by the bounding boxes;
c. recording the largest F among all coding blocks as FMAX, normalizing by dividing the F of every coding block by FMAX, and assigning the machine interest value of each coding block according to the normalized result.
4. The method of claim 2, wherein the number of coded bits is allocated by:
a. initializing the bit budget of the whole image;
b. for each coding block, calculating the number of bits currently available from the bit budget of the whole image and the number of bits already consumed, calculating the weighted sum of the SATD value and the machine interest value of each coding block, and allocating the currently available bits according to the ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks;
c. after each allocation, encoding with the encoder and updating the number of bits consumed according to the number of bits actually consumed by the encoder.
5. The method of claim 2, wherein in step 3, the correlation index of adjacent coding blocks is calculated by the formula MC = A / B, where MC is the correlation index of the adjacent coding blocks, A is the length of the shared edge covered by the bounding boxes spanning the current adjacent coding blocks, and B is the length of the shared edge of the current adjacent coding blocks.
6. The method of claim 2, wherein in step 3, the limiting method is: if the correlation index of the adjacent coding blocks is larger than 0.7, the QP gap between the two adjacent coding blocks cannot exceed 2; otherwise, the QP gap between the two adjacent coding blocks cannot exceed 9.
7. The method of claim 2, wherein in step 4, the distortion is a weighted sum of the feature distortion and the pixel distortion.
8. The method of claim 7, wherein the feature distortion is calculated by calculating the cosine distance between the feature F1 of the current block extracted by the neural network and the feature F2 of the original block, the cosine distance being the feature distortion.
9. The method of claim 7, wherein the pixel distortion is calculated by calculating the mean of the squares of the differences between corresponding pixels of the current block and the original block, the mean being the pixel distortion.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110588067.1A CN113542745B (en) 2021-05-27 2021-05-27 Rate distortion coding optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110588067.1A CN113542745B (en) 2021-05-27 2021-05-27 Rate distortion coding optimization method

Publications (2)

Publication Number Publication Date
CN113542745A true CN113542745A (en) 2021-10-22
CN113542745B CN113542745B (en) 2024-06-25

Family

ID=78124465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110588067.1A Active CN113542745B (en) 2021-05-27 2021-05-27 Rate distortion coding optimization method

Country Status (1)

Country Link
CN (1) CN113542745B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118233638A (en) * 2024-05-24 2024-06-21 Ningbo Kangda Kaineng Medical Technology Co., Ltd. Machine vision-oriented inter-frame image coding rate distortion optimization method and system

Citations (9)

Publication number Priority date Publication date Assignee Title
WO2004044830A1 (en) * 2002-11-12 2004-05-27 Nokia Corporation Region-of-interest tracking method and device for wavelet-based video coding
CN101198058A (en) * 2007-12-14 2008-06-11 Wuhan University Rate aberration optimizing frame refreshing and code rate distribution method for interested area
AU2008230068A1 (en) * 2002-04-15 2008-11-13 Godo Kaisha Ip Bridge 1 Image encoding method and image decoding method
WO2010036772A2 (en) * 2008-09-26 2010-04-01 Dolby Laboratories Licensing Corporation Complexity allocation for video and image coding applications
KR20100071834A (en) * 2008-12-19 2010-06-29 주식회사 케이티 Apparatus and method for calculating distortion of moving picture
CN104539962A (en) * 2015-01-20 2015-04-22 Beijing University of Technology Layered video coding method fused with visual perception features
EP3151566A1 (en) * 2012-06-29 2017-04-05 GE Video Compression, LLC Video data stream concept
CN109889839A (en) * 2019-03-27 2019-06-14 Shanghai Jiao Tong University ROI Image Coding, decoding system and method based on deep learning
CN112752102A (en) * 2019-10-31 2021-05-04 Peking University Video code rate distribution method based on visual saliency

Patent Citations (10)

Publication number Priority date Publication date Assignee Title
AU2008230068A1 (en) * 2002-04-15 2008-11-13 Godo Kaisha Ip Bridge 1 Image encoding method and image decoding method
WO2004044830A1 (en) * 2002-11-12 2004-05-27 Nokia Corporation Region-of-interest tracking method and device for wavelet-based video coding
CN101198058A (en) * 2007-12-14 2008-06-11 Wuhan University Rate aberration optimizing frame refreshing and code rate distribution method for interested area
WO2010036772A2 (en) * 2008-09-26 2010-04-01 Dolby Laboratories Licensing Corporation Complexity allocation for video and image coding applications
KR20100071834A (en) * 2008-12-19 2010-06-29 주식회사 케이티 Apparatus and method for calculating distortion of moving picture
EP3151566A1 (en) * 2012-06-29 2017-04-05 GE Video Compression, LLC Video data stream concept
CN104539962A (en) * 2015-01-20 2015-04-22 Beijing University of Technology Layered video coding method fused with visual perception features
WO2016115968A1 (en) * 2015-01-20 2016-07-28 Beijing University of Technology Visual perception feature-fused scaled video coding method
CN109889839A (en) * 2019-03-27 2019-06-14 Shanghai Jiao Tong University ROI Image Coding, decoding system and method based on deep learning
CN112752102A (en) * 2019-10-31 2021-05-04 Peking University Video code rate distribution method based on visual saliency

Non-Patent Citations (2)

Title
A. SERDAR TAN: "Rate-Distortion Optimization for Stereoscopic Video Streaming with Unequal Error Protection", EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING *
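Zhang Zhewei (张哲为): "Design of a video compression coding communication system based on region-of-interest rate-distortion optimization", China Outstanding Master's and Doctoral Dissertations *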

Also Published As

Publication number Publication date
CN113542745B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN111988611B (en) Quantization offset information determining method, image encoding device and electronic equipment
CN108063944B (en) Perception code rate control method based on visual saliency
CN107454413B (en) Video coding method with reserved characteristics
CN108337515A (en) A kind of method for video coding and device
CN108347612A (en) A kind of monitored video compression and reconstructing method of view-based access control model attention mechanism
CN104992419A (en) Super pixel Gaussian filtering pre-processing method based on JND factor
CN117274820B (en) Map data acquisition method and system for mapping geographic information
CN106056638B (en) A kind of low complexity compression perceptual image coding method of adaptive texture contrast
CN113542745A (en) Rate distortion coding optimization method
CN114173131B (en) Video compression method and system based on inter-frame correlation
JP2004023288A (en) Preprocessing system for moving image encoding
Yang et al. Fast intra encoding decisions for high efficiency video coding standard
CN1151678C (en) Method and apparatus for encoding contour image of object in video signal
CN116723305A (en) Virtual viewpoint quality enhancement method based on generation type countermeasure network
CN110493597A (en) A kind of efficiently perception video encoding optimization method
CN116896638A (en) Data compression coding technology for transmission operation detection scene
CN113542753B (en) AVS3 video coding method and encoder
CN112292860A (en) System and method for efficiently representing and encoding images
CN115802038A (en) Quantization parameter determination method and device and video coding method and device
CN110446040A (en) A kind of inter-frame encoding methods and system suitable for HEVC standard
CN112929663B (en) Knowledge distillation-based image compression quality enhancement method
CN112218083B (en) Method for estimating intra-frame image code rate of high-efficiency video coding standard
KR100196874B1 (en) Apparatus for selectively approximating contour of image
CN106657999A (en) Rapid selection method for HEVC intra-frame prediction coding units
CN112509107A (en) Point cloud attribute recoloring method, device and encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Ma Siwei

Inventor after: Huang Zhimeng

Inventor after: Jia Chuanmin

Inventor after: Wang Shanshe

Inventor after: Zhao Liping

Inventor before: Ma Siwei

GR01 Patent grant