CN113542745A

CN113542745A - Rate distortion coding optimization method

Info

Publication number: CN113542745A
Application number: CN202110588067.1A
Authority: CN
Inventors: 马思伟
Original assignee: Shaoxing Beida Information Technology Innovation Center
Current assignee: Shaoxing Beida Information Technology Innovation Center
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2021-10-22
Anticipated expiration: 2041-05-27
Also published as: CN113542745B

Abstract

The invention relates to a rate-distortion coding optimization method comprising the following steps. When an image is coded, the image is first analyzed by a preset image feature analysis network to obtain the features of the image. A machine interest value (denoted ROIM) is then calculated for each coding block from these features; the higher the ROIM, the more likely the block is to be of interest to a machine in a subsequent visual analysis task. Code rate is allocated to each coding block in the image according to its ROIM. After code rate allocation, the calculation of the rate-distortion error is modified so that coding distortion is expressed as a new, machine-analysis-oriented measure based on feature distortion, which ultimately improves the performance of the coded image in visual analysis tasks.

Description

Rate distortion coding optimization method

Technical Field

The invention belongs to the field of image and video compression, and particularly relates to a rate-distortion coding optimization method.

Background

The existing rate distortion coding optimization method for image/video compression mainly adopts the following two modes:

First, in the AVS series and H.26x series of video coding standards, most rate-distortion optimization methods for image/video compression are based on the mean square error of the pixel signal. The mean square error mainly measures the pixel-level consistency between the compressed image and the original image, and the result pursued is that all pixels are, on average, numerically closest to the original image. However, many studies have shown that this approach is sensitive to noise: for example, when the error is concentrated in certain regions of the image, the visual result can be poor even if the error in all other regions is zero. In many cases, rate-distortion optimization based on mean square error cannot accurately represent the subjective perception of the human visual system.

Second, to address the mismatch between pixel-level distortion and the human visual system, many newer methods adopt rate-distortion optimization oriented to subjective vision. The commonly used measures are structural similarity and multi-scale structural similarity. These rate-distortion optimization methods pay more attention to the structural similarity between the compressed image and the original image, and restore the graphic structure of the original image as closely as possible. However, these methods have many limitations when applied to visual analysis tasks.

Summary of the Invention

The invention aims to solve the technical problem that existing rate-distortion coding algorithms perform poorly in visual analysis tasks.

The invention provides a rate-distortion coding optimization method, which comprises the following steps:

step 1: inputting an image, extracting frames (bounding boxes) by using an RPN (Region Proposal Network), and obtaining preset image characteristics of the image;

step 2: calculating the machine interest value of each coding block according to preset image characteristics, and distributing the number of coding bits according to the machine interest value of each coding block;

step 3: for every two adjacent coding blocks, calculating the correlation index of the adjacent coding blocks according to the preset image characteristics, and limiting the QP calculation in the actual coding according to the correlation index of the adjacent coding blocks;

step 4: for each coding block, extracting features through a convolutional neural network, calculating the cosine distance between the features as distortion, calculating the rate-distortion loss from the distortion and the code rate, performing rate-distortion optimization according to the rate-distortion loss, and outputting an optimized image.

Further, the preset image characteristic in step 1 is the frequency with which a frame overlaps each coding block, or the size ratio of the overlap between a frame and the boundary of two adjacent coding blocks, or any combination thereof.

Further, the method for calculating the value of interest of the machine in step 2 is as follows:

a. defining a machine interest value for each coding block;

b. for each coding block, traversing all frames obtained in step 1 and calculating the proportion F of the coding block's own area that is occupied by the intersection of the frames with the coding block;

c. recording the largest F among all coding blocks as FMAX, dividing the F of every coding block by FMAX for normalization, and assigning the machine interest value of each coding block according to the normalized result.

Further, the method for allocating the number of coding bits is as follows:

a. initializing the number of bits for the whole image;

b. for each coding block, calculating the number of bits currently available from the number of bits for the whole image and the number of bits already consumed; calculating the weighted sum of the SATD value and the machine interest value of each coding block; calculating the ratio of the weighted sum of the current coding block to the total of the weighted sums of all coding blocks; and allocating the currently available bits according to this ratio;

c. after each allocation, the encoder performs encoding, and the number of bits consumed is updated according to the number of bits actually consumed by the encoder.

Further, in step 3, the correlation index of adjacent coding blocks is calculated as MC = A / B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection of the frames spanning the two adjacent coding blocks with their shared edge, and B is the length of the shared edge of the two adjacent coding blocks.

Further, in step 3, the limiting method is as follows: if the correlation index of the adjacent coding blocks is greater than 0.7, the QP difference between the two adjacent coding blocks cannot exceed 2; otherwise, the QP difference between the two adjacent coding blocks cannot exceed 9. The QP is set in this way when the different coding tree units of the current image are coded, so as to improve the coding quality.

Further, in step 4, the distortion is a weighted sum of the characteristic distortion and the pixel distortion.

Further, the characteristic distortion is calculated as follows: the cosine distance between the feature F1 of the current block and the feature F2 of the original block, both extracted by the neural network, is calculated, and this cosine distance is the characteristic distortion.

Further, the pixel distortion is calculated as follows: the mean of the squared differences between corresponding pixels of the current block, after extraction by the neural network, and the original block is calculated, and this mean is the pixel distortion.

Compared with the prior art, the invention has the following advantages and effects:

1. The invention establishes a new code rate allocation mode and a new rate-distortion calculation method.

2. At the same code rate, an image compressed by the method achieves better performance in visual analysis tasks.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1: a method of rate-distortion coding optimization, comprising the steps of:

Step 1: inputting an image and extracting frames (bounding boxes) by using a pre-trained RPN (Region Proposal Network), wherein both the RPN and its pre-training use existing algorithms, for example the algorithm in S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks" (Advances in Neural Information Processing Systems, 2015, pp. 91-99); and defining, as the preset image characteristic, the frequency with which a frame overlaps each coding block, or the size ratio of the overlap between a frame and the boundary of two adjacent coding blocks, or any combination thereof. For example, the preset image characteristics are used as follows:

Step 2.1: defining a machine interest value (hereinafter the "ROIM value") for each coding block. For each coding block, all frames obtained in step 1 are traversed, that is, enumerated, and the proportion F of the coding block's own area occupied by the intersection of the frames with the coding block is calculated. The largest F among all coding blocks is recorded as FMAX; the F of every coding block is then divided by FMAX for normalization, and the machine interest value of each coding block is assigned according to the normalized result. For example, if the coding block size is 128 × 128 and the intersection of the frame with the coding block is 7285 pixels, then F = 7285/(128 × 128) ≈ 0.44; if the maximum F over all coding blocks is 0.75, then FMAX = 0.75, and dividing each F by FMAX gives the ROIM value of each coding block.
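The ROIM normalization of step 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the box representation `(x0, y0, x1, y1)` with exclusive upper bounds, the function names, and the choice to sum the per-frame intersection areas when several frames touch one block are all assumptions, since the text does not say how multiple frames are combined.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, upper bounds exclusive (assumed convention)
Box = Tuple[int, int, int, int]


def intersection_area(a: Box, b: Box) -> int:
    """Area of the overlap between two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def roim_values(blocks: List[Box], frames: List[Box]) -> List[float]:
    """Return one normalized machine interest value per coding block."""
    f_values = []
    for blk in blocks:
        block_area = (blk[2] - blk[0]) * (blk[3] - blk[1])
        # Assumption: contributions of several frames are simply summed.
        covered = sum(intersection_area(blk, fr) for fr in frames)
        f_values.append(covered / block_area)
    f_max = max(f_values, default=0.0)
    if f_max == 0.0:
        # No frame touches any block: every block gets interest 0.
        return [0.0] * len(f_values)
    return [f / f_max for f in f_values]


if __name__ == "__main__":
    blocks = [(0, 0, 128, 128), (128, 0, 256, 128)]
    frames = [(0, 0, 96, 128), (128, 0, 160, 128)]
    print(roim_values(blocks, frames))
```

In the usage example the first block has F = 0.75 and the second F = 0.25, so FMAX = 0.75 and the second block's ROIM is one third of the first block's.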

Step 2.2: initializing the number of bits for the whole image, that is, setting the number of bits of the whole image to a given value, for example 100. The coding blocks are then traversed in order from top to bottom and from left to right. For each coding block, the number of bits currently available is first obtained from the number of bits for the whole image and the number of bits already consumed. At the same time, the weighted sum of the SATD (Sum of Absolute Transformed Differences, i.e., the sum of absolute values after a Hadamard transform) value and the ROIM value of each coding block is calculated, where the SATD is computed as in the VVC (Versatile Video Coding) standard. The ratio of the current coding block's weighted sum to the total of the weighted sums of all coding blocks is used as the allocation coefficient for distributing the currently available bits; that is, the existing VTM encoder is used to allocate code rate to the different coding blocks. After each allocation, the VTM encoder performs encoding, and the number of bits consumed is updated according to the number of bits actually consumed by the VTM encoder.
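The allocation loop of step 2.2 might look roughly like the following Python sketch. The weights `w_satd` and `w_roim` are hypothetical (the text does not give the weighting), and `encode_block` is a made-up callback standing in for the VTM encoder; it is not a real VTM interface. The ratio is taken over all coding blocks, as the text states.

```python
from typing import Callable, List, Sequence


def allocate_bits(
    total_bits: float,
    satd: Sequence[float],
    roim: Sequence[float],
    encode_block: Callable[[int, float], float],
    w_satd: float = 1.0,   # hypothetical weight, not given in the text
    w_roim: float = 1.0,   # hypothetical weight, not given in the text
) -> List[float]:
    """Return the target bits handed to the encoder for each block."""
    weights = [w_satd * s + w_roim * r for s, r in zip(satd, roim)]
    weight_sum = sum(weights)
    consumed = 0.0
    targets = []
    # Blocks are assumed to be listed in raster order
    # (top to bottom, left to right).
    for index, weight in enumerate(weights):
        available = max(total_bits - consumed, 0.0)
        share = weight / weight_sum if weight_sum > 0 else 0.0
        target = available * share
        targets.append(target)
        # The encoder reports how many bits it actually spent.
        consumed += encode_block(index, target)
    return targets


if __name__ == "__main__":
    # Toy encoder that spends exactly what it is given.
    print(allocate_bits(100, [1.0, 1.0], [1.0, 0.0], lambda i, bits: bits))
```

Because the ratio uses the sum over all blocks while the available budget shrinks, this literal reading leaves part of the budget unspent; taking the ratio over the blocks not yet coded would be one alternative reading.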

Step 3.1: for every two adjacent coding blocks, calculating a correlation index (MC) of the adjacent coding blocks according to the preset image characteristics, using the formula MC = A / B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection of the frames spanning the two adjacent coding blocks with their shared edge, and B is the length of the shared edge of the two adjacent coding blocks. For example, the default coding block size in the VTM is 128; if, for two adjacent coding blocks, the length of the frame crossing the two coding blocks is 96, then MC = 96/128 = 0.75.
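As a rough illustration of the correlation index, the Python sketch below computes MC = A / B for two horizontally adjacent blocks that share a vertical edge. The box layout `(x0, y0, x1, y1)` with exclusive upper bounds and the function name are assumptions, as is the decision to merge overlapping crossings from several frames so that A never exceeds B.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, upper bounds exclusive (assumed convention)
Box = Tuple[int, int, int, int]


def mc_vertical_edge(edge_x: int, y0: int, y1: int, frames: List[Box]) -> float:
    """MC for two blocks sharing the edge x = edge_x over rows [y0, y1)."""
    spans = []
    for fx0, fy0, fx1, fy1 in frames:
        # A frame spans both blocks only if it has pixels on each side.
        if fx0 < edge_x < fx1:
            lo, hi = max(fy0, y0), min(fy1, y1)
            if lo < hi:
                spans.append((lo, hi))
    spans.sort()
    crossing_length = 0  # this is A
    cur_lo = cur_hi = None
    for lo, hi in spans:
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                crossing_length += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)  # merge overlapping crossings
    if cur_hi is not None:
        crossing_length += cur_hi - cur_lo
    return crossing_length / (y1 - y0)  # A / B


if __name__ == "__main__":
    # One frame crossing 96 of the 128 rows of the shared edge.
    print(mc_vertical_edge(128, 0, 128, [(100, 16, 160, 112)]))
```

The usage example reproduces the 96/128 case from the text.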

Step 3.2: limiting, according to MC, the QP calculation of the current coding tree unit during actual coding in subsequent use: if MC is greater than 0.7, the QP difference between the two adjacent coding blocks cannot exceed 2; otherwise, the QP difference between the two adjacent coding blocks cannot exceed 9. For example, if MC = 96/128 = 0.75, then MC is greater than 0.7 and the QP difference cannot exceed 2.
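One simple way to picture the QP limit is as a clamp on the candidate QP of the current block relative to its already coded neighbor. The thresholds 0.7, 2 and 9 come from the text; enforcing the limit by clamping, and the function names, are assumptions made for this Python sketch.

```python
def max_qp_gap(mc: float) -> int:
    """Largest allowed QP difference between two adjacent coding blocks."""
    return 2 if mc > 0.7 else 9


def constrain_qp(qp_candidate: int, qp_neighbor: int, mc: float) -> int:
    """Clamp the candidate QP into [neighbor - gap, neighbor + gap].

    Clamping is an assumed enforcement strategy; the text only states
    that the difference must not exceed the limit.
    """
    gap = max_qp_gap(mc)
    return min(max(qp_candidate, qp_neighbor - gap), qp_neighbor + gap)


if __name__ == "__main__":
    # MC = 96/128 = 0.75 > 0.7, so the difference is limited to 2.
    print(constrain_qp(37, 30, 96 / 128))
```

With MC = 0.75 a candidate QP of 37 next to a neighbor at 30 is pulled back to 32, while with MC = 0.5 the same candidate would be left unchanged because the difference of 7 is within 9.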

Step 4: for each coding block, extracting features through a pre-trained convolutional neural network that uses an existing algorithm, for example the sub-network obtained by removing the last pooling layer and the fully connected layers from the VGG-19 network of K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition" (arXiv preprint arXiv:1409.1556, 2014). The cosine distance between the feature F1 of the current block and the feature F2 of the original block, both extracted by the neural network, is calculated and defined as the feature distortion; the mean of the squared differences between corresponding pixels of the current block, after extraction by the neural network, and the original block is calculated and defined as the pixel distortion. The weighted sum of the feature distortion and the pixel distortion is defined as the distortion. The distortion and the code rate consumed under the current configuration are then used together to calculate the rate-distortion loss, R + λD, where R is the code rate, D is the distortion, and λ is an internal parameter of the encoder. This rate-distortion loss is used for rate-distortion optimization during partitioning.
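The distortion and loss of step 4 can be illustrated with plain Python lists standing in for flattened feature maps and pixel blocks. This sketch makes several assumptions: cosine distance is taken as one minus cosine similarity, the weights `w_feat` and `w_pix` are hypothetical because the text gives no values, and the feature vectors would in practice come from the truncated VGG-19 network rather than being supplied by hand.

```python
import math
from typing import Sequence


def cosine_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Feature distortion: 1 - cosine similarity (assumed definition)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    if n1 == 0.0 or n2 == 0.0:
        return 1.0  # assumption: treat a zero vector as fully dissimilar
    return 1.0 - dot / (n1 * n2)


def mean_squared_error(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Pixel distortion: mean of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)


def combined_distortion(feat_cur, feat_org, pix_cur, pix_org,
                        w_feat: float = 0.5, w_pix: float = 0.5) -> float:
    """Weighted sum of feature and pixel distortion (weights hypothetical)."""
    return (w_feat * cosine_distance(feat_cur, feat_org)
            + w_pix * mean_squared_error(pix_cur, pix_org))


def rd_loss(rate: float, distortion: float, lam: float) -> float:
    """Rate-distortion loss in the form stated in the text: R + lambda * D."""
    return rate + lam * distortion


if __name__ == "__main__":
    d = combined_distortion([1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 0.0])
    print(d, rd_loss(10.0, d, 2.0))
```

An encoder would evaluate this loss for each candidate partitioning or mode of a block and keep the candidate with the smallest value.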

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for rate-distortion coding optimization, comprising the steps of:

step 1: inputting an image, extracting a frame by using an RPN network, and obtaining preset image characteristics of the image;

step 3: for every two adjacent coding blocks, calculating the correlation index of the adjacent coding blocks according to the preset image characteristics, and limiting the QP calculation according to the correlation index of the adjacent coding blocks;

2. The method according to claim 1, wherein the preset image characteristic in step 1 is the frequency with which a frame overlaps each coding block, or the size ratio of the overlap between a frame and the boundary of two adjacent coding blocks, or any combination thereof.

3. The method of claim 2, wherein the machine interest value in step 2 is calculated by:

a, defining a machine interest value of each coding block;

4. The method of claim 2, wherein the number of coded bits is allocated by:

initializing the bit number of the whole image;

5. The method of claim 2, wherein in step 3, the correlation index of adjacent coding blocks is calculated by the following formula: MC = A / B, where MC is the correlation index of the adjacent coding blocks, A is the length of the intersection of the frames spanning the two adjacent coding blocks with their shared edge, and B is the length of the shared edge of the two adjacent coding blocks.

6. The method of claim 2, wherein in step 3, the limiting method is: if the correlation index of the adjacent coding blocks is larger than 0.7, the QP gap between the two adjacent coding blocks cannot exceed 2; otherwise, the QP gap between the two adjacent coding blocks cannot exceed 9.

7. The method of claim 2, wherein in step 4, the distortion is a weighted sum of the characteristic distortion and the pixel distortion.

8. The method of claim 7, wherein the characteristic distortion is calculated by: calculating the cosine distance between the feature F1 of the current block and the feature F2 of the original block, both extracted by the neural network, wherein the cosine distance is the characteristic distortion.

9. The method of claim 7, wherein the pixel distortion is calculated by: calculating the mean of the squared differences between corresponding pixels of the current block, after extraction by the neural network, and the original block, wherein the mean is the pixel distortion.