WO2023159965A1 - Visual perception-based rate control method and device - Google Patents


Publication number
WO2023159965A1
Authority
WO
WIPO (PCT)
Prior art keywords
lcu
motion
texture
rich
frame
Prior art date
Application number
PCT/CN2022/123742
Other languages
French (fr)
Chinese (zh)
Inventor
刘鹏飞
温安君
刘国正
Original Assignee
翱捷科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 翱捷科技股份有限公司
Publication of WO2023159965A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/198 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/004 Predictors, e.g. intraframe, interframe coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/557 Motion estimation characterised by stopping computation or iteration based on certain criteria, e.g. error magnitude being too large or early exit

Definitions

  • The present invention relates to video encoding technology, and in particular to a rate control method and device that is based on visual perception and suitable for hardware implementation.
  • Video coding is a technology that compresses redundant components in video images and uses as little data as possible to represent video information.
  • HEVC (High Efficiency Video Coding) is also known as H.265.
  • AVC (Advanced Video Coding) is also known as H.264.
  • Compared with AVC, HEVC can reduce the coding bit rate by about 50%, roughly doubling video compression performance. Because video encoding algorithms are computationally intensive, it has become common industry practice to accelerate the video encoding process in hardware using an application-specific integrated circuit (ASIC) in order to improve encoding speed.
  • Video coding technology uses image block as the basic coding unit.
  • In HEVC, the basic coding unit is the CU (Coding Unit).
  • A CU may be an image block of 64×64, 32×32, 16×16, or 8×8 pixels.
  • The 64×64-pixel image block is also called the LCU (Largest Coding Unit).
  • the channel bandwidth capacity used to transmit compressed video is limited. If the encoding bit rate of the compressed video is too high and exceeds the capacity of the channel bandwidth, it will cause video transmission congestion or even packet loss. If the encoding bit rate of the compressed video is too low, the channel bandwidth will not be fully utilized, and higher video quality cannot be obtained. Therefore, it is necessary to use rate control (Rate Control) technology to control the output bit rate of the video encoder to match the channel bandwidth capacity.
  • The purpose of rate control technology is to adjust the encoding parameters of the video encoder so that its output bit rate equals the preset target bit rate, while reducing encoding distortion as much as possible to improve video encoding quality.
  • In HEVC, the rate control algorithm in common use is based on the JCTVC-K0103 proposal.
  • The JCTVC-K0103 proposal establishes a mathematical model of the relationship between the coding bit rate R and the Lagrangian multiplier λ (the R-λ model), and accomplishes rate control through two stages: target bit allocation and target bit control.
  • Target bit allocation is carried out at three levels: the GOP level (Group Of Pictures, i.e., a set of temporally continuous image frames), the image frame level, and the basic coding unit level.
  • the LCU is generally selected as the basic unit of the target bit allocation. Therefore, the target bit allocation at the basic coding unit level is usually also referred to as the target bit allocation at the LCU level.
  • After the target number of coding bits for the current video frame is determined, the next step is LCU-level target bit allocation: determine the bit allocation weight of each LCU in the current frame, and allocate a target number of coding bits to each LCU according to its weight.
  • The LCU-level bit allocation is performed according to the following formula:

    T_LCU_curr = (T_Pic − Bit_H − Coded_Pic) × ω_LCU_curr / Σ_AllNotCodedLCUs ω_LCU

  • T_LCU_curr is the target number of coding bits allocated to the LCU currently to be encoded;
  • T_Pic is the target number of coding bits allocated to the current video frame to be encoded (a video frame is also called an image frame);
  • Bit_H is the pre-estimated number of bits required for the header information of the video frame;
  • Coded_Pic is the actual number of coded bits already produced by the encoded LCUs in the current frame;
  • ω_LCU_curr is the bit allocation weight of the current LCU to be encoded;
  • ω_LCU denotes the bit allocation weight of an LCU in general, without referring to a specific one;
  • Σ_AllNotCodedLCUs ω_LCU is the sum of the bit allocation weights of all uncoded LCUs in the current video frame.
  • the core of the bit allocation at the LCU level is the bit allocation weight ⁇ LCU of the LCU.
  • The bit allocation weight ω_LCU of an LCU is calculated from the prediction error MAD value (Mean Absolute Difference) of the co-located LCU (i.e., the LCU at the same position) in the previous encoded frame.
  • The calculation formulas are as follows:

    ω_LCU = (MAD_LCU)²

    MAD_LCU = (1 / N_pixels) × Σ_AllPixelsInLCU |P_org − P_pred|

  • Here ω_LCU is the bit allocation weight of the LCU, and MAD_LCU is the prediction error MAD value of the co-located LCU in the previous encoded frame;
  • N_pixels is the number of pixels in the LCU;
  • Σ_AllPixelsInLCU denotes accumulation over all pixels in the LCU;
  • P_org is the luminance value of the original pixel;
  • P_pred is the luminance value of the predicted pixel.
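A minimal sketch of this MAD-based weighting, assuming the squared-MAD weight shown above (names are ours):

```python
def mad_weight(orig_luma, pred_luma):
    """Bit allocation weight of an LCU from the prediction-error MAD of the
    co-located LCU in the previous encoded frame: weight = MAD squared."""
    n = len(orig_luma)  # N_pixels
    # MAD: mean absolute difference between original and predicted luma.
    mad = sum(abs(o - p) for o, p in zip(orig_luma, pred_luma)) / n
    return mad * mad

# Flat, well-predicted blocks get small weights; poorly predicted ones large.
w = mad_weight([10, 20, 30, 40], [12, 18, 30, 44])
```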
  • When watching video images, the human eye is guided by subjective perception: it pays more attention to areas with complex textures and rich detail, and less attention to flat areas whose texture features are inconspicuous. Likewise, it pays more attention to areas with intense motion and rich change, and less attention to still areas. In other words, motion-rich regions and texture-rich regions of a video image are more likely to attract human visual attention. Based on these visual perception characteristics of the human eye, when the target encoding bit budget of an image frame is fixed, the video encoding process should allocate more target coding bits to the LCUs in the motion-rich and texture-rich areas of the frame.
  • Accordingly, the rate control algorithm of a video encoder needs to detect motion-rich and texture-rich areas and calculate the LCU-level bit allocation weights accordingly, so that more target coding bits are allocated to the LCUs located in the areas the human eye attends to, thereby improving the subjectively perceived visual quality.
  • Research on HEVC rate control based on the JCTVC-K0103 proposal shows that the LCU-level bit allocation weight is calculated from the prediction error MAD value of the co-located LCU in the previous encoded frame. This measures only the luminance difference between pixels, from a signal-processing perspective, and does not take into account the subjective visual perception characteristics of the human eye, so it cannot allocate more target coding bits to the regions the human eye attends to.
  • It is therefore necessary to improve the LCU-level target bit allocation method in HEVC rate control: select factors that represent human visual perception to calculate the LCU-level bit allocation weights, and allocate more target coding bits to the LCUs in the motion-rich and texture-rich areas of the image frame, so as to improve the visual quality subjectively perceived by the human eye.
  • The visual perception factor mentioned here must represent the subjective attention the human eye pays to the image content, and it must satisfy the following characteristics: (1) For areas the human eye attends to, such as motion-rich and texture-rich areas, the value of the visual perception factor is larger; the greater the motion amplitude and the texture complexity, the larger the value. (2) For areas the human eye does not attend to, such as areas with sparse motion or simple textures, the value of the visual perception factor is smaller; the smaller the motion amplitude and the simpler the texture, the smaller the value.
  • the first type of prior art solution needs to preprocess the entire frame of image before encoding the current frame.
  • The purpose of the preprocessing is to calculate the bit allocation weight of each LCU in the current frame according to the selected visual perception factor, accumulate the sum of the bit allocation weights of all LCUs in the frame, and then calculate the target number of coding bits of each LCU from the ratio of its weight to that sum.
  • This type of technical solution needs to add a preprocessing stage before image encoding to calculate the bit allocation weights of all LCUs in the entire frame image before calculating the target number of encoding bits for each LCU.
  • the preprocessing of the entire frame of image takes a lot of time, and the larger the resolution of the image frame, the longer the preprocessing time will be, which will introduce a large frame-level coding delay.
  • the preprocessing and encoding need to read the image content from the memory separately, which will consume a lot of additional bus bandwidth. Therefore, this type of technical solution is suitable for software encoder implementation, not for hardware encoder implementation.
  • the second type of prior art solution utilizes the correlation between video frames, and uses the bit allocation weight of each LCU in the previous frame as the bit allocation weight of the same LCU in the current frame.
  • When encoding the previous frame, this type of scheme calculates the bit allocation weight corresponding to each LCU from the visual perception factor during that LCU's encoding.
  • Thus, by the time the previous frame has been encoded, the bit allocation weights of all its LCUs have already been computed.
  • Each LCU in the current frame then adopts the bit allocation weight of the co-located LCU in the previous frame.
  • the technical problem to be solved by the present invention is to propose a HEVC code rate control method and device based on human subjective visual perception and suitable for hardware implementation.
  • Step S10 Calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • Step S20 Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • Steps S10 and S20 may be performed in either order, or simultaneously.
  • Step S30 Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
  • The texture perception factor of the LCU represents the visual sensitivity of the human eye to the texture features of the LCU. The larger its value, the richer the texture inside the LCU and the more attention human subjective vision pays to it; the smaller its value, the flatter the texture inside the LCU and the less attention human subjective vision pays to it.
  • The motion perception factor of the LCU represents the visual sensitivity of the human eye to the motion characteristics of the LCU. The larger its value, the richer the motion inside the LCU and the more attention human subjective vision pays to it; the smaller its value, the slighter the motion inside the LCU and the less attention human subjective vision pays to it.
  • the calculation of the texture perception factor of the LCU in the step S10 specifically includes the following steps.
  • Step S11 Calculate the gradient magnitude Grad x, y of each pixel in the LCU, and judge whether each pixel in the LCU is a texture-rich pixel by using the gradient determination threshold of the texture-rich pixel obtained from the gradient information of the previous frame; The area composed of texture-rich pixels is the texture-rich area inside the LCU.
  • Step S12 Calculate the gradient magnitude Grad LCU of the texture-rich region inside the LCU; when calculating the Grad LCU of each LCU in the current frame, record the maximum value G max .
  • Step S13: Normalize the gradient magnitude Grad_LCU of the texture-rich area inside the LCU to obtain the normalized value K_T, using the maximum gradient magnitude of the LCU-internal texture-rich areas of the previous frame as the normalization benchmark for the gradient magnitudes of the LCU-internal texture-rich areas in the current frame; the normalized value K_T is the texture perception factor of the LCU.
  • Step S14 Using the pixel gradient mean value of the current frame to calculate the gradient determination threshold Grad Thr of the texture-rich pixel in the next frame.
  • In step S11, if Grad_x,y > Grad_Thr is satisfied, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels, obtained from the gradient information of the previous frame.
  • In step S12, Grad_LCU is the sum of Grad_x,y over all pixels in the LCU that satisfy Grad_x,y > Grad_Thr, where M is the horizontal width of the LCU and N is its vertical height; Grad_LCU is thus the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
  • G max is the maximum value of the Grad LCU of each LCU in the previous frame;
  • The normalized value K_T is an integer in [0, ZZ]; an integer between 0 and ZZ is used to represent the normalized range of 0 to 1, and ZZ is the maximum value after normalization.
  • Grad_Thr is computed from Grad_avg together with the gradient threshold adjustment deviation Grad_offset and the gradient threshold adjustment multiplier.
  • Grad_avg is the mean pixel gradient of the current frame, averaged over all W × H pixels, where W is the horizontal width and H is the vertical height of the current image frame.
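Steps S11–S13 can be sketched as follows. This is our illustrative reading of the claims; the names and the exact rounding/clipping are assumptions, not the patent's reference implementation:

```python
def texture_factor(grad, grad_thr, g_max_prev, zz=255):
    """Texture perception factor K_T of one LCU (steps S11-S13 sketch).

    grad       -- 2-D list of per-pixel gradient magnitudes inside the LCU
    grad_thr   -- texture-rich decision threshold from the previous frame
    g_max_prev -- maximum Grad_LCU observed in the previous frame (G_max)
    zz         -- integer normalization ceiling (e.g. 127/255/511/1023)
    """
    # S11 + S12: accumulate gradients of texture-rich pixels only.
    grad_lcu = sum(g for row in grad for g in row if g > grad_thr)
    # S13: normalize against the previous frame's maximum, clip to [0, zz].
    k_t = min(zz, round(zz * grad_lcu / g_max_prev)) if g_max_prev else 0
    return grad_lcu, k_t
```

Per LCU, the encoder would also track the running maximum of `grad_lcu` to serve as `g_max_prev` for the next frame.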
  • the calculation of the motion perception factor of the LCU in the step S20 specifically includes the following steps.
  • Step S21: Calculate the frame difference amplitude Diff_x,y between each pixel in the LCU and the corresponding pixel of the co-located LCU in the previous frame, and use the frame difference decision threshold for motion-rich pixels, obtained from the frame difference information of the previous frame, to judge whether each pixel in the LCU is a motion-rich pixel. The area inside the LCU composed of motion-rich pixels is the motion-rich area inside the LCU.
  • Step S22 Calculate the frame difference amplitude value Diff LCU of the motion-rich area inside the LCU; when calculating the Diff LCU of each LCU in the current frame, record the maximum value D max .
  • Step S23 Calculate the area ratio Area LCU of the motion-rich area inside the LCU.
  • Steps S22 and S23 may be performed in either order, or simultaneously.
  • Step S24: Calculate the normalized value K_M of the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, using the maximum frame difference amplitude of the LCU-internal motion-rich areas of the previous frame as the normalization benchmark for this product in the current frame; the normalized value K_M is the motion perception factor of the LCU.
  • Step S25 Calculate the frame difference determination threshold Diff Thr of the motion-rich pixels in the next frame by using the frame difference amplitude mean value of the current frame.
  • In step S21, if Diff_x,y > Diff_Thr is satisfied, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels, obtained from the frame difference information of the previous frame.
  • Diff_LCU is the sum of Diff_x,y over all pixels in the LCU that satisfy Diff_x,y > Diff_Thr, where M is the horizontal width of the LCU and N is its vertical height; Diff_LCU is thus the sum of the frame differences of all motion-rich pixels inside the LCU.
  • The value of Area_LCU is an integer in [0, ZZ], corresponding to an area ratio of 0% to 100%. M is the horizontal width of the LCU and N is its vertical height. For each pixel, Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise, and Area_LCU is obtained from the proportion of pixels with Mov_x,y = 1 inside the LCU.
  • The normalized value K_M is an integer in [0, ZZ]; an integer between 0 and ZZ is used to represent the normalized range of 0 to 1, ZZ is the maximum value after normalization, and D_max is the maximum Diff_LCU among the LCUs of the previous frame.
  • Diff_Thr is computed from Diff_avg together with the motion threshold adjustment deviation Diff_offset and the motion threshold adjustment multiplier.
  • Diff_avg is the mean frame difference amplitude of the current frame, averaged over all W × H pixels, where W is the horizontal width and H is the vertical height of the image frame.
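Steps S21–S24 can be sketched in the same style. The product of frame-difference magnitude and area ratio is the document's idea; the exact normalization below is our assumption:

```python
def motion_factor(cur, prev, diff_thr, d_max_prev, zz=255):
    """Motion perception factor K_M of one LCU (steps S21-S24 sketch).

    cur, prev  -- 2-D luma of the LCU in the current / previous frame
    diff_thr   -- motion-rich decision threshold from the previous frame
    d_max_prev -- maximum Diff_LCU observed in the previous frame (D_max)
    """
    n = sum(len(row) for row in cur)  # pixel count M*N
    diffs = [abs(c - p) for rc, rp in zip(cur, prev) for c, p in zip(rc, rp)]
    rich = [d for d in diffs if d > diff_thr]  # S21: motion-rich pixels
    diff_lcu = sum(rich)                       # S22: frame-difference magnitude
    area = len(rich) / n                       # S23: area ratio in [0, 1]
    if d_max_prev == 0:
        return 0
    # S24: normalize the product against the previous frame's maximum.
    return min(zz, round(zz * (diff_lcu / d_max_prev) * area))
```

Weighting by `area` is what makes the factor robust to a few noisy pixels with large frame differences, as the description explains.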
  • In step S30, after the bit allocation weight ω_LCU of an LCU in the current frame is calculated, the sum of the LCU bit allocation weights of the previous frame is used in place of the (not yet fully computed) sum of the weights of the current frame. This makes it possible to calculate, in real time, the proportion of the current LCU's bit allocation weight within the whole image frame, and then the target number of coding bits of the LCU.
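The substitution of the previous frame's weight sum can be sketched as follows (a one-pass, hardware-friendly approximation; names are ours):

```python
def lcu_target_bits(frame_budget, w_lcu, prev_frame_weight_sum):
    """Target coding bits of one LCU (step S30 sketch).

    The current frame's total weight is not known until every LCU has been
    processed, so the previous frame's total is used as a stand-in, letting
    each LCU's share be computed in real time during encoding.
    """
    ratio = w_lcu / prev_frame_weight_sum  # LCU's share of the frame weight
    return frame_budget * ratio

# An LCU holding 5% of last frame's total weight gets 5% of this frame's budget.
bits = lcu_target_bits(frame_budget=2000, w_lcu=5.0, prev_frame_weight_sum=100.0)
```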
  • the value of ZZ is one of 127, 255, 511, or 1023.
  • the present application also proposes a code rate control device based on visual perception, which includes a texture perception factor calculation module, a motion perception factor calculation module and an LCU bit allocation module.
  • the texture perception factor calculation module is used to calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the motion perception factor calculation module is used to calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the LCU bit allocation module calculates the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculates the target number of coding bits of the LCU.
  • the technical effect achieved by the invention is to make the target bit allocation of the LCU in the image frame more in line with the characteristics of the subjective perception of the human eye, improve the subjective visual quality of the human eye, and be suitable for hardware implementation.
  • FIG. 1 is a schematic flow chart of a code rate control method proposed by the present invention.
  • FIG. 2 is a schematic structural diagram of a code rate control device proposed by the present invention.
  • FIG. 3 is a schematic flow chart of calculating the texture perception factor of the LCU in step S10.
  • FIG. 4 is a schematic flow chart of calculating the motion perception factor of the LCU in step S20.
  • 10 is a texture perception factor calculation module
  • 20 is a motion perception factor calculation module
  • 30 is an LCU bit allocation module.
  • the code rate control method proposed by the present invention includes the following steps.
  • Step S10 Calculate the gradient magnitude of the texture-rich region inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the texture perception factor represents the visual sensitivity of the human eye to the texture feature of the LCU.
  • Step S20 Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the motion perception factor represents the degree of visual sensitivity of the human eye to the motion characteristics of the LCU.
  • Using the product of the frame difference magnitude and the area ratio of the motion-rich area of the LCU to calculate the motion perception factor of the LCU is one of the innovations of the present invention.
  • the inventor found through experiments that if only the frame difference amplitude of the motion-rich area inside the LCU is used to represent the motion richness of the LCU, this method is sensitive to small-area motion and is easily affected by sensor noise, resulting in inaccurate judgment results.
  • The order of steps S10 and S20 is not strictly limited: either one can be performed first, or they can be performed simultaneously.
  • Step S30 Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
  • the bit allocation of the LCU in the image frame can be more in line with the characteristics of the subjective perception of the human eye, and the subjective visual quality of the human eye can be improved.
  • the code rate control device proposed by the present invention includes a texture perception factor calculation module 10 , a motion perception factor calculation module 20 and an LCU bit allocation module 30 , which generally correspond to the code rate control method shown in FIG. 1 .
  • the texture perception factor calculation module 10 is used to calculate the gradient magnitude of the texture-rich region inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the motion perception factor calculation module 20 is used to calculate the product of the frame difference magnitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the LCU bit allocation module 30 calculates the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculates the target number of coding bits of the LCU.
  • the calculation of the texture perception factor of the LCU in the step S10 specifically includes the following steps.
  • Step S11 Calculate the gradient magnitude Grad x,y of each pixel in the LCU, and determine whether each pixel in the LCU is a texture-rich pixel based on the gradient determination threshold of the texture-rich pixel obtained from the gradient information of the previous frame. If Grad x,y >Grad Thr is satisfied, it is determined that the pixel belongs to a texture-rich pixel; otherwise, it is determined that the pixel does not belong to a texture-rich pixel.
  • Grad Thr is the gradient determination threshold of texture-rich pixels obtained from the gradient information of the previous frame, see the subsequent step S14 for details.
  • the region composed of texture-rich pixels inside the LCU is the texture-rich region inside the LCU.
  • Step S12: Calculate the gradient magnitude Grad_LCU of the texture-rich region inside the LCU as the sum of Grad_x,y over all pixels in the LCU that satisfy Grad_x,y > Grad_Thr, where M is the horizontal width of the LCU and N is its vertical height, in pixels. Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU and reflects the gradient magnitude of the texture-rich region. When calculating the Grad_LCU of each LCU in the current frame, the maximum value G_max is recorded, for use when normalizing the Grad_LCU of the LCUs in the next frame.
  • Step S13: Normalize the gradient magnitude Grad_LCU of the texture-rich area inside the LCU to obtain the normalized value K_T. The maximum gradient magnitude of the texture-rich regions inside the LCUs of the previous frame is used as the normalization benchmark for the gradient magnitudes of the texture-rich regions inside the LCUs of the current frame.
  • The normalized value K_T is the texture perception factor of the LCU. K_T is the normalized value of Grad_LCU; its value is an integer in [0, ZZ], where the square brackets indicate that the endpoints are included.
  • G max is the maximum value of the Grad LCU of each LCU in the previous frame. In each specific application scenario, ZZ is a fixed value.
  • ZZ represents the maximum value after normalization.
  • the preferred value of ZZ is 127, 255, 511, or 1023, so as to facilitate hardware implementation.
  • K T is the texture perception factor of the LCU.
  • the value of K T is between [0, ZZ]. The larger the K T value, the richer the texture inside the LCU, and the more attention the human subjective vision pays; the smaller the K T value, the flatter the texture inside the LCU, and the less attention the human subjective vision pays.
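Steps S11 to S13 above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names and the |dx| + |dy| gradient operator are assumptions, since the source does not fix a particular gradient operator.

```python
def pixel_gradients(luma):
    """Per-pixel gradient magnitude Grad_x,y as |horizontal| + |vertical| difference."""
    h, w = len(luma), len(luma[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(luma[y][x] - luma[y][x - 1]) if x > 0 else 0
            dy = abs(luma[y][x] - luma[y - 1][x]) if y > 0 else 0
            grad[y][x] = dx + dy
    return grad

def texture_factor(luma, grad_thr, g_max, zz=255):
    """Texture perception factor K_T of one LCU.

    grad_thr is the texture-rich decision threshold and g_max the maximum
    Grad_LCU, both carried over from the previous frame's statistics.
    """
    grad = pixel_gradients(luma)
    # Steps S11/S12: Grad_LCU sums gradients over texture-rich pixels only.
    grad_lcu = sum(g for row in grad for g in row if g > grad_thr)
    # Step S13: normalize against the previous frame's maximum, clip to [0, ZZ].
    k_t = min(zz, grad_lcu * zz // max(g_max, 1))
    return grad_lcu, k_t
```

A flat LCU yields K T = 0 while a textured LCU yields a larger value, matching the intended behavior of the factor.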
  • Step S14 Use the mean pixel gradient of the current frame to calculate the gradient decision threshold Grad Thr for texture-rich pixels in the next frame; the calculation formula is either of the following: Grad Thr = Grad avg + Grad offset, or Grad Thr = Grad avg × Grad mult.
  • Grad offset is the gradient threshold adjustment deviation.
  • Grad mult is the gradient threshold adjustment multiplier. Both values can be adjusted according to the user's sensitivity to textured areas of the image.
  • Grad avg is the average pixel gradient of the current frame, i.e. the sum of Grad x,y over all pixels of the frame divided by W × H. Here W is the horizontal width of the current image frame and H is the vertical height of the current image frame, in pixels.
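The step S14 threshold update can be sketched as below. The multiplier's symbol is lost in the source text, so the name grad_mult is an assumption; both variants named in the text are shown.

```python
def next_grad_threshold(frame_grad_sum, width, height,
                        grad_offset=0.0, grad_mult=None):
    """Grad_Thr for the next frame, from the current frame's mean gradient."""
    grad_avg = frame_grad_sum / (width * height)  # Grad_avg over W x H pixels
    if grad_mult is not None:
        return grad_avg * grad_mult               # multiplier variant
    return grad_avg + grad_offset                 # additive-offset variant
```

Because the threshold is derived from the previous frame's statistics, the texture-rich decision for the current frame needs no extra pass over the current frame itself.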
  • In the above steps, the present invention has the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the combination of the previous frame's mean pixel gradient and user adjustment parameters is used as the gradient decision threshold for texture-rich pixels in the current frame, so that whether each pixel inside an LCU of the current frame is a texture-rich pixel is determined adaptively. (2) Exploiting the same inter-frame correlation, the maximum gradient magnitude of the texture-rich regions of the LCUs counted in the previous frame is used as the normalization benchmark for the gradient magnitudes of the texture-rich regions of the LCUs in the current frame, which makes the normalization adaptive and makes the distribution of the gradient magnitudes, and hence of the K T values, of the LCUs in the current frame more reasonable.
  • The calculation of the motion perception factor of the LCU in step S20 specifically includes the following steps.
  • Step S21 Calculate the frame difference magnitude Diff x,y between each pixel inside the LCU and the corresponding pixel of the co-located (same-position) LCU in the previous frame, and use the frame difference decision threshold for motion-rich pixels obtained from the previous frame's frame difference information to determine whether each pixel inside the LCU is a motion-rich pixel.
  • the frame difference amplitude refers to the absolute value of the difference between the luminance values of two corresponding pixel points.
  • The frame difference (that is, the inter-frame difference) is taken as an absolute value before being used in subsequent calculations.
  • Diff Thr is the frame difference determination threshold of motion-rich pixels obtained from the frame difference information of the previous frame, see the subsequent step S25 for details.
  • the region composed of motion-rich pixels inside the LCU is the motion-rich region inside the LCU.
  • Step S22 Calculate the frame difference magnitude Diff LCU of the motion-rich region inside the LCU: Diff LCU is the sum of Diff x,y over all pixels (x, y) of the M×N LCU that satisfy Diff x,y > Diff Thr. Here M is the horizontal width of the LCU and N is the vertical height of the LCU, in pixels. Diff LCU is thus the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU, reflecting the frame difference magnitude of the motion-rich region of the LCU. While calculating the Diff LCU of each LCU in the current frame, record the maximum value D max for use when calculating the normalized value of the product of Diff LCU and Area LCU for the LCUs in the next frame.
  • Step S23 Calculate the area ratio Area LCU of the motion-rich region inside the LCU as Area LCU = (Σ Mov x,y / (M × N)) × ZZ, where the sum runs over all pixels of the LCU and Mov x,y takes the value 1 if pixel (x, y) is a motion-rich pixel and 0 otherwise. The value of Area LCU lies in [0, ZZ], corresponding to an area share of 0% to 100%. The meaning and preferred values of ZZ are the same as before and are not repeated here. M is the horizontal width of the LCU and N is the vertical height of the LCU, in pixels.
  • The logical meaning of the Area LCU formula is to compute the ratio of the total number of motion-rich pixels in the LCU to the total number of pixels in the LCU, and to normalize that ratio.
  • Mov x,y = 1 indicates that the corresponding pixel is a motion-rich pixel, so summing Mov x,y gives the total number of motion-rich pixels inside the LCU.
  • The order of steps S22 and S23 is not strictly limited: either may be performed first, or they may be performed simultaneously.
  • Step S24 Calculate the normalized value of the product of the frame difference magnitude and the area ratio of the motion-rich region inside the LCU.
  • The maximum value of the frame difference magnitude of the motion-rich regions of the LCUs in the previous frame is used as the normalization benchmark for the product of the frame difference magnitude and the area ratio of the motion-rich region of each LCU in the current frame.
  • The normalized value is the motion perception factor of the LCU, calculated as K M = (Diff LCU × Area LCU) / D max, limited to [0, ZZ]. The meaning and preferred values of ZZ are the same as before and are not repeated here.
  • D max is the maximum value of the Diff LCU of each LCU in the previous frame.
  • K M is the motion perception factor of the LCU.
  • the value of K M is between [0, ZZ]. The larger the value of K M , the richer the movement inside the LCU, and the more attention the human subjective vision pays; the smaller the value of K M , the lighter the movement inside the LCU, and the less attention the human subjective vision pays.
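Step S24 can be sketched as below. The exact normalization formula is not reproduced in the source text; this reconstruction keeps K M in [0, ZZ], given that Area LCU is already in [0, ZZ] and D max is the previous frame's maximum Diff LCU.

```python
def motion_factor(diff_lcu, area_lcu, d_max, zz=255):
    """Motion perception factor K_M of one LCU."""
    k_m = diff_lcu * area_lcu // max(d_max, 1)  # (Diff/D_max) * (Area/ZZ) * ZZ
    return max(0, min(zz, k_m))                 # clip in case Diff_LCU > D_max
```

Weighting the frame difference by the area share makes a small but violently moving patch and a large gently moving patch both register, which is the stated motivation for using the product.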
  • Step S25 Use the mean frame difference magnitude of the current frame to calculate the frame difference decision threshold Diff Thr for motion-rich pixels in the next frame; the calculation formula is either of the following: Diff Thr = Diff avg + Diff offset, or Diff Thr = Diff avg × Diff mult.
  • Diff offset is the motion threshold adjustment deviation.
  • Diff mult is the motion threshold adjustment multiplier.
  • Both values can be adjusted according to the user's sensitivity to moving areas of the image.
  • Diff avg is the mean frame difference magnitude of the current frame.
  • It is calculated as the sum of Diff x,y over all pixels of the frame divided by W × H, where W is the horizontal width of the image frame and H is the vertical height of the image frame, in pixels.
  • In the above steps, the present invention has the following innovation: exploiting the correlation of image content between consecutive video frames, the mean frame difference magnitude counted in the previous frame, combined with user adjustment parameters, is used as the frame difference decision threshold for motion-rich pixels in the current frame, so that whether each pixel inside an LCU of the current frame is a motion-rich pixel is determined adaptively.
  • The texture perception factor K T of the LCU and the motion perception factor K M of the LCU are combined, and the result of the combination is used as the bit allocation weight of the LCU, for example as the weighted sum λ T × K T + λ M × K M.
  • λ T is the weight coefficient of the LCU texture perception factor, and λ M is the weight coefficient of the LCU motion perception factor.
  • For adjacent video image frames, the sum of the bit allocation weights of all LCUs in a frame is also correlated, so the sum of the bit allocation weights of all LCUs in the previous frame can be used to predict the sum of the bit allocation weights of all LCUs in the current frame.
  • The present invention therefore exploits the correlation of image content between consecutive video frames: the sum of the LCU bit allocation weights counted in the previous frame replaces the sum of the bit allocation weights of the LCUs in the current frame, and the proportion of each LCU's bit allocation weight within the whole image frame is calculated in real time.
  • From this proportion the target number of coding bits of the LCU is calculated, which is an innovation of the present invention.
  • The target number of coding bits of the LCU is then calculated as T LCU = (T Pic − Bit H − Coded Pic) × ω LCU / W prev, where W prev denotes the sum of the bit allocation weights of all LCUs counted in the previous frame.
  • the real-time calculation of the target coding bit allocation of each LCU in the current frame can be realized, and there is no need to preprocess all LCUs in the entire frame before coding, so no frame-level coding delay is introduced.
  • Since the bit allocation weight of each LCU is calculated from that LCU's content in the current frame, the bit allocation of the LCU is consistent with the subjective visual experience of the human eye on the current frame. In the few cases where the image content of consecutive frames differs greatly, the image-frame-level and GOP-level bit allocation of the HEVC rate control algorithm makes the adjustment.
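The step S30 allocation can be sketched as below, assuming a linear combination of K T and K M with weight coefficients and using the previous frame's total weight in place of the current frame's, as described above. All names (lambda_t, lambda_m, w_prev_sum) are assumptions.

```python
def lcu_target_bits(k_t, k_m, lambda_t, lambda_m,
                    t_pic, bit_h, coded_pic, w_prev_sum):
    """Target number of coding bits for the LCU about to be encoded."""
    w_lcu = lambda_t * k_t + lambda_m * k_m  # bit allocation weight of the LCU
    remaining = t_pic - bit_h - coded_pic    # frame bits still to be spent
    # No whole-frame preprocessing pass: w_prev_sum was accumulated while
    # encoding the previous frame.
    return remaining * w_lcu / w_prev_sum
```

Everything on the right-hand side is available the moment the LCU is reached, which is what allows the allocation to run in lockstep with encoding.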
  • the present invention proposes an HEVC code rate control method based on subjective visual perception of human eyes and suitable for hardware implementation, which has the following beneficial effects.
  • the code rate control method proposed by the present invention can make the bit allocation of the LCU in the image frame more in line with the characteristics of the subjective perception of the human eye, and improve the subjective visual quality of the human eye.
  • In the computation of the visual perception factors, the present invention makes full use of the correlation of image content between consecutive video frames, so that the texture-rich and motion-rich regions of the LCU are determined adaptively and the value distributions of the texture perception factor and the motion perception factor are more reasonable.
  • The present invention calculates the motion perception factor of the LCU using the product of the frame difference magnitude and the area ratio of the LCU's motion-rich region, which more accurately reflects the visual attention the human eye pays to moving regions.
  • The present invention makes full use of the correlation of image content between consecutive video frames to calculate, in real time, the proportion of each LCU's bit allocation weight within the whole image frame, so that the bit allocation of the LCU conforms to the subjective visual experience of the human eye on the current frame.
  • the calculation method of the visual perception factor adopted by the present invention is simple and convenient, has a small amount of computation, occupies less bus bandwidth, and is suitable for hardware implementation.
  • The calculation of the LCU texture perception factors and motion perception factors proceeds in parallel with the encoding of the LCUs; there is no need for a preprocessing stage that computes the visual perception factors of all LCUs before the whole frame is encoded, so no frame-level delay is introduced and no additional bus bandwidth is consumed, which suits hardware implementation.
  • the present invention selects three YUV video streams Johnny, FourPeople and KristenAndSara in the HEVC standard test sequence ClassE for testing. These three video streams are all typical video conferencing scenarios, and the subjective attention area of the human eye is the human face in the video stream.
  • The encoder adopts the constant bit rate control mode; the encoding bit rate of Johnny and KristenAndSara is set to 600 kbps and that of FourPeople to 800 kbps; 120 frames are encoded; the GOP structure is IPPP, with each P frame referencing only the previous frame.
  • The rate control algorithm used in the HM, which calculates the LCU bit allocation weights from MAD values, serves as the comparison baseline for the improved rate control algorithm based on human subjective visual perception proposed by the present invention, and the PSNR (Peak Signal-to-Noise Ratio) of the face region in the encoded images of the three YUV video streams is compared.
  • the unit of PSNR is decibel dB. The higher the PSNR, the smaller the image distortion and the better the image quality.
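The PSNR metric reported below can be computed as in this minimal sketch for 8-bit pixel planes; the function name is an assumption.

```python
import math

def psnr(ref, rec, peak=255):
    """Peak signal-to-noise ratio in dB between two same-sized pixel planes."""
    sse, n = 0, 0
    for ref_row, rec_row in zip(ref, rec):
        for r, x in zip(ref_row, rec_row):
            sse += (r - x) ** 2
            n += 1
    if sse == 0:
        return float("inf")            # identical planes: no distortion
    mse = sse / n                      # mean squared error
    return 10 * math.log10(peak * peak / mse)
```

Higher PSNR means smaller squared error between the original and the reconstructed image, hence less distortion.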
  • The experimental results are shown in Table 1 below.
  • Table 1 Test comparison of the present invention and the existing rate control method
  • The face region belongs to both the motion-rich region and the texture-rich region; that is, compared with the background, the face region has obvious motion characteristics and more obvious texture features. Table 1 shows that after applying the rate control algorithm proposed by the present invention, with little change in the overall average bit rate and average PSNR of the encoded streams, the PSNR of the face region improves: by 0.38 dB for Johnny, 0.38 dB for KristenAndSara, and 0.42 dB for FourPeople, effectively enhancing the visual quality of human subjective perception.


Abstract

Disclosed in the present invention is a visual perception-based rate control method, comprising the following steps: step S10, calculating the gradient magnitude of a texture-rich region in an LCU, and taking a calculation result as a texture perception factor of the LCU; step S20, calculating the product of the frame difference magnitude and an area proportion of a motion-rich region in the LCU, and taking a calculation result as a motion perception factor of the LCU, wherein Step S10 and step S20 are carried out sequentially or simultaneously; and step S30, calculating a bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and further calculating the number of target coding bits of the LCU. The technical effect achieved by the present invention is that the target bit allocation of the LCU in an image frame better conforms to the subjective perception features of human eyes, the subjective visual quality of human eyes is improved, and the method is suitable for being realized by adopting hardware.

Description

A visual perception-based rate control method and device
Technical field
The present invention relates to video coding technology, and in particular to a visual perception-based rate control method and device suitable for hardware implementation.
Background art
Video coding is a technology that compresses the redundant components of video images and represents the video information with as little data as possible. HEVC (High Efficiency Video Coding, also known as H.265) is the new-generation video coding standard. Compared with the previous-generation standard AVC (Advanced Video Coding, also known as H.264), HEVC can reduce the coding bit rate by about 50% while achieving the same video coding quality, roughly doubling the compression performance relative to AVC. Because video coding algorithms are computationally heavy, accelerating the encoding process in hardware with an application-specific integrated circuit (ASIC) has become common industry practice.
Video coding technology uses image blocks as the basic coding unit. In HEVC, the basic coding unit is the CU (Coding Unit). A CU can be an image block of 64×64, 32×32, 16×16, or 8×8 pixels. The 64×64-pixel image block is also called the LCU (Largest Coding Unit).
In practical applications, the bandwidth of the channel used to transmit compressed video is limited. If the coding bit rate of the compressed video is too high and exceeds the channel capacity, video transmission congestion or even packet loss results. If the coding bit rate is too low, the channel bandwidth is not fully utilized and higher video quality cannot be obtained. It is therefore necessary to use rate control technology to control the output bit rate of the video encoder so that it matches the channel bandwidth capacity.
The purpose of rate control is to adjust the encoding parameters of the video encoder so that its output bit rate equals the preset target bit rate, while reducing coding distortion as much as possible to improve video coding quality. The rate control algorithm currently adopted in the HEVC reference encoder HM (HEVC Test Model) is based on the JCTVC-K0103 proposal. JCTVC-K0103 establishes a mathematical model of the relationship between the coding bit rate R and the Lagrangian multiplier λ (the R-λ model), and accomplishes rate control through two stages: target bit allocation and target bit control.
In the JCTVC-K0103 proposal, target bit allocation is carried out at three levels: the GOP (Group Of Pictures, a set of temporally consecutive image frames) level, the image frame level, and the basic-coding-unit level. To reduce computational complexity, the LCU is generally chosen as the basic unit of target bit allocation at the basic-coding-unit level, so basic-coding-unit-level target bit allocation is usually called LCU-level target bit allocation.
After the target bit allocation at the GOP level and the image frame level, the target number of coding bits of the current video frame to be encoded is determined. The next step is LCU-level target bit allocation, which decides the bit allocation weight of each LCU in the current video frame and assigns each LCU a target number of coding bits according to its weight. In the JCTVC-K0103 proposal, LCU-level bit allocation is performed according to the following formula:
T LCU_curr = (T Pic − Bit H − Coded Pic) × ω LCU_curr / Σ {AllNotCodedLCUs} ω LCU
Among them, T LCU_curr is the target number of coding bits allocated to the LCU currently to be encoded; T Pic is the target number of coding bits allocated to the current video frame to be encoded (a video frame is also called an image frame); Bit H is the number of bits estimated in advance for the video frame header information; Coded Pic is the actual number of coded bits of the already-encoded LCUs in the current video frame; ω LCU_curr is the bit allocation weight of the LCU currently to be encoded; ω LCU denotes the bit allocation weight of an unspecified LCU; and Σ {AllNotCodedLCUs} ω LCU is the sum of the bit allocation weights of all not-yet-encoded LCUs in the current video frame to be encoded.
It can be found from the above formula that the core of LCU-level bit allocation is the bit allocation weight ω LCU of the LCU. After calculating the bit allocation weights of all LCUs in the current video frame to be encoded, the sum of the bit allocation weights of all unencoded LCUs in the current frame can be obtained by accumulating the bit allocation weights of each unencoded LCU. Then, from the proportion of each LCU's bit allocation weight in that sum, the target number of coding bits of each LCU can be calculated, completing the target bit allocation at the LCU level.
In the JCTVC-K0103 proposal, the bit allocation weight ω LCU of an LCU is calculated from the prediction error MAD value (Mean Absolute Difference) of the co-located (same-position) LCU in the previous coded frame, according to the following formula:
ω LCU = (MAD LCU)²
Among them, ω LCU is the bit allocation weight of the LCU, and MAD LCU is the prediction error MAD value of the co-located LCU in the previous coded frame:
MAD LCU = (1 / N pixels) × Σ {AllPixelsInLCU} |P org − P pred|
Among them, N pixels is the number of pixels in the LCU, Σ {AllPixelsInLCU} accumulates over all pixels in the LCU, P org is the luma value of the original pixel, and P pred is the luma value of the predicted pixel.
When watching video images, the human eye, influenced by subjective perception, pays more attention to regions of the image with complex texture and rich detail, and less attention to flat regions without obvious texture features. Likewise, the human eye pays more attention to regions of the video image with intense motion and rich change, and less attention to still regions. In other words, motion-rich regions and texture-rich regions of a video image attract human visual attention more easily. Based on this visual perception characteristic of the human eye, it is necessary during video encoding, with the target coding bit rate of the image frame fixed, to allocate more target coding bits to the LCUs in the motion-rich and texture-rich regions of the image frame, so as to improve the visual quality of human subjective perception. To this end, the rate control algorithm of the video encoder must be able to detect motion-rich and texture-rich regions and calculate the LCU-level bit allocation weights accordingly, so that more target coding bits are allocated to the LCUs within the regions the human eye attends to, thereby improving the visual quality of human subjective perception.
Research on the HEVC rate control technique based on the JCTVC-K0103 proposal shows that the LCU-level bit allocation weight is calculated from the prediction error MAD value of the co-located LCU in the previous coded frame. This only measures the luma difference between pixels from a signal-processing point of view; it does not take the subjective visual perception characteristics of the human eye into account, and so cannot allocate more target coding bits to the regions the human eye attends to. It is therefore necessary to improve the LCU-level target bit allocation in HEVC rate control by selecting factors that characterize human visual perception to calculate the LCU-level bit allocation weights, allocating more target coding bits to the LCUs in the motion-rich and texture-rich regions of the image frame, so as to improve the visual quality of human subjective perception. The visual perception factor mentioned here needs to characterize the degree of subjective attention the human eye pays to the image content, and must satisfy the following: (1) for regions the human eye attends to, such as motion-rich and texture-rich regions, the visual perception factor takes a large value, and the larger the motion amplitude and texture complexity, the larger its value; (2) for regions the human eye does not attend to, such as motion-sparse regions and simple-texture regions, the visual perception factor takes a small value, and the smaller the motion amplitude and the simpler the texture, the smaller its value.
Some existing technical solutions already improve HEVC rate control to raise the visual quality of human subjective perception. Most of them likewise start by looking for factors that can characterize human visual perception, use those factors to calculate LCU-level bit allocation weights, and allocate more target coding bits to the LCUs within the regions the human eye attends to. However, these existing solutions generally suffer from the following problems.
The first class of prior solutions needs to preprocess the whole image frame before the current frame is encoded. The purpose of the preprocessing is to calculate, from the selected visual perception factor, the bit allocation weight of every LCU in the current frame, accumulate them to obtain the sum of the weights of all LCUs in the frame, and then compute each LCU's target number of coding bits from the proportion of its weight in that sum. Such solutions must add a preprocessing stage before encoding and can only compute each LCU's target bits after the weights of all LCUs in the whole frame have been computed. The preprocessing of a whole frame takes considerable time, and the larger the frame resolution, the longer it takes, introducing a large frame-level encoding delay. Moreover, since the preprocessing is performed separately from encoding, the image content must be read from memory twice, consuming a great deal of extra bus bandwidth. This class of solutions therefore suits software encoder implementations, not hardware encoders.
The second class of prior solutions exploits the correlation between video frames and uses the bit allocation weight of each LCU in the previous frame as the weight of the co-located LCU in the current frame. While encoding the previous frame, such schemes compute each LCU's bit allocation weight from the visual perception factor during the encoding of that LCU, so that when the previous frame finishes encoding, the weights of all its LCUs are already available; the current frame then reuses the co-located weights. Although these solutions need no extra preprocessing stage, have good real-time behavior, and consume no extra bus bandwidth, the weights of the current frame are computed entirely from the previous frame's content. When the change between frames is large, the LCU weights of the current frame no longer match the regions to which human subjective vision is sensitive, and the LCU bit allocation error is large.
Beyond the above problems, the visual perception factors chosen by many existing solutions involve complex algorithms with heavy computation, consuming substantial computing and bandwidth resources, and are unsuitable for hardware implementation.
发明内容Contents of the invention
本发明所要解决的技术问题是提出了一种基于人眼主观视觉感知的、适合于硬件实现的HEVC码率控制方法及装置。The technical problem to be solved by the present invention is to propose a HEVC code rate control method and device based on human subjective visual perception and suitable for hardware implementation.
为解决上述技术问题,本发明提出了一种基于视觉感知的码率控制方法,包括如下步骤:步骤S10:计算LCU内部纹理丰富区域的梯度幅值,并将计算结果作为LCU的纹理感知因子。步骤S20:计算LCU内部运动丰富区域的帧差幅值与面积占比的乘积,并将计算结果作为LCU的运动感知因子。所述步骤S10和步骤S20的顺序或者任一在前,或者同时进行。步骤S30:根据LCU的纹理感知因子和运动感知因子计算LCU的比特分配权重,进而计算出LCU的目标编码比特数。In order to solve the above technical problems, the present invention proposes a code rate control method based on visual perception, including the following steps: Step S10: Calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU. Step S20: Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU. The sequence of step S10 and step S20 is either performed first, or performed simultaneously. Step S30: Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
Further, the texture perception factor of the LCU characterizes how visually sensitive the human eye is to the texture features of the LCU. A larger texture perception factor means richer texture inside the LCU and more attention from human subjective vision; a smaller factor means flatter texture inside the LCU and less attention from human subjective vision.
Further, the motion perception factor of the LCU characterizes how visually sensitive the human eye is to the motion features of the LCU. A larger motion perception factor means richer motion inside the LCU and more attention from human subjective vision; a smaller factor means slighter motion inside the LCU and less attention from human subjective vision.
Further, computing the texture perception factor of the LCU in step S10 specifically comprises the following steps. Step S11: compute the gradient magnitude Grad_x,y of every pixel inside the LCU, and use the texture-rich-pixel gradient decision threshold derived from the previous frame's gradient information to decide whether each pixel inside the LCU is a texture-rich pixel; the region inside the LCU formed by the texture-rich pixels is the texture-rich region of the LCU. Step S12: compute the gradient magnitude Grad_LCU of the texture-rich region inside the LCU; while computing Grad_LCU for every LCU of the current frame, record the maximum value G_max. Step S13: normalize Grad_LCU to obtain the normalized value Grad_LCU_norm, using the maximum gradient magnitude of the previous frame's texture-rich LCU regions as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions; the normalized value Grad_LCU_norm is the texture perception factor of the LCU. Step S14: use the mean pixel gradient of the current frame to compute the gradient decision threshold Grad_Thr for texture-rich pixels of the next frame.
Further, in step S11, if Grad_x,y > Grad_Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels derived from the previous frame's gradient information.
Further, in step S12, Grad_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Grad_x,y, summed only over pixels satisfying Grad_x,y > Grad_Thr; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
Further, in step S13, Grad_LCU_norm = ZZ × Grad_LCU / G_max; where G_max is the maximum Grad_LCU over the LCUs of the previous frame. Grad_LCU_norm is an integer in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum.
Further, in step S14, Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = α × Grad_avg; where Grad_offset is a gradient threshold adjustment offset, α is a gradient threshold adjustment multiplier, and Grad_avg is the mean pixel gradient of the current frame: Grad_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Grad_x,y) / (W × H), where W is the horizontal width and H the vertical height of the current image frame.
Further, computing the motion perception factor of the LCU in step S20 specifically comprises the following steps. Step S21: compute the frame difference magnitude Diff_x,y between every pixel inside the LCU and the corresponding pixel of the co-located LCU in the previous frame, and use the motion-rich-pixel frame difference decision threshold derived from the previous frame's frame difference information to decide whether each pixel inside the LCU is a motion-rich pixel; the region inside the LCU formed by the motion-rich pixels is the motion-rich region of the LCU. Step S22: compute the frame difference magnitude Diff_LCU of the motion-rich region inside the LCU; while computing Diff_LCU for every LCU of the current frame, record the maximum value D_max. Step S23: compute the area proportion Area_LCU of the motion-rich region inside the LCU. Steps S22 and S23 may be performed in either order or simultaneously. Step S24: compute the normalized value DA_LCU_norm of the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU, using the maximum frame difference magnitude of the previous frame's motion-rich LCU regions as the normalization reference for that product in the current frame; the normalized value DA_LCU_norm is the motion perception factor of the LCU. Step S25: use the mean frame difference magnitude of the current frame to compute the frame difference decision threshold Diff_Thr for motion-rich pixels of the next frame.
Further, in step S21, if Diff_x,y > Diff_Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels derived from the previous frame's frame difference information.
Further, in step S22, Diff_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Diff_x,y, summed only over pixels satisfying Diff_x,y > Diff_Thr; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Diff_LCU is the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU.
Further, in step S23, Area_LCU = ZZ × (Σ_{x=0..M-1} Σ_{y=0..N-1} Mov_x,y) / (M × N); where Area_LCU is an integer in [0, ZZ], corresponding to an area proportion of 0% to 100%; M is the horizontal width of the LCU and N is the vertical height of the LCU; Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise.
Further, in step S24, DA_LCU_norm = Diff_LCU × Area_LCU / D_max; where DA_LCU_norm is an integer in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum; and D_max is the maximum Diff_LCU over the LCUs of the previous frame.
Further, in step S25, Diff_Thr = Diff_avg + Diff_offset or Diff_Thr = β × Diff_avg; where Diff_offset is a motion threshold adjustment offset, β is a motion threshold adjustment multiplier, and Diff_avg is the mean frame difference magnitude of the current frame: Diff_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Diff_x,y) / (W × H), where W is the horizontal width and H the vertical height of the image frame.
Further, in step S30, the bit allocation weight of the LCU is computed from the texture perception factor K_T and the motion perception factor K_M of the LCU as follows: ω_LCU = μ_T × K_T + μ_M × K_M; where ω_LCU, the bit allocation weight of the LCU, is an integer in [0, ZZ], with integers from 0 to ZZ representing 0 to 1; μ_T is the weight coefficient of the texture perception factor of the LCU and μ_M is the weight coefficient of the motion perception factor of the LCU, the two satisfying μ_T + μ_M = 1, 0 < μ_T < 1, and 0 < μ_M < 1.
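By way of illustration only (not part of the patent disclosure), the weight fusion ω_LCU = μ_T × K_T + μ_M × K_M can be realized in pure integer arithmetic, as a hardware-friendly implementation would; representing μ_T as a rational number mu_t_num/mu_denom is an assumption of this sketch:

```python
def lcu_weight(k_t, k_m, mu_t_num=3, mu_denom=4):
    """omega_LCU = mu_T*K_T + mu_M*K_M with mu_T + mu_M = 1, in integer
    arithmetic: mu_T = mu_t_num/mu_denom, mu_M = 1 - mu_T (hypothetical
    fixed-point choice). With K_T, K_M in [0, ZZ], the result stays in [0, ZZ]."""
    mu_m_num = mu_denom - mu_t_num
    return (mu_t_num * k_t + mu_m_num * k_m) // mu_denom
```

For example, with μ_T = 3/4, K_T = 100, and K_M = 200, the weight is (3×100 + 1×200) // 4 = 125.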
Further, in step S30, after the bit allocation weight ω_LCU of an LCU in the current frame has been computed, the sum of the LCU bit allocation weights accumulated over the previous frame is used in place of the sum of the bit allocation weights of the LCUs of the current frame, so that the proportion of the LCU's bit allocation weight within the whole image frame, ω_LCU / Σ_prev ω_LCU, can be computed in real time; the target number of coding bits of the LCU is then computed from that proportion and the target number of coding bits of the frame.
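A minimal sketch of this allocation step (illustrative only; the name frame_target_bits for the frame-level bit budget is an assumption, since the formula image is not reproduced in this text):

```python
def lcu_target_bits(frame_target_bits, omega_lcu, prev_weight_sum):
    """Allocate the LCU's share of the frame budget. The denominator is the
    PREVIOUS frame's sum of LCU weights, so the share can be computed in
    real time before all weights of the current frame are known."""
    if prev_weight_sum <= 0:
        return 0  # degenerate guard; not specified by the patent text
    return frame_target_bits * omega_lcu // prev_weight_sum
```

For example, a frame budget of 10000 bits, an LCU weight of 50, and a previous-frame weight sum of 1000 gives this LCU a target of 500 bits.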
Preferably, the value of ZZ is one of 127, 255, 511, or 1023.
The present application further proposes a visual-perception-based rate control device comprising a texture perception factor calculation module, a motion perception factor calculation module, and an LCU bit allocation module. The texture perception factor calculation module computes the gradient magnitude of the texture-rich region inside an LCU and takes the result as the texture perception factor of the LCU. The motion perception factor calculation module computes the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and takes the result as the motion perception factor of the LCU. The LCU bit allocation module computes the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight computes the target number of coding bits of the LCU.
The technical effect achieved by the present invention is that the target bit allocation of the LCUs within an image frame better matches the characteristics of human subjective perception, improving subjective visual quality, while remaining suitable for hardware implementation.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the rate control method proposed by the present invention.
Fig. 2 is a schematic structural diagram of the rate control device proposed by the present invention.
Fig. 3 is a schematic flowchart of computing the texture perception factor of an LCU in step S10.
Fig. 4 is a schematic flowchart of computing the motion perception factor of an LCU in step S20.
Reference numerals in the figures: 10 is the texture perception factor calculation module, 20 is the motion perception factor calculation module, and 30 is the LCU bit allocation module.
Detailed Description
Referring to Fig. 1, the rate control method proposed by the present invention comprises the following steps.
Step S10: compute the gradient magnitude of the texture-rich region inside an LCU and take the result as the texture perception factor of the LCU. The texture perception factor characterizes how visually sensitive the human eye is to the texture features of the LCU.
Step S20: compute the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and take the result as the motion perception factor of the LCU. The motion perception factor characterizes how visually sensitive the human eye is to the motion features of the LCU. Using the product of the frame difference magnitude and the area proportion of the motion-rich region of the LCU to compute the motion perception factor is an innovation of the present invention. The inventors found through experiments that characterizing the motion richness of an LCU by the frame difference magnitude of its motion-rich region alone is overly sensitive to small-area motion and is easily corrupted by sensor noise, yielding inaccurate decisions. Characterizing it by the area proportion of the motion-rich region alone is overly sensitive to subtle changes of illumination and shadow in local image regions caused by foreground motion, which does not match what the human eye attends to. Characterizing the motion richness of the LCU by the product of the two overcomes the drawbacks of either single measure and reflects the visual attention the human eye pays to moving regions more accurately.
The order of steps S10 and S20 is not strictly constrained; either may come first, or they may be performed simultaneously.
Step S30: compute the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight compute the target number of coding bits of the LCU.
With the rate control method proposed by the present invention, the bit allocation of the LCUs within an image frame better matches the characteristics of human subjective perception, improving subjective visual quality.
Referring to Fig. 2, the rate control device proposed by the present invention comprises a texture perception factor calculation module 10, a motion perception factor calculation module 20, and an LCU bit allocation module 30, corresponding as a whole to the rate control method shown in Fig. 1. The texture perception factor calculation module 10 computes the gradient magnitude of the texture-rich region inside an LCU and takes the result as the texture perception factor of the LCU. The motion perception factor calculation module 20 computes the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and takes the result as the motion perception factor of the LCU. The LCU bit allocation module 30 computes the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight computes the target number of coding bits of the LCU.
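The three-module structure of Fig. 2 can be sketched as follows; this is purely illustrative (the patent describes hardware modules, and all class and parameter names here, including the integer clamping, are assumptions of the sketch):

```python
class TexturePerceptionModule:  # module 10: Grad_LCU -> K_T
    def factor(self, grad_lcu, g_max_prev, zz):
        # K_T = ZZ * Grad_LCU / G_max, clamped into [0, ZZ]
        return min(zz, zz * grad_lcu // g_max_prev) if g_max_prev else 0

class MotionPerceptionModule:  # module 20: Diff_LCU * Area_LCU -> K_M
    def factor(self, diff_lcu, area_lcu, d_max_prev, zz):
        # K_M = Diff_LCU * Area_LCU / D_max, with Area_LCU already in [0, ZZ]
        return min(zz, diff_lcu * area_lcu // d_max_prev) if d_max_prev else 0

class LcuBitAllocationModule:  # module 30: K_T, K_M -> target bits
    def target_bits(self, k_t, k_m, mu_t, frame_bits, prev_weight_sum, zz):
        omega = int(mu_t * k_t + (1 - mu_t) * k_m)  # omega_LCU in [0, ZZ]
        return frame_bits * omega // prev_weight_sum if prev_weight_sum else 0
```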
Referring to Fig. 3, computing the texture perception factor of the LCU in step S10 specifically comprises the following steps.
Step S11: compute the gradient magnitude Grad_x,y of every pixel inside the LCU, and use the texture-rich-pixel gradient decision threshold derived from the previous frame's gradient information to decide whether each pixel inside the LCU is a texture-rich pixel. If Grad_x,y > Grad_Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels derived from the previous frame's gradient information; see step S14 below. The region inside the LCU formed by the texture-rich pixels is the texture-rich region of the LCU.
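As a concrete illustration of step S11 (not part of the patent text), the sketch below computes per-pixel gradient magnitudes and applies the threshold test. The patent does not fix a particular gradient operator, so a simple horizontal-plus-vertical absolute-difference operator is assumed here:

```python
def pixel_gradients(frame):
    """Per-pixel gradient magnitude Grad_x,y, assuming the operator
    |horizontal difference| + |vertical difference| (the patent does
    not mandate a specific gradient operator)."""
    h, w = len(frame), len(frame[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(frame[y][x] - frame[y][x - 1]) if x > 0 else 0
            gy = abs(frame[y][x] - frame[y - 1][x]) if y > 0 else 0
            grad[y][x] = gx + gy
    return grad

def is_texture_rich(grad_xy, grad_thr):
    # Step S11 decision: a pixel is texture-rich iff Grad_x,y > Grad_Thr.
    return grad_xy > grad_thr
```

A flat patch next to an edge then yields a large gradient only at the edge pixels, which alone pass the texture-rich test.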
Step S12: compute the gradient magnitude Grad_LCU of the texture-rich region inside the LCU, using the formula Grad_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Grad_x,y, summed only over pixels satisfying Grad_x,y > Grad_Thr. Here M is the horizontal width of the LCU and N is its vertical height, both in pixels. Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU and reflects the gradient magnitude of the LCU's texture-rich region. While computing Grad_LCU for every LCU of the current frame, record the maximum value G_max for use when computing the normalized Grad_LCU values of the next frame's LCUs.
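Step S12 can be sketched as follows (illustrative only; an LCU is represented as an M×N block of precomputed gradient magnitudes, and the helper names are hypothetical):

```python
def grad_lcu(grad_block, grad_thr):
    """Step S12: sum of gradient magnitudes of all texture-rich pixels
    (those with Grad_x,y > Grad_Thr) inside one LCU."""
    return sum(g for row in grad_block for g in row if g > grad_thr)

def frame_grad_lcus(grad_blocks, grad_thr):
    """Compute Grad_LCU for every LCU of the frame and record the
    maximum G_max, used to normalize the next frame's values."""
    vals = [grad_lcu(b, grad_thr) for b in grad_blocks]
    g_max = max(vals) if vals else 0
    return vals, g_max
```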
Step S13: normalize the gradient magnitude Grad_LCU of the texture-rich region inside the LCU to obtain Grad_LCU_norm, using the maximum gradient magnitude of the previous frame's texture-rich LCU regions as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions. The normalized value Grad_LCU_norm is the texture perception factor of the LCU and is computed as Grad_LCU_norm = ZZ × Grad_LCU / G_max; where Grad_LCU_norm is an integer in [0, ZZ] (the square brackets denote inclusive bounds) and G_max is the maximum Grad_LCU over the LCUs of the previous frame. In each concrete application scenario, ZZ is a fixed value. To make the normalization convenient for hardware and avoid introducing floating-point arithmetic as far as possible, integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum. To preserve computational precision without unduly increasing the amount of computation, ZZ preferably takes the value 127, 255, 511, or 1023, which eases hardware implementation.
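A minimal sketch of the step S13 normalization (illustrative only). Because G_max comes from the previous frame, the current Grad_LCU may exceed it; clamping the result to ZZ is an assumption of this sketch, not stated in the patent text:

```python
def texture_factor(grad_lcu_val, g_max_prev, zz=255):
    """Step S13: K_T = ZZ * Grad_LCU / G_max as an integer in [0, ZZ].
    G_max is the previous frame's maximum, so values above it are
    clamped to ZZ (sketch assumption). Integer division keeps the
    arithmetic hardware-friendly."""
    if g_max_prev <= 0:
        return 0  # degenerate guard, e.g. an all-flat previous frame
    return min(zz, (zz * grad_lcu_val) // g_max_prev)
```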
Hereafter, the texture perception factor of the LCU is abbreviated K_T. K_T takes values in [0, ZZ]. A larger K_T means richer texture inside the LCU and more attention from human subjective vision; a smaller K_T means flatter texture inside the LCU and less attention from human subjective vision.
Step S14: use the mean pixel gradient of the current frame to compute the gradient decision threshold Grad_Thr for texture-rich pixels of the next frame, using either of the formulas Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = α × Grad_avg. Here Grad_offset is a gradient threshold adjustment offset and α is a gradient threshold adjustment multiplier; both values may be tuned according to the user's sensitivity to textured image regions. Grad_avg is the mean pixel gradient of the current frame, computed as Grad_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Grad_x,y) / (W × H), where W is the horizontal width and H the vertical height of the current image frame, both in pixels.
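The step S14 threshold update can be sketched as follows (illustrative only; the function accepts either tuning parameter, mirroring the two alternative formulas):

```python
def next_grad_thr(grad, grad_offset=None, alpha=None):
    """Step S14: Grad_avg over the whole frame, then either
    Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = alpha * Grad_avg."""
    h, w = len(grad), len(grad[0])
    grad_avg = sum(sum(row) for row in grad) / (w * h)
    if grad_offset is not None:
        return grad_avg + grad_offset
    return alpha * grad_avg
```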
In computing the texture perception factor K_T of the LCU, the present invention makes the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the mean pixel gradient accumulated over the previous frame is combined with user adjustment parameters to form the gradient decision threshold for texture-rich pixels of the current frame, achieving adaptive determination of whether each pixel inside an LCU of the current frame is a texture-rich pixel. (2) Likewise exploiting that correlation, the maximum gradient magnitude of the previous frame's texture-rich LCU regions serves as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions, achieving adaptive normalization of those gradient magnitudes; this makes the distribution of gradient magnitudes across the LCUs of the current frame, and hence the distribution of K_T values, more reasonable.
Referring to Fig. 4, computing the motion perception factor of the LCU in step S20 specifically comprises the following steps.
Step S21: compute the frame difference magnitude Diff_x,y between every pixel inside the LCU and the corresponding pixel of the co-located LCU (the LCU at the same position) in the previous frame, and use the motion-rich-pixel frame difference decision threshold derived from the previous frame's frame difference information to decide whether each pixel inside the LCU is a motion-rich pixel. The frame difference magnitude is the absolute value of the difference between the luminance values of two corresponding pixels: in the frame difference (i.e., inter-frame differencing) method, the luminance values of corresponding pixels are subtracted and the absolute value of the difference is taken as the result used in subsequent computation. If Diff_x,y > Diff_Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels derived from the previous frame's frame difference information; see step S25 below. The region inside the LCU formed by the motion-rich pixels is the motion-rich region of the LCU.
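As a concrete illustration of step S21 (not part of the patent text), the sketch below computes per-pixel absolute luminance differences between co-located pixels and applies the threshold test:

```python
def frame_diffs(cur, prev):
    """Step S21: Diff_x,y = |luma(cur) - luma(prev)| for every pair of
    co-located pixels; frames are given as 2D lists of luminance values."""
    return [[abs(c - p) for c, p in zip(cr, pr)]
            for cr, pr in zip(cur, prev)]

def is_motion_rich(diff_xy, diff_thr):
    # Step S21 decision: a pixel is motion-rich iff Diff_x,y > Diff_Thr.
    return diff_xy > diff_thr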
Step S22: compute the frame difference magnitude Diff_LCU of the motion-rich region inside the LCU, using the formula Diff_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Diff_x,y, summed only over pixels satisfying Diff_x,y > Diff_Thr. Here M is the horizontal width of the LCU and N is its vertical height, both in pixels. Diff_LCU is the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU and reflects the frame difference magnitude of the LCU's motion-rich region. While computing Diff_LCU for every LCU of the current frame, record the maximum value D_max for use when computing the normalized product of Diff_LCU and Area_LCU for the next frame's LCUs.
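Step S22 can be sketched as follows (illustrative only; an LCU is an M×N block of precomputed frame difference magnitudes, and the helper names are hypothetical):

```python
def diff_lcu(diff_block, diff_thr):
    """Step S22: sum of frame difference magnitudes of all motion-rich
    pixels (those with Diff_x,y > Diff_Thr) inside one LCU."""
    return sum(d for row in diff_block for d in row if d > diff_thr)

def frame_diff_lcus(diff_blocks, diff_thr):
    """Compute Diff_LCU for every LCU of the frame and record the
    maximum D_max, used to normalize the next frame's products."""
    vals = [diff_lcu(b, diff_thr) for b in diff_blocks]
    return vals, (max(vals) if vals else 0)
```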
Step S23: compute the area proportion Area_LCU of the motion-rich region inside the LCU, using the formula Area_LCU = ZZ × (Σ_{x=0..M-1} Σ_{y=0..N-1} Mov_x,y) / (M × N); where Area_LCU takes values in [0, ZZ], corresponding to an area proportion of 0% to 100%. The meaning and preferred values of ZZ are as before and are not repeated here. M is the horizontal width of the LCU and N is its vertical height, both in pixels. Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise. The logical meaning of the formula is to compute the ratio of the total number of motion-rich pixels inside the LCU to the total number of pixels inside the LCU and to normalize that ratio: since Mov_x,y = 1 marks a motion-rich pixel, summing Mov_x,y yields the total number of motion-rich pixels inside the LCU.
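A minimal sketch of the step S23 area computation (illustrative only; the motion-rich test Diff_x,y > Diff_Thr is applied inline instead of storing a separate Mov_x,y map, which is an implementation choice of this sketch):

```python
def area_lcu(diff_block, diff_thr, zz=255):
    """Step S23: Area_LCU = ZZ * (count of motion-rich pixels) / (M * N),
    an integer in [0, ZZ] corresponding to an area proportion of 0%-100%."""
    m_n = len(diff_block) * len(diff_block[0])
    moving = sum(1 for row in diff_block for d in row if d > diff_thr)
    return zz * moving // m_n
```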
The order of steps S22 and S23 is not strictly constrained; either may come first, or they may be performed simultaneously.
Step S24: compute the normalized value DA_LCU_norm of the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU, using the maximum frame difference magnitude of the previous frame's motion-rich LCU regions as the normalization reference for that product in the current frame. The normalized value DA_LCU_norm is the motion perception factor of the LCU and is computed as DA_LCU_norm = Diff_LCU × Area_LCU / D_max; where DA_LCU_norm takes values in [0, ZZ]. The meaning and preferred values of ZZ are as before and are not repeated here. D_max is the maximum Diff_LCU over the LCUs of the previous frame.
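A minimal sketch of the step S24 normalization (illustrative only). Area_LCU is already scaled to [0, ZZ], so dividing the product Diff_LCU × Area_LCU by D_max yields a value in [0, ZZ]; because D_max comes from the previous frame, clamping to ZZ is an assumption of this sketch:

```python
def motion_factor(diff_lcu_val, area_lcu_val, d_max_prev, zz=255):
    """Step S24: K_M = Diff_LCU * Area_LCU / D_max, with Area_LCU already
    an integer in [0, ZZ]. D_max is the previous frame's maximum Diff_LCU,
    so values above ZZ are clamped (sketch assumption)."""
    if d_max_prev <= 0:
        return 0  # degenerate guard, e.g. a fully static previous frame
    return min(zz, diff_lcu_val * area_lcu_val // d_max_prev)
```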
Hereafter, the motion perception factor of the LCU is abbreviated K_M. K_M takes values in [0, ZZ]. A larger K_M means richer motion inside the LCU and more attention from human subjective vision; a smaller K_M means slighter motion inside the LCU and less attention from human subjective vision.
Step S25: Use the mean frame-difference magnitude of the current frame to compute the frame-difference decision threshold Diff Thr for motion-rich pixels in the next frame, using either of the following formulas: Diff Thr = Diff avg + Diff offset or Diff Thr = β × Diff avg . Here Diff offset is a motion-threshold adjustment offset and β is a motion-threshold adjustment multiplier; both can be tuned according to the user's sensitivity to motion areas of the image. Diff avg is the mean frame-difference magnitude of the current frame, computed as follows.
Figure PCTCN2022123742-appb-000039
Figure PCTCN2022123742-appb-000040
where W is the horizontal width of the image frame and H is its vertical height, both in pixels.
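The two alternative threshold updates can be sketched as below; the helper name and the use of a flat list for the per-pixel frame-difference magnitudes are illustrative choices, not from the patent.

```python
def next_motion_threshold(frame_diffs, width, height,
                          diff_offset=None, beta=None):
    """Compute Diff_Thr for the next frame from the current frame's mean
    frame-difference magnitude Diff_avg, using either the additive or the
    multiplicative form given above."""
    diff_avg = sum(frame_diffs) / (width * height)  # Diff_avg over the W*H pixels
    if diff_offset is not None:
        return diff_avg + diff_offset               # Diff_Thr = Diff_avg + Diff_offset
    return beta * diff_avg                          # Diff_Thr = beta * Diff_avg
```

A larger Diff offset or β makes the motion-rich classification stricter, matching the user-tunable sensitivity described above.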
In computing the motion perception factor K M of the LCU, the present invention makes the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the mean frame-difference magnitude gathered from the previous frame is combined with user adjustment parameters to serve as the frame-difference decision threshold for motion-rich pixels in the current frame, so that whether a pixel inside an LCU in the current frame belongs to the motion-rich pixels is determined adaptively. (2) Likewise exploiting this correlation, the maximum frame-difference magnitude of the LCU motion-rich regions gathered from the previous frame serves as the normalization reference for the product of the frame-difference magnitude and the area ratio of the motion-rich region inside each LCU in the current frame. This adaptive normalization makes the distribution of frame-difference magnitudes across the LCUs of the current frame, and hence the distribution of K M values, more reasonable. (3) Using the product of the frame-difference magnitude and the area ratio of the LCU's motion-rich region to compute the motion perception factor reflects more accurately the visual attention the human eye pays to moving areas.
In step S30, the texture perception factor K T and the motion perception factor K M of the LCU are combined, and the result is used as the bit allocation weight of the LCU. The weight is computed from K T and K M as ω LCU = μ T × K T + μ M × K M , where ω LCU is the bit allocation weight of the LCU, taking a value in [0, ZZ]. The meaning and preferred value of ZZ are the same as above and are not repeated here. μ T is the weight coefficient of the LCU texture perception factor and μ M is the weight coefficient of the LCU motion perception factor, satisfying μ T + μ M = 1, 0 < μ T < 1, and 0 < μ M < 1.
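The weighted combination can be sketched as follows; the default μ T = 0.5 is an assumption for illustration, since the patent only requires μ T + μ M = 1 with both weights in (0, 1).

```python
def lcu_bit_weight(k_t, k_m, mu_t=0.5):
    """omega_LCU = mu_T * K_T + mu_M * K_M with mu_M = 1 - mu_T.
    Because K_T and K_M both lie in [0, ZZ] and the weights sum to 1,
    the result also lies in [0, ZZ]."""
    assert 0.0 < mu_t < 1.0        # constraint stated in the text
    mu_m = 1.0 - mu_t
    return mu_t * k_t + mu_m * k_m
```

Raising μ T biases bit allocation toward texture-rich LCUs; raising μ M biases it toward motion-rich LCUs.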
Because the content of consecutive video frames is correlated, the sum of the bit allocation weights of all LCUs in a frame is also correlated between adjacent frames, so the sum over all LCUs of the previous frame can be used to predict the sum for the current frame. Therefore, in step S30 of the present invention, after the bit allocation weight of an LCU in the current frame is computed, the sum of LCU bit allocation weights gathered from the previous frame is used in place of the sum for the current frame, and the proportion of the LCU's bit allocation weight within the whole image frame
Figure PCTCN2022123742-appb-000041
is computed in real time, from which the target number of coding bits of the LCU is derived; this is an innovation of the present invention.
Figure PCTCN2022123742-appb-000042
is computed as follows.
Figure PCTCN2022123742-appb-000043
In this way, the target coding bits of each LCU in the current frame are computed in real time, without preprocessing all LCUs of the whole frame before encoding, so no frame-level coding delay is introduced. Moreover, since the bit allocation weight is computed from the LCU of the current frame itself, the bit allocation matches the subjective visual perception of the human eye on the current frame. In the rare case that the content of consecutive frames differs greatly, the frame-level and GOP-level bit allocation of the HEVC rate control algorithm compensates.
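The real-time per-LCU allocation can be sketched as follows; treating the frame's target bits as a single given number and passing the previous frame's weight sum as a plain argument are simplifications of the scheme described above.

```python
def lcu_target_bits(omega_lcu, omega_sum_prev_frame, frame_target_bits):
    """Allocate target bits to one LCU using the previous frame's sum of
    bit allocation weights in place of the (not yet known) current-frame
    sum, so allocation proceeds LCU by LCU with no look-ahead pass."""
    ratio = omega_lcu / omega_sum_prev_frame   # weight share within the frame
    return frame_target_bits * ratio
```

Each LCU can thus be assigned its target bits the moment its own ω LCU is known, which is why no frame-level preprocessing pass or delay is needed.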
The present invention proposes an HEVC rate control method based on subjective human visual perception that is suitable for hardware implementation, with the following beneficial effects. (1) The proposed rate control method makes the bit allocation of LCUs within an image frame better match the characteristics of human subjective perception, improving subjective visual quality. (2) The computation of the visual perception factors fully exploits the correlation of image content between consecutive video frames, so that the texture-rich and motion-rich regions of an LCU are determined adaptively and the value distributions of the texture and motion perception factors are more reasonable. (3) In computing the motion perception factor, the product of the frame-difference magnitude and the area ratio of the LCU's motion-rich region is used, which reflects more accurately the visual attention the human eye pays to moving areas. (4) In LCU bit allocation, the correlation of image content between consecutive video frames is fully exploited to compute in real time the proportion of each LCU's bit allocation weight within the whole frame, so that, without adding an image-frame preprocessing stage, the bit allocation matches the subjective visual perception of the human eye on the current frame. (5) The visual perception factors are computed with simple, low-complexity operations that occupy little bus bandwidth, making the method suitable for hardware implementation. (6) The LCU texture and motion perception factors are computed concurrently with LCU encoding; no preprocessing stage is needed to compute the visual perception factors of all LCUs before encoding the whole frame, so no frame-level delay or extra bus bandwidth is introduced, which again suits hardware implementation.
To verify the beneficial effects of the present invention, three YUV video streams from the HEVC standard test sequence Class E — Johnny, FourPeople, and KristenAndSara — were selected for testing. All three are typical video-conferencing scenes in which the subjectively attended region is the face. In the experiments, the encoder ran in constant-bit-rate control mode; the coding bit rate was set to 600 kbps for Johnny and KristenAndSara and 800 kbps for FourPeople, 120 frames were encoded, the GOP structure was IPPP, and each P frame referenced only the previous frame. The rate control algorithm adopted in HM, which computes LCU bit allocation weights from MAD values, served as the comparison baseline for the improved rate control algorithm based on subjective human visual perception proposed by the present invention; the PSNR (Peak Signal-to-Noise Ratio) of the face region in the encoded images of the three streams was compared. PSNR is measured in decibels (dB); a higher PSNR means less distortion and better image quality. The experimental results are shown in Table 1 below.
Figure PCTCN2022123742-appb-000044
Figure PCTCN2022123742-appb-000045
Table 1: Test comparison between the present invention and the existing rate control method
In these three YUV video streams, compared with the background, the face region is both motion-rich and texture-rich; that is, relative to the background, its motion characteristics are pronounced and its texture characteristics fairly pronounced. As Table 1 shows, with the rate control algorithm proposed by the present invention, and with little change in the overall average bit rate and average PSNR of the encoded stream, the PSNR of the face region improves: by 0.38 dB for Johnny, 0.38 dB for KristenAndSara, and 0.42 dB for FourPeople, effectively enhancing the subjectively perceived visual quality.
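The face-region PSNR reported in Table 1 follows the standard definition for 8-bit samples; a minimal sketch over flat pixel lists, assuming two equal-length regions are compared:

```python
import math

def psnr(orig, recon, max_val=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float('inf')                    # identical regions: no distortion
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A 0.38–0.42 dB gain at the same bit rate therefore corresponds to a measurably lower mean squared error in the face region.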
The above are merely preferred embodiments of the present invention and are not intended to limit it. Various modifications and variations will occur to those skilled in the art; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (18)

  1. A visual-perception-based rate control method, characterized by comprising the following steps:
    Step S10: calculating the gradient magnitude of the texture-rich region inside the LCU, and using the result as the texture perception factor of the LCU;
    Step S20: calculating the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU, and using the result as the motion perception factor of the LCU;
    step S10 and step S20 being performed in either order or simultaneously;
    Step S30: calculating the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and then calculating the target number of coding bits of the LCU.
  2. The visual-perception-based rate control method according to claim 1, wherein the texture perception factor of the LCU characterizes the visual sensitivity of the human eye to the texture features of the LCU; a larger texture perception factor indicates richer texture inside the LCU and stronger subjective visual attention, and a smaller texture perception factor indicates flatter texture inside the LCU and weaker subjective visual attention.
  3. The visual-perception-based rate control method according to claim 1, wherein the motion perception factor of the LCU characterizes the visual sensitivity of the human eye to the motion features of the LCU; a larger motion perception factor indicates richer motion inside the LCU and stronger subjective visual attention, and a smaller motion perception factor indicates slighter motion inside the LCU and weaker subjective visual attention.
  4. The visual-perception-based rate control method according to claim 1, wherein calculating the texture perception factor of the LCU in step S10 specifically comprises the following steps:
    Step S11: calculating the gradient magnitude Grad x,y of each pixel inside the LCU, and determining whether each pixel inside the LCU is a texture-rich pixel using the gradient decision threshold for texture-rich pixels obtained from the gradient information of the previous frame; the region composed of texture-rich pixels inside the LCU being the texture-rich region of the LCU;
    Step S12: calculating the gradient magnitude Grad LCU of the texture-rich region inside the LCU; while computing Grad LCU for each LCU in the current frame, recording the maximum value G max ;
    Step S13: normalizing the gradient magnitude Grad LCU of the texture-rich region inside the LCU to obtain
    Figure PCTCN2022123742-appb-100001
    wherein the maximum gradient magnitude of the LCU texture-rich regions of the previous frame is used as the normalization reference for the gradient magnitudes of the LCU texture-rich regions in the current frame; the normalized value
    Figure PCTCN2022123742-appb-100002
    being the texture perception factor of the LCU;
    Step S14: using the mean pixel gradient of the current frame to calculate the gradient decision threshold Grad Thr for texture-rich pixels in the next frame.
  5. The visual-perception-based rate control method according to claim 4, wherein in step S11, if Grad x,y > Grad Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel; Grad Thr is the gradient decision threshold for texture-rich pixels obtained from the gradient information of the previous frame.
  6. The visual-perception-based rate control method according to claim 4, wherein in step S12,
    Figure PCTCN2022123742-appb-100003
    Grad x,y > Grad Thr ; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Grad LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
  7. The visual-perception-based rate control method according to claim 4, wherein in step S13,
    Figure PCTCN2022123742-appb-100004
    where G max is the maximum value of Grad LCU over the LCUs of the previous frame;
    Figure PCTCN2022123742-appb-100005
    takes an integer value in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, and ZZ represents the normalized maximum.
  8. The visual-perception-based rate control method according to claim 4, wherein in step S14, Grad Thr = Grad avg + Grad offset or Grad Thr = α × Grad avg ; where Grad offset is a gradient-threshold adjustment offset, α is a gradient-threshold adjustment multiplier, and Grad avg is the mean pixel gradient of the current frame;
    Figure PCTCN2022123742-appb-100006
    Figure PCTCN2022123742-appb-100007
    where W is the horizontal width of the current image frame and H is the vertical height of the current image frame.
  9. The visual-perception-based rate control method according to claim 1, wherein calculating the motion perception factor of the LCU in step S20 specifically comprises the following steps:
    Step S21: calculating the frame-difference magnitude Diff x,y between each pixel inside the LCU and the corresponding pixel of the co-located LCU in the previous frame, and determining whether each pixel inside the LCU is a motion-rich pixel using the frame-difference decision threshold for motion-rich pixels obtained from the frame-difference information of the previous frame; the region composed of motion-rich pixels inside the LCU being the motion-rich region of the LCU;
    Step S22: calculating the frame-difference magnitude Diff LCU of the motion-rich region inside the LCU; while computing Diff LCU for each LCU in the current frame, recording the maximum value D max ;
    Step S23: calculating the area ratio Area LCU of the motion-rich region inside the LCU;
    step S22 and step S23 being performed in either order or simultaneously;
    Step S24: calculating the normalized value of the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU
    Figure PCTCN2022123742-appb-100008
    wherein the maximum frame-difference magnitude of the LCU motion-rich regions of the previous frame is used as the normalization reference for the product of the frame-difference magnitude and the area ratio of the LCU motion-rich regions in the current frame; the normalized value
    Figure PCTCN2022123742-appb-100009
    being the motion perception factor of the LCU;
    Step S25: using the mean frame-difference magnitude of the current frame to calculate the frame-difference decision threshold Diff Thr for motion-rich pixels in the next frame.
  10. The visual-perception-based rate control method according to claim 9, wherein in step S21, if Diff x,y > Diff Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel; Diff Thr is the frame-difference decision threshold for motion-rich pixels obtained from the frame-difference information of the previous frame.
  11. The visual-perception-based rate control method according to claim 9, wherein in step S22,
    Figure PCTCN2022123742-appb-100010
    Diff x,y > Diff Thr ; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Diff LCU is the sum of the frame-difference magnitudes of all motion-rich pixels inside the LCU.
  12. The visual-perception-based rate control method according to claim 9, wherein in step S23,
    Figure PCTCN2022123742-appb-100011
    where Area LCU takes an integer value in [0, ZZ], corresponding to area ratios from 0% to 100%; M is the horizontal width of the LCU and N is the vertical height of the LCU;
    Figure PCTCN2022123742-appb-100012
    Figure PCTCN2022123742-appb-100013
    where Mov x,y is 1 if the pixel is a motion-rich pixel, and 0 otherwise.
  13. The visual-perception-based rate control method according to claim 9, wherein in step S24,
    Figure PCTCN2022123742-appb-100014
    where
    Figure PCTCN2022123742-appb-100015
    takes an integer value in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, and ZZ represents the normalized maximum; D max is the maximum value of Diff LCU over the LCUs of the previous frame.
  14. The visual-perception-based rate control method according to claim 9, wherein in step S25, Diff Thr = Diff avg + Diff offset or Diff Thr = β × Diff avg ; where Diff offset is a motion-threshold adjustment offset, β is a motion-threshold adjustment multiplier, and Diff avg is the mean frame-difference magnitude of the current frame;
    Figure PCTCN2022123742-appb-100016
    where W is the horizontal width of the image frame and H is the vertical height of the image frame.
  15. The visual-perception-based rate control method according to claim 1, wherein in step S30, the bit allocation weight of the LCU is calculated from the texture perception factor K T and the motion perception factor K M of the LCU as ω LCU = μ T × K T + μ M × K M ; where ω LCU is the bit allocation weight of the LCU, taking an integer value in [0, ZZ], with integers from 0 to ZZ representing the range 0 to 1; μ T is the weight coefficient of the LCU texture perception factor and μ M is the weight coefficient of the LCU motion perception factor, satisfying μ T + μ M = 1, 0 < μ T < 1, and 0 < μ M < 1.
  16. The visual-perception-based rate control method according to claim 1, wherein in step S30, after the bit allocation weight ω LCU of the LCU in the current frame is calculated, the sum of the LCU bit allocation weights gathered from the previous frame is used in place of the sum of the LCU bit allocation weights of the current frame, and the proportion of the LCU's bit allocation weight within the whole image frame
    Figure PCTCN2022123742-appb-100017
    is calculated in real time, from which the target number of coding bits of the LCU is calculated;
    Figure PCTCN2022123742-appb-100018
  17. The visual-perception-based rate control method according to any one of claims 7, 12, 13, and 15, wherein ZZ takes one of the values 127, 255, 511, or 1023.
  18. A visual-perception-based rate control device, characterized by comprising a texture perception factor calculation module, a motion perception factor calculation module, and an LCU bit allocation module;
    the texture perception factor calculation module being configured to calculate the gradient magnitude of the texture-rich region inside the LCU and use the result as the texture perception factor of the LCU;
    the motion perception factor calculation module being configured to calculate the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU and use the result as the motion perception factor of the LCU;
    the LCU bit allocation module calculating the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and then calculating the target number of coding bits of the LCU.
PCT/CN2022/123742 2022-02-23 2022-10-08 Visual perception-based rate control method and device WO2023159965A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210171159.4A CN114666585A (en) 2022-02-23 2022-02-23 Code rate control method and device based on visual perception
CN202210171159.4 2022-02-23

Publications (1)

Publication Number Publication Date
WO2023159965A1 true WO2023159965A1 (en) 2023-08-31

Family

ID=82027311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123742 WO2023159965A1 (en) 2022-02-23 2022-10-08 Visual perception-based rate control method and device

Country Status (2)

Country Link
CN (1) CN114666585A (en)
WO (1) WO2023159965A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666585A (en) * 2022-02-23 2022-06-24 翱捷科技股份有限公司 Code rate control method and device based on visual perception

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827267A (en) * 2010-04-20 2010-09-08 上海大学 Code rate control method based on video image segmentation technology
CN105681793A (en) * 2016-01-06 2016-06-15 四川大学 Very-low delay and high-performance video coding intra-frame code rate control method based on video content complexity adaption
CN112291564A (en) * 2020-11-20 2021-01-29 西安邮电大学 HEVC intra-frame code rate control method for optimizing and monitoring video perception quality
CN114666585A (en) * 2022-02-23 2022-06-24 翱捷科技股份有限公司 Code rate control method and device based on visual perception


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274820A (en) * 2023-11-20 2023-12-22 深圳市天成测绘技术有限公司 Map data acquisition method and system for mapping geographic information
CN117274820B (en) * 2023-11-20 2024-03-08 深圳市天成测绘技术有限公司 Map data acquisition method and system for mapping geographic information

Also Published As

Publication number Publication date
CN114666585A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
WO2023159965A1 (en) Visual perception-based rate control method and device
US10587874B2 (en) Real-time video denoising method and terminal during coding, and non-volatile computer readable storage medium
JP5318561B2 (en) Content classification for multimedia processing
TWI743919B (en) Video processing apparatus and processing method of video stream
US9025673B2 (en) Temporal quality metric for video coding
RU2377737C2 (en) Method and apparatus for encoder assisted frame rate up conversion (ea-fruc) for video compression
JP5399578B2 (en) Image processing apparatus, moving image processing apparatus, video processing apparatus, image processing method, video processing method, television receiver, program, and recording medium
CN113766226A (en) Image encoding method, apparatus, device and storage medium
US8737485B2 (en) Video coding mode selection system
CN106358040B (en) Code rate control bit distribution method based on significance
CN108737825A (en) Method for coding video data, device, computer equipment and storage medium
CN105072345A (en) Video encoding method and device
CN108810530A (en) A kind of AVC bit rate control methods based on human visual system
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
JP3800435B2 (en) Video signal processing device
CN114339241A (en) Video code rate control method
JP2001076166A (en) Encoding method of animation dynamic image
CN106331705B (en) A kind of new HEVC code rate GOP grades of Bit distribution methods of control
KR100316764B1 (en) Method and system for coding images using human visual sensitivity characteristic
CN111246218A (en) JND model-based CU partition prediction and mode decision texture coding method
KR20040062733A (en) Bit rate control system based on object
CN114630120B (en) Video compression method and circuit system based on self-adaptive compression rate
Xiang et al. Perceptual ctu level bit allocation for avs2
JP2003009156A (en) Moving picture coding apparatus, method therefor, storing medium and moving picture decoding method
EP1921866A2 (en) Content classification for multimedia processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928228

Country of ref document: EP

Kind code of ref document: A1