WO2023159965A1 - Visual perception-based rate control method and device - Google Patents


Publication number
WO2023159965A1
Authority
WO
WIPO (PCT)
Prior art keywords
lcu
motion
texture
rich
frame
Prior art date
Application number
PCT/CN2022/123742
Other languages
French (fr)
Chinese (zh)
Inventor
刘鹏飞
温安君
刘国正
Original Assignee
翱捷科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 翱捷科技股份有限公司
Publication of WO2023159965A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/198 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/004 Predictors, e.g. intraframe, interframe coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/557 Motion estimation characterised by stopping computation or iteration based on certain criteria, e.g. error magnitude being too large or early exit

Definitions

  • The present invention relates to video encoding technology, and in particular to a rate control method and device that is based on visual perception and suitable for hardware implementation.
  • Video coding is a technology that compresses redundant components in video images and uses as little data as possible to represent video information.
  • HEVC (High Efficiency Video Coding) is also known as H.265.
  • AVC (Advanced Video Coding) is also known as H.264.
  • Compared with AVC, HEVC can reduce the coding bit rate by about 50%, roughly doubling video compression performance. Because video encoding algorithms are computationally intensive, it has become common industry practice to accelerate the video encoding process in hardware using an application-specific integrated circuit (ASIC) in order to improve encoding speed.
  • Video coding technology uses image block as the basic coding unit.
  • In HEVC, the basic coding unit is the CU (Coding Unit).
  • A CU may be an image block of 64×64, 32×32, 16×16, or 8×8 pixels.
  • The 64×64-pixel image block is also called the LCU (Largest Coding Unit).
  • the channel bandwidth capacity used to transmit compressed video is limited. If the encoding bit rate of the compressed video is too high and exceeds the capacity of the channel bandwidth, it will cause video transmission congestion or even packet loss. If the encoding bit rate of the compressed video is too low, the channel bandwidth will not be fully utilized, and higher video quality cannot be obtained. Therefore, it is necessary to use rate control (Rate Control) technology to control the output bit rate of the video encoder to match the channel bandwidth capacity.
  • The purpose of rate control technology is to adjust the encoding parameters of the video encoder so that its output bit rate equals the preset target bit rate, while reducing encoding distortion as much as possible to improve video encoding quality.
  • In HEVC, the rate control algorithm in common use is based on the JCTVC-K0103 proposal.
  • The JCTVC-K0103 proposal establishes a mathematical model of the relationship between the coding bit rate R and the Lagrangian multiplier λ (the R-λ model), and accomplishes rate control through two stages: target bit allocation and target bit control.
  • Target bit allocation is carried out at three levels: the GOP level (Group Of Pictures, i.e., a set of temporally continuous image frames), the image frame level, and the basic coding unit level.
  • the LCU is generally selected as the basic unit of the target bit allocation. Therefore, the target bit allocation at the basic coding unit level is usually also referred to as the target bit allocation at the LCU level.
  • After the target number of coding bits for the current video frame is determined, the next step is LCU-level target bit allocation: determine the bit allocation weight of each LCU in the current frame, and allocate a target number of coding bits to each LCU according to its weight.
  • The LCU-level bit allocation is performed according to the following formula:

    T_LCU_curr = (T_Pic − Bit_H − Coded_Pic) × ω_LCU_curr / Σ_AllNotCodedLCUs ω_LCU

  • T_LCU_curr is the target number of coding bits allocated to the LCU currently to be encoded;
  • T_Pic is the target number of coding bits allocated to the current video frame to be encoded (a video frame is also called an image frame);
  • Bit_H is the pre-estimated number of bits required for the header information of the video frame;
  • Coded_Pic is the actual number of coded bits already produced by the encoded LCUs in the current frame;
  • ω_LCU_curr is the bit allocation weight of the current LCU to be encoded;
  • ω_LCU denotes the bit allocation weight of an LCU in general, without referring to a specific one;
  • Σ_AllNotCodedLCUs ω_LCU is the sum of the bit allocation weights of all uncoded LCUs in the current video frame.
  • the core of the bit allocation at the LCU level is the bit allocation weight ⁇ LCU of the LCU.
  • The bit allocation weight ω_LCU of an LCU is calculated from the prediction error MAD value (Mean Absolute Difference) of the co-located LCU (i.e., the LCU at the same position) in the previous encoded frame.
  • The calculation formulas are as follows:

    ω_LCU = (MAD_LCU)²

    MAD_LCU = (1 / N_pixels) × Σ_AllPixelsInLCU |P_org − P_pred|

  • Here ω_LCU is the bit allocation weight of the LCU, and MAD_LCU is the prediction error MAD value of the co-located LCU in the previous encoded frame;
  • N_pixels is the number of pixels in the LCU;
  • Σ_AllPixelsInLCU denotes accumulation over all pixels in the LCU;
  • P_org is the luminance value of the original pixel;
  • P_pred is the luminance value of the predicted pixel.
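A minimal sketch of this MAD-based weighting, assuming the squared-MAD weight shown above (names are ours):

```python
def mad_weight(orig_luma, pred_luma):
    """Bit allocation weight of an LCU from the prediction-error MAD of the
    co-located LCU in the previous encoded frame: weight = MAD squared."""
    n = len(orig_luma)  # N_pixels
    # MAD: mean absolute difference between original and predicted luma.
    mad = sum(abs(o - p) for o, p in zip(orig_luma, pred_luma)) / n
    return mad * mad

# Flat, well-predicted blocks get small weights; poorly predicted ones large.
w = mad_weight([10, 20, 30, 40], [12, 18, 30, 44])
```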
  • When watching video images, the human eye is guided by subjective perception: it pays more attention to areas with complex textures and rich detail, and less attention to flat areas whose texture features are inconspicuous. Likewise, it pays more attention to areas with intense motion and rich change, and less attention to still areas. In other words, motion-rich regions and texture-rich regions of a video image are more likely to attract human visual attention. Based on these visual perception characteristics of the human eye, when the target encoding bit budget of an image frame is fixed, the video encoding process should allocate more target coding bits to the LCUs in the motion-rich and texture-rich areas of the frame.
  • Accordingly, the rate control algorithm of a video encoder needs to detect motion-rich and texture-rich areas and calculate the LCU-level bit allocation weights accordingly, so that more target coding bits are allocated to the LCUs located in the areas the human eye attends to, thereby improving the subjectively perceived visual quality.
  • Research on HEVC rate control based on the JCTVC-K0103 proposal shows that the LCU-level bit allocation weight is calculated from the prediction error MAD value of the co-located LCU in the previous encoded frame. This measures only the luminance difference between pixels, from a signal-processing perspective, and does not take into account the subjective visual perception characteristics of the human eye, so it cannot allocate more target coding bits to the regions the human eye attends to.
  • It is therefore necessary to improve the LCU-level target bit allocation method in HEVC rate control: select factors that represent human visual perception to calculate the LCU-level bit allocation weights, and allocate more target coding bits to the LCUs in the motion-rich and texture-rich areas of the image frame, so as to improve the visual quality subjectively perceived by the human eye.
  • The visual perception factor mentioned here must represent the subjective attention the human eye pays to the image content, and it must satisfy the following characteristics: (1) For areas the human eye attends to, such as motion-rich and texture-rich areas, the value of the visual perception factor is larger; the greater the motion amplitude and the texture complexity, the larger the value. (2) For areas the human eye does not attend to, such as areas with sparse motion or simple textures, the value of the visual perception factor is smaller; the smaller the motion amplitude and the simpler the texture, the smaller the value.
  • the first type of prior art solution needs to preprocess the entire frame of image before encoding the current frame.
  • The purpose of the preprocessing is to calculate the bit allocation weight of each LCU in the current frame according to the selected visual perception factor, accumulate the sum of the bit allocation weights of all LCUs in the frame, and then calculate the target number of coding bits of each LCU from the ratio of its weight to that sum.
  • This type of technical solution needs to add a preprocessing stage before image encoding to calculate the bit allocation weights of all LCUs in the entire frame image before calculating the target number of encoding bits for each LCU.
  • the preprocessing of the entire frame of image takes a lot of time, and the larger the resolution of the image frame, the longer the preprocessing time will be, which will introduce a large frame-level coding delay.
  • the preprocessing and encoding need to read the image content from the memory separately, which will consume a lot of additional bus bandwidth. Therefore, this type of technical solution is suitable for software encoder implementation, not for hardware encoder implementation.
  • the second type of prior art solution utilizes the correlation between video frames, and uses the bit allocation weight of each LCU in the previous frame as the bit allocation weight of the same LCU in the current frame.
  • When encoding the previous frame, this type of scheme calculates the bit allocation weight corresponding to each LCU from the visual perception factor during that LCU's encoding.
  • Thus, by the time the previous frame has been encoded, the bit allocation weights of all its LCUs have already been computed.
  • Each LCU in the current frame then adopts the bit allocation weight of the co-located LCU in the previous frame.
  • the technical problem to be solved by the present invention is to propose a HEVC code rate control method and device based on human subjective visual perception and suitable for hardware implementation.
  • Step S10 Calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • Step S20 Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • Steps S10 and S20 may be performed in either order, or simultaneously.
  • Step S30 Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
  • The texture perception factor of the LCU represents the visual sensitivity of the human eye to the texture features of the LCU. The larger its value, the richer the texture inside the LCU and the more attention human subjective vision pays to it; the smaller its value, the flatter the texture inside the LCU and the less attention human subjective vision pays to it.
  • The motion perception factor of the LCU represents the visual sensitivity of the human eye to the motion characteristics of the LCU. The larger its value, the richer the motion inside the LCU and the more attention human subjective vision pays to it; the smaller its value, the slighter the motion inside the LCU and the less attention human subjective vision pays to it.
  • the calculation of the texture perception factor of the LCU in the step S10 specifically includes the following steps.
  • Step S11 Calculate the gradient magnitude Grad x, y of each pixel in the LCU, and judge whether each pixel in the LCU is a texture-rich pixel by using the gradient determination threshold of the texture-rich pixel obtained from the gradient information of the previous frame; The area composed of texture-rich pixels is the texture-rich area inside the LCU.
  • Step S12 Calculate the gradient magnitude Grad LCU of the texture-rich region inside the LCU; when calculating the Grad LCU of each LCU in the current frame, record the maximum value G max .
  • Step S13: Normalize the gradient magnitude Grad_LCU of the texture-rich area inside the LCU to obtain the normalized value K_T, using the maximum gradient magnitude of the LCU-internal texture-rich areas of the previous frame as the normalization benchmark for the gradient magnitudes of the LCU-internal texture-rich areas in the current frame; the normalized value K_T is the texture perception factor of the LCU.
  • Step S14 Using the pixel gradient mean value of the current frame to calculate the gradient determination threshold Grad Thr of the texture-rich pixel in the next frame.
  • In step S11, if Grad_x,y > Grad_Thr is satisfied, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels, obtained from the gradient information of the previous frame.
  • In step S12, Grad_LCU is the sum of Grad_x,y over all pixels in the LCU that satisfy Grad_x,y > Grad_Thr, where M is the horizontal width of the LCU and N is its vertical height; Grad_LCU is thus the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
  • G max is the maximum value of the Grad LCU of each LCU in the previous frame;
  • The normalized value K_T is an integer in [0, ZZ]; an integer between 0 and ZZ is used to represent the normalized range of 0 to 1, and ZZ is the maximum value after normalization.
  • Grad_Thr is computed from Grad_avg together with the gradient threshold adjustment deviation Grad_offset and the gradient threshold adjustment multiplier.
  • Grad_avg is the mean pixel gradient of the current frame, averaged over all W × H pixels, where W is the horizontal width and H is the vertical height of the current image frame.
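Steps S11–S13 can be sketched as follows. This is our illustrative reading of the claims; the names and the exact rounding/clipping are assumptions, not the patent's reference implementation:

```python
def texture_factor(grad, grad_thr, g_max_prev, zz=255):
    """Texture perception factor K_T of one LCU (steps S11-S13 sketch).

    grad       -- 2-D list of per-pixel gradient magnitudes inside the LCU
    grad_thr   -- texture-rich decision threshold from the previous frame
    g_max_prev -- maximum Grad_LCU observed in the previous frame (G_max)
    zz         -- integer normalization ceiling (e.g. 127/255/511/1023)
    """
    # S11 + S12: accumulate gradients of texture-rich pixels only.
    grad_lcu = sum(g for row in grad for g in row if g > grad_thr)
    # S13: normalize against the previous frame's maximum, clip to [0, zz].
    k_t = min(zz, round(zz * grad_lcu / g_max_prev)) if g_max_prev else 0
    return grad_lcu, k_t
```

Per LCU, the encoder would also track the running maximum of `grad_lcu` to serve as `g_max_prev` for the next frame.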
  • the calculation of the motion perception factor of the LCU in the step S20 specifically includes the following steps.
  • Step S21: Calculate the frame difference amplitude Diff_x,y between each pixel in the LCU and the corresponding pixel of the co-located LCU in the previous frame, and use the frame difference decision threshold for motion-rich pixels, obtained from the frame difference information of the previous frame, to judge whether each pixel in the LCU is a motion-rich pixel. The area inside the LCU composed of motion-rich pixels is the motion-rich area inside the LCU.
  • Step S22 Calculate the frame difference amplitude value Diff LCU of the motion-rich area inside the LCU; when calculating the Diff LCU of each LCU in the current frame, record the maximum value D max .
  • Step S23 Calculate the area ratio Area LCU of the motion-rich area inside the LCU.
  • Steps S22 and S23 may be performed in either order, or simultaneously.
  • Step S24: Calculate the normalized value K_M of the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, using the maximum frame difference amplitude of the LCU-internal motion-rich areas of the previous frame as the normalization benchmark for this product in the current frame; the normalized value K_M is the motion perception factor of the LCU.
  • Step S25 Calculate the frame difference determination threshold Diff Thr of the motion-rich pixels in the next frame by using the frame difference amplitude mean value of the current frame.
  • In step S21, if Diff_x,y > Diff_Thr is satisfied, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels, obtained from the frame difference information of the previous frame.
  • Diff_LCU is the sum of Diff_x,y over all pixels in the LCU that satisfy Diff_x,y > Diff_Thr, where M is the horizontal width of the LCU and N is its vertical height; Diff_LCU is thus the sum of the frame differences of all motion-rich pixels inside the LCU.
  • The value of Area_LCU is an integer in [0, ZZ], corresponding to an area ratio of 0% to 100%. M is the horizontal width of the LCU and N is its vertical height. For each pixel, Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise, and Area_LCU is obtained from the proportion of pixels with Mov_x,y = 1 inside the LCU.
  • The normalized value K_M is an integer in [0, ZZ]; an integer between 0 and ZZ is used to represent the normalized range of 0 to 1, ZZ is the maximum value after normalization, and D_max is the maximum Diff_LCU among the LCUs of the previous frame.
  • Diff_Thr is computed from Diff_avg together with the motion threshold adjustment deviation Diff_offset and the motion threshold adjustment multiplier.
  • Diff_avg is the mean frame difference amplitude of the current frame, averaged over all W × H pixels, where W is the horizontal width and H is the vertical height of the image frame.
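Steps S21–S24 can be sketched in the same style. The product of frame-difference magnitude and area ratio is the document's idea; the exact normalization below is our assumption:

```python
def motion_factor(cur, prev, diff_thr, d_max_prev, zz=255):
    """Motion perception factor K_M of one LCU (steps S21-S24 sketch).

    cur, prev  -- 2-D luma of the LCU in the current / previous frame
    diff_thr   -- motion-rich decision threshold from the previous frame
    d_max_prev -- maximum Diff_LCU observed in the previous frame (D_max)
    """
    n = sum(len(row) for row in cur)  # pixel count M*N
    diffs = [abs(c - p) for rc, rp in zip(cur, prev) for c, p in zip(rc, rp)]
    rich = [d for d in diffs if d > diff_thr]  # S21: motion-rich pixels
    diff_lcu = sum(rich)                       # S22: frame-difference magnitude
    area = len(rich) / n                       # S23: area ratio in [0, 1]
    if d_max_prev == 0:
        return 0
    # S24: normalize the product against the previous frame's maximum.
    return min(zz, round(zz * (diff_lcu / d_max_prev) * area))
```

Weighting by `area` is what makes the factor robust to a few noisy pixels with large frame differences, as the description explains.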
  • In step S30, after the bit allocation weight ω_LCU of an LCU in the current frame is calculated, the sum of the LCU bit allocation weights of the previous frame is used in place of the (not yet fully computed) sum of the weights of the current frame. This makes it possible to calculate, in real time, the proportion of the current LCU's bit allocation weight within the whole image frame, and then the target number of coding bits of the LCU.
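The substitution of the previous frame's weight sum can be sketched as follows (a one-pass, hardware-friendly approximation; names are ours):

```python
def lcu_target_bits(frame_budget, w_lcu, prev_frame_weight_sum):
    """Target coding bits of one LCU (step S30 sketch).

    The current frame's total weight is not known until every LCU has been
    processed, so the previous frame's total is used as a stand-in, letting
    each LCU's share be computed in real time during encoding.
    """
    ratio = w_lcu / prev_frame_weight_sum  # LCU's share of the frame weight
    return frame_budget * ratio

# An LCU holding 5% of last frame's total weight gets 5% of this frame's budget.
bits = lcu_target_bits(frame_budget=2000, w_lcu=5.0, prev_frame_weight_sum=100.0)
```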
  • the value of ZZ is one of 127, 255, 511, or 1023.
  • the present application also proposes a code rate control device based on visual perception, which includes a texture perception factor calculation module, a motion perception factor calculation module and an LCU bit allocation module.
  • the texture perception factor calculation module is used to calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the motion perception factor calculation module is used to calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the LCU bit allocation module calculates the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculates the target number of coding bits of the LCU.
  • the technical effect achieved by the invention is to make the target bit allocation of the LCU in the image frame more in line with the characteristics of the subjective perception of the human eye, improve the subjective visual quality of the human eye, and be suitable for hardware implementation.
  • FIG. 1 is a schematic flow chart of a code rate control method proposed by the present invention.
  • FIG. 2 is a schematic structural diagram of a code rate control device proposed by the present invention.
  • FIG. 3 is a schematic flow chart of calculating the texture perception factor of the LCU in step S10.
  • FIG. 4 is a schematic flow chart of calculating the motion perception factor of the LCU in step S20.
  • 10 is a texture perception factor calculation module
  • 20 is a motion perception factor calculation module
  • 30 is an LCU bit allocation module.
  • the code rate control method proposed by the present invention includes the following steps.
  • Step S10 Calculate the gradient magnitude of the texture-rich region inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the texture perception factor represents the visual sensitivity of the human eye to the texture feature of the LCU.
  • Step S20 Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the motion perception factor represents the degree of visual sensitivity of the human eye to the motion characteristics of the LCU.
  • Using the product of the frame difference magnitude and the area ratio of the motion-rich area of the LCU to calculate the motion perception factor of the LCU is one of the innovations of the present invention.
  • the inventor found through experiments that if only the frame difference amplitude of the motion-rich area inside the LCU is used to represent the motion richness of the LCU, this method is sensitive to small-area motion and is easily affected by sensor noise, resulting in inaccurate judgment results.
  • The order of steps S10 and S20 is not strictly limited: either one can be performed first, or they can be performed simultaneously.
  • Step S30 Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
  • the bit allocation of the LCU in the image frame can be more in line with the characteristics of the subjective perception of the human eye, and the subjective visual quality of the human eye can be improved.
  • the code rate control device proposed by the present invention includes a texture perception factor calculation module 10 , a motion perception factor calculation module 20 and an LCU bit allocation module 30 , which generally correspond to the code rate control method shown in FIG. 1 .
  • the texture perception factor calculation module 10 is used to calculate the gradient magnitude of the texture-rich region inside the LCU, and use the calculation result as the texture perception factor of the LCU.
  • the motion perception factor calculation module 20 is used to calculate the product of the frame difference magnitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU.
  • the LCU bit allocation module 30 calculates the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculates the target number of coding bits of the LCU.
  • the calculation of the texture perception factor of the LCU in the step S10 specifically includes the following steps.
  • Step S11 Calculate the gradient magnitude Grad x,y of each pixel in the LCU, and determine whether each pixel in the LCU is a texture-rich pixel based on the gradient determination threshold of the texture-rich pixel obtained from the gradient information of the previous frame. If Grad x,y >Grad Thr is satisfied, it is determined that the pixel belongs to a texture-rich pixel; otherwise, it is determined that the pixel does not belong to a texture-rich pixel.
  • Grad Thr is the gradient determination threshold of texture-rich pixels obtained from the gradient information of the previous frame, see the subsequent step S14 for details.
  • the region composed of texture-rich pixels inside the LCU is the texture-rich region inside the LCU.
  • Step S12: Calculate the gradient magnitude Grad_LCU of the texture-rich region inside the LCU as the sum of Grad_x,y over all pixels in the LCU that satisfy Grad_x,y > Grad_Thr, where M is the horizontal width of the LCU and N is its vertical height, in pixels. Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU and reflects the gradient magnitude of the texture-rich region. When calculating the Grad_LCU of each LCU in the current frame, the maximum value G_max is recorded, for use when normalizing the Grad_LCU of the LCUs in the next frame.
  • Step S13: Normalize the gradient magnitude Grad_LCU of the texture-rich area inside the LCU to obtain the normalized value K_T. The maximum gradient magnitude of the texture-rich regions inside the LCUs of the previous frame is used as the normalization benchmark for the gradient magnitudes of the texture-rich regions inside the LCUs of the current frame.
  • The normalized value K_T is the texture perception factor of the LCU. K_T is the normalized value of Grad_LCU; its value is an integer in [0, ZZ], where the square brackets indicate that the endpoints are included.
  • G max is the maximum value of the Grad LCU of each LCU in the previous frame. In each specific application scenario, ZZ is a fixed value.
  • ZZ represents the maximum value after normalization.
  • the preferred value of ZZ is 127, 255, 511, or 1023, so as to facilitate hardware implementation.
  • K T is the texture perception factor of the LCU.
  • the value of K T is between [0, ZZ]. The larger the K T value, the richer the texture inside the LCU, and the more attention the human subjective vision pays; the smaller the K T value, the flatter the texture inside the LCU, and the less attention the human subjective vision pays.
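Steps S11 to S13 above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names and the |dx| + |dy| gradient operator are assumptions, since the source does not fix a particular gradient operator.

```python
def pixel_gradients(luma):
    """Per-pixel gradient magnitude Grad_x,y as |horizontal| + |vertical| difference."""
    h, w = len(luma), len(luma[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(luma[y][x] - luma[y][x - 1]) if x > 0 else 0
            dy = abs(luma[y][x] - luma[y - 1][x]) if y > 0 else 0
            grad[y][x] = dx + dy
    return grad

def texture_factor(luma, grad_thr, g_max, zz=255):
    """Texture perception factor K_T of one LCU.

    grad_thr is the texture-rich decision threshold and g_max the maximum
    Grad_LCU, both carried over from the previous frame's statistics.
    """
    grad = pixel_gradients(luma)
    # Steps S11/S12: Grad_LCU sums gradients over texture-rich pixels only.
    grad_lcu = sum(g for row in grad for g in row if g > grad_thr)
    # Step S13: normalize against the previous frame's maximum, clip to [0, ZZ].
    k_t = min(zz, grad_lcu * zz // max(g_max, 1))
    return grad_lcu, k_t
```

A flat LCU yields K T = 0 while a textured LCU yields a larger value, matching the intended behavior of the factor.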
  • Step S14 Use the mean pixel gradient of the current frame to calculate the gradient decision threshold Grad Thr for texture-rich pixels in the next frame; the calculation formula is either of the following: Grad Thr = Grad avg + Grad offset, or Grad Thr = Grad avg × Grad mult.
  • Grad offset is the gradient threshold adjustment deviation.
  • Grad mult is the gradient threshold adjustment multiplier. Both values can be adjusted according to the user's sensitivity to textured areas of the image.
  • Grad avg is the average pixel gradient of the current frame, i.e. the sum of Grad x,y over all pixels of the frame divided by W × H. Here W is the horizontal width of the current image frame and H is the vertical height of the current image frame, in pixels.
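The step S14 threshold update can be sketched as below. The multiplier's symbol is lost in the source text, so the name grad_mult is an assumption; both variants named in the text are shown.

```python
def next_grad_threshold(frame_grad_sum, width, height,
                        grad_offset=0.0, grad_mult=None):
    """Grad_Thr for the next frame, from the current frame's mean gradient."""
    grad_avg = frame_grad_sum / (width * height)  # Grad_avg over W x H pixels
    if grad_mult is not None:
        return grad_avg * grad_mult               # multiplier variant
    return grad_avg + grad_offset                 # additive-offset variant
```

Because the threshold is derived from the previous frame's statistics, the texture-rich decision for the current frame needs no extra pass over the current frame itself.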
  • In the above steps, the present invention has the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the combination of the previous frame's mean pixel gradient and user adjustment parameters is used as the gradient decision threshold for texture-rich pixels in the current frame, so that whether each pixel inside an LCU of the current frame is a texture-rich pixel is determined adaptively. (2) Exploiting the same inter-frame correlation, the maximum gradient magnitude of the texture-rich regions of the LCUs counted in the previous frame is used as the normalization benchmark for the gradient magnitudes of the texture-rich regions of the LCUs in the current frame, which makes the normalization adaptive and makes the distribution of the gradient magnitudes, and hence of the K T values, of the LCUs in the current frame more reasonable.
  • The calculation of the motion perception factor of the LCU in step S20 specifically includes the following steps.
  • Step S21 Calculate the frame difference magnitude Diff x,y between each pixel inside the LCU and the corresponding pixel of the co-located (same-position) LCU in the previous frame, and use the frame difference decision threshold for motion-rich pixels obtained from the previous frame's frame difference information to determine whether each pixel inside the LCU is a motion-rich pixel.
  • the frame difference amplitude refers to the absolute value of the difference between the luminance values of two corresponding pixel points.
  • The frame difference (that is, the inter-frame difference) is taken as an absolute value before being used in subsequent calculations.
  • Diff Thr is the frame difference determination threshold of motion-rich pixels obtained from the frame difference information of the previous frame, see the subsequent step S25 for details.
  • the region composed of motion-rich pixels inside the LCU is the motion-rich region inside the LCU.
  • Step S22 Calculate the frame difference magnitude Diff LCU of the motion-rich region inside the LCU: Diff LCU is the sum of Diff x,y over all pixels (x, y) of the M×N LCU that satisfy Diff x,y > Diff Thr. Here M is the horizontal width of the LCU and N is the vertical height of the LCU, in pixels. Diff LCU is thus the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU, reflecting the frame difference magnitude of the motion-rich region of the LCU. While calculating the Diff LCU of each LCU in the current frame, record the maximum value D max for use when calculating the normalized value of the product of Diff LCU and Area LCU for the LCUs in the next frame.
  • Step S23 Calculate the area ratio Area LCU of the motion-rich region inside the LCU as Area LCU = (Σ Mov x,y / (M × N)) × ZZ, where the sum runs over all pixels of the LCU and Mov x,y takes the value 1 if pixel (x, y) is a motion-rich pixel and 0 otherwise. The value of Area LCU lies in [0, ZZ], corresponding to an area share of 0% to 100%. The meaning and preferred values of ZZ are the same as before and are not repeated here. M is the horizontal width of the LCU and N is the vertical height of the LCU, in pixels.
  • The logical meaning of the Area LCU formula is to compute the ratio of the total number of motion-rich pixels in the LCU to the total number of pixels in the LCU, and to normalize that ratio.
  • Mov x,y = 1 indicates that the corresponding pixel is a motion-rich pixel, so summing Mov x,y gives the total number of motion-rich pixels inside the LCU.
  • The order of steps S22 and S23 is not strictly limited: either may be performed first, or they may be performed simultaneously.
  • Step S24 Calculate the normalized value of the product of the frame difference magnitude and the area ratio of the motion-rich region inside the LCU.
  • The maximum value of the frame difference magnitude of the motion-rich regions of the LCUs in the previous frame is used as the normalization benchmark for the product of the frame difference magnitude and the area ratio of the motion-rich region of each LCU in the current frame.
  • The normalized value is the motion perception factor of the LCU, calculated as K M = (Diff LCU × Area LCU) / D max, limited to [0, ZZ]. The meaning and preferred values of ZZ are the same as before and are not repeated here.
  • D max is the maximum value of the Diff LCU of each LCU in the previous frame.
  • K M is the motion perception factor of the LCU.
  • the value of K M is between [0, ZZ]. The larger the value of K M , the richer the movement inside the LCU, and the more attention the human subjective vision pays; the smaller the value of K M , the lighter the movement inside the LCU, and the less attention the human subjective vision pays.
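Step S24 can be sketched as below. The exact normalization formula is not reproduced in the source text; this reconstruction keeps K M in [0, ZZ], given that Area LCU is already in [0, ZZ] and D max is the previous frame's maximum Diff LCU.

```python
def motion_factor(diff_lcu, area_lcu, d_max, zz=255):
    """Motion perception factor K_M of one LCU."""
    k_m = diff_lcu * area_lcu // max(d_max, 1)  # (Diff/D_max) * (Area/ZZ) * ZZ
    return max(0, min(zz, k_m))                 # clip in case Diff_LCU > D_max
```

Weighting the frame difference by the area share makes a small but violently moving patch and a large gently moving patch both register, which is the stated motivation for using the product.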
  • Step S25 Use the mean frame difference magnitude of the current frame to calculate the frame difference decision threshold Diff Thr for motion-rich pixels in the next frame; the calculation formula is either of the following: Diff Thr = Diff avg + Diff offset, or Diff Thr = Diff avg × Diff mult.
  • Diff offset is the motion threshold adjustment deviation.
  • Diff mult is the motion threshold adjustment multiplier.
  • Both values can be adjusted according to the user's sensitivity to moving areas of the image.
  • Diff avg is the mean frame difference magnitude of the current frame.
  • It is calculated as the sum of Diff x,y over all pixels of the frame divided by W × H, where W is the horizontal width of the image frame and H is the vertical height of the image frame, in pixels.
  • In the above steps, the present invention has the following innovation: exploiting the correlation of image content between consecutive video frames, the mean frame difference magnitude counted in the previous frame, combined with user adjustment parameters, is used as the frame difference decision threshold for motion-rich pixels in the current frame, so that whether each pixel inside an LCU of the current frame is a motion-rich pixel is determined adaptively.
  • The texture perception factor K T of the LCU and the motion perception factor K M of the LCU are combined, and the result of the combination is used as the bit allocation weight of the LCU, for example as the weighted sum λ T × K T + λ M × K M.
  • λ T is the weight coefficient of the LCU texture perception factor, and λ M is the weight coefficient of the LCU motion perception factor.
  • For adjacent video image frames, the sum of the bit allocation weights of all LCUs in a frame is also correlated, so the sum of the bit allocation weights of all LCUs in the previous frame can be used to predict the sum of the bit allocation weights of all LCUs in the current frame.
  • The present invention therefore exploits the correlation of image content between consecutive video frames: the sum of the LCU bit allocation weights counted in the previous frame replaces the sum of the bit allocation weights of the LCUs in the current frame, and the proportion of each LCU's bit allocation weight within the whole image frame is calculated in real time.
  • From this proportion the target number of coding bits of the LCU is calculated, which is an innovation of the present invention.
  • The target number of coding bits of the LCU is then calculated as T LCU = (T Pic − Bit H − Coded Pic) × ω LCU / W prev, where W prev denotes the sum of the bit allocation weights of all LCUs counted in the previous frame.
  • the real-time calculation of the target coding bit allocation of each LCU in the current frame can be realized, and there is no need to preprocess all LCUs in the entire frame before coding, so no frame-level coding delay is introduced.
  • Since the bit allocation weight of each LCU is calculated from that LCU's content in the current frame, the bit allocation of the LCU is consistent with the subjective visual experience of the human eye on the current frame. In the few cases where the image content of consecutive frames differs greatly, the image-frame-level and GOP-level bit allocation of the HEVC rate control algorithm makes the adjustment.
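The step S30 allocation can be sketched as below, assuming a linear combination of K T and K M with weight coefficients and using the previous frame's total weight in place of the current frame's, as described above. All names (lambda_t, lambda_m, w_prev_sum) are assumptions.

```python
def lcu_target_bits(k_t, k_m, lambda_t, lambda_m,
                    t_pic, bit_h, coded_pic, w_prev_sum):
    """Target number of coding bits for the LCU about to be encoded."""
    w_lcu = lambda_t * k_t + lambda_m * k_m  # bit allocation weight of the LCU
    remaining = t_pic - bit_h - coded_pic    # frame bits still to be spent
    # No whole-frame preprocessing pass: w_prev_sum was accumulated while
    # encoding the previous frame.
    return remaining * w_lcu / w_prev_sum
```

Everything on the right-hand side is available the moment the LCU is reached, which is what allows the allocation to run in lockstep with encoding.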
  • the present invention proposes an HEVC code rate control method based on subjective visual perception of human eyes and suitable for hardware implementation, which has the following beneficial effects.
  • the code rate control method proposed by the present invention can make the bit allocation of the LCU in the image frame more in line with the characteristics of the subjective perception of the human eye, and improve the subjective visual quality of the human eye.
  • In the computation of the visual perception factors, the present invention makes full use of the correlation of image content between consecutive video frames, so that the texture-rich and motion-rich regions of the LCU are determined adaptively and the value distributions of the texture perception factor and the motion perception factor are more reasonable.
  • The present invention calculates the motion perception factor of the LCU using the product of the frame difference magnitude and the area ratio of the LCU's motion-rich region, which more accurately reflects the visual attention the human eye pays to moving regions.
  • The present invention makes full use of the correlation of image content between consecutive video frames to calculate, in real time, the proportion of each LCU's bit allocation weight within the whole image frame, so that the bit allocation of the LCU conforms to the subjective visual experience of the human eye on the current frame.
  • the calculation method of the visual perception factor adopted by the present invention is simple and convenient, has a small amount of computation, occupies less bus bandwidth, and is suitable for hardware implementation.
  • The calculation of the LCU texture perception factors and motion perception factors proceeds in parallel with the encoding of the LCUs; there is no need for a preprocessing stage that computes the visual perception factors of all LCUs before the whole frame is encoded, so no frame-level delay is introduced and no additional bus bandwidth is consumed, which suits hardware implementation.
  • the present invention selects three YUV video streams Johnny, FourPeople and KristenAndSara in the HEVC standard test sequence ClassE for testing. These three video streams are all typical video conferencing scenarios, and the subjective attention area of the human eye is the human face in the video stream.
  • The encoder adopts the constant bit rate control mode; the encoding bit rate of Johnny and KristenAndSara is set to 600 kbps and that of FourPeople to 800 kbps; 120 frames are encoded; the GOP structure is IPPP, with each P frame referencing only the previous frame.
  • The rate control algorithm used in the HM, which calculates the LCU bit allocation weights from MAD values, serves as the comparison baseline for the improved rate control algorithm based on human subjective visual perception proposed by the present invention, and the PSNR (Peak Signal-to-Noise Ratio) of the face region in the encoded images of the three YUV video streams is compared.
  • the unit of PSNR is decibel dB. The higher the PSNR, the smaller the image distortion and the better the image quality.
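The PSNR metric reported below can be computed as in this minimal sketch for 8-bit pixel planes; the function name is an assumption.

```python
import math

def psnr(ref, rec, peak=255):
    """Peak signal-to-noise ratio in dB between two same-sized pixel planes."""
    sse, n = 0, 0
    for ref_row, rec_row in zip(ref, rec):
        for r, x in zip(ref_row, rec_row):
            sse += (r - x) ** 2
            n += 1
    if sse == 0:
        return float("inf")            # identical planes: no distortion
    mse = sse / n                      # mean squared error
    return 10 * math.log10(peak * peak / mse)
```

Higher PSNR means smaller squared error between the original and the reconstructed image, hence less distortion.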
  • The experimental results are shown in Table 1 below.
  • Table 1 Test comparison of the present invention and the existing rate control method
  • The face region belongs to both the motion-rich region and the texture-rich region; that is, compared with the background, the face region has obvious motion characteristics and more obvious texture features. Table 1 shows that after applying the rate control algorithm proposed by the present invention, with little change in the overall average bit rate and average PSNR of the encoded streams, the PSNR of the face region improves: by 0.38 dB for Johnny, 0.38 dB for KristenAndSara, and 0.42 dB for FourPeople, effectively enhancing the visual quality of human subjective perception.


Abstract

Disclosed in the present invention is a visual perception-based rate control method, comprising the following steps: step S10, calculating the gradient magnitude of a texture-rich region in an LCU, and taking a calculation result as a texture perception factor of the LCU; step S20, calculating the product of the frame difference magnitude and an area proportion of a motion-rich region in the LCU, and taking a calculation result as a motion perception factor of the LCU, wherein Step S10 and step S20 are carried out sequentially or simultaneously; and step S30, calculating a bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and further calculating the number of target coding bits of the LCU. The technical effect achieved by the present invention is that the target bit allocation of the LCU in an image frame better conforms to the subjective perception features of human eyes, the subjective visual quality of human eyes is improved, and the method is suitable for being realized by adopting hardware.

Description

A visual perception-based rate control method and device
Technical field
The present invention relates to video coding technology, and in particular to a visual perception-based rate control method and device suitable for hardware implementation.
Background art
Video coding is a technology that compresses the redundant components of video images and represents the video information with as little data as possible. HEVC (High Efficiency Video Coding, also known as H.265) is the new-generation video coding standard. Compared with the previous-generation standard AVC (Advanced Video Coding, also known as H.264), HEVC can reduce the coding bit rate by about 50% while achieving the same video coding quality, roughly doubling the compression performance relative to AVC. Because video coding algorithms are computationally heavy, accelerating the encoding process in hardware with an application-specific integrated circuit (ASIC) has become common industry practice.
Video coding technology uses image blocks as the basic coding unit. In HEVC, the basic coding unit is the CU (Coding Unit). A CU can be an image block of 64×64, 32×32, 16×16, or 8×8 pixels. The 64×64-pixel image block is also called the LCU (Largest Coding Unit).
In practical applications, the bandwidth of the channel used to transmit compressed video is limited. If the coding bit rate of the compressed video is too high and exceeds the channel capacity, video transmission congestion or even packet loss results. If the coding bit rate is too low, the channel bandwidth is not fully utilized and higher video quality cannot be obtained. It is therefore necessary to use rate control technology to control the output bit rate of the video encoder so that it matches the channel bandwidth capacity.
The purpose of rate control is to adjust the encoding parameters of the video encoder so that its output bit rate equals the preset target bit rate, while reducing coding distortion as much as possible to improve video coding quality. The rate control algorithm currently adopted in the HEVC reference encoder HM (HEVC Test Model) is based on the JCTVC-K0103 proposal. JCTVC-K0103 establishes a mathematical model of the relationship between the coding bit rate R and the Lagrangian multiplier λ (the R-λ model), and accomplishes rate control through two stages: target bit allocation and target bit control.
In the JCTVC-K0103 proposal, target bit allocation is carried out at three levels: the GOP (Group Of Pictures, a set of temporally consecutive image frames) level, the image frame level, and the basic-coding-unit level. To reduce computational complexity, the LCU is generally chosen as the basic unit of target bit allocation at the basic-coding-unit level, so basic-coding-unit-level target bit allocation is usually called LCU-level target bit allocation.
After the target bit allocation at the GOP level and the image frame level, the target number of coding bits of the current video frame to be encoded is determined. The next step is LCU-level target bit allocation, which decides the bit allocation weight of each LCU in the current video frame and assigns each LCU a target number of coding bits according to its weight. In the JCTVC-K0103 proposal, LCU-level bit allocation is performed according to the following formula:
T LCU_curr = (T Pic − Bit H − Coded Pic) × ω LCU_curr / Σ {AllNotCodedLCUs} ω LCU
Among them, T LCU_curr is the target number of coding bits allocated to the LCU currently to be encoded; T Pic is the target number of coding bits allocated to the current video frame to be encoded (a video frame is also called an image frame); Bit H is the number of bits estimated in advance for the video frame header information; Coded Pic is the actual number of coded bits of the already-encoded LCUs in the current video frame; ω LCU_curr is the bit allocation weight of the LCU currently to be encoded; ω LCU denotes the bit allocation weight of an unspecified LCU; and Σ {AllNotCodedLCUs} ω LCU is the sum of the bit allocation weights of all not-yet-encoded LCUs in the current video frame to be encoded.
It can be found from the above formula that the core of LCU-level bit allocation is the bit allocation weight ω LCU of the LCU. After calculating the bit allocation weights of all LCUs in the current video frame to be encoded, the sum of the bit allocation weights of all unencoded LCUs in the current frame can be obtained by accumulating the bit allocation weights of each unencoded LCU. Then, from the proportion of each LCU's bit allocation weight in that sum, the target number of coding bits of each LCU can be calculated, completing the target bit allocation at the LCU level.
In the JCTVC-K0103 proposal, the bit allocation weight ω LCU of an LCU is calculated from the prediction error MAD value (Mean Absolute Difference) of the co-located (same-position) LCU in the previous coded frame, according to the following formula:
ω LCU = (MAD LCU)²
Among them, ω LCU is the bit allocation weight of the LCU, and MAD LCU is the prediction error MAD value of the co-located LCU in the previous coded frame:
MAD LCU = (1 / N pixels) × Σ {AllPixelsInLCU} |P org − P pred|
Among them, N pixels is the number of pixels in the LCU, Σ {AllPixelsInLCU} accumulates over all pixels in the LCU, P org is the luma value of the original pixel, and P pred is the luma value of the predicted pixel.
When watching video images, the human eye, influenced by subjective perception, pays more attention to regions of the image with complex texture and rich detail, and less attention to flat regions without obvious texture features. Likewise, the human eye pays more attention to regions of the video image with intense motion and rich change, and less attention to still regions. In other words, motion-rich regions and texture-rich regions of a video image attract human visual attention more easily. Based on this visual perception characteristic of the human eye, it is necessary during video encoding, with the target coding bit rate of the image frame fixed, to allocate more target coding bits to the LCUs in the motion-rich and texture-rich regions of the image frame, so as to improve the visual quality of human subjective perception. To this end, the rate control algorithm of the video encoder must be able to detect motion-rich and texture-rich regions and calculate the LCU-level bit allocation weights accordingly, so that more target coding bits are allocated to the LCUs within the regions the human eye attends to, thereby improving the visual quality of human subjective perception.
Research on the HEVC rate control technique based on the JCTVC-K0103 proposal shows that the LCU-level bit allocation weight is calculated from the prediction error MAD value of the co-located LCU in the previous coded frame. This only measures the luma difference between pixels from a signal-processing point of view; it does not take the subjective visual perception characteristics of the human eye into account, and so cannot allocate more target coding bits to the regions the human eye attends to. It is therefore necessary to improve the LCU-level target bit allocation in HEVC rate control by selecting factors that characterize human visual perception to calculate the LCU-level bit allocation weights, allocating more target coding bits to the LCUs in the motion-rich and texture-rich regions of the image frame, so as to improve the visual quality of human subjective perception. The visual perception factor mentioned here needs to characterize the degree of subjective attention the human eye pays to the image content, and must satisfy the following: (1) for regions the human eye attends to, such as motion-rich and texture-rich regions, the visual perception factor takes a large value, and the larger the motion amplitude and texture complexity, the larger its value; (2) for regions the human eye does not attend to, such as motion-sparse regions and simple-texture regions, the visual perception factor takes a small value, and the smaller the motion amplitude and the simpler the texture, the smaller its value.
Some existing technical solutions already improve HEVC rate control to raise the visual quality of human subjective perception. Most of them likewise start by looking for factors that can characterize human visual perception, use those factors to calculate LCU-level bit allocation weights, and allocate more target coding bits to the LCUs within the regions the human eye attends to. However, these existing solutions generally suffer from the following problems.
The first class of prior solutions needs to preprocess the whole image frame before the current frame is encoded. The purpose of the preprocessing is to calculate, from the selected visual perception factor, the bit allocation weight of every LCU in the current frame, accumulate them to obtain the sum of the weights of all LCUs in the frame, and then compute each LCU's target number of coding bits from the proportion of its weight in that sum. Such solutions must add a preprocessing stage before encoding and can only compute each LCU's target bits after the weights of all LCUs in the whole frame have been computed. The preprocessing of a whole frame takes considerable time, and the larger the frame resolution, the longer it takes, introducing a large frame-level encoding delay. Moreover, since the preprocessing is performed separately from encoding, the image content must be read from memory twice, consuming a great deal of extra bus bandwidth. This class of solutions therefore suits software encoder implementations, not hardware encoders.
The second class of prior solutions exploits the correlation between video frames and uses the bit allocation weight of each LCU in the previous frame as the weight of the co-located LCU in the current frame. While encoding the previous frame, such schemes compute each LCU's bit allocation weight from the visual perception factor during the encoding of that LCU, so that when the previous frame finishes encoding, the weights of all its LCUs are already available; the current frame then reuses the co-located weights. Although these solutions need no extra preprocessing stage, have good real-time behavior, and consume no extra bus bandwidth, the weights of the current frame are computed entirely from the previous frame's content. When the change between frames is large, the LCU weights of the current frame no longer match the regions to which human subjective vision is sensitive, and the LCU bit allocation error is large.
Beyond the above problems, the visual perception factors chosen by many existing solutions involve complex algorithms with heavy computation, consuming substantial computing and bandwidth resources, and are unsuitable for hardware implementation.
发明内容Contents of the invention
本发明所要解决的技术问题是提出了一种基于人眼主观视觉感知的、适合于硬件实现的HEVC码率控制方法及装置。The technical problem to be solved by the present invention is to propose a HEVC code rate control method and device based on human subjective visual perception and suitable for hardware implementation.
为解决上述技术问题,本发明提出了一种基于视觉感知的码率控制方法,包括如下步骤:步骤S10:计算LCU内部纹理丰富区域的梯度幅值,并将计算结果作为LCU的纹理感知因子。步骤S20:计算LCU内部运动丰富区域的帧差幅值与面积占比的乘积,并将计算结果作为LCU的运动感知因子。所述步骤S10和步骤S20的顺序或者任一在前,或者同时进行。步骤S30:根据LCU的纹理感知因子和运动感知因子计算LCU的比特分配权重,进而计算出LCU的目标编码比特数。In order to solve the above technical problems, the present invention proposes a code rate control method based on visual perception, including the following steps: Step S10: Calculate the gradient magnitude of the texture-rich area inside the LCU, and use the calculation result as the texture perception factor of the LCU. Step S20: Calculate the product of the frame difference amplitude and the area ratio of the motion-rich area inside the LCU, and use the calculation result as the motion perception factor of the LCU. The sequence of step S10 and step S20 is either performed first, or performed simultaneously. Step S30: Calculate the bit allocation weight of the LCU according to the texture perception factor and the motion perception factor of the LCU, and then calculate the target number of coding bits of the LCU.
Further, the texture perception factor of the LCU characterizes how visually sensitive the human eye is to the texture features of the LCU. A larger texture perception factor means richer texture inside the LCU and more attention from human subjective vision; a smaller factor means flatter texture inside the LCU and less attention from human subjective vision.
Further, the motion perception factor of the LCU characterizes how visually sensitive the human eye is to the motion features of the LCU. A larger motion perception factor means richer motion inside the LCU and more attention from human subjective vision; a smaller factor means slighter motion inside the LCU and less attention from human subjective vision.
Further, computing the texture perception factor of the LCU in step S10 specifically comprises the following steps. Step S11: compute the gradient magnitude Grad_x,y of every pixel inside the LCU, and use the texture-rich-pixel gradient decision threshold derived from the previous frame's gradient information to decide whether each pixel inside the LCU is a texture-rich pixel; the region inside the LCU formed by the texture-rich pixels is the texture-rich region of the LCU. Step S12: compute the gradient magnitude Grad_LCU of the texture-rich region inside the LCU; while computing Grad_LCU for every LCU of the current frame, record the maximum value G_max. Step S13: normalize Grad_LCU to obtain the normalized value Grad_LCU_norm, using the maximum gradient magnitude of the previous frame's texture-rich LCU regions as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions; the normalized value Grad_LCU_norm is the texture perception factor of the LCU. Step S14: use the mean pixel gradient of the current frame to compute the gradient decision threshold Grad_Thr for texture-rich pixels of the next frame.
Further, in step S11, if Grad_x,y > Grad_Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels derived from the previous frame's gradient information.
Further, in step S12, Grad_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Grad_x,y, summed only over pixels satisfying Grad_x,y > Grad_Thr; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
Further, in step S13, Grad_LCU_norm = ZZ × Grad_LCU / G_max; where G_max is the maximum Grad_LCU over the LCUs of the previous frame. Grad_LCU_norm is an integer in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum.
Further, in step S14, Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = α × Grad_avg; where Grad_offset is a gradient threshold adjustment offset, α is a gradient threshold adjustment multiplier, and Grad_avg is the mean pixel gradient of the current frame: Grad_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Grad_x,y) / (W × H), where W is the horizontal width and H the vertical height of the current image frame.
Further, computing the motion perception factor of the LCU in step S20 specifically comprises the following steps. Step S21: compute the frame difference magnitude Diff_x,y between every pixel inside the LCU and the corresponding pixel of the co-located LCU in the previous frame, and use the motion-rich-pixel frame difference decision threshold derived from the previous frame's frame difference information to decide whether each pixel inside the LCU is a motion-rich pixel; the region inside the LCU formed by the motion-rich pixels is the motion-rich region of the LCU. Step S22: compute the frame difference magnitude Diff_LCU of the motion-rich region inside the LCU; while computing Diff_LCU for every LCU of the current frame, record the maximum value D_max. Step S23: compute the area proportion Area_LCU of the motion-rich region inside the LCU. Steps S22 and S23 may be performed in either order or simultaneously. Step S24: compute the normalized value DA_LCU_norm of the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU, using the maximum frame difference magnitude of the previous frame's motion-rich LCU regions as the normalization reference for that product in the current frame; the normalized value DA_LCU_norm is the motion perception factor of the LCU. Step S25: use the mean frame difference magnitude of the current frame to compute the frame difference decision threshold Diff_Thr for motion-rich pixels of the next frame.
Further, in step S21, if Diff_x,y > Diff_Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels derived from the previous frame's frame difference information.
Further, in step S22, Diff_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Diff_x,y, summed only over pixels satisfying Diff_x,y > Diff_Thr; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Diff_LCU is the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU.
Further, in step S23, Area_LCU = ZZ × (Σ_{x=0..M-1} Σ_{y=0..N-1} Mov_x,y) / (M × N); where Area_LCU is an integer in [0, ZZ], corresponding to an area proportion of 0% to 100%; M is the horizontal width of the LCU and N is the vertical height of the LCU; Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise.
Further, in step S24, DA_LCU_norm = Diff_LCU × Area_LCU / D_max; where DA_LCU_norm is an integer in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum; and D_max is the maximum Diff_LCU over the LCUs of the previous frame.
Further, in step S25, Diff_Thr = Diff_avg + Diff_offset or Diff_Thr = β × Diff_avg; where Diff_offset is a motion threshold adjustment offset, β is a motion threshold adjustment multiplier, and Diff_avg is the mean frame difference magnitude of the current frame: Diff_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Diff_x,y) / (W × H), where W is the horizontal width and H the vertical height of the image frame.
Further, in step S30, the bit allocation weight of the LCU is computed from the texture perception factor K_T and the motion perception factor K_M of the LCU as follows: ω_LCU = μ_T × K_T + μ_M × K_M; where ω_LCU, the bit allocation weight of the LCU, is an integer in [0, ZZ], with integers from 0 to ZZ representing 0 to 1; μ_T is the weight coefficient of the texture perception factor of the LCU and μ_M is the weight coefficient of the motion perception factor of the LCU, the two satisfying μ_T + μ_M = 1, 0 < μ_T < 1, and 0 < μ_M < 1.
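By way of illustration only (not part of the patent disclosure), the weight fusion ω_LCU = μ_T × K_T + μ_M × K_M can be realized in pure integer arithmetic, as a hardware-friendly implementation would; representing μ_T as a rational number mu_t_num/mu_denom is an assumption of this sketch:

```python
def lcu_weight(k_t, k_m, mu_t_num=3, mu_denom=4):
    """omega_LCU = mu_T*K_T + mu_M*K_M with mu_T + mu_M = 1, in integer
    arithmetic: mu_T = mu_t_num/mu_denom, mu_M = 1 - mu_T (hypothetical
    fixed-point choice). With K_T, K_M in [0, ZZ], the result stays in [0, ZZ]."""
    mu_m_num = mu_denom - mu_t_num
    return (mu_t_num * k_t + mu_m_num * k_m) // mu_denom
```

For example, with μ_T = 3/4, K_T = 100, and K_M = 200, the weight is (3×100 + 1×200) // 4 = 125.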
Further, in step S30, after the bit allocation weight ω_LCU of an LCU in the current frame has been computed, the sum of the LCU bit allocation weights accumulated over the previous frame is used in place of the sum of the bit allocation weights of the LCUs of the current frame, so that the proportion of the LCU's bit allocation weight within the whole image frame, ω_LCU / Σ_prev ω_LCU, can be computed in real time; the target number of coding bits of the LCU is then computed from that proportion and the target number of coding bits of the frame.
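A minimal sketch of this allocation step (illustrative only; the name frame_target_bits for the frame-level bit budget is an assumption, since the formula image is not reproduced in this text):

```python
def lcu_target_bits(frame_target_bits, omega_lcu, prev_weight_sum):
    """Allocate the LCU's share of the frame budget. The denominator is the
    PREVIOUS frame's sum of LCU weights, so the share can be computed in
    real time before all weights of the current frame are known."""
    if prev_weight_sum <= 0:
        return 0  # degenerate guard; not specified by the patent text
    return frame_target_bits * omega_lcu // prev_weight_sum
```

For example, a frame budget of 10000 bits, an LCU weight of 50, and a previous-frame weight sum of 1000 gives this LCU a target of 500 bits.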
Preferably, the value of ZZ is one of 127, 255, 511, or 1023.
The present application further proposes a visual-perception-based rate control device comprising a texture perception factor calculation module, a motion perception factor calculation module, and an LCU bit allocation module. The texture perception factor calculation module computes the gradient magnitude of the texture-rich region inside an LCU and takes the result as the texture perception factor of the LCU. The motion perception factor calculation module computes the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and takes the result as the motion perception factor of the LCU. The LCU bit allocation module computes the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight computes the target number of coding bits of the LCU.
The technical effect achieved by the present invention is that the target bit allocation of the LCUs within an image frame better matches the characteristics of human subjective perception, improving subjective visual quality, while remaining suitable for hardware implementation.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the rate control method proposed by the present invention.
Fig. 2 is a schematic structural diagram of the rate control device proposed by the present invention.
Fig. 3 is a schematic flowchart of computing the texture perception factor of an LCU in step S10.
Fig. 4 is a schematic flowchart of computing the motion perception factor of an LCU in step S20.
Reference numerals in the figures: 10 is the texture perception factor calculation module, 20 is the motion perception factor calculation module, and 30 is the LCU bit allocation module.
Detailed Description
Referring to Fig. 1, the rate control method proposed by the present invention comprises the following steps.
Step S10: compute the gradient magnitude of the texture-rich region inside an LCU and take the result as the texture perception factor of the LCU. The texture perception factor characterizes how visually sensitive the human eye is to the texture features of the LCU.
Step S20: compute the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and take the result as the motion perception factor of the LCU. The motion perception factor characterizes how visually sensitive the human eye is to the motion features of the LCU. Using the product of the frame difference magnitude and the area proportion of the motion-rich region of the LCU to compute the motion perception factor is an innovation of the present invention. The inventors found through experiments that characterizing the motion richness of an LCU by the frame difference magnitude of its motion-rich region alone is overly sensitive to small-area motion and is easily corrupted by sensor noise, yielding inaccurate decisions. Characterizing it by the area proportion of the motion-rich region alone is overly sensitive to subtle changes of illumination and shadow in local image regions caused by foreground motion, which does not match what the human eye attends to. Characterizing the motion richness of the LCU by the product of the two overcomes the drawbacks of either single measure and reflects the visual attention the human eye pays to moving regions more accurately.
The order of steps S10 and S20 is not strictly constrained; either may come first, or they may be performed simultaneously.
Step S30: compute the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight compute the target number of coding bits of the LCU.
With the rate control method proposed by the present invention, the bit allocation of the LCUs within an image frame better matches the characteristics of human subjective perception, improving subjective visual quality.
Referring to Fig. 2, the rate control device proposed by the present invention comprises a texture perception factor calculation module 10, a motion perception factor calculation module 20, and an LCU bit allocation module 30, corresponding as a whole to the rate control method shown in Fig. 1. The texture perception factor calculation module 10 computes the gradient magnitude of the texture-rich region inside an LCU and takes the result as the texture perception factor of the LCU. The motion perception factor calculation module 20 computes the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU and takes the result as the motion perception factor of the LCU. The LCU bit allocation module 30 computes the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and from that weight computes the target number of coding bits of the LCU.
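The three-module structure of Fig. 2 can be sketched as follows; this is purely illustrative (the patent describes hardware modules, and all class and parameter names here, including the integer clamping, are assumptions of the sketch):

```python
class TexturePerceptionModule:  # module 10: Grad_LCU -> K_T
    def factor(self, grad_lcu, g_max_prev, zz):
        # K_T = ZZ * Grad_LCU / G_max, clamped into [0, ZZ]
        return min(zz, zz * grad_lcu // g_max_prev) if g_max_prev else 0

class MotionPerceptionModule:  # module 20: Diff_LCU * Area_LCU -> K_M
    def factor(self, diff_lcu, area_lcu, d_max_prev, zz):
        # K_M = Diff_LCU * Area_LCU / D_max, with Area_LCU already in [0, ZZ]
        return min(zz, diff_lcu * area_lcu // d_max_prev) if d_max_prev else 0

class LcuBitAllocationModule:  # module 30: K_T, K_M -> target bits
    def target_bits(self, k_t, k_m, mu_t, frame_bits, prev_weight_sum, zz):
        omega = int(mu_t * k_t + (1 - mu_t) * k_m)  # omega_LCU in [0, ZZ]
        return frame_bits * omega // prev_weight_sum if prev_weight_sum else 0
```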
Referring to Fig. 3, computing the texture perception factor of the LCU in step S10 specifically comprises the following steps.
Step S11: compute the gradient magnitude Grad_x,y of every pixel inside the LCU, and use the texture-rich-pixel gradient decision threshold derived from the previous frame's gradient information to decide whether each pixel inside the LCU is a texture-rich pixel. If Grad_x,y > Grad_Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel. Grad_Thr is the gradient decision threshold for texture-rich pixels derived from the previous frame's gradient information; see step S14 below. The region inside the LCU formed by the texture-rich pixels is the texture-rich region of the LCU.
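As a concrete illustration of step S11 (not part of the patent text), the sketch below computes per-pixel gradient magnitudes and applies the threshold test. The patent does not fix a particular gradient operator, so a simple horizontal-plus-vertical absolute-difference operator is assumed here:

```python
def pixel_gradients(frame):
    """Per-pixel gradient magnitude Grad_x,y, assuming the operator
    |horizontal difference| + |vertical difference| (the patent does
    not mandate a specific gradient operator)."""
    h, w = len(frame), len(frame[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(frame[y][x] - frame[y][x - 1]) if x > 0 else 0
            gy = abs(frame[y][x] - frame[y - 1][x]) if y > 0 else 0
            grad[y][x] = gx + gy
    return grad

def is_texture_rich(grad_xy, grad_thr):
    # Step S11 decision: a pixel is texture-rich iff Grad_x,y > Grad_Thr.
    return grad_xy > grad_thr
```

A flat patch next to an edge then yields a large gradient only at the edge pixels, which alone pass the texture-rich test.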
Step S12: compute the gradient magnitude Grad_LCU of the texture-rich region inside the LCU, using the formula Grad_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Grad_x,y, summed only over pixels satisfying Grad_x,y > Grad_Thr. Here M is the horizontal width of the LCU and N is its vertical height, both in pixels. Grad_LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU and reflects the gradient magnitude of the LCU's texture-rich region. While computing Grad_LCU for every LCU of the current frame, record the maximum value G_max for use when computing the normalized Grad_LCU values of the next frame's LCUs.
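Step S12 can be sketched as follows (illustrative only; an LCU is represented as an M×N block of precomputed gradient magnitudes, and the helper names are hypothetical):

```python
def grad_lcu(grad_block, grad_thr):
    """Step S12: sum of gradient magnitudes of all texture-rich pixels
    (those with Grad_x,y > Grad_Thr) inside one LCU."""
    return sum(g for row in grad_block for g in row if g > grad_thr)

def frame_grad_lcus(grad_blocks, grad_thr):
    """Compute Grad_LCU for every LCU of the frame and record the
    maximum G_max, used to normalize the next frame's values."""
    vals = [grad_lcu(b, grad_thr) for b in grad_blocks]
    g_max = max(vals) if vals else 0
    return vals, g_max
```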
Step S13: normalize the gradient magnitude Grad_LCU of the texture-rich region inside the LCU to obtain Grad_LCU_norm, using the maximum gradient magnitude of the previous frame's texture-rich LCU regions as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions. The normalized value Grad_LCU_norm is the texture perception factor of the LCU and is computed as Grad_LCU_norm = ZZ × Grad_LCU / G_max; where Grad_LCU_norm is an integer in [0, ZZ] (the square brackets denote inclusive bounds) and G_max is the maximum Grad_LCU over the LCUs of the previous frame. In each concrete application scenario, ZZ is a fixed value. To make the normalization convenient for hardware and avoid introducing floating-point arithmetic as far as possible, integers from 0 to ZZ represent the normalized range 0 to 1, with ZZ representing the normalized maximum. To preserve computational precision without unduly increasing the amount of computation, ZZ preferably takes the value 127, 255, 511, or 1023, which eases hardware implementation.
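A minimal sketch of the step S13 normalization (illustrative only). Because G_max comes from the previous frame, the current Grad_LCU may exceed it; clamping the result to ZZ is an assumption of this sketch, not stated in the patent text:

```python
def texture_factor(grad_lcu_val, g_max_prev, zz=255):
    """Step S13: K_T = ZZ * Grad_LCU / G_max as an integer in [0, ZZ].
    G_max is the previous frame's maximum, so values above it are
    clamped to ZZ (sketch assumption). Integer division keeps the
    arithmetic hardware-friendly."""
    if g_max_prev <= 0:
        return 0  # degenerate guard, e.g. an all-flat previous frame
    return min(zz, (zz * grad_lcu_val) // g_max_prev)
```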
Hereafter, the texture perception factor of the LCU is abbreviated K_T. K_T takes values in [0, ZZ]. A larger K_T means richer texture inside the LCU and more attention from human subjective vision; a smaller K_T means flatter texture inside the LCU and less attention from human subjective vision.
Step S14: use the mean pixel gradient of the current frame to compute the gradient decision threshold Grad_Thr for texture-rich pixels of the next frame, using either of the formulas Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = α × Grad_avg. Here Grad_offset is a gradient threshold adjustment offset and α is a gradient threshold adjustment multiplier; both values may be tuned according to the user's sensitivity to textured image regions. Grad_avg is the mean pixel gradient of the current frame, computed as Grad_avg = (Σ_{x=0..W-1} Σ_{y=0..H-1} Grad_x,y) / (W × H), where W is the horizontal width and H the vertical height of the current image frame, both in pixels.
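The step S14 threshold update can be sketched as follows (illustrative only; the function accepts either tuning parameter, mirroring the two alternative formulas):

```python
def next_grad_thr(grad, grad_offset=None, alpha=None):
    """Step S14: Grad_avg over the whole frame, then either
    Grad_Thr = Grad_avg + Grad_offset or Grad_Thr = alpha * Grad_avg."""
    h, w = len(grad), len(grad[0])
    grad_avg = sum(sum(row) for row in grad) / (w * h)
    if grad_offset is not None:
        return grad_avg + grad_offset
    return alpha * grad_avg
```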
In computing the texture perception factor K_T of the LCU, the present invention makes the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the mean pixel gradient accumulated over the previous frame is combined with user adjustment parameters to form the gradient decision threshold for texture-rich pixels of the current frame, achieving adaptive determination of whether each pixel inside an LCU of the current frame is a texture-rich pixel. (2) Likewise exploiting that correlation, the maximum gradient magnitude of the previous frame's texture-rich LCU regions serves as the normalization reference for the gradient magnitudes of the current frame's texture-rich LCU regions, achieving adaptive normalization of those gradient magnitudes; this makes the distribution of gradient magnitudes across the LCUs of the current frame, and hence the distribution of K_T values, more reasonable.
Referring to Fig. 4, computing the motion perception factor of the LCU in step S20 specifically comprises the following steps.
Step S21: compute the frame difference magnitude Diff_x,y between every pixel inside the LCU and the corresponding pixel of the co-located LCU (the LCU at the same position) in the previous frame, and use the motion-rich-pixel frame difference decision threshold derived from the previous frame's frame difference information to decide whether each pixel inside the LCU is a motion-rich pixel. The frame difference magnitude is the absolute value of the difference between the luminance values of two corresponding pixels: in the frame difference (i.e., inter-frame differencing) method, the luminance values of corresponding pixels are subtracted and the absolute value of the difference is taken as the result used in subsequent computation. If Diff_x,y > Diff_Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel. Diff_Thr is the frame difference decision threshold for motion-rich pixels derived from the previous frame's frame difference information; see step S25 below. The region inside the LCU formed by the motion-rich pixels is the motion-rich region of the LCU.
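As a concrete illustration of step S21 (not part of the patent text), the sketch below computes per-pixel absolute luminance differences between co-located pixels and applies the threshold test:

```python
def frame_diffs(cur, prev):
    """Step S21: Diff_x,y = |luma(cur) - luma(prev)| for every pair of
    co-located pixels; frames are given as 2D lists of luminance values."""
    return [[abs(c - p) for c, p in zip(cr, pr)]
            for cr, pr in zip(cur, prev)]

def is_motion_rich(diff_xy, diff_thr):
    # Step S21 decision: a pixel is motion-rich iff Diff_x,y > Diff_Thr.
    return diff_xy > diff_thr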
Step S22: compute the frame difference magnitude Diff_LCU of the motion-rich region inside the LCU, using the formula Diff_LCU = Σ_{x=0..M-1} Σ_{y=0..N-1} Diff_x,y, summed only over pixels satisfying Diff_x,y > Diff_Thr. Here M is the horizontal width of the LCU and N is its vertical height, both in pixels. Diff_LCU is the sum of the frame difference magnitudes of all motion-rich pixels inside the LCU and reflects the frame difference magnitude of the LCU's motion-rich region. While computing Diff_LCU for every LCU of the current frame, record the maximum value D_max for use when computing the normalized product of Diff_LCU and Area_LCU for the next frame's LCUs.
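Step S22 can be sketched as follows (illustrative only; an LCU is an M×N block of precomputed frame difference magnitudes, and the helper names are hypothetical):

```python
def diff_lcu(diff_block, diff_thr):
    """Step S22: sum of frame difference magnitudes of all motion-rich
    pixels (those with Diff_x,y > Diff_Thr) inside one LCU."""
    return sum(d for row in diff_block for d in row if d > diff_thr)

def frame_diff_lcus(diff_blocks, diff_thr):
    """Compute Diff_LCU for every LCU of the frame and record the
    maximum D_max, used to normalize the next frame's products."""
    vals = [diff_lcu(b, diff_thr) for b in diff_blocks]
    return vals, (max(vals) if vals else 0)
```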
Step S23: compute the area proportion Area_LCU of the motion-rich region inside the LCU, using the formula Area_LCU = ZZ × (Σ_{x=0..M-1} Σ_{y=0..N-1} Mov_x,y) / (M × N); where Area_LCU takes values in [0, ZZ], corresponding to an area proportion of 0% to 100%. The meaning and preferred values of ZZ are as before and are not repeated here. M is the horizontal width of the LCU and N is its vertical height, both in pixels. Mov_x,y takes the value 1 if the pixel is a motion-rich pixel and 0 otherwise. The logical meaning of the formula is to compute the ratio of the total number of motion-rich pixels inside the LCU to the total number of pixels inside the LCU and to normalize that ratio: since Mov_x,y = 1 marks a motion-rich pixel, summing Mov_x,y yields the total number of motion-rich pixels inside the LCU.
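A minimal sketch of the step S23 area computation (illustrative only; the motion-rich test Diff_x,y > Diff_Thr is applied inline instead of storing a separate Mov_x,y map, which is an implementation choice of this sketch):

```python
def area_lcu(diff_block, diff_thr, zz=255):
    """Step S23: Area_LCU = ZZ * (count of motion-rich pixels) / (M * N),
    an integer in [0, ZZ] corresponding to an area proportion of 0%-100%."""
    m_n = len(diff_block) * len(diff_block[0])
    moving = sum(1 for row in diff_block for d in row if d > diff_thr)
    return zz * moving // m_n
```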
The order of steps S22 and S23 is not strictly constrained; either may come first, or they may be performed simultaneously.
Step S24: compute the normalized value DA_LCU_norm of the product of the frame difference magnitude and the area proportion of the motion-rich region inside the LCU, using the maximum frame difference magnitude of the previous frame's motion-rich LCU regions as the normalization reference for that product in the current frame. The normalized value DA_LCU_norm is the motion perception factor of the LCU and is computed as DA_LCU_norm = Diff_LCU × Area_LCU / D_max; where DA_LCU_norm takes values in [0, ZZ]. The meaning and preferred values of ZZ are as before and are not repeated here. D_max is the maximum Diff_LCU over the LCUs of the previous frame.
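A minimal sketch of the step S24 normalization (illustrative only). Area_LCU is already scaled to [0, ZZ], so dividing the product Diff_LCU × Area_LCU by D_max yields a value in [0, ZZ]; because D_max comes from the previous frame, clamping to ZZ is an assumption of this sketch:

```python
def motion_factor(diff_lcu_val, area_lcu_val, d_max_prev, zz=255):
    """Step S24: K_M = Diff_LCU * Area_LCU / D_max, with Area_LCU already
    an integer in [0, ZZ]. D_max is the previous frame's maximum Diff_LCU,
    so values above ZZ are clamped (sketch assumption)."""
    if d_max_prev <= 0:
        return 0  # degenerate guard, e.g. a fully static previous frame
    return min(zz, diff_lcu_val * area_lcu_val // d_max_prev)
```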
Hereafter, the motion perception factor of the LCU is abbreviated K_M. K_M takes values in [0, ZZ]. A larger K_M means richer motion inside the LCU and more attention from human subjective vision; a smaller K_M means slighter motion inside the LCU and less attention from human subjective vision.
Step S25: Use the mean frame-difference magnitude of the current frame to compute the frame-difference decision threshold Diff Thr for motion-rich pixels in the next frame, using either of the following formulas: Diff Thr = Diff avg + Diff offset or Diff Thr = β × Diff avg . Here Diff offset is a motion-threshold adjustment offset and β is a motion-threshold adjustment multiplier; both can be tuned according to the user's sensitivity to motion areas of the image. Diff avg is the mean frame-difference magnitude of the current frame, computed as follows.
Figure PCTCN2022123742-appb-000039
Figure PCTCN2022123742-appb-000040
where W is the horizontal width of the image frame and H is its vertical height, both in pixels.
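The two alternative threshold updates can be sketched as below; the helper name and the use of a flat list for the per-pixel frame-difference magnitudes are illustrative choices, not from the patent.

```python
def next_motion_threshold(frame_diffs, width, height,
                          diff_offset=None, beta=None):
    """Compute Diff_Thr for the next frame from the current frame's mean
    frame-difference magnitude Diff_avg, using either the additive or the
    multiplicative form given above."""
    diff_avg = sum(frame_diffs) / (width * height)  # Diff_avg over the W*H pixels
    if diff_offset is not None:
        return diff_avg + diff_offset               # Diff_Thr = Diff_avg + Diff_offset
    return beta * diff_avg                          # Diff_Thr = beta * Diff_avg
```

A larger Diff offset or β makes the motion-rich classification stricter, matching the user-tunable sensitivity described above.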
In computing the motion perception factor K M of the LCU, the present invention makes the following innovations. (1) Exploiting the correlation of image content between consecutive video frames, the mean frame-difference magnitude gathered from the previous frame is combined with user adjustment parameters to serve as the frame-difference decision threshold for motion-rich pixels in the current frame, so that whether a pixel inside an LCU in the current frame belongs to the motion-rich pixels is determined adaptively. (2) Likewise exploiting this correlation, the maximum frame-difference magnitude of the LCU motion-rich regions gathered from the previous frame serves as the normalization reference for the product of the frame-difference magnitude and the area ratio of the motion-rich region inside each LCU in the current frame. This adaptive normalization makes the distribution of frame-difference magnitudes across the LCUs of the current frame, and hence the distribution of K M values, more reasonable. (3) Using the product of the frame-difference magnitude and the area ratio of the LCU's motion-rich region to compute the motion perception factor reflects more accurately the visual attention the human eye pays to moving areas.
In step S30, the texture perception factor K T and the motion perception factor K M of the LCU are combined, and the result is used as the bit allocation weight of the LCU. The weight is computed from K T and K M as ω LCU = μ T × K T + μ M × K M , where ω LCU is the bit allocation weight of the LCU, taking a value in [0, ZZ]. The meaning and preferred value of ZZ are the same as above and are not repeated here. μ T is the weight coefficient of the LCU texture perception factor and μ M is the weight coefficient of the LCU motion perception factor, satisfying μ T + μ M = 1, 0 < μ T < 1, and 0 < μ M < 1.
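The weighted combination can be sketched as follows; the default μ T = 0.5 is an assumption for illustration, since the patent only requires μ T + μ M = 1 with both weights in (0, 1).

```python
def lcu_bit_weight(k_t, k_m, mu_t=0.5):
    """omega_LCU = mu_T * K_T + mu_M * K_M with mu_M = 1 - mu_T.
    Because K_T and K_M both lie in [0, ZZ] and the weights sum to 1,
    the result also lies in [0, ZZ]."""
    assert 0.0 < mu_t < 1.0        # constraint stated in the text
    mu_m = 1.0 - mu_t
    return mu_t * k_t + mu_m * k_m
```

Raising μ T biases bit allocation toward texture-rich LCUs; raising μ M biases it toward motion-rich LCUs.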
Because the content of consecutive video frames is correlated, the sum of the bit allocation weights of all LCUs in a frame is also correlated between adjacent frames, so the sum over all LCUs of the previous frame can be used to predict the sum for the current frame. Therefore, in step S30 of the present invention, after the bit allocation weight of an LCU in the current frame is computed, the sum of LCU bit allocation weights gathered from the previous frame is used in place of the sum for the current frame, and the proportion of the LCU's bit allocation weight within the whole image frame
Figure PCTCN2022123742-appb-000041
is computed in real time, from which the target number of coding bits of the LCU is derived; this is an innovation of the present invention.
Figure PCTCN2022123742-appb-000042
is computed as follows.
Figure PCTCN2022123742-appb-000043
In this way, the target coding bits of each LCU in the current frame are computed in real time, without preprocessing all LCUs of the whole frame before encoding, so no frame-level coding delay is introduced. Moreover, since the bit allocation weight is computed from the LCU of the current frame itself, the bit allocation matches the subjective visual perception of the human eye on the current frame. In the rare case that the content of consecutive frames differs greatly, the frame-level and GOP-level bit allocation of the HEVC rate control algorithm compensates.
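The real-time per-LCU allocation can be sketched as follows; treating the frame's target bits as a single given number and passing the previous frame's weight sum as a plain argument are simplifications of the scheme described above.

```python
def lcu_target_bits(omega_lcu, omega_sum_prev_frame, frame_target_bits):
    """Allocate target bits to one LCU using the previous frame's sum of
    bit allocation weights in place of the (not yet known) current-frame
    sum, so allocation proceeds LCU by LCU with no look-ahead pass."""
    ratio = omega_lcu / omega_sum_prev_frame   # weight share within the frame
    return frame_target_bits * ratio
```

Each LCU can thus be assigned its target bits the moment its own ω LCU is known, which is why no frame-level preprocessing pass or delay is needed.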
The present invention proposes an HEVC rate control method based on subjective human visual perception that is suitable for hardware implementation, with the following beneficial effects. (1) The proposed rate control method makes the bit allocation of LCUs within an image frame better match the characteristics of human subjective perception, improving subjective visual quality. (2) The computation of the visual perception factors fully exploits the correlation of image content between consecutive video frames, so that the texture-rich and motion-rich regions of an LCU are determined adaptively and the value distributions of the texture and motion perception factors are more reasonable. (3) In computing the motion perception factor, the product of the frame-difference magnitude and the area ratio of the LCU's motion-rich region is used, which reflects more accurately the visual attention the human eye pays to moving areas. (4) In LCU bit allocation, the correlation of image content between consecutive video frames is fully exploited to compute in real time the proportion of each LCU's bit allocation weight within the whole frame, so that, without adding an image-frame preprocessing stage, the bit allocation matches the subjective visual perception of the human eye on the current frame. (5) The visual perception factors are computed with simple, low-complexity operations that occupy little bus bandwidth, making the method suitable for hardware implementation. (6) The LCU texture and motion perception factors are computed concurrently with LCU encoding; no preprocessing stage is needed to compute the visual perception factors of all LCUs before encoding the whole frame, so no frame-level delay or extra bus bandwidth is introduced, which again suits hardware implementation.
To verify the beneficial effects of the present invention, three YUV video streams from the HEVC standard test sequence Class E — Johnny, FourPeople, and KristenAndSara — were selected for testing. All three are typical video-conferencing scenes in which the subjectively attended region is the face. In the experiments, the encoder ran in constant-bit-rate control mode; the coding bit rate was set to 600 kbps for Johnny and KristenAndSara and 800 kbps for FourPeople, 120 frames were encoded, the GOP structure was IPPP, and each P frame referenced only the previous frame. The rate control algorithm adopted in HM, which computes LCU bit allocation weights from MAD values, served as the comparison baseline for the improved rate control algorithm based on subjective human visual perception proposed by the present invention; the PSNR (Peak Signal-to-Noise Ratio) of the face region in the encoded images of the three streams was compared. PSNR is measured in decibels (dB); a higher PSNR means less distortion and better image quality. The experimental results are shown in Table 1 below.
Figure PCTCN2022123742-appb-000044
Figure PCTCN2022123742-appb-000045
Table 1: Test comparison between the present invention and the existing rate control method
In these three YUV video streams, compared with the background, the face region is both motion-rich and texture-rich; that is, relative to the background, its motion characteristics are pronounced and its texture characteristics fairly pronounced. As Table 1 shows, with the rate control algorithm proposed by the present invention, and with little change in the overall average bit rate and average PSNR of the encoded stream, the PSNR of the face region improves: by 0.38 dB for Johnny, 0.38 dB for KristenAndSara, and 0.42 dB for FourPeople, effectively enhancing the subjectively perceived visual quality.
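The face-region PSNR reported in Table 1 follows the standard definition for 8-bit samples; a minimal sketch over flat pixel lists, assuming two equal-length regions are compared:

```python
import math

def psnr(orig, recon, max_val=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float('inf')                    # identical regions: no distortion
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A 0.38–0.42 dB gain at the same bit rate therefore corresponds to a measurably lower mean squared error in the face region.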
The above are merely preferred embodiments of the present invention and are not intended to limit it. Various modifications and variations will occur to those skilled in the art; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (18)

  1. A visual-perception-based rate control method, characterized by comprising the following steps:
    Step S10: calculating the gradient magnitude of the texture-rich region inside the LCU, and using the result as the texture perception factor of the LCU;
    Step S20: calculating the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU, and using the result as the motion perception factor of the LCU;
    step S10 and step S20 being performed in either order or simultaneously;
    Step S30: calculating the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and then calculating the target number of coding bits of the LCU.
  2. The visual-perception-based rate control method according to claim 1, wherein the texture perception factor of the LCU characterizes the visual sensitivity of the human eye to the texture features of the LCU; a larger texture perception factor indicates richer texture inside the LCU and stronger subjective visual attention, and a smaller texture perception factor indicates flatter texture inside the LCU and weaker subjective visual attention.
  3. The visual-perception-based rate control method according to claim 1, wherein the motion perception factor of the LCU characterizes the visual sensitivity of the human eye to the motion features of the LCU; a larger motion perception factor indicates richer motion inside the LCU and stronger subjective visual attention, and a smaller motion perception factor indicates slighter motion inside the LCU and weaker subjective visual attention.
  4. The visual-perception-based rate control method according to claim 1, wherein calculating the texture perception factor of the LCU in step S10 specifically comprises the following steps:
    Step S11: calculating the gradient magnitude Grad x,y of each pixel inside the LCU, and determining whether each pixel inside the LCU is a texture-rich pixel using the gradient decision threshold for texture-rich pixels obtained from the gradient information of the previous frame; the region composed of texture-rich pixels inside the LCU being the texture-rich region of the LCU;
    Step S12: calculating the gradient magnitude Grad LCU of the texture-rich region inside the LCU; while computing Grad LCU for each LCU in the current frame, recording the maximum value G max ;
    Step S13: normalizing the gradient magnitude Grad LCU of the texture-rich region inside the LCU to obtain
    Figure PCTCN2022123742-appb-100001
    wherein the maximum gradient magnitude of the LCU texture-rich regions of the previous frame is used as the normalization reference for the gradient magnitudes of the LCU texture-rich regions in the current frame; the normalized value
    Figure PCTCN2022123742-appb-100002
    being the texture perception factor of the LCU;
    Step S14: using the mean pixel gradient of the current frame to calculate the gradient decision threshold Grad Thr for texture-rich pixels in the next frame.
  5. The visual-perception-based rate control method according to claim 4, wherein in step S11, if Grad x,y > Grad Thr holds, the pixel is determined to be a texture-rich pixel; otherwise it is determined not to be a texture-rich pixel; Grad Thr is the gradient decision threshold for texture-rich pixels obtained from the gradient information of the previous frame.
  6. The visual-perception-based rate control method according to claim 4, wherein in step S12,
    Figure PCTCN2022123742-appb-100003
    Grad x,y > Grad Thr ; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Grad LCU is the sum of the gradient magnitudes of all texture-rich pixels inside the LCU.
  7. The visual-perception-based rate control method according to claim 4, wherein in step S13,
    Figure PCTCN2022123742-appb-100004
    where G max is the maximum value of Grad LCU over the LCUs of the previous frame;
    Figure PCTCN2022123742-appb-100005
    takes an integer value in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, and ZZ represents the normalized maximum.
  8. The visual-perception-based rate control method according to claim 4, wherein in step S14, Grad Thr = Grad avg + Grad offset or Grad Thr = α × Grad avg ; where Grad offset is a gradient-threshold adjustment offset, α is a gradient-threshold adjustment multiplier, and Grad avg is the mean pixel gradient of the current frame;
    Figure PCTCN2022123742-appb-100006
    Figure PCTCN2022123742-appb-100007
    where W is the horizontal width of the current image frame and H is the vertical height of the current image frame.
  9. The visual-perception-based rate control method according to claim 1, wherein calculating the motion perception factor of the LCU in step S20 specifically comprises the following steps:
    Step S21: calculating the frame-difference magnitude Diff x,y between each pixel inside the LCU and the corresponding pixel of the co-located LCU in the previous frame, and determining whether each pixel inside the LCU is a motion-rich pixel using the frame-difference decision threshold for motion-rich pixels obtained from the frame-difference information of the previous frame; the region composed of motion-rich pixels inside the LCU being the motion-rich region of the LCU;
    Step S22: calculating the frame-difference magnitude Diff LCU of the motion-rich region inside the LCU; while computing Diff LCU for each LCU in the current frame, recording the maximum value D max ;
    Step S23: calculating the area ratio Area LCU of the motion-rich region inside the LCU;
    step S22 and step S23 being performed in either order or simultaneously;
    Step S24: calculating the normalized value of the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU
    Figure PCTCN2022123742-appb-100008
    wherein the maximum frame-difference magnitude of the LCU motion-rich regions of the previous frame is used as the normalization reference for the product of the frame-difference magnitude and the area ratio of the LCU motion-rich regions in the current frame; the normalized value
    Figure PCTCN2022123742-appb-100009
    being the motion perception factor of the LCU;
    Step S25: using the mean frame-difference magnitude of the current frame to calculate the frame-difference decision threshold Diff Thr for motion-rich pixels in the next frame.
  10. The visual-perception-based rate control method according to claim 9, wherein in step S21, if Diff x,y > Diff Thr holds, the pixel is determined to be a motion-rich pixel; otherwise it is determined not to be a motion-rich pixel; Diff Thr is the frame-difference decision threshold for motion-rich pixels obtained from the frame-difference information of the previous frame.
  11. The visual-perception-based rate control method according to claim 9, wherein in step S22,
    Figure PCTCN2022123742-appb-100010
    Diff x,y > Diff Thr ; where M is the horizontal width of the LCU, N is the vertical height of the LCU, and Diff LCU is the sum of the frame-difference magnitudes of all motion-rich pixels inside the LCU.
  12. The visual-perception-based rate control method according to claim 9, wherein in step S23,
    Figure PCTCN2022123742-appb-100011
    where Area LCU takes an integer value in [0, ZZ], corresponding to area ratios from 0% to 100%; M is the horizontal width of the LCU and N is the vertical height of the LCU;
    Figure PCTCN2022123742-appb-100012
    Figure PCTCN2022123742-appb-100013
    where Mov x,y is 1 if the pixel is a motion-rich pixel, and 0 otherwise.
  13. The visual-perception-based rate control method according to claim 9, wherein in step S24,
    Figure PCTCN2022123742-appb-100014
    where
    Figure PCTCN2022123742-appb-100015
    takes an integer value in [0, ZZ]; integers from 0 to ZZ represent the normalized range 0 to 1, and ZZ represents the normalized maximum; D max is the maximum value of Diff LCU over the LCUs of the previous frame.
  14. The visual-perception-based rate control method according to claim 9, wherein in step S25, Diff Thr = Diff avg + Diff offset or Diff Thr = β × Diff avg ; where Diff offset is a motion-threshold adjustment offset, β is a motion-threshold adjustment multiplier, and Diff avg is the mean frame-difference magnitude of the current frame;
    Figure PCTCN2022123742-appb-100016
    where W is the horizontal width of the image frame and H is the vertical height of the image frame.
  15. The visual-perception-based rate control method according to claim 1, wherein in step S30, the bit allocation weight of the LCU is calculated from the texture perception factor K T and the motion perception factor K M of the LCU as ω LCU = μ T × K T + μ M × K M ; where ω LCU is the bit allocation weight of the LCU, taking an integer value in [0, ZZ], with integers from 0 to ZZ representing the range 0 to 1; μ T is the weight coefficient of the LCU texture perception factor and μ M is the weight coefficient of the LCU motion perception factor, satisfying μ T + μ M = 1, 0 < μ T < 1, and 0 < μ M < 1.
  16. The visual-perception-based rate control method according to claim 1, wherein in step S30, after the bit allocation weight ω LCU of the LCU in the current frame is calculated, the sum of the LCU bit allocation weights gathered from the previous frame is used in place of the sum of the LCU bit allocation weights of the current frame, and the proportion of the LCU's bit allocation weight within the whole image frame
    Figure PCTCN2022123742-appb-100017
    is calculated in real time, from which the target number of coding bits of the LCU is calculated;
    Figure PCTCN2022123742-appb-100018
  17. The visual-perception-based rate control method according to any one of claims 7, 12, 13, and 15, wherein ZZ takes one of the values 127, 255, 511, or 1023.
  18. A visual-perception-based rate control device, characterized by comprising a texture perception factor calculation module, a motion perception factor calculation module, and an LCU bit allocation module;
    the texture perception factor calculation module being configured to calculate the gradient magnitude of the texture-rich region inside the LCU and use the result as the texture perception factor of the LCU;
    the motion perception factor calculation module being configured to calculate the product of the frame-difference magnitude and the area ratio of the motion-rich region inside the LCU and use the result as the motion perception factor of the LCU;
    the LCU bit allocation module calculating the bit allocation weight of the LCU from its texture perception factor and motion perception factor, and then calculating the target number of coding bits of the LCU.
PCT/CN2022/123742 2022-02-23 2022-10-08 Visual perception-based rate control method and device WO2023159965A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210171159.4A CN114666585A (en) 2022-02-23 2022-02-23 Code rate control method and device based on visual perception
CN202210171159.4 2022-02-23

Publications (1)

Publication Number Publication Date
WO2023159965A1 true WO2023159965A1 (en) 2023-08-31

Family

ID=82027311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123742 WO2023159965A1 (en) 2022-02-23 2022-10-08 Visual perception-based rate control method and device

Country Status (2)

Country Link
CN (1) CN114666585A (en)
WO (1) WO2023159965A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666585A (en) * 2022-02-23 2022-06-24 翱捷科技股份有限公司 Code rate control method and device based on visual perception

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827267A (en) * 2010-04-20 2010-09-08 上海大学 Code rate control method based on video image segmentation technology
CN105681793A (en) * 2016-01-06 2016-06-15 四川大学 Very-low delay and high-performance video coding intra-frame code rate control method based on video content complexity adaption
CN112291564A (en) * 2020-11-20 2021-01-29 西安邮电大学 HEVC intra-frame code rate control method for optimizing and monitoring video perception quality
CN114666585A (en) * 2022-02-23 2022-06-24 翱捷科技股份有限公司 Code rate control method and device based on visual perception


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274820A (en) * 2023-11-20 2023-12-22 深圳市天成测绘技术有限公司 Map data acquisition method and system for mapping geographic information
CN117274820B (en) * 2023-11-20 2024-03-08 深圳市天成测绘技术有限公司 Map data acquisition method and system for mapping geographic information

Also Published As

Publication number Publication date
CN114666585A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
WO2023159965A1 (en) Visual perception-based rate control method and device
US10587874B2 (en) Real-time video denoising method and terminal during coding, and non-volatile computer readable storage medium
JP5318561B2 (en) Content classification for multimedia processing
TWI743919B (en) Video processing apparatus and processing method of video stream
US9025673B2 (en) Temporal quality metric for video coding
RU2377737C2 (en) Method and apparatus for encoder assisted frame rate up conversion (ea-fruc) for video compression
JP5399578B2 (en) Image processing apparatus, moving image processing apparatus, video processing apparatus, image processing method, video processing method, television receiver, program, and recording medium
CN113766226A (en) Image encoding method, apparatus, device and storage medium
US8737485B2 (en) Video coding mode selection system
CN106358040B (en) Code rate control bit distribution method based on significance
CN108737825A (en) Method for coding video data, device, computer equipment and storage medium
CN105072345A (en) Video encoding method and device
CN108810530A (en) A kind of AVC bit rate control methods based on human visual system
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
JP3800435B2 (en) Video signal processing device
CN114339241A (en) Video code rate control method
JP2001076166A (en) Encoding method of animation dynamic image
CN106331705B (en) A kind of new HEVC code rate GOP grades of Bit distribution methods of control
KR100316764B1 (en) Method and system for coding images using human visual sensitivity characteristic
CN111246218A (en) JND model-based CU partition prediction and mode decision texture coding method
KR20040062733A (en) Bit rate control system based on object
CN114630120B (en) Video compression method and circuit system based on self-adaptive compression rate
Xiang et al. Perceptual ctu level bit allocation for avs2
JP2003009156A (en) Moving picture coding apparatus, method therefor, storing medium and moving picture decoding method
EP1921866A2 (en) Content classification for multimedia processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928228

Country of ref document: EP

Kind code of ref document: A1