CN114666586A - Time domain perceptual coding method based on transform block motion - Google Patents
- Publication number: CN114666586A (application CN202210248448.XA)
- Authority: CN (China)
- Prior art keywords: target, motion, video, coefficient, frequency
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/103 — Selection of coding mode or of prediction mode
- H04N19/124 — Quantisation
- H04N19/146 — Data rate or code amount at the encoder output
- H04N19/147 — Data rate or code amount at the encoder output according to rate-distortion criteria
- H04N19/19 — Adaptive coding using optimisation based on Lagrange multipliers
- H04N19/60 — Transform coding
Abstract
The invention discloses a temporal perceptual coding method based on transform block motion, comprising the following steps: acquiring target parameters of a target transform block during video coding, the target parameters comprising a motion vector, the video frame rate, a quantization parameter and the video resolution; calculating a motility coefficient of the target transform block from the target parameters; determining the retinal velocities in the horizontal and vertical directions from the motion vector and the motility coefficient; determining the maximum perceptible frequencies in the horizontal and vertical directions from the retinal velocities; determining a target bit count and a target sum of squared errors from the maximum perceptible frequencies; and determining an optimal coding mode from the target bit count and the target sum of squared errors, the optimal coding mode being used for video coding. The invention reduces the overall bit rate of video coding while preserving image quality, and can be widely applied in the technical field of video coding.
Description
Technical Field
The invention relates to the technical field of video coding, and in particular to a temporal perceptual coding method based on transform block motion.
Background
The AVS3 video coding standard is a new-generation video coding standard, suitable for application scenarios such as ultra-high-definition television broadcasting, VR and video surveillance. AVS3 saves about 30% of the bit rate on 4K ultra-high-resolution video compared with AVS2. Furthermore, the second stage of AVS3 aims to develop more efficient coding tools to improve performance, especially for surveillance video and screen-content video, with a target coding performance double that of AVS2. The AVS3 reference software HPM achieves an average BD-rate reduction of about 20% compared with the HEVC reference software HM. AVS3 adopts a number of novel coding tools to improve coding efficiency, such as QTBT+EQT partitioning, Ultimate Motion Vector Expression (UMVE), Position Based Transform (PBT) and Intra Derived Tree (Intra DT). However, these techniques pursue objective coding quality without considering the subjective influence of the human eye. Since people are the final receivers of the information, subjective quality is very important, and more and more subjective quality optimization techniques are therefore being introduced into video coding and decoding.
In recent years, with the continuous deepening of research in brain science, neuroscience and cognitive psychology, results in these fields have provided new ideas for the development of video coding technology, giving rise to video coding based on visual perception. Research shows that, for human beings, video information contains visual redundancy in addition to temporal redundancy, spatial redundancy and information-entropy redundancy. This is because humans acquire and process visual information through the Human Visual System (HVS). The HVS is a nonlinear system with several perceptual characteristics, mainly in three aspects: luminance characteristics, frequency-domain characteristics and image-type characteristics. The luminance characteristic is one of the most basic characteristics of the human visual system and mainly concerns the sensitivity of the human eye to luminance variation. In general, the human eye is less sensitive to noise attached to high-luminance areas, which means that if the background luminance of an image is higher, it can contain more additional information. Regarding the frequency-domain characteristics, if the image is transformed from the spatial domain to the frequency domain, the higher the frequency, the lower the resolving power of the human eye, and the lower the frequency, the higher the resolving power; the frequency-domain nature of the human visual system thus indicates that the human eye is less sensitive to high-frequency content. In terms of image-type characteristics, an image can be divided into large smooth regions and texture-dense regions, and the human visual system is much more sensitive to smooth regions than to texture-dense regions. Precisely because the human visual system resolves image content with different characteristics differently, a great deal of visual redundancy exists in video coding.
Disclosure of Invention
In view of this, embodiments of the present invention provide a temporal perceptual coding method based on transform block motion, so as to reduce the overall bit rate of video coding while preserving image quality.
One aspect of the present invention provides a transform block motion-based time-domain perceptual coding method, including:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
Optionally, the size of the target transform block is 4 × 4 or 8 × 8, and the calculation formula of the motility coefficient is:
wherein M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; and c represents an adjustable parameter.
Optionally, the formula for calculating the retinal velocity is:
wherein V_r,h represents the retinal velocity in the horizontal direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; V_r,v represents the retinal velocity in the vertical direction; and MV_y is the vertical component of the motion vector.
Optionally, the maximum perceptible frequency is calculated by the following formula (reconstructed here from the constants and the K_i/V_r table given in the detailed description):

K_i = K_max · v_c / (v_c + V_r)

wherein K_i represents the maximum perceptible frequency, with i denoting the horizontal or vertical component; K_max represents the highest perceptible frequency of the human eye; v_c represents the angular velocity; and V_r represents the retinal velocity of the object motion.
Optionally, the determining a target bit number and a target sum of squared errors according to the maximum perceptual frequency includes:
zeroing the high-frequency coefficient of the target transformation block according to the maximum perceptible frequency in the horizontal direction and the vertical direction;
calculating the target bit number required by the current coding mode after the high-frequency coefficient is set to zero;
and acquiring the original transform coefficients, and performing inverse transform and inverse quantization on the original transform coefficients to obtain the target sum of squared errors.
Optionally, the determining an optimal coding mode according to the target bit number and the target sum of square errors includes determining an expression of rate-distortion optimization based on lagrangian according to the target bit number and the target sum of square errors;
the expression of the rate distortion optimization is as follows:
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new

wherein D(Motion)_ori represents the sum of squared errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target number of bits required by the current coding mode after the high-frequency coefficients are set to zero.
Another aspect of the embodiments of the present invention further provides a transform block motion-based temporal perceptual coding apparatus, including:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Still another aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a program,
the program is executed by a processor to implement the method as described above.
Another aspect of the embodiments of the invention is a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The method acquires target parameters of a target transform block in the video coding process, the target parameters comprising a motion vector, the video frame rate, a quantization parameter and the video resolution; calculates a motility coefficient of the target transform block from the target parameters; determines the retinal velocities in the horizontal and vertical directions from the motion vector and the motility coefficient; determines the maximum perceptible frequencies in the horizontal and vertical directions from the retinal velocities; determines a target bit count and a target sum of squared errors from the maximum perceptible frequencies; and determines an optimal coding mode, used for video coding, from the target bit count and the target sum of squared errors. The invention can reduce the overall bit rate of video coding while preserving image quality.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a diagram illustrating the frequency response of the human eye;
FIG. 2 is a diagram illustrating perceptual frequency distributions corresponding to transform blocks;
FIG. 3 is a flowchart of the overall method steps provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To solve the problems in the prior art, an embodiment of the present invention provides a transform block motion-based time-domain perceptual coding method, and referring to fig. 3, the method of the present invention includes the following steps:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
The following detailed description of the implementation principle of the present invention is made with reference to the accompanying drawings:
First, the visual sensitivity model is introduced:
For the visual sensitivity model, it is assumed that the highest perceptible frequency for a stationary object is 32 cycles/deg; when the image moves at high speed, the perceptible frequency is significantly reduced. As shown in fig. 1, the perceptible frequency of the human eye decreases markedly as the retinal velocity of an object increases. According to this rule, the highest perceptible frequency can be obtained with equation (1):

K_i = K_max · v_c / (v_c + V_r)   (1)

wherein K_max is the highest perceptible frequency of the human eye (32 cycles/deg), v_c is the angular velocity (2 deg/s), and V_r is the retinal velocity of the object motion.
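A minimal Python sketch of equation (1); the closed form K = K_max · v_c / (v_c + V_r) is not printed in the source (the formula image was dropped), so it is reconstructed here from the stated constants and the K_i/V_r pairs in Table 1 below, and should be read as an inferred formula rather than a quotation:

```python
def max_perceptible_frequency(v_r, k_max=32.0, v_c=2.0):
    # Reconstructed equation (1): the perceptible frequency falls as the
    # retinal velocity v_r (deg/s) rises. k_max = 32 cycles/deg and
    # v_c = 2 deg/s are the constants given in the text.
    return k_max * v_c / (v_c + v_r)
```

With these constants the function reproduces the Table 1 pairs, e.g. V_r = 14.0 deg/s gives roughly 4 cycles/deg and V_r = 2.0 deg/s gives 16 cycles/deg.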
The following introduces a transform block motility based time-domain perceptual coding algorithm:
in AVS3 video coding, the transform block size may be 4x4, 8x8, 16x16, 64x 64. The invention mainly aims at a 4x4, 8x8 transformation block, and determines the retina velocity V of the current transformation block by obtaining the motion vector MV of the transformation block, the video frame rate fps, the quantization parameter QP and the video resolution (high video bandwidth)r. Specifically, the larger the MV of the transform block indicates that the current block moves faster. The larger the video fps is, the faster the video playing speed is, and the motion of the transformation block is faster at the moment. The smaller the quantization parameter QP, the more high frequency coefficients the transform block contains. The greater the resolution of the video frame, the more high frequency coefficients the transform block can reject. Since the boundary effect of large blocks is more significant, the larger the transform block, the fewer high frequency coefficients can be rejected.
Based on this principle, the invention substitutes these parameters into formula (2) to calculate the motility coefficient M of the transform block, and then substitutes M into formulas (3) and (4) to obtain V_r. Since MV is split into horizontal and vertical components, the calculated V_r is likewise split into horizontal and vertical components.
Parameter description: fps is the video frame rate; p_w_log2 and p_h_log2 are the base-2 logarithms of the video width and height, respectively; QP is the quantization parameter; tu_w_log2 and tu_h_log2 are the base-2 logarithms of the transform block width and height, respectively; c is an adjustable parameter; MV_x and MV_y are the horizontal and vertical components of the motion vector; V_r,h and V_r,v are the horizontal and vertical retinal velocities of the transform block.
After V_r,h and V_r,v are calculated, they are substituted into equation (1) to obtain the horizontal and vertical maximum perceptible frequencies K_h and K_v. For 4x4 and 8x8 transform blocks, the high-frequency transform coefficients are distributed toward the lower right, i.e., the larger the row and column indices, the more the corresponding coefficients match the high-frequency coefficients this design intends to eliminate. From equation (1), the correspondence between K_i and V_r can be calculated, as shown in Table 1:
TABLE 1
| K_i (cycles/deg) | 4 | 8 | 12 | 16 | 20 | 24 | 28 |
| V_r (deg/sec) | 14.0 | 6.0 | 3.34 | 2.0 | 1.2 | 0.67 | 0.29 |
Table 1 describes the correspondence between K_i and V_r. Based on Table 1, the present invention assigns perceptual frequencies to the rows and columns of a transform block as shown in fig. 2. Specifically, fig. 2 maps the perceptual frequencies into the transform block according to the results of Table 1: the maximum perceptual frequency of a row or column is 32, and the frequency step between rows (and columns) is 4 for an 8x8 block and 8 for a 4x4 block. After the mapping shown in fig. 2 is obtained, the V_r of the transform block in the horizontal and vertical directions is calculated to determine the maximum perceptible frequencies in the two directions; the high-frequency coefficients on rows and columns exceeding these frequencies need to be set to zero (a red x in fig. 2 indicates a zeroed high-frequency coefficient).
For 4x4 and 8x8 transform blocks, after K_h and K_v are calculated, the high-frequency coefficients on rows and columns whose perceptual frequencies are greater than or equal to K_h and K_v are set to zero, thereby reducing the number of transform coefficients and ultimately lowering the coding bit rate.
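The zeroing step can be sketched as follows. The linear index-to-frequency mapping `(index + 1) * step` and the convention that columns carry horizontal frequency and rows carry vertical frequency are assumptions based on the step sizes described above (4 for 8x8, 8 for 4x4), not a quoted implementation:

```python
def zero_high_freq_coeffs(coeffs, k_h, k_v, step):
    """Zero transform coefficients whose mapped perceptual frequency
    reaches the horizontal/vertical limits k_h and k_v.

    coeffs: square list-of-lists, DC at top-left, high frequencies
    toward the bottom-right. step: 4 for 8x8 blocks, 8 for 4x4 blocks
    (assumed mapping: row/column index j -> frequency (j + 1) * step).
    """
    n = len(coeffs)
    out = [row[:] for row in coeffs]  # copy; leave the input intact
    for r in range(n):
        for c in range(n):
            # Zero when either direction reaches its perceptual limit,
            # matching the "greater than or equal to" rule in the text.
            if (c + 1) * step >= k_h or (r + 1) * step >= k_v:
                out[r][c] = 0
    return out
```

For example, a 4x4 block (step 8) with K_h = 16 keeps only its first column's low-frequency entries in the horizontal direction, since column index 1 already maps to frequency 16.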
The principle of modifying the Rate Distortion Optimization (RDO) function is described in further detail below:
one of the main tasks of a video encoder is to select the optimal encoding parameters with a certain strategy to achieve the optimal encoding performance. The coding parameter optimization method based on the rate distortion theory is called rate distortion optimization, and the rate distortion optimization technology is a main means for ensuring the coding efficiency of a coder.
Inter-frame prediction predicts the current coding block from the pixels of other coded pictures, and adopts a Lagrangian rate-distortion optimization method, as shown in equation (5).
min J, J = D(Motion) + λ_Mode · R(Motion)   (5)
wherein D(Motion) and R(Motion) represent the distortion and the number of bits when different motion modes (including motion vector, reference picture, prediction weight, etc.) are adopted, and λ_Mode is the Lagrangian factor; the optimal prediction mode is the motion mode with the minimum rate-distortion cost.
The main goal of the temporal perceptual coding algorithm is to zero out the high-frequency coefficients of transform blocks with high motility, but after this processing the high-frequency coefficients of the transform block are removed. In the RDO process, D(Motion) is generally computed as the sum of squared differences (SSD); once the high-frequency coefficients of the transform block are removed, the SSD obtained after inverse transform and inverse quantization is much larger than the original SSD, which would strongly bias the encoder toward other coding parameters and thereby reduce coding efficiency.
To avoid distorting the encoder's RDO-based selection of optimal coding parameters, the transform coefficients of the original transform block are copied before the high-frequency coefficients are eliminated, and the SSD is obtained by inverse transforming and inverse quantizing these original coefficients. The SSD therefore does not grow because of the coefficient elimination, and the Lagrangian rate-distortion optimization can be expressed as Equation (6).
min J,  J = D(Motion)_ori + λ_Mode · R(Motion)_new    (6)
where D(Motion)_ori denotes the SSD recomputed from the original (unmodified) transform coefficients via inverse transform and inverse quantization, and R(Motion)_new denotes the number of bits required by this mode after the high-frequency coefficients are removed.
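A minimal sketch of the modified cost of Equation (6) follows. It is illustrative only: a plain scalar quantizer on the residual stands in for the codec's actual transform/quantization pipeline, and the one-bit-per-nonzero-coefficient rate model `bits_fn` is a hypothetical placeholder.

```python
import numpy as np

def modified_rd_cost(block, pred, qstep, lam, zero_mask, bits_fn):
    """Rate-distortion cost of Equation (6): distortion from the ORIGINAL
    quantized coefficients, rate from the coefficients AFTER high-frequency
    zeroing (illustrative stand-in for the real transform pipeline)."""
    resid = block - pred
    q = np.round(resid / qstep)        # "transform + quantization" stand-in
    recon_ori = q * qstep              # inverse quantization of ORIGINAL coeffs
    d_ori = np.sum((resid - recon_ori) ** 2)   # D(Motion)_ori
    q_new = np.where(zero_mask, 0, q)  # high-frequency coefficients set to zero
    r_new = bits_fn(q_new)             # R(Motion)_new: bits after zeroing
    return d_ori + lam * r_new

# Hypothetical rate model: one "bit" per non-zero coefficient.
bits_fn = lambda q: int(np.count_nonzero(q))
```

Because the distortion term is frozen at the original-coefficient SSD, zeroing coefficients can only lower the rate term; a mode is never penalized in the RDO search merely because its coefficients were culled.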
With this modified rate-distortion cost, the encoder no longer misjudges the optimal mode because of an inflated SSD, avoiding the attendant loss of coding efficiency.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise indicated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for temporal perceptual coding based on motion of transform blocks, comprising:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining retinal velocities in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
determining maximum perceptible frequencies in the horizontal direction and the vertical direction according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
2. The method of claim 1, wherein the size of the target transform block is 4x4 or 8x8, and the motility coefficient is calculated by the following formula:
wherein M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; C represents an adjustable parameter.
3. The method of claim 1, wherein the retinal velocity is calculated by the following formula:
wherein V_r^h represents the retinal velocity in the horizontal direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; V_r^v represents the retinal velocity in the vertical direction; MV_y is the vertical component of the motion vector.
4. A transform block motion-based temporal perceptual coding method according to claim 1, wherein the maximum perceptible frequency is calculated by the following formula:
wherein K_i represents the maximum perceptible frequency and i denotes the horizontal or vertical component; K_max represents the highest frequency perceptible to the human eye; v_c represents an angular velocity; V_r represents the retinal velocity of the object motion.
5. The method according to claim 1, wherein said determining a target number of bits and a target sum of square errors according to the maximum perceptual frequency comprises:
zeroing the high-frequency coefficient of the target transformation block according to the maximum perceptible frequency in the horizontal direction and the vertical direction;
calculating the target bit number required by the current coding mode after the high-frequency coefficient is set to zero;
and acquiring original transform coefficients, and performing inverse transform and inverse quantization processing on the original transform coefficients to obtain the target sum of square errors.
6. The method as claimed in claim 5, wherein the step of determining the optimal coding mode according to the target bit number and the target sum of square errors comprises determining an expression of Lagrangian-based rate-distortion optimization according to the target bit number and the target sum of square errors;
the expression of the rate distortion optimization is as follows:
min J,  J = D(Motion)_ori + λ_Mode · R(Motion)_new
wherein D(Motion)_ori represents the sum of square errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target bit number required by the current coding mode after the high-frequency coefficients are set to zero.
7. An apparatus for temporal perceptual coding based on motion of transform blocks, comprising:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 6 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210248448.XA CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210248448.XA CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114666586A true CN114666586A (en) | 2022-06-24 |
Family
ID=82028564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210248448.XA Pending CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114666586A (en) |
2022-03-14: CN application CN202210248448.XA filed; publication CN114666586A, status Pending.
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||