CN114666586A - Time domain perceptual coding method based on transform block motion - Google Patents
- Publication number: CN114666586A (application CN202210248448.XA)
- Authority: CN (China)
- Prior art keywords: target, motion, video, coefficient, frequency
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/103 — Selection of coding mode or of prediction mode
- H04N19/124 — Quantisation
- H04N19/146 — Data rate or code amount at the encoder output
- H04N19/147 — Data rate or code amount at the encoder output according to rate-distortion criteria
- H04N19/19 — Adaptive coding using optimisation based on Lagrange multipliers
- H04N19/60 — Transform coding
Abstract
The invention discloses a temporal perceptual coding method based on transform block motion, comprising the following steps: acquiring target parameters of a target transform block during video coding, the target parameters comprising a motion vector, the video frame rate, a quantization parameter and the video resolution; calculating a motility coefficient of the target transform block from the target parameters; determining the retinal velocities in the horizontal and vertical directions from the motion vector and the motility coefficient; determining the maximum perceptible frequencies in the horizontal and vertical directions from the retinal velocities; determining a target bit count and a target sum of squared errors from the maximum perceptible frequencies; and determining an optimal coding mode from the target bit count and the target sum of squared errors, the optimal coding mode being used for video coding. The invention reduces the overall bit rate of video coding while preserving image quality, and can be widely applied in the technical field of video coding.
Description
Technical Field
The invention relates to the technical field of video coding, and in particular to a temporal perceptual coding method based on transform block motion.
Background
The AVS3 video coding standard is a new-generation video coding standard, suitable for application scenarios such as ultra-high-definition television broadcasting, VR and video surveillance. AVS3 saves about 30% of the bit rate on 4K ultra-high-resolution video compared with AVS2. Furthermore, the second stage of AVS3 aims to develop more efficient coding tools to improve performance, especially for surveillance video and screen-content video, with a target coding performance double that of AVS2. The AVS3 reference software HPM achieves an average BD-rate reduction of about 20% compared with the HEVC reference software HM. AVS3 adopts a number of novel coding tools to improve coding efficiency, such as QTBT+EQT partitioning, Ultimate Motion Vector Expression (UMVE), Position Based Transform (PBT) and Intra Derived Tree (Intra DT). However, these techniques pursue objective coding quality without considering the subjective influence of the human eye. Since people are the final receivers of the information, subjective quality is very important, and more and more subjective quality optimization techniques are therefore being introduced into video coding and decoding.
In recent years, with the continuous deepening of research in brain science, neuroscience and cognitive psychology, results in these fields have provided new ideas for the development of video coding technology, giving rise to video coding based on visual perception. Research shows that, for human beings, video information contains visual redundancy in addition to temporal redundancy, spatial redundancy and information-entropy redundancy. This is because humans acquire and process visual information through the Human Visual System (HVS). The HVS is a nonlinear system with several perceptual characteristics, mainly in three aspects: luminance characteristics, frequency-domain characteristics and image-type characteristics. The luminance characteristic is one of the most basic characteristics of the human visual system and mainly concerns the sensitivity of the human eye to luminance variation. In general, the human eye is less sensitive to noise attached to high-luminance areas, which means that if the background luminance of an image is higher, it can contain more additional information. Regarding the frequency-domain characteristics, if the image is transformed from the spatial domain to the frequency domain, the higher the frequency, the lower the resolving power of the human eye, and the lower the frequency, the higher the resolving power; the frequency-domain nature of the human visual system thus indicates that the human eye is less sensitive to high-frequency content. In terms of image-type characteristics, an image can be divided into large smooth regions and texture-dense regions, and the human visual system is much more sensitive to smooth regions than to texture-dense regions. Precisely because the human visual system resolves image content with different characteristics differently, a great deal of visual redundancy exists in video coding.
Disclosure of Invention
In view of this, embodiments of the present invention provide a temporal perceptual coding method based on transform block motion, so as to reduce the overall bit rate of video coding while preserving image quality.
One aspect of the present invention provides a transform block motion-based time-domain perceptual coding method, including:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
Optionally, the size of the target transform block is 4 × 4 or 8 × 8, and the calculation formula of the motility coefficient is:
wherein M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; and c represents an adjustable parameter.
Optionally, the formula for calculating the retinal velocity is:
wherein V_r,h represents the retinal velocity in the horizontal direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; V_r,v represents the retinal velocity in the vertical direction; and MV_y is the vertical component of the motion vector.
Optionally, the maximum perceptible frequency is calculated by the following formula (reconstructed here from the constants and the K_i/V_r table given in the detailed description):

K_i = K_max · v_c / (v_c + V_r)

wherein K_i represents the maximum perceptible frequency, with i denoting the horizontal or vertical component; K_max represents the highest perceptible frequency of the human eye; v_c represents the angular velocity; and V_r represents the retinal velocity of the object motion.
Optionally, the determining a target bit number and a target sum of squared errors according to the maximum perceptual frequency includes:
zeroing the high-frequency coefficient of the target transformation block according to the maximum perceptible frequency in the horizontal direction and the vertical direction;
calculating the target bit number required by the current coding mode after the high-frequency coefficient is set to zero;
and acquiring the original transform coefficients, and performing inverse transform and inverse quantization on the original transform coefficients to obtain the target sum of squared errors.
Optionally, the determining an optimal coding mode according to the target bit number and the target sum of square errors includes determining an expression of rate-distortion optimization based on lagrangian according to the target bit number and the target sum of square errors;
the expression of the rate distortion optimization is as follows:
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new

wherein D(Motion)_ori represents the sum of squared errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target number of bits required by the current coding mode after the high-frequency coefficients are set to zero.
Another aspect of the embodiments of the present invention further provides a transform block motion-based temporal perceptual coding apparatus, including:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Still another aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a program,
the program is executed by a processor to implement the method as described above.
Another aspect of the embodiments of the invention is a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The method acquires target parameters of a target transform block in the video coding process, the target parameters comprising a motion vector, the video frame rate, a quantization parameter and the video resolution; calculates a motility coefficient of the target transform block from the target parameters; determines the retinal velocities in the horizontal and vertical directions from the motion vector and the motility coefficient; determines the maximum perceptible frequencies in the horizontal and vertical directions from the retinal velocities; determines a target bit count and a target sum of squared errors from the maximum perceptible frequencies; and determines an optimal coding mode, used for video coding, from the target bit count and the target sum of squared errors. The invention can reduce the overall bit rate of video coding while preserving image quality.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a diagram illustrating the frequency response of the human eye;
FIG. 2 is a diagram illustrating perceptual frequency distributions corresponding to transform blocks;
FIG. 3 is a flowchart of the overall method steps provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To solve the problems in the prior art, an embodiment of the present invention provides a transform block motion-based time-domain perceptual coding method, and referring to fig. 3, the method of the present invention includes the following steps:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
The following detailed description of the implementation principle of the present invention is made with reference to the accompanying drawings:
First, the visual sensitivity model is introduced:
For the visual sensitivity model, it is assumed that the highest perceptible frequency for a stationary object is 32 cycles/deg; when the image moves at high speed, the perceptible frequency is significantly reduced. As shown in fig. 1, the perceptible frequency of the human eye decreases markedly as the retinal velocity of an object increases. According to this rule, the highest perceptible frequency can be obtained with equation (1):

K_i = K_max · v_c / (v_c + V_r)   (1)

wherein K_max is the highest perceptible frequency of the human eye (32 cycles/deg), v_c is the angular velocity (2 deg/s), and V_r is the retinal velocity of the object motion.
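A minimal Python sketch of equation (1); the closed form K = K_max · v_c / (v_c + V_r) is not printed in the source (the formula image was dropped), so it is reconstructed here from the stated constants and the K_i/V_r pairs in Table 1 below, and should be read as an inferred formula rather than a quotation:

```python
def max_perceptible_frequency(v_r, k_max=32.0, v_c=2.0):
    # Reconstructed equation (1): the perceptible frequency falls as the
    # retinal velocity v_r (deg/s) rises. k_max = 32 cycles/deg and
    # v_c = 2 deg/s are the constants given in the text.
    return k_max * v_c / (v_c + v_r)
```

With these constants the function reproduces the Table 1 pairs, e.g. V_r = 14.0 deg/s gives roughly 4 cycles/deg and V_r = 2.0 deg/s gives 16 cycles/deg.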
The following introduces a transform block motility based time-domain perceptual coding algorithm:
in AVS3 video coding, the transform block size may be 4x4, 8x8, 16x16, 64x 64. The invention mainly aims at a 4x4, 8x8 transformation block, and determines the retina velocity V of the current transformation block by obtaining the motion vector MV of the transformation block, the video frame rate fps, the quantization parameter QP and the video resolution (high video bandwidth)r. Specifically, the larger the MV of the transform block indicates that the current block moves faster. The larger the video fps is, the faster the video playing speed is, and the motion of the transformation block is faster at the moment. The smaller the quantization parameter QP, the more high frequency coefficients the transform block contains. The greater the resolution of the video frame, the more high frequency coefficients the transform block can reject. Since the boundary effect of large blocks is more significant, the larger the transform block, the fewer high frequency coefficients can be rejected.
Based on this principle, the invention substitutes these parameters into formula (2) to calculate the motility coefficient M of the transform block, and then substitutes M into formulas (3) and (4) to obtain V_r. Since MV is split into horizontal and vertical components, the calculated V_r is likewise split into horizontal and vertical components.
Parameter description: fps is the video frame rate; p_w_log2 and p_h_log2 are the base-2 logarithms of the video width and height, respectively; QP is the quantization parameter; tu_w_log2 and tu_h_log2 are the base-2 logarithms of the transform block width and height, respectively; c is an adjustable parameter; MV_x and MV_y are the horizontal and vertical components of the motion vector; V_r,h and V_r,v are the horizontal and vertical retinal velocities of the transform block.
After V_r,h and V_r,v are calculated, they are substituted into equation (1) to obtain the horizontal and vertical maximum perceptible frequencies K_h and K_v. For 4x4 and 8x8 transform blocks, the high-frequency transform coefficients are distributed toward the lower right, i.e., the larger the row and column indices, the more the corresponding coefficients match the high-frequency coefficients this design intends to eliminate. From equation (1), the correspondence between K_i and V_r can be calculated, as shown in Table 1:
TABLE 1
| K_i (cycles/deg) | 4 | 8 | 12 | 16 | 20 | 24 | 28 |
| V_r (deg/sec) | 14.0 | 6.0 | 3.34 | 2.0 | 1.2 | 0.67 | 0.29 |
Table 1 describes the correspondence between K_i and V_r. Based on Table 1, the present invention assigns perceptual frequencies to the rows and columns of a transform block as shown in fig. 2. Specifically, fig. 2 maps the perceptual frequencies into the transform block according to the results of Table 1: the maximum perceptual frequency of a row or column is 32, and the frequency step between rows (and columns) is 4 for an 8x8 block and 8 for a 4x4 block. After the mapping shown in fig. 2 is obtained, the V_r of the transform block in the horizontal and vertical directions is calculated to determine the maximum perceptible frequencies in the two directions; the high-frequency coefficients on rows and columns exceeding these frequencies need to be set to zero (a red x in fig. 2 indicates a zeroed high-frequency coefficient).
For 4x4 and 8x8 transform blocks, after K_h and K_v are calculated, the high-frequency coefficients on rows and columns whose perceptual frequencies are greater than or equal to K_h and K_v are set to zero, thereby reducing the number of transform coefficients and ultimately lowering the coding bit rate.
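The zeroing step can be sketched as follows. The linear index-to-frequency mapping `(index + 1) * step` and the convention that columns carry horizontal frequency and rows carry vertical frequency are assumptions based on the step sizes described above (4 for 8x8, 8 for 4x4), not a quoted implementation:

```python
def zero_high_freq_coeffs(coeffs, k_h, k_v, step):
    """Zero transform coefficients whose mapped perceptual frequency
    reaches the horizontal/vertical limits k_h and k_v.

    coeffs: square list-of-lists, DC at top-left, high frequencies
    toward the bottom-right. step: 4 for 8x8 blocks, 8 for 4x4 blocks
    (assumed mapping: row/column index j -> frequency (j + 1) * step).
    """
    n = len(coeffs)
    out = [row[:] for row in coeffs]  # copy; leave the input intact
    for r in range(n):
        for c in range(n):
            # Zero when either direction reaches its perceptual limit,
            # matching the "greater than or equal to" rule in the text.
            if (c + 1) * step >= k_h or (r + 1) * step >= k_v:
                out[r][c] = 0
    return out
```

For example, a 4x4 block (step 8) with K_h = 16 keeps only its first column's low-frequency entries in the horizontal direction, since column index 1 already maps to frequency 16.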
The principle of modifying the Rate Distortion Optimization (RDO) function is described in further detail below:
one of the main tasks of a video encoder is to select the optimal encoding parameters with a certain strategy to achieve the optimal encoding performance. The coding parameter optimization method based on the rate distortion theory is called rate distortion optimization, and the rate distortion optimization technology is a main means for ensuring the coding efficiency of a coder.
Inter-frame prediction predicts the current coding block from the pixels of other coded pictures, and adopts a Lagrangian rate-distortion optimization method, as shown in equation (5).
min J, J = D(Motion) + λ_Mode · R(Motion)   (5)
wherein D(Motion) and R(Motion) represent the distortion and the number of bits when different motion modes (including motion vector, reference picture, prediction weight, etc.) are adopted, and λ_Mode is the Lagrangian factor; the optimal prediction mode is the motion mode with the minimum rate-distortion cost.
The main goal of the temporal perceptual coding algorithm is to zero out the high-frequency coefficients of transform blocks with high motility, but after this processing the high-frequency coefficients of the transform block are removed. In the RDO process, D(Motion) is generally computed as the sum of squared differences (SSD); once the high-frequency coefficients of the transform block are removed, the SSD obtained after inverse transform and inverse quantization is much larger than the original SSD, which would strongly bias the encoder toward other coding parameters and thereby reduce coding efficiency.
To avoid distorting the encoder's RDO-based selection of optimal coding parameters, the transform coefficients of the original transform block are copied before the high-frequency coefficients are eliminated, and the SSD is obtained by inverse transforming and inverse quantizing these original coefficients. The SSD therefore does not grow because of the coefficient elimination, and the Lagrangian rate-distortion optimization can be expressed as Equation (6).
min J,  J = D(Motion)_ori + λ_Mode · R(Motion)_new    (6)
where D(Motion)_ori denotes the SSD recomputed from the original (unmodified) transform coefficients via inverse transform and inverse quantization, and R(Motion)_new denotes the number of bits required by this mode after the high-frequency coefficients are removed.
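A minimal sketch of the modified cost of Equation (6) follows. It is illustrative only: a plain scalar quantizer on the residual stands in for the codec's actual transform/quantization pipeline, and the one-bit-per-nonzero-coefficient rate model `bits_fn` is a hypothetical placeholder.

```python
import numpy as np

def modified_rd_cost(block, pred, qstep, lam, zero_mask, bits_fn):
    """Rate-distortion cost of Equation (6): distortion from the ORIGINAL
    quantized coefficients, rate from the coefficients AFTER high-frequency
    zeroing (illustrative stand-in for the real transform pipeline)."""
    resid = block - pred
    q = np.round(resid / qstep)        # "transform + quantization" stand-in
    recon_ori = q * qstep              # inverse quantization of ORIGINAL coeffs
    d_ori = np.sum((resid - recon_ori) ** 2)   # D(Motion)_ori
    q_new = np.where(zero_mask, 0, q)  # high-frequency coefficients set to zero
    r_new = bits_fn(q_new)             # R(Motion)_new: bits after zeroing
    return d_ori + lam * r_new

# Hypothetical rate model: one "bit" per non-zero coefficient.
bits_fn = lambda q: int(np.count_nonzero(q))
```

Because the distortion term is frozen at the original-coefficient SSD, zeroing coefficients can only lower the rate term; a mode is never penalized in the RDO search merely because its coefficients were culled.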
With this modified rate-distortion cost, the encoder no longer misjudges the optimal mode because of an inflated SSD, avoiding the attendant loss of coding efficiency.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise indicated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for temporal perceptual coding based on motion of transform blocks, comprising:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining retinal velocities in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
determining maximum perceptible frequencies in the horizontal direction and the vertical direction according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
2. The method of claim 1, wherein the size of the target transform block is 4x4 or 8x8, and the motility coefficient is calculated by the following formula:
wherein M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; C represents an adjustable parameter.
3. The method of claim 1, wherein the retinal velocity is calculated by the following formula:
wherein V_r^h represents the retinal velocity in the horizontal direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; V_r^v represents the retinal velocity in the vertical direction; MV_y is the vertical component of the motion vector.
4. A transform block motion-based temporal perceptual coding method according to claim 1, wherein the maximum perceptible frequency is calculated by the following formula:
wherein K_i represents the maximum perceptible frequency and i denotes the horizontal or vertical component; K_max represents the highest frequency perceptible to the human eye; v_c represents an angular velocity; V_r represents the retinal velocity of the object motion.
5. The method according to claim 1, wherein said determining a target number of bits and a target sum of square errors according to the maximum perceptual frequency comprises:
zeroing the high-frequency coefficient of the target transformation block according to the maximum perceptible frequency in the horizontal direction and the vertical direction;
calculating the target bit number required by the current coding mode after the high-frequency coefficient is set to zero;
and acquiring original transform coefficients, and performing inverse transform and inverse quantization processing on the original transform coefficients to obtain the target sum of square errors.
6. The method as claimed in claim 5, wherein the step of determining the optimal coding mode according to the target bit number and the target sum of square errors comprises determining an expression of Lagrangian-based rate-distortion optimization according to the target bit number and the target sum of square errors;
the expression of the rate distortion optimization is as follows:
min J,  J = D(Motion)_ori + λ_Mode · R(Motion)_new
wherein D(Motion)_ori represents the sum of square errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target bit number required by the current coding mode after the high-frequency coefficients are set to zero.
7. An apparatus for temporal perceptual coding based on motion of transform blocks, comprising:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 6 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210248448.XA CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210248448.XA CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114666586A true CN114666586A (en) | 2022-06-24 |
Family
ID=82028564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210248448.XA Pending CN114666586A (en) | 2022-03-14 | 2022-03-14 | Time domain perceptual coding method based on transform block motion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114666586A (en) |
2022-03-14: CN application CN202210248448.XA filed; publication CN114666586A, status Pending.
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||