CN114666586A - Time domain perceptual coding method based on transform block motion - Google Patents

Time domain perceptual coding method based on transform block motion

Info

Publication number
CN114666586A
CN114666586A (application CN202210248448.XA)
Authority
CN
China
Prior art keywords
target
motion
video
coefficient
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210248448.XA
Other languages
Chinese (zh)
Inventor
梁凡
范烁烁
张坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210248448.XA priority Critical patent/CN114666586A/en
Publication of CN114666586A publication Critical patent/CN114666586A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/124 Quantisation
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/19 Adaptive coding using optimisation based on Lagrange multipliers
    • H04N19/60 Transform coding

Abstract

The invention discloses a time domain perceptual coding method based on transform block motion, which comprises the following steps: acquiring target parameters of a target transform block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution; calculating a motility coefficient of the target transform block according to the target parameters; determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient; determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities; determining a target bit number and a target sum of squared errors according to the maximum perceptible frequencies; and determining an optimal coding mode according to the target bit number and the target sum of squared errors, wherein the optimal coding mode is used for video coding. The invention can reduce the overall bit rate of video coding while maintaining image quality, and can be widely applied in the technical field of video coding.

Description

Time domain perceptual coding method based on transform block motion
Technical Field
The invention relates to the technical field of video coding, in particular to a time domain perceptual coding method based on motion of a transform block.
Background
The AVS3 video coding standard is a new-generation video coding standard, suitable for application scenarios such as ultra-high-definition television broadcasting, VR and video surveillance. AVS3 saves about 30% of the bit rate on 4K ultra-high-resolution video compared with AVS2. Furthermore, the second stage of AVS3 aims to develop more efficient coding tools to improve performance, especially for surveillance video and screen content video, with a target coding performance double that of AVS2. The AVS3 standard reference software HPM achieves a BD-rate reduction of about 20% on average compared with the HEVC reference software HM. AVS3 employs a number of novel coding tools to improve coding efficiency, such as QTBT+EQT partitioning, Ultimate Motion Vector Expression (UMVE), Position Based Transform (PBT), and Intra Derived Tree (Intra DT). However, these techniques objectively pursue better coding performance without considering the subjective perception of the human eye. Since people are the final receivers of the information, subjective quality is very important, and more and more subjective quality optimization techniques are therefore being introduced into video coding and decoding.
In recent years, with the continuous deepening of research in brain science, neuroscience and cognitive psychology, results from these fields have provided new ideas for the development of video coding technology, and video coding based on visual perception has emerged. Research shows that, for human viewers, video information contains visual redundancy in addition to temporal redundancy, spatial redundancy and information-entropy redundancy. This is because human beings acquire and process visual information through the Human Visual System (HVS). The HVS is a nonlinear system with multiple perceptual characteristics, mainly in three aspects: luminance characteristics, frequency-domain characteristics and image-type characteristics. The luminance characteristic is one of the most basic characteristics of the human visual system and concerns the sensitivity of the human eye to luminance variation. In general, the human eye is less sensitive to noise in high-luminance areas, which means that if the background luminance of an image is higher, it can carry more additional information. For the frequency-domain characteristic, if the image is transformed from the spatial domain to the frequency domain, the higher the frequency, the lower the resolving power of the human eye; the lower the frequency, the higher the resolving power. The frequency-domain nature of the human visual system thus indicates that the human eye is less sensitive to high-frequency content. From the perspective of image-type characteristics, an image can be divided into smooth regions and texture-dense regions, and the human visual system is much more sensitive to smooth regions than to texture-dense regions. Precisely because the human visual system resolves image content with different characteristics differently, a great deal of visual redundancy exists in video coding.
Disclosure of Invention
In view of this, embodiments of the present invention provide a time-domain perceptual coding method based on transform block motion, so as to reduce the overall bit rate of video coding while maintaining image quality.
One aspect of the present invention provides a transform block motion-based time-domain perceptual coding method, including:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
Optionally, the size of the target transform block is 4×4 or 8×8, and the motility coefficient is calculated as:
[equation (2), shown as an image in the original: M = f(QP, fps, p_w_log2, p_h_log2, tu_w_log2, tu_h_log2, c)]
where M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; c represents an adjustable parameter.
Optionally, the retinal velocities are calculated as:
V_r^x = M · MV_x
V_r^y = M · MV_y
where V_r^x represents the retinal velocity in the horizontal direction; V_r^y represents the retinal velocity in the vertical direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; MV_y is the vertical component of the motion vector.
Optionally, the maximum perceptible frequency is calculated as:
K_i = K_max · v_c / (v_c + V_r)
where K_i represents the maximum perceptible frequency and i denotes the horizontal or vertical component; K_max represents the highest perceptible frequency of the human eye; v_c represents the angular velocity; V_r represents the retinal velocity of the object motion.
Optionally, the determining a target bit number and a target sum of squared errors according to the maximum perceptible frequency includes:
zeroing the high-frequency coefficients of the target transform block according to the maximum perceptible frequencies in the horizontal and vertical directions;
calculating the target bit number required by the current coding mode after the high-frequency coefficients are set to zero;
and acquiring the original transform coefficients, and performing inverse transform and inverse quantization on them to obtain the target sum of squared errors.
Optionally, the determining an optimal coding mode according to the target bit number and the target sum of squared errors includes determining a Lagrangian rate-distortion optimization expression from the target bit number and the target sum of squared errors;
the rate-distortion optimization expression is:
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new
where D(Motion)_ori represents the sum of squared errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target bit number required by the current coding mode after the high-frequency coefficients are set to zero.
Another aspect of the embodiments of the present invention further provides a transform block motion-based temporal perceptual coding apparatus, including:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Still another aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a program,
the program is executed by a processor to implement the method as described above.
Another aspect of the embodiments of the invention is a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The method obtains target parameters of a target transform block in the video coding process, the target parameters comprising a motion vector, a video frame rate, a quantization parameter and a video resolution; calculates a motility coefficient of the target transform block from the target parameters; determines the retinal velocities in the horizontal and vertical directions from the motion vector and the motility coefficient; determines the maximum perceptible frequencies in the horizontal and vertical directions from the retinal velocities; determines a target bit number and a target sum of squared errors from the maximum perceptible frequencies; and determines an optimal coding mode from the target bit number and the target sum of squared errors, the optimal coding mode being used for video coding. The invention can reduce the overall bit rate of video coding while maintaining image quality.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram illustrating the frequency response of human eye;
FIG. 2 is a diagram illustrating perceptual frequency distributions corresponding to transform blocks;
FIG. 3 is a flowchart of the overall method steps provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To solve the problems in the prior art, an embodiment of the present invention provides a transform block motion-based time-domain perceptual coding method, and referring to fig. 3, the method of the present invention includes the following steps:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retinal velocities in the horizontal and vertical directions according to the motion vector and the motility coefficient;
determining the maximum perceptible frequencies in the horizontal and vertical directions according to the retinal velocities;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
Optionally, the size of the target transform block is 4×4 or 8×8, and the motility coefficient is calculated as:
[equation (2), shown as an image in the original: M = f(QP, fps, p_w_log2, p_h_log2, tu_w_log2, tu_h_log2, c)]
where M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; c represents an adjustable parameter.
Optionally, the retinal velocities are calculated as:
V_r^x = M · MV_x
V_r^y = M · MV_y
where V_r^x represents the retinal velocity in the horizontal direction; V_r^y represents the retinal velocity in the vertical direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; MV_y is the vertical component of the motion vector.
Optionally, the maximum perceptible frequency is calculated as:
K_i = K_max · v_c / (v_c + V_r)
where K_i represents the maximum perceptible frequency and i denotes the horizontal or vertical component; K_max represents the highest perceptible frequency of the human eye; v_c represents the angular velocity; V_r represents the retinal velocity of the object motion.
Optionally, the determining a target bit number and a target sum of squared errors according to the maximum perceptible frequency includes:
zeroing the high-frequency coefficients of the target transform block according to the maximum perceptible frequencies in the horizontal and vertical directions;
calculating the target bit number required by the current coding mode after the high-frequency coefficients are set to zero;
and acquiring the original transform coefficients, and performing inverse transform and inverse quantization on them to obtain the target sum of squared errors.
Optionally, the determining an optimal coding mode according to the target bit number and the target sum of squared errors includes determining a Lagrangian rate-distortion optimization expression from the target bit number and the target sum of squared errors;
the rate-distortion optimization expression is:
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new
where D(Motion)_ori represents the sum of squared errors recalculated from the original transform coefficients through inverse transform and inverse quantization, and R(Motion)_new represents the target bit number required by the current coding mode after the high-frequency coefficients are set to zero.
Another aspect of the embodiments of the present invention further provides a transform block motion-based temporal perceptual coding apparatus, including:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocities in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Still another aspect of embodiments of the present invention provides a computer-readable storage medium, which stores a program,
the program is executed by a processor to implement the method as described above.
Another aspect of the embodiments of the invention is a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The following detailed description of the implementation principle of the present invention is made with reference to the accompanying drawings:
first, the visual sensitivity model is introduced:
for the visual sensitivity model, assuming that the highest perceivable frequency of a stationary object is 32cyc les/deg, the perceivable frequency is significantly reduced when the image moves at a high speed. As shown in fig. 1, the perceptible frequency of the human eye decreases significantly as the speed of the retina of an object increases. According to this rule, the highest perceivable frequency can be obtained with equation (1).
K = K_max · v_c / (v_c + V_r)   (1)
where K_max is the highest perceptible frequency of the human eye (32 cycles/deg), v_c is the angular velocity (2 deg/s), and V_r is the retinal velocity of the object motion.
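As a sanity check, equation (1) can be evaluated directly. A minimal Python sketch (the function name is illustrative, not from the patent):

```python
def max_perceptible_frequency(v_r, k_max=32.0, v_c=2.0):
    """Highest perceptible spatial frequency (cycles/deg) for an object
    moving at retinal velocity v_r (deg/s), per equation (1):
    K = k_max * v_c / (v_c + v_r)."""
    return k_max * v_c / (v_c + v_r)
```

For a stationary object (v_r = 0) this gives the full 32 cycles/deg; at v_r = 14 deg/s it drops to 4 cycles/deg, consistent with Table 1.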
The following introduces a transform block motility based time-domain perceptual coding algorithm:
in AVS3 video coding, the transform block size may be 4x4, 8x8, 16x16, 64x 64. The invention mainly aims at a 4x4, 8x8 transformation block, and determines the retina velocity V of the current transformation block by obtaining the motion vector MV of the transformation block, the video frame rate fps, the quantization parameter QP and the video resolution (high video bandwidth)r. Specifically, the larger the MV of the transform block indicates that the current block moves faster. The larger the video fps is, the faster the video playing speed is, and the motion of the transformation block is faster at the moment. The smaller the quantization parameter QP, the more high frequency coefficients the transform block contains. The greater the resolution of the video frame, the more high frequency coefficients the transform block can reject. Since the boundary effect of large blocks is more significant, the larger the transform block, the fewer high frequency coefficients can be rejected.
Based on this principle, the invention substitutes these parameters into equation (2) to calculate the motility coefficient M of the transform block, and then substitutes M into equations (3) and (4) to obtain V_r. Since MV has horizontal and vertical components, the calculated V_r is likewise split into horizontal and vertical components.
[equation (2), shown as an image in the original: M = f(QP, fps, p_w_log2, p_h_log2, tu_w_log2, tu_h_log2, c)]
V_r^x = M · MV_x   (3)
V_r^y = M · MV_y   (4)
Parameter description: fps is the video frame rate; p_w_log2 and p_h_log2 are the base-2 logarithms of the video width and height, respectively; QP is the quantization parameter; tu_w_log2 and tu_h_log2 are the base-2 logarithms of the transform block width and height, respectively; c is an adjustable parameter; MV_x and MV_y are the horizontal and vertical components of the motion vector; V_r^x and V_r^y are the horizontal and vertical retinal velocities of the transform block.
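Given a motility coefficient M (equation (2) itself appears only as an image, so M is treated here as a precomputed input), equations (3) and (4) reduce to scaling the motion-vector components. A hedged Python sketch (names are illustrative):

```python
def retinal_velocity(mv_x, mv_y, m):
    """Equations (3) and (4): scale the motion-vector components by the
    motility coefficient M to get the horizontal and vertical retinal
    velocities. abs() is used because retinal speed is a magnitude
    (an assumption; the patent's image formulas may define sign handling)."""
    return m * abs(mv_x), m * abs(mv_y)
```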
After V_r^x and V_r^y are calculated, they are substituted into equation (1) to obtain the maximum perceptible frequencies K^x and K^y in the horizontal and vertical directions.
for the 4x4 and 8x8 transform blocks, the high frequency transform coefficients are distributed at the lower right, i.e., the larger the number of columns and rows, the more the corresponding coefficients conform to the high frequency coefficients to be eliminated by the design. From equation (1), K can be calculatediAnd VrThe correspondence relationship of (a) is shown in table 1:
TABLE 1
K_i (cycles/deg)   4      8      12     16     20     24     28
V_r (deg/sec)      14.0   6.0    3.34   2.0    1.2    0.67   0.29
Table 1 describes the correspondence between K_i and V_r. According to Table 1, the invention assigns perceptual frequencies to the rows and columns of a transform block as shown in fig. 2. Specifically, fig. 2 maps the results of Table 1 into the transform block: the maximum perceptual frequency of a row or column is 32, and the frequency step between adjacent rows/columns is 4 for 8x8 blocks and 8 for 4x4 blocks. After the mapping of fig. 2 is obtained, V_r in the horizontal and vertical directions of the transform block is calculated to determine the maximum perceptible frequencies in the two directions; the high-frequency coefficients on rows and columns whose frequencies exceed these limits are set to zero (marked by a red × in fig. 2).
For 4x4 and 8x8 transform blocks, after K^x and K^y are calculated, the high-frequency coefficients on the columns and rows whose mapped frequencies are greater than or equal to K^x and K^y, respectively, are set to zero, thereby reducing the number of transform coefficients and ultimately lowering the coding bit rate.
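The mapping-and-zeroing step can be sketched as follows, assuming (as the description suggests; the original figure is not reproduced) that row/column index j maps to perceptual frequency step·(j+1), with a step of 4 for 8x8 blocks and 8 for 4x4 blocks. The function name and plain-list representation are illustrative:

```python
def zero_high_freq(coeffs, k_x, k_y):
    """Zero the transform coefficients on columns/rows whose mapped
    perceptual frequency is >= the horizontal/vertical maximum
    perceptible frequency (K^x, K^y), per the fig. 2 mapping."""
    n = len(coeffs)              # block size: 4 or 8
    step = 32 // n               # frequency step: 8 for 4x4, 4 for 8x8
    out = [row[:] for row in coeffs]
    for j in range(n):
        f = step * (j + 1)       # mapped frequency of row/column j
        if f >= k_y:             # row j exceeds the vertical limit
            out[j] = [0] * n
        if f >= k_x:             # column j exceeds the horizontal limit
            for i in range(n):
                out[i][j] = 0
    return out
```

For example, in an 8x8 block with K^x = K^y = 17, only the top-left 4x4 quarter (mapped frequencies 4 through 16) survives.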
The principle of modifying the Rate Distortion Optimization (RDO) function is described in further detail below:
one of the main tasks of a video encoder is to select the optimal encoding parameters with a certain strategy to achieve the optimal encoding performance. The coding parameter optimization method based on the rate distortion theory is called rate distortion optimization, and the rate distortion optimization technology is a main means for ensuring the coding efficiency of a coder.
Inter-frame prediction predicts the current coding block using pixels of previously coded pictures, and adopts a Lagrangian rate-distortion optimization method, as shown in equation (5).
min J, J = D(Motion) + λ_Mode · R(Motion)   (5)
where D(Motion) and R(Motion) represent the distortion and the number of bits when different motion modes (motion vector, reference picture, prediction weights, etc.) are adopted, and λ_Mode is the Lagrangian multiplier. The optimal prediction mode is the motion mode with the minimum rate-distortion cost.
The main goal of the temporal perceptual coding algorithm is to zero out the high-frequency coefficients of transform blocks with high motility; after this processing, those high-frequency coefficients are removed. In the RDO process, D(Motion) is generally computed as the Sum of Squared Differences (SSD). After the high-frequency coefficients of a transform block are removed, the SSD obtained after inverse transform and inverse quantization is much larger than the original SSD, which could cause the encoder to select other coding parameters and thereby reduce coding efficiency.
To avoid a large impact on the encoder's RDO selection of optimal coding parameters, before the high-frequency coefficients of the transform block are eliminated, the original transform coefficients are copied, and the SSD is computed by inverse transforming and inverse quantizing these original coefficients. The resulting SSD therefore does not grow because of the elimination of the high-frequency coefficients, and the Lagrangian rate-distortion optimization can be expressed by equation (6).
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new   (6)
where D(Motion)_ori represents the SSD recomputed from the original transform coefficients via inverse transform and inverse quantization, and R(Motion)_new represents the number of bits required in the current mode after the high-frequency coefficients are removed.
With this modified rate-distortion calculation, the encoder avoids the loss of coding efficiency that would otherwise result from mode-selection misjudgments caused by an inflated SSD.
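Mode selection under the modified cost of equation (6) amounts to minimizing D_ori + λ·R_new over the candidate modes. A minimal sketch (names are illustrative, not from the patent or the HPM software):

```python
def rd_cost(d_ori, r_new, lam):
    """Equation (6): distortion from the original (un-zeroed) coefficients
    plus lambda times the bit count after high-frequency zeroing."""
    return d_ori + lam * r_new

def best_mode(candidates, lam):
    """candidates: iterable of (mode, d_ori, r_new) tuples; returns the
    mode with the minimum rate-distortion cost J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]
```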
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise indicated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those of ordinary skill in the art will be able to practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for temporal perceptual coding based on motion of transform blocks, comprising:
acquiring target parameters of a target transformation block in a video coding process, wherein the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
calculating a motility coefficient of the target transformation block according to the target parameter;
determining the retina speeds in the horizontal direction and the vertical direction according to the motion vector and the motion coefficient;
determining the maximum perceptible frequency in the horizontal direction and the vertical direction according to the retina speed;
determining a target bit number and a target sum of square errors according to the maximum perceptible frequency;
determining an optimal coding mode according to the target bit number and the target sum of square errors;
wherein the optimal encoding mode is used for video encoding.
2. The method of claim 1, wherein the size of the target transform block is 4x4 or 8x8, and the motion coefficient is calculated by:
[formula given only as an image (FDA0003545843400000011) in the original publication]
wherein M represents the motility coefficient; QP represents the quantization parameter; fps represents the video frame rate; p_w_log2 represents the base-2 logarithm of the video width; p_h_log2 represents the base-2 logarithm of the video height; tu_w_log2 represents the base-2 logarithm of the target transform block width; tu_h_log2 represents the base-2 logarithm of the target transform block height; and C represents an adjustable parameter.
3. The method of claim 1, wherein the retinal velocities are calculated by the following formulas:
[formulas given only as images (FDA0003545843400000012, FDA0003545843400000013) in the original publication]
wherein V_rx represents the retinal velocity in the horizontal direction; M represents the motility coefficient; MV_x is the horizontal component of the motion vector; V_ry represents the retinal velocity in the vertical direction; and MV_y is the vertical component of the motion vector.
4. The method according to claim 1, wherein the maximum perceptible frequency is calculated by:
[formula given only as an image (FDA0003545843400000016) in the original publication]
wherein K_i represents the maximum perceptible frequency and i denotes the horizontal or vertical component; K_max represents the highest frequency perceptible by the human eye; v_c represents an angular velocity; and V_r represents the retinal velocity of the object motion.
5. The method according to claim 1, wherein said determining a target bit number and a target sum of squared errors according to the maximum perceptible frequency comprises:
zeroing the high-frequency coefficients of the target transform block according to the maximum perceptible frequencies in the horizontal direction and the vertical direction;
calculating the target bit number required by the current coding mode after the high-frequency coefficients are set to zero;
and obtaining the original transform coefficients, and performing inverse quantization and inverse transform on them to obtain the target sum of squared errors.
6. The method as claimed in claim 5, wherein said determining an optimal coding mode according to the target bit number and the target sum of squared errors comprises: determining an expression of Lagrangian-based rate-distortion optimization according to the target bit number and the target sum of squared errors;
the expression of the rate distortion optimization is as follows:
min J, J = D(Motion)_ori + λ_Mode · R(Motion)_new
wherein D(Motion)_ori represents the sum of squared errors recalculated from the original transform coefficients through inverse quantization and inverse transform, and R(Motion)_new represents the target bit number required by the current coding mode after the high-frequency coefficients are set to zero.
7. An apparatus for temporal perceptual coding based on motion of transform blocks, comprising:
the video coding device comprises a first module, a second module and a third module, wherein the first module is used for acquiring target parameters of a target transformation block in a video coding process, and the target parameters comprise a motion vector, a video frame rate, a quantization parameter and a video resolution;
a second module, configured to calculate a motility coefficient of the target transform block according to the target parameter;
a third module for determining the retinal velocity in the horizontal direction and the vertical direction according to the motion vector and the motility coefficient;
a fourth module for determining maximum perceptible frequencies in a horizontal direction and a vertical direction from the retinal velocity;
a fifth module, configured to determine a target bit number and a target sum of squared errors according to the maximum perceivable frequency;
a sixth module, configured to determine an optimal coding mode according to the target bit number and the target sum of squared errors;
wherein the optimal encoding mode is used for video encoding.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 6 when executed by a processor.
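The mode decision claimed above (claims 1, 5 and 6) can be outlined in code. Because the patent's formulas for the motility coefficient, retinal velocity and maximum perceptible frequency appear only as images in the published text, this sketch treats those steps as opaque and makes only the claim-6 decision rule concrete; all names are hypothetical.

```python
# Hypothetical sketch of the claim-6 decision rule, not the patent's code.

def select_mode(candidate_modes, lam, evaluate):
    """Pick the motion mode minimizing J = D(Motion)_ori + lambda_Mode * R(Motion)_new.

    evaluate(mode) -> (d_ori, r_new), where d_ori is the sum of squared
    errors computed from the original (un-zeroed) transform coefficients,
    and r_new is the bit count after the high-frequency coefficients are
    zeroed per the maximum perceptible frequencies (claim 5).
    """
    best_mode, best_j = None, float('inf')
    for mode in candidate_modes:
        d_ori, r_new = evaluate(mode)
        j = d_ori + lam * r_new
        if j < best_j:
            best_mode, best_j = mode, j
    return best_mode, best_j
```

A mode with slightly higher distortion but far fewer bits can win under this rule, which is the intended effect of zeroing imperceptible high-frequency coefficients.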
CN202210248448.XA 2022-03-14 2022-03-14 Time domain perceptual coding method based on transform block motion Pending CN114666586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210248448.XA CN114666586A (en) 2022-03-14 2022-03-14 Time domain perceptual coding method based on transform block motion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210248448.XA CN114666586A (en) 2022-03-14 2022-03-14 Time domain perceptual coding method based on transform block motion

Publications (1)

Publication Number Publication Date
CN114666586A true CN114666586A (en) 2022-06-24

Family

ID=82028564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210248448.XA Pending CN114666586A (en) 2022-03-14 2022-03-14 Time domain perceptual coding method based on transform block motion

Country Status (1)

Country Link
CN (1) CN114666586A (en)

Similar Documents

Publication Publication Date Title
US11553182B2 (en) Method and device for encoding or decoding image
CN111988611B (en) Quantization offset information determining method, image encoding device and electronic equipment
US11082707B2 (en) Encoding method and apparatus, image processing system, and computer-readable storage medium
US20180184128A1 (en) Non-local adaptive loop filter combining multiple denoising technologies and grouping image patches in parallel
MX2012011650A (en) Method and apparatus for encoding and decoding image and method and apparatus for decoding image using adaptive coefficient scan order.
KR20190122615A (en) Method and Apparatus for image encoding
US20170374361A1 (en) Method and System Of Controlling A Video Content System
KR20170084213A (en) Systems and methods for processing a block of a digital image
RU2707719C1 (en) Scanning order selection method and device
CN114040211A (en) AVS 3-based intra-frame prediction rapid decision-making method
CN113906762B (en) Pre-processing for video compression
Yang et al. Fast intra encoding decisions for high efficiency video coding standard
JP2009135902A (en) Encoding device, control method of the encoding device, and computer program
CN110581990B (en) TU (TU) recursion fast algorithm suitable for HEVC (high efficiency video coding) 4K and 8K ultra-high definition coding
Zhao et al. Fast CU partition decision strategy based on human visual system perceptual quality
CN114666586A (en) Time domain perceptual coding method based on transform block motion
CN110855973B (en) Video intra-frame fast algorithm based on regional directional dispersion sum
WO2021263251A1 (en) State transition for dependent quantization in video coding
US11102488B2 (en) Multi-scale metric-based encoding
CN109889829B (en) Fast sample adaptive compensation for 360 degree video
US9838713B1 (en) Method for fast transform coding based on perceptual quality and apparatus for the same
JP2016082395A (en) Encoder, coding method and program
Mao et al. A fast intra prediction algorithm based on wmse for 360-degree video
Pathak et al. Low bit rate Intra prediction coding for Medical image Sequences using HEVC Standard
JP2014036393A (en) Image encoding device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination