CN111247804B - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN111247804B
Authority
CN
China
Prior art keywords
motion vector
image block
sub
prediction
cpmv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980005232.7A
Other languages
Chinese (zh)
Other versions
CN111247804A (en)
Inventor
孟学苇 (Meng Xuewei)
郑萧桢 (Zheng Xiaozhen)
王苫社 (Wang Shanshe)
马思伟 (Ma Siwei)
Current Assignee
Peking University
SZ DJI Technology Co Ltd
Original Assignee
Peking University
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Peking University and SZ DJI Technology Co Ltd
Publication of CN111247804A
Application granted
Publication of CN111247804B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 … using predictive coding
    • H04N19/503 … involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/176 … the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus for image processing are provided. The method comprises: acquiring the motion vectors (CPMVs) of the control points of an image block; and acquiring, according to the CPMVs of the image block, a motion vector of a sub-image block in the image block, the motion vector having integer-pixel precision. Because the motion vector of the sub-image block, which serves as the image processing unit, has integer-pixel precision, the motion compensation process of the sub-image block involves no sub-pixels, so the bandwidth pressure generated by the Affine prediction technique can be reduced to some extent.

Description

Image processing method and device
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the patent and trademark office patent files or records.
Technical Field
The present application relates to the field of image processing, and more particularly, to a method and apparatus for image processing.
Background
The general idea of inter prediction in video coding is to exploit the temporal correlation between adjacent frames of a video: a previously encoded reconstructed frame is used as a reference frame to predict the current frame through motion estimation and motion compensation, thereby removing the temporal redundancy of the video. The general flow of inter prediction includes motion estimation (ME) and motion compensation (MC). For the current coding block of the current frame, the most similar block is searched for in the reference frame and used as the prediction block of the current block; the relative displacement between the current block and the similar block is the motion vector (MV). Motion estimation is the process of obtaining the motion vector by searching and comparing in the reference frame for the current coding block of the current frame. Motion compensation is the process of deriving the predicted frame using the MV and the reference frame. The predicted frame obtained by motion compensation may differ somewhat from the original current frame, so the difference (residual) between the predicted frame and the current frame needs to be transformed, quantized, and so on, and then transmitted to the decoding end, together with the MV and reference-frame information. The decoding end can then reconstruct the current frame from the MV, the reference frame, and the difference between the predicted frame and the current frame.
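As a minimal, hedged sketch of the motion estimation and compensation described above (the block size, search range, and data below are invented for illustration, not taken from the application), integer-pixel full-search motion estimation can be written as:

```python
import numpy as np

def motion_estimation(cur_block, ref_frame, top, left, search_range=4):
    """Full-search integer-pel motion estimation: around the current
    block's position (top, left), find the displacement (dx, dy) that
    minimises the SAD against a candidate block in the reference frame."""
    h, w = cur_block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate window falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w].astype(int)
            sad = int(np.abs(cur_block.astype(int) - cand).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

# Toy example: the current 4x4 block is reference content shifted by one
# pixel to the right, so the best MV should be (dx=1, dy=0) with SAD 0.
ref = np.arange(64, dtype=np.uint8).reshape(8, 8)
cur = ref[2:6, 3:7].copy()
mv, sad = motion_estimation(cur, ref, top=2, left=2, search_range=2)
print(mv, sad)  # (1, 0) 0
```

Motion compensation then simply fetches the reference block displaced by the found MV as the prediction block; the residual is the current block minus that prediction.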
Due to the continuity of natural motion, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units. To improve the accuracy of motion vectors, sub-pixel accuracy was introduced. For example, the High Efficiency Video Coding (HEVC) standard uses motion vectors with 1/4-pixel precision for motion estimation of the luminance component. However, digital video contains no samples at fractional-pixel positions; in general, to achieve 1/K-pixel-accuracy estimation, the values of these fractional pixels must be approximated by interpolation, i.e., the reference frame is interpolated by a factor of K in both the row and column directions, and the prediction block is searched for in the interpolated reference frame. Interpolating for the current block requires the pixels inside the current block as well as pixels in its neighboring regions.
Generally, only conventional motion models (e.g., translational motion) are considered in the inter-prediction process. In the real world, however, there is a wide variety of motion forms, such as irregular motions like zoom, rotation, and perspective change. To take these motion forms into account, the affine motion compensated prediction technique (hereinafter simply Affine) was introduced in VTM-3.0. In the Affine mode, the affine motion field of an image block can be derived from the motion vectors of two control points (four parameters) or three control points (six parameters).
In the Affine mode, motion vectors with 1/4 pixel precision, 1/16 pixel precision, or other sub-pixel precision may be employed for motion estimation of the image processing unit. The image processing unit of the Affine technique is a sub-CU (which may be referred to as a sub-block) having a size of 4×4 (units: pixels), which may cause the Affine technique to generate a larger bandwidth pressure.
Disclosure of Invention
The application provides an image processing method and device, which can reduce bandwidth pressure caused by an Affine prediction technology to a certain extent.
In a first aspect, there is provided a method of image processing, the method comprising: acquiring a motion vector CPMV of a control point of an image block; and acquiring a motion vector of the sub-image block in the image block according to the CPMV of the image block, wherein the motion vector is of the whole pixel precision.
In a second aspect, there is provided an apparatus for image processing, the apparatus comprising: a first acquisition unit for acquiring a motion vector CPMV of a control point of an image block; and the second acquisition unit is used for acquiring the motion vector of the sub-image block in the image block according to the CPMV of the image block acquired by the first acquisition unit, wherein the motion vector is of the whole pixel precision.
In a third aspect, there is provided an apparatus for image processing, the apparatus comprising a memory for storing instructions and a processor for executing the instructions stored in the memory; execution of the instructions stored in the memory causes the processor to perform the method provided in the first aspect.
In a fourth aspect, a chip is provided, the chip including a processing module and a communication interface, the processing module being configured to control the communication interface to communicate with the outside, the processing module being further configured to implement the method provided in the first aspect.
In a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a computer causes the computer to implement the method of the first aspect or any of the possible implementations of the first aspect.
In a sixth aspect, there is provided a computer program product containing instructions which, when executed by a computer, cause the computer to carry out the method provided by the first aspect.
According to the solution provided by the application, the motion vector of the sub-image block serving as the image processing unit is made to have integer-pixel precision, so that no sub-pixels are involved in the motion compensation process of the sub-image block, and the bandwidth pressure generated by the Affine prediction technique can be reduced to a certain extent.
Drawings
Fig. 1 is a schematic diagram of a video coding architecture.
FIG. 2 is a schematic diagram of 1/4 pixel interpolation.
Fig. 3 (a) and 3 (b) are schematic diagrams of a four-parameter Affine model and a six-parameter Affine model, respectively.
Fig. 4 is a schematic diagram of an Affine motion vector field.
Fig. 5 is a diagram comparing reference pixels required for the Affine mode and the HEVC mode of the prior art.
Fig. 6 is a schematic flow chart of a method of image processing according to an embodiment of the application.
Fig. 7 is another schematic flow chart of a method of image processing according to an embodiment of the present application.
Fig. 8 is a further schematic flow chart of a method of image processing according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for image processing according to an embodiment of the present application.
Fig. 10 is another schematic block diagram of an apparatus for image processing according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
To facilitate an understanding of the solution according to an embodiment of the application, several related concepts are first described below.
1. Inter prediction
As shown in fig. 1, the video coding framework mainly includes several parts, i.e., intra prediction, inter prediction, transformation, quantization, entropy coding, and loop filtering.
The present application is primarily directed to improvements in the inter prediction (inter prediction) section.
The general idea of inter prediction is to exploit the temporal correlation between adjacent frames of a video: the current frame is predicted through motion estimation (ME) and motion compensation (MC), using a reconstructed frame as the reference frame, thereby removing the temporal redundancy of the video.
The current frame referred to herein represents a frame currently being encoded in an encoding scene and represents a frame currently being decoded in a decoding scene.
The reconstructed frame referred to herein, in the encoding scenario, represents a frame that has been previously encoded, and in the decoding scenario, represents a frame that has been previously decoded.
In the encoding process, a frame of image is generally not processed as a whole; it is divided into image blocks for processing.
As an example, the whole frame is first divided into coding tree units (CTUs), for example of size 64×64 or 128×128 (unit: pixels); a CTU may then be further divided into square or rectangular coding units (CUs). The encoding process operates on CUs.
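As an illustrative sketch (the frame and CTU sizes here are arbitrary examples, not taken from the application), the CTU grid covering a frame can be computed as:

```python
import math

def ctu_grid(frame_w, frame_h, ctu_size=128):
    """Number of CTUs needed to tile a frame; CTUs at the right and
    bottom edges may only be partially covered by picture samples."""
    cols = math.ceil(frame_w / ctu_size)
    rows = math.ceil(frame_h / ctu_size)
    return cols, rows, cols * rows

# A 1920x1080 frame with 128x128 CTUs: 15 columns x 9 rows = 135 CTUs.
print(ctu_grid(1920, 1080))  # (15, 9, 135)
```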
The units of the sizes of the image blocks mentioned herein are pixels.
The general flow of inter prediction is as follows.
For a current image block (hereinafter simply referred to as a current block) in a current frame, a most similar block is found in a reference frame as a prediction block of the current block. The relative displacement between the current block and the similar block is called Motion Vector (MV). Motion estimation refers to a process of obtaining a motion vector by searching and comparing a current block of a current frame in a reference frame. Motion compensation refers to the process of obtaining a prediction block using a reference block and motion vectors obtained by motion estimation.
The prediction block obtained by the process of inter prediction may have a certain difference from the original current block, and thus, a difference between the prediction block and the current block, which may be referred to as a residual, needs to be calculated. After the residual is transformed, quantized, entropy coded, etc., the coded bit stream is obtained.
At the encoding end, after image encoding is completed, i.e., after the bit stream is obtained by entropy encoding, the bit stream, together with the encoding-mode information, such as the inter-prediction mode and motion-vector information, is stored or transmitted to the decoding end.
At the decoding end, after the entropy-coded bit stream is obtained, it is entropy-decoded to obtain the corresponding residual; the prediction block is then obtained according to the decoded coding-mode information, such as the motion vector; finally, the value of each pixel in the current block is obtained from the residual and the prediction block, i.e., the current block is reconstructed, and in the same way the current frame is reconstructed block by block.
As shown in fig. 1, the encoding process may further include the steps of dequantization and inverse transformation. Inverse quantization refers to the process that is the inverse of the quantization process. Inverse transformation refers to the process that is the inverse of the transformation process.
Inter prediction mainly includes forward prediction, backward prediction, and bi-prediction. Forward prediction predicts the current frame using a reconstructed frame preceding it (which may be referred to as a "historical frame"). Backward prediction predicts the current frame using a frame following it (which may be referred to as a "future frame"). Bi-prediction may be bidirectional, i.e., predicting the current frame using both a "historical frame" and a "future frame"; it may also use two predictions in the same direction, e.g., two "historical frames" or two "future frames".
2. Sub-pixel precision motion estimation
In a real scene, due to the continuity of natural motion, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units; the accuracy of motion estimation therefore needs to be raised to the sub-pixel level (also referred to as 1/K-pixel accuracy). For example, the HEVC standard uses motion vectors with 1/4-pixel precision for motion estimation of the luminance component.
However, digital video contains no samples at 1/K-pixel positions. In general, to perform motion estimation with 1/K-pixel accuracy, the values at 1/K-pixel positions are obtained approximately by interpolation; in other words, the reference frame is interpolated by a factor of K in both the row and column directions, and the search is performed in the interpolated image. The interpolation process for the current block requires the pixels inside the current block as well as pixels in its neighboring regions.
As an example, a 1/4-pixel interpolation process is shown in fig. 2. For an image block of size 8×8, 4×4, or 8×4, the pixel values of the interpolation points are generated using the 3 pixel points to the left and the 4 pixel points to the right outside the image block. As shown in fig. 2, for an image block of size 4×4, a0,0 and d0,0 are 1/4-pixel points, b0,0 and h0,0 are half-pixel points, and c0,0 and n0,0 are 3/4-pixel points. Suppose the current block is the 2×2 block enclosed by A0,0 to A1,0 and A0,0 to A0,1. To calculate all the interpolation points in this 2×2 block, points outside the 2×2 block are needed: 3 on the left, 4 on the right, 3 above, and 4 below.
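The pixel counting implied above can be sketched as follows (assuming a separable 8-tap interpolation filter, i.e., 3 extra integer pixels on one side and 4 on the other in each direction, as in the example):

```python
def interp_support(w, h, taps=8):
    """Integer reference pixels needed to interpolate every fractional
    position inside a w x h block with a separable `taps`-tap filter:
    (taps - 1) extra rows/columns are needed around the block, e.g.
    3 on the left/top and 4 on the right/bottom for an 8-tap filter."""
    return (w + taps - 1) * (h + taps - 1)

print(interp_support(2, 2))  # 81: the 2x2 example needs a 9x9 integer window
print(interp_support(4, 4))  # 121 = (4 + 7) * (4 + 7)
```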
3. Affine motion compensated prediction technique (Affine motion compensated prediction, hereinafter referred to as Affine).
Affine is an inter-frame prediction technique.
In the HEVC standard, the inter-prediction process only considers traditional motion models (e.g., translational motion). However, in the real world there are also a wide variety of motion patterns, such as zoom, rotation, perspective, etc. irregular motion. In order to take the above motion patterns into consideration, in VTM-3.0, the Affine technique is introduced.
As shown in fig. 3, a motion field of an Affine pattern may be derived by motion vectors of two control points (four parameters) as shown in fig. 3 (a) or three control points (six parameters) as shown in fig. 3 (b).
Hereinafter, the motion vector of a control point (control point motion vector) is abbreviated as CPMV.
The processing unit of Affine is not the CU itself but the sub-blocks (sub-CUs) obtained by dividing the CU, each of size 4×4. In the Affine mode, each sub-CU has one MV. In other words, unlike an ordinary CU, a CU in Affine mode has more than one MV: it has as many MVs as it has sub-CUs.
As an example, the MVs of the sub-CUs in one CU are derived from the CPMVs of two or three control points as shown in fig. 3. For example, for the four-parameter Affine motion model, the MV of the sub-CU located at position (x, y) is calculated by the following formula:

mv_x = ((mv_1x - mv_0x) / W) * x - ((mv_1y - mv_0y) / W) * y + mv_0x
mv_y = ((mv_1y - mv_0y) / W) * x + ((mv_1x - mv_0x) / W) * y + mv_0y          formula (1)

For another example, for the six-parameter Affine motion model, the MV of the sub-CU located at position (x, y) is calculated by the following formula:

mv_x = ((mv_1x - mv_0x) / W) * x + ((mv_2x - mv_0x) / H) * y + mv_0x
mv_y = ((mv_1y - mv_0y) / W) * x + ((mv_2y - mv_0y) / H) * y + mv_0y          formula (2)

where (mv_0x, mv_0y) is the MV of the upper-left control point, (mv_1x, mv_1y) is the MV of the upper-right control point, and (mv_2x, mv_2y) is the MV of the lower-left control point; W is the width and H the height of the CU to which the sub-CU belongs.
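The four-parameter and six-parameter derivations of formulas (1) and (2) can be sketched in Python (a plain floating-point version for illustration; real codecs use fixed-point arithmetic in 1/16-pel units):

```python
def affine_mv_4param(cpmv0, cpmv1, w, x, y):
    """Sub-block MV at position (x, y) from the upper-left CPMV cpmv0 and
    upper-right CPMV cpmv1 of a CU of width w (four-parameter model)."""
    mv0x, mv0y = cpmv0
    mv1x, mv1y = cpmv1
    mvx = (mv1x - mv0x) / w * x - (mv1y - mv0y) / w * y + mv0x
    mvy = (mv1y - mv0y) / w * x + (mv1x - mv0x) / w * y + mv0y
    return mvx, mvy

def affine_mv_6param(cpmv0, cpmv1, cpmv2, w, h, x, y):
    """Six-parameter model: cpmv2 is the lower-left CPMV, h the CU height."""
    mv0x, mv0y = cpmv0
    mv1x, mv1y = cpmv1
    mv2x, mv2y = cpmv2
    mvx = (mv1x - mv0x) / w * x + (mv2x - mv0x) / h * y + mv0x
    mvy = (mv1y - mv0y) / w * x + (mv2y - mv0y) / h * y + mv0y
    return mvx, mvy

# Identical CPMVs mean pure translation: every sub-block gets the same MV.
print(affine_mv_4param((3, 5), (3, 5), 16, 7, 9))  # (3.0, 5.0)
```

Note that at (0, 0) the formulas return cpmv0, at (w, 0) cpmv1, and, for the six-parameter model, at (0, h) cpmv2, which is a quick sanity check of the model.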
Fig. 4 shows a schematic diagram of the motion vectors in a CU obtained by the above calculation; each square represents a sub-CU of size 4×4. The sub-CU MVs obtained from the above formulas are converted into a representation with 1/16-pixel precision, i.e., the highest precision of a sub-CU MV is 1/16 pixel.
After the MV of each sub-CU is calculated, a prediction block of each sub-CU is obtained through a motion compensation process. The sub-CU sizes of the chrominance component and the luminance component are both 4×4, and the motion vector of the chrominance component 4×4 block is averaged from the motion vectors of its corresponding four 4×4 luminance components.
In the encoding process of the Affine mode, CPMV information is written in the code stream, and MV information of each sub-CU does not need to be written.
4. Adaptive motion vector precision (Adaptive Motion Vector Resolution, AMVR)
The AMVR technique enables a CU to have motion vectors with integer-pixel precision or sub-pixel precision. Integer-pixel precision may be, for example, 1-pixel or 2-pixel precision. Sub-pixel precision may be, for example, 1/2-, 1/4-, 1/8-, or 1/16-pixel precision.
For example, for each CU employing the Affine AMVR technique (in some cases, the CU may not employ the Affine AMVR), its corresponding MV precision is adaptively decided at the encoding end, and the decision result is written into the code stream and transmitted to the decoding end.
The integer-pixel or sub-pixel precision mentioned in the Affine AMVR technique refers to the pixel precision of the CPMVs, not of the sub-CU MVs.
When the CPMVs have integer-pixel precision, the motion estimation process of the CU is an integer-pixel process; however, the sub-CU MVs obtained from formula (1) or formula (2) may still have 1/4-pixel or other sub-pixel precision.
If the MV of a sub-CU has sub-pixel precision, the motion compensation process of the sub-CU involves sub-pixels; since the sub-CU size is 4×4, this causes the Affine prediction process to generate larger bandwidth pressure.
The applicant performed simulations on VTM-4.0, the latest VVC reference software at the time, using the official common test data as test sequences; the result is shown in fig. 5.
As shown in fig. 5, the left box represents the HEVC worst case: an 8×8 bi-directionally inter-predicted CU with 1/4-pixel-precision MVs, for which the number of required reference pixels is (8+7)×(8+7)×2 = 450. The right box represents the worst case of the Affine mode in VVC: 4×4 bi-directional inter prediction with 1/16- and 1/4-pixel-precision MVs, for which the four 4×4 sub-CUs covering the same 8×8 area require (4+7)×(4+7)×2×4 = 968 reference pixels.
As can be seen from fig. 5, the existing Affine mode requires 115% more reference pixels than HEVC, resulting in larger bandwidth pressure.
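The two pixel counts in fig. 5 can be reproduced with a short calculation; the (size + 7) window assumes 8-tap interpolation filters, as in the worst cases described above:

```python
def ref_pixels_bipred(w, h, taps=8, sub_blocks=1):
    """Reference samples fetched for one bi-predicted block: each of the
    two prediction directions needs a (w + taps - 1) x (h + taps - 1)
    window of integer pixels, for each sub-block."""
    return (w + taps - 1) * (h + taps - 1) * 2 * sub_blocks

hevc = ref_pixels_bipred(8, 8)                   # one 8x8 HEVC CU
affine = ref_pixels_bipred(4, 4, sub_blocks=4)   # four 4x4 Affine sub-CUs (same 8x8 area)
increase = round((affine - hevc) / hevc * 100)
print(hevc, affine, increase)  # 450 968 115
```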
In view of the above problems, the present application provides an image processing method and apparatus, which can reduce bandwidth pressure generated by an Affine technology to a certain extent.
The application is applicable to the technical field of digital video coding, in particular to the inter-frame prediction part of a video codec. The application can be applied to codecs conforming to international video coding standards such as H.264/HEVC and the Chinese AVS2 standard, as well as to codecs conforming to next-generation video coding standards such as VVC or AVS3.
The present application can be applied to an inter prediction portion of a video codec, that is, a method of image processing according to an embodiment of the present application can be performed by an encoding apparatus or a decoding apparatus.
Fig. 6 is a schematic flow chart of a method 600 of image processing provided by the present application, the method 600 comprising the following steps.
610: Acquire the motion vector (CPMV) of a control point of the image block.
The manner of acquiring the CPMV of the image block will be described below, and will not be described here.
620: According to the CPMVs of the image block, acquire the motion vector of a sub-image block in the image block, the motion vector having integer-pixel precision.
In other words, based on the CPMVs of the image block, the motion vector of the sub-image block in the image block is acquired such that its pixel precision is integer-pixel.
The sub-image blocks mentioned in the present application represent processing units for image processing or video processing. The sub-image block may be less than 8 pixels wide and/or high. For example, the sub-image block has a size of 4×4 (pixels).
The sub-picture blocks may be blocks obtained by dividing the picture blocks. It will be appreciated that a sub-image block may be considered to be the image block itself if the image block is the same size as the sub-image block.
The sub-image blocks may be square blocks, for example, blocks of size 4×4 or 8×8, or rectangular blocks, for example, blocks of size 2×4 or 4×8.
The size of the image block mentioned in the present application may be 16×16, 16×8, 16×4, 8×16, 4×8, 8×8, 8×4, or other sizes.
It should be appreciated that because the motion vector of the sub-image block serving as the processing unit has integer-pixel precision, no sub-pixels are involved in the motion compensation process of the sub-image block, so the bandwidth pressure generated in the video inter-prediction process can be reduced.
The process of acquiring the motion vector of a sub-image block in the image block according to the CPMVs of the image block may include: calculating the motion vector of the sub-image block according to the motion vectors of two or three control points of the image block, such that the resulting motion vector of the sub-image block has integer-pixel precision.
As an example, the motion vector of the sub-image block may be calculated according to the formula (1) or the formula (2) described above.
Alternatively, in some embodiments, if the motion vector of the sub-image block calculated directly from the CPMVs of the image block already has integer-pixel precision, this motion vector is the motion vector of the sub-image block to be acquired.
For example, as one possible implementation, an algorithm is used to calculate the motion vector of the sub-image block from the CPMVs of the image block, where the algorithm guarantees that the calculated motion vector has integer-pixel precision.
Alternatively, in some embodiments, if the motion vector of the sub-image block calculated directly from the CPMVs of the image block has sub-pixel precision, for example 1/4-, 1/8-, or 1/16-pixel precision, the calculated motion vector further needs to be processed to change it from sub-pixel precision to integer-pixel precision.
Optionally, step 620 includes steps 1) and 2) as follows.
1) Calculate a first motion vector of the sub-image block according to the CPMVs of the image block, the first motion vector having sub-pixel precision.
For example, the first motion vector of the sub-image block is calculated from the CPMVs according to formula (1) or formula (2) described above, and the pixel precision of the calculated first motion vector is sub-pixel.
2) Process the first motion vector into a second motion vector of integer-pixel precision.
As one possible implementation of step 2): the second motion vector is obtained from the first motion vector of the sub-image block such that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.
For example, the nearest integer pixel point may be the integer pixel point above, below, to the left or to the right of the end point of the first motion vector.
As an example, the second motion vector (MV 2x, MV2 y) of the sub-image block is calculated from the first motion vector (MV 1x, MV1 y) of the sub-image block by the following formula.
If MV1x >= 0, MV2x = ((MV1x + (1 << (shift - 1))) >> shift) << shift;
if MV1x < 0, MV2x = -(((-MV1x + (1 << (shift - 1))) >> shift) << shift);
if MV1y >= 0, MV2y = ((MV1y + (1 << (shift - 1))) >> shift) << shift;
if MV1y < 0, MV2y = -(((-MV1y + (1 << (shift - 1))) >> shift) << shift).
Formula (3).
Here, the value of shift is related to the storage precision of motion vectors in the encoding software platform. For example, in the current VTM-4.0 reference software, motion vectors are stored with 1/16 pixel precision, and the value of shift may be set to 4.
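As an illustration, formula (3) can be sketched in a few lines of Python. The function name is ours, not from any codec; shift = 4 assumes the 1/16-pixel storage precision mentioned above.

```python
def round_mv_to_int_pel(mv1x, mv1y, shift=4):
    """Round a sub-pel motion vector to the nearest integer-pel position
    (formula (3)); shift=4 assumes 1/16-pel storage as in VTM-4.0."""
    def round_component(v):
        if v >= 0:
            return ((v + (1 << (shift - 1))) >> shift) << shift
        # negative values: apply the same rounding to the magnitude, restore sign
        return -((((-v) + (1 << (shift - 1))) >> shift) << shift)
    return round_component(mv1x), round_component(mv1y)

# With 1/16-pel storage, 24 units = 1.5 pel rounds up to 32 units = 2 pel,
# and -24 units rounds symmetrically to -32 units.
print(round_mv_to_int_pel(24, -24))  # (32, -32)
```

At the exact midpoint between two integer-pel positions this rounds away from zero, matching the sign-symmetric form of formula (3).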
As another example, the second motion vector (MV2x, MV2y) of the sub-image block is obtained from the first motion vector (MV1x, MV1y) of the sub-image block by the following formula.
If MV1x >= 0, MV2x = (MV1x >> shift) << shift;
if MV1x < 0, MV2x = -(((-MV1x) >> shift) << shift);
if MV1y >= 0, MV2y = (MV1y >> shift) << shift;
if MV1y < 0, MV2y = -(((-MV1y) >> shift) << shift). Formula (4).
Wherein the meaning of shift is consistent with the meaning of shift described previously.
"<" in the formula (3) and the formula (4) means left shift, and "> >" means right shift.
The present application does not limit the manner of converting the motion vector from sub-pixel precision into integer pixel precision. For example, a second motion vector of integer pixel precision may also be obtained from the first motion vector according to other possible sub-pixel-to-integer-pixel conversion algorithms.
When the size of the smallest CU (corresponding to the image block in the embodiment of the present application) processed by the Affine technique is 16×16, no bandwidth pressure arises in the motion estimation process, so no modification is required there. In this case, the pixel precision of the CPMV of the image block may be integer pixel or sub-pixel. If the CPMVs of the image block have sub-pixel precision, the motion vector of the sub-image block calculated from those CPMVs also has sub-pixel precision; even if the CPMVs of the image block have integer pixel precision, the motion vector of the sub-image block calculated from them may still have sub-pixel precision, for example when calculated according to formula (1) or formula (2).
As can be seen from the foregoing, in the existing Affine technique, the motion vector of the sub-image block, i.e. the processing unit, may have sub-pixel precision, which causes the motion compensation process to involve sub-pixels and increases the bandwidth pressure of the Affine technique.
According to the scheme provided by the application, the motion vector of the sub-image block serving as the image processing unit is made to be the whole pixel precision, so that the sub-pixel is not involved in the motion compensation process of the sub-image block, and the bandwidth pressure generated by the Affine prediction technology can be reduced to a certain extent.
It will be appreciated that the problem of bandwidth pressure can also be alleviated to some extent by enlarging the size of the sub-image block as a processing unit, but this reduces image compression performance. The application can ensure the motion compensation of the whole pixel precision by processing the motion vector of the sub-image block serving as the processing unit into the whole pixel precision, thereby solving the problem of bandwidth pressure on one hand and ensuring better image compression performance on the other hand.
According to the scheme provided by the application, the existing Affine technology is improved, namely, the motion vector of the Sub-CU in the Affine mode is processed into the whole pixel precision, so that the bandwidth pressure generated by the Affine technology can be reduced.
In addition to the Affine technique, the scheme provided by the application can also be applied to other similar techniques that may arise in the future, for example techniques in which the pixel precision of the motion vector includes both integer pixel and sub-pixel precision and the image processing unit is small, for example 4×4.
It should be understood that the scheme provided by the application can be used for improving the quality of compressed video and improving the hardware friendliness of the codec, and has important significance for the compression processing of videos such as broadcast television, video conference, network video and the like.
Optionally, in some embodiments, the method provided by the embodiment of the present application further includes: the CPMV of the image block is processed to full pixel precision.
The embodiment can ensure that CPMV of the image block is whole pixel precision.
An embodiment of processing CPMV of the image block to full pixel accuracy will be described below.
Optionally, as shown in fig. 7, in some embodiments, step 610 includes the following steps 611, 612, and 613.
611, a motion information candidate list of the image block is acquired.
For example, motion vectors of spatial and/or temporal neighboring blocks of the image block are acquired, and a candidate list of motion information of the image block is constructed based on the motion vectors of the neighboring blocks.
612, the motion vectors in the motion information candidate list are processed to integer pixel precision.
For example, the motion vector in the motion information candidate list may be processed to the whole pixel precision using the formula (3) or the formula (4) described above.
The neighboring block refers to a neighboring block used to construct a motion information candidate list of the image block, e.g., a neighboring block in the temporal and/or spatial domain. The present application is not limited in the manner in which adjacent blocks are determined.
613, obtaining the CPMV of the image block according to the motion vector processed to the whole pixel precision in the motion information candidate list.
The Affine inter prediction mode may be divided into an Affine merge mode and an Affine inter mode.
The embodiment shown in fig. 7 can be applied to an Affine inter mode as well as an Affine merge mode.
Alternatively, in the embodiment shown in fig. 7, the inter prediction mode of the image block is the Affine merge mode.
In the Affine merge mode, one CPMV may be selected directly from the motion information candidate list as the CPMV of the image block. That is, step 613 comprises: selecting one CPMV from the motion information candidate list of the image block as the CPMV of the image block.
Since the motion vector of the neighboring block used to construct the motion information candidate list is processed to the whole pixel precision, selecting CPMV from the motion information candidate list directly as CPMV of the image block can ensure that CPMV of the image block is the whole pixel.
As an example, the general flow of inter prediction of the Affine merge mode includes the following steps. In this example, a CU is taken as an example of an image block.
Step 1-1, obtain the motion vectors (MVs) of neighboring blocks from spatial neighboring blocks and/or temporal neighboring blocks. This process acquires the MVs of neighboring blocks coded in Affine mode and of neighboring blocks coded in the conventional mode, obtains CPMVs from the MVs of the neighboring blocks, and constructs the motion information candidate list of the CU from these CPMVs.
Step 1-2, process the motion vectors in the motion information candidate list of the CU into integer pixel precision.
Step 1-3, select a combination (which may contain two or three CPMVs, representing the CPMVs of two or three control points) from the motion information candidate list as the CPMVs of the CU.
In the Affine merge mode, the CPMVs selected from the motion information candidate list are used directly as the CPMVs of the current CU; no motion estimation is required, and the concept of an MVD in the Affine inter mode (to be described later) does not exist. That is, in the Affine merge mode, only the index of the CPMVs selected from the motion information candidate list needs to be written into the bitstream (one CU only needs to write one index), and no MVD needs to be transmitted.
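Steps 1-1 to 1-3 of the Affine merge flow above can be sketched as follows. This is a simplified illustration with hypothetical names; real candidate-list construction is far more involved. MV components are in 1/16-pel storage units, and the rounding follows formula (3).

```python
def affine_merge_cpmv(candidate_list, index, shift=4):
    """Sketch of steps 1-1..1-3: `candidate_list` already holds CPMV
    combinations gathered from neighboring blocks (step 1-1); round every
    component to integer-pel (step 1-2), then pick the combination at
    `index` (step 1-3). Names are illustrative, not from a real codec."""
    def to_int_pel(v):
        if v >= 0:
            return ((v + (1 << (shift - 1))) >> shift) << shift
        return -((((-v) + (1 << (shift - 1))) >> shift) << shift)
    rounded = [[(to_int_pel(x), to_int_pel(y)) for (x, y) in combo]
               for combo in candidate_list]
    return rounded[index]  # only this index would be written to the bitstream

# Two hypothetical CPMV combinations in 1/16-pel units:
candidates = [[(24, -24), (7, 9)], [(16, 16), (0, 0)]]
print(affine_merge_cpmv(candidates, 0))  # [(32, -32), (0, 16)]
```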
Regarding the neighboring block mentioned in step 1-1, the inter prediction mode of the neighboring block may be a conventional inter prediction mode or an affine mode, and thus the MV obtained from the neighboring block may be either full-pel precision or sub-pel precision.
The embodiment can ensure that the CPMV of the image block is the whole pixel precision by processing the motion vector of the adjacent block of the current image block into the whole pixel precision.
The embodiment shown in fig. 7 can also be applied to the Affine inter mode as described above. For a better understanding of the embodiments of the present application, the general flow of the Affine Inter mode will be described before describing the application of the embodiment shown in fig. 7 to the Affine Inter mode.
By way of example, the general flow of the Affine Inter mode includes the following steps. In this example, a CU is taken as an example of an image block.
Step 2-1, obtain the motion vectors of neighboring blocks from spatial neighboring blocks and/or temporal neighboring blocks. This process acquires the motion vectors of neighboring blocks coded in Affine mode and of neighboring blocks coded in the conventional mode; CPMVs are obtained from combinations of the acquired motion vectors, and the motion information candidate list of the CU is constructed from these CPMVs.
Step 2-2, select a combination (which may include two or three CPMVs, representing the CPMVs of two or three control points) from the motion information candidate list constructed in step 2-1 as the motion vector predictor (MVP) of the current CU (i.e., the predicted CPMVs of the current CU).
Step 2-3, perform motion estimation in units of the current whole CU to obtain the CPMVs of the current CU.
Step 2-4, calculate the difference between the CPMVs obtained by motion estimation in step 2-3 and the CPMVs selected in step 2-2 to obtain the motion vector difference (MVD).
In the Affine inter mode, the index of the selected CPMVs and the MVD need to be written into the bitstream.
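Step 2-4 above can be sketched as a per-control-point subtraction (illustrative names; components are integer units at the storage precision):

```python
def affine_inter_mvds(mvp_cpmvs, me_cpmvs):
    """Step 2-4 sketch: per control point, MVD = motion-estimated CPMV minus
    predicted CPMV (MVP). The MVDs plus the MVP index are what go into the
    bitstream in the Affine inter mode."""
    return [(mx - px, my - py)
            for (px, py), (mx, my) in zip(mvp_cpmvs, me_cpmvs)]

# Hypothetical two-control-point example:
print(affine_inter_mvds([(16, 0), (0, 16)], [(32, 16), (16, 0)]))
# [(16, 16), (16, -16)]
```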
In the Affine Inter mode, the motion estimation process is performed in units of CUs (corresponding to image blocks in the embodiment of the present application), and the motion compensation process is performed in units of 4×4 sub-CUs (corresponding to sub-image blocks in the embodiment of the present application).
Regarding the neighboring block mentioned in step 2-1, the inter prediction mode of the neighboring block may be a conventional inter prediction mode or an affine mode, and thus the MV obtained from the neighboring block may be either full-pel precision or sub-pel precision.
In the Affine inter mode, the encoding end may select different pixel precisions for the motion vector of the CU; this selection may be referred to as an adaptive motion vector resolution (AMVR) decision.
The pixel precision of the AMVR decision is essentially that of the MVD, i.e., the CPMVs of the CU, rather than the MVs of the sub-CU.
In the existing Affine inter mode, the range of pixel precisions for the AMVR decision includes, but is not limited to: 1/16 pixel precision, 1/8 pixel precision, 1/4 pixel precision, 1/2 pixel precision, 1 pixel precision, 2 pixel precision, 4 pixel precision, and the like. In other words, a CU may have CPMVs of a variety of different pixel precisions. For example, a CU may have CPMVs of three different precisions: integer pixel, 1/4 pixel, and 1/16 pixel.
Optionally, in the embodiment shown in fig. 7, the Inter prediction mode of the image block is an Affine Inter mode, and step 611 includes obtaining a motion information candidate list of the image block; step 612 includes processing the motion vectors in the motion information candidate list to integer pixel precision; step 613 comprises: and selecting the prediction CPMV of the image block from the motion information candidate list of the image block to obtain the MVD of the image block, and obtaining the CPMV of the image block by the prediction CPMV of the image block and the MVD of the image block.
As shown in fig. 8, in this embodiment, step 610 may further include step 614 of performing motion vector precision decision of N pixels on the image block, where N is a positive integer.
That is, a motion vector precision decision (AMVR decision) of integer pixel precision is made for the image block.
It can be understood that by performing an AMVR decision with respect to the image block with full pixel precision, it can be ensured that the pixel precision of the MVD of the image block is full pixel, and also that the pixel precision of the CPMV of the image block is full pixel. In this way, it is ensured that no sub-pixels are involved in the motion estimation of the image block, so that the bandwidth pressure can be reduced to some extent.
In the present embodiment, when the motion vector precision decision is made using the Affine AMVR, the decision is not made over all pixel precisions; instead, the 1/M (M > 1) pixel precisions are skipped, that is, the decision is made only over N pixel precision.
It should be understood that in the present embodiment, when the motion vector precision index is written to the bitstream, the number of bits written is correspondingly reduced because the pixel precision options are reduced; the bits representing the motion vector precision index may even be omitted entirely. For example, suppose the original pixel precision options are three: integer pixel, 1/4 pixel, and 1/16 pixel. At least 2 bits are then required to represent these three precisions, e.g., "0" for 1/4 pixel, "10" for 1/16 pixel, and "11" for integer pixel. In this embodiment, "0" may be used to represent integer pixel, so that only 1 bit needs to be written into the bitstream; alternatively, integer pixel precision may be agreed upon by the protocol, so that no motion vector precision index needs to be written into the bitstream at all, which saves signaling overhead and also reduces bandwidth pressure.
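The signaling saving described above can be illustrated with a toy code table. The codewords mirror the example in the text; they are illustrative, not a normative syntax.

```python
# Three precision options need a codeword of up to 2 bits per CU;
# restricting to integer-pel only means the index can be omitted entirely.
FULL_SET = {"1/4": "0", "1/16": "10", "int": "11"}  # up to 2 bits per CU
RESTRICTED = {"int": ""}  # precision fixed by protocol: no bits written

def index_bits(precision, table):
    """Number of bits the precision index costs under a given code table."""
    return len(table[precision])

print(index_bits("int", FULL_SET), index_bits("int", RESTRICTED))  # 2 0
```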
Note that, when the inter prediction mode of the image block is the Affine inter mode, the embodiment of performing the motion vector accuracy decision of N (N is a positive integer) pixels on the image block may be implemented in combination with the embodiment shown in fig. 8 or may be implemented independently of the embodiment shown in fig. 8.
Optionally, as shown in fig. 8, in some embodiments, the inter prediction mode of the image block is the Affine inter mode, and step 610 includes: acquiring the CPMVs of the image block, and making a motion vector precision decision of N pixels for the image block, where N is a positive integer.
It will be appreciated that by making a motion vector accuracy decision for N pixels for the image block, the CPMV for the image block can be guaranteed to be of full pixel accuracy, whether or not the motion vectors for neighboring blocks of the image block are processed to be of full pixel accuracy.
It should also be appreciated that in the Affine inter mode, processing the pixel precision of the CPMV of the image block into integer pixel precision guarantees integer-pixel-precision motion estimation, which helps to reduce bandwidth pressure.
As can be seen from the above, in the Affine merge mode, the pixel precision of the CPMV of the image block is processed into integer pixel precision by processing the motion vectors in the motion information candidate list into integer pixel precision.
In the Affine inter mode, the pixel precision of the CPMV of the image block is processed into integer pixel precision by processing the motion vectors in the motion information candidate list into integer pixel precision and making an integer-pixel-precision AMVR decision for the image block.
Alternatively, in the Affine inter mode, the pixel precision of the CPMV of the image block is processed into integer pixel precision by making an integer-pixel-precision AMVR decision for the image block.
In the above-described embodiments involving processing the motion vectors of neighboring blocks to integer pixel precision, the motion vectors of the neighboring blocks may be processed to integer pixel precision in the manner shown in formula (3) or formula (4) above. Other possible sub-pixel-to-integer-pixel conversion algorithms or methods may also be used. The application is not limited in this regard.
Optionally, in some embodiments, the CPMV of an image block is processed to full pixel precision when the size of the image block is less than a threshold.
The threshold may be determined based on actual requirements. For example, the threshold is 16 pixels.
For example, when the image block is less than 16 pixels high and/or wide, the CPMV of the image block is processed to full pixel precision.
As described above for the Affine inter mode, motion estimation is performed in units of image blocks. For example, when the height and width of an image block are both greater than or equal to 16 pixels, even a sub-pixel-precision motion estimation process does not cause large bandwidth pressure; in this case, the CPMV of the image block need not be processed to integer pixel precision.
However, if the image block is less than 16 pixels high and/or wide, e.g., the image block is 4 x 8, 8 x 4, 4 x 16, or 16 x 4 in size, the sub-pixel accurate motion estimation process may cause a large bandwidth pressure. In this case, the CPMV of the image block may be processed to the whole pixel precision.
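The size gate described in the last two paragraphs can be sketched as a simple predicate. The function name and the default threshold of 16 pixels are illustrative.

```python
def needs_int_pel_cpmv(width, height, threshold=16):
    """Sketch of the size gate described above: round the CPMV to integer-pel
    only when the block is narrower or shorter than the threshold, since
    small blocks are where sub-pel motion estimation stresses bandwidth."""
    return width < threshold or height < threshold

# 4x8, 8x4, 4x16, 16x4 blocks qualify; 16x16 does not.
print([needs_int_pel_cpmv(w, h) for (w, h) in [(4, 8), (16, 4), (16, 16)]])
# [True, True, False]
```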
Optionally, in some embodiments, the prediction mode of the image block is an Affine Inter mode, and the image block has a height and/or width of less than 16 pixels, and the method according to an embodiment of the present application further includes: and carrying out AMVR decision with whole pixel precision on the image block.
The embodiment can ensure the motion estimation process of the whole pixel precision, thereby avoiding the generation of larger bandwidth pressure.
In addition, when the motion vector precision index of an image block whose height and/or width is less than 16 pixels is written into the bitstream, the number of bits written can be reduced because the pixel precision options are reduced.
For example, for a CU whose height and width are both greater than or equal to 16 pixels, the AMVR pixel precision is selected among three options: integer pixel, 1/4 pixel, and 1/16 pixel, e.g., using "0" for 1/4 pixel, "10" for 1/16 pixel, and "11" for integer pixel. For a CU whose height and/or width is less than 16 pixels, the AMVR pixel precision index does not need to be written into the bitstream because there is only one AMVR pixel precision option; e.g., integer pixel precision can be adopted by protocol convention.
The embodiment of the application can be applied to different inter-frame prediction modes, such as forward prediction, backward prediction or bi-prediction. In other words, the inter prediction manner of the sub image block mentioned in the embodiment of the present application may be any of the following: forward prediction, backward prediction, bi-prediction.
For example, if the inter prediction mode of the sub-image block is forward prediction, the motion vector of the sub-image block obtained by the forward prediction process is processed into an integer pixel.
For another example, if the inter prediction mode of the sub-image block is backward prediction, the motion vector of the sub-image block obtained by the backward prediction process is processed into an integer pixel.
For another example, if the inter prediction mode of the sub-image block is bi-prediction, the motion vector of the sub-image block obtained by the bi-prediction process is processed into an integer pixel.
Optionally, the inter prediction mode of the sub-image block is bi-prediction, but the method provided by the embodiment of the present application is used to process the motion vector of the sub-image block into integer pixel precision in only one of the two prediction processes of bi-prediction.
For example, the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.
In other words, for example, if the inter prediction mode of the sub-image block is bi-prediction, the motion vector of the sub-image block obtained by one prediction process of the bi-prediction process is processed as an integer pixel. The one prediction process may be a forward prediction process in bi-prediction or a backward prediction process in bi-prediction.
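As a sketch, rounding only one direction of a bi-prediction can look like this. All names are illustrative; the rounding follows formula (3) with 1/16-pel storage units.

```python
def round_biprediction_mvs(fwd_mv, bwd_mv, round_direction="fwd", shift=4):
    """Sketch: in bi-prediction, round only one of the two prediction
    directions to integer-pel, leaving the other untouched.
    `round_direction` selects which ('fwd' or 'bwd')."""
    def to_int_pel(v):
        if v >= 0:
            return ((v + (1 << (shift - 1))) >> shift) << shift
        return -((((-v) + (1 << (shift - 1))) >> shift) << shift)
    def round_mv(mv):
        return (to_int_pel(mv[0]), to_int_pel(mv[1]))
    if round_direction == "fwd":
        return round_mv(fwd_mv), bwd_mv
    return fwd_mv, round_mv(bwd_mv)

# Forward MV rounded to integer-pel; backward MV kept at sub-pel precision.
print(round_biprediction_mvs((24, -24), (7, 9)))  # ((32, -32), (7, 9))
```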
As can be seen from the above, according to the scheme provided by the present application, the motion vector of the sub-image block as the image processing unit is made to be the whole pixel precision, so that the motion compensation process of the sub-image block does not involve sub-pixels, and therefore, the bandwidth pressure generated by the Affine prediction technology can be reduced to a certain extent.
Further, by processing the pixel precision of the CPMV of the image block into the whole pixel precision, in the Affine Inter mode, motion estimation of the whole pixel precision can be ensured, which is helpful for reducing bandwidth pressure.
Therefore, the scheme provided by the application can reduce the bandwidth pressure caused by the inter-frame prediction process and ensure certain compression performance.
The method embodiments of the present application are described above and the apparatus embodiments of the present application will be described below. It should be understood that the descriptions of the apparatus embodiments and the descriptions of the method embodiments correspond to each other, and thus, descriptions of details not described may refer to the foregoing method embodiments, which are not repeated herein for brevity.
As shown in fig. 9, an embodiment of the present application provides an apparatus 900 for image processing, the apparatus 900 including the following units.
A first acquisition unit 910 for acquiring a motion vector CPMV of a control point of an image block.
A second obtaining unit 920, configured to obtain a motion vector of a sub-image block in the image block according to the CPMV of the image block obtained by the first obtaining unit 910, where the motion vector is of full pixel precision.
According to the scheme provided by the application, the motion vector of the sub-image block serving as the image processing unit is made to be the whole pixel precision, so that the sub-pixel is not involved in the motion compensation process of the sub-image block, and the bandwidth pressure generated by the Affine prediction technology can be reduced to a certain extent.
Optionally, in some embodiments, the second obtaining unit 920 is configured to: calculating a first motion vector of the sub-image block according to the CPMV of the image block, wherein the first motion vector is sub-pixel precision; the first motion vector is processed into a second motion vector of integer pixel precision.
Optionally, in some embodiments, the second obtaining unit 920 is configured to obtain the second motion vector according to the first motion vector of the sub-image block, so that an end point of the second motion vector is an integral pixel point closest to an end point of the first motion vector.
For example, the second obtaining unit 920 is configured to process the first motion vector into a second motion vector with pixel precision of an integer pixel by equation (3) or equation (4).
Optionally, in some embodiments, the sub-image block is 4 pixels high and/or wide.
Optionally, in some embodiments, the first obtaining unit 910 is configured to: acquiring a motion information candidate list of the image block, and processing a motion vector in the motion information candidate list into whole pixel precision; and acquiring CPMVs of the image block according to the motion vectors processed into the whole pixel precision in the motion information candidate list.
Optionally, in some embodiments, the apparatus 900 further comprises: and a processing unit 930, configured to make a motion vector precision decision of N pixels for the image block, where N is a positive integer.
Optionally, in some embodiments, the image block is less than 16 pixels in height and/or width.
Optionally, in some embodiments, the inter prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, bi-prediction.
Optionally, in some embodiments, the inter-prediction mode of the sub-image block is bi-prediction, where the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.
Optionally, the apparatus 900 for image processing in this embodiment may be an encoder, and a functional module for implementing a video coding related procedure may be further included in the apparatus 900.
Optionally, the apparatus 900 for image processing in this embodiment may be a decoder, and a functional module for implementing a video decoding related procedure may be further included in the apparatus 900.
As shown in fig. 10, an embodiment of the present invention further provides an apparatus 1000 for image processing. The apparatus 1000 comprises a processor 1010 and a memory 1020; the memory 1020 is configured to store instructions, and the processor 1010 is configured to execute the instructions stored in the memory 1020, execution of which causes the processor 1010 to perform the method of the above method embodiments.
Specifically, the apparatus 1000 further includes a communication interface 1030 for exchanging signals with an external device.
Optionally, the apparatus 1000 for image processing of the present embodiment is an encoder, and the communication interface 1030 is used to receive image or video data to be processed from an external device. Alternatively, the communication interface 1030 is also configured to send the encoded code stream to a decoding end.
Optionally, the apparatus 1000 for image processing of the present embodiment is a decoder, and the communication interface 1030 is configured to receive the encoded code stream from the encoding device.
The present invention also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of the method embodiment above.
Embodiments of the present invention also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments above.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means from one website, computer, server, or data center. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A method of image processing, comprising:
acquiring a motion vector CPMV of a control point of an image block;
acquiring a motion vector of a sub-image block in the image block according to the CPMV of the image block, wherein the motion vector is of integer pixel precision;
wherein acquiring a motion vector of a sub-image block in the image block according to the CPMV of the image block comprises:
calculating a first motion vector of the sub-image block according to the CPMV of the image block, wherein the first motion vector is sub-pixel precision;
the first motion vector is processed into a second motion vector of integer pixel precision.
2. The method according to claim 1, wherein the sub-image block has a height and/or width of 4 pixels.
3. The method according to claim 1 or 2, wherein acquiring the control point motion vector (CPMV) of the image block comprises:
acquiring a motion information candidate list of the image block;
processing motion vectors in the motion information candidate list into integer-pixel precision; and
obtaining the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed into integer-pixel precision.
4. The method according to claim 1 or 2, further comprising:
performing, on the image block, a motion vector precision decision of N pixels, wherein N is a positive integer.
5. The method according to claim 1 or 2, wherein the image block has a height and/or width of less than 16 pixels.
6. The method according to claim 1 or 2, wherein the inter prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, or bi-prediction.
7. The method according to claim 1 or 2, wherein the inter prediction mode of the sub-image block is bi-prediction, and the CPMV of the image block is the CPMV obtained by forward prediction in the bi-prediction or the CPMV obtained by backward prediction in the bi-prediction.
8. The method according to claim 1, wherein processing the first motion vector into the second motion vector of integer-pixel precision comprises:
acquiring the second motion vector according to the first motion vector, such that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.
9. An apparatus for image processing, comprising:
a first acquisition unit, configured to acquire a control point motion vector (CPMV) of an image block; and
a second acquisition unit, configured to acquire, according to the CPMV of the image block acquired by the first acquisition unit, a motion vector of a sub-image block in the image block, wherein the motion vector is of integer-pixel precision;
wherein the second acquisition unit is configured to:
calculate a first motion vector of the sub-image block according to the CPMV of the image block, wherein the first motion vector is of sub-pixel precision; and
process the first motion vector into a second motion vector of integer-pixel precision.
10. The apparatus according to claim 9, wherein the sub-image block has a height and/or width of 4 pixels.
11. The apparatus according to claim 9 or 10, wherein the first acquisition unit is configured to:
acquire a motion information candidate list of the image block;
process motion vectors in the motion information candidate list into integer-pixel precision; and
obtain the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed into integer-pixel precision.
12. The apparatus according to claim 9 or 10, further comprising:
a processing unit, configured to perform, on the image block, a motion vector precision decision of N pixels, wherein N is a positive integer.
13. The apparatus according to claim 9 or 10, wherein the image block has a height and/or width of less than 16 pixels.
14. The apparatus according to claim 9 or 10, wherein the inter prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, or bi-prediction.
15. The apparatus according to claim 9 or 10, wherein the inter prediction mode of the sub-image block is bi-prediction, and the CPMV of the image block is the CPMV obtained by forward prediction in the bi-prediction or the CPMV obtained by backward prediction in the bi-prediction.
16. The apparatus according to claim 9, wherein the second acquisition unit is configured to acquire the second motion vector according to the first motion vector, such that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.
17. An image processing apparatus, comprising: a memory configured to store instructions and a processor configured to execute the instructions stored in the memory, wherein execution of the instructions stored in the memory causes the processor to perform the method of any one of claims 1 to 8.
18. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 8.
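The core of claims 1 and 8, deriving a sub-pixel-precision motion vector for a sub-image block from the CPMVs and then snapping it to the nearest integer pixel, can be sketched as follows. This is an illustrative sketch only, not part of the claims: the 4-parameter affine model, the quarter-pel sub-pixel unit, and the round-half-up tie-breaking rule are assumptions borrowed from common affine inter prediction designs, not details fixed by the claims themselves.

```python
import math

def affine_subblock_mv(cpmv0, cpmv1, block_w, cx, cy):
    """'First motion vector' (claim 1): the sub-pixel-precision MV of a
    sub-image block centred at (cx, cy), derived from a 4-parameter affine
    model (an assumption) defined by the top-left CPMV cpmv0 and the
    top-right CPMV cpmv1 of an image block of width block_w pixels.
    All vectors are in quarter-pel units (an assumed sub-pixel precision)."""
    a = (cpmv1[0] - cpmv0[0]) / block_w  # horizontal gradient of the MV field
    b = (cpmv1[1] - cpmv0[1]) / block_w  # vertical gradient of the MV field
    mvx = a * cx - b * cy + cpmv0[0]
    mvy = b * cx + a * cy + cpmv0[1]
    return (mvx, mvy)

def to_integer_pel(mv, sub_pel=4):
    """'Second motion vector' (claim 8): move the end point of the MV to the
    closest integer pixel point. One pixel equals `sub_pel` sub-pixel units
    (quarter-pel here); the result stays expressed in sub-pel units."""
    def snap(v):
        # round to the nearest whole pixel (ties broken upward, an assumption)
        return int(math.floor(v / sub_pel + 0.5)) * sub_pel
    return (snap(mv[0]), snap(mv[1]))

# 8-pixel-wide block, CPMVs in quarter-pel units, sub-block centred at (2, 2)
first = affine_subblock_mv(cpmv0=(5, -3), cpmv1=(9, -3), block_w=8, cx=2, cy=2)
second = to_integer_pel(first)
print(first, second)  # (6.0, -2.0) (8, 0)
```

With integer-pel sub-block MVs, motion compensation needs no interpolation filtering, which is the practical motivation for processing the first motion vector into the second.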
CN201980005232.7A 2019-03-12 2019-03-12 Image processing method and device Active CN111247804B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/077894 WO2020181507A1 (en) 2019-03-12 2019-03-12 Image processing method and apparatus

Publications (2)

Publication Number Publication Date
CN111247804A CN111247804A (en) 2020-06-05
CN111247804B true CN111247804B (en) 2023-10-13

Family

ID=70865988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980005232.7A Active CN111247804B (en) 2019-03-12 2019-03-12 Image processing method and device

Country Status (2)

Country Link
CN (1) CN111247804B (en)
WO (1) WO2020181507A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107277506A (en) * 2017-08-15 2017-10-20 中南大学 A kind of motion vector accuracy fast selecting method and device based on adaptive motion vector precision
CN108781284A (en) * 2016-03-15 2018-11-09 联发科技股份有限公司 The method and device of coding and decoding video with affine motion compensation
CN109005407A (en) * 2015-05-15 2018-12-14 华为技术有限公司 Encoding video pictures and decoded method, encoding device and decoding device
CN109391814A (en) * 2017-08-11 2019-02-26 华为技术有限公司 Encoding video pictures and decoded method, device and equipment

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN106303544B (en) * 2015-05-26 2019-06-11 华为技术有限公司 A kind of video coding-decoding method, encoder and decoder
CN106534858B (en) * 2015-09-10 2019-09-06 展讯通信(上海)有限公司 True motion estimation method and device
CN109218733B (en) * 2017-06-30 2022-03-29 华为技术有限公司 Method for determining prediction motion vector prediction and related equipment
WO2019032765A1 (en) * 2017-08-09 2019-02-14 Vid Scale, Inc. Frame-rate up conversion with reduced complexity


Non-Patent Citations (2)

Title
CE2: Adaptive precision for affine MVD coding (Test 2.1.1); Jiancong Luo et al.; Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, JVET-M0420; 2019-01-03; Abstract, Sections 1-2 *
CE2-related: Joint Test of AMVR for Affine Inter Mode (Test 2.1.1 and Test 2.1.2); Hongbin Liu et al.; Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, JVET-M0247; 2019-01-03; Full text *

Also Published As

Publication number Publication date
WO2020181507A1 (en) 2020-09-17
CN111247804A (en) 2020-06-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant