WO2020181507A1

WO2020181507A1 - Image processing method and apparatus

Info

Publication number: WO2020181507A1
Application number: PCT/CN2019/077894
Authority: WO
Inventors: 孟学苇; 郑萧桢; 王苫社; 马思伟
Original assignee: 北京大学; 深圳市大疆创新科技有限公司
Priority date: 2019-03-12
Filing date: 2019-03-12
Publication date: 2020-09-17
Also published as: CN111247804B; CN111247804A

Abstract

Provided are an image processing method and apparatus. The method comprises: obtaining a control point motion vector (CPMV) of an image block; and obtaining motion vectors (MVs) of sub-image blocks in the image block according to the CPMV, the MVs having integer pixel accuracy. Because the MVs of the sub-image blocks, which serve as the image processing units, have integer pixel accuracy, the motion compensation process of the sub-image blocks does not involve sub-pixels, which reduces, to a certain extent, the bandwidth pressure generated by the Affine prediction technology.

Description

Image processing method and device

Copyright statement

The content disclosed in this patent document contains copyrighted material. The copyright belongs to the copyright owner. The copyright owner does not object to anyone copying the patent document or the patent disclosure in the official records and archives of the Patent and Trademark Office.

Technical field

This application relates to the field of image processing, and more specifically, to an image processing method and device.

Background technique

The general idea of inter-frame prediction in video coding is to exploit the temporal correlation between adjacent frames of the video: a previously coded and reconstructed frame is used as a reference frame, and the current frame is predicted through motion estimation and motion compensation, thereby removing the temporal redundancy of the video. The general process of inter-frame prediction includes Motion Estimation (ME) and Motion Compensation (MC). For the current coding block of the current frame, the most similar block is searched for in the reference frame and used as the prediction block of the current block; the relative displacement between the current block and its similar block is the motion vector (MV). Motion estimation is the process of obtaining a motion vector by searching the reference frame and comparing candidates against the current coding block of the current frame. Motion compensation is the process of obtaining the predicted frame using the MV and the reference frame. The predicted frame obtained by motion compensation may differ from the original current frame. Therefore, in addition to passing the MV and reference frame information to the decoder, the difference (residual) between the predicted frame and the current frame needs to be transmitted to the decoder after transformation, quantization, and so on. The decoder can reconstruct the current frame from the MV, the reference frame, and the difference between the predicted frame and the current frame.

Due to the continuity of natural object motion, the motion vector of an object between two adjacent frames is not necessarily an integer number of pixel units. In order to improve the accuracy of the motion vector, sub-pixel accuracy was proposed. For example, in the High Efficiency Video Coding (HEVC) standard, motion vectors with 1/4 pixel precision are used for motion estimation of the luminance component. However, digital video has no sample values at fractional pixel positions. Generally speaking, in order to achieve 1/K pixel accuracy, the values at these fractional pixel positions must be approximated by interpolation; that is, K-fold interpolation is performed on the reference frame in the row direction and the column direction, and the prediction block is searched for in the interpolated reference frame. In the process of interpolating for the current block, the pixels in the current block and the pixels in its adjacent area need to be used.

Generally, only traditional motion models (for example, translational motion) are considered in the inter prediction process. However, the real world contains many other forms of motion, such as zoom, rotation, perspective motion, and other irregular motions. In order to take these forms of motion into account, affine motion compensated prediction (hereinafter referred to as Affine) was introduced in VTM-3.0. In the Affine mode, the affine motion field of the image block can be derived from the motion vectors of two control points (four parameters) or three control points (six parameters).

In the Affine mode, motion vectors with 1/4 pixel accuracy, 1/16 pixel accuracy or other sub-pixel accuracy can be used for the motion estimation of the image processing unit. The image processing unit of the Affine technology is a sub-CU (which can be referred to as a sub-block), and the size of the sub-CU is 4×4 (unit: pixel), which will cause the Affine technology to generate greater bandwidth pressure.

Summary of the invention

The present application provides an image processing method and device, which can reduce the bandwidth pressure caused by the Affine prediction technology to a certain extent.

In a first aspect, an image processing method is provided. The method includes: obtaining a control point motion vector (CPMV) of an image block; and obtaining a motion vector of a sub-image block in the image block according to the CPMV of the image block, where the motion vector has integer pixel accuracy.

In a second aspect, an image processing device is provided. The device includes: a first acquisition unit, configured to acquire a control point motion vector (CPMV) of an image block; and a second acquisition unit, configured to obtain a motion vector of a sub-image block in the image block according to the CPMV of the image block acquired by the first acquisition unit, where the motion vector has integer pixel accuracy.

In a third aspect, an image processing device is provided. The device includes a memory and a processor, the memory is used to store instructions, the processor is used to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to execute the method provided in the first aspect.

In a fourth aspect, a chip is provided. The chip includes a processing module and a communication interface, the processing module is configured to control the communication interface to communicate with the outside, and the processing module is also configured to implement the method provided in the first aspect.

In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a computer, the computer implements the method in the first aspect or any possible implementation manner of the first aspect.

In a sixth aspect, a computer program product containing instructions is provided, which when executed by a computer causes the computer to implement the method provided in the first aspect.

In the solution provided by this application, the motion vector of the sub-image block serving as the image processing unit has integer pixel accuracy, so the motion compensation process of the sub-image block does not involve sub-pixels, which can reduce the bandwidth pressure generated by the Affine prediction technology to a certain extent.

Description of the drawings

Figure 1 is a schematic diagram of a video coding architecture.

Figure 2 is a schematic diagram of 1/4 pixel interpolation.

Figures 3(a) and 3(b) are schematic diagrams of the four-parameter Affine model and the six-parameter Affine model, respectively.

Figure 4 is a schematic diagram of the Affine motion vector field.

Fig. 5 is a comparison diagram of reference pixels required by the Affine mode and the HEVC mode in the prior art.

Fig. 6 is a schematic flowchart of an image processing method according to an embodiment of the present application.

Fig. 7 is another schematic flowchart of an image processing method according to an embodiment of the present application.

Fig. 8 is another schematic flowchart of the image processing method according to an embodiment of the present application.

Fig. 9 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application.

Fig. 10 is another schematic block diagram of an image processing apparatus according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the description of the application herein are only for the purpose of describing specific embodiments, and are not intended to limit the application.

In order to facilitate the understanding of the solutions according to the embodiments of the present application, a few related concepts are first described below.

1. Inter prediction

As shown in Figure 1, the video coding framework mainly includes intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.

This application is mainly aimed at improving the inter prediction (inter prediction) part.

The general idea of inter-frame prediction is to exploit the temporal correlation between adjacent frames of the video: the reconstructed frame is used as the reference frame, and the current frame is predicted through Motion Estimation (ME) and Motion Compensation (MC), so as to remove the temporal redundancy of the video.

The current frame mentioned in this article, in the encoding scene, means the frame currently being encoded, and in the decoding scene, means the frame currently being decoded.

The reconstructed frame mentioned in this article, in the encoding scene, means the previously encoded frame, in the decoding scene, means the previously decoded frame.

For a frame of image, the entire frame of image is not directly processed in the encoding process, and the entire frame of image is usually divided into image blocks for processing.

As an example, the entire frame of image is first divided into coding tree units (Coding Tree Unit, CTU), for example with a CTU size of 64×64 or 128×128, and each CTU can then be further divided into square or rectangular coding units (Coding Unit, CU). During the encoding process, the CU is the unit that is processed.

All image block sizes mentioned in this article are given in pixels.

The general flow of inter prediction is as follows.

For the current image block in the current frame (hereinafter referred to as the current block for short), the most similar block is found in the reference frame as the prediction block of the current block. The relative displacement between the current block and similar blocks is called a Motion Vector (MV). Motion estimation refers to the process of obtaining a motion vector after searching and comparing the current block of the current frame in the reference frame. Motion compensation refers to the process of obtaining a prediction block using a reference block and a motion vector obtained by motion estimation.
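The flow above can be illustrated with a minimal Python sketch of integer-pixel block matching. The full-search strategy, the sum-of-absolute-differences cost, and the names `sad`, `motion_estimate`, and `motion_compensate` are assumptions chosen for illustration; the text does not prescribe a particular search method or cost function.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def motion_estimate(cur, ref, bx, by, n, search_range):
    """Full search over integer displacements; returns the best (dx, dy)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def motion_compensate(ref, bx, by, mv, n):
    """Build the prediction block by copying the displaced reference block."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] for i in range(n)] for j in range(n)]
```

Here the frames are plain lists of rows of sample values. The residual mentioned in the following paragraphs would be the sample-wise difference between the current block and the block returned by `motion_compensate`.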

The prediction block obtained by the inter-frame prediction process may be different from the original current block. Therefore, it is necessary to calculate the difference between the prediction block and the current block, and the difference may be called the residual. After performing transformation, quantization, entropy coding and other processing on the residual, the coded bit stream is obtained.

At the encoding end, after the image encoding is completed, that is, after the bit stream is obtained by entropy encoding, the bit stream and the encoding mode information, such as the inter-frame prediction mode and the motion vector information, are stored or sent to the decoding end.

At the decoding end, after the entropy-coded bit stream is obtained, entropy decoding is first performed on the bit stream to obtain the corresponding residual; then, the prediction block is obtained according to the decoded coding mode information such as the motion vector; finally, the value of each pixel in the current block is obtained from the residual and the prediction block, that is, the current block is reconstructed. Repeating this for every block reconstructs the current frame.

As shown in Figure 1, the encoding process may also include steps such as inverse quantization and inverse transformation. Inverse quantization refers to the inverse of the quantization process. Inverse transformation refers to the inverse of the transformation process.

Inter-frame prediction mainly includes forward prediction, backward prediction, bi-prediction, and so on. Forward prediction uses a reconstructed frame preceding the current frame (which may be called a historical frame) to predict the current frame. Backward prediction uses a frame following the current frame (which may be called a future frame) to predict the current frame. Bi-prediction may be bi-directional prediction, that is, both a "historical frame" and a "future frame" are used to predict the current frame. Bi-prediction may also be prediction from two reference frames in the same direction, for example, using two "historical frames" to predict the current frame, or using two "future frames" to predict the current frame.

2. Sub-pixel precision motion estimation

In the actual scene, due to the continuity of natural object motion, the motion vector of the object between two adjacent frames may not be exactly an integer number of pixels. Therefore, the accuracy of motion estimation needs to be improved to the sub-pixel level (also called 1/K pixel accuracy). For example, in the HEVC standard, motion vectors with 1/4 pixel accuracy are used for motion estimation of the luminance component.

However, digital video has no sample values at 1/K pixel positions. Generally, in order to achieve motion estimation with 1/K pixel accuracy, the values at the 1/K pixel positions are approximated by interpolation. In other words, K-fold interpolation is performed on the reference frame in the row direction and the column direction, and the search is performed in the interpolated image. In the process of interpolating for the current block, the pixels in the current block and the pixels in its adjacent area need to be used.

As an example, the process of 1/4 pixel interpolation is shown in Figure 2. For an image block with a size of 8×8, 4×8, 4×4, or 8×4, the 3 pixels to the left of and the 4 pixels to the right of the image block are used to generate the pixel values of the interpolation points. As shown in Figure 2, for an image block with a size of 4×4, $a_{0,0}$ and $d_{0,0}$ are 1/4-pixel positions, $b_{0,0}$ and $h_{0,0}$ are half-pixel positions, and $c_{0,0}$ and $n_{0,0}$ are 3/4-pixel positions. If the current block is a 2×2 block, it is the block enclosed by $A_{0,0}$ to $A_{1,0}$ and $A_{0,0}$ to $A_{0,1}$. In order to calculate all the interpolation points in this 2×2 block, some points outside the 2×2 block need to be used, including 3 on the left, 4 on the right, 3 on the top, and 4 on the bottom.

3. Affine motion compensated prediction (hereinafter referred to as Affine)

Affine is an inter-frame prediction technology.

In the HEVC standard, only the traditional motion model (for example, translational motion) is considered in the inter prediction process. However, in the real world, there are still many forms of motion, such as zoom, rotation, perspective motion and other irregular motions. In order to take into account the above-mentioned movement form, in VTM-3.0, Affine technology was introduced.

As shown in Figure 3, the motion field of the Affine mode can be derived from the motion vectors of two control points (four parameters), as shown in Figure 3(a), or of three control points (six parameters), as shown in Figure 3(b).

Hereinafter, the MV of a control point (control point motion vector) is referred to as CPMV for short.

The processing unit of Affine is not a CU, but a sub-block (sub-CU) obtained by dividing the CU, and the size of each sub-CU is 4×4. In Affine mode, each sub-CU has one MV. In other words, unlike an ordinary CU, a CU in Affine mode does not have only one MV: it has as many MVs as it has sub-CUs.

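As an example, the MVs of the sub-CUs in a CU are derived from the CPMVs of the two or three control points shown in FIG. 3. For example, for the four-parameter Affine motion model, the MV $(mv_x, mv_y)$ of the sub-CU at the (x, y) position is calculated by the following formula:

$$
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x - \dfrac{mv_{1y} - mv_{0y}}{W}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{1x} - mv_{0x}}{W}\,y + mv_{0y}
\end{cases}
\tag{1}
$$

For another example, for the six-parameter Affine motion model, the MV $(mv_x, mv_y)$ of the sub-CU at the (x, y) position is calculated by the following formula:

$$
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x + \dfrac{mv_{2x} - mv_{0x}}{H}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{2y} - mv_{0y}}{H}\,y + mv_{0y}
\end{cases}
\tag{2}
$$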

Among them, $(mv_{0x}, mv_{0y})$ is the MV of the upper left control point, $(mv_{1x}, mv_{1y})$ is the MV of the upper right control point, and $(mv_{2x}, mv_{2y})$ is the MV of the lower left control point. W in the above formulas represents the width of the CU where the sub-CU is located, and H represents the height of the CU where the sub-CU is located.
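As a rough illustration of this derivation, the following Python sketch computes one MV per 4×4 sub-CU, assuming the four- and six-parameter affine models in their commonly used form. Evaluating the model at the centre of each sub-CU, using floating-point arithmetic, and the function name `affine_subblock_mvs` are assumptions of this sketch; a real codec would use fixed-point arithmetic.

```python
def affine_subblock_mvs(cpmvs, width, height, sub=4):
    """Return {(x0, y0): (mv_x, mv_y)} for every sub-CU of a width x height CU.

    cpmvs holds two (four-parameter) or three (six-parameter) control-point
    MVs: upper left, upper right and, optionally, lower left.
    (x0, y0) is the top-left corner of a sub-CU; the model is evaluated at
    the sub-CU centre, which is an assumption of this sketch.
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    a = (mv1x - mv0x) / width    # change of mv_x per pixel in x
    b = (mv1y - mv0y) / width    # change of mv_y per pixel in x
    if len(cpmvs) == 3:
        mv2x, mv2y = cpmvs[2]
        c = (mv2x - mv0x) / height   # change of mv_x per pixel in y
        d = (mv2y - mv0y) / height   # change of mv_y per pixel in y
    else:
        c, d = -b, a                 # four-parameter model
    mvs = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            x, y = x0 + sub / 2, y0 + sub / 2
            mvs[(x0, y0)] = (a * x + c * y + mv0x, b * x + d * y + mv0y)
    return mvs
```

The returned MVs are in the same units as the supplied CPMVs, so CPMVs given in 1/16-pixel units yield sub-CU MVs in 1/16-pixel units before any rounding.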

After the calculation by formula (1) above, a schematic diagram of the motion vector field in a CU is shown in FIG. 4, where each square represents a sub-CU with a size of 4×4. After the calculation, the MVs of all sub-CUs are converted into a 1/16 pixel precision representation; that is to say, the highest precision of a sub-CU MV is 1/16 pixel.

After the MV of each sub-CU is calculated, the prediction block of each sub-CU is obtained through the process of motion compensation. The sub-CU size is 4×4 for both the chrominance component and the luminance component, and the motion vector of a 4×4 chrominance block is obtained by averaging the motion vectors of the corresponding four 4×4 luminance blocks.

In the encoding process of the Affine mode, CPMV information is written in the code stream, and there is no need to write the MV information of each sub-CU.

4. Adaptive Motion Vector Resolution (AMVR)

AMVR technology allows a CU to have motion vectors with integer pixel precision or sub-pixel precision. The integer pixel accuracy can be, for example, 1-pixel accuracy, 2-pixel accuracy, or the like. The sub-pixel accuracy can be, for example, 1/2 pixel accuracy, 1/4 pixel accuracy, 1/8 pixel accuracy, or 1/16 pixel accuracy.

For example, for each CU that adopts Affine AMVR technology (in some cases, a CU may not adopt Affine AMVR), the corresponding MV accuracy is adaptively decided at the encoding end, and the result of the decision is written into the code stream and passed to the decoding end.

The integer pixel accuracy or sub-pixel accuracy mentioned in Affine AMVR technology refers to the pixel accuracy of the CPMV, not the pixel accuracy of the sub-CU MV.

For a CPMV with integer pixel accuracy, the motion estimation process of the CU is an integer pixel process, but the MV of a sub-CU obtained from formula (1) or formula (2) above may still have 1/4 pixel accuracy or other sub-pixel accuracy.
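A small worked example makes this concrete. The Python sketch below assumes the four-parameter model in its commonly used form and evaluates it at the point (2, 2) of a CU of width 16; the CPMV values and the function name `subblock_mv_4param` are hypothetical and chosen only for illustration. Exact fractions are used so that the sub-pixel result is visible.

```python
from fractions import Fraction


def subblock_mv_4param(mv0, mv1, width, x, y):
    """Four-parameter affine MV at (x, y), in the same units as the CPMVs."""
    a = Fraction(mv1[0] - mv0[0], width)
    b = Fraction(mv1[1] - mv0[1], width)
    return (a * x - b * y + mv0[0], b * x + a * y + mv0[1])


# Both CPMVs are whole pixels: (0, 0) at the upper left, (1, 0) at the upper right.
mv = subblock_mv_4param((0, 0), (1, 0), 16, 2, 2)
is_integer_pel = all(component.denominator == 1 for component in mv)
```

With these inputs each component of `mv` is 1/8 pixel, so `is_integer_pel` is false even though both CPMVs are whole pixels.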

If the MV of the sub-CU is of sub-pixel accuracy, the motion compensation process of the sub-CU will involve sub-pixels, and since the size of the sub-CU is 4×4, this will cause the Affine prediction process to generate greater bandwidth pressure.

The applicant selected the official general test data as the test sequence on the latest reference software VTM-4.0 of VVC, and performed a simulation. The simulation result is shown in Figure 5.

As shown in Figure 5, the box on the left represents the worst case of HEVC (MVs with 1/4 pixel accuracy), which is an 8×8 bidirectional inter prediction CU; the number of reference pixels required is (8+7)×(8+7)×2=450. The box on the right represents the worst case in the Affine mode of VVC (MVs with 1/16 and 1/4 pixel precision), which is a 4×4 bidirectional inter prediction CU; the number of reference pixels required is (4+7)×(4+7)×2×4=968.

It can be seen from Fig. 5 that compared with HEVC, the existing Affine mode increases the reference pixels by 115%, which causes a greater bandwidth pressure.
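The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes that the 7 extra pixels per dimension come from an 8-tap interpolation filter (3 pixels on one side and 4 on the other, as described for Figure 2) and that the factor of 4 counts the four 4×4 blocks covering an 8×8 area; the function name `reference_pixels` and the integer-pixel comparison at the end are illustrative additions, not figures taken from the simulation.

```python
def reference_pixels(block_w, block_h, taps=8, directions=2, blocks=1):
    """Reference pixels fetched for `blocks` blocks of block_w x block_h,
    each predicted from `directions` reference blocks with a `taps`-tap filter."""
    return (block_w + taps - 1) * (block_h + taps - 1) * directions * blocks


hevc = reference_pixels(8, 8)               # (8+7) x (8+7) x 2
affine = reference_pixels(4, 4, blocks=4)   # (4+7) x (4+7) x 2 x 4
increase_percent = round(100 * (affine - hevc) / hevc)

# With integer-pixel MVs no interpolation is needed, modelled here as taps=1.
affine_integer = reference_pixels(4, 4, taps=1, blocks=4)
```

Under these assumptions `hevc` is 450, `affine` is 968, and the increase is about 115%, matching the values above, while the integer-pixel case fetches only the 128 pixels of the blocks themselves.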

In response to the above problems, this application proposes an image processing method and device, which can reduce the bandwidth pressure generated by the Affine technology to a certain extent.

This application is suitable for the field of digital video coding technology, and is specifically used for the inter-frame prediction part of a video codec. This application can be applied to codecs that comply with the international video coding standard H.264/HEVC and the Chinese AVS2 standard, as well as codecs that comply with the next-generation video coding standard VVC or AVS3.

This application can be applied to the inter-frame prediction part of a video codec, that is to say, the image processing method according to the embodiment of this application can be executed by an encoding device or a decoding device.

FIG. 6 is a schematic flowchart of an image processing method 600 provided by this application. The method 600 includes the following steps.

610. Acquire a motion vector (CPMV) of a control point of the image block.

The method of obtaining the CPMV of the image block is described later and is not detailed here.

620. Obtain a motion vector of a sub-image block in the image block according to the CPMV of the image block, where the motion vector has an integer pixel accuracy.

In other words, based on the CPMV of the image block, the motion vector of the sub-image block in the image block is obtained, and the pixel accuracy of the motion vector of the sub-image block is made to be an integer pixel accuracy.

The sub-image block mentioned in this application represents a processing unit of image processing or video processing. The width and/or height of the sub-image block may be less than 8 pixels. For example, the size of the sub-image block is 4×4 (pixels).

The sub image block may be a block obtained by dividing the image block. It can be understood that if the size of the image block and the sub-image block are the same, the sub-image block can be regarded as the image block itself.

The sub-image block may be a square block, for example, a block with a size of 4×4 or 8×8, or a rectangular block, for example, a block with a size of 2×4 or 4×8.

The size of the image block mentioned in this application can be 16×16, 16×8, 16×4, 8×16, 4×8, 8×8, 8×4, 4×8 and other sizes.

It should be understood that the motion vector of the sub-image block as the processing unit has an integer pixel accuracy. Therefore, the motion compensation process of the sub-image block does not involve sub-pixels, which can reduce the bandwidth pressure generated by the video inter-frame prediction process.

The process of obtaining the motion vector of the sub-image block in the image block according to the CPMV of the image block may include: calculating the motion vector of the sub-image block according to the motion vectors of the two or three control points of the image block, and making the pixel accuracy of the obtained motion vector of the sub-image block an integer pixel accuracy.

As an example, the motion vector of the sub-image block can be calculated according to formula (1) or formula (2) described above.

Optionally, in some embodiments, if the pixel accuracy of the motion vector of the sub-image block calculated directly from the CPMV of the image block is integer pixel accuracy, then this motion vector is the motion vector of the sub-image block to be obtained in this application.

For example, as a possible implementation manner, an algorithm is used to calculate the motion vector of the sub-image block according to the CPMV of the image block. The algorithm can ensure that the calculated pixel accuracy of the motion vector of the sub-image block is an integer pixel.

Optionally, in some embodiments, if the pixel accuracy of the motion vector of the sub-image block calculated directly from the CPMV of the image block is sub-pixel accuracy, for example, 1/4 pixel accuracy, 1/8 pixel accuracy, or 1/16 pixel accuracy, the calculated motion vector also needs to be processed to change it from sub-pixel accuracy to integer pixel accuracy.

Optionally, step 620 includes the following steps 1) and 2).

1) Calculate the first motion vector of the sub-image block according to the CPMV of the image block, and the first motion vector has sub-pixel accuracy.

For example, according to formula (1) or formula (2) described above, the first motion vector of the sub-image block is calculated based on CPMV, and the pixel accuracy of the calculated first motion vector is sub-pixel.

2) Process the first motion vector into a second motion vector with integer pixel accuracy.

As a possible implementation of step 2): the second motion vector is obtained according to the first motion vector of the sub-image block, so that the end point of the second motion vector is the whole pixel point closest to the end point of the first motion vector.

For example, the closest whole pixel point may be the whole pixel point above, below, left or right of the end point of the first motion vector.

As an example, the following formula is used to calculate the second motion vector (MV2x, MV2y) of the sub-image block according to the first motion vector (MV1x, MV1y) of the sub-image block.

If MV1x>=0, MV2x=((MV1x+(1<<(shift-1)))>>shift)<<shift;

If MV1x<0, MV2x=-((-MV1x+(1<<(shift-1)))>>shift)<<shift;

If MV1y>=0, MV2y=((MV1y+(1<<(shift-1)))>>shift)<<shift;

If MV1y<0, MV2y=-((-MV1y+(1<<(shift-1)))>>shift)<<shift, formula (3).

Among them, the value of shift is related to the storage accuracy of the motion vector in the coding software platform. For example, in the current VTM-4.0 reference software, the storage accuracy of the motion vector is 1/16 accuracy, and the value of shift can be set to 4.
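Formula (3) can be sketched in Python as follows, assuming MV components stored as integers in units of 1/16 pixel (shift = 4, as in the VTM-4.0 example). The function name `round_to_integer_pel` is hypothetical. The negative branch is parenthesised so that the negation applies to the already shifted magnitude, which gives the same value as the formula.

```python
def round_to_integer_pel(mv, shift=4):
    """Round one MV component to the nearest integer pixel (formula (3)).

    The result stays in units of 1/(1 << shift) pixel, so it is always a
    multiple of 1 << shift. Exact halves are rounded away from zero.
    """
    offset = 1 << (shift - 1)
    if mv >= 0:
        return ((mv + offset) >> shift) << shift
    return -(((-mv + offset) >> shift) << shift)


# A first motion vector in 1/16-pixel units and its integer-pixel counterpart.
mv1 = (23, -24)
mv2 = tuple(round_to_integer_pel(c) for c in mv1)
```

For instance, 23/16 pixel is closer to 1 pixel than to 2 pixels, so the horizontal component becomes 16, while -24/16 pixel lies exactly halfway and is rounded away from zero to -32.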

As another example, the following formula is used to obtain the second motion vector (MV2x, MV2y) of the sub-image block according to the first motion vector (MV1x, MV1y) of the sub-image block.

If MV1x>=0, MV2x=(MV1x>>shift)<<shift;

If MV1x<0, MV2x=-(((-MV1x)>>shift)<<shift);

If MV1y>=0, MV2y=(MV1y>>shift)<<shift;

If MV1y<0, MV2y=-(((-MV1y)>>shift)<<shift), formula (4).

Among them, the meaning of shift is consistent with the meaning of shift described above.
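Formula (4) can be sketched in the same way, again assuming integer MV components in units of 1/16 pixel (shift = 4); the function name `truncate_to_integer_pel` is hypothetical. Unlike formula (3), no rounding offset is added, so the fractional part is simply discarded.

```python
def truncate_to_integer_pel(mv, shift=4):
    """Drop the fractional part of one MV component (formula (4)).

    The sign is handled separately so that the result moves toward zero for
    both positive and negative inputs; the result stays in units of
    1/(1 << shift) pixel.
    """
    if mv >= 0:
        return (mv >> shift) << shift
    return -(((-mv) >> shift) << shift)
```

The explicit sign branch matters because a right shift applied directly to a negative integer in Python rounds toward negative infinity (for example, `-15 >> 4` is `-1`), which would move the vector away from zero instead of toward it.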

"<<" in formula (3) and formula (4) means left shift, and ">>" means right shift.

It should be noted that this application does not limit the manner in which the pixel accuracy of the motion vector is converted from sub-pixel accuracy to integer pixel accuracy. For example, the second motion vector with integer pixel accuracy may also be obtained from the first motion vector by any other feasible sub-pixel to integer-pixel conversion algorithm.

When the size of the smallest CU processed in the current Affine technology (corresponding to the image block in the embodiments of this application) is 16×16, there is no bandwidth pressure during the motion estimation process, so there is no need to modify the motion estimation process. In this case, the pixel accuracy of the CPMV of the image block may be integer pixel or sub-pixel. If the pixel accuracy of the CPMV of the image block is sub-pixel, the pixel accuracy of the motion vector of the sub-image block calculated according to the CPMV of the image block is also sub-pixel; if the pixel accuracy of the CPMV of the image block is integer pixel, the pixel accuracy of the motion vector of the sub-image block calculated according to the CPMV of the image block may still be sub-pixel. For example, the pixel accuracy of the motion vector of the sub-image block calculated according to formula (1) or formula (2) may be sub-pixel.

It can be seen from the above that in the existing Affine technology, the pixel accuracy of the motion vector of the sub-image block, that is, of the processing unit, may be sub-pixel, which causes the motion compensation process to involve sub-pixels and thus increases the bandwidth pressure of the Affine technology.

It should be understood that enlarging the size of the sub-image block serving as the processing unit can also relieve the bandwidth pressure to a certain extent, but this reduces the image compression performance. In this application, the motion vector of the sub-image block serving as the processing unit is processed into integer pixel accuracy, which ensures motion compensation with integer pixel accuracy, so that, on the one hand, the bandwidth pressure problem can be addressed and, on the other hand, good image compression performance can be maintained.

The existing Affine technology can be improved according to the solution provided by the present application, that is, the motion vector of the Sub-CU in the Affine mode is processed to an integer pixel accuracy, so that the bandwidth pressure generated by the Affine technology can be reduced.

In addition to the Affine technology, the solution provided in this application can also be applied to other similar technologies that may appear in the future, for example, technologies in which the pixel accuracy of the motion vector includes integer pixel accuracy and sub-pixel accuracy and the size of the image processing unit is small, for example, 4×4.

It should be understood that the solution provided in this application can be used to improve the quality of compressed video and improve the hardware friendliness of the codec, which is of great significance to the compression processing of videos such as broadcast television, video conference, and network video.

Optionally, in some embodiments, the method provided in the embodiment of the present application further includes: processing the CPMV of the image block to integer pixel accuracy.

This embodiment can ensure that the CPMV of the image block has an integer pixel accuracy.

Hereinafter, an embodiment of processing the CPMV of the image block into integer pixel accuracy will be described.

Optionally, as shown in FIG. 7, in some embodiments, step 610 includes the following step 611, step 612, and step 613.

611: Acquire a motion information candidate list of the image block.

For example, the motion vectors of the spatial and/or temporal neighboring blocks of the image block are acquired, and based on the motion vectors of these neighboring blocks, a motion information candidate list of the image block is constructed.

612: Process the motion vector in the motion information candidate list into integer pixel accuracy.

For example, the aforementioned formula (3) or formula (4) can be used to process the motion vector in the motion information candidate list into integer pixel accuracy.

The neighboring block refers to the neighboring block used to construct the motion information candidate list of the image block, for example, the neighboring block in the temporal and/or spatial domain. This application does not limit the manner of determining neighboring blocks.

613. Obtain the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed into integer pixel accuracy.

Affine inter prediction modes can be divided into Affine merge mode and Affine inter mode.

The embodiment shown in FIG. 7 can be applied to the Affine inter mode and can also be applied to the Affine merge mode.

Optionally, in the embodiment shown in FIG. 7, the inter-frame prediction mode of the image block is the Affine merge mode.

In the Affine merge mode, a CPMV can be selected from the motion information candidate list directly as the CPMV of the image block. That is, step 613 includes: selecting a CPMV from the motion information candidate list of the image block as the CPMV of the image block.

Because the motion vectors of the neighboring blocks used to construct the motion information candidate list have been processed to integer pixel accuracy, selecting a CPMV directly from the motion information candidate list as the CPMV of the image block ensures that the CPMV of the image block has integer pixel accuracy.

As an example, the general process of inter prediction in the Affine merge mode includes the following steps. In this example, the image block is a CU.

Step 1-1: Obtain the motion vector (MV) of the neighboring block from the spatial neighboring block and/or the temporal neighboring block. In this process, the MV of the neighboring block in the Affine mode and the MV of the neighboring block in the traditional mode are obtained, and CPMVs are obtained according to the MV combination of these neighboring blocks, and the motion information candidate list of the CU is constructed from these CPMVs.

Step 1-2: Process the motion vectors in the motion information candidate list of the CU to integer pixel accuracy.

Step 1-3: Select a combination from the motion information candidate list as the CPMVs of the CU. A combination may contain two or three CPMVs, corresponding to two or three control points.

In the Affine merge mode, the CPMVs selected from the motion information candidate list are used directly as the CPMVs of the current CU. No motion estimation is required, and the MVD used in the Affine inter mode (described below) does not exist. That is, in the Affine merge mode, only the index of the CPMVs selected from the motion information candidate list is written into the code stream (one CU needs only one index), and no MVD needs to be transmitted.

Regarding the neighboring block mentioned in step 1-1, the inter prediction mode of the neighboring block can be the traditional inter prediction mode or the Affine mode. Therefore, the MV obtained from the neighboring block may have integer pixel accuracy or sub-pixel accuracy.

In this embodiment, by processing the motion vector of the neighboring block of the current image block into integer pixel accuracy, it can ensure that the CPMV of the image block has integer pixel accuracy.

As mentioned above, the embodiment shown in FIG. 7 can also be applied to the Affine inter mode. To make this application of the embodiment easier to follow, the general flow of the Affine inter mode is described first.

As an example, the general process of the Affine inter mode includes the following steps. In this example, the image block is a CU.

Step 2-1: Obtain motion vectors of neighboring blocks from spatial neighboring blocks and/or temporal neighboring blocks. In this process, the motion vector of the neighboring block in the Affine mode and the motion vector of the neighboring block in the traditional mode are obtained; CPMVs are obtained by combining the obtained motion vectors, and the motion information candidate list of the CU is constructed from these CPMVs.

Step 2-2: Select a combination from the motion information candidate list constructed in step 2-1 as the motion vector prediction (MVP) of the current CU, that is, the predicted CPMVs of the current CU. A combination may contain two or three CPMVs, corresponding to two or three control points.

Step 2-3: Perform motion estimation with the entire current CU as a unit, and obtain the CPMVs of the current CU.

Step 2-4: Calculate the difference between the CPMVs selected in step 2-2 and the CPMVs obtained by motion estimation in step 2-3 to obtain a motion vector difference (MVD).

In the Affine inter mode, the index of the selected CPMVs and the MVD need to be written into the code stream.
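The relationship between the predicted CPMVs, the estimated CPMVs, and the MVD in steps 2-2 to 2-4 can be sketched as follows. The sign convention (MVD = estimated CPMV minus predicted CPMV) is an assumption based on common practice, the function names are hypothetical, and the values are made up for illustration in assumed 1/16-pixel units.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical); assumed to be in 1/16-pixel units


def compute_mvd(estimated: List[MV], predicted: List[MV]) -> List[MV]:
    """Encoder side (step 2-4): one difference per control point.

    Assumed convention: MVD = estimated CPMV - predicted CPMV.
    """
    return [(e[0] - p[0], e[1] - p[1]) for e, p in zip(estimated, predicted)]


def reconstruct_cpmvs(predicted: List[MV], mvd: List[MV]) -> List[MV]:
    """Decoder side: CPMV = predicted CPMV + MVD."""
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(predicted, mvd)]


predicted_cpmvs = [(32, -16), (64, 16)]   # chosen by index from the candidate list
estimated_cpmvs = [(48, -16), (64, 0)]    # result of motion estimation on the CU

mvd = compute_mvd(estimated_cpmvs, predicted_cpmvs)
print(mvd)  # [(16, 0), (0, -16)]
print(reconstruct_cpmvs(predicted_cpmvs, mvd) == estimated_cpmvs)  # True
```

In this sketch the predicted CPMVs and the MVD are both multiples of 16 (whole pixels under the assumed units), so the reconstructed CPMVs are whole-pixel values as well.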

In the Affine inter mode, the motion estimation process is performed in units of CUs (corresponding to the image block in the embodiments of this application), and the motion compensation process is performed in units of 4×4 sub-CUs (corresponding to the sub-image block in the embodiments of this application).

Regarding the neighboring block mentioned in step 2-1, the inter prediction mode of the neighboring block can be the traditional inter prediction mode or the Affine mode. Therefore, the MV obtained from the neighboring block may have integer pixel accuracy or sub-pixel accuracy.

In the Affine inter mode, the encoder selects among different pixel accuracies for the motion vector of the CU. This process can be called an adaptive motion vector resolution (AMVR) decision.

The pixel accuracy determined by the AMVR decision is essentially the pixel accuracy of the MVD, that is, the pixel accuracy of the CPMVs of the CU, not the pixel accuracy of the MVs of the sub-CUs.

In the existing Affine inter mode, the range of pixel accuracy for AMVR decisions includes but is not limited to: 1/16 pixel accuracy, 1/8 pixel accuracy, 1/4 pixel accuracy, 1/2 pixel accuracy, 1 pixel accuracy, 2 pixel accuracy, 4 pixel accuracy, etc. In other words, the CU can have multiple CPMVs with different pixel accuracies. For example, the CU can have three different CPMVs with integer pixel accuracy, 1/4 pixel accuracy, and 1/16 pixel accuracy.

Optionally, in the embodiment shown in FIG. 7, the inter prediction mode of the image block is the Affine inter mode. In this case, step 611 includes obtaining the motion information candidate list of the image block; step 612 includes processing the motion vectors in the motion information candidate list to integer pixel accuracy; and step 613 includes: selecting the predicted CPMV of the image block from the motion information candidate list of the image block, obtaining the MVD of the image block, and obtaining the CPMV of the image block according to the predicted CPMV of the image block and the MVD of the image block.

As shown in FIG. 8, in this embodiment, step 610 may further include step 614 of performing a motion vector accuracy decision of N pixels for the image block, where N is a positive integer.

That is, an AMVR decision with integer pixel accuracy is made for the image block.

It can be understood that by making AMVR decisions with integer pixel accuracy for the image block, the pixel accuracy of the MVD of the image block can be guaranteed to be integer pixels, and the pixel accuracy of the CPMV of the image block can also be guaranteed to be integer pixels. In this way, it can be ensured that no sub-pixels are involved in the motion estimation process of the image block, thereby reducing the bandwidth pressure to a certain extent.

In this embodiment, when Affine AMVR is used to make motion vector accuracy decisions, decisions are not made over all pixel accuracies. Instead, the decisions for 1/M (M>1) pixel accuracy are skipped, that is, only N-pixel accuracy decisions are made.
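Skipping the 1/M pixel accuracies can be pictured as filtering the set of candidate accuracies before the decision is made. The sketch below is illustrative only: the candidate list repeats the accuracies named in the text, and the function name `integer_pel_precisions` is hypothetical.

```python
from fractions import Fraction
from typing import List

# Candidate accuracies in pixels, as listed in the text for the existing Affine inter mode.
ALL_PRECISIONS: List[Fraction] = [
    Fraction(1, 16), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
    Fraction(1), Fraction(2), Fraction(4),
]


def integer_pel_precisions(precisions: List[Fraction]) -> List[Fraction]:
    """Keep only N-pixel accuracies (N a positive integer); skip every 1/M with M > 1."""
    return [p for p in precisions if p > 0 and p.denominator == 1]


allowed = integer_pel_precisions(ALL_PRECISIONS)
print([int(p) for p in allowed])  # [1, 2, 4]
```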

It should be understood that, in this embodiment, when the motion vector accuracy index is written into the code stream, the number of bits written is reduced correspondingly because there are fewer pixel accuracy options, and it may even be unnecessary to write any bits indicating the motion vector accuracy index. For example, suppose the original pixel accuracy options include three types: integer pixel, 1/4 pixel, and 1/16 pixel. Up to 2 bits of information are required to indicate these three pixel accuracies; for example, "0" indicates 1/4 pixel, "10" indicates 1/16 pixel, and "11" indicates integer pixel. In this embodiment, "0" can be used to indicate integer pixel, so only 1 bit needs to be written into the code stream. Alternatively, integer pixel accuracy can be agreed on in advance (for example, by protocol), so the motion vector accuracy index does not need to be written into the code stream at all. This saves signaling overhead while also reducing bandwidth pressure.
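The saving in signaled bits described above can be sketched with the codewords quoted in the text. The table names and the `precision_index_bits` helper are hypothetical, and the sketch ignores any entropy coding that a real codec would apply on top of the binarization.

```python
from typing import Dict

# Codewords quoted in the text for the original three options.
FULL_TABLE: Dict[str, str] = {"1/4": "0", "1/16": "10", "integer": "11"}

# Restricted case: only integer-pixel accuracy remains.
INTEGER_ONLY_TABLE: Dict[str, str] = {"integer": "0"}


def precision_index_bits(precision: str, integer_only: bool, implicit: bool = False) -> str:
    """Return the bit string written to the code stream for the accuracy index.

    implicit=True models the case where integer-pixel accuracy is agreed on in
    advance, so nothing is written at all.
    """
    if integer_only:
        if precision != "integer":
            raise ValueError("only integer-pixel accuracy is allowed in this mode")
        return "" if implicit else INTEGER_ONLY_TABLE[precision]
    return FULL_TABLE[precision]


print(len(precision_index_bits("integer", integer_only=False)))                 # 2
print(len(precision_index_bits("integer", integer_only=True)))                  # 1
print(len(precision_index_bits("integer", integer_only=True, implicit=True)))   # 0
```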

It should be noted that when the inter prediction mode of the image block is the Affine inter mode, the N-pixel (N is a positive integer) motion vector accuracy decision on the image block can be implemented in combination with the embodiment shown in FIG. 8, or independently of the embodiment shown in FIG. 8.

Optionally, as shown in FIG. 8, in some embodiments, the inter prediction mode of the image block is the Affine inter mode, and step 610 includes: obtaining the CPMV of the image block, and performing an N-pixel motion vector accuracy decision on the image block, where N is a positive integer.

It should be understood that by making an N-pixel motion vector accuracy decision on the image block, whether or not the motion vectors of the neighboring blocks of the image block are processed to the integer pixel accuracy, the CPMV of the image block can be guaranteed to have the integer pixel accuracy.

It should also be understood that in the Affine Inter mode, by processing the pixel accuracy of the CPMV of the image block as an integer pixel accuracy, it is possible to ensure the motion estimation of the integer pixel accuracy, which helps reduce bandwidth pressure.

It can be seen from the above that, in the Affine merge mode, the pixel accuracy of the CPMV of the image block is processed to integer pixel accuracy by processing the motion vectors in the motion information candidate list to integer pixel accuracy.

In the Affine inter mode, the pixel accuracy of the CPMV of the image block is processed to integer pixel accuracy by processing the motion vectors in the motion information candidate list to integer pixel accuracy and performing an AMVR decision with integer pixel accuracy on the image block.

Alternatively, in the Affine inter mode, the pixel accuracy of the CPMV of the image block is processed to integer pixel accuracy by performing an AMVR decision with integer pixel accuracy on the image block.

In the foregoing embodiments that involve processing the motion vector of a neighboring block to integer pixel accuracy, the method shown in formula (3) or formula (4) above can be used. Other feasible algorithms or methods that convert sub-pixel accuracy to integer pixel accuracy can also be used to process the motion vectors of neighboring blocks. This application does not limit this.

Optionally, in some embodiments, when the size of the image block is smaller than the threshold value, the CPMV of the image block is processed into integer pixel accuracy.

The threshold can be determined according to actual needs. For example, the threshold is 16 pixels.

For example, when the height and/or width of the image block is less than 16 pixels, the CPMV of the image block is processed to the integer pixel accuracy.

As described above, in the Affine inter mode, motion estimation is performed in units of image blocks. For example, when the height and width of the image block are both equal to or greater than 16 pixels, even a motion estimation process with sub-pixel accuracy does not cause large bandwidth pressure. In this case, the CPMV of the image block need not be processed to integer pixel accuracy.

However, if the height and/or width of the image block is less than 16 pixels, for example, if the size of the image block is 4×8, 8×4, 4×16, or 16×4, a motion estimation process with sub-pixel accuracy may cause large bandwidth pressure. In this case, the CPMV of the image block can be processed to integer pixel accuracy.

Optionally, in some embodiments, the prediction mode of the image block is the Affine inter mode, and the height and/or width of the image block is less than 16 pixels. The method according to the embodiment of the present application further includes: performing an AMVR decision with integer pixel accuracy on the image block.

This embodiment can ensure that the motion estimation process has integer pixel accuracy, so as to avoid causing large bandwidth pressure.

In addition, when the motion vector accuracy index of the image block that meets the condition of height and/or width less than 16 pixels is written into the code stream, the number of bits written into the code stream can be reduced because the pixel accuracy options are reduced.

For example, for a CU whose height and width are both greater than or equal to 16 pixels, the AMVR pixel accuracy can be selected from three options: integer pixel, 1/4 pixel, and 1/16 pixel. For example, "0" represents 1/4 pixel, "10" represents 1/16 pixel, and "11" represents integer pixel. For a CU whose height and/or width is less than 16 pixels, because there is only one AMVR pixel accuracy option, there is no need to write the AMVR pixel accuracy index into the code stream. For example, integer pixel accuracy can be adopted by agreement.
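The size-dependent behavior described above can be sketched as follows. The 16-pixel threshold and the three accuracy options come from the example in the text. The function names are hypothetical, and "height and/or width less than the threshold" is read here as "at least one dimension is below the threshold", which is an assumption.

```python
from typing import List

THRESHOLD = 16  # example threshold from the text, in pixels


def is_small_block(width: int, height: int, threshold: int = THRESHOLD) -> bool:
    """True when the width or the height (or both) is below the threshold."""
    return width < threshold or height < threshold


def amvr_options(width: int, height: int) -> List[str]:
    """AMVR accuracy options available to a CU, following the example in the text."""
    if is_small_block(width, height):
        return ["integer"]
    return ["integer", "1/4", "1/16"]


def index_is_signaled(width: int, height: int) -> bool:
    """An accuracy index is only needed when there is more than one option."""
    return len(amvr_options(width, height)) > 1


for size in [(4, 8), (16, 4), (16, 16), (32, 64)]:
    print(size, amvr_options(*size), index_is_signaled(*size))
```

With these inputs, the 4×8 and 16×4 blocks are restricted to integer pixel accuracy and need no index, while the 16×16 and 32×64 blocks keep all three options.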

The embodiments of the present application can be applied to different kinds of inter-frame prediction methods, for example, forward prediction, backward prediction, or bi-prediction. In other words, the inter-frame prediction mode of the sub-image block mentioned in the embodiment of the present application may be any of the following: forward prediction, backward prediction, and bi-prediction.

For example, if the inter-frame prediction mode of the sub-image block is forward prediction, the motion vector of the sub-image block obtained in the forward prediction process is processed to integer pixel accuracy.

For another example, if the inter-frame prediction mode of the sub-image block is backward prediction, the motion vector of the sub-image block obtained in the backward prediction process is processed to integer pixel accuracy.

For another example, if the inter-frame prediction mode of the sub-image block is bi-prediction, the motion vectors of the sub-image block obtained in the bi-prediction process are processed to integer pixel accuracy.

Optionally, the inter-frame prediction mode of the sub-image block is bi-prediction, but for only one prediction process in the bi-prediction, the method provided in the embodiment of the present application is used to process the motion vector of the sub-image block to integer pixel accuracy.

For example, the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.

In other words, if the inter-frame prediction mode of the sub-image block is bi-prediction, the motion vector of the sub-image block obtained in only one of the two prediction processes of the bi-prediction may be processed to integer pixel accuracy. This prediction process may be the forward prediction process or the backward prediction process of the bi-prediction.
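Applying the rounding to only one direction of a bi-predicted sub-image block can be sketched as follows. This is a minimal sketch under assumptions: motion vectors are taken to be in 1/16-pixel units, the rounding rule is a generic round-to-nearest rather than formula (3) or formula (4), and `round_mv` and `process_bi_prediction` are hypothetical names.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical); assumed to be in 1/16-pixel units
SHIFT = 4             # assumption: 16 sub-pixel steps per integer pixel


def round_mv(mv: MV, shift: int = SHIFT) -> MV:
    """Round both components to the nearest integer-pixel position (illustrative rule)."""
    offset = 1 << (shift - 1)
    return (((mv[0] + offset) >> shift) << shift,
            ((mv[1] + offset) >> shift) << shift)


def process_bi_prediction(mv_forward: MV, mv_backward: MV,
                          round_forward: bool = True,
                          round_backward: bool = False) -> Tuple[MV, MV]:
    """Round the forward MV, the backward MV, or both, depending on the flags."""
    fwd = round_mv(mv_forward) if round_forward else mv_forward
    bwd = round_mv(mv_backward) if round_backward else mv_backward
    return fwd, bwd


# Only the forward direction is processed; the backward MV keeps sub-pixel accuracy.
fwd, bwd = process_bi_prediction((37, -5), (-21, 12))
print(fwd, bwd)  # (32, 0) (-21, 12)
```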

It can be seen from the above that the solution provided by this application, by making the motion vector of the sub-image block serving as the image processing unit have integer pixel accuracy, allows the motion compensation process of the sub-image block to involve no sub-pixels, thereby reducing, to a certain extent, the bandwidth pressure caused by the Affine prediction technology.

Further, by processing the pixel accuracy of the CPMV of the image block as integer pixel accuracy, in the Affine Inter mode, the motion estimation with integer pixel accuracy can be guaranteed, which helps reduce bandwidth pressure.

Therefore, the solution provided by the present application can reduce the bandwidth pressure caused by the inter-frame prediction process, and at the same time can ensure a certain compression performance.

The method embodiments of the present application are described above, and the device embodiments of the present application will be described below. It should be understood that the description of the device embodiment and the description of the method embodiment correspond to each other. Therefore, for the content that is not described in detail, please refer to the previous method embodiment. For brevity, details are not repeated here.

As shown in FIG. 9, an embodiment of the present application provides an image processing apparatus 900, which includes the following units.

The first acquiring unit 910 is configured to acquire the control point motion vector (CPMV) of the image block.

The second acquiring unit 920 is configured to acquire a motion vector of a sub-image block in the image block according to the CPMV of the image block acquired by the first acquiring unit 910, and the motion vector has an integer pixel accuracy.

Optionally, in some embodiments, the second acquiring unit 920 is configured to: calculate a first motion vector of the sub-image block according to the CPMV of the image block, where the first motion vector has sub-pixel accuracy; and process the first motion vector into a second motion vector with integer pixel accuracy.
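The two-stage behavior of the second acquiring unit can be sketched as follows. The sketch assumes the widely used 4-parameter affine model with two control points (top-left `v0` and top-right `v1`), motion vectors in 1/16-pixel units, and round-to-nearest as the integer-pixel processing. The formulas actually used by the document (including formula (3) and formula (4)) may differ, and the function names are hypothetical.

```python
import math
from typing import Tuple

UNITS_PER_PIXEL = 16  # assumption: MVs are stored in 1/16-pixel units


def first_mv(v0: Tuple[int, int], v1: Tuple[int, int],
             block_width: int, x: float, y: float) -> Tuple[float, float]:
    """First MV of a sub-block at position (x, y), 4-parameter affine model (assumed).

    (x, y) is measured in pixels from the top-left control point; the result
    generally has sub-pixel accuracy.
    """
    a = (v1[0] - v0[0]) / block_width
    b = (v1[1] - v0[1]) / block_width
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


def second_mv(mv: Tuple[float, float]) -> Tuple[int, int]:
    """Second MV: the nearest integer-pixel position (ties go toward +infinity)."""
    return (math.floor(mv[0] / UNITS_PER_PIXEL + 0.5) * UNITS_PER_PIXEL,
            math.floor(mv[1] / UNITS_PER_PIXEL + 0.5) * UNITS_PER_PIXEL)


# A 16-pixel-wide block with made-up CPMVs; sub-block centers at (2, 2) and (14, 14).
v0, v1 = (16, 0), (48, 16)
mv_a = first_mv(v0, v1, 16, 2, 2)      # (18.0, 6.0): sub-pixel accuracy
mv_b = first_mv(v0, v1, 16, 14, 14)    # (30.0, 42.0): sub-pixel accuracy
print(second_mv(mv_a), second_mv(mv_b))  # (16, 0) (32, 48)
```

Both second motion vectors are multiples of 16, so under the assumed units motion compensation for these sub-blocks would read whole-pixel positions only.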

Optionally, in some embodiments, the second acquiring unit 920 is configured to obtain the second motion vector according to the first motion vector of the sub-image block, so that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.

For example, the second acquiring unit 920 is configured to process the first motion vector into a second motion vector with integer pixel accuracy through formula (3) or formula (4).

Optionally, in some embodiments, the height and/or width of the sub-image block is 4 pixels.

Optionally, in some embodiments, the first acquiring unit 910 is configured to: obtain a motion information candidate list of the image block; process the motion vectors in the motion information candidate list to integer pixel accuracy; and obtain the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed to integer pixel accuracy.

Optionally, in some embodiments, the device 900 further includes: a processing unit 930, configured to make a motion vector accuracy decision of N pixels for the image block, where N is a positive integer.

Optionally, in some embodiments, the height and/or width of the image block is less than 16 pixels.

Optionally, in some embodiments, the inter-frame prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, and bi-prediction.

Optionally, in some embodiments, the inter-frame prediction mode of the sub-image block is bi-prediction, wherein the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.

Optionally, the image processing apparatus 900 of this embodiment may be an encoder, and the apparatus 900 may also include functional modules for implementing video encoding related processes.

Optionally, the image processing apparatus 900 of this embodiment may be a decoder, and the apparatus 900 may further include functional modules for implementing video decoding related processes.

As shown in FIG. 10, an embodiment of the present invention also provides an image processing apparatus 1000. The apparatus 1000 includes a processor 1010 and a memory 1020. The memory 1020 is used to store instructions, and the processor 1010 is used to execute the instructions stored in the memory 1020. Execution of the instructions stored in the memory 1020 causes the processor 1010 to perform the method of the above method embodiments.

Specifically, the encoding device 1000 further includes a communication interface 1030 for transmitting signals with external devices.

Optionally, the image processing apparatus 1000 in this embodiment is an encoder, and the communication interface 1030 is used to receive image or video data to be processed from an external device. Alternatively, the communication interface 1030 is also used to send a coded stream to the decoding end.

Optionally, the image processing apparatus 1000 in this embodiment is a decoder, and the communication interface 1030 is used to receive an encoded bitstream from an encoding end.

The embodiment of the present invention also provides a computer storage medium on which a computer program is stored. When the computer program is executed by a computer, the computer executes the method in the above method embodiment.

An embodiment of the present invention also provides a computer program product containing instructions, which is characterized in that, when the instructions are executed by a computer, the computer executes the method of the above method embodiment.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer can be a general-purpose computer, a dedicated computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). A computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. Available media may be magnetic media (for example, floppy disk, hard disk, tape), optical media (for example, digital video disc (DVD)), or semiconductor media (for example, solid state disk (SSD)), etc.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

1. An image processing method, characterized in that it comprises:

obtaining a control point motion vector (CPMV) of an image block;

obtaining a motion vector of a sub-image block in the image block according to the CPMV of the image block, the motion vector having integer pixel accuracy.

2. The method according to claim 1, wherein obtaining the motion vector of the sub-image block in the image block according to the CPMV of the image block comprises:

calculating a first motion vector of the sub-image block according to the CPMV of the image block, where the first motion vector has sub-pixel accuracy;

processing the first motion vector into a second motion vector with integer pixel accuracy.

3. The method according to claim 1 or 2, wherein the height and/or width of the sub-image block is 4 pixels.

4. The method according to any one of claims 1 to 3, wherein obtaining the CPMV of the image block comprises:

acquiring a motion information candidate list of the image block;

processing the motion vectors in the motion information candidate list to integer pixel accuracy;

acquiring the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed to integer pixel accuracy.

5. The method according to any one of claims 1 to 4, wherein the method further comprises:

performing an N-pixel motion vector accuracy decision on the image block, where N is a positive integer.

6. The method according to any one of claims 1 to 5, wherein the height and/or width of the image block is less than 16 pixels.

7. The method according to any one of claims 1 to 6, wherein the inter-frame prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, and bi-prediction.

8. The method according to any one of claims 1 to 6, wherein the inter-frame prediction mode of the sub-image block is bi-prediction, and the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.

9. The method according to claim 2, wherein processing the first motion vector into a second motion vector with integer pixel accuracy comprises:

acquiring the second motion vector according to the first motion vector, so that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.

10. An image processing device, characterized in that it comprises:

a first acquiring unit, configured to acquire a control point motion vector (CPMV) of an image block;

a second acquiring unit, configured to acquire a motion vector of a sub-image block in the image block according to the CPMV of the image block acquired by the first acquiring unit, the motion vector having integer pixel accuracy.

11. The device according to claim 10, wherein the second acquiring unit is configured to:

calculate a first motion vector of the sub-image block according to the CPMV of the image block, where the first motion vector has sub-pixel accuracy;

process the first motion vector into a second motion vector with integer pixel accuracy.

12. The device according to claim 10 or 11, wherein the height and/or width of the sub-image block is 4 pixels.

13. The device according to any one of claims 10 to 12, wherein the first acquiring unit is configured to:

acquire a motion information candidate list of the image block;

process the motion vectors in the motion information candidate list to integer pixel accuracy;

acquire the CPMV of the image block according to the motion vectors in the motion information candidate list that have been processed to integer pixel accuracy.

14. The device according to any one of claims 10 to 13, wherein the device further comprises:

a processing unit, configured to perform an N-pixel motion vector accuracy decision on the image block, where N is a positive integer.

15. The device according to any one of claims 10 to 14, wherein the height and/or width of the image block is less than 16 pixels.

16. The device according to any one of claims 10 to 15, wherein the inter-frame prediction mode of the sub-image block is any one of the following: forward prediction, backward prediction, and bi-prediction.

17. The device according to any one of claims 10 to 16, wherein the inter-frame prediction mode of the sub-image block is bi-prediction, and the CPMV of the image block is the CPMV of the image block obtained by forward prediction in the bi-prediction process, or the CPMV of the image block obtained by backward prediction in the bi-prediction process.

18. The device according to claim 11, wherein the second acquiring unit is configured to acquire the second motion vector according to the first motion vector, so that the end point of the second motion vector is the integer pixel point closest to the end point of the first motion vector.

19. An image processing device, characterized by comprising: a memory and a processor, wherein the memory is used to store instructions, the processor is used to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to execute the method according to any one of claims 1 to 9.

20. A computer storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed by a computer, the computer executes the method according to any one of claims 1 to 9.

21. A computer program product containing instructions, characterized in that, when the instructions are executed by a computer, the computer executes the method according to any one of claims 1 to 9.