WO2020252707A1

WO2020252707A1 - Video processing method and device

Info

Publication number: WO2020252707A1
Application number: PCT/CN2019/091955
Authority: WO
Inventors: 孟学苇; 郑萧桢; 王苫社; 马思伟
Original assignee: 北京大学; 深圳市大疆创新科技有限公司
Priority date: 2019-06-19
Filing date: 2019-06-19
Publication date: 2020-12-24
Also published as: CN111656782A

Abstract

Embodiments of the present application provide a video processing method and device, capable of effectively implementing an interpolation process in a motion estimation and/or motion compensation process. The method comprises: performing motion estimation and/or motion compensation on an image block of a target frame having multiple motion vectors (MVs) using an interpolation filter in a plurality of interpolation filters.

Description

Video processing method and equipment

Copyright statement

The content disclosed in this patent document contains copyrighted material. The copyright belongs to the copyright owner. The copyright owner does not object to anyone copying the patent document or the patent disclosure in the official records and archives of the Patent and Trademark Office.

Technical field

This application relates to the field of image processing, and more specifically, to a video processing method and device.

Background technique

Prediction is an important module of the mainstream video coding framework. Prediction can include intra-frame prediction and inter-frame prediction.

The general process of inter prediction may include motion estimation (ME) and motion compensation (MC). Motion estimation is the process of obtaining a motion vector (MV) by searching the reference frame for the current coding block of the current frame and comparing the candidates. Motion compensation is the process of obtaining the prediction block of the current block by using the MV and the reference block. The prediction block obtained by motion compensation may differ from the original current block. Therefore, the difference (residual) between the prediction block and the current block needs to be transmitted to the decoding end after transformation, quantization, and so on. In addition, the information on the MV and the reference frame is passed to the decoding end so that the decoding end can reconstruct the current frame.

Due to the continuity of natural object motion, the motion vector of an object between two adjacent frames may not be exactly an integer number of pixel units. In order to improve the accuracy of the motion vector, sub-pixel accuracy is proposed. For example, in the High Efficiency Video Coding (HEVC) standard, a motion vector with 1/4 pixel accuracy is used for motion estimation of the luminance component. However, there are no samples at sub-pixel positions in digital video. Generally speaking, in order to achieve 1/K pixel accuracy estimation, the values of these sub-pixels must be approximated by interpolation, that is, K-fold interpolation is performed on the reference frame in the row direction and the column direction, and the prediction block is searched for in the interpolated reference frame. In the process of interpolating the current block, the pixels in the current block and the pixels in the adjacent area need to be used.

How to effectively implement the above interpolation process is an urgent problem to be solved.

Summary of the invention

The embodiments of the present application provide a video processing method and device, which can effectively implement the interpolation process in the motion estimation and/or motion compensation process.

In a first aspect, a video processing method is provided, which includes: using an interpolation filter among a variety of interpolation filters to perform motion estimation and/or motion compensation on an image block with multiple motion vectors MV of a target frame.

In a second aspect, a video processing device is provided, including a processor, and the processor is configured to call codes stored in a memory to perform the following operations:

Using the interpolation filter among the multiple interpolation filters, motion estimation and/or motion compensation are performed on the image block with multiple MVs of the target frame.

In a third aspect, a computer system is provided, including: a memory, configured to store computer-executable instructions; and a processor, configured to access the memory and execute the computer-executable instructions to perform the operations of the method in the first aspect.

In a fourth aspect, a computer storage medium is provided, the computer storage medium stores program code, and the program code can be used to instruct the execution of the method of the first aspect.

In a fifth aspect, a computer program product is provided. The program product includes program code, and the program code can be used to instruct to execute the method of the first aspect.

Therefore, in the embodiment of the present application, for image blocks with multiple MVs, there may be multiple interpolation filters to choose from, and the interpolation filters can be flexibly selected, so that the storage bandwidth pressure can be reduced while ensuring the encoding performance.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without creative work.

Fig. 1 is a frame diagram of video coding according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a prediction method according to an embodiment of the present application.

Fig. 3 is a schematic diagram of an image block interpolation process according to an embodiment of the present application.

Fig. 4 is a schematic diagram of the control points of the Affine mode according to an embodiment of the present application.

Fig. 5 is a schematic diagram of a motion vector of a CU according to an embodiment of the present application.

Fig. 6 is a schematic flowchart of a video processing method according to an embodiment of the present application.

Fig. 7 is a schematic block diagram of a video processing device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are a part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by those skilled in the technical field of the present application. The terminology used in this application is only for the purpose of describing specific embodiments, and is not intended to limit the scope of this application.

As shown in Figure 1, the video coding framework mainly includes intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.

This application is mainly aimed at improving the inter prediction (inter prediction) part.

The general idea of inter-frame prediction is to exploit the temporal correlation between adjacent frames of the video: a reconstructed frame is used as a reference frame, and the current frame is predicted through motion estimation (ME) and motion compensation (MC), thereby removing the temporal redundancy of the video.

The current frame (or target frame) mentioned in this article refers to the frame currently being encoded in the encoding scene, and refers to the frame currently being decoded in the decoding scene.

The reconstructed frame mentioned in this article, in the encoding scene, means the previously encoded frame, in the decoding scene, means the previously decoded frame.

For a frame of image, the entire frame of image is not directly processed in the encoding process, and the entire frame of image is usually divided into image blocks for processing.

As an example, the entire frame is first divided into coding tree units (CTUs), for example with a CTU size of 64×64 or 128×128 (unit: pixels), and each CTU can then be further divided into square or rectangular coding units (CUs). During the encoding process, processing can be performed per CU.

The unit of the size of the image block mentioned in this article may all be pixels.

The general flow of inter prediction is as follows.

For the current image block in the current frame (hereinafter referred to as the current block for short), the most similar block is found in the reference frame as the prediction block of the current block. The relative displacement between the current block and similar blocks is called a Motion Vector (MV). Motion estimation refers to the process of obtaining a motion vector after searching and comparing the current block of the current frame in the reference frame. Motion compensation refers to the process of obtaining a prediction block using a reference block and a motion vector obtained by motion estimation.
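The flow described above can be sketched as an integer-pixel full search followed by motion compensation. This is a minimal illustration, not the method of the application: the function names, the sum of absolute differences (SAD) as the similarity measure, and the square search window are all assumptions introduced here.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_estimate(cur, ref, bx, by, size, search_range):
    """Full search over integer displacements; returns the (dx, dy) with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would fall outside the frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def motion_compensate(ref, bx, by, mv, size):
    """Fetch the prediction block that the MV points to in the reference frame."""
    dx, dy = mv
    return [row[bx + dx: bx + dx + size] for row in ref[by + dy: by + dy + size]]
```

A real encoder would typically use a rate-distortion cost and a faster search pattern; the exhaustive SAD search is used here only because it is the simplest correct instance of "searching and comparing".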

The prediction block obtained in the process of inter prediction may be different from the original current block. Therefore, the difference between the prediction block and the current block can be calculated, and the difference may be called the residual. After performing transformation, quantization, entropy coding and other processing on the residual, the coded bit stream is obtained.

At the encoding end, after the image encoding is completed, that is, after the bitstream is obtained by entropy encoding, the bitstream and the encoding mode information, such as the inter-frame prediction mode and the motion vector information, can be stored or sent to the decoding end.

At the decoding end, after the entropy-coded bitstream is obtained, entropy decoding is first performed on the bitstream to obtain the corresponding residual; then the prediction block is obtained according to the decoded coding mode information such as the motion vector; finally, the value of each pixel in the current block is obtained from the residual and the prediction block, that is, the current block is reconstructed. Repeating this for the remaining blocks reconstructs the current frame.
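The relationship between the residual computed at the encoding end and the reconstruction at the decoding end can be sketched as follows. The sketch deliberately leaves out transformation, quantization, and entropy coding, so the round trip is lossless here; the function names and the clipping to the sample range are assumptions introduced for illustration.

```python
def compute_residual(cur_block, pred_block):
    """Encoder side: residual = current block - prediction block, sample by sample."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]


def reconstruct(pred_block, resid_block, bit_depth=8):
    """Decoder side: reconstruction = prediction + residual, clipped to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
            for pred_row, resid_row in zip(pred_block, resid_block)]
```

With quantization in the loop, the decoded residual would only approximate the original one, so the reconstruction would approximate the current block rather than equal it.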

As shown in Figure 1, the encoding process may also include steps such as inverse quantization and inverse transformation. Inverse quantization is the inverse of the quantization process, and inverse transformation is the inverse of the transformation process.

Inter prediction may include forward prediction, backward prediction, bi-prediction, and so on.

Forward prediction uses reconstructed frames that precede the current frame (which may be referred to as historical frames) to predict the current frame (for example, the frame labeled t in FIG. 2). Backward prediction uses frames after the current frame (which may be called future frames) to predict the current frame. Bi-prediction can be bidirectional prediction, that is, using both "historical frames" (for example, the frames labeled t-2 and t-1 in Figure 2) and "future frames" (for example, the frames labeled t+2 and t+1 in Figure 2) to predict the current frame. Bi-prediction can also be prediction in the same direction, for example, using two "historical frames" to predict the current frame, or using two "future frames" to predict the current frame.

In video coding and decoding, different frame types can be set, and different frame types can support different types of inter-frame prediction modes. Among them, the frame types can include three types: "I frame", "B frame" and "P frame".

All image blocks in the "I frame" use intra-frame coding and do not refer to the information of other frames.

"B frame" is a bidirectional predictive frame. The image blocks in the B frame may use intra-frame coding or inter-frame coding mode. For a bi-predicted B frame, the inter prediction mode of its image block can be forward prediction, backward prediction, or bidirectional prediction. Therefore, the inter prediction block MV of the B frame can be a single MV or a dual MV.

"Generalized B frame" (Generalized P and B picture, referred to as GPB) is a structure in HEVC that combines the characteristics of traditional B and P frames. Generalized B-frames can adopt a dual forward prediction method, that is, there are two reference frames and all of them are "historical frames". For the coding block of a generalized B frame, it may also be an intra mode, a forward prediction mode, and a dual forward prediction mode. The inter prediction block MV of a generalized B frame can be a single MV or a dual MV.

"P frame" is a forward prediction frame, and it is a unidirectional prediction. The coded block in the P frame may use the intra prediction mode, and may use the forward prediction mode. Since the P frame is a unidirectional prediction frame, the inter prediction block MV of the P frame is all a single MV.

The above several frame types can be combined in a specific way to obtain several different encoding methods.

For example, there can be four coding methods in HEVC: All Intra (AI), Random Access (RA), Low Delay B Frame Coding (Low Delay B, LDB), and Low Delay P frame coding (Low Delay P, LDP).

In the AI coding mode, all frames are I frames (I I I I I I I I...).

In the RA encoding mode, it is mainly B frames, and I frames are inserted periodically (approximately every second), which means that in this encoding mode, it is I B B B B B B B B B B...B B B I B B B B B B B B B...B B B I....

In the LDB encoding mode, only the first frame is an I frame, and the rest are coded in a generalized B frame manner. The frame structure is I B B B B B.... In LDP, only the first frame is an I frame, and the rest are coded in the manner of P frames. The frame structure is I P P P P...
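The frame-type patterns of the four coding methods can be written out with a small helper. This is only a sketch of the patterns as listed above; the function name and the `intra_period` parameter (the spacing of I frames in RA, which the text describes only as roughly one per second) are assumptions.

```python
def frame_types(mode, num_frames, intra_period=32):
    """Return the frame-type pattern (one letter per frame) for a coding method."""
    if mode == "AI":
        return "I" * num_frames
    if mode == "RA":
        # B frames with an I frame inserted every intra_period frames.
        return "".join("I" if i % intra_period == 0 else "B" for i in range(num_frames))
    if mode == "LDB":
        # Only the first frame is an I frame; the rest are generalized B frames.
        return "".join("I" if i == 0 else "B" for i in range(num_frames))
    if mode == "LDP":
        return "".join("I" if i == 0 else "P" for i in range(num_frames))
    raise ValueError(f"unknown coding method: {mode}")
```

For instance, `frame_types("RA", 9, intra_period=4)` yields the pattern `IBBBIBBBI`.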

The inter-frame prediction technology in HEVC can include three modes, namely inter mode (also called AMVP mode), merge mode and skip mode.

For inter mode, the motion vector prediction (MVP) can be determined first. After the MVP is obtained, the starting point of motion estimation can be determined according to the MVP, and a motion search is performed near the starting point. After the search is completed, the optimal MV is obtained. The MV determines the position of the reference block in the reference image; the reference block is subtracted from the current block to obtain the residual block, the MVP is subtracted from the MV to obtain the motion vector difference (MVD), and the MVD is transmitted to the decoding end through the code stream.
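The MVD relationship in inter (AMVP) mode can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that the decoding end can derive the same MVP as the encoding end, so that only the MVD needs to be transmitted.

```python
def encode_mvd(mv, mvp):
    """Encoder side: MVD = MV - MVP, per component."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvp, mvd):
    """Decoder side: MV = MVP + MVD, per component."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

The closer the MVP is to the MV found by the motion search, the smaller the MVD that has to be written into the code stream.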

For the Merge mode, the MVP can be determined first, and the MVP can be directly used as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) can be constructed first. The MVP candidate list can include at least one candidate MVP, and each candidate MVP can correspond to an index. After selecting the MVP from the MVP candidate list, the encoder can write the MVP index into the code stream, and the decoder can use the index to find the corresponding MVP in the MVP candidate list and thereby decode the image block.

In order to understand the Merge mode more clearly, the following will introduce the operation process of using the Merge mode to encode.

Step 1: Obtain the MVP candidate list;

Step 2: Select an optimal MVP from the MVP candidate list, and at the same time obtain the index of the MVP in the MVP candidate list;

Step 3: Use the MVP as the MV of the current block;

Step 4: Determine the position of the reference block (also called the prediction block) in the reference frame image according to the MV;

Step 5: Subtract the reference block from the current block to obtain residual data;

Step 6: Pass the residual data and the index of the MVP to the decoder.

It should be understood that the above process is only a specific implementation of the Merge mode. Merge mode can also have other implementations.

For example, Skip mode is a special case of Merge mode. After obtaining the MV according to the Merge mode, if the encoding end determines that the current block is basically the same as the reference block, there is no need to transmit residual data; only the index of the MV needs to be transmitted, and a flag can additionally be passed to indicate that the current block can be obtained directly from the reference block.

In other words, the feature of the Merge mode is: MV=MVP (MVD=0); and the Skip mode has one more feature, namely: reconstruction value rec=predicted value pred (residual value resi=0).
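The Merge steps and the Skip special case can be sketched together. This is an illustration under stated assumptions: `fetch_block` is a hypothetical helper that returns the reference block an MV points to, the best candidate is chosen by SAD rather than by a rate-distortion cost, and Skip is signalled only when the residual is exactly zero, whereas the text allows "basically the same".

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def merge_encode(cur_block, candidates, fetch_block):
    """Pick the best candidate MVP, use it as the MV (MVD = 0), and decide Merge vs. Skip."""
    costs = [block_sad(cur_block, fetch_block(mv)) for mv in candidates]
    index = costs.index(min(costs))            # Step 2: best MVP and its index
    pred = fetch_block(candidates[index])      # Steps 3-4: MV = MVP locates the reference block
    resid = [[c - p for c, p in zip(cr, pr)]   # Step 5: residual = current - reference
             for cr, pr in zip(cur_block, pred)]
    skip = all(v == 0 for row in resid for v in row)
    # Step 6: only the index (plus the residual when not skipped) is passed on.
    return {"merge_index": index, "skip": skip, "residual": None if skip else resid}
```

In both cases no MVD is produced, which is what distinguishes Merge and Skip from inter (AMVP) mode.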

In the actual scene, due to the continuity of natural object motion, the motion vector of the object between two adjacent frames may not be exactly an integer number of pixels. Therefore, the accuracy of motion estimation can be improved to the sub-pixel level (also called 1/K pixel accuracy). For example, in the HEVC standard, motion vectors with 1/4 pixel accuracy are used for motion estimation of the luminance component.

However, there is no sample value at 1/K-pixel positions in digital video. Generally, in order to achieve motion estimation with 1/K pixel accuracy, the values at 1/K-pixel positions are approximated by interpolation. In other words, K-fold interpolation is performed on the reference frame in the row direction and the column direction, and the search is performed in the interpolated image. In the process of interpolating the current block, the pixels in the current block and the pixels in the adjacent area need to be used.

As an example, the 1/4 pixel interpolation process is shown in FIG. 3; the 3 pixels on the left side and the 4 pixels on the right side of the image block to be encoded can be used to generate the pixel value of an interpolation point. As shown in Figure 3, for an image block with a size of 4×4, $a_{0,0}$ and $d_{0,0}$ are 1/4-pixel positions, $b_{0,0}$ and $h_{0,0}$ are half-pixel positions, and $c_{0,0}$ and $n_{0,0}$ are 3/4-pixel positions. If the current block is a 2×2 block, it is the block enclosed by $A_{0,0}$ to $A_{1,0}$ and $A_{0,0}$ to $A_{0,1}$. In order to calculate all the interpolation points in this 2×2 block, some points outside the 2×2 block need to be used, including 3 on the left, 4 on the right, 3 on the top, and 4 on the bottom. The image blocks mentioned here may be 8×8, 4×8, 4×4, or 8×4 image blocks, and may also be image blocks of other sizes, which are not specifically limited in the embodiment of the present application.

The pixel interpolation of each point can be obtained by formula 1-21 in the following way:

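$$a_{0,j} = \Big(\sum_{i=-3}^{3} A_{i,j}\,\mathrm{qfilter}[i]\Big) \gg (B-8) \qquad \text{Formula 1}$$

$$b_{0,j} = \Big(\sum_{i=-3}^{4} A_{i,j}\,\mathrm{hfilter}[i]\Big) \gg (B-8) \qquad \text{Formula 2}$$

$$c_{0,j} = \Big(\sum_{i=-2}^{4} A_{i,j}\,\mathrm{qfilter}[1-i]\Big) \gg (B-8) \qquad \text{Formula 3}$$

$$d_{0,0} = \Big(\sum_{j=-3}^{3} A_{0,j}\,\mathrm{qfilter}[j]\Big) \gg (B-8) \qquad \text{Formula 4}$$

$$h_{0,0} = \Big(\sum_{j=-3}^{4} A_{0,j}\,\mathrm{hfilter}[j]\Big) \gg (B-8) \qquad \text{Formula 5}$$

$$n_{0,0} = \Big(\sum_{j=-2}^{4} A_{0,j}\,\mathrm{qfilter}[1-j]\Big) \gg (B-8) \qquad \text{Formula 6}$$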

The interpolation process in the embodiment of the present application can be implemented by an interpolation filter. The number of taps of the interpolation filter may refer to the maximum number of pixel values that may be used to calculate an interpolated sample. The coefficients of the 8-tap interpolation filter for the luminance component and the coefficients of the 4-tap interpolation filter for the chrominance component can be as shown in Tables 1 and 2.

Table 1. Coefficients of 8-tap interpolation filter

Position index i	-3	-2	-1	0	1	2	3	4
hfilter[i]	-1	4	-11	40	40	-11	4	-1
qfilter[i]	-1	4	-10	58	17	-5	1	(unused)
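As a sketch of how Formulas 1 to 3 use the Table 1 coefficients, the following computes the 1/4, 1/2 and 3/4 horizontal positions to the right of one integer sample in a row. The function name and the clamping of out-of-range indices to the row edges are assumptions made for the example; the results follow the formulas as written, so for B = 8 they remain scaled by 64 (the sum of each coefficient set).

```python
# Coefficients from Table 1, keyed by position index i.
HFILTER = dict(zip(range(-3, 5), [-1, 4, -11, 40, 40, -11, 4, -1]))
QFILTER = dict(zip(range(-3, 4), [-1, 4, -10, 58, 17, -5, 1]))


def interpolate_row(row, x, bit_depth=8):
    """Return (a, b, c): the 1/4, 1/2 and 3/4 positions between row[x] and row[x + 1]."""
    shift = bit_depth - 8

    def sample(i):
        # Assumption: indices beyond the row are clamped to the nearest edge sample.
        return row[min(max(x + i, 0), len(row) - 1)]

    a = sum(sample(i) * QFILTER[i] for i in range(-3, 4)) >> shift      # Formula 1
    b = sum(sample(i) * HFILTER[i] for i in range(-3, 5)) >> shift      # Formula 2
    c = sum(sample(i) * QFILTER[1 - i] for i in range(-2, 5)) >> shift  # Formula 3
    return a, b, c
```

On a flat row every output equals 64 times the input value, and on a linear ramp the half-pixel output lands exactly midway between its two neighbors after dividing by 64. The vertical positions of Formulas 4 to 6 would apply the same filters along a column.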

Table 2. Coefficients of 4-tap interpolation filter for chrominance components

Position index i	-1	0	1	2
filter1[i]	-2	58	10	-2
filter2[i]	-4	54	16	-2
filter3[i]	-6	46	28	-4
filter4[i]	-4	36	36	-4
filter5[i]	-4	28	46	-6
filter6[i]	-2	16	54	-4
filter7[i]	-2	10	58	-2

For the chrominance tap coefficients in Table 2, filter1 holds the interpolation filter coefficients used at the 1/8 pixel position, filter2 those used at the 2/8 pixel position, filter3 those used at the 3/8 pixel position, and so on.
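The chrominance filters of Table 2 can be applied in the same way as the luminance filters. The text gives no explicit chrominance formula, so the sketch below assumes the same form as Formula 1 (a weighted sum over position indices -1 to 2, shifted by B-8); the function name and the edge clamping are likewise assumptions.

```python
# Coefficients from Table 2; key n selects the filter for the n/8 pixel position.
CHROMA_FILTERS = {
    1: [-2, 58, 10, -2],
    2: [-4, 54, 16, -2],
    3: [-6, 46, 28, -4],
    4: [-4, 36, 36, -4],
    5: [-4, 28, 46, -6],
    6: [-2, 16, 54, -4],
    7: [-2, 10, 58, -2],
}


def interpolate_chroma(row, x, frac, bit_depth=8):
    """Value at the frac/8 position between row[x] and row[x + 1], scaled by 64 for B = 8."""
    if frac not in CHROMA_FILTERS:
        raise ValueError("frac must be in 1..7")
    coeffs = CHROMA_FILTERS[frac]
    total = 0
    for k, i in enumerate(range(-1, 3)):
        # Assumption: indices beyond the row are clamped to the nearest edge sample.
        total += row[min(max(x + i, 0), len(row) - 1)] * coeffs[k]
    return total >> (bit_depth - 8)
```

A 4-tap filter touches only 3 samples beyond the one at `x`, compared with 7 for the 8-tap luminance filter.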

The Adaptive Motion Vector Resolution (AMVR) technology enables a CU to have a motion vector with integer-pixel precision or sub-pixel precision. The integer-pixel accuracy can be, for example, 1-pixel accuracy, 2-pixel accuracy, or the like. The sub-pixel accuracy can be, for example, 1/2 pixel accuracy, 1/4 pixel accuracy, 1/8 pixel accuracy, or 1/16 pixel accuracy.

AMVR can include AMVR in inter mode and AMVR in Affine mode.

In the HEVC standard, only the traditional motion model (for example, translational motion) is considered in the inter prediction process. However, in the real world, there are still many forms of motion, such as zoom, rotation, perspective motion and other irregular motions. In order to take into account the above-mentioned movement form, in VTM-3.0, Affine technology was introduced.

As shown in Figure 4, the motion field of the Affine mode can be derived from the motion vectors of two control points (four parameters) (as shown in Figure 4(a)) or three control points (six parameters) (as shown in Figure 4(b)).

Hereinafter, the MV (Control Point Motion Vector) of the control point is referred to as CPMV for short.

The processing unit of Affine is not a CU, but a sub-block (sub-CU) obtained by dividing the CU, and the size of each sub-CU may be 4×4. In Affine mode, each sub-CU has one MV. It can be understood that, unlike an ordinary CU, an Affine mode CU does not have only one MV: it has as many MVs as it has sub-CUs.

As an example, the MV of the sub-CU in one CU is derived through the CPMV calculation of two control points or three control points as shown in FIG. 4. For example, for the four-parameter Affine motion model, the MV of the sub-CU at the (x, y) position is calculated by the following formula:

For another example, for the six-parameter Affine motion model, the MV of the sub-CU at the (x, y) position is calculated by the following formula:

Among them, (mv _0x , mv _0y ) is the MV of the upper left control point, (mv _1x , mv _1y ) is the MV of the upper right control point, and (mv _2x , mv _2y ) is the MV of the lower left control point.
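For reference, the four-parameter and six-parameter affine motion models are commonly written in the following form in VVC/VTM descriptions. This is given as a sketch consistent with the control-point definitions above rather than as a verbatim reproduction of the formulas of this application; the symbols $W$ and $H$, denoting the width and height of the CU, and $(mv_x, mv_y)$, denoting the MV at position $(x, y)$, are assumptions introduced here.

```latex
% 1) Four-parameter model (two control points)
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x - \dfrac{mv_{1y} - mv_{0y}}{W}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{1x} - mv_{0x}}{W}\,y + mv_{0y}
\end{cases}

% 2) Six-parameter model (three control points)
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x + \dfrac{mv_{2x} - mv_{0x}}{H}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{2y} - mv_{0y}}{H}\,y + mv_{0y}
\end{cases}
```

In the four-parameter case the vertical gradient is tied to the horizontal one, which restricts the model to translation, rotation and zoom; the third control point of the six-parameter model removes that restriction.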

After the above formulas are evaluated, the motion vectors in a CU can be as shown in Fig. 5, where each square represents a sub-CU with a size of 4×4. All MVs calculated by the above formulas are converted into a 1/16 precision representation, which means that the highest precision of a sub-CU MV is 1/16. After the MV of each sub-CU is calculated, the prediction block of each sub-CU is obtained through motion compensation. The sub-CU size is 4×4 for both the chrominance component and the luminance component, and the motion vector of a 4×4 chrominance block is obtained by averaging the motion vectors of its four corresponding 4×4 luminance blocks.
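The derivation of one MV per 4×4 sub-CU can be sketched as follows, using the affine model in its commonly published VVC/VTM form. Several choices are assumptions made for the example: the function name, CPMVs given as integers in 1/16-pixel units, evaluation at the center of each sub-CU, and Python's built-in `round` as the rounding rule.

```python
def affine_sub_cu_mvs(cpmvs, width, height, sub=4):
    """Map each sub-CU origin (x0, y0) to its MV in 1/16-pixel units.

    cpmvs holds two control-point MVs (four-parameter model) or three
    (six-parameter model): upper left, upper right, and optionally lower left.
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    a = (mv1x - mv0x) / width    # horizontal gradient of mv_x
    b = (mv1y - mv0y) / width    # horizontal gradient of mv_y
    if len(cpmvs) == 3:
        mv2x, mv2y = cpmvs[2]
        c = (mv2x - mv0x) / height   # vertical gradient of mv_x
        d = (mv2y - mv0y) / height   # vertical gradient of mv_y
    else:
        c, d = -b, a                 # four-parameter model: rotation/zoom only
    mvs = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            x, y = x0 + sub / 2, y0 + sub / 2   # assumed evaluation point: sub-CU center
            mvs[(x0, y0)] = (round(a * x + c * y + mv0x),
                             round(b * x + d * y + mv0y))
    return mvs
```

A 16×16 CU yields 16 sub-CU MVs, and when all control points carry the same MV every sub-CU receives that MV, which reduces to ordinary translational motion.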

The Affine merge mode can only process CUs whose width and height are both not less than 8, and it is similar to the normal merge mode mentioned above. In this mode, MVs are first obtained from spatially and temporally neighboring blocks. In this process, both the CPMVs of Affine mode CUs and the MVs of traditional mode CUs are obtained, and candidate CPMV combinations are derived from these MVs to construct a candidate list. A combination is then selected from the candidate list (it may contain two or three CPMVs, corresponding to two or three control points) as the CPMVs of the current block. No motion estimation is required, and only the index of the finally selected CPMVs (a CU only needs to write one index) is written into the code stream. The inter prediction mode of neighboring blocks can be the traditional inter prediction mode or the Affine mode, so the MVs obtained from the neighboring blocks may have integer-pixel or sub-pixel accuracy. The Affine merge mode does not perform AMVR, that is, it does not carry out the adaptive motion vector accuracy decision process; an MV selected from the neighboring blocks keeps whatever accuracy it already has.

The Affine Inter mode can only process CUs whose width and height are both not less than 16, and it is similar to the AMVP mode mentioned above. The candidate list can be constructed by first obtaining MVs from neighboring blocks in the spatial or temporal domain, and the motion estimation process is then performed. Motion estimation is performed in units of the entire CU to obtain the CPMVs, whereas motion compensation is performed in units of 4×4 sub-CUs. Finally, the index of the selected CPMVs and the difference (motion vector difference, MVD) between the selected CPMVs and the actual CPMVs of the current CU can be written into the code stream. The accuracy of AMVR is essentially the accuracy of the MVD, that is, the accuracy of the CPMVs, not the MV accuracy of the sub-CUs.

For each CU that uses Affine AMVR (or AMVR; in some cases a CU may not use Affine AMVR), the encoder can adaptively decide the corresponding MV accuracy and write the result of the decision into the code stream to pass it to the decoder.

The integer-pixel accuracy or sub-pixel accuracy mentioned in the Affine AMVR technology refers to the pixel accuracy of the CPMVs, not the pixel accuracy of the sub-CU MVs. For example, the 1/16 accuracy, 1/4 accuracy, and integer-pixel accuracy mentioned in Affine AMVR refer to the accuracy of the CPMVs in Figure 4, not the accuracy of the MVs actually used in the sub-CU motion compensation process. For integer-pixel CPMVs, the motion estimation process is an integer-pixel process, but the sub-CU MVs obtained from the above two formulas 1) and 2) may have 1/4 accuracy, so the motion compensation process will involve sub-pixels.

The sub-pixel precision interpolation process mentioned above may put pressure on memory reads. The main reason is that, in addition to reading the values of the current encoding block, the interpolation process also needs to read the data of neighboring points to obtain the sub-pixel values. Taking an 8-tap interpolation filter as an example, it needs 7 additional pixels in the horizontal direction and 7 additional pixels in the vertical direction beyond the current block. If the current block has width W and height H, the interpolation process needs to read an area of (W+7)×(H+7). For a 4-tap filter, an area of (W+3)×(H+3) needs to be read. For a 6-tap interpolation filter, an area of (W+5)×(H+5) needs to be read.

Especially in the LDB mode, compared to the LDP mode, due to the existence of dual MVs, the storage bandwidth consumption is larger. If LDP is used and only single MV is allowed, it will lead to a larger gap between the coding performance and LDB.
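The read areas quoted above follow one pattern: an N-tap filter needs N-1 extra samples in each direction. The helper below makes that arithmetic explicit; the function name is hypothetical, and multiplying by the number of MVs assumes that each MV triggers a separate read of its own reference area.

```python
def reference_samples(width, height, taps, num_mvs=1):
    """Samples read for sub-pixel interpolation of one block: (W + taps - 1) x (H + taps - 1) per MV."""
    return num_mvs * (width + taps - 1) * (height + taps - 1)
```

For a 4×4 block this gives 121 samples with an 8-tap filter, 81 with a 6-tap filter and 49 with a 4-tap filter, against only 16 samples in the block itself; with dual MVs the figures double. The overhead is therefore largest for small blocks with several MVs.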

In response to the above problems, this application proposes an image processing method and device, which can reduce bandwidth pressure to a certain extent while ensuring compression performance.

This application is suitable for the field of digital video coding technology, and is specifically used for the inter-frame prediction part of a video codec. This application can be applied to codecs that comply with the international video coding standard H.264/HEVC and the Chinese AVS2 standard, as well as codecs that comply with the next-generation video coding standard VVC or AVS3.

This application can be applied to the inter-frame prediction part of a video codec, that is to say, the image processing method according to the embodiment of this application can be executed by an encoding device or a decoding device.

Fig. 6 is a schematic flowchart of a video processing method according to an embodiment of the present application. The method includes at least part of the following content.

In 110, an interpolation filter among a variety of interpolation filters is used to perform motion estimation and/or motion compensation on an image block having at least one MV (specifically, multiple MVs) of the target frame.

Specifically, a variety of interpolation filters may be available to the video processing device. When the video processing device performs motion estimation and/or motion compensation on the current image block, it can select an interpolation filter from the multiple interpolation filters for the motion estimation and/or motion compensation.

In the embodiment of the present application, for the same image block, the interpolation filter used for motion estimation and the interpolation filter used for motion compensation may be the same or different.

Specifically, the interpolation filter can be selected once for the current image block, which is used for motion estimation and motion compensation. Or, for the current image block, an interpolation filter may be selected for motion estimation and an interpolation filter may be selected for motion compensation.

In the embodiments of the present application, the multiple interpolation filters used for motion estimation may be the same, partially the same, or completely different from the multiple interpolation filters used for motion compensation.

In the embodiments of the present application, different interpolation filters may differ in at least one of the following aspects: the number of taps of the interpolation filter, the coefficients of the interpolation filter, and the shape of the interpolation filter (also referred to as the reference pixel positions of the interpolation filter).

In the embodiment of the present application, the number of taps of the interpolation filter may be 2, 4, 6, or 8, etc.

In the embodiment of the present application, different interpolation filters of the multiple interpolation filters correspond to different preset conditions.

Specifically, each of the multiple interpolation filters corresponds to a preset condition, and when the preset condition of a certain interpolation filter is satisfied, the interpolation filter can be used to perform motion estimation and/or motion compensation.

For example, when the first preset condition corresponding to the first interpolation filter (which may be any interpolation filter among the plurality of interpolation filters) is satisfied, the first interpolation filter is used to perform motion estimation and/or motion compensation on the image block.

In the embodiment of the present application, the preset conditions corresponding to different interpolation filters may be different in at least one of the following aspects:

The encoding mode of the image block, the interval in which the size of the image block is located, the components to be encoded of the image block, and the number of MVs of the image block.

The coding mode of the image block mentioned in the embodiment of this application may include: Inter mode, Affine mode, Merge mode, and so on.

The range of image block sizes mentioned in the embodiment of the present application can be divided into two or more intervals. For example, it can be divided into two intervals: greater than a preset value, and less than or equal to the preset value. As another example, it can be divided into an interval greater than a first preset value, an interval less than or equal to the first preset value and greater than a second preset value, and an interval less than or equal to the second preset value.

The components to be coded of the image block mentioned in the embodiments of the present application may include: luminance components and chrominance components.

The number of MVs of image blocks mentioned in the embodiment of the present application may be one, two, three or more.

Optionally, in the embodiment of the present application, one interpolation filter may correspond to one or more preset conditions, and when any one of the one or more preset conditions is satisfied, the interpolation filter may be used for motion estimation and/or motion compensation. The multiple preset conditions corresponding to one interpolation filter differ from one another in at least one of the aspects listed above: the coding mode of the image block, the interval in which the size of the image block is located, the component to be coded, and the number of MVs of the image block.

For example, the preset conditions corresponding to the first interpolation filter in the plurality of interpolation filters include at least two of the following:

1) The coding mode of the image block is the inter mode, and the component to be coded is the luminance component;

2) The coding mode of the image block is inter mode, and the component to be coded is a chrominance component;

3) The coding mode of the image block is the affine motion compensation prediction Affine mode, and the component to be coded is the chrominance component;

4) The coding mode of the image block is Affine mode, and the component to be coded is a luminance component.

The at least two preset conditions corresponding to the first interpolation filter may differ in the coding mode, in the component to be coded, or in both.

It should be understood that the factors included in a single preset condition (that is, the factors restricted by the preset condition) may be open-ended: in addition to the factors mentioned, a preset condition may include or restrict other factors. When a preset condition includes multiple factors, all of those factors are restricted; a factor that is not included is not restricted.

For example, preset condition 1) may additionally require that the size of the image block be less than or equal to a preset value.
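
As a minimal sketch of the open-ended semantics described above, a preset condition can be modelled as a record in which an unspecified factor places no restriction on the image block. The names `Block`, `PresetCondition`, and `filter_usable`, the use of width times height as the size measure, and the preset value 64 are all assumptions made for illustration and are not taken from this application.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Block:
    """Properties of an image block that a preset condition may restrict."""
    mode: str        # e.g. "inter" or "affine"
    component: str   # "luma" or "chroma"
    size: int        # block size measure (assumed here: width * height)
    num_mvs: int     # number of MVs of the block


@dataclass(frozen=True)
class PresetCondition:
    """A factor left as None is not restricted by the condition."""
    mode: Optional[str] = None
    component: Optional[str] = None
    max_size: Optional[int] = None   # block size must be <= max_size
    num_mvs: Optional[int] = None

    def matches(self, block: Block) -> bool:
        if self.mode is not None and block.mode != self.mode:
            return False
        if self.component is not None and block.component != self.component:
            return False
        if self.max_size is not None and block.size > self.max_size:
            return False
        if self.num_mvs is not None and block.num_mvs != self.num_mvs:
            return False
        return True


def filter_usable(conditions: Sequence[PresetCondition], block: Block) -> bool:
    """A filter is usable when any one of its preset conditions is satisfied."""
    return any(c.matches(block) for c in conditions)


# Preset conditions 1) and 2) of the first interpolation filter. Condition 1)
# additionally restricts the block size (64 is an assumed preset value).
FIRST_FILTER_CONDITIONS = [
    PresetCondition(mode="inter", component="luma", max_size=64),
    PresetCondition(mode="inter", component="chroma"),
]
```

Under these assumptions, an inter-mode chrominance block of any size satisfies condition 2), whereas an inter-mode luminance block satisfies condition 1) only when its size does not exceed the assumed preset value.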

Optionally, in the embodiment of the present application, the preset conditions corresponding to different interpolation filters differ in at least one of the aspects listed above (coding mode, size interval, component to be coded, and number of MVs).

Where one interpolation filter corresponds to multiple preset conditions, each of those preset conditions may differ from the preset conditions of the other interpolation filters in a different aspect.

For example, interpolation filter 1 corresponds to preset condition A and preset condition B, and interpolation filter 2 corresponds to preset condition C. Preset condition A and preset condition C may differ in the coding mode of the image block, while preset condition B and preset condition C may differ in the component to be coded.

Optionally, in the embodiment of the present application, different preset conditions correspond to different interpolation filters of the multiple kinds of interpolation filters. When one of the preset conditions is satisfied, the interpolation filter corresponding to the preset condition can be used to perform motion estimation and/or motion compensation on the image block.

Specifically, there may be multiple preset conditions, and the interpolation filters corresponding to each preset condition may be different.

Different preset conditions differ in at least one of the aspects listed above (coding mode, size interval, component to be coded, and number of MVs).

It should be understood that, in the embodiments of the present application, preset conditions being different in a certain aspect may mean that they impose different restrictions on that aspect. For example, if preset condition A restricts the coding mode of the image block to inter mode and preset condition B restricts the coding mode of the image block to Affine mode, the two preset conditions are different in the coding mode of the image block.

Optionally, in the embodiment of the present application, the following preset conditions respectively correspond to different interpolation filters:

1) The encoding mode of the image block is the inter mode, the component to be encoded is the luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;

2) The encoding mode of the image block is the inter mode, the component to be encoded is the luminance component, and the size of the image block is less than or equal to the second preset value.

The above preset condition 1) and preset condition 2) have the same limitation on the encoding mode of the image block and the component to be encoded, but the limitation on the size of the image block may be different.

It should be understood that different preset conditions may restrict the size of the image block differently while restricting the coding mode and the component to be coded identically (in the example above, the restricted coding mode or component could equally be the Affine mode or the chrominance component); or different preset conditions may restrict the size of the image block identically while restricting the coding mode or the component to be coded differently; or different preset conditions may restrict the size of the image block differently and also restrict the coding mode or the component to be coded differently.

For example, the following preset conditions correspond to different interpolation filters:

1) The coding mode of the image block is Affine mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;

2) The coding mode of the image block is Affine mode, the component to be coded is a chrominance component, and the size of the image block is less than or equal to the second preset value.

For another example, the following preset conditions correspond to different interpolation filters:

2) The coding mode of the image block is Affine mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to the second preset value.

For another example, the following preset conditions correspond to different interpolation filters:

1) The coding mode of the image block is Affine mode, the component to be coded is a chrominance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;

2) The coding mode of the image block is inter mode, the component to be coded is a chrominance component, and the size of the image block is less than or equal to the second preset value.

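For another example, the following preset conditions correspond to different interpolation filters:

1) The coding mode of the image block is inter mode, and the component to be coded is a luminance component;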

2) The coding mode of the image block is inter mode, and the component to be coded is a chrominance component.

The above preset condition 1) and preset condition 2) restrict the coding mode of the image block identically but restrict the component to be coded differently. Preset conditions 1) and 2) may also have the same or different restrictions in other aspects.

For another example, the following preset conditions correspond to different interpolation filters:

1) The coding mode of the image block is inter mode, and the component to be coded is a luminance component;

2) The coding mode of the image block is Affine mode, and the component to be coded is a luminance component or a chrominance component.

The above preset condition 1) and preset condition 2) have different restrictions on the encoding mode of the image block and the component to be encoded, and may have the same or different restrictions in other aspects.

For example, the preset condition 1) may further include: the size of the image block is greater than a preset value. At this time, the preset condition 2) may not limit the size of the image block (that is, any size is fine), or the size may be limited.

The foregoing only exemplarily compares some different preset conditions, but the embodiment of the present application is not limited to this, and the preset conditions of the embodiment of the present application may also be other.

In order to understand this application more clearly, the following will explain the number of interpolation filter taps under various preset conditions.

In an implementation manner, preset condition a) includes: the coding mode of the image block is the Affine mode and the component to be coded is the luminance component; the corresponding filter has 4 taps. Using 4 taps under preset condition a), rather than a larger number such as 6 or 8, can reduce bandwidth pressure.

In an implementation manner, preset condition b) includes: the coding mode of the image block is the inter mode and the component to be coded is the luminance component; the corresponding filter has 4 or 6 taps. Preset condition b) may further include: the size of the image block is less than or equal to a preset value. Using 4 or 6 taps under preset condition b), rather than 8, can reduce bandwidth pressure.

For example, when the size is greater than the first preset value, the interpolation filter used has 8 taps; when the size is less than or equal to the first preset value and greater than the second preset value, the interpolation filter used may have 6 taps; and when the size is less than or equal to the second preset value, the interpolation filter used has 4 taps.

For example, when the size is less than or equal to the first preset value, the interpolation filter used has 4 taps, and when the size is greater than the first preset value, the interpolation filter used has 6 or 8 taps.
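
The two size-based examples above can be sketched as follows. The function names and the way the preset values are passed in are assumptions made for illustration; the application does not fix concrete preset values.

```python
def luma_taps_three_intervals(size: int, first_preset: int, second_preset: int) -> int:
    """Tap count for an inter-mode luminance block using three size intervals."""
    if second_preset >= first_preset:
        raise ValueError("second preset value must be smaller than the first")
    if size > first_preset:
        return 8
    if size > second_preset:
        return 6
    return 4


def luma_taps_two_intervals(size: int, first_preset: int, large_taps: int = 8) -> int:
    """Two-interval variant: 4 taps up to the preset value, 6 or 8 taps above it."""
    if large_taps not in (6, 8):
        raise ValueError("large_taps must be 6 or 8")
    return 4 if size <= first_preset else large_taps
```

With assumed preset values of 256 and 64, a block of size 256 falls in the middle interval and uses 6 taps, while a block of size 64 uses 4 taps.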

In an implementation manner, preset condition c) includes: the coding mode of the image block is the inter mode and the component to be coded is the luminance component; the corresponding filter has 8 taps. Preset condition c) may further include: the size of the image block is greater than a preset value.

In an implementation manner, the preset condition d) includes: the encoding mode of the image block is the inter mode, the component to be encoded is the chrominance component; and the number of taps of the corresponding filter is 4.

In an implementation manner, the preset condition e) includes: the encoding mode of the image block is the Affine mode, the component to be encoded is the chrominance component; and the number of taps of the corresponding filter is 4.

For the above preset conditions a)-e), there may be other limiting factors. For example, the number of MVs is limited to two, or the image frame is limited to double forward B-frames.
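
Preset conditions a) to e) can be combined into a single lookup, sketched below. This is only one possible reading: it assumes that the optional size restrictions of conditions b) and c) are both applied with the same preset value, and the function name `filter_taps` and the default preset value 64 are hypothetical.

```python
def filter_taps(mode: str, component: str, size: int,
                preset_value: int = 64, small_luma_taps: int = 4) -> int:
    """Tap count under preset conditions a) to e) (assumed size split for b)/c))."""
    if component == "chroma" and mode in ("inter", "affine"):
        return 4                          # conditions d) and e)
    if component == "luma" and mode == "affine":
        return 4                          # condition a)
    if component == "luma" and mode == "inter":
        if size > preset_value:
            return 8                      # condition c)
        if small_luma_taps not in (4, 6):
            raise ValueError("condition b) allows 4 or 6 taps")
        return small_luma_taps            # condition b)
    raise ValueError("no preset condition matches this block")
```

Further limiting factors, such as the number of MVs, could be added as extra parameters in the same way.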

Optionally, in the embodiment of the present application, the number of taps of the interpolation filter used may decrease as the size of the image block decreases. The smaller the image blocks, the more blocks the image frame is divided into, and each block needs additional reference pixels around its border for interpolation. Over the entire image frame, more pixels are therefore required for interpolation processing, and the pressure on bandwidth is greater. Using an interpolation filter with fewer taps when the image block is small can reduce this bandwidth pressure.
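
The bandwidth argument above can be made concrete with a rough count of reference samples. The sketch below assumes a separable filter with T taps, for which a W×H block at a fractional position needs (W + T − 1) × (H + T − 1) reference samples per MV. The function names are hypothetical, and cache behaviour and memory alignment are ignored.

```python
def ref_samples_per_block(width: int, height: int, taps: int, num_mvs: int = 1) -> int:
    """Worst-case reference samples fetched to interpolate one block."""
    return num_mvs * (width + taps - 1) * (height + taps - 1)


def ref_samples_per_pixel(width: int, height: int, taps: int, num_mvs: int = 1) -> float:
    """Reference samples fetched per predicted pixel of the block."""
    return ref_samples_per_block(width, height, taps, num_mvs) / (width * height)
```

Under this model, a 4×4 block with an 8-tap filter fetches 121 reference samples (about 7.6 per predicted pixel), whereas a 4-tap filter fetches 49 (about 3.1 per predicted pixel). For a 16×16 block the 8-tap cost is only about 2.1 samples per predicted pixel, which is why the reduction matters most for small blocks.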

Optionally, the description above takes the interpolation filter as an example and notes that different interpolation filters may correspond to different preset conditions, or that different preset conditions may correspond to different interpolation filters. However, in the embodiments of the present application, different interpolation methods may likewise correspond to different preset conditions, or different preset conditions may correspond to different interpolation methods. In that case, the specific implementation can refer to the above description, with "interpolation filter" replaced by "interpolation method".

In the embodiment of the present application, different interpolation methods may include different interpolation filters.

In the embodiments of the present application, under some preset conditions, the interpolation methods used for motion estimation and/or motion compensation may also be the same. These preset conditions may differ in at least one of the aspects listed above (coding mode, size interval, component to be coded, and number of MVs).

For example, under the preset condition that the component to be coded is a luminance component (referred to as preset condition A)) and under the preset condition that the component to be coded is a chrominance component (referred to as preset condition B)), the interpolation method used for motion estimation and/or motion compensation is the same.

Here, preset condition A) and preset condition B) restrict different components to be coded, while the interpolation method corresponding to the two preset conditions may be the same. The same interpolation method may mean the same interpolation filter. The interpolation filter may have 4 taps and be used to interpolate at 1/16-pixel positions.

Optionally, in the embodiment of the present application, preset condition A) and preset condition B) each restrict the component to be coded. In other aspects (for example, the number of MVs, the size of the image block, or the coding mode), they may have the same restrictions or different restrictions.

For example, preset condition A) and preset condition B) both restrict the coding mode of the image block to inter mode.

For example, preset condition A) and preset condition B) both restrict the coding mode of the image block to Affine mode.

For example, preset condition A) restricts the coding mode of the image block to inter mode, and preset condition B) restricts it to Affine mode.

For example, preset condition A) restricts the coding mode of the image block to Affine mode, and preset condition B) restricts it to inter mode.

Optionally, in this embodiment of the application, the image block includes a luminance component and a chrominance component; the luminance component and chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation.

Specifically, in the embodiment of the present application, for the luminance component and the chrominance component of the same image block, the same interpolation method may be used for motion estimation and/or motion compensation.

The use of the same interpolation method mentioned here may mean that the number of taps and/or the interpolation coefficients of the interpolation filter used are the same. Optionally, in this embodiment of the application, the number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and chrominance component of the image block is 4, and the filter is used to interpolate at 1/16-pixel positions.
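
To illustrate what a shared 4-tap filter at 1/16-pixel precision might look like, the sketch below generates integer coefficients for the 16 fractional phases and applies them to one row of samples. The coefficients are an assumption: Catmull-Rom cubic weights scaled to 64 stand in for the real coefficients, which this section does not give. The names `four_tap_coeffs` and `interpolate` are hypothetical.

```python
from typing import List, Sequence

PHASES = 16   # 1/16-pixel precision
SCALE = 64    # coefficients of each phase sum to 64 (6-bit scaling)


def four_tap_coeffs(phase: int) -> List[int]:
    """Integer 4-tap coefficients for a fractional phase in [0, 15].

    Assumption: Catmull-Rom cubic weights, not the application's coefficients.
    """
    if not 0 <= phase < PHASES:
        raise ValueError("phase must be in [0, 15]")
    t = phase / PHASES
    weights = [
        (-t**3 + 2 * t**2 - t) / 2,
        (3 * t**3 - 5 * t**2 + 2) / 2,
        (-3 * t**3 + 4 * t**2 + t) / 2,
        (t**3 - t**2) / 2,
    ]
    coeffs = [round(SCALE * w) for w in weights]
    # Absorb any rounding error into the largest tap so the gain stays exactly 1.
    coeffs[coeffs.index(max(coeffs))] += SCALE - sum(coeffs)
    return coeffs


def interpolate(samples: Sequence[int], x: int, phase: int) -> int:
    """Sample value at position x + phase/16 in a row of integer-pixel samples."""
    coeffs = four_tap_coeffs(phase)
    last = len(samples) - 1
    acc = 0
    for k, c in enumerate(coeffs):
        idx = min(max(x - 1 + k, 0), last)   # clamp at the row borders
        acc += c * samples[idx]
    return (acc + SCALE // 2) >> 6           # round, then undo the 6-bit scaling
```

Because the luminance and chrominance components share the same number of taps and the same coefficients in this sketch, `interpolate` can be called unchanged on a luminance row and on a chrominance row.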

In the embodiment of the present application, when at least one of the following conditions is met, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation:

1) A specific identification bit is present in the code stream; this condition applies at the decoding end. The specific identification bit mentioned here may be the first identification bit mentioned below, or the first identification bit with a specific value.

For example, when the first identification bit is present in the code stream, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation. A second identification bit can be used to indicate whether the first identification bit is present in the code stream: when the second identification bit indicates that the current frame is a B frame, the first identification bit is present in the code stream.

For example, when the value of the first identification bit is a specific value, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation. The first identification bit may have two values: longtype and shorttype. The interpolation filter indicated by longtype may have more taps than the interpolation filter indicated by shorttype. For example, when the value of the first identification bit is longtype, the interpolation filter has 8 taps, and when the value of the first identification bit is shorttype, the interpolation filter has 4 or 6 taps. When the first identification bit indicates longtype or shorttype, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation.

2) The coding mode is inter mode or Affine mode. This may mean that the coding mode of the luminance component is inter mode or Affine mode; or that the coding mode of the chrominance component is inter mode or Affine mode; or that the coding modes of the luminance component and the chrominance component are each inter mode or Affine mode, in which case the two coding modes may be the same or different.

3) The size of the image block is greater than a preset value.

Optionally, in this embodiment of the present application, the coding modes of the luminance component and the chrominance component of the image block are both inter mode.

Specifically, when the coding modes of the luminance component and the chrominance component of the image block are both inter mode, the luminance component and the chrominance component of the image block may adopt the same interpolation method for motion estimation and/or motion compensation. In this case, use of the same interpolation method for the luminance component and the chrominance component may also be subject to other restrictions, which are not specifically limited in the embodiment of the present application.

Optionally, in this embodiment of the present application, the coding modes of the luminance component and the chrominance component of the image block are both Affine mode.

Specifically, when the coding modes of the luminance component and the chrominance component of the image block are both Affine mode, the luminance component and the chrominance component of the image block may adopt the same interpolation method for motion estimation and/or motion compensation. In this case, use of the same interpolation method for the luminance component and the chrominance component may also be subject to other restrictions, which are not specifically limited in the embodiment of the present application.

Optionally, in this embodiment of the application, the encoding end may write an identification bit in the code stream, the identification bit being used to indicate that the luminance component and the chrominance component of the image block use the same interpolation method for motion estimation and/or motion compensation.

For the decoding end, an identification bit is obtained in the code stream, and the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation method.

Alternatively, in the embodiment of the present application, the code stream may not have an identification bit, but the encoding end and the decoding end adopt the same method to select the interpolation filter.

Optionally, in the embodiment of the present application, the encoding end may write an identification bit in the code stream, and the identification bit may indicate whether the interpolation methods (or interpolation filters) corresponding to the preset conditions are the same. The decoding end can obtain the identification bit from the code stream to determine whether the interpolation methods (or interpolation filters) corresponding to the preset conditions are the same.

Exemplarily, the identification bit is used to indicate whether the interpolation method corresponding to a preset condition in which the component to be coded is the luminance component is the same as the interpolation method corresponding to a preset condition in which the component to be coded is the chrominance component.

Among them, the identification bit can be carried in the sequence header, frame header, and slice header.

Optionally, in this embodiment of the present application, the encoding end may add a first identification bit to the code stream, and the first identification bit is used to indicate whether one interpolation filter is to be selected from the multiple interpolation filters for motion estimation and/or motion compensation (that is, whether the solution of this application applies). Correspondingly, the decoding end can obtain the first identification bit from the code stream to determine whether an interpolation filter needs to be selected from the multiple interpolation filters for motion estimation and/or motion compensation.

Optionally, in the embodiment of the present application, the first identification bit is used to indicate that a filter with one tap quantity is selected from filters with multiple tap quantities for use in motion estimation and/or motion compensation.

As an example, the filters with different numbers of taps include a first filter and a second filter. The first filter has 8 taps, and the second filter has 6 or 4 taps, or has the same number of taps as the filter used for the chrominance component. The first identification bit is used to indicate whether the first filter or the second filter is selected.

At this time, the first interpolation filter and the second interpolation filter may be candidate interpolation filters for encoding the luminance component, but the embodiment of the present application is not limited thereto.

Optionally, in this embodiment of the present application, the first identification bit may have two values: longtype and shorttype. The interpolation filter indicated by longtype may have more taps than the interpolation filter indicated by shorttype. For example, when the value of the first identification bit is longtype, the selected interpolation filter may be the first interpolation filter, which has 8 taps; when the value of the first identification bit is shorttype, the selected interpolation filter may be the second interpolation filter, which has 4 or 6 taps.
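
A minimal sketch of mapping the two values of the first identification bit to a tap count follows. The enum `FilterType` and the function `taps_from_first_flag` are hypothetical names, and whether shorttype means 4 or 6 taps is left as a parameter because the text allows either.

```python
from enum import Enum


class FilterType(Enum):
    """The two values the first identification bit may take."""
    LONG = "longtype"
    SHORT = "shorttype"


def taps_from_first_flag(value: FilterType, short_taps: int = 4) -> int:
    """Number of taps of the interpolation filter selected by the flag value."""
    if short_taps not in (4, 6):
        raise ValueError("shorttype corresponds to 4 or 6 taps")
    return 8 if value is FilterType.LONG else short_taps
```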

Wherein, the first identification bit can be carried in the sequence header, the frame header, and the Slice header, and specifically can be carried in the Slice_type in the Slice header. The second identification bit can be carried in the sequence header, the frame header, and the slice header.

Among them, a frame of image can have one or more slices, and each slice has its own slice header. The slice header can use "slice type" to identify whether the current slice is I_SLICE, B_SLICE or P_SLICE. For I_SLICE, only intra prediction can be used; P_SLICE can use intra prediction or forward prediction; B_SLICE can use intra, forward prediction, bidirectional prediction, backward prediction or dual forward prediction.

Carrying the first identification bit in the slice header means that the first identification bit may be an identification bit independent of the slice type. When the slice header indicates that the slice is B_SLICE: if the first identification bit indicates that an interpolation filter is to be selected from the multiple interpolation filters, the interpolation filter can be selected according to the solution of this application (that is, the solution of this application applies); if the first identification bit indicates that no interpolation filter is to be selected from the multiple interpolation filters, the selection need not be made (that is, the solution of this application does not apply), and, for example, a preset interpolation filter can be used. When the slice header indicates that the slice is P_SLICE: if the first identification bit indicates that an interpolation filter is to be selected from the multiple interpolation filters, the interpolation filter can be selected according to the solution of this application, and the slice is processed in the manner of B_SLICE (for example, using bidirectional prediction, dual forward prediction, or dual backward prediction; that is, the current slice is processed as B_SLICE); if the first identification bit indicates that no interpolation filter is to be selected from the multiple interpolation filters, the selection need not be made, and the slice is processed in the manner of P_SLICE (for example, using intra prediction or forward prediction).
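
The four cases described above can be summarised as a small decision function. This is a sketch only: `SliceDecision` and `resolve_slice` are hypothetical names, and the handling of I_SLICE (no filter selection, since only intra prediction is used) is an assumption that follows from the slice-type description rather than from this paragraph.

```python
from typing import NamedTuple


class SliceDecision(NamedTuple):
    process_as: str              # slice type whose processing manner is used
    select_from_multiple: bool   # True: select a filter per the described scheme


def resolve_slice(slice_type: str, first_flag: bool) -> SliceDecision:
    """Combine the slice type with an independent first identification bit."""
    if slice_type == "B_SLICE":
        # Flag off: a preset interpolation filter can be used instead.
        return SliceDecision("B_SLICE", first_flag)
    if slice_type == "P_SLICE":
        # Flag on: the P slice is processed in the manner of B_SLICE.
        return SliceDecision("B_SLICE" if first_flag else "P_SLICE", first_flag)
    if slice_type == "I_SLICE":
        return SliceDecision("I_SLICE", False)
    raise ValueError(f"unknown slice type: {slice_type}")
```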

Optionally, in this embodiment of the present application, a second identification bit may also be present in the code stream. When the second identification bit indicates that the target frame is a B frame, the first identification bit is present in the code stream.

Here, the second identification bit may be the slice type. In other words, when the slice type indicates that the current frame is a B frame, the first identification bit is present in the code stream; otherwise, the first identification bit is not present.
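
The conditional presence of the first identification bit can be sketched as a tiny header parser. The bit layout used here (a 2-bit slice-type code followed by a 1-bit flag that is present only for B slices) is purely hypothetical, as are the names `SLICE_TYPES` and `parse_flags`; the actual syntax is not specified in this section.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical 2-bit codes for the slice type (the second identification bit).
SLICE_TYPES = {0: "I_SLICE", 1: "P_SLICE", 2: "B_SLICE"}


def parse_flags(bits: Iterator[int]) -> Tuple[str, Optional[bool]]:
    """Read the slice type, then the first identification bit only for B slices.

    Returns (slice_type, first_flag); first_flag is None when it is absent.
    """
    code = (next(bits) << 1) | next(bits)
    if code not in SLICE_TYPES:
        raise ValueError("reserved slice type code")
    slice_type = SLICE_TYPES[code]
    first_flag = bool(next(bits)) if slice_type == "B_SLICE" else None
    return slice_type, first_flag
```

For a P slice the parser does not consume a flag bit, so the next bit in the stream belongs to whatever syntax element follows.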

In the embodiment of this application, the first identification bit can also be multiplexed with the slice type, in particular with the slice type indicating P_SLICE. For example, when the slice type is P_SLICE, the interpolation filter can be selected from the multiple interpolation filters according to the solution of this application, and the slice is processed in the manner of B_SLICE (for example, using bidirectional prediction, dual forward prediction, or dual backward prediction; that is, the current slice is processed as B_SLICE). When the slice type is B_SLICE, the slice header may carry a first identification bit in addition to the slice_type. If the first identification bit indicates that an interpolation filter is to be selected from the multiple interpolation filters, the interpolation filter can be selected according to the solution of this application; if the first identification bit indicates that no interpolation filter is to be selected from the multiple interpolation filters, the selection need not be made, and, for example, a preset interpolation filter can be used.

In the embodiment of the present application, two or more first identification bits may be present at the same time. For example, the first identification bit exists in at least two of the sequence header, the frame header, and the slice header.

In one implementation, when the first identification bit in the sequence header indicates that the solution of this application must be applied, or must not be applied, the first identification bit may be absent from the frame header and the slice header, and the solution of this application is applied, or not applied, to all frames or slices of the sequence. When the first identification bit in the sequence header indicates that the solution of this application may be applied (that is, application is optional for each frame or slice), a first identification bit may be present in the frame header or the slice header to indicate whether the solution of this application applies to the current frame or slice.

For example, when the first identification bit in the sequence header indicates that each frame may apply the solution of this application, the first identification bit in the frame header or slice header indicates whether the solution of this application applies to the current frame or slice. When the first identification bit in the sequence header indicates that the solution of this application is not to be applied to any frame, the first identification bit is no longer present in the frame header or the slice header, and the solution of this application is not applied when processing each frame or slice.

For example, when the first identification bit in the sequence header indicates that each frame needs to apply the solution of this application, the first identification bit is no longer present in the frame header or the slice header, and the solution of this application is applied when processing each frame or slice. When the first identification bit in the sequence header indicates that each frame may or may not apply the solution of this application, the first identification bit in the frame header or slice header indicates whether the solution of this application applies to the current frame or slice.

Similarly, in the embodiment of the present application, the first identification bit may exist in both the frame header and the slice header, or only in the frame header. When the first identification bit in the frame header indicates that the solution of this application must be applied, or must not be applied, the first identification bit may be absent from the slice header, and the solution of this application is applied, or not applied, to all slices of the frame. When the first identification bit in the frame header indicates that the solution of this application may or may not be applied, a first identification bit may be present in the slice header to indicate whether the solution of this application applies to the slice.
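
The layered signalling across the sequence header, frame header, and slice header can be sketched as follows. The three-valued `Level` enum and the function `scheme_applies` are hypothetical, and the sketch assumes that a lower-level flag is present exactly when the level above it leaves the choice open.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    MUST = "must"           # scheme applied to everything below this level
    MUST_NOT = "must_not"   # scheme applied to nothing below this level
    MAY = "may"             # decision deferred to the next lower level


def scheme_applies(seq: Level, frame: Optional[Level] = None,
                   slice_flag: Optional[bool] = None) -> bool:
    """Resolve whether the scheme applies to one slice."""
    if seq is not Level.MAY:
        return seq is Level.MUST          # frame and slice flags are absent
    if frame is None:
        raise ValueError("frame-level flag expected when the sequence level is MAY")
    if frame is not Level.MAY:
        return frame is Level.MUST        # slice flag is absent
    if slice_flag is None:
        raise ValueError("slice-level flag expected when the frame level is MAY")
    return slice_flag
```

A decoder following this sketch would parse the frame-level flag only when the sequence-level flag is `MAY`, and the slice-level flag only when the frame-level flag is also `MAY`.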

The embodiments of this application can be used in the LDB (low-delay B) mode. By modifying the interpolation used for blocks with two MVs in LDB (that is, by selecting the interpolation filter or the interpolation method), the memory bandwidth consumption of LDB can be reduced, so that the LDB mode brings no additional bandwidth pressure compared with LDP (low-delay P) while offering better compression performance than LDP.

The above describes how to select an interpolation filter or an interpolation method. The embodiments of the present application can also be used to select between unidirectional prediction and bidirectional prediction, and to select whether motion estimation and/or motion compensation uses integer-pixel precision (no interpolation filter required) or sub-pixel precision (interpolation filter required).

In one implementation manner, there may be an integer-pixel-precision motion estimation and/or motion compensation method and a sub-pixel-precision motion estimation and/or motion compensation method, and whether to use the integer-pixel-precision method or the sub-pixel-precision method can be selected according to preset conditions. The preset conditions corresponding to the two methods may differ in at least one of the aspects listed above (coding mode, size interval, component to be coded, and number of MVs).

The preset conditions corresponding to the integer-pixel-precision motion estimation and/or motion compensation method may include: the coding mode is inter mode, the size of the image block is less than or equal to a preset value, and the number of MVs of the image block is greater than or equal to 2. The preset conditions may also constrain other factors, which is not limited in the embodiments of the present application.

In this case, the preset conditions corresponding to the sub-pixel-precision motion estimation and/or motion compensation method may be of multiple types, each corresponding to one of multiple interpolation filters. In other words, the sub-pixel-precision motion estimation and/or motion compensation method may have multiple interpolation filters, and the preset conditions corresponding to each interpolation filter may be different.
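The choice between integer-pixel and sub-pixel precision can be sketched as below. This is an illustration under stated assumptions, not the method of the application: the `Block` type, the threshold `SIZE_PRESET`, the interpretation of "size" as width times height, the 1/16-pel MV storage, and the rounding rule are all hypothetical.

```python
from dataclasses import dataclass

SIZE_PRESET = 64   # hypothetical preset value, in samples (width * height)
FRAC_BITS = 4      # assumption: MV components are stored in 1/16-pel units


@dataclass
class Block:
    mode: str      # "inter" or "affine"
    width: int
    height: int
    num_mvs: int


def use_integer_precision(block: Block, preset: int = SIZE_PRESET) -> bool:
    """True when the preset condition for integer-pixel precision is met."""
    return (
        block.mode == "inter"
        and block.width * block.height <= preset
        and block.num_mvs >= 2
    )


def round_to_integer_pel(mv: int, frac_bits: int = FRAC_BITS) -> int:
    """Round one MV component to the nearest integer sample position.

    The result stays in 1/16-pel units with a zero fractional part, so no
    interpolation filter is needed to fetch the prediction block.
    """
    half = 1 << (frac_bits - 1)
    return ((mv + half) >> frac_bits) << frac_bits


small_bi = Block(mode="inter", width=4, height=8, num_mvs=2)
large_bi = Block(mode="inter", width=16, height=16, num_mvs=2)
```

Here `small_bi` satisfies the condition and would use rounded MVs, while `large_bi` falls through to the sub-pixel path, where an interpolation filter would be chosen by its own preset condition.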

In another implementation manner, there may be a unidirectional prediction mode and a bidirectional prediction mode, and either unidirectional prediction or bidirectional prediction may be selected according to preset conditions. The preset conditions of the two prediction modes may differ in at least one of the following aspects:

The preset conditions corresponding to the prediction mode of unidirectional prediction may include: the coding mode is inter mode, the size of the image block is less than or equal to the preset value, and the number of MVs of the image block is greater than or equal to 2. The preset condition may also limit other factors, which are not limited in the embodiment of the present application.

In this case, the preset conditions corresponding to the bidirectional prediction mode may be of multiple types, each corresponding to one of multiple interpolation filters. In other words, the bidirectional prediction mode may have multiple interpolation filters, and the preset conditions corresponding to each interpolation filter may be different.
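The selection between unidirectional and bidirectional prediction can be sketched in the same way. The function name, the preset value, and in particular the rule of keeping only the first MV when falling back to unidirectional prediction are assumptions for illustration; the text does not state which MV is retained.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components


def choose_prediction(mode: str, size: int, mvs: List[MV],
                      preset: int = 64) -> Tuple[str, List[MV]]:
    """Return the prediction direction and the MVs actually used.

    `preset` is a hypothetical size threshold. A block in inter mode with at
    least two MVs and a size not exceeding `preset` is restricted to
    unidirectional prediction.
    """
    if mode == "inter" and size <= preset and len(mvs) >= 2:
        # Assumption: the first MV is kept; the choice is not specified.
        return "uni", mvs[:1]
    direction = "bi" if len(mvs) >= 2 else "uni"
    return direction, list(mvs)
```

For example, `choose_prediction("inter", 16, [(1, 2), (3, 4)])` falls back to unidirectional prediction, whereas the same MVs on a block of size 256 keep bidirectional prediction and would then pick an interpolation filter according to the corresponding preset condition.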

The video processing method according to the embodiment of the present application is described above, and the video processing device for implementing the embodiment of the present application will be introduced below.

FIG. 7 shows a schematic block diagram of a video processing device 200 according to an embodiment of the present application.

As shown in FIG. 7, the device 200 may include a processor 210, and may further include a memory 220.

It should be understood that the device 200 may also include other components commonly found in computer systems, such as input and output devices and communication interfaces, which is not limited in the embodiments of the present application.

The memory 220 is used to store computer executable instructions.

The memory 220 may be any of various types of memory. For example, it may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. The embodiments of the present application do not limit this.

The processor 210 is configured to access the memory 220 and execute the computer-executable instructions to perform operations in the method for video processing in the foregoing embodiment of the present application.

The processor 210 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and so on. The embodiments of the present application do not limit this.

The video processing device 200 of the embodiments of the present application may correspond to the execution subject of the video processing method of the embodiments of the present application, and the foregoing and other operations and/or functions of the modules of the video processing device 200 implement the corresponding procedures of the foregoing methods. For the sake of brevity, they are not repeated here.

An embodiment of the present application also provides an electronic device, which may include the video processing device of the foregoing various embodiments of the present application.

The embodiment of the present application also provides a computer storage medium, and the computer storage medium stores program code, and the program code may be used to instruct the execution of the video processing method in the foregoing embodiment of the present application.

It should be understood that, in the embodiments of the present application, the term "and/or" is merely an association relationship describing an associated object, indicating that there may be three relationships. For example, A and/or B can mean: A alone exists, A and B exist at the same time, and B exists alone. In addition, the character "/" in this text generally indicates that the associated objects before and after are in an "or" relationship.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may also be an electrical, mechanical, or other form of connection.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Anyone familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A video processing method, characterized by comprising:

Using an interpolation filter among multiple interpolation filters, performing motion estimation and/or motion compensation on an image block of a target frame that has multiple MVs.
The method according to claim 1, wherein different interpolation filters in the multiple interpolation filters correspond to different preset conditions;

The use of the interpolation filter among the multiple interpolation filters to perform motion estimation and/or motion compensation for the image block with multiple MVs of the target frame includes:

When the first preset condition corresponding to the first interpolation filter is satisfied, the first interpolation filter is used to perform motion estimation and/or motion compensation on the image block.
The method according to claim 2, wherein the preset conditions corresponding to different interpolation filters are different in at least one of the following aspects:

The encoding mode of the image block, the interval in which the size of the image block is located, the components to be encoded of the image block, and the number of MVs of the image block.
The method according to claim 1, wherein different preset conditions correspond to different interpolation filters among the multiple interpolation filters;

Use the interpolation filter among multiple interpolation filters to perform motion estimation and/or motion compensation on the image block with multiple MVs of the target frame, including:

When the first preset condition among the multiple preset conditions is satisfied, the first interpolation filter corresponding to the first preset condition is used to perform motion estimation and/or motion compensation on the image block.
The method according to claim 4, wherein the different preset conditions are different in at least one of the following aspects:

The encoding mode of the image block, the interval in which the size of the image block is located, the components to be encoded of the image block, and the number of MVs of the image block.
The method according to claim 2 or 3, wherein the preset condition corresponding to the first interpolation filter includes at least two of the following:

The coding mode of the image block is an inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is inter mode, and the component to be coded is a chrominance component;

The coding mode of the image block is an affine motion compensation prediction Affine mode, and the component to be coded is a chrominance component;

The coding mode of the image block is Affine mode, and the component to be coded is a luminance component.
The method according to claim 6, characterized in that the preset condition that includes the coding mode of the image block being inter mode and the component to be coded being a luminance component further comprises: the size of the image block is less than or equal to a preset value.
The method according to any one of claims 2 to 5, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is an inter mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;

The coding mode of the image block is an inter mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to the second preset value.
The method according to any one of claims 2 to 5, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is inter mode, and the component to be coded is a chrominance component.
The method according to any one of claims 2 to 5, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is Affine mode, and the component to be coded is a luminance component or a chrominance component.
The method according to claim 9 or 10, wherein the preset condition that includes the coding mode of the image block being inter mode and the component to be coded being a luminance component further comprises: the size of the image block is greater than a preset value.
The method according to any one of claims 2 to 5, characterized in that:

The first preset condition includes: the coding mode of the image block is Affine mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 4.
The method according to any one of claims 2 to 5, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 4 or 6.
The method according to claim 13, wherein the first preset condition further comprises: the size of the image block is less than or equal to a preset value.
The method according to any one of claims 2 to 5, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 8.
The method according to claim 15, wherein the first preset condition further comprises: the size of the image block is greater than a preset value.
The method according to any one of claims 2 to 5, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a chrominance component;

The number of taps of the first filter is 4.
The method according to any one of claims 2 to 5, characterized in that:

The first preset condition includes: the coding mode of the image block is Affine mode, and the component to be coded is a chrominance component;

The number of taps of the first filter is 4.
The method according to any one of claims 1 to 3, wherein the image block includes a luminance component and a chrominance component;

The luminance component and chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation.
The method of claim 19, wherein:

When at least one of the following conditions is met, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation:

The code stream has a specific identification bit; the coding mode is inter mode or Affine mode; the size of the image block is greater than a preset value.
The method according to claim 19 or 20, wherein the luminance component and the chrominance component of the image block are subjected to motion estimation and/or motion compensation using the same interpolation method, comprising:

The number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and chrominance component of the image block is the same; and/or,

The interpolation coefficients of the interpolation filters used for motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block are the same.
The method according to any one of claims 19 to 21, wherein the coding modes of the luminance component and the chrominance component of the image block are both inter mode.
The method according to any one of claims 19 to 21, wherein the coding mode of the luminance component and the chrominance component of the image block are both Affine mode.
The method according to any one of claims 19 to 23, characterized in that the number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and chrominance component of the image block is 4, and interpolation is performed at 1/16-pixel precision.
The method according to any one of claims 19 to 24, wherein the method is used at the encoding end, and the method further comprises:

An identification bit is written in the code stream, and the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation method.
The method according to any one of claims 19 to 24, wherein the method is used at the decoding end, and the method further comprises:

An identification bit is obtained in the code stream, and the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation method.
The method according to any one of claims 1 to 26, wherein the method is used at the encoding end; the method further comprises:

A first identification bit is added to the code stream, and the first identification bit is used to indicate that one of the interpolation filters is selected from a variety of interpolation filters for use in motion estimation and/or motion compensation.
The method according to any one of claims 1 to 26, wherein the method is used at the decoding end; the method further comprises:

The first identification bit is acquired in the code stream, and the first identification bit is used to indicate that one of the interpolation filters is selected from a variety of interpolation filters for use in motion estimation and/or motion compensation.
The method according to claim 27 or 28, wherein the first identification bit is used to indicate that a filter with one tap number is selected from filters with multiple tap numbers for use in motion estimation and/or motion compensation.
The method according to claim 29, wherein the filters with multiple tap numbers include a first filter and a second filter, the number of taps of the first filter is 8, and the number of taps of the second filter is 6 or 4, or the number of taps of the second filter is the same as the number of taps of the filter for the chrominance component;

The first identification bit is used to indicate the selection of the first filter or the second filter.
The method according to any one of claims 27 to 30, wherein the method is used at the encoding end, and the method further comprises:

A second identification bit is added to the code stream, and when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
The method according to any one of claims 27 to 30, wherein the method is used at the decoding end, and the method further comprises:

A second identification bit is acquired from the code stream, and when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
The method according to any one of claims 27 to 32, wherein the first identification bit is carried in a sequence header, a frame header, and a slice header.
The method according to claim 31 or 32, wherein the second identification bit is carried in a sequence header, a frame header, and a slice header.
The method according to any one of claims 31 to 34, wherein the second identification bit is Slice_type in the Slice header.
A video processing device, characterized by comprising a processor, which is used to call codes stored in a memory to perform the following operations:

Using an interpolation filter among multiple interpolation filters, performing motion estimation and/or motion compensation on an image block of a target frame that has multiple MVs.
The device according to claim 36, wherein different interpolation filters of the multiple kinds of interpolation filters correspond to different preset conditions;

The use of the interpolation filter among the multiple interpolation filters to perform motion estimation and/or motion compensation for the image block with multiple MVs of the target frame includes:

When the first preset condition corresponding to the first interpolation filter is satisfied, the first interpolation filter is used to perform motion estimation and/or motion compensation on the image block.
The device according to claim 37, wherein the preset conditions corresponding to different interpolation filters are different in at least one of the following aspects:

The encoding mode of the image block, the interval in which the size of the image block is located, the components to be encoded of the image block, and the number of MVs of the image block.
The device according to claim 36, wherein different preset conditions correspond to different interpolation filters of the multiple interpolation filters;

Use the interpolation filter among multiple interpolation filters to perform motion estimation and/or motion compensation on the image block with multiple MVs of the target frame, including:

When the first preset condition among the multiple preset conditions is satisfied, the first interpolation filter corresponding to the first preset condition is used to perform motion estimation and/or motion compensation on the image block.
The device according to claim 39, wherein the different preset conditions are different in at least one of the following aspects:

The encoding mode of the image block, the interval in which the size of the image block is located, the components to be encoded of the image block, and the number of MVs of the image block.
The device according to claim 37 or 38, wherein the preset condition corresponding to the first interpolation filter includes at least two of the following:

The coding mode of the image block is an inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is inter mode, and the component to be coded is a chrominance component;

The coding mode of the image block is an affine motion compensation prediction Affine mode, and the component to be coded is a chrominance component;

The coding mode of the image block is Affine mode, and the component to be coded is a luminance component.
The device according to claim 41, wherein the preset condition that includes the coding mode of the image block being inter mode and the component to be coded being a luminance component further comprises: the size of the image block is less than or equal to a preset value.
The device according to any one of claims 37 to 40, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is an inter mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;

The coding mode of the image block is an inter mode, the component to be coded is a luminance component, and the size of the image block is less than or equal to the second preset value.
The device according to any one of claims 37 to 40, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is inter mode, and the component to be coded is a chrominance component.
The device according to any one of claims 37 to 40, wherein the following preset conditions respectively correspond to different interpolation filters:

The coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The coding mode of the image block is Affine mode, and the component to be coded is a luminance component or a chrominance component.
The device according to claim 44 or 45, wherein the preset condition that includes the coding mode of the image block being inter mode and the component to be coded being a luminance component further comprises: the size of the image block is greater than a preset value.
The device according to any one of claims 37 to 40, characterized in that:

The first preset condition includes: the coding mode of the image block is Affine mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 4.
The device according to any one of claims 37 to 40, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 4 or 6.
The device according to claim 48, wherein the first preset condition further comprises: the size of the image block is less than or equal to a preset value.
The device according to any one of claims 37 to 40, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a luminance component;

The number of taps of the first filter is 8.
The device according to claim 50, wherein the first preset condition further comprises: the size of the image block is greater than a preset value.
The device according to any one of claims 37 to 40, characterized in that:

The first preset condition includes: the coding mode of the image block is inter mode, and the component to be coded is a chrominance component;

The number of taps of the first filter is 4.
The device according to any one of claims 37 to 40, characterized in that:

The first preset condition includes: the coding mode of the image block is Affine mode, and the component to be coded is a chrominance component;

The number of taps of the first filter is 4.
The device according to any one of claims 36 to 38, wherein the image block includes a luminance component and a chrominance component;

The luminance component and chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation.
The device of claim 54, wherein:

When at least one of the following conditions is met, the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation:

The code stream has a specific identification bit; the coding mode is inter mode or Affine mode; the size of the image block is greater than a preset value.
The device according to claim 54 or 55, wherein the luminance component and the chrominance component of the image block adopt the same interpolation method for motion estimation and/or motion compensation, comprising:

The number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and chrominance component of the image block is the same; and/or,

The interpolation coefficients of the interpolation filters used for motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block are the same.
The device according to any one of claims 54 to 56, wherein the coding modes of the luminance component and the chrominance component of the image block are both inter mode.
The device according to any one of claims 54 to 56, wherein the coding mode of the luminance component and the chrominance component of the image block are both Affine mode.
The device according to any one of claims 54 to 58, wherein the number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and chrominance component of the image block is 4, and interpolation is performed at 1/16-pixel precision.
The device according to any one of claims 54 to 59, wherein the device is used for an encoding end, and the processor is further used for:

An identification bit is written in the code stream, and the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation method.
The device according to any one of claims 54 to 59, wherein the device is used for a decoding end, and the processor is further used for:

An identification bit is obtained in the code stream, and the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation method.
The device according to any one of claims 36 to 61, wherein the device is used for an encoding end; the processor is further used for:

A first identification bit is added to the code stream, and the first identification bit is used to indicate that one of the interpolation filters is selected from a variety of interpolation filters for use in motion estimation and/or motion compensation.
The device according to any one of claims 36 to 61, wherein the device is used for a decoding end; the processor is further used for:

The first identification bit is acquired in the code stream, and the first identification bit is used to indicate that one of the interpolation filters is selected from a variety of interpolation filters for use in motion estimation and/or motion compensation.
The device according to claim 62 or 63, wherein the first identification bit is used to indicate that a filter with one tap number is selected from filters with multiple tap numbers for use in motion estimation and/or motion compensation.
The device according to claim 64, wherein the filters with multiple tap numbers comprise a first filter and a second filter, the number of taps of the first filter is 8, and the number of taps of the second filter is 6 or 4, or the number of taps of the second filter is the same as the number of taps of the filter for the chrominance component;

The first identification bit is used to indicate the selection of the first filter or the second filter.
The device according to any one of claims 62 to 65, wherein the device is used for an encoding end, and the processor is further used for:

A second identification bit is added to the code stream, and when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
The device according to any one of claims 62 to 65, wherein the device is used for a decoding end, and the processor is further used for:

A second identification bit is acquired from the code stream, and when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
The device according to any one of claims 62 to 67, wherein the first identification bit is carried in a sequence header, a frame header, and a slice header.
The device according to claim 67 or 68, wherein the second identification bit is carried in a sequence header, a frame header, and a slice header.
The device according to claim 69, wherein the second identification bit is Slice_type in the Slice header.