CN111656782A - Video processing method and device

Video processing method and device

Info

Publication number
CN111656782A
CN111656782A (application number CN201980009161.8A)
Authority
CN
China
Prior art keywords
image block
component
mode
interpolation
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980009161.8A
Other languages
Chinese (zh)
Inventor
孟学苇
郑萧桢
王苫社
马思伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
SZ DJI Technology Co Ltd
Original Assignee
Peking University
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and SZ DJI Technology Co Ltd
Publication of CN111656782A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/124 Quantisation
    • H04N19/139 Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/174 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations involving filtering within a prediction loop

Abstract

The embodiments of the present application provide a video processing method and device, which can effectively implement the interpolation process in motion estimation and/or motion compensation. The method includes: performing motion estimation and/or motion compensation on an image block of a target frame that has multiple motion vectors (MVs) by using an interpolation filter among multiple interpolation filters.

Description

Video processing method and device
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright belongs to its owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the patent office's official files or records, but otherwise reserves all copyright rights.
Technical Field
The present application relates to the field of image processing, and more particularly, to a video processing method and apparatus.
Background
Prediction is an important module of the mainstream video coding framework and can include intra-prediction and inter-prediction.
The general flow of inter prediction may include Motion Estimation (ME) and Motion Compensation (MC). Motion estimation is the process of obtaining a Motion Vector (MV) by searching for and comparing the current coding block of the current frame within a reference frame. Motion compensation is the process of obtaining a prediction block for the current block using the MV and a reference block. The prediction block obtained by motion compensation may differ somewhat from the original current block, so the difference (residual) between the prediction block and the current block needs to be transformed, quantized, and transmitted to the decoding end; in addition, the MV and reference frame information need to be transmitted to the decoding end so that it can reconstruct the current frame.
Due to the continuity of natural object motion, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units. To improve the accuracy of motion vectors, sub-pixel accuracy was introduced. For example, in the High Efficiency Video Coding (HEVC) standard, motion estimation of the luma component uses motion vectors with 1/4 pixel accuracy. However, sub-pixel samples do not exist in digital video, so to achieve 1/K pixel precision estimation, the values of these sub-pixels must be approximated by interpolation; that is, K-fold interpolation is performed in the row and column directions of the reference frame, and the prediction block is then searched for in the interpolated reference frame. Interpolating the current block requires the pixels inside the current block as well as pixels in its neighboring area.
How to implement the interpolation process efficiently is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a video processing method and device, which can effectively realize an interpolation process in a motion estimation and/or motion compensation process.
In a first aspect, a video processing method is provided, including: performing motion estimation and/or motion compensation on an image block of a target frame that has multiple motion vectors (MVs) by using an interpolation filter among multiple interpolation filters.
In a second aspect, a video processing apparatus is provided, comprising a processor configured to invoke code stored in a memory to perform the following operations:
performing motion estimation and/or motion compensation on an image block of the target frame that has multiple MVs by using an interpolation filter among the multiple interpolation filters.
In a third aspect, there is provided a computer system comprising: a memory for storing computer executable instructions; a processor for accessing the memory and executing the computer-executable instructions to perform the operations in the method of the first aspect described above.
In a fourth aspect, a computer storage medium is provided, in which program code is stored, the program code being operable to instruct execution of the method of the first aspect.
In a fifth aspect, a computer program product is provided, which comprises program code that may be used to instruct the execution of the method of the first aspect.
Therefore, in the embodiments of the present application, for an image block with multiple MVs, one of multiple interpolation filters can be selected; since the interpolation filter can be chosen flexibly, storage bandwidth pressure can be reduced while coding performance is maintained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a block diagram of video encoding according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a prediction approach according to an embodiment of the present application.
Fig. 3 is a schematic diagram of an interpolation process of an image block according to an embodiment of the present application.
Fig. 4 is a schematic diagram of control points of an Affine mode according to an embodiment of the present application.
Fig. 5 is a schematic diagram of motion vectors of a CU according to an embodiment of the present application.
Fig. 6 is a schematic flow chart of a video processing method according to an embodiment of the present application.
Fig. 7 is a schematic block diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
As shown in fig. 1, the video coding framework mainly includes intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.
The present application is primarily directed to improvements in the inter prediction (inter prediction) section.
The general idea of inter prediction is: the temporal correlation between adjacent frames of a video is utilized, a reconstructed frame is used as a reference frame, and a current frame is predicted through Motion Estimation (ME) and Motion Compensation (MC), so that the temporal redundant information of the video is removed.
The current frame (or target frame) referred to herein, in an encoding scenario, represents a frame that is currently being encoded, and in a decoding scenario, represents a frame that is currently being decoded.
A reconstructed frame as referred to herein, in an encoding scenario, represents a frame that has been previously encoded, and in a decoding scenario, represents a frame that has been previously decoded.
During encoding, a whole frame of an image is generally not processed directly; instead, the frame is usually divided into image blocks for processing.
As an example, an entire frame is first divided into Coding Tree Units (CTUs), for example of size 64 × 64 or 128 × 128 (unit: pixels), and each CTU may then be further divided into square or rectangular Coding Units (CUs). Encoding then proceeds CU by CU.
The units of size of the image blocks referred to herein may all be pixels.
The general flow of inter prediction is as follows.
For a current image block (hereinafter, simply referred to as a current block) in a current frame, a most similar block is found in a reference frame as a prediction block for the current block. The relative displacement between the current block and the similar block is called a Motion Vector (MV). Motion estimation refers to a process of obtaining a motion vector after searching and comparing a current block of a current frame in a reference frame. Motion compensation refers to a process of obtaining a prediction block using a reference block and a motion vector obtained by motion estimation.
The prediction block obtained by the inter prediction process may differ somewhat from the original current block; therefore, the difference between the prediction block and the current block may be calculated, and this difference is referred to as the residual. The residual is then transformed, quantized, entropy coded, and so on, to obtain the coded bit stream.
After encoding of the image is completed, i.e., after the entropy-encoded bit stream is obtained, the bit stream and the encoding mode information, such as the inter prediction mode and motion vector information, may be stored or transmitted to the decoding end.
At the decoding end, after the entropy-encoded bit stream is obtained, entropy decoding is performed on the bit stream to obtain the corresponding residual; then a prediction block is obtained according to decoded encoding mode information such as the motion vector; finally, the value of each pixel in the current block is obtained from the residual and the prediction block, i.e., the current block is reconstructed, and the current frame is reconstructed in the same way block by block.
As shown in fig. 1, during the encoding process, steps such as inverse quantization and inverse transformation may be further included. Inverse quantization refers to the inverse of the quantization process. Inverse transformation refers to the process that is the reverse of the transformation process.
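To make the relationship between residual, prediction, and reconstruction concrete, the following is a minimal sketch of the decoder-side reconstruction described above; the function name is illustrative, and the entropy decoding, inverse quantization, and inverse transform steps are reduced to an assumption that the residual is already in the pixel domain.

```python
def reconstruct_block(residual, prediction):
    """Minimal sketch: current block = prediction block + residual.

    In a real codec the residual is first recovered through entropy
    decoding, inverse quantization, and inverse transform; here it is
    assumed to be already available as pixel-domain values.
    """
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]
```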
Inter prediction may include forward prediction, backward prediction, bi-prediction, etc.
Forward prediction predicts the current frame (for example, the frame labeled t in fig. 2) using a previously reconstructed frame, which may be referred to as a "historical frame". Backward prediction predicts the current frame using frames subsequent to it, which may be referred to as "future frames". Bi-prediction may be bidirectional, i.e., using both "historical frames" (e.g., the frames labeled t-2 and t-1 in fig. 2) and "future frames" (e.g., the frames labeled t+1 and t+2 in fig. 2) to predict the current frame. Bi-prediction can also be in a single direction, e.g., using two "historical frames" or two "future frames" to predict the current frame.
In video coding and decoding, different frame types can be set, and different frame types can support different kinds of inter-frame prediction modes. Among them, the frame type may include three types: "I frame", "B frame", and "P frame".
All image blocks in the "I frame" are intra-coded without reference to the information of other frames.
A "B frame" is a bi-directionally predicted frame, and image blocks in the B frame may be intra-coded or inter-coded. For bi-directionally predicted B-frames, the inter-prediction mode of its image block may be forward prediction, backward prediction, or bi-directional prediction, so the inter-prediction block MV of the B-frame may be a single MV or a dual MV.
The "Generalized B frame" (Generalized P and B picture, abbreviated as GPB) is a structure in HEVC, and combines the characteristics of the conventional B frame and P frame. The generalized B-frame can be bi-forward predicted, i.e., there are two reference frames and all are "historical frames". For a coding block of a generalized B frame, intra mode, forward prediction mode, bi-forward prediction mode are also possible. The inter-prediction block MV of the generalized B-frame may be a single MV or a dual MV.
A "P frame" is a forward predicted frame and, for unidirectional prediction, a coded block in a P frame may employ an intra-prediction mode, possibly employing a forward prediction mode. Since the P-frames are unidirectional predicted frames, the inter-predicted blocks MV of the P-frames are all single MVs.
The above several frame types can be combined in a specific way to obtain several different coding modes.
For example, HEVC may have four encoding modes: all intra (AI), random access (RA), low-delay B frame (LowDelayB, LDB), and low-delay P frame (LowDelayP, LDP).
In the AI encoding mode, all frames are I-frames (IIIIIIIII … …).
In the RA coding mode, mainly B frames are used, and I frames are inserted periodically (approximately once per second); that is, the frame structure is I B … B I B … B I … in this coding mode.
In the LDB coding mode, only the first frame is an I frame, and the remaining frames are coded as generalized B frames; the frame structure is I B B B B …. In LDP, only the first frame is an I frame, and the remaining frames are coded as P frames; the frame structure is I P P P P P ….
The inter prediction technique in HEVC may include three modes, i.e., inter mode (also called AMVP mode), merge mode, and skip mode.
For the inter mode, a Motion Vector Prediction (MVP) may be determined first. After the MVP is obtained, the starting point of motion estimation is determined from it, and a motion search is performed near the starting point. When the search is complete, the optimal MV is obtained; the MV determines the position of the reference block in the reference image; subtracting the current block from the reference block gives the residual block; and subtracting the MVP from the MV gives the Motion Vector Difference (MVD), which is transmitted to the decoding end in the code stream.
For the Merge mode, an MVP may be determined first and used directly as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) may be constructed first; it contains at least one candidate MVP, and each candidate corresponds to an index. After selecting an MVP from the candidate list, the encoding end writes its index into the code stream, and the decoding end finds the MVP corresponding to the index in the MVP candidate list, thereby decoding the image block.
In order to understand the Merge mode more clearly, the operation flow of encoding using the Merge mode will be described below.
Step one: obtain the MVP candidate list;
Step two: select the optimal MVP from the MVP candidate list, and obtain its index in the list;
Step three: use the MVP as the MV of the current block;
Step four: determine the position of the reference block (also called the prediction block) in the reference frame according to the MV;
Step five: subtract the current block from the reference block to obtain the residual data;
Step six: transmit the residual data and the index of the MVP to the decoding end.
It should be understood that the above flow is just one specific implementation of the Merge mode. The Merge mode may also have other implementations.
For example, the Skip mode is a special case of the Merge mode. If, after obtaining the MV according to the Merge mode, the encoding end determines that the current block and the reference block are substantially the same, the residual data need not be transmitted; only the index of the MV needs to be transmitted, together with a flag indicating that the current block may be obtained directly from the reference block.
That is, the Merge mode is characterized by: MV = MVP (i.e., MVD = 0); the Skip mode has one additional feature: the reconstructed value rec equals the predicted value pred (i.e., the residual value resi = 0).
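The distinction among the three inter modes can be summarized in the following sketch; the dictionary fields are illustrative names only, not actual HEVC syntax elements.

```python
def signaled_elements(mode, mv, mvp, mvp_index):
    """Illustrative sketch of what each HEVC inter mode writes to the stream."""
    if mode == "inter":   # AMVP: MVP index, MVD = MV - MVP, plus the residual
        return {"mvp_index": mvp_index,
                "mvd": (mv[0] - mvp[0], mv[1] - mvp[1]),
                "residual": "transmitted"}
    if mode == "merge":   # MV = MVP, so MVD = 0; only the index and residual are sent
        return {"mvp_index": mvp_index, "residual": "transmitted"}
    if mode == "skip":    # additionally resi = 0, so rec = pred
        return {"mvp_index": mvp_index}
```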
In an actual scene, due to the continuity of natural object motion, the motion vector of an object between two adjacent frames is not necessarily an exact integer number of pixel units; therefore, the precision of motion estimation can be raised to the sub-pixel level (also referred to as 1/K pixel precision). For example, in the HEVC standard, motion estimation for the luma component employs motion vectors of 1/4 pixel precision.
However, the sample value at a 1/K pixel position does not exist in digital video. Generally, to realize motion estimation with 1/K pixel accuracy, the values at 1/K pixel positions are approximated by interpolation; in other words, K-fold interpolation is performed in the row and column directions of the reference frame, and the search is performed in the interpolated image. Interpolating the current block requires the pixels in the current block and the pixels in its neighboring area.
As an example, the process of 1/4-pel interpolation is shown in fig. 3. The pixel values of 3 pixels to the left and 4 pixels to the right (and, likewise, 3 above and 4 below) of the image block to be encoded can be used to generate the interpolated points. As shown in fig. 3, for an image block of size 4 × 4, a0,0 and d0,0 are 1/4-pel points, b0,0 and h0,0 are half-pel points, and c0,0 and n0,0 are 3/4-pel points. If the current block is the 2 × 2 block enclosed by A0,0~A1,0 and A0,0~A0,1, then in order to calculate all the interpolated points in this 2 × 2 block, some points outside the 2 × 2 block must be used: 3 to the left, 4 to the right, 3 above, and 4 below. The image blocks mentioned herein may be image blocks of size 8 × 8, 4 × 8, 4 × 4, or 8 × 4, or image blocks of other sizes, which is not specifically limited in the embodiments of the present application.
Interpolating each point yields formulas 1 to 21, as follows:
a0,j = (∑i=-3..3 Ai,j · qfilter[i]) >> (B − 8)        (formula 1)
b0,j = (∑i=-3..4 Ai,j · hfilter[i]) >> (B − 8)        (formula 2)
c0,j = (∑i=-2..4 Ai,j · qfilter[1−i]) >> (B − 8)      (formula 3)
d0,0 = (∑j=-3..3 A0,j · qfilter[j]) >> (B − 8)        (formula 4)
h0,0 = (∑j=-3..4 A0,j · hfilter[j]) >> (B − 8)        (formula 5)
n0,0 = (∑j=-2..4 A0,j · qfilter[1−j]) >> (B − 8)      (formula 6)
where B is the bit depth of the samples.
(Formulas 7 to 21, which define the remaining half-pel and quarter-pel sample positions e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0 and r0,0, survive only as image placeholders in the source text; they apply qfilter and hfilter in the vertical direction to the a, b and c values in the same manner as formulas 1 to 6.)
The interpolation process of the embodiments of the present application may be implemented by an interpolation filter. The number of taps of an interpolation filter refers to the maximum number of pixel values used to compute one interpolated sample. The coefficients of the 8-tap interpolation filter for the luma component and of the 4-tap interpolation filters for the chroma component may be as shown in tables 1 and 2 below.
TABLE 1 Coefficients of the 8-tap interpolation filter for the luma component

Position index i:   -3   -2   -1    0    1    2    3    4
hfilter[i]:         -1    4  -11   40   40  -11    4   -1
qfilter[i]:         -1    4  -10   58   17   -5    1

(qfilter is defined at positions i = -3..3.)
TABLE 2 Coefficients of the 4-tap interpolation filters for the chroma component

Position index i:   -1    0    1    2
filter1[i]:         -2   58   10   -2
filter2[i]:         -4   54   16   -2
filter3[i]:         -6   46   28   -4
filter4[i]:         -4   36   36   -4
filter5[i]:         -4   28   46   -6
filter6[i]:         -2   16   54   -4
filter7[i]:         -2   10   58   -2
In table 2, which gives the tap coefficients for the chroma component, filter1, filter2, filter3, and so on up to filter7 are the interpolation filter coefficients used at the 1/8, 2/8, 3/8, …, 7/8 pixel positions, respectively.
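As a concrete illustration of formula 2 and table 1, the following sketch computes a horizontal half-pel luma sample with the 8-tap filter. It assumes 8-bit samples (B = 8, so the (B − 8) shift vanishes) and folds normalization by the filter gain of 64 and clipping into a single rounding step, which is a simplification of HEVC's multi-stage intermediate precision handling.

```python
HFILTER = [-1, 4, -11, 40, 40, -11, 4, -1]  # table 1, positions i = -3..4
QFILTER = [-1, 4, -10, 58, 17, -5, 1]       # table 1, positions i = -3..3

def half_pel(row, x):
    """Half-pel sample b between row[x] and row[x + 1] (sketch of formula 2).

    Needs 3 integer pixels to the left of x and 4 to the right, which is
    exactly the extra read region discussed for 8-tap filters below.
    """
    acc = sum(HFILTER[i + 3] * row[x + i] for i in range(-3, 5))
    val = (acc + 32) >> 6                    # normalize: filter gain is 64
    return max(0, min(255, val))             # clip to the 8-bit sample range

# e.g. half_pel(list(range(100, 116)), 7) returns 108, interpolating
# between the samples 107 and 108 on a linear ramp.
```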
Adaptive Motion Vector Resolution (AMVR) techniques enable a CU to have motion vectors of integer-pel or sub-pel precision. Integer-pel precision may be, for example, 1-pel or 2-pel precision. Sub-pel precision may be, for example, 1/2-, 1/4-, 1/8-, or 1/16-pel precision.
AMVR may include AMVR in the inter mode and AMVR in the Affine mode.
In the HEVC standard, the inter prediction process considers only the traditional motion model (e.g., translational motion). In the real world, however, there are many other forms of motion, such as zooming, rotation, and perspective changes. To take these motion patterns into account, the Affine technique was introduced in VTM-3.0.
As shown in fig. 4, the motion field of an Affine-mode block can be derived from the motion vectors of two control points (four parameters, fig. 4(a)) or three control points (six parameters, fig. 4(b)).
Hereinafter, the MV of a control point (Control Point Motion Vector) is abbreviated as CPMV.
The processing unit of the Affine mode is not the CU itself but the sub-CUs obtained by dividing the CU, and the size of each sub-CU may be 4 × 4. In Affine mode, each sub-CU has its own MV. In other words, unlike an ordinary CU, a CU in Affine mode does not have just one MV: it has as many MVs as it has sub-CUs.
As an example, the MVs of the sub-CUs in a CU are derived from the CPMVs of two or three control points as shown in fig. 4. For example, for the four-parameter Affine motion model, the MV of the sub-CU at position (x, y) is calculated by the following formulas:
mvx = ((mv1x − mv0x) / W) · x − ((mv1y − mv0y) / W) · y + mv0x
mvy = ((mv1y − mv0y) / W) · x + ((mv1x − mv0x) / W) · y + mv0y
For another example, for the six-parameter Affine motion model, the MV of the sub-CU at position (x, y) is calculated by the following formulas:
mvx = ((mv1x − mv0x) / W) · x + ((mv2x − mv0x) / H) · y + mv0x
mvy = ((mv1y − mv0y) / W) · x + ((mv2y − mv0y) / H) · y + mv0y
where (mv0x, mv0y) is the MV of the upper-left control point, (mv1x, mv1y) is the MV of the upper-right control point, (mv2x, mv2y) is the MV of the lower-left control point, and W and H are the width and height of the current CU.
Through the above formulas, the motion vectors within a CU can be as shown in fig. 5, where each square represents a 4 × 4 sub-CU. All MVs computed by the above formulas are converted to a 1/16-precision representation, i.e., the highest precision of a sub-CU MV is 1/16 pixel. After the MV of each sub-CU is calculated, the prediction block of each sub-CU is obtained through motion compensation. The sub-CU size for both the chroma and the luma component is 4 × 4, and the MV of a 4 × 4 chroma block is the average of the motion vectors of its four corresponding 4 × 4 luma blocks.
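The following is a minimal sketch of the sub-CU MV derivation above, covering both the four-parameter and six-parameter models. It works in floating point for readability; a real encoder keeps the MVs in 1/16-pel fixed point and evaluates the formulas at each 4 × 4 sub-CU's center, details that are omitted here as assumptions of the sketch.

```python
def affine_subcu_mv(x, y, cpmvs, w, h):
    """MV at position (x, y) inside a w x h CU from its control-point MVs.

    cpmvs holds two CPMVs (four-parameter model) or three CPMVs
    (six-parameter model): upper-left, upper-right, and, if present,
    lower-left, matching the definitions above.
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    ax, ay = (mv1x - mv0x) / w, (mv1y - mv0y) / w      # horizontal gradient
    if len(cpmvs) == 3:                                # six-parameter model
        mv2x, mv2y = cpmvs[2]
        bx, by = (mv2x - mv0x) / h, (mv2y - mv0y) / h  # vertical gradient
    else:                                              # four-parameter model:
        bx, by = -ay, ax                               # rotation/zoom constraint
    return (ax * x + bx * y + mv0x, ay * x + by * y + mv0y)
```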
The Affine merge mode, similar to the ordinary merge mode mentioned above, may process only CUs whose width and height are both not less than 8. In this mode, MVs are first obtained from spatial and temporal neighboring blocks; this yields the CPMVs of neighboring CUs coded in Affine mode as well as the MVs of neighboring blocks coded in traditional modes. CPMV combinations are then derived from these MVs to construct a candidate list, and one combination (a candidate may contain two or three CPMVs, representing two or three control points) is selected from the list as the CPMVs of the current block. No motion estimation is needed; only the index of the finally selected CPMVs needs to be written into the code stream (only one index per CU). The inter prediction mode of a neighboring block may be a traditional inter prediction mode or the Affine mode, so the MV obtained from the neighboring block may have integer-pel or sub-pel precision. The Affine merge mode does not perform AMVR, that is, it does not perform an adaptive motion vector precision decision; whatever MV precision is inherited from the neighboring block is used as-is.
The Affine inter mode, similar to the AMVP mode mentioned above, may process only CUs whose width and height are both not less than 16. MVs are obtained from spatial or temporal neighboring blocks to construct a candidate list, and then motion estimation is performed in units of the whole CU to obtain the CPMVs. The motion compensation process is performed in units of 4 × 4 sub-CUs. Finally, the index of the selected CPMV predictors and the difference (MVD) between them and the actual CPMVs of the current CU are written into the code stream. The precision of AMVR is essentially the precision of the MVDs, i.e., of the CPMVs, rather than the MV precision of the sub-CUs.
For each CU that adopts Affine AMVR (or the AMVR technique; in some cases a CU does not adopt Affine AMVR), the corresponding MV precision can be adaptively decided at the encoding end, and the decision result is written into the code stream and transmitted to the decoding end.
The integer-pel or sub-pel precision referred to in the Affine AMVR technique is the pixel precision of the CPMVs, not of the sub-CUs. For example, the 1/16 precision, 1/4 precision, integer-pel precision, etc. mentioned in Affine AMVR refer to the precision of the CPMVs in fig. 4, not the precision of the MVs actually used in the motion compensation of the sub-CUs. Even for integer-pel CPMVs, for which the motion estimation process is an integer-pel process, the sub-CU MVs obtained from the formulas above may have 1/4 precision, so the motion compensation process still involves sub-pixels.
The sub-pixel interpolation mentioned above puts pressure on memory data reading, mainly because the interpolation process must read not only the samples of the current coding block but also the samples of its neighboring points in order to obtain the sub-pixel values. Taking an 8-tap interpolation filter as an example, 7 additional pixels in the horizontal direction and 7 in the vertical direction are needed beyond the current block; if the current block has width W and height H, the interpolation process must read a region of (W + 7) × (H + 7). For a 4-tap filter, a region of (W + 3) × (H + 3) must be read; for a 6-tap interpolation filter, a region of (W + 5) × (H + 5).
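The read regions quoted above follow directly from the tap counts; the small sketch below makes the comparison explicit for an 8 × 8 block.

```python
def pixels_read(w, h, taps):
    """Samples fetched to interpolate a w x h block with an n-tap
    separable filter: (w + taps - 1) x (h + taps - 1)."""
    return (w + taps - 1) * (h + taps - 1)

for taps in (4, 6, 8):
    print(taps, pixels_read(8, 8, taps))  # 4 -> 121, 6 -> 169, 8 -> 225
```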
Especially in the LDB mode, storage bandwidth consumption is large compared with the LDP mode because of the presence of dual MVs; yet if LDP is used, only a single MV is allowed, which results in a large gap in coding performance relative to LDB.
In view of the above problems, the present application provides a video processing method and apparatus, which can reduce bandwidth pressure to a certain extent while ensuring compression performance.
The method is applicable to the field of digital video coding, and in particular to the inter prediction part of a video codec. It can be applied to codecs conforming to international video coding standards such as H.264/HEVC and the Chinese AVS2 standard, as well as to codecs conforming to next-generation video coding standards such as VVC and AVS3.
The present application may be applied to the inter prediction part of a video codec; that is, the video processing method according to the embodiments of the present application may be performed by an encoding apparatus or by a decoding apparatus.
Fig. 6 is a schematic flow chart of a video processing method according to an embodiment of the present application. The method includes at least some of the following.
At 110, motion estimation and/or motion compensation is performed on an image block of the target frame that has at least one MV (specifically, multiple MVs), using an interpolation filter among multiple interpolation filters.
Specifically, multiple interpolation filters may be available to the video processing device; when performing motion estimation and/or motion compensation on the current image block, the device may select an interpolation filter from among them.
In the embodiment of the present application, for the same image block, the interpolation filter used for motion estimation and the interpolation filter used for motion compensation may be the same or may not be the same.
Specifically, one interpolation filter may be selected once for the current image block and used for both motion estimation and motion compensation. Alternatively, for the current image block, one interpolation filter may be selected for motion estimation and another for motion compensation.
In the embodiments of the present application, the various interpolation filters used for motion estimation may be the same as, partially the same as, or completely different from the various interpolation filters used for motion compensation.
In the embodiments of the present application, interpolation filters may differ in at least one of the following ways: the number of taps, the coefficients, or the shape of the interpolation filter (which may also be called the pixel positions referenced by the interpolation filter).
In the embodiments of the present application, the number of taps of the interpolation filter may be 2, 4, 6, 8, or the like.
In the embodiment of the present application, different interpolation filters in the plurality of interpolation filters correspond to different preset conditions.
Specifically, each of the interpolation filters has a preset condition, and when the preset condition of a certain interpolation filter is satisfied, the interpolation filter may be used to perform motion estimation and/or motion compensation.
For example, when a first preset condition corresponding to a first interpolation filter (which may be any one of a plurality of interpolation filters) is satisfied, motion estimation and/or motion compensation are performed on the image block by using the first interpolation filter.
In the embodiment of the present application, the preset conditions corresponding to different interpolation filters may be different in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
The coding mode of the image block mentioned in the embodiments of the present application may include: the inter mode, the Affine mode, the Merge mode, etc.
The interval in which the size of the image block lies, as mentioned in the embodiments of the present application, may be one of two or more intervals. For example, the size may fall into one of two intervals: greater than a preset value, or less than or equal to the preset value. As another example, the size may fall into one of three intervals: greater than a first preset value; less than or equal to the first preset value and greater than a second preset value; or less than or equal to the second preset value.
The components to be encoded of the image block according to the embodiments of the present application may include: luminance components and chrominance components, etc.
The number of MVs of an image block mentioned in the embodiments of the present application may be 1, 2, 3, or more.
Optionally, in this embodiment of the present application, an interpolation filter may correspond to one or more preset conditions, and in a case where any one of the one or more preset conditions is satisfied, the interpolation filter may be used for performing motion estimation and/or motion compensation. Wherein different preset conditions among the plurality of preset conditions corresponding to one interpolation filter are different in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
For example, the preset condition corresponding to the first interpolation filter of the plurality of interpolation filters includes at least two of the following:
1) the coding mode of the image block is the inter mode, and the component to be encoded is a luminance component;
2) the coding mode of the image block is the inter mode, and the component to be encoded is a chrominance component;
3) the coding mode of the image block is the Affine motion compensation prediction (Affine) mode, and the component to be encoded is a chrominance component;
4) the coding mode of the image block is the Affine mode, and the component to be encoded is a luminance component.
That is, the preset conditions corresponding to the first interpolation filter include at least two conditions that differ in the coding mode or in the component to be encoded.
It should be understood that a single preset condition mentioned in the embodiments of the present application may be open-ended; that is, besides the factors it names (the factors included in, i.e., defined by, the preset condition), other factors may also be included or limited. When a preset condition includes multiple factors, all of those factors are constrained, while factors it does not mention are left unconstrained.
For example, preset condition 1) may further include that the size of the image block is less than or equal to a preset value.
Optionally, in this embodiment of the present application, the preset conditions corresponding to different interpolation filters are different in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
If one interpolation filter corresponds to multiple preset conditions, the factor in which each of those conditions differs from the preset conditions of other interpolation filters may itself be different.
For example, interpolation filter 1 corresponds to preset condition A and preset condition B, and interpolation filter 2 corresponds to preset condition C; preset condition A and preset condition C may differ in the coding mode of the image block, while preset condition B and preset condition C may differ in the component to be encoded.
Optionally, in this embodiment of the present application, different preset conditions correspond to different interpolation filters in the multiple interpolation filters. When one of the preset conditions is satisfied, the interpolation filter corresponding to the preset condition may be used to perform motion estimation and/or motion compensation on the image block.
Specifically, there may be a plurality of preset conditions, and the interpolation filter corresponding to each preset condition may be different.
Wherein the different preset conditions differ in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
It should be understood that, in the embodiments of the present application, two preset conditions differing in a certain aspect means that their limitations on that aspect differ. For example, if preset condition A requires the coding mode of the image block to be the inter mode while preset condition B requires it to be the Affine mode, the two preset conditions differ in the coding mode of the image block.
Alternatively, in the embodiment of the present application, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the inter mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;
2) the coding mode of the image block is the inter mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to the second preset value.
Preset conditions 1) and 2) above are the same in their limitations on the coding mode of the image block and on the component to be encoded, but differ in their limitations on the size of the image block.
It should be understood that different preset conditions may have different limitations on the size of the image block while both requiring, say, the Affine mode or the chrominance component; or they may have the same limitation on the size of the image block but different limitations on the coding mode or the component to be encoded; or they may differ both in the size limitation and in the coding mode or component to be encoded.
For example, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the Affine mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;
2) the coding mode of the image block is the Affine mode, the component to be encoded is a chrominance component, and the size of the image block is less than or equal to the second preset value.
For another example, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the inter mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;
2) the coding mode of the image block is the Affine mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to the second preset value.
For another example, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the Affine mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;
2) the coding mode of the image block is the inter mode, the component to be encoded is a luminance component, and the size of the image block is less than or equal to the second preset value.
For another example, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the Affine mode, the component to be encoded is a chrominance component, and the size of the image block is less than or equal to a first preset value and greater than a second preset value;
2) the coding mode of the image block is the inter mode, the component to be encoded is a chrominance component, and the size of the image block is less than or equal to the second preset value.
Alternatively, in the embodiment of the present application, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the inter mode, and the component to be encoded is a luminance component;
2) the coding mode of the image block is the inter mode, and the component to be encoded is a chrominance component.
Preset conditions 1) and 2) above have the same limitation on the coding mode of the image block and different limitations on the component to be encoded; in other respects, they may be the same or different.
Alternatively, in the embodiment of the present application, the following preset conditions respectively correspond to different interpolation filters:
1) the coding mode of the image block is the inter mode, and the component to be encoded is a luminance component;
2) the coding mode of the image block is the Affine mode, and the component to be encoded is a luminance component or a chrominance component.
Preset conditions 1) and 2) above differ in the coding mode of the image block and in the component to be encoded, and may be the same or different in other respects.
For example, preset condition 1) may further include: the size of the image block is greater than a preset value. In this case, preset condition 2) may leave the size of the image block unrestricted (that is, any size is allowed), or may restrict it.
The above examples only contrast some of the possible preset conditions; the embodiments of the present application are not limited thereto, and other preset conditions are also possible.
To make the present application clearer, the number of interpolation filter taps used under various preset conditions is described below.
In one implementation, preset condition a) includes: the coding mode of the image block is the Affine mode, and the component to be encoded is a luminance component; the corresponding filter has 4 taps. Under preset condition a), using 4 taps instead of a larger number such as 6 or 8 relieves the bandwidth pressure.
In one implementation, preset condition b) includes: the coding mode of the image block is the inter mode, and the component to be encoded is a luminance component; the corresponding filter has 4 or 6 taps. Preset condition b) may further include: the size of the image block is less than or equal to a preset value. Under preset condition b), using 4 or 6 taps instead of 8 relieves the bandwidth pressure.
For example, when the size is less than or equal to a first preset value and greater than a second preset value, the interpolation filter employed may have 6 taps; when the size is less than or equal to the second preset value, the interpolation filter employed may have 4 taps; and when the size is greater than the first preset value, the interpolation filter employed may have 8 taps.
As another example, when the size is less than or equal to the first preset value, the interpolation filter employed has 4 taps, and when the size is greater than the first preset value, it has 6 or 8 taps.
In one implementation, preset condition c) includes: the coding mode of the image block is the inter mode, and the component to be encoded is a luminance component; the corresponding filter has 8 taps. Preset condition c) may further include: the size of the image block is greater than a preset value.
In one implementation, preset condition d) includes: the coding mode of the image block is the inter mode, and the component to be encoded is a chrominance component; the corresponding filter has 4 taps.
In one implementation, preset condition e) includes: the coding mode of the image block is the Affine mode, and the component to be encoded is a chrominance component; the corresponding filter has 4 taps.
For the above preset conditions a) to e), there may also be other limiting factors; for example, each may further require the number of MVs to be 2, or require the image frame to be a bi-forward-predicted B frame.
Optionally, in the embodiments of the present application, the size of the image block may be inversely related to the number of taps of the interpolation filter applied. The smaller the image blocks, the more blocks an image frame is divided into, and the more pixels are required to interpolate the whole frame, which increases the bandwidth pressure; therefore, when the size of the image block is small, an interpolation filter with fewer taps may be applied, reducing the bandwidth pressure.
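Combining preset conditions a) to e), one possible selection rule can be sketched as follows. The two size thresholds and the exact mapping are illustrative assumptions; the embodiments only fix the tendency that smaller blocks and the Affine and chroma cases get fewer taps.

```python
def select_taps(mode, component, size, first_preset=256, second_preset=64):
    """Illustrative mapping of preset conditions a)-e) to tap counts.

    size is the block area in pixels; the thresholds (16x16 and 8x8
    areas) are assumptions, not values fixed by the embodiments.
    """
    if component == "chroma":          # conditions d) and e): 4-tap chroma filter
        return 4
    if mode == "affine":               # condition a): 4-tap luma filter
        return 4
    # conditions b) and c): inter-mode luma, taps shrink with block size
    if size > first_preset:
        return 8
    if size > second_preset:
        return 6
    return 4
```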
Optionally, the above description takes the interpolation filter as an example, stating that different interpolation filters may correspond to different preset conditions, or that different preset conditions may correspond to different interpolation filters. In the embodiments of the present application, however, different interpolation manners may likewise correspond to different preset conditions, or different preset conditions may correspond to different interpolation manners. In that case, the specific implementation may refer to the above description of interpolation filters, replacing "interpolation filter" with "interpolation manner".
In the embodiment of the present application, the different interpolation manners may include different interpolation filters.
In the embodiment of the present application, the interpolation method for motion estimation and/or motion compensation may be the same under some preset conditions. Wherein the preset conditions may differ in at least one of the following ways:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
For example, the interpolation manner for motion estimation and/or motion compensation is the same under a preset condition whose component to be encoded is a luminance component (referred to as preset condition A)) and under a preset condition whose component to be encoded is a chrominance component (referred to as preset condition B)).
Here, preset condition A) and preset condition B) define different components to be encoded, yet the corresponding interpolation manners may be the same. The same interpolation manner may mean the same interpolation filter; the interpolation filter may have 4 taps and be used to interpolate 1/16-pel positions.
Alternatively, in the embodiment of the present application, the preset condition a) and the preset condition B) respectively define components to be encoded, and may have the same definition in other aspects (for example, the number of MVs, the size of an image block, an encoding mode, and the like) or may have different definitions.
For example, preset condition A) and preset condition B) both define the coding mode of the image block as the inter mode.
For example, preset condition A) and preset condition B) both define the coding mode of the image block as the Affine mode.
For example, preset condition A) defines the coding mode of the image block as the inter mode, and preset condition B) defines it as the Affine mode.
For example, preset condition A) defines the coding mode of the image block as the Affine mode, and preset condition B) defines it as the inter mode.
Optionally, in this embodiment of the present application, the image block includes a luminance component and a chrominance component; and the luminance component and the chrominance component of the image block adopt the same interpolation mode to carry out motion estimation and/or motion compensation.
Specifically, in the embodiment of the present application, for the luminance component and the chrominance component of the same image block, the same interpolation manner may be adopted for motion estimation and/or motion compensation.
The same interpolation manner mentioned here means that the number of taps and/or the interpolation coefficients of the interpolation filters used are the same. Optionally, in this embodiment of the present application, the number of taps of the interpolation filter used for motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block is 4, for interpolating 1/16 pixels.
In the embodiment of the present application, the luminance component and the chrominance component of the image block may perform motion estimation and/or motion compensation in the same interpolation manner when at least one of the following conditions is satisfied (a combined check is sketched after this list):
1) The code stream has a specific identification bit; in this case, the method can be applied at the decoding end. The specific identification bit mentioned here may be the first identification bit mentioned below, or may be the first identification bit taking a specific value.
For example, when the code stream has the first identification bit, the luminance component and the chrominance component of the image block perform motion estimation and/or motion compensation in the same interpolation manner. Whether the first identification bit exists in the code stream may be indicated by a second identification bit; for example, when the second identification bit indicates that the current frame is a B frame, the first identification bit exists in the code stream.
For example, when the first identification bit takes a specific value, the luminance component and the chrominance component of the image block perform motion estimation and/or motion compensation in the same interpolation manner. The first identification bit may have two values, a long type and a short type, where the long type corresponds to an interpolation filter with more taps than the short type. For example, when the first identification bit takes the long type, the number of taps of the interpolation filter is 8; when it takes the short type, the number of taps is 4 or 6. Whether the first identification bit indicates the long type or the short type, the luminance component and the chrominance component of the image block adopt the same interpolation manner for motion estimation and/or motion compensation.
2) The coding mode is the inter mode or the Affine mode. The coding mode here may mean that the coding mode of the luminance component is the inter mode or the Affine mode, or that the coding mode of the chrominance component is the inter mode or the Affine mode; or it may mean that the coding modes of the luminance component and the chrominance component are each the inter mode or the Affine mode, in which case the two coding modes may be the same or different.
3) The size of the image block is larger than a preset value.
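A combined check over conditions 1) to 3) might look as follows; the flag name, the mode strings, and the size threshold are assumptions for illustration.

def luma_chroma_share_interpolation(has_specific_flag: bool,
                                    coding_mode: str,
                                    block_size: int,
                                    size_threshold: int = 64) -> bool:
    # True if at least one of conditions 1)-3) above holds, so the luminance
    # and chrominance components use the same interpolation manner.
    return (has_specific_flag                      # 1) flag in the code stream
            or coding_mode in ("inter", "affine")  # 2) inter mode or Affine mode
            or block_size > size_threshold)        # 3) size above a preset value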
Optionally, in this embodiment of the present application, the encoding modes of the luminance component and the chrominance component of the image block are both inter modes.
Specifically, when the encoding modes of the luminance component and the chrominance component of the image block are both inter modes, the two components may perform motion estimation and/or motion compensation in the same interpolation manner; in this case, further limiting conditions may also apply, which is not specifically limited in the embodiment of the present application.
Optionally, in this embodiment of the present application, the encoding modes of the luminance component and the chrominance component of the image block are both Affine modes.
Specifically, when the encoding modes of the luminance component and the chrominance component of the image block are both Affine modes, the two components may perform motion estimation and/or motion compensation in the same interpolation manner; in this case, further limiting conditions may also apply, which is not specifically limited in the embodiment of the present application.
Optionally, in this embodiment of the present application, the encoding end may write an identification bit in the code stream, where the identification bit is used to indicate that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation manner.
For the decoding end, an identification bit is acquired from the code stream, where the identification bit is used for indicating that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation manner.
Or, in the embodiment of the present application, the code stream may not have the flag, and the encoding end and the decoding end use the same method to select the interpolation filter.
Optionally, in this embodiment of the present application, the encoding end may write an identification bit in the code stream, where the identification bit may indicate whether the interpolation manners (or interpolation filters) corresponding to the respective preset conditions are the same. The decoding end may obtain the identification bit from the code stream to determine whether the interpolation manners (or interpolation filters) corresponding to the respective preset conditions are the same.
Illustratively, the identification bit is used to indicate whether the interpolation manner corresponding to the preset condition in which the component to be coded is a luminance component is the same as the interpolation manner corresponding to the preset condition in which the component to be coded is a chrominance component.
The identification bit may be carried in a sequence header, a frame header, or a Slice header.
Optionally, in this embodiment of the present application, the encoding end may add a first flag bit to the code stream, where the first flag bit is used to indicate that one of the interpolation filters is selected from the multiple interpolation filters to be used for motion estimation and/or motion compensation (i.e., whether the scheme of the present application is applied). Accordingly, the decoding end can obtain the first identification bit from the code stream to determine that an interpolation filter needs to be selected from the multiple interpolation filters for motion estimation and/or motion compensation.
Optionally, in this embodiment of the present application, the first flag is used to indicate that one of the filters with multiple tap numbers is selected for motion estimation and/or motion compensation.
Illustratively, the filters with multiple tap numbers include a first filter with a tap number of 8 and a second filter with a tap number of 6 or 4, or the tap number of the second filter is the same as that of the filter for the chrominance component; the first identification bit is used for indicating whether the first filter or the second filter is selected.
At this time, the first and second interpolation filters may be candidate interpolation filters for encoding the luminance component, but the embodiment of the present application is not limited thereto.
Optionally, in this embodiment of the present application, the first flag may have two values, that is, a long type and a short type, and the long type indicates that the number of taps of the interpolation filter used may be greater than the number of taps of the interpolation filter used by the short type. For example, when the first flag bit takes the long type, the selected interpolation filter may be a first interpolation filter, and the number of taps of the interpolation filter is 8, and when the first flag bit takes the short type, the selected interpolation filter may be a second interpolation filter, and the number of taps of the second interpolation filter is 4 or 6.
The first identification bit may be carried in a sequence header, a frame header, or a Slice header; in particular, it may be carried as the slice_type in the Slice header. The second identification bit may likewise be carried in the sequence header, the frame header, or the Slice header.
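The long/short signalling can be sketched as a simple lookup; the value names and tap counts follow the example in the text, while everything else is an assumption.

# Map the value of the first identification bit to a candidate filter.
TAPS_BY_FIRST_FLAG = {
    "long": 8,   # first interpolation filter: 8 taps
    "short": 4,  # second interpolation filter: 4 taps (6 is also possible)
}

def filter_taps(first_flag: str) -> int:
    return TAPS_BY_FIRST_FLAG[first_flag]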
The Slice header can identify, through the slice_type, whether the current slice is I_SLICE, B_SLICE, or P_SLICE. An I_SLICE can use only intra prediction; a P_SLICE may use intra prediction or forward prediction; a B_SLICE may use intra prediction, forward prediction, backward prediction, bidirectional prediction, or dual forward prediction.
Carrying the first identification bit in the Slice header means that the first identification bit may be an identification bit independent of the slice type. When the Slice header indicates that the slice is B_SLICE: if the first identification bit indicates that an interpolation filter is to be selected from the multiple interpolation filters, the interpolation filter may be selected from the multiple interpolation filters according to the scheme of the present application (that is, the scheme of the present application is applied); if not, no such selection is made (that is, the scheme of the present application is not applied), and, for example, a preset interpolation filter may be used. When the Slice header indicates that the slice is P_SLICE: if the first identification bit indicates that an interpolation filter is to be selected, the interpolation filter may be selected from the multiple interpolation filters according to the scheme of the present application and the slice may be processed in the B_SLICE manner (for example, using bidirectional prediction, dual forward prediction, or dual backward prediction; that is, the current slice is processed as B_SLICE); if not, no such selection is made and the slice is processed in the P_SLICE manner (for example, using intra prediction or forward prediction).
Optionally, in this embodiment of the present application, a second identification bit may also be present in the code stream, and when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
Here, the second identification bit may be the slice_type. That is, when the slice_type indicates that the current frame is a B frame, the first identification bit exists in the code stream; otherwise, the first identification bit does not exist.
In this embodiment of the present application, the first identification bit may also multiplex the slice_type, in particular the slice_type value indicating P_SLICE. For example, when the slice_type is P_SLICE, an interpolation filter may be selected from the multiple interpolation filters according to the scheme of the present application, and the slice may be processed in the B_SLICE manner (for example, using bidirectional prediction, dual forward prediction, or dual backward prediction; that is, the current slice is processed as B_SLICE). When the slice_type is B_SLICE, the Slice header may carry, in addition to the slice_type, a separate first identification bit: if the first identification bit indicates that an interpolation filter is to be selected from the multiple interpolation filters, the selection may be made according to the scheme of the present application; otherwise, no such selection is made, and, for example, a preset interpolation filter may be used.
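The conditional presence and reuse of the first identification bit can be sketched as a small slice-header parse; the function shape and field names are assumptions, not syntax from any standard.

def parse_first_flag(slice_type: str, read_bit) -> dict:
    # read_bit is a callable returning the next bit of the Slice header.
    if slice_type == "B_SLICE":
        # The first identification bit is present in addition to slice_type.
        apply_scheme = bool(read_bit())
        return {"apply_scheme": apply_scheme, "process_as": "B_SLICE"}
    if slice_type == "P_SLICE":
        # slice_type itself is multiplexed as the first identification bit:
        # apply the scheme and process the slice in the B_SLICE manner.
        return {"apply_scheme": True, "process_as": "B_SLICE"}
    return {"apply_scheme": False, "process_as": slice_type}

# Example: a B slice whose first identification bit is 0 keeps the preset filter.
bits = iter([0])
assert parse_first_flag("B_SLICE", lambda: next(bits))["apply_scheme"] is False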
In the embodiment of the present application, two or more first identification bits may exist at the same time. For example, the first identification bit is present in at least two of the sequence header, the frame header, and the Slice header.
In one implementation, when the first identification bit in the sequence header indicates that the scheme of the present application must be applied (mandatory) or must not be applied (mandatorily excluded), the first identification bit may be absent from the frame header and the slice header, and the scheme of the present application is applied, or not applied, to all frames and slices of the sequence. When the first identification bit in the sequence header indicates that the scheme of the present application may or may not be applied (selective: each frame or slice may apply it or not), the first identification bit may exist in the frame header or slice header to indicate whether the current frame or slice applies the scheme of the present application.
For example, when the first identification bit in the sequence header indicates that each frame may apply the scheme of the present application, a first identification bit exists in the frame header or slice header to indicate whether the current frame or slice applies the scheme. When the first identification bit in the sequence header indicates that no frame needs to apply the scheme of the present application, the first identification bit no longer exists in the frame header or the slice header, and the scheme is not applied when each frame or slice is processed.
For example, when the first identification bit in the sequence header indicates that each frame needs to apply the scheme of the present application, the first identification bit no longer exists in the frame header or the slice header, and the scheme is applied when each frame or slice is processed. When the identification bit in the sequence header indicates that each frame may or may not apply the scheme of the present application, a first identification bit exists in the frame header or slice header to indicate whether the current frame or slice applies the scheme.
Likewise, in the embodiment of the present application, the first identification bit may be present in both the frame header and the slice header, or only in the frame header. When the first identification bit in the frame header indicates that the scheme of the present application must be applied or must not be applied, the first identification bit may be absent from the slice header, and all slices of the frame apply, or do not apply, the scheme accordingly. When the first identification bit in the frame header indicates that the scheme may or may not be applied, the first identification bit may exist in the slice header to indicate whether the slice applies the scheme.
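The sequence/frame/slice override can be resolved as below; the three-state values are an assumed encoding of "must apply", "must not apply", and "selectable".

REQUIRED, FORBIDDEN, SELECTABLE = "required", "forbidden", "selectable"

def scheme_applies(seq_flag: str, frame_flag: str = SELECTABLE,
                   slice_flag: str = FORBIDDEN) -> bool:
    # Resolve the first identification bit hierarchically (sequence > frame > slice).
    if seq_flag in (REQUIRED, FORBIDDEN):
        return seq_flag == REQUIRED    # frame/slice bits are not signalled
    if frame_flag in (REQUIRED, FORBIDDEN):
        return frame_flag == REQUIRED  # slice bit is not signalled
    return slice_flag == REQUIRED      # decided per slice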
The embodiment of the present application can be used in the LDB (low-delay B) mode. By modifying the interpolation function (selecting an interpolation filter or an interpolation manner) for the dual MVs in LDB, the memory-bandwidth consumption of LDB can be reduced, so that the LDB mode brings no extra bandwidth pressure compared with the LDP (low-delay P) mode while retaining better compression performance than LDP.
The above describes how to select an interpolation filter or an interpolation manner. The embodiments of the present application can also be used to select between unidirectional prediction and bidirectional prediction, and to select whether motion estimation and/or motion compensation uses integer-pixel precision (no interpolation filter is needed) or sub-pixel precision (an interpolation filter is needed).
In one implementation, there may be a motion estimation and/or motion compensation mode with integer pixel precision and a motion estimation and/or motion compensation mode with sub-pixel precision, and whether the motion estimation and/or motion compensation mode with integer pixel precision or the motion estimation and/or motion compensation mode with sub-pixel precision is adopted may be selected according to a preset condition. Wherein the two preset conditions may also differ in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
The preset condition corresponding to the motion estimation and/or motion compensation mode of the integer pixel precision may include: the coding mode is an inter mode, the size of the image block is smaller than or equal to a preset value, and the number of MVs of the image block is larger than or equal to 2. The preset condition may also define other factors, which are not limited in the embodiment of the present application.
In this case, there may be multiple preset conditions corresponding to the sub-pixel-precision motion estimation and/or motion compensation manner, respectively corresponding to multiple interpolation filters; in other words, the sub-pixel-precision manner may have multiple interpolation filters, and the preset condition corresponding to each interpolation filter may be different.
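A sketch of the integer-pixel gate under the preset condition just listed (the size threshold is an assumption):

def use_integer_pel(coding_mode: str, block_size: int, num_mvs: int,
                    size_threshold: int = 64) -> bool:
    # True if the bandwidth-critical preset condition selects integer-pixel
    # motion estimation/compensation, so no interpolation filter is needed.
    return (coding_mode == "inter"
            and block_size <= size_threshold
            and num_mvs >= 2)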
In another implementation, there may be a unidirectional prediction mode and a bidirectional prediction mode, and the unidirectional prediction mode or the bidirectional prediction mode may be selected according to a preset condition. Wherein the two preset conditions may also differ in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
The preset condition corresponding to the prediction mode of the unidirectional prediction may include: the coding mode is an inter mode, the size of the image block is smaller than or equal to a preset value, and the number of MVs of the image block is larger than or equal to 2. The preset condition may also define other factors, which are not limited in the embodiment of the present application.
In this case, there may be multiple preset conditions corresponding to the bidirectional prediction mode, respectively corresponding to multiple interpolation filters; in other words, the bidirectional prediction mode may have multiple interpolation filters, and the preset condition corresponding to each interpolation filter may be different.
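The unidirectional/bidirectional choice mirrors the integer-pixel gate sketched above; the threshold is again an assumption.

def use_uni_prediction(coding_mode: str, block_size: int, num_mvs: int,
                       size_threshold: int = 64) -> bool:
    # True if the preset condition selects unidirectional prediction for the
    # block; bidirectional prediction roughly doubles the reference fetches.
    return (coding_mode == "inter"
            and block_size <= size_threshold
            and num_mvs >= 2)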
Having described the video processing method according to the embodiment of the present application, a video processing apparatus for realizing the embodiment of the present application will be described below.
Fig. 7 shows a schematic block diagram of a video processing apparatus 200 according to an embodiment of the present application.
As shown in fig. 7, the apparatus 200 may include a processor 210, and further may include a memory 220.
It should be understood that the apparatus 200 may also include other components commonly included in computer systems, such as input/output devices and communication interfaces, which are not limited in the embodiment of the present application.
The memory 220 is used to store computer executable instructions.
The memory 220 may be any of various types of memory; for example, it may include a random access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory, which is not limited in the embodiment of the present application.
The processor 210 is configured to access the memory 220 and execute the computer-executable instructions to perform the operations of the method for video processing of the embodiments of the present application described above.
The processor 210 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and the like, which is not limited in the embodiment of the present application.
The video processing apparatus 200 of the embodiment of the present application may correspond to the execution body of the video processing method of the embodiment of the present application, and the above and other operations and/or functions of the modules of the apparatus 200 implement the corresponding flows of the foregoing methods; for brevity, details are not described herein again.
An embodiment of the present application further provides an electronic device, which may include the video processing device according to the various embodiments of the present application.
The embodiment of the present application further provides a computer storage medium, where a program code is stored in the computer storage medium, and the program code may be used to instruct to execute the video processing method according to the embodiment of the present application.
It should be understood that, in the embodiment of the present application, the term "and/or" is only one kind of association relation describing an associated object, and means that three kinds of relations may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the components and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While the present application has been described with reference to specific embodiments, the protection scope of the present application is not limited thereto; those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (70)

1. A video processing method, comprising:
performing motion estimation and/or motion compensation on an image block with multiple MVs of a target frame by using an interpolation filter among multiple interpolation filters.
2. The method according to claim 1, wherein different interpolation filters of the plurality of interpolation filters correspond to different preset conditions;
the performing motion estimation and/or motion compensation for the image block with multiple MVs of the target frame by using the interpolation filter of the multiple interpolation filters includes:
when a first preset condition corresponding to a first interpolation filter is met, performing motion estimation and/or motion compensation on the image block by using the first interpolation filter.
3. The method of claim 2, wherein the preset conditions for different interpolation filters are different in at least one of the following ways:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
4. The method of claim 1, wherein different predetermined conditions correspond to different interpolation filters of the plurality of interpolation filters;
utilizing an interpolation filter in a plurality of interpolation filters to perform motion estimation and/or motion compensation on an image block with multiple MVs of a target frame, comprising:
when a first preset condition among the plurality of preset conditions is met, performing motion estimation and/or motion compensation on the image block by using a first interpolation filter corresponding to the first preset condition.
5. The method according to claim 4, wherein the different preset conditions differ in at least one of the following aspects:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
6. The method according to claim 2 or 3, wherein the preset condition corresponding to the first interpolation filter comprises at least two of the following:
the encoding mode of the image block is an inter mode, and the component to be encoded is a luminance component;
the encoding mode of the image block is an inter mode, and the component to be encoded is a chrominance component;
the encoding mode of the image block is an affine motion compensation prediction (Affine) mode, and the component to be encoded is a chrominance component;
the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component.
7. The method according to claim 6, wherein the preset condition in which the encoding mode of the image block is the inter mode and the component to be encoded is a luminance component further comprises: the size of the image block is smaller than or equal to a preset value.
8. The method according to any one of claims 2 to 5, characterized in that the following preset conditions correspond to different interpolation filters, respectively:
the encoding mode of the image block is an inter mode, the component to be encoded is a luminance component, and the size of the image block is smaller than or equal to a first preset value and larger than a second preset value;
the encoding mode of the image block is an inter mode, the component to be encoded is a luminance component, and the size of the image block is smaller than or equal to the second preset value.
9. The method according to any one of claims 2 to 5, characterized in that the following preset conditions correspond to different interpolation filters, respectively:
the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the coding mode of the image block is an inter mode, and the component to be coded is a chrominance component.
10. The method according to any one of claims 2 to 5, characterized in that the following preset conditions correspond to different interpolation filters, respectively:
the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component or a chrominance component.
11. The method according to claim 9 or 10, wherein the preset condition in which the encoding mode of the image block is the inter mode and the component to be encoded is a luminance component further comprises: the size of the image block is larger than a preset value.
12. The method according to any one of claims 2 to 5,
the first preset condition includes: the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component;
the number of taps of the first filter is 4.
13. The method according to any one of claims 2 to 5,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the number of taps of the first filter is 4 or 6.
14. The method of claim 13, wherein the first preset condition further comprises: the size of the image block is smaller than or equal to a preset value.
15. The method according to any one of claims 2 to 5,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the number of taps of the first filter is 8.
16. The method of claim 15, wherein the first preset condition further comprises: the size of the image block is larger than a preset value.
17. The method according to any one of claims 2 to 5,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a chrominance component;
the number of taps of the first filter is 4.
18. The method according to any one of claims 2 to 5,
the first preset condition includes: the encoding mode of the image block is an Affine mode, and the component to be encoded is a chrominance component;
the number of taps of the first filter is 4.
19. The method according to any of claims 1 to 3, wherein the image block comprises a luminance component and a chrominance component;
and the luminance component and the chrominance component of the image block adopt the same interpolation mode to carry out motion estimation and/or motion compensation.
20. The method of claim 19,
when at least one of the following conditions is satisfied, the luminance component and the chrominance component of the image block are subjected to motion estimation and/or motion compensation in the same interpolation mode:
the code stream has a specific identification bit; the coding mode is an inter mode or an Affine mode; the size of the image block is larger than a preset value.
21. The method according to claim 19 or 20, wherein the motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block using the same interpolation method comprises:
the tap numbers of the interpolation filters used for motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block are the same; and/or,
the luminance component and the chrominance component of the image block have the same interpolation coefficient of the interpolation filter used for motion estimation and/or motion compensation.
22. The method according to any of claims 19 to 21, wherein the coding modes of the luminance component and the chrominance component of the image block are both inter modes.
23. The method according to any one of claims 19 to 21, wherein the coding modes of the luminance component and the chrominance component of the image block are both Affine modes.
24. The method according to any of the claims 19 to 23, wherein the number of taps of the interpolation filter used for motion estimation and/or motion compensation for the luminance component and the chrominance component of the image block is 4 for interpolating 1/16 pixels.
25. The method according to any of claims 19 to 24, wherein the method is used in an encoding side, the method further comprising:
writing an identification bit in the code stream, wherein the identification bit is used for indicating that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation mode.
26. The method according to any of claims 19 to 24, wherein the method is used at a decoding end, the method further comprising:
acquiring an identification bit in the code stream, wherein the identification bit is used for indicating that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation mode.
27. The method according to any one of claims 1 to 26, wherein the method is used at an encoding end; the method further comprises the following steps:
adding a first identification bit in the code stream, wherein the first identification bit is used for indicating that one interpolation filter is selected from a plurality of interpolation filters to be used for motion estimation and/or motion compensation.
28. The method according to any one of claims 1 to 26, wherein the method is used at a decoding end; the method further comprises the following steps:
acquiring a first identification bit in the code stream, wherein the first identification bit is used for indicating that one interpolation filter is selected from a plurality of interpolation filters to be used for motion estimation and/or motion compensation.
29. The method according to claim 27 or 28, wherein the first identification bit is used to indicate that a filter with one of a plurality of tap numbers is selected for motion estimation and/or motion compensation.
30. The method of claim 29, wherein the filters with the plurality of tap numbers include a first filter and a second filter, the first filter has a tap number of 8, and the second filter has a tap number of 6 or 4, or the tap number of the second filter is equal to the tap number of the filter for the chrominance component;
the first identification bit is used for indicating that the first filter or the second filter is selected.
31. The method according to any one of claims 27 to 30, wherein the method is used in an encoding side, the method further comprising:
adding a second identification bit in the code stream, wherein when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
32. The method according to any one of claims 27 to 30, wherein the method is used at a decoding end, the method further comprising:
acquiring a second identification bit in the code stream, wherein when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
33. The method according to any one of claims 27 to 32, wherein the first identification bit is carried in a sequence header, a frame header, or a Slice header.
34. The method according to claim 31 or 32, wherein the second identification bit is carried in a sequence header, a frame header, or a Slice header.
35. The method according to any of claims 31 to 34, wherein the second identification bit is a slice_type in the Slice header.
36. A video processing device comprising a processor configured to invoke code stored in a memory to perform the operations of:
performing motion estimation and/or motion compensation on an image block with multiple MVs of a target frame by using an interpolation filter among multiple interpolation filters.
37. The apparatus of claim 36, wherein different interpolation filters of the plurality of interpolation filters correspond to different preset conditions;
the performing motion estimation and/or motion compensation for the image block with multiple MVs of the target frame by using the interpolation filter of the multiple interpolation filters includes:
when a first preset condition corresponding to a first interpolation filter is met, performing motion estimation and/or motion compensation on the image block by using the first interpolation filter.
38. The apparatus of claim 37, wherein the preset conditions for different interpolation filters are different in at least one of:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
39. The apparatus of claim 36, wherein different predetermined conditions correspond to different interpolation filters of the plurality of interpolation filters;
utilizing an interpolation filter in a plurality of interpolation filters to perform motion estimation and/or motion compensation on an image block with multiple MVs of a target frame, comprising:
when a first preset condition among the plurality of preset conditions is met, performing motion estimation and/or motion compensation on the image block by using a first interpolation filter corresponding to the first preset condition.
40. The apparatus of claim 39, wherein the different preset conditions differ in at least one of:
the coding mode of the image block, the interval where the size of the image block is located, the component to be coded of the image block, and the number of MVs of the image block.
41. The apparatus according to claim 37 or 38, wherein the preset condition corresponding to the first interpolation filter includes at least two of the following:
the encoding mode of the image block is an inter mode, and the component to be encoded is a luminance component;
the encoding mode of the image block is an inter mode, and the component to be encoded is a chrominance component;
the encoding mode of the image block is an affine motion compensation prediction (Affine) mode, and the component to be encoded is a chrominance component;
the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component.
42. The apparatus of claim 41, wherein the preset condition in which the encoding mode of the image block is the inter mode and the component to be encoded is a luminance component further comprises: the size of the image block is smaller than or equal to a preset value.
43. The apparatus according to any one of claims 37 to 40, characterized in that the following preset conditions correspond respectively to different interpolation filters:
the encoding mode of the image block is an inter mode, the component to be encoded is a luminance component, and the size of the image block is smaller than or equal to a first preset value and larger than a second preset value;
the encoding mode of the image block is an inter mode, the component to be encoded is a luminance component, and the size of the image block is smaller than or equal to the second preset value.
44. The apparatus according to any one of claims 37 to 40, characterized in that the following preset conditions correspond respectively to different interpolation filters:
the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the coding mode of the image block is an inter mode, and the component to be coded is a chrominance component.
45. The apparatus according to any one of claims 37 to 40, characterized in that the following preset conditions correspond respectively to different interpolation filters:
the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component or a chrominance component.
46. The apparatus of claim 44 or 45, wherein the preset condition in which the encoding mode of the image block is the inter mode and the component to be encoded is a luminance component further comprises: the size of the image block is larger than a preset value.
47. The apparatus according to any one of claims 37 to 40,
the first preset condition includes: the encoding mode of the image block is an Affine mode, and the component to be encoded is a luminance component;
the number of taps of the first filter is 4.
48. The apparatus according to any one of claims 37 to 40,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the number of taps of the first filter is 4 or 6.
49. The apparatus of claim 48, wherein the first preset condition further comprises: the size of the image block is smaller than or equal to a preset value.
50. The apparatus according to any one of claims 37 to 40,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a luminance component;
the number of taps of the first filter is 8.
51. The apparatus of claim 50, wherein the first preset condition further comprises: the size of the image block is larger than a preset value.
52. The apparatus according to any one of claims 37 to 40,
the first preset condition includes: the coding mode of the image block is an inter mode, and the component to be coded is a chrominance component;
the number of taps of the first filter is 4.
53. The apparatus according to any one of claims 37 to 40,
the first preset condition includes: the encoding mode of the image block is an Affine mode, and the component to be encoded is a chrominance component;
the number of taps of the first filter is 4.
54. The apparatus according to any of claims 36 to 38, wherein the image block comprises a luminance component and a chrominance component;
and the luminance component and the chrominance component of the image block adopt the same interpolation mode to carry out motion estimation and/or motion compensation.
55. The apparatus of claim 54,
when at least one of the following conditions is satisfied, the luminance component and the chrominance component of the image block are subjected to motion estimation and/or motion compensation in the same interpolation mode:
the code stream has a specific identification bit; the coding mode is an inter mode or an Affine mode; the size of the image block is larger than a preset value.
56. The apparatus according to claim 54 or 55, wherein the motion estimation and/or motion compensation is performed on the luminance component and the chrominance component of the image block by using the same interpolation method, comprising:
the tap numbers of the interpolation filters used for motion estimation and/or motion compensation of the luminance component and the chrominance component of the image block are the same; and/or,
the luminance component and the chrominance component of the image block have the same interpolation coefficient of the interpolation filter used for motion estimation and/or motion compensation.
57. The apparatus according to any of claims 54 to 56, wherein the coding modes of the luminance component and the chrominance component of the image block are both inter modes.
58. The apparatus according to any of claims 54 to 56, wherein the coding modes of the luminance component and the chrominance component of the image block are both Affine modes.
59. The apparatus according to any of claims 54 to 58, wherein the number of taps of the interpolation filter used for motion estimation and/or motion compensation for the luminance component and the chrominance component of the image block is 4, for interpolating 1/16 pixels.
60. The apparatus according to any one of claims 54 to 59, wherein the apparatus is configured to be used on an encoding side, and wherein the processor is further configured to:
writing an identification bit in the code stream, wherein the identification bit is used for indicating that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation mode.
61. The apparatus according to any one of claims 54 to 59, wherein the apparatus is configured to be used on a decoding side, and wherein the processor is further configured to:
acquiring an identification bit in the code stream, wherein the identification bit is used for indicating that the luminance component and the chrominance component of the image block adopt the same motion estimation and/or motion compensation interpolation mode.
62. The apparatus according to any of claims 36 to 61, wherein the apparatus is for an encoding side; the processor is further configured to:
adding a first identification bit in the code stream, wherein the first identification bit is used for indicating that one interpolation filter is selected from a plurality of interpolation filters to be used for motion estimation and/or motion compensation.
63. The apparatus according to any one of claims 36 to 61, wherein the apparatus is for a decoding side; the processor is further configured to:
acquiring a first identification bit in the code stream, wherein the first identification bit is used for indicating that one interpolation filter is selected from a plurality of interpolation filters to be used for motion estimation and/or motion compensation.
64. The apparatus of claim 62 or 63, wherein the first flag is used to indicate that a filter with one of a plurality of tap numbers is selected for motion estimation and/or motion compensation.
65. The apparatus of claim 64, wherein the filters with the plurality of tap numbers comprise a first filter and a second filter, the first filter has a tap number of 8, and the second filter has a tap number of 6 or 4, or the tap number of the second filter is equal to the tap number of the filter for the chrominance component;
the first identification bit is used for indicating that the first filter or the second filter is selected.
66. The apparatus according to any one of claims 62 to 65, wherein the apparatus is configured to be used at an encoding end, and the processor is further configured to:
adding a second identification bit in the code stream, wherein when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
67. The apparatus according to any one of claims 62 to 65, wherein the apparatus is configured to be used on a decoding side, and the processor is further configured to:
acquiring a second identification bit in the code stream, wherein when the second identification bit indicates that the target frame is a B frame, the code stream has the first identification bit.
68. The apparatus according to any of claims 62 to 67, wherein the first identification bit is carried in a sequence header, a frame header, or a Slice header.
69. The apparatus according to claim 67 or 68, wherein the second identification bit is carried in a sequence header, a frame header, or a Slice header.
70. The device of claim 69, wherein the second identification bit is a slice_type in the Slice header.
CN201980009161.8A 2019-06-19 2019-06-19 Video processing method and device Pending CN111656782A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/091955 WO2020252707A1 (en) 2019-06-19 2019-06-19 Video processing method and device

Publications (1)

Publication Number Publication Date
CN111656782A true CN111656782A (en) 2020-09-11

Family

ID=72345887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980009161.8A Pending CN111656782A (en) 2019-06-19 2019-06-19 Video processing method and device

Country Status (2)

Country Link
CN (1) CN111656782A (en)
WO (1) WO2020252707A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1666429A (en) * 2002-07-09 2005-09-07 诺基亚有限公司 Method and system for selecting interpolation filter type in video coding
WO2010087620A2 (en) * 2009-01-28 2010-08-05 삼성전자 주식회사 Method and apparatus for encoding and decoding images by adaptively using an interpolation filter
CN101841701A (en) * 2009-03-20 2010-09-22 华为技术有限公司 Encoding and decoding method and device based on macroblock pair
WO2012125452A1 (en) * 2011-03-16 2012-09-20 General Instrument Corporation Interpolation filter selection using prediction unit (pu) size
CN108702509A (en) * 2016-02-25 2018-10-23 株式会社Kt Method and apparatus for handling vision signal
CN109845265A (en) * 2016-10-19 2019-06-04 数字洞察力有限公司 Use the method for video coding and device of adaptive interpolation filters

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113242427A (en) * 2021-04-14 2021-08-10 中南大学 Rapid method and device based on adaptive motion vector precision in VVC (variable valve timing)
CN113242427B (en) * 2021-04-14 2024-03-12 中南大学 Rapid method and device based on adaptive motion vector precision in VVC

Also Published As

Publication number Publication date
WO2020252707A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
US11563968B2 (en) Video encoding and decoding
CN108781284B (en) Method and device for video coding and decoding with affine motion compensation
KR101991074B1 (en) Video encoding and decoding
US8094714B2 (en) Speculative start point selection for motion estimation iterative search
US20180035128A1 (en) Reducing computational complexity when video encoding uses bi-predictively encoded frames
US8144766B2 (en) Simple next search position selection for motion estimation iterative search
US20200236398A1 (en) Motion vector refinement of a motion vector pointing to a fractional sample position
US20170188024A9 (en) Adaptive partition subset selection module and method for use therewith
JP7269371B2 (en) Method and Apparatus for Prediction Improvement Using Optical Flow
US9420308B2 (en) Scaled motion search section with parallel processing and method for use therewith
CN111656782A (en) Video processing method and device
CN112154666A (en) Video coding and decoding method and device
CN110337810B (en) Method and apparatus for video processing
CN111247804B (en) Image processing method and device
CN111226439A (en) Video processing method and device
CN111226440A (en) Video processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200911)