WO2021056212A1 - Method and apparatus for video encoding and decoding - Google Patents
- Publication number
- WO2021056212A1 (PCT/CN2019/107598)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- image block
- interpolation filter
- interpolation
- image
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
Definitions
- This application relates to the field of image processing, and more specifically, to a video coding and decoding method and device.
- Prediction is an important module of the mainstream video coding framework.
- Prediction can include intra-frame prediction and inter-frame prediction.
- the inter prediction modes may include Advanced Motion Vector Prediction (AMVP) mode, Merge mode, and Skip mode.
- For the Merge mode, the MVP can be determined from the motion vector predictor (MVP) candidate list and used directly as the MV, and the index of the MVP can be transmitted to the decoder in the code stream for use in decoding on the decoder side.
- In Skip mode, only the index of the MVP needs to be passed; there is no need to pass the MVD, and there is no need to pass the residual.
- In some embodiments, the motion information of an encoded or decoded block is used to update the motion vector predictor candidate list of the next block to be encoded or decoded.
- When pixel interpolation is performed on the reference block of the current image block or coding block, it often depends on reading the MVP in the motion vector predictor candidate list and its corresponding interpolation filter, which creates a strong dependence between the motion vectors used by adjacent image blocks and their corresponding interpolation filters. Therefore, the encoding and decoding methods in the prior art result in reduced encoding and decoding efficiency and performance loss in the encoding and decoding device.
- the embodiments of the present application provide a video coding and decoding method and device, which can avoid the current image block from relying too much on interpolation filters used by neighboring blocks, and improve coding and decoding efficiency.
- In a first aspect, a video encoding and decoding method is provided, which includes: when pixel interpolation is performed on an image block of a current image, one of at least two interpolation filters can be used for pixel interpolation, and the current image includes a first image block and a second image block; a first interpolation filter is used to perform pixel interpolation on the reference block of the first image block; and a default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, where the first image block is an adjacent block of the second image block.
- In a second aspect, a video encoding and decoding device is provided, which includes: a memory, configured to store executable instructions; and a processor, configured to execute the instructions stored in the memory so as to perform the operations in the method of the first aspect described above.
- In a third aspect, a video codec is provided, which includes the video encoding and decoding device of the second aspect described above and a body, where the encoding and decoding device is installed on the body.
- In a fourth aspect, a computer-readable storage medium is provided that stores program instructions, and the program instructions can be used to instruct performance of the method of the first aspect.
- In the above solutions, the first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, and the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, where the first image block is an adjacent block of the second image block. Because the default interpolation filter is used during interpolation of the reference block of the second image block, the second image block does not inherit or read the interpolation filter used by the first image block, reducing its dependence on the coding and decoding of the first image block. Therefore, in the coding and decoding process, the coding and decoding efficiency is improved, and the performance of the coding and decoding device is improved.
- Fig. 1 is a structural diagram of a technical solution according to an embodiment of the present application.
- Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application.
- Fig. 3 is a schematic diagram of adjacent blocks of an image block according to an embodiment of the present application.
- Fig. 4 is a schematic flowchart of a video encoding and decoding method according to an embodiment of the present application.
- Fig. 5 is a schematic block diagram of a video encoding and decoding device according to an embodiment of the present application.
- Fig. 1 is a structural diagram of a technical solution applying an embodiment of the present application.
- the system 100 can receive the data 102 to be processed, process the data 102 to be processed, and generate processed data 108.
- the system 100 may receive the data to be encoded and encode the data to be encoded to generate encoded data, or the system 100 may receive the data to be decoded and decode the data to be decoded to generate decoded data.
- the components in the system 100 may be implemented by one or more processors.
- the processor may be a processor in a computing device or a processor in a mobile device (such as a drone).
- the processor may be any type of processor, which is not limited in the embodiment of the present application.
- the processor may include an encoder, a decoder, or a codec, etc.
- the system 100 may also include one or more memories.
- the memory can be used to store instructions and data, for example, computer-executable instructions that implement the technical solutions of the embodiments of the present application, to-be-processed data 102, processed data 108, and so on.
- the memory can be any type of memory, which is not limited in the embodiment of the present application.
- the data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded.
- The data to be encoded may include sensor data from sensors, such as vision sensors (for example, cameras or infrared sensors), microphones, near-field sensors (for example, ultrasonic sensors or radars), position sensors, temperature sensors, touch sensors, and so on.
- the data to be encoded may include information from the user, for example, biological information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA sampling, and the like.
- Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application.
- As shown in FIG. 2, after the video to be encoded is received, each frame in the video to be encoded is encoded in turn, starting from the first frame.
- the current coded frame mainly undergoes processing such as prediction (Prediction), transformation (Transform), quantization (Quantization), and entropy coding (Entropy Coding), and finally the bit stream of the current coded frame is output.
- The decoding process usually decodes the received bitstream according to the inverse of the above process to recover the video frame information before encoding.
- the video encoding framework 2 includes an encoding control module 201, which is used to perform decision-making control actions and parameter selection in the encoding process.
- The encoding control module 201 controls the parameters used in transformation, quantization, inverse quantization, and inverse transformation; controls the selection of intra or inter mode; and controls the parameters of motion estimation and filtering.
- The control parameters of the encoding control module 201 are also input to the entropy encoding module and encoded to form a part of the encoded bitstream.
- The frame to be encoded undergoes block partitioning 202; specifically, it is first divided into slices, and then divided into blocks.
- Specifically, the frame is divided into a plurality of non-overlapping largest coding tree units (Coding Tree Units, CTUs), and each CTU can be iteratively divided into a series of smaller coding units (Coding Unit, CU) in a quadtree, binary tree, or ternary tree manner.
- the CU may also include a prediction unit (Prediction Unit, PU) and a transformation unit (Transform Unit, TU) associated with it.
- The PU is the basic unit of prediction
- TU is the basic unit of transformation and quantization.
- The PU and TU are respectively obtained by dividing the CU into one or more blocks, where one PU may include multiple prediction blocks (PB) and related syntax elements.
- the PU and TU may be the same, or they may be obtained by the CU through different division methods.
- at least two of the CU, PU, and TU are the same.
- CU, PU, and TU are not distinguished, and prediction, quantization, and transformation are all performed in units of CU.
- the CTU, CU, or other formed data units are all referred to as coding blocks in the following.
- the data unit for video encoding may be a frame, a slice, a coding tree unit, a coding unit, a coding block, or any group of the above.
- the size of the data unit can vary.
- a prediction process is performed to remove the spatial and temporal redundant information of the current coded frame.
- predictive coding methods include intra-frame prediction and inter-frame prediction.
- Intra-frame prediction uses only the reconstructed information in the current frame to predict the current coding block
- inter-frame prediction uses the information in other previously reconstructed frames (also called reference frames) to predict the current coding block.
- Specifically, in this embodiment of the present application, the encoding control module 201 is used to decide whether to select intra-frame prediction or inter-frame prediction.
- The process of intra-frame prediction 203 includes: obtaining the reconstructed block of a coded neighboring block around the current coding block as a reference block; calculating a predicted value based on the pixel values of the reference block using a prediction mode method to generate a prediction block; and subtracting the corresponding pixel values of the current coding block and the prediction block to obtain the residual of the current coding block. The residual of the current coding block is transformed 204, quantized 205, and entropy coded 210 to form the code stream of the current coding block. Further, after all coding blocks of the current frame undergo the above coding process, they form a part of the coded stream of the frame. In addition, the control and reference data generated in intra-frame prediction 203 are also encoded by entropy coding 210 to form a part of the encoded bitstream.
- the transform 204 is used to remove the correlation of the residual of the image block, so as to improve the coding efficiency.
- The transformation of the residual data of the current coding block usually adopts a two-dimensional discrete cosine transform (DCT) or a two-dimensional discrete sine transform (DST): the residual information of the coding block is multiplied by an N×M transformation matrix and its transposed matrix, and the transform coefficients of the current coding block are obtained after the multiplication.
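- The matrix form of the transform described above can be sketched as follows. This is an illustrative pure-Python example using an orthonormal DCT-II basis; it is not the exact integer transform of any codec standard, and all function names are assumptions.

```python
import math

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix T: entry [k][i] for row k, column i.
    return [[math.sqrt((1 if k == 0 else 2) / n) *
             math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def transform_2d(residual):
    # Coefficients = T * residual * T^T, i.e. the residual block multiplied
    # by the transformation matrix and its transposed matrix.
    t = dct_matrix(len(residual))
    return matmul(matmul(t, residual), transpose(t))
```

For a flat residual block, all the energy concentrates in the DC coefficient, which is why the transform removes correlation so effectively.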
- quantization 205 is used to further improve the compression efficiency.
- The transform coefficients can be quantized to obtain quantized coefficients, and then the quantized coefficients are entropy coded 210 to obtain the residual code stream of the current coding block. The entropy coding may be, but is not limited to, context adaptive binary arithmetic coding (Context Adaptive Binary Arithmetic Coding, CABAC).
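- A minimal sketch of uniform scalar quantization and inverse quantization is given below; the function names and the quantization step are illustrative only and do not correspond to any standardized quantization scheme.

```python
def quantize(coeffs, qstep):
    # Uniform scalar quantization: divide by the step and round to the
    # nearest integer level. Larger qstep -> stronger compression.
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    # Inverse quantization 206-style step: recover approximate coefficients.
    return [l * qstep for l in levels]
```

Note that quantization is lossy: dequantization recovers only an approximation of the original coefficients.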
- The coded neighboring block in the intra prediction 203 process is a neighboring block that has been coded before the current coding block. The residual generated in the coding of this neighboring block is transformed 204, quantized 205, inverse quantized 206, and inverse transformed 207, and then added to the prediction block of the neighboring block to obtain the reconstructed block.
- The inverse quantization 206 and the inverse transformation 207 are the inverse processes of the quantization 205 and the transformation 204, and are used to restore the residual data before quantization and transformation.
- the inter-frame prediction process includes motion estimation (ME) 208 and motion compensation (MC) 209.
- Motion estimation 208 is performed according to reference frame images among the reconstructed video frames: the image block most similar to the current coding block is searched for in one or more reference frame images according to a certain matching criterion and serves as the matching block.
- The relative displacement between the matching block and the current coding block is the motion vector (Motion Vector, MV) of the current coding block.
- the original value of the pixel of the coding block is subtracted from the pixel value of the corresponding prediction block to obtain the residual of the coding block.
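- The block-matching search of motion estimation 208 described above can be sketched as follows, assuming a sum-of-absolute-differences (SAD) matching criterion, an exhaustive search window, and integer-pel positions; all names and the (dx, dy) MV convention are illustrative.

```python
def sad(block_a, block_b):
    # Sum of absolute differences: a common block-matching criterion.
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def motion_search(cur, ref, bx, by, bs, rng):
    # Exhaustively test every displacement (dx, dy) within +/-rng around
    # the current block at (bx, by); return the best MV found.
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
                continue  # candidate block falls outside the reference frame
            cand = [row[x:x + bs] for row in ref[y:y + bs]]
            cost = sad(cur_blk, cand)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]
```

Real encoders replace the exhaustive scan with fast search patterns, but the matching criterion and the notion of "most similar block" are the same.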
- the residual of the current coding block is transformed 204, quantized 205, and entropy coding 210 to form a part of the code stream of the coded frame.
- the control and reference data generated in the motion compensation 209 are also encoded by the entropy encoding 210 to form a part of the encoded bitstream.
- the reconstructed video frame is a video frame obtained after filtering 211.
- Filtering 211 is used to reduce compression distortions such as blocking effects and ringing effects generated in the encoding process.
- the reconstructed video frame is used to provide reference frames for inter-frame prediction during the encoding process.
- The reconstructed video frame is also output after post-processing as the final decoded video.
- the inter prediction modes in the video coding standard may include AMVP mode, Merge mode and Skip mode.
- For the AMVP mode, the MVP can be determined first. After the MVP is obtained, the starting point of motion estimation can be determined according to the MVP, and a motion search is performed near the starting point. After the search is completed, the optimal MV is obtained.
- The MV determines the position of the reference block in the reference image; the reference block is subtracted from the current block to obtain the residual block, the MVP is subtracted from the MV to obtain the Motion Vector Difference (MVD), and the MVD is transmitted to the decoder through the code stream.
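- The MVD relationship described above can be sketched as follows, assuming integer motion vectors; the function names are illustrative only.

```python
def encode_mvd(mv, mvp):
    # Encoder side: only the difference MVD = MV - MVP is signaled.
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def decode_mv(mvd, mvp):
    # Decoder side: reconstruct MV = MVD + MVP from the transmitted MVD
    # and the locally derived predictor.
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])
```

Because the MVP is usually close to the optimal MV, the MVD is small and cheap to entropy code.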
- For the Merge mode, the MVP can be determined first and directly used as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) can first be built; the MVP candidate list includes at least one candidate MVP, and each candidate MVP can correspond to an index. After selecting an MVP from the MVP candidate list, the encoding end writes the MVP index into the code stream, and the decoder finds the MVP corresponding to the index in the MVP candidate list according to the index, so as to decode the image block.
- the MVP candidate list may include temporal candidate motion vectors, spatial candidate motion vectors, pairwise motion vectors, or zero motion vectors.
- the pairwise motion vector can be obtained by averaging or weighted averaging based on the existing motion vectors in the candidate list.
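- Forming a pairwise candidate by averaging or weighted-averaging two existing candidates can be sketched as below, assuming integer motion-vector units; the rounding convention shown is an assumption.

```python
def pairwise_candidate(mv_a, mv_b, w_a=0.5, w_b=0.5):
    # Average (or weighted-average) two existing candidate MVs,
    # rounding each component to the nearest integer MV unit.
    return (round(w_a * mv_a[0] + w_b * mv_b[0]),
            round(w_a * mv_a[1] + w_b * mv_b[1]))
```

With equal weights this is the plain average of the two candidates already in the list.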
- The spatial candidate motion vectors are obtained from the positions of the gray boxes 1 to 5 in Figure 3, and the temporal candidate motion vector is obtained from the co-located CU in a coded image adjacent to the current CU.
- The temporal candidate motion vector cannot be directly used as a candidate; the motion information of the co-located block needs to be scaled according to the positional relationship of the reference images. The specific scaling method will not be repeated here.
- The Skip mode is a special merge mode in which only the index of the MVP needs to be transmitted; there is no need to transmit MVD information, and there is also no need to transmit the residual.
- That is, Skip mode is a special case of Merge mode. After the MV is obtained according to the Merge mode, if the encoder determines that the current block is basically the same as the reference block, there is no need to transmit residual data; only the index of the MV needs to be transmitted, and further a flag can be transmitted which indicates that the current block can be obtained directly from the reference block.
- the motion vector of the object between two adjacent frames may not be exactly an integer number of pixel units.
- a motion vector with 1/4 pixel accuracy is used for the motion estimation of the luminance component in HEVC.
- The values of these fractional pixels must be obtained by approximate interpolation; that is, the reference frame is interpolated K-fold in the row direction and the column direction, and the search is performed in the interpolated image.
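- The K-fold interpolation described above can be sketched with simple linear interpolation standing in for a real interpolation filter; this is illustrative only, and actual codecs use multi-tap filters as discussed below.

```python
def interpolate_row(pixels, k):
    # Insert k-1 interpolated samples between each pair of integer pixels,
    # using linear interpolation as a stand-in for a real filter.
    out = []
    for a, b in zip(pixels, pixels[1:]):
        for j in range(k):
            out.append(a + (b - a) * j / k)
    out.append(pixels[-1])
    return out
```

Applying this to every row and then every column yields the K-fold interpolated image in which the fractional-pel motion search runs.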
- The common AMVP mode sets four types of adaptive motion vector resolution (Advanced Motion Vector Resolution, AMVR) accuracy: integer pixel, 4 pixel, 1/4 pixel, and 1/2 pixel. It should be noted that these AMVR accuracies are only an example, and the embodiments of this application do not specifically limit the value of pixel accuracy; for example, there may also be 1/8 pixel, 1/16 pixel, and so on. Here, integer pixel, 4 pixel, 1/4 pixel, and 1/2 pixel are taken as an example.
- The corresponding MV accuracy (integer pixel, 4 pixel, 1/4 pixel, or 1/2 pixel) is adaptively decided at the encoding end, and the result of the decision is written into the code stream and passed to the decoding end.
- The number of taps of the interpolation filters corresponding to different AMVR accuracies may differ. For example, for 1/4 pixel accuracy, an eight-tap interpolation filter is used, and for 1/2 pixel accuracy, a Gaussian interpolation filter (a six-tap interpolation filter) is used. Because different interpolation filters are used, when storing the motion vector, the interpolation filter currently used by the CU also needs to be stored. As an example, the interpolation filter can be represented by 1 bit: when the Gaussian filter (six-tap interpolation filter) is used, it is stored as 1, and when the Gaussian filter is not used, it is stored as 0.
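- The 1-bit identification scheme above could be stored compactly as sketched below; the byte-packing layout is an assumption for illustration, not mandated by the text.

```python
def pack_filter_flags(flags):
    # Pack one bit per CU (1 = Gaussian six-tap filter used, 0 = not used)
    # into a byte string, least-significant bit first within each byte.
    out = bytearray((len(flags) + 7) // 8)
    for i, f in enumerate(flags):
        if f:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_filter_flag(packed, i):
    # Recover the 1-bit identification for CU index i.
    return (packed[i // 8] >> (i % 8)) & 1
```

This makes the per-CU storage cost of the filter identification concrete: one bit per stored MV.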
- the interpolation filter used by the current CU needs to be determined according to the identification bit of the interpolation filter.
- the interpolation filter used by the current CU can be understood as an interpolation filter used when performing pixel interpolation on the reference block of the current CU.
- the interpolation filter used by the image block mentioned below can be understood as the interpolation filter used when performing pixel interpolation on the reference block of the image block.
- Because the MV information of spatial neighboring blocks and temporal neighboring blocks needs to be used in the process of constructing the MVP candidate list, after each CU is encoded, the MV finally used needs to be stored for subsequent MV reference. The stored MV information includes the value of the MV, the index of the reference frame, the prediction mode of the current CU, and so on.
- When the current block performs motion compensation, it completely inherits the MV of the neighboring block and its corresponding interpolation filter. Therefore, after each CU completes encoding, the interpolation filter used by the current CU also needs to be stored.
- the specific storage method may also be a 1-bit identification bit for storage, and the storage method may refer to the foregoing description, which will not be repeated here.
- For example, suppose the optimal MV selected by the current block from the motion vector candidate list is a temporal MV, denoted MV0, and the identification bit of the interpolation filter corresponding to MV0 is 1, meaning that the temporal neighboring block of the current block used a Gaussian filter to perform pixel interpolation on its reference block. Then, after the current block uses MV0 to find its reference block in the reference frame, it also uses a Gaussian filter, that is, a 6-tap interpolation filter, to perform pixel interpolation on its reference block. After the pixels are interpolated, the prediction block of the current block is obtained.
- the residual block can be obtained by subtracting the current block from the reference block.
- the following solutions provided by the embodiments of the present application can reduce its dependence on the encoding and decoding process of adjacent blocks, thereby improving the encoding and decoding efficiency during the encoding and decoding process, and improving the performance of the encoding and decoding device.
- storage costs can be saved.
- It should be noted that the pixel interpolation of an image block referred to in this application means pixel interpolation of the reference block of the image block, and the interpolation filter used in the process is the interpolation filter used to perform pixel interpolation on the reference block of the image block.
- FIG. 4 is a schematic flowchart of a video encoding and decoding method 400 according to an embodiment of the present application.
- The method 400 includes at least part of the following content. The method 400 can be used on the encoding side and can also be used on the decoding side.
- one of at least two interpolation filters may be used for pixel interpolation, and the current image includes a first image block and a second image block;
- the first interpolation filter is used to perform pixel interpolation on the reference block of the first image block; the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, and the first image block is the phase of the second image block. Adjacent block.
- interpolation filter used by the image block mentioned in this application can be understood as the interpolation filter adopted/used when performing pixel interpolation on the reference block of the image block.
- For the second image block, the interpolation filter of the neighboring block is not inherited; instead, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- the image block in the current image has at least two kinds of interpolation filters to choose from when performing pixel interpolation, for example, including at least a 6-tap interpolation filter and an 8-tap interpolation filter.
- different pixel precisions may also correspond to interpolation filters with different taps.
- Taking the AMVR-precision interpolation filters as an example: for 1/4 pixel precision, an eight-tap interpolation filter is used.
- For 1/2 pixel precision, a Gaussian interpolation filter (a six-tap interpolation filter) is used.
- the foregoing corresponding relationship is only an exemplary representation, and does not constitute a limitation to the present application.
- In some embodiments, the interpolation filter used by the current CU is not stored, and the identification bit is not set.
- When a subsequent CU refers to the previously coded CU, it is not necessary to determine the interpolation filter used by the current CU according to the identification bit of the interpolation filter; instead, the default interpolation filter is used directly.
- the pixel accuracy used for pixel interpolation of the reference block of the second image block is at the sub-pixel level.
- the pixel precision used for pixel interpolation on the reference block of the second image block is 1/2 pixel precision.
- The 1/2 pixel precision used for pixel interpolation on the reference block of the second image block is for illustrative purposes only; other pixel precisions may be used in other embodiments of this application, such as 1/4 pixel precision, 1/8 pixel precision, and so on.
- the embodiments shown in this application can also be applied to pixel interpolation with integer pixel accuracy.
- the first image block is a temporal neighboring block of the second image block.
- the first image block is located on the reference frame
- the second image block is located on the current frame
- the correlation prediction between the first image block and the second image block is inter prediction
- The direction of inter prediction can be forward prediction, backward prediction, bidirectional prediction, and so on.
- Forward prediction uses the previous reconstructed frame ("historical frame") to predict the current frame.
- Backward prediction is to use frames after the current frame (“future frame”) to predict the current frame.
- Bidirectional prediction uses not only "historical frames” but also "future frames” to predict the current frame. This application is not limited to any one of the above three prediction methods.
- The temporal candidate is obtained from the co-located CU in an adjacent coded image of the current CU. The motion information of the candidate block cannot be used directly; it needs to be scaled according to the positional relationship of the reference images. The specific scaling method will not be repeated here.
- the bidirectional prediction mode is one of the dual motion vector modes.
- the dual motion vector mode includes dual forward prediction mode, dual backward prediction mode and bidirectional prediction mode.
- the dual forward prediction mode includes two forward motion vectors
- the dual backward prediction mode includes two backward motion vectors.
- the bidirectional prediction mode includes a forward prediction mode and a backward prediction mode.
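- Bidirectional prediction combines a forward prediction block and a backward prediction block; a common combination is a rounded average, sketched below as an assumption (real codecs may also use unequal weights).

```python
def bidir_predict(fwd_block, bwd_block):
    # Combine the forward ("historical frame") and backward ("future
    # frame") predictions by a rounded per-pixel average.
    return [[(f + b + 1) >> 1 for f, b in zip(fr, br)]
            for fr, br in zip(fwd_block, bwd_block)]
```

The same averaging idea applies in the dual-forward and dual-backward cases, just with both references on the same side of the current frame.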
- the first image block is a spatial neighboring block of the second image block.
- the first image block and the second image block are both located on the current frame.
- The spatial candidate list in the merge mode is obtained from the positions of boxes 1 to 5 in the figure.
- the first interpolation filter used by the first image block is a default interpolation filter or a non-default interpolation filter.
- the first image block may be merge mode, AMVP mode or Skip mode.
- the first interpolation filter is the default interpolation filter.
- In some embodiments, the first interpolation filter is the interpolation filter actually selected and determined. Exemplarily, if the pixel accuracy actually selected in AMVR is 1/2 pixel accuracy, a 6-tap interpolation filter is used, and if the actually selected pixel accuracy is 1/4 pixel accuracy, an 8-tap interpolation filter is used. When other pixel accuracies are determined, other interpolation filters can be selected accordingly.
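- The correspondence between AMVR precision and filter taps can be sketched as a simple lookup, following the 1/2-pel to 6-tap and 1/4-pel to 8-tap example in the text; the fallback to an 8-tap default for unlisted precisions is an assumption for illustration.

```python
# Hypothetical mapping from AMVR precision to interpolation-filter taps.
FILTER_TAPS = {"1/2": 6, "1/4": 8}

def select_filter(precision):
    # Fall back to an assumed 8-tap default for precisions not listed.
    return FILTER_TAPS.get(precision, 8)
```

Under the proposed scheme, a block using the default filter can call such a lookup directly instead of reading a stored identification bit from a neighbor.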
- the default interpolation filter is an interpolation filter with a default number of taps.
- the interpolation filter with a default number of taps includes a 6-tap interpolation filter or an 8-tap interpolation filter.
- the 6-tap and 8-tap in the embodiment of the present application are only used as an example, and do not constitute a limitation on the default interpolation filter.
- the default interpolation filter is an interpolation filter with a default weight value.
- For the understanding of the weight value, an explanation is given below. Taking a 6-tap interpolation filter with 1/2 pixel accuracy as an example, one sub-pixel needs to be interpolated between every two whole pixels in the reference block, and the sub-pixel is at the 1/2 pixel position. Since there is no pixel value at the 1/2 pixel position, the pixel values of the three integer pixels on the left and the three integer pixels on the right of the 1/2 pixel position need to be used to calculate the pixel value of the 1/2 pixel position.
- the weight value refers to the value placed in front of each of A0 to A6, which represents the weight that the corresponding pixel contributes to the final calculation result. It can be seen that, for different pixels, the weight value has been determined as a specific value, which can be determined by a setting or by a default value.
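The weighted-sum interpolation described above can be illustrated with a minimal sketch; the 6-tap weight values below are illustrative (the classic [1, -5, 20, 20, -5, 1]/32 half-pel filter), not necessarily the values used by the embodiment:

```python
def interpolate_half_pel(pixels, weights):
    """Half-pel sample as a weighted sum of the neighbouring integer pixels."""
    acc = sum(p * w for p, w in zip(pixels, weights))
    return acc / sum(weights)   # real codecs use integer rounding and a shift

# Illustrative 6-tap half-pel weights; the embodiment's actual values may differ.
half_pel_weights = [1, -5, 20, 20, -5, 1]   # sums to 32
sample = interpolate_half_pel([8, 9, 10, 12, 13, 14], half_pel_weights)
# sample == 11.0, between the two centre pixels 10 and 12
```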
- the first image block may adopt Merge prediction mode or Skip mode.
- before using the default interpolation filter to perform pixel interpolation on the reference block of the second image block, the method further includes: obtaining a motion vector candidate list; selecting a motion vector from the motion vector candidate list; and determining the reference block of the second image block from the reference frame according to the motion vector. After using the default interpolation filter to perform pixel interpolation on the reference block of the second image block, the method further includes: determining a residual according to the pixel-interpolated reference block and the second image block.
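The sequence of steps above (obtain a candidate list, select an MV, locate the reference block, interpolate with the default filter, form the residual) can be sketched as follows; all names are illustrative, and picking the first candidate stands in for the real selection rule:

```python
def predict_and_residual(current_block, mv_candidates, get_reference_block,
                         default_filter):
    mv = mv_candidates[0]                 # select an MV from the candidate list
    ref = get_reference_block(mv)         # reference block located via the MV
    pred = default_filter(ref)            # pixel interpolation, default filter
    return [c - p for c, p in zip(current_block, pred)]   # residual
```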
- the identifier of the first interpolation filter used by the reference block of the first image block is not stored.
- the identifier of the default interpolation filter used by the reference block of the second image block is not stored.
- the foregoing encoding and decoding method can reduce the dependence on the encoding and decoding process of adjacent blocks, thereby improving encoding and decoding efficiency and improving the performance of the encoding and decoding device.
- when the type of interpolation filter is not stored, hardware storage resources can be saved.
- the above embodiment is described for the case in which no block stores the type of the interpolation filter.
- the corresponding interpolation filter may be stored only for the spatial MV, while no corresponding interpolation filter is stored for the time-domain MV.
- the identifier of the first interpolation filter used by the reference block of the first image block is stored in the space domain.
- the spatial storage shown in this application refers to storing the identification bit of the interpolation filter used by the reference block of the current block in the buffer of the spatial MV information.
- the type of interpolation filter corresponding to the spatial MV can be directly read from the buffer.
- the identification of the first interpolation filter used by the reference block of the first image block is not stored in the time domain.
- the time-domain storage shown in this application means that the identification bit of the interpolation filter used by the reference block of the current block is not stored in the buffer of the time-domain MV information.
- the corresponding interpolation filter can be stored only for the spatial MV, while the corresponding interpolation filter is not stored for the time domain MV, which can also relieve part of the storage pressure.
- when a subsequent CU performs a motion search, if the optimal MV selected from the motion vector candidate list is a time-domain MV, the default interpolation filter is directly used to perform pixel interpolation on its reference block; if the optimal MV selected from the motion vector candidate list is a spatial MV, the interpolation filter corresponding to the spatial MV is still used to perform pixel interpolation on the reference block.
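The selection rule above (a temporal MV falls back to the default filter, a spatial MV reuses its stored filter) can be sketched as follows; the filter identifiers are assumptions for illustration:

```python
DEFAULT_FILTER_ID = 0   # assumed identifier for the default filter

def filter_id_for_best_mv(mv_is_temporal, stored_spatial_filter_id):
    """Temporal MV: no filter id was stored, so fall back to the default.
    Spatial MV: reuse the interpolation filter stored with that MV."""
    if mv_is_temporal:
        return DEFAULT_FILTER_ID
    return stored_spatial_filter_id
```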
- the corresponding interpolation filter is not stored.
- when a subsequent CU performs a motion search, it directly uses the default interpolation filter to perform pixel interpolation on its reference block. This reduces the storage overhead of the hardware, saves processing, improves coding and decoding efficiency, and improves the performance of the coding and decoding device.
- only the corresponding interpolation filter is stored for the time-domain MV, and no corresponding interpolation filter is stored for the spatial MV. Since storing this information also occupies storage space and increases storage pressure, storing the corresponding interpolation filter only for the time-domain MV, and not for the spatial MV, can likewise relieve part of the storage pressure.
- when a subsequent CU performs a motion search, if the optimal MV selected from the motion vector candidate list is a spatial MV, the default interpolation filter is directly used to perform pixel interpolation on its reference block; if the optimal MV selected from the motion vector candidate list is a time-domain MV, the interpolation filter corresponding to the time-domain MV is still used to perform pixel interpolation on the reference block.
- the motion vector candidate list does not include the identifier of the first interpolation filter used by the reference block of the first image block.
- the default interpolation filter is directly used, which can save storage space and ensure coding performance.
- the motion vector candidate list includes one or more of spatial candidate motion vectors, temporal candidate motion vectors, candidate motion vectors based on historical information, and paired candidate motion vectors, where the paired candidate motion vectors are determined from one or more of the spatial candidate motion vectors, the temporal candidate motion vectors, or the candidate motion vectors based on historical information.
- the paired candidate motion vector is determined based on the mean/weighted mean of the spatial candidate motion vector and/or the temporal candidate motion vector.
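As a minimal illustration of the paired candidate described above, built from the mean of two candidate motion vectors (the simple component-wise mean is one of the options named; the weighted variant would use unequal weights):

```python
def paired_candidate(mv_a, mv_b):
    """Paired candidate as the component-wise mean of two candidate MVs."""
    return ((mv_a[0] + mv_b[0]) / 2, (mv_a[1] + mv_b[1]) / 2)
```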
- the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- the embodiments shown in this application can work in luma mode.
- the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- the embodiments shown in this application can work in chroma mode.
- the solution of the embodiments of the present application can reduce the dependence on the encoding and decoding process of adjacent blocks, thereby improving encoding and decoding efficiency and the performance of the encoding and decoding device, so that it can be used for image blocks. It should be understood that the embodiments of the present application can also be used in other encoding and decoding scenarios; that is, reducing the storage pressure of the system can also serve other purposes.
- FIG. 5 shows a schematic block diagram of a video encoding and decoding apparatus 500 according to an embodiment of the present application.
- the video encoding and decoding apparatus 500 may include a processor 510, and may further include a memory 520.
- the video encoding and decoding apparatus 500 may also include components commonly found in other video encoding and decoding apparatuses, such as input/output devices and communication interfaces, which are not limited in the embodiments of the present application.
- the memory 520 is used to store computer-executable instructions.
- the memory 520 may be various types of memory; for example, it may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The embodiments of the present application do not limit this.
- the processor 510 is configured to access the memory 520 and execute the computer-executable instructions to perform operations in the method for video processing in the foregoing embodiment of the present application.
- the processor 510 may include a microprocessor, a field-programmable gate array (Field-Programmable Gate Array, FPGA), a central processing unit (CPU), a graphics processor (Graphics Processing Unit, GPU), etc.
- the device for video processing and the computer system in the embodiments of the application may correspond to the execution body of the method for video processing in the embodiments of the application, and the above and other operations and/or functions of the device and the computer system are used to implement the corresponding procedures of the foregoing methods; for brevity, they are not repeated here.
- the video processor can implement the corresponding operations implemented by the codec device in the above method embodiments.
- the video encoder may further include a body on which the encoder device is installed.
- the body includes at least one of a mobile phone, a camera, or a drone.
- the embodiments of the present application also provide a computer-readable storage medium, which stores program instructions that may be used to instruct execution of the video encoding and decoding method of the above embodiments of the present application.
- the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist.
- for example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone.
- the character "/" in this text generally indicates that the associated objects before and after are in an "or" relationship.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program instructions.
Abstract
A method and apparatus for video encoding and decoding, comprising: when performing pixel interpolation on an image block of a current image, pixel interpolation can be performed by using one type from among at least two types of interpolation filters, the current image comprising a first image block and a second image block; pixel interpolation is performed on a reference block of the first image block by using a first interpolation filter; and pixel interpolation is performed on a reference block of the second image block by using a default interpolation filter, the first image block being a neighboring block of the second image block. The described method and apparatus for encoding and decoding prevent a current image block from overly depending on an interpolation filter used by a neighboring block, thereby improving encoding and decoding efficiency in an encoding and decoding process, and improving the performance of an encoding and decoding apparatus.
Description
Copyright statement
The content disclosed in this patent document contains material subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner does not object to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and archives of the Patent and Trademark Office.
This application relates to the field of image processing, and more specifically, to a video encoding and decoding method and apparatus.
Prediction is an important module of mainstream video coding frameworks and can include intra-frame prediction and inter-frame prediction. Inter prediction modes may include Advanced Motion Vector Prediction (AMVP) mode, Merge mode, and Skip mode. In Merge mode, an MVP can be determined from a motion vector predictor (Motion Vector Prediction, MVP) candidate list and directly used as the MV, and the MVP and reference frame index can be transmitted to the decoding end in the bitstream for decoding. In Skip mode, only the index of the MVP needs to be transmitted; neither the MVD nor the residual needs to be transmitted.
In the process of constructing the above candidate list of motion vector predictors, when the encoding or decoding of a coded or decoded block is completed, the motion information of that block is used to update the motion vector predictor list of the next block to be encoded or decoded. Pixel interpolation of the reference block of the current image block or coding block often depends on reading the MVP in the candidate list of motion vector predictors and its corresponding interpolation filter, which creates a strong dependence on the motion vectors and corresponding interpolation filters used by adjacent image blocks. Therefore, the encoding and decoding methods in the prior art reduce encoding and decoding efficiency and cause a loss of performance in the encoding and decoding device.
Summary of the invention
In order to solve the technical problems in the prior art, the embodiments of the present application provide a video encoding and decoding method and apparatus, which can prevent the current image block from relying too much on the interpolation filters used by adjacent blocks and improve encoding and decoding efficiency.
In a first aspect, a video encoding and decoding method is provided, including: when performing pixel interpolation on an image block of a current image, one of at least two interpolation filters may be used for pixel interpolation, the current image including a first image block and a second image block; performing pixel interpolation on a reference block of the first image block using a first interpolation filter; and performing pixel interpolation on a reference block of the second image block using a default interpolation filter, the first image block being an adjacent block of the second image block.
In a second aspect, a video encoding and decoding apparatus is provided, including: a memory configured to store executable instructions; and a processor configured to execute the instructions stored in the memory so as to perform the video encoding and decoding method, the method including the operations of the method of the first aspect.
In a third aspect, a video codec is provided, including the video encoding and decoding apparatus of the second aspect and a body on which the codec apparatus is installed.
In a fourth aspect, a computer-readable storage medium is provided, which stores program instructions that can be used to instruct execution of the method of the first aspect.
In the embodiments of the present application, a first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, and a default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, the first image block being an adjacent block of the second image block. Since the reference block of the second image block is interpolated with the default interpolation filter, it neither inherits nor reads the interpolation filter used by the first image block, reducing its dependence on the encoding and decoding of the first image block. Therefore, encoding and decoding efficiency is improved during the encoding and decoding process, and the performance of the encoding and decoding device is improved.
The following briefly introduces the drawings used in the embodiments.
Fig. 1 is an architecture diagram of a technical solution according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application.
Fig. 3 is a schematic diagram of blocks adjacent to an image block according to an embodiment of the present application.
Fig. 4 is a schematic flowchart of a video encoding and decoding method according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a video encoding and decoding apparatus according to an embodiment of the present application.
The technical solutions in the embodiments of the present application are described below.
Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meanings as commonly understood by those skilled in the technical field of this application. The terminology used in this application is only for the purpose of describing specific embodiments and is not intended to limit the scope of this application.
Fig. 1 is an architecture diagram of a technical solution applying an embodiment of the present application.
As shown in FIG. 1, the system 100 can receive data to be processed 102, process it, and produce processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or it may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components in the system 100 may be implemented by one or more processors, which may be processors in a computing device or in a mobile device (such as a drone). The processor may be any type of processor, which is not limited in the embodiments of the present application. In some possible designs, the processor may include an encoder, a decoder, or a codec. The system 100 may also include one or more memories. The memory can be used to store instructions and data, for example, computer-executable instructions that implement the technical solutions of the embodiments of the present application, the data to be processed 102, the processed data 108, and so on. The memory can be any type of memory, which is also not limited in the embodiments of the present application.
The data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensor data from sensors such as vision sensors (for example, cameras or infrared sensors), microphones, near-field sensors (for example, ultrasonic sensors or radar), position sensors, temperature sensors, and touch sensors. In some cases, the data to be encoded may include information from a user, for example, biological information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.
Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application. As shown in FIG. 2, after the video to be encoded is received, each frame of the video is encoded in turn, starting from the first frame. The current frame mainly undergoes processing such as prediction, transform, quantization, and entropy coding, and the bitstream of the current frame is finally output. Correspondingly, the decoding process usually decodes the received bitstream by the inverse of the above process to recover the video frame information.
Specifically, as shown in FIG. 2, the video coding framework 2 includes an encoding control module 201 for making decision-control actions and selecting parameters during encoding. For example, as shown in FIG. 2, the encoding control module 202 controls the parameters used in transform, quantization, inverse quantization, and inverse transform, controls the selection of intra or inter mode, and controls the parameters of motion estimation and filtering; the control parameters of the encoding control module 202 are also input to the entropy coding module and encoded to form part of the bitstream.
When encoding of the current frame starts, the frame is partitioned (202): it is first divided into slices, which are then divided into blocks. Optionally, in one example, the frame is divided into multiple non-overlapping largest coding tree units (CTUs), each of which may be iteratively divided into a series of smaller coding units (CUs) in a quadtree, binary-tree, or ternary-tree manner. In some examples, a CU may also contain an associated prediction unit (PU) and transform unit (TU), where the PU is the basic unit of prediction and the TU is the basic unit of transform and quantization. In some examples, the PU and TU are each obtained by dividing a CU into one or more blocks, and one PU contains multiple prediction blocks (PBs) and related syntax elements. In some examples, the PU and TU may be the same, or they may be obtained from the CU through different division methods. In some examples, at least two of the CU, PU, and TU are the same; for example, CU, PU, and TU are not distinguished, and prediction, quantization, and transform are all performed in units of CUs. For convenience of description, CTUs, CUs, or other formed data units are all referred to as coding blocks below.
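The quadtree division of a CTU into CUs described above can be sketched as follows; the split-to-minimum-size rule stands in for the encoder's actual split decision, which in practice is driven by rate-distortion optimization:

```python
def quadtree_split(x, y, size, min_size):
    """Return the (x, y, size) leaves of a quad-tree partition of one CTU."""
    if size <= min_size:
        return [(x, y, size)]             # leaf CU: no further split
    half = size // 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):              # four equal quadrants
            leaves.extend(quadtree_split(x + dx, y + dy, half, min_size))
    return leaves
```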
It should be understood that, in the embodiments of the present application, the data unit for video encoding may be a frame, a slice, a coding tree unit, a coding unit, a coding block, or any group of the above. The size of the data unit can vary in different embodiments.
Specifically, as shown in FIG. 2, after the frame is divided into multiple coding blocks, a prediction process is performed to remove spatial and temporal redundancy from the current frame. Commonly used predictive coding methods include intra-frame prediction and inter-frame prediction. Intra prediction uses only the reconstructed information in the current frame to predict the current coding block, while inter prediction uses information in other, previously reconstructed frames (also called reference frames) to predict the current coding block. Specifically, in the embodiments of the present application, the encoding control module 202 is used to decide between intra prediction and inter prediction.
When the intra prediction mode is selected, the intra prediction process 203 includes: obtaining reconstructed blocks of coded neighboring blocks around the current coding block as reference blocks; calculating a predicted value based on the pixel values of the reference blocks using a prediction-mode method to generate a prediction block; and subtracting the corresponding pixel values of the prediction block from those of the current coding block to obtain the residual of the current coding block. The residual of the current coding block then undergoes transform 204, quantization 205, and entropy coding 210 to form the bitstream of the current coding block. Further, after all coding blocks of the current frame undergo this encoding process, they form part of the bitstream of the frame. In addition, the control and reference data generated in intra prediction 203 are also encoded by entropy coding 210 to form part of the bitstream.
Specifically, the transform 204 is used to remove the correlation of the residual of the image block so as to improve coding efficiency. The transform of the residual data of the current coding block usually adopts a two-dimensional discrete cosine transform (DCT) or a two-dimensional discrete sine transform (DST); for example, at the encoding end, the residual information of the coding block is multiplied by an N×M transform matrix and its transpose, and the transform coefficients of the current coding block are obtained after the multiplication.
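The matrix form of the transform described above (multiplying the residual block by a transform matrix and its transpose, C = T·R·Tᵀ) can be sketched as follows; the 2×2 matrix is a toy orthogonal basis for illustration, not a real DCT/DST matrix:

```python
def matmul(a, b):
    """Plain matrix multiplication for small blocks."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transform_block(residual, t):
    """Transform coefficients C = T * R * T^T, as described above."""
    t_t = [list(col) for col in zip(*t)]   # transpose of T
    return matmul(matmul(t, residual), t_t)

T = [[1, 1], [1, -1]]        # toy 2x2 orthogonal basis (not a real DCT matrix)
R = [[3, 1], [1, 3]]         # toy residual block
C = transform_block(R, T)    # -> [[8, 0], [0, 4]]
```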
After the transform coefficients are generated, quantization 205 is used to further improve compression efficiency. The transform coefficients are quantized to obtain quantized coefficients, which are then entropy coded 210 to obtain the residual bitstream of the current coding block. The entropy coding methods include, but are not limited to, context-adaptive binary arithmetic coding (CABAC).
Specifically, the coded neighboring blocks in the intra prediction 203 process are neighboring blocks that were encoded before the current coding block. The reconstructed block of such a neighboring block is obtained by passing the residual generated during its encoding through transform 204, quantization 205, inverse quantization 206, and inverse transform 207, and then adding the result to the prediction block of that neighboring block. Correspondingly, inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to restore the residual data before quantization and transform.
As shown in FIG. 2, when the inter prediction mode is selected, the inter prediction process includes motion estimation (ME) 208 and motion compensation (MC) 209. Specifically, motion estimation 208 is performed according to reference frame images among the reconstructed video frames; in one or more reference frame images, the image block most similar to the current coding block is searched for according to a certain matching criterion and serves as the matching block, and the relative displacement between the matching block and the current coding block is the motion vector (MV) of the current coding block. Then, based on the motion vector and the reference frame, motion compensation 209 is performed on the current coding block to obtain its prediction block, and the original pixel values of the coding block are subtracted from the corresponding pixel values of the prediction block to obtain the residual of the coding block. The residual of the current coding block undergoes transform 204, quantization 205, and entropy coding 210 to form part of the bitstream of the frame. In addition, the control and reference data generated in motion compensation 209 are also encoded by entropy coding 210 to form part of the bitstream.
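The block-matching search described above can be sketched with a sum-of-absolute-differences (SAD) criterion; SAD is one common matching criterion, and all names and the tiny 1-D blocks here are illustrative:

```python
def sad(a, b):
    """Sum of absolute differences between two blocks of pixels."""
    return sum(abs(x - y) for x, y in zip(a, b))

def motion_search(current_block, candidates):
    """candidates maps a displacement (MV) to the reference block found there.
    Returns the (mv, block) pair minimising the SAD matching criterion."""
    return min(candidates.items(), key=lambda item: sad(current_block, item[1]))

best_mv, match = motion_search([10, 12], {(0, 0): [0, 0], (1, 0): [10, 11]})
# best_mv == (1, 0): the displacement of the most similar reference block
```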
As shown in FIG. 2, a reconstructed video frame is a video frame obtained after filtering 211. Filtering 211 reduces compression distortions such as blocking and ringing artifacts introduced during encoding. During encoding, reconstructed video frames provide reference frames for inter prediction; during decoding, reconstructed video frames are post-processed and output as the final decoded video.
The inter prediction modes in a video coding standard may include the AMVP mode, the Merge mode, and the Skip mode.
In the AMVP mode, the MVP is determined first. Once the MVP is obtained, the starting point of motion estimation can be determined from it, and a motion search is performed around that starting point. When the search finishes, the optimal MV is obtained; the MV determines the position of the reference block in the reference image; subtracting the current block from the reference block yields the residual block; and subtracting the MVP from the MV yields the motion vector difference (MVD), which is transmitted to the decoder in the bitstream.
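The MV/MVP/MVD relationship above can be sketched as follows. This is an illustrative sketch only; the names `Mv`, `encode_mvd`, and `decode_mv` are hypothetical and not from any real codec API.

```python
# Hypothetical sketch of AMVP motion-vector signalling: the encoder
# transmits MVD = MV - MVP, the decoder reconstructs MV = MVP + MVD.
from typing import NamedTuple

class Mv(NamedTuple):
    x: int  # horizontal component, in 1/4-pel units
    y: int  # vertical component, in 1/4-pel units

def encode_mvd(mv: Mv, mvp: Mv) -> Mv:
    """Encoder side: MVD = MV - MVP, written to the bitstream."""
    return Mv(mv.x - mvp.x, mv.y - mvp.y)

def decode_mv(mvp: Mv, mvd: Mv) -> Mv:
    """Decoder side: MV = MVP + MVD, parsed from the bitstream."""
    return Mv(mvp.x + mvd.x, mvp.y + mvd.y)

mvp = Mv(8, -4)            # predictor chosen from the candidate list
mv = Mv(11, -2)            # optimal MV found by the motion search
mvd = encode_mvd(mv, mvp)  # only this difference is transmitted
assert decode_mv(mvp, mvd) == mv
```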
In the Merge mode, the MVP is determined first and is directly taken as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) is constructed first. The list may include at least one candidate MVP, and each candidate MVP may correspond to an index. After the encoder selects an MVP from the candidate list, it writes the index of that MVP into the bitstream; the decoder can then find the MVP corresponding to that index in the candidate list, thereby decoding the image block.
The construction of the MVP candidate list in the Merge mode is illustrated in FIG. 3. The list may include temporal candidate motion vectors, spatial candidate motion vectors, pairwise average motion vectors, or zero motion vectors. A pairwise motion vector can be obtained by averaging, or weighted-averaging, motion vectors already in the candidate list. The spatial candidate motion vectors are taken from the positions of the gray boxes numbered 1 to 5 in FIG. 3, while a temporal candidate motion vector is taken from the co-located CU in a coded image adjacent to the current CU. Unlike the spatial case, a temporal candidate cannot use the motion information of the candidate block directly; it must be scaled according to the positional relationship of the reference images. The specific scaling method is not repeated here.
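The list construction just described can be sketched as below. The list size of 6, the pairwise rule (average of the first two entries), and the distance-ratio scaling of the temporal candidate are stated as assumptions for illustration, not as the exact rules of any particular standard.

```python
# Minimal sketch of building a merge MVP candidate list: unique spatial
# candidates, a scaled temporal candidate, a pairwise average, zero-MV padding.
def scale_temporal_mv(mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    """Scale a temporal candidate by the ratio of picture distances."""
    td = col_poc - col_ref_poc   # distance for the co-located block
    tb = cur_poc - cur_ref_poc   # distance for the current block
    if td == 0:
        return mv
    return (mv[0] * tb // td, mv[1] * tb // td)

def build_merge_list(spatial_mvs, temporal_mv, max_size=6):
    cands = []
    for mv in spatial_mvs:               # positions 1..5 in FIG. 3
        if mv is not None and mv not in cands:
            cands.append(mv)
    if temporal_mv is not None:
        cands.append(temporal_mv)        # already scaled
    if len(cands) >= 2:                  # pairwise average of first two
        a, b = cands[0], cands[1]
        cands.append(((a[0] + b[0]) // 2, (a[1] + b[1]) // 2))
    while len(cands) < max_size:         # pad with zero MVs
        cands.append((0, 0))
    return cands[:max_size]

lst = build_merge_list([(4, 0), (4, 0), (-2, 6), None, None],
                       scale_temporal_mv((8, -8), 10, 8, 6, 2))
assert len(lst) == 6 and lst[0] == (4, 0)
```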
The Skip mode is a special merge mode that only needs to transmit the index of the MVP. Besides not needing to transmit MVD information, it also does not need to transmit a residual.
The Skip mode is a special case of the Merge mode. After the MV is obtained in the Merge mode, if the encoder determines that the current block is essentially identical to the reference block, no residual data needs to be transmitted; only the index of the MV needs to be transmitted, and a flag may additionally be transmitted indicating that the current block can be obtained directly from the reference block.
In other words, the Merge mode is characterized by MV = MVP (MVD = 0), and the Skip mode has one additional characteristic: reconstructed value rec = predicted value pred (residual value resi = 0).
In motion estimation, because natural objects move continuously, the motion vector of an object between two adjacent frames is not necessarily an integer number of pixel units. To improve motion vector accuracy, HEVC uses motion vectors with 1/4-pixel precision for motion estimation of the luma component. However, no samples exist at fractional pixel positions in digital video. In general, to achieve 1/K-pixel precision estimation, the values at these fractional pixel positions must be interpolated approximately; that is, K-fold interpolation is performed in the row and column directions of the reference frame, and the search is carried out in the interpolated image.
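The K-fold interpolation idea can be illustrated on a one-dimensional signal. Linear interpolation is used here purely as a simplification; real codecs use the longer filters discussed below.

```python
# Sketch of K-fold upsampling for 1/K-pel motion estimation: between every
# two integer samples, K-1 sub-pel samples are interpolated, and the motion
# search then runs on the upsampled signal.
def upsample_1d(samples, k):
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for j in range(k):
            # linear weights: j/k of the way from a to b (integer arithmetic)
            out.append((a * (k - j) + b * j) // k)
    out.append(samples[-1])
    return out

# With k=4 (quarter-pel), three sub-pel samples appear between neighbours.
assert upsample_1d([0, 4, 8], 4) == [0, 1, 2, 3, 4, 5, 6, 7, 8]
```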
In the current video coding reference architecture, the ordinary AMVP mode provides four adaptive motion vector resolution (AMVR) precisions: integer pixel, 4 pixels, 1/4 pixel, and 1/2 pixel. It should be noted that these AMVR precisions are only examples; the embodiments of this application do not specifically limit the pixel precision values. For example, 1/8-pixel and 1/16-pixel precision are also possible; integer pixel, 4 pixels, 1/4 pixel, and 1/2 pixel are used here for illustration. For each CU that adopts the AMVR technique (in some cases a CU may not adopt AMVR), the encoder adaptively decides the corresponding MV precision (integer pixel, 4 pixels, 1/4 pixel, or 1/2 pixel), writes the decision into the bitstream, and passes it to the decoder.
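The per-CU decision can be sketched as a cost minimization. The cost model here is a stand-in assumption; a real encoder would use rate-distortion costs measured during the motion search.

```python
# Hedged sketch of the per-CU AMVR decision: try each allowed MV precision,
# keep the one with the lowest cost, and signal that choice in the bitstream.
AMVR_PRECISIONS = ["4-pel", "1-pel", "1/2-pel", "1/4-pel"]

def decide_amvr_precision(rd_cost):
    """rd_cost: callable mapping a precision string to a cost value."""
    best = min(AMVR_PRECISIONS, key=rd_cost)
    return best  # the index of `best` would be entropy-coded into the stream

costs = {"4-pel": 120.0, "1-pel": 95.5, "1/2-pel": 90.25, "1/4-pel": 92.0}
assert decide_amvr_precision(costs.__getitem__) == "1/2-pel"
```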
Interpolation filters for different AMVR precisions may have different numbers of taps. For example, an eight-tap interpolation filter is used for 1/4-pixel precision, while a Gaussian interpolation filter (a six-tap interpolation filter) is used for 1/2-pixel precision. Because the interpolation filters differ, when the motion vector is stored, the interpolation filter used by the current CU also needs to be stored. As an example, the interpolation filter can be represented by 1 bit: it is stored as 1 when the Gaussian filter (six-tap interpolation filter) is used, and as 0 when the Gaussian filter is not used. When a subsequent CU references a previously coded CU, it determines the interpolation filter to use for the current CU from this flag bit. Here, the interpolation filter used by the current CU can be understood as the interpolation filter used when performing pixel interpolation on the reference block of the current CU; likewise, the interpolation filter used by an image block mentioned below can be understood as the interpolation filter used when performing pixel interpolation on the reference block of that image block.
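The 1-bit flag convention just described can be modelled as follows. Representing the per-CU buffer as a dict keyed by CU position is an illustrative assumption.

```python
# Sketch of the 1-bit interpolation-filter flag: 1 = the Gaussian (6-tap)
# filter was used, 0 = it was not.
GAUSSIAN_6TAP = 1
NOT_GAUSSIAN = 0

filter_flags = {}  # (cu_x, cu_y) -> 1-bit flag, written after each CU is coded

def store_filter_flag(cu_pos, used_gaussian):
    filter_flags[cu_pos] = GAUSSIAN_6TAP if used_gaussian else NOT_GAUSSIAN

def inherited_filter(ref_cu_pos):
    """A later CU referencing ref_cu_pos reads the flag to pick its filter."""
    return "6-tap" if filter_flags[ref_cu_pos] == GAUSSIAN_6TAP else "8-tap"

store_filter_flag((0, 0), used_gaussian=True)
store_filter_flag((16, 0), used_gaussian=False)
assert inherited_filter((0, 0)) == "6-tap"
assert inherited_filter((16, 0)) == "8-tap"
```

This per-CU storage is exactly the overhead that the embodiments below seek to remove.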
For the Merge mode, because the MV information of spatial and temporal neighboring blocks is needed when constructing the MVP candidate list, after each CU finishes encoding, the MV it finally used must be stored for reference by subsequent MVs. The stored MV information includes the value of the MV, the index of the reference frame, the prediction mode of the current CU, and so on. Moreover, when the current block performs motion compensation, it fully inherits the MV of a neighboring block together with its corresponding interpolation filter. Therefore, after each CU finishes encoding, the interpolation filter used by that CU also needs to be stored. It may likewise be stored as a 1-bit flag; the storage method is as described above and is not repeated here.
For example, suppose the optimal MV that the current block selects from the motion vector candidate list is a temporal MV, denoted MV0, and the interpolation filter flag corresponding to MV0 is 1, indicating that the temporal neighboring block of the current block used the Gaussian filter to perform pixel interpolation on its reference block during motion compensation. Then, after the current block uses MV0 to locate its reference block in the reference frame, it also uses the Gaussian filter, i.e. the 6-tap interpolation filter, to perform pixel interpolation on its reference block; after interpolation, the prediction block of the current block is obtained. Subtracting the current block from the reference block yields the residual block.
As can be seen from the above description, when the current block performs motion compensation, it fully inherits the MV of a neighboring block and its corresponding interpolation filter, so the interpolation filters used by neighboring image blocks are strongly interdependent. This coding and decoding approach lowers coding and decoding efficiency and degrades the performance of the coding and decoding apparatus. In addition, because every CU must store the interpolation filter it used after it finishes encoding, storage resources are consumed; and since the number of CUs in the coding and decoding process is very large, the storage overhead required to store the interpolation filter identifier for every CU is substantial, placing very high demands on the hardware.
To solve this problem, the following solutions provided by the embodiments of this application can reduce the dependence on the coding and decoding process of neighboring blocks, thereby improving coding and decoding efficiency and the performance of the coding and decoding apparatus. In addition, storage overhead can be saved.
In this application, performing pixel interpolation on an image block means performing pixel interpolation on the reference block of that image block; the interpolation filter used in the process is the interpolation filter applied to the reference block of the image block.
FIG. 4 is a schematic flowchart of a video coding and decoding method 400 according to an embodiment of this application. The method 400 includes at least part of the following content and may be applied at the encoding end or at the decoding end.
In 410, when pixel interpolation is performed on an image block of the current image, one of at least two interpolation filters may be used, where the current image includes a first image block and a second image block.
In 420, a first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, and a default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, where the first image block is a neighboring block of the second image block.
It should be noted that the interpolation filter used by an image block mentioned in this application can be understood as the interpolation filter adopted when performing pixel interpolation on the reference block of that image block.
In the embodiments of this application, when pixel interpolation is performed on the reference block of the second image block, the interpolation filter of the neighboring block is not inherited; instead, the default interpolation filter is used.
It should also be pointed out that, for the image blocks of the current image, at least two interpolation filters are available for pixel interpolation, for example at least a 6-tap interpolation filter and an 8-tap interpolation filter. Illustratively, different pixel precisions may correspond to interpolation filters with different numbers of taps. Taking the AMVR interpolation filters as an example, an eight-tap interpolation filter is used for 1/4-pixel precision, and a Gaussian interpolation filter (a six-tap interpolation filter) is used for 1/2-pixel precision. These correspondences are only exemplary and do not limit this application.
In the embodiments of this application, after the first image block is coded or decoded, the interpolation filter used by the current CU is not stored and no flag bit is set. When a subsequent CU references a previously coded CU, it does not need to determine the interpolation filter for the current CU from such a flag bit; instead, it directly uses the default interpolation filter.
Illustratively, the pixel precision used when performing pixel interpolation on the reference block of the second image block is at the sub-pixel level.
Illustratively, the pixel precision used for pixel interpolation on the reference block of the second image block is 1/2-pixel precision. This is only an exemplary description; other pixel precisions, such as 1/4-pixel precision or 1/8-pixel precision, may be used in other embodiments of this application.
Although the above exemplarily describes the case where the second image block uses a sub-pixel precision level, the embodiments shown in this application can also be applied to pixel interpolation at integer-pixel precision.
Illustratively, the first image block is a temporal neighboring block of the second image block. For example, the first image block is located in a reference frame and the second image block is located in the current frame; the prediction between them is inter prediction, and the direction of inter prediction may be forward prediction, backward prediction, bidirectional prediction, and so on. Forward prediction predicts the current frame from a previously reconstructed frame (a "historical frame"). Backward prediction predicts the current frame from a frame after the current frame (a "future frame"). Bidirectional prediction predicts the current frame from both "historical frames" and "future frames". This application is not limited to any one of these three prediction methods. When the first image block and the second image block are temporal neighboring blocks, the temporal candidate is obtained from the co-located CU in a coded image adjacent to the current CU. Unlike the spatial case, the temporal candidate cannot use the motion information of the candidate block directly; it must be scaled according to the positional relationship of the reference images. The specific scaling method is not repeated here.
The bidirectional prediction mode is one of the dual motion vector modes. The dual motion vector modes include the dual forward prediction mode, the dual backward prediction mode, and the bidirectional prediction mode. The dual forward prediction mode includes two forward motion vectors, and the dual backward prediction mode includes two backward motion vectors. The bidirectional prediction mode includes a forward prediction mode and a backward prediction mode.
Illustratively, the first image block is a spatial neighboring block of the second image block. For example, the first image block and the second image block are both located in the current frame.
When the first image block and the second image block are spatial neighboring blocks, for example in the Merge mode, a motion vector candidate list is constructed for the current coding unit. As shown in FIG. 3, the spatial candidates in the merge mode are obtained from the positions of boxes 1 to 5 in the figure.
Optionally, the first interpolation filter used by the first image block is a default interpolation filter or a non-default interpolation filter. In this application, the first image block may use the merge mode, the AMVP mode, or the Skip mode. When the first image block uses the merge mode or the Skip mode, the first interpolation filter is the default interpolation filter. When the first image block uses the AMVP mode, the first interpolation filter is the interpolation filter actually selected and determined. Illustratively, if the pixel precision actually selected in AMVR is 1/2-pixel precision, a 6-tap interpolation filter is used; if the actually selected pixel precision is 1/4-pixel precision, an 8-tap interpolation filter is used. When another pixel precision is decided, another interpolation filter can be selected accordingly.
Illustratively, the default interpolation filter is an interpolation filter with a default number of taps. In this embodiment, the default interpolation filter refers to an interpolation filter with a default number of taps; for example, such a filter may be a 6-tap interpolation filter or an 8-tap interpolation filter. The 6-tap and 8-tap filters in the embodiments of this application are only examples and do not limit the default interpolation filter.
Illustratively, the default interpolation filter is an interpolation filter with default weight values. The weight values are explained as follows. Taking a 6-tap interpolation filter with 1/2-pixel precision as an example, one sub-pixel point, the 1/2-pixel position, needs to be interpolated between every two integer pixels in the reference block. Because no pixel value exists at the 1/2-pixel position, the pixel values of the integer pixels to the left and to the right of that position are used to calculate its pixel value. Suppose the calculation formula for the pixel value of the 1/2-pixel position is q = round((–A0 + 4*A1 – 10*A2 + 58*A3 + 17*A4 – 5*A5 + A6)/B). For the different integer pixels, the weight values are the coefficients in front of A0 to A6, each representing the weight that pixel contributes to the final result. As can be seen, the weight value of each pixel is fixed to a specific value, and that value can be determined by configuration or by a default setting.
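The formula above can be implemented directly. The normalization divisor B is not given in the text; it is assumed here to be 64, the sum of the coefficients (–1 + 4 – 10 + 58 + 17 – 5 + 1 = 64), so that a flat signal is preserved by the filter.

```python
# The interpolation formula q = round((-A0 +4*A1 -10*A2 +58*A3 +17*A4 -5*A5 +A6)/B),
# with B assumed to be 64 (the coefficient sum).
COEFFS = (-1, 4, -10, 58, 17, -5, 1)
B = 64

def interp_subpel(a):
    """a: the integer-pel samples A0..A6 around the sub-pel position."""
    acc = sum(c * s for c, s in zip(COEFFS, a))
    # integer approximation of round(acc / B) for non-negative sums
    return (acc + B // 2) // B

# A constant region must interpolate to the same constant value.
assert interp_subpel([100] * 7) == 100
```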
The above example is provided only to aid understanding of the default interpolation filter in the embodiments of this application and does not limit the embodiments of this application.
Optionally, the first image block may use the Merge prediction mode or the Skip mode.
Illustratively, before the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, the method further includes: obtaining a motion vector candidate list; selecting a motion vector from the motion vector candidate list; and determining the reference block of the second image block in a reference frame according to the motion vector. After the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, the method further includes: determining a residual according to the interpolated reference block and the second image block.
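The flow above can be sketched on one-dimensional "blocks" for brevity. The 2-tap average standing in for the default interpolation filter, and the assumption that candidate index 0 was signalled, are placeholders for illustration only.

```python
# Sketch of the described flow: pick an MV from the candidate list, locate
# the reference block, interpolate it with the default filter, form the residual.
def default_interp(ref):                      # stand-in half-pel filter
    return [(ref[i] + ref[i + 1] + 1) // 2 for i in range(len(ref) - 1)]

def motion_compensate(cur_block, ref_frame, candidates):
    mv = candidates[0]                        # assume index 0 was signalled
    ref = ref_frame[mv:mv + len(cur_block) + 1]
    pred = default_interp(ref)                # always the default filter
    residual = [c - p for c, p in zip(cur_block, pred)]
    return pred, residual

ref_frame = [10, 20, 30, 40, 50, 60]
pred, resi = motion_compensate([16, 26, 36], ref_frame, candidates=[1])
assert pred == [25, 35, 45] and resi == [-9, -9, -9]
```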
Illustratively, after the first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, the identifier of the first interpolation filter used for the reference block of the first image block is not stored.
Illustratively, after the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block, the identifier of the default interpolation filter used for the reference block of the second image block is not stored.
The foregoing embodiments are described using only two image blocks, the first image block and the second image block, as examples. In an actual coding and decoding process, however, when all image blocks, or any image block, are coded or decoded, the identifier of the interpolation filter they use is not stored. Illustratively, regardless of which inter prediction mode the current block uses, such as AMVP, merge, or skip, the identifier of the interpolation filter it uses is not stored.
It can therefore be seen that the above coding and decoding method can reduce the dependence on the coding and decoding process of neighboring blocks, thereby improving coding and decoding efficiency and the performance of the coding and decoding apparatus. Moreover, because the type of the interpolation filter is not stored, hardware storage resources can be saved.
The above embodiment is described for the case where no block stores its interpolation filter type. In other optional embodiments, the corresponding interpolation filter may be stored only for spatial MVs, while no corresponding interpolation filter is stored for temporal MVs.
Illustratively, after the first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, the identifier of the first interpolation filter used for the reference block of the first image block is stored in the spatial domain. Spatial-domain storage in this application means storing, in the buffer of spatial MV information, the flag bit of the interpolation filter used for the reference block of the current block. In this case, when performing pixel interpolation on the reference block of the current block, the type of the interpolation filter corresponding to a spatial MV can be read directly from the buffer.
Illustratively, after the first interpolation filter is used to perform pixel interpolation on the reference block of the first image block, the identifier of the first interpolation filter used for the reference block of the first image block is not stored in the time domain. That is, the flag bit of the interpolation filter used for the reference block of the current block is not stored in the buffer of temporal MV information. In this case, when performing pixel interpolation on the reference block of the current block, there is no need to read the type of the interpolation filter corresponding to a temporal MV from the buffer; the default interpolation filter is used directly.
In this embodiment, the corresponding interpolation filter is stored only for spatial MVs and not for temporal MVs, because temporal storage holds the information of an entire frame of image and its storage pressure is therefore greater. Storing the corresponding interpolation filter only for spatial MVs, and not for temporal MVs, can thus relieve part of the storage pressure. When a subsequent CU performs a motion search, if the optimal MV selected from the motion vector candidate list is a temporal MV, the default interpolation filter is used directly to perform pixel interpolation on its reference block; if the optimal MV selected from the motion vector candidate list is a spatial MV, the interpolation filter corresponding to that spatial MV is still used to perform pixel interpolation on its reference block.
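The selection rule of this embodiment can be sketched as below; the function and buffer names are illustrative.

```python
# Sketch of the storage policy: spatial MVs carry a stored filter flag,
# temporal MVs do not, so a temporal reference always uses the default filter.
DEFAULT_FILTER = "8-tap"

def select_filter(mv_kind, spatial_flag_buffer, pos):
    if mv_kind == "temporal":
        return DEFAULT_FILTER                 # nothing stored for temporal MVs
    flag = spatial_flag_buffer.get(pos, 0)    # spatial buffer keeps 1-bit flags
    return "6-tap" if flag == 1 else "8-tap"

buf = {(0, 0): 1}                             # spatial neighbour used 6-tap
assert select_filter("temporal", buf, (0, 0)) == "8-tap"
assert select_filter("spatial", buf, (0, 0)) == "6-tap"
```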
In another optional embodiment, no corresponding interpolation filter is stored for either spatial MVs or temporal MVs, and a subsequent CU directly uses the default interpolation filter to perform pixel interpolation on its reference block during motion search. This reduces hardware storage overhead, simplifies the process, improves coding and decoding efficiency, and improves the performance of the coding and decoding apparatus.
In yet another optional embodiment, the corresponding interpolation filter is stored only for temporal MVs, while no corresponding interpolation filter is stored for spatial MVs. Because storing spatial MV information also occupies storage space and increases storage pressure, storing the corresponding interpolation filter only for temporal MVs, and not for spatial MVs, can likewise relieve part of the storage pressure. When a subsequent CU performs a motion search, if the optimal MV selected from the motion vector candidate list is a spatial MV, the default interpolation filter is used directly to perform pixel interpolation on its reference block; if the optimal MV selected from the motion vector candidate list is a temporal MV, the interpolation filter corresponding to that temporal MV is still used to perform pixel interpolation on its reference block.
Illustratively, the motion vector candidate list does not contain the identifier of the first interpolation filter used for the reference block of the first image block. In the implementation shown in this application, there is no need to obtain the identifier of the first interpolation filter used for the reference block of the first image block; the default interpolation filter is used directly, which saves storage space while ensuring coding performance.
Illustratively, the motion vector candidate list includes one or more of spatial candidate motion vectors, temporal candidate motion vectors, history-based candidate motion vectors, and pairwise candidate motion vectors, where a pairwise candidate motion vector is determined from one or more of the spatial candidate motion vectors, the temporal candidate motion vectors, and the history-based candidate motion vectors. For example, a pairwise candidate motion vector is determined based on the mean or weighted mean of spatial candidate motion vectors and/or temporal candidate motion vectors.
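The pairwise candidate construction can be sketched as a (weighted) mean of two existing candidates. The integer rounding via floor division is an illustrative assumption.

```python
# Sketch of a pairwise candidate: the mean, or weighted mean, of two
# motion vectors already present in the candidate list.
def pairwise_candidate(mv_a, mv_b, w_a=1, w_b=1):
    den = w_a + w_b
    return ((w_a * mv_a[0] + w_b * mv_b[0]) // den,
            (w_a * mv_a[1] + w_b * mv_b[1]) // den)

assert pairwise_candidate((4, 8), (8, 0)) == (6, 4)            # plain mean
assert pairwise_candidate((4, 8), (8, 0), w_a=3, w_b=1) == (5, 6)  # weighted
```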
Exemplarily, for the luma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block. The embodiments shown in this application can operate in the luma mode.
Exemplarily, for the chroma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block. The embodiments shown in this application can operate in the chroma mode.
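As a hedged illustration of what an 8-tap interpolation filter computes at 1/2-pixel precision, the following one-dimensional sketch applies an 8-tap half-sample filter. The coefficients are those of the well-known HEVC luma half-sample filter, used here only as a familiar example; they are not asserted to be the default filter of this application.

```python
# Illustrative half-pel interpolation with an 8-tap filter. The coefficients
# below are the HEVC luma half-sample filter (sum = 64), used purely as an
# example of an "8-tap interpolation filter".

HALF_PEL_8TAP = [-1, 4, -11, 40, 40, -11, 4, -1]


def interp_half_pel(samples, i):
    """Interpolate the half-sample position between samples[i] and
    samples[i+1] from the 8 surrounding integer-position samples."""
    taps = samples[i - 3 : i + 5]                       # 8 integer samples
    acc = sum(c * s for c, s in zip(HALF_PEL_8TAP, taps))
    return (acc + 32) >> 6                              # divide by 64, rounded


row = [10, 10, 10, 10, 20, 20, 20, 20, 20, 20]
print(interp_half_pel(row, 3))  # half-pel position at the 10->20 edge: prints 15
```

In an encoder or decoder the same separable filtering would be applied horizontally and vertically over the whole reference block, with clipping to the sample bit depth.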
It should be understood that the solutions of the embodiments of the present application are not limited to the AMVP mode, Merge mode, or Skip mode mentioned above; they can also be used in other coding and decoding modes, that is, in any codec mode whose encoding/decoding process involves storing interpolation filter information.
It should also be understood that the solutions of the embodiments of the present application can reduce the dependence on the encoding/decoding process of neighboring blocks, thereby improving codec efficiency and the performance of the codec apparatus during encoding and decoding, so that they can be used in scenarios where image blocks are encoded and decoded. It should be understood, however, that the embodiments of the present application can also be used in other codec processing scenarios; that is, the reduction of the system's storage pressure can serve other purposes.
FIG. 5 shows a schematic block diagram of a video encoding and decoding apparatus 500 according to an embodiment of the present application.

As shown in FIG. 5, the video encoding and decoding apparatus 500 may include a processor 510, and may further include a memory 520.

It should be understood that the video encoding and decoding apparatus 500 may also include components commonly included in other video encoding and decoding apparatuses, such as input/output devices and communication interfaces, which are not limited in the embodiments of the present application.

The memory 520 is used to store computer-executable instructions.

The memory 520 may be any of various kinds of memory; for example, it may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory, which is not limited in the embodiments of the present application.

The processor 510 is configured to access the memory 520 and execute the computer-executable instructions to perform the operations in the method for video processing of the foregoing embodiments of the present application.

The processor 510 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and the like, which is not limited in the embodiments of the present application.
The device for video processing and the computer system of the embodiments of the present application may correspond to the execution bodies of the method for video processing of the embodiments of the present application, and the above and other operations and/or functions of the modules in the device and computer system for video processing are respectively intended to implement the corresponding procedures of the foregoing methods; for brevity, they are not repeated here.

It should be understood that the video processor can implement the corresponding operations implemented by the codec apparatus in the foregoing method embodiments. The video encoder may further include a body, on which the encoder apparatus is mounted.

Exemplarily, the body includes at least one of a mobile phone, a camera, or a drone.

An embodiment of the present application further provides a computer-readable storage medium storing program instructions, where the program instructions may be used to instruct execution of the loop filtering method of the foregoing embodiments of the present application.
It should be understood that, in the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in this text generally indicates that the associated objects before and after it are in an "or" relationship.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may also be electrical, mechanical, or other forms of connection.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (43)
- A video encoding and decoding method, comprising: when performing pixel interpolation on an image block of a current image, one of at least two interpolation filters may be used for the pixel interpolation, the current image including a first image block and a second image block; performing pixel interpolation on a reference block of the first image block using a first interpolation filter; and performing pixel interpolation on a reference block of the second image block using a default interpolation filter, the first image block being a neighboring block of the second image block.
- The method according to claim 1, wherein the pixel precision used for the pixel interpolation of the reference block of the second image block is at a sub-pixel level.
- The method according to claim 1 or 2, wherein the pixel precision used for the pixel interpolation of the reference block of the second image block is 1/2-pixel precision.
- The method according to any one of claims 1 to 3, wherein the first image block is a temporal neighboring block of the second image block.
- The method according to any one of claims 1 to 3, wherein the first image block is a spatial neighboring block of the second image block.
- The method according to any one of claims 1 to 5, wherein the default interpolation filter is an interpolation filter with a default number of taps.
- The method according to any one of claims 1 to 6, wherein the first interpolation filter is the default interpolation filter or a non-default interpolation filter.
- The method according to claim 6, wherein the interpolation filter with the default number of taps includes a 6-tap interpolation filter or an 8-tap interpolation filter.
- The method according to any one of claims 1 to 8, wherein the second image block uses the Merge prediction mode or the Skip mode.
- The method according to any one of claims 1 to 8, wherein the first image block uses the AMVP prediction mode, the Merge prediction mode, or the Skip mode.
- The method according to claim 7, wherein before performing pixel interpolation on the reference block of the second image block using the default interpolation filter, the method further comprises: obtaining a motion vector candidate list; selecting a motion vector from the motion vector candidate list; and determining the reference block of the second image block from a reference frame according to the motion vector; and after performing pixel interpolation on the reference block of the second image block using the default interpolation filter, the method further comprises: determining a residual according to the pixel-interpolated reference block and the second image block.
- The method according to any one of claims 1 to 11, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored.
- The method according to any one of claims 1 to 12, wherein after pixel interpolation is performed on the reference block of the second image block using the default interpolation filter, the identifier of the default interpolation filter used for the reference block of the second image block is not stored.
- The method according to any one of claims 1 to 13, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored in the temporal domain.
- The method according to claim 12, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is stored in the spatial domain.
- The method according to claim 12, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored in the spatial domain.
- The method according to claim 11, wherein the motion vector candidate list does not include the identifier of the first interpolation filter used for the reference block of the first image block.
- The method according to claim 11, wherein the motion vector candidate list includes one or more of a spatial candidate motion vector, a temporal candidate motion vector, a history-based candidate motion vector, a pairwise candidate motion vector, or a zero motion vector, wherein the pairwise candidate motion vector is determined from one or more of the spatial candidate motion vector, the temporal candidate motion vector, or the history-based candidate motion vector.
- The method according to any one of claims 1 to 18, wherein for the luma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- The method according to any one of claims 1 to 18, wherein for the chroma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- A video encoding and decoding apparatus, comprising: a memory for storing executable instructions; and a processor for executing the instructions stored in the memory to perform a video encoding and decoding method, the method comprising: when performing pixel interpolation on an image block of a current image, one of at least two interpolation filters may be used for the pixel interpolation, the current image including a first image block and a second image block; performing pixel interpolation on a reference block of the first image block using a first interpolation filter; and performing pixel interpolation on a reference block of the second image block using a default interpolation filter, the first image block being a neighboring block of the second image block.
- The video encoding and decoding apparatus according to claim 21, wherein the pixel precision used for the pixel interpolation of the reference block of the second image block is at a sub-pixel level.
- The video encoding and decoding apparatus according to claim 21 or 22, wherein the pixel precision used for the pixel interpolation of the reference block of the second image block is 1/2-pixel precision.
- The video encoding and decoding apparatus according to any one of claims 21 to 23, wherein the first image block is a temporal neighboring block of the second image block.
- The video encoding and decoding apparatus according to any one of claims 21 to 23, wherein the first image block is a spatial neighboring block of the second image block.
- The video encoding and decoding apparatus according to any one of claims 21 to 25, wherein the default interpolation filter is an interpolation filter with a default number of taps.
- The video encoding and decoding apparatus according to any one of claims 21 to 26, wherein the first interpolation filter is the default interpolation filter or a non-default interpolation filter.
- The video encoding and decoding apparatus according to claim 26, wherein the interpolation filter with the default number of taps includes a 6-tap interpolation filter or an 8-tap interpolation filter.
- The video encoding and decoding apparatus according to any one of claims 21 to 28, wherein the second image block uses the Merge prediction mode or the Skip mode.
- The video encoding and decoding apparatus according to any one of claims 21 to 28, wherein the first image block uses the AMVP prediction mode, the Merge prediction mode, or the Skip mode.
- The video encoding and decoding apparatus according to claim 27, wherein before performing pixel interpolation on the reference block of the second image block using the default interpolation filter, the method further comprises: obtaining a motion vector candidate list; selecting a motion vector from the motion vector candidate list; and determining the reference block of the second image block from a reference frame according to the motion vector; and after performing pixel interpolation on the reference block of the second image block using the default interpolation filter, the method further comprises: determining a residual according to the pixel-interpolated reference block and the second image block.
- The video encoding and decoding apparatus according to any one of claims 21 to 31, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored.
- The video encoding and decoding apparatus according to any one of claims 21 to 32, wherein after pixel interpolation is performed on the reference block of the second image block using the default interpolation filter, the identifier of the default interpolation filter used for the reference block of the second image block is not stored.
- The video encoding and decoding apparatus according to any one of claims 21 to 33, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored in the temporal domain.
- The video encoding and decoding apparatus according to claim 32, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is stored in the spatial domain.
- The video encoding and decoding apparatus according to claim 32, wherein after pixel interpolation is performed on the reference block of the first image block using the first interpolation filter, the identifier of the first interpolation filter used for the reference block of the first image block is not stored in the spatial domain.
- The video encoding and decoding apparatus according to claim 31, wherein the motion vector candidate list does not include the identifier of the first interpolation filter used for the reference block of the first image block.
- The video encoding and decoding apparatus according to claim 31, wherein the motion vector candidate list includes one or more of a spatial candidate motion vector, a temporal candidate motion vector, a history-based candidate motion vector, a pairwise candidate motion vector, or a zero motion vector, wherein the pairwise candidate motion vector is determined from one or more of the spatial candidate motion vector, the temporal candidate motion vector, or the history-based candidate motion vector.
- The video encoding and decoding apparatus according to any one of claims 21 to 38, wherein for the luma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- The video encoding and decoding apparatus according to any one of claims 21 to 38, wherein for the chroma mode, the default interpolation filter is used to perform pixel interpolation on the reference block of the second image block.
- A video codec, comprising: the video encoding and decoding apparatus according to any one of claims 21 to 40; and a body, on which the codec apparatus is mounted.
- The video codec according to claim 41, wherein the body includes at least one of a mobile phone, a camera, or a drone.
- A computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the video encoding and decoding method according to any one of claims 1 to 20 is implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/107598 WO2021056212A1 (en) | 2019-09-24 | 2019-09-24 | Method and apparatus for video encoding and decoding |
CN201980033882.2A CN112154666A (en) | 2019-09-24 | 2019-09-24 | Video coding and decoding method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021056212A1 true WO2021056212A1 (en) | 2021-04-01 |
Family
ID=73891983
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339223B | 2021-02-23 | 2023-03-31 | Hangzhou Hikvision Digital Technology Co., Ltd. | Decoding method, device, equipment and machine readable storage medium |
CN113259669B * | 2021-03-25 | 2023-07-07 | Zhejiang Dahua Technology Co., Ltd. | Encoding method, encoding device, electronic device and computer readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101043621A * | 2006-06-05 | 2007-09-26 | Huawei Technologies Co., Ltd. | Self-adaptive interpolation process method and coding/decoding module |
CN101365137A * | 2008-09-12 | 2009-02-11 | Huawei Technologies Co., Ltd. | Motion compensation reference data loading method and apparatus, decoder, encoding and decoding system |
CN103747269A * | 2013-09-30 | 2014-04-23 | Peking University Shenzhen Graduate School | Filter interpolation method and filter |
US20180077423A1 * | 2016-09-15 | 2018-03-15 | Google Inc. | Dual filter type for motion compensated prediction in video coding |
CN107925772A * | 2015-09-25 | 2018-04-17 | Huawei Technologies Co., Ltd. | Apparatus and method for video motion compensation using a selectable interpolation filter |
CN109756737A * | 2017-11-07 | 2019-05-14 | Huawei Technologies Co., Ltd. | Image prediction method and apparatus |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2636622B2 * | 1992-03-13 | 1997-07-30 | Matsushita Electric Industrial Co., Ltd. | Video signal encoding method and decoding method, and video signal encoding apparatus and decoding apparatus |
CN102387360B * | 2010-09-02 | 2016-05-11 | LG Electronics (China) R&D Center Co., Ltd. | Video coding-decoding inter-frame image prediction method and video codec |
CN104702962B * | 2015-03-03 | 2019-04-16 | Huawei Technologies Co., Ltd. | Intra-frame coding and decoding method, encoder and decoder |
Application events (2019):
- 2019-09-24 — CN: application CN201980033882.2A, publication CN112154666A, status: active, Pending
- 2019-09-24 — WO: application PCT/CN2019/107598 (WO2021056212A1), status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112154666A (en) | 2020-12-29 |
Legal Events

Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19947331; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 19947331; Country of ref document: EP; Kind code of ref document: A1