CN112204977A - Video encoding and decoding method, device and computer readable storage medium

Video encoding and decoding method, device and computer readable storage medium

Info

Publication number
CN112204977A
Authority
CN
China
Prior art keywords
motion vector
candidate
interpolation filter
default
candidate list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980034107.9A
Other languages
Chinese (zh)
Inventor
马思伟
王苏红
郑萧桢
王苫社
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
SZ DJI Technology Co Ltd
SZ DJI Innovations Technology Co Ltd
Original Assignee
Peking University
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, SZ DJI Technology Co Ltd filed Critical Peking University
Publication of CN112204977A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Abstract

A video encoding and decoding method, device, and computer-readable storage medium. The method comprises: obtaining a motion vector candidate list and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and performing pixel interpolation on the reference block with a default interpolation filter. With this method, device, and storage medium, once a motion vector predictor is determined from the motion vector predictor candidate list and the reference block of the current block is determined based on it, no cumbersome selection and judgment of the interpolation filter type is required: pixel interpolation is performed on the reference block directly with the default interpolation filter. This reduces encoding and decoding complexity without reducing coding efficiency, and saves hardware overhead.

Description

Video encoding and decoding method, device and computer readable storage medium
Technical Field
The present application relates generally to the field of video encoding and decoding, and more particularly, to a video encoding and decoding method, apparatus and computer-readable storage medium.
Background
Prediction is an important module of mainstream video coding frameworks and can include intra-prediction and inter-prediction. In current mainstream video coding standards, after the inter-prediction stage selects a motion vector from a motion vector candidate list, a corresponding interpolation filter must be determined based on the type of the selected motion vector. This process is cumbersome and introduces unnecessary encoding and decoding burden.
Disclosure of Invention
Embodiments of the present application provide a video coding and decoding scheme that can reduce encoding and decoding complexity and save hardware overhead. The video encoding and decoding scheme proposed by the present application is briefly described below; more details are given in the detailed description with reference to the accompanying drawings.
According to an aspect of an embodiment of the present application, there is provided a video encoding and decoding method, the method including: obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and adopting a default interpolation filter to perform pixel interpolation on the reference block.
According to another aspect of the embodiments of the present application, there is provided a video encoding and decoding method, including: obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and if the selected motion vector is a paired average motion vector, performing pixel interpolation on the reference block by adopting a default interpolation filter, wherein the paired average motion vector is determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
According to another aspect of the embodiments of the present application, there is provided a video encoding and decoding apparatus, the apparatus including a memory and a processor, the memory storing a computer program that, when read and executed by the processor, causes the processor to perform the following operations: obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and performing pixel interpolation on the reference block with a default interpolation filter.
According to another aspect of the embodiments of the present application, there is provided a video codec device, the device including a memory and a processor, the memory storing a computer program that, when read and executed by the processor, causes the processor to perform the following operations: obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and if the selected motion vector is a paired average motion vector, performing pixel interpolation on the reference block with a default interpolation filter, wherein the paired average motion vector is determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
According to yet another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform any one of the video coding and decoding methods described above.
According to the video coding and decoding method and device and the computer-readable storage medium, after the motion vector predictor is determined from the motion vector predictor candidate list and the reference block of the current block is determined based on it, the type of interpolation filter does not need to be selected and judged in a cumbersome manner; pixel interpolation is performed on the reference block directly with the default interpolation filter. Encoding and decoding complexity can thus be reduced without reducing coding efficiency, and hardware cost can be saved.
Drawings
Fig. 1 shows an architecture diagram of a solution according to an embodiment of the present application.
Fig. 2 shows a schematic diagram of a video coding framework according to an embodiment of the present application.
Fig. 3 shows a schematic flow chart of a video coding and decoding method according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating the establishment of a spatial motion vector candidate list used in a video encoding and decoding method according to an embodiment of the present application.
Fig. 5 is a schematic diagram illustrating the establishment of a temporal motion vector candidate list used in a video coding and decoding method according to an embodiment of the present application.
Fig. 6 shows a schematic block diagram of a video codec device according to an embodiment of the present application.
Detailed Description
Example embodiments of the present application will be described below with reference to the accompanying drawings.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items. A plurality, as mentioned in the embodiments of the present application, includes at least two, for example 2, 3, 4, or more. In the embodiments of the present application, "A or B" includes A alone, B alone, and the combination of A and B.
In order to thoroughly understand the embodiments of the present application, detailed steps and detailed structures will be provided in the following description so as to explain the technical solutions provided by the embodiments of the present application.
Fig. 1 is an architecture diagram of a solution to which an embodiment of the present application is applied.
As shown in FIG. 1, the system 100 receives the data 102 to be processed, processes it, and generates processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or it may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components in system 100 may be implemented by one or more processors, which may be processors in computing devices or in mobile devices (e.g., cell phones, cameras, drones, etc.). The processor may be any kind of processor, which is not limited in this application. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. The system 100 may also include one or more memories. The memory may be used to store instructions and data, such as computer-executable instructions implementing aspects of embodiments of the present application, the data 102 to be processed, the processed data 108, and the like. The memory may be any kind of memory, which is not limited in the embodiments of the present application.
The data to be encoded may include text, images, graphical objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and so forth. In some cases, the data to be encoded may include information from the user, e.g., biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.
Fig. 2 is a schematic diagram of a video coding framework according to an embodiment of the present application. As shown in fig. 2, after the video to be encoded is received, each of its frames is encoded in turn, starting from the first frame. The current frame to be encoded mainly passes through prediction (Prediction), transform (Transform), quantization (Quantization), entropy coding (Entropy Coding), and similar processing, and the code stream of the current frame is finally output. Correspondingly, the decoding process generally applies the inverse of the above process to the received code stream to recover the video frame information.
Specifically, as shown in fig. 2, the video coding framework 2 includes a coding control module 201 for performing decision control actions and parameter selection during the coding process. For example, as shown in fig. 2, the coding control module 201 controls the parameters used in transformation, quantization, inverse quantization, and inverse transformation, controls the selection of intra-frame or inter-frame mode, and controls the parameters of motion estimation and filtering. The control parameters of the coding control module 201 are also input into the entropy coding module and encoded to form part of the coded stream.
When the current frame is coded, it is first partitioned (202): the frame is divided into slices, which are in turn divided into blocks. Optionally, in one example, the frame is divided into a plurality of mutually non-overlapping largest Coding Tree Units (CTUs), each of which may be further iteratively divided into a series of smaller Coding Units (CUs) in a quadtree, binary-tree, or ternary-tree manner. In some examples, Prediction Units (PUs) and Transform Units (TUs) are each obtained from a CU by division into one or more blocks, where a PU comprises multiple Prediction Blocks (PBs) and associated syntax elements. In some examples, the PU and TU may be the same, or may be derived from the CU by different partitioning methods. In some examples, at least two of the CU, PU, and TU are the same; for example, without distinguishing the CU, PU, and TU, prediction, quantization, and transformation are all performed in units of CUs. For convenience of description, a CTU, CU, or other data unit so formed is hereinafter referred to as a coding block.
It should be understood that in the embodiments of the present application, the data unit for video coding may be a frame, a slice, a coding tree unit, a coding block or a group of any of the above. The size of the data units may vary in different embodiments.
Specifically, as shown in fig. 2, after the encoded frame is divided into a plurality of encoded blocks, a prediction process is performed to remove redundant information in spatial domain and temporal domain of the current encoded frame. The currently used prediction coding methods include intra-frame prediction and inter-frame prediction. Intra-frame prediction uses only the reconstructed information in the current frame image to predict the current coding block, while inter-frame prediction uses information in other frame images (also called reference frames) that have been reconstructed before to predict the current coding block. Specifically, in the embodiment of the present application, the encoding control module 202 is configured to decide to select intra prediction or inter prediction.
When the intra-frame prediction mode is selected, intra prediction 203 proceeds as follows: reconstructed blocks of coded neighboring blocks around the current coding block are obtained as reference blocks; based on the pixel values of the reference blocks, a prediction mode method is used to calculate predicted values and generate a prediction block; the corresponding pixel values of the prediction block are subtracted from those of the current coding block to obtain the residual of the current coding block; and the residual is transformed 204, quantized 205, and entropy coded 210 to form the code stream of the current coding block. After all coding blocks of the current frame have gone through this coding process, they form part of the coded stream of the frame. In addition, the control and reference data generated in intra prediction 203 are also entropy coded 210 and form part of the coded stream.
In particular, the transform 204 is used to remove the correlation of the residual of the image block, in order to improve coding efficiency. For transforming the residual data of the current coding block, a two-dimensional Discrete Cosine Transform (DCT) or a two-dimensional Discrete Sine Transform (DST) is usually adopted: at the encoding end, the residual information of the coding block is multiplied by an N × M transform matrix and its transpose, and the transform coefficients of the current coding block are obtained after the multiplication.
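As a concrete illustration of the separable 2D transform just described, the sketch below computes coefficients = A * X * A^T with plain matrix products. The matrix A and all names here are illustrative assumptions; real codecs use integer transform matrices and fast butterfly implementations rather than floating-point triple loops.

```cpp
// Minimal sketch of the 2D transform: coeff = A * residual * A^T,
// where A is a transform matrix (e.g., a DCT basis).
#include <vector>
using Mat = std::vector<std::vector<double>>;

Mat matmul(const Mat& a, const Mat& b) {
    size_t n = a.size(), m = b[0].size(), p = b.size();
    Mat r(n, std::vector<double>(m, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            for (size_t k = 0; k < p; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat transpose(const Mat& a) {
    Mat t(a[0].size(), std::vector<double>(a.size()));
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = 0; j < a[0].size(); ++j)
            t[j][i] = a[i][j];
    return t;
}

Mat forwardTransform2D(const Mat& A, const Mat& residual) {
    return matmul(matmul(A, residual), transpose(A));  // A * X * A^T
}
```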
After the transform coefficients are generated, compression efficiency is further improved by quantization 205: the transform coefficients are quantized to obtain quantized coefficients, and entropy coding 210 is then performed on the quantized coefficients to obtain the residual code stream of the current coding block. Entropy coding methods include, but are not limited to, Context-Adaptive Binary Arithmetic Coding (CABAC).
Specifically, the coded neighboring blocks used in the intra prediction 203 process are obtained as follows: before the current coding block is coded, the residual generated when a neighboring block was coded is transformed 204, quantized 205, inverse quantized 206, and inverse transformed 207, and then added to the prediction block of that neighboring block to obtain its reconstructed block. Correspondingly, inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to recover the residual data prior to quantization and transformation.
As shown in fig. 2, when the inter prediction mode is selected, the inter prediction process includes motion estimation (ME) 208 and motion compensation (MC) 209. Specifically, motion estimation 208 is performed using reference frame images from the reconstructed video frames: the image block most similar to the current coding block is searched for in one or more reference frame images according to a certain matching criterion and taken as the matching block, and the relative displacement between the matching block and the current coding block is the Motion Vector (MV) of the current coding block. The current coding block is then motion compensated 209 based on the motion vector and the reference frame to obtain its prediction block. The corresponding pixel values of the prediction block are subtracted from the original pixel values of the coding block to obtain the residual of the coding block. The residual of the current coding block is transformed 204, quantized 205, and entropy coded 210 to form part of the coded stream of the frame. In addition, the control and reference data generated in motion compensation 209 are also entropy coded 210 and form part of the coded stream.
As shown in fig. 2, the reconstructed video frame is a video frame obtained after being filtered 211. The filtering 211 is used to reduce compression distortion such as blocking effect and ringing effect generated in the encoding process, the reconstructed video frame is used to provide a reference frame for inter-frame prediction in the encoding process, and the reconstructed video frame is output as a final decoded video after post-processing in the decoding process.
Inter prediction modes in the video coding standard may include an Advanced Motion Vector Prediction (AMVP) mode and a Merge (Merge) mode, among others.
For the AMVP mode, a Motion Vector Prediction (MVP) value is determined first. After the MVP is obtained, the starting point of motion estimation is determined from it, and a motion search is performed near that starting point. The optimal MV is obtained when the search completes; the MV determines the position of the reference block in the reference image; the residual block is obtained by subtracting the reference block from the current block; the Motion Vector Difference (MVD) is obtained by subtracting the MVP from the MV; and the MVD is transmitted to the decoding end through the code stream.
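The core arithmetic of AMVP reduces to the subtraction and addition above. The minimal sketch below, with a hypothetical MotionVector type, shows the encoder-side MVD computation and the decoder-side recovery.

```cpp
// Sketch of the AMVP relationship: the encoder signals only MVD = MV - MVP;
// the decoder recovers MV = MVP + MVD.
struct MotionVector { int x, y; };

MotionVector computeMvd(const MotionVector& bestMv, const MotionVector& mvp) {
    return { bestMv.x - mvp.x, bestMv.y - mvp.y };   // encoder side: MVD = MV - MVP
}

MotionVector recoverMv(const MotionVector& mvp, const MotionVector& mvd) {
    return { mvp.x + mvd.x, mvp.y + mvd.y };         // decoder side: MV = MVP + MVD
}
```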
For the Merge mode, an MVP is determined first and is used directly as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) is constructed first; the list contains at least one candidate MVP, and each candidate corresponds to an index. After the encoding end selects an MVP from the MVP candidate list, the index of that MVP is written into the code stream, and the decoding end can find the MVP corresponding to the index in its own MVP candidate list, thereby decoding the image block.
In order to understand the Merge mode more clearly, the operation flow of encoding using the Merge mode will be described below.
Step one: obtain an MVP candidate list;
Step two: select an optimal MVP from the MVP candidate list, and obtain the index of this MVP in the MVP candidate list;
Step three: take the MVP as the MV of the current block;
Step four: determine the position of the reference block in the reference image according to the MV;
Step five: subtract the reference block from the current block to obtain a residual block;
Step six: transmit the residual data and the index of the MVP to the decoder.
It should be understood that the above flow is just one specific implementation of the Merge mode; the Merge mode may also have other implementations. A sketch of this flow follows.
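The sketch below mirrors the six steps above. Block, MotionVector, and the helper functions are assumed to exist elsewhere; only the control flow is shown, so this is an illustration rather than a definitive implementation.

```cpp
// Minimal sketch of the Merge-mode encoding flow (steps one to six above).
#include <vector>

struct MotionVector { int x, y; };
struct Block { /* pixel data */ };

int   selectBestCandidate(const std::vector<MotionVector>& list, const Block& cur);
Block fetchReferenceBlock(const MotionVector& mv);
Block subtractBlocks(const Block& a, const Block& b);
void  transmit(const Block& residual, int mvpIndex);

void encodeMerge(const Block& current, const std::vector<MotionVector>& mergeList) {
    int idx = selectBestCandidate(mergeList, current);  // steps 1-2: list given, pick best MVP
    MotionVector mv = mergeList[idx];                   // step 3: MVP used directly as the MV
    Block ref = fetchReferenceBlock(mv);                // step 4: locate reference block
    Block residual = subtractBlocks(current, ref);      // step 5: residual = current - reference
    transmit(residual, idx);                            // step 6: send residual + MVP index (no MVD)
}
```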
In the framework of video coding based on block motion compensation, Motion Estimation (ME) is one of the most important links. Since natural object motion does not necessarily occur in units of whole pixels, the displacement may be in units of half pixels (1/2 pixel), 1/4 pixel, or even 1/8 pixel. If integer-pixel-precision motion estimation is still used in such cases, the search may be inaccurate, leading to large motion-compensation residual amplitudes and reduced coding efficiency. In such cases, therefore, motion estimation should be performed with sub-pixel accuracy. Sub-pixel motion estimation requires interpolating the reference image, and a good interpolation method can greatly improve motion-compensation performance.
As described above, in the current Merge mode, after an MVP is determined from the MVP candidate list and the reference block of the current block is determined based on it, the interpolation filter used for pixel interpolation of the reference block depends on the type of the determined MVP. If the determined MVP is a spatial candidate MV, the interpolation filter of that spatial MV is directly inherited. If the determined MVP is a temporal candidate MV, an 8-tap interpolation filter is used by default. If the determined MVP is a History-based Motion Vector Prediction (HMVP) candidate, the interpolation filter of that HMVP entry is directly inherited. If the determined MVP is a pairwise-average (Pairwise) candidate MV, the following steps are required: first, judge whether the two candidate MVs being averaged use the same interpolation filter; if so, use that same interpolation filter when performing pixel interpolation on the reference block; if not, use an 8-tap interpolation filter by default. Determining the interpolation filter for the reference block in this way is cumbersome and introduces unnecessary encoding and decoding burden.
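The branching just described can be summarized in code. The sketch below is a hedged reconstruction of that prior-art selection logic; all type and member names (InterpFilter, CandType, Candidate) are illustrative and not taken from any codec's source.

```cpp
// Prior-art filter selection per candidate type, as the text describes.
enum class InterpFilter { EightTap, SixTap /* , ... */ };
enum class CandType { Spatial, Temporal, Hmvp, Pairwise };

struct Candidate {
    CandType     type;
    InterpFilter inheritedFilter;          // filter recorded with the candidate MV
    InterpFilter srcFilterA, srcFilterB;   // filters of the two averaged MVs (pairwise only)
};

InterpFilter chooseFilterPriorArt(const Candidate& c) {
    switch (c.type) {
    case CandType::Spatial:  return c.inheritedFilter;       // inherit spatial MV's filter
    case CandType::Hmvp:     return c.inheritedFilter;       // inherit HMVP entry's filter
    case CandType::Temporal: return InterpFilter::EightTap;  // 8-tap by default
    case CandType::Pairwise:                                 // extra compare-and-branch step
        return (c.srcFilterA == c.srcFilterB) ? c.srcFilterA
                                              : InterpFilter::EightTap;
    }
    return InterpFilter::EightTap;
}
```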
Therefore, the following scheme provided by the embodiment of the application can reduce the complexity of encoding and decoding and save the hardware overhead.
Fig. 3 shows a schematic flow diagram of a video encoding and decoding method 300 according to an embodiment of the present application. The method 300 may be used on the encoding side as well as on the decoding side. As shown in fig. 3, the method 300 includes the following steps:
in step S310, a motion vector candidate list is acquired, and a motion vector is selected from the motion vector candidate list.
In step S320, a reference block of the current block is determined according to the selected motion vector.
In step S330, a default interpolation filter is used to interpolate pixels of the reference block.
In the method 300, the MVP candidate list is obtained and an MVP is determined from it to predict the position of the reference block of the current block. Once the position of the reference block is obtained, no cumbersome process of selecting and judging the type of interpolation filter is needed; instead, a default interpolation filter is directly used to perform pixel interpolation on the reference block. This shortens the encoding and decoding flow, reduces encoding and decoding complexity without reducing coding efficiency, and saves hardware overhead.
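Contrasted with the prior-art sketch above, the filter selection of method 300 collapses to a single unconditional choice. Fixing the default to an 8-tap filter below is purely illustrative; as discussed later, a 6-tap filter is equally possible.

```cpp
// Under method 300, the entire prior-art branching reduces to one line.
enum class InterpFilter { EightTap, SixTap };
constexpr InterpFilter kDefaultFilter = InterpFilter::EightTap;  // or SixTap

InterpFilter chooseFilterProposed() {
    return kDefaultFilter;  // no inheritance, no comparison, no branching
}
```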
In order to more clearly understand the present application, a specific implementation of the embodiments of the present application will be described below. It is to be understood that the following description may apply to the above method 300.
The motion vector candidate list (i.e., the MVP candidate list) in the embodiment of the present application may include spatial motion vector candidates, temporal motion vector candidates, HMVP motion vector candidates, pairwise motion vector candidates, and zero motion vectors; some or all of these motion vectors form the motion vector candidate list in the embodiment of the present application. The construction of these candidate motion vectors and of the motion vector candidate list is briefly described below.
In the embodiment of the present application, the spatial candidate motion vectors may be as shown in fig. 4, where the gray block in fig. 4 is the current coding unit (i.e., the current block), A1 denotes the coding unit to the left of the current coding unit, B1 the coding unit above it, B0 and A0 the coding units nearest to its top-right and bottom-left corners, respectively, and B2 the coding unit nearest to its top-left corner. Illustratively, the spatial domain provides at most 4 candidate MVs, selected in the order A1 -> B1 -> B0 -> A0 -> (B2), where B2 serves as a substitute.
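A minimal sketch of this fill order follows. isAvailable() and neighbourMv() are assumed helpers keyed by the position labels of fig. 4, and the cap of 4 spatial candidates comes from the paragraph above.

```cpp
// Collect spatial candidates in the order A1 -> B1 -> B0 -> A0, with B2 as a substitute.
#include <string>
#include <vector>

struct MotionVector { int x, y; };
bool         isAvailable(const std::string& pos);
MotionVector neighbourMv(const std::string& pos);

void addSpatialCandidates(std::vector<MotionVector>& list) {
    const std::string order[] = { "A1", "B1", "B0", "A0" };
    for (const auto& pos : order)
        if (list.size() < 4 && isAvailable(pos))
            list.push_back(neighbourMv(pos));
    if (list.size() < 4 && isAvailable("B2"))   // B2 fills in only when needed
        list.push_back(neighbourMv("B2"));
}
```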
In an embodiment of the present application, the temporal candidate motion vector may be derived from the motion information of the coding unit at the position corresponding to the current coding unit in a neighboring coded picture. Unlike the spatial case, the temporal candidate cannot directly reuse the motion information of the candidate block; it needs to be scaled according to the positional relationship with the reference image. Illustratively, at most one temporal candidate motion vector is provided. As shown in fig. 5, the gray block is the current coding unit (i.e., the current block), and the candidate motion vector is obtained from the position C0 in the previous frame (forward prediction) or the next frame (backward prediction) of the current frame; if C0 is not available, the motion vector of the coding unit at position C1 is adopted as the candidate.
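The patent does not spell out the scaling formula. The sketch below shows the proportional scaling by picture distance that such temporal candidates commonly use (tb: distance from the current picture to its reference; td: distance from the co-located picture to its reference); the rounding rules of a real standard are more elaborate.

```cpp
// Hedged sketch of temporal MV scaling: mv_cur = mv_col * tb / td.
struct MotionVector { int x, y; };

MotionVector scaleTemporalMv(const MotionVector& colMv, int tb, int td) {
    if (td == 0) return colMv;                        // degenerate case: no scaling
    return { colMv.x * tb / td, colMv.y * tb / td };  // stretch by distance ratio
}
```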
In an embodiment of the present application, the HMVP candidate motion vectors store the motion information of previously coded blocks. In Merge mode, if the motion vector candidate list is not yet full after the spatial and temporal candidates have been added, it is filled with HMVP candidates: the HMVP candidates storing the motion information of coded blocks are filled into the list in order of distance from the current coding block, from near to far.
In an embodiment of the present application, when the motion vector candidate list is still not full after the spatial, temporal, and HMVP candidates have been added, the list may be padded in pairwise mode. The pairwise mode averages at least two existing candidate motion vectors in the list (for example, the first two) to obtain a new candidate motion vector, which is filled into the list. If one of the two candidates has no motion vector in the forward or backward reference direction, only the one that has a motion vector is used to fill that direction; if neither of the two candidates has a motion vector in a given reference direction, no pairwise candidate is filled for that direction.
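The sketch below applies this fill rule per reference direction (0 = forward, 1 = backward). Equal-weight averaging and the Candidate layout are assumptions made for illustration.

```cpp
// Build a pairwise candidate from two existing candidates, per direction.
struct MotionVector { int x, y; };
struct Candidate {
    MotionVector mv[2];     // [0] forward, [1] backward
    bool         hasMv[2];
};

bool makePairwise(const Candidate& a, const Candidate& b, Candidate& out) {
    bool any = false;
    for (int dir = 0; dir < 2; ++dir) {
        out.hasMv[dir] = a.hasMv[dir] || b.hasMv[dir];
        if (a.hasMv[dir] && b.hasMv[dir]) {          // both sides: plain average
            out.mv[dir] = { (a.mv[dir].x + b.mv[dir].x) / 2,
                            (a.mv[dir].y + b.mv[dir].y) / 2 };
        } else if (out.hasMv[dir]) {                 // one side only: use it directly
            out.mv[dir] = a.hasMv[dir] ? a.mv[dir] : b.mv[dir];
        }                                            // neither side: direction stays empty
        any = any || out.hasMv[dir];
    }
    return any;   // false: no direction could be filled, skip this candidate
}
```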
In an embodiment of the present application, if the spatial, temporal, HMVP, and pairwise candidate motion vectors still do not fill the motion vector candidate list, the list is padded with zero motion vectors until it is full.
Based on the motion vector candidate list constructed as above, the method 300 may, in step S310, acquire the motion vector candidate list and select from it a motion vector for determining the reference block of the current block. For example, an optimal motion vector may be selected from the list. Illustratively, the optimal motion vector may be selected as follows: traverse all candidate motion vectors in the list, calculate the rate-distortion cost of each, and finally select the candidate with the minimum rate-distortion cost as the optimal motion vector. The encoding and decoding ends build the motion vector candidate list in the same way, so the encoder only needs to transmit the index of the optimal motion vector within the list, which greatly reduces the number of bits spent on motion information.
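A minimal version of that traversal is sketched below; rdCost() is an assumed helper combining distortion with the bit cost of signalling index i.

```cpp
// Rate-distortion selection over the candidate list.
#include <limits>
#include <vector>

struct MotionVector { int x, y; };
struct Block { /* pixel data */ };
double rdCost(const Block& cur, const MotionVector& mv, int index);

int selectBestCandidate(const std::vector<MotionVector>& list, const Block& cur) {
    int best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (int i = 0; i < static_cast<int>(list.size()); ++i) {
        double cost = rdCost(cur, list[i], i);   // distortion + bits for index i
        if (cost < bestCost) { bestCost = cost; best = i; }
    }
    return best;   // only this index needs to be transmitted
}
```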
Based on the motion vector selected in step S310, the position of the reference block in the reference image (each candidate motion vector corresponds to a reference image) can be determined from the motion vector; this gives the reference block (or prediction block) of the current block. Based on the reference block obtained in step S320, pixel interpolation may be performed on it in step S330. In the embodiment of the present application, the current block may adopt a unidirectional prediction mode or a dual motion vector mode. The dual motion vector mode includes a dual forward prediction mode, a dual backward prediction mode, and a bidirectional prediction mode: the dual forward prediction mode includes two forward motion vectors, the dual backward prediction mode includes two backward motion vectors, and the bidirectional prediction mode includes one forward motion vector and one backward motion vector. When the current block adopts the unidirectional prediction mode, the number of motion vector candidate lists acquired in step S310 is one; that list may be determined based on a forward reference frame or a backward reference frame. When the current block adopts the dual motion vector mode, the number of motion vector candidate lists acquired in step S310 is two. Illustratively, for the bidirectional prediction mode, the two lists may be determined based on a forward reference frame and a backward reference frame, respectively; for the dual forward prediction mode, the two lists are each determined based on forward reference frames; and for the dual backward prediction mode, the two lists are each determined based on backward reference frames. When the current block adopts the dual motion vector mode, since two motion vector candidate lists are acquired, a motion vector is selected from each of the two lists in step S310, a reference block is determined for each selected motion vector in step S320, and pixel interpolation is performed for each determined reference block in step S330.
In the embodiment of the present application, a default interpolation filter is used to perform pixel interpolation on the reference block obtained in step S320. The default interpolation filter may be an interpolation filter with a default number of taps, or any other interpolation filter suitable for pixel interpolation of the reference block. The interpolation precision may be, for example, a sub-pixel precision such as 1/2-pixel or 1/4-pixel precision, or an integer-pixel precision such as 1-pixel, 2-pixel, or 4-pixel precision.
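For concreteness, the sketch below performs one-dimensional half-pel interpolation with an 8-tap filter. The taps {-1, 4, -11, 40, 40, -11, 4, -1}/64 are HEVC-style half-pel luma coefficients, used here only as a plausible stand-in for a default filter; the application does not fix specific coefficients.

```cpp
// Horizontal half-pel interpolation with an illustrative 8-tap filter.
#include <cstdint>

static const int kTaps8[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

std::uint8_t interpHalfPel(const std::uint8_t* row, int x) {
    int acc = 0;
    for (int k = 0; k < 8; ++k)
        acc += kTaps8[k] * row[x - 3 + k];   // taps centred between x and x + 1
    int v = (acc + 32) >> 6;                 // divide by 64 with rounding
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));  // clip to 8 bits
}
```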
In an embodiment of the present application, the motion vector selected in step S310 may be a pairwise motion vector. The pairwise motion vector is determined from at least two candidate motion vectors in the motion vector candidate list obtained in step S310 (for example, the pairwise motion vector equals an average, such as a weighted average, of the at least two candidate motion vectors), and the interpolation filters respectively corresponding to those candidates may be the same or different. Here, the interpolation filter corresponding to a candidate motion vector means the interpolation filter that was used when performing pixel interpolation on the reference block of the corresponding neighboring block of the current block; in other words, each motion vector recorded in the motion vector candidate list is the motion vector of a neighboring block of the current block, and its corresponding interpolation filter is the one used for pixel interpolation of that neighboring block's reference block. The neighboring blocks may include temporal neighboring blocks or spatial neighboring blocks: a temporal neighboring block is a block in a frame preceding or following the current frame in which the current block is located, and a spatial neighboring block is another coded block in the current frame.
In the existing method, when the selected motion vector is a pairwise motion vector, the interpolation filters corresponding to the two candidate motion vectors used to determine it must be compared: if the two interpolation filters are the same, that filter is used to perform pixel interpolation on the reference block of the current block; if they differ, an 8-tap interpolation filter is used by default. This method introduces additional comparison and judgment steps, increases the hardware burden to a certain extent, puts more pressure on the pipeline, and introduces delay. In the method 300 of the present application, when the motion vector selected in step S310 is a pairwise motion vector, these comparison and judgment operations are not performed: regardless of whether the interpolation filters corresponding to the two candidate motion vectors are the same, the default interpolation filter is directly used to perform pixel interpolation on the reference block of the current block. The logic is thereby simplified and hardware overhead is saved. The default interpolation filter may be an 8-tap interpolation filter, a 6-tap interpolation filter, or an interpolation filter with another number of taps, which is not limited in the embodiments of the present application. The 6-tap interpolation filter has a flatter characteristic, requires less bandwidth, and has lower interpolation complexity.
In an embodiment of the present application, the motion vector selected in step S310 may be a spatial candidate motion vector. In the conventional method, when the selected motion vector is a spatial candidate motion vector, the interpolation filter of that spatial candidate is inherited when performing pixel interpolation on the reference block of the current block. In the method 300 of the present application, when the motion vector selected in step S310 is a spatial candidate motion vector, no inheritance operation is performed; instead, a default interpolation filter (e.g., an 8-tap or 6-tap interpolation filter) is directly used to interpolate the pixels of the reference block of the current block. Compared with the prior art, there is no need to fetch from the storage buffer the type of interpolation filter corresponding to the spatial candidate motion vector, which shortens the processing flow and reduces encoding and decoding complexity.
In an embodiment of the present application, the motion vector selected in step S310 may be a temporal candidate motion vector. In the existing method, when the selected motion vector is a temporal candidate motion vector, an 8-tap interpolation filter is used by default when performing pixel interpolation on the reference block of the current block. In the method 300 of the present application, when the motion vector selected in step S310 is a temporal candidate motion vector, a 6-tap interpolation filter may be used instead of the 8-tap interpolation filter to interpolate the pixels of the reference block of the current block. As mentioned above, the 6-tap interpolation filter has a flatter characteristic, requires less bandwidth, and has lower interpolation complexity.
In an embodiment of the present application, the motion vector selected in step S310 may be a history-based candidate motion vector. In the conventional method, when the selected motion vector is a history-based candidate, the interpolation filter corresponding to it is directly inherited when performing pixel interpolation on the reference block. In the method 300 of the present application, when the motion vector selected in step S310 is a history-based candidate motion vector, no inheritance operation is performed; instead, a default interpolation filter (e.g., an 8-tap or 6-tap interpolation filter) is directly used to interpolate the pixels of the reference block of the current block. Compared with the prior art, there is no need to fetch from the buffer the type of interpolation filter corresponding to the history-based candidate, which shortens the processing flow and reduces encoding and decoding complexity.
In the above embodiments, whatever the type of the motion vector selected in step S310, a default interpolation filter may be used to perform pixel interpolation on the reference block of the current block, which simplifies the logic, reduces encoding and decoding complexity, and saves hardware overhead.
In yet another embodiment of the present application, the default-filter treatment may be applied only to pairwise candidates: only when the motion vector selected in step S310 is a pairwise candidate motion vector is the reference block of the current block interpolated with the default interpolation filter. This omits the original comparison and judgment logic, reducing encoding and decoding complexity and saving hardware overhead.
Based on the above description, according to the video encoding and decoding method of the embodiments of the present application, after the motion vector predictor is determined from the motion vector predictor candidate list and the reference block of the current block is determined based on it, the reference block is directly interpolated with the default interpolation filter, without cumbersome selection and judgment logic. Encoding and decoding complexity can thus be reduced without reducing coding efficiency, and hardware cost can be saved.
A video codec device provided according to another aspect of the present application is described below with reference to fig. 6. Fig. 6 shows a schematic block diagram of a video codec device 600 according to an embodiment of the present application. The video codec device 600 includes a memory 610 and a processor 620. The video encoding and decoding apparatus 600 may be implemented as an encoder, a decoder, a mobile phone, a camera, an unmanned aerial vehicle, or other apparatuses, products, or devices that can implement encoding and decoding processes.
The memory 610 stores therein programs for implementing respective steps in the video coding and decoding method according to an embodiment of the present application. The processor 620 is configured to execute the program stored in the memory 610 to perform the corresponding steps of the video coding and decoding method according to the embodiment of the present application.
In one embodiment of the present application, the program, when executed by the processor 620, causes the video codec device 600 to perform the following steps: obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list; determining a reference block of the current block according to the selected motion vector; and adopting a default interpolation filter to perform pixel interpolation on the reference block.
In an embodiment of the application, the selected motion vector is a pair-wise averaged motion vector determined from at least two candidate motion vectors of the motion vector candidate list, the at least two candidate motion vectors each having a same or different interpolation filter.
In one embodiment of the present application, the default interpolation filter is a default tap number of interpolation filters.
In one embodiment of the present application, the default tap number of interpolation filters includes an 8-tap interpolation filter or a 6-tap interpolation filter.
In one embodiment of the present application, performing pixel interpolation on the reference block with the default interpolation filter, as executed by the video codec device 600 when the program is run by the processor 620, includes: interpolating the pixels of the reference block using the default interpolation filter at 1/2-pixel precision.
In an embodiment of the application said selected motion vector is a pairwise average motion vector, said pairwise average motion vector being determined from one of said motion vector candidates in said list of motion vector candidates.
In an embodiment of the application, the selected motion vector is a temporal candidate motion vector.
In one embodiment of the present application, the selected motion vector is a spatial candidate motion vector.
In one embodiment of the application, the selected motion vector is a history-based candidate motion vector.
In one embodiment of the present application, the current block employs a unidirectional prediction mode, and the number of the motion vector candidate lists is one.
In one embodiment of the present application, the motion vector candidate list is determined based on a forward reference frame or a backward reference frame.
In one embodiment of the present application, the current block employs a bi-directional prediction mode, and the number of the motion vector candidate lists is two.
In one embodiment of the application, the two motion vector candidate lists are determined based on a forward reference frame and a backward reference frame, respectively.
In an embodiment of the application, the pair of average motion vectors is an average of the at least two candidate motion vectors.
In an embodiment of the application, the pair-wise averaged motion vector is a weighted average of the at least two candidate motion vectors.
Furthermore, according to the embodiment of the present application, there is also provided a computer-readable storage medium, on which program instructions are stored, which when executed by a computer or a processor are used for executing the corresponding steps of the video coding and decoding method of the embodiment of the present application. The computer-readable storage medium may include, for example, a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above computer-readable storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In one embodiment, the computer program instructions may, when executed by a computer, perform a video codec method according to an embodiment of the present application.
The embodiment of the present application further provides a video coding and decoding device, which includes a processor and a memory, where the memory is used to store program instructions, and the processor is used to call the program instructions to execute the video coding and decoding methods according to the various embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a computer, so that the computer executes the method of the above-mentioned method embodiments.
Embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer, cause the computer to perform the method of the above method embodiments.
It should also be understood that the various embodiments described in this specification can be implemented individually or in combination, and the examples in this application are not limited thereto.
The video encoding and decoding method, device, and computer-readable storage medium according to the embodiments of the present application have been exemplarily described above. Based on the above description, after the motion vector predictor is determined from the motion vector predictor candidate list and the reference block of the current block is determined based on it, there is no need to select and judge the type of interpolation filter in a cumbersome manner; the default interpolation filter is directly used to perform pixel interpolation on the reference block. Encoding and decoding complexity can thus be reduced without reducing coding efficiency, and hardware overhead can be saved.
Although the example embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above-described example embodiments are merely illustrative and are not intended to limit the scope of the embodiments of the present application thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the embodiments of the present application. All such changes and modifications are intended to be included within the scope of the embodiments of the present application as claimed in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the present application, various features of the present application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the embodiments of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules according to embodiments of the present application. Embodiments of the present application may also be implemented as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present application may be stored on a computer-readable storage medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the embodiments of the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The embodiments of the application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
The above description covers only specific implementations of the embodiments of the present application; the scope of the embodiments is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of the embodiments of the present application. The protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (51)

1. A video encoding and decoding method, the method comprising:
obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list;
determining a reference block of the current block according to the selected motion vector;
and performing pixel interpolation on the reference block using a default interpolation filter.
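By way of illustration, a minimal Python sketch of this flow follows; the array layout, the 1/2-pel motion vector units, and the helper names are assumptions for the example, not part of the claim:

```python
import numpy as np

def predict_block(ref_frame, x0, y0, size, candidate_list, index,
                  interp=lambda block: block):
    """Sketch of the claimed flow under assumed conventions."""
    # Obtain the motion vector candidate list and select a motion vector;
    # here the list is a plain sequence of (mv_x, mv_y) pairs in 1/2-pel units.
    mv_x, mv_y = candidate_list[index]

    # Determine the reference block of the current block: offset the block
    # position by the integer part of the selected motion vector.
    rx, ry = x0 + mv_x // 2, y0 + mv_y // 2
    ref_block = ref_frame[ry:ry + size, rx:rx + size].astype(np.int32)

    # Perform pixel interpolation on the reference block with the default
    # interpolation filter when the vector points between integer samples;
    # `interp` stands in for that filter (a 1/2-pel sketch follows claim 5).
    if mv_x % 2 or mv_y % 2:
        ref_block = interp(ref_block)
    return ref_block
```

For example, with a motion vector of (4, 0) in 1/2-pel units, the reference block is simply displaced by two integer samples and no interpolation is applied.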
2. The method according to claim 1, wherein the selected motion vector is a pairwise average motion vector determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
3. The method of claim 1 or 2, wherein the default interpolation filter is an interpolation filter with a default number of taps.
4. The method of claim 3, wherein the interpolation filter with the default number of taps comprises an 8-tap interpolation filter or a 6-tap interpolation filter.
5. The method of any of claims 1-4, wherein performing pixel interpolation on the reference block using the default interpolation filter comprises:
interpolating pixels of the reference block using the default interpolation filter at 1/2-pixel precision.
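A sketch of 1/2-pixel interpolation with an 8-tap default filter follows. The HEVC-style half-sample taps [-1, 4, -11, 40, 40, -11, 4, -1] (normalized by 64) are assumed here purely for concreteness; the claims leave the default coefficients unspecified:

```python
import numpy as np

# Assumed default 8-tap 1/2-pel coefficients (HEVC-style); they sum to 64.
DEFAULT_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.int64)

def interp_half_pel_row(samples):
    """1/2-pel values between the integer samples of one row.

    `samples` must include 3 extra samples on the left and 4 on the right
    of the positions being interpolated, since the 8-tap window spans them.
    """
    n = len(DEFAULT_TAPS)
    out = np.empty(len(samples) - n + 1, dtype=np.int64)
    for i in range(len(out)):
        # 8-sample weighted sum, scaled back by 64 and clipped to 8 bits.
        out[i] = (samples[i:i + n] * DEFAULT_TAPS).sum() >> 6
    return np.clip(out, 0, 255)
```

Since the taps sum to 64, a flat row is reproduced exactly; a 6-tap default filter would follow the same pattern with a shorter window.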
6. The method according to claim 1, wherein the selected motion vector is a pairwise average motion vector determined according to one of the candidate motion vectors in the motion vector candidate list.
7. The method of claim 1, wherein the selected motion vector is a temporal candidate motion vector.
8. The method of claim 1, wherein the selected motion vector is a spatial candidate motion vector.
9. The method of claim 1, wherein the selected motion vector is a history-based candidate motion vector.
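Claims 6-9 enumerate the kinds of entries the candidate list may hold (pairwise average, temporal, spatial, history-based). A minimal sketch of assembling such a list follows; the source order, pruning rule, and list length are assumptions, and the averaging uses the plain mean of claim 14:

```python
def build_candidate_list(spatial, temporal, history, max_len=6):
    """Assemble a motion vector candidate list from spatial neighbours,
    temporal co-located candidates, and history-based candidates, then
    append one pairwise average of the first two entries if room remains."""
    candidates = []
    for mv in list(spatial) + list(temporal) + list(history):
        if mv not in candidates:                 # simple duplicate pruning
            candidates.append(mv)
        if len(candidates) == max_len:
            return candidates
    if 2 <= len(candidates) < max_len:           # room for a pairwise average
        a, b = candidates[0], candidates[1]
        pair = ((a[0] + b[0]) // 2, (a[1] + b[1]) // 2)
        if pair not in candidates:
            candidates.append(pair)
    return candidates
```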
10. The method of any of claims 1-9, wherein the current block employs a uni-directional prediction mode, and wherein the number of motion vector candidate lists is one.
11. The method of claim 10, wherein the motion vector candidate list is determined based on a forward reference frame or a backward reference frame.
12. The method of any of claims 1-9, wherein the current block employs a dual motion vector mode, and wherein the number of motion vector candidate lists is two.
13. The method of claim 12, wherein the dual motion vector mode comprises a bi-directional prediction mode, and wherein the two motion vector candidate lists are determined based on a forward reference frame and a backward reference frame, respectively.
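One plausible realization of claims 12-13, assumed here, forms one prediction from each of the two candidate lists and combines them with a rounded average:

```python
import numpy as np

def bi_predict(pred_forward, pred_backward):
    """Combine the two predictions (one per motion vector candidate list,
    built from the forward and backward reference frames respectively)."""
    p0 = pred_forward.astype(np.int32)
    p1 = pred_backward.astype(np.int32)
    return (p0 + p1 + 1) >> 1  # rounded average of the two predictions
```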
14. The method of claim 2, wherein the pairwise average motion vector is an average of the at least two candidate motion vectors.
15. The method of claim 2, wherein the pairwise average motion vector is a weighted average of the at least two candidate motion vectors.
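Claims 14 and 15 admit a compact sketch; the tuple representation and the rounding policy are assumptions:

```python
def pairwise_average_mv(mv_a, mv_b, w_a=0.5, w_b=0.5):
    """Pairwise average motion vector from two candidate motion vectors.

    With w_a == w_b == 0.5 this is the plain average of claim 14; other
    weights give the weighted average of claim 15. Component-wise rounding
    to the motion vector's sub-pel grid is an assumed policy.
    """
    return (round(w_a * mv_a[0] + w_b * mv_b[0]),
            round(w_a * mv_a[1] + w_b * mv_b[1]))
```

For example, pairwise_average_mv((4, -2), (6, 2)) yields (5, 0).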
16. A video encoding and decoding method, the method comprising:
obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list;
determining a reference block of the current block according to the selected motion vector;
and if the selected motion vector is a pairwise average motion vector, performing pixel interpolation on the reference block using a default interpolation filter, wherein the pairwise average motion vector is determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
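The rule of claim 16 can be sketched as follows; the `Candidate` container and its fields are hypothetical and only illustrate the decision:

```python
from dataclasses import dataclass

@dataclass
class Candidate:                  # hypothetical container for one list entry
    mv: tuple                     # the candidate motion vector
    kind: str                     # e.g. "spatial", "temporal", "pairwise_average"
    interp_filter: object = None  # filter carried by this entry, if any

def select_interp_filter(candidate, default_filter):
    """If the selected motion vector is a pairwise average, its two source
    candidates may carry the same or different filters, so interpolation
    falls back to the default filter; otherwise keep the entry's own."""
    if candidate.kind == "pairwise_average" or candidate.interp_filter is None:
        return default_filter
    return candidate.interp_filter
```

This avoids having to choose between two potentially different filters inherited from the averaged candidates.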
17. The method of claim 16, wherein the default interpolation filter is an interpolation filter with a default number of taps.
18. The method of claim 17, wherein the interpolation filter with the default number of taps comprises an 8-tap interpolation filter or a 6-tap interpolation filter.
19. The method of any of claims 16-18, wherein performing pixel interpolation on the reference block using the default interpolation filter comprises:
interpolating pixels of the reference block using the default interpolation filter at 1/2-pixel precision.
20. The method of any of claims 16-19, wherein the current block employs a uni-directional prediction mode and the number of motion vector candidate lists is one.
21. The method of claim 20, wherein the motion vector candidate list is determined based on a forward reference frame or a backward reference frame.
22. The method of any of claims 16-19, wherein the current block employs a dual motion vector mode and the number of motion vector candidate lists is two.
23. The method of claim 22, wherein the dual motion vector mode comprises a bi-directional prediction mode, and wherein the two motion vector candidate lists are determined based on a forward reference frame and a backward reference frame, respectively.
24. The method according to any of claims 16-23, wherein the pairwise average motion vector is an average of the at least two candidate motion vectors.
25. The method according to any of claims 16-23, wherein the pairwise average motion vector is a weighted average of the at least two candidate motion vectors.
26. A video encoding and decoding apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor, by reading the computer program stored in the memory, performs the following operations:
obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list;
determining a reference block of the current block according to the selected motion vector;
and performing pixel interpolation on the reference block using a default interpolation filter.
27. The apparatus of claim 26, wherein the selected motion vector is a pairwise average motion vector, the pairwise average motion vector is determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
28. The apparatus of claim 26 or 27, wherein the default interpolation filter is an interpolation filter with a default number of taps.
29. The apparatus of claim 28, wherein the interpolation filter with the default number of taps comprises an 8-tap interpolation filter or a 6-tap interpolation filter.
30. The apparatus according to any of claims 26-29, wherein performing pixel interpolation on the reference block using the default interpolation filter comprises:
interpolating pixels of the reference block using the default interpolation filter at 1/2-pixel precision.
31. The apparatus of claim 26, wherein the selected motion vector is a pairwise average motion vector determined according to one of the candidate motion vectors in the motion vector candidate list.
32. The apparatus of claim 26, wherein the selected motion vector is a temporal candidate motion vector.
33. The apparatus of claim 26, wherein the selected motion vector is a spatial candidate motion vector.
34. The apparatus of claim 26, wherein the selected motion vector is a history-based candidate motion vector.
35. The apparatus according to any of claims 26-34, wherein the current block employs a uni-directional prediction mode, and the number of motion vector candidate lists is one.
36. The apparatus of claim 35, wherein the motion vector candidate list is determined based on a forward reference frame or a backward reference frame.
37. The apparatus of any of claims 26-34, wherein the current block employs a dual motion vector mode and the number of motion vector candidate lists is two.
38. The apparatus of claim 37, wherein the dual motion vector mode comprises a bi-directional prediction mode, and wherein the two motion vector candidate lists are determined based on a forward reference frame and a backward reference frame, respectively.
39. The apparatus of claim 27, wherein the pairwise average motion vector is an average of the at least two candidate motion vectors.
40. The apparatus of claim 27, wherein the pairwise average motion vector is a weighted average of the at least two candidate motion vectors.
41. A video encoding and decoding apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor, by reading the computer program stored in the memory, performs the following operations:
obtaining a motion vector candidate list, and selecting a motion vector from the motion vector candidate list;
determining a reference block of the current block according to the selected motion vector;
and if the selected motion vector is a pairwise average motion vector, performing pixel interpolation on the reference block using a default interpolation filter, wherein the pairwise average motion vector is determined according to at least two candidate motion vectors in the motion vector candidate list, and the interpolation filters corresponding to the at least two candidate motion vectors are the same or different.
42. The apparatus of claim 41, wherein the default interpolation filter is an interpolation filter with a default number of taps.
43. The apparatus of claim 42, wherein the interpolation filter with the default number of taps comprises an 8-tap interpolation filter or a 6-tap interpolation filter.
44. The apparatus according to any of claims 41-43, wherein performing pixel interpolation on the reference block using the default interpolation filter comprises:
interpolating pixels of the reference block using the default interpolation filter at 1/2-pixel precision.
45. The apparatus of any of claims 41-44, wherein the current block employs a uni-directional prediction mode, and wherein the number of motion vector candidate lists is one.
46. The apparatus of claim 45, wherein the motion vector candidate list is determined based on a forward reference frame or a backward reference frame.
47. The apparatus of any of claims 41-46, wherein the current block employs a dual motion vector mode and the number of motion vector candidate lists is two.
48. The apparatus of claim 47, wherein the dual motion vector mode comprises a bi-directional prediction mode, and wherein the two motion vector candidate lists are determined based on a forward reference frame and a backward reference frame, respectively.
49. The apparatus according to any of claims 41-48, wherein the pairwise average motion vector is an average of the at least two candidate motion vectors.
50. The apparatus according to any of claims 41-48, wherein the pairwise average motion vector is a weighted average of the at least two candidate motion vectors.
51. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the video encoding and decoding method according to any one of claims 1-25.
CN201980034107.9A 2019-09-24 2019-09-24 Video encoding and decoding method, device and computer readable storage medium Pending CN112204977A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/107596 WO2021056210A1 (en) 2019-09-24 2019-09-24 Video encoding and decoding method and apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN112204977A (en) 2021-01-08

Family

ID=74004772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980034107.9A Pending CN112204977A (en) 2019-09-24 2019-09-24 Video encoding and decoding method, device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112204977A (en)
WO (1) WO2021056210A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101213842A (en) * 2005-06-29 2008-07-02 Nokia Corporation Method and apparatus for update step in video coding using motion compensated temporal filtering
CN102714731A (en) * 2009-12-22 2012-10-03 Sony Corporation Image processing device, image processing method, and program
CN101984669A (en) * 2010-12-10 2011-03-09 Hohai University Iteration method of frame-hierarchy adaptive Wiener interpolation filter
CN105338366A (en) * 2015-10-29 2016-02-17 Beijing University of Technology Decoding method of fractional pixels in video sequences
CN109792524A (en) * 2016-10-05 2019-05-21 Qualcomm Incorporated Systems and methods of switching interpolation filters

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANISH TAMSE ET AL.: "CE4-related: Pairwise Average Candidate Reduction", JVET-M0193-v1, page 1 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113489994A (en) * 2021-05-28 2021-10-08 Hangzhou Boya Hongtu Video Technology Co., Ltd. Motion estimation method, motion estimation device, electronic equipment and medium

Also Published As

Publication number Publication date
WO2021056210A1 (en) 2021-04-01

Similar Documents

Publication Publication Date Title
TWI619374B (en) Method and apparatus of video coding with affine motion compensation
US11856220B2 (en) Reducing computational complexity when video encoding uses bi-predictively encoded frames
JP5875989B2 (en) Method and apparatus for low complexity template matching prediction for video encoding and decoding
CN111279701B (en) Video processing method and device
CN116506609B (en) Method and apparatus for signaling merge mode in video coding
CN112514392A (en) Method and apparatus for video encoding
JP2024038295A (en) Motion vector prediction for video coding
CN112154666A (en) Video coding and decoding method and device
CN114128263A (en) Method and apparatus for adaptive motion vector resolution in video coding and decoding
CN111343461B (en) Video decoding method, video encoding method and device
CN112204977A (en) Video encoding and decoding method, device and computer readable storage medium
CN112868234A (en) Motion estimation method, system and storage medium
CN114727114A (en) Method and device for determining motion vector
CN110337810B (en) Method and apparatus for video processing
CN116708802A (en) Method and apparatus for prediction related residual scaling for video coding
CN112204973A (en) Method and device for video coding and decoding
CN114009019A (en) Method and apparatus for signaling merge mode in video coding
WO2020257766A1 (en) History-based motion vector prediction for video coding
CN112534824A (en) Method and apparatus for video encoding
CN111713109B (en) Video processing method, device and equipment
CN112136329A (en) Video coding and decoding method and device
CN112204986A (en) Video coding and decoding method and device
JP2023071641A (en) Method and apparatus for coding image of video sequence and terminal device
CN114080808A (en) Method and apparatus for decoder-side motion vector refinement in video coding
CN117837142A (en) Video encoding method, apparatus and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210108