CN111343461A - Video decoding method, video encoding method and device - Google Patents

Video decoding method, video encoding method and device

Info

Publication number
CN111343461A
CN111343461A (application CN201911304627.5A)
Authority
CN
China
Prior art keywords
information
video
motion vector
vector
video block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911304627.5A
Other languages
Chinese (zh)
Other versions
CN111343461B (en)
Inventor
陈漪纹
王祥林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd filed Critical Reach Best Technology Co Ltd
Publication of CN111343461A publication Critical patent/CN111343461A/en
Application granted granted Critical
Publication of CN111343461B publication Critical patent/CN111343461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure relates to a video decoding method, a video encoding method and a video encoding device, belonging to the technical field of video compression and encoding. The video decoding method comprises: receiving a video stream that contains decoding mode information and video decoding information of a video block; when the decoding mode of the video block is determined to be MMVD, obtaining motion description information of the video block from the video decoding information, namely index information of a target candidate motion vector of the video block and description information of target vector correction information for correcting the target candidate motion vector; selecting the target candidate motion vector from a constructed merge candidate list of the video block according to the index information; correcting the target candidate motion vector according to the description information; and decoding the video block according to the corrected target candidate motion vector and pixel residual information of the video block obtained from the video decoding information.

Description

Video decoding method, video encoding method and device
This application claims priority to U.S. patent application No. 62/781,555, entitled "Motion recording with Motion Vector Difference", filed with the United States Patent and Trademark Office on December 18, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of video compression and encoding technologies, and in particular, to a video decoding method, a video encoding method, and an apparatus.
Background
An important objective of video codec techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Ultimate Motion Vector Expression (UMVE) was adopted into Versatile Video Coding (VVC) and integrated into the reference software VTM-3.0. UMVE was later renamed Merge Mode with Motion Vector Difference (MMVD). Through the proposed motion vector expression method, MMVD is used in skip mode and merge mode.
MMVD reuses the merge candidate list as used in VVC. From the merge candidate list, the candidate motion vector with the best coding performance may be selected as the best candidate motion vector (also called the target candidate motion vector), and this best candidate motion vector may be further extended by the proposed motion vector expression method, which comprises the starting point, the motion magnitude, and the motion direction of the motion vector difference (MVD).
In the related art, when encoding and decoding with MMVD, the encoding end and the decoding end agree in advance on the motion magnitudes and motion directions used to express MVDs. The MVD of every video block in the video sequence is then expressed with these agreed magnitudes and directions. In practice, however, MVDs differ greatly between video blocks in a video sequence, and the pre-agreed magnitudes and directions can hardly express the MVD of each video block accurately. When the MVD expression of a video block is inaccurate, the pixel residual information of the video block is large, and the coding efficiency is correspondingly low.
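To make the limitation concrete, the pre-agreed tables can be sketched as follows. This is an illustrative sketch only; the specific values follow the commonly cited VVC-style MMVD design and are assumptions, not taken from this disclosure:

```python
# Assumed VVC-style MMVD tables (not defined in this disclosure): a fixed set
# of motion magnitudes (in luma samples) and four axis-aligned directions.
MMVD_DISTANCES = [0.25, 0.5, 1, 2, 4, 8, 16, 32]
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # +x, -x, +y, -y

def mvd_from_indices(distance_idx, direction_idx):
    """Build the motion vector difference (MVD) from two signalled indices."""
    magnitude = MMVD_DISTANCES[distance_idx]
    dx, dy = MMVD_DIRECTIONS[direction_idx]
    return (dx * magnitude, dy * magnitude)
```

Because only 8 x 4 = 32 offsets are expressible under this design, any true MVD off this grid must be absorbed by the pixel residual, which is the inefficiency the disclosure addresses.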
Disclosure of Invention
The present disclosure provides a video decoding method, a video encoding method and an apparatus thereof, so as to at least solve the problem of low encoding efficiency when using MMVD in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video decoding method, including:
receiving a video stream, wherein the video stream comprises decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block;
when the decoding mode of the video block is determined to be a merging mode with a motion vector difference according to the decoding mode information, obtaining motion description information of the video block from the video decoding information, wherein the motion description information comprises index information of a target candidate motion vector of the video block and description information of target vector correction information for correcting the target candidate motion vector;
selecting the target candidate motion vector from the constructed merging candidate list of the video block according to the index information of the target candidate motion vector;
correcting the selected target candidate motion vector according to the description information of the target vector correction information;
and decoding the video block according to the corrected target candidate motion vector and the pixel residual information of the video block acquired from the video decoding information.
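The decoding steps of the first aspect can be sketched end to end as follows. This is a hypothetical illustration: the dictionary field names and the simple additive correction are assumptions, not the actual bitstream syntax of the disclosure:

```python
def decode_mmvd_block(video_decoding_info, merge_candidate_list):
    """Sketch of the first-aspect steps for one video block (assumed layout)."""
    # Obtain the motion description information from the video decoding info.
    index = video_decoding_info["candidate_index"]         # index of target MV
    correction = video_decoding_info["vector_correction"]  # described MVD
    residual = video_decoding_info["pixel_residual"]

    # Select the target candidate motion vector from the merge candidate list.
    mv = merge_candidate_list[index]

    # Correct the selected target candidate motion vector.
    corrected_mv = (mv[0] + correction[0], mv[1] + correction[1])

    # The corrected MV and the residual together decode the block.
    return corrected_mv, residual
```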
In a possible implementation manner, before selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information of the target candidate motion vector, the method further includes:
and for each candidate motion vector in the merge candidate list, if another candidate motion vector whose difference from the candidate motion vector is smaller than a preset value exists in the merge candidate list, deleting the candidate motion vector from the merge candidate list.
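A minimal sketch of this pruning step; the difference metric (L1 distance here) and the comparison against already-kept candidates are assumptions, since the text only requires a difference smaller than a preset value:

```python
def prune_merge_list(candidates, preset_value):
    """Drop a candidate MV when a near-identical one has already been kept
    (difference measured here as L1 distance, an assumed metric)."""
    kept = []
    for mv in candidates:
        if any(abs(mv[0] - k[0]) + abs(mv[1] - k[1]) < preset_value for k in kept):
            continue  # an almost-identical candidate already exists; delete this one
        kept.append(mv)
    return kept
```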
In a possible implementation manner, before selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information of the target candidate motion vector, the method further includes:
adjusting a candidate motion vector determined in a bi-predictive manner in the merge candidate list to the head of the merge candidate list.
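This reordering can be sketched as a stable partition; the `is_bipred` predicate is a hypothetical stand-in for however a candidate's prediction mode is recorded:

```python
def reorder_bipred_first(candidates, is_bipred):
    """Move bi-predictively determined candidates to the head of the merge
    candidate list, preserving the relative order within each group."""
    bipred = [mv for mv in candidates if is_bipred(mv)]
    others = [mv for mv in candidates if not is_bipred(mv)]
    return bipred + others
```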
In a possible implementation, modifying the selected target candidate motion vector according to the description information of the target vector modification information includes:
if the description information comprises motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and appointed motion direction description information;
if the description information comprises motion direction description information, correcting the target candidate motion vector according to the motion direction description information and appointed motion amplitude description information;
and if the description information comprises motion direction description information and motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and the motion direction description information.
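The three cases above can be sketched in one function; the pre-agreed fallback direction and magnitude are placeholders (any agreed values would do):

```python
def correct_mv(mv, direction=None, magnitude=None,
               agreed_direction=(1, 0), agreed_magnitude=1):
    """Correct the target candidate MV. A component missing from the
    description information falls back to the pre-agreed value (assumed here)."""
    d = direction if direction is not None else agreed_direction
    m = magnitude if magnitude is not None else agreed_magnitude
    return (mv[0] + d[0] * m, mv[1] + d[1] * m)
```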
In a possible implementation manner, before modifying the selected target candidate motion vector according to the description information of the target vector modification information, the method further includes:
and if the target candidate motion vector is determined to be an intra block copy vector, rounding the motion amplitude value described by the motion amplitude description information.
In a possible implementation manner, after the modifying the selected target candidate motion vector according to the description information of the target vector modification information, the method further includes:
and if the target candidate motion vector is determined to be an intra block copy vector, rounding the coordinate value of the corrected target candidate motion vector in the rectangular coordinate system.
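The post-correction rounding for intra block copy vectors can be sketched as follows (round-to-nearest is an assumption; the disclosure does not fix the rounding rule):

```python
def round_for_ibc(mv, is_intra_block_copy):
    """Intra block copy vectors must address whole pixels, so the corrected
    vector's coordinates are rounded to integers; other vectors pass through."""
    if not is_intra_block_copy:
        return mv
    return (round(mv[0]), round(mv[1]))
```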
In a possible implementation manner, if the prediction mode of the video block is bi-prediction, decoding the video block according to the modified target candidate motion vector and pixel residual information of the video block obtained from the video decoding information includes:
acquiring first pixel information of the video block from a first reference video frame and second pixel information of the video block from a second reference video frame according to the corrected target candidate motion vector, wherein the video frame in which the video block is located is between the first reference video frame and the second reference video frame;
according to the pixel weight information, carrying out weighting processing on corresponding pixels in the first pixel information and the second pixel information to obtain third pixel information of the video block, wherein the pixel weight information is received through signaling or is agreed in advance;
and correcting the third pixel information of the video block according to the pixel residual information of the video block to obtain the decoded video block.
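The bi-prediction steps above can be sketched per pixel; equal weights are assumed where no weight information is signalled:

```python
def bipredict(pixels_ref0, pixels_ref1, weight0=0.5, weight1=0.5):
    """Weight corresponding pixels fetched from the two reference frames."""
    return [p0 * weight0 + p1 * weight1
            for p0, p1 in zip(pixels_ref0, pixels_ref1)]

def reconstruct(predicted, residual):
    """Correct the weighted prediction with the pixel residual information."""
    return [p + r for p, r in zip(predicted, residual)]
```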
According to a second aspect of the embodiments of the present disclosure, there is provided a video encoding method, including:
acquiring a video sequence;
when the coding mode of any video block in the video sequence is a merging mode with a motion vector difference, combining vector correction information in a vector correction information base and a candidate motion vector in a built merging candidate list of the video block, wherein the vector correction information base is generated according to a preset motion amplitude and a preset motion direction;
determining a target combination according to the coding performance corresponding to each combination;
generating motion description information of the video block according to a target candidate motion vector and target vector correction information in the target combination, wherein the motion description information comprises index information of the target candidate motion vector and description information of the target vector correction information, and determining pixel residual information of the video block when the video block is coded by the target combination;
and sending a video stream, wherein the video stream comprises decoding mode information and video decoding information of the video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
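The encoder-side search of the second aspect can be sketched as an exhaustive pairing; `cost_fn` is a hypothetical stand-in for the encoder's coding-performance measure (e.g. a rate-distortion cost):

```python
def choose_target_combination(merge_candidates, correction_bank, cost_fn):
    """Pair every candidate MV with every vector correction and keep the
    combination whose corrected vector has the best (lowest) coding cost."""
    best, best_cost = None, float("inf")
    for cand_idx, mv in enumerate(merge_candidates):
        for corr in correction_bank:
            corrected = (mv[0] + corr[0], mv[1] + corr[1])
            cost = cost_fn(corrected)
            if cost < best_cost:
                best, best_cost = (cand_idx, corr), cost
    return best  # (index of target candidate MV, target vector correction)
```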
In a possible implementation manner, before combining the vector modification information in the vector modification information base and the candidate motion vector in the constructed merge candidate list of the video block, the method further includes:
and for each candidate motion vector in the merge candidate list, if another candidate motion vector whose difference from the candidate motion vector is smaller than a preset value exists in the merge candidate list, deleting the candidate motion vector from the merge candidate list.
In a possible implementation manner, before combining the vector modification information in the vector modification information base and the candidate motion vector in the constructed merge candidate list of the video block, the method further includes:
adjusting a candidate motion vector determined in a bi-predictive manner in the merge candidate list to the head of the merge candidate list.
In a possible implementation, combining the vector modification information in the vector modification information base and the candidate motion vector in the constructed merge candidate list of the video block includes:
for each candidate motion vector in the merging candidate list, if the candidate motion vector is an intra block copy vector, selecting vector correction information with a motion amplitude value as an integer from the vector correction information base and combining the candidate motion vector; and
and if the candidate motion vector is not the intra block copy vector, selecting all vector correction information from the vector correction information base and combining the candidate motion vector.
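A sketch of this filtering step; representing each correction by its (dx, dy) offset and testing both components for integrality is an assumption, consistent with integer motion magnitudes along axis-aligned directions:

```python
def corrections_for_candidate(correction_bank, is_intra_block_copy):
    """IBC candidates combine only with integer-valued corrections; all other
    candidates combine with the full vector correction information base."""
    if not is_intra_block_copy:
        return list(correction_bank)
    return [c for c in correction_bank
            if float(c[0]).is_integer() and float(c[1]).is_integer()]
```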
In a possible implementation, the transmission level of the description information of the target vector modification information is a video sequence level, a video frame level, a slice level, or a video block level.
In a possible implementation, the description information of the target vector modification information includes motion direction description information and/or motion amplitude description information.
In a possible implementation manner, when the description information of the target vector modification information includes motion direction description information and motion amplitude description information, the transmission levels of the motion direction description information and the motion amplitude description information are the same or different.
According to a third aspect of the embodiments of the present disclosure, there is provided a video decoding apparatus comprising:
the video decoding device comprises a receiving module, a decoding module and a decoding module, wherein the receiving module is configured to receive a video stream, the video stream comprises decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block;
an obtaining module configured to perform, when it is determined that the decoding manner of the video block is the merge mode having a motion vector difference according to the decoding manner information, obtaining motion description information of the video block from the video decoding information, the motion description information including index information of a target candidate motion vector of the video block and description information of target vector modification information for modifying the target candidate motion vector;
a selection module configured to select the target candidate motion vector from the constructed merge candidate list of the video block according to index information of the target candidate motion vector;
a correction module configured to perform correction of the selected target candidate motion vector according to description information of the target vector correction information;
a decoding module configured to perform decoding of the video block according to the modified target candidate motion vector and pixel residual information of the video block acquired from the video decoding information.
In a possible implementation, the apparatus further includes an adjusting module:
the adjusting module is configured to perform, for each candidate motion vector in the merge candidate list before selecting the target candidate motion vector from the built merge candidate list of the video block according to the index information of the target candidate motion vector, deleting the candidate motion vector from the merge candidate list if it is determined that a candidate motion vector whose difference value with the candidate motion vector is smaller than a preset value exists in the merge candidate list.
In a possible implementation, the apparatus further includes an adjusting module:
the adjusting module is configured to adjust the candidate motion vector determined in a bidirectional prediction mode in the merge candidate list to the head of the merge candidate list before selecting the target candidate motion vector from the built merge candidate list of the video block according to the index information of the target candidate motion vector.
In a possible implementation, the modification module is specifically configured to perform:
if the description information comprises motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and appointed motion direction description information;
if the description information comprises motion direction description information, correcting the target candidate motion vector according to the motion direction description information and appointed motion amplitude description information;
and if the description information comprises motion direction description information and motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and the motion direction description information.
In a possible implementation, the apparatus further includes a first rounding module:
the first rounding module is configured to perform rounding processing on the motion amplitude value described by the motion amplitude description information if the target candidate motion vector is determined to be an intra block copy vector before the selected target candidate motion vector is corrected according to the description information of the target vector correction information.
In a possible implementation manner, the apparatus further includes a second rounding module:
and the second rounding module is configured to perform, after the selected target candidate motion vector is corrected according to the description information of the target vector correction information, rounding the coordinate value of the corrected target candidate motion vector in the rectangular coordinate system if the target candidate motion vector is determined to be an intra block copy vector.
In a possible implementation manner, if the prediction mode of the video block is bi-prediction, the decoding module is specifically configured to perform:
acquiring first pixel information of the video block from a first reference video frame and second pixel information of the video block from a second reference video frame according to the corrected target candidate motion vector, wherein the video frame in which the video block is located is between the first reference video frame and the second reference video frame;
according to the pixel weight information, carrying out weighting processing on corresponding pixels in the first pixel information and the second pixel information to obtain third pixel information of the video block, wherein the pixel weight information is received through signaling or is agreed in advance;
and correcting the third pixel information of the video block according to the pixel residual information of the video block to obtain the decoded video block.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a video encoding apparatus comprising:
an acquisition module configured to perform acquiring a video sequence;
the combination module is configured to perform combination on vector correction information in a vector correction information base and candidate motion vectors in a constructed combination candidate list of the video blocks when the coding mode of any video block in the video sequence is a combination mode with a motion vector difference, wherein the vector correction information base is generated according to a preset motion amplitude and a preset motion direction;
a determining module configured to determine a target combination according to the coding performance corresponding to each combination;
an encoding module configured to perform generating motion description information of the video block according to a target candidate motion vector and target vector modification information in the target combination, the motion description information including index information of the target candidate motion vector and description information of the target vector modification information, and determining pixel residual information of the video block when the video block is encoded by the target combination;
and the sending module is configured to execute sending of a video stream, wherein the video stream contains decoding mode information and video decoding information of the video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
In a possible implementation, the apparatus further includes an adjusting module:
the adjusting module is configured to perform, for each candidate motion vector in the merge candidate list before combining the vector correction information in the vector correction information base and the constructed candidate motion vector in the merge candidate list of the video block, deleting the candidate motion vector from the merge candidate list if it is determined that a candidate motion vector whose difference value with the candidate motion vector is smaller than a preset value exists in the merge candidate list.
In a possible implementation, the apparatus further includes an adjusting module:
the adjusting module is configured to perform adjusting the candidate motion vector determined in a bidirectional prediction mode in the merge candidate list to the head of the merge candidate list before combining the vector correction information in the vector correction information base and the constructed candidate motion vector in the merge candidate list of the video block.
In a possible embodiment, the combining module is specifically configured to perform:
for each candidate motion vector in the merging candidate list, if the candidate motion vector is an intra block copy vector, selecting vector correction information with a motion amplitude value as an integer from the vector correction information base and combining the candidate motion vector; and
and if the candidate motion vector is not the intra block copy vector, selecting all vector correction information from the vector correction information base and combining the candidate motion vector.
In a possible implementation, the transmission level of the description information of the target vector modification information is a video sequence level, a video frame level, a slice level, or a video block level.
In a possible implementation, the description information of the target vector modification information includes motion direction description information and/or motion amplitude description information.
In a possible implementation manner, when the description information of the target vector modification information includes motion direction description information and motion amplitude description information, the transmission levels of the motion direction description information and the motion amplitude description information are the same or different.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is capable of performing any one of the above methods.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a computer, causes the computer to perform any of the methods described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
receiving a video stream, wherein the video stream contains decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block; when the decoding mode of the video block is determined, according to the decoding mode information, to be the merge mode with motion vector difference, obtaining the motion description information of the video block from the video decoding information, the motion description information comprising index information of a target candidate motion vector of the video block and description information of target vector correction information used for correcting the target candidate motion vector; selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information; correcting the target candidate motion vector according to the description information of the target vector correction information; and decoding the video block according to the corrected target candidate motion vector and the pixel residual information of the video block obtained from the video decoding information. Because the motion description information of a video block is transmitted directly in the video stream, the expression of the motion information of the video block is no longer limited to pre-agreed magnitudes and directions, so the motion description of the video block is richer and more accurate. With a more accurate motion description, the pixel residual information that needs to be transmitted for the video block can be reduced; that is, the bit rate used for transmitting the pixel residual information is lower, and the coding efficiency can therefore be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a block diagram illustrating an exemplary encoder according to an exemplary embodiment.
Fig. 2 is a block diagram illustrating an exemplary decoder according to an exemplary embodiment.
FIG. 3 is a diagram illustrating a search process for a UMVE in accordance with an exemplary embodiment.
FIG. 4 is a diagram illustrating search points for a UMVE in accordance with an exemplary embodiment.
FIG. 5 is a diagram illustrating search directions according to an exemplary embodiment.
FIG. 6 is a diagram illustrating a triangle prediction unit mode according to an example embodiment.
Fig. 7 is a schematic diagram illustrating the location of adjacent video blocks according to an example embodiment.
Fig. 8 is a schematic diagram illustrating an adaptive weighting process in accordance with an exemplary embodiment.
Fig. 9 is a flow chart illustrating a method of video encoding according to an example embodiment.
Fig. 10 is a flow chart illustrating a method of video decoding according to an example embodiment.
Fig. 11 is a block diagram illustrating a video encoding apparatus according to an example embodiment.
Fig. 12 is a block diagram illustrating a video decoding apparatus according to an example embodiment.
Fig. 13 is a schematic structural diagram illustrating an electronic device for implementing a video encoding method and/or a video decoding method according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Video coding typically utilizes prediction methods such as inter prediction and intra prediction to remove redundancy present in a video frame or video sequence. Currently, video coding is performed according to one or more video coding standards. Examples of video codec standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High-Efficiency Video Coding (HEVC, H.265), Advanced Video Coding (AVC, H.264), the Moving Picture Experts Group (MPEG) standards, and the like.
Conceptually, the video codec standards described above are similar. For example, these video codec standards are processed on a block basis and share similar video codec block diagrams to achieve video compression, and fig. 1 is a block diagram of a typical encoder for the video codec standards according to an exemplary embodiment.
In the encoder, a video frame is divided into a plurality of video blocks, and a prediction value of each video block is then formed based on inter prediction or intra prediction. In inter prediction, motion estimation and motion compensation are performed on pixels of a previously reconstructed frame to determine the prediction value of the video block; in intra prediction, the prediction value of the video block is determined from reconstructed pixels in the current frame. Typically there are multiple candidate prediction values for a video block, so the best prediction value may be selected through a mode decision.
Further, the prediction residual of the best prediction value (i.e., the pixel difference between the current block and its prediction value) is sent to the transform module, the transform coefficients are sent to the quantization module, and the quantized coefficients are sent to the entropy coding module to generate a compressed video bitstream. As shown in fig. 1, prediction related information from the inter and/or intra prediction modules (e.g., block partition information, motion vectors, reference picture indices, and intra prediction modes) also passes through the entropy coding module and is stored in the bitstream.
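The residual path described above can be sketched as a toy model in Python. This is an illustrative simplification only: the real standards use integer DCT/DST transforms and rate-distortion-tuned quantization, and all function names here are hypothetical.

```python
def encode_block(current, prediction, q_step=8):
    """Toy residual path: residual -> (transform omitted) -> quantization.

    A real encoder applies an integer DCT/DST before quantization; the
    transform is omitted here to keep the sketch minimal.
    """
    residual = [c - p for c, p in zip(current, prediction)]
    # Uniform scalar quantization (dead zone omitted for clarity).
    return [round(r / q_step) for r in residual]

def reconstruct_block(levels, prediction, q_step=8):
    """Decoder-side mirror: inverse quantization, then add the prediction
    to obtain the unfiltered reconstructed pixels."""
    return [p + l * q_step for l, p in zip(levels, prediction)]

levels = encode_block([10, 20, 30], [2, 4, 6], q_step=8)
recon = reconstruct_block(levels, [2, 4, 6], q_step=8)
```

Note that the encoder runs the same inverse quantization path as the decoder, which is exactly why the reconstructed (rather than original) pixels feed the prediction loop.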
To enable the decoder to reconstruct the video block correctly, the encoder also needs to mirror the information used by the corresponding modules at the decoder side. For this purpose, the encoder reconstructs the prediction residual of the video block by inverse quantization and inverse transformation, and then combines the reconstructed prediction residual with the prediction value of the video block to generate the unfiltered reconstructed pixels of the video block.
In order to improve coding efficiency and video quality, a loop filter is generally used. For example, deblocking filters are used in AVC, HEVC, and the current VVC. An additional loop filter called Sample Adaptive Offset (SAO) is defined in HEVC to further improve coding efficiency. In the recent VVC drafts, a loop filter called the Adaptive Loop Filter (ALF) is being actively studied and is highly likely to be incorporated into the final standard.
In particular implementations, the loop filters are optional, and usually turning on these loop filters helps to improve coding efficiency and video quality, but may also be turned off based on encoder decisions to save computational complexity. It should be noted that intra-prediction is typically based on non-filtered reconstructed pixels, whereas inter-prediction is based on filtered reconstructed pixels if these filter options are turned on by the encoder.
Fig. 2 is a block diagram illustrating a typical decoder for the above-described video codec standard according to an exemplary embodiment, which can be seen to be almost identical to the reconstruction related part residing in the encoder.
In the decoder, the received bitstream is first decoded by an entropy decoding module to derive quantized coefficient levels and prediction related information. Then, the quantized coefficient levels are processed by an inverse quantization module and an inverse transformation module to obtain reconstructed prediction residuals, prediction values are formed by intra prediction or motion compensation processing based on the decoded prediction information, and non-filtered reconstructed pixels are obtained by summing the reconstructed prediction residuals and the prediction values. In addition, in the case where the loop filter is turned on, a filtering operation is performed on the above-described reconstructed pixels to obtain a final reconstructed video.
UMVE (Ultimate Motion Vector Expression) has been adopted into the VVC and integrated into the reference software VTM-3.0. FIG. 3 is a diagram illustrating a UMVE search process according to an exemplary embodiment, in which a current block in the current frame is subjected to a motion search in the L0 reference frame and the L1 reference frame simultaneously. Because the L0 and L1 reference frames are in a mirror relationship, the motion search directions and motion magnitudes (+S, +2S, +3S) of the current block in the two reference frames are opposite. FIG. 4 is a diagram illustrating UMVE search points in accordance with an exemplary embodiment. Taking the L0 frame in fig. 4 as an example, the dot at the center position represents the starting point, the motion search directions include the positive x-axis direction, the negative x-axis direction, the positive y-axis direction, and the negative y-axis direction, the black dots correspond to a motion magnitude of 1 pixel, and the solid dots correspond to a motion magnitude of 2 pixels. The motion search process for the L1 frame in fig. 4 is similar and is not described again.
UMVE was later renamed MMVD (Merge with Motion Vector Difference). MMVD provides a new motion vector expression method with simplified signaling, comprising the starting point, motion magnitude, and motion direction of the MVD. MMVD also reuses the merge candidate list in VVC, but the MMVD expression is considered only when the merge type is the default merge type (MRG_TYPE_DEFAULT_N).
In MMVD, a base candidate index is used to indicate the best Motion Vector Predictor (MVP) in the merge candidate list, also called the target candidate motion vector, which serves as the starting point of the MVD. An example of candidate motion vectors is shown in Table 1.
TABLE 1 candidate motion vectors
The distance index is used to indicate the motion magnitude of the MVD from the starting point. An example of candidate motion magnitudes is shown in Table 2.
TABLE 2 candidate motion amplitude
Distance index 0 1 2 3 4 5 6 7
Pixel distance 1/4-pel 1/2-pel 1-pel 2-pel 4-pel 8-pel 16-pel 32-pel
where X-pel denotes X pixels, and X ∈ {1/4, 1/2, 1, 2, 4, 8, 16, 32}.
The direction index indicates the motion direction of the MVD relative to the starting point. An example of candidate motion directions is shown in Table 3.
TABLE 3 directions of motion candidates
Direction index 00 01 10 11
x axis + – N/A N/A
y axis N/A N/A + –
where (+, N/A) for index 00 indicates movement in the positive x-axis direction; (–, N/A) for index 01 indicates movement in the negative x-axis direction; (N/A, +) for index 10 indicates movement in the positive y-axis direction; and (N/A, –) for index 11 indicates movement in the negative y-axis direction.
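Tables 2 and 3 together define how a decoded (base candidate index, distance index, direction index) triplet yields the final motion vector. A minimal Python sketch, assuming quarter-pel MV units; the table contents mirror Tables 2 and 3, and the function names are hypothetical:

```python
# Table 2 pixel distances expressed in quarter-pel units:
# 1/4-pel -> 1, 1/2-pel -> 2, ..., 32-pel -> 128.
DISTANCE_QPEL = [1, 2, 4, 8, 16, 32, 64, 128]

# Table 3 direction signs: direction index -> (sign_x, sign_y).
DIRECTION = {0b00: (1, 0), 0b01: (-1, 0), 0b10: (0, 1), 0b11: (0, -1)}

def mmvd_motion_vector(base_mv, distance_idx, direction_idx):
    """Derive the final MV from the MMVD triplet: the starting point
    (the selected merge candidate) plus the signed magnitude."""
    sx, sy = DIRECTION[direction_idx]
    step = DISTANCE_QPEL[distance_idx]
    return (base_mv[0] + sx * step, base_mv[1] + sy * step)
```

For example, with base MV (10, −4), distance index 2 (1-pel = 4 quarter-pel), and direction index 01 (negative x), the derived MV is (6, −4).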
In a specific implementation, the encoding end sends the MMVD flag after sending the skip and merge flags. At the decoding end, if the skip or merge flag is true, the MMVD flag is parsed; if the MMVD flag equals 1, the video block is decoded using MMVD. If the MMVD flag does not equal 1, the AFFINE flag is parsed: if the AFFINE flag equals 1, the video block is decoded using the affine mode; if not, the index for the regular VTM skip/merge mode is parsed.
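The parsing order just described can be sketched as follows; `read_flag` is a hypothetical stand-in for reading one syntax element from the bitstream, and the returned mode names are illustrative only:

```python
def parse_merge_related_flags(read_flag):
    """Sketch of the decoder-side parsing order: skip/merge flag first,
    then the MMVD flag, then the AFFINE flag, then the regular
    merge/skip index."""
    if not read_flag("skip_or_merge"):
        return "other"          # block is not skip/merge coded
    if read_flag("mmvd"):
        return "mmvd"           # decode using MMVD
    if read_flag("affine"):
        return "affine"         # decode using affine mode
    return "regular_merge_skip"  # the merge/skip index is parsed next
```

A usage example: with `skip_or_merge = 1`, `mmvd = 0`, `affine = 1` the parser selects the affine mode.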
A related approach also proposes an enhanced ultimate motion vector expression that extends MMVD by:
adding direction candidates;
providing multiple motion magnitude lists;
tabulating larger, integer-pel motion magnitudes.
In VTM-3, in addition to the horizontal/vertical direction candidates, other candidate directions are used; an example of candidate motion directions is shown in Table 4.
TABLE 4 directions of motion candidates
Direction index 000 001 010 011 100 101 110 111
x axis +1 –1 0 0 +1/2 –1/2 –1/2 +1/2
y axis 0 0 +1 –1 +1/2 –1/2 +1/2 –1/2
The sets of candidate motion amplitude tables used are shown in tables 5 and 6.
TABLE 5 amplitude of first candidate motion
Distance index 0 1 2 3
Pixel distance 1/4-pel 1/2-pel 3/4-pel 5/4-pel
TABLE 6 second candidate motion amplitude
Distance index 0 1 2 3
Pixel distance 1-pel 2-pel 4-pel 8-pel
The values of the candidate motion directions and candidate motion magnitudes may be modified to conform to the VTM-3.0 coding configuration and may differ between subtests, since the candidate motion magnitude table that gives the best performance depends on which candidate motion directions are used.
Another related approach improves the final motion vector expression as follows:
(1) An additional candidate motion direction table is formed using the diagonal directions, as shown in Table 7; the search directions corresponding to Table 7 are shown schematically in fig. 5, where 00, 01, 10, and 11 indicate direction indexes. Table 8 or Table 9 may be selected as the candidate motion magnitudes depending on the candidate motion direction.
(2) An adaptive distance table based on video resolution is used: Table 8 is selected as the candidate motion magnitudes when the video resolution is not greater than 2K (i.e., 1920 × 1080); otherwise Table 9 is selected.
(3) An adaptive distance table based on frequency of occurrence is used. The motion magnitudes may be rearranged according to how frequently each magnitude was previously used, i.e., the most frequently used motion magnitude may be moved to the head of the candidate motion magnitude table.
(4) The MMVD candidate values are modified: if the motion magnitude of the MMVD is larger than a preset threshold, the motion vector of a Coding Unit (CU) in MMVD mode should have integer-pel precision instead of sub-pel precision.
TABLE 7 directions of motion candidates
TABLE 8 candidate motion amplitude
Distance index 0 1 2 3 4 5 6 7
Pixel distance 1/4-pel 1/2-pel 1-pel 2-pel 4-pel 8-pel 16-pel 32-pel
TABLE 9 candidate motion amplitude
Distance index 0 1 2 3 4 5 6 7
Pixel distance 1-pel 2-pel 4-pel 8-pel 16-pel 32-pel 64-pel 128-pel
In the above MMVD related schemes, the motion magnitudes and motion directions used to express the MVD are predefined at both the encoding end and the decoding end, and both ends subsequently use these predefined motion magnitudes and motion directions to express the MVD of every video block in the video sequence. In practice, however, the MVDs of the video blocks in a video frame differ greatly, and it is difficult to accurately express the MVD of each video block with predetermined motion magnitudes and motion directions; even if both ends adaptively adjust the positions of the pixel distances in the candidate motion magnitude table, the problem is not solved.
To this end, the embodiments of the present disclosure provide a scheme for directly transferring motion description information of a video block in a video stream. The encoding end may provide a plurality of candidate motion magnitudes and a plurality of candidate motion directions in advance; these can be combined into a plurality of possible MVDs, where each MVD is regarded as one item of vector correction information, so that the plurality of MVDs can be regarded as a vector correction information base.
Subsequently, if the encoding end determines that the encoding mode of a video block in the acquired video sequence is MMVD, it constructs a merge candidate list for the video block and then combines each item of vector correction information in the vector correction information base with each candidate motion vector in the merge candidate list. For each combination, the candidate motion vector in the combination is corrected with the vector correction information in the combination, and the encoding performance corresponding to the combination is determined from the corrected candidate motion vector. A target combination is then determined according to the encoding performance of each combination, and motion description information of the video block is generated from the target candidate motion vector and the target vector correction information in the target combination, where the motion description information includes index information of the target candidate motion vector in the merge candidate list of the video block and description information of the target vector correction information. The encoding end also determines the pixel residual information of the video block when the target combination is used to encode it, and finally sends the video stream to the decoding end, where the video stream contains decoding mode information and video decoding information of the video block, and the video decoding information includes the motion description information and pixel residual information of the video block.
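The encoder-side search over combinations described above can be sketched as follows; `cost_of` is a hypothetical stand-in for the encoding-performance (e.g., rate-distortion) measurement of a corrected motion vector, and all names are illustrative:

```python
from itertools import product

def choose_target_combination(merge_candidates, corrections, cost_of):
    """Try every (merge candidate, vector correction) pair, measure the
    coding cost of the corrected MV, and keep the best pair. The
    returned index pair is what the motion description information
    signals to the decoder."""
    best, best_cost = None, float("inf")
    for cand_idx, corr_idx in product(range(len(merge_candidates)),
                                      range(len(corrections))):
        mx, my = merge_candidates[cand_idx]
        dx, dy = corrections[corr_idx]
        cost = cost_of((mx + dx, my + dy))
        if cost < best_cost:
            best_cost, best = cost, (cand_idx, corr_idx)
    return best
```

An exhaustive loop is shown for clarity; a real encoder would prune this search for speed.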
If the decoding end and the encoding end agree to use the same candidate motion amplitude set and candidate motion direction set to express the vector correction information, the description information of the target vector correction information may be index information of the target motion amplitude and index information of the target motion direction.
If the decoding end and the encoding end do not agree on the motion amplitude and the motion direction for expressing the vector correction information, the description information of the target vector correction information may be the index information and the amplitude value of the target motion amplitude, and the index information and the direction information of the target motion direction.
If the decoding end and the encoding end agree on the motion amplitude for expressing the vector correction information, the description information of the target vector correction information may be index information of the target motion amplitude, and index information and direction information of the target motion direction.
If the decoding end and the encoding end agree on the motion direction for expressing the vector correction information, the description information of the target vector correction information may be index information of the target motion direction, and index information and amplitude value of the target motion amplitude.
In any of the above cases, the target vector correction information of the video block sent from the encoding end to the decoding end is more accurate, which can reduce the prediction residual of the video block and thus improve coding efficiency.
In addition, the transmission level of the description information of the target vector modification information may be a video sequence level, a video frame level, a slice level, or a video block level.
Here, video sequence level includes, for example, the Video Parameter Set (VPS) and the Sequence Parameter Set (SPS); video frame level includes, for example, the Picture Parameter Set (PPS); slice level includes, for example, the slice header; and video block level includes, for example, the Coding Tree Unit (CTU), CU, Prediction Unit (PU), or a smaller block.
When the same motion direction description information and different motion magnitude description information are used to express the description information of the target vector correction information of each video block in the same video frame, the motion direction description information of each video block can be sent at the video frame level, and the motion magnitude description information of each video block can be sent at the video block level. Alternatively, both the motion direction description information and the motion magnitude description information of each video block may be sent at the video block level.

When different motion direction description information and the same motion magnitude description information are used to express the description information of the target vector correction information of each video block in the same video frame, the motion magnitude description information of each video block can be sent at the video frame level, and the motion direction description information of each video block can be sent at the video block level. Again, both may instead be sent at the video block level.

Similarly, when the description information of the target vector correction information of each video block in the same video sequence is expressed using the same motion direction description information and different motion magnitude description information, or using different motion direction description information and the same motion magnitude description information, the sending level of the motion direction description information and the sending level of the motion magnitude description information of each video block may be the same or different.
That is to say, in the embodiment of the present disclosure, when the description information of the target vector correction information includes the motion direction description information and the motion magnitude description information, the transmission levels of the motion direction description information and the motion magnitude description information may be the same or different.
It should be noted that the above cases are merely examples, and do not constitute a limitation on the combination of the transmission levels of the motion direction description information and the motion magnitude description information in the embodiment of the present disclosure.
Correspondingly, the decoding end receives a video stream containing decoding mode information and video decoding information of a video block, where the video decoding information includes motion description information and pixel residual information of the video block. When the decoding mode of the video block is determined to be MMVD according to the decoding mode information, the motion description information of the video block is obtained from the video decoding information; it includes index information of the target candidate motion vector of the video block in the merge candidate list of the video block and description information of the target vector correction information used to correct the target candidate motion vector. The decoding end then selects the target candidate motion vector from the constructed merge candidate list of the video block according to the index information, corrects the selected target candidate motion vector according to the description information of the target vector correction information, and decodes the video block according to the corrected target candidate motion vector and the pixel residual information of the video block obtained from the video decoding information.
In a specific implementation, if the description information only includes motion magnitude description information, the decoding end may correct the target candidate motion vector according to the motion magnitude description information and the agreed motion direction description information; if the description information only includes motion direction description information, the target candidate motion vector may be corrected according to the motion direction description information and the agreed motion magnitude description information. The description information may also include both motion direction description information and motion magnitude description information, in which case the target candidate motion vector may be corrected directly according to the two.
The method provided by the embodiments of the present disclosure can be combined with the affine merge mode. In this combined scheme, the first available affine merge candidate motion vector is selected as the base predictor, and a motion vector offset is then applied from the base predictor to the motion vector value of each control point; neither the inter prediction direction of the selected base predictor nor the reference index of each direction is changed.
Assuming that the affine model of the current block is a 4-parameter model, only 2 control points need to be derived; thus, only the first 2 control points of the base predictor are used as control point predictors.
For each control point, a zero_MVD flag indicates whether the control point of the current block has the same MV value as the corresponding control point predictor. If the zero_MVD flag is true, no further signaling is needed for that control point; otherwise, a distance index and a direction index are signaled for the control point.
As shown in Table 10, a candidate motion magnitude table of size 5 is used, and the distance index is signaled to indicate which pixel distance the video block at the decoding end should use.
TABLE 10 candidate motion magnitudes
Distance index 0 1 2 3 4
Pixel distance 1/2-pel 1-pel 2-pel 4-pel 8-pel
Here, a simplified approach is proposed to reduce the signaling overhead: by signaling a single distance index and direction index per video block, the same offset is applied to all available control points in the same way. In this method, the number of control points is determined by the affine type of the base predictor (3 control points for the 6-parameter type and 2 control points for the 4-parameter type), and the zero_MVD flag is not used, since all control points of a block are signaled at once.
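A minimal sketch of this simplified scheme, applying the one signaled offset identically to every control point; the function name is hypothetical and MV units are left abstract:

```python
def apply_offset_to_control_points(control_point_mvs, offset):
    """Apply the single signaled (distance, direction) offset to all
    control points of the affine base predictor: 2 points for a
    4-parameter model, 3 points for a 6-parameter model."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in control_point_mvs]
```

Because one offset covers all control points, only one distance index and one direction index need to be signaled per block, which is the source of the overhead reduction.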
The method provided by the embodiment of the disclosure can also be used in combination with the triangle prediction unit mode.
The concept of the triangle prediction unit mode is to introduce new triangular partitions for motion compensated prediction. Fig. 6 is a schematic diagram of a triangle prediction unit mode according to an exemplary embodiment; fig. 6 shows a CU divided into two triangle prediction units, prediction unit 1 and prediction unit 2, along either the diagonal or the inverse diagonal direction. Each triangle prediction unit in the CU uses its own uni-directional prediction motion vector and a reference frame index derived from a uni-directional prediction candidate list for inter prediction. After the triangle prediction units are predicted, adaptive weighting is performed on the diagonal edge, and the transform and quantization process is applied to the entire CU. Note that this mode only applies to the skip mode and the merge mode.
Fig. 7 is a schematic diagram illustrating the locations of neighboring video blocks according to an exemplary embodiment. In the triangle prediction unit mode, the uni-directional prediction candidate list is composed of five uni-directional prediction motion vector candidates. It is derived from seven neighboring video blocks, including five spatial neighboring video blocks (1-5) and two temporal co-located video blocks (6-7). The motion vectors of the seven neighboring video blocks are collected and put into the uni-directional prediction candidate list in the following order: uni-directional prediction motion vectors, the L0 motion vectors of bi-directional prediction motion vectors, the L1 motion vectors of bi-directional prediction motion vectors, and the averages of the L0 and L1 motion vectors of bi-directional prediction motion vectors. If the number of candidates is less than five, zero motion vectors are added to the list.
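The collection order described above can be sketched as follows. Each neighbor is modeled as a `(uni_mv, l0_mv, l1_mv)` triple with `None` marking an unavailable component; this data layout and all names are illustrative assumptions:

```python
def build_uni_prediction_list(neighbor_motion, max_candidates=5):
    """Collect candidates in the stated order: uni-directional MVs,
    then L0 of bi-directional MVs, then L1, then the L0/L1 averages;
    zero MVs pad the list to five entries."""
    candidates = []

    def push(mv):
        if mv is not None and len(candidates) < max_candidates:
            candidates.append(mv)

    pickers = (
        lambda n: n[0],                                   # uni-directional MV
        lambda n: n[1],                                   # L0 of bi-directional
        lambda n: n[2],                                   # L1 of bi-directional
        lambda n: None if n[1] is None or n[2] is None    # L0/L1 average
                  else ((n[1][0] + n[2][0]) // 2, (n[1][1] + n[2][1]) // 2),
    )
    for pick in pickers:
        for n in neighbor_motion:
            push(pick(n))
    while len(candidates) < max_candidates:
        candidates.append((0, 0))                         # zero-MV padding
    return candidates
```

This sketch omits the pruning a real implementation would apply; it only illustrates the four-pass collection order and the zero-MV padding.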
After each triangle prediction unit is predicted, an adaptive weighting process is applied to the diagonal edge between two triangle prediction units to derive a final prediction value for the entire CU.
Two sets of weighting coefficients are listed below:
first set of weighting coefficients: {7/8, 6/8, 4/8, 2/8, 1/8} and {7/8, 4/8, 1/8} for luma and chroma samples, respectively;
second set of weighting coefficients: {7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8} and {6/8, 4/8, 2/8} are used for luma and chroma samples, respectively.
The second set of weighting coefficients may be used when the reference pictures of the two triangle prediction units differ from each other or the motion vector difference of the two triangle prediction units is greater than a preset number of pixels, such as 16 pixels; otherwise, the first set of weighting coefficients may be used.
Fig. 8 is a schematic diagram illustrating an adaptive weighting process according to an exemplary embodiment; fig. 8 shows the final prediction value of an entire CU determined using the first set of weighting coefficients, where the left diagram weights the luma of the two triangle prediction units and the right diagram weights the chroma of the two triangle prediction units.
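The weight-set selection rule above can be sketched as follows. The motion vector difference is measured here as the maximum component difference, which is an assumption, and all names are illustrative:

```python
# Luma weighting coefficient sets from the description above.
LUMA_WEIGHTS_1 = [7/8, 6/8, 4/8, 2/8, 1/8]
LUMA_WEIGHTS_2 = [7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8]

def select_weight_set(ref_pic_0, ref_pic_1, mv0, mv1, threshold_pixels=16):
    """The longer second set is used when the two triangle units
    reference different pictures or their MVs differ by more than the
    threshold; otherwise the first set is used."""
    mv_diff = max(abs(mv0[0] - mv1[0]), abs(mv0[1] - mv1[1]))
    if ref_pic_0 != ref_pic_1 or mv_diff > threshold_pixels:
        return LUMA_WEIGHTS_2
    return LUMA_WEIGHTS_1
```

Intuitively, the longer set spreads the blend over a wider band along the diagonal edge, which smooths the transition when the two predictions differ more strongly.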
The method provided by the embodiment of the present disclosure can also be used in combination with weighted average Bi-directional prediction (BWA).
In HEVC, the bi-directional predictor is generated by averaging two predictors obtained from two different reference pictures and/or using two different motion vectors. In VTM-3.0, the bi-prediction mode is extended beyond simple averaging to allow a weighted average of the two predictors, formulated as follows:

P_bi-pred = ((8 − w) * P0 + w * P1 + 4) >> 3

where P_bi-pred denotes the bi-directional predictor, P0 denotes the List0 predictor generated using a reference frame in reference list List0, P1 denotes the List1 predictor generated using a reference frame in reference list List1, and w denotes the weight of the List1 predictor.
Five weights w ∈ {−2, 3, 4, 5, 10} are allowed in weighted average bi-prediction, and the weight w is determined for each bi-predicted CU in one of two ways:
1) for a CU which is not coded in a merging mode, a coding end signals a decoding end weight index after determining a target motion vector difference;
2) for a CU coded in a merge mode, a weight index is deduced from neighboring blocks according to a base candidate index.
Weighted average bi-prediction applies to CUs with 256 or more luma samples (i.e., CU width multiplied by CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used; for non-low-delay pictures, only 3 weights are used (w ∈ {3, 4, 5}).
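The integer form of the weighted-average formula above can be written as a one-line sketch; note that w = 4 reduces it to the plain average used in HEVC:

```python
def weighted_bi_prediction(p0, p1, w):
    """P_bi-pred = ((8 - w) * P0 + w * P1 + 4) >> 3, with
    w drawn from {-2, 3, 4, 5, 10}; the +4 term rounds before the
    right shift by 3 (division by 8)."""
    return ((8 - w) * p0 + w * p1 + 4) >> 3
```

For example, with p0 = 100, p1 = 200 and w = 4 the result is the rounded average 150, while w = 10 over-weights P1 and extrapolates past it.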
When used in combination with the Adaptive Motion Vector Resolution (AMVR) mode, if the current picture is a low-delay coded picture (meaning a picture in which the image acquisition times of the reference frames used by the current picture are earlier than that of the current picture), unequal weights are tried only for 1-pel and 4-pel motion vector precision when selecting the optimal weight at the encoding end.
When used in combination with the affine mode, the selection of the optimal weight at the encoding end is performed for affine motion estimation if and only if the affine mode is selected as the current best mode.
When the two reference pictures in bi-directional prediction are the same, the selection of the optimal weight at the encoding end is not performed.
In addition, unequal weights are not searched (i.e., the selection of the optimal weight at the encoding end is skipped) when specified conditions are satisfied, depending on the Picture Order Count (POC) distance between the current picture and its reference pictures, the Quantization Parameter (QP), and the temporal layer (TemporalLayerId).
Embodiments of the present disclosure are described below with reference to specific embodiments.
In the embodiment of the present disclosure, the encoding side may adaptively determine motion description information, such as index information of a target candidate motion vector and description information of target vector modification information, of each video block according to video encoding information, such as video resolution, and explicitly transmit the motion description information in a bitstream.
The sending level of the description information of the target vector modification information may be VPS, SPS, PPS, slice header, CTU, CU, PU or smaller block level, and when the description information of the target vector modification information includes motion direction description information and motion amplitude description information, the sending levels of the motion direction description information and the motion amplitude description information may be the same or different.
In a specific implementation, the encoding end may signal the inter prediction direction of the video block to the decoding end, such as list0 (uni-directional prediction), list1 (uni-directional prediction), or bi-directional prediction, and may also signal the weight information used when the decoding end weights the list0 predictor and the list1 predictor. By signaling such additional motion information, the motion compensation process can be described more accurately, further improving coding efficiency.
With respect to the candidate motion vectors.
Several parameters need to be considered when determining the candidate motion vectors for MMVD, such as the number of candidate motion vectors and the method used to derive them.
In particular, the encoding end may explicitly send the number of candidate motion vectors at different levels in the bitstream, such as the VPS, SPS, PPS, slice header, CTU, CU, or PU. Alternatively, instead of transmitting the number of candidate motion vectors, it may be adaptively determined according to coding information agreed with the decoding end, such as the video resolution or the size of the coding block (e.g., the size of the current CU).
As for the method of deriving candidate motion vectors, the embodiments of the present disclosure propose the following several improvements:
First: bi-directional priority.
That is, candidate motion vectors determined in a bi-directional prediction manner are moved to the head of the merge candidate list.
Second: removal of similar candidates.
For each candidate motion vector in the merge candidate list, if another candidate motion vector exists in the list whose difference from it is smaller than a preset value, that candidate motion vector is deleted from the merge candidate list.
Table 11 is an example of candidate motion vectors provided by embodiments of the present disclosure.
TABLE 11 candidate motion vectors
With respect to candidate motion magnitudes.
In an embodiment of the present disclosure, several parameters determine the motion magnitudes of MMVD, including the first offset (S1) and the increment (DN) between two consecutive offsets, where DN is defined as S(N+1) − SN. The encoding end may send the number of motion magnitudes, SN, and DN at different levels in the bitstream, such as the VPS, SPS, PPS, slice header, CTU, CU, PU, or a smaller block level.
It should be noted that when the target candidate motion vector is a vector for current picture reference (CPR), also called intra block copy (a technique used in HEVC screen content coding (SCC)), the offset S1 (S1-pel) must have integer-pel precision, and the final motion vector generated by MMVD should also be rounded to integer precision.
Table 12 is an example of candidate motion magnitudes provided by embodiments of the present disclosure.
TABLE 12 motion amplitude candidates
Distance index   0       1       2       3       4       5       6       7
Pixel distance   S1-pel  S2-pel  S3-pel  S4-pel  S5-pel  S6-pel  S7-pel  S8-pel
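As an illustration of how such a magnitude table may be derived from the first offset S1 and the increment DN, the following Python sketch generates the offsets (the function name and the uniform-increment assumption are illustrative, not part of the disclosure; the integer rounding mirrors the CPR/intra-block-copy constraint described above):

```python
def build_motion_magnitudes(s1, dn, count, integer_only=False):
    """Build MMVD candidate motion magnitudes from the first offset s1
    and the increment dn between two consecutive offsets
    (dn = S(N+1) - S(N)). If integer_only is set (e.g. for a current
    picture reference / intra block copy vector), every magnitude is
    rounded to integer-pel precision."""
    magnitudes = [s1 + i * dn for i in range(count)]
    if integer_only:
        magnitudes = [round(m) for m in magnitudes]
    return magnitudes

# With S1 = 0.25 pel and DN = 0.25 pel, eight candidate magnitudes:
offsets = build_motion_magnitudes(0.25, 0.25, 8)
# offsets == [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
```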
With respect to the candidate motion direction.
In the embodiment of the present disclosure, the encoding end may transmit the index set of candidate motion directions at different levels, such as VPS, SPS, PPS, slice header, CTU, CU, PU, or smaller block level, where the candidate motion directions may be any subset of the direction sets in table 13 and table 14.
TABLE 13 candidate directions of motion
TABLE 14 candidate motion directions (diagonal)
x-axis  +  −  +  −
y-axis  +  +  −  −
In addition, in the related scheme, when the motion information of the MMVD is derived, a weight index of the BMA is inferred from neighboring video blocks based on the target candidate motion vector of the selected video block. To simplify the MMVD, the embodiments of the present disclosure propose to always average the two predicted values of the bidirectional motion vector constructed by the MMVD using equal weights.
Fig. 9 is a flowchart illustrating a video encoding method for use in a terminal or a server according to an exemplary embodiment, the flowchart including the following steps.
S901: a video sequence is acquired.
S902: and when the encoding mode of any video block in the video sequence is MMVD, constructing a merging candidate list of the video block.
In a specific implementation, the merge candidate list of the video block may be constructed from the motion vectors of the video block's neighboring video blocks and the motion vector of the corresponding video block within a reference video frame. This part can follow existing methods and is not described in detail here.
S903: and for each candidate motion vector in the merging candidate list, if determining that a candidate motion vector with a difference value smaller than a preset value exists in the merging candidate list, deleting the candidate motion vector from the merging candidate list.
Assuming that a candidate motion vector is (x1, y1) and another candidate motion vector is (x2, y2), the difference Δ p between these two candidate motion vectors is:
Δp = √((x1 − x2)² + (y1 − y2)²)
In this way, a redundancy check is performed on the candidate motion vectors in the merge candidate list of the video block, so that candidates with similar motion information are excluded from the list, improving encoding efficiency.
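The redundancy check above can be sketched in Python. The Euclidean distance is used here as the difference measure Δp, which is an assumption on our part; the threshold value and the candidate layout are illustrative:

```python
import math

def prune_similar_candidates(candidates, threshold):
    """Redundancy check: drop any merge candidate whose difference
    from an already-kept candidate is smaller than the preset value.

    candidates: list of (x, y) motion vectors.
    """
    kept = []
    for mv in candidates:
        far_enough = all(
            math.hypot(mv[0] - k[0], mv[1] - k[1]) >= threshold
            for k in kept
        )
        if far_enough:
            kept.append(mv)
    return kept

# (3.1, 4.1) differs from (3, 4) by about 0.14 < 0.5, so it is pruned:
pruned = prune_similar_candidates([(3, 4), (3.1, 4.1), (10, 0)], 0.5)
# pruned == [(3, 4), (10, 0)]
```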
S904: the candidate motion vectors determined in a bi-predictive manner in the merge candidate list are adjusted to the head of the merge candidate list.
Candidate motion vectors determined by bi-prediction tend to be more accurate. If they are moved to the head of the merge candidate list, the probability that a candidate at the head of the list is selected as the target candidate motion vector is higher; since indices at the head of the list are smaller and require fewer bits to transmit, encoding efficiency can be further improved.
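The bidirectional-priority adjustment of S904 amounts to a stable partition of the list; a Python sketch follows (the dict layout and the "bipred" flag name are assumptions for illustration):

```python
def promote_bipred_candidates(candidates):
    """Move candidates derived by bi-prediction to the head of the
    merge candidate list, preserving the relative order within each
    group (a stable partition)."""
    bipred = [c for c in candidates if c["bipred"]]
    unipred = [c for c in candidates if not c["bipred"]]
    return bipred + unipred

merge_list = [
    {"mv": (1, 0), "bipred": False},
    {"mv": (2, 2), "bipred": True},
    {"mv": (0, 3), "bipred": False},
    {"mv": (4, 1), "bipred": True},
]
reordered = promote_bipred_candidates(merge_list)
# reordered order of mvs: (2, 2), (4, 1), (1, 0), (0, 3)
```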
S905: and combining the vector correction information in the vector correction information base and the candidate motion vector in the merging candidate list of the video block.
In practical applications, a plurality of candidate motion amplitudes and candidate motion directions may be preset and combined to obtain a vector correction information base, where each piece of vector correction information in the base is an MVD.
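Building the vector correction information base as every pairing of a preset magnitude with a preset direction can be sketched as follows (names and the axis-aligned direction set are illustrative assumptions):

```python
def build_correction_base(magnitudes, directions):
    """Combine preset candidate motion amplitudes and candidate
    motion directions into a base of vector correction information;
    each entry is one MVD.

    directions: unit steps such as (1, 0), (-1, 0), (0, 1), (0, -1).
    """
    return [(m * dx, m * dy) for m in magnitudes for (dx, dy) in directions]

base = build_correction_base([1, 2], [(1, 0), (-1, 0), (0, 1), (0, -1)])
# 2 magnitudes x 4 directions = 8 MVDs
```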
In a specific implementation, the vector correction information in the base is an MVD and each candidate motion vector in the merge candidate list of the video block is an MVP. Combining the two, i.e., combining an MVD with an MVP, is equivalent to searching for possible motion points near each MVP; see the introduction of the motion search of UMVE for details.
The encoding end may directly combine every MVP in the merge candidate list with every MVD in the vector correction information base, or may first determine one MVP according to the coding performance of each MVP in the merge candidate list and then combine that MVP with the MVDs in the base. The specific manner can be chosen by a person skilled in the art according to actual requirements.
In addition, considering that the pixel value of the motion amplitude must be an integer when an intra block copy vector is used in HEVC screen content coding: for each candidate motion vector, if it is an intra block copy vector, only vector correction information whose motion amplitude is an integer is selected from the vector correction information base to combine with it; if it is not an intra block copy vector, all vector correction information in the base may be combined with it.
In a specific implementation, each candidate motion vector in the merge candidate list of the video block carries a flag bit indicating whether it is an intra block copy vector, so this can be determined from the flag bit. For example, a flag bit of 1 indicates that the candidate motion vector is an intra block copy vector, and a flag bit of 0 indicates that it is not.
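The two paragraphs above, combining MVPs with MVDs while keeping only integer corrections for intra block copy vectors, might be sketched as follows (the per-component integer test is an assumption standing in for the integer-magnitude constraint; the flag handling mirrors the flag bit described above):

```python
def combine_mvps_with_mvds(merge_list, mvd_base):
    """Pair each candidate MVP with MVDs from the correction base.

    merge_list: list of (mvp, is_ibc) pairs, where is_ibc mirrors the
    intra-block-copy flag bit. For an intra block copy vector, only
    MVDs whose components are integers are used.
    """
    combos = []
    for mvp, is_ibc in merge_list:
        for mvd in mvd_base:
            if is_ibc and not all(float(c).is_integer() for c in mvd):
                continue  # skip fractional corrections for IBC vectors
            combos.append((mvp, mvd))
    return combos

mvd_base = [(0.5, 0), (1, 0), (0, 2)]
merge_list = [((0, 0), True), ((1, 1), False)]
combos = combine_mvps_with_mvds(merge_list, mvd_base)
# IBC candidate gets 2 MVDs, the other gets all 3, so 5 combinations
```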
S906: and correcting the candidate motion vectors in each combination according to the vector correction information in each combination, and determining the coding performance corresponding to the combination according to the corrected candidate motion vectors.
For example, if the vector correction information in a combination is MVD' and the candidate motion vector is MVP1', the corrected candidate motion vector is MV' = MVP1' + MVD'.
Further, the video block may be encoded according to MV', the coding efficiency and the distortion rate of MV' may be calculated, and the coding performance corresponding to the combination may be determined from the coding efficiency and the distortion rate of MV'.
For example, the coding performance I corresponding to the combination is determined according to the following formula:
I=α*rate+β*distortion;
where rate is the coding efficiency, distortion is the distortion rate, and α and β are preset coefficients.
Considering that a smaller distortion rate is better and a larger coding efficiency is better, if α is a positive number and β is a negative number, a larger I indicates better coding performance for the combination.
S907: and determining the target combination according to the coding performance corresponding to each combination.
In a specific implementation, the combination with the best coding performance may be selected directly as the target combination, or one combination may be selected at random from the top L combinations with the best coding performance, where L is a positive integer.
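Scoring each combination with I = α*rate + β*distortion and picking the best one, per S906-S907, can be sketched as follows (the field names are assumptions, and the optional random choice among the top L combinations is omitted for brevity):

```python
def select_target_combination(combinations, alpha, beta):
    """Score each combination with I = alpha*rate + beta*distortion
    and return the best one. With alpha > 0 (rate = coding efficiency,
    higher is better) and beta < 0 (distortion, lower is better), a
    larger I means better coding performance."""
    def score(c):
        return alpha * c["rate"] + beta * c["distortion"]
    return max(combinations, key=score)

combos = [
    {"mvp": 0, "mvd": (1, 0), "rate": 10, "distortion": 2.0},  # I = 6
    {"mvp": 1, "mvd": (0, 1), "rate": 8, "distortion": 0.5},   # I = 7
]
best = select_target_combination(combos, alpha=1.0, beta=-2.0)
# best is the second combination
```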
S908: and generating motion description information of the video block according to the target candidate motion vector and the target vector correction information in the target combination, and determining pixel residual information of the video block when the target combination is used for coding the video block.
The motion description information includes index information of the target candidate motion vector and description information of the target vector modification information.
In a specific implementation, the predicted pixel information of the video block can be obtained from a reference frame of the video block according to the corrected candidate motion vector in the target combination. The difference between the actual pixel information of the video block and the predicted pixel information is the pixel residual information of the video block, i.e., the residual produced when the video block is encoded using the target combination.
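The pixel residual described above is an element-wise difference between the actual block and the prediction; a sketch with plain nested lists (the 2x2 block is illustrative):

```python
def pixel_residual(actual_block, predicted_block):
    """Pixel residual information: element-wise difference between
    the actual pixel information of the video block and the predicted
    pixel information obtained from the reference frame."""
    return [
        [a - p for a, p in zip(arow, prow)]
        for arow, prow in zip(actual_block, predicted_block)
    ]

residual = pixel_residual([[100, 102], [98, 99]], [[101, 100], [98, 97]])
# residual == [[-1, 2], [0, 2]]
```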
S909: and sending a video stream, wherein the video stream contains decoding mode information and video decoding information of the video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
Optionally, the transmission level of the description information of the target vector correction information may be the video sequence level (e.g., VPS or SPS), the video frame level (e.g., PPS), the slice level (e.g., slice header), or the video block level (e.g., CTU, CU, PU, or smaller blocks).
In a specific implementation, the encoding end may send complete description information of the vector correction information (i.e., both motion amplitude description information and motion direction description information) to the decoding end without any prior agreement between the two ends. Alternatively, the two ends may agree in advance on part of the description information, such as the motion amplitude description information or the motion direction description information, and the encoding end then sends only the description information of the non-agreed part to the decoding end. That is, the description information of the target vector correction information in the embodiments of the present disclosure includes motion direction description information and/or motion amplitude description information.
In addition, when the description information of the target vector correction information includes the motion direction description information and the motion amplitude description information, the transmission levels of the motion direction description information and the motion amplitude description information may be the same or different.
It should be noted that there is no sequential relationship between S903 and S904.
Fig. 10 is a flowchart illustrating a video decoding method for use in a terminal or a server according to an exemplary embodiment, the flowchart including the following steps.
S1001: receiving a video stream, wherein the video stream comprises decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
S1002: and when the decoding mode of the video block is determined to be MMVD according to the decoding mode information, acquiring motion description information of the video block from the video decoding information of the video block, wherein the motion description information comprises index information of a target candidate motion vector of the video block in a merging candidate list of the video block and description information of target vector correction information for correcting the target candidate motion vector.
S1003: a merge candidate list of video blocks is constructed.
In a specific implementation, the merge candidate list of the video block may be constructed from the motion vectors of the video block's neighboring video blocks and the motion vector of the corresponding video block within a reference video frame. This part can follow existing methods and is not described in detail here.
In addition, when the prediction direction of the video block signaled by the encoding end is received, a merging candidate list of the video block is constructed according to the prediction direction.
S1004: and for each candidate motion vector in the merging candidate list, if determining that a candidate motion vector with a difference value smaller than a preset value exists in the merging candidate list, deleting the candidate motion vector from the merging candidate list.
Assuming that a candidate motion vector is (x1, y1) and another candidate motion vector is (x2, y2), the difference Δ p between these two candidate motion vectors is:
Δp = √((x1 − x2)² + (y1 − y2)²)
In this way, a redundancy check is performed on the candidate motion vectors in the merge candidate list of the video block, so that candidates with similar motion information are excluded from the list, improving coding efficiency.
S1005: the candidate motion vectors determined in a bi-predictive manner in the merge candidate list are adjusted to the head of the merge candidate list.
Candidate motion vectors determined by bi-prediction tend to be more accurate. If they are moved to the head of the merge candidate list, the probability that a candidate at the head of the list is selected as the target candidate motion vector is higher; since index numbers at the head of the list are smaller and require fewer bits to transmit, coding efficiency can be further improved.
S1006: a target candidate motion vector for the video block is selected from a merge candidate list of video blocks according to the index information.
S1007: and correcting the target candidate motion vector according to the description information of the target vector correction information.
And if the description information comprises motion direction description information and motion amplitude description information, modifying the target candidate motion vector according to the motion amplitude description information and the motion direction description information.
And if the description information comprises motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and appointed motion direction description information.
And if the description information comprises motion direction description information, correcting the target candidate motion vector according to the motion direction description information and appointed motion amplitude description information.
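All three cases above reduce to offsetting the target candidate motion vector by a motion magnitude along a motion direction; a sketch follows (representing the direction as a unit vector is an assumption for illustration):

```python
def correct_motion_vector(mvp, magnitude, direction):
    """Apply the target vector correction information: offset the
    target candidate motion vector (MVP) by the described motion
    magnitude along the described motion direction.

    direction: unit vector such as (1, 0) or (0, -1).
    """
    dx, dy = direction
    return (mvp[0] + magnitude * dx, mvp[1] + magnitude * dy)

# MVP (2, 3) corrected by magnitude 2 along the -y direction:
corrected = correct_motion_vector((2, 3), 2, (0, -1))
# corrected == (2, 1)
```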
In addition, considering that the motion amplitude of the MVD must be an integer when an intra block copy vector is used in HEVC screen content coding: if the target candidate motion vector is determined to be an intra block copy vector, the motion amplitude described by the motion amplitude description information may be rounded before the target candidate motion vector is corrected, where the motion amplitude description information is either received by the decoding end or agreed between the decoding end and the encoding end.
S1008: and if the target candidate motion vector is determined to be the intra block copy vector, rounding the coordinate value of the corrected target candidate motion vector in the rectangular coordinate system.
When intra block copy vectors are used in HEVC screen content coding, the final motion vector generated by MMVD should also be rounded to integer precision, so rounding processing can also be performed on the modified target candidate motion vector to ensure the display effect of the screen content.
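The rounding of S1008 can be sketched as follows (note that Python's built-in round uses round-half-to-even; a conformant codec implementation may specify a different tie-breaking rule):

```python
def round_mv_for_ibc(mv):
    """For a current picture reference / intra block copy vector, the
    final motion vector generated by MMVD must have integer-pel
    precision, so round each rectangular coordinate to the nearest
    integer."""
    return (int(round(mv[0])), int(round(mv[1])))

rounded = round_mv_for_ibc((2.75, -1.25))
# rounded == (3, -1)
```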
S1009: and decoding the video block according to the corrected target candidate motion vector and the pixel residual information of the video block acquired from the video decoding information.
In a specific implementation, if the prediction mode of the video block is bi-prediction, first pixel information of the video block may be obtained from a first reference video frame and second pixel information from a second reference video frame according to the corrected target candidate motion vector. The corresponding pixels in the first and second pixel information are then weighted according to pixel weight information to obtain third pixel information of the video block, and the third pixel information is corrected according to the pixel residual information of the video block obtained from the video decoding information to obtain the decoded video block. The video frame containing the video block lies between the first and second reference video frames, and the pixel weight information is either received by the decoding end through signaling or agreed in advance with the encoding end.
Compared with the existing scheme in which the decoding end inherits the pixel weight information from the neighboring video blocks of the video block, the pixel weight information obtained by the decoding end directly from signaling in the embodiments of the present disclosure is more accurate, so the decoding effect is better.
In addition, when the pixel weight information is agreed in advance between the decoding end and the encoding end, the weights of the corresponding pixels in the first pixel information and the second pixel information may be equal, i.e., (1/2, 1/2), or unequal, such as (7/8, 1/8) or (3/8, 5/8); this improves coding efficiency, allows gradual weighting to be realized, and yields a good decoding effect.
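The weighting of corresponding pixels in the first and second pixel information can be sketched as follows (plain nested lists; clipping to the valid sample range is omitted):

```python
def weighted_biprediction(pred0, pred1, w0=0.5, w1=0.5):
    """Average the two bi-directional predictions pixel by pixel to
    obtain the third pixel information. Equal weights (1/2, 1/2) give
    a plain average; unequal weights such as (7/8, 1/8) or (3/8, 5/8)
    may also be used."""
    return [
        [w0 * a + w1 * b for a, b in zip(r0, r1)]
        for r0, r1 in zip(pred0, pred1)
    ]

third = weighted_biprediction([[10, 20]], [[20, 40]])
# third == [[15.0, 30.0]]
```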
In the above flow, there is no strict precedence relationship between S1004 and S1005.
When the method provided in the embodiments of the present disclosure is implemented in software or hardware or a combination of software and hardware, a plurality of functional modules may be included in the electronic device, and each functional module may include software, hardware or a combination of software and hardware.
Fig. 11 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment, the apparatus including an acquisition module 1101, a combining module 1102, a determination module 1103, an encoding module 1104, and a transmitting module 1105.
An acquisition module 1101 configured to perform acquiring a video sequence;
a combination module 1102, configured to perform, when the encoding mode of any video block in the video sequence is a merge mode with a motion vector difference, combining vector correction information in a vector correction information base and a candidate motion vector in a built merge candidate list of the video block, where the vector correction information base is generated according to a preset motion amplitude and a preset motion direction;
a determining module 1103 configured to perform determining a target combination according to the coding performance corresponding to each combination;
an encoding module 1104 configured to perform generating motion description information of the video block according to a target candidate motion vector and target vector modification information in the target combination, the motion description information including index information of the target candidate motion vector and description information of the target vector modification information, and determining pixel residual information of the video block when the video block is encoded by the target combination;
a sending module 1105 configured to execute sending a video stream, where the video stream includes decoding mode information and video decoding information of the video block, and the video decoding information includes motion description information and pixel residual information of the video block.
In a possible implementation, the apparatus further includes an adjusting module 1106:
the adjusting module 1106 is configured to perform, for each candidate motion vector in the merge candidate list before combining the vector modification information in the vector modification information base and the constructed candidate motion vector in the merge candidate list of the video block, deleting the candidate motion vector from the merge candidate list if it is determined that a candidate motion vector whose difference value with the candidate motion vector is smaller than a preset value exists in the merge candidate list.
In a possible implementation, the apparatus further includes an adjusting module 1106:
the adjusting module 1106 is configured to perform adjusting the candidate motion vector determined in the merge candidate list in a bidirectional prediction manner to the head of the merge candidate list before combining the vector modification information in the vector modification information base and the constructed candidate motion vector in the merge candidate list of the video block.
In a possible implementation, the combining module 1102 is specifically configured to perform:
for each candidate motion vector in the merge candidate list, if the candidate motion vector is an intra block copy vector, selecting vector correction information whose motion amplitude is an integer from the vector correction information base and combining it with the candidate motion vector; and
if the candidate motion vector is not an intra block copy vector, selecting all vector correction information from the vector correction information base and combining it with the candidate motion vector.
In a possible implementation, the transmission level of the description information of the target vector modification information is a video sequence level, a video frame level, a slice level, or a video block level.
In a possible implementation, the description information of the target vector modification information includes motion direction description information and/or motion amplitude description information.
In a possible implementation manner, when the description information of the target vector modification information includes motion direction description information and motion amplitude description information, the transmission levels of the motion direction description information and the motion amplitude description information are the same or different.
Fig. 12 is a block diagram illustrating a video decoding apparatus according to an exemplary embodiment, which includes a receiving module 1201, an obtaining module 1202, a selecting module 1203, a correcting module 1204, and a decoding module 1205.
A receiving module 1201 configured to perform receiving a video stream, where the video stream includes decoding mode information and video decoding information of a video block, and the video decoding information includes motion description information and pixel residual information of the video block;
an obtaining module 1202 configured to perform, when it is determined that the decoding manner of the video block is the merge mode with a motion vector difference according to the decoding manner information, obtaining motion description information of the video block from the video decoding information, where the motion description information includes index information of a target candidate motion vector of the video block and description information of target vector modification information for modifying the target candidate motion vector;
a selecting module 1203 configured to perform selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information of the target candidate motion vector;
a modification module 1204 configured to perform modification of the selected target candidate motion vector according to description information of the target vector modification information;
a decoding module 1205 configured to perform decoding of the video block according to the modified target candidate motion vector and pixel residual information of the video block obtained from the video decoding information.
In a possible implementation, the apparatus further includes an adjusting module 1206:
the adjusting module 1206 is configured to perform, for each candidate motion vector in the merge candidate list before selecting the target candidate motion vector from the built merge candidate list of the video block according to the index information of the target candidate motion vector, deleting the candidate motion vector from the merge candidate list if it is determined that a candidate motion vector whose difference with the candidate motion vector is smaller than a preset value exists in the merge candidate list.
In a possible implementation, the apparatus further includes an adjusting module 1206:
the adjusting module 1206 is configured to perform adjusting the candidate motion vector determined in the merge candidate list in a bi-predictive manner to the head of the merge candidate list before selecting the target candidate motion vector from the built merge candidate list of the video block according to the index information of the target candidate motion vector.
In a possible implementation, the modification module 1204 is specifically configured to perform:
if the description information includes motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and the agreed motion direction description information;
if the description information includes motion direction description information, correcting the target candidate motion vector according to the motion direction description information and the agreed motion amplitude description information; and
if the description information includes both motion direction description information and motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and the motion direction description information.
In a possible implementation, the apparatus further includes a first rounding module 1207:
the first rounding module 1207 is configured to perform rounding processing on the motion amplitude value described by the motion amplitude description information if it is determined that the target candidate motion vector is an intra block copy vector before the selected target candidate motion vector is corrected according to the description information of the target vector correction information.
In a possible implementation, the apparatus further includes a second rounding module 1208:
the second rounding module 1208 is configured to, after the selected target candidate motion vector is modified according to the description information of the target vector modification information, round the coordinate value of the modified target candidate motion vector in the rectangular coordinate system if it is determined that the target candidate motion vector is an intra block copy vector.
In a possible implementation manner, if the prediction mode of the video block is bi-prediction, the decoding module 1205 is specifically configured to perform:
acquiring first pixel information of the video block from a first reference video frame and second pixel information of the video block from a second reference video frame according to the corrected target candidate motion vector, wherein the video frame where the video block is located between the first reference video frame and the second reference video frame;
according to the pixel weight information, carrying out weighting processing on corresponding pixels in the first pixel information and the second pixel information to obtain third pixel information of the video block, wherein the pixel weight information is received through signaling or is agreed in advance;
and correcting the third pixel information of the video block according to the pixel residual information of the video block to obtain the decoded video block.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The division of the modules in the embodiments of the present disclosure is illustrative and is only a division by logical function; in actual implementation there may be other division manners. In addition, the functional modules in the embodiments of the present disclosure may each be integrated in one processor, may exist alone physically, or two or more modules may be integrated in one module. The coupling between modules may be through interfaces, which are typically electrical communication interfaces, although mechanical or other forms of interface are not excluded. Thus, modules described as separate components may or may not be physically separate, and may be located in one place or distributed across different locations on the same or different devices. An integrated module may be implemented in hardware or as a software functional module.
Fig. 13 is a schematic structural diagram of an electronic device according to an exemplary embodiment, where the electronic device includes a transceiver 1301, a processor 1302, and other physical devices, where the processor 1302 may be a Central Processing Unit (CPU), a microprocessor, an application specific integrated circuit, a programmable logic circuit, a large scale integrated circuit, or a digital processing unit. The transceiver 1301 is used for data transmission and reception between an electronic device and other devices.
The electronic device may further include a memory 1303 for storing software instructions executed by the processor 1302, and the memory may also store other data required by the electronic device, such as identification information of the electronic device, encryption information of the electronic device, and user data. The memory 1303 may be a volatile memory, such as a random-access memory (RAM); it may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or it may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1303 may also be a combination of the above memories.
The specific connection medium between the processor 1302, the memory 1303 and the transceiver 1301 is not limited in the embodiments of the present disclosure. In fig. 13, the embodiment of the present disclosure is described by taking only the case where the memory 1303, the processor 1302, and the transceiver 1301 are connected through the bus 1304, the bus is shown by a thick line in fig. 13, and the connection manner between other components is merely illustrative and not limited. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 13, but this is not intended to represent only one bus or type of bus.
The processor 1302 may be dedicated hardware or a processor running software. When the processor 1302 runs software, it reads the software instructions stored in the memory 1303 and, driven by those instructions, executes the video encoding method or the video decoding method involved in the foregoing embodiments.
The disclosed embodiments also provide a storage medium, and when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is capable of performing the video encoding method or the video decoding method referred to in the foregoing embodiments.
In some possible embodiments, various aspects of the video encoding method or the video decoding method provided by the present disclosure may also be implemented in the form of a program product including program code for causing an electronic device to perform the video encoding method or the video decoding method referred to in the foregoing embodiments when the program product is run on the electronic device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for video encoding or video decoding provided by the present disclosure may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit described above may be further divided into and embodied by a plurality of units.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (10)

1. A video decoding method, comprising:
receiving a video stream, wherein the video stream comprises decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block;
when the decoding mode of the video block is determined to be a merging mode with a motion vector difference according to the decoding mode information, obtaining motion description information of the video block from the video decoding information, wherein the motion description information comprises index information of a target candidate motion vector of the video block and description information of target vector correction information for correcting the target candidate motion vector;
selecting the target candidate motion vector from the constructed merging candidate list of the video block according to the index information of the target candidate motion vector;
correcting the selected target candidate motion vector according to the description information of the target vector correction information;
and decoding the video block according to the modified target candidate motion vector and the pixel residual information of the video block acquired from the video decoding information.
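The decoding steps of claim 1 can be illustrated with a small sketch. This is not the patented implementation; all names (`apply_mmvd_correction`, `merge_list`) and the choice of four axis-aligned correction directions (as used in MMVD-style schemes) are illustrative assumptions:

```python
# Illustrative sketch of the claim-1 decoding flow: select the target
# candidate from the merge candidate list by its index, then apply the
# signalled correction, i.e. a motion amplitude along one of four
# axis-aligned directions. All names are hypothetical.

def apply_mmvd_correction(merge_list, index, magnitude, direction):
    """Select merge_list[index] and offset it by magnitude * direction."""
    base_x, base_y = merge_list[index]
    dx, dy = direction  # one of (+1, 0), (-1, 0), (0, +1), (0, -1)
    return (base_x + magnitude * dx, base_y + magnitude * dy)

# Candidate 0 corrected by amplitude 4 in the -y direction.
merge_list = [(12, -3), (5, 8)]
corrected = apply_mmvd_correction(merge_list, 0, 4, (0, -1))
print(corrected)  # (12, -7)
```

The corrected vector would then be used, together with the pixel residual information, to reconstruct the block.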
2. The method of claim 1, wherein before selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information of the target candidate motion vector, the method further comprises:
and for each candidate motion vector in the merging candidate list, if another candidate motion vector whose difference from the candidate motion vector is smaller than a preset value exists in the merging candidate list, deleting the candidate motion vector from the merging candidate list.
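The pruning rule of claim 2 can be sketched as follows; the use of L1 distance as the "difference value" is an assumption for illustration, and the names are hypothetical:

```python
# Hypothetical sketch of the claim-2 pruning: a candidate is dropped when
# the list already holds a kept candidate that differs from it by less
# than the preset value (L1 distance assumed here as the difference).

def prune_merge_list(candidates, preset_value):
    kept = []
    for mvx, mvy in candidates:
        near_duplicate = any(
            abs(mvx - kx) + abs(mvy - ky) < preset_value for kx, ky in kept
        )
        if not near_duplicate:
            kept.append((mvx, mvy))
    return kept

# (4, 5) differs from (4, 4) by only 1 < 2, so it is removed.
print(prune_merge_list([(4, 4), (4, 5), (10, 0)], 2))  # [(4, 4), (10, 0)]
```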
3. The method of claim 1 or 2, further comprising, before selecting the target candidate motion vector from the constructed merge candidate list of the video block according to the index information of the target candidate motion vector:
adjusting candidate motion vectors determined in a bi-predictive manner in the merge candidate list to the head of the merge candidate list.
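The reordering of claim 3 amounts to a stable partition of the merge candidate list. The dictionary layout below is a hypothetical representation of a candidate:

```python
# Sketch of the claim-3 adjustment: candidates determined in a
# bi-predictive manner are moved to the head of the merge candidate
# list, preserving the relative order inside each group.

def bi_predictive_first(candidates):
    bi = [c for c in candidates if c["bi_pred"]]
    uni = [c for c in candidates if not c["bi_pred"]]
    return bi + uni

cands = [
    {"mv": (1, 0), "bi_pred": False},
    {"mv": (3, 2), "bi_pred": True},
    {"mv": (0, 5), "bi_pred": True},
]
print([c["mv"] for c in bi_predictive_first(cands)])  # [(3, 2), (0, 5), (1, 0)]
```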
4. The method according to claim 1, wherein modifying the selected target candidate motion vector according to the description information of the target vector modification information comprises:
if the description information comprises motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and designated motion direction description information;
if the description information comprises motion direction description information, correcting the target candidate motion vector according to the motion direction description information and designated motion amplitude description information;
and if the description information comprises motion direction description information and motion amplitude description information, correcting the target candidate motion vector according to the motion amplitude description information and the motion direction description information.
5. The method according to claim 4, wherein before modifying the selected target candidate motion vector according to the description information of the target vector modification information, further comprising:
and if the target candidate motion vector is determined to be an intra block copy vector, rounding the motion amplitude value described by the motion amplitude description information.
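Claims 4 and 5 together describe how the correction components are resolved: whichever component is absent from the description information falls back to a designated default, and for an intra block copy vector the motion amplitude is rounded first. A minimal sketch, with hypothetical field names and Python's built-in rounding standing in for the patent's (unspecified) rounding rule:

```python
# Combined sketch of claims 4-5: fill in the missing component
# (amplitude or direction) from designated defaults, and round the
# amplitude to an integer when the target candidate is an intra block
# copy vector. All names are hypothetical.

def resolve_correction(desc, default_magnitude, default_direction, is_ibc):
    magnitude = desc.get("magnitude", default_magnitude)
    direction = desc.get("direction", default_direction)
    if is_ibc:
        # Intra block copy vectors point inside the current picture and
        # are assumed here to use integer-pel precision.
        magnitude = round(magnitude)
    return magnitude, direction

print(resolve_correction({"magnitude": 1.75}, 1, (1, 0), True))  # (2, (1, 0))
```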
6. A video encoding method, comprising:
acquiring a video sequence;
when the coding mode of any video block in the video sequence is a merging mode with a motion vector difference, combining vector correction information in a vector correction information base with candidate motion vectors in a constructed merging candidate list of the video block, wherein the vector correction information base is generated according to a preset motion amplitude and a preset motion direction;
determining a target combination according to the coding performance corresponding to each combination;
generating motion description information of the video block according to a target candidate motion vector and target vector correction information in the target combination, wherein the motion description information comprises index information of the target candidate motion vector and description information of the target vector correction information, and determining pixel residual information of the video block when the video block is coded by the target combination;
and sending a video stream, wherein the video stream comprises decoding mode information and video decoding information of the video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
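The encoder-side selection in claim 6 is an exhaustive search over all (candidate, correction) combinations, keeping the one with the best coding performance. In the sketch below, `cost_fn` is a hypothetical stand-in for the real rate-distortion measurement, and lower cost is assumed to mean better performance:

```python
# Sketch of the claim-6 encoder search: combine every correction in the
# pre-generated vector correction information base with every candidate
# in the merge list, and return the combination with the lowest cost.

def pick_target_combination(merge_list, correction_base, cost_fn):
    best, best_cost = None, float("inf")
    for index, mv in enumerate(merge_list):
        for correction in correction_base:
            cost = cost_fn(mv, correction)
            if cost < best_cost:
                best, best_cost = (index, correction), cost
    return best

# Toy cost: L1 distance of the corrected vector from a "true" motion (6, 0).
merge = [(4, 0), (0, 0)]
base = [(1, 0), (2, 0), (-1, 0)]
cost = lambda mv, c: abs(mv[0] + c[0] - 6) + abs(mv[1] + c[1])
print(pick_target_combination(merge, base, cost))  # (0, (2, 0))
```

The winning pair yields the index information and correction description that are signalled in the motion description information.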
7. A video decoding apparatus, comprising:
the video decoding device comprises a receiving module, a decoding module and a decoding module, wherein the receiving module is configured to receive a video stream, the video stream comprises decoding mode information and video decoding information of a video block, and the video decoding information comprises motion description information and pixel residual information of the video block;
an obtaining module configured to obtain, when it is determined according to the decoding mode information that the decoding mode of the video block is a merging mode with a motion vector difference, motion description information of the video block from the video decoding information, the motion description information including index information of a target candidate motion vector of the video block and description information of target vector correction information for correcting the target candidate motion vector;
a selection module configured to select the target candidate motion vector from the constructed merge candidate list of the video block according to index information of the target candidate motion vector;
a correction module configured to perform correction of the selected target candidate motion vector according to description information of the target vector correction information;
a decoding module configured to perform decoding of the video block according to the modified target candidate motion vector and pixel residual information of the video block acquired from the video decoding information.
8. A video encoding apparatus, comprising:
an acquisition module configured to perform acquiring a video sequence;
a combining module configured to combine, when the coding mode of any video block in the video sequence is a merging mode with a motion vector difference, vector correction information in a vector correction information base with candidate motion vectors in a constructed merging candidate list of the video block, wherein the vector correction information base is generated according to a preset motion amplitude and a preset motion direction;
a determining module configured to determine a target combination according to the coding performance corresponding to each combination;
an encoding module configured to perform generating motion description information of the video block according to a target candidate motion vector and target vector modification information in the target combination, the motion description information including index information of the target candidate motion vector and description information of the target vector modification information, and determining pixel residual information of the video block when the video block is encoded by the target combination;
and a sending module configured to send a video stream, wherein the video stream contains decoding mode information and video decoding information of the video block, and the video decoding information comprises motion description information and pixel residual information of the video block.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1-5 or 6.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-5 or 6.
CN201911304627.5A 2018-12-18 2019-12-17 Video decoding method, video encoding method and device Active CN111343461B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862781555P 2018-12-18 2018-12-18
US62/781,555 2018-12-18

Publications (2)

Publication Number Publication Date
CN111343461A true CN111343461A (en) 2020-06-26
CN111343461B CN111343461B (en) 2022-03-25

Family

ID=71186794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911304627.5A Active CN111343461B (en) 2018-12-18 2019-12-17 Video decoding method, video encoding method and device

Country Status (1)

Country Link
CN (1) CN111343461B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023116778A1 (en) * 2021-12-22 2023-06-29 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing
WO2023197966A1 (en) * 2022-04-12 2023-10-19 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Citations (4)

Publication number Priority date Publication date Assignee Title
US20160191939A1 (en) * 2011-06-24 2016-06-30 Mediatek Inc. Method and Apparatus for Removing Redundancy in Motion Vector Predictors
US20170085906A1 (en) * 2014-04-01 2017-03-23 Mediatek Inc. Method of Motion Information Coding
CN107113440A (en) * 2014-10-31 2017-08-29 三星电子株式会社 The video encoder and video decoding apparatus and its method of coding are skipped using high accuracy
US20170289566A1 (en) * 2014-09-26 2017-10-05 Vid Scale, Inc. Intra block copy coding with temporal block vector prediction


Non-Patent Citations (2)

Title
S.JEONG ET AL.: "CE4 Ultimate motion vector expression (Test 4.5.4)", 《JVET-L0054》 *
S.JEONG ET AL.: "CE4 Ultimate motion vector expression in J0024 (Test 4.2.9)", 《JVET-K0115》 *



Similar Documents

Publication Publication Date Title
CN111937391B (en) Video processing method and apparatus for sub-block motion compensation in video codec systems
JP6764507B2 (en) Inter-prediction method and its device
US10715827B2 (en) Multi-hypotheses merge mode
US9832462B2 (en) Method and apparatus for setting reference picture index of temporal merging candidate
CN117041556B (en) Method, computing device, storage medium and program product for video encoding
CN110870314A (en) Multiple predictor candidates for motion compensation
JP2023100843A (en) Improved predictor candidate for motion compensation
US11871034B2 (en) Intra block copy for screen content coding
CN112840645A (en) Method and apparatus for combining multiple predictors for block prediction in a video coding system
EP3744091A1 (en) Method and apparatus for adaptive illumination compensation in video encoding and decoding
CN111343461B (en) Video decoding method, video encoding method and device
AU2018236768A1 (en) Method for inducing a merge candidate block and device using same
JP2023071641A (en) Method and apparatus for coding image of video sequence and terminal device
WO2022026480A1 (en) Weighted ac prediction for video coding
WO2024037638A1 (en) Method, apparatus, and medium for video processing
CN111247804B (en) Image processing method and device
WO2020142468A1 (en) Picture resolution dependent configurations for video coding
CN118541977A (en) Method, apparatus and medium for video processing
CN118679744A (en) Method, apparatus and medium for video processing
CN118489250A (en) Method, apparatus and medium for video processing
CN117426087A (en) Method and apparatus for geometric partitioning mode with motion vector refinement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant