CN115379240A

CN115379240A - MMVD (Merge mode with motion vector difference) prediction method and system for video coding based on direction expansion

Info

Publication number: CN115379240A
Application number: CN202210792570.3A
Authority: CN
Inventors: 蒋先涛; 张纪庄; 郭咏梅; 郭咏阳
Original assignee: Kangda Intercontinental Medical Devices Co ltd
Current assignee: Kangda Intercontinental Medical Devices Co ltd
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-11-22

Abstract

The invention discloses a direction-expansion-based MMVD (Merge mode with motion vector difference) prediction method and system for video coding, relating to the technical field of image processing. The method comprises: acquiring a motion vector candidate list based on the Merge mode; selecting an initial motion vector from the candidate list according to a preset selection rule; setting a starting point with the initial motion vector as reference, and expanding the motion vector search directions using a regular hexagon as the framework; taking the starting point as the origin, searching for motion vectors along each expanded search direction over a reduced step-size range; computing rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search; and selecting the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit. By expanding the four search directions of basic MMVD in the prior art to six on the basis of a regular hexagon framework, the invention predicts object motion information more accurately and achieves better coding performance.

Description

MMVD prediction method and system for video coding based on direction expansion

Technical Field

The invention relates to the technical field of image processing, and in particular to a direction-expansion-based MMVD prediction method and system for video coding.

Background

H.266/VVC is the latest international video coding standard, developed over about four years through the cooperation of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) and finalized in July 2020. As video resolutions increase, VVC, the most advanced video coding standard, saves about 50% of the bit rate relative to the High Efficiency Video Coding (HEVC) standard at the same image quality. To improve the accuracy of inter prediction, VVC not only optimizes and extends the original Merge mode but also adds inter prediction tools not found in earlier video coding standards, such as extended Merge prediction, affine motion compensated prediction, bi-directional optical flow, and Merge mode with motion vector difference (Merge MVD, MMVD). However, these newly proposed tools still leave considerable room for improvement.

With advances in technology, object motion in video has become more complex: capture devices move with ever greater flexibility, and objects move in an increasing variety of directions. Translational prediction restricted to the four directions up, down, left and right has great difficulty accurately predicting most of this motion, so a new direction model that can capture and optimize for more motion characteristics is needed.

Disclosure of Invention

To better adapt to changing video content without increasing the bit rate, the invention provides a direction-expansion-based MMVD (Merge mode with motion vector difference) prediction method for video coding, comprising the following steps:

S1: acquiring a motion vector candidate list based on the Merge mode;

S2: selecting an initial motion vector from the motion vector candidate list according to a preset selection rule;

S3: setting a starting point with the initial motion vector as reference, and expanding the motion vector search directions using a regular hexagon as the framework;

S4: taking the starting point as the origin, searching for motion vectors along each expanded search direction over a reduced step-size range;

S5: computing rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

S6: selecting the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.

Further, in step S3, the starting points are the positions pointed to by the initial motion vector in the reference frames preceding and following the current inter-coded picture.

Further, in step S3, the regular hexagon is a horizontally oriented regular hexagon, and the starting point is the center point of the regular hexagon.

Further, in step S3, the expanded search directions are the directions of the lines connecting the center point of the regular hexagon to each of its vertices.

Further, in step S4, the motion vector search uses step sizes that start at an initial pixel size and double successively up to a preset pixel size.
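A minimal Python sketch of the doubling step schedule in S4 follows. The helper name `step_ladder` is hypothetical, and step sizes are expressed in luma pixels as exact fractions; the 1/4 to 32 pixel range is that of the original MMVD and the 1/4 to 2 pixel range that of the preferred embodiment, both described in Example one.

```python
from fractions import Fraction
from typing import List

def step_ladder(initial: Fraction, maximum: Fraction) -> List[Fraction]:
    """Step sizes in luma pixels: start at `initial` and double until `maximum` is reached."""
    if initial <= 0 or maximum < initial:
        raise ValueError("need 0 < initial <= maximum")
    steps: List[Fraction] = []
    step = initial
    while step <= maximum:
        steps.append(step)
        step *= 2  # the step size grows by a factor of 2 at each search
    return steps

original = step_ladder(Fraction(1, 4), Fraction(32))  # original MMVD range
reduced = step_ladder(Fraction(1, 4), Fraction(2))    # reduced range of the preferred embodiment
print([str(s) for s in original])  # ['1/4', '1/2', '1', '2', '4', '8', '16', '32']
print([str(s) for s in reduced])   # ['1/4', '1/2', '1', '2']
```

Exact fractions avoid any floating-point drift when the ladder is later converted to integer motion vector units.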

The invention also provides a direction-expansion-based MMVD prediction system for video coding, comprising the following units:

an initial selection unit, used to acquire a motion vector candidate list based on the Merge mode and select an initial motion vector from the candidate list according to a preset selection rule;

a direction expansion unit, used to set a starting point with the initial motion vector as reference and expand the motion vector search directions using a regular hexagon as the framework;

a vector search unit, used to take the starting point as the origin and search for motion vectors along each expanded search direction over a reduced step-size range;

a rate-distortion calculation unit, used to compute rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

and a vector selection unit, used to select the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.

Further, in the direction expansion unit, the starting points are the positions pointed to by the initial motion vector in the reference frames preceding and following the current inter-coded picture.

Further, in the direction expansion unit, the regular hexagon is a horizontally oriented regular hexagon, and the starting point is the center point of the regular hexagon.

Further, in the direction expansion unit, the expanded search directions are the directions of the lines connecting the center point of the regular hexagon to each of its vertices.

Further, in the vector search unit, the motion vector search uses step sizes that start at an initial pixel size and double successively up to a preset pixel size.

Compared with the prior art, the invention at least has the following beneficial effects:

(1) Building on the prior art, the direction-expansion-based MMVD prediction method and system expand the search directions of MMVD from 4 to 6, so that object motion information can be predicted better and a better coding result is obtained;

(2) While the search directions are expanded, reducing the step search range limits the bit-rate increase that would otherwise result from the larger number of motion vectors produced by the expanded search;

(3) Using a horizontally oriented regular hexagon as the framework for direction expansion retains the horizontal motion prediction of the original cross model while adding search directions, further improving prediction of horizontal motion, which is typically more intense than vertical motion.

Drawings

FIG. 1 is a diagram of the steps of the direction-expansion-based MMVD prediction method for video coding;

FIG. 2 is a block diagram of the direction-expansion-based MMVD prediction system for video coding;

FIG. 3 is a schematic diagram of the motion vector search directions based on a horizontally oriented regular hexagon;

FIG. 4 is a schematic diagram of the motion vector search directions based on a vertically oriented regular hexagon.

Detailed Description

Specific embodiments of the present invention are described below with reference to the drawings, but the present invention is not limited to these embodiments.

Example one

An inter prediction technique called Merge was first introduced in the HEVC coding standard. A motion vector (MV) candidate list of size 5 is built for the current prediction unit (PU): exploiting the spatial and temporal correlation of motion vectors, the motion information of spatially and temporally neighboring coding blocks is analyzed to obtain candidate motion vectors, which are then used for inter prediction. VVC improves on HEVC and optimizes the Merge mode. First, the candidate list is lengthened from 5 to 6 candidate motion vectors. Second, the Merge technique is refined: candidate selection from the spatially and temporally neighboring coding blocks of the current prediction unit is retained, and several new candidate types are added to fill the list, such as history-based MVP (HMVP) and pairwise average MVP. Finally, rate-distortion costs are computed under the minimum rate-distortion (RD) criterion, and after comparison the best MV is selected as the predicted MV of the current PU.
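The list-filling order described above can be sketched as follows. This is a simplified illustration, not the VVC derivation process: it performs full duplicate pruning, ignores reference indices and the per-source candidate limits, and computes the pairwise average with floor division. The function name and example values are hypothetical.

```python
from typing import Iterable, List, Tuple

MV = Tuple[int, int]  # (mvx, mvy); units are illustrative

def build_merge_list(spatial: Iterable[MV], temporal: Iterable[MV],
                     history: Iterable[MV], max_size: int = 6) -> List[MV]:
    """Simplified VVC-style Merge list: spatial, temporal, HMVP, pairwise average, zero padding."""
    cands: List[MV] = []

    def push(mv: MV) -> None:
        # Simplification: full duplicate check; VVC only compares selected candidate pairs.
        if len(cands) < max_size and mv not in cands:
            cands.append(mv)

    for source in (spatial, temporal, history):
        for mv in source:
            push(mv)
    # Pairwise average of the first two candidates (floor division used for simplicity).
    if len(cands) >= 2:
        (ax, ay), (bx, by) = cands[0], cands[1]
        push(((ax + bx) // 2, (ay + by) // 2))
    # Pad with zero motion vectors until the list is full.
    while len(cands) < max_size:
        cands.append((0, 0))
    return cands

print(build_merge_list([(4, 0), (4, 0), (8, -4)], [(2, 2)], [(0, 6)]))
# [(4, 0), (8, -4), (2, 2), (0, 6), (6, -2), (0, 0)]
```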

However, the motion information obtained in Merge mode may be inaccurate, because the motion of neighboring coding blocks does not necessarily reflect the true motion trend of the block to be predicted; the MMVD technique was proposed to address this. MMVD was first introduced in the VVC coding standard. It is a Merge-based inter prediction technique at both the encoder and the decoder, and its main difference from the Merge mode is that MMVD must signal the MVD to the decoder. For efficiency, at most the first four candidates in the Merge candidate list are currently used as initial motion vectors. In the original MMVD technique, motion vectors are searched with step sizes from 1/4 pixel to 32 pixels: the step doubles with each successive search and stops increasing once it reaches 32 pixels.

The original MMVD mode therefore proceeds as follows. First, a motion vector candidate list is obtained based on the Merge mode, a number of initial motion vectors are selected according to the bit-rate requirements, and, with each initial motion vector as reference, the positions it points to in the frames preceding and following the current inter-coded picture are taken as starting points. Second, a variable-step search is performed from each starting point in four directions, namely up, down, left and right (the cross model). Finally, predicted values are obtained by motion compensation and used to compute the rate-distortion cost at each search point; comparing these costs yields the best combination of motion vector, direction and step size, which is transmitted.
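For reference, the original cross-model offsets can be enumerated as below. This is a sketch, not VTM code: it assumes the VVC convention of signalling a distance index into the table 1/4, 1/2, 1, ..., 32 pixels and a direction index selecting +x, -x, +y or -y, and it expresses offsets in 1/16 luma sample, the motion vector storage precision of VVC. The names `MMVD_DISTANCES_QPEL`, `MMVD_DIRECTIONS` and `cross_offsets` are illustrative.

```python
from typing import Dict, Tuple

# Distance table in quarter-luma-sample units: 1/4, 1/2, 1, 2, 4, 8, 16, 32 pixels.
MMVD_DISTANCES_QPEL = [1, 2, 4, 8, 16, 32, 64, 128]
# Cross-model direction table: +x, -x, +y, -y.
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def cross_offsets() -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map (distance_idx, direction_idx) to an MV offset in 1/16-luma-sample units."""
    offsets = {}
    for d_idx, dist in enumerate(MMVD_DISTANCES_QPEL):
        for r_idx, (sx, sy) in enumerate(MMVD_DIRECTIONS):
            # quarter-sample distance * 4 = distance in 1/16-sample units
            offsets[(d_idx, r_idx)] = (sx * dist * 4, sy * dist * 4)
    return offsets

offs = cross_offsets()
print(len(offs))     # 32 = 8 distances x 4 directions
print(offs[(0, 0)])  # (4, 0): 1/4 pixel along +x
print(offs[(7, 3)])  # (0, -512): 32 pixels along -y
```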

MMVD improves on the Merge mode by performing a refinement search around the motion vector. In real life, however, object motion is becoming ever more flexible, and many objects no longer move in simple translation but in more complex and varied ways, so translation in only four directions (up, down, left and right) can hardly meet everyday video coding needs. To better handle complex object motion, the search directions of MMVD therefore need to be expanded so that it adapts better to changing video content. On this basis, as shown in FIG. 1, the present invention provides a direction-expansion-based MMVD prediction method for video coding, comprising the following steps:

S1: acquiring a motion vector candidate list based on the Merge mode;

S2: selecting an initial motion vector from the motion vector candidate list according to a preset selection rule;

S3: setting a starting point with the initial motion vector as reference, and expanding the motion vector search directions using a regular hexagon as the framework;

S4: taking the starting point as the origin, searching for motion vectors along each expanded search direction over a reduced step-size range;

S5: computing rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

S6: selecting the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.

For inter coding of pictures with complex, varied directions of motion, adding search directions is a natural way to cope with that complexity, so the invention starts from increasing the number of search directions. Motion vectors are directional, and in natural video the horizontal motion between consecutive frames is typically much stronger than the vertical motion. Based on this characteristic, and to avoid adding too many search directions, the invention expands the motion vector search directions using a horizontally oriented regular hexagon as the framework. As shown in FIG. 3, the starting point in step S3 is the center of the regular hexagon, and each expanded search direction runs from the center to one of the hexagon's vertices; that is, motion vectors are searched within a given step range along six directions at 0°, 60°, 120°, 180°, 240° and 300°, measured from the positive X axis, to obtain a better coding result and save bit rate. As shown in FIG. 4, compared with a search framework based on a vertically oriented regular hexagon, the horizontal hexagon not only retains the horizontal motion prediction of the original cross model but also replaces its two vertical directions with four oblique directions, which better matches the characteristics of image motion in natural scenes.
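The six hexagon directions, and the comparison with FIG. 4, can be checked numerically with the sketch below. It assumes angles measured counterclockwise from the positive X axis (the mathematical convention; in image coordinates, where y grows downward, the upper and lower directions swap, which leaves the set unchanged). The function names are hypothetical.

```python
import math
from typing import List, Tuple

def hexagon_directions(start_deg: float = 0.0) -> List[Tuple[float, float]]:
    """Unit vectors from the hexagon center to its six vertices, 60 degrees apart.
    start_deg=0  -> horizontal hexagon: 0, 60, 120, 180, 240, 300 degrees
    start_deg=30 -> vertical hexagon:   30, 90, 150, 210, 270, 330 degrees"""
    return [(math.cos(math.radians(start_deg + 60 * k)),
             math.sin(math.radians(start_deg + 60 * k))) for k in range(6)]

def contains(dirs: List[Tuple[float, float]], target: Tuple[float, float], tol: float = 1e-9) -> bool:
    """True if the unit vector `target` is one of `dirs`, up to floating-point tolerance."""
    return any(math.isclose(x, target[0], abs_tol=tol) and math.isclose(y, target[1], abs_tol=tol)
               for x, y in dirs)

horizontal = hexagon_directions(0)
vertical = hexagon_directions(30)
cross = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # the four cross-model directions
print([c for c in cross if contains(horizontal, c)])  # [(1, 0), (-1, 0)]
print([c for c in cross if contains(vertical, c)])    # [(0, 1), (0, -1)]
```

The output shows the point made above: the horizontal hexagon keeps both horizontal cross-model directions and replaces the two vertical ones with four oblique directions, whereas the vertical hexagon keeps the vertical pair instead.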

The original MMVD model searches for motion vectors only in the four directions of the conventional cross model (up, down, left and right) and then determines the best predicted motion vector for inter prediction from the search results. The six-direction search based on the regular hexagon covers a wider range of directions, so the prediction is more accurate and more flexible, varied image motion can be handled.

In the MMVD model there are reference pictures in two directions, the preceding frame and the following frame; each is searched in four directions with eight step sizes per direction, which makes the search relatively complex. If more search directions are added while the search is still performed as in the original MMVD model, the computational complexity of the search will clearly rise further. The most direct way to reduce algorithmic complexity is to reduce the number of search points, and the size of the search range directly determines that number. A larger search range covers more search points and can improve decoded video quality to some extent, but an overly large range increases the computational complexity of video coding and costs considerable time, hurting real-time performance. Conversely, an overly small search range may lead to locally optimal solutions and fail to serve inter prediction well. The search range must therefore be chosen to balance algorithm performance against time complexity according to practical requirements.

Based on the above, to reduce computational complexity, the invention applies a second improvement on top of the horizontal-hexagon direction expansion: reducing the search step range. Exploiting the center-biased characteristic of motion, the search step range is reduced while retaining the original MMVD model's use of two reference pictures for inter prediction in two directions, namely the preceding-frame and following-frame directions. In a preferred embodiment, each direction searches only four points, namely the step sizes 1/4, 1/2, 1 and 2 pixels within the range from 1/4 pixel to 2 pixels, which reduces search complexity.
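The arithmetic behind the step-range reduction can be made explicit. The sketch below counts search points per initial motion vector the way the text above does (reference directions × search directions × step sizes); the function name is hypothetical. Expanding to six directions alone would raise the count from 64 to 96, while limiting each direction to the four steps from 1/4 to 2 pixels brings it down to 48, below the original cross model.

```python
def search_points(ref_directions: int, search_directions: int, steps: int) -> int:
    """Search points per initial motion vector, counted as in the text:
    reference directions x search directions x step sizes."""
    return ref_directions * search_directions * steps

cross_full = search_points(2, 4, 8)   # cross model, steps 1/4..32 px
hex_full = search_points(2, 6, 8)     # hexagon directions, original step range
hex_reduced = search_points(2, 6, 4)  # hexagon directions, steps 1/4..2 px
print(cross_full, hex_full, hex_reduced)  # 64 96 48
```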

The MMVD technique represents a motion vector in Skip and Merge modes by three parameters: a starting point, a motion step and a motion direction; its candidate list is generated in the same way as the Merge motion vector candidate list in the VVC standard. The MMVD of the invention retains these characteristics, modifying only the process of deriving and selecting the final motion vector and adding more selectable directions. The overall MMVD inter coding process can be roughly divided into three steps:

Step 1: acquiring a motion vector candidate list based on the Merge mode, and selecting suitable motion vectors from it as initial motion vectors (in this embodiment, the first two motion vectors in the candidate list);

Step 2: constructing a new MMVD motion vector set from the selected initial motion vectors with a reduced motion step range and expanded search directions;

Step 3: selecting the best predicted motion vector from the motion vector set based on the minimum rate-distortion criterion, and performing inter coding.
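Step 3 amounts to an argmin over the candidate set. The sketch below assumes the usual Lagrangian cost J = D + λ·R; the text itself specifies only a minimum rate-distortion criterion. In a real encoder D would come from motion-compensated prediction (for example SAD or SSE against the original block) and R from the bits needed to signal the base index, direction and step; here both are toy placeholder functions, and the `Candidate` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass(frozen=True)
class Candidate:
    base_idx: int        # which initial MV (0 or 1 in the embodiment)
    direction_idx: int   # 0..5, hexagon direction
    step_idx: int        # index into the step ladder
    mv: Tuple[int, int]  # refined MV, illustrative units

def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits

def select_best(cands: Sequence[Candidate],
                distortion: Callable[[Candidate], float],
                rate: Callable[[Candidate], float],
                lam: float) -> Candidate:
    """Step 3: return the candidate with the minimum RD cost."""
    return min(cands, key=lambda c: rd_cost(distortion(c), rate(c), lam))

# Toy usage: distortion is the L1 distance to a made-up "true" MV (8, 6),
# and the rate grows with the step index. Both models are placeholders.
cands = [Candidate(0, 0, 0, (4, 0)), Candidate(0, 1, 1, (8, 8)), Candidate(1, 3, 2, (-16, 0))]
best = select_best(cands,
                   distortion=lambda c: abs(c.mv[0] - 8) + abs(c.mv[1] - 6),
                   rate=lambda c: c.step_idx + 1,
                   lam=2.0)
print(best.mv)  # (8, 8)
```

With λ = 2 the three toy costs are 12, 6 and 36, so the second candidate wins even though it does not have the smallest rate.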

The new MMVD motion vector set is constructed as follows:

First, the first two candidates in the Merge motion vector candidate list are checked and used as the initial motion vectors. For each initial motion vector, the positions it points to in the preceding and following reference frames are taken as search starting points. Then one direction is selected and the motion vector search is performed within the given step range; each step point reached along that direction forms one candidate motion vector. The process is repeated until both initial motion vectors have been traversed over all 6 directions and 4 steps, yielding 6 × 4 = 24 new motion vectors per initial motion vector over the two adjacent preceding and following reference pictures; these form the motion vector set.
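The construction above can be sketched as follows for two initial motion vectors, six hexagon directions and the four steps 1/4, 1/2, 1 and 2 pixels. Two points are assumptions not stated in the text: offsets along the oblique directions (whose components involve cos 60° and sin 60°) are rounded to the nearest 1/16 luma sample, the VVC motion vector storage precision; and each initial motion vector is treated as a single vector, so how the offset is applied to the second reference picture list is not modeled. The function names and example motion vectors are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, List, Tuple

MV = Tuple[int, int]  # motion vector in 1/16-luma-sample units (assumed precision)

def hexagon_offset(step_px: Fraction, k: int) -> MV:
    """Offset of `step_px` pixels along hexagon direction k (60*k degrees),
    rounded to the nearest 1/16 sample (the rounding rule is an assumption)."""
    angle = math.radians(60 * k)
    scale = float(step_px) * 16  # pixels -> 1/16-sample units
    return (round(scale * math.cos(angle)), round(scale * math.sin(angle)))

def build_mv_set(initial_mvs: List[MV], steps_px: List[Fraction]) -> Dict[Tuple[int, int, int], MV]:
    """Map (base_idx, direction_idx, step_idx) to the refined MV for every
    initial MV, each of the six directions and each step size."""
    mv_set: Dict[Tuple[int, int, int], MV] = {}
    for b, (bx, by) in enumerate(initial_mvs):
        for k in range(6):
            for s, step in enumerate(steps_px):
                dx, dy = hexagon_offset(step, k)
                mv_set[(b, k, s)] = (bx + dx, by + dy)
    return mv_set

steps = [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2)]
mvs = build_mv_set([(64, -32), (0, 16)], steps)  # hypothetical first two Merge candidates
print(len(mvs))                           # 48 = 2 initial MVs x 24 candidates each
print(hexagon_offset(Fraction(1, 4), 1))  # (2, 3): 1/4 pixel at 60 degrees
```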

Example two

The effect of the invention is further illustrated by the following simulation experiments. The experiments are based on VTM, the H.266 standard test software, run on a computer with an AMD Ryzen 7 5800H CPU at 3.2 GHz and 16 GB of memory, under Windows 10 (64-bit), with Microsoft Visual Studio 2019 as the running environment. The test sequences are taken from the common test conditions (CTC) for standard dynamic range (SDR) video and are divided into five classes, B, C, D, E and F, where Classes B, C and D are 1920 × 1080, 832 × 480 and 416 × 240 video, respectively. The simulation uses the low-delay B (LDB) configuration, and the standard test sequences are encoded with four quantization parameters (QP): 22, 27, 32 and 37. All results were obtained by encoding 50 frames of each video sequence; coding performance is measured by the widely used BD-Rate, and encoding and decoding times are expressed as EncT and DecT.

BD-Rate indicates the change in bit rate at equal image quality (PSNR). A negative value means improved coding performance and bit-rate savings; a positive value means a bit-rate loss and degraded performance. EncT and DecT are the ratios of the total encoding and decoding times of the improved algorithm to those of the standard algorithm. The results of this simulation are shown in Tables 3 and 4:

Table 3 compares the overall coding efficiency and coding complexity of the proposed method.

Table 3: Coding performance and time results of the proposed method on different sequences

Table 4: Coding performance results of the proposed method on different sequences
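The BD-Rate values in these tables follow the Bjøntegaard delta-rate measure defined above. A minimal sketch of the classic calculation follows, assuming exactly four (rate, PSNR) points per codec, as with QPs 22, 27, 32 and 37 here, and distinct PSNR values: log-rate is fitted as a cubic in PSNR (with four points the cubic passes through them exactly), both curves are averaged over the overlapping PSNR interval, and the average log-rate difference is converted to a percentage. The JVET reporting spreadsheets may use a different interpolation, so results need not match them to the last digit; the function names and the toy data are hypothetical.

```python
import math
from typing import Sequence

def _lagrange(xs: Sequence[float], ys: Sequence[float], t: float) -> float:
    """Evaluate at t the unique cubic passing through four (x, y) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (t - xj) / (xi - xj)
        total += term
    return total

def _mean_over(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    """Mean of the cubic over [lo, hi]; Simpson's rule is exact for cubics."""
    mid = (lo + hi) / 2
    return (_lagrange(xs, ys, lo) + 4 * _lagrange(xs, ys, mid) + _lagrange(xs, ys, hi)) / 6

def bd_rate(rate_ref: Sequence[float], psnr_ref: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Bjontegaard delta rate in percent; negative values mean bit-rate savings."""
    if not (len(rate_ref) == len(psnr_ref) == len(rate_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four rate points per codec")
    log_ref = [math.log10(r) for r in rate_ref]
    log_test = [math.log10(r) for r in rate_test]
    lo = max(min(psnr_ref), min(psnr_test))  # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two RD curves do not overlap in PSNR")
    diff = _mean_over(psnr_test, log_test, lo, hi) - _mean_over(psnr_ref, log_ref, lo, hi)
    return (10 ** diff - 1) * 100

# Toy data, not results from this document: the tested codec needs 10% fewer bits at every PSNR.
psnr = [32.0, 35.0, 38.0, 41.0]
anchor = [1000.0, 1800.0, 3200.0, 6000.0]
tested = [0.9 * r for r in anchor]
print(round(bd_rate(anchor, psnr, tested, psnr), 3))  # -10.0
```

A uniform 10% rate reduction at identical PSNR gives a BD-Rate of -10%, matching the sign convention above: negative values are savings.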

The proposed method improves the overall coding performance of MMVD while keeping time complexity in balance. From a performance perspective, most of the Y, U and V values in Table 3 are negative, and the averages show an overall gain. Across all test sequences, the luma component Y saves 0.24% bit rate on average, with the largest saving in Class D (0.49%) and the smallest in Class B (0.15%). The chroma component U saves 0.26% on average; although it performs poorly in Class E, the gains in Classes B, D and F range from 0.31% to 0.44%. The chroma component V saves only 0.01% on average, but Class D alone gains 0.62%, a comparatively large single-class improvement. As for time complexity, the EncT and DecT columns of Table 3 show that encoding and decoding times also decrease slightly, by 6% and 5% on average, respectively. By sequence class, the time results are unfavorable for the large-resolution Classes B and C, but time complexity improves by 10% to 15% for the small-resolution test sequences.

Table 4 breaks down the YUV results of Table 3 by individual sequence. In Class F, the luma (Y) gains are modest except for the ChinaSpeed sequence, while the other sequences all improve. In Class E, only the Y component improves, and only partially, in the FourPeople and KristenAndSara sequences, with no obvious gain for the chroma components U and V; the remaining RaceHorses sequence listed in the Class E group likewise shows no obvious gain in U. In the other test sequence groups, two or more of the Y, U and V values are generally negative, indicating coding gain and improved performance. Notably, the proposed method maintains a stable coding efficiency gain on some sequences, such as BasketballPass, BlowingBubbles and PartyScene, each of which gains more than 0.30% in both the luma and chroma results.

Example three

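To aid understanding of the technical content of the present invention, this embodiment explains the invention in the form of a system structure. As shown in FIG. 2, a direction-expansion-based MMVD prediction system for video coding includes:

an initial selection unit, used to acquire a motion vector candidate list based on the Merge mode and select an initial motion vector from the candidate list according to a preset selection rule;

a direction expansion unit, used to set a starting point with the initial motion vector as reference and expand the motion vector search directions using a regular hexagon as the framework;

a vector search unit, used to take the starting point as the origin and search for motion vectors along each expanded search direction over a reduced step-size range;

a rate-distortion calculation unit, used to compute rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

and a vector selection unit, used to select the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.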

Further, in the direction expansion unit, the starting points are the positions pointed to by the initial motion vectors in the reference frames preceding and following the current inter-coded picture.

Further, in the direction expansion unit, the regular hexagon is a horizontally oriented regular hexagon, and the starting point is the center point of the regular hexagon.

Further, in the direction expansion unit, the expanded search directions are the directions of the lines connecting the center point of the regular hexagon to each of its vertices.

In summary, the direction-expansion-based MMVD prediction method and system of the present invention expand the search directions of MMVD from 4 to 6 on the basis of the prior art, so that object motion information can be predicted better and a better coding result is obtained. While the search directions are expanded, reducing the step search range limits the bit-rate increase that would otherwise result from the larger number of motion vectors produced by the expanded search.

Using a horizontally oriented regular hexagon as the framework for direction expansion retains the horizontal motion prediction of the original cross model while adding search directions, further improving prediction of horizontal motion, which is typically more intense than vertical motion.

It should be noted that all directional indicators (such as up, down, left, right, front, back …) in the embodiments of the present invention are used only to explain the relative positional relationships, motion, etc. of components in a specific orientation (as shown in the drawings); if that orientation changes, the directional indicators change accordingly.

Moreover, terms such as "first", "second", "a", etc. in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "connected", "secured" and the like are to be construed broadly: for example, "secured" may denote a fixed connection, a removable connection, or an integral part; a connection may be mechanical or electrical; and elements may be directly connected, indirectly connected through an intermediate medium, internally connected, or in an interactive relationship with each other, unless otherwise specifically limited. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific situation.

In addition, the technical solutions of the embodiments of the present invention may be combined with each other, provided that the combination can be implemented by those skilled in the art; when a combination of technical solutions is contradictory or cannot be implemented, such a combination should be considered not to exist and is not within the protection scope of the present invention.

Claims

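1. A direction-expansion-based MMVD prediction method for video coding, characterized by comprising the following steps:

S1: acquiring a motion vector candidate list based on the Merge mode;

S2: selecting an initial motion vector from the motion vector candidate list according to a preset selection rule;

S3: setting a starting point with the initial motion vector as reference, and expanding the motion vector search directions using a regular hexagon as the framework;

S4: taking the starting point as the origin, searching for motion vectors along each expanded search direction over a reduced step-size range;

S5: computing rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

S6: selecting the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.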

2. The MMVD prediction method according to claim 1, wherein in step S3, the starting points are the positions pointed to by the initial motion vector in the reference frames preceding and following the current inter-coded picture.

3. The MMVD prediction method according to claim 1, wherein in step S3, the regular hexagon is a horizontally oriented regular hexagon, and the starting point is the center point of the regular hexagon.

4. The MMVD prediction method according to claim 3, wherein in step S3, the expanded search directions are the directions of the lines connecting the center point of the regular hexagon to each of its vertices.

5. The MMVD prediction method according to claim 1, wherein in step S4, the motion vector search uses step sizes that double successively from an initial pixel size up to a preset pixel size.

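6. A direction-expansion-based MMVD prediction system for video coding, comprising:

an initial selection unit, used to acquire a motion vector candidate list based on the Merge mode and select an initial motion vector from the candidate list according to a preset selection rule;

a direction expansion unit, used to set a starting point with the initial motion vector as reference and expand the motion vector search directions using a regular hexagon as the framework;

a vector search unit, used to take the starting point as the origin and search for motion vectors along each expanded search direction over a reduced step-size range;

a rate-distortion calculation unit, used to compute rate-distortion costs under the minimum rate-distortion criterion for the motion vector set obtained by the search;

and a vector selection unit, used to select the motion vector with the minimum rate-distortion cost as the best predicted motion vector of the current prediction unit.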

7. The MMVD prediction system according to claim 6, wherein, in the direction expansion unit, the starting points are the positions pointed to by the initial motion vector in the reference frames preceding and following the current inter-coded picture.

8. The MMVD prediction system according to claim 6, wherein, in the direction expansion unit, the regular hexagon is a horizontally oriented regular hexagon, and the starting point is the center point of the regular hexagon.

9. The MMVD prediction system according to claim 8, wherein, in the direction expansion unit, the expanded search directions are the directions of the lines connecting the center point of the regular hexagon to each of its vertices.

10. The MMVD prediction system according to claim 6, wherein the vector search unit performs the motion vector search with step sizes that double successively from an initial pixel size up to a preset pixel size.