CN112868234A - Motion estimation method, system and storage medium - Google Patents

Motion estimation method, system and storage medium

Info

Publication number: CN112868234A
Authority: CN (China)
Prior art keywords: motion vector, coding unit, motion, precision, precisions
Legal status: Pending
Application number: CN201980066902.6A
Other languages: Chinese (zh)
Inventors: 马思伟, 孟学苇, 郑萧桢, 王苫社
Current Assignee: Peking University; SZ DJI Technology Co Ltd
Original Assignee: Peking University; SZ DJI Technology Co Ltd
Application filed by Peking University and SZ DJI Technology Co Ltd
Publication of CN112868234A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation

Abstract

A motion estimation method, system, and storage medium. The method comprises: for an affine coding unit in a current frame, selecting one of at least four motion vector precisions for motion estimation in a reference frame to determine the motion vectors of the control points of the affine coding unit (S110); dividing the affine coding unit into a number of sub-units (S120); and calculating the motion vectors of the sub-units in the affine coding unit from the motion vectors of the control points (S130). The method, system, and storage medium unify the design of motion vector precision in the affine mode with that in the conventional mode, improving coding performance.

Description

Motion estimation method, system and storage medium

Technical Field
The invention relates to the technical field of video coding and decoding, and in particular to a motion estimation method, system, and storage medium.
Background
The basic principle of video coding is to remove redundancy as much as possible by exploiting spatial, temporal, and codeword correlations. Current video coding schemes mainly comprise intra-frame prediction, inter-frame prediction, transform, quantization, entropy coding, and loop filtering.
Inter-frame prediction exploits the temporal correlation between adjacent frames of a video: a previously encoded reconstructed frame is used as a reference frame to predict the current frame (i.e., the frame currently being encoded) through Motion Estimation (ME) and Motion Compensation (MC), thereby removing the temporally redundant information of the video. Because there is a certain correlation between adjacent frames, an image can be divided into a number of coding units, and for each coding unit a matching position is searched in the adjacent frame; the relative spatial offset between the two coding units is the Motion Vector (MV), and the process of obtaining the motion vector is called motion estimation. Motion compensation is the process of obtaining a predicted frame using the MV and the reference frame. Since the predicted frame may differ from the original current frame, the difference between the predicted frame and the current frame is transformed, quantized, and transmitted to the decoding end together with the MV information, so that the decoding end can reconstruct the current frame from the MV, the reference frame, and that difference.
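To make the block-matching relationship between motion estimation and motion compensation concrete, the following is a minimal Python sketch using a sum-of-absolute-differences (SAD) criterion over an exhaustive integer-pixel search window; the function names, the search range, and the SAD criterion are illustrative assumptions, not the method claimed by this application.

```python
import numpy as np

def block_matching_me(cur_block, ref_frame, block_pos, search_range=8):
    """Find the integer-pixel motion vector minimizing SAD.

    cur_block:  H x W block from the current frame.
    ref_frame:  2-D array holding the reconstructed reference frame.
    block_pos:  (y, x) of the block's top-left corner in the current frame.
    Returns the relative offset (dy, dx), i.e. the motion vector.
    """
    h, w = cur_block.shape
    y0, x0 = block_pos
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] \
                    or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w].astype(np.int64)
            sad = np.abs(cur_block.astype(np.int64) - cand).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

def motion_compensate(ref_frame, block_pos, mv, h, w):
    """Fetch the prediction block the motion vector points to."""
    y, x = block_pos[0] + mv[0], block_pos[1] + mv[1]
    return ref_frame[y:y + h, x:x + w]
```

The difference between the prediction block and the current block is what remains to be transformed, quantized, and transmitted alongside the MV.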
Motion estimation is an important link affecting video coding efficiency, and therefore how to optimize a motion estimation method is always a concern to those skilled in the art.
Disclosure of Invention
In this summary, concepts in a simplified form are introduced that are further described in the detailed description. This summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In view of the deficiencies of the prior art, a first aspect of the embodiments of the present invention provides a method for motion estimation, the method comprising:
for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit;
dividing the affine coding unit into a plurality of sub-units;
and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
A second aspect of the embodiments of the present invention provides another motion estimation method, where the method includes:
for an affine coding unit in a current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit, wherein the plurality of motion vector precisions includes 1/2 pixel precision;
dividing the affine coding unit into a plurality of sub-units;
and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
A third aspect of the embodiments of the present invention provides a motion estimation system, including a storage device and a processor, where the storage device stores a computer program that, when executed by the processor, performs the motion estimation method as described above.
A fourth aspect of the embodiments of the present invention provides a storage medium having stored thereon a computer program which, when executed, performs a motion estimation method as described above.
The motion estimation method, system, and storage medium unify the design of motion vector precision in the affine mode with that in the conventional mode, improving coding performance.
Drawings
The following drawings of the invention are included to provide a further understanding of the invention. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
In the drawings:
fig. 1 shows a flow chart of a motion estimation method according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of motion vectors for control points of an affine coding unit according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of motion vectors for a sub-unit of an affine coding unit according to an embodiment of the present invention;
fig. 4 shows a flow chart of a motion estimation method according to another embodiment of the invention;
fig. 5 shows a block diagram of a motion estimation system according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings. The described embodiments are merely a subset of the embodiments of the invention, not all of them, and the invention is not limited to the example embodiments described herein. All other embodiments obtained by a person skilled in the art from the embodiments described herein without inventive effort shall fall within the scope of protection of the invention.
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
It is to be understood that the present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, the invention is capable of other embodiments in addition to the preferred embodiments detailed below.
The motion estimation method of the embodiment of the invention can be applied to an inter-frame prediction part in the video coding and decoding technology. To better understand the motion estimation method according to the embodiment of the present invention, the following first describes video encoding and decoding.
Video is generally composed of a sequence of frames in a certain order. Within a single frame there are often many structural similarities, that is, a video file contains a large amount of spatially redundant information. Moreover, because the sampling interval between two adjacent frames is very short, adjacent frames are usually highly similar, i.e., video also contains a large amount of temporally redundant information. In addition, from the viewpoint of the visual sensitivity of the human eye, part of the video information can also be compressed, i.e., there is visually redundant information.
In addition to the above-mentioned spatial redundancy, temporal redundancy and visual redundancy, a series of redundant information such as information entropy redundancy, structural redundancy, knowledge redundancy, importance redundancy, and the like exist in the video image information. The purpose of video coding is to remove redundant information in a video sequence, so as to achieve the effects of reducing storage space and saving transmission bandwidth.
At present, video coding mainly comprises intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding and loop filtering. The inter-frame prediction technique uses temporal correlation between adjacent frames of a video, predicts a current frame (a frame currently being encoded) through motion estimation and motion compensation using a reconstructed frame that has been previously encoded as a reference frame, and thus removes temporal redundant information of the video.
The motion estimation method, system, and storage medium of the present application are described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict. The motion estimation method, system, and storage medium described in the embodiments of the present invention use the HEVC standard or extensions thereof; however, the invention is also applicable to other coding standards, such as the H.264 standard, the next-generation video coding standard VVC, AVS3, or any other suitable coding standard.
Fig. 1 shows a flow diagram of a motion estimation method 100 according to an embodiment of the invention. As shown in fig. 1, the method 100 includes the steps of:
in step S110, for the affine coding unit in the current frame, one of at least four kinds of motion vector precisions is selected for motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit.
Wherein the current frame is a video frame to be currently encoded. The current frame may be a video frame acquired in real time or a video frame extracted from a storage medium.
The reference frame is a video frame referred to when encoding the current frame. The reference frame may be a reconstructed video frame obtained by reconstructing the encoded data of a video frame that can serve as a reference. Depending on the type of inter prediction, the reference frame may be a forward reference frame, a backward reference frame, or a bidirectional reference frame. Specifically, inter prediction techniques include forward prediction, backward prediction, bidirectional prediction, and the like. Forward prediction predicts the current frame using a previous frame (a historical frame) as the reference frame. Backward prediction predicts the current frame using a subsequent frame (a future frame) as the reference frame. Bidirectional prediction uses both a historical frame and a future frame. In this embodiment, a bidirectional prediction mode is used, i.e., the reference frames include both a historical frame and a future frame.
An Affine Coding Unit in the current frame, i.e., a Coding Unit (CU) divided in the current frame based on an Affine motion compensation prediction (Affine) technique.
Specifically, the conventional motion model includes only translational motion, but there are many other forms of motion, such as irregular motions like zooming, rotation, and perspective change, and the Affine technique was introduced to handle them. In the Affine technique, the processing unit is no longer the whole coding unit; instead, the whole coding unit is divided into a number of sub-units, and in the motion compensation process, motion compensation is performed in units of these sub-units.
Compared with a conventional coding unit, an affine coding unit in the Affine mode no longer has only one motion vector; instead, each sub-unit in the affine coding unit has its own motion vector. After the motion vectors of the control points of the affine coding unit are determined, the motion vector of each sub-unit is derived from the motion vectors of two control points (the four-parameter model, see the left diagram in fig. 2) or three control points (the six-parameter model, see the right diagram in fig. 2); only the motion vector information of the control points needs to be written into the code stream, not that of each sub-unit.
As described above, in order to determine the motion vectors of the sub-units, the motion vectors of the control points are determined first. Because the motion of a natural object is continuous, its motion vector between two adjacent frames is not necessarily an exact integer number of pixel units, so the embodiment of the invention adopts the Adaptive Motion Vector Resolution (AMVR) technique to adaptively determine the precision of the motion vector at the encoding end. In the embodiment of the present invention, the motion vector of the control point is determined based on the Inter mode (also referred to as the AMVP mode) of the Affine mode; in this mode, the selection of the motion vector precision and the calculation of the MVD (Motion Vector Difference) are performed at the encoding end.
In one embodiment, there are four selectable motion vector precisions, and one of them is selected for motion estimation for each coding unit. The at least four motion vector precisions include any four of 4-pixel, 2-pixel, integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel, and 1/16-pixel precision. For example, the four motion vector precisions may be integer-pixel, 1/2-pixel, 1/4-pixel, and 1/16-pixel precision.
In the current video coding software VTM-6.0, the conventional AMVP mode includes four AMVR precisions. Therefore, compared with the conventional Affine mode, which adopts three precisions, the embodiment of the invention adds one more precision, so that the number of motion vector precisions available to an affine coding unit is the same as that available to a conventional coding unit, and the design of adaptive motion vector resolution in the Affine mode is unified with that in the conventional AMVP mode. In one embodiment, the added precision is 1/2-pixel precision.
Note that the precision (1/16, 1/4, integer-pixel, and so on) indicated for the motion vector of a control point in the Affine mode is not necessarily the precision of the motion vectors actually used in the motion compensation process of the sub-units.
In one embodiment, the method of determining the motion vector precision comprises: selecting the motion vector precision according to the motion vector precision already selected by neighboring coding units.
In another embodiment, the method of determining the motion vector precision may further include: motion estimation is attempted based on at least two of the four motion vector precisions, and the motion vector precision is selected based on the effect of the motion estimation.
Specifically, two motion vector precisions may be selected from the four selectable motion vector precisions, motion estimation may be attempted separately, and the effects of the two motion estimations may be compared. For example, 1/2 pixel accuracy and integer pixel accuracy may be selected for motion estimation, respectively.
Thereafter, the effects of the two motion estimations are compared. If motion estimation with the lower motion vector precision performs better, the trial stops, and the lower precision is taken directly as the selected motion vector precision. For example, if motion estimation with integer-pixel precision performs better than motion estimation with 1/2-pixel precision, integer-pixel precision is selected directly without trying any other precision. If motion estimation with the higher precision performs better, motion estimation continues to be attempted at higher precisions until the best result is obtained. For example, if motion estimation at 1/2-pixel precision performs better than at integer-pixel precision, motion estimation at 1/4-pixel precision is attempted next. If 1/2-pixel precision performs better than 1/4-pixel precision, 1/2-pixel precision is selected as the motion vector precision; if 1/4-pixel precision performs better, the effect of motion estimation at 1/8-pixel precision may also be compared.
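The escalating trial described above can be sketched as follows; the coarse-to-fine precision ordering and the `rd_cost` callback (returning, e.g., a rate-distortion cost for motion estimation at a given precision) are assumptions for illustration, not a normative encoder algorithm.

```python
def select_mv_precision(rd_cost, precisions=(1, 1/2, 1/4, 1/16)):
    """Pick a motion vector precision by trying coarse precisions first.

    rd_cost:    callable mapping a precision (in pixel units) to the cost
                of motion estimation at that precision (lower is better).
    precisions: candidate precisions ordered coarse -> fine.
    """
    best_prec = precisions[0]
    best_cost = rd_cost(best_prec)
    for prec in precisions[1:]:
        cost = rd_cost(prec)
        if cost >= best_cost:
            break  # the coarser precision already did better: stop trying
        best_cost, best_prec = cost, prec
    return best_prec
```

The early exit mirrors the rule above: once a finer precision stops helping, no still-finer precision is tried.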
In one embodiment, determining the motion vector of a control point of the affine coding unit includes: first, obtaining the motion vectors of spatially or temporally adjacent coding units, and constructing a candidate list from combinations of these motion vectors.
The motion vectors acquired in this process may be the motion vectors of the control points of coding units in the Affine mode, or the motion vectors of conventional coding units in the conventional mode. The acquired motion vectors are then combined to construct a candidate list of control point motion vectors, the number of motion vectors in each combination depending on the number of control points of the affine coding unit.
Then, a group of motion vectors is selected from the candidate list as the Motion Vector Predictors (MVPs) of the control points of the affine coding unit, and motion estimation is performed in the reference frame according to the predicted motion vectors to determine the actual motion vectors of the control points. For example, a corresponding reference block may be determined in the reference frame based on the predicted motion vectors; the reference block is then interpolated to generate fractional pixel positions, from which the actual motion vectors are determined.
The encoding side may further calculate the Motion Vector Difference (MVD) between the actual motion vector and the predicted motion vector, encode the MVD, and transmit the encoded MVD and the index of the predicted motion vector in the candidate list to the decoding side.
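A minimal sketch of the MVP/MVD relationship at the encoder is given below; representing the candidate list as plain lists of (x, y) tuples and selecting the MVP set by minimum absolute difference are simplifying assumptions for illustration.

```python
def encode_control_point_mvs(actual_mvs, candidate_list):
    """Choose an MVP set from the candidate list and derive the MVDs.

    actual_mvs:     motion vectors found by motion estimation, one (x, y)
                    tuple per control point of the affine coding unit.
    candidate_list: list of MVP sets built from neighbouring coding units;
                    each set holds one (x, y) vector per control point.
    Returns the chosen candidate index and the per-control-point MVDs;
    only these (never the sub-unit vectors) go into the bitstream.
    """
    def cost(mvps):  # total absolute difference against the actual MVs
        return sum(abs(mv[0] - p[0]) + abs(mv[1] - p[1])
                   for mv, p in zip(actual_mvs, mvps))

    index = min(range(len(candidate_list)),
                key=lambda i: cost(candidate_list[i]))
    mvds = [(mv[0] - p[0], mv[1] - p[1])
            for mv, p in zip(actual_mvs, candidate_list[index])]
    return index, mvds
```

The decoder reverses this: it rebuilds the same candidate list, reads the index and the MVDs, and recovers each control point vector as MVP + MVD.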
As described above, motion vector precision includes integer-pixel precision and fractional-pixel precision. Since the pixels at fractional positions do not themselves exist, they must be obtained by interpolating the reference block. Interpolation generates fractional pixel values between integer pixel positions from the values of the integer pixels. The more fractional pixel positions are generated between integer pixels, the higher the resolution of the reference block becomes, and the more accurately a displacement of fractional-pixel precision can be compensated. As interpolation precision improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
Specifically, the precision of a motion vector in the Affine mode may be an integer, i.e., integer-pixel precision, such as one pixel or 2 pixels; it may also be non-integer, i.e., sub-pixel precision, such as 1/2, 1/4, or 1/8. As an example, a pixel at a 1/2-precision position must be interpolated from pixels at integer positions, and the pixel values at other precision positions are obtained by further interpolation using integer-precision or 1/2-precision pixels.
For example, an interpolation filter may be selected according to the selected motion vector precision to interpolate the reference block.
In one embodiment, the same interpolation filter may be used for all motion vector precisions. For example, the existing six-tap interpolation filter is adopted by default for all precisions. In this case, no flag bit representing the type of interpolation filter needs to be set in the code stream, thereby saving one bit of data.
In another embodiment, different interpolation filters may be selected for different motion vector precisions. For example, in the conventional AMVP mode only 1/2-pixel precision employs a 6-tap interpolation filter, while the other precisions employ an 8-tap interpolation filter. Thus, in one embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected, a second interpolation filter is selected, where the first and second interpolation filters have different numbers of taps. Further, the first interpolation filter may be a 6-tap filter and the second an 8-tap filter. In this way, the design of the interpolation filter in the Affine mode better matches that of the conventional AMVP mode.
Further, if different interpolation filters are selected for different motion vector precisions, an identification bit for the filter type can be set in the code stream. For example, a 1 may indicate that the 6-tap interpolation filter is used, and a 0 that it is not, i.e., that the default 8-tap filter is used.
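The tap-count rule and the one-bit identification can be sketched as follows; the half-pel coefficients shown are the H.264-style 6-tap filter, used purely as an illustration (the normative VVC/AVS3 coefficients differ), and `write_bit` stands in for an unspecified entropy-coding interface.

```python
SIX_TAP = (1, -5, 20, 20, -5, 1)  # H.264-style half-pel filter, sum = 32

def filter_taps(mv_precision):
    """Mirror of the rule above: 6 taps only for 1/2-pel precision,
    the default 8-tap filter for every other precision."""
    return 6 if mv_precision == 1/2 else 8

def signal_filter_flag(write_bit, mv_precision):
    """One identification bit: 1 = the 6-tap filter, 0 = default 8-tap."""
    write_bit(1 if filter_taps(mv_precision) == 6 else 0)

def interp_half_pel(row, x):
    """Half-pel sample between integer positions x and x+1 of `row`
    (x must be at least 2 samples away from either border)."""
    taps = row[x - 2:x + 4]                        # six integer neighbours
    acc = sum(c * s for c, s in zip(SIX_TAP, taps))
    return (acc + 16) >> 5                         # divide by 32, rounded
```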
Thus, in one embodiment, when the method is applied to the decoding end and different interpolation filters are selected for different motion vector precisions, before the interpolation filter is selected according to the motion vector precision, the motion estimation method 100 further includes: acquiring a code stream in which the identification bit of the filter type corresponding to the motion vector is set.
As described above, in the process of acquiring the motion vectors of neighboring coding units to construct the candidate list, the acquired motion vectors may be the motion vectors of control points of coding units in the Affine mode, or the motion vectors of conventional coding units in the conventional mode. That is, the motion estimation may involve both the Affine mode and the conventional AMVP mode. Under the conventional AMVP mode, for a conventional coding unit divided in the current frame, motion estimation is performed in units of the entire coding unit.
For each conventional coding unit, applying Adaptive Motion Vector Resolution (AMVR) likewise includes adaptively selecting one of four motion vector precisions for motion estimation. The four motion vector precisions of a conventional coding unit may be the same as or different from those of an affine coding unit. For example, the four precisions may include integer-pixel, 4-pixel, 1/4-pixel, and 1/2-pixel precision. It should be noted, however, that the motion vector precisions are not limited to these four and may also include, for example, 1/8-pixel and 1/16-pixel precision.
For each conventional coding unit adopting the AMVR technique, the corresponding motion vector precision is adaptively decided at the encoding end, and the decision result is written into the code stream and transmitted to the decoding end. In the embodiment of the present invention, the identification representing the motion vector precision of an affine coding unit is consistent with the identification representing the motion vector precision of a conventional coding unit, which makes the two modes more uniform.
Thus, in one embodiment, when the motion estimation method 100 is applied to the decoding end, before one of the at least four motion vector precisions is selected for motion estimation in the reference frame, the method further includes: acquiring a code stream, in which the selected motion vector precision of the affine coding unit is recorded in an identification bit, and in which the identification representing the motion vector precision of an affine coding unit is consistent with the identification representing the motion vector precision of a conventional coding unit.
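On the decoding side, the shared identification could be read as in this sketch; the index-to-precision table and the fixed two-bit coding are illustrative assumptions rather than the normative bitstream syntax — the point is only that affine and conventional coding units interpret the same identification the same way.

```python
# One shared table, so the same identification denotes the same precision
# for affine and conventional coding units (values in pixel units).
AMVR_TABLE = {0: 1/4, 1: 1/2, 2: 1, 3: 4}

def parse_mv_precision(read_bits):
    """Read the AMVR identification from the code stream and map it to
    the selected motion vector precision."""
    index = read_bits(2)  # two bits are enough to address four precisions
    return AMVR_TABLE[index]
```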
In step S120, the affine coding unit is divided into sub-units.
The size of the sub-units may be fixed; for example, each sub-unit may be 4 × 4 pixels. Alternatively, the size of the sub-units may be determined in other ways; for example, a suitably sized sub-unit may be selected to reduce the complexity of encoding and decoding.
Thereafter, in step S130, the motion vector of the subunit in the affine coding unit is calculated from the motion vectors of the control points.
Illustratively, the motion field of the Affine mode may be derived from the motion vectors of two control points (four parameters) or three control points (six parameters). After the motion vectors of the control points are determined, for a four-parameter (two control point) affine coding unit, the motion vector of the sub-unit located at position (x, y) is calculated by the following formula (1):
$$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{w}\,x - \dfrac{mv_{1y} - mv_{0y}}{w}\,y + mv_{0x} \\[2mm] mv_y = \dfrac{mv_{1y} - mv_{0y}}{w}\,x + \dfrac{mv_{1x} - mv_{0x}}{w}\,y + mv_{0y} \end{cases} \tag{1}$$

where $(mv_{0x}, mv_{0y})$ is the motion vector of the top-left control point, $(mv_{1x}, mv_{1y})$ is the motion vector of the top-right control point, $(x, y)$ are the coordinates of the center point of the sub-unit, and $w$ is the width of the affine coding unit.
For a six-parameter (three control point) affine coding unit, the motion vector of the sub-unit located at position (x, y) is calculated by the following formula (2):
$$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{w}\,x + \dfrac{mv_{2x} - mv_{0x}}{h}\,y + mv_{0x} \\[2mm] mv_y = \dfrac{mv_{1y} - mv_{0y}}{w}\,x + \dfrac{mv_{2y} - mv_{0y}}{h}\,y + mv_{0y} \end{cases} \tag{2}$$

where $(mv_{0x}, mv_{0y})$ is the motion vector of the top-left control point, $(mv_{1x}, mv_{1y})$ is the motion vector of the top-right control point, $(mv_{2x}, mv_{2y})$ is the motion vector of the bottom-left control point, and $w$ and $h$ are the width and height of the affine coding unit.
The motion vectors in an affine coding unit computed by the above formulas are illustrated in fig. 3, where each square represents a sub-unit of size 4 × 4. All motion vectors calculated by the above formulas are rounded to a representation of 1/16-pixel precision. The sub-unit size of both the chrominance and luminance components is 4 × 4, and the motion vector of a 4 × 4 chrominance sub-unit can be obtained by averaging the motion vectors of its four corresponding 4 × 4 luminance sub-units. After the motion vector of each sub-unit is calculated, the prediction block of each sub-unit in the reference frame can be obtained through the motion compensation process. A predicted frame is then obtained from the motion vectors and prediction blocks; the difference between the predicted frame and the actual current frame is transformed, quantized, and transmitted to the decoding end, which can reconstruct the current frame from the motion vectors, the reference frame, and that difference.
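Equations (1) and (2), the 4 × 4 subdivision, the 1/16-pel rounding, and the chroma averaging described above can be combined into one floating-point sketch; a real codec performs these steps in fixed-point arithmetic, and the helper names here are assumptions for illustration.

```python
def affine_subunit_mvs(cp_mvs, w, h, sub=4):
    """Derive one motion vector per sub x sub luma sub-unit.

    cp_mvs: control point MVs as (x, y) tuples -- two entries for the
            four-parameter model (eq. 1), three for the six-parameter
            model (eq. 2).
    w, h:   width and height of the affine coding unit in pixels.
    Returns an (h//sub) x (w//sub) grid of MVs rounded to 1/16 pel.
    """
    mv0x, mv0y = cp_mvs[0]                    # top-left control point
    mv1x, mv1y = cp_mvs[1]                    # top-right control point
    dxx, dxy = (mv1x - mv0x) / w, (mv1y - mv0y) / w
    if len(cp_mvs) == 2:                      # four-parameter model
        dyx, dyy = -dxy, dxx                  # rotation/zoom constraint
    else:                                     # six-parameter model
        mv2x, mv2y = cp_mvs[2]                # bottom-left control point
        dyx, dyy = (mv2x - mv0x) / h, (mv2y - mv0y) / h

    def round_1_16(v):                        # snap to 1/16-pel units
        return round(v * 16) / 16

    grid = []
    for by in range(0, h, sub):
        row = []
        for bx in range(0, w, sub):
            x, y = bx + sub / 2, by + sub / 2  # sub-unit centre
            row.append((round_1_16(mv0x + dxx * x + dyx * y),
                        round_1_16(mv0y + dxy * x + dyy * y)))
        grid.append(row)
    return grid

def chroma_subunit_mv(luma_grid, cy, cx):
    """4:2:0 chroma 4 x 4 sub-unit MV as the average of its four
    co-located 4 x 4 luma sub-unit MVs, as described above."""
    mvs = [luma_grid[2 * cy + dy][2 * cx + dx]
           for dy in (0, 1) for dx in (0, 1)]
    return (sum(m[0] for m in mvs) / 4, sum(m[1] for m in mvs) / 4)
```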
Based on the above description, the motion estimation method according to the embodiment of the present invention unifies the design of motion vector precision in the affine mode with that in the conventional mode, improving coding performance.
Fig. 4 shows a flow diagram of a motion estimation method 400 according to another embodiment of the invention. As shown in fig. 4, the method 400 includes the steps of:
in step S410, for an affine coding unit in a current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in a reference frame, thereby determining the motion vectors of the control points of the affine coding unit, wherein the plurality of motion vector precisions includes 1/2-pixel precision;
in step S420, dividing the affine coding unit into a number of sub-units;
in step S430, the motion vector of the subunit in the affine coding unit is calculated from the motion vectors of the control points.
In step S410, the current frame is the video frame currently to be encoded. The reference frame is a video frame referred to when encoding the current frame. The reference frames in this embodiment include both a historical frame and a future frame.
An affine coding unit in the current frame is a Coding Unit (CU) divided in the current frame based on the affine motion compensation prediction (Affine) technique. In the Affine technique, the processing unit is no longer the whole coding unit; instead, the whole coding unit is divided into a number of sub-units, and motion compensation is performed in units of these sub-units.
Compared with a conventional coding unit, an affine coding unit in the Affine mode no longer has only one motion vector; instead, each sub-unit has its own motion vector. After the motion vectors of the control points are determined, the motion vector of each sub-unit is derived from the motion vectors of two control points (the four-parameter model, see the left diagram in fig. 2) or three control points (the six-parameter model, see the right diagram in fig. 2); only the motion vector information of the control points needs to be written into the code stream, not that of each sub-unit.
As described above, in order to determine the motion vectors of the sub-units, the motion vectors of the control points must first be determined. Because the motion of a natural object is continuous, its motion vector between two adjacent frames is not necessarily an exact integer number of pixel units, so the embodiment of the invention adopts the Adaptive Motion Vector Resolution (AMVR) technique to adaptively determine the precision of the motion vector at the encoding end. In the embodiment of the present invention, the motion vector of the control point is determined based on the Inter mode (also referred to as the AMVP mode) of the Affine mode; in this mode, the selection of the motion vector precision and the calculation of the MVD (Motion Vector Difference) are performed at the encoding end.
In the present embodiment, one of a plurality of motion vector precisions is selected for motion estimation in the reference frame, where the plurality of precisions includes 1/2-pixel precision. Illustratively, for each coding unit, one of four motion vector precisions may be selected. In addition to the fixed 1/2-pixel precision, the plurality of motion vector precisions may include any of 4-pixel, 2-pixel, integer-pixel, 1/4-pixel, 1/8-pixel, and 1/16-pixel precision. For example, one may be selected from integer-pixel, 1/2-pixel, 1/4-pixel, and 1/16-pixel precision for motion estimation.
In the current video coding software VTM-6.0, the conventional AMVP mode adds an AMVR precision of 1/2 pixel. Therefore, the embodiment of the invention adds 1/2-pixel precision to the selectable motion vector precisions, so that the design of the motion vector precision of the affine coding unit matches that of the conventional coding unit.
Note that the precision (1/16, 1/4, integer-pixel, and so on) indicated for the motion vector of a control point in the Affine mode is not necessarily the precision of the motion vectors actually used in the motion compensation process of the sub-units.
In one embodiment, the method of determining the motion vector precision comprises: selecting the motion vector precision according to the motion vector precision already selected by neighboring coding units.
In another embodiment, the method of determining the motion vector precision may further include: motion estimation is attempted based on at least two of the four motion vector precisions, and the motion vector precision is selected based on the effect of the motion estimation.
Specifically, two motion vector precisions may be selected from the selectable precisions, motion estimation attempted with each, and the two results compared. If motion estimation with the lower precision performs better, the trial stops and the lower precision is taken directly as the selected motion vector precision. If motion estimation with the higher precision performs better, motion estimation continues to be attempted at higher precisions until the best result is obtained.
In one embodiment, determining the motion vector of a control point of the affine coding unit includes: first, obtaining the motion vectors of spatially or temporally adjacent coding units, and constructing a candidate list from combinations of these motion vectors. The acquired motion vectors are combined to construct a candidate list of control point motion vectors, the number of motion vectors in each combination depending on the number of control points of the affine coding unit.
Then, a group of motion vectors is selected from the candidate list as the Motion Vector Predictors (MVPs) of the control points of the affine coding unit, and motion estimation is performed in the reference frame according to the predicted motion vectors to determine the actual motion vectors of the control points. For example, a corresponding reference block may be determined in the reference frame based on the predicted motion vectors; the reference block is then interpolated to generate fractional pixel positions, from which the actual motion vectors are determined. Interpolation generates fractional pixel values between integer pixel positions from the values of the integer pixels. The more fractional pixel positions are generated between integer pixels, the higher the resolution of the reference frame becomes, and the more accurately a displacement of fractional-pixel precision can be compensated. As interpolation precision improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
Specifically, the precision of a motion vector in the Affine mode may be an integer, i.e., integer-pixel precision, such as one pixel or 2 pixels; it may also be non-integer, i.e., sub-pixel precision, such as 1/2, 1/4, or 1/8. As an example, a pixel at a 1/2-precision position must be interpolated from pixels at integer positions, and the pixel values at other precision positions are obtained by further interpolation using integer-precision or 1/2-precision pixels.
For example, an interpolation filter may be selected according to the selected motion vector precision to interpolate the reference block.
In one embodiment, the same interpolation filter may be used for all motion vector precisions. For example, the existing six-tap interpolation filter is adopted by default for all precisions. In this case, no flag bit representing the type of interpolation filter needs to be set in the code stream, thereby saving one bit of data.
In another embodiment, different interpolation filters may be selected for different motion vector precisions. For example, in the conventional AMVP mode only 1/2-pixel precision employs a 6-tap interpolation filter, while the other precisions employ an 8-tap interpolation filter. Thus, in one embodiment of the present invention, when 1/2-pixel precision is selected as the motion vector precision, a first interpolation filter is selected to interpolate the reference block; when a precision other than 1/2-pixel precision is selected, a second interpolation filter is selected, where the first and second interpolation filters have different numbers of taps. Further, the first interpolation filter may be a 6-tap filter and the second an 8-tap filter. In this way, the design of the interpolation filter in the Affine mode better matches that of the conventional AMVP mode.
Further, if different interpolation filters are selected for different motion vector precisions, an identification bit for the filter type can be set in the code stream. For example, a 1 may indicate that the 6-tap interpolation filter is used, and a 0 that it is not, i.e., that the default 8-tap filter is used.
Thus, in one embodiment, when the method is applied to the decoding end and different interpolation filters are selected for different motion vector precisions, before the interpolation filter is selected according to the motion vector precision, the motion estimation method 400 further includes: acquiring a code stream in which the identification bit of the filter type corresponding to the motion vector is set.
In one embodiment, the motion estimation may involve both the Affine mode and the conventional AMVP mode. Under the conventional AMVP mode, for a conventional coding unit divided in the current frame, motion estimation is performed in units of the entire coding unit.
For each conventional coding unit, applying Adaptive Motion Vector Resolution (AMVR) likewise includes adaptively selecting one of a plurality of motion vector precisions, including 1/2-pixel precision, for motion estimation. Apart from 1/2-pixel precision, the selectable motion vector precisions of a conventional coding unit may be the same as or different from those of an affine coding unit. In one embodiment, the conventional coding unit also has four selectable motion vector precisions.
For each conventional coding unit adopting the AMVR technique, the corresponding motion vector precision is adaptively decided at the encoding end, and the decision result is written into the code stream and transmitted to the decoding end. In the embodiment of the present invention, the identification representing the motion vector precision of an affine coding unit is consistent with the identification representing the motion vector precision of a conventional coding unit, which makes the two modes more uniform.
Thus, in one embodiment, when the motion estimation method 400 is applied to the decoding end, before one of the plurality of motion vector precisions is selected for motion estimation in the reference frame, the method further includes: acquiring a code stream, in which the selected motion vector precision of the affine coding unit is recorded in an identification bit, and in which the identification representing the motion vector precision of an affine coding unit is consistent with the identification representing the motion vector precision of a conventional coding unit.
Thereafter, in step S420, the affine coding unit is divided into a number of sub-units, and in step S430, the motion vectors of the sub-units in the affine coding unit are calculated from the motion vectors of the control points. For the specific details of steps S420 and S430, reference may be made to the description of steps S120 and S130 of method 100, which is not repeated here.
Based on the above description, the motion estimation method according to the embodiment of the present invention adds 1/2-pixel precision to the selectable motion vector precisions in the affine mode, so that the design of motion vector precision in the affine mode is unified with that in the conventional mode, improving coding performance.
A motion estimation system 500 according to an embodiment of the invention is described below in conjunction with fig. 5.
Fig. 5 is a schematic block diagram of a motion estimation system 500 of an embodiment of the present invention. The motion estimation system 500 shown in fig. 5 includes: a processor 510, a storage device 520, and a computer program stored on the storage device 520 and runnable on the processor 510, which when executed by the processor implements the steps of the motion estimation method 100 shown in fig. 1 or the method 400 shown in fig. 4.
The processor 510 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or another form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the motion estimation system 500 to perform desired functions. For example, the processor 510 can include one or more embedded processors, processor cores, microprocessors, logic circuits, hardware Finite State Machines (FSMs), Digital Signal Processors (DSPs), or a combination thereof.
The storage device 520 includes one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 510 to implement the motion estimation methods of the embodiments of the invention described herein and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
In one embodiment, the system 500 further includes an input device (not shown), which may be a device used by a user to input instructions and may include one or more of an operating key, a keyboard, a mouse, a microphone, a touch screen, and the like. Furthermore, the input device may be any interface for receiving information.
In one embodiment, the system 500 further includes an output device, which may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display (e.g., for displaying video images to the user), speakers, and the like. The output device may be any other device having an output function.
In one embodiment, the system 500 further includes a communication interface for communicating with other devices, including wired or wireless communication.
Specifically, in one embodiment, the processor, when executing the program, performs the steps of: for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit; dividing the affine coding unit into a plurality of sub-units; and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
In another embodiment, the processor when executing the program performs the steps of: for an affine coding unit in a current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit, wherein the plurality of motion vector precisions comprise 1/2 pixel precisions; dividing the affine coding unit into a plurality of sub-units; and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
In addition, an embodiment of the invention also provides a storage medium on which a computer program is stored. The computer program, when executed by a processor, may implement the steps of the method illustrated in fig. 1 or fig. 4, as described above.
The storage medium is, for example, a computer-readable storage medium. The computer-readable storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit; dividing the affine coding unit into a plurality of sub-units; and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
In another embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: for an affine coding unit in a current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit, wherein the plurality of motion vector precisions comprise 1/2 pixel precisions; dividing the affine coding unit into a plurality of sub-units; and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
In summary, the motion estimation method, system, and storage medium of the present invention unify the design of motion vector precision in the affine mode with that in the conventional mode, improving coding performance. They can be used to improve the quality of compressed video and the hardware friendliness of the codec, and are of practical significance for the video compression processing of broadcast television, video conferencing, network video, and the like.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be construed to reflect an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules according to embodiments of the present invention. The present invention may also be implemented as apparatus or device programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals; such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description covers only specific embodiments of the present invention; the protection scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed herein, which shall all be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (36)

  1. A method of motion estimation, the method comprising:
    for an affine coding unit in a current frame, selecting one of at least four motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit;
    dividing the affine coding unit into a plurality of sub-units;
    and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
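By way of illustration of claim 1, the following minimal C++ sketch derives per-sub-unit motion vectors from two control-point motion vectors under an assumed 4-parameter affine model; the 4x4 sub-unit size, the floating-point arithmetic, and all identifiers are illustrative assumptions rather than the claimed implementation.

    #include <vector>

    struct MotionVector { double x = 0.0, y = 0.0; };

    // Minimal sketch: derive one motion vector per sub-unit of a width x height
    // affine coding unit from two control-point MVs, v0 at the top-left corner
    // and v1 at the top-right corner, by evaluating the 4-parameter affine
    // model at each sub-unit center.
    std::vector<MotionVector> deriveSubUnitMVs(const MotionVector& v0,
                                               const MotionVector& v1,
                                               int width, int height,
                                               int subSize = 4) {
        std::vector<MotionVector> mvs;
        const double a = (v1.x - v0.x) / width;  // scale/rotation terms of the model
        const double b = (v1.y - v0.y) / width;
        for (int y = 0; y < height; y += subSize) {
            for (int x = 0; x < width; x += subSize) {
                const double cx = x + subSize / 2.0;  // sub-unit center
                const double cy = y + subSize / 2.0;
                mvs.push_back({a * cx - b * cy + v0.x,
                               b * cx + a * cy + v0.y});
            }
        }
        return mvs;
    }

Evaluating the model at each sub-unit center, rather than at a corner, keeps the derived motion field symmetric across the coding unit.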
  2. The method of claim 1, wherein the at least four motion vector precisions comprise any four of 4-pixel, 2-pixel, integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel, and 1/16-pixel precision.
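A precision set of this kind is commonly handled by storing motion vectors at the finest precision and rounding to the selected one; the sketch below assumes 1/16-pel storage units, which is an assumption for illustration rather than a feature of the claim.

    #include <cstdint>

    // Minimal sketch (storage convention assumed): motion vector components are
    // kept in 1/16-pel units, and rounding to a coarser selected precision
    // clears the low bits. shift: 6 = 4-pel, 5 = 2-pel, 4 = integer-pel,
    // 3 = 1/2-pel, 2 = 1/4-pel, 1 = 1/8-pel, 0 = 1/16-pel.
    int32_t roundMvComponent(int32_t mvSixteenthPel, int shift) {
        if (shift == 0) return mvSixteenthPel;
        const int32_t offset = 1 << (shift - 1);  // bias for round-to-nearest
        return ((mvSixteenthPel + offset) >> shift) << shift;
    }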
  3. The method according to claim 1, wherein said determining motion vectors for control points of said affine coding unit comprises:
    acquiring motion vectors of spatially adjacent coding units or temporally adjacent coding units, and constructing a candidate list from combinations of the motion vectors of the spatially adjacent or temporally adjacent coding units;
    selecting a group of motion vectors in the candidate list as predicted motion vectors of control points of the affine coding unit;
    and performing motion estimation in the reference frame according to the predicted motion vectors to determine actual motion vectors of the control points of the affine coding unit.
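The candidate-selection step of claim 3 can be pictured with the minimal sketch below; the cost measure is a deliberately simplified stand-in (a real encoder would evaluate a SAD or rate-distortion cost of the predicted block), and all identifiers are illustrative.

    #include <cstdlib>
    #include <vector>

    struct Mv { int x = 0, y = 0; };
    // One candidate entry: a predicted MV per control point (two or three).
    using CandidateSet = std::vector<Mv>;

    // Minimal sketch: from a candidate list built out of the motion vectors of
    // spatially or temporally adjacent coding units, choose the control-point
    // MV set closest to a coarse block-level probe MV, and use it as the
    // predicted MVs for the subsequent motion estimation. Assumes a non-empty list.
    CandidateSet selectPredictedControlPointMVs(const std::vector<CandidateSet>& list,
                                                const Mv& coarseProbe) {
        const CandidateSet* best = &list.front();
        long bestCost = -1;
        for (const CandidateSet& cand : list) {
            long cost = 0;
            for (const Mv& mv : cand)
                cost += std::labs(mv.x - coarseProbe.x) + std::labs(mv.y - coarseProbe.y);
            if (bestCost < 0 || cost < bestCost) {
                bestCost = cost;
                best = &cand;
            }
        }
        return *best;
    }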
  4. The method of claim 1, further comprising: for a conventional coding unit in the current frame, performing motion estimation on the whole coding unit as a single unit.
  5. The method of claim 4, wherein the number of motion vector precisions available to the affine coding unit is the same as the number of motion vector precisions available to the conventional coding unit.
  6. The method of claim 5, further comprising: for each conventional coding unit, adaptively selecting one of four motion vector precisions for motion estimation of the conventional coding unit, wherein the four motion vector precisions of the conventional coding unit are the same as or different from the four motion vector precisions of the affine coding unit.
  7. The method of claim 6, further comprising: recording the selected motion vector precision of the affine coding unit in an identification bit of the code stream, wherein the identification representing the motion vector precision of the affine coding unit is consistent with the identification representing the motion vector precision of the conventional coding unit.
  8. The method of claim 6, wherein selecting one of at least four motion vector precisions for motion estimation in a reference frame further comprises:
    acquiring a code stream, wherein the selected motion vector precision of the affine coding unit is recorded in an identification bit of the code stream, and an identification representing the motion vector precision of the affine coding unit is consistent with an identification representing the motion vector precision of the conventional coding unit.
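To make the "consistent identification" of claims 7 and 8 concrete, the sketch below lets affine and conventional coding units share one precision table and one index coding, so encoder and decoder agree on a single flag format; the one-symbol fixed-length code and the particular four precisions are assumptions, since a practical codec would typically entropy-code the index.

    #include <cstdint>
    #include <vector>

    // Shared precision table: the same entries and the same index values are
    // used whether the coding unit is affine or conventional, so one unified
    // identification is written and parsed.
    enum class MvPrecision : uint8_t { QuarterPel = 0, HalfPel = 1, IntegerPel = 2, FourPel = 3 };

    // Illustrative writer: emit the precision index as one symbol of the code stream.
    void writePrecisionId(std::vector<uint8_t>& codeStream, MvPrecision p) {
        codeStream.push_back(static_cast<uint8_t>(p));
    }

    // Illustrative parser: read the same symbol back, independent of the CU type.
    MvPrecision parsePrecisionId(const std::vector<uint8_t>& codeStream, size_t pos) {
        return static_cast<MvPrecision>(codeStream.at(pos));
    }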
  9. The method according to claim 1, wherein each affine coding unit comprises two control points or three control points.
  10. The method of claim 1, further comprising: selecting an interpolation filter according to the motion vector precision, so as to perform interpolation processing on a reference block.
  11. The method of claim 10, wherein selecting an interpolation filter based on the motion vector precision comprises:
    selecting different interpolation filters according to different motion vector precisions; or
    using the same interpolation filter for all motion vector precisions.
  12. The method of claim 11, further comprising: if different interpolation filters are selected according to different motion vector precisions, setting an identification bit indicating the filter type in the code stream.
  13. The method of claim 11, wherein different interpolation filters are selected for different motion vector precisions;
    the selecting an interpolation filter according to the motion vector precision further comprises:
    acquiring a code stream, wherein the code stream carries an identification bit of the filter type corresponding to the motion vector precision.
  14. The method of claim 11, wherein selecting different interpolation filters according to different motion vector precisions comprises:
    selecting a first interpolation filter to interpolate a reference block when 1/2 pixel precision is selected as the motion vector precision;
    selecting a second interpolation filter to perform interpolation processing on the reference block when a precision other than the 1/2-pixel precision is selected as the motion vector precision, wherein
    the first interpolation filter and the second interpolation filter have different numbers of taps.
  15. The method of claim 14, wherein the first interpolation filter is a 6-tap interpolation filter.
  16. The method of claim 14, wherein the second interpolation filter is an 8-tap interpolation filter.
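The tap-count distinction of claims 14-16 reduces to a simple selection rule, sketched below; the coefficient values are placeholders rather than the claimed filters, and only the 6-tap-at-1/2-pel versus 8-tap-otherwise rule mirrors the claims.

    #include <array>

    struct InterpFilter {
        const int* taps;  // coefficient array
        int numTaps;      // 6 or 8 in this sketch
    };

    // Minimal sketch: a 6-tap filter when 1/2-pixel precision is selected as
    // the motion vector precision, an 8-tap filter for any other precision.
    // Coefficient values are illustrative placeholders.
    InterpFilter selectInterpolationFilter(bool halfPelSelected) {
        static const std::array<int, 6> kSixTap   = {3, -11, 40, 40, -11, 3};
        static const std::array<int, 8> kEightTap = {-1, 4, -11, 40, 40, -11, 4, -1};
        if (halfPelSelected)
            return {kSixTap.data(), 6};
        return {kEightTap.data(), 8};
    }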
  17. The method of claim 1, wherein selecting one of the at least four motion vector precisions for motion estimation in the reference frame comprises:
    selecting the motion vector precision according to the motion vector precisions already selected by neighboring coding units.
  18. The method of claim 1, wherein selecting one of the at least four motion vector precisions for motion estimation in the reference frame comprises:
    attempting motion estimation based on at least two of the motion vector precisions, and selecting the motion vector precision according to the effect of the motion estimation.
  19. The method of claim 18, wherein attempting motion estimation based on at least two of the motion vector precisions and selecting the motion vector precision according to the effect of the motion estimation comprises:
    selecting two of the motion vector precisions, attempting motion estimation with each, and comparing the effects of the two motion estimations;
    and if the motion estimation using the lower motion vector precision performs better, stopping the attempts and taking the lower motion vector precision directly as the selected motion vector precision; if the motion estimation using the higher motion vector precision performs better, continuing to attempt motion estimation with progressively higher motion vector precisions until the best motion estimation effect is obtained.
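Claim 19's trial strategy reads directly as a small search loop with early termination, sketched below; precision index 0 denotes the lowest (coarsest) precision, and tryME is an assumed callback that runs motion estimation at a given precision index and returns a cost in which lower is better.

    #include <functional>

    // Minimal sketch of the trial strategy: try two precisions; if the lower
    // (coarser) one already wins, stop; otherwise keep moving to higher
    // (finer) precisions while the cost keeps improving.
    int selectPrecision(int numPrecisions, const std::function<double(int)>& tryME) {
        const double lowCost  = tryME(0);  // lower motion vector precision
        const double highCost = tryME(1);  // higher motion vector precision
        if (lowCost <= highCost)
            return 0;                      // lower precision is better: stop trying
        int best = 1;
        double bestCost = highCost;
        for (int p = 2; p < numPrecisions; ++p) {
            const double cost = tryME(p);
            if (cost >= bestCost)
                break;                     // higher precision no longer improves the effect
            best = p;
            bestCost = cost;
        }
        return best;
    }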
  20. The method of claim 1, wherein the reference frame comprises a video frame preceding the current frame and a video frame following the current frame.
  21. A method of motion estimation, the method comprising:
    for an affine coding unit in a current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in a reference frame so as to determine a motion vector of a control point of the affine coding unit, wherein the plurality of motion vector precisions comprise a 1/2-pixel precision;
    dividing the affine coding unit into a plurality of sub-units;
    and calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.
  22. The method of claim 21, wherein the plurality of motion vector precisions further comprise at least one of a 4-pixel precision, a 2-pixel precision, an integer-pixel precision, a 1/4-pixel precision, a 1/8-pixel precision, and a 1/16-pixel precision.
  23. The method of claim 21, wherein said determining the motion vector for the control point of the affine coding unit comprises:
    acquiring motion vectors of spatially adjacent coding units or temporally adjacent coding units, and constructing a candidate list from combinations of the motion vectors of the spatially adjacent or temporally adjacent coding units;
    selecting a group of motion vectors in the candidate list as predicted motion vectors of control points of the affine coding unit;
    and performing motion estimation in the reference frame according to the predicted motion vectors to determine actual motion vectors of the control points of the affine coding unit.
  24. The method of claim 21, further comprising: for a conventional coding unit in the current frame, performing motion estimation on the whole coding unit as a single unit.
  25. The method of claim 24, wherein the number of motion vector precisions available to the affine coding unit is the same as the number of motion vector precisions available to the conventional coding unit.
  26. The method of claim 25, further comprising: recording the selected motion vector precision of the affine coding unit in an identification bit of the code stream, wherein the identification representing the motion vector precision of the affine coding unit is consistent with the identification representing the motion vector precision of the conventional coding unit.
  27. The method of claim 26, wherein selecting one of the plurality of motion vector precisions for motion estimation in the reference frame further comprises:
    acquiring a code stream, wherein the selected motion vector precision of the affine coding unit is recorded in an identification bit of the code stream, and an identification representing the motion vector precision of the affine coding unit is consistent with an identification representing the motion vector precision of the conventional coding unit.
  28. The method of claim 21, further comprising: selecting an interpolation filter according to the motion vector precision, so as to perform interpolation processing on a reference block.
  29. The method of claim 28, wherein selecting an interpolation filter based on the motion vector precision comprises:
    selecting different interpolation filters according to different motion vector precisions; or
    using the same interpolation filter for all motion vector precisions.
  30. The method of claim 29, wherein selecting different interpolation filters according to different motion vector precisions comprises:
    when the 1/2 pixel precision is selected as the motion vector precision, selecting a first interpolation filter to perform interpolation processing on a reference block;
    selecting a second interpolation filter to interpolate the reference block when a precision other than the 1/2-pixel precision is selected as the motion vector precision, wherein
    the first interpolation filter and the second interpolation filter have different numbers of taps.
  31. The method of claim 30, wherein the first interpolation filter is a 6-tap interpolation filter.
  32. The method of claim 30, wherein the second interpolation filter is an 8-tap interpolation filter.
  33. The method of claim 29, further comprising: if different interpolation filters are selected according to different motion vector precisions, setting an identification bit indicating the filter type in the code stream.
  34. The method of claim 33, wherein different interpolation filters are selected for different motion vector precisions;
    the selecting an interpolation filter according to the motion vector precision further comprises:
    acquiring a code stream, wherein the code stream carries an identification bit of the filter type corresponding to the motion vector precision.
  35. A motion estimation system, characterized in that the system comprises a storage device and a processor, the storage device storing a computer program to be executed by the processor, wherein the computer program, when executed by the processor, performs the motion estimation method according to any one of claims 1-34.
  36. A storage medium having stored thereon a computer program which, when executed, performs a motion estimation method as claimed in any one of claims 1-34.
CN201980066902.6A 2019-09-24 2019-09-24 Motion estimation method, system and storage medium Pending CN112868234A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/107601 WO2021056215A1 (en) 2019-09-24 2019-09-24 Motion estimation method and system, and storage medium

Publications (1)

Publication Number Publication Date
CN112868234A true CN112868234A (en) 2021-05-28

Family

ID=75165894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980066902.6A Pending CN112868234A (en) 2019-09-24 2019-09-24 Motion estimation method, system and storage medium

Country Status (2)

Country Link
CN (1) CN112868234A (en)
WO (1) WO2021056215A1 (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115037939B * 2016-05-27 2024-02-13 Panasonic Intellectual Property Corporation of America Encoding device and decoding device
CN107277506B * 2017-08-15 2019-12-03 Central South University Motion vector accuracy selection method and device based on adaptive motion vector precision

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011021914A2 * 2009-08-21 2011-02-24 SK Telecom Co., Ltd. Method and apparatus for encoding/decoding images using adaptive motion vector resolution
US20180176596A1 (en) * 2015-06-05 2018-06-21 Intellectual Discovery Co., Ltd. Image encoding and decoding method and image decoding device
CN108781284A (en) * 2016-03-15 2018-11-09 联发科技股份有限公司 The method and device of coding and decoding video with affine motion compensation
CN109792532A (en) * 2016-10-04 2019-05-21 高通股份有限公司 Adaptive motion vector precision for video coding
US20190364284A1 (en) * 2017-01-16 2019-11-28 Industry Academy Cooperation Foundation Of Sejong University Image encoding/decoding method and device
WO2019072187A1 (en) * 2017-10-13 2019-04-18 Huawei Technologies Co., Ltd. Pruning of motion model candidate list for inter-prediction
CN109729352A (en) * 2017-10-27 2019-05-07 华为技术有限公司 The method and apparatus for determining the motion vector of affine coding block
US20190246134A1 (en) * 2018-02-06 2019-08-08 Panasonic Intellectual Property Corporation Of America Encoding method, decoding method, encoder, and decoder
US20200260108A1 (en) * 2018-02-14 2020-08-13 Huawei Technologies Co., Ltd. Adaptive Interpolation Filter
CN110620932A (en) * 2018-06-19 2019-12-27 北京字节跳动网络技术有限公司 Mode dependent motion vector difference accuracy set

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SRI NITCHITH AKULA et al., "Description of SDR, HDR and 360° video coding technology proposal considering mobile application scenario by Samsung, Huawei, GoPro, and HiSilicon", Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113630601A (en) * 2021-06-29 2021-11-09 杭州未名信科科技有限公司 Affine motion estimation method, device, equipment and storage medium
CN113630602A (en) * 2021-06-29 2021-11-09 杭州未名信科科技有限公司 Affine motion estimation method and device for coding unit, storage medium and terminal
CN113630601B (en) * 2021-06-29 2024-04-02 杭州未名信科科技有限公司 Affine motion estimation method, affine motion estimation device, affine motion estimation equipment and storage medium

Also Published As

Publication number Publication date
WO2021056215A1 (en) 2021-04-01

Similar Documents

Publication Publication Date Title
US11856220B2 (en) Reducing computational complexity when video encoding uses bi-predictively encoded frames
JP7004782B2 (en) Image prediction method and related equipment
JP7252282B2 (en) Image prediction method and related device
JP6490203B2 (en) Image prediction method and related apparatus
RU2671307C1 (en) Method for predicting images and related device
TW202025725A (en) Sub-block mv inheritance between color components
US10440383B2 (en) Image predictive encoding and decoding system
US20230262228A1 (en) Method and apparatus for encoding or decoding video data in fruc mode with reduced memory accesses
CN109076234A (en) Image prediction method and relevant device
JP2015195575A (en) Moving picture encoding device and method
CN112868234A (en) Motion estimation method, system and storage medium
WO2021056212A1 (en) Method and apparatus for video encoding and decoding
CN112204977A (en) Video encoding and decoding method, device and computer readable storage medium
EP4128781A1 (en) Merge candidate list for gradual decoding refresh
CN111713109B (en) Video processing method, device and equipment
WO2020181507A1 (en) Image processing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210528