WO2021056215A1

WO2021056215A1 - Motion estimation method and system, and storage medium

Info

Publication number: WO2021056215A1
Application number: PCT/CN2019/107601
Authority: WO
Inventors: 马思伟; 孟学苇; 郑萧桢; 王苫社
Original assignee: 深圳市大疆创新科技有限公司; 北京大学
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2021-04-01
Also published as: CN112868234A

Abstract

A motion estimation method and system, and a storage medium. The method comprises: for an affine coding unit in a current frame, selecting one of at least four types of motion vector accuracy to perform motion estimation in a reference frame, so as to determine the motion vector of a control point of the affine coding unit (S110); dividing the affine coding unit into a plurality of subunits (S120); and calculating the motion vectors of the subunits in the affine coding unit according to the motion vector of the control point (S130). The motion estimation method and system and the storage medium unify the design of the motion vector accuracy in an affine mode with the motion vector accuracy in the conventional mode, thus improving the coding performance.

Description

Motion estimation method, system and storage medium

Technical field

The present invention relates to the technical field of video coding and decoding, in particular to a motion estimation method, system and storage medium.

Background technique

The basic principle of video coding is to exploit spatial, temporal, and codeword correlations to remove as much redundancy as possible. Current video coding schemes mainly include intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering.

Among them, inter-frame prediction exploits the temporal correlation between adjacent frames of a video: it uses a previously encoded and reconstructed frame as a reference frame and predicts the current frame (that is, the frame currently being encoded) through motion estimation (ME) and motion compensation (MC), thereby removing temporally redundant information from the video. Because adjacent frames in a video are correlated, an image can be divided into several coding units, the position of each coding unit can be searched for in an adjacent frame, and the relative spatial offset between the two positions can be obtained. This relative offset is usually referred to as a motion vector (MV), and the process of obtaining a motion vector is called motion estimation. Motion compensation is the process of using the MV and the reference frame to obtain the predicted frame. The predicted frame obtained in this way may differ from the original current frame, so the difference between the predicted frame and the current frame is transformed and quantized and then passed to the decoder together with the MV information. The decoder can then reconstruct the current frame from the MV, the reference frame, and the difference between the predicted frame and the current frame.
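Exemplarily, the search for a motion vector described above can be sketched as an integer-pixel block-matching search. This is a minimal illustration only: the sum of absolute differences (SAD) cost, the exhaustive search, and the names `sad` and `full_search` are assumptions for illustration and are not taken from the embodiments.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` and a displaced block of `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Return the integer-pixel motion vector (dx, dy) with the lowest SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip displacements that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy frames: the reference frame holds a bright 2x2 patch at column 3, row 2;
# in the current frame the same patch sits at column 2, row 2.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 4):
    for x in range(3, 5):
        ref[y][x] = 255
    for x in range(2, 4):
        cur[y][x] = 255

mv, cost = full_search(cur, ref, bx=2, by=2, size=2, search_range=2)
```

In this toy example the block at (2, 2) in the current frame is found one pixel to the right in the reference frame, so `mv` is `(1, 0)` with a cost of 0.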

Motion estimation is an important link that affects the efficiency of video coding. Therefore, how to optimize the motion estimation method has always been a concern of those skilled in the art.

Summary of the invention

This summary introduces a series of concepts in simplified form, which are described in further detail in the detailed description. The summary is not intended to identify the key features or essential technical features of the claimed technical solution, nor is it intended to determine the protection scope of the claimed technical solution.

In view of the shortcomings of the prior art, the first aspect of the embodiments of the present invention provides a motion estimation method, the method includes:

for the affine coding unit in the current frame, selecting one of at least four kinds of motion vector accuracy to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit;

dividing the affine coding unit into several subunits; and

calculating the motion vector of the subunit in the affine coding unit according to the motion vector of the control point.

The second aspect of the embodiments of the present invention provides another motion estimation method, and the method includes:

for the affine coding unit in the current frame, selecting one of a plurality of motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the plurality of motion vector precisions includes 1/2 pixel precision;

Dividing the affine coding unit into several subunits;

A third aspect of the embodiments of the present invention provides a motion estimation system. The system includes a storage device and a processor. The storage device stores a computer program that, when run by the processor, executes the above-mentioned motion estimation method.

A fourth aspect of the embodiments of the present invention provides a storage medium on which a computer program is stored, and the computer program executes the above-mentioned motion estimation method when running.

The motion estimation method, system and storage medium of the present invention unify the design of the motion vector accuracy in the affine mode with the motion vector accuracy in the conventional mode, and improve the coding performance.

Description of the drawings

The following drawings form a part of the present invention and are provided to aid understanding of it. The drawings show embodiments of the present invention and, together with their description, explain the principle of the present invention.

In the drawings:

Fig. 1 shows a flowchart of a motion estimation method according to an embodiment of the present invention;

Fig. 2 shows a schematic diagram of a motion vector of a control point of an affine coding unit according to an embodiment of the present invention;

Fig. 3 shows a schematic diagram of motion vectors of subunits of an affine coding unit according to an embodiment of the present invention;

Fig. 4 shows a flowchart of a motion estimation method according to another embodiment of the present invention;

Fig. 5 shows a structural block diagram of a motion estimation system according to an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention more obvious, the exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments of the present invention, and it should be understood that the present invention is not limited by the exemplary embodiments described herein. Based on the embodiments of the present invention described in the present invention, all other embodiments obtained by those skilled in the art without creative work should fall within the protection scope of the present invention.

In the following description, a lot of specific details are given in order to provide a more thorough understanding of the present invention. However, it is obvious to those skilled in the art that the present invention can be implemented without one or more of these details. In other examples, in order to avoid confusion with the present invention, some technical features known in the art are not described.

It should be understood that the present invention can be implemented in different forms and should not be construed as being limited to the embodiments presented here. On the contrary, the provision of these embodiments will make the disclosure thorough and complete, and will fully convey the scope of the present invention to those skilled in the art.

The terms used here are intended only to describe specific embodiments and not to limit the present invention. As used herein, the singular forms "a", "an" and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the terms "comprising" and/or "including", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups. As used herein, the term "and/or" includes any and all combinations of the related listed items.

In order to thoroughly understand the present invention, detailed steps and detailed structures will be proposed in the following description to explain the technical solution proposed by the present invention. The preferred embodiments of the present invention are described in detail as follows. However, in addition to these detailed descriptions, the present invention may also have other embodiments.

The motion estimation method of the embodiment of the present invention can be applied to the inter-frame prediction part of the video coding and decoding technology. In order to better understand the motion estimation method of the embodiment of the present invention, the following first introduces video coding and decoding.

Video is generally composed of multiple frames of images in a certain order. A single frame often contains many identical or similar spatial structures; that is, a video file contains a large amount of spatially redundant information. In addition, since the sampling interval between two adjacent frames of a video is extremely short, adjacent frames are usually highly similar; that is, the video contains a large amount of temporally redundant information. Furthermore, from the perspective of the visual sensitivity of the human eye, part of the video information can also be exploited for compression, namely visually redundant information.

In addition to the above-mentioned spatial redundancy, temporal redundancy and visual redundancy, video image information also has a series of redundant information such as information entropy redundancy, structural redundancy, knowledge redundancy, importance redundancy and so on. The purpose of video coding is to remove redundant information in a video sequence, so as to reduce storage space and save transmission bandwidth.

At present, video coding mainly includes intra-frame prediction, inter-frame prediction, transformation, quantization, entropy coding, and loop filtering. The embodiment of the present invention mainly aims at improving the inter-frame prediction part. Inter-frame prediction exploits the temporal correlation between adjacent frames of the video, uses a previously encoded and reconstructed frame as a reference frame, and predicts the current frame (the frame currently being encoded) through motion estimation and motion compensation, thereby removing temporally redundant information from the video.

The following describes the motion estimation method, system and storage medium of the present application in detail with reference to the accompanying drawings. In the case of no conflict, the following embodiments and features in the implementation can be combined with each other. The motion estimation method, system, and storage medium described in the embodiments of the present invention use the HEVC standard or its extension. However, the present invention is also applicable to other coding standards, such as the H.264 standard, the next generation video coding standard VVC, AVS3, or any other suitable coding standard.

Fig. 1 shows a flowchart of a motion estimation method 100 according to an embodiment of the present invention. As shown in FIG. 1, the method 100 includes the following steps:

In step S110, for the affine coding unit in the current frame, one of at least four kinds of motion vector accuracy is selected to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit.

Wherein, the current frame is the video frame currently to be encoded. The current frame can be a video frame collected in real time, or a video frame extracted from a storage medium.

The reference frame is the video frame to be referred to when encoding the current frame. The reference frame may be a reconstructed video frame obtained by reconstructing the encoded data corresponding to the video frame that can be used as the reference frame. Depending on the type of inter prediction, the reference frame can be a forward reference frame, a backward reference frame, or a bidirectional reference frame. Specifically, inter-frame prediction techniques include forward prediction, backward prediction, bidirectional prediction, and so on. Forward prediction uses the previous frame (historical frame) of the current frame as a reference frame to predict the current frame. Backward prediction uses the frame after the current frame (future frame) as a reference frame to predict the current frame. Bidirectional prediction uses not only historical frames but also future frames to predict the current frame. In this embodiment, a bidirectional prediction mode is adopted, that is, the reference frame includes both historical frames and future frames.

The affine coding unit in the current frame is a coding unit (CU) divided in the current frame based on the affine motion compensation prediction (Affine) technology.

Specifically, the traditional motion model covers only translational motion, whereas real scenes contain many other forms of motion, such as zooming, rotation, perspective motion and other irregular motions. The Affine technology was introduced to handle these. The processing unit in the Affine technology is no longer the entire coding unit; instead, the coding unit is divided into multiple sub-units, and motion compensation is performed sub-unit by sub-unit.

Compared with a conventional coding unit, an affine coding unit in the Affine mode no longer has only one motion vector; instead, each subunit in the affine coding unit has its own motion vector. After the motion vectors of the control points of the affine coding unit are determined, the motion vector of each subunit is derived from the motion vectors of two control points (that is, the four-parameter model, see the left part of Fig. 2) or three control points (that is, the six-parameter model, see the right part of Fig. 2). Only the motion vector information of the control points needs to be written into the code stream, not the motion vector information of each subunit.

As described above, in order to determine the motion vector of a subunit, the motion vector of the control point is determined first. In the process of motion estimation, due to the continuity of the motion of natural objects, the motion vector of an object between two adjacent frames is not necessarily an integer number of pixels. Therefore, the embodiment of the present invention adopts the adaptive motion vector accuracy (AMVR) technology to adaptively determine the accuracy of the motion vector on the encoder side. In the embodiment of the present invention, the motion vector of the control point is determined based on the Inter mode (also known as the AMVP mode) within the Affine mode. In this mode, the encoder side selects the motion vector accuracy and calculates the MVD (Motion Vector Difference).

In one embodiment, the selectable motion vector precision includes four kinds, and for each coding unit, one of the four kinds of motion vector precision is selected for motion estimation. The at least four motion vector precisions include any four of 4 pixels, 2 pixels, whole pixels, 1/2 pixels, 1/4 pixels, 1/8 pixels, and 1/16 pixels. For example, the four kinds of motion vector precisions may be integer pixel precision, 1/2 pixel precision, 1/4 pixel precision, and 1/16 pixel precision.
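Exemplarily, restricting a motion vector component to one of the listed precisions can be sketched as rounding to a multiple of that precision's step. This is a minimal sketch under stated assumptions: storing motion vectors in 1/16-pixel units, the rounding rule, and the names `PRECISION_STEP` and `round_to_precision` are illustrative and are not specified in the embodiments.

```python
# Step size of each precision, expressed in 1/16-pixel units
# (an assumed internal storage unit for this illustration).
PRECISION_STEP = {
    "4": 64, "2": 32, "1": 16, "1/2": 8, "1/4": 4, "1/8": 2, "1/16": 1,
}


def round_to_precision(mv_sixteenth, precision):
    """Round one MV component (in 1/16-pixel units) to the nearest multiple of the step."""
    step = PRECISION_STEP[precision]
    # Round half away from zero so positive and negative components behave symmetrically.
    sign = -1 if mv_sixteenth < 0 else 1
    return sign * ((abs(mv_sixteenth) + step // 2) // step) * step
```

For instance, a component of 21 (about 1.31 pixels) becomes 24 (1.5 pixels) at 1/2 pixel precision and 16 (1 pixel) at integer pixel precision.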

In the current video coding software VTM-6.0, the conventional AMVP mode includes four AMVR precisions. Therefore, compared with the previous Affine mode, which has three precisions, the embodiment of the present invention adds one more precision, so that the number of motion vector precisions available for the affine coding unit is the same as the number of motion vector precisions available for the conventional coding unit. In this way, the design of adaptive motion vector accuracy in the Affine mode is unified with the design of adaptive motion vector accuracy in the conventional AMVP mode. In one embodiment, the newly added precision is 1/2 pixel precision.

It should be noted that the motion vector accuracy of the control point referred to in the Affine mode, such as 1/16 accuracy, 1/4 accuracy, and integer pixel accuracy, is not the motion vector accuracy actually used in the process of sub-unit motion compensation.

In an embodiment, the method for determining the accuracy of the motion vector includes: selecting the accuracy of the motion vector according to the selected motion vector accuracy of the neighboring coding unit.

In another embodiment, the method for determining the accuracy of a motion vector may further include: attempting to perform motion estimation based on at least two of the four kinds of motion vector accuracy, and selecting the motion vector accuracy based on the effect of the motion estimation.

Specifically, two of the four optional motion vector precisions can be selected, motion estimation can be attempted with each of them, and the effects of the two attempts can be compared. For example, 1/2 pixel precision and integer pixel precision can each be used to perform motion estimation.

After that, compare the effects of motion estimation. If the motion estimation effect with lower motion vector accuracy is better, stop trying, and directly use the lower motion vector accuracy as the selected motion vector accuracy. For example, if the effect of using the integer pixel precision for motion estimation is better than the effect of using 1/2 pixel accuracy for motion estimation, then no other precision attempts are made, and the integer pixel precision is directly selected. If the motion estimation effect with higher motion vector accuracy is better, continue to use higher motion vector accuracy to try motion estimation until the best motion estimation effect is obtained. For example, if the effect of using 1/2 pixel accuracy for motion estimation is better than that of whole pixel accuracy, then continue to try 1/4 pixel accuracy for motion estimation. If the effect of motion estimation under 1/2 pixel accuracy is better than the effect of motion estimation under 1/4 pixel accuracy, then 1/2 pixel accuracy is selected as the motion vector accuracy. If the effect of motion estimation under 1/4 pixel accuracy is better than the effect of motion estimation under 1/2 pixel accuracy, you can continue to compare the effect of motion estimation under 1/8 pixel accuracy.
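Exemplarily, the try-and-compare procedure above can be sketched as a coarse-to-fine loop with early termination. This is a minimal sketch, assuming the "effect" of motion estimation is expressed as a cost where lower is better; the callback `cost_at`, the function name `select_precision`, the default precision order, and the handling of ties are all assumptions for illustration.

```python
def select_precision(cost_at, precisions=("1", "1/2", "1/4", "1/8", "1/16")):
    """Pick a motion vector precision by trying successively finer precisions.

    `cost_at` is a hypothetical callback that runs motion estimation at the
    given precision and returns its cost (lower is better). `precisions` is
    ordered from coarse to fine.
    """
    best = precisions[0]
    best_cost = cost_at(best)
    for finer in precisions[1:]:
        cost = cost_at(finer)
        if cost >= best_cost:
            # The coarser precision is at least as good: stop trying finer ones.
            break
        best, best_cost = finer, cost
    return best
```

With this structure, a coding unit for which integer pixel precision already beats 1/2 pixel precision costs only two motion estimation attempts.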

In an embodiment, determining the motion vector of the control point of the affine coding unit includes: first, obtaining the motion vectors of spatially or temporally adjacent coding units, and constructing a candidate list from combinations of the motion vectors of the spatially adjacent or temporally adjacent coding units.

The motion vector obtained in this process may be the motion vector of the control point of the coding unit in the Affine mode, or the motion vector of the conventional coding unit in the traditional mode. After that, the obtained motion vectors are combined to construct a candidate list of control point motion vectors, and the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
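Exemplarily, combining neighbouring motion vectors into candidate groups, one motion vector per control point, can be sketched as follows. This is a simplified illustration only: the function name `build_candidate_list`, the input layout, the duplicate removal, and the cap `max_candidates` are assumptions and do not reproduce the candidate derivation of any particular codec.

```python
from itertools import product


def build_candidate_list(neighbor_mvs_per_control_point, max_candidates=2):
    """Build candidate groups of control point motion vectors.

    `neighbor_mvs_per_control_point` holds one list of neighbouring motion
    vectors per control point, so each candidate contains as many motion
    vectors as there are control points (two or three).
    """
    candidates = []
    for combo in product(*neighbor_mvs_per_control_point):
        if combo not in candidates:  # skip duplicate combinations
            candidates.append(combo)
        if len(candidates) == max_candidates:
            break
    return candidates
```

For a four-parameter affine coding unit the input has two lists and each candidate is a pair of motion vectors; for a six-parameter unit each candidate is a triple.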

Afterwards, a group of motion vectors is selected from the candidate list as the motion vector predictor (MVP) of the control points of the affine coding unit, and motion estimation is performed in the reference frame according to the motion vector predictor to determine the actual motion vectors of the control points of the affine coding unit. For example, the corresponding reference block can be determined in the reference frame according to the predicted motion vector. After that, interpolation processing is performed on the reference block to generate fractional pixels, and then the actual motion vector is determined.

The encoding end can also calculate the difference MVD (Motion Vector Difference) between the actual motion vector and the predicted motion vector, encode the MVD, and send the encoded MVD and the index of the predicted motion vector in the candidate list to the decoding end.
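Exemplarily, the MVD calculation at the encoding end and the matching reconstruction at the decoding end can be sketched as below. This is a minimal sketch that treats each control point independently; the function names `encode_mvd` and `decode_mvs` are hypothetical, and entropy coding of the MVD is omitted.

```python
def encode_mvd(actual_mvs, predicted_mvs):
    """Per control point difference between the actual and the predicted motion vector."""
    return [(ax - px, ay - py)
            for (ax, ay), (px, py) in zip(actual_mvs, predicted_mvs)]


def decode_mvs(mvds, predicted_mvs):
    """Recover the actual motion vectors from the MVDs and the same predictor."""
    return [(dx + px, dy + py)
            for (dx, dy), (px, py) in zip(mvds, predicted_mvs)]
```

Because the decoding end rebuilds the same candidate list, the index of the predictor plus the MVD is sufficient to recover the actual motion vectors.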

As described above, the accuracy of the motion vector includes integer pixel accuracy and fractional pixel accuracy. Since pixels at fractional pixel positions do not exist in the reference frame, the reference block must be interpolated to obtain the pixels at sub-pixel positions. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference block becomes, and the more accurately a displacement of fractional pixel precision can be compensated. As the interpolation accuracy improves, the efficiency of motion estimation and motion compensation improves to a certain extent.

Specifically, the accuracy of the motion vector in the Affine mode can be an integer-pixel accuracy, such as 1 pixel or 2 pixels; it can also be a sub-pixel accuracy, such as 1/2, 1/4, or 1/8 pixel. As an example, pixels at 1/2-precision positions are obtained by interpolating the pixels at integer-pixel positions. Pixel values at other precision positions are obtained by further interpolation using integer-pixel pixels or 1/2-precision pixels.
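Exemplarily, generating 1/2-precision pixels from integer pixels can be sketched for a single row of samples. The coefficients below are those of the H.264 six-tap half-sample filter and are used here only as an assumed example; the embodiments do not specify filter coefficients. The border clamping, the 8-bit clipping, and the name `half_pel_row` are likewise assumptions.

```python
# H.264-style 6-tap half-sample filter (assumed for illustration); the taps sum to 32.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel_row(row):
    """Return the half-pixel samples between row[i] and row[i + 1] for every i."""
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            # The taps cover integer positions i-2 .. i+3; clamp at the row borders.
            j = min(max(i - 2 + k, 0), n - 1)
            acc += tap * row[j]
        # Normalise by 32 with rounding, then clip to the 8-bit sample range.
        out.append(min(max((acc + 16) >> 5, 0), 255))
    return out
```

On a linear ramp such as `[0, 10, 20, 30, 40, 50]`, the sample generated between 20 and 30 is 25, as expected of a half-pixel position.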

Exemplarily, an interpolation filter can be selected according to the selected motion vector accuracy to perform interpolation processing on the reference block.

In one embodiment, the same interpolation filter may be used for all motion vector accuracy. For example, for all motion vector accuracy, the existing six-tap interpolation filter is used by default. In this case, the identification bit that characterizes the type of the interpolation filter may not be set in the code stream, thereby saving one bit of data.

In another embodiment, different interpolation filters can be selected according to different motion vector accuracy. For example, in the conventional AMVP mode, only 1/2 precision uses a 6-tap interpolation filter, and all other precisions use an 8-tap interpolation filter. Therefore, in an embodiment of the present invention, when 1/2 pixel precision is selected as the motion vector precision, a first interpolation filter is selected to perform interpolation processing on the reference block; when a precision other than 1/2 pixel precision is selected as the motion vector precision, a second interpolation filter is selected to perform interpolation processing on the reference block, wherein the first interpolation filter and the second interpolation filter have different numbers of taps. Further, the first interpolation filter may be a 6-tap interpolation filter, and the second interpolation filter may be an 8-tap interpolation filter. As a result, the interpolation filter design in the Affine mode better matches the interpolation filter design in the traditional AMVP mode.

Further, if different interpolation filters are selected according to different motion vector accuracy, then the filter type identification bit can be set in the code stream. For example, 1 can be used to indicate that a 6-tap interpolation filter is used; 0 can be used to indicate that a 6-tap interpolation filter is not used, that is, the default 8-tap interpolation filter is used.
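Exemplarily, the filter selection and the identification bit described above can be sketched as follows. The function names `select_filter_taps`, `filter_flag`, and `taps_from_flag` and the string labels for the precisions are hypothetical; only the mapping (1/2 pixel precision to the 6-tap filter, other precisions to the 8-tap filter, bit 1 for 6-tap and bit 0 for 8-tap) comes from the description.

```python
def select_filter_taps(precision):
    """Number of interpolation filter taps for a given motion vector precision."""
    return 6 if precision == "1/2" else 8


def filter_flag(precision):
    """Identification bit written to the code stream: 1 = 6-tap, 0 = default 8-tap."""
    return 1 if select_filter_taps(precision) == 6 else 0


def taps_from_flag(flag):
    """Decoding-end counterpart: recover the filter length from the identification bit."""
    return 6 if flag == 1 else 8
```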

Therefore, in one embodiment, when the method is applied to the decoding end and different interpolation filters are selected for different motion vector accuracies, the motion estimation method 100 further includes, before selecting the interpolation filter according to the motion vector accuracy: acquiring a code stream in which an identification bit indicating the filter type corresponding to the motion vector is set.

As mentioned above, in the process of obtaining the motion vectors of adjacent coding units to construct the candidate list, what is obtained can be the motion vector of the control point of a coding unit in the Affine mode, or the motion vector of a conventional coding unit in the traditional mode. In other words, motion estimation can include both the Affine mode and the conventional AMVP mode. In the conventional AMVP mode, for a conventional coding unit divided in the current frame, motion estimation is performed with the entire coding unit as a unit.

For each conventional coding unit, applying adaptive motion vector accuracy (AMVR) likewise includes adaptively selecting one of four motion vector accuracies for motion estimation. The four motion vector accuracies of the conventional coding unit may be the same as or different from the four motion vector accuracies of the affine coding unit. For example, the four motion vector precisions may include integer pixel, 4 pixel, 1/4 pixel and 1/2 pixel precision. However, it should be noted that the accuracy of the motion vector is not limited to the above four types; for example, it may also include 1/8 pixel, 1/16 pixel, and so on.

For each conventional coding unit using AMVR technology, the corresponding motion vector accuracy is adaptively decided at the coding end, and the result of the decision is written into the code stream and passed to the decoding end. In the embodiment of the present invention, the identifier indicating the accuracy of the motion vector of the affine coding unit is consistent with the identifier indicating the accuracy of the motion vector of the conventional coding unit, so that the two modes are more unified.

Therefore, in one embodiment, when the motion estimation method 100 is applied to the decoding end, before selecting one of the at least four motion vector precisions to perform motion estimation in the reference frame, the method further includes: acquiring a code stream whose identification bit records the selected motion vector accuracy of the affine coding unit, wherein the identifier representing the motion vector accuracy of the affine coding unit is consistent with the identifier representing the motion vector accuracy of the conventional coding unit.

In step S120, the affine coding unit is divided into several subunits.

Wherein, the size of the sub-units may be fixed, for example, each sub-unit is divided into a size of 4×4 pixels. Alternatively, the size of the subunit may also be determined in other ways. For example, a subunit of an appropriate size may be selected to reduce the complexity of coding and decoding.
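Exemplarily, dividing an affine coding unit into fixed-size subunits and locating their center points can be sketched as follows. This is a minimal sketch, assuming the width and height of the coding unit are multiples of the subunit size; the function name `split_into_subunits` and the coordinate convention (origin at the top-left corner of the coding unit) are assumptions for illustration.

```python
def split_into_subunits(width, height, sub=4):
    """Return the (x, y) center of each sub x sub subunit, in raster order.

    Coordinates are relative to the top-left corner of the coding unit.
    """
    return [(x + sub / 2, y + sub / 2)
            for y in range(0, height, sub)
            for x in range(0, width, sub)]
```

An 8×8 coding unit, for example, yields four 4×4 subunits with centers at (2, 2), (6, 2), (2, 6) and (6, 6).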

After that, in step S130, the motion vector of the subunit in the affine coding unit is calculated according to the motion vector of the control point.

Exemplarily, the motion field of the Affine mode can be derived from the motion vectors of two control points (four parameters) or three control points (six parameters). After determining the motion vector of the control point, for the four-parameter (two control points) affine coding unit, the motion vector of the subunit located at the (x, y) position is calculated by the following formula (1):

Among them, (mv_0x, mv_0y) is the motion vector of the control point in the upper left corner, (mv_1x, mv_1y) is the motion vector of the control point in the upper right corner, x and y are the coordinates of the center point of the subunit, and w is the width of the affine coding unit.
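With the symbols defined above, the four-parameter affine model is commonly written in the following form (as used, for example, in VVC). This standard form is assumed here to correspond to formula (1); (mv_x, mv_y) denotes the motion vector of the subunit.

```latex
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{w}\,x - \dfrac{mv_{1y} - mv_{0y}}{w}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{w}\,x + \dfrac{mv_{1x} - mv_{0x}}{w}\,y + mv_{0y}
\end{cases}
\tag{1}
```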

For the six-parameter (three control points) affine coding unit, the motion vector of the sub-unit at the position (x, y) is calculated by the following formula (2):

Among them, (mv_0x, mv_0y) is the motion vector of the control point in the upper left corner, (mv_1x, mv_1y) is the motion vector of the control point in the upper right corner, (mv_2x, mv_2y) is the motion vector of the control point in the lower left corner, and w is the width of the affine coding unit.
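With the symbols defined above, the six-parameter affine model is commonly written in the following form (as used, for example, in VVC). This standard form is assumed here to correspond to formula (2); (mv_x, mv_y) denotes the motion vector of the subunit, and h, the height of the affine coding unit, is a symbol introduced for this illustration.

```latex
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{w}\,x + \dfrac{mv_{2x} - mv_{0x}}{h}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{w}\,x + \dfrac{mv_{2y} - mv_{0y}}{h}\,y + mv_{0y}
\end{cases}
\tag{2}
```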

After the above calculation, the motion vectors in an affine coding unit are as shown schematically in Fig. 3, where each square represents a 4×4 subunit. All motion vectors obtained from the above formulas are rounded to a 1/16 pixel precision representation. The subunits of both the chrominance component and the luminance component are 4×4 in size, and the motion vector of a 4×4 chrominance subunit can be obtained by averaging the motion vectors of the corresponding four 4×4 luminance subunits. After the motion vector of each subunit is calculated, the prediction block of each subunit in the reference frame can be obtained through a motion compensation process. After that, the prediction frame can be obtained by using the motion vector and the prediction block. The encoding end transforms and quantizes the difference between the prediction frame and the actual current frame and transfers it to the decoding end, and the decoding end can reconstruct the current frame from the motion vector, the reference frame, and the difference between the predicted frame and the current frame.
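Exemplarily, the derivation of subunit motion vectors from the control point motion vectors, the rounding to 1/16 pixel precision, and the chrominance averaging can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the commonly published four-parameter and six-parameter affine models, stores motion vector components in 1/16-pixel units, and uses floating-point arithmetic with Python's built-in `round` rather than the integer shifts a real codec would use. The names `subunit_mv` and `chroma_mv` and the height parameter `h` are hypothetical.

```python
def subunit_mv(cpmvs, x, y, w, h=None):
    """Motion vector at (x, y) from 2 or 3 control point motion vectors.

    `cpmvs` lists the upper-left, upper-right and (optionally) lower-left
    control point MVs in 1/16-pixel units; `w` and `h` are the width and
    height of the affine coding unit. The result is rounded to an integer
    number of 1/16-pixel units.
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    a = (mv1x - mv0x) / w  # horizontal gradient of mv_x
    b = (mv1y - mv0y) / w  # horizontal gradient of mv_y
    if len(cpmvs) == 2:
        # Four-parameter model: the vertical gradients follow from a and b.
        c, d = -b, a
    else:
        # Six-parameter model: the vertical gradients come from the third control point.
        mv2x, mv2y = cpmvs[2]
        c = (mv2x - mv0x) / h
        d = (mv2y - mv0y) / h
    return (round(a * x + c * y + mv0x), round(b * x + d * y + mv0y))


def chroma_mv(luma_mvs):
    """Average the motion vectors of the corresponding luminance subunits."""
    n = len(luma_mvs)
    return (round(sum(mx for mx, _ in luma_mvs) / n),
            round(sum(my for _, my in luma_mvs) / n))
```

If both control points carry the same motion vector, every subunit receives that vector, which reduces the affine model to the conventional translational case.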

Based on the above description, the motion estimation method according to the embodiment of the present invention unifies the design of the motion vector accuracy in the affine mode with the motion vector accuracy in the normal mode, and improves the coding performance.

Fig. 4 shows a flowchart of a motion estimation method 400 according to another embodiment of the present invention. As shown in FIG. 4, the method 400 includes the following steps:

In step S410, for the affine coding unit in the current frame, select one from a variety of motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the various motion vector precisions include 1/2 pixel precision;

In step S420, the affine coding unit is divided into several subunits;

In step S430, the motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point.

In step S410, the current frame is the video frame currently to be encoded. The reference frame is the video frame to be referred to when encoding the current frame. The reference frame in this embodiment includes both historical frames and future frames.

The affine coding unit in the current frame is a coding unit (CU) divided in the current frame based on the affine motion compensation prediction (Affine) technology. The processing unit in the Affine technology is no longer the entire coding unit, but divides the entire coding unit into multiple sub-units. In the process of motion compensation, motion compensation is performed in the unit of sub-units.

As mentioned above, in order to determine the motion vector of a subunit, the motion vector of the control point needs to be determined first. In the process of motion estimation, due to the continuity of the motion of natural objects, the motion vector of an object between two adjacent frames is not necessarily an integer number of pixels. Therefore, the embodiment of the present invention adopts the adaptive motion vector accuracy (AMVR) technology to adaptively determine the accuracy of the motion vector on the encoder side. In the embodiment of the present invention, the motion vector of the control point is determined based on the Inter mode (also known as the AMVP mode) within the Affine mode. In this mode, the encoder side selects the motion vector accuracy and calculates the MVD (Motion Vector Difference).

In this embodiment, one of multiple motion vector precisions is selected to perform motion estimation in the reference frame, where the multiple motion vector precisions include 1/2 pixel precision. Exemplarily, for each coding unit, one of four motion vector precisions can be selected for motion estimation. In addition to the fixed 1/2 pixel precision, the multiple motion vector precisions include any of 4 pixel, 2 pixel, integer pixel, 1/4 pixel, 1/8 pixel, and 1/16 pixel precision. For example, one of integer pixel precision, 1/2 pixel precision, 1/4 pixel precision, and 1/16 pixel precision can be selected for motion estimation.
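As a rough illustration of what selecting a precision means for a motion vector, the following sketch rounds a motion vector component to a chosen precision. It assumes that vectors are stored as integers in 1/16 pixel units, which is common codec practice but is not stated in this description; the names `PRECISIONS`, `STORAGE_UNIT`, `round_to_precision` and `example_precision_set` are hypothetical.

```python
from fractions import Fraction

# Candidate precisions named in the text, in pixels (coarse to fine).
PRECISIONS = [Fraction(4), Fraction(2), Fraction(1), Fraction(1, 2),
              Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)]

# Assumption: vector components are stored as integers in 1/16 pixel units.
STORAGE_UNIT = Fraction(1, 16)


def round_to_precision(mv_component: int, precision: Fraction) -> int:
    """Round a component (in 1/16 pixel units) to a multiple of `precision`."""
    step = int(precision / STORAGE_UNIT)   # e.g. 1/2 pixel -> 8 storage units
    # Round half away from zero so positive and negative values are symmetric.
    sign = -1 if mv_component < 0 else 1
    return sign * ((abs(mv_component) + step // 2) // step) * step


def example_precision_set():
    """The example set of four precisions given in the text."""
    return [Fraction(1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 16)]
```

With this storage unit, a component of 13 (that is, 13/16 pixel) becomes 16 (one full pixel) at 1/2 pixel precision, while it is left unchanged at 1/16 pixel precision.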

In the video coding reference software VTM-6.0, 1/2 pixel precision has been added to AMVR in the conventional AMVP mode. Therefore, the embodiment of the present invention adds 1/2 pixel precision to the optional motion vector precisions, so that the design of the motion vector precision of the affine coding unit matches the design of the motion vector precision of the conventional coding unit.

Specifically, two motion vector precisions can be selected from the four optional motion vector precisions, motion estimation is attempted with each of them, and the effects of the two motion estimations are compared. If the motion estimation with the lower motion vector precision gives the better effect, no further attempts are made, and the lower motion vector precision is used directly as the selected motion vector precision. If the motion estimation with the higher motion vector precision gives the better effect, motion estimation continues to be attempted with still higher motion vector precisions until the best motion estimation effect is obtained.
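The early-stopping selection described above might be sketched as follows. This is only one possible reading of the procedure: the precisions are assumed to be ordered from lowest to highest accuracy, and `cost_at` is a hypothetical callback that returns the cost of motion estimation at a given precision (for example a rate-distortion cost), with a lower cost meaning a better effect.

```python
from typing import Callable, Sequence, Tuple


def select_precision(precisions: Sequence[float],
                     cost_at: Callable[[float], float]) -> Tuple[float, float]:
    """Pick a precision by trying coarse to fine and stopping early.

    `precisions` is ordered from lowest accuracy (coarsest) to highest.
    `cost_at` is a hypothetical callback returning the motion-estimation
    cost at a given precision; lower is better.
    """
    if len(precisions) < 2:
        raise ValueError("need at least two precisions to compare")

    best_prec = precisions[0]
    best_cost = cost_at(best_prec)
    for prec in precisions[1:]:
        cost = cost_at(prec)
        if cost >= best_cost:
            # The lower-accuracy choice is at least as good: stop trying.
            break
        best_prec, best_cost = prec, cost
    return best_prec, best_cost
```

The benefit of this design is that the finest precisions are never evaluated when a coarser one already wins, which saves encoder time at the risk of missing a better result further down the list.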

In an embodiment, determining the motion vector of the control point of the affine coding unit includes: first, obtaining the motion vectors of spatially adjacent or temporally adjacent coding units. After that, the obtained motion vectors are combined to construct a candidate list of control point motion vectors, and the number of motion vectors in each combination depends on the number of control points of the affine coding unit.
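A minimal sketch of such a candidate list is given below. It assumes that the neighbouring motion vectors have already been gathered separately for each control point, and that the list is capped at two candidates; both the grouping and the cap are assumptions made for illustration, and `build_candidate_list` is a hypothetical name.

```python
from itertools import product
from typing import List, Sequence, Tuple

MV = Tuple[int, int]


def build_candidate_list(neighbor_mvs_per_cp: Sequence[Sequence[MV]],
                         max_candidates: int = 2) -> List[Tuple[MV, ...]]:
    """Combine neighbour vectors into control-point MV candidates.

    neighbor_mvs_per_cp[i] holds the vectors gathered from spatial or
    temporal neighbours for control point i, so each combination has as
    many vectors as there are control points (two or three).
    """
    candidates: List[Tuple[MV, ...]] = []
    for combo in product(*neighbor_mvs_per_cp):
        if combo not in candidates:          # drop duplicate combinations
            candidates.append(combo)
        if len(candidates) >= max_candidates:
            break
    return candidates
```

For an affine coding unit with two control points, each entry of the resulting list is a pair of motion vectors; with three control points, each entry is a triple.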

Afterwards, a group of motion vectors is selected from the candidate list as the motion vector predictor (MVP) of the control points of the affine coding unit, and motion estimation is performed in the reference frame according to the motion vector predictor to determine the actual motion vector of the control points of the affine coding unit. For example, the corresponding reference block can be determined in the reference frame according to the motion vector predictor. After that, interpolation processing is performed on the reference block to generate fractional pixels, and then the actual motion vector is determined. Interpolation uses the values of integer pixels to generate fractional pixels between the integer samples. The more fractional pixels are generated between integer pixels, the higher the resolution of the reference frame becomes, and the more accurately a displacement of fractional pixel precision can be compensated. As the interpolation precision improves, the efficiency of motion estimation and motion compensation improves to a certain extent.
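To make the interpolation step concrete, the sketch below generates the half-pixel samples of a single row of integer pixels. It borrows the well-known 8-tap half-sample luma filter of HEVC purely as a familiar example; the description does not state which coefficients this embodiment uses, and the border clamping and the name `interpolate_half_pel` are likewise assumptions.

```python
from typing import List, Sequence

# Half-sample luma filter from HEVC, used here only as a familiar example;
# the text does not say which coefficients this embodiment uses.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)   # coefficients sum to 64


def interpolate_half_pel(row: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Return the half-pixel samples between row[i] and row[i + 1]."""
    n = len(row)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # Taps cover integer positions i-3 .. i+4; clamp at the borders.
            pos = min(max(i - 3 + k, 0), n - 1)
            acc += tap * row[pos]
        value = (acc + 32) >> 6               # divide by 64 with rounding
        out.append(min(max(value, 0), max_val))
    return out
```

A two-dimensional reference block would apply the same filter along rows and then along columns; finer precisions such as 1/4 or 1/16 pixel use further filter phases that are not shown here.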

Therefore, in one embodiment, when applied to the decoding end, if different interpolation filters are selected for different motion vector precisions, then before selecting the interpolation filter according to the motion vector precision, the motion estimation method 400 further includes: acquiring a code stream, in which an identification bit of the filter type corresponding to the motion vector is set.
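The relationship between motion vector precision, filter choice, and the filter-type identification bit can be sketched as follows. The claims of this document mention a 6-tap first filter for 1/2 pixel precision and an 8-tap second filter for other precisions; the use of the 8-tap filter in the case where one filter serves all precisions is an assumption, as is the name `select_interpolation_filter`.

```python
from fractions import Fraction
from typing import Tuple

HALF_PEL = Fraction(1, 2)


def select_interpolation_filter(precision: Fraction,
                                distinct_filters: bool = True) -> Tuple[int, bool]:
    """Return (number_of_taps, signal_filter_type).

    signal_filter_type tells whether a filter-type identification bit
    would be set in the code stream.
    """
    if not distinct_filters:
        # Same filter for every precision (assumed 8-tap): nothing to signal.
        return 8, False
    taps = 6 if precision == HALF_PEL else 8
    return taps, True
```

A decoder following this sketch would read the identification bit first and only then pick the filter, which is why the code stream is acquired before the filter is selected.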

In one embodiment, motion estimation may include both the Affine mode and the conventional AMVP mode. In the conventional AMVP mode, for a conventional coding unit divided in the current frame, motion estimation is performed with the entire coding unit as a unit.

For each conventional coding unit, when adaptive motion vector resolution (AMVR) is applied, the method also includes adaptively selecting one of multiple motion vector precisions for motion estimation, the multiple motion vector precisions including 1/2 pixel precision. Apart from 1/2 pixel precision, the optional motion vector precisions of the conventional coding unit may be the same as or different from the optional motion vector precisions of the affine coding unit. In one embodiment, the conventional coding unit also has four optional motion vector precisions.

Therefore, in an embodiment, when the motion estimation method 400 is applied to the decoding end, before selecting one of the at least four motion vector precisions to perform motion estimation in the reference frame, the method further includes: obtaining a code stream, where the identification bit of the code stream records the motion vector precision selected for the affine coding unit, and the identifier representing the motion vector precision of the affine coding unit is consistent with the identifier representing the motion vector precision of the conventional coding unit.

After that, in step S420, the affine coding unit is divided into several sub-units, and in step S430, the motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point. For the specific details of step S420 and step S430, reference may be made to the related description of step S120 and step S130 of the method 100, which will not be repeated here.
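For orientation, steps S420 and S430 can be sketched with the widely used four-parameter and six-parameter affine motion models. This is a hedged illustration and not a restatement of method 100: the 4x4 sub-unit size, the evaluation of the model at the centre of each sub-unit, the floating-point arithmetic, and the name `subunit_motion_vectors` are all assumptions.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def subunit_motion_vectors(width: int, height: int, v0: Vec, v1: Vec,
                           v2: Optional[Vec] = None,
                           sub: int = 4) -> List[List[Vec]]:
    """Derive one vector per sub x sub sub-unit from control-point vectors.

    v0, v1, v2 are the vectors at the top-left, top-right and (optionally)
    bottom-left corners. With two control points the four-parameter model
    (rotation and zoom) is used; with three, the six-parameter model.
    """
    ax = (v1[0] - v0[0]) / width       # change of mv_x per pixel along x
    ay = (v1[1] - v0[1]) / width       # change of mv_y per pixel along x
    if v2 is None:
        bx, by = -ay, ax               # four-parameter model
    else:
        bx = (v2[0] - v0[0]) / height  # change of mv_x per pixel along y
        by = (v2[1] - v0[1]) / height  # change of mv_y per pixel along y

    field = []
    for y0 in range(0, height, sub):
        row = []
        for x0 in range(0, width, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2   # sub-unit centre
            row.append((ax * cx + bx * cy + v0[0],
                        ay * cx + by * cy + v0[1]))
        field.append(row)
    return field
```

When all control-point vectors are equal, every sub-unit receives that same vector, so the affine model reduces to ordinary translational motion.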

Based on the above description, the motion estimation method according to the embodiment of the present invention adds 1/2 pixel precision to the optional motion vector precisions in the affine mode, so that the design of the motion vector precision in the affine mode is unified with that in the conventional mode, and the coding performance is improved.

The following describes a motion estimation system 500 according to an embodiment of the present invention with reference to FIG. 5.

FIG. 5 is a schematic block diagram of a motion estimation system 500 according to an embodiment of the present invention. The motion estimation system 500 shown in FIG. 5 includes a processor 510, a storage device 520, and a computer program stored on the storage device 520 and running on the processor 510. When executing the program, the processor implements the steps of the motion estimation method 100 shown in FIG. 1 or the motion estimation method 400 shown in FIG. 4.

The processor 510 may be a central processing unit (CPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another form of processing unit with data processing capabilities and/or instruction execution capabilities, and may control other components in the motion estimation system 500 to perform the desired functions. For example, the processor 510 can include one or more embedded processors, processor cores, microprocessors, logic circuits, hardware finite state machines (FSM), digital signal processors (DSP), or combinations thereof.

The storage device 520 includes one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 510 may run the program instructions to implement the motion estimation method in the embodiments of the present invention (implemented by the processor) and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.

In an embodiment, the system 500 further includes an input device (not shown). The input device may be a device used by the user to input instructions, and may include one or more of operation keys, a keyboard, a mouse, a microphone, and a touch screen. In addition, the input device may also be any interface for receiving information.

In an embodiment, the system 500 further includes an output device that can output various information (such as images or sounds) to the outside (such as a user), and may include one or more of a display (for example, for displaying a video image to the user), a speaker, and the like. In addition, the output device may also be any other device with an output function.

In an embodiment, the system 500 further includes a communication interface, which is used to communicate with other devices, including wired or wireless communication.

Specifically, in one embodiment, the processor implements the following steps when executing the program: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vector of the sub-unit in the affine coding unit according to the motion vector of the control point.

In another embodiment, the processor implements the following steps when executing the program: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the multiple motion vector precisions include 1/2 pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vector of the sub-unit in the affine coding unit according to the motion vector of the control point.

In addition, the embodiment of the present invention also provides a storage medium on which a computer program is stored. When the computer program is executed by the processor, the steps of the method shown in FIG. 1 or FIG. 4 can be implemented.

For example, the storage medium is a computer-readable storage medium. The computer-readable storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.

In one embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of at least four motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit; dividing the affine coding unit into several sub-units; and calculating the motion vector of the sub-unit in the affine coding unit according to the motion vector of the control point.

In another embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to perform the following steps: for the affine coding unit in the current frame, selecting one of multiple motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the multiple motion vector precisions include 1/2 pixel precision; dividing the affine coding unit into several sub-units; and calculating the motion vector of the sub-unit in the affine coding unit according to the motion vector of the control point.

In summary, the motion estimation method, system and storage medium of the present invention unify the design of the motion vector precision in the affine mode with the motion vector precision in the conventional mode and improve the coding performance. They can be used to improve the quality of compressed video and to enhance the hardware friendliness of the codec, which is of great significance to the video compression processing of broadcast television, video conferencing, network video, and the like.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of the present invention.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include: USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and other media that can store program code.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can easily think of within the technical scope disclosed by the present invention should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Although the exemplary embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above-described exemplary embodiments are merely exemplary, and are not intended to limit the scope of the present invention thereto. Those of ordinary skill in the art can make various changes and modifications therein without departing from the scope and spirit of the present invention. All these changes and modifications are intended to be included within the scope of the present invention as claimed in the appended claims.

In the instructions provided here, a lot of specific details are explained. However, it can be understood that the embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures, and technologies are not shown in detail, so as not to obscure the understanding of this specification.

Similarly, it should be understood that in order to simplify the present invention and help understand one or more of the various aspects of the invention, in the description of the exemplary embodiments of the present invention, the various features of the present invention are sometimes grouped together into a single embodiment or into its description. However, the method of the present invention should not be construed as reflecting the intention that the claimed invention requires more features than those explicitly stated in each claim. To be more precise, as reflected in the corresponding claims, the point of the invention is that the corresponding technical problems can be solved with fewer than all the features of a single disclosed embodiment. Therefore, the claims following the specific embodiment are thus explicitly incorporated into the specific embodiment, wherein each claim itself serves as a separate embodiment of the present invention.

Those skilled in the art can understand that, except where features are mutually exclusive, any combination may be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature providing the same, equivalent or similar purpose.

In addition, those skilled in the art can understand that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.

The various component embodiments of the present invention may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules according to the embodiments of the present invention. The present invention can also be implemented as a device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program for realizing the present invention may be stored on a computer-readable medium, or may have the form of one or more signals. Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the present invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as a limitation to the claims. The invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In the unit claims that list several devices, several of these devices may be embodied in the same hardware item. The use of the words first, second, third, etc. does not indicate any order. These words can be interpreted as names.

The above are only specific implementations, or descriptions of specific implementations, of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can easily conceive of within the technical scope disclosed by the present invention should be included in the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A motion estimation method, characterized in that the method includes:

For the affine coding unit in the current frame, select one from at least four kinds of motion vector accuracy to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit;

Dividing the affine coding unit into several subunits;

The motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point.
The method according to claim 1, wherein the four kinds of motion vector precisions include any four of 4 pixels, 2 pixels, integer pixels, 1/2 pixels, 1/4 pixels, 1/8 pixels, and 1/16 pixels.
The method according to claim 1, wherein the determining the motion vector of the control point of the affine coding unit comprises:

Acquiring motion vectors of spatial neighboring coding units or temporal neighboring coding units, and constructing a candidate list according to a combination of motion vectors of the spatial neighboring coding units or temporal neighboring coding units;

Selecting a group of motion vectors from the candidate list as the predicted motion vectors of the control points of the affine coding unit;

Perform motion estimation in the reference frame according to the predicted motion vector to determine the actual motion vector of the control point of the affine coding unit.
The method according to claim 1, further comprising: for the conventional coding unit in the current frame, performing motion estimation with the entire coding unit as a unit.
The method according to claim 4, wherein the number of motion vector precisions available to the affine coding unit is the same as the number of motion vector precisions available to the conventional coding unit.
The method according to claim 5, further comprising: for each conventional coding unit, adaptively selecting one of four kinds of motion vector accuracy to perform the motion estimation of the conventional coding unit, wherein the four motion vector accuracies of the conventional coding unit are the same as or different from the four motion vector accuracies of the affine coding unit.
The method according to claim 6, further comprising: recording the motion vector accuracy selected for the affine coding unit in an identification bit of the code stream, wherein the identifier indicating the motion vector accuracy of the affine coding unit is consistent with the identifier indicating the motion vector accuracy of the conventional coding unit.
The method according to claim 6, wherein, before the selecting one of the at least four kinds of motion vector accuracy to perform motion estimation in a reference frame, the method further comprises:

Obtaining a code stream, wherein the identification bits of the code stream record the motion vector accuracy selected for the affine coding unit, and the identifier representing the motion vector accuracy of the affine coding unit is consistent with the identifier representing the motion vector accuracy of the conventional coding unit.
The method according to claim 1, wherein each of the affine coding units includes two control points or three control points.
The method according to claim 1, further comprising: selecting an interpolation filter according to the accuracy of the motion vector to perform interpolation processing on the reference block.
The method according to claim 10, wherein the selecting an interpolation filter according to the accuracy of the motion vector comprises:

Choose different interpolation filters according to different motion vector accuracy; or

The same interpolation filter is used for all motion vector accuracy.
The method according to claim 11, further comprising: if different interpolation filters are selected according to different motion vector precisions, setting an identification bit of the filter type in the code stream.
The method according to claim 11, wherein different interpolation filters are selected for different motion vector accuracy;

Before the selecting of an interpolation filter according to the accuracy of the motion vector, the method further includes:

Obtaining a code stream, in which an identification bit of the filter type corresponding to the motion vector is set.
The method according to claim 11, wherein the selecting different interpolation filters according to different motion vector precisions comprises:

When 1/2 pixel accuracy is selected as the motion vector accuracy, the first interpolation filter is selected to perform interpolation processing on the reference block;

When a precision other than 1/2 pixel precision is selected as the motion vector precision, a second interpolation filter is selected to perform interpolation processing on the reference block, wherein,

The number of taps of the first interpolation filter and the second interpolation filter are different.
The method according to claim 14, wherein the first interpolation filter is a 6-tap interpolation filter.
The method according to claim 14, wherein the second interpolation filter is an 8-tap interpolation filter.
The method according to claim 1, wherein the selecting one of four kinds of motion vector accuracy to perform motion estimation in a reference frame comprises:

The accuracy of the motion vector is selected according to the accuracy of the motion vector selected by the neighboring coding unit.
The method according to claim 1, wherein the adaptively selecting one of four kinds of motion vector accuracy to perform motion estimation in a reference frame comprises:

At least two of the four types of motion vector accuracy are attempted to perform motion estimation, and the motion vector accuracy is selected based on the effect of the motion estimation.
The method according to claim 18, wherein the attempting to perform motion estimation based on at least two of the four kinds of motion vector accuracy, and selecting the motion vector accuracy based on the effect of the motion estimation, comprises:

Select two kinds of motion vector precisions from the four kinds of motion vector precisions, try to perform motion estimation respectively, and compare the effects of the two motion estimations;

If the motion estimation effect with lower motion vector accuracy is better, stop trying, and directly use the lower motion vector accuracy as the selected motion vector accuracy; if the motion estimation effect with higher motion vector accuracy is better, continue to use higher motion vector accuracy to try motion estimation until the best motion estimation effect is obtained.
The method according to claim 1, wherein the reference frame includes a video frame before the current frame and a video frame after the current frame.
A motion estimation method, characterized in that the method includes:

For the affine coding unit in the current frame, select one from a variety of motion vector precisions to perform motion estimation in the reference frame, thereby determining the motion vector of the control point of the affine coding unit, wherein the multiple motion vector precisions include 1/2 pixel precision;

Dividing the affine coding unit into several subunits;

The motion vector of the sub-unit in the affine coding unit is calculated according to the motion vector of the control point.
The method according to claim 21, wherein the multiple motion vector precisions further include at least one of 4 pixel precision, 2 pixel precision, integer pixel precision, 1/4 pixel precision, 1/8 pixel precision, and 1/16 pixel precision.
The method according to claim 21, wherein said determining the motion vector of the control point of the affine coding unit comprises:

Acquiring motion vectors of spatial neighboring coding units or temporal neighboring coding units, and constructing a candidate list according to a combination of motion vectors of the spatial neighboring coding units or temporal neighboring coding units;

Selecting a group of motion vectors from the candidate list as the predicted motion vectors of the control points of the affine coding unit;

Perform motion estimation in the reference frame according to the predicted motion vector to determine the actual motion vector of the control point of the affine coding unit.
The method according to claim 21, further comprising: for the conventional coding unit in the current frame, performing motion estimation using the entire coding unit as a unit.
The method according to claim 24, wherein the number of motion vector precisions available for the affine coding unit is the same as the number of motion vector precisions available for the conventional coding unit.
The method according to claim 25, further comprising: recording the motion vector accuracy selected for the affine coding unit in an identification bit of the code stream, wherein the identifier indicating the motion vector accuracy of the affine coding unit is consistent with the identifier indicating the motion vector accuracy of the conventional coding unit.
The method according to claim 26, wherein, before the selecting one of the at least four kinds of motion vector accuracy to perform motion estimation in a reference frame, the method further comprises:

Obtaining a code stream, wherein the identification bits of the code stream record the motion vector accuracy selected for the affine coding unit, and the identifier representing the motion vector accuracy of the affine coding unit is consistent with the identifier representing the motion vector accuracy of the conventional coding unit.
The method according to claim 21, further comprising: selecting an interpolation filter according to the accuracy of the motion vector to perform interpolation processing on the reference block.
The method according to claim 28, wherein the selecting an interpolation filter according to the accuracy of the motion vector comprises:

Selecting different interpolation filters according to different motion vector precisions; or

Using the same interpolation filter for all motion vector precisions.
The method according to claim 29, wherein the selecting different interpolation filters according to different motion vector precisions comprises:

When the 1/2 pixel precision is selected as the motion vector precision, a first interpolation filter is selected to perform interpolation processing on the reference block;

When a precision other than the 1/2 pixel precision is selected as the motion vector precision, a second interpolation filter is selected to perform interpolation processing on the reference block,

wherein the first interpolation filter and the second interpolation filter have different numbers of taps.
The method according to claim 30, wherein the first interpolation filter is a 6-tap interpolation filter.
The method according to claim 30, wherein the second interpolation filter is an 8-tap interpolation filter.
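As a minimal sketch of the selection rule in the preceding claims, the function below returns the tap count of the filter chosen for a given motion vector precision: 6 taps for 1/2-pixel precision and 8 taps for any other precision. The function name `select_interpolation_taps` and the use of `Fraction` to represent a precision are assumptions of this sketch; filter coefficients are deliberately omitted because the claims do not specify them.

```python
from fractions import Fraction

HALF_PEL = Fraction(1, 2)


def select_interpolation_taps(precision: Fraction) -> int:
    """Return the tap count of the interpolation filter for a precision.

    The first (6-tap) filter is used only for 1/2-pixel precision;
    every other precision uses the second (8-tap) filter.
    """
    return 6 if precision == HALF_PEL else 8


# Example: tap counts for several precisions named in the claims.
taps_by_precision = {
    p: select_interpolation_taps(p)
    for p in (Fraction(4), Fraction(1), HALF_PEL, Fraction(1, 4), Fraction(1, 16))
}
```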
The method according to claim 29, further comprising: if different interpolation filters are selected according to different motion vector precisions, setting an identification bit of the filter type in the code stream.
The method according to claim 33, wherein different interpolation filters are selected for different motion vector precisions;

Before the selecting an interpolation filter according to the accuracy of the motion vector, the method further comprises:

Obtaining a code stream in which an identification bit of the filter type corresponding to the motion vector is set.
A motion estimation system, characterized in that the system includes a storage device and a processor, the storage device stores a computer program to be run by the processor, and the computer program, when run by the processor, performs the motion estimation method according to any one of claims 1-34.
A storage medium, characterized in that a computer program is stored on the storage medium, and the computer program, when run, performs the motion estimation method according to any one of claims 1-34.