CN118075485A - Inter-frame motion estimation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN118075485A
CN118075485A
Authority
CN
China
Prior art keywords
rate distortion
distortion cost
pixel point
interest
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211473468.3A
Other languages
Chinese (zh)
Inventor
宋剑军
翟云
杨作兴
胡祥斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202211473468.3A
Publication of CN118075485A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Using predictive coding
    • H04N19/503 Involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H04N19/10 Using adaptive coding
    • H04N19/102 Characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/169 Characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 The unit being an image region, e.g. an object
    • H04N19/176 The region being a block, e.g. a macroblock
    • H04N19/189 Characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/19 Using optimisation based on Lagrange multipliers

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An embodiment of the invention provides an inter-frame motion estimation method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: determining the degree of interest of the current coding block; determining a Lagrangian factor for the current coding block based on the degree of interest, wherein the Lagrangian factor decreases as the degree of interest increases; and performing inter-frame motion estimation on the current coding block based on the Lagrangian factor. By adjusting the Lagrangian factor to account for the differing degrees of interest of different regions, a good compromise can be achieved between the image quality and the coding complexity of the image frame.

Description

Inter-frame motion estimation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to video encoding and decoding technology, and in particular, to a method and apparatus for inter-frame motion estimation, an electronic device, and a storage medium.
Background
The main stages of video coding are prediction, transform, quantization, and entropy coding, where prediction comprises intra prediction and inter prediction. Inter prediction exploits temporal correlation in video: the current pixel block of the image to be encoded is predicted from already-encoded pixel blocks of temporally adjacent, already-encoded images, so as to remove temporal redundancy from the video.
The core of inter prediction is block-based motion estimation: for each pixel block of the current image, the best matching block is searched for in a previously encoded image, yielding the motion vector corresponding to that pixel block. The image used for prediction is called the reference image, and the pixel blocks in the reference image are called reference blocks. The pixel block to be predicted in the current image is called the current coding block, and the displacement between the reference block and the current coding block is called the motion vector.
In the prior art, the same motion estimation strategy is applied indiscriminately to the whole image frame, which makes it difficult to achieve a good compromise between coding quality and coding complexity.
Disclosure of Invention
The embodiment of the invention provides an inter-frame motion estimation method, an inter-frame motion estimation device, electronic equipment and a storage medium.
The technical scheme of the embodiment of the invention is as follows:
An inter-frame motion estimation method, comprising:
Determining the interest degree of the current coding block;
determining a Lagrangian factor for the current coding block based on the degree of interest, wherein the Lagrangian factor decreases as the degree of interest increases;
And performing inter-frame motion estimation on the current coding block based on the Lagrangian factor.
In an exemplary embodiment, the determining the lagrangian factor for the current encoded block based on the degree of interest comprises:
Adjusting a quantization parameter value of the current coding block based on the degree of interest, wherein the quantization parameter value decreases as the degree of interest increases;
And determining the Lagrangian factor of the current coding block based on the adjusted quantization parameter value.
In an exemplary embodiment, the determining the lagrangian factor for the current encoded block based on the degree of interest comprises:
determining a Lagrangian factor of the current coding block based on the quantization parameter value of the current coding block;
And adjusting a Lagrangian factor of the current coding block based on the degree of interest, wherein the Lagrangian factor decreases with increasing degree of interest.
In an exemplary embodiment, the degree of interest has a level, the level comprising at least one of:
a first level, associated with a foreground region segmented from an image frame;
a second level, associated with a background region segmented from the image frame;
a third level, associated with a remaining region of the image frame other than the foreground region and the background region;
wherein the degree of interest of the first level is greater than that of the third level, and the degree of interest of the third level is greater than that of the second level.
In an exemplary embodiment, the level of the current coding block is a first level; the performing inter-frame motion estimation on the current encoded block based on the lagrangian factor comprises:
in the reference-frame search area of the current coding block, performing an integer-pixel search with a 1-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost;
searching around the pixel point with the minimum rate-distortion cost with a 1/2-pixel step, calculating the rate-distortion cost of each searched 1/2-pixel point based on the Lagrangian factor, and determining the 1/2-pixel point with the minimum rate-distortion cost;
searching around the 1/2-pixel point with the minimum rate-distortion cost with a 1/4-pixel step, calculating the rate-distortion cost of each searched 1/4-pixel point based on the Lagrangian factor, and determining the 1/4-pixel point with the minimum rate-distortion cost;
and determining the motion vector of the current coding block based on the 1/4-pixel point with the minimum rate-distortion cost.
In an exemplary embodiment, the level of the current coding block is a third level; the performing inter-frame motion estimation on the current encoded block based on the lagrangian factor comprises:
in the reference-frame search area of the current coding block, performing an integer-pixel search with a 2-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost at step size 2;
searching around the pixel point with the minimum rate-distortion cost at step size 2 with a 1-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost at step size 1;
searching around the pixel point with the minimum rate-distortion cost at step size 1 with a 1/2-pixel step, calculating the rate-distortion cost of each searched 1/2-pixel point based on the Lagrangian factor, and determining the 1/2-pixel point with the minimum rate-distortion cost;
searching around the 1/2-pixel point with the minimum rate-distortion cost with a 1/4-pixel step, calculating the rate-distortion cost of each searched 1/4-pixel point based on the Lagrangian factor, and determining the 1/4-pixel point with the minimum rate-distortion cost;
and determining the motion vector of the current coding block based on the 1/4-pixel point with the minimum rate-distortion cost.
In an exemplary embodiment, the level of the current coding block is a second level; the performing inter-frame motion estimation on the current encoded block based on the lagrangian factor comprises:
in the reference-frame search area of the current coding block, performing an integer-pixel search with a 4-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost at step size 4;
searching around the pixel point with the minimum rate-distortion cost at step size 4 with a 2-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost at step size 2;
searching around the pixel point with the minimum rate-distortion cost at step size 2 with a 1-pixel step, calculating the rate-distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate-distortion cost;
searching around the pixel point with the minimum rate-distortion cost with a 1/2-pixel step, calculating the rate-distortion cost of each searched 1/2-pixel point based on the Lagrangian factor, and determining the 1/2-pixel point with the minimum rate-distortion cost;
searching around the 1/2-pixel point with the minimum rate-distortion cost with a 1/4-pixel step, calculating the rate-distortion cost of each searched 1/4-pixel point based on the Lagrangian factor, and determining the 1/4-pixel point with the minimum rate-distortion cost;
and determining the motion vector of the current coding block based on the 1/4-pixel point with the minimum rate-distortion cost.
An inter-frame motion estimation apparatus, comprising:
A first determination module configured to determine a degree of interest of a current coding block;
A second determination module configured to determine a lagrangian factor for the current encoded block based on the degree of interest, wherein the lagrangian factor decreases with increasing degree of interest;
a motion estimation module configured to perform inter-frame motion estimation on the current encoded block based on the lagrangian factor.
In an exemplary embodiment, the second determining module is configured to adjust a quantization parameter value of the current coding block based on the degree of interest, wherein the quantization parameter value decreases with increasing degree of interest; and determining the Lagrangian factor of the current coding block based on the adjusted quantization parameter value.
In an exemplary embodiment, the second determining module is configured to determine the lagrangian factor of the current coding block based on the quantization parameter value of the current coding block; and adjusting a Lagrangian factor of the current coding block based on the degree of interest, wherein the Lagrangian factor decreases with increasing degree of interest.
In an exemplary embodiment, the degree of interest has a level, the level comprising at least one of:
a first level, associated with a foreground region segmented from an image frame;
a second level, associated with a background region segmented from the image frame;
a third level, associated with a remaining region of the image frame other than the foreground region and the background region;
wherein the degree of interest of the first level is greater than that of the third level, and the degree of interest of the third level is greater than that of the second level.
An electronic device, comprising:
A memory;
A processor;
Wherein the memory has stored therein an application executable by the processor for causing the processor to perform the inter-frame motion estimation method as claimed in any one of the preceding claims.
A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the inter-frame motion estimation method of any of the above claims.
According to the above technical solution, in the embodiment of the present invention, the degree of interest of the current coding block is determined; a Lagrangian factor for the current coding block is determined based on the degree of interest, wherein the Lagrangian factor decreases as the degree of interest increases; and inter-frame motion estimation is performed on the current coding block based on the Lagrangian factor. The embodiment of the invention thus adjusts the Lagrangian factor to account for the differing degrees of interest of different regions. For a current coding block of higher interest, reducing the Lagrangian factor increases the weight of distortion in the rate-distortion cost model, so that the search tends to select pixel points with smaller distortion (i.e., better image quality), increasing the imaging quality of the current coding block (with a corresponding increase in complexity). Conversely, for current coding blocks of lower interest, pixel points with larger distortion tend to be selected, reducing complexity. The embodiment of the invention can therefore achieve a good compromise between the image quality and the complexity of the image frame.
Drawings
Fig. 1 is an exemplary flowchart of an inter motion estimation method according to an embodiment of the present invention.
FIG. 2 is an exemplary schematic diagram of regions and interest levels in an image frame according to an embodiment of the present invention.
Fig. 3 is an exemplary schematic diagram of an inter motion estimation process for an image frame according to an embodiment of the present invention.
Fig. 4 is an exemplary block diagram of an inter motion estimation apparatus according to an embodiment of the present invention.
Fig. 5 is an exemplary structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
For simplicity and clarity, the following description sets forth aspects of the invention through several exemplary embodiments. The numerous details in the embodiments are provided solely to aid understanding of the invention; it will be apparent that the invention may be practiced without being limited to these specific details. Some embodiments are not described in detail, in order to present the overall framework without unnecessarily obscuring the invention. Hereinafter, "comprising" means "including but not limited to", and "according to ..." means "according to at least ..., but not limited to only ...". Unless otherwise specified, the singular form of a component does not limit its number, which may be one, more than one, or at least one.
Hereinafter, terms related to the embodiments of the present disclosure will be explained.
Rate-distortion optimization (RDO): selecting, under a constrained coding rate, the coding mode with the least distortion.
Quantization (Quantization): quantization refers to the process of mapping a continuous value (or a large number of possible discrete values) of a signal to a finite number of discrete magnitudes, implementing a many-to-one mapping of signal values.
Quantization parameter (Quantization Parameter, QP): the index of the quantization step size (Qstep), reflecting the degree of spatial-detail compression. The smaller the quantization parameter value, the finer the quantization, the higher the image quality, and the longer the resulting bitstream.
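A brief numerical sketch may help here (assuming the common H.264/H.265 convention that the quantization step roughly doubles for every increase of 6 in QP; `qstep` is an illustrative name, not an API from any codec):

```python
# Illustrative sketch of the QP-to-quantization-step relation used in
# H.264/H.265-style codecs: Qstep doubles for every increase of 6 in QP.
# The anchoring (Qstep = 1 at QP = 4) follows the common convention.
def qstep(qp):
    return 2 ** ((qp - 4) / 6.0)

for qp in (4, 10, 16, 22):
    print(qp, round(qstep(qp), 3))  # Qstep doubles every 6 QP
```

This is why a small QP change has a large effect: each step of 6 doubles the quantization coarseness, trading image quality for bitstream length.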
Lagrangian factor: the balance between coding rate and distortion is a key factor in video coding. In calculating the rate-distortion cost, the distortion and the coding rate can be combined by the effect of the lagrangian factor.
Motion estimation is performed by a search algorithm. Conventional video coding standards typically employ full search, two-dimensional logarithmic search, three-dimensional logarithmic search, UMHexagonS, TZSearch, or other search methods. Full search has high complexity and can hardly meet real-time coding requirements, while the other search algorithms can only obtain locally optimal reference blocks, so their coding quality is inferior to that of full search. A search algorithm generally selects the pixel point with the minimum rate-distortion cost as the search result. A common rate-distortion cost model is J = D + λ×R, where J denotes the rate-distortion cost, D denotes the distortion, R denotes the encoded output bit rate, and λ, the Lagrangian factor, denotes the weight between distortion and bit rate. The pixel point at which J is minimal (min J) is typically selected as the search result.
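The cost model J = D + λ×R and the min-J selection can be sketched as follows (a hedged illustration: `rd_cost` and `best_candidate` are hypothetical helpers, and the distortion and bit-count values are made-up placeholders, not measurements from any encoder):

```python
# Hypothetical sketch of rate-distortion candidate selection. Each
# candidate is (motion_vector, distortion D, bit cost R); the winner
# minimizes J = D + lambda * R.
def rd_cost(distortion, bits, lam):
    """J = D + lambda * R"""
    return distortion + lam * bits

def best_candidate(candidates, lam):
    # candidates: list of (motion_vector, distortion, bits)
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))

cands = [((0, 0), 120.0, 4), ((1, 0), 90.0, 10), ((2, -1), 60.0, 30)]
print(best_candidate(cands, lam=1.0))  # low lambda favors low distortion
print(best_candidate(cands, lam=8.0))  # high lambda favors fewer bits
```

With a small λ the low-distortion, expensive vector wins; with a large λ the cheap vector wins — which is exactly the lever the method below pulls per coding block.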
In the prior art, a Lagrangian factor is determined based on the quantization parameter of the current coding block, and motion estimation is directly performed on the current coding block using that Lagrangian factor. That is, the prior art does not adapt the Lagrangian factor to the region of the image frame to which the current coding block belongs.
The applicant found that, in practice, different regions of an image frame differ in their imaging-quality requirements and complexity tolerance. For example, a region of interest needs higher image quality and can tolerate higher complexity, while a non-region-of-interest can tolerate lower image quality but requires lower complexity. Thus, if the Lagrangian factor is adjusted to account for the differences in the degree of interest between regions, a good compromise between the image quality and the complexity of the image frame can be achieved.
Fig. 1 is an exemplary flowchart of an inter motion estimation method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 101: the degree of interest of the current coding block is determined.
The current coding block is the pixel block currently being encoded in the image frame. The image frames are reconstructed image frames obtained after a video coding compression technique is applied to the original image frames to reduce the bit rate. For example, video coding compression techniques may include: (1) the ISO-MPEG/ITU-T series, coding standards developed by the Moving Picture Experts Group (MPEG) under the International Organization for Standardization (ISO) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), specifically including: (1.1) H.265, also known as High Efficiency Video Coding (HEVC), the successor to H.264; (1.2) H.266, also known as Versatile Video Coding (VVC), the successor to H.265; and so on. (2) The AOM series, coding standards developed by the Alliance for Open Media (AOM), specifically including: (2.1) VP8; (2.2) VP9; (2.3) AV1; etc. (3) The AVS series, specifically including: (3.1) the second-generation Audio Video coding Standard (AVS2); (3.2) the third-generation Audio Video coding Standard (AVS3); and so on.
The Coding block may be implemented as a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), a Transform Unit (TU), or the like.
While typical examples of video coding compression techniques and coding blocks are described above, those skilled in the art will recognize that such descriptions are merely exemplary and are not intended to limit the scope of embodiments of the present invention.
In embodiments of the present invention, the degree of interest may be determined by a user or by a computer vision algorithm. For example, an automatic image segmentation technique may be employed to automatically identify the degree of interest of a region in an image frame, or the region may be framed manually. Automatic image segmentation techniques may include conventional image segmentation methods (such as the watershed method, the GrabCut method, the MeanShift method, and background subtraction) and deep-learning-based image segmentation methods (such as fully convolutional networks, the UNet network with an encoder-decoder structure, and the pyramid pooling network PSPNet).
In one embodiment, the degree of interest has a level, where the level comprises:
(1) a first level, associated with a foreground region segmented from the image frame (e.g., a moving object such as a vehicle, or a specific object such as a face);
(2) a second level, associated with a background region segmented from the image frame (e.g., a stationary background pattern);
(3) a third level, associated with the remaining regions of the image frame other than the foreground region and the background region.
The first level has a higher degree of interest than the third level, and the third level has a higher degree of interest than the second level.
FIG. 2 is an exemplary schematic diagram of regions and interest levels in an image frame according to an embodiment of the present invention.
The image frame ABCD includes a foreground region IJKL, which has the highest degree of interest (the first level). The foreground region IJKL is surrounded by a region EFGH of a predetermined size. The area remaining after removing the region EFGH from the image frame ABCD is the background region with the lowest degree of interest (the second level). The remainder of the region EFGH after removing the foreground region IJKL has a degree of interest between those of the foreground and background regions (the third level).
While the levels of interest are described above by way of example, those skilled in the art will appreciate that such descriptions are exemplary only and are not intended to limit the scope of embodiments of the invention. In fact, the three levels described above may be further refined or merged, and embodiments of the present invention are not limited in this respect.
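The region layout of Fig. 2 can be sketched as a simple containment test (the coordinates, rectangle format `(x0, y0, x1, y1)`, and function names are illustrative assumptions, not from the patent):

```python
# Illustrative sketch of the three interest levels of Fig. 2:
# foreground IJKL inside surround EFGH inside frame ABCD.
def contains(rect, x, y):
    x0, y0, x1, y1 = rect
    return x0 <= x < x1 and y0 <= y < y1

def interest_level(block_center, foreground, surround):
    x, y = block_center
    if contains(foreground, x, y):
        return 1          # first level: foreground region IJKL
    if contains(surround, x, y):
        return 3          # third level: EFGH minus the foreground
    return 2              # second level: background outside EFGH

fg = (40, 40, 80, 80)     # IJKL (assumed coordinates)
sr = (20, 20, 100, 100)   # EFGH (assumed coordinates)
print(interest_level((50, 50), fg, sr))  # 1: inside IJKL
print(interest_level((25, 25), fg, sr))  # 3: in EFGH, outside IJKL
print(interest_level((5, 5), fg, sr))    # 2: background
```

In a real encoder the foreground rectangle would come from the segmentation step above, and each coding block would be classified by its position.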
Step 102: based on the degree of interest, a Lagrangian factor for the current encoded block is determined, wherein the Lagrangian factor decreases as the degree of interest increases.
Here, for a current coding block of higher interest, reducing its Lagrangian factor increases the weight of D in the rate-distortion cost model (J = D + λ×R), so that pixel points with smaller D (i.e., better image quality) tend to be selected during the search, increasing the imaging quality of the current coding block (with a corresponding increase in complexity). Similarly, for current coding blocks of lower interest, pixel points with larger distortion tend to be selected, reducing complexity.
In one embodiment, step 102 specifically includes: adjusting a quantization parameter value of the current coding block based on the degree of interest, wherein the quantization parameter value decreases as the degree of interest increases; and determining the Lagrangian factor of the current coding block based on the adjusted quantization parameter value. Thus, by directly adjusting the quantization parameter value of the current coding block, the Lagrangian factor of the current coding block can be adjusted. In this embodiment, for current coding blocks having different degrees of interest, their quantization parameters are first adjusted separately, with the quantization parameter value decreasing as the degree of interest increases. For example, when the level is the first level, the quantization parameter value of the current coding block is decreased; when the level is the third level, it is kept unchanged; when the level is the second level, it is increased. Then, the Lagrangian factor of the current coding block is calculated based on the respectively adjusted quantization parameter.
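A minimal sketch of this embodiment's first step (the delta of 3 is an assumption echoing the figure used in the second embodiment below; `adjust_qp` is a hypothetical helper, not an encoder API):

```python
# Hedged sketch: adjust the quantization parameter by interest level,
# before deriving the Lagrangian factor from the adjusted QP.
def adjust_qp(qp, level, delta=3):
    if level == 1:        # first level: most interesting -> finer quantization
        return qp - delta
    if level == 2:        # second level: background -> coarser quantization
        return qp + delta
    return qp             # third level: unchanged

print(adjust_qp(27, 1), adjust_qp(27, 3), adjust_qp(27, 2))  # 24 27 30
```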
For example, taking h.265 as an example, the calculation formula of the lagrangian factor λ is as follows:
λ = α × W_k × 2^((QP − 12) / 3.0)
where W_k is a weighting factor related to the position of the picture in its group of pictures (GOP); α is a reference-frame factor, equal to 1 if the picture is not used as a reference frame and slightly less than 1 if it is; and QP is the adjusted quantization parameter described above.
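A numerical sketch of this formula (the W_k and α values below are placeholders; a real encoder derives W_k from the GOP structure):

```python
# Numerical check of lambda = alpha * W_k * 2**((QP - 12) / 3.0).
def hevc_lambda(qp, w_k=0.57, alpha=1.0):
    return alpha * w_k * 2 ** ((qp - 12) / 3.0)

# Reducing QP by 3 exactly halves lambda, shifting the rate-distortion
# trade-off toward lower distortion for more interesting blocks.
for qp in (22, 25, 28):
    print(qp, round(hevc_lambda(qp), 3))
```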
Typical examples of determining the lagrangian factor based on quantization parameters are described above using h.265 as an example. In fact, the particular manner of calculating the lagrangian factor based on the quantization parameter may also include many forms of transformations, and embodiments of the present invention are not limited in this regard.
In one embodiment, step 102 specifically includes: determining a Lagrange factor of the current coding block based on the quantization parameter value of the current coding block; based on the degree of interest, a Lagrangian factor of the current encoded block is adjusted, wherein the Lagrangian factor decreases as the degree of interest increases.
In this embodiment, for current coding blocks having different degrees of interest, the quantization parameter is not adjusted; instead, the Lagrangian factor is first calculated based on the unadjusted quantization parameter (for example, using the formula for λ given above), and the Lagrangian factor itself is then adjusted. For example, when the level is the first level, the Lagrangian factor is decreased to the value it would take if the quantization parameter were reduced by 3; when the level is the third level, it is kept unchanged; when the level is the second level, it is increased to the value corresponding to the quantization parameter increased by 3. In this way, the Lagrangian factor is adjusted directly, without changing the quantization parameter value itself.
Step 103: inter-frame motion estimation is performed on the current encoded block based on the lagrangian factor.
Here, motion estimation may be performed using a search method such as full search, two-dimensional logarithmic search, three-dimensional logarithmic search, UMHexagonS, or TZSearch, and the pixel point that minimizes the rate-distortion cost calculated with the Lagrangian factor is selected as the search result.
Preferably, in order to achieve a better trade-off between coding quality and coding complexity, the complexity of the search algorithm may also be differentiated according to the level of the coding block: the more interesting the coding block, the more complex the search algorithm employed.
In one embodiment, the level of the current coding block is the first level, and a high complexity motion estimation process is performed. At this time, step 103 specifically includes:
Step (1): in the reference frame search area of the current coding block, perform an integer-pixel search with a 1-pixel step, calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λH, and determine the pixel point with the minimum rate distortion cost. For example, assume that the initial value of the Lagrangian factor calculated based on the quantization parameter of the current coding block is λ1; then λH = λ1 - 3. Assuming that the size of the current search area is M×N, the number of rate distortion cost calculations J is M×N.
Step (2): perform a search with a 1/2-pixel step around the pixel point with the minimum rate distortion cost (for example, searching 4 or 8 nearby 1/2 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/2 pixel point based on the Lagrangian factor λH, and determine the 1/2 pixel point with the minimum rate distortion cost.
Step (3): perform a search with a 1/4-pixel step around the 1/2 pixel point with the minimum rate distortion cost (for example, searching 4 or 8 nearby 1/4 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/4 pixel point based on the Lagrangian factor λH, and determine the 1/4 pixel point with the minimum rate distortion cost.
Step (4): determine the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
In step (4), the motion vector of the current coding block is determined from the 1/4 pixel point with the minimum rate distortion cost, which corresponds to the description of 1/4-pixel precision in H.265. In practice, the search may also continue with smaller pixel steps (e.g., a 1/8-pixel step) around the 1/4 pixel point with the minimum rate distortion cost to achieve finer pixel precision (e.g., the 1/8-pixel precision found in H.266). In summary, the high-complexity motion estimation process may search with a larger or smaller pixel step depending on the desired pixel precision, which is not limited by the embodiments of the present invention. Wherein: the integer-pixel search algorithm in step (1) of the high-complexity motion estimation process may be implemented as any one of a full search (Full) algorithm, a diamond search (DIA) algorithm, a hexagonal search (HEX) algorithm, an asymmetric cross multi-level hexagonal grid point search (UMH) algorithm, a star search (STAR) algorithm, and a successive elimination algorithm (SEA).
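Steps (1)-(4) of the high-complexity process amount to an exhaustive integer search followed by half-pel and quarter-pel refinement around the running best. The sketch below is an illustrative toy model, not the patent's implementation; it assumes a cost function `cost_fn(position, lam)` that already folds distortion and rate into a single rate distortion cost:

```python
def refine(cost_fn, center, step, lam):
    """Evaluate the 8 neighbours of `center` at `step` spacing (plus the
    center itself) and return the position with the smallest RD cost."""
    best, best_j = center, cost_fn(center, lam)
    for dx in (-step, 0, step):
        for dy in (-step, 0, step):
            pos = (center[0] + dx, center[1] + dy)
            j = cost_fn(pos, lam)
            if j < best_j:
                best, best_j = pos, j
    return best

def high_complexity_me(cost_fn, search_w, search_h, lam):
    """Full integer-pixel search (1-px step), then 1/2- and 1/4-pel refinement."""
    # Step (1): exhaustive integer search over the M x N area -> M*N costs.
    grid = [(x, y) for x in range(search_w) for y in range(search_h)]
    best = min(grid, key=lambda p: cost_fn(p, lam))
    # Steps (2)-(3): refine around the running best with halving steps.
    for step in (0.5, 0.25):
        best = refine(cost_fn, best, step, lam)
    return best  # step (4): this 1/4-pel position yields the motion vector

# Toy cost: quadratic bowl with a fractional minimum at (2.25, 3.5).
cost = lambda p, lam: (p[0] - 2.25) ** 2 + (p[1] - 3.5) ** 2 + lam * 0.01
print(high_complexity_me(cost, 8, 8, lam=7.0))  # (2.25, 3.5)
```

A real encoder would compute the cost from SAD/SATD of interpolated reference samples plus the motion-vector bit cost; the toy bowl merely demonstrates the coarse-to-fine convergence.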
In one embodiment, the level of the current coding block is the third level, and a medium-complexity motion estimation process is performed. At this time, step 103 specifically includes:
Step (1): in the reference frame search area of the current coding block, perform an integer-pixel search with a 2-pixel step, calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λM, and determine the pixel point with the minimum rate distortion cost at step size 2. For example, assume that the initial value of the Lagrangian factor calculated based on the quantization parameter of the current coding block is λ2; then λM = λ2. Assuming that the size of the current search area is M×N, the number of rate distortion cost calculations J is M×N/4.
Step (2): perform a search with a 1-pixel step around the pixel point with the minimum rate distortion cost at step size 2 (for example, searching 4 or 8 nearby pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λM, and determine the pixel point with the minimum rate distortion cost at step size 1.
Step (3): perform a search with a 1/2-pixel step around the pixel point with the minimum rate distortion cost at step size 1 (for example, searching 4 or 8 nearby 1/2 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/2 pixel point based on the Lagrangian factor λM, and determine the 1/2 pixel point with the minimum rate distortion cost.
Step (4): perform a search with a 1/4-pixel step around the 1/2 pixel point with the minimum rate distortion cost (for example, searching 4 or 8 nearby 1/4 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/4 pixel point based on the Lagrangian factor λM, and determine the 1/4 pixel point with the minimum rate distortion cost.
Step (5): determine the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
In step (5), the motion vector of the current coding block is determined from the 1/4 pixel point with the minimum rate distortion cost, which corresponds to the description of 1/4-pixel precision in H.265. In practice, the search may also continue with smaller pixel steps (e.g., a 1/8-pixel step) around the 1/4 pixel point with the minimum rate distortion cost to achieve finer pixel precision (e.g., the 1/8-pixel precision found in H.266). In summary, the medium-complexity motion estimation process may search with a larger or smaller pixel step depending on the desired pixel precision, which is not limited by the embodiments of the present invention.
Wherein: the integer-pixel search algorithm in step (1) of the medium-complexity motion estimation process may be implemented as any one of the Full, DIA, HEX, UMH, STAR, and SEA algorithms.
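The J counts quoted for the three processes (M×N, M×N/4, and M×N/16) follow directly from sampling the search area with integer steps of 1, 2, and 4 respectively. A quick arithmetic check (illustrative function name, not from the patent):

```python
def initial_cost_evaluations(m: int, n: int, step: int) -> int:
    """Number of integer positions visited when scanning an M x N search
    area with the given step: roughly M*N / step**2."""
    return len(range(0, m, step)) * len(range(0, n, step))

M, N = 64, 48
print(initial_cost_evaluations(M, N, 1))  # 3072 == M*N       (high complexity)
print(initial_cost_evaluations(M, N, 2))  # 768  == M*N // 4  (medium complexity)
print(initial_cost_evaluations(M, N, 4))  # 192  == M*N // 16 (low complexity)
```

So the initial integer scan, which dominates the cost-evaluation budget, shrinks by a factor of four per level, while the later refinement stages add only a handful of candidates each.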
In one embodiment, the level of the current coding block is the second level, and a low complexity motion estimation process is performed. At this time, step 103 specifically includes:
Step (1): in the reference frame search area of the current coding block, perform an integer-pixel search with a 4-pixel step, calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λL, and determine the pixel point with the minimum rate distortion cost at step size 4. For example, assume that the initial value of the Lagrangian factor calculated based on the quantization parameter of the current coding block is λ3; then λL = λ3 + 3. Assuming that the size of the current search area is M×N, the number of rate distortion cost calculations J is M×N/16.
Step (2): perform a search with a 2-pixel step around the pixel point with the minimum rate distortion cost at step size 4 (for example, searching 4 or 8 nearby pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λL, and determine the pixel point with the minimum rate distortion cost at step size 2.
Step (3): perform a search with a 1-pixel step around the pixel point with the minimum rate distortion cost at step size 2 (for example, searching 4 or 8 nearby pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched pixel point based on the Lagrangian factor λL, and determine the pixel point with the minimum rate distortion cost.
Step (4): perform a search with a 1/2-pixel step around the pixel point with the minimum rate distortion cost (for example, searching 4 or 8 nearby 1/2 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/2 pixel point based on the Lagrangian factor λL, and determine the 1/2 pixel point with the minimum rate distortion cost.
Step (5): perform a search with a 1/4-pixel step around the 1/2 pixel point with the minimum rate distortion cost (for example, searching 4 or 8 nearby 1/4 pixel points, i.e., J is 4 or 8), calculate the rate distortion cost of each searched 1/4 pixel point based on the Lagrangian factor λL, and determine the 1/4 pixel point with the minimum rate distortion cost.
Step (6): determine the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
In step (6), the motion vector of the current coding block is determined from the 1/4 pixel point with the minimum rate distortion cost, which corresponds to the description of 1/4-pixel precision in H.265. In practice, the search may also continue with smaller pixel steps (e.g., a 1/8-pixel step) around the 1/4 pixel point with the minimum rate distortion cost to achieve finer pixel precision (e.g., the 1/8-pixel precision found in H.266). In summary, the low-complexity motion estimation process may search with a larger or smaller pixel step depending on the desired pixel precision, which is not limited by the embodiments of the present invention.
Wherein: the integer-pixel search algorithm in step (1) of the low-complexity motion estimation process may be implemented as any one of the Full, DIA, HEX, UMH, STAR, and SEA algorithms.
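The three complexity levels differ only in their initial integer step and their refinement schedule, so they can be expressed as one schedule-driven routine. This is an illustrative unification, not the patent's implementation; the schedules mirror the steps listed above, and `cost_fn` is an assumed rate distortion cost callable:

```python
# Initial integer step and refinement schedule per interest level.
LEVEL_SCHEDULE = {
    "high":   (1, [0.5, 0.25]),        # first level  (foreground)
    "medium": (2, [1, 0.5, 0.25]),     # third level  (remaining region)
    "low":    (4, [2, 1, 0.5, 0.25]),  # second level (background)
}

def motion_estimate(cost_fn, search_w, search_h, lam, level):
    """Coarse integer scan at the level's initial step, then successive
    refinement of the running best down to 1/4-pel precision."""
    init_step, refine_steps = LEVEL_SCHEDULE[level]
    grid = [(x, y) for x in range(0, search_w, init_step)
                   for y in range(0, search_h, init_step)]
    best = min(grid, key=lambda p: cost_fn(p, lam))
    for step in refine_steps:
        center = best  # snapshot: neighbours are taken around the stage's start
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                cand = (center[0] + dx, center[1] + dy)
                if cost_fn(cand, lam) < cost_fn(best, lam):
                    best = cand
    return best

# Toy cost with a fractional minimum at (5.25, 2.5); both schedules shown
# converge to the same quarter-pel position on this smooth surface.
cost = lambda p, lam: abs(p[0] - 5.25) + abs(p[1] - 2.5) + lam * 0.01
print(motion_estimate(cost, 16, 16, 7.0, "high"))  # (5.25, 2.5)
print(motion_estimate(cost, 16, 16, 13.0, "low"))  # (5.25, 2.5)
```

On real video the coarser schedules trade accuracy for speed: a 4-pixel initial scan can lock onto a wrong local minimum that the later refinement cannot escape, which is exactly why the more interesting blocks get the denser schedule.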
Fig. 3 is an exemplary schematic diagram of an inter motion estimation process for an image frame according to an embodiment of the present invention. As shown in fig. 3, the inter-frame motion estimation process includes:
Step 301: determine whether the current coding block belongs to the foreground region; if so, execute step 306 and end this flow; if not, execute step 302 and the subsequent steps.
Step 302: determine whether the current coding block belongs to the background region; if so, execute step 307 and end this flow; if not, execute step 303 and the subsequent steps.
Step 303: a motion estimation process of intermediate complexity is performed.
Step 304: determine whether the current coding block is the last coding block; if so, end the flow; if not, execute step 305 and the subsequent steps.
Step 305: switch to the next coding block and return to step 301.
Step 306: a high complexity motion estimation process is performed.
Step 307: a low complexity motion estimation process is performed.
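The dispatch logic of steps 301-307 can be sketched as a per-block loop (illustrative Python; the block representation is hypothetical):

```python
def estimate_frame(blocks):
    """Dispatch per Fig. 3: foreground -> high-complexity process (step 306),
    background -> low-complexity process (step 307), everything else ->
    medium-complexity process (step 303). Returns the chosen process per block."""
    chosen = []
    for block in blocks:                       # steps 304-305: iterate blocks
        if block["region"] == "foreground":    # step 301
            chosen.append("high")              # step 306
        elif block["region"] == "background":  # step 302
            chosen.append("low")               # step 307
        else:
            chosen.append("medium")            # step 303
    return chosen

frame = [{"region": "foreground"}, {"region": "other"}, {"region": "background"}]
print(estimate_frame(frame))  # ['high', 'medium', 'low']
```

Each branch would invoke the corresponding motion estimation process with its level-specific Lagrangian factor; here only the routing decision is modelled.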
Based on the above description, the embodiment of the invention also provides an inter-frame motion estimation device. Fig. 4 is an exemplary block diagram of an inter motion estimation apparatus according to an embodiment of the present invention. As shown in fig. 4, the inter motion estimation apparatus 400 includes: a first determining module 401 configured to determine a degree of interest of a current coding block; a second determining module 402 configured to determine a lagrangian factor for the current encoding block based on the degree of interest, wherein the lagrangian factor decreases with increasing degree of interest; the motion estimation module 403 is configured to perform inter-frame motion estimation on the current encoded block based on the lagrangian factor.
In one embodiment, the second determining module 402 is configured to adjust a quantization parameter value of the current coding block based on the degree of interest, wherein the quantization parameter value decreases with increasing degree of interest; and determining the Lagrangian factor of the current coding block based on the adjusted quantization parameter value.
In one embodiment, the second determining module 402 is configured to determine the lagrangian factor for the current coding block based on the quantization parameter value of the current coding block; based on the degree of interest, a Lagrangian factor of the current encoded block is adjusted, wherein the Lagrangian factor decreases as the degree of interest increases.
In one embodiment, the degree of interest has a level, wherein the level comprises at least one of: a first level associated with a foreground region segmented from an image frame; a second level associated with a background region segmented from the image frame; and a third level associated with a remaining region of the image frame other than the foreground region and the background region; wherein the degree of interest of the first level is greater than that of the third level, and the degree of interest of the third level is greater than that of the second level.
In one embodiment, the level of the current coding block is the first level; a motion estimation module 403 configured to: in a reference frame searching area of a current coding block, performing whole pixel searching by adopting a 1-pixel step length, calculating the rate distortion cost of each searched pixel point based on a Lagrange factor, and determining the pixel point with the minimum rate distortion cost; searching is carried out around the pixel point with the minimum rate distortion cost by adopting 1/2 pixel step length, the rate distortion cost of each searched 1/2 pixel point is calculated based on Lagrange factors, and the 1/2 pixel point with the minimum rate distortion cost is determined; searching is carried out around 1/2 pixel points with the minimum rate distortion cost by adopting 1/4 pixel step length, the rate distortion cost of each searched 1/4 pixel point is calculated based on Lagrange factors, and the 1/4 pixel point with the minimum rate distortion cost is determined; and determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
In one embodiment, the level of the current coding block is the third level; a motion estimation module 403 configured to: in a reference frame searching area of a current coding block, performing whole pixel searching by adopting a 2-pixel step length, calculating the rate distortion cost of each searched pixel point based on a Lagrange factor, and determining the pixel point with the minimum rate distortion cost when the step length is 2; searching is carried out around the pixel point with the minimum rate distortion cost when the step length is 2 by adopting 1 pixel step length, the rate distortion cost of each searched pixel point is calculated based on Lagrange factors, and the pixel point with the minimum rate distortion cost when the step length is 1 is determined; searching is carried out around the pixel point with the minimum rate distortion cost when the step length is 1 by adopting 1/2 pixel step length, the rate distortion cost of each searched 1/2 pixel point is calculated based on Lagrange factors, and the 1/2 pixel point with the minimum rate distortion cost is determined; searching is carried out around 1/2 pixel points with the minimum rate distortion cost by adopting 1/4 pixel step length, the rate distortion cost of each searched 1/4 pixel point is calculated based on Lagrange factors, and the 1/4 pixel point with the minimum rate distortion cost is determined; and determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
In one embodiment, the level of the current encoded block is the second level; a motion estimation module 403 configured to: in a reference frame searching area of a current coding block, performing whole pixel searching by adopting a 4-pixel step length, calculating the rate distortion cost of each searched pixel point based on a Lagrange factor, and determining the pixel point with the minimum rate distortion cost when the step length is 4; searching is carried out around the pixel point with the minimum rate distortion cost when the step length is 4 by adopting a 2-pixel step length, the rate distortion cost of each searched pixel point is calculated based on the Lagrange factor, and the pixel point with the minimum rate distortion cost when the step length is 2 is determined; searching is carried out around the pixel point with the minimum rate distortion cost when the step length is 2 by adopting 1 pixel step length, the rate distortion cost of each searched pixel point is calculated based on Lagrange factors, and the pixel point with the minimum rate distortion cost is determined; searching is carried out around the pixel point with the minimum rate distortion cost by adopting 1/2 pixel step length, the rate distortion cost of each searched 1/2 pixel point is calculated based on Lagrange factors, and the 1/2 pixel point with the minimum rate distortion cost is determined; searching is carried out around 1/2 pixel points with the minimum rate distortion cost by adopting 1/4 pixel step length, the rate distortion cost of each searched 1/4 pixel point is calculated based on Lagrange factors, and the 1/4 pixel point with the minimum rate distortion cost is determined; and determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
Fig. 5 is an exemplary structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device 800 includes: a processor 801 and a memory 802. The processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 801 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also referred to as a central processing unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may integrate a graphics processing unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may also include an AI processor for processing computing operations related to machine learning. For example, the AI processor may be implemented as a neural network processor.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices.
In some embodiments, a non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, which is executed by the processor 801 to implement the inter-frame motion estimation methods provided by the various embodiments of the present disclosure. In some embodiments, the electronic device 800 may optionally further include: a peripheral interface 803 and at least one peripheral device. The processor 801, the memory 802, and the peripheral interface 803 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral interface 803 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include: at least one of a radio frequency circuit 804, a touch display 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809. The peripheral interface 803 may be used to connect at least one input/output (I/O) related peripheral device to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit radio frequency (RF) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or wireless fidelity (Wi-Fi) networks. In some embodiments, the radio frequency circuit 804 may also include circuitry related to near field communication (NFC), which is not limited by the embodiments of the present invention.
The display 805 is used to display a user interface (UI). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 801 as a control signal for processing. At this time, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805, disposed on the front panel of the electronic device 800; in other embodiments, there may be at least two displays 805, respectively disposed on different surfaces of the electronic device 800 or in a folded design; in some embodiments, the display 805 may be a flexible display disposed on a curved or folded surface of the electronic device 800. The display 805 may even be arranged in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display 805 may be made of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or other materials.
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, or panoramic shooting and virtual reality (VR) shooting functions or other fused shooting functions by fusing the main camera and the wide-angle camera. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves from users and the environment, convert the sound waves into electrical signals, and input them to the processor 801 for processing or to the radio frequency circuit 804 for voice communication. For purposes of stereo acquisition or noise reduction, there may be multiple microphones, respectively disposed at different locations of the electronic device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, the electrical signal can be converted not only into sound waves audible to humans, but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 807 may also include a headphone jack. The positioning component 808 is used to locate the current geographic location of the electronic device 800 to enable navigation or location-based services (LBS). The positioning component 808 may be a positioning component based on the U.S. Global Positioning System (GPS), the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union. The power supply 809 is used to power the various components in the electronic device 800. The power supply 809 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging.
Those skilled in the art will appreciate that the above-described structure does not limit the electronic device 800, which may include more or fewer components than shown, combine certain components, or employ a different arrangement of components. It should be noted that not all the steps and modules in the above flows and structural diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and may be adjusted as required. The division of the modules is merely a functional division adopted for convenience of description; in actual implementation, one module may be implemented by multiple modules, the functions of multiple modules may be implemented by the same module, and the modules may be located in the same device or in different devices. The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include a specially designed permanent circuit or logic device (e.g., a special-purpose processor such as an FPGA or ASIC) for performing specific operations. A hardware module may also include a programmable logic device or circuit (e.g., including a general-purpose processor or other programmable processor) temporarily configured by software for performing specific operations. Whether a hardware module is implemented by a dedicated permanent circuit or by a temporarily configured circuit (e.g., configured by software) may be determined by cost and time considerations.
The present application also provides a machine-readable storage medium storing instructions for causing a machine to perform the method described herein. Specifically, a system or apparatus may be provided with a storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may read out and execute the program code stored in the storage medium. Further, some or all of the actual operations may be performed by an operating system or the like running on the computer based on the instructions of the program code. The program code read out from the storage medium may also be written into a memory provided in an expansion board inserted into the computer or into a memory provided in an expansion unit connected to the computer, and then a CPU or the like mounted on the expansion board or expansion unit may perform part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments. Storage medium implementations for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer or the cloud via a communication network.
The foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. An inter-frame motion estimation method, comprising:
Determining the interest degree of the current coding block;
Determining a Lagrangian factor for the current encoding block based on the degree of interest, wherein the Lagrangian factor decreases as the degree of interest increases;
And performing inter-frame motion estimation on the current coding block based on the Lagrangian factor.
2. The method of claim 1, wherein the determining the lagrangian factor for the current encoded block based on the degree of interest comprises:
Adjusting a quantization parameter value of the current coding block based on the degree of interest, wherein the quantization parameter value decreases as the degree of interest increases;
And determining the Lagrangian factor of the current coding block based on the adjusted quantization parameter value.
3. The method of claim 1, wherein the determining the lagrangian factor for the current encoded block based on the degree of interest comprises:
determining a Lagrangian factor of the current coding block based on the quantization parameter value of the current coding block;
And adjusting a Lagrangian factor of the current coding block based on the degree of interest, wherein the Lagrangian factor decreases with increasing degree of interest.
4. A method according to any one of claims 1-3, wherein the degree of interest has a level, the level comprising at least one of:
a first level associated with a foreground region segmented from an image frame;
a second level associated with a background region segmented from the image frame;
a third level associated with a remaining region of the image frame other than the foreground region and the background region;
wherein the degree of interest of the first level is greater than the degree of interest of the third level, and the degree of interest of the third level is greater than the degree of interest of the second level.
5. The method of claim 4, wherein the level of the current encoded block is a first level; the performing inter-frame motion estimation on the current encoded block based on the Lagrangian factor comprises:
In the reference frame searching area of the current coding block, performing integral pixel searching by adopting a 1-pixel step length, calculating the rate distortion cost of each searched pixel point based on the Lagrange factor, and determining the pixel point with the minimum rate distortion cost;
Searching is carried out around the pixel point with the minimum rate distortion cost by adopting 1/2 pixel step length, the rate distortion cost of each searched 1/2 pixel point is calculated based on the Lagrange factor, and the 1/2 pixel point with the minimum rate distortion cost is determined;
Searching is carried out around the 1/2 pixel point with the minimum rate distortion cost by adopting 1/4 pixel step length, the rate distortion cost of each searched 1/4 pixel point is calculated based on the Lagrange factor, and the 1/4 pixel point with the minimum rate distortion cost is determined;
And determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
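The three-stage search of claim 5 can be sketched as follows. A real encoder computes SAD or SATD over interpolated sub-pixel samples; here a caller-supplied cost callback in quarter-pel motion-vector units stands in for both the distortion and the rate term, so `cost`, `refine`, and the search range are illustrative assumptions, not details from the patent.

```python
# Illustrative sketch of the claim-5 search for a first-level (highest
# interest) block: integer-pixel search at a 1-pixel step over the whole
# window, then half-pel and quarter-pel refinement around the best point.
# Motion vectors are in quarter-pel units, so the three step sizes are
# 4, 2 and 1; cost(mvx, mvy) is assumed to return the rate distortion
# cost J = D + lambda * R for that candidate vector.

def refine(cost, best, step):
    """Evaluate the eight neighbours of the current best point at the given step."""
    cx, cy = best[1]
    for dx in (-step, 0, step):
        for dy in (-step, 0, step):
            cand = (cost(cx + dx, cy + dy), (cx + dx, cy + dy))
            if cand < best:
                best = cand
    return best

def motion_search(cost, search_range=8):
    """Return the motion vector (quarter-pel units) with minimum RD cost."""
    r = search_range * 4
    # Stage 1: integer-pixel search, 1-pixel step (4 quarter-pel units).
    best = min((cost(x, y), (x, y))
               for x in range(-r, r + 1, 4)
               for y in range(-r, r + 1, 4))
    best = refine(cost, best, 2)  # Stage 2: 1/2-pixel refinement
    best = refine(cost, best, 1)  # Stage 3: 1/4-pixel refinement
    return best[1]
```

With a convex toy cost surface such as `lambda x, y: (x - 5)**2 + (y + 3)**2`, the search converges to `(5, -3)`.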
6. The method of claim 4, wherein the level of the current coding block is the third level, and the performing inter-frame motion estimation on the current coding block based on the Lagrangian factor comprises:
in a reference-frame search area of the current coding block, performing an integer-pixel search with a 2-pixel step, calculating the rate distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate distortion cost at the 2-pixel step;
searching around the pixel point with the minimum rate distortion cost at the 2-pixel step with a 1-pixel step, calculating the rate distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate distortion cost at the 1-pixel step;
searching around the pixel point with the minimum rate distortion cost at the 1-pixel step with a 1/2-pixel step, calculating the rate distortion cost of each searched 1/2 pixel point based on the Lagrangian factor, and determining the 1/2 pixel point with the minimum rate distortion cost;
searching around the 1/2 pixel point with the minimum rate distortion cost with a 1/4-pixel step, calculating the rate distortion cost of each searched 1/4 pixel point based on the Lagrangian factor, and determining the 1/4 pixel point with the minimum rate distortion cost; and
determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
7. The method of claim 4, wherein the level of the current coding block is the second level, and the performing inter-frame motion estimation on the current coding block based on the Lagrangian factor comprises:
in a reference-frame search area of the current coding block, performing an integer-pixel search with a 4-pixel step, calculating the rate distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate distortion cost at the 4-pixel step;
searching around the pixel point with the minimum rate distortion cost at the 4-pixel step with a 2-pixel step, calculating the rate distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate distortion cost at the 2-pixel step;
searching around the pixel point with the minimum rate distortion cost at the 2-pixel step with a 1-pixel step, calculating the rate distortion cost of each searched pixel point based on the Lagrangian factor, and determining the pixel point with the minimum rate distortion cost;
searching around the pixel point with the minimum rate distortion cost with a 1/2-pixel step, calculating the rate distortion cost of each searched 1/2 pixel point based on the Lagrangian factor, and determining the 1/2 pixel point with the minimum rate distortion cost;
searching around the 1/2 pixel point with the minimum rate distortion cost with a 1/4-pixel step, calculating the rate distortion cost of each searched 1/4 pixel point based on the Lagrangian factor, and determining the 1/4 pixel point with the minimum rate distortion cost; and
determining the motion vector of the current coding block based on the 1/4 pixel point with the minimum rate distortion cost.
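Claims 5-7 share one coarse-to-fine structure; only the integer-pixel step ladder depends on the interest level, and every level finishes with 1/2-pixel and 1/4-pixel refinement. The sketch below expresses that reading in quarter-pel units (ladders `(4,)`, `(8, 4)`, `(16, 8, 4)` for the first, third, and second levels); the cost callback and eight-neighbour refinement are assumed scaffolding, not text from the patent.

```python
# One generalized search covering claims 5-7: a coarse integer-pixel scan
# whose starting step depends on the interest level, followed by
# progressively finer refinement around the running best point. Motion
# vectors and steps are in quarter-pel units; cost(mvx, mvy) is assumed
# to return the rate distortion cost J = D + lambda * R.

LADDERS = {1: (4,), 3: (8, 4), 2: (16, 8, 4)}  # interest level -> full-pel steps

def search_by_level(cost, level, search_range=8):
    step0 = LADDERS[level][0]
    r = search_range * 4
    # Coarsest stage: scan the whole window at the level's starting step.
    best = min((cost(x, y), (x, y))
               for x in range(-r, r + 1, step0)
               for y in range(-r, r + 1, step0))
    # Remaining full-pel steps, then 1/2-pel (2) and 1/4-pel (1) refinement.
    for step in LADDERS[level][1:] + (2, 1):
        cx, cy = best[1]
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                cand = (cost(cx + dx, cy + dy), (cx + dx, cy + dy))
                if cand < best:
                    best = cand
    return best[1]
```

The coarser ladders of the second and third levels evaluate far fewer candidates, which is the point of the scheme: low-interest background blocks trade some search accuracy for encoder throughput.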
8. An inter-frame motion estimation apparatus, comprising:
a first determination module configured to determine a degree of interest of a current coding block;
a second determination module configured to determine a Lagrangian factor for the current coding block based on the degree of interest, wherein the Lagrangian factor decreases as the degree of interest increases; and
a motion estimation module configured to perform inter-frame motion estimation on the current coding block based on the Lagrangian factor.
9. The apparatus of claim 8, wherein
the second determination module is configured to adjust a quantization parameter value of the current coding block based on the degree of interest, the quantization parameter value decreasing as the degree of interest increases, and to determine the Lagrangian factor of the current coding block based on the adjusted quantization parameter value.
10. The apparatus of claim 8, wherein
the second determination module is configured to determine a Lagrangian factor of the current coding block based on the quantization parameter value of the current coding block, and to adjust the Lagrangian factor based on the degree of interest, the Lagrangian factor decreasing as the degree of interest increases.
11. The apparatus of any one of claims 8-10, wherein the degree of interest has a level comprising at least one of:
a first level, associated with a foreground region segmented from an image frame;
a second level, associated with a background region segmented from the image frame;
a third level, associated with a remaining region of the image frame other than the foreground region and the background region;
wherein the degree of interest of the first level is greater than that of the third level, and the degree of interest of the third level is greater than that of the second level.
12. An electronic device, comprising:
a memory; and
a processor;
wherein the memory stores an application executable by the processor for causing the processor to perform the inter-frame motion estimation method of any one of claims 1-7.
13. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, cause the processor to perform the inter-frame motion estimation method of any one of claims 1-7.
CN202211473468.3A 2022-11-23 2022-11-23 Inter-frame motion estimation method and device, electronic equipment and storage medium Pending CN118075485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211473468.3A CN118075485A (en) 2022-11-23 2022-11-23 Inter-frame motion estimation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118075485A true CN118075485A (en) 2024-05-24

Family

ID=91109716


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination