CN117979024A - Motion search method and device, electronic equipment and storage medium

Motion search method and device, electronic equipment and storage medium

Info

Publication number
CN117979024A
CN117979024A (application CN202410168018.6A)
Authority
CN
China
Prior art keywords
image block
encoded
algorithm
coded
frame
Prior art date
Legal status
Pending
Application number
CN202410168018.6A
Other languages
Chinese (zh)
Inventor
简云瑞
周超
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202410168018.6A
Publication of CN117979024A
Legal status: Pending


Abstract

Embodiments of the present disclosure provide a motion search method and apparatus, an electronic device, and a storage medium. The method includes: for at least one first image block to be encoded in a frame to be encoded, performing an integer-pixel search in a corresponding reference frame by a first integer-pixel search algorithm to determine an integer-pixel matching point, the first integer-pixel search algorithm adopting an MR-SAD algorithm; taking the integer-pixel matching point as a starting point, performing a sub-pixel search by a sub-pixel search algorithm within a preset sub-pixel search range of the reference frame to determine a sub-pixel matching point; and determining a motion vector of the image block to be encoded according to the sub-pixel matching point. Because the MR-SAD algorithm is used for the integer-pixel search of the first image block to be encoded, more accurate integer-pixel matching points are obtained, the error introduced into distortion calculation by brightness differences is reduced, and the overall coding performance is improved.

Description

Motion search method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video encoding and decoding, and in particular, to a motion search method, apparatus, electronic device, computer program product, and storage medium.
Background
Currently, in the video coding process, coding performance is generally improved through motion search. Motion search is a critical module in a video encoder: it is responsible for finding the motion vectors (Motion Vectors, MVs) of coding units (Coding Units, CUs) during image block matching. The accuracy of the motion vector directly affects the size of the coding unit's residual, which in turn affects the code rate consumed by residual coding. This module is therefore critical to the overall coding performance of the encoder.
However, the commonly used sum of absolute differences algorithm (Sum of Absolute Difference, SAD) has some drawbacks when performing the motion search, which can limit the coding efficiency of the encoder.
Disclosure of Invention
The embodiment of the disclosure provides a motion search method, a motion search device, electronic equipment and a storage medium.
According to a first aspect of embodiments of the present disclosure, there is provided a motion search method, including: for at least one first image block to be encoded in a frame to be encoded, performing an integer-pixel search in a corresponding reference frame by a first integer-pixel search algorithm to determine an integer-pixel matching point, wherein the frame to be encoded includes at least one image block to be encoded, the first image block to be encoded belongs to the at least one image block to be encoded, and the first integer-pixel search algorithm adopts an MR-SAD algorithm; taking the integer-pixel matching point as a starting point, performing a sub-pixel search by a sub-pixel search algorithm within a preset sub-pixel search range of the reference frame to determine a sub-pixel matching point; and determining a motion vector of the image block to be encoded according to the sub-pixel matching point.
In some exemplary embodiments of the present disclosure, the method further includes: for at least one second image block to be encoded in the frame to be encoded, performing an integer-pixel search in the corresponding reference frame by a second integer-pixel search algorithm to determine the integer-pixel matching point, wherein the second image block to be encoded belongs to the at least one image block to be encoded, and the second integer-pixel search algorithm adopts an SAD algorithm.
In some exemplary embodiments of the present disclosure, the method further includes: determining the image block to be encoded as the first image block to be encoded or the second image block to be encoded according to at least one of the brightness, the size, or the frame type of the image block to be encoded.
In some exemplary embodiments of the disclosure, determining, according to the brightness of the image block to be encoded, the image block to be encoded as the first image block to be encoded or the second image block to be encoded includes: calculating a brightness difference value between the image block to be encoded and a corresponding image block in the reference frame; in response to the brightness difference value being greater than or equal to a brightness threshold, taking the image block to be encoded as the first image block to be encoded; and in response to the brightness difference value being smaller than the brightness threshold, taking the image block to be encoded as the second image block to be encoded.
In some exemplary embodiments of the disclosure, determining, according to the size of the image block to be encoded, the image block to be encoded as the first image block to be encoded or the second image block to be encoded includes: in response to the size of the image block to be encoded being greater than or equal to a size threshold, taking the image block to be encoded as the first image block to be encoded; and in response to the size of the image block to be encoded being smaller than the size threshold, taking the image block to be encoded as the second image block to be encoded.
In some exemplary embodiments of the disclosure, determining, according to the frame type of the image block to be encoded, the image block to be encoded as the first image block to be encoded or the second image block to be encoded includes: in response to the frame to be encoded being a key frame (I frame) or a forward prediction frame (P frame), taking an image block to be encoded of the frame to be encoded as the first image block to be encoded; and in response to the frame to be encoded being a bi-predictive frame (B frame), taking an image block to be encoded of the frame to be encoded as the second image block to be encoded.
In some exemplary embodiments of the present disclosure, the method further includes: generating a mask map according to the first image block to be encoded and the second image block to be encoded, where the mask map is used for indicating whether an image block to be encoded is the first image block to be encoded or the second image block to be encoded.
In some exemplary embodiments of the present disclosure, the sub-pixel search algorithm employs an SATD algorithm.
In some exemplary embodiments of the present disclosure, the method further includes: in response to a decoding end adopting a DMVR algorithm, sending the mask map to the decoding end so that the decoding end corrects the motion vector according to the mask map.
In some exemplary embodiments of the present disclosure, the method further includes: in response to the image block to be encoded being the first image block to be encoded, setting the first image block to be encoded to a state in which the DMVR algorithm is allowed to be enabled; in response to the image block to be encoded being the second image block to be encoded, detecting whether the BCW corresponding to the second image block to be encoded is set to a default value; in response to the BCW corresponding to the second image block to be encoded being set to the default value, setting the second image block to be encoded to the state in which the DMVR algorithm is allowed to be enabled; and in response to the BCW corresponding to the second image block to be encoded being set to a non-default value, setting the second image block to be encoded to a state in which the DMVR algorithm is not allowed to be enabled.
In some exemplary embodiments of the present disclosure, the method further includes: in response to the image block to be encoded being the first image block to be encoded, calculating a distortion cost of the first image block to be encoded based on the MR-SAD algorithm; and in response to the image block to be encoded being the second image block to be encoded, calculating a distortion cost of the second image block to be encoded based on the SAD algorithm.
According to a second aspect of embodiments of the present disclosure, there is provided a motion search apparatus, including: an integer-pixel search module configured to, for at least one first image block to be encoded in a frame to be encoded, perform an integer-pixel search in a corresponding reference frame by a first integer-pixel search algorithm to determine an integer-pixel matching point, wherein the frame to be encoded includes at least one image block to be encoded, the first image block to be encoded belongs to the at least one image block to be encoded, and the first integer-pixel search algorithm adopts an MR-SAD algorithm; a sub-pixel search module configured to, with the integer-pixel matching point as a starting point, perform a sub-pixel search by a sub-pixel search algorithm within a preset sub-pixel search range of the reference frame to determine a sub-pixel matching point; and a motion vector determining module configured to determine a motion vector of the image block to be encoded according to the sub-pixel matching point.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement any of the motion search methods.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, cause the electronic device to perform any of the motion search methods.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product including a computer program/instructions, where the computer program/instructions, when executed by a processor, implement any of the motion search methods described above.
According to the above motion search method, in the integer-pixel search stage of the motion search, the MR-SAD algorithm is used to perform the integer-pixel search on the first image block to be encoded in the frame to be encoded. More accurate integer-pixel matching points can therefore be obtained, the error introduced into distortion calculation by brightness differences during the motion search is reduced, the code rate consumed by encoding the related residual data is reduced, and the overall coding performance is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the methods of embodiments of the present disclosure may be applied.
Fig. 2 shows a schematic diagram of a video communication system to which embodiments of the present disclosure may be applied.
Fig. 3 is a flowchart illustrating a motion search method according to an exemplary embodiment.
Fig. 4 is a flow chart of a motion search process according to an example.
Fig. 5 is a flow chart illustrating another motion search method according to an example.
Fig. 6 is a flowchart illustrating setting a DMVR-algorithm state for an image block to be encoded, according to an example.
Fig. 7 is a block diagram illustrating a motion search apparatus according to an exemplary embodiment.
Fig. 8 is a schematic diagram illustrating a structure of an electronic device suitable for use in implementing an exemplary embodiment of the present disclosure, according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. However, those skilled in the art will recognize that the aspects of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The drawings are merely schematic illustrations of the present disclosure, in which like reference numerals denote like or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities that do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and not necessarily all of the elements or steps are included or performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In the present specification, the terms "a," "an," "the," "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc., in addition to the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and do not limit the number of their objects.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the methods of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture may include a server 101, a network 102, a terminal device 103, a terminal device 104, and a terminal device 105. Network 102 is the medium used to provide communication links between terminal device 103, terminal device 104, or terminal device 105 and server 101. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
The server 101 may be a server providing various services, such as a background management server providing support for devices operated by a user with the terminal device 103, the terminal device 104, or the terminal device 105. The background management server may perform analysis and other processing on the received data such as the request, and feed back the processing result to the terminal device 103, the terminal device 104, or the terminal device 105.
The terminal device 103, the terminal device 104, and the terminal device 105 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a wearable smart device, a virtual reality device, an augmented reality device, and the like.
It should be understood that the numbers of terminal devices 103, 104 and 105, networks 102 and servers 101 in fig. 1 are merely illustrative. The server 101 may be a single physical server, a server cluster formed by a plurality of servers, or a cloud server, and there may be any number of terminal devices, networks and servers according to actual needs.
The motion search method in the embodiments of the present disclosure is applied to the compression encoding of multimedia information including video, still images, moving images, and the like. For ease of description, some terminology from H.264 or H.265 is used in the embodiments of the present disclosure. Those skilled in the art will appreciate that the solutions described in this disclosure are equally applicable to similar technical problems when other coding standards are used. Taking video communication as an example, fig. 2 shows a schematic diagram of a video communication system to which an embodiment of the present disclosure may be applied. As shown in fig. 2, the transmitting device at the transmitting end includes a video collector, a video memory, a video encoder and a transmitter. The video collector sends the collected video to the video encoder for compression encoding of the image information, and the encoded video is then sent out through the transmitter; the video memory can be used to store recent video collected by the collector. The receiving device at the receiving end includes a receiver, a video display and a video decoder; the video decoder decodes the received data to recover the images, and the decoded images are displayed on the video display. The motion search method in the embodiments of the present disclosure is mainly applied to the video encoder in such a video communication system. Motion search accounts for a major share of the computing resources consumed during encoding, so research on motion search algorithms plays an important role in improving the overall efficiency of video coding.
Fig. 4 is a flow chart of a motion search process according to an example. As shown, the motion search mainly adopts a two-step search strategy. First, when motion search is performed on a current image block to be encoded in a frame to be encoded, a current reference image block is determined in the reference frame corresponding to the frame to be encoded, and an integer-pixel search range around the current reference image block is then determined. Based on the integer-pixel search algorithm, every motion vector within the integer-pixel search range is traversed to predict the optimal motion vector (Motion Vector Prediction, MVP) and determine the integer-pixel matching point. Second, a sub-pixel search range is determined from the integer-pixel matching point; the sub-pixel search range may be determined at half-pixel or quarter-pixel precision with the integer-pixel matching point as a starting point. Based on the sub-pixel search algorithm, every motion vector within the sub-pixel search range is traversed to predict the optimal motion vector and determine the sub-pixel matching point. The corresponding optimal reference block, that is, the matching block, is obtained in the reference frame according to the sub-pixel matching point, and the motion vector of the current image block to be encoded is then determined.
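The two-step strategy above can be illustrated with a minimal Python sketch of the integer-pixel stage. The block-extraction logic, the SAD placeholder cost and the search-window handling are simplified assumptions for illustration, not the encoder's actual implementation; the cost_fn parameter could equally be the MR-SAD cost described below.

def sad(block_a, block_b):
    # Placeholder matching cost: plain sum of absolute pixel differences.
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def integer_pixel_search(cur_block, ref_frame, center, search_range, cost_fn=sad):
    # Traverse every integer motion vector in the search window around `center`
    # and keep the candidate with the smallest matching cost.
    best_mv, best_cost = (0, 0), float("inf")
    cy, cx = center
    h, w = len(cur_block), len(cur_block[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + h > len(ref_frame) or x + w > len(ref_frame[0]):
                continue  # candidate block falls outside the reference frame
            cand = [row[x:x + w] for row in ref_frame[y:y + h]]
            cost = cost_fn(cur_block, cand)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost  # integer-pixel matching point = center + best_mv

A sub-pixel refinement around the returned matching point would then follow as the second step.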
The steps of the method in the exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings and examples.
Fig. 3 is a flowchart illustrating a motion search method according to an exemplary embodiment. The method provided by the embodiment of fig. 3 may be performed by any electronic device, for example, the terminal device in fig. 1, or the server in fig. 1, or a combination of the terminal device and the server in fig. 1, which is not limited in this disclosure.
In step S310, for at least one first image block to be encoded in a frame to be encoded, an integer-pixel search is performed in a corresponding reference frame by a first integer-pixel search algorithm to determine an integer-pixel matching point. The frame to be encoded includes at least one image block to be encoded; the first image block to be encoded belongs to the at least one image block to be encoded; and the first integer-pixel search algorithm adopts a mean-removed sum of absolute differences algorithm (Mean Removal Sum of Absolute Difference, MR-SAD).
As described above, the motion search for video encoding is mainly divided into two search steps: an integer-pixel search and a sub-pixel search. The sub-pixel search may be further divided into a half-pixel search and a quarter-pixel search.
In the embodiments of the present disclosure, step S310 covers the integer-pixel search stage of the motion search. An MR-SAD algorithm is used to perform the integer-pixel search on the first image block to be encoded in the frame to be encoded so as to determine the integer-pixel matching point, which mitigates the error introduced into distortion calculation by brightness differences during the motion search and thus improves the video coding performance.
Here, the frame to be encoded is the frame that currently needs to be encoded, and the reference frame is a frame in the video sequence that has already been encoded and stored, typically a frame before or after the frame to be encoded. During encoding, motion vectors and residual data are generated by calculating the differences between the frame to be encoded and the reference frame. At the decoding end, a predicted image block is obtained from the decoded reference frame at the position indicated by the motion vector, and the predicted image block is added to the residual data to obtain the reconstructed decoded frame.
In addition, to better perform video coding and compression, improve compression efficiency, adapt to different scenes and requirements, and improve prediction accuracy, the frame to be encoded generally needs to be divided into a plurality of image blocks to be encoded. According to the requirements of the actual scene, the image blocks to be encoded may be divided based on gray value variation, brightness variation, size, dynamic and static regions, or other partitioning strategies; the embodiments of the present disclosure do not limit the specific image block partitioning strategy. It should be noted that, as a special case, a frame to be encoded that is not divided into image blocks can be regarded as a single image block to be encoded, which also falls within the protection scope of the present disclosure.
In the embodiments of the present disclosure, the first image blocks to be encoded may be all or only some of the image blocks to be encoded. Whether an image block to be encoded is a first image block to be encoded may be determined from its image parameter information, for example, from at least one of the brightness, the size, or the frame type of the image block to be encoded.
Analysis underlying the present disclosure shows that, during video encoding, the textures of the frame to be encoded and the reference frame may be similar while their overall brightness differs considerably. In the frequency domain, the two images are highly similar in the high-frequency components corresponding to texture and differ mainly in the low-frequency components corresponding to overall brightness. In an encoder, low-frequency components are easy to compress and consume little code rate, whereas high-frequency components are difficult to compress and consume more code rate. However, existing integer-pixel search algorithms do not distinguish whether a difference is caused by a brightness change or by a change in image texture. The matching accuracy in this situation is therefore poor, which affects the code rate consumed by residual coding.
In view of the above, in the embodiments of the present disclosure, the integer-pixel search is performed using the MR-SAD algorithm. The MR-SAD algorithm is an algorithm for image comparison and registration. Its core idea is to remove the mean of each of the two images before computing the absolute differences, and then to compare the two images using the mean-removed absolute differences. Removing the mean eliminates the influence of brightness changes and makes texture similarity easier to distinguish.
Therefore, by using the MR-SAD algorithm for the integer-pixel search, the embodiments of the present disclosure eliminate the influence of brightness changes and solve the problem that integer-pixel matching cannot be computed accurately under the above conditions, so that more accurate integer-pixel matching points can be obtained.
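As an illustration only, the following Python sketch contrasts a plain SAD cost with a mean-removed (MR-SAD) cost on two blocks whose textures match but whose brightness differs by a constant offset; the block values and function names are illustrative assumptions.

def sad(block_a, block_b):
    # Plain sum of absolute differences between co-located pixels.
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def mr_sad(block_a, block_b):
    # Mean-removed SAD: subtract each block's own mean before comparing, so a
    # uniform brightness offset between the blocks no longer contributes to the cost.
    def mean(block):
        pixels = [p for row in block for p in row]
        return sum(pixels) / len(pixels)
    ma, mb = mean(block_a), mean(block_b)
    return sum(abs((a - ma) - (b - mb))
               for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

# Two blocks with identical texture but a constant brightness offset of +30:
cur = [[10, 20], [30, 40]]
ref = [[40, 50], [60, 70]]
print(sad(cur, ref))     # 120 -> penalises the brightness difference
print(mr_sad(cur, ref))  # 0   -> matches on texture only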
In step S320, with the integer-pixel matching point as a starting point, a sub-pixel search is performed by a sub-pixel search algorithm within a preset sub-pixel search range of the reference frame to determine a sub-pixel matching point.
In the embodiments of the present disclosure, the integer-pixel matching point determined in step S310 is taken as the starting point, and a preset sub-pixel search range is determined in the reference frame. In an exemplary embodiment, the sub-pixel search range is determined based on a preset step size (for example, one-half pixel or one-quarter pixel) centered on the integer-pixel matching point. For example, if the integer-pixel matching point is at (1, 1) and the preset step size is 1/2 pixel, the sub-pixel search range includes (1, 1) itself together with the points above (1, 1.5), below (1, 0.5), to the left (0.5, 1), to the right (1.5, 1), upper right (1.5, 1.5), lower right (1.5, 0.5), upper left (0.5, 1.5) and lower left (0.5, 0.5) of the center (1, 1). In addition, the sub-pixel search range may also be determined based on factors such as the pixel differences between the two images and the magnitude of the motion vector, which will not be described in detail herein.
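The half-pixel candidate positions from the example above can be enumerated with a small sketch; the helper name and the representation of positions as (y, x) tuples are assumptions for illustration.

def subpixel_candidates(int_match, step=0.5):
    # The integer-pixel matching point itself plus its eight neighbours at the
    # given fractional step (0.5 for half-pixel, 0.25 for quarter-pixel).
    y, x = int_match
    return [(y + dy * step, x + dx * step) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

print(subpixel_candidates((1, 1)))
# [(0.5, 0.5), (0.5, 1.0), (0.5, 1.5), (1.0, 0.5), (1.0, 1.0),
#  (1.0, 1.5), (1.5, 0.5), (1.5, 1.0), (1.5, 1.5)]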
In the embodiments of the present disclosure, within the sub-pixel search range of the reference frame, the sub-pixel search is performed by a sub-pixel search algorithm to determine the sub-pixel matching point. The sub-pixel search algorithm traverses each pixel of a candidate block to compute its difference from the current block, and selects the matching block with the smallest error. The sub-pixel search algorithm typically employs a different image comparison metric than the integer-pixel search algorithm.
In an exemplary embodiment, the sub-pixel search algorithm may employ a sum of absolute transformed differences algorithm (Sum of Absolute Transformed Difference, SATD). The SATD algorithm is used in motion vector estimation to calculate the error of a moving object: it transforms the blocks and computes the sum of absolute differences between the transformed blocks to measure the accuracy of the motion vector. The sub-pixel search algorithm can be applied both to ordinary inter prediction and to forward prediction (lookahead).
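A minimal sketch of an SATD computation on a 4×4 residual using a Hadamard transform is shown below; real encoders may use other transform sizes and normalisations, so the function name, example blocks and final /2 scaling are illustrative assumptions.

def satd_4x4(block_a, block_b):
    # Hadamard-transform the 4x4 difference block and sum the absolute
    # transform coefficients; the final /2 normalisation is one common choice.
    h = [[1,  1,  1,  1],
         [1, -1,  1, -1],
         [1,  1, -1, -1],
         [1, -1, -1,  1]]
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(block_a, block_b)]

    def matmul(m1, m2):
        return [[sum(m1[i][k] * m2[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]

    t = matmul(matmul(h, diff), h)  # H * D * H^T (H is symmetric, so H^T == H)
    return sum(abs(v) for row in t for v in row) // 2

cur = [[1, 2, 3, 4]] * 4
ref = [[0, 0, 0, 0]] * 4
print(satd_4x4(cur, ref))  # 32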
In an exemplary embodiment, the sub-pixel search algorithm may also compute the optimal sub-pixel position by fitting an error surface to the search costs obtained during the integer-pixel search, thereby determining the sub-pixel matching point.
In step S330, a motion vector of the image block to be encoded is determined according to the sub-pixel matching point.
In the embodiments of the present disclosure, the motion vector of the first image block to be encoded in the frame to be encoded relative to the corresponding image block in the reference frame is calculated according to the sub-pixel matching point. The motion vector represents the displacement from the reference frame to the frame to be encoded, and it is encoded and transmitted to the decoding end for use in the decoding process.
According to the above motion search method, in the integer-pixel search stage of the motion search, the MR-SAD algorithm is used to perform the integer-pixel search on the first image block to be encoded in the frame to be encoded. More accurate integer-pixel matching points can therefore be obtained, the error introduced into distortion calculation by brightness differences during the motion search is reduced, the code rate consumed by encoding the related residual data is reduced, and the overall coding performance is improved.
Fig. 5 is a flow chart illustrating another motion search method according to an example. In the embodiments of the present disclosure, steps S510A, S520 and S530 in the motion search method shown in fig. 5 correspond to steps S310, S320 and S330 in the motion search method shown in fig. 3, respectively, and are not repeated here. On the basis of the aforementioned motion search method, the method may further include the following steps.
In step S510B, for at least one second image block to be encoded in the frame to be encoded, an integer-pixel search is performed in the corresponding reference frame by a second integer-pixel search algorithm to determine the integer-pixel matching point. The second image block to be encoded belongs to the at least one image block to be encoded, and the second integer-pixel search algorithm adopts a sum of absolute differences algorithm (Sum of Absolute Difference, SAD).
In the embodiments of the present disclosure, according to the differences between the image blocks to be encoded in the frame to be encoded, some image blocks may further be classified as second image blocks to be encoded, which are distinguished from the aforementioned first image blocks to be encoded. For a second image block to be encoded, the SAD algorithm is used during the integer-pixel search.
The SAD algorithm is a pixel difference metric that measures the similarity between two image blocks. Its core idea is to calculate the absolute difference between corresponding pixels in the two image blocks and to accumulate the differences of all pixels into a single error value. Unlike the MR-SAD algorithm, the SAD algorithm does not remove the mean and therefore cannot eliminate the influence of brightness changes; however, it is simple and fast to compute, which improves the efficiency of the integer-pixel search.
In the embodiments of the present disclosure, the integer-pixel search is further optimized by applying differentiated integer-pixel search algorithms to different image blocks to be encoded: the MR-SAD algorithm is used for first image blocks to be encoded, which are strongly affected by brightness differences, to obtain more accurate integer-pixel matching points, while the SAD algorithm is used for second image blocks to be encoded, which are less affected by brightness differences, to improve the efficiency of the search.
In an exemplary embodiment, the motion search method further includes generating a mask map according to the first image blocks to be encoded and the second image blocks to be encoded. The mask map is used for indicating whether an image block to be encoded is a first image block to be encoded or a second image block to be encoded: it corresponds to each image block to be encoded in the frame to be encoded and marks, with a flag, each image block as a first or a second image block to be encoded. For example, the flag "0" may indicate a first image block to be encoded and the flag "1" a second image block to be encoded, or vice versa. From the mask map, it can be learned during encoding and decoding which integer-pixel search algorithm was used for each image block to be encoded, so that differentiated encoding and decoding processes can be applied.
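A minimal sketch of building such a mask map from per-block classifications is shown below; the flag convention (0 for the first image block to be encoded, 1 for the second) follows the example above, while the data layout and names are assumptions.

def build_mask_map(block_types):
    # block_types: 2-D grid of classifications for the blocks of one frame,
    # each entry being "first" (MR-SAD used) or "second" (SAD used).
    return [[0 if t == "first" else 1 for t in row] for row in block_types]

frame_blocks = [["first", "second"],
                ["second", "first"]]
print(build_mask_map(frame_blocks))  # [[0, 1], [1, 0]]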
As described above, according to the difference between different image blocks to be encoded in the frame to be encoded, the image block to be encoded is determined to be the first image block to be encoded or the second image block to be encoded. Specifically, the image block to be encoded may be determined as the first image block to be encoded or the second image block to be encoded according to at least one of the brightness, the size, or the frame type of the image block to be encoded.
Dividing image blocks to be encoded based on brightness
In an exemplary embodiment, the image block to be encoded is determined to be the first image block to be encoded or the second image block to be encoded according to the luminance of the image block to be encoded. The method may comprise the following steps.
In step S11, a luminance difference between the image block to be encoded and a corresponding image block in the reference frame is calculated.
In step S12, in response to the luminance difference value being greater than or equal to a luminance threshold, the image block to be encoded is the first image block to be encoded.
In step S13, in response to the luminance difference value being smaller than a luminance threshold, the image block to be encoded is the second image block to be encoded.
In the embodiments of the present disclosure, the image block to be encoded is determined to be the first image block to be encoded or the second image block to be encoded by calculating the brightness difference value between the image block to be encoded and the corresponding image block in the reference frame. If the brightness difference value is greater than or equal to the preset brightness threshold, the brightness difference of the image block to be encoded is large, so the MR-SAD algorithm is used to eliminate the influence of the brightness difference and obtain a more accurate integer-pixel matching point. If the brightness difference value is smaller than the preset brightness threshold, the brightness difference of the image block to be encoded is small, the brightness difference does not need to be eliminated, and the SAD algorithm can be used to improve the efficiency of the search.
There are many algorithms for calculating the brightness difference value between two image blocks, such as gray histogram statistics, luminance histogram statistics, average luminance, or edge detection. The calculation of the brightness difference value is described below using the gray histogram statistics method as an example.
First, two arrays hist0[256] and hist1[256] are defined, corresponding to the image block to be encoded and the reference frame image block, respectively, where 256 corresponds to the pixel value range 0 to 255. The two arrays record the gray-level statistics of the two image blocks, from which the gray histogram of each image block is constructed. A gray histogram is a statistical chart reflecting the relationship between each gray level in an image and the frequency with which pixels of that gray level occur: with gray level on the horizontal axis and frequency on the vertical axis, all pixels in the digital image are counted according to their gray values.
Next, the absolute difference between the two gray histograms is calculated at each gray level.
Finally, the per-level differences are summed, and the sum is taken as the brightness difference value between the two image blocks.
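The gray-histogram method just described can be sketched as follows; the threshold value used in the final comparison is an illustrative assumption, not a value given by this disclosure.

def gray_histogram(block):
    # hist[g] counts how many pixels in the block have gray level g (0..255).
    hist = [0] * 256
    for row in block:
        for p in row:
            hist[p] += 1
    return hist

def luminance_difference(block_a, block_b):
    # Sum of absolute per-level differences between the two histograms,
    # used as the brightness difference value of the two blocks.
    ha, hb = gray_histogram(block_a), gray_histogram(block_b)
    return sum(abs(a - b) for a, b in zip(ha, hb))

def is_first_block(cur_block, ref_block, luma_threshold=64):
    # luma_threshold is an assumed example value; >= threshold -> first image
    # block to be encoded (MR-SAD), otherwise second image block (SAD).
    return luminance_difference(cur_block, ref_block) >= luma_threshold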
Other algorithms for calculating the brightness difference are all technical means familiar to those skilled in the art, and will not be described herein.
Dividing image blocks to be encoded based on size
In an exemplary embodiment, the image block to be encoded is determined as the first image block to be encoded or the second image block to be encoded according to the size of the image block to be encoded. The method may comprise the following steps.
In step S21, in response to the size of the image block to be encoded being greater than or equal to a size threshold, the image block to be encoded is the first image block to be encoded.
In step S22, in response to the size of the image block to be encoded being smaller than the size threshold, the image block to be encoded is the second image block to be encoded.
In the embodiments of the present disclosure, whether the image block to be encoded is the first image block to be encoded or the second image block to be encoded is determined by whether its size exceeds a preset size threshold. If the size of the image block to be encoded is greater than or equal to the preset size threshold, the image block is large and strongly affected by brightness differences, so the MR-SAD algorithm is used to eliminate the influence of the brightness difference and obtain a more accurate integer-pixel matching point. If the size of the image block to be encoded is smaller than the preset size threshold, the image block is small and less affected by brightness differences, the brightness difference does not need to be eliminated, and the SAD algorithm can be used to improve the efficiency of the search.
It should be noted that the size threshold may be adjusted according to the requirements of the actual scene. For example, the size threshold may be set to M×N: when the width of the image block to be encoded is greater than or equal to M and its height is greater than or equal to N, the image block is a first image block to be encoded and the MR-SAD algorithm is used; when the width is smaller than M or the height is smaller than N, the image block is a second image block to be encoded and the SAD algorithm is used.
Partitioning image blocks to be encoded based on frame type
In an exemplary embodiment, the image block to be encoded is determined as the first image block to be encoded or the second image block to be encoded according to the frame type of the image block to be encoded. The method may comprise the following steps.
In step S31, in response to the frame to be encoded being a key frame (I frame) or a forward prediction frame (P frame), the image block to be encoded of the frame to be encoded is the first image block to be encoded.
In step S32, in response to the frame to be encoded being a bi-predictive frame (B-frame), the image block to be encoded of the frame to be encoded is the second image block to be encoded.
In video encoded image frames, three frame types are typically provided, I-frames, P-frames, B-frames, respectively. In a video coding sequence, a group of pictures (Group of Pictures, GOP) is a set of pictures in the sequence that is used to assist random access, which refers to the interval between two I frames. The first picture of the GOP must be an I-frame to ensure that the GOP can be independently decoded without reference to other pictures. I frames, also called key frames, contain complete image information, providing the highest image quality. An I-frame is an intra-coded frame, i.e. independent of other frames, which can be decoded and displayed separately. P frames, also known as forward predicted frames, require a decoder to reconstruct an image by referencing a previous frame, and are more efficiently compression encoded than I frames by referencing a previous I or P frame. B frames, also known as bi-predictive frames, are predictively encoded by referring to previous and subsequent I frames or P frames, and decoding requires simultaneous reference to the previous and subsequent frames to reconstruct the image.
In the embodiments of the present disclosure, the image blocks to be encoded of the frame to be encoded are determined to be first image blocks to be encoded or second image blocks to be encoded according to the frame type of the frame to be encoded. If the frame to be encoded is an I frame or a P frame, it is encoded relatively independently and is strongly affected by brightness differences, so the MR-SAD algorithm is used to eliminate the influence of the brightness difference and obtain more accurate integer-pixel matching points. If the frame to be encoded is a B frame, it is encoded with reference to both a preceding and a following frame and is therefore less affected by brightness differences, so the SAD algorithm can be used to improve the efficiency of the search.
It should be noted that when the integer-pixel search algorithm is determined by frame type, the selected integer-pixel search algorithm is applied to the entire frame to be encoded, and the image blocks to be encoded within the frame are no longer distinguished.
The above describes determining the integer-pixel search algorithm separately from the brightness, the size, or the frame type of the image block to be encoded. In practical applications, the integer-pixel search algorithm may also be determined by combining several of these pieces of information, which also falls within the protection scope of the present disclosure.
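A sketch of how the three criteria might be combined into a single selection routine is shown below; the thresholds and the precedence given to the frame type are assumptions for illustration rather than rules stated by this disclosure.

def select_integer_search_algorithm(luma_diff, width, height, frame_type,
                                    luma_threshold=64, size_threshold=(16, 16)):
    # Frame type is applied to the whole frame first, as described above.
    if frame_type in ("I", "P"):
        return "MR-SAD"  # treat all blocks of I/P frames as first image blocks
    m, n = size_threshold
    # For B frames, combine the brightness and size criteria per block.
    if luma_diff >= luma_threshold and width >= m and height >= n:
        return "MR-SAD"  # first image block to be encoded
    return "SAD"         # second image block to be encoded

print(select_integer_search_algorithm(luma_diff=80, width=32, height=32, frame_type="B"))  # MR-SAD
print(select_integer_search_algorithm(luma_diff=10, width=32, height=32, frame_type="B"))  # SAD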
The decoding-side motion vector correction algorithm (Decoder Motion Vector Refinement, DMVR) is a technique for block encoding and decoding in the bi-prediction merge mode. The DMVR algorithm introduces the concept of deformable motion vectors, allowing the motion vectors of image blocks to be adjusted at the sub-pixel level and thus providing more accurate motion estimation. In this mode, the bi-directional motion vectors of a data block can be further refined at the decoding end using bi-directional matching prediction. Because the DMVR algorithm needs information from the encoding end to assist the decoding end in correcting the motion vector, the encoding end needs to cooperate accordingly.
In the case that the decoding side adopts DMVR algorithm, the method can include the following steps.
In response to the decoding end adopting the DMVR algorithm, the mask map is sent to the decoding end so that the decoding end can correct the motion vector according to the mask map.
In the embodiments of the present disclosure, the mask map indicates whether each image block to be encoded is a first image block to be encoded or a second image block to be encoded: it corresponds to the image blocks to be encoded in the frame to be encoded and marks each of them with a flag. The mask map is sent to the decoding end, so that the decoding end knows whether each image block to be encoded in the frame to be encoded is a first or a second image block to be encoded and, accordingly, whether the MR-SAD algorithm or the SAD algorithm was used for its integer-pixel search.
Fig. 6 is a flowchart illustrating setting a DMVR-algorithm state for an image block to be encoded, according to an example. On the basis of the aforementioned motion search method, the method may further include the following steps.
In step S610, an image block to be encoded is determined.
In the DMVR algorithm, BCW (Block-based Weighted Prediction) is used to weight each image block so as to predict motion vectors more accurately. In the related art, when the BCW of an image block is set to the default value, the image block may be set to a state in which the DMVR algorithm is allowed to be enabled, and the decoding end may enable the DMVR algorithm for that block. When the BCW of an image block is set to a non-default value and the block used the SAD algorithm during the integer-pixel search, the block may be set to a state in which the DMVR algorithm is not allowed to be enabled, and the decoding end cannot enable the DMVR algorithm for it. In the embodiments of the present disclosure, if the block used the MR-SAD algorithm during the integer-pixel search, it is not subject to the BCW restriction, and the decoding end may enable the DMVR algorithm for it. Therefore, this step determines whether the image block to be encoded is a first or a second image block to be encoded, so that differentiated processing can be performed.
In step S620, in response to the image block to be encoded being the first image block to be encoded, the first image block to be encoded is set to a state in which the DMVR algorithm is allowed to be enabled.
In the embodiments of the present disclosure, when the image block to be encoded is the first image block to be encoded, that is, the MR-SAD algorithm was used during its integer-pixel search, the first image block to be encoded may be set to a state in which the DMVR algorithm is allowed to be enabled regardless of whether the BCW is set to the default value, and the decoding end may use the DMVR algorithm for decoding as needed.
In step S630, in response to the image block to be encoded being the second image block to be encoded, it is detected whether the BCW corresponding to the second image block to be encoded is set to the default value.
In the embodiments of the present disclosure, when the image block to be encoded is the second image block to be encoded, that is, the SAD algorithm was used during its integer-pixel search, it is necessary to detect whether the BCW corresponding to the second image block to be encoded is set to the default value.
In step S640, in response to the BCW corresponding to the second image block to be encoded being set to the default value, the second image block to be encoded is set to the state in which the DMVR algorithm is allowed to be enabled.
In the embodiments of the present disclosure, when the BCW of the image block is set to the default value, the image block may be set to the state in which the DMVR algorithm is allowed to be enabled, and the decoding end may enable the DMVR algorithm for that block.
In step S650, in response to the BCW corresponding to the second image block to be encoded being set to a non-default value, the second image block to be encoded is set to a state in which the DMVR algorithm is not allowed to be enabled.
In the embodiments of the present disclosure, when the BCW of the image block is set to a non-default value and the block used the SAD algorithm during the integer-pixel search, the image block is set to a state in which the DMVR algorithm is not allowed to be enabled, and the decoding end cannot enable the DMVR algorithm for that block.
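The decision of steps S620 to S650 can be summarised in a short sketch; the state names and argument representation are illustrative assumptions.

def dmvr_state(block_type, bcw_is_default):
    # block_type: "first"  -> MR-SAD was used for the integer-pixel search
    #             "second" -> SAD was used for the integer-pixel search
    if block_type == "first":
        return "DMVR_ALLOWED"            # step S620: no BCW restriction
    if bcw_is_default:
        return "DMVR_ALLOWED"            # step S640
    return "DMVR_NOT_ALLOWED"            # step S650

print(dmvr_state("first", bcw_is_default=False))   # DMVR_ALLOWED
print(dmvr_state("second", bcw_is_default=False))  # DMVR_NOT_ALLOWED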
In video coding, distortion calculation is also required during motion estimation (Motion Estimation). The distortion calculation is used to evaluate the difference between the prediction result of the frame to be encoded and the frame to be encoded (or the reference frame), or to compare the merits of different prediction modes. In the related art, the distortion calculation selects the optimal prediction model by computing the difference values under the different prediction models.
In an exemplary embodiment, the distortion calculation formula is Distortion = min(SATD × k1, SAD × k2), where Distortion is the distortion cost, SATD is the distortion cost under the SATD model, SAD is the distortion cost under the SAD model, and k1 and k2 are distortion coefficients for different coding modes.
In the embodiments of the present disclosure, because the motion search method applies differentiated integer-pixel search algorithms to different image blocks to be encoded, the distortion calculation formulas corresponding to the different image blocks to be encoded are adjusted accordingly. The method may further include the following steps.
In response to the image block to be encoded being the first image block to be encoded, the distortion cost of the first image block to be encoded is calculated based on the MR-SAD algorithm.
In response to the image block to be encoded being the second image block to be encoded, the distortion cost of the second image block to be encoded is calculated based on the SAD algorithm.
In the embodiments of the present disclosure, because the first image block to be encoded uses the MR-SAD algorithm for its integer-pixel search, its distortion cost is calculated based on the MR-SAD algorithm, for example Distortion = min(SATD × k1, MR-SAD × k2). Because the second image block to be encoded uses the SAD algorithm for its integer-pixel search, its distortion cost is calculated based on the SAD algorithm, for example Distortion = min(SATD × k1, SAD × k2).
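As an illustration, the two distortion formulas can be written as a single helper; the example cost values and the coefficient defaults passed in are assumptions, since the real coefficients depend on the coding mode.

def distortion_cost(satd_cost, sad_cost, mr_sad_cost, is_first_block, k1=1.0, k2=1.0):
    # First image blocks use the MR-SAD-based formula, second image blocks the
    # SAD-based formula; k1 and k2 are the mode-dependent distortion coefficients.
    if is_first_block:
        return min(satd_cost * k1, mr_sad_cost * k2)
    return min(satd_cost * k1, sad_cost * k2)

print(distortion_cost(satd_cost=120, sad_cost=100, mr_sad_cost=40, is_first_block=True))   # 40.0
print(distortion_cost(satd_cost=120, sad_cost=100, mr_sad_cost=40, is_first_block=False))  # 100.0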
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 7 is a block diagram illustrating a motion search apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus 700 may include an integer-pixel search module 710, a sub-pixel search module 720 and a motion vector determining module 730.
The integer-pixel search module 710 is configured to, for at least one first image block to be encoded in a frame to be encoded, perform an integer-pixel search in a corresponding reference frame by a first integer-pixel search algorithm to determine an integer-pixel matching point. The frame to be encoded includes at least one image block to be encoded; the first image block to be encoded belongs to the at least one image block to be encoded; and the first integer-pixel search algorithm adopts the MR-SAD algorithm.
The sub-pixel search module 720 is configured to, with the integer-pixel matching point as a starting point, perform a sub-pixel search by a sub-pixel search algorithm within a preset sub-pixel search range of the reference frame to determine a sub-pixel matching point.
The motion vector determining module 730 is configured to determine a motion vector of the image block to be encoded according to the sub-pixel matching point.
In some exemplary embodiments of the present disclosure, the integer-pixel search module 710 is further configured to, for at least one second image block to be encoded in the frame to be encoded, perform an integer-pixel search in the corresponding reference frame by a second integer-pixel search algorithm to determine the integer-pixel matching point; the second image block to be encoded belongs to the at least one image block to be encoded; and the second integer-pixel search algorithm adopts the SAD algorithm.
In some exemplary embodiments of the present disclosure, an image block to be encoded dividing module is configured to determine that the image block to be encoded is the first image block to be encoded or the second image block to be encoded according to at least one information of brightness, size, or frame type of the image block to be encoded.
In some exemplary embodiments of the present disclosure, an image block to be encoded dividing module configured to calculate a luminance difference value between the image block to be encoded and a corresponding image block in the reference frame; responding to the brightness difference value being greater than or equal to a brightness threshold value, wherein the image block to be coded is the first image block to be coded; and responding to the brightness difference value being smaller than a brightness threshold value, wherein the image block to be coded is the second image block to be coded.
In some exemplary embodiments of the present disclosure, an image block to be encoded dividing module configured to, in response to the image block to be encoded being greater than or equal to a size threshold, the image block to be encoded being the first image block to be encoded; and responding to the image block to be encoded being smaller than a size threshold, wherein the image block to be encoded is the second image block to be encoded.
In some exemplary embodiments of the present disclosure, the image block to be encoded dividing module is configured to, in response to the frame to be encoded being a key frame (I frame) or a forward prediction frame (P frame), the image block to be encoded of the frame to be encoded being the first image block to be encoded; in response to the frame to be encoded being a bi-predictive frame (B-frame), a block of pictures to be encoded of the frame to be encoded is the second block of pictures to be encoded.
In some exemplary embodiments of the present disclosure, the integer-pixel search module 710 is further configured to generate a mask map according to the first image block to be encoded and the second image block to be encoded; the mask map is used for indicating whether the image block to be encoded is the first image block to be encoded or the second image block to be encoded.
In some exemplary embodiments of the present disclosure, the sub-pixel search module 720 is configured such that the sub-pixel search algorithm employs an SATD algorithm.
In some exemplary embodiments of the present disclosure, the DMVR setting module is configured to, in response to the decoding end adopting the DMVR algorithm, send the mask map to the decoding end so that the decoding end corrects the motion vector according to the mask map.
In some exemplary embodiments of the present disclosure, the DMVR setting module is configured to: in response to the image block to be encoded being the first image block to be encoded, set the first image block to be encoded to a state in which the DMVR algorithm is allowed to be enabled; in response to the image block to be encoded being the second image block to be encoded, detect whether the BCW corresponding to the second image block to be encoded is set to a default value; in response to the BCW corresponding to the second image block to be encoded being set to the default value, set the second image block to be encoded to the state in which the DMVR algorithm is allowed to be enabled; and in response to the BCW corresponding to the second image block to be encoded being set to a non-default value, set the second image block to be encoded to a state in which the DMVR algorithm is not allowed to be enabled.
In some exemplary embodiments of the present disclosure, the distortion calculation module is configured to, in response to the image block to be encoded being the first image block to be encoded, calculate the distortion cost of the first image block to be encoded based on the MR-SAD algorithm; and in response to the image block to be encoded being the second image block to be encoded, calculate the distortion cost of the second image block to be encoded based on the SAD algorithm.
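The two distortion measures named here can be contrasted with a short sketch: plain SAD accumulates absolute pixel errors directly, while MR-SAD first removes the mean brightness difference between the two blocks so that a uniform luminance offset does not inflate the cost. This is the generic formulation of these measures, not code lifted from the patent; the function names and the truncating integer mean are illustrative choices.

```cpp
// Generic SAD and mean-removed SAD (MR-SAD) sketches (illustrative).
#include <cstdint>
#include <cstdlib>

int Sad(const uint8_t* cur, int curStride,
        const uint8_t* ref, int refStride, int width, int height) {
    int sad = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sad += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
    return sad;
}

int MrSad(const uint8_t* cur, int curStride,
          const uint8_t* ref, int refStride, int width, int height) {
    // Mean difference (DC offset) between the two blocks, truncated to an integer.
    long long diffSum = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            diffSum += cur[y * curStride + x] - ref[y * refStride + x];
    const int meanDiff = static_cast<int>(diffSum / (width * height));

    // SAD computed after removing the mean difference from every pixel error.
    int cost = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            cost += std::abs(cur[y * curStride + x] - ref[y * refStride + x] - meanDiff);
    return cost;
}
```

For a block whose only change relative to the reference is a brightness shift, MrSad returns a near-zero cost while Sad grows with the shift, which is exactly the distortion error the text attributes to brightness differences.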
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
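To show how those pieces might fit together at the whole pixel stage, here is a brute-force search sketch: it scans a square window in the reference frame and keeps the candidate with the lowest cost, where the cost callback would be MR-SAD for first image blocks and SAD for second ones. The structure, the border handling (omitted), and the names FullPelSearch and BlockCost are assumptions; real encoders typically use faster search patterns than an exhaustive scan.

```cpp
// Illustrative whole pixel (integer-pel) full search sketch (assumed structure,
// not the patent's implementation). Frame-border clipping is omitted for brevity.
#include <cstdint>
#include <functional>
#include <limits>

struct MotionVector { int x; int y; };

using BlockCost = std::function<int(const uint8_t* cur, int curStride,
                                    const uint8_t* ref, int refStride,
                                    int width, int height)>;

MotionVector FullPelSearch(const uint8_t* cur, int curStride,
                           const uint8_t* refFrame, int refStride,
                           int blockX, int blockY, int width, int height,
                           int searchRange, const BlockCost& cost) {
    MotionVector best{0, 0};
    int bestCost = std::numeric_limits<int>::max();
    for (int dy = -searchRange; dy <= searchRange; ++dy) {
        for (int dx = -searchRange; dx <= searchRange; ++dx) {
            const uint8_t* cand =
                refFrame + (blockY + dy) * refStride + (blockX + dx);
            const int c = cost(cur, curStride, cand, refStride, width, height);
            if (c < bestCost) { bestCost = c; best = MotionVector{dx, dy}; }
        }
    }
    // The returned integer-pel MV would then seed the SATD-based sub-pixel refinement.
    return best;
}
```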
An electronic device 800 according to such an embodiment of the present disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 connecting the different system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
The storage unit stores program code that is executable by the processing unit 810, such that the processing unit 810 performs the steps according to the various exemplary embodiments of the present disclosure described in the exemplary methods section above. For example, the processing unit 810, and thus the electronic device, may perform the various steps shown in fig. 2.
Storage unit 820 may include readable media in the form of volatile storage units such as Random Access Memory (RAM) 821 and/or cache memory unit 822, and may further include Read Only Memory (ROM) 823.
The storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 870 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 800, and/or any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, electronic device 800 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 860. As shown, network adapter 860 communicates with other modules of electronic device 800 over bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, or the like) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment, a computer readable storage medium is also provided, e.g., a memory, comprising instructions executable by a processor of an apparatus to perform the above method. Alternatively, the computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program/instruction which, when executed by a processor, implements the method in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. A motion search method, comprising:
for at least one first image block to be encoded in a frame to be encoded, performing a whole pixel search in a corresponding reference frame through a first whole pixel search algorithm to determine a whole pixel matching point; wherein the frame to be encoded comprises at least one image block to be encoded, the first image block to be encoded belongs to the at least one image block to be encoded, and the first whole pixel search algorithm employs a mean-removed sum of absolute differences (MR-SAD) algorithm;
taking the whole pixel matching point as a starting point, performing a sub-pixel search within a preset sub-pixel search range of the reference frame through a sub-pixel search algorithm to determine a sub-pixel matching point; and
determining a motion vector of the image block to be encoded according to the sub-pixel matching point.
2. The method as recited in claim 1, further comprising:
for at least one second image block to be encoded in the frame to be encoded, performing a whole pixel search in the corresponding reference frame through a second whole pixel search algorithm to determine the whole pixel matching point; wherein the second image block to be encoded belongs to the at least one image block to be encoded, and the second whole pixel search algorithm employs a sum of absolute differences (SAD) algorithm.
3. The method as recited in claim 2, further comprising: determining the image block to be encoded as the first image block to be encoded or the second image block to be encoded according to at least one of the brightness, the size, or the frame type of the image block to be encoded.
4. The method according to claim 3, wherein determining the image block to be encoded as the first image block to be encoded or the second image block to be encoded according to the brightness of the image block to be encoded comprises:
calculating a brightness difference value between the image block to be encoded and the corresponding image block in the reference frame;
in response to the brightness difference value being greater than or equal to a brightness threshold, determining the image block to be encoded as the first image block to be encoded; and
in response to the brightness difference value being smaller than the brightness threshold, determining the image block to be encoded as the second image block to be encoded.
5. The method according to claim 3, wherein determining the image block to be encoded as the first image block to be encoded or the second image block to be encoded according to the size of the image block to be encoded comprises:
in response to the size of the image block to be encoded being greater than or equal to a size threshold, determining the image block to be encoded as the first image block to be encoded; and
in response to the size of the image block to be encoded being smaller than the size threshold, determining the image block to be encoded as the second image block to be encoded.
6. The method according to claim 3, wherein determining the image block to be encoded as the first image block to be encoded or the second image block to be encoded according to the frame type of the image block to be encoded comprises:
in response to the frame to be encoded being a key frame (I frame) or a forward prediction frame (P frame), determining an image block to be encoded of the frame to be encoded as the first image block to be encoded; and
in response to the frame to be encoded being a bi-directional prediction frame (B frame), determining an image block to be encoded of the frame to be encoded as the second image block to be encoded.
7. The method as recited in claim 2, further comprising: generating a mask map according to the first image block to be encoded and the second image block to be encoded; wherein the mask map is used for indicating whether the image block to be encoded is the first image block to be encoded or the second image block to be encoded.
8. The method of claim 1, wherein the sub-pixel search algorithm employs a sum of absolute transformed differences (SATD) algorithm.
9. The method as recited in claim 7, further comprising:
in response to the decoding end adopting a decoder motion vector refinement (Decoder Motion Vector Refinement, DMVR) algorithm, sending the mask map to the decoding end, so that the decoding end corrects the motion vector according to the mask map.
10. The method as recited in claim 2, further comprising:
in response to the image block to be encoded being the first image block to be encoded, setting the first image block to be encoded to a state in which the DMVR algorithm may be enabled;
in response to the image block to be encoded being the second image block to be encoded, detecting whether the block weighted prediction (Block-based Weighted Prediction, BCW) corresponding to the second image block to be encoded is set to a default value;
in response to the BCW corresponding to the second image block to be encoded being set to the default value, setting the second image block to be encoded to the state in which the DMVR algorithm may be enabled; and
in response to the BCW corresponding to the second image block to be encoded being set to a non-default value, setting the second image block to be encoded to a state in which the DMVR algorithm is disabled.
11. The method as recited in claim 2, further comprising:
in response to the image block to be encoded being the first image block to be encoded, calculating a distortion cost of the first image block to be encoded based on the MR-SAD algorithm; and
in response to the image block to be encoded being the second image block to be encoded, calculating a distortion cost of the second image block to be encoded based on the SAD algorithm.
12. A motion search apparatus, comprising:
a whole pixel search module configured to perform, for at least one first image block to be encoded in a frame to be encoded, a whole pixel search in a corresponding reference frame through a first whole pixel search algorithm to determine a whole pixel matching point; wherein the frame to be encoded comprises at least one image block to be encoded, the first image block to be encoded belongs to the at least one image block to be encoded, and the first whole pixel search algorithm employs the MR-SAD algorithm;
a sub-pixel search module configured to perform, with the whole pixel matching point as a starting point, a sub-pixel search within a preset sub-pixel search range of the reference frame through a sub-pixel search algorithm to determine a sub-pixel matching point; and
a motion vector determining module configured to determine a motion vector of the image block to be encoded according to the sub-pixel matching point.
13. An electronic device, comprising:
A processor;
a memory for storing the processor-executable instructions;
Wherein the processor is configured to execute the executable instructions to implement the motion search method of any one of claims 1 to 11.
14. A computer readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the motion search method of any one of claims 1 to 11.
15. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the motion search method of any one of claims 1 to 11.
Application: CN202410168018.6A, filed 2024-02-05 (priority date 2024-02-05), Motion search method and device, electronic equipment and storage medium, status: Pending
Publication: CN117979024A, published 2024-05-03