WO2017019765A1 - Systems and methods for video encoding using an improved motion vector search algorithm - Google Patents


Info

Publication number
WO2017019765A1
WO2017019765A1 (PCT/US2016/044252)
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
search
proximity
systems
best motion
Prior art date
Application number
PCT/US2016/044252
Other languages
French (fr)
Inventor
Da Qing ZHOU
Original Assignee
Tmm, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tmm, Inc. filed Critical Tmm, Inc.
Publication of WO2017019765A1 publication Critical patent/WO2017019765A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/527 Global motion vector estimation


Abstract

Systems and methods for identifying a best motion vector are provided. An example method includes the steps of: determining a search range; performing a global search for a global best motion vector; identifying the global best motion vector; performing a first part of a proximity search, wherein the global best motion vector is used as the predicted motion vector; identifying a proximity best motion vector from the first part of the proximity search; performing a second part of the proximity search, based on the proximity best motion vector; and identifying a final best motion vector from the second part of the proximity search.

Description

SYSTEMS AND METHODS FOR VIDEO ENCODING USING AN IMPROVED MOTION VECTOR SEARCH ALGORITHM
This application is being filed on 27 July 2016, as a PCT International patent application, and claims priority to U.S. Provisional Patent Application No.
62/198,251, filed July 29, 2015, the disclosure of which is hereby incorporated by reference herein in its entirety.
Background
High speed, high quality video encoding benefits from the identification of best motion vectors for use during motion estimation. Over the past ten years, encoding workflows such as H.264 AVC, VP8, VP9 and H.265 HEVC have relied on a variation of the Diamond Search Algorithm to identify these best motion vectors. A traditional exhaustive search for motion vectors typically yields the best motion vector and the smallest file size. However, an exhaustive search also takes far longer to process, as it must search the entire frame for the best motion vector. Depending on the frame resolution, it can take minutes to encode one frame. Since a two hour feature video at 24 fps has about 172,800 frames, exhaustive search encoding is simply too time consuming to be acceptable.
A need exists for a faster, more accurate motion vector search algorithm. As described herein, the Fibonacci Search Algorithm provides systems and methods for determining the best motion vector in a more efficient manner.
Summary
Systems and methods for identifying a best motion vector are provided. An example method includes the steps of: determining a search range; performing a global search for a global best motion vector; identifying the global best motion vector; performing a first part of a proximity search, wherein the global best motion vector is used as the predicted motion vector; identifying a proximity best motion vector from the first part of the proximity search; performing a second part of the proximity search, based on the proximity best motion vector; and identifying a final best motion vector from the second part of the proximity search.
Brief Description of the Drawings
The same number represents the same element or same type of element in all drawings.
Figure 1 is an exemplary method for performing high speed searching using the Fibonacci Search Algorithm.
Figure 2 provides a graphical representation of the Fibonacci Search Algorithm Global Search Motion Vector Table.
Figure 3 illustrates one example of a suitable operating environment in which one or more of the present embodiments may be implemented.
Figure 4 is an embodiment of an exemplary network in which the various systems and methods disclosed herein may operate.
Detailed Description
The aspects disclosed herein relate to systems and methods for video encoding using an improved motion vector search algorithm. A video includes one or more frames, where each frame is a single video image representing the video at a given point in time. Combination of these frames creates a video sequence. Between frames in a sequence, the patterns of objects and background images only change minimally. Motion estimation is a process that calculates these changes. Specifically, motion estimation determines motion vectors that predict the transformation of the images from one frame to another.
Typically, motion vectors are determined on a macroblock by macroblock basis. A frame may be divided into one or more macroblocks, where each macroblock corresponds to one or more pixel blocks, depending on the quality parameters of the video. Motion vectors, therefore, describe differences between macroblocks in different video frames. Motion estimation provides a means to compress these motion vectors instead of the frame in its entirety. It is the differences between the frames, not the frames themselves, which are compressed and encoded. In this regard, motion estimation saves time and computational effort by sending encoded motion vectors instead of the fully encoded video frame.
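The block-matching idea described above can be illustrated with a toy sketch. This is not the patented algorithm, only the underlying concept: a motion vector is the displacement (dx, dy) that best maps a macroblock of the current frame onto a region of the reference frame. The frames, block size, and search range below are arbitrary illustrative choices.

```python
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at
    (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size) for x in range(size)
    )

def exhaustive_mv(ref, cur, bx, by, size, search):
    """Check every in-bounds displacement in [-search, search]^2."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx <= w - size and 0 <= by + dy <= h - size):
                continue  # displaced block would fall outside the frame
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best

# A bright 2x2 block that shifted one pixel between ref and cur, so the
# best motion vector for the block at (0, 0) in cur is (1, 0):
ref = [[0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
cur = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 0]]
print(exhaustive_mv(ref, cur, 0, 0, 2, 1))  # (1, 0)
```

This exhaustive variant is exactly the slow baseline the Background criticizes; the sections below replace it with a bounded global search plus a two-step proximity search.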
For motion estimation to operate and succeed, accurate identification of the best motion vector for each macroblock is of the utmost importance. Aspects disclosed herein relate to an improved motion vector search algorithm, referred to herein as the Fibonacci Search Algorithm. As described herein, the Fibonacci Search Algorithm is used by a VP9 encoder to locate the best motion vector for each macroblock during encoding. However, it should be appreciated that the systems and methods described herein are applicable to other encoding workflows such as H.264 AVC, VP8, H.265 HEVC, etc.
Figure 1 is an exemplary method 100 for implementing the Fibonacci Search Algorithm. Flow begins at operation 102 where a search range is determined. A search range is identified to limit the time of processing and to improve the overall speed of best motion vector identification. Determination of an optimal search range is necessary to increase speed without sacrificing quality. Different macroblocks may have different search ranges. These are affected by encoding parameters such as, for example, "--cpu-speed" and "--rt" within the VP9 encoding algorithm. As discussed above, different encoding parameters may be employed by different encoding workflows without departing from the scope of this disclosure. Both the global search and the proximity searches may be constrained to the search range for each macroblock.
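The range-constraining step above can be sketched minimally. The mapping from a speed setting to a range below is a hypothetical illustration (the disclosure does not specify the encoder's actual parameter logic), but the clamping of candidate vectors to a per-macroblock range is as described.

```python
def search_range_for(cpu_speed):
    """Hypothetical mapping: faster speed settings shrink the range,
    with an assumed floor of 8 pixels. Not the encoder's real logic."""
    return max(8, 64 >> cpu_speed)

def clamp_to_range(mv, r):
    """Constrain a candidate motion vector (dx, dy) to [-r, r] per axis,
    as both the global and proximity searches are range-limited."""
    dx, dy = mv
    return (max(-r, min(r, dx)), max(-r, min(r, dy)))

print(search_range_for(0), search_range_for(2))  # 64 16
print(clamp_to_range((100, -3), 16))             # (16, -3)
```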
Flow continues to operation 104 where a global search for the best motion vector is performed. The global search calculates one or more new motion vectors based on a starting prediction motion vector and a Global Search Motion Vector Table. The starting prediction motion vector for the global search may differ depending on encoding parameters of the macroblock, as discussed above. An exemplary Global Search Motion Vector Table is included below, as Table 1.
[Table 1 is reproduced only as an image (imgf000004_0001) in the original publication and is not shown here.]
Table 1 - Global Search Motion Vector Table
Each motion vector in the Global Search Motion Vector Table, as detailed in Table 1, is a point in a two-dimensional plane. Each row of Table 1 corresponds to a complete circle with the center being the starting prediction motion vector. This circle is progressively expanded, as shown in Table 1 and depicted in Figure 2. Each motion vector in the Global Search Motion Vector Table is initially augmented by the starting prediction motion vector. The result of the augmentation creates a new motion vector. The global search sub-algorithm searches and checks all of the new motion vectors within the Global Search Motion Vector Table for the best motion vector. As a result, a best motion vector from the global search is identified.
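The global search just described can be sketched as follows. Table 1 appears only as an image in the publication, so the expanding "circles" below are illustrative stand-ins, not the patented table values; the structure (each ring's offsets augmented by the starting prediction vector, every candidate scored, cheapest wins) follows the text.

```python
GLOBAL_TABLE = [  # one row per progressively larger circle (assumed values)
    [(1, 0), (-1, 0), (0, 1), (0, -1)],
    [(2, 0), (-2, 0), (0, 2), (0, -2), (2, 2), (2, -2), (-2, 2), (-2, -2)],
    [(4, 0), (-4, 0), (0, 4), (0, -4), (4, 4), (4, -4), (-4, 4), (-4, -4)],
]

def global_search(cost, pred_mv, table=GLOBAL_TABLE):
    """Augment every table offset by the starting prediction motion
    vector, score each resulting candidate, and return the best."""
    best, best_cost = pred_mv, cost(pred_mv)
    for ring in table:
        for ox, oy in ring:
            cand = (pred_mv[0] + ox, pred_mv[1] + oy)
            c = cost(cand)
            if c < best_cost:
                best_cost, best = c, cand
    return best

# With a cost minimized at (4, 0), a candidate on the outermost ring wins:
print(global_search(lambda mv: abs(mv[0] - 4) + abs(mv[1]), (0, 0)))  # (4, 0)
```

The result of this call plays the role of the "global best motion vector" that seeds the proximity search below.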
Figure 2 is a graphical representation 200 of the Fibonacci Search Algorithm Global Search Motion Vector Table. The starting prediction motion vector 202 is the central point from which all new motion vectors are calculated. The Fibonacci Search Algorithm global search uses the center of each "circle," as depicted in Figure 2, as the center of the global search range. The global search range starts from the smallest circle and grows.
Returning to Figure 1, flow proceeds to the proximity search. The best motion vector obtained from the global search may be used as a starting prediction motion vector for the proximity search. The proximity search uses a fast diamond search, which comprises two steps. As is understood by persons of skill in the art, a fast diamond search is an algorithm based on the conventional full diamond search algorithm. There is a difference in speed between a full diamond search and a fast diamond search. In a fast diamond search, there is no need to check all eight points within the tables, as will be discussed in further detail below. This is because some of these values have been previously computed and checked in a former fast diamond search. Re-computation of these values may be skipped to increase processing speed.
At operation 106, the first step of the proximity search is performed. The first step of the proximity search may use the following fast diamond search motion vector table, as detailed in Table 2 below.
[Table 2 is reproduced only as an image (imgf000005_0001) in the original publication and is not shown here.]
Table 2 - Fast Diamond Search Motion Vectors for Proximity Search Step 1
The first step of the proximity search iteratively computes the best motion vector until the best motion vector found in an iteration is that iteration's starting prediction motion vector. For every iteration, the algorithm uses the best vector identified thus far as the center of a diamond calculation. Computational iterations are performed until no significant improvement is achieved by additional best motion vector computations.
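A sketch of this first proximity-search step follows. Table 2 is published only as an image, so the small-diamond offsets below are the conventional diamond-search pattern, assumed here for illustration. The "fast" aspect from the text is the cache: vectors scored in an earlier iteration are never rescored.

```python
SMALL_DIAMOND = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # assumed Table 2 pattern

def proximity_step1(cost, start):
    """Re-center the diamond on the best vector found so far; stop when
    the center itself wins, i.e. no neighbor improves on it."""
    seen = {start: cost(start)}          # cache of already-scored vectors
    center = start
    while True:
        best, best_cost = center, seen[center]
        for ox, oy in SMALL_DIAMOND:
            cand = (center[0] + ox, center[1] + oy)
            if cand not in seen:
                seen[cand] = cost(cand)  # skip re-computation on revisits
            if seen[cand] < best_cost:
                best_cost, best = seen[cand], cand
        if best == center:
            return center
        center = best

# Walks from (0, 0) to the cost minimum at (3, -2):
print(proximity_step1(lambda mv: (mv[0] - 3) ** 2 + (mv[1] + 2) ** 2, (0, 0)))
# (3, -2)
```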
Flow then proceeds to operation 108 where the second step of the proximity search is performed. The second step of the proximity search may use the following motion vector table, as detailed below in Table 3, and performs a one-time search.
[Table 3 is reproduced only as an image (imgf000006_0001) in the original publication and is not shown here.]
Table 3 - Fast Diamond Search Motion Vectors for Proximity Search Step 2
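The second step can be sketched similarly. Table 3 is also published only as an image, so the diagonal offsets below are an assumed refinement pattern; what follows the text is the structure: unlike step one, this is a single, non-iterative pass around the step-one result.

```python
STEP2_OFFSETS = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # assumed Table 3 pattern

def proximity_step2(cost, center):
    """One-time search: score the center and each offset exactly once;
    the winner is the final best motion vector."""
    best, best_cost = center, cost(center)
    for ox, oy in STEP2_OFFSETS:
        cand = (center[0] + ox, center[1] + oy)
        c = cost(cand)
        if c < best_cost:
            best_cost, best = c, cand
    return best

# A diagonal neighbor of (0, 0) can still improve on the step-one result:
print(proximity_step2(lambda mv: abs(mv[0] - 1) + abs(mv[1] - 1), (0, 0)))
# (1, 1)
```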
Flow then proceeds to operation 110, where the final best motion vector is obtained. The final best motion vector is the motion vector that is calculated from the second step of the proximity search.
The Fibonacci Search Algorithm is then repeated for the next macroblock to determine the best motion vector for that macroblock.
Instead of using Mean-Squared Error (MSE), the Fibonacci Search Algorithm uses Sum of Absolute Difference (SAD) to compare and search for the best motion vector. SAD requires less processing and still gives good results. In addition, it can be optimized with hardware, as some graphics processing units (GPUs) have hardware-optimized code to calculate SADs.
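The two metrics compare as follows on the same pair of blocks. SAD needs only subtractions and absolute values (no multiplications), which is part of why it maps well onto hardware-optimized GPU routines.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mse(a, b):
    """Mean-squared error: each difference is squared, so it costs a
    multiplication per sample that SAD avoids."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

block_a = [10, 20, 30, 40]
block_b = [12, 20, 27, 40]
print(sad(block_a, block_b))  # 5
print(mse(block_a, block_b))  # 3.25
```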
Having described various embodiments of systems and methods for motion vector searching, this disclosure will now describe an exemplary operating environment that may be used to perform the systems and methods disclosed herein. Figure 3 illustrates one example of a suitable operating environment 300 in which one or more of the present embodiments may be implemented. This is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality. Other well-known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics such as smart phones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. In its most basic configuration, operating environment 300 typically includes at least one processing unit 302 and memory 304. Operating environment 300 may also include one or more graphics processing units (GPUs), such as GPU 318. GPU 318 may be a coprocessor that is designed to handle graphics rendering and simple mathematical calculations. Depending on the exact configuration and type of computing device, memory 304 (storing instructions for the Fibonacci Search Algorithm) may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in Figure 3 by dashed line 306. Further, environment 300 may also include storage devices (removable, 308, and/or non-removable, 310) including, but not limited to, magnetic or optical disks or tape. Similarly, environment 300 may also have input device(s) 314 such as a keyboard, mouse, pen, or voice input, and/or output device(s) 316 such as a display, speakers, or printer. Also included in the environment may be one or more communication connections, 312, such as LAN, WAN, or point-to-point connections. In embodiments, the connections may be operable to facilitate point-to-point communications, connection-oriented communications, connectionless communications, etc.
Operating environment 300 typically includes at least some form of computer readable media. Computer readable media can be any available media that can be accessed by processing unit 302 or other devices comprising the operating environment. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium which can be used to store the desired information. Computer storage media does not include communication media.
Communication media embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, microwave, and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The operating environment 300 may be a single computer operating in a networked environment using logical connections to one or more remote computers. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above as well as others not so mentioned. The logical connections may include any method supported by available communications media. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
Figure 4 is an embodiment of a system 400 in which the various systems and methods disclosed herein may operate. In embodiments, a client device, such as client device 402, may communicate with one or more servers, such as servers 404 and 406, via a network 408. In embodiments, a client device may be a laptop, a personal computer, a smart phone, a PDA, a netbook, a tablet, a phablet, a convertible laptop, a television, or any other type of computing device, such as the computing device illustrated in Figure 3. In embodiments, servers 404 and 406 may be any type of computing device, such as the computing device illustrated in Figure 3. Network 408 may be any type of network capable of facilitating communications between the client device and the one or more servers 404 and 406. Examples of such networks include, but are not limited to, LANs, WANs, cellular networks, WiFi networks, and/or the Internet.
In embodiments, the various systems and methods disclosed herein may be performed by one or more server devices. For example, in one embodiment, a single server, such as server 404, may be employed to perform the systems and methods disclosed herein. Client device 402 may interact with server 404 via network 408 in order to access data or information such as, for example, video data generated based upon the various aspects disclosed herein. In further embodiments, the client device 402 may also perform functionality disclosed herein. In alternate embodiments, the methods and systems disclosed herein may be performed using a distributed computing network or a cloud network. In such embodiments, the methods and systems disclosed herein may be performed by two or more servers, such as servers 404 and 406. In such embodiments, the two or more servers may each perform one or more of the operations described herein. Although a particular network configuration is disclosed herein, one of skill in the art will appreciate that the systems and methods disclosed herein may be performed using other types of networks and/or network configurations.
The embodiments described herein may be employed using software, hardware, or a combination of software and hardware to implement and perform the systems and methods disclosed herein. Although specific devices have been recited throughout the disclosure as performing specific functions, one of skill in the art will appreciate that these devices are provided for illustrative purposes, and other devices may be employed to perform the functionality disclosed herein without departing from the scope of the disclosure.
This disclosure describes some embodiments of the present technology with reference to the accompanying drawings, in which only some of the possible embodiments are shown. Other aspects may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure is thorough and complete and fully conveys the scope of the possible embodiments to those skilled in the art.
Although specific embodiments are described herein, the scope of the technology is not limited to those specific embodiments. One skilled in the art will recognize other embodiments or improvements that are within the scope and spirit of the present technology. Therefore, the specific structure, acts, or media are disclosed only as illustrative embodiments. The scope of the technology is defined by the following claims and any equivalents thereof.

Claims

We claim:
1. A method for identifying a best motion vector, comprising:
determining a search range;
performing a global search for a global best motion vector;
identifying the global best motion vector;
performing a first part of a proximity search, wherein the global best motion vector is used as the predicted motion vector;
identifying a proximity best motion vector from the first part of the proximity search;
performing a second part of the proximity search, based on the proximity best motion vector; and
identifying a final best motion vector from the second part of the proximity search.
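
For illustration only, and not as a characterization of any claimed embodiment, the two-part search recited in claim 1 can be sketched as follows. The SAD cost metric, the coarse grid step, the proximity radii, and the helper names are assumptions introduced for this sketch; the claim itself does not specify them.

```python
import numpy as np

def sad(block, ref, y, x):
    """Sum of absolute differences between a block and a reference patch."""
    h, w = block.shape
    if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
        return np.inf  # candidate falls outside the reference frame
    patch = ref[y:y + h, x:x + w]
    return int(np.abs(block.astype(np.int64) - patch.astype(np.int64)).sum())

def local_search(block, ref, by, bx, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around a center vector."""
    cy, cx = center
    best_cost, best_mv = np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            cost = sad(block, ref, by + dy, bx + dx)
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_cost, best_mv

def find_best_mv(block, ref, by, bx, search_range=8, coarse_step=4):
    # Determine the search range and perform a global search over it on a
    # coarse grid, identifying the global best motion vector.
    best_cost, global_mv = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1, coarse_step):
        for dx in range(-search_range, search_range + 1, coarse_step):
            cost = sad(block, ref, by + dy, bx + dx)
            if cost < best_cost:
                best_cost, global_mv = cost, (dy, dx)
    # First part of the proximity search, with the global best motion
    # vector used as the predicted motion vector.
    _, prox_mv = local_search(block, ref, by, bx, global_mv, coarse_step - 1)
    # Second part of the proximity search, based on the proximity best
    # motion vector, identifying the final best motion vector.
    final_cost, final_mv = local_search(block, ref, by, bx, prox_mv, 1)
    return final_mv, final_cost
```

As a usage example, embedding a distinctive 8x8 pattern in a reference frame at a displacement of (-4, -2) from the block position and running the sketch recovers that vector with a residual cost of zero: the coarse pass lands on the nearest grid point, and the two proximity passes refine it to the exact match.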
PCT/US2016/044252 2015-07-29 2016-07-27 Systems and methods for video encoding using an improved motion vector search algorithm WO2017019765A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562198251P 2015-07-29 2015-07-29
US62/198,251 2015-07-29

Publications (1)

Publication Number Publication Date
WO2017019765A1 (en) 2017-02-02

Family

ID=57885320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/044252 WO2017019765A1 (en) 2015-07-29 2016-07-27 Systems and methods for video encoding using an improved motion vector search algorithm

Country Status (1)

Country Link
WO (1) WO2017019765A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000069481A (en) * 1998-08-21 2000-03-03 Kdd Corp Moving vector detection device
US20080107179A1 (en) * 2005-03-14 2008-05-08 Nilsson Michael E Global Motion Estimation
JP2008177630A (en) * 2007-01-16 2008-07-31 Victor Co Of Japan Ltd Motion vector detector
JP2009033266A (en) * 2007-07-24 2009-02-12 Canon Inc Motion vector search method, motion vector search device, coder and computer program
KR20090078274A * 2008-01-14 2009-07-17 Chonnam National University Industry-Academic Cooperation Foundation Method for searching motion vector and apparatus for the same



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16831289; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16831289; Country of ref document: EP; Kind code of ref document: A1)