WO2012099544A1 - A method, an apparatus and a computer program product for estimating motion between frames of a video sequence - Google Patents

A method, an apparatus and a computer program product for estimating motion between frames of a video sequence

Info

Publication number
WO2012099544A1
WO2012099544A1 (PCT/SG2012/000024)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
motion vector
motion
frames
image portion
Prior art date
Application number
PCT/SG2012/000024
Other languages
French (fr)
Inventor
Wei Siong Lee
Jo Yew Tham
Kwong Huang Goh
Kok Seng Aw
Hai Gao
Original Assignee
Agency For Science, Technology And Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency For Science, Technology And Research filed Critical Agency For Science, Technology And Research
Publication of WO2012099544A1 publication Critical patent/WO2012099544A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding

Definitions

  • Various embodiments relate to a method, an apparatus and a computer program product.
  • the method, the apparatus and the computer program product are for estimating motion between frames of a video sequence.
  • Motion estimation may be understood as a process which attempts to obtain motion vectors that represent the movement of objects between frames. The knowledge of the object motion may be used in motion compensation to achieve compression.
  • motion vectors may be determined by the best match for each macroblock in the current frame with respect to a reference frame. A best match for a particular sized macroblock in the current frame may be found by exhaustively searching in the reference frame over a particular search window. This process may lead to a large number of search points which in turn requires a large number of arithmetic operations to compute the sum of absolute differences (SAD) to find the best match. Implementing this process in software can be very computationally expensive.
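  • As an illustration of why exhaustive search is expensive, the following minimal sketch (Python with NumPy; all function and variable names are illustrative, not taken from the patent) performs full-search block matching with SAD. Every one of the (2·radius+1)² candidate positions costs size² absolute differences per macroblock:

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized pixel blocks.
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def full_search(cur_frame, ref_frame, top, left, size=16, radius=8):
    # Exhaustive block matching: compare the macroblock at (top, left) in
    # the current frame against every candidate window of the reference
    # frame inside the search radius, and return the best motion vector.
    block = cur_frame[top:top + size, left:left + size]
    h, w = ref_frame.shape
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > h or x + size > w:
                continue  # candidate window falls outside the frame
            cost = sad(block, ref_frame[y:y + size, x:x + size])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```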
  • a method for estimating motion between frames of a video sequence may include: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • an apparatus for estimating motion between frames of a video sequence may include: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code being configured, with the at least one processor, to cause the apparatus to perform at least the following: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • a computer program product for estimating motion between frames of a video sequence.
  • the computer program product may include at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions including: program code for determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and program code for determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and program code for determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • Figure 1 illustrates a video sequence according to an embodiment
  • Figure 2 illustrates a video sequence according to an embodiment
  • Figure 3 is a flow diagram relating to Figure 2
  • Figure 4 illustrates a video sequence according to an embodiment
  • Figure 5 is a flow diagram relating to Figure 4;
  • Figure 6 illustrates a video sequence according to an embodiment
  • Figures 7 to 14 illustrate a video sequence according to an embodiment
  • Figure 15 is a schematic diagram of an apparatus according to an embodiment.
  • the method may include: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • the method may further include: determining an inertia motion vector relating to a frame portion of the second frame being determined by the determined motion vector, the inertia motion vector being determined using the determined motion vector, the inertia motion vector representing motion of the frame portion of the second frame to a further frame in the first temporal direction.
  • the inertia motion vector may be similar to the determined motion vector.
  • the motion vector relating to the frame portion of the first frame may be determined by inverting the second intermediate motion vector and by using the first intermediate motion vector and the inverted second intermediate motion vector.
  • the motion vector relating to the frame portion of the first frame may be determined by interpolating the first intermediate motion vector and the inverted second intermediate motion vector.
  • the frame portions may correspond to an image block or image macroblocks of a respective frame.
  • the first frame may be an I frame, which may also be referred to as an Intra frame.
  • an apparatus for estimating motion between frames of a video sequence may include: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code being configured, with the at least one processor, to cause the apparatus to perform at least the following: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • a computer program product for estimating motion between frames of a video sequence.
  • the computer program product may include at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions including: program code for determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and program code for determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and program code for determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
  • the first frame is sequentially after the initialization frame in the video sequence.
  • the initialization frame is temporally later than the first frame in the video sequence.
  • the initialization and first frames are adjacent frames of the video sequence.
  • the second frame is sequentially after the first frame in the video sequence.
  • the first frame is temporally later than the second frame in the video sequence.
  • the first and second frames are adjacent frames of the video sequence.
  • the third frame is sequentially after the second frame in the video sequence.
  • the second frame is temporally later than the third frame in the video sequence.
  • the second and third frames are adjacent frames of the video sequence.
  • the video sequence may be recorded by means of a video camera, for example, or may be received via a transmission medium such as a wireline communication network and/or a wireless communication network, e.g. using Internet protocols such as the Transmission Control Protocol (TCP) / Internet Protocol (IP), although other protocols may be used for the transmission of the video sequence in alternative embodiments.
  • FIG. 1 illustrates an exemplary video sequence 2 including exemplary frames (i.e. images) 4, 6 and 8.
  • frames 4 to 8 are consecutive frames of the video sequence 2.
  • frame 4 is the temporally earliest frame.
  • Each of frames 4 to 8 may include multiple macroblocks, each macroblock including a plurality of blocks (e.g. rectangular blocks), wherein each of the plurality of blocks may include a plurality of pixels (also known as picture elements).
  • the macroblocks are arranged in a grid.
  • each block may include 8 by 8 pixels or 16 by 16 pixels (alternatively, each block may include 8 by 16 or 16 by 8 pixels, although other ways of grouping of pixels into blocks and/or macroblocks may be provided in alternative embodiments).
  • Each macroblock may e.g. include 4 by 4 blocks.
  • the pixels may be arranged in a grid.
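  • As a small illustration of this grouping (a sketch only; the 16-pixel macroblock edge and the 80 by 80 frame size are assumptions for the example, not requirements of the embodiments), a frame stored as a 2-D pixel grid can be sliced into macroblocks by plain indexing:

```python
import numpy as np

MB = 16  # assumed macroblock edge in pixels for this example

def macroblock(frame, row, col, size=MB):
    # Return the (row, col)-th macroblock of a frame stored as a 2-D pixel grid.
    return frame[row * size:(row + 1) * size, col * size:(col + 1) * size]

frame = np.zeros((80, 80), dtype=np.uint8)  # a 5 by 5 grid of 16x16 macroblocks
assert macroblock(frame, 2, 3).shape == (MB, MB)
```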
  • each of frames 4 to 8 may contain an image of a circle 14 and a square 16. In the video sequence 2, the circle 14 moves from left to right and the square 16 simultaneously moves from right to left. The circle 14 is positioned behind the square 16, i.e.
  • a method may be provided which allows fast algorithms to track large displacements of moving objects across intermediate frames, i.e. from a target frame (e.g. frame 8) to the reference frame (e.g. frame 4) via one or more intermediate frames (e.g. frame 6). Accordingly, the quality of a predicted image may be improved.
  • FIG. 2 illustrates a video sequence 20 including four frames 22 to 28.
  • each frame is depicted as a cross-sectional view.
  • each frame may include 5 by 5 macroblocks and each macroblock may include 8 by 8 pixels.
  • Each cross-sectional view in FIG. 2 illustrates a slice through a frame, so a single row of 5 macroblocks can be seen.
  • frames 22 to 28 are consecutive frames of the video sequence 20, and frame 28 is the earliest frame.
  • the process illustrated in FIG. 2 relates to determining a motion vector for macroblock 30 of frame 22 using frame 28 as a reference frame.
  • the motion vector from frame 22 to frame 28 will be determined from the motion vectors between consecutive frames, i.e. frames 22 and 24, frames 24 and 26, and frames 26 and 28.
  • The embodiment of FIG. 2 will now be described in more detail with reference to the flow diagram of FIG. 3.
  • the frame 22 is the initialization frame.
  • the image portion of the initialization frame is macroblock 30 of frame 22.
  • a target image portion of the first frame may be identified.
  • the target image portion is identified by searching the frame 24 for an image portion which corresponds to the image portion of the initialization frame.
  • a motion vector 34 is calculated for the target image portion using the initialization frame as the reference frame. In an embodiment, this motion vector 34 is called an initialization motion vector.
  • a plurality of motion vectors is determined for the identified image portion.
  • the identified image portion is the target image portion, i.e. image portion 32 of frame 24.
  • the macroblocks of the first frame which are at least partly covered by the identified image portion are identified. In the instant case, this means that macroblocks 36 and 38 of frame 24 are identified.
  • a corresponding image portion of frame 26 is identified. In other words, for each macroblock, frame 26 is searched to find an image portion with a pixel content that matches the pixel content of the macroblock.
  • a motion vector is calculated for the image portion using the corresponding macroblock as a reference.
  • the motion vector represents motion of the macroblock's pixel content from frame 24 to frame 26.
  • a motion vector m0 is calculated for macroblock 36 and a motion vector m1 is calculated for macroblock 38.
  • the plurality of motion vectors may include the motion vectors m0 and m1.
  • an interpolated motion vector 40 is generated.
  • the interpolated motion vector 40 may be calculated as the mean, median or mode of the plurality of motion vectors determined in 104.
  • the interpolated motion vector 40 may be the mean and a weighting may be applied to certain ones of the plurality of motion vectors so that they are represented in the mean to a greater or lesser extent.
  • the interpolated motion vector 40 is the mean.
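  • A hedged sketch of this interpolation step (names are illustrative; weighting each vector by, for example, the fraction of the image portion that the corresponding macroblock covers is one possible reading of the weighting variant above, not a requirement):

```python
import numpy as np

def interpolate_motion_vectors(mvs, weights=None, mode="mean"):
    # Combine the motion vectors of the macroblocks covered by an image
    # portion into a single interpolated motion vector (cf. vector 40).
    mvs = np.asarray(mvs, dtype=float)
    if mode == "median":
        return tuple(np.median(mvs, axis=0))
    if weights is not None:  # weighted mean variant described above
        w = np.asarray(weights, dtype=float)
        return tuple((mvs * w[:, None]).sum(axis=0) / w.sum())
    return tuple(mvs.mean(axis=0))  # plain mean

m0, m1 = (2.0, 5.0), (2.0, 3.0)                      # illustrative vectors
print(interpolate_motion_vectors([m0, m1]))          # (2.0, 4.0)
print(interpolate_motion_vectors([m0, m1], [3, 1]))  # (2.0, 4.5)
```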
  • an image portion 42 of the next frame which corresponds to the target image portion is identified.
  • the next frame is frame 26, i.e. the second frame 26.
  • the image portion 42 is identified using the interpolated motion vector 40.
  • the image portion 42 of frame 26 may be identified by starting at the location of the image portion 32 of frame 24 and moving in accordance with the interpolated motion vector 40.
  • a test is performed to identify whether or not any further frames are to be processed. If further frames are present, processing flows back to 104. Alternatively, if no further frames are present, processing flows to 112. In the instant case, one further frame exists, i.e. frame 28; therefore, processing flows back to 104.
  • processing between 104 and 110 is performed in accordance with the above description. This time, however, the identified image portion is the image portion 42 of frame 26, the motion vectors calculated at 104 are m2 and m3, the interpolated motion vector calculated at 106 is motion vector 44, and the corresponding image portion identified at 108 is image portion 46. When processing returns to 110, this time there are no further frames to process so processing flows to 112.
  • a combined motion vector is calculated.
  • the combined motion vector represents the motion of the image portion 30 from frame 22 to frame 28.
  • the content of the image portion 46 corresponds with the content of the image portion 30 and the combined motion vector represents what movement has been applied to the content to move it from its position in frame 22 to its position in frame 28.
  • the image portion 46 may be represented by the image portion 30 together with the combined motion vector. In this way, compression of the video sequence 20 may be achieved.
  • the combined motion vector may be generated based on the motion vector 34, the motion vector 40 and/or the motion vector 44.
  • the combined motion vector may be the sum of the motion vectors 34, 40 and 44.
  • the combined motion vector may additionally include a spatial offset (not shown).
  • motion estimation such as, for example, fast motion estimation, may be performed around the area of frame 28 indicated by motion vector 44 in order to identify an image portion of frame 28 which most closely matches the image portion 30.
  • the spatial offset may be calculated based on a comparison between the image portion of frame 28 indicated by the motion vector 44 and the image portion of frame 28 which best matches the image portion 30.
  • the spatial offset is the spatial difference between the positions of the two compared image portions.
  • the combined motion vector is the sum of the motion vector 34, the motion vector 40, the motion vector 44 and the spatial offset.
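  • A minimal sketch of assembling the combined motion vector (the numeric values are invented purely for illustration; in the embodiment the inputs would be motion vectors 34, 40 and 44 plus the spatial offset from fast motion estimation):

```python
def combine(*vectors):
    # Sum a chain of per-hop motion vectors (and an optional spatial offset)
    # into the combined motion vector from frame 22 to frame 28.
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))

mv34, mv40, mv44 = (0, 3), (0, 4), (0, 3)  # illustrative per-hop vectors
offset = (0, -1)                           # from refinement around motion vector 44
print(combine(mv34, mv40, mv44, offset))   # (0, 9)
```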
  • each intermediate frame (i.e. frames 24 and 26 in the above example) uses surrounding motion vectors in the same temporal direction, which may be referred to as a first direction.
  • the intermediate frames are those frames in-between the initialization frame and the final frame of the video sequence being considered.
  • the initialization frame is frame 22 and the final frame is frame 28 (in the temporal sequence the first frame); therefore, the intermediate frames are frames 24 and 26.
  • In this way, a combined motion vector may be generated which traces motion between a reference frame (e.g. frame 28) and a current frame (e.g. frame 22) that may be far apart, i.e. they may not be consecutive frames but may be separated by one, two, three, four, or even more intermediate frames (e.g. frames 24 and 26).
  • the trace path may be defined by the combined motion vector.
  • motion estimation may be performed in the reference frame at or around the location indicated by the combined motion vector in order to identify an image portion (e.g. image portion 46) of the reference frame having a pixel content matching or corresponding to a pixel content of a particular macroblock (e.g. macroblock 30) of the current frame.
  • the search for the corresponding image portion in the reference frame may be focused, rather than, for example, being an exhaustive search.
  • Accordingly, energy (i.e. power) may be saved and computation may be simplified such that processing may be performed faster and/or using less power.
  • Let M_{t1,t0} denote a set of motion vectors of current frame f(t1) with reference frame f(t0), and let M_{t1,t0}(p) represent a motion vector of a macroblock positioned at p in the current frame f(t1). Strong temporal correlations in the motion vector fields between neighboring frames may allow approximations to be made for:
  • the updating vector function u in equation (3) is a motion vector at a position p′ interpolated from the neighboring motion vectors.
  • equations (3) to (5) may form the core computing steps.
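  • Equations (3) to (5) are not reproduced in this extract, so the following is only a hedged sketch of one plausible reading of the core loop: chain per-hop update vectors, each interpolated from the motion-vector field around the current trace position, to obtain the long-range trace path. All names are illustrative:

```python
def trace_motion_lace(p0, hop_fields):
    # Motion-lacing sketch (illustrative, not the patent's exact equations):
    # hop_fields[i] maps a position in frame i to the interpolated update
    # vector u toward frame i+1, drawn from the precomputed field M_{i+1,i}.
    pos, trace = p0, (0, 0)
    for field in hop_fields:
        u = field(pos)
        trace = (trace[0] + u[0], trace[1] + u[1])  # accumulate trace path
        pos = (pos[0] + u[0], pos[1] + u[1])        # follow the image portion
    return pos, trace
```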
  • the above-described method may be integrated with fast motion estimation methods to improve performance accuracy for long-range motion estimation. These advantages may be achieved by observing that rigid body motions produce continuous motion trajectories spanning a number of frames across time. By exploiting these motion characteristics, the above- described method may help to progressively guide the motion prediction process while locating the 'true' motion vector even across a relatively large temporal distance between the current and reference frames.
  • the H.264/SVC video compression standard uses a hierarchical B-picture structure to achieve temporal scalability.
  • the usage of this structure may make long-range motion estimation inevitable, i.e. the target and reference frames may necessarily be temporally far apart. This may limit the application of motion estimation algorithms, and in particular fast motion estimation algorithms, in H.264/SVC applications.
  • FIG. 4 illustrates a video sequence which corresponds to the video sequence of FIG. 2. Accordingly, elements of FIG. 4 which correspond to elements of FIG. 2 have been given the same reference signs. The following describes the difference between the operation of FIG. 2 and the operation of FIG. 4. FIG. 4 will be described with reference to the flow diagram of FIG. 5.
  • the flow diagram of FIG. 5 corresponds with the flow diagram of FIG. 3.
  • the difference between FIG. 5 and FIG. 3 is e.g. that the block 104 of FIG. 3 has been replaced by a delineated section 104 in FIG. 5.
  • the delineated section 104 may include a number of operations which together provide a functionality which is similar to that of block 104 of FIG. 3.
  • the delineated section 104 may include operations which together determine a plurality of motion vectors of an image portion. The following describes the operations of the delineated section 104 in detail.
  • Operations 100 and 102 were described above with reference to FIG. 3.
  • macroblocks of the first frame (i.e. frame 24) which are covered by the image portion 32 are determined. Accordingly, macroblocks 36 and 38 are determined.
  • a first one of the determined macroblocks is selected. In the instant embodiment, macroblock 36 is selected first.
  • an image portion which corresponds to the pixel content of macroblock 36 is searched for in the next frame (i.e. frame 26). If a corresponding (i.e. matching) image portion is found, processing flows to 206 and then to 208. In the instant case, a corresponding image portion is found in frame 26 and, therefore, processing flows to 206.
  • a motion vector is calculated for the image portion of frame 26 using the macroblock 36 as a reference.
  • the motion vector represents motion of the pixel content of macroblock 36 from frame 24 to frame 26.
  • a motion vector m0 is calculated for macroblock 36.
  • the plurality of motion vectors determined by delineated section 104 may include the motion vector m0. It is noted that this operation is the same as described above with reference to block 104 of FIG. 3.
  • a check is performed to determine if any further macroblocks require processing. If further macroblocks are present, processing flows to 210. If no further macroblocks are present, processing flows to 106 which was described above with reference to FIG. 3. In the instant embodiment, macroblock 38 still requires processing and, therefore, processing flows to 210.
  • the next macroblock is selected, i.e. macroblock 38 is selected.
  • Processing then returns to 204.
  • a matching image portion of frame 26 is searched for.
  • a suitable matching image portion cannot be found in the frame 26.
  • this may be because the pixel content of macroblock 38 is absent in frame 26.
  • the pixel content may be absent because it is occluded in frame 26, such as, for example, by another object as described above with reference to FIG. 1.
  • various image portions of frame 26 having a corresponding size and shape to macroblock 38 are compared to macroblock 38, i.e. their respective pixel contents are compared.
  • a corresponding or matching image portion may be identified when the comparison identifies no difference. Additionally or alternatively, in an embodiment, a corresponding or matching image portion may be identified when the comparison identifies a difference which is below a predefined threshold. In the instant embodiment, the motion vector m1 associated with the best matching image portion of frame 26 is ignored because the best matching image portion of frame 26 is not a close enough match; for example, the difference may be above the predefined threshold.
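  • A sketch of such a threshold test (the sum-of-squared-differences comparison value and the threshold value are assumptions chosen for illustration, not values from the patent):

```python
import numpy as np

def is_close_match(block_a, block_b, threshold=1000):
    # Accept a candidate image portion only if the comparison value (here:
    # sum of squared pixel differences) is below the predefined threshold;
    # otherwise even the best candidate is rejected, as for macroblock 38.
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return int((diff * diff).sum()) < threshold
```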
  • a matching image portion of frame 26 cannot be found and processing flows to 212.
  • a matching image portion is identified in a previous frame.
  • the previous frame is in an opposite temporal direction from frame 24 as frame 26.
  • the previous frame is frame 22 (i.e. the initialization frame). Therefore, an image portion of frame 22 which corresponds to (i.e. matches) image portion 32 of frame 24 is identified.
  • a motion vector is calculated for the image portion of frame 22 using the macroblock 38 as a reference.
  • the motion vector represents motion of the pixel content of macroblock 38 from frame 24 to frame 22. It is noted that this is in the opposite direction to the motion vector generated in 206.
  • a motion vector m1 is calculated for macroblock 38.
  • the motion vector m1 is then used to generate an inertia motion vector -m1.
  • the inertia motion vector -m1 is generated by inverting the motion vector m1, i.e. the motion vector direction is changed to its opposite, as seen more particularly in FIG. 4.
  • In FIG. 5, an interpolated motion vector is generated as described above with reference to FIG. 3.
  • the plurality of motion vectors generated by the delineated section 104 includes the motion vector generated in 206 (i.e. motion vector m0) and the motion vector generated in 214 (i.e. inertia motion vector -m1). Therefore, these motion vectors are processed as described above with reference to FIG. 3 in order to generate the interpolated motion vector 60.
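  • The fallback logic of delineated section 104 might be sketched as follows (a hedged outline; `search` and `matches` stand for the block-matching and threshold-test routines and are assumptions of this example, not names from the patent):

```python
def forward_vector_or_inertia(block, next_frame, prev_frame, search, matches):
    # Try the forward match first (toward frame 26); if it is too poor,
    # match against the previous frame (frame 22) instead and invert the
    # result into an inertia motion vector, e.g. m1 becomes -m1.
    mv, candidate = search(block, next_frame)
    if matches(block, candidate):
        return mv                              # ordinary forward vector (m0)
    mv_back, _ = search(block, prev_frame)     # backward vector (m1)
    return (-mv_back[0], -mv_back[1])          # inertia motion vector (-m1)
```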
  • the interpolated motion vector 60 is used to identify an image portion 62 of frame 26. Additional motion vectors m2 and m3 are generated based on the image portion 62. A further interpolated motion vector 64 is generated based on motion vectors m2 and m3. An image portion 66 of frame 28 is identified using the further interpolated motion vector 64. A combined motion vector is then generated as described above with reference to FIG. 3. It is noted that when FIG. 2 and FIG. 4 are compared, the interpolated trajectory of FIG. 2 may be quite different from the interpolated trajectory of FIG. 4.
  • FIG. 6 illustrates a possible extension to the embodiment of FIG. 4 and FIG. 5.
  • a forked secondary inertia path is generated.
  • motion of the image portion 30 of frame 22 is traced through to an extra frame, i.e. to a frame 80 which follows frame 28.
  • the trajectory may be forked, i.e. two trajectories may be defined.
  • a first trajectory is indicated by a motion vector 82 generated by interpolating motion vectors m4 and m5.
  • the motion vectors 82, m4 and m5 are calculated in a way analogous to that described with reference to FIG. 5.
  • a second trajectory is indicated by a motion vector 84.
  • the motion vector 84 may be generated using the previously interpolated motion vector 64.
  • the motion vector 84 is an extension of the motion vector 64, i.e. the motion vector previously interpolated from m2 and m3. In an embodiment, this second trajectory may be interpreted as the motion inertia path.
  • two possible regions of frame 80 may be identified, one associated with motion vector 84 and the other associated with the motion vector 82.
  • each region is considered in turn to identify the closest matching pixel content to that of image portion 30 of frame 22.
  • motion estimation such as, for example, fast motion estimation, may be performed around each region in order to identify an image portion of frame 80 which most closely matches the image portion 30.
  • a spatial offset may be calculated based on a comparison between the image portion of frame 80 indicated by the respective motion vector 82 or 84 and the image portion of frame 80 which best matches the image portion 30.
  • the spatial offset is the spatial difference between the positions of the two compared image portions.
  • the spatial offset is included in the combined motion vector, such as, for example, as described above with reference to FIG. 2 and FIG. 3.
  • the trajectory may be forked by the generation of a motion inertia path whenever an inertia motion vector is generated.
  • the generation of an inertia motion vector may set a flag and any subsequently generated interpolated motion vectors may be generated together with the motion inertia path.
  • a forked trajectory may be generated each time an inertia motion vector is generated.
  • a plurality of forked trajectories may be generated each time an inertia motion vector is generated, e.g. two, three, four, or even more.
  • a plurality of motion inertia paths may be generated, e.g. two, three, four, or even more.
  • the above operation may enhance performance in scenarios where object occlusions exist in the video.
  • the following provides a description of an embodiment with reference to FIG. 7 to FIG. 14. The following embodiment may be interpreted as a motion braiding embodiment.
  • object occlusions are more likely to occur in the intermediate frames.
  • the trace path in motion lacing (e.g. as described above with reference to FIG. 2 and FIG. 3) can be interrupted by temporary occlusion. In the spatial-temporal vicinity where occlusion occurs, the motion vectors in the pre-computed fields used in motion lacing may not represent true motion.
  • FIG. 7 illustrates a video sequence 500 including five frames: f(0), f(1), f(2), f(3) and f(4).
  • the structure and subject-matter of each frame are analogous to the frames of FIG. 1.
  • frames f(0) to f(4) are consecutive frames of video sequence 500.
  • frame f(0) is the temporally earliest frame.
  • FIG. 7 provides an example of object occlusion posing a problem for motion estimation.
  • there are two horizontally moving objects: a square 16 passing in front of a sphere.
  • the square 16 briefly occludes the sphere in frames f(1) and f(2).
  • When forward motion estimation is performed for f(2) using f(1) as a reference frame, it may not be possible to find a good match for some macroblocks due to the occlusion event.
  • a highlighted macroblock 502 in f(2) may be unable to find a good match (as indicated by the dashed arrows).
  • the motion estimation algorithm may return a motion vector that gives the closest possible match regardless of whether the motion vector represents the true motion or not.
  • FIG. 8 provides an example of how motion estimating in the opposite temporal direction may resolve the object occlusion difficulty.
  • "poor match” may be understood that the comparison value which represents a match, e.g. the sum of the square differences of the respective pixel values (e.g. luminance or chrominance values) of a macroblock and a
  • the corresponding frame portion of another frame is above a predefined threshold value, wherein the predefined threshold value may be predefined by a manufacturer and/or a user.
  • the sets of motion vectors are called inertia motion vectors and are given by:
  • FIG. 9 to FIG. 14 illustrate a motion braiding embodiment that uses the inertia motion vectors with the interpolated motion vector (from motion lacing) to split the lace accordingly in the intermediate frames in order to determine the true motion estimate at the reference frame.
  • FIG. 9 to FIG. 14 combine together in order to motion estimate macroblock b in f(3) using f(0) as reference.
  • a matching reference block r0 in f(2) is determined.
  • the motion of macroblock b to the position of reference block r0 is defined by the motion vector M_{3,2}(b).
  • a matching reference block in f(1) may be found in order to trace a motion path to f(0) eventually. It may be seen from FIG. 9 that r0 in f(2) overlaps macroblocks b0 and b1.
  • the computed motion vectors of b0 and b1 may be interpolated to determine the motion path from f(2) to f(1).
  • the inertia motion vector M_{2,1}(b0) is used since b0 does not have a good match in f(1) due to the object occlusion visible in f(1).
  • Generation of the inertia motion vector may be as described above with reference to FIG. 5.
  • a use_inertia flag may be set since the inertia motion vector has been used.
  • the motion vector M_{2,1}(b1) is used, i.e. an inertia motion vector is not used for b1.
  • Generation of the motion vector may be as described above with reference to FIG. 5.
  • an interpolated motion vector u0 is generated from M_{2,1}(b0) and M_{2,1}(b1).
  • a reference macroblock r1 in f(1) is given by motion vector u0 relative from r0 in f(2).
  • the trace path of macroblock b from f(3) may be M_{3,2}(b) + u0.
  • the reference block r1 of f(1) overlaps macroblocks b2 and b3 of f(1).
  • the motion vectors of macroblocks b2 and b3 are interpolated.
  • a matching reference block may be identifiable in f(0) for each of b2 and b3.
  • the pixel content of each of b2 and b3 may be present (i.e. not occluded) in f(0).
  • a motion vector M_{1,0}(b2) may be used for b2, and a motion vector M_{1,0}(b3) may be used for b3. Generation of these motion vectors may be as described above with reference to FIG. 5.
  • the inertia motion vector u0 may help to assure that the traced motion path does not deviate from true motion due to object occlusion. Since f(0) is the target reference frame for macroblock b, fast motion estimation may be performed in two areas a0 and a1 given by motion vectors u0 and u1 to determine the best matching block for b.
  • the size of the areas aO and al may be fixed or variable. The size of one area may be the same size or a different size as the other area.
  • the best matching block is given by r2 from searching around the area a0 given by motion vector u0 (relative from r1).
  • d denotes the spatial offset between r1 and r2, i.e. the final fast motion estimation result. Therefore the motion vector M_{3,0}(b) may be given by the motion trace path from f(3) to f(0), which is M_{3,2}(b) + u0 + (u0 + d).
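  • With invented numbers purely for illustration, the braided trace-path arithmetic works out as follows:

```python
# Illustrative values only; the real vectors come from the steps above.
M_32 = (0, -2)  # macroblock b -> reference block r0 in f(2)
u0   = (0, -3)  # interpolated update vector, r0 -> r1 in f(1)
d    = (0, 1)   # spatial offset from fast motion estimation around area a0

last_hop = (u0[0] + d[0], u0[1] + d[1])  # u0 + d, taking r1 -> r2 in f(0)
M_30 = tuple(a + b + c for a, b, c in zip(M_32, u0, last_hop))
print(M_30)  # (0, -7): combined vector M_{3,0}(b) = M_{3,2}(b) + u0 + (u0 + d)
```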
  • In this way, a combined motion vector may be generated which traces motion between a reference frame (e.g. frame f(0)) and a current frame (e.g. frame f(3)) that may be far apart, i.e. they may not be consecutive frames but may be separated by two or more intermediate frames (e.g. frames f(1) and f(2)).
  • the trace path may be defined by the combined motion vector (e.g. motion vector M_{3,0}(b)).
  • motion estimation may be performed in the reference frame at or around the location indicated by the combined motion vector in order to identify an image portion (e.g. image portion r2) of the reference frame having a pixel content matching or corresponding to a pixel content of the particular macroblock (e.g. macroblock b) of the current frame.
  • the search for the corresponding image portion in the reference frame may be focused, rather than, for example, being an exhaustive search.
  • Accordingly, energy (i.e. power) may be saved and computation may be simplified such that processing may be performed faster and/or using less power.
  • motion lacing may be interpreted as a special case of motion braiding. Stated differently, operations associated with motion lacing may form a subset of the operations associated with motion braiding.
  • a video sequence including three or four frames is considered. It is to be understood that in some other embodiments, the video sequence may include a greater or lesser number of frames. Furthermore, it is to be understood that when tracing the path of an image portion from an initialization frame to a final frame, any number of intermediate frames may be present, i.e. there may be more than one or two intermediate frames. In such cases, the above-described principles and methods may be applied in an analogous fashion to as described above.
  • frames of a particular size and shape have been considered. It is to be understood that in some other embodiments, the size and shape of a frame may vary from those described above. For example, a frame may include a greater or lesser number of macroblocks than one of the above-described frames. Additionally or alternatively, a frame may have a shape which is different to one of the above-described frames, such as, for example, a rectangular shape, a triangular shape or a hexagonal shape.
  • macroblocks of a particular size and shape have been considered. It is to be understood that in some other embodiments, the size and shape of a macroblock may vary from those described above. For example, a macroblock may include a greater or lesser number of pixels than one of the above-described macroblocks. Additionally or alternatively, a macroblock may have a shape which is different to one of the above-described macroblocks, such as, for example, a rectangular shape, a triangular shape or a hexagonal shape.
  • an image portion of a frame is referred to as covering, including or overlapping various macroblocks of the frame.
  • image portion 32 covers macroblocks 36 and 38.
  • image portion r0 overlaps macroblocks b2 and b3. It is noted that covering, including and overlapping all refer to the same thing.
  • a reference to an image portion having the same size and shape as a macroblock refers to the situation where such an image portion is identified in a frame.
  • the image portion is not necessarily positioned such that it occupies a single macroblock of the frame. Accordingly, the image portion may occupy parts of multiple macroblocks, i.e. it may overlap parts of multiple macroblocks. This can be clearly seen by comparing image portion r0 of FIG. 11 with macroblocks b2 and b3 of FIG. 12.
  • an image portion overlaps only two macroblocks. However, it is to be understood that in some other embodiments, an image portion may overlap more than two macroblocks. For example, in an embodiment, an image portion may overlap four or more macroblocks.
  • FIG. 15 depicts an example computing device 1000 that may be utilized to implement any one of the above-described methods for estimating motion between frames of a video sequence.
  • the computing device 1000 is an apparatus for estimating motion between frames of a video sequence.
  • the following description of computing device 1000 is provided by way of example only and is not intended to be limiting.
  • example computing device 1000 includes a processor 1004 for executing software routines. Although a single processor is shown for the sake of clarity, computing device 1000 may also include a multi-processor system. Processor 1004 is connected to a communication infrastructure 1006 for communication with other components of computing device 1000. Communication infrastructure 1006 may include, for example, a communications bus, cross-bar, or network. Computing device 1000 further includes a main memory 1008, such as a random access memory (RAM), and a secondary memory 1010. Secondary memory 1010 may include, for example, a hard disk drive 1012 and/or a removable storage drive 1014, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like.
  • Removable storage drive 1014 reads from and/or writes to a removable storage unit 1018 in a well-known manner.
  • Removable storage unit 1018 may include a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 1014.
  • removable storage unit 1018 includes a computer readable storage medium having stored therein computer executable program code instructions and/or data.
  • secondary memory 1010 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into computing device 1000.
  • Such means can include, for example, a removable storage unit 1022 and an interface 1020.
  • a removable storage unit 1022 and interface 1020 include a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 1022 and interfaces 1020 which allow software and data to be transferred from the removable storage unit 1022 to computer system 1000.
  • Computing device 1000 also includes at least one communication interface 1024.
  • Communication interface 1024 allows software and data to be transferred between computing device 1000 and external devices via a communication path 1026.
  • communication interface 1024 permits data to be transferred between computing device 1000 and a data communication network, such as a public data or private data communication network.
  • Examples of communication interface 1024 can include a modem, a network interface (such as Ethernet card), a communication port, and the like.
  • Software and data transferred via communication interface 1024 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communication interface 1024. These signals are provided to the communication interface via communication path 1026.
  • computing device 1000 may further include a display interface 1002 which performs operations for rendering images to an associated display 1030 and an audio interface 1032 for performing operations for playing audio content via associated speaker(s) 1034.
  • the term "computer program product” may refer, in part, to removable storage unit 1018, removable storage unit 1022, a hard disk installed in hard disk drive 1012, or a carrier wave carrying software over communication path 1026 (wireless link or cable) to communication interface 1024.
  • a computer readable medium can include magnetic media, optical media, or other recordable media, or media that transmits a carrier wave or other signal.
  • These computer program products are devices for providing software to computer system 1000.
  • Computer programs (also called computer program code) are stored in main memory 1008 and/or secondary memory 1010. Computer programs can also be received via communication interface 1024.
  • Such computer programs when executed, enable the computing device 1000 to perform one or more features of embodiments discussed herein.
  • the computer programs when executed, enable the processor 1004 to perform features of the above-described embodiments. Accordingly, such computer programs represent controllers of the computer system 1000.
  • Software may be stored in a computer program product and loaded into computing device 1000 using removable storage drive 1014, hard disk drive 1012, or interface 1020. Alternatively, the computer program product may be downloaded to computer system 1000 over communications path 1026.
  • the software when executed by the processor 1004, causes the computing device 1000 to perform functions of embodiments described herein.
  • It is to be understood that the embodiment of Figure 15 is presented merely by way of example. Therefore, in some embodiments one or more features of the computing device 1000 may be omitted. Also, in some embodiments, one or more features of the computing device 1000 may be combined together. Additionally, in some embodiments, one or more features of the computing device 1000 may be split into one or more component parts.

Abstract

Various embodiments provide a method for estimating motion between frames of a video sequence, the method including: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.

Description

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM
PRODUCT FOR ESTIMATING MOTION BETWEEN FRAMES OF A VIDEO SEQUENCE
Cross-Reference to Related Application
[0001] This application claims priority from United States of America provisional patent application number 61/461,648 filed 21 January 2011, the content of it being hereby incorporated by reference in its entirety for all purposes.
Technical Field [0002] Various embodiments relate to a method, an apparatus and a computer program product. In various embodiments, the method, the apparatus and the computer program product are for estimating motion between frames of a video sequence.
Background [0003] Individual frames of a video sequence may contain redundant information when successive video frames contain the same static or moving objects. Motion estimation (ME) may be understood as a process which attempts to obtain motion vectors that represent the movement of objects between frames. The knowledge of the object motion may be used in motion compensation to achieve compression. [0004] In block-based video coding, motion vectors may be determined by the best match for each macroblock in the current frame with respect to a reference frame. A best match for a particular sized macroblock in the current frame may be found by exhaustively searching in the reference frame over a particular search window. This process may lead to a large number of search points which in turn requires a large number of arithmetic operations to compute the sum of absolute differences (SAD) to find the best match. Implementing this process in software can be very computationally expensive.
Summary
[0005] In various embodiments, a method for estimating motion between frames of a video sequence is provided. The method may include: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
[0006] In various embodiments, an apparatus for estimating motion between frames of a video sequence is provided. The apparatus may include: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code being configured, with the at least one processor, to cause the apparatus to perform at least the following: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector. [0007] In various embodiments, a computer program product for estimating motion between frames of a video sequence is provided. The computer program product may include at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions including: program code for determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and program code for determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and program code for determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
Brief Description of the Drawings [0008] In the drawings, like reference characters generally refer to like parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of some example embodiments of the invention. In the following description, various example embodiments of the invention are described with reference to the following drawings, in which:
[0009] Figure 1 illustrates a video sequence according to an embodiment; [0010] Figure 2 illustrates a video sequence according to an embodiment; [0011] Figure 3 is a flow diagram relating to Figure 2; [0012] Figure 4 illustrates a video sequence according to an embodiment; [0013] Figure 5 is a flow diagram relating to Figure 4;
[0014] Figure 6 illustrates a video sequence according to an embodiment;
[0015] Figures 7 to 14 illustrate a video sequence according to an embodiment; and
[0016] Figure 15 is a schematic diagram of an apparatus according to an embodiment. Detailed Description
[0017] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. [0018] In various embodiments, a method for estimating motion between frames of a video sequence is provided. The method may include: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector. [0019] In an implementation of various embodiments, the method may further include: determining an inertia motion vector relating to a frame portion of the second frame being determined by the determined motion vector, the inertia motion vector being determined using the determined motion vector, the inertia motion vector representing motion of the frame portion of the second frame to a further frame in the first temporal direction.
[0020] In another implementation of various embodiments, the inertia motion vector may be similar to the determined motion vector. [0021] In yet another implementation of various embodiments, the motion vector relating to the frame portion of the first frame may be determined by inverting the second intermediate motion vector and by using the first intermediate motion vector and the inverted second intermediate motion vector.
[0022] In yet another implementation of various embodiments, the motion vector relating to the frame portion of the first frame may be determined by interpolating the first intermediate motion vector and the inverted second intermediate motion vector.
[0023] In yet another implementation of various embodiments, the frame portions may correspond to an image block or image macroblocks of a respective frame.
[0024] In yet another implementation of various embodiments, the first frame may be an I frame, which may also be referred to as an Intra frame.
[0025] In various embodiments, an apparatus for estimating motion between frames of a video sequence is provided. The apparatus may include: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code being configured, with the at least one processor, to cause the apparatus to perform at least the following: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
[0026] In various embodiments, a computer program product for estimating motion between frames of a video sequence is provided. The computer program product may include at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions including: program code for determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and program code for determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and program code for determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
[0027] In an embodiment, the first frame is sequentially after the initialization frame in the video sequence. In an embodiment, the initialization frame is temporally later than the first frame in the video sequence. In an embodiment, the initialization and first frames are adjacent frames of the video sequence. In an embodiment, the second frame is sequentially after the first frame in the video sequence. In an embodiment, the first frame is temporally later than the second frame in the video sequence. In an embodiment, the first and second frames are adjacent frames of the video sequence. In an embodiment, the third frame is sequentially after the second frame in the video sequence. In an embodiment, the second frame is temporally later than the third frame in the video sequence. In an embodiment, the second and third frames are adjacent frames of the video sequence.
[0028] In an embodiment, the video sequence may be recorded by means of a video camera, for example, or may be received via a transmission medium such as a wireline communication network and/or a wireless communication network, e.g. using Internet protocols such as the Transmission Control Protocol (TCP) / Internet Protocol (IP), although other protocols may be used for the transmission of the video sequence in alternative embodiments.
[0029] FIG. 1 illustrates an exemplary video sequence 2 including exemplary frames (i.e. images) 4, 6 and 8. In an embodiment, frames 4 to 8 are consecutive frames of the video sequence 2. In an embodiment, frame 4 is the temporally earliest frame.

[0030] Each of frames 4 to 8 may include multiple macroblocks, each macroblock including a plurality of blocks (e.g. rectangular blocks), wherein each of the plurality of blocks may include a plurality of pixels (also known as picture elements). In an embodiment, the macroblocks are arranged in a grid. In an embodiment, each block may include 8 by 8 pixels or 16 by 16 pixels (alternatively, each block may include 8 by 16 or 16 by 8 pixels, although other ways of grouping pixels into blocks and/or macroblocks may be provided in alternative embodiments). Each macroblock may e.g. include 4 by 4 blocks. Thus, in an embodiment, the pixels may be arranged in a grid.

[0031] As seen more particularly in FIG. 1, each of frames 4 to 8 may contain an image of a circle 14 and a square 16. In the video sequence 2, the circle 14 moves from left to right and the square 16 simultaneously moves from right to left. The circle 14 is positioned behind the square 16, i.e. when the video sequence was captured by a camera the square 16 was positioned closer to the camera than the circle 14 (this situation may also be referred to as occlusion). Accordingly, as seen more particularly in frames 6 and 8, when the circle 14 and square 16 overlap, the square 16 obstructs the circle 14, i.e. the square 16 occludes the circle 14.
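By way of illustration only, the grouping of pixels into macroblocks described above may be sketched as follows. This is a minimal sketch, not part of the disclosed method; the frame size, the macroblock size and the function name are assumptions chosen for the example.

```python
import numpy as np

def split_into_macroblocks(frame, mb_size=16):
    """Split a frame (an H x W array of pixel values) into a grid of
    mb_size x mb_size macroblocks, keyed by (row, col) grid position.
    Assumes H and W are multiples of mb_size."""
    h, w = frame.shape
    return {(r // mb_size, c // mb_size): frame[r:r + mb_size, c:c + mb_size]
            for r in range(0, h, mb_size)
            for c in range(0, w, mb_size)}

# Example: a 48 x 48 frame yields a 3 x 3 grid of 16 x 16 macroblocks.
frame = np.random.randint(0, 256, size=(48, 48), dtype=np.uint8)
macroblocks = split_into_macroblocks(frame)
assert len(macroblocks) == 9
```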
[0032] When motion estimation is performed for frame 8 whilst using frame 6 as a reference frame, it may not be possible to find a good match for some macroblocks (or e.g. blocks) due to the occlusion event. In an embodiment, it may not be possible to find an image portion of frame 6 which is a good match for macroblock 10 of frame 8. Specifically, the macroblock 10 mainly contains a portion of the circle 14, and that portion of the circle 14 is occluded (i.e. absent) in frame 6 because it is hidden behind the square 16. However, it may be possible to identify an image portion 12 of frame 4 which is a good match for macroblock 10 of frame 8 because the square 16 has completely moved past the circle 14 in frame 4.
[0033] The following provides a description of an embodiment with reference to FIG. 2 and FIG. 3. In an embodiment, the following methods may be interpreted as motion lacing embodiments.

[0034] In an embodiment, a method may be provided which allows fast algorithms to track large displacements of moving objects across intermediate frames, i.e. from a target frame (e.g. frame 8) to the reference frame (e.g. frame 4) via one or more intermediate frames (e.g. frame 6). Accordingly, the quality of a predicted image may be improved.
[0035] FIG. 2 illustrates a video sequence 20 including four frames 22 to 28. In FIG. 2, each frame is depicted as a cross-sectional view. In an implementation, each frame may include 5 by 5 macroblocks and each macroblock may include 8 by 8 pixels. Each cross-sectional view in FIG. 2 illustrates a slice through a frame, so a single row of 5 macroblocks can be seen. In an implementation, frames 22 to 28 are consecutive frames of the video sequence 20, and frame 28 is the earliest frame. The process illustrated in FIG. 2 relates to determining a motion vector for macroblock 30 of frame 22 using frame 28 as a reference frame. In summary, the motion vector from frame 22 to frame 28 will be determined from the motion vectors between consecutive frames, i.e. frames 22 and 24, frames 24 and 26, and frames 26 and 28.
[0036] The embodiment of FIG. 2 will now be described in more detail with reference to the flow diagram of FIG. 3.
[0037] At 100, an image portion of the initialization frame (of time t = 3) is identified. In an embodiment, the frame 22 is the initialization frame. In an embodiment, the image portion of the initialization frame is macroblock 30 of frame 22. At 102, a target image portion of the first frame may be identified. In an implementation, the first frame may be frame 24 (of time t = 2) and the target image portion may be the image portion 32 of frame 24. In an embodiment, the target image portion is identified by searching the frame 24 for an image portion which corresponds to the image portion of the initialization frame. In an embodiment, once the target image portion is identified, a motion vector 34 is calculated for the target image portion using the initialization frame as the reference frame. In an embodiment, this motion vector 34 is called an initialization motion vector.

[0038] At 104, a plurality of motion vectors is determined for the identified image portion. In the instant case, the identified image portion is the target image portion, i.e. image portion 32 of frame 24. Specifically, the macroblocks of the first frame which are at least partly covered by the identified image portion are identified. In the instant case, this means that macroblocks 36 and 38 of frame 24 are identified. Next, for each identified macroblock (i.e. for each of macroblocks 36 and 38), a corresponding image portion of frame 26 (of time t = 1) is identified. In other words, for each macroblock, frame 26 is searched to find an image portion with a pixel content that matches the pixel content of the macroblock. Once a corresponding image portion has been identified, a motion vector is calculated for the image portion using the corresponding macroblock as a reference. In other words, the motion vector represents motion of the macroblock's pixel content from frame 24 to frame 26. Accordingly, a motion vector m0 is calculated for macroblock 36 and a motion vector m1 is calculated for macroblock 38. In an embodiment, the plurality of motion vectors may include the motion vectors m0 and m1.
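By way of illustration only, the per-macroblock search described at 104 may be sketched as follows, using the sum of absolute differences (SAD) mentioned in the background as the matching criterion. This is a minimal sketch; the search-window radius and the function names are assumptions, and the embodiments above do not prescribe a particular matching criterion or window size.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized pixel arrays."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def best_match_mv(block, ref_frame, top_left, search_radius=8):
    """Search ref_frame around top_left for the position whose pixel content
    best matches `block`; return the motion vector (dy, dx) and its cost."""
    h, w = block.shape
    y0, x0 = top_left
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate window lies outside the reference frame
            cost = sad(block, ref_frame[y:y + h, x:x + w])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

In this sketch, calling best_match_mv once per identified macroblock (e.g. for macroblocks 36 and 38) yields the plurality of motion vectors (m0 and m1).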
[0039] At 106, an interpolated motion vector 40 is generated. In an embodiment, the interpolated motion vector 40 may be calculated as the mean, median or mode of the plurality of motion vectors determined in 104. In some other embodiments, the interpolated motion vector 40 may be a weighted mean, with weights applied to certain ones of the plurality of motion vectors so that they are represented in the mean to a greater or lesser extent. In the instant embodiment, the interpolated motion vector 40 is the mean.
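A minimal sketch of this interpolation step is given below; the function name and the example vector values are assumptions, and only the mean, weighted mean and median variants mentioned above are shown.

```python
import numpy as np

def interpolate_mv(motion_vectors, weights=None, mode="mean"):
    """Combine a plurality of motion vectors (e.g. m0 and m1) into a single
    interpolated motion vector, as in block 106."""
    mvs = np.asarray(motion_vectors, dtype=np.float64)
    if mode == "mean":
        return np.average(mvs, axis=0, weights=weights)  # weights optional
    if mode == "median":
        return np.median(mvs, axis=0)
    raise ValueError("unsupported mode")

# Illustrative values only:
m0, m1 = (0.0, -3.0), (0.0, -5.0)
print(interpolate_mv([m0, m1]))                  # plain mean -> [ 0. -4.]
print(interpolate_mv([m0, m1], weights=[3, 1]))  # weighted mean -> [ 0.  -3.5]
```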
[0040] At 108, an image portion 42 of the next frame which corresponds to the target image portion is identified. In the instant case, the next frame is frame 26, i.e. the second frame 26. In an embodiment, the image portion 42 is identified using the interpolated motion vector 40. For example, the image portion 42 of frame 26 may be identified by starting at the location of the image portion 32 of frame 24 and moving in accordance with the interpolated motion vector 40.
[0041] At 110, a test is performed to identify whether or not any further frames are to be processed. If further frames are present, processing flows back to 104. Alternatively, if no further frames are present, processing flows to 112. In the instant case, one further frame exists, i.e. frame 28; therefore, processing flows back to 104.
[0042] Processing between 104 and 110 is performed in accordance with the above description. This time, however, the identified image portion is the image portion 42 of frame 26, the motion vectors calculated at 104 are m2 and m3, the interpolated motion vector calculated at 106 is motion vector 44, and the corresponding image portion identified at 108 is image portion 46. When processing returns to 110, this time there are no further frames to process so processing flows to 112.
[0043] At 112, a combined motion vector is calculated. In an embodiment, the combined motion vector represents the motion of the image portion 30 from frame 22 to frame 28. In other words, the content of the image portion 46 corresponds with the content of the image portion 30 and the combined motion vector represents what movement has been applied to the content to move it from its position in frame 22 to its position in frame 28. Accordingly, the image portion 46 may be represented by the image portion 30 together with the combined motion vector. In this way, compression of the video sequence 20 may be achieved.
[0044] In an embodiment, the combined motion vector may be generated based on the motion vector 34, the motion vector 40 and/or the motion vector 44. In an embodiment, the combined motion vector may be the sum of the motion vectors 34, 40 and 44. In an embodiment, the combined motion vector may additionally include a spatial offset (not shown). Specifically, motion estimation, such as, for example, fast motion estimation, may be performed around the area of frame 28 indicated by motion vector 44 in order to identify an image portion of frame 28 which most closely matches the image portion 30. The spatial offset may be calculated based on a comparison between the image portion of frame 28 indicated by the motion vector 44 and the image portion of frame 28 which best matches the image portion 30. In an embodiment, the spatial offset is the spatial difference between the positions of the two compared image portions.

[0045] In the instant embodiment, the combined motion vector is the sum of the motion vector 34, the motion vector 40, the motion vector 44 and the spatial offset.
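By way of illustration only, the combination step at 112 may be sketched as follows; the vector values are illustrative and the function name is an assumption.

```python
import numpy as np

def combined_motion_vector(initialization_mv, interpolated_mvs, spatial_offset=(0, 0)):
    """Sum the initialization motion vector (34), the interpolated motion
    vectors (40, 44) and an optional spatial offset from a final refinement
    search, as described at 112."""
    total = np.asarray(initialization_mv, dtype=np.float64)
    for mv in interpolated_mvs:
        total = total + np.asarray(mv, dtype=np.float64)
    return total + np.asarray(spatial_offset, dtype=np.float64)

# Illustrative values only:
mv34, mv40, mv44, offset = (0, -2), (0, -4), (0, -3), (0, 1)
print(combined_motion_vector(mv34, [mv40, mv44], offset))  # -> [ 0. -8.]
```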
[0046] According to the above-described embodiment, each intermediate frame (i.e. frames 24 and 26 in the above example) uses surrounding motion vectors in the same temporal direction (which may be referred to as a first direction) to interpolate a new motion vector to the next intermediate or reference frame. It is noted that the intermediate frames are those frames in-between the initialization frame and the final frame of the video sequence being considered. For example, considering the embodiment of FIG. 2, the initialization frame is frame 22 and the final frame is frame 28 (in the temporal sequence the first frame), therefore, the intermediate frames are frames 24 and 26.
[0047] According to the above-described embodiment, it is possible to obtain a combined motion vector which traces motion between a reference frame (e.g. frame 28) and a current frame (e.g. frame 22) that may be far apart, i.e. they may not be consecutive frames. In an embodiment, one, two, three, four, or even more intermediate frames (e.g. frames 24 and 26) may exist between the reference and current frames. Specifically, it is the motion of a pixel content of a macroblock (e.g. macroblock 30) of the current frame that may be traced back to the reference frame. The trace path may be defined by the combined motion vector. According to this operation, motion estimation may be performed in the reference frame at or around the location indicated by the combined motion vector in order to identify an image portion (e.g. image portion 46) of the reference frame having a pixel content matching or corresponding to a pixel content of a particular macroblock (e.g. macroblock 30) of the current frame. Accordingly, the search for the corresponding image portion in the reference frame may be focused, rather than, for example, being an exhaustive search. In this way, fewer computational operations may be performed and energy (i.e. power) may be conserved. It is therefore an advantage of the above-described embodiment that computation may be simplified such that processing may be performed faster and/or using less power.

[0048] The following provides a description of an embodiment. This embodiment corresponds with the above-described embodiment. The following embodiment may be interpreted as a motion lacing embodiment. The following description is set out in mathematical terms.
[0049] Let M_{t1,t0} denote the set of motion vectors of a current frame f(t1) with reference frame f(t0), and let M_{t1,t0}(p) represent the motion vector of a macroblock positioned at p in the current frame f(t1). Strong temporal correlations in the motion vector fields between neighboring frames may allow the approximation:

    M_{t,t-2}(p) ≈ M_{t,t-1}(p) + M_{t-1,t-2}(p + M_{t,t-1}(p))    (1)

[0050] In general, for (t1 - t0) > 0, the approximation may be given by:

    M_{t1,t0}(p) ≈ m_{t1-t0-1}    (2)

using the following iterations:

    m_j = m_{j-1} + u(p_j)    (3)
    p_j = p + m_{j-1}    (4)

with the initial condition:

    m_0 = M_{t1,t1-s}(p)    (5)

and s = sgn(t1 - t0).
[0051] The updating vector function u in equation (3) is a motion vector at p_j interpolated from the neighboring motion vectors. In an embodiment, equations (3) to (5) may form the core computing steps.

[0052] As was the case with the previous embodiment, the above-described method may be integrated with fast motion estimation methods to improve performance accuracy for long-range motion estimation. These advantages may be achieved by observing that rigid body motions produce continuous motion trajectories spanning a number of frames across time. By exploiting these motion characteristics, the above-described method may help to progressively guide the motion prediction process while locating the 'true' motion vector even across a relatively large temporal distance between the current and reference frames.
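A minimal sketch of the iteration in equations (2) to (5) is given below for the forward case s = 1. Representing the pre-computed motion fields as callables evaluated at the traced position stands in for the updating function u, whose interpolation scheme the embodiments above leave open; all names are assumptions.

```python
import numpy as np

def lace_motion_vector(p, motion_fields):
    """Approximate M_{t1,t0}(p) by accumulating interpolated motion vectors
    across pre-computed per-frame-pair motion fields (equations (2) to (5)).

    motion_fields[j](q) returns the motion vector of the j-th frame pair at
    position q; it plays the role of the updating function u of equation (3).
    """
    p = np.asarray(p, dtype=np.float64)
    m = np.asarray(motion_fields[0](p), dtype=np.float64)  # m_0, eq. (5)
    for j in range(1, len(motion_fields)):
        p_j = p + m                                        # eq. (4)
        m = m + np.asarray(motion_fields[j](p_j))          # eq. (3)
    return m                                               # m_{t1-t0-1}, eq. (2)

# Toy example: three frame pairs, each moving content by (0, -2) everywhere,
# so tracing across three steps accumulates (0, -6).
fields = [lambda q: (0.0, -2.0)] * 3
print(lace_motion_vector((8.0, 8.0), fields))  # -> [ 0. -6.]
```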
[0053] The following provides a description of an embodiment with reference to FIG. 4 and FIG. 5. The following embodiment may be interpreted as a motion braiding embodiment.
[0054] As described above with reference to FIG. 1, accurate motion estimation may be difficult when target and reference frames are temporally far apart. Specifically, as frames become farther apart, the likelihood of large object translations (i.e. movements) and of occlusions increases. In turn, this can have an adverse effect on the performance of motion estimation algorithms by causing inaccurate, low-quality predicted frames, thus potentially limiting the application of these algorithms. This can also have a negative effect on video coding efficiency. Accordingly, fast motion estimation algorithms may be unable to provide good-quality predicted frames in such a scenario.
[0055] For example, the H.264/SVC video compression standard uses a hierarchical B-picture structure to achieve temporal scalability. The usage of this structure may make long-range motion estimation inevitable, i.e. the target and reference frames may necessarily be temporally far apart. This may limit the application of motion estimation algorithms, and in particular fast motion estimation algorithms, in H.264/SVC applications.
[0056] Therefore, in summary, when the temporal distance between target and reference frames is large, incidents of object occlusion are likely. Object occlusion can disrupt motion trajectories, and this can cause inaccurate motion estimation. This may be a common issue in motion tracking.
[0057] However, in accordance with various embodiments, it may be assumed that object occlusion (for the same object) is unlikely to occur in both temporal directions. The following embodiment uses motion information in opposite temporal directions to continue tracing the motion trajectory.
[0058] FIG. 4 illustrates a video sequence which corresponds to the video sequence of FIG. 2. Accordingly, elements of FIG. 4 which correspond to elements of FIG. 2 have been given the same reference signs. The following describes the difference between the operation of FIG. 2 and the operation of FIG. 4. FIG. 4 will be described with reference to the flow diagram of FIG. 5.
[0059] The flow diagram of FIG. 5 corresponds with the flow diagram of FIG. 3. One difference between FIG. 5 and FIG. 3 is that the block 104 of FIG. 3 has been replaced by a delineated section 104 in FIG. 5. The delineated section 104 may include a number of operations which together provide a functionality which is similar to that of block 104 of FIG. 3. In other words, the delineated section 104 may include operations which together determine a plurality of motion vectors of an image portion. The following describes the operations of the delineated section 104 in detail.
[0060] Operations 100 and 102 were described above with reference to FIG. 3. At 200, macroblocks of the first frame (i.e. frame 24) which are covered by the image portion 32 are determined. Accordingly, macroblocks 36 and 38 are determined. At 202, a first one of the determined macroblocks is selected. In the instant embodiment, macroblock 36 is selected first. At 204, an image portion which corresponds to the pixel content of macroblock 36 is searched for in the next frame (i.e. frame 26). If a corresponding (i.e. matching) image portion is found, processing flows to 206 and then to 208. In the instant case, a corresponding image portion is found in frame 26 and, therefore, processing flows to 206. At 206, a motion vector is calculated for the image portion of frame 26 using the macroblock 36 as a reference. In other words, the motion vector represents motion of the pixel content of macroblock 36 from frame 24 to frame 26. Accordingly, a motion vector m0 is calculated for macroblock 36. In an embodiment, the plurality of motion vectors determined by delineated section 104 may include the motion vector m0. It is noted that this operation is the same as described above with reference to block 104 of FIG. 3.
[0061] At 208, a check is performed to determine if any further macroblocks require processing. If further macroblocks are present, processing flows to 210. If no further macroblocks are present, processing flows to 106 which was described above with reference to FIG. 3. In the instant embodiment, macroblock 38 still requires processing and, therefore, processing flows to 210.
[0062] At 210, the next macroblock is selected, i.e. macroblock 38 is selected. Processing then returns to 204. As before, at 204, a matching image portion of frame 26 is searched for. However, this time a suitable matching image portion cannot be found in the frame 26. For example, this may be because the pixel content of macroblock 38 is absent in frame 26. For example, the pixel content may be absent because it is occluded in frame 26, such as, for example, by another object as described above with reference to FIG. 1. In an embodiment, various image portions of frame 26 having a corresponding size and shape to macroblock 38 are compared to macroblock 38, i.e. their respective pixel contents are compared. In an embodiment, a corresponding or matching image portion may be identified when the comparison identifies no difference. Additionally or alternatively, in an embodiment, a corresponding or matching image portion may be identified when the comparison identifies a difference which is below a predefined threshold. In the instant embodiment, the motion vector m1 associated with the best matching image portion of frame 26 is ignored because that image portion is not a close enough match; for example, the difference may be above the predefined threshold.
[0063] As mentioned above, in the instant embodiment, a matching image portion of frame 26 cannot be found and processing flows to 212. At 212, a matching image portion is identified in a previous frame. In an embodiment, the previous frame is in an opposite temporal direction from frame 24 as frame 26. In the instant embodiment, the previous frame is frame 22 (i.e. the initialization frame). Therefore, an image portion of frame 22 which corresponds to (i.e. matches) image portion 32 of frame 24 is identified.
[0064] At 214, in a way analogous to 206, a motion vector is calculated for the image portion of frame 22 using the macroblock 38 as a reference. In other words, the motion vector represents motion of the pixel content of macroblock 38 from frame 24 to frame 22. It is noted that this is in the opposite direction to the motion vector generated in 206. Accordingly, a motion vector m1 is calculated for macroblock 38. The motion vector m1 is then used to generate an inertia motion vector -m1. In an embodiment, the inertia motion vector -m1 is generated by inverting the motion vector m1, i.e. the motion vector direction is changed to its opposite, as seen more particularly in FIG. 4. It is to be understood that in some other embodiments, further processing may be performed on the motion vector m1 in order to generate the inertia motion vector -m1.

[0065] Once the inertia motion vector has been generated, processing flows to 208, which is described above. Once no further macroblocks require processing, the processing flow of FIG. 5 flows to 106. At 106, an interpolated motion vector is generated as described above with reference to FIG. 3. However, in the instant embodiment, the plurality of motion vectors generated by the delineated section 104 includes the motion vector generated in 206 (i.e. motion vector m0) and the motion vector generated in 214 (i.e. inertia motion vector -m1). Therefore, these motion vectors are processed as described above with reference to FIG. 3 in order to generate the interpolated motion vector 60.
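A minimal sketch of the selection between the forward motion vector and the inertia motion vector is given below; the threshold value, the cost convention (lower is better) and the function names are assumptions.

```python
import numpy as np

def inertia_motion_vector(backward_mv):
    """Generate the inertia motion vector -m1 by inverting a motion vector m1
    estimated in the opposite temporal direction (block 214)."""
    return -np.asarray(backward_mv, dtype=np.float64)

def select_mv(forward_mv, forward_cost, backward_mv, threshold):
    """Use the forward motion vector when its match is good enough; otherwise
    fall back to the inverted backward vector and report use of inertia."""
    if forward_cost <= threshold:
        return np.asarray(forward_mv, dtype=np.float64), False
    return inertia_motion_vector(backward_mv), True  # use_inertia flag

mv, used_inertia = select_mv((0, -4), forward_cost=9000, backward_mv=(0, 5), threshold=2048)
print(mv, used_inertia)  # -> [ 0. -5.] True
```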
[0066] Processing from 108 to 112 is also as described above with reference to FIG. 3. Specifically, the interpolated motion vector 60 is used to identify an image portion 62 of frame 26. Additional motion vectors m2 and m3 are generated based on the image portion 62. A further interpolated motion vector 64 is generated based on motion vectors m2 and m3. An image portion 66 of frame 28 is identified using the further interpolated motion vector 64. A combined motion vector is then generated as described above with reference to FIG. 3. It is noted that when FIG. 2 and FIG. 4 are compared, the interpolated trajectory of FIG. 2 may be quite different from the interpolated trajectory of FIG. 4.
[0067] FIG. 6 illustrates a possible extension to the embodiment of FIG. 4 and FIG. 5. In the embodiment of FIG. 6, a forked secondary inertia path is generated. In the embodiment of FIG. 6, motion of the image portion 30 of frame 22 is traced through to an extra frame, i.e. to a frame 80 which follows frame 28.
[0068] As described above with reference to FIG. 4, it may be the case that because of object occlusion in frame 26 the motion vector m1 of FIG. 2 is replaced by the inertia motion vector -m1 of FIG. 4. However, it may not be possible to predict the duration of the object occlusion. Accordingly, it may be the case that the object occlusion persists in frame 28, i.e. object occlusion occurs in both frames 26 and 28. In this case, motion vectors generated based on frame 28 may not be good candidates for interpolation because they may not accurately reflect motion of the pixel content of the image portion 30.
[0069] In view of the above, the trajectory may be forked, i.e. two trajectories may be defined. A first trajectory is indicated by a motion vector 82 generated by interpolating motion vectors m4 and m5. In an embodiment, the motion vectors 82, m4 and m5 are calculated in a way analogous to that described with reference to FIG. 5. A second trajectory is indicated by a motion vector 84. The motion vector 84 may be generated using the previously interpolated motion vector 64. In an embodiment, the motion vector 84 is an extension of the motion vector 64, i.e. the motion vector previously interpolated from m2 and m3. In an embodiment, this second trajectory may be interpreted as the motion inertia path.
[0070] According to the above operation, two possible regions of frame 80 may be identified, one associated with the motion vector 84 and the other associated with the motion vector 82. In an embodiment, each region is considered in turn to identify the closest matching pixel content to that of image portion 30 of frame 22. In an embodiment, motion estimation, such as, for example, fast motion estimation, may be performed around each region in order to identify an image portion of frame 80 which most closely matches the image portion 30. A spatial offset may be calculated based on a comparison between the image portion of frame 80 indicated by the respective motion vector and the image portion of frame 80 which best matches the image portion 30. In an embodiment, the spatial offset is the spatial difference between the positions of the two compared image portions. In an embodiment, the spatial offset is included in the combined motion vector, such as, for example, as described above with reference to FIG. 2 and FIG. 3.
[0071] In an embodiment, the trajectory may be forked by the generation of a motion inertia path whenever an inertia motion vector is generated. In an embodiment, the generation of an inertia motion vector may set a flag and any subsequently generated interpolated motion vectors may be generated together with the motion inertia path. Accordingly, a forked trajectory may be generated each time an inertia motion vector is generated. It is to be noted that in alternative embodiments, a plurality of forked trajectories may be generated each time an inertia motion vector is generated, e.g. two, three, four, or even more. In various embodiments, a plurality of motion inertia paths may be generated, e.g. two, three, four, or even more.
[0072] According to the above operation, inaccurate spatial matching due to occlusion is ignored and the resulting motion trajectory is forked. Also, a secondary trajectory is established to trace the object based on its inertia. At each intermediate frame, surrounding motion vectors are used in either temporal direction, depending on the estimation quality of each motion vector, to interpolate a new motion vector to the next intermediate or reference frame.
[0073] The above operation may enhance performance in scenarios where object occlusions exist in the video.

[0074] The following provides a description of an embodiment with reference to FIG. 7 to FIG. 14. The following embodiment may be interpreted as a motion braiding embodiment.

[0075] As mentioned above, as the temporal distance between the reference and current frame increases, object occlusions are more likely to occur in the intermediate frames. The trace path in motion lacing (e.g. as described above with reference to FIG. 2 and FIG. 3) can be interrupted by temporary occlusion. In the spatial-temporal vicinity where occlusion occurs, the motion vectors in the pre-computed fields used in motion lacing may not represent true motion. Consequently, during lacing through the intermediate frames, the interpolated motion vector may deviate from the 'ideal' motion path and yield a false motion match in the reference frame. This can be seen more particularly in FIG. 7.

[0076] FIG. 7 illustrates a video sequence 500 including five frames f(0), f(1), f(2), f(3) and f(4). The structure and subject-matter of each frame are analogous to the frames of FIG. 1. In an embodiment, frames f(0) to f(4) are consecutive frames of video sequence 500. In an embodiment, frame f(0) is the temporally earliest frame.
[0077] FIG. 7 provides an example of object occlusion posing a problem for motion estimation. In this illustration, there are two horizontally moving objects: a square 16 passing in front of a sphere. The square 16 briefly occludes the sphere in frames f(1) and f(2). When forward motion estimation is performed for f(2) using f(1) as a reference frame, it may not be possible to find a good match for some macroblocks due to the occlusion event. For instance, a highlighted macroblock 502 in f(2) may be unable to find a good match (as indicated by the dashed arrows). In practice, the motion estimation algorithm may return a motion vector that gives the closest possible match regardless of whether the motion vector represents the true motion or not.
[0078] FIG. 8 provides an example of how motion estimating in the opposite temporal direction may resolve the object occlusion difficulty. By searching in the opposite temporal direction, that is, performing backward motion estimation using f(3) as the reference frame, the highlighted macroblock 502 in f(2) is able to find a good match in f(3), i.e. image portion 504, when it is unable to do so with reference f(1).
[0079] The assumption of motion continuity of a rigid body implies that, if a particular macroblock content in frame f(t) is occluded in f(t-1), then it is very unlikely to be occluded in f(t+1); each macroblock should have a good matching reference in either temporal direction (see FIG. 8). Hence, an additional pass is introduced to the motion lacing embodiments described above. In motion lacing, two passes of fast motion estimation, in the forward and backward temporal directions, are performed to provide interpolating information for bi-directional motion estimation with lacing. In the following embodiment, the two sets of information are cross-processed in order to eliminate motion vectors that do not represent true object motion. Given m_{t,t-1} and m_{t,t+1}, let:

    \hat{m}_{t,t-1} = { -m_{t,t+1},  if poor match due to occlusion,
                        m_{t,t-1},   otherwise. }
In various embodiments, "poor match" may be understood to mean that the comparison value which represents a match, e.g. the sum of the squared differences of the respective pixel values (e.g. luminance or chrominance values) of a macroblock and a corresponding frame portion of another frame, is above a predefined threshold value, wherein the predefined threshold value may be predefined by a manufacturer and/or a user.
The sets of motion vectors so obtained are called inertia motion vectors and are given by:

    \hat{M}_{t,t-1} = { \hat{m}_{t,t-1}(p) : p a macroblock position in f(t) }
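By way of illustration only, the cross-processing rule above may be applied over whole pre-computed motion fields as sketched below; the array layout, the squared-difference cost and the threshold are assumptions.

```python
import numpy as np

def inertia_field(m_fwd, m_bwd, cost_fwd, threshold):
    """Cross-process the forward field m_{t,t-1} and backward field m_{t,t+1}:
    where the forward match is poor (cost above the predefined threshold),
    substitute the negated backward vector, per the piecewise rule above.

    m_fwd, m_bwd: (H, W, 2) arrays of per-macroblock motion vectors.
    cost_fwd:     (H, W) array of forward matching costs, e.g. sums of
                  squared pixel differences.
    """
    poor = cost_fwd > threshold
    out = m_fwd.astype(np.float64).copy()
    out[poor] = -m_bwd[poor]        # -m_{t,t+1} wherever occlusion is suspected
    return out, poor                # `poor` doubles as a use_inertia map
```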
[0080] FIG. 9 to FIG. 14 illustrate a motion braiding embodiment that uses the inertia motion vectors with the interpolated motion vector (from motion lacing) to split the lace accordingly in the intermediate frames in order to determine the true motion estimate at the reference frame.
[0081] In summary, the operations of FIG. 9 to FIG. 14 combine in order to motion estimate macroblock b in f(3) using f(0) as reference. In FIG. 9, a matching reference block r0 in f(2) is determined. The motion of macroblock b to the position of reference block r0 is defined by the motion vector M_{3,2}(b).
[0082] In FIG. 10, a matching reference block in f(1) may be found in order to eventually trace a motion path to f(0). It may be seen from FIG. 9 that r0 in f(2) overlaps macroblocks b0 and b1. The computed motion vectors of b0 and b1 may be interpolated to determine the motion path from f(2) to f(1). In an embodiment, for b0, the inertia motion vector \hat{M}_{2,1}(b0) is used since b0 does not have a good match in f(1) due to the object occlusion visible in f(1). Generation of the inertia motion vector may be as described above with reference to FIG. 5. In an embodiment, a use_inertia flag may be set since the inertia motion vector has been used. In an embodiment, for b1, the motion vector M_{2,1}(b1) is used, i.e. an inertia motion vector is not used for b1. Generation of the motion vector may be as described above with reference to FIG. 5.
[0083] In FIG. 11, an interpolated motion vector u0 is generated from \hat{M}_{2,1}(b0) and M_{2,1}(b1). A reference macroblock r1 in f(1) is given by motion vector u0 relative from r0 in f(2). At this point, the trace path of macroblock b from f(3) may be M_{3,2}(b) + u0.
[0084] In FIG. 12, the reference block r1 of f(1) overlaps macroblocks b2 and b3 of f(1). In an embodiment, to continue the motion path tracing to f(0), the motion vectors of macroblocks b2 and b3 are interpolated. In an embodiment, a matching reference block may be identifiable in f(0) for each of b2 and b3. Stated differently, the pixel content of each of b2 and b3 may be present (i.e. not occluded) in f(0). Accordingly, a motion vector M_{1,0}(b2) may be used for b2, and a motion vector M_{1,0}(b3) may be used for b3. Generation of these motion vectors may be as described above with reference to FIG. 5.
[0085] In FIG. 13, since the use_inertia flag is set, there may be two possible paths from r1 in f(1) to f(0), i.e. the trajectory may be forked (as described above with reference to FIG. 6). One path may be given by an interpolated motion vector u1 generated from M_{1,0}(b2) and M_{1,0}(b3). The other path may be given by the motion vector u0 computed previously in FIG. 11. In an embodiment, u0 plays the role of an inertia motion vector since the occlusion event in f(1) cannot be foreseen to continue in f(0). Thus the inertia motion vector u0 may help to ensure that the traced motion path does not deviate from true motion due to object occlusion. Since f(0) is the target reference frame for macroblock b, fast motion estimation may be performed in two areas a0 and a1 given by motion vectors u0 and u1 to determine the best matching block for b. The size of the areas a0 and a1 may be fixed or variable. The size of one area may be the same as or different from the size of the other area.
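A minimal sketch of searching the two candidate areas is given below; the SAD criterion, the search radius and the function name are assumptions.

```python
import numpy as np

def refine_over_areas(block, ref_frame, centers, radius=4):
    """Fast motion estimation around each candidate area (e.g. a0 given by u0
    and a1 given by u1): search a small window about each center and keep the
    best-matching position overall."""
    def sad(a, b):
        return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())
    h, w = block.shape
    best = None  # (cost, top-left of best match, e.g. r2)
    for cy, cx in centers:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = int(cy) + dy, int(cx) + dx
                if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                    continue  # candidate window lies outside the reference frame
                cost = sad(block, ref_frame[y:y + h, x:x + w])
                if best is None or cost < best[0]:
                    best = (cost, (y, x))
    return best
```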
[0086] In FIG. 14, the best matching block is given by r2 from searching around the area a0 given by motion vector u0 (relative from r1). In an embodiment, d denotes the spatial offset of r2 from the position indicated by u0, i.e. the final fast motion estimation result. Therefore the motion vector M_{3,0}(b) may be given by the motion trace path from f(3) to f(0), which is M_{3,2}(b) + u0 + (u0 + d).
[0087] According to the above-described embodiment, it is possible to obtain a combined motion vector which traces motion between a reference frame (e.g. frame f(0)) and a current frame (e.g. frame f(3)) that may be far apart, i.e. they may not be consecutive frames. In an embodiment, two or more intermediate frames (e.g. frames f(1) and f(2)) may exist between the reference and current frames. Specifically, it is the motion of a pixel content of a macroblock (e.g. macroblock b) of the current frame that may be traced back to the reference frame. The trace path may be defined by the combined motion vector (e.g. motion vector M_{3,0}(b)). According to this operation, motion estimation may be performed in the reference frame at or around the location indicated by the combined motion vector in order to identify an image portion (e.g. image portion r2) of the reference frame having a pixel content matching or corresponding to a pixel content of the particular macroblock (e.g. macroblock b) of the current frame. Accordingly, the search for the corresponding image portion in the reference frame may be focused, rather than, for example, being an exhaustive search. In this way, fewer computational operations may be performed and energy (i.e. power) may be conserved. It is therefore an advantage of the above-described embodiment that computation may be simplified such that processing may be performed faster and/or using less power.

[0088] In an embodiment, motion lacing may be interpreted as a special case of motion braiding. Stated differently, operations associated with motion lacing may form a subset of the operations associated with motion braiding.
[0089] In the above-described embodiments, a video sequence including three or four frames is considered. It is to be understood that in some other embodiments, the video sequence may include a greater or lesser number of frames. Furthermore, it is to be understood that when tracing the path of an image portion from an initialization frame to a final frame, any number of intermediate frames may be present, i.e. there may be more than one or two intermediate frames. In such cases, the above-described principles and methods may be applied in an analogous fashion to as described above.
[0090] In the above-described embodiments, frames of a particular size and shape have been considered. It is to be understood that in some other embodiments, the size and shape of a frame may vary from those described above. For example, a frame may include a greater or lesser number of macroblocks than one of the above-described frames. Additionally or alternatively, a frame may have a shape which is different to one of the above-described frames, such as, for example, a rectangular shape, a triangular shape or a hexagonal shape.
[0091] In the above-described embodiments, macroblocks of a particular size and shape have been considered. It is to be understood that in some other embodiments, the size and shape of a macroblock may vary from those described above. For example, a macroblock may include a greater or lesser number of pixels than one of the above-described macroblocks. Additionally or alternatively, a macroblock may have a shape which is different to one of the above-described macroblocks, such as, for example, a rectangular shape, a triangular shape or a hexagonal shape.
[0092] It is to be understood that the term 'macroblock' as used in the above-described embodiments is not limiting. Instead, the term is taken to mean an image portion of a frame and it is to be understood that the image portion may have any size or shape, i.e. it may include any number of pixels.

[0093] In the above-described embodiments, an image portion of a frame is referred to as covering, including or overlapping various macroblocks of the frame. For example, in FIG. 4, image portion 32 covers macroblocks 36 and 38. For example, in FIG. 11 and FIG. 12, image portion r1 overlaps macroblocks b2 and b3. It is noted that covering, including and overlapping all refer to the same thing. Specifically, they refer to the situation where an image portion having the same size and shape as a macroblock is identified in a frame. However, the image portion is not necessarily positioned such that it occupies a single macroblock of the frame. Accordingly, the image portion may occupy parts of multiple macroblocks, i.e. it may overlap parts of multiple macroblocks. This can be clearly seen by comparing image portion r1 of FIG. 11 with macroblocks b2 and b3 of FIG. 12.
[0094] Further to the above, it is noted that in the above-described embodiments, an image portion overlaps only two macroblocks. However, it is to be understood that in some other embodiments, an image portion may overlap more than two macroblocks. For example, in an embodiment, an image portion may overlap four or more macroblocks.
[0095] FIG. 15 depicts an example computing device 1000 that may be utilized to implement any one of the above-described methods for estimating motion between frames of a video sequence. In an embodiment, the computing device 1000 is an apparatus for estimating motion between frames of a video sequence. The following description of computing device 1000 is provided by way of example only and is not intended to be limiting.
[0096] As shown in FIG. 15, example computing device 1000 includes a processor 1004 for executing software routines. Although a single processor is shown for the sake of clarity, computing device 1000 may also include a multi-processor system. Processor 1004 is connected to a communication infrastructure 1006 for communication with other components of computing device 1000. Communication infrastructure 1006 may include, for example, a communications bus, cross-bar, or network.

[0097] Computing device 1000 further includes a main memory 1008, such as a random access memory (RAM), and a secondary memory 1010. Secondary memory 1010 may include, for example, a hard disk drive 1012 and/or a removable storage drive 1014, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. Removable storage drive 1014 reads from and/or writes to a removable storage unit 1018 in a well known manner. Removable storage unit 1018 may include a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 1014. As will be appreciated by persons skilled in the relevant art(s), removable storage unit 1018 includes a computer readable storage medium having stored therein computer executable program code instructions and/or data.
[0098] In an alternative implementation, secondary memory 1010 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into computing device 1000. Such means can include, for example, a removable storage unit 1022 and an interface 1020. Examples of a removable storage unit 1022 and interface 1020 include a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 1022 and interfaces 1020 which allow software and data to be transferred from the removable storage unit 1022 to computer system 1000.
[0099] Computing device 1000 also includes at least one communication interface 1024. Communication interface 1024 allows software and data to be transferred between computing device 1000 and external devices via a communication path 1026. In various embodiments, communication interface 1024 permits data to be transferred between computing device 1000 and a data communication network, such as a public data or private data communication network. Examples of communication interface 1024 can include a modem, a network interface (such as Ethernet card), a communication port, and the like. Software and data transferred via communication interface 1024 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communication interface 1024. These signals are provided to the communication interface via communication path 1026.
[00100] As shown in FIG. 15, computing device 1000 may further include a display interface 1002 which performs operations for rendering images to an associated display 1030 and an audio interface 1032 for performing operations for playing audio content via associated speaker(s) 1034.
[00101] As used herein, the term "computer program product" may refer, in part, to removable storage unit 1018, removable storage unit 1022, a hard disk installed in hard disk drive 1012, or a carrier wave carrying software over communication path 1026 (wireless link or cable) to communication interface 1024. A computer readable medium can include magnetic media, optical media, or other recordable media, or media that transmits a carrier wave or other signal. These computer program products are devices for providing software to computer system 1000.

[00102] Computer programs (also called computer program code) are stored in main memory 1008 and/or secondary memory 1010. Computer programs can also be received via communication interface 1024. Such computer programs, when executed, enable the computing device 1000 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 1004 to perform features of the above-described embodiments. Accordingly, such computer programs represent controllers of the computer system 1000.
[00103] Software may be stored in a computer program product and loaded into computing device 1000 using removable storage drive 1014, hard disk drive 1012, or interface 1020. Alternatively, the computer program product may be downloaded to computer system 1000 over communications path 1026. The software, when executed by the processor 1004, causes the computing device 1000 to perform functions of embodiments described herein.

[00104] It is to be understood that the embodiment of FIG. 15 is presented merely by way of example. Therefore, in some embodiments one or more features of the computing device 1000 may be omitted. Also, in some embodiments, one or more features of the computing device 1000 may be combined together. Additionally, in some embodiments, one or more features of the computing device 1000 may be split into one or more component parts.
[00105] While the invention has been particularly shown and described with reference to specific example embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

Claims

1. A method for estimating motion between frames of a video sequence, the method comprising:
determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and
determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and
determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
2. The method of claim 1, further comprising: determining an inertia motion vector relating to a frame portion of the second frame, the frame portion of the second frame being determined by the determined motion vector, the inertia motion vector being determined using the determined motion vector, the inertia motion vector representing motion of the frame portion of the second frame to a further frame in the first temporal direction.
3. The method of claim 2, wherein the inertia motion vector is similar to the determined motion vector.
4. The method of any one of claims 1 to 3, wherein the motion vector relating to the frame portion of the first frame is determined by inverting the second intermediate motion vector and by using the first intermediate motion vector and the inverted second intermediate motion vector.
5. The method of claim 4, wherein the motion vector relating to the frame portion of the first frame is determined by interpolating the first intermediate motion vector and the inverted second intermediate motion vector.

6. The method of any preceding claim, wherein the frame portions correspond to an image block or image macroblocks of a respective frame.
7. The method of any preceding claim, wherein the first frame is an I frame.
8. An apparatus for estimating motion between frames of a video sequence, the apparatus comprising:
at least one processor; and
at least one memory including computer program code;
the at least one memory and the computer program code being configured, with the at least one processor, to cause the apparatus to perform at least the following: determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and
determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and
determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
9. A computer program product for estimating motion between frames of a video sequence, the computer program product comprising at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions comprising: program code for determining a first intermediate motion vector relating to a frame portion of a first frame, the motion vector representing motion of the frame portion of the first frame to a second frame in a first temporal direction; and
program code for determining a second intermediate motion vector relating to another frame portion of the first frame, the additional motion vector representing motion of the other frame portion of the first frame to a third frame, the third frame being in an opposite temporal direction from the first frame as the second frame; and
program code for determining a motion vector relating to the frame portion of the first frame using the first intermediate motion vector and the second intermediate motion vector.
PCT/SG2012/000024 2011-01-21 2012-01-25 A method, an apparatus and a computer program product for estimating motion between frames of a video sequence WO2012099544A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161461648P 2011-01-21 2011-01-21
US61/461,648 2011-01-21

Publications (1)

Publication Number Publication Date
WO2012099544A1 true WO2012099544A1 (en) 2012-07-26

Family

ID=46515968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2012/000024 WO2012099544A1 (en) 2011-01-21 2012-01-25 A method, an apparatus and a computer program product for estimating motion between frames of a video sequence

Country Status (1)

Country Link
WO (1) WO2012099544A1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5642166A (en) * 1994-08-30 1997-06-24 Samsung Electronics Co., Ltd. Bi-directional motion estimation method and apparatus thereof
WO2009148412A1 (en) * 2008-06-06 2009-12-10 Agency For Science, Technology And Research Methods and devices for estimating motion in a plurality of frames

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GOH ET AL.: "Bi-Directional 3D Auto-Regressive Model Approach to Motion Picture Restoration", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), May 1996 (1996-05-01) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112770015A (en) * 2020-12-29 2021-05-07 紫光展锐(重庆)科技有限公司 Data processing method and related device
CN112770015B (en) * 2020-12-29 2022-09-13 紫光展锐(重庆)科技有限公司 Data processing method and related device


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12736199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12736199

Country of ref document: EP

Kind code of ref document: A1