GB2471323A - Motion Vector Estimator - Google Patents


Info

Publication number
GB2471323A
Authority
GB
United Kingdom
Prior art keywords
sampled
reference frame
motion vector
processing apparatus
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0911050A
Other versions
GB0911050D0 (en)
GB2471323B (en)
Inventor
Patrik Andersson
Tomas Edso
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Ltd, Advanced Risc Machines Ltd filed Critical ARM Ltd
Priority to GB0911050.3A priority Critical patent/GB2471323B/en
Publication of GB0911050D0 publication Critical patent/GB0911050D0/en
Priority to JP2010143839A priority patent/JP2011010304A/en
Priority to US12/801,789 priority patent/US9407931B2/en
Priority to CN201010253443.3A priority patent/CN101938652B/en
Publication of GB2471323A publication Critical patent/GB2471323A/en
Application granted granted Critical
Publication of GB2471323B publication Critical patent/GB2471323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/567Motion estimation based on rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/527Global motion vector estimation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/53Multi-resolution motion estimation; Hierarchical motion estimation

Abstract

A data processing apparatus is provided which is configured to receive a down-sampled source block 635 and a down-sampled reference frame portion 640. The data processing apparatus comprises interpolation circuitry 670 configured to interpolate between pixels of the down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks. Cost function calculation circuitry 650 calculates a cost function value indicative of a difference between the down-sampled source block and each interpolated down-sampled reference frame block. Minimisation circuitry 655 identifies the lowest cost function value and estimate motion vector generation circuitry 660 generates an estimate motion vector in dependence thereon. Said down-sampled source and reference images may be a filtered version of said images or an averaging of the pixels of said images. Said cost function may be calculated by algorithms determining a sum of absolute transformed differences, a sum of square error, a mean square error, a mean absolute error or a mean absolute difference.

Description

MOTION VECTOR ESTIMATOR
FIELD OF THE INVENTION
The present invention is concerned with video encoding, and in particular to motion searching in video encoding in order to generate motion vectors.
BACKGROUND OF THE INVENTION
Motion searching is an important part of contemporary video encoding, and is known to be a high bandwidth, memory intensive activity. Typically, block based motion estimation is carried out, in which a current frame is subdivided into blocks of pixels. Each of these blocks (also referred to herein as a source block) is then compared to a reference frame (which may for example be the preceding frame) to find a best-matching block therein. The displacement from the location of this best-matching block to the current source block defines a motion vector, which is used by the video encoder to encode the source block by reference to that best matching block, using the motion vector together with further information representing any residual difference between the blocks.
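The block-based matching described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the exhaustive search strategy and the use of the sum of absolute differences as the matching criterion are assumptions for the example.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def full_search(source_block, reference_frame, top, left, search_range):
    """Exhaustively compare a source block against every candidate position
    within +/- search_range of (top, left) in the reference frame; return the
    motion vector (dy, dx) of the best-matching block and its cost."""
    h, w = source_block.shape
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if (y < 0 or x < 0 or
                    y + h > reference_frame.shape[0] or
                    x + w > reference_frame.shape[1]):
                continue  # candidate block falls outside the reference frame
            cost = sad(source_block, reference_frame[y:y + h, x:x + w])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

The displacement returned is exactly the motion vector described above: the offset from the source block's own position to the best-matching block in the reference frame.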
Due to the computational demands of motion searching, in particular the memory access bandwidth requirements, it is known to perform a motion search in more than one step, initially carrying out a rough search which identifies an approximate motion vector, followed by a more detailed search to refine the results of the rough search. In this context, it is known for the rough search to search discrete positions in the reference frame or search a down-sampled version of the reference frame. Down-sampling is the process of generating an image of lower resolution from a higher resolution image. Performing motion vector searching using down-sampled images has the advantage of reducing memory access bandwidth, but at the cost of lower accuracy in the motion search.
It is further known for motion searches to be carried out in more than two steps, in a multi-resolution motion search. For example in "A Fast Hierarchical Motion Vector Estimation Algorithm Using Mean Pyramid", Kwon Moon Nam et al., IEEE Trans. Circuits and Systems for Video Technology, Volume 5, pp. 344-351, April 1993, a multi-stage motion search is described in which the motion search is broken down into several stages of progressive resolutions.
Hence, although down-sampling can avoid the high bandwidth requirements of a full resolution search, it suffers from a degradation in motion vector estimation accuracy resulting from the lower resolution of the down-sampled search image. More severely, when the full resolution image contains high frequency detail, the loss of such high frequency components of the image can result in the calculation of a motion vector from the down-sampled image that is significantly different from that which would have been calculated from the full resolution image.
Hence, it would be desirable to provide an improved technique for motion vector estimation in a video encoder with improved accuracy, yet without the high bandwidth requirements of a full resolution search.
SUMMARY OF THE INVENTION
Viewed from a first aspect, the present invention provides a data processing apparatus configured to receive a down-sampled source block generated from a source frame and to receive a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames, said data processing apparatus comprising: interpolation circuitry configured to interpolate between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; cost function calculation circuitry configured to calculate a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; minimisation circuitry configured to select an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and estimate motion vector generation circuitry configured to generate an estimate motion vector in dependence on said interpolated down-sampled reference frame block selected by said minimisation circuitry.
According to the techniques of the present invention, the data processing apparatus receives a down-sampled source block and a down-sampled reference frame portion, in order to generate an estimate motion vector based on a comparison between the two. The inventors of the present invention realised that an improvement in the accuracy of the estimate motion vector generated could be attained, without resorting to fetching higher resolution images from memory and the memory bandwidth increase that would entail, by providing interpolation circuitry which uses the down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks.
Cost function calculation circuitry is provided, which calculates a cost function value indicating a difference between the down-sampled source block and each interpolated down-sampled reference frame block. Thus the data processing apparatus can not only calculate the cost function at integer pixel positions in the down-sampled reference frame, but also at interpolated positions between those integer pixel positions. As a result the minimisation circuitry, which is arranged to find that position of the down-sampled source block within the down-sampled reference frame portion which minimises the cost function value, can find a minimum of the cost function at a higher resolution than would be possible with reference to integer pixel positions in the down-sampled reference frame alone. The estimate motion vector then generated by the estimate motion vector generation circuitry can thus provide result motion vector generation circuitry with a more accurate starting point for a motion search at full resolution. The accuracy is particularly improved when the reference frame contains high frequency components that are liable to be smeared out by down-sampling.
Furthermore, the techniques of the present invention enable an improvement in the accuracy of the estimate motion vector, without increasing the memory access bandwidth requirement, since only down-sampled source and reference frame blocks are retrieved from memory to be locally stored.
The down-sampled source image and down-sampled reference image may have been previously generated and stored in memory, but in one embodiment the data processing apparatus further comprises down-sampling circuitry configured to generate said down-sampled source block and said down-sampled reference frame portion. Providing down-sampling circuitry allows down-sampled images to be generated for storage in memory as they are required.
In one embodiment the data processing apparatus further comprises motion vector generation circuitry configured to receive as an input said estimate motion
vector and to generate from said source image and said reference image a result motion vector, said result motion vector being constrained to be within a predetermined range of said estimate motion vector. Provision of the estimate motion vector to the result motion vector generator allows the result motion vector generator to limit the area in which it performs a full resolution motion search, thus saving memory bandwidth and computational resource.
It is advantageous to select the set of interpolated down-sampled reference frame blocks such that a minimum of the cost function may be more rapidly found and in one embodiment said set of interpolated down-sampled reference frame blocks is determined with reference to a predetermined set of points in said down-sampled reference frame portion. This set of points may be specified in a number of ways, but in one embodiment said predetermined set of points are separated by half of a block width.
In another embodiment, said set of interpolated down-sampled reference frame blocks is determined with reference to a null motion vector. When performing video encoding it is common that a motion vector for a source block from a source frame will be found to have a motion vector that is close to a null motion vector, when that block has not (or hardly) moved with respect to the same block in the reference frame.
In yet another embodiment, said set of interpolated down-sampled reference frame blocks is determined with reference to at least one predetermined motion vector of at least one neighbouring source block. When performing video encoding it is common that a motion vector for a given block will be closely correlated with the motion vectors of neighbouring blocks, for example due to the movement of an object which is larger than the block size. In one embodiment this at least one predetermined motion vector is a predicted motion vector for said down-sampled source block. A predicted motion vector is typically generated for each source block as part of the video encoding process, based on the motion vectors of neighbouring blocks.
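As an illustration of such a predicted motion vector, one common scheme (used for example in H.264) takes the component-wise median of the motion vectors of the left, above and above-right neighbouring blocks. The patent does not specify this particular predictor, so the sketch below is an assumption for the example:

```python
def predicted_motion_vector(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors, a common
    way (e.g. in H.264) of predicting a block's motion vector from its
    neighbours. Each argument is an (x, y) tuple."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))
```

The median makes the predictor robust to a single outlying neighbour, which matches the observation above that neighbouring blocks usually move together.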
Although the data processing apparatus could operate in a single step, calculating the cost function value for a single set of interpolated down-sampled reference frame blocks, in one embodiment said cost function calculation circuitry and said minimisation circuitry are configured to iteratively select said set of interpolated down-sampled reference frame blocks to find a local minimum of said cost function
value. This arrangement allows the data processing apparatus to home in on a minimum of the cost function, iteratively selecting the set of interpolated down-sampled reference frame blocks to follow the surface of the cost function to a local minimum.
In order for the motion searching to be efficiently performed, the down-sampled reference frame portion must naturally be larger than the down-sampled source block, but should not present too great an area in which to motion search, since this would be burdensome both in terms of computational resource and in terms of memory access bandwidth. Hence in one embodiment said down-sampled reference frame portion is approximately an order of magnitude larger than said down-sampled source block.
It will be appreciated that the down-sampled source block could be generated from a source frame in a number of ways and in one embodiment said down-sampled source block comprises a subset of pixels from said source image. Equivalently, in one embodiment said down-sampled reference frame portion comprises a subset of pixels from said reference image. These subsets could for example be every other pixel, every fourth pixel or similar.
In other embodiments said down-sampled source block comprises a filtered version of a block of said source image, which could for example be provided in that each pixel of said down-sampled source block is generated by averaging over a set of pixels of said source image. Taking the mean or median pixel value over a 2x2 or 4x4 set of pixels is one such example.
Similarly, in other embodiments said down-sampled reference frame portion comprises a filtered version of a block of said reference image, which could for example be provided in that each pixel of said down-sampled reference frame portion is generated by averaging over a set of pixels of said reference image. Again, taking the mean or median pixel value over a 2x2 or 4x4 set of pixels is one such example.
The interpolation performed by the interpolation circuitry could be performed at a range of ratios, but in one embodiment said interpolation circuitry is configured to perform 1/4 pixel interpolation.
Generally the estimate motion vector could take any length within a frame, but it is advantageous to strike a balance between the freedom for the motion vector to take any length and the computational resource required to search in a wider area and in one embodiment said estimate motion vector is constrained to have a maximum length of 64 pixels.
There are various ways in which the cost function value could be calculated, but in an advantageously simple embodiment said cost function value is calculated from a sum of absolute differences between pixels of said down-sampled source block and pixels of each of said set of interpolated down-sampled reference frame blocks.
This sum of absolute differences does not account for the cost of encoding a large motion vector and in one embodiment said cost function value further comprises a motion vector penalty value.
In other embodiments the cost function value is calculated based on one of a sum of absolute transformed difference (SATD) algorithm, a sum of square error (SSE) algorithm, a mean square error (MSE) algorithm, a mean absolute error (MAE) algorithm, and a mean absolute difference (MAD) algorithm.
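The cost functions named above might be sketched as follows, assuming equally sized floating-point pixel blocks. The motion vector penalty model in `cost` is an illustrative assumption rather than anything specified in the patent:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences."""
    return float(np.abs(a - b).sum())

def sse(a, b):
    """Sum of square error."""
    return float(((a - b) ** 2).sum())

def mse(a, b):
    """Mean square error."""
    return sse(a, b) / a.size

def mae(a, b):
    """Mean absolute error (also called mean absolute difference)."""
    return sad(a, b) / a.size

def cost(a, b, mv, penalty_per_pixel=1.0):
    """SAD plus a simple motion vector penalty that grows with vector length,
    reflecting the greater encoding cost of a long motion vector. The linear
    penalty model here is an assumption for the example."""
    return sad(a, b) + penalty_per_pixel * (abs(mv[0]) + abs(mv[1]))
```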
Viewed from a second aspect, the present invention provides a method of generating an estimate motion vector comprising the steps of: receiving a down-sampled source block generated from a source frame and receiving a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames; interpolating between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; calculating a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; selecting an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and generating an estimate motion vector in dependence on said interpolated down-sampled reference frame block selected in said selecting step.
Viewed from a third aspect, the present invention provides a data processing apparatus configured to receive a down-sampled source block generated from a source frame and to receive a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames, said data processing apparatus comprising: interpolation means for interpolating between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; cost function calculation means for calculating a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; minimisation means for selecting an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and estimate motion vector generation means for generating an estimate motion vector in dependence on said interpolated down-sampled reference frame block selected by said minimisation means.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:
Figure 1 schematically illustrates a system for generating a motion vector;
Figure 2 schematically illustrates the down-sampling of a 16x16 block;
Figure 3 schematically illustrates a down-sampled source block and a down-sampled reference frame portion;
Figure 4 schematically illustrates a predetermined set of points in a down-sampled reference frame portion;
Figure 5 schematically illustrates 1/4 pixel interpolation;
Figure 6 schematically illustrates a data processing apparatus according to one embodiment;
Figure 7 schematically illustrates the iterative search for a cost function minimum;
Figure 8A schematically illustrates a series of steps performed in one embodiment;
Figure 8B schematically illustrates a series of steps performed in one embodiment;
Figure 8C schematically illustrates a series of steps performed in one embodiment;
Figure 8D schematically illustrates a series of steps performed in one embodiment;
Figure 9 schematically represents neighbouring blocks for a given source block; and
Figure 10 schematically illustrates an example improvement in motion vector estimation.
DESCRIPTION OF EMBODIMENTS
Figure 1 schematically illustrates a system for generating motion vectors as part of a video encoding process. An external memory 100 stores frames of video data taken from a sequence of video frames which are to be encoded. Also stored within external memory 100 are down-sampled versions of those frames which are generated according to known techniques such as taking a subset of pixels from the original image (e.g. every fourth pixel) or by filtering the original image (e.g. by averaging over a set of 4x4 pixels). The full resolution frames stored in external memory 100 are passed to motion vector generation unit 110 which determines motion vectors for blocks within those frames. However, this process of motion vector generation using full resolution frames is computationally very expensive. For this reason, motion vector estimation unit 120 is provided which provides the motion vector generation unit 110 with an estimate motion vector which constrains the motion vector searching performed by the motion vector generation unit 110 to take place within a limited spatial region. The motion vector estimation unit 120 receives the down-sampled frames from external memory 100 and performs motion vector estimation in order to generate the estimate motion vector passed to motion vector generation unit 110.
The disadvantage of performing motion vector estimation using down-sampled frames is that high frequency elements of the original full resolution frames may be missed by this motion vector estimation process and hence the estimate motion vector passed to the motion vector generation unit 110 represents a poor starting place for the full resolution search. However, motion vector estimation unit 120 is arranged (as will be further described hereinafter) to interpolate between the pixels of the down-sampled frames and to perform motion vector estimation using those interpolated frames. This enables the motion vector estimation unit to maintain the low bandwidth advantages of only retrieving down-sampled frames from external memory 100, whilst improving on the resolution accessible in the down-sampled frames alone, and hence providing the motion vector generation unit 110 with an estimate motion vector which represents a more accurate starting position for performing the motion vector generation using full resolution frames.
Figure 2 schematically illustrates the process of down-sampling. Each full resolution frame is subdivided into macro blocks (also simply known as blocks) comprising 16x16 full resolution pixels as illustrated by grid 200. Once down-sampled such a block becomes a 4x4 set of down-sampled pixels as illustrated by grid 210. The process of down-sampling could be carried out in a number of ways, for example by taking a subset of the pixels in the full resolution frame, i.e. every fourth pixel (both horizontally and vertically) being taken from the 16x16 full resolution grid to provide the 4x4 grid 210. However, in this illustrated embodiment the down-sampled block is generated by filtering the 16x16 full resolution grid 200, averaging over each 4x4 set of pixels to generate each down-sampled pixel of the 4x4 block 210.
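The averaging form of down-sampling described above can be sketched as follows; this is an illustrative NumPy implementation, not taken from the patent:

```python
import numpy as np

def down_sample(block, factor=4):
    """Down-sample a 2D pixel block by averaging each factor x factor tile,
    e.g. turning a 16x16 full-resolution block into a 4x4 block."""
    h, w = block.shape
    assert h % factor == 0 and w % factor == 0
    # Split the block into factor x factor tiles and average over each tile.
    return block.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))
```

The subset-of-pixels alternative mentioned above would instead be a simple stride, e.g. `block[::4, ::4]`.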
The process of motion vector estimation is further explained with reference to Figure 3, in which there is illustrated a 4x4 down-sampled source block 300 and a 40x40 down-sampled reference frame portion 310. In order to perform the video encoding each block of a current frame being encoded is in turn treated as the source block which is then compared to a reference frame to find the position in which the source block best fits with the reference frame. Whilst it would in principle be possible to search the entirety of a reference frame this is rather computationally intensive, in particular requiring more data to be retrieved from memory and stored locally, and in this embodiment only a portion of the reference frame is examined, namely a 40x40 (down-sampled) section thereof. A reference frame section of 40x40 down-sampled pixels is used, because this allows a more efficient search procedure to be carried out in which four source blocks in a 2x2 configuration are simultaneously processed. Adjacent source blocks will clearly require largely overlapping reference frames and for a 2x2 configuration of source blocks, a 40x40 reference frame provides a search window of ±64 full resolution pixels for each source block. Thus the motion vector estimation unit 120 in Figure 1 retrieves each down-sampled source block 300 from external memory 100 and a down-sampled reference frame portion 310 from external memory 100, storing each in local buffers whilst the searching is performed.
Whilst the source block 300 could in principle be compared initially with all possible positions in reference portion 310, a less computationally intensive approach may be taken, which still produces satisfactory results, in which source block 300 is initially compared with a predetermined set of points in reference frame portion 310 as is schematically illustrated in Figure 4. This set of points 400 are in this embodiment separated by half of a block width, i.e. at two down-sampled pixel separation. This then provides a 16x16 set of points against which the source block 300 is compared in reference frame portion 310. Out of these points the point at which the source block best fits (an example is labelled 410 in Figure 4) is then selected according to a cost function minimisation technique which will be further described hereinafter.
Figure 5 schematically illustrates the interpolation between down-sampled pixels of the down-sampled reference frame portion to generate interpolated down-sampled reference frame blocks to be compared with the down-sampled source block.
(Note that for clarity of illustration only 2x2 interpolated blocks are illustrated in this figure.) Performing this interpolation means that the down-sampled source block can not only be compared with integer positions in the down-sampled reference frame portion 500 (of which only part is illustrated here), but also at interim positions between those integer positions. For example, the interpolated down-sampled reference frame block 510 is offset downwards and rightwards by a quarter down-sampled pixel from the integer down-sampled pixel positions of the down-sampled reference frame portion. As another example, the interpolated down-sampled reference frame block 520 is offset by half a down-sampled pixel to the right from the integer positions of the down-sampled reference frame portion. These offset positions may be generated by weighting the pixels according to their area of overlap with the integer position pixels. For example, each of the four pixels of the interpolated down-sampled reference frame block 520 derives 50% of its value from each of the two integer position pixels that it spans. On the other hand each pixel of the interpolated down-sampled reference frame block 510 comprises a 9/16 weighting from the pixel it mainly overlaps, 3/16 weightings from the immediately horizontally and vertically adjacent pixels, and a 1/16 weighting from the pixel overlapped by its corner.
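The area-overlap weighting described above is equivalent to bilinear interpolation at quarter-pixel offsets: for an offset of one quarter pixel down and right, the four weights are 9/16, 3/16, 3/16 and 1/16. A sketch of this, in which the function name and the 4x4 block size are assumptions for the example:

```python
import numpy as np

def interpolate_block(ref, y, x, fy, fx):
    """Bilinearly interpolate a 4x4 block at fractional offset (fy, fx) in
    quarter-pixel units (0..3) from integer position (y, x) in ref. For
    fy = fx = 1 the area-overlap weights are 9/16, 3/16, 3/16 and 1/16."""
    wy, wx = fy / 4.0, fx / 4.0
    h, w = 4, 4  # a 4x4 down-sampled block
    patch = ref[y:y + h + 1, x:x + w + 1].astype(float)
    return ((1 - wy) * (1 - wx) * patch[:h, :w] +   # mainly overlapped pixel
            (1 - wy) * wx * patch[:h, 1:] +         # horizontal neighbour
            wy * (1 - wx) * patch[1:, :w] +         # vertical neighbour
            wy * wx * patch[1:, 1:])                # corner-overlapped pixel
```

The four weights always sum to one, so a constant image interpolates to the same constant, as expected of an area-weighted average.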
Figure 6 schematically illustrates a data processing apparatus according to one embodiment for generating a motion vector estimation in a video encoding system.
External memory 600 stores images (video frames) for access by the remainder of the system. These images are stored both at full resolution, as in the case of source image 605 and reference image 615, and are also stored in down-sampled (DS) form, as in the case of DS source image 610 and DS reference image 620. DS source image 610 and DS reference image 620 are generated from source image 605 and reference image 615 respectively by down-sampler 625.
In order for the system to perform motion vector estimation the control unit 630 accesses a down-sampled source image 610 in order to fill source buffer 635 with a down-sampled source block taken therefrom. Similarly control unit 630 accesses down-sampled reference image 620 to fill reference buffer 640 with a down-sampled reference frame portion taken therefrom. The contents of source buffer 635 are passed to search unit 645 which comprises cost function calculation unit 650, minimisation unit 655 and control unit 660. Parts of the content of reference buffer 640 are also passed via interpolator 670 to cost function calculator 650. Hence, the search unit 645 searches in the search window provided by reference buffer 640 for the position in which the contents of the source buffer 635 best fit. As was described with reference to Figure 3, in one embodiment a 2x2 configuration of source blocks is retrieved from memory together, in order to perform a parallel search, but for clarity in the embodiment illustrated in Figure 6 only one source block is retrieved at a time. As was described with reference to Figure 4, this search is initially performed with reference to a predetermined set of points in this search window. In this situation the interpolator 670 merely passes a set of 4x4 pixels to the cost function calculator 650, which the cost function calculator 650 then compares with the 4x4 down-sampled pixels contained in source buffer 635. This comparison is performed by means of calculating a cost function, which in this embodiment is performed by calculating the Sum of
Absolute Differences between each set of 4x4 pixels. In addition, the cost function value calculated by the cost function calculator further includes a motion vector penalty value which represents the cost (in terms of encoding space) of encoding a motion vector describing the current position under investigation in the reference buffer. For example a long motion vector (representing greater movement between the source block and the target block in the reference frame) may require greater encoding space than a shorter motion vector. In any regard, in this embodiment the estimate motion vector is constrained to have a maximum length of 64 pixels.
When the cost function has thus been calculated for each of the predetermined set of positions in the reference buffer search window, the minimisation unit 655 then selects the position which represents the lowest value of the cost function value. This information is then passed to control unit 660 which controls interpolator 670 to interpolate between down-sampled pixels of the down-sampled reference frame portion stored in reference buffer 640 in order to generate a set of interpolated down-sampled reference frame blocks each of which is then passed to cost function calculator 650 in turn for comparison with the contents of source buffer 635. The set of interpolated down-sampled reference frame blocks generated by interpolator 670 are those in the immediate vicinity of the point in the search window previously found by minimisation unit 655 to have the lowest cost function value. Thus an iterative process can be performed in which the minimisation unit finds the lowest cost function value from amongst a set of points, the interpolator 670 then generates interpolated down-sampled reference frame blocks in the immediate vicinity of that point, the cost function calculator 650 calculates the cost function value associated with each of those blocks and the minimisation unit 655 selects the one with the lowest cost function value.
This iterative process is schematically illustrated in Figure 7. In this figure the squares each represent the central position of an interpolated block. At step 700, an original central position (hatched) represents the starting point, together with a set of interpolated down-sampled reference frame blocks immediately adjacent to that position (unhatched blocks). These are generated by the interpolator performing quarter down-sampled pixel interpolation, so these squares represent quarter pixel interpolations "up and left", "up", "up and right", "left", "right", "down and left",
"down" and "down and right" with respect to the original position. Amongst these interpolated positions the minimisation unit has then selected the lower right ("down and right") position as having the lowest cost function value. Then at step 710 the interpolation unit generates a further set of interpolated down-sampled reference frame blocks in the immediate vicinity of this new lowest cost function value position (one of which (upper left) corresponds to the original position at the centre of step 700) and the minimisation circuit selects the block which results in the lowest cost function value. Finally, at step 720 the interpolation unit again generates a set of interpolated down-sampled reference frame blocks in the immediate vicinity of the position selected at the previous stage having the lowest cost function value, but now the position with the lowest cost function value remains the central point. Thus a local minimum of the cost function value has been found and the iterative process stops.
The estimate motion vector is then generated with respect to this position.
In addition to calculating the cost function for a predetermined set of points (see Figure 4) the search unit 645 also in parallel calculates the cost function for a null motion vector (i.e. representing no change in position between the source block and the reference frame) and also calculates the cost function value for the predicted motion vector (each source block has an associated predicted motion vector deriving from its neighbours) for the current source block. From each of these starting points the same iterative minimisation process (also known as a "descent") is also carried out, and the overall lowest cost function value from each of these three methods is then selected for the generation of the final estimate motion vector.
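The three parallel strands (grid search, null vector, predicted vector), each refined by the same descent and then compared, might be organised like this; `refine` and `cost_at` are hypothetical callables standing in for the iterative descent and the cost evaluation:

```python
def best_estimate(refine, cost_at, grid_points, null_mv, predicted_mv):
    # Strand 1: the cheapest point of the predetermined grid.
    # Strands 2 and 3: the null and predicted motion vectors.
    # Each starting point is refined by the same iterative descent, and
    # the overall cheapest finishing position becomes the basis for the
    # final estimate motion vector.
    grid_start = min(grid_points, key=cost_at)
    finishes = [refine(cost_at, start)
                for start in (grid_start, null_mv, predicted_mv)]
    return min(finishes, key=cost_at)
```

Running the three descents independently means a poor grid minimum cannot mask a good null or predicted starting point.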
The process of calculating a minimum cost function value according to these three strands is now discussed with reference to the flow diagrams in Figures 8A to 8D. In Figure 8A the flow begins at step 800, where down-sampled (DS) source and reference images are retrieved from external memory and locally buffered. At step 805 the cost function is calculated for the down-sampled source image at a predetermined set of points on the down-sampled reference image (those selected within the reference frame buffer) and at step 810 the point with the lowest cost function is identified. Then at step 815 interpolation on local pixels of the down-sampled reference image is performed and for each interpolated down-sampled reference frame block generated the cost function value is calculated. If a lower cost
function value is thereby found than that previously found, then from step 820 the flow moves to step 825, where the focus of the process is shifted to centre on the new lowest cost function value found, and the flow returns to step 815, where interpolation in that local region is carried out, followed by the calculation of the corresponding cost function values. If at step 820 a lower cost function value is not found, then the flow proceeds to step 825, where the lowest cost function value found is provided as an output.
Similarly, in Figure 8B the flow begins at step 830, where down-sampled source and reference images are buffered, having been retrieved from the external memory. In this context the only portion of the reference image required is that which corresponds to a null motion vector of the source block. At step 835 the cost function for this null motion vector is calculated and at step 840 interpolation in the local region of that null motion vector is carried out, followed by the calculation of cost function values corresponding to the interpolated down-sampled reference frame blocks thus generated. At step 845 it is checked whether a lower cost function value has been found than that calculated for the null motion vector itself and, if it has, then the same iterative loop is started, going via step 850 to shift the focus to centre on the lowest cost function value found, followed by interpolation and cost function value calculation at step 840. When no lower cost function value is found at step 845, the flow concludes at step 850, where this lowest cost function value is provided as the output.
Finally, in Figure 8C, the flow similarly begins at step 855 with the down-sampled source and reference images being buffered. Here, however, at step 860 the cost function value is calculated for the predicted motion vector of the source block currently under consideration. This predicted motion vector is derived from the neighbouring source blocks as illustrated in Figure 9: the predicted motion vector for the hatched source block is generated from the calculated motion vectors for the source blocks marked with crosses. This is then followed by the same iterative loop (of steps 865, 870 and 875), as described above with reference to Figures 8A and 8B. Finally, the cost function value minimum found is provided as an output at step 880.
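The patent states only that the predicted motion vector derives from neighbouring blocks (Figure 9); a common concrete rule, used for example in H.264, is the component-wise median of the neighbours' vectors, sketched here as an assumption rather than the patent's specified method:

```python
def predicted_mv(neighbour_mvs):
    # Component-wise median of the neighbouring blocks' motion vectors.
    # This particular prediction rule is an assumption: the patent says
    # only that the prediction derives from the neighbours.
    xs = sorted(mv[0] for mv in neighbour_mvs)
    ys = sorted(mv[1] for mv in neighbour_mvs)
    mid = len(xs) // 2
    return (xs[mid], ys[mid])
```

A median (rather than a mean) keeps the prediction robust when one neighbour's vector is an outlier, such as the miscalculated vector discussed below with reference to Figure 10.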
The results of these three starting points for an iterative "descent" are then compared in the first step 900 of Figure 8D. The lowest cost function value is selected and at step 910 an estimate motion vector is generated corresponding to the position at
which this lowest cost function value is found. This estimate motion vector at step 920 is then passed to the full resolution search unit (motion vector generation unit 110 in Figure 1) in order to carry out a full resolution search. At step 930 this full resolution motion vector search is then carried out within 8 pixels of the target of the estimate motion vector. At step 940 the final motion vector is generated.
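The final full-resolution stage of Figure 8D, constrained to within 8 pixels of the estimate's target (step 930), could be sketched as an exhaustive search; `cost_at` is again a hypothetical callable scoring a full-resolution candidate position:

```python
def full_res_search(cost_at, estimate, radius=8):
    # Exhaustive full-resolution search within `radius` pixels of the
    # target of the estimate motion vector, as in step 930. Because the
    # estimate narrows the search to a (2*radius+1)^2 window, the full
    # resolution images need only be fetched for that small region.
    ex, ey = estimate
    candidates = [(ex + dx, ey + dy)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)]
    return min(candidates, key=cost_at)
```

This is where the bandwidth saving materialises: the expensive full-resolution comparison touches only the small window around the down-sampled estimate.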
Figure 10 schematically illustrates the benefits of the present invention.
Generally labelled as 950 is an 8x8 set of blocks, each annotated with its calculated motion vector. It can be seen that the peripheral blocks have null motion vectors (represented by a dot), whilst in the central region of the set of blocks there is a subset of blocks each of which has a motion vector pointing to the right. This could, for example, correspond to an object in the field of view moving to the right against a stationary background. However, it can also be seen that the motion vector for one block is unusually long and at a different angle to the other motion vectors. This has resulted from the estimate motion vector calculation process miscalculating the estimate motion vector because of the lower resolution of the down-sampled images used. The reason for this will become apparent from the cost function value graphs calculated for this block and illustrated at 960, 970 and 980.
The cost function graph illustrated at 960 is the cost function calculated for this block at full resolution. Here it can be seen that the lowest value of this cost function occurs at the tip of the sharp valley on the left of the distribution, i.e. at a small pixel offset, corresponding to a short motion vector such as those of the majority of the blocks in 950.
However, it can be seen in the cost function graph at 970, which is generated from a down-sampled source block and reference frame, that the down-sampling has lost the high frequency feature on the left of the graph, and the minimum has now been identified elsewhere, namely at a longer motion vector length (as illustrated in 950).
The advantage of interpolating the down-sampled reference frame portion can be seen in 980, where the minimum is once again found corresponding to a short motion vector, in approximately the same place as the minimum for the distribution in 960. Thus, the minimum in 980 represents a more promising starting point for a limited range search at full resolution.
Thus, according to the techniques of the present invention, the bandwidth advantages of performing motion vector estimation on a down-sampled source block and reference frame portion are gained, yet by interpolating between pixels of the down-sampled reference frame portion the minimum of the cost function value may be more accurately identified, thus providing a better starting point for a limited range full resolution search.
Although a particular embodiment has been described herein, it will be appreciated that the invention is not limited thereto and that many modifications and additions thereto may be made within the scope of the invention. For example, various combinations of the features of the following dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.

Claims (25)

  1. A data processing apparatus configured to receive a down-sampled source block generated from a source frame and to receive a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames, said data processing apparatus comprising: interpolation circuitry configured to interpolate between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; cost function calculation circuitry configured to calculate a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; minimisation circuitry configured to select an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and estimate motion vector generation circuitry configured to generate an estimate motion vector in dependence on said interpolated down-sampled reference frame block selected by said minimisation circuitry.
  2. A data processing apparatus as claimed in claim 1, further comprising down-sampling circuitry configured to generate said down-sampled source block and said down-sampled reference frame portion.
  3. A data processing apparatus as claimed in claim 1 or claim 2, further comprising motion vector generation circuitry configured to receive as an input said estimate motion vector and to generate from said source image and said reference image a result motion vector, said result motion vector being constrained to be within a predetermined range of said estimate motion vector.
  4. A data processing apparatus as claimed in any preceding claim, wherein said set of interpolated down-sampled reference frame blocks is determined with reference to a predetermined set of points in said down-sampled reference frame portion.
  5. A data processing apparatus as claimed in claim 4, wherein said predetermined set of points are separated by half of a block width.
  6. A data processing apparatus as claimed in any of claims 1-3, wherein said set of interpolated down-sampled reference frame blocks is determined with reference to a null motion vector.
  7. A data processing apparatus as claimed in any of claims 1-3, wherein said set of interpolated down-sampled reference frame blocks is determined with reference to at least one predetermined motion vector of at least one neighbouring source block.
  8. A data processing apparatus as claimed in claim 7, wherein said at least one predetermined motion vector is a predicted motion vector for said down-sampled source block.
  9. A data processing apparatus as claimed in any preceding claim, wherein said cost function calculation circuitry and said minimisation circuitry are configured to iteratively select said set of interpolated down-sampled reference frame blocks to find a local minimum of said cost function value.
  10. A data processing apparatus as claimed in any preceding claim, wherein said down-sampled reference frame portion is approximately an order of magnitude larger than said down-sampled source block.
  11. A data processing apparatus as claimed in any preceding claim, wherein said down-sampled source block comprises a subset of pixels from said source image.
  12. A data processing apparatus as claimed in any preceding claim, wherein said down-sampled reference frame portion comprises a subset of pixels from said reference image.
  13. A data processing apparatus as claimed in any of claims 1-10, wherein said down-sampled source block comprises a filtered version of a block of said source image.
  14. A data processing apparatus as claimed in claim 13, wherein each pixel of said down-sampled source block is generated by averaging over a set of pixels of said source image.
  15. A data processing apparatus as claimed in any of claims 1-11, 13 or 14, wherein said down-sampled reference frame portion comprises a filtered version of a block of said reference image.
  16. A data processing apparatus as claimed in claim 15, wherein each pixel of said down-sampled reference frame portion is generated by averaging over a set of pixels of said reference image.
  17. A data processing apparatus as claimed in any preceding claim, wherein said interpolation circuitry is configured to perform 1/4 pixel interpolation.
  18. A data processing apparatus as claimed in any preceding claim, wherein said estimate motion vector is constrained to have a maximum length of 64 pixels.
  19. A data processing apparatus as claimed in any preceding claim, wherein said cost function value is calculated from a sum of absolute differences (SAD) between pixels of said down-sampled source block and pixels of each of said set of interpolated down-sampled reference frame blocks.
  20. A data processing apparatus as claimed in claim 19, wherein said cost function value further comprises a motion vector penalty value.
  21. A data processing apparatus as claimed in any of claims 1-19, wherein said cost function value is calculated based on one of a sum of absolute transformed difference (SATD) algorithm, a sum of square error (SSE) algorithm, a mean square error (MSE) algorithm, a mean absolute error (MAE) algorithm, and a mean absolute difference (MAD) algorithm.
  22. A method of generating an estimate motion vector comprising the steps of: receiving a down-sampled source block generated from a source frame and receiving a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames; interpolating between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; calculating a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; selecting an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and generating an estimate motion vector in dependence on said interpolated down-sampled reference frame block thus selected.
  23. A data processing apparatus configured to receive a down-sampled source block generated from a source frame and to receive a down-sampled reference frame portion generated from a reference frame, said reference frame and said source frame being taken from a sequence of video frames, said data processing apparatus comprising: interpolation means for interpolating between pixels of said down-sampled reference frame portion to generate a set of interpolated down-sampled reference frame blocks; cost function calculation means for calculating a cost function value indicative of a difference between said down-sampled source block and each of said set of interpolated down-sampled reference frame blocks; minimisation means for selecting an interpolated down-sampled reference frame block which corresponds to a minimum of said cost function value; and estimate motion vector generation means for generating an estimate motion vector in dependence on said interpolated down-sampled reference frame block selected by said minimisation means.
  24. A data processing apparatus substantially as described herein with reference to the accompanying figures.
  25. A method of generating an estimate motion vector substantially as described herein with reference to the accompanying figures.
GB0911050.3A 2009-06-25 2009-06-25 Motion vector estimator Active GB2471323B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB0911050.3A GB2471323B (en) 2009-06-25 2009-06-25 Motion vector estimator
JP2010143839A JP2011010304A (en) 2009-06-25 2010-06-24 Motion vector estimator
US12/801,789 US9407931B2 (en) 2009-06-25 2010-06-25 Motion vector estimator
CN201010253443.3A CN101938652B (en) 2009-06-25 2010-06-25 Motion vector estimator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0911050.3A GB2471323B (en) 2009-06-25 2009-06-25 Motion vector estimator

Publications (3)

Publication Number Publication Date
GB0911050D0 GB0911050D0 (en) 2009-08-12
GB2471323A true GB2471323A (en) 2010-12-29
GB2471323B GB2471323B (en) 2014-10-22

Family

ID=41008284

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0911050.3A Active GB2471323B (en) 2009-06-25 2009-06-25 Motion vector estimator

Country Status (4)

Country Link
US (1) US9407931B2 (en)
JP (1) JP2011010304A (en)
CN (1) CN101938652B (en)
GB (1) GB2471323B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013059470A1 (en) * 2011-10-21 2013-04-25 Dolby Laboratories Licensing Corporation Weighted predictions based on motion information
US9407931B2 (en) 2009-06-25 2016-08-02 Arm Limited Motion vector estimator

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8817878B2 (en) * 2007-11-07 2014-08-26 Broadcom Corporation Method and system for motion estimation around a fixed reference vector using a pivot-pixel approach
GB2484969B (en) * 2010-10-29 2013-11-20 Canon Kk Improved reference frame for video encoding and decoding
US20120169845A1 (en) * 2010-12-30 2012-07-05 General Instrument Corporation Method and apparatus for adaptive sampling video content
US9781418B1 (en) * 2012-06-12 2017-10-03 Google Inc. Adaptive deadzone and rate-distortion skip in video processing
US9179155B1 (en) 2012-06-14 2015-11-03 Google Inc. Skipped macroblock video encoding enhancements
GB2496015B (en) * 2012-09-05 2013-09-11 Imagination Tech Ltd Pixel buffering
JP5890794B2 (en) * 2013-02-28 2016-03-22 株式会社東芝 Image processing device
JP6336341B2 (en) 2014-06-24 2018-06-06 キヤノン株式会社 Imaging apparatus, control method therefor, program, and storage medium
US10368073B2 (en) * 2015-12-07 2019-07-30 Qualcomm Incorporated Multi-region search range for block prediction mode for display stream compression (DSC)
CN108848380B (en) 2018-06-20 2021-11-30 腾讯科技(深圳)有限公司 Video encoding and decoding method, device, computer device and storage medium
CN108848376B (en) * 2018-06-20 2022-03-01 腾讯科技(深圳)有限公司 Video encoding method, video decoding method, video encoding device, video decoding device and computer equipment
US11234017B1 (en) * 2019-12-13 2022-01-25 Meta Platforms, Inc. Hierarchical motion search processing

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070019738A1 (en) * 2005-07-20 2007-01-25 Chao-Tsung Huang Method and apparatus for cost calculation in decimal motion estimation

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11122624A (en) * 1997-10-16 1999-04-30 Matsushita Electric Ind Co Ltd Method and system for reducing video decoder processing amount
KR20040069865A (en) * 2003-01-30 2004-08-06 삼성전자주식회사 Device and method for extending character region-of-content of image
JP2004236023A (en) * 2003-01-30 2004-08-19 Matsushita Electric Ind Co Ltd Motion vector detecting apparatus and method therefor
CN100353768C (en) * 2003-11-26 2007-12-05 联发科技股份有限公司 Motion estimating method and device in video compression system
KR100631777B1 (en) 2004-03-31 2006-10-12 삼성전자주식회사 Method and apparatus for effectively compressing motion vectors in multi-layer
US8374238B2 (en) 2004-07-13 2013-02-12 Microsoft Corporation Spatial scalability in 3D sub-band decoding of SDMCTF-encoded video
JP2006197387A (en) * 2005-01-14 2006-07-27 Fujitsu Ltd Motion vector retrieving device and motion vector retrieving program
US20060222074A1 (en) * 2005-04-01 2006-10-05 Bo Zhang Method and system for motion estimation in a video encoder
US20070110159A1 (en) 2005-08-15 2007-05-17 Nokia Corporation Method and apparatus for sub-pixel interpolation for updating operation in video coding
US20070160288A1 (en) * 2005-12-15 2007-07-12 Analog Devices, Inc. Randomly sub-sampled partition voting (RSVP) algorithm for scene change detection
JP2007235333A (en) * 2006-02-28 2007-09-13 Victor Co Of Japan Ltd Motion vector detector
JP4641995B2 (en) * 2006-09-22 2011-03-02 パナソニック株式会社 Image encoding method and image encoding apparatus
GB2471323B (en) 2009-06-25 2014-10-22 Advanced Risc Mach Ltd Motion vector estimator



Also Published As

Publication number Publication date
GB0911050D0 (en) 2009-08-12
CN101938652B (en) 2015-07-22
CN101938652A (en) 2011-01-05
US9407931B2 (en) 2016-08-02
US20100329345A1 (en) 2010-12-30
GB2471323B (en) 2014-10-22
JP2011010304A (en) 2011-01-13

Similar Documents

Publication Publication Date Title
US9407931B2 (en) Motion vector estimator
KR100582856B1 (en) Motion estimation and motion-compensated interpolation
KR100441509B1 (en) Apparatus and method for transformation of scanning format
EP2422509B1 (en) Object tracking using momentum and acceleration vectors in a motion estimation system
CN112954328B (en) Encoding and decoding method, device and equipment
JP4837615B2 (en) Image processing method and image processing apparatus
TW201146011A (en) Bi-directional, local and global motion estimation based frame rate conversion
KR100994773B1 (en) Method and Apparatus for generating motion vector in hierarchical motion estimation
US8532409B2 (en) Adaptive motion search range determining apparatus and method
JP2008538433A (en) Video processing using region-based multipath motion estimation and temporal motion vector candidate update
US20100182511A1 (en) Image Processing
US20100315550A1 (en) Image frame interpolation device, image frame interpolation method, and image frame interpolation program
JP2022530172A (en) Intercoding for adaptive resolution video coding
JPH09233477A (en) Motion vector generating method
US7505636B2 (en) System and method for two-pass interpolation for quarter-pel motion compensation
US20060098886A1 (en) Efficient predictive image parameter estimation
US8279936B1 (en) Method and apparatus for fractional pixel expansion and motion vector selection in a video codec
US9106926B1 (en) Using double confirmation of motion vectors to determine occluded regions in images
JP5118087B2 (en) Frame rate conversion method, frame rate conversion device, and frame rate conversion program
JP2004129099A (en) Motion vector searching method and device
US8724703B2 (en) Method for motion estimation
JP3513277B2 (en) Video encoding device and video decoding device
JP2004064518A (en) Moving image encoding method and device and its computer program
JP6059899B2 (en) Frame interpolation apparatus and program
US20040120402A1 (en) Motion estimation apparatus for image data compression