GB2575672A - Motion estimation in video - Google Patents

Motion estimation in video

Info

Publication number
GB2575672A
Authority
GB
United Kingdom
Prior art keywords
image
motion vector
estimated
fine
tile
Prior art date
Legal status
Granted
Application number
GB1811818.2A
Other versions
GB201811818D0 (en), GB2575672B (en)
Inventor
Michael James Knee
Current Assignee
Snell Advanced Media Ltd
Original Assignee
Snell Advanced Media Ltd
Application filed by Snell Advanced Media Ltd filed Critical Snell Advanced Media Ltd
Priority to GB1811818.2A priority Critical patent/GB2575672B/en
Publication of GB201811818D0 publication Critical patent/GB201811818D0/en
Publication of GB2575672A publication Critical patent/GB2575672A/en
Application granted granted Critical
Publication of GB2575672B publication Critical patent/GB2575672B/en
Status: Active

Classifications

    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • H04N5/145 Movement estimation
    • H04N19/139 Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/521 Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
    • H04N5/14 Picture signal circuitry for video frequency region
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20021 Dividing image into blocks, subimages or windows

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Estimating a motion vector associated with an image that is part of a series of images (i.e. a video sequence), comprising: processing 104 the (original or ‘fine’) image 102 to form a coarse image 106 and a fine image 102 (and optionally one or more intermediate images, where each step is performed between the coarse and intermediate, and then the intermediate and fine, images, respectively); dividing the fine image 102 into two or more tiles, including a first tile and a second tile (Fig. 7), which cover substantially the whole area of the fine image 102; estimating 120 a coarse motion vector associated with the coarse image 106 or a sub-region thereof; estimating 162 a first fine motion vector associated with the first tile or a sub-region thereof (and estimating a second fine motion vector associated with the second tile or a sub-region thereof, in parallel with one another); determining whether the magnitude of the estimated coarse motion vector is greater than the magnitude of the maximum motion vector that can be detected by estimating the first fine motion vector within the spatial constraints of the first tile; and discarding the first fine motion vector if the magnitude of the estimated coarse motion vector is greater than the magnitude of the maximum motion vector of the first tile. The invention appears to relate to video coding.

Description

Motion Estimation in Video
Field of Invention
The present invention is in the field of motion estimation in video.
Background
Precise and robust motion estimation is computationally intensive and, especially if real-time performance is demanded, requires large resources. These issues are compounded when motion vectors are used in a real-time motion compensated process. In some instances, a motion compensated process - such as motion compensated standards conversion - will require access to the whole picture to produce any particular output pixel. Where the - for example - standards conversion process has a write-side architecture, the whole image needs to be processed before the value of a given output pixel can be determined.
The problem of complexity of course worsens with increasing image resolution.
A solution to the general problem of increasing processing speed in a complex process would be to divide the process into parallel streams. Parallelisation in video processing presents special difficulties.
Summary
There is set out herein a method of estimating a motion vector associated with an image that is part of a series of images. The method may comprise the steps of processing the image to form a coarse image, and a fine image, and optionally one or more intermediate images, and dividing the fine image into two or more tiles including a first tile, and a second tile, which cover substantially the whole area of the fine image. The method may further comprise estimating a coarse motion vector associated with the coarse image or a sub-region thereof, estimating a first fine motion vector associated with the first tile or a sub-region thereof, and estimating a second fine motion vector associated with the second tile or a sub-region thereof, in parallel with one another. The method may also comprise determining whether the magnitude of the estimated coarse motion vector is greater than the magnitude of a maximum motion vector that can be detected by estimating the first fine motion vector within the spatial constraints of the first tile and discarding the first fine motion vector if the magnitude of the estimated coarse motion vector is greater than the magnitude of the maximum motion vector of the first tile.
The method may include, in addition to discarding the first fine motion vector if the estimated coarse motion vector is greater than the maximum motion vector of the first tile, using the estimated coarse motion vector for both the coarse image or sub-region thereof and the first tile or sub-region thereof of the fine image.
The method may include determining whether the estimated first fine motion vector is a plausible refinement of the estimated coarse motion vector and, if it is, discarding the estimated coarse motion vector.
The method may comprise determining whether the estimated first fine motion vector is close to the maximum motion vector value, and if so discarding the estimated first fine motion vector.
The method may comprise determining whether the images in the image series before and after the image indicate that a motion vector associated with the image or a sub-region thereof should have an approximate value, and using this approximate value as the estimated motion vector associated with the image or sub-region thereof if the estimated motion vector of the image differs significantly from this approximate value.
The method may comprise performing each of the determination steps on the one or more intermediate images. Further, the method may comprise dividing the one or more intermediate images into a first intermediate tile and a second intermediate tile, wherein the estimation of the first and second fine motion vectors and the estimation of the first and second intermediate motion vectors associated with the first and second intermediate tiles are done in parallel.
The method may comprise the first tile and the second tile of the fine image overlapping so that an area of the fine image is covered by both the first tile and the second tile.
In the method, the processing of the image to form a coarse image may be sub-band processing. Moreover, in the method the sub-band processing may be performed through the use of a low-pass filter.
The method may comprise using the estimated motion vectors associated with each image or sub-region thereof to motion compensate each image or sub-region thereof, and assimilating the coarse and fine images together to form a motion compensated image.
The method may comprise detecting a cut in the image series and not determining a motion vector associated with the image directly after the cut.
The method may comprise detecting the presence of image bars in the image and tiling the image so that the image bars are not wholly covered by the tiles. In the method the image bars may be top and bottom image bars, and/or left and right image bars.
The method may comprise detecting if identical images are repeated one after the other in the image sequence, and ignoring the identical images in the estimation of the motion vectors.
In the method, the estimated motion vector of the coarse layer may be estimated first, and if this is above the maximum motion vector of the first tile, the first fine motion vector is not estimated but is instead given the value of the estimated coarse motion vector.
There is also described an apparatus configured to perform any of the methods set out herein.
There is also described a computer-implemented product configured to perform the method described herein.
Brief Description of the Figures
Figure 1 illustrates an overview of a method of motion compensation.
Figure 2 shows an apparatus that may be used for motion compensation.
Figure 3 illustrates a flow diagram showing the steps that may be used in motion estimation.
Figure 4 illustrates a flow diagram of a further method step that may be used in motion estimation.
Figure 5 illustrates a flow diagram of a further method step that may be used in motion estimation.
Figure 6 illustrates a flow diagram of a further method step that may be used in motion estimation.
Figure 7 shows an exemplary tiling arrangement of an image to be processed.
Figure 8 shows an exemplary apparatus that may be used to perform the method as described herein.
Detailed Description of the Figures
The Figures described herein are merely exemplary, and aspects of the invention are set out in the appended claims.
Figure 1 shows one embodiment of a method of motion compensation. This may for example be implemented by a standards converter or a temporal interpolator.
On the left side of Figure 1, each image is processed to form a series of layers, each of these layers represents the image at different resolutions.
The layers may be formed by downsampling the image in the layer above (for example to form an image in layer 1, the image in layer 0 is downsampled). The downsampling may be by a factor of two in each direction (e.g. in both the x-direction and the y-direction). Image layer 0 may correspond with the original image, and layer 3 in this case is the most downsampled layer (however, in some examples further downsampled layers may be produced). The furthest downsampled layer may be referred to as the base layer.
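The downsampling chain described above can be sketched in a few lines. This is an illustrative Python sketch rather than the patent's implementation: a plain 2x2 block average stands in for whatever low-pass filter an actual downsampler would use, and the name `build_pyramid` is an assumption made here for illustration.

```python
import numpy as np

def build_pyramid(image, levels=3):
    """Build a coarse-to-fine pyramid by repeated factor-2 downsampling.

    Layer 0 is the original ('fine') image; each subsequent layer is
    downsampled by two in both the x- and y-directions. The last layer
    returned corresponds to the base (most downsampled) layer.
    """
    layers = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        prev = layers[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2)
        layers.append(blocks.mean(axis=(1, 3)))  # average each 2x2 block
    return layers
```

Each call to `mean` here halves the resolution, so an 8x8 input with `levels=3` yields layers of shape 8x8, 4x4, 2x2 and 1x1.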
The next step may be parallel motion estimation 162, 122, 120. This is done for each of the layers 102, 106, 110, which in this case excludes the base layer; however, in some embodiments the motion estimation of the base layer may also be performed in parallel.
The parallel motion estimator estimates the motion vectors associated with each of the images or sub-regions thereof. This may be done by any normal means. For example, each of the layers may be compared with the equivalent layer of the previous image in the image sequence to determine differences between these layers. These differences can be used to estimate the motion vectors associated with each image.
For example, one technique that may be used is block matching, or a variant thereof. An alternative technique may be phase correlation.
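As an illustration of the block-matching variant mentioned, the following Python sketch performs an exhaustive sum-of-absolute-differences (SAD) search for a single block; the block size, search range and function name are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def block_match(prev, curr, top, left, block=8, search=4):
    """Estimate a displacement for one block by exhaustive SAD search.

    Returns the offset (dx, dy) into the previous image at which the block
    of the current image, anchored at (top, left), matches best.
    """
    ref = curr[top:top + block, left:left + block].astype(float)
    best, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue  # candidate block would fall outside the image
            cand = prev[y:y + block, x:x + block].astype(float)
            sad = np.abs(ref - cand).sum()  # sum of absolute differences
            if best is None or sad < best:
                best, best_vec = sad, (dx, dy)
    return best_vec
```

In practice the same search would be run for every block (or feature) of a layer, and phase correlation could be substituted without changing the surrounding method.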
Figure 1 shows that the images created through downsampling are then processed in order to isolate features, such that each feature appears only in one layer, and so that each layer of the image contains distinct features.
These features may be used as blocks in block matching. These features are indicative of the detail in each image. In order to isolate these features Figure 1 shows that an image may be further downsampled 104, 108, 112, and then upsampled 124, 132, 138. The downsampling and upsampling may be performed using the same factor in all directions. This creates an image with all of the details of the layer before it was downsampled, except the fine details are present only in the pre-downsampling layer and not in the layers below (for example a fine image may contain the features of a person’s facial expressions not captured in a coarser image). A subtractor may then be used to find the differences between the original layer and the processed (downsampled and then upsampled) layer. The output of the subtractor may then be the features of the image only contained in the original layer. For example image 106 is in layer 1. To identify the features that are present only in layer 1, but not in any of the layers below, the image 106 is downsampled 108, and then upsampled 132 to form the processed image 130. This is then compared with the original image 106 in subtractor 126. The features identified through this comparison are then output to the parallel picture builder 128. These identified features are the features only contained in layer 1, and not in layers 2 or 3.
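The downsample-then-upsample-then-subtract step (e.g. blocks 108 and 132 feeding subtractor 126) can be sketched as follows. This is an illustrative Python sketch in which a 2x2 block average and pixel-repetition upsampling stand in for the actual filters used.

```python
import numpy as np

def isolate_details(layer):
    """Keep only the detail present in this layer and not in the layer below.

    Downsample by two (2x2 block average), upsample back by pixel
    repetition, and subtract; what survives is the fine detail that the
    coarser layer cannot represent.
    """
    h, w = layer.shape[0] // 2 * 2, layer.shape[1] // 2 * 2
    cropped = np.asarray(layer[:h, :w], dtype=float)
    coarse = cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    upsampled = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return cropped - upsampled  # zero wherever the coarse layer already captures the content
```

For a flat image the result is zero everywhere, since the coarse layer already captures everything; only genuinely fine structure survives the subtraction.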
The parallel picture builder 128 uses the motion estimated by parallel motion estimator 120, and the features identified to only belong in layer 1 in subtractor 126, to build the picture of the image of layer 1. This process may be performed for all of the layers in parallel at the same time. This may reduce the latency of the motion estimation process.
The final stage may comprise building the new image. The base image layer goes through a motion estimator and picture builder element 140. This estimates the motion associated with the features of the base image. The picture builder then builds the image of the base image layer based on the estimated motion. This image is then upsampled 158 to form image 156.
Image 156 is added to the picture of the layer above (in this case layer 2) in adder 154. This forms composite image 150. This process of adding the images together continues until all of the image layers have been added together to form the output image. This image is then the motion compensated image. This output creates an image in which each layer of the image has been processed such that the estimated motion has been accounted for, however Figure 1 does not show a mechanism whereby the estimated motion vectors of the different layers are compared so that some of the estimated motion vectors can be discarded.
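The adder chain just described (upsample the base layer, add the detail picture of the layer above, and repeat) can be sketched as below; again an illustrative Python sketch, with pixel-repetition upsampling assumed in place of a real upsampling filter.

```python
import numpy as np

def rebuild(base, detail_layers):
    """Rebuild the output image from a base layer plus per-layer detail images.

    Starting from the base (coarsest) layer, repeatedly upsample by two and
    add the detail image of the layer above, mirroring the adder chain of
    Figure 1. detail_layers is ordered fine-to-coarse (layer 0 first).
    """
    image = np.asarray(base, dtype=float)
    for detail in reversed(detail_layers):  # coarsest detail first
        image = image.repeat(2, axis=0).repeat(2, axis=1) + detail
    return image
```

With the 2x2 block-average downsampler and matching detail images of the previous sketch, this chain reconstructs the original image exactly, since each detail image is precisely what the upsampled coarse layer is missing.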
Figure 2 shows an apparatus that may be used for motion compensation. Apparatus 800 is formed from downsampling module 804, upsampling module 810, parallel motion estimator module 816, subtractor module 812, parallel picture builder 818, base motion estimator and picture builder module 822 and adder module 824.
An input image series 802 is input to downsampling module 804. Downsampling module 804 may be configured to downsample the images such that an original image layer is output, along with a layer that has been downsampled. The downsampled layer may be fed back into the downsampling module via feedback loop 806. This may produce any number of further downsampled layers. These are output as layers 808. Layers 808 show image layers 0, 1, 2 and 3. Image layer 0 may correspond with the original image, and layer 3 in this case is the most downsampled layer (however, in some examples further downsampled layers may be produced). This may be referred to as the base layer. The downsampling may be by a factor of two in each direction (e.g. in both the x-direction and the y-direction). Each of the downsampled layers will have a lower resolution than the layers that are less downsampled. Therefore details may be lost in each downsampling, such that a downsampled layer will contain all of the information of the layer above, with the exception of details lost through downsampling.
The downsampled layers are used in several apparatus modules. The data associated with each layer may enter the parallel motion estimator module 816. This module may estimate the motion vectors associated with each layer. Each motion vector may be associated with a sub-region of a layer, such as associated with an identified feature or block. This may be done by comparing the layers with the corresponding layers of images that precede or follow the image for which motion is being compensated, in the image series.
The parallel motion estimator module may compare the motion vectors that are produced for each layer (as well as with the motion vectors produced by the base motion estimator and picture builder module 822 for the base layer via data channel 832). These motion vectors may be compared as discussed with reference to Figures 3 to 6, to decide which motion vectors to use. Alternatively this operation may be carried out by parallel picture builder module 818.
Additionally, layers 1 to 3 may be fed into the upsampling module 810; only the layer corresponding with the original image is not upsampled. The upsampling may be by the same factor as used by the downsampling module 804. This produces layers with the same resolution as the layer above in the sequence (for example, upsampling layer 1 produces layer 0*), but without the details that were lost during downsampling.
The downsampled layers 808 (with the exception of the base layer) and the upsampled layers 814 are entered into subtractor module 812. Subtractor module 812 determines the differences between the downsampled layers 808 and the upsampled layers 814. The differences between layer 0 and layer 0* are the details that were lost in the downsampling process that produced layer 1. Therefore the result of the subtraction of layer 0 - layer 0* should result in just these details being left. These details are output for every layer to parallel picture builder module 818.
The parallel picture builder uses the details output for each layer by the subtractor module 812 and the estimated motion vectors associated with each layer that are output by parallel motion estimator module 816 to produce a picture associated with each layer (showing the details associated with each layer). These pictures are output as output layers 820.
The base layer may be processed in the same manner as the other layers. However, in some examples, such as the one shown in Figure 2, the base layer is treated differently. Layer 3 of layers 808 is output to base motion estimator and picture builder module 822. This estimates the motion vectors associated with layer 3 and outputs a picture associated with the features of layer 3. As layer 3 is the base layer all of the details of layer 3 are used in the production of the output picture. Therefore each motion vector generated may be associated with one or more pixels (that may form a feature or block) of layer 3. The estimated motion vectors may be compared via data channel 832 with the estimated motion vectors of the other layers in accordance with the teachings of Figures 3 to 6 to decide which estimated motion vectors to use in the production of the output layer 826.
The output layers associated with all of the layers are entered into adder 824 which adds the output layers together to form an output image. This image is motion compensated and is output 830. This may be output to a data store, to a live video/image stream, or to an external device.
The apparatus of Figure 2 may comprise an apparatus module for each module shown. Alternatively, multiple modules may be incorporated into a single apparatus module. As another alternative, the apparatus may comprise a computer device comprising a memory store, a processor and an interface with external networks such as the internet, and the computer device may be configured to implement the apparatus shown in Figure 2.
Figure 3 shows a process of deciding which motion vectors to use for the layers when building the image. This process may be used before the parallel picture builder is used in Figure 1, such that the images built are built according to desired estimated motion vectors.
The first step 202 shown in Figure 3 corresponds to the process shown in Figure 1 of producing a first layer and a second layer. In the case of Figure 3 these are labelled the coarse image and fine image and are generalizable to any two layers of Figure 1. The coarse image is the layer that is lower in the layer structure. For example, relative to layer 1, layer 2 would be regarded as coarse. In this example, layer 1 would be the fine image. Whereas with regard to layer 0 of Figure 1, layer 1 would be regarded as the coarse image, and layer 0 would be regarded as the fine image. If the fine image is the finest image (i.e. it is the original image) the processing to produce the fine layer may consist solely of identifying the original image as the fine layer, without the need for downsampling. Alternatively if the fine image is any other image (that is not the original image) downsampling may be required in the processing step to form the fine image.
Step 204 recites dividing the fine image into a first tile and a second tile, which together cover substantially the whole area of the fine image. The tiling may allow the fine image to be processed without significantly increasing the processing power required to do so. However, the tiling does limit the maximum motion that may be detected in a tile. The maximum motion that may be estimated is measured in pixels per frame. A tile consists of a central area as well as an overlapping portion (as shown by the rectangles and the dashed lines U and V in Figure 7). The maximum motion that may be reliably detected for a motion vector originating inside the central area is the width (in pixels) of the overlapping region per frame. This is because if a motion vector originates at the border of the central area and points directly towards the edge of the overlapping portion, the width of the overlapping portion is the furthest displacement that may be measured within the tile. Therefore, if the motion is above this threshold, the estimated motion derived from the fine image may not be accurate. It is noted that the larger the overlapping portion, the larger the processing time, and it may therefore be advantageous to limit the size of the overlapping portion. The present method allows the overlapping portion to be minimised without causing the image to be compiled incorrectly from flawed estimated motion vectors.
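The overlapping tiling of Figure 7 can be sketched as follows; a minimal Python illustration in which the tile counts, the overlap width and the function name are assumed parameters for the sake of example rather than values from the patent.

```python
def make_tiles(width, height, n_x, n_y, overlap):
    """Divide an image into n_x * n_y overlapping tiles (cf. Figure 7).

    Each tile is its central area grown by `overlap` pixels on every side,
    clipped to the image, so neighbouring tiles share a band 2 * overlap
    pixels wide. Returned as (x0, y0, x1, y1) bounds per tile; together
    the tiles cover the whole image.
    """
    tiles = []
    for j in range(n_y):
        for i in range(n_x):
            # central area of this tile
            cx0, cx1 = i * width // n_x, (i + 1) * width // n_x
            cy0, cy1 = j * height // n_y, (j + 1) * height // n_y
            # grow by the overlap, clipped to the image bounds
            tiles.append((max(cx0 - overlap, 0), max(cy0 - overlap, 0),
                          min(cx1 + overlap, width), min(cy1 + overlap, height)))
    return tiles
```

Under this sketch the maximum reliably detectable motion for a vector originating in a tile's central area is simply `overlap` pixels per frame, which is the quantity the determination step below compares against.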
The next step 206 in the method shown in Figure 3 is to estimate one or more motion vectors associated with the coarse image. This is shown in Figure 1 as the motion estimation block. This may either be done in parallel or before the motion estimation of the fine image. This may comprise identifying pixels, blocks or features, or sub-regions of the coarse image and comparing these to corresponding pixels, blocks, features or sub-regions in a corresponding coarse layer of a preceding image. The difference in position of the pixels, blocks, sub-regions or features may then be used to estimate the motion vector associated with each pixel, block, feature or sub-region. Alternatively phase correlation rather than block matching may be used.
The step 210 recites estimating one or more motion vectors associated with the first tile, and estimating a motion vector associated with the second tile, in parallel with one another. This step estimates the motion for the fine image, by estimating the motion for both the first and second tiles. It is noted that there may be any number of tiles, for example, four, six, or ten that form the fine image. The method of motion estimation may be block matching, or phase correlation.
Step 212 recites determining whether the estimated motion vectors associated with the coarse image (or a sub-region thereof) are above the maximum motion vector that can be detected by estimating the motion vectors associated with either the first tile or the second tile (due to the spatial constraints of the tiles). This step may be performed before the motion is estimated for the first tile and second tile; in that case, if the estimated coarse motion is above the maximum motion vector that can be detected for the first and second tiles, the motion estimation for the first and second tiles may simply not be performed. However, as shown in Figure 2, this step can also be performed after the estimation for the first and second tiles. If some of the motion vectors associated with the coarse image are above the maximum value for the fine image, but other motion vectors associated with the coarse image are not, the motion of the fine image may still be estimated, but the motion vectors of the fine image associated with those of the coarse image above the maximum value may be discarded. Optionally, once discarded, the coarse image vectors may be used instead. This may be the case, for example, if the coarse image is the base layer as shown in Figure 1.
The final step shown in Figure 3 recites discarding the estimated motion vectors associated with the first tile or second tile of the fine image if the estimated motion vectors associated with the coarse image are above the maximum motion vector that can be detected by estimating the motion vectors associated with the first tile or second tile. This may be achieved by giving the fine image zero weighting in compiling the output image as shown in Figures 1 and 2.
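The decision of this step can be illustrated with a small Python sketch. It assumes, for simplicity, that the coarse vector has already been scaled into fine-image pixel units, and it takes the optional branch of substituting the coarse vector rather than merely discarding the fine one.

```python
import math

def choose_vector(coarse_vec, fine_vec, overlap_px):
    """Keep the fine vector only if the coarse estimate fits within the tile.

    overlap_px is the tile's overlap width in pixels, i.e. the largest
    motion (pixels/frame) the tile can reliably measure. If the coarse
    magnitude exceeds it, the fine vector is discarded and the coarse
    vector used in its place.
    """
    coarse_mag = math.hypot(*coarse_vec)
    if coarse_mag > overlap_px:
        return coarse_vec  # fine estimate is untrustworthy at this magnitude
    return fine_vec
```

The same comparison, applied per block or per sub-region, yields the mixed outcome described above, where some fine vectors of a tile survive while others are replaced.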
Optionally, instead of merely discarding the estimated motion vectors associated with the fine image, these vectors may be replaced by the estimated coarse motion vectors. By using the estimated motion vectors of the coarse image for the fine image, the limit on the maximum motion that can be detected within the first and second tiles is overcome. This means that all of the images (or layers as shown in Figure 1), such as the fine image and the coarse image, may use the same estimated motion vectors, so that each layer is processed to create a consistent image across all of the layers.
The method may also be performed on multiple layers, with a coarse image, a fine image and an intermediate image used. More images may also be used. For example, layers 0, 1 and 2 may be used as the fine, intermediate and coarse images. The method may proceed in the same manner as described, but with each step being carried out between the coarse and the intermediate image, and then between the fine and the intermediate image. If the estimated motion vectors associated with the fine image are used instead of those associated with the intermediate image, the fine image vectors will also be used for the coarse image. Likewise, if the estimated coarse motion vector is used, both the fine and intermediate images will use this value.
It is noted that in Figures 1 and 2 the layers are shown as being single images. The layers of Figure 1 and 2 may be comprised of tiles as discussed with relevance to Figure 3, and as shown in Figure 7.
Figure 4 shows an alternative or additional method to be used in determining the estimated motion vectors to use for the image. This method may be used in conjunction with the methods of Figures 1, 2 and 3.
Step 302 recites estimating one or more motion vectors associated with the coarse image, and with the first tile of the fine image (or sub-regions thereof). In this Figure the coarse image and the fine image have the same meanings as described with reference to Figure 3.
Step 304 recites determining whether the estimated motion vectors of the first tile are a plausible refinement of the estimated motion vectors of the coarse image. A plausible refinement in this case may mean that the estimated motion vectors of the first tile do not differ significantly from the estimated motion vectors associated with the coarse image. As the fine image, and so the first tile, has a finer resolution the estimated motion vectors of the fine image may be more accurate than those of the coarse image, with its poorer resolution, and so higher errors. If the vectors are in different directions, or if they are of significantly different scales then this may be an indication that the estimated motion vectors of the first tile are not a plausible refinement of the estimated motion vector of the coarse image.
Step 306 recites discarding the estimated motion vectors associated with the coarse image. This may be achieved by giving the coarse image zero weighting when compiling the output image in the manner shown in Figures 1 and 2. This allows the superior resolution of the fine image to be used for the output image, so that the most accurate estimation of motion may be used. It is noted that some of the estimated motion vectors of the fine image may be regarded as plausible refinements, whilst others may not. In this case only those estimated motion vectors that are not plausible refinements are discarded, whilst those vectors identified as plausible refinements are not. For example, some estimated motion vectors associated with one set of blocks may be determined as being plausible refinements of the equivalent coarse image motion vectors. These estimated fine motion vectors may be used.
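A minimal sketch of the plausibility test of steps 304 and 306 is given below. The angle and magnitude-ratio thresholds are illustrative assumptions; the patent only requires that the two vectors do not differ significantly in direction or scale:

```python
import math

def is_plausible_refinement(coarse_v, fine_v,
                            max_angle_deg=45.0, max_scale_ratio=2.0):
    """Decide whether a fine-layer vector plausibly refines a coarse-layer
    vector (both expressed in the same units, e.g. fine-layer pixels).
    The thresholds are illustrative choices, not taken from the patent."""
    cx, cy = coarse_v
    fx, fy = fine_v
    mag_c = math.hypot(cx, cy)
    mag_f = math.hypot(fx, fy)
    if mag_c < 1e-9 or mag_f < 1e-9:
        # Near-zero motion: plausible only if both vectors are small.
        return mag_c < 1.0 and mag_f < 1.0
    # Direction test: angle between the two vectors.
    cos_a = (cx * fx + cy * fy) / (mag_c * mag_f)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    if angle > max_angle_deg:
        return False
    # Scale test: magnitudes within a bounded ratio of one another.
    ratio = max(mag_c, mag_f) / min(mag_c, mag_f)
    return ratio <= max_scale_ratio
```

If the test passes, the coarse vector may be given zero weighting (discarded) in favour of the finer estimate, as step 306 describes; the test can be run per block so that only implausible vectors are affected.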
Optionally, the discarded estimated motion vectors may be replaced with estimated motion vectors associated with the first tile, if the estimated motion vectors of the first tile are a plausible refinement.

Figure 5 shows an alternative or additional method to be used in determining the estimated motion vectors to use for the image. This method may be used in conjunction with the methods shown in Figures 1, 2, 3 and 4.
Step 402 recites estimating one or more motion vectors associated with the coarse image, and the first tile of the fine image. This estimation may be performed using block matching, or phase correlation, or any other method of motion estimation. The coarse image and fine image have the same meaning as described above with reference to Figure 3.
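Block matching, one of the estimation options mentioned above, might be sketched as follows. This is an illustrative exhaustive-search implementation, not the patent's own; the block size and search range are assumed parameters:

```python
import numpy as np

def block_match(ref, cur, block_xy, block_size=8, search=4):
    """Exhaustive block matching over a +/- `search` pixel window:
    return the (dx, dy) displacement minimising the sum of absolute
    differences (SAD) between a block of `cur` and the reference image."""
    bx, by = block_xy
    block = cur[by:by + block_size, bx:bx + block_size].astype(np.float64)
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            # Skip candidate positions that fall outside the reference.
            if x < 0 or y < 0 or x + block_size > ref.shape[1] \
                    or y + block_size > ref.shape[0]:
                continue
            cand = ref[y:y + block_size, x:x + block_size].astype(np.float64)
            sad = np.abs(block - cand).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```

The `search` parameter here is what bounds the maximum motion vector that can be measured, which is the quantity the tests of Figures 3 and 5 compare against.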
Step 404 recites determining if the estimated motion vectors associated with the first tile (or sub-regions thereof) are close to the maximum value that can be measured for the first tile. “Close to” may mean determining whether the estimated motion vectors associated with the first tile have a value in a region below the maximum value that can be measured, but nevertheless at a level at which the estimation may not be accurate. For example, if the maximum value that may be measured is 100 pixels per frame, then a measurement between 90 and 100 pixels per frame is below the theoretical maximum motion that may be estimated, but still may not represent an accurate measurement of the motion.
Step 406 recites that if any of the estimated motion vectors of the first tile are close to the maximum value, discarding those vectors. This may be achieved by giving the fine image zero weighting in compiling the output image.
Optionally, the discarded vectors may be replaced with the equivalent estimated motion vectors of the coarse image (for example, if an estimated motion vector associated with a first block of the first tile is discarded, replacing it with the estimated motion vector associated with the equivalent block of the coarse image). This may be done if the estimated motion associated with the coarse image is thought to have a lower error (despite the lower resolution of the coarse image), because the fine image estimates are close to the maximum for the fine image and so carry increased error. This may therefore reduce the error in determining the estimated motion vector for the image in this case.
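The fallback of steps 404 and 406 can be sketched as below. The 90% threshold is an illustrative assumption drawn from the 90-to-100 pixels/frame example above, and `select_vector` is a hypothetical name:

```python
def select_vector(fine_v, coarse_v, tile_max, near_fraction=0.9):
    """If the fine-layer estimate is close to the largest motion the tile
    can measure (here, within 90% of it), its value is suspect, so fall
    back to the equivalent coarse-layer estimate instead."""
    magnitude = (fine_v[0] ** 2 + fine_v[1] ** 2) ** 0.5
    if magnitude >= near_fraction * tile_max:
        return coarse_v  # discard the fine vector, use the coarse one
    return fine_v
```

In a weighted-compilation implementation, returning `coarse_v` corresponds to giving the fine image zero weighting for that block when the output image is compiled.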
Figure 6 shows an alternative or additional method to be used in determining the estimated motion vectors to use for the image. This method may be used in conjunction with the methods shown in Figures 1, 2, 3, 4 and 5. The estimated motion vectors may be determined by the use of block matching, or phase correlation.
Step 502 recites estimating one or more motion vectors associated with an image (or sub-regions thereof). This may be performed after the method of Figure 3, 4 or 5 has been performed, such that one or more estimated motion vectors for the image have been selected already. The estimated motion vectors may instead be based on another method not contained within this application.
Step 504 recites comparing the estimated motion vectors to approximate values based on other images (or corresponding layers thereof) in the image series. Specifically, estimated motion vectors associated with a sub-region of a layer may be compared with an approximate estimated motion vector derived from estimated motion vectors of the same sub-region of the same layer in other images in the image series. The image is part of an image series which may form a video file. Most movement that is captured over a period of numerous frames is relatively smooth rather than juddery, so some correlation would be expected between the images in the image series immediately before and after the image for which the motion vectors are being estimated. This may extend to images which are not immediately before or after, but are close to the image in the image series. Therefore, by comparing the estimated motion vectors to the motion vectors of those images, it may be possible to determine whether the estimated motion vectors are likely to be accurate.
Step 506 recites discarding the estimated motion vectors and replacing them with the approximate values if the values differ significantly. This may be the case, for example, where all of the nearby images in the image series, including those immediately before and after the image for which the motion is being estimated, have vectors that point in the opposite direction to the estimated motion vectors. In this case the estimated motion vectors are likely to be errors, and so these values may be discarded. In their place, approximate values may be used: a weighted mean of the motion vectors of the images in the nearby vicinity of the image series, or a weighted mean of the motion vectors of those images immediately before and after the image in the image series. It is noted that only some of the estimated motion vectors may differ significantly, whilst other estimated motion vectors may not. Only those estimated motion vectors that are significantly different from the corresponding approximate values may be replaced.
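The temporal check of steps 504 and 506 might be sketched as follows; the deviation threshold and uniform default weights are illustrative assumptions, and the patent leaves the choice of weighting open:

```python
def temporally_smoothed(vector, neighbour_vectors, weights=None,
                        max_deviation=10.0):
    """Compare an estimated vector for one sub-region against a weighted
    mean of the vectors for the same sub-region in nearby images of the
    series; if it deviates too far, replace it with that weighted mean."""
    if weights is None:
        weights = [1.0] * len(neighbour_vectors)
    total = sum(weights)
    mean_x = sum(w * v[0] for w, v in zip(weights, neighbour_vectors)) / total
    mean_y = sum(w * v[1] for w, v in zip(weights, neighbour_vectors)) / total
    dx, dy = vector[0] - mean_x, vector[1] - mean_y
    if (dx * dx + dy * dy) ** 0.5 > max_deviation:
        return (mean_x, mean_y)  # discard the outlier, use the approximation
    return vector
```

Giving the immediately adjacent images larger weights than more distant ones reflects the observation above that correlation is strongest with the frames immediately before and after.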
Figure 7 shows one example of a tiling arrangement of an image layer. In this case the tiling is in a four by four arrangement. Equally it may be in a two by one, a one by two, or any other arrangement. The tiles themselves may be square, rectangular, or otherwise shaped. The tiles may be equally sized, or may be unequally sized. It is noted that before tiling an image layer a border may be detected, and only the image portion not contained in the border may be tiled. This is an optional feature. It is also noted that the tiles may overlap with one another. For example, the tiles may be shaped as shown by tiles U and V. These tiles are centred on a rectangle, but contain an additional border that is also covered by other tiles. The rectangular portion may be referred to as the central portion, and the area outside the central portion and within the dashed lines may be referred to as the overlapping portion. These tiles are larger than the four by four interlocking rectangles. This may be advantageous to ensure consistency of motion estimation throughout the image. The information within the entirety of the tile, both the overlapping portion and the central portion, may be used to determine estimated motion vectors. However, the point from which the estimated motion vector originates must lie within the central portion of the tile. Therefore, the maximum motion that may be estimated for a tile is based on the width of the overlapping portion of the tile.
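The overlapping tile geometry described for tiles U and V can be sketched as below. The `tile_grid` name, the dictionary layout, and the choice of clipping the overlap at the image edge are assumptions for illustration:

```python
def tile_grid(width, height, cols=4, rows=4, overlap=16):
    """Divide an image layer into a cols x rows grid of central (core)
    rectangles, each extended by an `overlap` border shared with its
    neighbours and clipped at the image edge.  Vectors must originate in
    the core, so the maximum measurable motion per tile is bounded by
    the overlap width."""
    tiles = []
    core_w, core_h = width // cols, height // rows
    for r in range(rows):
        for c in range(cols):
            core = (c * core_w, r * core_h, core_w, core_h)
            x0 = max(0, c * core_w - overlap)
            y0 = max(0, r * core_h - overlap)
            x1 = min(width, (c + 1) * core_w + overlap)
            y1 = min(height, (r + 1) * core_h + overlap)
            tiles.append({"core": core,
                          "outer": (x0, y0, x1 - x0, y1 - y0),
                          "max_motion": overlap})
    return tiles
```

Interior tiles gain the overlap border on all four sides, while tiles at the image boundary are clipped, which is consistent with the dashed outlines of tiles U and V extending beyond the four by four interlocking rectangles.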
Figure 8 shows an apparatus such as a computer. This apparatus may be configured to perform the method described above. This may also be used to implement a computer implemented product configured to perform the method described.
The apparatus of Figure 8 comprises a processor 702, a memory 704, a user input/output module 706, and an external input/output module 708. The processor is configured to receive data from the memory, and to send requests to the memory for data. The user input/output module is configured such that a user may instruct the processor what to do, and the processor may send processed data to the user. Additionally, processed data may be sent to an external apparatus via the external input/output module. The external apparatus may include any piece of hardware, such as another computer, or server, or a cloud-based system. The apparatus may be configured such that the image series is stored in the memory, and a user instructs the processor to estimate the motion vectors for the image sequence. The processor may then request the data of the image sequence from the memory, and then process this data according to the methods described above. The processed data may be output to the user, and may additionally be stored in the memory.
With reference to the drawings in general, it will be appreciated that schematic functional block diagrams are used to indicate functionality of systems and apparatus described herein. It will be appreciated however that the functionality need not be divided in this way, and should not be taken to imply any particular structure of hardware other than that described and claimed below. The function of one or more of the elements shown in the drawings may be further subdivided, and/or distributed throughout apparatus of the disclosure. In some embodiments the function of one or more elements shown in the drawings may be integrated into a single functional unit.
The above embodiments are to be understood as illustrative examples. Further embodiments are envisaged. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.
In some examples, one or more memory elements can store data and/or program instructions used to implement the operations described herein. Embodiments of the disclosure provide tangible, non-transitory storage media comprising program instructions operable to program a processor to perform any one or more of the methods described and/or claimed herein and/or to provide data processing apparatus as described and/or claimed herein.
The activities and apparatus outlined herein may be implemented with fixed logic such as assemblies of logic gates or programmable logic such as software and/or computer program instructions executed by a processor. Other kinds of programmable logic include programmable processors, programmable digital logic (e.g., a field programmable gate array (FPGA), an erasable programmable read only memory (EPROM), an electrically erasable programmable read only memory (EEPROM)), an application specific integrated circuit, ASIC, or any other kind of digital logic, software, code, electronic instructions, flash memory, optical disks, CD-ROMs, DVD ROMs, magnetic or optical cards, other types of machine-readable mediums suitable for storing electronic instructions, or any suitable combination thereof.

Claims (18)

Claims
1. A method of estimating a motion vector associated with an image that is part of a series of images, comprising the steps of:
processing the image to form a coarse image, a fine image, and optionally one or more intermediate images;
dividing the fine image into two or more tiles including a first tile, and a second tile, which cover substantially the whole area of the fine image;
estimating a coarse motion vector associated with the coarse image or a sub-region thereof;
estimating a first fine motion vector associated with the first tile or a sub-region thereof, and estimating a second fine motion vector associated with the second tile or a sub-region thereof, in parallel with one another;
determining whether the magnitude of the estimated coarse motion vector is greater than the magnitude of a maximum motion vector that can be detected by estimating the first fine motion vector within the spatial constraints of the first tile, and discarding the first fine motion vector if the magnitude of the estimated coarse motion vector is greater than the magnitude of the maximum motion vector of the first tile.
2. The method of claim 1, wherein in addition to discarding the first fine motion vector if the estimated coarse motion vector is greater than the maximum motion vector of the first tile, using the estimated coarse motion vector for both the coarse image or sub-region thereof and the first tile or sub-region thereof of the fine image.
3. The method of claims 1 or 2, further comprising:
determining whether the estimated first fine motion vector is a plausible refinement of the estimated coarse motion vector;
if the estimated first fine motion vector is a plausible refinement of the estimated coarse motion vector, discarding the estimated coarse motion vector.
4. The method of any preceding claim, further comprising:
determining whether the estimated first fine motion vector is close to the maximum value of motion vector, and if so discarding the estimated first fine motion vector.
5. The method of any preceding claim, further comprising:
determining whether the images in the image series before and after the image indicate that a motion vector associated with the image or a sub-region thereof should have an approximate value; and using this approximate value as the estimated motion vector associated with the image or sub-region thereof if the estimated motion vector of the image differs significantly from this approximate value.
6. The method of any preceding claim, further comprising:
performing each of the determination steps on the one or more intermediate images.
7. The method of claim 6, comprising dividing the one or more intermediate images into a first intermediate tile and a second intermediate tile, and wherein the estimation of the first and second fine motion vectors, and the estimation of the first and second intermediate motion vectors associated with the first and second intermediate tiles, are done in parallel.
8. The method of any preceding claim, wherein the first tile and the second tile of the fine image overlap so that an area of the fine image is covered by both the first tile and the second tile.
9. The method of any preceding claim, wherein the processing of the image to form a coarse image is sub-band processing.
10. The method of claim 9, wherein the sub-band processing is performed through the use of a low pass filter.
11. The method of any preceding claim, comprising creating a motion compensated image, comprising the steps of:
using the estimated motion vectors associated with each image or sub-region thereof to motion compensate each image or sub-region thereof; and assimilating the coarse and fine images together to form a motion compensated image.
12. The method of any preceding claim, comprising:
detecting a cut in the image series and not determining a motion vector associated with the image directly after the cut.
13. The method of any preceding claim, comprising:
detecting the presence of image bars in the image and tiling the image so that the image bars are not wholly covered by the tiles.
14. The method of claim 13, wherein the image bars are top and bottom image bars, and/or left and right image bars.
15. The method of any preceding claim, comprising:
detecting if identical images are repeated one after the other in the image sequence; and ignoring the identical images in the estimation of the motion vectors.
16. The method of any preceding claim, wherein the estimated motion vector of the coarse layer is estimated first, and if this is above the maximum motion vector of the first tile, then the estimated first fine motion vector is not estimated, but is given the value of the estimated coarse motion vector.
17. An apparatus configured to perform the method steps of any of claims 1 to 16.
18. A computer implemented product configured to perform the method steps of any of claims 1 to 16.
GB1811818.2A 2018-07-19 2018-07-19 Motion estimation in video Active GB2575672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1811818.2A GB2575672B (en) 2018-07-19 2018-07-19 Motion estimation in video

Publications (3)

Publication Number Publication Date
GB201811818D0 GB201811818D0 (en) 2018-09-05
GB2575672A true GB2575672A (en) 2020-01-22
GB2575672B GB2575672B (en) 2021-11-10

Family

ID=63364418

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1811818.2A Active GB2575672B (en) 2018-07-19 2018-07-19 Motion estimation in video

Country Status (1)

Country Link
GB (1) GB2575672B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010070128A1 (en) * 2008-12-19 2010-06-24 Thomson Licensing Method for multi-resolution motion estimation
US20110194025A1 (en) * 2010-02-08 2011-08-11 Himax Technologies Limited Method and system of hierarchical motion estimation
