WO2005109339A1 - Creating an output image - Google Patents

Creating an output image Download PDF

Info

Publication number
WO2005109339A1
Authority
WO
WIPO (PCT)
Prior art keywords
input images
group
of pixels
pixels
particular object
Prior art date
Application number
PCT/IB2005/051440
Other languages
French (fr)
Inventor
Henricus W. P. Van Der Heijden
Paul M. Hofman
Claus N. Cordes
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to EP05738325A priority Critical patent/EP1751711A1/en
Priority to JP2007512646A priority patent/JP2007536671A/en
Publication of WO2005109339A1 publication Critical patent/WO2005109339A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/14Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion

Definitions

  • the invention relates to a method of creating an output image on basis of a sequence of temporally consecutive input images.
  • the invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to create an output image on basis of a sequence of temporally consecutive input images.
  • the invention further relates to an image processing apparatus being arranged to create an output image on basis of a sequence of temporally consecutive input images.
  • An advantage of showing a sequence of temporally consecutive input images is that dynamic events can be visualized, e.g. movement of an object relative to its background can be shown. For instance a sports game like football, in which the actual movement of the ball is relevant, can be shown. During broadcasts it is common to repeat portions of a sequence of images corresponding to a football game. Typically these portions correspond to the most exciting moments of the game. However, when it is required to illustrate such an exciting moment in for instance a newspaper or some other kind of printed media, much of the attractiveness of the event is lost. This is because a picture in a newspaper does not indicate the dynamics of the event.
  • This object of the invention is achieved in that the method comprises: identifying a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image.
  • An obvious existing approach to illustrate a dynamic event is to create a schematic drawing e.g. an artificial graphical representation.
  • the method according to the invention differs in that use is made of a sequence of temporally consecutive input images, i.e. spatio-temporal data, to generate a static, i.e. a spatial image, comprising an object of the input images at different moments in time. Portions are selected from the dynamic (x,y,t) input images, and combined to form a single static (x,y) output image. This is done in such a manner that the static output image illustrates a dynamic event, such as for example the motion of an object.
  • a characteristic feature is that the output image comprises image data of a particular part of a particular object sampled at different moments in time.
  • the particular part of the particular object appears multiple times in the output image.
  • the second group of pixels is appended to the first group of pixels, typically directly adjacent to the first group of pixels.
  • sets of spatial image data are used to create a larger output image.
  • portions of consecutive images are combined differently.
  • respective pixels of spatially overlapping image regions are merged. The result is that each object appears only once in the output image.
  • the appending comprises a weighted summation of respective pixel values of the first group of pixels and the second group of pixels.
  • An advantage of a weighted summation is that the transition in the luminance and/or color from the first group of pixels to the second group of pixels is smoothed.
  • the second group of pixels is just put adjacent to the first group of pixels.
  • a combination of placing groups of pixels and using weighted summation for the transitions is used. Thus portions of two images are selected and combined through some form of interpolation, either through weighted averaging or simply placing the portions adjacent to one another.
  • the first group of pixels corresponds to the pixels of a number of columns of pixels of the first one of the input images.
  • the first group of pixels, and also consecutive groups of pixels, extend over the complete height of the pixel matrix corresponding to the input images. That means that all pixels which are located at a column comprising pixels representing the particular part of the particular object are selected and used as a kind of slice to construct the output image.
  • the output image comprises a set of slices which are fetched from the consecutive input images. Each of the slices shows the particular part of the particular object in the respective input images. Typically, the slices also represent a background in front of which the particular object is moving.
  • This embodiment according to the invention is advantageous for creating an output image which illustrates a horizontal movement of the object.
  • the first group of pixels corresponds to the pixels of a number of rows of pixels of the first one of the input images.
  • the first group of pixels, and also consecutive groups of pixels, extend over the complete width of the pixel matrix corresponding to the input images. That means that all pixels which are located at a row comprising pixels representing the particular part of the particular object are selected and used as a kind of slice to construct the output image.
  • the output image comprises a set of slices which are fetched from the consecutive input images. Each of the slices shows the particular part of the particular object in the respective input image.
  • the slices also represent a background in front of which the particular object is moving.
  • This embodiment according to the invention is advantageous for creating an output image which illustrates a vertical movement of the object.
  • the first group of pixels corresponds to the pixels of a number of columns of pixels of the first one of the input images
  • the number of columns of pixels is based on tracking of the particular object.
  • the movement of the particular object is estimated.
  • the estimated movement determines the dimensions of the first group of pixels. For instance, if the estimated movement of the particular part of the particular object is equal to 20 pixels, then the number of columns of pixels is also 20.
  • the number of rows of pixels is based on tracking of the particular object.
  • the movement of the particular object is estimated.
  • the estimated movement determines the dimensions of the first group of pixels. For instance, if the estimated movement of the particular part of the particular object is equal to 20 pixels, then the number of rows of pixels is also 20.
  • the tracking is based on evaluating a number of motion vector candidates, the evaluating comprising establishing of a minimal match error. This technique is generally known as motion estimation.
  • the match error corresponds to a difference between respective pixel values corresponding to the particular object in the first one of the input images and/or the second one of the input images.
  • Movement is a relative quantity. Movement can be expressed relative to the pixel matrices of the consecutive input images. If the consecutive input images were acquired by means of a stationary positioned camera, that approach is appropriate. That means that the coordinates of the particular part of the particular object in the first one of the input images and the coordinates of the particular part of the particular object in the second one of the input images can directly be used to compute the motion of the object. However, in many cases the camera is panning and/or zooming during acquisition of a moving object.
  • the number of columns of pixels is based on tracking motion of the background in the first one of the input images and/or the second one of the input images.
  • the number of rows of pixels is based on tracking motion of the background in the first one of the input images and/or the second one of the input images.
  • compensation according to a background motion model is realized. This may be a so-called pan-zoom model, which models the background motion as a combination of translation and scaling, but it may also be more complex and also cover other aspects such as perspective projections and rotations.
  • the number of fetched columns/rows is based on movement.
  • This movement is relative to the background in front of which the object is moving. In case of a stationary located camera this movement corresponds to movement relative to the various pixel matrices.
  • the particular object can also be tracked semi-manually.
  • the number of columns of pixels is determined by: determining a first pixel coordinate on basis of identifying the particular part of the particular object in the first one of the input images; determining a second pixel coordinate on basis of identifying the particular part of the particular object in a third one of the input images; determining the number of consecutive input images being temporally located between the first one of the input images and the third one of the input images; and computing the number of columns on basis of the first pixel coordinate, the second pixel coordinate and the number of consecutive input images.
  • a user has to indicate in a number of images where the particular part of the particular object is located. This might be done by means of moving a cursor relative to the displayed input images.
  • This object of the invention is achieved in that the computer program product, after being loaded in a computer arrangement comprising processing means and a memory, provides said processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image.
  • the image processing apparatus comprises processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image.
  • Modifications of the method, and variations thereof, may correspond to modifications and variations of the image processing apparatus and the computer program product being described.
  • Fig. 1 schematically shows the method according to the invention, wherein the camera was stationary during acquisition of the input images
  • Fig. 2A schematically shows the method according to the invention, wherein the camera was panning during acquisition of the input images
  • Fig. 2B schematically shows a number of output images according to the invention
  • Fig. 3 schematically shows a number of input images of a football match and an output image which is created according to the invention, based on these input images
  • Fig. 4 schematically shows a first embodiment of the image processing apparatus according to the invention
  • Fig. 5 schematically shows a second embodiment of the image processing apparatus according to the invention.
  • Same reference numerals are used to denote similar parts throughout the Figures.
  • FIG. 1 schematically shows the method according to the invention, wherein the camera was stationary during acquisition of the input images 102, 104 and 106.
  • the input images 102, 104 and 106 represent an object, i.e. a ball 100 which was moving in front of a homogeneous background.
  • the camera was not moving during the acquisition of the input images 102, 104 and 106.
  • the output image 108 which is based on the input images 102, 104 and 106 comprises a number of slices 110, 112 and 114 of the respective input images 102, 104 and 106.
  • by a slice is meant a set of pixels corresponding to a number of columns (or rows) of an input image.
  • the arrows in Fig. 1 depict the relation between the slices as fetched from the input images 102, 104 and 106 and the slices being combined to form the output image 108.
  • the size of these slices is based on the movement of the ball 100 relative to the pixel matrices.
  • the output image 108 also comprises a start portion 116 of the first input image 102 and an end portion 118 of the last input image 106.
  • the size of the start portion 116 and of the end portion 118 is not related to the movement of the ball 100.
  • Fig. 2A schematically shows the method according to the invention, wherein the camera was panning during acquisition of the input images.
  • the input images 102, 104 and 106 represent an object, i.e. a ball 100 which was moving in front of a house.
  • the camera was panning during the acquisition of the input images 102, 104 and 106.
  • the direction of the movement of the camera and that of the ball are equal.
  • the speed of the camera movement is higher than the speed of the ball 100.
  • the output image 208 which is based on the input images 102, 104 and 106 comprises a number of slices 110, 112 and 114 of the respective input images 102, 104 and 106.
  • the arrows in Fig. 2A depict the relation between the slices as fetched from the input images 102, 104 and 106 and the slices being combined to form the output image 208.
  • the size of these slices is based on the movement of the ball 100 relative to the background.
  • the output image 208 also comprises a start portion 116 of the first input image 102 and an end portion 118 of the last input image 106. The size of the start portion 116 and of the end portion 118 is not related to the movement of the ball 100.
  • the output image 208 shows the complete house whereas the different input images each show only a portion of the house. That means that the method according to the invention is such that spatially related image data is also combined, optionally resulting in a relatively large output image.
  • Fig. 2B schematically shows a number of output images 202, 204 and 208 being constructed according to this approach.
  • a first one of the output images 202 shows an overview image in which the ball 100 is visible only once.
  • FIG. 3 schematically shows a number of input images 102, 104 and 106 of a football match and an output image 308 which is created according to the invention, based on these input images 102, 104 and 106. It should be noted that the shown input images 102, 104 and 106 are only a part of a longer sequence of consecutive input images.
  • the input images 102, 104 and 106 represent a football match. In a first one of the input images 102 it can be seen that a player kicks the ball 100 (see the circle).
  • Fig. 3 also shows the output image 308 which is based on the shown input images 102, 104 and 106 and based on approximately 40 not shown input images. The actual trajectory of the ball is clearly visible in the output image 308.
  • Fig. 4 schematically shows a first embodiment of the image processing apparatus 400 according to the invention.
  • the image processing apparatus 400 is provided with a sequence of input images at its image input connector 410 and is arranged to provide a sequence of intermediate output images and a final output image at its image output connector 414.
  • the image processing apparatus is provided by location information which is provided by means of user interaction, e.g. by a user who has indicated the object of interest in a number of input images.
  • the image processing apparatus 400 comprises processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images, by means of location information input interface 412; fetching a first group of pixels by means of pixel processor 404 from the first one of the input images which is temporarily stored in an input memory device 402, wherein the first group of pixels corresponds to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images, by means of the localization unit 408; fetching a second group of pixels by means of pixel processor 404 from the second one of the input images which is temporarily stored in the input memory device 402 after the first one of the input images, wherein the second group of pixels also corresponds to the particular part of the particular object; and appending, by means of the pixel processor 404, the second group of pixels to the first group of pixels to form the output image.
  • Fig. 5 schematically shows a second embodiment of the image processing apparatus 500 according to the invention.
  • This embodiment 500 is basically the same as the embodiment 400 as described in connection with Fig. 4. A difference is that this embodiment 500 is arranged to compensate for camera movement.
  • This embodiment of the image processing apparatus is arranged to perform motion estimation of the background to be able to compensate for the effects of camera movement.
  • This embodiment 500 comprises an additional memory device for temporary storage of a second input image.
  • the localization unit 408 is provided with positional information of a target of interest, i.e. a particular object to be tracked, within the sequence of input images.
  • the localization unit 408 is arranged to compute a global motion vector for the background in front of which the target object is moving.
  • the global motion vector is computed by combining a number of motion vectors being computed on basis of a pair of input images.
  • the motion vectors are computed by means of a standard motion estimator which is preferably incorporated in the localization unit 408.
  • the motion estimator is e.g. as specified in the article "True-Motion Estimation with 3-D Recursive Search Block Matching" by G. de Haan et al., in IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pages 368-379.
  • a motion vector for the entire image is computed on basis of a mean image-row (x-component) and a mean image-column (y-component), as disclosed in the article "Feature-based block matching algorithm using integral projections" by J.S. Kim and R.-H. Park, in Electronics Letters, vol. 25, pp. 29-30.
  • the pixel processor 404 and the localization unit 408 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetically and/or optical storage, or may be loaded via a network like Internet.
  • an application specific integrated circuit provides the disclosed functionality.
  • the working of the embodiment of the image processing apparatus as depicted in Fig. 5 will be explained using an example involving a sequence of input images representing a free kick in football.
  • a few input images, i.e. video frames, are shown in Fig. 3.
  • the camera was panning, with a non-constant speed, from the location of the kick to the goal.
  • the dynamic event to be captured in the output image is the ball flying into the goal and therefore the ball has to be tracked in the sequence of input images.
  • the motion of the ball is approximated by using a constant velocity in the x direction (this is along the left-right axis in the input images). This is a reasonable assumption of the ball motion between the kick and the first following contact with an object such as the goal net.
  • motion in the y direction is disregarded (the top-bottom axis in the input images).
  • the user is required to provide two or more spatio-temporal positions x_screen(n_i) for input images n_i, in order to be able to determine the velocity v, as well as to provide start and end points of the event.
  • the relative camera position x_camera(n) for each input image n is automatically calculated from the video sequence.
  • v is calculated for the event, and for each input image n the horizontal areas of interest, i.e. the slices comprising a number of columns of the input images, in screen coordinates, are centered around x_screen(n), which can be calculated from Equation (1).
  • the method, computer program product and image processing apparatus may be beneficial for several applications, e.g.: professional image processing, like in film studios, broadcast studios or for making newspapers and other types of printed media; consumer electronics devices, like TVs, set-top boxes and personal video recording devices; educational purposes; and consumer video processing software, e.g. for making home videos.
  • professional image processing like in film studios, broadcast studios or for making newspapers and other types of printed media
  • consumer electronics devices like TVs, set-top boxes and personal video recording devices
  • educational purposes and consumer video processing software, e.g. for making home videos.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word 'comprising' does not exclude the presence of elements or steps not listed in a claim.
  • the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
  • the invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the claims enumerating several means, several of these means can be embodied by one and the same item of hardware.
  • the usage of the words first, second and third, etcetera, does not indicate any ordering. These words are to be interpreted as names.
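The semi-manual free-kick workflow described above (user-marked ball positions, automatically computed camera pan, constant horizontal velocity) can be sketched as follows. Equation (1) is referenced but not reproduced in this text, so the world-to-screen mapping below is an assumed plausible form, and all function and variable names are illustrative rather than taken from the patent.

```python
def slice_centres(x_marks, n_marks, x_camera, n_frames):
    """Return the screen x coordinate around which the slice of each
    input image n is centred.

    x_marks  : screen x positions of the ball marked by the user
    n_marks  : indices of the input images in which the user marked it
    x_camera : relative camera pan per input image (computed elsewhere
               from the video sequence, e.g. by background motion
               estimation)
    n_frames : total number of input images in the event
    """
    # World position of the ball at the user-marked images: assumed to
    # be screen position plus accumulated camera pan.
    x_world = [x_marks[i] + x_camera[n_marks[i]] for i in range(len(x_marks))]
    # Constant velocity along the x axis, from the first and last mark.
    v = (x_world[-1] - x_world[0]) / (n_marks[-1] - n_marks[0])
    n0 = n_marks[0]
    # Map the constant-velocity world trajectory back to screen
    # coordinates for every input image.
    return [x_world[0] + v * (n - n0) - x_camera[n] for n in range(n_frames)]
```

For a camera panning 10 pixels per frame and a ball marked at screen positions 50 (frame 0) and 70 (frame 4), the slice centres advance 5 pixels per frame on screen even though the ball moves 15 pixels per frame in world coordinates.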

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

A method of creating an output image (108) on basis of a sequence of temporally consecutive input images is disclosed. The method comprises: identifying a particular part of a particular object (100) in a first one of the input images (102); fetching a first group of pixels (110) from the first one of the input images (102), the first group of pixels (110) corresponding to the particular part of the particular object (100); localizing the particular part of the particular object (100) in a second one of the input images (104); fetching a second group of pixels (110) from the second one of the input images (104), the second group of pixels (110) corresponding to the particular part of the particular object (100); and appending the second group of pixels (110) to the first group of pixels (110) to form the output image.

Description

Creating an output image
The invention relates to a method of creating an output image on basis of a sequence of temporally consecutive input images. The invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to create an output image on basis of a sequence of temporally consecutive input images. The invention further relates to an image processing apparatus being arranged to create an output image on basis of a sequence of temporally consecutive input images.
An advantage of showing a sequence of temporally consecutive input images is that dynamic events can be visualized, e.g. movement of an object relative to its background can be shown. For instance a sports game like football, in which the actual movement of the ball is relevant, can be shown. During broadcasts it is common to repeat portions of a sequence of images corresponding to a football game. Typically these portions correspond to the most exciting moments of the game. However, when it is required to illustrate such an exciting moment in for instance a newspaper or some other kind of printed media, much of the attractiveness of the event is lost. This is because a picture in a newspaper does not indicate the dynamics of the event.
It is an object of the invention to provide a method of the kind described in the opening paragraph for summarizing a dynamic event in an output image. This object of the invention is achieved in that the method comprises: identifying a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image. An obvious existing approach to illustrate a dynamic event is to create a schematic drawing, e.g. an artificial graphical representation. The method according to the invention differs in that use is made of a sequence of temporally consecutive input images, i.e. spatio-temporal data, to generate a static, i.e. a spatial, image comprising an object of the input images at different moments in time. Portions are selected from the dynamic (x,y,t) input images, and combined to form a single static (x,y) output image. This is done in such a manner that the static output image illustrates a dynamic event, such as for example the motion of an object. A characteristic feature is that the output image comprises image data of a particular part of a particular object sampled at different moments in time. In other words, the particular part of the particular object appears multiple times in the output image. This is because the second group of pixels is appended to the first group of pixels, typically directly adjacent to it.
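The claimed identify-fetch-localize-fetch-append sequence can be illustrated with a minimal sketch. The helper below assumes the particular object part has already been localized in each input image (its column position is given) and extracts vertical slices, one per image; all names are illustrative, not taken from the patent.

```python
import numpy as np

def compose_strobe(images, column_positions, slice_width):
    """Build a static output image from temporally consecutive input
    images by appending one vertical slice per input image.

    images           : list of H x W x 3 uint8 arrays (the input images)
    column_positions : x coordinate of the tracked object part per image
    slice_width      : number of columns to fetch around each position
    """
    half = slice_width // 2
    slices = []
    for img, x in zip(images, column_positions):
        left = max(0, x - half)
        right = min(img.shape[1], left + slice_width)
        # Fetch the group of pixels: all rows of the selected columns.
        slices.append(img[:, left:right])
    # Append each group of pixels to the previous one.
    return np.hstack(slices)
```

With three 10 x 30 input images and the tracked part at columns 5, 15 and 25, a slice width of 4 yields a 10 x 12 output image in which the object part appears three times, once per input image.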
There is a clear distinction with the prior art of creating one large panorama view by stitching together different smaller images. In that case sets of spatial image data are used to create a larger output image. In the method according to the prior art, portions of consecutive images are combined differently. Typically respective pixels of spatially overlapping image regions are merged. The result is that each object appears only once in the output image. In the method according to the invention use is explicitly made of data representing a single object at different moments in time. In an embodiment of the method according to the invention, the appending comprises a weighted summation of respective pixel values of the first group of pixels and the second group of pixels. An advantage of a weighted summation is that the transition in the luminance and/or color from the first group of pixels to the second group of pixels is smoothed. Alternatively, the second group of pixels is just put adjacent to the first group of pixels. Typically, a combination of placing groups of pixels and using weighted summation for the transitions is used. Thus portions of two images are selected and combined through some form of interpolation, either through weighted averaging or simply placing the portions adjacent to one another. In an embodiment of the method according to the invention, the first group of pixels corresponds to the pixels of a number of columns of pixels of the first one of the input images. In this embodiment of the method according to the invention the first group of pixels, and also consecutive groups of pixels, extend over the complete height of the pixel matrix corresponding to the input images. That means that all pixels which are located at a column comprising pixels representing the particular part of the particular object are selected and used as a kind of slice to construct the output image.
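The weighted summation at the transition between two groups of pixels can be realized as a simple linear cross-fade over a few overlapping columns. The sketch below is one plausible implementation of such a smoothing; the overlap width and the linear weighting are illustrative choices, not prescribed by the patent.

```python
import numpy as np

def append_with_crossfade(left_part, right_part, overlap):
    """Append right_part to left_part, cross-fading `overlap` columns
    so the luminance/colour transition between the two groups of
    pixels is smoothed by a weighted summation."""
    l = left_part.astype(np.float64)
    r = right_part.astype(np.float64)
    # Linear weights: 1 -> 0 for the left slice, 0 -> 1 for the right.
    w = np.linspace(1.0, 0.0, overlap)[None, :, None]
    blended = w * l[:, -overlap:] + (1.0 - w) * r[:, :overlap]
    return np.concatenate(
        [l[:, :-overlap], blended, r[:, overlap:]], axis=1
    ).astype(np.uint8)
```

Blending two uniform slices with values 100 and 200 over three columns produces intermediate columns 100, 150 and 200, i.e. a smooth ramp instead of a hard seam.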
In other words the output image comprises a set of slices which are fetched from the consecutive input images. Each of the slices shows the particular part of the particular object in the respective input images. Typically, the slices also represent a background in front of which the particular object is moving. This embodiment according to the invention is advantageous for creating an output image which illustrates a horizontal movement of the object. In an embodiment of the method according to the invention the first group of pixels corresponds to the pixels of a number of rows of pixels of the first one of the input images. In this embodiment of the method according to the invention the first group of pixels, and also consecutive groups of pixels, extend over the complete width of the pixel matrix corresponding to the input images. That means that all pixels which are located at a row comprising pixels representing the particular part of the particular object are selected and used as a kind of slice to construct the output image. In other words the output image comprises a set of slices which are fetched from the consecutive input images. Each of the slices shows the particular part of the particular object in the respective input image. Typically, the slices also represent a background in front of which the particular object is moving. This embodiment according to the invention is advantageous for creating an output image which illustrates a vertical movement of the object. In an embodiment of the method according to the invention, wherein the first group of pixels corresponds to the pixels of a number of columns of pixels of the first one of the input images, the number of columns of pixels is based on tracking of the particular object. The movement of the particular object is estimated. The estimated movement determines the dimensions of the first group of pixels.
For instance, if the estimated movement of the particular part of the particular object is equal to 20 pixels, then the number of columns of pixels is also 20. In an embodiment of the method according to the invention wherein the first group of pixels corresponds to the pixels of a number of rows of pixels of the first one of the input images, the number of rows of pixels is based on tracking of the particular object. The movement of the particular object is estimated. The estimated movement determines the dimensions of the first group of pixels. For instance, if the estimated movement of the particular part of the particular object is equal to 20 pixels, then the number of rows of pixels is also 20. In an embodiment according to the invention, the tracking is based on evaluating a number of motion vector candidates, the evaluating comprising establishing of a minimal match error. This technique is generally known as motion estimation. Preferably, the match error corresponds to a difference between respective pixel values corresponding to the particular object in the first one of the input images and/or the second one of the input images. Movement is a relative quantity. Movement can be expressed relative to the pixel matrices of the consecutive input images. If the consecutive input images were acquired by means of a stationary camera, that approach is appropriate. That means that the coordinates of the particular part of the particular object in the first one of the input images and the coordinates of the particular part of the particular object in the second one of the input images can directly be used to compute the motion of the object. However, in many cases the camera is panning and/or zooming during acquisition of a moving object. If the sequence of temporally consecutive input images is based on such an acquisition, a correction for this camera movement is preferred.
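The evaluation of motion vector candidates by establishing a minimal match error can be sketched as follows; the sum-of-absolute-differences error, the function name and the data layout are illustrative assumptions rather than the exact estimator of the description:

```python
def best_motion_vector(prev, curr, block, candidates):
    """Return the candidate displacement (dy, dx) with the minimal
    match error for the given block of `prev`, together with that error.

    `prev` and `curr` are 2-D lists of pixel values; `block` is
    (y0, x0, height, width) in `prev`. The match error is the sum of
    absolute differences between corresponding pixel values.
    """
    y0, x0, h, w = block
    best, best_err = None, float("inf")
    for dy, dx in candidates:
        err = 0
        for y in range(h):
            for x in range(w):
                err += abs(prev[y0 + y][x0 + x] - curr[y0 + y + dy][x0 + x + dx])
        if err < best_err:
            best, best_err = (dy, dx), err
    return best, best_err
```

The displacement found this way gives the movement of the particular object relative to the pixel matrices, from which the number of columns (or rows) per slice follows.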
In a preferred embodiment according to the invention the number of columns of pixels is based on tracking motion of the background in the first one of the input images and/or the second one of the input images. Alternatively, the number of rows of pixels is based on tracking motion of the background in the first one of the input images and/or the second one of the input images. In general, compensation according to a background motion model is realized. This may be a so-called pan-zoom model, which models the background motion as a combination of translation and scaling, but it may also be more complex and cover other aspects such as perspective projections and rotations. As said, the number of fetched columns/rows is based on movement. This movement is relative to the background in front of which the object is moving. In the case of a stationary camera this movement corresponds to movement relative to the various pixel matrices. As an alternative to tracking the particular object by means of motion estimation on basis of evaluation of motion vectors, the particular object can also be tracked semi-manually. In that case the number of columns of pixels is determined by: determining a first pixel coordinate on basis of identifying the particular part of the particular object in the first one of the input images; determining a second pixel coordinate on basis of identifying the particular part of the particular object in a third one of the input images; determining the number of consecutive input images being temporally located between the first one of the input images and the third one of the input images; and computing the number of columns on basis of the first pixel coordinate, the second pixel coordinate and the number of consecutive input images. In this embodiment according to the invention, a user has to indicate in a number of images where the particular part of the particular object is located.
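For the semi-manual variant, the computation of the number of columns from the two user-indicated pixel coordinates can be sketched as follows; constant horizontal motion between the two marks is assumed, and the function name is illustrative:

```python
def columns_per_slice(x_first, x_third, n_between):
    """Number of columns of pixels to fetch per input image.

    `x_first` and `x_third` are the x coordinates of the particular
    part of the particular object as identified by the user in the
    first and third input images; `n_between` is the number of
    consecutive input images temporally located between them.
    """
    # Image-to-image displacements between the two marked images.
    steps = n_between + 1
    return round(abs(x_third - x_first) / steps)
```

For example, marking the object at x = 100 and, with two intermediate images, at x = 160 gives 20 columns per slice, matching the 20-pixel example given earlier. The coordinates themselves come from the user indicating the object in the displayed images.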
This might be done by means of moving a cursor relative to the displayed input images. It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph for summarizing a dynamic event in an output image. This object of the invention is achieved in that the computer program product, after being loaded into a computer arrangement comprising processing means and a memory, provides said processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image. It is a further object of the invention to provide an image processing apparatus of the kind described in the opening paragraph for summarizing a dynamic event in an output image.
This object of the invention is achieved in that the image processing apparatus comprises processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images; fetching a first group of pixels from the first one of the input images, the first group of pixels corresponding to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images; fetching a second group of pixels from the second one of the input images, the second group of pixels corresponding to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image. Modifications of the method, and variations thereof, may correspond to modifications and variations of the image processing apparatus and the computer program product being described.
These and other aspects of the image processing apparatus, of the method and of the computer program product, according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein: Fig. 1 schematically shows the method according to the invention, wherein the camera was stationary during acquisition of the input images; Fig. 2A schematically shows the method according to the invention, wherein the camera was panning during acquisition of the input images; Fig. 2B schematically shows a number of output images according to the invention; Fig. 3 schematically shows a number of input images of a football match and an output image which is created according to the invention, based on these input images; Fig. 4 schematically shows a first embodiment of the image processing apparatus according to the invention; and Fig. 5 schematically shows a second embodiment of the image processing apparatus according to the invention. Same reference numerals are used to denote similar parts throughout the
Figures. Fig. 1 schematically shows the method according to the invention, wherein the camera was stationary during acquisition of the input images 102, 104 and 106. The input images 102, 104 and 106 represent an object, i.e. a ball 100 which was moving in front of a homogeneous background. The camera was not moving during the acquisition of the input images 102, 104 and 106. It can clearly be seen that the ball 100 is moving from the left to the right relative to the pixel matrices corresponding to the input images 102, 104 and 106. The output image 108 which is based on the input images 102, 104 and 106 comprises a number of slices 110, 112 and 114 of the respective input images 102, 104 and 106. By a slice is meant a set of pixels corresponding to a number of columns (or rows) of an input image. The arrows in Fig. 1 depict the relation between the slices as fetched from the input images 102, 104 and 106 and the slices being combined to form the output image 108. The size of these slices is based on the movement of the ball 100 relative to the pixel matrices. The output image 108 also comprises a start portion 116 of the first input image 102 and an end portion 118 of the last input image 106. The size of the start portion 116 and of the end portion 118 is not related to the movement of the ball 100. Fig. 2A schematically shows the method according to the invention, wherein the camera was panning during acquisition of the input images. The input images 102, 104 and 106 represent an object, i.e. a ball 100 which was moving in front of a house. The camera was panning during the acquisition of the input images 102, 104 and 106. The directions of the movement of the camera and of the ball are mutually equal. The speed of the camera movement is higher than the speed of the ball 100. The output image 208 which is based on the input images 102, 104 and 106 comprises a number of slices 110, 112 and 114 of the respective input images 102, 104 and 106. The arrows in Fig.
2A depict the relation between the slices as fetched from the input images 102, 104 and 106 and the slices being combined to form the output image 208. The size of these slices is based on the movement of the ball 100 relative to the background. The output image 208 also comprises a start portion 116 of the first input image 102 and an end portion 118 of the last input image 106. The size of the start portion 116 and of the end portion 118 is not related to the movement of the ball 100. By comparing the output image 208 with the input images 102, 104 and 106 it becomes clear that the output image is larger. The output image 208 shows the complete house whereas the different input images each show a portion of the house. That means that the method according to the invention is such that spatially related image data is also combined, optionally resulting in a relatively large output image. It will be clear that each time a new slice of an input image is appended to the output image as constructed until then, a new output image is created. In other words a first output image which is appended with a slice becomes a second output image. Showing such a series of output images under construction gives a user the impression of a live dynamic event combined with the history of the lapsed part of the event. The user is shown a series of output images which differ in size, i.e. a subsequent output image is larger than its predecessor. Alternatively, first a relatively large overview image is constructed on basis of the sequence of input images, wherein the overview image represents the total scene being captured by the input images, however without the duplicates described above. This is preferably done by using strips of pixels which do not comprise pixels representing a moving object. Typically these strips are located at the border of the input images.
The size of these strips is not related to movement of a particular object to be tracked but is related to movement of the background relative to the camera. After having created such a large overview image the method according to the invention is applied. The intermediate results of the method, i.e. subsequent output images, are combined with the overview image. Basically, this means that the subsequent output images are appended with respective portions, i.e. remaining parts, of the overview image. Fig. 2B schematically shows a number of output images 202, 204 and 208 being constructed according to this approach. A first one of the output images 202 shows the said overview image in which the ball 100 is visible only once. In a second one of the output images 204 the ball 100 is visible twice and in a third one of the output images 208 the ball 100 is visible three times. Fig. 3 schematically shows a number of input images 102, 104 and 106 of a football match and an output image 308 which is created according to the invention, based on these input images 102, 104 and 106. It should be noted that the shown input images 102, 104 and 106 are only a part of a longer sequence of consecutive input images. The input images 102, 104 and 106 represent a football match. In a first one of the input images 102 it can be seen that a player kicks the ball 100, as indicated by the circle. In a second one of the input images 104 it can be seen that the ball 100 is flying through the sky, again as indicated by the circle. In a third one of the input images 106 it can be seen that the ball 100 reaches the goal. Fig. 3 also shows the output image 308 which is based on the shown input images 102, 104 and 106 and on approximately 40 input images which are not shown. The actual trajectory of the ball is clearly visible in the output image 308. Fig. 4 schematically shows a first embodiment of the image processing apparatus 400 according to the invention.
The image processing apparatus 400 is provided with a sequence of input images at its image input connector 410 and is arranged to provide a sequence of intermediate output images and a final output image at its image output connector 414. Preferably, the image processing apparatus according to the invention is provided with location information which is provided by means of user interaction, e.g. by a user who has indicated the object of interest in a number of input images. The image processing apparatus 400 comprises processing means with the capability to carry out: accepting a location of a particular part of a particular object in a first one of the input images, by means of the location information input interface 412; fetching a first group of pixels by means of the pixel processor 404 from the first one of the input images which is temporarily stored in an input memory device 402, wherein the first group of pixels corresponds to the particular part of the particular object; localizing the particular part of the particular object in a second one of the input images, by means of the localization unit 408; fetching a second group of pixels by means of the pixel processor 404 from the second one of the input images which is temporarily stored in the input memory device 402 after the first one of the input images, wherein the second group of pixels also corresponds to the particular part of the particular object; and appending the second group of pixels to the first group of pixels to form the output image. The pixel processor 404 is arranged to make a copy of the accessed second group of pixel values and to write the copy to pixel values at the appropriate position in the output memory device 406. Fig. 5 schematically shows a second embodiment of the image processing apparatus 500 according to the invention. This embodiment 500 is basically the same as the embodiment 400 as described in connection with Fig. 4.
A difference is that this embodiment 500 is arranged to compensate for camera movement. This embodiment of the image processing apparatus is arranged to perform motion estimation of the background to be able to compensate for the effects of camera movement. This embodiment 500 comprises an additional memory device for temporary storage of a second input image. The localization unit 408 is provided with positional information of a target of interest, i.e. a particular object to be tracked, within the sequence of input images. Besides that, the localization unit 408 is arranged to compute a global motion vector for the background in front of which the target object is moving. The global motion vector is computed by combining a number of motion vectors being computed on basis of a pair of input images. The motion vectors are computed by means of a standard motion estimator which is preferably incorporated in the localization unit 408. The motion estimator is e.g. as specified in the article "True-Motion Estimation with 3-D Recursive Search Block Matching" by G. de Haan et al. in IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pages 368-379. Alternatively, a motion vector for the entire image is computed on basis of a mean image-row (x-component) and a mean image-column (y-component), as disclosed in the article "Feature-based block matching algorithm using integral projections" by J.S. Kim and R.H. Park, in Electronics Letters, Vol. 25, pp. 29-30. The pixel processor 404 and the localization unit 408 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet.
Optionally an application specific integrated circuit provides the disclosed functionality. The working of the embodiment of the image processing apparatus as depicted in Fig. 5 will be explained using an example involving a sequence of input images representing a free kick in football. A few input images, i.e. video frames, are shown in Fig. 3. The camera was panning, with a non-constant speed, from the location of the kick to the goal. The dynamic event to be captured in the output image is the ball flying into the goal and therefore the ball has to be tracked in the sequence of input images. The motion of the ball is approximated by using a constant velocity in the x direction (this is along the left-right axis in the input images). This is a reasonable assumption for the ball motion between the kick and the first following contact with an object such as the goal net. In this example motion in the y direction is disregarded (the top-bottom axis in the input images). For the x position of the football the following can be derived:
x_screen(n) + x_camera(n) = x_screen(n0) + x_camera(n0) + v · (n − n0),    (1)
where n0 is a reference input image number, for which the x position of the ball on screen (x_screen), i.e. on the pixel matrix, and the relative position of the camera (x_camera) are considered known. The actual position of the ball is given by the sum of the screen position and the camera position. For example, if the ball moves to the right in the "real" world, it is possible that the camera pans faster to the right than the ball moves, in which case the ball can be seen moving to the left on screen. To compensate for this effect, the camera position is included in Equation (1). If a second screen position at input image n1 is known, the true velocity v can be calculated using

v = ((x_screen(n1) + x_camera(n1)) − (x_screen(n0) + x_camera(n0))) / (n1 − n0).    (2)

In this embodiment, the user is required to provide two or more spatio-temporal positions x_screen(ni) for input images ni, in order to be able to determine the velocity v, as well as to provide start and end points of the event. Using a global motion estimation algorithm the relative camera position x_camera(n) for each input image n is automatically calculated from the video sequence. Then v is calculated for the event, and for each input image n the horizontal areas of interest, i.e. the slices comprising a number of columns of the input images, in screen coordinates, are centered around x_screen(n), which can be calculated from Equation (1).
x_screen(n) = x_screen(n0) + x_camera(n0) − x_camera(n) + v · (n − n0).    (3)
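Equations (1) to (3) can be sketched directly in code; the dictionary-based representation of the per-image positions is an assumption made for the example:

```python
def true_velocity(x_screen, x_camera, n0, n1):
    """Equation (2): the true velocity v of the ball, from two known
    screen positions and the recovered camera positions."""
    return ((x_screen[n1] + x_camera[n1])
            - (x_screen[n0] + x_camera[n0])) / (n1 - n0)

def predicted_screen_x(x_screen_n0, x_camera, n0, n, v):
    """Equation (3): the on-screen x position of the ball in input
    image n, around which the slice for image n is centered."""
    return x_screen_n0 + x_camera[n0] - x_camera[n] + v * (n - n0)
```

For instance, with a camera panning 6 pixels per image, the camera positions are 0, 30 and 60 at images 0, 5 and 10; a ball seen at screen positions 50 and 90 in images 0 and 10 then has a true velocity of 10 pixels per image, and its predicted screen position in image 5 is the actual position minus the camera position, consistent with Equation (1).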
These areas of interest, i.e. slices, are copied to appropriate parts of the output image. The embodiment presented here is limited in certain regards, which may be overcome with more advanced processing techniques. Most notably, it is dependent on user interaction to provide start and end input images, as well as start and end locations of
"interesting objects". This could be generalized further ("follow the ball") using (object based) motion estimation and with intelligent automatic choices for start and end frames for an event. The method, computer program product and image processing apparatus according to the invention may be beneficial for several applications, e.g.: professional image processing, like in film studios, broadcast studios or for making newspapers and other types of printed media; consumer electronics devices, like TVs, set-top boxes and personal video recording devices; educational purposes; and consumer video processing software, e.g. for making home videos. It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera, does not indicate any ordering. These words are to be interpreted as names.


CLAIMS:
1. A method of creating an output image (108) on basis of a sequence of temporally consecutive input images, the method comprising: identifying a particular part of a particular object (100) in a first one of the input images (102); fetching a first group of pixels (110) from the first one of the input images (102), the first group of pixels (110) corresponding to the particular part of the particular object (100); localizing the particular part of the particular object (100) in a second one of the input images (104); fetching a second group of pixels (110) from the second one of the input images (104), the second group of pixels (110) corresponding to the particular part of the particular object (100); and appending the second group of pixels (110) to the first group of pixels (110) to form the output image.
2. A method as claimed in claim 1, wherein the appending comprises a weighted summation of respective pixel values of the first group of pixels (110) and the second group of pixels (110).
3. A method as claimed in claim 1, wherein the first group of pixels (110) corresponds to the pixels of a number of columns of pixels of the first one of the input images (102).
4. A method as claimed in claim 1, wherein the first group of pixels (110) corresponds to the pixels of a number of rows of pixels of the first one of the input images (102).
5. A method as claimed in claim 3, wherein the number of columns of pixels is based on tracking of the particular object (100).
6. A method as claimed in claim 4, wherein the number of rows of pixels is based on tracking of the particular object (100).
7. A method as claimed in claim 5 or 6, wherein the tracking is based on evaluating a number of motion vector candidates, the evaluating comprising establishing of a minimal match error.
8. A method as claimed in claim 7, wherein the match error corresponds to a difference between respective pixel values corresponding to the particular object (100) in the first one of the input images (102) and/or the second one of the input images (104).
9. A method as claimed in claim 5, wherein the number of columns of pixels is based on tracking motion of the background in the first one of the input images (102) and/or the second one of the input images (104).

10. A method as claimed in claim 6, wherein the number of rows of pixels is based on tracking motion of the background in the first one of the input images (102) and/or the second one of the input images (104).

11. A method as claimed in claim 5, wherein the number of columns of pixels is determined by: determining a first pixel coordinate on basis of identifying the particular part of the particular object (100) in the first one of the input images (102); determining a second pixel coordinate on basis of identifying the particular part of the particular object (100) in a third one of the input images; determining the number of consecutive input images being temporally located between the first one of the input images (102) and the third one of the input images; and computing the number of columns on basis of the first pixel coordinate, the second pixel coordinate and the number of consecutive input images.
12. A computer program product to be loaded by a computer arrangement, comprising instructions to create an output image (108) on basis of a sequence of temporally consecutive input images, the computer arrangement comprising processing means and a memory, the computer program product, after being loaded, providing said processing means with the capability to carry out: accepting a location of a particular part of a particular object (100) in a first one of the input images (102); fetching a first group of pixels (110) from the first one of the input images (102), the first group of pixels (110) corresponding to the particular part of the particular object (100); localizing the particular part of the particular object (100) in a second one of the input images (104); fetching a second group of pixels (110) from the second one of the input images (104), the second group of pixels (110) corresponding to the particular part of the particular object (100); and appending the second group of pixels (110) to the first group of pixels (110) to form the output image.

13. An image processing apparatus being arranged to create an output image (108) on basis of a sequence of temporally consecutive input images, the image processing apparatus comprising processing means with the capability to carry out: accepting a location of a particular part of a particular object (100) in a first one of the input images (102); fetching a first group of pixels (110) from the first one of the input images (102), the first group of pixels (110) corresponding to the particular part of the particular object (100); localizing the particular part of the particular object (100) in a second one of the input images (104); fetching a second group of pixels (110) from the second one of the input images (104), the second group of pixels (110) corresponding to the particular part of the particular object (100); and appending the second group of pixels (110) to the first group of pixels (110) to form the output image.
14. An image processing apparatus as claimed in claim 13, characterized in further comprising a display device for displaying the output image.
PCT/IB2005/051440 2004-05-10 2005-05-03 Creating an output image WO2005109339A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP05738325A EP1751711A1 (en) 2004-05-10 2005-05-03 Creating an output image
JP2007512646A JP2007536671A (en) 2004-05-10 2005-05-03 Output image generation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP04102018.1 2004-05-10
EP04102018 2004-05-10

Publications (1)

Publication Number Publication Date
WO2005109339A1 true WO2005109339A1 (en) 2005-11-17

Family

ID=34967091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/051440 WO2005109339A1 (en) 2004-05-10 2005-05-03 Creating an output image

Country Status (5)

Country Link
EP (1) EP1751711A1 (en)
JP (1) JP2007536671A (en)
KR (1) KR20070008687A (en)
CN (1) CN1950847A (en)
WO (1) WO2005109339A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101711061B1 (en) * 2010-02-12 2017-02-28 삼성전자주식회사 Method for estimating depth information using depth estimation device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2354388A (en) * 1999-07-12 2001-03-21 Independent Television Commiss System and method for capture, broadcast and display of moving images
US20030076406A1 (en) * 1997-01-30 2003-04-24 Yissum Research Development Company Of The Hebrew University Of Jerusalem Generalized panoramic mosaic


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ADRIEN BARTOLI, NAVNEET DALAL, AND RADU HORAUD: "Motion Panoramas", 17 February 2003 (2003-02-17), XP002338671, Retrieved from the Internet <URL:http://www.inrialpes.fr/movi/people/Horaud/> [retrieved on 20050801] *
BARTOLI A ET AL: "From video sequences to motion panoramas", MOTION AND VIDEO COMPUTING, 2002. PROCEEDINGS. WORKSHOP ON 5-6 DEC. 2002, PISCATAWAY, NJ, USA,IEEE, 5 December 2002 (2002-12-05), pages 201 - 207, XP010628802, ISBN: 0-7695-1860-5 *
GRACIAS N R ET AL: "Trajectory reconstruction with uncertainty estimation using mosaic registration", 30 June 2001, ROBOTICS AND AUTONOMOUS SYSTEMS, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, PAGE(S) 163-177, ISSN: 0921-8890, XP004245253 *
INAMOTO N ET AL: "Intermediate view generation of soccer scene from multiple videos", 11 August 2002, PATTERN RECOGNITION, 2002. PROCEEDINGS. 16TH INTERNATIONAL CONFERENCE ON QUEBEC CITY, QUE., CANADA 11-15 AUG. 2002, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, PAGE(S) 713-716, ISBN: 0-7695-1695-X, XP010613981 *
JONES R C ET AL: "Building mosaics from video using MPEG motion vectors", October 1999, ACM MULTIMEDIA, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE, NEW YORK, NY, US, PAGE(S) 29-32, XP002272152 *

Also Published As

Publication number Publication date
CN1950847A (en) 2007-04-18
KR20070008687A (en) 2007-01-17
EP1751711A1 (en) 2007-02-14
JP2007536671A (en) 2007-12-13

Similar Documents

Publication Publication Date Title
US11217006B2 (en) Methods and systems for performing 3D simulation based on a 2D video image
CN103299610B (en) For the method and apparatus of video insertion
Genc et al. Marker-less tracking for AR: A learning-based approach
KR100271384B1 (en) Video merging employing pattern-key insertion
Brostow et al. Image-based motion blur for stop motion animation
US6124864A (en) Adaptive modeling and segmentation of visual image streams
JP2021511729A (en) Extension of the detected area in the image or video data
JPH11508099A (en) Scene Motion Tracking Method for Raw Video Insertion System
JP2009505553A (en) System and method for managing the insertion of visual effects into a video stream
US20200244891A1 (en) Method to configure a virtual camera path
US10764493B2 (en) Display method and electronic device
Hayashi et al. Synthesizing free-viewpoing images from multiple view videos in soccer stadiumadium
Wu et al. Global motion estimation with iterative optimization-based independent univariate model for action recognition
CN111680671A (en) Automatic generation method of camera shooting scheme based on optical flow
Inamoto et al. Free viewpoint video synthesis and presentation from multiple sporting videos
WO1997026758A1 (en) Method and apparatus for insertion of virtual objects into a video sequence
Carrillo et al. Automatic football video production system with edge processing
WO2005109339A1 (en) Creating an output image
Malik Robust registration of virtual objects for real-time augmented reality
Monji-Azad et al. An efficient augmented reality method for sports scene visualization from single moving camera
Shishido et al. Calibration of multiple sparsely distributed cameras using a mobile camera
KR100466587B1 (en) Method of Extrating Camera Information for Authoring Tools of Synthetic Contents
Thanedar et al. Semi-automated placement of annotations in videos
Liang et al. Video2Cartoon: Generating 3D cartoon from broadcast soccer video
TWI594209B (en) Method for automatically deducing motion parameter for control of mobile stage based on video images

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application

WWE Wipo information: entry into national phase

Ref document number: 2005738325

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2007512646

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 1020067023312

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 200580014962.1

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

WWP Wipo information: published in national office

Ref document number: 1020067023312

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2005738325

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2005738325

Country of ref document: EP