WO2004073313A1 - Spatio-temporal up-conversion

Info

Publication number: WO2004073313A1
Application number: PCT/IB2004/050062
Authority: WO
Inventors: Marco K. Bosma
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2003-02-13
Filing date: 2004-01-29
Publication date: 2004-08-26

Abstract

A video conversion unit (300,400) for converting a sequence of input images (100-102), having a first frequency and comprising a first (100) and second (102) input image with a first resolution into a sequence of output images (104-110), having a second frequency being different from the first frequency and comprising an output image (106) with a second resolution being higher than the first resolution, comprises a temporal interpolation unit (202) which is arranged to compute the output image on basis of a first group (116) of pixels and a second group (118) of pixels, the first group (116) of pixels being computed by a spatial interpolation unit (204) on basis of a third group (112) of pixels of the first input image and the second group (118) of pixels being computed by the spatial interpolation unit (204) on basis of a fourth group (114) of pixels of the second input image.

Description

Spatio-temporal up-conversion

The invention relates to a video conversion unit for converting a sequence of input images, having a first frequency and comprising a first input image with a first resolution into a sequence of output images, having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution, the video conversion unit comprising a temporal interpolation unit and a spatial interpolation unit.

The invention further relates to an image processing apparatus comprising: receiving means for receiving a signal corresponding to a sequence of input images, having a first frequency and comprising a first input image with a first resolution; and a video conversion unit as described above for converting the sequence of input images into a sequence of output images, having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution.

The invention further relates to a method of converting a sequence of input images, having a first frequency and comprising a first input image with a first resolution into a sequence of output images, having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution.

The invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to convert a sequence of input images, having a first frequency and comprising a first input image with a first resolution into a sequence of output images, having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution.

An embodiment of the video conversion unit of the kind described in the opening paragraph is known from the American patent US 6,108,047. This patent specification discloses a video conversion unit comprising a temporal scaling unit and a spatial scaling unit. The temporal scaling unit is arranged to perform image-rate conversion using motion compensation. Image-rate up-conversion means that from a series of original input images a larger series of output images is computed. The spatial scaling unit comprises a vertical scaling unit and a horizontal scaling unit which are arranged to compute additional pixel values by means of interpolation of pixel values of the images provided by the temporal scaling unit. Some of these provided images are original input images, while the rest are computed by means of interpolation of multiple original input images.

A disadvantage of the known video conversion unit is that image details often disappear in the temporal scaling unit. These details cannot be reproduced by means of the spatial scaling unit. As a consequence there is a difference in sharpness between consecutive output images of the video conversion unit: output images directly based on original input images are sharper than output images which are computed by means of temporal interpolation of multiple input images.

It is an object of the invention to provide a video conversion unit of the kind described in the opening paragraph which provides a sequence of output images with a substantially constant image quality.

This object of the invention is achieved in that the temporal interpolation unit is arranged to compute the output image on basis of a first group of pixels, the first group of pixels being computed by the spatial interpolation unit on basis of a third group of pixels of the first input image. A difference with the known video conversion unit is that the temporal scaling is performed after the spatial scaling. That means that first intermediate results are created with higher spatial resolution and subsequently output images are computed on basis of these intermediate results. In general, there is no loss of image detail while calculating the intermediate results.
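The order of operations described above can be illustrated with a minimal sketch. It assumes a factor-2 linear spatial interpolation and a plain weighted average as temporal interpolation, without motion compensation; the function names `upscale_row_2x`, `upscale_2x`, `temporal_blend` and `convert` are hypothetical and are not taken from the specification.

```python
def upscale_row_2x(row):
    """Double the number of samples in a row by linear interpolation.
    The last sample is repeated at the right edge (an assumption)."""
    out = []
    for i, value in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else value
        out.append(value)
        out.append((value + nxt) / 2)
    return out


def upscale_2x(image):
    """Spatial interpolation: upscale horizontally, then vertically."""
    rows = [upscale_row_2x(r) for r in image]
    cols = [upscale_row_2x(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def temporal_blend(a, b, alpha=0.5):
    """Temporal interpolation as a weighted average of two images."""
    return [[(1 - alpha) * x + alpha * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def convert(first, second, alpha=0.5):
    """Spatial scaling first, temporal scaling second."""
    return temporal_blend(upscale_2x(first), upscale_2x(second), alpha)
```

In this sketch the intermediate results returned by `upscale_2x` already have the output resolution, so the temporal step operates on the up-scaled pixel values rather than on the original ones.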

Reversing the order of temporal and spatial scaling is not obvious. The consequence of performing spatial up-scaling is that the amount of pixel data increases, which typically results in additional memory and computing resource requirements. Hence, the skilled person would not design a video conversion unit in which the temporal interpolation is performed on basis of spatially up-scaled intermediate images.

Preferably, the output image is computed without storage of complete intermediate images. That means that the image processing of an output image is divided into processing for a number of image parts, i.e. groups of pixels. Optionally, only intermediate results corresponding to these parts have to be stored temporarily.
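A minimal sketch of such part-wise processing is given below. It assumes pixel replication as a stand-in for the spatial interpolation and a plain average as the temporal interpolation; `iter_blocks`, `replicate` and `convert_blockwise` are hypothetical names. The returned `peak` value counts the intermediate pixel values held at any one time.

```python
def iter_blocks(height, width, block):
    """Yield (y, x, block_height, block_width) for every image part."""
    for y in range(0, height, block):
        for x in range(0, width, block):
            yield y, x, min(block, height - y), min(block, width - x)


def replicate(image, y, x, bh, bw, scale):
    """Up-scale one block by pixel replication (assumed stand-in)."""
    return [[image[y + j // scale][x + i // scale]
             for i in range(bw * scale)]
            for j in range(bh * scale)]


def convert_blockwise(first, second, scale=2, block=8):
    """Compute the output image part by part.

    Only the two up-scaled groups of the current block exist as
    intermediate results; no complete up-scaled image is stored.
    """
    h, w = len(first), len(first[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    peak = 0
    for y, x, bh, bw in iter_blocks(h, w, block):
        group_a = replicate(first, y, x, bh, bw, scale)
        group_b = replicate(second, y, x, bh, bw, scale)
        peak = max(peak, 2 * bh * bw * scale * scale)
        for j in range(bh * scale):
            for i in range(bw * scale):
                out[y * scale + j][x * scale + i] = (
                    group_a[j][i] + group_b[j][i]) / 2
    return out, peak
```

For two 4*4 input images processed in 2*2 blocks, the sketch holds 32 intermediate pixel values at a time, whereas two complete up-scaled 8*8 images would require 128.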

The temporal interpolation is performed on basis of at least one input image. An advantage of applying a single image for the interpolation is that the resulting output image is relatively sharp.

In an embodiment of the video conversion unit according to the invention, the temporal interpolation unit is arranged to compute the output image on basis of the first group of pixels and a second group of pixels, the second group of pixels being computed by the spatial interpolation unit on basis of a fourth group of pixels of a second one of the input images. In this embodiment according to the invention the temporal interpolation is performed on basis of pixel values of multiple images. An advantage of applying pixel values of multiple images is that occasional errors in motion vectors which are applied for motion compensated temporal interpolation will not result in severe artifacts. In other words, applying pixel values of multiple images has an averaging effect which prevents outliers.

An embodiment of the video conversion unit according to the invention comprises a motion estimation unit for estimating a motion vector which represents the relation between the third group of pixels and the fourth group of pixels on basis of the first and the second one of the input images, the video conversion unit being arranged to apply the motion vector to compute the output image. That means that the motion estimation is performed on basis of the original images which are not spatially scaled. In other words, the motion estimation which is required for motion compensated temporal interpolation is performed on the original images and not on the spatially up-converted images which are to be temporally converted.

An advantage of this approach is that the amount of computations is less than in the case that the motion estimation has to be performed on basis of the spatially up-converted images. Another advantage is that there is no need to temporarily store complete spatially up-scaled intermediate images. Another advantage is that the quality of the motion vectors is higher.

An embodiment of the video conversion unit according to the invention comprises a motion estimation unit which is arranged to estimate the motion vector on basis of a sum of differences between pixel values of the first image and further pixel values of the second image. For example, the match error might be the Sum of Absolute Differences (SAD). This match error is a relatively good measure for establishing a match between image parts and does not require extensive computations.

An embodiment of the video conversion unit according to the invention is characterized in that the first group of pixels corresponds to a block of pixels. A typical block comprises 8*8 or 16*16 pixels. In general, block-based image processing matches well with memory access. Hence, memory bandwidth usage is relatively low.

In an embodiment of the video conversion unit according to the invention, the spatial interpolation unit is arranged to perform sharpness enhancement for computing the first group of pixels. With sharpness enhancement, high frequencies are amplified. Optionally, the sharpness enhancement includes edge-enhancement. In the case of edge-enhancement, frequency components which do not exist in the original video signal can be generated by means of non-linear operators. The advantage of performing sharpness enhancement is to increase the sharpness of the images.
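The SAD match error mentioned above can be sketched as follows. The exhaustive search over a small window is an assumption chosen for brevity and is not the search strategy of any particular motion estimation unit; `sad` and `estimate_motion` are hypothetical names, and images are assumed to be lists of rows of integer pixel values.

```python
def sad(first, second, y, x, dy, dx, block):
    """Sum of Absolute Differences between a block of the first image
    at (y, x) and the block of the second image displaced by (dy, dx)."""
    total = 0
    for j in range(block):
        for i in range(block):
            total += abs(first[y + j][x + i]
                         - second[y + j + dy][x + i + dx])
    return total


def estimate_motion(first, second, y, x, block=8, search=2):
    """Return (dy, dx, error) of the candidate with the lowest SAD.

    Full search over [-search, search] in both directions (assumed);
    candidates that fall outside the second image are skipped.
    """
    h, w = len(second), len(second[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= y + dy and y + dy + block <= h
                      and 0 <= x + dx and x + dx + block <= w)
            if not inside:
                continue
            err = sad(first, second, y, x, dy, dx, block)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best[1], best[2], best[0]
```

Because the match error is evaluated on the original input images, each candidate costs `block * block` absolute differences at input resolution.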

In an embodiment of the video conversion unit according to the invention, the spatial interpolation unit is arranged to control the sharpness enhancement on basis of the motion vector. Preferably, the amount of sharpness enhancement is adjusted to the expected reduction in sharpness. Sharpness might be reduced because of motion. Hence, more enhancement will be applied in the case of relatively much motion, i.e. relatively long motion vectors. Preferably, the sharpness enhancement also depends on the direction of the motion, i.e. the orientation of the motion vectors. In the article "An overview of flaws in emerging television displays and remedial video processing", by G. de Haan and M.A. Klompenhouwer, in IEEE Transactions on Consumer Electronics, Aug. 2001, pp. 326-334, a method of motion dependent sharpness enhancement is disclosed.
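A minimal sketch of sharpness enhancement whose amount grows with the length of the motion vector is given below. The linear gain rule, its constants and the simple one-dimensional high-pass filter are assumptions made for illustration only; they are not the method of the cited article. The names `peaking_gain` and `sharpen_row` are hypothetical.

```python
import math


def peaking_gain(mv, base_gain=0.25, gain_per_pixel=0.125, max_gain=1.0):
    """Map a motion vector (dy, dx) to an enhancement gain.
    Longer vectors give more enhancement, up to max_gain (assumed rule)."""
    length = math.hypot(mv[0], mv[1])
    return min(max_gain, base_gain + gain_per_pixel * length)


def sharpen_row(row, gain):
    """Amplify high frequencies in one row of pixel values.

    The high-pass component is the pixel minus the mean of its two
    neighbours; edge pixels reuse themselves as the missing neighbour.
    Clipping to the valid pixel range is omitted.
    """
    n = len(row)
    out = []
    for i, value in enumerate(row):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        highpass = value - (left + right) / 2
        out.append(value + gain * highpass)
    return out
```

A direction-dependent variant could apply a different gain along and across the motion vector; that refinement is left out of this sketch.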

Sharpness might also be reduced because of interpolation. If the length of a motion vector is not an integer, the sharpness reduction caused by interpolation will be higher than for a motion vector whose length is an integer. However, if the fractional part of the length of the motion vector is related to the spatial scaling, then there is no substantial sharpness reduction because of interpolation. So, the sharpness enhancement depends on the fractional part of the length of the motion vectors.
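One possible reading of this relation is sketched below, under the assumption that a displacement causes no additional interpolation when, multiplied by the spatial scaling factor, it lands exactly on the output pixel grid. For example, with a scaling factor of 2 a displacement of half an input pixel corresponds to one whole output pixel. The function name `needs_subpixel_interpolation` is hypothetical.

```python
from fractions import Fraction


def needs_subpixel_interpolation(mv_component, scale):
    """Return True if a displacement, given in input pixels, does not
    fall on the grid of the spatially up-scaled image (assumed criterion)."""
    return (Fraction(mv_component) * scale).denominator != 1
```

Under this assumption, `needs_subpixel_interpolation(Fraction(1, 2), 2)` is false while `needs_subpixel_interpolation(Fraction(1, 4), 2)` is true, so only the latter case would call for extra sharpness enhancement.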

It is a further object of the invention to provide a method of the kind described in the opening paragraph which provides a sequence of output images with a substantially constant image quality.

This object of the invention is achieved in that temporal interpolation is performed to compute the output image on basis of a first group of pixels, the first group of pixels being computed by means of spatial interpolation of a third group of pixels of the first input image.

It is a further object of the invention to provide an image processing apparatus of the kind described in the opening paragraph, comprising a video conversion unit which provides a sequence of output images with a substantially constant image quality.

This object of the invention is achieved in that the temporal interpolation unit is arranged to compute the output image on basis of a first group of pixels, the first group of pixels being computed by the spatial interpolation unit on basis of a third group of pixels of the first input image. The image processing apparatus may comprise additional components, e.g. a display device for displaying the output images. The image processing apparatus might e.g. be a TV, a set top box, a VCR (Video Cassette Recorder), a satellite tuner, a DVD (Digital Versatile Disk) player or recorder.

It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph, which provides a sequence of output images with a substantially constant image quality.

This object of the invention is achieved in that the computer program product, after being loaded, provides processing means with the capability to perform temporal interpolation for computing the output image on basis of a first group of pixels, the first group of pixels being computed by means of spatial interpolation of a third group of pixels of the first input image.

Modifications of the video conversion unit and variations thereof may correspond to modifications and variations thereof of the image processing apparatus, the method and the computer program product described.

These and other aspects of the video conversion unit, of the image processing apparatus, of the method and of the computer program product according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:

Fig. 1 schematically shows a sequence of input images and a sequence of up-converted output images which are based on the sequence of input images;

Fig. 2 shows an embodiment of a video conversion unit according to the prior art;

Fig. 3 shows an embodiment of the video conversion unit according to the invention;

Fig. 4 shows another embodiment of the video conversion unit according to the invention; and

Fig. 5 shows an embodiment of the image processing apparatus according to the invention.

Same reference numerals are used to denote similar parts throughout the figures.

Fig. 1 schematically shows a sequence of input images 100-102 and a sequence of up-converted output images 104-110 which are based on the sequence of input images. The sequence of input images 100-102 has a first frequency, i.e. first image-rate or frame-rate, and the sequence of output images 104-110 has a second frequency, i.e. second image-rate, which is higher than the first frequency. The sequence of input images 100-102 comprises a first 100 and second 102 input image with a first resolution and the sequence of output images comprises an output image with a second resolution which is higher than the first resolution. A portion 120 of the output image 106 is based on a first group 116 of pixels and a second group 118 of pixels. The first group 116 of pixels is computed by means of spatial interpolation on basis of a third group 112 of pixels of the first input image 100 and the second group 118 of pixels is computed by means of spatial interpolation on basis of a fourth group 114 of pixels of the second input image 102. In this example the second resolution is twice the first resolution and the second frequency is twice the first frequency. It will be clear that other ratios between input and output resolutions and input and output frequencies are also possible.

In connection with Fig. 1 it is described that a portion 120 of the output image 106 is based on a first group 116 of pixels which is derived from the first input image and a second group 118 of pixels which is derived from the second input image. It should be noted that in the video conversion unit 300, 400 of the invention, as described in connection with Figs. 3 and 4, the temporal interpolation might also be performed on basis of a single image. In the case of a temporal interpolation on basis of a single image it is still possible to apply motion compensation. That means that motion vectors and the pixel values of the single image are applied to compute the motion compensated output image. In European patent application EP-A 0 475 499 A1, of G. de Haan and GFM De Poortere, this approach is disclosed. Typically, multiple images are required for the estimation of the motion vectors.

Fig. 2 shows an embodiment of a video conversion unit 200 according to the prior art. The video conversion unit 200 is arranged to convert a series of original input images into a larger series of output images. Output images are temporally located between two original input images. The series of original input images is provided at the input connector 214 and the output images are provided at the output connector 216. The video conversion unit 200 comprises a number of memory devices 208-212 for temporary storage of input images, intermediate pixel values and motion vectors, respectively. The video conversion unit 200 further comprises:

- a motion estimation unit 206 for estimating motion vectors on basis of input images;

- a temporal interpolation unit 202 for computing an intermediate image on basis of two input images and the corresponding motion vectors; and

- a spatial interpolation unit 204 for computing a spatially up-converted output image on basis of an intermediate image.

Fig. 3 shows an embodiment of the video conversion unit 300 according to the invention. The video conversion unit 300 is arranged to convert a sequence of input images 100-102, having a first frequency and comprising a first 100 and second 102 input image with a first resolution into a sequence of output images 104-110, having a second frequency which is higher than the first frequency and comprising an output image 106 with a second resolution being higher than the first resolution. The series of original input images is provided at the input connector 214 and the output images are provided at the output connector 216. The video conversion unit 300 comprises a number of memory devices 208-212 for temporary storage of spatially up-converted images, pixel values of input images and motion vectors, respectively. The video conversion unit 300 further comprises:

- a spatial interpolation unit 204 for computing spatially up-converted images on basis of input images;

- a motion estimation unit 206 for estimating motion vectors on basis of the spatially up-converted images; and

- a temporal interpolation unit 202 for computing an output image on basis of two spatially up-converted images and the corresponding motion vectors.
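Which pair of input images the temporal interpolation unit 202 draws on for a given output image, and at which temporal position between them, follows from the ratio of the two frequencies. A minimal sketch, assuming that output image k lies at time k/ratio measured in input-image periods; the name `output_schedule` is hypothetical.

```python
from fractions import Fraction


def output_schedule(num_outputs, ratio=2):
    """For each output image return (index of preceding input image,
    temporal position between that input image and the next one)."""
    schedule = []
    for k in range(num_outputs):
        position = Fraction(k, ratio)
        first_index = position.numerator // position.denominator
        alpha = position - first_index
        schedule.append((first_index, alpha))
    return schedule
```

With a ratio of 2, as in the example of Fig. 1, every second output image has temporal position 0 and coincides in time with an input image, while the others lie halfway between two input images.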

The motion estimation unit 206 is e.g. as specified in the article "True-Motion Estimation with 3-D Recursive Search Block Matching" by G. de Haan et al. in IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pages 368-379. The temporal interpolation unit 202, the spatial interpolation unit 204 and the motion estimation unit 206 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet. Optionally an application specific integrated circuit provides the disclosed functionality.

Fig. 4 shows another embodiment of the video conversion unit 400 according to the invention. The video conversion unit 400 is arranged to convert a sequence of input images 100-102, having a first frequency and comprising a first 100 and second 102 input image with a first resolution into a sequence of output images 104-110, having a second frequency which is higher than the first frequency and comprising an output image 106 with a second resolution being higher than the first resolution. The series of original input images is provided at the input connector 214 and the output images are provided at the output connector 216. The video conversion unit 400 comprises a first memory device 208 for temporary storage of input images 100-102 and a second memory device 212 for temporary storage of motion vectors. The video conversion unit 400 further comprises a motion estimation unit 206 for estimating motion vectors on basis of the input images 100-102. The video conversion unit 400 further comprises a temporal interpolation unit 202 and a spatial interpolation unit 204, characterized in that the temporal interpolation unit 202 is arranged to compute the output image 106 on basis of a first group 116 of pixels and a second group 118 of pixels, the first group 116 of pixels being computed by the spatial interpolation unit 204 on basis of a third group 112 of pixels of the first input image 100 and the second group 118 of pixels being computed by the spatial interpolation unit 204 on basis of a fourth group 114 of pixels of the second input image 102.

The working of this embodiment of the video conversion unit 400 according to the invention is as follows. First the motion vectors for a new output image 106 are computed on basis of two original input images 100 and 102. These motion vectors are temporarily stored in the second memory device 212. On basis of a first one of the motion vectors the appropriate pixel values, i.e. corresponding to the third group 112 of pixels, are fetched from the first memory device 208 which holds the first 100 and second 102 input image. On basis of this first one of the motion vectors also the fourth group 114 of pixels is fetched from the first memory device 208. The spatial interpolation unit 204 computes the first group 116 of pixels on basis of the third group 112 of pixels. For example, the third group 112 of pixels comprises 9*9 (or 8*8) pixels and the first group 116 of pixels comprises 16*16 pixels.

Optionally, sharpness enhancement, as described above, is applied during this computation. The spatial interpolation unit 204 computes the second group 118 of pixels on basis of the fourth group 114 of pixels. Optionally, sharpness enhancement, as described in the article "An overview of flaws in emerging television displays and remedial video processing", by G. de Haan and M.A. Klompenhouwer, in IEEE Transactions on Consumer Electronics, Aug. 2001, pp. 326-334, is applied during this computation. The temporal interpolation unit 202 combines the first group 116 of pixels and the second group 118 of pixels into a portion 120 of the output image 106.

After that for a second and subsequent ones of the motion vectors similar processing steps are performed. Eventually, the complete output image is computed.
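The processing steps for one motion vector can be sketched as follows. The sketch assumes that the block position refers to the first input image, that the motion vector (dy, dx) points from the third group 112 to the fourth group 114, that pixel replication stands in for the spatial interpolation, and that the temporal interpolation is a plain average. The names `fetch_block`, `upscale_block` and `interpolate_portion` are hypothetical.

```python
def fetch_block(image, y, x, size):
    """Fetch a size*size group of pixels; coordinates outside the
    image are clamped to the nearest edge pixel (an assumption)."""
    h, w = len(image), len(image[0])
    return [[image[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
             for i in range(size)]
            for j in range(size)]


def upscale_block(block, scale=2):
    """Spatial interpolation stand-in: pixel replication."""
    return [[block[j // scale][i // scale]
             for i in range(len(block[0]) * scale)]
            for j in range(len(block) * scale)]


def interpolate_portion(first, second, y, x, mv, size=8, scale=2):
    """Compute one portion of the output image for one motion vector."""
    dy, dx = mv
    third_group = fetch_block(first, y, x, size)              # group 112
    fourth_group = fetch_block(second, y + dy, x + dx, size)  # group 114
    first_group = upscale_block(third_group, scale)           # group 116
    second_group = upscale_block(fourth_group, scale)         # group 118
    # Temporal interpolation: average the two up-scaled groups.
    return [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(first_group, second_group)]
```

Repeating `interpolate_portion` for every motion vector and writing each result to its place in the output image yields the complete output image, while only the up-scaled groups of the current portion exist as intermediate results.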

Fig. 5 shows an embodiment of the image processing apparatus 500 according to the invention, comprising:

- Receiving means 502 for receiving a signal representing input images. The signal may be a broadcast signal received via an antenna or cable but may also be a signal from a storage device like a VCR (Video Cassette Recorder) or Digital Versatile Disk (DVD). The signal is provided at the input connector 510;

- The video conversion unit 504 as described in connection with Fig. 3 or Fig. 4; and

- A display device 506 for displaying the output images of the video conversion unit 504.

The image processing apparatus 500 might e.g. be a TV. Alternatively the image processing apparatus 500 does not comprise the optional display device but provides the output images to an apparatus that does comprise a display device 506. Then the image processing apparatus 500 might be e.g. a set top box, a satellite-tuner, a VCR player, a DVD player or recorder. Optionally the image processing apparatus 500 comprises storage means, like a hard-disk or means for storage on removable media, e.g. optical disks. The image processing apparatus 500 might also be a system being applied by a film-studio or broadcaster.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware.

Claims

1. A video conversion unit (300,400) for converting a sequence of input images (100-102), having a first frequency and comprising a first (100) input image with a first resolution into a sequence of output images (104-110), having a second frequency being different from the first frequency and comprising an output image (106) with a second resolution being higher than the first resolution, the video conversion unit (300,400) comprising a temporal interpolation unit (202) and a spatial interpolation unit (204), characterized in that the temporal interpolation unit (202) is arranged to compute the output image on basis of a first group (116) of pixels, the first group (116) of pixels being computed by the spatial interpolation unit (204) on basis of a third group (112) of pixels of the first input image.

2. A video conversion unit (300,400) as claimed in claim 1, characterized in that the temporal interpolation unit (202) is arranged to compute the output image on basis of the first group (116) of pixels and a second group (118) of pixels, the second group (118) of pixels being computed by the spatial interpolation unit (204) on basis of a fourth group (114) of pixels of a second one of the input images.

3. A video conversion unit (300,400) as claimed in claim 2, characterized in comprising a motion estimation unit (206) for estimating a motion vector which represents the relation between the third group (112) of pixels and the fourth group (114) of pixels on basis of the first and the second one of the input images, the video conversion unit being arranged to apply the motion vector to compute the output image.

4. A video conversion unit (400) as claimed in claim 3, characterized in that the motion estimation unit (206) is arranged to estimate the motion vector on basis of a sum of differences between pixel values of the first image and further pixel values of the second image.

5. A video conversion unit (300,400) as claimed in claim 1, characterized in that the first group (116) of pixels corresponds to a block of pixels.

6. A video conversion unit (300,400) as claimed in claim 1, characterized in that the spatial interpolation unit (204) is arranged to perform sharpness enhancement for computing the first group (116) of pixels.

7. A video conversion unit (300,400) as claimed in claim 6, characterized in that the spatial interpolation unit (204) is arranged to control the sharpness enhancement on basis of the motion vector.

8. A method of converting a sequence of input images (100-102), having a first frequency and comprising a first input image with a first resolution into a sequence of output images (104-110), having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution, characterized in that temporal interpolation is performed to compute the output image on basis of a first group (116) of pixels, the first group (116) of pixels being computed by means of spatial interpolation of a third group (112) of pixels of the first input image.

9. An image processing apparatus (500) comprising: receiving means (502) for receiving a signal corresponding to a sequence of input images (100-102), having a first frequency and comprising a first input image with a first resolution; and a video conversion unit (300,400) for converting the sequence of input images (100-102) into a sequence of output images (104-110), having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution, as claimed in claim 1.

10. An image processing apparatus (500) as claimed in claim 9, characterized in further comprising a display device (506) for displaying the output images (104-110).

11. An image processing apparatus (500) as claimed in claim 10, characterized in that it is a TV.

12. A computer program product to be loaded by a computer arrangement, comprising instructions to convert a sequence of input images (100-102), having a first frequency and comprising a first input image with a first resolution into a sequence of output images (104-110), having a second frequency being different from the first frequency and comprising an output image with a second resolution being higher than the first resolution, the computer arrangement comprising processing means and a memory, the computer program product, after being loaded, providing said processing means with the capability to perform temporal interpolation for computing the output image on basis of a first group (116) of pixels, the first group (116) of pixels being computed by means of spatial interpolation of a third group (112) of pixels of the first input image.