GB2277006A

GB2277006A - Generating motion vectors; subsampling video signals, interpolating correlation surfaces

Info

Publication number: GB2277006A
Application number: GB9312129A
Authority: GB
Inventors: Morgan William Amos David; Martin Rex Dorricott; Carl William Walters
Original assignee: Sony United Kingdom Ltd
Current assignee: Sony Europe Ltd
Priority date: 1993-04-08
Filing date: 1993-06-11
Publication date: 1994-10-12
Anticipated expiration: 2013-06-11
Also published as: GB9312129D0; GB2277006B

Abstract

A motion compensated video signal processing apparatus comprises a subsampler (170) for subsampling input images of an input digital video signal, to generate corresponding subsampled images; a block comparator (190) for comparing blocks of pixels from a pair of the subsampled images, to generate a first plurality of original correlation surfaces, each comprising an array of correlation values representing correlation between the respective blocks of pixels; means (200) for generating a second plurality of interpolated correlation surfaces by interpolation from the original correlation surfaces, the second plurality being greater than the first plurality; means (210) for interpolating between correlation values in each interpolated correlation surface, to detect a point of maximum correlation in that interpolated correlation surface; means (210) for generating a respective motion vector from each interpolated correlation surface, in dependence on the detected point of maximum correlation in that interpolated correlation surface; means (230) for selecting motion vectors for use in interpolation of respective pixels of an output image of an output digital video signal, by comparing test blocks, pointed to by a motion vector under test, in the pair of subsampled images, each test block comprising pixels of the respective subsampled image and test values interpolated from those pixels; and a motion compensated interpolator (140) for interpolating the output image from a pair of input images of the input digital video signal corresponding to the pair of subsampled images, according to the respective selected motion vectors.

Description

MOTION COMPENSATED VIDEO SIGNAL PROCESSING

This invention relates to motion compensated video signal processing.

Motion compensated video signal processing is used in applications such as television standards conversion, film standards conversion and conversion between video and film standards.

In a motion compensated television standards converter, such as the converter described in the British Published Patent Application number GB-A-2 231 749, pairs of successive input images are processed to generate sets of motion vectors representing image motion between the pair of input images. The processing is carried out on discrete blocks of the images, so that each motion vector represents the interimage motion of the contents of a respective block.

Each set of motion vectors is then supplied to a motion vector reducer which derives a subset of the set of motion vectors for each block. The subset is then passed to a motion vector selector which assigns one of the subset of motion vectors to each picture element (pixel) in each block of the image. The selected motion vector for each pixel is supplied to a motion compensated interpolator; the interpolator operates on progressive-scan converted versions of the input images to interpolate successive output images, taking into account the motion between the input images.

Motion compensated video signal processing such as that described above requires powerful and complex processing apparatus to carry out the very large number of calculations required to generate and process motion vectors for each pair of input images. This is particularly true if the images are in a high definition format, or if the processing is to be performed on an input video signal to produce an output video signal in real time, in which case multiple sets of identical apparatus may be operated in parallel in order to generate sets of motion vectors for each output image in the time available (e.g. an output field period).

This invention provides a motion compensated video signal processing apparatus comprising: a subsampler for subsampling input images of an input digital video signal, to generate corresponding subsampled images; a block comparator for comparing blocks of pixels from a pair of the subsampled images, to generate a first plurality of original correlation surfaces, each comprising an array of correlation values representing correlation between the respective blocks of pixels; means for generating a second plurality of interpolated correlation surfaces by interpolation from the original correlation surfaces, the second plurality being greater than the first plurality; means for interpolating between correlation values in each interpolated correlation surface, to detect a point of maximum correlation in that interpolated correlation surface; means for generating a respective motion vector from each interpolated correlation surface, in dependence on the detected point of maximum correlation in that interpolated correlation surface; means for selecting motion vectors for use in interpolation of respective pixels of an output image of an output digital video signal, by comparing test blocks, pointed to by a motion vector under test, in the pair of subsampled images, each test block comprising pixels of the respective subsampled image and test values interpolated from those pixels; and a motion compensated interpolator for interpolating the output image from a pair of input images of the input digital video signal corresponding to the pair of subsampled images, according to the respective selected motion vectors.

As mentioned above, motion compensated video signal processing places great demands on the processing capacity of a video signal processing apparatus, particularly when the processing is to be performed on high definition video signals and in real time. The invention recognises that a number of features of motion compensated processing are particularly demanding, and provides a number of measures to reduce the processing requirements of these features. This can obviate or reduce the need for parallel processing to generate and use the motion vectors, leading to a reduction in the complexity (and the corresponding cost and size) of the apparatus.

The measures provided by the invention are:

1. The interpolation of output images (e.g. fields or frames) directly from the input images. This removes the need for progressive scan conversion of the input images (e.g. fields or frames).

2. The generation of motion vectors from subsampled versions of the input images, which reduces the processing overhead of block matching.

3. Interpolating a larger number of correlation surfaces from those generated by block matching. Each interpolated correlation surface can then be used to generate a motion vector for subsequent use in interpolation.

4. Performing minimum (correlation maximum) detection in each interpolated correlation surface to sub-pixel accuracy - this helps to alleviate the loss in resolution of the correlation surfaces caused by the subsampling process.

5. Pixel values are interpolated for use in the test blocks during motion vector selection. Again, this helps to alleviate the loss in resolution of the correlation surfaces caused by the subsampling process.

Preferably, the input digital video signal has a predetermined resolution, and the apparatus comprises means for receiving a digital video signal; and means for adding dummy pixel values to the received digital video signal to generate the input digital video signal, in the case when the received digital video signal has a lower resolution than the predetermined resolution.

In an advantageously simple embodiment, the dummy pixel values are pixel values representing black pixels.

In order that a representation of the output of the apparatus can be viewed, recorded or transmitted using conventional definition equipment even in the case where the output digital video signal is a high definition video signal, it is preferred that the apparatus comprises a down-converter for generating a second output digital video signal from the first-mentioned output digital video signal, the second output digital video signal having a lower resolution than the first-mentioned output digital video signal.

Preferably the apparatus comprises a first time base changer for selecting pairs of input images for use in interpolation of a respective output image; and a second time base changer for selecting pairs of subsampled images, corresponding to the pairs of input images selected by the first time base changer, for use in the generation of a respective set of motion vectors indicative of image motion between that pair of subsampled images.

In a preferred embodiment the first time base changer comprises means for generating a control signal indicative of the temporal position of each output image with respect to the pair of input images selected for use in interpolation of that image; and the second time base changer comprises means for generating a control signal indicative of the temporal position of each output image with respect to the pair of subsampled images selected for use in the generation of motion vectors.

Although the apparatus is useful when used to process conventional definition (or resolution) video signals, it is preferred that the input digital video signal is a high resolution video signal.

Preferably the input digital video signal is an interlaced video signal.

The apparatus is useful with many different video signal formats.

However, it is preferred that the input digital video signal is selected from the group consisting of: an 1125/60 2:1 interlaced video signal; an 1125/30 1:1 non-interlaced video signal; a 1250/50 2:1 interlaced video signal; a 1250/25 1:1 non-interlaced video signal; a 525/60 2:1 interlaced video signal; a 525/30 1:1 non-interlaced video signal; a 625/50 2:1 interlaced video signal; a 625/25 1:1 non-interlaced video signal; and an 1125/24 3232 pull-down video signal.

Again, although the apparatus is useful when used to generate conventional definition (or resolution) video signals, it is preferred that the output digital video signal is a high resolution video signal.

Preferably the output digital video signal is an interlaced video signal.

Preferably the output digital video signal is selected from the group consisting of: an 1125/60 2:1 interlaced video signal; an 1125/30 1:1 non-interlaced video signal; a 1250/50 2:1 interlaced video signal; a 1250/25 1:1 non-interlaced video signal; a 525/60 2:1 interlaced video signal; a 525/30 1:1 non-interlaced video signal; a 625/50 2:1 interlaced video signal; a 625/25 1:1 non-interlaced video signal; and an 1125/24 3232 pull-down video signal.

Apparatus according to the invention is particularly usefully employed in a television standards conversion apparatus, a film standards conversion apparatus or an apparatus for converting between film and television standards.

Viewed from a second aspect this invention provides a method of motion compensated video signal processing, the method comprising the steps of: subsampling input images of an input digital video signal, to generate corresponding subsampled images; comparing blocks of pixels from a pair of the subsampled images, to generate a first plurality of original correlation surfaces, each comprising an array of correlation values representing correlation between the respective blocks of pixels; generating a second plurality of interpolated correlation surfaces by interpolation from the original correlation surfaces, the second plurality being greater than the first plurality; interpolating between correlation values in each interpolated correlation surface, to detect a point of maximum correlation in that interpolated correlation surface; generating a respective motion vector from each interpolated correlation surface, in dependence on the detected point of maximum correlation in that interpolated correlation surface; selecting motion vectors for use in interpolation of respective pixels of an output image of an output digital video signal, by comparing test blocks, pointed to by a motion vector under test, in the pair of subsampled images, each test block comprising pixels of the respective subsampled image and test values interpolated from those pixels; and interpolating the output image from a pair of input images of the input digital video signal corresponding to the pair of subsampled images, according to the respective selected motion vectors.

An embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings, throughout which like parts are referred to by like references, and in which:

Figure 1 is a schematic block diagram of a motion compensated television standards conversion apparatus;
Figure 2 is a schematic diagram illustrating vertical subsampling of an interlaced video field;
Figure 3 is a schematic diagram of a correlation surface;
Figures 4a, 4b and 4c schematically illustrate the interpolation of correlation surfaces;
Figure 5 is a schematic diagram of a part of an image;
Figures 6a and 6b are schematic diagrams of original and interpolated correlation surfaces derived from the image of Figure 5;
Figure 7 is a schematic block diagram of a part of a correlation surface processor;
Figure 8 is a schematic block diagram of a motion vector estimator;
Figure 9 illustrates an array of points on a correlation surface; and
Figures 10 and 11 schematically illustrate pixel interpolation performed during motion vector selection.

Figure 1 is a schematic block diagram of a motion compensated television standards conversion apparatus. The apparatus receives an input interlaced digital video signal 50 (e.g. an 1125/60 2:1 high definition video signal (HDVS)) and generates an output interlaced digital video signal 60 (e.g. a 1250/50 2:1 signal).

The input video signal 50 is first supplied to an input buffer/packer 110. In the case of a conventional definition input signal, the input buffer/packer 110 formats the image data into a high definition (16:9 aspect ratio) format, padding with black pixels where necessary. For a HDVS input the input buffer/packer 110 merely provides buffering of the data.
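The padding performed by the input buffer/packer can be pictured with the following minimal sketch. It is an illustration only: the function name `pack_into_frame`, the centred placement and the default black value of 0 are assumptions, since the specification states only that black pixels are added where necessary.

```python
def pack_into_frame(image, frame_height, frame_width, black=0):
    """Place `image` (a list of pixel rows) in a larger frame of black pixels.

    Hypothetical helper: centring the image is an assumption made for this
    sketch, not a detail taken from the specification.
    """
    height, width = len(image), len(image[0])
    if height > frame_height or width > frame_width:
        raise ValueError("image is larger than the target frame")
    top = (frame_height - height) // 2
    left = (frame_width - width) // 2
    # Start from an all-black frame, then copy the received rows into place.
    frame = [[black] * frame_width for _ in range(frame_height)]
    for y, row in enumerate(image):
        frame[top + y][left:left + width] = row
    return frame
```

For example, a 2 x 2 image packed into a 4 x 6 frame occupies rows 1 and 2, columns 2 and 3, with every other pixel black.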

The data are passed from the input buffer/packer 110 to a matrix circuit 120 in which (if necessary) the input video signal's colorimetry is converted to the colorimetry of the desired output signal, such as the standard "CCIR recommendation 601" (Y,Cr,Cb) colorimetry.

From the matrix circuit 120 the input video signal is passed to a time base changer and delay 130, and via a sub-sampler 170 to a subsampled time base changer and delay 180. The time base changer and delay 130 determines the temporal position of each field of the output video signal, and selects the two fields of the input video signal which are temporally closest to that output field for use in interpolating that output field. For each field of the output video signal, the two input fields selected by the time base changer are appropriately delayed before being supplied to an interpolator 140 in which that output field is interpolated. A control signal t, indicating the temporal position of each output field with respect to the two selected input fields, is supplied from the time base changer and delay 130 to the interpolator 140.
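The field selection and the control signal t can be illustrated by the following sketch. It assumes constant input and output field rates and ignores the delay elements; the function name `select_field_pair` and the convention that t runs from 0 (at the earlier field) towards 1 (at the later field) are assumptions made for illustration.

```python
from fractions import Fraction

def select_field_pair(out_index, in_rate=60, out_rate=50):
    """Return (earlier, later, t) for one output field.

    earlier/later are the indices of the two input fields that bracket the
    output field in time; t is the fractional temporal position of the
    output field between them. Hypothetical helper with assumed field rates.
    """
    # Position of the output field, measured in input field periods.
    pos = Fraction(out_index * in_rate, out_rate)
    earlier = pos.numerator // pos.denominator
    t = pos - earlier
    return earlier, earlier + 1, t
```

With a 60 Hz input and a 50 Hz output, output field 1 falls one fifth of the way between input fields 1 and 2, and the pattern repeats every five output fields.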

The subsampled time base changer and delay 180 operates in a similar manner, but using spatially subsampled video supplied by the subsampler 170. Pairs of fields, corresponding to the pairs selected by the time base changer 130, are selected by the subsampled time base changer and delay 180 from the subsampled video, to be used in the generation of motion vectors.

The time base changers 130 and 180 can operate according to synchronisation signals associated with the input video signal, the output video signal, or both. In the case in which only one synchronisation signal is supplied, the timing of fields of the other of the two video signals is generated deterministically within the time base changers 130, 180.

The pairs of fields of the subsampled input video signal selected by the subsampled time base changer and delay 180 are supplied to a motion processor 185 comprising a direct block matcher 190, a correlation surface processor 200, a motion vector estimator 210, a motion vector reducer 220, a motion vector selector 230 and a motion vector post-processor 240. The pairs of input fields are supplied first to the direct block matcher 190 which calculates correlation surfaces representing the spatial correlation between search blocks in the temporally earlier of the two selected input fields and (larger) search areas in the temporally later of the two input fields.

From the correlation surfaces output by the block matcher 190, the correlation surface processor 200 generates a larger number of interpolated correlation surfaces, which are then passed to the motion vector estimator 210. The motion vector estimator 210 detects points of greatest correlation in the interpolated correlation surfaces. (The original correlation surfaces actually represent the difference between blocks of the two input fields; this means that the points of maximum correlation are in fact minima on the correlation surfaces, and are referred to as "minima"). In order to detect a minimum, additional points on the correlation surfaces are interpolated, providing a degree of compensation for the loss of resolution caused by the use of subsampled video to generate the surfaces. From the detected minimum on each correlation surface, the motion vector estimator 210 generates a motion vector which is supplied to the motion vector reducer 220.
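This passage does not say how the additional points are interpolated. Purely as an illustration of sub-pixel minimum detection, the sketch below uses a common stand-in technique, fitting a parabola through the lowest sample and its two neighbours along one axis. The function name `subpixel_offset` and the choice of a parabolic fit are assumptions and should not be read as the method of the apparatus.

```python
def subpixel_offset(left, centre, right):
    """Estimate the offset of the true minimum from the centre sample.

    left, centre and right are three consecutive correlation values with
    the lowest one in the middle. The result is in units of the sample
    spacing. Parabolic fitting is a stand-in technique, used here only
    for illustration.
    """
    curvature = left - 2.0 * centre + right
    if curvature <= 0.0:
        # Flat or degenerate neighbourhood: keep the integer position.
        return 0.0
    return 0.5 * (left - right) / curvature
```

For samples taken from the curve (x - 0.25)^2 at x = -1, 0 and 1, the function returns an offset of 0.25, recovering the position of the minimum between the sample points.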

The motion vector estimator 210 also performs a confidence test on each generated motion vector to establish whether that motion vector is significant above the average data level, and associates a confidence flag with each motion vector indicative of the result of the confidence test. The confidence test, known as the "threshold" test, is described (along with certain other features of the apparatus of Figure 1) in GB-A-2 231 749. The confidence test is also discussed in more detail below.

A test is also performed by the motion vector estimator 210 to detect whether each vector is aliased. In this test, the correlation surface (apart from an exclusion zone around the detected minimum) is examined to detect the next lowest minimum. If this second minimum does not lie at the edge of the exclusion zone, the motion vector derived from the original minimum is flagged as being potentially aliased.
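The alias test can be sketched as follows. This is a minimal illustration under stated assumptions: the exclusion zone is taken to be a square of half-width `radius` around the minimum, "at the edge of the exclusion zone" is taken to mean immediately adjacent to that square, and the surface is assumed to be larger than the zone. The function names are hypothetical.

```python
def find_minimum(surface, excluded=frozenset()):
    """Return ((row, col), value) of the lowest point not in `excluded`."""
    best = None
    for r, row in enumerate(surface):
        for c, value in enumerate(row):
            if (r, c) in excluded:
                continue
            if best is None or value < best[1]:
                best = ((r, c), value)
    return best

def is_potentially_aliased(surface, radius=1):
    """Flag a vector as potentially aliased, following the test in the text."""
    (r0, c0), _ = find_minimum(surface)
    # Square exclusion zone around the detected minimum (assumed shape).
    zone = {(r, c)
            for r in range(r0 - radius, r0 + radius + 1)
            for c in range(c0 - radius, c0 + radius + 1)}
    (r1, c1), _ = find_minimum(surface, zone)
    # A second minimum touching the zone is just the slope of the first
    # valley; one found further away suggests a separate, competing match.
    at_edge = max(abs(r1 - r0), abs(c1 - c0)) == radius + 1
    return not at_edge
```

A surface with a single valley yields a second minimum right at the border of the zone and is not flagged, whereas a surface with two well separated valleys is flagged.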

The motion vector reducer 220 operates to reduce the choice of possible motion vectors for each pixel of the output field, before the motion vectors are supplied to the motion vector selector 230. The output field is notionally divided into blocks of pixels, each block having a corresponding position in the output field to that of a search block in the earlier of the selected input fields. The motion vector reducer compiles a group of four motion vectors to be associated with each block of the output field, with each pixel in that block eventually being interpolated using a selected one of that group of four motion vectors.

Vectors which have been flagged as "aliased" are re-qualified during vector reduction if they are identical to non-flagged vectors in nearby blocks.

As part of its function, the motion vector reducer 220 counts the frequencies of occurrence of "good" motion vectors (i.e. motion vectors which pass the confidence test and the alias test, or which were requalified as non-aliased), with no account taken of the position of the blocks of the input fields used to obtain those motion vectors. The good motion vectors are then ranked in order of decreasing frequency.

The most common of the good motion vectors which are significantly different to one another are then classed as "global" motion vectors.

Three motion vectors which pass the confidence test are then selected for each block of output pixels and are supplied, with the zero motion vector, to the motion vector selector 230 for further processing.

These three selected motion vectors are selected in a predetermined order of preference from: (i) the motion vector generated from the corresponding search block (the "local" motion vector); (ii) those generated from surrounding search blocks ("neighbouring" motion vectors); and (iii) the global motion vectors.
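The ranking of good motion vectors and the order of preference can be sketched as below. The number of global vectors, the `min_separation` measure of "significantly different", the skipping of duplicates and of the zero vector, and both function names are assumptions made for illustration; the specification does not give these details here.

```python
from collections import Counter

def global_vectors(good_vectors, count=4, min_separation=2):
    """Return the most common good vectors that differ significantly."""
    # Rank by decreasing frequency, ignoring where each vector came from.
    ranked = [v for v, _ in Counter(good_vectors).most_common()]
    chosen = []
    for v in ranked:
        if all(max(abs(v[0] - g[0]), abs(v[1] - g[1])) >= min_separation
               for g in chosen):
            chosen.append(v)
        if len(chosen) == count:
            break
    return chosen

def reduce_vectors(local, neighbours, global_list, wanted=3):
    """Pick `wanted` vectors in order of preference, then add the zero vector."""
    picked = []
    for v in [local, *neighbours, *global_list]:
        # `None` stands for a vector that failed the confidence test.
        if v is not None and v != (0, 0) and v not in picked:
            picked.append(v)
        if len(picked) == wanted:
            break
    return picked + [(0, 0)]
```

Each output block then carries a group of up to four candidates, with the local vector preferred over neighbouring vectors and neighbouring vectors preferred over global ones.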

The motion vector selector 230 also receives as inputs the two input fields which were selected by the subsampled time base changer and delay 180 and which were used to calculate the motion vectors.

These fields are suitably delayed so that they are supplied to the motion vector selector 230 at the same time as the vectors derived from them. The motion vector selector 230 supplies an output comprising one motion vector per pixel of the output field. This motion vector is selected from the four motion vectors for that block supplied by the motion vector reducer 220.

The vector selection process involves detecting the degree of correlation between test blocks of the two input fields pointed to by a motion vector under test. The motion vector having the greatest degree of correlation between the test blocks is selected for use in interpolation of the output pixel. A "motion flag" is also generated by the vector selector. This flag is set to "static" (no motion) if the degree of correlation between blocks pointed to by the zero motion vector is greater than a preset threshold.
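A simplified sketch of this selection step follows. It assumes that correlation is measured as a sum of absolute differences (so a lower sum means greater correlation), that the test block in the earlier field is centred on the pixel while the block in the later field is displaced by the whole vector, and that vectors are (dy, dx) pairs in whole pixels. These conventions and the function names are illustrative assumptions; the interpolated test values used in the apparatus are not modelled.

```python
def block_sad(field_a, field_b, centre, vector, half=1):
    """Sum of absolute differences between two test blocks.

    The block in field_a is centred on `centre`; the block in field_b is
    displaced from it by `vector` = (dy, dx).
    """
    y, x = centre
    dy, dx = vector
    total = 0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            total += abs(field_a[y + j][x + i]
                         - field_b[y + dy + j][x + dx + i])
    return total

def select_vector(field_a, field_b, centre, candidates, static_threshold=0):
    """Return (best_vector, static_flag) for one output pixel."""
    sads = {v: block_sad(field_a, field_b, centre, v) for v in candidates}
    best = min(candidates, key=sads.get)
    # High correlation for the zero vector (a low SAD) marks the pixel static.
    static = block_sad(field_a, field_b, centre, (0, 0)) <= static_threshold
    return best, static
```

When the later field is a copy of the earlier one shifted two pixels to the right, the candidate (0, 2) gives a difference of zero and is selected, and the pixel is not marked static.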

The vector post-processor reformats the motion vectors selected by the motion vector selector 230 to reflect any vertical or horizontal scaling of the picture, and supplies the reformatted vectors to the interpolator 140. Using the motion vectors, the interpolator 140 interpolates an output field from the corresponding two (non-subsampled) interlaced input fields selected by the time base changer and delay 130, taking into account any image motion indicated by the motion vectors currently supplied to the interpolator 140.

If the motion flag indicates that the current output pixel lies in a moving or temporally changing part of the image, pixels from the two selected fields supplied to the interpolator are combined in relative proportions depending on the temporal position of the output field with respect to the two input fields (as indicated by the control signal t), so that a larger proportion of the nearer input field is used. If the motion flag is set to "static" then the temporal weighting is fixed at 50% of each input field. The output of the interpolator 140 is passed to an output buffer 150 for output as a high definition output signal, and to a down-converter 160 which generates a conventional definition output signal 165, using the motion flag.
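The temporal weighting can be written as a short sketch. It assumes that t runs from 0 (output field coincident with the earlier input field) to 1 (coincident with the later field) and that the two pixel values have already been fetched along the selected motion vector; the function name `interpolate_pixel` is hypothetical.

```python
def interpolate_pixel(earlier, later, t, static):
    """Blend two motion-aligned pixel values into one output pixel.

    t is the assumed temporal position of the output field (0 = earlier
    field, 1 = later field); `static` is the motion flag for this pixel.
    """
    # Static pixels use a fixed 50% weighting; moving pixels favour the
    # input field that is nearer in time to the output field.
    weight_later = 0.5 if static else t
    return (1.0 - weight_later) * earlier + weight_later * later
```

With t = 0.25, a moving pixel takes 75% of its value from the earlier field and 25% from the later field, whereas a static pixel takes half from each.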

The down-converter 160 allows a representation of the output of the apparatus (which may be, for example, a high definition video signal) to be monitored, transmitted and/or recorded using conventional definition apparatus. This has benefits because conventional definition recording equipment is significantly cheaper and very much more widespread than high definition equipment. For example, a simultaneous output of conventional and high definition video may be required for respective transmission by terrestrial and satellite channels.

The subsampler 170 performs horizontal and vertical spatial subsampling of the input video fields received from the matrix 120, before those input fields are supplied to the time base changer 180.

Horizontal subsampling is a straightforward operation in that the input fields are first prefiltered by a half-bandwidth low pass filter (in the present case of 2:1 horizontal decimation) and alternate video samples along each video line are then discarded, thereby reducing by one half the number of samples along each video line.
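The prefilter-then-discard operation can be sketched for a single video line as follows. The three-tap filter (1/4, 1/2, 1/4) is only a crude stand-in for a half-bandwidth low pass filter, and the repetition of edge samples at the ends of the line is likewise an assumption; the function name `subsample_line` is hypothetical.

```python
def subsample_line(line, taps=(0.25, 0.5, 0.25)):
    """Low-pass filter one video line, then discard alternate samples."""
    half = len(taps) // 2
    n = len(line)
    filtered = []
    for x in range(n):
        acc = 0.0
        for k, tap in enumerate(taps):
            # Repeat the edge samples beyond the ends of the line.
            src = min(max(x + k - half, 0), n - 1)
            acc += tap * line[src]
        filtered.append(acc)
    # 2:1 decimation: keep every other filtered sample.
    return filtered[::2]
```

An eight-sample line is reduced to four samples, and a line of constant value is left unchanged in level.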

Vertical subsampling of the input fields is complicated by the fact that, in this embodiment, the input video signal 50 is interlaced.

This means that successive lines of video samples in each interlaced field are effectively two video lines apart, and that the lines in each field are vertically displaced from those in the preceding or following field by one video line of the complete frame.

One approach to vertical subsampling would be to perform progressive scan conversion (to generate successive progressively scanned video frames each having 1125 lines) and then to subsample the progressively scanned frames by a factor of 2 to perform the vertical subsampling. However, efficient progressive scan conversion would require a degree of motion compensated processing, which processing could adversely affect the operation of the motion processor 185.

Furthermore, real-time progressive scan conversion of a high definition video signal would require particularly powerful and complex processing apparatus.

A simpler approach to vertical spatial subsampling is shown in Figure 2, in which the input fields are first low pass filtered in the vertical direction (to reduce potential aliasing) and a filtering operation is then performed which effectively displaces each pixel vertically by half a video line downwards (for even fields) or upwards (for odd fields). The resulting displaced fields are broadly equivalent to progressively scanned frames which have been subsampled vertically by a factor of two.

In summary, therefore, the result of the subsampling operations described above is that the motion processor 185 operates on pairs of input fields which are spatially subsampled by a factor of two in the horizontal and the vertical directions. This reduces the processing required for motion vector estimation by a factor of four.

Figure 3 is a schematic diagram of a correlation surface 300.

The correlation surface represents the difference between a search block of the earlier of the two input fields from which the surface is generated, and a (larger) search area in the later of the two input fields. A peak in correlation is therefore represented by a minimum point 310 on the correlation surface 300. The position of the minimum point 310 on the correlation surface 300 determines the magnitude and direction of the motion vector derived from that correlation surface.
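The construction of one correlation surface can be sketched as follows. The sketch assumes that the "difference" is a sum of absolute pixel differences and that the search area extends `reach` pixels in every direction around the search block; both are assumptions made for illustration, as are the function names.

```python
def correlation_surface(earlier, later, top, left, size, reach):
    """Difference between a search block and each position in a search area.

    The block is the size x size region of `earlier` at (top, left). Entry
    [dy + reach][dx + reach] of the result is the difference for a
    displacement of (dy, dx) in `later`.
    """
    surface = []
    for dy in range(-reach, reach + 1):
        row = []
        for dx in range(-reach, reach + 1):
            diff = 0
            for y in range(size):
                for x in range(size):
                    diff += abs(earlier[top + y][left + x]
                                - later[top + dy + y][left + dx + x])
            row.append(diff)
        surface.append(row)
    return surface

def surface_minimum(surface, reach):
    """Return the displacement (dy, dx) at the lowest point of the surface."""
    best = None
    for r, row in enumerate(surface):
        for c, value in enumerate(row):
            if best is None or value < best[0]:
                best = (value, r - reach, c - reach)
    return best[1], best[2]
```

If the later field is the earlier field moved one pixel down and one pixel to the right, the surface has its minimum at the displacement (1, 1), which is the motion vector for that block.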

In the apparatus of Figure 1, each motion vector is generated by detecting a minimum point on a respective correlation surface. In total, for each pair of input fields supplied to the motion processor 185, eight thousand correlation surfaces are supplied to the vector estimator 210 for use in the generation of eight thousand motion vectors.

In order to reduce the processing requirements of the apparatus of Figure 1, only one quarter of the total number of correlation surfaces are generated by the comparison of blocks of the two subsampled input fields supplied to the block matcher 190. The correlation surfaces to be used in motion vector generation are then interpolated from the correlation surfaces generated by block matching.

This means that two thousand "original" correlation surfaces are generated by the block matcher 190 and supplied to the correlation surface processor 200; the correlation surface processor 200 then generates eight thousand "interpolated" correlation surfaces from the two thousand original correlation surfaces, for use in motion vector estimation.

Figures 4a, 4b, and 4c schematically illustrate the interpolation of correlation surfaces performed by the correlation surface processor 200.

Referring to Figure 4a, each original correlation surface 400 is generated (by the block matcher 190) by comparing a search block at a particular position within the earlier of a pair of subsampled input fields with a (larger) search area in the other of the pair of subsampled input fields. As shown in Figure 4b the search blocks are centred around respective positions (e.g. the position 410) in a grid pattern 420 imposed on the respective input field. The original correlation surfaces generated from the search blocks have corresponding relative positions 410 on the grid 420.

As described above, in order to increase the number of correlation surfaces from the two thousand generated by block matching to the eight thousand required by the motion vector estimator 210, an interpolation process is performed in the correlation surface processor 200 by which four interpolated correlation surfaces 430, 440, 450 and 460 are generated from each original correlation surface 400. (In fact a filtering process is used so that each interpolated correlation surface depends on a number of surrounding original correlation surfaces).

The interpolated correlation surfaces 430, 440, 450 and 460 have effective positions which are centred around that of the original correlation surface 400, but are displaced slightly horizontally and vertically from the position of the original correlation surface 400.

The displacements are indicated in Figure 4a as fractions of the horizontal and vertical spacing of the original correlation surface grid (i.e. the search block grid 420). In particular, the displacements of the four interpolated correlation surfaces (CS) 430, 440, 450 and 460 generated from the original correlation surface 400 are as follows:

Interpolated CS 430: (-1/4 horizontally; -1/4 vertically)
Interpolated CS 440: (-1/4 horizontally; + [...]

[...] surfaces is illustrated. In the correlation surface 530, a minimum 535 represents the motion of the top of the bar 510, and corresponds to a horizontal component of the motion of + va. In the correlation surface 540 the minimum 545 is at a point indicative of zero motion, and therefore represents the motion of the centre 520 of the bar 510.

Finally, in the correlation surface 550, the minimum 555 is at a point indicative of a horizontal component of - va, and therefore represents the motion of the bottom of the bar.

As described above, the original correlation surfaces 530, 540 and 550 are not used by the vector estimator 210 in the generation of motion vectors. Instead, interpolated correlation surfaces, at one half of the horizontal and vertical spacing of the original correlation surfaces, are generated for use in interpolation of motion vectors.

Cross sections through five such interpolated correlation surfaces 560, 570, 580, 590 and 600 are illustrated in Figure 6b. The five interpolated correlation surfaces 560, 570, 580, 590 and 600 have respective minima 565, 575, 585, 595 and 605, which indicate respective horizontal motion components of:

minimum 565: + va (horizontal motion of the top of the bar 510)
minimum 575: + va/2 (horizontal motion of a point between the top and centre 520 of the bar 510)
minimum 585: 0 (horizontal motion of the centre 520 of the bar 510)
minimum 595: - va/2 (horizontal motion of a point between the centre 520 and the bottom of the bar 510)
minimum 605: - va (horizontal motion of the bottom of the bar 510)

The five interpolated correlation surfaces 560, 570, 580, 590 and 600 thus represent the motion of the rotating bar 510 at twice the vertical spatial resolution of the original correlation surfaces 530, 540, 550.

Figure 7 is a schematic block diagram of a part of the correlation surface processor 200 in which the correlation surfaces are interpolated.

In the apparatus of Figure 7, input data 610 representing the original correlation surfaces are supplied in serial form from the block matcher 190 to a number of correlation surface delays 620. Each correlation surface delay 620 delays the input data 610 by a period equivalent to the transmission time of the data representing one correlation surface. This means that the data at the input and the output of each of the correlation surface delays 620 represent identical points within two adjacent correlation surfaces.

The input data 610 are also supplied to two row delays 630, 640, which delay the data by a period equivalent to the transmission time of a complete row of correlation surfaces (i.e. the correlation surfaces generated from a complete row of search blocks). This means that the data at the input and the output of a row delay (630 or 640) represent identical points within two correlation surfaces at corresponding positions in two adjacent rows. The output of each of the row delays 630, 640 is supplied to a series of four correlation surface delays 620.

The input data 610, the data at the output of each of the correlation surface delays 620, and the data at the output of each of the row delays 630, 640 are multiplied by respective filter coefficients C00, C01, ..., C42, before being summed by an adder 650.

The summed output 660 of the adder 650 represents successive points within an interpolated correlation surface. By using the filtering arrangement shown in Figure 7, each interpolated correlation surface is derived by a filtered combination of data representing corresponding positions in fifteen surrounding original correlation surfaces.
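As a software sketch of the filtering just described, the following function forms each point of one interpolated surface as a weighted sum of the corresponding points in a 3 x 5 neighbourhood of original surfaces (three rows of surfaces, five surfaces per row, giving the fifteen taps). The data layout, the function name and the example coefficient values are assumptions made for illustration; the actual coefficient values are not given in the description.

```python
def interpolate_surface(surfaces, coeffs, row, col):
    """Weighted sum of corresponding points in a 3 x 5 neighbourhood of surfaces.

    surfaces[r][c] is one original correlation surface stored as a flat list of
    correlation values. coeffs[dr][dc] is the (assumed) weight applied to the
    surface dr rows and dc columns away from the corner (row, col).
    """
    n_points = len(surfaces[row][col])
    out = [0.0] * n_points
    for dr, coeff_row in enumerate(coeffs):
        for dc, weight in enumerate(coeff_row):
            source = surfaces[row + dr][col + dc]
            for i in range(n_points):
                out[i] += weight * source[i]
    return out


# Example: 3 rows x 5 columns of tiny two-point "surfaces".
surfaces = [[[float(10 * r + c), float(r - c)] for c in range(5)] for r in range(3)]

# Placeholder coefficients: pass the central surface through unchanged.
coeffs = [[0.0] * 5 for _ in range(3)]
coeffs[1][2] = 1.0
centre_only = interpolate_surface(surfaces, coeffs, 0, 0)
print(centre_only)
```

In the hardware of Figure 7 the same sum is formed on serial data by the delays 620, 630, 640 and the adder 650; the nested loops here merely make the fifteen-tap structure explicit.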

By using the apparatus shown in Figure 7 to generate the required eight thousand correlation surfaces, significantly less data processing hardware is required than would be the case if the eight thousand correlation surfaces were generated directly by block matching.

When the correlation surfaces interpolated by the apparatus of Figure 7 are passed to the motion vector estimator 210, they are examined to detect the point of minimum difference (the point of maximum correlation) in each surface, from which a respective motion vector is generated. A second lowest minimum is also detected from the correlation surface, apart from an exclusion region around the actual minimum point. A confidence test is then performed to detect the significance of the original minimum with respect to the remainder of the correlation surface. Only those motion vectors which pass the confidence test are made available for subsequent use by the interpolator 140.
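The minimum detection and confidence test can be sketched as follows. The description does not state the size of the exclusion region or the form of the confidence test, so the square exclusion region, the difference threshold, the convention that the centre of the surface corresponds to zero displacement, and all names below are assumptions.

```python
def find_minimum(surface, excluded=None):
    """Return (value, (row, col)) of the lowest point, skipping excluded cells."""
    best = None
    for r, row in enumerate(surface):
        for c, value in enumerate(row):
            if excluded and (r, c) in excluded:
                continue
            if best is None or value < best[0]:
                best = (value, (r, c))
    return best


def estimate_vector(surface, exclusion_radius=1, threshold=10):
    """Return (horizontal, vertical) displacement, or None if the test fails."""
    min_value, (r0, c0) = find_minimum(surface)
    # Assumed exclusion region: a square centred on the actual minimum.
    excluded = {
        (r, c)
        for r in range(r0 - exclusion_radius, r0 + exclusion_radius + 1)
        for c in range(c0 - exclusion_radius, c0 + exclusion_radius + 1)
    }
    second = find_minimum(surface, excluded)
    # Assumed confidence test: the minimum must beat the runner-up by a margin.
    if second is None or second[0] - min_value < threshold:
        return None
    centre_r, centre_c = len(surface) // 2, len(surface[0]) // 2
    return (c0 - centre_c, r0 - centre_r)


clear = [[100] * 5 for _ in range(5)]
clear[2][3] = 5                      # one distinct minimum, one sample right of centre
ambiguous = [row[:] for row in clear]
ambiguous[0][0] = 8                  # a rival minimum outside the exclusion region
print(estimate_vector(clear), estimate_vector(ambiguous))
```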

The fact that the input fields supplied to the block matcher are spatially subsampled means that the calculation of the original correlation surfaces requires less intensive processing than would otherwise be the case. However, the subsampling also has the effect that the spatial resolution of the original and interpolated correlation surfaces is reduced by a corresponding amount.

Figure 8 is a schematic block diagram of the motion vector estimator 210, showing how the reduction in spatial resolution of the correlation surfaces, caused by the spatial subsampling of the input fields, can be overcome. The motion vector estimator 210 receives correlation surfaces in digital form from the direct block matcher 190 and comprises a correlation maximum detector 212, a correlation surface interpolator 214 and a revised maximum detector 216. The revised maximum detector 216 supplies motion vectors derived from the correlation surfaces to the motion vector reducer 220.

The operation of the motion vector estimator 210 will now be described with reference to Figure 9, which shows an array of points on a correlation surface, along with a graphical cross section through the correlation surface (along a line A-A). The correlation maximum detector 212 receives the correlation surface from the direct block matcher 190 and detects points of maximum correlation, such as a point 218 referred to as the "original" maximum. The correlation maximum detector 212 outputs the value of the point 218 and its position on the correlation surface.

The correlation surface interpolator 214 receives the position of the original maximum point 218 along with data representing the original correlation surface, and interpolates eight additional points surrounding the point 218 using a two-dimensional interpolator. The interpolated correlation values at the interpolated points are supplied, along with the original maximum value, to the revised maximum detector 216, which detects whether any of the interpolated correlation values represents a greater maximum than the original maximum. In the example shown in Figure 9, an interpolated point 219 represents a greater maximum than the point 218, and so a motion vector dependent on the point 219 is generated and passed to the motion vector reducer 220.
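A minimal sketch of this refinement is given below. The description does not specify the two-dimensional interpolator or the spacing of the eight additional points, so the sketch assumes a separable quadratic (three-point Lagrange) interpolator and points at half-sample offsets. A higher-order interpolator is assumed because simple averaging of neighbouring samples could never exceed the original maximum. All names are hypothetical.

```python
# Three-point Lagrange weights for samples at -1, 0, +1, evaluated at each offset.
HALF_WEIGHTS = {
    -0.5: (0.375, 0.75, -0.125),
    0.0: (0.0, 1.0, 0.0),
    0.5: (-0.125, 0.75, 0.375),
}


def interpolate_at(neigh, dy, dx):
    """Separable quadratic interpolation within a 3 x 3 neighbourhood."""
    wy, wx = HALF_WEIGHTS[dy], HALF_WEIGHTS[dx]
    return sum(wy[i] * wx[j] * neigh[i][j] for i in range(3) for j in range(3))


def refine_maximum(neigh):
    """Return (dy, dx, value) of the greatest of the original and 8 interpolated points.

    neigh holds the 3 x 3 correlation values centred on the original maximum.
    """
    best = (0.0, 0.0, neigh[1][1])
    for dy in (-0.5, 0.0, 0.5):
        for dx in (-0.5, 0.0, 0.5):
            if dy == 0.0 and dx == 0.0:
                continue  # the original maximum is already the starting candidate
            value = interpolate_at(neigh, dy, dx)
            if value > best[2]:
                best = (dy, dx, value)
    return best


# Samples of a surface whose true peak lies half a sample to the right of centre.
neigh = [
    [-3.25, -1.25, -1.25],
    [-2.25, -0.25, -0.25],
    [-3.25, -1.25, -1.25],
]
dy, dx, value = refine_maximum(neigh)
print(dy, dx, value)
```

Here the revised maximum lies at a horizontal offset of half a sample, which corresponds to the situation of the interpolated point 219 exceeding the original point 218.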

As described above, the motion vector reducer 220 operates to reduce the choice of possible motion vectors for each pixel of the output field, before the motion vectors are supplied to the motion vector selector 230. The output field is notionally divided into blocks of pixels, each block having a corresponding position in the output field to that of a search block in the earlier of the selected input fields. The motion vector reducer compiles a group of four motion vectors to be associated with each block of the output field, with each pixel in that block eventually being interpolated using a selected one of that group of four motion vectors. The four motion vectors for each block (the zero motion vector and three other vectors) are passed to the vector selector 230.

The motion vector selector 230 selects a respective one of the four motion vectors corresponding to each block, for use in interpolation of each output pixel. This selection is performed by comparing test blocks of pixels, pointed to by each of the four motion vectors, and selecting the motion vector having the highest degree of correlation between the test blocks. In order that this process is less affected by the use of spatially subsampled input fields, pixel values within the test blocks are interpolated from surrounding pixels, as described below.
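The selection step can be sketched as below, taking the sum of absolute differences between test blocks as the measure of correlation (a lower sum indicating a higher correlation). For simplicity the sketch assumes integer vectors, compares a block at the pixel position in the preceding field with a displaced block in the following field, and ignores the temporal position signal t; the block size and all names are hypothetical.

```python
def block(field, top, left, size):
    """Flatten a size x size block of pixels starting at (top, left)."""
    return [field[top + r][left + c] for r in range(size) for c in range(size)]


def sad(block_a, block_b):
    """Sum of absolute differences between corresponding pixels."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def select_vector(prev_field, next_field, top, left, candidates, size=2):
    """Pick the candidate (dx, dy) whose test blocks differ least."""
    reference = block(prev_field, top, left, size)

    def cost(vector):
        dx, dy = vector
        return sad(reference, block(next_field, top + dy, left + dx, size))

    return min(candidates, key=cost)


# A small field, and a following field in which everything moved one pixel right.
prev_field = [[7 * r + c * c for c in range(6)] for r in range(6)]
next_field = [[row[c - 1] if c > 0 else 0 for c in range(6)] for row in prev_field]

# The zero motion vector and three other candidates, as supplied by the reducer.
candidates = [(0, 0), (1, 0), (-1, 0), (0, 1)]
chosen = select_vector(prev_field, next_field, 2, 2, candidates)
print(chosen)
```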

Figures 10 and 11 are schematic diagrams illustrating pixel interpolation performed by the motion vector selector 230.

As mentioned above, the motion vector selector 230 receives local and global motion vectors from the motion vector reducer 220, the two subsampled input fields from which the motion vectors have been generated, and the control signal t from the time base changer 180.

For each pixel 700 in the current output field 705, the motion vector selector tests each of the four possible motion vectors for that pixel by comparing blocks of pixels pointed to by that motion vector in each of the preceding and following input fields 710, 720. The comparison is made by calculating the sum of absolute luminance differences between corresponding pixels in the two blocks, with a lower sum indicating a higher correlation between the blocks. However, because the input fields 710, 720 are subsampled by a factor of two in the horizontal direction and a factor of two in the vertical direction, alternate pixels are missing from the blocks, so interpolation is used to reconstruct the missing pixels for the test performed during motion vector selection. In Figure 11, missing pixels to be interpolated are represented by P1, P2, P3 and so on, and are generated from the existing pixels A, B, C, and D as follows: P1 = (A+B)/2; P2 = (A+C)/2; P3 = (A+B+C+D)/4; and so on.
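The reconstruction of missing pixels can be sketched as follows. The sketch assumes that A and B are horizontal neighbours, that C and D lie directly below them, and that P1, P2 and P3 lie between A and B, between A and C, and at the centre of the four pixels respectively. This layout is inferred from the formulas, not from Figure 11, and the function name is hypothetical.

```python
def reconstruct_missing(sub):
    """Rebuild a grid at twice the density of the subsampled block `sub`.

    Existing pixels land on even rows and columns; each missing pixel is the
    mean of its nearest existing neighbours, which reproduces
    P1 = (A+B)/2, P2 = (A+C)/2 and P3 = (A+B+C+D)/4.
    """
    rows, cols = len(sub), len(sub[0])
    out = [[0.0] * (2 * cols - 1) for _ in range(2 * rows - 1)]
    for r in range(2 * rows - 1):
        for c in range(2 * cols - 1):
            r0, c0 = r // 2, c // 2
            r1, c1 = r0 + r % 2, c0 + c % 2   # equal to r0, c0 on existing rows/columns
            out[r][c] = (sub[r0][c0] + sub[r0][c1] + sub[r1][c0] + sub[r1][c1]) / 4
    return out


# A = 10, B = 20, C = 30, D = 40
full = reconstruct_missing([[10, 20], [30, 40]])
for row in full:
    print(row)
```

A single averaging expression serves all cases because, on a row or column that already holds existing pixels, the two neighbours coincide and the four-term mean reduces to the two-term mean or to the pixel itself.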

Although an embodiment has been described in which one interlaced video format is converted to another interlaced video format, the apparatus is suitable for conversion to and from a number of interlaced and non-interlaced film and video signal formats, including: an 1125/60 2:1 interlaced video signal; an 1125/30 1:1 non-interlaced video signal; a 1250/50 2:1 interlaced video signal; a 1250/25 1:1 non-interlaced video signal; a 525/60 2:1 interlaced video signal; a 525/30 1:1 non-interlaced video signal; a 625/50 2:1 interlaced video signal; a 625/25 1:1 non-interlaced video signal; and an 1125/24 3232 pull-down video signal.

In summary, therefore, a number of features of the present embodiment serve to reduce the processing requirements of standards conversion to and/or from high definition video signals, without significantly compromising the quality and resolution of the conversion process. These features are:

1. The interpolation of output fields from interlaced input fields. This removes the need for progressive scan conversion of the input fields.

2. The generation of motion vectors from spatially subsampled versions of the input fields, which reduces the processing overhead of block matching by, in this embodiment, a factor of four.

3. Interpolating 8000 correlation surfaces from the 2000 original correlation surfaces; 8000 motion vectors can then be generated and made available for subsequent use in interpolation, but the block matching process need only be performed 2000 times.

4. Performing minimum detection in each interpolated correlation surface to sub-pixel accuracy. This helps to alleviate the loss in resolution of the correlation surfaces caused by the subsampling process.

5. Pixel values are interpolated for use in test blocks during motion vector selection. Again, this helps to alleviate the loss in resolution of the correlation surfaces caused by the subsampling process.

The result of applying the above measures, in combination, is that (in the present embodiment) an apparatus can be constructed to convert between high definition video signal formats in real time.

Claims

1. Motion compensated video signal processing apparatus comprising: a subsampler for subsampling input images of an input digital video signal, to generate corresponding subsampled images; a block comparator for comparing blocks of pixels from a pair of the subsampled images, to generate a first plurality of original correlation surfaces, each comprising an array of correlation values representing correlation between the respective blocks of pixels; means for generating a second plurality of interpolated correlation surfaces by interpolation from the original correlation surfaces, the second plurality being greater than the first plurality; means for interpolating between correlation values in each interpolated correlation surface, to detect a point of maximum correlation in that interpolated correlation surface; means for generating a respective motion vector from each interpolated correlation surface, in dependence on the detected point of maximum correlation in that interpolated correlation surface; means for selecting motion vectors for use in interpolation of respective pixels of an output image of an output digital video signal, by comparing test blocks, pointed to by a motion vector under test, in the pair of subsampled images, each test block comprising pixels of the respective subsampled image and test values interpolated from those pixels; and a motion compensated interpolator for interpolating the output image from a pair of input images of the input digital video signal corresponding to the pair of subsampled images, according to the respective selected motion vectors.

2. Apparatus according to claim 1, in which the input digital video signal has a predetermined resolution, the apparatus comprising: means for receiving a digital video signal; and means for adding dummy pixel values to the received digital video signal to generate the input digital video signal, in the case when the received digital video signal has a lower resolution than the predetermined resolution.

3. Apparatus according to claim 2, in which the dummy pixel values are pixel values representing black pixels.

4. Apparatus according to any one of the preceding claims, comprising a down-converter for generating a second output digital video signal from the first-mentioned output digital video signal, the second output digital video signal having a lower resolution than the first-mentioned output digital video signal.

5. Apparatus according to any one of the preceding claims, comprising: a first time base changer for selecting pairs of input images for use in interpolation of a respective output image; and a second time base changer for selecting pairs of subsampled images, corresponding to the pairs of input images selected by the first time base changer, for use in the generation of a respective set of motion vectors indicative of image motion between that pair of subsampled images.

6. Apparatus according to claim 5, in which: the first time base changer comprises means for generating a control signal indicative of the temporal position of each output image with respect to the pair of input images selected for use in interpolation of that image; and the second time base changer comprises means for generating a control signal indicative of the temporal position of each output image with respect to the pair of subsampled images selected for use in the generation of motion vectors.

7. Apparatus according to any one of the preceding claims, in which the input digital video signal is a high resolution video signal.

8. Apparatus according to any one of the preceding claims, in which the input digital video signal is an interlaced video signal.

9. Apparatus according to any one of claims 1 to 6, in which the input digital video signal is selected from the group consisting of: an 1125/60 2:1 interlaced video signal; an 1125/30 1:1 non-interlaced video signal; a 1250/50 2:1 interlaced video signal; a 1250/25 1:1 non-interlaced video signal; a 525/60 2:1 interlaced video signal; a 525/30 1:1 non-interlaced video signal; a 625/50 2:1 interlaced video signal; a 625/25 1:1 non-interlaced video signal; and an 1125/24 3232 pull-down video signal.

10. Apparatus according to any one of the preceding claims, in which the output digital video signal is a high resolution video signal.

11. Apparatus according to any one of the preceding claims, in which the output digital video signal is an interlaced video signal.

12. Apparatus according to any one of claims 1 to 9, in which the output digital video signal is selected from the group consisting of: an 1125/60 2:1 interlaced video signal; an 1125/30 1:1 non-interlaced video signal; a 1250/50 2:1 interlaced video signal; a 1250/25 1:1 non-interlaced video signal; a 525/60 2:1 interlaced video signal; a 525/30 1:1 non-interlaced video signal; a 625/50 2:1 interlaced video signal; a 625/25 1:1 non-interlaced video signal; and an 1125/24 3232 pull-down video signal.

13. Television standards conversion apparatus, comprising apparatus according to any one of the preceding claims.

14. Film standards conversion apparatus, comprising apparatus according to any one of the preceding claims.

15. Apparatus for converting between film and television standards, comprising apparatus according to any one of the preceding claims.

16. A method of motion compensated video signal processing, the method comprising the steps of: subsampling input images of an input digital video signal, to generate corresponding subsampled images; comparing blocks of pixels from a pair of the subsampled images, to generate a first plurality of original correlation surfaces, each comprising an array of correlation values representing correlation between the respective blocks of pixels; generating a second plurality of interpolated correlation surfaces by interpolation from the original correlation surfaces, the second plurality being greater than the first plurality; interpolating between correlation values in each interpolated correlation surface, to detect a point of maximum correlation in that interpolated correlation surface; generating a respective motion vector from each interpolated correlation surface, in dependence on the detected point of maximum correlation in that interpolated correlation surface; selecting motion vectors for use in interpolation of respective pixels of an output image of an output digital video signal, by comparing test blocks, pointed to by a motion vector under test, in the pair of subsampled images, each test block comprising pixels of the respective subsampled image and test values interpolated from those pixels; and interpolating the output image from a pair of input images of the input digital video signal corresponding to the pair of subsampled images, according to the respective selected motion vectors.

17. Motion compensated video signal processing apparatus substantially as hereinbefore described with reference to the accompanying drawings.

18. A method of motion compensated video signal processing, the method being substantially as hereinbefore described with reference to the accompanying drawings.

19. Television standards conversion apparatus substantially as hereinbefore described with reference to the accompanying drawings.