WO2000079783A1 - Electronic capture of moving images - Google Patents

Electronic capture of moving images

Info

Publication number
WO2000079783A1
WO2000079783A1 · PCT/GB2000/002411
Authority
WO
WIPO (PCT)
Prior art keywords
motion
image signal
motion information
information
image
Prior art date
Application number
PCT/GB2000/002411
Other languages
French (fr)
Inventor
Roderick Snell
Martin Weston
Original Assignee
Snell & Wilcox Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Snell & Wilcox Limited
Priority to AU55501/00A
Publication of WO2000079783A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00 Circuitry of solid-state image sensors [SSIS]; Control thereof

Definitions

  • In the third embodiment, shown in Figure 7, the conventional image sensor and the "motion" sensor according to the present invention are supplemented by a third sensor which provides depth information.
  • This sensor may take the known form of an ultra-sonic sensor that, together with the depth measurement block (708), provides in known manner depth vectors or a depth map corresponding to the image.
  • The motion estimation block (710), which, as in the embodiment of Figure 3, receives adjacent fields from the motion sensor (704) and the delay (712), also receives depth vectors from the depth measurement block (708). These depth vectors are used to refine the assignment of motion vectors.
  • The image processing block (714) receives not only the image signal from image sensor (702) and the motion vectors from motion estimation block (710), but also the depth vectors from the depth measurement block (708).
  • The depth vectors can be used in a variety of ways; for example, they can be used to assist segmentation of the image.
  • The depth vectors are made available as an output from the camera, along with the video, the motion vectors and the output from the image processing block (714).

Abstract

A digital video camera has a conventional image sensor and an additional sensor that is scanned at a higher frequency and a lower spatial resolution. The output of the additional sensor is used to provide motion vectors that are more reliable or more accurate than those measured from the image signal alone. The motion vectors can be used inside the camera to process or compress the image signal and can also be made available in a signal which accompanies the image signal.

Description

ELECTRONIC CAPTURE OF MOVING IMAGES
This invention relates to the electronic capture of moving images and to the processing of moving image signals, both inside the camera and elsewhere.
It is well understood that many video signal processors rely upon or are enhanced by the use of motion estimation. This typically involves the comparison of two temporally spaced pictures and the identification of motion vectors which link corresponding regions in the respective pictures.
A typical and very important application of motion estimation is in predictive coding schemes, such as MPEG-2, where the transmitted bitstream largely consists of difference information relating to a motion compensated image prediction. It has already been appreciated that the quality of the motion vectors that are used in a predictive coding scheme directly affects the quality of image that can be achieved at a given bit rate, or the bit rate budget that is required to provide a given level of quality. It has been recognised that digital video cameras, in a wide range of applications from broadcast use to consumer camcorders, can usefully be provided with an "internal" compression encoder to avoid the need to take a full bandwidth digital video signal from the camera. The quality of a compression coding process conducted at the camera is of vital importance since it limits the quality of displays and all further processing conducted on the camera output. The location of the compression encoder at the camera does, however, usually impose size, power and cost constraints upon the design of the coding circuitry.
It is one object of the present invention to provide an improved digital video camera which affords high quality motion vectors to enable efficient compression coding without excessive size, power or cost requirement.
Accordingly, the present invention consists in one respect in a digital video camera having one or more image sensors scanned at a scanning frequency F to provide an image signal, characterised by the provision of an additional sensor scanned at a scanning frequency nF, where n is greater than 1, the output of the additional sensor being employed in the derivation of motion information for use in processing the image signal.
Advantageously, the camera further comprises an image processor utilising said motion information to process the image signal, for example, to compress it.
The present invention consists in another respect in a method of capturing a moving image by sampling at a first spatial resolution and a first temporal frequency to produce an image signal, wherein sampling is additionally conducted at a second, lower spatial resolution and a second, higher temporal frequency, the additionally sampled information being employed in the derivation of motion information for use in processing the image signal.
The use of an additional sensor which is scanned at perhaps three times the image scanning frequency F (and preferably at least six times the scanning frequency F) provides through temporal over-sampling the ability to generate higher quality motion vectors, using known motion estimation techniques. It is commonplace in known motion estimation techniques to down-sample the image signal prior to the generation of motion vectors. It is therefore possible, in the present invention, to employ a sensor that, although having a higher temporal scanning frequency, has a lower spatial resolution. Since the complexity of a sensor generally increases as the product of the number of pixels and the characteristic frequency of the output image signal, the additional sensor which the present invention proposes can readily be provided using current sensor technology. That is to say, the reduced number of pixels that are required in the new sensor can be "traded" for the increased scanning frequency.
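The "trading" of pixels for scanning frequency can be checked against the example figures quoted later for the Figure 3 embodiment (a 720 × 625 sensor scanned at 50 Hz, and a 240 × 312 sensor scanned at 300 Hz). The following is an illustrative sketch, not part of the patent; the helper name is ours:

```python
# Sensor complexity scales roughly with (pixels per frame) x (scanning frequency).
# Figures are the example values given for the Figure 3 embodiment below.

def pixel_rate(samples_per_line, lines, frequency_hz):
    """Approximate sensor output rate, in pixels per second."""
    return samples_per_line * lines * frequency_hz

main_sensor = pixel_rate(720, 625, 50)      # conventional image sensor
motion_sensor = pixel_rate(240, 312, 300)   # additional sensor, 6x over-sampled

print(main_sensor)    # 22500000
print(motion_sensor)  # 22464000
```

The two pixel rates agree to within about 0.2%, which is the sense in which the reduced spatial resolution of the additional sensor is "traded" for its increased scanning frequency.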
Many of the benefits of the present invention can be gained by simply applying known techniques for the deriving of motion vectors, to the temporally oversampled signal made available by the additional sensor; still further benefit can be achieved through novel motion processing.
One known motion estimation technique is block matching and is widely used in MPEG-2 and other compression coding schemes. A given region of one picture is compared with regions of a second picture, over a given search range, to identify the closest match. This approach has the benefit of simplicity. It has the disadvantage, however, that the motion vectors assigned to neighbouring pixels or other regions of the picture may have wide variations. Since it is common in a coding scheme for motion vectors to be differentially encoded, this wide variation leads to coding inefficiencies. Proposals have been made for the refinement of vectors derived from a block matching technique, in order to reduce these variations. An alternative motion estimation technique is referred to as the gradient method. This predicts a motion vector from knowledge of the difference between corresponding pixels in the two pictures and the picture gradient in one of them. At normal frame rates this approach often suffers when motion vectors become too large. A third approach is that referred to as phase correlation. This involves a correlation process in Fourier transform space, after normalisation of the amplitude term. This is capable of producing accurate vectors which are insensitive to global amplitude changes. The vectors which are produced by the phase correlation process are, however, not localised in space and a separate vector assignment process is accordingly required.
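The block matching technique just described can be sketched as a minimal full search. This is our illustration, not text from the patent; the function name and synthetic test pattern are hypothetical:

```python
import random

def block_match(prev, curr, y, x, block=4, search=3):
    """Full-search block matching: compare the block of `curr` at (y, x)
    against every candidate position in `prev` within +/-`search`, and
    return the (dy, dx) displacement giving the smallest sum of absolute
    differences (SAD)."""
    h, w = len(prev), len(prev[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= y + dy <= h - block and 0 <= x + dx <= w - block):
                continue  # candidate block would fall outside the picture
            sad = sum(abs(curr[y + i][x + j] - prev[y + dy + i][x + dx + j])
                      for i in range(block) for j in range(block))
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

# Synthetic test: the second field is the first shifted down two lines and
# left one sample (with wrap-around), so the backward-pointing match for a
# block away from the picture edge is (dy, dx) = (-2, +1).
random.seed(0)
H = W = 16
prev = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
curr = [[prev[(i - 2) % H][(j + 1) % W] for j in range(W)] for i in range(H)]
print(block_match(prev, curr, 6, 6))  # (-2, 1)
```

Note that the vector is found independently for each block, which is why neighbouring vectors can vary widely, as the text observes.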
In the present invention, motion estimation is conducted on a signal which is temporally oversampled and in which the maximum expected motion vector is considerably reduced in size. The preferred technique is then the gradient method. The present invention is able to take advantage of the computational simplicity and other advantages of gradient techniques, whilst obviating the main disadvantage of intolerance to large vectors.
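The gradient method in its simplest least-squares form can be sketched in one dimension. The sketch below is ours (names and test signal are hypothetical); it illustrates why the method needs small displacements, which is precisely the condition temporal over-sampling creates:

```python
import math

def gradient_estimate(prev, curr, lo, hi):
    """One-step gradient motion estimate over samples [lo, hi): with dt the
    temporal difference and dx the spatial gradient, a small displacement d
    satisfies dt ~= -d * dx, so the least-squares estimate is
    d = -sum(dt * dx) / sum(dx * dx).  Accurate only while the true
    displacement is a small fraction of the detail wavelength."""
    num = den = 0.0
    for i in range(lo, hi):
        dx = (prev[i + 1] - prev[i - 1]) / 2.0  # spatial gradient (central diff.)
        dt = curr[i] - prev[i]                  # field-to-field difference
        num += dt * dx
        den += dx * dx
    return -num / den if den else 0.0

# A smooth 1-D "picture" displaced by exactly half a sample between fields.
prev_field = [math.sin(0.2 * i) for i in range(60)]
curr_field = [math.sin(0.2 * (i - 0.5)) for i in range(60)]
print(gradient_estimate(prev_field, curr_field, 2, 58))  # close to 0.5
```

With a large displacement the linearisation dt ~= -d * dx breaks down and the estimate fails, which is the "intolerance to large vectors" the text refers to.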
The invention will now be described by way of example with reference to the accompanying drawings in which:-
Figure 1 is a block diagram of a prior art digital video camera including an internal MPEG-2 encoder;
Figure 2 is a block diagram of a prior art MPEG-2 encoder for use in the circuit shown in Figure 1;
Figure 3 is a block diagram of a digital video camera according to a first embodiment of the present invention;
Figure 4 is a block diagram of a digital video camera according to a second embodiment of the present invention;
Figure 5 is a block diagram of an MPEG-2 encoder suitable for use in the circuit of Figure 3;
Figure 6 is a block diagram of part of a system according to a further embodiment of the present invention; and
Figure 7 is a block diagram of a digital video camera according to a third embodiment of the present invention.
Turning now to Figure 1, a digital video camera can be recognised as broadly including an optical path (100) through which light passes to a sensor arrangement (102). This might include three sensors with appropriate filters to provide sensitivity, respectively, to red, green and blue. The sensor output passes through appropriate image processing (104) to provide a digital video output at (106). This can of course be in a variety of formats. In this arrangement, a YUV output from the image processing block (104) is also taken to an MPEG encoder (108) which provides an MPEG-2 bit-stream at (110). For the sake of completeness, there is shown in Figure 2 a conventional MPEG-2 encoder, forming block (108) of Figure 1. The video input is taken to a subtractor (200) and to a motion estimation block (202). The output of the subtractor (200) passes through a discrete cosine transformation (DCT) block (204) and a quantiser (206). The output of the quantiser (206) passes to a variable length coder (208) and also to a local decoder loop including an inverse quantiser (210) and an inverse DCT (212). The output of the inverse DCT passes through an adder (214), the output of which is made available through frame delay (216) to the motion estimation block (202) and to a motion compensation block (218). It is the output of this motion compensation block (218) that provides the second input to subtractor (200) and to adder (214). The motion vector output from the motion estimation block (202) is passed to the variable length coder (208). The motion vectors and the quantised transform coefficients are multiplexed to form the output MPEG-2 bit-stream.
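The role of the local decoder loop (210)-(216) can be illustrated with a deliberately stripped-down predictive coder: no DCT, no motion compensation, one "pixel" per frame and a plain uniform quantiser. The point it demonstrates follows from the description above (the encoder predicts from its own quantised reconstruction, so encoder and decoder stay in step); the code itself is our sketch, not the patent's:

```python
def quantise(x, step):
    return round(x / step)        # quantiser (206), greatly simplified

def dequantise(q, step):
    return q * step               # inverse quantiser (210)

def encode(frames, step=4):
    """Toy hybrid coder: transmit quantised prediction residuals, predicting
    each frame from the *locally decoded* previous frame (the role of blocks
    210-216 in Figure 2), so quantisation error cannot accumulate."""
    recon, out = 0, []
    for f in frames:
        residual = f - recon              # subtractor (200)
        q = quantise(residual, step)
        out.append(q)                     # to the variable length coder (208)
        recon += dequantise(q, step)      # adder (214) + frame delay (216)
    return out

def decode(codes, step=4):
    recon, frames = 0, []
    for q in codes:
        recon += dequantise(q, step)
        frames.append(recon)
    return frames

decoded = decode(encode([10, 13, 20, 18]))
print(decoded)  # [8, 12, 20, 20] -- each within half a step (2) of the input
```

Because the prediction is formed from the quantised reconstruction rather than from the original frames, the decoder reproduces the encoder's reconstruction exactly and the quantisation error never drifts from frame to frame.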
Turning now to Figure 3, there is shown a novel arrangement according to the present invention. Here, the optical path (300) provides light not only to the image sensor arrangement (302) but also to a new sensor (304). The image sensor arrangement (302) can be the same as that employed in the conventional arrangement of Figure 1. In one example, it may have a scanning frequency of 50 Hz; 720 samples per active line and 625 lines. It will be well understood that this is simply one example and that other standard and high definition formats exist, with variations within those formats in the relative sampling structures of luminance and chrominance. As in Figure 1, the output of the image sensor arrangement (302) passes through appropriate image processing (306) to provide a digital video output at (308). The output of the image processing block passes also to an MPEG-2 encoder (310) which provides an MPEG-2 output at (312). The additional sensor (304) has in this example a scanning frequency of 300 Hz. The number of lines is 312 and the number of samples per active line is 240. The temporally over-sampled and spatially under-sampled output of the sensor (304) passes to a motion estimator (314) which receives also a frame delayed input through frame delay (316). Motion vectors are derived at 50 Hz and are made available to the MPEG-2 encoder (310). The motion vectors may optionally be available as a further camera output.
Since the motion estimation block (314) receives information which in the temporal dimension is six times over-sampled, it would be expected (without material change to the technique of motion estimation) to provide motion vectors of substantially higher accuracy or substantially greater reliability.
In the preferred form of the invention, the motion estimation is conducted using a gradient technique, for example the pel recursive method. This is especially suited to the present arrangement in which there is an excess of temporal information and spatial sub-sampling, such that the maximum expected motion vector is relatively small.
In another embodiment of the invention, motion vectors derived from the temporally over-sampled information are not used directly, but are used to assist or improve a motion measurement process operating on the "normal" video. Thus, as shown in Figure 4, a first optical sensor arrangement (401) provides a video input to a first motion estimator (402), which calculates motion vectors describing the motion between this video input and the video of the previous field obtained from a field store (403). The motion estimator (402) uses a method of motion measurement (such as the gradient method) in which initially estimated motion vectors are refined by processing the video. A second optical sensor arrangement (404) operates at a higher temporal sampling rate than the first and drives a second motion estimator (405) which computes motion vectors between the current field and a previous field taken from the field store (406). The second motion estimator (405) produces motion vectors (407) corresponding to the fields from the first optical sensor (401) which are used as initial values for refinement by the first motion estimator (402).
These initial motion vectors (407) need to correspond to the fields from the first optical sensor arrangement (401) and so a timing signal (408) indicative of the temporal sampling at this sensor is fed to the second motion estimator (405). Because this motion estimator is fed from the higher-rate optical sensor (404) it has data from more than two fields available to it for calculation of the motion vectors (407).
Two examples of ways in which these additional fields may be used will now be described.
The first option is to use the known technique of vector tracing. In this method a first vector field is calculated by comparing two successive fields; then the next two fields are compared and the vectors corresponding to the positions in the second field which are "pointed to" by the first vectors are added to the corresponding first vectors so as to obtain a vector field describing the motion between the first and third fields. This process can continue using as many fields as are available until an output vector field (407) is required.
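Vector tracing as just described can be sketched in one dimension. This is our illustration under simplifying assumptions (integer vectors, one vector per sample, wrap-around indexing); the function name and test data are hypothetical:

```python
def trace_vectors(vector_fields):
    """Vector tracing (1-D sketch): vector_fields[k][x] holds the motion, in
    whole samples, between high-rate fields k and k+1 at position x.  From
    each start position, repeatedly add the vector found at the position
    "pointed to", giving the total motion over the whole span of fields.
    Positions wrap round for brevity; real pictures would clip instead."""
    n = len(vector_fields[0])
    totals = []
    for x in range(n):
        pos, acc = x, 0
        for field in vector_fields:
            v = field[pos % n]
            acc += v           # accumulate motion along the trajectory
            pos += v           # follow the vector into the next field
        totals.append(acc)
    return totals

# Accelerating motion: 1, then 2, then 3 samples per high-rate field,
# giving a total of 6 samples between the first and fourth fields.
print(trace_vectors([[1] * 6, [2] * 6, [3] * 6]))  # [6, 6, 6, 6, 6, 6]
```

Following the trajectory, rather than simply summing vectors at a fixed position, is what lets the traced result describe accelerating or curved motion correctly.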
The second option is to compare the first two successive fields as in the first method, but to use the result as an initial estimate which is refined using a comparison between the first and third field; the result is then used as an initial value for the comparison of the first and fourth fields, and so on until an output vector field (407) is required. Further methods of utilising the results of motion estimation between higher rate fields as a basis for motion estimation between lower rate fields will occur to persons skilled in the art.
There is shown in Figure 5 a modified MPEG-2 encoder for use in the present invention. This is similar to that shown in Figure 2, but it lacks a motion estimation block and, instead, utilises the motion vectors received from motion estimation block (314).
It will be recognised that the motion vectors can also be used for other image processes and there is shown therefore an additional signal line in Figure 3, from the motion estimation block (314) to the image processing block (306). Such processes might be additional to or in place of MPEG encoding. The motion vectors may be provided as an output from the camera, for use in downstream processing. Indeed, in one form of the invention, the motion vectors are not used in the camera.
Because the motion vector signals from the block (314) of Figure 3 have been derived at a higher sampling rate they will contain information which cannot be extracted from the video signal (308); it is therefore advantageous to carry them forward with the video for use in subsequent processing. They could be carried as an ancillary signal in the blanking of the video or as a separate signal associated with the video. In order to make them available at all subsequent processing stages they should be recorded along with the video. Similar arguments apply to the motion vectors from block (402) of Figure 4.
Examples of such processing are shown in Figure 6. It must be understood that in a system in accordance with this aspect of the invention part of the bandwidth and storage capacity must be allocated to motion vectors; however this "price" gives the benefit of improved, more transparent or easier-to-implement processing. The recorder (601) supplies video and motion vectors to the editing process (602). The vectors may be used in the editing process to assist in generating key signals or in interpolation for special effects. The edited video and vectors are recorded again (603) and used to supply the motion-compensated standards converter (604). The converter may convert the vectors so as to relate them to the output field rate.
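In the simplest case, relating the vectors to the output field rate amounts to scaling each vector by the ratio of the field periods. A minimal sketch (the function name is ours; real converters must additionally handle the changing temporal phase of the interpolated output fields):

```python
def rescale_vector(v, input_rate, output_rate):
    """Scale a per-field motion vector, measured in pixels per input
    field period, so that it describes displacement per output field
    period. Rates are in fields per second."""
    dy, dx = v
    s = input_rate / output_rate  # output field period / input field period
    return (dy * s, dx * s)
```

For example, a vector of 6 pixels per field in a 50 Hz input corresponds to 5 pixels per field at a 60 Hz output.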
The recorder (603) also drives the compression pre-processor (605) which carries out motion-compensated noise reduction, and outputs video and motion vectors to the compression coder (606). This compression encoder may, for example, take the form shown in Figure 5. The compressed video and motion vector signal is recorded again (607).
The use of a separate sensor is only one example of how additional sampling can be conducted at a lower spatial resolution and a higher temporal frequency than that used to generate the image signal. To achieve a high temporal scanning rate, a number of separate sensors may be used, and their outputs multiplexed. A compound sensor array could be used, with a subset of the pixels being sampled at a higher frequency than the remainder. Still further alternatives will of course exist.
In a further embodiment of the present invention, the conventional image sensor and the "motion" sensor according to the present invention are supplemented by a third sensor which provides depth information.
Thus as shown in Figure 7, the image sensor (702) and the motion sensor (704) which, together with the optical path (700), are common with the embodiment of Figure 3, are supplemented with a depth sensor (706). This sensor may take the known form of an ultra-sonic sensor that together with the depth measurement block (708) provides in known manner depth vectors or a depth map corresponding to the image.
The motion estimation block (710) which, as in the embodiment of Figure 3, receives adjacent fields from the motion sensor (704) and the delay (712), also receives depth vectors from the depth measurement block (708). These depth vectors are used to refine the assignment of motion vectors.
Similarly, the image processing block (714) receives not only the image signal from image sensor (702) and the motion vectors from motion estimation block (710), but also the depth vectors from the depth measurement block (708). The depth vectors can be used in a variety of ways; for example they can be used to assist segmentation of the image.
The depth vectors are made available as an output from the camera, along with the video, the motion vectors and the output from the image processing block (714).
This invention has been described by way of examples only and a wide variety of further modifications are possible without departing from the scope of the invention.

Claims

1. A digital video camera having one or more image sensors scanned at a scanning frequency F to provide an image signal, characterised by the provision of an additional sensor scanned at a scanning frequency nF, where n is greater than 1, the output of the additional sensor being employed in the derivation of motion information for use in processing the image signal.
2. A camera according to Claim 1, further comprising an image processor utilising said motion information to process the image signal.
3. A camera according to Claim 1, further comprising a first motion measurement unit operating on the output of the additional sensor to derive first motion vectors.
4. A camera according to Claim 3, further comprising a second motion measurement unit which receives said first motion vectors and operates on the output of the or each image sensor to derive second motion vectors, the operation of said second motion measurement unit being assisted by said first motion vectors.
5. A camera according to Claim 1, further comprising a compression encoder utilising said motion information to compress the image signal.
6. A camera according to Claim 1, wherein the image signal and the motion information are made available at separate outputs of the camera.
7. A camera according to Claim 1, further comprising a depth sensor.
8. A method of capturing a moving image by sampling at a first spatial resolution and a first temporal frequency to produce an image signal, wherein sampling is additionally conducted at a second, lower spatial resolution and a second, higher temporal frequency, the additionally sampled information being employed in the derivation of motion information for use in processing the image signal.
9. A method according to Claim 8, wherein said motion information is used to process the image signal.
10. A method according to Claim 8, wherein said motion information is used to compress the image signal.
11. A method according to Claim 8, wherein an output from a motion estimator operating on the additionally sampled information is used as a basis for the calculation of motion vectors by a motion estimator operating on the image signal.
12. A method according to Claim 8, wherein said motion information is made available in a companion signal to said image signal.
13. A television signal chain comprising a digital video camera having one or more image sensors scanned at a scanning frequency F to provide an image signal, and an additional sensor scanned at a scanning frequency nF, where n is greater than 1, the output of the additional sensor being employed in the derivation of motion information, with said motion information and said image signal being made available as outputs from the camera, and a downstream video processor receiving said motion information and said image signal and utilising the motion information in processing the image signal.
14. A television signal chain according to Claim 13, wherein the digital video camera further comprises a depth sensor making available depth information, and wherein the downstream video processor receives motion information, depth information and said image signal and utilises the motion and depth information in processing the image signal.
PCT/GB2000/002411 1999-06-22 2000-06-22 Electronic capture of moving images WO2000079783A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU55501/00A AU5550100A (en) 1999-06-22 2000-06-22 Electronic capture of moving images

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9914576A GB2354656A (en) 1999-06-22 1999-06-22 Electronic capture of moving images
GB9914576.5 1999-06-22

Publications (1)

Publication Number Publication Date
WO2000079783A1 (en) 2000-12-28

Family

ID=10855837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2000/002411 WO2000079783A1 (en) 1999-06-22 2000-06-22 Electronic capture of moving images

Country Status (3)

Country Link
AU (1) AU5550100A (en)
GB (1) GB2354656A (en)
WO (1) WO2000079783A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4132359A1 (en) * 1991-09-27 1993-04-01 Grundig Emv METHOD FOR RECORDING MOVING IMAGES FOR A FOLLOWING, MOTION-INFORMATION-BASED SIGNAL PROCESSING
EP0634874A2 (en) * 1993-07-16 1995-01-18 Daewoo Electronics Co., Ltd Apparatus and method for detecting motion vectors in a frame decimating video encoder
WO1997004597A1 (en) * 1995-07-14 1997-02-06 Sensormatic Electronics Corporation Video compression system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP2706495B2 (en) * 1988-11-22 1998-01-28 松下電器産業株式会社 Image signal processing device
JP3505199B2 (en) * 1992-06-30 2004-03-08 株式会社リコー Video camera jitter correction device, data compression device, data decompression device, data compression method, and data decompression method
JPH11164189A (en) * 1997-11-27 1999-06-18 Hitachi Ltd Image recording device

Non-Patent Citations (1)

Title
HERFET TH ET AL: "MCTV - A COMMON BASIS FOR DIGITAL HDTV", PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS. (ISCAS),US,NEW YORK, IEEE, vol. CONF. 25, 10 May 1992 (1992-05-10), pages 463 - 466, XP000338978, ISBN: 0-7803-0593-0 *

Also Published As

Publication number Publication date
GB9914576D0 (en) 1999-08-25
GB2354656A (en) 2001-03-28
AU5550100A (en) 2001-01-09

