WO2001074082A1 - Temporal interpolation of interlaced or progressive video images - Google Patents

Temporal interpolation of interlaced or progressive video images

Info

Publication number
WO2001074082A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
motion
information
motion vector
correlation
Application number
PCT/US2001/008789
Other languages
French (fr)
Inventor
Barry Kahn
Original Assignee
Teranex, Inc.
Application filed by Teranex, Inc. filed Critical Teranex, Inc.
Priority to AU2001247574A priority Critical patent/AU2001247574A1/en
Publication of WO2001074082A1 publication Critical patent/WO2001074082A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
    • H04N7/014 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes involving the use of motion vectors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0117 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
    • H04N7/012 Conversion between an interlaced and a progressive signal
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2310/00 Command of the display device
    • G09G2310/02 Addressing, scanning or driving the display screen or processing steps related thereto
    • G09G2310/0229 De-interlacing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N11/00 Colour television systems
    • H04N11/06 Transmission systems characterised by the manner in which the individual colour picture signal components are combined
    • H04N11/20 Conversion of the manner in which the individual colour picture signal components are combined, e.g. conversion of colour television standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region
    • H04N5/144 Movement detection
    • H04N5/145 Movement estimation

Definitions

  • The vertical interpolation of, for example, the reference field F_t can be performed by simply averaging the pixel values immediately above and below a pixel of interest.
  • Any desired interpolation technique, vertical or otherwise, can be used to fill in pixels of the scan lines needed to establish spatial alignment with the other field or fields to be used in the correlation process.
  • The vertically interpolated (VI) field is designated F_t^VI.
  • motion estimation can be performed in motion estimation unit 802 using a method and apparatus as described in the copending application.
  • The motion estimation unit performs the intraframe correlation and the interframe correlation. For each pixel of the reference field F_t, the interframe correlation and the intraframe correlation produce correlation surfaces C^I and C, respectively.
  • the correlation surfaces of the interframe correlation and the intraframe correlation are combined for each pixel into a composite correlation surface.
  • An unfiltered motion vector V1 and associated confidence metric are extracted using the correlation data C1, C2, C̄ for that pixel of the reference field F_t. This process is repeated until an unfiltered motion vector and associated confidence metric have been produced for all pixels.
  • the unfiltered motion vectors for all pixels of the reference frame are supplied to a filter to produce filtered motion vectors in a manner already described with respect to progressive images.
  • Correlation data extracted from the correlation surfaces are used by the motion estimation unit 802 to produce a confidence metric M for each pixel in a manner already described with respect to progressive images.
  • The Figure 8 motion estimation unit 802 outputs a filtered motion vector V(x,y) and a motion vector confidence metric M_XY for every pixel X,Y in the reference field F_t.
  a. Intraframe Correlation
  • The intraframe correlation utilizes fields F_t−1 and F_t, temporally spaced by one field, to produce a first correlation surface for each pixel of the reference field F_t.
  • The intraframe correlation process is implemented using pixel correspondence based on a search area ±Sx, ±Sy and a block ±Bx, ±By movable within the search area, as was described with respect to the pixel correspondence approach to correlating progressive frames.
  • The intraframe correlation results in a correlation point for each location of the block center within the search area. Each correlation point is determined as follows:

$$C_{XY}(x,y) = \sum_{i=-B_x}^{+B_x} \sum_{j=-B_y}^{+B_y} \left| F_t^{VI}(X+i,\, Y+j) - F_{t-1}(X+x+i,\, Y+y+j) \right|$$

  • The mapping of all correlation points for each location of the block center in the search area constitutes a correlation surface C for a given pixel in the reference field F_t.
  • Each correlation point on the correlation surface C represents a pixel location in the search area of the field F_t−1. This process is repeated for all pixels in the field of interest.
  b. Interframe Correlation
  • The interframe correlation utilizes spatially aligned fields F_t−1 and F_t+1 that are temporally separated by two fields (i.e., one frame, with a temporal separation of 2Δt).
  • the interframe correlation process is implemented using pixel correspondence based on a search area ⁇ Sx, ⁇ Sy and a block ⁇ Bx, ⁇ By movable within the search area, as was described with respect to the pixel correspondence approach to correlating progressive frames.
  • The interframe correlation results in a correlation point for each location of the block center within the search area. Each correlation point is determined as follows:

$$C_{XY}^{I}(x,y) = \sum_{i=-B_x}^{+B_x} \sum_{j=-B_y}^{+B_y} \left| F_{t+1}(X+i,\, Y+j) - F_{t-1}(X+x+i,\, Y+y+j) \right|$$
  • The mapping of all correlation points for each location of the block in the search area constitutes a correlation surface C^I for a given pixel in the reference field F_t.
  • Each correlation point on the correlation surface C^I represents a pixel location in the search area of the field F_t−1.
  • The correlation surface of the intraframe correlation implies the motion vector over a one-Δt time increment, whereas the interframe correlation surface implies motion over 2Δt.
  • A two-pixel shift between fields with a 2Δt separation has the same rate (assuming constant velocity) as a one-pixel shift between fields with a Δt separation.
  • The image motion implied by the correlation surface C^I is therefore normalized to the same rate (pixels per Δt) as the correlation surface C, such that these surfaces can be composited.
  • The composite surface can be used to extract the correlation data C1, C2, C̄, and to derive motion vector information for each pixel of the interlaced reference field F_t.
  c. Correlation Compositing
  • the outputs of the interframe and intraframe correlation are their respective correlation surfaces.
  • The surfaces are combined into a composite surface C′ = f(C, C^I), where f is a function such as simple summation, weighted summation, or multiplication, or a combination thereof (see the sketch following this list).
  • Intraframe correlation uses two inputs that are temporally one field apart, thus minimizing the effect of acceleration of objects within the image.
  • Interframe correlation uses two unmodified (non-interpolated) inputs, which provides a more accurate correlation.
  • Results of the correlation compositing for each pixel of the reference field F_t can be used to extract correlation data that, in turn, is used to derive a motion vector and confidence metric for each pixel of the reference field, in a manner as described with respect to processing of progressive images.
  • Filtering of the motion vectors can be performed in a manner as described with respect to that of the Figure 1 motion estimator.
  d. Temporal Interpolation For Motion Compensated De-Interlace
  • the Figure 8 system includes a vector scaling unit 804 and a motion compensation unit 806 to perform temporal interpolation for the de-interlace process.
  • a progressive (non-interlaced) motion picture sequence can be created from an interlaced sequence by synthesizing fields synchronized in time but opposite in phase for each existing field. The process is similar to progressive temporal interpolation: two motion compensated fields are generated.
  • The previous field F_t is motion compensated in motion compensation unit 806, using a temporal offset Δt of 1/2 from vector scaling unit 804, to interpolate the forward motion compensated field:

$$\hat{F}_t^{MCF}(x,y) = F_t\left(x + \tfrac{V_x}{2},\; y + \tfrac{V_y}{2}\right)$$

  • The following field F_t+1 is motion compensated in motion compensation unit 806 using a temporal offset Δt of 1/2 to interpolate the backward motion compensated field:

$$\hat{F}_t^{MCB}(x,y) = F_{t+1}\left(x - \tfrac{V_x}{2},\; y - \tfrac{V_y}{2}\right)$$

  • The motion compensated field can be blended with the vertically interpolated field in a quality metric blending unit, such as the blending unit 204 of Figure 2, using the confidence metric M and the function:

$$\hat{F}_t^{C}(x,y) = \hat{F}_t^{MC}(x,y) \cdot M(x,y) + F_t^{VI}(x,y) \cdot (1 - M(x,y))$$

where the quality metric blending unit has been configured to receive F̂_t^MC and F_t^VI.
  • Alternatively, a single two-dimensional vertical-temporal filter can be employed using at least two of the fields (including F_t).
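To make the compositing step concrete, the following Python sketch (not from the patent; the names, the equal weights, and the offset-doubling normalization are assumptions, since the exact normalization is not reproduced in this text) combines an intraframe surface with an interframe surface using a weighted summation as the composite function f:

```python
import numpy as np

# Illustrative sketch of correlation compositing. The interframe surface
# c_inter measures displacement over 2*dt, so a displacement (x, y) per dt
# corresponds to point (2x, 2y) on it; sampling at doubled offsets is one
# plausible way to normalize it to pixels per dt before combining.
def composite_surface(c_intra, c_inter, w_intra=0.5, w_inter=0.5):
    s = c_intra.shape[0] // 2          # surfaces span offsets -S..+S
    ys, xs = np.indices(c_intra.shape)
    yi = np.clip((ys - s) * 2 + s, 0, c_inter.shape[0] - 1)
    xi = np.clip((xs - s) * 2 + s, 0, c_inter.shape[1] - 1)
    c_inter_norm = c_inter[yi, xi]     # interframe surface in pixels per dt
    return w_intra * c_intra + w_inter * c_inter_norm  # weighted summation f
```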

Abstract

The present invention is directed to improving the accuracy of interpolated images to convert interlaced images into non-interlaced progressive images, and to convert the frame rate of either progressive non-interlaced images or interlaced fields (Fig. 7). The method comprises the steps of comparing first information obtained from pixels used to represent a first image with second information obtained from pixels used to represent a second image; extracting a motion vector from image motion among the at least two images; producing a measure of confidence of accuracy with which the motion vector is generated; synthesizing a first synthesized image using the motion vector; synthesizing a second synthesized image using at least one of the first information and the second information; and interpolating an image between the first image and the second image by combining the first synthesized image with the second synthesized image using the measure of confidence as a weighting factor.

Description

TEMPORAL INTERPOLATION OF INTERLACED OR PROGRESSIVE
VIDEO IMAGES
BACKGROUND OF THE INVENTION
Field of the Invention:
The present invention is generally related to image processing, and more particularly, to interpolating video images from interlaced or progressive video images.
Background Information:
One aspect of image processing involves interpolating synthetic images to allow for conversion from one frame rate to another. Such techniques are applied to both progressive and interlaced image sequences (e.g., progressive or interlaced television images). In a typical video sequence (e.g., NTSC or PAL), an image is portrayed by alternately displaying the odd and even lines (also referred to as "fields"). The time between the display of these fields is sufficiently short (1/60 second for NTSC, 1/50 second for PAL) that the fields appear as a complete interlaced image. Since time elapses between the display of the two fields, there is both a temporal and a spatial nonalignment of the odd and even lines. In contrast, progressive images contain both odd and even fields that were captured at the same instant in time.

Motion pictures (e.g., film or video) are composed of image or frame sequences that represent samples taken at regular intervals in time. If the frames are sampled at a sufficiently high rate, the appearance of smooth motion is achieved. Common sampling rates include 24 frames per second for film, 60 fields per second for NTSC standard video in the United States and Canada, and 50 fields per second for PAL standard video in Europe and elsewhere. To convert a motion picture sequence to a different sampling rate, new frames must be created which appear to be intermediate in time between frames sampled at the source frame rate; this process is called temporal interpolation.
The process of temporal interpolation is one of predicting the contents of an image frame that is temporally between available image frames. Where objects are in motion within the sequence of image frames, the interpolated frame must position those objects spatially between the object positions in the surrounding available image frames. In order to do this, a process of motion estimation is performed to determine the motion between available frames. The estimated motion is represented by a spatial offset known as a motion vector. Depending on the motion estimation technique employed, motion vectors may be computed with respect to an entire image frame (one vector per frame), with respect to pixel blocks of varying sizes (e.g., 16x16, 8x8, etc.), or with respect to individual pixels (one vector per pixel). The vector(s) are scaled in accordance with the proportion of the temporal offset of the interpolated frame with respect to the surrounding available frames. The vector(s) are applied to one or both of the surrounding frames in a process known as motion compensation to generate the interpolated image frame.
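As a minimal sketch of the scaling step (illustrative Python, not taken from the patent; the function name and calling convention are assumptions):

```python
# Scale a motion vector measured between frames at times t0 and t1 so that
# it maps the t0 frame to an interpolated frame at time tx (t0 <= tx <= t1).
def scale_motion_vector(vx, vy, t0, t1, tx):
    dt = (tx - t0) / (t1 - t0)   # fractional temporal offset in 0..1
    return vx * dt, vy * dt

# Example: a (+6, -2) pixel displacement over one frame pitch contributes
# (+1.5, -0.5) to a frame interpolated 25% of the way between the sources.
print(scale_motion_vector(6.0, -2.0, t0=0.0, t1=1.0, tx=0.25))
```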
The motion of an image object is not always at a constant rate; therefore, a motion vector generated by linear interpolation may not be strictly accurate. However, if a set of motion vectors is determined for each frame in sequence, the sequence of sets of motion vectors can be considered a piecewise linear approximation of nonlinear motion over the sequence. Given that a set of motion vectors represents a linear function mapping a frame from time t0 to t1, the function can be scaled linearly to create a frame at any time between t0 and t1. In practice, the process of temporal interpolation by motion compensation is more complex than a simple mapping. Several factors contribute to this complexity, including ambiguities in the motion estimation process and the covering and uncovering of image regions from one frame to the next due to objects in motion. With regard to interlaced images, an additional problem exists due to the spatial and temporal offset which exists from one image to the next. For example, in a single video frame constituted by two interlaced fields of information separated in space (e.g., by one line) and separated in time (e.g., by one half of a frame time), one field includes the odd numbered scan lines of an image, while the other includes the spatially offset even numbered scan lines. This complicates the task of motion estimation, since the spatial offset between successive fields introduces uncertainty into the process of comparing objects to determine their relative positions. The generation of progressive frames from interlaced fields involves the temporal interpolation of fields which are opposite in even/odd polarity to the available fields. Progressive frames are formed by joining the interpolated fields with the available fields. A conversion of interlaced fields to progressive frames at a different frame rate may also be achieved by generating and joining both odd and even fields by temporal interpolation.
Accordingly, it would be desirable to provide a method of temporal interpolation which avoids the inaccuracies found in existing techniques of motion compensation so that more reliable image conversion can be achieved.
SUMMARY OF THE INVENTION
The present invention is directed to improving the accuracy with which video images are interpolated to convert interlaced video images into non-interlaced progressive video images (i.e., de-interlacing), and to alter the frame rate of either progressive non-interlaced video images or interlaced video fields (i.e., frame rate conversion). Exemplary embodiments are directed to a method and apparatus for synthesizing an image using at least two images in a sequence of video images, such as two non-interlaced progressive video images or two interlaced images. In accordance with exemplary embodiments, the method comprises the steps of: comparing first information obtained from pixels used to represent a first image with second information obtained from pixels used to represent a second image; extracting a motion vector from image motion among the at least two images; producing a measure of confidence of the accuracy with which the motion vector is generated; synthesizing a first synthesized image using the motion vector; synthesizing a second synthesized image using at least one of the first information and the second information; and interpolating an image between the first image and the second image by combining the first synthesized image with the second synthesized image using the measure of confidence as a weighting factor.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects and advantages of the present invention will become more apparent to those skilled in the art upon reading the detailed description of the preferred embodiments, wherein like elements have been designated by like numerals, and wherein:
Figure 1 is a block diagram of an exemplary apparatus which uses motion estimation and motion compensation for frame interpolation in a progressive frame sequence according to the present invention;
Figure 2 is a block diagram of an exemplary temporal interpolation process according to the present invention;
Figures 3A and 3B show two different exemplary correlation surfaces;
Figure 4 illustrates an example of a single motion vector;
Figure 5 illustrates an exemplary use of motion vectors to create a motion compensated frame;
Figure 6 illustrates an exemplary process of temporal interpolation from 60 frames per second to 50 frames per second, in accordance with the present invention;
Figure 7 illustrates motion estimation and motion compensation for interpolation in an interlaced field sequence; and
Figure 8 illustrates use of motion vectors with interlaced images to create a motion compensated frame.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
1. Temporal Interpolation For Frame Rate Conversion
a. Frame Rate Conversion Of Non-interlaced Images
(1.) Motion Estimation
Figures 1 and 2 show portions of an exemplary frame rate conversion apparatus 100, configured as a functional block diagram, for frame interpolation in a progressive frame sequence. In the following discussion, "P" denotes progressive frames, "F" denotes interlaced fields, and synthesized frames or fields are identified by the symbols P̂ or F̂, respectively.
Motion estimation is the first part of the temporal interpolation process. In Figure 1, at least two images, such as two sequential frames of non-interlaced image data in a video signal, or any two frames, are processed to detect image motion. The two frames are sampled at a sample rate and are labeled P_t and P_t+1. For purposes of calculating a motion vector, one of the two frames (e.g., P_t) is identified as the frame of interest, and the other frame (e.g., P_t+1) is the next sequential image frame that is used to identify motion with respect to the image of interest. Those skilled in the art will appreciate that in this case (i.e., identifying motion from P_t to P_t+1, when P_t and P_t+1 are images in a temporally ordered sequence) motion will be determined in a forward direction from P_t to P_t+1. However, motion can alternately be assessed in a backward direction from P_t+1 to P_t, in which case P_t+1 is the image of interest. In the following discussion, P_t will be considered the frame of interest for simplicity. The two video frames P_t and P_t+1 are supplied to a motion estimation unit 102 to produce a motion vector and associated confidence metric for each pixel of either P_t or P_t+1. The motion estimation can be performed according to that described in U.S. Patent No. 5,016,102, the contents of which are hereby incorporated by reference in their entirety. Alternately, the motion estimation unit can be configured in accordance with a motion estimator as described in commonly assigned, co-pending U.S. Application Serial No. (Attorney
Docket No. 032797-062), entitled "PROCESSING SEQUENTIAL VIDEO IMAGES TO DETECT IMAGE MOTION AMONG INTERLACED OR PROGRESSIVE VIDEO IMAGES" , filed on even date herewith, the contents of which are incorporated herein by reference in their entirety.
In the copending application, the motion estimation unit 102 is described as producing a correlation surface for each pixel in the frame of interest, from which correlation data (C1_XY, C2_XY, C̄_XY) and a motion vector (V_XY) are extracted for each pixel. The correlation data is used to produce a confidence metric M(x,y) for each pixel. The frame correlation uses frames P_t and P_t+1 that are temporally separated by one frame, although any desired temporal separation can be used. Each motion vector is determined by spatially correlating a pixel in P_t with a pixel in P_t+1. Correlation is performed by comparing a region of pixels that surround and include a target pixel in the frame P_t with spatially shifted regions in the other frame P_t+1. The confidence metric is a measure of correlation confidence (i.e., a measure of correlation accuracy quantified, for example, as a value from 0 to 1) of the motion vector. The confidence is variable because some portions of the source frame may not be visible in the adjacent frame, or the correlation is ambiguous. In the motion estimation unit, frame correlation is performed by defining a search area within the frame P_t+1. For example, a search area ±Sx, ±Sy is defined with respect to the frame P_t+1 for a given pixel in the reference frame P_t. A block is defined which extends over image pixels from -Bx to +Bx and from -By to +By. The block center is located within the search area and used to calculate a correlation point C_XY on a correlation surface. This process is performed by repeatedly moving the block center to a new location within the search area to generate a set of correlation points (i.e., one correlation point is calculated for each location of the block). This process is repeated for each possible location of the block center within the search area, which extends from -Sx to +Sx and from -Sy to +Sy.
The set of correlation points is mapped into a correlation surface for the target pixel. The correlation surface will correspond in size to the search area ±Sx, ±Sy. The progressive frame correlation process by which each correlation point of the correlation surface for a given target pixel is calculated is defined by:
$$C_{XY}(x,y) = \sum_{i=-B_x}^{+B_x} \sum_{j=-B_y}^{+B_y} \left| P_t(X+i,\, Y+j) - P_{t+1}(X+x+i,\, Y+y+j) \right|$$
Alternately, where the motion estimation is performed in the direction from P_t+1 to P_t (with P_t+1 being the frame of interest), the progressive frame correlation process is defined by:

$$C_{XY}(x,y) = \sum_{i=-B_x}^{+B_x} \sum_{j=-B_y}^{+B_y} \left| P_{t+1}(X+i,\, Y+j) - P_t(X+x+i,\, Y+y+j) \right|$$
In these equations, "i" and "j" are integers incremented in accordance with the two summations shown. The values "x" and "y" account for the spatial and temporal offsets of the pixels in the second image P_t+1 with respect to the first image P_t. Thus, for each pixel in the frame P_t, a correlation surface is produced which comprises a SAD (sum of the absolute differences) for each location of the block center within the search area. Each SAD represents a correlation point C_XY on the correlation surface, the SAD being recomputed each time the block is moved within the search region. The mapping of all SADs for a given search area constitutes the correlation surface for a given pixel.
Because each SAD provides one correlation point C_XY on the correlation surface, the correlation surface is a two-dimensional array wherein each point is mapped to a pixel location in the search area of the frame P_t+1. Using the correlation surface, the pixel location to which image data of a given pixel in frame P_t has moved in frame P_t+1 can be determined. The lower the SAD associated with a given point on the correlation surface, the better the correlation.
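A minimal Python sketch of this per-pixel correlation surface follows (function and parameter names are illustrative, not from the patent; border handling is ignored by assuming the target pixel lies far enough from the frame edges):

```python
import numpy as np

def correlation_surface(p_t, p_t1, X, Y, Bx=3, By=3, Sx=7, Sy=7):
    """SAD correlation surface for the target pixel (X, Y) of frame p_t,
    searched over offsets -Sx..+Sx, -Sy..+Sy in frame p_t1."""
    surface = np.empty((2 * Sy + 1, 2 * Sx + 1))
    for y in range(-Sy, Sy + 1):
        for x in range(-Sx, Sx + 1):
            sad = 0.0
            for j in range(-By, By + 1):
                for i in range(-Bx, Bx + 1):
                    sad += abs(float(p_t[Y + j, X + i]) -
                               float(p_t1[Y + y + j, X + x + i]))
            surface[y + Sy, x + Sx] = sad   # correlation point C_XY(x, y)
    return surface   # lower SAD indicates better correlation
```

In practice the quadruple loop would be vectorized and the frames padded; the loop form is kept here only to mirror the summations above.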
Those skilled in the art will appreciate that any block size suitable for a particular application and the computation capability of the system can be used to generate a correlation surface for a given pixel, from which a motion vector for the given pixel can be derived. The correlation block size (±Bx pixels horizontally by ±By pixels vertically) can be set to a size large enough such that the SAD is statistically valid, yet small enough to be responsive to movement of small structures. In accordance with exemplary embodiments of the present invention, the motion estimation unit can be implemented using a parallel processor as described in commonly assigned, copending U.S. Application Serial No. 09/057,482, entitled MESH CONNECTED COMPUTER, the disclosure of which is hereby incorporated by reference in its entirety.
Two examples of correlation surfaces are shown in Figures 3A and 3B. In Figure 3A, there is a predominant "well" 302 representing a strong correlation surrounded by poor correlations. This correlation surface, labeled 304, provides a good indication that image data associated with the target pixel of interest in frame P_t has moved to the pixel location in frame P_t+1 which corresponds to the location of the "well". In Figure 3B, there is a "valley" 306 in the correlation surface 308 which is indicative of ambiguous correlations along a linear structure.
The correlation surface determined for a given pixel can be analyzed to extract the best (C1_XY) and second-best (C2_XY) correlation points in frame P_t+1 for the pixel of interest in frame P_t. That is, these points represent the best matches of image data to that of the given pixel in frame P_t for which the correlation surface was produced. The motion vector V1_XY associated with C1_XY (i.e., for a given (x,y) pixel coordinate) is selected as the most likely candidate for specifying the direction and the magnitude of motion the image data of the given pixel in frame P_t underwent between frames P_t and P_t+1. That is, the best correlation value C1_XY is the minimum value within the correlation surface for the given pixel and is used to extract a motion vector which represents the motion of the pixel's image data between frames P_t and P_t+1. The value C1_XY is defined by:
$$C1_{XY} = \min_{(x,y)} \, C_{XY}(x,y)$$

The geometry for computing a motion vector V_XY of a small correlation surface using the correlation data C1_XY is illustrated in Figure 4. Figure 4 illustrates a motion vector 402 associated with two images 404 and 406 (represented as the two frames P_t and P_t+1). The motion vector corresponds to the distance and direction which image data associated with a given pixel 408 has moved in transitioning from the pixel location in frame P_t to the pixel location in frame P_t+1. The position of C1_XY on the correlation surface associated with pixel 408 implies the motion that the image data associated with the pixel of interest has undergone. The motion vector associated with that correlation is:
$$V1_{XY} = (x,\,y) \quad \text{such that} \quad C_{XY}(x,y) = C1_{XY}$$
Only the motion vector associated with the best correlation (for each pixel in the image) is retained for subsequent filtering.
The second-best correlation value C2_XY and the average correlation value C̄_XY for a given pixel are provided to enable the computation of the correlation confidence metric. The second-best correlation value C2_XY is the next-ranked minimum located beyond a predetermined distance (e.g., a specified radius β) from the best value C1_XY for the given pixel. The use of a minimum radius increases the likelihood that the second-best correlation is not a false second-best correlation point associated with the best correlation point. The average correlation value (C̄_XY) for the surface is computed as follows:

$$\bar{C}_{XY} = \frac{\displaystyle\sum_{x=-S_x}^{+S_x} \sum_{y=-S_y}^{+S_y} C_{XY}(x,y)}{(2S_x+1)(2S_y+1)}$$
The foregoing process of determining the correlation data C1_XY, C2_XY and C̄_XY is repeated for each pixel in frame P_t, so that a motion vector and associated correlation data can be determined for every pixel.
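The extraction of C1, C2 and C̄ from a single surface can be sketched as follows (illustrative Python; the exclusion radius β and all names are assumptions consistent with the description above):

```python
import numpy as np

def correlation_data(surface, beta=2.0):
    """Best value C1, second-best C2 beyond radius beta, and average C-bar.
    The offset of C1 from the surface center gives the motion vector."""
    c1_idx = np.unravel_index(np.argmin(surface), surface.shape)
    c1 = surface[c1_idx]
    yy, xx = np.indices(surface.shape)
    dist = np.hypot(yy - c1_idx[0], xx - c1_idx[1])
    c2 = np.where(dist > beta, surface, np.inf).min()  # next-ranked minimum
    c_bar = surface.mean()
    sy, sx = surface.shape[0] // 2, surface.shape[1] // 2
    v1 = (c1_idx[1] - sx, c1_idx[0] - sy)   # motion vector (Vx, Vy)
    return c1, c2, c_bar, v1
```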
After generating motion vectors and confidence metrics for each pixel of frame P_t, the motion vectors can be optionally filtered in unit 102. The filter of the motion estimation unit 102 can be configured in accordance with any known motion vector filtering algorithm, including, but not limited to, those described in U.S. Patent No. 5,016,102. Alternately, the filtering can be performed according to that of the aforementioned U.S. Application. That is, the motion vectors can be processed to identify and replace anomalous vectors. Only those vectors deemed "bad" are replaced with "filtered" vectors, the remaining vectors being left unchanged. Although any known filtering technique can be used, in one exemplary embodiment, a vector can be flagged as bad if there are not at least two adjacent pixels in the frame P_t with the same motion vector components (x1_XY, y1_XY) as the center position. If the vector is flagged as bad, it can be replaced with a filtered vector. A filtered motion vector output of unit 102 is labeled V(x,y).
In addition to generating filtered motion vectors, exemplary embodiments generate a confidence metric for each motion vector as a measure of the accuracy with which the motion vector for the pixel has been generated. Where the motion estimation unit 102 is configured in accordance with the aforementioned copending U.S. application, a confidence metric computation uses the best correlation point C1_XY, the second-best correlation point C2_XY, and the average correlation value C̄_XY for computing the confidence metric of a given pixel.
Two confidence metrics are defined which are indicative of the accuracy of the best motion vector V_XY. The absolute confidence metric (M_XY^ABS) computes a ratio of the best correlation value with respect to the average correlation value of the surface. This confidence metric quantifies the correlation "strength" and is defined as:

$$M_{XY}^{ABS} = 1 - \frac{C1_{XY}}{\bar{C}_{XY}}$$

The relative confidence metric (M_XY^REL) computes a ratio of the difference between the correlation values of the best and second-best correlation points with respect to (1 − C1_XY). This confidence metric, which is a function of the difference between the correlation values C2_XY and C1_XY, quantifies the correlation "ambiguity" and is defined as:

$$M_{XY}^{REL} = \frac{C2_{XY} - C1_{XY}}{1 - C1_{XY}}$$

These can be further combined into a single confidence metric by, for example, a simple multiplication:

$$M_{XY} = M_{XY}^{ABS} \cdot M_{XY}^{REL}$$

where M_XY can, for example, be within a range of 0 and 1.
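A sketch of the two metrics and their combination (illustrative Python; it assumes the correlation values have been normalized to the range 0 to 1, which the surrounding text implies but the reproduced equations do not spell out):

```python
def confidence_metric(c1, c2, c_bar):
    """Combined confidence M = M_abs * M_rel, clamped to 0..1."""
    m_abs = 1.0 - c1 / c_bar if c_bar > 0 else 0.0       # correlation strength
    m_rel = (c2 - c1) / (1.0 - c1) if c1 < 1.0 else 0.0  # correlation ambiguity
    return max(0.0, min(1.0, m_abs * m_rel))
```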
(2.) Motion Compensation For Temporal Interpolation
Using the information generated in the Figure 1 motion estimation unit
102, a high quality synthetic frame can be generated for a temporal offset Δt = t_x of the synthetic frame with respect to the frames P_t and P_t+1, using a vector scaling unit 104 and a motion compensation unit 106.
A motion compensated frame can be interpolated at a time t_x between two frames P_t and P_t+1 by first determining an interpolated frame P̂_tx^MCF using the forward motion compensation function:
$$\hat{P}_{t_x}^{MCF}(x,y) = P_t\left(x + V_x \cdot \Delta t,\; y + V_y \cdot \Delta t\right)$$
where "MCF" denotes forward motion compensation; t < x < t+1 ~ x ~ ; and Yx, Yy are the x and y components of the motion vector.
An exemplary use of motion vectors to create a motion compensated frame is shown in Figure 5 with respect to pixel image information which transitions from the location of pixel 502 to pixel 504 over a frame pitch (temporal offset between successive frames) T_f of Δt = 1. Figure 5 also illustrates interpolated frames 506, 508 and 510 formed with temporal offsets of 50% (Δt = 0.5), 20% (Δt = 0.2) and 80% (Δt = 0.8), respectively. Optionally, a backward motion compensation function can be computed in the Figure 1 motion compensation unit 106:
$$\hat{P}_{t_x}^{MCB}(x,y) = P_{t+1}\left(x - V_x \cdot (1-\Delta t),\; y - V_y \cdot (1-\Delta t)\right)$$
where "MCB" denotes backward motion compensation. The forward and backward motion compensated frames can then be combined in motion compensation unit 106 to create a first synthesized image as an interpolated, motion compensated frame tt (x,y) using a blending function:
* ' (1 - Δt) Δt
The foregoing is one of many blending possibilities.
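The compensation and blend can be sketched in Python as follows (illustrative; the per-pixel vector fields, the nearest-neighbor warp, and edge clamping are simplifying assumptions not taken from the patent):

```python
import numpy as np

# Forward/backward per-pixel motion compensation and the Delta-t blend,
# following the equations above. vx, vy are per-pixel motion vector fields.
def motion_compensated_frame(p_t, p_t1, vx, vy, dt):
    h, w = p_t.shape
    ys, xs = np.indices((h, w))
    # Forward compensation: sample P_t displaced by +V*dt.
    fy = np.clip(np.round(ys + vy * dt).astype(int), 0, h - 1)
    fx = np.clip(np.round(xs + vx * dt).astype(int), 0, w - 1)
    p_mcf = p_t[fy, fx]
    # Backward compensation: sample P_t+1 displaced by -V*(1-dt).
    by = np.clip(np.round(ys - vy * (1 - dt)).astype(int), 0, h - 1)
    bx = np.clip(np.round(xs - vx * (1 - dt)).astype(int), 0, w - 1)
    p_mcb = p_t1[by, bx]
    # Blend, weighting the temporally nearer source more heavily.
    return p_mcf * (1 - dt) + p_mcb * dt
```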
In practice, the motion vectors are not all equally valid, and the above function will generate a number of anomalous pixels. To prevent the synthesis of erroneous pixels, the motion vector confidence metric can be used as a weighting value in the generation of motion compensated pixels. That is, the confidence metric can be used as a weighting factor in combining the motion compensated frame P̂_tx^MC with a frame synthesized using an alternate technique.
Figure 2 uses the inputs and outputs of the Figure 1 portion of the frame rate conversion apparatus 100 to produce an alternate synthesized frame that is combined with the motion compensated frame P̂_tx^MC. For example, in addition to the motion compensated frame P̂_tx^MC, a second synthesized image can be generated as a frame P̂_tx^TI. This frame can be generated using an alternate technique, such as simple temporal interpolation (TI), implemented using an interpolated frame unit 202 of Figure 2, where:

$$\hat{P}_{t_x}^{TI}(x,y) = P_t(x,y) \cdot (1-\Delta t) + P_{t+1}(x,y) \cdot \Delta t$$
A final temporally interpolated frame P̂_tx can then be generated in a quality metric blending unit 204 from the motion compensated frame and the simply interpolated frame, using the confidence metric M(x,y) as a weighting function:

$$\hat{P}_{t_x}(x,y) = \hat{P}_{t_x}^{MC}(x,y) \cdot M(x,y) + \hat{P}_{t_x}^{TI}(x,y) \cdot (1 - M(x,y))$$

Figure 6 shows an exemplary 60 fps sequence which has been converted to an interpolated 50 fps sequence using the frame rate converter of Figures 1 and 2. The first and second synthesized images are not limited to being synthesized in the manner described above. For example, the first synthesized image P̂_tx^MC can be generated with any combination of forward or backward motion estimation, and can be generated using only forward motion compensation or using only backward motion compensation. The second image can be generated using any alternate synthesizing technique.
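A short sketch of this quality-metric blend (illustrative Python; m is assumed to be a per-pixel confidence array in 0..1):

```python
def blend_interpolated_frame(p_mc, p_ti, m):
    """Final frame: confidence-weighted mix of the motion compensated
    frame and the simple temporal interpolation (the fallback)."""
    return p_mc * m + p_ti * (1.0 - m)
```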
b. Frame Rate Conversion Of Interlaced Images
(1.) Motion Estimation
Frame rate conversion with interlaced frames can alternately, or additionally, be performed by the frame rate conversion apparatus of Figures 1 and 2 by synthesizing an even field at a time t_x and by synthesizing an odd field temporally aligned with the synthesized even field. Motion vectors between even fields and motion vectors between odd fields can be produced in the same manner that motion vectors are produced for non-interlaced images. That is, assuming that the sequence of images includes interlaced even and odd fields, two consecutive even fields can be analyzed in exactly the same manner described with respect to two consecutive non-interlaced frames to calculate motion vectors and confidence metrics for each pixel. Similarly, two consecutive odd fields can be analyzed to determine motion vectors and confidence metrics.
(2.) Motion Compensation
Having defined motion vectors between two consecutive even fields, and/or motion vectors between two consecutive odd fields, motion compensation can be used to produce a synthesized image at any location between the even and odd fields, using the motion vectors between the even fields and/or the odd fields.
For example, a synthesized even field can be produced by first generating a forward motion compensated field with the function:
$$\hat{F}_{t_x}^{MCF}(x,y) = F_t\left(x + V_x \cdot \Delta t,\; y + V_y \cdot \Delta t\right)$$
and then generating a backward motion compensated field with the function:
$$\hat{F}_{t_x}^{MCB}(x,y) = F_{t+1}\left(x - V_x \cdot (1-\Delta t),\; y - V_y \cdot (1-\Delta t)\right)$$
where Δt is any fraction. The two fields can then be blended:

$$\hat{F}_{t_x}^{MC}(x,y) = \hat{F}_{t_x}^{MCF}(x,y) \cdot (1-\Delta t) + \hat{F}_{t_x}^{MCB}(x,y) \cdot \Delta t$$

The motion compensated field can then be blended with an alternatively generated field F̂_tx^TI, using the motion vector confidence metric M, to create the synthesized (i.e., temporally interpolated) even field:

$$\hat{F}_{t_x}(x,y) = \hat{F}_{t_x}^{MC}(x,y) \cdot M(x,y) + \hat{F}_{t_x}^{TI}(x,y) \cdot (1 - M(x,y))$$
This process can then be repeated to synthesize an odd field temporally aligned with the synthesized even field.
Referring to Figure 7, the preceding process generates a synthesized even field 706 at the time tx, where t < tx < t+1, with Δt equal to the fractional position of tx between the fields used for motion estimation, which are at the same phase (e.g., even field), i.e., Ft and Ft+1, represented as fields 702 and 704 in Figure 7. By repeating the process with respect to two sequential odd fields, an opposite phase field (e.g., odd field 708) can be generated at the same time tx by performing the motion estimation on fields Ft and Ft+2 (labeled 710 and 712) with Δt = (tx − t)/2. The two fields collectively constitute a progressive frame 714 at time tx, as illustrated in Figure 7. The progressive frame can then be further processed using standard image processing techniques, such as spatial filtering and resampling for conversion between NTSC and PAL video standards, for example.
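The final assembly step can be sketched as follows (illustrative only), assuming the two synthesized fields are NumPy arrays of equal size; interleaving their scan lines yields the progressive frame:

    import numpy as np

    def weave(even_field, odd_field):
        # interleave an even field and a temporally aligned odd field
        h, w = even_field.shape
        frame = np.empty((2 * h, w), dtype=even_field.dtype)
        frame[0::2, :] = even_field  # even scan lines
        frame[1::2, :] = odd_field   # odd scan lines
        return frame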
Thus, an interlaced sequence of images can be converted to a non-interlaced sequence of any desired frame rate. By multiplying the motion vectors by any integer or fraction, any desired temporal and spatial shift of the image information included in the pixels of the reference fields or frames can be synthesized.
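For example, reusing the warp() helper sketched earlier (an assumed helper, not part of the original disclosure), scaling the vectors by an arbitrary fraction before sampling places the synthesized pixels at the corresponding point along the motion path:

    def synthesize_at(field, vx, vy, fraction):
        # scale the motion vectors, then sample the field at the shifted positions
        return warp(field, vx * fraction, vy * fraction, 1.0)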
2. Temporal Interpolation For De-interlacing
Figure 8 shows an apparatus 800 for de-interlacing interlaced video images, such as sequential interlaced fields of a television video signal. Of course, a process as described above can be used to synthesize a missing even or odd field. The Figure 8 apparatus can also be used when it is desired to de-interlace a sequence of images without changing the frame rate of those images. In other words, Δt will always be 1/2. The Figure 8 motion estimation unit 802 processes three source fields labeled Ft−1, Ft and Ft+1, representing three consecutive source fields (e.g., two fields of even numbered scan lines of a video frame and one field of odd numbered scan lines of a video frame, or vice versa). As with the frame rate conversion described above, image motion among interlaced video images can be detected in any known fashion. Where consecutive fields provide spatially interlaced pixels, the aforementioned copending U.S. application describes using two different correlation techniques to produce two different correlation surfaces for each pixel in a field of interest: an interframe correlation implemented using an interframe correlation unit, and an intraframe correlation implemented using an intraframe correlation unit. The resultant sets of correlation surfaces are then combined to extract a motion vector and confidence metric for each pixel of the reference field.
The interframe correlation involves two fields which are temporally one frame apart, and therefore spatially aligned (e.g., two successive even fields, or two successive odd fields). The intraframe correlation involves two fields which are temporally one field apart, and therefore spatially nonaligned (e.g., an even field and a successive odd field).
Because the intraframe correlation is performed using two spatially nonaligned fields (e.g., one which includes the even numbered scan lines of the video frame, and another which includes the odd numbered scan lines of the video frame), a vertical interpolation is performed on one of the two fields using a vertical interpolation unit 801. The vertical interpolation unit spatially aligns the scan lines of the two fields, thereby permitting a correlation between the two fields in the intraframe correlation unit.
The vertical interpolation of, for example, the reference field Ft can be performed by simply averaging the pixel values immediately above and below a pixel of interest. Of course, any desired interpolation technique, vertical or otherwise, can be used to fill in pixels of the scan lines needed to establish spatial alignment with the other field or fields to be used in the correlation process. The vertically interpolated (VI) field is designated Ft^VI.
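A minimal sketch of this averaging (illustrative only), assuming the field is a NumPy array of its existing scan lines, estimates each missing line from the pair of lines that straddle it; the edge handling by replication is an assumption:

    import numpy as np

    def vertical_interpolate(field):
        # average each pair of vertically adjacent existing lines to estimate
        # the missing opposite-parity lines; replicate the last line at the edge
        interp = 0.5 * (field[:-1, :] + field[1:, :])
        return np.vstack([interp, field[-1:, :]])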
After calculating the interpolated field, motion estimation can be performed in motion estimation unit 802 using a method and apparatus as described in the copending application. The motion estimation unit performs the intraframe correlation and the interframe correlation. For each pixel of the field Ft, the interframe correlation and the intraframe correlation produce correlation surfaces C^I and C, respectively. The correlation surfaces of the interframe correlation and the intraframe correlation are combined for each pixel into a composite correlation surface. Using the composite correlation surface for a given pixel, an unfiltered motion vector V and an associated confidence metric are extracted using the correlation data for that pixel of the reference field Ft. This process is repeated until an unfiltered motion vector and associated confidence metric have been produced for all pixels. The unfiltered motion vectors for all pixels of the reference field are supplied to a filter to produce filtered motion vectors in a manner already described with respect to progressive images.
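A hedged sketch of the extraction step: take the displacement at the minimum of a composite (2S+1)×(2S+1) surface and rate how distinct that minimum is. The confidence formula below is an illustrative assumption, not the metric defined in the copending application:

    import numpy as np

    def extract_vector(surface, search):
        # the surface is indexed so that its center corresponds to zero motion
        idx = np.argmin(surface)
        j, i = np.unravel_index(idx, surface.shape)
        vx, vy = i - search, j - search
        best = surface.flat[idx]
        second = np.delete(surface.ravel(), idx).min()
        # a distinct minimum (best << second) yields confidence near 1
        confidence = 1.0 - best / (second + 1e-6)
        return vx, vy, float(np.clip(confidence, 0.0, 1.0))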
Correlation data extracted from the correlation surfaces are used by the motion estimation unit 802 to produce a confidence metric M for each pixel in a manner already described with respect to progressive images. Thus, when processing interlaced images, the Figure 8 motion estimation unit 802 outputs a filtered motion vector V(x,y) and a motion vector confidence metric M(x,y) for every pixel (x,y) in the reference field Ft.

a. Intraframe Correlation

The intraframe correlation utilizes the fields Ft^VI and Ft−1, temporally spaced by one field, to produce a first correlation surface for each pixel of the reference field Ft. In an exemplary embodiment, the intraframe correlation process is implemented using pixel correspondence based on a search area ±Sx, ±Sy and a block ±Bx, ±By movable within the search area, as was described with respect to the pixel correspondence approach to correlating progressive frames. The intraframe correlation results in a correlation point for each location of the block center within the search area. Each correlation point is determined as follows:
C(i,j) = Σ over m = −Bx..Bx and n = −By..By of | Ft^VI(x+m, y+n) − Ft−1(x+i+m, y+j+n) |, for each (i,j) within the search area ±Sx, ±Sy
The mapping of all correlation points for each location of the block center in the search area constitutes a correlation surface C for a given pixel in the reference field Ft. Each correlation point on the correlation surface C represents a pixel location in the search area of the field Ft−1. This process is repeated for all pixels in the field of interest.

b. Interframe Correlation

The interframe correlation utilizes spatially aligned fields Ft−1 and Ft+1 that are temporally separated by two fields (i.e., one frame, with a temporal separation of 2Δt). As with the intraframe correlation, the interframe correlation process is implemented using pixel correspondence based on a search area ±Sx, ±Sy and a block ±Bx, ±By movable within the search area, as was described with respect to the pixel correspondence approach to correlating progressive frames. The interframe correlation results in a correlation point for each location of the block center within the search area. Each correlation point is determined as follows:
C^I(i,j) = Σ over m = −Bx..Bx and n = −By..By of | Ft−1(x+m, y+n) − Ft+1(x+2i+m, y+2j+n) |, for each (i,j) within the search area ±Sx, ±Sy
The mapping of all correlation points for each location of the block in the search area constitutes a correlation surface C^I for a given pixel in the reference field Ft. Each correlation point on the correlation surface C^I represents a pixel location in the search area of the field Ft+1. Thus, this process is identical to the intraframe correlation, with the exception that Ft+1 is shifted by twice the vector magnitude (i.e., 2x and 2y) due to the two field temporal separation.
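The two correlations can be sketched with one block-matching routine, assuming a sum-of-absolute-differences measure (an assumption; the copending application defines the actual correlation). Doubling the candidate shift for the interframe case reflects the two-field separation. Pure-Python loops are used for clarity, not speed:

    import numpy as np

    def sad_surface(ref, other, x, y, S, B, shift_scale=1):
        # one (2S+1)x(2S+1) correlation surface for the pixel (x, y) of `ref`
        h, w = ref.shape
        surf = np.empty((2 * S + 1, 2 * S + 1))
        for j in range(-S, S + 1):
            for i in range(-S, S + 1):
                total = 0.0
                for n in range(-B, B + 1):
                    for m in range(-B, B + 1):
                        rx = np.clip(x + m, 0, w - 1)
                        ry = np.clip(y + n, 0, h - 1)
                        ox = np.clip(x + shift_scale * i + m, 0, w - 1)
                        oy = np.clip(y + shift_scale * j + n, 0, h - 1)
                        total += abs(float(ref[ry, rx]) - float(other[oy, ox]))
                surf[j + S, i + S] = total
        return surf

    # intraframe: vertically interpolated reference vs. the adjacent field
    # c_intra = sad_surface(f_t_vi, f_prev, x, y, S, B)
    # interframe: fields one frame apart, candidate shift doubled
    # c_inter = sad_surface(f_prev, f_next, x, y, S, B, shift_scale=2)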
The correlation surface of the intraframe correlation implies the motion vector over a one Δt time increment. A two-pixel shift between fields with a 2Δt separation has the same rate (assuming constant velocity) as a one-pixel shift between fields with a Δt separation. Thus, the image motion implied by the correlation surface C^I has been normalized to the same rate (pixels per Δt) as the correlation surface C, such that these surfaces can be composited. The composite surface can be used to extract the correlation data and derive motion vector information for each pixel of the interlaced reference field Ft.

c. Correlation Compositing

The outputs of the interframe and intraframe correlation are their respective correlation surfaces. The two surfaces Cxy and C^Ixy for every pixel in the reference image can be composited as follows:

C^Cxy = f(Cxy, C^Ixy)
where f is a function such as simple summation, weighted summation, or multiplication, or a combination thereof.
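A one-line sketch of the compositing, using the weighted-summation choice of f (the weight value is illustrative):

    def composite_surface(c_intra, c_inter, weight=1.0):
        # weighted summation form of f; weight = 1.0 reduces to simple summation
        return c_intra + weight * c_inter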
Combining the intraframe and interframe correlations takes advantage of the strengths of each. Intraframe correlation uses two inputs that are temporally one field apart, thus minimizing the effect of acceleration of objects within the image. Interframe correlation uses two unmodified (non-interpolated) inputs, which provides a more accurate correlation.
The results of the correlation compositing for each pixel of the reference field Ft can be used to extract correlation data that, in turn, is used to derive a motion vector and confidence metric for each pixel of the reference field in a manner as described with respect to the processing of progressive images. In addition, filtering of the motion vectors can be performed in a manner as described with respect to that of the Figure 1 motion estimator.

d. Temporal Interpolation For Motion Compensated De-Interlace
The Figure 8 system includes a vector scaling unit 804 and a motion compensation unit 806 to perform temporal interpolation for the de-interlace process. A progressive (non-interlaced) motion picture sequence can be created from an interlaced sequence by synthesizing fields synchronized in time but opposite in phase for each existing field. The process is similar to progressive temporal interpolation: two motion compensated fields are generated.
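A minimal sketch of that field pair, reusing the warp() helper from the earlier frame rate conversion sketch (again an assumed helper): with Δt fixed at 1/2, the previous and following opposite-phase fields are each shifted by half the motion vector and averaged.

    def deinterlace_mc_field(f_prev, f_next, vx, vy):
        mcf = warp(f_prev, vx, vy, 0.5)   # forward by +V/2
        mcb = warp(f_next, vx, vy, -0.5)  # backward by -V/2
        return 0.5 * (mcf + mcb)          # equal blend, since dt = 1/2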
Using the system of Figure 8 to convert an interlaced sequence of images into a non-interlaced sequence, the previous field Ft−1 is motion compensated in motion compensation unit 806, using a temporal offset Δt of 1/2 from vector scaling unit 804, to interpolate the forward motion compensated field:
Ft^MCF(x,y) = Ft−1(x + Vx·(1/2), y + Vy·(1/2))
The following field Ft+1 is motion compensated in motion compensation unit 806 using a temporal offset Δt of 1/2 to interpolate the backward motion compensated field:
Ft^MCB(x,y) = Ft+1(x − Vx·(1/2), y − Vy·(1/2))
These two interpolated fields are blended equally in motion compensation unit 806 to generate the motion compensated field:
Ft^MC(x,y) = (Ft^MCF(x,y) + Ft^MCB(x,y)) / 2
Because the motion compensated pixels are not all of equal confidence, the motion compensated field can be blended with the vertically interpolated field in a quality metric blending unit, such as the blending unit 204 of Figure 2, using the confidence metric M and the function:

Ft^C(x,y) = Ft^MC(x,y)·M(x,y) + Ft^VI(x,y)·(1 − M(x,y))

where the quality metric blending unit has been configured to receive Ft^MC and Ft^VI from Figure 2, rather than the frame inputs from Figure 1. Note that the above function is basically identical to that used to produce the temporally interpolated frame Ptx(x,y), with the exception that a vertically interpolated image Ft^VI is used as the second image, rather than the temporally interpolated image Ptx^TI used previously. However, those skilled in the art will appreciate that a temporally interpolated image, generated in a manner as described with respect to Ptx^TI, but using the image fields F (e.g., fields Ft and Ft+1) as opposed to image frames P, could be used in place of Ft^VI. In addition, instead of using a 1-dimensional vertical interpolation filter followed by a 1-dimensional temporal interpolation filter, a single 2-dimensional vertical-temporal filter can be employed using at least two of the fields (including Ft).
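Stringing the earlier sketches together gives an illustrative per-field de-interlace, with the confidence metric selecting per pixel between the motion compensated estimate and the vertically interpolated fallback; all helper names are assumptions carried over from the previous sketches, and all fields are assumed to be arrays of equal size:

    def deinterlace_field(f_prev, f_t, f_next, vx, vy, m):
        f_mc = deinterlace_mc_field(f_prev, f_next, vx, vy)  # motion compensated estimate
        f_vi = vertical_interpolate(f_t)                     # spatial fallback
        # per-pixel quality-metric blend
        return f_mc * m + f_vi * (1.0 - m)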
It will be appreciated by those skilled in the art that the present invention can be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims rather than the foregoing description, and all changes that come within the meaning and range of equivalence thereof are intended to be embraced therein.

Claims

What Is Claimed Is:
1. Method for synthesizing a video image from at least two images in a sequence of video images, the method comprising the steps of: comparing first information obtained from pixels used to represent a first image with second information obtained from pixels used to represent a second image; extracting a motion vector from image motion among the at least two images; producing a measure of confidence of accuracy with which the motion vector is generated; synthesizing a first synthesized image using the motion vector; synthesizing a second synthesized image using at least one of the first information and the second information; and interpolating an image between the first image and the second image by combining the first synthesized image with the second synthesized image using the measure of confidence as a weighting factor.
2. A method according to claim 1, wherein the step of synthesizing a first synthesized image includes a step of: spatially offsetting the first information by the motion vector multiplied by a desired temporal offset.
3. A method according to claim 2, wherein said step of comparing includes a step of: producing a correlation surface representative of image motion between the first image and second image.
4. A method according to claim 3, wherein the step of extracting includes a step of: deriving the motion vector from correlation data included in the correlation surface as a measure of the motion among the at least two images.
5. A method according to claim 1, wherein the motion vector is extracted by comparing the second information with the first information, and the step of synthesizing the first synthesized image includes a step of: spatially offsetting the first information by the motion vector multiplied by a desired temporal offset.
6. A method according to claim 1, wherein the motion vector is extracted by comparing the second information with the first information, and the step of synthesizing the first synthesized image includes a step of: offsetting the second information by the motion vector multiplied by a desired temporal offset.
7. A method according to claim 1, wherein the motion vector is extracted by comparing the first information with the second information, and the step of synthesizing the first synthesized image includes a step of: spatially offsetting the first information by the motion vector multiplied by a desired temporal offset.
8. A method according to claim 1, wherein the motion vector is extracted by comparing the first information with the second information, and the step of synthesizing the first synthesized image includes a step of: offsetting the second information by the motion vector multiplied by a desired temporal offset.
9. Method according to claim 1, wherein the motion vector is extracted by comparing the second information with the first information, and said step of synthesizing a first synthesized image includes a step of: compensating image motion using the first image and the motion vector to produce a forward motion compensated image; compensating image motion using the second image and the motion vector to produce a backward motion compensated image; and blending the forward motion compensated image and the backward motion compensated image to produce a motion compensated image as said first synthesized image (blending of forward and backward motion estimation).
10. Method according to claim 1, wherein the step of synthesizing a second synthesized image includes a step of: combining the first information and the second information to produce a temporally interpolated image.
11. A method according to claim 10, wherein the first synthesized image is a synthesized motion compensated image, the method comprising the step of: combining the synthesized motion compensated image and the interpolated image to produce a synthesized video image (Ptx).
12. A method according to claim 11, wherein the first and second images are non-interlaced images.
13. Method according to claim 11, wherein the first and second images are interlaced images.
14. Method according to claim 1, wherein the images are interlaced images, the method comprising the steps of: producing a vertically interpolated image from the second image; comparing the vertically interpolated image with the first image to produce a first correlation surface; comparing the first image with a third image to produce a second correlation surface; combining the first and second correlation surfaces into a composite correlation surface; and extracting the motion vector from the composite correlation surface to produce a de-interlaced image.
15. Method according to claim 14, comprising the step of: producing said measure of confidence using the composite correlation surface.
16. Method according to claim 14, wherein the motion vector is extracted by comparing the second information with the first information, and said step of synthesizing a first synthesized image includes a step of: compensating image motion using the first image and the motion vector to produce a forward motion compensated image; compensating image motion using the second image and the motion vector to produce a backward motion compensated image; and blending the forward motion compensated image and the backward motion compensated image to produce a motion compensated image as said first synthesized image.
17. Method according to claim 16, wherein the step of synthesizing a second synthesized image includes a step of: combining the first information and the second information to produce a temporally interpolated image.
18. A method according to claim 17, wherein the first synthesized image is a synthesized motion compensated image, the method comprising the step of: combining the synthesized motion compensated image and the interpolated image to produce a synthesized video image.