METHOD AND APPARATUS FOR PROGRESSIVE VIDEO MATERIAL IDENTIFICATION
CROSS REFERENCE TO RELATED APPLICATION
The invention is related to U.S. Provisional Application No. 60/297,046, filed on 8 June, 2001, entitled METHOD AND APPARATUS FOR MPEG-2 PROGRESSIVE VIDEO MATERIAL IDENTIFICATION, the subject matter thereof being fully incorporated by reference herein.
FIELD OF THE INVENTION
The invention relates generally to coding of digital video signals, and more particularly to application of coding information respecting conversion from a film format to a digital video format.
BACKGROUND OF THE INVENTION
Advances in video coding and compression algorithms during the past few years have made possible the transmission and/or recording of video program material using comparatively small bandwidth or storage capacity. One such coding and compression methodology is MPEG-2, as described and codified in ISO/IEC 13818-2. Indeed, the MPEG-2 methodology is an integral part of the Advanced Television System standard as adopted by the Advanced Television Systems Committee (ATSC). MPEG-2 is also widely used for transmission and/or recording of NTSC (National Television System Committee) based material, particularly for cable and satellite transmission and DVD recordings.
In the process of applying such coding/compression methodologies, digitized video signal information is operated upon by an encoder at the transmission or recording site, which carries out the desired compression algorithms and produces as an output a video bitstream requiring substantially less transmission bandwidth or storage capacity than would have been required for the original video signal information. At the receiving or playback location, that compressed video bitstream is operated upon by a decoder which reverses the compression process and restores the original video signal information.
As is well known in the art, video program information is imaged as a sequence of scan lines, and such scan lines may be provided in an interlaced format or a progressive (non-interlaced) format. Certain operations performed in the process of decoding and displaying MPEG-coded video information require knowledge of whether the material is comprised of interlaced or progressive pictures. As is also known, this information is intended to be conveyed via the progressive_frame bit in the Picture Coding Extension of MPEG-2 data. The bit should be set for progressive pictures and clear for non-progressive pictures. Among the operations that need to know whether the video signal uses interlaced or progressive scan are frozen-picture generation and chroma upsampling. Frozen pictures are generated using both fields of a progressive picture, but must be generated using a single repeated field of a non-progressive picture. Chroma upsampling filters (to convert video samples from 4:2:0 to 4:2:2 format) are configured differently depending on the progressive nature of the picture.
A problem arises for such operations because a significant amount of MPEG video material, including DVD and satellite broadcast material, improperly sets the progressive_frame bit, causing vibrating frozen pictures and non-optimal chroma upsampling (including possible picture distortion).
SUMMARY OF INVENTION
The repeat_first_field bit in the Picture Coding Extension of MPEG-2 data is used to perform temporal conversion (3:2 pulldown) of film material from 24 frames/sec to the 60 fields/sec rate needed for NTSC compatibility. Such 24 frames/sec source material is known to be normally encoded as progressive pictures. The invention monitors the repeat_first_field bit, as received at a decoder, to determine whether the received repeat_first_field bits correspond to 24 frames/sec source material. Upon an affirmative finding from that monitoring that the setting of the repeat_first_field bit indicates 24 frames/sec source material, the invention provides an indication that the received video signal is encoded as progressive pictures. Similarly, upon an affirmative determination that the setting of the repeat_first_field bit does not comport with 24 frames/sec source material, the invention provides an indication that the received video signal is encoded as interlaced pictures.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic depiction of the video decoder portion of a set-top box.
Figure 2 schematically depicts an object as it would appear if displayed on a progressive display device.
Figure 3 schematically depicts the object of Figure 2 as it would be displayed on two fields of an interlaced display device.
Figure 4 schematically depicts two frame pictures of the object in motion.
Figure 5 provides a schematic depiction of each frame of Figure 4 split into two fields that are temporally coincident.
Figure 6 depicts four field pictures of the object in motion.
Figure 7 schematically depicts two frozen pictures resulting from a combination of two of the fields depicted in Figure 6.
Figure 8 depicts a flow chart for an exemplary embodiment of a method utilizing the principles of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The invention is described hereafter in terms of a preferred embodiment in which an MPEG-encoded video signal is transmitted from a broadcast source and received at a user location. According to this preferred embodiment, the decoding of the encoded MPEG video signal at the receiver location (as well as other functions not material to the discussion of the invention) occurs in a device known as a set-top box, with acts incorporating the principles of the present invention being carried out in conjunction with the decoding of the received video signal by the set-top box. It should be understood, however, that the invention is equally applicable to other modes of transmitting a signal to such a receiver, as well as to the case in which the compressed video signal is recorded onto a storage medium such as a DVD, with the process of the invention occurring in the decode processing of the DVD player. Although the encoded program material may be delivered to an end user in various forms and via various media (e.g., as a signal modulated on a carrier transmitted via a transmission medium, or recorded on a DVD), conceptually the process is the same. Program source material is digitized, compressed and encoded at a source location. Upon receipt of that information at a user location, the encoding, compression and digitization steps are reversed and the original video material is reproduced on some form of video display device.
As noted in the Background section, a video picture is displayed on a television display device as a series of lines starting at the top of the display and progressing to the bottom. For the purposes of this discussion, assume that the first line at the top of the display is line 1, and the last line at the bottom of the display is line 480. A progressive picture (frame) consists of all 480 lines of a picture, displayed at a nominal rate of 30 frames per second. (For NTSC systems, the exact frame rate is 29.97 frames per second; ATSC systems, however, support both the NTSC 29.97 FPS rate and a true 30 FPS rate.) An interlaced picture (field), on the other hand, consists of 240 lines of a picture (every other line), displayed at a nominal rate of 60 fields per second. Field pictures alternate between all of the odd-numbered lines and all of the even-numbered lines.
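The line numbering above can be illustrated with a short Python sketch, assuming a frame is held as a list of 480 scan lines in which list index 0 holds line 1. The helper names `split_into_fields` and `weave` are hypothetical; the sketch only shows that each field carries every other line, so two 240-line fields displayed at 60 fields per second cover the same 480 lines as one frame displayed at 30 frames per second.

```python
FRAME_LINES = 480  # lines 1..480, top to bottom, as in the text


def split_into_fields(frame):
    """Return (odd_field, even_field) for a frame given as a list of scan lines.

    frame[0] holds line 1, frame[1] holds line 2, and so on, so the
    odd-numbered lines sit at even list indices.
    """
    if len(frame) != FRAME_LINES:
        raise ValueError(f"expected {FRAME_LINES} lines, got {len(frame)}")
    odd_field = frame[0::2]   # lines 1, 3, 5, ..., 479
    even_field = frame[1::2]  # lines 2, 4, 6, ..., 480
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave two fields back into a full frame (inverse of the split)."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.extend([odd_line, even_line])
    return frame


frame = list(range(1, FRAME_LINES + 1))  # each "line" is just its line number
odd, even = split_into_fields(frame)
print(len(odd), len(even), odd[:3], even[:3])  # 240 240 [1, 3, 5] [2, 4, 6]
```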
Most television receivers in use today are interlaced display devices. Such receivers typically do not include the hardware and software to decode an MPEG-encoded video stream and, as a result, a separate set-top box (STB) is provided to process such an encoded video stream into a form usable by the receiver itself. To accommodate the input requirements of the receiver, the output of the STB is typically an analog signal in an interlaced video format.
The video decoder portion of an STB typically includes a microprocessor that parses MPEG data and extracts the parameters for each displayed picture. To provide context for the following discussion, Figure 1 provides a schematic illustration of the MPEG video decoder functions implemented in an STB, in accordance with the principles of the present invention.
The picture parameters extracted by the microprocessor 101 are used by the decoder in the implementation of the illustrated decoder functions: variable length decode 102, inverse quantization 103, inverse DCT 104, motion compensation 105, and picture reconstruction (spatial and temporal) 106. The method of the invention is focused on the Spatial and Temporal Reconstruction 106 function. This function includes vertical and horizontal scaling and filtering, as well as the processing of picture information related to field or frame pictures. As shown in the figure, the Spatial and Temporal Reconstruction function further includes a Detector function 106A in accordance with the principles of the present invention.
There are instances where it is desirable for an STB to display a frozen picture. One example is when the STB has the capability to pause playback. Another example is a channel change: when a viewer changes the channel, an STB will display a frozen picture while it performs initialization and/or satisfies conditions needed for a transition to steady-state video decode/display. In the display of a frozen picture, an interlaced television display alternately displays two fields, one for the even lines and one for the odd lines. Accordingly, the STB outputs these two fields continuously.
To illustrate an important advantage of the method of the invention, reference is made to Figures 2 through 7. Those figures collectively represent, schematically, one or more objects depicted on a video display in various configurations related to the operation of the invention. For purposes of illustration, each object depicted is represented as a solid square comprised of six rows of pixels. Each row is shown as a horizontal line, representing a row of pixels on a single scan line of a video display. The dashed line about the periphery of the object is for reference only, to indicate the position of the entire solid square object, and would not actually appear on the display.
Figure 2 shows how the solid square object would appear if displayed on a progressive display device - i.e., all scan lines displayed in a single frame, while Figure 3 shows how the same solid square object would appear if displayed on two fields (alternate scan lines) of an interlaced display device.
Correspondingly, Figure 4 schematically depicts the object moving down and to the right, with the object displayed as two frame pictures (progressive), and Figure 5 schematically depicts the same movement of the object, displayed as four field pictures on an interlaced display device. As will be apparent from Figure 5, the two interlaced fields for the object at a given position on the screen (f1t and f1b for the first position, and f2t and f2b for the second position) are temporally coincident. Because those fields are temporally coincident, a frozen picture constructed from the top and bottom fields of the same frame would have no motion artifacts. Likewise, chrominance upsampling utilizing top and bottom fields from the same frame would have no objectionable artifacts.
Figure 6, on the other hand, schematically depicts the object moving down and right, recorded with four field pictures, where each of the field pictures is temporally displaced relative to the preceding field picture. Figure 7 then schematically depicts the same object movement, displayed as two pairs of field pictures on an interlaced display device. Since the fields are temporally displaced, the objects are spatially displaced, and a frozen picture constructed from two consecutive top and bottom fields would exhibit flicker, as the object would be shown alternately in two different positions.
This case, referred to as a vibrating picture, is considered objectionable and should be avoided if possible. One option for avoiding such vibrating pictures is to display the same field picture for both field display periods. However, such "line-doubling" may also result in some degradation of picture quality (i.e., lower resolution). Likewise, chrominance upsampling that combines temporally displaced top and bottom fields from the same frame would have objectionable artifacts.
With MPEG-encoded video material, the MPEG syntax provides a descriptor called the progressive_frame bit to indicate whether the video material is encoded as progressive frame (non-interlaced) pictures or as non-progressive (interlaced) field pictures. (It is of course preferred that the video material be encoded utilizing the same method that was used to record it.) The STB may use this progressive_frame bit to determine whether a frozen picture should display both fields (frame material) or repeat one field (field material). If the bit is erroneously set for interlaced video material, the STB would display both fields to create a frozen picture, resulting in vibrating pictures. In like manner, if the bit is erroneously clear for progressive video material, the STB would display only a single field, resulting in some loss of resolution.
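The frozen-picture decision described above can be sketched in Python as follows. This is a minimal illustration, not the STB's actual implementation: fields are modeled as lists of toy values (the horizontal position of the moving square on each line), and `freeze_fields` and `vibrates` are hypothetical names. Line doubling is modeled by simply copying the top field; a practical implementation may interpolate rather than copy.

```python
def freeze_fields(top_field, bottom_field, progressive_frame):
    """Return the pair of fields an STB outputs repeatedly for a frozen picture.

    progressive_frame True: both fields come from the same instant, so both
    are shown and full vertical resolution is kept.
    progressive_frame False: the fields were captured at different instants,
    so one field is repeated for both display periods (line doubling),
    trading resolution for a stable picture.
    """
    if progressive_frame:
        return top_field, bottom_field
    return top_field, list(top_field)  # copy of the top field fills both periods


def vibrates(fields):
    """True if the two alternately displayed fields place the object differently."""
    first, second = fields
    return first != second


# Toy interlaced fields: each entry is the square's horizontal position on a line.
# The bottom field was captured later, after the square moved right.
top = [0, 0, 0]
bottom = [2, 2, 2]

print(vibrates(freeze_fields(top, bottom, progressive_frame=False)))  # False
print(vibrates(freeze_fields(top, bottom, progressive_frame=True)))   # True
```

The second call reproduces the failure mode described in the text: an erroneously set progressive_frame bit on interlaced material makes the two output fields show the object in different positions.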
The progressive_frame bit is also used by the vertical filter to perform upconversion of the chrominance samples from 4:2:0 to 4:2:2 format. The filter configuration (sample selection and filter coefficients) is dependent on the progressive nature of the video material. Application of progressive upconversion filter methods to non-progressive material will, under certain circumstances, result in an objectionable video display error. This is often seen as dark vertical lines within a rapidly moving colored object.
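The dependence of the chroma filter on the progressive nature of the material can be illustrated with a toy 4:2:0 to 4:2:2 vertical upsampler in Python. The simple 2-tap rounded average used here is an assumption for illustration only; it is not the filter configuration (sample selection and coefficients) of any particular decoder, and the function names are hypothetical. The key difference is that the interlaced path filters the chroma rows of each field separately, whereas the progressive path mixes rows belonging to both fields.

```python
def upsample_rows(rows):
    """Double a list of chroma row values: keep each row, then insert the
    rounded average of it and the next row (the last row is repeated)."""
    out = []
    for i, value in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else value
        out.extend([value, (value + nxt + 1) // 2])
    return out


def upsample_progressive(chroma_rows):
    """Frame-based: treat all chroma rows as one temporally coherent picture."""
    return upsample_rows(chroma_rows)


def upsample_interlaced(chroma_rows):
    """Field-based: chroma rows alternate top/bottom field, so filter each
    field on its own and re-interleave, never mixing the two fields."""
    top = upsample_rows(chroma_rows[0::2])
    bottom = upsample_rows(chroma_rows[1::2])
    out = []
    for t, b in zip(top, bottom):
        out.extend([t, b])
    return out


# A colored object present in the top field (value 100) has moved out of
# these rows by the time the bottom field was captured (value 0).
chroma = [100, 0, 100, 0]
print(upsample_progressive(chroma))  # [100, 50, 0, 50, 100, 50, 0, 0]
print(upsample_interlaced(chroma))   # [100, 0, 100, 0, 100, 0, 100, 0]
```

In the frame-based result, the output rows belonging to the top field (even indices) read 100, 0, 100, 0: chroma from the later bottom field has leaked into the top field's lines. This is the kind of error that can appear when progressive upconversion is applied to non-progressive material.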
To address these problems, the inventor has determined an alternate methodology for detecting the video scan format. As is well known, video program source material that is originally recorded on film, typically at 24 frames per second, is subject to a special conversion process to address a basic incompatibility between the film format and the video format to which it is to be converted. Such film based original source material also constitutes a large proportion of total NTSC video material. To convert from the 24 frames per second film format to the 60 fields per second format of the video signal, a process known as 3-2 pulldown is used to create 10 video fields from 4 film frames.
This 3-2 pulldown process is carried out by repeating the first displayed field of every other frame picture. To that end, each frame picture (F) is split into two field pictures, top and bottom (f1t, f1b). For every other frame picture, the first displayed field picture is repeated. This is represented as follows: F1-F2-F3-F4 is displayed as f1t-f1b-f1t-f2b-f2t-f3b-f3t-f3b-f4t-f4b, and so on.
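The field sequence given above can be generated mechanically. In this Python sketch (the function name `pulldown_32` is hypothetical), the first field of every odd-numbered frame is repeated, and field parity keeps alternating so that top and bottom fields are always displayed in turn.

```python
def pulldown_32(num_frames):
    """Return display field labels for film frames 1..num_frames under 3-2 pulldown.

    The first field of every odd-numbered frame is displayed twice, and the
    parity of consecutive displayed fields always alternates (t, b, t, b, ...).
    """
    fields = []
    next_parity = "t"  # the sequence starts with a top field
    for n in range(1, num_frames + 1):
        first = next_parity
        second = "b" if first == "t" else "t"
        order = [first, second]
        if n % 2 == 1:  # every other frame: repeat its first field
            order.append(first)
        fields.extend(f"f{n}{parity}" for parity in order)
        # the next frame starts with the parity opposite to the last field shown
        next_parity = "b" if order[-1] == "t" else "t"
    return fields


print("-".join(pulldown_32(4)))  # f1t-f1b-f1t-f2b-f2t-f3b-f3t-f3b-f4t-f4b
```

Calling `pulldown_32(24)` yields 60 fields, matching the conversion of one second of 24 frames/sec film to 60 fields/sec video (10 fields for every 4 frames).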
As will be apparent, each of the repeated fields constitutes redundant data. To conserve bandwidth (or storage capacity), the MPEG-2 encoding process discards each of the repeated fields and sets the repeat_first_field flag of the Picture Coding Extension. This flag indicates to the MPEG decoder at the STB that it should insert a repeat of the first field in order to maintain temporal synchronism for the decoded video data, i.e., to cause a progressive frame picture to be displayed as three fields. As long as the STB decoder finds this bit set for every other picture in a sequence, one can be reasonably sure that the material originated from 24 frames/sec film and is comprised of progressive pictures. Accordingly, the method of the invention monitors the repeat_first_field flag at the STB decoder for the above-described pattern associated with 3-2 pulldown. Upon detection of that pattern, it is assumed that the received signal is comprised of progressive pictures, and a signal is provided to other decoder functions that rely on knowledge of whether the signal is progressive or interlaced.
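On the decoder side, the discarded fields are restored from the flags. The Python sketch below assumes each coded frame picture carries the repeat_first_field flag discussed here together with top_field_first, another Picture Coding Extension flag in ISO/IEC 13818-2 that tells the decoder which field to display first; the `CodedPicture` container and `display_fields` function are hypothetical. With the flag set on every other picture, four coded frames expand to the ten displayed fields of 3-2 pulldown.

```python
from typing import List, NamedTuple


class CodedPicture(NamedTuple):
    """Hypothetical container for the flags a parser extracts per picture."""
    number: int
    top_field_first: bool
    repeat_first_field: bool


def display_fields(pictures: List[CodedPicture]) -> List[str]:
    """Expand coded frame pictures into the field sequence actually displayed.

    Each frame yields its first and second field; when repeat_first_field is
    set, the decoder displays the first field again, restoring the field the
    encoder discarded.
    """
    fields = []
    for pic in pictures:
        first, second = ("t", "b") if pic.top_field_first else ("b", "t")
        order = [first, second]
        if pic.repeat_first_field:
            order.append(first)
        fields.extend(f"f{pic.number}{p}" for p in order)
    return fields


# Four coded frames carrying 3-2 pulldown: the flag is set on every other picture.
stream = [
    CodedPicture(1, top_field_first=True, repeat_first_field=True),
    CodedPicture(2, top_field_first=False, repeat_first_field=False),
    CodedPicture(3, top_field_first=False, repeat_first_field=True),
    CodedPicture(4, top_field_first=True, repeat_first_field=False),
]
print("-".join(display_fields(stream)))  # f1t-f1b-f1t-f2b-f2t-f3b-f3t-f3b-f4t-f4b
```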
The method of the invention is described hereafter in connection with the flow chart of Figure 8, which is directed to an illustrative embodiment of that methodology. Considering that flow chart, the process begins at Start step 800 and proceeds to step 801, where the microprocessor/parser (101 in Figure 1) in the STB parses the MPEG picture parameters from the Picture Coding Extension. With the completion of this parsing step, the repeat_first_field flag is available for use. As indicated above, with 24 frames/second source material, this flag toggles with each picture. In decision step 802, the output from the parsing step is evaluated to determine whether the repeat_first_field flag is set, or True, for 2 of the previous 4 frame pictures, i.e., for alternate frames in the 4-frame sequence. If this pattern occurs ("yes" decision), the method proceeds to step 803, where the video material is identified as representing 24 frames/second source material and an indication is provided to the STB that progressive filtering is called for. Such identification of the video material as representing 24 frames/second source material continues until the repeat_first_field flag is reset for three consecutive frame pictures. For a preferred embodiment of the invention, steps 802 and 803 are implemented in the Spatial and Temporal Reconstruction/Detector element (106, 106A) of Figure 1.
From step 803, the method proceeds to decision step 806, where an evaluation is made as to whether additional video frames are available to process. In the event of an affirmative decision in step 806, the method returns to the Start step (800) to address the additional video frames. If no additional frames are available for processing, the method of the invention terminates at step 807.
Returning to decision step 802, in the alternative case where the 2-in-4 repeat_first_field flag pattern is not found ("no" decision), the method moves to decision step 804, where an evaluation is made as to whether the repeat_first_field flag pattern is absent (False). If step 804 finds that the pattern is absent, the method proceeds to step 805, where the video material is identified as not representing 24 frames/second source material and an indication is provided to the STB that interlace filtering is called for. From step 805, as from step 803, the method proceeds to step 806 to determine whether additional frames are available for processing. If, on the other hand, step 804 does not find that the pattern is absent ("no" decision), the method proceeds directly to step 806, for a determination as to whether additional video frames are available to process.
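The flow chart of Figure 8 can be sketched as a small state machine in Python. This is an illustrative reading of the text, not the patented implementation, and several details are assumptions: the 4-picture window includes the current picture, the initial identification is interlaced, and the step 804 test (pattern absent) is taken to be the hold-release condition described for step 803, i.e., repeat_first_field clear for three consecutive frame pictures. `FilmDetector` and `run` are hypothetical names, and the repeat_first_field values are assumed to have already been parsed in step 801.

```python
from collections import deque

PROGRESSIVE = "progressive"  # 24 frames/s film: progressive filtering (step 803)
INTERLACED = "interlaced"    # not film: interlace filtering (step 805)

# Alternating cadences that satisfy step 802 (set on 2 of the last 4 pictures).
FILM_PATTERNS = ([True, False, True, False], [False, True, False, True])


class FilmDetector:
    """Hypothetical sketch of decision steps 802-805 of Figure 8."""

    def __init__(self):
        # repeat_first_field values of the most recent 4 frame pictures
        self.history = deque(maxlen=4)
        self.mode = INTERLACED  # assumed starting identification

    def update(self, repeat_first_field):
        """Feed one parsed picture's flag (step 801) and return the mode."""
        self.history.append(bool(repeat_first_field))
        flags = list(self.history)
        if len(flags) == 4 and flags in FILM_PATTERNS:
            # Step 802 "yes" -> step 803: identify 24 frames/s material.
            self.mode = PROGRESSIVE
        elif len(flags) >= 3 and not any(flags[-3:]):
            # Step 804 "yes" (assumed test) -> step 805: not film material.
            self.mode = INTERLACED
        # Step 804 "no": keep the current identification.
        return self.mode


def run(flags):
    """Process a sequence of repeat_first_field values, one per frame picture."""
    detector = FilmDetector()
    modes = []
    for rff in flags:  # step 806: continue while frames remain
        modes.append(detector.update(rff))
    return modes       # step 807: no more frames


# Six pictures of 3-2 pulldown cadence followed by ordinary video.
print(run([True, False, True, False, True, False, False, False, False]))
```

Because three consecutive clear flags are required before the identification reverts, an isolated break in the cadence does not toggle the filtering mode.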
The previous description merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to be only for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any of the processes, acts, and steps described herein represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the best mode of carrying out the invention and is not intended to illustrate all possible forms thereof. It is also understood that the words used are words of description, rather than limitation, and that details of the structure may be varied substantially without departing from the spirit of the invention and the exclusive use of all modifications which come within the scope of the appended claims is reserved.