EP1161838A1

EP1161838A1 - Method and apparatus for generating time stamp information

Info

Publication number: EP1161838A1
Application number: EP00921403A
Authority: EP
Inventors: Si Jun Huang; Joel Schoenblum
Original assignee: Scientific Atlanta LLC
Current assignee: Scientific Atlanta LLC
Priority date: 1999-03-22
Filing date: 2000-03-17
Publication date: 2001-12-12
Also published as: WO2000057647A1; AU4173100A

Abstract

Variable rate multiplexer devices have, by definition, a variable output rate for bits of information encoded therein. As a result, time stamp information is necessary to assure that a picture is downloaded at the correct time. Unfortunately, not every picture will include the required time stamp information. Therefore, some time stamp information must be inferred and applied to certain pictures. This information can be calculated by modifying a known time value with a correction value. For example, an inferred time stamp value may be calculated by adding a correction value to the time stamp value of the previous picture.

Description

METHOD AND APPARATUS FOR GENERATING TIME STAMP INFORMATION

Cross Reference to Related Applications

This application is a continuation-in-part application of pending U.S. application Serial No. 08/823,007, filed March 21, 1997, by Huang, et al., entitled "Using a Receiver Model to Multiplex Variable-Rate Bit Streams Having Timing Constraints," and assigned to Scientific-Atlanta, Inc.

Field of the Invention

The invention relates in general to the transmission of variable-rate bit streams and more particularly to the generation of time stamp information in video packets in said bit streams to assure timely decoding and / or presentation.

Background of the Invention

A new problem in data transmission is the transmission of data that requires a high bandwidth, is bursty, and has temporal constraints. Traditionally, data transmission has been done on the public switched networks provided by the telephone companies or on packet networks. The public switched networks are designed for interactive voice applications, and provide relatively low-bandwidth circuits that satisfy stringent temporal constraints. The packet networks are designed for the transfer of data between computer systems. The only constraint is that the data eventually arrive at its destination. The amount of bandwidth available for a transfer depends on the degree of congestion in the network. The packet networks thus typically make no guarantees about when or even in what order the data in a burst of data will arrive at its destination.

It may thus be appreciated that neither the telephone network nor the packet network is well-adapted to handle high-bandwidth, bursty data with time constraints. An example of such data is digital television which has been compressed according to the Motion Picture Experts Group ("MPEG") MPEG-2 standard, otherwise set forth in ISO/IEC 13818-1 and 13818-2.

Referring now to FIG. 1, there is illustrated therein those details of the MPEG-2 standard relevant to the present invention. It is to be understood, however, that the invention as described hereinafter is not so limited, and will work with other data compression techniques such as the Digital Video Broadcast (DVB) standard. The MPEG-2 standard defines an encoding scheme for compressing digital representations of video. The encoding scheme takes advantage of the fact that video images generally have large amounts of spatial and temporal redundancy. There is spatial redundancy because a given video picture has sections where the entire area has the same appearance; the larger the areas and the more of them there are, the greater the amount of spatial redundancy in the image. There is temporal redundancy because there is often not much change between a given video image and the ones that precede and follow it in a sequence. The less the amount of change between two video images, the greater the amount of temporal redundancy. The more spatial redundancy there is in an image and the more temporal redundancy there is in the sequence of images to which the image belongs, the fewer the bits of information that will be needed to represent the image.

Maximum advantage for the transmission of images encoded using the MPEG-2 standard is obtained if the images can be transmitted at variable bit rates. The bit rates can vary because the rate at which a receiving device receives images is constant, while the images have a varying number of bits. A complex image therefore requires a higher bit rate than a simple image, and a sequence of MPEG images transmitted at variable bit rates is a variable-rate bit stream with time constraints. For example, a sequence of images that shows a news anchorperson in front of a solid color background will have much more spatial and temporal redundancy than a sequence of images for a commercial or MTV song presentation, and the bit rate for the images showing the news anchor will be far lower than the bit rate for the images of the MTV song presentation.

The MPEG-2 compression scheme presents a sequence of video images as a sequence of compressed pictures, each of which must be decoded at a specific time. There are three ways in which pictures may be compressed. One way is intra-coding, in which the compression is done without reference to any other picture. This encoding technique reduces spatial redundancy but not time redundancy, and the pictures resulting from it are generally larger than those in which the encoding reduces both spatial redundancy and temporal redundancy. Pictures encoded in this way are called I-pictures. A certain number of I-pictures are required in a sequence, first, because the initial picture of a sequence is necessarily an I-picture, and second, because I-pictures permit recovery from transmission errors.

Time redundancy is reduced by encoding pictures as a set of changes from earlier or later pictures or both. In MPEG-2, this is done using motion compensated forward and backward predictions. When a picture uses only forward motion compensated prediction, it is called a Predictive-coded picture, or P picture. When a picture uses both forward and backward motion compensated predictions, it is called a bi-directional predictive-coded picture, or a B picture in short. P pictures generally have fewer bits than I-pictures and B pictures have the smallest number of bits. The number of bits required to encode a given sequence of pictures in MPEG-2 is thus dependent on the distribution of picture coding types mentioned above, as well as the picture content itself. As will be apparent from the foregoing discussion, the sequence of pictures required to encode the images of the news anchorperson will have fewer and smaller I-pictures and smaller B and P pictures than the sequence required for the MTV song presentation, and consequently, the MPEG-2 representation of the images of the news anchorperson will be much smaller than the MPEG-2 representation of the images of the MTV sequence.

The MPEG-2 pictures are being received by a low-cost consumer electronics device such as a digital television set or a set-top box provided by a cable television ("CATV") service provider. The low cost of the device strictly limits the amount of memory available to store the MPEG-2 pictures. Moreover, the pictures are being used to produce moving images. The MPEG-2 pictures must consequently arrive in the receiver in the right order and with time intervals between them such that the next MPEG-2 picture is available when needed and there is room in the memory for the picture which is currently being sent. In the art, a memory which has run out of data is said to have underflowed, while a memory which has received more data than it can hold is said to have overflowed. In the case of underflow, the motion in the TV picture must stop until the next MPEG-2 picture arrives, and in the case of overflow, the data which did not fit into memory is simply lost.

FIG. 1 is a representation of a system 10 including digital picture source 12 and a television 14 that are connected by a channel 16 that is carrying an MPEG-2 bit stream representation of a sequence of TV images. The digital picture source 12 generates uncompressed digital representations ("UDR") of images 18, which go to variable bit rate ("VBR") encoder 20. Encoder 20 encodes the uncompressed digital representations to produce variable rate bit stream ("VRBS") 22. Variable rate bit stream 22 is a sequence of compressed digital pictures 24 of variable length. As indicated above, when the encoding is done according to the MPEG-2 standard, the length of a picture depends on the complexity of the image it represents and whether it is an I-picture, a P picture, or a B picture. Additionally, the length of the picture depends on the encoding rate of VBR encoder 20. That rate can be varied. In general, the more bits used to encode a picture, the better the picture quality.

The variable rate bit stream 22 is transferred via channel 16 to VBR decoder 26, which decodes the compressed digital pictures 24 to produce uncompressed digital pictures ("UDP") 28. These in turn are provided to television 14. If television 14 is a digital television, they will be provided directly; otherwise, there will be another element which converts uncompressed digital pictures 28 into standard analog television signals and then provides those signals to television 14. There may of course be any number of VBR decoders 26 receiving the output of a single encoder 20.

In FIG. 1, channel 16 transfers bit stream 22 as a sequence of packets 30. The compressed digital pictures 24 thus appear in FIG. 1 as varying-length sequences of packets 30. Thus, picture 24(a) may have "n" packets while picture 24(d) has "k" packets. Included in each picture 24 is timing information 32. Timing information 32 contains two kinds of information: clock information and time stamps. Clock information is used to synchronize decoder 26 with encoder 20. The time stamps include the Decoding Time Stamp ("DTS"), which specifies when a picture is to be decoded, and the Presentation Time Stamp ("PTS"), which specifies when the picture is actually to be displayed. The times specified in the time stamps are specified in terms of the clock information.

As indicated above, VBR decoder 26 contains a relatively small amount of memory for storing pictures 24 until they are decoded and provided to TV 14. This memory is shown at 34 in FIG. 1 and will be referred to hereinafter as the decoder's bit buffer. Bit buffer 34 must be at least large enough to hold the largest possible MPEG-2 picture. Further, channel 16 must provide the pictures 24 to bit buffer 34 in such fashion that decoder 26 can make them available at the proper times to TV 14 and that bit buffer 34 never overflows or underflows. Bit buffer 34 underflows if not all of the bits in a picture 24 have arrived in bit buffer 34 by the time specified in the picture's time stamp for decoder 26 to begin decoding the picture.

Providing pictures 24 to VBR decoder 26 in the proper order and at the proper times is made more complicated by the fact that a number of channels 16 may share a single very high bandwidth data link. For example, a CATV provider may use a satellite link to provide a large number of TV programs from a central location to a number of CATV network head ends, from which they are transmitted via coaxial or fiber optic cable to individual subscribers, or may even use the satellite link to provide the TV programs directly to the subscribers. When a number of channels share a medium such as a satellite link, the medium is said to be multiplexed among the channels.
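The overflow and underflow conditions of the bit buffer can be illustrated with a small simulation. The following is only a minimal sketch and is not taken from the specification: the function name `simulate_bit_buffer`, the event representation, and the numeric values are hypothetical, and bits are assumed to arrive in discrete bursts rather than continuously.

```python
def simulate_bit_buffer(capacity_bits, arrivals, pictures):
    """Replay bit arrivals and picture decodes against a fixed-size buffer.

    arrivals: list of (time, bits) delivered by the channel.
    pictures: list of (decode_time, size_bits) in decoding order.
    Returns "ok", "overflow", or "underflow".
    """
    # Merge both event kinds in time order; at equal times, arrivals
    # (kind 0) are processed before decodes (kind 1).
    events = [(t, 0, bits) for t, bits in arrivals]
    events += [(t, 1, size) for t, size in pictures]
    events.sort()

    fullness = 0
    for _, kind, bits in events:
        if kind == 0:
            fullness += bits
            if fullness > capacity_bits:
                return "overflow"    # data that does not fit is lost
        else:
            if fullness < bits:
                return "underflow"   # picture not fully present at decode time
            fullness -= bits         # decoder removes the whole picture
    return "ok"
```

For example, a 600-bit picture due for decoding at time 2 underflows the buffer if only 300 of its bits have arrived by then.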

FIG. 2 shows such a multiplexed medium. A number of channels 16(0) through 16(n) which are carrying packets containing bits from variable rate bit streams 22(0..n) are received in multiplexer 40, which processes the packets as required to multiplex them onto high bandwidth ("HBW") medium 42. The packets then go via medium 42 to demultiplexer 44, which separates the packets into the packet streams for the individual channels 16(0..n). A simple way of sharing a high bandwidth medium among a number of channels that are carrying digital data is to repeatedly give each individual channel 16 access to the high bandwidth medium for a short period of time, referred to hereinafter as a slot.

One way of doing this is shown at 50 in FIG. 2. The short period of time appears at 50 as a slot 52; during a slot 52, a fixed number of packets 30 belonging to a channel 16 may be output to medium 42. Each channel 16 in turn has a slot 52, and all of the slots taken together make up a time slice 54. When medium 42 is carrying channels like channel 16 that have varying bit rates and time constraints, slot 52 for each of the channels 16 must output enough packets to provide bits at the rate necessary to send the largest pictures to channel 16 within channel 16's time, overflow, and underflow constraints. Of course, most of the time, a channel's slot 52 will be outputting fewer packets than the maximum to medium 42, and sometimes may not be carrying any packets at all. Since each slot 52 represents a fixed portion of medium 42's total bandwidth, any time a slot 52 is not full, a part of medium 42's bandwidth is being wasted.

In order to avoid wasting the medium bandwidth, a technique is used which ensures that each time slice is generally almost full of packets. This technique is termed statistical multiplexing. It takes advantage of the fact that at a given moment in time, each of the channels in a set of channels will be carrying bits at a different bit rate, and the medium bandwidth need only be large enough at that moment of time to transmit what the channels are presently carrying, not large enough to transmit what all of the channels could carry if they were transmitting at the maximum rate. The output of the channels is analyzed statistically to determine what the actual maximum rate of output for the entire set of channels will be and the medium bandwidth is sized to satisfy that actual peak rate. Typically, the bandwidth that is determined in this fashion will be far less than is required for multiplexing in the manner shown at 50 in FIG. 2. As a result, more channels can be sent in a given amount of bandwidth. At the level of slots, what statistical multiplexing requires is a mechanism which in effect permits a channel to have a slot in time slice 54 which varies in length to suit the actual needs of channel 16 during that time slice 54. Such a time slice 54 with varying-length slots 56 is shown at 55.
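The difference between the fixed slots 52 and the varying-length slots 56 can be sketched numerically. This is an illustrative sketch only: the function names are hypothetical, demands are counted in whole packets, and the proportional sharing used when the time slice is oversubscribed is an assumption made for the example, not a rule stated in the specification.

```python
def fixed_slot_waste(slot_size, demands):
    """Packets of capacity left unused when every channel gets a fixed slot."""
    return sum(slot_size - min(d, slot_size) for d in demands)


def variable_slots(slice_capacity, demands):
    """Split one time slice into variable-length slots, one per channel.

    Each channel gets what it asks for when the total fits; otherwise the
    capacity is shared in proportion to demand (whole packets, rounded down).
    The proportional rule is an assumption made for this sketch.
    """
    total = sum(demands)
    if total <= slice_capacity:
        return list(demands)
    return [d * slice_capacity // total for d in demands]
```

With per-channel demands of 2, 10, 3, and 1 packets, four fixed slots of 10 packets each leave 24 packets of capacity unused, whereas a single 16-packet time slice with variable-length slots carries the same traffic with none wasted.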

Methods of statistically multiplexing bit streams are disclosed in, for example, U.S. Patent 5,506,844, entitled "Method for Configuring a Statistical Multiplexer to Dynamically Allocate Communication Channel Bandwidth," to Rao, issued April 9, 1996; and United States patent application Serial No. 08/823,007, entitled "Using a Receiver Model to Multiplex Variable Rate Bit Streams Having Timing Constraints," filed March 21, 1997, the disclosures of each of which are incorporated herein by reference.

As noted above, it is preferred that each picture 24 include both clock information and time stamp information, the time stamp information being in the form of the DTS and PTS. The MPEG-2 standard, however, does not require that each and every picture include DTS and PTS information. Rather, only one picture within the stream per every 0.7 seconds is required to include time stamp information. In conventional MPEG-2 video encoder implementations, every coded picture usually carries a DTS to tell an MPEG-2 video decoder the accurate time of decoding that picture. However, as discussed above, a statistical multiplexer that is designed to support any compliant MPEG-2 encoded bit stream should be able to support a stream that does not have a DTS for each coded picture. Therefore, a means needs to be developed to handle this special case for a statistical multiplexer.

Accordingly, there exists a need to provide a way to assign time stamp information, and particularly DTS information, to pictures without any "native" time stamp information. The assigned time stamp information should be accurate, and be able to be assigned without damaging the picture frame. Moreover, the assigned time stamp information should be reliably assigned to every picture without "native" information so that the picture is decoded at the correct time.

Brief Description of the Drawing

FIG. 1 is a block diagram illustrating how digital television pictures are encoded, transmitted, and decoded;

FIG. 2 is a block diagram showing multiplexing of variable-rate bit streams onto a high bandwidth medium;

FIG. 3 is a block diagram of a statistical multiplexer which implements a preferred embodiment of the invention;

FIG. 4 is a more detailed block diagram of a part of the statistical multiplexer of FIG. 3.

Detailed Description of the Preferred Embodiment

While the specification concludes with claims defining the features of the invention that are regarded as novel, it is believed that the invention will be better understood from a consideration of the following description in conjunction with the drawing figures, in which like reference numerals are carried forward.

At its simplest, the invention relates to the ability to look ahead "n" time slices to see whether or not the system will have sufficient bandwidth to accommodate the video information that will need to be output. The process described hereinbelow looks at the relative space needs per channel and allocates bits (n MPEG packets) as required. By looking a sufficient number of time slices into the future, panic conditions, i.e., a condition in which bandwidth requisites will exceed bandwidth availability, can be identified. Once identified, such conditions may be avoided by looking for opportunities to insert glue pictures.
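The look-ahead idea can be sketched as a scan over predicted demand for the next "n" time slices. This is a hypothetical illustration: the function name, the per-slice demand lists, and the use of packet counts as the unit of bandwidth are assumptions, and the sketch only flags panic conditions without modelling glue picture insertion.

```python
def find_panic_slices(demand_per_slice, capacity_per_slice):
    """Return indices of future time slices whose total demand exceeds capacity.

    demand_per_slice: one entry per future time slice, each entry being a
    list of per-channel packet demands predicted for that slice.
    """
    return [
        i
        for i, demands in enumerate(demand_per_slice)
        if sum(demands) > capacity_per_slice
    ]
```

For three future slices with predicted demands `[[3, 4], [6, 5], [2, 2]]` and a capacity of 10 packets per slice, only the second slice (index 1) is flagged.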

Referring now to FIG. 3, there is illustrated therein a block diagram of a statistical multiplexer which implements a preferred embodiment of the invention. FIG. 3 illustrates an overview of a statistical multiplexer 80 for MPEG-2 bit streams which is implemented according to the principles of the invention. The main components of multiplexer 80 are packet collection controller 82, a transmission controller ("TC") 84(i) for each variable-rate bit stream 22(i), a packet delivery controller 86, and a modulator 88, which receives the output of packet delivery controller 86 and outputs it in the proper form for transmission medium 42. Packet collection controller 82 collects packets from variable-rate bit streams 22(0..n) and distributes the packets that carry a given bit stream 22(i) to the bit stream's corresponding transmission controller 84(i). In the preferred embodiment, the packets for all of the bit streams 22(0..n) are output to bus 90. Each packet contains an indication of which bit stream it belongs to, and packet collection controller 82 responds to the indication contained in a packet by routing it to the proper transmission controller 84(i). It should be noted here that the packets in each bit stream 22(i) arrive in transmission controller 84(i) in the order in which they were sent by encoder 20(i).
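The routing performed by the packet collection controller can be pictured as sorting packets into one queue per bit stream while preserving arrival order. The sketch below is illustrative only: the `(stream_id, payload)` tuple is a hypothetical stand-in for whatever indication the packet actually carries, and the per-stream queues stand in for the transmission controllers.

```python
from collections import defaultdict, deque


def route_packets(packets):
    """Distribute (stream_id, payload) packets into per-stream FIFO queues.

    Packets of a given stream stay in the order in which they arrived.
    """
    queues = defaultdict(deque)
    for stream_id, payload in packets:
        queues[stream_id].append(payload)
    return queues
```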

Transmission controller 84(i) determines the rate at which packets from its corresponding bit stream 22(i) are output to medium 42. The actual rate determination is made by transmission rate controller ("TRC") 92, which, at a minimum, bases its determination on the following information: for at least a current picture 24 in bit stream 22(i), the timing information 32 and the size of the current picture; and a Video Buffer Verifier ("VBV") model 94(i), which is a model of a hypothetical bit buffer 34(i). VBV model 94(i) uses the timing information and picture size information to determine a range of rates at which bit stream 22(i) must be provided to the decoder's bit buffer 34(i) if bit buffer 34(i) is to neither overflow nor underflow. Transmission rate controller 92(i) provides the rate information to packet delivery controller 86, which uses the information from all of the transmission controllers 84 to determine during each time slice how the bandwidth of transmission medium 42 should be allocated among the bit streams 22 during the next time slice. The more packets a bit stream 22(i) needs to output during a time slice, the more bandwidth it receives for that time slice.
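The range of rates can be sketched with a deliberately simplified model. The specification does not give the computation, so everything below is an assumption made for illustration: the function name and parameters are hypothetical, times are in seconds, and the model looks only at the current picture and at the next moment the decoder removes data from the buffer.

```python
def rate_bounds(now, dts, remaining_bits, buffer_size, fullness, next_removal):
    """Return (min_rate, max_rate) in bits per second for one bit stream.

    min_rate: slowest rate that still delivers the rest of the current
              picture by its decoding time stamp (avoids underflow).
    max_rate: fastest rate that does not fill the free buffer space before
              the decoder next removes a picture (avoids overflow).
    """
    if dts <= now or next_removal <= now:
        raise ValueError("deadlines must lie in the future")
    min_rate = remaining_bits / (dts - now)
    max_rate = (buffer_size - fullness) / (next_removal - now)
    return min_rate, max_rate
```

For instance, with 400,000 bits of the current picture still to send and half a second until its DTS, the stream needs at least 800,000 bits per second during that interval.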

Continuing in more detail, transmission controller 84(i) obtains the timing and picture size information by means of bit stream analyzer 96, which reads bit stream 22(i) as it enters transmission controller 84(i) and recovers the timing information 32 and the picture size 98 from bit stream 22(i). Analyzer 96 can do so because the MPEG-2 standard requires that the beginning of each picture 24 be marked and that the timing information 32, if present, occupy predetermined locations in each picture 24. As previously explained, timing information 32 for each picture 24 includes a clock value and a decoding time stamp ("DTS"). Transmission controller 84(i) and later decoder 26(i) use the clock value to synchronize themselves with encoder 20(i).

The timing information is found in the header of the Packetized Elementary Stream ("PES") packet that encapsulates the compressed video data. The information is contained in the PTS and DTS time stamp parameters of the PES header. The MPEG-2 standard requires that a time stamp, either PTS or DTS or both, be sent at least every 700 milliseconds (msec). If a DTS is not explicitly sent with a compressed picture, then the decoding time can be determined from parameters in the Sequence and Picture headers, or extrapolated from the DTS value of a previously transmitted picture. For details, see Annex D of ISO/IEC 13818-1.

Bit stream analyzer 96 determines the size of a picture simply by counting the bits (or packets) from the beginning of one picture to the beginning of the next picture. The timing information and the picture size are used in VBV model 94(i). VBV model 94(i) requires the timing information and picture size information for each picture in bit stream 22(i) from the time the picture enters multiplexer 80 until the time the picture is decoded in decoder 26(i). A DTS buffer 100, which is designed to store DTS values of incoming pictures, must be large enough to hold the timing information for all of the pictures required for the model.

It should be noted here that VBV model 94(i)'s behavior is defined solely by the semantics of the MPEG-2 standard, not by any concrete bit buffer 34(i). This guarantees that the bit stream generated by the statistical multiplexer described herein will be decodable by any compliant MPEG-2 video decoder that has a defined minimal recovery bit buffer. Given this minimum buffer size, the timing information for the pictures, and the sizes of the individual pictures, VBV model 94(i) can determine a rate of output for bit stream 22(i) which will guarantee for bit buffers 34(i) of any working MPEG-2 decoder that each picture arrives in the bit buffer 34(i) before the time it is to be decoded and that there will be neither overflow nor underflow of bit buffer 34(i).
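The picture-size measurement performed by the bit stream analyzer, counting from the beginning of one picture to the beginning of the next, can be sketched as a search for picture start codes. This is a simplified illustration rather than the analyzer's actual logic: it works on a raw video elementary stream held in memory, counts bytes instead of bits or packets, relies on the MPEG-2 video picture start code `00 00 01 00`, and ignores sequence and group-of-pictures headers that a real analyzer would also have to account for. The function name is hypothetical.

```python
PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG-2 video picture_start_code


def picture_sizes(elementary_stream: bytes):
    """Sizes in bytes of each picture that is followed by another picture.

    The last picture in the buffer is omitted because its end is not yet
    known (the next start code has not been seen).
    """
    starts = []
    pos = elementary_stream.find(PICTURE_START_CODE)
    while pos != -1:
        starts.append(pos)
        pos = elementary_stream.find(PICTURE_START_CODE, pos + 4)
    # Size of picture k = distance from its start code to the next one.
    return [b - a for a, b in zip(starts, starts[1:])]
```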

Referring now to FIG. 4, there is illustrated therein a more detailed block diagram of the statistical multiplexer, including means for inserting time stamp information into pictures 24 having none.

In particular, FIG. 4 is a detailed block diagram illustrating the features of analyzer 96, which provides time stamp information to pictures 24 arriving without such information. The analyzer 96 includes at least three sub-analyzers: a transport stream ("TS") analyzer 120; a packetized elementary stream ("PES") analyzer 122; and a video analyzer 124. Pictures in the transport stream 22(i) enter the analyzer, and in particular the PES analyzer 122, which detects the presence or absence of time stamp information. As noted above, such information is always located, when present, in a particular location.
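Detecting the presence or absence of time stamp information in a PES header can be sketched as follows. This is a minimal illustration, not the PES analyzer of the specification: the function names are hypothetical, the input is assumed to be a complete PES packet for a video stream (one that carries the optional PES header fields), and no validation of marker bits is attempted. The field positions follow the PES header layout of ISO/IEC 13818-1, in which the two-bit `PTS_DTS_flags` field indicates which time stamps follow.

```python
def _decode_timestamp(b: bytes) -> int:
    """Unpack a 33-bit time stamp spread over 5 bytes with marker bits."""
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


def read_pes_timestamps(pes: bytes):
    """Return (pts, dts) from a PES packet header; each is None if absent."""
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        raise ValueError("not a PES packet")
    flags = pes[7] >> 6  # PTS_DTS_flags: 0b10 = PTS only, 0b11 = PTS and DTS
    pts = _decode_timestamp(pes[9:14]) if flags & 0b10 else None
    dts = _decode_timestamp(pes[14:19]) if flags == 0b11 else None
    return pts, dts
```

A result of `(None, None)` or `(pts, None)` corresponds to the case in which no DTS accompanies the picture and a value has to be obtained some other way.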

If timing information is present, the output of analyzer 122 is sent to a picture storage device 126, along with the output of TS analyzer 120 and video analyzer 124. Thereafter, the stored information is sent to the TRC 92 for transmission to the decoder. If, however, there is no time stamp information in the picture 24, a time stamp extrapolator 128 calculates the current time stamp and sends it to the TRC 92 for the statistical multiplexing operation. The time stamp value is determined by modifying a known time value by a correction value. For example, a known time stamp value from an immediately previous picture can be modified by adding a correction value thereto. Preferably, the correction value is based on some measurable characteristic of the video stream, rather than an arbitrary value.

In one preferred embodiment, time stamp information, and in particular DTS information, is generated for a picture having none in the time stamp extrapolator 128 as follows: The frame rate of the incoming video stream is determined by the video analyzer 124; from this, a value for time period per frame (ΔT) is calculated as 1/frame rate. If no DTS information is present in a picture, an "inferred" value, or "current DTS," is calculated by adding ΔT to the DTS value for the last previous picture including DTS information. The "current DTS" is then used by the statistical multiplexer to control the packet delivery based on the decoder's VBV model. It is to be noted that while the foregoing discussion relates to DTS information, the invention is not so limited. It may be applied equally effectively to PTS information.
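The extrapolation described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the extrapolator 128 itself: the function name is hypothetical, `None` marks a picture that arrived without a DTS, time stamps are expressed in the 90 kHz, 33-bit units that MPEG-2 uses for PTS and DTS, and when several consecutive pictures lack a DTS each inferred value is assumed to serve as the base for the next.

```python
from fractions import Fraction

CLOCK_HZ = 90_000   # MPEG-2 PTS/DTS tick rate
WRAP = 1 << 33      # time stamps are 33-bit counters


def fill_missing_dts(dts_values, frame_rate):
    """Replace None entries with the previous DTS plus one frame period.

    dts_values: DTS per picture in stream order, None where absent.
    frame_rate: frames per second (int or Fraction, e.g. Fraction(30000, 1001)).
    """
    delta = Fraction(CLOCK_HZ) / Fraction(frame_rate)  # ΔT = 1/frame rate, in ticks
    filled = []
    current = None
    for dts in dts_values:
        if dts is not None:
            current = Fraction(dts)              # native time stamp: use as-is
        elif current is None:
            raise ValueError("no earlier DTS to extrapolate from")
        else:
            current = (current + delta) % WRAP   # inferred "current DTS"
        filled.append(int(round(current)))
    return filled
```

At 25 frames per second ΔT is 3600 ticks, so a stream whose pictures carry the values 180000, none, none, 190800 is filled in as 180000, 183600, 187200, 190800. Keeping ΔT as an exact fraction avoids accumulating rounding error at frame rates such as 30000/1001.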

While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not so limited. Numerous modifications, changes, variations, substitutions and equivalents will occur to those skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

What is claimed is:

1. A method of inferring a time stamp value for a video packet in a video stream having no time stamp attached thereto, comprising the steps of generating a correction value and applying said correction value to a known time value.

2. A method as in claim 1, wherein said time stamp is a decoding time stamp.

3. A method as in claim 1, wherein said time stamp is a presentation time stamp.

4. A method as in claim 1, wherein the correction value is based upon a measurable characteristic of said video stream.

5. A method as in claim 4, wherein said characteristic is the frame rate of said video stream.

6. A method as in claim 5, wherein said correction value is the inverse of frame rate.

7. A method as in claim 1, wherein the known time value is the time stamp value of a previous video packet having a time stamp attached thereto.

8. A method as in claim 1, wherein the correction value is added to said known time value.

9. A method of generating a time stamp value for a video packet in a video stream, said video packet having no such value, said method comprising the steps of: determining a frame rate of said video stream; calculating a time period per frame of said video stream, based upon said frame rate; and modifying a known time value by said time period per frame.

10. A method as in claim 9, wherein said time stamp is a decoding time stamp.

11. A method as in claim 9, wherein the correction value is based upon a measurable characteristic of said video stream.

12. A method as in claim 9, wherein said time period per frame is the inverse of said frame rate.

13. A method as in claim 12, wherein time period per frame is added to said known time value.

14. A method as in claim 9, wherein the known time value is the time stamp value of a previous video packet having a time stamp attached thereto.

15. A method of generating a decoding time stamp value for a video packet in an incoming video stream, said method comprising the steps of: determining the frame rate of said incoming video stream; calculating a correction value equal to the inverse of said frame rate; and adding said correction value to a time stamp value for a previous video packet in said video stream.

16. A method as in claim 15, wherein said previous video packet is the immediately previous video packet.

17. A method as in claim 15, wherein said previous video packet time stamp is a decoding time stamp.