WO1999043157A1 - System and method for non-causal encoding of video information for improved streaming thereof - Google Patents

System and method for non-causal encoding of video information for improved streaming thereof

Info

Publication number
WO1999043157A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
segment
information
encoding
encoded
Application number
PCT/US1999/002411
Other languages
French (fr)
Inventor
Feng Chi Wang
Original Assignee
Motorola Inc.
Application filed by Motorola Inc. filed Critical Motorola Inc.
Priority to AU25824/99A priority Critical patent/AU2582499A/en
Publication of WO1999043157A1 publication Critical patent/WO1999043157A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142 Detection of scene cut or scene change
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/152 Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system (200) and method of non-causal encoding of video information (225), the video information (225) including a plurality of video segments, which involves: analyzing (260) the video information (225) to obtain global characterization information about the video information (225) and segment characterization information (265) about the plurality of video segments; and encoding (220) the video segments based on the global characterization information and the segment characterization (265) of the video segment being encoded (227).

Description

INTERNATIONAL SEARCH REPORT International application No. PCT/US99/02411
A. CLASSIFICATION OF SUBJECT MATTER
IPC(6): H04N 7/12
US CL: 348/410, 416; 382/236
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED
Minimum documentation searched (classification system followed by classification symbols): U.S.: 348/410, 416; 382/236
Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched
Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
C. DOCUMENTS CONSIDERED TO BE RELEVANT
Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No.
X, P | US 5,832,125 A (REESE et al.) 03 November 1998, col. 3, lines 4-10 | 1, 9, 10
Y, P | US 5,832,125 A (REESE et al.), same passage | 2-8
Y, P | US 5,774,593 A (ZICK et al.) 30 June 1998, col. 10, lines 48-67 | 2-8
A | US 5,225,904 A (GOLIN et al.) 06 July 1993, col. 1, lines 18-46 | 1-10
A | US 5,708,473 A (MEAD) 13 January 1998, col. 2, lines 39-67 | 1-10
A | US 5,227,878 A (PURI et al.) 13 July 1993, col. 2, lines 48-68; col. 3, lines 1-6 | 1-10
[ ] Further documents are listed in the continuation of Box C.    [ ] See patent family annex.
* Special categories of cited documents:
'A': document defining the general state of the art which is not considered to be of particular relevance
'E': earlier document published on or after the international filing date
'L': document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
'O': document referring to an oral disclosure, use, exhibition or other means
'P': document published prior to the international filing date but later than the priority date claimed
'T': later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
'X': document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
'Y': document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
'&': document member of the same patent family
Date of the actual completion of the international search: 05 APRIL 1999
Date of mailing of the international search report: 03 JUN 1999
Name and mailing address of the ISA/US: Commissioner of Patents and Trademarks, Box PCT, Washington, D.C. 20231
Authorized officer: Tommy P. Chin
Facsimile No.: (703) 305-3230    Telephone No.: (703) 305-4700
Form PCT/ISA/210 (second sheet)(July 1992)

System and Method for Non-Causal Encoding of Video Information for Improved Streaming Thereof
Cross-Reference to Related Applications

This application is related to the following U.S. Patent Applications, both of which are assigned to the assignee of this application and incorporated by reference herein:
System and Device for, and Method of, Encoding Video Information for
Improved Streaming Thereof having Feng Chi Wang as an inventor, U.S. Pat. Apl. Ser. No. 08/885,076, filed June 30, 1997; and
Improved Video Encoding System and Method having Feng Chi Wang and
Manickam Sridhar as inventors, U.S. Pat. Apl. Ser. No. 08/711,702, filed
September 6, 1996.
Field of Invention
The invention generally relates to multimedia applications and, more particularly, to the encoding and streaming of video information over a communication network.
Background of Invention
Generally speaking, there are two modern approaches to "playing-back" multimedia information located at a remote location, such as playing-back a "video clip" on the Internet. The first approach is to have a client node download a file having the video information from a corresponding "website," or server node, and to then play-back the information, once the file has been completely transferred. The second approach is to have the server node "stream" the information to the client node so that the client may begin play-back soon after the information starts to arrive. Because the streaming approach does not suffer from the long start-up delays inherent in the downloading approach, it is believed to be preferable in certain regards.
It is believed that a substantial number of remote access users, such as Internet users, access the network via a voiceband modem. To this end, various communication standards have been proposed. H.261 and H.263, for example, each specify a coded representation that can be used for compressing video at low bitrates. (See ITU-T Recommendation H.263 of 2 May 1996, which is hereby incorporated by reference in its entirety.) Because typical voiceband modems have maximum data rates of less than 56 Kb/s, the quality of a streamed play-back depends on how effectively the channel is used. TrueStream Streaming Software, version 1.1, for example, keeps the channel at full utilization to improve the play-back's appearance. (TrueStream Streaming Software, version 1.1, is available from Motorola, Inc.)

In short, with version 1.1 of the TrueStream Streaming Software, a target data rate is first selected, for example, 20 Kb/s for a 28.8 Kb/s modem. (The other 8.8 Kb/s of bandwidth is saved for audio information and packet overhead.) If a sequence of video frames is to be encoded and, because of its inherent informational content, the streaming of the encoded data would require a data rate higher than the target rate to maintain a certain image quality level, then the TrueStream system adjusts certain encoding parameters to reduce image quality and compress the image so that the encoded frames fit into the channel bandwidth. On the other hand, if a sequence of video frames is to be encoded such that the streaming of it would not fully utilize the channel, the TrueStream system applies a "use it or lose it" approach and adjusts certain encoding parameters to improve the video quality so that the channel capacity is used. For some images, e.g., a still blank screen, the expenditure of additional bits will have little or no improvement on the image quality. In these cases, the "use it or lose it" approach wastes bits. The consequence of the above is that in the former case the sequence will be played back with pictures having a relatively coarser level of detail and a relatively smaller frame rate, and in the latter case the sequence will be played back with pictures having a finer level of detail and a higher frame rate.
Brief Description of the Drawings

In the Drawing,
Figure 1 shows a standard source encoder 100 having a known architecture; Figure 2 shows a system architecture of an exemplary embodiment of the invention;
Figure 3 is a flowchart of characterization logic of an exemplary embodiment of the invention;
Figures 4A-G are a flowchart of bitrate controller logic of an exemplary embodiment of the invention.

Detailed Description
The invention involves a system and method for non-causal encoding of video information for improved streaming thereof. Exemplary embodiments of the invention encode video information at variable bitrates and by doing so can utilize the channel in new and useful ways. For example, if the informational content of certain video frames can be encoded at a relatively low bitrate, the otherwise-unused channel capacity can be used to transmit information that will be needed for future video frames. This might be helpful in that a future portion of a video stream might, because of its informational content, require so much data that it could otherwise be streamed only by sacrificing the quality of its play-back.
Exemplary embodiments first analyze and characterize the information to be encoded. The information is then encoded based on characterization information developed from analyzing the video. Thus, the encoding of a given frame depends not only on the characterization of the informational needs of the data corresponding to the given frame but also on that of past and future frames, relative to the given frame. By characterizing the information before encoding, the exemplary embodiment gains knowledge about the clip and can make better decisions in allocating bandwidth resources. The characterizing of future information, and the use of that characterization in making decisions for encoding present video, makes the system properly classified as "non-causal."
The exemplary embodiments are particularly concerned with video information encoded according to H.263. Thus, material aspects of H.263 are outlined below, followed by a description of the exemplary embodiments.
Outline of H.263
A video sequence is encoded and decoded as a sequence of frames or pictures. Each picture is organized as groups of blocks (GOBs), macroblocks, and blocks, and each picture may be of a variety of picture formats and subformats. In addition, each picture may be of the INTRA type, also known as an "I" frame, or the INTER type, which includes entities known as "P" frames and "PB" frames.
An "I" frame is independent in that it represents a complete image. Its encoding and decoding have no dependencies on prior frames. With "P" and "PB" frames, on the other hand, the encoding and decoding depends on prior and/or future frames. P and PB frames may be thought of as an encoded representation of the difference between one picture and other picture(s). Figure 1 shows a standard source encoder 100. Coding controller 110, among other things, controls switches 120 and 130 and quantizer Q. In short, the controller 110 controls the sampling rate, or frame rate, of the video sequence by selecting particular frames of Video In, and it controls the level of detail of each encoded picture by supplying quantization parameters qz to quantizer Q. The output information 'q' is a compressed version of a picture: an I frame is a compressed version of a current picture, or portion thereof, and a P or PB frame is a compressed version of difference information, representing the difference between the current picture, or portion thereof, and the last, or next in the case of B-frames.
For I frame blocks, coding controller 110 connects switch 120 to input 121 and switch 130 to input 131. Thus, Video In is connected to the transform block T, which processes the data according to a known discrete cosine transform (DCT). The transformed data, known as transform coefficients, are then quantized by quantizer Q, and the quantized information 'q' is received by inverse quantizer Q⁻¹. The inverse quantized information, in turn, is received by inverse transform T⁻¹, and the inverse transformed information is received by summation node 140. Consequently, summation node 140 adds unconnected input 131 and the reconstituted picture information from the inverse transform T⁻¹, the reconstituted picture information being the picture Video In after it has been transformed, quantized, inverse quantized, and inverse transformed. The output of the summation node 140 is thus the reconstituted picture information, which is stored in picture memory P.
For P or PB type blocks, coding controller 110 controls switches 120 and 130 to connect to inputs 122 and 132, respectively. Difference node 150 produces the difference between the current block, Video In, and the prior, reconstituted block, provided by picture memory P. This difference information is then transformed (T), quantized (Q), inverse quantized (Q⁻¹), and inverse transformed (T⁻¹) analogously to that described above. The information provided by inverse transform (T⁻¹), in this arrangement, however, is not the reconstituted picture, but rather the reconstituted difference information originally provided by difference node 150. Summation node 140 adds the reconstituted difference information to the prior block, provided by input 132, to yield a reconstituted version of the current picture, which is stored in picture memory P.

The compression gains achieved by the above arrangement result from the statistical nature of video information and from the quantization rules. In particular, a given pixel in one frame is likely to be the same or nearly the same as a corresponding pixel of a prior frame. Moreover, pixels having no difference from one picture to the next tend to run contiguously, so that many adjacent pixels might be identical to the corresponding pixels of a prior frame. The H.263 encoding methods address the above with a variety of techniques that can be summarized by stating that the more probable video information events require fewer encoded bits than the less probable events, to maintain a certain image quality. (See ITU-T Recommendation H.263 of 2 May 1996, at 25-27, variable and fixed length codes for transform coefficients.)
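To make the loop of Figure 1 concrete, the following sketch (in Python, with numpy) pushes a single 8x8 block through transform, quantization, inverse quantization, and inverse transform in both the INTRA and INTER configurations. It is an illustration only, not H.263 itself: the uniform quantizer step of 2*QP merely approximates the Recommendation's quantizer, and motion compensation, entropy coding, and the GOB/macroblock structure are omitted.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis; C @ x @ C.T is the 2-D forward transform.
    i = np.arange(n)[:, None]          # row (frequency) index
    j = np.arange(n)[None, :]          # column (sample) index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * i / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

C = dct_matrix()

def encode_block(block, prev_block, intra, qp):
    """One trip around the Figure 1 loop for a single 8x8 block (sketch)."""
    residual = block if intra else block - prev_block       # switch 120 / difference node 150
    coeffs = C @ residual @ C.T                              # transform T
    q = np.round(coeffs / (2.0 * qp))                        # quantizer Q; 'q' output
    recon_res = C.T @ (q * 2.0 * qp) @ C                     # inverse quantizer, inverse transform
    recon = recon_res if intra else recon_res + prev_block   # summation node 140
    return q, recon                                          # recon is stored in picture memory P

# INTER mode: small frame-to-frame differences quantize to few nonzero levels.
rng = np.random.default_rng(0)
prev = rng.uniform(0.0, 255.0, (8, 8))
cur = prev + rng.normal(0.0, 2.0, (8, 8))
q, recon = encode_block(cur, prev, intra=False, qp=8)
```

Run on the nearly identical blocks above, almost all quantized INTER coefficients are zero, which is the statistical property the compression gains rest on.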
The number of bits needed to represent a picture or block depends primarily on three things: (a) whether the blocks are being encoded as INTER or INTRA type; (b) the informational nature of a block in relation to a prior block; and (c) the level of quantization used in the encoding. Typically, INTER-based encoding requires fewer bits than INTRA-based encoding for a given frame, and coarser detail requires fewer encoded bits than finer detail, everything else being equal. Thus, the selection of INTER or INTRA and the level of quantization are the primary independent variables affecting the amount of data required to encode a given picture, Video In.
II. Overview of a System and Method for Non-Causal Encoding of Video Information for Improved Streaming Thereof
Figure 2 shows an exemplary system 200 for encoding and packetizing files of variable bitrate video data. Video In, i.e., unencoded data, is received by encoder 220 and video characterizer 260. The characterizer 260 analyzes and characterizes the video portion. After the video has been characterized, the encoder 220 encodes the information, under the control of coding control 225 and bitrate controller 210. The bitrate controller 210, in turn, acts in response to characterization information 265 provided by characterizer 260. The encoded data 227 is received by packetizer 230, which is partially controlled by bitrate controller 210, and which packetizes the information into a specified format to create a file 235. File 235 may then be used by server logic 240 to stream the information over the network 250.
The above embodiment thus forms a two-pass approach to encoding. The first pass analyzes and characterizes the video information to be encoded. The second pass encodes the video information, adjusting encoding parameters based on the characterization of the video.

a. Video Characterizer

An exemplary characterizer 260 analyzes and characterizes the entire set of data to be encoded before any encoding operations begin. The characterization provides information indicative of the average motion and the average error of the video, as well as indications of scene changes. Both "motion" and "error" are characteristics known in the video encoding art. A "scene change" is inferred whenever the error of one frame relative to a prior frame is so large that the probability that the two frames represent a change in scenes is high.
Figure 3 is a flowchart of exemplary characterization logic. The logic starts in step 300 and proceeds to step 310, which initializes characterization parameters and divides the video to be encoded into N segments. Each segment corresponds to a predetermined time duration, e.g., 3 seconds of playback, which in turn depends on the reaction time of the codecs involved. Thus, in an encoding arrangement targeting 30 frames/second, a segment would correspond to 90 frames of video. As part of the dividing step, the last segment (segment N+1) has a high likelihood of corresponding to a fractional portion of 3 seconds and is treated as such by default. It must be noted that this segment size is just exemplary, and various segment sizes may be used, with as few as two frames of video per segment. The logic proceeds to step 320 to determine whether the segment pointer is pointing to the last segment. If so, the logic proceeds to step 330, which characterizes the segment with default values (more below), and then proceeds to step 399 to end the logic flow.
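The dividing of step 310 can be sketched as follows; the function name and the list-of-bounds representation are illustrative choices, not taken from the patent:

```python
def split_into_segments(n_frames, fps=30, seconds=3):
    """Divide a clip into whole segments of fps*seconds frames each,
    plus a fractional tail segment (segment 'N+1'), per step 310."""
    seg_len = fps * seconds                                # e.g. 90 frames
    n_whole = n_frames // seg_len
    bounds = [(i * seg_len, (i + 1) * seg_len) for i in range(n_whole)]
    if n_frames % seg_len:
        bounds.append((n_whole * seg_len, n_frames))       # fractional tail
    return bounds
```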
If step 320 determines that the last segment is not being pointed to, the logic proceeds to step 340. In step 340, the segment is considered in sub-segments, with each sub-segment corresponding to what will eventually be a frame of data. Thus, if the targeted encoding rate is 30 frames per second, a sub-segment will correspond to 1/30th of a second of video. Starting with the second such sub-segment, and for each subsequent sub-segment of the segment, e.g., sub-segments 2 through 90, the logic calculates a Motion Parameter (MP) and an Error Parameter (EP) using largely conventional techniques such as arithmetic difference and city-block distance.
Namely, to determine MP and EP for a sub-segment, an exhaustive search is performed to find the Motion Vector that results in the smallest Y-Luma differential coding error with respect to a prior frame. This is done for each macroblock-corresponding portion of the sub-segment. The Motion Vectors are limited to [-16, 15.5] and are restricted to data that is completely within the video frame boundaries, i.e., the Unrestricted Motion Vector Annex (of H.263) is not allowed. The Y-Luma error is exemplarily determined by averaging the absolute values of the differences of all pixels in the MacroBlock (i.e., this is not an RMS error). The resulting Motion Vector pair [dx,dy] is used to calculate MP by summing the absolute values of dx and dy (city-block distance). EP is set to the corresponding Y-Luma error summation.
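A sketch of this search appears below. It simplifies in two labeled respects: vectors are restricted to integer-pel positions (the text permits half-pel vectors in [-16, 15.5]), and the per-frame MP and EP are taken as plain sums of the macroblock contributions, which the text implies but does not spell out.

```python
import numpy as np

def macroblock_mp_ep(cur, prev, mb_y, mb_x, search=16):
    """Exhaustively find the motion vector minimizing the mean absolute
    Y-luma error of one 16x16 macroblock; return its (MP, EP) contribution."""
    h, w = prev.shape
    block = cur[mb_y:mb_y + 16, mb_x:mb_x + 16].astype(np.int32)
    best_dx = best_dy = 0
    best_err = np.inf
    for dy in range(-search, search):            # integer-pel stand-in for [-16, 15.5]
        for dx in range(-search, search):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + 16 > h or x + 16 > w:
                continue                         # candidate must lie wholly inside the frame
            ref = prev[y:y + 16, x:x + 16].astype(np.int32)
            err = np.mean(np.abs(block - ref))   # mean absolute error, not RMS
            if err < best_err:
                best_dx, best_dy, best_err = dx, dy, err
    return abs(best_dx) + abs(best_dy), best_err  # city-block distance, Y-luma error

def frame_mp_ep(cur, prev):
    """Sum the per-macroblock MP and EP contributions over a frame (sketch)."""
    h, w = cur.shape
    mp = ep = 0.0
    for y in range(0, h - 15, 16):
        for x in range(0, w - 15, 16):
            m, e = macroblock_mp_ep(cur, prev, y, x)
            mp += m
            ep += e
    return mp, ep
```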
The logic then proceeds to step 350, which calculates an Average Motion Parameter (AMP) and an Average Error Parameter (AEP) for the segment by considering the MPs and EPs of the sub-segments.

The logic then proceeds to step 360, which determines whether any of the EPs for the sub-segments exceeds a threshold value that would correspond to a scene change. Scene changes are important from an informational standpoint and thus warrant the use of "extra bandwidth" to encode the video. Each EP must be considered to detect a scene change, i.e., a large error from one frame to the next, because there is no guarantee that a scene change could be detected from analyzing AEP alone.
The logic then proceeds to step 370, which updates a data structure used to hold information characterizing the video. Among other things, the structure holds AEP and AMP values for each segment and includes a marker to indicate whether a given segment has a scene change, i.e., a relatively high EP for at least one of the sub-segments in the segment.
The logic proceeds back to step 320 to determine whether more segments need to be characterized, as explained above.
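Steps 310 through 370 can then be combined into a single first-pass routine, reusing split_into_segments and frame_mp_ep from the sketches above. The SegmentInfo record stands in for the step 370 data structure, and the numeric scene_threshold is an assumed placeholder; the excerpt gives no threshold value.

```python
from dataclasses import dataclass

@dataclass
class SegmentInfo:
    aep: float           # Average Error Parameter (step 350)
    amp: float           # Average Motion Parameter (step 350)
    scene_change: bool   # any sub-segment EP above threshold (step 360)

def characterize(frames, fps=30, seconds=3, scene_threshold=40.0):
    """First pass over the clip; the fractional tail segment is left to
    the default characterization of step 330 and is not listed here."""
    infos = []
    for start, end in split_into_segments(len(frames), fps, seconds):
        if end - start < fps * seconds:
            break                                   # fractional segment: defaults
        mps, eps = [], []
        for i in range(start + 1, end):             # sub-segments 2 through 90
            mp, ep = frame_mp_ep(frames[i], frames[i - 1])
            mps.append(mp)
            eps.append(ep)
        infos.append(SegmentInfo(
            aep=sum(eps) / len(eps),
            amp=sum(mps) / len(mps),
            scene_change=any(e > scene_threshold for e in eps)))
    return infos        # one record per whole segment (step 370's structure)
```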
b. Encoder and Controller
Encoder 220 includes the elements of Figure 1, except that the coding controller 110 of Figure 1 is divided into controller 225 and bitrate controller 210, discussed below. Controller 225 includes control features known in the art as well as control features described in the related applications identified and incorporated above. (The application entitled Improved Video Encoding System and Method includes features that would control the bitrate of the data stream in a manner mutually exclusive with that covered by the bitrate controller 210. Under one embodiment of the invention, a user input allows selection between the two.)

c. Bitrate Controller

Figures 4A-G are flowcharts of exemplary bitrate controller logic. Briefly, the logic operates in the following manner. A target bitrate and quantization level are received from a user. Each segment, e.g., 3 seconds of video, is then compared to some global characteristic information. Thus, it may be determined whether a given segment has more motion than average or more error than average. Depending on the comparison, the bitrate and quantization parameters may be adjusted relative to the targets. Thus, a segment having a relatively large amount of motion may be allocated more bits of encoded information than a segment having less motion. After a segment is analyzed in comparison to the global information, and consequently after the encoding parameters are possibly adjusted in response thereto, the segment is encoded using conventional techniques. Thus, the encoding of a given segment depends on characteristics of prior and future segments to be encoded.

The logic starts at step 400 and proceeds to step 402, in which a target bitrate (TBitrate) and a target quantization level (TQuant) are received as user inputs. For example, on a targeted 28.8 Kb/s connection, TBitrate might be set at 20 Kb/s to allow adequate resources for audio data or the like. Under H.263, Quant may vary from 1 to 31, with lower numbers corresponding to finer detail and consequently requiring more bits to encode the data.
The logic proceeds to step 404 in which the first segment is encoded using TBitrate and TQuant as the encoding parameters. The encoding of video data, responsive to the above parameters, is conventional.
The logic proceeds to step 406, in which a Global Average Error Parameter (GAEP) and a Global Average Motion Parameter (GAMP) are calculated. This is done by averaging the AEPs and the AMPs, respectively, of the N-1 segments between the first segment and the fractional segment. GAEP and GAMP thus provide characteristic information of the entire video clip to be encoded.
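As a sketch (with zero-based list indices, so segments[1:] holds the N-1 whole segments following the first):

```python
def global_averages(segments):
    """GAEP and GAMP over the N-1 whole segments between the first
    segment and the fractional segment (step 406)."""
    body = segments[1:]             # the fractional tail is not in the list
    gaep = sum(s.aep for s in body) / len(body)
    gamp = sum(s.amp for s in body) / len(body)
    return gaep, gamp
```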
The logic proceeds to step 408, in which the segment pointer (SP) and the Bitcredit variable (Bitcredit) are initialized. SP is initialized to the second segment and Bitcredit is initialized to 0. As will be explained below, Bitcredit effectively keeps a running total of the otherwise unused bandwidth, which may be used for encoding video if needed.
The logic proceeds to step 410, in which it is determined whether the AEP of the current segment being encoded (i.e., AEP[SP]) is at least twice the global average error (i.e., energy) of the whole clip (GAEP). The step also determines whether the AEP of the current segment is at least 1.5 times the global average of the whole clip and whether the current segment (i.e., SP) has a scene change marker set from the characterization phase.

If either of the above conditions is met, the logic proceeds to step 412, which sets the variable Bitrate equal to the sum of TBitrate and Bitcredit (Bitcredit starts at zero but, as explained below, can grow when segments need fewer bits for their encoding).
The logic proceeds to step 414, in which it is determined whether the AMP of the current segment is at least 1.5 times the global average motion of the whole clip (GAMP). If so, the logic proceeds to step 416, in which the variable Quant is set to the target quantization (TQuant) plus three, setting the quantization parameter to correspond to coarser detail and thereby better accommodating the relatively higher motion segment. If step 414 determines that the current segment is not a relatively high motion segment compared to the global average, the logic proceeds to step 418, in which it is determined whether the current segment is a relatively low motion segment. In particular, the AMP of the current segment is compared to 0.5 times the GAMP. If the AMP is less, corresponding to a relatively low motion segment, the logic proceeds to step 420, in which the quantization variable Quant is adjusted to correspond to finer detail; in this instance, Quant equals TQuant minus three. The logic then proceeds to step 468 (FIG. 4G), in which the current segment's frames are encoded using conventional encoding logic responsive to the encoding variables Quant and Bitrate (e.g., version 2.0 of Telenor's public domain software, or the logic of the related, pending applications identified above).
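A minimal Python sketch of this first-tier branch (steps 412-420) follows. The function name is hypothetical, clamp_quant is the helper assumed earlier, comparison boundaries are assumptions, and zeroing Bitcredit once it is spent is likewise an assumption, as the text does not say what happens to the credit after step 412.

```python
def tier1_params(seg, gamp, tbitrate, tquant, bitcredit):
    """Steps 412-420: the highest-error segments receive the target
    bitrate plus the entire accumulated bit credit."""
    bitrate = tbitrate + bitcredit            # step 412
    bitcredit = 0                             # assumption: credit is consumed
    if seg.amp > 1.5 * gamp:                  # step 414 -> 416
        quant = clamp_quant(tquant + 3)       # coarser for high motion
    elif seg.amp < 0.5 * gamp:                # step 418 -> 420
        quant = clamp_quant(tquant - 3)       # finer for low motion
    else:
        quant = tquant                        # assumption: otherwise unchanged
    return bitrate, quant, bitcredit
```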
The logic then proceeds to step 470, in which the segment pointer is incremented, and then to step 472, in which it is determined whether the segment pointer is pointing to the last segment.

If the segment pointer is pointing to the last segment, that segment, namely the fractional segment at the end of the clip, is encoded using the target quantization (TQuant) and the target bitrate (TBitrate) in step 474, and the logic ends at step 499.
If the segment pointer is not pointing to the last segment, the logic loops back to step 410 described above.
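Before turning to the remaining branches, the overall flow of FIGS. 4A and 4G (steps 404-474) can be summarized in a short sketch. Here encode_segment stands for the conventional encoder invocation of step 468, and select_params for the per-segment decision tree of FIGS. 4B-4F, which is sketched piece by piece below; both names are hypothetical.

```python
def encode_clip(segments, tbitrate, tquant):
    """Top-level flow of FIGS. 4A and 4G."""
    encode_segment(segments[0], tbitrate, tquant)      # step 404: first segment at targets
    gaep, gamp = global_averages(segments)             # step 406
    bitcredit = 0                                      # step 408
    for seg in segments[1:-1]:                         # steps 410-472
        bitrate, quant, bitcredit = select_params(
            seg, gaep, gamp, tbitrate, tquant, bitcredit)
        encode_segment(seg, bitrate, quant)            # step 468
    encode_segment(segments[-1], tbitrate, tquant)     # step 474: fractional segment
```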
If step 410 determines that the current segment does not have a relatively high error parameter (i.e., is not high energy), the logic proceeds to step 422, FIG. 4C. (The exemplary rules for determining a relatively high error parameter were discussed above in connection with step 410.) Step 422 determines whether the current segment has a relatively high error parameter, but not as high as that described above. In particular, step 422 compares the AEP of the current segment to 1.5 times the global AEP, and also compares the AEP to the global AEP if the current segment includes a scene change. (Again, a scene change does not necessarily require a high AEP, but nonetheless corresponds to at least some of the frames having high error parameters.) If the comparison determines that the current segment meets this second tier of relatively high error parameter, the logic proceeds to step 424.
In step 424, Bitrate is set to TBitrate plus one-half of the Bitcredit. Thus, only some of the accumulated Bitcredit is allocated to segments whose error parameter is relatively high, but not as high as the first tier discussed above.
The logic proceeds to step 426, in which the AMP is compared to 1.5 times the GAMP; if the AMP is larger, the logic proceeds to step 428, which adjusts Quant to TQuant plus two.

The logic proceeds to step 430, in which the AMP is compared to 0.5 times the GAMP; if the AMP is smaller, the logic proceeds to step 432, which adjusts Quant to TQuant minus two. The logic then proceeds to step 468, FIG. 4G, described above.
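A corresponding sketch of this second-tier branch (steps 424-432); as before, deducting the spent credit from Bitcredit is an assumption, since the text says only that half of the accumulated credit is allocated.

```python
def tier2_params(seg, gamp, tbitrate, tquant, bitcredit):
    """Steps 424-432: second-tier high-error segments receive only half
    of the accumulated bit credit."""
    spend = 0.5 * bitcredit
    bitrate = tbitrate + spend                # step 424
    bitcredit -= spend                        # assumption: spent credit removed
    if seg.amp > 1.5 * gamp:                  # step 426 -> 428
        quant = clamp_quant(tquant + 2)
    elif seg.amp < 0.5 * gamp:                # step 430 -> 432
        quant = clamp_quant(tquant - 2)
    else:
        quant = tquant
    return bitrate, quant, bitcredit
```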
If step 422 determines that the AEP and scene change marker do not indicate that the current segment is a second-tier, relatively high error parameter segment, the logic proceeds to step 434, FIG. 4D. Step 434 determines whether or not the current segment has a scene change marker. As mentioned above, a segment having a scene change need not have as high an AEP to qualify for additional bandwidth allocation. If the current segment does have a scene change, Bitrate and Quant are set to their target values in step 436 and the logic proceeds to step 468, as described above. If the current segment does not contain a scene change, the logic proceeds to step 438, FIG. 4E. Step 438 determines whether the current segment has a relatively low error parameter, in particular whether the AEP is less than 0.5 times the global average error parameter (GAEP). If the AEP is less, the logic proceeds to step 440.
In step 440, Bitrate is set to 0.5 times TBitrate, the underlying principle being that a relatively lower energy segment should require a lower bitrate for the encoding thereof. Correspondingly, the use of a lower bitrate correlates to otherwise available bandwidth not being used for the current segment. Thus, Bitcredit is adjusted upward by adding 0.5 times TBitrate to Bitcredit. The logic proceeds to step 442 in which the AMP of the current segment is compared to 1.5 times the GAMP. If the AMP is larger, indicating that the current segment has relatively high motion compared to the whole clip, the logic proceeds to step 444 which adjusts Quant to be 2 higher, and thus coarser, than the target quantization TQuant. If the AMP is lower, the logic proceeds to step 446.
In step 446, the AMP is compared to 0.5 times the GAMP. If the AMP is lower, indicating that the current segment has relatively low motion compared to the whole clip, the logic proceeds to step 448 which adjusts Quant to be 2 lower, and thus finer, than the target quantization TQuant. If the AMP is higher, the logic proceeds to step 450.
In step 450, the AMP is compared to 0.1 times the GAMP. If the AMP is lower, indicating that the current segment has the lowest relative motion compared to the whole clip and can be assumed to be essentially a still frame, the logic proceeds to step 452, which adjusts Quant to be 5 lower, and thus finer, than the target quantization TQuant. The logic then proceeds to step 468, described above.
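A sketch of this low-energy branch (steps 440-452). Note that, read literally, the flow above would never reach step 450 with an AMP below 0.1 times the GAMP, because step 446 already diverts any AMP below 0.5 times the GAMP; the sketch therefore tests the still-frame case first so that it is reachable.

```python
def low_energy_params(seg, gamp, tbitrate, tquant, bitcredit):
    """Steps 440-452: AEP below 0.5x GAEP; halve the bitrate and bank
    the savings as bit credit."""
    bitrate = 0.5 * tbitrate                  # step 440
    bitcredit += 0.5 * tbitrate
    if seg.amp > 1.5 * gamp:                  # step 442 -> 444
        quant = clamp_quant(tquant + 2)
    elif seg.amp < 0.1 * gamp:                # step 450 -> 452: essentially still
        quant = clamp_quant(tquant - 5)
    elif seg.amp < 0.5 * gamp:                # step 446 -> 448
        quant = clamp_quant(tquant - 2)
    else:
        quant = tquant
    return bitrate, quant, bitcredit
```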
If step 438 determines that the error parameter for the current segment is not lower than 0.5 times the GAEP, the logic proceeds to step 454, FIG. 4F.
In step 454, the AEP is compared to 0.75 times the GAEP. This step, in conjunction with step 438, determines whether the AEP lies between 0.5 and 0.75 times the GAEP, or whether the AEP is greater than 0.75 times the GAEP. If the AEP is greater than 0.75 times the GAEP, the logic proceeds to step 456, which sets the quantization and bitrate encoding parameters, Quant and Bitrate, to the target values set by the user. The logic then proceeds to step 468, described above. If the AEP is below 0.75 times the GAEP (which, in conjunction with step 438, means that the AEP is between 0.5 and 0.75 times the GAEP), the logic proceeds to step 458.
In step 458, Bitrate is set to 0.75 times the target bitrate, reflecting that this segment has relatively low error, or energy, compared to the average energy of the clip. This conservation of bitrate is also reflected in step 458 adjusting the Bitcredit upward by 0.25 times the target bitrate. The logic then proceeds to step 460.
In step 460, the AMP is compared to 1.5 times the GAMP. If the AMP is greater, indicating that the current segment has relatively high motion compared to the whole clip, the logic proceeds to step 462 which adjusts Quant to be 2 higher, and thus coarser, than the target quantization TQuant. If the AMP is lower, the logic proceeds to step 464.

In step 464, the AMP is compared to 0.5 times the GAMP. If the AMP is lower, indicating that the current segment has relatively low motion compared to the whole clip, the logic proceeds to step 466, which adjusts Quant to be 1 lower, and thus finer, than the target quantization TQuant. The logic then proceeds to step 468, described above.
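The remaining mid-range branch (steps 454-466) and the top-level dispatch of FIGS. 4B-4F can be sketched as follows, completing the hypothetical select_params used in the loop sketch above. Whether the comparisons are strict or inclusive is an assumption, as the text does not specify boundary behavior.

```python
def midrange_params(seg, gaep, gamp, tbitrate, tquant, bitcredit):
    """Steps 454-466: AEP between 0.5x and 0.75x GAEP earns a 25% bit
    saving; above 0.75x GAEP the targets are used unchanged."""
    if seg.aep > 0.75 * gaep:                 # step 454 -> 456
        return tbitrate, tquant, bitcredit
    bitrate = 0.75 * tbitrate                 # step 458
    bitcredit += 0.25 * tbitrate
    if seg.amp > 1.5 * gamp:                  # step 460 -> 462
        quant = clamp_quant(tquant + 2)
    elif seg.amp < 0.5 * gamp:                # step 464 -> 466
        quant = clamp_quant(tquant - 1)
    else:
        quant = tquant
    return bitrate, quant, bitcredit

def select_params(seg, gaep, gamp, tbitrate, tquant, bitcredit):
    """Dispatch in the order of FIGS. 4B-4F."""
    if seg.aep > 2 * gaep or (seg.aep > 1.5 * gaep and seg.scene_change):
        return tier1_params(seg, gamp, tbitrate, tquant, bitcredit)      # steps 410-420
    if seg.aep > 1.5 * gaep or (seg.aep > gaep and seg.scene_change):
        return tier2_params(seg, gamp, tbitrate, tquant, bitcredit)      # steps 422-432
    if seg.scene_change:                                                 # steps 434-436
        return tbitrate, tquant, bitcredit
    if seg.aep < 0.5 * gaep:                                             # steps 438-452
        return low_energy_params(seg, gamp, tbitrate, tquant, bitcredit)
    return midrange_params(seg, gaep, gamp, tbitrate, tquant, bitcredit) # steps 454-466
```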
It should be noted that this invention may be embodied in software and/or firmware stored on a computer-usable medium, such as a computer disk or memory chip. The invention may also take the form of a computer data signal embodied in a carrier wave, such as when the invention is embodied in software/firmware which is electrically transmitted, for example, over the Internet.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

What is claimed is:
1. A method of non-causal encoding of video information, the video information including a plurality of video segments, the method comprising:
analyzing the video information to obtain global characterization information about the video information and segment characterization information about the plurality of video segments; and
encoding the video segments based on the global characterization information and the segment characterization information of the video segment being encoded.
2. The method of claim 1 further including dividing the video information into the plurality (N+1) of video segments including a first video segment, a last video segment (N+1) and remaining (N-1) segments between the first and last video segments.
3. The method of claim 2 wherein the step of analyzing includes calculating an average motion parameter (AMP) and an average error parameter (AEP) for each of the video segments except the last video segment, determining if a scene change has occurred for each of the plurality of video segments and storing the AMP, AEP and scene change information as the segment characterization information for the plurality of video segments.
4. The method of claim 3 wherein the step of analyzing further includes determining the global characterization information about the video information by calculating a global average error parameter (GAEP) and a global average motion parameter (GAMP) over the remaining (N-1) video segments.
5. The method of claim 4 wherein the step of encoding includes determining encoding parameters for the video segment of the remaining video segments being encoded based on the global characterization information and the segment characterization information of the video segment being encoded.
6. The method of claim 5 wherein the step of determining includes determining a first encoding parameter, a bitrate parameter, by comparing the AEP of the segment to be encoded with the GAEP and by considering the scene change information associated with the video segment to be encoded.
7. The method of claim 6 wherein the step of determining further includes determining a second encoding parameter, a quantization parameter, by comparing the AMP of the segment to be encoded with the GAMP.
8. The method of claim 7 wherein the step of determining the bitrate parameter includes accumulating unused bitrate when the bitrate parameter for a video segment being encoded is set below a target bitrate due to low AEP for that video segment relative to the GAEP and utilizing the unused bitrate for video segments having an AEP exceeding the GAEP by a first predetermined amount or by a second, lesser predetermined amount coupled with a scene change by setting the bitrate parameter at a value which exceeds the target bitrate.
9. A system for non-causal encoding of video information, the video information including a plurality of video segments, the system comprising:
a video characterizer for analyzing the video information to obtain global characterization information about the video information and segment characterization information about the plurality of video segments;
a controller, responsive to the video characterizer, for determining encoding parameters based on the global and segment characterization information; and
an encoder, responsive to the controller, for encoding the video segments using the determined encoding parameters.
PCT/US1999/002411 1998-02-19 1999-02-04 System and method for non-causal encoding of video information for improved streaming thereof WO1999043157A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU25824/99A AU2582499A (en) 1998-02-19 1999-02-04 System and method for non-causal encoding of video information for improved streaming thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2628098A 1998-02-19 1998-02-19
US09/026,280 1998-02-19

Publications (1)

Publication Number Publication Date
WO1999043157A1 true WO1999043157A1 (en) 1999-08-26

Family

ID=21830912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/002411 WO1999043157A1 (en) 1998-02-19 1999-02-04 System and method for non-causal encoding of video information for improved streaming thereof

Country Status (2)

Country Link
AU (1) AU2582499A (en)
WO (1) WO1999043157A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5225904A (en) * 1987-10-05 1993-07-06 Intel Corporation Adaptive digital video compression system
US5227878A (en) * 1991-11-15 1993-07-13 At&T Bell Laboratories Adaptive coding and decoding of frames and fields of video
US5708473A (en) * 1994-08-30 1998-01-13 Hughes Aircraft Company Two stage video film compression method and system
US5774593A (en) * 1995-07-24 1998-06-30 University Of Washington Automatic scene decomposition and optimization of MPEG compressed video
US5832125A (en) * 1995-12-07 1998-11-03 Intel Corporation Bit rate control using short-term and long-term performance characterization

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006099082A2 (en) * 2005-03-10 2006-09-21 Qualcomm Incorporated Content adaptive multimedia processing
WO2006099082A3 (en) * 2005-03-10 2007-09-20 Qualcomm Inc Content adaptive multimedia processing
US9197912B2 (en) 2005-03-10 2015-11-24 Qualcomm Incorporated Content classification for multimedia processing
US8879635B2 (en) 2005-09-27 2014-11-04 Qualcomm Incorporated Methods and device for data alignment with time domain boundary
US8879857B2 (en) 2005-09-27 2014-11-04 Qualcomm Incorporated Redundant data encoding methods and device
US8879856B2 (en) 2005-09-27 2014-11-04 Qualcomm Incorporated Content driven transcoder that orchestrates multimedia transcoding using content information
US9071822B2 (en) 2005-09-27 2015-06-30 Qualcomm Incorporated Methods and device for data alignment with time domain boundary
US9088776B2 (en) 2005-09-27 2015-07-21 Qualcomm Incorporated Scalability techniques based on content information
US9113147B2 (en) 2005-09-27 2015-08-18 Qualcomm Incorporated Scalability techniques based on content information
US8948260B2 (en) 2005-10-17 2015-02-03 Qualcomm Incorporated Adaptive GOP structure in video streaming
US9131164B2 (en) 2006-04-04 2015-09-08 Qualcomm Incorporated Preprocessor method and apparatus
CN114168792A (en) * 2021-12-06 2022-03-11 北京达佳互联信息技术有限公司 Video recommendation method and device

Also Published As

Publication number Publication date
AU2582499A (en) 1999-09-06

Similar Documents

Publication Publication Date Title
US20200236357A1 (en) Independently coding frame areas
US6542546B1 (en) Adaptable compressed bitstream transcoder
US6490320B1 (en) Adaptable bitstream video delivery system
US6574279B1 (en) Video transcoding using syntactic and semantic clues
US6493386B1 (en) Object based bitstream transcoder
US8891370B2 (en) Method and apparatus for transmitting a coded video signal
US8218617B2 (en) Method and system for optimal video transcoding based on utility function descriptors
US5677969A (en) Method, rate controller, and system for preventing overflow and underflow of a decoder buffer in a video compression system
CN1726709B (en) Method and device for encoding image of uncompressed digital video frequency sequence
JP3519673B2 (en) Video data creation device and video encoding device
JPH07312756A (en) Circuit, device and method for conversion of information quantity of compressed animation image code signal
JP2004166128A (en) Method, device and program for coding image information
WO1999043157A1 (en) System and method for non-causal encoding of video information for improved streaming thereof
KR101063094B1 (en) Methods for Compressing Data
CN112004082B (en) Optimization method for code rate control by using double frames as control unit
CN112004083B (en) Method and system for optimizing code rate control by utilizing inter-frame prediction characteristics
AU678927C (en) Method, rate controller, and system for preventing overflow and underflow of a decoder buffer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase