WO2000018134A1 - Frame skipping without having to perform motion estimation - Google Patents
Frame skipping without having to perform motion estimation
- Publication number
- WO2000018134A1 (application PCT/US1999/021830)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- motion
- current image
- distortion measure
- distortion
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/65—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience
- H04N19/69—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience involving reversible variable length codes [RVLC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/15—Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/152—Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/162—User input
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/192—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/198—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/587—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/89—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
Definitions
- the present invention relates to image processing, and, in particular, to video compression.
- motion-compensated inter-frame differencing in which blocks of image data are encoded based on the pixel-to-pixel differences between each block in an image currently being encoded and a selected block in a reference image.
- the process of selecting a block in the reference image for a particular block in the current image is called motion estimation.
- the goal of motion estimation is to find a block in the reference image that closely matches the block in the current image such that the magnitudes of the pixel-to-pixel differences between those two blocks are small, thereby enabling the block in the current image to be encoded in the resulting compressed bitstream using a relatively small number of bits.
- a block in the current image is compared with different blocks of the same size and shape within a defined search region in the reference image.
- the search region is typically defined based on the corresponding location of the block in the current image with allowance for inter-frame motion by a specified number of pixels (e.g., 8) in each direction.
- Each comparison involves the computation of a mathematical distortion measure that quantifies the differences between the two blocks of image data.
- One typical distortion measure is the sum of absolute differences (SAD) which corresponds to the sum of the absolute values of the corresponding pixel-to-pixel differences between the two blocks, although other distortion measures may also be used.
- This selected block of reference image data is referred to as the "best integer-pixel location," because the distance between that block and the corresponding location of the block of current image data may be represented by a motion vector having X (horizontal) and Y (vertical) components that are both integers representing displacements in integer numbers of pixels.
- the process of selecting the best integer-pixel location is referred to as full-pixel or integer-pixel motion estimation.
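The integer-pixel search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `sad` and `full_search`, the list-of-rows image layout, and the exhaustive scan of the search region are assumptions made for the example.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (cx, cy) and the block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range=8):
    """Return (best_sad, (mvx, mvy)) over all integer displacements within
    +/- search_range pixels that keep the candidate block inside `ref`."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference image
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best
```

The nested loops make the cost of motion estimation visible: with a search range of 8, each block is compared against up to 289 candidate blocks, which is the work the invention tries to avoid for frames that end up being skipped.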
- half-pixel motion estimation may be performed.
- the block of current image data is compared to reference image data corresponding to different half-pixel locations surrounding the best integer-pixel location, where the comparison for each half-pixel location is based on interpolated reference image data.
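A half-pixel refinement around the best integer-pixel location might look like the sketch below. The bilinear interpolation filter, the function names, and the convention of expressing the result in half-pixel units are all assumptions for illustration; the source only states that interpolated reference data is used.

```python
def ref_at_half_pel(ref, x2, y2):
    """Reference sample at half-pixel coordinates (x2 / 2, y2 / 2).
    Bilinear interpolation is an assumed filter, not taken from the source."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0


def half_pel_refine(cur, ref, cx, cy, size, best_mv):
    """Test the nine half-pixel positions around the integer vector best_mv.
    Returns (best_sad, (mvx2, mvy2)) with the vector in half-pixel units."""
    height, width = len(ref), len(ref[0])
    best = None
    for oy in (-1, 0, 1):
        for ox in (-1, 0, 1):
            mvx2, mvy2 = 2 * best_mv[0] + ox, 2 * best_mv[1] + oy
            x2, y2 = 2 * cx + mvx2, 2 * cy + mvy2  # block origin, half-pel units
            if x2 < 0 or y2 < 0:
                continue
            if (x2 + 2 * size - 1) // 2 > width - 1:
                continue  # interpolation would read past the right edge
            if (y2 + 2 * size - 1) // 2 > height - 1:
                continue  # interpolation would read past the bottom edge
            cost = 0.0
            for dy in range(size):
                for dx in range(size):
                    pred = ref_at_half_pel(ref, x2 + 2 * dx, y2 + 2 * dy)
                    cost += abs(cur[cy + dy][cx + dx] - pred)
            if best is None or cost < best[0]:
                best = (cost, (mvx2, mvy2))
    return best
```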
- the primary goal in video compression processing is to reduce the number of bits used to represent sequences of video images while still maintaining an acceptable level of image quality during playback of the resulting compressed video bitstream.
- Another goal in many video compression applications is to maintain a relatively uniform bit rate, for example, to satisfy transmission bandwidth and/or playback processing constraints.
- Video compression processing often involves the tradeoff between bit rate and playback quality. This tradeoff typically involves reducing the average number of bits per image in the original video sequence by selectively decreasing the playback quality in each image that is encoded into the compressed video bitstream. Alternatively or in addition, the tradeoff can involve skipping certain images in the original video sequence, thereby encoding only a subset of those original images into the resulting compressed video bitstream.
- a video encoder may be able to skip additional images adaptively as needed to satisfy bit rate requirements.
- the decision to skip an additional image is typically based on a distortion measure (e.g., SAD) of the motion-compensated interframe differences and only after motion estimation has been performed for the particular image.
- SAD a distortion measure
- the motion-compensated interframe differences derived from the motion estimation processing are then used to further encode the image data (e.g., depending on the exact video compression algorithm, using such techniques as discrete cosine transform (DCT) processing, quantization, run-length encoding, and variable-length encoding).
- DCT discrete cosine transform
- the motion-compensated interframe differences are no longer needed, and processing continues to the next image in the video sequence.
- the present invention is directed to a technique for generating an estimate of a motion-compensated distortion measure for a particular image in a video sequence without actually having to perform motion estimation for that image.
- the estimated distortion measure can be used during video encoding to determine whether to skip the image without first having to perform motion estimation.
- motion estimation processing is avoided and the computational load of the video compression processing is accordingly reduced.
- motion estimation processing can then be implemented, as needed, to generate motion-compensated interframe differences for subsequent compression processing. Under such a video compression scheme, motion estimation processing is implemented only when the resulting interframe differences will be needed to encode the corresponding image.
- the present invention is a method for processing a sequence of video images, comprising the steps of (a) generating a raw distortion measure for a current image in the sequence relative to a reference image; (b) using the raw distortion measure to generate an estimate of a motion-compensated distortion measure for the current image relative to the reference image without having to perform motion estimation on the current image; (c) determining whether or how to encode the current image based on the estimate of the motion-compensated distortion measure; and (d) generating a compressed video bitstream for the sequence of video images based on the determination of step (c).
- Fig. 1 shows pseudocode for an algorithm for generating a raw (i.e., non-motion-compensated) distortion measure for an image, according to one embodiment of the present invention
- Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention
- Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention.
- the particular raw distortion measure generated using the algorithm of Fig. 1 is a mean absolute difference MAD.
- the algorithm in Fig. 1 can be interpreted as applying to gray-scale images in which each pixel is represented by a single multi-bit intensity value. It will be understood that the algorithm can be easily extended to color images in which each pixel is represented by two or more different multi-bit components (e.g., red, green, and blue components in an RGB format or an intensity (Y) and two color (U and V) components in a YUV format).
- the algorithm of Fig. 1 distinguishes two different types of pixels in the current image: Type I being those pixels having an intensity value sufficiently similar to the corresponding pixel value in the reference image and Type II being those pixels having a pixel value sufficiently different from that of the corresponding pixel in the reference image.
- the "corresponding" pixel is the pixel in the reference image having the same location (i.e., same row and column) as a pixel in the current image.
- the pixels in that portion of the current image will typically be characterized as being of Type I.
- relatively spatially uniform portions i.e., portions in which the pixels have roughly the same value
- those pixels will also typically be characterized as being of Type I.
- the absolute differences between the pixels in the current image and the corresponding pixels in the reference image will be relatively large and most of those current-image pixels will typically be characterized as being of Type II.
- n1 and n2 are counters for these two different types of pixels, respectively, and the variables dist1 and dist2 are intermediate distortion measures for these two different types of pixels, respectively.
- these four variables are initialized to zero at Lines 1-2 in Fig. 1.
- the absolute difference ad between the current pixel value and the corresponding pixel value in the reference frame is generated (Line 4). If ad is less than a specified threshold value thresh, then the current pixel is determined to be of Type I, and dist1 and n1 are incremented by ad and 1, respectively (Line 5).
- the current pixel is determined to be of Type II, and dist2 and n2 are incremented by ad and 1, respectively (Line 6).
- a typical threshold value for the parameter thresh is about 20.
- the intermediate distortion measures dist1 and dist2 are then normalized in Lines 8 and 9, respectively.
- relative movement of the person's head from frame to frame e.g., a side-to-side motion
- will result in some portions of the wall being newly covered by pixels corresponding to the head and other portions of the wall that were previously occluded by the head being newly exposed.
- the raw distortion measure MAD is a mean absolute difference that is corrected for double-image effects.
- a typical value for the parameter factor is 0.5.
- the term dist1*n2*(1-factor) corrects for double-image effects by treating pixels removed from Type II as Type I pixels so that the average distortion level in similar areas is added back.
- the distortion distl of Type I pixels is considered as an estimate for the residual and coding noise. It is assumed that this cannot be removed by motion compensation.
- the Type II pixels occupy roughly twice the area as compared to the "perfectly" motion-compensated images, and the term factor reflects this, and is nominally chosen as 0.5.
- the term factor is allowed to vary, since motion compensation is typically not perfect.
- the unoccluded region can be motion compensated; however, the fraction of pixels (n2*(1-factor)) is expected to have a residual plus coding noise similar to Type I pixels. Hence, the term dist1*n2*(1-factor) is used as an estimate for distortion of these unoccluded Type II pixels.
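The steps described for Fig. 1 can be sketched as below. The per-pixel classification, the normalization, and the defaults thresh = 20 and factor = 0.5 follow the text; the final line that combines the terms into MAD is not reproduced in this text, so the weighted average used here is an assumption built from the terms the description names (dist1*n1, a factor-weighted Type II term, and dist1*n2*(1-factor)). The guards against empty pixel classes are also additions.

```python
def raw_mad(cur, ref, thresh=20, factor=0.5):
    """Raw (non-motion-compensated) distortion for a gray-scale frame,
    corrected for double-image effects. `cur` and `ref` are lists of rows."""
    n1 = n2 = 0          # counts of Type I and Type II pixels
    dist1 = dist2 = 0.0  # accumulated absolute differences per type
    for cur_row, ref_row in zip(cur, ref):
        for c, r in zip(cur_row, ref_row):
            ad = abs(c - r)
            if ad < thresh:      # Type I: similar to the co-located pixel
                dist1 += ad
                n1 += 1
            else:                # Type II: sufficiently different
                dist2 += ad
                n2 += 1
    # Normalize to per-pixel averages (guards are not in the source).
    if n1:
        dist1 /= n1
    if n2:
        dist2 /= n2
    total = n1 + n2
    if total == 0:
        return 0.0
    # Assumed combination: only `factor` of the Type II pixels keep their
    # large distortion; the rest are charged the Type I noise level.
    return (dist1 * n1 + dist2 * n2 * factor
            + dist1 * n2 * (1.0 - factor)) / total
```

With factor = 1 the expression reduces to an ordinary mean absolute difference, which is one way to check that the correction term only redistributes the Type II contribution.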
- Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention.
- the particular distortion measure estimated using the algorithm of Fig. 2 is the motion-compensated mean absolute difference S.
- the algorithm of Fig. 2 derives an estimate Se for the distortion measure S from the raw distortion measure MAD derived using the algorithm of Fig. 1. This estimated distortion measure Se can be used to determine whether to skip images during video encoding without having to perform motion estimation processing for each image.
- the raw distortion measure MAD(I) for the current frame and the raw distortion measure MAD(I-1) for the previous frame are used to determine a measure H of the percentage change in MAD from the previous frame to the current frame (Line 1 of Fig. 2).
- Other suitable expressions characterizing the change in the raw distortion measure MAD from the previous frame to the current frame could also conceivably be used.
- the estimated distortion measure Se(I) for the current frame is assumed to be the same as the actual motion-compensated distortion measure S(I-1) for the previous frame (Line 3). Otherwise, if the percentage change H is less than a second threshold value T2 (Line 4) (where T2 is greater than T1), then the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 5, where the factor k is a parameter preferably specified between 0 and 1. Otherwise, the percentage change H is greater than the second threshold value T2 (Line 6) and the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 7. Typical values for T1 and T2 are 0.1 and 0.5, respectively.
- the raw distortion measure MAD(I) is a measure of the non-motion-compensated pixel differences between the current frame and its reference frame.
- the raw distortion measure MAD(I-1) is a measure of the raw pixel differences between the previous frame and its reference frame, which may be the same as or different from the reference frame for the current frame.
- the percentage change H is a measure of the relative change between the two raw distortion measures MAD(I) and MAD(I-1), which are themselves measures of rates of change between those images and their corresponding reference images. Motion compensation does a fairly good job predicting image data when there is little or no change in distortion from frame to frame.
- the actual motion-compensated distortion measure S(I-1) for the previous frame will be a good estimate Se(I) of the motion-compensated distortion measure S(I) for the current frame, as in Line 3 of Fig. 2.
- the distortion from frame to frame is changing (e.g., during scene changes or other non-uniform changes in imagery)
- motion compensation will not do as good a job predicting the image data.
- the actual motion-compensated distortion measure S(I-1) for the previous frame will not necessarily be a good indication of the actual motion-compensated distortion measure S(I) for the current frame.
- the percentage change H is large (e.g., H>T2), it may be safer to estimate the actual motion-compensated distortion measure S(I) for the current frame from the raw distortion measure MAD(I) for the current frame, as in the expression in Line 7 of Fig. 2.
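The three-way decision of Fig. 2 can be sketched as follows. Only the branch structure and the typical values T1 = 0.1 and T2 = 0.5 come from the text. The expressions in Lines 1, 5, and 7 of Fig. 2 are not reproduced here, so the formula for H and the two non-trivial branches below are hypothetical stand-ins, labeled as such in the comments; the default k = 0.5 is likewise only an example within the stated range of 0 to 1.

```python
def estimate_se(mad_cur, mad_prev, s_prev, t1=0.1, t2=0.5, k=0.5):
    """Estimate the motion-compensated distortion Se(I) from the raw
    distortions MAD(I), MAD(I-1) and the previous frame's actual S(I-1)."""
    # Assumed form of the percentage change H (Line 1 is not in the text).
    if mad_prev == 0:
        h = 0.0 if mad_cur == 0 else float("inf")  # guard, not in the source
    else:
        h = abs(mad_cur - mad_prev) / mad_prev

    if h < t1:
        # Little change: reuse the previous frame's actual measure (Line 3).
        return s_prev
    if h < t2:
        # Hypothetical stand-in for Line 5: move S(I-1) part of the way
        # along the relative change in MAD, scaled by k.
        return s_prev * (1.0 + k * (mad_cur - mad_prev) / mad_prev)
    # Hypothetical stand-in for Line 7: fall back on the raw measure.
    return mad_cur
```

Whatever the exact expressions are, the point of the structure is that every input is available without a motion search: MAD(I) comes from a single pass over co-located pixels, and S(I-1) is a by-product of encoding the previous frame.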
- the estimated distortion measure Se generated using the algorithms of Figs. 1 and 2 can be used to determine whether to skip the current image, that is, whether to avoid encoding the current image into the compressed video bitstream during video encoding processing.
- an adaptive frame-skipping scheme enables a video coder to maintain control over the transmitted frame rate and the quality of the reference frames. In cases of high motion, this ensures a graceful degradation in frame quality and the frame rate.
- the coder can be in one of two states: steady state or transient state.
- in the steady state, all attempts are made to meet a specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate.
- the coder switches into a transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted.
- transient state typically occurs during scene changes and sudden large motions. It is desirable for the coder to go from the transient state to the steady state in a relatively short period of time.
- images may be designated as the following different types of frames for compression processing:
  o An intra (I) frame, which is encoded using only intra-frame compression techniques;
  o A predicted (P) frame, which is encoded using inter-frame compression techniques based on a previous I or P frame, and which can itself be used as a reference frame to encode one or more other frames;
  o A bi-directional (B) frame, which is encoded using bi-directional inter-frame compression techniques based on a previous I or P frame and a subsequent I or P frame, and which cannot be used to encode another frame; and
  o A PB frame, which corresponds to two images (a P frame and a temporally preceding B frame) that are encoded as a single frame with a single set of overhead data (as in the H.263 video compression algorithm).
- I intra
- P predicted
- B bi-directional
- B frames H.263+, MPEG
- PB frames H.263
- the B and PB frames are used for two purposes in two different situations.
- the system is designed for applications where control over the rate and quality of reference frames is required.
- the parameters that are adjusted include the rate for the frame, the acceptable distortion level in the frame, and the frame-rate. An attempt is made to maintain these parameters by performing an intelligent mode decision as to when to encode B or PB frames and by intelligently skipping frames, when warranted.
- R Number of bits needed to encode the current frame as a P frame.
- the same model can be applied for a B frame except with a quantizer that is typically higher than that of a corresponding P frame;
- H Number of bits needed to encode the overhead (i.e., header and motion information);
- X1, X2 Parameters of quadratic model, which are recursively updated from frame to frame.
- the estimate Se, generated using the algorithms of Figs. 1 and 2 without having to perform motion estimation, is preferably used in Equation (1) for the motion-compensated distortion measure S.
- MAD Raw distortion measure for the current frame where the distortion measure is based on the mean absolute difference.
- H Overhead bits (e.g., for motion vectors) other than bits used to transmit residuals for the current frame. If this information is unavailable, H is assumed to be zero.
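Equation (1) itself is not reproduced in this text. The sketch below assumes a commonly used quadratic rate-quantizer form, R = X1*S/Q + X2*S/Q^2 + H, which matches the symbols defined above (R, S, H, X1, X2, and a quantizer Q) but should be read as an assumption rather than the patent's exact equation. The function name and the example parameter values are hypothetical.

```python
def estimate_bits(s, q, x1, x2, h=0.0):
    """Estimated bits R for a frame under an assumed quadratic model.

    s  : distortion measure (the estimate Se can be supplied here)
    q  : quantizer step size (must be nonzero)
    x1 : first-order model parameter
    x2 : second-order model parameter
    h  : overhead bits; taken as zero when unavailable, as in the text
    """
    return x1 * s / q + x2 * s / (q * q) + h


# Hypothetical numbers, for illustration only.
bits_for_p_frame = estimate_bits(s=4.0, q=2.0, x1=100.0, x2=200.0, h=50.0)
```

Because the model takes S as an input, substituting Se lets the encoder predict the cost of a frame before committing to a motion search for it.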
- CBR constant bit rate
- smin Smallest skip desired for encoding the next frame (e.g., 1 / average target frame rate).
- smax Largest skip allowed between frames at steady state.
- skip Pointer corresponding to the number of frames to skip from the previously encoded frame.
- Bframeskip Pointer corresponding to frame stored as a potential B frame.
- B Buffer occupancy at frame skip before encoding frame skip.
- B Bp-(Rp*skip), where Bp is the buffer occupancy after encoding the previous frame.
- PC2 Similar to PCFD2, except that the determination is made after motion estimation, e.g., by comparing the average motion vector magnitude to a specified threshold level.
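The buffer bookkeeping defined above, B = Bp - (Rp*skip), can be sketched as follows. This is a minimal illustration under stated assumptions: Rp is taken to be the number of bits drained from the buffer per frame interval, and `buffer_size` and `fits_in_buffer` are hypothetical names introduced only for this example.

```python
def buffer_before_encoding(Bp: float, Rp: float, skip: int) -> float:
    """Buffer occupancy B at frame `skip`, before that frame is encoded.

    Bp   : occupancy after encoding the previous frame
    Rp   : bits removed from the buffer per frame interval (assumed meaning)
    skip : number of frames skipped since the previously encoded frame
    """
    return Bp - Rp * skip


def fits_in_buffer(B: float, R: float, buffer_size: float) -> bool:
    """Hypothetical check: would adding R bits keep the buffer within its size?"""
    return B + R <= buffer_size
```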
- Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention.
- the algorithm contains seven routines: START, LOOP1-LOOP5, and TRANSIENT.
- START is called during steady-state processing after coding a reference frame.
- TRANSIENT is called during transient processing.
- all attempts are made to meet the preset specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate.
- the coder switches into the transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted.
- the transient state typically occurs at the start of the transmission, during scene changes, and during sudden large motions.
- the processing of the START routine begins at Line A1 in Fig. 3A with the initialization of the current frame pointer skip to the minimum skip value smin.
- the smallest frame skip value may be 2, corresponding to a coding scheme in which an attempt is made to encode every other image in the original video sequence.
- the raw distortion measure MAD is computed for the current frame skip using the algorithm of Fig. 1.
- Equation (1) is then evaluated using Se to estimate R, the number of bits needed to encode the current frame as a P frame. If encoding the current frame as a P frame does not make the buffer too full, then the flag PCFD1 is set to 1 (i.e., true). Otherwise, PCFD1 is set to 0 (i.e., false).
- PCFD1 is true (Line A2), indicating that the current frame can be transmitted as a P frame
- motion estimation is performed for the current frame
- the actual motion-compensated distortion measure S is calculated
- the number of bits R is reevaluated using S in Equation (1) instead of Se
- the values for flags PC1 and PC2 are determined (Line A3).
- the flag PC1 indicates the impact to the buffer from encoding the current frame skip as a P frame based on the motion-compensated distortion measure S.
- PC1 is set to 1 if frame skip can be encoded as a P frame.
- the flag PC2 indicates whether the motion estimation results indicate that motion (e.g., average motion vector magnitude for the frame) is larger than a specified threshold. If so, then PC2 is set to 1.
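The first steps of the START routine described above can be sketched as follows. This is not the pseudocode of Fig. 3A: every callable passed in (`raw_mad`, `estimate_se`, `estimate_bits`, `fits_in_buffer`, `motion_estimate`) is a hypothetical stand-in, and the flags are written here as `pcfd1`, `pc1` and `pc2`. The sketch only shows that motion estimation is run after the cheap MAD-based check has passed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class StartFlags:
    skip: int
    pcfd1: bool                 # room in buffer, judged from the estimate Se
    pc1: Optional[bool] = None  # room in buffer, judged from the actual S
    pc2: Optional[bool] = None  # motion larger than the threshold


def start_step(
    smin: int,
    raw_mad: Callable[[int], float],
    estimate_se: Callable[[float], float],
    estimate_bits: Callable[[float], float],
    fits_in_buffer: Callable[[int, float], bool],
    motion_estimate: Callable[[int], Tuple[float, float]],
    motion_threshold: float,
) -> StartFlags:
    skip = smin                           # start at the minimum skip
    se = estimate_se(raw_mad(skip))       # estimate S without motion estimation
    r = estimate_bits(se)                 # bits to encode as a P frame
    flags = StartFlags(skip=skip, pcfd1=fits_in_buffer(skip, r))
    if flags.pcfd1:
        # Only now pay for motion estimation and re-evaluate with the real S.
        s, avg_motion = motion_estimate(skip)
        r = estimate_bits(s)
        flags.pc1 = fits_in_buffer(skip, r)
        flags.pc2 = avg_motion > motion_threshold
    return flags
```

When `pcfd1` is false, `pc1` and `pc2` stay unset, reflecting that motion estimation was never performed for that frame.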
- the LOOP1 routine starts by storing the current frame smin as a possible B frame
- the next frame is selected by setting skip equal to 2*smin (Line B8).
- the LOOP2 routine is called when there is not enough room in the buffer to transmit the current frame skip=smin as a P frame. Under those circumstances, frame smin will not be encoded, and the LOOP2 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
- the parameter skip is set to smin+1 to point to the next frame in the video sequence (Line C1 in Fig. 3B), and the frames from smin+1 to smin+floor(smin/2), where "floor" is a truncation operation, are then sequentially analyzed (Lines C2, C14, C15) to see if any of them can be encoded (Lines C3-C13).
- the number of bits to encode is calculated based on the raw distortion measure MAD, and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line C3).
- the flag PCFD2 is set without actually performing motion estimation, by comparing the raw distortion measure MAD to a specified threshold level. If MAD is greater than the threshold level, then motion is assumed to be large and PCFD2 is set to 1.
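The MAD-based motion test described above needs no motion estimation, which a short sketch makes concrete. Frames are represented here as flat sequences of pixel intensities, which is an assumption made for brevity; the algorithm of Fig. 1 is not reproduced in this excerpt and may differ in detail. The function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_difference(current: Sequence[float], reference: Sequence[float]) -> float:
    """Raw distortion: mean absolute pixel difference between two frames."""
    if len(current) != len(reference) or len(current) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(c - r) for c, r in zip(current, reference))
    return total / len(current)


def motion_is_large(mad: float, threshold: float) -> bool:
    """Motion flag set from MAD alone: true when MAD exceeds the threshold."""
    return mad > threshold
```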
- the LOOP3 routine is called when the processing in the LOOP2 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP3 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
- the parameter skip is set to smin+floor(smin/2)+1 (Line D1 in Fig. 3B), and the frames from there up to 2*smin-1 are then sequentially analyzed (Lines D2, D5, D6) to see if any of them can be encoded (Lines D3-D4).
- Initializing the parameter skip to smin+floor(smin/2)+1 allows the P and the B frames to be closer together for the given B skip, which improves coding efficiency in an H.263 PB frame when the P and B frames are tightly coupled. With true B frames, this strategy may need to be changed.
- the number of bits R to encode is calculated based on the estimated distortion measure Se generated from the raw distortion measure MAD, and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line D3). If both those conditions are met, then the current frame skip is encoded as a P frame, and processing returns to the START routine (Line D4).
- the LOOP4 routine is called when the processing in the LOOP3 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP4 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
- the parameter skip is set to 2*smin+1 (Line E1 in Fig. 3C), and the frames from there up to smax-1 are then sequentially analyzed (Lines E2, E6, E7) to see if any of them can be encoded (Lines E3-E5).
- the number of bits R to encode is calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line E3).
- the LOOP5 routine is called when the processing in the LOOP4 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP5 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
- the parameter skip is set to smax+1 (Line F1 in Fig. 3C), and the frames from there up to smin + smax are then sequentially analyzed (Lines F2, F5, F6) to see if any of them can be encoded (Lines F3-F4).
- the number of bits R to encode is calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line F3).
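The skip ranges scanned by LOOP2 through LOOP5, as described above, can be tabulated with a short helper. This sketch only reproduces the range bounds given in the text; the function name `candidate_skips` is hypothetical, and the per-frame decisions made inside each loop are not modeled.

```python
from typing import Dict


def candidate_skips(smin: int, smax: int) -> Dict[str, range]:
    """Return the skip values examined by each loop, per the stated bounds."""
    half = smin // 2  # floor(smin/2); truncation for positive integers
    return {
        "LOOP2": range(smin + 1, smin + half + 1),   # smin+1 .. smin+floor(smin/2)
        "LOOP3": range(smin + half + 1, 2 * smin),   # .. 2*smin-1
        "LOOP4": range(2 * smin + 1, smax),          # 2*smin+1 .. smax-1
        "LOOP5": range(smax + 1, smin + smax + 1),   # smax+1 .. smin+smax
    }
```

With smin = 4 and smax = 12, the loops examine skips 5-6, 7, 9-11 and 13-16 respectively. The values smin, 2*smin and smax fall outside these ranges; the text associates smin with START and 2*smin with LOOP1.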
- the TRANSIENT routine is called when the processing in the LOOP5 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, processing switches from the steady state into the transient state, where the TRANSIENT routine selects one or more frames for encoding as P frames until the TRANSIENT routine determines that processing can return to the steady state. In alternative embodiments, the TRANSIENT routine may encode at least some of the frames as B frames.
- the algorithm presented in Figs. 3A-3C provides a complete approach to frame skipping, PB decision, and quality control when the quantizer step variation is constrained to be within certain bounds from one reference frame to the next.
- the scheme maintains the user-defined minimum frame rate during steady-state operations and attempts to transmit data at a high quality and at an "acceptable" frame rate (greater than the minimum frame rate). It provides a graceful degradation in quality and frame rate when there is an increase in motion or complexity. B frames are used both for improving the frame rate and the coded quality. However, in situations of scene change or when the motion increases very rapidly, the demands of frame rate or reference frame quality may be unable to be met. In this situation, processing goes into a transient state to "catch up" and slowly re-enter a new steady state.
- the scheme requires minimal additional computational complexity and no additional storage (beyond that required to store the incoming frames).
- the present invention can be embodied in the form of methods and apparatuses for practicing those methods.
- the present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- the present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
- When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Algebra (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
According to this invention, a raw distortion measure (e.g., the mean absolute difference, or MAD) relative to a reference frame is generated for the current image in a video stream. The raw distortion measure is then used to generate an estimate (e.g., Se) of a motion-compensated distortion measure (e.g., S) for the current image relative to the reference image, without having to perform motion estimation for the current image. The estimate of the motion-compensated distortion measure is then used to determine when or how to encode the current image, and a compressed bitstream is generated for the sequence of video images based in part on that determination. The present invention allows a video coder to determine when to skip frames without first having to perform motion estimation, which is computationally expensive, thereby avoiding wasted computation on skipped frames.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10093998P | 1998-09-18 | 1998-09-18 | |
US60/100,939 | 1998-09-18 | ||
US25594699A | 1999-02-23 | 1999-02-23 | |
US09/255,946 | 1999-02-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000018134A1 true WO2000018134A1 (fr) | 2000-03-30 |
Family
ID=26797724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/021830 WO2000018134A1 (fr) | 1998-09-18 | 1999-09-20 | Realisation du saut de trames sans recourir a l'estimation du mouvement |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP3641172B2 (fr) |
KR (1) | KR100323683B1 (fr) |
WO (1) | WO2000018134A1 (fr) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1202579A1 (fr) * | 2000-10-31 | 2002-05-02 | Interuniversitair Microelektronica Centrum Vzw | Procédé et appareil pour le codage adaptif de séquences de trames de données |
EP1204279A3 (fr) * | 2000-10-31 | 2002-05-15 | Interuniversitair Microelektronica Centrum Vzw | Procédé et appareil pour le codage adaptif de séquences de données encadrées |
WO2002085038A1 (fr) * | 2001-04-16 | 2002-10-24 | Mitsubishi Denki Kabushiki Kaisha | Procede et systeme destines a determiner la distorsion dans un signal video |
WO2006094000A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet inflechi par une mesure de la qualite pour visiophonie |
WO2006094033A1 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Techniques adaptatives de saut de trames destinees a un codage video a vitesse commandee |
WO2006094001A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet avec omission de l'arriere-plan pour visiophonie |
WO2006093999A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet en visiophonie par attribution de bits dans le domaine rho |
US7616690B2 (en) | 2000-10-31 | 2009-11-10 | Imec | Method and apparatus for adaptive encoding framed data sequences |
US7733566B2 (en) | 2006-06-21 | 2010-06-08 | Hoya Corporation | Supporting mechanism |
DE102015121148A1 (de) * | 2015-12-04 | 2017-06-08 | Technische Universität München | Reduzieren der Übertragungszeit von Bildern |
CN110832858A (zh) * | 2017-07-03 | 2020-02-21 | Vid拓展公司 | 基于双向光流的运动补偿预测 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100739133B1 (ko) * | 2001-04-17 | 2007-07-13 | 엘지전자 주식회사 | 디지털 비디오 코딩시 b프레임 코딩 방법 |
KR20120072202A (ko) | 2010-12-23 | 2012-07-03 | 한국전자통신연구원 | 움직임 추정 장치 및 방법 |
CN102271269B (zh) * | 2011-08-15 | 2014-01-08 | 清华大学 | 一种双目立体视频帧率转换方法及装置 |
KR102308373B1 (ko) | 2021-06-08 | 2021-10-06 | 주식회사 스누아이랩 | 얼굴인식을 위한 비디오 디블러링장치 및 그 장치의 구동방법 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0514865A2 (fr) * | 1991-05-24 | 1992-11-25 | Mitsubishi Denki Kabushiki Kaisha | Système pour le codage d'images |
EP0772362A2 (fr) * | 1995-10-30 | 1997-05-07 | Sony United Kingdom Limited | Compression de données vidéo |
-
1999
- 1999-09-17 KR KR1019990040209A patent/KR100323683B1/ko not_active IP Right Cessation
- 1999-09-20 JP JP26623299A patent/JP3641172B2/ja not_active Expired - Fee Related
- 1999-09-20 WO PCT/US1999/021830 patent/WO2000018134A1/fr active Application Filing
Non-Patent Citations (3)
Title |
---|
LEE J ET AL: "RATE-DISTORTION OPTIMIZED FRAME TYPE SELECTION FOR MPEG ENCODING", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,US,IEEE INC. NEW YORK, vol. 7, no. 3, 1 June 1997 (1997-06-01), pages 501-510, XP000690588, ISSN: 1051-8215 * |
LEE J: "A FAST FRAME TYPE SELECTION TECHNIQUE FOR VERY LOW BIT RATE CODING USING MPEG-1", REAL-TIME IMAGING,GB,ACADEMIC PRESS LIMITED, vol. 5, no. 2, 1 April 1999 (1999-04-01), pages 83-94, XP000831435, ISSN: 1077-2014 * |
TIHAO CHIANG ET AL: "A NEW RATE CONTROL SCHEME USING QUADRATIC RATE DISTORTION MODEL", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,US,IEEE INC. NEW YORK, vol. 7, no. 1, pages 246-250, XP000678897, ISSN: 1051-8215 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE44457E1 (en) | 2000-10-31 | 2013-08-27 | Imec | Method and apparatus for adaptive encoding framed data sequences |
US7616690B2 (en) | 2000-10-31 | 2009-11-10 | Imec | Method and apparatus for adaptive encoding framed data sequences |
EP1204279A3 (fr) * | 2000-10-31 | 2002-05-15 | Interuniversitair Microelektronica Centrum Vzw | Procédé et appareil pour le codage adaptif de séquences de données encadrées |
EP1202579A1 (fr) * | 2000-10-31 | 2002-05-02 | Interuniversitair Microelektronica Centrum Vzw | Procédé et appareil pour le codage adaptif de séquences de trames de données |
WO2002085038A1 (fr) * | 2001-04-16 | 2002-10-24 | Mitsubishi Denki Kabushiki Kaisha | Procede et systeme destines a determiner la distorsion dans un signal video |
EP2046048A3 (fr) * | 2005-03-01 | 2013-10-30 | Qualcomm Incorporated | Codage de région d'intérêt avec omission de fond pour la téléphonie de vidéo |
US8514933B2 (en) | 2005-03-01 | 2013-08-20 | Qualcomm Incorporated | Adaptive frame skipping techniques for rate controlled video encoding |
WO2006094000A3 (fr) * | 2005-03-01 | 2006-12-28 | Qualcomm Inc | Codage d'une region d'interet inflechi par une mesure de la qualite pour visiophonie |
WO2006093999A3 (fr) * | 2005-03-01 | 2006-12-28 | Qualcomm Inc | Codage d'une region d'interet en visiophonie par attribution de bits dans le domaine rho |
US7724972B2 (en) | 2005-03-01 | 2010-05-25 | Qualcomm Incorporated | Quality metric-biased region-of-interest coding for video telephony |
WO2006094001A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet avec omission de l'arriere-plan pour visiophonie |
US8768084B2 (en) | 2005-03-01 | 2014-07-01 | Qualcomm Incorporated | Region-of-interest coding in video telephony using RHO domain bit allocation |
WO2006094001A3 (fr) * | 2005-03-01 | 2007-01-04 | Qualcomm Inc | Codage d'une region d'interet avec omission de l'arriere-plan pour visiophonie |
EP2309747A3 (fr) * | 2005-03-01 | 2013-06-26 | Qualcomm Incorporated | Codage d'une region d'intérêt avec allocation de bits dans le domaine RHO |
WO2006093999A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet en visiophonie par attribution de bits dans le domaine rho |
WO2006094033A1 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Techniques adaptatives de saut de trames destinees a un codage video a vitesse commandee |
WO2006094000A2 (fr) * | 2005-03-01 | 2006-09-08 | Qualcomm Incorporated | Codage d'une region d'interet inflechi par une mesure de la qualite pour visiophonie |
US8693537B2 (en) | 2005-03-01 | 2014-04-08 | Qualcomm Incorporated | Region-of-interest coding with background skipping for video telephony |
US7733566B2 (en) | 2006-06-21 | 2010-06-08 | Hoya Corporation | Supporting mechanism |
DE102015121148A1 (de) * | 2015-12-04 | 2017-06-08 | Technische Universität München | Reduzieren der Übertragungszeit von Bildern |
WO2017093205A1 (fr) | 2015-12-04 | 2017-06-08 | Technische Universität München | Réduction du temps de transmission d'images |
CN110832858B (zh) * | 2017-07-03 | 2023-10-13 | Vid拓展公司 | 用于视频编解码的设备、方法 |
CN110832858A (zh) * | 2017-07-03 | 2020-02-21 | Vid拓展公司 | 基于双向光流的运动补偿预测 |
Also Published As
Publication number | Publication date |
---|---|
KR100323683B1 (ko) | 2002-02-07 |
JP3641172B2 (ja) | 2005-04-20 |
JP2000125302A (ja) | 2000-04-28 |
KR20000023277A (ko) | 2000-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7224734B2 (en) | Video data encoding apparatus and method for removing a continuous repeat field from the video data | |
EP2250813B1 (fr) | Procédé et appareil de sélection de trame prédictive pour une efficacité rehaussée et une qualité subjective | |
EP0798930B1 (fr) | Appareil de codage vidéo | |
WO2000018134A1 (fr) | Realisation du saut de trames sans recourir a l'estimation du mouvement | |
JP4702059B2 (ja) | 動画像を符号化する方法及び装置 | |
US7095784B2 (en) | Method and apparatus for moving picture compression rate control using bit allocation with initial quantization step size estimation at picture level | |
JPH10136375A (ja) | 動画像の動き補償方法 | |
JP3755155B2 (ja) | 画像符号化装置 | |
JP3210082B2 (ja) | 符号化装置及びその方法 | |
US20040234142A1 (en) | Apparatus for constant quality rate control in video compression and target bit allocator thereof | |
JP3593929B2 (ja) | 動画像符号化方法及び動画像符号化装置 | |
JP2000201354A (ja) | 動画像符号化装置 | |
JP2000197049A (ja) | 動画像可変ビットレート符号化装置および方法 | |
US6763138B1 (en) | Method and apparatus for coding moving picture at variable bit rate | |
JP4644097B2 (ja) | 動画像符号化プログラム、プログラム記憶媒体、および符号化装置。 | |
JP3480067B2 (ja) | 画像符号化装置及び方法 | |
US7133448B2 (en) | Method and apparatus for rate control in moving picture video compression | |
KR100390167B1 (ko) | 화상 부호화방법 및 화상 부호화장치 | |
JP2001016594A (ja) | 動画像の動き補償方法 | |
KR100413979B1 (ko) | 예측부호화방법및장치 | |
JP2005303555A (ja) | 動画像符号化装置および動画像符号化方法 | |
JP3711573B2 (ja) | 画像符号化装置及び画像符号化方法 | |
JPH0646411A (ja) | 画像符号化装置 | |
JPH08126012A (ja) | 動画像圧縮装置 | |
Ryu | Block matching algorithm using neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): BR CA CN IN |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
122 | Ep: pct application non-entry in european phase |