EP1803302A4 - Apparatus and method for adjusting the bitrate of a coded scalable bit-stream based on multi-layer - Google Patents
- Publication number
- EP1803302A4 (application EP05856385A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frame
- stream
- bit
- bitrate
- skipping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234327—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/164—Feedback from the receiver or from the transmission channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
Definitions
- Apparatuses and methods consistent with the present invention relate to video compression. More particularly, the present invention relates to a method and an apparatus for realizing signal to noise ratio (SNR) scalability in a video stream server in order to transmit a video stream in a variable network environment.
- Multimedia data is usually voluminous and requires a large-capacity storage medium, and a wide bandwidth is required to transmit it. For example, digitizing one frame of a 24-bit true color image with a resolution of 640 x 480 requires 640 x 480 x 24 bits, that is, 7.37 megabits (Mbits).
- A bandwidth of approximately 221 Mbits per second is needed to transmit this data at a rate of 30 frames per second, and a storage space of approximately 1,200 gigabits (Gbits) is needed to store a 90-minute movie. Accordingly, a compressed coding scheme is required when transmitting multimedia data.
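The figures quoted above follow directly from the frame dimensions; a minimal check, assuming decimal (SI) mega/giga units:

```python
# Raw size of one uncompressed 640 x 480, 24-bit frame, the bandwidth
# at 30 frames per second, and the storage for a 90-minute movie.
width, height, bits_per_pixel = 640, 480, 24
frame_bits = width * height * bits_per_pixel   # 7,372,800 bits ~ 7.37 Mbits
bitrate = frame_bits * 30                      # ~221 Mbits per second
movie_bits = bitrate * 90 * 60                 # ~1,194 Gbits, i.e. roughly 1,200
print(frame_bits / 1e6, bitrate / 1e6, movie_bits / 1e9)
# 7.3728 221.184 1194.3936
```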
- A basic principle of data compression is to eliminate redundancy in the data.
- The three types of data redundancy are spatial redundancy, temporal redundancy, and perceptual-visual redundancy.
- Spatial redundancy refers to the duplication of identical colors or objects in an image; temporal redundancy refers to little or no variation between adjacent frames in a moving picture, or to successive repetition of the same sounds in audio; and perceptual-visual redundancy refers to the limits of human perception, such as the inability to hear high frequencies.
- Data compression can be classified into lossy/lossless compression depending upon whether source data is lost, intraframe/interframe compression depending upon whether each frame is compressed independently, and symmetrical/asymmetrical compression depending upon whether decompression takes the same amount of time as compression.
- Lossless compression is mainly used for compressing text data or medical data, and lossy compression is mainly used for compressing multimedia data.
- Intraframe compression is generally used for eliminating spatial redundancy, and interframe compression for eliminating temporal redundancy.
- Transmission media in current use have a variety of transmission speeds, ranging from ultra-high-speed communication networks capable of transmitting data at tens of Mbits per second to mobile communication networks with a transmission speed of 384 kilobits (Kbits) per second.
- In conventional video encoding algorithms, e.g., MPEG-1, MPEG-2, MPEG-4, H.263 and H.264 (Advanced Video Coding), temporal redundancy is eliminated by motion compensation and spatial redundancy by spatial transformations.
- Scalable video coding refers to video coding having scalability in the spatial domain, that is, in terms of resolution. Scalability is the property of enabling a compressed bit-stream to be partially decoded, whereby videos having a variety of resolutions can be played.
- The term 'scalability' herein collectively refers to spatial scalability for controlling the resolution of a video, signal-to-noise ratio (SNR) scalability for controlling the quality of a video, temporal scalability for controlling the frame rate of a video, and combinations thereof.
- Spatial scalability may be implemented based on the wavelet transformation, and temporal scalability has been implemented using motion compensated temporal filtering (MCTF) and unconstrained MCTF (UMCTF).
- SNR scalability may be implemented based on the embedded quantization coding scheme, which exploits spatial correlation, or on the fine granular scalability (FGS) coding scheme used in MPEG-series codecs.
- An overall construction of a video coding system supporting scalability is depicted in FIG. 1.
- A video encoder 45 encodes an input video 10 through temporal filtering, spatial transformation, and quantization, thereby generating a bit-stream 20.
- A pre-decoder 50 may implement a variety of scalabilities relative to texture data in a simple manner by truncating or extracting a part of the bit-stream 20 received from the video encoder 45. Picture quality, resolution or frame rate may be considered for the truncation.
- The process of implementing scalability by truncating a part of the bit-stream is called 'pre-decoding.'
- The video decoder 60 reconstructs the output video 30 from the pre-decoded bit-stream 25 by inversely performing the processes conducted by the video encoder 45.
- Pre-decoding of the bit-stream according to pre-decoding conditions is not necessarily conducted by the pre-decoder 50; the bit-stream may also be pre-decoded at the video decoder 60 side.
- A multi-layer structure may comprise a base layer, a first enhancement layer and a second enhancement layer, each layer having a different resolution (QCIF, CIF, 2CIF) or a different frame rate.
- FIG. 2 illustrates an example of a scalable video codec using a multi-layer structure.
- In FIG. 2, a base layer is defined in the quarter common intermediate format (QCIF) with a frame rate of 15 Hz, a first enhancement layer in the common intermediate format (CIF) with a frame rate of 30 Hz, and a second enhancement layer in standard definition (SD) with a frame rate of 60 Hz.
- The layers illustrated in FIG. 2 have different resolutions and frame rates. However, there may also exist layers having the same resolution but different frame rates, or the same frame rate but different resolutions.
- A conventional method to implement SNR scalability at the pre-decoder 50 side is illustrated in FIG. 3.
- A bit-stream 20 generated by a video encoder consists of a plurality of groups of pictures (GOPs), and each GOP consists of information on a plurality of frames.
- Frame information 40 consists of a motion component 41 and a texture component 42.
- The pre-decoder 50 determines a transmissible bitrate according to the bandwidth of the network connected to the decoder side, and truncates a part of the original texture component 42 based on the determined bitrate.
- The texture component left after the truncation, that is, the texture component 43 pre-decoded based on the SNR, is transmitted to the video decoder side together with the motion component 41.
- Because this texture component is encoded by a method that supports SNR scalability, SNR scalability can be implemented by the simple operation of truncating the rear part of the texture component.
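This truncation step can be sketched as follows; the byte-level layout and names are illustrative assumptions, not the actual bit-stream syntax:

```python
def predecode_frame(motion: bytes, texture: bytes, budget_bits: int):
    """Keep the motion component 41 intact and truncate the texture
    component 42 from the rear until the frame fits the bit budget."""
    budget_bytes = budget_bits // 8
    keep = max(0, budget_bytes - len(motion))   # bytes left for texture
    return motion, texture[:keep]

# Illustrative components: 100 bytes of motion, 900 bytes of texture.
motion, texture = bytes(100), bytes(900)
m, t = predecode_frame(motion, texture, budget_bits=8 * 500)
# A 500-byte budget keeps all motion data and the first 400 texture bytes.
```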
- Encoding methods that support SNR scalability include fine granular scalability (FGS) coding, used in codecs of the MPEG series, and embedded quantization coding, used in codecs of the wavelet series.
- The bit-stream generated by embedded quantization has an additional merit: it can be pre-decoded more finely than the bit-stream generated by FGS coding.
Disclosure of Invention
- However, the bit-stream may not reach the target bitrate desired by a user by changing the SNR within a single layer alone.
- Either the picture quality is degraded by excessive truncation of data, or the bit-stream is transmitted as it is because no bits remain to be truncated, which may cause a network delay in real-time streaming. Therefore, there is a need for a pre-decoding method that solves this problem.
- The present invention has been proposed to solve the problem described above, and an object of the present invention is to provide a pre-decoding method and apparatus capable of coping adaptively with a variable network environment.
- Another object of the present invention is to solve the problem that a bit-stream cannot reach a target transmission bitrate only by truncating texture information of frames in the current layer.
- According to an aspect of the present invention, there is provided an apparatus for adapting the bitrate of a multi-layer-based coded scalable bit-stream to a variable network environment, comprising a bit-stream parsing unit to parse an input bit-stream, a pre-decoding condition determining unit to determine a target transmission bitrate according to the variable network environment, a pre-decoding unit to skip at least one frame among the frames included in the parsed bit-stream according to the determined target transmission bitrate, and a bit-stream transmission unit to transmit to a client device the bit-stream restructured by the frame skipping.
- According to another aspect, there is provided a multi-layer based video decoder comprising a skip confirmation unit to confirm skipping of the current frame by reading out a value of the field that indicates the texture data size of the current frame from an input bit-stream, a base layer decoder to restore a base layer frame at the same temporal position as the current frame when the value indicates frame skipping, and an upsampling unit to upsample the restored lower layer frame to the resolution of an enhancement layer.
- According to still another aspect, there is provided a method of adapting the bitrate of a multi-layer-based coded scalable bit-stream to a variable network environment, comprising parsing an input bit-stream, determining a target transmission bitrate according to the variable network environment, skipping at least one frame among the frames included in the parsed bit-stream according to the determined target transmission bitrate, and transmitting to a client device the bit-stream restructured after the frame skipping.
- According to yet another aspect, there is provided a multi-layer based video decoding method comprising confirming skipping of the current frame by reading out a value of the field that indicates the texture data size of the current frame from an input bit-stream, restoring a base layer frame at the same temporal position as the current frame when the value indicates frame skipping, and upsampling the restored lower layer frame to the resolution of an enhancement layer.
- FIG. 1 illustrates the overall construction of a video coding system to support scalability;
- FIG. 2 illustrates an example of a frame array of a scalable video codec using a multi-layered structure;
- FIG. 3 illustrates a conventional implementation of SNR scalability;
- FIGS. 4 to 7 illustrate frame skipping according to an exemplary embodiment of the present invention;
- FIG. 8 illustrates a construction of a multi-layer video encoder according to an exemplary embodiment of the present invention;
- FIG. 9 illustrates an example of a discrete cosine transform (DCT) conversion of a differential coefficient;
- FIG. 10 illustrates a threshold bitrate;
- FIGS. 11 to 14 illustrate a structure of a bit-stream according to an exemplary embodiment of the present invention;
- FIG. 15 illustrates a construction of a pre-decoder according to an exemplary embodiment of the present invention;
- FIG. 16 is a block diagram illustrating an example of pre-decoding of each frame;
- FIG. 17 is a block diagram illustrating an example of pre-decoding of each frame;
- FIG. 18 is a flowchart illustrating operations conducted in a pre-decoder according to an exemplary embodiment of the present invention;
- FIG. 19 is a flowchart illustrating operations conducted in a pre-decoder relative to a bit-stream encoded by a closed-loop mechanism;
- FIG. 20 is a flowchart illustrating operations conducted in a pre-decoder according to another exemplary embodiment of the present invention;
- FIG. 21 illustrates a result after truncating texture data by frame skipping;
- FIG. 22 illustrates a result after truncating motion information by frame skipping; and
- FIG. 23 illustrates a construction of a multi-layered video decoder according to an exemplary embodiment of the present invention.
- In the present invention, a special method is used to implement SNR scalability when a bit-stream cannot reach the target transmission bitrate at the pre-decoder side even by truncating the texture component of the bit-stream encoded using multiple layers.
- The pre-decoder skips a frame in the current layer, and the decoder restores the skipped frame of the current layer using the base layer frame corresponding to it.
- This technique extends the scope of scalability, saves the bits of the skipped frame, and provides a video frame of superior visual quality compared to restoring only a part of the current layer frame from an insufficient bitrate.
- The frame skipping used in the present invention will be described with reference to FIGS. 4 to 7. It is assumed that there is a bit-stream comprising a plurality of base layer frames (A1 to A4) and a plurality of enhancement layer frames (F1 to F8).
- An enhancement layer frame (e.g., F1) can be restored at the decoder side by upsampling the base layer frame at the same temporal position (e.g., A1), even if a part of the enhancement layer frame is truncated.
- In this example, frames F1, F3, F5 and F7 can be truncated in this way.
- The pre-decoder may skip part or all of any enhancement layer frame for which a base layer frame is present at the same position.
- The sequence of skipping starts from the last frame of the current pre-decoding unit and proceeds in reverse order.
- For example, F7 and F5 of FIG. 4 are skipped, and the video decoder may replace the frames F5 and F7 of the enhancement layer with the frames (A3u and A4u) generated by upsampling A3 and A4.
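The backward skipping order described above can be sketched as follows; the helper and its arguments are illustrative, not taken from the patent:

```python
def frames_to_skip(has_base, frame_bits, bits_to_save):
    """Walk the pre-decoding unit backwards and skip enhancement layer
    frames that have a base layer frame at the same temporal position,
    until enough bits have been saved."""
    skipped, saved = [], 0
    for i in reversed(range(len(has_base))):
        if saved >= bits_to_save:
            break
        if has_base[i]:
            skipped.append(i)
            saved += frame_bits[i]
    return sorted(skipped)

# F1..F8, with base layer frames at the positions of F1, F3, F5, F7.
has_base = [True, False, True, False, True, False, True, False]
print(frames_to_skip(has_base, [100] * 8, bits_to_save=200))  # [4, 6] -> F5, F7
```

The decoder would then substitute each skipped frame with the upsampled base layer frame at the same temporal position.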
- FIG. 6 illustrates, by way of example, a case where base layer frames (A1 to A8) and enhancement layer frames (F1 to F8) are present.
- In this case, a frame may first be truncated from the enhancement layer frames.
- In FIG. 6, when it is assumed that F1 is an intra picture (I-picture), F5 is a predictive picture (P-picture) and the other oblique frames are bidirectional pictures (B-pictures), and when it is necessary to truncate frames, only the B-pictures may be truncated, starting from the temporally later frames.
- In an open-loop coding method such as MCTF, a frame may be truncated without classifying frames into high-pass frames and low-pass frames.
- In open-loop coding, since errors are distributed between the low-pass frames and the high-pass frames, the picture quality of high-pass frames that reference low-pass frames of somewhat low quality is not greatly degraded compared to closed-loop coding.
- A multi-layered video encoder 100 is illustrated in FIG. 8.
- The video encoder 100 comprises a downsampler 110, motion estimation units 121 and 131, temporal transformation units 122 and 132, spatial transformation units 123 and 133, quantization units 124 and 134, a decoding unit 135, an entropy coding unit 150 and a picture quality comparing unit 160.
- The downsampler 110 downsamples an input video to the resolution and frame rate adapted to each layer. As illustrated in FIG. 2, when a base layer of QCIF@15Hz and an enhancement layer of CIF@30Hz are used, an original input video is downsampled to QCIF and CIF separately, and the resulting videos are then downsampled to frame rates of 15 Hz and 30 Hz, respectively. Downsampling of the resolution may be conducted by means of an MPEG downsampler or a wavelet downsampler, and downsampling of the frame rate may be conducted through frame skipping or frame interpolation.
- The motion estimation unit 121 performs motion estimation with regard to an enhancement layer frame and obtains a motion vector of the enhancement layer frame.
- Motion estimation is the process of searching a reference frame for the block most similar to a block of the current frame, that is, the block having the least error.
- For motion estimation, a variety of methods such as fixed-size block matching or hierarchical variable size block matching (HVSBM) may be used.
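As an illustration of the simplest of these, fixed-size block matching, here is a sketch of an exhaustive sum-of-absolute-differences (SAD) search; the function name and search radius are illustrative assumptions:

```python
import numpy as np

def motion_search(cur_block, ref, top, left, radius=4):
    """Exhaustive fixed-size block matching: return the displacement
    (dy, dx) minimizing the SAD between cur_block and a block in the
    reference frame, searched around the co-located position."""
    h, w = cur_block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = int(np.abs(cur_block.astype(np.int64)
                             - ref[y:y+h, x:x+w].astype(np.int64)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

ref = np.arange(32 * 32).reshape(32, 32)   # reference frame with distinct pixels
cur = ref[12:20, 14:22]                    # the block actually sits at (12, 14)
print(motion_search(cur, ref, top=10, left=12))  # (2, 2)
```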
- The motion vector component of the enhancement layer, from which redundancy has been eliminated, can be represented most efficiently by using the motion vector of the base layer frame obtained by the motion estimation unit 131.
- The temporal transformation unit 122 constructs a prediction frame using the motion vector obtained by the motion estimation unit 121 and a frame at a temporally different position from the current frame, and obtains the difference between the current frame and the prediction frame, thereby reducing the temporal redundancy. As a result, a residual frame is generated.
- When the current frame is an intraframe, which is encoded without reference to a different frame, it needs no motion vector, and the temporal transformation using the prediction frame is also omitted.
- MCTF or UMCTF may be used to support temporal scalability.
- When an enhancement layer frame is an intraframe, a method of removing the redundancy of textures between layers by use of a base layer frame at the corresponding position may be used.
- The base layer frame having passed through the quantization unit 134 is restored by decoding the resultant frame again in the decoding unit 135, whereby the enhancement layer frame at the corresponding position can be efficiently predicted by use of the restored base layer frame (upsampled when necessary); this is called 'B-intra prediction.'
- The spatial transformation unit 123 generates a transform coefficient by performing spatial transformation on the residual frame generated by the temporal transformation unit 122 or on an original input frame.
- DCT or wavelet transformation may be used as the method of spatial transformation.
- When DCT is used, the transform coefficient is a DCT coefficient; when the wavelet transformation is used, it is a wavelet coefficient.
- The quantization unit 124 quantizes the transform coefficient generated by the spatial transformation unit 123, thereby generating a quantization coefficient. At this time, the quantization unit 124 formats the quantization coefficient in order to support SNR scalability. As methods to support SNR scalability, FGS coding or embedded quantization may be used.
- FGS coding will be described first.
- In FGS coding, the difference between the original input frame and the decoded base layer frame is obtained, and the obtained difference is decomposed into a plurality of bit-planes.
- When the difference coefficients of a DCT block are as illustrated in FIG. 9 (in the 8 x 8 DCT block, the omitted positions are all 0s), they can be arrayed as {+13, -11, 0, 0, +17, 0, 0, 0, -3, 0, 0, ...} using a zigzag scan, and decomposed into five bit-planes as in Table 1.
- A value in a bit-plane is represented as a binary coefficient.
- The enhancement layer, formatted into bit-planes decomposed as above, starts from the 4th bit-plane (highest order) and is successively arrayed bit-plane by bit-plane down to the 0th bit-plane (lowest order).
- The lowest-order bit-plane is truncated first, thereby implementing SNR scalability.
- For example, when the three lowest bit-planes are truncated, the decoder side receives the array {+8, -8, 0, 0, +16, 0, 0, 0, 0, ...}.
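Truncating the lowest bit-planes amounts to zeroing the low-order bits of each difference coefficient's magnitude; a minimal sketch reproducing the numbers above:

```python
def truncate_bitplanes(coeffs, k):
    """Drop the k lowest bit-planes: keep only the high-order bits of
    each coefficient's magnitude, preserving the sign."""
    out = []
    for c in coeffs:
        mag = (abs(c) >> k) << k    # zero out the k lowest bit-planes
        out.append(mag if c >= 0 else -mag)
    return out

coeffs = [13, -11, 0, 0, 17, 0, 0, 0, -3]   # zigzag-scanned differences
print(truncate_bitplanes(coeffs, 3))        # [8, -8, 0, 0, 16, 0, 0, 0, 0]
```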
- Embedded quantization is appropriate for use in a wavelet-based codec. For example, only the values higher than a first threshold value are encoded; then only the values higher than a new threshold value, generated by halving the first threshold value, are encoded; this new threshold value is halved again, and these operations are repeated. Unlike FGS, embedded quantization exploits spatial correlation. Embedded quantization methods include the embedded zerotrees wavelet algorithm (EZW), embedded zeroblock coding (EZBC), and set partitioning in hierarchical trees (SPIHT).
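The threshold-halving idea can be sketched as a series of significance passes; this toy version ignores the spatial-correlation structures (zerotrees, zeroblocks) that EZW, EZBC and SPIHT add on top:

```python
def significance_passes(coeffs, passes=3):
    """For each pass, record the current threshold and the indices of
    coefficients significant against it; then halve the threshold."""
    t = 1 << (max(abs(c) for c in coeffs).bit_length() - 1)  # initial threshold
    out = []
    for _ in range(passes):
        out.append((t, [i for i, c in enumerate(coeffs) if abs(c) >= t]))
        t //= 2
    return out

print(significance_passes([34, -20, 10, -7, 3]))
# [(32, [0]), (16, [0, 1]), (8, [0, 1, 2])]
```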
- The base layer frame undergoes motion estimation by the motion estimation unit 131, temporal transformation by the temporal transformation unit 132, spatial transformation by the spatial transformation unit 133, and quantization by the quantization unit 134.
- The entropy coding unit 150 generates an output bit-stream by conducting lossless coding (entropy coding) on the quantization coefficients generated by the quantization unit 134 of the base layer and the quantization unit 124 of the enhancement layer, the motion vector of the base layer generated by the motion estimation unit 131, and the motion vector component of the enhancement layer generated by the motion estimation unit 121.
- For lossless coding, a variety of methods such as Huffman coding, arithmetic coding or variable length coding may be used.
- The pre-decoder may control the transmission bitrate by truncating the bit-stream of the current frame, starting from its rear portion, according to network conditions.
- When the bit-stream of the current frame is truncated, the entire texture component, and even the motion component, may be truncated.
- However, even when the bit-stream is truncated according to the network situation as in FIG. 10, if the target transmission bitrate is lower than a specific threshold bitrate, a better result may be produced if the entire bit-stream of the current frame is discarded and the current frame is restored in the video decoder by upsampling the corresponding lower layer frame.
- A problem is how to find such a threshold bitrate. Since the pre-decoder cannot determine whether the picture quality is superior or inferior, because no original input frame exists there, the threshold information must be determined and transmitted by the video encoder 100.
- For this purpose, the video encoder 100 may further comprise a picture quality comparing unit 160.
- The picture quality comparing unit 160 compares an enhancement layer frame, restored by decoding the partially truncated texture component of the enhancement layer frame in the bit-stream generated by the entropy coding unit 150, with a frame generated by decoding the base layer frame temporally corresponding to the enhancement layer frame and upsampling it to the resolution of the enhancement layer. For the quality comparison, the sums of the differences between each frame and the original frame may be compared, or peak SNRs (PSNRs) computed against the original frame may be compared.
- The picture quality comparing unit 160 may record the threshold bitrate so found as a marker bit in the bit-stream generated by the entropy coding unit 150.
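A sketch of such a comparison using PSNR; the pixel values and helper function are illustrative, not the unit's actual procedure:

```python
import math

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio of a reconstruction against the original."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)

original  = [50, 80, 120, 200]    # original pixels (illustrative)
truncated = [48, 83, 118, 205]    # enhancement frame with truncated texture
upsampled = [52, 78, 125, 195]    # base layer frame, upsampled

# At the bitrate where the upsampled base layer wins this comparison,
# the encoder would record the threshold.
prefer_base = psnr(original, upsampled) > psnr(original, truncated)
```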
- FIGS. 11 to 14 illustrate a construction of the bit-stream 300 shown in FIG. 8, according to an exemplary embodiment of the present invention.
- FIG. 11 schematically illustrates the overall construction of the bit-stream 300.
- The bit-stream 300 consists of a sequence header field 310 and a data field 320.
- The data field 320 may comprise one or more GOP fields, including 330, 340 and 350.
- FIG. 12 illustrates a detailed structure of each GOP field 330.
- The GOP field 330 may comprise a GOP header 360, a T(0) field 370 in which information on the first frame (an intraframe) in the temporal filtering sequence is recorded, an MV field 380 in which a set of motion vectors is recorded, and a 'the other T' field 390 in which information on the other frames (interframes), excluding the first frame, is recorded.
- FIG. 13 illustrates a detailed structure of the MV field 380 where the motion vectors are recorded.
- Each motion vector field includes a Size field 381 to represent the size of a motion vector and a Data field 382 to record the actual data of the motion vector.
- The Data field 382 may be composed of a field (not shown) containing information on entropy coding and a binary stream field (not shown) containing the actual motion vector information.
- FIG. 14 illustrates a detailed structure of 'the other T' field 390.
- Information on the interframes, the number of which is one less than the number of frames in the GOP, may be recorded in this field 390.
- Information on each interframe comprises a frame header field 391, a Data Y field 393, a Data U field 395 and a Data V field 397.
- Size fields 392, 394 and 396 are attached to the front of the fields 393, 395 and 397, respectively, to represent the size of each component.
- In the frame header field 391, properties of the video confined to the concerned frame are recorded, unlike the sequence header field 310 and the GOP header field 360.
- a single frame consists of a plurality of color components (including a brightness (Y) component)
- pre-decoding may be conducted for each color component, and the color components constituting a frame may be pre-decoded with the same percentage.
- the threshold bitrate described with reference to FIG. 10 may be recorded in the frame header field 391 and transmitted to the pre-decoder. Instead of recording the threshold bit number in the frame header field 391, it may be indicated as a separate marker bit on a color component basis.
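The size-prefixed layout of fields 392 through 397 can be mimicked with a short sketch. The 4-byte big-endian size prefix is an assumption for illustration only; the patent does not fix the field widths or byte order:

```python
import struct

def pack_interframe(data_y, data_u, data_v):
    """Prefix each color component with its size, like fields 392-397."""
    out = b""
    for component in (data_y, data_u, data_v):
        out += struct.pack(">I", len(component)) + component
    return out

def unpack_interframe(buf):
    """Read the three size-prefixed components back out of the buffer."""
    components, pos = [], 0
    for _ in range(3):
        (size,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        components.append(buf[pos:pos + size])
        pos += size
    return tuple(components)
```

A truncated or skipped component simply shows up as a shorter (or zero-length) payload behind its size field, which is exactly what lets the pre-decoder rewrite the stream without re-encoding.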
- FIG. 15 illustrates a construction of a pre-decoder 200 according to an exemplary embodiment of the present invention.
- the pre-decoder 200 pre-decodes the bit-stream 300 provided from the video encoder 100 and controls the SNR or the bitrate of the bit- stream 300.
- Pre-decoding involves controlling the resolution, frame rate and SNR by extracting or truncating part of the bit-stream.
- hereinafter, pre-decoding refers to controlling the SNR of a bit-stream.
- the pre-decoder 200 may be understood as a video stream server to transmit a scalable video stream adaptive to a variable network environment.
- the pre-decoder 200 comprises a bit-stream parsing unit 210, a pre-decoding condition determining unit 240, a pre-decoding unit 220 and a bit-stream transmitting unit 230.
- the bit-stream parsing unit 210 parses a bit-stream 300 supplied from the video encoder 100. In this case, it reads out frame-based information included in the bit-stream 300, e.g., the frame header 391 of 'the other T' field 390, the data size information of each color component 392, 394 and 396 and the texture information 393, 395 and 397 of FIG. 14.
- the pre-decoding condition determining unit 240 determines pre-decoding conditions, i.e., a target transmission bitrate, subject to a variable network situation.
- feedback information on the currently available bitrate is fed back from the video decoder that received the bit-stream transmitted from the pre-decoder 200, and the target transmission bitrate is determined on that basis.
- the video decoder is an apparatus to restore a video stream, which may be understood as referring to a client device to receive the video streaming service.
- the pre-decoding unit 220 pre-decodes the parsed bit-stream according to the determined target transmission bitrate.
- a first exemplary embodiment, pre-decoding a bit-stream on a frame basis, and a second exemplary embodiment, pre-decoding a set of a predetermined number of frames, that is, pre-decoding on a unit basis, will be described.
- the pre-decoding unit, i.e., the set of frames pre-decoded together, may be identical to or different from a GOP.
- Pre-decoding on a frame basis is adaptive to a frame so that each frame has a variable bitrate according to a variable network situation.
- FIG. 16 is a diagram illustrating pre-decoding on a frame basis, where each of F1 through F4 indicates a frame component in the bit-stream, and the shaded portions indicate portions truncated from each frame by pre-decoding.
- Pre-decoding on a unit basis determines a transmission bitrate for a plurality of frames.
- FIG. 17 is a diagram illustrating pre-decoding on a unit basis.
- a pre-decoding unit consists of four frames
- the same number of bits as in the variable pre-decoding result of FIG. 16 can be truncated as a whole.
- the period of reflecting a change in the network situation is lengthened as compared with transmission on a frame basis.
- the feedback period received from the video decoder side is lengthened accordingly.
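A minimal sketch of unit-basis pre-decoding, assuming the whole unit is simply scaled by one ratio (real truncation cuts the embedded texture stream of each frame, so per-frame proportions may differ):

```python
def predecode_unit(frame_sizes, unit_budget):
    """Truncate a pre-decoding unit of frames (bit counts) so that the
    unit as a whole meets the budget; every frame is scaled uniformly."""
    total = sum(frame_sizes)
    if total <= unit_budget:
        return list(frame_sizes)  # budget already satisfied
    ratio = unit_budget / total
    return [int(size * ratio) for size in frame_sizes]
```

Because only the unit total is constrained, the same overall number of bits can be removed as in the variable frame-basis result of FIG. 16, just with a coarser reaction to network changes.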
- operations conducted by the pre-decoding unit 220 are as illustrated in the flow chart of FIG. 18. It is first determined whether a frame is present in a lower layer of the current frame (S10). If not present (No in S10), the current frame is pre-decoded according to pre-decoding conditions or a target transmission bitrate determined by the pre-decoding condition determining unit 240 (S50). In this case, since skipping of the current frame is not possible, the texture component of the current frame is truncated as much as possible in order to adjust the SNR.
- the current frame refers to a frame to be currently transmitted by the pre-decoder.
- the pre-decoder 200 may conduct pre-decoding of the current frame based on color components (Y, U, and V) as illustrated in FIG. 14, and record the sizes of the changed texture data in the color component size fields 392, 394 and 396 after pre-decoding. In this case, pre-decoding may be conducted on a color component basis at a variable rate or at the same rate according to the network situation.
- step S50 is repeated.
- Skipping of the current frame may imply that only the texture data of the current frame is skipped or both the texture data and motion data are skipped.
- the video decoder restores the texture data of the current frame by upsampling the texture data of a base layer corresponding thereto, but uses a motion vector of the current frame as a motion vector for the inverse temporal transformation.
- the video decoder upsamples both the texture data and the motion data of the base layer frame, and restores the texture data and motion data of the current frame.
- the threshold bitrate may be determined and transmitted by the picture quality comparing unit 160 of the video encoder. However, it may also be determined elsewhere. For example, the pre-decoder may fix a specific ratio between the texture information and the motion information of a frame and truncate the texture information, setting as the threshold bitrate the bitrate at which the ratio of texture information to motion information reaches that value.
- the threshold bitrate may be determined in various ways as construed by those skilled in the art.
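The per-frame decision of FIG. 18 can be condensed into a sketch. The bit counts and names are illustrative; a real pre-decoder truncates an embedded texture stream rather than returning a single number:

```python
def predecode_frame(frame_bits, target_bits, threshold_bits, has_lower_layer):
    """Return (bits_to_transmit, skipped) for the current frame."""
    if not has_lower_layer:
        # S50: skipping is impossible, so only the texture is truncated.
        return min(frame_bits, target_bits), False
    if target_bits < threshold_bits:
        # S30/S40: below the threshold the frame is skipped; the decoder
        # falls back on the upsampled lower layer frame instead.
        return 0, True
    # S50: the target is reachable by truncating texture alone.
    return min(frame_bits, target_bits), False
```

Skipping is only attractive when the target falls below the threshold bitrate, i.e., when the truncated enhancement texture would look no better than the upsampled base layer.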
- the pre-decoding may be performed in the sequence depicted in FIG. 19.
- FIG. 19 is different from FIG. 18 in that it further comprises a determination operation S25.
- operation S25 when the current frame is a B-picture which is not used as a reference frame of other frames (Yes in S25), skipping of the current frame can be performed in S30 and S40.
- operation S25 when the current frame is an I-picture or a P-picture which is used as a reference frame of another frame (No in S25), skipping of the current frame is not possible (S50 and S60).
- the method of FIG. 18 may be used whenever a lower layer frame is present, even though it is an open-loop coding method, regardless of whether the frame is an I-picture, B-picture or P-picture.
- operations conducted by the pre-decoding unit 220 are as illustrated in FIG. 20.
- if a target transmission bitrate determined by the pre-decoding condition determining unit 240 is less than the threshold bitrate (Yes in S110),
- frames belonging to the pre-decoding unit are pre-decoded in order to have the threshold bitrate (S120).
- the changed bit numbers are recorded in the field to indicate the texture data size of each frame (S130). Since the threshold bitrate still exceeds the target transmission bitrate after the pre-decoding, frame skipping can be conducted thereafter.
- the bitrate is controlled so as to approach the target transmission bitrate by skipping the frames whose lower layers are present, in inverse order, among the frames belonging to the current pre-decoding unit (S140).
- the current pre- decoding unit refers to a pre-decoding unit that the pre-decoder 200 currently transmits.
- '0' is recorded in the fields to indicate the texture data sizes of the skipped frames (S150).
- the frame header 391 and the size fields by color component 392, 394 and 396 are not removed.
- the T(1) field of that frame includes only the frame header 391 and the size fields by component 392, 394 and 396, as illustrated in FIG. 21.
- '0' is recorded in the size fields by component 392, 394 and 396.
- the MV(1) field containing motion information on the first frame in FIG. 13 contains only the field 381 to indicate the size of the motion vector, as illustrated in FIG. 22.
- the field 381 is written with '0.'
- the frames belonging to the pre-decoding unit are pre-decoded according to the target transmission bitrate (S160).
- all the frames need not be truncated in a uniform manner; it suffices that the target transmission bitrate is satisfied.
- the changed bit numbers are recorded in the size fields of the respective frames (S170).
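The unit-level flow of FIG. 20 can be sketched as follows. The dict keys `bits`, `threshold` and `has_lower_layer` are illustrative names, not fields defined by the patent:

```python
def predecode_group(frames, target_bits):
    """Sketch of the FIG. 20 flow for one pre-decoding unit.

    `frames` is a list of dicts with illustrative keys 'bits',
    'threshold' and 'has_lower_layer'.  Returns per-frame bit numbers
    after truncation and skipping; 0 marks a skipped frame (S150).
    """
    # S120: pre-decode every frame down to its threshold bitrate.
    sizes = [min(f["bits"], f["threshold"]) for f in frames]
    # S140: while the unit still exceeds the target, skip frames whose
    # lower layers are present, in inverse (back-to-front) order.
    for i in reversed(range(len(frames))):
        if sum(sizes) <= target_bits:
            break
        if frames[i]["has_lower_layer"]:
            sizes[i] = 0
    return sizes
```

Skipping from the back of the unit first keeps the earliest frames intact, so the period over which the network change is reflected is as long as one unit, matching the trade-off noted for unit-basis transmission.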
- the bit-stream transmission unit 230 transmits a bit-stream restructured by having its bitrate adjusted by the pre-decoding unit 220, that is, a pre- decoded bit-stream, to the video decoder side, and receives feedback information from the video decoder.
- the feedback information includes information on the currently available bitrate measured as the video decoder receives the bit-stream.
- FIG. 23 illustrates a construction of a multi-layered video decoder 400 according to an exemplary embodiment of the present invention.
- the video decoder 400 comprises an entropy decoding unit 410, a base layer decoder 420, an enhancement layer decoder 430, a skip confirmation unit 440, and an upsampling unit 450.
- the entropy decoding unit 410 conducts inverse entropy coding operations, that is, it extracts data of a base layer frame and data of an enhancement layer from an input bit-stream.
- Each data of the base layer frame and the enhancement layer frame consists of texture data and motion data.
- the skip confirmation unit 440 reads a field that indicates the size of texture data of the current frame among the enhancement layer frames.
- if the value is '0', indicating frame skipping, the number of the current skipped frame is provided to the base layer decoder 420.
- if the value is not '0', the number of the current frame is provided to the enhancement layer decoder 430.
- the current frame in the video decoder 400 of the present invention refers to a frame of a layer that is to be currently restored.
- the base layer decoder 420 restores a lower layer frame having the same temporal position as the current frame of the provided frame number.
- the enhancement layer decoder 430 restores the current frame from the texture data equivalent to the value.
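The dispatch performed by the skip confirmation unit 440 amounts to a single test on the texture-size field; a sketch with illustrative names:

```python
def route_frame(frame_number, texture_size):
    """'0' in the texture-size field marks a skipped frame, which the
    base layer decoder restores by upsampling; any other value routes
    the frame to the enhancement layer decoder."""
    if texture_size == 0:
        return ("base_layer_decoder", frame_number)
    return ("enhancement_layer_decoder", frame_number)
```

This works because, as described for FIGS. 21 and 22, a skipped frame keeps its header and size fields in the bit-stream with '0' recorded in them, so the decoder can detect the skip without any side channel.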
- the enhancement layer decoder 430 comprises an inverse quantization unit 431, an inverse spatial transformation unit 432 and an inverse temporal transformation unit 433.
- the inverse quantization unit 431 inversely quantizes texture data provided from the skip confirmation unit 440.
- This inverse quantization is an inversion of the quantization conducted in the video encoder 100.
- the quantization table used in the quantization is used again, as it is, to restore the transform coefficients.
- the quantization table may be transmitted from the encoder side, or it may be determined in advance by the encoder and the decoder.
- the inverse spatial transformation unit 432 conducts an inverse spatial transformation on the inversely-quantized result.
- the inverse spatial transformation corresponds to spatial transformation conducted in the video encoder 100. Specifically, an inverse DCT transformation or an inverse wavelet transformation may be used.
- the inverse temporal transformation unit 433 restores a video sequence from the inversely-spatial transformation result.
- an estimation frame is generated by use of a motion vector of an enhancement layer provided by the entropy decoding unit 410 and the already restored video frame, and the current frame is restored by adding the inverse-spatial transformation result and the generated estimation frame.
- an intraframe not transformed temporally by the encoder has no need to pass through inverse temporal transformation.
- the intraframe may also remove redundancy of texture of an enhancement layer by use of the base layer when encoding.
- the inverse temporal transformation unit 433 can restore the current frame, which is an intraframe, by use of the restored base layer frame.
- texture data of the base layer may be restored to a base layer frame by passing through the inverse quantization unit 421, the inverse spatial transformation unit 422 and the inverse temporal transformation unit 423. It has been described that the base layer decoder 420 and the enhancement layer decoder 430 are logically separate. However, it is obvious to those skilled in the art that a single decoding module can be implemented to restore both the enhancement layer and the base layer.
- the upsampling unit 450 upsamples the restored base layer frame at the resolution of the enhancement layer.
- the frame generated as a result of upsampling becomes an enhancement layer frame having the number of the concerned frame. This upsampling may be conducted when the resolution of the base layer differs from that of the enhancement layer, and it may be omitted when both have the same resolution.
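Bringing a base layer frame to enhancement resolution can be illustrated with nearest-neighbour upsampling; this is only a sketch, as real codecs use better interpolation filters:

```python
def upsample_nearest(frame, factor=2):
    """Nearest-neighbour upsampling of a 2-D pixel grid (list of rows)
    by an integer factor in each dimension."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

With `factor=1` the frame passes through unchanged, corresponding to the case where base and enhancement layers share the same resolution and upsampling is omitted.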
- all of the exemplary embodiments of the present invention have been described with reference to a case where a frame has a single base layer and a single enhancement layer.
- however, those skilled in the art will be able to extend the above description to other cases where more layers are added.
- an algorithm used between the base layer and the first enhancement layer will likewise apply between the first enhancement layer and the second enhancement layer.
- FIGS. 8, 15 and 23 may refer to software units or hardware units such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
- these elements are not limited thereto; they may reside in addressable storage media, or they may be configured to execute on one or more processors. The functions inherently provided by these elements may be further broken down, or a single element to execute a specific function may be implemented by integrating a plurality of elements.
- the present invention may also be utilized when truncating the texture information of a frame of one layer alone cannot bring the bitrate down to a target transmission bitrate.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61902304P | 2004-10-18 | 2004-10-18 | |
KR1020040107960A KR100703724B1 (ko) | 2004-10-18 | 2004-12-17 | 다 계층 기반으로 코딩된 스케일러블 비트스트림의비트율을 조절하는 장치 및 방법 |
PCT/KR2005/003208 WO2006080655A1 (en) | 2004-10-18 | 2005-09-28 | Apparatus and method for adjusting bitrate of coded scalable bitsteam based on multi-layer |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1803302A1 EP1803302A1 (de) | 2007-07-04 |
EP1803302A4 true EP1803302A4 (de) | 2007-11-07 |
Family
ID=36740644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05856385A Withdrawn EP1803302A4 (de) | 2004-10-18 | 2005-09-28 | Vorrichtung und verfahren zur regulierung der bitrate eines kodierten skalierbaren bitstroms auf mehrschichtbasis |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1803302A4 (de) |
WO (1) | WO2006080655A1 (de) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8243789B2 (en) | 2007-01-25 | 2012-08-14 | Sharp Laboratories Of America, Inc. | Methods and systems for rate-adaptive transmission of video |
CN114424552A (zh) * | 2019-09-29 | 2022-04-29 | 华为技术有限公司 | 一种低延迟信源信道联合编码方法及相关设备 |
WO2021074269A1 (en) * | 2019-10-15 | 2021-04-22 | Interdigital Ce Patent Holdings, Sas | Method and apparatuses for sending and receiving a video |
US20230087097A1 (en) * | 2021-09-22 | 2023-03-23 | Mediatek Inc. | Frame sequence quality booster using information in an information repository |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6529552B1 (en) * | 1999-02-16 | 2003-03-04 | Packetvideo Corporation | Method and a device for transmission of a variable bit-rate compressed video bitstream over constant and variable capacity networks |
WO2003036983A2 (en) * | 2001-10-26 | 2003-05-01 | Koninklijke Philips Electronics N.V. | Spatial scalable compression |
WO2006006777A1 (en) * | 2004-07-15 | 2006-01-19 | Samsung Electronics Co., Ltd. | Method and apparatus for predecoding and decoding bitstream including base layer |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
MXPA04002148A (es) * | 2001-09-06 | 2004-06-29 | Thomson Licensing Sa | Metodo y aparato paa el cronometraje de tiempo de reproduccion transcurrido en archivos de datos de audio codificados en forma digital con velocidad variable de bits. |
US20030215011A1 (en) * | 2002-05-17 | 2003-11-20 | General Instrument Corporation | Method and apparatus for transcoding compressed video bitstreams |
KR20040047010A (ko) * | 2002-11-28 | 2004-06-05 | 엘지전자 주식회사 | 영상 전화 시스템의 비트율 조절방법 |
KR100543608B1 (ko) * | 2003-01-03 | 2006-01-20 | 엘지전자 주식회사 | 오브젝트 기반 비트율 제어방법 및 장치 |
-
2005
- 2005-09-28 EP EP05856385A patent/EP1803302A4/de not_active Withdrawn
- 2005-09-28 WO PCT/KR2005/003208 patent/WO2006080655A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6529552B1 (en) * | 1999-02-16 | 2003-03-04 | Packetvideo Corporation | Method and a device for transmission of a variable bit-rate compressed video bitstream over constant and variable capacity networks |
WO2003036983A2 (en) * | 2001-10-26 | 2003-05-01 | Koninklijke Philips Electronics N.V. | Spatial scalable compression |
WO2006006777A1 (en) * | 2004-07-15 | 2006-01-19 | Samsung Electronics Co., Ltd. | Method and apparatus for predecoding and decoding bitstream including base layer |
Non-Patent Citations (2)
Title |
---|
See also references of WO2006080655A1 * |
YUSUF A A ET AL: "An adaptive motion vector composition algorithm for frame skipping video transcoding", ELECTROTECHNICAL CONFERENCE, 2004. MELECON 2004. PROCEEDINGS OF THE 12TH IEEE MEDITERRANEAN DUBROVNIK, CROATIA 12-15 MAY 2004, PISCATAWAY, NJ, USA, IEEE, US, 12 May 2004 (2004-05-12), pages 235-238, Vol. 1, XP010733770, ISBN: 0-7803-8271-4 *
Also Published As
Publication number | Publication date |
---|---|
WO2006080655A1 (en) | 2006-08-03 |
EP1803302A1 (de) | 2007-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7881387B2 (en) | Apparatus and method for adjusting bitrate of coded scalable bitsteam based on multi-layer | |
US8031776B2 (en) | Method and apparatus for predecoding and decoding bitstream including base layer | |
JP4763548B2 (ja) | スケーラブルビデオコーディング及びデコーディング方法と装置 | |
KR100596706B1 (ko) | 스케일러블 비디오 코딩 및 디코딩 방법, 이를 위한 장치 | |
US8520962B2 (en) | Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer | |
KR100772868B1 (ko) | 복수 계층을 기반으로 하는 스케일러블 비디오 코딩 방법및 장치 | |
US20060013313A1 (en) | Scalable video coding method and apparatus using base-layer | |
US20050169379A1 (en) | Apparatus and method for scalable video coding providing scalability in encoder part | |
US20050195899A1 (en) | Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method | |
US20050226334A1 (en) | Method and apparatus for implementing motion scalability | |
US20060114999A1 (en) | Multi-layer video coding and decoding methods and multi-layer video encoder and decoder | |
EP1538567A2 (de) | Verfahren und Vorrichtung zur skalierbaren Videokodierung und -dekodierung | |
WO2006080655A1 (en) | Apparatus and method for adjusting bitrate of coded scalable bitsteam based on multi-layer | |
WO2006043753A1 (en) | Method and apparatus for predecoding hybrid bitstream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20070309 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): DE FR GB IT NL |
|
A4 | Supplementary search report drawn up and despatched |
Effective date: 20071005 |
|
17Q | First examination report despatched |
Effective date: 20071130 |
|
DAX | Request for extension of the european patent (deleted) | ||
RBV | Designated contracting states (corrected) |
Designated state(s): DE FR GB IT NL |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: SAMSUNG ELECTRONICS CO., LTD. |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20160831 |