EP1839441A2 - Fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking
Fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking
Info
- Publication number
- EP1839441A2 (application EP06715725A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- deblocking
- base layer
- data
- fgs
- enhancement layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/34—Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- the present invention relates to a fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking
- because multimedia data is large, a high-capacity storage medium and a wide bandwidth are required to store and transmit the multimedia data, respectively. Therefore, in order to transmit multimedia data including text, moving pictures (hereinafter referred to as 'video'), and audio, a compression coding technique must be used. Of the methods of compressing multimedia data, video compression methods in particular can be classified into lossy/lossless compression, intra-frame/inter-frame compression, and symmetric/asymmetric compression, according to whether original data is lost, whether data is independently compressed for each frame, and whether the time required for compression is the same as the time required for reconstruction, respectively. Compression in which the resolution of frames varies is classified as scalable compression.
- Scalability is a technique that uses a base layer and an enhancement layer, allowing a decoder to observe the processing status, network status, and other conditions, and to perform selective decoding with respect to time, space, or the Signal to Noise Ratio (SNR). Of the scalabilities, Fine Granularity Scalability (FGS) encodes the base layer and the enhancement layer. After the enhancement layer has been encoded, the encoded enhancement layer may not be transmitted or decoded, depending on the transmission efficiency of the network or the status of the decoder. Through FGS, data can be suitably transmitted according to a bit rate.
- an aspect of the present invention is to provide an encoding and decoding method and apparatus, which can perform low-intensity deblocking in video encoding and decoding that supports FGS, thus improving a Peak Signal to Noise Ratio (PSNR).
- Another aspect of the present invention is to provide an encoding and decoding method and apparatus, which improve video quality while reducing data loss caused by deblocking.
- an FGS-based video encoding method capable of controlling deblocking, comprising the steps of (a) receiving original data of video and generating a base layer based on the original data, (b) obtaining a difference between the original data and data that are obtained by reconstructing the base layer and deblocking the reconstructed base layer, thus generating an enhancement layer, (c) generating a reconstructed frame based on data that are obtained by reconstructing the enhancement layer and data that are obtained by reconstructing and deblocking the base layer, and (d) deblocking the reconstructed frame at a lower intensity than that of the deblocking performed in step (b) or (c).
- an FGS-based video decoding method capable of controlling deblocking, comprising the steps of (a) receiving a video stream and extracting a base layer from the video stream, (b) extracting an enhancement layer from the video stream, (c) adding data that are obtained by reconstructing and deblocking the base layer to data that are obtained by reconstructing the enhancement layer, thus generating a reconstructed frame, and (d) deblocking the reconstructed frame at a lower intensity than that of the deblocking performed in step (c).
- FGS-based video encoder capable of controlling deblocking, comprising a base layer generation unit for generating a base layer based on original data of video, an enhancement layer generation unit for obtaining a difference between data that are obtained by reconstructing and deblocking the base layer, and the original data, thus generating an enhancement layer, a reconstructed frame generation unit for generating a reconstructed frame, based on data that are obtained by reconstructing the enhancement layer, and data that are obtained by reconstructing and deblocking the base layer, and a first deblocking unit for deblocking the reconstructed frame at a lower intensity than that of deblocking performed by the enhancement layer generation unit or the reconstructed frame generation unit.
- FGS-based video decoder capable of controlling deblocking, comprising a base layer extraction unit for extracting a base layer from a received video stream, an enhancement layer extraction unit for extracting an enhancement layer from the received video stream, a reconstructed frame generation unit for adding data that are obtained by reconstructing and deblocking the base layer, to data that are obtained by reconstructing the enhancement layer, thus generating a reconstructed frame, and a first deblocking unit for deblocking the reconstructed frame at a lower intensity than that of deblocking performed by the reconstructed frame generation unit.
- FIG. 1 is a diagram showing an apparatus for encoding video that supports FGS according to an embodiment of the present invention
- FIG. 2 is a diagram showing an apparatus for decoding video that supports FGS according to an embodiment of the present invention
- FIG. 3 is a diagram showing an apparatus for encoding video that supports FGS according to another embodiment of the present invention.
- FIG. 4 is a diagram showing an apparatus for decoding video that supports FGS according to another embodiment of the present invention.
- FIG. 5 is a flowchart showing a process of encoding the original data of a video according to an embodiment of the present invention
- FIG. 6 is a flowchart showing a process of decoding a received video stream according to an embodiment of the present invention
- FIG. 7 is a view showing an example of reconstruction results for a base layer and enhancement layers according to an embodiment of the present invention.
- FIGS. 8A and 8B are graphs showing the degree of improvement of a PSNR according to an embodiment of the present invention.
- FIGS. 9A and 9B are graphs showing the degree of improvement of a PSNR according to another embodiment of the present invention.
- the terms 'unit' and 'module' which are used in the exemplary embodiments of the present invention, denote software components, or hardware components, such as a Field-Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).
- Each module executes certain functions.
- a module can be implemented to reside in an addressable storage medium, or to run on one or more processors. Therefore, as an example, a module includes various components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, sub-routines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays and variables.
- the functions provided by the components and modules can be combined into a small number of components and modules, or can be separated into additional components or modules.
- components and modules can be implemented to drive one or more central processing units (CPUs) in a device or security multimedia card.
- FIG. 1 is a diagram showing an apparatus for encoding video that supports FGS according to an exemplary embodiment of the present invention.
- a base layer is generated using an original frame 101.
- the original frame 101 may be a frame extracted from a group of pictures (GOP), and it may be obtained by performing Motion-Compensated Temporal Filtering (MCTF) on the GOP.
- a transform & quantization unit 201 performs transformation and quantization.
- a base layer frame 501 is generated.
- an enhancement layer denotes data to be added to the base layer
- the difference between the original frame and the base layer frame is obtained. Residual data obtained from this difference is used later in such a way that a decoder obtains the original video data by adding the corresponding residual data to the reconstructed base layer frame.
- the data received by the decoder is inversely quantized and inversely transformed to approximate the original frame. Accordingly, the base layer frame, calculated by the transform & quantization unit 201, is inversely quantized and inversely transformed by an inverse quantization & inverse transform unit 301 in order to reconstruct the base layer frame.
- the decoder performs deblocking to eliminate the boundaries between the blocks constituting the reconstructed frame; correspondingly, deblocking is performed on the reconstructed frame by a deblocking unit 401.
- the difference between the reconstructed base layer frame 102 calculated by the inverse quantization & inverse transform unit 301 and the original frame 101 is obtained by a subtracter 11.
- Data obtained using the subtracter 11 is transformed and quantized by a transform & quantization unit 202 in order to generate a first enhancement layer frame 502.
- the first enhancement layer frame is added to the reconstructed base layer frame 102 in order to generate a second enhancement layer frame.
- the first enhancement layer frame is reconstructed using an inverse quantization & inverse transform unit 302 so that a first reconstructed enhancement layer frame 103 is generated.
- the frames 103 and 102 are added to each other by an adder 12 to generate a new frame 104.
- the difference between the frame 104 and the original frame 101 is obtained by a subtracter 11.
- Residual data, obtained by the difference is transformed and quantized by a transform & quantization unit 203 to generate a second enhancement layer frame 503.
- the above process is repeated so that a third enhancement layer frame, a fourth enhancement layer frame, and others can be successively generated.
- the base layer frame 501, the first enhancement layer frame 502 and the second enhancement layer frame 503 generated in this way can be transmitted in the form of a Network Abstraction Layer unit (NAL unit).
- the decoder can reconstruct data even if part of the received NAL unit is truncated.
- deblocking is performed on a reconstructed frame 106 that is obtained by adding the second reconstructed enhancement layer frame 105, reconstructed by an inverse quantization & inverse transform unit 303, to the frame 104 through the adder 12.
- a deblocking coefficient is decreased when deblocking is performed by a deblocking unit 402.
- if a high deblocking coefficient were assigned when deblocking is performed by the deblocking unit 402, an over-smoothing problem would occur.
- the deblocking coefficient is set to a low value, such as 1 or 2, for the deblocking unit 402 so as to prevent this problem, thus decreasing the degree of deblocking and preventing over-smoothing.
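The effect of a low deblocking coefficient can be illustrated with a minimal sketch. This is not the filter of any codec standard; `deblock_1d`, `BLOCK`, and the linear blend are assumptions made purely for illustration — a toy 1-D pass whose smoothing strength is controlled by the coefficient.

```python
BLOCK = 4  # samples per block (illustrative)

def deblock_1d(samples, coeff, max_coeff=4):
    """Blend the two samples on each block boundary; coeff=0 disables,
    coeff=max_coeff averages them fully (heaviest smoothing)."""
    out = list(samples)
    w = coeff / (2 * max_coeff)          # blend weight, at most 0.5
    for b in range(BLOCK, len(samples), BLOCK):
        left, right = out[b - 1], out[b]
        out[b - 1] = (1 - w) * left + w * right
        out[b]     = (1 - w) * right + w * left
    return out

row = [10, 10, 10, 10, 40, 40, 40, 40]   # hard edge at a block boundary
strong = deblock_1d(row, coeff=4)         # full-strength pass
weak   = deblock_1d(row, coeff=1)         # low-intensity pass (coeff 1)
```

With coefficient 4 the edge at the block boundary is averaged away entirely; with coefficient 1 most of the edge — which may carry enhancement-layer detail — survives, which is the over-smoothing trade-off described above.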
- the reconstructed frame, deblocked in this way, can be referred to when other frames are generated.
- a temporal sub-band picture is generated by performing MCTF on a GOP constituting video, and original data is extracted from the temporal sub-band picture.
- the original data is down-sampled from all of the data. If this data is transformed through a Discrete Cosine Transform (DCT) or a wavelet transform, and quantized and encoded, the base layer is generated.
- the transform & quantization units 201, 202 and 203 of FIG. 1 can perform lossy encoding. Part of the original information is lost because it is transformed through a DCT and quantized. Accordingly, this encoding is called lossy encoding.
- the transform & quantization unit 201 of FIG. 1 is an exemplary embodiment of a base layer generation unit for generating a base layer, and the transform & quantization units 202 and 203 for generating enhancement layers are exemplary embodiments of an enhancement layer generation unit.
- the reconstructed frames are indicated by reference numerals 102, 104, 106, 103 and 105, and the inverse quantization & inverse transform units 301, 302 and 303 for generating the reconstructed frames are exemplary embodiments of a reconstructed frame generation unit.
- FIG. 2 is a diagram of an apparatus for decoding video to support FGS according to an exemplary embodiment of the present invention.
- the base layer frame 501, the first enhancement layer frame 502 and the second enhancement layer frame 503, generated in the process shown in FIG. 1, are received, and since these frames are encoded data, they are decoded by inverse quantization & inverse transform units 311, 312 and 313. At this time, a reconstructed base layer frame 111 is obtained through a deblocking unit 411.
- Frames 111, 112 and 113 which have been decoded and reconstructed, are added to each other by an adder 12.
- Deblocking is performed on the added frames by a deblocking unit to eliminate the boundaries between blocks.
- the base layer frame has already been deblocked by the deblocking unit 411 so that a coefficient for deblocking, which is performed by the deblocking unit 412, decreases to 1 or 2 in the embodiment of the present invention. After deblocking has been completed in this way, a reconstructed original frame is reproduced.
- the inverse quantization & inverse transform unit 311 of FIG. 2 is an exemplary embodiment of a base layer extraction unit for extracting a base layer
- the inverse quantization & inverse transform units 312 and 313 for extracting enhancement layers are exemplary embodiments of an enhancement layer extraction unit.
- Reconstructed frames are indicated by reference numerals 111, 112 and 113, and the adder 12 for adding the frames to each other is an embodiment of a reconstructed frame generation unit.
- FGS uses an enhancement layer of a Scalable Video Model (SVM).
- a NAL unit obtained as a result of FGS can be truncated at a specific point, and frames can be reconstructed using data existing up to the truncation point.
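The truncation property can be sketched in a few lines; the layer values and the `decode` helper are hypothetical, standing in for entropy-decoded layer data:

```python
base = [100, 100, 100]                    # reconstructed base-layer samples
enh_layers = [[8, 0, -8], [1, -1, 2]]     # residuals of two enhancement layers

def decode(stream_layers):
    """Add however many layers survived truncation onto the base layer."""
    frame = list(stream_layers[0])
    for layer in stream_layers[1:]:
        frame = [f + e for f, e in zip(frame, layer)]
    return frame

full = decode([base] + enh_layers)        # nothing truncated
cut  = decode([base] + enh_layers[:1])    # truncated after the first layer
```

Cutting the stream after any layer still yields a decodable frame, just a coarser one — which is what lets the NAL unit be truncated at an arbitrary point.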
- data to be transmitted corresponds to a base layer, and other enhancement layers can be flexibly transmitted depending on the transmission status of a network. All enhancement layers have residual data occurring due to the difference between the enhancement layers and the base layer (or a reconstructed frame composed of the base layer and a previous enhancement layer).
- a quantization parameter QPi is a parameter for generating an i-th enhancement layer. As the magnitude of the quantization parameter increases, the step size increases. Therefore, at the time of generating enhancement layers, data can be obtained while the magnitude of the quantization parameter gradually decreases.
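The role of the gradually decreasing quantization parameter can be sketched as successive requantization of the remaining residual; the step sizes below are illustrative, not values from the specification:

```python
def quantize(x, step):
    """Uniform quantizer: snap x to the nearest multiple of step."""
    return round(x / step) * step

original = 123.0
steps = [32, 8, 2]            # QP0 > QP1 > QP2: step size shrinks per layer

recon, layers = 0.0, []
for step in steps:
    residual = original - recon          # what the previous layers missed
    layer = quantize(residual, step)     # coarse at first, finer later
    layers.append(layer)
    recon += layer                       # decoder adds layers cumulatively
```

Each added layer halves-or-better the remaining error, so receiving more layers monotonically refines the picture — the essence of SNR scalability.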
- the embodiments of FIGS. 1 and 2 perform deblocking at a low intensity when enhancement layers are directly encoded, or when enhancement layers are added to a base layer and decoded, thus reducing the information loss caused by excessive deblocking.
- the FGS described with reference to FIGS. 1 and 2 is based on SVM 3.0. An exemplary embodiment for implementing FGS using another method is described below.
- FIG. 3 is a diagram of an apparatus for encoding video to support FGS according to another embodiment of the present invention. Unlike FIG. 1, a base layer and an enhancement layer are generated, and the enhancement layer is implemented through a bit plane.
- original video data is transformed by a transform unit 221.
- as an example of the transform, a Discrete Cosine Transform (DCT) can be used.
- a base layer is generated if data obtained as the result of the DCT transform is quantized by a quantization unit 222, and the quantized data is encoded by an encoding unit 223 that uses entropy encoding or variable length coding (VLC).
- deblocking is performed in a decoder
- deblocking is also performed by a deblocking unit 421 in an encoding stage
- residual data, that is, the difference between the deblocked data and the original video data, is obtained.
- the residual data is encoded again by an encoding unit 224.
- in bit-plane coding, the residual data is encoded from the Most Significant Bit (MSB) down to the Least Significant Bit (LSB).
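The MSB-first ordering can be sketched with toy bit planes; plain integers stand in for quantized residual magnitudes, and `bitplanes` and `reconstruct` are illustrative names:

```python
def bitplanes(values, n_planes=4):
    """Yield one plane per iteration, MSB (plane n_planes-1) first."""
    for p in range(n_planes - 1, -1, -1):
        yield p, [(v >> p) & 1 for v in values]

def reconstruct(planes):
    """OR the received planes back together; fewer planes -> coarser values."""
    out = None
    for p, bits in planes:
        if out is None:
            out = [0] * len(bits)
        out = [o | (b << p) for o, b in zip(out, bits)]
    return out

residuals = [9, 4, 13, 2]
all_planes = list(bitplanes(residuals))
full = reconstruct(all_planes)           # every plane: exact values
coarse = reconstruct(all_planes[:2])     # stream cut after two MSB planes
```

Because the most significant planes are sent first, a stream truncated after any plane still yields the best approximation possible with the bits received.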
- the enhancement layer generated by the encoding unit 224 is transmitted with the base layer.
- deblocking is performed by a deblocking unit 422 in order to reconstruct the frame
- the deblocking is performed by the deblocking unit 422 after the deblocking for the base layer has been performed by the deblocking unit 421, a deblocking coefficient is decreased, thus preventing the occurrence of over- smoothing.
- FIG. 4 is a view of an apparatus for decoding video to support FGS according to another exemplary embodiment of the present invention. Unlike FIG. 2, a base layer and an enhancement layer are received. Data of the enhancement layer can be partially truncated within one enhancement layer depending on the receiving capability or decoding capability of the decoding stage (decoder).
- Both the base layer and the enhancement layer, transmitted in a stream format, are inverse quantized and inverse transformed
- the base layer is reconstructed by a deblocking unit 431 after passing through an inverse quantization unit 331 and an inverse transform unit 332.
- the enhancement layer is reconstructed through an inverse quantization unit 335 and an inverse transform unit 336
- the reconstructed base layer and enhancement layer are added to each other by an adder 12 so that a single reconstructed frame is created.
- deblocking is performed by a deblocking unit 432.
- FIG. 5 is a flowchart showing a process of encoding the original data of video according to an embodiment of the present invention
- MCTF is performed on original data constituting video so that a frame is generated in step S101.
- the original data may be a GOP composed of a plurality of frames.
- a motion vector is obtained through motion estimation, and a motion compensated frame is configured using the motion vector and a reference frame. Further, the difference between a current frame and the motion compensated frame is obtained so that a residual frame is obtained, thus reducing temporal redundancy.
- various methods such as fixed size block matching or Hierarchical Variable Size Block Matching (HVSBM), can be used.
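As a minimal sketch of fixed-size block matching (using a sum-of-absolute-differences criterion, a common choice not named in the text; 1-D signals stand in for frames, and all names are illustrative):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_offset(cur, ref, pos, size, search=2):
    """Find the displacement d in [-search, search] minimizing SAD between
    the current block at pos and the reference block at pos + d."""
    block = cur[pos:pos + size]
    return min(range(-search, search + 1),
               key=lambda d: sad(block, ref[pos + d:pos + d + size]))

ref = [0, 0, 5, 9, 5, 0, 0, 0]
cur = [0, 0, 0, 5, 9, 5, 0, 0]             # same pattern shifted right by one
d = best_offset(cur, ref, pos=3, size=3)   # the motion vector
residual = [c - r for c, r in zip(cur[3:6], ref[3 + d:6 + d])]
```

A perfect match leaves an all-zero residual, which is exactly the temporal-redundancy reduction described above.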
- MCTF is one method of providing temporal scalability, and some methods of implementing the MCTF include a method using a Haar filter, a Motion Adaptive Filtering (MAF) method, and a method using a 5/3 filter.
- the results, calculated by these methods, provide temporally scalable video data.
- a process of generating base layer data and enhancement layer data is executed.
- data is divided into a base layer and an enhancement layer.
- the base layer is extracted from a frame, on which the MCTF has been performed, through sampling in step S103.
- the base layer can be compressed using several schemes. In the case of motion compensation video encoding, a DCT can be used.
- the base layer becomes the basis for generating the enhancement layer so that various existing video encoding methods can be used.
- the base layer can be generated by the transform & quantization units 201, 202 and 203 of FIG 1, or the transform unit 221, the quantization unit 222 and the encoding unit 223 of FIG. 3.
- residual data, obtained from the difference between the base layer generated in step S103 and the original data generated in step S101, is extracted, so that the enhancement layer is generated in step S105.
- various fine-granular schemes can be used. For example, a wavelet method, a DCT method, and a matching-pursuit based method can be used. It is well known that, of these methods, the bitplane DCT coding method and the embedded zero-tree wavelet (EZW) method exhibit excellent performance.
- in step S105, an inverse quantization procedure to inversely quantize the quantized base layer may be further required.
- the base layer is reconstructed by the inverse quantization & inverse transform units 301, 302 and 303 of FIG. 1, or the inverse quantization unit 321 of FIG. 3, as described above.
- video data can be obtained by adding the enhancement layer to the base layer that has been inversely quantized; the base layer must be inversely quantized before the residual data is obtained in order to reduce data loss.
- deblocking can be performed after inverse quantization has been performed. Deblocking is used to smooth the boundaries between blocks constituting frames. The difference between the base layer, which was inversely quantized, and the original data, on which MCTF was performed in step S101, is obtained, so that the enhancement layer is generated, as described above.
- one or more enhancement layers may exist. As the number of enhancement layers increases, the unit of FGS is subdivided, thereby improving SNR scalability.
- the decoder can determine the number of enhancement layers to be received and decoded, depending on its decoding capability or reception capability.
- if base layer data and enhancement layer data are generated with respect to a single frame, a procedure of adding the base layer data to the enhancement layer data and generating a new reconstructed frame is required in step S110.
- the reconstructed frame becomes the basis for generating other frames, or is necessary for generating a predictive frame for motion estimation. In this case, since boundaries between blocks exist in the reconstructed frame, deblocking is performed to eliminate the boundaries between blocks.
- the reconstructed frame includes the base layer, which has been deblocked in step S105, so that deblocking is performed at a low intensity in step S115.
- if it is assumed that the base layer data is B, the enhancement layer data is E1, E2, ..., En, and the deblocking performed on the base layer data in step S105 is D1, then the reconstructed frame F, obtained in step S110, can be expressed as F = D1(B) + E1 + E2 + ... + En. Further, the result of the deblocking performed in step S115 is D2(D1(B) + E1 + E2 + ... + En). In this case, the deblocking coefficient df2 of D2 may be set to 1 or 2.
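The order of operations in the expression above — D1 on the base layer, summation with the enhancement layers, then the low-intensity D2 — can be traced with stand-in functions (the deblocking here is a no-op that merely records its coefficient):

```python
log = []

def deblock(frame, coeff):
    log.append(("deblock", coeff))   # record which pass ran at which strength
    return frame                     # stand-in: a real filter would smooth

B, E = "B", ["E1", "E2"]
F = [deblock(B, coeff=4)] + E        # D1(B) + E1 + ... + En
final = deblock(F, coeff=1)          # D2 with df2 = 1
```

The trace confirms the two-pass structure: one ordinary-strength pass on the base layer alone, then one low-intensity pass on the assembled frame.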
- FIG. 5 shows that, after original video data is transformed to provide temporal scalability, the transformed data is divided into base layer data and enhancement layer data to provide SNR scalability.
- this processing sequence need not necessarily be performed in this order.
- base layer data and enhancement layer data are obtained to provide SNR scalability for original video data, regardless of whether the corresponding data is used to provide temporal scalability
- a new transform procedure for providing another type of scalability may be conducted.
- a plurality of schemes may be employed, and the present invention is not limited to these schemes.
- FIG. 6 is a flowchart showing a process of decoding a received video stream according to an exemplary embodiment of the present invention.
- a process of a decoder receiving and decoding a video stream is described in the following.
- the decoder receives the video stream in step S201.
- the decoder extracts a base layer from the received video stream, and reconstructs the base layer in step S203.
- the reconstruction of the base layer is performed through an inverse quantization and an inverse transform.
- the reconstructed base layer is deblocked in order to be added to other enhancement layers in step S205.
- an enhancement layer is extracted from the received video stream, and the extracted enhancement layer is reconstructed in step S210.
- the reconstruction of the enhancement layer is also performed through an inverse quantization and an inverse transform.
- the base layer, deblocked in step S205, and the enhancement layer, reconstructed in step S210 are added to each other, so that a reconstructed frame is generated in step S220.
- deblocking is performed on the reconstructed frame with a deblocking coefficient of 1 or 2 in step S230. Since the base layer has already been deblocked once in step S205, deblocking is performed at a low intensity to prevent over-smoothing in step S230.
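Steps S201 to S230 can be sketched end to end on toy integer frames; every function here is an illustrative stand-in (the "deblocking" simply pulls samples toward the frame mean, scaled by the coefficient), not the actual inverse transform or filter:

```python
def inverse_transform(layer):         # S203/S210 stand-in: identity
    return list(layer)

def deblock(frame, coeff):
    """Toy smoothing: blend each sample toward the frame mean."""
    m = sum(frame) / len(frame)
    w = coeff / 8                     # coeff 1 or 2 -> only a gentle pull
    return [(1 - w) * x + w * m for x in frame]

base = inverse_transform([16, 16, 48, 48])
base = deblock(base, coeff=4)                 # S205: base-layer deblocking
enh  = inverse_transform([1, -1, 2, 0])       # S210: enhancement layer
frame = [b + e for b, e in zip(base, enh)]    # S220: add the layers
out = deblock(frame, coeff=1)                 # S230: low-intensity pass
```

The second pass barely moves the samples, so the enhancement-layer detail added at S220 is largely preserved rather than smoothed away.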
- FIG. 7 is a diagram showing an example of reconstruction results for a base layer and enhancement layers according to an embodiment of the present invention.
- FIG. 7 illustrates the generation of a reconstructed frame, which has been deblocked by the deblocking unit 402 of FIG. 1, or a reconstructed frame, which has been deblocked by the deblocking unit 412 of FIG. 2. Further, FIG. 7 also illustrates the generation of a reconstructed frame, which has been deblocked by the deblocking unit 422 of FIG. 3, or a reconstructed frame, which has been deblocked by the deblocking unit 432 of FIG. 4.
- a frame 151 denotes a frame obtained by deblocking a reconstructed base layer after reconstructing the base layer again. That is, the frame 151 is obtained by performing deblocking through the deblocking unit 401 of FIG. 1, the deblocking unit 411 of FIG. 2, the deblocking unit 421 of FIG. 3, or the deblocking unit 431 of FIG. 4.
- Reference numeral 152 or 153 is a frame obtained by reconstructing an enhancement layer.
- the reconstruction of the enhancement layer is performed by the inverse quantization & inverse transform units 302 and 303 of FIG. 1, the inverse quantization & inverse transform units 312 and 313 of FIG. 2, the decoding unit 325 of FIG. 3, or the inverse transform unit 336 of FIG. 4.
- the reconstructed enhancement layers and the reconstructed base layer, which has been deblocked, are added by an adder to produce a single frame 155.
- deblocking is performed again. As described above, if the deblocking coefficient is decreased and deblocking is performed, over-smoothing may be prevented. Through this process, the original frame 157 is reconstructed.
- the deblocking coefficient or deblocking filter is decreased to 1 or 2 to perform deblocking.
- deblocking coefficients ranging up to 4 currently exist. If the coefficient scale is subdivided and its maximum value is increased to 8 or 16, deblocking can be performed using a correspondingly low coefficient on the finer scale.
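The rescaling implied above can be sketched as a proportional mapping onto the finer scale; `rescale` is a hypothetical helper, not part of the described apparatus:

```python
def rescale(coeff, old_max=4, new_max=16):
    """Map a coefficient from a 0..old_max scale to a 0..new_max scale."""
    return coeff * new_max // old_max

# A "low" value of 1 on the 0..4 scale corresponds to 4 on a 0..16 scale,
# so the filtering intensity stays proportionally low after subdivision.
fine = rescale(1)
```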
- Table 1 shows results obtained according to an exemplary embodiment of the present invention.
- a football moving picture is sampled at frequencies of 7.5 Hz and 15 Hz.
- Table 1 shows the degree of improvement of the PSNR when the method of decreasing the deblocking coefficient, proposed in the present invention, is applied depending on the bit rate of a network. As shown in Table 1, the degree of improvement of the PSNR is high at a low rate (160 kbps and 192 kbps at 7.5 Hz, and 243 kbps at 15 Hz). The degree of improvement in Table 1 is displayed graphically in FIGS. 8A and 8B.
- FIG. 8A shows the degree of improvement of PSNR when video, sampled at a frequency of 7.5 Hz in the Quarter Common Intermediate Format (QCIF), is deblocked at a low intensity.
- FIG. 8B shows the degree of improvement of the PSNR when video, sampled at a frequency of 15 Hz in the QCIF, is deblocked at a low intensity. As shown in the two graphs, the degree of improvement of the PSNR is high when the bit rate is low.
- Table 2 shows results obtained according to an exemplary embodiment of the present invention.
- Table 2 shows the degree of improvement of the PSNR when the method of decreasing a deblocking coefficient, proposed in the exemplary embodiment of the present invention, is applied depending on the bit rate of a network.
- The degree of improvement of the PSNR is high at a low rate (588 kbps and 690 kbps at 15 Hz, and 920 kbps and 1124 kbps at 30 Hz).
- The degree of improvement in Table 2 is displayed graphically in FIGS. 9A and 9B.
- FIG. 9A shows the degree of improvement of the PSNR when video, sampled at a frequency of 15 Hz in the QCIF, is deblocked at a low intensity.
- FIG. 9B shows the degree of improvement of the PSNR when video, sampled at a frequency of 30 Hz in the QCIF, is deblocked at a low intensity.
- The degree of improvement of the PSNR is high when the bit rate is low. FGS is most needed when the bit rate of a network is low, so a method that yields a high PSNR improvement at low bit rates, as Tables 1 and 2 show for the method proposed in the present specification, delivers excellent image quality precisely where it is required.
- The present invention is advantageous in that it can perform deblocking at a low intensity in video encoding and decoding that supports FGS, thus improving the PSNR.
- The present invention is also advantageous in that it can improve the quality of video while reducing the data loss caused by deblocking.
Abstract
Disclosed herein is a Fine Granularity Scalability (FGS)-based video encoding and decoding method and apparatus capable of controlling deblocking. In the video encoding method according to the present invention, original video data is received and a base layer is generated based on the original data. Next, the difference between the original data and the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer is obtained, thus generating an enhancement layer. Then, a reconstructed frame is generated based on the data that is obtained by reconstructing the enhancement layer and the data that is obtained by reconstructing and deblocking the base layer. Finally, the reconstructed frame is deblocked at a lower intensity than that of the deblocking performed in the preceding steps.
Description
FINE GRANULARITY SCALABLE VIDEO ENCODING AND DECODING METHOD AND APPARATUS CAPABLE OF
CONTROLLING DEBLOCKING
Technical Field
[1] The present invention relates to a fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking
Background Art
[2] Since multimedia data is large, a high-capacity storage medium and a wide bandwidth are required to store and transmit it, respectively. Therefore, in order to transmit multimedia data including text, moving pictures (hereinafter referred to as 'video'), and audio, a compression and coding technique must be used. Of the methods of compressing multimedia data, video compression methods in particular can be classified into lossy/lossless compression, intra-frame/inter-frame compression, and symmetric/asymmetric compression, according to whether original data is lost, whether data is independently compressed for each frame, and whether the time required for compression is the same as the time required for reconstruction, respectively. Compression in which the resolution of frames varies is classified as scalable compression.
[3] The purpose of conventional video coding is to transmit information optimized for a given bit rate. However, in network video applications, such as streaming video over the Internet, the performance of a network is not constant, but changes according to the circumstances. Accordingly, flexible coding is required, in addition to the purpose of conventional video encoding, which is to perform optimal coding for a predetermined bit rate.
[4] Scalability is a technique that uses a base layer and an enhancement layer, and allows a decoder to observe its processing status, the network status, and other conditions, and to perform selective decoding with respect to time, space, or the Signal to Noise Ratio (SNR). Among scalabilities, Fine Granularity Scalability (FGS) encodes the base layer and the enhancement layer. After the enhancement layer has been encoded, the encoded enhancement layer may be left untransmitted or undecoded according to the transmission efficiency of the network or the status of the decoder. Through FGS, data can be transmitted at a rate suited to the available bit rate.
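The layered-refinement idea behind FGS can be sketched in a few lines of Python. This is an illustrative toy model, not the codec of this patent: "quantization" is plain scalar rounding, each enhancement layer is a successively finer residual, and names such as `encode_fgs` are invented for the sketch. The point it demonstrates is the one made above: the enhancement stream can be cut short and the result remains decodable, only coarser.

```python
# Toy FGS model: a coarse base layer plus successive residual refinements.
# Truncating the enhancement stream still yields a usable reconstruction.

def encode_fgs(original, base_step=8, n_layers=3):
    """Quantize samples into a coarse base layer plus refinement residuals."""
    base = [round(x / base_step) * base_step for x in original]
    layers, approx, step = [], base[:], base_step
    for _ in range(n_layers):
        step //= 2  # each layer refines with half the previous step size
        layer = [round((o - a) / step) * step for o, a in zip(original, approx)]
        approx = [a + d for a, d in zip(approx, layer)]
        layers.append(layer)
    return base, layers

def decode_fgs(base, layers, received):
    """Reconstruct using only the first `received` enhancement layers."""
    out = base[:]
    for layer in layers[:received]:
        out = [o + d for o, d in zip(out, layer)]
    return out

samples = [13, 250, 97, 42]
base, layers = encode_fgs(samples)
# Total absolute error never grows as more enhancement layers arrive.
errors = [sum(abs(s - r) for s, r in zip(samples, decode_fgs(base, layers, k)))
          for k in range(len(layers) + 1)]
assert all(e2 <= e1 for e1, e2 in zip(errors, errors[1:]))
```

With the final refinement step equal to 1, the chain reconstructs the integer samples exactly; cutting earlier leaves a coarser but valid frame, which mirrors how a truncated FGS stream behaves.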
Disclosure of Invention
Technical Problem
[5] Meanwhile, video encoding is performed to code and transmit a plurality of blocks
in a single screen. Accordingly, at the time of decoding video, visible boundaries between blocks may appear. The operation of smoothing the boundaries between blocks is called deblocking, and a component for smoothing the boundaries is called a deblocking filter.
[6] If the intensity of deblocking filtering is increased, the strength of smoothing boundaries is increased, so that the boundaries between blocks may disappear. However, information may disappear due to the deblocking filter, so that the selection of a deblocking filter greatly influences performance.
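The trade-off described in paragraph [6] can be made concrete with a toy one-dimensional filter. This is not the actual standardized deblocking filter; the `coeff` parameter merely stands in for the deblocking coefficient discussed in this document (normalized against a maximum of 4, the range the specification cites), and the function name is invented for the sketch.

```python
# Toy 1-D deblocking sketch: the coefficient controls how strongly the two
# pixels straddling a block boundary are pulled toward their average.
# Higher coefficients smooth harder, removing the visible edge but also
# erasing genuine detail -- the information-loss risk noted in the text.

def deblock_boundary(row, block_size=4, coeff=2, max_coeff=4):
    out = list(row)
    alpha = coeff / max_coeff          # smoothing strength in [0, 1]
    for b in range(block_size, len(row), block_size):
        p, q = out[b - 1], out[b]      # pixels on either side of the boundary
        avg = (p + q) / 2
        out[b - 1] = p + alpha * (avg - p)
        out[b] = q + alpha * (avg - q)
    return out

row = [10, 10, 10, 10, 50, 50, 50, 50]   # hard edge at the block boundary
soft = deblock_boundary(row, coeff=1)    # low intensity: edge mostly kept
hard = deblock_boundary(row, coeff=4)    # full intensity: edge averaged away
assert abs(soft[3] - soft[4]) > abs(hard[3] - hard[4])
```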
Technical Solution
[7] Therefore, an apparatus and method for efficiently using a deblocking filter and supporting FGS are required.
[8] Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an aspect of the present invention is to provide an encoding and decoding method and apparatus, which can perform low-intensity deblocking in video encoding and decoding that supports FGS, thus improving a Peak Signal to Noise Ratio (PSNR).
[9] Another aspect of the present invention is to provide an encoding and decoding method and apparatus, which improve video quality while reducing data loss caused by deblocking.
[10] The object of the present invention is not limited to the above aspects, and other aspects, not described, will be clearly understood by those skilled in the art from the following descriptions.
[11] In accordance with one aspect of the present invention to accomplish the above objects, there is provided a FGS-based video encoding method capable of controlling deblocking, comprising the steps of (a) receiving original data of video and generating a base layer based on the original data, (b) obtaining a difference between data that are obtained by reconstructing the base layer and deblocking the reconstructed base layer, and the original data, thus generating an enhancement layer, (c) generating a reconstructed frame, based on the data that are obtained by reconstructing the enhancement layer, and data that are obtained by reconstructing and deblocking the reconstructed base layer, and (d) deblocking the reconstructed frame at a lower intensity than that of deblocking that has been performed in step (b) or (c).
[12] In accordance with another aspect of the present invention, there is provided a
FGS-based video decoding method capable of controlling deblocking, comprising the steps of (a) receiving a video stream and extracting a base layer from the video stream, (b) extracting an enhancement layer from the video stream, (c) adding data that are obtained by reconstructing and deblocking the base layer, to data that are obtained by reconstructing the enhancement layer, thus generating a reconstructed frame, and (d)
deblocking the reconstructed frame at a lower intensity than that of deblocking performed in step (c).
[13] In accordance with a further aspect of the present invention, there is provided a
FGS-based video encoder capable of controlling deblocking, comprising a base layer generation unit for generating a base layer based on original data of video, an enhancement layer generation unit for obtaining a difference between data that are obtained by reconstructing and deblocking the base layer, and the original data, thus generating an enhancement layer, a reconstructed frame generation unit for generating a reconstructed frame, based on data that are obtained by reconstructing the enhancement layer, and data that are obtained by reconstructing and deblocking the base layer, and a first deblocking unit for deblocking the reconstructed frame at a lower intensity than that of deblocking performed by the enhancement layer generation unit or the reconstructed frame generation unit.
[14] In accordance with yet another aspect of the present invention, there is provided a
FGS-based video decoder capable of controlling deblocking, comprising a base layer extraction unit for extracting a base layer from a received video stream, an enhancement layer extraction unit for extracting an enhancement layer from the received video stream, a reconstructed frame generation unit for adding data that are obtained by reconstructing and deblocking the base layer, to data that are obtained by reconstructing the enhancement layer, thus generating a reconstructed frame, and a first deblocking unit for deblocking the reconstructed frame at a lower intensity than that of deblocking performed by the reconstructed frame generation unit.
Description of Drawings
[15] FIG. 1 is a diagram showing an apparatus for encoding video that supports FGS according to an embodiment of the present invention;
[16] FIG. 2 is a diagram showing an apparatus for decoding video that supports FGS according to an embodiment of the present invention;
[17] FIG. 3 is a diagram showing an apparatus for encoding video that supports FGS according to another embodiment of the present invention;
[18] FIG. 4 is a diagram showing an apparatus for decoding video that supports FGS according to another embodiment of the present invention;
[19] FIG. 5 is a flowchart showing a process of encoding the original data of a video according to an embodiment of the present invention;
[20] FIG. 6 is a flowchart showing a process of decoding a received video stream according to an embodiment of the present invention;
[21] FIG. 7 is a view showing an example of reconstruction results for a base layer and enhancement layers according to an embodiment of the present invention;
[22] FIGS. 8A and 8B are graphs showing the degree of improvement of a PSNR
according to an embodiment of the present invention; and
[23] FIGS. 9A and 9B are graphs showing the degree of improvement of a PSNR according to another embodiment of the present invention.
Mode for Invention
[24] Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the attached drawings. The features and advantages of the present invention will be more clearly understood from the exemplary embodiments, which will be described in detail in conjunction with the accompanying drawings. However, the present invention is not limited to the exemplary embodiments, which will be disclosed later, but can be implemented in various forms. The exemplary embodiments are provided to complete the disclosure of the present invention, and to sufficiently disclose the scope of the present invention to those skilled in the art. The present invention should be defined by the attached claims. The same reference numerals are used throughout the different drawings to designate the same or similar components.
[25] The terms 'unit' and 'module', which are used in the exemplary embodiments of the present invention, denote software components, or hardware components, such as a Field- Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). Each module executes certain functions. A module can be implemented to reside in an addressable storage medium, or to run on one or more processors. Therefore, as an example, a module includes various components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, sub-routines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays and variables. The functions provided by the components and modules can be combined into a small number of components and modules, or can be separated into additional components or modules. Moreover, components and modules can be implemented to drive one or more central processing units (CPUs) in a device or security multimedia card.
[26] FIG. 1 is a diagram showing an apparatus for encoding video that supports FGS according to an exemplary embodiment of the present invention. First, a base layer is generated using an original frame 101. The original frame 101 may be a frame extracted from a group of pictures (GOP), and it may be obtained by performing Motion-Compensated Temporal Filtering (MCTF) on the GOP. In order to extract a base layer from the original frame, a transform & quantization unit 201 performs transformation and quantization. As a result, a base layer frame 501 is generated.
[27] Since an enhancement layer denotes data to be added to the base layer, the difference between the original frame and the base layer frame is obtained. The residual data obtained from this difference is used later in such a way that a decoder reconstructs the original video data by adding the corresponding residual data to the reconstructed base layer frame. What the decoder obtains is the inversely quantized and inversely transformed version of the frame, not the original frame itself. Accordingly, the base layer frame, calculated by the transform & quantization unit 201, is inversely quantized and inversely transformed by an inverse quantization & inverse transform unit 301 in order to reconstruct the base layer frame.
[28] Further, the decoder performs deblocking to eliminate the boundaries between the blocks constituting the reconstructed frame; correspondingly, deblocking is performed on the reconstructed frame by a deblocking unit 401.
[29] The difference between the reconstructed base layer frame 102 calculated by the inverse quantization & inverse transform unit 301 and the original frame 101 is obtained by a subtracter 11. Data obtained using the subtracter 11 is transformed and quantized by a transform & quantization unit 202 in order to generate a first enhancement layer frame 502. The first enhancement layer frame is added to the reconstructed base layer frame 102 in order to generate a second enhancement layer frame. For this operation, the first enhancement layer frame is reconstructed using an inverse quantization & inverse transform unit 302 so that a first reconstructed enhancement layer frame 103 is generated. The frames 103 and 102 are added to each other by an adder 12 to generate a new frame 104. The difference between the frame 104 and the original frame 101 is obtained by a subtracter 11. Residual data, obtained by the difference, is transformed and quantized by a transform & quantization unit 203 to generate a second enhancement layer frame 503. The above process is repeated so that a third enhancement layer frame, a fourth enhancement layer frame, and others can be successively generated.
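The subtract, quantize, reconstruct, and add loop of paragraph [29] can be summarized as a short sketch. This is a simplified model, not the apparatus of FIG. 1: "transform & quantization" is reduced to scalar rounding with a step size, deblocking is a no-op stub, and all function names are invented for illustration.

```python
# Sketch of the layered encoding loop of FIG. 1 under simplifying
# assumptions: the transform is omitted, quantization is scalar rounding
# with step qp, and the deblocking units 401/402 are stubbed out.

def quantize(frame, qp):
    return [round(x / qp) for x in frame]

def dequantize(coded, qp):
    return [c * qp for c in coded]

def deblock(frame):
    return frame  # placeholder for the deblocking unit 401

def encode_layers(original, qps):
    """Base layer coded with qps[0]; each enhancement layer codes the
    residual left by the reconstruction so far, with a smaller QP."""
    coded = [quantize(original, qps[0])]              # base layer (501)
    recon = deblock(dequantize(coded[0], qps[0]))     # reconstructed frame 102
    for qp in qps[1:]:
        residual = [o - r for o, r in zip(original, recon)]
        layer = quantize(residual, qp)                # frames 502, 503, ...
        recon = [r + d for r, d in zip(recon, dequantize(layer, qp))]
        coded.append(layer)
    return coded, recon

frame = [37, 120, 64, 200]
coded, recon = encode_layers(frame, qps=[16, 4, 1])
assert recon == frame   # a final QP of 1 makes this toy chain exact
```

Each pass repeats the pattern described above: reconstruct what the decoder would see, subtract it from the original, and code only the remaining residual.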
[30] The base layer frame 501, the first enhancement layer frame 502 and the second enhancement layer frame 503 generated in this way can be transmitted in the form of a Network Abstraction Layer unit (NAL unit). When the frames are transmitted as a NAL unit, the decoder can reconstruct data even if part of the received NAL unit is truncated.
[31] Further, deblocking is performed on a reconstructed frame 106 that is obtained by adding the second reconstructed enhancement layer frame 105, reconstructed by an inverse quantization & inverse transform unit 303, to the frame 104 through the adder 12. In this case, since the base layer frame has already been deblocked by the deblocking unit 401, the deblocking coefficient is decreased when deblocking is performed by a deblocking unit 402. If a high deblocking coefficient were assigned for the deblocking performed by the deblocking unit 402, an over-smoothing problem would occur. In the exemplary embodiment of the present invention, the deblocking coefficient is therefore set to a low value, such as 1 or 2, for the deblocking unit 402, thus decreasing the degree of deblocking and preventing over-smoothing. The reconstructed frame, deblocked in this way, can be referred to when other frames are generated.
[32] As an example of video data in FIG. 1, a temporal sub-band picture is generated by performing MCTF on a GOP constituting the video, and original data is extracted from the temporal sub-band picture. The original data is down-sampled from all of the data. If this data is transformed through a Discrete Cosine Transform (DCT) or a wavelet transform, and then quantized and encoded, the base layer is generated.
[33] The transform & quantization units 201, 202 and 203 of FIG. 1 can perform lossy encoding. Part of the original information is lost because it is transformed through a DCT and quantized. Accordingly, this encoding is called lossy encoding.
[34] The transform & quantization unit 201 of FIG. 1 is an exemplary embodiment of a base layer generation unit for generating a base layer, and the transform & quantization units 202 and 203 for generating enhancement layers are exemplary embodiments of an enhancement layer generation unit. The reconstructed frames are indicated by reference numerals 102, 104, 106, 103 and 105, and the inverse quantization & inverse transform units 301, 302 and 303 for generating the reconstructed frames are exemplary embodiments of a reconstructed frame generation unit.
[35] FIG. 2 is a diagram of an apparatus for decoding video to support FGS according to an exemplary embodiment of the present invention. The base layer frame 501, the first enhancement layer frame 502 and the second enhancement layer frame 503, generated in the process shown in FIG. 1, are received, and since these frames are encoded data, they are decoded by inverse quantization & inverse transform units 311, 312 and 313. At this time, a reconstructed base layer frame 111 is obtained through the deblocking unit 411.
[36] Frames 111, 112 and 113, which have been decoded and reconstructed, are added to each other by an adder 12. Deblocking is performed on the added frames by a deblocking unit to eliminate the boundaries between blocks. In this case, the base layer frame has already been deblocked by the deblocking unit 411 so that a coefficient for deblocking, which is performed by the deblocking unit 412, decreases to 1 or 2 in the embodiment of the present invention. After deblocking has been completed in this way, a reconstructed original frame is reproduced.
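The decoder-side flow of paragraph [36] can be sketched in the same simplified style: dequantize each received layer, sum them with the adder, and apply a final low-intensity deblocking pass. The `weak_deblock` function below is an invented stand-in for the deblocking unit operating with a coefficient of 1 or 2; it is not the standardized filter.

```python
# Decoder-side sketch of FIG. 2: each received layer is dequantized
# (modeled as scaling by its QP), the layers are summed by the adder,
# and a mild neighbor-averaging pass stands in for the low-intensity
# deblocking performed by the deblocking unit 412.

def dequantize(coded, qp):
    return [c * qp for c in coded]

def weak_deblock(frame, alpha=0.25):
    """Low-intensity pass: nudge each interior sample toward its neighbors."""
    out = list(frame)
    for i in range(1, len(frame) - 1):
        out[i] = (1 - alpha) * frame[i] + alpha * (frame[i - 1] + frame[i + 1]) / 2
    return out

def decode(base, enhancements, qps):
    recon = dequantize(base, qps[0])                  # frame 111
    for layer, qp in zip(enhancements, qps[1:]):      # frames 112, 113, ...
        recon = [r + d for r, d in zip(recon, dequantize(layer, qp))]
    return weak_deblock(recon)                        # deblocking unit 412

out = decode(base=[2, 8, 4], enhancements=[[1, -2, 0]], qps=[16, 4])
assert out[0] == 36 and out[2] == 64   # the mild pass barely moves samples
```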
[37] The inverse quantization & inverse transform unit 311 of FIG. 2 is an exemplary embodiment of a base layer extraction unit for extracting a base layer, and the inverse quantization & inverse transform units 312 and 313 for extracting enhancement layers are exemplary embodiments of an enhancement layer extraction unit. Reconstructed frames are indicated by reference numerals 111, 112 and 113, and the adder 12 for
adding the frames to each other is an embodiment of a reconstructed frame generation unit.
[38] FGS, depicted in FIGS. 1 and 2, uses an enhancement layer of a Scalable Video
Model (SVM) 3.0. A NAL unit obtained as a result of FGS can be truncated at a specific point, and frames can be reconstructed using data existing up to the truncation point. In this case, data to be transmitted corresponds to a base layer, and other enhancement layers can be flexibly transmitted depending on the transmission status of a network. All enhancement layers have residual data occurring due to the difference between the enhancement layers and the base layer (or a reconstructed frame composed of the base layer and a previous enhancement layer). A quantization parameter QPi is a parameter for generating an i-th enhancement layer. As the magnitude of the quantization parameter increases, the step size increases. Therefore, at the time of generating enhancement layers, data can be obtained while the magnitude of the quantization parameter gradually decreases.
[39] If video is encoded through lossy encoding, the cost combines the lost data and the number of bits required for encoding. For example, if it is assumed that the lost data is E, the required bits are B, and a predetermined coefficient is λ, then the cost of encoding C is:
[40] C = E + λB
[41] Therefore, the criteria for determining the number of enhancement layers to be generated can be calculated based on the cost. In FIGS. 1 and 2, enhancement layers comprising two stages are generated.
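A worked example of the cost criterion C = E + λB from paragraph [40] may make the layer-count decision concrete. The (error, bits) pairs and the value of λ below are made-up numbers chosen only to illustrate the trade-off; they are not taken from the patent.

```python
# Worked example of C = E + lambda*B: each extra enhancement layer reduces
# the residual error E but adds bits B; choose the layer count minimizing C.

def cost(error, bits, lam):
    return error + lam * bits

# Hypothetical (error, bits) pairs for 0, 1, 2 and 3 enhancement layers.
candidates = [(100, 0), (40, 200), (15, 500), (5, 1000)]
lam = 0.1
costs = [cost(e, b, lam) for e, b in candidates]
best = min(range(len(costs)), key=costs.__getitem__)
assert best == 1   # with this lambda, one enhancement layer is cheapest
```

Raising λ penalizes bits more heavily and pushes the optimum toward fewer layers; lowering it favors fidelity and more layers.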
[42] The exemplary embodiments of the present invention shown in FIGS. 1 and 2 perform deblocking at a low intensity when enhancement layers are directly encoded, or when enhancement layers are added to a base layer and decoded, thus reducing information loss caused by excessive deblocking.
[43] FGS, described with reference to FIGS. 1 and 2, is applied to the SVM 3.0. An exemplary embodiment for implementing FGS using another method is described below.
[44] FIG. 3 is a diagram of an apparatus for encoding video to support FGS according to another embodiment of the present invention. Unlike FIG. 1, a base layer and an enhancement layer are generated, and the enhancement layer is implemented through a bit plane.
[45] In FIG. 3, original video data is transformed by a transform unit 221. As an example of transform, a Discrete Cosine Transform (DCT) can be used. A base layer is generated if data obtained as the result of the DCT transform is quantized by a quantization unit 222, and the quantized data is encoded by an encoding unit 223 that uses entropy encoding or variable length coding (VLC). Meanwhile, since the
difference between the base layer and the original video data is obtained to generate an enhancement layer, the data that has been quantized by the quantization unit 222 is inversely quantized by an inverse quantization unit 321. In this case, since deblocking is performed in a decoder, deblocking is also performed by a deblocking unit 421 in the encoding stage, and then residual data, the difference between the deblocked data and the original video data, is obtained. Then, the residual data is encoded again by an encoding unit 224. As in bit-plane coding, the respective bits (the Most Significant Bit (MSB), the next MSB, and so on down to the Least Significant Bit (LSB)) can be grouped in the form of bit planes and then encoded. The enhancement layer generated by the encoding unit 224 is transmitted with the base layer.
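The bit-plane grouping just described can be shown directly. The sketch below slices residual coefficients into planes from the MSB down to the LSB, so the stream can be cut after any plane and still decoded; the function names are invented, and the entropy coding step is omitted.

```python
# Bit-plane grouping sketch: residual values are sliced into planes from
# the MSB down to the LSB. A decoder that receives only the leading planes
# reads the missing low-order bits as zero and still recovers a coarse value.

def to_bitplanes(values, n_bits=8):
    """values -> list of planes, planes[0] holding each value's MSB."""
    return [[(v >> (n_bits - 1 - p)) & 1 for v in values]
            for p in range(n_bits)]

def from_bitplanes(planes, n_bits=8):
    """Rebuild values from however many leading planes were received."""
    values = [0] * len(planes[0])
    for p, plane in enumerate(planes):
        for i, bit in enumerate(plane):
            values[i] |= bit << (n_bits - 1 - p)
    return values

residuals = [200, 13, 89, 255]
planes = to_bitplanes(residuals)
assert from_bitplanes(planes) == residuals         # all 8 planes: exact
coarse = from_bitplanes(planes[:3])                # only the 3 MSB planes
assert all(abs(a - b) < 32 for a, b in zip(residuals, coarse))
```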
[46] Meanwhile, in order to obtain the reference information required to generate another frame, a reconstructed frame that can be obtained using the base layer and the enhancement layer is necessary. Deblocking is performed by a deblocking unit 422 in order to reconstruct the frame. Since this deblocking is performed by the deblocking unit 422 after the deblocking for the base layer has been performed by the deblocking unit 421, the deblocking coefficient is decreased, thus preventing the occurrence of over-smoothing.
[47] FIG. 4 is a view of an apparatus for decoding video to support FGS according to another exemplary embodiment of the present invention. Unlike FIG. 2, a base layer and an enhancement layer are received. Data of the enhancement layer can be partially truncated within one enhancement layer depending on the receiving capability or decoding capability of the decoding stage (decoder).
[48] Both the base layer and the enhancement layer, transmitted in a stream format, are inversely quantized and inversely transformed. The base layer is reconstructed by a deblocking unit 431 after passing through an inverse quantization unit 331 and an inverse transform unit 332. Further, the enhancement layer is reconstructed through an inverse quantization unit 335 and an inverse transform unit 336. The reconstructed base layer and enhancement layer are added to each other by an adder 12 so that a single reconstructed frame is created. At this time, deblocking is performed by a deblocking unit 432. However, since deblocking has already been performed on the base layer by the deblocking unit 431, the deblocking coefficient is decreased at the time of performing deblocking on the reconstructed frame through the deblocking unit 432, thus preventing the occurrence of over-smoothing. If over-smoothing occurs, data in the corresponding portion disappears, causing data loss.
[49] FIG. 5 is a flowchart showing a process of encoding the original data of video according to an embodiment of the present invention.
[50] MCTF is performed on the original data constituting the video so that a frame is generated in step S101. The original data may be a GOP composed of a plurality of frames. In
this process, a motion vector is obtained through motion estimation, and a motion-compensated frame is configured using the motion vector and a reference frame. Further, the difference between the current frame and the motion-compensated frame is obtained so that a residual frame is obtained, thus reducing temporal redundancy. As the motion estimation method, various methods, such as fixed-size block matching or Hierarchical Variable Size Block Matching (HVSBM), can be used. MCTF is one method of providing temporal scalability, and methods of implementing MCTF include a method using a Haar filter, a Motion Adaptive Filtering (MAF) method, and a method using a 5/3 filter. The results calculated by these methods provide temporally scalable video data. Thereafter, in order to provide SNR-scalable video data using this data, a process of generating base layer data and enhancement layer data is executed.
[51] In order to provide SNR scalability in a frame that is generated to be temporally scalable, such as through MCTF, the data is divided into a base layer and an enhancement layer. The base layer is extracted from the frame, on which the MCTF has been performed, through sampling in step S103. The base layer can be compressed using several schemes. In the case of motion-compensation video encoding, a DCT can be used. The base layer becomes the basis for generating the enhancement layer, so various existing video encoding methods can be used. The base layer can be generated by the transform & quantization units 201, 202 and 203 of FIG. 1, or the transform unit 221, the quantization unit 222 and the encoding unit 223 of FIG. 3.
[52] Next, residual data, obtained from the difference between the base layer generated in step S103 and the original data generated in step S101, is extracted, so that the enhancement layer is generated in step S105. In order to generate the enhancement layer, various fine-granular schemes can be used. For example, a wavelet method, a DCT method, and a matching-pursuit-based method can be used. It is well known that, of these methods, the bit-plane DCT coding method and the embedded zero-tree wavelet (EZW) method exhibit excellent performance.
[53] Meanwhile, in order to obtain the residual data in step S105, an inverse quantization procedure to inversely quantize the quantized base layer may be further required. For this operation, the base layer is reconstructed by the inverse quantization & inverse transform units 301, 302 and 303 of FIG. 1, or the inverse quantization unit 321 of FIG. 3, as described above.
[54] In the decoder, video data can be obtained by adding the enhancement layer to the base layer that has been inversely quantized; the base layer must be inversely quantized to obtain the residual data in order to reduce data loss. At this time, deblocking can be performed after the inverse quantization has been performed. Deblocking is used to smooth the boundaries between the blocks constituting frames. The difference between the base layer, which was inversely quantized, and the original data, on which MCTF was performed in step S101, is obtained, so that the enhancement layer is generated, as described above.
[55] In step S105, one or more enhancement layers may exist. As the number of enhancement layers increases, the unit of FGS is subdivided, thereby improving SNR scalability. The decoder can determine the number of enhancement layers to be received and decoded, depending on its decoding capability or reception capability.
[56] If base layer data and enhancement layer data are generated with respect to a single frame, a procedure of adding the base layer data to the enhancement layer data and generating a new reconstructed frame is required in step S110. The reconstructed frame becomes the basis for generating other frames, or is necessary for generating a predictive frame for motion estimation. In this case, since boundaries between blocks exist in the reconstructed frame, deblocking is performed to eliminate the boundaries between the blocks. The reconstructed frame includes the base layer, which has already been deblocked in step S105, so deblocking is performed at a low intensity in step S115.
[57] If deblocking were performed on the reconstructed frame using a high deblocking coefficient, data loss might increase, so the deblocking coefficient is decreased to about 1 or 2, and deblocking is thus performed at a low intensity.
[58] The result of the deblocking performed in FIG. 5 is expressed in the equations that follow.
[59] If it is assumed that the base layer data is B, the enhancement layer data is E1, E2, ..., En, and the deblocking performed on the base layer data in step S105 is D1, the reconstructed frame F, obtained in step S110, can be expressed as F = D1(B) + E1 + E2 + ... + En. Further, the result of the deblocking performed in step S115 is D2(D1(B) + E1 + E2 + ... + En). In this case, the deblocking coefficient df2 of D2 may be set to 1 or 2.
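The composition of the two deblocking passes can be written out directly. In the sketch below, D1 and D2 are invented stand-in filters parameterized by a deblocking coefficient (normalized against a subdivided maximum of 16, one of the ranges mentioned later in this description); the frame values are arbitrary illustrative numbers.

```python
# D2(D1(B) + E1 + ... + En) written out: the frame is the deblocked base
# layer plus every enhancement layer, and the second, weaker pass D2 is
# applied to the sum. make_deblock builds a toy averaging filter whose
# strength grows with the deblocking coefficient.

def make_deblock(coeff, max_coeff=16):
    alpha = coeff / max_coeff
    def deblock(frame):
        mean = sum(frame) / len(frame)
        return [(1 - alpha) * x + alpha * mean for x in frame]
    return deblock

D1 = make_deblock(coeff=8)   # stronger pass on the base layer
D2 = make_deblock(coeff=2)   # low-intensity second pass (df2 of 1 or 2)

B = [16, 128, 64, 192]                      # reconstructed base layer
E = [[4, -8, 0, 8], [1, 0, 0, 0]]           # enhancement layers E1, E2
F = [sum(vals) for vals in zip(D1(B), *E)]  # F = D1(B) + E1 + ... + En
out = D2(F)
# The weak second pass moves samples far less than the first pass did.
assert max(abs(o - f) for o, f in zip(out, F)) < max(
    abs(d - b) for d, b in zip(D1(B), B))
```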
[60] The exemplary embodiment of FIG. 5 shows that, after the original video data is transformed to provide temporal scalability, the transformed data is divided into base layer data and enhancement layer data to provide SNR scalability. However, this processing sequence is not mandatory. After base layer data and enhancement layer data are obtained to provide SNR scalability for the original video data, regardless of whether the corresponding data is used to provide temporal scalability, a new transform procedure for providing another type of scalability may be conducted. Further, for the MCTF procedure, a plurality of schemes may be employed, and the present invention is not limited to these schemes.
[61] FIG. 6 is a flowchart showing a process of decoding a received video stream according to an exemplary embodiment of the present invention. In detail, a process of
a decoder receiving and decoding a video stream is described in the following.
[62] The decoder receives the video stream in step S201. The decoder extracts a base layer from the received video stream, and reconstructs the base layer in step S203. The reconstruction of the base layer is performed through an inverse quantization and an inverse transform. The reconstructed base layer is deblocked in order to be added to other enhancement layers in step S205. Further, an enhancement layer is extracted from the received video stream, and the extracted enhancement layer is reconstructed in step S210. The reconstruction of the enhancement layer is also performed through an inverse quantization and an inverse transform. The base layer, deblocked in step S205, and the enhancement layer, reconstructed in step S210, are added to each other, so that a reconstructed frame is generated in step S220. Further, deblocking is performed on the reconstructed frame with a deblocking coefficient of 1 or 2 in step S230. Since the base layer has already been deblocked once in step S205, deblocking is performed at a low intensity to prevent over-smoothing in step S230.
[63] FIG. 7 is a diagram showing an example of reconstruction results for a base layer and enhancement layers according to an embodiment of the present invention. FIG. 7 illustrates the generation of a reconstructed frame, which has been deblocked by the deblocking unit 402 of FIG. 1, or a reconstructed frame, which has been deblocked by the deblocking unit 412 of FIG. 2. Further, FIG. 7 also illustrates the generation of a reconstructed frame, which has been deblocked by the deblocking unit 422 of FIG. 3, or a reconstructed frame, which has been deblocked by the deblocking unit 432 of FIG. 4.
[64] A frame 151 denotes a frame obtained by reconstructing the base layer and then deblocking the reconstructed base layer. That is, the frame 151 is obtained by performing deblocking through the deblocking unit 401 of FIG. 1, the deblocking unit 411 of FIG. 2, the deblocking unit 421 of FIG. 3, or the deblocking unit 431 of FIG. 4. Reference numerals 152 and 153 denote frames obtained by reconstructing enhancement layers. The reconstruction of the enhancement layer is performed by the inverse quantization & inverse transform units 302 and 303 of FIG. 1, the inverse quantization & inverse transform units 312 and 313 of FIG. 2, the decoding unit 325 of FIG. 3, or the inverse transform unit 336 of FIG. 4. The reconstructed enhancement layers and the deblocked reconstructed base layer are added by an adder to produce a single frame 155. In this case, deblocking is performed again. As described above, if this second deblocking is performed with a decreased deblocking coefficient, over-smoothing may be prevented. Through this process, the original frame 157 is reconstructed.
[65] In the exemplary embodiments of low-intensity deblocking described with reference to FIGS. 5 and 6, the deblocking coefficient, that is, the deblocking filter intensity, is decreased to 1 or 2 before deblocking is performed. Currently, deblocking coefficients range up to a maximum of 4. If the coefficient scale is subdivided so that its maximum value is increased to 8 or 16, deblocking is performed using a correspondingly low deblocking coefficient on the finer scale.
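One plausible reading of "a low deblocking coefficient corresponding to the increased coefficient" is a proportional rescaling between the coarse and the subdivided scale; the function name and the proportional mapping below are our illustrative assumptions:

```python
def rescale_coefficient(coeff, old_max=4, new_max=16):
    """Map a deblocking coefficient on the 0..old_max scale to the
    equivalent relative intensity on a subdivided 0..new_max scale."""
    return coeff * new_max // old_max

# A low-intensity coefficient of 1 or 2 on the conventional 0..4 scale
# corresponds to 4 or 8 on a finer 0..16 scale, so the finer scale also
# allows intermediate, even lower intensities (e.g. 1, 2 or 3 out of 16).
```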
[66]
Table 1 - Degree of Improvement of PSNR of Video Sequence
[67] Table 1 shows results obtained according to an exemplary embodiment of the present invention. Here, a football moving picture is sampled at frequencies of 7.5 Hz and 15 Hz. Table 1 shows the degree of improvement of PSNR when the method of decreasing the deblocking coefficient, proposed in the present invention, is applied depending on the bit rate of a network. As shown in Table 1, it can be seen that the degree of improvement of the PSNR is high at a low rate (160 kbps and 192 kbps at 7.5 Hz, and 243 kbps at 15 Hz). The degree of improvement of Table 1 is displayed graphically in FIGS. 8A and 8B. FIG. 8A shows the degree of improvement of PSNR when video, sampled at a frequency of 7.5 Hz in the Quarter Common Intermediate Format (QCIF), is deblocked at a low intensity. FIG. 8B shows the degree of improvement of the PSNR when video, sampled at a frequency of 15 Hz in the QCIF, is deblocked at a low intensity. As shown in the two graphs, the degree of improvement of the PSNR is high when the bit rate is low.
[68]
Table 2 - Degree of Improvement of PSNR of Video Sequence
[69] Table 2 shows results obtained according to an exemplary embodiment of the present invention, in the case where a football moving picture is sampled at frequencies of 15 Hz and 30 Hz. Table 2 shows the degree of improvement of the PSNR when the method of decreasing a deblocking coefficient, proposed in the exemplary embodiment of the present invention, is applied depending on the bit rate of a network. As shown in Table 2, the degree of improvement of the PSNR is high at a low rate (588 kbps and 690 kbps at 15 Hz, and 920 kbps and 1124 kbps at 30 Hz). The degree of improvement in Table 2 is displayed graphically in FIGS. 9A and 9B. FIG. 9A shows the degree of improvement of the PSNR when video, sampled at a frequency of 15 Hz in the QCIF, is deblocked at a low intensity. FIG. 9B shows the degree of improvement of the PSNR when video, sampled at a frequency of 30 Hz in the QCIF, is deblocked at a low intensity. As shown in the two graphs, the degree of improvement of the PSNR is high when the bit rate is low. That is, FGS is needed precisely when the bit rate of a network is low; because the method proposed in the present specification yields a high degree of PSNR improvement at low bit rates, as shown in Tables 1 and 2, it provides excellent image quality where FGS is required.
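The improvement figures in Tables 1 and 2 follow the standard PSNR definition. A minimal sketch of how such an improvement would be computed (the function names are ours, and the flat-list image representation is a simplifying assumption):

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-size images
    given as flat lists of pixel values."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_value ** 2 / mse)

def psnr_improvement(original, baseline_recon, proposed_recon):
    """Improvement as reported in Tables 1 and 2: PSNR of the
    low-intensity deblocking result minus PSNR of the conventional one."""
    return psnr(original, proposed_recon) - psnr(original, baseline_recon)
```

For example, halving every pixel error raises the PSNR by 20·log10(2) ≈ 6.02 dB, which is the kind of per-bit-rate delta the tables tabulate.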
Industrial Applicability
[70] Accordingly, the present invention is advantageous in that it can perform deblocking at a low intensity in video encoding and decoding that support FGS, thus improving the PSNR.
[71] Further, the present invention is advantageous in that it can improve the quality of video while reducing data loss caused by deblocking.
[72] Although the exemplary embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims. Therefore, it should be understood that the above embodiments are exemplary in all aspects and are not restrictive. The scope of the present invention should be defined by the attached claims rather than by the detailed description, and all modifications, equivalents and substitutions derived from the meaning and scope of the claims and concepts equivalent thereto are included in the spirit and scope of the present invention defined by the attached claims.
Claims
[1] A Fine Granularity Scalability (FGS)-based video encoding method capable of controlling deblocking, comprising: receiving original data of video and generating a base layer based on the original data; obtaining the difference between the original data and data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer, thus generating an enhancement layer; generating a reconstructed frame, based on data that is obtained by reconstructing the enhancement layer, and the data that is obtained by deblocking the reconstructed base layer; and deblocking the reconstructed frame at an intensity different from a deblocking intensity used in deblocking the reconstructed base layer.
[2] The FGS-based video encoding method according to claim 1, wherein a deblocking intensity used in deblocking the reconstructed frame is lower than the deblocking intensity used in deblocking the reconstructed base layer.
[3] The FGS-based video encoding method according to claim 1, wherein a deblocking coefficient used in deblocking the reconstructed frame is set to 1 or 2.
[4] The FGS-based video encoding method according to claim 1, wherein generating of the base layer comprises transforming and quantizing the original data.
[5] The FGS-based video encoding method according to claim 4, wherein the transformation comprises a Discrete Cosine Transform (DCT).
[6] The FGS-based video encoding method according to claim 4, wherein reconstructing of the base layer comprises inverse transforming and inverse quantizing the original data which is transformed and quantized.
[7] The FGS-based video encoding method according to claim 1, wherein generating of the enhancement layer comprises transforming and quantizing the difference between the original data and the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer.
[8] The FGS-based video encoding method according to claim 1, wherein generating of the enhancement layer comprises generating two or more enhancement layers.
[9] The FGS-based video encoding method according to claim 8, wherein generating of the enhancement layer comprises: encoding residual data generated by the difference between the original data and the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer, thus generating a first enhancement layer; and encoding a residual frame generated by the difference between a reconstructed frame and the original data, thus generating a second enhancement layer, the reconstructed frame being obtained by adding the data that is obtained by reconstructing the first enhancement layer to the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer.
[10] The FGS-based video encoding method according to claim 1, wherein the original video data is data obtained by performing Motion-Compensated Temporal Filtering (MCTF) on a Group of Pictures (GOP).
[11] A Fine Granularity Scalability (FGS)-based video decoding method capable of controlling deblocking, comprising: receiving a video stream and extracting a base layer from the video stream; extracting an enhancement layer from the video stream; adding data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer to data that is obtained by reconstructing the enhancement layer, thus generating a reconstructed frame; and deblocking the reconstructed frame at an intensity different from a deblocking intensity used in deblocking the reconstructed base layer.
[12] The FGS-based video decoding method according to claim 11, wherein the deblocking intensity used in the deblocking the reconstructed frame is lower than the deblocking intensity used in deblocking the reconstructed base layer.
[13] The FGS-based video decoding method according to claim 11, wherein a deblocking coefficient used in deblocking the reconstructed frame is set to 1 or 2.
[14] The FGS-based video decoding method according to claim 11, wherein reconstructing of the base layer comprises inverse transforming and inverse quantizing the base layer.
[15] The FGS-based video decoding method according to claim 14, wherein the inverse transformation comprises an Inverse Discrete Cosine Transform (IDCT).
[16] The FGS-based video decoding method according to claim 11, wherein reconstructing of the enhancement layer comprises inverse transforming and inverse quantizing the enhancement layer.
[17] The FGS-based video decoding method according to claim 11, wherein extracting the enhancement layer comprises extracting two or more enhancement layers.
[18] The FGS-based video decoding method according to claim 17, wherein extracting of the two or more enhancement layers comprises: extracting a first enhancement layer from the video stream; and extracting a second enhancement layer from remaining data of the video stream after extracting the first enhancement layer from the video stream.
[19] A Fine Granularity Scalability (FGS)-based video encoder capable of controlling deblocking, the encoder comprising: a base layer generation unit which generates a base layer based on original video data; an enhancement layer generation unit which obtains a difference between the original data and data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer, thus generating an enhancement layer; a reconstructed frame generation unit which generates a reconstructed frame based on data that is obtained by reconstructing the enhancement layer, and the data that is obtained by deblocking the reconstructed base layer; and a deblocking unit which deblocks the reconstructed frame at an intensity different from a deblocking intensity used in deblocking the reconstructed base layer.
[20] The FGS-based video encoder according to claim 19, wherein the deblocking unit which deblocks the reconstructed frame is configured to have a deblocking intensity lower than the deblocking intensity used in deblocking the reconstructed base layer.
[21] The FGS-based video encoder according to claim 19, wherein the deblocking unit which deblocks the reconstructed frame is configured to have a deblocking coefficient set to 1 or 2.
[22] The FGS-based video encoder according to claim 19, wherein the base layer generation unit is configured to transform and quantize the original data.
[23] The FGS-based video encoder according to claim 22, wherein transforming of the original data comprises a Discrete Cosine Transform (DCT).
[24] The FGS-based video encoder according to claim 22, wherein one of the base layer generation unit and the enhancement layer generation unit is configured to inverse-transform and inverse-quantize the original data in reconstructing the base layer.
[25] The FGS-based video encoder according to claim 19, wherein the enhancement layer generation unit is configured to transform and quantize the difference between the original data and the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer.
[26] The FGS-based video encoder according to claim 19, wherein the enhancement layer generation unit is configured to generate two or more enhancement layers.
[27] The FGS-based video encoder according to claim 26, wherein the enhancement layer generation unit comprises: a first enhancement layer generation unit which encodes residual data generated by the difference between the original data and the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer, thus generating a first enhancement layer; and a second enhancement layer generation unit which encodes a residual frame generated by the difference between a reconstructed frame and the original data, thus generating a second enhancement layer, the reconstructed frame being obtained by adding the data that is obtained by reconstructing the first enhancement layer to the data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer.
[28] The FGS-based video encoder according to claim 19, wherein the original video data is data obtained by performing Motion-Compensated Temporal Filtering (MCTF) on a Group of Pictures (GOP).
[29] A Fine Granularity Scalability (FGS)-based video decoder capable of controlling deblocking, the decoder comprising: a base layer extraction unit which extracts a base layer from a received video stream; an enhancement layer extraction unit which extracts an enhancement layer from the received video stream; a reconstructed frame generation unit which adds data that is obtained by reconstructing the base layer and deblocking the reconstructed base layer to data that is obtained by reconstructing the enhancement layer, thus generating a reconstructed frame; and a deblocking unit which deblocks the reconstructed frame at an intensity different from a deblocking intensity used in deblocking the reconstructed base layer.
[30] The FGS-based video decoder according to claim 29, wherein the deblocking unit which deblocks the reconstructed frame is configured to have a deblocking intensity lower than the deblocking intensity used in deblocking the reconstructed base layer.
[31] The FGS-based video decoder according to claim 29, wherein the deblocking unit which deblocks the reconstructed frame is configured to have a deblocking coefficient set to 1 or 2.
[32] The FGS-based video decoder according to claim 29, the decoder further comprising: an inverse quantization unit which inverse-quantizes the base layer, and an inverse transform unit which inverse-transforms the inverse-quantized base layer, wherein the data obtained by reconstructing the base layer and deblocking the reconstructed base layer is generated by deblocking the inverse-transformed base layer.
[33] The FGS-based video decoder according to claim 32, wherein the inverse transformation comprises an Inverse Discrete Cosine Transform (IDCT).
[34] The FGS-based video decoder according to claim 29, the decoder further comprising: an inverse quantization unit which inverse-quantizes the enhancement layer; and an inverse transform unit which inverse-transforms the inverse-quantized enhancement layer, wherein the data obtained by reconstructing the enhancement layer is generated based on the inverse-transformed enhancement layer.
[35] The FGS-based video decoder according to claim 29, wherein the enhancement layer extraction unit is configured to extract two or more enhancement layers.
[36] The FGS-based video decoder according to claim 35, wherein the enhancement layer extraction unit comprises: a first enhancement layer extraction unit which extracts a first enhancement layer from the video stream; and a second enhancement layer extraction unit which extracts a second enhancement layer from remaining data of the video stream after extracting the first enhancement layer from the video stream.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US64458205P | 2005-01-19 | 2005-01-19 | |
KR1020050011423A KR100703744B1 (en) | 2005-01-19 | 2005-02-07 | Method and apparatus for fine-granularity scalability video encoding and decoding which enable deblock controlling |
PCT/KR2006/000168 WO2006078107A2 (en) | 2005-01-19 | 2006-01-17 | Fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1839441A2 true EP1839441A2 (en) | 2007-10-03 |
Family
ID=36692641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP06715725A Withdrawn EP1839441A2 (en) | 2005-01-19 | 2006-01-17 | Fine granularity scalable video encoding and decoding method and apparatus capable of controlling deblocking |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1839441A2 (en) |
WO (1) | WO2006078107A2 (en) |
Non-Patent Citations (1)
Title |
---|
See references of WO2006078107A3 * |
Also Published As
Publication number | Publication date |
---|---|
WO2006078107A2 (en) | 2006-07-27 |
WO2006078107A3 (en) | 2006-11-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20070704 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20100803 |