JP4522951B2

JP4522951B2 - Moving picture encoding method and apparatus, decoding method and apparatus, moving picture processing program, and computer-readable recording medium

Info

Publication number: JP4522951B2
Application number: JP2006007716A
Authority: JP
Inventors: 孝之仲地; 知子澤邊
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-01-16
Filing date: 2006-01-16
Publication date: 2010-08-11
Anticipated expiration: 2026-01-16
Also published as: JP2007189622A

Description

The present invention relates to a moving picture encoding method and apparatus, a decoding method and apparatus, a moving picture processing program, and a computer-readable recording medium, and in particular to such methods, apparatuses, program, and recording medium for efficiently transmitting and decoding moving pictures.

Lossless coding, which can reproduce the original image without distortion, is needed particularly for still images in fields that demand high-definition images, such as medicine, art, and printing, and various methods have been proposed (see, for example, Non-Patent Documents 1, 2, and 3).

For moving pictures, on the other hand, research has concentrated on lossy coding, but in recent years lossless moving picture coding has come to be required for archiving and editing in digital broadcasting and digital cinema.

Various lossless coding methods have been proposed for still images, and the international standards include JPEG-LS (see Non-Patent Document 1) and the JPEG2000 lossless mode (see Non-Patent Document 2).

JPEG-LS is based on nonlinear predictive coding and context modeling, so its prediction can follow local changes in image characteristics; compared with the JPEG2000 lossless mode it is faster and achieves better coding efficiency (see, for example, Non-Patent Document 4).

JPEG2000, on the other hand, offers various advanced functions that JPEG-LS lacks, including resolution scalability, SNR scalability, and an ROI (Region Of Interest) function that allows the compression ratio to vary from one image region to another.

From the viewpoint of coding efficiency, it is desirable to exploit inter-frame correlation, which is expected to yield a higher compression ratio. One lossless video coding method that exploits inter-frame correlation has no resolution scalability function, but improves coding efficiency by using motion compensation to remove inter-frame correlation (see, for example, Non-Patent Documents 5 and 6).

Another lossless video coding method that exploits inter-frame correlation provides resolution scalability together with high coding efficiency (see, for example, Non-Patent Document 7). It is a hybrid coding scheme that combines the wavelet transform with predictive coding: it adaptively removes intra-frame and inter-frame correlation in the lowest frequency band of the wavelet decomposition, reducing entropy efficiently at low computational cost. It exploits the property that, in natural images, inter-frame correlation is strong in the lowest frequency band and weak in the other bands. This method is called Multiresolution Lossless Video Coding (MLVC).

Although inter-frame correlation tends to be weak in the higher frequency bands, its strength is not constant and differs from image to image. A method has therefore been proposed that extends MLVC and improves coding efficiency by adaptively changing, according to the statistical properties of the target image, the bands in which intra-frame and inter-frame correlation are removed (see, for example, Non-Patent Document 8).

This method is called Extended Multiresolution Lossless Video Coding (Ex-MLVC).

Non-Patent Document 1: ISO/IEC JTC1/SC29 FCD14495, Lossless and near-lossless coding of continuous tone still images (JPEG-LS), ISO/IEC JTC1/SC29/WG1, July 1997.
Non-Patent Document 2: ISO/IEC JTC1/SC29 WG1 N1646R, JPEG2000 Part I Final Committee Draft Ver. 1.0, ISO/IEC JTC1/SC29 WG1, March 2000.
Non-Patent Document 3: T. Nakachi, T. Fujii and J. Suzuki, "Pel adaptive predictive coding based on image segmentation for lossless compression," IEICE Trans. on Fundamentals, vol. E82-A, no. 6, June 1999.
Non-Patent Document 4: D. Santa-Cruz and T. Ebrahimi, "A study of JPEG2000 still image coding versus other standards," vol. 2, pp. 673-676, Tampere, Finland, EUSIPCO-2000, Sep. 2000.
Non-Patent Document 5: Naoki Ono and Yoshiyuki Yashima, "Performance Evaluation of Lossless Coding Using Motion Compensation," IEICE Technical Report, IE95-18, pp. 21-27, 1995.
Non-Patent Document 6: Junichi Nakajima, Yoshiyuki Yashima and Naoki Kobayashi, "Examination of hierarchical lossless coding based on MPEG-2 coding parameters," IEICE Technical Report, IE99-124, pp. 25-30, 2000.
Non-Patent Document 7: T. Nakachi, T. Sawabe, T. Fujii and T. Fujii, "Multiresolution lossless video coding using inter/intra frame adaptive prediction," IEICE Trans. on Fundamentals, vol. E85-A, no. 8, pp. 1822-1830, Aug. 2002.
Non-Patent Document 8: T. Nakachi, T. Sawabe and T. Fujii, "A Study on Multiresolution Lossless Video Coding Using Inter/Intra Frame Adaptive Prediction," VCIP 2003, vol. 5150, pp. 1685-1696, June 2003.

However, although both MLVC and Ex-MLVC provide spatial resolution scalability, neither specifies an efficient transmission order for the encoded data. Here, spatial resolution scalability means that images of different spatial resolutions can be decoded progressively from a single encoded bitstream.

The present invention has been made in view of the above, and its object is to provide a moving picture encoding method and apparatus, a decoding method and apparatus, a moving picture processing program, and a computer-readable recording medium that can transmit encoded data efficiently in lossless moving picture encoding and decoding with spatial resolution scalability.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (Claim 1) is a moving picture encoding method in which an input original image is divided into bands by a wavelet transform (step 1) and intra-frame and inter-frame prediction is performed pixel by pixel for each divided band, the method comprising,
for the case where the decoding side reproduces the image at the same resolution as the original image:
an encoding step in which encoding means performs inter-frame prediction (step 3) when the inter-frame correlation of the signals neighboring the pixel to be encoded, in a band divided by band dividing means, is larger than a predetermined threshold (step 2, Yes), performs intra-frame prediction (step 4) when the inter-frame correlation is smaller than the predetermined threshold (step 2, No), and encodes the result; and,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to N, for each encoded frame,
a first transmission step (step 5) of transmitting the encoded data of the lowest frequency band LL_N first; and
a second transmission step (step 6) of transmitting the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the lowest frequency band, second and later, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.
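The transmission order of Claim 1 can be illustrated with a short sketch. It assumes that, for each subband of a frame, the encoder has counted how many pixels were coded with inter-frame prediction and how many with intra-frame prediction; the function name, the `counts` dictionary layout, and the treatment of a zero intra count as an infinite ratio are illustrative assumptions, not details given in the patent.

```python
from typing import Dict, List, Tuple

def full_resolution_order(counts: Dict[str, Tuple[int, int]], n_levels: int) -> List[str]:
    """Return the subband transmission order for full-resolution playback.

    counts maps a subband name such as "HL1" to (inter_count, intra_count),
    the number of pixels predicted inter-frame and intra-frame in that band
    (hypothetical bookkeeping, assumed to be available from the encoder).
    """
    lowest = f"LL{n_levels}"

    def ratio(band: str) -> float:
        inter, intra = counts[band]
        # Assumption: a band with no intra-predicted pixels ranks first.
        return inter / intra if intra else float("inf")

    higher = [band for band in counts if band != lowest]
    # The lowest band goes first; the rest follow in descending inter/intra ratio.
    return [lowest] + sorted(higher, key=ratio, reverse=True)

counts = {
    "LL2": (900, 100), "HL2": (300, 700), "LH2": (500, 500), "HH2": (100, 900),
    "HL1": (200, 800), "LH1": (400, 600), "HH1": (0, 1000),
}
print(full_resolution_order(counts, 2))
```

A decoder following Claim 2 would consume the bands in the same order, so the ordering (or the counts it is derived from) has to be known to both sides; how that is signaled is not covered by this sketch.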

The present invention (Claim 2) is a moving picture decoding method for decoding data encoded by the moving picture encoding method according to Claim 1 so as to reproduce it at the same resolution as the original image, the method comprising,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to N, in decoding means, for each encoded frame of the encoded data,
a first decoding step (step 7) of decoding the encoded data of the lowest frequency band LL_N first; and
a second decoding step (step 8) of decoding the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the lowest frequency band, second and later, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

The present invention (Claim 3) is a moving picture encoding method in which an input original image is divided into bands by a wavelet transform and intra-frame and inter-frame prediction is performed pixel by pixel for each divided band, the method comprising,
for the case where the decoding side reproduces the image progressively, one resolution step of 1/2ⁿ times the original image at a time:
an encoding step in which encoding means performs inter-frame prediction when the inter-frame correlation of the signals neighboring the pixel to be encoded, in a band divided by band dividing means, is larger than a predetermined threshold, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and encodes the result; and,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to (N−1), for each encoded frame,
a first transmission step of transmitting the encoded data of the lowest frequency band LL_N first;
a second transmission step of transmitting the encoded data of the three level-N frequency bands HL_N, LH_N, and HH_N second to fourth, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
a third transmission step of transmitting the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the three level-N frequency bands HL_N, LH_N, and HH_N, fifth and later, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HL_k, LH_k, HH_k.
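The progressive-resolution order of Claim 3 can be sketched in the same way. The example below assumes per-band counts of inter-frame and intra-frame predictions are available for the three level-N bands; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

def progressive_order(counts: Dict[str, Tuple[int, int]], n_levels: int) -> List[str]:
    """Return the subband transmission order for progressive-resolution playback.

    counts maps a subband name to (inter_count, intra_count); only the three
    level-N bands are looked up (assumed encoder bookkeeping).
    """
    def ratio(band: str) -> float:
        inter, intra = counts[band]
        # Assumption: a band with no intra-predicted pixels ranks first.
        return inter / intra if intra else float("inf")

    order = [f"LL{n_levels}"]                       # 1st: lowest frequency band
    top = [f"{p}{n_levels}" for p in ("HL", "LH", "HH")]
    order += sorted(top, key=ratio, reverse=True)   # 2nd to 4th: by inter/intra ratio
    for k in range(n_levels - 1, 0, -1):            # 5th onward: low to high frequency
        order += [f"HL{k}", f"LH{k}", f"HH{k}"]     # fixed order within a level
    return order

counts = {"HL2": (300, 700), "LH2": (500, 500), "HH2": (100, 900)}
print(progressive_order(counts, 2))
```

Keeping each level contiguous means a receiver can stop after any level and still reconstruct a complete image at that resolution, which is the point of the fixed per-level order for k below N.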

The present invention (Claim 4) is a moving picture decoding method for decoding data encoded by the moving picture encoding method according to Claim 3 so as to reproduce it progressively, one resolution step of 1/2ⁿ times the original image at a time, the method comprising,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to (N−1), in decoding means, for each encoded frame of the encoded data,
a first decoding step of decoding the encoded data of the lowest frequency band LL_N first;
a second decoding step of decoding the encoded data of the three level-N frequency bands HL_N, LH_N, and HH_N second to fourth, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
a third decoding step of decoding the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the three level-N frequency bands HL_N, LH_N, and HH_N, fifth and later, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HL_k, LH_k, HH_k.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (Claim 5) is a moving picture encoding apparatus that has band dividing means 20 for dividing an input original image into bands by a wavelet transform and performs intra-frame and inter-frame prediction pixel by pixel for each divided band, the apparatus comprising,
for the case where the decoding side reproduces the image at the same resolution as the original image:
encoding means 30 that performs inter-frame prediction when the inter-frame correlation of the signals neighboring the pixel to be encoded, in a band divided by the band dividing means 20, is larger than a predetermined threshold, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and encodes the result; and,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to N, for each encoded frame,
first transmission means 51 for transmitting the encoded data of the lowest frequency band LL_N first; and
second transmission means 52 for transmitting the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the lowest frequency band, second and later, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

The present invention (Claim 6) is a moving picture decoding apparatus for decoding data encoded by the moving picture encoding apparatus according to Claim 5 so as to reproduce it at the same resolution as the original image, the apparatus comprising,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to N, for each encoded frame of the encoded data,
first decoding means 61 for decoding the encoded data of the lowest frequency band LL_N first; and
second decoding means 62 for decoding the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the lowest frequency band, second and later, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

The present invention (Claim 7) is a moving picture encoding apparatus that has band dividing means for dividing an input original image into bands by a wavelet transform and performs intra-frame and inter-frame prediction pixel by pixel for each divided band, the apparatus comprising,
for the case where the decoding side reproduces the image progressively, one resolution step of 1/2ⁿ times the original image at a time:
encoding means that performs inter-frame prediction when the inter-frame correlation of the signals neighboring the pixel to be encoded, in a band divided by the band dividing means, is larger than a predetermined threshold, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and encodes the result; and,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to (N−1), for each encoded frame,
first transmission means for transmitting the encoded data of the lowest frequency band LL_N first;
second transmission means for transmitting the encoded data of the three level-N frequency bands HL_N, LH_N, and HH_N second to fourth, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
third transmission means for transmitting the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the three level-N frequency bands HL_N, LH_N, and HH_N, fifth and later, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HL_k, LH_k, HH_k.

The present invention (Claim 8) is a moving picture decoding apparatus for decoding data encoded by the moving picture encoding apparatus according to Claim 7 so as to reproduce it progressively at resolutions of 1/2ⁿ times the original image, the apparatus comprising,
with N a positive integer denoting the number of wavelet decomposition levels and k an integer from 1 to (N−1), for each encoded frame of the encoded data,
first decoding means for decoding the encoded data of the lowest frequency band LL_N first;
second decoding means for decoding the encoded data of the three level-N frequency bands HL_N, LH_N, and HH_N second to fourth, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
third decoding means for decoding the encoded data of the frequency bands HL_k, LH_k, and HH_k, which are higher than the three level-N frequency bands HL_N, LH_N, and HH_N, fifth and later, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HL_k, LH_k, HH_k.

The present invention (Claim 9) is a moving picture processing program that causes a computer to function as the moving picture processing apparatus according to any one of Claims 5 to 8.

The present invention (Claim 10) is a computer-readable recording medium storing a program that causes a computer to function as the moving picture processing apparatus according to any one of Claims 5 to 8.

As described above, according to the present invention, efficient progressive transmission of encoded data can be realized in lossless moving picture encoding and decoding methods with spatial resolution scalability.

Embodiments of the present invention are described below with reference to the drawings.

[First embodiment]
This embodiment describes an efficient progressive transmission method for encoded data in Ex-MLVC. The description below uses the lossless encoder of MLVC, on which Ex-MLVC is based.

FIG. 3 shows the configuration of the MLVC lossless encoder in the first embodiment of the present invention.

The MLVC lossless encoder shown in the figure consists of a reversible color transform unit 10, a reversible wavelet transform unit 20, a spatio-temporal adaptive predictive coding unit 30, an entropy coding unit 40, and a multiplexing unit 50.

The reversible color transform unit 10 reduces the correlation between the color components of the input original image and outputs the resulting signal to the reversible wavelet transform unit 20.

The reversible wavelet transform unit 20 uses reversible filters to divide the input moving picture signal into bands, which provides resolution scalability and reduces intra-frame correlation. It outputs the lowest frequency band to the spatio-temporal adaptive predictive coding unit 30 and the other bands to the entropy coding unit 40. Spatio-temporal adaptive predictive coding is not applied to signals outside the lowest frequency band, because their inter-frame correlation is weak and the gain in coding efficiency is small relative to the increase in computation time.

The spatio-temporal adaptive predictive coding unit 30 performs inter-frame prediction when the inter-frame correlation of the signals neighboring the pixel to be encoded is larger than a predetermined value, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined value, encodes the result, and outputs it to the multiplexing unit 50.

The entropy coding unit 40 entropy-codes the signals outside the lowest frequency band and outputs them to the multiplexing unit 50.

The multiplexing unit 50 multiplexes the signals from the spatio-temporal adaptive predictive coding unit 30 and the entropy coding unit 40 and outputs the encoded bitstream to the decoder.

In this embodiment, intra-frame coding (Intra-coded picture: I frame) is performed at suitable intervals to provide a random access function. The set of pictures from one intra-coded frame up to the next is called a GOP (Group Of Pictures); in this embodiment a GOP consists of two frame types, intra-coded frames (I frames) and inter-frame predictive-coded frames (Predictive-coded picture: P frames).

In this embodiment, a wavelet transform is used as the transform in the reversible wavelet transform unit 20 because it makes resolution scalability easy to realize. To achieve lossless coding, the (5,3) reversible filter (for example, A. R. Calderbank et al., "Wavelet transforms that map integers to integers," vol. E85-A, no.8, pp.1822-1830, Aug. 2002.) is used, with the same filter applied in the vertical and horizontal directions of the image. The wavelet decomposition follows the Mallat scheme (for example, S. G. Mallat, "A Theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 11, pp. 674-693, July 1989): a one-dimensional wavelet transform is applied independently in the vertical and horizontal directions to divide the image into four frequency bands, and the band carrying the lowest frequencies is then recursively divided into four frequency bands.
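As an illustration of a reversible integer wavelet step, the sketch below implements one common lifting form of the (5,3) filter in one dimension, the form used by the JPEG2000 reversible transform. Whether the encoder described here uses exactly these rounding and boundary conventions is an assumption; the function names are hypothetical.

```python
from typing import List, Tuple

def _update(d: List[int], i: int) -> int:
    """Update term floor((d[i-1] + d[i] + 2) / 4) with mirrored boundaries."""
    left = d[i - 1] if i >= 1 else (d[0] if d else 0)
    right = d[i] if i < len(d) else (d[-1] if d else 0)
    return (left + right + 2) >> 2

def forward_53(x: List[int]) -> Tuple[List[int], List[int]]:
    """Split an integer signal into low-pass s and high-pass d coefficients."""
    even, odd = x[0::2], x[1::2]
    d = []
    for i, value in enumerate(odd):
        right = even[i + 1] if i + 1 < len(even) else even[i]  # mirror at the edge
        d.append(value - ((even[i] + right) >> 1))             # predict step
    s = [even[i] + _update(d, i) for i in range(len(even))]    # update step
    return s, d

def inverse_53(s: List[int], d: List[int]) -> List[int]:
    """Undo forward_53 exactly; integer arithmetic makes this lossless."""
    even = [s[i] - _update(d, i) for i in range(len(s))]
    x = [0] * (len(s) + len(d))
    x[0::2] = even
    for i, value in enumerate(d):
        right = even[i + 1] if i + 1 < len(even) else even[i]
        x[2 * i + 1] = value + ((even[i] + right) >> 1)
    return x

signal = [10, 12, 15, 11, 9, 200, 3]
low, high = forward_53(signal)
print(inverse_53(low, high) == signal)
```

Applying `forward_53` to every row and then to every column of the result gives the four bands of one Mallat level; repeating that on the low-low output gives the next level.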

FIG. 4 shows an example of a two-level Mallat decomposition. The reversible wavelet transform unit 20 outputs the signal of the lowest frequency band (LL) to the spatio-temporal adaptive predictive coding unit 30 and the signals of the other bands to the entropy coding unit 40.

JPEG-LS, the international still image standard for lossless coding, is based on nonlinear predictive coding and context modeling, so its prediction can follow local changes in image characteristics, and it is known to be faster and more efficient than the JPEG2000 lossless mode. MLVC achieves high coding efficiency by extending the nonlinear predictor used in JPEG-LS to three-dimensional (inter-frame) prediction.

The spatio-temporal adaptive predictive coding unit 30 of MLVC is configured as shown in FIG. 5. It consists of a two-dimensional predictor 31, a three-dimensional predictor 32, a motion estimation unit 33, a correlation coefficient calculation unit 34, a shift operator 35, and an entropy coding unit 36, and applies predictive coding to the signal of the lowest frequency band. In I frames, the two-dimensional predictor 31 (which removes intra-frame correlation) is applied. In P frames, to increase coding efficiency, the two-dimensional predictor 31 and the three-dimensional predictor 32 (which removes intra-frame and inter-frame correlation simultaneously) are switched adaptively to generate the residual signal. MLVC calls this method "spatio-temporal adaptive predictive coding".

Switching between the two-dimensional predictor 31 and the three-dimensional predictor 32 in the spatio-temporal adaptive predictive coding unit 30 is decided by the correlation coefficient calculation unit 34, which calculates the correlation coefficient R between the neighboring signal values x_1 to x_4 in the current frame a and the neighboring signal values y_1 to y_4 in the reference frame b shown in FIG. 6. Here, y_1 to y_4 are taken from the signal sequence shifted by the shift operator 35 according to the motion estimation result of the motion estimation unit 33. When the correlation coefficient is at or above a threshold T_h, the inter-frame correlation is judged to be large and the three-dimensional predictor 32 is used; otherwise the two-dimensional predictor 31 is used.

That is,
・when the correlation coefficient R < T_h, the two-dimensional predictor 31 is used;
・when the correlation coefficient R ≥ T_h, the three-dimensional predictor 32 is used,

where the correlation coefficient R is given by an equation omitted from this text.
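The predictor switch can be sketched as follows. Because the defining equation for R is not available in this text, the example assumes the ordinary sample (Pearson) correlation coefficient over the four neighboring values; that choice, the handling of a flat neighborhood, and the function names are all assumptions made for illustration.

```python
import math
from typing import Sequence

def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length neighborhoods (assumed form of R)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a flat neighborhood is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def choose_predictor(xs: Sequence[float], ys: Sequence[float], t_h: float) -> str:
    """Return "3d" when R >= T_h (inter-frame), otherwise "2d" (intra-frame)."""
    return "3d" if correlation(xs, ys) >= t_h else "2d"

# xs: x_1..x_4 in the current frame; ys: motion-shifted y_1..y_4 in the reference frame.
print(choose_predictor([10, 20, 30, 40], [11, 21, 29, 41], t_h=0.9))
```

Since the decision uses only already-coded neighbors, a decoder can repeat the same computation and needs no side information to know which predictor was chosen for each pixel.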

Two-dimensional and three-dimensional prediction are performed as follows.

1) Two-dimensional prediction:
In the present invention, a nonlinear predictor that adaptively switches among several predefined predictors is used so that prediction follows local changes in image characteristics. Here, the same nonlinear predictor as in JPEG-LS is used. JPEG-LS switches among three predictors according to the state of the signals neighboring the pixel to be encoded and generates a residual signal. When an edge is judged to exist in the vertical or horizontal direction, the prediction uses the one pixel adjacent in that edge direction; when the luminance change is judged to be smooth, the prediction uses three adjacent pixels.

ここで、ｘ₁、x₂、ｘ₃は符号化対象画素近傍の信号値であり、「４」で定義される。

Here, x ₁ , x ₂ , and x ₃ are signal values in the vicinity of the encoding target pixel, and are defined by “4”.
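As a point of reference, the JPEG-LS nonlinear predictor mentioned above (the median edge detector) can be sketched as below. The prediction formula is not reproduced in this text, so the sketch uses the conventional JPEG-LS naming: a is the left neighbor, b the neighbor above, and c the upper-left neighbor. How x₁, x₂, and x₃ map onto a, b, and c is not stated here and is left as an assumption.

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector.

    a = left neighbor, b = neighbor above, c = upper-left neighbor
    (conventional JPEG-LS naming; the mapping to x1, x2, x3 is an assumption).
    """
    if c >= max(a, b):
        return min(a, b)   # edge detected: predict from one neighbor
    if c <= min(a, b):
        return max(a, b)   # edge detected: predict from the other neighbor
    return a + b - c       # smooth region: planar prediction from all three


def residual_2d(x, a, b, c):
    """Residual signal passed on to entropy coding."""
    return x - med_predict(a, b, c)
```

The two edge branches each use a single neighbor and the smooth branch uses all three, which matches the three-way switching described in the paragraph above.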

2) Three-dimensional prediction:
As in two-dimensional prediction, three-dimensional prediction is nonlinear: the predictor is switched according to the state of the signal values neighboring the encoding target pixel. When an edge is judged to exist in the vertical or horizontal direction, prediction uses the signals adjacent along that edge direction in both the current frame a and the reference frame b. The edge direction is determined by comparing the vertical absolute difference |y₀ − y₃| with the horizontal absolute difference |y₀ − y₁| in the reference frame b, and an edge is judged to exist when one exceeds the other by more than a threshold T (conditions i and ii below). For a vertical edge (condition i below), the prediction signal is selected from the vertical signals x₃, y₀, and y₃ of the current frame a and the reference frame b in the same manner as in two-dimensional prediction. For a horizontal edge (condition ii below), the predictor is selected in the same manner. When no edge is detected, the average of the three neighboring pixels (x₁, x₃, y₀) is used as the predicted value.

Condition i: when |y₀ − y₁| − |y₀ − y₃| > T:

Condition ii: when |y₀ − y₁| − |y₀ − y₃| < −T:

Condition iii: in all cases other than conditions i and ii
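The three conditions can be sketched as follows. The per-condition prediction formulas are not reproduced in this text, so several details are assumptions: the JPEG-LS median edge detector is reused for conditions i and ii with the argument roles shown, the horizontal case is assumed to use the triple (x₁, y₀, y₁) by analogy with the vertical triple (x₃, y₀, y₃), and the average in condition iii is rounded down to keep the prediction an integer.

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector applied to three neighboring values."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c


def predict_3d(x1, x3, y0, y1, y3, t):
    """Sketch of the three-way switch; argument roles are assumptions."""
    d = abs(y0 - y1) - abs(y0 - y3)
    if d > t:
        # Condition i (vertical edge): vertical signals x3, y0, y3.
        return med_predict(x3, y0, y3)
    if d < -t:
        # Condition ii (horizontal edge): assumed horizontal signals x1, y0, y1.
        return med_predict(x1, y0, y1)
    # Condition iii (no edge): average of x1, x3, y0 (floor rounding assumed).
    return (x1 + x3 + y0) // 3
```

Integer arithmetic is used throughout because a lossless coder needs the encoder and decoder to reproduce exactly the same predicted value.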

In MLVC, spatio-temporal adaptive predictive coding by the spatio-temporal adaptive predictive encoding unit 30 is not applied to signals outside the lowest frequency band, because their inter-frame correlation is weak and the gain in coding efficiency is small relative to the increase in computation time.

To further improve coding efficiency, Ex-MLVC provides a model that adaptively changes the bands to which spatio-temporal adaptive predictive coding is applied, according to the statistical properties of the image.

In Ex-MLVC, the lowest frequency band is processed by spatio-temporal adaptive prediction in the spatio-temporal adaptive predictive encoding unit 30, as in MLVC. Among the other bands, spatio-temporal adaptive prediction is performed in bands with strong inter-frame correlation, while in bands with weak inter-frame correlation the wavelet coefficients are encoded directly by the entropy encoding unit 40. However, in order to
1) keep the amount of computation as low as possible, and
2) eliminate additional information,
the inter-frame correlation is not calculated directly in each band.

As shown in FIG. 7, wavelet coefficients are correlated between bands of the same orientation, taking the lowest frequency band as the reference. Exploiting this, the switching is made according to the ratio between the numbers of times three-dimensional prediction and two-dimensional prediction were used in the frequency band one level lower.

1) When R_inter ≥ TH in the frequency band one level lower, spatio-temporal adaptive prediction is performed.

2) When R_inter < TH in the frequency band one level lower, wavelet coefficient coding is performed.
Here, R_inter is defined as follows.
[Equation: R_inter in terms of Inter and Intra; not reproduced in this text]

Here, Inter and Intra denote the numbers of times the three-dimensional predictor 32 and the two-dimensional predictor 31 were used, respectively. These counts are stored in a memory (not shown) in the spatio-temporal adaptive predictive encoding unit 30 and are read out when R_inter is calculated.

A large value of R_inter indicates that the three-dimensional predictor 32 was used many times, that is, that the inter-frame correlation is strong. TH in 1) and 2) above is a predetermined threshold.
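A minimal sketch of this decision is given below. The defining equation for R_inter is not reproduced in this text, so the sketch assumes the normalized share Inter / (Inter + Intra). That form ranks bands in the same order as the plain ratio Inter / Intra and avoids division by zero, but a threshold TH chosen for one form does not carry over to the other. The function names are hypothetical.

```python
def r_inter(inter_count, intra_count):
    """Share of pixels predicted by the 3D predictor (assumed form of R_inter)."""
    total = inter_count + intra_count
    if total == 0:
        return 0.0  # assumption: an empty band counts as having no inter-frame use
    return inter_count / total


def use_spatiotemporal_prediction(inter_count, intra_count, th):
    """Decide for the next band from the counts of the band one level lower.

    True  -> perform spatio-temporal adaptive prediction.
    False -> encode the wavelet coefficients directly.
    """
    return r_inter(inter_count, intra_count) >= th
```

Both counts are by-products of predicting the lower band, so the decision needs no extra correlation computation and no side information.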

Next, the band-unit processing procedure is described.

The case where the number of wavelet decomposition levels in the reversible wavelet transform unit 20 is 2 is described as an example.

FIG. 8 illustrates band-unit processing of spatio-temporal adaptive prediction in the first embodiment of the present invention (band division = 2 levels). Whether to perform spatio-temporal adaptive prediction is decided in the direction of the arrows shown in the figure.

FIG. 9 shows the band-unit processing procedure in the first embodiment of the present invention.

First, in Ex-MLVC, spatio-temporal adaptive prediction is performed on the lowest frequency band LL2 by the spatio-temporal adaptive predictive encoding unit 30, as in MLVC. After all of this processing is complete, R_inter is calculated (step 101).

When R_inter ≥ TH (step 102, Yes), spatio-temporal adaptive prediction is performed on the HL2, LH2, and HH2 bands (steps 103, 106, and 109). When R_inter < TH (step 102, No), the procedure ends.

When spatio-temporal adaptive prediction has been performed, the correlation coefficient calculation unit 34 of the spatio-temporal adaptive predictive encoding unit 30 calculates R_inter for each of the HL2, LH2, and HH2 bands. Whether the condition R_inter ≥ TH holds is judged for each band (steps 104, 107, and 110), which determines whether spatio-temporal adaptive prediction is performed on the HL1, LH1, and HH1 bands.
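The two-level procedure of steps 101 to 110 can be sketched as follows. The callback `r_inter_of` is hypothetical: it stands for reading R_inter of a band after that band has been processed. The order of the returned list is only an artifact of the loop and is not meant to represent a transmission order.

```python
def select_bands_two_levels(r_inter_of, th):
    """Return the bands on which spatio-temporal adaptive prediction is performed.

    r_inter_of: hypothetical callback giving R_inter of an already processed band.
    Bands missing from the result have their wavelet coefficients encoded directly.
    """
    predicted = ["LL2"]                              # step 101: LL2 is always predicted
    if r_inter_of("LL2") < th:                       # step 102: No -> finish
        return predicted
    for orientation in ("HL", "LH", "HH"):
        level2 = orientation + "2"
        predicted.append(level2)                     # steps 103, 106, 109
        if r_inter_of(level2) >= th:                 # steps 104, 107, 110
            predicted.append(orientation + "1")      # same orientation, level 1
    return predicted
```

For example, with R_inter values of 0.8 (LL2), 0.7 (HL2), 0.3 (LH2), and 0.6 (HH2) and TH = 0.5, the level-1 bands HL1 and HH1 are predicted while LH1 is encoded directly.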

Next, the small-block-unit processing procedure is described.

In the wavelet transform domain, components corresponding to the same spatial position in the different frequency bands are known to be strongly correlated. Using this property, whether to perform spatio-temporal adaptive prediction is decided for each small block.

FIG. 10 illustrates block-unit processing of spatio-temporal adaptive prediction in the first embodiment of the present invention. It shows the processing direction when the number of decomposition levels is 2.

FIG. 11 shows the block-unit processing procedure in the first embodiment of the present invention.

In this procedure, the method shown in FIG. 9 is applied to each block and repeated until all N blocks have been processed.

This allows the coding to follow the local properties of the image, which improves coding efficiency.

The residual signals or wavelet coefficients generated by the band-unit or small-block-unit processing described above are output as encoded data by the entropy encoding unit 40.

In Ex-MLVC, the scalability function allows the time priority at transmission and decoding to be set freely. Two units of scalability are defined: C (color) and R (spatial resolution).

FIG. 12 shows examples of an RC structure in which spatial resolution is given priority, in the first embodiment of the present invention. These are only examples: the order in which components such as LL and HH are transmitted by the encoder or decoded by the decoder is left to the user and is not specified by Ex-MLVC. The following describes an order for transmitting and decoding the spatial resolution components (LL2, HL2, LH2, ...) efficiently. For realizing spatial resolution scalability, the present invention provides techniques for the following two cases:
1) reproduction at the same resolution as the original image;
2) stepwise reproduction at each resolution of 1/2ⁿ times the original image.

This embodiment first describes case 1) above.

For an I picture, the order of transmission (encoder) or decoding (decoder) is fixed, and the bands are arranged so as to satisfy the following two conditions.

・from the low frequency bands to the high frequency bands;
・within the same level, in the order HL → LH → HH.
FIG. 12(a) shows the RC data structure for wavelet level = 2. FIG. 13(a) shows the order in the wavelet domain corresponding to FIG. 12(a).
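The fixed I-picture order follows directly from the two conditions and can be sketched as below. The band naming (`LL2`, `HL1`, and so on) follows the text, while the function name is hypothetical.

```python
def i_picture_order(levels):
    """Fixed band order for an I picture with the given number of levels."""
    order = [f"LL{levels}"]                      # lowest frequency band first
    for level in range(levels, 0, -1):           # low frequency -> high frequency
        for orientation in ("HL", "LH", "HH"):   # fixed order within a level
            order.append(f"{orientation}{level}")
    return order
```

For two decomposition levels this gives LL2, HL2, LH2, HH2, HL1, LH1, HH1.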

For a P picture, the bands are reordered so as to satisfy the following conditions.

・first, the lowest frequency band LLN (where N is a positive integer denoting the number of wavelet decomposition levels);
・the second and subsequent frequency bands, in descending order of R_inter.
That is, when transmitting (outputting) from the multiplexing unit 50 of the encoder, the above conditions stored in advance in a memory or the like are referred to and, apart from the lowest frequency band LLN, bands with larger R_inter are transmitted earlier. Likewise, the decoder refers to a memory or the like in which the above conditions are stored in advance and, apart from the lowest frequency band LLN, decodes bands with larger R_inter earlier.

Here, R_inter represents the ratio of the numbers of times the three-dimensional predictor 32 and the two-dimensional predictor 31 were used, and a larger R_inter means that the three-dimensional predictor 32 was used more often. The above conditions therefore presuppose that the three-dimensional predictor 32 contributes more to coding efficiency than the two-dimensional predictor 31.

When the number of wavelet decomposition levels is 2 and the bands rank by R_inter, from largest to smallest, as
・HL2 → LH2 → HL1 → HH2 → LH1 → HH1
the order of transmission or decoding of the frequency bands is as shown in FIG. 12(b) and FIG. 13(b).
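The P-picture reordering for same-resolution reproduction can be sketched as below. The dictionary `r_inter` (band name to R_inter value) and the function name are hypothetical. How ties are broken is not specified in the text, so the sketch simply keeps the input order for equal values.

```python
def p_picture_order(levels, r_inter):
    """LLN first, then all remaining bands in descending order of R_inter.

    r_inter: hypothetical mapping from band name (e.g. "HL2") to its R_inter.
    """
    lowest = f"LL{levels}"
    others = [band for band in r_inter if band != lowest]
    # sorted() is stable, so bands with equal R_inter keep their input order.
    others = sorted(others, key=lambda band: r_inter[band], reverse=True)
    return [lowest] + others
```

With values that rank the bands as HL2, LH2, HL1, HH2, LH1, HH1, the result is LL2 followed by the bands in exactly that order.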

[Second Embodiment]
The second embodiment describes case 2) above for realizing spatial resolution scalability: stepwise reproduction at each resolution of 1/2ⁿ times the original image.

I pictures are arranged in a fixed order under the same criteria as in case 1) (reproduction at the same resolution as the original image) described above. For P pictures, the bands are reordered so as to satisfy the following conditions.

・from the low frequency bands to the high frequency bands;
・first, the lowest frequency band LLN (where N is a positive integer denoting the number of wavelet decomposition levels);
・second to fourth, HLN, LHN, and HHN in descending order of R_inter;
・fifth and later, the frequency bands HL, LH, and HH within each level follow the order of HL, LH, and HH of the corresponding level.

When the number of wavelet decomposition levels is 2 and the bands rank by R_inter, from largest to smallest, as
・HL2 → LH2 → HH2
the order of transmission by the encoder or decoding by the decoder for each frequency band is as shown in FIG. 12(c) and FIG. 13(c). That is, the multiplexing unit 50 transmits the encoded bit stream with reference to a memory or the like in which the above conditions are stored in advance. Likewise, the decoder decodes the input data with reference to a memory or the like in which the above conditions are stored in advance.
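The stepwise-resolution ordering can be sketched as below. For the fifth and later bands the sketch follows the wording of the claims, which state a fixed HLk, LHk, HHk order within each level. The dictionary `r_inter_top` and the function name are hypothetical.

```python
ORIENTATIONS = ("HL", "LH", "HH")


def p_picture_order_staged(levels, r_inter_top):
    """Band order for a P picture when resolution is restored step by step.

    r_inter_top: hypothetical mapping from the three level-N bands to R_inter.
    """
    top = [f"{o}{levels}" for o in ORIENTATIONS]
    # Second to fourth: level-N bands in descending order of R_inter.
    top = sorted(top, key=lambda band: r_inter_top[band], reverse=True)
    order = [f"LL{levels}"] + top
    # Fifth and later: remaining levels from low to high frequency,
    # in the fixed order HL, LH, HH within each level (per the claims).
    for level in range(levels - 1, 0, -1):
        order += [f"{o}{level}" for o in ORIENTATIONS]
    return order
```

Keeping each level contiguous means the decoder has a complete image at every intermediate resolution before any band of the next level arrives, which a free reordering across levels would not guarantee.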

The operations of the encoder and decoder described above can also be implemented as a program, which can be installed and executed on a computer used as the encoder or decoder, or distributed via a network.

The program can also be stored on a hard disk device or on a portable storage medium such as a flexible disk or CD-ROM, and then installed on a computer or distributed.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to moving image encoding and decoding techniques.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the MLVC encoder in the first embodiment of the present invention.
FIG. 4 is an example of Mallat decomposition (2 levels) in the first embodiment of the present invention.
FIG. 5 is a block diagram of the spatio-temporal adaptive predictive encoding unit in the first embodiment of the present invention.
FIG. 6 is a diagram showing the encoding target signal x and its neighboring signals in the first embodiment of the present invention.
FIG. 7 is a diagram showing the correlation between bands in the first embodiment of the present invention.
FIG. 8 is a diagram for explaining band-unit processing of spatio-temporal adaptive prediction in the first embodiment of the present invention (band division = 2 levels).
FIG. 9 shows the band-unit processing procedure in the first embodiment of the present invention (band division = 2 levels).
FIG. 10 shows block-unit processing of spatio-temporal adaptive prediction in the first embodiment of the present invention (band division = 2 levels).
FIG. 11 shows the block-unit processing procedure in the first embodiment of the present invention (band division = 2 levels).
FIG. 12 shows examples of the RC structure in which spatial resolution is given priority, in the first and second embodiments of the present invention (wavelet decomposition level = 2).
FIG. 13 shows examples of the transmission order and decoding order of the spatial resolution components in the first and second embodiments of the present invention (wavelet domain representation).

Explanation of symbols

10 Reversible color transform unit
20 Reversible wavelet transform unit, band dividing means
30 Spatio-temporal adaptive predictive encoding unit, encoding means
40 Entropy encoding unit
50 Multiplexing unit
51 First transmission means
52 Second transmission means
61 First decoding means
62 Second decoding means
31 Two-dimensional predictor
32 Three-dimensional predictor
33 Motion estimation unit
34 Shift operator

Claims

1. A moving image encoding method in which an input original image is divided into bands by a wavelet transform and intra-frame and inter-frame prediction is performed pixel by pixel for each divided band, the method comprising, for the case where the decoding side reproduces the image at the same resolution as the original image:
an encoding step in which encoding means performs inter-frame prediction when the inter-frame correlation of the signals neighboring the encoding target pixel in a band divided by band dividing means is larger than a predetermined threshold, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and performs encoding; and,
for each encoded frame, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to N,
a first transmission step of transmitting the encoded data of the lowest frequency band LLN first; and
a second transmission step of transmitting, second and thereafter, the encoded data of the frequency bands HLk, LHk, and HHk higher than the lowest frequency band, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

2. A moving image decoding method for decoding data encoded by the moving image encoding method according to claim 1 so as to reproduce it at the same resolution as the original image, the method comprising, in decoding means, for each encoded frame of the encoded data, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to N:
a first decoding step of decoding the encoded data of the lowest frequency band LLN first; and
a second decoding step of decoding the encoded data of the frequency bands HLk, LHk, and HHk higher than the lowest frequency band, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

3. A moving image encoding method in which an input original image is divided into bands by a wavelet transform and intra-frame and inter-frame prediction is performed pixel by pixel for each divided band, the method comprising, for the case where the decoding side reproduces the image stepwise at each resolution of 1/2ⁿ times the original image:
an encoding step in which encoding means performs inter-frame prediction when the inter-frame correlation of the signals neighboring the encoding target pixel in a band divided by band dividing means is larger than a predetermined threshold, performs intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and performs encoding; and,
for each encoded frame, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to (N−1),
a first transmission step of transmitting the encoded data of the lowest frequency band LLN first;
a second transmission step of transmitting, second to fourth, the encoded data of the three frequency bands HLN, LHN, and HHN of the Nth level, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
a third transmission step of transmitting, fifth and thereafter, the encoded data of the frequency bands HLk, LHk, and HHk higher than the three frequency bands HLN, LHN, and HHN of the Nth level, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HLk, LHk, HHk.

4. A moving image decoding method for decoding data encoded by the moving image encoding method according to claim 3 so as to reproduce it stepwise at each resolution of 1/2ⁿ times the original image, the method comprising, in decoding means, for each encoded frame of the encoded data, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to (N−1):
a first decoding step of decoding the encoded data of the lowest frequency band LLN first;
a second decoding step of decoding, second to fourth, the encoded data of the three frequency bands HLN, LHN, and HHN of the Nth level, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
a third decoding step of decoding, fifth and thereafter, the encoded data of the frequency bands HLk, LHk, and HHk higher than the three frequency bands HLN, LHN, and HHN of the Nth level, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HLk, LHk, HHk.

5. A moving image encoding apparatus having band dividing means for dividing an input original image into bands by a wavelet transform, and performing intra-frame and inter-frame prediction pixel by pixel for each divided band, the apparatus comprising, for the case where the decoding side reproduces the image at the same resolution as the original image:
encoding means for performing inter-frame prediction when the inter-frame correlation of the signals neighboring the encoding target pixel in a band divided by the band dividing means is larger than a predetermined threshold, performing intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and performing encoding; and,
for each encoded frame, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to N,
first transmission means for transmitting the encoded data of the lowest frequency band LLN first; and
second transmission means for transmitting the encoded data of the frequency bands HLk, LHk, and HHk higher than the lowest frequency band, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

6. A moving image decoding apparatus that decodes data encoded by the moving image encoding apparatus according to claim 5 so as to reproduce it at the same resolution as the original image, the apparatus comprising, for each encoded frame of the encoded data, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to N:
first decoding means for decoding the encoded data of the lowest frequency band LLN first; and
second decoding means for decoding the encoded data of the frequency bands HLk, LHk, and HHk higher than the lowest frequency band, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions.

7. A moving image encoding apparatus having band dividing means for dividing an input original image into bands by a wavelet transform, and performing intra-frame and inter-frame prediction pixel by pixel for each divided band, the apparatus comprising, for the case where the decoding side reproduces the image stepwise at each resolution of 1/2ⁿ times the original image:
encoding means for performing inter-frame prediction when the inter-frame correlation of the signals neighboring the encoding target pixel in a band divided by the band dividing means is larger than a predetermined threshold, performing intra-frame prediction when the inter-frame correlation is smaller than the predetermined threshold, and performing encoding; and,
for each encoded frame, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to (N−1),
first transmission means for transmitting the encoded data of the lowest frequency band LLN first;
second transmission means for transmitting, second to fourth, the encoded data of the three frequency bands HLN, LHN, and HHN of the Nth level, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
third transmission means for transmitting, fifth and thereafter, the encoded data of the frequency bands HLk, LHk, and HHk higher than the three frequency bands HLN, LHN, and HHN of the Nth level, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HLk, LHk, HHk.

8. A moving image decoding apparatus that decodes data encoded by the moving image encoding apparatus according to claim 7 so as to reproduce it stepwise at each resolution of 1/2ⁿ times the original image, the apparatus comprising, for each encoded frame of the encoded data, where N is a positive integer denoting the number of wavelet decomposition levels and k is an integer from 1 to (N−1):
first decoding means for decoding the encoded data of the lowest frequency band LLN first;
second decoding means for decoding, second to fourth, the encoded data of the three frequency bands HLN, LHN, and HHN of the Nth level, in descending order of each band's ratio of the number of inter-frame predictions to the number of intra-frame predictions; and
third decoding means for decoding, fifth and thereafter, the encoded data of the frequency bands HLk, LHk, and HHk higher than the three frequency bands HLN, LHN, and HHN of the Nth level, level by level from the low frequency bands to the high frequency bands and, within the same level, in the order HLk, LHk, HHk.

9. A moving image processing program that causes a computer to function as the moving image processing apparatus according to claim 5.

10. A computer-readable recording medium storing a program that causes a computer to function as the moving image processing apparatus according to any one of claims 5 to 8.