JP2008011009A - Video signal encoder, video signal decoder, video signal encoding program, and video signal decoding program

Info

Publication number: JP2008011009A
Application number: JP2006177670A
Authority: JP
Inventors: Kazuhiro Shimauchi; 和博嶋内
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2006-06-28
Filing date: 2006-06-28
Publication date: 2008-01-17

Abstract

PROBLEM TO BE SOLVED: In a conventional video coding system, blocks that have undergone filter processing and blocks that have not are intermingled in the L frames as a result of the prediction processing, which causes local differences in image quality.

SOLUTION: A filtering section 411D filters a signal obtained by mixing, according to respective weights W, a video signal produced by resolution interpolation of a base layer local decode signal and a signal that has undergone ME/MC processing, and thereby produces H frames. Update processing is then applied to the H frames weighted by the weight W to produce the L frames, so that only components in the temporal direction are reflected in the L frames. Blocks filtered by the filtering section 411D are therefore always present in the L frames. Band division in the temporal direction can thus be realized while predicting between resolutions.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a video signal encoding device, a video signal decoding device, a video signal encoding program, and a video signal decoding program, and in particular to such devices and programs that provide scalability in both spatial resolution and frame rate.

Many methods for achieving spatial-resolution and SNR (Signal to Noise Ratio) scalability in video coding have been proposed, and they have been put to practical use in various fields. For frame-rate scalability, however, no comparably effective method existed, and obtaining a low-rate video signal required ad hoc processing separate from the encoding itself, such as dropping frames. In recent years, a method that achieves frame-rate scalability with high coding efficiency has been proposed (see, for example, Patent Document 1).

Patent Document 1 proposes motion compensation temporal filtering (hereinafter MCTF). In MCTF, motion estimation (ME) and motion compensation (MC) are performed as in, for example, conventional H.264/AVC, and the signal is then divided into bands in the temporal direction by a wavelet transform in lifting form.

FIG. 11 shows a 5/3 lifting configuration as an example of MCTF, and FIG. 12 shows an example of the MCTF procedure. This example of a conventional video signal encoding method is described below with reference to FIGS. 11 and 12. First, the original video signal 201 shown in FIG. 11 is accumulated so that it can be processed in units of, for example, eight frames (hereinafter a GOP, Group Of Pictures) (step S501 in FIG. 12). As shown in FIG. 11, the video signal 201 consists of eight consecutive frames, S(n−2) through S(n+5).

When the S(n) frame of the original video signal 201 in FIG. 11 is the frame being processed (an even frame), ME/MC is performed on the odd frames immediately before and after it, S(n−1) and S(n+1), so that each is aligned with the phase of the S(n) frame (step S502 in FIG. 12).

The ME/MC-processed S(n−1) and S(n+1) frames and the S(n) frame are then filtered (step S503 in FIG. 12). The processing of steps S502 and S503 is referred to here as prediction processing. The frame obtained by prediction processing is the H₁(n) frame 202, which carries the high-frequency component. The same processing is applied to the other consecutive even frames in the GOP (steps S504, S502, and S503).

As a result, as indicated by 202 in FIG. 11, the H₁(n−2), H₁(n+2), and H₁(n+4) frames are generated in addition to the H₁(n) frame. Subband division of the original video signal 201 of one GOP into two frequency bands yields L component frames in the lower band and H component frames in the upper band; the H₁(n−2), H₁(n), H₁(n+2), and H₁(n+4) frames are the H component frames (hereinafter also called H frames).

Next, when the S(n−1) frame is the frame being processed (an odd frame), the predicted (even) frames before and after it, H₁(n) and H₁(n−2), are inverse motion-compensated so as to align with the phase of the S(n−1) frame (step S505 in FIG. 12). As indicated by 203 in FIG. 11, the inverse-MC H₁(n) and H₁(n−2) frames and the S(n−1) frame are then filtered (step S506 in FIG. 12). This filtering is referred to here as update processing.

The L₁(n−1) frame 203 obtained by this update processing is the low-frequency component in the temporal direction. The same processing is applied to the other odd frames, S(n+1) and S(n+3) (steps S507, S505, and S506 in FIG. 12).

As a result, as indicated by 203 in FIG. 11, the L₁(n+1), L₁(n+3), and L₁(n+5) frames, which are also low-frequency components in the temporal direction, are generated. Subband division of the original video signal 201 of one GOP into two frequency bands yields L component frames in the lower band and H component frames in the upper band; the L₁(n−1), L₁(n+1), L₁(n+3), and L₁(n+5) frames are the L component frames (hereinafter also called L frames).
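The prediction and update steps described above can be sketched as one level of 5/3 lifting along the time axis. This is a minimal illustration rather than the method of Patent Document 1: motion estimation and compensation are omitted (every frame is treated as already phase-aligned), frames are flat lists of samples, and the handling of the missing neighbour at each end of the GOP is an assumption. The function name `mctf_level` is hypothetical.

```python
def mctf_level(frames):
    """One temporal 5/3 lifting level with motion compensation omitted.

    frames: list of equally sized lists of floats; position 0 is S(n-2).
    Returns (low, high): dicts mapping frame position to its L or H frame.
    """
    n = len(frames)
    if n < 2 or n % 2:
        raise ValueError("need an even number of frames (at least 2)")

    # Prediction: even position k becomes H(k) = S(k) - (S(k-1) + S(k+1)) / 2.
    high = {}
    for k in range(0, n, 2):
        # Assumption: at the GOP start, reuse the right neighbour for the
        # missing left neighbour.
        left = frames[k - 1] if k > 0 else frames[k + 1]
        right = frames[k + 1]
        high[k] = [s - 0.5 * (a + b) for s, a, b in zip(frames[k], left, right)]

    # Update: odd position k becomes L(k) = S(k) + (H(k-1) + H(k+1)) / 4.
    low = {}
    for k in range(1, n, 2):
        left = high[k - 1]
        # Assumption: at the GOP end, reuse the left H frame for the
        # missing right H frame.
        right = high[k + 1] if k + 1 < n else high[k - 1]
        low[k] = [s + 0.25 * (a + b) for s, a, b in zip(frames[k], left, right)]
    return low, high
```

With eight input frames, the H frames land on positions 0, 2, 4, 6 (S(n−2), S(n), S(n+2), S(n+4)) and the L frames on positions 1, 3, 5, 7, matching 202 and 203 in FIG. 11. For a static scene the H frames are all zero and the L frames equal the input.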

MCTF is repeated (steps S502 to S507 in FIG. 12) until it is determined in step S508 that only one L frame remains or that MCTF is to be terminated partway. Dividing the signal into temporal bands in this way and extracting the low-band signal yields a video signal with a lower frame rate.

That is, the second MCTF pass takes the four L frames 203 (L₁(n−1), L₁(n+1), L₁(n+3), and L₁(n+5)) two frames at a time: prediction processing generates the two H frames H₂(n−1) and H₂(n+3) indicated by 204 in FIG. 11, and update processing generates the two L frames L₂(n+1) and L₂(n+5) indicated by 205 in FIG. 11. The third MCTF pass uses the two L frames 205 (L₂(n+1) and L₂(n+5)) to generate the H frame H₃(n+1) indicated by 206 in FIG. 11 and the L frame L₃(n+5) indicated by 207.

Note that when MCTF is repeated, as can be seen from FIG. 11, in the second pass the frames targeted by prediction processing change from even frames to odd frames and the frames targeted by update processing change from odd frames to even frames, and the target frames likewise change with every subsequent pass.

Finally, as indicated by 301 in FIG. 13, a coefficient frame sequence for one GOP is obtained, consisting of one L frame, L₃(n+5), and seven H frames, H₁(n−2), H₂(n−1), H₁(n), H₃(n+1), H₁(n+2), H₂(n+3), and H₁(n+4). This completes the inter-frame coding of one GOP.
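The frame layout that results from repeating MCTF can be checked with a small bookkeeping sketch. It tracks only which position becomes an H or L frame at which pass and performs no filtering. The function name `mctf_layout` and its parameters are hypothetical, and it assumes a power-of-two GOP in which, at every pass, the first frame of each pair is predicted and the second is updated.

```python
def mctf_layout(gop_size=8, first_offset=-2):
    """Label each GOP position with the frame type and pass that produces it."""
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two, at least 2")

    positions = list(range(gop_size))
    labels = {}
    level = 0
    while len(positions) > 1:
        level += 1
        # First frame of each pair: prediction target, becomes an H frame.
        for p in positions[0::2]:
            labels[p] = ("H", level)
        # Second frame of each pair: update target, stays as an L frame
        # and is the input to the next pass.
        positions = positions[1::2]
    labels[positions[0]] = ("L", level)

    def name(p):
        kind, lvl = labels[p]
        offset = first_offset + p
        index = "n" if offset == 0 else f"n{offset:+d}"
        return f"{kind}{lvl}({index})"

    return [name(p) for p in range(gop_size)]
```

For the eight-frame GOP starting at S(n−2), `mctf_layout()` yields one L frame and seven H frames in the order listed for 301 in FIG. 13.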

To reproduce a high-rate video signal on the decoding side, inverse update processing and inverse prediction processing are repeated in the reverse order of MCTF, combining the high-band signals with the low-band signal.
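The reverse procedure can be illustrated with a one-level round trip. The sketch below assumes 5/3 lifting weights, omits motion compensation, and uses an assumed rule for the missing neighbour at each GOP boundary. The names `forward_level`, `inverse_level`, and `_lift` are hypothetical. Because each lifting step only adds a function of frames that the decoder also holds, undoing the steps in reverse order restores the input.

```python
def _lift(x, a, b, w):
    """Return x + w * (a + b), element by element."""
    return [xi + w * (ai + bi) for xi, ai, bi in zip(x, a, b)]


def forward_level(frames):
    """Prediction then update; even positions become H, odd positions L."""
    n = len(frames)
    if n < 2 or n % 2:
        raise ValueError("need an even number of frames (at least 2)")
    out = list(frames)
    for k in range(0, n, 2):  # prediction processing
        left = frames[k - 1] if k > 0 else frames[k + 1]
        out[k] = _lift(frames[k], left, frames[k + 1], -0.5)
    for k in range(1, n, 2):  # update processing, using the new H frames
        right = out[k + 1] if k + 1 < n else out[k - 1]
        out[k] = _lift(frames[k], out[k - 1], right, 0.25)
    return out


def inverse_level(coeffs):
    """Inverse update then inverse prediction, mirroring forward_level."""
    n = len(coeffs)
    if n < 2 or n % 2:
        raise ValueError("need an even number of frames (at least 2)")
    out = list(coeffs)
    for k in range(1, n, 2):  # inverse update: recover the odd frames
        right = coeffs[k + 1] if k + 1 < n else coeffs[k - 1]
        out[k] = _lift(coeffs[k], coeffs[k - 1], right, -0.25)
    for k in range(0, n, 2):  # inverse prediction: recover the even frames
        left = out[k - 1] if k > 0 else out[k + 1]
        out[k] = _lift(coeffs[k], left, out[k + 1], 0.5)
    return out
```

Applying `inverse_level` to the output of `forward_level` returns the original frames. A decoder that wants only the lower frame rate would keep the L frames and skip this synthesis.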

Patent Document 1 further discloses band division not only in the temporal direction but also in the spatial direction, using, for example, wavelets or the DCT. This allows the decoding side to select and receive an arbitrary resolution and frame rate suited to the capabilities of the decoder, the display, and so on. In the method of Patent Document 1, the signals of the individual bands are encoded independently, without exploiting the correlation between bands.

Meanwhile, an encoding method is known that combines the above MCTF with an extension of H.264/AVC, a current high-performance coding scheme, to achieve higher efficiency together with scalability in spatial resolution, frame rate, and SNR (see, for example, Non-Patent Document 1). Unlike the encoding device of Patent Document 1, the encoding device of Non-Patent Document 1 achieves high coding efficiency by exploiting the correlation between spatial and temporal resolutions during encoding.

FIG. 14 is a block diagram of an example of the encoding unit 10 and the decoding unit 30 described in Non-Patent Document 1. An original video signal is input to the encoding unit 10, and the bit stream generated by the encoding unit 10 is transmitted to the decoding unit 30 over a communication line or medium 20. The decoding unit 30 extracts the necessary information from the supplied bit stream and outputs a decoded video signal whose spatial resolution, frame rate, and SNR suit the capabilities of the display or other output device.

The encoding unit 10 comprises a space-time decimation unit 11, a base layer encoding unit 12, a base layer reconstructing unit 13, an enhancement layer encoding unit 14, and a multiplexing unit 15.

The space-time decimation unit 11 receives the original video signal as input and decimates (reduces) it to a desired spatial resolution. Spatial resolution decimation is performed with, for example, a spatial filter or a wavelet transform. The space-time decimation unit 11 also decimates the spatially decimated signal in time to a desired frame rate and outputs the result to the base layer encoding unit 12. Temporal resolution decimation is performed with, for example, MCTF or simple frame dropping.

The base layer encoding unit 12 receives the output signal of the space-time decimation unit 11, encodes it to generate a bit stream, and outputs the bit stream to the multiplexing unit 15. H.264 or a similar scheme is used for encoding. The base layer encoding unit 12 also outputs the signal that has been processed up to orthogonal transformation and quantization in H.264 or the like to the base layer reconstructing unit 13.

The base layer reconstructing unit 13 receives the signal that the base layer encoding unit 12 has processed up to quantization under a standard such as H.264, reconstructs it to generate a local decode signal (a locally decoded image signal), and outputs that signal to the enhancement layer encoding unit 14.

The enhancement layer encoding unit 14 receives the original video signal and the local decode signal from the base layer reconstructing unit 13. Using these inputs, it encodes the signal that provides temporal, spatial, and SNR scalability and outputs the resulting encoded signal (bit stream) to the multiplexing unit 15. Details are given later.

The multiplexing unit 15 receives the bit streams output by the base layer encoding unit 12 and the enhancement layer encoding unit 14, multiplexes them into a single bit stream, and outputs it outside the encoding unit 10, for example to the communication line or medium 20.

The decoding unit 30 comprises an extract unit 31, a base layer decoding unit 32, and an enhancement layer decoding unit 33. The extract unit 31 receives the bit stream, cuts out from the whole bit stream the portions needed for decoding according to the capabilities of the decoding unit 30, the display, and so on, splits them, and outputs them to the base layer decoding unit 32 and the enhancement layer decoding unit 33, respectively.

The base layer decoding unit 32 receives the base layer bit stream cut out by the extract unit 31, decodes it, and outputs the decoded video signal to the enhancement layer decoding unit 33 and, as needed, to a display or the like. An H.264 decoder or similar is used for decoding.

The enhancement layer decoding unit 33 receives the bit stream from the extract unit 31 and the decoded video signal output by the base layer decoding unit 32. Using these inputs, it decodes a video signal whose spatial resolution, frame rate, and SNR match the capabilities of the display or the like. The decoded video signal is output to the display or the like. Details of the enhancement layer decoding unit 33 are given later.

Next, the procedure for scalable encoding of a video signal with the encoding unit configuration of FIG. 14 is described with reference to the flowchart of FIG. 15. The encoding unit 10 first decimates the spatial resolution of the original video signal in the space-time decimation unit 11 (step S601 in FIG. 15). The signal is decimated so that its spatial resolution becomes the lowest resolution offered by the resolution scalability. Decimation in the temporal direction follows (step S602 in FIG. 15). As with spatial resolution, the signal is decimated so that its frame rate becomes the lowest frame rate offered by the frame-rate scalability.
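Steps S601 and S602 can be sketched as follows. This is an illustration under stated assumptions, not the filter used in Non-Patent Document 1: the spatial filter is taken to be a plain box average, temporal decimation is simple frame dropping (one of the options named for the space-time decimation unit 11), and both factors default to 2. All function names are hypothetical.

```python
def spatial_decimate(frame, factor=2):
    """Reduce a 2-D frame (list of rows) by averaging factor x factor blocks."""
    height, width = len(frame), len(frame[0])
    if height % factor or width % factor:
        raise ValueError("frame size must be a multiple of the factor")
    area = factor * factor
    return [
        [
            sum(frame[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor)) / area
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def temporal_decimate(frames, factor=2):
    """Simple frame dropping: keep every factor-th frame."""
    return frames[::factor]


def make_base_layer_input(frames, spatial_factor=2, temporal_factor=2):
    """Spatial decimation (S601) followed by temporal decimation (S602)."""
    reduced = [spatial_decimate(f, spatial_factor) for f in frames]
    return temporal_decimate(reduced, temporal_factor)
```

Eight 4×4 frames become four 2×2 frames, which would then be passed to the base layer encoder. Using MCTF instead of frame dropping would replace `temporal_decimate` with a low-band extraction.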

The signal decimated in spatial resolution and frame rate is then encoded by the base layer encoding unit 12 to generate a bit stream (step S603 in FIG. 15). The generated bit stream is sent to the multiplexing unit 15, and the quantized orthogonal transform coefficient data obtained during encoding is sent to the base layer reconstructing unit 13.

The base layer reconstructing unit 13 then reconstructs the quantized orthogonal transform coefficient data obtained during base layer encoding to generate a local decode signal (step S604 in FIG. 15), and sends the generated local decode signal to the enhancement layer encoding unit 14.

Next, using the original video signal and the generated base layer local decode signal, the enhancement layer encoding unit 14 encodes the signal that provides temporal, spatial, and SNR scalability (step S605 in FIG. 15). Details are given later. The bit stream generated by this encoding is sent to the multiplexing unit 15.

Finally, the multiplexing unit 15 multiplexes the bit streams obtained from the base layer encoding unit 12 and the enhancement layer encoding unit 14 into a single bit stream (step S606 in FIG. 15).

Next, the procedure for decoding a scalable bit stream to obtain a decoded video signal with the decoding unit 30 of FIG. 14 is described with reference to the flowchart of FIG. 16. When the extract unit 31 receives a bit stream from the communication line or medium 20 of FIG. 14, it analyzes the bit stream and extracts the coded data required for the capabilities of the decoding unit 30, the display, and so on. It then splits the data into the portions for the base layer decoding unit 32 and the enhancement layer decoding unit 33 and outputs them (step S701 in FIG. 16).

The base layer data separated by the extract unit 31 is then decoded by the base layer decoding unit 32 (step S702 in FIG. 16). The decoded base layer video signal is output to the enhancement layer decoding unit 33 and, if necessary, to a display or the like.

Next, the enhancement layer decoding unit 33 decodes the enhancement layer data separated by the extract unit 31 together with the base layer decoded video signal from the base layer decoding unit 32 (step S703 in FIG. 16). The resulting decoded video signal is output to a display or the like. The enhancement layer decoding procedure is detailed later.

FIG. 17 is a block diagram of an example of the enhancement layer encoding unit 14 of FIG. 14. As shown in FIG. 17, the enhancement layer encoding unit 14 comprises an MCTF-ILP unit 141, an orthogonal transform / quantization unit 142, and an entropy encoding unit 143.

The MCTF-ILP (MCTF-Inter Layer Prediction) unit 141 receives the original video signal and the base layer local decode signal. Using these inputs, it generates the signal that provides temporal and spatial scalability, and outputs the video information to the orthogonal transform / quantization unit 142 and the motion information to the entropy encoding unit 143. Its operation is detailed later.

The orthogonal transform / quantization unit 142 receives the video information, applies an orthogonal transform, quantizes the result, and outputs it to the entropy encoding unit 143. Typical orthogonal transforms are the DCT, the Hadamard transform, and the wavelet transform. The entropy encoding unit 143 receives the quantized video information and the motion information, entropy-encodes each, and outputs them outside the enhancement layer encoding unit 14. The entropy coding is designed so that SNR scalability can be achieved.

Next, the procedure for encoding the enhancement layer with the enhancement layer encoding unit 14 of FIG. 17 is described with reference to the flowchart of FIG. 18. Using the original video signal and the base layer local decode signal, the MCTF-ILP unit 141 generates the signal that provides temporal and spatial scalability (step S801 in FIG. 18). The procedure is detailed later. The resulting video information is sent to the orthogonal transform / quantization unit 142, and the motion information is sent to the entropy encoding unit 143.

The video information signal generated by the MCTF-ILP unit 141 to provide temporal and spatial scalability is orthogonally transformed and then quantized by the orthogonal transform / quantization unit 142 (step S802 in FIG. 18). The transformed and quantized signal is sent to the entropy encoding unit 143.

The motion information obtained by the MCTF-ILP unit 141 and the video information output from the orthogonal transform / quantization unit 142 are entropy-encoded by the entropy encoding unit 143 to generate a bit stream (step S803 in FIG. 18).

Next, the enhancement layer decoding unit 33 of FIG. 14 is described. FIG. 19 is a block diagram of an example of the enhancement layer decoding unit 33. In FIG. 19, the enhancement layer decoding unit 33 comprises an entropy decoding unit 331, an inverse quantization / inverse orthogonal transform unit 332, and an inverse MCTF-ILP unit 333.

The entropy decoding unit 331 receives the bit stream, decodes it, and outputs the video information to the inverse quantization / inverse orthogonal transform unit 332 and the motion information to the inverse MCTF-ILP unit 333.

The inverse quantization / inverse orthogonal transform unit 332 receives the decoded video information, inverse-quantizes it, applies the inverse orthogonal transform, and outputs the result to the inverse MCTF-ILP unit 333. The inverse MCTF-ILP unit 333 receives the inverse-transformed video information signal, the base layer local decode signal, and the decoded motion information. Using these inputs, it performs band synthesis in the temporal direction and inverse prediction between spatial resolutions to decode the video signal, which it outputs outside the enhancement layer decoding unit 33. Its operation is detailed later.

Next, the procedure for decoding the enhancement layer with the enhancement layer decoding unit 33 of FIG. 19 is described with reference to the flowchart of FIG. 20. The enhancement layer decoding unit 33 decodes the input bit stream in the entropy decoding unit 331 (step S1001 in FIG. 20), sends the resulting video information to the inverse quantization / inverse orthogonal transform unit 332, and sends the decoded motion information to the inverse MCTF-ILP unit 333.

The decoded video information is inverse-quantized and then inverse orthogonally transformed by the inverse quantization / inverse orthogonal transform unit 332 (step S1002 in FIG. 20). The resulting video information is sent to the inverse MCTF-ILP unit 333. Using the inverse-transformed video information, the base layer local decode signal, and the decoded motion information, the inverse MCTF-ILP unit 333 performs band synthesis in the temporal direction and inverse prediction between spatial resolutions to decode the video signal (step S1003 in FIG. 20). The procedure is detailed later.

Next, an example of the operation and processing procedure of the MCTF-ILP unit 141 of FIG. 17 is described with reference to FIGS. 21 and 22. First, the original video signal 1104 shown in FIG. 21 is accumulated so that it can be processed in units (GOPs) of, for example, eight frames (step S1201 in FIG. 22). As shown in FIG. 21, the video signal 1104 consists of eight consecutive frames, S(n−2) through S(n+5). The base layer frame rate is 1/2 that of the original video signal. As FIG. 21 also shows, the MCTF-ILP unit 141 extends MCTF with ILP (Inter Layer Prediction, that is, prediction between resolutions).

The prediction processing procedure is described first. The base layer local decode signal 1103 shown in FIG. 21 is interpolated (enlarged) to the same spatial resolution as the original video signal 1104 (1102 in FIG. 21; step S1202 in FIG. 22). Then, when the S(n) frame in FIG. 21 is the frame being processed (an even frame), ME/MC is performed on the preceding and following frames, S(n−1) and S(n+1), so that each is aligned with the phase of the S(n) frame (step S1203 in FIG. 22).
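The interpolation of step S1202 can be pictured with the simplest possible enlargement. The source does not state which interpolation filter is used, so pixel replication is purely an assumption here, as are the function name `interpolate` and the default factor of 2.

```python
def interpolate(frame, factor=2):
    """Enlarge a 2-D frame (list of rows) by repeating each pixel.

    Assumption: pixel replication stands in for whatever interpolation
    filter an actual encoder would apply.
    """
    enlarged = []
    for row in frame:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Emit an independent copy of the widened row for each output line.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```

A 2×2 base layer frame becomes a 4×4 frame, which can then be compared block by block with the full-resolution frame being processed.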

Here, unlike conventional MCTF, the signal obtained by interpolating S'(n) in the base layer local decode signal, indicated by 1103 in FIG. 21, can be used selectively as a third reference frame. That is, for each block, whichever of the two gives the better coding efficiency can be adopted adaptively: temporal band division with motion compensation, or prediction between spatial resolutions. When prediction between resolutions is used, the vector of that block is set to 0, which lets the decoding side identify the choice.
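The per-block choice can be sketched as below. The source says only that the option with the better coding efficiency is adopted, so the sum of absolute differences (SAD) used as the cost is an assumption, as are the function names and the explicit `mode` field, which is included for readability. In the described scheme the zero vector itself carries the signal.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def choose_reference(block, temporal_pred, temporal_mv, base_pred):
    """Pick the cheaper of temporal MC prediction and inter-layer prediction.

    block, temporal_pred, base_pred: flat lists of samples for one block.
    temporal_mv: motion vector found by ME for the temporal prediction.
    base_pred: co-located block of the interpolated base layer frame.
    """
    use_base = sad(block, base_pred) < sad(block, temporal_pred)
    prediction = base_pred if use_base else temporal_pred
    return {
        "mode": "inter_layer" if use_base else "temporal",
        # Inter-resolution prediction is signalled with a zero vector.
        "mv": (0, 0) if use_base else temporal_mv,
        "residual": [a - b for a, b in zip(block, prediction)],
    }
```

The residual of an `inter_layer` block is a difference between resolutions rather than a temporal high-band sample, which matters for the update processing that follows.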

Frames S(n-1) and S(n+1), motion-compensated with this selective use of base layer information, are filtered together with frame S(n) to generate the H_1(n) frame shown in FIG. 21 (step S1204 in FIG. 22). Steps S1203 and S1204 are then repeated for the other even-numbered frames in the GOP (step S1205 in FIG. 22).
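The block-wise selection in this conventional prediction step can be illustrated with a minimal Python sketch. It is not taken from Non-Patent Document 1: it assumes 1-D sample lists, neighbors that are already motion-compensated, a simple two-tap average as the temporal predictor, the sum of absolute residuals as a stand-in for coding efficiency, and an explicit per-block flag in place of the zero-vector signalling. The name `predict_blocks` is hypothetical.

```python
def predict_blocks(cur, prev_mc, next_mc, base_up, block=4):
    """Return (H frame, per-block flags) for one even-numbered frame.

    cur      -- samples of the frame being predicted, S(n)
    prev_mc  -- motion-compensated S(n-1)
    next_mc  -- motion-compensated S(n+1)
    base_up  -- interpolated base layer local decoded frame S'(n)
    """
    h, use_base = [], []
    for start in range(0, len(cur), block):
        sl = slice(start, start + block)
        # Temporal prediction: average of the two compensated neighbors.
        temporal = [(p + n) / 2 for p, n in zip(prev_mc[sl], next_mc[sl])]
        res_t = [c - t for c, t in zip(cur[sl], temporal)]
        # Inter-resolution prediction: interpolated base layer block.
        res_b = [c - b for c, b in zip(cur[sl], base_up[sl])]
        # Assumed cost measure: the smaller absolute residual wins.
        pick_base = sum(abs(r) for r in res_b) < sum(abs(r) for r in res_t)
        h.extend(res_b if pick_base else res_t)
        use_base.append(pick_base)
    return h, use_base
```

In this sketch the returned flags play the role that the zero vector plays in the text: they tell the later update step, and the decoder, which blocks used the base layer.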

Next, the update procedure is described. When frame S(n-1) in FIG. 21 is the (odd-numbered) frame being processed, the predicted frames H_1(n) and H_1(n-2) on either side of it are inverse motion-compensated so that they are aligned with the phase of frame S(n-1) (step S1206 in FIG. 22), and the inverse motion-compensated H_1(n) and H_1(n-2) frames are filtered together with frame S(n-1) (step S1208 in FIG. 22).

A block for which inter-resolution prediction was performed using base layer information during MC in the prediction process is not a temporal H component. Therefore, when step S1207 determines that a block inverse motion-compensated in step S1206 is one that adopted the base layer, the filtering of step S1208 (the process that generates the temporal L component) is skipped. It is then determined whether the update for the even-numbered frames in the GOP has finished (step S1209 in FIG. 22); if it has not, the update process of steps S1206 to S1208 is also performed for the other odd-numbered frames.
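The skipped filtering can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: zero motion (so inverse MC is the identity), a 5/3-style update that adds one quarter of each neighboring H sample, and per-block flags marking the H blocks that adopted the base layer. The function name `update_blocks` and the 1/4 coefficient are assumptions, not values given in the text.

```python
def update_blocks(odd, h_prev, h_next, base_prev, base_next, block=4):
    """Return the L frame for one odd-numbered frame S(n-1).

    odd        -- samples of S(n-1)
    h_prev     -- inverse motion-compensated H_1(n-2)
    h_next     -- inverse motion-compensated H_1(n)
    base_prev  -- per-block flags: H_1(n-2) block adopted the base layer
    base_next  -- per-block flags: H_1(n) block adopted the base layer
    """
    low = []
    for i, start in enumerate(range(0, len(odd), block)):
        for k in range(start, min(start + block, len(odd))):
            acc = 0.0
            # The filter contribution is skipped for base layer blocks,
            # because their residual is not a temporal H component.
            if not base_prev[i]:
                acc += h_prev[k] / 4
            if not base_next[i]:
                acc += h_next[k] / 4
            low.append(odd[k] + acc)
    return low
```

With one filtered block and one skipped block, the resulting L frame contains both low-pass filtered samples and untouched samples side by side, which is the mixture the text identifies as the source of local quality differences.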

MCTF is repeated (steps S1202 to S1209 in FIG. 22) until step S1210 determines that only one L frame remains or that MCTF is to be terminated partway. Dividing the signal into temporal bands and extracting the low-band signal in this way produces a video signal with a lower frame rate. Temporal band division can be repeated as described above, whereas inter-resolution prediction is available only when the frame rate matches that of the base layer.
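The termination condition of step S1210 can be sketched as a small Python helper that lists, for each decomposition level, how many H frames are produced and how many L frames remain. The helper name `mctf_schedule` and the `max_levels` parameter (modelling early termination) are assumptions made for illustration.

```python
def mctf_schedule(gop_size, max_levels=None):
    """Return [(h_frames, l_frames_remaining), ...] per decomposition level."""
    levels = []
    remaining = gop_size
    # Stop when one L frame remains or when the level limit is reached.
    while remaining > 1 and (max_levels is None or len(levels) < max_levels):
        h_count = remaining // 2      # half of the frames become H frames
        remaining -= h_count          # the rest carry on as L frames
        levels.append((h_count, remaining))
    return levels
```

For the eight-frame GOP used in the text, this gives three levels when the decomposition runs to completion, and a single level when it is stopped after the first pass.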

Next, an example of the operation and processing procedure of the inverse MCTF-ILP unit 333 in FIG. 19 is described with reference to FIGS. 23 and 24. The example of FIG. 23 restores a signal encoded under the following conditions: the GOP size of the original video signal is 8, the encoder-side MCTF-ILP uses one decomposition level as indicated by 1304 in FIG. 23, and the base layer frame rate is 1/2 of the original as indicated by 1303. First, the original video signal 1304 shown in FIG. 23 is buffered so that it can be processed in units (GOPs) of, for example, eight frames (step S1401 in FIG. 24).

Next, the base layer decoded signal 1303 shown in FIG. 23 is interpolated in advance, as indicated by 1302, to the same spatial resolution as the enhancement layer signal 1304 (step S1402 in FIG. 24). The inverse update process is then performed. To restore frame S(n-1) in FIG. 23, frames H_1(n-2) and H_1(n) are each inverse motion-compensated so that they are aligned with the phase of frame S(n-1) (step S1403 in FIG. 24), and the inverse motion-compensated H_1(n-2) and H_1(n) frames are filtered together with frame L_1(n-1) (step S1405 in FIG. 24).

The inverse MC uses the motion information decoded by the entropy decoding unit 331 (step S1001 in FIG. 20), and no filtering is applied to blocks whose vector is 0 (step S1404 in FIG. 24). The same processing is performed for the other L frames at the same temporal level in the GOP (step S1406 in FIG. 24).

Next, the inverse prediction procedure is described. To restore frame S(n) in FIG. 23, the inverse-updated frames S(n-1) and S(n+1) on either side of the target frame S(n) are each motion-compensated so that they are aligned with the phase of frame H_1(n) (step S1407 in FIG. 24). For blocks whose vector is 0, the signal obtained by interpolating the base layer decoded signal S'(n) is used instead.

The motion-compensated S(n-1) and S(n+1) frames are then filtered together with frame H_1(n) to obtain frame S(n) (step S1408 in FIG. 24). The same processing is performed for the other H frames at the same temporal level in the GOP (step S1409 in FIG. 24).
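The inverse update followed by inverse prediction can be sketched in Python as below. The sketch assumes zero motion, a single H frame between two L frames, a 5/3-style filter (update of one quarter of the H sample, prediction by the average of the two neighbors), and an explicit per-block flag standing in for the zero-vector test. The name `inverse_step` and the filter coefficients are assumptions for illustration only.

```python
def inverse_step(l_prev, l_next, h, use_base, base_up, block=4):
    """Recover (S(n-1), S(n), S(n+1)) from L_1(n-1), L_1(n+1) and H_1(n)."""
    s_prev, s_cur, s_next = [], [], []
    for k in range(len(h)):
        from_base = use_base[k // block]
        # Inverse update: no filtering for blocks that used the base layer.
        upd = 0.0 if from_base else h[k] / 4
        p = l_prev[k] - upd
        n = l_next[k] - upd
        # Inverse prediction: base layer block, or average of the neighbors.
        pred = base_up[k] if from_base else (p + n) / 2
        s_prev.append(p)
        s_next.append(n)
        s_cur.append(h[k] + pred)
    return s_prev, s_cur, s_next
```

Because each step subtracts or adds exactly what the corresponding encoder step added or subtracted, the frames are recovered exactly in the absence of quantization.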

If the encoder performed MCTF-ILP over additional levels, the inverse update and inverse prediction processes are repeated accordingly to obtain the decoded video signal (step S1410 in FIG. 24).

Patent Document 1: JP-T-2004-524730
Non-Patent Document 1: Diego Santa-Cruz, Davide Maestroni, Francesco Ziliani, Julien Reichel and Stefano Tubaro, "Improved Scalable MCTF Video Codec Using A H.264/AVC Base Layer", Picture Coding Symposium 2004, Dec. 2004.

The video signal encoding method described in Non-Patent Document 1 is an excellent scheme that achieves temporal-spatial scalability with high coding efficiency by selectively using MCTF, which is band division in the temporal direction, and ILP, which is prediction between resolutions. However, this method has the problem that flicker appears when the decoded video is evaluated subjectively. Non-Patent Document 1 itself notes that "a flash of artifacts can appear", and this is an issue that needs to be addressed.

In the method of Non-Patent Document 1, blocks that adopted the base layer local decoded signal in the prediction process are not filtered in the update process. As a result, the L frames contain a mixture of blocks that were filtered and blocks that were decimated in time without filtering even though their temporal frequency is high (that is, even though skipping the filter may degrade the video quality), and this produces local differences in image quality.

This problem tends to occur in moving regions, because the base layer local decoded signal, which in principle correlates more strongly there, is adopted more often. In MCTF in particular, temporal band division is applied repeatedly, so the degradation caused by decimating frames without filtering the L frames propagates into the next MC and temporal band division stage, making it difficult to maintain the accuracy of subsequent processing. Adaptive processing in MCTF therefore needs to be grounded in signal processing theory (so that the signal does not break down), more so than the simple spatio-temporal adaptive processing used in conventional video coding.

The present invention has been made in view of the above. Its object is to provide a video signal encoding device, a video signal decoding device, a video signal encoding program, and a video signal decoding program that resolve the conflict between coding and signal processing that arises in Non-Patent Document 1 and can improve both coding efficiency and decoded video quality.

To achieve the above object, the video signal encoding device of the first invention performs motion-compensated temporal filtering on an input video signal to generate a frame sequence signal consisting of L frames in the low-side divided frequency band and H frames in the high-side divided frequency band, obtained by subband division into two divided frequency bands, and outputs a first encoded signal, obtained by applying orthogonal transform, quantization, and entropy coding to the frame sequence signal, together with a second encoded signal obtained by encoding a video signal of lower resolution than the input video signal. The device comprises: reduction means (11) that reduces the input video signal in space and time at a predetermined ratio to generate a low-resolution video signal whose spatio-temporal resolution differs from that of the input video signal; encoding means (12) that encodes the low-resolution video signal to generate the second encoded signal; local decoding means (13) that locally decodes the second encoded signal to generate a local decoded signal; enlargement means (4117) that enlarges the spatial resolution of the local decoded signal at the inverse of the ratio used by the reduction means to generate a high-spatial-resolution video signal; motion estimation/compensation means (4118) that takes one of two different frames along the time direction of the input video signal as a reference frame, estimates the motion of the other frame to obtain motion information indicating the relative positions of corresponding blocks in the reference frame and the other frame, and compensates the motion of the other frame based on that motion information; first weighting means (4119, 411A) that weights the input video signal motion-compensated by the motion estimation/compensation means with a first weight; second weighting means (4119, 411B) that weights the high-spatial-resolution video signal generated from the local decoded signal by the enlargement means with a second weight calculated from the first weight; first processing means (411C, 411D) that combines and filters the signals weighted by the first and second weighting means to generate the H frames; third weighting means (4119, 4112) that weights the H frames generated by the first processing means with the first weight; second processing means (4113) that motion-compensates the H frames weighted by the third weighting means by applying, in the reverse direction, the motion information obtained by the motion estimation/compensation means, and then filters them to generate the L frames; and frame sequence signal generation means (4115) that combines the H frames and L frames generated by the first and second processing means to generate and output the frame sequence signal.

To achieve the above object, the video signal encoding program of the second invention causes a computer to perform the operation of applying motion-compensated temporal filtering to an input video signal to generate a frame sequence signal consisting of L frames in the low-side divided frequency band and H frames in the high-side divided frequency band, obtained by subband division into two divided frequency bands, and of outputting a first encoded signal, obtained by applying orthogonal transform, quantization, and entropy coding to the frame sequence signal, together with a second encoded signal obtained by encoding a video signal of lower resolution than the input video signal. The program causes the computer to function as each of the means constituting the video signal encoding device of the first invention.

To achieve the above object, the video signal decoding device of the third invention receives and decodes a signal that is output by the video signal encoding device of the first invention, or generated by the video signal encoding program of the second invention, and that consists of the first and second encoded signals, the motion information, and weight information indicating the first to third weights. The device comprises: receiving means (31) that receives the first and second encoded signals and separates them; first decoding means (511, 512) that entropy-decodes the first encoded signal from the receiving means and then applies inverse quantization and inverse orthogonal transform to decode the H frames and L frames; second decoding means (32) that decodes the second encoded signal from the receiving means to generate a low-resolution video signal; enlargement means (5137) that enlarges the spatial resolution of the low-resolution video signal decoded by the second decoding means to generate a high-spatial-resolution video signal; first weighting means (5133) that weights the H frames from the first decoding means with the first weight obtained from the received weight information; first processing means (5134) that motion-compensates the H frames weighted by the first weighting means, or the decoded video signal, by applying the motion information received by the receiving means in the reverse direction, and then filters the result to generate a first decoded signal; second weighting means (5136, 5138) that motion-compensates the first decoded signal by applying the motion information received by the receiving means and then weights it with the first weight obtained from the weight information received by the receiving means; third weighting means (5139) that weights the high-spatial-resolution video signal from the enlargement means with the second weight obtained from the weight information received by the receiving means; second processing means (513A, 513B) that combines the signals weighted by the second and third weighting means and filters the combined signal and the H frames received and separated by the receiving means to generate a second decoded signal; and decoded video signal generation means (513C) that combines the first decoded signal and the second decoded signal to generate the decoded video signal.

To achieve the above object, the video signal decoding program of the fourth invention causes a computer to receive and decode a signal that is output by the video signal encoding device of the first invention, or generated by the video signal encoding program of the second invention, and that consists of the first and second encoded signals, the motion information, and weight information indicating the first to third weights. The program causes the computer to function as each of the means constituting the video signal decoding device of the third invention.

The object of the present invention is to solve the problems described above, which arise in Non-Patent Document 1 from selectively using the base layer local decoded signal in the prediction process and from switching between applying and not applying the filter in the update process. To this end, in the first or second invention, the input video signal motion-compensated by the motion estimation/compensation means is weighted with the first weight, the high-spatial-resolution video signal generated from the local decoded signal by spatial resolution enlargement is weighted with the second weight calculated from the first weight, and the first processing means combines and filters these weighted signals to generate the H frames. The H frames generated by the first processing means are then weighted with the same first weight as the motion-compensated input video signal, and the second processing means inverse motion-compensates and filters them to generate the L frames. Consequently, the first processing means can always generate the H frames from a weighted combination of the motion-compensated input video signal and the high-spatial-resolution video signal generated from the local decoded signal, and the second processing means can generate L frames that reflect only temporal components. In other words, temporal band division is achieved while still predicting between resolutions.
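The weighted prediction and update described here can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: zero motion (so MC and inverse MC are identities), a 5/3-style lifting filter, a single first weight W in the range 0 to 1 per call, and a second weight equal to 1 - W. The text only says that the second weight is calculated from the first, so the 1 - W relation, the filter coefficients, and the name `weighted_lifting` are all hypothetical.

```python
def weighted_lifting(s_prev, s_cur, s_next, base_up, w):
    """Return (H(n), L(n-1), L(n+1)) for one block using weight w.

    s_prev, s_next -- motion-compensated odd frames S(n-1), S(n+1)
    s_cur          -- even frame S(n)
    base_up        -- interpolated base layer local decoded frame S'(n)
    w              -- first weight (assumed to lie in [0, 1])
    """
    w2 = 1.0 - w  # second weight; the 1 - w relation is an assumption
    # Prediction: mix the temporal and inter-resolution predictors.
    h = [c - (w * (p + n) / 2 + w2 * b)
         for c, p, n, b in zip(s_cur, s_prev, s_next, base_up)]
    # Update: the H frame is scaled by the same first weight, so only the
    # temporal share of the residual is fed back into the L frames.
    l_prev = [p + w * x / 4 for p, x in zip(s_prev, h)]
    l_next = [n + w * x / 4 for n, x in zip(s_next, h)]
    return h, l_prev, l_next
```

Under these assumptions, w = 1 reduces to purely temporal lifting and w = 0 to purely inter-resolution prediction with no update, while intermediate weights keep every block on the same filtered path instead of switching the filter on and off per block.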

That is, in the conventional device of Non-Patent Document 1, blocks that adopted the base layer local decoded signal in the prediction process are not filtered in the update process, so the L frames contain a mixture of filtered blocks and blocks decimated without filtering even though skipping the filter may degrade the video quality, and local differences in image quality result. In the present invention, the L frames obtained by the second processing means always contain filtered blocks, so this problem is solved.

Furthermore, because MCTF applies temporal band division repeatedly, the degradation caused by decimating frames without filtering the L frames propagates into the next MC and temporal band division stage and makes it difficult to maintain the accuracy of subsequent processing; the present invention also solves this problem. Coding efficiency can also be improved, since there is greater freedom in predicting spatio-temporal components and in band division.

According to the present invention, the first processing means always generates the H frames from a weighted combination of the motion-compensated input video signal and the local decoded signal, and the second processing means generates L frames that reflect only temporal components, so that the L frames obtained by the second processing means always contain filtered blocks. This solves the problem of the conventional device of Non-Patent Document 1, in which the L frames mix filtered blocks with blocks decimated without filtering even though skipping the filter may degrade the video quality, producing local differences in image quality. As a result, flicker in the decoded video is reduced and the quality of the decoded video is improved.

The present invention also solves the problem that, because MCTF applies temporal band division repeatedly, the degradation caused by decimating frames without filtering the L frames propagates into the next MC and temporal band division stage and makes it difficult to maintain the accuracy of subsequent processing. In addition, coding efficiency is improved, since there is greater freedom in predicting spatio-temporal components and in band division.

Next, an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram of an embodiment of the video signal encoding device and video signal decoding device according to the present invention. Components identical to those in FIG. 14 carry the same reference numerals, and their description is omitted. In FIG. 1, the encoding unit 40, which is an embodiment of the video signal encoding device according to the present invention, consists of a temporal-spatial decimation unit 11, a base layer encoding unit 12, a base layer reconstruction unit 13, an enhancement layer encoding unit 41, and a multiplexing unit 15. The decoding unit 50, which is an embodiment of the video signal decoding device according to the present invention, consists of an extraction unit 31, a base layer decoding unit 32, and an enhancement layer decoding unit 51.

As in FIG. 14, in this embodiment the original video signal is input to the encoding unit 40, and the bitstream generated by the encoding unit 40 is transmitted to the decoding unit 50 over a communication line or via a medium 20. The decoding unit 50 extracts the necessary information from the supplied bitstream and outputs a decoded video signal whose spatial resolution, frame rate, and SNR suit the capabilities of the display or other output device.

The present invention aims to remedy the adverse effects of selectively using MCTF and ILP in Non-Patent Document 1. This embodiment is characterized in that the enhancement layer encoding unit 14, which contains the MCTF-ILP unit of Non-Patent Document 1, is replaced by an enhancement layer encoding unit 41 with the configuration described later, and the enhancement layer decoding unit 33, which contains the inverse MCTF-ILP unit, is replaced by an enhancement layer decoding unit 51 with the configuration described later. The remaining components are the same as in the conventional configuration of FIG. 14. Like the example of FIG. 14, the embodiment of FIG. 1 implements spatial resolution scalability with two layers, but more layers may be used.

FIG. 2 is a block diagram of an embodiment of the enhancement layer encoding unit 41 in FIG. 1. In FIG. 2, the enhancement layer encoding unit 41 consists of an MCTF-ILP unit 411, an orthogonal transform/quantization unit 412, and an entropy encoding unit 413. The MCTF-ILP unit 411 receives the original video signal and the base layer local decoded signal from the base layer reconstruction unit 13 as inputs, and uses them to generate a signal that provides spatio-temporal scalability. Its operation is described in detail later. The MCTF-ILP unit 411 also outputs the video information to the orthogonal transform/quantization unit 412, and outputs the weight information and the motion information to the entropy encoding unit 413.

The orthogonal transform/quantization unit 412 receives the video information output from the MCTF-ILP unit 411, applies an orthogonal transform to it, and quantizes the result. Typical orthogonal transforms include the DCT, the Hadamard transform, and the wavelet transform. The orthogonal transform/quantization unit 412 outputs the transformed and quantized signal to the entropy encoding unit 413.
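As a small illustration of transform followed by quantization, the Python sketch below applies a 4-point Hadamard transform (one of the transforms named above) and a uniform quantizer. The block length, the truncating quantizer, and the function names are assumptions chosen for brevity; the text does not specify which transform or quantizer the unit 412 uses.

```python
# 4-point Hadamard matrix in natural (Sylvester) order.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def hadamard4(x):
    """Unnormalized 4-point Hadamard transform of a list of 4 samples."""
    return [sum(h * v for h, v in zip(row, x)) for row in H4]

def quantize(coeffs, step):
    """Uniform quantizer that truncates toward zero (an assumed choice)."""
    return [int(c / step) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct coefficient values from quantization levels."""
    return [q * step for q in levels]
```

For the samples 1, 2, 3, 4 the transform gives 10, -2, -4, 0; with a step of 4 the small coefficient -2 is quantized to zero, which is where the lossy part of the coding comes from.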

The entropy encoding unit 413 acquires the video information output from the orthogonal transform/quantization unit 412 and the weight information and motion information output from the MCTF-ILP unit 411, entropy-codes this input to generate a bitstream, and outputs the bitstream from the enhancement layer encoding unit 41. As in conventional methods, the entropy coding may use a configuration and procedure that provide SNR scalability.

The procedure for encoding the enhancement layer with the enhancement layer encoding unit 41 of FIG. 2 is described with reference to the flowchart of FIG. 3. Using the original video signal and the base layer local decoded signal, the MCTF-ILP unit 411 generates a signal that provides temporal-spatial scalability (step S101 in FIG. 3). The procedure is described in detail later. The resulting video information is sent to the orthogonal transform/quantization unit 412, and the weight information and motion information are sent to the entropy encoding unit 413.

The video information signal generated by the MCTF-ILP unit 411 to provide temporal-spatial scalability is orthogonally transformed by the orthogonal transform/quantization unit 412, and the resulting transform coefficients are quantized (step S102 in FIG. 3). The transformed and quantized signal is sent to the entropy encoding unit 413. The weight information and motion information obtained by the MCTF-ILP unit 411 and the video information output from the orthogonal transform/quantization unit 412 are entropy-coded by the entropy encoding unit 413 to generate a bitstream (step S103 in FIG. 3).

Next, the configuration and operation of the MCTF-ILP unit 411 shown in FIG. 2, within the enhancement layer encoding unit 41, will be described in detail. FIG. 4 shows the configuration of one embodiment of the MCTF-ILP unit 411. In the figure, the MCTF-ILP unit 411 comprises an input switch 4110, a prediction processing unit 4111, an update weighting unit 4112, an update processing unit 4113, a frame management unit 4114, a frame rearrangement unit 4115, and a motion information update unit 4116.

The prediction processing unit 4111 has a function of acquiring, as inputs, the signal input from the input switch 4110, the locally decoded base layer signal output from the base layer reconstructing unit 13, and the frame management information output from the frame management unit 4114. The internal configuration and operation of the prediction processing unit 4111 are described later. The prediction processing unit 4111 also has a function of outputting the prediction result signal to the frame rearrangement unit 4115 and the update weighting unit 4112; a function of outputting the weight information to the update weighting unit 4112, the motion information update unit 4116, and the outside of the MCTF-ILP unit 411; and a function of outputting the motion information (for example, motion vectors and macroblock types) to the update processing unit 4113 and the motion information update unit 4116.

The update processing unit 4113 has a function of acquiring, as inputs, the signal input from the input switch 4110, the signal output from the update weighting unit 4112, the frame management information output from the frame management unit 4114, and the motion information output from the prediction processing unit 4111; a function of performing inverse MC and filter processing using the input signals and information; and a function of outputting the processed signal to the frame rearrangement unit 4115.

The frame management unit 4114 has a function of managing the frames in the MCTF-ILP unit 411 and a function of outputting the management information to the input switch 4110, the prediction processing unit 4111, the update processing unit 4113, and the frame rearrangement unit 4115.

The frame rearrangement unit 4115 has a function of acquiring the prediction-processed signal output from the prediction processing unit 4111, the update-processed signal output from the update processing unit 4113, and the frame management information output from the frame management unit 4114; a function of rearranging, on the basis of the acquired information, the frames for which MCTF-ILP processing has finished into a desired order; and a function of outputting the rearranged result to the outside of the MCTF-ILP unit 411.

The update weighting unit 4112 has a function of acquiring the prediction result signal (H frame) and the weight information output from the prediction processing unit 4111, a function of weighting the acquired prediction result signal (H frame) on the basis of the acquired weight information, and a function of outputting the weighted signal to the update processing unit 4113.

The motion information update unit 4116 has a function of acquiring the motion information and weight information output from the prediction processing unit 4111, a function of updating the motion information according to the content of the acquired weight information, and a function of outputting the updated result to the outside of the MCTF-ILP unit 411.

The input switch 4110 has a function of acquiring the video signal, the update result signal (L frame) output from the update processing unit 4113, and the frame management information output from the frame management unit 4114, and a function of selecting, according to the acquired frame management information, either the acquired video signal or the update result signal (L frame) and outputting it to the prediction processing unit 4111 and the update processing unit 4113.

Next, a detailed configuration example of the prediction processing unit 4111 will be described. As shown in FIG. 4, the prediction processing unit 4111 comprises a resolution interpolation unit 4117, an ME/MC unit 4118, a weight instruction unit 4119, a prediction weighting unit 411A, an ILP weighting unit 411B, a signal synthesis unit 411C, and a filtering unit 411D.

The resolution interpolation unit 4117 has a function of acquiring the base layer local decode signal and the frame management information output from the frame management unit 4114, and a function of interpolating (enlarging) the base layer local decode signal up to the same spatial resolution as the video signal. Any interpolation method may be used, but it is desirable to use one that corresponds to the filter used when the signal was decimated (reduced) to the spatial resolution of the base layer. The resolution interpolation unit 4117 also has a function of outputting the interpolated signal to the ILP weighting unit 411B.
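
Since any interpolation method is permitted, the simplest possible one can serve as an illustration. The sketch below enlarges a frame by nearest-neighbour repetition; the function name `upsample_nearest` and the integer `factor` are assumptions for this example, and a practical encoder would more likely use a filter matched to the decimation filter, as recommended above.

```python
def upsample_nearest(frame, factor):
    """Enlarge a 2-D frame (list of rows) by repeating every sample.

    Each sample is repeated `factor` times horizontally and each row
    `factor` times vertically.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in frame:
        wide_row = [sample for sample in row for _ in range(factor)]
        # Copy the widened row so that output rows do not alias each other.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


base_layer = [[1, 2],
              [3, 4]]
interpolated = upsample_nearest(base_layer, 2)
```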

The ME/MC unit 4118 has a function of acquiring the signal input from the input switch 4110 and the frame management information output from the frame management unit 4114; a function of performing motion estimation (ME) and motion compensation (MC) such as those of the conventional MPEG (Moving Picture Experts Group) series, H.264, and MCTF; and a function of outputting the signal obtained by ME/MC to the prediction weighting unit 411A and outputting the motion information to the outside of the prediction processing unit 4111.

The weight instruction unit 4119 has a function of acquiring the prediction result signal (H frame) output from the filtering unit 411D and the frame management information output from the frame management unit 4114, and a function of calculating the weight W from the acquired information, supplying it to the prediction weighting unit 411A, the ILP weighting unit 411B, the update weighting unit 4112, and the motion information update unit 4116, and outputting it to the outside of the MCTF-ILP unit 411. The weight instruction unit 4119 may instead accept an instruction from the outside as its input.

The prediction weighting unit 411A has a function of acquiring the ME/MC-processed signal from the ME/MC unit 4118 and the weight W output from the weight instruction unit 4119, and a function of weighting the ME/MC-processed signal by W and outputting it to the signal synthesis unit 411C. The ILP weighting unit 411B has a function of acquiring the signal output from the resolution interpolation unit 4117 and the weight W from the weight instruction unit 4119, a function of calculating (1-W), and a function of weighting the resolution-interpolated signal by the calculated value and outputting it to the signal synthesis unit 411C.

The signal synthesis unit 411C has a function of acquiring the signals output from the prediction weighting unit 411A and the ILP weighting unit 411B and the frame management information output from the frame management unit 4114; a function of combining the signals acquired from the prediction weighting unit 411A and the ILP weighting unit 411B, for example by addition, to create the final reference frame for the prediction processing; and a function of outputting the resulting signal to the filtering unit 411D.
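
The weighting by W and (1-W) followed by addition can be sketched as a single sample-wise blend. The function name `blend_reference` and the list-of-rows frame layout are assumptions for illustration; the code merges the roles of units 411A, 411B, and 411C into one function for brevity.

```python
def blend_reference(mc_frame, ilp_frame, w):
    """Return W * mc_frame + (1 - W) * ilp_frame, sample by sample.

    mc_frame:  ME/MC-processed frame (role of the input to unit 411A)
    ilp_frame: resolution-interpolated base layer frame (input to unit 411B)
    w:         weight W supplied by the weight instruction unit
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [
        [w * m + (1.0 - w) * b for m, b in zip(mc_row, ilp_row)]
        for mc_row, ilp_row in zip(mc_frame, ilp_frame)
    ]


reference = blend_reference([[10, 20]], [[30, 40]], 0.25)
```

With W = 1 the reference is the pure temporal prediction, and with W = 0 it is the pure inter-layer prediction.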

The filtering unit 411D has a function of acquiring the signal output from the signal synthesis unit 411C, the signal output from the input switch 4110, and the frame management information output from the frame management unit 4114, and a function of filtering the reference frame obtained by the signal synthesis unit 411C together with the input signal to generate an H frame and outputting it to the outside of the prediction processing unit 4111 and to the weight instruction unit 4119.

That is, in the MCTF-ILP unit 411, the signal obtained by applying ME/MC to the original video signal in the ME/MC unit 4118 and the signal obtained by interpolating the base layer local decode signal in the resolution interpolation unit 4117 are weighted by the prediction weighting unit 411A and the ILP weighting unit 411B, respectively. The signal synthesis unit 411C then combines the two signals to generate a reference frame. The filtering unit 411D filters the reference frame and the output of the input switch 4110 to obtain the prediction result signal (H frame). The filter processing in the update processing unit 4113 is not skipped; it uses the H frame weighted with the same weight as in the prediction weighting unit 411A of the prediction processing unit 4111, so that only the temporal component of the H frame is reflected in the L frame. In other words, band division in the temporal direction is realized while prediction between spatial resolutions is performed.

Next, the procedure of MCTF-ILP processing by the MCTF-ILP unit 411 configured as shown in FIG. 4 will be described with reference to the flowchart of FIG. 5. So that the series of processes shown in FIG. 5 can be performed in units of one GOP, as in Non-Patent Document 1, the video frames of one GOP are accumulated (step S201 in FIG. 5). The series of processes shown in FIG. 5 may instead be performed in units of several GOPs, with processing that spans multiple GOPs. Next, the base layer local decode signal from the base layer reconstructing unit 13 of FIG. 1 is spatially interpolated by the resolution interpolation unit 4117 of FIG. 4 at the inverse of the decimation ratio of the space-time decimation unit 11, and is thereby converted into a video signal of the same high spatial resolution as the input video signal of the MCTF-ILP unit 411 (step S202 in FIG. 5).

Next, the prediction processing unit 4111 performs prediction processing. Initially, the high-spatial-resolution video signal input from the outside is supplied to the prediction processing unit 4111 through the input switch 4110, and the first band division by MCTF is performed. In the prediction processing unit 4111, the ME/MC unit 4118 performs ME/MC processing on the frames (odd) before and after the processing target frame (even) in the output signal of the input switch 4110 (step S203 in FIG. 5).

That is, the ME/MC unit 4118 first performs motion estimation (ME): taking the processing target frame as the reference frame, it estimates the motion of the preceding or following frame and obtains motion vector information (hereinafter also called motion information) indicating the relative position of each pair of corresponding blocks in the reference frame and the preceding or following frame. It then performs motion compensation (MC), which compensates the motion of the preceding or following frame on the basis of that motion vector information. This ME/MC processing is performed for both the frame before and the frame after the reference frame.
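
The ME and MC steps for one block can be sketched as follows. The description leaves the estimation method open (it only points to MPEG, H.264, and MCTF practice), so the full-search block matching with a sum-of-absolute-differences cost used here, and all function names, are assumptions for illustration.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a 2-D frame (list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion(target, neighbor, top, left, size, search):
    """ME: find the displacement (dy, dx) within +/-search that best maps
    the target block onto a block of the neighboring frame."""
    height, width = len(neighbor), len(neighbor[0])
    current = get_block(target, top, left, size)
    best = (0, 0)
    best_cost = sad(current, get_block(neighbor, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would leave the frame
            cost = sad(current, get_block(neighbor, y, x, size))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def compensate(neighbor, top, left, size, vector):
    """MC: fetch the neighboring-frame block displaced by the motion vector."""
    dy, dx = vector
    return get_block(neighbor, top + dy, left + dx, size)
```

In the procedure described above, the search would be run once against the preceding frame and once against the following frame of each processing target frame.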

Note that after the filtering unit 411D, described later, first outputs the H frames of the higher of the two sub-band-divided frequency bands, the prediction processing unit 4111 goes on to perform the second and subsequent band divisions by MCTF in the same manner. Because the second and subsequent band divisions are performed on L frames, as described with reference to FIG. 11 and elsewhere, the L frames of the lower divided frequency band output from the update processing unit 4113 are selected by the input switch 4110 and input to the prediction processing unit 4111.

The signal subjected to ME/MC processing in the ME/MC unit 4118 is weighted by the value W (0 < W < 1) in the prediction weighting unit 411A in accordance with the instruction from the weight instruction unit 4119 (step S204 in FIG. 5). Here, the prediction-processed signal (H frame) output from the filtering unit 411D is fed back to the weight instruction unit 4119, which seeks the optimum W such that the output signal of the signal synthesis unit 411C becomes small. Many other rules for calculating the weight W in the weight instruction unit 4119 are conceivable; for example, W may be specified from the outside.
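
One conceivable rule for the weight instruction unit is sketched below. It assumes that the fed-back H frame is the difference between the current frame and the blended reference and that the "optimum" W is the candidate minimizing the energy of that difference; both the criterion and the candidate grid are assumptions for this example, not the rule of the embodiment.

```python
def residual_energy(current, mc, ilp, w):
    """Energy of current - (W * mc + (1 - W) * ilp) over flat sample lists."""
    return sum((c - (w * m + (1.0 - w) * b)) ** 2
               for c, m, b in zip(current, mc, ilp))


def choose_weight(current, mc, ilp, candidates):
    """Pick the candidate W that yields the smallest residual energy."""
    return min(candidates, key=lambda w: residual_energy(current, mc, ilp, w))


current = [10, 20, 30]   # samples of the frame being predicted
mc = [12, 22, 32]        # ME/MC-processed samples
ilp = [4, 14, 24]        # resolution-interpolated base layer samples
best_w = choose_weight(current, mc, ilp, [0.25, 0.5, 0.75])
```

In this toy data the residual is 6 - 8W for every sample, so W = 0.75 cancels it exactly.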

Meanwhile, the high-spatial-resolution video signal obtained by resolution interpolation of the base layer local decode signal in the resolution interpolation unit 4117 is weighted by the value (1-W) in the ILP weighting unit 411B in accordance with the instruction from the weight instruction unit 4119 (step S205 in FIG. 5).

Next, the signals weighted by the prediction weighting unit 411A and the ILP weighting unit 411B are combined by the signal synthesis unit 411C to generate the final reference frame for the prediction processing (step S206 in FIG. 5).

Next, the filtering unit 411D filters the signal selected by the input switch 4110 and the reference frame from the signal synthesis unit 411C (step S207 in FIG. 5). As a result, an H frame is obtained from the filtering unit 411D as the prediction result signal of the MCTF-ILP unit 411.
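
The filter of step S207 is not spelled out in the description. Assuming a lifting-style prediction step in which the H frame is the sample-wise difference between the selected input frame and the reference frame, it could be sketched as follows; `predict_step` is a hypothetical name.

```python
def predict_step(current, reference):
    """H frame = current frame minus reference frame, sample by sample.

    Both arguments are 2-D frames (lists of rows) of equal size.
    """
    return [
        [c - r for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


h_frame = predict_step([[10, 12], [8, 6]],
                       [[9.5, 12.0], [10.0, 6.5]])
```

The better the blended reference matches the current frame, the closer the H frame is to zero and the cheaper it is to encode.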

Next, it is determined whether the prediction processing of steps S203 to S207 has been performed for all even frames in the GOP (step S208 in FIG. 5), and the prediction processing of steps S203 to S207 is repeated until all have been processed. This ends the band division processing by MCTF (first pass onward) in the prediction processing unit 4111.

Next, update processing is performed. Because only the temporal high-frequency component of the prediction result signal should be reflected in the update processing, the update weighting unit 4112 weights the prediction result signal (H frame) output from the filtering unit 411D by W (step S209 in FIG. 5).

Next, the prediction result signals (H frames) of the frames (even) before and after the processing target frame (odd), weighted by W in the update weighting unit 4112, undergo inverse MC in the update processing unit 4113 using the motion information obtained from the prediction processing unit 4111 (step S210 in FIG. 5). The update processing unit 4113 then filters the inverse-MC signal and the signal selected by the input switch 4110 (step S211 in FIG. 5). As a result, an updated L frame is generated.
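
Steps S209 to S211 can be sketched for flat sample lists as follows. The sketch assumes zero motion, so that inverse MC is the identity, and an update filter that adds a fixed fraction (`gain`) of the summed weighted H frames to the current frame; the gain of 0.25 is borrowed from common 5/3 lifting practice and is an assumption, as are the function names.

```python
def weight_frame(frame, w):
    """Step S209: scale every sample of an H frame by the weight W."""
    return [w * sample for sample in frame]


def update_step(current, weighted_h_frames, gain=0.25):
    """Step S211 (with inverse MC omitted): L = current + gain * sum(H).

    weighted_h_frames holds the W-weighted H frames of the neighboring
    frames, each as a flat list aligned with `current`.
    """
    totals = [sum(samples) for samples in zip(*weighted_h_frames)]
    return [c + gain * t for c, t in zip(current, totals)]


w = 0.5
h_prev = weight_frame([4, -4], w)
h_next = weight_frame([0, 8], w)
l_frame = update_step([10, 20], [h_prev, h_next])
```

Because the H frames are scaled by W before the update, a reference that relied mostly on the base layer (small W) contributes little to the L frame.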

Next, it is determined whether the update processing has finished for the even frames in the same GOP (step S212 in FIG. 5), and the update processing of steps S209 to S211 is applied to the remaining frames until it has.

When step S212 determines that the update processing has finished, the first band division by MCTF is complete. The input switch 4110 then switches to select the L frames from the update processing unit 4113, and the second and subsequent band divisions by MCTF are performed through steps S203 to S212. The processing from step S203 onward is repeated until only one L frame remains or until MCTF is terminated partway, so that band division in the temporal direction is repeated as in conventional MCTF (step S213 in FIG. 5). Finally, the frame rearrangement unit 4115 rearranges the frames into the desired order and outputs them to the outside of the MCTF-ILP unit 411. A signal subjected to MCTF-ILP processing can be generated by the above method.
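
The repetition of step S213 can be sketched as a loop over decomposition levels. To keep the example short, each frame is represented by a single number, motion compensation and inter-layer blending are omitted, and a Haar-like lifting pair is assumed; the function name `decompose_gop` and the `max_levels` parameter (modeling early termination of MCTF) are hypothetical.

```python
def decompose_gop(frames, max_levels=None):
    """Repeat temporal band division on the L frames of the previous level.

    Returns (remaining L frames, list of H-frame lists, one per level).
    """
    low = list(frames)
    high_bands = []
    while len(low) > 1 and (max_levels is None or len(high_bands) < max_levels):
        if len(low) % 2:
            raise ValueError("this sketch needs an even number of frames per level")
        first, second = low[0::2], low[1::2]
        high = [b - a for a, b in zip(first, second)]    # prediction step
        low = [a + h / 2 for a, h in zip(first, high)]   # update step
        high_bands.append(high)
    return low, high_bands


final_low, bands = decompose_gop([2, 4, 6, 8])
```

For a GOP of four frames this yields two H frames at the first level, one at the second, and a single remaining L frame.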

Next, the handling of the control information (motion information and weight information) will be described. Basically, the motion information obtained by the ME/MC unit 4118 and the weight W calculated by the weight instruction unit 4119 are output unchanged to the outside of the MCTF-ILP unit 411 and entropy-coded.

Thus, according to the present embodiment, the prediction-processed signal (H frame) output from the filtering unit 411D is fed back to the weight instruction unit 4119 to obtain the optimum weight W that makes the output signal of the signal synthesis unit 411C small. The signals input to the filtering unit 411D are weighted according to the value of W or (1-W), and the prediction-processed signal (H frame) output from the filtering unit 411D is also weighted by W. Consequently, the filtering unit 411D filters a signal in which the video signal obtained by resolution interpolation of the base layer local decode signal and the ME/MC-processed signal are mixed according to W, and thereby generates the H frame; the L frame is then generated by applying update processing to the H frame weighted by W. Only the temporal component is therefore reflected in the L frame (blocks filtered by the filtering unit 411D are always present in the L frame).

Band division in the temporal direction can therefore be realized while prediction between resolutions is performed. This solves the problem of Non-Patent Document 1, namely that, because blocks for which the base layer local decode signal was adopted in the prediction processing are not filtered in the update processing, the L frame contains a mixture of filtered blocks and blocks whose frames were thinned out without filtering even though omitting the filter may impair video quality, which produces local differences in image quality. Flicker in the decoded video is thereby reduced. Furthermore, because MCTF repeats temporal band division, degradation caused by thinning out frames without filtering the L frames propagates to the next MC and temporal band division, making it difficult to maintain accuracy in subsequent processing; this problem is also solved. In addition, coding efficiency can be improved because the degrees of freedom in spatial-temporal component prediction and band division increase. Finally, because the configuration and procedure of the MCTF-ILP unit 411 of this embodiment consist entirely of linear processing, reconstruction on the decoding side is possible under conditions where the effect of quantization can be ignored.
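
The reconstruction argument can be checked on a toy example. The sketch assumes zero motion, flat sample lists, and forward steps of the form H = x1 - (W * x0 + (1 - W) * b) and L = x0 + (W / 2) * H, where b is the interpolated base layer signal; these formulas and function names are illustrative assumptions, not the filters of the embodiment.

```python
def forward_lifting(x0, x1, base_up, w):
    """Weighted prediction step followed by weighted update step."""
    high = [b1 - (w * a + (1.0 - w) * b) for a, b1, b in zip(x0, x1, base_up)]
    low = [a + 0.5 * w * h for a, h in zip(x0, high)]
    return low, high


def inverse_lifting(low, high, base_up, w):
    """Undo the update step first, then the prediction step."""
    x0 = [l - 0.5 * w * h for l, h in zip(low, high)]
    x1 = [h + (w * a + (1.0 - w) * b) for h, a, b in zip(high, x0, base_up)]
    return x0, x1


x0, x1 = [8, 16], [12, 20]
base_up = [4, 8]
low, high = forward_lifting(x0, x1, base_up, 0.5)
restored = inverse_lifting(low, high, base_up, 0.5)
```

Each inverse step subtracts or adds exactly what the forward step added or subtracted, so without quantization the decoder recovers both frames from L, H, W, and the base layer signal.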

As an application of this embodiment, the weight W may, for example, be set to "0". This case is equivalent to selecting base layer information in Non-Patent Document 1, and the problem of Non-Patent Document 1 described above remains; however, the motion information update unit 4116 of FIG. 4 can then set the vector to "0", and this method can be adopted when doing so lowers the entropy. Furthermore, if the value of W is restricted to "0" or "1" and W is not encoded, the scheme becomes fully equivalent to Non-Patent Document 1, so compatibility can also be maintained.

Next, the enhancement layer decoding unit 51 in the decoding unit 50 of FIG. 1 will be described. FIG. 6 shows a block diagram of one embodiment of the enhancement layer decoding unit 51. As shown in the figure, the enhancement layer decoding unit 51 comprises an entropy decoding unit 511, an inverse quantization/inverse orthogonal transform unit 512, and an inverse MCTF-ILP unit 513. The entropy decoding unit 511 has a function of acquiring and analyzing the bitstream and performing entropy decoding to decode the video information, weight information, and motion information, and a function of outputting the decoded video information to the inverse quantization/inverse orthogonal transform unit 512 and the weight information and motion information to the inverse MCTF-ILP unit 513. If the input bitstream supports SNR scalability, a configuration and procedure capable of decoding it may be provided, as in conventional methods.

The inverse quantization/inverse orthogonal transform unit 512 has a function of acquiring the decoded video information output from the entropy decoding unit 511, inversely quantizing it, and then applying the inverse orthogonal transform, and a function of outputting the inversely quantized and inversely transformed signal to the inverse MCTF-ILP unit 513. The inverse MCTF-ILP unit 513 has a function of acquiring the signal inversely quantized and inversely transformed by the inverse quantization/inverse orthogonal transform unit 512, the decoded weight information and motion information output from the entropy decoding unit 511, and the signal obtained by decoding the base layer in the base layer decoding unit 32 of FIG. 1; a function of performing band synthesis in the temporal direction and inverse prediction between resolutions to obtain a decoded video signal; and a function of outputting the decoded video signal to the outside of the enhancement layer decoding unit 51.

Next, the procedure by which the enhancement layer decoding unit 51 configured as shown in FIG. 6 decodes the enhancement layer will be described with reference to the flowchart of FIG. 7. The enhancement layer decoding unit 51 decodes the input bitstream in the entropy decoding unit 511 (step S301 in FIG. 7). The video information obtained by decoding is sent to the inverse quantization/inverse orthogonal transform unit 512, and the decoded weight information and motion information are sent to the inverse MCTF-ILP unit 513.

The decoded video information is inversely quantized and then inversely orthogonally transformed by the inverse quantization/inverse orthogonal transform unit 512 (step S302 in FIG. 7). The resulting signal is sent to the inverse MCTF-ILP unit 513. Using the inversely transformed video information, the decoded base layer signal, and the decoded weight information and motion information, the inverse MCTF-ILP unit 513 performs band synthesis in the temporal direction and inverse prediction between resolutions to obtain the decoded video signal (step S303 in FIG. 7). Details of this procedure are described later.

Next, the configuration and operation of the inverse MCTF-ILP unit 513 in the enhancement layer decoding unit 51 shown in FIG. 6 will be described in detail. FIG. 8 shows the configuration of one embodiment of the inverse MCTF-ILP unit 513. As shown in the figure, the inverse MCTF-ILP unit 513 comprises a frame management unit 5130, a frame division unit 5131, an input switch 5132, an inverse update weighting unit 5133, an inverse update processing unit 5134, an inverse prediction processing unit 5135, and a frame rearrangement unit 513C.

The frame management unit 5130 has a function of managing the frames in the inverse MCTF-ILP unit 513, and a function of outputting the frame management information to the frame division unit 5131, the input switch 5132, the inverse update processing unit 5134, the inverse prediction processing unit 5135, and the frame rearrangement unit 513C.

The frame division unit 5131 has a function of acquiring the video information to be restored and the frame management information output from the frame management unit 5130, and a function of dividing the acquired video information into L frames and H frames, outputting the L frames to the input switch 5132, and outputting the H frames to the inverse update weighting unit 5133 and the inverse prediction processing unit 5135.

The input switch 5132 has a function of acquiring the signal output from the frame division unit 5131, the signal output from the inverse prediction processing unit 5135, and the frame management information output from the frame management unit 5130, and a function of selecting, according to the frame management information, either the signal from the frame division unit 5131 or the signal from the inverse prediction processing unit 5135 and outputting the selected signal to the inverse update processing unit 5134.

The inverse update weighting unit 5133 has a function of acquiring the H frame output from the frame division unit 5131 and the weight information decoded by the entropy decoding unit 511, and a function of weighting the H frame based on the weight information W and outputting the resulting signal to the inverse update processing unit 5134.

The inverse update processing unit 5134 has a function of acquiring the signal (H frame) output from the inverse update weighting unit 5133, the signal output from the input switch 5132, the motion information decoded by the entropy decoding unit 511, and the frame management information output from the frame management unit 5130. It also has a function of performing inverse MC based on the acquired motion information (that is, motion compensation with the motion information applied in the reverse direction), applying filter processing, and outputting the resulting L frame to the inverse prediction processing unit 5135 and the frame rearrangement unit 513C.

The inverse prediction processing unit 5135 has a function of acquiring the frame management information output from the frame management unit 5130, the H frame output from the frame division unit 5131, the L frame output from the inverse update processing unit 5134, the motion information and weight information decoded by the entropy decoding unit 511, and the base layer decoded signal, and a function of outputting the result of the inverse prediction processing to the frame rearrangement unit 513C. The internal configuration and operation of the inverse prediction processing unit 5135 are described later.

The frame rearrangement unit 513C has a function of acquiring the L frame restored by the inverse update processing unit 5134, the H frame restored by the inverse prediction processing, and the frame management information output from the frame management unit 5130. It also has a function of using the frame management information to rearrange the L frame restored by the inverse update processing unit 5134 and the H frame restored by the inverse prediction processing unit 5135 into the desired frame order, and outputting them to the input switch 5132 and to the outside.

Next, a detailed configuration example of the inverse prediction processing unit 5135 is described. As shown in FIG. 8, the inverse prediction processing unit 5135 consists of an MC unit 5136, a resolution interpolation unit 5137, an inverse prediction weighting unit 5138, an inverse ILP weighting unit 5139, a signal synthesis unit 513A, and a filtering unit 513B.

The MC unit 5136 has a function of acquiring the L frame output from the inverse update processing unit 5134, the motion information, and the frame management information output from the frame management unit 5130, and a function of performing motion compensation (MC) on the L frame based on the motion information and outputting the resulting signal to the inverse prediction weighting unit 5138.

The resolution interpolation unit 5137 has a function of acquiring the base layer decoded signal and the frame management information output from the frame management unit 5130, and a function of interpolating the acquired base layer decoded signal up to the same spatial resolution as the video information to be restored. Any interpolation method may be used here, but it is desirable to use one corresponding to the filter used in the MCTF-ILP unit 411 of the encoding unit 40. The resolution interpolation unit 5137 also has a function of outputting the interpolated signal to the inverse ILP weighting unit 5139.
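
As a minimal sketch of the interpolation step, assuming an integer enlargement ratio and nearest-neighbour interpolation chosen only for brevity (the text leaves the method open and prefers one matching the encoder-side filter), the enlargement could look as follows. The function name `upsample_nearest` and the list-of-rows frame layout are illustrative assumptions and do not appear in the embodiment.

```python
def upsample_nearest(frame, factor):
    """Enlarge a 2-D frame (a list of pixel rows) by an integer factor.

    Assumption: nearest-neighbour interpolation, used here only as a
    stand-in for whatever interpolation filter the encoder side used.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in frame:
        # Repeat every pixel horizontally ...
        wide = [px for px in row for _ in range(factor)]
        # ... then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    base = [[1, 2],
            [3, 4]]
    for row in upsample_nearest(base, 2):
        print(row)
```

A smoother filter (bilinear or a longer-tap kernel) would replace the two repetition steps without changing the interface.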

The inverse prediction weighting unit 5138 has a function of acquiring the signal output from the MC unit 5136 and the weight information, and a function of weighting the signal output from the MC unit 5136 based on the acquired weight information W and outputting it to the signal synthesis unit 513A. The inverse ILP weighting unit 5139 has a function of acquiring the signal output from the resolution interpolation unit 5137 and the weight information, and a function of calculating (1-W) from the acquired weight information W, weighting the signal output from the resolution interpolation unit 5137 accordingly, and outputting it to the signal synthesis unit 513A.

The signal synthesis unit 513A has a function of acquiring the signals output from the inverse prediction weighting unit 5138 and the inverse ILP weighting unit 5139, and a function of synthesizing the acquired signals, for example by addition, to generate a reference frame for the inverse MCTF-ILP and outputting it to the filtering unit 513B.
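
The weighting and synthesis performed by units 5138, 5139, and 513A can be sketched as below. This is an illustration under stated assumptions, not the embodiment itself: a single scalar weight W per frame in the range 0 to 1, and synthesis by plain addition (which the text gives only as an example). The name `blend_reference` and the list-of-rows frame layout are hypothetical.

```python
def blend_reference(mc_frame, interp_frame, w):
    """Form the reference frame as W * MC(L) + (1 - W) * interpolated base layer.

    Assumptions: w is one scalar for the whole frame and lies in [0, 1];
    both frames are lists of rows with identical dimensions.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w is assumed to lie in [0, 1]")
    return [
        [w * a + (1.0 - w) * b for a, b in zip(row_mc, row_ip)]
        for row_mc, row_ip in zip(mc_frame, interp_frame)
    ]


if __name__ == "__main__":
    mc = [[10.0, 20.0]]      # motion-compensated L-frame samples
    ip = [[0.0, 40.0]]       # interpolated base layer samples
    print(blend_reference(mc, ip, 0.25))
```

With W = 1 the reference depends only on the motion-compensated L frame, and with W = 0 only on the interpolated base layer.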

The filtering unit 513B has a function of acquiring the H frame output from the frame division unit 5131, the reference frame output from the signal synthesis unit 513A, and the frame management information output from the frame management unit 5130. It also has a function of applying filter processing to the acquired H frame and reference frame to generate a signal that has undergone band synthesis in the temporal direction and inverse prediction between resolutions (that is, a restored signal), and outputting it to the frame rearrangement unit 513C.
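
This passage does not fix the filter used by the filtering unit 513B. As a hedged sketch, assume the encoder formed each H frame as the original frame minus the reference frame (a Haar-style prediction step); the restoration then reduces to adding the reference frame back. The function name `inverse_predict` and the filter choice are assumptions made for illustration.

```python
def inverse_predict(h_frame, ref_frame):
    """Restore a frame from its H frame and the synthesized reference frame.

    Assumption: the encoder computed H = original - reference, so the
    decoder recovers original = H + reference, sample by sample.
    """
    return [
        [h + r for h, r in zip(row_h, row_r)]
        for row_h, row_r in zip(h_frame, ref_frame)
    ]


if __name__ == "__main__":
    h = [[1, -2]]        # decoded high-band residual
    ref = [[10, 20]]     # reference frame from the signal synthesis step
    print(inverse_predict(h, ref))
```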

Next, the procedure by which the inverse MCTF-ILP unit 513, configured as shown in FIG. 8, performs inverse MCTF-ILP processing is described with reference to the flowchart of FIG. 9. To perform the series of processes shown in FIG. 9 in units of one GOP, the video frames of one GOP are first accumulated (step S401 in FIG. 9). The series of processes may instead be performed in units of multiple GOPs, with processing that spans multiple GOPs. Next, the base layer decoded signal from the base layer decoding unit 32 in FIG. 1 is interpolated in advance by the resolution interpolation unit 5137 in FIG. 8 (step S402 in FIG. 9).

Next, inverse update processing is performed. In the inverse update processing, the L frames in the input video information to be restored are at first input to the inverse update processing unit 5134 through the input switch 5132. The second and subsequent band synthesis passes operate on restored frames (the output of the inverse prediction processing), so in those passes the input switch 5132 outputs the restored frames.

The H-frame signal used for the inverse update processing is weighted by the weight W in the inverse update weighting unit 5133 (step S403 in FIG. 9). The W-weighted H-frame signals preceding and following the L frame being processed are inverse-motion-compensated by the inverse update processing unit 5134 using the motion information (step S404 in FIG. 9). The inverse update processing unit 5134 then applies filter processing to the signal output from the input switch 5132 and the inverse-MC signal (step S405 in FIG. 9). This produces a restored frame that has undergone inverse update processing. It is then determined whether the inverse update has been completed for all L frames of the same level in the GOP (step S406 in FIG. 9), and steps S403 to S405 are repeated until it has.
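
Steps S403 to S405 can be sketched as follows for a single L frame. The sketch makes several simplifying assumptions that are not stated in the text: frames are 1-D sample lists, motion is one global integer displacement `d`, only one neighbouring H frame is used instead of both, and the update filter subtracts the compensated signal scaled by a hypothetical `gain` of 0.5 (a Haar-style lifting step). The names `shift` and `inverse_update` are illustrative.

```python
def shift(frame, d):
    """Motion-compensate a 1-D frame: output sample i is taken from i - d,
    with indices clamped at the frame borders."""
    n = len(frame)
    return [frame[min(max(i - d, 0), n - 1)] for i in range(n)]


def inverse_update(l_frame, h_frame, w, d, gain=0.5):
    """Undo the update step for one L frame (assumed Haar-style lifting)."""
    weighted = [w * h for h in h_frame]        # S403: weight the H frame by W
    compensated = shift(weighted, -d)          # S404: motion applied in reverse
    # S405: filter step, here a subtraction of the scaled compensated signal
    return [l - gain * c for l, c in zip(l_frame, compensated)]


if __name__ == "__main__":
    l = [10, 10, 10, 10]
    h = [0, 4, 0, 0]
    print(inverse_update(l, h, w=0.5, d=1))
```

Using both neighbouring H frames, as the text describes, would add a second weighted and compensated term to the subtraction.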

Next, inverse prediction processing is performed. In this processing, the L-frame signals restored by the inverse update that precede and follow the H frame being processed are motion-compensated by the MC unit 5136 using the motion information (step S407 in FIG. 9). The motion-compensated signal is then weighted by the weight W in the inverse prediction weighting unit 5138 (step S408 in FIG. 9). Meanwhile, for the base layer decoded signal interpolated by the resolution interpolation unit 5137, the inverse ILP weighting unit 5139 calculates the weight (1-W) and applies it (step S409 in FIG. 9).

The signals weighted by the inverse prediction weighting unit 5138 and the inverse ILP weighting unit 5139 are supplied to the signal synthesis unit 513A and synthesized there to generate the final reference frame for the prediction processing (step S410 in FIG. 9). The filtering unit 513B applies filter processing to the reference frame output from the signal synthesis unit 513A and the H frame output from the frame division unit 5131 (step S411 in FIG. 9). This yields the signal that is the result of the inverse prediction processing in the inverse MCTF-ILP unit 513.

It is then determined whether inverse prediction processing has been completed for the H frames of the same level in the GOP (step S412 in FIG. 9); if not, steps S407 to S411 are repeated until it has. Band synthesis in the temporal direction is repeated according to the number of temporal band decompositions in the encoding unit 40 and the environment of the decoding unit 50 (step S413 in FIG. 9). That is, the operations from step S403 onward are repeated until the inverse MCTF-ILP has been completed for all levels or is terminated partway.
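
The level-by-level repetition of steps S403 to S413 can be illustrated with a deliberately reduced sketch. The assumptions are: each frame is a single number, motion compensation is omitted, W is fixed at 1 so the inter-layer term vanishes, and the temporal filter is a Haar-style lifting pair. The function name `synthesize` and the `stop_after` parameter (modelling termination partway for a lower frame rate) are hypothetical.

```python
def synthesize(l_frames, h_levels, stop_after=None):
    """Temporal band synthesis over several decomposition levels.

    l_frames: L frames of the coarsest level.
    h_levels: H-frame lists, coarsest level first.
    stop_after: optional number of levels to synthesize before stopping.
    """
    frames = list(l_frames)
    for level, h_frames in enumerate(h_levels):
        if stop_after is not None and level >= stop_after:
            break  # terminated partway: fewer frames, lower frame rate
        # Inverse update for every L frame of this level (S403 to S406).
        evens = [l - 0.5 * h for l, h in zip(frames, h_frames)]
        # Inverse prediction for every H frame of this level (S407 to S412).
        odds = [h + e for h, e in zip(h_frames, evens)]
        # Reorder into display order; these become the next level's input.
        frames = [f for pair in zip(evens, odds) for f in pair]
    return frames


if __name__ == "__main__":
    # Two-level decomposition of the frame values 2, 4, 6, 10.
    print(synthesize([5.5], [[5.0], [2.0, 4.0]]))
    print(synthesize([5.5], [[5.0], [2.0, 4.0]], stop_after=1))
```

Stopping after the first level returns half as many frames, which corresponds to decoding at a reduced frame rate.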

Finally, the frame rearrangement unit 513C rearranges the frames into the desired order and outputs them to the outside of the inverse MCTF-ILP unit 513. The above is an example of the configuration and procedure of the enhancement layer encoding unit 41 including the MCTF-ILP unit 411, and of the enhancement layer decoding unit 51 including the inverse MCTF-ILP unit 513, in this embodiment. Implementing them makes it possible to realize a scalable coding scheme with higher coding efficiency and better quality than the prior art.

Next, an embodiment of the encoding program and decoding program of the present invention is described. FIG. 10 shows a block diagram of an example of an information processing apparatus 100 provided with an embodiment of the encoding program and decoding program according to the present invention. As shown in the figure, the information processing apparatus 100 consists of an external storage device 101, a temporary storage device 102, a communication device 103, an input device 104, an output device 105, and a central processing control device 106. The central processing control device 106, which is a computer, realizes the functions of the encoding and decoding devices by means of programs.

The above programs may be read from a recording medium by the input device 104 and loaded into the central processing control device 106, or may be received by the communication device 103 via a network and loaded into the central processing control device 106.

By means of the encoding program, the central processing control device 106 executes space-time decimation means 107 corresponding to the space-time decimation unit 11 of the encoding unit 40, base layer encoding means 108 corresponding to the base layer encoding unit 12, base layer reconstruction means 109 corresponding to the base layer reconstruction unit 13, and enhancement layer encoding means 110 corresponding to the enhancement layer encoding unit 41. By means of the decoding program, it executes extraction means 113 corresponding to the extraction unit 31 of the decoding unit 50, base layer decoding means 113 corresponding to the base layer decoding unit 32, and enhancement layer decoding means 114 corresponding to the enhancement layer decoding unit 51.

FIG. 1 is a block diagram of an embodiment of the video signal encoding device and video signal decoding device of the present invention.
FIG. 2 is a block diagram of an embodiment of the enhancement layer encoding unit, the main part of the video signal encoding device of the present invention.
FIG. 3 is a flowchart explaining the operation of FIG. 2.
FIG. 4 is a configuration diagram of an embodiment of the MCTF-ILP unit in FIG. 2.
FIG. 5 is a flowchart explaining the operation of FIG. 4.
FIG. 6 is a block diagram of an embodiment of the enhancement layer decoding unit, the main part of the video signal decoding device of the present invention.
FIG. 7 is a flowchart explaining the operation of FIG. 6.
FIG. 8 is a configuration diagram of an embodiment of the inverse MCTF-ILP unit in FIG. 6.
FIG. 9 is a flowchart explaining the operation of FIG. 8.
FIG. 10 is a block diagram of an example of an information processing apparatus that executes the encoding and decoding programs of the present invention.
FIG. 11 is a diagram showing an example of MCTF.
FIG. 12 is a flowchart showing an example of MCTF.
FIG. 13 is a diagram showing an example of a signal after MCTF processing.
FIG. 14 is a block diagram of an example of a conventional video signal encoding device and video signal decoding device.
FIG. 15 is a flowchart explaining the operation of a conventional encoding unit.
FIG. 16 is a flowchart explaining the operation of a conventional decoding unit.
FIG. 17 is a block diagram of an example of the enhancement layer encoding unit, the main part of a conventional video signal encoding device.
FIG. 18 is a flowchart explaining the operation of FIG. 17.
FIG. 19 is a block diagram of an example of the enhancement layer encoding unit, the main part of a conventional video signal decoding device.
FIG. 20 is a flowchart explaining the operation of FIG. 19.
FIG. 21 is a diagram explaining the operation of an example of the MCTF-ILP unit in FIG. 17.
FIG. 22 is a flowchart explaining the operation of a conventional MCTF-ILP unit.
FIG. 23 is a diagram explaining the operation of an example of a conventional inverse MCTF-ILP unit.
FIG. 24 is a flowchart explaining the operation of a conventional inverse MCTF-ILP unit.

Explanation of symbols

11 Space-time decimation unit
12 Base layer encoding unit
13 Base layer reconstruction unit
15 Multiplexing unit
20 Communication line or medium
31 Extraction unit
32 Base layer decoding unit
40 Encoding unit
41 Enhancement layer encoding unit
50 Decoding unit
51 Enhancement layer decoding unit
100 Information processing apparatus
106 Central processing control device
110 Enhancement layer encoding means
114 Enhancement layer decoding means
411 MCTF-ILP unit
412 Orthogonal transform / quantization unit
413 Entropy encoding unit
511 Entropy decoding unit
512 Inverse quantization / inverse orthogonal transform unit
513 Inverse MCTF-ILP unit
4110, 5132 Input switch
4111 Prediction unit
4112 Update weighting unit
4113 Update processing unit
4114, 5130 Frame management unit
4115, 513C Frame rearrangement unit
4116 Motion information update unit
4117, 5137 Resolution interpolation unit
4118 ME/MC unit
4119 Weight instruction unit
411A Prediction weighting unit
411B ILP weighting unit
411C Signal synthesis unit
411D, 513B Filtering unit
5131 Frame division unit
5133 Inverse update weighting unit
5134 Inverse update processing unit
5135 Inverse prediction processing unit
5136 MC unit
5138 Inverse prediction weighting unit
5139 Inverse ILP weighting unit
513A Signal synthesis unit

Claims

A video signal encoding device that generates a frame sequence signal composed of L frames in a low-frequency divided band and H frames in a high-frequency divided band, obtained by subband-dividing an input video signal into two divided frequency bands through motion-compensated temporal filtering, and that outputs a first encoded signal, obtained by applying orthogonal transform / quantization and entropy encoding to the frame sequence signal, together with a second encoded signal obtained by encoding a video signal of lower resolution than the input video signal;
Reduction means for space-time reduction of the input video signal by a predetermined ratio to generate a low-resolution video signal having a space-time resolution different from the input video signal;
Encoding means for encoding the low resolution video signal to generate the second encoded signal;
Local decoding means for locally decoding the second encoded signal to generate a local decoded signal;
Expansion means for expanding the spatial resolution of the local decoded signal at a ratio opposite to that of the reduction means, and generating a high spatial resolution video signal;
Motion estimation / compensation means for estimating the motion of another frame among a plurality of different frames existing in the time direction of the input video signal, using one of those frames as a reference frame, acquiring motion information indicating the relative positional relationship of each corresponding block between the reference frame and the other frame, and compensating for the motion of the other frame based on the motion information;
First weighting means for weighting the input video signal motion-compensated by the motion estimation / compensation means with a first weight;
Second weighting means for weighting the high spatial resolution video signal generated by the enlarging means from the local decoded signal with a second weight calculated using the first weight;
First processing means for generating the H frame by combining and filtering the respective signals weighted by the first and second weighting means;
Third weighting means for weighting the H frame generated by the first processing means with the first weight;
Second processing means for generating the L frame by applying, in the reverse direction, the motion information acquired by the motion estimation / compensation means to the H frame weighted by the third weighting means to perform motion compensation, and then filtering;
Frame sequence signal generating means for generating and outputting the frame sequence signal by combining the H frame and the L frame generated by the first and second processing units, respectively;
Frame sequence signal encoding means for performing orthogonal transform / quantization and entropy encoding on the frame sequence signal generated by the frame sequence signal generation means to generate the first encoded signal. A video signal encoding device.

A video signal encoding program for causing a computer to execute an operation of generating a frame sequence signal composed of L frames in a low-frequency divided band and H frames in a high-frequency divided band, obtained by subband-dividing an input video signal into two divided frequency bands through motion-compensated temporal filtering, and outputting a first encoded signal, obtained by applying orthogonal transform / quantization and entropy encoding to the frame sequence signal, together with a second encoded signal obtained by encoding a video signal of lower resolution than the input video signal;
The computer,
Reduction means for space-time reduction of the input video signal by a predetermined ratio to generate a low-resolution video signal having a space-time resolution different from the input video signal;
Encoding means for encoding the low resolution video signal to generate the second encoded signal;
Local decoding means for locally decoding the encoded signal to generate a local decoded signal;
Expansion means for expanding the spatial resolution of the local decoded signal at a ratio opposite to that of the reduction means, and generating a high spatial resolution video signal;
The movement of the other frames of the plurality of frames is estimated using one of a plurality of different frames existing in the time direction of the input video signal as a reference frame, and the correspondence between the reference frame and the other frames is estimated. Motion estimation / compensation means for acquiring motion information indicating a relative positional relationship of each block and compensating for the motion of the other frame based on the motion information;
First weighting means for weighting the input video signal motion-compensated by the motion estimation / compensation means with a first weight;
Second weighting means for weighting the high spatial resolution video signal generated by the enlarging means from the local decoded signal with a second weight calculated using the first weight;
First processing means for generating the H frame by combining and filtering the respective signals weighted by the first and second weighting means;
Third weighting means for weighting the H frame generated by the first processing means with the first weight;
Second processing means for generating the L frame by applying, in the reverse direction, the motion information acquired by the motion estimation / compensation means to the H frame weighted by the third weighting means to perform motion compensation, and then filtering;
Frame sequence signal generating means for generating and outputting the frame sequence signal by synthesizing the H frame and the L frame generated by the first and second processing units, respectively;
The frame sequence signal generated by the frame sequence signal generation means is subjected to orthogonal transform / quantization and entropy encoding to function as a frame sequence signal encoding means for generating the first encoded signal. A video signal encoding program.

A video signal decoding apparatus that receives and decodes a signal composed of the first and second encoded signals, the motion information, and weight information indicating the first to third weights, the signal being output from the video signal encoding device according to claim 1 or generated by the video signal encoding program according to claim 2,
Receiving means for receiving and separating the first and second encoded signals, respectively;
First decoding means for decoding the H frame and the L frame by performing entropy decoding on the first encoded signal from the receiving means and then performing inverse quantization and inverse orthogonal transform;
Second decoding means for decoding the second encoded signal from the receiving means to generate the low resolution video signal;
Expansion means for enlarging a spatial resolution of the low resolution video signal decoded by the second decoding means to generate a high spatial resolution video signal;
First weighting means for weighting the H frame from the first decoding means with the first weight obtained from the received weight information;
First processing means for generating a first decoded signal by applying, in the reverse direction, the motion information received by the receiving means to the H frame or decoded video signal weighted by the first weighting means to perform motion compensation, and then filtering;
Second weighting means for motion-compensating the first decoded signal using the motion information received by the receiving means, and then weighting it with the first weight obtained from the weight information received by the receiving means;
Third weighting means for weighting the high spatial resolution video signal from the enlarging means with a second weight obtained from the weight information received by the receiving means;
Second processing means for synthesizing the signals weighted by the second and third weighting means, and generating a second decoded signal by filtering the synthesized signal and the H frame received and separated by the receiving means;
A video signal decoding apparatus comprising: a decoded video signal generation unit configured to combine the first decoded signal and the second decoded signal to generate the decoded video signal.

A video signal decoding program for causing a computer to receive and decode a signal composed of the first and second encoded signals, the motion information, and weight information indicating the first to third weights, the signal being output by the video signal encoding device according to claim 1 or generated by the video signal encoding program according to claim 2,
The computer,
Receiving means for receiving and separating the first and second encoded signals, respectively;
First decoding means for decoding the H frame and the L frame by performing entropy decoding on the first encoded signal from the receiving means and then performing inverse quantization and inverse orthogonal transform;
Second decoding means for decoding the second encoded signal from the receiving means to generate the low resolution video signal;
Expansion means for enlarging a spatial resolution of the low resolution video signal decoded by the second decoding means to generate a high spatial resolution video signal;
First weighting means for weighting the H frame from the first decoding means with the first weight obtained from the received weight information;
First processing means for generating a first decoded signal by applying, in the reverse direction, the motion information received by the receiving means to the H frame or decoded video signal weighted by the first weighting means to perform motion compensation, and then filtering;
Second weighting means for motion-compensating the first decoded signal using the motion information received by the receiving means, and then weighting it with the first weight obtained from the weight information received by the receiving means;
Third weighting means for weighting the high spatial resolution video signal from the enlarging means with a second weight obtained from the weight information received by the receiving means;
Second processing means for synthesizing the signals weighted by the second and third weighting means, and generating a second decoded signal by filtering the synthesized signal and the H frame received and separated by the receiving means;
Decoded video signal generation means for generating the decoded video signal by combining the first decoded signal and the second decoded signal;
And a video signal decoding program characterized by the above.