JPH08307868A

JPH08307868A - Moving image decoder

Info

Publication number: JPH08307868A
Application number: JP10502795A
Authority: JP
Inventors: Ichiro Tamiya; 一郎民谷; Yoichi Katayama; 陽一片山
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-04-28
Filing date: 1995-04-28
Publication date: 1996-11-22

Abstract

PURPOSE: To realize, at low cost, a moving image decoder for video-rate storage media. CONSTITUTION: The decoder comprises a variable-length decoding circuit 1, an inverse quantization circuit 2, an inverse discrete cosine transform circuit 3, and an inter-frame motion-compensated prediction generation circuit 4. A buffer memory 5 is placed between the inverse quantization circuit 2 and the inverse discrete cosine transform circuit 3, and this buffer memory 5 implements both the inverse zigzag conversion and the matrix transposition required for the two-dimensional inverse DCT.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a playback apparatus that decompresses a bit stream compressed with a hybrid scheme combining inter-frame prediction and the discrete cosine transform, as typified by the MPEG system.

[0002]

[Prior Art] High-efficiency coding is applied to video signals so that they can be stored on relatively narrow-band digital storage media such as the compact disc (CD). The international standardization body ISO-IEC JTC1/SC2/WG11 (hereinafter referred to by its common name, MPEG) studied a coding scheme for media with a rate of about 1.5 Mbps. An overview of the scheme is given, for example, in the Journal of the Institute of Image Electronics Engineers of Japan, Vol. 20, No. 4, pp. 306 to 316. According to that article, it is a hybrid coding scheme combining motion-compensated inter-frame prediction, the discrete cosine transform (DCT), quantization, and variable-length coding, and it achieves picture quality comparable to VHS video. The MPEG2 scheme, which compresses video to within 15 Mbps (3 to 8 Mbps in practice) while preserving the quality of current TV broadcasting, has also been standardized. All of these schemes use inter-frame prediction and the discrete cosine transform to compress the video signal. The ITU-T H.261 scheme, used mainly for videophone and videoconferencing, likewise uses inter-frame prediction and the discrete cosine transform.

[0003] To decode a video signal compressed with such an MPEG scheme in real time, the sequence of variable-length decoding, inverse zigzag conversion, inverse quantization, inverse DCT, and motion compensation is usually executed at high speed in hardwired logic. Because the code length of a variable-length code depends on the signal, the time needed for variable-length decoding also varies widely with the signal. The processing time of inverse quantization and inverse DCT, by contrast, is constant regardless of the signal. The variable-length decoding result must therefore be stored temporarily in memory before the subsequent stages, so that the difference in processing time is absorbed.

[0004] One example of decoding hardware is the decoder LSI presented as paper C-658 at the 1994 IEICE Spring Conference. According to that paper, the fixed-length codes obtained by variable-length decoding and run-length decoding are first stored in a memory circuit. The stored fixed-length codes are read out while being reordered from zigzag scan order to block scan order, subjected to inverse quantization and inverse DCT, and stored in a second memory. A motion compensation circuit then reconstructs the inter-frame prediction signal, which is added to the inverse DCT result to obtain the decoded picture. Because the inter-frame prediction mode differs from block to block, the processing time of motion compensation also varies, so a memory that holds the DCT result is provided to absorb the difference from the processing time of the stages up to the DCT.

[0005] In the memory arrangement of this conventional scheme, the timing-adjustment memory for variable-length decoding is placed before inverse quantization, and the conversion from zigzag scan order to block scan order is also performed there. Combining the timing-adjustment and scan-conversion functions saves memory, and placing the memory at a point before inverse quantization has expanded the dynamic range of the data reduces the capacity needed to store it. In MPEG1, for example, the bit width is 9 bits before inverse quantization and 12 bits after, so 3 bits are saved per data item.

[0006] Next, the problems of this conventional circuit configuration are described with reference to FIG. 8, which shows only the decoding stages from variable-length decoding to the inverse discrete cosine transform. These stages are implemented by a variable-length decoding circuit 91, an inverse zigzag conversion circuit 92, an inverse quantization circuit 93, an inverse discrete cosine transform circuit 94, and a transposition memory 95. The inverse zigzag conversion circuit 92 sits between the variable-length decoding circuit 91 and the inverse quantization circuit 93. It stores the output of the variable-length decoding circuit 91 in an internal memory, performs the inverse zigzag scan conversion, and supplies data to the inverse quantization circuit. The conversion is achieved by changing the order of address generation from zigzag scan order to intra-block raster scan order when the internal memory is written or read.
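
The address-based reordering described above can be illustrated in software. The following is a minimal sketch, assuming the usual 8×8 zigzag pattern used by MPEG and H.261; the function names `zigzag_order` and `inverse_zigzag` are hypothetical and do not appear in the patent.

```python
def zigzag_order(n=8):
    """Return raster addresses (row * n + col) listed in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from bottom-left to top-right, odd ones the other way.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append(r * n + (s - r))
    return order


def inverse_zigzag(coeffs, n=8):
    """Scatter coefficients arriving in zigzag order into a raster-ordered block."""
    block = [0] * (n * n)
    for value, address in zip(coeffs, zigzag_order(n)):
        block[address] = value          # the write address performs the reordering
    return block
```

Here the reordering happens on the write side. Reading a sequentially written memory through the same address table in the opposite direction would be the read-side variant that the text also allows.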

[0007] The inverse discrete cosine transform circuit 94 applies a row-direction inverse cosine transform each time eight inverse-quantized values are available and writes the results to the transposition memory 95. Once the row-direction transform has been applied to all the data supplied by the inverse quantization circuit 93, the inverse discrete cosine transform circuit 94 reads the data back from the transposition memory 95 and outputs it to the next stage while applying the column-direction inverse cosine transform. When the inverse discrete cosine transform circuit 94 is time-shared between rows and columns in this way, it cannot accept output from the inverse quantization circuit 93 during the column-direction transform, so the inverse quantization circuit 93 has to wait. In other words, inverse quantization must be completed within the time in which the inverse discrete cosine transform circuit 94 executes the row-direction transform, and for that reason the inverse quantization circuit 93 is sped up, for example by parallelizing it.

[0008] If two of the inverse discrete cosine transform circuits 94 of FIG. 8 are provided and the row and column transforms run simultaneously on either side of the transposition memory, the inverse quantization circuit 93 no longer has to wait. However, mounting two inverse discrete cosine transform circuits 94, which require more computation than inverse quantization, increases the circuit scale.

[0009]

[Problems to Be Solved by the Invention] In the conventional memory arrangement, data read from the timing-adjustment memory, which also performs the inverse zigzag scan conversion, passes through inverse quantization and inverse DCT consecutively without being stored in memory in between. To balance the processing times of inverse quantization and inverse DCT, the operating speed had to be raised by introducing parallel processing. But both inverse quantization and inverse DCT are computation-heavy, multiplication-intensive operations, so speeding up either one raises the cost of the decoder as a whole. The problem to be solved by the present invention is to avoid these drawbacks, improve the processing efficiency of the whole video decoder, and provide a lower-cost decoder.

[0010]

[Means for Solving the Problems] To achieve the above object, the moving image decoder according to the present invention, in an arrangement of arithmetic circuits consisting of a variable-length decoding circuit, an inverse quantization circuit, an inverse discrete cosine transform circuit, and a motion compensation signal generation circuit, places a buffer memory between the inverse quantization circuit and the inverse discrete cosine transform circuit, and uses this buffer memory to implement both the inverse zigzag conversion and the matrix transposition required for the two-dimensional inverse discrete cosine transform. It also provides a method for controlling the whole decoder efficiently.

[0011]

[Operation] The memory capacity required by the conventional scheme and by the scheme of the invention is estimated below. The inverse DCT is usually implemented by computing a one-dimensional inverse DCT twice, once in the row direction and once in the column direction. In either order, a memory circuit is needed to transpose the result of the inverse DCT in the row (or column) direction into column (or row) order. To maintain the precision needed for the inverse DCT calculation, this transposition circuit normally uses a memory about 16 bits wide. The data before and after inverse quantization require about 9 bits and 12 bits respectively. In addition, a double-buffered memory is often used so that the processing up to the write into the buffer memory and the processing after the read from the buffer memory can run concurrently. The comparison below is made under these assumptions.

[0012] In the conventional scheme, storing the data before inverse quantization in a double buffer requires 9 bits × 2 × B = 18B [bits], where B is the number of data items making up one block of DCT coefficients, which is 64 for MPEG and H.261. Separately, the transposition memory used for the inverse DCT requires 16 bits × 1 × B = 16B [bits], so a total capacity of 34B [bits] is needed.

[0013] In the present invention, by contrast, the timing-adjustment buffer memory for variable-length decoding is placed after inverse quantization, and the inverse scan conversion is performed there. The output of the inverse DCT circuit is also stored in this buffer memory, so that it serves as the transposition memory as well. The required memory capacity is therefore 16 bits × 2 × B = 32B [bits]. Thus, even though the width of the memory needed for inverse zigzag conversion grows from 9 bits to 16 bits, the memory arrangement of the invention requires less capacity once all the memory needed from variable-length decoding to inverse DCT is taken into account. Sharing the transposition function also reduces the number of memories from three to two.
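
The capacity comparison can be checked with a few lines of arithmetic. This is a small sketch using only the figures stated in the text (9-bit and 16-bit widths, B = 64); the helper name `buffer_bits` is hypothetical.

```python
B = 64  # coefficients per 8x8 block (MPEG, H.261)


def buffer_bits(width_bits, banks, words=B):
    """Capacity in bits of a memory with the given word width and bank count."""
    return width_bits * banks * words


# Conventional: 9-bit double buffer before dequantization plus a 16-bit transposition memory.
conventional = buffer_bits(9, 2) + buffer_bits(16, 1)   # 18B + 16B = 34B
# Invention: one 16-bit double buffer shared by scan conversion and transposition.
proposed = buffer_bits(16, 2)                           # 32B

print(conventional, proposed, conventional - proposed)
```

With B = 64 the totals are 2176 bits and 2048 bits, a saving of 2B = 128 bits per block buffer.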

[0014] A decoder with the buffer memory arranged according to the invention can be controlled as a pipeline in which variable-length decoding through inverse quantization forms the first stage, the inverse DCT operation the second stage, and motion compensation the third stage. The characteristic features of this control method are that the inverse DCT operation and the variable-length decoding of the next block are started at the same time as the banks of the buffer memory placed after inverse quantization are switched, and that the frame addition for the previous block is started at the same time as the variable-length decoding of that next block.
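
One way to picture this control is as a schedule indexed by bank-switch events. The sketch below is an interpretation, not the patent's timing chart: it assumes that every stage advances by one block at each bank switch and that the "previous block" is the one that precedes the block entering the inverse DCT. The names `pipeline_schedule`, `vld_iq`, `idct`, and `frame_add` are hypothetical.

```python
def pipeline_schedule(num_blocks):
    """List, per bank-switch slot, which block each stage works on (None = idle)."""
    stages = ("vld_iq", "idct", "frame_add")
    slots = []
    for t in range(num_blocks + len(stages) - 1):
        slot = {}
        for depth, name in enumerate(stages):
            block = t - depth           # each later stage lags one slot behind
            slot[name] = block if 0 <= block < num_blocks else None
        slots.append(slot)
    return slots


for t, slot in enumerate(pipeline_schedule(4)):
    print(t, slot)
```

In slot 2, for example, the first stage decodes and dequantizes block 2 while the inverse DCT runs on block 1 and the frame addition runs on block 0.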

[0015]

[Embodiment] FIG. 1 shows an embodiment of the moving image decoder according to the present invention. This embodiment comprises a variable-length decoding circuit 1, an inverse quantization circuit 2, an inverse discrete cosine transform circuit 3, a prediction signal generation circuit 4, buffer memory circuits 5, 6, and 7, a frame addition circuit 8, a frame memory 9, and a timing control circuit 10. The compressed video stream supplied to the variable-length decoding circuit 1 is converted into a sequence of fixed-length codes and supplied to the inverse quantization circuit 2. The inverse quantization circuit 2 performs inverse quantization by multiplying by the quantization step size and quantization table values supplied in advance, and writes the result to the buffer memory 5. The data are inverse-quantized in the order produced by variable-length decoding, that is, in zigzag scan order, but the inverse zigzag scan conversion is performed by generating addresses that correspond to the zigzag scan order when the data are stored in the buffer memory 5. The data written to the buffer memory 5 are later read by the inverse discrete cosine transform circuit 3 and subjected to a one-dimensional inverse discrete cosine transform in the row direction. The row-transformed data are written back to the buffer memory 5, then read out again and subjected to the inverse discrete cosine transform in the column direction. The inter-frame prediction error signal obtained by the inverse discrete cosine transform in this way is stored in the buffer memory 6.
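
The row pass, write-back, and column pass can be mimicked in floating point as follows. This is only a sketch of the row-column decomposition: it assumes the orthonormal 8-point inverse DCT and performs the transposition on a Python list, whereas the embodiment works in fixed point and transposes by addressing the buffer memory 5. The names `idct_1d`, `transpose`, and `idct_2d` are hypothetical.

```python
import math

N = 8


def idct_1d(y):
    """Orthonormal 8-point inverse DCT of one row of coefficients."""
    out = []
    for n in range(N):
        acc = 0.0
        for k in range(N):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            acc += scale * y[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        out.append(acc)
    return out


def transpose(block):
    """Swap rows and columns of a square block given as a list of rows."""
    return [list(col) for col in zip(*block)]


def idct_2d(block):
    """Two-dimensional inverse DCT as two 1-D passes with a transposition between."""
    rows_done = [idct_1d(row) for row in block]                 # pass 1: row direction
    cols_done = [idct_1d(row) for row in transpose(rows_done)]  # pass 2: column direction
    return transpose(cols_done)                                 # restore raster orientation
```

A block whose only nonzero coefficient is a DC value of 8 decodes to a flat block of ones, which is a quick sanity check for the scaling.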

[0016] The frame memory 9 holds decoded picture signals. The prediction signal generation circuit 4 generates the inter-frame prediction signal from picture data that was decoded earlier and stored in the frame memory 9, and stores it in the buffer memory 7. The frame addition circuit 8 adds the signals stored in the buffer memory 6 and the buffer memory 7 to obtain the decoded picture signal, which is stored in the frame memory 9. The timing control circuit 10 controls the whole decoder by adjusting the start timing of the variable-length decoding circuit 1, the buffer memory circuit 5, and the frame addition circuit 8, as detailed later.

[0017] FIG. 2 shows an example of the variable-length decoding circuit 1. It comprises a barrel shifter 21 that shifts by 0 to 7 bits, a variable-length decoding table 22, a decision circuit 23, a shift amount calculation circuit 25, 8-bit registers 26 and 27, a counter 29 that counts the zero-run length, a register 20 that holds the decoded level value, and a selector 28. For simplicity, the description below covers the decoding of variable-length codes for discrete cosine transform coefficients. The decision circuit 23 is a sequencer with internal state. From its internal state and the output of the barrel shifter 21 it decodes one variable-length code, determines the level value representing the cosine transform coefficient and the zero-run length preceding it, and outputs them to the register 20 and the counter 29 respectively. Until the decision circuit 23 detects one variable-length code, the input stream is shifted repeatedly according to the contents of the variable-length decoding table 22: shifts of 7 bits or less use the barrel shifter 21, and shifts by multiples of 8 bits use the registers 27 and 28. In general, the number of steps needed to finish decoding one code is determined by the number of bits of barrel shifter output examined at a time and by the maximum length of the variable-length code. Here, if the barrel shifter output is 6 bits and the maximum code length is 24 bits, a circuit of this kind finishes decoding one variable-length code in at most 4 steps.

[0018] When the decoded code has a zero run of one or more, the run length is loaded into the counter 29 as its initial value, and the output of the selector 28 is held at '0' while the counter 29 counts down. When the counter 29 reaches 0, the value of the register 20 is output and the countdown ends. When an end-of-block code is detected, the decision circuit 23 sets the counter 29 so that the value '0' is output repeatedly after the last level value until 64 values have been output.
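
The behavior of the counter 29 and the selector 28 amounts to run-level expansion. The following is a minimal sketch, assuming the decoded symbols are available as (zero_run, level) pairs and that the end of the list stands in for the end-of-block code; the function name `expand_run_levels` is hypothetical.

```python
BLOCK_SIZE = 64


def expand_run_levels(pairs):
    """Expand (zero_run, level) pairs into one 64-coefficient block in scan order."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)   # counter counts down while the selector outputs 0
        out.append(level)       # then the held level value is output
    if len(out) > BLOCK_SIZE:
        raise ValueError("more than 64 coefficients in one block")
    out.extend([0] * (BLOCK_SIZE - len(out)))  # end of block: pad with zeros up to 64
    return out
```

For example, the pairs (0, 5) and (2, -3) expand to 5, 0, 0, -3 followed by 60 zeros.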

[0019] Although not shown, the variable-length decoding circuit 1 monitors the ready signal of the inverse quantization circuit 2 and, when it is not ready, holds off the next data output. Even while waiting, the decoding result is held in the register 20 and the counter 29, so decoding of the next DCT coefficient can proceed. In this way, as long as the bit stream is supplied without interruption through the 8-bit input port, a decoded level signal is output at a rate of one per 4 clocks or better.

[0020] FIG. 3 shows an example of the inverse quantization circuit 2. The inverse quantization circuit 2 comprises a partial product multiplier 301, a register 302 that holds the quantization scale, a quantization table 303, selectors 304 and 310, a shifter 305, a clip processing circuit 307, registers 308 and 311, an adder 309, and a selector 312 that bypasses inverse quantization. The register 302 and the quantization table 303 hold, respectively, the quantization step value and the quantization matrix data supplied in other layers of the bit stream. Both have already been set by the time the DCT coefficients described here are decoded. Inverse quantization multiplies each DCT coefficient decoded by the variable-length decoding circuit by the corresponding value in the quantization table 303 and by the quantization step value in the register 302. The clip processing circuit 307 then applies even/odd conversion and clipping.

[0021] In the first step, the input DCT coefficient is multiplied by the 8-bit data read from the quantization table 303. The selectors 304 and 310 both select their A inputs, and the register 308 is initialized to '0'. The value of the quantization table 303 selected by the selector 304 is supplied toward the partial product multiplier 301; the shifter 305 selects its lower 4 bits, the partial product multiplier 301 performs the partial multiplication, and the result is stored in the register 308. In the second step, the shifter 305 supplies the upper 4 bits of the quantization table 303 output, and the partial product multiplier 301 forms the partial product with the input. The adder 309 adds the partial product for the lower 4 bits held in the register 308 to the partial product just obtained by the partial product multiplier 301, giving the product of the input data and the quantization table value, which is stored in the register 311. The register 308 is initialized to '0' at this point.

[0022] In the third step, the selectors 304 and 310 both select their B inputs, so that the contents of the register 302 and the value of the register 311 are output respectively. The partial product multiplier 301 multiplies the value of the register 311 by the quantization scale value: first the shifter 305 selects the lower 4 bits and the partial product is stored in the register 308. In the fourth step, the shifter 305 selects the upper 4 bits, and the multiply-accumulate result from the partial product multiplier 301 and the adder 309 is stored in the register 308. When the four steps required for these multiplications are complete, the inverse quantization circuit 2 issues a ready signal (not shown) to the variable-length decoding circuit 1 to indicate that the next data can be input. The value of the register 308 is output through the clip processing circuit 307 and the selector 312.
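
The four-step sequence can be modeled as two multiplications, each split into a lower-nibble and an upper-nibble partial product. The sketch below makes several assumptions that the text does not spell out: the upper partial product is weighted by 16 before the addition, both multipliers fit in 8 unsigned bits, and the normalization, even/odd conversion, and clipping performed downstream are left out. The names `nibble_multiply` and `dequantize` are hypothetical.

```python
def nibble_multiply(x, m):
    """Multiply x by an 8-bit unsigned m using two 4-bit partial products."""
    if not 0 <= m <= 0xFF:
        raise ValueError("multiplier must fit in 8 bits")
    low = x * (m & 0x0F)        # step using the lower 4 bits of the multiplier
    high = x * (m >> 4)         # step using the upper 4 bits of the multiplier
    return low + (high << 4)    # assumed: upper partial product carries weight 16


def dequantize(level, table_value, quant_scale):
    """Level times quantization table value times quantization scale, in four steps."""
    product = nibble_multiply(level, table_value)   # steps 1 and 2
    return nibble_multiply(product, quant_scale)    # steps 3 and 4
```

Splitting each 8-bit multiplication in two lets a narrow multiplier be reused across all four steps, which matches the stated bound of at most four steps per coefficient.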

[0023] Data for which the variable-length decoding circuit 1 knows that inverse quantization is unnecessary, such as the DC component of the DCT coefficients and the results of decoding zero-run lengths, are passed through unchanged in one step using the selector 312. In this way inverse quantization takes at most 4 steps per DCT coefficient, and takes less time when there are many zero-run codes.

[0024] FIG. 4 shows an example of the buffer memory 5. It comprises 64-word memories 41a and 41b, a selector 42 that selects the memory output data, a read counter 43 for reads to the inverse discrete cosine transform circuit 3, an inverse zigzag scan address generator 44, selectors 45a and 45b that specify the memory addresses, selectors 46a and 46b that select the write data, a write address generator 47, a selector 48, and a row/column swap circuit 49 that can exchange the upper 3 bits and lower 3 bits of the output of the counter 43. The buffer is a double buffer built from two single-port RAMs. It has two states, A and B. In each state, the selectors 42, 45a, 45b, 46a, and 46b output the data on the input port with the corresponding name. The operation described here is for state A, that is, the state in which each selector outputs the signal on its A side.

[0025] The data output from the inverse quantization circuit 2 are written to the memory 41b. The write address is supplied by the inverse zigzag scan address generator 44, so that data arriving in zigzag scan order are written in block raster scan order.

[0026] Meanwhile, the memory 41a holds the inverse quantization result for the previous block. In response to requests from the inverse discrete cosine transform circuit 3 described later, these values are read out under the control of the read address counter 43, the write address generator 47, the selector 48, and the row/column swap circuit 49, and the two-dimensional inverse discrete cosine transform is thereby carried out. Specifically, when the inverse discrete cosine transform circuit 3 executes the row-direction DCT operation, the output of the read address counter 43 is used directly as the read address, without conversion by the row/column swap circuit 49, to read data from the memory 41a, and the output data of the inverse discrete cosine transform circuit 3, supplied through the selector 46a, are written to the address generated by the write address generator 47. When the inverse discrete cosine transform circuit 3 executes the column-direction DCT operation, the output of the read address counter 43 is converted by the row/column swap circuit 49 and the result is used as the read address to read data from the memory 41a through the selector 42.
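
Exchanging the upper and lower 3 bits of a 6-bit address is equivalent to transposing an 8×8 block, since the address is row × 8 + column. The sketch below shows the two address sequences; the function names `swap_row_col` and `read_addresses` are hypothetical.

```python
def swap_row_col(addr):
    """Exchange the upper and lower 3 bits of a 6-bit address."""
    return ((addr & 0b000111) << 3) | ((addr >> 3) & 0b000111)


def read_addresses(column_pass):
    """Address sequence produced from a 0..63 read counter for one pass."""
    return [swap_row_col(a) if column_pass else a for a in range(64)]
```

In the row pass the counter value is used as is (0, 1, 2, ...), and in the column pass the swapped value walks down a column first (0, 8, 16, ...), so the same memory serves as the transposition memory without moving any data.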

【００２７】状態Ｂすなわち各セレクタがＢと書かれた
側の信号を出力している状態での動作は、メモリ４１ａ
とメモリ４１ｂの機能が入れ替わる。このようにダブル
バッファ回路は、必ず一方のメモリは逆量子化回路２の
出力書き込み専用として逆ジグザグスキャン機能を実現
し、もう一方のメモリが逆離散コサイン変換回路の入出
力専用として行列の転置機能を実現している。このダブ
ルバッファメモリの切替えは、後述するタイミングでタ
イミング制御回路１０により制御される。In state B, that is, when each selector outputs the signal on the side labeled B, the functions of the memory 41a and the memory 41b are exchanged. Thus, in the double buffer circuit, one memory is always dedicated to writing the output of the inverse quantization circuit 2 and realizes the inverse zigzag scan function, while the other memory is dedicated to the input and output of the inverse discrete cosine transform circuit and realizes the matrix transposition function. Switching of the double buffer memory is controlled by the timing control circuit 10 at the timing described later.
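The double-buffer arrangement can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the circuit itself: `DoubleBuffer` and `zigzag_addresses` are hypothetical names, and the conventional 8 × 8 zigzag order is assumed for the scan.

```python
def zigzag_addresses(n=8):
    """Row-major address of each coefficient in conventional zigzag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                               # even diagonals run upward
            rows = reversed(rows)
        order.extend(r * n + (s - r) for r in rows)
    return order


class DoubleBuffer:
    """Two 64-word banks: one is written by inverse quantization,
    the other is read and written by the inverse DCT."""

    def __init__(self, size=64):
        self.banks = [[0] * size, [0] * size]
        self.write_bank = 0

    def write(self, address, value):
        self.banks[self.write_bank][address] = value

    def transform_bank(self):
        return self.banks[1 - self.write_bank]

    def switch(self):
        self.write_bank = 1 - self.write_bank


zz = zigzag_addresses()
buf = DoubleBuffer()
for k, level in enumerate(range(64)):    # coefficient k arrives in scan order
    buf.write(zz[k], level)              # inverse zigzag happens on write
buf.switch()                             # the bank now belongs to the inverse DCT
print(buf.transform_bank()[:9])
```

Because the reordering is done by the write address, the inverse DCT side always sees a block in natural row and column order.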

【００２８】図５は逆離散コサイン変換回路３の一例を
示している。並列乗算器５１ａ，５１ｂ、加算器５２
ａ，５２ｂ、レジスタ５３ａ，５３ｂ，５４ａ，５４
ｂ、加減算器５５、逆ＤＣＴ計算を行うための乗数を格
納した係数ＲＯＭ５６ａ，５６ｂ、４ワードのデータを
格納するレジスタファイル５７ａ，５７ｂより構成され
る。FIG. 5 shows an example of the inverse discrete cosine transform circuit 3. It consists of parallel multipliers 51a and 51b, adders 52a and 52b, registers 53a, 53b, 54a, and 54b, an adder/subtractor 55, coefficient ROMs 56a and 56b that store the multipliers for the inverse DCT calculation, and register files 57a and 57b that store four words of data.

【００２９】本実施例による逆ＤＣＴ計算は、ＤＣＴ係
数（ｙ₀ｙ₁・・・ｙ₇）からベクトル（ｘ₀ｘ₁・・
・ｘ₇）を求める以下の計算式に従って計算される。The inverse DCT calculation in this embodiment obtains the vector (x₀ x₁ ... x₇) from the DCT coefficients (y₀ y₁ ... y₇) according to the following formula.

【００３０】[0030]

【数１】 [Equation 1]

【００３１】で定義され、係数ＲＯＭ５６ａと係数ＲＯ
Ｍ５６ｂには各々式（１），（２）に対応する値が書き
込まれている。（１）式と（２）式の結果である４次ベ
クトルを加算することで（ｘ₀ｘ₂ｘ₄ｘ₆）が、ま
た、減算することで（ｘ₁ｘ₃ｘ₅ｘ₇）が求まる。The calculation is defined as above, and values corresponding to equations (1) and (2) are written in the coefficient ROM 56a and the coefficient ROM 56b, respectively. Adding the 4-element vectors that result from equations (1) and (2) yields (x₀ x₂ x₄ x₆), and subtracting them yields (x₁ x₃ x₅ x₇).
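Equation 1 itself is not reproduced in this text, so the following is only a sketch of the general technique: an 8-point inverse DCT split into two 4 × 4 matrix-vector products (even and odd coefficients) followed by a butterfly. It assumes the standard orthonormal DCT definition, under which the sum and difference give x_n and x_(7-n); the pairing and output order used in the patent's own equations may differ. The function names are hypothetical.

```python
import math


def basis(k, n):
    """Standard orthonormal 8-point inverse DCT basis value (assumed)."""
    scale = math.sqrt(0.125) if k == 0 else 0.5
    return scale * math.cos((2 * n + 1) * k * math.pi / 16)


def idct8_direct(y):
    """Reference: full 8 x 8 matrix-vector product."""
    return [sum(basis(k, n) * y[k] for k in range(8)) for n in range(8)]


def idct8_even_odd(y):
    """Two 4 x 4 products plus an add/subtract butterfly."""
    even = [sum(basis(k, n) * y[k] for k in (0, 2, 4, 6)) for n in range(4)]
    odd = [sum(basis(k, n) * y[k] for k in (1, 3, 5, 7)) for n in range(4)]
    x = [0.0] * 8
    for n in range(4):
        x[n] = even[n] + odd[n]        # sum output of the butterfly
        x[7 - n] = even[n] - odd[n]    # difference output of the butterfly
    return x


coefficients = [10, -3, 4, 0, 1, 2, -5, 7]
direct = idct8_direct(coefficients)
split = idct8_even_odd(coefficients)
print(max(abs(a - b) for a, b in zip(direct, split)))
```

The split halves the number of multiplications compared with the full 8 × 8 product, which is why two 4-word register files and two coefficient ROMs are enough.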

【００３２】バッファメモリ５より１個の８次ベクトル
が読み出され、偶数番目の係数と奇数番目の係数に分か
れてレジスタファイル５７ａ，５７ｂに格納される。こ
のとき、レジスタファイル５７ａ，５７ｂのベクトルデ
ータは、図４の読み出しアドレスカウンタの値を行・列
入れ換え回路４９で入れ換えることなくそのままメモリ
に格納することによって順次読み出され格納されてい
る。One 8-element vector is read from the buffer memory 5 and split into even-numbered and odd-numbered coefficients, which are stored in the register files 57a and 57b. At this time, the vector data for the register files 57a and 57b is read out sequentially by supplying the value of the read address counter of FIG. 4 directly to the memory, without interchange by the row/column interchange circuit 49.

【００３３】レジスタファイル５７ａ，５７ｂのベクト
ルデータは、各々乗算器５１ａと５１ｂによって式
（１），（２）の４×４マトリクスと４次ベクトルが同
時に乗算され加算器５２ａ，５２ｂとレジスタ５３ａ，
５３ｂによって累算される。４ステップの計算で同時に
レジスタ３ａ，３ｂに求まった２つのデータは、次のク
ロックでレジスタ５４ａ，５４ｂに退避され加減算器５
５によってバタフライ演算が施された後に得られた２個
のデータｘ_i，ｘ_3-i（ｉ＝０・・・３）が順次出力さ
れる。ベクトル行列積の結果をレジスタ５４ａ，５４ｂ
に退避することにより、乗算器５１ａ，５１ｂと加算器
５２ａ，５２ｂ、レジスタ５３ａ，５３ｂは、次のベク
トル行列積の計算を連続して開始することができる。こ
の結果は、順次出力ポートを介してバッファメモリ５に
蓄えられる。このときバッファメモリ５では逆ＤＣＴ演
算前のベクトルデータは既にレジスタファイル５７ａ，
５７ｂに読み出された後なので、必要なデータが書き潰
されることは無い。The vector data in the register files 57a and 57b is multiplied by the 4 × 4 matrices of equations (1) and (2), respectively, with the multipliers 51a and 51b operating simultaneously on the two 4-element vectors, and the products are accumulated by the adders 52a and 52b and the registers 53a and 53b. The two values obtained simultaneously in the registers 53a and 53b after four steps of calculation are saved to the registers 54a and 54b on the next clock, and the adder/subtractor 55 performs a butterfly operation on them, after which the resulting two values x_i and x_3-i (i = 0, ..., 3) are output in sequence. Because the matrix-vector product results are saved to the registers 54a and 54b, the multipliers 51a and 51b, the adders 52a and 52b, and the registers 53a and 53b can start the next matrix-vector product without interruption. The results are stored in the buffer memory 5 in sequence via the output port. At this point the vector data preceding the inverse DCT operation has already been read into the register files 57a and 57b, so no data that is still needed is overwritten in the buffer memory 5.

【００３４】以上のようにして４ステップで２個の値が
求まるので、６４個のデータに対して１２８ステップで
行方向の逆ＤＣＴ演算が施される。バッファメモリ５に
行方向逆ＤＣＴが施されて書き込まれたデータは、再び
読み出され列方向のＤＣＴが施される。２次元逆ＤＣＴ
は行方向１２８ステップ、列方向１２８ステップの計２
５６ステップで求まる。従って、１ＤＣＴ係数当りには
４ステップを要することとなる。Since two values are obtained every four steps in this way, the row-direction inverse DCT of 64 values takes 128 steps. The data written to the buffer memory 5 after the row-direction inverse DCT is read out again and subjected to the column-direction inverse DCT. The two-dimensional inverse DCT is therefore obtained in 256 steps in total, 128 steps in the row direction and 128 steps in the column direction, that is, four steps per DCT coefficient.
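The step count above follows from simple arithmetic, reproduced here as a sketch. The function name and the `mac_units` parameter are hypothetical; the default of 2 reflects the two multiply-accumulate paths of FIG. 5, and other values are an extrapolation rather than something stated for this embodiment.

```python
def idct_step_counts(block_size=8, mac_units=2, steps_per_group=4):
    """Steps for one 1-D pass, for the full 2-D inverse DCT, and per coefficient.

    Each group of `mac_units` output values takes `steps_per_group` steps
    (one 4-element dot product per unit).
    """
    values = block_size * block_size
    per_pass = values // mac_units * steps_per_group
    total = 2 * per_pass                  # row pass plus column pass
    return per_pass, total, total / values


per_pass, total, per_coefficient = idct_step_counts()
print(per_pass, total, per_coefficient)
```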

【００３５】図６は予測信号生成回路４の一例を示して
いる。１画素遅延を実現するレジスタ６１、加算器６
３、セレクタ６６で構成される水平方向補間部と、８画
素遅延を実現するレジスタ６２、加算器６４、セレクタ
６７で構成される垂直方向補間部と、加算器６５、セレ
クタ６８で構成される時間方向補間部より構成される。
水平、垂直、時間方向の各補間部は、可変長復号によっ
て得られた各ブロックの予測信号生成情報によって各補
間部を構成するセレクタ６６，６７，６８の入力として
Ｈ，Ｌどちらを選択するかが決定される。各セレクタ
は、入力Ｈを選択した場合に補間が行われた値を出力
し、入力Ｌを選択した場合に補間が行われない様に接続
されている。FIG. 6 shows an example of the prediction signal generation circuit 4. It consists of a horizontal interpolation unit composed of a register 61 that provides a one-pixel delay, an adder 63, and a selector 66; a vertical interpolation unit composed of a register 62 that provides an eight-pixel delay, an adder 64, and a selector 67; and a temporal interpolation unit composed of an adder 65 and a selector 68. For each of the horizontal, vertical, and temporal interpolation units, the prediction signal generation information obtained for each block by variable-length decoding determines whether the selector 66, 67, or 68 of that unit selects input H or input L. Each selector is connected so that it outputs the interpolated value when input H is selected and performs no interpolation when input L is selected.

【００３６】フレーム間予測情報によって、フレーム間
予測信号の生成に必要なデータがフレームメモリ９から
９画素×９ラインのブロックデータとして読み出され、
水平方向補間部に供給され、出力として８画素×９ライ
ンのブロックデータとして垂直方向補間部に出力され
る。垂直方向補間部では、出力として８画素×８ライン
を出力する。このときセレクタ６８はＬ入力すなわち値
０を選択するのでブロック情報に基づいて水平及び垂直
補間が施されたデータが外部に出力される。両方向予測
ブロックの場合は、先ず、前方向予測ブロックが上述の
方法によって水平／垂直補間がなされた後にバッファメ
モリ７に書き込まれる。次に、セレクタ６８は入力Ｈを
選択し、後ろ方向予測ブロックについて水平／垂直補間
部で補間処理が施されると同時にバッファメモリ７から
読み出された前方向予測ブロックと今求められる補間さ
れた後ろ方向予測ブロックとが加算器６５によって時間
方向補間が施され再度バッファメモリ７に書き込まれ
る。According to the inter-frame prediction information, the data needed to generate the inter-frame prediction signal is read from the frame memory 9 as block data of 9 pixels × 9 lines and supplied to the horizontal interpolation unit, which outputs block data of 8 pixels × 9 lines to the vertical interpolation unit. The vertical interpolation unit outputs 8 pixels × 8 lines. At this time the selector 68 selects the L input, that is, the value 0, so the data to which horizontal and vertical interpolation has been applied according to the block information is output externally as is. For a bidirectionally predicted block, the forward prediction block is first interpolated horizontally and vertically by the above method and written to the buffer memory 7. Next, the selector 68 selects input H; the backward prediction block is interpolated in the horizontal and vertical interpolation units and, at the same time, the adder 65 performs temporal interpolation between the forward prediction block read from the buffer memory 7 and the backward prediction block just interpolated, and the result is written back to the buffer memory 7.
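The interpolation data flow (9 × 9 in, 8 × 9 after the horizontal stage, 8 × 8 after the vertical stage, then optional averaging of forward and backward blocks) can be sketched as follows. The function names are hypothetical, and the round-half-up averaging at each stage is an assumption: the text does not state the rounding, and cascaded rounding is not identical to a single four-sample average.

```python
def interpolate_block(ref, half_h, half_v):
    """ref: 9 lines of 9 pixels. Returns 8 lines of 8 pixels.

    half_h / half_v play the role of selectors 66 and 67 choosing input H
    (interpolate) or input L (pass through).
    """
    if half_h:
        hor = [[(line[x] + line[x + 1] + 1) // 2 for x in range(8)] for line in ref]
    else:
        hor = [line[:8] for line in ref]              # 9 lines of 8 pixels
    if half_v:
        return [[(hor[y][x] + hor[y + 1][x] + 1) // 2 for x in range(8)]
                for y in range(8)]
    return [line[:] for line in hor[:8]]


def temporal_average(forward, backward):
    """Average of forward and backward predictions (selector 68 on input H)."""
    return [[(f + b + 1) // 2 for f, b in zip(fl, bl)]
            for fl, bl in zip(forward, backward)]


ref = [[10 * y + x for x in range(9)] for y in range(9)]
full_pel = interpolate_block(ref, False, False)
half_both = interpolate_block(ref, True, True)
bidir = temporal_average(full_pel, half_both)
print(full_pel[0], half_both[0], bidir[0], sep="\n")
```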

【００３７】このようにして、前方もしくは後ろ方向の
みの片方向予測ブロックの場合は、９画素×９ラインの
データの補間処理に要する時間はレジスタ６１，６２に
よる遅延時間も合わせて（９×９）＋９であり、１画素
当りに換算すると（８１＋９／６４＝１．４１と見積も
ることが出来る。両方向予測ブロックには倍の１画素当
り２．９クロックとなる。したがって、フレーム間予測
信号生成回路では、１画素当たり２．９クロックの処理
時間でフレーム間予測信号をバッファメモリ７に格納す
ることが出来る。Thus, for a unidirectionally predicted block using only forward or only backward prediction, the time required to interpolate the 9 pixels × 9 lines of data, including the delay introduced by the registers 61 and 62, is (9 × 9) + 9 clocks, which works out to an estimated (81 + 9) / 64 = 1.41 clocks per pixel. A bidirectionally predicted block takes twice as long, given as 2.9 clocks per pixel. The inter-frame prediction signal generation circuit can therefore store the inter-frame prediction signal in the buffer memory 7 within a processing time of 2.9 clocks per pixel.

【００３８】バッファメモリ７に格納されたフレーム間
予測信号は、バッファメモリ６に格納された逆ＤＣＴ結
果と同時に読み出され、フレーム加算回路８で加算処理
を施すことによって復号画像を再生し、フレームメモリ
９に書き込まれる。図示せずも、フレーム加算回路での
処理は１画素当り１クロックで出来る。バッファメモリ
７に格納されるフレーム間予測信号の生成に２．９クロ
ック、バッファメモリ７から読み出して復号信号の生成
に１クロックとなるので、合計約４クロックでフレーム
間予測と復号画像の生成を終了することができる。The inter-frame prediction signal stored in the buffer memory 7 is read out simultaneously with the inverse DCT result stored in the buffer memory 6, the frame addition circuit 8 adds them to reconstruct the decoded image, and the result is written to the frame memory 9. Although not shown, the frame addition circuit can process one pixel per clock. Generating the inter-frame prediction signal stored in the buffer memory 7 takes 2.9 clocks per pixel, and reading it from the buffer memory 7 to generate the decoded signal takes 1 clock, so inter-frame prediction and decoded image generation finish in about 4 clocks per pixel in total.
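The clock budget in the two preceding paragraphs can be checked with a short calculation. The variable names are hypothetical. Doubling the unidirectional estimate gives roughly 2.8 clocks per pixel; the sketch uses the 2.9 figure quoted in the text for the bidirectional case.

```python
BLOCK_IN = 9          # pixels per line and lines read from the frame memory
BLOCK_OUT = 8         # pixels per line and lines in the prediction block
REGISTER_DELAY = 9    # delay of registers 61 and 62, as stated in the text

unidirectional = (BLOCK_IN * BLOCK_IN + REGISTER_DELAY) / (BLOCK_OUT * BLOCK_OUT)
bidirectional_quoted = 2.9            # figure quoted in the description
frame_addition = 1.0                  # one clock per pixel

total_per_pixel = bidirectional_quoted + frame_addition
print(round(unidirectional, 2), round(2 * unidirectional, 2), total_per_pixel)
```

The total stays within the four clocks per data item that the other two pipeline stages also target, which is what allows the three stages to run in lockstep.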

【００３９】本発明による動画像復号装置の動作タイミ
ングを図７を用いて説明する。可変長復号回路１と逆量
子化回路２の動作を上段、逆離散コサイン変換回路３の
動作を中段、予測信号生成回路４とフレーム加算回路８
の動作を下段に示した。今まで説明した様に、実施例に
於ける可変長復号回路１、逆量子化回路２、逆離散コサ
イン変換回路３、及び、予測信号生成回路４とフレーム
加算回路８のそれぞれについて、４クロック毎に１個の
割合で処理が進められる。タイミング制御回路１０は、
上記３つの処理が各々終了した段階で、次のブロックの
処理に同時に進む様に制御しており、従って、可変長復
号回路１と、逆離散コサイン変換回路３と、フレーム加
算回路８が同時に起動されている。The operation timing of the moving picture decoding apparatus according to the present invention is described with reference to FIG. 7. The upper row shows the operation of the variable length decoding circuit 1 and the inverse quantization circuit 2, the middle row shows the operation of the inverse discrete cosine transform circuit 3, and the lower row shows the operation of the prediction signal generation circuit 4 and the frame addition circuit 8. As explained so far, in the embodiment the variable length decoding circuit 1 and inverse quantization circuit 2, the inverse discrete cosine transform circuit 3, and the prediction signal generation circuit 4 and frame addition circuit 8 each process data at a rate of one item every four clocks. The timing control circuit 10 controls the three processes so that they all proceed to the next block together once each of them has finished; consequently the variable length decoding circuit 1, the inverse discrete cosine transform circuit 3, and the frame addition circuit 8 are started at the same time.

【００４０】上段の可変長復号回路、逆量子化回路で
は、入力ビットストリームによって非ゼロのデータ数が
異なるために、ブロックによって処理時間に違いが見ら
れる。特に、期間１では可変長復号するブロックがノッ
トコーデッドであったために、ＤＣＴ係数が存在しない
ため処理時間が短くなっている。逆離散コサイン変換回
路３では、行方向の変換と列方向の変換処理が順次実行
される。但し、期間２ではノットコーデッドブロックに
対して逆ＤＣＴ演算は不要なので逆離散コサイン変換回
路３は起動されていない。予測信号生成回路４とフレー
ム加算回路８では、１周期の間に１回のフレーム加算処
理と２回までの予測信号生成処理が実行される。例えば
期間２では、両方向予測ブロックに対する予測信号生成
が行われている。一方、期間３では対応するブロックが
イントラブロックのためフレーム間予測信号は生成して
いない。In the variable length decoding circuit and inverse quantization circuit in the upper row, the processing time varies from block to block because the number of non-zero data items depends on the input bit stream. In period 1 in particular, the block being variable-length decoded is not coded and has no DCT coefficients, so the processing time is short. In the inverse discrete cosine transform circuit 3, the row-direction transform and the column-direction transform are executed in sequence. In period 2, however, the inverse discrete cosine transform circuit 3 is not started, because no inverse DCT operation is needed for the not-coded block. The prediction signal generation circuit 4 and the frame addition circuit 8 execute one frame addition process and up to two prediction signal generation processes per period. In period 2, for example, the prediction signal for a bidirectionally predicted block is generated. In period 3, on the other hand, no inter-frame prediction signal is generated because the corresponding block is an intra block.
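The lockstep behaviour of the three stages can be illustrated with a toy schedule. Everything here is hypothetical: the function name, the clock counts per block, and the period numbering (which starts at 0 and does not correspond to the periods of FIG. 7). The only property taken from the text is that all stages move to the next block together once every active stage has finished.

```python
def schedule(blocks):
    """blocks: list of (vld_iq, idct, pred_add) clock counts, one tuple per block.

    In period p, stage 0 works on block p, stage 1 on block p - 1 and
    stage 2 on block p - 2. A period lasts as long as its slowest active stage.
    """
    periods = []
    for p in range(len(blocks) + 2):
        active = []
        for stage in range(3):
            b = p - stage
            if 0 <= b < len(blocks):
                active.append(blocks[b][stage])
        periods.append(max(active))
    return periods


# Illustrative numbers only: a coded block, a not-coded block (no inverse DCT),
# and an intra block (frame addition only in the third stage).
blocks = [(200, 256, 250), (10, 0, 250), (256, 256, 64)]
print(schedule(blocks))
```

A not-coded block shortens the first stage and leaves the second stage idle, but the period length is still set by whichever other stage is busiest at that time.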

【００４１】同図には、バッファメモリ５のバンク切替
え、予測信号生成回路４におけるセレクタ６８の制御信
号、バッファメモリ５における行・列入れ換え変換回路
４９の制御信号も同時に示している。バッファメモリ５
のバンク切替えは、逆離散コサイン変換回路３の起動と
同時に行われる。従って、ノットコーデッドブロックに
対する逆ＤＣＴ演算の起動タイミング、すなわち期間１
から期間２への移行時においてはバンク切替えは起こし
ていない。また、予測信号生成回路４におけるセレクタ
６８の制御信号は、両方向予測でかつ２回目の予測信号
生成時にのみ‘Ｈ’となって時間方向の補間処理が実現
されることが分かる。また、逆離散コサイン変換回路が
列方向ＤＣＴを計算する間は行・列入れ換え変換回路４
９の制御信号がハイレベルとなり行と列を入れ換えたア
ドレスが発生され、行列の転置機能が使われていること
が示される。The figure also shows the bank switching of the buffer memory 5, the control signal of the selector 68 in the prediction signal generation circuit 4, and the control signal of the row/column interchange circuit 49 in the buffer memory 5. The bank of the buffer memory 5 is switched at the same time as the inverse discrete cosine transform circuit 3 is started. Consequently, no bank switch occurs at the point where the inverse DCT operation for the not-coded block would have been started, that is, at the transition from period 1 to period 2. The control signal of the selector 68 in the prediction signal generation circuit 4 goes to 'H' only for bidirectional prediction and only during the second prediction signal generation, which realizes the temporal interpolation. While the inverse discrete cosine transform circuit computes the column-direction DCT, the control signal of the row/column interchange circuit 49 is at high level, addresses with rows and columns interchanged are generated, and the matrix transposition function is in use.

【００４２】このように、逆量子化後に配置したバッフ
ァメモリ５のバンク切替えと同時に逆ＤＣＴ処理と次ブ
ロックの可変長復号処理を起動すること、また、次ブロ
ックの可変長復号処理の起動と同時に前ブロックのフレ
ーム加算処理を開始させていることが、本制御方式の特
徴となっている。特に、逆離散コサイン変換回路３は行
方向の変換に１２８クロック要するのに対し、フレーム
加算処理は６４クロックしかかからないため、フレーム
加算回路８の実現上の遅延を含めても逆離散コサイン変
換回路３が列方向の変換処理を開始して２次元逆ＤＣＴ
結果の出力を開始する前にフレーム加算処理を終了させ
られる。従って、逆ＤＣＴ結果を格納するバッファメモ
リ６は１ブロック分で充分であり、特別な待ち合わせ制
御も不要となっている。This control method is characterized in that the inverse DCT process and the variable length decoding of the next block are started at the same time as the bank of the buffer memory 5, placed after inverse quantization, is switched, and in that the frame addition process for the previous block is started at the same time as the variable length decoding of the next block. In particular, the inverse discrete cosine transform circuit 3 needs 128 clocks for the row-direction transform, whereas the frame addition process takes only 64 clocks. Even allowing for implementation delay in the frame addition circuit 8, the frame addition process can therefore finish before the inverse discrete cosine transform circuit 3 begins the column-direction transform and starts outputting the two-dimensional inverse DCT result. As a result, one block of capacity is sufficient for the buffer memory 6 that stores the inverse DCT result, and no special wait control is needed.

【００４３】以上の様に、本発明による実施例では、可
変長復号、逆量子化、逆ＤＣＴそれぞれについて４クロ
ック毎に１個の割合で処理が進められる。尚、データ１
個あたりの処理クロック数は、システム全体の処理要求
速度と回路規模とのバランスによって定めるものであ
り、この実施例はその一例として４ステップ毎の実現例
を示したに過ぎない。たとえば、同じ動作周波数でより
高速な処理が必要とされるシステムでは、各演算器の並
列度を上げてデータ１個当りの処理クロック数を減らす
ことが出来る。この場合、可変長復号回路２で一度にデ
コードできるビット数を６ｂｉｔから１２ｂｉｔにすべ
くシフタレジスタの出力を倍とし、大きな量子化テーブ
ルを採用する。逆量子化回路２では、部分積乗算器を並
列乗算器としてクロックで一回の乗算を行える様にし、
逆離散コサイン変換回路３では並列乗算器、加算器、レ
ジスタ等からなる積和演算器を倍の４並列とすればよ
い。逆に、高速な処理を必要としないシステムでは、こ
の場合可変長復号回路１で一度にデコードできるビット
数を６ｂｉｔから３ｂｉｔにすべくシフタレジスタの出
力を半分とし、量子化テーブルを変更する。逆量子化回
路２では、部分積乗算器で一度に計算できる被乗数のビ
ット数を４ｂｉｔから２ｂｉｔに減少させ、また、逆離
散コサイン変換回路３では積和演算器を１個とすればよ
い。従って、図７に示したタイミングに従った制御方式
は各演算ユニット内の構成方式に依らない。As described above, in the embodiment of the present invention, variable length decoding, inverse quantization, and inverse DCT each proceed at a rate of one data item every four clocks. The number of processing clocks per data item is determined by balancing the processing speed required of the whole system against the circuit scale, and this embodiment merely shows a four-step implementation as one example. For instance, in a system that needs faster processing at the same operating frequency, the parallelism of each arithmetic unit can be raised to reduce the number of processing clocks per data item. In that case, the output of the shifter register is doubled so that the variable length decoding circuit 1 can decode 12 bits at a time instead of 6 bits, and a larger quantization table is adopted. In the inverse quantization circuit 2, the partial product multiplier is replaced by a parallel multiplier so that one multiplication completes per clock, and in the inverse discrete cosine transform circuit 3 the number of multiply-accumulate units, each consisting of a parallel multiplier, an adder, registers, and so on, is doubled to four in parallel. Conversely, in a system that does not need high-speed processing, the output of the shifter register is halved so that the variable length decoding circuit 1 decodes 3 bits at a time instead of 6 bits, and the quantization table is changed accordingly. In the inverse quantization circuit 2, the number of multiplicand bits the partial product multiplier handles at one time is reduced from 4 bits to 2 bits, and in the inverse discrete cosine transform circuit 3 a single multiply-accumulate unit is used. The control method following the timing shown in FIG. 7 therefore does not depend on the internal configuration of each arithmetic unit.

【００４４】[0044]

【発明の効果】本発明によって、逆量子化処理と逆ＤＣ
Ｔ処理間に配置されたバッファメモリを用いて、パイプ
ライン処理が可能となり、各演算ユニットの効率良い設
計が可能となる。従って、復号化装置全体としての最適
化が計れるという効果がある。According to the present invention, pipeline processing becomes possible by using the buffer memory placed between the inverse quantization process and the inverse DCT process, and each arithmetic unit can be designed efficiently. The decoding apparatus as a whole can therefore be optimized.

【００４５】本発明方式で必要とされるメモリ個数は従
来法より減少でき、メモリ容量も、全体として従来方式
と同等程度以下となる。従って、装置の小型化に有利に
なるという効果がある。The number of memories required by the method of the present invention is smaller than in the conventional method, and the total memory capacity is equal to or less than that of the conventional method. This is advantageous for reducing the size of the apparatus.

【００４６】更に本発明によれば、ＤＣＴ係数の直流成
分やレベル値０の様に逆量子化処理が不要なデータに対
して逆量子化回路をバイパスし、よって処理時間を短く
済ませることができる。このため、装置外部の要因によ
ってビットストリーム供給が遅れた場合も、逆量子化処
理のバイパス機能によって逆ＤＣＴ以降の処理待ち合わ
せの発生確率を低減することが出来る。Furthermore, according to the present invention, the inverse quantization circuit is bypassed for data that needs no inverse quantization, such as the DC component of the DCT coefficients or a level value of 0, which shortens the processing time. Consequently, even when the supply of the bit stream is delayed by factors outside the apparatus, the bypass function of the inverse quantization process reduces the probability that processing from the inverse DCT onward has to wait.

[Brief description of drawings]

【図１】本発明による動画像復号装置の一実施例のブロ
ック図である。FIG. 1 is a block diagram of an embodiment of a moving image decoding apparatus according to the present invention.

【図２】可変長復号回路１の一例である。FIG. 2 is an example of a variable length decoding circuit 1.

【図３】逆量子化回路２の一例である。FIG. 3 is an example of an inverse quantization circuit 2.

【図４】バッファメモリ５の一例である。FIG. 4 is an example of a buffer memory 5.

【図５】逆離散コサイン変換回路３の一例である。FIG. 5 is an example of an inverse discrete cosine transform circuit 3.

【図６】予測信号生成回路４の一例である。FIG. 6 is an example of a prediction signal generation circuit 4.

【図７】本発明による実施例の動作タイミングを示した
説明図である。FIG. 7 is an explanatory diagram showing operation timing of the embodiment according to the present invention.

【図８】従来の復号装置のブロック図である。FIG. 8 is a block diagram of a conventional decoding device.

[Explanation of symbols]

１可変長復号回路２逆量子化回路３逆離散コサイン変換回路４予測信号生成回路５バッファメモリ回路６バッファメモリ回路７バッファメモリ回路８フレーム加算回路９フレームメモリ１０タイミング制御回路
1 Variable length decoding circuit
2 Inverse quantization circuit
3 Inverse discrete cosine transform circuit
4 Prediction signal generation circuit
5 Buffer memory circuit
6 Buffer memory circuit
7 Buffer memory circuit
8 Frame addition circuit
9 Frame memory
10 Timing control circuit

Claims

[Claims]

1. A moving picture decoding apparatus for decoding video data compressed by scan-converting discrete cosine transform coefficients and then variable-length coding them, comprising: a variable-length decoding circuit for decoding the variable-length code; an inverse quantization circuit connected to the output of the variable-length decoding circuit; a first buffer memory connected to the output of the inverse quantization circuit, into which inverse-quantized data is written; and an inverse discrete cosine transform circuit connected to the output of the first buffer memory, wherein the output of the inverse discrete cosine transform circuit is also connected to the input of the first buffer memory.

2. The moving picture decoding apparatus according to claim 1, wherein the first buffer memory is composed of two memories and has means for alternately switching between the memory that is written from the inverse quantization circuit in the reverse of the scan order used at encoding and the memory that is read and written by the inverse discrete cosine transform circuit.

3. The moving picture decoding apparatus according to claim 1, wherein the inverse quantization circuit has means for switching, for each DCT coefficient, whether or not inverse quantization is performed, and the variable-length decoding circuit controls the switching.

4. The moving picture decoding apparatus according to claim 1, further comprising: a second buffer memory that stores the output of the inverse discrete cosine transform circuit; a frame memory that stores decoded images; a prediction signal generation circuit that generates an inter-frame prediction signal from the decoded image stored in the frame memory; a third buffer memory that stores the output of the prediction signal generation circuit; and a frame addition circuit that reads the data stored in the second buffer memory and the third buffer memory, generates a decoded image signal, and stores it in the frame memory.

5. The moving picture decoding apparatus according to claim 4, wherein variable-length decoding and inverse quantization by the variable-length decoding circuit and the inverse quantization circuit form a first stage, inverse discrete cosine transform processing by the inverse discrete cosine transform circuit forms a second stage, and decoded image signal generation by the frame addition circuit together with inter-frame prediction signal generation for the next block by the prediction signal generation circuit forms a third stage, and wherein, once all processing of the first through third stages has finished for one block, the stages proceed simultaneously to processing of the next block.