JP3654622B2

JP3654622B2 - DCT arithmetic device and IDCT arithmetic device

Info

Publication number: JP3654622B2
Application number: JP17132299A
Authority: JP
Inventors: 俊宏南; 高庸新田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-06-17
Filing date: 1999-06-17
Publication date: 2005-06-02
Anticipated expiration: 2019-06-17
Also published as: JP2001005800A

Description

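[0001]
[Technical Field of the Invention]
The present invention relates to arithmetic devices for executing the one-dimensional (1-D) or two-dimensional (2-D) DCT (Discrete Cosine Transform) used in image coding, and its inverse, the 1-D or 2-D IDCT (Inverse Discrete Cosine Transform).
[0002]
[Prior Art]
(1) DCT and IDCT operations
Let f(i) (i = 0, 1, ..., 7) be the original signal and F(j) (j = 0, 1, ..., 7) the DCT coefficients. The 8-point 1-D DCT is then expressed by Equations 1 and 2 below.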
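[0003]
[Equation 1]
$$
\begin{bmatrix}F(0)\\F(2)\\F(4)\\F(6)\end{bmatrix}
=
\begin{bmatrix}A&A&A&A\\B&C&-C&-B\\A&-A&-A&A\\C&-B&B&-C\end{bmatrix}
\begin{bmatrix}f(0)+f(7)\\f(1)+f(6)\\f(2)+f(5)\\f(3)+f(4)\end{bmatrix}
$$
[0004]
[Equation 2]
$$
\begin{bmatrix}F(1)\\F(3)\\F(5)\\F(7)\end{bmatrix}
=
\begin{bmatrix}D&E&F&G\\E&-G&-D&-F\\F&-D&G&E\\G&-F&E&-D\end{bmatrix}
\begin{bmatrix}f(0)-f(7)\\f(1)-f(6)\\f(2)-f(5)\\f(3)-f(4)\end{bmatrix}
$$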
[0005]
Here, A = cos(π/4)/2 (= √2/4), B = cos(π/8)/2, C = cos(3π/8)/2, D = cos(π/16)/2, E = cos(3π/16)/2, F = cos(5π/16)/2, and G = cos(7π/16)/2.
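As a cross-check of Equations 1 and 2, the following Python sketch evaluates the even/odd (butterfly) form with the coefficients A to G and compares it against the defining cosine sum of the orthonormal 8-point DCT. It assumes orthonormal scaling with A = cos(π/4)/2. The function names `dct8_butterfly` and `dct8_direct` are illustrative only.

```python
import math

# Coefficients of [0005]; A is assumed to be cos(pi/4)/2 (orthonormal scaling).
A = math.cos(math.pi / 4) / 2
B = math.cos(math.pi / 8) / 2
C = math.cos(3 * math.pi / 8) / 2
D = math.cos(math.pi / 16) / 2
E = math.cos(3 * math.pi / 16) / 2
F_ = math.cos(5 * math.pi / 16) / 2  # underscore avoids a clash with the DCT coefficients F(j)
G = math.cos(7 * math.pi / 16) / 2

# Equation 1: F(0), F(2), F(4), F(6) from the pairwise sums.
EVEN = [[A, A, A, A],
        [B, C, -C, -B],
        [A, -A, -A, A],
        [C, -B, B, -C]]
# Equation 2: F(1), F(3), F(5), F(7) from the pairwise differences.
ODD = [[D, E, F_, G],
       [E, -G, -D, -F_],
       [F_, -D, G, E],
       [G, -F_, E, -D]]


def dct8_butterfly(f):
    """8-point DCT evaluated through Equations 1 and 2."""
    s = [f[i] + f[7 - i] for i in range(4)]  # f(0)+f(7), f(1)+f(6), f(2)+f(5), f(3)+f(4)
    d = [f[i] - f[7 - i] for i in range(4)]  # f(0)-f(7), ..., f(3)-f(4)
    out = [0.0] * 8
    for r in range(4):
        out[2 * r] = sum(EVEN[r][c] * s[c] for c in range(4))
        out[2 * r + 1] = sum(ODD[r][c] * d[c] for c in range(4))
    return out


def dct8_direct(f):
    """Orthonormal 8-point DCT-II from its defining cosine sum (reference)."""
    out = []
    for u in range(8):
        cu = math.sqrt(0.5) if u == 0 else 1.0
        out.append(cu / 2 * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                                for x in range(8)))
    return out
```

After the sums and differences are formed, each coefficient needs four multiplications instead of eight.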
[0006]
Likewise, the 8-point 1-D IDCT can be expressed by Equations 3 and 4 below.
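[0007]
[Equation 3]
$$
\begin{bmatrix}f(0)\\f(1)\\f(2)\\f(3)\end{bmatrix}
=
\begin{bmatrix}A&B&A&C\\A&C&-A&-B\\A&-C&-A&B\\A&-B&A&-C\end{bmatrix}
\begin{bmatrix}F(0)\\F(2)\\F(4)\\F(6)\end{bmatrix}
+
\begin{bmatrix}D&E&F&G\\E&-G&-D&-F\\F&-D&G&E\\G&-F&E&-D\end{bmatrix}
\begin{bmatrix}F(1)\\F(3)\\F(5)\\F(7)\end{bmatrix}
$$
[0008]
[Equation 4]
$$
\begin{bmatrix}f(7)\\f(6)\\f(5)\\f(4)\end{bmatrix}
=
\begin{bmatrix}A&B&A&C\\A&C&-A&-B\\A&-C&-A&B\\A&-B&A&-C\end{bmatrix}
\begin{bmatrix}F(0)\\F(2)\\F(4)\\F(6)\end{bmatrix}
-
\begin{bmatrix}D&E&F&G\\E&-G&-D&-F\\F&-D&G&E\\G&-F&E&-D\end{bmatrix}
\begin{bmatrix}F(1)\\F(3)\\F(5)\\F(7)\end{bmatrix}
$$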
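[0009]
The first terms on the right-hand sides of Equations 3 and 4 are identical, and so are the second terms. Equation 3 adds the two terms and Equation 4 subtracts them.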
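Because the two terms are shared, the IDCT also needs only one sum and one difference per output pair f(i), f(7-i). The following minimal sketch assumes the same orthonormal scaling (A = cos(π/4)/2) and inverts coefficients produced by the direct DCT. The function names are hypothetical.

```python
import math

A = math.cos(math.pi / 4) / 2  # assumed orthonormal scaling, as for the DCT
B = math.cos(math.pi / 8) / 2
C = math.cos(3 * math.pi / 8) / 2
D = math.cos(math.pi / 16) / 2
E = math.cos(3 * math.pi / 16) / 2
F_ = math.cos(5 * math.pi / 16) / 2
G = math.cos(7 * math.pi / 16) / 2

# First term of Equations 3 and 4, applied to F(0), F(2), F(4), F(6).
FIRST = [[A, B, A, C],
         [A, C, -A, -B],
         [A, -C, -A, B],
         [A, -B, A, -C]]
# Second term, applied to F(1), F(3), F(5), F(7).
SECOND = [[D, E, F_, G],
          [E, -G, -D, -F_],
          [F_, -D, G, E],
          [G, -F_, E, -D]]


def idct8_butterfly(F):
    """8-point IDCT evaluated through Equations 3 and 4."""
    even = [F[0], F[2], F[4], F[6]]
    odd = [F[1], F[3], F[5], F[7]]
    t1 = [sum(FIRST[r][c] * even[c] for c in range(4)) for r in range(4)]
    t2 = [sum(SECOND[r][c] * odd[c] for c in range(4)) for r in range(4)]
    f = [0.0] * 8
    for i in range(4):
        f[i] = t1[i] + t2[i]      # Equation 3: f(0)..f(3)
        f[7 - i] = t1[i] - t2[i]  # Equation 4: f(7)..f(4)
    return f


def dct8_direct(f):
    """Orthonormal 8-point DCT-II (reference forward transform)."""
    out = []
    for u in range(8):
        cu = math.sqrt(0.5) if u == 0 else 1.0
        out.append(cu / 2 * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                                for x in range(8)))
    return out
```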
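[0010]
As shown above, the 8-point 1-D DCT and the 8-point 1-D IDCT can each be realized by multiply-accumulate operations between the data and at most seven distinct coefficients. A general-purpose multiplier is therefore unnecessary for either operation. The arithmetic circuits can instead be built from seven kinds of fixed-coefficient multipliers, one for each of the coefficients A to G.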
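The claim that at most seven coefficient values occur can be checked numerically. The sketch below builds the orthonormal 8×8 DCT matrix, collects the distinct magnitudes of its entries, and compares them with A to G. For illustration only, it also shows that a 16-point matrix yields 15 distinct magnitudes. That generalization is an observation, not a statement from the patent.

```python
import math


def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix; row u, column x."""
    rows = []
    for u in range(n):
        cu = math.sqrt(0.5) if u == 0 else 1.0
        rows.append([cu * math.sqrt(2 / n) * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def distinct_magnitudes(n=8, digits=9):
    """Distinct |entry| values, rounded so float noise does not split equal values."""
    return sorted({round(abs(v), digits) for row in dct_matrix(n) for v in row})


# Coefficients A..G of [0005] (A assumed to be cos(pi/4)/2).
COEFFS = {
    "A": math.cos(math.pi / 4) / 2,
    "B": math.cos(math.pi / 8) / 2,
    "C": math.cos(3 * math.pi / 8) / 2,
    "D": math.cos(math.pi / 16) / 2,
    "E": math.cos(3 * math.pi / 16) / 2,
    "F": math.cos(5 * math.pi / 16) / 2,
    "G": math.cos(7 * math.pi / 16) / 2,
}
```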
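[0011]
First example of the prior art
FIG. 5 shows an 8-point 1-D DCT device, the first example of the prior art. The device operates in synchronization with a clock CK (hereinafter simply "CK"). Each CK it takes in two original-signal samples and outputs two DCT coefficients. Equation 1 is computed by the three fixed-coefficient multipliers 3 to 5 and the four accumulators 17 to 20 on the left of FIG. 5. Equation 2 is computed by the four fixed-coefficient multipliers 6 to 9 and the four accumulators 21 to 24 on the right. The specific operation is described below.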
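[0012]
First, registers 41 to 48 in accumulators 17 to 24 are reset to "0". Two original-signal samples are then input per CK: f(0), f(1), f(2), f(3) on IN0 and f(7), f(6), f(5), f(4) on IN1, in that order. Both are fed to adder 1 and subtractor 2. Fixed-coefficient multipliers 3 to 5 (MPYA, MPYB, MPYC) have their multiplicands fixed at A, B and C respectively. Accumulators 17, 18, 19 and 20 compute F(0), F(2), F(4) and F(6) respectively.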
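[0013]
The adder/subtractors 34, 35 and 36 in accumulators 18 to 20 can compute either "a + b" or "-a + b" from their inputs a and b. Adder 1 computes f(0)+f(7), f(1)+f(6), f(2)+f(5) and f(3)+f(4), which are fed to fixed-coefficient multipliers 3 to 5.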
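[0014]
When f(0)+f(7) is input first, multipliers 3, 4 and 5 compute A×{f(0)+f(7)}, B×{f(0)+f(7)} and C×{f(0)+f(7)} in parallel and output them to buses 10, 11 and 12. In accumulator 17, selector 25 selects bus 10, so A×{f(0)+f(7)} is set in register 41 (REG00). Register 41 (REG00) was initially reset to "0", so adder 33 simply passes the value through. Similarly, in accumulators 18, 19 and 20, selectors 26, 27 and 28 select buses 11, 10 and 12 respectively. As a result, B×{f(0)+f(7)}, A×{f(0)+f(7)} and C×{f(0)+f(7)} are set in register 42 (REG10), register 43 (REG20) and register 44 (REG30).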
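[0015]
Next, adder 1 computes f(1)+f(6), which is fed to fixed-coefficient multipliers 3 to 5. Multipliers 3, 4 and 5 compute A×{f(1)+f(6)}, B×{f(1)+f(6)} and C×{f(1)+f(6)} in parallel and output them to buses 10, 11 and 12. In accumulator 17, selector 25 selects bus 10. Register 41 (REG00) already holds A×{f(0)+f(7)}, so adder 33 computes A×{f(0)+f(7)} + A×{f(1)+f(6)} and sets the result in register 41 (REG00).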
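[0016]
Similarly, in accumulator 18, selector 26 selects bus 12, and B×{f(0)+f(7)} + C×{f(1)+f(6)} is set in register 42 (REG10). In accumulator 19, selector 27 selects bus 10, and adder/subtractor 35 computes A×{f(0)+f(7)} - A×{f(1)+f(6)} and sets it in register 43 (REG20). Likewise, C×{f(0)+f(7)} - B×{f(1)+f(6)} is set in register 44 (REG30). The same calculations continue, and F(0), F(2), F(4) and F(6) are obtained at the 4th CK. At that point, however, they are set not in registers 41 to 44 (REG00 to REG30) but in registers 49 to 52 (REG01 to REG31). At the same time, registers 41 to 44 (REG00 to REG30) are reset, and a new 8-point 1-D DCT begins from the next CK.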
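[0017]
Fixed-coefficient multipliers 6 to 9 (MPYD, MPYE, MPYF, MPYG) have their multiplicands fixed at D, E, F and G respectively. Accumulators 21, 22, 23 and 24 compute F(1), F(3), F(5) and F(7) respectively. The adder/subtractors 38, 39 and 40 in accumulators 22 to 24 can compute "a + b" or "-a + b" from their inputs a and b. Subtractor 2, connected to inputs IN0 and IN1, computes f(0)-f(7), f(1)-f(6), f(2)-f(5) and f(3)-f(4). F(1), F(3), F(5) and F(7) are then computed in the same way as F(0), F(2), F(4) and F(6), and are set in registers 53 to 56 (REG41 to REG71) of accumulators 21 to 24. In accumulators 21 to 24, 29 to 32 are selectors, 37 is an adder, 38 to 40 are adder/subtractors, and 45 to 48 are registers (REG40 to REG70).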
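[0018]
The DCT coefficients F(0) to F(7) held in registers 49 to 56 of accumulators 17 to 24 are output at one coefficient per CK on each port. F(0), F(2), F(4), F(6) leave from OUT0 via bus 57, and F(1), F(3), F(5), F(7) leave from OUT1 via bus 58, in that order.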
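The following behavioural sketch models the first prior-art example cycle by cycle. It assumes ideal arithmetic and ignores register timing. The selector settings of accumulators 17 to 24 are written as (multiplier, sign) tables derived from Equations 1 and 2, and they agree with the bus selections described in [0014] to [0016]. `prior_art_dct` is a hypothetical name.

```python
import math

# Fixed-coefficient multipliers 3-9 (MPYA ... MPYG); A assumed to be cos(pi/4)/2.
COEF = {
    "A": math.cos(math.pi / 4) / 2,
    "B": math.cos(math.pi / 8) / 2,
    "C": math.cos(3 * math.pi / 8) / 2,
    "D": math.cos(math.pi / 16) / 2,
    "E": math.cos(3 * math.pi / 16) / 2,
    "F": math.cos(5 * math.pi / 16) / 2,
    "G": math.cos(7 * math.pi / 16) / 2,
}

# Selector/sign schedule: ROUTE[accumulator][ck] = (multiplier, sign).
EVEN_ROUTE = [
    [("A", 1), ("A", 1), ("A", 1), ("A", 1)],     # accumulator 17 -> F(0)
    [("B", 1), ("C", 1), ("C", -1), ("B", -1)],   # accumulator 18 -> F(2)
    [("A", 1), ("A", -1), ("A", -1), ("A", 1)],   # accumulator 19 -> F(4)
    [("C", 1), ("B", -1), ("B", 1), ("C", -1)],   # accumulator 20 -> F(6)
]
ODD_ROUTE = [
    [("D", 1), ("E", 1), ("F", 1), ("G", 1)],     # accumulator 21 -> F(1)
    [("E", 1), ("G", -1), ("D", -1), ("F", -1)],  # accumulator 22 -> F(3)
    [("F", 1), ("D", -1), ("G", 1), ("E", 1)],    # accumulator 23 -> F(5)
    [("G", 1), ("F", -1), ("E", 1), ("D", -1)],   # accumulator 24 -> F(7)
]


def prior_art_dct(f):
    """Two samples in per CK for 4 CKs; returns (OUT0, OUT1) for each of 4 output CKs."""
    even_acc = [0.0] * 4  # REG00..REG30
    odd_acc = [0.0] * 4   # REG40..REG70
    for ck in range(4):
        in0, in1 = f[ck], f[7 - ck]                   # IN0: f(0)..f(3), IN1: f(7)..f(4)
        s, d = in0 + in1, in0 - in1                   # adder 1 and subtractor 2
        even_bus = {k: COEF[k] * s for k in "ABC"}    # multipliers 3-5
        odd_bus = {k: COEF[k] * d for k in "DEFG"}    # multipliers 6-9
        for r in range(4):
            label, sign = EVEN_ROUTE[r][ck]
            even_acc[r] += sign * even_bus[label]
            label, sign = ODD_ROUTE[r][ck]
            odd_acc[r] += sign * odd_bus[label]
    # REG01..REG71 then stream F(0),F(2),.. on OUT0 and F(1),F(3),.. on OUT1.
    return list(zip(even_acc, odd_acc))


def dct8_direct(f):
    """Orthonormal 8-point DCT-II (reference)."""
    return [(math.sqrt(0.5) if u == 0 else 1.0) / 2
            * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
            for u in range(8)]
```

Every first-CK entry has sign +1, which is consistent with the reset register being passed straight through. Accumulators 17 and 21 never need a minus sign, which matches their use of plain adders 33 and 37.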
[0019]
Technology relating to this kind of DCT device is described, for example, in A. Madisetti and A. N. Willson, "A 100 MHz 2-D 8×8 DCT/IDCT Processor for HDTV Applications," IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 2, pp. 158-164, April 1995. This paper also describes the techniques of the second, third and fourth prior-art examples below.
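[0020]
Second example of the prior art
FIG. 6 shows an 8-point 1-D IDCT device, the second example of the prior art. It is obtained from the 8-point 1-D DCT device of the first example by removing adder 1 and subtractor 2 and adding an adder 59 and a subtractor 60 at the output.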
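[0021]
DCT coefficients F(0), F(2), F(4) and F(6) are input to IN0, one per CK, and fed to fixed-coefficient multipliers 3 to 5 (MPYA, MPYB, MPYC). The matrix product forming the first term on the right-hand side of Equations 3 and 4 is computed by the three fixed-coefficient multipliers 3 to 5 and the four accumulators 17 to 20 on the left of FIG. 6. Accumulators 17 to 20 have the same configuration as in FIG. 5 and carry the same reference numerals.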
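[0022]
Meanwhile, F(1), F(3), F(5) and F(7) are input to IN1 and fed to fixed-coefficient multipliers 6 to 9 (MPYD, MPYE, MPYF, MPYG). The matrix product forming the second term on the right-hand side of Equations 3 and 4 is computed by the four fixed-coefficient multipliers 6 to 9 and the four accumulators 21 to 24 on the right of FIG. 6. Accumulators 21 to 24 have the same configuration as in FIG. 5 and carry the same reference numerals.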
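[0023]
Adder 59 then adds the first-term and second-term products of Equation 3. In parallel, subtractor 60 subtracts the second-term product from the first-term product for Equation 4. As a result, each CK adder 59 outputs one of f(0), f(1), f(2), f(3) from OUT0, and subtractor 60 outputs one of f(7), f(6), f(5), f(4) from OUT1.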
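[0024]
Third example of the prior art
FIG. 7 shows an 8-point 1-D shared DCT/IDCT device, the third example of the prior art. It combines the DCT device of the first example with the IDCT device of the second example.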
[0025]
Setting selectors 61, 62, 63 and 64 all to their a inputs executes the DCT, and setting them all to their b inputs executes the IDCT. Parts identical to those in FIGS. 5 and 6 carry the same reference numerals.
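[0026]
Fourth example of the prior art
FIG. 8 shows an 8-point × 8-point 2-D shared DCT/IDCT device, the fourth example of the prior art. It consists of the 8-point 1-D shared DCT/IDCT device 67 of the third example and an 8×8 W (word) 2-D transposition memory 68. Memory 68 can write or read two pixels at a time, and data written in the horizontal direction can be read out in the vertical direction. Physically, memory 68 has 64 W of consecutive addresses (i = 0 to 63). It is accessed horizontally or vertically through a 2-D virtual address (x, y), where x = 0 to 7 is the remainder of i divided by 8 (x = i mod 8), and y = 0 to 7 is the quotient of i divided by 8, rounded down (y = ⌊i/8⌋).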
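The virtual addressing of transposition memory 68 can be illustrated with a short sketch in which rows are written and columns are read through the same (x, y) mapping. It models one word per access, whereas the prior-art memory transfers two. The class and method names are hypothetical.

```python
def virtual_address(i, width=8):
    """Linear address i (0..63) -> (x, y) with x = i mod 8, y = floor(i / 8)."""
    return i % width, i // width


def linear_address(x, y, width=8):
    """Inverse mapping (x, y) -> i."""
    return y * width + x


class TransposeMemory:
    """64-word memory accessed horizontally or vertically via (x, y)."""

    def __init__(self, width=8, height=8):
        self.width, self.height = width, height
        self.words = [None] * (width * height)

    def write_row(self, y, values):
        for x, v in enumerate(values):
            self.words[linear_address(x, y, self.width)] = v

    def read_column(self, x):
        return [self.words[linear_address(x, y, self.width)] for y in range(self.height)]
```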
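[0027]
In operation, the 8-point 1-D DCT (or 8-point 1-D IDCT) is first executed eight times in the horizontal direction on the 8×8 2-D signal, and the results are written horizontally into the 8×8 W transposition memory 68. The data are then read out vertically from memory 68, and the 8-point 1-D DCT (or IDCT) is executed eight more times. Alternatively, the first pass may be vertical and the second pass horizontal.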
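The row-column procedure works because the 2-D DCT is separable. The sketch below assumes orthonormal scaling. It runs a 1-D DCT on the rows, reads the intermediate result by columns as the transposition memory would, and compares the outcome with the direct 2-D definition. It also checks that running the vertical pass first gives the transposed result. All function names are illustrative.

```python
import math


def dct1d(v):
    """Orthonormal 1-D DCT-II (stand-in for the 1-D device)."""
    n = len(v)
    return [(math.sqrt(0.5) if u == 0 else 1.0) * math.sqrt(2 / n)
            * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n))
            for u in range(n)]


def dct2d_row_column(block):
    """Horizontal pass, write row-wise, read column-wise, vertical pass."""
    n = len(block)
    memory = [dct1d(row) for row in block]                       # first pass, rows
    cols = [dct1d([memory[y][x] for y in range(n)]) for x in range(n)]  # second pass, columns
    return [[cols[x][v] for x in range(n)] for v in range(n)]    # result indexed [v][u]


def dct2d_direct(block):
    """2-D DCT from the double cosine sum (reference)."""
    n = len(block)
    c = [math.sqrt(0.5)] + [1.0] * (n - 1)
    out = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = sum(block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                      for y in range(n) for x in range(n))
            out[v][u] = (2 / n) * c[u] * c[v] * acc
    return out


def transpose(m):
    return [list(r) for r in zip(*m)]
```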
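[0028]
[Problems to Be Solved by the Invention]
However, the 8-point × 8-point 2-D shared DCT/IDCT device of the fourth prior-art example alternates between a horizontal DCT or IDCT and a vertical DCT or IDCT. External input is needed only during the horizontal pass. Input must therefore arrive in bursts of two data per CK for 32 CK, followed by 32 CK with no input. Likewise, external output occurs only during the vertical pass, so two data per CK are output for 32 CK, followed by 32 CK with no output.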
[0029]
Such bursty input and output complicate the interface circuitry between the 2-D shared DCT/IDCT device and the device before or after it.
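[0030]
To mitigate this, buffers could be added on both sides of the 2-D shared DCT/IDCT device. In front, a 3-port 64 W memory would allow one write and two reads simultaneously. Behind, a second 3-port 64 W memory would allow two writes and one read simultaneously. These memories would convert input and output to one datum per CK, but at the cost of larger hardware.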
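[0031]
In addition, the 8-point 1-D shared DCT/IDCT device 67 is often pipelined for high-speed operation. Pipelining introduces a delay, the pipeline latency, between the start of data input and the output of the first result. In the fourth prior-art example, reading from transposition memory 68 cannot start until writing to it has finished. Invalid cycles equal to the latency therefore occur every time the second 1-D DCT or IDCT pass begins. Memory 68 could be given four ports to reduce these invalid cycles, allowing two writes and two reads at the same time, but this enlarges the memory. Even with four ports, the latency reduction is limited as long as a single 1-D shared DCT/IDCT device is used.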
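[0032]
FIGS. 9 and 10 illustrate the IDCT case. FIG. 9 shows the horizontal IDCT results of the 8th row being written into transposition memory 68 while the signals of the 1st column are read out for the vertical IDCT. The 8th row is written in the order {f(0), f(7)}, {f(1), f(6)}, {f(2), f(5)}, {f(3), f(4)}. The 1st column is read in the order {F(0), F(1)}, {F(2), F(3)}, {F(4), F(5)}, {F(6), F(7)}.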
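[0033]
As FIG. 9 shows, the word read as F(7) of the 1st column is the same word as f(0) of the 8th row (F(7) = f(0)). F(7) therefore cannot be read until f(0) has been written into transposition memory 68.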
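[0034]
FIG. 10 shows device 67 implemented as a five-stage pipeline. For simplicity, only pipeline registers 69 to 74 are shown. As FIG. 9 shows, the value F7 read for the second 1-D pass is identical to f0, a result of the first 1-D pass, so F7 cannot be read until f0 has been written. Consequently, as FIG. 10 shows, a pipeline with more than five stages leaves a gap between the cycle that writes f0 into memory 68 and the cycle that reads F7 from it. That gap is a bubble in the pipeline and appears as invalid cycles.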
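[0035]
The object of the present invention is to provide 1-D or 2-D DCT devices and 1-D or 2-D IDCT devices that incur no invalid cycles, regardless of the number of pipeline stages, under the condition of one input and one output per CK.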
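[0036]
[Means for Solving the Problems]
A 1-D k-point DCT or IDCT circuit has conventionally used k fixed-coefficient multipliers (k a power of 2), each of which can take only one value as its multiplicand. The present invention instead uses k/2 semi-fixed multipliers, each of which can take one of one or two values as its multiplicand, which reduces the hardware. Separate 1-D k-point DCT or IDCT circuits of this kind are then provided for the horizontal and vertical directions. This solves the above problems without increasing the total amount of hardware.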
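[0037]
That is, the first invention of this application is a 1-D DCT device that performs a k-point DCT in the horizontal or vertical direction. It comprises the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the input original signal;
- an adder/subtractor that receives original-signal samples one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), computes in turn f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2), and feeds its output to the k/2 semi-fixed multipliers;
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. The accumulators output the DCT coefficients F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1), respectively;
- a storage circuit that holds the k DCT coefficient values output from the k/2 accumulators and outputs them in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).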
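The input ordering and the sum/difference sequence required by the first invention can be generated for any power-of-two k. The minimal sketch below uses hypothetical function names. In the embodiment, the pairing is realized by registers holding each sample for two CKs, and the sketch abstracts that away.

```python
def input_order(k):
    """Sample indices in arrival order: 0, k-1, 1, k-2, ..., k/2-1, k/2."""
    if k < 2 or k & (k - 1):
        raise ValueError("k must be a power of 2 (at least 2)")
    order = []
    for i in range(k // 2):
        order += [i, k - 1 - i]
    return order


def butterfly_stream(samples_in_order):
    """Adder/subtractor: for each arriving pair (f(i), f(k-1-i)) emit the sum, then the difference."""
    out = []
    it = iter(samples_in_order)
    for a, b in zip(it, it):
        out.append(a + b)
        out.append(a - b)
    return out
```

For f(i) = i and k = 8, the stream is 7, -7, 7, -5, 7, -3, 7, -1. That is one value per CK, with sums and differences alternating.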
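[0038]
The second invention of this application is a 1-D IDCT device that performs a k-point IDCT in the horizontal or vertical direction. It comprises the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the DCT coefficients. They multiply DCT coefficients that are input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1);
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values;
- an adder/subtractor having storage means that holds the k values output from the k/2 accumulators. It performs k/2 additions and k/2 subtractions on those k values to obtain the original signal in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).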
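[0039]
The third invention of this application is a 1-D DCT/IDCT device that performs a k-point DCT or a k-point IDCT in the horizontal or vertical direction. It comprises the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the original signal and to the DCT coefficients;
- an adder/subtractor that selects between two inputs to the k/2 semi-fixed multipliers. The first option computes in turn f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) from original-signal samples input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2). The second option passes on DCT coefficients input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1);
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values;
- an adder/subtractor having storage means that holds the k values output from the k/2 accumulators. It can either output DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or perform k/2 additions and k/2 subtractions on the k values from the accumulators and output the original signal in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).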
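[0040]
The fourth invention of this application is a 2-D DCT device that performs an m-point × n-point 2-D DCT (m and n powers of 2). It comprises a first 1-D DCT circuit that performs a k-point DCT in the horizontal direction, a second 1-D DCT circuit that performs a k-point DCT in the vertical direction, and an m×n-word 2-port memory. The memory can read vertically the data written horizontally, or read horizontally the data written vertically. The first and second 1-D DCT circuits each comprise the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the input original signal;
- an adder/subtractor that receives original-signal samples one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), computes in turn f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2), and feeds its output to the k/2 semi-fixed multipliers;
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. The accumulators output the DCT coefficients F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1), respectively;
- a storage circuit that holds the k DCT coefficient values output from the k/2 accumulators and outputs them in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).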
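[0041]
The fifth invention of this application is a 2-D IDCT device that performs an m-point × n-point 2-D IDCT (m and n powers of 2). It comprises a first 1-D IDCT circuit that performs a k-point IDCT in the horizontal direction, a second 1-D IDCT circuit that performs a k-point IDCT in the vertical direction, and an m×n-word 2-port memory. The memory can read vertically the data written horizontally, or read horizontally the data written vertically. The first and second 1-D IDCT circuits each comprise the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the DCT coefficients. They multiply DCT coefficients that are input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1);
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values;
- an adder/subtractor having storage means that holds the k values output from the k/2 accumulators. It performs k/2 additions and k/2 subtractions on those k values to obtain the original signal in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).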
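[0042]
The sixth invention of this application is a 2-D DCT/IDCT device that performs an m-point × n-point 2-D DCT or 2-D IDCT (m and n powers of 2). It comprises a first 1-D shared DCT/IDCT circuit that performs a k-point DCT or IDCT in the horizontal direction, a second 1-D shared DCT/IDCT circuit that performs a k-point DCT or IDCT in the vertical direction, and an m×n-word 2-port memory. The memory can read vertically the data written horizontally, or read horizontally the data written vertically. The first and second 1-D shared DCT/IDCT circuits each comprise the following elements:
- k/2 (k a power of 2) semi-fixed multipliers, each of whose multiplicand is restricted to one or two values selected from the coefficients applied to the original signal and to the DCT coefficients;
- an adder/subtractor that selects between two inputs to the k/2 semi-fixed multipliers. The first option computes in turn f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) from original-signal samples input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2). The second option passes on DCT coefficients input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1);
- k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values;
- an adder/subtractor having storage means that holds the k values output from the k/2 accumulators. It can either output DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or perform k/2 additions and k/2 subtractions on the k values from the accumulators and output the original signal in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).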
[0043]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
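[0044]
First embodiment
FIG. 1 shows an 8-point 1-D DCT device, the first embodiment of the present invention. The device operates in synchronization with CK, inputs one original-signal sample per CK, and outputs one DCT coefficient per CK. Equations 1 and 2 are computed alternately on odd- and even-numbered CKs. The circuit comprises registers 75 to 77, adder/subtractor 78, semi-fixed multipliers 79 to 82 (MPYA/D to MPYG), accumulators 87 to 90, registers 49 to 52 (REG01 to REG31), registers 53 to 56 (REG41 to REG71) and selector 103. Accumulator 87 consists of selector 91, adder 95, registers 41 (REG00) and 45 (REG40), and selector 99. Accumulators 88 to 90 consist of selectors 92 to 94, adder/subtractors 96 to 98, registers 42 (REG10) to 44 (REG30) and 46 (REG50) to 48 (REG70), and selectors 100 to 102.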
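[0045]
The specific operation is described below.
First, registers 41 to 48 in accumulators 87 to 90 are reset to "0". One original-signal sample per CK is then input at IN in the order f(0), f(7), f(1), f(6), f(2), f(5), f(3), f(4). Of these, f(0), f(1), f(2) and f(3) are set in register 75 (REG0) and shifted to register 76 (REG1) one CK later, so each is held for two CKs. Likewise, f(7), f(6), f(5) and f(4) are set in register 77 (REG2) and held for two CKs. Adder/subtractor 78 therefore computes one of f(0)+f(7), f(0)-f(7), f(1)+f(6), f(1)-f(6), f(2)+f(5), f(2)-f(5), f(3)+f(4), f(3)-f(4) per CK. Each result is fed to semi-fixed multipliers 79 to 82 (MPYA/D, MPYB/E, MPYC/F, MPYG).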
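[0046]
Semi-fixed multipliers 79 to 81 (MPYA/D, MPYB/E, MPYC/F) each have a multiplicand restricted to two values: A or D, B or E, and C or F, respectively. A, B and C are selected when Equation 1 is computed, and D, E and F when Equation 2 is computed. Semi-fixed multiplier 82 (MPYG) has its multiplicand fixed at G and is used only for Equation 2. Accumulators 87, 88, 89 and 90 compute F(0) and F(1), F(2) and F(3), F(4) and F(5), and F(6) and F(7), respectively, alternating every CK.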
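[0047]
When f(0)+f(7) is input first, semi-fixed multipliers 79, 80 and 81 compute A×{f(0)+f(7)}, B×{f(0)+f(7)} and C×{f(0)+f(7)} in parallel and output them to buses 83, 84 and 85. In accumulator 87, selector 91 selects bus 83, so A×{f(0)+f(7)} is set in register 41 (REG00). At this time selector 99 selects the output of register 41 (REG00). Register 41 was initially reset to "0", so adder 95 simply passes the value through. Similarly, in accumulators 88, 89 and 90, selectors 92, 93 and 94 select buses 84, 83 and 85 respectively. As a result, B×{f(0)+f(7)}, A×{f(0)+f(7)} and C×{f(0)+f(7)} are set in register 42 (REG10), register 43 (REG20) and register 44 (REG30).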
[0048]
Next, when f(0)-f(7) is input, semi-fixed multipliers 79, 80, 81 and 82 compute D×{f(0)-f(7)}, E×{f(0)-f(7)}, F×{f(0)-f(7)} and G×{f(0)-f(7)} in parallel and output them to buses 83, 84, 85 and 86.
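[0049]
In accumulator 87, selector 91 selects bus 83, so D×{f(0)-f(7)} is set in register 45 (REG40). Selector 99 now selects the output of register 45 (REG40). Register 45 was initially reset to "0", so adder 95 passes the value through. Similarly, in accumulators 88, 89 and 90, selectors 92, 93 and 94 select buses 84, 85 and 86 respectively. As a result, E×{f(0)-f(7)}, F×{f(0)-f(7)} and G×{f(0)-f(7)} are set in register 46 (REG50), register 47 (REG60) and register 48 (REG70). Thereafter Equations 1 and 2 are computed alternately every CK. F(0), F(2), F(4) and F(6) are obtained at the 7th CK and set in registers 49 to 52 (REG01 to REG31). F(1), F(3), F(5) and F(7) are obtained at the 8th CK and set in registers 53 to 56 (REG41 to REG71). Registers 41 to 44 (REG00 to REG30) are reset at the 7th CK and registers 45 to 48 (REG40 to REG70) at the 8th CK. A new 8-point 1-D DCT therefore starts from the 8th and the 9th CK, respectively.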
[0050]
Selectors 57, 58 and 103 select the DCT coefficients F(0) to F(7) held in registers 49 to 56 and output them from OUT at one per CK, in the order F(0), F(1), F(2), F(3), F(4), F(5), F(6), F(7).
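The first embodiment's datapath can be modelled behaviourally as shown below. The model has one butterfly value per CK, four semi-fixed multipliers whose coefficient depends on CK parity, and four accumulators, each with an even register and an odd register. The (bus, sign) tables are derived from Equations 1 and 2, and they agree with the bus selections given in [0047] and [0049] for the first two CKs. Pipeline timing and register transfers are not modelled, and all names are hypothetical.

```python
import math

A = math.cos(math.pi / 4) / 2  # assumed orthonormal scaling
B = math.cos(math.pi / 8) / 2
C = math.cos(3 * math.pi / 8) / 2
D = math.cos(math.pi / 16) / 2
E = math.cos(3 * math.pi / 16) / 2
F_ = math.cos(5 * math.pi / 16) / 2
G = math.cos(7 * math.pi / 16) / 2

# Semi-fixed multipliers 79-82: (coefficient on sum CKs, coefficient on difference CKs).
SEMI_FIXED = [(A, D), (B, E), (C, F_), (None, G)]  # MPYG is idle on sum CKs

# (bus, sign) per accumulator 87..90 and pair p; bus index 0..3 = buses 83..86.
EVEN_ROUTE = [[(0, 1), (0, 1), (0, 1), (0, 1)],     # F(0):  A  A  A  A
              [(1, 1), (2, 1), (2, -1), (1, -1)],   # F(2):  B  C -C -B
              [(0, 1), (0, -1), (0, -1), (0, 1)],   # F(4):  A -A -A  A
              [(2, 1), (1, -1), (1, 1), (2, -1)]]   # F(6):  C -B  B -C
ODD_ROUTE = [[(0, 1), (1, 1), (2, 1), (3, 1)],      # F(1):  D  E  F  G
             [(1, 1), (3, -1), (0, -1), (2, -1)],   # F(3):  E -G -D -F
             [(2, 1), (0, -1), (3, 1), (1, 1)],     # F(5):  F -D  G  E
             [(3, 1), (2, -1), (1, 1), (0, -1)]]    # F(7):  G -F  E -D


def first_embodiment_dct(samples):
    """samples arrive at IN one per CK in the order f(0), f(7), f(1), f(6), ..."""
    # REG0/REG1/REG2 + adder/subtractor 78: one butterfly value per CK.
    stream = []
    for p in range(4):
        a, b = samples[2 * p], samples[2 * p + 1]
        stream += [a + b, a - b]
    even_regs = [0.0] * 4  # REG00, REG10, REG20, REG30
    odd_regs = [0.0] * 4   # REG40, REG50, REG60, REG70
    for ck, v in enumerate(stream):
        p, is_diff = divmod(ck, 2)
        buses = [None if c[is_diff] is None else c[is_diff] * v for c in SEMI_FIXED]
        route, regs = (ODD_ROUTE, odd_regs) if is_diff else (EVEN_ROUTE, even_regs)
        for j in range(4):
            bus, sign = route[j][p]
            regs[j] += sign * buses[bus]
    # REG01..REG71 and selectors 57, 58, 103 emit F(0), F(1), ..., F(7).
    return [val for j in range(4) for val in (even_regs[j], odd_regs[j])]


def reference_dct(f):
    """Orthonormal 8-point DCT-II (reference)."""
    return [(math.sqrt(0.5) if u == 0 else 1.0) / 2
            * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
            for u in range(8)]
```

Each CK uses at most four multiplications, so the k/2 = 4 semi-fixed multipliers are fully sufficient at one sample per CK.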
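[0051]
Second embodiment
FIG. 2 shows an 8-point 1-D IDCT device, the second embodiment of the present invention. It is obtained from the 8-point 1-D DCT device of the first embodiment by removing the input registers 75, 76, 77 (REG0, REG1, REG2), adder/subtractor 78 and output selector 103, and by adding register 104 and adder/subtractor 105 at the output. Parts identical to the first embodiment of FIG. 1 carry the same reference numerals. In this embodiment, the matrix products of the first and second right-hand terms of Equations 3 and 4 are executed alternately on odd- and even-numbered CKs.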
[0052]
DCT coefficients F(0), F(1), F(2), F(3), F(4), F(5), F(6), F(7) are input in that order to IN, one per CK, and fed to semi-fixed multipliers 79 to 82 (MPYA/D, MPYB/E, MPYC/F, MPYG). At the 7th CK, the first-term products of Equations 3 and 4 are set in registers 49 to 52 (REG01, REG11, REG21, REG31). At the 8th CK, the second-term products are set in registers 53 to 56 (REG41, REG51, REG61, REG71). Adder/subtractor 105 then computes the sums and differences of the first-term and second-term products of Equations 3 and 4. Selectors 57 and 58 are controlled so that f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7) leave OUT in that order, one per CK. The first-term products reach registers 49 to 52 one CK earlier than the second-term products reach registers 53 to 56, so the eight additions and subtractions cannot be performed directly. Register 104 (REG3) therefore delays the output of registers 49 to 52 by one CK, aligning it with the output of registers 53 to 56.
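The same behavioural style applies to the second embodiment. Coefficients arrive in natural order, even-indexed coefficients accumulate the first term of Equations 3 and 4, and odd-indexed coefficients accumulate the second term. The final add/subtract emits f(0) to f(7) in order. This sketch assumes orthonormal scaling. The one-CK alignment by REG3 only affects timing, so it appears here as a comment. Names are hypothetical.

```python
import math

A = math.cos(math.pi / 4) / 2  # assumed orthonormal scaling
B = math.cos(math.pi / 8) / 2
C = math.cos(3 * math.pi / 8) / 2
D = math.cos(math.pi / 16) / 2
E = math.cos(3 * math.pi / 16) / 2
F_ = math.cos(5 * math.pi / 16) / 2
G = math.cos(7 * math.pi / 16) / 2

SEMI_FIXED = [(A, D), (B, E), (C, F_), (None, G)]  # multipliers 79-82

# First term of Equations 3/4 as (bus, sign); bus 0..3 = buses 83..86.
FIRST_TERM = [[(0, 1), (1, 1), (0, 1), (2, 1)],     # f(0)/f(7):  A  B  A  C
              [(0, 1), (2, 1), (0, -1), (1, -1)],   # f(1)/f(6):  A  C -A -B
              [(0, 1), (2, -1), (0, -1), (1, 1)],   # f(2)/f(5):  A -C -A  B
              [(0, 1), (1, -1), (0, 1), (2, -1)]]   # f(3)/f(4):  A -B  A -C
# Second term of Equations 3/4.
SECOND_TERM = [[(0, 1), (1, 1), (2, 1), (3, 1)],    # D  E  F  G
               [(1, 1), (3, -1), (0, -1), (2, -1)], # E -G -D -F
               [(2, 1), (0, -1), (3, 1), (1, 1)],   # F -D  G  E
               [(3, 1), (2, -1), (1, 1), (0, -1)]]  # G -F  E -D


def second_embodiment_idct(coeffs):
    """coeffs arrive at IN one per CK in the order F(0), F(1), ..., F(7)."""
    first, second = [0.0] * 4, [0.0] * 4  # REG00-REG30 / REG40-REG70
    for ck, v in enumerate(coeffs):
        p, odd = divmod(ck, 2)
        buses = [None if c[odd] is None else c[odd] * v for c in SEMI_FIXED]
        route, regs = (SECOND_TERM, second) if odd else (FIRST_TERM, first)
        for j in range(4):
            bus, sign = route[j][p]
            regs[j] += sign * buses[bus]
    # REG3 would delay 'first' by one CK so both reach adder/subtractor 105 together.
    out = [first[i] + second[i] for i in range(4)]        # f(0)..f(3)
    out += [first[i] - second[i] for i in (3, 2, 1, 0)]   # f(4)..f(7)
    return out


def reference_dct(f):
    """Orthonormal 8-point DCT-II, used to produce test coefficients."""
    return [(math.sqrt(0.5) if u == 0 else 1.0) / 2
            * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
            for u in range(8)]
```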
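[0053]
Third embodiment
FIG. 3 shows an 8-point 1-D shared DCT/IDCT device, the third embodiment of the present invention. It combines the DCT device of the first embodiment with the IDCT device of the second embodiment, and identical parts carry the same reference numerals. Setting selector 106 (on the input side of multipliers 79 to 82) and selector 107 (on the output side of adder/subtractor 105) to their a inputs executes the DCT. Setting selectors 106 and 107 to their b inputs executes the IDCT.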
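A shared-mode sketch follows, under the same assumptions as the two previous models. One multiplier/accumulator core serves both modes, and a `mode` argument plays the role of selectors 106 and 107. The odd-part routing table is identical for DCT and IDCT because the odd coefficient matrix of Equations 2 and 3 is symmetric. That symmetry is an observation from the equations, not a statement in the patent. The input is taken in natural order for convenience, and all names are hypothetical.

```python
import math

A = math.cos(math.pi / 4) / 2  # assumed orthonormal scaling
B = math.cos(math.pi / 8) / 2
C = math.cos(3 * math.pi / 8) / 2
D = math.cos(math.pi / 16) / 2
E = math.cos(3 * math.pi / 16) / 2
F_ = math.cos(5 * math.pi / 16) / 2
G = math.cos(7 * math.pi / 16) / 2

SEMI_FIXED = [(A, D), (B, E), (C, F_), (None, G)]  # multipliers 79-82

# (bus, sign) per accumulator j and pair p; bus 0..3 = buses 83..86.
DCT_EVEN = [[(0, 1), (0, 1), (0, 1), (0, 1)],       # A  A  A  A
            [(1, 1), (2, 1), (2, -1), (1, -1)],     # B  C -C -B
            [(0, 1), (0, -1), (0, -1), (0, 1)],     # A -A -A  A
            [(2, 1), (1, -1), (1, 1), (2, -1)]]     # C -B  B -C
IDCT_EVEN = [[(0, 1), (1, 1), (0, 1), (2, 1)],      # A  B  A  C
             [(0, 1), (2, 1), (0, -1), (1, -1)],    # A  C -A -B
             [(0, 1), (2, -1), (0, -1), (1, 1)],    # A -C -A  B
             [(0, 1), (1, -1), (0, 1), (2, -1)]]    # A -B  A -C
ODD = [[(0, 1), (1, 1), (2, 1), (3, 1)],            # D  E  F  G (shared by both modes)
       [(1, 1), (3, -1), (0, -1), (2, -1)],         # E -G -D -F
       [(2, 1), (0, -1), (3, 1), (1, 1)],           # F -D  G  E
       [(3, 1), (2, -1), (1, 1), (0, -1)]]          # G -F  E -D


def mac_core(stream, even_route):
    """Multipliers 79-82 and accumulators 87-90, one input value per step."""
    even_regs, odd_regs = [0.0] * 4, [0.0] * 4
    for step, v in enumerate(stream):
        p, odd = divmod(step, 2)
        buses = [None if c[odd] is None else c[odd] * v for c in SEMI_FIXED]
        route, regs = (ODD, odd_regs) if odd else (even_route, even_regs)
        for j in range(4):
            bus, sign = route[j][p]
            regs[j] += sign * buses[bus]
    return even_regs, odd_regs


def shared_1d(x, mode):
    """x in natural order: f(0)..f(7) for 'dct', F(0)..F(7) for 'idct'."""
    if mode == "dct":
        stream = []  # selector 106 = a: butterfly outputs
        for i in range(4):
            stream += [x[i] + x[7 - i], x[i] - x[7 - i]]
        ev, od = mac_core(stream, DCT_EVEN)
        return [val for j in range(4) for val in (ev[j], od[j])]  # selector 107 = a
    if mode == "idct":
        ev, od = mac_core(x, IDCT_EVEN)  # selector 106 = b: coefficients go straight in
        # selector 107 = b: adder/subtractor 105 output f(0)..f(7)
        return [ev[i] + od[i] for i in range(4)] + [ev[i] - od[i] for i in (3, 2, 1, 0)]
    raise ValueError("mode must be 'dct' or 'idct'")
```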
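[0054]
Fourth embodiment
FIG. 4 shows an 8-point × 8-point 2-D shared DCT/IDCT device, the fourth embodiment of the present invention.
[0055]
The 8-point × 8-point 2-D shared DCT/IDCT device of this embodiment consists of the 8-point 1-D shared DCT/IDCT devices 108 and 110 of the third embodiment and an 8×8 W 2-D transposition memory 109. Memory 109 can write and read simultaneously, and data written horizontally can be read out vertically. Device 108 executes the 8-point 1-D DCT (or 8-point 1-D IDCT) eight times in the horizontal direction on the 8×8 2-D signal, and the results are written horizontally into memory 109. The data are then read out vertically from memory 109, and device 110 executes the 8-point 1-D DCT (or IDCT) eight times. Alternatively, device 108 may perform the 1-D DCT (or IDCT) vertically and device 110 horizontally.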
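The dataflow of the fourth embodiment can be sketched with two independent 1-D unit objects (standing in for devices 108 and 110) and an 8×8 transposition memory (standing in for memory 109). The 1-D units here compute the transform directly from its cosine sum instead of modelling the semi-fixed-multiplier datapath. Concurrency, memory port timing and pipeline latency are not modelled. All class and function names are hypothetical.

```python
import math


def _basis(n, u, x):
    cu = math.sqrt(0.5) if u == 0 else 1.0
    return cu * math.sqrt(2 / n) * math.cos((2 * x + 1) * u * math.pi / (2 * n))


class OneDUnit:
    """Stand-in for device 108 or 110; mode chosen as with selectors 106/107."""

    def __init__(self, mode):
        if mode not in ("dct", "idct"):
            raise ValueError("mode must be 'dct' or 'idct'")
        self.mode = mode
        self.samples_processed = 0

    def run(self, vec):
        n = len(vec)
        self.samples_processed += n
        if self.mode == "dct":
            return [sum(_basis(n, u, x) * vec[x] for x in range(n)) for u in range(n)]
        return [sum(_basis(n, u, x) * vec[u] for u in range(n)) for x in range(n)]


class TransposeMemory:
    """n x n words: written by rows, read by columns."""

    def __init__(self, n=8):
        self.n = n
        self.words = [0.0] * (n * n)

    def write_row(self, y, values):
        for x, val in enumerate(values):
            self.words[y * self.n + x] = val

    def read_column(self, x):
        return [self.words[y * self.n + x] for y in range(self.n)]


def two_d(blocks, mode, n=8):
    """Horizontal pass in one unit, vertical pass in a second unit."""
    unit_h, unit_v = OneDUnit(mode), OneDUnit(mode)  # devices 108 and 110
    mem = TransposeMemory(n)                         # memory 109
    results = []
    for block in blocks:
        for y in range(n):
            mem.write_row(y, unit_h.run(block[y]))
        cols = [unit_v.run(mem.read_column(x)) for x in range(n)]
        results.append([[cols[x][v] for x in range(n)] for v in range(n)])
    return results, unit_h.samples_processed, unit_v.samples_processed
```

Each unit handles exactly 64 samples per 8×8 block. This is why two half-rate 1-D units can sustain one sample in and one sample out per CK.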
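[0056]
This embodiment shows only an 8-point × 8-point 2-D shared DCT/IDCT device, but an 8-point × 8-point 2-D DCT device and an 8-point × 8-point 2-D IDCT device can be configured in the same way.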
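[0057]
As the above description shows, the present invention makes it possible to build a 2-D shared DCT/IDCT device under the condition of one input and one output per CK, which is the appropriate I/O rate for typical image coding equipment. The device uses hardware comparable in size to the prior art, and no invalid cycles occur even when the pipeline is made deep.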
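[0058]
Specifically, the 8-point 1-D shared DCT/IDCT devices of the first to third prior-art examples output two results per CK, whereas the first to third embodiments of the present invention output one result per CK.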
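[0059]
Take the third prior-art example and the third embodiment as representatives and compare their hardware. Both use fixed or semi-fixed coefficient multipliers of the same size. However, the third prior-art example has 12 adders and adder/subtractors in total, whereas the third embodiment has 6. The hardware of the third embodiment is therefore about 2/3 that of the third prior-art example, but its throughput is half.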
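[0060]
In image coding, the field in which the invention is used, the devices actually employed are the 8-point × 8-point 2-D shared DCT/IDCT devices of the fourth prior-art example and of the fourth embodiment. Both output 64 results per 64 CK, so their throughput is equal.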
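[0061]
The fourth prior-art example uses one 8-point 1-D shared DCT/IDCT device, whereas the fourth embodiment uses two, one for the horizontal direction and one for the vertical direction. Comparing these alone, the hardware of the fourth embodiment is about 1.5 times that of the third prior-art example. However, as explained under "Problems to Be Solved by the Invention", the fourth prior-art example complicates the interface circuitry to the preceding and following devices. That problem can be solved by buffering: a 3-port 64 W memory (one write and two reads simultaneously) before the 2-D device and a 3-port 64 W memory (two writes and one read simultaneously) after it, converting input and output to one datum per CK. With that buffering added, the hardware of the fourth embodiment and that of the fourth prior-art example are about the same.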
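[0062]
Furthermore, as FIG. 4 shows, the fourth embodiment is pipelined all the way from input IN to output OUT. No invalid cycles therefore occur even if the 8-point 1-D shared DCT/IDCT devices 108 and 110 have deep pipelines. The third prior-art example could also be pipelined from input to output by using two such devices (one horizontal, one vertical) together with a 4-port memory. That configuration, however, takes two inputs and produces two outputs per CK. It doubles the hardware, yet this performance is excessive for typical image coding equipment, which cannot make use of it.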
[0063]
The embodiments above describe only 8-point × 8-point signals. The present invention applies equally to 1-D or 2-D DCT devices, 1-D or 2-D IDCT devices, and 1-D or 2-D shared DCT/IDCT devices for any number of points.
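[0064]
[Effects of the Invention]
As is clear from the above description, the present invention makes it possible to build a DCT/IDCT device for one input and one output per CK, the appropriate I/O rate for typical image coding equipment. The device uses hardware comparable in size to the prior art, and no invalid cycles occur even when the pipeline is made deep.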
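[Brief Description of the Drawings]
[FIG. 1] Block diagram of the 8-point 1-D DCT device according to the first embodiment of the present invention.
[FIG. 2] Block diagram of the 8-point 1-D IDCT device according to the second embodiment of the present invention.
[FIG. 3] Block diagram of the 8-point 1-D shared DCT/IDCT device according to the third embodiment of the present invention.
[FIG. 4] Block diagram of the 8-point × 8-point 2-D shared DCT/IDCT device according to the fourth embodiment of the present invention.
[FIG. 5] Block diagram of the first prior-art example.
[FIG. 6] Block diagram of the second prior-art example.
[FIG. 7] Block diagram of the third prior-art example.
[FIG. 8] Block diagram of the fourth prior-art example.
[FIG. 9] Diagram explaining the operation of the fourth prior-art example.
[FIG. 10] Diagram explaining the problem of the fourth prior-art example.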
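[Description of Reference Numerals]
41 to 48, 49 to 56, 75 to 77: registers; 78: adder/subtractor; 79 to 82: semi-fixed multipliers; 83 to 86: buses; 87 to 90: accumulators; 91 to 94: selectors; 95: adder; 96 to 98: adder/subtractors; 57, 58: selectors; 103: selector; 105: adder/subtractor; 108, 110: 8-point 1-D shared DCT/IDCT devices; 109: 2-port 64 W transposition memory.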
BACKGROUND OF THE INVENTION
The present invention relates to an arithmetic unit for executing a one-dimensional or two-dimensional DCT (Discrete Cosine Transform) used in image coding and a reverse one-dimensional or two-dimensional IDCT (Inverse Discrete Cosine Transform). .
[0002]
[Prior art]
(1) DCT calculation and IDCT calculation
Let f (i) (i = 0,1,2,3,4,5,6,7) be the original signal and F (j) (j = 0,1,2,3,4,5,6,7 ) Is a DCT transform coefficient, the 8-point one-dimensional DCT operation is expressed by the following equations “Equation 1” and “Equation 2”.
[0003]
[Expression 1]

[0004]
[Expression 2]

[0005]
Where A = 2/2, B = cos (π / 8) / 2, C = cos (3π / 8) / 2, D = cos (π / 16) / 2, E = cos (3π / 16) / 2, F = cos (5π / 16) / 2, and G = cos (7π / 16) / 2.
[0006]
Further, the 8-point one-dimensional IDCT calculation can be expressed by the following formulas “Equation 3” and “Equation 4”.
[0007]
[Equation 3]

[0008]
[Expression 4]

[0009]
Here, the first term on the right side of the above formulas “Equation 3” and “Equation 4” is the same, and the second term on the right side of the equations “Equation 3” and “Equation 4” is also the same.
[0010]
As described above, the eight-point one-dimensional DCT calculation and the eight-point one-dimensional IDCT calculation are realized by multiplying and multiplying at most seven kinds of coefficients and data. For this reason, it is not necessary to use general-purpose multipliers in the DCT calculation and the IDCT calculation, and these calculation circuits can be configured by using seven types of fixed coefficient multipliers corresponding to the coefficients A to G.
[0011]
First example of prior art
FIG. 5 shows an 8-point one-dimensional DCT arithmetic apparatus as a first example of the prior art. This DCT arithmetic device operates in synchronization with a clock CK (hereinafter simply referred to as CK), inputs two original signals per CK, and outputs two DCT coefficients. The above expression “Equation 1” is calculated by the three fixed coefficient multipliers 3 to 5 and the four accumulators 17 to 20 on the left side of FIG. 5, and the above expression “Equation 2” is calculated on the right side of FIG. The fixed coefficient multipliers 6 to 9 and the four accumulators 21 to 24 are used. A specific operation will be described below.
[0012]
Initially, the registers 41 to 48 in the accumulators 17 to 24 are reset, and their values are set to “0”. After this, two original signals per CK are f (0), f (1), f (2), f (3) on IN0, and f (7), f (6), f (5), The signals are input in the order of f (4) and input to the adder 1 and the subtracter 2. Fixed coefficient multipliers 3 to 5 (MPYA, MPYB, MPYC) are multipliers whose multiplicands are fixed to A, B, and C, respectively. In

accumulators

17, 18, 19, and 20, F (0), F (2), F (4), F (6) are calculated.
[0013]
Further, the adders /

subtracters

34, 35 and 36 in the accumulators 18 to 20 can calculate “a + b” or “−a + b” from the inputs a and b. In adder 1, f (0) + f (7), f (1) + f (6), f (2) + f (5), f (3) + f (4) are calculated, and fixed coefficient multiplication Input to devices 3-5.
[0014]
First, when f (0) + f (7) is input, the

multipliers

3, 4 and 5 respectively receive A × {f (0) + f (7)} and B × {f (0) + f (7)}, C × {f (0) + f (7)} are calculated in parallel and output to the

buses

10, 11, and 12. In the accumulator 17, by selecting the bus 10 by the selector 25, A × {f (0) + f (7)} is set in the register 41 (REG00). At this time, since the register 41 (REG00) is initially reset to “0”, the adder 33 becomes through. Similarly, in the

accumulators

18, 19, and 20, by selecting the

buses

11, 10, and 12 by the

selectors

26, 27, and 28, respectively, B × {f (0) + f (7)}, A × {f (0) + f (7)}, C × {f (0) + f (7)} are set in the register 42 (REG10), the register 43 (REG20), and the register 44 (REG30).
[0015]
Next, in adder 1, f (1) + f (6) is calculated and input to fixed coefficient multipliers 3-5. In the

multipliers

3, 4 and 5, A × {f (1) + f (6)}, B × {f (1) + f (6)}, C × {f (1) + f (6)} Are calculated in parallel and output to the

buses

10, 11 and 12. In the accumulator 17, the bus 10 is selected by the selector 25. At this time, since A × {f (0) + f (7)} is set in the register 41 (REG00), the adder 33 uses A × {f (0) + f (7)} + A ×. {f (1) + f (6)} is calculated, and its value is set in the register 41 (REG00).
[0016]
Similarly, in the accumulator 18, the bus 12 is selected by the selector 26, and B × {f (0) + f (7)} + C × {f (1) + f (6) is stored in the register 42 (REG10). )} Is set. In the accumulator 19, the bus 10 is selected by the selector 27. At this time, the adder 35 calculates A × {f (0) + f (7)} − A × {f (0) + f (7)} and sets it in the register 43 (REG20). Similarly, C × {f (0) + f (7)} − B × {f (0) + f (7)} is set in the register 44 (REG30). Thereafter, the same calculation is performed, and F0, F2, F4, and F6 are obtained at the 4th CK. However, at this time, these are set not in the registers 41 to 44 (REG00 to REG30) but to the registers 49 to 52 (REG01 to REG31). Simultaneously with this setting operation, the registers 41 to 44 (REG00 to REG30) are reset, and an 8-point one-dimensional DCT operation newer than the next CK is started.
[0017]
On the other hand, the fixed coefficient multipliers 6 to 9 (MPYD, MPYD, MPYF, MPYG) are multipliers in which the multiplicands are fixed to the above D, E, F, G, respectively, and the

accumulators

21, 22, 23, 24 are F (1), F (3), F (5) and F (7) are calculated respectively. Further, the adders /

subtracters

38, 39, and 40 in the accumulators 22 to 24 can calculate "a + b" or "-a + b" from the inputs a and b. In the subtractor 2 connected to the inputs IN0 and IN1, f (0) -f (7), f (1) -f (6), f (2) -f (5), f (3) -f ( 4) is calculated, and F (1), F (3), F (5), F (7) are calculated in the same manner as F (0), F (2), F (4), F (6) above. ) Is calculated and set in the registers 53 to 56 (REG41 to REG71) in the accumulators 21 to 24. In the accumulators 21 to 24, 29 to 32 are selectors, 37 is an adder, 38 to 40 are adders / subtracters, and 45 to 48 are registers (REGs 40 to 70).
[0018]
The DCT coefficients F (0) to F (7) set in the registers 49 to 56 of the accumulators 17 to 24 are sent from OUT0 to F (0), F (2), and F (4) every 1 CK through the bus 57. , F (6), and the bus 58, the signals are output from OUT1 in the order of F (1), F (3), F (5), F (7).
[0019]
As for the technology relating to such a DCT arithmetic apparatus, for example, A. Madisetti and ANWillson, “A 100 MHz 2-D8 × 8 DCT / IDCT Processor for HDTV Applications”, IEEE Trans. Circuits Syst. Video Technol., Vol. No. 2, pp. 158-164, April 1995. This document also describes the techniques shown in the second, third, and fourth examples of the prior art described below.
[0020]
Second example of the prior art
FIG. 6 shows an 8-point one-dimensional IDCT arithmetic apparatus that is a second example of the prior art. The configuration of this IDCT arithmetic unit is a structure in which the adder 1 and the subtracter 2 are deleted from the 8-point one-dimensional DCT arithmetic unit of the first example of the prior art, and an adder 59 and a subtracter 60 are added to the output unit. It is.
[0021]
DCT coefficients F (0), F (2), F (4), and F (6) are input to the input IN0 for each CK. These are input to the fixed coefficient multipliers 3 to 5 (MPYA, MPYB, MPYC), and the multiplication of the matrix of the first term on the right side of the above-mentioned formulas [Equation 3] and [Equation 4] is performed on the left side of FIG. Calculation is performed by fixed coefficient multipliers 3 to 5 and four accumulators 17 to 20. In this case, the configurations of the accumulators 17 to 20 are the same as those in FIG. 5 and are denoted by the same reference numerals.
[0022]
On the other hand, F (1), F (3), F (5), and F (7) are input to the input IN1. These are input to the fixed coefficient multipliers 6 to 9 (MPYD, MPYE, MPYF, MPYG), and the multiplication of the matrix of the second term on the right side of the above-mentioned formulas [Equation 3] and [Equation 4] is four on the right side of FIG. The fixed coefficient multipliers 6 to 9 and the four accumulators 21 to 24 are used. The configurations of the accumulators 21 to 24 are the same as those in FIG. 5 and are denoted by the same reference numerals.
[0023]
The addition of the multiplication result of the matrix of the first term on the right side of the expression “Equation 3” and the multiplication result of the matrix of the second term on the right side is calculated by the adder 59, and in parallel with this, the right side of the expression “Equation 4” The subtracter 60 calculates the subtraction of the multiplication result of the matrix of the first term and the multiplication result of the matrix of the second term on the right side. As a result, for each CK, f (0), f (1), f (2), f (3) are output from the adder 59, and f (7), f (6), f ( 5) and f (4) are obtained and output from OUT0 and OUT1.
[0024]
Third example of prior art
FIG. 7 shows an 8-point one-dimensional DCT / IDCT arithmetic shared apparatus which is a third example of the prior art. This conventional example has a circuit configuration in which the DCT arithmetic device of the first example of the prior art is combined with the IDCT arithmetic device of the second example of the prior art.
[0025]
The DCT operation can be executed by setting all the inputs of the

selectors

61, 62, 63, and 64 to the a side, and the IDCT operation can be executed by setting all the inputs of the

selectors

61, 62, 63, and 64 to the b side. 5 and 6 are denoted by the same reference numerals.
[0026]
Fourth example of prior art
FIG. 8 shows an 8-point × 8-point two-dimensional DCT / IDCT operation sharing apparatus which is a fourth example of the prior art. This is composed of an 8-point one-dimensional DCT / IDCT calculation sharing device 67 and an 8 × 8 W (word) two-dimensional transposition memory 68, which is the third example of the prior art. The 8 × 8 W two-dimensional transposition memory 68 is a memory that can simultaneously write or read two pixels and can read data written in the horizontal direction in the vertical direction. However, the two-dimensional transposition memory 68 is actually a memory having 64 W continuous addresses (i = 0 to 63), and x = 0 to 7 (i / 8, the remainder obtained by dividing i by 8) and y It is accessed in the horizontal or vertical direction by a two-dimensional virtual address of = 0 to 7 (y = i / 8, i divided by 8 and rounded down).
[0027]
As a specific operation, an 8-point 1-dimensional DCT (or 8-point 1-dimensional IDCT) is first executed 8 times in the horizontal direction on 8 × 8 two-dimensional signals, and the result is 8 × 8 W. The two-dimensional transposition memory 68 is written and stored in the horizontal direction. Next, the 8-point one-dimensional DCT (or 8-point one-dimensional IDCT) is executed eight times by reading from the two-dimensional transposition memory 68 in the vertical direction. Note that one-dimensional DCT (or one-dimensional IDCT) may be performed first in the vertical direction, and eight-point one-dimensional DCT (or eight-point one-dimensional IDCT) may be performed in the horizontal direction for the second time.
[0028]
[Problems to be solved by the invention]
However, the 8-point × 8-point two-dimensional DCT / IDCT computation sharing apparatus, which is the fourth example of the prior art, alternately repeats horizontal DCT or IDCT and vertical DCT or IDCT. Since input from the outside of the two-dimensional DCT / IDCT operation sharing device is necessary only when performing horizontal DCT or IDCT, two data per CK are input in a concentrated manner between 32CK and between 32CK. There must be no input. Further, since the output to the outside occurs in the case of DCT or IDCT in the vertical direction, 2 data per 1CK are concentrated and output during 32CK, and there is no output during 32CK.
[0029]
When the input and output are generated in bursts in this way, the circuit of the interface between the two-dimensional DCT / IDCT operation sharing apparatus and the preceding or succeeding apparatus becomes complicated.
[0030]
To alleviate this, a 3-port 64W memory capable of simultaneously writing 1 data and reading 2 data in the former stage of the two-dimensional DCT / IDCT arithmetic shared device, and a 3-port 64W memory capable of simultaneously writing 2 data and 1 data in the subsequent stage It is also possible to convert the input and output to 1 data per 1 CK by buffering with those memories, but the hardware scale increases.
[0031]
In addition, the 8-point one-dimensional DCT / IDCT computation sharing device 67 is often pipelined for high-speed operation, but this causes a delay from the start of data input to the output of the first data, That is, pipeline latency occurs. In the configuration shown in the fourth example of the prior art, reading cannot be started unless writing to the two-dimensional transposition memory 68 is completed. Therefore, when the latter one-dimensional DCT or IDCT is started, the latency is always invalidated. A cycle will occur. In order to reduce this invalid cycle as much as possible, it is conceivable that the two-dimensional transposition memory 68 is provided with four ports so that two data writing and two data reading can be performed simultaneously. . Further, even if the number of ports is four, the latency reduction effect is limited as long as a single one-dimensional DCT / IDCT arithmetic shared device is used.
[0032]
9 and 10 show the case of IDCT. FIG. 9 shows a case where the horizontal IDCT calculation result of the eighth row is written in the two-dimensional transposition memory 68 and at the same time the first column signal is read for the vertical IDCT calculation. The writing on the eighth line is {f (0), f (7)}, {f (1), f (6)}, {f (2), f (5)}, {f (3), f (4)} is performed in the order of {F (0), F (1)}, {F (2), F (3)}, {F (4), F (5)} , {F (6), F (7)}.
[0033]
However, as shown in FIG. 9, if F (7) = f (0) and f (0) is not written to the two-dimensional transposition memory 68, it cannot be read as F (7).
[0034]
FIG. 10 shows a case where the 8-point one-dimensional DCT / IDCT operation sharing apparatus 67 is configured by a five-stage pipeline. For simplicity, only pipeline registers 69-74 are shown. As shown in FIG. 9, F7 read for the latter one-dimensional DCT is the same as f0 which is the result of the former one-dimensional DCT operation. Therefore, F7 cannot be read unless f0 is written. Therefore, as can be seen from FIG. 10, when the number of pipeline stages is larger than five, there is a space between the cycle in which f0 is written to the transposed memory 68 and the cycle in which F7 is read from the transposed memory 68, in other words, there is a space in the pipeline. Will occur and an invalid cycle will occur.
[0035]
An object of the present invention is to provide a one-dimensional or two-dimensional DCT arithmetic device and a one-dimensional or two-dimensional IDCT arithmetic device that, under the condition of one input and one output per CK, prevent invalid cycles from occurring regardless of the number of pipeline stages.
[0036]
[Means for Solving the Problems]
In the present invention, a one-dimensional k-point DCT arithmetic circuit or a one-dimensional k-point IDCT arithmetic circuit (k is a power of 2) is built from k/2 semi-fixed multipliers, each of which can take one or two values as its multiplicand. Conventionally, k fixed-coefficient multipliers were used instead, each of which can take only one value. This reduces the amount of hardware, so separate one-dimensional k-point DCT or IDCT arithmetic circuits can be provided for the horizontal and vertical directions. The problems described above are thereby solved without increasing the amount of hardware.
[0037]
  That is, the first invention of this application is a one-dimensional DCT arithmetic device that performs a k-point DCT operation in the horizontal or vertical direction. It comprises k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the input original signal is multiplied. It also comprises an adder/subtractor that, from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), calculates f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order and inputs the results to the k/2 semi-fixed multipliers. Next come k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. The accumulators output the DCT coefficients F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1), respectively. Finally, a storage circuit holds the k DCT coefficient values output from the k/2 accumulators and outputs the k DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).
[0038]
  The second invention of this application is a one-dimensional IDCT arithmetic device that performs a k-point IDCT operation in the horizontal or vertical direction. It comprises k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the DCT coefficients are multiplied. These multipliers multiply the DCT coefficients input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1). It also comprises k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. Storage means holds the k values output from the k/2 accumulators. Finally, an adder/subtractor performs k/2 additions and k/2 subtractions on the k values output from the accumulators to obtain the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).
[0039]
  The third invention of this application is a one-dimensional DCT/IDCT arithmetic device that performs a k-point DCT operation or a k-point IDCT operation in the horizontal or vertical direction. It comprises k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the original signal and the DCT coefficients are multiplied. An input adder/subtractor can be set to one of two modes. In the first mode, it calculates f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), and inputs the results to the k/2 semi-fixed multipliers. In the second mode, it passes the DCT coefficients, input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), to the k/2 semi-fixed multipliers. The device also comprises k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. Storage means holds the k values output from the k/2 accumulators. Finally, an output adder/subtractor can either output the DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or perform k/2 additions and k/2 subtractions on the k values output from the accumulators and output the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).
[0040]
  The fourth invention of this application is a two-dimensional DCT arithmetic device that performs an m-point × n-point two-dimensional DCT operation (m and n are powers of 2). It comprises a first one-dimensional DCT arithmetic circuit that performs a k-point DCT operation in the horizontal direction, and a second one-dimensional DCT arithmetic circuit that performs a k-point DCT operation in the vertical direction. It also comprises a two-port memory of m × n words. Data written to this memory in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction. The first and second one-dimensional DCT arithmetic circuits each comprise k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the input original signal is multiplied. Each circuit also comprises an adder/subtractor that, from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), calculates f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order and inputs the results to the k/2 semi-fixed multipliers. Each circuit further comprises k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. The accumulators output the DCT coefficients F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1), respectively. Finally, a storage circuit holds the k DCT coefficient values output from the k/2 accumulators and outputs the k DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).
[0041]
  The fifth invention of this application is a two-dimensional IDCT arithmetic device that performs an m-point × n-point two-dimensional IDCT operation (m and n are powers of 2). It comprises a first one-dimensional IDCT arithmetic circuit that performs a k-point IDCT operation in the horizontal direction, and a second one-dimensional IDCT arithmetic circuit that performs a k-point IDCT operation in the vertical direction. It also comprises a two-port memory of m × n words. Data written to this memory in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction. The first and second one-dimensional IDCT arithmetic circuits each comprise k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the DCT coefficients are multiplied. These multipliers multiply the DCT coefficients input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1). Each circuit also comprises k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. Storage means holds the k values output from the k/2 accumulators. Finally, an adder/subtractor performs k/2 additions and k/2 subtractions on the k values output from the accumulators to obtain the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).
[0042]
  The sixth invention of this application is a two-dimensional DCT/IDCT arithmetic device that performs an m-point × n-point two-dimensional DCT operation or two-dimensional IDCT operation (m and n are powers of 2). It comprises a first one-dimensional DCT/IDCT arithmetic shared circuit that performs a k-point DCT operation or a k-point IDCT operation in the horizontal direction, and a second one-dimensional DCT/IDCT arithmetic shared circuit that performs a k-point DCT operation or IDCT operation in the vertical direction. It also comprises a two-port memory of m × n words. Data written to this memory in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction. The first and second one-dimensional DCT/IDCT arithmetic shared circuits each comprise k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the original signal and the DCT coefficients are multiplied. Each circuit has an input adder/subtractor that can be set to one of two modes. In the first mode, it calculates f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), and inputs the results to the k/2 semi-fixed multipliers. In the second mode, it passes the DCT coefficients, input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), to the k/2 semi-fixed multipliers. Each circuit also comprises k/2 accumulators, each having storage means that holds two accumulated values. Each accumulator selects one of the outputs of the k/2 semi-fixed multipliers, chooses whether to add it as is or with its sign inverted, and adds it alternately to the two accumulated values. Storage means holds the k values output from the k/2 accumulators. Finally, an output adder/subtractor can either output the DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or perform k/2 additions and k/2 subtractions on the k values output from the accumulators and output the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).
[0043]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0044]
First embodiment
FIG. 1 shows an 8-point one-dimensional DCT arithmetic device according to the first embodiment of the present invention. This DCT arithmetic device operates in synchronization with CK. It inputs one original signal and outputs one DCT coefficient per CK. The above-described “Equation 1” and “Equation 2” are calculated alternately in odd-numbered and even-numbered CKs. The circuit comprises registers 75 to 77, an adder/subtractor 78, semi-fixed multipliers 79 to 82 (MPYA/D to MPYG), accumulators 87 to 90, registers 49 to 52 (REG01 to REG31), registers 53 to 56 (REG41 to REG71), and a selector 103. The accumulator 87 includes a selector 91, an adder 95, registers 41 (REG00) and 45 (REG40), and a selector 99. The accumulators 88 to 90 include selectors 92 to 94, adder/subtractors 96 to 98, registers 42 to 44 (REG10 to REG30) and 46 to 48 (REG50 to REG70), and selectors 100 to 102.
[0045]
A specific operation will be described below.
First, the registers 41 to 48 in the accumulators 87 to 90 are reset to “0”. One original signal per CK is then input from IN in the order f(0), f(7), f(1), f(6), f(2), f(5), f(3), f(4). Of these, f(0), f(1), f(2), and f(3) are set in the register 75 (REG0) and shifted to the register 76 (REG1) one CK later, so each is held for 2 CK. Likewise, f(7), f(6), f(5), and f(4) are set in the register 77 (REG2), and each is held for 2 CK. The adder/subtractor 78 therefore calculates f(0)+f(7), f(0)-f(7), f(1)+f(6), f(1)-f(6), f(2)+f(5), f(2)-f(5), f(3)+f(4), f(3)-f(4) on successive CKs. The results are input to the semi-fixed multipliers 79 to 82 (MPYA/D, MPYB/E, MPYC/F, MPYG).
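The input sequencing just described can be modeled in a few lines. The sketch below is a minimal model of registers 75 to 77 and the adder/subtractor 78, generalized to k points as in the claims. The function name `butterfly_stream` is hypothetical, and the 2-CK register holding is abstracted into consecutive list entries.

```python
def butterfly_stream(samples):
    """Hypothetical model of REG0-REG2 and adder/subtractor 78.

    samples: f(0)..f(k-1) in natural order (k a power of 2).
    Returns the values fed to the semi-fixed multipliers, one per CK.
    """
    k = len(samples)
    # Arrival order at IN: f(0), f(k-1), f(1), f(k-2), ...
    arrival = []
    for i in range(k // 2):
        arrival += [samples[i], samples[k - 1 - i]]
    out = []
    for i in range(k // 2):
        low, high = arrival[2 * i], arrival[2 * i + 1]  # each held for 2 CK
        out.append(low + high)  # odd CK: term of "Equation 1"
        out.append(low - high)  # even CK: term of "Equation 2"
    return out


print(butterfly_stream([0, 1, 2, 3, 4, 5, 6, 7]))  # [7, -7, 7, -5, 7, -3, 7, -1]
```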
[0046]
The semi-fixed multipliers 79 to 81 (MPYA/D, MPYB/E, MPYC/F) are multipliers whose multiplicand is fixed to one of two values: A or D, B or E, and C or F, respectively. A, B, and C are selected when calculating “Equation 1”, and D, E, and F are selected when calculating “Equation 2”. The multiplicand of the semi-fixed multiplier 82 (MPYG) is fixed to G, and this multiplier is used only for calculating “Equation 2”. The accumulators 87, 88, 89, and 90 calculate F(0) and F(1), F(2) and F(3), F(4) and F(5), and F(6) and F(7), respectively, alternating every CK.
[0047]
When f(0)+f(7) is input first, the semi-fixed multipliers 79, 80, and 81 calculate A×{f(0)+f(7)}, B×{f(0)+f(7)}, and C×{f(0)+f(7)} in parallel and output them to the buses 83, 84, and 85, respectively. In the accumulator 87, the selector 91 selects the bus 83, so A×{f(0)+f(7)} is set in the register 41 (REG00). At this time, the selector 99 selects the output of the register 41 (REG00). Because the register 41 (REG00) was initially reset to “0”, the adder 95 simply passes the product through. Similarly, in the accumulators 88, 89, and 90, the selectors 92, 93, and 94 select the buses 84, 83, and 85, respectively. B×{f(0)+f(7)}, A×{f(0)+f(7)}, and C×{f(0)+f(7)} are thus set in the register 42 (REG10), the register 43 (REG20), and the register 44 (REG30).
[0048]
Next, when f(0)-f(7) is input, the semi-fixed multipliers 79, 80, 81, and 82 calculate D×{f(0)-f(7)}, E×{f(0)-f(7)}, F×{f(0)-f(7)}, and G×{f(0)-f(7)} in parallel and output them to the buses 83, 84, 85, and 86, respectively.
[0049]
In the accumulator 87, the selector 91 selects the bus 83, so D×{f(0)-f(7)} is set in the register 45 (REG40). At this time, the selector 99 selects the output of the register 45 (REG40). Because the register 45 (REG40) was initially reset to “0”, the adder 95 simply passes the product through. Similarly, in the accumulators 88, 89, and 90, the selectors 92, 93, and 94 select the buses 84, 85, and 86, respectively. E×{f(0)-f(7)}, F×{f(0)-f(7)}, and G×{f(0)-f(7)} are thus set in the register 46 (REG50), the register 47 (REG60), and the register 48 (REG70). Thereafter, “Equation 1” and “Equation 2” are calculated alternately every CK. F(0), F(2), F(4), and F(6) are obtained at the 7th CK and set in the registers 49 to 52 (REG01 to REG31). F(1), F(3), F(5), and F(7) are obtained at the 8th CK and set in the registers 53 to 56 (REG41 to REG71). The registers 41 to 44 (REG00 to REG30) are reset at the 7th CK and the registers 45 to 48 (REG40 to REG70) at the 8th CK. A new 8-point one-dimensional DCT operation therefore starts from the 8th and 9th CKs, respectively.
[0050]
The DCT coefficients F(0) to F(7) set in the registers 49 to 56 are selected by the selectors 57, 58, and 103 and output in the order F(0), F(1), F(2), F(3), F(4), F(5), F(6), F(7).
[0051]
Second embodiment
FIG. 2 shows an 8-point one-dimensional IDCT arithmetic device according to the second embodiment of the present invention. This IDCT arithmetic device has the same configuration as the 8-point one-dimensional DCT arithmetic device of the first embodiment, with the following changes. The registers 75, 76, and 77 (REG0, REG1, REG2) and the adder/subtractor 78 of the input section are removed, as is the selector 103 of the output section. A register 104 and an adder/subtractor 105 are added to the output section. Parts identical to those of the first embodiment in FIG. 1 are denoted by the same reference numerals. In this embodiment, the matrix multiplication of the first term on the right side of “Equation 3” and “Equation 4” and that of the second term are executed alternately in odd-numbered and even-numbered CKs.
[0052]
The DCT coefficients F(0), F(1), F(2), F(3), F(4), F(5), F(6), F(7) are input from IN in this order, one per CK, to the semi-fixed multipliers 79 to 82 (MPYA/D, MPYB/E, MPYC/F, MPYG). At the 7th CK, the result of the matrix multiplication of the first term on the right side of “Equation 3” and “Equation 4” is set in the registers 49 to 52 (REG01, REG11, REG21, REG31). The result of the matrix multiplication of the second term is set in the registers 53 to 56 (REG41, REG51, REG61, REG71). The adder/subtractor 105 adds and subtracts the first-term and second-term results as in “Equation 3” and “Equation 4”. The selectors 57 and 58 are controlled so that f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7) are output in this order. Note that the first-term results in the registers 49 to 52 become available 1 CK earlier than the second-term results in the registers 53 to 56. Left as is, this offset would prevent the eight additions and subtractions from being performed. The outputs of the registers 49 to 52 (REG01, REG11, REG21, REG31) are therefore delayed by 1 CK in the register 104 (REG3) to align them with the outputs of the registers 53 to 56 (REG41, REG51, REG61, REG71).
[0053]
Third embodiment
FIG. 3 shows an 8-point one-dimensional DCT/IDCT arithmetic shared device according to the third embodiment of the present invention. This embodiment combines the DCT arithmetic device of the first embodiment with the IDCT arithmetic device of the second embodiment, and identical parts are denoted by the same reference numerals. In this device, the DCT operation is executed by setting the selector 106 on the input side of the multipliers 79 to 82 and the selector 107 on the output side of the adder/subtractor 105 to their a-side inputs. The IDCT operation is executed by setting the selectors 106 and 107 to their b-side inputs.
[0054]
Fourth embodiment
FIG. 4 shows an 8-point × 8-point two-dimensional DCT / IDCT calculation sharing apparatus according to the fourth embodiment of the present invention.
[0055]
The 8-point × 8-point two-dimensional DCT/IDCT arithmetic shared device of this embodiment consists of two 8-point one-dimensional DCT/IDCT arithmetic shared devices 108 and 110, each as shown in the third embodiment, and an 8×8W two-dimensional transposition memory 109. The 8×8W two-dimensional transposition memory 109 can be written and read simultaneously, and data written in the horizontal direction can be read in the vertical direction. The 8-point one-dimensional DCT/IDCT arithmetic shared device 108 executes an 8-point one-dimensional DCT (or 8-point one-dimensional IDCT) eight times in the horizontal direction on the 8×8 two-dimensional signal. The results are written in the horizontal direction into the 8×8W two-dimensional transposition memory 109 and stored. The data are then read in the vertical direction from the two-dimensional transposition memory 109. The 8-point one-dimensional DCT/IDCT arithmetic shared device 110 executes an 8-point one-dimensional DCT (or 8-point one-dimensional IDCT) eight times on these data. Alternatively, the device 108 may perform the 8-point one-dimensional DCT (or IDCT) in the vertical direction, and the device 110 may perform it in the horizontal direction.
[0056]
Although this embodiment shows only an 8-point × 8-point two-dimensional DCT/IDCT arithmetic shared device, an 8-point × 8-point two-dimensional DCT arithmetic device and an 8-point × 8-point two-dimensional IDCT arithmetic device can be configured in the same way.
[0057]
As the above description shows, the present invention assumes one input and one output per CK, an appropriate input/output rate for a normal image encoding apparatus. Under this condition, it realizes a two-dimensional DCT/IDCT arithmetic shared device with a hardware scale comparable to that of the prior art, in which no invalid cycles occur even when the pipeline is made deeper.
[0058]
Specifically, the 8-point one-dimensional DCT/IDCT arithmetic devices of the first to third examples of the prior art output two results per CK. Those of the first to third embodiments of the present invention output one result per CK.
[0059]
Take the third example of the prior art and the third embodiment of the present invention as representatives and compare their hardware amounts. The fixed or semi-fixed coefficient multipliers are of the same scale. The total number of adders and adder/subtractors, however, is 12 in the third example of the prior art and 6 in the third embodiment of the present invention. The hardware amount of the third embodiment is thus about 2/3 that of the third example of the prior art, but its throughput is halved.
[0060]
However, consider image coding, the field to which the present invention applies. There, the 8-point × 8-point two-dimensional DCT/IDCT arithmetic shared devices of the fourth example of the prior art and of the fourth embodiment of the present invention both output 64 results per 64 CK, so their throughput is equal.
[0061]
On the other hand, the fourth example of the prior art uses a single 8-point one-dimensional DCT/IDCT arithmetic shared device. The fourth embodiment of the present invention uses two, one for the horizontal direction and one for the vertical direction. Comparing these alone, the hardware amount of the fourth embodiment of the present invention is about 1.5 times that of the third example of the prior art. However, as described in “Problems to be Solved by the Invention”, the fourth example of the prior art has a drawback: its interface circuits with the preceding and succeeding devices become complicated. One way to solve this is to place a 3-port 64W memory before the two-dimensional DCT/IDCT arithmetic shared device, able to write 1 datum and read 2 data simultaneously. A second 3-port 64W memory after the device would write 2 data and read 1 datum simultaneously. Buffering through these memories converts input and output to one datum per CK. With this configuration, the hardware amounts of the fourth embodiment of the present invention and the fourth example of the prior art are comparable.
[0062]
Furthermore, the fourth embodiment of the present invention is pipelined from the input IN to the output OUT, as shown in FIG. 4. No invalid cycle therefore occurs even when the 8-point one-dimensional DCT/IDCT arithmetic shared devices 108 and 110 have many pipeline stages. The third example of the prior art could of course also be pipelined from input to output. This would require two 8-point one-dimensional DCT/IDCT arithmetic shared devices, one each for the horizontal and vertical directions, together with a 4-port memory. That configuration, however, has two inputs and two outputs per CK. This exceeds the requirements of a normal image coding apparatus: the hardware amount doubles, yet the extra performance cannot be used.
[0063]
Although the above embodiments describe only the case of 8 × 8 signals, the present invention can be applied in the same way to one-dimensional or two-dimensional DCT arithmetic devices, one-dimensional or two-dimensional IDCT arithmetic devices, and one-dimensional or two-dimensional DCT/IDCT arithmetic shared devices with any number of signals.
[0064]
【The invention's effect】
As is apparent from the above description, the present invention assumes one input and one output per CK, an appropriate input/output rate for a normal image encoding apparatus. Under this condition, it makes it possible to realize a DCT/IDCT arithmetic device with a hardware scale comparable to that of the prior art, in which no invalid cycles occur even when the pipeline is made deeper.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an 8-point one-dimensional DCT arithmetic device according to a first embodiment of the present invention.
FIG. 2 is a configuration diagram of an 8-point one-dimensional IDCT arithmetic device according to a second embodiment of the present invention.
FIG. 3 is a configuration diagram of an 8-point one-dimensional DCT / IDCT operation sharing apparatus according to a third embodiment of the present invention.
FIG. 4 is a configuration diagram of an 8-point two-dimensional DCT / IDCT operation sharing apparatus according to a fourth embodiment of the present invention.
FIG. 5 is a configuration diagram showing a first example of the prior art.
FIG. 6 is a block diagram showing a second example of the prior art.
FIG. 7 is a block diagram showing a third example of the prior art.
FIG. 8 is a configuration diagram showing a fourth example of the prior art.
FIG. 9 is an explanatory diagram for explaining the operation of the fourth example of the prior art.
FIG. 10 is an explanatory diagram for explaining a problem of the fourth example of the prior art.
[Explanation of symbols]
41-48, 49-56, 75-77 ... register, 78 ... adder / subtractor, 79-82 ... semi-fixed multiplier, 83-86 ... bus, 87-90 ... accumulator, 91-94 ... selector, 95 ... Adder, 96-98 ... adder / subtractor, 57, 58 ... selector, 103 ... selector, 105 ... adder / subtractor, 108, 110 ... 8-point one-dimensional DCT / IDCT arithmetic shared device, 109 ... 2-port 64W transposed memory.

Claims

1. A one-dimensional DCT arithmetic device that performs a k-point DCT operation in the horizontal or vertical direction, comprising:
k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the input original signal is multiplied;
an adder/subtractor that, from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), calculates f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order and inputs the results to the k/2 semi-fixed multipliers;
k/2 accumulators, each having storage means that holds two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as is or with its sign inverted, add it alternately to the two accumulated values, and output the DCT coefficients F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1), respectively; and
a storage circuit that holds the k DCT coefficient values output from the k/2 accumulators and outputs the k DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).

2. A one-dimensional IDCT arithmetic device that performs a k-point IDCT operation in the horizontal or vertical direction, comprising:
k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the DCT coefficients are multiplied, which multiply the DCT coefficients input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1);
k/2 accumulators, each having storage means that holds two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as is or with its sign inverted, and add it alternately to the two accumulated values;
storage means that holds the k values output from the k/2 accumulators; and
an adder/subtractor that performs k/2 additions and k/2 subtractions on the k values output from the accumulators to obtain the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).

3. A one-dimensional DCT/IDCT arithmetic device that performs a k-point DCT operation or a k-point IDCT operation in the horizontal or vertical direction, comprising:
k/2 semi-fixed multipliers (k is a power of 2), each taking as its multiplicand only one or two values selected from the coefficients by which the original signal and the DCT coefficients are multiplied;
an adder/subtractor that can select either to calculate, from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), the values f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) in order and input them to the k/2 semi-fixed multipliers, or to input the DCT coefficients, input one at a time in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), to the k/2 semi-fixed multipliers;
k/2 accumulators, each having storage means that holds two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as is or with its sign inverted, and add it alternately to the two accumulated values;
storage means that holds the k values output from the k/2 accumulators; and
an adder/subtractor that can select either to output the DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or to perform k/2 additions and k/2 subtractions on the k values output from the accumulators and output the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).

4. A two-dimensional DCT arithmetic device that performs an m-point × n-point two-dimensional DCT (m and n being powers of 2), comprising:
A first one-dimensional DCT arithmetic circuit that performs a k-point DCT in the horizontal direction;
A second one-dimensional DCT arithmetic circuit that performs a k-point DCT in the vertical direction; and
A two-port memory of m × n words from which data written in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction;
Wherein the first and second one-dimensional DCT arithmetic circuits each comprise:
k/2 semi-fixed multipliers (k being a power of 2), the multiplicand of each being only one or two values selected from the coefficients by which the input original signal is multiplied;
An adder/subtractor that, from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2), computes in order f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2), and inputs the results to the k/2 semi-fixed multipliers;
k/2 accumulators, each having storage means for holding two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as-is or with its sign inverted, add the result alternately to the two accumulated values, and output the DCT coefficient pairs F(0) and F(1), F(2) and F(3), F(4) and F(5), ..., F(k-2) and F(k-1); and
A storage circuit that holds the k DCT coefficient values output from the k/2 accumulators and outputs the k DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1).
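The forward dataflow of this claim can be checked with a short functional model. The Python sketch below is not cycle-accurate and does not model the semi-fixed multipliers individually; it assumes orthonormal DCT-II scaling (for k = 8 this gives B = cos(π/8)/2, as in the definitions of the description), and the function names are hypothetical. It follows the claimed order: inputs arrive as f(0), f(k-1), f(1), ..., the front adder/subtractor emits a sum and then a difference, accumulator j adds sum-step products into F(2j) and difference-step products into F(2j+1), and the storage circuit reads the results out as F(0), F(1), ..., F(k-1). The split works because cos((2(k-1-i)+1)uπ/2k) = (-1)^u cos((2i+1)uπ/2k), so even-index coefficients need only f(i)+f(k-1-i) and odd-index ones only f(i)-f(k-1-i).

```python
import math


def _scale(u, k):
    # Assumed orthonormal DCT-II normalisation: sqrt(1/k) for u = 0, sqrt(2/k) otherwise.
    return math.sqrt((1.0 if u == 0 else 2.0) / k)


def dct_reference(f):
    """Direct evaluation of the k-point DCT, used only as a check."""
    k = len(f)
    return [_scale(u, k) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * k))
                               for x in range(k))
            for u in range(k)]


def dct_claim_dataflow(f):
    """Functional (not cycle-accurate) model of the claimed 1-D DCT dataflow."""
    k = len(f)
    if k < 2 or k & (k - 1):
        raise ValueError("k must be a power of 2, k >= 2")
    half = k // 2
    # Input order f(0), f(k-1), f(1), f(k-2), ..., f(k/2-1), f(k/2).
    stream = [f[i] if t == 0 else f[k - 1 - i] for i in range(half) for t in (0, 1)]
    # Accumulator j holds two running sums: acc[j][0] -> F(2j), acc[j][1] -> F(2j+1).
    acc = [[0.0, 0.0] for _ in range(half)]
    for i in range(half):
        a, b = stream[2 * i], stream[2 * i + 1]
        # Front adder/subtractor emits the sum, then the difference.
        for parity, v in ((0, a + b), (1, a - b)):
            for j in range(half):
                u = 2 * j + parity
                coef = _scale(u, k) * math.cos((2 * i + 1) * u * math.pi / (2 * k))
                acc[j][parity] += coef * v
    # Storage circuit: read the k coefficients out as F(0), F(1), ..., F(k-1).
    return [acc[j][p] for j in range(half) for p in (0, 1)]
```

For a constant 8-point input of ones, only F(0) is non-zero and equals 2√2 under this scaling.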

5. The two-dimensional DCT arithmetic device according to claim 4, wherein m = n = k.
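For the case m = n = k, the row-column scheme built around the two-port memory can be sketched in Python. In this hypothetical model the memory is a k × k list written row by row by the first (horizontal) 1-D circuit and read column by column by the second (vertical) one; the 1-D transform is evaluated directly rather than with the claimed multiplier/accumulator dataflow, orthonormal scaling is assumed, and all names are illustrative.

```python
import math


def dct1d(f):
    """Orthonormal k-point DCT-II (assumed normalisation), evaluated directly."""
    k = len(f)
    return [math.sqrt((1.0 if u == 0 else 2.0) / k)
            * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * k)) for x in range(k))
            for u in range(k)]


def dct2d_row_column(block):
    """Row-column 2-D DCT through a transpose buffer (m = n = k)."""
    k = len(block)
    memory = [[0.0] * k for _ in range(k)]  # models the k x k-word two-port memory
    # First 1-D circuit: horizontal transforms, results written row by row.
    for r in range(k):
        memory[r][:] = dct1d(block[r])
    # Second 1-D circuit: data read out column by column and transformed vertically.
    cols = [dct1d([memory[r][c] for r in range(k)]) for c in range(k)]
    # cols[c][v] is the coefficient at (vertical v, horizontal c); return row-major F[v][u].
    return [[cols[c][v] for c in range(k)] for v in range(k)]


def dct2d_direct(block):
    """Direct double-sum 2-D DCT, used only as a check."""
    k = len(block)

    def s(u):
        return math.sqrt((1.0 if u == 0 else 2.0) / k)

    def c(x, u):
        return math.cos((2 * x + 1) * u * math.pi / (2 * k))

    return [[s(v) * s(u) * sum(block[y][x] * c(y, v) * c(x, u)
                               for y in range(k) for x in range(k))
             for u in range(k)]
            for v in range(k)]
```

Because the 2-D kernel is separable, transforming rows, transposing through the memory and transforming columns gives the same result as the direct double sum.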

6. A two-dimensional IDCT arithmetic device that performs an m-point × n-point two-dimensional IDCT (m and n being powers of 2), comprising:
A first one-dimensional IDCT arithmetic circuit that performs a k-point IDCT in the horizontal direction;
A second one-dimensional IDCT arithmetic circuit that performs a k-point IDCT in the vertical direction; and
A two-port memory of m × n words from which data written in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction;
Wherein the first and second one-dimensional IDCT arithmetic circuits each comprise:
k/2 semi-fixed multipliers (k being a power of 2), to which the DCT coefficients are input in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), the multiplicand of each being only one or two values selected from the coefficients by which the DCT coefficients are multiplied;
k/2 accumulators, each having storage means for holding two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as-is or with its sign inverted, and add the result alternately to the two accumulated values; and
Storage means for holding the k values output from the k/2 accumulators, and an adder/subtractor that performs k/2 additions and k/2 subtractions on those k values to obtain the original signals in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).
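The inverse dataflow can be modelled the same way. In the hypothetical Python sketch below (functional only, orthonormal scaling assumed, names illustrative), the coefficients arrive as F(0), F(1), ..., F(k-1); accumulator j adds even-index contributions into one running sum E_j and odd-index contributions into the other, O_j. Since cos((2(k-1-j)+1)uπ/2k) = (-1)^u cos((2j+1)uπ/2k), the final k/2 additions and k/2 subtractions give f(j) = E_j + O_j and f(k-1-j) = E_j - O_j.

```python
import math


def _scale(u, k):
    # Assumed orthonormal normalisation, matching the forward DCT.
    return math.sqrt((1.0 if u == 0 else 2.0) / k)


def idct_reference(F):
    """Direct evaluation of the k-point IDCT, used only as a check."""
    k = len(F)
    return [sum(_scale(u, k) * F[u] * math.cos((2 * x + 1) * u * math.pi / (2 * k))
                for u in range(k))
            for x in range(k)]


def idct_claim_dataflow(F):
    """Functional (not cycle-accurate) model of the claimed 1-D IDCT dataflow."""
    k = len(F)
    if k < 2 or k & (k - 1):
        raise ValueError("k must be a power of 2, k >= 2")
    half = k // 2
    # Accumulator j: acc[j][0] = even-index part E_j, acc[j][1] = odd-index part O_j.
    acc = [[0.0, 0.0] for _ in range(half)]
    for u in range(k):  # coefficients arrive as F(0), F(1), ..., F(k-1)
        parity = u & 1  # alternate between the two accumulated values
        for j in range(half):
            coef = _scale(u, k) * math.cos((2 * j + 1) * u * math.pi / (2 * k))
            acc[j][parity] += coef * F[u]
    # Output stage: k/2 additions and k/2 subtractions, stored as f(0), ..., f(k-1).
    out = [0.0] * k
    for j, (even, odd) in enumerate(acc):
        out[j] = even + odd
        out[k - 1 - j] = even - odd
    return out
```

With only F(0) = 1 for k = 8, every output sample equals 1/√8.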

7. The two-dimensional IDCT arithmetic device according to claim 6, wherein m = n = k.

8. A two-dimensional DCT/IDCT arithmetic device that performs an m-point × n-point two-dimensional DCT or two-dimensional IDCT (m and n being powers of 2), comprising:
A first one-dimensional DCT/IDCT shared arithmetic circuit that performs a k-point DCT or a k-point IDCT in the horizontal direction;
A second one-dimensional DCT/IDCT shared arithmetic circuit that performs a k-point DCT or a k-point IDCT in the vertical direction; and
A two-port memory of m × n words from which data written in the horizontal direction can be read in the vertical direction, or data written in the vertical direction can be read in the horizontal direction;
Wherein the first and second one-dimensional DCT/IDCT shared arithmetic circuits each comprise:
k/2 semi-fixed multipliers (k being a power of 2), the multiplicand of each being only one or two values selected from the coefficients by which the original signal and the DCT coefficients are multiplied;
An adder/subtractor that can select either to compute, in order, f(0)+f(k-1), f(0)-f(k-1), f(1)+f(k-2), f(1)-f(k-2), f(2)+f(k-3), f(2)-f(k-3), ..., f(k/2-1)+f(k/2), f(k/2-1)-f(k/2) from the original signals input one at a time in the order f(0), f(k-1), f(1), f(k-2), f(2), f(k-3), ..., f(k/2-1), f(k/2) and input the results to the k/2 semi-fixed multipliers, or to input the DCT coefficients, in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), to the k/2 semi-fixed multipliers;
k/2 accumulators, each having storage means for holding two accumulated values, which select one of the outputs of the k/2 semi-fixed multipliers, choose whether to add it as-is or with its sign inverted, and add the result alternately to the two accumulated values; and
Storage means for holding the k values output from the k/2 accumulators, and an adder/subtractor that can select either to output those values as DCT coefficients in the order F(0), F(1), F(2), F(3), F(4), ..., F(k-2), F(k-1), or to perform k/2 additions and k/2 subtractions on them to obtain the original signals and output them in the order f(0), f(1), f(2), f(3), f(4), ..., f(k-2), f(k-1).

9. The two-dimensional DCT/IDCT arithmetic device according to claim 8, wherein m = n = k.