JP3733627B2

JP3733627B2 - Digital signal arithmetic unit

Info

Publication number: JP3733627B2
Application number: JP30070295A
Authority: JP
Inventors: 哲二郎近藤; 秀雄中屋; 賢堀士
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1995-10-25
Filing date: 1995-10-25
Publication date: 2006-01-11
Anticipated expiration: 2015-10-25
Also published as: JPH09120396A

Description

[0001]
[Technical field of the invention]
The present invention relates to a digital signal arithmetic unit that can reduce circuit scale without degrading arithmetic precision in product-sum operations on digitized pixel data or audio data.
[0002]
[Prior art]
Conventionally, in an apparatus that performs product-sum operations on digitized pixel data or audio data, the multipliers, adders, and peripheral elements are all sized to handle the worst-case number of bits (the case in which the word length is longest) so as not to reduce calculation accuracy. As a result, the circuit scale tends to grow as the number of taps increases.
[0003]
FIG. 4 is a block diagram of such a conventional, cascade-connected product-sum operation circuit with time-varying coefficients. Pixel data digitized to 8 bits is supplied from an input terminal 41 to an n-tap blocking circuit 42, which groups the input pixel data into blocks according to the taps of the filter operation. The blocked signal is supplied to an address signal generation circuit 43 and a register 45.
[0004]
The address signal generation circuit 43 generates an address signal corresponding to the pixels included in the supplied block and supplies it to the prediction coefficient memory circuit 44₁. The prediction coefficient memory circuit 44₁ comprises registers 51 and 54, an address decoder circuit 52, and a memory cell 53. The address signal from the address signal generation circuit 43 is supplied to and held in the register 51. Based on the system clock of the device, the held address signal is supplied from the register 51 to the prediction coefficient memory circuit 44₂ and to the address decoder circuit 52.
[0005]
In this way, the address signal is passed along, based on the system clock, up to the prediction coefficient memory circuit 44ₙ, and each prediction coefficient memory circuit performs the following processing. The address decoder circuit 52 converts the supplied address signal so that it corresponds to the memory cell 53. The converted address signal is supplied to the memory cell 53, and the corresponding prediction coefficient is read out. The read prediction coefficient is supplied from the memory cell 53 via the register 54 to the product-sum operation circuit 47₁.
[0006]
The product-sum operation circuit 47₁ comprises a multiplier 55, registers 56 and 58, and an adder 57. It is supplied with the blocked pixel data from the register 45, 25-bit, 8-tap data from a rounding processing circuit 46, and the prediction coefficient. The supplied 8-bit pixel data and 10-bit prediction coefficient are multiplied in the multiplier 55, and the 18-bit product is supplied via the register 56 to the adder 57. The adder 57 adds the data from the rounding processing circuit 46 to the product from the multiplier 55, and the sum is supplied via the register 58 to the product-sum operation circuit 47₂.
[0007]
The same processing is repeated up to the product-sum operation circuit 47ₙ, whose output is supplied to a limiter circuit 48. The limiter circuit 48 limits the supplied 25-bit data to 8 bits. So that no accuracy is lost in the intermediate calculations, the circuit of FIG. 4 is given a bit width (25 bits) large enough that overflow or underflow can never occur.
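The 25-bit figure follows from worst-case sizing: adding n signed b-bit products without overflow needs b + ⌈log₂ n⌉ bits. A minimal Python sketch of that rule, plus a full-precision tap-by-tap accumulation in the style of FIG. 4, follows. Both function names are hypothetical, and the patent does not state the tap count behind its 25-bit width. With 18-bit products the rule gives 25 bits for any n from 65 to 128.

```python
def worst_case_width(product_bits, taps):
    """Accumulator width that can never overflow when summing `taps` signed products.

    n products in [-2**(b-1), 2**(b-1) - 1] sum to a value in
    [-n * 2**(b-1), n * (2**(b-1) - 1)], which fits in b + ceil(log2 n) bits.
    """
    if taps < 1:
        raise ValueError("taps must be at least 1")
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1, without floating point.
    return product_bits + (taps - 1).bit_length()


def full_precision_cascade(pixels, coefs, acc_bits=25):
    """Accumulate pixel * coefficient tap by tap, as the cascade of FIG. 4 does."""
    if len(pixels) != len(coefs):
        raise ValueError("need one coefficient per pixel")
    lo, hi = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    acc = 0
    for x, c in zip(pixels, coefs):
        acc += x * c
        # Cannot trigger when acc_bits >= worst_case_width(...) for this tap count.
        if not lo <= acc <= hi:
            raise OverflowError("accumulator overflow")
    return acc
```

The cost of this safety is that every adder and register in the chain carries all 25 bits, even though only 8 survive the limiter.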
[0008]
A simple example of image filtering that requires time-varying prediction coefficients is sampling rate conversion. Consider converting a data sequence clocked at 4 fsc into one clocked at 3 fsc. In this case, as shown in FIG. 5, pixel data lying between the samples of the 4 fsc data sequence must be calculated by a filter. Three sets of filter coefficients (B0, B1, B2) must therefore be prepared in advance according to the pixel position, and they must be switched according to the timing to produce the 3 fsc data sequence.
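The phase bookkeeping for 4 fsc to 3 fsc conversion can be sketched in Python. Output sample k falls at input position 4k/3, so divmod(4k, 3) gives the base input index and a phase of 0, 1, or 2 that selects B0, B1, or B2. The function name, the alignment of the first output with the first input, and the two-tap linear-interpolation coefficients used in the usage example are all assumptions for illustration, not the filter of FIG. 5.

```python
def resample_4fsc_to_3fsc(x, coeff_sets):
    """Produce 3 output samples for every 4 input samples (polyphase sketch).

    coeff_sets[p] is the coefficient set for phase p (B0, B1, B2).
    Output k sits at input position 4k/3: base index (4k)//3, phase (4k)%3.
    """
    taps = len(coeff_sets[0])
    out = []
    k = 0
    while (4 * k) // 3 + taps <= len(x):
        base, phase = divmod(4 * k, 3)
        c = coeff_sets[phase]  # switch coefficients according to timing
        out.append(sum(ci * x[base + i] for i, ci in enumerate(c)))
        k += 1
    return out
```

For a ramp input `[0, 3, ..., 27]` with coefficient sets `[[3, 0], [2, 1], [1, 2]]` (linear interpolation scaled by 3), the sketch returns `[0, 12, 24, 36, 48, 60, 72]`, that is, 3 times the ramp value at positions 0, 4/3, 8/3, and so on.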
[0009]
When sampling rate conversion is performed with the pipeline method of FIG. 4, the n-tap blocking circuit 42 can be omitted; it is sufficient that the input pixel data is supplied sequentially to each of the product-sum operation circuits 47₁ to 47ₙ. The address signal generation circuit 43 generates an address signal according to the pixel position, as described above, and the address signal is supplied to each of the prediction coefficient memory circuits 44₁ to 44ₙ through successive latches. In each of the prediction coefficient memory circuits 44₁ to 44ₙ, the address decoder circuit 52 selects the corresponding location of the memory cell 53, and the prediction coefficient read from the memory cell 53 is supplied to the corresponding product-sum operation circuit 47₁ to 47ₙ. Each of the product-sum operation circuits 47₁ to 47ₙ calculates the product of the sequentially supplied pixel data and the prediction coefficient and adds it to the result from the preceding product-sum operation circuit (47₁ to 47ₙ₋₁). In this way, the products for the taps are accumulated in turn, and finally the limiter circuit 48 limits the result to 8 bits and outputs it.
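A behavioural Python sketch of the cascade in FIG. 4 helps show why each stage needs its own decoder. The address for a given output enters the first stage on one clock and reaches stage j exactly j clocks later, so on any single clock every stage is looking up a coefficient for a different output. In this sketch each stage holds a private table with its tap's coefficient for every address, which is an assumed reading of memory circuits 44₁ to 44ₙ. The function and argument names are hypothetical, and timing is modelled per clock rather than at register-transfer level.

```python
def pipelined_cascade(windows, addresses, coef_rows):
    """Clock-by-clock sketch of the FIG. 4 cascade.

    windows[k]   : the n pixels used for output k
    addresses[k] : coefficient address for output k
    coef_rows[a] : the n coefficients stored at address a
    """
    n = len(coef_rows[0])
    # Conventional layout: stage j owns a private memory (and decoder) for tap j.
    memories = [{a: row[j] for a, row in enumerate(coef_rows)} for j in range(n)]
    stages = [None] * n  # each stage register holds (output index k, partial sum)
    outputs = []
    for cycle in range(len(windows) + n):
        if stages[-1] is not None:
            outputs.append(stages[-1][1])  # last stage feeds the limiter
        # Shift partial sums one stage down; iterate backwards so stages[j - 1]
        # still holds last clock's value when stage j reads it.
        for j in range(n - 1, 0, -1):
            if stages[j - 1] is None:
                stages[j] = None
            else:
                k, s = stages[j - 1]
                # Output k's address reaches stage j exactly j clocks after stage 0.
                stages[j] = (k, s + windows[k][j] * memories[j][addresses[k]])
        if cycle < len(windows):
            stages[0] = (cycle, windows[cycle][0] * memories[0][addresses[cycle]])
        else:
            stages[0] = None
    return outputs
```

The outputs equal the direct sums of products; the point of the sketch is that n tables are decoded in parallel with n different addresses, which is the duplication the invention sets out to remove.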
[0010]
[Problems to be solved by the invention]
However, the circuit scale becomes considerably large, both because far more bits than necessary (25 bits) are provided for the calculation and because, as a side effect of the pipeline processing, each prediction coefficient memory circuit has its own address decoder circuit.
[0011]
Therefore, an object of the present invention is to provide a digital signal arithmetic unit in which the pipelined processing of the prediction coefficient memory circuits is replaced by parallel processing, improving the utilization efficiency of the memory cells and thereby reducing the scale of the entire circuit.
[0012]
[Means for Solving the Problems]
According to the invention of claim 1, a digital signal arithmetic unit that performs product-sum operations on digital signals comprises: prediction coefficient reading means for reading n kinds of prediction coefficients from a memory in accordance with an address signal corresponding to n digital signals extracted from an input digital signal; n-tap product-sum operation means that is supplied with the n kinds of read prediction coefficients and the n digital signals extracted from the input digital signal, respectively, and performs a product-sum operation; and limiter means for limiting the data resulting from the product-sum operation. The product-sum operation means comprises at least n multipliers that multiply the n kinds of prediction coefficients by the n digital signals, respectively; a first-layer adder that adds at least some of the products from the n multipliers and clips and outputs the sum; a second-layer adder that adds at least some of the sums from the first-layer adder and clips and outputs the sum; and a register provided between the first-layer and second-layer adders.
[0014]
Considering that the input pixel data is 8 bits and the final result of the filter operation is also 8 bits, there is no need to keep such a large margin of bits; the number of bits can be reduced by rounding optimally along the way.
[0015]
[Detailed description of the invention]
An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of an embodiment of the digital signal arithmetic unit. Pixel data digitized to 8 bits is supplied from an input terminal 1 to an n-tap blocking circuit 2, which groups the input pixel data into blocks corresponding to the n taps of the filter operation. That is, the n-tap blocking circuit 2 forms a block of every n pixels and supplies the pixels of each block to n product-sum operation circuits (MPY) 9₁ to 9ₙ and to an address signal generation circuit 3.
[0016]
The address signal generation circuit 3 generates an address signal corresponding to the pixels included in the supplied block and supplies it to a prediction coefficient memory circuit 4. The prediction coefficient memory circuit 4 comprises registers 5 and 8₁ to 8ₙ, an address decoder circuit 6, and a memory cell 7. The address signal from the address signal generation circuit 3 is supplied to and held in the register 5, and the held address signal is supplied from the register 5 to the address decoder circuit 6 based on the system clock of the apparatus.
[0017]
The address decoder circuit 6 converts the supplied address signal so that it corresponds to the memory cell 7. The converted address signal is supplied to the memory cell 7, and the prediction coefficients (CM) for each of the subsequent product-sum operation circuits 9₁ to 9ₙ are read out. Each read prediction coefficient consists of 10 bits and is supplied from the memory cell 7 via the registers 8₁ to 8ₙ to the product-sum operation circuits 9₁ to 9ₙ.
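In contrast to the conventional circuit, the arrangement of FIG. 1 decodes one address and reads all n coefficients in a single access. A minimal Python model of that idea follows. The class name and the one-row-per-address layout are assumptions; the 10-bit signed range check reflects the 10-bit coefficients mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class CoefficientMemory:
    """One memory cell array with one decoder: each address selects a word
    holding all n coefficients, which are latched into n output registers."""

    rows: list  # rows[address] = [c_1, ..., c_n]

    def __post_init__(self):
        n = len(self.rows[0])
        for row in self.rows:
            if len(row) != n:
                raise ValueError("every row must hold n coefficients")
            for c in row:
                if not -512 <= c <= 511:
                    raise ValueError("coefficient does not fit in 10 signed bits")
        self.registers = [0] * n  # stands in for registers 8_1 .. 8_n

    def read(self, address):
        # A single decode of `address` delivers every tap's coefficient at once.
        self.registers = list(self.rows[address])
        return self.registers
```

For example, `CoefficientMemory([[1, 2, 3], [4, 5, 6]]).read(1)` returns `[4, 5, 6]`. Only one address register and one decoder are needed, however many taps the row holds.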
[0018]
The product-sum operation circuit 9₁ comprises registers 10 and 12 and a multiplier 11. The 8-bit pixel data supplied from the n-tap blocking circuit 2, as described above, is supplied to the register 10. The 8-bit pixel data and the 10-bit prediction coefficient are multiplied in the multiplier 11, and the product is supplied as 18-bit data via the register 12 to the adder 13 as the output of the product-sum operation circuit 9₁. The product-sum operation circuits 9₂ to 9ₙ perform the same processing as the product-sum operation circuit 9₁, and each outputs 18-bit data to its corresponding adder.
[0019]
That is, the adder 13 is supplied with the 18-bit data from the product-sum operation circuits 9₁ and 9₂. Its sum is kept at the same bit width by truncating the least significant bit, and is supplied as 18-bit data via the register 14 to the adder 15 in the next layer. The adder 15 adds the sum supplied via the register 14 to the sum of the product-sum operation circuit 9₃ and the product-sum operation circuit 9₄ (not shown), which is supplied via the register 17. As with the adder 13, this sum is output to the next layer as 18-bit data by truncating the least significant bit.
[0020]
Similarly, the adder 19 adds the output of the product-sum operation circuit 9ₙ₋₁ (not shown) and the output of the product-sum operation circuit 9ₙ, and the sum is output to the next layer via the register 20. In this manner, the adder 21 in the final layer adds two pieces of 18-bit data. As with the other adders, the sum from the adder 21 has its least significant bit truncated and is supplied as 18-bit data via the register 22 to the limiter circuit 23. The limiter circuit 23 limits the supplied 18-bit data to 8-bit data, which is output from the output terminal 24.
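The fixed-width adder tree of FIG. 1 can be sketched in Python as shown below. Each adder forms a 19-bit sum of two 18-bit inputs and drops its least significant bit, which always brings the result back into 18 signed bits, so the width never grows. The cost is that every layer halves the scale, and the result approximates the sum of the products divided by 2^layers. The function names are hypothetical, and a power-of-two tap count is assumed to match the paired layout. The text does not say how the 18-bit result is aligned to 8 bits, so the limiter takes the required right shift as a parameter.

```python
PRODUCT_BITS = 18


def truncating_adder_tree(products, width=PRODUCT_BITS):
    """Pairwise adder tree in which every adder drops its LSB (FIG. 1 style).

    Returns (result, layers); result approximates sum(products) / 2**layers,
    truncated toward minus infinity at every layer.
    """
    n = len(products)
    if n == 0 or n & (n - 1):
        raise ValueError("number of products must be a power of two")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if any(not lo <= p <= hi for p in products):
        raise ValueError("product does not fit in the stated width")
    level, layers = list(products), 0
    while len(level) > 1:
        # a + b needs width + 1 bits; >> 1 drops the LSB and returns to width bits.
        level = [(level[i] + level[i + 1]) >> 1 for i in range(0, len(level), 2)]
        layers += 1
    return level[0], layers


def limiter(value, shift=0, out_bits=8):
    """Drop `shift` fractional bits (an assumed alignment), then clamp to 0..2**out_bits-1."""
    return max(0, min((1 << out_bits) - 1, value >> shift))
```

Because the scale factor 2^layers is fixed by the tree depth, it can be folded into the stored coefficients rather than handled in extra hardware.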
[0021]
FIG. 2 shows the product-sum operation circuit described above for an example with 25 taps. In FIG. 2, bits are reduced more aggressively in the intermediate layers than in the embodiment above. Because 25 taps do not form a clean binary hierarchy, the product from the multiplier 31₂₅, the last tap, is added in a layer close to the final stage. If the multiplier 31₂₅ is assigned the data at the center of the taps, there is the further advantage that the calculation is performed with good accuracy.
[0022]
Input pixel data of 9 bits including the sign, in two's complement representation, and a prediction coefficient of 10 bits including the sign, in fixed-point representation, are supplied to the multiplier 31 and multiplied. As shown in FIG. 3, the full-width product in the multiplier 31 is 18 bits, but the lower 3 bits are truncated and the result is output as 15 bits. The output of the multiplier 31₁ is supplied via a register to the adder 32₁. Similarly, the outputs of the multipliers 31₂ to 31₂₄ are supplied via registers to the adders 32₁ to 32₁₂.
[0023]
As shown in FIG. 3, the sum in the adder 32₁, that is, a first-layer adder, grows by 1 bit toward the upper bits, but the lower 1 bit is truncated so that the bit width stays at 15 bits. Similarly, the first-layer adders 32₂ to 32₁₂ add the 15-bit data supplied from the multipliers 31₃ to 31₂₄, and each sum is supplied as 15-bit data to the second-layer adders 33₁ to 33₆.
[0024]
The second-layer adder 33₁ is supplied with the 15-bit data output from the adders 32₁ and 32₂. As shown in FIG. 3, the growth of its sum toward the upper bits is suppressed and the lower 1 bit is truncated, and the result is supplied as 14-bit data via a register to the adder 34₁. Similarly, the second-layer adders 33₂ to 33₆ add the 15-bit data supplied from the first-layer adders 32₃ to 32₁₂, and each sum is supplied as 14-bit data to the third-layer adders 34₁ to 34₃.
[0025]
The third-layer adder 34₁ is supplied with the 14-bit data output from the adders 33₁ and 33₂. As shown in FIG. 3, the growth of its sum toward the upper bits is suppressed and the lower 1 bit is truncated, and the result is supplied as 13-bit data via a register to the adder 35₁. Similarly, the third-layer adders 34₂ and 34₃ add the 14-bit data supplied from the second-layer adders 33₃ to 33₆, and each sum is supplied as 13-bit data to the fourth-layer adders 35₁ and 35₂.
[0026]
The fourth-layer adder 35₁ is supplied with the 13-bit data output from the adders 34₁ and 34₂. As shown in FIG. 3, the growth of its sum toward the upper bits is suppressed and the lower 1 bit is truncated, and the result is supplied as 12-bit data via a register to the adder 36. Similarly, the fourth-layer adder 35₂ adds the 13-bit data supplied from the third-layer adder 34₃ and from the multiplier 31₂₅, and the sum is supplied as 12-bit data to the fifth-layer adder 36.
[0027]
As described above, the multiplier 31₂₅, whose output goes to the adder 35₂, multiplies the data at the center of the taps (9 bits) by the prediction coefficient (10 bits). Like the multipliers 31₁ to 31₂₄, the multiplier 31₂₅ produces 18 bits. However, because its output is supplied to the fourth-layer adder 35₂, it is extended by 1 bit toward the upper bits and its lower 6 bits are removed, and the resulting 13-bit data is supplied to the adder 35₂.
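The bit widths stated for FIG. 3 can be cross-checked by tracking the lowest and highest bit positions of each signal, measured from the LSB of the raw 18-bit product. The Python sketch below assumes three things: dropping LSBs raises the low position, a "1-bit extension toward the upper bits" raises the high position, and suppressed growth leaves the high position fixed. These readings are inferences from the text. Under them, the 13-bit word from multiplier 31₂₅ lines up exactly with the outputs of adders 34₁ to 34₃, and the final LSB sits 8 positions above the product LSB. Paragraph [0028] says no fractional bits remain at that point, which suggests that the coefficients carry 8 fractional bits.

```python
def next_stage(pos, grow_msb, drop_lsb):
    """pos = (lowest, highest) bit position, counted from the raw product's LSB."""
    lsb, msb = pos
    return (lsb + drop_lsb, msb + grow_msb)


def width(pos):
    lsb, msb = pos
    return msb - lsb + 1


product = (0, 17)                  # full 18-bit product of 9-bit pixel and 10-bit coefficient
mult = next_stage(product, 0, 3)   # 31_1..31_24: drop 3 LSBs -> 15 bits
l1 = next_stage(mult, 1, 1)        # 32_x: keep the carry bit, drop 1 LSB -> 15 bits
l2 = next_stage(l1, 0, 1)          # 33_x: upper growth suppressed, drop 1 LSB -> 14 bits
l3 = next_stage(l2, 0, 1)          # 34_x -> 13 bits
tap25 = next_stage(product, 1, 6)  # 31_25: extend 1 bit upward, drop 6 LSBs -> 13 bits
l4 = next_stage(l3, 0, 1)          # 35_x -> 12 bits
l5 = next_stage(l4, 0, 1)          # 36 -> 11 bits, integer-valued per [0028]

for name, pos in [("31_1..24", mult), ("32", l1), ("33", l2), ("34", l3),
                  ("31_25", tap25), ("35", l4), ("36", l5)]:
    print(f"{name:>8}: bits {pos[0]}..{pos[1]} ({width(pos)} bits)")
```

Every stage after the first adder layer shares the same top bit position, which is what lets the tap-25 word join the tree at adder 35₂ without any further shifting.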
[0028]
The fifth-layer adder 36 is supplied with the 12-bit data output from the adders 35₁ and 35₂. As shown in FIG. 3, the growth of its sum toward the upper bits is suppressed and the lower 1 bit is truncated; that is, the sum is supplied to the limiter circuit 37 as 11-bit data from which all bits below the binary point have been removed. The limiter circuit 37 limits the supplied 11-bit data to 8 bits, and the predicted value is output from the output terminal 24.
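The whole FIG. 2 datapath can be simulated in Python and compared against a full-precision result rounded to 8 bits, which is the comparison described in paragraph [0031]. This sketch makes several assumptions:

* intermediate truncation is modelled as an arithmetic right shift (floor);
* suppressed upper-bit growth is modelled as saturation;
* coefficients are integers with 8 fractional bits, inferred from the width schedule and not stated in the text;
* pixels lie in 0 to 255, and the limiter clamps to 0 to 255;
* the caller places the center pixel last, so that it reaches multiplier 31₂₅.

All function names are hypothetical.

```python
def saturate(v, bits):
    """Clip v to the signed two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def adder_layer(values, out_bits):
    """One adder layer: add neighbours, drop one LSB (floor), clip to out_bits."""
    return [saturate((values[i] + values[i + 1]) >> 1, out_bits)
            for i in range(0, len(values), 2)]


def fig2_filter(pixels, coefs):
    """25-tap product-sum following the FIG. 3 width schedule (a sketch)."""
    if len(pixels) != 25 or len(coefs) != 25:
        raise ValueError("expected 25 pixels and 25 coefficients")
    products = [x * c for x, c in zip(pixels, coefs)]  # full 18-bit products
    m = [saturate(p >> 3, 15) for p in products[:24]]  # multipliers 31_1..31_24
    a1 = adder_layer(m, 15)                            # adders 32_1..32_12
    a2 = adder_layer(a1, 14)                           # adders 33_1..33_6
    a3 = adder_layer(a2, 13)                           # adders 34_1..34_3
    center = saturate(products[24] >> 6, 13)           # multiplier 31_25 (center tap)
    a4 = adder_layer(a3 + [center], 12)                # 35_1 = 34_1+34_2, 35_2 = 34_3+31_25
    a5 = adder_layer(a4, 11)[0]                        # adder 36: integer result
    return max(0, min(255, a5))                        # limiter 37 (unsigned 8-bit assumed)


def full_precision(pixels, coefs, frac_bits=8):
    """Reference: exact sum of products, rounded to nearest and limited to 0..255."""
    exact = sum(x * c for x, c in zip(pixels, coefs))
    rounded = (exact + (1 << (frac_bits - 1))) >> frac_bits
    return max(0, min(255, rounded))
```

Because every dropped bit here is a floor, the truncation losses add up to at most 775 product-LSB units: 24 × 7 from the multipliers, 63 from tap 25, and 96 + 96 + 96 + 128 + 128 from the five adder layers. That is just over 3 output LSBs. As long as no intermediate saturation occurs, this sketch can therefore only read low, by 0 to 3, relative to the rounded full-precision result. The patent's own evaluation reports errors of about ±1 to ±2 but does not describe its rounding in enough detail to reproduce exactly.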
[0029]
The address signal generation circuit 3 used in this embodiment generates an address signal corresponding to the pixel data included in the supplied block. Alternatively, a class classification method can be used. In that method, the pixel data in the block is compressed by, for example, ADRC (adaptive dynamic range coding), DPCM (predictive coding), DCT (discrete cosine transform), VQ (vector quantization), or BTC (block truncation coding), and the compressed value is used as the class of the block, from which the address signal is generated.
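ADRC class codes are one way to form such an address. A minimal Python sketch of k-bit ADRC follows. Each pixel is requantized against the block's own minimum and dynamic range, and the per-pixel codes are concatenated into a class code that can serve directly as the coefficient-memory address. The requantization formula shown is one possible formulation and the function name is hypothetical; the patent names ADRC but does not specify its parameters.

```python
def adrc_class(block, bits=1):
    """k-bit ADRC class code for a block of pixel values (one possible formulation).

    Each pixel is requantized relative to the block's own minimum and dynamic
    range, and the per-pixel codes are concatenated, first pixel most significant.
    """
    if not block:
        raise ValueError("block must not be empty")
    lo = min(block)
    dr = max(block) - lo  # dynamic range of the block
    code = 0
    for x in block:
        q = ((x - lo) << bits) // (dr + 1)  # always in 0 .. 2**bits - 1
        code = (code << bits) | q
    return code
```

With n pixels and k bits per pixel there are 2^(n·k) classes, so a 4-pixel block with 1-bit ADRC addresses 16 rows of coefficients. For example, `adrc_class([10, 200, 50, 180])` gives `0b0101`.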
[0030]
That is, prediction coefficients are read out according to the class, and a predicted value is generated by a linear combination of the read prediction coefficients and the pixel data. Generating predicted values in this way enables, for example, up-conversion from an SD (Standard Definition) signal to an HD (High Definition) signal, or prediction of pixels thinned out by subsampling.
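The prediction itself is the linear combination y = Σ wᵢ xᵢ of the class's coefficients and the block's pixels. The Python sketch below assumes integer coefficients with 8 fractional bits (consistent with the 25-tap example but not stated for this step), round-to-nearest, and an unsigned 8-bit output. The function name and the table layout are hypothetical.

```python
def predict(pixels, coef_table, class_code, frac_bits=8):
    """Linear combination y = sum(w_i * x_i) with fixed-point coefficients.

    coef_table[class_code] holds integer coefficients scaled by 2**frac_bits.
    The result is rounded to nearest (half up) and limited to 0..255.
    """
    w = coef_table[class_code]
    if len(w) != len(pixels):
        raise ValueError("need one coefficient per pixel")
    acc = sum(wi * xi for wi, xi in zip(w, pixels))
    y = (acc + (1 << (frac_bits - 1))) >> frac_bits
    return max(0, min(255, y))
```

For example, with `table = {0: [64, 128, 64]}` (weights 0.25, 0.5, 0.25), `predict([100, 200, 100], table, 0)` returns 150.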
[0031]
Further, for a filter whose coefficient gain is close to 1, this embodiment is structured to withstand overflow or underflow adequately. Comparing the final result of this embodiment with the final result of a full-bit calculation rounded to 8 bits shows errors of only about ±1 to ±2. When the resulting images were displayed on a monitor, the difference was judged to be indistinguishable.
[0032]
Although this embodiment has been described using pixel data, the same result can be obtained with audio data.
[0033]
[Effects of the invention]
According to the present invention, when handling images in which 8-bit input pixel data is rounded to 8 bits at the final stage for output, such strict precision is not required. Rounding can be applied optimally and repeatedly in the intermediate stages of the operation: the lower bits of the result of each layer's adder that accumulates the products are truncated so that the bit width does not increase with each addition. Even so, the accuracy does not deteriorate significantly.
[0034]
Furthermore, according to the present invention, the product-sum operation circuits are parallelized, a single address signal is supplied to the prediction coefficient memory circuit, the address decoder circuit is shared, and the utilization efficiency of the memory cells is increased. The circuit scale of the product-sum operation circuit that performs image filter operations can therefore be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an embodiment of a digital signal arithmetic device according to the present invention.
FIG. 2 is a block diagram showing an example of a product-sum operation circuit according to the present invention.
FIG. 3 is a schematic diagram showing an example of the bit width at each layer of a product-sum operation circuit according to the present invention.
FIG. 4 is a block diagram of a conventional digital signal arithmetic device.
FIG. 5 is a diagram used for explaining signal conversion.
[Explanation of symbols]
2 n-tap blocking circuit
3 address signal generation circuit
4 prediction coefficient memory circuit
6 address decoder circuit
7 memory cell
9 product-sum operation circuit
11 multiplier
13, 16, 19, 15, 21 adder
23 limiter circuit

Claims

A digital signal arithmetic unit that performs product-sum operations on digital signals, comprising:
prediction coefficient reading means for reading n kinds of prediction coefficients from a memory in accordance with an address signal corresponding to n digital signals extracted from an input digital signal;
n-tap product-sum operation means that is supplied with the n kinds of read prediction coefficients and the n digital signals extracted from the input digital signal, respectively, and performs a product-sum operation; and
limiter means for limiting the data resulting from the product-sum operation,
wherein the product-sum operation means comprises at least:
n multipliers that multiply the n kinds of prediction coefficients by the n digital signals, respectively;
a first-layer adder that adds at least some of the products from the n multipliers and clips and outputs the sum;
a second-layer adder that adds at least some of the sums from the first-layer adder and clips and outputs the sum; and
a register provided between the first-layer and second-layer adders.

The digital signal arithmetic unit according to claim 1, wherein a plurality of adders configured in a hierarchical structure including the first-layer and second-layer adders
clip the data they output, as rounding of the lower bits, without letting it grow in the upper bit direction as the number of operation stages increases.

The digital signal arithmetic unit according to claim 1, wherein the prediction coefficient reading means
reads the n kinds of prediction coefficients from the memory in accordance with an address signal generated according to a class determined on the basis of the n digital signals extracted from the input digital signal.

The digital signal arithmetic unit according to claim 3, wherein the prediction coefficient reading means
reads the n kinds of prediction coefficients from the memory in accordance with an address signal generated according to a class determined on the basis of a value generated by compressing the input digital signal.

The digital signal arithmetic unit according to claim 1, wherein the limiter means
limits the data so that it has the same number of bits as the input digital signal.

The digital signal arithmetic unit according to claim 1, wherein the product-sum operation means
adds the product for the digital signal corresponding to the center tap of the n taps in an adder of a layer close to the final stage, among a plurality of adders configured in a hierarchical structure including the first-layer and second-layer adders.