JP5086675B2

JP5086675B2 - Filter calculator and motion compensation device

Info

Publication number: JP5086675B2
Application number: JP2007079161A
Authority: JP
Inventors: Yoichi Katayama (片山 陽一)
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2007-03-26
Filing date: 2007-03-26
Publication date: 2012-11-28
Anticipated expiration: 2027-03-26
Also published as: JP2008242594A

Abstract

PROBLEM TO BE SOLVED: To provide a filter computing unit using Booth's algorithm, and a motion compensation device, that can reduce hardware resources and power consumption.

SOLUTION: The filter computing unit, which performs multiply-accumulate operations on filter coefficients and input pixel values using Booth's algorithm, has an input part for inputting the pixel values; two or more operation parts 10ⱼ, each of which repeatedly decodes an output from the input part according to Booth's algorithm to produce one or more code data and multiplies each of the code data by the corresponding filter coefficient; input selectors 13ⱼ, each of which feeds an output from the input part selectively into any of the operation parts 10ⱼ; and a control part 31 that determines the number of repeated operation cycles and the repeated operation timing of each operation part 10ⱼ according to the output of the input part, and controls the input selectors 13ⱼ according to that determination.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a filter processing apparatus suitable for executing the filter operations of the motion compensation process used in compression coding and decoding of moving images, and to a motion compensation processing apparatus that includes this filter processing apparatus.

New codecs such as H.264/AVC and VC-1 have been adopted for next-generation DVD (Digital Versatile Disc) and DTV (digital television). In decoders for these codecs, the filter operation of the motion-compensated prediction filter in the motion compensation unit is sometimes implemented with multipliers that apply Booth's algorithm.

The operation time of a multiplier is the sum of the time needed to add the partial products and the time needed to absorb (propagate) the carries, and shortening these times is the key to higher operation speed. One countermeasure is to reduce the number of partial products themselves, which reduces the number of adder circuits. This can be done by grouping several consecutive bits of the multiplier and generating one partial product per group. Second-order (radix-4) Booth encoding is used for this purpose: the multiplier is divided into 2-bit groups, and each group is combined with the most significant bit of the next lower group to form a 3-bit window, so that only one partial product is generated per two multiplier bits.

However, when the filter operations of such codecs are implemented with multipliers that apply Booth's algorithm, a large number of multipliers is required and the circuit scale increases. Likewise, the circuit scale increases when the filter operation used to generate the predicted image in H.264 intra prediction is implemented with Booth multipliers.

Patent Document 1 discloses a discrete cosine transformer in which the number of multipliers, and hence the circuit scale, is kept as small as possible. FIG. 13 is a diagram showing the discrete cosine transformer described in Patent Document 1. It includes adders 612, 640, and 642, a difference unit 610, a register 614, multiplexers 616 and 652, multiplexer multipliers 618, 620, 622, and 634, butterfly adders 626, 628, 630, 632, 644, 646, 648, and 650, multipliers 624, 636, and 638, and a quantizer 654. The difference unit 610 produces difference data as the AC component of the image data, and the DCT is performed on this difference data. Because applying the DCT to differences reduces the number of coefficients required, the number of multipliers can be reduced. Furthermore, when the same coefficient is multiplied by different data, the multiplexer multipliers 618, 620, 622, and 634 perform the multiplications in a time-division manner, which reduces the number of multipliers further. In addition, because the coefficients to be multiplied are premultiplied into the quantization table of the quantizer 654, the number of multiplications can be reduced.
As described above, the discrete cosine transformer of Patent Document 1 exploits the characteristics of the discrete cosine transform and performs the transform at high speed using multiplications and butterfly operations.

Patent Document 2 discloses a signal processing apparatus that performs image signal processing, such as spatial filtering, in time series. This apparatus reduces the circuit scale of the multipliers by using the same multipliers repeatedly. FIG. 14 is a diagram showing a processor, a register circuit, and a coefficient register in the information processing apparatus described in Patent Document 2. The information processing apparatus includes an input/output buffer circuit 740, a coefficient register 711 that holds five coefficients W₁ to W₅, and a processor 710. The input/output buffer circuit 740 includes, on a bus 820, a RAM 746 that holds five rows of pixel data and five registers 741 to 745 that hold pixel data D₁ to D₅, respectively. The processor 710 includes two multipliers 710a and 710b, an adder 710c, a register 710d, a gate circuit 710e, multiplexers 710f and 710g on the data input side, and multiplexers 710h and 710i on the coefficient input side.

In this information processing apparatus, the multipliers 710a and 710b calculate the products P₁ = W₁ × D₁ and P₂ = W₂ × D₂, respectively. The products P₁ and P₂ and the value of the register 710d are input to the adder 710c through the gate circuit 710e, their sum is obtained, and the result is held in the register 710d. A gate signal is applied to the gate circuit 710e from a control circuit (not shown), so that the value of the register 710d is added to the products. Next, the multipliers 710a and 710b calculate P₃ = W₃ × D₃ and P₄ = W₄ × D₄, respectively, and these are added to the previous sum. Then the multiplier 710a calculates P₅ = W₅ × D₅ while zero is input to the multiplier 710b, so only P₅ is added to the running sum, which is held in the register 710d. The contents of the register 710d are stored in a memory cell unit (not shown) through a gate 717 and a bus 810. In this way, a fifth-order vector convolution integral is obtained for the data of interest D₃ and its neighboring data D₂, D₁, D₄, and D₅. Thus, in the information processing apparatus of Patent Document 2, a fifth-order vector convolution requires only two multipliers, 710a and 710b.
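The three-step schedule above can be sketched in a few lines. This is a hypothetical cycle-level model, not circuitry taken from Patent Document 2: one loop iteration stands for one pass through the two multipliers and the accumulator, and the coefficient and pixel values are made-up examples.

```python
def convolve_two_multipliers(w, d):
    """Return (result, cycles) for sum(w[k] * d[k]) using two multipliers per cycle."""
    if len(w) != len(d):
        raise ValueError("coefficient and data lengths differ")
    acc = 0        # plays the role of register 710d
    cycles = 0
    for k in range(0, len(w), 2):
        p_a = w[k] * d[k]                                    # multiplier 710a
        p_b = w[k + 1] * d[k + 1] if k + 1 < len(w) else 0   # 710b is fed zero on the odd last tap
        acc = acc + p_a + p_b                                # adder 710c via gate circuit 710e
        cycles += 1
    return acc, cycles


w = [1, -5, 20, -5, 1]      # example coefficients W1..W5 (hypothetical)
d = [10, 12, 15, 11, 9]     # example pixel data D1..D5 (hypothetical)
result, cycles = convolve_two_multipliers(w, d)
print(result, cycles)       # 204 3
```

Five taps need three cycles on two multipliers, which is the threefold time penalty relative to five parallel multipliers discussed below.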
Patent Document 1: JP-A-6-44291
Patent Document 2: JP-A-62-105287

However, the discrete cosine transformer of Patent Document 1 has a large circuit scale, because it uses large-scale multipliers to perform multiplication at high speed. In addition, because it is designed for general-purpose processing and does not exploit the properties of images, whenever higher computational accuracy is required, the arithmetic units and the overall circuit grow in proportion to that accuracy, which increases power consumption.

In addition, in the information processing apparatus of Patent Document 2, the computation time in the processor is three times that of a configuration with five multipliers.

A filter arithmetic unit according to the present invention performs a multiply-accumulate operation on multipliers and multiplicands using the Booth algorithm, and includes: an input unit to which the multipliers are input; two or more arithmetic units, each of which decodes the output from the input unit according to the Booth algorithm to obtain one or more code data and performs repeated operations to obtain the product of the corresponding multiplicand and each of the one or more code data; an input selection selector that selects an output from the input unit and inputs it to one of the arithmetic units; and a control unit that determines, based on the output from the input unit, the number of repeated operations and the repeated-operation timing in each arithmetic unit, and controls the input selection selector based on the result of that determination.

A motion compensation processing apparatus according to the present invention generates a predicted image and includes: a first filter arithmetic unit that performs a filter operation on input data in the vertical direction; a second filter arithmetic unit that performs a filter operation on input data in the horizontal direction; and a weighting arithmetic unit that weights either the results of the first and second filter arithmetic units or the input data supplied to them. The first and second filter arithmetic units each perform a multiply-accumulate operation on input data and filter coefficients, that is, on multipliers and multiplicands, using the Booth algorithm, and each includes: an input unit to which the multipliers are input; two or more arithmetic units, each of which decodes the output from the input unit according to the Booth algorithm to obtain one or more code data and performs repeated operations to obtain the product of the corresponding multiplicand and each of the one or more code data; an input selection selector that selects an output from the input unit and inputs it to one of the arithmetic units; and a control unit that determines, based on the output from the input unit, the number of repeated operations and the repeated-operation timing in each arithmetic unit, and controls the input selection selector based on the result of that determination.

In the present invention, the number of repeated operations and the repeated-operation timing in each arithmetic unit are determined from the output of the input unit, and the input selection selector is controlled on that basis so that each output of the input unit is routed to one of the arithmetic units. The output of a single input unit can therefore be processed by several shared arithmetic units, which reduces hardware while keeping the number of repeated operations, and hence the processing time, to a minimum.
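To make the benefit of sharing concrete, here is a hypothetical scheduling model; it is not the control logic of control part 31, which the text does not specify at this level. It assumes that each arithmetic unit consumes one nonzero radix-4 Booth code datum per cycle, that zero code data are skipped, and that the input selectors may hand a pixel's code data to any idle unit. The names `booth_digits`, `shared_filter`, and `unshared_cycles`, and the example values, are illustrative only.

```python
def booth_digits(y, bits=8):
    """Radix-4 Booth code data for a signed `bits`-bit value y (bits must be even)."""
    u = y & ((1 << bits) - 1)          # two's-complement bit pattern of y
    digits, prev = [], 0               # prev holds y[2i-1]; y[-1] = 0
    for i in range(0, bits, 2):
        lo, hi = (u >> i) & 1, (u >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def shared_filter(pixels, coeffs, units=5):
    """Each nonzero code datum is one job; every unit takes one job per cycle,
    regardless of which pixel the job came from (assumed routing)."""
    jobs = [(k, 2 * i, d)
            for k, x in enumerate(pixels)
            for i, d in enumerate(booth_digits(x))
            if d != 0]                                  # zero code data are skipped
    acc, cycles = 0, 0
    for start in range(0, len(jobs), units):
        for k, shift, d in jobs[start:start + units]:   # one job per unit this cycle
            acc += (coeffs[k] * d) << shift             # code data x coefficient, then x 2^(2i)
        cycles += 1
    return acc, cycles


def unshared_cycles(pixels):
    """Cycles needed if arithmetic unit k may only work on pixel k."""
    return max((sum(d != 0 for d in booth_digits(x)) for x in pixels), default=0)


pixels = [85, 0, 1, 0, 64]      # example multipliers (hypothetical)
coeffs = [1, -5, 20, -5, 1]     # example filter coefficients (hypothetical)
acc, cycles = shared_filter(pixels, coeffs)
print(acc, cycles, unshared_cycles(pixels))   # 169 2 4
```

Under these assumptions, the pixel 85 alone produces four nonzero code data, so a fixed one-unit-per-pixel assignment needs four cycles, whereas spreading the six jobs over five units needs two.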

According to the present invention, it is possible to provide a filter arithmetic unit using the Booth algorithm, and a motion compensation device, that can reduce the amount of hardware and the power consumption.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In this embodiment, the arithmetic units of a filter arithmetic unit using the Booth algorithm are used efficiently, which reduces the number of repeated operations and improves the processing speed.

First, an image decoding apparatus to which the filter arithmetic unit of this embodiment can be applied is described. As an example, the case in which the filter arithmetic unit executes the filter operations of motion compensation in H.264 and VC-1 is described. Although a motion compensation circuit capable of the filter operations of both the H.264 and VC-1 standards is described, the present invention can of course also be applied to a motion compensation circuit that performs only H.264 filter operations, one that performs only VC-1 filter operations, or filter arithmetic units for other standards such as MPEG (Moving Picture Experts Group)-2 and MPEG-4.

First, the H.264 and VC-1 image decoding apparatuses are described. FIGS. 1 and 2 are block diagrams of decoding apparatuses that decode compressed images encoded in accordance with H.264 and VC-1, respectively. H.264, also called MPEG-4 AVC (Advanced Video Coding), is a compression coding method that can achieve a data compression ratio more than twice that of MPEG-2 and more than 1.5 times that of MPEG-4. VC-1 (Windows Media Video (WMV) 9) (registered trademark) is a video compression technology developed by Microsoft Corporation, with a data compression ratio comparable to that of H.264. These advanced (high-compression) codecs are used in next-generation DVD standards such as HD DVD (High Definition DVD) and Blu-ray Disc.

As shown in FIG. 1, the H.264 image decoding apparatus 200 includes a variable length decoding unit 202, an inverse quantization unit 203, an inverse Hadamard transform unit 204, an adder 205, a deblocking filter 206, a motion compensation unit 212, a weighted prediction unit 211, an intra prediction unit 210, and a monitor 209 that displays a decoded image 208.

The variable length decoding unit 202 receives the variable-length-coded compressed data 201 and decodes it using a conversion table. The decoded data is inversely quantized by the inverse quantization unit 203, inverse-Hadamard-transformed by the inverse Hadamard transform unit 204, and sent to the adder 205. The deblocking filter 206 removes block distortion from the output of the adder 205 to produce the decoded image 208, which is displayed on the monitor 209.

The output of the adder 205 is also input to the intra prediction unit 210, which generates a predicted image 213. The decoded image is also subjected to motion compensation by the motion compensation unit 212 and weighted by the weighted prediction unit 211 to generate a predicted image 213. For I-frame processing, the adder 205 adds the prediction error to the predicted image 213 from the intra prediction unit 210 and outputs the result. For P- and B-frame processing, the switching unit 207 switches the input so that the adder 205 adds the prediction error to the predicted image 213 supplied by the weighted prediction unit 211.

As shown in FIG. 2, the VC-1 image decoding apparatus 220 is configured in substantially the same way as the image decoding apparatus 200 and includes a variable length decoding unit 222, an inverse quantization unit 223, an inverse DCT unit 224, an adder 225, a loop filter 226, a weighted prediction unit 229, a motion compensation unit 230, and a monitor 228 that displays a decoded image 227. The VC-1 image decoding apparatus 220 differs in three respects: it performs no intra prediction, it performs motion compensation after weighted prediction, and it uses the loop filter 226 instead of the deblocking filter 206.

(3-2) Motion compensation unit

FIG. 3 is a block diagram of a motion compensation (MC) unit that executes motion compensation, including filter operations, in accordance with the H.264 and VC-1 standards. The motion compensation unit 300 can serve as the motion compensation unit of either H.264 or VC-1; that is, it can be shared by both standards. It includes filter arithmetic units 302 and 303, selectors 301, 304, 307, 310, and 313, multipliers 305 and 312, adders 306, 308, and 311, and a line memory 309.

In H.264, after the filter arithmetic units 302 and 303 perform the filter operations, an offset weighted interpolation signal is obtained using the weighting coefficients described above, yielding the predicted image 213. The pixel values of reference picture R0 input from input IN are filtered by the vertical filter in the filter arithmetic unit 302 and by the horizontal filter in the filter arithmetic unit 303, and the filtered data is stored in the line memory 309. Next, when the pixel values of reference picture R1 are input from input IN, they are likewise filtered by the filter arithmetic units 302 and 303; the multiplier 305 multiplies the filtered data by a weighting coefficient, and the adder 306 adds an offset value. Meanwhile, the data stored in the line memory 309 passes through the selector 313 and is multiplied by a weighting coefficient in the multiplier 312, and the adder 308 adds the two results to generate the offset weighted interpolation signal W₀X₀ + W₁X₁ + D. The generated data is output from output OUT via the line memory 309.
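The two-pass combination above can be sketched sample by sample. This follows the simplified signal W₀X₀ + W₁X₁ + D exactly as the text states it; the full H.264 explicit weighted bi-prediction additionally rounds, right-shifts by logWD + 1, and clips the result, and those steps are omitted here as they are in the text. The function name and sample values are illustrative assumptions.

```python
def weighted_interpolation(x0_line, x1_line, w0, w1, d):
    """Offset weighted interpolation W0*X0 + W1*X1 + D, sample by sample.

    x0_line: filtered R0 samples held in the line memory
    x1_line: filtered R1 samples arriving from the filter units
    """
    out = []
    for x0, x1 in zip(x0_line, x1_line):
        from_memory = w0 * x0          # line memory 309 -> selector 313 -> multiplier 312
        from_filter = w1 * x1 + d      # filters 302/303 -> multiplier 305 -> adder 306
        out.append(from_memory + from_filter)   # adder 308
    return out


print(weighted_interpolation([10, 20], [30, 40], w0=1, w1=2, d=5))   # [75, 105]
```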

In the case of VC-1, data from input IN passes through the selectors 313 and 310, then from the selector 304 through the multiplier 305 and the adder 306, and is input to the filter arithmetic units 302 and 303 via the selector 301. The result of the filter arithmetic unit 303 is stored as-is in the line memory 309 via the selectors 304 and 307 and output from output OUT. The multipliers 312 and 305 and the adders 311 and 306 perform the following weighting:
H = (iScale × F + iShift + 32) >> 6
Here, F is an input value, and iScale and iShift are weighting factors.
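The weighting formula can be checked directly. In this sketch the constant 32 is half of 2⁶, so the right shift by 6 divides by 64 with rounding to nearest (ties rounded up), and iShift is added before the shift, so it acts in units of 1/64. Any clipping to the pixel range is not shown because the text does not mention it; the function name is an illustrative choice.

```python
def vc1_weight(f, i_scale, i_shift):
    """H = (iScale * F + iShift + 32) >> 6, i.e. F * iScale / 64 plus iShift / 64, rounded."""
    return (i_scale * f + i_shift + 32) >> 6


print(vc1_weight(100, 64, 0))                         # 100: iScale = 64, iShift = 0 leaves F unchanged
print([vc1_weight(f, 32, 0) for f in [0, 1, 2, 3]])   # [0, 1, 1, 2]: halving with rounding
```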

Because the selectors 301, 304, 307, 310, and 313 select the inputs to and outputs from the filter arithmetic units 302 and 303 as appropriate, the motion compensation unit 300 configured in this way is applicable both to H.264, which performs weighting after the filter operation, and to VC-1, which performs weighting before the filter operation.

Next, a filter arithmetic unit that can be used in such a motion compensation unit is described in detail. Although H.264 and VC-1 are used as examples in this embodiment, the filter arithmetic unit can also be used as the filter arithmetic unit for MPEG-4, MPEG-2, and similar standards.

FIG. 4 is a block diagram showing the filter arithmetic units 302 and 303 in detail, that is, the filter arithmetic unit according to this embodiment. Since the filter arithmetic units 302 and 303 have the same configuration, they are described here as filter arithmetic unit 1. This embodiment describes a filter operation that has five filter coefficients and obtains one result from five pixel values, so five arithmetic units 10₁ to 10₅ are provided. In practice, for H.264 the luma signal Gy uses a 6-tap filter and the chroma signal Gc a 2-tap filter, while for VC-1 the luma signal Gy uses a 4-tap filter and the chroma signal Gc a 2-tap filter; in each case a filter arithmetic unit matching the number of filter coefficients is used. Alternatively, only one filter arithmetic unit, or fewer filter arithmetic units than filter coefficients, may be provided and used repeatedly.

The filter arithmetic unit 1 of this embodiment greatly reduces the scale of the filter arithmetic unit by sharing the arithmetic units 10₁ to 10₅, each of which consists of a Booth decoder and a partial product generator and performs repeated operations, and improves the processing speed by using the arithmetic units 10₁ to 10₅ efficiently. In addition, the subtractors 12₁ to 12₅ take the difference between the current image data and the previous image data before the filter operation, which reduces the amount of computation and thus shortens the computation time. Furthermore, because the results of the arithmetic units 10₁ to 10₅ are summed by a single adder 21, the addition can be performed without returning results to their original units even when an adjacent arithmetic unit has been used.
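The text does not spell out how the differences from the subtractors are used. One reading consistent with a linear filter is sketched below as an assumption, not as the patent's circuit: each output equals the previous output plus the filter applied to the tap-wise differences between the current and previous windows. Neighboring pixels in smooth regions differ little, and a small value leaves its upper Booth windows at 000 or 111 (code data 0), so fewer code data need to be processed. The coefficients and pixel row are hypothetical.

```python
def fir_direct(x, c):
    """Reference: y[n] = sum_k c[k] * x[n + k] over every full window."""
    taps = len(c)
    return [sum(c[k] * x[n + k] for k in range(taps)) for n in range(len(x) - taps + 1)]


def fir_incremental(x, c):
    """Hypothetical difference form: y[n] = y[n-1] + sum_k c[k] * (x[n+k] - x[n-1+k])."""
    taps = len(c)
    out = [sum(c[k] * x[k] for k in range(taps))]     # first window computed directly
    for n in range(1, len(x) - taps + 1):
        diffs = [x[n + k] - x[n - 1 + k] for k in range(taps)]   # one subtractor per tap
        out.append(out[-1] + sum(ck * dk for ck, dk in zip(c, diffs)))
    return out


x = [100, 102, 103, 103, 104, 106, 107]    # a smooth row of pixels (example data)
c = [1, -5, 20, -5, 1]                     # example 5-tap coefficients (hypothetical)
print(fir_direct(x, c) == fir_incremental(x, c))     # True
print(max(abs(b - a) for a, b in zip(x, x[1:])))     # 2: the differences are small
```

The equality holds for any linear filter, since the differences simply telescope back to the direct sum.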

The filter arithmetic unit of this embodiment performs multiplication using Booth's algorithm. To make it easier to understand, a multiplier using the second-order (radix-4) Booth algorithm is described first.

Let the multiplier Y be a signed 8-bit integer:

Y = −y[7]·2⁷ + y[6]·2⁶ + y[5]·2⁵ + y[4]·2⁴ + y[3]·2³ + y[2]·2² + y[1]·2¹ + y[0]·2⁰

Then the product P = X × Y with an arbitrary integer multiplicand X is:

P = X × Y = Σ (i = 0 to 3) X × (−2·y[2i+1] + y[2i] + y[2i−1]) × 2²ⁱ, where y[−1] = 0

The circuit that calculates (−2·y[2i+1] + y[2i] + y[2i−1]) is called a Booth decoder, and X × (−2·y[2i+1] + y[2i] + y[2i−1]) × 2²ⁱ is called a partial product. In this specification, the decoded value (−2·y[2i+1] + y[2i] + y[2i−1]) obtained by the Booth decoder is called code data. Furthermore, the circuit that generates the partial products X × (−2·y[2i+1] + y[2i] + y[2i−1]) × 2²ⁱ is called the partial product generation unit; the circuit that generates the partial product for each individual i is called a partial product generator; the circuit that obtains the code data is the Booth decoder; the circuit that multiplies the code data by the multiplicand to obtain the partial product is called the multiplication unit; and the part of the partial product computation that executes ×2²ⁱ is called the bit shift unit.

As shown in Table 1 below, the 3-bit input to the code data (−2·y[2i+1] + y[2i] + y[2i−1]) has only eight combinations, and the code data takes only the values 0, ±1, and ±2. The multiplier can therefore be written as a correspondence (truth table) between these combinations and the values to be added, namely 0, ±X, or ±2X multiplied by 2²ⁱ (the partial products). Moreover, because there are only eight input combinations, the Booth decoder can be built from simple combinational logic.
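The eight-entry correspondence can be enumerated directly from the code data formula. This sketch computes the values from the formula itself rather than transcribing Table 1; `booth_decode` and `TABLE` are illustrative names.

```python
def booth_decode(y_hi, y_mid, y_lo):
    """Code data -2*y[2i+1] + y[2i] + y[2i-1] for one 3-bit window."""
    return -2 * y_hi + y_mid + y_lo


# Enumerate all eight windows (y[2i+1], y[2i], y[2i-1]).
TABLE = {}
for w in range(8):
    y_hi, y_mid, y_lo = (w >> 2) & 1, (w >> 1) & 1, w & 1
    TABLE[(y_hi, y_mid, y_lo)] = booth_decode(y_hi, y_mid, y_lo)

for (y_hi, y_mid, y_lo), code in TABLE.items():
    print(f"{y_hi}{y_mid}{y_lo} -> {code:+d}")
```

Windows 000 and 111 give 0, 001 and 010 give +1, 011 gives +2, 100 gives −2, and 101 and 110 give −1, so only the five values 0, ±1, and ±2 occur.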

Of 0, ±X, and ±2X, 2X can be generated by a 1-bit shift. Because the multiplicand X is represented in two's complement, a negative value is generated by inverting each bit of X and adding 1 to the least significant bit. To implement this, the circuit that generates the code data (−2·y[2i+1] + y[2i] + y[2i−1]) (the Booth decoder) produces, for example, three signals from the multiplier Y: two signals that select the absolute value of the partial product (0, X, or 2X) and one signal that selects inversion. The multiplication unit receives these three signals and selects 0 when the absolute value is 0, the multiplicand X when it is X, and X shifted left by one bit when it is 2X; when inversion is required, it inverts that value to generate the partial product. The bit shift unit that executes ×2²ⁱ simply needs to shift the bit lines by 2i positions.
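The three-signal scheme can be modeled at the bit level. In this sketch, `WIDTH` is a hypothetical partial-product width (12 bits, enough for |2X| up to 2047), and the signal names `one`, `two`, and `neg` are assumptions, since the text does not name them. The ×2²ⁱ bit shift unit is just wiring and is left out.

```python
WIDTH = 12                     # hypothetical partial-product width in bits
MASK = (1 << WIDTH) - 1


def booth_signals(y_hi, y_mid, y_lo):
    """Split one 3-bit window into (one, two, neg) control signals."""
    one = y_mid ^ y_lo                                         # |code| == 1
    two = int((y_hi, y_mid, y_lo) in ((0, 1, 1), (1, 0, 0)))   # |code| == 2
    neg = y_hi                                                 # invert (harmless when |code| == 0)
    return one, two, neg


def partial_product(x, one, two, neg):
    """Select 0, X or 2X, then negate by inverting every bit and adding 1."""
    if two:
        mag = (x << 1) & MASK      # 2X: a 1-bit left shift
    elif one:
        mag = x & MASK             # X as a WIDTH-bit two's-complement pattern
    else:
        mag = 0
    if neg:
        mag = (~mag + 1) & MASK    # invert all bits, add 1 at the LSB
    return mag


def to_signed(v):
    """Interpret a WIDTH-bit pattern as a signed integer."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


print(to_signed(partial_product(358, *booth_signals(1, 0, 0))))   # -716
```

For window 111 the inversion signal is set but the selected magnitude is 0, and inverting 0 and adding 1 wraps back to 0, so no special case is needed.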

FIG. 5 is a block diagram of a multiplier that performs multiplication according to this second-order Booth algorithm. The multiplier 400 has a register F0 that outputs the multiplicand X and a register F7 that outputs the multiplier Y. It further has a partial product generation unit 401 that receives the multiplier Y and the multiplicand X and generates the partial products, and an adder 450 that adds the partial products generated by the partial product generation unit 401. The partial product generation unit 401 has four partial product generators 410, 420, 430, and 440.

As described above, each partial product generator consists of three parts:
- a Booth decoder, which receives predetermined bits of the multiplier Y and generates code data (0, ±1, ±2) according to the Booth algorithm;
- a multiplication unit, which outputs the product of the code data and the multiplicand X;
- a bit shift unit, which shifts the result of the multiplication unit.

Each partial product generator corresponds to one value of i in $X \times (-2y_{2i+1} + y_{2i} + y_{2i-1}) \times 2^{2i}$. For example, if the multiplier Y has 8 bits ($y_0$ to $y_7$), then i = 0 to 3. The generators then compute $X(-2y_1 + y_0 + 0) \times 2^{0}$, $X(-2y_3 + y_2 + y_1) \times 2^{2}$, $X(-2y_5 + y_4 + y_3) \times 2^{4}$ and $X(-2y_7 + y_6 + y_5) \times 2^{6}$, respectively. In FIG. 5, the partial product generators that produce these partial products are 410, 420, 430 and 440, respectively. This embodiment uses an 8-bit multiplier Y as the example input to the Booth decoders, but Y may of course be narrower or wider. In that case, the number of partial product generators is adjusted accordingly.
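The recoding formula can be checked with a short Python sketch. The helper name `booth_digits` is hypothetical. It assumes Y is a two's-complement value, which is why four digits suffice for 8 bits; an unsigned 8-bit Y of 128 or more would need a fifth digit. Digit i is the code data that partial product generator 410, 420, 430 or 440 would produce for i = 0, 1, 2 or 3.

```python
def booth_digits(y: int, nbits: int) -> list:
    """Radix-4 Booth recoding of an nbits-wide two's-complement multiplier.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] with y[-1] = 0; bit positions
    above nbits-1 repeat the top bit (sign extension). Digits are returned
    least significant first, so Y == sum(d * 4**i).
    """
    def bit(k: int) -> int:
        if k < 0:
            return 0                          # y[-1] = 0
        return (y >> min(k, nbits - 1)) & 1   # sign-extend past the top bit

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range((nbits + 1) // 2)]


# Y = 123 (7BH): digits for i = 0..3
print(booth_digits(123, 8))   # [-1, -1, 0, 2]
```

Summing `d * 4**i` over the returned digits gives back Y. This is the same identity as $Y = \sum_i (-2y_{2i+1} + y_{2i} + y_{2i-1}) \cdot 2^{2i}$.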

Next, the operation of the multiplier 400 is described using a concrete calculation. The 8-bit multiplier Y can be represented as shown in FIG. 6(a). The multiplier is divided into 2-bit groups. Each code datum is obtained from 3 bits: the two bits of a group plus the most significant bit of the next lower group (with $y_{-1} = 0$). Multiplying each code datum by the multiplicand X and applying the corresponding bit shift ($\times 2^{2i}$) produces a partial product. Accordingly, as shown in FIG. 6(b), the register F7 is a shift register that outputs the 8-bit multiplier Y = $\{y_0, \ldots, y_7\}$. The partial product generator 410 receives the lower two bits $\{y_0, y_1\}$ of Y. The partial product generators 420, 430 and 440 receive $\{y_1, y_2, y_3\}$, $\{y_3, y_4, y_5\}$ and $\{y_5, y_6, y_7\}$, respectively. The partial product generator 410 has three parts: a Booth decoder 411 that generates code data from its input bits, a multiplication unit 412 that multiplies the code data by the multiplicand X, and a bit shift unit 413 that shifts the product by a predetermined number of bits. The other partial product generators 420, 430 and 440 are configured in the same way. The example used here is the multiplication of the multiplicand X = 358 (166H) by the multiplier Y = 123 (7BH). Table 2 below shows each output value in the calculation process.

$$
\begin{aligned}
X \times Y &= 358 \times 123 = 44034 \ (\mathrm{AC02H}) \\
Y &= 123 \ (\mathrm{7BH}) \\
  &= (-2 \cdot 0 + 1 + 1) \cdot 2^{6} + (-2 \cdot 1 + 1 + 1) \cdot 2^{4} + (-2 \cdot 1 + 0 + 1) \cdot 2^{2} + (-2 \cdot 1 + 1 + 0) \cdot 2^{0} \\
  &= 2 \cdot 2^{6} + 0 \cdot 2^{4} + (-1) \cdot 2^{2} + (-1) \cdot 2^{0}
\end{aligned}
$$

Therefore:

$$
\begin{aligned}
X \times Y ={} & \{(2 \times 358) \times 2^{6}\} && \text{computed by partial product generator 440} \\
 & + \{(0 \times 358) \times 2^{4}\} && \text{computed by partial product generator 430} \\
 & + \{(-1 \times 358) \times 2^{2}\} && \text{computed by partial product generator 420} \\
 & + \{(-1 \times 358) \times 2^{0}\} && \text{computed by partial product generator 410}
\end{aligned}
$$

First, the multiplicand input unit F0 supplies 358 to each of the partial product generators 410, 420, 430 and 440. The multiplier input unit F7 supplies the following bits to the partial product generators 410, 420, 430 and 440, respectively: $\{y_0, y_1\} = \{1, 1\}$, $\{y_1, y_2, y_3\} = \{1, 0, 1\}$, $\{y_3, y_4, y_5\} = \{1, 1, 1\}$ and $\{y_5, y_6, y_7\} = \{1, 1, 0\}$. From these bits, the Booth decoders 411, 421, 431 and 441 output code data corresponding to $(-2y_{2i+1} + y_{2i} + y_{2i-1})$. Specifically, they compute $(-2y_1 + y_0 + 0)$, $(-2y_3 + y_2 + y_1)$, $(-2y_5 + y_4 + y_3)$ and $(-2y_7 + y_6 + y_5)$. Applying these expressions to this example, the Booth decoders 411, 421, 431 and 441 output −1, −1, 0 and 2, respectively.

The multiplication units 412, 422, 432 and 442 each compute the product of their code data and the multiplicand X. They pass the results to the bit shift units 413, 423, 433 and 443, respectively. The bit shift unit 413 passes its input to the adder 450 unchanged; it is drawn in this example only for clarity and need not be provided. The bit shift units 423, 433 and 443 shift their inputs by 2, 4 and 6 bits, respectively, and then pass them to the adder 450.

The adder 450 of this example has full adders 451 and 452, a half adder 453, and a register 454 that receives the result. The adder 450 sums the values from the bit shift units 413, 423, 433 and 443 and outputs the total as the multiplication result P.
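The whole datapath of the multiplier 400 can be modelled end to end in a few lines of Python. In this sketch, decoder, multiplication unit and bit shift unit run once per generator, and the adder works modulo a fixed word width. The function names are hypothetical. The 20-bit adder width is an assumption chosen to comfortably hold this example's product. The sketch shows that summing the sign-extended two's-complement partial products, as a hardware adder does, reproduces 358 × 123.

```python
# Hypothetical behavioural model of multiplier 400 (FIG. 5): each partial
# product generator 410-440 = Booth decoder + multiplication unit + bit shift
# unit, and adder 450 sums the shifted partial products modulo 2**WIDTH.
# The 20-bit width is an assumption, not a value from the patent.

WIDTH = 20
MASK = (1 << WIDTH) - 1


def to_signed(word: int) -> int:
    return word - (1 << WIDTH) if word >> (WIDTH - 1) else word


def partial_products(x: int, y: int, nbits: int = 8) -> list:
    """Shifted partial products, one per generator, as WIDTH-bit words."""
    def bit(k: int) -> int:
        return 0 if k < 0 else (y >> min(k, nbits - 1)) & 1

    words = []
    for i in range((nbits + 1) // 2):
        digit = -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)  # Booth decoder
        product = digit * x                                         # multiplication unit
        words.append((product << (2 * i)) & MASK)                   # bit shift unit
    return words


def adder_450(words: list) -> int:
    """Add two's-complement words modulo 2**WIDTH and read the sum as signed."""
    return to_signed(sum(words) & MASK)


pp = partial_products(358, 123)
print([to_signed(w) for w in pp])          # [-358, -1432, 0, 45824]
print(adder_450(pp), hex(adder_450(pp)))   # 44034 0xac02
```

The four signed partial products match the terms of the worked example: −358, −1432, 0 and 716 × 64 = 45824.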

Thus, with the second-order Booth algorithm, the multiplier is expressed as code data (0, ±1, ±2) × $2^{2i}$ and multiplied by the multiplicand, which roughly halves the number of partial products. The adder then has roughly half as many partial products to sum, so the multiplier can be made smaller.

A filter arithmetic unit built from such partial product generation units, that is, from multipliers like the one in FIG. 5, takes the form of the arithmetic circuit shown in FIG. 7. FIG. 7 shows a filter arithmetic unit of conventional configuration. As described above, an 8-bit multiplier requires four partial product generators and a 10-bit multiplier requires five. For simplicity, FIG. 7 shows only three partial product generators.

The filter arithmetic unit 501 of FIG. 7 has the following components:
- registers (flip-flops, FFs) 502, 510, 511, 513 and 516;
- partial product generators 503 to 505, containing Booth decoders 506 to 508, respectively;
- an adder 509 and adders 512 and 514;
- a limiter circuit 515.

Pixel data is input as the multiplier Y and held in FF 502. FF 502 feeds the relevant bits to the partial product generators 503 to 505, which generate partial products. The adder 509 sums them and passes the upper and lower bits to FF 510 and FF 511, respectively. The adder 512 adds the values from FF 510 and FF 511 and outputs the sum to FF 513. The adder 514 adds the value from FF 513 and the constant B. The limiter circuit 515 limits the output of the adder 514 to a range such as 0 to 255 and outputs the result to FF 516.

This filter arithmetic unit executes the operation
[Output pixel] = Lim([Input pixel] × A + B)
where A is a filter coefficient and B is a predetermined constant added as needed in each filter operation. In a conventional filter arithmetic unit, data is read from external memory or the like in bursts. High-speed operation is normally achieved by pipelining through a large multiplier. For 10-bit input pixel data, for example, five partial product generators are required, so the circuit is large and consumes correspondingly more power.

In the present embodiment, the partial product generators 503 to 505 shown in FIG. 7 are therefore replaced by a single partial product generator that is used repeatedly. This reduces the circuit scale and the power consumption. This embodiment performs a 5-tap filter operation, as described later, so it has five arithmetic units that perform such repeated operations. For the single-coefficient example of FIG. 7, one arithmetic unit would suffice. The calculation time is also shortened by making effective use of the arithmetic units 10₁ to 10₅. Furthermore, using difference data between adjacent pixel values keeps the operands small, which shortens the processing time further.

Returning to FIG. 4, the filter arithmetic unit 1 uses the Booth algorithm to compute a product-sum of pixel values (the multipliers) and filter coefficients (the multiplicands). It has the following components:
- an input unit to which the pixel values are input;
- two or more arithmetic units 10ⱼ, each of which repeatedly decodes the output of the input unit according to the Booth algorithm to obtain one or more code data, and computes the product of the corresponding filter coefficient with each of those code data;
- input selectors 13ⱼ, each of which selects an output of the input unit and feeds it to one of the arithmetic units;
- a control unit 31 that, based on the output of the input unit, determines the number and timing of the repeated operations in each arithmetic unit 10ⱼ and controls the input selectors 13ⱼ accordingly.

The filter arithmetic unit 1 also has an adder 21, a register 22, a limiter circuit 23, a register 24 and a selector 25. The adder 21 sums the outputs of all the arithmetic units 10ⱼ, and the register 22 holds the value of the adder 21. The limiter circuit 23 limits the value of the register 22, and the register 24 holds the output of the limiter circuit 23. The selector 25 outputs either the output of the register 22 or 0.

The input unit has the following components:
- registers F00 to F04, which store the current pixel values;
- registers F01 to F05, which store the corresponding previous pixel values;
- subtractors 12ⱼ (12₁ to 12₅), which compute the difference data between them;
- selectors 11₁ to 11₅, which output either the pixel values of F01 to F05 or 0;
- registers F06 to F09 and F0A, which store the subtraction results.

The arithmetic units 10₁ to 10₅ have the same configuration, so only 10₁ is described. The arithmetic unit 10₁ has the following components:
- a selector 14₁, which selects and outputs, from the value stored in register F06, the 3 bits corresponding to the current repetition (described later);
- a Booth decoder 15₁, which decodes the value selected by the selector 14₁ according to the Booth algorithm to generate code data;
- a multiplication unit 16₁, which multiplies the code data by a coefficient (filter coefficient A);
- a bit shift unit 17₁, which performs a bit shift according to the repetition;
- a repetition count determination unit 18₁, which determines the number of multiplications performed by the multiplication unit 16₁.

The Booth decoder 15₁, the multiplication unit 16₁ and the bit shift unit 17₁ together form a partial product generator. The repetition count determination unit 18₁ controls the selector 14₁ and the bit shift unit 17₁ based on the number and timing of the repeated operations. Alternatively, the control unit 31 may perform this control.
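One arithmetic unit with a single, reused partial product generator can be sketched as a loop in Python: one Booth digit per cycle, multiplied by the coefficient and shifted. The function name and the coefficient value 5 are hypothetical. The iteration count is passed in here; in the device it would come from the repetition count determination unit 18₁ or the control unit 31. Python integers stand in for the fixed-width hardware accumulator.

```python
def operation_unit(coeff: int, diff9: int, iterations: int) -> int:
    """Hypothetical model of one arithmetic unit 10_j with a single,
    reused partial product generator.

    diff9 is the 9-bit two's-complement word held in F06..F0A. On
    iteration i, selector 14 picks (y[2i+1], y[2i], y[2i-1]), Booth
    decoder 15 turns it into a digit, multiplication unit 16 forms
    digit * coeff, and bit shift unit 17 scales it by 2**(2i).
    """
    def bit(k: int) -> int:
        return 0 if k < 0 else (diff9 >> min(k, 8)) & 1   # y[-1] = 0, sign-extend y[8]

    acc = 0
    for i in range(iterations):                 # one partial product per cycle
        digit = -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
        acc += (digit * coeff) << (2 * i)
    return acc


# Hypothetical coefficient 5 with differences 33 and -5 (9-bit patterns).
print(operation_unit(5, 33, 4))            # 165 after 4 iterations
print(operation_unit(5, -5 & 0x1FF, 2))    # -25 after 2 iterations
```

Small differences need fewer cycles: −5 is finished after two iterations, whereas 33 needs four.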

Here, with A to E as the filter coefficients, the filter arithmetic unit 1 is described as performing the following operation:
[Output pixel] = Lim(A*F1 + B*F2 + C*F3 + D*F4 + E*F5)
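The following Python reference computes the 5-tap formula directly. A bit-serial datapath could be checked against it. The names `lim` and `five_tap` are hypothetical. The 0 to 255 limiter range assumes 8-bit output pixels, following the example range given for the conventional limiter. The coefficient values in the usage line are arbitrary, not taken from the patent.

```python
def lim(value: int, lo: int = 0, hi: int = 255) -> int:
    """Limiter: clamp to [lo, hi]; 0..255 assumes 8-bit output pixels."""
    return max(lo, min(hi, value))


def five_tap(pixels, coeffs) -> int:
    """[Output pixel] = Lim(A*F1 + B*F2 + C*F3 + D*F4 + E*F5)."""
    if len(pixels) != 5 or len(coeffs) != 5:
        raise ValueError("expected five pixels and five coefficients")
    return lim(sum(c * p for c, p in zip(coeffs, pixels)))


print(five_tap([10, 11, 6, 39, 34], [1, 2, 3, 2, 1]))   # 162 (arbitrary coefficients)
```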

In the filter arithmetic unit 1, data from external memory normally arrives in bursts, so data is not necessarily input continuously. In addition, adjacent pixels in image data are relatively strongly correlated, so the difference between neighboring pixels is relatively small. Exploiting these two properties allows a small partial product generator to be used and the circuit scale to be reduced substantially. When the difference from the preceding data is small, data can be output almost continuously. When the difference is exceptionally large and the multiplication takes longer, the gaps between bursts absorb the extra time. Processing therefore remains possible without significant performance degradation. The smaller circuit also reduces power consumption.

Furthermore, when a filter operation with five filter coefficients is performed, as in this embodiment, processing the operations for each coefficient in parallel at high speed can lower computational efficiency. The desired speed may then not be reached, or the control may become complicated. In this embodiment, therefore, input selectors 13₁ to 13₅ are placed in front of the arithmetic units 10₁ to 10₅. Arithmetic units that are idle because they have finished their own operations are used to improve computational efficiency and speed up processing.

The filter arithmetic unit 1 according to this embodiment is now described in more detail. The subtractors 12₁ to 12₅ subtract the current pixel values held in registers F00 to F04 from the preceding pixel values held in registers F01 to F05 to obtain difference data. The reason is as follows. FIG. 8 shows the amplitude distribution of the difference signal between horizontally adjacent pixels of an image (Image Information Compression, ed. Institute of Television Engineers of Japan, p. 71). The horizontal axis is amplitude and the vertical axis is frequency of occurrence. The difference signal is concentrated in a narrow range around 0, so computing it with the subtractors 12₁ to 12₅ yields values close to 0. Feeding difference data close to 0 minimizes the number of repeated operations (described later) and thus shortens the processing time. These values are held in F06 to F09 and F0A. In this embodiment, the pixel values are 8 bits and the subtraction results are 9 bits including the sign bit.
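The subtraction stage can be sketched in Python as follows. The function names are hypothetical. The register list follows the convention stated above: F01 to F05 hold the preceding pixels, and each subtractor computes preceding minus current. The sketch also shows why 9 bits suffice: differences of 8-bit pixels lie in −255 to 255.

```python
def differences(regs: list) -> list:
    """regs = [F00, F01, ..., F05]; F01..F05 hold the preceding pixels.

    Subtractor 12_(k+1) computes F0(k+1) - F0k (preceding minus current),
    giving the five values stored in F06..F09 and F0A.
    """
    return [regs[k + 1] - regs[k] for k in range(5)]


def to_9bit(value: int) -> int:
    """Two's-complement 9-bit pattern; 8-bit pixel differences lie in -255..255."""
    if not -256 <= value <= 255:
        raise ValueError("does not fit in 9 bits")
    return value & 0x1FF


diffs = differences([10, 11, 6, 39, 34, 35])
print(diffs)                                       # [1, -5, 33, -5, 1]
print([format(to_9bit(d), "09b") for d in diffs])
```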

Next, the control unit 31 uses the subtraction results in registers F06 to F09 and F0A to obtain the number of repeated operations each result requires and their timing. From these, it decides the number and timing of the repeated operations in each of the arithmetic units 10₁ to 10₅. For example, the output of register F07 is processed not only by the arithmetic unit 10₂ but, as needed, also by the arithmetic units 10₁ and 10₃.

A concrete numerical example follows. Suppose the values 10, 11, 6, 39, 34 and 35 are input to F00 to F05, as in Table 3 below. The pixel differences are then 1, −5, 33, −5 and 1. The code data used at each weight (×2⁰, ×2², ×2⁴, ×2⁶) are as shown in the table. The arithmetic units 10₁ to 10₅ therefore require 1, 2, 4, 2 and 1 repeated operations, respectively. When an arithmetic unit 10₁ to 10₅ processes the output of its own corresponding register (F06 to F09 or F0A), this is called that unit's own operation.
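The contents of Table 3 can be reproduced in Python with the following sketch. It assumes that a unit's repetition count is the number of data groups up to and including the highest non-zero Booth digit. This assumption matches the counts stated above, including four for 33 even though that value's group S1 decodes to 0. The helper names are hypothetical.

```python
def booth_digits9(v9: int) -> list:
    """Five radix-4 Booth digits (groups S0..S4) of a 9-bit two's-complement word."""
    def bit(k: int) -> int:
        return 0 if k < 0 else (v9 >> min(k, 8)) & 1
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(5)]


def iteration_count(v9: int) -> int:
    """Groups up to and including the highest non-zero digit (0 if all zero)."""
    nonzero = [i for i, d in enumerate(booth_digits9(v9)) if d != 0]
    return nonzero[-1] + 1 if nonzero else 0


diffs = [1, -5, 33, -5, 1]
for d in diffs:
    print(d, booth_digits9(d & 0x1FF))
counts = [iteration_count(d & 0x1FF) for d in diffs]
print(counts, sum(counts))   # [1, 2, 4, 2, 1] 10
```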

The control unit 31 controls the selectors 13₁ to 13₅ so that the numbers of repeated operations in the arithmetic units 10₁ to 10₅ are as uniform as possible. An arithmetic unit with many repeated operations borrows arithmetic units with few. In this example:
- the arithmetic unit 10₁ performs its own operation and one operation of 10₂;
- 10₂ performs one of its own operations and one of 10₃;
- 10₄ performs one of its own operations and one of 10₃;
- 10₅ performs its own operation and one of 10₄.

As a result, each of the arithmetic units 10₁ to 10₅ performs two operations. Without sharing, the calculation time is set by the largest repetition count, four operations. Providing the selectors 13₁ to 13₅ so that adjacent arithmetic units 10₁ to 10₅ can be shared, as in this embodiment, halves the number of repeated operations and speeds up the processing.

In this embodiment, the multiplicands in the arithmetic units 10₁ to 10₅ are the filter coefficients A to E. Suppose instead that the multiplicands are the pixel values and the multipliers are the filter coefficients, so that the multipliers are fixed values. The number of operations is then known in advance. In that case the control unit 31 can be omitted. The outputs of the registers F06 to F09 and F0A are simply distributed in advance so that the arithmetic units 10₁ to 10₅ perform equal numbers of operations.

Next, this section describes how the control unit 31 determines the number and timing of repeated operations. As Table 3 shows, the repetition counts of the arithmetic units 10₁ to 10₅ sum to 10, so each unit needs to perform only two repeated operations. The control unit 31 therefore switches the selectors 13₁ to 13₅ so that each arithmetic unit performs exactly two. The resulting assignment is:
- 10₁ performs its own first iteration and the second iteration of 10₂;
- 10₂ performs its own first iteration and the third iteration of 10₃;
- 10₃ performs its own first and second iterations;
- 10₄ performs its own first iteration and the fourth iteration of 10₃;
- 10₅ performs its own first iteration and the second iteration of 10₄.
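The balancing step can be sketched with a simple policy in Python, which is not necessarily the one the control unit 31 uses. The policy lists every required iteration in unit order and cuts the list into consecutive chunks of ceil(total / units). The function name `balance` and this policy are assumptions. For the example, the policy gives each unit two iterations and moves work only between neighbors. It differs from the patent's assignment only in which iteration indices each unit takes.

```python
import math


def balance(counts: list) -> list:
    """Hypothetical load-balancing policy (not necessarily the patent's).

    Every iteration is listed as (source unit, iteration index) in unit
    order, and the list is cut into consecutive chunks of
    ceil(total / units), one chunk per arithmetic unit.
    """
    units = len(counts)
    per_unit = math.ceil(sum(counts) / units)
    jobs = [(src, it) for src, n in enumerate(counts) for it in range(n)]
    return [jobs[u * per_unit:(u + 1) * per_unit] for u in range(units)]


for unit, work in enumerate(balance([1, 2, 4, 2, 1])):
    print(f"unit 10_{unit + 1}:", [(f"10_{s + 1}", it + 1) for s, it in work])
```

With strongly skewed counts, contiguous chunks can move work beyond immediate neighbors. The selectors would then need the wider connectivity mentioned at the end of this section.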

With the conventional method, on the other hand, a single arithmetic unit needs up to four repeated operations, and the total is 17, as shown in Table 4. In this case, even sharing adjacent arithmetic units leaves the overall count at four, because 17 operations spread over five units rounds up to four each. When the differences between adjacent pixels are taken, a single arithmetic unit still needs up to four repeated operations, as before. Additionally sharing adjacent arithmetic units, however, reduces the count substantially, to two, as described above. In this example, the number of repeated operations does not decrease unless differences are taken. Sharing can be extended beyond horizontally adjacent arithmetic units, as used here, to vertically adjacent ones, as described later. This further improves the ability to reduce the number of operations.

FIG. 9 compares this embodiment with a reference example. It shows the effect of taking differences between adjacent pixels and sharing adjacent arithmetic units. The reference example only takes the differences and does not share arithmetic units. As FIG. 9 shows, up to four repeated operations would otherwise be needed. In this embodiment, all operations complete in two, which doubles the calculation speed.

A specific method for controlling the selectors 13ⱼ is as follows. The control unit 31 obtains the total number of repeated operations across all arithmetic units 10ⱼ. It divides this total by the number of units and rounds up to an integer to obtain the average per arithmetic unit. It then controls the selectors 13ⱼ so that each unit performs that number of repeated operations. The repetition count depends on the pattern of the 9 bits. One way to count it is to prepare in advance a table 41 that maps each 9-bit pattern to its repetition count. As shown in FIG. 10(a), the control unit looks up the count for each arithmetic unit 10ⱼ in this table and controls the selectors 13ⱼ according to the total.

Another way to obtain the total is to detect where the bit value changes. The method examines the sign bits, starting from the most significant of the 9 bits stored in each of the registers F06 to F09 and F0A. Define the data groups S0 = $(y_{-1}, y_0, y_1)$, S1 = $(y_1, y_2, y_3)$, S2 = $(y_3, y_4, y_5)$, S3 = $(y_5, y_6, y_7)$ and S4 = $(y_7, y_8, y_8)$. For −5, for example, scanning down from the top bit $y_8$ shows that every bit down to $y_3$ is 1 and bit $y_2$ is 0. The change point therefore lies in data group S1. Only data groups S0 and S1 then need to be computed, and the number of repeated operations is two. In other words, only the data groups from the change point downward need to be computed. For 33, the change point lies in data group S3. The four groups S0 to S3 are therefore computed, and the number of repeated operations is four. Adding the counts obtained this way gives the total. Dividing the total by the number of arithmetic units 10₁ to 10₅ gives the average.
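The change-point rule can be computed without a serial scan, using word-level bit operations in Python. The sketch appends $y_{-1} = 0$ below the 9-bit word. It then XORs the word with itself shifted by one to mark every position where adjacent bits differ. The highest marked position gives the count. The names, and the use of this rule to fill a 512-entry lookup in the spirit of table 41, are assumptions for illustration. The sketch reproduces the counts 2 for −5 and 4 for 33 given above.

```python
import math


def change_point_count(v9: int) -> int:
    """Repetition count from the change point of a 9-bit two's-complement word.

    Appending y[-1] = 0 gives a 10-bit word w; bit m of w ^ (w >> 1) is set
    where w[m] != w[m+1]. If the highest such m is the first change found
    scanning from the top, groups S0..S(m // 2) are needed, i.e.
    m // 2 + 1 iterations, which equals (x.bit_length() + 1) // 2.
    An all-zero word needs none.
    """
    w = (v9 & 0x1FF) << 1               # y8..y0, then y[-1] = 0
    x = (w ^ (w >> 1)) & 0x1FF          # mismatches between adjacent bits
    return (x.bit_length() + 1) // 2


# A precomputed 512-entry lookup, in the spirit of table 41 (hypothetical).
TABLE_41 = [change_point_count(v) for v in range(512)]

diffs = [1, -5, 33, -5, 1]
counts = [TABLE_41[d & 0x1FF] for d in diffs]
average = math.ceil(sum(counts) / len(counts))
print(counts, average)   # [1, 2, 4, 2, 1] 2
```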

Yet another way to obtain the total is to detect, for each data group, whether it is (000) or (111). For such a group the second-order Booth decoding result is 0, so no operation is needed. The check may proceed from the upper bits, from the lower bits, or on all bits simultaneously. For 33, for example, only data groups S0, S2 and S3 need to be computed, so the number of repeated operations is three. Simply noting that 33 needs 6 bits would give four operations. This method therefore reduces the count further.
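The zero-group test can be sketched in Python as follows. The function name `needed_groups` is hypothetical. It lists the data groups whose three bits are not all equal; only these groups yield a non-zero Booth digit. The list doubles as the data group information, and its length is the repetition count.

```python
def needed_groups(v9: int) -> list:
    """Data groups whose bits are not all equal, i.e. not (000) or (111).

    S0 = (y[-1], y0, y1), S1 = (y1, y2, y3), ..., S4 = (y7, y8, y8);
    only these groups produce a non-zero Booth digit.
    """
    def bit(k: int) -> int:
        return 0 if k < 0 else (v9 >> min(k, 8)) & 1

    groups = []
    for i in range(5):
        triple = (bit(2 * i - 1), bit(2 * i), bit(2 * i + 1))
        if triple not in ((0, 0, 0), (1, 1, 1)):
            groups.append(i)
    return groups


print(needed_groups(33))            # [0, 2, 3] -> 3 iterations
print(needed_groups(-5 & 0x1FF))    # [0, 1]    -> 2 iterations
```

Applied to the five example differences, this method gives 1 + 2 + 3 + 2 + 1 = 9 operations in total, one fewer than the change-point method.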

FIG. 10(b) shows an example circuit that detects, for each data group, whether it is (000) or (111). The 9-bit data is divided into the data groups S0 to S4, which are fed to determination units 51 to 55, respectively. Each determination unit outputs, for example, 0 if its group is (000) or (111) and 1 otherwise. A table 56 outputs the number of repeated operations according to the outputs of the determination units 51 to 55. It also outputs information on which data groups must be computed, that is, information indicating the repeated operation timing (hereinafter, data group information).

FIG. 10(c) shows an example of a concrete circuit that determines the number of repeated operations by detecting the position of the change point. The image data is input to FF 61 starting from the upper bits. A comparator 62 compares the higher bit held in FF 61 with the next, lower bit as it is input. It outputs, for example, 0 on a match and 1 on a mismatch. A counter 63 is a down counter that counts from 9 to 0. When a 1 is input, a count determination unit 64 uses the current counter value to determine the number of repeated operations. It outputs this number to the selector 14 and the bit shift unit 17. In this way, the control unit 31 obtains the number of repeated operations and the data group information that indicates which data groups need to be computed. Alternatively, the repetition count determination units 18₁ to 18₅ may determine the repetition counts and data group information. The control unit 31 then decides, based on these, which operations each arithmetic unit 10₁ to 10₅ executes.
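The circuit of FIG. 10(c) can be simulated cycle by cycle in Python under assumed conventions. The text does not say how the counter value maps to a repetition count. This sketch assumes that $y_{-1} = 0$ is shifted in as a tenth bit, so that the down counter's range of 9 to 0 covers every position. It also assumes that unit 64 converts the counter value m at the first mismatch into m // 2 + 1. Under these assumptions, the simulation agrees with the Booth-digit count for every 9-bit value.

```python
def serial_count(v9: int) -> int:
    """Cycle-by-cycle sketch of FIG. 10(c), under assumed conventions.

    Bits enter MSB first, with y[-1] = 0 appended as a tenth bit so the
    down counter runs from 9 to 0. FF 61 holds the previous (higher) bit,
    comparator 62 flags the first mismatch, and count determination unit
    64 maps the counter value m at that moment to m // 2 + 1.
    """
    bits = [(v9 >> k) & 1 for k in range(8, -1, -1)] + [0]  # y8 .. y0, y[-1]
    counter = 9                   # down counter 63
    ff = bits[0]                  # FF 61 latches y8
    for b in bits[1:]:
        counter -= 1
        if ff != b:               # comparator 62 outputs "1"
            return counter // 2 + 1
        ff = b
    return 0                      # no change point: the word is all zeros


print(serial_count(33), serial_count(-5 & 0x1FF))   # 4 2
```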

Selection units 13_1 to 13_5 select inputs from registers F06 to F09 and F0A according to the number of repetitive operations and the data group information. Specifically, in this example, selection unit 13_1 selects the value of register F06, (y_1, y_0, 0) = (0, 1, 0), in the first repetition and the value of register F07, (y_3, y_2, y_1) = (1, 0, 1), in the second. Similarly, selection unit 13_2 selects the value of register F07, (y_1, y_0, 0) = (1, 1, 0), in the first repetition and the value of register F08, (y_5, y_4, y_3) = (1, 0, 0), in the second. Selection unit 13_3 selects the value of register F08, (y_1, y_0, 0) = (0, 1, 0), in the first repetition and register F08 again, (y_3, y_2, y_1) = (0, 0, 0), in the second. Selection unit 13_4 selects the value of register F09, (y_1, y_0, 0) = (1, 1, 0), in the first repetition and the value of register F08, (y_7, y_6, y_5) = (0, 0, 1), in the second. Selection unit 13_5 selects the value of register F0A, (y_1, y_0, 0) = (0, 1, 0), in the first repetition and the value of register F09, (y_3, y_2, y_1) = (1, 0, 1), in the second. The four groups of F08 are thus split among selection units 13_2, 13_3 and 13_4, so every selection unit finishes in two repetitions.

In this embodiment, sharing is described as possible only between adjacent arithmetic units, but arithmetic units 10_1 and 10_5 may also share with each other. Alternatively, any arithmetic unit may be allowed to share with any other, for example arithmetic unit 10_1 with arithmetic unit 10_4.

As described above, Booth decoder 15_k obtains the code data shown in Table 3 from each 3-bit data group, and multiplication unit 16_k multiplies the code data by filter coefficient A and outputs the result to bit shift unit 17_k.

The data group information is input to bit shift units 17_1 to 17_5, which, based on this information (the repetitive-operation timing), apply the appropriate shift by 0, 1, 2 or 3 group positions (×2^0, ×2^2, ×2^4 or ×2^6). Furthermore, because this embodiment has a single adder 21 downstream of bit shift units 17_1 to 17_5 that adds the operation results of all arithmetic units 10_1 to 10_5, output data originating from F02, for example, can be computed in the neighboring arithmetic unit 10_1 and fed directly to adder 21 without being returned to its original arithmetic unit 10_2.
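The decode, multiply, and shift steps can be checked end to end with the sketch below. It assumes that Table 3 is the standard radix-4 Booth table (code = -2·y[2g+1] + y[2g] + y[2g-1], i.e. one of -2, -1, 0, +1, +2) and a 9-bit two's-complement input whose top group is sign-extended; `booth_digit` and `booth_multiply` are hypothetical names. Summing every shifted partial product in one accumulator mirrors the single adder 21: it does not matter which arithmetic unit computed a given group.

```python
def booth_digit(y_hi, y_mid, y_lo):
    """Radix-4 Booth code for the group (y[2g+1], y[2g], y[2g-1]): -2..+2."""
    return -2 * y_hi + y_mid + y_lo


def booth_multiply(x, coeff, bits=9):
    """Multiply a two's-complement `bits`-bit x by coeff via radix-4 Booth."""
    raw = x & ((1 << bits) - 1)

    def bit(i):
        # y[-1] = 0; bits above the MSB are sign-extended
        return 0 if i < 0 else (raw >> min(i, bits - 1)) & 1

    total = 0                                   # single accumulator, like adder 21
    for g in range((bits + 1) // 2):            # groups S0..S4 for 9 bits
        code = booth_digit(bit(2 * g + 1), bit(2 * g), bit(2 * g - 1))
        if code:                                # (000)/(111) groups are skipped
            total += (code * coeff) << (2 * g)  # x 2**(2g): shift by g group positions
    return total
```

Because the loop skips zero codes, the number of multiplications actually performed equals the number of active groups, which is what the repetition count controls in hardware.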

In this example all operations finish in two repetitions, as described above, so adder 21 adds the results of the two repetitions to the previous operation value held in register 22 and outputs the sum. Adder 21 thus adds the partial products obtained from data groups S0 to S4 and also adds the current sum to the previous sum, which yields the filter result for the current pixel data. That is, because both the previous and the current sums consist of partial products of difference data and filter coefficient A, adding them gives the filter result for the actual pixel data rather than for the difference data.
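Accumulating difference products gives the true filter output because the operation is linear: for each tap, A·x_cur = A·x_prev + A·(x_cur - x_prev). When the tap window slides by one pixel, each tap's difference x_cur - x_prev is exactly the difference between two horizontally adjacent pixels. The sketch below checks this numerically; the 6-tap coefficients (1, -5, 20, 20, -5, 1) from H.264 half-sample interpolation and the sample row are illustrative assumptions, and `differential_filter` is a hypothetical name.

```python
def direct_filter(window, coeffs):
    """Ordinary FIR tap sum for one output sample."""
    return sum(c * x for c, x in zip(coeffs, window))


def differential_filter(windows, coeffs):
    """Filter outputs computed only from per-tap input differences.

    acc plays the role of register 22: each new output is the previous
    output plus the coefficient-weighted differences.
    """
    outputs = []
    prev = [0] * len(coeffs)   # first pass: previous data taken as 0
    acc = 0
    for window in windows:
        diffs = [cur - old for cur, old in zip(window, prev)]
        acc += sum(c * d for c, d in zip(coeffs, diffs))
        outputs.append(acc)
        prev = window
    return outputs


row = [100, 102, 105, 107, 110, 111, 113, 116, 118]  # made-up smooth row
taps = (1, -5, 20, 20, -5, 1)
windows = [row[n:n + 6] for n in range(len(row) - 5)]
```

After the first window, every difference fed to the multipliers here is between 1 and 3, while the raw pixel values are around 100 to 118.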

If necessary, adder 21 also adds a coefficient a or the like, and outputs the operation result to limiter circuit 23. Limiter circuit 23 clamps the result to a range such as 0 to 255 and outputs it to register 24. Selector 25 selects 0 for the first repetitive operation and otherwise selects and outputs the value of register 22.
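As one concrete case of this final rounding and limiting, H.264 computes a luma half-sample value from six integer samples E to J as (E - 5F + 20G + 20H - 5I + J + 16) >> 5, clipped to the pixel range. The sketch below assumes that the "coefficient a" added by adder 21 plays the role of such a rounding offset (here 16) and that limiter circuit 23 clamps to 0..255; both are illustrative assumptions, and the function names are hypothetical.

```python
def limiter(value, lo=0, hi=255):
    """Model of limiter circuit 23: clamp the result to the 8-bit pixel range."""
    return max(lo, min(hi, value))


def h264_luma_half_sample(e, f, g, h, i, j):
    """Half-sample luma value between g and h using the H.264 6-tap filter."""
    b1 = e - 5 * f + 20 * g + 20 * h - 5 * i + j   # multiply-accumulate result
    return limiter((b1 + 16) >> 5)                  # +16 rounds, >>5 divides by 32


flat = h264_luma_half_sample(100, 100, 100, 100, 100, 100)
edge = h264_luma_half_sample(0, 0, 255, 255, 0, 0)
```

The limiter is needed because the taps include negative weights: a flat area stays at 100, but a sharp edge such as the second call overshoots 255 before clamping.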

This embodiment is a filter arithmetic unit whose operation scale is reduced by repeatedly using arithmetic units 10_j (Booth decoder 15_j and multiplication unit 16_j), each of which computes code data × filter coefficient. By letting adjacent arithmetic units 10_j share work, the number of repetitive operations performed by each arithmetic unit 10_j is leveled. For example, if one arithmetic unit needs four repetitions and an adjacent one needs two, having the adjacent unit perform one of the first unit's operations brings both to three repetitions, which reduces the maximum number of repetitions.
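The leveling idea can be modeled as a small scheduling problem: each arithmetic unit owns a number of data-group jobs, each job may run on its own unit or an immediate neighbor, and the goal is the smallest common number of cycles. The sketch below is one way to do this, not the allocation rule of control unit 31: for a candidate cycle count it fills units left to right, placing each owner's jobs on its left neighbor first (no later owner can use that unit), then on itself, then on its right neighbor, and it tries cycle counts in increasing order. For this one-dimensional, adjacent-only case that greedy order should find the minimum. The names `try_schedule` and `level_repetitions` are hypothetical.

```python
def try_schedule(counts, cycles):
    """Place every job within `cycles` slots per unit, or return None.

    counts[u] is the number of data groups owned by unit u; a job of owner u
    may run on unit u-1, u or u+1 (adjacent-only sharing).
    """
    n = len(counts)
    slots = [[] for _ in range(n)]
    for owner, count in enumerate(counts):
        pending = list(range(count))            # this owner's group indices
        for unit in (owner - 1, owner, owner + 1):
            while pending and 0 <= unit < n and len(slots[unit]) < cycles:
                slots[unit].append((owner, pending.pop(0)))
        if pending:
            return None                         # not enough capacity nearby
    return slots


def level_repetitions(counts):
    """Smallest cycle count with a valid schedule, plus that schedule."""
    for cycles in range(max(counts, default=0) + 1):
        slots = try_schedule(counts, cycles)
        if slots is not None:
            return cycles, slots
```

For the example in the text, counts of 4 and 2 give 3 cycles each. With the group counts of registers F06 to F0A in the earlier selection example (1, 2, 4, 2, 1), it finds 2 cycles, matching the two repetitions described there, although the exact placement of groups may differ.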

Furthermore, because adjacent pixels in image data are relatively well correlated, the difference between neighboring pixels is also relatively small. This is exploited by taking, for the input image data, the difference between the current data and the next data, multiplying it by the filter coefficient, and accumulating the products to perform the filter operation. Because the differenced input data takes values near 0, the number of repetitive operations can be greatly reduced. In addition, data from external memory is normally transferred in bursts, so data does not arrive continuously; the resulting wait for input can absorb the repetitive operations even when some are required.

As described above, using the arithmetic units repeatedly and sharing them greatly reduces the circuit scale, while averaging the number of repetitive operations executed by each arithmetic unit greatly reduces the processing time. Furthermore, using the differences between adjacent pixels in the operation further reduces the number of repetitive operations and thus the power consumption.

Next, a modification of this embodiment is described. FIGS. 11 and 12 show this modification. The embodiment above reduces the number of repetitive operations by sharing horizontally adjacent arithmetic units 10_j; this modification reduces it further by also sharing vertically adjacent arithmetic units. As in the embodiment above, it is of course also possible to share only the vertically adjacent arithmetic units.

Filter arithmetic unit 100 shown in FIG. 11 is a group of five filter arithmetic units 1 of the type shown in FIG. 4. These are denoted 1_1, 1_2, 1_3, 1_4 and 1_5, or simply filter arithmetic unit 1 when no distinction is needed. Because the five filter arithmetic units 1 are connected in parallel, filter arithmetic unit 100 can process five vertically adjacent pixels simultaneously, and the arithmetic units of vertically adjacent filter arithmetic units 1 are shared. In this modification, each filter arithmetic unit 1 receives five horizontally consecutive pixel values, and filter arithmetic units 1_1 to 1_5 together receive five vertically consecutive pixel values. For a vertical filter, each filter arithmetic unit 1 instead receives five vertically consecutive pixel values, and filter arithmetic units 1_1 to 1_5 together receive five horizontally consecutive pixel values.

Filter arithmetic unit 1_1 has arithmetic units 10_j (10_1 to 10_5), filter arithmetic unit 1_2 has arithmetic units 10_k (10_6 to 10_10), filter arithmetic unit 1_3 has arithmetic units 10_l (10_11 to 10_15), filter arithmetic unit 1_4 has arithmetic units 10_m (10_16 to 10_20), and filter arithmetic unit 1_5 has arithmetic units 10_n (10_21 to 10_25). FIG. 12 is a block diagram showing the details of arithmetic units 10_1 to 10_25. Because these units basically have the same configuration, arithmetic unit 10_7 of filter arithmetic unit 1_2 is described here as representative.

As shown in FIG. 12, arithmetic unit 10_7 is horizontally adjacent to arithmetic units 10_6 and 10_8 and shares arithmetic units with them. For this purpose, the values of registers F06 and F08 of arithmetic units 10_6 and 10_8 are input to selector 113. For all 25 arithmetic units 10_1 to 10_25, a control unit (not shown) controls from which register each selector 113 takes data for its repetitive operations.

Arithmetic unit 10_7 is also vertically adjacent to arithmetic units 10_2 and 10_12 and shares arithmetic units with them as well, so the values of registers F02 and F12 of arithmetic units 10_2 and 10_12 are also input to selector 113. Under the control of the control unit (not shown), repetitive operations are allocated among arithmetic unit 10_7 and its neighbors 10_6, 10_8, 10_2 and 10_12 so that their numbers of repetitive operations are leveled, and selector 113 selects the input corresponding to each allocated operation.

Furthermore, because this modification also shares operations in the vertical direction, data processed in a shared repetitive operation must be returned to the filter arithmetic unit 1 it came from. For this reason, the output side of each horizontal row is also connected to the adjacent arithmetic units 10_2 and 10_12: when selector 113 has selected the value of register F02 or F12 of arithmetic unit 10_2 or 10_12, the operation result is output back to the originating filter arithmetic unit 1_1 or 1_3, respectively. To receive such results, each arithmetic unit has an output selection selector 114, to which the outputs of arithmetic units 10_2 and 10_12 in the adjacent horizontal rows are input. Under the control of the control unit (not shown), output selection selector 114 selects either the unit's own operation result or that of an arithmetic unit in an adjacent row (for arithmetic unit 10_7, the bit-shift output of arithmetic unit 10_7, 10_2 or 10_12) and outputs it to adder 21. The output of arithmetic unit 10_7 is in turn input to the output selection selectors connected to the outputs of arithmetic units 10_2 and 10_12.

Of course, separate control units may be physically provided for the vertical and horizontal directions. If arithmetic unit 10_7 is wanted in both the horizontal and vertical directions, the repetitive operations may be given priorities, for example by giving precedence to use in the horizontal direction. Furthermore, filter arithmetic units 1_1 and 1_5 may also be connected so that they share arithmetic units. Moreover, arithmetic units may be shared not only with horizontally and vertically adjacent units but also with diagonally adjacent ones; for arithmetic unit 10_7, these are arithmetic units 10_1, 10_3, 10_11 and 10_13.

In this modification as well, sharing the arithmetic units reduces the maximum number of repetitive operations and thus increases the processing speed. In particular, by sharing vertically as well as horizontally, each arithmetic unit can share with, for example, its eight neighboring arithmetic units, so the numbers of operations can be averaged more efficiently.

As in the embodiment above, this modification uses simple control with a few small added selectors and adders (full adders) to let lightly used arithmetic units take over operations from heavily used ones, reducing the time spent on multi-cycle operation. When the data is highly correlated in the vertical direction, however, vertically adjacent filter arithmetic units tend to be heavily loaded at the same time; to prevent the load from concentrating on their arithmetic units, the input data is staggered, for example by one cycle per unit. Staggering the timing in this way allows high-speed operation without further loss of processing speed.

The present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the invention. For example, although the embodiments above were described as hardware configurations, the invention is not limited to this; any of the processing can also be realized by having a CPU (Central Processing Unit) execute a computer program. In that case, the computer program can be provided recorded on a recording medium, or transmitted via the Internet or another transmission medium.

Block diagram showing a decoding apparatus that decodes a compressed image encoded in accordance with H.264.
Block diagram showing a decoding apparatus that decodes a compressed image encoded in accordance with VC-1.
Block diagram showing a motion compensation (MC) unit that executes motion compensation processing including filter operations conforming to the H.264 and VC-1 standards.
Block diagram showing a filter arithmetic unit according to the embodiment of the present invention.
Block diagram showing a multiplier that performs multiplication according to the second-order Booth algorithm.
(a) illustrates the bits used to generate code data with Booth's algorithm; (b) shows details of the partial product generation unit of the multiplier shown in FIG. 1.
Diagram showing a conventional filter arithmetic unit.
Diagram showing the amplitude distribution of the difference signal between horizontally adjacent pixels of an image.
Diagram showing a specific example of counting the total number of repetitive operations in the control unit of the filter arithmetic unit according to the embodiment.
(a) shows the operation timing of the filter arithmetic unit according to the embodiment; (b) shows the operation timing of the conventional filter arithmetic unit shown in FIG. 8.
Diagram showing a filter arithmetic unit according to a modification of the embodiment.
Diagram showing details of an arithmetic unit of the filter arithmetic unit according to the modification.
Diagram showing the discrete cosine transformer described in Patent Document 1.
Diagram showing the processor, register circuit, and coefficient register of the information processing apparatus described in Patent Document 2.

Explanation of symbols

1, 1_1, 1_2, 1_3, 1_4, 1_5, 100, 302, 303, 501 filter arithmetic unit
10_1 to 10_25, 10_j, 10_k, 10_l, 10_m, 10_n arithmetic unit
11, 11_1 to 11_5, 25, 301, 304, 307, 310, 313 selector
12_j, 12_1 to 12_5 subtractor
13, 13_1 to 13_5, 113 input selection selector
14, 14_1 to 14_5 selection unit
15_j, 15_1 to 15_5, 506 to 508 Booth decoder
16_j, 16_1 to 16_5, 503 to 505 partial product generation unit
17, 17_1 to 17_5, 413, 423, 433, 443 bit shift unit
18_1 to 18_5 repetition count determination unit
21 adder
22, 24 register
23 limiter circuit
31 control unit
41, 56 table
51 to 55 determination unit
62 comparator
63 counter
64 count determination unit
200, 220 image decoding apparatus
201, 221 compressed data
202, 222 variable-length decoding unit
203, 223 inverse quantization unit
204 inverse Hadamard transform unit
205, 225 adder
206 deblocking filter
207 switching unit
208, 227 decoded image
209, 228 monitor
210 intra-frame prediction unit
211, 229 weighted prediction unit
212, 230, 300 motion compensation unit
213, 233 predicted image
224 inverse DCT unit
226 loop filter
304, 305, 312, 400, 412, 422, 432, 442 multiplier
306, 308, 311, 450, 612 adder
309 line memory
401 partial product generation unit

Claims

A filter arithmetic unit that performs a product-sum operation on a multiplier and a multiplicand using Booth's algorithm, comprising:
an input unit to which the multiplier is input;
two or more arithmetic units, each of which repeatedly performs the operation of obtaining one or more pieces of code data by decoding the output of the input unit in accordance with Booth's algorithm and obtaining the product of each piece of code data and the corresponding multiplicand;
an input selection selector that selects an output from the input unit and inputs it to one of the two or more arithmetic units; and
a control unit that determines, based on the output from the input unit, the number of repetitive operations and the repetitive-operation timing in each of the two or more arithmetic units, and controls the input selection selector based on the determination result.

The filter arithmetic unit according to claim 1, wherein the input unit includes a subtractor that obtains a difference between current input data and previous input data, and outputs the subtraction result.

The filter arithmetic unit according to claim 1, further comprising a single first adder that adds the output results of the two or more arithmetic units, wherein
the input unit includes a subtractor that obtains a difference between current input data and previous input data, and outputs the subtraction result, and
the first adder cumulatively adds the multiplication results obtained by the arithmetic units from the subtraction result to the cumulative result obtained from the previous input data.

The filter arithmetic unit according to claim 1, wherein the input unit is provided for each arithmetic unit, and
the input selection selector provided for each arithmetic unit selects either the output of the input unit corresponding to itself or the output of the input unit to which the horizontally adjacent pixel value is input, and inputs the selected output to the arithmetic unit corresponding to itself.

The filter arithmetic unit according to claim 1, wherein the input unit is provided for each arithmetic unit, and
the input selection selector provided for each arithmetic unit selects either the output of the input unit corresponding to itself or the output of the input unit to which the vertically adjacent pixel value is input, and inputs the selected output to the arithmetic unit corresponding to itself.

The filter arithmetic unit according to claim 1 or 2, wherein the input unit is provided for each arithmetic unit, and
the input selection selector provided for each arithmetic unit selects one of the output of the input unit corresponding to itself and the outputs of the input units to which the horizontally and vertically adjacent pixel values are input, and inputs the selected output to the arithmetic unit corresponding to itself.

The filter arithmetic unit according to claim 5 or 6, further comprising:
a second adder, provided for each horizontal row, that adds the output results obtained by the two or more arithmetic units from the pixel values of that row; and
an output selection selector that selects, from among the output results obtained by the arithmetic units, the result based on a pixel value of the row that is to be added by the second adder provided for that row, wherein
the input unit includes a subtractor that obtains a difference between current input data and previous input data, and outputs the subtraction result, and
the second adder, based on the value selected by the output selection selector, cumulatively adds the multiplication results obtained by the arithmetic units from the subtraction result of the pixel values of the row to the cumulative result obtained from the previous input data of that row.

The filter arithmetic unit according to claim 1, wherein each arithmetic unit comprises:
a selection unit that divides the output data from the input unit into 2-bit sets starting from the least significant bits and selects a total of 3 bits, namely the bits of one set and the most significant bit of the next lower set;
a Booth decoder that decodes the 3-bit data output from the selection unit in accordance with Booth's algorithm to generate the code data;
a multiplier that obtains the product of the code data and the multiplicand; and
a bit shift unit that shifts the output of the multiplier by a predetermined number of bits.

The filter arithmetic unit according to any one of claims 1 to 8, wherein the control unit searches the subtraction result output from the input unit, which is the difference between the current input data and the previous input data, in order from the most significant bit for the position where the bit value changes, and determines the number of repetitive operations and the repetitive-operation timing based on the search result.

The filter arithmetic unit according to any one of claims 1 to 8, wherein the control unit searches all bits of the subtraction result output from the input unit, which is the difference between the current input data and the previous input data, from the least significant bit to the most significant bit for the positions where the bit value changes, and determines the number of repetitive operations and the repetitive-operation timing based on the search result.

The filter arithmetic unit according to any one of claims 1 to 8, wherein the control unit divides the subtraction result output from the input unit, which is the difference between the current input data and the previous input data, into 2-bit sets starting from the least significant bits, forms groups of 3 bits in total (each set plus the most significant bit of the next lower set), determines for each group whether all of its bit values are identical, and determines the number of repetitive operations and the repetitive-operation timing based on the determination result.

A motion compensation processing apparatus for generating a predicted image, comprising:
a first filter arithmetic unit that performs a filter operation on input data in the vertical direction;
a second filter arithmetic unit that performs a filter operation on input data in the horizontal direction; and
a weighting operation unit that weights the operation results of the first and second filter arithmetic units or the input data supplied to them, wherein
the first and second filter arithmetic units are filter arithmetic units that perform a product-sum operation on input data and filter coefficients using Booth's algorithm, each performing a product-sum operation on a multiplier and a multiplicand using Booth's algorithm and comprising:
an input unit to which the multiplier is input;
two or more arithmetic units, each of which repeatedly performs the operation of obtaining one or more pieces of code data by decoding the output of the input unit in accordance with Booth's algorithm and obtaining the product of each piece of code data and the corresponding multiplicand;
an input selection selector that selects an output from the input unit and inputs it to one of the arithmetic units; and
a control unit that determines, based on the output from the input unit, the number of repetitive operations and the repetitive-operation timing in each of the two or more arithmetic units, and controls the input selection selector based on the determination result.