JP4833690B2 - Arithmetic circuit and arithmetic method

Info

Publication number: JP4833690B2
Application number: JP2006057810A
Authority: JP
Inventors: 順之橋本; クブチャンダニテジュ
Original assignee: Kawasaki Microelectronics Inc
Current assignee: Kawasaki Microelectronics Inc
Priority date: 2006-03-03
Filing date: 2006-03-03
Publication date: 2011-12-07
Anticipated expiration: 2026-03-03
Also published as: JP2007233934A

Description

The present invention relates to an arithmetic circuit that performs various kinds of filter processing, by convolution operations, on data arranged in a two-dimensional space, and to an arithmetic method therefor.

Convolution is one well-known conventional technique for digitally processing an original image to perform filter processing such as image-quality improvement, noise removal, and edge enhancement. Convolution is used mainly to apply various kinds of filter processing to data arranged in a two-dimensional space of x columns × y rows, such as an image. Although convolution is most often used for the purposes above, filters with a wide variety of characteristics can be configured by changing the coefficient values.

For example, in a 5-column × 5-row filter process, let x be the column (horizontal) direction and y the row (vertical) direction of the original image, let $L_{x,y}$ be the pixel value at column x, row y of the original image, let $P_{x,y}$ be the pixel value at column x, row y after filtering, and let $C_{i,j}$ be the coefficient value at column i, row j of the 5-column × 5-row filter coefficients. The filtered pixel value $P_{x,y}$ is then calculated by the following equation (1). As described above, the characteristics of the filter are determined by the coefficient values $C_{i,j}$, which thereby provide functions such as image-quality improvement, noise removal, and edge enhancement.

To simplify the description, the following describes a filter (arithmetic circuit) that performs a convolution operation calculating the pixel value of one filtered pixel from the pixel values of four pixels, 2 columns × 2 rows, of the original (unfiltered) image.

Here, filter processing by convolution is applied to the pixel values L11, L12, L13, ..., L21, L22, L23, ..., L31, L32, L33, ... of the pixels of the original image shown in FIG. 8, to calculate the pixel values P11, P12, P13, ..., P21, P22, P23, ... of the pixels of the filtered image shown in FIG. 9.

FIG. 10 is a conceptual diagram showing an example of the relationship between the pixel values of the original image and the pixel values of the filtered image. In this example, the pixel values $L_{i,j}$, $L_{i,j+1}$, $L_{i+1,j}$, $L_{i+1,j+1}$ of 2 columns × 2 rows = 4 pixels of the original image are used to calculate the pixel value $P_{i,j}$ of one filtered pixel. A 2-column × 2-row filter requires four (= 2 columns × 2 rows) filter coefficients $C_{1,1}$, $C_{1,2}$, $C_{2,1}$, $C_{2,2}$. The pixel value $P_{i,j}$ of each filtered pixel is calculated by the following equation (2).

$$P_{i,j} = C_{1,1} L_{i,j} + C_{1,2} L_{i,j+1} + C_{2,1} L_{i+1,j} + C_{2,2} L_{i+1,j+1} \qquad (2)$$
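Equation (2) can be illustrated with a minimal software sketch. This is not the circuit described in this document; the function name `filter_2x2`, the 0-based list indexing, and the choice to produce an output one row and one column smaller than the input are assumptions made only for illustration.

```python
def filter_2x2(image, coeff):
    """Apply equation (2) to every 2x2 window of `image`.

    image: list of rows of pixel values (L in the text, 0-based here).
    coeff: [[C11, C12], [C21, C22]].
    Border handling is an assumption: windows that would run off the
    image are simply not computed.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - 1):
        out_row = []
        for j in range(cols - 1):
            p = (coeff[0][0] * image[i][j]
                 + coeff[0][1] * image[i][j + 1]
                 + coeff[1][0] * image[i + 1][j]
                 + coeff[1][1] * image[i + 1][j + 1])
            out_row.append(p)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
coeff = [[1, 2],
         [3, 4]]
result = filter_2x2(image, coeff)
print(result)  # [[37, 47], [67, 77]]
```

The first output, 37, is 1·1 + 2·2 + 3·4 + 4·5, which corresponds to P11 computed from L11, L12, L21, and L22.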

For example, the pixel value P11 of the first pixel of the filtered image is calculated, as shown in FIG. 11, using the pixel values L11, L21, L12, L22 of the four pixels in the first and second columns of the original image, enclosed by the thick square frame. When filtering proceeds in the order shown in FIG. 13, the pixel value P12 of the second pixel of the filtered image is calculated, as shown in FIG. 12, using the pixel values L12, L22, L13, L23 of the four pixels in the second and third columns of the original image, enclosed by the thick square frame.

For noise-removal filtering, for example, the filter coefficients $C_{1,1}$, $C_{1,2}$, $C_{2,1}$, $C_{2,2}$ are all set to the same value, 1/4, as in the following equation (3), so that the average of the four pixel values $L_{i,j}$, $L_{i,j+1}$, $L_{i+1,j}$, $L_{i+1,j+1}$ of the original image is calculated. Equation (2) then takes the form of the following equation (4). Calculating the average of the four pixel values of the original image in this way has the effect of removing small noise in the original image.

$$C_{1,1} = C_{1,2} = C_{2,1} = C_{2,2} = 1/4 \qquad (3)$$

$$P_{i,j} = \left( L_{i,j} + L_{i,j+1} + L_{i+1,j} + L_{i+1,j+1} \right) / 4 \qquad (4)$$
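As a small numerical illustration of equations (3) and (4), the sketch below averages each 2 × 2 window of a tiny image containing one outlier pixel. The function name `average_2x2` and the sample data are hypothetical, and windows extending past the image edge are simply skipped, which is an assumption rather than something the text specifies.

```python
def average_2x2(image):
    """Equation (4): mean of each 2x2 window (all coefficients equal 1/4)."""
    rows, cols = len(image), len(image[0])
    return [[(image[i][j] + image[i][j + 1]
              + image[i + 1][j] + image[i + 1][j + 1]) / 4
             for j in range(cols - 1)]
            for i in range(rows - 1)]


# A flat image with a single noisy pixel in the middle.
noisy = [[10, 10, 10],
         [10, 50, 10],
         [10, 10, 10]]
smoothed = average_2x2(noisy)
print(smoothed)  # [[20.0, 20.0], [20.0, 20.0]]
```

Every window contains the outlier exactly once, so its deviation of 40 from the background is reduced to 10 in each output pixel.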

When filtering is performed by equation (2) in the order shown in FIG. 13, the calculations for adjacent pixels, for example the filtered pixel values P11 and P12, both use the pixel values L12 and L22 of the two pixels in one column, as a comparison of FIG. 11 and FIG. 12 shows. Accordingly, Patent Document 1, for example, proposes a technique that avoids reading twice, from a memory or the like, the pixel value of the same original-image pixel that is used in common by the calculations for adjacent pixels.

Patent Document 1 is described below.

FIG. 14 is a schematic diagram showing an example of the configuration of a conventional arithmetic circuit that performs filter processing by convolution. To simplify the description, the arithmetic circuit 140 shown in the figure implements the technique proposed in Patent Document 1 as a 2-column × 2-row filter. The arithmetic circuit 140 comprises an image information storage device 141, an original image register 142, a coefficient shift register 144, a multiplexer (MUX) 145, a product-sum calculator 146, an adder 148, a multiplexer (MUX) 149, an intermediate result register 154 consisting of two flip-flops 150a and 150b, and a counter 156.

FIGS. 15 and 16 conceptually show the state of the arithmetic circuit 140 of FIG. 14 when the count value of the counter 156 is 0 and when it is 1, respectively. FIG. 17 is a conceptual diagram showing the configuration of the product-sum calculator 146 used in the arithmetic circuit 140. The product-sum calculator 146 consists of a multiplier 160a that multiplies a coefficient value C1 (C11 or C12) by the pixel value L1, a multiplier 160b that multiplies a coefficient value C2 (C21 or C22) by the pixel value L2, and an adder 162 that adds the results of the two multipliers 160a and 160b.

After initialization, the counter 156 repeatedly counts 0, 1. When the count value of the counter 156 is 0, the pixel values of the two pixels in the one column to be processed are read from the original image stored in the image information storage device 141 and held in the original image register 142 as L1 and L2, respectively. As shown in FIG. 15, the coefficient shift register 144 outputs the coefficient values C11 and C21 via the multiplexer 145.

The product-sum calculator 146 performs the product-sum operation C11L1 + C21L2 between the pixel values L1 and L2 of the two pixels in one column, input from the original image register 142, and the coefficient values C11 and C21, input from the coefficient shift register 144. The adder 148 adds 0 to the result of the product-sum calculator 146, and the sum C11L1 + C21L2 is input to the flip-flop 150a of the intermediate result register 154 via the multiplexer 149.

The flip-flop 150a of the intermediate result register 154 holds, in synchronization with the clock signal, the sum input from the adder 148. The flip-flop 150b likewise holds, in synchronization with the clock signal, the output of the flip-flop 150a, and that value is output as the pixel value of the filtered pixel. In other words, the flip-flop 150b holds the value that the flip-flop 150a held one cycle (one clock) earlier.

When the count value of the counter 156 is 1, on the other hand, no new pixel values are read from the image information storage device 141. The original image register 142 therefore still holds the pixel values L1 and L2 of the two pixels read when the count value of the counter 156 was 0. As shown in FIG. 16, the coefficient shift register 144 outputs the coefficient values C12 and C22. The product-sum calculator 146 performs the product-sum operation C12L1 + C22L2 between the pixel values L1 and L2 of the two pixels in one column and the coefficient values C12 and C22.

The adder 148 adds the result of the product-sum calculator 146 and the output of the flip-flop 150b. As a result, the result C11L1 + C21L2 that the product-sum calculator 146 produced three cycles earlier, when the count value of the counter 156 was 0, is added to the result C12L1 + C22L2 produced when the count value is 1. The flip-flop 150b holds the sum from the adder 148 in synchronization with the clock signal, and that value is output as the pixel value of the filtered pixel. The flip-flop 150a retains the value it held one cycle earlier.

Next, the operation of the arithmetic circuit 140 is described concretely with reference to Table 1 below.

The first cycle is the initialization cycle. The held values P1 and P2 of the two flip-flops 150a and 150b of the intermediate result register 154 become 0. Although omitted from Table 1, the coefficient values C11, C21, C12, and C22 are set in the coefficient shift register 144.

In cycle 1, the count value of the counter 156 is 0. The pixel values L11 and L21 of the two pixels in the first column are therefore read from the image information storage device 141 and held in the original image register 142 as L1 and L2, respectively. The coefficient shift register 144 outputs the coefficient values C11 and C21. The product-sum calculator 146 calculates C11L11 + C21L21, and the adder 148 adds 0 to it. The sum C11L11 + C21L21 from the adder 148 is held, in synchronization with the clock signal, in the flip-flop 150a of the intermediate result register 154 as the held value P1. The flip-flop 150b holds, as the held value P2, the value 0 that the flip-flop 150a held one cycle earlier.

In cycle 2, the count value of the counter 156 is 1. L1 and L2 of the original image register 142 therefore remain L11 and L21. The coefficient shift register 144 outputs the coefficient values C12 and C22. The product-sum calculator 146 calculates C12L11 + C22L21, and the adder 148 adds to it the output, 0, of the flip-flop 150b of the intermediate result register 154. The sum C12L11 + C22L21 from the adder 148 is held, in synchronization with the clock signal, in the flip-flop 150b of the intermediate result register 154. The flip-flop 150a retains its own output, C11L11 + C21L21.

In cycle 3, the count value of the counter 156 returns to 0. The pixel values L12 and L22 of the two pixels in the second column are read and held in the original image register 142 as L1 and L2, respectively. The coefficient shift register 144 outputs the coefficient values C11 and C21. The product-sum calculator 146 calculates C11L12 + C21L22, and the adder 148 adds 0 to it. The sum C11L12 + C21L22 from the adder 148 is held, in synchronization with the clock signal, in the flip-flop 150a of the intermediate result register 154. The flip-flop 150b holds the output C11L11 + C21L21 of the flip-flop 150a.

In cycle 4, the count value of the counter 156 is again 1. L1 and L2 of the original image register 142 likewise remain L12 and L22. The coefficient shift register 144 outputs the coefficient values C12 and C22. The product-sum calculator 146 calculates C12L12 + C22L22, and the adder 148 adds to it the output C11L11 + C21L21 of the flip-flop 150b of the intermediate result register 154. The sum C11L11 + C21L21 + C12L12 + C22L22 from the adder 148 is held, in synchronization with the clock signal, in the flip-flop 150b of the intermediate result register 154 and is output as the filtered pixel value P11. The flip-flop 150a retains its own output, C11L12 + C21L22.

Operation from cycle 5 onward is the same as above. That is, the arithmetic circuit 140 outputs the pixel value P11 of the first filtered pixel four cycles after initialization, and thereafter outputs the pixel values P12, P13, ... of the second and subsequent filtered pixels in sequence, one every two cycles.
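The cycle-by-cycle behaviour described above can be checked with a small behavioural model. The sketch below is an illustration only: it models the counter, the two flip-flops, and the adder as plain Python variables, ignores the multiplexers and clocking details, and treats the value produced in cycle 2 as invalid. The function name `simulate_circuit_140` and the sample data are hypothetical.

```python
def simulate_circuit_140(columns, c11, c12, c21, c22):
    """Behavioural model of the conventional 2x2 circuit.

    columns: list of (L1, L2) pairs, one pair per image column.
    Returns a list of (cycle, value) for each valid filtered pixel.
    """
    ff_a = ff_b = 0            # state after the initialization cycle
    cycle = 0
    outputs = []
    for col_index, (l1, l2) in enumerate(columns):
        # Count value 0: read a new column, use coefficients C11 and C21.
        cycle += 1
        mac = c11 * l1 + c21 * l2
        ff_a, ff_b = mac + 0, ff_a      # 150b takes the old value of 150a
        # Count value 1: same column, coefficients C12 and C22.
        cycle += 1
        mac = c12 * l1 + c22 * l2
        ff_b = mac + ff_b               # 150a keeps its value
        if col_index >= 1:              # the cycle-2 value is not a valid pixel
            outputs.append((cycle, ff_b))
    return outputs


# Rows [1, 2, 3] and [4, 5, 6] supplied column by column.
outputs = simulate_circuit_140([(1, 4), (2, 5), (3, 6)],
                               c11=1, c12=2, c21=3, c22=4)
print(outputs)  # [(4, 37), (6, 47)]
```

In this model the first valid pixel appears in cycle 4 and each later pixel two cycles after the previous one, matching the timing stated in the text.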

As described above, Patent Document 1 has the advantage that the pixel values of original-image pixels used in common by the calculations for adjacent pixels need not be read twice from the image information storage device. However, Patent Document 1 requires four cycles after initialization before the pixel value P11 of the first filtered pixel is output, and two cycles for each of the pixel values P12, P13, ... of the second and subsequent pixels, so the filter processing takes too much processing time (too many cycles).

Patent Document 1: Japanese Unexamined Patent Application Publication No. S51-141536 (JP 51-141536 A)

An object of the present invention is to solve the problems of the prior art described above and to provide an arithmetic circuit and an arithmetic method capable of performing filter processing by convolution on data arranged in a two-dimensional space in fewer cycles than before.

Another object of the present invention is to provide an arithmetic circuit and an arithmetic method that, when high speed is not required of the filter processing by convolution, can give priority to circuit scale over processing speed and reduce the circuit scale as appropriate.

To achieve the above objects, the present invention provides an arithmetic circuit that performs a convolution operation on data arranged in a two-dimensional space, the arithmetic circuit comprising:
a coefficient register that holds coefficients of N columns × N rows (N being an integer of 3 or more);
a data register that holds N data values;
a pre-processing circuit including m product-sum calculators (2 ≤ m < N); and
a post-processing circuit including first to N-th product-sum registers,
wherein, in an initialization cycle, the first to (N−1)-th of the first to N-th product-sum registers are initialized;
thereafter, over cycles 1 to N in order, the data values of the first to N-th columns of an N-column × N-row range of the two-dimensional space are held in the data register;
in each cycle,
an operation in which each of at least some of the m product-sum calculators of the pre-processing circuit calculates the sum of products between the N data values and the N coefficients of one of the columns of the N columns × N rows is repeated fewer than N times, whereby the sums of products between the N data values and the coefficients of all the columns are calculated; and
the product-sum result, calculated by the product-sum calculators, between the N data values and the N coefficients of the first column is held in the first product-sum register of the post-processing circuit, and the sum of the product-sum result between the N data values and the coefficients of the n-th column (n = 2 to N) and the value held in the (n−1)-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit,
whereby the N-th product-sum register outputs the convolution result for a first operation target point located at the center of the N-column × N-row range of the two-dimensional space.
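The register chain in the arrangement above can be modelled in software as follows. This is a sketch under stated assumptions, not the circuit itself: the function names, the grouping of coefficient columns into passes of at most m, and the continuation of output after cycle N are all illustrative choices that the text does not specify. The coefficient values are arbitrary sample data.

```python
def column_sums(data_col, coeff, m):
    """Sums of products of one data column with every coefficient column.

    Only m product-sum units are assumed to exist, so the N coefficient
    columns are processed in groups of at most m (one group per pass).
    Returns (sums, number_of_passes).
    """
    n = len(coeff)
    sums = [0] * n
    passes = 0
    for start in range(0, n, m):
        passes += 1
        for col in range(start, min(start + m, n)):
            sums[col] = sum(coeff[r][col] * data_col[r] for r in range(n))
    return sums, passes


def stream_filter(columns, coeff, m):
    """Feed one data column per cycle through the chained product-sum registers."""
    n = len(coeff)
    regs = [0] * n                       # product-sum registers 1..N
    results = []
    for cycle, data_col in enumerate(columns, start=1):
        sums, _ = column_sums(data_col, coeff, m)
        prev = regs[:]                   # register values from the previous cycle
        regs[0] = sums[0]
        for k in range(1, n):
            regs[k] = sums[k] + prev[k - 1]
        if cycle >= n:                   # register N is valid from cycle N on
            results.append(regs[n - 1])
    return results


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
coeff = [[1, 0, -1],
         [2, 0, -2],
         [1, 0, -1]]
columns = list(zip(*image))              # one tuple of N values per image column
results = stream_filter(columns, coeff, m=2)
print(results)  # [-8, -8]
```

With N = 3 and m = 2, each cycle needs two passes, which is fewer than N, and in this model one convolution result becomes available per cycle once the first N columns have been supplied.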

Here, it is preferable that the post-processing circuit further includes first to N-th temporary registers provided in correspondence with the first to N-th product-sum registers, respectively, and that,
in each cycle, after the product-sum results between the data values and the coefficients of the first to N-th columns, calculated by the product-sum calculators, are held in the first to N-th temporary registers, respectively, the sum of the product-sum result held in the n-th temporary register and the value held in the (n−1)-th product-sum register in the previous cycle is held in the n-th product-sum register.

The present invention also provides an arithmetic circuit that performs a convolution operation on data arranged in a two-dimensional space, the arithmetic circuit comprising:
a coefficient register that holds coefficients of N columns × N rows (N being an integer of 2 or more);
a data register of N columns × N rows;
a pre-processing circuit including N product-sum calculators; and
a post-processing circuit including N product-sum registers,
wherein the data of the first to N-th columns of an N-column × N-row range of the two-dimensional space are held in the data register;
in a first cycle, the n-th (n = 1 to N) of the N product-sum calculators calculates the sum of products between the N data values held in the n-th column of the N-column × N-row data register and the N coefficients of the first column of the N columns × N rows, and each result is held in the n-th product-sum register of the post-processing circuit;
thereafter, over the k-th (k = 2 to N) cycles in order, the following operation is repeated:
the data held in the data register are shifted by −1 column, and the data of the (N+k−1)-th column, adjacent in the column direction to the N-column × N-row range of the two-dimensional space, are held in the N-th column of the data register; and
the n-th product-sum calculator calculates the sum of products between the N data values held in the n-th column of the N-column × N-row data register and the N coefficients of the k-th column of the N columns × N rows, and the sum of each result and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit,
whereby the N product-sum registers respectively output the convolution results for a first operation target point located at the center of the N-column × N-row range of the two-dimensional space and for second to N-th operation target points successively adjacent to the first operation target point in the column direction of the two-dimensional space.
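The following sketch models the N-calculator arrangement above in software, assuming an input strip of N rows and at least 2N−1 columns. The function name `block_filter`, the list-based data register, and the sample values are assumptions for illustration; hardware details such as clocking are not modelled.

```python
def block_filter(rows, coeff):
    """Produce N adjacent convolution results in N cycles.

    rows:  N rows of pixel values, each with at least 2N-1 columns.
    coeff: N x N coefficients, coeff[r][c] for row r and column c.
    """
    n = len(coeff)
    columns = [list(col) for col in zip(*rows)]
    data_reg = columns[:n]               # image columns 1..N loaded initially
    acc = [0] * n                        # product-sum registers 1..N
    for k in range(1, n + 1):            # cycle number
        if k > 1:
            # Shift by one column and load image column N+k-1 (0-based N+k-2).
            data_reg = data_reg[1:] + [columns[n + k - 2]]
        for unit in range(n):            # the N product-sum calculators
            mac = sum(coeff[r][k - 1] * data_reg[unit][r] for r in range(n))
            acc[unit] = mac if k == 1 else acc[unit] + mac
    return acc


rows = [[1, 2, 3],
        [4, 5, 6]]
coeff = [[1, 2],
         [3, 4]]
print(block_filter(rows, coeff))  # [37, 47]
```

For N = 2 the model yields two filtered pixels after two cycles, the same values that equation (2) gives for the windows starting at columns 1 and 2.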

The present invention also provides an arithmetic method for performing a convolution operation between data arranged in a two-dimensional space and coefficients of N columns × N rows (N being an integer of 2 or more), the method comprising:
preparing a pre-processing circuit including N product-sum calculators and
a post-processing circuit including N product-sum registers;
in a first cycle, supplying the data values of an N-column × N-row range of the two-dimensional space and the N coefficients of the first column of the N columns × N rows to the pre-processing circuit, calculating, in the n-th (n = 1 to N) of the N product-sum calculators, the sum of products between the N data values of the n-th column of the two-dimensional space and the N coefficients, and holding the result in the n-th product-sum register of the post-processing circuit; and
thereafter, over the k-th (k = 2 to N) cycles in order, repeating an operation of
supplying to the pre-processing circuit the data values from the k-th column of the N columns × N rows of the two-dimensional space through the (N+k−1)-th column, which is adjacent to the N-column × N-row range in the column direction, together with the N coefficients of the k-th column of the N columns × N rows, calculating, in the n-th product-sum calculator, the sum of products between the N data values of the (n+k−1)-th column of the two-dimensional space and the N coefficients, and holding, in the n-th product-sum register of the post-processing circuit, the sum of each result and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle,
whereby the N product-sum registers respectively output the convolution results for a first operation target point located at the center of the N-column × N-row range of the two-dimensional space and for second to N-th operation target points successively adjacent to the first operation target point in the column direction of the two-dimensional space.

The present invention also provides an arithmetic circuit that performs a convolution operation on data arranged in a two-dimensional space, the arithmetic circuit comprising:
a coefficient register that holds coefficients of N columns × N rows (N being an integer of 2 or more);
a data register of N columns × N rows;
a pre-processing circuit including m product-sum calculators (m < N); and
a post-processing circuit including N product-sum registers,
wherein the data of the first to N-th columns of an N-column × N-row range in the two-dimensional space are held in the data register;
in a first cycle,
an operation in which each of at least some of the m product-sum calculators of the pre-processing circuit calculates the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the first column of the N columns × N rows is repeated, whereby the sums of products between the data values held in all the columns of the data register and the coefficients of the first column are calculated; and
the product-sum result, calculated by the product-sum calculators, between the data values held in the n-th column (n = 1 to N) of the data register and the coefficients of the first column is held in the n-th product-sum register of the post-processing circuit;
thereafter, over the k-th (k = 2 to N) cycles in order, the following operation is repeated:
the data held in the data register are shifted by −1 in the column direction, and the data of the (N+k−1)-th column, adjacent in the column direction to the N-column × N-row range of the two-dimensional space, are held in the N-th column of the data register;
an operation in which each of at least some of the m product-sum calculators of the pre-processing circuit calculates the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the k-th column of the N columns × N rows is repeated, whereby the sums of products between the data values held in all the columns of the data register and the coefficients of the k-th column are calculated; and
the sum of the product-sum result, calculated by the product-sum calculators, between the data values held in the n-th column of the data register and the coefficients of the k-th column and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit,
whereby the N product-sum registers respectively output the convolution results for a first operation target point located at the center of the N-column × N-row range of the two-dimensional space and for second to N-th operation target points successively adjacent to the first operation target point in the column direction of the two-dimensional space.
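To make the phrase "at least some of the m product-sum calculators" concrete, the sketch below lists one possible way to share m calculators among the N data-register columns within a single cycle. The assignment order is an assumption; the text requires only that the operation be repeated until every column has been processed. The function name `mac_schedule` is hypothetical.

```python
import math


def mac_schedule(n, m):
    """Return the passes of one cycle.

    Each pass is the list of 1-based data-register columns handled
    simultaneously by the (at most m) product-sum calculators.
    """
    return [list(range(start, min(start + m, n + 1)))
            for start in range(1, n + 1, m)]


schedule = mac_schedule(5, 2)
passes_per_cycle = len(schedule)
print(schedule)          # [[1, 2], [3, 4], [5]]
print(passes_per_cycle)  # 3
```

With N = 5 and m = 2, the last pass uses only one of the two calculators, and each cycle takes ceil(N/m) = 3 passes. This illustrates the trade-off between the number of calculators and the processing time.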

According to the present invention, as in the prior art, a data value shared between the calculations for adjacent data need not be read from the storage device twice. In addition, by performing the calculation with a plurality of product-sum calculators, the present invention can achieve a processing speed several times that of a conventional arithmetic circuit. Conversely, for applications that do not require high processing speed, the number of product-sum calculators can be reduced as appropriate, which reduces the circuit scale accordingly.

The arithmetic circuit and the arithmetic method of the present invention are described in detail below on the basis of the preferred embodiments shown in the accompanying drawings.

FIG. 1 is a schematic diagram of a first embodiment showing the configuration of the arithmetic circuit of the present invention. The arithmetic circuit 10 shown in the figure sequentially performs filter processing by convolution on the pixel values of four pixels (2 columns × 2 rows) of an original image arranged in a two-dimensional space. The arithmetic circuit 10 comprises an original image register 12, a coefficient register 14, a preprocessing circuit 22, and a post-processing circuit 24.

The original image register 12 (the data register of the present invention) sequentially holds, as L1 and L2, the pixel values of the two pixels in one column to be processed. These values are read in processing order, in synchronization with a clock signal, from an image information storage device (not shown; see FIG. 14) that stores the pixel value of each pixel of the original image. The outputs L1 and L2 of the original image register 12 are input in common to both product-sum calculators 16a and 16b of the preprocessing circuit 22.

The coefficient register 14 holds the coefficient values that determine the function (filter processing) of the arithmetic circuit 10. In this embodiment, the coefficient register 14 holds four coefficient values C11, C21, C12, and C22 corresponding to 2 columns × 2 rows = 4 pixels of the original image. The first-column coefficient values C11 and C21 are input to the product-sum calculator 16a, and the second-column coefficient values C12 and C22 are input to the product-sum calculator 16b.

The preprocessing circuit 22 comprises two product-sum calculators 16a and 16b, one for each column (equal to the number of rows).

The product-sum calculator 16a computes the sum of products C11L1 + C21L2 between the pixel values L1 and L2 of the two pixels in one column and the two coefficient values C11 and C21 in the first column from the left in FIG. 1.

The product-sum calculator 16b computes the sum of products C12L1 + C22L2 between the pixel values L1 and L2 of the two pixels in one column and the two coefficient values C12 and C22 in the second column from the left. The result (b) output from the product-sum calculator 16b is input to the adder 18.

The product-sum calculators 16a and 16b can be configured, for example, as shown in FIG. 17. Their configuration is not limited in any way; any configuration capable of performing the same product-sum operation can be used.

The post-processing circuit 24 comprises two product-sum registers 21a and 21b, one for each column.

The product-sum register 21a consists of a flip-flop 20a. At the timing of the active edge (clock edge) of the clock signal (not shown), the flip-flop 20a latches the output T1 of the product-sum calculator 16a as the held value P1. In other words, the product-sum register 21a holds the result of adding 0 to the output of the product-sum calculator 16a, which is why it has no adder, unlike the product-sum register 21b described next. Before it latches P1, the flip-flop 20a outputs the value held in the previous cycle to the adder 18 as output P1'.

The product-sum register 21b consists of the adder 18 and a flip-flop 20b. The adder 18 adds the output T2 of the product-sum calculator 16b and the output P1' of the flip-flop 20a, and the sum (o) is latched in the flip-flop 20b as the held value P2 at the timing of the clock edge. With the adder 18 and the flip-flop 20b, the product-sum register 21b thus calculates and holds the accumulated value of the result of the product-sum calculator 16b and the value held in the product-sum register 21a in the previous cycle. The value P2 held in the flip-flop 20b is output as the pixel value of each pixel of the filtered image.

The clock signal supplied to the flip-flops 20a and 20b to hold the calculation results can be the same as the clock signal supplied to the original image register 12 to hold the pixel values. The result calculated from the pixel values latched in the original image register 12 at a first active edge of the clock signal can then be latched in the flip-flops 20a and 20b at the next active edge of the same clock signal. In this case, the period from the first active edge of the clock signal to the next active edge is one calculation cycle of the arithmetic circuit 10.

Next, the basic operation of the arithmetic circuit 10 is described.

In the arithmetic circuit 10, at least the flip-flop 20a is initialized in the initialization cycle, so that its held value becomes 0. The four coefficient values C11, C21, C12, and C22 are also set (held) in the coefficient register 14.

Then, in each cycle, the pixel values of the two pixels in one column of the original image, read in processing order, are held in the original image register 12 as L1 and L2. For example, when the pixel values of the original image shown in FIG. 8 are stored in the image information storage device and filter processing proceeds in the order shown in FIG. 13, L1 and L2 of the original image register 12 hold, cycle by cycle from the left in FIG. 8, the pixel values of one column of two pixels in the order L11, L21 (first column), L12, L22 (second column), L13, L23 (third column), and so on.

In the product-sum calculator 16a of the preprocessing circuit 22, the sum of products C11L1 + C21L2 is computed between the pixel values L1 and L2 of the two pixels in one column, input from the original image register 12, and the first-column coefficient values C11 and C21, input from the coefficient register 14. Likewise, the product-sum calculator 16b computes the sum of products C12L1 + C22L2 between the same pixel values L1 and L2 and the second-column coefficient values C12 and C22.

The output T1 of the product-sum calculator 16a is latched as the held value P1 in the flip-flop 20a of the product-sum register 21a of the post-processing circuit 24 at the timing of the clock edge. The adder 18 adds the output T2 of the product-sum calculator 16b and the output P1' of the flip-flop 20a (that is, the output T1 of the product-sum calculator 16a one cycle earlier). The sum (o) is latched at the same clock edge as the held value P2 in the flip-flop 20b of the product-sum register 21b and is output sequentially as the pixel value of each pixel of the filtered image.

As shown in FIGS. 10 to 12, the pixel value (convolution result) of each pixel (calculation target point) of the filtered image is located at the center of the 2-column × 2-row range of pixels being processed in the two-dimensional space.

Next, the operation of the arithmetic circuit 10 is described concretely with reference to Table 2 below. The description assumes that the pixel values of the original image shown in FIG. 8 are stored in the image information storage device and that filter processing proceeds in the order shown in FIG. 13.

Table 2 shows, for each cycle, the pixel values L1 and L2 held in the original image register 12 at the start of the cycle, the product-sum results T1 and T2 that the product-sum calculators 16a and 16b compute from those pixel values, and the values P1 and P2 latched in the flip-flops 20a and 20b at the end of the same cycle. The same applies to Tables 3 and 4 presented later.

The first cycle is the initialization cycle. At least the held value of the flip-flop 20a is initialized to 0.

In cycle 1, at the timing of the first edge of the clock signal, the pixel values L11 and L21 of the two pixels in the first column from the left in FIG. 8 are read from the image information storage device and held in the original image register 12 as L1 and L2, respectively. As a result, the product-sum calculator 16a outputs T1 = C11L11 + C21L21, and the product-sum calculator 16b outputs T2 = C12L11 + C22L21.

At the next edge of the clock signal, the flip-flop 20a latches the output T1 = C11L11 + C21L21 of the product-sum calculator 16a as the held value P1. The adder 18 outputs the sum (o) = C12L11 + C22L21 of the output P1' = 0 of the flip-flop 20a (shown in Table 2 as the value of P1 in the previous cycle, that is, the initialization cycle) and the output T2 = C12L11 + C22L21 of the product-sum calculator 16b. The flip-flop 20b latches this sum (o) as the held value P2 at the timing of the clock edge.

In cycle 2, the pixel values L12 and L22 of the two pixels in the second column are read from the image information storage device and held in the original image register 12 as L1 and L2, respectively. As a result, the product-sum calculator 16a outputs T1 = C11L12 + C21L22, and the product-sum calculator 16b outputs T2 = C12L12 + C22L22.

In synchronization with the clock edge, the flip-flop 20a latches the output T1 = C11L12 + C21L22 of the product-sum calculator 16a. The adder 18 outputs the sum (o) = C11L11 + C12L12 + C21L21 + C22L22 of the value P1' = C11L11 + C21L21 held in the flip-flop 20a in the previous cycle (shown in Table 2 as the value of P1 in cycle 1) and the output T2 = C12L12 + C22L22 of the product-sum calculator 16b. The flip-flop 20b latches this sum (o) as the held value P2 at the timing of the clock edge and outputs it as the filtered pixel value P11.

In cycle 3, the pixel values L13 and L23 of the two pixels in the third column are read from the image information storage device and held in the original image register 12 as L1 and L2, respectively. As a result, the product-sum calculator 16a outputs T1 = C11L13 + C21L23, and the product-sum calculator 16b outputs T2 = C12L13 + C22L23.

At the timing of the clock edge, the flip-flop 20a latches the output T1 = C11L13 + C21L23 of the product-sum calculator 16a. The adder 18 outputs the sum (o) = C11L12 + C12L13 + C21L22 + C22L23 of the value P1' = C11L12 + C21L22 held in the flip-flop 20a in the previous cycle (cycle 2) and the output T2 = C12L13 + C22L23 of the product-sum calculator 16b. The flip-flop 20b latches this sum (o) at the timing of the clock edge and outputs it as the filtered pixel value P12.

Subsequent operation proceeds in the same way. After initialization, the arithmetic circuit 10 outputs the pixel value P11 of the first filtered pixel in two cycles, and thereafter outputs the pixel values P12, P13, ... of the second and subsequent filtered pixels at a rate of one per cycle.
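The cycle-by-cycle behavior described above can be sketched in software. The following Python model is only an illustration under stated assumptions: the function name `simulate_circuit_10`, the dictionary keys, and the sample values are hypothetical and do not appear in the patent. It assumes that both flip-flops latch on the same clock edge, so the adder 18 sees the value the flip-flop 20a held during the previous cycle.

```python
def simulate_circuit_10(columns, coeff):
    """Cycle-level model of the 2-column x 2-row arithmetic circuit 10.

    columns: list of (L1, L2) pairs, one image column per cycle.
    coeff:   dict with keys "C11", "C21", "C12", "C22" (hypothetical layout).
    Returns the value latched in flip-flop 20b at the end of each cycle.
    """
    p1 = 0  # flip-flop 20a, cleared in the initialization cycle
    p2_trace = []
    for l1, l2 in columns:
        t1 = coeff["C11"] * l1 + coeff["C21"] * l2  # product-sum calculator 16a
        t2 = coeff["C12"] * l1 + coeff["C22"] * l2  # product-sum calculator 16b
        # Same clock edge for both flip-flops: the right-hand side uses the
        # old p1 (P1'), then 20a takes T1 and 20b takes T2 + P1'.
        p1, p2 = t1, t2 + p1
        p2_trace.append(p2)
    return p2_trace


coeff = {"C11": 1, "C21": 10, "C12": 100, "C22": 1000}
trace = simulate_circuit_10([(1, 2), (3, 4), (5, 6)], coeff)
print(trace)  # [2100, 4321, 6543]
```

With these sample values, the second entry equals C11L11 + C12L12 + C21L21 + C22L22 (the value described as P11) and the third equals C11L12 + C12L13 + C21L22 + C22L23 (the value described as P12). The first entry is not a complete convolution result, which matches the two-cycle latency stated in the text.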

Table 2 shows the case in which the held value P2 of the flip-flop 20b is also initialized to 0 in the initialization cycle. However, only the flip-flop 20a, whose value P1' output before the clock edge is used in the calculation in cycle 1, needs to be initialized; initialization of the flip-flop 20b is not essential. After the filtered pixel values P11 to P15 of the first row have been output in the order shown in FIG. 13, the initialization cycle is performed again, and the same cycles as for the first row are then repeated to hold and output the filtered pixel values of the second row.

Like the conventional arithmetic circuit 140 shown in FIG. 14, the arithmetic circuit 10 of this embodiment has the advantage that a pixel value of the original image shared between the calculations for adjacent pixels need not be read from the image information storage device twice. Whereas the conventional arithmetic circuit 140 uses only one product-sum calculator, the arithmetic circuit 10 of this embodiment uses two, achieving twice the processing speed of the conventional arithmetic circuit 140. Furthermore, unlike the conventional arithmetic circuit 140, it requires no complicated control using the counter 156 or the multiplexer 149. A high processing speed can therefore be achieved with a simple configuration.

Next, as another example of the first embodiment, an arithmetic circuit that performs filter processing by convolution using the pixel values of 25 pixels (5 columns × 5 rows) of the original image is described.

FIG. 2 is a conceptual diagram of a second embodiment showing the configuration of the arithmetic circuit of the present invention. The arithmetic circuit 30 shown in the figure is drawn more abstractly than the arithmetic circuit 10, but it performs the same kind of filter processing by convolution on the pixel values of 25 pixels (5 columns × 5 rows) of an original image arranged in a two-dimensional space. The arithmetic circuit 30 comprises an original image register 32, a coefficient register 34, a preprocessing circuit 42, and a post-processing circuit 44.

The following description of the arithmetic circuit 30 shown in FIG. 2 focuses on its differences from the arithmetic circuit 10 of FIG. 1.

In the arithmetic circuit 30, the original image register 32 and the coefficient register 34 have the same functions as the original image register 12 and the coefficient register 14 of the arithmetic circuit 10 in FIG. 1. In FIG. 2, the original image register 32 is drawn as a tall rectangle labeled L_i. This is shorthand for the pixel values L1i to L5i of the five pixels in the i-th column of the original image to be processed, written collectively as L_i. The same applies to C1 to C5 of the coefficient register 34.

The preprocessing circuit 42 comprises five product-sum calculators 36a, 36b, 36c, 36d, and 36e, one for each column. The preprocessing circuit 42 has the same function as the preprocessing circuit 22 of the arithmetic circuit 10 in FIG. 1. For example, the topmost (first from the top) product-sum calculator 36a in FIG. 2 computes C1L_i (= C11L1i + C21L2i + C31L3i + C41L4i + C51L5i) and outputs the result T1. The second to fifth product-sum calculators 36b, 36c, 36d, and 36e operate likewise.

The post-processing circuit 44 comprises five product-sum registers 41a, 41b, 41c, 41d, and 41e, one for each column. These are configured in the same way as the product-sum registers 21a and 21b of the arithmetic circuit 10 in FIG. 1 and have the same functions. That is, the first product-sum register 41a consists only of a flip-flop, while the second to fifth product-sum registers 41b to 41e each consist of an adder and a flip-flop.

The topmost (first from the top) product-sum register 41a latches the output T1 of the first product-sum calculator 36a as the held value P1 at the timing of the clock edge.

The second product-sum register 41b latches, as the held value P2 at the timing of the clock edge, the sum of the output P1' of the product-sum register 41a and the result T2 of the product-sum calculator 36b.

The third to fifth product-sum registers 41c, 41d, and 41e operate in the same way as the second product-sum register 41b. The value P5 latched in the last (fifth) product-sum register 41e in synchronization with the clock signal is output as the pixel value P of each pixel of the filtered image.

Assuming 8-bit pixel values and coefficient values, as shown in FIG. 2, 8 bits × 5 = 40 wires connect the original image register 32 to the preprocessing circuit 42, and 8 bits × 5 × 5 = 200 wires connect the coefficient register 34 to the preprocessing circuit 42. Assuming the results T1 to T5 are 8 bits each, 8 bits × 5 = 40 wires connect the preprocessing circuit 42 to the post-processing circuit 44. Assuming the filtered pixel value is 8 bits, 8 wires lead out of the post-processing circuit 44. In the subsequent embodiments as well, the bit widths of the wiring are shown in the figures where necessary.

The basic operation of the arithmetic circuit 30 is the same as that of the arithmetic circuit 10 in FIG. 1, so it is not described again here.

Likewise, when the pixel values of the original image shown in FIG. 8 are stored in the image information storage device and filter processing proceeds in the order shown in FIG. 13, the concrete operation of the arithmetic circuit 30 is as shown in Tables 3 and 4 below.

In Tables 3 and 4, the output L_i of the original image register 32 is shown as L1 in cycle 1, which stands for L1 = L11, L21, L31, L41, L51. The same applies to L2 to L6 in cycles 2 to 6. The output T1 of the product-sum calculator 36a is shown as C1L1 in cycle 1, which stands for C1L1 = C11L11 + C21L21 + C31L31 + C41L41 + C51L51. The same applies to the outputs T2 to T5 of the product-sum calculators 36b, 36c, 36d, and 36e and to the outputs P1 to P5 of the product-sum registers 41a, 41b, 41c, 41d, and 41e.

After initialization, the arithmetic circuit 30 outputs the pixel value P of the first filtered pixel in five cycles, and thereafter outputs the pixel values P of the second and subsequent filtered pixels at a rate of one per cycle. As before, a pixel value of the original image shared between the calculations for adjacent pixels need not be read from the image information storage device twice. In addition, because five product-sum calculators are used, the arithmetic circuit 30 achieves five times the conventional processing speed.

The present invention is not limited to image data as in the first and second embodiments; it is equally applicable to arithmetic circuits that filter, by convolution, the data values of any kind of data arranged in a two-dimensional space. The same holds for the subsequent embodiments. Nor are the arithmetic circuits 10 and 30 of the first and second embodiments limited to 2-column × 2-row or 5-column × 5-row filter processing; they are applicable to arithmetic circuits that perform N-column × N-row filter processing (N is an integer of 2 or more).

In general, in the arithmetic circuits of the first and second embodiments, after initialization, the data values of the 1st to N-th columns of an N-column × N-row range of the two-dimensional space are held in the data register in order over the 1st to N-th cycles, and the result of the first product-sum calculator is held in the first product-sum register. In addition, the accumulated value of the result of the n-th (n = 2 to N) product-sum calculator and the value held in the (n−1)-th product-sum register in the previous cycle is held in the n-th product-sum register. As a result, the N-th product-sum register outputs the convolution result for the first calculation target point, located at the center of the N-column × N-row range of the two-dimensional space.

In the (N+1)-th and subsequent cycles following cycles 1 to N, the data values of the (N+1)-th and subsequent columns, which adjoin the N-column × N-row range of the two-dimensional space, are held in the data register in order. As a result, after N + M cycles (M is an integer of 1 or more), the N-th product-sum register outputs the convolution result for the M-th calculation target point adjoining the first calculation target point in order in the column direction.
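The general N-column × N-row scheme can be checked against a direct sum of products. The sketch below is an illustration, not the patented circuit itself: the function names `pipeline_convolve` and `direct_convolve`, the row-major list layout, and the sample data are assumptions introduced here. Index `coeff[r][c]` stands for the coefficient in row r+1, column c+1, and one image column is consumed per cycle.

```python
def pipeline_convolve(image, coeff):
    """Model of the chained product-sum registers for an N x N filter.

    image: N rows, each a list of W pixel values (one column used per cycle).
    coeff: N x N coefficients, coeff[r][c] = row r, column c.
    Returns the values output by the N-th product-sum register from cycle N on.
    """
    n = len(coeff)
    width = len(image[0])
    regs = [0] * n  # product-sum registers 1..N, cleared at initialization
    outputs = []
    for cycle in range(width):
        column = [image[r][cycle] for r in range(n)]
        # Product-sum calculator c multiplies the column by coefficient column c.
        t = [sum(coeff[r][c] * column[r] for r in range(n)) for c in range(n)]
        # Register 1 takes T1; register c takes Tc plus the value register c-1
        # held in the previous cycle (the right-hand side reads the old regs).
        regs = [t[0]] + [t[c] + regs[c - 1] for c in range(1, n)]
        if cycle >= n - 1:  # the first complete result appears in cycle N
            outputs.append(regs[-1])
    return outputs


def direct_convolve(image, coeff):
    """Reference: sum of coeff[r][c] * image[r][x + c] for each window x."""
    n = len(coeff)
    width = len(image[0])
    return [
        sum(coeff[r][c] * image[r][x + c] for r in range(n) for c in range(n))
        for x in range(width - n + 1)
    ]


image = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
coeff = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(pipeline_convolve(image, coeff) == direct_convolve(image, coeff))  # True
```

Because each image column is read once and multiplied by every coefficient column in the same cycle, no column has to be fetched again for the neighboring output points.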

Next, a third embodiment of the present invention is described.

FIG. 3 is a conceptual diagram of a third embodiment showing the configuration of the arithmetic circuit of the present invention. Like the arithmetic circuit 30 of FIG. 2, the arithmetic circuit 50 shown in FIG. 3 performs filter processing by convolution on the pixel values of 25 pixels (5 columns × 5 rows) of an original image arranged in a two-dimensional space. The arithmetic circuit 50 of FIG. 3 comprises an original image register 52, a coefficient register 54, a preprocessing circuit 62, a post-processing circuit 64, and a counter 66.

FIG. 4 is a schematic diagram showing the arithmetic circuit 50 of FIG. 3 in concrete form. To avoid clutter, FIG. 4 omits the counter 66 and instead indicates its count value as j = 0 to 2. For example, the clock terminal of the flip-flop 60a1 is labeled j = 0, which means that the clock signal is supplied to the flip-flop 60a1 during the period in which the count value of the counter 66 is j = 0.

The following description of the arithmetic circuit 50 shown in FIGS. 3 and 4 focuses on its differences from the arithmetic circuit 30 of FIG. 2.

In the arithmetic circuit 50, the original image register 52 and the coefficient register 54 have the same functions as the original image register 32 and the coefficient register 34 of the arithmetic circuit 30 in FIG. 2.

In the arithmetic circuit 30 of FIG. 2, the coefficient register 34 outputs the coefficient values C1 to C5, corresponding to the five product-sum calculators 36a, 36b, 36c, 36d, and 36e respectively, simultaneously (in parallel). In the arithmetic circuit 50 of FIG. 3, by contrast, each time the count value j of the counter 66 changes, the coefficient register 54 outputs only one or two coefficient values simultaneously (in parallel) to the two product-sum calculators 56a and 56b. This is where the two circuits differ.

In the arithmetic circuit 30 shown in FIG. 2, if one coefficient value C is 8 bits, 8 bits × 5 × 5 = 200 wires are required. In the arithmetic circuit 50 shown in FIGS. 3 and 4, by contrast, at most two coefficient values are output from the coefficient register 54 at a time, so with the same 8-bit coefficient values only 8 bits × 5 × 2 = 80 wires are required, reducing the number of wires to 2/5.
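The wire counts quoted above follow from a single product. As a small worked check, assuming 8-bit coefficient values and five coefficients per column as in the text (the helper name `coefficient_wires` is hypothetical):

```python
from fractions import Fraction


def coefficient_wires(bits_per_value, rows, columns_in_parallel):
    """Wires from the coefficient register to the preprocessing circuit."""
    return bits_per_value * rows * columns_in_parallel


wires_circuit_30 = coefficient_wires(8, 5, 5)  # five coefficient columns at once
wires_circuit_50 = coefficient_wires(8, 5, 2)  # at most two columns at once
ratio = Fraction(wires_circuit_50, wires_circuit_30)
print(wires_circuit_30, wires_circuit_50, ratio)  # 200 80 2/5
```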

In other words, in this embodiment the output bit width of the coefficient register 54 can be reduced. With a narrower output, the coefficient register 54 can in practice be implemented not as a register but as a memory, FIFO, or similar element with a smaller area. This applies not only to the coefficient register 54 but also to the original image register, as shown in the embodiments described later.

The preprocessing circuit 62 consists of two product-sum calculators 56a and 56b, fewer than the number of columns (5).

The product-sum calculator 56a computes C_{2j+1}L_i and outputs the result T_{2j+1}. The product-sum calculator 56b computes C_{2j+2}L_i and outputs the result T_{2j+2}.

After initialization, the counter 66 repeatedly counts from 0 to 2 in synchronization with the clock signal. In this embodiment, when the count value j of the counter 66 is 0, the coefficient register 54 outputs the coefficient values C1 and C2, which are input to the product-sum calculators 56a and 56b, respectively. When the count value j is 1, the coefficient values C3 and C4 are input to the product-sum calculators 56a and 56b, respectively, and when the count value j is 2, the coefficient value C5 is input to the product-sum calculator 56a.

When the coefficient value C1 is input to the product-sum calculator 56a, it outputs C_{11}L_{1i} + C_{21}L_{2i} + C_{31}L_{3i} + C_{41}L_{4i} + C_{51}L_{5i} as the result T1. When C3 is input, it outputs C_{13}L_{1i} + C_{23}L_{2i} + C_{33}L_{3i} + C_{43}L_{4i} + C_{53}L_{5i} as T3, and when C5 is input, it outputs C_{15}L_{1i} + C_{25}L_{2i} + C_{35}L_{3i} + C_{45}L_{4i} + C_{55}L_{5i} as T5.

When the coefficient value C2 is input to the product-sum calculator 56b, it outputs C_{12}L_{1i} + C_{22}L_{2i} + C_{32}L_{3i} + C_{42}L_{4i} + C_{52}L_{5i} as the result T2. When C4 is input, it outputs C_{14}L_{1i} + C_{24}L_{2i} + C_{34}L_{3i} + C_{44}L_{4i} + C_{54}L_{5i} as T4.

Note that while the coefficient value C5 is input to the product-sum calculator 56a, the output of the product-sum calculator 56b is not used by the post-processing circuit 64, so any coefficient value may be input to the product-sum calculator 56b.
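The assignment of coefficient columns to the two product-sum calculators for each count value j can be tabulated as follows. This is only an illustrative sketch: the function `coefficient_schedule` and its parameters are hypothetical names, and `None` marks the "don't care" input of the calculator 56b when j = 2.

```python
def coefficient_schedule(n_columns=5, n_calculators=2):
    """For each count value j, list the 1-based coefficient column fed to each
    product-sum calculator (C_{2j+1} and C_{2j+2} in the 5-column, 2-calculator case).
    None means the calculator output is unused in that step."""
    steps = -(-n_columns // n_calculators)  # ceiling division
    schedule = []
    for j in range(steps):
        row = []
        for k in range(n_calculators):
            column = n_calculators * j + k + 1
            row.append(column if column <= n_columns else None)
        schedule.append(tuple(row))
    return schedule


schedule = coefficient_schedule()
print(schedule)  # [(1, 2), (3, 4), (5, None)]
```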

The post-processing circuit 64 consists of five product-sum registers 61a, 61b, 61c, 61d, and 61e, one for each column.

As shown in FIG. 4, the first product-sum register 61a consists of two flip-flops 60a1 and 60a2. When the count value j of the counter 66 is 0, the first-stage flip-flop 60a1 latches, at the clock edge, the output of the product-sum calculator 56a and temporarily holds it as the first-column product-sum result T1. Specifically, for example, the counter 66 operates at an active edge of the clock signal and the count value j changes to 0; during the following clock period the coefficient value is input to the calculator 56a and the calculator 56a outputs the product-sum result; and at the next active edge the result is latched into the flip-flop 60a1. The product-sum result held in this first-stage flip-flop 60a1 (hereinafter called the "temporary register") is latched into the second-stage flip-flop 60a2 as the held value P1 at the clock edge in the next period in which the count value j is 0. Until P1 is latched, the flip-flop 60a2 outputs the value P1' that it latched in the previous period in which the count value j was 0, and this value is input to the adder 58b of the second product-sum register 61b.

The product-sum register 61b consists of an adder 58b and two flip-flops 60b1 and 60b2. During the period in which the count value j of the counter 66 is 0, the first-stage flip-flop (temporary register) 60b1 latches, at the clock edge, the output of the product-sum calculator 56b as the second-column product-sum result T2. The adder 58b adds the product-sum result T2 held in the temporary register 60b1 to the output P1' of the flip-flop 60a2. The flip-flop 60b2 latches the adder output P1' + T2 at the clock edge in the next period in which the count value j is 0. The value P1' used here is the value held in the flip-flop 60a2 before P1 is latched.

The product-sum registers 61c, 61d, and 61e have the same configuration and operation as the product-sum register 61b. That is, during the period in which the count value j of the counter 66 is 1, the temporary registers 60c1 and 60d1 of the product-sum registers 61c and 61d latch, at the clock edge, the outputs of the product-sum calculators 56a and 56b as the third- and fourth-column product-sum results T3 and T4, respectively. During the period in which the count value j is 2, the temporary register 60e1 of the product-sum register 61e latches, at the clock edge, the output of the product-sum calculator 56a as the fifth-column product-sum result T5. The adders 58c to 58e of the product-sum registers 61c to 61e add the values P2' to P4' held in the second-stage flip-flops 60b2 to 60d2 of the product-sum registers 61b to 61d to the product-sum results T3 to T5 held in the corresponding temporary registers, and output the sums to the second-stage flip-flops 60c2 to 60e2. These sums P2' + T3 to P4' + T5 are latched into the second-stage flip-flops 60c2 to 60e2 of the respective product-sum registers at the clock edge in the next period in which the count value j is 0.

Next, the basic operation of the arithmetic circuit 50 is described.

In the arithmetic circuit 50, as initialization, the coefficient values C1 to C5, five coefficients for each of the five columns, are stored in the coefficient register 54. In addition, the temporary registers and second-stage flip-flops of at least the product-sum registers 61a, 61b, 61c, and 61d are all initialized to 0.

As described above, the counter 66 repeatedly counts 0, 1, 2 in synchronization with the clock signal and outputs the count value j = 0 to 2. The count value j is input to the coefficient register 54, the original image register 52, the preprocessing circuit 62, and the post-processing circuit 64. Each time the count value j becomes 0, the pixel values of the five pixels in one column of the original image to be processed are loaded into the original image register 52 as Li.

During the period in which the count value j of the counter 66 is 0, the coefficient register 54 outputs the first and second coefficient values C1 and C2, which are input to the product-sum calculators 56a and 56b of the preprocessing circuit 62, respectively.

The product-sum calculator 56a performs the product-sum operation C1L1 between the pixel values L1 of the five pixels in the first column, input from the original image register 52, and the five coefficient values C1 of the first column, input from the coefficient register 54, and outputs the result T1. The product-sum calculator 56b performs the product-sum operation C2L1 between the pixel values L1 of the five pixels in the first column and the five coefficient values C2 of the second column, and outputs the result T2.

The outputs T1 and T2 of the product-sum calculators 56a and 56b are latched at the clock edge into the flip-flop 60a1 of the first product-sum register 61a and the flip-flop 60b1 of the second product-sum register 61b, respectively.

Next, during the period in which the count value j of the counter 66 is 1, the coefficient register 54 outputs the third- and fourth-column coefficient values C3 and C4, which are input to the product-sum calculators 56a and 56b, respectively.

Similarly, the product-sum calculator 56a performs the product-sum operation C3L1 between the pixel values L1 of the five pixels in the first column and the five coefficient values C3 of the third column, and outputs the result T3. The product-sum calculator 56b performs the product-sum operation C4L1 between the pixel values L1 of the five pixels in the first column and the five coefficient values C4 of the fourth column, and outputs the result T4.

The outputs T3 and T4 of the product-sum calculators 56a and 56b are latched at the clock edge into the temporary register 60c1 of the third product-sum register 61c and the temporary register 60d1 of the fourth product-sum register 61d, respectively.

Next, during the period in which the count value j of the counter 66 is 2, the coefficient register 54 outputs the fifth coefficient value C5, which is input to the product-sum calculator 56a of the preprocessing circuit 62.

The product-sum calculator 56a performs the product-sum operation C5L1 between the pixel values L1 of the five pixels in the first column and the five coefficient values C5 of the fifth column, and outputs the result T5.

The output T5 of the product-sum calculator 56a is latched at the clock edge into the temporary register 60e1 of the fifth product-sum register 61e. In this way, the arithmetic circuit 50 uses two product-sum calculators 56a and 56b, fewer than the number of columns (5), to compute the five product-sum results T1 to T5 while the count value of the counter 66 changes through j = 0, 1, 2, and holds them in the temporary registers 60a1 to 60e1 of the respective product-sum registers 61a to 61e.

Then, in the next period in which the count value j is 0, the product-sum results temporarily held in the temporary registers 60a1 to 60e1 are latched into the second-stage flip-flops 60a2 to 60e2, either unchanged or after being added to the values held in the second-stage flip-flops of the product-sum registers 61a to 61d.

In the product-sum register 61a, the product-sum result T1 held in the temporary register 60a1 is latched into the flip-flop 60a2 at the clock edge. In the product-sum register 61b, the adder 58b adds the product-sum result T2 held in the temporary register 60b1 to the output P1' of the flip-flop 60a2, and the sum P1' + T2 is latched at the clock edge into the flip-flop 60b2 as the held value P2. The output P1' of the flip-flop 60a2 used in this addition is the value latched into the flip-flop 60a2 in the previous period in which the count value j of the counter 66 was 0. In this case it is the value after initialization, namely 0.

In the product-sum register 61c, the adder 58c adds the product-sum result T3 held in the temporary register 60c1 to the output P2' of the flip-flop 60b2, and the sum P2' + T3 is latched at the clock edge into the flip-flop 60c2 as the held value P3. In the product-sum register 61d, the adder 58d adds the product-sum result T4 held in the temporary register 60d1 to the output P3' of the flip-flop 60c2, and the sum P3' + T4 is latched at the clock edge into the flip-flop 60d2 as the held value P4. The outputs P2' and P3' of the flip-flops 60b2 and 60c2 used in these additions are the values latched into the flip-flops 60b2 and 60c2 in the previous period in which the count value j was 0. In this case they are the values after initialization, namely 0.

In the product-sum register 61e, the adder 58e adds the product-sum result T5 held in the temporary register 60e1 to the output P4' of the flip-flop 60d2, and the sum P4' + T5 is latched at the clock edge into the flip-flop 60e2 as the held value P5. The output P4' of the flip-flop 60d2 used in this addition is the value latched in the previous period in which the count value j was 0. In this case it is the value after initialization, namely 0. The output P5 of the flip-flop 60e2 is output as the pixel value P of each pixel of the filtered image.

In this way, while the count value j of the counter 66 changes through 0, 1, 2, the arithmetic circuit 50 uses the two product-sum calculators 56a and 56b to perform the product-sum operations C1L1 to C5L1 and holds the results T1 to T5 in the temporary registers 60a1 to 60e1 of the first to fifth product-sum registers 61a to 61e. Then, in the next period in which the count value j is 0, the product-sum result T1 held in the temporary register 60a1 of the first product-sum register 61a is latched into the second-stage flip-flop 60a2 of the same product-sum register 61a. At the same time, the second-stage flip-flops 60b2 to 60e2 of the second to fifth product-sum registers 61b to 61e latch the sums of the product-sum results T2 to T5, held in the temporary registers 60b1 to 60e1 of the same product-sum registers 61b to 61e, and the values P1' to P4' that were latched into the second-stage flip-flops 60a2 to 60d2 of the first to fourth product-sum registers 61a to 61d in the previous period in which the count value j was 0.
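The cascade summarized above can be written compactly as a recurrence. In the following sketch, the superscript (k), which denotes the processing cycle, is notation introduced here for illustration and does not appear in the original description; T_n^{(k)} is the product-sum result for coefficient column n computed from the pixel column loaded in cycle k, and P_n^{(k)} is the value latched into the second-stage flip-flop of the n-th product-sum register at the end of cycle k.

```latex
\begin{aligned}
T_n^{(k)} &= \sum_{r=1}^{5} C_{n,r}\, L_r^{(k)}, \\
P_1^{(k)} &= T_1^{(k)}, \qquad
P_n^{(k)} = P_{n-1}^{(k-1)} + T_n^{(k)} \quad (n = 2,\dots,5), \qquad
P_n^{(0)} = 0, \\
P_5^{(k)} &= \sum_{n=1}^{5} T_n^{(k-5+n)} \quad (k \ge 5).
\end{aligned}
```

Unrolling the recurrence gives the last line: the fifth register accumulates one product-sum result from each of five consecutive pixel columns, which is why the first complete result appears only after five cycles.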

That is, in the arithmetic circuit 50, while the count value j of the counter 66 changes through 0, 1, 2 and returns to 0, the circuit performs the operation that the arithmetic circuit 30 of FIG. 2 performs in one cycle. Hereinafter, each period in which the count value j is 0, 1, or 2 is called a step, to distinguish it from a processing cycle.

In the arithmetic circuit 50, the counter 66 continues to count 0, 1, 2 repeatedly and the above operation is repeated. As in the arithmetic circuit 30, five cycles after initialization the pixel value P of the first pixel filtered by the 5-column × 5-row convolution operation is held in the flip-flop 60e2 of the last product-sum register 61e of the post-processing circuit 64 and is output.

Likewise, when the pixel values of the pixels of the original image shown in FIG. 8 are stored in the image information storage device and filtering proceeds in the order shown in FIG. 13, the specific operation of the arithmetic circuit 50 is as shown in Tables 5 and 6 below.

Tables 5 and 6 show, for each step, the count value j, the product-sum results T1 to T5 latched in synchronization with the clock edge into the temporary registers 60a1 to 60e1 of the product-sum registers 61a to 61e, and the values P1 to P5 latched into the second-stage flip-flops 60a2 to 60e2. Consequently, in a step in which the count value j is 0, the values P1' to P4' supplied to the adders 58b to 58e of the product-sum registers 61b to 61e in order to compute the values to be latched into the second-stage flip-flops 60b2 to 60e2 differ from the values listed in Table 6 as P1 to P4 for that same step. For example, Table 6 lists T11, T21, T31, and T41 as the values of P1 to P4 in the fourth step. The values P1' to P4' supplied to the adders 58b to 58e in that step, however, are the values that were latched into the flip-flops 60a2 to 60d2 in the first step, the previous step in which j was 0, and held through the third step, namely 0. In Tables 5 and 6, T11, for example, is the same as C1L1 in Table 4. The same applies to T12, T13, ... and T21, T31, ..., and so on.

The following describes, with reference to Tables 4 and 5, how the arithmetic circuit 50 performs, while the count value j of the counter 66 changes through 0, 1, 2 and returns to 0, the operation that the arithmetic circuit 30 of FIG. 2 performs in one cycle.

For example, in the first cycle, which starts at the first step of Tables 5 and 6, the pixel values L1 of the first column are loaded into the original image register 52 in the first step. Over the first to third steps, the two product-sum calculators 56a and 56b perform the product-sum operations between the first-column pixel values L1 held in the original image register 52 and the coefficient values held in the coefficient register 54. The results T11 to T51 are held in the temporary registers 60a1 to 60e1 of the product-sum registers 61a to 61e as the values T1 to T5 shown in FIG. 4. Then, in the fourth step, the product-sum result T11 between the first-column pixel values and the first-column coefficients, held in the temporary register 60a1 of the first product-sum register 61a, is latched into the second-stage flip-flop 60a2 of the first product-sum register 61a as the value P1 of Table 6. The product-sum results T21 to T51 between the first-column pixel values and the second- to fifth-column coefficient values, held in the temporary registers 60b1 to 60e1 of the second to fifth product-sum registers 61b to 61e, are added to the values P1' to P4' supplied from the second-stage flip-flops 60a2 to 60d2 of the first to fourth product-sum registers 61a to 61d (the values P1 to P4 latched in the first step), and the sums are latched into the second-stage flip-flops 60b2 to 60e2 of the second to fifth product-sum registers 61b to 61e. In this case, the values latched into the flip-flops 60a2 to 60d2 in the first step are all 0, so the product-sum results T21 to T51 are held unchanged as the values P2 to P5 of Table 6. This completes the first processing cycle.

In this fourth step, in parallel with the latching of the first-cycle product-sum results or sums into the second-stage flip-flops 60a2 to 60e2, the second cycle begins: the second-column pixel values L2 are loaded into the original image register 52, and the product-sum operations between L2 and the coefficient values and the latching of their results into the temporary registers start. That is, the values of T1 and T2 in Table 5 change to T12 and T22, respectively. Thus, in the arithmetic circuit 50, in each step in which the count value j of the counter 66 is 0, parts of the processing of two consecutive cycles are performed in parallel.

The processing of the second cycle continues in the fifth to seventh steps. That is, following the fourth step, the product-sum operations and the latching of their results into the temporary registers are performed in the fifth and sixth steps, and in the seventh step the product-sum results or sums are latched into the second-stage flip-flops 60a2 to 60e2. The values held in the second-stage flip-flops 60a2 to 60d2 of the first to fourth product-sum registers 61a to 61d that are used to compute the sums in the seventh step are those latched in the previous cycle, that is, in the fourth step, in synchronization with the clock signal.

The values P1' to P4' used for the addition performed in the fourth step, at the end of the first cycle, are those latched as P1 to P4 into the second-stage flip-flops 60a2 to 60d2 in the first step, which follows the initialization step. As described above, in the first step the product-sum operations of the first cycle and the latching of their results into the temporary registers have already begun. However, also as described above, in each step in which the count value j is 0 the arithmetic circuit 50 performs parts of the processing of two consecutive cycles in parallel. The latching of P1 to P4 into the second-stage flip-flops 60a2 to 60d2 in the first step is therefore performed as part of the preceding cycle, namely the initialization cycle. Accordingly, the values held in the second-stage flip-flops 60a2 to 60d2 of the first to fourth product-sum registers 61a to 61d that are used to compute the sums in the fourth step, the last step of the first cycle, are likewise values latched in the preceding cycle.

The third and subsequent cycles are processed in the same way, and in the sixteenth step, at which the fifth cycle ends, the pixel value P of the first filtered pixel is output. Thereafter, the pixel values P of the second and subsequent filtered pixels are output in turn every three steps, that is, once per processing cycle.
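The step-by-step behavior described above can be checked with a small register-level model. The following is a minimal sketch under stated assumptions, not the patented circuit itself: the function name `simulate_circuit50`, the list-based registers, and the test data are all introduced here for illustration, and every register is assumed to latch at the clock edge that ends a step, using the values present before that edge.

```python
def simulate_circuit50(pixel_columns, coeff_columns, n_calculators=2):
    """Model of two product-sum calculators feeding N two-stage product-sum registers.
    Returns a list of (step, value) pairs for the completed filter outputs."""
    n = len(coeff_columns)                 # number of coefficient columns (5 here)
    rows = len(coeff_columns[0])
    steps_per_cycle = -(-n // n_calculators)  # ceiling division: 3 for 5 columns, 2 calculators
    temp = [0] * n                         # temporary registers 60a1..60e1
    held = [0] * n                         # second-stage flip-flops 60a2..60e2
    outputs = []
    total_steps = steps_per_cycle * len(pixel_columns) + 1
    for step in range(1, total_steps + 1):
        cycle, j = divmod(step - 1, steps_per_cycle)
        # Pixel column loaded when j == 0; zeros once the input is exhausted.
        pixels = pixel_columns[cycle] if cycle < len(pixel_columns) else [0] * rows
        new_temp, new_held = list(temp), list(held)
        # Each calculator handles coefficient column n_calculators*j + k (0-based).
        for k in range(n_calculators):
            col = n_calculators * j + k
            if col < n:
                new_temp[col] = sum(c * p for c, p in zip(coeff_columns[col], pixels))
        if j == 0:
            # Second stage latches using the values held before this clock edge.
            new_held[0] = temp[0]
            for i in range(1, n):
                new_held[i] = held[i - 1] + temp[i]
        temp, held = new_temp, new_held
        if j == 0 and cycle >= n:
            outputs.append((step, held[-1]))
    return outputs


# Arbitrary illustrative data: 7 pixel columns and 5 coefficient columns of 5 values each.
pixel_columns = [[(3 * c + r) % 7 for r in range(5)] for c in range(7)]
coeff_columns = [[(c + 2 * r) % 5 - 2 for r in range(5)] for c in range(5)]
outputs = simulate_circuit50(pixel_columns, coeff_columns)

# Direct evaluation of the same sums for comparison.
direct = [
    sum(sum(c * p for c, p in zip(coeff_columns[i], pixel_columns[w + i])) for i in range(5))
    for w in range(3)
]
print([step for step, _ in outputs])  # [16, 19, 22]
```

Under these assumptions the first result appears at step 16 and further results follow every three steps, and each matches the directly computed sum of products over the corresponding five pixel columns.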

The arithmetic circuit 50 of this embodiment also has the advantage that the pixel values of original-image pixels shared by the computations for adjacent pixels need not be read from the image information storage device twice. In addition, the arithmetic circuit 50 uses only two product-sum calculators 56a and 56b. Its processing speed is therefore one third of the conventional speed, but in applications where processing speed is not critical the circuit scale can be reduced.

In this embodiment, the coefficient values are input to the product-sum calculator 56a in the order C1, C3, C5 and to the product-sum calculator 56b in the order C2, C4. However, the present invention is not limited to this, and the coefficient values may be input to the product-sum calculators in any order. Which coefficient value is input to which of the product-sum calculators 56a and 56b is also arbitrary.

The arithmetic circuit of the third embodiment is not limited to 5 columns × 5 rows and can be applied to filtering by a convolution operation of N columns × N rows (N being an integer of 3 or more). In that case, the number m of product-sum calculators in the preprocessing circuit 62 satisfies 2 ≤ m < N.
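For general N and m, the number of steps per cycle and the step at which the first result appears can be estimated as follows. This is a sketch by extrapolation from the 5-column, 2-calculator case (3 steps per cycle, first output at step 16); the formulas and the function names `steps_per_cycle` and `first_output_step` are assumptions made here and are not stated in the original description.

```python
import math


def steps_per_cycle(n, m):
    """Steps needed for m calculators to cover N coefficient columns (assumed ceil(N / m))."""
    if n < 3 or not (2 <= m < n):
        raise ValueError("expected N >= 3 and 2 <= m < N")
    return math.ceil(n / m)


def first_output_step(n, m):
    """Step at which the N-th cycle ends, assuming the two-stage register timing of FIG. 4."""
    return n * steps_per_cycle(n, m) + 1


print(steps_per_cycle(5, 2), first_output_step(5, 2))  # 3 16
```

With m = 2 and N = 5 this reproduces the factor-of-three slowdown noted for the arithmetic circuit 50; choosing a larger m trades circuit area for fewer steps per cycle.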

In general, the arithmetic circuit of this embodiment, after initialization, loads the data values of the 1st to N-th columns of an N-column × N-row region of the two-dimensional space into the data register in order over cycles 1 to N. In each cycle, at least some of the m product-sum calculators of the preprocessing circuit each compute the sum of products between the N data values and the N coefficients of one of the columns of the N-column × N-row coefficients, and this operation is repeated fewer than N times (N steps) so that the sums of products between the N data values and the coefficients of all columns are computed. The product-sum result between the N data values and the N coefficients of the first column is held in the first product-sum register of the post-processing circuit, and the sum of the product-sum result between the N data values and the coefficients of the n-th column (n = 2 to N) and the value held in the (n−1)-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit. As a result, the N-th product-sum register outputs the convolution result for the first calculation target point, located at the center of the N-column × N-row region of the two-dimensional space.

また、図４に示すように、後処理回路がさらに、１〜Ｎ番目の積和レジスタのそれぞれに対応して設けられた１〜Ｎ番目の一時レジスタ（図４のフリップフロップ６０ａ１，６０ｂ１，６０ｃ１，６０ｄ１，６０ｅ１）を備えることが好ましい。この場合、各サイクル毎に、積和演算器が演算した、データ値と１〜Ｎ列目の係数値との間の積和演算結果のそれぞれが１〜Ｎ番目の一時レジスタに保持された後に、一時レジスタのｎ番目に保持された積和演算結果と前サイクルでｎ−１番目の積和レジスタに保持された値との積算値が、ｎ番目の積和レジスタに保持される。 Further, as shown in FIG. 4, the post-processing circuit is further provided with 1-Nth temporary registers (flip-flops 60a1, 60b1, 60c1 of FIG. 4) provided corresponding to each of the 1-Nth product-sum registers. , 60d1, 60e1). In this case, after each product-sum operation result between the data value and the coefficient value in the 1st to Nth columns calculated by the product-sum operation unit is held in the 1st to Nth temporary registers for each cycle. The integrated value of the nth product-sum operation result held in the temporary register and the value held in the n−1th product-sum register in the previous cycle is held in the nth product-sum register.

次に、本発明の第４の実施形態について説明する。 Next, a fourth embodiment of the present invention will be described.

図５は、本発明の演算回路の構成を表す第４の実施形態の概略図である。同図に示す演算回路７０は、２次元空間に配置された元画像の２列×２行の４画素分の画素値から、畳み込み演算によるフィルタ処理を行う。演算回路７０は、元画像用シフトレジスタ７２と、係数用シフトレジスタ７４と、前処理回路８２と、後処理回路８４と、カウンタ８６とによって構成されている。 FIG. 5 is a schematic diagram of the fourth embodiment showing the configuration of the arithmetic circuit of the present invention. The arithmetic circuit 70 shown in the figure performs a filtering process by a convolution operation from pixel values of 4 pixels of 2 columns × 2 rows of an original image arranged in a two-dimensional space. The arithmetic circuit 70 includes an original image shift register 72, a coefficient shift register 74, a preprocessing circuit 82, a postprocessing circuit 84, and a counter 86.

元画像用シフトレジスタ７２は、処理対象となる元画像の２列×２行＝４画素分の画素値を保持する。そして、１サイクル毎に、次の１列の２画素分の画素値が、図５中右側の１列にシフトインされ、各列の２画素分の画素値が左側に１列ずつシフトされる。 The original image shift register 72 holds pixel values of 2 columns × 2 rows = 4 pixels of the original image to be processed. Then, for each cycle, the pixel values for the two pixels in the next column are shifted into the right column in FIG. 5, and the pixel values for the two pixels in each column are shifted one column to the left. .

すなわち、元画像用シフトレジスタ７２では、１サイクル毎に、左側（１列目）の１列の２画素分の画素値Ｌ_iがシフトアウトされ、右側（２列目）の１列の２画素分の画素値Ｌ_i+1が左側の１列にシフトされてＬ_iとして保持され、処理対象となる次の１列の２画素分の画素値が新たなＬ_i+1として右側の１列に保持される。１サイクル毎に、左側の１列の２画素分の画素値Ｌ_iは積和演算器７６ａに入力され、右側の１列の２画素分の画素値Ｌ_i+1は積和演算器７６ｂに入力される。 That is, in the original image shift register 72, in each cycle, the pixel values L_i of the two pixels in the left (first) column are shifted out, the pixel values L_{i+1} of the two pixels in the right (second) column are shifted to the left column and held as L_i, and the pixel values of the two pixels in the next column to be processed are held in the right column as the new L_{i+1}. In each cycle, the pixel values L_i of the left column are input to the product-sum calculator 76a, and the pixel values L_{i+1} of the right column are input to the product-sum calculator 76b.

続いて、係数用シフトレジスタ７４は、演算回路７０の機能（フィルタ処理）を決定するための、元画像の２列×２行＝４画素に対応する４個の係数値Ｃ１１，Ｃ２１，Ｃ１２，Ｃ２２を保持する。 Subsequently, the coefficient shift register 74 determines the function (filtering process) of the arithmetic circuit 70, and the four coefficient values C11, C21, C12, corresponding to 2 columns × 2 rows = 4 pixels of the original image. Hold C22.

係数用シフトレジスタ７４では、１サイクル毎に、図５中、左側の１列の２つの係数値Ｃ１１，Ｃ２１と、右側の１列の２つの係数値Ｃ１２，Ｃ２２とが交互にシフト（ローテーション）され、左側の１列の２つの係数値が出力される。１列の２つの係数値Ｃ１（＝Ｃ１１，Ｃ２１）もしくはＣ２（＝Ｃ１２，Ｃ２２）は、１サイクル毎に、積和演算器７６ａ、７６ｂの両方に共通に入力される。 In the coefficient shift register 74, two coefficient values C11 and C21 in one column on the left side and two coefficient values C12 and C22 in one column on the right side in FIG. 5 are alternately shifted (rotated) every cycle. Then, two coefficient values in one column on the left side are output. Two coefficient values C1 (= C11, C21) or C2 (= C12, C22) in one column are input in common to both the product-sum calculators 76a and 76b for each cycle.

カウンタ８６は、初期化後、クロック信号に同期して１〜２を繰り返しカウントする。このカウンタ８６の計数と、係数用シフトレジスタ７４のシフトとが、同一のクロック信号に同期して行われることにより、本実施形態の場合、カウント値ｋ＝１の時には係数値Ｃ１が、カウント値ｋ＝２の時には係数値Ｃ２が、積和演算器７６ａ、７６ｂに共通に入力される。 The counter 86 repeatedly counts 1 to 2 in synchronization with the clock signal after initialization. The count of the counter 86 and the shift of the coefficient shift register 74 are performed in synchronization with the same clock signal. In this embodiment, when the count value k = 1, the coefficient value C1 becomes the count value. When k = 2, the coefficient value C2 is commonly input to the product-sum calculators 76a and 76b.

前処理回路８２は、列数分の２つの積和演算器７６ａ、７６ｂによって構成されている。 The preprocessing circuit 82 includes two product-sum calculators 76a and 76b corresponding to the number of columns.

積和演算器７６ａは、サイクル毎に、元画像用シフトレジスタ７２の左側の１列の２画素分の画素値Ｌ_i（例えば、サイクル１ではＬ１，サイクル２ではＬ２）と、係数用シフトレジスタ７４の１番目（図５中、左側）の１列の２つの係数値Ｃ_k（例えば、サイクル１ではＣ１，サイクル２ではＣ２）との積和演算Ｔ１＝Ｃ_kＬ_iを行う。積和演算器７６ａの出力Ｔ１は加算器７８ａに入力される。 In each cycle, the product-sum calculator 76a performs the product-sum operation T1 = C_k L_i between the pixel values L_i of the two pixels in the left column of the original image shift register 72 (for example, L1 in cycle 1 and L2 in cycle 2) and the two coefficient values C_k of the first (left in FIG. 5) column of the coefficient shift register 74 (for example, C1 in cycle 1 and C2 in cycle 2). The output T1 of the product-sum calculator 76a is input to the adder 78a.

一方、積和演算器７６ｂは、元画像用シフトレジスタ７２の右側の１列の２画素分の画素値Ｌ_i+1（例えば、サイクル１ではＬ２，サイクル２ではＬ３）と、係数用シフトレジスタ７４の１番目の１列の２つの係数値Ｃ_kとの積和演算Ｔ２＝Ｃ_kＬ_i+1を行う。積和演算器７６ｂの出力Ｔ２は加算器７８ｂに入力される。 Meanwhile, the product-sum calculator 76b performs the product-sum operation T2 = C_k L_{i+1} between the pixel values L_{i+1} of the two pixels in the right column of the original image shift register 72 (for example, L2 in cycle 1 and L3 in cycle 2) and the two coefficient values C_k of the first column of the coefficient shift register 74. The output T2 of the product-sum calculator 76b is input to the adder 78b.

後処理回路８４は、列数分の２つの積和レジスタ８１ａ、８１ｂによって構成されている。 The post-processing circuit 84 includes two product-sum registers 81a and 81b corresponding to the number of columns.

積和レジスタ８１ａは、加算器７８ａと、セレクタ７９ａと、フリップフロップ８０ａによって構成されている。加算器７８ａは、積和演算器７６ａの出力Ｔ１とフリップフロップ８０ａの出力Ｐ１’とを加算する。セレクタ７９ａは、カウンタ８６のカウント値ｋ＝１の時には積和演算器７６ａの出力Ｔ１を選択し、出力する。カウント値ｋ＝２の時には加算器７８ａの出力Ｐ１’＋Ｔ１を選択し、出力する。フリップフロップ８０ａは、クロックエッジのタイミングで、セレクタ７９ａの出力を保持する。保持された値Ｐ１は、フィルタ処理後の画像の奇数番目の各画素の画素値として出力される。 The product-sum register 81a consists of an adder 78a, a selector 79a, and a flip-flop 80a. The adder 78a adds the output T1 of the product-sum calculator 76a and the output P1' of the flip-flop 80a. The selector 79a selects and outputs the output T1 of the product-sum calculator 76a when the count value k of the counter 86 is 1, and the output P1' + T1 of the adder 78a when k = 2. The flip-flop 80a holds the output of the selector 79a at the timing of the clock edge. The held value P1 is output as the pixel value of each odd-numbered pixel of the filtered image.

また、積和レジスタ８１ｂは、加算器７８ｂと、セレクタ７９ｂと、フリップフロップ８０ｂによって構成されている。加算器７８ｂは、積和演算器７６ｂの出力Ｔ２とフリップフロップ８０ｂの出力Ｐ２’とを加算する。セレクタ７９ｂは、カウンタ８６のカウント値ｋ＝１の時には積和演算器７６ｂの出力Ｔ２を選択し、出力する。カウント値ｋ＝２の時には加算器７８ｂの出力Ｐ２’＋Ｔ２を選択し、出力する。フリップフロップ８０ｂは、クロック信号に同期して、セレクタ７９ｂの出力を保持する。保持された値Ｐ２は、フィルタ処理後の画像の偶数番目の各画素の画素値として出力される。 Likewise, the product-sum register 81b consists of an adder 78b, a selector 79b, and a flip-flop 80b. The adder 78b adds the output T2 of the product-sum calculator 76b and the output P2' of the flip-flop 80b. The selector 79b selects and outputs the output T2 of the product-sum calculator 76b when the count value k of the counter 86 is 1, and the output P2' + T2 of the adder 78b when k = 2. The flip-flop 80b holds the output of the selector 79b in synchronization with the clock signal. The held value P2 is output as the pixel value of each even-numbered pixel of the filtered image.

次に、表７を参照しながら、図８に示す元画像の各画素の画素値が画像情報記憶装置に記憶され、図１３に示す順序でフィルタ処理が順次行われる場合の演算回路７０の動作を説明する。 Next, referring to Table 7, the operation of the arithmetic circuit 70 when the pixel value of each pixel of the original image shown in FIG. 8 is stored in the image information storage device and the filter processing is sequentially performed in the order shown in FIG. Will be explained.

初期化のサイクルでは、元画像用シフトレジスタ７２には、図５中上側に示すように、左側の１列の２画素分の画素値Ｌ_iとして、元画像の１列目の２画素分の画素値Ｌ１１，Ｌ２１が保持され、右側の２画素分の画素値Ｌ_i+1として、元画像の２列目の２画素分の画素値Ｌ１２，Ｌ２２が保持される。係数用シフトレジスタ７４には、係数値Ｃ１（＝Ｃ１１，Ｃ２１）およびＣ２（＝Ｃ１２，Ｃ２２）が、最初にＣ１が出力される状態に保持される。 In the initialization cycle, as shown in the upper part of FIG. 5, the original image shift register 72 holds the pixel values L11 and L21 of the two pixels in the first column of the original image as the left-column pixel values L_i, and the pixel values L12 and L22 of the two pixels in the second column as the right-column pixel values L_{i+1}. The coefficient shift register 74 holds the coefficient values C1 (= C11, C21) and C2 (= C12, C22) in a state in which C1 is output first.

サイクル１では、積和演算器７６ａでは、元画像用レジスタ７２から入力される左側の１列の２画素分の画素値Ｌ１＝Ｌ１１，Ｌ２１と、係数用シフトレジスタ７４から入力される２つの係数値Ｃ１＝Ｃ１１，Ｃ２１との積和演算Ｃ１Ｌ１＝Ｃ１１Ｌ１１＋Ｃ２１Ｌ２１が行われる。 In cycle 1, the product-sum calculator 76a performs the product-sum operation C1L1 = C11L11 + C21L21 between the pixel values L1 = L11, L21 of the two pixels in the left column, input from the original image shift register 72, and the two coefficient values C1 = C11, C21 input from the coefficient shift register 74.

同様に、積和演算器７６ｂでは、右側の１列の２画素分の画素値Ｌ２＝Ｌ１２，Ｌ２２と、２つの係数値Ｃ１＝Ｃ１１，Ｃ２１との積和演算Ｃ１Ｌ２＝Ｃ１１Ｌ１２＋Ｃ２１Ｌ２２が行われる。 Similarly, the product-sum calculator 76b performs a product-sum operation C1L2 = C11L12 + C21L22 of the pixel values L2 = L12, L22 for two pixels in the right column and the two coefficient values C1 = C11, C21.

そして、サイクル１では、カウンタ８６のカウント値ｋ＝１であるため、セレクタ７９ａにより、積和演算器７６ａの出力Ｔ１＝Ｃ１１Ｌ１１＋Ｃ２１Ｌ２１が選択され、クロックエッジのタイミングでフリップフロップ８０ａに保持される。 In cycle 1, since the count value k of the counter 86 is 1, the output T1 = C11L11 + C21L21 of the product-sum calculator 76a is selected by the selector 79a and is held in the flip-flop 80a at the timing of the clock edge.

同様に、セレクタ７９ｂにより、積和演算器７６ｂの出力Ｔ２＝Ｃ１１Ｌ１２＋Ｃ２１Ｌ２２が選択され、クロックエッジのタイミングでフリップフロップ８０ｂに保持される。 Similarly, the selector 79b selects the output T2 = C11L12 + C21L22 of the product-sum calculator 76b and holds it in the flip-flop 80b at the timing of the clock edge.

サイクル２では、図５中下側に示すように、元画像用シフトレジスタ７２の左側の１列の２画素分の画素値Ｌ_iとして、元画像の２列目の２画素分の画素値Ｌ１２，Ｌ２２が保持され、右側の２画素分の画素値Ｌ_i+1として、元画像の３列目の２画素分の画素値Ｌ１３，Ｌ２３が保持され、それぞれ、積和演算器７６ａおよび７６ｂに入力される。また、係数用シフトレジスタ７４からは、係数値Ｃ２（＝Ｃ１２，Ｃ２２）が出力される。 In cycle 2, as shown in the lower part of FIG. 5, the pixel values L12 and L22 of the two pixels in the second column of the original image are held as the pixel values L_i of the left column of the original image shift register 72, and the pixel values L13 and L23 of the two pixels in the third column of the original image are held as the pixel values L_{i+1} of the right column; these are input to the product-sum calculators 76a and 76b, respectively. The coefficient shift register 74 outputs the coefficient values C2 (= C12, C22).

従って、積和演算器７６ａでは、左側の１列の２画素分の画素値Ｌ２＝Ｌ１２，Ｌ２２と、係数値Ｃ２＝Ｃ１２，Ｃ２２との積和演算Ｃ２Ｌ２＝Ｃ１２Ｌ１２＋Ｃ２２Ｌ２２が行われる。 Accordingly, the product-sum calculator 76a performs a product-sum operation C2L2 = C12L12 + C22L22 of the pixel values L2 = L12, L22 for the two pixels in the left column and the coefficient values C2 = C12, C22.

同様に、積和演算器７６ｂでは、右側の１列の２画素分の画素値Ｌ３＝Ｌ１３，Ｌ２３と、係数値Ｃ２＝Ｃ１２，Ｃ２２との積和演算Ｃ２Ｌ３＝Ｃ１２Ｌ１３＋Ｃ２２Ｌ２３が行われる。 Similarly, the product-sum calculator 76b performs a product-sum operation C2L3 = C12L13 + C22L23 of the pixel values L3 = L13, L23 for the two pixels in the right column and the coefficient values C2 = C12, C22.

積和レジスタ８１ａでは、加算器７８ａにより、積和演算器７６ａの出力Ｔ１＝Ｃ１２Ｌ１２＋Ｃ２２Ｌ２２とフリップフロップ８０ａの出力Ｐ１’＝Ｃ１１Ｌ１１＋Ｃ２１Ｌ２１とが加算される。そして、サイクル２では、カウンタ８６のカウント値ｋ＝２であるため、セレクタ７９ａにより、加算器７８ａの出力Ｐ１’＋Ｔ１＝Ｃ１１Ｌ１１＋Ｃ１２Ｌ１２＋Ｃ２１Ｌ２１＋Ｃ２２Ｌ２２が選択され、クロックエッジのタイミングでフリップフロップ８０ａに保持される。保持された値Ｐ１は、フィルタ処理後の画像の１番目の画素の画素値Ｐ１１として出力される。 In the product-sum register 81a, the output T1 = C12L12 + C22L22 of the product-sum calculator 76a and the output P1 ′ = C11L11 + C21L21 of the flip-flop 80a are added by the adder 78a. In cycle 2, since the count value k of the counter 86 is 2, the selector 79a selects the output P1 '+ T1 = C11L11 + C12L12 + C21L21 + C22L22 of the adder 78a and holds it in the flip-flop 80a at the timing of the clock edge. The held value P1 is output as the pixel value P11 of the first pixel of the image after filtering.

同様に、積和レジスタ８１ｂでは、加算器７８ｂにより、積和演算器７６ｂの出力Ｔ２＝Ｃ１２Ｌ１３＋Ｃ２２Ｌ２３と、クロック信号以前のフリップフロップ８０ｂの出力Ｐ２’＝Ｃ１１Ｌ１２＋Ｃ２１Ｌ２２とが加算される。そして、セレクタ７９ｂにより、加算器７８ｂの出力Ｐ２’＋Ｔ２＝Ｃ１１Ｌ１２＋Ｃ１２Ｌ１３＋Ｃ２１Ｌ２２＋Ｃ２２Ｌ２３が選択され、クロックエッジのタイミングでフリップフロップ８０ｂに保持される。保持された値Ｐ２は、フィルタ処理後の画像の２番目の画素の画素値Ｐ１２として出力される。 Similarly, in the product-sum register 81b, the adder 78b adds the output T2 = C12L13 + C22L23 of the product-sum calculator 76b and the output P2' = C11L12 + C21L22 that the flip-flop 80b held before the clock edge. The selector 79b then selects the output P2' + T2 = C11L12 + C12L13 + C21L22 + C22L23 of the adder 78b, which is held in the flip-flop 80b at the timing of the clock edge. The held value P2 is output as the pixel value P12 of the second pixel of the filtered image.

これ以降の動作は、上記と同様である。すなわち、演算回路７０では、初期化後、２サイクルで１および２番目のフィルタ処理後の２画素分の画素値Ｐ１１，Ｐ１２が同時（並列）に出力される。これ以後、２サイクル毎に、フィルタ処理後の２画素分の画素値が同時に出力される。 The subsequent operation is the same as described above. That is, in the arithmetic circuit 70, pixel values P11 and P12 for two pixels after the first and second filter processing in two cycles after initialization are output simultaneously (in parallel). Thereafter, pixel values for two pixels after the filter processing are simultaneously output every two cycles.
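The two-cycle operation of the arithmetic circuit 70 can be checked with a small behavioral model. The sketch below is an illustration only, not the patented circuit itself: the function names `mac` and `run_circuit_2x2` are hypothetical, each column is modeled as a list of two pixel values, and the flip-flops 80a and 80b are modeled as plain variables updated once per cycle.

```python
def mac(coeffs, pixels):
    """Product-sum of two coefficient values and two pixel values."""
    return sum(c * p for c, p in zip(coeffs, pixels))


def run_circuit_2x2(columns, c1, c2):
    """Model of two product-sum calculators sharing one coefficient column per cycle.

    columns : image columns L1, L2, L3, ... (each a list of 2 pixel values)
    c1, c2  : coefficient columns C1 = (C11, C21) and C2 = (C12, C22)
    Returns one (P1, P2) pair for every two cycles.
    """
    outputs = []
    p1 = p2 = 0                                # flip-flops 80a and 80b
    for cycle in range(len(columns) - 1):
        k = cycle % 2 + 1                      # counter value: 1, 2, 1, 2, ...
        coeff = c1 if k == 1 else c2           # coefficient column selected by k
        left, right = columns[cycle], columns[cycle + 1]   # L_i and L_{i+1}
        t1 = mac(coeff, left)                  # product-sum calculator 76a
        t2 = mac(coeff, right)                 # product-sum calculator 76b
        if k == 1:
            p1, p2 = t1, t2                    # selectors pass T1, T2 through
        else:
            p1, p2 = p1 + t1, p2 + t2          # selectors pass the adder outputs
            outputs.append((p1, p2))           # two filtered pixels are ready
    return outputs


if __name__ == "__main__":
    cols = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(run_circuit_2x2(cols, [1, 2], [3, 4]))   # [(30, 50), (70, 90)]
```

With five input columns the model runs four cycles and emits two pairs of results, matching the statement that two filtered pixel values appear together every two cycles.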

本実施形態の演算回路７０においても、隣りあう画素の演算で共通に使用される元画像の画素の画素値を画像情報記憶装置から２度読み出す必要がないという利点がある。また、本実施形態の演算回路７０では、２つの積和演算器７６ａ、７６ｂと、２つの積和レジスタ８１ａ、８１ｂとが用いられている。これによって、従来の２倍の処理速度が実現されている。 The arithmetic circuit 70 of this embodiment also has the advantage that the pixel values of original-image pixels used in common by the calculations for adjacent pixels need not be read from the image information storage device twice. In addition, the arithmetic circuit 70 of this embodiment uses two product-sum calculators 76a and 76b and two product-sum registers 81a and 81b, which achieves twice the processing speed of the conventional circuit.

次に、第４の実施形態の別の例として、元画像の５列×５行の２５画素分の画素値を用いて畳み込み演算によるフィルタ処理を行う演算回路について説明する。 Next, as another example of the fourth embodiment, an arithmetic circuit that performs filter processing by convolution operation using pixel values of 25 pixels of 5 columns × 5 rows of the original image will be described.

図６は、本発明の演算回路の構成を表す第５の実施形態の概念図である。同図に示す演算回路１００は、２次元空間に配置された元画像の５列×５行の２５画素分の画素値から、畳み込み演算によるフィルタ処理を行う。演算回路１００は、元画像用シフトレジスタ１０２と、係数用シフトレジスタ１０４と、前処理回路１１２と、後処理回路１１４と、カウンタ１１６とによって構成されている。 FIG. 6 is a conceptual diagram of the fifth embodiment showing the configuration of the arithmetic circuit of the present invention. The arithmetic circuit 100 shown in the figure performs a filter process by a convolution operation from pixel values for 25 pixels of 5 columns × 5 rows of an original image arranged in a two-dimensional space. The arithmetic circuit 100 includes an original image shift register 102, a coefficient shift register 104, a pre-processing circuit 112, a post-processing circuit 114, and a counter 116.

以下、図６に示す演算回路１００について、図５の演算回路７０との違いを重点的に説明する。 Hereinafter, the difference between the arithmetic circuit 100 shown in FIG. 6 and the arithmetic circuit 70 of FIG. 5 will be mainly described.

演算回路１００において、元画像用シフトレジスタ１０２および係数用シフトレジスタ１０４は、図５の演算回路７０の元画像用シフトレジスタ７２および係数用シフトレジスタ７４と同様の機能を有する。 In the arithmetic circuit 100, the original image shift register 102 and the coefficient shift register 104 have the same functions as the original image shift register 72 and the coefficient shift register 74 of the arithmetic circuit 70 in FIG.

前処理回路１１２は、列数分の５つの積和演算器１０６ａ、１０６ｂ、１０６ｃ、１０６ｄ、１０６ｅによって構成されている。前処理回路１１２も、図５の前処理回路８２と同様の機能を有する。 The pre-processing circuit 112 includes five product-sum calculators 106a, 106b, 106c, 106d, and 106e corresponding to the number of columns. The preprocessing circuit 112 also has the same function as the preprocessing circuit 82 in FIG.

また、後処理回路１１４は、列数分の５つの積和レジスタ１１１ａ、１１１ｂ、１１１ｃ、１１１ｄ、１１１ｅによって構成されている。後処理回路１１４も、図５の演算回路７０の後処理回路８４と同様の機能を有する。 Further, the post-processing circuit 114 includes five product-sum registers 111a, 111b, 111c, 111d, and 111e corresponding to the number of columns. The post-processing circuit 114 also has the same function as the post-processing circuit 84 of the arithmetic circuit 70 in FIG.

カウンタ１１６は、初期化後、クロック信号に同期して、１サイクル毎に１〜５を繰り返しカウントする。本実施形態の場合、カウンタ１１６のカウント値ｋ＝１の時、係数用シフトレジスタ１０４から係数値Ｃ１（＝Ｃ１１，Ｃ２１，Ｃ３１，Ｃ４１，Ｃ５１）が出力され、全ての積和演算器１０６ａ、１０６ｂ、１０６ｃ、１０６ｄ、１０６ｅに共通に入力される。また、カウント値ｋ＝２〜５の時に、係数値Ｃ２〜Ｃ５が、それぞれ出力される。 The counter 116 repeatedly counts 1 to 5 every cycle in synchronization with the clock signal after initialization. In the present embodiment, when the count value k = 1 of the counter 116, the coefficient value C1 (= C11, C21, C31, C41, C51) is output from the coefficient shift register 104, and all the product-sum calculators 106a, 106b, 106c, 106d, and 106e are input in common. Further, when the count value k = 2 to 5, coefficient values C2 to C5 are output, respectively.

すなわち、カウンタ１１６のカウント値ｋ＝１〜５の時に、係数値Ｃ１〜Ｃ５についての処理がそれぞれ行われる。 That is, when the count value k of the counter 116 is 1 to 5, the processing for the coefficient values C1 to C5 is performed.

演算回路１００の動作は、図５の演算回路７０と同様である。前処理回路１１２では、前述のように、カウンタ１１６のカウント値ｋ＝１〜５の時に、それぞれ係数値Ｃ１〜Ｃ５を利用した積和演算処理が行われ、積和演算結果Ｔ１〜Ｔ５が出力される。後処理回路１１４では、ｋ＝１のときには、前処理回路から入力された積和演算結果Ｔ１〜Ｔ５がそのまま保持される。一方、ｋ＝２〜５のときには、前サイクルで（すなわち、それぞれｋ＝１〜４のときに）保持された値Ｐ１’〜Ｐ５’とＴ１〜Ｔ５との積算値が保持される。そして、５サイクル毎に、後処理回路１１４から、フィルタ処理後の画像の５画素分の画素値Ｐが出力される。 The operation of the arithmetic circuit 100 is the same as that of the arithmetic circuit 70 in FIG. In the pre-processing circuit 112, as described above, when the count value k of the counter 116 is 1 to 5, product-sum operation processing using the coefficient values C1 to C5 is performed, and product-sum operation results T1 to T5 are output. Is done. In the post-processing circuit 114, when k = 1, the product-sum operation results T1 to T5 input from the pre-processing circuit are held as they are. On the other hand, when k = 2 to 5, the integrated values of the values P1 'to P5' and T1 to T5 held in the previous cycle (that is, when k = 1 to 4 respectively) are held. Then, every five cycles, the post-processing circuit 114 outputs the pixel value P for five pixels of the image after filtering.

同様に、図８に示す元画像の各画素の画素値が画像情報記憶装置に記憶され、図１３に示す順序でフィルタ処理が順次行われる場合、演算回路１００の具体的な動作は、下記表８〜１０に示す通りである。 Similarly, when the pixel value of each pixel of the original image shown in FIG. 8 is stored in the image information storage device and the filter processing is performed sequentially in the order shown in FIG. 13, the specific operation of the arithmetic circuit 100 is as shown in Tables 8 to 10 below.

すなわち、演算回路１００では、初期化後、５サイクルで最初のフィルタ処理後の５画素分の画素値Ｐ１１〜Ｐ５１が同時に出力される。これ以後、５サイクル毎に、フィルタ処理後の５画素分の画素値が同時に出力される。 That is, in the arithmetic circuit 100, pixel values P11 to P51 for five pixels after the first filter processing are output simultaneously in five cycles after initialization. Thereafter, pixel values for five pixels after the filter processing are simultaneously output every five cycles.
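The five-cycle behavior described above generalizes directly from the two-column case. The following sketch models it for any N; it is a hedged illustration in which `dot` and `run_parallel` are hypothetical names, the shift register is represented by indexing into a list of columns, and the N product-sum registers are a plain list.

```python
def dot(coeffs, data):
    """Product-sum of one coefficient column and one data column."""
    return sum(c * d for c, d in zip(coeffs, data))


def run_parallel(columns, coeff_cols):
    """N product-sum calculators in parallel, one coefficient column per cycle.

    columns    : image columns L1, L2, ... (each a list of N pixel values)
    coeff_cols : coefficient columns C1..CN
    Returns a list of groups; each group holds N filtered pixel values and is
    completed once every N cycles.
    """
    n = len(coeff_cols)
    acc = [0] * n                                  # product-sum registers
    groups = []
    for cycle in range(len(columns) - n + 1):
        k = cycle % n                              # counter value minus 1
        # Calculator number idx+1 sees register column idx, i.e. image column cycle + idx.
        t = [dot(coeff_cols[k], columns[cycle + idx]) for idx in range(n)]
        # First cycle of a group loads T; later cycles accumulate onto it.
        acc = t if k == 0 else [a + v for a, v in zip(acc, t)]
        if k == n - 1:
            groups.append(list(acc))
    return groups
```

For N = 5 the first group needs image columns L1 to L9, because the fifth output uses columns L5 to L9.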

本実施形態の演算回路１００においても、隣りあう画素の演算で共通に使用される元画像の画素の画素値を画像情報記憶装置から２度読み出す必要がないという利点がある。また、本実施形態の演算回路１００では、５つの積和演算器１０６ａ、１０６ｂ、１０６ｃ、１０６ｄ、１０６ｅと、５つの積和レジスタ１１１ａ、１１１ｂ、１１１ｃ、１１１ｄ、１１１ｅとを用いているため、従来の５倍の処理速度が実現されている。 The arithmetic circuit 100 of the present embodiment also has an advantage that it is not necessary to read out the pixel value of the pixel of the original image that is commonly used in the calculation of adjacent pixels from the image information storage device twice. In addition, the arithmetic circuit 100 according to the present embodiment uses five product-sum arithmetic units 106a, 106b, 106c, 106d, and 106e and five product-sum registers 111a, 111b, 111c, 111d, and 111e. 5 times the processing speed.

なお、第４および第５の実施形態の演算回路７０，１００も、５列×５行に限らず、Ｎ列×Ｎ行（Ｎは２以上の整数）の畳み込み演算によるフィルタ処理に適用可能である。 The arithmetic circuits 70 and 100 of the fourth and fifth embodiments are not limited to 5 columns × 5 rows, and can be applied to filter processing by convolution operation of N columns × N rows (N is an integer of 2 or more). is there.

第４および第５の実施形態の演算回路では、一般的に、データレジスタに２次元空間のＮ列×Ｎ行の範囲の１〜Ｎ列目のデータを保持する。そして、第１サイクルにおいて、Ｎ個の内のｎ番目（ｎ＝１〜Ｎ）の積和演算器で、Ｎ列×Ｎ行のデータレジスタのｎ列目に保持されたＮ個のデータ値と、Ｎ列×Ｎ行の１列目のＮ個の係数との間の積和を演算し、それぞれの演算結果を、後処理回路のｎ番目の積和レジスタに保持する。その後、第ｋ（ｋ＝２〜Ｎ）サイクルにかけて順に、データレジスタに保持されたデータを−１列分シフトし、データレジスタのＮ列目に、２次元空間のＮ列×Ｎ行の範囲に列方向に隣接するＮ＋ｋ−１列目のデータを保持し、さらに、Ｎ個の内のｎ番目の積和演算器で、Ｎ列×Ｎ行のデータレジスタのｎ列目に保持されたＮ個のデータ値とＮ列×Ｎ行のｋ列目のＮ個の係数との間の積和を演算し、それぞれの演算結果と、前サイクルで後処理回路のｎ番目の積和レジスタに保持された値との積算値を、後処理回路のｎ番目の積和レジスタに保持する操作を繰り返す。これにより、Ｎ個の積和レジスタのそれぞれから、２次元空間のＮ列×Ｎ行の範囲の中心に位置する第１演算対象点、および、２次元空間の列方向に第１の演算対象点に順に隣りあう、２〜Ｎ番目の演算対象点の畳み込み演算結果が出力される。 In the arithmetic circuits of the fourth and fifth embodiments, in general, the data of the 1st to Nth columns of an N-column × N-row range of the two-dimensional space are first held in the data register. In the first cycle, the nth (n = 1 to N) of the N product-sum calculators computes the product-sum between the N data values held in the nth column of the N-column × N-row data register and the N coefficients of the first of the N coefficient columns, and each result is held in the nth product-sum register of the post-processing circuit. Then, in each of the kth cycles (k = 2 to N) in turn, the data held in the data register are shifted by −1 column; the data of column N+k−1, adjacent in the column direction to the N-column × N-row range of the two-dimensional space, are held in the Nth column of the data register; the nth product-sum calculator computes the product-sum between the N data values held in the nth column of the data register and the N coefficients of the kth coefficient column; and the sum of each result and the value held in the nth product-sum register in the previous cycle is held in the nth product-sum register. As a result, the N product-sum registers output the convolution results for the first calculation target point, located at the center of the N-column × N-row range of the two-dimensional space, and for the 2nd to Nth calculation target points that successively adjoin it in the column direction.
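The general procedure above can be condensed into one expression. The notation is an assumption borrowed from the worked 2 × 2 example: C_k L_i denotes the product-sum of the N coefficients of coefficient column k with the N data values of data column i, and P_n denotes the value held in the nth product-sum register after the Nth cycle.

```latex
P_n \;=\; \sum_{k=1}^{N} C_k L_{\,n+k-1}, \qquad n = 1, \dots, N
```

For N = 2 this reduces to P_1 = C1L1 + C2L2 and P_2 = C1L2 + C2L3, which agrees with the values held in the flip-flops 80a and 80b at the end of cycle 2.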

最後に、本発明の第６の実施形態について説明する。 Finally, a sixth embodiment of the present invention will be described.

図７は、本発明の演算回路の構成を表す第６の実施形態の概念図である。同図に示す演算回路１２０も、２次元空間に配置された元画像の５列×５行の２５画素分の画素値から、畳み込み演算によるフィルタ処理を行う。図７の演算回路１２０は、元画像用シフトレジスタ１２２と、係数用シフトレジスタ１２４と、前処理回路１３２と、後処理回路１３４と、カウンタ１３６とによって構成されている。 FIG. 7 is a conceptual diagram of the sixth embodiment showing the configuration of the arithmetic circuit of the present invention. The arithmetic circuit 120 shown in the figure also performs filter processing by a convolution operation on the pixel values of 25 pixels, 5 columns × 5 rows, of an original image arranged in a two-dimensional space. The arithmetic circuit 120 of FIG. 7 includes an original image shift register 122, a coefficient shift register 124, a preprocessing circuit 132, a post-processing circuit 134, and a counter 136.

図７に示す演算回路１２０は、図６に示す演算回路１００において、積和演算器の個数を減らし、図３に示す演算回路５０のように、これを時系列（時分割）に使用する構成のものである。従って、カウンタ１３６は、図６に示す演算回路１００のカウンタ１１６の機能と、図３に示す演算回路５０のカウンタ６６の機能とを併せ持つ。すなわち、後処理回路１３４の処理サイクルを制御するカウント値ｋと、前処理回路の処理のステップを制御するカウント値ｊとを生成する。カウント値ｊは、０に初期化された後、クロック信号に同期して０〜４のカウントを繰り返す。カウント値ｋは、１に初期化された後、カウント値ｊが０に戻る毎に１つずつ加算されて、１〜５のカウントを繰り返す。 The arithmetic circuit 120 shown in FIG. 7 is the arithmetic circuit 100 of FIG. 6 with fewer product-sum calculators, which are used in time series (time division) as in the arithmetic circuit 50 shown in FIG. 3. The counter 136 therefore combines the function of the counter 116 of the arithmetic circuit 100 shown in FIG. 6 with the function of the counter 66 of the arithmetic circuit 50 shown in FIG. 3. That is, it generates a count value k that controls the processing cycle of the post-processing circuit 134 and a count value j that controls the processing steps of the preprocessing circuit. The count value j is initialized to 0 and then repeatedly counts from 0 to 4 in synchronization with the clock signal. The count value k is initialized to 1, is incremented by one each time the count value j returns to 0, and repeatedly counts from 1 to 5.
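The relationship between the two count values can be illustrated with a short sketch. The function name `counter_sequence` and its parameters are hypothetical; the model simply lists the (k, j) pair seen on each clock, assuming j advances on every clock and k advances whenever j wraps back to 0.

```python
def counter_sequence(num_clocks, n=5):
    """Return the (k, j) pair for each clock.

    j counts 0..n-1 repeatedly; k starts at 1, advances each time j wraps
    to 0, and itself wraps from n back to 1.
    """
    k, j = 1, 0
    sequence = []
    for _ in range(num_clocks):
        sequence.append((k, j))
        j += 1
        if j == n:          # j returns to 0: the next cycle begins
            j = 0
            k = k % n + 1   # 1, 2, ..., n, 1, ...
    return sequence
```

With n = 5, one full pass of k takes 25 clocks: five steps (j = 0 to 4) for each of the five cycles (k = 1 to 5).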

以下、図７の演算回路１２０について、図６の演算回路１００との違いを重点的に説明する。 Hereinafter, the difference between the arithmetic circuit 120 in FIG. 7 and the arithmetic circuit 100 in FIG. 6 will be mainly described.

演算回路１２０において、元画像用シフトレジスタ１２２および係数用シフトレジスタ１２４は、図６の演算回路１００の元画像用シフトレジスタ１０２および係数用シフトレジスタ１０４と同様の機能を有する。 In the arithmetic circuit 120, the original image shift register 122 and the coefficient shift register 124 have the same functions as the original image shift register 102 and the coefficient shift register 104 of the arithmetic circuit 100 in FIG.

図６の演算回路１００では、元画像用シフトレジスタ１０２から、前処理回路１１２の５つの積和演算器１０６ａ、１０６ｂ、１０６ｃ、１０６ｄ、１０６ｅに対して、各々対応する各列の５画素分の画素値Ｌ_i〜Ｌ_i+4が同時に入力される。これに対し、図７の演算回路１２０では、１ステップ毎に、各列の５画素分の画素値Ｌ_i+jがカウント値ｊの変化に従って順に、１つの積和演算器１２６に時系列に入力される。両者は、この点で異なっている。そして、カウント値ｊが０〜４に変化する過程で５ステップの処理が行われた後、カウント値ｊが０に戻り、次のサイクルに進むときに、処理対象となる次の１列の５画素分の画素値がシフトインされる。 In the arithmetic circuit 100 of FIG. 6, the pixel values L_i to L_{i+4} of the five pixels in each corresponding column are input simultaneously from the original image shift register 102 to the five product-sum calculators 106a, 106b, 106c, 106d, and 106e of the preprocessing circuit 112. In the arithmetic circuit 120 of FIG. 7, by contrast, the pixel values L_{i+j} of the five pixels in each column are input in time series, one column per step, to the single product-sum calculator 126 as the count value j changes. The two circuits differ in this respect. After five steps have been processed while the count value j runs from 0 to 4, the count value j returns to 0, and on proceeding to the next cycle the pixel values of the five pixels in the next column to be processed are shifted in.

１つの画素の画素値を８ビットとすると、図６に示す演算回路１００では、８ビット×５×５＝２００本の配線が必要となる。これに対し、図７に示す演算回路１２０では、元画像用シフトレジスタ１２２から出力される画素値は１列の５画素分だけであるから、同じく１つの画素値を８ビットとすると、８ビット×５＝４０本の配線だけしか必要がなく、配線数を１／５に削減することができる。 If each pixel value is 8 bits, the arithmetic circuit 100 shown in FIG. 6 requires 8 bits × 5 × 5 = 200 wires. In the arithmetic circuit 120 shown in FIG. 7, by contrast, the original image shift register 122 outputs the pixel values of only one column of five pixels at a time, so with the same 8-bit pixel values only 8 bits × 5 = 40 wires are needed, reducing the number of wires to 1/5.

また、係数用シフトレジスタ１２４では、図６の係数用シフトレジスタ１０４と同様に、１サイクル毎に、すなわち、カウント値ｋが変化する毎に、図６中、５つの係数値Ｃ１〜Ｃ５が順次シフト（ローテーション）され、図中で最も左側の１列の５つの係数値が出力される。ただし、同一の係数値Ｃ_kの供給が、カウント値ｊが０〜４の間に変化する５ステップの期間継続して行われ、その後、カウント値ｋが変化し、次のサイクルに進む時に係数Ｃ１〜Ｃ５のシフトが行われる点で、図６の係数用シフトレジスタ１０４とは異なっている。 In the coefficient shift register 124, as in the coefficient shift register 104 of FIG. 6, the five coefficient values C1 to C5 in FIG. 6 are shifted (rotated) in turn every cycle, that is, every time the count value k changes, and the five coefficient values of the leftmost column in the figure are output. It differs from the coefficient shift register 104 of FIG. 6, however, in that the same coefficient values C_k are supplied continuously for the five steps during which the count value j runs from 0 to 4, and the coefficients C1 to C5 are shifted only when the count value k then changes and the circuit proceeds to the next cycle.

続いて、前処理回路１３２は、１つの積和演算器１２６によって構成されている。積和演算器１２６は、Ｃ_kＬ_i+jの積和演算を行い、その演算結果Ｔ_j+1を出力する。 Next, the preprocessing circuit 132 consists of a single product-sum calculator 126. The product-sum calculator 126 performs the product-sum operation C_k L_{i+j} and outputs the result T_{j+1}.

また、後処理回路１３４は、列数分の５つの積和レジスタ１３１ａ、１３１ｂ、１３１ｃ、１３１ｄ、１３１ｅによって構成されている。後処理回路１３４の積和レジスタ１３１ａ〜１３１ｅのそれぞれは、例えば、図５の積和レジスタ８１ａ（または８１ｂ）の前段に、図４の一時レジスタ６０ａ１（または６０ｂ１〜６０ｅ１）を組み合わせた構成とすることができる。この場合、図４に示した演算回路５０と同様に、それぞれの積和レジスタ１３１ａ〜１３１ｅの一時レジスタには、カウント値ｊが対応する値である期間にクロック信号を供給する。また、２段目のフリップフロップには、カウント値ｊが０に戻った期間にクロック信号を供給する。 Further, the post-processing circuit 134 includes five product-sum registers 131a, 131b, 131c, 131d, and 131e corresponding to the number of columns. Each of the product-sum registers 131a to 131e of the post-processing circuit 134 has, for example, a configuration in which the temporary register 60a1 (or 60b1 to 60e1) of FIG. 4 is combined with the preceding stage of the product-sum register 81a (or 81b) of FIG. be able to. In this case, similarly to the arithmetic circuit 50 shown in FIG. 4, a clock signal is supplied to the temporary registers of the product-sum registers 131a to 131e during a period in which the count value j is a corresponding value. A clock signal is supplied to the flip-flop at the second stage during the period when the count value j returns to zero.

次に、演算回路１２０の動作を説明する。 Next, the operation of the arithmetic circuit 120 will be described.

演算回路１２０では、初期化のサイクルで、元画像用シフトレジスタ１２２には、１〜５列目の画素値Ｌ１〜Ｌ５が保持され、係数用レジスタ１２４には、５列分の係数値Ｃ１〜Ｃ５が保持される。 In the arithmetic circuit 120, in the initialization cycle, the original image shift register 122 holds the pixel values L1 to L5 of the first to fifth columns, and the coefficient shift register 124 holds the coefficient values C1 to C5 for five columns.

サイクル１では、カウンタ１３６のカウント値ｊ＝０〜４に応じて、元画像用シフトレジスタ１２２から、１〜５列目の画素値Ｌ１〜Ｌ５が順次出力され、係数用シフトレジスタ１２４から、係数値Ｃ１（すなわち、ｋ＝１である）が出力される。 In cycle 1, the pixel values L1 to L5 in the first to fifth columns are sequentially output from the original image shift register 122 according to the count value j = 0 to 4 of the counter 136, and the coefficient shift register 124 A numerical value C1 (that is, k = 1) is output.

前処理回路１３２の積和演算器１２６では、カウンタ１３６のカウント値ｊ＝０〜４に応じて、１ステップ毎に、１〜５列目の画素値Ｌ_i+j（＝Ｌ１〜Ｌ５）と、係数値Ｃ１とから積和演算Ｃ１Ｌ_i+jが行われ、その演算結果Ｔ_j+1（Ｔ１〜Ｔ５）が順次出力される。すなわち、積和演算器１２６では、カウント値ｊ＝０〜４に応じて、１ステップ毎に、積和演算Ｃ１Ｌ１、Ｃ１Ｌ２、Ｃ１Ｌ３、Ｃ１Ｌ４、Ｃ１Ｌ５が順次行われ、その演算結果Ｔ１〜Ｔ５が出力される。 In the sum-of-products calculator 126 of the pre-processing circuit 132, the pixel values L _{i + j} (= L1 to L5) in the first to fifth columns are obtained for each step in accordance with the count value j = 0 to 4 of the counter 136. The product-sum operation C1L _{i + j} is performed from the coefficient value C1, and the operation results T _{j + 1} (T1 to T5) are sequentially output. That is, the product-sum calculator 126 sequentially performs product-sum operations C1L1, C1L2, C1L3, C1L4, and C1L5 for each step according to the count value j = 0 to 4, and outputs the calculation results T1 to T5. Is done.

ステップ１〜５の積和演算器１２６の出力Ｔ１〜Ｔ５（＝Ｃ１Ｌ１、Ｃ１Ｌ２、Ｃ１Ｌ３、Ｃ１Ｌ４、Ｃ１Ｌ５）は、後処理回路１３４の積和レジスタ１３１ａ、１３１ｂ、１３１ｃ、１３１ｄ、１３１ｅにそれぞれ入力され、クロックエッジのタイミングで保持される。 Outputs T1 to T5 (= C1L1, C1L2, C1L3, C1L4, C1L5) of the product-sum operation unit 126 in steps 1 to 5 are respectively input to the product-sum registers 131a, 131b, 131c, 131d, and 131e of the post-processing circuit 134. , And held at the timing of the clock edge.

次のサイクル２で、元画像用シフトレジスタ１２２に、処理対象となる次の１列の５画素分の画素値Ｌ６がシフトインされ、元画像用シフトレジスタ１２２には、２〜６列目の画素値Ｌ２〜Ｌ６が保持される。 In the next cycle 2, the pixel values L 6 for five pixels in the next column to be processed are shifted into the original image shift register 122, and the second to sixth columns are stored in the original image shift register 122. Pixel values L2 to L6 are held.

In steps 1 to 5 of cycle 2, in accordance with the count values j = 0 to 4 of the counter 136, the original image shift register 122 sequentially outputs the pixel values L2 to L6 held in its first to fifth columns, and the coefficient shift register 124 outputs the coefficient value C2 (that is, k = 2).

Similarly, in the product-sum calculator 126, in accordance with the count values j = 0 to 4 of the counter 136, the product-sum operation C2L_{i+j} is performed on the pixel value L_{i+j} (= L2 to L6) of the first to fifth columns and the coefficient value C2, and the results T_{j+1} (T1 to T5) are output sequentially. That is, the product-sum calculator 126 performs the product-sum operations C2L2, C2L3, C2L4, C2L5, and C2L6 in order, one per step, and outputs the results T1 to T5.

The outputs T1 to T5 (= C2L2, C2L3, C2L4, C2L5, C2L6) of the product-sum calculator 126 in cycle 2 are added to the corresponding values P1' to P5' (= C1L1, C1L2, C1L3, C1L4, C1L5) that were held in the product-sum registers 131a, 131b, 131c, 131d, and 131e in cycle 1. The sums are then held again in the product-sum registers 131a, 131b, 131c, 131d, and 131e, respectively, at the clock edge.

That is, the product-sum registers 131a, 131b, 131c, 131d, and 131e hold C1L1 + C2L2, C1L2 + C2L3, C1L3 + C2L4, C1L4 + C2L5, and C1L5 + C2L6, respectively.
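The first two cycles described above can be traced with a short sketch. It assumes that each L value is a column of five pixel values, that each C value is a column of five coefficients, and that one product-sum operation is the dot product of the two. The sample data and the names `dot`, `window`, and `regs` are hypothetical and are not taken from the patent.

```python
def dot(coeffs, pixels):
    """One product-sum operation: sum of coefficient * pixel over the rows of a column."""
    return sum(c * p for c, p in zip(coeffs, pixels))

# Hypothetical data: L[i] is image column L(i+1) with 5 rows, C[k] is coefficient column C(k+1).
L = [[10 * col + row for row in range(5)] for col in range(6)]   # L1..L6
C = [[k + 1] * 5 for k in range(2)]                              # C1, C2

window = L[0:5]      # initialization: shift register 122 holds L1..L5
regs = [0] * 5       # stands in for product-sum registers 131a..131e

# Cycle 1: coefficient column C1, steps j = 0..4 on the single calculator.
for j in range(5):
    regs[j] = dot(C[0], window[j])       # T_{j+1} = C1 * L_{1+j}

# Cycle 2: shift in L6, then accumulate with coefficient column C2.
window = window[1:] + [L[5]]             # register now holds L2..L6
for j in range(5):
    regs[j] += dot(C[1], window[j])      # P'_{j+1} + C2 * L_{2+j}
```

After the second loop, `regs[0]` equals C1L1 + C2L2 and `regs[4]` equals C1L5 + C2L6, matching the register contents listed above.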

As described above, the post-processing circuit 134 may be configured, as in the arithmetic circuit 50 shown in FIGS. 3 and 4, to hold the product-sum results in temporary registers while the count value j is increasing and, during the period in which the count value j has returned to 0, to hold them in the subsequent-stage flip-flops either as they are or after adding them to the values held in the previous cycle. In this case, if the count value k is advanced at the timing when the count value j returns to 0, the product-sum results of cycle 1 are actually held after the count value has changed to k = 2. Likewise, the sums of the product-sum results of cycle 2 and the values held in the previous cycle are held after the count value has changed to k = 3. Therefore, when a selector is used, as in the arithmetic circuit 70 shown in FIG. 5, to control whether the product-sum results are held as they are or are held after being added to the values held in the previous cycle, the selector can be controlled to select the output of the temporary register when the count value k = 2 and to select the output of the adder when the count value k is other than 2.
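The selector timing in this paragraph can be illustrated with a behavioral sketch. It is a simplified software model, not the circuit of the patent: it assumes that the counter k runs from 1 to 5 and wraps back to 1, that k advances at the moment the flip-flops latch, and that the flip-flops may contain stale values before the first cycle. The function name `post_process` and the sample inputs are hypothetical.

```python
def post_process(results_per_cycle, n_cycles=5, stale=99):
    """Model the latch step of the post-processing circuit with a selector.

    results_per_cycle[c] holds the temporary-register contents T1..T5 of cycle c+1.
    """
    k = 1                                            # count value during cycle 1
    width = len(results_per_cycle[0])
    flip_flops = [stale] * width                     # leftover values from earlier work
    for temp in results_per_cycle:
        k = k % n_cycles + 1                         # k advances when j returns to 0
        if k == 2:
            flip_flops = list(temp)                  # selector picks the temporary register
        else:
            flip_flops = [p + t for p, t in zip(flip_flops, temp)]  # selector picks the adder
    return flip_flops

# Five cycles whose results are all 1s, 2s, ..., 5s accumulate to 1+2+3+4+5 per register.
totals = post_process([[c] * 5 for c in range(1, 6)])
```

Because the first latch happens with k = 2, the stale flip-flop contents are discarded rather than added, which is the reason the selector is keyed to k = 2 rather than k = 1.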

Thereafter, in the same manner, in each cycle the pixel values for the five pixels of the next column are shifted into the original image shift register 122, the next coefficient value C_k is output from the coefficient shift register 124, and the above operation is repeated. As a result, after initialization, the filtered pixel values P for five pixels are output simultaneously every 25 cycles.
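As a rough check of the whole sequence, the following sketch runs all five cycles in software and counts the product-sum steps. It assumes nine image columns (L1 to L9) are needed for five adjacent output pixels and treats each product-sum operation as a dot product over the five rows. The function name `filter_five_pixels` and its parameters are hypothetical.

```python
def filter_five_pixels(columns, coeffs):
    """Simulate one calculator shared over 5 columns for 5 cycles.

    columns: 9 image columns (each a list of 5 pixel values), L1..L9.
    coeffs:  5 coefficient columns (each a list of 5 values), C1..C5.
    Returns the 5 accumulated register values and the number of product-sum steps.
    """
    window = list(columns[:5])          # initialization: register holds L1..L5
    regs = [0] * 5                      # product-sum registers, cleared
    steps = 0
    for k in range(5):                  # cycle k+1 uses coefficient column C(k+1)
        if k > 0:
            window = window[1:] + [columns[5 + k - 1]]   # shift in the next column
        for j in range(5):              # counter value j selects the register column
            regs[j] += sum(c * p for c, p in zip(coeffs[k], window[j]))
            steps += 1
    return regs, steps
```

With one calculator the model takes 5 × 5 = 25 product-sum steps per group of five output pixels, which appears to correspond to the 25-cycle output interval stated above if each step takes one clock.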

The arithmetic circuit 120 of this embodiment likewise has the advantage that the pixel values of original-image pixels used in common in the calculations for adjacent pixels need not be read twice from the image information storage device. In addition, because the arithmetic circuit 120 uses only one product-sum calculator 126 instead of five, its processing speed is 1/5 of the conventional speed, but in applications that do not demand processing speed this has the advantage of reducing the circuit scale. Of course, as in the arithmetic circuit 50 of FIG. 3, a plurality of product-sum calculators may be provided in the preprocessing circuit 132 so that the product-sum results T1 to T5 using the pixel values of the five columns are obtained in fewer repetitions than the number of columns (5). This makes it possible to balance processing speed against circuit scale.
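The trade-off between speed and circuit scale can be put in numbers with a small sketch. It assumes the columns are divided as evenly as possible among the calculators, so that one cycle needs ceil(N / m) repetitions; the patent text only states that the repetition count falls below the number of columns. The function name `repetitions_per_cycle` is hypothetical.

```python
import math

def repetitions_per_cycle(n_columns, m_calculators):
    """Repetitions needed to cover all columns when m calculators run in parallel."""
    return math.ceil(n_columns / m_calculators)

# Repetitions per cycle for a 5-column filter with 1 to 5 calculators.
table = {m: repetitions_per_cycle(5, m) for m in range(1, 6)}
# m=1 -> 5, m=2 -> 3, m=3 -> 2, m=4 -> 2, m=5 -> 1
```

Under this assumption, going from one calculator to two nearly halves the repetitions, while a fourth calculator gives no gain over three for a 5-column filter.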

In the arithmetic circuit 120 of this embodiment, the pixel values are input to the product-sum calculator 126 in the order L_i to L_{i+4}, but the order is not limited to this, and the pixel values may be input in any order. Also, in the post-processing circuit 134, it is not essential to use a selector to control whether the product-sum results T1 to T5 are held as they are or the sums of the values P1' to P5' held in the previous cycle and the product-sum results T1 to T5 are held. For example, even in a configuration in which the output of the adder is connected directly to the flip-flops constituting the product-sum registers, the flip-flops can be initialized so that the held values P1 to P5 become 0 during the period in which, after the filtered pixel values P have been output from the flip-flops, the product-sum results T1 to T5 for the next pixels are being held in the temporary registers.

The arithmetic circuit of this embodiment is not limited to 5 columns × 5 rows and is applicable to filter processing by a convolution operation of N columns × N rows (N is an integer of 2 or more). In this case, the number m of product-sum calculators satisfies 1 ≤ m < N.

In general, in the arithmetic circuit of this embodiment, the data register holds the data of the 1st to N-th columns of an N-column × N-row range in the two-dimensional space. In the first cycle, each of at least some of the m product-sum calculators of the preprocessing circuit repeatedly computes the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the 1st column of the N columns × N rows, so that the sums of products between the data values held in all the columns of the data register and the coefficients of the 1st column are computed. The product-sum result computed by the product-sum calculators between the data values held in the n-th column (n = 1 to N) of the data register and the coefficients of the 1st column is then held in the n-th product-sum register of the post-processing circuit. Thereafter, in order through the k-th (k = 2 to N) cycle, the data held in the data register is shifted by −1 in the column direction, and the data of the (N+k−1)-th column, which adjoins the N-column × N-row range of the two-dimensional space in the column direction, is held in the N-th column of the data register. Each of at least some of the m product-sum calculators of the preprocessing circuit then repeatedly computes the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the k-th column of the N columns × N rows, so that the sums of products between the data values held in all the columns of the data register and the coefficients of the k-th column are computed. Subsequently, the sum of the product-sum result computed by the product-sum calculators between the data values held in the n-th column of the data register and the coefficients of the k-th column, and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle, is held in the n-th product-sum register of the post-processing circuit, and this operation is repeated. As a result, the N product-sum registers respectively output the convolution results for a first calculation target point located at the center of the N-column × N-row range of the two-dimensional space and for the 2nd to N-th calculation target points that successively adjoin the first calculation target point in the column direction of the two-dimensional space.
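The general procedure can be modeled in software as follows. This is a minimal sketch under stated assumptions: the m calculators are assigned to consecutive register columns in batches, each product-sum operation is a dot product over N rows, and the registers start at zero. The names `product_sum` and `convolve_group` are hypothetical and the batching order is only one of the orders the text allows.

```python
def product_sum(coeff_col, data_col):
    """Sum of products between one coefficient column and one data column."""
    return sum(c * d for c, d in zip(coeff_col, data_col))

def convolve_group(data_cols, coeff_cols, m):
    """Compute N adjacent convolution results with m (1 <= m < N) shared calculators.

    data_cols:  at least 2N-1 data columns, each a list of N values.
    coeff_cols: N coefficient columns, each a list of N values.
    Returns the N product-sum register values and the number of repetitions used.
    """
    n = len(coeff_cols)
    assert 1 <= m < n and len(data_cols) >= 2 * n - 1
    window = list(data_cols[:n])        # data register holds columns 1..N
    regs = [0] * n                      # product-sum registers of the post-processing stage
    repetitions = 0
    for k in range(n):                  # cycle k+1 uses coefficient column k+1
        if k > 0:
            window = window[1:] + [data_cols[n + k - 1]]   # shift by -1, load column N+k
        for start in range(0, n, m):    # each repetition keeps up to m calculators busy
            for col in range(start, min(start + m, n)):
                regs[col] += product_sum(coeff_cols[k], window[col])
            repetitions += 1
    return regs, repetitions
```

For N = 3 and m = 2, the model needs two repetitions in each of the three cycles, six in total, compared with nine when a single calculator is used.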

In the present invention, the data register may be configured to read data from a storage device (for example, the original image register reads from the image information storage device) via a buffer such as a line memory or a FIFO. The preprocessing circuit may include components other than the product-sum calculators, and the post-processing circuit may include components other than the product-sum registers. The specific configurations of the product-sum calculators and the product-sum registers are not limited in any way, and various configurations that perform the same functions can be used.

The present invention is basically as described above.
Although the arithmetic circuit and the arithmetic method of the present invention have been described in detail above, the present invention is not limited to the above embodiments, and various improvements and modifications may of course be made without departing from the gist of the present invention.

FIG. 1 is a schematic diagram of a first embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 2 is a conceptual diagram of a second embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 3 is a conceptual diagram of a third embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 4 is a schematic diagram showing a specific configuration of the arithmetic circuit shown in FIG. 3.
FIG. 5 is a schematic diagram of a fourth embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 6 is a conceptual diagram of a fifth embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 7 is a conceptual diagram of a sixth embodiment showing the configuration of the arithmetic circuit of the present invention.
FIG. 8 is a conceptual diagram of an example showing the pixel value of each pixel of an original image arranged in a two-dimensional space.
FIG. 9 is a conceptual diagram of an example showing the pixel value of each pixel of the image after filter processing of 2 columns × 2 rows.
FIG. 10 is a conceptual diagram showing the relationship between the pixel value of each pixel of the original image shown in FIG. 8 and the pixel value of each pixel of the filtered image shown in FIG. 9.
FIG. 11 is a conceptual diagram showing, in FIG. 10, the pixel values of the four pixels of the original image that are processed first and the pixel value of the filtered pixel.
FIG. 12 is a conceptual diagram showing, in FIG. 10, the pixel values of the four pixels of the original image that are processed second and the pixel value of the filtered pixel.
FIG. 13 is a conceptual diagram of an example showing the order of filter processing by a convolution operation.
FIG. 14 is a schematic diagram of an example showing the configuration of a conventional arithmetic circuit that performs filter processing by a convolution operation.
FIG. 15 is a conceptual diagram showing the state of the arithmetic circuit shown in FIG. 14 when the count value of the counter is 0.
FIG. 16 is a conceptual diagram showing the state of the arithmetic circuit shown in FIG. 14 when the count value of the counter is 1.
FIG. 17 is a conceptual diagram of an example showing the configuration of a product-sum calculator.

Explanation of symbols

10, 30, 50, 70, 100, 120, 140 Arithmetic circuit
12, 32, 52, 102, 142 Original image register
14, 34, 54, 104 Coefficient register
22, 42, 62, 82, 112, 132 Preprocessing circuit
24, 44, 64, 84, 114, 134 Post-processing circuit
16a, 16b, 36a, 36b, 36c, 36d, 36e, 56a, 56b, 76a, 76b, 106a, 106b, 106c, 106d, 106e, 126, 146 Product-sum calculator
21a, 21b, 41a, 41b, 41c, 41d, 41e, 61a, 61b, 61c, 61d, 61e, 81a, 81b, 111a, 111b, 111c, 111d, 111e, 131a, 131b, 131c, 131d, 131e Product-sum register
66, 116, 136, 156 Counter
60a1, 60a2, 60b1, 60b2, 60c1, 60c2, 60d1, 60d2, 60e1, 60e2, 80a, 80b, 80c, 80d, 80e, 150a, 150b Flip-flop
58b, 58c, 58d, 58e, 78a, 78b, 78c, 78d, 78e, 148, 162 Adder
72, 122 Original image shift register
74, 124, 144 Coefficient shift register
79a, 79b Selector
86 Counter
141 Image information storage device
145, 149 Multiplexer (MUX)
154 Intermediate result register
160a, 160b Multiplier

Claims

An arithmetic circuit that performs a convolution operation of data arranged in a two-dimensional space,
A coefficient register for storing coefficients of N columns × N rows (N is an integer of 3 or more);
A data register holding N data values;
a pre-processing circuit including m (2 ≦ m <N) product-sum calculators;
And a post-processing circuit including 1st to Nth product-sum registers,
wherein the 1st to (N−1)-th product-sum registers among the 1st to N-th product-sum registers are initialized in an initialization cycle,
thereafter, in order through the 1st to N-th cycles, the data values of the 1st to N-th columns of an N-column × N-row range of the two-dimensional space are held in the data register,
in each cycle,
each of at least some of the m product-sum calculators of the preprocessing circuit repeats, fewer than N times, the operation of computing the sum of products between the N data values and the N coefficients of one of the columns of the N columns × N rows, thereby computing the sums of products between the N data values and the coefficients of all the columns,
the product-sum result computed by the product-sum calculators between the N data values and the N coefficients of the 1st column is held in the 1st product-sum register of the post-processing circuit, and the sum of the product-sum result between the N data values and the coefficients of the n-th column (n = 2 to N) and the value held in the (n−1)-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit, and
the convolution result for a first calculation target point located at the center of the N-column × N-row range of the two-dimensional space is output from the N-th product-sum register.

The arithmetic circuit according to claim 1, wherein
the post-processing circuit further includes 1st to N-th temporary registers provided in correspondence with the 1st to N-th product-sum registers, respectively, and
in each cycle, after the product-sum results computed by the product-sum calculators between the data values and the coefficients of the 1st to N-th columns are held in the 1st to N-th temporary registers, respectively, the sum of the n-th product-sum result held in the temporary register and the value held in the (n−1)-th product-sum register in the previous cycle is held in the n-th product-sum register.

An arithmetic circuit that performs a convolution operation of data arranged in a two-dimensional space,
A coefficient register for holding coefficients of N columns × N rows (N is an integer of 2 or more);
N columns x N rows of data registers;
A preprocessing circuit having N product-sum calculators;
A post-processing circuit having N product-sum registers,
wherein the data register holds the data of the 1st to N-th columns of an N-column × N-row range of the two-dimensional space,
in the first cycle, the n-th (n = 1 to N) product-sum calculator computes the sum of products between the N data values held in the n-th column of the N-column × N-row data register and the N coefficients of the 1st column of the N columns × N rows, and each result is held in the n-th product-sum register of the post-processing circuit,
thereafter, in order through the k-th (k = 2 to N) cycle,
the data held in the data register is shifted by −1 column, and the data of the (N+k−1)-th column, which adjoins the N-column × N-row range of the two-dimensional space in the column direction, is held in the N-th column of the data register, and
the operation is repeated in which the n-th product-sum calculator computes the sum of products between the N data values held in the n-th column of the N-column × N-row data register and the N coefficients of the k-th column of the N columns × N rows, and the sum of each result and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit,
whereby the N product-sum registers respectively output the convolution results for a first calculation target point located at the center of the N-column × N-row range of the two-dimensional space and for the 2nd to N-th calculation target points that successively adjoin the first calculation target point in the column direction of the two-dimensional space.

An arithmetic method for performing a convolution operation between data arranged in a two-dimensional space and coefficients of N columns × N rows (N is an integer of 2 or more),
wherein a preprocessing circuit having N product-sum calculators and
a post-processing circuit having N product-sum registers are prepared,
in the first cycle, the data values of an N-column × N-row range of the two-dimensional space and the N coefficients of the 1st column of the N columns × N rows are supplied to the preprocessing circuit, the n-th (n = 1 to N) product-sum calculator computes the sum of products between the N data values of the n-th column of the two-dimensional space and the N coefficients, and each result is held in the n-th product-sum register of the post-processing circuit,
thereafter, in order through the k-th (k = 2 to N) cycle,
the operation is repeated in which the data values from the k-th column of the N-column × N-row range to the (N+k−1)-th column, which adjoins the N-column × N-row range in the column direction, and the N coefficients of the k-th column of the N columns × N rows are supplied to the preprocessing circuit, the n-th product-sum calculator computes the sum of products between the N data values of the (n+k−1)-th column of the two-dimensional space and the N coefficients, and the sum of each result and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle is held in the n-th product-sum register of the post-processing circuit,
whereby the N product-sum registers respectively output the convolution results for a first calculation target point located at the center of the N-column × N-row range of the two-dimensional space and for the 2nd to N-th calculation target points that successively adjoin the first calculation target point in the column direction of the two-dimensional space.

An arithmetic circuit that performs a convolution operation of data arranged in a two-dimensional space,
A coefficient register for holding coefficients of N columns × N rows (N is an integer of 2 or more);
N columns x N rows of data registers;
a pre-processing circuit including m (m <N) product-sum calculators;
A post-processing circuit having N product-sum registers,
wherein the data register holds the data of the 1st to N-th columns of an N-column × N-row range of the two-dimensional space,
in the first cycle,
each of at least some of the m product-sum calculators of the preprocessing circuit repeats the operation of computing the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the 1st column of the N columns × N rows, thereby computing the sums of products between the data values held in all the columns of the data register and the coefficients of the 1st column, and
the product-sum result computed by the product-sum calculators between the data values held in the n-th column (n = 1 to N) of the data register and the coefficients of the 1st column is held in the n-th product-sum register of the post-processing circuit,
thereafter, in order through the k-th (k = 2 to N) cycle,
the data held in the data register is shifted by −1 in the column direction, and the data of the (N+k−1)-th column, which adjoins the N-column × N-row range of the two-dimensional space in the column direction, is held in the N-th column of the data register,
each of at least some of the m product-sum calculators of the preprocessing circuit repeats the operation of computing the sum of products between the N data values held in one of the columns of the N-column × N-row data register and the N coefficients of the k-th column of the N columns × N rows, thereby computing the sums of products between the data values held in all the columns of the data register and the coefficients of the k-th column, and
the operation is repeated in which the sum of the product-sum result computed by the product-sum calculators between the data values held in the n-th column of the data register and the coefficients of the k-th column, and the value held in the n-th product-sum register of the post-processing circuit in the previous cycle, is held in the n-th product-sum register of the post-processing circuit,
whereby the N product-sum registers respectively output the convolution results for a first calculation target point located at the center of the N-column × N-row range of the two-dimensional space and for the 2nd to N-th calculation target points that successively adjoin the first calculation target point in the column direction of the two-dimensional space.