JP7292903B2

JP7292903B2 - Image processing device and image processing method

Info

Publication number: JP7292903B2
Application number: JP2019037584A
Authority: JP
Inventors: 貴久山本; 政美加藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-03-01
Filing date: 2019-03-01
Publication date: 2023-06-19
Anticipated expiration: 2039-03-01
Also published as: JP2020140625A

Description

The present invention relates to an image processing apparatus and an image processing method, and more particularly to an image processing technique for performing filter operation processing on two-dimensional data.

For example, in the field of image processing, two-dimensional data operations such as the two-dimensional convolution filter operation (convolution operation) are frequently used because the pixels of an image form a two-dimensional array. FIG. 14 is a diagram explaining an example of the convolution filter operation, which is one such two-dimensional data operation. In FIG. 14, (a) shows a calculation target image 1401 for the filter operation, (b) shows a filter kernel 1402 used for the filter operation, and (c) shows a calculation output image 1403, which is the result of the filter operation.

FIG. 14 shows, as an example, a filter kernel 1402 of size 3×3 in which the filter coefficients are arranged two-dimensionally. In this case, the convolution filter operation result is calculated by the product-sum operation shown in the following equation.
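The equation referred to here can be reconstructed from the symbol definitions that accompany it. The form below is a reconstruction, not a verbatim copy: the zero-based ranges of s and t are an assumption, chosen to agree with the stated output size (A−rowSize+1)×(B−columnSize+1).

```latex
R_{i,j} = \sum_{s=0}^{\mathrm{rowSize}-1} \; \sum_{t=0}^{\mathrm{columnSize}-1} D_{i+s,\,j+t} \times W_{s,t}
```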

Here, "D_i,j" denotes the pixel value at coordinates (j, i) of the calculation target image, and "R_i,j" denotes the filter operation result at coordinates (j, i) of the calculation output image. "W_s,t" denotes the filter kernel value (filter coefficient value) applied to the pixel value at coordinates (j+t, i+s) of the calculation target image. "columnSize" is the horizontal size of the filter kernel, and "rowSize" is its vertical size.

By performing the above operation while scanning the filter kernel 1402 over the calculation target image 1401, the calculation output image 1403 of the convolution filter operation is obtained. If the size of the calculation target image 1401, which is the original image, is vertical size A × horizontal size B, the size of the calculation output image 1403 is (A−rowSize+1)×(B−columnSize+1).
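The scanning operation described above can be sketched in Python as follows. This is a minimal reference sketch, assuming that images and kernels are given as lists of rows and that indices are zero-based; the function name `convolve_valid` is illustrative and does not come from the patent.

```python
def convolve_valid(image, kernel):
    """Scan `kernel` over `image` and return the calculation output image."""
    a, b = len(image), len(image[0])                    # vertical size A, horizontal size B
    row_size, column_size = len(kernel), len(kernel[0])
    out = []
    for i in range(a - row_size + 1):                   # output rows
        out_row = []
        for j in range(b - column_size + 1):            # output columns
            acc = 0
            for s in range(row_size):
                for t in range(column_size):
                    # product-sum of pixel D[i+s][j+t] and coefficient W[s][t]
                    acc += image[i + s][j + t] * kernel[s][t]
            out_row.append(acc)
        out.append(out_row)
    return out


# A = 4, B = 5 image with a 3x3 kernel gives a 2x3 output image.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
result = convolve_valid(image, kernel)
```

With this input, `result` has (4−3+1) = 2 rows and (5−3+1) = 3 columns, matching the output size formula.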

For such operations, Patent Documents 1 and 2 provide a plurality of calculators, typified by sum-of-products calculators, and share the pixel data of the calculation target image supplied to the calculators among them so that the operations are processed in parallel. The aim is to speed up the arithmetic processing and to use the pixel data of the calculation target image efficiently.

Patent Document 1: JP 2004-13873 A
Patent Document 2: JP 2010-134697 A

However, when the number of calculators operating in parallel is increased using the methods described in Patent Documents 1 and 2, the parallel operation is extended in one direction (for example, the horizontal direction) of the calculation target image. In the configurations described in Patent Documents 1 and 2, once the number of sum-of-products calculators is determined, the distribution of the filter operation output pixels calculated in parallel is uniquely determined. For example, when there are eight sum-of-products calculators, eight pixels arranged one-dimensionally are calculated in parallel. This causes the following problems.

First, when the calculation target image is small, the parallel operation cannot be used effectively. For example, when the horizontal size of the calculation target image is smaller than the number of sum-of-products calculators provided for the parallel operation, some sum-of-products calculators do not contribute to the filter operation, which can lower the calculation efficiency.

Conversely, even when one tries to increase the degree of parallelism (the number of sum-of-products calculators provided for the parallel operation) to improve the calculation efficiency, the degree of parallelism cannot be increased beyond, for example, the horizontal size of the calculation target image. In other words, adding more calculators beyond that point does not improve the calculation efficiency.

The present invention has been made in view of these circumstances, and its object is to enable a filter operation to be performed with good calculation efficiency on a calculation target image of any size.

An image processing apparatus according to the present invention performs filter operation processing by scanning a filter kernel over the pixel data of an image stored in image storage means, and comprises: a plurality of temporary storage means whose connection state is controlled according to the arrangement of the pixels for which the filter operation processing is performed in parallel, and which temporarily store a plurality of the pixel data read from the image storage means; a plurality of arithmetic means that perform, in parallel, the filter operation processing using the filter coefficients of the filter kernel and the plurality of pixel data stored in the plurality of temporary storage means, and that are grouped into one or more operation groups according to the arrangement of the pixels for which the filter operation processing is performed in parallel; and transfer control means that controls the transfer, among the plurality of temporary storage means, of the pixel data used for the filter operation processing such that, when the plurality of arithmetic means are grouped into a plurality of operation groups, in a first transfer mode the pixel data is transferred so as to be used by the arithmetic means belonging to the same operation group, and in a second transfer mode the pixel data is transferred so as to be used also by the arithmetic means belonging to another operation group.

According to the present invention, a filter operation can be performed with good calculation efficiency on a calculation target image of any size.

FIG. 1 is a diagram showing a configuration example of an image processing apparatus according to the first embodiment.
FIG. 2 is a diagram explaining an example of parallelization shape control in this embodiment.
FIG. 3 is a diagram showing a configuration example of an arithmetic processing unit according to the first embodiment.
FIG. 4 is a flowchart showing an example of arithmetic processing in the first embodiment.
FIG. 5 is a time chart showing an example of arithmetic processing in the first embodiment.
FIG. 6 is a time chart showing an example of arithmetic processing in the first embodiment.
FIG. 7 is a time chart showing an example of arithmetic processing in the first embodiment.
FIG. 8 is a diagram showing a configuration example of an image processing apparatus according to the second embodiment.
FIG. 9 is a diagram showing a configuration example of an arithmetic processing unit according to the second embodiment.
FIG. 10 is a flowchart showing an example of arithmetic processing in the second embodiment.
FIG. 11 is a time chart showing an example of arithmetic processing in the second embodiment.
FIG. 12 is a time chart showing an example of arithmetic processing in the second embodiment.
FIG. 13 is a diagram showing a configuration example of an image processing apparatus according to the third embodiment.
FIG. 14 is a diagram explaining a convolution filter operation.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
A first embodiment of the present invention will be described. When performing filter operation processing such as a convolution filter operation in parallel, the image processing apparatus according to the first embodiment can change the arrangement of the filter operation output pixels calculated in parallel according to the shape of the region targeted by the filter operation. Making this arrangement changeable according to the shape of the target region keeps the calculation efficiency of the parallel operation from dropping even when the shape of the target region changes.

Here, the arrangement of the filter operation output pixels refers to how the output pixels that are output simultaneously by the parallel operation are laid out. For example, when the image processing apparatus in this embodiment has eight calculators (eight-way parallel), it can output eight pixels simultaneously as the filter operation output. Furthermore, a setting switches whether those eight pixels are arranged as 1 pixel vertically by 8 pixels horizontally, 2 pixels vertically by 4 pixels horizontally, or 4 pixels vertically by 2 pixels horizontally. Hereinafter, the arrangement of the filter operation output pixels is also referred to as the "parallelization shape". A region of M pixels in the vertical direction and N pixels in the horizontal direction is written as "M×N".

An image processing apparatus with eight calculators is described below as an example, but the number of calculators is not limited to eight and may be any plural number. The example switches the arrangement of the output pixels (parallelization shape) among 1×8 (1 pixel vertically, 8 pixels horizontally), 2×4 (2 pixels vertically, 4 pixels horizontally), and 4×2 (4 pixels vertically, 2 pixels horizontally). However, the switchable parallelization shapes are not limited to these and can be set as appropriate for the number of calculators.
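As an illustration of how the candidate shapes depend on the number of calculators, the sketch below enumerates every M×N factor pair. Treating all factor pairs as candidates is an assumption made for illustration; the text only uses 1×8, 2×4, and 4×2, and the function name `parallel_shapes` is hypothetical.

```python
def parallel_shapes(num_calculators):
    """Return all (M, N) pairs with M * N == num_calculators (M vertical, N horizontal)."""
    return [
        (m, num_calculators // m)
        for m in range(1, num_calculators + 1)
        if num_calculators % m == 0
    ]


shapes_for_eight = parallel_shapes(8)
```

For eight calculators this also yields 8×1, a shape the text does not discuss.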

In the image processing apparatus according to this embodiment, the calculators are grouped according to the arrangement of the output pixels (parallelization shape), and the calculators that compute the output pixels of the same horizontal row are set as one operation group. The transfer of the data to be processed is then controlled within each operation group or between operation groups. For example, when the parallelization shape is 1×8, a single operation group of eight calculators is set. Similarly, when the parallelization shape is 2×4, two operation groups of four calculators each are set, and when the parallelization shape is 4×2, four operation groups of two calculators each are set. Setting the operation groups is therefore equivalent to selecting the parallelization shape.

FIG. 1 is a block diagram showing a configuration example of an image processing apparatus 101 according to the first embodiment. The image processing apparatus 101 has a shape control unit 102, a transfer control unit 103, a filter coefficient storage unit 104, an image data storage unit 105, and an arithmetic processing unit 106.

The shape control unit 102 sets the groups of calculators (operation groups) based on input setting information S1 and determines the arrangement of the output pixels (parallelization shape) output by the parallel operation. The setting information S1 includes the size of the calculation target image, the size of the filter kernel, and the number of calculators (degree of parallelism) in the arithmetic processing unit 106. In this embodiment, to keep the calculation efficiency of the parallel operation from dropping even when the shape of the target region changes, it is important to minimize the number of calculators that do not contribute to the filter operation. The shape control unit 102 therefore determines, from the setting information S1, a parallelization shape with which the parallel operation can be performed with good calculation efficiency.

How the shape control unit 102 determines the parallelization shape from the setting information S1 is described with reference to FIG. 2, which illustrates an example of parallelization shape control in this embodiment. In FIG. 2, (a), (b), and (c) show calculation target images 201, 202, and 203 for the filter operation. In addition, (d) shows output pixels 204 when the parallelization shape is 1×8, (e) shows output pixels 205 when the parallelization shape is 2×4, and (f) shows output pixels 206 when the parallelization shape is 4×2.

The horizontal size of the calculation target image 201 shown in FIG. 2(a) is 10, that of the calculation target image 202 shown in FIG. 2(b) is 6, and that of the calculation target image 203 shown in FIG. 2(c) is 4. In this embodiment, the filter kernel size is 3×3, the same as the filter kernel 1402 shown in FIG. 14(b), and the number of calculators (degree of parallelism) in the arithmetic processing unit 106 of the image processing apparatus 101 is 8.

With the 3×3 filter kernel, the output images for the calculation target images 201, 202, and 203 are 8, 4, and 2 pixels wide, respectively. When the calculation target image is the image 201, no calculator is left without contributing to the filter operation, whichever of the parallelization shapes producing the output pixels 204, 205, or 206 is selected. When the calculation target image is the image 202, no calculator is left idle with the parallelization shapes producing the output pixels 205 or 206. When the calculation target image is the image 203, no calculator is left idle with the parallelization shape producing the output pixels 206.

Therefore, when the calculation target image is the image 203, the shape control unit 102 selects 4×2, which produces the output pixels 206, as the parallelization shape. When the calculation target image is the image 201 or 202, several parallelization shapes are possible. When several shapes are selectable, the shape control unit 102 selects the one that gives the shorter processing time for the filter operation. The processing time can be obtained from the parallelization shape, the size of the calculation target image, the size of the filter kernel, and so on.
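The selection described above can be sketched as follows. The cost model is an assumption, not the patent's method: it counts how many M×N output tiles are needed to cover the output image and treats that count as proportional to processing time, ignoring transfer overhead. The names `parallel_steps` and `choose_shape` are hypothetical, and the vertical image size of 6 is made up because the text gives only horizontal sizes.

```python
from math import ceil

SHAPES = [(1, 8), (2, 4), (4, 2)]  # (vertical M, horizontal N) shapes used in the text


def parallel_steps(image_size, kernel_size, shape):
    """Number of M x N output tiles needed to cover the output image."""
    (a, b), (row_size, column_size), (m, n) = image_size, kernel_size, shape
    out_h, out_w = a - row_size + 1, b - column_size + 1
    return ceil(out_h / m) * ceil(out_w / n)


def choose_shape(image_size, kernel_size, shapes=SHAPES):
    """Pick the shape with the fewest steps; ties go to the earliest candidate."""
    return min(shapes, key=lambda shape: parallel_steps(image_size, kernel_size, shape))


# Hypothetical vertical size 6; horizontal sizes 10, 6, and 4 as for images 201 to 203.
choices = {width: choose_shape((6, width), (3, 3)) for width in (10, 6, 4)}
```

Under this model the image of width 4 gets the 4×2 shape, in line with the choice described for the image 203. For width 10 all three shapes tie, so the result depends only on the tie-breaking rule.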

The transfer control unit 103 reads the filter coefficients S2 and the pixel data S3 from the filter coefficient storage unit 104 and the image data storage unit 105, respectively, and controls their supply to the arithmetic processing unit 106. The transfer control unit 103 also controls the transfer of pixel data within the arithmetic processing unit 106 by outputting a transfer control signal S4 to it. Both the reading of data from the filter coefficient storage unit 104 and the image data storage unit 105 and the transfer of pixel data within the arithmetic processing unit 106 are controlled based on the parallelization shape determined by the shape control unit 102.

The filter coefficient storage unit 104 stores the filter kernel values (filter coefficients) used in the filter operation. The image data storage unit 105 stores the pixel data of the calculation target image and is an example of image storage means. The arithmetic processing unit 106 has a plurality of calculators that can operate in parallel, and performs the filter operation using the input filter coefficients S2 and pixel data S3.

FIG. 3 is a block diagram showing a configuration example of the arithmetic processing unit 106. The internal configuration of the arithmetic processing unit 106, and the connection control within it by the shape control signal S5 input from the shape control unit 102, are described below with reference to FIG. 3. The arithmetic processing unit 106 has registers 301 to 308 and 311 to 318, sum-of-products calculators 321 to 328, and selectors 331 to 333. The registers 301 to 308 and 311 to 318 are an example of storage means, and the sum-of-products calculators 321 to 328 are an example of arithmetic means.

The registers 301 to 308 and 311 to 318 temporarily store the pixel data read from the image data storage unit 105 for the filter operation. Each register outputs its pixel data to the sum-of-products calculator and the register connected to it. For example, the register 308 outputs its stored pixel data to the sum-of-products calculator 328 and the register 307. Among these registers, the registers 301 to 308, which are directly connected to the sum-of-products calculators, are also called directly connected registers. For example, the directly connected register of the sum-of-products calculator 327 is the register 307. Conversely, the registers 311 to 318, which are not directly connected to any sum-of-products calculator, are also called indirectly connected registers.

The sum-of-products calculators 321 to 328 perform, in parallel, arithmetic processing that cumulatively adds the products of the pixel data supplied from the directly connected registers 301 to 308 and the filter coefficients supplied from the filter coefficient storage unit 104. For example, the sum-of-products calculator 328 performs the product-sum operation using the pixel data supplied from the register 308 and the filter coefficients supplied from the filter coefficient storage unit 104.

Each of the selectors 331 to 333 selects one of its connected inputs and outputs it. The shape control signal from the shape control unit 102 is connected to the selectors 331 to 333 and sets which input each selector selects; that is, the shape control signal sets the input/output relationship of the selectors 331 to 333. For example, the selector 331 selects either the output of the register 303 or the output of the register 311 and outputs it to the register 302, and the shape control signal controls which of the two is selected.

The grouping of the sum-of-products calculators 321 to 328 into operation groups in this embodiment is described next. This grouping is realized by setting the input/output relationships of the selectors 331 to 333. In this embodiment, the sum-of-products calculators 321 to 328 are grouped as follows.

・When the parallelization shape is 1×8

・When the parallelization shape is 2×4

・When the parallelization shape is 4×2
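The grouping tables themselves are not reproduced in this text. As a sketch, the groups can be derived by assuming that the calculators are assigned to operation groups in numerical order, one group per output row. This agrees with group B1 consisting of the sum-of-products calculators 321 to 324 in the 2×4 case, but the rule and the function name `group_calculators` are otherwise assumptions.

```python
def group_calculators(calculator_ids, shape):
    """Split calculators into one operation group per output row of the shape."""
    rows, cols = shape
    if rows * cols != len(calculator_ids):
        raise ValueError("shape does not match the number of calculators")
    return [calculator_ids[r * cols:(r + 1) * cols] for r in range(rows)]


CALCULATORS = list(range(321, 329))  # sum-of-products calculators 321 to 328

groups_1x8 = group_calculators(CALCULATORS, (1, 8))  # one group of eight
groups_2x4 = group_calculators(CALCULATORS, (2, 4))  # two groups of four
groups_4x2 = group_calculators(CALCULATORS, (4, 2))  # four groups of two
```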

According to the determined parallelization shape, the shape control unit 102 groups the sum-of-products calculators 321 to 328 into operation groups as shown in the tables above. Furthermore, the shape control unit 102 sets and outputs the shape control signal so that the directly connected registers of sum-of-products calculators belonging to the same operation group are connected directly to each other, and the directly connected registers of sum-of-products calculators belonging to different operation groups are connected with indirectly connected registers between them. In this embodiment, therefore, the shape control unit 102 outputs the shape control signal to the selectors 331 to 333 so that each selects and outputs the register output shown in the table below.

For example, when the parallelization shape is 1×8, the selector 331 selects the output of the register 303 and outputs it to the register 302. When the parallelization shape is 2×4, the selector 331 likewise selects the output of the register 303. When the parallelization shape is 4×2, on the other hand, the selector 331 selects the output of the register 311 and outputs it to the register 302.

By controlling the input/output relationships of the selectors 331 to 333 with the shape control signal in this way, the desired connections among the directly connected and indirectly connected registers are obtained, both within an operation group and between operation groups. For example, when the parallelization shape is 2×4, the selector 331 is set so that the directly connected registers 301 to 304 of the sum-of-products calculators 321 to 324 belonging to group B1 are connected directly to one another. The selector 331 sits between the register 302 and the register 303, but because a selector merely determines the connection, it is ignored when describing the connections between registers. The selector 332 is set so that the indirectly connected registers 313 and 314 are connected between the directly connected register belonging to group B1 (register 304) and the directly connected register belonging to group B2 (register 305).

When the parallelization shape is 2×4, the selectors 331 and 333 cut off the connections, so the pixel data stored in the registers 311, 312, 315, and 316 is never supplied to the sum-of-products calculators. Whatever data these registers hold therefore does not affect the operation result. Similarly, when the parallelization shape is 1×8, the selectors 331 to 333 cut off the connections, so the pixel data stored in the registers 311 to 316 is never supplied to the sum-of-products calculators, and whatever data they hold does not affect the operation result.

As described above, the shape control signal S5 supplied from the shape control unit 102 controls the connection state of the registers 301 to 308 and 311 to 318 inside the arithmetic processing unit 106. For example, when the parallelization shape is 1×8, the connection state is controlled so that the pixel data is transferred in the order of registers 318→317→308→307→306→305→304→303→302→301. When the parallelization shape is 2×4, the connection state is controlled so that the pixel data is transferred in the order of registers 318→317→308→307→306→305→314→313→304→303→302→301. When the parallelization shape is 4×2, the connection state is controlled so that the pixel data is transferred in the order of registers 318→317→308→307→316→315→306→305→314→313→304→303→312→311→302→301.
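The three transfer orders listed above can be reproduced from the selector settings with the sketch below. The table names are hypothetical, and the pairing of each selector with its indirectly connected registers (331 with 311 and 312, 332 with 313 and 314, 333 with 315 and 316) is read off from the description rather than from the missing selector table. The longest chain, which passes through all six registers 311 to 316, corresponds to the 4×2 shape.

```python
# Selector -> (upstream directly connected register, indirect registers in transfer order)
SELECTORS = {
    331: (303, [312, 311]),
    332: (305, [314, 313]),
    333: (307, [316, 315]),
}

# Selectors that route data through the indirect registers, per parallelization shape
INDIRECT_SELECTORS = {
    (1, 8): set(),
    (2, 4): {332},
    (4, 2): {331, 332, 333},
}


def transfer_chain(shape):
    """Return the order in which pixel data passes through the registers."""
    via_indirect = INDIRECT_SELECTORS[shape]
    chain = [318, 317]                      # registers 318 and 317 are always in the path
    for reg in range(308, 300, -1):         # directly connected registers 308 down to 301
        chain.append(reg)
        for selector, (upstream, bypass) in SELECTORS.items():
            if reg == upstream and selector in via_indirect:
                chain.extend(bypass)        # detour through the indirect registers
    return chain


chain_2x4 = transfer_chain((2, 4))
```

In this model a group boundary is simply a point where the chain detours through two indirectly connected registers.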

Next, the operation of the arithmetic processing unit 106 is described. The operation described below is controlled by the transfer control signal S4 input from the transfer control unit 103. Before this operation, the connection state of the registers 301 to 308 and 311 to 318 in the arithmetic processing unit 106 is assumed to have been set according to the desired parallelization shape as described above; that is, the connection states (input/output relationships) of the selectors 331 to 333 are assumed to have been determined.

In the arithmetic processing unit 106, the transfer control signal from the transfer control unit 103 controls how the pixel data stored in the registers 301 to 308 and 311 to 318 is transferred between the registers. An outline of the transfer control is given here; the details of the filter operation under transfer control are described later.

Pixel data transfer control by the transfer control signal during the filter operation has two transfer modes: a first transfer mode and a second transfer mode. In the first transfer mode, pixel data is transferred between registers so that the transferred pixel data is used only for operations in the sum-of-products calculators within a single operation group. That is, in the first transfer mode, pixel data used for an operation by a sum-of-products calculator belonging to one operation group is not used for an operation by a sum-of-products calculator belonging to another operation group. While pixel data transfer in the arithmetic processing unit 106 is controlled in the first transfer mode, the arithmetic processing unit 106 performs the operation for one row of the filter kernel.

In the second transfer mode, pixel data is transferred between registers so that the transferred pixel data is also used for operations in the sum-of-products calculators of another operation group. When the arithmetic processing unit 106 finishes the operation for one row of the filter kernel, pixel data transfer in the second transfer mode is executed. Since a filter kernel generally consists of a plurality of rows, the filter operation is executed by repeating pixel data transfer in the first transfer mode and pixel data transfer in the second transfer mode.

For example, in a filter operation using the filter kernel 1402 shown in FIG. 14, pixel data is first transferred in the first transfer mode, the sum-of-products calculators perform the operation for the first row, and the operation result shown in the following equation is obtained.

Next, pixel data is transferred in the second transfer mode, after which pixel data is again transferred in the first transfer mode, the sum-of-products calculators perform the operation for the next row, and the operation result shown in the following equation is obtained.

After that, pixel data is again transferred in the second transfer mode and then in the first transfer mode, the sum-of-products calculators perform the operation for the next row, and the filter operation result shown in the following equation is obtained.
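The equations referred to in the three preceding paragraphs are not reproduced in this text. Under the definitions given earlier for D, W, and R, and assuming the 3×3 kernel of FIG. 14 with indices running from 0 to 2, the three successive results would plausibly take the following form. The superscript notation R^{(n)} for the running result after n kernel rows is introduced here for illustration and does not appear in the original.

```latex
\begin{align}
R^{(1)}_{i,j} &= \sum_{t=0}^{2} D_{i,\,j+t}\, W_{0,t} \\
R^{(2)}_{i,j} &= R^{(1)}_{i,j} + \sum_{t=0}^{2} D_{i+1,\,j+t}\, W_{1,t} \\
R_{i,j} = R^{(3)}_{i,j} &= R^{(2)}_{i,j} + \sum_{t=0}^{2} D_{i+2,\,j+t}\, W_{2,t}
  = \sum_{s=0}^{2} \sum_{t=0}^{2} D_{i+s,\,j+t}\, W_{s,t}
\end{align}
```

Each line adds the contribution of one kernel row, which corresponds to one pass of the first transfer mode; the change of image row from i to i+1 to i+2 is what the intervening second transfer mode provides.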

This series of operations will be described with reference to the flowchart shown in FIG. 4 and the time charts shown in FIGS. 5, 6, and 7. FIG. 4 is a flowchart showing an example of the arithmetic processing of the image processing apparatus according to the first embodiment. FIG. 5 is a time chart showing an example of the filter operation when the parallelization shape is 2×4, FIG. 6 is a time chart showing an example when the parallelization shape is 4×2, and FIG. 7 is a time chart showing an example when the parallelization shape is 1×8.

First, the operation when the parallelization shape is 2×4 will be described with reference to FIGS. 4 and 5. As described above, the shape control unit 102 determines the parallelization shape before the parallel arithmetic processing shown in the flowchart of FIG. 4 starts, and the connection states of the selectors 331 to 333 in the arithmetic processing unit 106 are assumed to have been determined. The selector 331 is controlled to select the output of the register 303 and output it to the register 302, the selector 332 is controlled to select the output of the register 313 and output it to the register 304, and the selector 333 is controlled to select the output of the register 307 and output it to the register 306. As described above, when the parallelization shape is 2×4, the data stored in the registers 311, 312, 315, and 316 is not used for the operation. Accordingly, the output columns for the registers 311, 312, 315, and 316 are left blank in the time chart shown in FIG. 5.

In step S401 shown in FIG. 4, the image processing apparatus 101 prepares the pixel data for the parallel operation (filter operation) in the sum-of-products calculators of the arithmetic processing unit 106, and supplies pixel data from the image data storage unit 105 to the registers used in the parallel operation. In step S401, the transfer control unit 103 issues an instruction to read the pixel data of six pixels (D00, D01, ..., D05) in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t0 to t5 in FIG. 5). When that transfer is complete, the transfer control unit 103 issues an instruction to read the pixel data of six pixels (D10, D11, ..., D15) from the next row (second row) of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t6 to t11 in FIG. 5). As a result, in the next cycle (time t12 in FIG. 5), the registers used for the operation in the arithmetic processing unit 106 hold the pixel data to be operated on, and the sum-of-products operation by the sum-of-products calculators 321 to 328 is executed from time t12 shown in FIG. 5.

After time t12 in FIG. 5 as well, the transfer control unit 103 shifts the row sequentially and issues instructions to read pixel data six pixels at a time in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106. Thus, for example, as shown at times t12 to t17 in FIG. 5, the third-row pixel data to be operated on (D20, D21, ..., D25) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106. Subsequently, as shown at times t18 to t23, the fourth-row pixel data to be operated on (D30, D31, ..., D35) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106.

Next, in step S402, the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the first horizontal row of the filter kernel (W00, W01, W02) and the pixel data. To this end, the transfer control unit 103 instructs the registers in the arithmetic processing unit 106 to perform shift processing (pixel data transfer between registers). This transfer control is the first transfer mode; during the transfer, pixel data supplied to the sum-of-products calculators belonging to one operation group is not supplied to the sum-of-products calculators belonging to another operation group. For example, the pixel data D10 to D15 supplied to the sum-of-products calculators 325 to 328 belonging to group B2 is not supplied to the sum-of-products calculators 321 to 324 belonging to group B1.

Also in step S402, the transfer control unit 103 issues an instruction to read filter coefficients (one horizontal row of the filter kernel) from the filter coefficient storage unit 104 in time with the first transfer mode and transfer them to the sum-of-products calculators of the arithmetic processing unit 106. In this way, the arithmetic processing unit 106 transfers the pixel data to be operated on in the first transfer mode and executes the sum-of-products operation for one horizontal row of the filter kernel (times t12 to t14 in FIG. 5).

Next, in step S403, the transfer control unit 103 checks the end condition of the parallel operation (filter operation) and determines whether the parallel operation (filter operation) has finished. At this point, only the sum-of-products operation of the first horizontal row of the filter kernel and the pixel data has been performed, so the transfer control unit 103 determines that the parallel operation (filter operation) has not finished (NO), and the process proceeds to step S404.

In step S404, the transfer control unit 103 instructs the arithmetic processing unit 106 so that pixel data supplied to the sum-of-products calculators belonging to one operation group is supplied to the sum-of-products calculators belonging to another operation group (times t15 to t17 in FIG. 5). Specifically, the transfer control unit 103 instructs the registers in the arithmetic processing unit 106 to perform shift processing to that effect. This transfer control is the second transfer mode, and the sum-of-products calculators suspend the sum-of-products operation during the transfer.

As a result of the pixel data transfer in the second transfer mode, in the next cycle (time t18 in FIG. 5), the pixel data D10 to D15 is ready to be supplied to the sum-of-products calculators 321 to 324 belonging to the operation group (group B1). The pixel data D10 to D15 is the pixel data that had been supplied to the sum-of-products calculators 325 to 328 belonging to the other operation group (group B2) before the start of the preceding first transfer mode. In addition, as a result of reading and transferring image data at times t12 to t17, in the next cycle (time t18), the third-row pixel data D20 to D25 is ready to be supplied to the sum-of-products calculators 325 to 328 belonging to the operation group (group B2).

Step S402 is then performed again, and the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the second horizontal row of the filter kernel (W10, W11, W12) and the pixel data (times t18 to t20 in FIG. 5). Thereafter, the image processing apparatus 101 repeats steps S402 and S404 until the transfer control unit 103 determines in step S403 that the parallel operation (filter operation) has finished.
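The loop over steps S402 to S404 can be modeled functionally, ignoring cycle timing and the individual registers. The sketch below is an illustration under stated assumptions rather than the disclosed circuit: each operation group is represented by the image row it currently holds, the second transfer mode is modeled as handing each row to the group above while a freshly read row enters the last group, and the names `filter_block` and `direct` are hypothetical.

```python
def direct(D, W, i, j):
    """Reference result: full 2-D sum of products at output position (i, j)."""
    return sum(D[i + s][j + t] * W[s][t]
               for s in range(len(W)) for t in range(len(W[0])))


def filter_block(D, W, i0, j0, P, Q):
    """Compute a P x Q block of outputs whose top-left output is (i0, j0),
    processing one kernel row per pass as in steps S402 to S404."""
    kh, kw = len(W), len(W[0])
    span = Q + kw - 1                       # pixels each group needs per image row
    # S401: group p starts out holding image row i0 + p.
    held = [D[i0 + p][j0:j0 + span] for p in range(P)]
    acc = [[0] * Q for _ in range(P)]
    for s in range(kh):
        # S402 (first transfer mode): every group uses only its own row.
        for p in range(P):
            for q in range(Q):
                acc[p][q] += sum(held[p][q + t] * W[s][t] for t in range(kw))
        # S403: the operation ends after the last kernel row.
        if s == kh - 1:
            break
        # S404 (second transfer mode): each row moves to the group above,
        # and the last group receives the next image row.
        held = held[1:] + [D[i0 + P + s][j0:j0 + span]]
    return acc


# Small made-up image and kernel, used only to exercise the sketch.
D = [[i * 10 + j for j in range(10)] for i in range(6)]
W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(filter_block(D, W, 0, 0, 2, 4))
```

With a 2×4 block, `held` starts as image rows 0 and 1; after the first hand-off it holds rows 1 and 2, which mirrors the description of D10 to D15 moving from group B2 to group B1 while D20 to D25 enters group B2.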

In step S403 after the sum-of-products operation of the filter coefficients of the third horizontal row of the filter kernel (W20, W21, W22) and the pixel data, the transfer control unit 103 determines that the parallel operation (filter operation) has finished (YES), and the process ends. At the end of the parallel operation (filter operation), the sum-of-products calculators 321 to 328 in the arithmetic processing unit 106 output the operation results expressed by the following equation.

When the parallel operation (filter operation) finishes with the processing at time t24 in FIG. 5, for example, the sum-of-products calculator 321 outputs the operation result for i=0, j=0, and the sum-of-products calculator 324 outputs the operation result for i=0, j=3. Likewise, the sum-of-products calculator 325 outputs the operation result for i=1, j=0, and the sum-of-products calculator 328 outputs the operation result for i=1, j=3.

Furthermore, from time t24 in FIG. 5 onward, reading from the image data storage unit 105 and register shift processing for the next filter operation are performed ahead of time, in a pipelined manner, in parallel with the arithmetic processing described above. By repeating the operation shown in FIG. 4 while sequentially shifting the rows to be operated on in the operation target image and, when the last row is reached, returning to the first row and shifting the columns to be operated on to the next columns, an operation output image in which the filter operation has been applied to the entire operation target image is obtained.

Next, the operation when the parallelization shape is 4×2 will be described with reference to FIGS. 4 and 6. As in the 2×4 case, the shape control unit 102 determines the parallelization shape before the parallel arithmetic processing shown in the flowchart of FIG. 4 starts, and the connection states of the selectors 331 to 333 in the arithmetic processing unit 106 are assumed to have been determined. The selector 331 is controlled to select the output of the register 311 and output it to the register 302, the selector 332 is controlled to select the output of the register 313 and output it to the register 304, and the selector 333 is controlled to select the output of the register 315 and output it to the register 306. When the parallelization shape is 4×2, the data stored in all of the registers 301 to 308 and 311 to 318 is used for the operation.

In step S401 shown in FIG. 4, the image processing apparatus 101 prepares the pixel data for the parallel operation (filter operation) in the sum-of-products calculators of the arithmetic processing unit 106, and supplies pixel data from the image data storage unit 105 to the registers used in the parallel operation. In step S401, the transfer control unit 103 issues an instruction to read the pixel data of four pixels (D00, D01, D02, D03) in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t0 to t3 in FIG. 6). When that transfer is complete, the transfer control unit 103 issues an instruction to read the pixel data of four pixels (D10, D11, D12, D13) from the next row (second row) of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t4 to t7 in FIG. 6).

The transfer control unit 103 then issues an instruction to read the pixel data of four pixels (D20, D21, D22, D23) from the next row (third row) of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t8 to t11 in FIG. 6). The transfer control unit 103 further issues an instruction to read the pixel data of four pixels (D30, D31, D32, D33) from the next row (fourth row) of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t12 to t15 in FIG. 6). As a result, in the next cycle (time t16 in FIG. 6), the registers used for the operation in the arithmetic processing unit 106 hold the pixel data to be operated on, and the sum-of-products operation by the sum-of-products calculators 321 to 328 is executed from time t16 shown in FIG. 6.

After time t16 in FIG. 6 as well, the transfer control unit 103 shifts the row sequentially and issues instructions to read pixel data four pixels at a time in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106. Thus, for example, as shown at times t16 to t19 in FIG. 6, the fifth-row pixel data to be operated on (D40, D41, D42, D43) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106. Subsequently, as shown at times t20 to t23, the sixth-row pixel data to be operated on (D50, D51, D52, D53) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106.

Next, in step S402, the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the first horizontal row of the filter kernel (W00, W01, W02) and the pixel data. To this end, the transfer control unit 103 instructs the registers in the arithmetic processing unit 106 to perform shift processing. This transfer control is the first transfer mode. Also in step S402, the transfer control unit 103 issues an instruction to read filter coefficients (one horizontal row of the filter kernel) from the filter coefficient storage unit 104 in time with the first transfer mode and transfer them to the sum-of-products calculators of the arithmetic processing unit 106. In this way, the arithmetic processing unit 106 transfers the pixel data to be operated on in the first transfer mode and executes the sum-of-products operation for one horizontal row of the filter kernel (times t16 to t18 in FIG. 6).

Next, in step S403, the transfer control unit 103 checks the end condition of the parallel operation (filter operation) and determines whether the parallel operation (filter operation) has finished. At this point, the sum-of-products operation has been performed only for the first horizontal row of the filter kernel, so the transfer control unit 103 determines that the parallel operation (filter operation) has not finished (NO), and the process proceeds to step S404.

In step S404, the transfer control unit 103 instructs the arithmetic processing unit 106 so that pixel data supplied to the sum-of-products calculators belonging to one operation group is supplied to the sum-of-products calculators belonging to another operation group (time t19 in FIG. 6). Specifically, the transfer control unit 103 instructs the registers in the arithmetic processing unit 106 to perform shift processing to that effect. This transfer control is the second transfer mode, and the sum-of-products calculators suspend the sum-of-products operation during the transfer.

As a result of the pixel data transfer in the second transfer mode, in the next cycle (time t20 in FIG. 6), pixel data that had previously been supplied to the sum-of-products calculators belonging to a different operation group is ready to be supplied to the sum-of-products calculators 321 to 326. In addition, as a result of reading and transferring image data at times t16 to t19, in the next cycle (time t20), the fifth-row pixel data D40 to D43 is ready to be supplied to the sum-of-products calculators 327 and 328 belonging to the operation group (group C4).

Step S402 is then performed again, and the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the second horizontal row of the filter kernel (W10, W11, W12) and the pixel data (times t20 to t22 in FIG. 6). Thereafter, the image processing apparatus 101 repeats steps S402 and S404 until the transfer control unit 103 determines in step S403 that the parallel operation (filter operation) has finished.

When the parallel operation (filter operation) finishes with the processing at time t24 in FIG. 6, for example, the sum-of-products calculator 321 outputs the operation result for i=0, j=0, and the sum-of-products calculator 322 outputs the operation result for i=0, j=1. Likewise, the sum-of-products calculator 323 outputs the operation result for i=1, j=0, and the sum-of-products calculator 328 outputs the operation result for i=3, j=1.

Furthermore, from time t24 in FIG. 6 onward, reading from the image data storage unit 105 and register shift processing for the next filter operation are performed ahead of time, in a pipelined manner, in parallel with the arithmetic processing described above. By repeating the operation shown in FIG. 4 while sequentially shifting the rows to be operated on in the operation target image and, when the last row is reached, returning to the first row and shifting the columns to be operated on to the next columns, an operation output image in which the filter operation has been applied to the entire operation target image is obtained.

Next, the operation when the parallelization shape is 1×8 will be described with reference to FIGS. 4 and 7. As described above, the shape control unit 102 determines the parallelization shape before the parallel arithmetic processing shown in the flowchart of FIG. 4 starts, and the connection states of the selectors 331 to 333 in the arithmetic processing unit 106 are assumed to have been determined. The selector 331 is controlled to select the output of the register 303 and output it to the register 302, the selector 332 is controlled to select the output of the register 305 and output it to the register 304, and the selector 333 is controlled to select the output of the register 307 and output it to the register 306. As described above, when the parallelization shape is 1×8, the data stored in the registers 311 to 316 is not used for the operation. Accordingly, the output columns for the registers 311 to 316 are left blank in the time chart shown in FIG. 7.

In step S401 shown in FIG. 4, the image processing apparatus 101 prepares the pixel data for the parallel operation (filter operation) in the sum-of-products calculators of the arithmetic processing unit 106, and supplies pixel data from the image data storage unit 105 to the registers used in the parallel operation. In step S401, the transfer control unit 103 issues an instruction to read the pixel data of ten pixels (D00, D01, ..., D09) in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106 (times t0 to t9 in FIG. 7). As a result, in the next cycle (time t10 in FIG. 7), the registers used for the operation in the arithmetic processing unit 106 hold the pixel data to be operated on, and the sum-of-products operation by the sum-of-products calculators 321 to 328 is executed from time t10 shown in FIG. 7.

After time t10 in FIG. 7 as well, the transfer control unit 103 shifts the row sequentially and issues instructions to read pixel data ten pixels at a time in the horizontal direction of the operation target image from the image data storage unit 105 and transfer it to the arithmetic processing unit 106. Thus, for example, as shown at times t10 to t19 in FIG. 7, the second-row pixel data to be operated on (D10, D11, ..., D19) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106. Subsequently, as shown at times t20 to t29, the third-row pixel data to be operated on (D20, D21, ..., D29) is supplied from the image data storage unit 105 to the registers of the arithmetic processing unit 106.

Next, in step S402, the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the first horizontal row of the filter kernel (W00, W01, W02) and the pixel data. To this end, the transfer control unit 103 instructs the registers in the arithmetic processing unit 106 to perform shift processing. This transfer control is the first transfer mode. Also in step S402, the transfer control unit 103 issues an instruction to read filter coefficients (one horizontal row of the filter kernel) from the filter coefficient storage unit 104 in time with the first transfer mode and transfer them to the sum-of-products calculators of the arithmetic processing unit 106. In this way, the arithmetic processing unit 106 transfers the pixel data to be operated on in the first transfer mode and executes the sum-of-products operation for one horizontal row of the filter kernel (times t10 to t12 in FIG. 7).

Next, in step S403, the transfer control unit 103 checks the end condition of the parallel operation (filter operation) and determines whether the parallel operation (filter operation) has finished. At this point, the sum-of-products operation has been performed only for the first horizontal row of the filter kernel, so the transfer control unit 103 determines that the parallel operation (filter operation) has not finished (NO), and the process proceeds to step S404. However, when the parallelization shape is 1×8, there is only one operation group, so no processing is performed in step S404 and the process proceeds to step S402. In addition, as a result of reading and transferring image data at times t10 to t19, in the next cycle (time t20), the second-row pixel data D10 to D19 is ready to be supplied to the sum-of-products calculators 321 to 328.

Step S402 is then performed again, and the arithmetic processing unit 106 performs the sum-of-products operation of the filter coefficients of the second horizontal row of the filter kernel (W10, W11, W12) and the pixel data (times t20 to t22 in FIG. 7). Thereafter, the image processing apparatus 101 repeats steps S402 and S404 until the transfer control unit 103 determines in step S403 that the parallel operation (filter operation) has finished.

When the parallel operation (filter operation) finishes with the processing at time t32 in FIG. 7, for example, the sum-of-products calculator 321 outputs the operation result for i=0, j=0, and the sum-of-products calculator 322 outputs the operation result for i=0, j=1. Likewise, the sum-of-products calculator 328 outputs the operation result for i=0, j=7. By repeating the operation shown in FIG. 4 while sequentially shifting the rows to be operated on in the operation target image and, when the last row is reached, returning to the first row and shifting the columns to be operated on to the next columns, an operation output image in which the filter operation has been applied to the entire operation target image is obtained.
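The (i, j) values quoted for the individual sum-of-products calculators in the 2×4, 4×2, and 1×8 cases are all consistent with a row-major numbering of the calculators within the output block. The following sketch assumes that numbering; it is inferred from the quoted examples only, and the function name `output_coordinate` is hypothetical.

```python
def output_coordinate(unit, shape):
    """Map a sum-of-products calculator number (321 to 328) to the (i, j)
    position of its output pixel within the P x Q block.

    Assumption: calculators are numbered row by row within the block."""
    P, Q = shape
    k = unit - 321
    if not 0 <= k < P * Q:
        raise ValueError("calculator number outside the block")
    return divmod(k, Q)          # (row index i, column index j)


for shape in [(2, 4), (4, 2), (1, 8)]:
    print(shape, {unit: output_coordinate(unit, shape) for unit in range(321, 329)})
```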

第１の実施形態によれば、形状制御部１０２により複数の積和演算器を複数の演算グループにグループ化して並列化形状を決定し、演算処理部１０６内にあるレジスタ間の接続を設定する。さらに、転送制御部１０３により、演算対象の画素データの転送制御を行うことで、並列に演算されるフィルタ演算の出力画素が、演算出力画像上で２次元になるように演算することができる。さらには、その２次元形状も制御することが可能である。 According to the first embodiment, the shape control unit 102 groups the plurality of sum-of-products calculators into a plurality of operation groups to determine the parallelization shape, and sets the connections between the registers in the arithmetic processing unit 106. In addition, the transfer control unit 103 controls the transfer of the pixel data to be processed, so that the output pixels of the filter operations computed in parallel form a two-dimensional arrangement on the computation output image. The two-dimensional shape itself can also be controlled.

これにより、並列度（積和演算器の数）に比較して、小さなサイズの演算対象画像に対しても、フィルタ演算に寄与しない積和演算器が発生することを抑制し、積和演算器を有効に使用してフィルタ演算を行うことが可能となる。したがって、並列度を増加させるほどフィルタ演算を効率的に行うことができ、演算対象画像のサイズにかかわらず、演算効率を低下させることなく、良好な演算効率でのフィルタ演算を実行することが可能となる。 This suppresses the occurrence of sum-of-products calculators that do not contribute to the filter operation even for a target image that is small relative to the degree of parallelism (the number of sum-of-products calculators), so that the calculators are used effectively. Therefore, the filter operation becomes more efficient as the degree of parallelism increases, and can be executed with good efficiency, without loss of efficiency, regardless of the size of the target image.
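The effect on calculator usage can be quantified with a small sketch. The function `mac_utilization` and the output sizes used with it are hypothetical and are not taken from the specification; the sketch simply assumes that the output region is covered by non-overlapping blocks of the parallelization shape and counts the fraction of calculator slots that produce a valid output pixel.

```python
import math


def mac_utilization(out_height, out_width, shape):
    """Fraction of calculator slots that yield a valid output pixel.

    `shape` is (rows, cols); the output region is tiled with blocks of that shape.
    """
    rows, cols = shape
    blocks = math.ceil(out_height / rows) * math.ceil(out_width / cols)
    return (out_height * out_width) / (blocks * rows * cols)
```

For an assumed output region of 2 rows by 4 columns, a 1×8 shape leaves half of the eight calculators idle in every block, whereas a 2×4 shape keeps all of them busy.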

タイムチャート（図５、図６、図７）を用いて示したように、それぞれの並列化形状でフィルタ演算に要する時間が異なる場合がある。このような場合には、形状制御部１０２は、フィルタ演算処理時間が短くなる並列化形状を選択するようにすればよい。例えば、前述した演算対象画像２０１の場合、並列化形状として１×８、２×４、４×２の何れを採用してもフィルタ演算に寄与しない演算器は発生しない。しかしながら、タイムチャート（図５、図６、図７）に示されるように、フィルタ演算の処理時間が短いのは並列化形状が２×４又は４×２の場合であるので、形状制御部１０２は、並列化形状として２×４又は４×２を選択することが望ましい。 As shown in the time charts (FIGS. 5, 6, and 7), the time required for the filter operation may differ depending on the parallelization shape. In such a case, the shape control unit 102 may select the parallelization shape that gives the shorter filter processing time. For example, for the target image 201 described above, none of the parallelization shapes 1×8, 2×4, and 4×2 leaves a calculator that does not contribute to the filter operation. However, as the time charts (FIGS. 5, 6, and 7) show, the processing time is shorter when the parallelization shape is 2×4 or 4×2, so the shape control unit 102 should preferably select 2×4 or 4×2 as the parallelization shape.
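One way such a selection could be expressed in software is sketched below. The specification does not describe how the shape control unit 102 makes its choice internally, so everything here is an assumption: `choose_shape` is a hypothetical helper, the per-block cycle counts must be supplied by the caller (for example, read off a time chart), and blocks are assumed to be processed one after another without overlap.

```python
import math


def choose_shape(out_height, out_width, cycles_per_block):
    """Return the (rows, cols) shape with the smallest estimated total cycle count.

    `cycles_per_block` maps each candidate shape to the cycles needed for one block.
    """
    def total_cycles(shape):
        rows, cols = shape
        blocks = math.ceil(out_height / rows) * math.ceil(out_width / cols)
        return blocks * cycles_per_block[shape]

    return min(cycles_per_block, key=total_cycles)
```

If two shapes give the same estimate, `min` returns whichever appears first in the mapping, which is consistent with the statement above that either 2×4 or 4×2 may be selected.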

（第２の実施形態） (Second embodiment)
次に、本発明の第２の実施形態について説明する。第１の実施形態における画像処理装置では、形状制御信号によって演算処理部１０６内のレジスタ間の接続状態が一旦決定されると、画素データの転送は、決定された接続状態で隣接するレジスタ間でのデータ転送に限定される。このために、第２の転送モードで行われる演算グループ間を跨る画素データの転送を実現するために、隣接するレジスタ間でのデータ転送を複数のクロックサイクルに亘って行う必要があった。 Next, a second embodiment of the present invention will be described. In the image processing apparatus of the first embodiment, once the connection state between the registers in the arithmetic processing unit 106 has been determined by the shape control signal, pixel data can be transferred only between registers that are adjacent in that connection state. Consequently, the transfer of pixel data across operation groups performed in the second transfer mode had to be carried out as data transfers between adjacent registers over a plurality of clock cycles.

第２の実施形態における画像処理装置では、形状制御信号によるレジスタ間の接続状態の設定とは別に、転送制御信号による転送モードに応じてレジスタ間の接続状態を制御する。これにより、第２の実施形態では、第２の転送モードでの画素データの転送に必要な時間を第１の実施形態よりも大幅に短縮でき、第１の実施形態よりもフィルタ演算を高速に行うことが可能となる。 In the image processing apparatus of the second embodiment, the connection state between registers is controlled according to the transfer mode indicated by the transfer control signal, separately from the setting of the connection state by the shape control signal. As a result, the time required to transfer pixel data in the second transfer mode is significantly shorter than in the first embodiment, and the filter operation can be performed faster than in the first embodiment.

図８は、第２の実施形態における画像処理装置８０１の構成例を示すブロック図である。図８において、図１に示した構成要素と同一の機能を有する構成要素には同一の符号を付し、重複する説明は省略する。画像処理装置８０１は、形状制御部１０２、転送制御部８０３、フィルタ係数格納部１０４、画像データ格納部８０５、及び演算処理部８０６を有する。 FIG. 8 is a block diagram showing a configuration example of an image processing apparatus 801 according to the second embodiment. In FIG. 8, constituent elements having the same functions as the constituent elements shown in FIG. 1 are denoted by the same reference numerals, and overlapping descriptions are omitted. The image processing device 801 has a shape control unit 102 , a transfer control unit 803 , a filter coefficient storage unit 104 , an image data storage unit 805 and an arithmetic processing unit 806 .

転送制御部８０３は、図１に示した転送制御部１０３の機能に加えて、転送モードに応じて演算処理部８０６内のレジスタ間の接続状態を制御する機能を有する。画像データ格納部８０５には、図１に示した画像データ格納部１０５と同様に、フィルタ演算の演算対象画像の画素データが格納される。ここで、本実施形態では、画像データ格納部８０５は、フィルタ演算を高速に行うために複数の画素データを１つのクロックサイクルで読み出し可能なメモリ幅を有するとする。演算処理部８０６は、並列して演算を実行可能な複数の演算器を有し、入力されるフィルタ係数Ｓ２と画素データＳ３とを用いてフィルタ演算を行う。 In addition to the functions of the transfer control unit 103 shown in FIG. 1, the transfer control unit 803 has a function of controlling the connection state between registers in the arithmetic processing unit 806 according to the transfer mode. The image data storage unit 805 stores the pixel data of the image to be subjected to the filter operation, similarly to the image data storage unit 105 shown in FIG. Here, in this embodiment, the image data storage unit 805 has a memory width capable of reading out a plurality of pixel data in one clock cycle in order to perform filter calculation at high speed. The calculation processing unit 806 has a plurality of calculators capable of executing calculations in parallel, and performs filter calculation using the input filter coefficient S2 and pixel data S3.

図９は、演算処理部８０６の構成例を示すブロック図である。図９において、図３に示した構成要素と同一の機能を有する構成要素には同一の符号を付し、重複する説明は省略する。演算処理部８０６は、レジスタ３０１～３０８、３１１～３１８、積和演算器３２１～３２８、及びセレクタ３３１～３３３、９４１～９４６、９５１～９５６を有する。 FIG. 9 is a block diagram showing a configuration example of the arithmetic processing unit 806. In FIG. 9, constituent elements having the same functions as those shown in FIG. 3 are given the same reference numerals, and redundant explanations are omitted. The arithmetic processing unit 806 has registers 301 to 308 and 311 to 318, sum-of-products calculators 321 to 328, and selectors 331 to 333, 941 to 946, and 951 to 956.

セレクタ９４１～９４６、９５１～９５６は、接続されている複数の入力から１つを選択して出力する。セレクタ９４１～９４６、９５１～９５６には転送制御部８０３からの転送制御信号が接続されており、セレクタ９４１～９４６、９５１～９５６は、この転送制御信号により、複数の入力のうちのどの入力を選択して出力するかを設定可能となっている。本実施形態では、転送制御信号により、第１の転送モードが設定されている時と第２の転送モードが設定されている時とで、セレクタ９４１～９４６、９５１～９５６の入出力関係を切り替えることが可能である。 Each of the selectors 941 to 946 and 951 to 956 selects one of its connected inputs and outputs it. The transfer control signal from the transfer control unit 803 is connected to the selectors 941 to 946 and 951 to 956, and this signal sets which of the inputs each selector selects and outputs. In this embodiment, the transfer control signal makes it possible to switch the input/output relationship of the selectors 941 to 946 and 951 to 956 between the case where the first transfer mode is set and the case where the second transfer mode is set.

本実施形態では、転送制御部８０３は、セレクタ９４１～９４６、９５１～９５６に対して、以下の表に示す入出力間の接続になるように転送制御信号を出力する。以下の表では、並列化形状が異なる３つの場合について、セレクタ９４１～９４６、９５１～９５６が、転送制御信号に応じてどのような入出力接続状態になるかを示している。なお、形状制御部１０２によって制御されるセレクタ３３１～３３３の動作は第１の実施形態と同様である。 In this embodiment, the transfer control unit 803 outputs transfer control signals to the selectors 941 to 946 and 951 to 956 so that the input and output are connected as shown in the table below. The table below shows how the selectors 941 to 946 and 951 to 956 become input/output connection states in accordance with the transfer control signals for three different parallelization configurations. The operations of the selectors 331 to 333 controlled by the shape control section 102 are the same as in the first embodiment.

・並列化形状を１×８とするとき

・When the parallelization shape is 1×8

この表では、例えば並列化形状を１×８とするときの第１の転送モードでは、セレクタ９４１は、レジスタ３０２の出力を選択して出力することを示している。同様に、セレクタ９４２は、レジスタ３０３の出力を選択して出力することを示している。なお、セレクタ９４２に対して直接的には、並列化形状を１×８とするときにレジスタ３０３の出力を選択し出力するセレクタ３３１が接続されているので、括弧内にセレクタ３３１を記載している。また、斜線は、どの入力を出力としてもよいことを示している。さらに、並列化形状を１×８とするときには演算グループが１つであり第２の転送モードは存在しないので、表中では×としている。 This table shows, for example, that in the first transfer mode with a parallelization shape of 1×8, the selector 941 selects and outputs the output of the register 302. Similarly, the selector 942 selects and outputs the output of the register 303. Note that what is directly connected to the selector 942 is the selector 331, which selects and outputs the output of the register 303 when the parallelization shape is 1×8, so the selector 331 is given in parentheses. A diagonal line indicates that any input may be selected as the output. Furthermore, when the parallelization shape is 1×8 there is only one operation group and the second transfer mode does not exist, so the corresponding entries are marked with × in the table.
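The behaviour of a selector driven by the transfer control signal can be modelled roughly as a multiplexer with a routing table. The sketch below is hypothetical: the class `Selector`, the mode constants, and the signal names are illustrative, and the routing table contains only the two entries stated in the paragraph above (selectors 941 and 942 in the first transfer mode with a 1×8 shape). All other table entries are deliberately left out rather than guessed.

```python
FIRST_MODE, SECOND_MODE = 1, 2


class Selector:
    """A multiplexer that forwards exactly one of its named inputs."""

    def __init__(self, name, routes):
        self.name = name
        # Maps (shape, transfer_mode) to the name of the input to forward.
        self.routes = routes

    def select(self, shape, transfer_mode, signals):
        try:
            source = self.routes[(shape, transfer_mode)]
        except KeyError:
            raise ValueError(
                f"selector {self.name}: no route for shape {shape}, mode {transfer_mode}"
            ) from None
        return signals[source]


# Only the entries stated in the text; "register_303" reaches 942 via selector 331.
selector_941 = Selector("941", {((1, 8), FIRST_MODE): "register_302"})
selector_942 = Selector("942", {((1, 8), FIRST_MODE): "register_303"})
```

Requesting the second transfer mode for the 1×8 shape raises an error in this model, mirroring the × entries in the table.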

・並列化形状を２×４とするとき

・When the parallelization shape is 2×4

この表で、セレクタ９４５、９４６の出力が画素データ１、２とあるのは、画像データ格納部８０５の出力を選択していることを示す。 In this table, the output of the selectors 945 and 946 being pixel data 1 and 2 indicates that the output of the image data storage unit 805 is selected.

・並列化形状を４×２とするとき

・When the parallelization shape is 4×2

次に、演算処理部８０６の動作について説明する。以下に説明する演算処理部８０６の動作は、転送制御部８０３から入力される転送制御信号Ｓ４によって制御される。この動作が行われる前に、第１の実施形態と同様に、演算処理部８０６内におけるレジスタ３０１～３０８、３１１～３１８の接続状態は、所望の並列化形状に従って決定されているものとする。 Next, the operation of the arithmetic processing unit 806 will be described. The operation described below is controlled by the transfer control signal S4 input from the transfer control unit 803. As in the first embodiment, it is assumed that the connection states of the registers 301 to 308 and 311 to 318 in the arithmetic processing unit 806 have been determined according to the desired parallelization shape before this operation is performed.

第２の実施形態においても、第１の実施形態と同様に第１の転送モードでの画素データの転送と第２の転送モードでの画素データの転送とを繰り返すことで、フィルタ演算が実行される。第１の転送モードの時には、レジスタ３０１～３０８、３１１～３１８の接続状態は、第１の実施形態と同様であり動作も同じである。つまり、第１の転送モードでは、転送される画素データが演算グループ内の積和演算器での演算にのみ使用されるように、レジスタ間で画素データが転送される。 In the second embodiment, as in the first embodiment, the filter operation is executed by repeating the transfer of pixel data in the first transfer mode and the transfer of pixel data in the second transfer mode. In the first transfer mode, the connection states of the registers 301 to 308 and 311 to 318 are the same as in the first embodiment, and so is the operation. That is, in the first transfer mode, pixel data is transferred between registers so that the transferred pixel data is used only for operations by the sum-of-products calculators within the same operation group.

第１の実施形態と異なるのは、第２の転送モードの場合である。第１の実施形態においては、第１の転送モードと第２の転送モードとでレジスタ間の接続状態は変更しない。そのため、第２の転送モードで行われる演算グループ間を跨る画素データの転送を実現するために、複数のクロックサイクルに亘って画素データの転送を行う必要があった。それに対して、第２の実施形態では、第１の転送モードと第２の転送モードとでレジスタ間の接続状態を変更することにより、第２の転送モードで行われる画素データの転送時間を短縮する。 The difference from the first embodiment lies in the second transfer mode. In the first embodiment, the connection state between registers does not change between the first transfer mode and the second transfer mode. Therefore, the transfer of pixel data across operation groups performed in the second transfer mode had to be carried out over a plurality of clock cycles. In contrast, the second embodiment changes the connection state between registers between the first transfer mode and the second transfer mode, thereby shortening the pixel data transfer time in the second transfer mode.

第２の実施形態における一連の動作を、図１０に示すフローチャートと、図１１及び図１２に示すタイムチャートとを参照して説明する。図１０は、第２の実施形態における画像処理装置の演算処理例を示すフローチャートである。図１１は、並列化形状を２×４とした場合のフィルタ演算の演算処理例を示すタイムチャートであり、図１２は、並列化形状を４×２とした場合のフィルタ演算の演算処理例を示すタイムチャートである。 A series of operations in the second embodiment will be described with reference to the flowchart shown in FIG. 10 and the time charts shown in FIGS. 11 and 12. FIG. 10 is a flowchart showing an example of the arithmetic processing of the image processing apparatus according to the second embodiment. FIG. 11 is a time chart showing an example of the filter operation processing when the parallelization shape is 2×4, and FIG. 12 is a time chart showing an example of the filter operation processing when the parallelization shape is 4×2.

最初に、図１０及び図１１を参照して、並列化形状を２×４とした場合の動作について説明する。第１の実施形態と同様に、形状制御部１０２による並列化形状の決定は、図１０に示す並列演算処理の開始に先だって行われている。ステップＳ１００１～Ｓ１００３にて、画像処理装置８０１は、演算処理部８０６が有する積和演算器での並列演算（フィルタ演算）のための準備（画素データの用意）を行い、並列演算で使用するレジスタ群に画像データ格納部８０５から画素データを供給する。 First, with reference to FIGS. 10 and 11, the operation when the parallelized shape is 2×4 will be described. As in the first embodiment, determination of the parallelized shape by the shape control unit 102 is performed prior to the start of the parallel arithmetic processing shown in FIG. In steps S1001 to S1003, the image processing apparatus 801 makes preparations (preparing pixel data) for parallel computations (filter computations) in the sum-of-products calculator of the computation processing unit 806, and registers to be used in the parallel computations. Pixel data is supplied from the image data storage unit 805 to the group.

まずステップＳ１００１では、転送制御部８０３は、第２の転送モードに設定し、画像データ格納部８０５から演算対象画像の水平方向に６画素の画素データを読み出して、演算処理部８０６に転送するよう指示する（図１１の時刻ｔ０）。図１１において、時刻ｔ０での画像データ格納部８０５からは、画素データＤ００～Ｄ０５がパラレルに出力されている（図１１においてはＤ０＊として表記）。第２の転送モードでのレジスタ間接続状態になっているので、この次のサイクル（図１１の時刻ｔ１）では、演算処理部８０６内のレジスタ３０５～３０８、３１７、３１８には、演算対象の画素データＤ００～Ｄ０５が格納された状態になる。 First, in step S1001, the transfer control unit 803 sets the second transfer mode and issues an instruction to read the pixel data of six horizontally adjacent pixels of the target image from the image data storage unit 805 and transfer it to the arithmetic processing unit 806 (time t0 in FIG. 11). In FIG. 11, the pixel data D00 to D05 are output in parallel from the image data storage unit 805 at time t0 (denoted as D0* in FIG. 11). Because the registers are connected in the second-transfer-mode configuration, in the next cycle (time t1 in FIG. 11) the registers 305 to 308, 317, and 318 in the arithmetic processing unit 806 hold the target pixel data D00 to D05.

続いて、ステップＳ１００２では、転送制御部８０３は、演算処理部８０６において並列演算に使用するレジスタ群に演算対象の画素データが格納されているか否か、つまり、フィルタ演算の準備が完了しているか否かを判定する。この時点では、演算処理部８０６内のレジスタ３０１～３０４、３１３、３１４には画素データが格納されていないので、フィルタ演算の準備が完了していないと転送制御部８０３が判定し（ＮＯ）、ステップＳ１００３に進む。 Subsequently, in step S1002, the transfer control unit 803 determines whether the target pixel data is stored in the group of registers used for the parallel computation in the arithmetic processing unit 806, that is, whether preparation for the filter operation is complete. At this point, the registers 301 to 304, 313, and 314 in the arithmetic processing unit 806 hold no pixel data, so the transfer control unit 803 determines that preparation for the filter operation is not complete (NO), and the process proceeds to step S1003.

ステップＳ１００３では、転送制御部８０３は、演算処理部８０６内のレジスタに対して、第１の転送モードでシフト処理（レジスタ間の画素データ転送）を行うように指示する（図１１の時刻ｔ２～ｔ３）。 In step S1003, the transfer control unit 803 instructs the registers in the arithmetic processing unit 806 to perform shift processing (pixel data transfer between registers) in the first transfer mode (from time t2 in FIG. 11). t3).

さらにステップＳ１００１に戻って、転送制御部８０３は、第２の転送モードで、画像データ格納部８０５から読み出された画素データＤ１０～Ｄ１５をレジスタ３０５～３０８、３１７、３１８に転送する。また、それと同時に、転送制御部８０３は、演算処理部８０６内のレジスタ間での画素データの転送を行うように指示する（図１１の時刻ｔ４）。これにより、このサイクル（図１１の時刻ｔ４）では、演算処理部８０６において演算に使用されるレジスタには、演算対象の画素データが格納された状態になり、図１１に示す時刻ｔ４から積和演算器３２１～３２８による積和演算が実行される。 Further, returning to step S1001, the transfer control unit 803 transfers the pixel data D10 to D15 read from the image data storage unit 805 to the registers 305 to 308, 317 and 318 in the second transfer mode. At the same time, the transfer control unit 803 instructs to transfer pixel data between registers in the arithmetic processing unit 806 (time t4 in FIG. 11). As a result, in this cycle (time t4 in FIG. 11), the registers used for calculation in the arithmetic processing unit 806 store the pixel data to be calculated, and from time t4 shown in FIG. A sum-of-products operation is performed by computing units 321-328.

このとき、ステップＳ１００２において、フィルタ演算の準備が完了したと転送制御部８０３が判定し（ＹＥＳ）、ステップＳ１００４に進む。ステップＳ１００４では、演算処理部８０６が、フィルタカーネルにおける水平方向の１行目のフィルタ係数（Ｗ００、Ｗ０１、Ｗ０２）と画素データとの積和演算を行う（図１１の時刻ｔ４～ｔ６）。また、積和演算と並行して、第１の転送モードでレジスタ間の画素データ転送が行われる（図１１の時刻ｔ５～ｔ６）。 At this time, in step S1002, the transfer control unit 803 determines that preparation for filter calculation is completed (YES), and the process proceeds to step S1004. In step S1004, the arithmetic processing unit 806 performs a sum-of-products operation between the filter coefficients (W00, W01, W02) of the first row in the horizontal direction in the filter kernel and the pixel data (time t4 to t6 in FIG. 11). In parallel with the sum-of-products operation, pixel data transfer between registers is performed in the first transfer mode (time t5 to t6 in FIG. 11).

続いて、ステップＳ１００５では、転送制御部８０３は、並列演算（フィルタ演算）の終了条件を確認し、並列演算（フィルタ演算）が終了したか否かを判定する。この時点では、まだフィルタカーネルの水平方向の１行目と画素データとの積和演算を行っただけであるので、並列演算（フィルタ演算）が終了していないと転送制御部８０３が判定し（ＮＯ）、ステップＳ１００６に進む。 Subsequently, in step S1005, the transfer control unit 803 confirms conditions for ending the parallel computation (filter computation), and determines whether or not the parallel computation (filter computation) has ended. At this point, the transfer control unit 803 determines that the parallel calculation (filter calculation) has not ended because the sum-of-products calculation of the first horizontal row of the filter kernel and the pixel data has only been performed ( NO), the process proceeds to step S1006.

ステップＳ１００６では、転送制御部８０３は、演算処理部８０６に対して第２の転送モードでレジスタ間の画素データ転送を行うように指示する（図１１の時刻ｔ７）。このサイクル（図１１の時刻ｔ７）からフィルタカーネルの２行目について積和演算器３２１～３２８による積和演算が実行される。 In step S1006, the transfer control unit 803 instructs the arithmetic processing unit 806 to transfer pixel data between registers in the second transfer mode (time t7 in FIG. 11). From this cycle (time t7 in FIG. 11), sum-of-products calculations are performed by sum-of-products calculators 321 to 328 for the second row of the filter kernel.

その後、再度ステップＳ１００４の処理を行い、演算処理部８０６は、フィルタカーネルにおける水平方向の２行目のフィルタ係数（Ｗ１０、Ｗ１１、Ｗ１２）と画素データとの積和演算を行う（図１１の時刻ｔ７～ｔ９）。また、積和演算と並行して、第１の転送モードでレジスタ間の画素データ転送が行われる（図１１の時刻ｔ８～ｔ９）。以降、ステップＳ１００５で、転送制御部８０３が、並列演算（フィルタ演算）の終了と判定するまで、画像処理装置８０１は、ステップＳ１００４及びステップＳ１００６の処理を繰り返し実行する。 After that, the process of step S1004 is performed again, and the arithmetic processing unit 806 performs the sum-of-products operation between the filter coefficients (W10, W11, W12) of the second horizontal row of the filter kernel and the pixel data (times t7 to t9 in FIG. 11). In parallel with the sum-of-products operation, pixel data is transferred between registers in the first transfer mode (times t8 to t9 in FIG. 11). Thereafter, the image processing apparatus 801 repeats the processes of steps S1004 and S1006 until the transfer control unit 803 determines in step S1005 that the parallel computation (filter computation) has ended.

フィルタカーネルにおける水平方向の３行目のフィルタ係数（Ｗ２０、Ｗ２１、Ｗ２２）と画素データとの積和演算が行われた後のステップＳ１００５で、転送制御部８０３は、並列演算（フィルタ演算）の終了と判定し（ＹＥＳ）、処理が終了する。並列演算（フィルタ演算）の終了時には、演算処理部８０６内の積和演算器３２１～３２８からは、第１の実施形態と同様の演算結果が得られる。 In step S1005 after the product-sum operation of the filter coefficients (W20, W21, W22) on the third horizontal line in the filter kernel and the pixel data, the transfer control unit 803 performs parallel operation (filter operation). It is determined to be finished (YES), and the process ends. At the end of the parallel computation (filter computation), the sum-of-products calculators 321 to 328 in the computation processing unit 806 provide computation results similar to those of the first embodiment.
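The order in which the steps of FIG. 10 are visited can be traced with a short sketch. The function `trace_flow` is hypothetical, and it assumes, based on the walk-through above, that the preparation is complete once one image row has been loaded for each operation group and that the computation ends after one pass of step S1004 per kernel row.

```python
def trace_flow(num_groups, kernel_rows):
    """Return the sequence of flowchart steps for one block of parallel computation."""
    steps = []
    loaded = 0
    while True:
        steps.append("S1001")            # second transfer mode: load one image row
        loaded += 1
        if loaded == num_groups:         # S1002: every operation group holds its row
            steps.append("S1002:YES")
            break
        steps.append("S1002:NO")
        steps.append("S1003")            # first transfer mode: shift within each group
    done = 0
    while True:
        steps.append("S1004")            # sum-of-products with one kernel row
        done += 1
        if done == kernel_rows:          # S1005: all kernel rows processed
            steps.append("S1005:YES")
            break
        steps.append("S1005:NO")
        steps.append("S1006")            # second transfer mode: move rows between groups
    return steps
```

Calling `trace_flow(2, 3)` reproduces the 2×4 sequence described above: two passes of step S1001 separated by one pass of step S1003, followed by three passes of step S1004 separated by step S1006.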

次に、図１０及び図１２を参照して、並列化形状を４×２とした場合の動作について説明する。並列化形状を２×４とした場合と同様に、形状制御部１０２による並列化形状の決定は、図１０のフローチャートに示す並列演算処理の開始に先だって行われている。 Next, with reference to FIGS. 10 and 12, the operation when the parallelized shape is 4×2 will be described. Similar to the case where the parallelization shape is 2×4, the parallelization shape is determined by the shape control unit 102 prior to the start of the parallel arithmetic processing shown in the flowchart of FIG.

まずステップＳ１００１では、転送制御部８０３は、第２の転送モードに設定し、画像データ格納部８０５から演算対象画像の水平方向に４画素の画素データを読み出して、演算処理部８０６に転送するよう指示する（図１２の時刻ｔ０）。図１２において、時刻ｔ０での画像データ格納部からは、画素データＤ００～Ｄ０３がパラレルに出力されている（図１２においてはＤ０＊として表記）。これにより、この次のサイクル（図１２の時刻ｔ１）では、演算処理部８０６内のレジスタ３０７、３０８、３１７、３１８には、演算対象の画素データＤ００～Ｄ０３が格納された状態になる。 First, in step S1001, the transfer control unit 803 sets the second transfer mode and issues an instruction to read the pixel data of four horizontally adjacent pixels of the target image from the image data storage unit 805 and transfer it to the arithmetic processing unit 806 (time t0 in FIG. 12). In FIG. 12, the pixel data D00 to D03 are output in parallel from the image data storage unit at time t0 (denoted as D0* in FIG. 12). As a result, in the next cycle (time t1 in FIG. 12), the registers 307, 308, 317, and 318 in the arithmetic processing unit 806 hold the target pixel data D00 to D03.

続いて、ステップＳ１００２では、転送制御部８０３は、フィルタ演算の準備が完了しているか否かを判定する。この時点では、演算処理部８０６内のレジスタ３０１～３０６、３１１～３１６には画素データが格納されていないので、フィルタ演算の準備が完了していないと転送制御部８０３が判定し（ＮＯ）、ステップＳ１００３に進む。ステップＳ１００３では、転送制御部８０３は、演算処理部８０６内のレジスタに対して、第１の転送モードでシフト処理（レジスタ間の画素データ転送）を行うように指示する（図１２の時刻ｔ２～ｔ３）。 Subsequently, in step S1002, the transfer control unit 803 determines whether or not preparations for filter calculation have been completed. At this time, the registers 301 to 306 and 311 to 316 in the arithmetic processing unit 806 do not store pixel data, so the transfer control unit 803 determines that the preparation for the filter operation is not completed (NO). The process proceeds to step S1003. In step S1003, the transfer control unit 803 instructs the registers in the arithmetic processing unit 806 to perform shift processing (pixel data transfer between registers) in the first transfer mode (from time t2 in FIG. 12). t3).

ステップＳ１００２で、転送制御部８０３が、フィルタ演算の準備完了と判定するまで、画像処理装置８０１は、ステップＳ１００１及びステップＳ１００３の処理を繰り返し実行する。ステップＳ１００１の処理は、図１２の時刻ｔ４、ｔ７、ｔ１０において実行され、ステップＳ１００３の処理は、図１２の時刻ｔ５～ｔ６、ｔ８～ｔ９において実行される。そして、図１２の時刻ｔ１０では、演算処理部８０６において演算に使用されるレジスタには、演算対象の画素データが格納された状態になり、図１２に示す時刻ｔ１０から積和演算器３２１～３２８による積和演算が実行される。 The image processing apparatus 801 repeats the processes of steps S1001 and S1003 until the transfer control unit 803 determines in step S1002 that preparation for the filter operation is complete. The process of step S1001 is executed at times t4, t7, and t10 in FIG. 12, and the process of step S1003 is executed at times t5 to t6 and t8 to t9 in FIG. 12. At time t10 in FIG. 12, the registers used for the computation in the arithmetic processing unit 806 hold the target pixel data, and the sum-of-products calculators 321 to 328 start the sum-of-products operation from time t10.

ステップＳ１００４では、演算処理部８０６が、フィルタカーネルにおける水平方向の１行目のフィルタ係数（Ｗ００、Ｗ０１、Ｗ０２）と画素データとの積和演算を行う（図１２の時刻ｔ１０～ｔ１２）。また、積和演算と並行して、第１の転送モードでレジスタ間の画素データ転送が行われる（図１２の時刻ｔ１１～ｔ１２）。 In step S1004, the arithmetic processing unit 806 performs a sum-of-products operation between the filter coefficients (W00, W01, W02) of the first row in the horizontal direction in the filter kernel and the pixel data (time t10 to t12 in FIG. 12). In parallel with the sum-of-products operation, pixel data transfer between registers is performed in the first transfer mode (time t11 to t12 in FIG. 12).

ステップＳ１００６では、転送制御部８０３は、演算処理部８０６に対して第２の転送モードでレジスタ間の画素データ転送を行うように指示する（図１２の時刻ｔ１３）。このサイクル（図１２の時刻ｔ１３）からフィルタカーネルの２行目について積和演算器３２１～３２８による積和演算が実行される。 In step S1006, the transfer control unit 803 instructs the arithmetic processing unit 806 to transfer pixel data between registers in the second transfer mode (time t13 in FIG. 12). From this cycle (time t13 in FIG. 12), sum-of-products calculations are performed by sum-of-products calculators 321 to 328 for the second row of the filter kernel.

その後、再度ステップＳ１００４の処理を行い、演算処理部８０６は、フィルタカーネルにおける水平方向の２行目のフィルタ係数（Ｗ１０、Ｗ１１、Ｗ１２）と画素データとの積和演算を行う（図１２の時刻ｔ１３～ｔ１５）。また、積和演算と並行して、第１の転送モードでレジスタ間の画素データ転送が行われる（図１２の時刻ｔ１４～ｔ１５）。以降、ステップＳ１００５で、転送制御部８０３が、並列演算（フィルタ演算）の終了と判定するまで、画像処理装置８０１は、ステップＳ１００４及びステップＳ１００６の処理を繰り返し実行する。 After that, the process of step S1004 is performed again, and the arithmetic processing unit 806 performs the sum-of-products operation between the filter coefficients (W10, W11, W12) of the second horizontal row of the filter kernel and the pixel data (times t13 to t15 in FIG. 12). In parallel with the sum-of-products operation, pixel data is transferred between registers in the first transfer mode (times t14 to t15 in FIG. 12). Thereafter, the image processing apparatus 801 repeats the processes of steps S1004 and S1006 until the transfer control unit 803 determines in step S1005 that the parallel computation (filter computation) has ended.
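The time stamps quoted for FIGS. 11 and 12 fit a simple timing pattern, sketched below. This is an inferred model, not a formula given in the specification: the function `mac_row_start_times` is hypothetical, the generalisation to other kernel sizes is an assumption, and only the start times t4 and t7 (2×4) and t10 and t13 (4×2) are stated in the text, so the third-row values are extrapolations.

```python
def mac_row_start_times(num_groups, kernel_rows=3, kernel_cols=3):
    """Estimate the cycle at which the sum-of-products for each kernel row starts.

    Assumed pattern: the first load occupies t0 and its data is held at t1; each
    further group needs (kernel_cols - 1) shift cycles plus one load cycle; each
    kernel row then takes kernel_cols cycles, with the second-mode transfer
    overlapping the first cycle of the following row.
    """
    cycles_per_extra_load = (kernel_cols - 1) + 1
    first_row_start = 1 + cycles_per_extra_load * (num_groups - 1)
    return [first_row_start + row * kernel_cols for row in range(kernel_rows)]
```

Under these assumptions the model gives starts at t4, t7, and t10 for two operation groups (2×4) and at t10, t13, and t16 for four operation groups (4×2), so the 2×4 shape would finish earlier for a single block of eight output pixels.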

第２の実施形態によれば、第１の実施形態と同様に、演算対象画像のサイズにかかわらず、演算効率を低下させることなく、良好な演算効率でのフィルタ演算を実行することが可能となる。また、第２の実施形態では、第１の転送モードと第２の転送モードとでレジスタ間の接続状態を変更することにより、第２の転送モードで行われる画素データの転送時間を短縮することができる。したがって、第２の実施形態における画像処理装置では、第１の実施形態よりもフィルタ演算を高速に行うことが可能になり、フィルタ演算の演算効率を向上させることができる。 According to the second embodiment, as in the first embodiment, the filter operation can be executed with good efficiency, without loss of efficiency, regardless of the size of the target image. In addition, in the second embodiment, changing the connection state between registers between the first transfer mode and the second transfer mode shortens the pixel data transfer time in the second transfer mode. Therefore, the image processing apparatus of the second embodiment can perform the filter operation faster than that of the first embodiment, improving the efficiency of the filter operation.

（第３の実施形態） (Third embodiment)
次に、本発明の第３の実施形態について説明する。第１及び第２の実施形態では、形状制御部１０２によりフィルタ演算の演算対象領域の形状に応じて並列化形状を変更する例を示した。第１の実施形態において示したように、１次元の並列化形状よりも２次元の並列化形状とした方が高速にフィルタ演算処理できる場合がある。このような場合には、並列化形状を変更しなくても、２次元の並列化形状を実現するだけで、１次元の並列化形状とする場合よりも高速にフィルタ演算処理を行うことが可能である。 Next, a third embodiment of the present invention will be described. The first and second embodiments showed examples in which the shape control unit 102 changes the parallelization shape according to the shape of the target region of the filter operation. As shown in the first embodiment, a two-dimensional parallelization shape can in some cases process the filter operation faster than a one-dimensional parallelization shape. In such cases, simply realizing a two-dimensional parallelization shape, without changing it, makes the filter operation processing faster than with a one-dimensional parallelization shape.

第３の実施形態では、本発明を適用して単純に２次元の並列化形状を実現する場合について説明する。つまり、第３の実施形態では、並列化形状を２次元の固定された形状とし、フィルタ演算を行う例を示す。そのため、第３の実施形態における画像処理装置では、複数の並列化形状を切り替えるために必要となる形状制御部は不要である。 In the third embodiment, a case in which the present invention is applied to simply implement a two-dimensional parallelized shape will be described. In other words, in the third embodiment, an example in which a parallelized shape is a two-dimensional fixed shape and a filter operation is performed will be described. Therefore, the image processing apparatus according to the third embodiment does not require a shape control section that is necessary for switching between a plurality of parallelized shapes.

図１３は、第３の実施形態における画像処理装置１３０１の構成例を示すブロック図である。前述したように本実施形態では、並列化形状は固定であるため、画像処理装置１３０１は、複数の並列化形状を切り替えるための形状制御部は有していない。図１３において、図１に示した構成要素と同一の機能を有する構成要素には同一の符号を付し、重複する説明は省略する。画像処理装置１３０１は、転送制御部１３０３、フィルタ係数格納部１０４、画像データ格納部１０５、及び演算処理部１３０６を有する。 FIG. 13 is a block diagram showing a configuration example of an image processing apparatus 1301 according to the third embodiment. As described above, in this embodiment, parallelized shapes are fixed, so the image processing apparatus 1301 does not have a shape control unit for switching between a plurality of parallelized shapes. In FIG. 13, constituent elements having the same functions as the constituent elements shown in FIG. 1 are denoted by the same reference numerals, and overlapping descriptions are omitted. The image processing device 1301 has a transfer control unit 1303 , a filter coefficient storage unit 104 , an image data storage unit 105 and an arithmetic processing unit 1306 .

転送制御部１３０３は、フィルタ係数格納部１０４及び画像データ格納部１０５からフィルタ係数Ｓ２及び画素データＳ３をそれぞれ読み出して演算処理部１３０６に供給する制御を行う。また、転送制御部１３０３は、転送制御信号Ｓ４を演算処理部１３０６に出力することにより、演算処理部１３０６内での画素データの転送の制御を行う。ここで、本実施形態では並列化形状は固定であるので、転送制御部１３０３も、所定の並列化形状に適したデータ転送制御を行う。例えば、並列化形状が２×４であれば、第１又は第２の実施形態において説明した転送制御部１０３又は８０３の動作のうち、並列化形状を２×４とした場合と同様の動作を行う。 The transfer control unit 1303 performs control to read the filter coefficients S2 and the pixel data S3 from the filter coefficient storage unit 104 and the image data storage unit 105, respectively, and supply them to the arithmetic processing unit 1306. The transfer control unit 1303 also controls the transfer of pixel data within the arithmetic processing unit 1306 by outputting the transfer control signal S4 to the arithmetic processing unit 1306. Since the parallelization shape is fixed in this embodiment, the transfer control unit 1303 performs data transfer control suited to that predetermined shape. For example, if the parallelization shape is 2×4, it operates in the same way as the transfer control unit 103 or 803 described in the first or second embodiment operates for a 2×4 parallelization shape.

演算処理部１３０６は、並列して演算を実行可能な複数の演算器を有し、入力されるフィルタ係数Ｓ２と画素データＳ３とを用いてフィルタ演算を行う。ここで、本実施形態では並列化形状は固定であるので、演算処理部１３０６も、所定の並列化形状とした場合の動作を行う。したがって、図３や図９に示したセレクタ３３１～３３３は不要となり、代わりに所定の並列化形状としたときのセレクタの入出力接続を実現するようにレジスタ間を接続すればよい。 The arithmetic processing unit 1306 has a plurality of arithmetic units capable of executing arithmetic operations in parallel, and performs filter arithmetic using the input filter coefficient S2 and pixel data S3. Here, since the parallelization shape is fixed in this embodiment, the arithmetic processing unit 1306 also performs the operation when the predetermined parallelization shape is used. Therefore, the selectors 331 to 333 shown in FIGS. 3 and 9 are not required, and instead, the registers may be connected so as to realize the input/output connection of the selectors in a predetermined parallel configuration.

このように画像処理装置１３０１を構成することで、２次元の並列化形状を有するフィルタ演算処理を実現することが可能となる。 By configuring the image processing apparatus 1301 in this way, it becomes possible to realize filter operation processing having a two-dimensional parallelized shape.

（本発明の他の実施形態） (Other embodiments of the present invention)
本発明は、前述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読み出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

なお、前記実施形態は、何れも本発明を実施するにあたっての具体化のほんの一例を示したものに過ぎず、これらによって本発明の技術的範囲が限定的に解釈されてはならないものである。すなわち、本発明はその技術思想、又はその主要な特徴から逸脱することなく、様々な形で実施することができる。 It should be noted that the above-described embodiments are merely examples of specific implementations of the present invention, and the technical scope of the present invention should not be construed to be limited by these. That is, the present invention can be embodied in various forms without departing from its technical concept or main features.

101, 801, 1301: image processing device
102: shape control unit
103, 803, 1303: transfer control unit
104: filter coefficient storage unit
105, 805: image data storage unit
106, 806, 1306: arithmetic processing unit
301 to 308, 311 to 318: registers
321 to 328: sum-of-products calculators
331 to 333: selectors
941 to 946, 951 to 956: selectors

Claims

1. An image processing apparatus for performing filter operation processing by scanning a filter kernel over pixel data of an image stored in an image storage means, the apparatus comprising:
a plurality of temporary storage means for temporarily storing a plurality of pixel data read from the image storage means, the connection state of the temporary storage means being controlled according to the arrangement of pixels on which the filter operation processing is performed in parallel;
a plurality of arithmetic means for performing the filter operation processing in parallel using the filter coefficients in the filter kernel and the plurality of pixel data stored in the plurality of temporary storage means, the arithmetic means being grouped into one or more operation groups according to the arrangement of pixels on which the filter operation processing is performed in parallel; and
a transfer control means for controlling transfer, between the plurality of temporary storage means, of the pixel data used for the filter operation processing such that, when the plurality of arithmetic means are grouped into a plurality of operation groups, the pixel data is transferred in a first transfer mode so as to be used by the arithmetic means belonging to the same operation group, and is transferred in a second transfer mode so as to be used by the arithmetic means belonging to another operation group.

2. The image processing apparatus according to claim 1, wherein the transfer control means changes the connection state between the plurality of temporary storage means between the first transfer mode and the second transfer mode.

3. The image processing apparatus according to claim 1, wherein the transfer control means controls the transfer of the pixel data in the first transfer mode while the filter operation processing for one row in the filter kernel is being performed, and controls the transfer of the pixel data in the second transfer mode when the filter operation processing for one row in the filter kernel is completed.

4. The image processing apparatus according to any one of claims 1 […], wherein, in the second transfer mode, the transfer control means transfers the plurality of pixel data that were used for the filter operation processing for one row in the filter kernel in a first operation group so that the plurality of pixel data are used for the filter operation processing for another row in the filter kernel in a second operation group different from the first operation group.

5. The image processing apparatus according to any one of claims 1 to 4, further comprising connection control means for controlling the connection state of the plurality of temporary storage means according to the arrangement of pixels on which the filter operation processing is performed in parallel.

6. The image processing apparatus according to any one of claims 1 to 5, further comprising grouping means for grouping the plurality of arithmetic means into one or more operation groups according to the arrangement of pixels on which the filter operation processing is performed in parallel.

7. The image processing apparatus according to claim 6, wherein each of the operation groups is composed of the arithmetic means that perform the filter operation processing in parallel on pixels in the same row.

8. An image processing method for performing filter operation processing by scanning a filter kernel over pixel data of an image stored in an image storage means, the method comprising:
a storage step of storing a plurality of pixel data read from the image storage means in a plurality of temporary storage means whose connection state is controlled according to the arrangement of pixels on which the filter operation processing is performed in parallel;
an operation step of performing the filter operation processing, using the filter coefficients in the filter kernel and the plurality of pixel data stored in the plurality of temporary storage means, in parallel by a plurality of arithmetic means grouped into one or more operation groups according to the arrangement of pixels on which the filter operation processing is performed in parallel; and
a transfer control step of controlling transfer, between the plurality of temporary storage means, of the pixel data used for the filter operation processing such that, when the plurality of arithmetic means are grouped into a plurality of operation groups, the pixel data is transferred in a first transfer mode so as to be used by the arithmetic means belonging to the same operation group, and is transferred in a second transfer mode so as to be used by the arithmetic means belonging to another operation group.