JP2009295163A

JP2009295163A - Image processing device and image processing method

Info

Publication number: JP2009295163A
Application number: JP2009134326A
Authority: JP
Inventors: Yusuke Suzuki; 裕介鈴木
Original assignee: Toshiba Corp; Toshiba TEC Corp
Current assignee: Toshiba Corp; Toshiba TEC Corp
Priority date: 2008-06-05
Filing date: 2009-06-03
Publication date: 2009-12-17
Anticipated expiration: 2029-06-03
Also published as: JP5112386B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device and an image processing method that easily improve the processing efficiency of image processing having dependencies between pixels.

SOLUTION: The image processing device includes a shared memory 3 for storing an image, a plurality of processors 6 that are connected to the shared memory and execute image processing, and a controller 5 that divides the image stored in the shared memory into a plurality of regions, assigns each divided region to an arbitrary processor, and causes the assigned processors to operate in parallel while sequentially shifting their processing start timings. The processing region handled by a preceding processor, which starts operating first, includes the pixels that satisfy the inter-pixel dependency of the pixel first processed by a following processor, which starts operating next.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image processing apparatus having a plurality of general-purpose processor cores, and in particular to a technique for speeding up image processing of the type in which the processing result of a pixel depends on other, adjacent pixels.

Conventionally, when a large amount of image processing had to be performed in real time, dedicated hardware such as an ASIC (Application Specific Integrated Circuit) was used, with the image processing algorithm fixed in that hardware. In recent years, image processing techniques have become known that use SIMD (Single Instruction Multiple Data) arithmetic units together with hardware that performs operations between the slots of a SIMD data register (that is, between adjacent pixels). With these techniques, parameters and algorithms can be changed flexibly by rewriting the programs that run on the hardware.

The technique described in Patent Document 1 is known as a disclosure of such SIMD-type technology. It decomposes error-diffusion-type image processing into the processing of pixel sequences in two directions, horizontal and vertical, and achieves high-speed processing by handling the vertical direction with a SIMD arithmetic unit and the horizontal direction with a sequential-processing circuit.

However, implementing the technique described in Patent Document 1 requires dedicated hardware. Moreover, simply adding more of that dedicated hardware does not necessarily raise processing efficiency, so any further performance improvement requires new dedicated hardware. In other words, the technique described in Patent Document 1 makes it difficult to improve processing efficiency easily.

The present invention has been made in view of these circumstances, and its object is to provide an image processing apparatus and an image processing method that can easily improve the processing efficiency of image processing in which there are dependencies between pixels.

To solve the above problem, the present invention is an image processing apparatus comprising: a shared memory that stores an image; a plurality of processors that are connected to the shared memory and execute image processing; and a controller that divides the image stored in the shared memory into a plurality of regions, assigns each divided region to an arbitrary processor, and causes the assigned processors to process in parallel while sequentially shifting their processing start timings, wherein the processing region handled by a preceding processor, which starts operating first, includes the pixels that satisfy the inter-pixel dependency of the pixel first processed by a following processor, which starts operating next.

The present invention is also an image processing apparatus comprising: a shared memory that stores an image; a plurality of processors that are connected to the shared memory and execute image processing; and a controller that divides the image stored in the shared memory into a plurality of regions, assigns each divided region to an arbitrary processor, sets a plurality of groups each consisting of at least a plurality of processors, starts the processors belonging to one group simultaneously, and causes the groups to process in parallel while sequentially shifting the processing start timing of each group, wherein the processing regions handled by the processors belonging to a preceding group, which starts operating first, include the pixels that satisfy the inter-pixel dependencies of the pixels first processed by all the processors belonging to a following group, which starts operating next.

The present invention is also an image processing system in which a plurality of the image processing apparatuses described above are interconnected via a communication path, wherein the controller of one of the image processing apparatuses transmits some of the divided processing regions to another image processing apparatus for processing, and controls the exchange of timing signals with that other image processing apparatus for starting the image processing in accordance with the pixel dependencies.

The present invention is also an image processing method for an image processing apparatus that includes a shared memory that stores an image and a plurality of processors that are connected to the shared memory and execute image processing, the method comprising: dividing the image stored in the shared memory into a plurality of regions; assigning each divided region to an arbitrary processor; and causing the assigned processors to process in parallel while sequentially shifting their processing start timings, wherein the processing region handled by a preceding processor, which starts operating first, includes the pixels that satisfy the inter-pixel dependency of the pixel first processed by a following processor, which starts operating next.

The present invention is also an image processing method for an image processing apparatus that includes a shared memory that stores an image and a plurality of processors that are connected to the shared memory and execute image processing, the method comprising: dividing the image stored in the shared memory into a plurality of regions; assigning each divided region to an arbitrary processor; setting a plurality of groups each consisting of at least a plurality of processors; starting the processors belonging to one group simultaneously; and causing the groups to process in parallel while sequentially shifting the processing start timing of each group, wherein the processing regions handled by the processors belonging to a preceding group, which starts operating first, include the pixels that satisfy the inter-pixel dependencies of the pixels first processed by all the processors belonging to a following group, which starts operating next.

The present invention is also an image processing method for an image processing system in which a plurality of the image processing apparatuses described above are interconnected via a communication path, wherein one of the image processing apparatuses transmits some of the divided processing regions to another image processing apparatus for processing, and controls the exchange of timing signals with that other image processing apparatus for starting the image processing in accordance with the pixel dependencies.

According to the image processing apparatus and the image processing method of the present invention, the processing efficiency of image processing in which there are dependencies between pixels can be improved easily.

FIG. 1 is a diagram illustrating the image processing apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an image processing system to which the image processing apparatus of the first embodiment is applied.
FIG. 3 is a diagram explaining an example of image processing in which the processing of a target pixel depends on the processing results of other pixels.
FIG. 4 is a diagram showing the processing region of each processor.
FIG. 5 is a diagram showing the synchronization timing between cores.
FIG. 6 is a diagram showing the synchronization timing between cores.
FIG. 7 is a time chart showing the operation of the general-purpose processor cores.
FIG. 8 is a diagram showing the allocation of general-purpose processor cores when the image area is large.
FIG. 9 is a flowchart outlining the processing of the control processor core.
FIG. 10 is a flowchart outlining the processing of a general-purpose processor core.
FIG. 11 is a diagram showing the regions allocated to the general-purpose processor cores.
FIG. 12 is a diagram showing the regions processed by the two processor cores that execute image processing first.
FIG. 13 is a diagram showing the regions processed by the two processor cores that execute image processing next.
FIG. 14 is a diagram showing the timing of the synchronization processing of the general-purpose processor cores.
FIG. 15 is a diagram showing the synchronization timing between general-purpose processor cores in detail.
FIG. 16 is a diagram showing the image data processed by a general-purpose processor core and the processed data stored in the shared memory.
FIG. 17 is a flowchart outlining the processing of the control processor core.
FIG. 18 is a flowchart outlining the processing of a general-purpose processor core.
FIG. 19 is a time chart showing the operation of the general-purpose processor cores.
FIG. 20 is a diagram showing the processing regions and the synchronization timing between general-purpose processor cores.
FIG. 21 is a diagram explaining the processing when one image processing apparatus is connected to the external bus.
FIG. 22 is a diagram explaining the processing when a plurality of image processing apparatuses are connected to the external bus.
FIG. 23 is a flowchart outlining the processing of the control processor core.
FIG. 24 is a flowchart outlining the processing of a general-purpose processor core.
FIG. 25 is a flowchart outlining the processing of a general-purpose processor core.

[First embodiment]
FIG. 1 is a diagram illustrating the image processing apparatus according to the first embodiment.
The image processing apparatus 1 includes an external interface 2 that exchanges data with an external device (not shown), a shared memory 3, a memory interface 4 that handles transfers to and from the shared memory, a control processor core 5, a plurality of general-purpose processor cores 6 that execute image processing, and a bus 7 that serves as the data communication path between these hardware components.

Each general-purpose processor core 6 contains a local memory 8 that can be accessed faster than the shared memory 3. Each general-purpose processor core 6 also includes an arithmetic unit (not shown) that executes image processing according to a program, and a DMA transfer unit (not shown) that handles data transfer between the local memory 8 and the shared memory 3 as well as the synchronization processing used to align operation start timing with the other general-purpose processor cores 6. The general-purpose processor cores 6 can each be controlled by a program and can run in parallel.

There are two ways for the general-purpose processor cores 6 to exchange data: sending and receiving directly between local memories 8, and sending and receiving via the shared memory 3. Either method can be used, but when a large amount of data such as image data is exchanged, communicating via the shared memory 3 allows high-speed processing.

As for the synchronization processing that controls start-up among the general-purpose processor cores, the time it requires grows exponentially as the number of cores being synchronized increases. For this reason, the image processing apparatus 1 needs to accomplish the required processing with as few synchronizations as possible.

The control processor core 5 supervises and controls the overall operation of the image processing apparatus 1. For example, the control processor core 5 sends the processed data generated by the general-purpose processor cores 6 to the outside via the external interface 2. It also receives instructions from the outside via the external interface 2 and controls the general-purpose processor cores 6 accordingly.

Although the image processing apparatus 1 shown in FIG. 1 includes the shared memory 3, the shared memory 3 may instead be an external device that can be connected as needed; the image processing apparatus does not always have to include it.

FIG. 2 is a diagram illustrating an image processing system to which the image processing apparatus of the first embodiment is applied. This image processing system is configured as part of an MFP (Multi Functional Peripheral: image forming apparatus) that provides fax, printer, copy, and scan functions.

That is, the image processing apparatus 1 is managed by the control means 10 on the MFP main body side as one of the hardware components that make up the MFP. The image processing apparatus 1 receives an image processing program and image data from the MFP main body via the external interface 2, performs image processing according to the program, and outputs the processed image back to the MFP main body.

Next, the flow of image processing is described with reference to FIG. 2.
First, the control means 10 on the MFP side retrieves the image processing program from the storage area 11 in the MFP and has it copied to the shared memory 3. Image processing starts when the control means 10 then issues instructions to the image processing apparatus 1 according to a predetermined procedure. At this point, the image data to be processed is stored in the storage area 11 of the MFP main body, as is the image processing program.

In the image processing apparatus 1, the control processor core 5 has each general-purpose processor core 6 load the image processing program. The control processor core 5 also acquires the image data from the storage area 11 on the MFP side and stores it in the shared memory 3.

When the control means 10 on the MFP main body side instructs the start of processing, the control processor core 5 instructs the general-purpose processor cores 6 to start image processing. Each general-purpose processor core 6 then executes image processing. The processed image data is stored temporarily in the shared memory 3. When image processing is complete, the image data stored in the shared memory 3 is transmitted to the storage area 11 on the MFP side and stored there.
In FIG. 2, the flow of image data is indicated by arrows. FIG. 2 also shows that the storage area 11 on the MFP side holds both unprocessed and processed image data.

The image processing executed by the image processing apparatus 1 is processing in which the result for one pixel is used when processing other pixels. That is, the target pixel is not processed independently of the results for other pixels; its processing depends on those results.

FIG. 3 is a diagram explaining an example of such image processing. In the image processing shown in FIG. 3, the calculation for a target pixel uses the image processing results of the pixels adjacent to it on the left, upper left, and above. The image processing apparatus 1 of the first embodiment achieves high-speed processing by executing image processing with such dependencies in parallel on a plurality of general-purpose processor cores 6. Examples of image processing with dependencies of this kind include error diffusion processing and dither processing.
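As a rough illustration of the dependency described for FIG. 3, the following Python sketch processes an image in raster order so that each pixel can use the results already computed for its left, upper-left, and upper neighbours. The `combine` function and the value 0 used for neighbours outside the image are assumptions made only for this example; the patent does not specify the actual per-pixel computation.

```python
def combine(value, left, upleft, up):
    # Hypothetical per-pixel rule, chosen only to make the dependency visible.
    return (value + left + upleft + up) % 256


def process_sequential(img):
    """Raster-order reference: every output pixel uses the already computed
    results of its left, upper-left, and upper neighbours."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Neighbours outside the image are treated as 0 (assumption).
            left = out[y][x - 1] if x > 0 else 0
            up = out[y - 1][x] if y > 0 else 0
            upleft = out[y - 1][x - 1] if x > 0 and y > 0 else 0
            out[y][x] = combine(img[y][x], left, upleft, up)
    return out
```

Because every result feeds into later pixels, the loop cannot simply be split into independent chunks, which is the difficulty the staggered parallel scheme in this document addresses.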

Next, the image processing method using the plurality of general-purpose processor cores 6 is described. FIG. 4 is a diagram showing the regions allocated to the respective general-purpose processor cores 6.

Because parallel execution of the image processing requires the calculation results of the pixels to the left, upper left, and above, as shown in FIG. 3, the image area is divided along diagonal boundaries as shown in FIG. 4, and a general-purpose processor core 6 is assigned to each resulting region to execute the image processing. FIG. 4 indicates which of the general-purpose processor cores 6 processes each region.
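The diagonal division can be sketched as a mapping from pixel coordinates to a region index. The sketch below assumes that each region boundary moves left by a fixed number of pixels per row, which is inferred from the later description of FIG. 6 (region 1 ends one block earlier in row 2 than in row 1); the function names and parameters are hypothetical. It also checks that, with this slant, the left, upper-left, and upper neighbours of a pixel never fall in a later region.

```python
def region_of(x, y, region_width, shift_per_row):
    """Index of the diagonal region containing pixel (x, y).

    Assumption: boundaries move left by `shift_per_row` pixels per row.
    """
    return (x + shift_per_row * y) // region_width


def depends_only_on_earlier_regions(width, height, region_width, shift_per_row):
    """True if no pixel needs a result from a higher-numbered region."""
    for y in range(height):
        for x in range(width):
            own = region_of(x, y, region_width, shift_per_row)
            # Left, upper-left, and upper neighbours, as in FIG. 3.
            for nx, ny in ((x - 1, y), (x - 1, y - 1), (x, y - 1)):
                if nx >= 0 and ny >= 0:
                    if region_of(nx, ny, region_width, shift_per_row) > own:
                        return False
    return True
```

With a non-negative shift, dependencies only ever point to the same region or to a region further left, so a core never has to wait for a core that starts after it.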

The general-purpose processor cores process their regions with their execution timings shifted in time by the synchronization scheme described below, which enables efficient parallel processing with little synchronization overhead.

The processing in each general-purpose processor core 6 and the synchronization between cores are now described in detail. FIG. 5 is an enlarged view of the regions allocated to the general-purpose processor cores 6 on the left side of the image area.

The first core to execute is general-purpose processor core (1), which is assigned the leftmost region in FIG. 5. Every other general-purpose processor core is in a synchronization-wait state, waiting for a synchronization signal from the core that processes the region immediately to the left of its own.

General-purpose processor core (1), now in the processing state, executes image processing from the leftmost pixel of its region toward the right. During this processing, the image data it needs is read from the shared memory 3 and transferred to the local memory 8. The transfer size of the image data is a size predetermined between the shared memory 3 and the local memory 8, or a multiple of that size.

Processing one row of a region requires the processing results of the previous row, so those results are retained.
The image processing of general-purpose processor core (1) proceeds continuously up to the pixel at which it synchronizes with general-purpose processor core (2), which processes the region to the right.

The release of synchronization with the adjacent region is described with reference to FIG. 6, which enlarges the first two rows of FIG. 5. In FIG. 6, each part labelled "A" represents a memory block, the unit of DMA transfer between the shared memory 3 and the local memory 8. In the notation A_{i,j}, the subscript i is the row number and the subscript j is the block number. Up(x), Upleft(x), and Left(x) denote the values of the upper, upper-left, and left pixels on which pixel x depends. The dotted lines in FIG. 6 mark the boundaries between the regions processed by the general-purpose processor cores.

The pixel at which general-purpose processor core (2) starts processing is the leftmost pixel of the first row (= row 1) of its region, denoted alm in FIG. 6. Consider the other pixels on which the image processing of this pixel depends. The pixels Up(alm) and Upleft(alm) would belong to the row above and therefore lie outside the image area. Accordingly, the only other pixel that affects the image processing of pixel alm is its left neighbour, Left(alm). That is, general-purpose processor core (2) can start processing once the processing result of pixel Left(alm) is available.

During image processing, the original image and the processed image are transferred to and from the shared memory in units sized for efficient transfer. For general-purpose processor core (2) to obtain the data it needs, the data of memory block A_{l,n}, which contains the left-adjacent pixel required for core (2) to start, must already have been written by general-purpose processor core (1).

In this embodiment, synchronization is released not when processing of the left-adjacent pixel completes but at a later time: after general-purpose processor core (1) has finished processing the pixel at the right end of memory block A_{2,n-1}, which is the last pixel of the next row, row 2. This provides a time margin between the operation of general-purpose processor core (1) and the start of general-purpose processor core (2), which ensures stable operation.
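The effect of the start delay can be checked with a small timing model. The sketch below is an idealisation, not the patent's implementation: it assumes every region row contains the same number of blocks, every block takes exactly one tick, each core processes its region in raster order, and region boundaries shift left by one block per row (so the block above local column c is local column c-1 of the previous row). All function names are hypothetical. A start delay of two rows corresponds to the release point described above.

```python
def finish_tick(core, row, col, blocks_per_row, start_delay):
    """Tick at which `core` finishes local block (row, col); one block per tick."""
    return core * start_delay + row * blocks_per_row + col + 1


def dependencies(core, row, col, blocks_per_row):
    """Blocks whose results are needed before block (row, col) can be processed."""
    deps = []
    # Same row to the left, and the two relevant blocks of the previous row.
    for drow, dcol in ((0, -1), (-1, -1), (-1, -2)):
        k, r, c = core, row + drow, col + dcol
        if c < 0:  # the block belongs to the region of the core on the left
            k, c = k - 1, c + blocks_per_row
        if r >= 0 and k >= 0:
            deps.append((k, r, c))
    return deps


def schedule_is_safe(cores, rows, blocks_per_row, start_delay):
    """True if every block starts only after all blocks it depends on finish."""
    assert blocks_per_row >= 2
    for k in range(cores):
        for r in range(rows):
            for c in range(blocks_per_row):
                start = finish_tick(k, r, c, blocks_per_row, start_delay) - 1
                for dep in dependencies(k, r, c, blocks_per_row):
                    if finish_tick(*dep, blocks_per_row, start_delay) > start:
                        return False
    return True
```

In this model a delay of one row is the minimum that satisfies every dependency, so the two-row delay leaves one full row of slack, which matches the stated purpose of the margin. The model says nothing about cores that run at different speeds or about regions clipped by the image edge.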

After this synchronization release, general-purpose processor core (1) performs no further synchronization and continues image processing through to the last pixel of its region. General-purpose processor core (2), like core (1), performs the synchronization release for the region to its right, toward general-purpose processor core (3), and then continues image processing through to the end of its region.
The remaining general-purpose processor cores 6 proceed in the same way, up to the last image region.

FIG. 7 is a time chart showing the operation of the general-purpose processor cores 6. It lays out on a time axis the internal processing of each general-purpose processor core, the data transfers to and from the shared memory, and the timing of the synchronization release toward the adjacent core. The description below refers to FIG. 7. It is assumed that, before the processing shown in FIG. 7, the processing programs have already been loaded onto the control processor core 5 and the general-purpose processor cores 6 that execute the processing.

First, the control processor core 5 notifies each general-purpose processor core 6 of the start of processing via the bus 7. After the notification, general-purpose processor core (1), which is assigned the leftmost region, starts image processing, and the other general-purpose processor cores 6 wait for a synchronization signal from the region to their left.

General-purpose processor core (1) first reads the data of block A_{1,1}, one DMA transfer unit in size, from the shared memory 3. When the read completes, core (1) processes the pixels of that block in order from the left while reading the data of the next block, A_{1,2}.

Next, core (1) processes block A_{1,2} in parallel with writing the finished data of the first block, A_{1,1}, to the shared memory 3 and reading the data of the next block to be processed, A_{1,3}.
In this way, general-purpose processor core (1) continues processing, overlapping image processing with reads from and writes to the shared memory 3, until it reaches the pixel at which synchronization is released. It then executes the synchronization processing with general-purpose processor core (2). After that, core (1) processes the rest of its region to the end without synchronizing with core (2) again.
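The overlap of reading, processing, and writing described above amounts to a three-stage pipeline over the blocks of a row. The sketch below only enumerates which block occupies each stage at each step; it does not perform real DMA transfers, and the function name and the one-step-per-stage timing are assumptions made for illustration.

```python
def pipeline_steps(n_blocks):
    """Return, for each time step, the block being (read, processed, written).

    Blocks are numbered from 1. None means the stage is idle in that step.
    """
    steps = []
    for step in range(1, n_blocks + 3):
        # The read stage runs one block ahead of processing,
        # and the write-back stage runs one block behind it.
        read = step if step <= n_blocks else None
        process = step - 1 if 1 <= step - 1 <= n_blocks else None
        write = step - 2 if 1 <= step - 2 <= n_blocks else None
        steps.append((read, process, write))
    return steps
```

For three blocks, the third step is the fully overlapped case from the text: block 3 is read while block 2 is processed and block 1 is written back.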

After executing synchronization processing with the general-purpose processor core (1), the general-purpose processor core (2) processes up to the block containing the last pixel of the second row and then executes synchronization processing with the general-purpose processor core (3). The same processing is subsequently performed for the general-purpose processor cores (3) and (4).

Thus, the processing assigned to each general-purpose processor core 6 consists of synchronization with the general-purpose processor core 6 that processes the left adjacent region, image processing of its own region, synchronization with the general-purpose processor core 6 that processes the right adjacent region, and image processing up to the end of its own region. These steps are executed in that order.
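
The staggered start can be sketched with threads standing in for the cores. This is a minimal illustration under stated assumptions: Python `threading.Event` objects play the role of the synchronization signals sent over the bus 7, the function name `run_staggered` is hypothetical, and the actual pixel processing is elided.

```python
import threading


def run_staggered(num_cores):
    """Simulate cores that start one after another via sync signals.

    Returns the order in which the cores began processing.
    """
    released = [threading.Event() for _ in range(num_cores)]
    start_order = []
    lock = threading.Lock()

    def core(i):
        if i > 0:
            released[i - 1].wait()   # wait for the left neighbour's signal
        with lock:
            start_order.append(i)
        # ... process up to the synchronization pixel here ...
        released[i].set()            # release the right neighbour
        # ... process the rest of the region without further sync ...

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    # Launch in reverse to show that the events, not the launch order,
    # determine when each core begins.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return start_order
```

Each core waits on exactly one neighbour and signals exactly one neighbour, which mirrors the pairwise synchronization described in the text.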

If the image area is large, as shown in FIG. 8, the image area is divided into more regions than there are general-purpose processor cores. The regions are assigned to the general-purpose processor cores in order and, once every core has been assigned a region, assignment resumes from the first core. This is repeated until the whole image has been processed.
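
The cyclic assignment of regions to cores amounts to a round-robin mapping. The following is a minimal sketch; the name `assign_regions` and the numbering (regions from 0, cores from 1 to match the labels (1) to (4)) are assumptions for illustration.

```python
def assign_regions(num_regions, num_cores):
    """Map each region index to a core number in round-robin order.

    Regions are numbered from 0 (leftmost); cores are numbered from 1.
    """
    if num_cores < 1:
        raise ValueError("at least one core is required")
    return [(region, region % num_cores + 1) for region in range(num_regions)]
```

For six regions and four cores, regions 4 and 5 wrap around to cores (1) and (2).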

Next, control of each unit and data handling in the image processing apparatus 1 will be described.
FIG. 9 is a flowchart outlining the processing of the image processing apparatus 1. The processing shown in FIG. 9 is executed by the control processor core 5.

In Act A01, the control processor core 5 receives a processing instruction from the outside. That is, it receives the type of the image data to be processed. The data type here indicates whether the entire image is received and processed at once or whether band data is received and processed sequentially. At this time, the control processor core 5 also receives a processing type indicating what kind of processing is to be performed on the image. In Act A02, it sends a program that executes the designated image processing to the general-purpose processor cores 6 that execute processing in the image processing apparatus.

In Act A03, the control processor core 5 receives the image data to be processed and determines its image type. If the received image data is the first or last band of band data, or a single image, then in Act A04 the data storage area for the region outside the image is initialized with the initial value corresponding to the processing type received earlier. Otherwise, that is, if the data is band data other than the first band, then in Act A05 part of the image stored in memory is copied to the outside region.

After this processing completes, the image data is received from the outside in Act A06. Subsequently, in Act A07, part of the received image data is copied to the shared memory 3 directly connected to the image processing apparatus. After the copy completes, each general-purpose processor core 6 is instructed to start processing in Act A08. The general-purpose processor cores 6 then perform image processing in parallel, and in Act A09 the control processor core 5 waits to receive the processing-end notifications sent from these general-purpose processor cores 6.
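
The start instruction of Act A08 and the wait of Act A09 follow a start-all, wait-for-all pattern. The sketch below is only an illustration under stated assumptions: Python threads stand in for the general-purpose processor cores, a `queue.Queue` stands in for the completion notifications, and the name `run_controller` is hypothetical.

```python
import queue
import threading


def run_controller(num_cores):
    """Start all workers, then block until each has reported completion."""
    done = queue.Queue()
    start = threading.Event()

    def worker(core_id):
        start.wait()       # wait for the start instruction (Act A08)
        # ... image processing of the assigned region would run here ...
        done.put(core_id)  # report the end of processing to the controller

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(1, num_cores + 1)]
    for t in threads:
        t.start()
    start.set()            # Act A08: instruct every core to start
    finished = sorted(done.get() for _ in range(num_cores))  # Act A09
    for t in threads:
        t.join()
    return finished
```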

After receiving the processing-end notifications, in Act A10 the control processor core 5 copies the image data from the shared memory 3, where the processing results are stored, to the transfer memory, and notifies the processing requester of the result for the band or the single image. In Act A12, it checks whether all bands have been processed. If the image type is band and processing is not complete (No in Act A12), the processing described so far is repeated. If all bands or the single image have been processed (Yes in Act A12), the image processing ends here.

FIG. 10 is a flowchart outlining the processing of the general-purpose processor core 6.
The processing executed by the general-purpose processor core 6 is specified by the processing requester. That is, the requester sends a program that executes the processing, and processing on the general-purpose processor core 6 starts when this program is executed.

In Act P01, the general-purpose processor core 6 receives the region of the image that it is to process. Subsequently, in Act P02, it receives the image data of the independent image region. Here, an independent image region is an image region that is to be processed and that does not use the processing results of an adjacent region.

Next, in Act P03, the general-purpose processor core 6 waits for a synchronization signal from the adjacent general-purpose processor core 6. That is, it waits until image processing of the pixel used for synchronization (the synchronization pixel) has completed in the adjacent region. When it receives the synchronization signal in Act P04, indicating that processing of the synchronization pixel has completed, the general-purpose processor core 6 starts processing in Act P05. In Act P06, the processing result is copied to the shared memory 3.

When processing of the synchronization pixel in its own region completes, in Act P07 the general-purpose processor core 6 sends a synchronization signal, indicating that processing up to the synchronization pixel has completed, to the general-purpose processor core 6 that processes the adjacent region. After this, in Acts P08 to P09, the general-purpose processor core 6 repeats processing and copying of the processing results up to the last pixel of its assigned region. When processing completes, it notifies the control processor core 5 of the completion in Act P10.

In this way, even image processing that is difficult to parallelize in a straightforward manner because of dependencies can be parallelized in a way suited to data transfer and general-purpose processor processing, by dividing the image diagonally and adopting processing control that takes the dependencies between pixels into account. Speed can therefore be increased without acceleration through additional hardware.

[Second Embodiment]
The image processing apparatus according to the second embodiment of the present invention differs from the first embodiment in its image processing method. Accordingly, parts identical to those in the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.

The parallel image processing method implemented by the image processing apparatus 1 according to the second embodiment will now be described. FIG. 11 is a diagram showing the regions assigned to the general-purpose processor cores 6. In this parallel image processing method, the regions processed by the general-purpose processor cores 6 are obtained by dividing the image in the vertical and horizontal directions. In the horizontal direction, a pixel region divided in units of lines (rows) is assigned to a general-purpose processor core 6. In the vertical direction, a vertical region plus the pixel region involved in the calculation of that vertical region is assigned to a general-purpose processor core 6. The general-purpose processor cores 6 then perform their calculations in parallel.

FIG. 12 is a diagram showing the regions processed by the two processor cores that execute image processing first. In this method, two processor cores always start processing at the same time.
The processing region of the general-purpose processor core (1) consists of the shaded region (1)+(2) and the right-hatched region (1) in FIG. 12. The processing region of the general-purpose processor core (2) likewise consists of the shaded region (1)+(2) and the left-hatched region (2). The region assigned to the general-purpose processor core (1), which processes the horizontally divided region, is one row of the image. The region assigned to the general-purpose processor core (2), which processes the vertically divided region, is a trapezoid whose upper side is twice the DMA transfer block size and whose lower side is the DMA transfer block size.

Following the processing of these regions, the general-purpose processor cores (3) and (4) simultaneously start processing the inner region.
FIG. 13 is a diagram showing the regions processed by the two processor cores that execute image processing next. Compared with the regions shown in FIG. 12, the region processed by the general-purpose processor core (3) is smaller than the region processed by the general-purpose processor core (1) by the number of columns in a DMA transfer block, and the region processed by the general-purpose processor core (4) is smaller than the region processed by the general-purpose processor core (2) by one row and one column.

FIG. 14 is a diagram showing the timing of the synchronization processing by which the general-purpose processor cores (3) and (4) start processing after the general-purpose processor cores (1) and (2) have started.
For the general-purpose processor cores (3) and (4) to process the pixel at their start point, the processing results of the pixel above it, the pixel to its upper left, and the pixel to its left are required.

As shown in FIG. 12, the region of the general-purpose processor core (2), which processes the vertically divided region, includes part of the region of the general-purpose processor core (1) and also includes all the pixels on which the start point of the general-purpose processor cores (3) and (4) depends. Therefore, once the general-purpose processor core (2) has processed the pixel located to the left of the start point of the general-purpose processor cores (3) and (4) and written the result to the shared memory 3, the general-purpose processor cores (3) and (4) can start processing.

FIG. 15 is a diagram showing the synchronization timing between general-purpose processor cores in detail.
Blocks A_1,1 and A_1,2 are regions processed by the general-purpose processor cores (1) and (2), and correspond to the shaded region (1)+(2) in FIG. 12. Blocks A_2,1 and A_3,1 are regions processed by the general-purpose processor core (2).
Block A_2,2 is a region processed by the general-purpose processor cores (3) and (4), part of which is processed by the general-purpose processor core (2). Block A_2,3 is a region processed by the general-purpose processor core (4), part of which is processed by the general-purpose processor core (2).

Here, the pixel a2m at the start point, where the general-purpose processor cores (3) and (4) begin processing, is the leftmost pixel of block A_2,2. For image processing of this start pixel a2m to be possible, processing of the pixels Up(a2m), Upleft(a2m), and Left(a2m) must have finished. Accordingly, once processing of the left adjacent pixel Left(a2m) has finished, the general-purpose processor cores (3) and (4) could start processing. However, as described above, stable operation is ensured by providing a time margin between the operation of the general-purpose processor core (2) and the start of the general-purpose processor cores (3) and (4). For this reason, the synchronization timing is set to the moment when the general-purpose processor core (2) has processed the rightmost pixel of block A_3,1.
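
The dependency condition on Up, Upleft, and Left can be expressed as a small check. This is a minimal sketch under assumptions: pixels are addressed as hypothetical `(row, col)` tuples, neighbours that fall outside the image are treated as already available (in the document they are covered by initialized out-of-image data), and the function names are illustrative only.

```python
def dependencies(row, col):
    """Return the in-image pixels whose results (row, col) needs.

    The three candidates are Up, Upleft, and Left of the given pixel.
    """
    candidates = [(row - 1, col), (row - 1, col - 1), (row, col - 1)]
    return [(r, c) for r, c in candidates if r >= 0 and c >= 0]


def can_start(done, row, col):
    """True if every pixel that (row, col) depends on is in the set `done`."""
    return all(pixel in done for pixel in dependencies(row, col))
```

In raster order, Left of a pixel is the last of its three dependencies to be completed, which is why finishing Left(a2m) is the earliest point at which the start pixel could be processed.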

Thus, when processing of the rightmost pixel of block A_3,1 has completed, the synchronization wait is released and the general-purpose processor cores (3) and (4) start processing. Because the pixel at the processing start point requires the processing results of the pixels above it, to its upper left, and to its left, these data, which were stored before the synchronization processing, are read from the shared memory 3 and used for processing. After this synchronization processing, the general-purpose processor core (2) performs no further synchronization and proceeds to the end of its region. Thereafter, processing proceeds in parallel in the same way toward the lower right of the image.

As described in the first embodiment, when synchronization of the general-purpose processor cores 6 is performed via the bus 7, the time required for synchronization increases as the number of general-purpose processor cores 6 to be synchronized increases. If the region were simply divided vertically and horizontally, the general-purpose processor cores (3) and (4) would have to synchronize with both of the general-purpose processor cores (1) and (2). That is, synchronization would have to be performed among four general-purpose processor cores, which would increase processing time.

FIG. 16 is a diagram showing the image data processed by the general-purpose processor cores 6 and the processed data stored in the shared memory 3.

In the second embodiment, the dependency of the image processing is exploited so that all pixels needed for synchronization are included in the processing region of a single general-purpose processor core. In addition, because the synchronization timing is determined by the operation of the general-purpose processor core with the larger processing load, a large amount of processed data is already present in the shared memory 3 by the time a newly started general-purpose processor core accesses the shared memory 3.

Next, control of each unit and data handling in the image processing apparatus 1 will be described.
FIG. 17 is a flowchart outlining the processing of the image processing apparatus 1. The processing shown in FIG. 17 is executed by the control processor core 5.
The processing shown in the flowchart of FIG. 17 is the same as the processing shown in the flowchart of FIG. 9, so its detailed description is omitted.

FIG. 18 is a flowchart outlining the processing of the general-purpose processor core 6.
The processing executed by the general-purpose processor core 6 is specified by the processing requester. That is, the requester sends a program that executes the processing, and processing on the general-purpose processor core 6 starts when this program is executed.

In Act P101, the general-purpose processor core 6 receives the region of the image that it is to process. Subsequently, in Act P102, it receives the image data of the independent image region. Here, an independent image region is an image region that is to be processed and that does not use the processing results of an adjacent region.

Next, in Act P103, the general-purpose processor core 6 waits for a synchronization signal from the adjacent general-purpose processor core 6. That is, it waits until image processing of the pixel used for synchronization (the synchronization pixel) has completed in the adjacent region. When it receives the synchronization signal in Act P104, indicating that processing of the synchronization pixel has completed, the general-purpose processor core 6 starts processing in Act P105. In Act P106, the processing result is copied to the shared memory 3.

In Act P107, the general-purpose processor core 6 determines whether it needs to communicate a synchronization release to another general-purpose processor core 6 that processes an adjacent region.
If a synchronization release must be communicated (Yes in Act P107), then when processing of the synchronization pixel in its own region completes, in Act P108 the general-purpose processor core 6 sends a synchronization signal, indicating that processing up to the synchronization pixel has completed, to the general-purpose processor core 6 that processes the adjacent region. If no synchronization release needs to be communicated (No in Act P107), the next Act P109 is executed.

In Acts P109 to P110, the general-purpose processor core 6 continues processing and copying of the processing results up to the last pixel of its assigned region. When processing completes, it notifies the control processor core 5 of the completion in Act P111.

Whether a general-purpose processor core 6 needs to communicate a synchronization release to another general-purpose processor core 6 that processes an adjacent region is determined by the region that the core is processing. As shown in FIG. 11, when the image band is divided vertically and horizontally, a core whose target region is a vertically divided region performs the synchronization processing, and a core whose target region is a horizontally divided region does not.
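
The decision of Act P107 reduces to a test on the kind of region a core has been assigned. A minimal sketch follows; the string labels `"vertical"` and `"horizontal"` and the function name are assumptions for illustration, not identifiers from the document.

```python
def needs_sync_release(region_kind):
    """Decide whether a core must signal its neighbours (Act P107).

    Cores handling a vertically divided region send the synchronization
    release; cores handling a horizontally divided region do not.
    """
    if region_kind not in ("vertical", "horizontal"):
        raise ValueError(f"unknown region kind: {region_kind!r}")
    return region_kind == "vertical"
```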

FIG. 19 is a time chart showing the operation of the general-purpose processor cores 6. FIG. 19 lays out on a time axis the timing of the processing inside each general-purpose processor core, the data transfers to and from the shared memory, and the synchronization release processing with adjacent general-purpose processor cores. It is assumed that, before the processing shown in FIG. 19, the control processor core 5 has already loaded the program into the plurality of general-purpose processor cores 6 used for processing.

First, the control processor core 5 notifies each general-purpose processor core 6 of the start of processing via the bus 7. After the notification, the general-purpose processor cores (1) and (2) start image processing, and the other general-purpose processor cores 6 wait for a synchronization signal from an adjacent region.

The general-purpose processor core (1) processes the upper edge of the image region stored in the image processing apparatus 1, that is, the first row of the image data. The general-purpose processor core (2) processes the left edge of the image together with several columns that include it. The number of columns is a size optimized for data transfer between the shared memory 3 connected to the image processing apparatus 1 and the local memory 8.

As the initial data load, the general-purpose processor cores (1) and (2) acquire image data from the storage area 11 connected to the image processing apparatus 1 via the shared memory 3. The general-purpose processor cores (1) and (2) acquire the unprocessed image data stored in the shared memory 3 a fixed amount at a time, sequentially from the left edge, process it, and write the processed results to the shared memory 3 a fixed amount at a time.

Image transfer between the shared memory 3 connected to the image processing apparatus 1 and the local memory 8, and image processing, are carried out in parallel by the calculation processing unit and the DMA transfer unit in the general-purpose processor core 6. That is, the general-purpose processor core 6 acquires a fixed amount of data and performs image processing on it, and while that processing is under way it writes the already processed data group and reads the data group to be processed next.

The time chart shown in FIG. 19 gives the details of the synchronization timing and the synchronization method between the general-purpose processor cores 6. How to read this time chart has already been explained with reference to FIG. 7, so its detailed description is omitted.

After executing synchronization processing with the general-purpose processor cores (3) and (4), the general-purpose processor core (2) proceeds to the end of its assigned region. The general-purpose processor core (4) continues image processing and, when it has processed the synchronization pixel used to synchronize with the adjacent region, writes the processed data to the shared memory 3 and executes the synchronization processing. After that, like the general-purpose processor core (2), it processes up to the end of its assigned region.

If the image area is large, as shown in FIG. 20, the image area is divided into more regions than there are general-purpose processor cores. The regions are assigned to the general-purpose processor cores in order and, once every core has been assigned a region, the inner image regions are assigned again in order starting from the first core. This is repeated until the whole image has been processed.

In the parallel image processing method of the second embodiment described above, after the image processing of the assigned regions, processing continues with the region inside them as the next assigned region. As described above, when the image area is divided vertically and horizontally for parallel processing, including in one of the divided regions the pixels needed for synchronization with the next parallel processing (the synchronization pixels) reduces the number of synchronization operations and enables parallel processing with less overhead.

[Third Embodiment]
The image processing apparatus according to the third embodiment of the present invention combines the first embodiment and the second embodiment. Accordingly, parts identical to those in the first and second embodiments are denoted by the same reference numerals, and their detailed description is omitted.

The third embodiment provides a mechanism for connecting a plurality of image processing apparatuses 1 to an external bus. The number of image processing apparatuses 1 connected to the external bus is determined either manually or automatically, for example by hardware with an automatic recognition function.

When only one image processing apparatus 1 is connected to the external bus, image processing is performed according to the image processing method of the second embodiment of the present invention, as shown in FIG. 21.
When a plurality of image processing apparatuses 1 are connected to the external bus, the image data to be processed is processed according to the image processing method of the first embodiment of the present invention, as shown in FIG. 22.

That is, in the image processing method shown in FIG. 22, the image area is divided in the manner of the first embodiment of the present invention according to the number of image processing apparatuses 1. Each image processing apparatus 1 then further divides its region in the manner of the first embodiment and processes it. Pixels that have dependencies across image processing apparatuses 1, and for which information must therefore be exchanged, are processed while being shared through the storage area 11 connected to the external bus.

Next, control of each unit and data handling in the image processing apparatus 1 will be described.
FIG. 23 is a flowchart outlining the processing of the image processing apparatus 1. The processing shown in FIG. 23 is executed by the control processor core 5.

In Act A201, the control processor core 5 receives a processing instruction from the outside. That is, it receives the type of the image data to be processed. The data type here indicates whether the entire image is received and processed at once or whether band data is received and processed sequentially. At this time, the control processor core 5 also receives a processing type indicating what kind of processing is to be performed on the image.

In Act A202, the number of image processing apparatuses 1 connected to the external bus is checked. If there is one image processing apparatus 1, then in Act A203 a program that executes the image processing of the second embodiment is loaded into each general-purpose processor core 6.

If there are two or more image processing apparatuses 1, then in Act A204 a program is loaded into each general-purpose processor core 6 that executes the image processing of the first embodiment and, in addition, performs synchronization of dependent pixels between the image processing apparatuses 1 and communicates the results of that processing.
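
The branch of Acts A202 to A204 can be summarized as a selection on the number of connected apparatuses. In this minimal sketch, the returned labels and the function name `select_program` are hypothetical placeholders for the programs that would be loaded; they are not names from the document.

```python
def select_program(num_apparatuses):
    """Choose which program to load, based on the apparatus count (Act A202)."""
    if num_apparatuses < 1:
        raise ValueError("at least one image processing apparatus is required")
    if num_apparatuses == 1:
        # Act A203: single apparatus, use the second-embodiment method.
        return "second_embodiment"
    # Act A204: several apparatuses, use the first-embodiment method plus
    # synchronization and result exchange between apparatuses.
    return "first_embodiment_with_inter_apparatus_sync"
```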

Then, the processing from Act A205 to Act A214 is executed using the loaded program. The processing from Act A205 to Act A214 differs from the processing from Act A03 to Act A12 shown in FIG. 9 only in the number of divisions and the regions; the operation is otherwise the same, so a detailed description is omitted.

FIGS. 24 and 25 are flowcharts outlining the processing of the general-purpose processor core 6. As noted above, FIGS. 24 and 25 show the processing flow when the number of image processing apparatuses 1 is two.
The processing executed by the general-purpose processor core 6 is directed by the side requesting the processing. That is, the requesting side transmits a program that executes the processing, and processing on the general-purpose processor core 6 starts when this program is executed.

In Act P203, the general-purpose processor core 6 determines whether to wait for a synchronization signal from another image processing apparatus 1. When it uses the processing result of a general-purpose processor core 6 of another image processing apparatus 1 (Yes in Act P203), it waits in Act P204 for the processing result of that external general-purpose processor core 6. When it does not use such a result (No in Act P203), it waits in Act P205 for a synchronization signal from a general-purpose processor core 6 of the same image processing apparatus 1.

In Act P206, the general-purpose processor core 6 receives the synchronization signal, that is, notification that processing of the synchronization pixel has completed. In Act P207, it then starts processing and performs image processing up to the pixel at which synchronization based on the dependency with the adjacent region is performed (the synchronization pixel).
In Act P208, it is determined whether the synchronization signal is to be sent to another image processing apparatus 1. When the synchronization target is another image processing apparatus 1 (Yes in Act P208), the processing result is written to the external memory connected to the external bus in Act P209. When the synchronization target is a general-purpose processor core 6 of the same image processing apparatus 1 (No in Act P208), the processing result is written to the shared memory in Act P210.

Then, in Act P211, the synchronization signal, that is, notification that processing up to the synchronization pixel has completed, is sent to the relevant general-purpose processor core 6. After this, image processing continues in Act P212.

In Act P213, it is determined whether the processing result is to be sent to another image processing apparatus 1. When the destination is another image processing apparatus 1 (Yes in Act P213), the processing result is written to the external memory connected to the external bus in Act P214. When the destination is a general-purpose processor core 6 of the same image processing apparatus 1 (No in Act P213), the processing result is written to the shared memory in Act P215.
Then, after the processing is completed, the control processor core 5 is notified of the completion in Act P216.
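
The per-core flow of Acts P203 to P216 can be sketched with threads standing in for the general-purpose processor cores. This is a simplified illustration, not the implementation described in the document. Several things are assumptions made only for the example: the `Core` class, the running-sum "image processing", a single dictionary standing in for every apparatus's shared memory, and a second dictionary standing in for the external memory. The final write of results in Acts P213 to P215 is represented only by each core's `out` list.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    apparatus: int   # index of the image processing apparatus owning this core
    pixels: list     # pixel values of the assigned region, in processing order
    sync_index: int  # position of the synchronization pixel within `pixels`
    ready: threading.Event = field(default_factory=threading.Event)
    out: list = field(default_factory=list)


def run_core(core, prev, nxt, shared_mem, external_mem, done):
    carry = 0
    if prev is not None:
        # Acts P203-P206: wait for the predecessor's synchronization signal,
        # then read its result from external memory if it belongs to another
        # apparatus, otherwise from shared memory.
        prev.ready.wait()
        mem = external_mem if prev.apparatus != core.apparatus else shared_mem
        carry = mem[prev.name]
    for i, pixel in enumerate(core.pixels):
        # Toy dependent operation (an assumption): a running sum.
        carry += pixel
        core.out.append(carry)
        if i == core.sync_index and nxt is not None:
            # Acts P208-P210: choose the memory according to the sync target.
            mem = external_mem if nxt.apparatus != core.apparatus else shared_mem
            mem[core.name] = carry
            # Act P211: notify the successor; Act P212: the loop continues.
            core.ready.set()
    # Act P216: report completion (a list stands in for the control core).
    done.append(core.name)


def run_pipeline(cores):
    for c in cores:
        if not 0 <= c.sync_index < len(c.pixels):
            raise ValueError(f"sync_index out of range for core {c.name}")
    shared_mem, external_mem, done = {}, {}, []
    threads = []
    for i, core in enumerate(cores):
        prev = cores[i - 1] if i > 0 else None
        nxt = cores[i + 1] if i + 1 < len(cores) else None
        threads.append(threading.Thread(
            target=run_core,
            args=(core, prev, nxt, shared_mem, external_mem, done)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_mem, external_mem, sorted(done)
```

In this sketch each core publishes its result as soon as it reaches its synchronization pixel, so the successor can start while the predecessor is still finishing the rest of its region, which is the staggered start the flow describes.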

The flow above has been described for the case of two image processing apparatuses. When there are three or more image processing apparatuses, the processing is performed in the same way; only the number of divisions and the regions differ.

As described above, the image processing apparatus of the third embodiment improves performance by using a plurality of image processing apparatuses and applying a processing scheme that takes into account the overhead of data transfer and synchronization between general-purpose processor cores.

According to the image processing apparatus of each embodiment described above, an image processing system having a plurality of processor cores can, through parallelization, process at high speed image processing that has dependencies between pixels.

With the image processing scheme of the first embodiment of the present invention, an apparatus composed of a plurality of computing cores can process in parallel, on its plurality of processor cores, image processing in which the processing of one pixel depends on the result of another pixel, which makes high-speed processing possible.

With the image processing scheme of the second embodiment of the present invention, an apparatus composed of a plurality of computing cores can perform parallel processing that takes synchronization overhead into account even when that overhead is fairly large. High-speed processing can therefore be achieved even in an image processing environment subject to hardware constraints.

With the image processing scheme of the third embodiment of the present invention, a plurality of multi-core parallel processors can be added, and parallel processing can be realized that suits the way the devices are connected and takes synchronization and communication overhead into account. Accordingly, even when a single multi-core processor does not provide sufficient performance, using a plurality of such processors configured as in this embodiment makes it possible to achieve image processing with higher performance.

The present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention.
Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be removed from all the constituent elements shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

The present invention can be used in industries that manufacture image processing apparatuses capable of easily improving the processing efficiency of image processing that has dependencies between pixels.

Description of symbols: 1: image processing apparatus; 2: external interface; 3: shared memory; 4: memory interface; 5: control processor core; 6: general-purpose processor core; 7: bus; 8: local memory; 10: control means; 11: storage area.

JP 2007-87417 A

Claims

An image processing apparatus comprising:
a shared memory that stores an image;
a plurality of processors connected to the shared memory and executing image processing; and
a controller that divides the image stored in the shared memory into a plurality of regions, assigns each divided region to an arbitrary processor, and causes the assigned processors to perform parallel processing while sequentially shifting their processing start timings,
wherein the processing region in which a preceding processor, which starts operation first, performs image processing includes pixels that satisfy the inter-pixel dependency of the pixel first processed by a succeeding processor, which starts operation later.

The image processing apparatus according to claim 1, wherein the controller starts the operation of the succeeding processor after image processing has been completed for all pixels in the processing region of the preceding processor that satisfy the inter-pixel dependency with the succeeding processor.

An image processing apparatus comprising:
a shared memory that stores an image;
a plurality of processors connected to the shared memory and executing image processing; and
a controller that divides the image stored in the shared memory into a plurality of regions, assigns each divided region to an arbitrary processor, sets a plurality of groups each consisting of at least a plurality of processors, causes the plurality of processors belonging to one group to start operation simultaneously, and performs parallel processing while sequentially shifting the processing start timing of each group,
wherein the processing region in which the processors belonging to a preceding group, which starts operation first, perform image processing includes pixels that satisfy the inter-pixel dependency of the pixels first processed by all the processors belonging to a succeeding group, which starts operation later.

The image processing apparatus according to claim 3, wherein the controller starts the operation of the processors belonging to the succeeding group after image processing has been completed for all pixels in the processing region that satisfy the inter-pixel dependency with the processors belonging to the succeeding group.

An image processing system in which a plurality of image processing apparatuses according to claim 1 are connected to one another via a communication path,
wherein the controller of one of the image processing apparatuses transmits a part of the divided processing regions to another image processing apparatus for processing, and controls the transmission and reception of timing signals for starting image processing that involves pixel dependencies with the other image processing apparatus.

An image processing method for an image processing apparatus comprising a shared memory that stores an image and a plurality of processors connected to the shared memory and executing image processing, the method comprising:
dividing the image stored in the shared memory into a plurality of regions;
assigning each divided region to a processor; and
performing parallel processing while sequentially shifting the processing start timings of the assigned processors,
wherein the processing region in which a preceding processor, which starts operation first, performs image processing includes pixels that satisfy the inter-pixel dependency of the pixel first processed by a succeeding processor, which starts operation later.

The image processing method according to claim 6, wherein the operation of the succeeding processor is started after image processing has been completed for all pixels in the processing region that satisfy the inter-pixel dependency with the succeeding processor.

An image processing method for an image processing apparatus comprising a shared memory that stores an image and a plurality of processors connected to the shared memory and executing image processing, the method comprising:
dividing the image stored in the shared memory into a plurality of regions;
assigning each divided region to a processor;
setting a plurality of groups each consisting of at least a plurality of processors;
causing the plurality of processors belonging to one group to start operation simultaneously; and
performing parallel processing while sequentially shifting the processing start timing of each group,
wherein the processing region in which the processors belonging to a preceding group, which starts operation first, perform image processing includes pixels that satisfy the inter-pixel dependency of the pixels first processed by all the processors belonging to a succeeding group, which starts operation later.

The image processing method according to claim 8, wherein the operation of the processors belonging to the succeeding group is started after the processors belonging to the preceding group have completed image processing for all pixels in their processing regions that satisfy the inter-pixel dependency with the processors belonging to the succeeding group.

An image processing method for an image processing system in which a plurality of image processing apparatuses according to claim 6 are connected to one another via a communication path,
wherein one of the image processing apparatuses transmits a part of the divided processing regions to another image processing apparatus for processing, and controls the transmission and reception of timing signals for starting image processing that involves pixel dependencies with the other image processing apparatus.