JP2011095807A

JP2011095807A - Image processing apparatus and image processing method

Info

Publication number: JP2011095807A
Application number: JP2009246207A
Authority: JP
Inventors: Atsushi Uehara; 淳上原; Kohei Utsunomiya; 光平宇都宮; Shinichi Arasaki; 真一荒崎
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2009-10-27
Filing date: 2009-10-27
Publication date: 2011-05-12
Anticipated expiration: 2029-10-27
Also published as: JP5487882B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus and method that process input image data without degrading the throughput of a device.
SOLUTION: The image processing apparatus 20 performs image processing whose processing of the input image data has a dependency in a prescribed direction. It does this in parallel by launching a plurality of threads, and it is provided with:
- a device 30 that can execute processing in parallel by launching the plurality of threads;
- an intermediate buffer 32 that stores results of the image processing performed by the device 30; and
- a host H that controls how much input image data the device 30 processes.
The host H creates divided image data D by dividing the input image data so that each division boundary B runs perpendicular to the direction of the dependency. It supplies the divided image data D to the device 30 and has the intermediate buffer 32 store those processing results of the device 30 that correspond to the division boundary B.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing method.

GPUs (Graphics Processing Units) are widely used as devices installed in computers. A GPU was originally intended for graphics processing. Recently, however, manufacturers have begun providing development environments that allow it to be used for general-purpose computation. Using a GPU in this way is known as GPGPU (General-Purpose computing on GPU). GPGPU is currently used in fields such as computational physics, video and image processing, database management, and biotechnology.

Compared with a CPU (Central Processing Unit), which is a general-purpose processor, a GPU can execute floating-point operations in parallel and at high speed. For workloads that repeat the same operation on a large amount of data in parallel, a GPU can therefore process far more efficiently and quickly than a CPU.

JP 2003-198818 A

A GPU may be used for processing whose dependency runs in a fixed direction. In that case, constraints such as memory capacity can make it impossible to process the entire area of the input image data in parallel. Because the dependency runs in a fixed direction, one conceivable workaround is to:
1. divide the input image data into several parts stacked along the direction orthogonal to the dependency direction; and
2. process the parts one after another, so as to stay within the memory constraints.

With this approach, however, fewer threads are launched at the same time. The threads are laid out along the orthogonal direction, and each part covers only a fraction of that direction, so each part supplies fewer threads. The throughput of the GPU therefore decreases.

Some aspects of the present invention provide an image processing apparatus and an image processing method that can process input image data without reducing the throughput of the device.

To solve the above problem, the image processing apparatus of the present invention performs image processing whose dependency on the input image data exists in a predetermined direction. It does so in parallel by launching a plurality of threads. The apparatus includes:
- a device that can launch a plurality of threads and execute the processing in those threads in parallel;
- an intermediate buffer that can store the results of the image processing performed by the device; and
- a host that controls how much input image data the device is given to process.

The host creates divided image data by dividing the input image data so that each division boundary runs in the direction orthogonal to the direction in which the image-processing dependency exists. The divided image data is thereby supplied to the device. The device performs the image processing on the divided image data in parallel by launching a plurality of threads along the direction orthogonal to the dependency direction. It then stores, in the intermediate buffer, those processing results that correspond to the division boundaries.

With this configuration, the device receives divided image data whose division boundaries run orthogonal to the dependency direction, and it can launch a plurality of threads to process that data in parallel. Constraints such as memory capacity may prevent the whole of the input image data from being processed in parallel. Even then, more threads can be launched along the direction orthogonal to the dependency direction. This improves the throughput of the device compared with dividing the input image data into parts stacked along the direction orthogonal to the dependency direction. In addition, because more threads can run in parallel, the device's ability to execute threads in parallel is put to full use.
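The following Python sketch gives a rough illustration of the thread-count argument above by comparing the two division schemes. It rests on these hypothetical assumptions:
- the dependency runs along each row;
- one thread is launched per row;
- one chunk may hold at most `capacity` pixels.

The function name and parameters are illustrative and do not come from the source.

```python
import math


def compare_division_schemes(width: int, height: int, capacity: int) -> dict:
    """Threads per launch and number of launches for two division schemes.

    Hypothetical model: the dependency runs along each row (x direction),
    one thread is launched per row, and one chunk holds at most `capacity` pixels.
    """
    if capacity < max(width, height):
        raise ValueError("capacity must hold at least one full row and one full column")
    # Conventional scheme: parts keep full-length rows, so only
    # capacity // width rows (and therefore threads) fit in each part.
    rows_per_strip = min(height, capacity // width)
    # Proposed scheme: chunks keep every row, so all `height` threads run in
    # each launch and the row segments get shorter instead.
    cols_per_chunk = min(width, capacity // height)
    return {
        "conventional_threads": rows_per_strip,
        "conventional_launches": math.ceil(height / rows_per_strip),
        "proposed_threads": height,
        "proposed_launches": math.ceil(width / cols_per_chunk),
    }
```

Take a 7000 × 5000 image with room for 3,500,000 pixels per chunk. Both schemes then need 10 launches, but the conventional scheme runs 500 threads per launch while the proposed scheme runs 5000.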

According to another aspect of the present invention, the image processing whose dependency exists in a predetermined direction is preferably smoothing processing.

With this configuration, smoothing processing, whose dependency runs in a predetermined direction, is applied to divided image data whose division boundaries run orthogonal to the dependency direction. The whole input image data can therefore be smoothed faster than when it is divided into parts stacked along the direction orthogonal to the dependency direction and then smoothed.

According to yet another aspect of the present invention, the intermediate buffer is preferably provided in the device. The host then receives information on the storage capacity of the intermediate buffer from the device and creates the divided image data based on that information.

This configuration makes it possible to give the divided image data an appropriate size.

According to another invention, in addition to the above, the host preferably includes a memory control unit. The memory control unit creates the divided image data by controlling how the input image data is sent to the device.

With this configuration, the memory control unit controls how the input image data is sent to the device, and this causes the divided image data to be formed in the intermediate buffer on the device side. The host does not have to build the divided image data itself, so no host-side memory is consumed for it.

According to still another invention, in addition to the above, the host preferably creates the divided image data from the input image data and sends the created divided image data to the device.

With this configuration, creating the divided image data takes extra processing. In return, the host can send the divided image data in order of consecutive addresses, which speeds up the data transfer.

An image processing method according to another aspect performs image processing whose dependency on the input image data exists in a predetermined direction, in parallel by launching a plurality of threads. The method uses:
- a device that can launch a plurality of threads and execute their processing in parallel;
- an intermediate buffer that can store the results of the image processing performed by the device; and
- a host that controls how much input image data the device is given to process.

The host is made to create divided image data by dividing the input image data so that each division boundary runs in the direction orthogonal to the dependency direction. The divided image data is thereby supplied to the device. The device performs the image processing on the divided image data in parallel by launching a plurality of threads along the direction orthogonal to the dependency direction. The processing results corresponding to the division boundaries are preferably stored in the intermediate buffer.

FIG. 1 is a schematic diagram showing the configuration of the printing apparatus and the computer according to the present invention.
FIG. 2 is a block diagram showing an example configuration of the GPU in the computer shown in FIG. 1.
FIG. 3 is a diagram showing the schematic configuration of the printer of FIG. 1.
FIG. 4 is a diagram showing the direction of the processing dependency and the direction in which threads run in parallel.
FIG. 5 is a diagram illustrating how the image data is divided and processed.
FIG. 6 is a diagram illustrating how image data is divided and processed in the conventional approach.

A printing apparatus 10 that includes an image processing apparatus according to an embodiment of the present invention is described below with reference to FIGS. 1 to 6. Here, the printing apparatus 10 refers to the combination of a computer 20 and an inkjet printer 40. However, a printer that alone provides all the functions described below may serve as the printing apparatus 10. In this embodiment, the image processing apparatus corresponds to the computer 20.

<Schematic configuration of the printing apparatus>
FIG. 1 is a diagram showing the schematic configuration of the printing apparatus 10. As shown in FIG. 1, the printing apparatus 10 consists of a computer 20 and a printer 40.

The computer 20 includes a CPU (Central Processing Unit) 21, a main memory 22, an HDD (Hard Disk Drive) 23, an interface 24, a bus 25, a device 30, and so on.

The CPU 21 reads various programs and data from a ROM (Read Only Memory, not shown), the HDD 23, and the like, and executes various computations. Once these programs and data have been read, the components of the computer 20 cooperate so that a memory control unit 21a is functionally realized in the CPU 21.

The memory control unit 21a controls how the image data stored in the supply data buffer 22a of the main memory 22 is sent to the device 30. This image data corresponds to the input image data in the claims.

The main memory 22 is an external memory, such as DRAM, that stores various data and programs. It contains, for example:
- a supply data buffer 22a that stores RGB image data created by an application program 23a (described later); and
- an intermediate data buffer 22b that stores data after processing by a GPU 31 (described later).

In response to requests from the CPU 21, the HDD 23 reads data or programs recorded on its hard disk, which is the recording medium. It also records predetermined data produced by the computations of the CPU 21 onto that hard disk. The interface 24 is a circuit with two roles: it outputs image data to the printer 40, and it converts the representation format of signals from external input devices and external storage devices as appropriate before passing them in. The bus 25 is the signal path that connects the CPU 21, the main memory 22, the HDD 23, the device 30, and so on.

An application program 23a, a video driver program 23b, and a printer driver program 23c are installed on the HDD 23 and run under a given operating system (OS).

The printer driver program 23c includes the following components (none of these modules are shown):
- a resolution conversion module;
- a color conversion module;
- a color conversion table;
- a halftone module;
- a recording rate table;
- a print data generation module;
- a transmission module, and so on.

The resolution conversion module converts the resolution of the RGB image data to match the print resolution of the printer 40.

The color conversion module converts image data expressed in the RGB (Red, Green, Blue) color system into image data in the CMYK (Cyan, Magenta, Yellow, Black) color system, by referring to the color conversion table.

The halftone module first performs smoothing that evens out pixel-value fluctuations at the edges of the image, for example by averaging the values of adjacent pixels. It then converts the CMYK image data, in which each pixel has, for example, 256 gradation levels, into bitmap data made up of combinations of three dot sizes (large, medium, and small). This conversion is done, for example, by dithering with reference to the recording rate table.

The smoothing may instead be performed by another module, such as the color conversion module.
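The halftone step can be pictured with a minimal multi-level ordered-dither sketch in Python. The source relies on a recording rate table that is not given here, so this sketch makes two substitutions, both assumptions for illustration only:
- a standard 4 × 4 Bayer matrix in place of the table; and
- an even split of the 0-255 range into four output codes (no dot, small, medium, large).

```python
# Standard 4x4 Bayer index matrix (values 0..15). It stands in for the
# recording rate table, which the source does not specify.
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

NO_DOT, SMALL, MEDIUM, LARGE = 0, 1, 2, 3


def halftone_three_dot_sizes(plane):
    """Convert one 8-bit ink plane (rows of 0-255 values) to dot codes 0-3."""
    out = []
    for y, row in enumerate(plane):
        out_row = []
        for x, value in enumerate(row):
            level = value * LARGE / 255          # continuous level in [0, 3]
            base = int(level)                    # smaller of the two candidate dot sizes
            frac = level - base                  # how far the value lies toward base + 1
            threshold = (BAYER4[y % 4][x % 4] + 0.5) / 16
            out_row.append(min(LARGE, base + (1 if frac > threshold else 0)))
        out.append(out_row)
    return out
```

In this sketch, a flat tone of 128 produces an even mix of small and medium dots over each 4 × 4 tile. That is the kind of dot-size mixing a recording rate table would govern.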

The print data generation module generates print data from the bitmap data output by the halftone module. The print data includes raster data indicating the dot recording state for each main scan, and data indicating the sub-scan feed amount. The transmission module sends the print data generated by the print data generation module to the printer 40.

The device 30, also called a graphics board, includes a GPU (Graphics Processing Unit) 31 and a graphics memory 32, which is an example of the intermediate buffer. The GPU 31 applies the processing described below to data sent from the CPU 21 and outputs the result back to the CPU 21 side. The graphics memory 32 stores data sent from the CPU 21 side.

As shown in FIG. 1, a thread control unit 31a and a memory command unit 31b are functionally realized in the GPU 31.

The thread control unit 31a automatically divides the processing of the data to be processed into threads. It also controls how those threads are assigned to the streaming processors 316. Both functions are described later.

The memory command unit 31b, also described later, has the graphics memory 32 hold the computation results at the boundary of each piece of divided image data D. It then instructs that these results be passed to the thread control unit 31a when the next piece of divided image data D is computed.

In the following description, the part of the computer other than the device 30 is called the host H. The host H includes the CPU 21, the main memory 22, and the HDD 23.

<Example configuration of the GPU>
FIG. 2 is a block diagram showing an example configuration of the GPU 31, using NVIDIA's GeForce (registered trademark) 8800 GTX as the example. This GPU 31 supports CUDA (Compute Unified Device Architecture; registered trademark), an integrated development environment based on the C language.

Earlier GPUs had two kinds of shaders with different functions: vertex shaders and pixel shaders. The GPU 31 instead adopts a design approach called a unified shader, in which every shader has the same function. The unified shader is implemented in the streaming processors 316 described later.

The GPU 31 supports the technique known as GPGPU (General-Purpose computing on GPU), which applies the computing resources of the GPU 31 to purposes other than graphics processing.

The GPU 31 has the following hierarchy:
- eight texture processor clusters (TPCs) 310;
- in each texture processor cluster 310, two streaming multiprocessors (SMs) 311, a constant cache 312, and a texture cache 313;
- in each streaming multiprocessor 311, a shared memory 314, an instruction unit 315, and eight streaming processors (SPs) 316.

Each streaming processor 316 is an individual computation unit, so 8 × 2 × 8 = 128 processes can run in parallel. This is the configuration of one specific commercial product. The basic structure, in which multiple computation units process in parallel, is the same for any GPU.

Suppose a given process is executed on the GPU 31; in this embodiment, that process is smoothing of image data, as described later. The thread control unit 31a of the GPU 31 automatically divides the process into threads. The number of threads is usually larger than the number of streaming processors 316. One streaming multiprocessor 311 has eight streaming processors 316, so it can physically process only eight threads in parallel. A very large number of threads, for example several thousand to several tens of thousands, are therefore assigned to the streaming processors 316 by time division.
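The following Python sketch is a back-of-the-envelope model of this time division, using the unit counts quoted for the 8800 GTX example. It deliberately ignores real scheduling details such as warps, latency hiding and per-multiprocessor limits. Instead, it assumes that each of the 128 streaming processors runs one thread per time slice. Treat the result as an idealized count, not a performance model.

```python
import math

# Unit counts quoted in the text for the GeForce 8800 GTX example.
TPC_COUNT = 8      # texture processor clusters 310
SM_PER_TPC = 2     # streaming multiprocessors 311 per cluster
SP_PER_SM = 8      # streaming processors 316 per multiprocessor

PHYSICAL_LANES = TPC_COUNT * SM_PER_TPC * SP_PER_SM  # 8 x 2 x 8 = 128


def time_slices(num_threads: int) -> int:
    """Idealized number of time slices if every SP runs one thread per slice."""
    return math.ceil(num_threads / PHYSICAL_LANES)
```

Under this simplified model, for example, 10,000 threads would need 79 time slices.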

<Other configuration (schematic configuration of the printer)>
Next, the schematic configuration of the printer 40 is described. FIG. 3 is a diagram showing the schematic configuration of the printer 40. The printer 40 includes a paper feed mechanism 50, an ink supply mechanism 60, a line head 70, and a printer control unit 80.

The paper feed mechanism 50 includes a paper feed motor (PF motor) 51 and a paper feed roller 52 driven by the paper feed motor 51. It can convey a print medium P, such as printing paper, from the supply position toward the discharge side.

The ink supply mechanism 60 includes a cartridge holder 61, ink cartridges 62, and an ink supply path 63. The ink cartridges 62 are detachably mounted in the cartridge holder 61, so the printer 40 of FIG. 3 has a so-called off-carriage configuration; an on-carriage printer may be used instead. The ink supply path 63 is provided between the ink cartridges 62 and the line head 70 so that ink can be supplied from the ink cartridges 62 to the line head 70.

The line head 70 is longer than the width of the print medium P. It consists of multiple short heads (not shown) arranged along the main scanning direction in a staggered pattern, alternately offset forward and backward in the sub-scanning direction.

The printer control unit 80 has a CPU, memory (ROM, RAM, nonvolatile memory, etc.), an ASIC (Application Specific Integrated Circuit), a bus, a timer, an interface, and so on (none of these are shown). Signals from various sensors are input to the printer control unit 80. Based on those signals and on the print data sent from the computer 20, the printer control unit 80 drives motors such as the paper feed motor 51, as well as the line head 70.

<Operation in this embodiment>
An example of image processing performed in the computer 20 configured as above is described below.

Some of the image processing performed at the instruction of the CPU 21, such as smoothing, has a processing dependency in a predetermined direction of the data array, as shown in FIG. 4. In FIG. 4, for example, the dependency runs along the horizontal direction of the data array: among the data (0,0) to (n,0), among (0,1) to (n,1), ..., and among (0,m) to (n,m).

The memory control unit 21a sizes the data as follows when processing such data (image data in this embodiment), which has a processing dependency in a predetermined direction:
1. It refers to information on the storage capacity of the graphics memory 32 of the GPU 31 and determines the amount of data to send to the GPU 31.
2. It determines the number of data items (pixels) in the dependency direction from this data amount and from the number of data items (pixels) in the direction orthogonal to the dependency direction (the vertical direction in FIG. 4).

As a result, the whole image data is divided into several pieces of divided image data D, as shown in FIG. 5.
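The sizing rule above can be sketched in Python as follows. The source only says that the data amount is derived from the graphics memory capacity. The following are therefore illustrative assumptions:
- the byte-based capacity;
- the `bytes_per_pixel` parameter;
- the optional `reserved_bytes` headroom, for example for output data or boundary results.

```python
def plan_chunks(width, height, bytes_per_pixel, capacity_bytes, reserved_bytes=0):
    """Column ranges [x0, x1) for the divided image data D.

    Every chunk keeps all `height` rows (the direction orthogonal to the
    dependency), so only its width along the dependency direction varies.
    `reserved_bytes` is hypothetical headroom kept free inside the buffer.
    """
    usable = capacity_bytes - reserved_bytes
    chunk_width = min(width, usable // (height * bytes_per_pixel))
    if chunk_width < 1:
        raise ValueError("buffer cannot hold even one full column")
    return [(x0, min(x0 + chunk_width, width)) for x0 in range(0, width, chunk_width)]
```

For instance, a 10 × 4 image at 4 bytes per pixel with a 64-byte buffer yields the chunks (0, 4), (4, 8) and (8, 10). The last chunk is simply narrower.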

The division into divided image data D can be done in two ways.
- The host H may actually perform the division and store the resulting divided image data D again in the main memory 22.
- Alternatively, the division may be only apparent, achieved through the way the memory control unit 21a controls the sending of the image data. In that case, for each transfer, the memory control unit 21a specifies the starting address in the dependency direction and the number of data items up to the final address in that direction. It repeats this for every position along the direction orthogonal to the dependency direction.
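The apparent division described above amounts to a list of strided copies. The following Python sketch, whose function names are hypothetical, computes one (start offset, byte count) pair per row for a column range of a row-major image. It then shows how those pairs assemble a contiguous chunk. Offsets are relative to the start of the image buffer. On CUDA, a transfer of this shape could plausibly be issued as a single `cudaMemcpy2D` call with the row pitch as the source pitch, but the source does not name any API.

```python
def chunk_descriptors(width, height, x0, x1, bytes_per_pixel=1):
    """(start_offset, byte_count) per row for columns [x0, x1) of a row-major image.

    Mirrors the text: a start address in the dependency direction plus a data
    count, repeated for every row along the orthogonal direction.
    """
    row_bytes = width * bytes_per_pixel
    count = (x1 - x0) * bytes_per_pixel
    return [(y * row_bytes + x0 * bytes_per_pixel, count) for y in range(height)]


def gather(buffer: bytes, descriptors) -> bytes:
    """Copy the described row segments back to back into one contiguous chunk."""
    return b"".join(buffer[start:start + count] for start, count in descriptors)
```

Packing the segments with `gather` corresponds to the other alternative, in which the host itself builds the divided image data and then sends it as one contiguous block.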

The divided image data may also be created in the graphics memory 32 by a command from the GPU 31 side, without using the memory control unit 21a functionally realized in the CPU 21. In this case, the CPU 21 first instructs the GPU 31 to perform predetermined image processing such as smoothing. The memory command unit 31b of the GPU 31 then reads the divided image data D from the supply data buffer 22a of the main memory 22 into a predetermined area of the graphics memory 32. This read specifies the starting address in the dependency direction and the number of data items up to the final address in that direction. It is repeated for every position along the direction orthogonal to the dependency direction.

The graphics memory 32 of the GPU 31 then stores the processing data, that is, the divided image data D sent from the host H, in units called grids. The thread control unit 31a automatically divides the processing of this data into threads and controls how the threads are assigned to the streaming processors 316. The streaming processors 316 then execute the processing of each thread in turn.

Like smoothing, this processing has a dependency in a predetermined direction, as shown in FIG. 5. In smoothing, for example, each thread moves along the predetermined direction and computes averages over several pixels, such as the average of adjacent pixel values.

Under the control of the thread control unit 31a, thousands to tens of thousands of threads performing such processing can be executed apparently simultaneously.

When processing of all threads for the divided image data D is complete, the memory command unit 31b instructs that the processing result corresponding to the boundary portion B be held in the graphics memory 32. Until all of those threads have finished, their processing results are accumulated in the graphics memory 32; once they have finished, the results are transferred to the host H side at the command of the memory command unit 31b. The processing data for the next divided image data D is then sent from the host H and stored in the graphics memory 32.

For this next processing data as well, the thread control unit 31a automatically divides the processing into threads and controls the assignment of the threads to the streaming processors 316, described later. At this point, the memory command unit 31b reads from the graphics memory 32 the calculation results at the boundary of the divided image data D from the immediately preceding pass and hands them to the thread control unit 31a, which reflects them in the first thread of the time-division sequence. The processing dependency in the predetermined direction is therefore preserved between the divided image data D processed immediately before and the divided image data D now being processed.

This processing is repeated until all divided image data D have been processed.
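The chunk-by-chunk loop with a boundary buffer can be modeled as follows, using an assumed recursive filter `out[i] = (x[i] + out[i-1]) / 2` in place of the smoothing. Chunk boundaries fall between columns, so they run orthogonal to the horizontal dependency direction, and the list `intermediate_buffer` plays the role of the boundary results held in the graphics memory 32. All names are hypothetical. Processing in chunks of any width should give exactly the same result as processing the whole image at once, which is the property the boundary hand-off is meant to guarantee.

```python
def smooth_chunk(chunk_rows, boundary):
    """Process one divided chunk with one independent 'thread' per row.

    boundary[r] holds row r's last output from the previous chunk
    (None for the first chunk). Returns (results, new_boundary).
    """
    results, new_boundary = [], []
    for row, carry in zip(chunk_rows, boundary):
        prev = row[0] if carry is None else carry  # seed from the buffer
        out = []
        for x in row:  # sequential along the dependency direction
            prev = (x + prev) / 2
            out.append(prev)
        results.append(out)
        new_boundary.append(prev)  # boundary result kept for the next chunk
    return results, new_boundary


def process_in_chunks(image, chunk_width):
    """Split the columns into chunks, carrying per-row boundary results
    from each chunk to the next through an intermediate buffer."""
    height, width = len(image), len(image[0])
    intermediate_buffer = [None] * height
    output = [[] for _ in range(height)]
    for start in range(0, width, chunk_width):
        chunk = [row[start:start + chunk_width] for row in image]
        results, intermediate_buffer = smooth_chunk(chunk, intermediate_buffer)
        for out_row, part in zip(output, results):
            out_row.extend(part)  # stands in for the transfer back to the host
    return output


image = [[4, 8, 0, 2], [1, 1, 1, 1]]
print(process_in_chunks(image, 4))  # [[4.0, 6.0, 3.0, 2.5], [1.0, 1.0, 1.0, 1.0]]
print(process_in_chunks(image, 3) == process_in_chunks(image, 4))  # True
```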

<Effect>
In the printing apparatus 10 configured as described above, the device 30 (GPU 31) is supplied with divided image data D, obtained by dividing the input image data so that the boundary portion B appears in the direction orthogonal to the direction in which the image processing dependency exists. The device 30 (GPU 31) can then launch a plurality of threads on the divided image data D and execute the processing in parallel. Consequently, even when restrictions such as the capacity of the graphics memory 32 of the GPU 31 prevent the entire input image from being processed in parallel at once, more threads can be launched along the direction orthogonal to the dependency direction.

This improves the throughput of the device 30 (GPU 31) compared with dividing the input image data along the direction orthogonal to the dependency direction, as in the conventional processing of FIG. 6. In addition, because more threads can be launched in parallel, the device 30 (GPU 31) can make better use of its ability to execute thread processing in parallel.

In the present embodiment, the example of image processing whose dependency exists in a predetermined direction is smoothing. Smoothing, which has a processing dependency in a predetermined direction, is therefore applied to divided image data D divided so that the division boundary appears in the direction orthogonal to the dependency direction. As a result, the entire input image can be smoothed faster than when the input image data is divided along the direction orthogonal to the dependency direction before smoothing.

FIG. 6 shows an example of conventional image processing, such as smoothing. In FIG. 6, to avoid breaking the processing dependency in the predetermined direction, the image data is divided along the direction orthogonal to the dependency direction (the direction in which threads run in parallel; the vertical direction in FIG. 6). Because the dependency direction is the horizontal direction in FIG. 6, the number of threads that can be launched is much smaller than in FIG. 5. As a result, the GPU 31's ability to perform parallel processing at high speed cannot be fully exploited.

By contrast, as is clear from FIG. 5, the image processing of the present embodiment can launch a very large number of threads and thus makes full use of the GPU 31's ability to perform parallel processing at high speed.
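A back-of-the-envelope comparison of the two division strategies, as a Python sketch. It assumes the dependency runs along rows and that the device can hold `capacity_pixels` pixels of divided image data per pass; the image size and capacity in the example are hypothetical figures, not values from this document.

```python
def threads_per_pass(width, height, capacity_pixels):
    """Return (fig5_threads, fig6_threads): how many independent row-threads
    one pass can launch under each division strategy."""
    # FIG. 5 style: boundaries cut across the dependency direction, so each
    # chunk spans every row and can launch one thread per image row.
    chunk_width = min(width, capacity_pixels // height)
    fig5_threads = height if chunk_width >= 1 else 0
    # FIG. 6 style: rows are kept whole, so a pass holds only as many rows
    # as fit in memory, and only those rows can run as parallel threads.
    fig6_threads = min(height, capacity_pixels // width)
    return fig5_threads, fig6_threads


print(threads_per_pass(10_000, 8_000, 8_000_000))  # (8000, 800)
```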

Furthermore, in the present embodiment, the host H (memory control unit 21a) receives information on the storage capacity of the graphics memory 32 from the device 30 and creates the divided image data D based on that information. The data size of each piece of divided image data D can therefore be set appropriately.
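One way the host might turn the reported capacity into chunk sizes is sketched below in Python. The function name, the `state_bytes_per_row` parameter, and the choice to reserve room for the boundary results in the same memory are all assumptions for illustration.

```python
def plan_chunks(width, height, bytes_per_pixel, capacity_bytes,
                state_bytes_per_row=4):
    """Return (col_start, col_count) for each column chunk.

    Hypothetical policy: reserve room for one boundary result per row (the
    intermediate buffer), then make each chunk as wide as the remaining
    capacity allows when every chunk spans all `height` rows.
    """
    usable = capacity_bytes - height * state_bytes_per_row
    chunk_width = min(width, usable // (height * bytes_per_pixel))
    if chunk_width < 1:
        raise MemoryError("capacity too small for one column plus boundary state")
    return [(start, min(chunk_width, width - start))
            for start in range(0, width, chunk_width)]


print(plan_chunks(10, 4, 1, 28))  # [(0, 3), (3, 3), (6, 3), (9, 1)]
```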

Also, in the present embodiment, a memory control unit 21a is functionally implemented in the CPU 21 of the host H, and it creates the divided image data D by controlling the transmission of input image data to the device 30. The divided image data D can thus be created in the graphics memory 32 on the device 30 side. In other words, the host H does not need to create the divided image data D itself, so the main memory 22 on the host H side is not consumed.

Alternatively, the host H may create the divided image data D from the input image data and send the created divided image data D to the device 30. Although this requires a step to create the divided image data D, the host H can then send it in order of consecutive addresses, which speeds up the data transfer.
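Host-side creation of the divided image data might look like the following Python sketch, assuming an 8-bit, row-major image; `pack_chunk` is a hypothetical helper. The host pays for one gather copy per row, after which the chunk occupies consecutive addresses and can be sent as a single transfer.

```python
def pack_chunk(image, width, height, col_start, col_count):
    """Copy columns [col_start, col_start + col_count) of a row-major,
    8-bit image into one contiguous buffer (hypothetical helper)."""
    packed = bytearray(height * col_count)
    for row in range(height):
        src = row * width + col_start   # strided read from the full image
        dst = row * col_count           # consecutive write into the chunk
        packed[dst:dst + col_count] = image[src:src + col_count]
    return bytes(packed)


image = bytes(range(12))                      # 4 columns x 3 rows
print(list(pack_chunk(image, 4, 3, 1, 2)))    # [1, 2, 5, 6, 9, 10]
```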

In addition, because image processing on the device 30 (GPU 31) is accelerated, the printing apparatus 10 of the present embodiment can improve throughput up to printing by the printer 40. This matters especially for a printer 40 with a line head 70, which prints at high speed: faster image processing on the device 30 (GPU 31) can shorten, or eliminate, the time such a printer 40 spends waiting for image processing during printing.

<Modification>
One embodiment of the present invention has been described above, but the invention can be modified in various other ways, as described below.

The above embodiment describes smoothing as the processing whose dependency exists in a predetermined direction. However, such processing is not limited to smoothing, and the invention may be applied to various kinds of processing. For example, processing whose dependencies extend in two dimensions may be restricted so that the dependency arises in only one direction, and the invention may then be applied to the restricted processing. Examples include error diffusion, various filters such as Laplacian filtering, bilinear interpolation, and nearest-neighbor interpolation.
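As an illustration of restricting a two-dimensional dependency to one direction, the following Python sketch binarizes a row with error diffusion that pushes all quantization error to the next pixel on the right, instead of spreading it to several neighbors as Floyd-Steinberg does. The carried error then plays the role of the boundary result handed from one chunk to the next. The threshold, output levels, and function name are assumptions.

```python
def diffuse_row(row, carry_error=0, threshold=128, low=0, high=255):
    """Binarize one row (or one chunk of a row) left to right.

    All quantization error is passed to the next pixel in the same row, so
    the only state crossing a chunk boundary is a single error value.
    Returns (output_levels, error_for_next_chunk).
    """
    out = []
    err = carry_error
    for x in row:
        value = x + err
        level = high if value >= threshold else low
        out.append(level)
        err = value - level
    return out, err


row = [100, 100, 100, 100]
whole, _ = diffuse_row(row)
left, carried = diffuse_row(row[:2])
right, _ = diffuse_row(row[2:], carried)
print(whole)         # [0, 255, 0, 255]
print(left + right)  # [0, 255, 0, 255]
```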

The above embodiment describes the case where the functions of the image processing apparatus are implemented in the computer 20 of the printing apparatus 10. However, these functions may be implemented elsewhere. For example, when the printer 40 can be connected to the device 30, the printer control unit 80 and the device 30 together constitute the image processing apparatus. When the control unit of the printer 40 itself includes the device, that control unit constitutes the image processing apparatus.

The above embodiment describes a computer 20 with one host H and one device 30. However, there may be more than one host H and more than one device 30. In particular, multiple devices 30 can further increase image processing speed.
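The text does not say how work is shared among several devices. Because rows are independent when the dependency runs along rows, one plausible scheme, sketched here in Python purely as an assumption, is to give each device a contiguous band of rows and let each device run the chunk-by-chunk loop on its own band.

```python
def split_rows(height, num_devices):
    """Give each device a contiguous, near-equal band of rows (hypothetical)."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(height, num_devices)
    bands, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)  # spread the remainder
        bands.append((start, start + size))
        start += size
    return bands


print(split_rows(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```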

Nor is the device 30 limited to a specific model: any device 30 may be used as long as it can execute a very large number of threads (for example, thousands to tens of thousands) in parallel.

The image processing apparatus and image processing method described above can be realized by installing on the computer 20 a program that performs the operations described above.

In the above embodiment, the graphics memory 32 is given as an example of the intermediate buffer. However, at least one of the constant cache 312, the texture cache 313, and the shared memory 314 may serve as the intermediate buffer instead. A memory outside the device 30 may also be used as the intermediate buffer.

The above embodiment describes processing image data, but the invention is not limited to converting image data into print data; it can also be applied to data in fields such as computational physics, video and image processing, database management, and biotechnology.

The above embodiment uses the inkjet printer 40 as an example. However, the liquid ejecting apparatus is not limited to the inkjet printer 40. For example, the invention can be applied to a gel-jet printer, or to a liquid ejecting apparatus that ejects a liquid in which particles of a functional material are dispersed, or a granular material such as a gel. The printer 40 of the above embodiment may also be part of a multifunction device that has functions other than printing (scanning, copying, and so on).

The invention can, of course, also be applied to printers using methods other than inkjet (laser, dot impact, and so on).

DESCRIPTION OF SYMBOLS
10 ... printing apparatus
20 ... computer
21 ... CPU
21a ... memory control unit
22 ... main memory
22a ... supply data buffer
22b ... intermediate data buffer
23 ... HDD
23c ... printer driver program
30 ... device
31 ... GPU
32 ... graphics memory (an example of the intermediate buffer)
40 ... printer
70 ... line head
80 ... printer control unit
310 ... texture processor cluster
311 ... streaming multiprocessor
316 ... streaming processor
B ... boundary portion
D ... divided image data
H ... host
P ... print medium

Claims

An image processing apparatus that performs, by running a plurality of threads in parallel, image processing on input image data in which a dependency of the image processing exists in a predetermined direction, the apparatus comprising:
a device capable of launching a plurality of the threads and executing the processing in the threads in parallel;
an intermediate buffer capable of storing results of image processing by the device; and
a host that controls the amount of input image data to be processed by the device,
wherein
the host executes a process of creating divided image data by dividing the input image data so that a boundary portion of the division appears in a direction orthogonal to the direction in which the dependency of the image processing exists,
the divided image data is supplied to the device through that process,
the device executes the image processing on the divided image data in parallel by launching a plurality of threads along the direction orthogonal to the direction in which the dependency of the image processing exists, and
the device further stores, in the intermediate buffer, the processing result corresponding to the boundary portion of the division among the results of the image processing performed by the device.

The image processing apparatus according to claim 1,
wherein the image processing in which the dependency exists in a predetermined direction is smoothing.

The image processing apparatus according to claim 1 or 2,
wherein the intermediate buffer is provided in the device, and
the host receives information on the storage capacity of the intermediate buffer from the device and executes the process of creating the divided image data based on the information on the storage capacity.

The image processing apparatus according to claim 3,
wherein the host includes a memory control unit, and
the memory control unit executes the process of creating the divided image data by controlling transmission of the input image data to the device.

The image processing apparatus according to claim 3,
wherein the host creates the divided image data from the input image data and sends the created divided image data to the device.

An image processing method for performing, by running a plurality of threads in parallel, image processing on input image data in which a dependency of the image processing exists in a predetermined direction, the method using:
a device capable of launching a plurality of the threads and executing the processing in the threads in parallel;
an intermediate buffer capable of storing results of image processing by the device; and
a host that controls the amount of input image data to be processed by the device,
the method comprising:
causing the host to execute a process of creating divided image data by dividing the input image data so that a boundary portion of the division appears in a direction orthogonal to the direction in which the dependency of the image processing exists;
supplying the divided image data to the device through that process;
executing the image processing on the divided image data in the device in parallel by launching a plurality of threads along the direction orthogonal to the direction in which the dependency of the image processing exists; and
storing, in the intermediate buffer, the processing result corresponding to the boundary portion of the division among the results of the image processing performed by the device.