JP2010073210A

JP2010073210A - Image processing apparatus

Info

Publication number: JP2010073210A
Application number: JP2009216016A
Authority: JP
Inventors: Yusuke Suzuki; 裕介鈴木
Original assignee: Toshiba Corp; Toshiba TEC Corp
Current assignee: Toshiba Corp; Toshiba TEC Corp
Priority date: 2008-09-17
Filing date: 2009-09-17
Publication date: 2010-04-02

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that can increase throughput by running processors in parallel.

SOLUTION: In an image processing method, image data developed in a memory is divided lengthwise; each piece of the divided image data is JPEG (Joint Photographic Experts Group) compressed in parallel; a prescribed amount of code data is written asynchronously every time that amount of compressed data has accumulated; and information identifying which processor performed each write is recorded in the memory.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image processing apparatus that can increase throughput by running processors in parallel.

Image processing apparatuses known as MFPs (Multi-Functional Peripherals) are in widespread use.

MFPs face ever-increasing demands for faster image processing, such as printing, and for higher resolution. Many MFPs that add facsimile, image scanner, and similar functions are also on the market, so the need to process large volumes of digital image data at high speed keeps growing.

The various kinds of image processing performed in an MFP generally require a large number of operations to be carried out as quickly as possible. An effective way to raise processing speed is to execute the individual operations in parallel and thereby improve overall throughput, and many apparatuses and methods of this kind have already been proposed.

Patent Document 1 (a prior art document) proposes a technique in which an image is generated as packets (data strings containing an image subdivision, attributes, and so on) and the packets are processed in parallel by a plurality of data-flow image processing units.

Patent Document 2 (a prior art document) proposes an image processing apparatus that has a plurality of processing units execute processing in parallel for each drawing object (graphic, text, image, etc.).

Several other methods that parallelize processing units have also been proposed as effective ways to process large amounts of image data at high speed. In these methods, a plurality of image packets are compressed packet by packet, so each resulting compressed image packet (compressed packet) has a small amount of data and can be handled as a single, independently processable unit (image unit). When the compressed image packets are small, they are convenient to use and store in the MFP and to transfer over a communication network.

Various algorithms exist for compressing image packets, but in recent years the JPEG (Joint Photographic Experts Group) method has become widespread.

A major advantage of adopting JPEG image compression in an MFP is that JPEG is a standard method widely used in the market. Hardware circuits and the like that have been refined through accumulated design and development can be adopted, reliability and stability are high, and the format used for data transfer inside the apparatus is fixed, so design reliability, extensibility, and reusability are high for both hardware and software.

Thus, an image processing apparatus that divides the image data of an original image into a plurality of image packets, applies JPEG image compression (hereinafter abbreviated as "JPEG compression") to each image packet, and processes the resulting compressed image packets in parallel can transfer and process data at high speed and handle it with efficient use of capacity.

Patent Document 3 (a prior art document) proposes, for JPEG compression of a one-page image, a method that groups Mcus (Minimum code units), the unit of compression, and processes the data in parallel while adding to the compressed packets a restart marker that resets the dependency between Mcus for each group's compressed code data.

The JPEG compression described in Document 1 or Document 2 has the advantages above. However, the compressed data length is variable and the compressed data items depend on one another, so both compression and decoding must proceed from the head of the image. This makes it very difficult to speed up processing by parallelizing it across multiple processors.

The method disclosed in Document 3, on the other hand, pads the code data to a fixed length every time the encoded data of an Mcu group is written out, so as the number of Mcu groups grows, data redundancy increases and the compression ratio falls. In addition, the end of data processing must be confirmed for each Mcu group. When the processing time differs from processor to processor depending on the data being processed, a processor that has finished must wait for the others to finish, processor utilization drops, and the time required for compression and decompression is known to increase as a result.

An object of the present invention is to provide means and an image format, based on JPEG, for compressing and decompressing images at high speed and with a high compression ratio in a system having a plurality of processors. Another object of the present invention is to provide an MFP that can compress and decompress large image data at high speed by parallel processing in a many-core or multiprocessor system.

The present invention has been made in view of the above problems. It provides an image processing method in which image data of a plurality of regions, obtained by dividing an image in a matrix along two mutually orthogonal directions, are compressed independently and in parallel by two or more processors using the JPEG method, wherein each piece of image data includes information identifying the processor that compressed the image data of that region.

According to one embodiment of the present invention, large image data can be compressed and decompressed at high speed by parallel processing in a many-core or multiprocessor system.

That is, when the image compression/decompression method of the present invention is applied to a system equipped with a multi-core CPU and compression or decompression is performed in parallel, the execution rate of the processing run on each core can be kept high and the overhead of synchronization for memory access can be kept low.

Even when compression and decompression are performed on different systems, the execution rate of the system with the higher degree of parallelism can be raised, so the total throughput of compression and decompression can be improved.

In addition, reducing the number of swap operations between the primary cache memory inside each core and the secondary cache memory accessed in common by a plurality of cores prevents the processing speed from dropping.

Likewise, performing DMA transfers between the secondary cache memory and the shared memory with an appropriate data size prevents the processing speed from dropping.

Consequently, compression and decompression by a multi-core CPU can be executed at high speed without loss of image quality.

Schematic diagram showing an example of variable-length image compression to which the present invention is applied.
Schematic diagram explaining the compression method of the present invention, which allows compression and decompression by multiple PEs.
Block diagram of the entire system according to an embodiment of the present invention.
Flowchart of the overall processing according to the embodiment of the present invention.
Flowchart of the compression processing according to the embodiment of the present invention.
Flowchart of the decompression processing according to the embodiment of the present invention.
Block diagram of the entire system according to another (second) embodiment of the present invention.
Schematic diagram explaining memory operations between the primary and secondary caches in the system according to the (second) embodiment shown in FIG. 7.
Schematic diagram explaining memory operations between the primary and secondary caches in the system according to the (second) embodiment shown in FIG. 7.
Block diagram of the entire system according to another (third) embodiment of the present invention.
Schematic diagram explaining the method of determining the number of parallel divisions, and an example arrangement of compressed data, in the (third) embodiment shown in FIG. 9.
Schematic diagram explaining an example of processing by a CPU with fewer cores than the number of divisions in the (third) system shown in FIG. 9.
Schematic diagram explaining another example (including the characteristic improvement of the present application) of processing by a CPU with fewer cores than the number of divisions in the (third) system shown in FIG. 9.
Flowchart of processing in the client PC in the (third) system shown in FIG. 9.
Flowchart explaining the "parallel division number determination processing" within the client PC processing shown in FIG. 13.
Flowchart of processing in the printer controller in the (third) system shown in FIG. 9.
Memory usage image of processing according to another (fourth) embodiment of the present invention.
Schematic diagram explaining (an image of) the data that one of the cores places in the shared memory under the method shown in FIG. 16.
Flowchart explaining the "division number determination processing" executed in the (fourth) system shown in FIG. 16.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of the most basic system according to an embodiment of an image processing apparatus (MFP, Multi-Functional Peripheral) to which the present invention can be applied.

The image processing apparatus (hereinafter, the system) 1 shown in FIG. 1 includes a plurality of devices interconnected via a system bus 11: for example, a CPU (main control unit) 21, a shared memory 23, an HDD (Hard Disc Drive, mass storage device) 25, and n PEs (processing units) PE1 to PEn, where n is a positive integer.

The system bus 11 is used for data input/output with the outside and for internal data communication.

Each of the processors PE1 to PEn connected within the system can write to and read from the shared memory 23.

Next, the widely used JPEG (Joint Photographic Experts Group) compression method applicable to the system 1 shown in FIG. 1 is described, followed by the compression method proposed in the present application, which makes it faster.

FIG. 2 briefly illustrates the encoding scheme of JPEG compression.

In JPEG compression, image data is compressed in groups of pixels called Mcus (Minimum code units), the smallest unit of compression. In the example of FIG. 2, an image 16 pixels high and 16 pixels wide is compressed in Mcus 8 pixels high and 8 pixels wide.

The image data to be compressed is processed one Mcu row at a time.

For the 64 pixels in each Mcu block, the RGB input image data is color-converted to YCC luminance/color-difference data, each YCC component is converted to frequency components by DCT, each frequency component is quantized using a quantization table, and finally the quantized values are encoded by Huffman coding.
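The first three stages of this per-Mcu pipeline can be sketched in Python for a single 8x8 block of one component. This is only an illustration under stated assumptions: the text does not say which YCC variant is used, so the JFIF conversion constants are assumed here, and the flat quantization table and all function names are hypothetical. The Huffman stage is omitted.

```python
import math

def rgb_to_ycc(r, g, b):
    """Convert one RGB pixel to YCbCr (JFIF constants, an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, table):
    """Divide each frequency coefficient by its table entry and round."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]

# A uniformly grey block: every pixel is RGB (200, 200, 200).
# Luminance samples are level-shifted by 128 before the DCT.
luma = [[rgb_to_ycc(200, 200, 200)[0] - 128 for _ in range(8)]
        for _ in range(8)]
flat_table = [[16] * 8 for _ in range(8)]  # hypothetical quantization table
quantized = quantize(dct_8x8(luma), flat_table)
```

For a uniform block, all of the energy ends up in the DC coefficient `quantized[0][0]` and every AC coefficient quantizes to zero, which is why the handling of the DC component matters for the compression ratio.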

When the Huffman codes are stored, the DC component of each Mcu block produced by the DCT is stored as the difference from the DC component of the previously compressed Mcu. Because the DC components of adjacent Mcus tend to be similar, the differences are small, the stored data is shorter, and the compression ratio improves.
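A minimal Python sketch of this differential storage of DC components follows. The function names and sample values are illustrative, and the predictor is assumed to start at 0, as in baseline JPEG.

```python
def dc_to_diffs(dc_values):
    """Replace each DC value by its difference from the previous one."""
    prev = 0  # assumed initial predictor
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def diffs_to_dc(diffs):
    """Invert dc_to_diffs: a running sum restores the DC values."""
    prev = 0
    dcs = []
    for d in diffs:
        prev += d
        dcs.append(prev)
    return dcs

dc = [36, 38, 37, 40]      # similar DC values of neighbouring Mcus
diffs = dc_to_diffs(dc)    # small values after the first entry
restored = diffs_to_dc(diffs)
```

Note that `diffs_to_dc` has to start from the first element: each restored value depends on all earlier ones, which is the dependency the following paragraphs discuss.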

However, this scheme creates dependencies between the Mcu compression codes. As illustrated, the Mcu code data of the 16×16 image has a specific dependency relationship: "Mcu(0,1)" is adjacent to "Mcu(0,0)", and "Mcu(1,0)" lies on the side of "Mcu(0,1)" opposite "Mcu(0,0)".

When image data compressed in this way is decompressed, the following are performed in order, starting from the code data for the first Mcu block of the compressed image: Huffman decoding, calculation of the DC component from the DC component of the adjacent Mcu, inverse quantization, inverse DCT, and color conversion from YCC to RGB.

FIG. 3 shows a compression method, and the corresponding decompression of the compressed data, that can share some or many elements (proven hardware circuits and in-apparatus data transfer formats) with the widely used JPEG compression described above while running faster. In the example shown, two of the n processors PE1 to PEn of FIG. 1, PE1 and PE2, perform compression and decompression in parallel to raise the processing speed.

Taking the direction in which addresses are contiguous when the image is developed in the image data memory as the horizontal direction, and the direction perpendicular to it as the vertical direction, the image data is divided vertically at fixed intervals, and each resulting region is assigned to one of the processors PE1 and PE2 for compression.

Unlike the JPEG described above, each of the processors PE1 and PE2 keeps the Mcu DC differences closed within its assigned region during compression: on reaching the end of the region, it uses that Mcu's DC component as the base for computing the first element of the next Mcu row in the region.

This eliminates data dependencies between the divided regions.
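The region-local DC prediction can be sketched as follows. This is a sketch under assumptions: the regions are taken to be contiguous ranges of Mcu columns, the even split with the remainder spread over the first PEs is one possible policy rather than one stated in the text, and the function names and DC values are hypothetical.

```python
def split_columns(mcu_cols, num_pe):
    """Split Mcu columns 0..mcu_cols-1 into num_pe contiguous ranges."""
    base, extra = divmod(mcu_cols, num_pe)
    ranges, start = [], 0
    for pe in range(num_pe):
        width = base + (1 if pe < extra else 0)
        ranges.append((start, start + width))
        start += width
    return ranges

def strip_dc_diffs(dc_grid, col_range):
    """DC differences for one region, scanned row by row inside it only."""
    first, last = col_range
    prev, diffs = 0, []
    for row in dc_grid:
        for col in range(first, last):
            diffs.append(row[col] - prev)
            prev = row[col]  # the last Mcu of a row seeds the next row
    return diffs

# DC components of the four Mcus of a 16x16 image: dc_grid[row][col].
dc_grid = [[10, 50],
           [12, 53]]
ranges = split_columns(2, 2)
per_pe = [strip_dc_diffs(dc_grid, r) for r in ranges]
```

Here PE1 encodes Mcu(0,0) then Mcu(1,0), and PE2 encodes Mcu(0,1) then Mcu(1,1). Neither list of differences refers to a value from the other region, so the two can be produced and decoded independently.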

When the processors PE1 and PE2 compress their assigned regions as above, the compressed data length depends on the image data being compressed, so even processors running at the same speed fill their local memories (also called local buffers) at different rates.

In the present invention, each processor in charge of a region writes out its data as soon as a fixed amount of compressed data has accumulated in its local memory, so the method can be implemented with little local memory. The fixed amount written out should be a value optimized for the medium the system writes to.

As FIG. 3 shows, when compressed data is written to the HDD 25 of the system 1, the amount of compressed data accumulated in the local buffer is matched to the size in which the system 1 writes data to the HDD 25. For example, if the system 1 writes to the HDD 25 in units of 2 kbytes ([kb]), the local buffer accumulates 2 kbytes ([kb]) of compressed data.

When the compressed data accumulated in the local buffer is written out, data identifying which processor performed the write is written out as well. This data is stored in an area separate from the code data and is retained until compression of the entire image is complete.
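The buffering and write logging described above can be modelled in a few lines of Python. This is a single-threaded sketch: the class name, the list standing in for the storage medium, and the byte patterns are hypothetical, and the 2048-byte unit follows the HDD example in the text.

```python
CHUNK = 2048  # A: bytes per write, 2 kbytes as in the HDD example

class PeWriter:
    """Local buffer of one PE that flushes in fixed-size chunks."""

    def __init__(self, pe_id, out_chunks, pe_log):
        self.pe_id = pe_id
        self.buf = bytearray()        # local memory of this PE
        self.out_chunks = out_chunks  # stand-in for the storage medium
        self.pe_log = pe_log          # "write PE data": who wrote each chunk

    def put(self, code_bytes):
        """Append code data and write out every complete chunk."""
        self.buf += code_bytes
        while len(self.buf) >= CHUNK:
            self.out_chunks.append(bytes(self.buf[:CHUNK]))
            self.pe_log.append(self.pe_id)
            del self.buf[:CHUNK]

chunks, pe_log = [], []
pe1 = PeWriter(1, chunks, pe_log)
pe2 = PeWriter(2, chunks, pe_log)
pe1.put(b"\x11" * 3000)  # one chunk written, 952 bytes stay buffered
pe2.put(b"\x22" * 2048)  # exactly one chunk
pe1.put(b"\x11" * 1200)  # 952 + 1200 = 2152: second chunk, 104 bytes left
```

Because every write has the same size, the log only needs one processor identifier per chunk; the position of a chunk follows from its index in the log.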

After a processor has compressed the whole image in its assigned region, it writes out the remaining compressed data as described above and finishes.

When compression of all regions is complete, the accumulated write PE data, the compressed image parameters (width, height, color/monochrome), and the offset at which the write PE data was written are written out.
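One possible byte layout for the end of such a file is sketched below. The text does not specify field widths or byte order, so the one-byte PE identifiers and the 8-byte little-endian offset are purely hypothetical, and the image parameters are left out for brevity.

```python
import struct

CHUNK = 2048  # assumed size of each written chunk

def build_file(chunks, pe_log):
    """Concatenate code chunks, then the write PE data, then its offset."""
    body = b"".join(chunks)
    pe_offset = len(body)                   # where the write PE data starts
    pe_data = bytes(pe_log)                 # one byte per chunk (assumption)
    trailer = struct.pack("<Q", pe_offset)  # 8-byte offset (assumption)
    return body + pe_data + trailer

def read_pe_log(blob):
    """Read the offset from the end of the file, then the write PE data."""
    (pe_offset,) = struct.unpack("<Q", blob[-8:])
    return list(blob[pe_offset:-8])

blob = build_file([b"\x11" * CHUNK, b"\x22" * CHUNK, b"\x11" * CHUNK],
                  [1, 2, 1])
recovered = read_pe_log(blob)
```

Putting the offset last lets a reader find the write PE data with a single read from the end of the file, without scanning the variable-length code data.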

This completes a compressed file containing data compressed from an image divided vertically into strips.

Next, decompression of the compressed image file is described.

In decompression, each processor reads the offset at which the write PE data was written from the end of the file, and then reads the write PE data. Referring to the write PE data, each processor reads the encoded data that it wrote itself and performs, in order, Huffman decoding, calculation of the DC component of the current Mcu from the DC component of the adjacent Mcu, inverse quantization, inverse DCT, and color conversion from YCC to RGB.

When a processor runs out of compressed data during processing, it refers to the write PE data and reads the next block of compressed data written by its own PE.

The decoded image data is written to the memory area (shared memory) 23 accessible from every processor.
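The way a processor picks out its own data during decompression can be sketched as follows. The chunk size is shrunk to 4 bytes to keep the example readable, and the function name and sample bytes are hypothetical.

```python
CHUNK = 4  # tiny A for the example

def chunks_for_pe(code_area, pe_log, pe_id, chunk=CHUNK):
    """Yield, in file order, the chunks that pe_id wrote."""
    for index, writer in enumerate(pe_log):
        if writer == pe_id:
            yield code_area[index * chunk:(index + 1) * chunk]

# Interleaved code data: PE1 wrote chunks 0 and 2, PE2 wrote 1 and 3.
code_area = b"AAAA" b"BBBB" b"aaaa" b"bbbb"
pe_log = [1, 2, 1, 2]
pe1_stream = b"".join(chunks_for_pe(code_area, pe_log, 1))
pe2_stream = b"".join(chunks_for_pe(code_area, pe_log, 2))
```

Using a generator mirrors the behaviour in the text: a processor fetches its next chunk only when it has consumed the previous one.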

More specifically, as illustrated, compression proceeds as follows:
Compression <1>: each PE compresses the image in the vertical direction
PE1: CodeMcu(0,0) CodeMcu(1,0)
PE2: CodeMcu(0,1) CodeMcu(1,1)
Compression <2>: each PE writes out code data every time it has compressed A [kb]

Decompression, on the other hand, proceeds as follows:
Decompression <1>: load data A [kb] at a time according to the write PE data
Decompression <2>: decode and write out the decoded image
PE1 write PE data
PE2 write PE data

FIG. 4 is a flowchart of the overall processing according to the embodiment of the present invention.

The system receives an image input request or output request from outside [ACT01].

When an input request is received [ACT01-YES], information about the input image file is written to the header of the compressed file [ACT02].

Next, the image area is divided vertically and the parts are assigned to the processors, and each processor is given its region information and instructed to start compression [ACT03].

Each processor then compresses its assigned region by performing the processing shown in the flowchart of FIG. 5, described below [ACT04].

A processor performing compression writes out write PE data together with the compressed code [ACT05].

After all processors have finished, this data is written following the compressed code, and finally the offset at which the write PE data is located is written out, completing the compression [ACT06].

When an output request is received [ACT01-NO], each processor is instructed to decode [ACT07]. Each processor so instructed then decompresses the image (data) by performing the processing in the flowchart of FIG. 6, described later.

That is, the following are executed in order:
Write the image header information to the file
Assign a vertical division region to each PE and instruct it to start compression
Wait for compression to finish on each processor
Write out the write PE data
Write out the write PE data offset
Instruct each PE to start decompression

FIG. 5 is a flowchart of the processing performed by each processor during the compression described with reference to FIG. 4.

Each processor obtains information indicating the region of the image that it is to compress. The region is specified in the form "column x to column y" [ACT11].

The image data to be processed is copied from the image data in the shared memory to the local memory and compressed. Compression starts at the 8×8 area at the left end of the region (Mcu(0,0) in FIG. 2) and proceeds horizontally across the image. On reaching the end of the region (Mcu(0,1) in FIG. 2), processing resumes at the left end, one Mcu height (8 pixels) further down (Mcu(1,0) in FIG. 2). During processing, the compressed code data is stored temporarily in the local memory.

Each time a fixed amount (A [kb]) of Mcu data has been compressed, the amount of code written to the local memory is checked, and if it exceeds the fixed amount, the code data is written out. To avoid writing to the same area as another processor, the data is written exclusively using a write-management semaphore [ACT12] to [ACT13-YES] to [ACT16-YES] to [ACT17] to [ACT18-NO].
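The exclusive write can be illustrated with Python threads standing in for the PEs. This is a sketch only: `threading.Semaphore` plays the role of the write-management semaphore, the shared `bytearray` stands in for the output area, and the chunk contents are dummy bytes rather than real code data.

```python
import threading

CHUNK = 64                             # small A for the example
write_sem = threading.Semaphore(1)     # write-management semaphore
shared_out = bytearray()               # stand-in for the output area
pe_log = []                            # write PE data, one entry per chunk

def flush(pe_id, chunk):
    """Append one chunk and its writer id while holding the semaphore."""
    with write_sem:
        shared_out.extend(chunk)
        pe_log.append(pe_id)

def compress_strip(pe_id, n_chunks):
    """Pretend to compress a region, producing n_chunks full chunks."""
    for _ in range(n_chunks):
        chunk = bytes([pe_id]) * CHUNK  # dummy code data
        flush(pe_id, chunk)

threads = [threading.Thread(target=compress_strip, args=(pe, 5))
           for pe in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The order of entries in `pe_log` varies from run to run, but because the chunk and its log entry are appended under the same semaphore, entry i always names the writer of chunk i.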

When compression of the target image data is complete [ACT14-YES], the amount of code in the code buffer is checked; if it falls short of the fixed amount, the shortfall is filled with invalid data and the data is then written [ACT15].
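A minimal sketch of this final padding step follows. The fill byte is an assumption, since the text only says "invalid data", and the function name is hypothetical.

```python
CHUNK = 2048  # A: fixed write size in bytes
FILL = 0x00   # hypothetical "invalid data" byte

def pad_last_chunk(buf, chunk=CHUNK, fill=FILL):
    """Pad the remaining code data up to a whole chunk."""
    shortfall = -len(buf) % chunk  # 0 when buf is empty or already full
    return bytes(buf) + bytes([fill]) * shortfall

padded = pad_last_chunk(b"\x11" * 104)
```

Padding keeps every write the same size, so the position of any chunk can still be derived from its index in the write PE data. A decoder must be able to tell where the real code data ends, for example from the image dimensions in the header.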

That is, the following routine is executed:
Obtain the image column information to process
Compression processing
Compressed data at least A [kb]?
Compression complete?
Pad with invalid data up to A [kb]
Can data be written?
Write out A [kb] of compressed data and the write PE data
Compression complete?

FIG. 6 is a flowchart of the processing performed by each processor when decompressing the code data compressed by the per-processor processing described with reference to FIG. 4.

In decompression, each processor reads the image information written in the compressed data. This data includes information about the image (width, height, color/monochrome) and about the compression (which processor compressed which region) [ACT21].

Next, the processor loads the offset information for the write PE data, written at the end of the data, and then loads the write PE data located by that offset [ACT22]. From the image information and the write PE data, the processor determines which region of the image it is to decompress, and decodes the image while loading the compressed data it must process according to the write PE data [ACT23], [ACT24].

During decoding, the image is restored Mcu by Mcu. Each processor restores a fixed number of lines N of the image in its local memory [ACT25], [ACT26] and then writes the image data to the shared memory [ACT27].

When all processors have finished, the image has been developed in the shared memory and processing ends [ACT28-YES].

In other words, processing is performed in the following order:
Obtain image information
Read the write PE data
Read A [kb] of compressed data
Signal processing
No compressed data left?
Write out N lines
Decoding complete?

このように、入力画像の圧縮に関して、画像メモリ上に展開される画像を副走査方向に分割し、それぞれの領域をプロセッサに割り当て並列に符号化処理させることで、メニーコアプロセッサや、マルチＣＰＵ環境で、従来の方式に比べて高速な圧縮処理が実現できる。また、圧縮データ書き出しも、各プロセッサで圧縮された符号データが保存に適したサイズまでたまった段階ですぐに書き込むことができ、オーバーヘッドが少なく、高速化が実現できる。 As described above, with respect to compression of the input image, the image developed on the image memory is divided in the sub-scanning direction, and each area is assigned to the processor and encoded in parallel. Compared to the conventional method, high-speed compression processing can be realized. In addition, the compressed data can be written out immediately after the code data compressed by each processor has accumulated to a size suitable for storage, and there is little overhead and high speed can be realized.

FIG. 7 is a block diagram of a system for realizing another image compression method applicable to an image processing apparatus (MFP) to which the present invention can be applied. The system shown in FIG. 7 is characterized in that, in the image processing described above with reference to FIGS. 1 to 6, the size of the data each processor writes out as compressed image data is n times the cache memory line length (n ≧ 1).

In FIG. 7, a system 101 comprises: a plurality (n, where n is a positive integer) of processor cores PE1, PE2, ..., PEn-1, PEn, each having a primary cache memory; secondary cache memories 111a, ..., 111n (n is a positive integer), which are directly connected to the processor cores PE1 to PEn and to and from which data is written and read; a memory controller 103 that controls data writes to and reads from each memory; a shared memory 105 accessed by all processor cores PE1 to PEn in the system 101 via the primary cache memories and the secondary cache memories 111a to 111n; an HDD 107 that stores image data, encoded data (compressed image data), parameters for the compression processing, and programs executed by the processor cores; and a system bus 109 that carries data among the secondary cache memories, the HDD, and the shared memory. As one example, each of the secondary cache memories 111a to 111n is connected to the system bus 109 in units of two processor cores.

本実施形態において処理される画像データ、並びに画像データを圧縮した符号化データは、共有メモリ上１０５に格納される。 The image data processed in the present embodiment and the encoded data obtained by compressing the image data are stored in the shared memory 105.

The primary cache memory, the secondary cache memory, and the shared memory are, in this order, progressively slower to access from the CPU and progressively larger in storage capacity.

The cache memory stores data in fixed-length units called lines, and reads and writes are performed with a line as the minimum unit. To rewrite one byte of data, the byte is read together with the other data on the same line, the required byte is rewritten in a register in the processor core, and the line is then written back as a whole.

複数のプロセッサコアからなるシステムにおいて、メモリコントローラ１０３はメモリ上に保持されたデータがプログラムに記載されたとおりに保持されるような一貫性を保つためにコヒーレント処理という処理を実施する。 In a system composed of a plurality of processor cores, the memory controller 103 performs a process called coherent processing in order to maintain consistency so that data held in the memory is held as described in the program.

図８Ａは、図７に示したシステムを用いた１次、２次キャッシュメモリ間のメモリ操作に関する一例を示す。 FIG. 8A shows an example of a memory operation between the primary and secondary cache memories using the system shown in FIG.

For example, let processor core 1 and processor core 2 be the first and second processor cores from the left in FIG. 7, PE1 and PE2, and suppose that processor core 1 (PE1) and processor core 2 (PE2) have each read data at adjacent addresses. Suppose also that the two adjacent pieces of data lie on the same line of the secondary cache memory.

この同一ライン上にあるデータは、プロセッサコア１とプロセッサコア２内の１次キャッシュメモリ上に同じデータとしてコピーされる。 The data on the same line is copied as the same data on the primary cache memory in the processor core 1 and the processor core 2.

If processor core 1 subsequently rewrites this data, the data held in the primary cache memories of processor core 1 and processor core 2 no longer match: processor core 2 holds stale data that does not reflect the change made by processor core 1.

For this reason, the memory controller 103 monitors memory accesses from each processor core and, when data in memory has been rewritten as described above, rewrites the data areas that are no longer consistent. That is, when processor core 1 (PE1) rewrites its primary cache memory, the memory controller 103 detects that the copy held in processor core 2 (PE2) is no longer consistent and overwrites the data in the primary cache of processor core 2 (PE2) with the data of the corresponding cache line in the primary cache memory of processor core 1 (PE1).

このように、キャッシュメモリの一貫性を保つためのコヒーレント処理は、ラインと呼ばれるデータ単位で処理される。 As described above, coherent processing for maintaining the consistency of the cache memory is performed in units of data called lines.

In practice, however, even when processor cores 1 and 2 perform processing that shares no data, the coherent processing described above is carried out if the data they use lies on the same cache line. This causes no logical problem, but coherent processing involves copying data between different memories, and while it is in progress the processor cores cannot access the affected area.

Because coherent processing performed for processing that shares no data incurs a large overhead and amounts to a logically unnecessary data copy, it is called false sharing.

図８Ｂは、図７に示したシステムを用いた１次、２次キャッシュメモリ間のメモリ操作に関する別の一例を示す。すなわち、図８Ｂは、図８Ａにより説明した実施形態において生じるフォルスシェアリングに関連して各プロセッサコアの実行効率が低下することを避けるため、圧縮データを書き込む際のサイズを、２次キャッシュメモリのキャッシュラインサイズのＮ倍とすることを特徴とするものである。 FIG. 8B shows another example of the memory operation between the primary and secondary cache memories using the system shown in FIG. That is, FIG. 8B shows the size when writing the compressed data in the secondary cache memory in order to avoid a decrease in the execution efficiency of each processor core in relation to the false sharing that occurs in the embodiment described with reference to FIG. 8A. It is characterized by N times the cache line size.

In FIG. 8B, the size of the encoded-data writes described with reference to FIG. 8A is set to an integral multiple of the cache memory line size, so that the areas of the shared memory used by the cores sharing that memory never straddle a cache line. As a result, no core reads data used by another processor when it accesses the memory. The coherent operation performed by the memory controller 103 therefore does not occur, no wasteful overhead arises, and processing can be performed at high speed.

More specifically, in FIG. 8B, when processor core 1 (PE1) and processor core 2 (PE2) are compressing their assigned regions in parallel, processor core 1 writes one cache line's worth of encoded data to memory once its compression has progressed far enough for the encoded data to reach the cache line size. The encoded data is thereby written out to the secondary cache memory.

If this write size does not match the cache line size, coherent processing is triggered for the other CPU cores that write encoded data, as described above, and performance, particularly processing speed, drops significantly. In the method shown in FIG. 8B, however, the memory write size is restricted to an integral multiple of the cache line size, so the coherent operation does not occur and the overhead of synchronization between processors can be reduced.
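The effect of the write size on cache-line sharing can be checked with a small Python sketch. The 64-byte line size and the function names are assumptions for illustration; a write is modeled as an `(offset, size)` pair in bytes.

```python
def cache_lines(offset, size, line_size=64):
    """Return the set of cache-line indices touched by a write."""
    first = offset // line_size
    last = (offset + size - 1) // line_size
    return set(range(first, last + 1))


def falsely_shared(write_a, write_b, line_size=64):
    """True if two non-overlapping writes touch at least one common line."""
    lines_a = cache_lines(*write_a, line_size)
    lines_b = cache_lines(*write_b, line_size)
    return bool(lines_a & lines_b)


# 100-byte writes placed back to back share cache line 1.
print(falsely_shared((0, 100), (100, 100)))
# Writes of 2 x 64 bytes placed back to back never share a line.
print(falsely_shared((0, 128), (128, 128)))
```

When every write starts on a line boundary and its size is a multiple of the line size, the line sets of consecutive writes are disjoint, which is the property the FIG. 8B method relies on.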

図９は、この発明が適用可能な画像処理装置（ＭＦＰ）に適用可能な、さらに別の画像圧縮方式を実現するためのシステムのブロック図である。 FIG. 9 is a block diagram of a system for realizing still another image compression method applicable to an image processing apparatus (MFP) to which the present invention is applicable.

図９に示す実施形態は、圧縮処理、伸張処理を実施する装置の両方、もしくは、いずれかがマルチコアのプロセッサを具備する際、画像データの並列分割数を、これらのプロセッサ数に応じて切替えることにより、高いスループットを実現することを特徴とするものである。 The embodiment shown in FIG. 9 switches the number of parallel divisions of image data in accordance with the number of processors when either or both of the apparatuses that perform compression processing and decompression processing have multi-core processors. Thus, a high throughput is realized.

The system 201 shown in FIG. 9 is characterized in that, with a client PC (data supply source) 221 and a printer engine (including an MFP) 231 connected to each other via a network (for example, a LAN), the client PC 221 and a printer controller 233, which supplies the printer engine 231 with image data for image output, each include a multi-core CPU responsible for image compression and decompression.

このシステムでは、ユーザが印刷する文書中のイメージデータは、クライアントＰＣ側のマルチコアにより圧縮される。 In this system, image data in a document printed by a user is compressed by a multi-core on the client PC side.

文書中の画像が圧縮された状態のデータは、ネットワークを介してプリンタエンジン２３１と接続するプリンタコントローラ２３３に送信される。 Data in a state where the image in the document is compressed is transmitted to the printer controller 233 connected to the printer engine 231 via the network.

After receiving the print data, the printer controller 233 performs image forming processing on it; in doing so, it decompresses the compressed image data in the received print data using its own multi-core CPU.

In this way, compression and decompression of the image data contained in a document are parallelized on multi-core CPUs, and the size of the print data is reduced, which shortens the communication time between the client PC and the controller and thereby improves print throughput.

本実施形態に関するイメージデータ圧縮の並列化処理のための、分割サイズの決定方法について説明する。 A method for determining a division size for parallel processing of image data compression according to the present embodiment will be described.

図１０に、クライアントＰＣと、プリンタコントローラのＣＰＵ（プロセッサ）コア数に違いがある場合の並列分割の決定方法の例を示す。 FIG. 10 shows an example of a method for determining parallel partitioning when there is a difference in the number of CPU (processor) cores of the client PC and the printer controller.

Here, assume that the client PC has 2 CPU cores and the printer controller has 4 CPU cores. In this embodiment, the number of CPU cores on the side performing compression is compared with that on the side performing decompression, and the larger number is adopted as the number of parallel divisions.

In the case of FIG. 10, the compression side has 2 cores and the decompression side has 4 cores, so the image area is divided into 4 for parallelization. When the number of divisions is determined in this way and the compression side and the decompression side have different core counts, one of the two sides divides the image into more regions than it has CPU cores.

FIGS. 11 and 12 show processing examples for the case, within the situation of FIG. 10 in which the client PC and the printer controller have different numbers of processor cores, where the number of CPU cores is smaller than the number of divisions.

ＣＰＵコア数が少ない場合には、１つのプロセッサコアに対して、複数の画像分割領域が処理対象として割り当てられる。 When the number of CPU cores is small, a plurality of image division regions are assigned as processing targets to one processor core.

FIG. 11 shows a case in which the number of parallel divisions is set to 3 for 2 CPU cores. Here, regions A and C are assigned to CPU core 1 and region B is assigned to CPU core 2.
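The assignment in this example is consistent with a simple round-robin distribution of regions over cores, sketched below in Python. Round-robin is an assumption made for illustration; the text gives only this one example, and `assign_regions` is a hypothetical name.

```python
def assign_regions(regions, n_cores):
    """Distribute regions over cores 1..n_cores in round-robin order."""
    assignment = {core: [] for core in range(1, n_cores + 1)}
    for i, region in enumerate(regions):
        assignment[1 + i % n_cores].append(region)
    return assignment


# Three regions on two cores, as in FIG. 11.
print(assign_regions(["A", "B", "C"], 2))
```

With three regions and two cores this reproduces the assignment described above: core 1 receives regions A and C, and core 2 receives region B.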

If the method of the first embodiment of the present invention is used unchanged for this compression, the processing flows as shown in FIG. 11(a). That is, regions A and B, to which cores are assigned first, are processed first, and the encoded data of these regions is written to the compressed data whenever it reaches the write amount. When writing of region A is complete, CPU core 1 performs the compression of region C.

When compression is performed in this way, as shown in FIG. 11(b), the “region C code data”, which is the compressed data of the region not assigned a CPU core at the start, is stored in the encoded data at a position far from the pair of “region A code data” and “region B code data”, which are the compressed data of the regions assigned CPU cores at the start.

When such encoded data is decoded by an apparatus using the method of the present invention, the data of neighboring regions is stored at distant positions in the file, so both sets of data cannot be held in the memory or cache memory at the same time; disk swapping occurs and performance drops sharply.

In contrast, in the method of this embodiment, as shown in FIG. 12(a), a threshold on the number of processed lines is set in advance for region processing. Once the number of lines exceeds this threshold and the code amount reaches the write amount, that is, once the band line has been passed and the required amount of encoded data has accumulated, the data is written, processing of that region is suspended, and processing moves to another assigned region.

As a result, as shown schematically in FIG. 12(b), code data belonging to the same lines can be stored close together in the compressed data, so memory accesses are efficient and processing can be performed at high speed.
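The switching between regions can be sketched in Python as follows. This is a simplified model under stated assumptions: it switches regions purely on the line threshold and ignores the additional condition that the code amount must have reached the write amount, and the names `processing_order` and `band_lines` are hypothetical.

```python
def processing_order(regions, total_lines, band_lines):
    """Order in which one core processes line ranges of its assigned regions.

    The core handles `band_lines` lines of one region, suspends it, and
    moves to the next assigned region before returning for the next band.
    """
    order = []
    for start in range(0, total_lines, band_lines):
        end = min(start + band_lines, total_lines)
        for region in regions:
            order.append((region, start, end))
    return order


# Core 1 of FIG. 11 owns regions A and C; the band threshold is 16 lines.
for step in processing_order(["A", "C"], 48, 16):
    print(step)
```

In this model the code data for lines 0 to 15 of regions A and C is produced back to back, rather than all of region A before any of region C, which corresponds to the layout of FIG. 12(b).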

The system shown in FIG. 9 uses the division number determination method described with reference to FIG. 10 and the processing method described with reference to FIG. 12, and realizes the processing through the compression-side and decompression-side processing flows shown in FIGS. 13, 14, and 15.

FIG. 13 is a flowchart explaining the “processing with parallel division” shown in FIG. 12.

In FIG. 13, the “parallel division number determination process”, described later with reference to FIG. 14, is executed first, and the number of divisions to be used for the “processing with parallel division” is set [ACT31].

印刷（画像出力）対象のページを記述する際に、記述対象が圧縮画像を含むか否かがチェックされる［ＡＣＴ３３］。 When a page to be printed (image output) is described, it is checked whether or not the description target includes a compressed image [ACT33].

記述対象が圧縮画像を含む場合［ＡＣＴ３３−ＹＥＳ］、処理対象領域が指示され［ＡＣＴ３４］、符号化処理後、符号化データが書き出される［ＡＣＴ３５］。 When the description target includes a compressed image [ACT33-YES], the processing target area is indicated [ACT34], and the encoded data is written out after the encoding process [ACT35].

Next, the write-PE data is written out [ACT36]. Until no pages (print targets) remain to be described [ACT32-YES], the following are repeated: the check of whether the description target includes a compressed image ([ACT33]), the designation of the processing target region when it does ([ACT34]), and the encoding followed by writing out of the encoded data ([ACT35]).

なお、記述対象が圧縮画像を含まない場合［ＡＣＴ３３−ＮＯ］、次の記述対象に対する記述が繰り返される［ＡＣＴ３７］。一方、記述すべき残りページ（印刷対象）がなくなった場合には、処理終了となる［ＡＣＴ３２−ＹＥＳ］。 When the description object does not include a compressed image [ACT33-NO], the description for the next description object is repeated [ACT37]. On the other hand, when there is no remaining page (print target) to be described, the process ends [ACT32-YES].

That is, the following steps are performed:
Parallel division number determination process
All pages to be printed described (description finished)?
Is the description target a compressed image?
Designate processing target region
Encoding, write out encoded data
Write out write-PE data
Description processing

図１４は、「並列分割数決定処理」を説明するフローチャートである。 FIG. 14 is a flowchart for explaining the “parallel division number determination process”.

図１４に示す「並列分割数決定処理」では、はじめにプリンタコントローラＣＰＵコア数がチェックされる（プリンタコントローラＣＰＵコア数取得）［ＡＣＴ４１］。 In the “parallel division number determination process” shown in FIG. 14, the number of printer controller CPU cores is first checked (printer controller CPU core number acquisition) [ACT41].

続いて、クライアント（ＰＣ）ＣＰＵコア数がチェックされる（クライアントＣＰＵコア数取得）［ＡＣＴ４２］。 Then, the number of client (PC) CPU cores is checked (client CPU core number acquisition) [ACT42].

The acquired printer controller CPU core count and client CPU core count are then compared [ACT43], and the number of divisions is set to the larger of the two. That is, if the printer controller has more CPU cores [ACT43-YES], the number of divisions is set to the printer controller CPU core count [ACT44]. If the client (PC) has more CPU cores [ACT43-NO], the number of divisions is set to the client (PC) CPU core count [ACT45].

That is, the following are executed in order:
Acquire printer controller CPU core count
Acquire client (PC) CPU core count
<Compare CPU core counts>
<Controller CPU core count [larger]>
Number of divisions = controller CPU core count
<PC (client) CPU core count [larger]>
Number of divisions = PC CPU core count
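The decision of FIG. 14 reduces to taking the larger of the two core counts. A minimal Python sketch follows; the function and parameter names are hypothetical, and the ACT labels in the comments map the branches back to the flowchart.

```python
def parallel_division_number(controller_cores, client_cores):
    """Return the number of parallel divisions per the FIG. 14 flow."""
    if controller_cores > client_cores:  # ACT43-YES
        return controller_cores          # ACT44
    return client_cores                  # ACT45 (also covers equal counts)


# FIG. 10 example: client PC with 2 cores, printer controller with 4 cores.
print(parallel_division_number(controller_cores=4, client_cores=2))
```

Treating equal core counts as the ACT43-NO branch is an assumption; either branch gives the same result in that case.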

FIG. 15 shows an example of the processing on the printer controller side that corresponds to the processing in the client PC (CPU cores) described with reference to FIG. 14.

図１５から明らかなように、描画対象が圧縮画像を含むか否かがチェックされ［ＡＣＴ５２］、その対象が圧縮画像を含む場合［ＡＣＴ５２−ＹＥＳ］、並列分割数を取得し［ＡＣＴ５３］、処理対象領域が指示される［ＡＣＴ５４］。 As is apparent from FIG. 15, it is checked whether or not the drawing target includes a compressed image [ACT52]. If the target includes a compressed image [ACT52-YES], the number of parallel divisions is acquired [ACT53], and processing is performed. The target area is indicated [ACT 54].

以下、復号処理が実行され［ＡＣＴ５５］、描画すべき残りページ（印刷対象）がなくなるまで［ＡＣＴ５１−ＹＥＳ］、描画対象が圧縮画像を含むか否かのチェック（［ＡＣＴ５３］）、描画対象が圧縮画像を含む場合の処理対象領域の指示（［ＡＣＴ５４］）、復号処理（［ＡＣＴ５５］）が繰り返される。 Thereafter, the decoding process is performed [ACT 55], and until there is no remaining page (print target) to be rendered [ACT 51-YES], whether or not the rendering target includes a compressed image ([ACT 53]), and the rendering target is determined. The process area indication ([ACT54]) and the decoding process ([ACT55]) are repeated when a compressed image is included.

なお、描画対象が圧縮画像を含まない場合［ＡＣＴ５３−ＮＯ］、次の描画対象に対する描画が繰り返される［ＡＣＴ５６］。また、描画すべき残りページ（印刷対象）がなくなった場合には、処理終了となる［ＡＣＴ５１−ＹＥＳ］。 When the drawing target does not include a compressed image [ACT53-NO], drawing for the next drawing target is repeated [ACT56]. If there is no remaining page (print target) to be drawn, the process ends [ACT51-YES].

FIG. 16 is a block diagram of a system for realizing yet another image compression method applicable to an image processing apparatus (MFP) to which the present invention can be applied. The example shown in FIG. 16 is substantially equivalent to extracting two processor cores PE1 and PE2, the secondary cache memory 111a, and the shared memory 105 from the system shown in FIG. 7.

The system shown in FIG. 16 determines the number of parallel divisions so as to optimize the utilization of the shared memory 105 and performs processing accordingly, thereby removing the bottleneck said to arise at the memory and achieving high-speed processing. In FIG. 16, the processing parameters are not swapped out of the primary cache memory of each core, and the amount of data read or written in each transfer of processing data between the secondary cache and the shared memory is the size best suited to the transfer. Overhead is therefore small and DMA transfers can be executed under optimal conditions, so throughput can be improved. FIG. 17 shows an example of the data structure that one of the cores in the embodiment of FIG. 16 places in the shared memory.

In the example shown in FIG. 16, the system memory (shared memory 105) used by the core processors PE1 and PE2 is connected to the system bus 109. During processing, the data stored in the memory (primary cache memory) of each of the core processors PE1 and PE2 is JPEG data, and the memory (shared memory 105) stores the compression and decompression parameters, specifically the quantization table, the Huffman table, the subsampling rate of the color-difference (U, V) components, and so on. In the data on the secondary cache memory 111a, located between the processors PE1 and PE2 and the shared memory 105, the parameters are attached at the head (header) of each piece of processing data. Likewise, in each piece of data on the primary cache memory in each of the processors PE1 and PE2, the parameters are attached at the head of the work area.

これらのパラメータは、あらかじめ決められた値を利用するものとする。また、処理の対象となる画像データと、それを圧縮した符号化データもメモリに格納され、これらはワーク用として確保された領域に格納される。 Assume that these parameters use predetermined values. Further, image data to be processed and encoded data obtained by compressing the image data are also stored in a memory, and these are stored in an area reserved for work.

The size of the compressed data depends on the compression parameters and on the content of the data being compressed, so the exact amount required cannot be determined in advance. In general, if this area is made too small, the number of data transfers between the shared memory and the cache memory increases, and the transfer overhead degrades performance. If the work area is made too large, the data overflows the primary cache and swap operations to the secondary cache occur, which also degrades performance. By optimizing the DMA minimum transfer size that each PE defines on the shared memory, shown in FIG. 17, the DMA transfer overhead is kept to an optimal level for the compression of an average image, so high throughput can be achieved for many use cases.

なお、図１７は、任意のＰＥが共有メモリ上に定義するＤＭＡ最小転送サイズの構成の一例を示す。 FIG. 17 shows an example of the configuration of the minimum DMA transfer size defined by any PE on the shared memory.

図１７から明らかな通り、任意のＰＥが共有メモリ上に定義するＤＭＡ最小転送サイズは、パラメータ、入力データと出力データからなる処理データを含む。 As apparent from FIG. 17, the DMA minimum transfer size defined by any PE on the shared memory includes processing data including parameters, input data, and output data.

Thus, as shown in FIGS. 16 and 17, the data held in the shared memory is arranged so that the overhead of swapping between memories and of DMA transfers is, as described above, kept to an optimal level for the compression of an average image, which makes it possible to achieve high throughput for many use cases.

In this embodiment, described below with reference to FIG. 18, sample charts and the like are compressed in advance with the same parameter values as those used by the method, and an average compression ratio is calculated. The work memory size is predicted from this ratio, and the number of parallel divisions is adjusted so that memory swapping and DMA overhead do not arise in average usage.

図１８は、図１６に示したシステムにおける並列分割数の決定方法について説明する一例を示す。 FIG. 18 shows an example for explaining a method of determining the number of parallel divisions in the system shown in FIG.

At the start of processing, the system acquires information on the groups of processors that share a cache memory. For example, for a processor with four CPU cores in which each pair of cores shares a secondary cache, the number of groups is 2 and each group has 2 processor cores. The size of the shared cache memory shared by the cores is also acquired at this point [ACT101].

The calculation of the number of processor cores is performed for each group. First, it is checked whether processing has been completed for all processor groups, that is, whether the loop has run as many times as there are groups [ACT102].

次に、共有グループに属するＣＰＵコア数を、そのグループの分割数に設定する。すなわち、「グループ内分割数 ← 共有グループ内のプロセッサ数」がセットされる［ＡＣＴ１０３］。 Next, the number of CPU cores belonging to the shared group is set as the division number of the group. That is, “number of divisions in group ← number of processors in shared group” is set [ACT 103].

続いて、この分割数でこのグループの共有キャッシュサイズを割り、１コアに割り当てる共有キャッシュ上の領域のサイズを算出する［ＡＣＴ１０４］。このとき、処理パラメータのデータサイズの情報を取得して、１プロセッサコアに割り当てられるキャッシュサイズから引くことにより、ワーク領域として利用できるメモリサイズを算出する［ＡＣＴ１０５］。 Subsequently, the shared cache size of this group is divided by this division number, and the size of the area on the shared cache allocated to one core is calculated [ACT 104]. At this time, information on the data size of the processing parameter is acquired and subtracted from the cache size assigned to one processor core, thereby calculating the memory size that can be used as a work area [ACT 105].

For the work size calculated here, the average compression ratio is acquired [ACT106], and the work size is multiplied by it to calculate the predicted sizes of the areas used for the image and for the encoded data. That is, the fixed processing parameter size is calculated [ACT107].

Within the work area, the area for the encoded data is the smaller of the two, so its size is taken as the minimum area size of the work memory [ACT108].

This minimum area size of the work memory is then compared with the minimum transfer size of DMA transfers between the secondary cache and the shared memory [ACT109]. If the minimum area size of the work memory is smaller than the minimum DMA transfer size [ACT109-YES], the group's division number is decreased by 1 [ACT110], and the minimum area size of the work memory is calculated and compared again [ACT111-YES] to [ACT104], ..., [ACT109].

If the work size is larger than the minimum DMA size [ACT109-NO], the division number of this processor group is added to the division number for parallel processing, and the flow moves on to determining the division number of the next processor group [ACT112].

That is, the following steps are performed:
Acquire cache-sharing processor group information
Have all cache-sharing processor groups been processed (has the loop run once per group)?
Division number within group ← number of processors in the sharing group
Calculate the number of cache-sharing processors
Acquire average compression ratio data
Calculate fixed processing parameter size
Calculate work area size
Calculate minimum data area size within the work area
Minimum data area size > DMA minimum transfer size?
Division number within group ← division number within group − 1
Division number within group ≠ 1?
Division number ← division number + division number within group
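The division-number determination can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `CacheGroup`, `group_division_count`, and `total_division_count` are hypothetical, the average compression ratio is assumed to be expressed as compressed size divided by original size, the minimum data area is simplified to the work area multiplied by that ratio, and a group whose division number has fallen to 1 is assumed to be accepted as is.

```python
from dataclasses import dataclass


@dataclass
class CacheGroup:
    """One group of processors sharing a cache (hypothetical structure)."""
    processors: int          # number of processors sharing the cache
    shared_cache_bytes: int  # size of the shared cache in bytes


def group_division_count(group, param_bytes, avg_compression_ratio, dma_min_bytes):
    """Return how many ways the work can be split inside one cache-sharing group."""
    n = group.processors                              # start from all processors in the group
    while True:
        per_core = group.shared_cache_bytes // n      # ACT 104: cache area per core
        work = per_core - param_bytes                 # ACT 105: usable work area
        min_area = work * avg_compression_ratio       # ACT 106-108: predicted encoded-data area
        # ACT 109: accept when the area is large enough for one DMA transfer.
        # Assumption: a division number of 1 is accepted unconditionally (ACT 111).
        if min_area > dma_min_bytes or n == 1:
            return n
        n -= 1                                        # ACT 110: try one fewer division


def total_division_count(groups, param_bytes, avg_compression_ratio, dma_min_bytes):
    """ACT 112: sum the per-group division numbers over all groups."""
    return sum(
        group_division_count(g, param_bytes, avg_compression_ratio, dma_min_bytes)
        for g in groups
    )
```

For example, with these assumed figures, a group of four processors sharing a 1 MiB cache, 16 KiB of processing parameters, a ratio of 0.1, and a 32 KiB minimum DMA transfer would be reduced from four divisions to three, because only then does the predicted encoded-data area exceed the DMA minimum.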

The number of parallel divisions for the other processor groups is determined in the same manner as described above.

As described above, according to the present invention, in a system that stores very large high-resolution images, such as the inside of a copying machine (MFP), and that compresses image data before storing it in order to save storage space, image data can be compressed and decompressed at the higher speeds demanded by recent increases in the speed of printer engines and of external devices such as computers. Multiprocessor systems having a plurality of CPUs and many-core processors having a plurality of CPU cores within one CPU have become remarkably widespread, and the method and system for speeding up compression of such very large images can be applied using such processors, in particular to JPEG compression, which has a high compression ratio. In other words, applying the embodiments of the present invention makes it possible to realize a system that uses a plurality of processors to compress and decompress a single image at high speed on the basis of the JPEG method.

In the image processing apparatus (MFP) of the present invention, at compression time the image data expanded in memory is divided in the vertical direction, and each part is JPEG-compressed in parallel by a different processor. Each processor has a local memory; every time a certain amount of compressed data has accumulated, the processor asynchronously writes out that amount of code data and records in memory information indicating that the write was made by that processor. After all processors have finished, this write-order data is appended after the compressed image data to form a compressed image file. At decoding time, each processor decodes image data into memory while reading the compressed data a certain amount at a time on the basis of the write-order data. As described above, according to the present invention the dependencies within the compressed data are confined to the region handled by each processor, so each processor can work without synchronizing with the others, and both compression and decompression achieve performance commensurate with the degree of parallelism.
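The relationship between the fixed-size writes and the write-order data can be illustrated with a small simulation. This is a sketch under assumptions rather than the embodiment itself: the per-processor code streams are stand-in byte strings instead of real JPEG output, the chunk size `CHUNK`, the function names, and the `schedule` argument (the order in which processors happen to write) are hypothetical, and a final short chunk is assumed to be padded with zero bytes.

```python
CHUNK = 4  # bytes per write; hypothetical fixed write amount


def interleave(streams, schedule):
    """Simulate processors writing fixed-size chunks in an arbitrary order.

    streams:  dict mapping processor id -> that processor's code data (bytes)
    schedule: sequence of processor ids, in the order their writes occur
    Returns (body, order): the interleaved code data and the write-order data.
    """
    offsets = {pe: 0 for pe in streams}
    body = bytearray()
    order = []
    for pe in schedule:
        start = offsets[pe]
        if start >= len(streams[pe]):
            continue                                   # this processor has nothing left
        chunk = streams[pe][start:start + CHUNK]
        body += chunk.ljust(CHUNK, b"\x00")            # assumption: pad the last chunk
        order.append(pe)                               # record who wrote this chunk
        offsets[pe] = start + CHUNK
    return bytes(body), order


def read_for_pe(body, order, pe):
    """Collect, in order, only the chunks that processor `pe` wrote."""
    return b"".join(
        body[i * CHUNK:(i + 1) * CHUNK]
        for i, writer in enumerate(order)
        if writer == pe
    )
```

Because each processor needs only the write-order data to find its own chunks, `read_for_pe` can be called for every processor independently, which reflects why decoding requires no synchronization between processors.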

Accordingly, large image data can be compressed and decompressed at high speed by parallel processing on a many-core or multiprocessor system.

The present invention is not limited to the embodiments described above, and various modifications and changes can be made at the implementation stage without departing from the gist of the invention. The embodiments may also be combined as appropriate wherever possible, or implemented with parts omitted, in which case various effects resulting from the combination or omission are obtained.

For example, the invention can be realized as an image compression format comprising: encoded data obtained by expanding image data in memory, dividing the image in the direction perpendicular to the direction in which addresses are contiguous, subjecting each of the divided regions to color conversion, DCT, quantization, and Huffman coding on a different processor, and, whenever the Huffman-coded data on a processor exceeds a certain amount, writing it out from that processor a certain amount at a time; and write PE data, recorded each time a processor writes while the encoded data is being created, that identifies the processor that performed the write, wherein the write PE data follows the encoded data.
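One possible byte layout for such a format is sketched below. The text specifies only that the write PE data follows the encoded data; everything else here is an assumption made for illustration: one byte per write PE entry (so processor ids below 256), a 4-byte little-endian entry count at the very end of the file so that a reader can locate the boundary, and the function names `pack` and `unpack`.

```python
import struct


def pack(body, order):
    """Build a file image: encoded data, then write PE data, then an entry count."""
    trailer = bytes(order)                      # one byte per write; assumes ids < 256
    return body + trailer + struct.pack("<I", len(order))


def unpack(blob):
    """Split a file image back into the encoded data and the write PE data."""
    (count,) = struct.unpack("<I", blob[-4:])   # hypothetical footer: number of entries
    body = blob[:-4 - count]
    trailer = blob[-4 - count:-4]
    return body, list(trailer)
```

Placing the write PE data after the encoded data lets the processors stream their chunks out first and emit the order information only once all of them have finished, which matches the sequence described above.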

An image processing apparatus can also be realized that includes:
a plurality of processors each having a local memory;
a shared memory accessible to each of the processors;
a memory control unit that stores the input image in memory, either part by part or as a whole; and
an image processing unit operated by a program that: expands part or all of the stored input image in memory and divides the image region in the direction perpendicular to the direction in which addresses are contiguous; assigns the image data of each divided region to one of the processors to perform color conversion, DCT, quantization, and Huffman coding; when the Huffman code data generated by the processing exceeds a certain amount, writes out a certain amount of Huffman code data and writes to memory data identifying the processor that wrote it; writes out the write PE data identifying the processors once compression has completed on all processors; and decompresses the compressed image by reading the write PE data from compressed data in the above format and performing Huffman decoding, inverse quantization, inverse DCT, and inverse color conversion while sequentially reading only the data belonging to the processor performing the processing.

DESCRIPTION OF SYMBOLS: 1, 101: image processing apparatus (image processing system); 11: system bus; 21: CPU (main control unit); 23: shared memory; 25: HDD (image data storage device); 103: memory controller; 105: shared memory; 107: HDD (image data storage device); 109: system bus; 111a to 111n: secondary cache memories; PE1 to PEn: arithmetic units (processor cores, core processors, unit processing devices).

JP 05-143552 A; JP 11-170657 A; JP 2003-348355 A

Claims

An image processing method in which image data of a plurality of regions, divided in a matrix in two mutually orthogonal directions, is compressed by the JPEG method independently and in parallel by two or more processors,
wherein each piece of image data includes information specifying the processor that compressed the image data of that region.

An image processing method in which image data compressed by the JPEG method is decompressed, for each of a plurality of regions divided in a matrix in two mutually orthogonal directions, in parallel by two or more processors and returned to its position before the division on the basis of compression-time information attached to the image data,
wherein the compression-time information attached to each piece of image data includes information that can identify the processor used at compression.

An image processing apparatus comprising:
a plurality of processors each having a local memory;
a shared memory accessible to each of the processors;
a storage control unit that stores part of an input image, or the entire image, in a memory together with information unique to the processor; and
an image processing unit that expands part or all of the stored input image in memory, divides the image region in the direction perpendicular to the direction in which addresses are contiguous, assigns the image data of each divided region to one of the processors to perform color conversion, DCT, quantization, and Huffman coding, writes out a certain amount of Huffman code data when the Huffman code data generated by the processing exceeds that amount, writes to memory data that can identify the processor that wrote it, writes out, when compression by all the processors has completed, write-time processor information that can identify the processors, reads the write-time processor information from the compressed data, and decompresses the compressed image by performing Huffman decoding, inverse quantization, inverse DCT, and inverse color conversion while sequentially reading only the data belonging to the processor performing the processing.

The image processing apparatus according to claim 3, wherein the image data in the input buffer is divided in the vertical direction and the parts are processed in parallel by different processors.

The image processing apparatus according to claim 3, wherein, when the compression processing is performed in parallel, each processor writes out data once a certain amount of compressed data has accumulated in its local memory, without synchronizing with the other processors.

The image processing apparatus according to claim 3, wherein, at the time of compression, each processor, when writing code data, also writes data that identifies it as the processor that performed the write.

The image processing apparatus according to claim 3, wherein, when the image data is decoded, a plurality of processors restore the image in parallel while sequentially reading the data on the basis of the compression-time processor information written in the input data.