JP2010237936A

JP2010237936A - Image processor, image processing method, and program

Info

Publication number: JP2010237936A
Application number: JP2009084813A
Authority: JP
Inventors: Yuji Furuta; 勇次古田
Original assignee: NEC Embedded Products Ltd
Current assignee: NEC Embedded Products Ltd
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-21
Anticipated expiration: 2029-03-31
Also published as: JP5391780B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that improves the efficiency of image processing by limiting the maximum number of difference rectangles in distributed processing.

SOLUTION: An image processor 1 includes a CPU 10 and a coprocessor 20 having a plurality of cores that operate in parallel. The CPU 10 divides one screen into a plurality of regions and assigns one core to each region. Difference rectangles extracted by the cores in adjacent regions are merged according to a prescribed rule, so that the number of difference rectangles on the whole screen is limited.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for improving the efficiency of image processing by merging detected difference rectangles.

When a home PC is operated remotely from another device at a remote location, it may be desirable to reproduce, on the screen of the remote device, the image that would be displayed on the screen of the home PC.

For example, when the remote device first starts displaying the screen of the home PC, information for the full screen is transmitted. After that, only information on the changed portions of the home PC screen is transmitted to the remote device, and the remote device redraws only those portions. This reduces both the amount of data transmitted and the amount of drawing processing.

If the home PC and the remote device differ in display capability (number of display pixels, resolution, or number of display colors), the application on the remote device may perform the appropriate conversion (reducing or enlarging the screen, or color conversion).

To perform graphics drawing at high speed, there is a technique for operating a plurality of cores of a graphics processing unit (hereinafter, "GPU") in parallel. A core is the central part of a processor circuit formed on a processor die, that is, the semiconductor circuit portion excluding the cache memory. Multi-core is a technology in which a plurality of processor cores are enclosed in one processor package.

When graphics drawing is performed in parallel by a plurality of cores, the question is how to distribute the processing among them. One method is to divide the screen into n regions and assign n cores to the drawing processing of the respective regions. For example, when n = 16, the screen is divided into 16 regions as shown in FIG. 10, and the 16 GPU cores are each assigned to the drawing processing of one region.

As a related technique, the following has been proposed: an encoder server receives an MPEG-2 signal transmitted from an MPEG-2 encoder and converts it into an MPEG-4 signal, which has a higher compression rate and a lower post-encoding transmission rate than MPEG-2; a receiving terminal then decodes the received MPEG-4 signal with an MPEG-4 decoding unit and displays the image on a display unit (see, for example, Patent Document 1).

Patent Document 1: JP 2006-136013 A

However, when the screen is divided into a plurality of regions and processed in a distributed manner by a plurality of cores such as those of a GPU, as in the related art described above, the result differs from that obtained when a single CPU processes the entire screen as one region.

As shown in FIG. 11, when the screen is divided vertically into horizontally long strips, the drawing processing of each region is distributed among a plurality of cores such as those of a GPU, and differences are detected separately in each region, more difference rectangles are extracted than actually exist. That is, a difference rectangle that straddles regions, which should be recognized as one rectangle, is split into two rectangles that are extracted separately. In FIG. 11, a total of 42 difference rectangles are extracted where there should be 28. This causes overhead for redundant portions that would not arise if each were recognized as a single rectangle, and wastes processing capacity and processing time on encoding and decoding.
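The inflation in rectangle count described above can be illustrated with a minimal Python sketch. It assumes rectangles are `(x, y, w, h)` tuples and that all strips have the same height; the function name `split_by_strips` and the sample values are hypothetical and do not come from the source.

```python
def split_by_strips(rect, strip_height):
    """Split rect (x, y, w, h) at every horizontal strip boundary it crosses."""
    x, y, w, h = rect
    pieces = []
    top = y
    bottom = y + h
    while top < bottom:
        # Lower boundary of the strip that contains row `top`.
        strip_end = (top // strip_height + 1) * strip_height
        piece_bottom = min(bottom, strip_end)
        pieces.append((x, top, w, piece_bottom - top))
        top = piece_bottom
    return pieces


# One changed rectangle that straddles the boundary at y = 10
# is reported as two rectangles when each strip is examined separately.
straddling = split_by_strips((0, 5, 20, 10), strip_height=10)
contained = split_by_strips((0, 0, 20, 10), strip_height=10)
```

A rectangle that lies entirely inside one strip is returned unchanged, so only the rectangles that cross strip boundaries add to the count.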

In addition, because more difference rectangles are extracted, it has been difficult to limit the maximum number of difference rectangles for the entire screen.

The present invention has been made to solve the above problems, and its object is to provide an image processing apparatus, an image processing method, and a program that improve the efficiency of image processing by limiting the maximum number of difference rectangles in distributed processing.

The image processing apparatus of the present invention comprises a CPU and a coprocessor having a plurality of cores suited to parallel processing, wherein
the CPU divides one screen into a plurality of regions and assigns one of the cores to each region, and
the difference rectangles extracted by the processing of the cores in adjacent regions are merged according to a predetermined rule, thereby limiting the number of difference rectangles in the entire screen.

The image processing method of the present invention is an image processing method using a CPU and a coprocessor having a plurality of cores suited to parallel processing, the method comprising:
a step in which the CPU divides one screen into a plurality of regions and assigns one of the cores to each region; and
a step in which the cores merge, according to a predetermined rule, the extracted difference rectangles that were extracted in adjacent regions,
thereby limiting the number of difference rectangles in the entire screen.

The program of the present invention causes a computer comprising a CPU and a coprocessor having a plurality of cores suited to parallel processing to limit the number of difference rectangles in the entire screen by:
causing the CPU to execute a process of dividing one screen into a plurality of regions and assigning one of the cores to each region; and
causing the cores to execute a process of merging, according to a predetermined rule, the extracted difference rectangles that were extracted in adjacent regions.

According to the present invention, it is possible to limit the maximum number of difference rectangles in distributed processing and thereby improve the efficiency of image processing.

FIG. 1 is a diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the image processing apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing operation according to the embodiment of the present invention.
FIG. 4 is a conceptual diagram showing the screen according to the embodiment of the present invention divided vertically into horizontally long strips.
FIG. 5 is a conceptual diagram of the state in which the regions are assigned to the cores according to the embodiment of the present invention.
FIG. 6 is a conceptual diagram of the state in which difference rectangles have been extracted according to the embodiment of the present invention.
FIG. 7 is a conceptual diagram of the state in which difference rectangles have been extracted according to the embodiment of the present invention.
FIG. 8 is a diagram showing the predetermined rule for merging difference rectangles.
FIG. 9 is a diagram showing another embodiment.
FIG. 10 is a conceptual diagram showing a conventional screen divided vertically into horizontally long strips.
FIG. 11 is a conceptual diagram of the conventional state in which the regions are assigned to cores.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The image processing apparatus 1 of the present embodiment, shown in FIG. 1, includes: a central processing unit (hereinafter, "CPU") 10 that executes software for generating a plurality of drawing commands through graphic data processing; a GPU 20, with its cores 20-1 to 20-16, serving as a drawing processing unit that performs drawing with the drawing commands and generates, in parallel, image drawing data for each region of the divided screen; a program storage unit 40; a main memory 50; an input unit 60; and an output unit 70. The CPU 10, GPU 20, frame memory 30, program storage unit 40, main memory 50, input unit 60, and output unit 70 are connected to one another via a bus 80 for data transfer and the like.

The CPU 10 executes software such as application software, middleware including application program interface (API) middleware, and device drivers to generate drawing commands. The drawing commands that the CPU 10 generates, under software control, from the graphic data to be processed are transferred to the cores 20-1 to 20-16 of the GPU 20.

The GPU 20 is an example of a coprocessor having a plurality of cores suited to parallel processing; the coprocessor is not limited to a GPU as long as it has equivalent functions. Here, the screen is divided into 16 regions, and each of the cores 20-1 to 20-16 is in charge of processing one region.
The image drawing data generated by the cores 20-1 to 20-16 is stored in the frame memory 30.

The program storage unit 40 stores the program code of software executed by the CPU 10, such as application software, API middleware, and device drivers. In recent years it has become possible to execute general-purpose programs on the GPU 20, and such programs may also be stored here.

The main memory 50 has a graphic data storage area 51 and a command buffer area 52. The graphic data storage area 51 stores the graphic data to be processed. The command buffer area 52 stores the drawing commands to be transferred to the cores 20-1 to 20-16, and is partitioned according to the number of cores in the GPU 20. In the example shown in FIG. 1, the command buffer area 52 has a first command buffer 5201 through a sixteenth command buffer 5216. As described later, these buffers are set up when the CPU 10 executes the device driver. The main memory 50 also temporarily stores data generated by the processing of the CPU 10.

The input unit 60 is a keyboard, mouse, or the like with which a user inputs data. The output unit 70 is an interface capable of transmitting image processing results, output images, and the like to the outside, or a display, printer, or the like serving as a display unit.

The application software processes graphic data input from the input unit 60 without regard to the number of cores of the GPU 20 in the image processing apparatus 1, and passes the resulting API function calls (instructions for performing the intended drawing) to the API middleware. The API middleware processes the API function calls and generates drawing commands executable by the hardware of the cores 20-1 to 20-16 of the GPU 20. The device driver buffers the drawing commands generated by the API middleware in the command buffer area 52 and then, at a timing when the cores 20-1 to 20-16 can start new processing, transfers the drawing commands for one processing unit of the cores (for example, one frame) from the command buffer area 52 to the cores 20-1 to 20-16. The cores 20-1 to 20-16 perform drawing using the drawing commands and store the resulting image drawing data (for example, pixel data) in the frame memory 30, after which a predetermined output is produced from the output unit 70.

In the configuration of the image processing apparatus 1 described above, the contents of the frame memory 30 are updated as a result of the user's operations on the input unit 60.
The following description focuses on the processing by which the image processing apparatus 1 extracts the differences between successive images when transmitting the image in the frame memory 30, or the image output from the output unit 70, to a remote device.

FIG. 2 shows a functional block diagram of the image processing apparatus 1 of the present embodiment.

The image processing apparatus 1 of the present embodiment includes a screen area dividing unit 111, an assigning unit 112, a difference detecting unit 113, and a rectangle merging unit 114.

The screen area dividing unit 111 has a function of dividing one screen area into a number of regions that can be processed in parallel, according to the number of cores of the GPU 20. The CPU 10 may query the GPU 20 for the number of cores it has and divide the screen into a number of regions not exceeding that number. In this embodiment, the description uses an example in which the screen is divided vertically into horizontally long strips, but the division method is not limited to this; any method that divides the screen into a plurality of regions may be used, such as dividing the screen left and right.

The assigning unit 112 has a function of assigning the drawing processing and merge processing of each divided region to predetermined cores, such as the cores 20-1 to 20-16 of the GPU 20. For certain regions, no core may be assigned for the merge processing of this embodiment; that is, merge processing may be skipped for those regions.

The difference detecting unit 113 has a function of extracting difference rectangles for each divided region from two screens (for example, the current screen and the screen immediately before it). The predetermined cores, such as the cores 20-1 to 20-16, to which the processing of the respective regions is assigned each extract the difference rectangles.

The rectangle merging unit 114 has a function of merging difference rectangles according to a predetermined rule. This processing turns difference rectangles that were extracted separately because they straddle regions back into the single original rectangle, or combines into one rectangle those difference rectangles that may not originally have been a single rectangle but are highly likely to have been split across regions. Merging of rectangle coordinates may be realized, for example, by deleting the coordinate data of the start point and of the end point of the line segment along which the two rectangles touch. The merge processing may be handed to the CPU depending on how busy the CPU is, may be performed by each core, may be performed by a predetermined core, or may be performed in parallel by a combination of these.
The difference rectangles are merged so that their number falls within the maximum total number of extracted rectangles per screen, which is stored in a predetermined storage unit.
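As an illustration of the merge itself, the following Python sketch replaces two rectangles with their common bounding box. Using a bounding box is an assumption made here for simplicity: when two rectangles share a full edge, it gives the same result as dropping the start and end points of the shared edge. Rectangles are `(x, y, w, h)` tuples and the name `merge` is hypothetical.

```python
def merge(a, b):
    """Return the bounding box (x, y, w, h) that covers rectangles a and b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    left = min(ax, bx)
    top = min(ay, by)
    right = max(ax + aw, bx + bw)
    bottom = max(ay + ah, by + bh)
    return (left, top, right - left, bottom - top)


# Two halves of one change, split at the strip boundary y = 2,
# become a single rectangle again.
upper = (2, 1, 3, 1)
lower = (2, 2, 3, 1)
whole = merge(upper, lower)
```

When the two rectangles do not share a full edge, the bounding box also covers pixels that did not change, which is the trade-off the later merge steps accept in order to stay within the rectangle limit.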

Hereinafter, the operation of the present embodiment will be described in detail with reference to the drawings.
Referring to the flowchart of FIG. 3, first, the screen area dividing unit 111 of the image processing apparatus 1 divides one screen area into a number of regions that can be processed in parallel, according to the number of cores of the GPU 20 (S301). Specifically, the CPU 10 queries the GPU 20 for the number of cores it has and divides the screen vertically into that number of horizontally long strips, here 16 (see FIG. 4).

Next, the assigning unit 112 assigns the drawing processing and merge processing of each of the 16 regions to the cores 20-1 to 20-16 of the GPU 20 (S302). FIG. 5 shows a conceptual diagram of the assignment. Specifically, the CPU 10 assigns the 16 regions, in order from the top, to the cores 20-1 to 20-16.
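Steps S301 and S302 can be sketched in Python as follows. This is a minimal illustration that assumes positive screen dimensions, represents each strip as an `(x, y, w, h)` tuple, and spreads leftover rows over the topmost strips; the function names and the 1920 x 1080 screen size are hypothetical and are not given in the source.

```python
def split_into_strips(width, height, n_cores):
    """Divide the screen into at most n_cores horizontal strips (S301)."""
    n = min(n_cores, height)          # never create an empty strip
    base, extra = divmod(height, n)   # `extra` strips get one more row
    strips = []
    y = 0
    for i in range(n):
        h = base + (1 if i < extra else 0)
        strips.append((0, y, width, h))
        y += h
    return strips


def assign_cores(strips):
    """Map core numbers 1..n to strips, topmost strip first (S302)."""
    return {core: strip for core, strip in enumerate(strips, start=1)}


strips = split_into_strips(1920, 1080, 16)
assignment = assign_cores(strips)
```

Here core 1 corresponds to core 20-1 and receives the topmost strip, and core 16 corresponds to core 20-16 and receives the bottom strip.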

The difference detecting unit 113 extracts difference rectangles for each of the 16 regions from two screens, the current screen and the screen immediately before it (S303). Specifically, the cores 20-1 to 20-16, to which the processing of the respective regions is assigned, each extract the difference rectangles. FIG. 6 shows a conceptual diagram of the state in which difference rectangles have been extracted.
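The source does not specify how each core finds its difference rectangles. As one possible sketch in Python, the function below returns a single bounding box of the changed pixels inside a strip; this one-box-per-strip simplification, the frame representation (lists of rows of pixel values), and the name `diff_rect_in_strip` are all assumptions for illustration.

```python
def diff_rect_in_strip(prev, curr, strip):
    """Return the bounding box (x, y, w, h) of changed pixels in strip, or None."""
    x0, y0, w, h = strip
    xs, ys = [], []
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            if prev[y][x] != curr[y][x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None                    # nothing changed in this strip
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left + 1, max(ys) - top + 1)


# An 8 x 4 frame split into two strips of two rows each.
prev = [[0] * 8 for _ in range(4)]
curr = [row[:] for row in prev]
for y in (1, 2):                       # the change spans rows 1 and 2
    for x in (2, 3, 4):
        curr[y][x] = 1
strips = [(0, 0, 8, 2), (0, 2, 8, 2)]
# Each call is independent, so each could run on its own core.
rects = [diff_rect_in_strip(prev, curr, s) for s in strips]
```

In this example a single changed block that crosses the strip boundary is reported once per strip, which is the situation the subsequent merge step is meant to repair.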

Then, the rectangle merging unit 114 merges the difference rectangles according to the predetermined rule (S304). Specifically, the merge processing may be handed to the CPU depending on how busy the CPU is, may be performed by each core, may be performed by a predetermined core, or may be performed in parallel by a combination of these.
In the portions enclosed by broken lines in FIG. 7, which shows the extracted difference rectangles, what is really a single rectangle has been extracted as separate difference rectangles because it straddles regions.

The predetermined rule for merging difference rectangles is described below with reference to FIG. 8. The description takes as an example the case where the maximum total number of extracted rectangles per screen, stored in a predetermined storage unit, is limited to 32. That is, the difference rectangles are merged so that the total number of extracted rectangles is 32 or fewer. Because the merging is in the vertical direction, attention is basically paid to the X coordinates.
First, in Step 1, two adjacent rectangles are merged when their X coordinates match.
If the number still exceeds 32 after Step 1, then in Step 2 two adjacent rectangles are merged when the length of the longer side B is within 1.6 times the length of the shorter side A. The portion where the sides coincide is an effective pixel area, whereas the portion where they do not is an invalid pixel area. The factor is not limited to 1.6; any value may be used that does not impair efficiency in terms of the area ratio between the effective pixel area and the invalid pixel area.
If the number still exceeds 32 after Step 2, then in Step 3 two rectangles are merged when their X coordinates match and the distance between the Y coordinates of side A and side B is at or below a predetermined value. In other words, the small gap between slightly separated rectangles is closed.
If the number still exceeds 32 after Step 3, then in Step 4 two rectangles are merged when side A and side B are close to each other.
If the number still exceeds 32 after Step 4, then in Step 5 the start-point coordinates are searched from the upper left of the screen rightward and downward, and everything from the 32nd start point up to the maximum X and Y extents of the remaining rectangles is merged. That is, the 32nd through 42nd rectangles are merged into one.
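The cascade of Steps 1 to 5 can be sketched in Python as below. Several details are assumptions made for illustration, since the source leaves them open: rectangles are `(x, y, w, h)` tuples, sides A and B are taken to be the facing horizontal edges (so their lengths are the widths), "close" in Steps 3 and 4 means a vertical gap of at most `max_gap` pixels, Step 1 is applied exhaustively, and Steps 2 to 4 stop as soon as the limit is met. All function names are hypothetical.

```python
def bbox(a, b):
    """Bounding box of two rectangles."""
    left, top = min(a[0], b[0]), min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)

def v_gap(a, b):
    """Vertical gap between two rectangles (0 if they touch or overlap)."""
    return max(0, max(a[1], b[1]) - min(a[1] + a[3], b[1] + b[3]))

def x_overlap(a, b):
    return a[0] < b[0] + b[2] and b[0] < a[0] + a[2]

def same_x(a, b):
    return a[0] == b[0] and a[2] == b[2]

def step1(a, b):                      # adjacent, identical X extent
    return same_x(a, b) and v_gap(a, b) == 0

def step2(a, b, ratio=1.6):           # adjacent, longer side <= ratio * shorter
    return (x_overlap(a, b) and v_gap(a, b) == 0
            and max(a[2], b[2]) <= ratio * min(a[2], b[2]))

def step3(a, b, max_gap=4):           # identical X extent, small vertical gap
    return same_x(a, b) and v_gap(a, b) <= max_gap

def step4(a, b, max_gap=4):           # facing sides close to each other
    return x_overlap(a, b) and v_gap(a, b) <= max_gap

def merge_while(rects, pred, limit=None):
    """Merge pairs satisfying pred until none remain or the limit is met."""
    rects = list(rects)
    merged = True
    while merged and (limit is None or len(rects) > limit):
        merged = False
        for i in range(len(rects)):
            for j in range(i + 1, len(rects)):
                if pred(rects[i], rects[j]):
                    rects[i] = bbox(rects[i], rects[j])
                    del rects[j]
                    merged = True
                    break
            if merged:
                break
    return rects

def step5(rects, limit):
    """Keep the first limit-1 rectangles in raster order, merge the rest."""
    ordered = sorted(rects, key=lambda r: (r[1], r[0]))
    if len(ordered) <= limit:
        return ordered
    tail = ordered[limit - 1]
    for r in ordered[limit:]:
        tail = bbox(tail, r)
    return ordered[:limit - 1] + [tail]

def limit_rects(rects, limit=32):
    rects = merge_while(rects, step1)
    for pred in (step2, step3, step4):
        if len(rects) <= limit:
            return rects
        rects = merge_while(rects, pred, limit)
    return step5(rects, limit)
```

For instance, `limit_rects([(2, 1, 3, 1), (2, 2, 3, 1)])` restores the single rectangle `(2, 1, 3, 2)` in Step 1 and never reaches the later, lossier steps, because the count is already within the limit.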

According to the embodiment described above, the following inefficiency can be avoided. When the maximum total number of extracted rectangles per screen is 32 and one screen is divided into 16, limiting each core to at most two extracted rectangles would keep the total within 32, but would be inefficient.
That is, suppose each core is limited to at most two extracted rectangles. If the core in charge of one region detects no difference and so has zero difference rectangles, its unused allowance of two could in principle be used by the core in charge of another region, which could then extract four difference rectangles. However, because these processes run in parallel on a plurality of cores, the allowance cannot be shared in this way. Every core would therefore be limited to two or fewer rectangles, which is inefficient. The present embodiment avoids this situation.
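The difference between a fixed per-core cap and a screen-wide cap can be made concrete with a small Python calculation. The per-strip counts below are invented for illustration, and the function names are hypothetical; only the limits (2 per core, 32 per screen, 16 strips) come from the text.

```python
def excess_with_per_core_cap(counts, per_core_cap):
    """Rectangles that must be merged away when every core has its own cap."""
    return sum(max(0, c - per_core_cap) for c in counts)


def excess_with_global_cap(counts, global_cap):
    """Rectangles that must be merged away when only the screen total is capped."""
    return max(0, sum(counts) - global_cap)


# Eight strips see no change, eight strips each see four rectangles.
counts = [0] * 8 + [4] * 8            # 32 rectangles in total
per_core = excess_with_per_core_cap(counts, per_core_cap=2)
screen_wide = excess_with_global_cap(counts, global_cap=32)
```

With these counts the per-core cap forces 16 rectangles to be merged away even though the screen total is exactly 32, whereas the screen-wide cap forces none.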

As another embodiment, as shown in FIG. 9, the CPU 10 may divide one screen into five regions, namely a moving image region and four other regions, based on the coordinates of the moving image region obtained from the OS, assign the CPU 10 to the moving image region, and assign the cores of the GPU 20 to the four regions other than the moving image region according to their area ratio. The moving image region changes rapidly and produces many differences in every frame, so the data must be thinned out to secure bandwidth; it therefore needs to be processed separately.
The merge processing of Steps 1 to 5 above may be omitted for both the moving image region and the other regions, or it may be performed at the region boundaries for the four regions other than the moving image region.
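The source says only that GPU cores are assigned to the four non-video regions "from the area ratio". One way this could be done is sketched below in Python: every region receives at least one core and the remaining cores are shared in proportion to area, with leftovers going to the largest fractional shares. The rounding scheme, the minimum of one core per region, the function name, and the sample areas are all assumptions.

```python
def allocate_cores(areas, n_cores):
    """Share n_cores among regions in proportion to area, at least one each."""
    if n_cores < len(areas):
        raise ValueError("need at least one core per region")
    total = sum(areas)
    spare = n_cores - len(areas)      # cores left after one per region
    shares = [a * spare / total for a in areas]
    alloc = [1 + int(s) for s in shares]
    leftover = n_cores - sum(alloc)
    # Hand remaining cores to the regions with the largest fractional share.
    order = sorted(range(len(areas)),
                   key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Four non-video regions with hypothetical pixel areas, 16 GPU cores.
alloc = allocate_cores([400, 200, 100, 100], 16)
```

The video region itself is excluded from this allocation because, in this embodiment, it is handled by the CPU 10.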

Each of the embodiments described above is a preferred embodiment of the present invention, and various modifications can be made without departing from the gist of the present invention. For example, the functions of each apparatus may be realized by having the apparatus read and execute a program that implements the functions of the image processing apparatus 1. The program may also be transmitted to another computer system via a computer-readable recording medium such as a CD-ROM or magneto-optical disk, or by a transmission wave via a transmission medium such as the Internet or a telephone line.

10 CPU
20 GPU
20-1 to 20-16 Cores
30 Frame memory
40 Program storage unit
50 Main memory
60 Input unit
70 Output unit
80 Bus

Claims

An image processing apparatus comprising a CPU and a coprocessor having a plurality of cores suited to parallel processing, wherein
the CPU divides one screen into a plurality of areas and assigns one of the cores to each area, and
for the difference rectangles extracted by the processing of the cores, the difference rectangles extracted in adjacent areas are merged according to a predetermined rule, thereby limiting the number of difference rectangles in the entire screen.

The image processing apparatus according to claim 1, wherein
the one screen is divided vertically into strips that are long in the horizontal direction, and
the predetermined rule is to:
merge when the X coordinates of two adjacent rectangles match;
merge when the length of the longer side B of two adjacent rectangles is within a predetermined multiple of the length of the shorter side A;
merge when the X coordinates of two rectangles match and the distance between the Y coordinates of side A and side B is less than or equal to a predetermined value;
merge when side A and side B of two rectangles are close; and
search for start-point coordinates from the upper left of the screen rightward and downward, and merge from the start point whose ordinal equals the number of difference rectangles allowed for the entire screen up to the maximum X and Y extents of the remaining rectangles.

The image processing apparatus according to claim 1, wherein a specific area is distinguished from the other areas and the merging process is not performed on it.

An image processing method using a CPU and a coprocessor having a plurality of cores suited to parallel processing, the method comprising:
a step in which the CPU divides one screen into a plurality of areas and assigns one of the cores to each area; and
a step in which the cores merge, according to a predetermined rule, the extracted difference rectangles that were extracted in adjacent areas,
whereby the number of difference rectangles in the entire screen is limited.

A program that causes a computer comprising a CPU and a coprocessor having a plurality of cores suited to parallel processing to limit the number of difference rectangles in the entire screen by:
causing the CPU to execute a process of dividing one screen into a plurality of areas and assigning one of the cores to each area; and
causing the cores to execute a process of merging, according to a predetermined rule, the extracted difference rectangles that were extracted in adjacent areas.