JPH1091594A - Multiprocessor data processor and its method

Info

Publication number: JPH1091594A
Application number: JP24368596A
Authority: JP
Inventors: Toshifumi Honda; 敏文本田; Yukio Matsuyama; 幸雄松山; Mitsunobu Isobe; 光庸磯部
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-09-13
Filing date: 1996-09-13
Publication date: 1998-04-10

Abstract

PROBLEM TO BE SOLVED: To process continuously transferred data at high speed in parallel on a plurality of processors while keeping the load on them equal at all times, by storing the data of each divided and allocated block in the local memory of the desired processor and processing it there. SOLUTION: Picture processing board 104 is allocated the correlation operations between the stored detection picture and standard pictures g1 and g2, picture processing board 105 the correlation operation with picture g3, and picture processing board 106 the correlation operation with picture g4. In the scheduling by which host computer 108 decides the allocation to the respective processors, not only the processing of the immediately following detection picture 109 but the processing of all detection pictures 109-112 is allocated. Continuously acquired data is divided into a plurality of blocks based on the characteristics of the data, and the data of each allocated block is stored in the local memory of the desired processor and processed there.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data processing apparatus that acquires large volumes of data, such as measurement results from a high-speed measuring instrument or images, at high speed while processing them at the same time. More particularly, it relates to a multiprocessor data processing apparatus and method in which the data processing is assigned to a plurality of processors arranged in the apparatus and the processors process the data in parallel, thereby achieving high-speed data processing.

[0002]

[Description of the Related Art] In data processing that acquires large volumes of data, such as measurement results from a measuring instrument or images, at high speed while processing them at the same time, the portions requiring high speed have conventionally been handled by hardware logic, as described for example in Takanori Ninomiya, "Green Sheet Pattern High-Speed Inspection System", 1990 IEICE Spring National Convention. Another approach provides a plurality of data processing processors and operates them in parallel, as in Hirotada Ueda et al., "Multiprocessor for Image Processing Connected by Ring Bus", 1986 IECE General National Convention, in which a plurality of processors are connected by a ring-type shift register and operated in parallel for high-speed processing. That multiprocessor provided two modes. In the multiple-data mode, used when there are many images to be processed and the same processing is applied to each in turn, the images are distributed to separate processing units. In the function-division mode, used when many feature quantities are to be obtained from a single image, the same image is broadcast to a plurality of processors and each processor obtains a different kind of feature quantity from it.

[0003]

[Problems to Be Solved by the Invention] However, because the conventional technique described above performs its processing in hardware logic, it was difficult to accommodate a variety of processing algorithms flexibly. A data processing algorithm generally has to be changed for each application, but with the conventional approach the hardware logic had to be changed every time the algorithm changed, so changing the application was costly. Furthermore, when a series of data contains data of various attributes, each to be processed by a different algorithm, this was impossible unless extremely large-scale hardware logic was built.

In the apparatus that connects a plurality of processors by a ring bus and operates them in parallel, processing must be distributed so that the load on each processor is equal in order to maximize throughput. When the data to be processed is large in volume and is processed while being acquired sequentially, however, it was difficult to distribute the load optimally. Conventional load distribution methods are divided into static and dynamic load distribution. Static load distribution assigns the optimal processor when the program is compiled, so it cannot cope when the acquired data is of various kinds and the load varies with the nature of each piece of data. Dynamic load distribution is divided into centralized control and distributed control.

In centralized control, a processor that manages the whole system determines the physical processor when assigning processing, based on processor load. This method must take into account the time required for the assignment itself, and was difficult to apply where large volumes of data must be processed continuously in real time. In addition, when a plurality of processes are applied continuously to large volumes of data, the time required for processing is generally unknown. It was therefore difficult to decide whether it is more efficient to distribute a plurality of data blocks to separate processing units, or to distribute the same data block to a plurality of processing units and have each perform a different kind of processing. For this reason, combining the two and dynamically changing the number of processors that process one data block in order to optimize the distribution had not been done.

For distributed control, the random method, the threshold method, and the cyclic method are known. The random method assigns processing to processors at random, so optimal assignment cannot be expected. The threshold method assigns processing to the first processor found whose load is at or below a certain threshold; because the optimal choice is inherently the processor with the smallest load, this method cannot achieve optimal assignment either. The cyclic method numbers the processors and, once processing has been assigned to a processor, assigns the next processing to the processor whose number is obtained by adding a fixed value to that processor's number. This method works well when all processing is identical and the data to be processed is always detected at a constant rate, but otherwise the load becomes uneven and the distribution cannot be optimized. Thus, with the conventional methods it is impossible to distribute the load optimally, and performance proportional to the number of processors could not be obtained.
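The three distributed-control policies described above can be sketched as follows, next to the least-loaded choice that the text treats as the ideal. This is a minimal illustration, not code from the patent: the function names, the list-of-loads representation, and the wrap-around in the cyclic method are assumptions.

```python
import random
from typing import List, Optional


def assign_random(loads: List[float], rng: random.Random) -> int:
    """Random method: pick any processor, ignoring its load."""
    return rng.randrange(len(loads))


def assign_threshold(loads: List[float], threshold: float) -> Optional[int]:
    """Threshold method: the first processor found whose load is <= threshold."""
    for idx, load in enumerate(loads):
        if load <= threshold:
            return idx
    return None  # no processor qualifies; the text does not say what happens next


def assign_cyclic(previous: int, step: int, n_processors: int) -> int:
    """Cyclic method: previous processor number plus a fixed step (wrap-around assumed)."""
    return (previous + step) % n_processors


def assign_least_loaded(loads: List[float]) -> int:
    """The choice the text calls inherently optimal: the processor with the smallest load."""
    return min(range(len(loads)), key=loads.__getitem__)


loads = [5.0, 3.0, 1.0]  # hypothetical loads of three processors
first_under_threshold = assign_threshold(loads, threshold=4.0)  # processor 1
least_loaded = assign_least_loaded(loads)                       # processor 2
```

With these hypothetical loads the threshold method stops at processor 1 even though processor 2 is less loaded, which is the shortcoming the paragraph points out.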

[0004] An object of the present invention is to solve the above problems by providing a multiprocessor data processing apparatus and method capable of processing continuously transferred data in parallel at high speed on a plurality of processors while keeping their loads roughly equal at all times.

[0005] Another object of the present invention is to provide an inspection apparatus and method capable of performing high-speed image processing in parallel on image data that is acquired from an object under inspection and continuously transferred, using a plurality of processors whose loads are kept roughly equal at all times.

[0006]

[Means for Solving the Problems] To achieve the above objects, the present invention is a multiprocessor data processing apparatus that has a plurality of processors, each provided with a local memory, and that processes data while acquiring it continuously, the apparatus comprising: allocation means for allocating the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, to desired processors so that the distribution of the processing load among the processors is made appropriate; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data; wherein the data of each block divided by the dividing means and allocated by the allocation means is stored in the local memory of the desired processor and processed there.

The present invention is also a multiprocessor data processing apparatus that has a plurality of processors, each provided with a local memory, and that processes data while acquiring it continuously, the apparatus comprising: allocation means for allocating the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, to desired processors based on the expected value of the processing time required for each parallelizable portion of the processing of each block and on the expected value of the processing time of the unprocessed data of each processor; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data; wherein the data of each block divided by the dividing means and allocated by the allocation means is stored in the local memory of the desired processor and processed there.
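The allocation means above uses two inputs: the expected time of each parallelizable portion of a block, and the expected time of each processor's unprocessed work. The patent text here does not state the rule that combines them, so the sketch below substitutes a common greedy heuristic (longest portion first, each to the processor expected to finish earliest). All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_block(
    task_times: Dict[str, float],  # expected time of each parallelizable portion
    backlog: List[float],          # expected time of unprocessed work per processor
) -> Tuple[Dict[str, int], List[float]]:
    """Greedy sketch: give each portion to the processor expected to finish earliest.

    This is an assumed heuristic, not the allocation rule of the patent.
    """
    finish = list(backlog)  # expected finish time per processor, updated as we assign
    assignment: Dict[str, int] = {}
    # Longest portions first, so that short ones can fill the remaining gaps.
    for name, expected in sorted(task_times.items(), key=lambda kv: -kv[1]):
        proc = min(range(len(finish)), key=finish.__getitem__)
        assignment[name] = proc
        finish[proc] += expected
    return assignment, finish


# Four correlation operations of equal expected cost on three idle processors.
assignment, finish = allocate_block(
    {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 1.0}, [0.0, 0.0, 0.0]
)
```

With three idle processors, one processor receives two of the four portions and the others one each. If one processor already carries a large backlog, the greedy rule simply gives it nothing.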

[0007] The present invention is also a multiprocessor data processing apparatus that has a plurality of processors, each provided with a local memory, and that processes data while acquiring it continuously, the apparatus comprising: allocation means for allocating the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, to desired processors based on the expected value of the processing time required for each parallelizable portion of the processing of each block, the expected value of the processing time of the unprocessed data of each processor, and the number of processors to be allocated; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data; wherein the data of each block divided by the dividing means and allocated by the allocation means is stored in the local memory of the desired processor and processed there.

Further, in the above multiprocessor data processing apparatus, the allocation means is configured to use history information on the processing of each block in the processors when allocating the processing of each block to desired processors. Further, in the above multiprocessor data processing apparatus, the expected value of the processing time required for each of a plurality of processes executable in parallel is calculated from parameters stored in a secondary storage device of the data processing apparatus. Further, in the above multiprocessor data processing apparatus, the expected value of the processing time required for each of a plurality of processes executable in parallel is estimated from the processing time taken when data having the same attributes as the data to be processed was processed beforehand.
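One way to estimate an expected processing time from earlier runs on data with the same attribute is a per-attribute running mean. The patent does not say which statistic is used, so the mean, the class name `HistoryEstimator`, and the attribute strings below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Optional


class HistoryEstimator:
    """Keeps measured processing times per data attribute and returns their mean."""

    def __init__(self) -> None:
        self._total = defaultdict(float)  # summed elapsed time per attribute
        self._count = defaultdict(int)    # number of measurements per attribute

    def record(self, attribute: str, elapsed: float) -> None:
        """Store one measured processing time for data with this attribute."""
        self._total[attribute] += elapsed
        self._count[attribute] += 1

    def expected(self, attribute: str) -> Optional[float]:
        """Mean of the recorded times, or None if this attribute was never seen."""
        count = self._count.get(attribute, 0)
        if count == 0:
            return None
        return self._total[attribute] / count


estimator = HistoryEstimator()
estimator.record("pattern-A", 2.0)  # hypothetical attribute name and timings
estimator.record("pattern-A", 4.0)
estimate = estimator.expected("pattern-A")  # mean of the two recorded times
```

For an attribute with no history the sketch returns `None`; a real scheduler would need a fallback, for example a parameter read from secondary storage as the paragraph also mentions.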

[0008] The present invention is also a multiprocessor data processing method that processes data while acquiring it continuously using a plurality of processors, each provided with a local memory, wherein the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, is allocated to desired processors so that the distribution of the processing load among the processors is made appropriate, and the data of each allocated block is stored in the local memory of the desired processor and processed there.

The present invention is also a multiprocessor data processing method that processes data while acquiring it continuously using a plurality of processors, each provided with a local memory, wherein the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, is allocated to desired processors based on the expected value of the processing time required for each parallelizable portion of the processing of each block and on the expected value of the processing time of the unprocessed data of each processor, and the data of each allocated block is stored in the local memory of the desired processor and processed there.

[0009] The present invention is also a multiprocessor data processing method that processes data while acquiring it continuously using a plurality of processors, each provided with a local memory, wherein the processing of each of a plurality of blocks, into which the data scheduled for acquisition is divided based on the nature of the data, is allocated to desired processors based on the expected value of the processing time required for each parallelizable portion of the processing of each block and on the expected value of the processing time of the unprocessed data of each processor, and the data of each allocated block is stored in the local memory of the desired processor and processed there. Further, in the above multiprocessor data processing method, history information on the processing of each block in the processors is used when allocating the processing of each block to desired processors.

Further, the present invention is characterized in that, in a multiprocessor data processing apparatus in which each processor has a local memory and which processes data while acquiring it continuously: the data scheduled for acquisition is divided into a plurality of blocks based on the nature of the data; the time remaining until completion of the data processing that each processor is executing when acquisition of a block's data starts is calculated from the hardware configuration, the nature of the data, and the algorithm that processes the data; the parallelizable portions of the processing of the block are identified; the processing of the block is assigned to one or more processors based on the expected value of the processing time of the unprocessed data of each processor and the expected value of the processing time required for each parallelizable portion of the processing of the block; and, when acquisition of the block's data starts, the data is transferred simultaneously to the local memories of the one or more processors to which its processing has already been assigned.

[0010] The present invention is also a multiprocessor data processing apparatus that processes data while acquiring it continuously using a plurality of processors, each provided with a local memory, characterized in that: the data scheduled for acquisition is divided into a plurality of blocks based on the nature of the data; the expected value of the processing time required for each of a plurality of parallel-executable processes to be performed on each block, and the time until the processing assigned to each processor finishes, are obtained in advance; for each block, the number of processors to which its processing is assigned, and which processors they are, are calculated dynamically according to the type of data, based on one or a combination of (a) the expected value of the processing time required for each of the parallel-executable processes, (b) the processing assigned to each processor for the blocks up to the immediately preceding one and the expected value of the time required for that processing, and (c) the waiting time caused by processor memory access contention, which is determined by the hardware configuration and increases when data is stored in the local memory; and, when acquisition of the block's data starts, the data is stored simultaneously in the local memories of the one or more processors to which its processing has already been assigned.

[0011]

[Embodiments of the Invention] The present invention is described below with reference to FIGS. 1 to 10. FIG. 1 shows an example of the overall configuration of the invention: a data processing apparatus that checks whether any of a number of predetermined patterns is present in a detected image and, if so, finds its position. Reference numeral 101 denotes an image detection sensor. The image signal detected by the image detection sensor 101 is converted into a digital signal by the A/D converter 102 and then accumulated in the buffer 103. The digital data accumulated in the buffer 103 is transferred to all of the data processing processor boards 104, 105, and 106, which are arranged in parallel. Each data processing processor board 104, 105, 106 is notified in advance by the host computer 108, via the bus 107, which detected images it is to store. An image transferred to the data processing processor boards 104, 105, 106 is stored in the memory M mounted on a board only if the host computer 108 has notified that board to store the image in its memory M. P1, P2, and P3 on the data processing processor boards 104, 105, and 106 each denote a processor.

[0012] Consider now the case where the image detection sensor 101 continuously detects the images shown as 109 to 112. Each of 109 to 112 is a single image, but the sensor 101 detects at a constant rate, without interruption at image boundaries or elsewhere. One or all of these images are stored in the local memory M on each data processing processor board 104, 105, 106. A secondary storage device 113 is connected to the host computer 108 and stores the reference patterns 114 to 117 of the patterns to be detected. These reference patterns 114 to 117 are transferred via the bus 107 to the local memories M on the data processing processor boards 104 to 106 before image detection starts. On each data processing processor board 104, 105, 106, the processor P1, P2, or P3 performs, for each detected image (f_i) stored in the local memory M, a gray-level correlation operation with some of the reference patterns (g_j) 114 to 117. Taking the operations on all data processing processor boards together, each detected image (f_i) 109 to 112 is correlated exactly once with each of the reference patterns (g_j) 114 to 117. Denoting the detected images 109 to 112 by f1, f2, f3, f4 and the reference patterns 114 to 117 by g1, g2, g3, g4, each gray-level correlation operation in the processors P1 to P3 is given by Equation (1) below.

[0013]

[Equation 1]

[0014] f_i(x, y) denotes the gray value of the detected image at coordinates (x, y). Accordingly, x and y are varied pixel by pixel over the region (frame) in which one pattern exists. f_i(x-m, y-n) denotes the gray value of the detected image at the position shifted from (x, y) by m pixels in the X direction and n pixels in the Y direction. That is, because a positional offset exists between the detected image and the reference image, the image is shifted by m pixels in the X direction and n pixels in the Y direction, and the best-matching gray-level correlation is obtained with Equation (1). If f_i contains the same pattern as g_j, the result of Equation (1) is large. Therefore, by performing the gray-level correlation operation between a detected image f_i and every reference pattern g_j and finding the reference pattern that gives the maximum correlation, the type of reference pattern present in the detected image can be identified.

No single data processing processor board 104, 105, 106 performs the correlation operation with all of the reference patterns 114 to 117, so a board alone cannot identify the reference pattern present in the detected image. Therefore, after finishing its correlation operations, each data processing processor board 104, 105, 106 notifies the host computer, via the bus 107, of the identifier of the reference pattern type it correlated, the maximum value, and the coordinates at which the maximum was obtained. For each detected image f_i, the host computer 108 compares the maxima reported by the data processing processor boards for the respective reference patterns, recognizes that the pattern identical to the reference pattern giving the largest correlation was present in the image, and writes that pattern identifier and its position to the secondary storage device 113.
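The board-side search and the host-side comparison described above can be illustrated with a small pure-Python sketch. Equation (1) is not reproduced in this text, so the score below is assumed to be a plain, unnormalized sum of products, and the shift is applied as f(x+m, y+n), which may differ in sign from the patent's convention. `best_match` and `host_select` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]


def best_match(f: Image, g: Image) -> Tuple[float, int, int]:
    """Board side: slide reference pattern g over detected image f.

    Returns (maximum score, m, n), where (m, n) is the X/Y shift of the best fit.
    The score is an assumed unnormalized sum of products, not Equation (1) itself.
    """
    f_h, f_w = len(f), len(f[0])
    g_h, g_w = len(g), len(g[0])
    best = (float("-inf"), 0, 0)
    for n in range(f_h - g_h + 1):
        for m in range(f_w - g_w + 1):
            score = sum(
                f[y + n][x + m] * g[y][x] for y in range(g_h) for x in range(g_w)
            )
            if score > best[0]:
                best = (score, m, n)
    return best


def host_select(reports: List[Tuple[str, float, Tuple[int, int]]]):
    """Host side: keep the report (pattern id, maximum, position) with the largest maximum."""
    return max(reports, key=lambda report: report[1])


# A 4x4 detected image containing pattern g1 at X shift 1, Y shift 2.
f = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0]]
g1 = [[1, 2], [3, 4]]
g2 = [[4, 3], [2, 1]]
reports = []
for pattern_id, g in (("g1", g1), ("g2", g2)):
    value, m, n = best_match(f, g)
    reports.append((pattern_id, value, (m, n)))
winner = host_select(reports)
```

In this toy case the report for `g1` has the larger maximum, so the host would record `g1` and its position. An unnormalized score favors bright regions, so a practical system would likely normalize; that choice is outside what this text specifies.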

[0015] To process data at high speed in the configuration shown in FIG. 1, the assignment of the gray-level correlation operations to the data processing processors P1, P2, and P3 must be optimized. To achieve high-speed processing in an apparatus provided with a plurality of processors, the operations must be assigned to the processors P1 to P3 so that the amount of computation on each is equal. FIG. 2 shows the timing chart when the assignment is determined with the cyclic method, which has been commonly used. The gray-level correlation of a detected image with all of the reference patterns is assigned to image processing board 104 for detected image 109, to image processing board 105 for detected image 110, to image processing board 106 for detected image 111, and to image processing board 104 for detected image 112. As a result, processor P2 performs no processing until detection of detected image 110, shown at 202, is complete, and processor P3 cannot process until detection of detected image 111, shown at 203, is complete. In addition, after processors P2 and P3 finish the processing shown at 205 and 206 respectively, they perform no processing while processor P1 processes detected image 112 as shown at 208. Thus the processor workloads are highly uneven, particularly near the start and the end of image detection, and high-speed data processing cannot be achieved.
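The imbalance of the cyclic assignment can be reproduced with a small timeline model. The sketch assumes that image k finishes transferring at time (k+1) x `transfer`, that a processor starts an image only once the transfer is complete and its previous work is done, and that every image costs the same `work`. The numbers are illustrative and are not taken from FIG. 2.

```python
from typing import List, Tuple


def cyclic_schedule(
    n_images: int, n_procs: int, transfer: float, work: float
) -> Tuple[float, List[float]]:
    """Simulate cyclic assignment: image k goes to processor k mod n_procs.

    Returns (time at which the last processor finishes, busy time per processor).
    """
    free_at = [0.0] * n_procs  # time at which each processor becomes free
    busy = [0.0] * n_procs     # accumulated processing time per processor
    for k in range(n_images):
        proc = k % n_procs
        ready = (k + 1) * transfer         # transfer of image k is complete
        start = max(ready, free_at[proc])  # wait for both the data and the processor
        free_at[proc] = start + work
        busy[proc] += work
    return max(free_at), busy


# Four images, three processors; each image needs four correlation operations.
makespan, busy = cyclic_schedule(n_images=4, n_procs=3, transfer=1.0, work=4.0)
```

Under these assumptions the first processor is busy for 8 of the 9 time units while the other two are busy for only 4 each, mirroring the idle periods near the start and end of detection that the paragraph describes.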

[0016] To avoid this, the same detected image can be stored in the local memories M1 to M3 of several of the data processing processor boards 104 to 106, and each of the processors P1 to P3 can perform the gray-level correlation with different reference patterns. However, when a processor's local memory is built from ordinary DRAM, writing an image to the memory and reading an image from it cannot be done at the same time. While an image is being stored, the processor's reads from the memory are therefore restricted, and image processing on that processor takes correspondingly longer. In other words, assigning the processing of an image to many processors indiscriminately ends up requiring more time, and high-speed data processing cannot be achieved.

【００１７】To avoid these problems and speed up data processing, the assignment of data processing must not be fixed in a single predetermined way. Instead, the assignment of data processing to each processor must be optimized on the basis of the data amount and the processing algorithm. A timing chart for the case where the assignment is optimized is described with reference to FIG. 3. Reference numerals 210 to 212 indicate the storage of the detected image 109. Since 109 is the first image, the image processing boards 104 to 106 are not performing any data processing at that point. That is, storing the image in the local memories M1 to M3 of all the processors P1 to P3 causes no memory access contention and does not increase the image processing time. For the stored detected image, the correlation operations with the reference images g1 and g2 are assigned to the image processing board 104, the correlation operation with g3 to the image processing board 105, and the correlation operation with g4 to the image processing board 106. Since the processing in the processor P1 involves twice the computation of the processors P2 and P3, the processing in P1 ends at a different time from the processing in P2 and P3. In the processors P2 and P3, even if processing continues while the detected image is being stored in the memories M2 and M3 as indicated by 216 and 217, the processing indicated by 214 and 215 has finished around the time the transfer of the next detected image 110 to the memories M2 and M3 ends. Because processing of an image can start only after its transfer has ended, if the processing of the detected image 110 were not assigned to them, the processors P2 and P3 would perform no processing from the end of the transfer of the detected image 110 until the end of the transfer of the detected image 111. This wasted time is longer than the processing time added by storing the image, so assigning the processing to P2 and P3 causes no overall loss of time even though storing the image lengthens their processing. In the processor P1, however, the processing of the detected image 109 indicated by 213 continues until after the transfer of the detected image 111 has ended, so the image cannot be stored during a period in which the processor is idle. For this reason, no processing is assigned to the processor P1.

【００１８】Thus, to assign data processing optimally, the amount of unexecuted data processing in each of the processors P1 to P3 at the start of data storage must be obtained, and the assignment must be determined on that basis. However, when images are detected continuously by the sensor 101 and there is no time interval between the end of detection of one image and the start of detection of the next, a procedure that waits for the transfer of one image to end, then determines the processing under execution and the time remaining until it finishes, and only then decides the assignment of processing to each processor may not complete in time for the start of the transfer of the next image. For this reason, the scheduling 209, in which the host computer 108 determines the assignment to each processor, assigns not only the processing of the detected image 109 that immediately follows but all of the processing of the detected images 109 to 112. Because scheduling is performed in advance, quantities such as the expected time required for the processing already assigned to each processor at the start of transfer of each detected image must be obtained beforehand from the type of data and algorithm. Here, all processing performed by the processors is gray-level correlation, so the processing time can be estimated as the product of the time required for gray-level correlation with one reference pattern and the number of reference patterns assigned to each processor. The additional data processing time caused by memory access contention when data is transferred to the memory can also be estimated from the hardware configuration.
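The estimate described in paragraph [0018] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimated_processing_time`, the parameter names, and the idea of adding the contention overhead as a single term are hypothetical and do not come from the patent text.

```python
def estimated_processing_time(per_pattern_time, num_patterns, contention_overhead=0.0):
    """Estimate how long one processor needs for its assigned correlations.

    per_pattern_time:    time for gray-level correlation with one reference pattern
    num_patterns:        number of reference patterns assigned to the processor
    contention_overhead: extra time caused by memory access contention while an
                         image is being stored (assumed here to be one additive term)
    """
    return per_pattern_time * num_patterns + contention_overhead


# Assignment of the first image in FIG. 3: P1 handles g1 and g2, P2 handles g3,
# P3 handles g4. C = 1.0 is an arbitrary unit chosen for illustration.
C = 1.0
assigned_patterns = {"P1": 2, "P2": 1, "P3": 1}
estimates = {p: estimated_processing_time(C, k) for p, k in assigned_patterns.items()}
```

With these inputs P1 is estimated at twice the time of P2 and P3, which matches the statement in paragraph [0017] that P1 carries twice the computation.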

【００１９】FIG. 4 shows the assignment algorithm executed by the host computer 108. First, the amount of unexecuted processing in each processor at the start of transfer of the target image i is examined. Let n be the number of processors, let Pt1, Pt2, ..., Ptn be the estimated times required to execute the processing already assigned to each processor, and let Ptmin1, Ptmin2, ..., Ptminn be these values sorted in ascending order. That is, min i denotes the number of the processor expected to be the i-th to finish its already assigned processing. Next, the processing of image i is decomposed into a plurality of processes that can be executed in parallel, and the expected processing times of these are denoted C1, C2, ..., Cl. Next, the processors Pmin1, Pmin2, ..., Pminm are obtained that are expected to have finished all of their data processing by the time obtained by subtracting Dti, the expected time required in the processor P to transfer image i to the memory M, from the expected end of storage of image i+1. If image i is the last image to be captured, m = n. Then, all combinations that divide the processes C1, C2, ..., Cl into m sets, m being the number of candidate processors, are considered. These combinations are denoted (F1,min1, F1,min2, ..., F1,minm), (F2,min1, F2,min2, ..., F2,minm), ..., (Fc,min1, Fc,min2, ..., Fc,minm). The variable h is determined so that the maximum of the following expected value, taken as the variable k is varied, is minimized, and the processes (Fh,min1, Fh,min2, ..., Fh,minm) are assigned to the processors min 1, min 2, ..., min m, respectively. Here, k is an integer from 1 to m, and h is an integer from 1 to c. The flow also checks whether there is any processor expected to have finished all of its data processing, and includes a step in which all of the processing is assigned to processor min 1. The decomposition into processes that can be executed in parallel and the calculation of the expected processing times described above need not be performed at the time of scheduling. They may be obtained in advance and stored in the secondary storage device 113.
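The selection rule of FIG. 4 can be read as a brute-force min-max search, sketched below in Python. This is an illustration, not the patented implementation. The function name `assign_processes`, the dictionary `pending`, and the exhaustive enumeration with `itertools.product` are assumptions. The expression for the expected value does not survive in this text, so the sketch assumes, in line with the worked example in paragraph [0024], that it is the pending time of a processor plus the sum of the expected times of the processes given to it.

```python
from itertools import product


def assign_processes(pending, costs, candidates):
    """Return (choice, finish_time) minimizing the latest expected finish.

    pending:    dict mapping processor -> expected time left for work already
                assigned to it (Pt in the text)
    costs:      expected times C1..Cl of the parallel processes of image i
    candidates: non-empty list of processors that may receive work for image i
    choice[j] names the processor that receives process j.
    """
    # Order candidates by expected finish of their pending work (min 1, min 2, ...).
    order = sorted(candidates, key=lambda p: pending[p])
    best = None
    # Every tuple assigns each process to one candidate: m ** l combinations.
    for choice in product(order, repeat=len(costs)):
        finish = {p: pending[p] for p in order}
        for proc, cost in zip(choice, costs):
            finish[proc] += cost
        worst = max(finish.values())  # the slowest processor decides the end time
        if best is None or worst < best[1]:
            best = (choice, worst)
    return best


# Image 2 of the FIG. 1 embodiment: candidates P2 and P3, four equal processes.
choice, finish_time = assign_processes({"P2": 1.0, "P3": 1.0}, [1.0] * 4, ["P2", "P3"])
```

The enumeration grows as m to the power l, which is tolerable for the three processors and four processes of the embodiment but would need a heuristic for larger problems.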

【００２０】In the embodiment shown in FIG. 1, the number of processors n is 3. The processing of each detected image is regarded as decomposable into one gray-level correlation operation per reference pattern, so l is 4 regardless of i. Since every process is the same gray-level correlation operation, the expected processing times are all equal, and the expected processing time is denoted C here. The images are called image 1, image 2, ... in the order of detection. At the transfer start timing of the detected image 109, that is, image 1, no processing has been performed beforehand, so Pt1 = Pt2 = Pt3 = 0. The time remaining, at the transfer start timing of image 2, until each processor finishes its processing of image 1 is given by Equations (2) and (3) below.

【００２１】
Pt1 = 2C (Equation 2)
Pt2 = Pt3 = C (Equation 3)
Next, for each processor, the estimated end time of the already assigned processing in progress at the start of storage of image i, plus the additional processing time caused by memory access contention when image i is stored in the memory, is compared with the end of storage of image i+1. If the former is earlier, the processor becomes a candidate for the processing of image i. That is, if the former is earlier, the time for which the processor is estimated to be idle if image i is not assigned to it exceeds the processing time added by storing the image, so the processors as a whole can be used more efficiently. This is explained with reference to FIG. 10. The processing performed for the images up to image i-1 is indicated by 901 in the timing chart of FIG. 10. Processing of image i+1 can start only after the transfer of image i+1 has ended, as indicated by 902. Accordingly, the interval in which the processor performs no processing if the processing of image i is not assigned to it is indicated by 903. Consider next the case where the processing of image i is assigned. Image processing is then in progress during the transfer of image i, so the additional processing time 904 caused by memory access contention while the image is stored in the memory must be taken into account. If interval 903 is longer than interval 904, the storage of image i and the start of its processing can take place before the transfer of image i+1 ends, which is the earliest time at which processing of image i+1 can start, and the processing time can therefore be shortened. If i is the last image, however, no subsequent image can be assigned, so all processors are treated as candidates. Let Dt be the additional processing time caused by memory access contention, and let Tr be the time from the start to the end of the transfer of one image. In the first embodiment, when the assignment of the processing of the detected image 110, that is, image 2, is determined, this comparison is as follows for each processor.

【００２２】Taking the transfer start timing 219 of the detected image 110 as the reference for each timing, the relations of Equations (4) and (5) below hold.
P1: 2C + Dt > 2Tr (Equation 4)
P2, P3: C + Dt < 2Tr (Equation 5)
Consequently, P2 and P3 are selected as the candidates for the processing of image 2.
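The candidate test of paragraphs [0021] and [0022] can be sketched as follows. The function name `candidate_processors`, its parameters, and the numeric values of C, Dt, and Tr are hypothetical. The values are chosen only so that C + Dt < 2Tr < 2C + Dt, as Equations (4) and (5) require, and they are not taken from the patent.

```python
def candidate_processors(pending, contention, next_storage_end, is_last_image=False):
    """Select the processors that may receive the processing of image i.

    pending:          dict mapping processor -> expected time left for already
                      assigned work, measured from the start of storage of image i
    contention:       extra processing time caused by storing image i (Dt)
    next_storage_end: end of storage of image i+1 on the same time axis
    is_last_image:    the last image makes every processor a candidate
    """
    if is_last_image:
        return list(pending)
    return [p for p, pt in pending.items() if pt + contention < next_storage_end]


# Illustrative values only.
C, Dt, Tr = 1.0, 0.2, 0.7
pending = {"P1": 2 * C, "P2": C, "P3": C}  # Equations (2) and (3)
candidates = candidate_processors(pending, Dt, 2 * Tr)
print(candidates)  # ['P2', 'P3']
```

P1 is excluded because its pending work plus the contention overhead runs past the end of storage of the next image.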

【００２３】Next, it is determined which processes are assigned to the processors that have become candidates. Let m be the number of candidate processors. The combinations are obtained in which the separable processes are gathered, one or more at a time, into m groups. The j-th such combination is denoted (F(j, min 1), F(j, min 2), ..., F(j, min m)), where F(j, min k) is the candidate group of processes to be assigned to processor min k. For each processor min k, the time until its processing finishes when the processes belonging to F(j, min k) are assigned to it is obtained, and the time at which the processor with the largest such value finishes is taken as the estimated end of processing of image i. This is done for all combinations, the combination whose processing finishes earliest is found, and the processing of the image is assigned as that combination indicates. In assigning the detected image 110 in the embodiment of FIG. 1, only P2 and P3 are candidates, so m = 2. Pt2 and Pt3 are equal at the start of transfer of the detected image 110, but for convenience of explanation min 1 = 2 and min 2 = 3 are used here. Since the processing is decomposed into four processes and each of the four is handled by either P2 or P3, there are 16 possible assignments. Let the correlation operations with the reference patterns 114, 115, 116, and 117 be process 1, process 2, process 3, and process 4, respectively, and suppose that F(1, 2) and F(1, 3) are chosen as follows.

【００２４】
F(1, 2) = {process 1, process 2, process 3}
F(1, 3) = {process 4}
In this case, the processing of F(1, 2) takes 3C and the processing of F(1, 3) takes C. Adding these to Pt2 and Pt3 at the start of transfer of the detected image 110 gives the time until each processor finishes under this assignment, as in Equations (6) and (7) below.
P2: 4C (Equation 6)
P3: 2C (Equation 7)
Thus, for this first combination, the processing of the detected image 110 ends at 4C relative to the start of transfer of the detected image 110. Next, suppose that F(2, 2) and F(2, 3) are chosen as follows.
F(2, 2) = {process 1, process 2}
F(2, 3) = {process 3, process 4}
In this case, both P2 and P3 take 3C to finish, so the processing ends earlier than with the first combination.
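The arithmetic of this worked example can be checked with a short Python sketch. The helper name `completion_time` and the choice C = 1.0 are assumptions made for illustration. The two groupings are the ones given in paragraph [0024].

```python
C = 1.0  # expected time of one correlation, arbitrary unit
pending = {"P2": C, "P3": C}  # Pt2 = Pt3 = C at the start of transfer of image 110


def completion_time(groups, cost=C):
    """Latest finish over all processors for one grouping of the processes.

    groups maps a processor to the set of process numbers assigned to it.
    """
    return max(pending[p] + cost * len(procs) for p, procs in groups.items())


first = {"P2": {1, 2, 3}, "P3": {4}}   # F(1, 2), F(1, 3)
second = {"P2": {1, 2}, "P3": {3, 4}}  # F(2, 2), F(2, 3)

first_end = completion_time(first)    # 4C, Equations (6) and (7)
second_end = completion_time(second)  # 3C
num_assignments = 2 ** 4              # each of four processes goes to P2 or P3
```

The second grouping finishes one unit of C earlier because it balances the four processes two and two.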

【００２５】FIG. 5 is a configuration diagram showing a specific embodiment of the data processing processor boards 104 to 106 shown in FIG. 1. The processor board consists of the components 402 to 406. Reference numeral 401 denotes a buffer, which is the same as 103 in FIG. 1. The buffer 401 has a capacity of one image line, and when one line of the image has been transferred from the sensor, it sends a buffer-full signal to each of the processor boards 104 to 106. A processor board 104 to 106 reads the data transferred from the buffer 401 with the DMA 402 mounted on the board and stores it in the local memory 403 only if it has been assigned the processing of that image's data. The CPU 405 accesses the local memory 403 via the cache 404. The CPU therefore does not need to access the memory 403 at all times, and data processing in the CPU 405 is not greatly affected even while an image is being stored in the memory 403. Reference numeral 406 denotes a dual-port memory, which is used to exchange data with the host computer 108 via the bus 407. Using the dual-port memory 406 reduces access contention between the CPU 405 and the host computer 108.

【００２６】FIG. 6 shows images of a different kind from those shown in FIG. 1, to which the invention is applied. Reference numerals 501 to 504 denote the images to be detected. Each image contains a plurality of patterns, and the approximate position and the type of each pattern are known in advance. To determine the position of each pattern in the image, the processor sets an image processing window, which is the region in which image processing is performed, from the approximate position of the pattern, and performs a gray-level correlation operation with the already known reference pattern within this region. A single pattern therefore does not need to be correlated with a plurality of reference patterns, and the processing cannot be decomposed into a plurality of parallel processes in the way it is in FIG. 1. However, the processing of the plurality of image processing windows in one image can be assigned to different processors. The timing chart for this case is shown in FIG. 7.

【００２７】The scheduling algorithm 601 is the same as that shown in FIG. 4. The image processing operation used here is the one given by Equation (1) above, but unlike the embodiment shown in FIG. 1, the correlation needs to be computed only with the image inside the image processing window, so the processing time is shorter. The image processing windows are not necessarily identical, so the computation time differs from window to window. To estimate the processing time accurately even in this case, the processing algorithm stored in the secondary storage device 113 records only the time required for the correlation operation on a certain unit image. The processing time required for each assigned process is calculated from this stored time, the size of the image processing window, and the size of the reference pattern. Image detection starts after the initial scheduling has been performed. The image 501 is stored in the local memories of P1 and P2. The correlation operation for the square pattern is performed by P1 and that for the triangular pattern by P2. Each of the subsequent images 502 to 504 contains three patterns, so the image is stored in the local memories 403 of all the processors P1 to P3, and each processor performs the correlation operation for one pattern.
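The text does not state how the stored unit time is combined with the window size and the reference pattern size. The following Python sketch shows one plausible scaling, purely as an assumption: the cost is taken to be proportional to the number of positions at which the pattern fits inside the window, multiplied by the pattern area relative to the area of the unit image. The function name `window_processing_time` and all parameters are hypothetical.

```python
def window_processing_time(unit_time, unit_pixels, window_size, pattern_size):
    """Estimate the correlation time for one image processing window.

    unit_time:    stored time for the correlation on one unit image
    unit_pixels:  number of pixels in that unit image
    window_size:  (width, height) of the image processing window
    pattern_size: (width, height) of the reference pattern
    Assumed model: time = unit_time * positions * pattern_area / unit_pixels.
    """
    win_w, win_h = window_size
    pat_w, pat_h = pattern_size
    if pat_w > win_w or pat_h > win_h:
        raise ValueError("reference pattern must fit inside the window")
    # Number of placements of the pattern inside the window.
    positions = (win_w - pat_w + 1) * (win_h - pat_h + 1)
    return unit_time * positions * (pat_w * pat_h) / unit_pixels


# An 8x8 pattern searched in a 16x16 window, with the unit image also 8x8.
estimate = window_processing_time(2.0, 64, (16, 16), (8, 8))
```

Any other monotone model could be substituted. The point is only that one stored unit time can yield a per-window estimate for the scheduler.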

【００２８】In the timing chart shown in FIG. 7, all scheduling is performed before continuous image detection starts, but the processing performed by a processor does not necessarily take the expected time. Because of the difference between the time actually taken and the expected value, after many images have been detected in succession, the estimated time until each processor finishes the image processing in progress at the start of storage of an image may differ greatly from the actual time. FIG. 8 is a timing chart for avoiding this. Here, the processor assignment scheduling is performed not only before the detection of continuous data starts but also after it has started. When the processing assigned to it finishes, each processor checks which line of which image is being transferred at that moment and writes this to the dual-port memory 406 together with the identifier of the assigned processing. Scheduling is performed by the host computer 108, which checks the dual-port memory 406, calculates from it the end timing of the processing already assigned, estimates the time until each processor finishes the processing in progress at the start of storage of the target image, and determines the assignment on that basis. If this scheduling were performed immediately after the storage of the image just before the target image ended, it would not finish before the transfer of the target image started. The scheduling is therefore performed after the transfer of the image two before the target image has ended.
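A record of the form "image number, line number" can be converted to a time on the transfer axis as sketched below. The function name `completion_timestamp` and its parameters are hypothetical, and the sketch assumes that images are numbered from 1, that they are transferred back to back with no gap, and that every line takes the same time.

```python
def completion_timestamp(image_no, line_no, lines_per_image, transfer_time):
    """Convert (image number, line number) into elapsed time since the first
    image started transferring.

    image_no:        1-based number of the image being transferred
    line_no:         number of lines of that image already transferred
    lines_per_image: lines in one image
    transfer_time:   time to transfer one whole image (Tr in the text)
    """
    return (image_no - 1 + line_no / lines_per_image) * transfer_time


# A process that finished while line 120 of 480 of the third image was in transfer.
t = completion_timestamp(3, 120, 480, 1.0)
```

The host can subtract the known start time of the process from this value to obtain the time the process actually took.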

【００２９】From the end timing of a process written in the dual-port memory 406, the time that the process took can be calculated. This time can be used in subsequent scheduling decisions, which allows more appropriate scheduling. Recording it in the secondary storage for each type of processing also preserves it, so that more reasonable scheduling becomes possible for different image patterns as well, and an operator can more easily check whether the scheduling was reasonable.
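One way to feed measured times back into the scheduler is to keep a running mean per type of processing, as in the Python sketch below. The class name `ProcessingTimeHistory`, its methods, and the use of a plain arithmetic mean are assumptions. The patent text says only that the measured time is recorded per type of processing and used in later scheduling.

```python
from collections import defaultdict


class ProcessingTimeHistory:
    """Running mean of measured processing times, kept per type of processing."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, kind, start, end):
        """Store one measurement for the given type of processing."""
        self._total[kind] += end - start
        self._count[kind] += 1

    def expected(self, kind, default):
        """Mean of the recorded times, or `default` if nothing was recorded."""
        n = self._count[kind]
        return self._total[kind] / n if n else default


history = ProcessingTimeHistory()
history.record("correlation", 0.0, 1.0)
history.record("correlation", 1.0, 4.0)
mean_time = history.expected("correlation", default=1.5)
```

The `default` argument stands in for the precomputed expected value that the scheduler would use before any measurement exists.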

【００３０】Next, scheduling is described for the case where a plurality of image groups detected with similar timing are processed repeatedly. Consider again the case, shown in FIG. 3, where scheduling for all the continuously detected images is performed immediately before continuous image detection starts. With 114 to 117 shown in FIG. 1 as the reference patterns, the images 801 to 804 shown in FIG. 9 are processed continuously in the same way as 109 to 112 shown in FIG. 1, and after a long interval the images 805 to 808 are detected. The order in which the patterns appear differs between 801 to 804 and 805 to 808, but the same number of images is processed. In this case, the scheduling that is optimal for processing 801 to 804 is also optimal for processing 805 to 808. When it is known that the detection and processing of images with the same timing will be repeated a very large number of times, calculating the processing assignment for the images every time adds the time indicated by 209 in FIG. 3 to the processing time. In such a case, the scheduling is performed in advance, and the identifiers of the processors assigned to the processing of each image are stored in the secondary storage device 113 shown in FIG. 1. Using the stored assignment result makes it unnecessary to perform scheduling every time and shortens the processing time.
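Reusing a stored schedule can be sketched as a load-or-compute step. In the Python sketch below, a JSON file stands in for the secondary storage device 113. The function name `load_or_compute_schedule`, the file format, and the shape of the schedule (image number mapped to a list of processor identifiers) are all assumptions made for illustration.

```python
import json
import os


def load_or_compute_schedule(path, compute):
    """Return the stored schedule if one exists; otherwise compute and store it.

    path:    file that stands in for the secondary storage device
    compute: function with no arguments that runs the scheduler and returns a
             JSON-serializable schedule, for example
             {"1": ["P1", "P2", "P3"], "2": ["P2", "P3"]}
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    schedule = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schedule, f)
    return schedule
```

On the second and later runs the scheduler is not called at all, which corresponds to saving the time indicated by 209 in FIG. 3.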

【００３１】[0031]

[Effect of the Invention] According to the present invention, under the constraint that a multiprocessor data processing apparatus must process data in real time, a comparable load can always be placed on all the processors, and a system that draws out the full performance of the apparatus can be built.

[Brief description of the drawings]

FIG. 1 is a simplified configuration diagram of the main part of an embodiment of the multiprocessor data processing apparatus according to the present invention.

FIG. 2 is a timing chart for the case where the processing assignment is determined for each detected image according to the present invention.

FIG. 3 is a timing chart for the case where the processing assignment is optimized according to the present invention.

FIG. 4 is a diagram showing an embodiment of the scheduling algorithm in the multiprocessor data processing apparatus according to the present invention.

FIG. 5 is a simplified configuration diagram of the main part of an embodiment of the processing unit in the multiprocessor data processing apparatus according to the present invention.

FIG. 6 is a diagram showing an example of images processed in the multiprocessor data processing apparatus according to the present invention.

FIG. 7 is a diagram showing an embodiment of a timing chart in the multiprocessor data processing apparatus according to the present invention.

FIG. 8 is a diagram showing another embodiment of a timing chart in the multiprocessor data processing apparatus according to the present invention.

FIG. 9 is a diagram showing image patterns, different from those of FIG. 1, processed in the multiprocessor data processing apparatus according to the present invention.

FIG. 10 is a timing chart showing how the processing time is shortened in the multiprocessor data processing apparatus according to the present invention.

[Explanation of symbols]

101: image detection sensor; 102: A/D converter; 103, 401: buffer; 104: processing unit 1; 105: processing unit 2; 106: processing unit 3; 107, 407: bus; 108: host computer; 109: first detected image; 110: second detected image; 111: third detected image; 112: fourth detected image; 113: secondary storage device; 114: reference pattern 1; 115: reference pattern 2; 116: reference pattern 3; 117: reference pattern 4; 402: DMA; 403, M1 to M3: local memory; 404: cache; 405: CPU; 406: dual-port memory; P1 to P3: processors

Claims

[Claims]

1. A multiprocessor data processing apparatus that has a plurality of processors each having a local memory and performs data processing while continuously acquiring data, comprising: allocating means for allocating the processing for each of a plurality of blocks, into which the data to be acquired is divided based on the nature of the data, to a desired processor so that the load distribution of the processing among the processors is optimized; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data, wherein the data of each block divided by the dividing means and allocated by the allocating means is stored in the local memory of the desired processor and processed.

2. A multiprocessor data processing apparatus that has a plurality of processors each having a local memory and performs data processing while continuously acquiring data, comprising: allocating means for allocating the processing for each of a plurality of blocks, into which the data to be acquired is divided based on the nature of the data, to a desired processor based on the expected value of the processing time required for each parallelizable part of the processing for each block and the expected value of the processing time of the unprocessed data of each processor; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data, wherein the data of each block divided by the dividing means and allocated by the allocating means is stored in the local memory of the desired processor and processed.

3. A multiprocessor data processing apparatus that has a plurality of processors each having a local memory and performs data processing while continuously acquiring data, comprising: allocating means for allocating the processing for each of a plurality of blocks, into which the data to be acquired is divided based on the nature of the data, to a desired processor based on the expected value of the processing time required for each parallelizable part of the processing for each block, the expected value of the processing time of the unprocessed data of each processor, and the number of processors to be allocated; and dividing means for dividing the continuously acquired data into a plurality of blocks based on the nature of the data, wherein the data of each block divided by the dividing means and allocated by the allocating means is stored in the local memory of the desired processor and processed.

4. The multiprocessor data processing apparatus according to claim 2, wherein the allocating means allocates the processing for each block to a desired processor by using history information on the processing of each block in the processors.

5. A multiprocessor data processing method for continuously acquiring data and processing the acquired data using a plurality of processors, each with a local memory, the method comprising: dividing the data to be acquired into a plurality of blocks based on the nature of the data; allocating the processing for each block to a desired processor so that the distribution of the processing load among the processors for the divided blocks is optimized; and storing the data of each allocated block in the local memory of the desired processor and processing it there.

6. A multiprocessor data processing method for continuously acquiring data and processing the acquired data using a plurality of processors, each with a local memory, the method comprising: dividing the data to be acquired into a plurality of blocks based on the nature of the data; allocating the processing for each block to a desired processor based on the expected value of the processing time required for each parallelizable part of the processing for each of the divided blocks and on the expected value of the processing time of the unprocessed data of each processor; and storing the data of each allocated block in the local memory of the desired processor and processing it there.

7. A multiprocessor data processing method for continuously acquiring data and processing the acquired data using a plurality of processors, each with a local memory, the method comprising: dividing the data to be acquired into a plurality of blocks based on the nature of the data; allocating the processing for each block to a desired processor based on the expected value of the processing time required for each parallelizable part of the processing for each of the divided blocks and on the expected value of the processing time of the unprocessed data of each processor; and storing the data of each allocated block in the local memory of the desired processor and processing it there.

8. The multiprocessor data processing method according to claim 6, wherein the processing for each block is allocated to a desired processor by using history information on the processing of each block in the processors.
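
For illustration only, the allocation recited in claims 2, 4, 6 and 8 can be sketched in Python as follows. The sketch assumes a greedy longest-first heuristic: each block is given to the processor whose unprocessed backlog has the smallest expected processing time. It also assumes the history information is a running mean of measured processing times per kind of block. The names `Processor`, `HistoryEstimator` and `allocate`, the block identifiers, and the heuristic itself are hypothetical and are not taken from the specification, which may use a different scheduling procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """One processor with its own local memory (hypothetical model)."""
    name: str
    backlog: float = 0.0  # expected time to finish data already allocated but unprocessed
    assigned: list = field(default_factory=list)  # block ids held in local memory


class HistoryEstimator:
    """Expected processing time per kind of block, taken as the mean of past runs."""

    def __init__(self, default=1.0):
        self.default = default  # assumed estimate for a kind with no history yet
        self.totals = {}
        self.counts = {}

    def record(self, kind, elapsed):
        """Store one measured processing time for a kind of block."""
        self.totals[kind] = self.totals.get(kind, 0.0) + elapsed
        self.counts[kind] = self.counts.get(kind, 0) + 1

    def expected(self, kind):
        """Return the mean measured time, or the default if nothing was recorded."""
        if kind not in self.counts:
            return self.default
        return self.totals[kind] / self.counts[kind]


def allocate(blocks, processors, estimator):
    """Allocate (block_id, kind) pairs to processors.

    Blocks are taken in descending order of expected processing time, and each
    one goes to the processor with the smallest expected backlog at that moment.
    """
    ordered = sorted(blocks, key=lambda b: estimator.expected(b[1]), reverse=True)
    for block_id, kind in ordered:
        target = min(processors, key=lambda p: p.backlog)
        target.backlog += estimator.expected(kind)
        target.assigned.append(block_id)  # stands in for storing data in local memory
    return processors
```

With three processors and four blocks of which two are expected to take half as long as the others, this heuristic places the two short blocks on the same processor, so the expected backlogs come out equal.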