JP2007188249A

JP2007188249A - Multiprocessor system

Info

Publication number: JP2007188249A
Application number: JP2006005130A
Authority: JP
Inventors: Masaru Kawasaki (川崎 勝); Tomohiko Matsumoto (松本 朋彦)
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2006-01-12
Filing date: 2006-01-12
Publication date: 2007-07-26

Abstract

PROBLEM TO BE SOLVED: To provide a multiprocessor system capable of raising the processing speed of the whole system to a level approaching that of hardware by suppressing wasteful use of the data bus.

SOLUTION: In the multiprocessor system, a system control processor is connected to a program memory and, together with a plurality of M units, is connected in common to a main memory through the data bus. Each of the M units comprises a DMAC-based bus I/F 202, which reads data from the main memory through a data input port 201 and outputs a final processing result from a data output port 204 to the main memory, and a plurality of P units 211, 212 or 221, 222, which independently execute prescribed processing on the input data in each unit and output the finally obtained processing result to the bus I/F 202.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a multiprocessor system, and more particularly to a multiprocessor system in which a plurality of processors are connected to a common main memory via a data bus.

In data processing that handles enormous amounts of data, such as image processing, the work is divided into several processing blocks and a processor processes the blocks one after another, as shown in FIG. 7. When the processor must process four processing blocks in sequence to output the final result, it processes the four blocks P0, P1, P2, and P3 in order to obtain the final result. Each of the processing blocks P0 to P3 operates on each required unit of data.

In image data processing, the processing unit is a pixel, a line, a block, or the like. Let the data groups of one processing unit be GD0, GD1, GD2, and so on. Ideally, the data groups handled by each processing block over time are as shown in FIG. 8. Note that the values of the target data have been changed by the preceding block. Because the blocks run in parallel, one unit of data is completed at the interval of the block with the longest processing time among P0 to P3. Such an ideal processing speed cannot be achieved unless the processing is implemented in hardware.

In software processing by a processor, the blocks cannot run in parallel in time. Therefore, as shown in FIG. 9, one unit of data is processed by blocks P0, P1, P2, and P3 in that order, and then the next unit of data is processed by P0, P1, P2, and P3 in the same way. As FIG. 9 shows, software processing spends the sum of the four processing times on each unit of data, which is why it is overwhelmingly slower than hardware operating as in FIG. 8.
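The timing difference between FIG. 8 and FIG. 9 can be illustrated with a small calculation. This is only a sketch under stated assumptions: the stage times are hypothetical numbers, and the pipelined case assumes an ideal pipeline in which the first unit passes through every block once and each later unit completes one slowest-block interval after the previous one.

```python
def sequential_time(stage_times, num_units):
    """Software on one processor: every data unit passes through all blocks in turn."""
    return num_units * sum(stage_times)


def pipelined_time(stage_times, num_units):
    """Idealized hardware pipeline (an assumption of this sketch): after the first
    unit has passed through all blocks, one unit completes per slowest-block interval."""
    if num_units == 0:
        return 0
    return sum(stage_times) + (num_units - 1) * max(stage_times)


# Hypothetical processing times for blocks P0..P3, in arbitrary time units.
stage_times = [3, 5, 2, 4]
num_units = 10  # data groups GD0..GD9

print(sequential_time(stage_times, num_units))  # 140
print(pipelined_time(stage_times, num_units))   # 59
```

With these illustrative numbers the sequential schedule costs 14 time units per data group, while the pipelined schedule approaches 5 per data group, the time of the slowest block.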

FIG. 10 is a flowchart illustrating an example of how the processor advances the processing. As shown in the figure, the processor reads the data to be processed from the main memory (step S1), performs, for example, the processing of block P0 (step S2), and temporarily stores the result in the main memory (step S3). It then reads the P0 result stored in the main memory (step S4), executes the next block P1 (step S5), and temporarily stores that result in the main memory (step S6). The same steps are repeated for as many processing blocks as required to obtain the final result. Thus, every time processing is handed over from one block to the next, the main memory is accessed.
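The flow of FIG. 10 can be sketched as a loop in which every handover goes through main memory. The `MainMemory` class, the function name, and the four lambda functions standing in for blocks P0 to P3 are hypothetical and exist only to make the access pattern countable.

```python
class MainMemory:
    """Toy main memory that counts how often it is read and written."""

    def __init__(self, data):
        self.data = data
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return self.data

    def write(self, value):
        self.writes += 1
        self.data = value


def run_blocks_via_main_memory(memory, blocks):
    """Run each processing block in turn, handing results over through main memory."""
    for block in blocks:
        value = memory.read()        # steps S1, S4, ...
        memory.write(block(value))   # steps S3, S6, ...
    return memory.data


# Hypothetical stand-ins for processing blocks P0..P3.
blocks = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
memory = MainMemory(5)
final = run_blocks_via_main_memory(memory, blocks)
print(final, memory.reads, memory.writes)  # 81 4 4
```

In this model four blocks cost four reads and four writes on the main memory, one pair per block.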

The above describes processing by a single processor. Multiprocessor systems that combine a plurality of processors to improve performance and reliability are also known (see, for example, Patent Document 1). In FIG. 11, a plurality of processors, namely n central processing units (CPUs) 11-1, 11-2, ..., 11-n and m or more digital signal processors (DSPs) 12-1, ..., 12-m, are connected in common to a main memory 13 via a data bus 14; that is, they share the main memory 13.

The operation of the conventional multiprocessor system of FIG. 11 is described with reference to the flowchart of FIG. 12. First, the CPU 11-1 reads the data it needs from the main memory 13 (step S11), processes the data (step S12), and writes the result to the main memory 13 (step S13).

Next, so that the CPU 11-2 can process the result of the CPU 11-1, the CPU 11-2 reads that result from the main memory 13 (step S14), processes it (step S15), and writes its own result to the main memory 13 (step S16). The remaining CPUs up to the CPU 11-n process data in the same manner.

The DSPs 12-1 to 12-m in the system also perform processing. As with the CPUs 11-1 to 11-n, the DSP 12-1 first reads data from the main memory 13 (step S17), processes it (step S18), and writes the result to the main memory 13 (step S19). The other DSPs 12-2 to 12-m operate in the same way.

Patent Document 1: Japanese Examined Patent Publication No. 6-1463

In a processor-oriented data processing system, one reason the processing speed falls short of hardware is the bus. Processor buses are broadly divided into instruction buses and data buses. The data bus carries local data (stack and temporarily saved data) in addition to the data being processed, so data accesses tend to concentrate on a single bus more than in a pure hardware processing system.

Specifically, in the conventional multiprocessor system of FIG. 11, which is a parallel, distributed processing system based on multiple processors (a shared-memory system), the CPUs 11-1, 11-2, and so on and the DSPs 12 each access the main memory 13 in steps S11, S13, S14, S16, S17, S19, and so on of FIG. 12. Data accesses therefore concentrate on the single data bus 14. The larger the system and the more CPUs and DSPs it contains, the more pronounced this problem becomes and the worse the bus traffic gets.

The present invention has been made in view of the above. Its object is to provide a multiprocessor system that suppresses wasteful use of the data bus and can raise the processing speed of the whole system to a level approaching hardware processing speed.

To achieve the above object, the present invention provides a multiprocessor system in which a first processor for system control is connected to a first program memory and, together with a plurality of units, is connected in common to a main memory via a data bus. Each of the plurality of units has a first bus interface, based on a direct memory transfer controller, which in response to a processing start command from the first processor reads data from the main memory via a first data input port and outputs a final processing result from a first data output port to the main memory; and a plurality of basic units connected so as to perform predetermined processing, independently for each unit, on the data input via the first bus interface and to output the finally obtained processing result to the first bus interface. The number of basic units and the way they are connected to one another may be the same across the units, or may differ in some or all of them.

Each of the plurality of basic units has a second bus interface based on a direct memory transfer controller; a second program memory storing a program for performing a predetermined operation; a second processor that performs predetermined processing, according to the program in the second program memory, on data input to the second bus interface via a second data input port and outputs the obtained result to a second data output port via the second bus interface; and a data memory that temporarily stores the result of processing by the second processor.

In the present invention, each of the plurality of units can output the processing result finally obtained by the second processors of its internal basic units from the first data output port to the data bus via the first bus interface.

Also to achieve the above object, in the present invention, in each of the plurality of units, the first-stage basic unit among the basic units in that unit accesses the main memory to read the processing data from the main memory, processes the processing data input via the first bus interface, and stores the result in the data memory of the first-stage basic unit. Each remaining basic unit receives the result processed by the preceding basic unit, processes it, and outputs its result to the next basic unit. The final result obtained by the last-stage basic unit is written to the main memory via the first bus interface.

In this invention, each basic unit other than the first-stage basic unit receives the result of the preceding basic unit and outputs its own result to the next basic unit, and the final result obtained by the last-stage basic unit is written to the main memory via the first bus interface. The basic units therefore operate independently of one another, which realizes so-called parallel, distributed processing. In addition, inter-processor control between the basic units becomes very simple, so the system is easy to scale up, and the basic units can be connected to one another in hardware.

According to the present invention, each of the plurality of units outputs the processing result finally obtained by the second processors of its internal basic units from the first data output port to the data bus via the first bus interface. Data is therefore basically transferred between basic units. Only the first-stage and last-stage basic units access the main memory; all other transfers take place inside each unit. Bus traffic to the main memory is thus greatly reduced, which improves the performance of the whole system. Moreover, because the basic units process data independently of one another, so-called parallel, distributed processing is realized, the processing speed improves dramatically, and the processing speed of the whole system can approach hardware processing speed.

Parallel, distributed processing on multiple processors normally requires complex control between processors. In the present invention, inter-processor control between basic units is very simple, so scaling up is easy. Because the basic units are connected in hardware, data transfer can be handled in real time. Furthermore, temporary storage of data in the main memory is eliminated and wasteful bus use is reduced, so the processing capacity of the whole system improves.

Embodiments of the present invention are described below with reference to the drawings. First, the configuration of the processors used in the multiprocessor system of the present invention is described. The processors used in the present invention comprise at least a first unit, which is the basic unit (hereinafter the "P unit"), and a second unit, which corresponds to one CPU of FIG. 11 (hereinafter the "M unit").

FIG. 1 is a block diagram of an embodiment of the P unit. As shown in the figure, the P unit 100 comprises one data input port 101; a bus interface (bus I/F) 102 based on a DMAC, that is, a controller that performs direct memory access transfer (DMA transfer); one CPU 103 that performs various operations; one program memory 104 that stores instructions; one data memory (RAM) 105; and one data output port 106.

In the P unit 100, the data that enters the data input port 101 and leaves the data output port 106 via the bus I/F 102 is processing data only, and results are obtained at regular time intervals. Because functions can be implemented freely through the instructions (firmware) supplied from the program memory 104 to the CPU 103 (the unit is programmable, so its processing can be changed), complex processing can also be realized.
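As a rough software model of the P unit described above, the sketch below treats the program memory as a replaceable function and the data memory as a list of stored results. The class name `PUnit`, its fields, and its methods are hypothetical illustrations, not anything defined in the patent, and the DMAC-based bus I/F is reduced to a plain method call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PUnit:
    """Toy model of a P unit: one program, one local data memory."""

    program: Callable[[int], int]                          # stands in for firmware in program memory 104
    data_memory: List[int] = field(default_factory=list)   # stands in for data memory 105

    def load_program(self, program: Callable[[int], int]) -> None:
        """Replace the firmware; this models the unit being programmable."""
        self.program = program

    def process(self, value: int) -> int:
        """Take one datum from the input port, process it, keep the result locally,
        and return it as the value presented at the output port."""
        result = self.program(value)
        self.data_memory.append(result)
        return result


unit = PUnit(program=lambda x: x + 1)
first = unit.process(1)                 # 2
unit.load_program(lambda x: x * 10)     # change the processing content
second = unit.process(2)                # 20
print(first, second, unit.data_memory)  # 2 20 [2, 20]
```

Swapping the function plays the role of loading different firmware: the same unit then performs a different operation on its input.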

FIG. 2 is a block diagram of an embodiment of the M unit. As shown in the figure, the M unit 200 comprises one data input port 201, a bus I/F 202 based on a high-performance DMAC, a P unit connection block 203, and one data output port 204. The P unit connection block 203 consists of a plurality of P units, each configured as in FIG. 1, which receive the data input from the data input port 201 via the bus I/F 202 and output the processed data to the data output port 204 via the bus I/F 202. M units 200 therefore come in several types, depending on the number of P units in the P unit connection block 203 and how they are connected.

For example, there are two types of M unit whose P unit connection block 203 has two P units: the M unit 210 shown in FIG. 3(A) and the M unit 220 shown in FIG. 3(B). In the M unit 210 of FIG. 3(A), the P unit connection block 203 consists of two P units 211 and 212 connected in series. In the M unit 220 of FIG. 3(B), the P unit connection block 203 consists of two P units 221 and 222 connected in parallel.
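The two connection types of FIG. 3 can be sketched as two ways of composing functions. The helper names and the two example P unit programs are hypothetical. The text does not say how input data is distributed to parallel P units, so this sketch assumes that each parallel unit receives the same input and that the results are collected in order.

```python
def connect_in_series(p_units):
    """Series connection (as in FIG. 3(A)): each P unit feeds the next one."""
    def run(data):
        for p_unit in p_units:
            data = p_unit(data)
        return data
    return run


def connect_in_parallel(p_units):
    """Parallel connection (as in FIG. 3(B)). Assumption of this sketch:
    every P unit receives the same input and the results are collected."""
    def run(data):
        return [p_unit(data) for p_unit in p_units]
    return run


# Hypothetical programs for two P units.
def add_one(x):
    return x + 1


def double(x):
    return x * 2


m_unit_210 = connect_in_series([add_one, double])
m_unit_220 = connect_in_parallel([add_one, double])

print(m_unit_210(3))  # 8
print(m_unit_220(3))  # [4, 6]
```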

Because the P units 211 and 212, or 221 and 222, that make up the P unit connection block 203 process data independently, so-called parallel, distributed processing is realized and the processing speed improves dramatically.

In this way, the M unit 200 has a mechanism that allows the connection of its P units to be changed freely to suit the purpose. P units can be connected by wiring their inputs and outputs together directly in hardware, or by having their DMACs cooperate so that the units exchange data directly. When the inputs and outputs are wired together directly in hardware, data transfer can be handled in real time.

FIG. 4 is a block diagram of an embodiment of the multiprocessor system according to the present invention. As shown in the figure, in this embodiment a system control CPU 301 is connected to a program memory 302 and, via a data bus 304, to k M units 200-1, 200-2, ..., 200-k (k is a natural number of 1 or more) and a main memory 303. The system control CPU 301 and the M units 200-1 to 200-k share the main memory 303 via the data bus 304.

Next, the operation of this embodiment is described with reference to the flowchart of FIG. 5, taking as an example the case where processing proceeds in the order M unit 200-1, 200-2, ..., 200-k. Following instructions from the program memory 302, the system control CPU 301 first issues a processing start command to the M unit 200-1 via the data bus 304 (step S21). The M unit 200-1 then reads the processing data from the main memory 303 via the data bus 304 (step S22), and the P units in the M unit 200-1 process that data in sequence, as described later (step S23). The M unit 200-1 transfers the result to the main memory 303 via the data bus 304 and writes it there (step S24).

The system control CPU 301 then issues a processing start command to the next M unit 200-2 via the data bus 304 (step S25). The M unit 200-2 reads the result produced by the M unit 200-1 from the main memory 303 via the data bus 304 (step S26), the P units in the M unit 200-2 process it in sequence, as described later (step S27), and the M unit 200-2 transfers the result to the main memory 303 via the data bus 304 and writes it there (step S28). The M units continue to process the data in sequence in the same way. When the final result produced by the last M unit 200-k has been stored in the main memory 303, the series of processing ends.
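The control sequence of FIG. 5 can be sketched as a loop run by the system control CPU. This is an illustrative model only: the function `run_system`, the dictionary standing in for the main memory, and the two example M unit functions are hypothetical, and each M unit is reduced to a single function from input data to result.

```python
def run_system(main_memory, m_units):
    """System control CPU loop: start each M unit in order.

    main_memory is a dict with a single "data" entry (a stand-in for main memory 303).
    Each M unit is modeled as a function from input data to result.
    Returns a log of the events in the order they occur.
    """
    log = []
    for index, m_unit in enumerate(m_units, start=1):
        name = f"M unit 200-{index}"
        log.append(f"start {name}")      # steps S21, S25: processing start command
        data = main_memory["data"]       # steps S22, S26: read from main memory
        log.append(f"{name} read")
        result = m_unit(data)            # steps S23, S27: P units process the data
        main_memory["data"] = result     # steps S24, S28: write result to main memory
        log.append(f"{name} write")
    return log


# Two hypothetical M units.
main_memory = {"data": 1}
log = run_system(main_memory, [lambda x: x + 10, lambda x: x * 3])
print(main_memory["data"])  # 33
print(log)
```

Each M unit touches the main memory exactly twice in this model, once to read its input and once to write its result, regardless of how many P units it contains.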

The operation of each of the M units 200-1 to 200-k is now described in more detail with reference to the flowchart of FIG. 6. The example assumes that the P unit connection block in the M unit contains (n+1) P units, P unit 0 to P unit n, and that processing proceeds in the order P unit 0, P unit 1, ..., P unit n.

First, in the M unit that has received a processing start command from the system control CPU 301, the first-stage P unit 0 in its P unit connection block reads the processing data it needs from the main memory 303 (step S31), processes it (step S32), and stores the result in the memory inside P unit 0 (corresponding to the data memory 105 of FIG. 1). The next P unit 1 reads the result data of P unit 0 from the memory inside P unit 0 (step S33), performs predetermined processing on it (step S34), and likewise stores the result in the memory inside P unit 1 (corresponding to the data memory 105 of FIG. 1).

The next P unit 2 reads the result data of P unit 1 from the memory inside P unit 1 (step S35). The same operation is repeated until the last-stage P unit n processes the data (step S36) and writes the final result to the main memory 303 (step S37).

Thus, in this embodiment, data is basically transferred between P units. The main memory 303 is accessed only by the first-stage P unit 0 in step S31 of FIG. 6 and by the last-stage P unit n in step S37; all other transfers take place inside the M unit. Bus traffic to the main memory 303 is therefore greatly reduced, which improves the performance of the whole system.
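The access pattern of FIG. 6 can be checked with a small simulation. The `MainMemory` class, the function `run_m_unit`, and the four example programs are hypothetical names introduced for this sketch. Intermediate results are kept in a list that stands in for the data memories of the individual P units.

```python
class MainMemory:
    """Toy main memory that counts how often it is read and written."""

    def __init__(self, data):
        self.data = data
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return self.data

    def write(self, value):
        self.writes += 1
        self.data = value


def run_m_unit(main_memory, p_programs):
    """Run a chain of P units; only the first reads and only the last writes main memory."""
    if not p_programs:
        return []
    local_memories = []
    value = main_memory.read()          # step S31: first-stage P unit reads main memory
    for program in p_programs:
        value = program(value)          # steps S32, S34, ..., S36
        local_memories.append(value)    # result held in that P unit's own data memory
    main_memory.write(value)            # step S37: last-stage P unit writes main memory
    return local_memories


# Hypothetical programs for P unit 0 to P unit 3.
programs = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
memory = MainMemory(5)
stored = run_m_unit(memory, programs)
print(stored, memory.data, memory.reads, memory.writes)  # [6, 12, 9, 81] 81 1 1
```

For a chain of four P units this model performs one read and one write on the main memory, whereas handing every intermediate result over through the main memory would take four of each.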

Suppose, for example, that the processing performed by five CPU processing units 11-1 to 11-5 in the conventional multiprocessor system of FIG. 11 is implemented in the M unit 200-1 of FIG. 4, and the processing performed by three DSP processing units 12-1 to 12-3 of FIG. 11 is implemented in the M unit 200-2 of FIG. 4. Accesses to the main memory 303, which conventionally numbered 16 pairs (one read and one write counting as one pair), are then reduced sharply to 2 pairs in the embodiment of FIG. 4.

Furthermore, in this embodiment each M unit performs a self-contained group of processing steps, so data is no longer stored temporarily in the main memory 303. The system as a whole can be scaled up by adding M units or by enlarging the M units themselves, whichever suits the purpose and efficiency of the system. In this embodiment, inter-processor control between P units is very simple, which makes scaling up easy.

In addition, CPUs have become dramatically smaller in recent years. The part of FIG. 4 that contains the many processors, namely the CPU 301, the program memory 302, the M units 200-1 to 200-k, and the data bus 304, can be built as a single large-scale integrated circuit (LSI), and a large multiprocessor system can be constructed from such LSIs.

The present invention is not limited to the above embodiment. For example, FIG. 6 was described with data processing starting at the first-stage P unit 0, proceeding to the next P unit 1 and so on, and ending at the last-stage P unit n. The invention is not limited to this: when the M unit is configured as in FIG. 3(B), a plurality of P units can operate simultaneously in parallel, and a configuration that mixes a group of series-connected P units with a group of parallel-connected P units is of course also possible.

In FIG. 4, the M units 200-1 to 200-k that share the main memory 303 with the system control CPU 301 via the data bus 304 may all have the same configuration, or some or all of them may of course have different configurations.

Furthermore, the above embodiment was described with two types of unit, the P unit 100 and the M unit 200. In the multiprocessor system of the present invention, however, the units connected to the data bus are not limited to the M unit 200. For example, a third unit may be used in which the P unit connection block 203 of FIG. 2 is replaced by an M unit connection block. In short, the unit connection block connected to the DMAC-based bus I/F need only be a multilayered assembly built with at least the P unit 100 as its basic element.
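The multilayered arrangement described above can be pictured as a nested structure whose leaves are P units. The representation below (the string "P" for one P unit, a list for a connection block) and the function names are hypothetical and chosen only for illustration; they say nothing about how such a unit would actually be wired.

```python
def count_p_units(block):
    """Count the P units in a nested connection block.

    A block is either the string "P" (a single P unit) or a list of sub-blocks.
    """
    if block == "P":
        return 1
    return sum(count_p_units(sub_block) for sub_block in block)


def nesting_depth(block):
    """Number of connection-block layers above the deepest P unit."""
    if block == "P":
        return 0
    return 1 + max((nesting_depth(sub_block) for sub_block in block), default=0)


m_unit_a = ["P", "P"]             # an M unit with two P units
m_unit_b = ["P", "P", "P"]        # an M unit with three P units
third_unit = [m_unit_a, m_unit_b]  # a unit whose connection block holds M units

print(count_p_units(third_unit))  # 5
print(nesting_depth(third_unit))  # 2
```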

FIG. 1 is a block diagram of an embodiment of the P unit used in the system of the present invention.
FIG. 2 is a block diagram of an embodiment of the M unit used in the system of the present invention.
FIG. 3 is a block diagram showing examples of the M unit when the P unit connection block contains two P units.
FIG. 4 is a block diagram of an embodiment of the multiprocessor system of the present invention.
FIG. 5 is a flowchart illustrating the overall operation of the multiprocessor system of FIG. 4.
FIG. 6 is a flowchart illustrating the operation inside an M unit of the multiprocessor system of FIG. 4.
FIG. 7 is a diagram showing an example of the processing procedure of a processor.
FIG. 8 is a diagram showing an example of the data groups that the processing blocks process over time.
FIG. 9 is a diagram showing the order in which the processing blocks process units of data.
FIG. 10 is a flowchart illustrating an example of how a processor advances processing.
FIG. 11 is a block diagram of an example of a conventional multiprocessor system.
FIG. 12 is a flowchart illustrating the operation of FIG. 11.

Explanation of symbols

100, 211, 212, 221, 222: P unit
101, 201: Data input port
102: Bus I/F based on DMAC
103: Central processing unit (CPU)
104, 302: Program memory
105: Data memory (RAM)
106, 204: Data output port
200, 200-1 to 200-k, 210, 220: M unit
202: Bus I/F based on high-performance DMAC
203: P unit connection block
301: System control central processing unit (CPU)
303: Main memory
304: Data bus

Claims

1. A multiprocessor system in which a first processor for system control is connected to a first program memory and, together with a plurality of units, is connected in common to a main memory via a data bus, wherein
each of the plurality of units comprises:
a first bus interface, based on a direct memory transfer controller, that, in response to a processing start command from the first processor, reads data from the main memory via a first data input port and outputs a final processing result from a first data output port to the main memory; and
a plurality of basic units connected so as to perform predetermined processing, independently for each unit, on the data input via the first bus interface and to output the finally obtained processing result to the first bus interface, the number of the basic units and the manner in which they are connected to one another being the same across the units or differing in some or all of the units, and
each of the plurality of basic units comprises:
a second bus interface based on a direct memory transfer controller;
a second program memory storing a program for performing a predetermined operation;
a second processor that performs predetermined processing, according to the program in the second program memory, on data input to the second bus interface via a second data input port and outputs the obtained processing result to a second data output port via the second bus interface; and
a data memory that temporarily stores the processing result of the second processor.

2. The multiprocessor system according to claim 1, wherein, in each of the plurality of units, the first-stage basic unit among the plurality of basic units in that unit accesses the main memory to read processing data from the main memory, processes the processing data input via the first bus interface, and stores the processing result in the data memory of the first-stage basic unit; each of the remaining basic units other than the first-stage basic unit receives the result processed by the preceding basic unit, processes it, and outputs the result to the next basic unit; and the final processing result obtained by the last-stage basic unit is written to the main memory via the first bus interface.