JP2002530737A

JP2002530737A - Simultaneous processing of event-based systems

Info

Publication number: JP2002530737A
Application number: JP2000582885A
Authority: JP
Inventors: ホルムベルグ、ペル、アンデルス; − オルヤンクリング、ラルス; ヨンソン、ステン、エドヴァルド; ソホニ、ミリンド; テイケカル、ニクヒル
Original assignee: テレフオンアクチーボラゲツトエルエムエリクソン（パブル）
Priority date: 1998-11-16
Filing date: 1999-11-12
Publication date: 2002-09-17
Anticipated expiration: 2019-11-12
Also published as: KR100401443B1; BR9915363B1; WO2000029942A1; JP4489958B2; CA2350922C; AU1437300A; KR20010080958A; BR9915363A; CA2350922A1; EP1131703A1

Abstract

(57) [Abstract] According to the invention, a plurality of shared-memory processors (11) are introduced at the highest level of a hierarchical distributed processing system (1), and the utilization of the processors is optimized based on the concurrent event flows recognized in the system. According to a first aspect, so-called non-commuting categories (NCCs) of events are mapped onto the multiple processors (11) for concurrent execution. According to a second aspect of the invention, the processors (11) operate as a multiprocessor pipeline, and each event arriving at the pipeline is processed in slices, as a chain of internal events executed in different stages of the pipeline. A general processing structure is obtained by so-called matrix processing, in which non-commuting categories are executed by different processor sets and at least one processor set operates as a multiprocessor pipeline in which external events are processed in slices by different processor stages of the pipeline.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] (Technical Field of the Invention) The present invention relates generally to event-based processing systems, and more particularly to a hierarchical distributed processing system and a processing method in such a system.

[0002] (Background of the Invention) From a computational point of view, many event-based systems are organized as hierarchical distributed processing systems. For example, in modern telecommunication and data communication networks, each network node typically contains a hierarchy of processors for processing events from the network. In general, the processors in the hierarchy communicate by message passing; processors at the lower levels of the hierarchy perform low-level processing of relatively simple subtasks, while processors at the higher levels perform high-level processing of relatively complex tasks.

[0003] These hierarchical architectures already exploit their inherent concurrency to some extent, but as the number of events to be processed per unit time grows, the higher levels of the processor hierarchy become a bottleneck for further increases in functionality. For example, if the processor hierarchy is a "tree" structure, the processor at the top level of the hierarchy becomes the primary bottleneck.

[0004] Conventional approaches to alleviating this problem rely mainly on higher processor clock frequencies, faster memories and instruction pipelining.

[0005] (Related Art) U.S. Patent No. 5,239,539 issued to Uchida et al. discloses a controller for controlling the switching network of an ATM exchange by uniformly distributing the load among a plurality of call processors. A main processor assigns originated call processing to the call processors according to the order of call origination or according to the channel identifier attached to each cell of a call. A switching state controller collects usage information about a plurality of buffers in the switching network, and the call processors perform call processing based on the contents of the switching state controller.

[0006] Japanese Patent Abstract JP 6276198 discloses a packet switch in which a plurality of processor units are provided and packet switching processing is performed by the mutually independent units.

[0007] Japanese Patent Abstract JP 4100449A discloses an ATM communication system that distributes signaling cells between an ATM exchange and a signaling processor array (SPA) by STM-multiplexing ATM channels. Distribution of the processing load is achieved by switching the signaling cells by means of STM, based on the SPA numbers added to each virtual channel by a routing tag adder.

[0008] Japanese Patent Abstract JP 5274279 discloses a parallel processing device that uses a hierarchical set of processors, in which groups of processor elements perform parallel pipeline processing.

[0009] (Summary of the Invention) It is an object of the present invention to improve the throughput of event-based hierarchical distributed processing systems. In particular, it is desirable to relieve the bottleneck formed by the higher-level processor nodes in a hierarchical system.

[0010] Another object of the invention is to provide a processing system, preferably but not necessarily operating as a higher-level processor node, that can process events efficiently based on the event-flow concurrency recognized in the system.

[0011] Another object of the invention is to provide a processing system that can exploit concurrency in the event flow while allowing reuse of existing application software.

[0012] Yet another object of the invention is to provide a method for efficiently processing events in a hierarchical distributed processing system.

[0013] These and other objects are achieved by the invention as defined in the claims.

[0014] The general idea according to the invention is to introduce a plurality of shared-memory processors at the highest level or levels of a hierarchical distributed processing system, and to optimize the utilization of the processors based on the concurrent event flows recognized in the system.

[0015] According to a first aspect of the invention, the external event flow is divided into concurrent categories of events, called non-commuting categories (NCCs), and these non-commuting categories are mapped onto the multiple processors for concurrent execution. In general, a non-commuting category is a grouping of events for which the order of events must be preserved within the category, while no ordering is required between categories. For example, a non-commuting category may be defined by the events originating from a given source, such as a particular input port, regional processor or hardware device connected to the system. Each non-commuting category of events is assigned to a predetermined set of one or more processors, and internal events generated by a predetermined processor set are fed back to the same processor set in order to preserve the non-commuting category or categories assigned to that set.

[0016] According to a second aspect of the invention, the multiple processors operate as a multiprocessor pipeline with a number of processor stages, and each external event arriving at the pipeline is processed in slices, as a chain of internal events executed in different stages of the pipeline. In general, each pipeline stage is executed by one processor, but a given processor may also execute more than one stage of the pipeline. A particularly advantageous way of realizing a multiprocessor pipeline is to allocate a cluster of software blocks/classes in the shared-memory software to each processor; each event is then targeted at a particular block, and events are distributed to the processors based on this allocation.
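As a minimal sketch of this block-based allocation, the following Python fragment assigns clusters of software blocks to processor stages and routes each internal event of a chain to the processor that owns its target block. The block names, the cluster layout and the helpers `stage_for` and `route_chain` are hypothetical and chosen only for illustration.

```python
# Hypothetical allocation of software-block clusters to pipeline processors.
CLUSTERS = {
    "P1": {"signal_reception", "digit_analysis"},
    "P2": {"route_selection"},
    "P3": {"charging", "signal_transmission"},
}

# Invert the allocation: target block -> processor that executes it.
BLOCK_TO_PROCESSOR = {
    block: proc for proc, blocks in CLUSTERS.items() for block in blocks
}


def stage_for(target_block):
    """Return the processor stage that owns the targeted block."""
    return BLOCK_TO_PROCESSOR[target_block]


def route_chain(event_chain):
    """Map a chain of internal events (one target block per slice) to stages."""
    return [(block, stage_for(block)) for block in event_chain]


# One external event processed as a chain of three slices.
chain = ["digit_analysis", "route_selection", "charging"]
print(route_chain(chain))
```

Because the routing depends only on the target block, a processor that owns several blocks naturally executes more than one stage of the pipeline.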

[0017] A general processing structure is obtained by so-called matrix processing, in which non-commuting categories are executed by different processor sets, and at least one processor set is formed as an array of processors operating as a multiprocessor pipeline in which external events are processed in slices by different processor stages of the pipeline.

[0018] In a shared-memory system, the entire application program and data are accessible to all shared-memory processors in the system. Consequently, data consistency must be ensured when the processors operate on global data.

[0019] According to the invention, data consistency can be ensured by locking the global data to be used by a software task that is executed in response to an event or, in the case of object-oriented software design, by locking entire software blocks/objects. If processing an event requires resources from more than one block, the locking approach may lead to deadlocks in which tasks lock each other out. Deadlocks are therefore either detected, with a rollback performed to ensure progress, or avoided altogether by seizing all the blocks a task requires before execution of the task starts.
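The deadlock-avoidance variant can be sketched as follows in Python. The block names and the `run_task` helper are hypothetical, and acquiring the locks in a fixed global (sorted) order is an assumption added here as one common way of making "seize everything before starting" safe; the text itself does not prescribe how the blocks are seized.

```python
import threading

# Hypothetical software blocks, each guarded by its own lock.
BLOCK_LOCKS = {name: threading.Lock() for name in ("A", "B", "C")}


def run_task(needed_blocks, work):
    """Seize every block the task needs, in a fixed global order, then run it."""
    ordered = sorted(set(needed_blocks))  # same order for every task (assumption)
    for name in ordered:
        BLOCK_LOCKS[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            BLOCK_LOCKS[name].release()


results = []
# The two tasks name the same blocks in opposite order; sorting removes the
# circular wait that could otherwise deadlock them.
t1 = threading.Thread(target=run_task, args=(["A", "B"], lambda: results.append("t1")))
t2 = threading.Thread(target=run_task, args=(["B", "A"], lambda: results.append("t2")))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
print(sorted(results))
```

The alternative named in the text, detecting a deadlock and rolling one task back, is not shown here.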

[0020] Another approach to ensuring data consistency is based on parallel execution of tasks: access collisions between tasks are detected, and an executing task for which a collision is detected is rolled back and restarted. Collisions are detected either on the basis of variable usage markings or by address comparison, in which read addresses are compared with write addresses.
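Address comparison can be illustrated with a small Python sketch. The `TaskLog` class and the `collides` function are hypothetical names; the sketch compares only read addresses against write addresses, as the text describes, and leaves aside how the addresses are captured in hardware or in a virtual machine.

```python
class TaskLog:
    """Records the addresses one task has read and written (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.reads = set()
        self.writes = set()

    def read(self, addr):
        self.reads.add(addr)

    def write(self, addr):
        self.writes.add(addr)


def collides(a, b):
    """True if either task wrote an address that the other task read."""
    return bool((a.reads & b.writes) or (a.writes & b.reads))


t1, t2, t3 = TaskLog("t1"), TaskLog("t2"), TaskLog("t3")
t1.read(0x100)
t2.write(0x100)  # t2 writes what t1 has read: collision
t3.read(0x100)   # two readers of the same address do not collide
print(collides(t1, t2), collides(t1, t3))
```

When `collides` reports a collision, one of the two tasks would be rolled back and restarted.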

[0021] By marking larger areas instead of individual data, a more coarse-grained collision check is obtained.
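A coarse-grained check can be sketched in Python by comparing regions rather than individual addresses. The region size and the function names are assumptions made for illustration. The trade-off shown is that fewer markers are needed, at the price of occasional false collisions between unrelated variables that share a region.

```python
REGION_SIZE = 256  # hypothetical granularity, in address units


def region(addr):
    """Map an address to the larger region whose marker it shares."""
    return addr // REGION_SIZE


def coarse_collision(read_addrs, write_addrs):
    """Compare marked regions instead of individual addresses."""
    read_regions = {region(a) for a in read_addrs}
    write_regions = {region(a) for a in write_addrs}
    return bool(read_regions & write_regions)


# 0x100 and 0x1F0 are different variables in the same region: a false collision.
print(coarse_collision({0x100}, {0x1F0}))
# 0x100 and 0x200 fall into different regions: no collision is reported.
print(coarse_collision({0x100}, {0x200}))
```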

[0022] The solution according to the invention substantially increases the throughput of the processing system and, for hierarchical processing systems, efficiently relieves the high-level bottleneck.

[0023] By using shared-memory multiprocessors together with suitable means for ensuring data consistency, existing application software written for a single-processor system can be reused. In many cases, millions of lines of code are already available for single-processor systems such as the single-processor node at the top level of a hierarchical processing system. When the multiple processors are implemented using standard off-the-shelf microprocessors, all existing application software can be reused by automatically transforming the application software and, where necessary, modifying the virtual machine/operating system of the system to support multiple processors. If, on the other hand, the multiple processors are implemented as purpose-built specialized hardware, the application software can be migrated directly to the multiprocessor environment. Either way, this saves valuable time and reduces programming costs compared with designing the application software from scratch.

[0024] The invention offers the following advantages: increased throughput; relief of bottlenecks; and reuse of existing application software, especially in the case of object-oriented design.

[0025] Other advantages will become apparent from the following description of embodiments of the invention. Further objects and features of the invention are described below with reference to the accompanying drawings.

[0026] (Detailed Description of Embodiments of the Invention) Throughout the drawings, the same reference characters denote corresponding or similar elements.

[0027] FIG. 1 is a schematic diagram of a hierarchical distributed processing system according to the invention with a high-level processor node. The hierarchical distributed processing system 1 has a conventional tree structure with a plurality of processor nodes distributed over several levels of the system hierarchy. Hierarchical processing systems can be found, for example, in telecommunication nodes and routers. Naturally, as the number of events to be processed by the system increases, the higher-level processor nodes, and the top-level node in particular, become bottlenecks.

[0028] According to the invention, an effective way of relieving such bottlenecks is to use a plurality of shared-memory processors 11 at the highest level of the hierarchy. In FIG. 1, the multiple processors are provided in the top node 10. The shared-memory processors 11 are preferably realized as a multiprocessor system based on standard microprocessors. All processors 11 share a common memory, the so-called shared memory 12. In general, asynchronous external events bound for the high-level processor node 10 first arrive at an input/output unit (I/O) 13, from which they are forwarded to a mapper or distributor 14. The mapper 14 maps, or distributes, the events to the processors 11 for processing.

[0029] The external event flow to the processor node 10 is divided into a number of concurrent categories of events, based on the event-flow concurrency recognized in the hierarchical processing system 1. In the following, these categories are referred to as non-commuting categories (NCCs). The mapper 14 makes sure that each NCC is assigned to a predetermined set of one or more processors 11, which enables concurrent processing and optimal utilization of the multiple processors. The mapper 14 may be implemented in one or more of the processors 11, in which case that processor is preferably dedicated to the mapper.

[0030] A non-commuting category is a grouping of events for which the order of events must be preserved within the category, while no ordering is required between events of different categories. In systems where the information flow is governed by protocols, a general requirement is that certain related events be processed in the order in which they are received. This is an invariant of the system, regardless of how the system is implemented. Identifying suitable NCCs and processing them concurrently ensures that the ordering requirements imposed by the given system protocols are met while the inherent concurrency in the event flow is exploited.
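The ordering invariant can be stated compactly in code. The following Python sketch, in which the function name and the event representation are hypothetical, checks whether an execution order keeps the arrival order within each category while allowing arbitrary interleaving between categories.

```python
def preserves_ncc_order(arrival, execution):
    """Check that `execution` keeps the arrival order within each category.

    Both arguments are lists of (category, event_id) pairs; the names are
    hypothetical and only illustrate the ordering invariant.
    """

    def per_category(events):
        order = {}
        for category, event_id in events:
            order.setdefault(category, []).append(event_id)
        return order

    return per_category(arrival) == per_category(execution)


arrival = [("trunk1", 1), ("trunk2", 1), ("trunk1", 2), ("trunk2", 2)]
# Categories interleaved differently, per-category order kept: allowed.
ok = [("trunk2", 1), ("trunk2", 2), ("trunk1", 1), ("trunk1", 2)]
# Events of trunk1 swapped: violates the ordering requirement.
bad = [("trunk1", 2), ("trunk1", 1), ("trunk2", 1), ("trunk2", 2)]
print(preserves_ncc_order(arrival, ok), preserves_ncc_order(arrival, bad))
```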

[0031] If an external event can be processed or executed in "slices", as a chain of events, alternative or additional concurrent processing is possible by operating one or more of the processor sets as a multiprocessor pipeline. Each external event arriving at the multiprocessor pipeline is then processed in slices that are executed in different processor stages of the pipeline.

[0032] A general processing structure is thus obtained by so-called matrix processing, in which each NCC is executed by a different processor set and at least one of the processor sets operates as a multiprocessor pipeline. Note that the logical "matrix" of processors shown in FIG. 1 may contain some empty elements. Reducing the logical matrix of processors in FIG. 1 to a single row of processors gives pure NCC processing, and reducing the matrix to a single column of processors gives pure event-level pipeline processing.
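The two degenerate cases can be made concrete with a small Python sketch. The layout is an assumption inferred from the description: each column of the logical matrix is taken to be one processor set serving its NCCs, and each row one pipeline stage. The `classify` function and its return strings are hypothetical.

```python
def classify(matrix):
    """Classify a logical processor matrix (None marks an empty element).

    Assumed layout: each column is one processor set serving its NCCs, and
    each row is one pipeline stage.
    """
    if not matrix or not matrix[0]:
        raise ValueError("matrix must contain at least one processor")
    rows, cols = len(matrix), len(matrix[0])
    if rows == 1 and cols == 1:
        return "single processor"
    if rows == 1:
        return "pure NCC processing"
    if cols == 1:
        return "pure event-level pipeline processing"
    return "matrix processing"


print(classify([["P1", "P2", "P3", "P4"]]))     # single row
print(classify([["P1"], ["P2"], ["P3"]]))       # single column
print(classify([["P1", "P2"], ["P3", None]]))   # general case, one empty element
```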

[0033] In general, the computation in an event-based system is modeled as a state machine in which an input event from the outside world changes the state of the system and produces an output event. If each non-commuting category/pipeline stage could be handled by an independent (disjoint) state machine, no data would be shared between the various state machines. However, where there are global resources, represented by global states or global variables, operations on a given global state generally have to be "atomic": only one processor at a time, executing part of the system state machine, may access a given global state. NCC/pipeline-based execution eliminates the need for so-called sequence-dependency checks.

[0034] For a better understanding, consider the following example. Assume that a resource, such as a free channel to another communication node, is allocated according to a set of global variables. For two asynchronous jobs in different NCCs, the order in which they request a free channel does not matter: the first request is assigned the first channel that meets the selection criteria, and the second request is assigned the next available channel that meets the criteria. What matters is that while channel selection is in progress for one job, the other job does not interfere. Access to the global variables that govern channel allocation must therefore be "atomic" (although in special cases the channel search can be parallelized).
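The channel example can be sketched in Python with a lock that makes the search-and-seize step atomic. The `ChannelAllocator` class, its methods and the selection criterion are hypothetical; a mutual-exclusion lock is only one possible way of obtaining atomicity and is used here as an assumption.

```python
import threading


class ChannelAllocator:
    """Allocates free channels; the channel table is the shared global state."""

    def __init__(self, channels):
        self._free = list(channels)    # global variables governing allocation
        self._lock = threading.Lock()  # makes search-and-seize atomic

    def allocate(self, criterion):
        """Return the first free channel meeting `criterion`, or None."""
        with self._lock:
            for channel in self._free:
                if criterion(channel):
                    self._free.remove(channel)
                    return channel
            return None


def is_even(channel):
    return channel % 2 == 0


allocator = ChannelAllocator([1, 2, 3, 4])
first = allocator.allocate(is_even)   # first channel that meets the criterion
second = allocator.allocate(is_even)  # next available matching channel
third = allocator.allocate(is_even)   # no matching channel left
print(first, second, third)
```

Whichever job calls `allocate` first gets the first matching channel; the lock only guarantees that the two selections do not interleave.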

[0035] Another example concerns two jobs in different NCCs that both need to increment a counter. It does not matter which job increments the counter first, but while one job is operating on the counter variable to increment it (reading the current value and adding one to it), the other job must not be allowed to interfere.

[0036] In a shared-memory system, the entire application program space and data space in the shared memory 12 are accessible to all processors. Since the processors need to operate on global variables common to all, or at least to more than one, of the processors, data consistency must be ensured. This is accomplished by the data consistency means indicated by reference numeral 15 in FIG. 1.

[0037] The following describes NCC processing as a first aspect of the invention, event-level pipeline processing as a second aspect of the invention, and procedures and means for ensuring data consistency.

[0038] (NCC Processing) FIG. 2 is a schematic diagram of an event-driven processing system according to the first aspect of the invention. The processing system comprises a plurality of shared-memory processors P1 to P4, a shared memory 12, an input/output unit 13, a distributor 14, data consistency means 15, and a plurality of independent parallel event queues 16.

[0039] The input/output unit 13 receives incoming external events and sends out outgoing events. The distributor 14 divides the incoming events into non-commuting categories (NCCs) and distributes each NCC to a predetermined one of the independent event queues 16. Each event queue is connected to a respective processor, and each processor sequentially fetches events from its associated event queue for processing. If the events have different priority levels, this must be taken into account so that the processors process events in order of priority.
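The distributor and the per-processor queues can be sketched as follows in Python. The source names, the static mapping table and the `distribute` function are hypothetical, and event priorities are deliberately not modeled.

```python
from collections import deque

NUM_PROCESSORS = 4

# One independent event queue per processor (P1..P4).
queues = [deque() for _ in range(NUM_PROCESSORS)]

# Hypothetical static mapping from event source (the NCC key) to a queue index.
SOURCE_TO_QUEUE = {"S1": 0, "S2": 2, "S3": 0, "S4": 1}


def distribute(event):
    """Append the event to the queue assigned to its source's NCC."""
    queues[SOURCE_TO_QUEUE[event["source"]]].append(event)


for seq, source in enumerate(["S1", "S2", "S1", "S4", "S2"]):
    distribute({"source": source, "seq": seq})

# Each processor fetches from its own queue in FIFO order.
p1_events = [queues[0].popleft()["seq"] for _ in range(len(queues[0]))]
print(p1_events)
```

Because every event of a given source always lands in the same FIFO queue, the order within each NCC is preserved without any coordination between processors.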

[0040] Consider, for example, a hierarchical processing system with a main processor node at the high level and a number of lower-level processors, called regional processors, each of which in turn serves a number of hardware devices. In such a system, the events originating from a hardware device, and the events originating from a regional processor serving a group of devices, satisfy the ordering requirements defined by the given protocols (except for error conditions, which are guarded against by processing at a higher level). The events from a particular device/regional processor therefore form a non-commuting category. To preserve the non-commuting category, each device/regional processor must always send its events to the same processor.

[0041] For example, in telecommunication applications, a sequence of digits received from a user, or a sequence of ISDN user part messages for a trunk device, must be processed in the order received, whereas the message sequences for two independent trunk devices can be processed in any order relative to each other, as long as the order for each individual trunk device is preserved.

[0042] In FIG. 2, events from a given source S1, for example a particular hardware device or input port, are mapped to a predetermined processor P1, and events from another given source S2, for example a particular regional processor, are mapped to another predetermined processor P3. Since the number of sources normally far exceeds the number of shared-memory processors, each processor is usually assigned several sources. In a typical telecommunication/data communication application, there may be 1024 regional processors connected to a single main processor node. Mapping the regional processors onto the shared-memory processors of the main node in a load-balanced manner means that each shared-memory processor serves roughly 256 regional processors (assuming that the main node contains four processors and that all regional processors generate the same load). In practice, however, it is preferable to use a finer granularity and map hardware devices such as signaling devices and subscriber terminals onto the main node processors, which generally makes it easier to balance the load. Each regional processor in a telecommunication network may control hundreds of hardware devices. So instead of mapping 10,000 or more hardware devices onto a single processor (which, of course, handles the load by time sharing), the solution according to the invention maps the hardware devices onto a plurality of shared-memory processors in the main node, thereby relieving the bottleneck at the main node.
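The figure of roughly 256 regional processors per shared-memory processor follows directly from 1024 / 4 under the equal-load assumption. A minimal Python sketch of one possible static mapping is shown below; the modulo rule and the name `processor_for` are assumptions, and any fixed mapping that spreads the sources evenly would serve equally well.

```python
from collections import Counter

NUM_REGIONAL = 1024  # regional processors attached to the main node
NUM_SHARED = 4       # shared-memory processors in the main node


def processor_for(regional_id):
    """Static mapping: a regional processor always maps to the same processor."""
    return regional_id % NUM_SHARED


load = Counter(processor_for(rp) for rp in range(NUM_REGIONAL))
print(sorted(load.items()))
```

If the regional processors generate unequal loads, a mapping at the finer granularity of individual hardware devices gives more freedom to even out the load.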

[0043] Systems such as the AXE Digital Switching System of Telefonaktiebolaget LM Ericsson, which process an external event in slices connected by processor-to-processor (CP-to-CP) signals, so-called internal events, impose ordering requirements of their own in addition to those imposed by the protocols. The CP-to-CP signals of an NCC must be processed in the order in which they are generated (unless superseded by a higher-priority signal generated by the last slice in execution). This additional ordering requirement is met if each CP-to-CP signal (internal event) is processed by the same processor that generated it, as indicated by the dashed lines from the processors to the event queues in FIG. 2. Internal events are thus kept within the same NCC by being fed back to the same processor or processor set that generated them, which guarantees that they are processed in the order in which they were generated.
[0044] Usually, an event is represented to the processing system as a signal message. In general, each signal message has a header and a signal body. The signal body contains the information needed to execute a software task; for example, it includes, implicitly or explicitly, a pointer to software code/data in the shared memory together with the required input operands. In this sense, the event signal is self-contained and completely defines the corresponding task. The processors P1 to P4 therefore fetch and process events independently of one another, executing the corresponding software tasks, or jobs, in parallel. A software task is also referred to as a job, and the two terms are used interchangeably throughout this disclosure. During parallel execution of tasks, the processors need to operate on global data in the shared memory. To avoid data inconsistencies, in which several processors access and operate on the same global data (during the lifetime of a job), the data consistency means 15 must make sure that data consistency is maintained at all times. The invention uses two basic procedures for ensuring data consistency when global data is operated on by multiple processors during parallel execution of tasks.

【００４５】・Locking: Each processor normally includes, as part of the data consistency means 15, means for locking the global data used by a task before execution of that task starts. Only the processor that has locked the global data can then access it. The locked data is preferably released when task execution completes. With this approach, if one processor has locked global data and another processor tries to access the same data, the other processor must wait until the locked data is released. In general, locking involves waiting time (waiting or stalling on a locked global state), which limits the amount of parallel processing to some extent (concurrent operations on different global states are of course still possible).

【００４６】・Collision detection and rollback: Software tasks are executed in parallel, and when an access collision is detected, one or more of the executing tasks involved in the collision are rolled back and restarted. Collision detection is generally performed by a marker method or an address comparison method. In the marker method, each processor includes means for marking the use of variables in the shared memory, and variable access collisions are detected from the markings. In general, collision detection carries a penalty due to rollbacks (the processing that is rolled back is wasted).

【００４７】The choice of approach depends on the application and is made case by case. As a simple rule of thumb, locking-based data consistency suits database systems, whereas collision detection suits telecommunication and data communication systems. In some applications a combination of locking and collision detection may be advantageous.

【００４８】Locking and collision detection as means of ensuring data consistency are described in more detail later.

【００４９】FIG. 3 shows an embodiment of a processing system according to the first aspect of the invention. In this embodiment the processors P1 to P4 are symmetric multiprocessors (SMPs), each with its own local cache C1 to C4, and the event queues are allocated in the shared memory 12 as dedicated memory lists EQ1 to EQ4, preferably linked lists.

【００５０】As described above, each event signal generally has a header and a signal body. Here the header includes, explicitly or implicitly, an NCC tag that identifies the NCC to which the event belongs. The distributor 14 distributes each incoming event to one of the event queues EQ1 to EQ4 according to the NCC tag in the event signal. The NCC tag can, for example, represent the source of the event, such as an input port, a regional processor or a hardware device. Suppose that an event received by the input/output unit 13 originates from a particular hardware device and that this is indicated by the tag in the event signal. The distributor 14 then evaluates the tag and, based on a pre-stored event dispatch table or the like, distributes the event to a predetermined one of the event queues EQ1 to EQ4 allocated in the shared memory. Each of the processors P1 to P4 fetches events from its own dedicated event queue in the shared memory 12 through its local cache and processes them one after another to completion. The event dispatch table can be modified from time to time to correct long-term imbalances among the traffic sources.
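The tag-based distribution described above can be illustrated with a minimal Python sketch. The class names `EventSignal` and `Distributor`, the tag strings and the table contents are hypothetical and chosen only for illustration; the patent does not prescribe any particular data layout.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EventSignal:
    ncc_tag: str            # header: identifies the NCC (here, the event source)
    code_ref: str           # body: reference to code/data in shared memory
    operands: tuple = ()    # body: input operands for the task


class Distributor:
    """Routes incoming events to per-processor queues by NCC tag."""

    def __init__(self, dispatch_table, num_queues):
        self.dispatch_table = dict(dispatch_table)   # NCC tag -> queue index
        self.queues = [deque() for _ in range(num_queues)]

    def distribute(self, event):
        index = self.dispatch_table[event.ncc_tag]
        self.queues[index].append(event)             # FIFO keeps arrival order
        return index

    def remap(self, ncc_tag, index):
        """Change the table, e.g. to correct a long-term load imbalance."""
        self.dispatch_table[ncc_tag] = index


# Hypothetical sources mapped to the four queues EQ1..EQ4 (indices 0..3).
distributor = Distributor({"rp_a": 0, "rp_b": 1, "hw_x": 2, "hw_y": 3}, num_queues=4)
for tag, operand in [("rp_a", 1), ("hw_x", 2), ("rp_a", 3)]:
    distributor.distribute(EventSignal(tag, "block_B88", (operand,)))

eq1_operands = [event.operands[0] for event in distributor.queues[0]]
```

Because every event with the same tag lands in the same FIFO queue, events of one category are consumed by one processor in their order of arrival, which is the property the NCC mapping relies on.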

【００５１】The invention is of course not limited to symmetric multiprocessors with local caches. Other examples of shared-memory systems include shared memory without caches, shared memory with a common cache, and shared memory with mixed caches.

【００５２】Object-Oriented Design Example
FIG. 4 is a schematic diagram of a simplified shared-memory multiprocessor system with an object-oriented design of the shared-memory software. The software in the shared memory 12 has an object-oriented design and is organized as a set of blocks B1 to Bn, or classes. Each block/object is responsible for a certain function. In general each block/object is divided into two main sectors: a program sector, which stores code, and a data sector, which stores data. The code in the program sector of a block can access and operate only on data belonging to the same block. The data sector is preferably divided in turn into two sectors: a first sector of "global" data comprising a number of global variables GV1 to GVn, and a second sector of, for example, "private" data such as records R1 to Rn, where each record normally contains a number of record variables RV1 to RVn, as illustrated for record Rx. In general each transaction is associated with one record of a block, whereas the global data in a block can be shared by several transactions.

【００５３】Data processing in a block is normally started by a signal entry into the block. When a processor receives an event, whether external or internal, it executes code in the block indicated by the event signal and operates on global variables and record variables within that block, thereby executing a software task. In FIG. 4, the execution of a software task is indicated by a wavy line in each of the processors P1 to P4.

【００５４】In the example of FIG. 4, the first processor P1 executes code in software block B88. Only instructions 120 to 123 are shown, but in practice many instructions are executed, each of which operates on one or more variables in the block. For example, instruction 120 operates on record variable RV28 in record R1, instruction 121 operates on record variable RV59 in record R5, instruction 122 operates on global variable GV43, and instruction 123 operates on global variable GV67. Correspondingly, processor P2 executes code and operates on variables in block B1, processor P3 executes code and operates on variables in block B8, and processor P4 executes code and operates on variables in block B99.

【００５５】An example of block-oriented software is the PLEX (Programming Language for Exchanges) software of Telefonaktiebolaget LM Ericsson, in which the entire software is organized in blocks. Java applications are examples of truly object-oriented designs.

【００５６】Event-Level Pipelining
As mentioned above, in some systems external events are processed in "slices" connected by internal events (for example, CP-to-CP buffered signals).

【００５７】According to the second aspect of the invention, concurrent processing is achieved by operating at least one set of processors, made up of several shared-memory processors, as a multiprocessor pipeline in which each external event is processed in slices, as a chain of events executed in different processor stages of the pipeline. The requirement that signals be processed in their order of generation is met as long as all signals generated by one stage are sent to the next stage in the order in which they were generated. Any deviation from this rule must still guarantee racing-free execution. If executing a given slice generates more than one signal, these signals must either be fed to the subsequent processor stage in the order in which they were generated or, if they are distributed to two or more processors, it must be ensured that the resulting possibility of racing does not affect the computation.

【００５８】An embodiment of the multiprocessor pipeline according to the second aspect of the invention will now be described with reference to FIGS. 5A and 5B.

【００５９】FIG. 5A is a schematic diagram of an event-driven processing system according to the second aspect of the invention. The processing system is similar to that of FIG. 2. However, internal events generated by a processor that is part of the multiprocessor pipeline 11 are not necessarily fed back to the same processor; they can be fed to any of the processors, as indicated by the dashed lines that run from the processors P1 to P4 and end on the bus leading to the event queues 16.

【００６０】In an object-oriented software design, the software in the shared memory is organized in blocks or classes, as described above with reference to FIG. 4. On receiving an external event, the corresponding processor executes code in a block/object and may produce a result in the form of an internal event directed to another block/object. When this internal event comes up for execution, it is executed in the indicated block/object and may in turn generate another internal event directed to yet another block/object. The chain normally dies after a few internal events. In telecommunication applications, for example, each external event typically generates about 5 to 10 internal events.

【００６１】A multiprocessor pipeline customized for object-oriented software design can be realized by assigning clusters of software blocks/classes to the processors. In FIG. 2, the clusters CL1 to CLn of blocks/classes in the shared memory 12 are shown schematically as dashed boxes. As indicated in FIG. 2 by the solid line connecting processor P2 and cluster CL1, one cluster CL1 is assigned to processor P2; as indicated by the dashed line connecting processor P4 and cluster CL2, another cluster CL2 is assigned to processor P4. In this way each cluster of blocks/classes in the shared memory 12 is assigned to a predetermined one of the processors P1 to P4, and the assignment scheme is implemented in a look-up table 17 in the distributor 14 and a look-up table 18 in the shared memory 12. Each look-up table 17, 18 links a target block to each event, for example based on the event ID, and associates each target block with a predetermined cluster of blocks. The distributor 14 distributes external events to the processors according to the information in look-up table 17. Look-up table 18 in the shared memory 12 can be used by all the processors P1 to P4 so that internal events can be distributed to the processors. In other words, when a processor generates an internal event, it consults look-up table 18 to determine i) the corresponding target block, for example based on the event ID, ii) the cluster to which that target block belongs, and iii) the processor to which that cluster is assigned, and then sends the internal event signal to the appropriate event queue. Note that each block normally belongs to exactly one cluster, although an assignment scheme with overlapping clusters can also be implemented in a somewhat more elaborate manner by using information such as the execution state in addition to the event ID.

【００６２】As shown in FIG. 5B, mapping clusters of blocks/classes to processors automatically results in pipelined execution. An external event EE is directed to block A, which is assigned to processor P1; the internal event IE generated by this block is directed to block B, which is assigned to processor P2; the internal event IE generated by block B is directed to block C, which is assigned to processor P4; and the internal event IE generated by block C is directed to block D, which is assigned to processor P1. Logically, therefore, there is a pipeline with a number of processor stages. Here blocks A and D are assumed to be part of a cluster mapped to processor P1, block B part of a cluster mapped to processor P2, and block C part of a cluster mapped to processor P4. Each stage of the pipeline is executed by one processor, but a given processor can execute one or more stages of the pipeline.
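The three-step look-up (event to block, block to cluster, cluster to processor) and the pipeline it produces for the chain A, B, C, D can be sketched as follows. This is only an illustration: the event IDs, the cluster names and the use of plain dictionaries for the look-up table are assumptions, not details taken from the patent.

```python
from collections import deque

# Hypothetical look-up tables; names and contents are illustrative only.
EVENT_TO_BLOCK = {"ev_a": "A", "ev_b": "B", "ev_c": "C", "ev_d": "D"}
BLOCK_TO_CLUSTER = {"A": "CL1", "D": "CL1", "B": "CL2", "C": "CL3"}
CLUSTER_TO_PROCESSOR = {"CL1": "P1", "CL2": "P2", "CL3": "P4"}

event_queues = {p: deque() for p in ("P1", "P2", "P3", "P4")}


def route(event_id):
    """Send an event to the queue of the processor that owns its target block."""
    block = EVENT_TO_BLOCK[event_id]            # i) target block
    cluster = BLOCK_TO_CLUSTER[block]           # ii) cluster of that block
    processor = CLUSTER_TO_PROCESSOR[cluster]   # iii) processor of that cluster
    event_queues[processor].append(event_id)
    return processor


# One external event (ev_a) followed by the internal events it triggers.
stages = [route(event_id) for event_id in ("ev_a", "ev_b", "ev_c", "ev_d")]
```

In this sketch `stages` lists the processor that executes each slice, so processor P1 appears twice because blocks A and D share a cluster.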

【００６３】As a variation, events that require input data from a predetermined data area of the shared memory 12 can be mapped to one and the same predetermined set of processors.

【００６４】When a processor stage in the multiprocessor pipeline has executed an event belonging to a first event chain and sent the resulting internal event signal to the next processor stage, it is normally free to start processing an event from the next event chain, which improves throughput.

【００６５】To maximize the gain, pipeline stages should be mapped to processors so that all processors are equally loaded. The clusters of blocks/classes are therefore partitioned according to an "equal load" criterion. The time spent in each cluster can be estimated, for example, from a similar application running on a single processor, or it can be monitored at run time so that the partitioning can be readjusted. If a block generates more than one internal event in response to one input event, and these events are sent to different blocks, a "no racing" criterion is needed in addition to the "equal load" criterion to prevent a later-generated internal event from being executed before an earlier one.
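The patent does not say how the "equal load" partition is computed. As one possible illustration, the sketch below uses a simple greedy heuristic (heaviest cluster first, always onto the least-loaded processor). The heuristic, the function name `partition_clusters` and the load figures are assumptions made for this example, and the "no racing" criterion is not modelled.

```python
import heapq


def partition_clusters(cluster_load, processors):
    """Greedy heuristic: heaviest remaining cluster goes to the least-loaded processor."""
    heap = [(0.0, p) for p in processors]        # (accumulated load, processor)
    heapq.heapify(heap)
    assignment = {}
    for cluster, load in sorted(cluster_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, proc = heapq.heappop(heap)        # least-loaded processor so far
        assignment[cluster] = proc
        heapq.heappush(heap, (total + load, proc))
    totals = {p: 0.0 for p in processors}
    for cluster, proc in assignment.items():
        totals[proc] += cluster_load[cluster]
    return assignment, totals


# Hypothetical per-cluster times, e.g. measured on a single-processor run.
measured = {"CL1": 40.0, "CL2": 35.0, "CL3": 30.0, "CL4": 25.0, "CL5": 20.0, "CL6": 10.0}
assignment, totals = partition_clusters(measured, ["P1", "P2", "P3", "P4"])
```

Rerunning the function with loads monitored at run time would give the readjusted partition mentioned in the text.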

【００６６】An external event can of course be processed as a whole without being divided into slices, but slicing allows structured program development and maintenance and also makes pipelining possible.

【００６７】The same processing can be performed whether an external event is divided into a few large slices or into many small slices.

【００６８】As described above, there are two basic procedures for ensuring consistency when the processors manipulate global data during parallel task execution: i) locking and ii) collision detection and rollback.

【００６９】Locking as a Means of Ensuring Data Consistency
When locking is used to ensure data consistency, each processor that is to execute a task generally locks the global data used by the task before it starts executing the task. Only the processor that has locked the global data can then access it.

【００７０】Locking is well suited to object-oriented designs because the data areas are clearly defined, which makes it possible to lock a specific data sector of a block or the whole block. It is normally not possible to know which parts of the global data in a block will be modified by a particular execution sequence or task, and the global data cannot be characterized in general terms, so locking the entire global data sector is a safe way to ensure data consistency. Ideally it would suffice to protect the global data in each block, but many applications also contain so-called "across-record" operations that need protection. For example, an operation that selects a free record may search through many records before it actually finds a free one. Locking the entire block protects everything. Furthermore, in applications where the execution of a buffered signal can span several blocks connected by so-called direct/combined signals (direct jumps from one block to another), possibly with loops (a block visited more than once before EXIT), the locked blocks cannot be released until the task has finished executing.

【００７１】In general, the use of NCCs minimizes the "shared state" between processors and thereby improves cache hit rates. In particular, when functionally different regional processors/hardware devices, such as signaling devices and subscriber terminals in a telecommunication system, are mapped to different processors in the main node, different access mechanisms can be processed concurrently with little or no waiting on locked blocks, because different access mechanisms are normally handled in different blocks until processing reaches the later stages of execution.

【００７２】FIG. 6 illustrates locking of blocks/objects to guarantee data consistency. Consider three different external events EEx, EEy and EEz directed to blocks B1, B2 and B1, respectively. External event EEx enters block B1, and the corresponding processor locks block B1 before starting execution in the block, as indicated by the diagonal line across block B1. Next, external event EEy enters block B2, and the corresponding processor locks block B2. As indicated by the time axis (t) in FIG. 6, external event EEz, which is directed to block B1, arrives after external event EEx has already entered block B1 and locked it. Processing of external event EEz therefore has to wait until block B1 is released.
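The sequence of FIG. 6 can be replayed with a small single-threaded Python sketch. The `Block` class and the helpers `try_enter` and `leave` are hypothetical names; a non-blocking lock attempt is used so that the outcome is deterministic, whereas a real processor would stall or queue the event instead of returning.

```python
import threading


class Block:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # one lock per block (whole-block locking)


def try_enter(block, log, event):
    """Try to lock the block for an event and record whether the event must wait."""
    if block.lock.acquire(blocking=False):
        log.append((event, block.name, "locked"))
        return True
    log.append((event, block.name, "waits"))
    return False


def leave(block):
    """Release the block when the task has finished executing."""
    block.lock.release()


b1, b2 = Block("B1"), Block("B2")
log = []
try_enter(b1, log, "EEx")                  # EEx locks B1
try_enter(b2, log, "EEy")                  # EEy locks B2, independently of EEx
eez_entered = try_enter(b1, log, "EEz")    # B1 is still locked, so EEz must wait
leave(b1)                                  # EEx finishes and releases B1
eez_retry = try_enter(b1, log, "EEz")      # EEz can now lock B1 and proceed
```

The log shows that EEx and EEy proceed concurrently on different blocks, while EEz is delayed only because it targets a block that is already locked.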

【００７３】Locking can, however, lead to deadlock, in which two processors wait indefinitely for each other to release variables that each needs to execute its current task. It is therefore desirable either to avoid deadlocks or to detect them and perform a rollback that guarantees progress.

【００７４】Deadlocks can be avoided by reserving all the blocks needed for the execution of an entire task (job) at the beginning of the job, instead of reserving or locking blocks as they are needed during execution. It is not always possible to predict all the blocks a job will need, but with non-run-time input, compiler analysis may provide information that helps minimize deadlocks, for example by reserving at least those blocks that account for a large part of the processing time of the job. An efficient way to minimize deadlocks is to reserve heavily used blocks before execution starts, regardless of whether they are the next blocks needed in the processing. The safest approach is to reserve the blocks that the job will almost certainly need, in particular the heavily used ones, and to reserve the remaining blocks when they are needed.
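Reserving every block at the start of a job can be sketched as an all-or-nothing acquisition. The helper names `reserve_all` and `release_all`, the fixed (sorted) acquisition order and the back-out on failure are assumptions added for this illustration; the patent states only that all blocks needed by the job are reserved at the beginning.

```python
import threading


def reserve_all(locks, needed):
    """Lock every needed block, or none of them (all-or-nothing)."""
    taken = []
    for name in sorted(needed):                  # fixed order: an added assumption
        if locks[name].acquire(blocking=False):
            taken.append(name)
        else:
            for held in taken:                   # back out: hold nothing while waiting
                locks[held].release()
            return False
    return True


def release_all(locks, needed):
    """Release the blocks of a finished job."""
    for name in needed:
        locks[name].release()


locks = {name: threading.Lock() for name in ("B1", "B2", "B3")}
job1_ok = reserve_all(locks, {"B2", "B3"})       # job 1 reserves both of its blocks
job2_ok = reserve_all(locks, {"B1", "B2"})       # B2 is busy, so job 2 backs out of B1
b1_free = not locks["B1"].locked()
release_all(locks, {"B2", "B3"})                 # job 1 completes
job2_retry = reserve_all(locks, {"B1", "B2"})    # job 2 now gets everything it needs
```

Since a job never holds some blocks while waiting for others, the circular wait that defines a deadlock cannot arise in this sketch.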

【００７５】Reserving blocks as they are needed during execution is prone to deadlock, as noted above, so deadlocks must be detected and analyzed. Deadlocks should be detected as early as possible, and according to the invention they can be detected almost immediately. Because all "overhead processing" takes place between two jobs, a deadlock becomes apparent at the moment the later job, the one that would cause the deadlock, tries to acquire a resource. Detection is done by checking whether another processor holds one of the resources needed by the job in question, and then checking, for example by means of a flag per block, whether that processor is waiting for a resource held by the processor of the job in question.
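The two checks described above amount to looking for a two-party wait cycle at the moment a resource is requested. A minimal sketch, assuming the per-block flags are kept in two hypothetical dictionaries (`holder` and `waiting_for`):

```python
def detects_deadlock(me, wanted, holder, waiting_for):
    """Return True if `me` requesting `wanted` would close a two-party wait cycle.

    holder:      block name -> processor currently holding it (per-block flag)
    waiting_for: processor  -> block it is currently waiting for, if any
    """
    other = holder.get(wanted)
    if other is None or other == me:
        return False                       # block is free or already ours
    blocked_on = waiting_for.get(other)    # is the holder itself waiting?
    return blocked_on is not None and holder.get(blocked_on) == me


holder = {"B1": "P1", "B2": "P2"}
waiting_for = {"P2": "B1"}                 # P2 already waits for B1, held by P1

p1_deadlock = detects_deadlock("P1", "B2", holder, waiting_for)   # P1 now wants B2
p3_deadlock = detects_deadlock("P3", "B2", holder, waiting_for)   # P3 merely has to wait
```

The check runs when the later job makes its request, which is why the deadlock is found almost immediately rather than after a timeout. Cycles involving more than two processors are outside the scope of this sketch.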

【００７６】Minimizing deadlocks normally affects the rollback scheme and the progress scheme. The lower the deadlock frequency, the simpler the rollback scheme can be, because there is no need to care about the efficiency of rollbacks that rarely happen. Conversely, if deadlocks are relatively frequent, an efficient rollback scheme becomes important.

【００７７】The basic principle of rollback is to release all held resources, go back to the start of one of the jobs involved in the deadlock, undo all changes made during execution up to that point, and restart the rolled-back job later, in such a way or after such a delay that progress is guaranteed without loss of efficiency. In general this means that the rollback scheme must not allow a recurring deadlock caused by rolling back the same job and restarting it immediately, and also that the delay before the rolled-back job is restarted must not be too long. When job execution times are very short, however, it may be appropriate simply to select the "later" job, the one that caused the deadlock, for rollback.

【００７８】Collision Detection as a Means of Ensuring Data Consistency
When collision detection is used to ensure data consistency, software tasks are executed in parallel by the processors, access collisions are detected, and one or more of the executing tasks for which a collision is detected are rolled back and restarted.

【００７９】Preferably, each processor marks its use of variables in the shared memory while executing a task, so that variable access collisions can be detected. At its most basic level, the marker method consists of marking the use of individual variables in the shared memory. A coarser-grained collision check can be obtained by marking larger areas instead of individual data items. One way of implementing a coarse-grained collision check is to use standard memory management techniques, including paging. Another is to mark groups of variables, which is particularly efficient when entire records, including all record variables in the record, are marked instead of individual record variables. It is important, however, to choose "data areas" such that if a job uses a given data area, the probability that another job uses the same area is very low. Otherwise the coarse-grained marking may in fact increase the rollback frequency.

【００８０】FIG. 7 illustrates the use of variable marking to detect access collisions in an object-oriented software design. As described above with reference to FIG. 4, the shared memory 12 is organized in blocks B1 to Bn, and a number of processors P1 to P3 are connected to the shared memory 12. FIG. 7 shows two blocks, block B2 and block B4, in more detail. In this particular marker method, each global variable GV1 to GVn and each record R1 to Rn in a block is associated with a marker field, as shown in FIG. 7.

【００８１】The marker field has one bit for each processor connected to the shared memory system, so in this case each marker field has three bits. All bits are initially reset. Before accessing (reading or writing) a variable or record, a processor sets its own bit and then reads and evaluates the entire marker field. If any other bit in the marker field is set, a collision is imminent, and the processor rolls back the task being executed, undoing all changes made up to the current point of execution, which includes resetting all the corresponding marker bits. If no other bit is set, the processor continues executing the task. Each processor records the address of every variable it accesses during execution, and at the end of task execution it uses the recorded addresses to reset its own bit in each of the corresponding marker fields.
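The set-then-read protocol on a marker field can be sketched as follows. The `MarkerField` class, its method names and the zero-based processor indices are hypothetical, and the sketch ignores the atomicity that a real shared-memory implementation would need between setting a bit and reading the field.

```python
class MarkerField:
    """One bit per processor; bit i is set while processor i uses the variable."""

    def __init__(self, num_processors):
        self.bits = [0] * num_processors

    def mark_and_check(self, proc):
        """Set the caller's own bit, then report whether any other bit is set."""
        self.bits[proc] = 1
        return any(bit for i, bit in enumerate(self.bits) if i != proc)

    def reset(self, proc):
        """Clear the caller's bit at task end or on rollback."""
        self.bits[proc] = 0

    def __str__(self):
        return "".join(str(bit) for bit in self.bits)


P1, P2, P3 = 0, 1, 2
markers = {name: MarkerField(3) for name in ("GV1", "R1", "R2")}

markers["GV1"].mark_and_check(P1)                    # P1 is already using GV1
markers["R2"].mark_and_check(P3)                     # P3 is already using R2

gv1_collision = markers["GV1"].mark_and_check(P2)    # P2 sets the second bit of GV1
gv1_field = str(markers["GV1"])
r2_collision = markers["R2"].mark_and_check(P2)      # P2 sets the second bit of R2
r2_field = str(markers["R2"])
r1_collision = markers["R1"].mark_and_check(P3)      # nobody else is using R1

# P2 rolls back: it resets its own bit in every field it has touched.
for name in ("GV1", "R2"):
    markers[name].reset(P2)
```

A processor that sees a collision rolls back its own task, so the check needs no communication with the other processors beyond the shared marker field itself.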

[0082] To make rollback possible when a collision is detected, a copy of every modified variable (its state before modification) and its address must be saved during the execution of each job. This allows the original state to be restored on rollback.
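
The bookkeeping described above can be pictured as a simple undo log. The following is a minimal sketch under stated assumptions: a Python dict stands in for the shared memory, and the class name UndoLog and its methods are hypothetical rather than taken from the disclosure.

```python
class UndoLog:
    """Saves the original state of each modified variable so a job can be undone."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value (stand-in for shared memory)
        self.saved = {}       # address -> value before the job's first write

    def write(self, address, value):
        # Save address and original state only on the first write to a variable.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
        self.memory[address] = value

    def rollback(self):
        # Restore every modified variable to its state before the job started.
        for address, original in self.saved.items():
            self.memory[address] = original
        self.saved.clear()

    def commit(self):
        # Job completed without collision: the saved copies are no longer needed.
        self.saved.clear()
```

Saving only on the first write keeps the log small while still recording the state that existed before the job began.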

[0083] In FIG. 7, processor P2 needs to access global variable GV1; it sets its own bit at the second position of the marker field associated with GV1 and then reads the entire marker field. In this case the field (110) contains a bit set by P2 and a bit set by P1, so an imminent variable access collision is detected, and processor P2 rolls back the task being executed. Correspondingly, if processor P2 needs to access record R2, it sets its own bit at the second position and then reads the entire marker field. The field (011) contains a bit set by P2 and a bit set by P3, so a record access collision is detected and processor P2 rolls back the task being executed. When processor P3 needs to access record R1, it first sets its own bit at the third position of the associated marker field and then reads and evaluates the entire field. In this case no other bits are set, so processor P3 is allowed to access the record for reading or writing. To reduce unnecessary rollbacks, for example on variables that are mostly read, each marker field preferably contains two bits per processor, one for reading and one for writing.
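
The two-bits-per-processor refinement mentioned above could look roughly like the sketch below. The text does not spell out the collision rule for separate read and write bits; the rule used here (readers collide only with writers, writers collide with any other user) is an assumption, as is the class name ReadWriteMarkerField.

```python
class ReadWriteMarkerField:
    """Marker field with one read bit and one write bit per processor."""

    def __init__(self):
        self.read_bits = 0
        self.write_bits = 0

    def mark(self, proc, write):
        """Set own read or write bit, then evaluate the field.

        Returns True if a collision is detected under the assumed rule:
        a read collides with another processor's write bit; a write
        collides with another processor's read or write bit.
        """
        own = 1 << proc
        if write:
            self.write_bits |= own
        else:
            self.read_bits |= own
        others_writing = self.write_bits & ~own
        others_reading = self.read_bits & ~own
        if write:
            return (others_writing | others_reading) != 0
        return others_writing != 0
```

Under this rule, several processors can read a mostly-read variable concurrently without triggering a rollback, which is the saving the paragraph points to.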

[0084] Another collision detection approach is referred to as the address comparison method, in which read and write addresses are compared at the end of a task. The main difference from the marker method is that accesses by other processors are generally not checked during task execution, only at the end of the task. An example of a specific type of checking unit implementing an address comparison method is disclosed in International Patent Application WO 88/02513.
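
As a rough illustration of end-of-task address comparison, the sketch below compares the read and write address sets of a finishing task with those of other tasks that ran concurrently. It is a generic sketch only; it does not reproduce the checking unit of WO 88/02513, and the function name has_conflict and the dict layout are hypothetical.

```python
def has_conflict(task, others):
    """Compare a finishing task's addresses with those of concurrent tasks.

    Each task is a dict with two sets of addresses: "reads" and "writes".
    A conflict exists if this task wrote something another task read or
    wrote, or read something another task wrote.
    """
    for other in others:
        if task["writes"] & (other["reads"] | other["writes"]):
            return True
        if task["reads"] & other["writes"]:
            return True
    return False
```

Two tasks that only read the same address do not conflict, whereas a write by one task to an address read by the other does.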

Reuse of Existing Application Software

[0085] Existing sequentially programmed application software usually represents a considerable investment, and thousands or millions of lines of software code already exist for single-processor systems, such as single-processor nodes at the highest level of a hierarchical processing system. If the application software is automatically transformed, for example by recompilation, so that data consistency is ensured when it is executed on multiple processors, all of the software code can be migrated to and reused in a multiprocessor environment, saving time and money.

[0086] FIG. 8A illustrates a prior-art single-processor system from a layered perspective. The bottom layer is a processor P1, such as a standard microprocessor. The next layer contains the operating system, followed by a virtual machine, which interprets the application software in the top layer.

[0087] FIG. 8B illustrates a multiprocessor system from a layered perspective. The bottom layer consists of a plurality of shared memory processors, implemented as standard off-the-shelf microprocessors P1 and P2. The next layer is the operating system. The virtual machine, which may be, for example, an APZ emulator running on a SUN workstation, a compiling high-performance emulator such as SIMAX, or the well-known Java virtual machine, is modified to provide multiprocessor support and data-consistency-related support. In general, sequentially programmed application software is transformed simply by adding code for data-consistency-related support: by post-processing the object code, by recompiling the software if it is compiled, or by modifying the interpreter if it is interpreted.

[0088] In the case of collision detection based on variable marking, application software written for a single-processor system can be migrated to a multiprocessor environment as follows. To enable proper rollback, code is inserted into the application software before each write access to a variable to store the address and original state of that variable. Before each read and write access to a variable, code is inserted to set the processor's marker bit in the marker field, check the marker field, and store the address of the variable. The application software is then recompiled or reinterpreted, or its object code is post-processed. The hardware, operating system or virtual machine is modified to provide collision-detection-related support, implementing rollback and the resetting of marker fields. Consequently, if a collision is detected when the code for checking the marker field is executed, control is normally transferred to the hardware/operating system/virtual machine, which performs rollback using the stored copies of the modified variables. Normally at the end of a job, the hardware/operating system/virtual machine takes over and resets the relevant bit in each of the marker fields indicated by the stored addresses of the variables accessed by the job.
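
The division of labour described above, with inserted code in the application and rollback and marker reset in the runtime, might be sketched as follows. This is an illustrative sketch only: JobContext.read and JobContext.write stand for the code inserted before each access, run_job stands for the modified hardware/operating system/virtual machine, and all names (including example_job) are hypothetical.

```python
class Collision(Exception):
    """Raised by the inserted check when another processor's marker bit is set."""


class JobContext:
    def __init__(self, memory, markers, proc):
        self.memory, self.markers, self.proc = memory, markers, proc
        self.touched = []  # stored addresses of accessed variables
        self.saved = {}    # address -> original state of modified variables

    def _mark(self, addr):
        # Inserted before every read and write: set bit, store address, check field.
        own = 1 << self.proc
        self.markers[addr] |= own
        self.touched.append(addr)
        if self.markers[addr] & ~own:
            raise Collision(addr)

    def read(self, addr):
        self._mark(addr)
        return self.memory[addr]

    def write(self, addr, value):
        self._mark(addr)
        # Inserted before every write: store the variable's original state.
        self.saved.setdefault(addr, self.memory[addr])
        self.memory[addr] = value


def run_job(job, memory, markers, proc):
    """Runtime side: run a job, roll back on collision, reset marker bits at the end."""
    ctx = JobContext(memory, markers, proc)
    try:
        job(ctx)
        return True
    except Collision:
        for addr, original in ctx.saved.items():
            memory[addr] = original  # rollback using the stored copies
        return False
    finally:
        for addr in ctx.touched:
            markers[addr] &= ~(1 << proc)  # reset own bit in each marker field


def example_job(ctx):
    # Stand-in for sequential application code after instrumentation.
    ctx.write("gv1", ctx.read("gv1") + 1)
    ctx.write("r1", 99)
```

The application code itself stays sequential; only the access points are instrumented, which is what makes automatic transformation by recompilation or post-processing plausible.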

[0089] Static analysis of the code can keep the insertion of new code to a minimum. For example, instead of inserting code before every read and write as described above, code can be inserted at fewer points, as long as the end goal is still achieved.
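
One simple form such an analysis could take is shown below, purely as an assumption: within a straight-line sequence of accesses in one job, marker code is needed only at the first access to each variable, and save code only at the first write. The function name plan_instrumentation and the tuple layout are hypothetical, and the text does not prescribe this particular analysis.

```python
def plan_instrumentation(accesses):
    """Decide which inserted actions each access in a straight-line job needs.

    accesses: list of (kind, variable) with kind "read" or "write".
    Returns a list of (kind, variable, actions), where actions is a subset of
    ["mark", "save"]: "mark" sets and checks the marker field, "save" stores
    the variable's address and original state.
    """
    marked, saved, plan = set(), set(), []
    for kind, var in accesses:
        actions = []
        if var not in marked:
            actions.append("mark")  # first access to this variable in the job
            marked.add(var)
        if kind == "write" and var not in saved:
            actions.append("save")  # first write to this variable in the job
            saved.add(var)
        plan.append((kind, var, actions))
    return plan
```

In this sketch a repeated write to the same variable needs no inserted code at all, because the variable is already marked and its original state already saved.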

[0090] It should be understood, however, that if the multiple processors are implemented as specially designed proprietary hardware, the application software can be migrated directly to the multiprocessor environment.

[0091] FIG. 9 is a schematic diagram of a communication system in which one or more processing systems according to the invention are implemented. The communication system 100 can support various bearer service networks, such as PSTN (Public Switched Telephone Network), PLMN (Public Land Mobile Network), ISDN (Integrated Services Digital Network) and ATM (Asynchronous Transfer Mode) networks. The communication system 100 basically comprises a number of switching/routing nodes 50-1 to 50-6 interconnected by physical links that are normally grouped into trunk groups. The switching nodes 50-1 to 50-4 have access points to which access terminals, such as telephones 51-1 to 51-4 and computers 52-1 to 52-4, are connected via local exchanges (not shown). The switching node 50-5 is connected to a Mobile Switching Center (MSC) 53. The MSC 53 is connected to two Base Station Controllers (BSCs) 54-1 and 54-2 and to a Home Location Register (HLR) node 55. The first BSC 54-1 is connected to a number of base stations 56-1 and 56-2, which communicate with one or more mobile units 57-1 and 57-2. Similarly, the second BSC 54-2 is connected to a number of base stations 56-3 and 56-4, which communicate with one or more mobile units 57-3. The switching node 50-6 is connected to a host computer 58 provided with a database system (DBS). User terminals connected to the system 100, such as computers 52-1 and 52-4, can request database services from the database system in the host computer 58. A server 59, in particular a Java server, is connected to the switching/routing node 50-4. Private networks (not shown), such as business networks, may also be connected to the communication system 100.

[0092] The communication system 100 provides various services to the users connected to the network. Examples of such services include ordinary telephone calls in PSTN and PLMN, message services, LAN interconnection, Intelligent Network (IN) services, ISDN services, CTI (Computer Telephony Integration) services, video conferencing, file transfer, access to the so-called Internet, paging services and video-on-demand.

[0093] According to the invention, each switching node 50 in the system 100 is preferably provided with a processing system 1-1 to 1-6 according to the first or second aspect of the invention (or a combination of the two aspects in the form of a matrix processing system), which handles events such as service requests and inter-node communication. A call set-up, for example, requires the processing system to execute a sequence of jobs. This sequence of jobs defines the call set-up service at the processor level. A processing system according to the invention is preferably also arranged in each of the MSC 53, the BSCs 54-1 and 54-2, the HLR node 55, the host computer 58 and the server 59 of the communication system 100.

[0094] Although the invention is preferably used in higher-level processor nodes of a hierarchical processing system, it will be apparent to those skilled in the art that the above aspects of the invention can be applied to any event-driven processing in which event-flow concurrency can be identified.

[0095] The term event-based system includes, but is not limited to, telecommunication, data communication and transaction-oriented systems.

[0096] The term shared memory processors is not limited to standard off-the-shelf microprocessors, but includes any type of processing unit, such as SMPs and specialized hardware, operating on a common memory in which application software and data are accessible to all processing units. This also includes systems in which the shared memory is distributed over several memory units, as well as systems with asymmetric access, in which the access times to different parts of the distributed shared memory may differ between processors.

[0097] The embodiments described above are given merely as examples and do not limit the invention. Further modifications, changes and improvements that retain the basic underlying principles disclosed and claimed herein are within the scope and spirit of the invention.

[Brief description of the drawings]

FIG. 1 is a schematic diagram of a hierarchical distributed processing system according to the invention, having a higher-level processor node.

FIG. 2 is a schematic diagram of a processing system according to a first aspect of the invention.

FIG. 3 shows a particular embodiment of a processing system according to the first aspect of the invention.

FIG. 4 is a schematic diagram of a simplified shared memory multiprocessor with an object-oriented design of the shared memory software.

FIG. 5A is a schematic diagram of a particularly preferred processing system according to a second aspect of the invention.

FIG. 5B shows a multiprocessor pipeline according to the second aspect of the invention.

FIG. 6 shows an example of the use of block/object locking to ensure data consistency.

FIG. 7 shows an example of the use of variable marking to detect access collisions.

FIG. 8A shows an example of a prior-art single-processor system from a layered perspective.

FIG. 8B shows an example of a multiprocessor system from a layered perspective.

FIG. 9 is a schematic diagram of a communication system in which at least one processing system according to the invention is implemented.

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW

(72) Inventor: Jonsson, Sten Edvard; Lysviksgatan 3, Farsta, Sweden

(72) Inventor: Sohoni, Milind; C-147 CSRE Quarters, Hillside, Indian Institute of Technology, Powai, Mumbai, India

(72) Inventor: Tikekar, Nikhil; C 321, Century Park, Richmond Road, Bangalore, India

F-terms (reference): 5B045 AA06 EE03 GG01 GG17; 5B098 AA10 GA04 GD16 GD25 GD27

Claims

1. An event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over a plurality of levels of a system hierarchy, characterized in that at least one higher-level processor node (10) of the hierarchical processing system (1) comprises: a plurality of shared memory processors (11); means (14) for dividing an external event flow arriving at the processor node into a plurality of non-commuting categories of events, assigning each non-commuting category to a predetermined set of the shared memory processors, and mapping the external events to the processors for processing by that set of processors; and means (15) for ensuring data consistency when global data in the shared memory (12) are manipulated by the processors.

2. The hierarchical distributed processing system according to claim 1, wherein each processor set is in the form of a single processor.

3. The hierarchical distributed processing system according to claim 1, wherein at least one processor set is in the form of a processor array operating as a multiprocessor pipeline having a plurality of processor stages, and each event of the non-commuting category assigned to that processor set is processed in slices as a chain of events executed in different processor stages of the pipeline.

4. The hierarchical distributed processing system, wherein events requiring input data from a predetermined data area of the shared memory (12) are mapped by the mapping means (14, 18) to one and the same predetermined processor set.

5. The hierarchical distributed processing system according to claim 1, wherein a non-commuting category is a grouping of events in which the order of events must be preserved within the category, while there is no ordering requirement between the processing of events of different categories.

6. The hierarchical distributed processing system according to claim 1, wherein the higher-level processor node further includes means for supplying an event generated in the processor set to the same processor set.

7. The hierarchical distributed processing system according to claim 1, wherein a non-commuting category is defined by the events from a predetermined source (S1/S2).

8. The hierarchical distributed processing system according to claim 7, wherein the source (S1/S2) is an input port, a lower-level processor node or a hardware device connected to the hierarchical distributed processing system.

9. The hierarchical distributed processing system according to claim 1, wherein the data consistency means (15) comprises means for locking, in the shared memory, global variables to be used by a software task executed in response to an event, and means for releasing the locked global variables at the end of execution of the task.

10. The hierarchical distributed processing system according to claim 9, wherein the data consistency means (15) further comprises means for releasing the locked global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

11. The hierarchical distributed processing system according to claim 1, wherein the software in the shared memory (12) comprises a plurality of software blocks (B1 to Bn), each of the processors executes a software task comprising a software block in response to an event, and each processor comprises means, forming at least part of the data consistency means (15), for locking at least the global data of the software block before execution of the task, so that only the processor that has locked the block can access the global data in that block.

12. The hierarchical distributed processing system according to claim 11, wherein the locking means locks the entire software block before the execution of the corresponding task starts, and releases the locked block at the end of the task execution.

13. The hierarchical distributed processing system according to claim 11, wherein, in order to minimize deadlock situations, the locking means secures, before task execution starts, the blocks required by at least those software tasks that consume a considerable part of the processing time.

14. The hierarchical distributed processing system according to claim 11, wherein the higher-level processor node comprises means for detecting a deadlock situation, and means for releasing a block locked by one of the waiting processors and restarting the software task executed by that processor after an appropriate delay, so as to ensure that processing progresses.

15. The hierarchical distributed processing system according to claim 14, wherein the deadlock detection means comprises means for checking whether a variable required by the software task in question is locked by another processor, and means for checking whether that other processor is waiting for a variable locked by the processor executing the task in question.

16. The hierarchical distributed processing system according to claim 1, wherein the plurality of processors (11) process events independently so as to execute a plurality of corresponding software tasks in parallel, and the data consistency means comprises means for detecting collisions between the parallel tasks, and means for undoing and restarting a task for which a collision is detected.

17. The hierarchical distributed processing system according to claim 16, wherein each processor comprises means for marking the use of variables in the shared memory, and the collision detection means comprises means for detecting variable access collisions based on the marking.

18. The hierarchical distributed processing system according to claim 16, wherein the software in the shared memory (12) comprises a plurality of software blocks (B1 to Bn), each of the processors executes a software task comprising a software block in response to an event, each processor comprises means for marking the use of variables within the block, and the collision detection means comprises means for detecting variable access collisions based on the marking.

19. The hierarchical distributed processing system according to claim 1, wherein the higher-level processor node comprises parallel event queues (16), one queue for each processor set, and the mapping means (14) maps each external event to an event queue based on information contained in that event.

20. An event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over a plurality of levels of a system hierarchy, characterized in that at least one higher-level processor node (10) of the hierarchical processing system (1) comprises a plurality of shared memory processors (11) operating as a multiprocessor pipeline having a plurality of processor stages, wherein each event arriving at the multiprocessor pipeline is processed in slices as a chain of events executed in different processor stages of the pipeline, and the higher-level processor node (10) comprises means (15) for ensuring data consistency when global data in the shared memory (12) are manipulated by the processors.

21. The hierarchical distributed processing system according to claim 20, wherein the software in the shared memory (12) comprises a plurality of software blocks (B1 to Bn), each event is directed to one of the software blocks, and the multiprocessor pipeline is realized by means (17, 18) for allocating clusters (CL) of blocks to the processors using a load-balancing scheme, and means (17, 18) for mapping events to the processors in accordance with the allocation made by the allocating means.

22. The hierarchical distributed processing system according to claim 20, wherein the data consistency means (15) comprises means for locking, in the shared memory, global variables to be used by a task executed by a processor in response to an event, and means for releasing the locked global variables at the end of execution of the task.

23. The hierarchical distributed processing system according to claim 22, wherein the data consistency means (15) further comprises means for releasing the global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

24. The hierarchical distributed processing system according to claim 20, wherein the software in the shared memory (12) comprises a plurality of software blocks (B1 to Bn), each of the processors executes a software task comprising a software block in response to an event, and each processor comprises means, forming at least part of the data consistency means (15), for locking at least the global data of the software block before execution of the task, so that only the processor that has locked the block can access the global data in that block.

25. The hierarchical distributed processing system according to claim 24, wherein the locking means locks the entire software block before the execution of the corresponding task starts, and releases the locked block at the end of the task execution.

26. The hierarchical distributed processing system according to claim 24, wherein, in order to minimize deadlock situations, the locking means secures, before task execution starts, the blocks required by at least those software tasks that consume a large part of the processing time.

27. The hierarchical distributed processing system according to claim 24, wherein the higher-level processor node comprises means for detecting a deadlock situation, and means for releasing a block locked by one of the waiting processors and restarting the software task executed by that processor after an appropriate delay, so as to ensure that processing progresses.

28. The hierarchical distributed processing system according to claim 27, wherein the deadlock detection means comprises means for checking whether a variable required by the software task in question is locked by another processor, and means for checking whether that other processor is waiting for a variable locked by the processor executing the task in question.

29. The hierarchical distributed processing system according to claim 20, wherein the plurality of processors (11) process events independently so as to execute a plurality of corresponding software tasks in parallel, and the data consistency means (15) comprises means for detecting collisions between the parallel tasks, and means for undoing and restarting a task for which a collision is detected.

30. The hierarchical distributed processing system according to claim 29, wherein each processor comprises means for marking the use of variables in the shared memory, and the collision detection means comprises means for detecting variable access collisions based on the marking.

31. A processing method in an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over a plurality of levels of a system hierarchy, characterized by comprising the steps of: providing a plurality of shared memory processors (11) in at least one higher-level processor node (10) of the hierarchical processing system (1); dividing an external event flow to the processor node into a plurality of non-commuting categories (NCC) of events based on event-flow concurrency recognized in the system; mapping the NCCs to the processors such that each NCC of events is assigned to a predetermined set of processors for processing by that set; and ensuring data consistency when the processors manipulate global data in the shared memory (12), by restricting access to given global data to one processor at a time.

32. The processing method according to claim 31, wherein an NCC is a grouping of events in which the order of events must be preserved within the category, while there is no ordering requirement between the processing of events of different categories.

33. The processing method according to claim 31, further comprising operating at least one processor set as a multiprocessor pipeline having a plurality of processor stages, wherein each event of the non-commuting category assigned to that processor set is processed in slices as a chain of events executed in different processor stages of the pipeline.

34. The processing method according to claim 31, wherein the event generated by the processor set is supplied to the same processor.

35. The processing method according to claim 31, wherein the step of ensuring data consistency comprises locking, in the shared memory, global variables to be used by a software task executed in response to an event, and releasing the locked global variables at the end of execution of the task.

36. The processing method according to claim 35, wherein the step of ensuring data consistency further comprises releasing the global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

37. The processing method according to claim 31, wherein the software in the shared memory (12) comprises a plurality of software blocks, each of the processors executes a software task comprising a software block in response to an event, and the step of ensuring data consistency comprises locking at least the global data of a software block before it is executed by one of the processors, so that only that processor can access the global data in the block.

38. The processing method according to claim 37, wherein the entire software block is locked before the execution of the corresponding task starts, and the locked block is released when the task execution ends.

39. The processing method according to claim 37, wherein all blocks necessary for the software task are secured before starting the task execution in order to avoid a so-called deadlock state.

40. To detect a deadlock condition and to ensure processing progress, release a block locked by one of the waiting processors and re-execute the software task executed by that processor after an appropriate delay time. The processing method according to claim 37, wherein the processing is performed.

41. The processing method according to claim 31, wherein a plurality of corresponding software tasks are executed in parallel by the processors in response to events, and wherein, in the step of ensuring data consistency, an access collision is detected, and the task for which the collision is detected is rolled back to its original state and executed again.

42. The processing method according to claim 41, wherein each processor marks its use of variables in the shared memory, and a variable access collision is detected on the basis of the markings.
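The marking-based collision detection and rollback of claims 41 and 42 can be illustrated with a small single-threaded model. The classes `SharedMemory` and `Job`, the undo log, and the rule that any second user of a marked variable collides (reads included) are simplifying assumptions, not details taken from the claims.

```python
class Collision(Exception):
    """Raised when a job touches a variable marked by another job."""

class SharedMemory:
    def __init__(self, values):
        self.values = dict(values)
        self.marks = {}  # variable name -> id of the job that marked it

class Job:
    def __init__(self, job_id, memory):
        self.id, self.mem, self.undo_log = job_id, memory, []

    def _mark(self, name):
        # Mark the variable as used by this job; detect a foreign marker.
        owner = self.mem.marks.setdefault(name, self.id)
        if owner != self.id:
            raise Collision(name)

    def read(self, name):
        self._mark(name)
        return self.mem.values[name]

    def write(self, name, value):
        self._mark(name)
        self.undo_log.append((name, self.mem.values[name]))  # keep old value
        self.mem.values[name] = value

    def rollback(self):
        # Restore the original state in reverse order, then drop the marks.
        for name, old in reversed(self.undo_log):
            self.mem.values[name] = old
        self._clear()

    def commit(self):
        self._clear()

    def _clear(self):
        self.undo_log.clear()
        for name in [n for n, o in self.mem.marks.items() if o == self.id]:
            del self.mem.marks[name]

mem = SharedMemory({"x": 1, "y": 2})
j1, j2 = Job(1, mem), Job(2, mem)
j1.write("x", 10)
j2.write("y", 20)
try:
    j2.read("x")          # x is marked by job 1: collision
    collided = False
except Collision:
    collided = True
    j2.rollback()         # y is restored to 2
y_after_rollback = mem.values["y"]
j1.commit()
j2.write("y", 20)         # job 2 is executed again
x_seen = j2.read("x")
j2.commit()
```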

43. The processing method according to claim 31, further comprising the step of transferring application software for a single processor system to a plurality of shared memory processors and executing the application software.

44. A processing method in an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over a plurality of levels of a system hierarchy, the method comprising the steps of: providing a plurality of shared memory processors (11) in at least one upper-level processor node (10) of the hierarchical processing system (1); dividing an external event flow to the processor node into a plurality of non-switched event categories (NCC) based on an event flow concurrency recognized by the system; operating the plurality of shared memory processors (11) as a multiprocessor pipeline including a plurality of processor stages, and processing at least one of the events arriving at the multiprocessor pipeline in slices as an event chain executed in different processor stages of the pipeline; and ensuring data consistency when the processors manipulate global data in the shared memory (12), so that only one processor can access the global data at a time.

45. The processing method according to claim 44, wherein software in the shared memory (12) includes a plurality of software blocks (B1 to Bn) and each event is directed to one of the software blocks, and wherein, in the step of operating the processors as a pipeline, a cluster (CL) of blocks is assigned to each of the processors by a load balancing method, and events are mapped to the processors according to the assignment.
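The cluster assignment and event mapping of claim 45 can be sketched as below. The claim does not name a particular load balancing method; the greedy heaviest-first heuristic, the load figures, and the names `assign_clusters`, `block_to_cluster`, and `route` are assumptions chosen for illustration.

```python
import heapq

def assign_clusters(cluster_load, n_processors):
    """Greedily assign each cluster to the currently least-loaded processor."""
    heap = [(0, p) for p in range(n_processors)]  # (total load, processor)
    heapq.heapify(heap)
    assignment = {}
    # Heaviest clusters first; ties are broken by cluster name.
    for cluster, load in sorted(cluster_load.items(),
                                key=lambda kv: (-kv[1], kv[0])):
        total, proc = heapq.heappop(heap)
        assignment[cluster] = proc
        heapq.heappush(heap, (total + load, proc))
    return assignment

# Hypothetical measured loads per cluster and block-to-cluster membership.
cluster_load = {"CL1": 5, "CL2": 3, "CL3": 2, "CL4": 2}
block_to_cluster = {"B1": "CL1", "B2": "CL2", "B3": "CL3", "B4": "CL4"}
assignment = assign_clusters(cluster_load, n_processors=2)

def route(target_block):
    """Map an event to a processor via the block it is directed to."""
    return assignment[block_to_cluster[target_block]]
```

With these figures, processor 0 receives CL1 and CL4 (load 7) and processor 1 receives CL2 and CL3 (load 5), so an event directed to block B3 is routed to processor 1.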

46. The processing method according to claim 44, wherein, in the step of ensuring data consistency, a global variable used by a software task executed in response to an event is locked in the shared memory, and the locked global variable is released at the end of the task execution.

47. The processing method according to claim 46, wherein the step of ensuring data consistency further comprises releasing the locked global variables of one of two mutually locked tasks and re-executing that task after an appropriate delay.

48. The processing method according to claim 44, wherein software in the shared memory (12) includes a plurality of software blocks, and a software task involving the software blocks is executed by each of the processors in response to an event, and wherein the step of ensuring data consistency includes locking at least the global data of a software block before the block is executed by one of the processors, so that only that processor can access the global data in that block.

49. The processing method according to claim 48, wherein the entire software block is locked before the execution of the corresponding task starts, and the locked block is released when the task execution ends.

50. The processing method according to claim 48, wherein all blocks necessary for the software task are secured before starting the task execution in order to avoid a so-called deadlock state.

51. The processing method according to claim 48, wherein a deadlock condition is detected and, to ensure that processing progresses, a block locked by one of the waiting processors is released and the software task executed by that processor is re-executed after an appropriate delay.

52. The processing method according to claim 44, wherein a plurality of corresponding software tasks are executed in parallel by the processors in response to events, and wherein, in the step of ensuring data consistency, an access collision between the parallel tasks is detected, and the task for which the collision is detected is rolled back to its original state and re-executed.

53. The processing method according to claim 52, wherein each processor marks its use of variables in the shared memory, and a variable access collision is detected on the basis of the markings.

54. The processing method according to claim 44, further comprising the step of transferring application software for a single processor system to a plurality of shared memory processors and executing the application software.

55. An event-driven processing system comprising: a plurality of shared memory processors (11) for executing a plurality of jobs in parallel; a mapper (14) for mapping coexistence categories of external independent event signals to the plurality of processors (11) for concurrent execution of the corresponding jobs; a collision detector for detecting access collisions between parallel jobs when global data in the shared memory is manipulated by the processors; and means for rolling back and re-executing a job for which a collision is detected, in order to ensure data consistency.

56. The event driven processing system according to claim 55, further comprising means for supplying a job generated by the processor to the same processor.

57. An event-driven processing system comprising: a plurality of shared memory processors (11) which operate as a multiprocessor pipeline including a plurality of processor stages and execute a plurality of jobs in parallel, wherein each external event arriving at the multiprocessor pipeline is processed in slices as an event chain executed in different processor stages of the pipeline; a collision detector for detecting access collisions between parallel jobs when global data in the shared memory is manipulated by the processors; and means for rolling back and re-executing a job for which a collision is detected, in order to ensure data consistency.

58. A processing system comprising: a shared memory (12) including software organized as a plurality of software blocks; a plurality of shared memory processors (11) for executing in parallel a plurality of jobs, each associated with at least one of the software blocks; means (17, 18) for assigning a cluster (CL) of software blocks to each processor; means (17, 18) for distributing jobs to the processors for execution based on the assignment made by the assigning means; and means for ensuring data consistency when global data is manipulated by the multiple processors.

59. A communication system (100) including an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over multiple levels of a system hierarchy, wherein at least one upper-level processor node (10) of the hierarchical processing system comprises: means (14) for dividing an external event flow arriving at the processor node into a plurality of non-switched event categories, assigning each non-switched event category to a predetermined set of shared memory processors, and mapping external events to the processors for processing by that set of processors; and means (15) for ensuring data consistency when global data in the shared memory (12) is manipulated by the processors.

60. A communication system (100) including an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over multiple levels of a system hierarchy, wherein at least one upper-level processor node (10) of the hierarchical processing system comprises: a plurality of shared memory processors (11) operating as a multiprocessor pipeline including a plurality of processor stages, wherein each event arriving at the multiprocessor pipeline is processed in slices as an event chain executed in different processor stages of the pipeline; and means (15) for ensuring data consistency.