JP2010539588A

JP2010539588A - Memory sharing and data distribution

Info

Publication number: JP2010539588A
Application number: JP2010524879A
Authority: JP
Inventors: Jimmy J. Tomblin; Laurence W. Grodd; Robert A. Todd
Original assignee: Mentor Graphics Corporation
Priority date: 2007-09-11
Filing date: 2008-09-11
Publication date: 2010-12-16
Also published as: WO2009035690A2; WO2009035690A3; EP2210398A2

Abstract

Copies of a software tool computing process are instantiated on a master computing device and on remote computing devices that are separate from the master computing device. The primary instantiation of the computing process employs a data block resource manager that can store data in, and retrieve data from, the memory of the master computing device. Each remote instantiation of the computing process employs a remote data client, and a remote data server is instantiated on each remote computing device. When a remote instantiation of the computing process needs access to data, it instructs its associated remote data client to request the data. If the data is managed by another remote computing process, the remote data client requests the data from the remote data server associated with that remote computing process.

Description

(Related Applications)
This application claims the benefit of priority under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 60/971,264, entitled "Memory Sharing And Data Distribution", filed September 11, 2007, naming Laurence Grodd et al. as inventors, which application is incorporated herein by reference in its entirety. This application is also related to U.S. Patent Application No. 11/396,929, entitled "Distribution Of Parallel Operations", filed April 12, 2006, naming Laurence Grodd et al. as inventors, which application is likewise incorporated herein by reference in its entirety.

(Field of the Invention)
The present invention is directed to the use of shared and distributed memory within a parallel processing network. Various implementations of the invention may have particular application to the use of shared and distributed memory for storing design data for use by electronic design automation software tools operating within a parallel processing network.

Many software applications can be executed efficiently on a single-processor computer. Some software applications, however, involve so many operations that they cannot be executed sequentially on a single-processor computer in an economical amount of time. For example, a microdevice (e.g., integrated circuit) design process software application may require the execution of a hundred thousand or more operations on hundreds of thousands or even hundreds of millions of input data values. To run these types of software applications more quickly, computers were developed that employ multiple processors capable of running multiple processing threads simultaneously. While these computers can execute complex software applications more quickly than single-processor computers, they are very expensive to purchase and maintain. In a multiprocessor computer, the processors execute numerous operations simultaneously, so the computer must employ a specialized operating system to coordinate the concurrent execution of related operations. Further, because its multiple processors may seek access to resources such as memory at the same time, the bus structure and physical layout of a multiprocessor computer are inherently more complex than those of a single-processor computer.

In view of the difficulties and expense associated with large multiprocessor computers, networks of linked single-processor computers have become a popular alternative to using a single multiprocessor computer. The cost of conventional single-processor computers, such as personal computers, has dropped significantly in the last few years. Moreover, techniques for linking the operations of multiple single-processor computers into a network have become more sophisticated and reliable. Accordingly, multi-million-dollar multiprocessor computers are now typically being replaced with networks or "farms" of relatively simple, low-cost single-processor computers.

Shifting from a single multiprocessor computer to multiple networked single-processor computers is particularly useful where the data being processed has parallelism. With this type of data, one portion of the data is independent of another portion of the data. That is, manipulating a first portion of the data does not require knowledge of, or access to, a second portion of the data. Thus, one single-processor computer can execute an operation on the first portion of the data while another single-processor computer simultaneously executes the same operation on the second portion of the data. By using multiple computers to execute the same operation on different groups of data at the same time, i.e., "in parallel", large amounts of data can be processed quickly. This use of multiple single-processor computers is particularly beneficial for analyzing circuit design data with electronic design automation (EDA) tools. With this type of data, one portion of the design, such as a semiconductor gate in a first area of a microcircuit, may be completely independent of another portion of the design, such as a wiring line in a second area of the microcircuit. Thus, some electronic design automation operations, such as an operation defining a minimum width check of a structure, can be executed by one computer for the gate while another computer executes the same operation for the wiring line.
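The independence described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the shape names, the widths, the `MIN_WIDTH` threshold, and the `check_min_width` function are all hypothetical, and worker threads stand in for separate single-processor computers.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_WIDTH = 3  # hypothetical minimum allowed width, arbitrary units


def check_min_width(shapes):
    """Return the names of the shapes that are narrower than MIN_WIDTH."""
    return [name for name, width in shapes if width < MIN_WIDTH]


# Two independent portions of a design (hypothetical data):
# checking one portion needs no knowledge of the other.
gates = [("gate_a", 4), ("gate_b", 2)]
wires = [("wire_a", 3), ("wire_b", 1), ("wire_c", 5)]

# Each worker stands in for one single-processor computer running
# the same operation on a different portion of the data.
with ThreadPoolExecutor(max_workers=2) as pool:
    gate_violations, wire_violations = pool.map(check_min_width, [gates, wires])

print(gate_violations)
print(wire_violations)
```

Because neither call reads the other's input, the two checks can run in any order or at the same time without coordination.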

While parallel processing substantially reduces the time required to run electronic design automation software tools, many of these tools still require large amounts of system memory. For example, an electronic design automation software tool may start several instantiations of a computing process on a single "master" computing device so that the instantiations can execute operations in parallel with one another. Alternately or additionally, the tool may instantiate several copies of a computing process on multiple "remote" computing devices so that those instantiations can likewise execute operations in parallel. With conventional parallel processing systems, each instantiation requires its own copy of the data needed to perform its assigned operations. For example, if a parallel processing system is used to implement an electronic design automation software tool computing process for processing layout design data, the data for a single layer of an integrated circuit design may be duplicated several times in the memory of the master computing device for use by multiple instantiations of the computing process. Still further, the data may also be duplicated on multiple remote computing devices for use by the instantiations of the computing process on those remote computing devices.

Advantageously, various aspects of the invention relate to techniques that allow multiple instantiations of a software tool computing process (such as a computing process of an electronic design automation software tool) operating on a single master computing device to share data stored in the memory of that master computing device. Other aspects of the invention are directed to allowing multiple instantiations of a software tool computing process operating on multiple remote computing devices to employ data distributed among those remote computing devices.

According to various embodiments of the invention, multiple copies of a software tool computing process, such as an electronic design automation software tool computing process, are instantiated on a master computing device. Each instantiation of the computing process includes or otherwise employs a data block manager configured to request the data needed by the computing process. In addition, the primary instantiation of the computing process includes or otherwise employs a data block resource manager that can store data in, and retrieve data from, the memory of the master computing device. With various embodiments of the invention, the memory may be "fast" memory provided by integrated circuit memory devices operating adjacent to the processing unit of the master computing device, "slow" memory provided by magnetic or optical disks, or some combination thereof. When an instantiation of the computing process requires access to data, it instructs its associated data block manager to request the data from the data block resource manager. In response, the data block resource manager retrieves the data from the memory of the master computing device and provides it to the requesting data block manager for use by the computing process. Similarly, if a computing process needs to save data, it instructs its associated data block manager to provide the data to the data block resource manager. In response, the data block resource manager stores the data in the memory of the master computing device.
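The request and save paths described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the class and method names (`DataBlockResourceManager`, `DataBlockManager`, `store`, `fetch`, `request`, `save`) are hypothetical, a Python dictionary stands in for the memory of the master computing device, and the "fast"/"slow" memory distinction is not modeled.

```python
class DataBlockResourceManager:
    """Held by the primary instantiation; keeps the single stored copy."""

    def __init__(self):
        self._memory = {}  # stands in for the master device's memory

    def store(self, key, data):
        self._memory[key] = data

    def fetch(self, key):
        return self._memory[key]


class DataBlockManager:
    """One per instantiation; forwards requests to the resource manager."""

    def __init__(self, resource_manager):
        self._resource_manager = resource_manager

    def request(self, key):
        # Ask the resource manager to retrieve the data from memory.
        return self._resource_manager.fetch(key)

    def save(self, key, data):
        # Hand the data to the resource manager for storage.
        self._resource_manager.store(key, data)


resource_manager = DataBlockResourceManager()
primary = DataBlockManager(resource_manager)    # primary instantiation
secondary = DataBlockManager(resource_manager)  # another instantiation

primary.save("layer_1", b"layout data for layer 1")
shared = secondary.request("layer_1")
```

In this sketch both instantiations read the same stored object, so the layer data exists once in memory rather than once per instantiation.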

With still other examples of the invention, one or more versions of a software tool computing process are instantiated on a master computing device and on separate remote computing devices. Each remote instantiation of the computing process includes or otherwise employs a remote data client. In addition, a remote data server (or remote data block resource manager) is instantiated on each remote computing device. Like the data block resource manager, a remote data server can store data in, and retrieve data from, the memory of its remote computing device. As with the memory for the master computing device, the memory for a remote computing device may be "fast" memory provided by integrated circuit memory devices operating adjacent to the processing unit of the remote computing device, "slow" memory provided by magnetic or optical disks, or some combination thereof.

When a remote instantiation of the computing process requires access to data, it instructs its associated remote data client to request the data. If the data is managed by another remote computing process, the remote data client requests the data from the remote data server associated with that remote computing process. Alternately, if the data is managed by a computing process on the master computing device, the remote data client requests the data from the data block manager associated with that master computing process. As noted above, the data block manager may in turn obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, if a remote computing process needs to save data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory. With various examples of the invention, the data block managers, remote data clients, and remote data servers may be interconnected by an electronic communication network using a suitable network communication protocol, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the User Datagram Protocol (UDP).
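The routing decision made by a remote data client can be sketched in Python as below. The text does not say how a client learns which process manages a given piece of data, so the `owners` lookup table is purely an assumption, as are the names `Store`, `RemoteDataClient`, `request`, and `save`. The endpoints are plain in-process objects here; in a real system they would be reached over the network.

```python
class Store:
    """Minimal fetch/store endpoint backed by a dictionary.

    Stands in either for a remote data server or for the master-side
    data block manager (backed by the data block resource manager).
    """

    def __init__(self):
        self._memory = {}

    def store(self, key, data):
        self._memory[key] = data

    def fetch(self, key):
        return self._memory[key]


class RemoteDataClient:
    """Routes each request to whichever endpoint manages the data."""

    def __init__(self, endpoints, owners):
        self._endpoints = endpoints  # endpoint name -> fetch/store object
        self._owners = owners        # data key -> endpoint name (assumed table)

    def request(self, key):
        owner = self._owners[key]
        return self._endpoints[owner].fetch(key)

    def save(self, key, data, owner):
        self._owners[key] = owner
        self._endpoints[owner].store(key, data)


endpoints = {"master": Store(), "remote_b": Store()}
client_a = RemoteDataClient(endpoints, owners={})

client_a.save("layer_1", b"gates", owner="master")    # managed on the master
client_a.save("layer_2", b"wires", owner="remote_b")  # managed on a remote

from_master = client_a.request("layer_1")
from_remote = client_a.request("layer_2")
```

The client code is the same for both cases; only the table entry decides whether the request goes to the master or to another remote device.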

These and other features and aspects of the invention will be apparent upon consideration of the following detailed description.

FIG. 1 is a schematic diagram of a multiprocessor computer linked with a network of single-processor computers, as may be employed by various embodiments of the invention.
FIG. 2 is a schematic diagram of a processor unit for a computer that may be employed by various embodiments of the invention.
FIG. 3 is a schematic diagram of an offset map that maps the offsets identifying each 16 kB memory block making up a data set.
FIG. 4 is a schematic diagram illustrating an example of a parallel processing computing system that can share data among the various processes of a master computing device, according to various examples of the invention.
FIG. 5 is a table representing operations that may be performed by various embodiments of the invention.
FIG. 6 is a schematic diagram illustrating an example of a parallel processing computing system that can share data between different processes of a master computing device, according to various examples of the invention.
FIGS. 7 and 8 illustrate data formats for retrieving and sending blocks of data.

(Introduction)
Various embodiments of the invention relate to tools and methods for distributing segments of data among a plurality of computing threads for conversion from a first format to a second format. Accordingly, aspects of some embodiments of the invention have particular application to the distribution of data segments across a computing network that includes at least one multiprocessor master computer and a plurality of single-processor slave computers. To make these implementations easier to understand, an example of a network linking a multiprocessor master computer to a plurality of single-processor slave computers is discussed.

(Exemplary Operating Environment)
A software tool (such as an electronic design automation software tool) according to various examples of the invention may be implemented using computer-executable software instructions executed by one or more programmable computing devices. Because these examples of the invention may be implemented using software instructions, the components and operation of a generic programmable computer system on which various embodiments of the invention may be employed are described first. More particularly, the components and operation of a computer network having a host or master computer and one or more remote or slave computers are described with reference to FIG. 1. This operating environment is only one example of a suitable operating environment, however, and is not intended to suggest any limitation as to the scope of use or functionality of the invention.

In FIG. 1, the master computer 101 is a multiprocessor computer that includes a plurality of input and output devices 103 and a memory 105. The input and output devices 103 may include any device for receiving input data from, or providing output data to, a user. The input devices may include, for example, a keyboard, microphone, scanner, or pointing device for receiving input from a user. The output devices may then include a display monitor, speaker, printer, or tactile feedback device. These devices and their connections are well known in the art and thus will not be discussed at length here.

The memory 105 may similarly be implemented using any combination of computer-readable media that can be accessed by the master computer 101. The computer-readable media may include, for example, microcircuit memory devices such as random access memory (RAM), read-only memory (ROM), electrically erasable and programmable read-only memory (EEPROM), or flash memory microcircuit devices, as well as CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer-readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.

As will be discussed in detail below, the master computer 101 runs a software application for performing one or more operations according to various examples of the invention. Accordingly, the memory 105 stores software instructions 107A that, when executed, implement a software application for performing one or more operations. The memory 105 also stores data 107B to be used with the software application. In the illustrated embodiment, the data 107B contains process data that the software application uses to perform the operations, at least some of which may be performed in parallel.

The master computer 101 also includes a plurality of processor units 109 and an interface device 111. The processor units 109 may be any type of processor device that can be programmed to execute the software instructions 107A, but will conventionally be microprocessor devices. For example, one or more of the processor units 109 may be commercially available generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors, or Motorola 68K/Coldfire microprocessors. Alternately or additionally, one or more of the processor units 109 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations. The interface device 111, the processor units 109, the memory 105, and the input/output devices 103 are connected together by a bus 113.

With some implementations of the invention, the master computing device 101 may employ one or more processing units 109 having more than one processor core. Accordingly, FIG. 2 illustrates an example of a multi-core processor unit 109 that may be employed with various embodiments of the invention. As seen in this figure, the processor unit 109 includes a plurality of processor cores 201. Each processor core 201 includes a computing engine 203 and a memory cache 205. As known to those of ordinary skill in the art, a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions. These actions may include, for example, adding, subtracting, multiplying, and comparing numbers, performing logical operations such as AND, OR, NOR, and XOR, and retrieving data. Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.

Each processor core 201 is connected to an interconnect 207. The particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 201. With some processor units 201, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation, and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 201, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, California, the interconnect 207 may be implemented as a system request interface device. In either case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211. The input/output interface 209 provides a communication interface between the processor unit 201 and the bus 113. Similarly, the memory controller 211 controls the exchange of information between the processor unit 201 and the system memory 107. With some implementations of the invention, the processor unit 201 may include additional components, such as a high-level cache memory that is shared by, and accessible to, the processor cores 201.

While FIG. 2 shows one illustration of a processor unit 201 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only and is not intended to be limiting. For example, some embodiments of the invention may employ a master computer 101 with one or more Cell processors. The Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211. The Cell processor also has nine separate processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE). Each synergistic processor element has a vector-type computing engine 203 with 128 x 128-bit registers, four single-precision floating-point computational units, four integer computational units, and a 256 KB local store memory that stores both instructions and data. The power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than many conventional processors.

Returning now to FIG. 1, the interface device 111 allows the master computer 101 to communicate with the slave computers 115A, 115B, 115C ... 115x through a communication interface. The communication interface may be any suitable type of interface including, for example, a conventional wired network connection or an optically transmissive wired network connection. The communication interface may also be a wireless connection, such as a wireless optical connection, a radio frequency connection, an infrared connection, or even an acoustic connection. The interface device 111 translates data and control signals from the master computer 101 and each of the slave computers 115 into network messages according to one or more communication protocols, such as the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP), and the Internet Protocol (IP). These and other conventional communication protocols are well known in the art and thus will not be discussed here in more detail.
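As a rough illustration of turning data into network messages over a stream protocol such as TCP, the Python sketch below frames each payload with a length prefix. The message layout (a 4-byte big-endian length followed by the payload) is an assumption made for this example and is not the format described in the patent. A local `socket.socketpair()` stands in for a TCP connection between a master and a slave computer, and the helper names are hypothetical.

```python
import socket
import struct

# Assumed header: 4-byte big-endian payload length (not from the patent).
HEADER = struct.Struct("!I")


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, size):
    """Read exactly `size` bytes; a stream socket may return fewer per call."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


# A connected local socket pair stands in for a master-to-slave TCP link.
master_side, slave_side = socket.socketpair()
try:
    send_message(master_side, b"block 7 of layer_1")
    received = recv_message(slave_side)
finally:
    master_side.close()
    slave_side.close()
```

The length prefix is needed because a byte stream has no message boundaries of its own; a datagram protocol such as UDP preserves boundaries but would need its own handling of loss and ordering.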

Each slave computer 115 may include a memory 117, a processor unit 119, an interface device 121, and, optionally, one or more input/output devices 123 connected together by a system bus 125. As with the master computer 101, the optional input/output devices 123 for the slave computers 115 may include any conventional input or output devices, such as keyboards, pointing devices, microphones, display monitors, speakers, and printers. Similarly, the processor units 119 may be any type of conventional or custom-manufactured programmable processor device. For example, one or more of the processor units 119 may be commercially available generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors, or Motorola 68K/Coldfire® microprocessors. Alternately, one or more of the processor units 109 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations.

Still further, one or more of the processor units 119 may have more than one core, as described above with reference to FIG. 2. For example, with some implementations of the invention, one or more of the processor units 119 may be a Cell processor. The memory 117 may in turn be implemented using any combination of the computer-readable media discussed above. Like the interface device 111, the interface devices 121 allow the slave computers 115 to communicate with the master computer 101 over the communication interface.

In the illustrated example, the master computer 101 is a multi-processor unit computer with multiple processor units 109, while each slave computer 115 has a single processor unit 119. It should be noted, however, that alternate implementations of the invention may employ a master computer having a single processor unit 109. Further, one or more of the slave computers 115 may have multiple processor units 119, depending upon their intended use. Also, while only a single interface device 111 is illustrated for the master computer 101, with alternate embodiments of the invention the master computer 101 may use two or more different interface devices 111 to communicate with the slave computers 115 over multiple communication interfaces.

With various examples of the invention, the master computer 101 may be connected to one or more external data storage devices. These external data storage devices may be implemented using any combination of computer-readable media that can be accessed by the master computer 101. The computer-readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM), or flash memory microcircuit devices, as well as CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer-readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information. According to some implementations of the invention, one or more of the slave computers 115 may alternately or additionally be connected to one or more external data storage devices. Typically, these external data storage devices will include data storage devices that are also connected to the master computer 101, but they may also differ from any data storage devices accessible by the master computer 101.

(Operations)
As previously noted, various aspects of the invention may be implemented to support the execution of operations by a computing system with a multiprocessor architecture. Accordingly, different embodiments of the invention can be employed with a variety of different types of software applications. Some embodiments of the invention, however, may be particularly useful with electronic design automation software tools that perform operations for simulating, verifying, or modifying design data representing a microdevice, such as a microcircuit. Designing and fabricating microcircuit devices involves many steps during a "design flow" process. These steps are highly dependent on the type of microcircuit, its complexity, the design team, and the microcircuit fabricator or foundry. Several steps are common to all design flows. First, a design specification is modeled logically, typically in a hardware description language (HDL). Software and hardware "tools" then verify the design at various stages of the design flow by running software simulators and/or hardware emulators, and errors are corrected.

After the logical design is deemed satisfactory, it is converted into physical design data by synthesis software. The physical design data may represent, for example, the geometric patterns that will be written onto the masks used to fabricate the desired microcircuit device in a photolithographic process at a foundry. It is very important that the physical design information accurately embody the design specification and logical design for proper operation of the device. Further, because the physical design data is employed to create the masks used at a foundry, the data must conform to the foundry's requirements. Each foundry specifies its own physical design parameters for compliance with its processes, equipment, and techniques. Accordingly, the design flow may include a design rule check process. During this process, the physical layout of the circuit design is compared with design rules. In addition to rules specified by the foundry, the design rule check process may also check the physical layout of the circuit design against other design rules, such as those obtained from test chips or from general knowledge in the industry.

Once a designer has used a verification software application to verify that the physical layout of the circuit design complies with the design rules, the designer may then modify the physical layout to improve the resolution of the image that the layout will produce during a photolithography process. These resolution enhancement techniques (RET) may include, for example, modifying the physical layout using optical proximity correction (OPC) or by adding sub-resolution assist features (SRAF). Once the physical layout of the circuit design has been modified using resolution enhancement techniques, a design rule check may be performed on the modified layout, and the process is repeated until the desired degree of resolution is obtained. Examples of such simulation and verification tools are described in U.S. Pat. No. 6,230,299 to McSherry et al., issued May 8, 2001; U.S. Pat. No. 6,249,903 to McSherry et al., issued June 19, 2001; U.S. Pat. No. 6,339,836 to Eisenhofer et al., issued January 15, 2002; U.S. Pat. No. 6,397,372 to Bozkus et al., issued May 28, 2002; U.S. Pat. No. 6,415,421 to Anderson et al., issued July 2, 2002; and U.S. Pat. No. 6,425,113 to Anderson et al., issued July 23, 2002, each of which is incorporated herein by reference.

The design of a new integrated circuit may include the interconnection of millions of transistors, resistors, capacitors, or other electrical structures into logic circuits, memory circuits, programmable field arrays, and other circuit devices. To allow a computer to more easily create and analyze these large data structures (and to allow human users to better understand them), the design data is often hierarchically organized into smaller data structures, typically referred to as "cells." Thus, for a microprocessor or flash memory design, all of the transistors making up a memory circuit for storing a single bit may be grouped into a single "bit memory" cell. Rather than having to enumerate each transistor individually, the group of transistors making up a single-bit memory circuit can be collectively referred to and manipulated as a single unit. Similarly, the design data describing a larger 16-bit memory register circuit can be grouped into a single cell. This higher-level "register cell" might then include sixteen bit memory cells, together with the design data describing other miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the bit memory cells. Similarly, the design data describing a 128 kB memory array can then be concisely described as a combination of only 64,000 register cells, together with the design data describing its own miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the register cells.

By grouping microcircuit design data into hierarchical cells, large data structures can be processed more quickly and efficiently. For example, a circuit designer typically will analyze a design to ensure that each circuit feature described in the design complies with the design rules specified by the foundry that will manufacture microcircuits from the design. With the above example, instead of having to analyze each feature in the entire 128 kB memory array, a design rule check process can analyze the features in a single bit cell. The results of the check will then be applicable to all of the single bit cells. Once it has confirmed that one instance of the single bit cell complies with the design rules, the design rule check process can complete the analysis of a register cell simply by analyzing the features of its additional miscellaneous circuitry (which may itself be made up of one or more hierarchical cells). The results of this check will then be applicable to all of the register cells. Once it has confirmed that one instance of the register cell complies with the design rules, the design rule check software application can complete the analysis of the entire 128 kB memory array simply by analyzing the features of the additional miscellaneous circuitry in the memory array. Thus, the analysis of a large data structure can be compressed into the analyses of a relatively small number of cells making up the data structure.
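The saving from hierarchical analysis can be illustrated with a minimal Python sketch. The `Cell` class, the function names, and the per-cell shape counts below are hypothetical and chosen only for illustration; the sketch also assumes that a cell can be checked independently of its surroundings, which a production design rule checker cannot always assume.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    own_shapes: int                                # shapes drawn directly in this cell
    children: list = field(default_factory=list)   # (child cell, instance count) pairs


def flat_shape_count(cell):
    """Shapes a flat check would visit: every instance is expanded."""
    return cell.own_shapes + sum(n * flat_shape_count(c) for c, n in cell.children)


def hierarchical_shape_count(cell, seen=None):
    """Shapes visited when each distinct cell is analyzed only once."""
    seen = set() if seen is None else seen
    if cell.name in seen:
        return 0
    seen.add(cell.name)
    return cell.own_shapes + sum(
        hierarchical_shape_count(c, seen) for c, _ in cell.children
    )


# Hypothetical shape counts; the instance counts follow the memory array example.
bit = Cell("bit", own_shapes=6)
register = Cell("register", own_shapes=10, children=[(bit, 16)])
array = Cell("array", own_shapes=50, children=[(register, 64_000)])

flat_total = flat_shape_count(array)
hier_total = hierarchical_shape_count(array)
```

With these assumed counts, the flat traversal visits millions of shapes while the hierarchical traversal visits only the shapes of the three distinct cells.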

Typically, the physical design data for a microcircuit will include two different types of data: "drawn layer" design data and "derived layer" design data. The drawn layer data describes polygons drawn in the layers of material that will form the microcircuit. The drawn layer data will usually include polygons in metal layers, diffusion layers, and polysilicon layers. The derived layers will then include features made up of combinations of drawn layer data and other derived layer data. For example, with the transistor gate described above, the derived layer design data describing the gate will be derived from the intersection of a polygon in the polysilicon material layer and a polygon in the diffusion material layer.

For example, a design rule check software application will perform two types of operations: "check" operations that confirm whether design data values comply with specified parameters, and "derivation" operations that create derived layer data. Transistor gate design data thus may be created by the following derivation operation:

gate = diff AND poly
The results of this operation will identify all intersections of diffusion layer polygons with polysilicon layer polygons. Likewise, a p-type transistor gate, formed by doping the diffusion layer with n-type material, is identified by the following derivation operation:

pgate = nwell AND gate
The results of this operation then will identify all transistor gates (i.e., intersections of polysilicon layer polygons with diffusion layer polygons) where the polygons in the diffusion layer have been doped with n-type material.
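The two derivation operations can be sketched in Python under a simplifying assumption: every polygon is an axis-aligned rectangle given as `(x1, y1, x2, y2)`. The helper names `intersect` and `layer_and`, and the coordinates, are hypothetical; a real layout tool operates on general polygons and does not necessarily work this way.

```python
def intersect(a, b):
    """Return the overlap of two rectangles (x1, y1, x2, y2), or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def layer_and(layer_a, layer_b):
    """Boolean AND of two layers: every pairwise overlap between their shapes."""
    result = []
    for a in layer_a:
        for b in layer_b:
            overlap = intersect(a, b)
            if overlap is not None:
                result.append(overlap)
    return result


# Hypothetical drawn layer data.
diff = [(0, 0, 10, 4), (20, 0, 30, 4)]
poly = [(4, -2, 6, 6), (24, -2, 26, 6)]
nwell = [(-5, -5, 15, 10)]

gate = layer_and(diff, poly)     # gate = diff AND poly
pgate = layer_and(nwell, gate)   # pgate = nwell AND gate
```

Here `gate` holds two rectangles, one per transistor, and `pgate` keeps only the gate that lies inside the n-well.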

A check operation will then define a parameter or a parameter range for a design data value. For example, a user may want to ensure that no metal wiring line is within one micron of another wiring line. This type of analysis may be performed by the following check operation:

external metal < 1
The results of this operation will identify each polygon in the metal layer design data that is closer than one micron to another polygon in the metal layer design data.

While the above operation employs drawn layer data, check operations may likewise be performed on derived layer data. For example, if a user wants to confirm that no transistor gate is located within one micron of another gate, the design rule check process might include the following check operation:

external gate < 1
The results of this operation will identify all gate design data representing gates that are positioned less than one micron from another gate. It should be appreciated, however, that this check operation cannot be performed until a derivation operation identifying the gates from the drawn layer design data has been performed.
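A spacing check of this kind can be sketched as follows, again assuming axis-aligned rectangles `(x1, y1, x2, y2)` with coordinates in microns. The functions `spacing` and `external` are hypothetical helpers, not the tool's implementation; in this simplified sketch, touching or overlapping shapes have a spacing of zero and are therefore reported as well.

```python
import math


def spacing(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def external(layer, limit):
    """Report each pair of distinct shapes on a layer that is closer than limit."""
    violations = []
    for i in range(len(layer)):
        for j in range(i + 1, len(layer)):
            distance = spacing(layer[i], layer[j])
            if distance < limit:
                violations.append((layer[i], layer[j], distance))
    return violations


# Hypothetical metal wiring lines, in microns.
metal = [(0, 0, 2, 10), (2.5, 0, 4.5, 10), (6, 0, 8, 10)]

violations = external(metal, 1)   # external metal < 1
```

The same function could be applied to a derived layer such as the gate layer, provided the derivation that produces that layer has already been run.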

Table 1 describes an example of a circuit analysis process in pseudocode. A flow of this kind may be implemented using the Calibre family of electronic design tools, commercially available from Mentor Graphics Corporation of Wilsonville, Oregon. In particular, Table 1 lists the flow of an algorithm for inserting "dummy" geometric data (typically rectangles) that is added to a layer in a design to achieve a desired amount of structure density per unit area.

This flow thus specifies the set of inputs employed by the algorithm. In particular, it identifies, for the electronic design automation tool, the database from which the input data is read (i.e., in.gds), the particular layout design that is read into the tool in the GDSII data format (i.e., the circuit design named "GDSII"), and the primary cell in the design (i.e., the cell named "Topcell").

(Data organization)
As will be discussed in more detail below, various implementations of the invention transfer data to various computing processes running on a master computing device, on one or more remote computing devices, or on both. Accordingly, with various implementations of the invention, the software tool is configured to process data in blocks of a defined size. For example, with some implementations of the invention, the software tool may be configured to read, transfer, and save data in blocks of 16 kilobytes. More particularly, as discussed in more detail below, various implementations of the invention employ a data block resource manager to save data to, and retrieve data from, the memory of the master computing device. With various embodiments of the invention, the memory may be "fast" memory provided by integrated circuit memory devices operating in close proximity to the processing unit of the master computing device, "slow" memory provided by magnetic or optical disks, or some combination of the two.

The data block resource manager may, for example, create a single file on a disk, such as a magnetic disk of the kind employed in a conventional hard drive storage device. This file is divided into 16 kB blocks and can be extended by the data block resource manager as needed. As discussed in more detail below, the data block resource manager may use a DBLOCK ID to access a related set of data in the memory (as used herein, the term "access" encompasses both operations that retrieve data from memory and operations that save data to memory). A data set may include one or more actual 16 kB blocks of the file. In this case, the DBLOCK ID references an offset map, which maps the offsets identifying each 16 kB block of memory that makes up the data set, as illustrated in FIG. 3. As will be appreciated by those of ordinary skill in the art, no two DBLOCK IDs should reference the same file offset.

It should be appreciated, however, that a typical related data set may not be an integer multiple of 16 kB. For example, with some implementations of the invention, the software tool may be an electronic design automation software tool, such as the Calibre® family of software tools commercially available from Mentor Graphics Corporation of Wilsonville, Oregon. With these types of software tools, a related data set may be a sequence of geometric elements from a layout design for an integrated circuit. If the size of a related data set, such as a geometric sequence, exceeds 16 kB, all of the initial full 16 kB blocks of data may be migrated to a "slow" memory storage device, such as a magnetic disk, and referenced using a common DBLOCK ID as described above. The remainder, referred to as the "drabble" and always non-empty, can then be kept in "fast" or local memory.

With this arrangement, when the data block resource manager seeks to access data, it can do so using the associated DBLOCK ID. For example, if the data block resource manager needs to retrieve a data set, it first identifies the DBLOCK ID associated with that data set. After identifying the relevant DBLOCK ID, the data block resource manager begins reading the data 16 kB at a time, using the offsets associated with the DBLOCK ID. After all of the full data blocks have been read and processed, the data block resource manager reads and processes the remaining "drabble" (which may be stored in local memory). It may then identify the DBLOCK ID for the next data set to be accessed. In this way, various implementations of the invention can reduce the amount of local memory used to store a data set without, for example, wasting part of a 16 kB block of memory on a hard disk storage device.
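The arrangement of DBLOCK IDs, offset maps, and drabbles can be sketched in Python as follows. The class `BlockStore` and its method names are hypothetical, and an in-memory `io.BytesIO` buffer stands in for the single disk file. One interpretation is assumed and should be read as such: to keep the drabble non-empty, the final block of a data set stays in local memory even when it happens to be a full 16 kB.

```python
import io

BLOCK_SIZE = 16 * 1024  # 16 kB blocks


class BlockStore:
    """Hypothetical stand-in for a data block resource manager."""

    def __init__(self):
        self.backing = io.BytesIO()   # stands in for the single file on "slow" disk
        self.next_offset = 0          # the file grows as blocks are appended
        self.offset_maps = {}         # DBLOCK ID -> list of file offsets
        self.drabbles = {}            # DBLOCK ID -> remainder kept in "fast" memory
        self.next_id = 0

    def save(self, data):
        if not data:
            raise ValueError("a data set must contain at least one byte")
        dblock_id = self.next_id
        self.next_id += 1
        # Leading full blocks go to the file; the last block always stays local.
        full_blocks = (len(data) - 1) // BLOCK_SIZE
        offsets = []
        for i in range(full_blocks):
            self.backing.seek(self.next_offset)
            self.backing.write(data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            offsets.append(self.next_offset)
            self.next_offset += BLOCK_SIZE   # no two DBLOCK IDs share an offset
        self.offset_maps[dblock_id] = offsets
        self.drabbles[dblock_id] = data[full_blocks * BLOCK_SIZE:]
        return dblock_id

    def load(self, dblock_id):
        parts = []
        for offset in self.offset_maps[dblock_id]:   # read 16 kB at a time
            self.backing.seek(offset)
            parts.append(self.backing.read(BLOCK_SIZE))
        parts.append(self.drabbles[dblock_id])       # then the drabble
        return b"".join(parts)


store = BlockStore()
payload = bytes(range(256)) * 157          # 40,192 bytes of sample data
first_id = store.save(payload)
second_id = store.save(b"x" * BLOCK_SIZE)  # exactly one block: stays local as the drabble
restored = store.load(first_id)
```

In this sketch the first data set occupies two file blocks plus a 7,424-byte drabble, while the second data set never touches the file at all.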

(Shared data)
FIG. 4 illustrates an example of a parallel processing computing system that can share data among various processes on a master computing device, according to various examples of the invention. As seen in this figure, the parallel processing computing system 401 includes N+1 instantiations (0, 1, ... N) of a computing process 403. More particularly, the parallel processing computing system 401 includes a primary computing process 403' and N "pseudo" computing processes 403A-403x. With the illustrated implementation of the invention, the computing process 403 is an electronic design automation tool configured to operate on a hierarchical database (HDB) of integrated circuit design information. As also illustrated in this figure, each computing process 403 includes (or otherwise employs) a data block manager 405. Each data block manager 405 is configured to request or transmit data in response to instructions from its associated computing process 403.

The primary computing process 403' also includes a data block resource manager 407. The data block resource manager 407 can save data to, and retrieve data from, the memory components of the master computing device. In the illustrated example, the memory includes both "fast" or local memory 409, which may be provided by integrated circuit memory devices operating in close proximity to the processing unit of the master computing device, and "slow" memory 411, provided by a magnetic or optical disk storage device. It should be appreciated, however, that with various implementations of the invention the data block resource manager 407 may be able to retrieve data from, or save data to, any number of different storage sources. With these implementations, the blocks within a DBLOCK ID are allocated from the same source, and the source should support allocation of the specified block size (e.g., 16 kB in the illustrated example).

The data block resource manager 407 includes a data block resource server 413 for each instantiation of the pseudo computing processes 403A-403x. Thus, the data block resource manager 407 includes a data block resource server 413A associated with the computing process 403A, a data block resource server 413x associated with the computing process 403x, and so on. With various implementations of the invention, each data block resource server 413 is dedicated to its corresponding computing process 403. Also, with the illustrated implementation of the invention, the primary computing process 403' can interact directly with the data block resource manager 407, eliminating the need for a data block resource server 413 associated with the primary computing process 403'. With still other implementations of the invention, however, the primary computing process 403' may likewise interact with the data block resource manager 407 through an associated data block resource server 413'.

When an instantiation of a pseudo computing process 403 needs to retrieve data from the memory of the computing device, it instructs its associated data block manager 405 to request the data from the data block resource manager 407. In response, the data block manager 405 requests the specified data from its associated data block resource server 413. The data block resource server 413 then has the data block resource manager 407 obtain the requested data from the local memory 409, the disk memory 411, or both, as necessary. After the data block resource manager 407 has retrieved the requested data, the data block resource server 413 provides the retrieved data to its associated data block manager 405, so that the data can be employed by the pseudo computing process 403. The reverse process is used to save data generated by a computing process 403.
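The request path described above can be sketched with three small Python classes. The class and method names are hypothetical, dictionaries stand in for the local memory 409 and the disk memory 411, and all components live in one process here purely for illustration; in the described system the data block manager and its resource server belong to separate processes.

```python
class DataBlockResourceManager:
    """Stand-in for element 407: owns the fast and slow storage."""

    def __init__(self):
        self.local_memory = {}   # stands in for "fast" memory 409
        self.disk_memory = {}    # stands in for "slow" memory 411

    def read(self, key):
        # Look in fast memory first, then fall back to slow memory.
        if key in self.local_memory:
            return self.local_memory[key]
        return self.disk_memory[key]

    def write(self, key, data, fast=True):
        target = self.local_memory if fast else self.disk_memory
        target[key] = data


class DataBlockResourceServer:
    """Stand-in for element 413: one dedicated server per pseudo process."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.requests_served = 0

    def fetch(self, key):
        self.requests_served += 1
        return self.resource_manager.read(key)

    def store(self, key, data):
        self.requests_served += 1
        self.resource_manager.write(key, data)


class DataBlockManager:
    """Stand-in for element 405: the storage interface a process talks to."""

    def __init__(self, server):
        self.server = server

    def request(self, key):
        return self.server.fetch(key)

    def save(self, key, data):
        self.server.store(key, data)


resource_manager = DataBlockResourceManager()
resource_manager.write("L", b"geometry sequence", fast=False)

manager_a = DataBlockManager(DataBlockResourceServer(resource_manager))
manager_x = DataBlockManager(DataBlockResourceServer(resource_manager))

data = manager_a.request("L")           # read path
manager_x.save("M", b"derived layer")   # reverse (save) path
```

Because every pseudo process reaches storage only through its own server, data saved through one data block manager becomes visible to the others through the shared resource manager.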

FIG. 5 illustrates a table showing the operations that may be performed by various embodiments of the invention to exchange a sequence of geometric data (referred to as L in the table) between the primary computing process 403' (referred to as HDB0 in the table) and the pseudo computing process 403A (referred to as HDB1 in the table). With various implementations of the invention, a data block manager 405 may communicate with its associated data block resource server 413 through, for example, a UNIX® domain socket using conventional inter-process communication (IPC) techniques. Of course, still other communication techniques can be employed as appropriate for the underlying operating system, hardware configuration, and so on.
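A block exchange over a local socket can be sketched in Python as follows. The wire format (a 4-byte block index answered by one 16 kB block) and the function names are assumptions made for this sketch, not the protocol of FIG. 5. `socket.socketpair()` returns a connected pair of sockets, using the UNIX domain family by default where the platform provides it, and a thread stands in for the process hosting the data block resource server.

```python
import socket
import struct
import threading

BLOCK_SIZE = 16 * 1024


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def serve_blocks(sock, blocks):
    """Resource server side: answer block requests until the client disconnects."""
    with sock:
        while True:
            try:
                (index,) = struct.unpack("!I", recv_exact(sock, 4))
            except ConnectionError:
                return
            sock.sendall(blocks[index])


def request_block(sock, index):
    """Data block manager side: ask for one block and wait for all 16 kB."""
    sock.sendall(struct.pack("!I", index))
    return recv_exact(sock, BLOCK_SIZE)


blocks = {0: bytes([1]) * BLOCK_SIZE, 1: bytes([2]) * BLOCK_SIZE}
client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve_blocks, args=(server_sock, blocks))
server.start()

first = request_block(client_sock, 0)
second = request_block(client_sock, 1)

client_sock.close()   # the server loop ends when it sees the closed connection
server.join()
```

Reading in a loop until the full block has arrived matters because a stream socket is free to deliver a 16 kB payload in several pieces.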

(Distributed data)
FIG. 6 illustrates an example of a parallel processing computing system that can share data among different processes of a master computing device, according to various examples of the invention. As seen in this figure, the parallel processing computing system 601 includes a master computing device 603 (identified in the figure by the label "master node") and a plurality of remote computing devices 605A, 605B ... 605y (each identified in the figure by the label "remote node"). As described above, the master computing device 603 includes N+1 instantiations (0, 1, ... N) of the computing process 403. Each of these "master" computing processes 403 includes a data block manager 405, and the primary computing process 403' includes the data block resource manager 407.

Each remote computing device 605 includes one or more instantiations of a remote computing process 607 (labeled "HDB_RCS" in the figure). Each remote computing process 607 may be a copy of a corresponding computing process 403 on the master computing device 603, or it may be a sub-process of a corresponding computing process 403 on the master computing device 603. Each remote computing process 607 includes or otherwise employs a remote data client (not shown). Each remote computing device 605 also implements a remote data server 609 (also referred to as a remote data block resource manager). Like the data block resource manager 407, each remote data server 609 can save data to, and retrieve data from, the memory of its remote computing device 605. Again, the memory for a remote computing device 605 may be "fast" memory provided by integrated circuit memory devices operating in close proximity to the processing unit of the remote computing device, "slow" memory provided by magnetic or optical disks, or some combination of the two. The remote data clients, the remote data servers 609, and the data block managers 405 are interconnected in a network.

As noted above, each remote data client (not shown) is responsible for providing the ability to send and receive (read/write) data in 16K blocks over the network formed by the remote data clients, the remote data servers 609, and the data block managers 405. Each remote computing process 607 has a remote data client, and each remote data client obtains a unique identifier from the data block resource manager 407 when it is instantiated.

Each remote data server 609, in turn, is responsible for receiving 16K blocks from any remote data client and storing the data in the memory of its associated remote computing device 605. Each remote data server 609 is also responsible for retrieving data from the memory of its associated remote computing device 605 and sending that data to any remote data client. When a remote data server 609 is instantiated, the data block resource manager 407 is notified so that it can track the remote data server 609 and communicate with it. In various implementations of the invention, a remote data server 609 may manage a fixed amount of memory, such as 1 GB. Control information for controlling the remote data server 609 can be sent to, and read from, the remote data server 609 through a standard computing process interface, such as an interface using the Transmission Control Protocol (TCP).
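The storage role of a remote data server can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the implementation described in the specification: the class name `RemoteDataServerStore`, its methods, and the choice of a dictionary keyed by block index are all hypothetical. Only the 16K block size and the 1 GB example budget come from the text.

```python
BLOCK_SIZE = 16 * 1024      # 16K blocks, as stated in the text
CAPACITY_BYTES = 1 << 30    # 1 GB, the example fixed budget given in the text


class RemoteDataServerStore:
    """Hypothetical in-memory block store for a single remote data server."""

    def __init__(self, capacity_bytes=CAPACITY_BYTES):
        # A fixed memory budget translates into a fixed number of block slots.
        self.max_blocks = capacity_bytes // BLOCK_SIZE
        self.blocks = {}  # block index -> block contents (bytes)

    def write_block(self, index, data):
        """Store one block received from any remote data client."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("blocks must be exactly 16K")
        if index not in self.blocks and len(self.blocks) >= self.max_blocks:
            raise MemoryError("fixed memory budget exhausted")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        """Return a stored block so it can be sent to a remote data client."""
        return self.blocks[index]
```

With the 1 GB example budget, such a store would hold 65,536 blocks of 16K each. How a real server spills to "slow" memory when the budget is exhausted is not covered by this sketch.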

When a remote instantiation of the computing process 607 needs to access data, it instructs its associated remote data client to request the data. If the data is managed by another remote computing process 607, the remote data client requests the data from the remote data server 609 associated with that remote computing process 607. Alternatively, if the data is managed by a computing process on the master computing device, the remote data client requests the data from the data block manager associated with that master computing process. As described above, the data block manager may then obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, when a remote computing process needs to store data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory.
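The read path described above amounts to a routing decision made by the remote data client. The following Python sketch shows one way to express it. The `Owner` record, the lookup tables, and the function name `resolve_target` are hypothetical names introduced for illustration; the specification does not say how a client learns which process manages a given block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    """Hypothetical record naming the process that manages a block."""
    kind: str        # "remote" (remote computing process) or "master"
    process_id: int


def resolve_target(owner, server_for_remote, manager_for_master):
    """Return the component a remote data client should contact.

    server_for_remote maps a remote process id to its remote data server;
    manager_for_master maps a master process id to its data block manager.
    """
    if owner.kind == "remote":
        # Data managed by another remote process: ask its remote data server.
        return server_for_remote[owner.process_id]
    if owner.kind == "master":
        # Data managed on the master: ask that process's data block manager,
        # which may in turn go through the data block resource manager.
        return manager_for_master[owner.process_id]
    raise ValueError(f"unknown owner kind: {owner.kind!r}")
```

For example, `resolve_target(Owner("remote", 2), {2: "RDS-B"}, {0: "DBM-0"})` selects the entry registered for remote process 2.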

In various embodiments of the invention, the data block managers, remote data clients, and remote data servers may be interconnected by an electronic communication network using any suitable network communication protocol, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the User Datagram Protocol (UDP). In some implementations of the invention, the data block managers, remote data clients, and remote data servers may be interconnected using a reliable form of UDP. For example, some implementations of the invention may use the data formats shown in FIGS. 7 and 8 to reliably read and send blocks of data, respectively, over UDP. The elements of these blocks are discussed below.

a. Sequence number: 4 bytes. A block sequence number maintained by the remote data client (RDC) and incremented after every successful transaction. The remote data server (RDS) returns this number so that the RDC can confirm that the specified transaction succeeded.

b. Transaction ID: 4 bytes. Maintained by both the RDC and the RDS. An RDS-specific transaction ID, incremented after every successful transaction so that the RDS can detect stale transactions.

c. Client ID: 4 bytes. Unique to each RDC (assigned by the resource manager). Used by the RDS as a reference to the RDC's transaction ID information.

d. Timestamp: 8 bytes. Used by the client to estimate the round-trip time (RTT) and to avoid the retransmission RTT ambiguity problem.

e. Block index: The index of the block that the RDC wants from the RDS.

f. RDS IP address: 4 bytes. Used by the RDC to ensure that received data comes from the correct RDS, because the client's UDP socket will accept any data addressed to its IP address and port, from any source.

g. RDS port: 2 bytes. Same purpose as the RDS IP address.
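The fields listed above can be serialized into a fixed-size header. The Python sketch below is illustrative only and does not reproduce the layouts of FIGS. 7 and 8. It assumes network byte order, the field order a through g, a 4-byte unsigned block index (the text gives no size for that field), and a timestamp carried as an unsigned 64-bit integer in microseconds. The names `HEADER`, `pack_header`, and `unpack_header` are hypothetical.

```python
import socket
import struct

# "!" = network byte order, no padding.
# I: sequence number (4)   I: transaction ID (4)   I: client ID (4)
# Q: timestamp (8)         I: block index (4, assumed size)
# 4s: RDS IPv4 address (4) H: RDS port (2)
HEADER = struct.Struct("!IIIQI4sH")


def pack_header(seq, txn_id, client_id, timestamp_us, block_index,
                rds_ip, rds_port):
    """Serialize the header fields into bytes."""
    return HEADER.pack(seq, txn_id, client_id, timestamp_us, block_index,
                       socket.inet_aton(rds_ip), rds_port)


def unpack_header(raw):
    """Parse a header from the start of a received datagram."""
    seq, txn_id, client_id, ts, block_index, ip, port = HEADER.unpack(
        raw[:HEADER.size])
    return seq, txn_id, client_id, ts, block_index, socket.inet_ntoa(ip), port
```

Under these assumptions the header occupies 30 bytes, and a value packed with `pack_header` is recovered unchanged by `unpack_header`.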

The RDC uses the sequence number, the transaction ID, and the remote address to authenticate a transaction, but the RDC must also be prepared to retransmit if it does not get a response from the RDS. Retransmission may be based on an RTT estimate (as in TCP). The RDC will retry up to a maximum number of times, using exponential backoff up to the maximum time of its retransmission timer (as in TCP).
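The retransmission behavior can be sketched in Python as follows. The text says only that the scheme is similar to TCP, so the smoothing constants below (taken from the standard TCP retransmission timer calculation) and the names `RttEstimator` and `retransmit_timeouts` are assumptions made for illustration rather than values from the specification.

```python
class RttEstimator:
    """Smoothed RTT estimator in the style of TCP (constants are assumed)."""

    def __init__(self):
        self.srtt = None     # smoothed round-trip time, in seconds
        self.rttvar = None   # round-trip time variation, in seconds

    def update(self, sample):
        """Fold in one RTT sample derived from the echoed timestamp."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample

    def rto(self, default=1.0):
        """Initial retransmission timeout; `default` applies before any sample."""
        if self.srtt is None:
            return default
        return self.srtt + 4 * self.rttvar


def retransmit_timeouts(initial_rto, max_rto, max_attempts):
    """Timeout to wait after each attempt: doubled each time, capped at max_rto."""
    timeouts = []
    rto = initial_rto
    for _ in range(max_attempts):
        timeouts.append(min(rto, max_rto))
        rto *= 2
    return timeouts
```

A client following this sketch would send the request, wait for the first timeout, resend on expiry, and give up after `max_attempts` tries. Because each reply echoes the timestamp of the request it answers, the RTT sample can be computed without ambiguity even when the request was retransmitted.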

(Conclusion)
The present invention has been described above in terms of its preferred and exemplary embodiments. Numerous other embodiments, modifications, and variations within the scope and spirit of the appended claims will occur to those skilled in the art from a review of this disclosure.

Claims

A method of distributing data within a parallel processing system, comprising:
receiving, at a data block resource manager, a request for data from a first data block manager, the first data block manager being associated with a first process executing on a first computing node in the parallel processing system;
determining that the requested data is stored in a memory location assigned to a second process executing on a second computing node in the parallel processing system;
requesting the data from a second data block manager associated with the second process;
receiving the data from the second data block manager; and
providing the received data to the first data block manager.

The method of claim 1, wherein the first computing node and the second computing node are hosted on the same computing device.

The method of claim …, wherein the first computing node is hosted by a first computing device, and the second computing node is hosted by a second computing device that is separate from the first computing device.

A method of distributing data within a parallel processing system, comprising:
receiving, at a remote data server, a request for data from a first remote data client, the first remote data client being associated with a process executing on a first computing node in the parallel processing system; and
providing the received data to the first remote data client.

A parallel processing system using any combination of the aforementioned components.