JPS59501131A

JPS59501131A - Concurrent processing elements for using dependency free code

Info

Publication number: JPS59501131A
Application number: JP50228083A
Authority: JP
Inventors: デサンテイス・アルフレツド・ジヨン; シビンガ−・ジヨゼフ・シ−グフリ−ド
Original assignee: バロ−ス・コ−ポレ−ション
Priority date: 1982-06-08
Filing date: 1983-06-08
Publication date: 1984-06-28
Also published as: JPS6313216B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

The following U.S. patent applications are directly or indirectly related to this application:

Application Serial No. 386,339, filed June 8, 1982 by Alfred J. De Santis et al., entitled "Mechanism for Generating Dependency Free Code for Multiple Processing Elements".

Application Serial No. 386,420, filed June 8, 1982 by Alfred J. De Santis, entitled "System and Method of Renaming Data Items for Dependency Free Code".

This invention relates to a mechanism for generating dependency free code and, more particularly, to such a mechanism for use with a plurality of concurrent processing elements.

Description of the Prior Art

Most computers today are still of the von Neumann type, driven by or executing imperative languages that are sequential in nature. Furthermore, such sequential languages contain many dependencies between the individual instructions, so that individual instructions cannot be executed out of order. For example, consider the following sequence.

    C := Fn(A, B)
    D := Fn+1(C, E)

The two functions Fn and Fn+1 are logically dependent, since the result of function Fn is used as an input to the next function Fn+1.

A further disadvantage of sequential languages is that, when sequences or loops are repeated, there is redundancy in memory fetches and code processing which, if removed, would increase the throughput of the processor.
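The logical dependency between C := Fn(A, B) and D := Fn+1(C, E) can be checked mechanically. The following is a minimal Python sketch and not part of the patent text; the `Statement` record and the `depends_on` helper are hypothetical names introduced only for illustration. A statement is treated as dependent on an earlier one when it reads the name that the earlier one writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    target: str      # name written by the statement
    sources: tuple   # names read by the statement


def depends_on(later: Statement, earlier: Statement) -> bool:
    """Return True when `later` reads the value that `earlier` produces."""
    return earlier.target in later.sources


s1 = Statement("C", ("A", "B"))   # C := Fn(A, B)
s2 = Statement("D", ("C", "E"))   # D := Fn+1(C, E)

print(depends_on(s2, s1))  # the second statement needs C from the first
print(depends_on(s1, s2))  # the first statement needs nothing from the second
```

Because the second statement reads C, the two statements cannot be reordered or run at the same time without some way of waiting for C.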

One way to increase the throughput of a processing system is to employ a plurality of processors in a multiprocessing mode. However, the individual processors must still execute instructions sequentially, and the only concurrency that exists is when the respective processors are executing different segments of a program or entirely different programs.

Such multiprocessing systems are disclosed, for example, in Mott et al. U.S. Patent No. 3,319,226 and Anderson et al. U.S. Patent No. 3,419,849.

Still another attempt at increasing throughput is to employ pipelining, in which the various subfunctions of instruction execution are overlapped. By overlapping these steps across a series of instructions, one instruction execution can be completed each clock time, thereby increasing the throughput of the processor.

All of these methods for increasing throughput are designed for sequential instruction execution, because of the logical dependencies between instructions described above. Because of those logical dependencies, true concurrent processing cannot be achieved, in which the various instructions would be executed independently of one another so as to readily accommodate processing by a cluster or multiple of processing elements.

Applicative languages differ from imperative languages in that the respective statements are by nature independent of one another and can therefore be implemented concurrently by a network of processing elements designed to reduce such applicative statements. Examples of such applicative language processors are given in Bolton et al. U.S. Patent Application Serial No. 281,064 and Hagenmaier et al. U.S. Patent Application Serial No. 281,065. Both applications were filed on July 7, 1981 and assigned to the assignee of the present application. Such applicative languages differ from imperative languages in that they are, by design, not sequential in the von Neumann sense. However, most program libraries in use today are written in imperative languages, and any updated or further generation of data processing systems that is to use those libraries must be adapted to execute imperative languages.

One manner in which throughput can be increased is to recognize segments of the object code that do not depend on the results of previous operations and to form those segments into independent sequences, or queues, which can then be processed concurrently by a plurality of processing elements. This of course requires that operands be handled in such a manner that an operation can be performed on an operand without destroying its original value as it exists in memory. Different symbolic names can be assigned to reference a given data item for this purpose.
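As a rough illustration of forming independent queues, the Python sketch below groups statements by data flow. It is not the patented circuit: the `form_queues` function, the `(target, sources)` statement shape, and the sample names are all assumptions made for this example, and the sketch assumes each statement reads from at most one earlier queue.

```python
def form_queues(statements):
    """Group (target, sources) statements so that no data flows between queues.

    A statement joins the queue that wrote one of its sources; if none of its
    sources was written earlier, it starts a new queue.
    """
    queues = []   # list of queues, each a list of statements
    writer = {}   # name -> index of the queue that last wrote it
    for target, sources in statements:
        home = next((writer[s] for s in sources if s in writer), None)
        if home is None:
            queues.append([])
            home = len(queues) - 1
        queues[home].append((target, sources))
        writer[target] = home
    return queues


program = [
    ("t1", ("I",)),        # reads only an input
    ("t2", ("t1", "J")),   # depends on t1
    ("u1", ("K",)),        # independent of t1 and t2
    ("u2", ("u1",)),       # depends on u1
]
queues = form_queues(program)
for number, queue in enumerate(queues):
    print(number, [target for target, _ in queue])
```

The two resulting queues share no intermediate values, so they could be handed to separate processing elements.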

Arranging the code or symbols in such queues further accommodates concurrent processing by the processing units.

It is an object of this invention to provide an improved mechanism for generating dependency free instruction code.

Another object of this invention is to provide dependency free instruction code for execution by multiple processing elements. Still another object of this invention is to provide an improved mechanism for supplying dependency free instruction code to a plurality of processing elements in a concurrent manner.

Still a further object of this invention is to provide a mechanism for generating instruction code that is free of redundant memory fetches and that need not be reprocessed for successive passes through a sequence of such code.

Summary of the Invention

To achieve the above objects, this invention is directed to a cache mechanism for a data processor that receives strings of object code, forms them into higher level tasks, and determines sequences of such tasks that are logically independent so that they can be executed separately. The cache mechanism makes all of the memory accesses required by the various tasks and stores those tasks together with the corresponding pointers or references to the local memory in which the various data items are stored. The cache mechanism employs a symbol translation table in which the tasks are stored in queue fashion, along with symbols representing the various references or pointers to local memory. In this manner, various data items can be assigned different symbols or symbolic names for use with different tasks, thus limiting the dependencies between the various tasks and controlling data changes.

A feature of this invention is the provision of a cache mechanism for a cluster of processing elements, which cache mechanism forms strings of sequential object code into queues of tasks, each queue being logically independent of the others.

Brief Description of the Drawings

The above and other objects, advantages and features of this invention will become more readily apparent from the following detailed description taken in conjunction with the drawings.

FIG. 1 shows a string of object code of the type for which this invention is designed, together with the corresponding logically independent queues formed from that object code.

FIG. 2 is a schematic block diagram of a system according to this invention.

FIG. 3 shows the format of a queue formed according to this invention.

FIG. 4 is a schematic block diagram of the symbol translation table module employed in this invention.

FIG. 5 is a schematic block diagram of a processing element employed in this invention.

FIG. 6 is a timing diagram illustrating this invention.

General Description of the Invention

To achieve the above objects, advantages and features, this invention has three different aspects: improved code processing, reference processing, and parallel execution by multiple processing elements. In code processing, the invention first preprocesses the instruction string by concatenation, examines the relationships between successive concatenated instructions, and links those instructions together to form queues of dependent instructions. The criterion used to determine whether concatenated instructions are to be linked together is whether a following concatenated instruction depends on the output of the one before it. Once an independency is located, a queue is formed. Once the queue is formed, the mechanism benefits by processing that entire queue in one step. What would ordinarily require several cycles to reprocess the concatenated instructions is now done in one cycle, and the queues need not be regenerated for each successive execution of the sequence.

Furthermore, during the processing of the code, operand references that have been previously referenced and are local to the processing elements can be recognized. This is accomplished by receiving each reference and scanning a translation table to see whether that item is resident in the processor's local memory. If the reference is not resident in the processor's local memory, the invention assigns a symbol to that reference, and the respective symbols corresponding to a given queue are attached to it for subsequent transmission to one of the processing elements.
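The table scan described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `TranslationTable` class, the `S0`, `S1` symbol naming, and the sample addresses are hypothetical, and a list of fetched addresses stands in for real main memory accesses.

```python
class TranslationTable:
    """Maps absolute memory references to symbols (hypothetical illustration)."""

    def __init__(self):
        self._symbols = {}   # absolute address -> symbol
        self.fetches = []    # addresses that needed a main memory access

    def symbol_for(self, address):
        """Return the symbol for `address`, fetching only on the first reference."""
        if address not in self._symbols:
            self._symbols[address] = f"S{len(self._symbols)}"
            self.fetches.append(address)   # stands in for the memory fetch
        return self._symbols[address]


table = TranslationTable()
queue_symbols = [table.symbol_for(a) for a in (0x100, 0x104, 0x100)]
print(queue_symbols)   # the repeated reference reuses its symbol
print(table.fetches)   # only two distinct addresses were fetched
```

The third reference repeats the first address, so it is recognized as already local and causes no further fetch.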

Once the corresponding queues are formed, they can be executed concurrently by a plurality of processing elements.

In the design of present-day processing systems, stack-oriented processors are employed in which push-down stacks, or first-in last-out stacks, are provided to accommodate the recursive procedures and nested processes used by particular higher level program languages. When such a stack-oriented processor is provided, the master control program and other routines that form part of the operating system can be written in a particular higher level language that is recursive in nature, such as ALGOL 60. A particular processor module of this type is disclosed in Barton et al. Patent Nos. 3,461,423, 3,546,677 and 3,548,384.

The function of the stack mechanism, a first-in last-out mechanism, is to handle instructions and associated parameters in a manner that reflects the nested structure of the particular higher level languages. Such stacks conceptually reside in main memory, and the stack mechanism of the processor is adapted to contain a reference to the top data item in the stack. In this manner, a number of different stacks of data items can reside in memory, with the processor accessing them according to the top-of-stack address held in a stack register within the processor, and the different stacks can be accessed at different times by changing the contents of that register.
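The idea of several stacks living in one memory and being selected through a single top-of-stack register can be sketched as follows. This Python fragment is an illustrative model only; the `StackMachine` class, its method names, and the chosen base addresses are assumptions, not details taken from the patent or from the cited processor designs.

```python
class StackMachine:
    """One memory, one top-of-stack register, many stacks (illustrative model)."""

    def __init__(self, memory_size):
        self.memory = [None] * memory_size
        self.top = -1   # top-of-stack register: index of the current top item

    def select(self, top):
        """Switch stacks by loading the register with another stack's top."""
        self.top = top

    def push(self, value):
        self.top += 1
        self.memory[self.top] = value

    def pop(self):
        value = self.memory[self.top]
        self.top -= 1
        return value


machine = StackMachine(16)
machine.select(-1)            # stack A is empty and begins at address 0
machine.push("a1")
machine.push("a2")
top_a = machine.top           # remember where stack A ends
machine.select(7)             # stack B is empty and begins at address 8
machine.push("b1")
top_b = machine.top
machine.select(top_a)         # back to stack A by reloading the register
popped_a = machine.pop()
machine.select(top_b)
popped_b = machine.pop()
print(popped_a, popped_b)
```

Switching stacks costs only a register load; the items themselves stay where they are in memory.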

If a processor is not provided with such a stack mechanism, it can nevertheless execute recursive-type languages by addressing its general purpose registers as though they were a hardware stack mechanism.

While the preferred embodiment of this invention is directed to such a stack-oriented processor for executing programs written in a high level recursive language, the concepts of this invention can also be employed in other types of processors designed to execute forms of higher level language programs other than recursive ones.

Once a program has been written in this higher level language, it is compiled by the processor's compiler into strings of object code or machine language code, the form of which is designed for, and controlled by, the particular processor design. As noted above, most processors designed today are still of the von Neumann type, which is inherently sequential and contains many logical dependencies.

To illustrate generally how this invention provides dependency free code in the form of "decompiled" higher level language code, reference is now made to FIG. 1.

The left column of FIG. 1 represents a string of machine language code for the calculation C[I,J] := A[I,J] + B[I,J]. Since this calculation is performed for many addresses, the string of machine language code shown at the left of FIG. 1 is executed in a series of sequences or loops.

This code string can be divided into four groups or subsets of code, each of which is largely logically independent of the others, as indicated by the diagram in the central portion of FIG. 1. In general, the mechanism of this invention determines the end of a logically dependent string when the next operation is independent of the previous operation or of a store operation.

In this invention, the mechanism executes the value calls or memory fetches and forms queues of operators and data items (or local addresses of data items), as shown in the right column of FIG. 1. These operators and their data items are concatenated together and can be transmitted to a processing element in the manner described below. Such concatenated instructions are hereinafter referred to as tasks.

In the example of FIG. 1, the four separate queues are logically independent groups of dependent concatenated instructions and can be executed concurrently by separate processing elements, as described below.

Since the string of code in the left column of FIG. 1 is to be executed in a sequence of loops, the newly generated queues in the right column of FIG. 1 need not be regenerated. All that is required for each successive loop is that new values and array items be fetched from memory. In addition, new pointer values must be assigned to the variables that are to be stored.
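The reuse of a queue across loops can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the `QUEUE` layout, the operator names, and the `run` helper are hypothetical, and a dictionary stands in for a processing element's local memory. The queue is built once; each loop only rebinds the data behind its symbols.

```python
import operator

OPS = {"ADD": operator.add, "MUL": operator.mul}

# Built once on the first pass: (operator, left symbol, right symbol, result symbol).
QUEUE = [("ADD", "a", "b", "c")]


def run(queue, local_memory):
    """Execute a previously formed queue against the current local memory."""
    for op, left, right, result in queue:
        local_memory[result] = OPS[op](local_memory[left], local_memory[right])
    return local_memory


A = [1, 2, 3]
B = [10, 20, 30]
C = []
for j in range(len(A)):
    # Only the values bound to the symbols change from loop to loop;
    # QUEUE itself is never rebuilt.
    memory = run(QUEUE, {"a": A[j], "b": B[j]})
    C.append(memory["c"])
print(C)
```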

Detailed Description of the Invention

A processor system employing this invention is shown in FIG. 2, in which cache mechanism 10 is the mechanism for supplying the respective queues of operators and data references to a plurality of small processing elements 11a, 11b and 11c, as well as to unique processing element 13a, each of which is provided with its own local memory 12a, 12b and 12c and local memory 13b, respectively. Cache mechanism 10 communicates directly with main memory (not shown), and the respective processing elements also communicate with main memory by way of direct storage module 14.

Mechanism 10 is formed of four units: queue task module 10a, instruction reference module 10b, symbol translation module 10c, and job queue 10d. The functions of these respective units will now be described generally. The respective strings of object code or machine language code are received from memory by queue task module 10a, which is a buffer or cache memory that receives the instructions serially and assembles them into queues of tasks, the lengths of which depend on the logical dependencies between successive concatenated instructions. Queue task module 10a contains sufficient decoding circuitry to determine when a concatenated group of instructions does not require the result of a previous calculation. When such a queue of concatenated tasks has been assembled, its operand references are transferred to instruction reference module 10b, which performs any memory fetches required by the respective instructions and assigns symbols.

Queue task module 10a also assigns a queue number to symbol translation module 10c.

Instruction reference module 10b is an associative memory that determines whether an absolute memory address is logically held. If it is not, instruction reference module 10b makes the memory access by sending that address to main memory, storing the address, and assigning a symbol to it. This associative memory then transfers the symbol, along with the corresponding tasks, to symbol translation module 10c. Symbol translation module 10c assigns a pointer (a local memory address) to the symbol and transfers that pointer to main memory, so that main memory can store the data item in local memory. During the first execution of the object code string, the queues for successive instructions are being formed in symbol translation module 10c. While these queues are being formed, the respective tasks and pointers are transferred to job queue 10d.

Symbol translation module 10c is a table look-up memory having various queue locations that can be referenced by queue task module 10a. These locations contain lists of the concatenated instructions and of the symbols of the items held in the local memories of the processing elements. As each queue is read, the symbols for the queue are used as read addresses into a look-up table containing pointers to the actual locations of the items referenced by the symbols, as described in more detail below. At the end of the first processing of the object code string of FIG. 1, job queue 10d contains the respective created queues, which can be passed serially, by tasks and pointers, to the respective processing elements 11a, 11b and 11c for concurrent execution. Meanwhile, the respective data items required for execution have been fetched from main memory and stored at the appropriate locations in local memories 12a, 12b and 12c, which locations are accessed by the pointers in job queue 10d.

Upon completion of the first loop or execution of the object code, successive loops can be executed by supplying the previously created queues from symbol translation module 10c to job queue 10d until all of the task processing has been completed.

The format of a queue as it resides in job queue 10d of FIG. 2 is shown in FIG. 3. The fields, read from left to right, are a multiply instruction, an add instruction, a subtract instruction, and an index instruction, followed by pointers for the I, J and C fields. These correspond to the first queue (Q0) in FIG. 1, in which an 8-bit literal has become part of each of the multiply and add instructions.

The queues thus formed not only hold the instructions for future execution but also identify the stack environment and its address, as well as the location of the next queue to be executed. No processing steps are required for code processing other than presenting one queue per step to an available processing element.
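The queue format of FIG. 3 can be modeled as a simple record. The Python sketch below is only a reading aid: the `JobQueueEntry` class, its field names, the task mnemonics, and every numeric value are assumptions made for illustration, since the text gives the order of the fields but not their encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class JobQueueEntry:
    tasks: Tuple[str, ...]       # multiply, add, subtract, index, in that order
    pointers: Tuple[int, ...]    # local memory pointers for the I, J and C fields
    stack_environment: int       # identifies the stack environment (value assumed)
    next_queue: Optional[int]    # location of the next queue to execute, if any


q0 = JobQueueEntry(
    tasks=("MULT", "ADD", "SUB", "INDEX"),
    pointers=(0, 1, 2),          # illustrative addresses only
    stack_environment=0,
    next_queue=1,
)
print(q0.tasks, q0.pointers)
```

Because the entry carries its own pointers and the location of the next queue, a processing element needs nothing beyond the entry itself to begin execution.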

Symbol translation module 10c of FIG. 2 is shown in more detail in FIG. 4. As shown there, this module is a table look-up mechanism in which the columns of queue symbol table 16 represent the locations for the concatenated tasks and for the symbolic names assigned by instruction reference module 10b of FIG. 2, and the rows represent the respective queue numbers assigned by queue task module 10a of FIG. 2. As described above, the queues thus formed in the symbol translation module are ready, for each successive loop of the calculation to be performed, to access the pointers in pointer table 17 for transfer to job queue 10d of FIG. 2.
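The two-table look-up can be sketched in Python as follows. This is a hedged model, not the hardware: the dictionary layouts, the `to_job` helper, the symbol names, and the addresses are assumptions chosen for illustration. One table holds tasks and symbols per queue number, the other maps each symbol to its current local memory pointer.

```python
# Queue symbol table (16): one row per queue number, holding the
# concatenated tasks and the symbolic names they refer to.
queue_symbol_table = {
    0: {"tasks": ["MULT", "ADD"], "symbols": ["S0", "S1"]},
}

# Pointer table (17): symbol -> local memory address.
pointer_table = {"S0": 3, "S1": 5}


def to_job(queue_number):
    """Build a job queue entry for one queue by resolving symbols to pointers."""
    row = queue_symbol_table[queue_number]
    return {
        "tasks": list(row["tasks"]),
        "pointers": [pointer_table[symbol] for symbol in row["symbols"]],
    }


first_loop = to_job(0)
pointer_table["S0"] = 6   # a later loop binds S0 to a different location
second_loop = to_job(0)
print(first_loop["pointers"], second_loop["pointers"])
```

The stored queue row is untouched between loops; only the pointer table entry changes, which is what lets a queue be reissued without being rebuilt.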

Note that in FIG. 4 the various symbols are indirect local memory references, so that the items stored there can be given different pointers. This provides two advantages. First, a given data item can be stored at more than one location in local memory by renaming it, that is, by assigning different pointers to represent it. Second, a given variable can be stored at one location and left there without changing its pointer, while the result of an operation performed on that variable is stored at another location having the same symbolic name but a different pointer.
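The second advantage, keeping the old value intact while the result goes to a new location under the same symbolic name, can be illustrated with a few lines of Python. The `store_renamed` helper and the dictionaries standing in for local memory and the pointer table are hypothetical; they show the idea of renaming, not the actual mechanism.

```python
local_memory = {0: 5}    # pointer -> value; location 0 holds the current J
pointers = {"J": 0}      # symbolic name -> current pointer


def store_renamed(symbol, value, new_pointer):
    """Store a new value for `symbol` at a fresh location and keep the old one."""
    old_pointer = pointers[symbol]
    local_memory[new_pointer] = value
    pointers[symbol] = new_pointer
    return old_pointer


# J := J + 1, written to a new location instead of overwriting location 0.
old = store_renamed("J", local_memory[pointers["J"]] + 1, new_pointer=1)
print(local_memory[old], local_memory[pointers["J"]])
```

A task still holding the old pointer continues to see the original value, so the update does not create a dependency between the two tasks.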

The respective processing elements of FIG. 2 are shown in FIG. 5. In brief, they are formed of a plurality of microprogrammed microprocessors, which may be commercially available ones such as the Intel 8086, or customized microprogrammed processors such as that disclosed in Faber et al. U.S. Patent No. 3,983,539. Since the respective processors are adapted to perform different functions, they may be special purpose microprocessors containing only the logic circuitry required to perform their respective functions.

The respective circuits 18 are an arithmetic logic unit, a shift unit, a multiply unit, an indexing unit, a string processor, and a decode unit. In addition, sequencing unit 19 receives instructions from job queue 10d of FIG. 2 in order to access microinstructions stored in control store 20. Microinstructions from the control store are supplied to the respective units over instruction bus IB, and any condition signals generated by the units are transmitted over condition bus CB. Data from the corresponding local memory is received on A bus AB, and the results of execution are supplied to B bus BB.

Returning to FIG. 1, a more detailed description will now be given of the various instructions in the code string received by queue task module 10a of FIG. 2 and of the higher level instructions or tasks formed by that module.

As shown in the left column of FIG. 1, the first items of the code string are a fetch of the data item I, an 8-bit value, and a multiply instruction. These are concatenated into the task of multiplying I by the literal value, as indicated by the first task in the right column of FIG. 1. The process continues for the add task and the subtract task. The name call instruction places an address on top of the stack, and the index instruction results in the insertion of a pointer into a descriptor that is in memory. In this manner, the first queue Q0 is formed.
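Concatenation of a fetch, a literal, and an operator into a single task can be sketched in Python. The mnemonics `VALC`, `LIT`, `MULT` and `INDEX`, the tuple encoding, the literal value 8, and the `concatenate` function are all assumptions for illustration; the text names the kinds of instruction involved but not their encoding.

```python
def concatenate(code):
    """Fold each (VALC x), (LIT n), (op) triple into one (op, x, n) task."""
    tasks = []
    i = 0
    while i < len(code):
        is_triple = (
            i + 2 < len(code)
            and code[i][0] == "VALC"
            and code[i + 1][0] == "LIT"
        )
        if is_triple:
            operator_name = code[i + 2][0]
            tasks.append((operator_name, code[i][1], code[i + 1][1]))
            i += 3
        else:
            tasks.append(code[i])   # anything else passes through unchanged
            i += 1
    return tasks


code = [("VALC", "I"), ("LIT", 8), ("MULT",), ("INDEX",)]
tasks = concatenate(code)
print(tasks)
```

Three object code items become one task that carries its operand and literal with it, which is the form in which tasks are later queued.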

The formation of Q1 is similar, except that after the name call instruction the instruction NXLV is executed, which causes an index operation and also the fetch of data. In this manner, the second queue Q1 is formed. In the formation of the third queue Q2, there is an add instruction by which the values thus calculated for A and B are added, followed by a destructive store in memory (STOD), which destroys the value at the top of the stack.

As can be seen from the central diagram of FIG. 1, execution of the last two tasks or concatenated instructions of Q2 requires the results of the calculations of Q0 and Q1, whose values are stored in local memory. Those locations in the respective local memories are provided with an index flag to indicate whether or not the reference has in fact been stored there.

In this manner, when the processing elements are operating concurrently, it is possible for the Q2 routine to reach the second or final add task before the required values have been calculated and stored in local memory. The corresponding processing element will detect that those values are not yet available and will continue to access those locations until the values become available.
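The waiting behavior can be imitated in software with a flag per location. The Python sketch below uses threads in place of processing elements and is offered only as an analogy: the `Cell` class, the `producer` and `consumer` functions, the sleep intervals, and the sample values are all assumptions, and real hardware would not poll through a thread scheduler.

```python
import threading
import time


class Cell:
    """A local memory location with a flag saying whether its value is present."""

    def __init__(self):
        self.value = None
        self.present = False


def producer(cell, value):
    time.sleep(0.01)       # stands in for the time taken by Q0 or Q1
    cell.value = value
    cell.present = True    # the flag is set only after the value is stored


def consumer(a, b, out):
    # Like the Q2 add task: keep accessing the locations until both are filled.
    while not (a.present and b.present):
        time.sleep(0.001)
    out.append(a.value + b.value)


a, b, out = Cell(), Cell(), []
threads = [
    threading.Thread(target=consumer, args=(a, b, out)),
    threading.Thread(target=producer, args=(a, 2)),
    threading.Thread(target=producer, args=(b, 3)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(out)
```

The consumer is started first on purpose, so it reaches its add before either operand exists and has to wait, as the text describes for Q2.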

The fourth queue, Q3, fetches the value J, adds 1 to it, and inserts its address on top of the stack before performing a non-destructive store in memory, which leaves the value on top of the stack. The last four instructions fetch the value K from memory and compare it with the value J (LSEQ); if the value K is greater than the value J, the next instruction, a branch on false, causes the program counter to be reloaded and the routine to be repeated. Otherwise, the last instruction in the code string is an unconditional branch, which brings the routine to an end.

Fig. 6 is a timing diagram of the queue execution times for the individual queues, in which each clock time for a particular task is represented by two numbers. The first number represents the particular loop or sequence being executed, and the second number represents the particular processing element executing the instruction. The first pass of the code string, which results in the formation of the queues as well as the execution of the tasks, requires approximately 17 clock times, whereas subsequent loops require only 5 clock times for execution. This is because the tasks need not be completely reprocessed in the queueing task module (QTM) and the instruction reference module (IRM), and the individual dependency-free queues are executed concurrently.
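The effect of these figures on a whole loop can be worked out with simple arithmetic. The sketch below assumes, purely for illustration, that passes run back to back without overlap and that the first pass takes exactly 17 clock times (the text says approximately); the function names are hypothetical.

```python
FIRST_PASS_CLOCKS = 17  # first pass: queues are formed and executed (approximate)
LOOP_CLOCKS = 5         # each later pass reuses the already formed queues


def total_clocks(iterations: int) -> int:
    """Clock times for a loop of the given length under the stated assumptions."""
    if iterations < 1:
        raise ValueError("at least one iteration is required")
    return FIRST_PASS_CLOCKS + LOOP_CLOCKS * (iterations - 1)


def average_clocks(iterations: int) -> float:
    """Average clock times per iteration; tends toward LOOP_CLOCKS as it grows."""
    return total_clocks(iterations) / iterations
```

Under these assumptions a ten-iteration loop takes 17 + 9 × 5 = 62 clock times, or 6.2 per iteration, so the cost of forming the queues is amortized quickly.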

In general, the queueing task module performs the steps of assembling instructions into tasks, queueing those tasks, queue execution, tag prediction, and branch correction. The instruction reference module performs the functions of renaming, symbol management, and replacement. The symbol translation module provides parallel access, pointer allocation, and task allocation. Small processing elements are provided for frequent task execution, while the unique processing element is used for infrequent task execution and for the function portions of strings. The direct reference module 15 of Fig. 2 is provided for the evaluation of non-stack references.

In conclusion, a mechanism for a data processor has been described that receives compiled object code, forms sequences of that code into higher-level tasks, and forms queues of such tasks that are logically independent of other queues in the sense that they do not require the result of a previous execution of the object code string. In this manner, a sequence of such queues can be supplied to independent processing elements for concurrent execution.

A symbol translation table is provided by which data items are referenced. Each symbol is assigned an arbitrary pointer to local memory, and that pointer can be changed, so that a data item may reside in more than one memory location and may remain in memory while the result of an operation on that item is stored in another location.
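A minimal sketch of this renaming idea follows. The class and method names (`SymbolTable`, `define`, `read`, `store_result`) are hypothetical, and local memory is modeled as a growing Python list rather than a fixed hardware array; the point is only that storing a result repoints the symbol instead of overwriting the old location.

```python
class SymbolTable:
    """Toy symbol translation table: each symbol maps to a local-memory pointer."""

    def __init__(self):
        self.local_memory = []  # index into this list plays the role of a pointer
        self.pointer = {}       # symbol -> pointer currently assigned to it

    def _allocate(self, value):
        # Take the next free location and return its pointer.
        self.local_memory.append(value)
        return len(self.local_memory) - 1

    def define(self, symbol, value):
        """Assign a pointer to a new symbol and store its initial value."""
        self.pointer[symbol] = self._allocate(value)
        return self.pointer[symbol]

    def read(self, symbol):
        """Fetch the value the symbol currently refers to."""
        return self.local_memory[self.pointer[symbol]]

    def store_result(self, symbol, value):
        """Store a result in a fresh location and repoint the symbol to it.

        The old location is left untouched, so a task that still holds the
        old pointer continues to see the old value.
        """
        old = self.pointer[symbol]
        self.pointer[symbol] = self._allocate(value)
        return old, self.pointer[symbol]
```

With `t = SymbolTable()` and `t.define("J", 7)`, calling `t.store_result("J", t.read("J") + 1)` leaves 7 at the old pointer while `t.read("J")` now yields 8, which is the sense in which a data item remains in memory while the result of an operation on it is stored elsewhere.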

While one embodiment of the invention has been described, it will be apparent to those skilled in the art that variations and modifications may be made without departing from the spirit of the invention as claimed.

Claims

1. A data processing system for executing instruction code including operators and memory addresses, the system comprising: forming means for forming logically independent strings and subsequent logically dependent operators into logically independent queues; storage means for storing the logically independent queues; and a plurality of processing means, each receiving a respective one of the queues.

2. The system of claim 1, wherein the forming means includes means that responds to the instruction code in a sequential manner, determines when an operator does not require the result of a previous operation and thus exhibits logical independence, and forms the logically independent strings and subsequent logically dependent operators into logically independent queues.

3. The system of claim 2, wherein the data processing system includes a main memory and further comprises main memory addressing means for receiving the memory addresses and retrieving data from the main memory.

4. The system of claim 3, wherein each of the processing means includes a local memory, the system further comprising local memory address means, connected to the main memory, for transferring a local memory address to the main memory so that the retrieved data can be transferred to the local memory.

5. The system of claim 4, wherein the receiving means provides the local memory address to the corresponding string of operators forming a particular logically independent queue.

6. A method, in a data processing system, for receiving and executing sequential code including operators and memory addresses, the method comprising: receiving the sequential code; determining when an operator does not require the result of a previous operation and thus exhibits logical independence; forming logically independent strings and subsequent logically dependent operators into independent queues; and transferring separate ones of the queues to different individual processing means for concurrent execution.

7. The method of claim 6, wherein the data processing system includes a main memory, the method further comprising receiving a main memory address and retrieving data from the main memory.

8. The method of claim 7, wherein each of the processing means includes a local memory, the method further comprising transferring a local memory address to the main memory so that the retrieved data can be transferred to the local memory.

9. The method of claim 8, further comprising providing the local memory addresses to the corresponding strings of operators forming a particular logically independent queue.

10. The method of claim 9, wherein the data processing system includes a job queue, the method further comprising forwarding the independent queues and the corresponding local memory addresses to the job queue for subsequent distribution to the processing means.