JP2010146102A

JP2010146102A - Arithmetic processor and storage area allocation method

Info

Publication number: JP2010146102A
Application number: JP2008319993A
Authority: JP
Inventors: Makoto Kosone; 真小曽根; Kazuhisa Iizuka; 和久飯塚
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2008-12-16
Filing date: 2008-12-16
Publication date: 2010-07-01

Abstract

PROBLEM TO BE SOLVED: To provide a memory configuration technique that contributes to a reduction in circuit scale.
SOLUTION: The processor 10 includes a plurality of arithmetic units 12a-12d that perform arithmetic processing. First selection circuits 110a-110c each configure an address space for the arithmetic units 12a-12d using at least one of the RAMs 100a-100f. Each of the RAMs 100a-100f is connected to only some of the arithmetic units 12a-12d rather than to all of them.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for configuring memory for arithmetic processing, and more particularly to a technique for configuring memory in an arithmetic processing device whose functions can be changed.

In recent years, processing devices (processors) equipped with an arithmetic unit having one or more basic arithmetic functions called ALUs (Arithmetic Logic Units) have been under development (see, for example, Patent Document 1). Such an arithmetic unit comprises ALU columns, each containing a plurality of ALUs, and connection units provided between the ALU columns. In the processing device, setting command data in the ALU array controls both the arithmetic functions of the ALU circuits and the connection units that link the preceding and succeeding ALU columns, so that the intended arithmetic processing is realized as a whole. The command data is generally created from a data flow graph (DFG), which is in turn generated from a source program written in a high-level programming language such as C.

In a processing device having a plurality of arithmetic units, the amount of memory each arithmetic unit requires depends on the arithmetic processing to be realized, so it is desirable for the memory configuration to be variable. For this reason, a semiconductor device including a memory reconfiguration circuit has been proposed (see, for example, Patent Document 2).
JP 2006-40254 A
JP 2006-18452 A

However, in the semiconductor device disclosed in Patent Document 2, every arithmetic unit uses all of the RAMs to configure a memory of the required capacity. This increases the number of switches and wires needed for memory reconfiguration, which in turn enlarges the circuit scale of the semiconductor device as a whole.

The present invention has been made in view of these circumstances, and its object is to provide a memory configuration technique that contributes to a reduction in circuit scale.

To solve the above problem, an arithmetic processing device according to one aspect of the present invention is an arithmetic processing device whose functions can be changed, and comprises a plurality of arithmetic units that perform arithmetic processing, a plurality of storage areas, and a storage area configuration circuit that configures, for each arithmetic unit, an address space using at least one storage area. At least one storage area is connected to only some of the plurality of arithmetic units.

Another aspect of the present invention is also an arithmetic processing device. This device is an arithmetic processing device whose functions can be changed, and comprises at least one arithmetic unit that performs a plurality of independent arithmetic processes, a plurality of storage areas, and a storage area configuration circuit that configures, for each arithmetic process, an address space using at least one storage area. At least one storage area is connected to the arithmetic unit so that it is used by only some of the plurality of arithmetic processes.

Yet another aspect of the present invention is a storage area allocation method. This method determines the storage areas to be allocated to each arithmetic process in an arithmetic processing device that can execute a plurality of arithmetic processes and that includes a storage area configuration circuit capable of configuring storage areas for those processes. The method allocates the storage areas usable by each arithmetic process according to the amount of data that the arithmetic process must hold.

Any combination of the above components, and any expression of the present invention converted between a method, a device, a system, a computer program, and the like, are also effective as aspects of the present invention.

According to the present invention, an efficient memory configuration technique can be provided.

FIG. 1 is a basic configuration diagram of a processing device 10 according to an embodiment. The processing device 10 includes an integrated circuit device 26, a compiling unit 30, a data flow graph processing unit 31, a setting data generation unit 32, and a storage unit 34. The processing device 10 is configured as an arithmetic processing device whose functions can be changed.

The integrated circuit device 26 is configured as a single chip and includes an arithmetic unit 12, a setting unit 14, a control unit 18, an internal state holding circuit 20, a first feedback path 24, a main memory 27, and a second feedback path 29. The arithmetic unit 12 is a reconfigurable circuit that is made up of a plurality of arithmetic elements and whose function can be changed by changing its settings. The arithmetic unit 12 is configured as a logic circuit such as a combinational circuit or a sequential circuit. The first feedback path 24 and the second feedback path 29 function as feedback paths and connect the output of the arithmetic unit 12 to the input of the arithmetic unit 12.

The arithmetic unit 12 includes a multi-stage array of logic circuits, each capable of selectively executing a plurality of arithmetic functions, and connection units capable of setting the connection relationship between the outputs of the preceding logic circuits and the inputs of the succeeding logic circuits. Structurally, a connection unit that sets the wiring between logic circuit columns is provided between each pair of logic circuit columns. The function of the arithmetic unit 12 can be changed by arbitrarily setting the function of each logic circuit arranged in the multiple stages and the connections between the logic circuits.

The arithmetic unit 12 has a pipeline configuration and can execute a plurality of threads simultaneously. A thread is a unit of processing to be executed by the arithmetic unit 12, and the processing of each thread is complete in itself. The threads are executed independently of one another, although data may be passed between threads. The processing device 10 can realize a plurality of kinds of circuit configurations on the arithmetic unit 12 at the same time.

The setting unit 14 includes a first circuit setting unit 14a, a second circuit setting unit 14b, a third circuit setting unit 14c, and a fourth circuit setting unit 14d, and supplies setting data 40 for configuring the intended circuit in the arithmetic unit 12. Each of the circuit setting units 14a to 14d may be configured as a command memory that outputs the data it holds according to the count value of a program counter. In this case, the control unit 18 functions as a sequencer that controls the output of the program counter, and the setting data 40 is the command data output from the command memory.

Here, the arithmetic unit 12 is assumed to be an ALU array composed of four stages of ALU columns. Specifically, the first circuit setting unit 14a, the second circuit setting unit 14b, the third circuit setting unit 14c, and the fourth circuit setting unit 14d each supply setting data 40 for executing a different thread, in a predetermined order, to the reconfigurable units in the pipeline of the arithmetic unit 12, each of which consists of the connection unit and the ALU column of one stage. As a result, part of a different kind of circuit is configured in each stage of the ALU array, and a multithread processing function is realized.

The internal state holding circuit 20 is configured as a sequential circuit such as a data flip-flop (DFF) and receives the output of the arithmetic unit 12. The internal state holding circuit 20 is connected to the first feedback path 24 and feeds the output of the arithmetic unit 12 directly back to the input of the arithmetic unit 12. The first feedback path 24 may be configured so that only the output of the lowest stage of the arithmetic unit 12 is input to the uppermost stage.

The main memory 27 is made up of RAM for storing the output data of the arithmetic unit 12. The main memory 27 temporarily holds the output data of the arithmetic unit 12 and returns it to the arithmetic unit 12. The main memory 27 writes and reads data according to a chip enable (CE) signal and an address signal from the control unit 18. The main memory 27 is connected to the second feedback path 29 and, in response to a read instruction from the control unit 18, feeds data back to the input of the arithmetic unit 12 at the intended timing. When the setting unit 14 is configured as a command memory, data may be written to and read from the main memory 27 using the command data supplied from the command memory.

In the processing device 10, there are two paths that feed the output of the arithmetic unit 12 back to its input: the first feedback path 24 and the second feedback path 29. Because the first feedback path 24 does not pass through the main memory 27, it can feed back the output data of the arithmetic unit 12 at high speed. The second feedback path 29, on the other hand, can supply a data signal to the arithmetic unit 12 at the intended timing in response to an instruction from the control unit 18. The first feedback path 24 and the second feedback path 29 are thus used selectively according to the circuit being reconfigured on the arithmetic unit 12.

The arithmetic unit 12 is built from logic circuits whose functions can be changed. The logic circuits may be arranged in a matrix. The function of each logic circuit and the connection relationships between the logic circuits are set according to the setting data 40 supplied by the setting unit 14. Within each logic circuit, the combination wiring that connects the basic arithmetic elements to one another is also set according to the setting data 40. The setting data 40 is generated by the following procedure.

A program 36 to be realized by the integrated circuit device 26 is held in the storage unit 34. The program 36 is a behavioral description of the processing performed in the circuit; it describes a signal processing circuit, a signal processing algorithm, or the like in a high-level language such as C. The compiling unit 30 compiles the program 36 stored in the storage unit 34, converts it into a data flow graph (DFG) 38, and stores the result in the storage unit 34. The data flow graph 38 expresses the execution-order dependencies between the operations in the circuit and shows, as a graph structure, the flow of operations on input variables and constants. In general, the data flow graph 38 is drawn so that computation proceeds from top to bottom.

The data flow graph processing unit 31 searches the data flow graph 38 generated by the compiling unit 30 for a group of data flows that follows a predetermined rule. Such a group is a set of nodes with a predetermined connection relationship between operation nodes, and is registered in the storage unit 34 in advance. When the data flow graph processing unit 31 finds one of these predetermined groups, it replaces the group with fewer nodes than the group contains.

A node that replaces a group of data flows must be a node that the logic circuits of the arithmetic unit 12 can process. To that end, each logic circuit is provided with combination wiring for combining its own basic arithmetic elements in a predetermined order so that it can process the replacement node. With this combination wiring, the logic circuit gains new arithmetic functions executed by multiple basic arithmetic elements without any increase in the number of basic arithmetic elements. In other words, groups of data flows that follow a predetermined rule are registered in advance according to the possible combinations of the basic arithmetic elements in a logic circuit, and the data flow graph processing unit 31 can then search for a group of data flows that is executable in a single logic circuit and replace it with a single node.

The setting data generation unit 32 generates the setting data 40 from the data flow graph 38 processed by the data flow graph processing unit 31. The setting data 40 is the data used to map the data flow graph 38 onto the arithmetic unit 12; it defines the functions of the logic circuits in the arithmetic unit 12 and the connection relationships between them.

Based on the relationship, generated by the compiling unit 30, between the memory (RAM) and units of arithmetic processing such as the arithmetic units 12 or threads, the setting data generation unit 32 also generates setting data 40 that defines the connection relationship between the memory and the arithmetic unit 12 and the control of memory access. This setting data 40 is used to reconfigure the main memory 27.

FIG. 2 shows the configuration of the arithmetic unit 12. The arithmetic unit 12 includes a plurality of logic circuit columns, each made up of logic circuits 50 that can selectively execute a plurality of arithmetic functions. Specifically, the arithmetic unit 12 comprises a multi-stage arrangement of logic circuit columns and a connection unit 52 provided at each stage. The connection unit 52 can set either an arbitrary connection relationship between the outputs of the preceding logic circuits and the inputs of the succeeding logic circuits, or a connection relationship selected from a predetermined set of connection relationships. The connection unit 52 can also hold the output signals of the preceding logic circuits. Owing to the multi-stage arrangement of the logic circuits, computation in the arithmetic unit 12 proceeds from the upper stages to the lower stages.

The arithmetic unit 12 has ALUs (Arithmetic Logic Units) as the logic circuits 50. An ALU is an arithmetic logic circuit that can selectively execute, according to its settings, a plurality of kinds of multi-bit operations such as logical OR, logical AND, and bit shift. Each ALU includes a selector for setting one of the plurality of arithmetic functions. In the illustrated example, each ALU has two input terminals and one output terminal. An ALU may have three or more input terminals, and may have two or more output terminals.
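
As a rough illustration of an ALU whose function is chosen by a selector setting, the following Python sketch models a two-input, one-output ALU. The class name `ALU`, the operation names, and the 16-bit data width are assumptions; the source only states that operations such as logical OR, logical AND, and bit shift can be selected.

```python
MASK = 0xFFFF  # assumed 16-bit data path (the width is not given in the source)

# Selector value -> multi-bit operation on two inputs.
ALU_OPS = {
    "or": lambda a, b: a | b,
    "and": lambda a, b: a & b,
    "shl": lambda a, b: (a << b) & MASK,
}


class ALU:
    """Two-input, one-output ALU whose function is set by configuration."""

    def __init__(self, op="or"):
        self.configure(op)

    def configure(self, op):
        # Reject selector values that this sketch does not model.
        if op not in ALU_OPS:
            raise ValueError(f"unsupported operation: {op}")
        self.op = op

    def compute(self, a, b):
        return ALU_OPS[self.op](a & MASK, b & MASK)


alu = ALU()
alu.configure("and")
print(alu.compute(0b1100, 0b1010))  # bitwise AND of the two inputs
alu.configure("shl")
print(alu.compute(1, 4))            # left shift of the first input
```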

As illustrated, the arithmetic unit 12 is configured as an ALU array of X stages and Y columns, with X ALUs arranged in the vertical direction and Y ALUs in the horizontal direction. FIG. 2 shows an ALU array having four stages of ALU columns 53, in which the connection unit 52a, ALU column 53a, connection unit 52b, ALU column 53b, connection unit 52c, ALU column 53c, connection unit 52d, ALU column 53d, and connection unit 52e are connected in this order. A connection unit 52 and the ALU column 53 that follows it form a reconfigurable unit. Within a reconfigurable unit, the connection unit 52 supplies variables and constants input from outside to the intended ALUs in the following ALU column 53. The connection unit 52 can also output the results computed by the preceding ALUs directly to the outside. The ALUs of the ALU array need not be physically laid out as an array; it is sufficient for the connection relationships between the ALUs to form an array.

Variables and constants are input through the connection unit 52a to the first-stage ALU column 53a, which consists of ALU11, ALU12, ..., ALU1Y, and the configured operations are performed on them. The resulting outputs are input to the second-stage ALU column 53b, which consists of ALU21, ALU22, ..., ALU2Y, according to the connections set in the connection unit 52b. The wiring of the connection unit 52b is built so that it can realize either an arbitrary connection relationship between the outputs of the ALU column 53a and the inputs of the ALU column 53b, or a connection relationship selected from a predetermined set of connection relationships; the intended wiring is enabled by configuration. The same structure continues down to the connection unit 52e at the final stage. Each connection unit 52 has the function of a DFF circuit, and the final-stage connection unit 52e may serve as the internal state holding circuit 20 shown in FIG. 1.
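
The way data passes through successive reconfigurable units (a connection unit followed by an ALU column) can be sketched in Python as follows. The data layout is an assumption made for illustration: each stage is a pair of a wiring list, giving for each ALU the indices of its two operands among the previous outputs, and a list of operation names. The functions `run_stage` and `run_array` are hypothetical names, not part of the source.

```python
import operator

# Operations an ALU in this sketch can be configured to perform.
OPS = {"or": operator.or_, "and": operator.and_, "shl": operator.lshift}


def run_stage(values, wiring, ops):
    """One reconfigurable unit: a connection unit followed by an ALU column.

    wiring[i] = (left index, right index) into `values` for ALU i.
    ops[i]    = name of the operation ALU i is configured to perform.
    """
    return [OPS[op](values[left], values[right])
            for (left, right), op in zip(wiring, ops)]


def run_array(inputs, stages):
    """Feed the inputs through every stage, from the top stage to the bottom."""
    values = list(inputs)
    for wiring, ops in stages:
        values = run_stage(values, wiring, ops)
    return values


# Two stages of two ALUs each (Y = 2 in this toy example).
stages = [
    ([(0, 1), (0, 1)], ["and", "or"]),   # stage 1: AND and OR of both inputs
    ([(0, 1), (1, 0)], ["or", "and"]),   # stage 2: recombine stage-1 outputs
]
print(run_array([0b1100, 0b1010], stages))
```

Changing only the `stages` data changes the computed function, which mirrors how the setting data selects both the ALU operations and the wiring between columns.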

Circuit configuration is performed in one clock. Specifically, the setting unit 14 maps setting data onto the arithmetic unit 12 every clock. In FIG. 1, the setting unit 14 has four circuit setting units, the first circuit setting unit 14a through the fourth circuit setting unit 14d. When the arithmetic unit 12 is made up of five connection units 52 and four ALU columns 53 as shown in FIG. 2, however, the setting unit 14 may instead have nine circuit setting units so that setting data 40 can be supplied to each connection unit 52 and each ALU column 53.

The output of each ALU column 53 is held in the connection unit 52 that follows it. While multiple threads are running, the DFF circuit of the connection unit 52 holds the data output from the preceding logic circuits and, at the next clock, supplies that data to the succeeding logic circuits, which then execute the same thread that the preceding logic circuits were executing. In this way, the processing of a given thread moves to the next ALU column 53 at every clock. Once a thread has been processed in the final stage, it returns to the uppermost ALU column and again moves down one stage per clock. This allows multithread processing to be executed and makes efficient circuit configuration possible.

FIG. 3 is a diagram for explaining multithread processing. At the T-th clock, the ALU column 53a processes thread 1, the ALU column 53b processes thread 4, the ALU column 53c processes thread 3, and the ALU column 53d processes thread 2. At the next, (T+1)-th clock, the ALU column 53a processes thread 2, the ALU column 53b processes thread 1, the ALU column 53c processes thread 4, and the ALU column 53d processes thread 3. Multithread processing is thus carried out by having each thread processed by the next ALU column 53 at every clock.
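
The rotation in FIG. 3 can be reproduced with a short Python sketch. The closed-form expression, the function names, and the convention that clocks are counted relative to T (so clock 0 is the T-th clock) and stages are numbered from 0 at ALU column 53a are assumptions introduced here to match the two clock cycles given in the text.

```python
def thread_on_stage(clock, stage, num_stages=4):
    """Thread number (1-based) processed by `stage` at `clock`.

    clock: cycles elapsed since the T-th clock (0 means clock T).
    stage: 0 for ALU column 53a, 1 for 53b, and so on.
    """
    return (clock - stage) % num_stages + 1


def schedule(clock, num_stages=4):
    """Threads processed by stages 0..num_stages-1 at the given clock."""
    return [thread_on_stage(clock, s, num_stages) for s in range(num_stages)]


for clock in range(3):
    print(f"T+{clock}:", schedule(clock))
```

Each thread advances one stage per clock and wraps from the last stage back to the first, so the pattern repeats every four clocks in the four-stage case.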

Returning to FIG. 1, when a circuit is to be configured, the control unit 18 selects and reads setting data 40 from the storage unit 34 and supplies it to the setting unit 14. The setting unit 14 stores each item of setting data 40. When the setting unit 14 is configured as a command memory, the control unit 18 gives a program counter value to the setting unit 14, and the setting unit 14 sets the stored setting data corresponding to that counter value in the arithmetic unit 12 as command data. The setting unit 14 may be built with a cache memory or another type of memory. This example describes a configuration in which the control unit 18 receives the setting data 40 from the storage unit 34 and supplies it to the setting unit 14, but the setting data may instead be stored in the setting unit 14 in advance, without going through the control unit 18.

The setting unit 14 sets the setting data 40 in the arithmetic unit 12 and causes the arithmetic unit 12 to reconfigure its circuit successively. This enables the arithmetic unit 12 to execute the intended operations. Because the arithmetic unit 12 uses ALUs with high arithmetic capability as its basic cells, and because the arithmetic unit 12 and the setting unit 14 are formed on a single chip, configuration can be completed at high speed, in one clock. The control unit 18 has a clock function, and its clock signal is supplied to the main memory 27. The control unit 18 may also include a quaternary (modulo-4) counter and supply its count signal to the setting unit 14.

FIG. 1 shows the basic configuration of a processing device 10 with a single arithmetic unit 12. In what follows, the processing device 10 has a plurality of arithmetic units 12, and the overall circuit scale is reduced by limiting the storage areas (RAMs) that each arithmetic unit 12 can use to configure its main memory 27. Specifically, restricting the connections between the arithmetic units 12 and the RAMs in hardware reduces the circuit scale, and RAMs are then allocated to the arithmetic units 12 efficiently within those connection restrictions. The following describes an example in which the processing device 10 has a function for configuring the main memories 27 of the plurality of arithmetic units 12, but the main memory 27 reconfiguration function may be realized on a per-thread basis as well as on a per-arithmetic-unit basis.

FIG. 4 shows an example of the configuration of the processing device 10 of this embodiment. The processing device 10 shown in FIG. 4 includes a plurality of arithmetic units 12a, 12b, 12c, and 12d, and has a memory configuration function that can reconfigure the memory for each arithmetic unit 12. A common memory circuit 60, an individual memory circuit 70, and an individual memory circuit 80 are provided as the memory group, and each of the common memory circuit 60 and the individual memory circuits 70 and 80 is made up of a first selection circuit 110 and two RAMs 100. Each memory circuit may instead have three or more RAMs 100, or only one RAM 100. The processing device 10 may also include a plurality of integrated circuit devices 26 (see FIG. 1) together with a memory having a reconfiguration function that the plurality of integrated circuit devices 26 can use. In this case, there may be arithmetic units 12 that do not use the memory having the reconfiguration function.

Specifically, the common memory circuit 60 can be used by the arithmetic units 12a, 12b, and 12d, and has a first selection circuit 110a and RAMs 100e and 100f. The individual memory circuit 70 can be used only by the arithmetic units 12a and 12b, and has a first selection circuit 110b and RAMs 100a and 100b. The individual memory circuit 80 can be used only by the arithmetic units 12c and 12d, and has a first selection circuit 110c and RAMs 100c and 100d. Here, the RAM 100a is assumed to have a capacity of 2048 bytes and each of the other RAMs 100b to 100f a capacity of 1024 bytes; the example that follows shows how the memory that holds each arithmetic unit 12's calculation results and intermediate data is configured from the RAMs 100 of the memory group.

A second selection circuit 120 that selects the connection to the memory group is provided for each arithmetic unit 12. The second selection circuit 120a is provided for the arithmetic unit 12a and connects the arithmetic unit 12a to either the common memory circuit 60 or the individual memory circuit 70. The second selection circuit 120b is provided for the arithmetic unit 12b and connects the arithmetic unit 12b to either the common memory circuit 60 or the individual memory circuit 70. The second selection circuit 120c is provided for the arithmetic unit 12c, but because the arithmetic unit 12c does not use the common memory circuit 60, the second selection circuit 120c may be omitted in this example. The second selection circuit 120d is provided for the arithmetic unit 12d and connects the arithmetic unit 12d to either the common memory circuit 60 or the individual memory circuit 80.

In the processing device 10, the first selection circuits 110 and the second selection circuits 120 together form a storage area configuration circuit that, when an arithmetic unit requires a storage area, configures an address space using at least one storage area (RAM 100). In the processing device 10, at least one RAM 100 is connected to only some of the plurality of arithmetic units, which reduces the circuit scale. For example, the RAMs 100c and 100d are connected only to the arithmetic units 12c and 12d, and are not connected to the arithmetic units 12a and 12b.

FIG. 5 shows a modification of the configuration of the processing device 10 of the present embodiment. In the processing device 10 shown in FIG. 5, one arithmetic unit 12 processes a plurality of threads. In the illustrated example, the arithmetic unit 12a executes threads 1 and 2, and the arithmetic unit 12b executes threads 3 and 4. The processing device 10 shown in FIG. 5 has a memory configuration function that can reconfigure the memory for each thread. The arrangement of the first selection circuits 110a to 110c and the second selection circuits 120a to 120d used to configure the memory is the same in the processing devices 10 shown in FIG. 4 and FIG. 5, so the memory configuration function is described below with reference to the processing device 10 shown in FIG. 4.

FIG. 6 shows an example in which a main memory 27 is configured for each arithmetic unit 12 using the RAMs 100. The method of configuring the main memory 27 is described in detail with reference to FIGS. 16(a) to 16(c). In this example, the arithmetic unit 12a requires a memory with a capacity of 3072 bytes, the arithmetic unit 12b a memory of 1024 bytes, the arithmetic unit 12c a memory of 1024 bytes, and the arithmetic unit 12d a memory of 2048 bytes.

In the memory configuration example shown in FIG. 6, one main memory 27a with a capacity of 3072 bytes is configured for the arithmetic unit 12a from the RAM 100e (capacity 1024 bytes) of the common memory circuit 60 and the RAM 100a (capacity 2048 bytes) of the individual memory circuit 70. For the arithmetic unit 12b, a main memory 27b with a capacity of 1024 bytes is configured from the RAM 100b of the individual memory circuit 70. For the arithmetic unit 12c, a main memory 27c with a capacity of 1024 bytes is configured from the RAM 100c of the individual memory circuit 80. For the arithmetic unit 12d, one main memory 27d with a capacity of 2048 bytes is configured from the RAM 100d of the individual memory circuit 80 and the RAM 100f of the common memory circuit 60. In the figure, addr denotes an address in the main memory 27.
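The connectivity of FIG. 4 and the allocation of FIG. 6 can be checked with a small software model. The following Python sketch is illustrative only: the dictionary layout and the function names `reachable` and `main_memory_sizes` are assumptions made for this example and do not appear in the embodiment; the capacities and connections are the ones stated above.

```python
# Capacities in bytes, as stated in the description.
RAM_CAPACITY = {"100a": 2048, "100b": 1024, "100c": 1024,
                "100d": 1024, "100e": 1024, "100f": 1024}

# Memory circuit -> (arithmetic units wired to it, RAMs it contains).
MEMORY_CIRCUITS = {
    "60": ({"12a", "12b", "12d"}, {"100e", "100f"}),  # common memory circuit
    "70": ({"12a", "12b"}, {"100a", "100b"}),          # individual memory circuit
    "80": ({"12c", "12d"}, {"100c", "100d"}),          # individual memory circuit
}

# Allocation of the FIG. 6 example: unit -> RAMs forming its main memory.
FIG6_ALLOCATION = {"12a": {"100e", "100a"}, "12b": {"100b"},
                   "12c": {"100c"}, "12d": {"100d", "100f"}}


def reachable(unit, ram):
    """True if some memory circuit contains `ram` and is wired to `unit`."""
    return any(unit in units and ram in rams
               for units, rams in MEMORY_CIRCUITS.values())


def main_memory_sizes(allocation):
    """Validate an allocation and return the capacity configured per unit."""
    assigned = [ram for rams in allocation.values() for ram in rams]
    if len(assigned) != len(set(assigned)):
        raise ValueError("a RAM is assigned to more than one unit")
    for unit, rams in allocation.items():
        for ram in rams:
            if not reachable(unit, ram):
                raise ValueError(f"RAM {ram} is not connected to unit {unit}")
    return {unit: sum(RAM_CAPACITY[r] for r in rams)
            for unit, rams in allocation.items()}
```

Calling `main_memory_sizes(FIG6_ALLOCATION)` yields the four capacities required in the FIG. 6 example, while an allocation that gives the RAM 100e to the arithmetic unit 12c is rejected because the common memory circuit 60 is not wired to that unit.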

FIG. 7 shows the configuration of the common memory circuit 60. The common memory circuit 60 includes the first selection circuit 110a and the RAMs 100e and 100f, and the first selection circuit 110a includes memory control circuits 130e and 130f and a read control circuit 150a. Because the memory control circuits 130e and 130f have the same configuration, the memory control circuit 130f is not illustrated. The read control circuit 150a is a circuit for supplying read data to the second selection circuit 120a provided for the arithmetic unit 12a. In the present embodiment, the common memory circuit 60 is connected to the arithmetic units 12b and 12d in addition to the arithmetic unit 12a, so read control circuits (not shown) for supplying read data to the second selection circuits 120b and 120d are also provided. A memory control circuit 130 is provided for each RAM 100, and a read control circuit 150 is provided for each arithmetic unit 12. As an example of assigning the RAMs 100 to arithmetic processing, FIG. 7 shows a circuit configuration in which the RAMs 100 are assigned to the arithmetic units 12; as already described, however, the RAMs 100 can also be assigned to threads.

Although FIG. 7 shows the configuration of the common memory circuit 60, the individual memory circuits 70 and 80 can be configured in the same manner as the common memory circuit 60.

The memory control circuit 130 is a circuit that determines whether an access is directed to the RAM 100 that it controls, and controls that RAM 100 accordingly. The memory control circuit 130e of the RAM 100e is described below as an example.

set_a and set_b are memory configuration information. Specifically, set_a indicates which arithmetic unit 12 the RAM 100e forms a memory for, and set_b indicates which part of the address space of the configured memory the RAM 100e corresponds to. set_a and set_b may also carry the information that the RAM is not used by any arithmetic unit 12. The multiplexers (MUX) 132, 134, and 136 receive the write data (w_data), address signal (addr), and write determination signal (w_en), respectively, supplied from the connected arithmetic units 12a, 12b, and 12d, and each selects the signal of one of the arithmetic units 12 that can use the RAM 100e; they are controlled by set_a. For example, in the memory configuration example of FIG. 6, the RAM 100e is configured as part of the main memory 27a of the arithmetic unit 12a, so set_a is a signal indicating the arithmetic unit 12a, and the MUXs 132, 134, and 136 select the w_data, addr, and w_en of the arithmetic unit 12a, respectively.
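The role of set_a as a common select signal for the three multiplexers can be sketched as follows. This is a behavioral illustration, not the circuit itself: the `UnitSignals` tuple, the function `select_inputs`, and the use of `None` to represent "not used by any arithmetic unit" are assumptions introduced for the example.

```python
from typing import Dict, NamedTuple, Optional


class UnitSignals(NamedTuple):
    """Signals one arithmetic unit drives toward a memory circuit."""
    w_data: int   # write data
    addr: int     # address within the unit's main memory
    w_en: bool    # write determination signal


def select_inputs(set_a: Optional[str],
                  signals: Dict[str, UnitSignals]) -> Optional[UnitSignals]:
    """Model MUX 132/134/136: all three pick the unit named by set_a.

    set_a is None when the RAM is not used by any arithmetic unit
    (an encoding assumed for this sketch).
    """
    if set_a is None:
        return None
    return signals[set_a]


# Signals from the three units wired to the common memory circuit 60.
example_signals = {
    "12a": UnitSignals(w_data=0xAB, addr=5, w_en=True),
    "12b": UnitSignals(w_data=0x11, addr=900, w_en=False),
    "12d": UnitSignals(w_data=0x22, addr=1500, w_en=True),
}
```

With set_a indicating the arithmetic unit 12a, as in FIG. 6, `select_inputs("12a", example_signals)` returns that unit's w_data, addr, and w_en together.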

The access determination circuit 138 uses specific bits of addr (hereinafter referred to as the determination bit signal) to determine whether addr falls within the address range indicated by set_b. When RAMs 100 of several capacities exist, such as 1024 bytes and 2048 bytes, assigning smaller address values first to the RAMs of smaller capacity, or to the RAMs whose capacity is shared by the larger number of RAMs, allows the access determination circuit 138 to make the determination efficiently by examining only the upper bits of addr. For example, in the memory configuration example shown in FIG. 6, the arithmetic unit 12a uses the RAM 100a and the RAM 100e, so the RAM 100e, which satisfies the condition of having the smaller capacity or a capacity shared by the larger number of RAMs, is assigned address values 0 to 1023, and the RAM 100a is assigned address values 1024 to 3071. The bits of addr from the eleventh bit (counting from the least significant bit) upward are then used as the determination bit signal, and an access is determined to be directed to the RAM 100e when the determination bit signal is 0. In this case, set_b is a signal indicating 0, and the access determination circuit 138 determines whether the determination bit signal is 0, outputting active if it is 0 and negative if it is not.
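The determination for the RAM 100e in this example amounts to comparing the upper address bits with set_b. The sketch below assumes that bits are counted from 1 at the least significant bit, so that "the eleventh bit upward" corresponds to a right shift by 10; the function names are hypothetical.

```python
DECISION_SHIFT = 10  # bits 1-10 select a byte inside a 1024-byte RAM


def decision_bits(addr: int) -> int:
    """Determination bit signal: addr from the 11th bit upward."""
    return addr >> DECISION_SHIFT


def access_hit(addr: int, set_b: int) -> bool:
    """Model of access determination circuit 138 for a 1024-byte RAM."""
    return decision_bits(addr) == set_b


SET_B_RAM_100E = 0  # RAM 100e occupies addresses 0..1023 of main memory 27a
```

Addresses 0 to 1023 give a determination bit signal of 0 and therefore hit the RAM 100e, while addresses 1024 to 3071, which belong to the RAM 100a, do not.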

The CE generation circuit 142 generates a control signal for the RAM 100e from the output of the access determination circuit 138 and the selected w_en. For example, if the output of the access determination circuit 138 and the selected w_en are both active, it outputs a signal that causes the RAM 100e to perform a write; if the output of the access determination circuit 138 is active and w_en is not active, it outputs a signal that causes the RAM 100e to perform a read.
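The two cases described for the CE generation circuit 142 can be written as a small decision function. The string return values, the function name `ram_control`, and the "idle" result for an access that does not hit this RAM are assumptions of this sketch; the description only states the write and read cases.

```python
def ram_control(hit: bool, w_en: bool) -> str:
    """Model of CE generation circuit 142.

    hit  : output of the access determination circuit 138
    w_en : write determination signal selected by the MUX
    """
    if not hit:
        return "idle"   # assumed: RAM is left untouched when not addressed
    return "write" if w_en else "read"
```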

The read control circuit 150a selects, from the data (r_data) read from the RAMs 100e and 100f of the memory group, the read data to be passed to the arithmetic unit 12a. The read control circuit 150a is formed of read data conversion circuits 152 and 154 and a read data selection circuit 156. The read data conversion circuit 152 provided for the RAM 100e is described below as an example. The read data conversion circuit 152 outputs r_data unchanged if it is r_data for the arithmetic unit 12a, and otherwise converts it to 0 (all bits 0) and outputs it. Whether r_data is r_data for the arithmetic unit 12a is determined using set_a and a signal (select) obtained by delaying the output of the access determination circuit 138 of the memory control circuit 130e in the delay circuit 140 by the time required to read the RAM. That is, if set_a is a signal indicating the arithmetic unit 12a and select, the delayed output of the access determination circuit 138, is active, r_data is determined to be r_data for the arithmetic unit 12a. The read data selection circuit 156 selects the r_data for the arithmetic unit 12a from the outputs of the read data conversion circuits 152 and 154 of the RAMs 100e and 100f. This selection is realized by taking the logical sum (OR) of the outputs of the read data conversion circuits 152 and 154. Therefore, the read data selection circuit 156 outputs the data value when the data read from the RAM 100e or the RAM 100f of the memory group is r_data for the arithmetic unit 12a, and outputs 0 when it is not.
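The zero-masking followed by an OR can be illustrated in software as below. The function names and the tuple layout describing each RAM are assumptions for this sketch; select is taken as already delayed to line up with r_data.

```python
from functools import reduce


def convert_read_data(r_data: int, set_a: str, select: bool, unit: str) -> int:
    """Model of read data conversion circuit 152/154 for one RAM.

    Passes r_data through only when the RAM belongs to `unit` (set_a)
    and the delayed access determination result (select) is active.
    """
    return r_data if (set_a == unit and select) else 0


def select_read_data(converted_values) -> int:
    """Model of read data selection circuit 156: bitwise OR of all inputs."""
    return reduce(lambda a, b: a | b, converted_values, 0)


def read_for_unit(unit: str, rams) -> int:
    """rams: iterable of (r_data, set_a, select) tuples, one per RAM."""
    return select_read_data(
        convert_read_data(r_data, set_a, select, unit)
        for r_data, set_a, select in rams
    )


# RAM 100e serves unit 12a, RAM 100f serves unit 12d (as in FIG. 6).
example_rams = [(0x5A, "12a", True), (0xFF, "12d", True)]
```

Because every value that does not belong to the requesting unit has been forced to 0, the OR needs no select input of its own: `read_for_unit("12a", example_rams)` returns the value read from the RAM 100e, and a unit that owns neither RAM receives 0.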

In the above description, the read data conversion circuits 152 and 154 convert r_data to all bits 0 when it is not r_data for the arithmetic unit 12a, but they may instead convert it to another signal value, for example all bits 1. In that case, the read data selection circuit 156 outputs 0 when the outputs of both read data conversion circuits 152 and 154 are all bits 1, and otherwise outputs the logical product (AND) of the outputs of the read data conversion circuits 152 and 154.
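A sketch of this all-ones variant is shown below. The 8-bit data width and the function names are assumptions chosen only to make the example concrete.

```python
WIDTH = 8                       # assumed data width for this sketch
ALL_ONES = (1 << WIDTH) - 1     # 0xFF


def convert_to_ones(r_data: int, is_for_unit: bool) -> int:
    """Conversion circuit variant: masks foreign data to all bits 1."""
    return r_data if is_for_unit else ALL_ONES


def select_with_and(out_152: int, out_154: int) -> int:
    """Selection circuit variant: 0 if both inputs are all ones, else AND."""
    if out_152 == ALL_ONES and out_154 == ALL_ONES:
        return 0
    return out_152 & out_154
```

In this sketch a genuine read value of all ones cannot be told apart from the masking value when the other input is also masked, which is a property of the simplified model rather than something the description addresses.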

The above description concerns the memory control circuit 130e of the RAM 100e; the memory control circuits 130 of the other RAMs 100 have the same configuration.

FIG. 8 shows the circuit configuration of the second selection circuit 120. The second selection circuit 120 selects one of the r_data values supplied from the plurality of memory groups. Its circuit configuration is the same as that of the read data selection circuit 156 of the read control circuit 150a: it includes a read data selection circuit 158 that takes the logical sum (OR) of the data from each memory group. In the memory configuration example shown in FIG. 6, the second selection circuit 120a, which has the read data selection circuit 158, supplies to the arithmetic unit 12a the read data of either the RAM 100a or the RAM 100e, which together form the main memory 27a of the arithmetic unit 12a. Similarly, the second selection circuit 120d supplies to the arithmetic unit 12d the read data of either the RAM 100d or the RAM 100f, which together form the main memory 27d of the arithmetic unit 12d. The second selection circuit 120b supplies the read data of the RAM 100b to the arithmetic unit 12b, and the second selection circuit 120c supplies the read data of the RAM 100c to the arithmetic unit 12c.

As described above, restricting the memory groups that each arithmetic unit 12 can use reduces the number of selection circuits and connection lines, which increases the freedom of placement and routing and reduces the circuit scale. As a result, the operating speed improves and the processing time becomes shorter.

Furthermore, providing the read data selection circuit 156 in the read control circuit 150a and the read data selection circuit 158 in the second selection circuit 120a reduces the number of wires between the memory groups and the arithmetic units 12. In addition, because the read data conversion circuits 152 and 154 convert read data that is not the desired data to the value 0, only the desired data is output simply by taking the logical sum, which makes a control signal for the read data selection circuit 156 unnecessary. These measures increase the freedom of placement and routing, reduce the circuit scale, improve the operating speed, and shorten the processing time.

The memory configuration information set_a and set_b of the common memory circuit 60 may be supplied from the setting unit 14 or the control unit 18, or may be set when the arithmetic unit 12, which is a reconfigurable circuit, is configured. In the former case, the memory configuration can be changed dynamically while the processor is operating, so the range of processing that the arithmetic unit 12 can perform is widened. In the latter case, the freedom of placement and routing increases because, for example, the memory configuration information can be placed independently for each RAM. This reduces the circuit scale, and the resulting improvement in operating speed shortens the processing time.

In addition, restricting the address values that can be assigned to each RAM 100 reduces the bit width of the memory configuration information and of the address addr supplied to the memory groups, which reduces the circuit scale. This improves the operating speed and shortens the processing time. As a specific method of restricting the assignment of address values, when an address space is configured from RAMs 100 having different storage capacities, smaller address values are assigned in the address space in order starting from the RAM 100 with the smallest storage capacity. Separately from this, when RAMs 100 of the same storage capacity exist, smaller address values may be assigned in order starting from the storage capacity shared by the largest number of RAMs 100. Smaller address values may also be assigned to the RAMs 100 that can be used by a plurality of arithmetic units 12.

For example, in the processing device 10 shown in FIG. 4, the values that the determination bit signal may take can be restricted to 0 or 1 only for the RAMs 100e and 100f of the common memory circuit 60, to 0 through 5 for the RAM 100a of the individual memory circuit 70, to 0 through 3 for the RAM 100b, and to 0 through 3 for the RAMs 100c and 100d of the individual memory circuit 80. In this case, 1 bit suffices for set_b of the RAMs 100e and 100f of the common memory circuit 60, and 2 bits suffice for set_b of the RAMs 100b, 100c, and 100d. If the assignment of address values is not restricted, set_b requires 3 bits in every case, so restricting the assignment has the advantage of reducing the circuit scale. Likewise, the address addr sent to the common memory circuit 60 requires a bit width of 13 bits when there is no restriction; with the above restriction, 12 bits suffice if, at the output of the arithmetic units 12a and 12b, the twelfth bit of the address addr sent to the common memory circuit 60 is replaced by the logical sum (OR) of the twelfth and thirteenth bits.
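The bit-width figures and the OR-based address narrowing can be checked numerically. The following sketch assumes that bits are numbered from 1 at the least significant bit and that the determination bit signal is the address from the eleventh bit upward, as in the FIG. 6 example; the function names are hypothetical.

```python
TOTAL_BYTES = 2048 + 5 * 1024   # RAM 100a plus RAMs 100b-100f = 7168 bytes


def set_b_width(max_decision_value: int) -> int:
    """Bits needed to hold determination values 0..max_decision_value."""
    return max(1, max_decision_value.bit_length())


def compress_addr(addr13: int) -> int:
    """Narrow a 13-bit address to 12 bits for the common memory circuit 60.

    Bits 1-11 are kept; the new bit 12 is the OR of the old bits 12 and 13.
    """
    low11 = addr13 & 0x7FF
    bit12 = (addr13 >> 11) & 1
    bit13 = (addr13 >> 12) & 1
    return low11 | ((bit12 | bit13) << 11)


def common_hit(addr12: int, set_b: int) -> bool:
    """Access determination in the common memory circuit (set_b is 0 or 1)."""
    return (addr12 >> 10) == set_b


def narrowing_is_safe() -> bool:
    """Check that the narrowed address gives the same hit result everywhere."""
    return all(
        common_hit(compress_addr(addr), set_b) == ((addr >> 10) == set_b)
        for addr in range(TOTAL_BYTES)
        for set_b in (0, 1)
    )
```

The OR preserves the one fact the common memory circuit needs: whether any address bit above the eleventh is set. An address outside the first 2048 bytes therefore still produces a determination value of 2 or 3 and cannot match a set_b of 0 or 1.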

FIG. 9 shows a further modification of the configuration of the processing device 10 of the present embodiment. The processing device 10 shown in FIG. 9 includes one arithmetic unit 12a, and the arithmetic unit 12a executes threads 1, 2, 3, and 4. The processing device 10 shown in FIG. 9 has a memory configuration function that can reconfigure the memory for each thread. The second selection circuits 120a to 120d are provided for threads 1 to 4, respectively, to enable data to be exchanged between each thread and the RAMs 100.

When the arithmetic unit 12 is formed of a reconfigurable circuit capable of executing a plurality of independent processes (threads) in this way, the restriction on the use of the memory groups can also be set per thread. Compared with FIG. 4, this configuration can be regarded as one in which the arithmetic units 12 are simply replaced by threads. The only difference is that the unit of processing is the arithmetic unit 12 in FIG. 4, whereas it is the thread in FIG. 9. It follows that the memory configuration technique of the present embodiment can be applied per unit of processing.

The processing device 10 may also be configured to include a plurality of arithmetic units 12, with an individual memory circuit provided for each arithmetic unit 12 and with all the arithmetic units 12 able to access a common memory circuit. With this configuration, each individual memory circuit can be a memory dedicated to one arithmetic unit, which reduces the circuit scale, while the common memory circuit is shared by all the arithmetic units, which preserves the flexibility of the memory configuration.

The hardware configuration of the processing device 10 has been described above. The following describes a technique for sharing a memory address space in multithread processing in order to use the memory still more effectively. Multithread processing may be executed on a plurality of arithmetic units 12, but to simplify the explanation, the address space sharing technique is described below with reference to the configuration of the processing device 10 shown in FIG. 9.

FIG. 10 shows an example of the configuration of the arithmetic unit 12. This arithmetic unit 12 forms an address space sharing circuit in which a plurality of threads share a memory address space. In this address space sharing circuit, a memory is allocated to each of threads 1 to 4, and thread 3 can share the address space of the memory of thread 1. The address space sharing circuit is realized by an output selection circuit 170 and input selection circuits 180 and 182.

The MUXs 160a, 160b, 160c, and 160d are provided on the input side of the connection units 52, and each selects and outputs the read data from the RAM 100 allocated to the thread being processed by the ALU column 53 in the stage below the corresponding connection unit 52. Specifically, the MUX 160a selects, from the read data from the RAMs 100 of the threads, the data of the thread processed by the first-stage ALU column 53a and supplies it to the connection unit 52a. The MUXs 160e, 160f, 160g, and 160h are provided on the output side of the connection units 52, and each selects and outputs, from the data used to access (write or read) the RAM 100, the data from the connection unit 52 in the stage below the ALU column 53 that is processing a specific thread. Specifically, the MUX 160e selects, from the thread data processed by the ALU columns 53, the processed data of thread 4 and outputs it to the RAM 100 allocated to thread 4. These MUXs 160 may be controlled by a control signal output from a quaternary (modulo-4) counter (not shown).

FIG. 11 shows the operation of the output selection circuit 170 when thread 1 and thread 3 share an address space. When the memory address space is not shared by a plurality of threads, the selection made by the MUXs 160e, 160f, 160g, and 160h allows the intended memory to be accessed from the ALU column 53 of every stage. When thread 1 and thread 3 share the memory address space as shown in FIG. 11, on the other hand, the memory can be accessed only from the first-stage ALU column 53a and the second-stage ALU column 53b, and the input of read data from the memory is limited to the third-stage ALU column 53c and the fourth-stage ALU column 53d. Limiting the stages here means restricting the input of read data to specific stages.

When thread 3 does not use the memory for thread 1, the output selection circuit 170 always selects the memory access data of thread 1. When thread 3 uses the memory for thread 1, the output selection circuit 170 selects the memory access data of thread 1 while thread 1 is being processed by the first-stage ALU column 53a or the second-stage ALU column 53b, and at other times, that is, while thread 3 is being processed by the first-stage ALU column 53a or the second-stage ALU column 53b, selects the memory access data of thread 3. In other words, when the address space is shared, the output selection circuit 170 operates so as to always select, of its two inputs in FIG. 11, the input drawn with a solid line. In this example, the circuit selects the output data of thread 1 processed by the ALU column 53a at clock T, the output data of thread 1 processed by the ALU column 53b at clock (T+1), the output data of thread 3 processed by the ALU column 53a at clock (T+2), and the output data of thread 3 processed by the ALU column 53b at clock (T+3).
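The four-clock sequence above can be reproduced with a simple timing model. The model is an assumption made for illustration: it supposes that the four threads pass through the four ALU stages in round-robin order, advancing one stage per clock, with thread 1 in the first stage at clock T (taken as clock 0). The function names are hypothetical.

```python
NUM_THREADS = 4


def thread_at(stage: int, clock: int) -> int:
    """Thread (1-4) processed by ALU stage `stage` (1-4) at `clock`.

    Assumed round-robin pipeline: thread 1 is in stage 1 at clock 0 and
    every thread moves down one stage per clock.
    """
    return (clock - (stage - 1)) % NUM_THREADS + 1


def shared_access_source(clock: int):
    """(thread, stage) whose access data reaches thread 1's memory.

    With sharing enabled, only stages 1 and 2 (ALU columns 53a, 53b) may
    access the memory, and only threads 1 and 3 use it.
    """
    sources = [(thread_at(stage, clock), stage)
               for stage in (1, 2)
               if thread_at(stage, clock) in (1, 3)]
    if len(sources) != 1:
        raise RuntimeError("simultaneous access to the shared address space")
    return sources[0]


schedule = [shared_access_source(clock) for clock in range(4)]
```

Under this assumed pipeline, `schedule` lists thread 1 via stage 1, thread 1 via stage 2, thread 3 via stage 1, and thread 3 via stage 2, matching clocks T to (T+3) in the description, and no clock has two accessing stages.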

The input selection circuits 180 and 182 are circuits that convert the control signal supplied from the quaternary counter to the MUXs 160c and 160d. When the memory address space is not shared, the MUX 160c always selects, from the read data from the memories of the threads, the data of the thread being processed by the ALU column 53c. This is the same as when the input selection circuit 180 is absent. When the memory address space is shared, the MUX 160c selects the data from the memory for thread 1 when the third-stage ALU column 53c is processing thread 3, and otherwise selects the data from the memory of the thread being processed by the third-stage ALU column 53c.

Similarly, when the memory address space is not shared, the MUX 160d always selects, from the read data from the memories of the threads, the data of the thread being processed by the ALU column 53d. This is the same as when the input selection circuit 182 is absent. When the memory address space is shared, the MUX 160d selects the data from the memory for thread 1 when the fourth-stage ALU column 53d is processing thread 3, and otherwise selects the data from the memory of the thread being processed by the fourth-stage ALU column 53d. That is, as shown in FIG. 11, the data from the memory for thread 1 is input to the third-stage ALU column 53c or the fourth-stage ALU column 53d, whichever is running thread 1 or thread 3.
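The conversion performed by the input selection circuits 180 and 182 can be sketched as a remapping of the thread number that selects the read source. As before, the round-robin pipeline timing in `thread_at` is an assumption made for illustration, and the function names are hypothetical.

```python
NUM_THREADS = 4


def thread_at(stage: int, clock: int) -> int:
    """Thread (1-4) in ALU stage `stage` at `clock` (assumed round-robin)."""
    return (clock - (stage - 1)) % NUM_THREADS + 1


def read_source(stage: int, clock: int, shared: bool) -> int:
    """Thread whose memory feeds MUX 160c (stage 3) or 160d (stage 4).

    Without sharing, the stage reads the memory of the thread it runs.
    With sharing, thread 3 is redirected to the memory for thread 1.
    """
    running = thread_at(stage, clock)
    return 1 if (shared and running == 3) else running


def readers_of_thread1_memory(clock: int, shared: bool):
    """Stages among 3 and 4 that read thread 1's memory at this clock."""
    return [stage for stage in (3, 4)
            if read_source(stage, clock, shared) == 1]
```

In this model exactly one of the third and fourth stages takes data from the memory for thread 1 at every clock when sharing is enabled, so the read side of the shared address space is never requested by two stages at once.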

In this way, when a memory address space is shared by a plurality of threads, the stages of the reconfigurable circuit that can access the memory are limited and the access timing is controlled so that the shared address space is not accessed simultaneously, which prevents a plurality of threads from accessing the memory at the same time.

FIG. 12 shows a memory configuration example in which thread 1 and thread 3 share a memory address space in the processing device 10 shown in FIG. 9. By using the memory of thread 1, thread 3 can use a larger capacity than it could when the memory address space is not shared. For example, suppose that thread 3 requires 2500 bytes of memory. Referring to FIG. 9, when the memory space is not shared, thread 3 can use at most the 2048 bytes of the RAMs 100c and 100d, so a memory capable of running the processing of thread 3 cannot be configured. The number or capacity of the RAMs 100 in the individual memory circuit 80 would therefore have to be increased. If the memory address space can be shared with thread 1 as in the present embodiment, however, thread 3 can use 2500 bytes of memory.

Suppose further that the processing of thread 1 requires 500 bytes of memory. When the memory address space is not shared, using one RAM with a capacity of 1024 bytes leaves 524 bytes unused. By sharing a 3072-byte memory address space with thread 3, which requires 2500 bytes, the unused capacity falls to only 72 bytes, and the memory is used effectively. Because the sharing function enables efficient use of the memory in this way, the number and capacity of the RAMs 100 can be kept small and the circuit scale can be reduced. Furthermore, because thread 3 shares memory with thread 1, thread 4 can use all the RAMs 100c and 100d of the individual memory circuit 80, which widens the range of processing that thread 4 can execute.
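The capacity figures in this example follow from simple subtraction, shown here as a short check. The helper name `unused_bytes` is introduced only for this illustration.

```python
def unused_bytes(capacity: int, *requirements: int) -> int:
    """Bytes left over after the listed requirements are placed in a memory."""
    remaining = capacity - sum(requirements)
    if remaining < 0:
        raise ValueError("requirements exceed the configured capacity")
    return remaining


# Not shared: thread 1 alone in one 1024-byte RAM.
alone = unused_bytes(1024, 500)

# Shared: threads 1 and 3 together in a 3072-byte address space.
shared = unused_bytes(3072, 500, 2500)

# Not shared: thread 3 cannot fit in RAMs 100c and 100d (2 x 1024 bytes).
thread3_fits_unshared = 2500 <= 2 * 1024
```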

In addition, when a plurality of threads share a memory address space, data can be passed between them through the memory, which makes processing more efficient and shortens the processing time. The example here shares the memory address space by restricting, by stage of the reconfigurable circuit, the memory accesses of threads within one arithmetic unit 12a. However, threads of different arithmetic units 12 may share an address space, and arithmetic units 12 may share an address space with each other. A thread of one arithmetic unit 12 may also share an address space with another arithmetic unit 12. Furthermore, three or more arithmetic units 12 or threads may share an address space.

FIG. 13 shows a configuration example with two arithmetic units 12. In this example, RAMs 100 are allocated to the threads of the arithmetic units 12a and 12b. The arithmetic units 12a and 12b form an address space sharing circuit in which the threads they execute share a memory address space. In this address space sharing circuit, memory is allocated to each of threads 1 to 4 executed by the arithmetic unit 12a and to each of threads 5 to 8 executed by the arithmetic unit 12b. Thread 7 shares a memory address space with thread 1, and the two threads are processed by different arithmetic units, 12b and 12a respectively. The address space sharing circuit is implemented by the output selection circuit 170a and the input selection circuits 180a and 182a, which operate as described with reference to FIG. 10.

FIG. 14 shows another configuration example with two arithmetic units 12. Here, RAMs 100 are allocated to each of the arithmetic units 12a and 12b, which form an address space sharing circuit that shares a memory address space. When the memory address space is not shared, the output selection circuit 170b always selects the output of the arithmetic unit 12a. When it is shared, the output selection circuit 170b selects the output of the arithmetic unit 12a or the arithmetic unit 12b according to a control signal from a quaternary counter (not shown). For example, the output selection circuit 170b may select the output of the arithmetic unit 12a for the first two count values of the quaternary counter and the output of the arithmetic unit 12b for the last two. Likewise, when the memory address space is not shared, the input selection circuit 180b always causes the MUX 160 to output the data from the memory for the arithmetic unit 12b; when it is shared, it selects the data from the memory for the arithmetic unit 12a or from the memory for the arithmetic unit 12b. In this example, when sharing is performed, the memory accesses of the arithmetic unit 12a are restricted to, for example, the first and second stages and its inputs to the third and fourth stages, while the memory accesses of the arithmetic unit 12b are restricted to the third and fourth stages and its inputs to the first and second stages.
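The selection behavior described for the output selection circuit 170b can be modeled in software as follows. This is a behavioral sketch only, not the circuit itself; the function name `select_output`, its parameters, and the choice of counting from 0 to 3 are assumptions made for illustration.

```python
def select_output(shared, count, out_a, out_b):
    """Behavioral model of the output selection described for circuit 170b.

    shared: True when the two arithmetic units share the address space.
    count:  value of the quaternary counter, assumed here to run 0, 1, 2, 3.
    out_a:  output of arithmetic unit 12a.
    out_b:  output of arithmetic unit 12b.
    """
    if not shared:
        # Without sharing, the output of unit 12a is always selected.
        return out_a
    # With sharing, the first two counts select 12a and the last two select 12b.
    return out_a if count % 4 < 2 else out_b


# One full counter cycle with sharing enabled.
cycle = [select_output(True, c, "12a", "12b") for c in range(4)]
print(cycle)
```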

When the arithmetic units 12 do not have a multi-stage configuration and, for example, the two arithmetic units 12a and 12b share a memory address space, the access timing may be controlled so that they access the memory alternately. Making the memory address space shareable between threads, between arithmetic units, or between a thread and an arithmetic unit in this way reduces the circuit scale and shortens the processing time, which lowers the power consumption of the processing device 10 and improves its processing performance.

The following describes a method for efficiently allocating memory to the arithmetic units 12 or threads in a processing device 10 whose connections to the memory are restricted. The entity to which memory is allocated is hereinafter called a "target process". When the unit of processing in the processing device 10 is a thread, the thread is the target process. In a processing device 10 that does not perform multi-thread processing, each arithmetic unit 12 executes one process, so the arithmetic unit 12 is the target process.

FIG. 15 shows the configuration of a processing device 10 in which the connections between the memories and the target processes are restricted in hardware. The common memory circuit 210 has RAMs 200e, 200f, and 200g, the individual memory circuit 220 has RAMs 200a, 200b, and 200c, and the individual memory circuit 230 has a RAM 200d. The target processes 250 and 260 can use the common memory circuit 210 and the individual memory circuit 220, and the target process 270 can use the common memory circuit 210 and the individual memory circuit 230. In FIG. 15, the number attached to each RAM 200 indicates its memory capacity (unitless). For example, the RAM 200a has a memory capacity of 128 and the RAM 200c has a memory capacity of 32. The number attached to each target process indicates the amount of data that it needs to store in memory. For example, the target process 250 needs to store a data amount of 190 and the target process 260 a data amount of 60. A technique for allocating memory to the target processes 250, 260, and 270 is described below.
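The configuration of FIG. 15 can be written down as plain data, which makes the access restrictions easy to inspect. The dictionary layout and the names `CAPACITY`, `ACCESS`, `DEMAND`, and `usable_total` are assumptions made for this sketch. The capacities of the RAMs 200b and 200d to 200g and the data amount of the target process 270 are taken from the walkthrough that follows in this description.

```python
# Memory capacity of each RAM 200 (unitless).
CAPACITY = {"200a": 128, "200b": 128, "200c": 32, "200d": 32,
            "200e": 32, "200f": 32, "200g": 32}

COMMON_210 = {"200e", "200f", "200g"}
INDIVIDUAL_220 = {"200a", "200b", "200c"}
INDIVIDUAL_230 = {"200d"}

# RAMs that each target process can reach in hardware.
ACCESS = {
    "250": COMMON_210 | INDIVIDUAL_220,
    "260": COMMON_210 | INDIVIDUAL_220,
    "270": COMMON_210 | INDIVIDUAL_230,
}

# Data amount that each target process must store in memory.
DEMAND = {"250": 190, "260": 60, "270": 30}


def usable_total(process):
    """Total capacity of the RAMs that a target process can access."""
    return sum(CAPACITY[ram] for ram in ACCESS[process])


for p in ACCESS:
    print(p, usable_total(p))
```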

FIGS. 16(a), 16(b), and 16(c) show a flowchart of the memory allocation process. The following describes an example of allocating memory to each target process when the memories and the target processes are connected as shown in FIG. 15. The memory allocation process distributes the data amount handled by each target process to the RAMs 200 one after another. In this flowchart, "unallocated data" means data that has not yet been allocated to a RAM 200. For example, once the RAM 200a has been allocated to the target process 250, its unallocated data amount is 62 (= 190 - 128). The memory allocation process is executed by the compiling unit 30 shown in FIG. 1. Before performing the memory allocation process, the compiling unit 30 acquires the amount of data that each target process must hold, as well as the capacities of the usable memories.

First, it is determined whether any target process has unallocated data (S10). If memory has been allocated for all target processes (N in S10), the flow ends. At the start of the memory allocation process, no memory has yet been allocated to any target process (Y in S10), so the flow proceeds to S12.

In S12, among the target processes with unallocated data, the one with the smallest total usable memory capacity is set as process group a. The total usable memory capacity of a target process is the total capacity of the memories that it can access. In FIG. 15, because of the memory access restrictions, the target process 270 has the smallest total usable memory capacity, so it is set as process group a. Next, among the target processes in process group a, the one with the largest unallocated data amount is set as process x (S14). Here, process group a contains only the target process 270, so the target process 270 is identified as process x. Next, all RAMs usable by process x are set as candidate memories b (S16). Referring to FIG. 15, the RAMs 200d, 200e, 200f, and 200g usable by the target process 270 are identified as candidate memories b. These candidate memories b form a candidate memory group.
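Steps S12 to S16 can be sketched as a single selection function. The function name `select_process_and_candidates`, its parameters, and the `free` set of not-yet-allocated RAMs are assumptions introduced for illustration; the data are those of FIG. 15.

```python
CAPACITY = {"200a": 128, "200b": 128, "200c": 32, "200d": 32,
            "200e": 32, "200f": 32, "200g": 32}
ACCESS = {
    "250": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "260": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "270": {"200d", "200e", "200f", "200g"},
}


def select_process_and_candidates(left, access, capacity, free):
    """Return (process group a, process x, candidate memories b)."""
    pending = [p for p in left if left[p] > 0]
    # S12: keep the processes whose total accessible capacity is smallest.
    total = {p: sum(capacity[r] for r in access[p]) for p in pending}
    smallest = min(total.values())
    group_a = [p for p in pending if total[p] == smallest]
    # S14: within group a, take the process with the most unallocated data.
    x = max(group_a, key=lambda p: left[p])
    # S16: every RAM that x can access and that is still unallocated.
    candidates_b = access[x] & free
    return group_a, x, candidates_b


# State at the start of the allocation: nothing allocated yet.
left = {"250": 190, "260": 60, "270": 30}
group_a, x, candidates_b = select_process_and_candidates(
    left, ACCESS, CAPACITY, set(CAPACITY))
print(group_a, x, sorted(candidates_b))
```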

When candidate memories b can be set for the target process 270 in this way (Y in S18), they are sorted in descending order of capacity and the order is retained (S20). Memories with the same capacity are sorted in ascending order of the number of target processes that might use them. The candidate memories b here, the RAMs 200d, 200e, 200f, and 200g, all have the same capacity of 32. However, the RAM 200d can be used only by the target process 270, whereas the RAMs 200e, 200f, and 200g in the common memory circuit 210 can be used by the three target processes 250, 260, and 270. The candidate memories are therefore sorted in the order RAM 200d, 200e, 200f, 200g, and this order is retained. Determining the order in which allocation of the RAMs 200 is attempted according to S20 makes a successful allocation more likely and also keeps the total capacity of the allocated memory small.
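The ordering rule of S20 maps naturally onto a sort with a compound key. In the sketch below, the function name `sort_candidates` is an assumption, and the final tie-break by RAM name is also an assumption, since the text does not say how memories with equal capacity and an equal number of possible users are ordered.

```python
CAPACITY = {"200a": 128, "200b": 128, "200c": 32, "200d": 32,
            "200e": 32, "200f": 32, "200g": 32}
ACCESS = {
    "250": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "260": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "270": {"200d", "200e", "200f", "200g"},
}


def sort_candidates(candidates, capacity, access):
    """Order candidate memories as in S20."""
    def users(ram):
        # Number of target processes that could use this RAM.
        return sum(ram in rams for rams in access.values())

    # Larger capacity first, then fewer possible users, then name (assumed).
    return sorted(candidates, key=lambda r: (-capacity[r], users(r), r))


order_270 = sort_candidates(ACCESS["270"], CAPACITY, ACCESS)
order_250 = sort_candidates(ACCESS["250"], CAPACITY, ACCESS)
print(order_270)
print(order_250)
```

Placing the RAM 200d first means that the target process 270 consumes its private memory before touching the common memory circuit 210, which the other target processes may still need.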

First, i, which indicates the position in the retained order of the RAMs 200, is set to 1 (S22), and the i-th RAM 200 of the candidate memory group is set as candidate memory c (S24). Here, the RAM 200d is set as candidate memory c. Next, it is determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here, the capacity of candidate memory c (RAM 200d) is 32 and the unallocated data amount of process x (target process 270) is 30, so the capacity of candidate memory c is larger than the unallocated data amount of process x (N in S26).

Next, the target processes in process group a that have the same usable memories as process x are set as process group d (which includes process x). The unallocated data amount of each process in process group d is rounded up to the smallest multiple of the minimum memory capacity that is larger than it, and the sum of these values is set as e (S28). Here, process group a contains only the target process 270, so process group d is the target process 270, whose unallocated data amount is 30. The minimum memory capacity is 32, and the smallest multiple of 32 larger than 30 is 32, so e = 32. This step determines how many memories of the minimum capacity are needed to hold the unallocated data, expressing the unallocated data amount in units of the minimum memory capacity. Its purpose is to check, in S28 through S32, whether all unallocated data can be allocated using only memories smaller than candidate memory c. Suppose, for example, that there are two processes with unallocated data amounts of 10 and 20. A memory of capacity 32 must actually be allocated to each process, but if the amounts were not expressed in units of the minimum memory capacity, their total of 30 could wrongly suggest that a single memory of capacity 32 suffices. Expressing the amounts in units of the minimum memory capacity avoids this. Next, in S30, the total capacity of the memories in the candidate memory group that are smaller than candidate memory c is set as f. None of the RAMs 200d to 200g is smaller than candidate memory c (RAM 200d), so f = 0.
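The quantities e (S28) and f (S30) can be computed with two small helpers. The function names are assumptions, and rounding an amount that is already an exact multiple to itself (rather than to the next multiple) is also an assumption, since the text only gives examples that are not exact multiples.

```python
def round_up_to_unit(amount, unit):
    """Smallest multiple of unit that can hold amount (ceiling)."""
    return -(-amount // unit) * unit


def compute_e(unallocated_amounts, unit):
    """S28: sum of per-process amounts, each rounded up to the unit."""
    return sum(round_up_to_unit(a, unit) for a in unallocated_amounts)


def compute_f(candidate_capacities, c_capacity):
    """S30: total capacity of candidates smaller than candidate memory c."""
    return sum(cap for cap in candidate_capacities if cap < c_capacity)


UNIT = 32  # minimum memory capacity in FIG. 15

# Target process 270 with candidate memory c = RAM 200d.
e = compute_e([30], UNIT)
f = compute_f([32, 32, 32, 32], 32)
print(e, f, e > f)

# The two-process example: rounding per process gives 64, not 32.
print(compute_e([10, 20], UNIT), round_up_to_unit(10 + 20, UNIT))
```

Rounding each process separately reflects that one RAM is not split between two target processes in this allocation scheme.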

Then e and f are compared (S32). Here e (= 32) > f (= 0) (Y in S32), so candidate memory c is allocated to the unallocated data of process x (S34). The RAM 200d is thereby allocated to the target process 270. All data of process x (target process 270) has now been allocated, so process x has no unallocated data left (N in S36) and the flow returns to S10.

At this point, the target processes 250 and 260 still have unallocated data (Y in S10). In S12, the target process with the smallest total usable memory capacity is selected as process group a. The target processes 250 and 260 have the same total usable memory capacity, so both are set as process group a. Next, in S14, the target process 250, which has the larger unallocated data amount, is set as process x, and in S16, the RAMs 200a, 200b, 200c, 200e, 200f, and 200g usable by the target process 250 are set as candidate memories b. These candidate memories b form a candidate memory group.

Since candidate memories b exist (Y in S18), they are sorted in descending order of capacity in S20 and the order is retained. Memories with the same capacity are sorted in ascending order of the number of target processes that might use them. The candidate memories b are thus sorted in the order RAM 200a, 200b, 200c, 200e, 200f, 200g, and this order is retained.

Next, i, which indicates the position in the retained order of the RAMs 200, is set to 1 (S22), and the i-th RAM 200 of the candidate memory group is set as candidate memory c (S24). Here, the RAM 200a is set as candidate memory c. It is then determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here, the capacity of candidate memory c (RAM 200a) is 128 and the unallocated data amount of process x (target process 250) is 190, so the capacity of candidate memory c is smaller than the unallocated data amount of process x (Y in S26). Candidate memory c is therefore allocated to the unallocated data of process x (S34), and the RAM 200a is thereby allocated to the target process 250. Process x (target process 250) still has unallocated data (Y in S36), so the flow returns to S12. The unallocated data amount of the target process 250 is now 62 (= 190 - 128).

Next, in S12, the target process 250 (unallocated data amount 62) and the target process 260 (unallocated data amount 60) are selected as process group a. In S14, the target process 250, which has the larger unallocated data amount in process group a, is set as process x. In S16, the RAMs 200b, 200c, 200e, 200f, and 200g usable by process x are set as candidate memories b. Since candidate memories exist (Y in S18), they are sorted in S20 and the order RAM 200b, 200c, 200e, 200f, 200g is retained. With i = 1 (S22), the RAM 200b is first set as candidate memory c in S24.

Next, it is determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here, the capacity of candidate memory c (RAM 200b) is 128 and the unallocated data amount of process x (target process 250) is 62 (N in S26). In S28, the target processes in process group a that have the same usable memories as process x are set as process group d (which includes process x), and e is the sum of their unallocated data amounts, each rounded up to the smallest multiple of the minimum memory capacity that is larger than it. Process group a consists of the target processes 250 and 260, which have the same usable memories, so process group d is the target processes 250 and 260. Their unallocated data amounts are 62 and 60, respectively. The minimum memory capacity is 32, and the smallest multiples of 32 larger than 62 and 60 are 64 and 64, respectively. Therefore e = 128 (= 64 + 64).

Next, in S30, the total capacity of the memories in the candidate memory group that are smaller than candidate memory c (RAM 200b) is set as f. The RAMs 200c, 200e, 200f, and 200g are smaller than candidate memory c (RAM 200b), so f = 128 (= 32 + 32 + 32 + 32). Since e = f (N in S32), i is incremented by 1 (S38), and it is determined whether the i-th and (i-1)-th memories of the candidate memory group have the same capacity (S40). The second memory (RAM 200c) and the first memory (RAM 200b) differ in capacity (N in S40), so the flow returns to S24. In S24, the second memory, the RAM 200c, is set as candidate memory c.

In S26, the capacity of candidate memory c (32) is less than or equal to the unallocated data amount of process x (62) (Y in S26), so candidate memory c is allocated to the unallocated data of process x (S34), and the RAM 200c is thereby allocated to the target process 250. Process x (target process 250) still has unallocated data (Y in S36), so the flow returns to S12. The unallocated data amount of the target process 250 is now 30 (= 62 - 32).

Next, in S12, the target process 250 (unallocated data amount 30) and the target process 260 (unallocated data amount 60) are selected as process group a. In S14, the target process 260, which has the larger unallocated data amount in process group a, is set as process x. In S16, the RAMs 200b, 200e, 200f, and 200g usable by process x are set as candidate memories b. Since candidate memories exist (Y in S18), they are sorted in S20 and the order RAM 200b, 200e, 200f, 200g is retained. With i = 1 (S22), the RAM 200b is first set as candidate memory c in S24.

Next, it is determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here, the capacity of candidate memory c (RAM 200b) is 128 and the unallocated data amount of process x (target process 260) is 60 (N in S26). In S28, the target processes in process group a that have the same usable memories as process x are set as process group d (which includes process x), and e is the sum of their unallocated data amounts, each rounded up to the smallest multiple of the minimum memory capacity that is larger than it. Process group a consists of the target processes 250 and 260, which have the same usable memories, so process group d is the target processes 250 and 260. Their unallocated data amounts are 30 and 60, respectively. The minimum memory capacity is 32, and the smallest multiples of 32 larger than 30 and 60 are 32 and 64, respectively. Therefore e = 96 (= 32 + 64).

Next, in S30, the total capacity of the memories in the candidate memory group that are smaller than candidate memory c (RAM 200b) is set as f. The RAMs 200e, 200f, and 200g are smaller than candidate memory c (RAM 200b), so f = 96 (= 32 + 32 + 32). Since e = f (N in S32), i is incremented by 1 (S38), and it is determined whether the i-th and (i-1)-th memories of the candidate memory group have the same capacity (S40). The second memory (RAM 200e) and the first memory (RAM 200b) differ in capacity (N in S40), so the flow returns to S24. In S24, the second memory, the RAM 200e, is set as candidate memory c.

In S26, the capacity of candidate memory c (32) is less than or equal to the unallocated data amount of process x (60) (Y in S26), so candidate memory c is allocated to the unallocated data of process x (S34), and the RAM 200e is thereby allocated to the target process 260. Process x (target process 260) still has unallocated data (Y in S36), so the flow returns to S12. The unallocated data amount of the target process 260 is now 28 (= 60 - 32).

Next, in S12, the target process 250 (unallocated data amount 30) and the target process 260 (unallocated data amount 28) are selected as process group a. In S14, the target process 250, which has the larger unallocated data amount in process group a, is set as process x. In S16, the RAMs 200b, 200f, and 200g usable by process x are set as candidate memories b. Since candidate memories exist (Y in S18), they are sorted in S20 and the order RAM 200b, 200f, 200g is retained. With i = 1 (S22), the RAM 200b is first set as candidate memory c in S24.

Next, it is determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here, the capacity of candidate memory c (RAM 200b) is 128 and the unallocated data amount of process x (target process 250) is 30 (N in S26). In S28, the target processes in process group a that have the same usable memories as process x are set as process group d (which includes process x), and e is the sum of their unallocated data amounts, each rounded up to the smallest multiple of the minimum memory capacity that is larger than it. Process group a consists of the target processes 250 and 260, which have the same usable memories, so process group d is the target processes 250 and 260. Their unallocated data amounts are 30 and 28, respectively. The minimum memory capacity is 32, and the smallest multiples of 32 larger than 30 and 28 are 32 and 32, respectively. Therefore e = 64 (= 32 + 32).

Next, in S30, the total capacity of the memories in the candidate memory group that are smaller than candidate memory c (RAM 200b) is set as f. The RAMs 200f and 200g are smaller than candidate memory c (RAM 200b), so f = 64 (= 32 + 32). Since e = f (N in S32), i is incremented by 1 (S38), and it is determined whether the i-th and (i-1)-th memories of the candidate memory group have the same capacity (S40). The second memory (RAM 200f) and the first memory (RAM 200b) differ in capacity (N in S40), so the flow returns to S24. In S24, the second memory, the RAM 200f, is set as candidate memory c.

In S26, the capacity of candidate memory c (32) is larger than the unallocated data amount of process x (30) (N in S26). In S28, e = 64 as before, and in S30, f = 0 because no memory in the candidate memory group is smaller than candidate memory c (RAM 200f). Since e > f (Y in S32), candidate memory c is allocated to the unallocated data of process x (S34), and the RAM 200f is thereby allocated to the target process 250. Process x (target process 250) now has no unallocated data left (N in S36), and the flow returns to S10.
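The loop traced so far can be reproduced with a compact simulation. This is a sketch under stated assumptions, not the flowchart of FIG. 16 itself: the function name `allocate`, the data layout, the error raised when no candidate memory exists (N in S18), the skipping of equal-capacity memories when S40 answers Y, the tie-break by RAM name in the sort, and taking the minimum memory capacity over all RAMs are all assumptions, because the text up to this point does not describe those branches.

```python
CAPACITY = {"200a": 128, "200b": 128, "200c": 32, "200d": 32,
            "200e": 32, "200f": 32, "200g": 32}
ACCESS = {
    "250": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "260": {"200a", "200b", "200c", "200e", "200f", "200g"},
    "270": {"200d", "200e", "200f", "200g"},
}
DEMAND = {"250": 190, "260": 60, "270": 30}


def allocate(capacity, access, demand):
    """Return the (target process, RAM) pairs in allocation order."""
    left = dict(demand)            # unallocated data per target process
    free = set(capacity)           # RAMs not yet allocated
    unit = min(capacity.values())  # minimum memory capacity (assumed global)
    users = {r: sum(r in rams for rams in access.values()) for r in capacity}
    total = {p: sum(capacity[r] for r in access[p]) for p in demand}
    order = []
    while any(v > 0 for v in left.values()):                       # S10
        pending = [p for p in left if left[p] > 0]
        smallest = min(total[p] for p in pending)
        group_a = [p for p in pending if total[p] == smallest]     # S12
        x = max(group_a, key=lambda p: left[p])                    # S14
        cands = sorted(access[x] & free,                           # S16, S20
                       key=lambda r: (-capacity[r], users[r], r))
        if not cands:                                              # N in S18 (assumed)
            raise RuntimeError(f"no RAM left for target process {x}")
        i = 0                                                      # S22 (0-based here)
        while True:
            c = cands[i]                                           # S24
            if capacity[c] <= left[x]:                             # S26
                break
            group_d = [p for p in group_a if access[p] == access[x]]
            e = sum(-(-left[p] // unit) * unit for p in group_d)   # S28
            f = sum(capacity[r] for r in cands
                    if capacity[r] < capacity[c])                  # S30
            if e > f:                                              # S32
                break
            i += 1                                                 # S38
            # S40 (assumed): skip memories of the same capacity. Because
            # e <= f implies f > 0, a strictly smaller memory exists later.
            while capacity[cands[i]] == capacity[cands[i - 1]]:
                i += 1
        left[x] = max(0, left[x] - capacity[c])                    # S34
        free.remove(c)
        order.append((x, c))
    return order


result = allocate(CAPACITY, ACCESS, DEMAND)
for process, ram in result:
    print(process, ram)
```

Under these assumptions, the first five entries of `result` follow the sequence traced above: the RAM 200d for the target process 270, the RAMs 200a and 200c for the target process 250, the RAM 200e for the target process 260, and the RAM 200f for the target process 250.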

At this point, the target process 260 still has unallocated data (Y in S10), and in S12 it is selected as process group a. In S14, the target process 260 is set as process x, and in S16, the RAMs 200b and 200g usable by the target process 260 are set as candidate memories b.

Since candidate memories b exist (Y in S18), they are sorted in descending order of capacity in S20, and that order is retained. Here, candidate memories b are sorted in the order RAM 200b, RAM 200g.

The index i, which indicates the position in the retained order of the RAMs 200, is set to 1 (S22), and the i-th RAM 200 in the candidate memory group is set as candidate memory c (S24). Here, RAM 200b is set as candidate memory c. Next, it is determined whether the capacity of candidate memory c is less than or equal to the unallocated data amount of process x (S26). Here the capacity of candidate memory c (RAM 200b) is 128 and the unallocated data amount of process x (target process 260) is 28, so the capacity of candidate memory c is greater than the unallocated data amount of process x (N in S26).

Next, among process group a, the target processes whose usable memories are the same as those of process x form process group d (which includes process x). The unallocated data amount of process group d is expressed as the smallest multiple of the minimum memory capacity that is larger than that amount, and e is the sum of these values (S28). Here, process group a consists only of target process 260, so process group d is target process 260, and its unallocated data amount is 28. The minimum memory capacity is 32, and the smallest multiple of 32 that is larger than 28 is 32. Therefore e = 32. Next, in S30, f is set to the total capacity of the memories in the candidate memory group whose capacity is smaller than that of candidate memory c. Of RAMs 200b and 200g, the only memory smaller than candidate memory c (RAM 200b) is RAM 200g, so f = 32.
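The calculation of e (S28) and f (S30) for this example can be sketched in Python as follows. This is only an illustrative sketch: the function names `rounded_demand` and `smaller_total` are hypothetical, the "minimum memory capacity" is assumed to be the smallest capacity among the candidate memories, and an amount that is already an exact multiple is assumed to be left unchanged.

```python
import math

def rounded_demand(unallocated_amounts, unit):
    # S28 (assumed reading): round each unallocated amount in process
    # group d up to a multiple of `unit`, then sum the results to get e.
    return sum(math.ceil(u / unit) * unit for u in unallocated_amounts)

def smaller_total(capacities, c_capacity):
    # S30: total capacity of the candidate memories that are strictly
    # smaller than candidate memory c, which gives f.
    return sum(cap for cap in capacities if cap < c_capacity)

# Values from the walkthrough: target process 260 has 28 units unallocated
# and can use RAM 200b (128) and RAM 200g (32).
candidates = {"200b": 128, "200g": 32}
unit = min(candidates.values())  # assumption: minimum over the candidates

e = rounded_demand([28], unit)
f = smaller_total(candidates.values(), candidates["200b"])
print(e, f)
```

With RAM 200b as candidate memory c, both values come out as 32, which matches the figures given in the text.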

In this case e = f (N in S32), so i is incremented by 1 (S38), and it is determined whether the i-th memory and the (i-1)-th memory in the candidate memory group have the same capacity (S40). Since the second memory (RAM 200g) and the first memory (RAM 200b) differ in capacity (N in S40), the flow returns to S24. In S24, the second memory, RAM 200g, is set as candidate memory c.

Now the capacity of candidate memory c (RAM 200g) is 32 and the unallocated data amount of process x (target process 260) is 28, so the capacity of candidate memory c is greater than the unallocated data amount of process x (N in S26).

Next, among process group a, the target processes whose usable memories are the same as those of process x form process group d (which includes process x). The unallocated data amount of process group d is expressed as the smallest multiple of the minimum memory capacity that is larger than that amount, and e is the sum of these values (S28). Here, process group a consists only of target process 260, so process group d is target process 260, and its unallocated data amount is 28. The minimum memory capacity is 32, and the smallest multiple of 32 that is larger than 28 is 32. Therefore e = 32. Next, in S30, f is set to the total capacity of the memories in the candidate memory group whose capacity is smaller than that of candidate memory c. Of RAMs 200b and 200g, no memory is smaller than candidate memory c (RAM 200g), so f = 0.

Here e and f are compared (S32). In this case e (= 32) > f (= 0) (Y in S32), so candidate memory c is allocated to the unallocated data of process x (S34). RAM 200g is thereby allocated to target process 260. All the data of process x (target process 260) has now been allocated, so process x has no unallocated data (N in S36), and the flow returns to S10. In S10, memory allocation has been completed for all target processes (N in S10), so the flow ends.
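The candidate selection loop (S20 to S40) for a single process x can be sketched in Python as below. This is a hedged illustration, not the flowchart itself: `pick_memory` and its parameters are hypothetical names, the rounding unit is assumed to be the smallest candidate capacity, and the Y branch of S40 (which this passage does not describe) is assumed to skip memories whose capacity equals that of the previous one.

```python
import math

def pick_memory(candidates, unallocated, group_unallocated=None):
    """Return the name of the memory chosen for process x, or None."""
    # Process group d defaults to process x alone.
    group = group_unallocated if group_unallocated is not None else [unallocated]
    # S20: sort candidate memories in descending order of capacity.
    ordered = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    unit = min(candidates.values())  # assumption: minimum memory capacity
    i = 0                            # S22 (0-based here, 1-based in the text)
    while i < len(ordered):
        name, cap = ordered[i]       # S24: candidate memory c
        if cap <= unallocated:       # S26 Y: c fits entirely within the data
            return name
        e = sum(math.ceil(u / unit) * unit for u in group)  # S28
        f = sum(c for c in candidates.values() if c < cap)  # S30
        if e > f:                    # S32 Y: smaller memories cannot cover e
            return name
        i += 1                       # S38
        # S40 (assumed Y branch): skip memories with the same capacity.
        while i < len(ordered) and ordered[i][1] == cap:
            i += 1
    return None

# Target process 260: 28 units unallocated, RAM 200b (128) and RAM 200g (32).
chosen = pick_memory({"200b": 128, "200g": 32}, 28)
print(chosen)
```

For the values in the walkthrough, the sketch passes over RAM 200b (e = f = 32) and selects RAM 200g (e = 32 > f = 0), in line with the result described above.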

If no candidate memory b exists in S18 (N in S18), it is determined whether there is a memory x that is usable by process x, has already been allocated, and whose current owner process can have its allocation changed to another memory (S42). If no such memory x exists (N in S42), allocation is not possible and the flow ends. If such a memory x exists (Y in S42), the process to which memory x is currently allocated has its allocation changed from memory x to another memory (S44), memory x is allocated to the unallocated data of process x (S46), and the flow proceeds to S36. In this way, when no candidate memory b exists, the memory allocation process is carried out again by changing an allocation that has already been made.
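The fallback in S42 to S46 can be sketched in Python as follows. The text does not state the exact condition under which an owner process "can be reallocated", so this sketch assumes the owner can move if a free memory exists that it can use and that is at least as large as the one it gives up. The function name `reassign_for`, the process names `P` and `Q`, and the memory names `m1` and `m2` are all hypothetical.

```python
def reassign_for(process, usable, capacity, assigned):
    """Try to free a memory for `process`; return its name or None.

    usable:   process name -> set of memory names it is connected to
    capacity: memory name -> capacity
    assigned: memory name -> owner process (updated in place)
    """
    free = [m for m in capacity if m not in assigned]
    for mem_x, owner in list(assigned.items()):        # S42: look for memory x
        if owner == process or mem_x not in usable[process]:
            continue
        for alt in free:
            # Assumed condition: the owner can move to a free memory that it
            # can use and that is at least as large as memory x.
            if alt in usable[owner] and capacity[alt] >= capacity[mem_x]:
                assigned[alt] = owner                  # S44: move the owner
                assigned[mem_x] = process              # S46: give x to process
                return mem_x
    return None                                        # S42 N: cannot allocate

# Hypothetical example: P can only reach m1, which Q currently holds.
usable = {"P": {"m1"}, "Q": {"m1", "m2"}}
capacity = {"m1": 32, "m2": 32}
assigned = {"m1": "Q"}
freed = reassign_for("P", usable, capacity, assigned)
print(freed, assigned)
```

In this example Q is moved to m2 so that m1 becomes available to P. A real implementation would also have to re-check the amount of data each process holds, which the sketch leaves out.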

FIG. 17 shows the RAMs allocated to the target processes.

As described above, the compiling unit 30 can automatically allocate memory to each target process, which improves the user's development efficiency. Executing this allocation process also keeps the total capacity of the memories in use as small as possible. Because a memory generally consumes more power as its capacity increases, reducing the total capacity of the memories in use reduces memory power consumption. For example, suppose that in FIG. 15 this method is not used and the usable RAMs 200 are instead allocated, starting from target process 250, in the alphabetical order of their reference suffixes. Then RAMs 200a and 200b are allocated to target process 250, RAMs 200c and 200e to target process 260, and RAMs 200d and 200f to target process 270. The total capacity of the allocated RAMs 200 in that case is 384, which is larger than the total capacity (320) of the allocated RAMs 200 in FIG. 17.

The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a basic configuration diagram of a processing device according to an embodiment.
FIG. 2 is a diagram showing the configuration of an arithmetic unit.
FIG. 3 is an explanatory diagram for describing multithread processing.
FIG. 4 is a diagram showing an example of the configuration of the processing device of this embodiment.
FIG. 5 is a diagram showing a modification of the configuration of the processing device of this embodiment.
FIG. 6 is a diagram showing an example in which main memory is configured for each arithmetic unit using RAMs.
FIG. 7 is a diagram showing the configuration of the common memory circuit.
FIG. 8 is a diagram showing the circuit configuration of the second selection circuit.
FIG. 9 is a diagram showing a further modification of the configuration of the processing device of this embodiment.
FIG. 10 is a diagram showing an example of the configuration of an arithmetic unit.
FIG. 11 is a diagram showing the operation of the output selection circuit when thread 1 and thread 3 share an address space.
FIG. 12 is a diagram showing a memory configuration example when thread 1 and thread 3 share a memory address space in the processing device shown in FIG. 9.
FIG. 13 is a diagram showing a configuration example of two arithmetic units.
FIG. 14 is a diagram showing another configuration example of two arithmetic units.
FIG. 15 is a diagram showing the configuration of a processing device in which the connections between memories and target processes are restricted in hardware.
FIG. 16 is a flowchart of the memory allocation process, drawn in three parts.
FIG. 17 is a diagram showing the RAMs allocated to the target processes.

Explanation of symbols

10 ... processing device, 12 ... arithmetic unit, 14 ... setting unit, 18 ... control unit, 20 ... internal state holding circuit, 24 ... first feedback path, 26 ... integrated circuit device, 27 ... main memory, 29 ... second feedback path, 30 ... compiling unit, 31 ... data flow graph processing unit, 32 ... setting data generation unit, 34 ... storage unit, 36 ... program, 38 ... data flow graph, 40 ... setting data, 50 ... logic circuit, 60 ... common memory circuit, 70, 80 ... individual memory circuits, 100 ... RAM, 110 ... first selection circuit, 120 ... second selection circuit, 130 ... memory control circuit, 138 ... access determination circuit, 140 ... delay circuit, 142 ... CE generation circuit, 150 ... read control circuit, 152, 154 ... read data conversion circuits, 156, 158 ... read data selection circuits, 170 ... output selection circuit, 180, 182 ... input selection circuits, 200 ... RAM, 210 ... common memory circuit, 220, 230 ... individual memory circuits.

Claims

1. An arithmetic processing device capable of changing functions, comprising:
a plurality of arithmetic units that perform arithmetic processing;
a plurality of storage areas; and
a storage area configuration circuit that forms an address space for each arithmetic unit using at least one storage area,
wherein at least one storage area is connected to only some of the plurality of arithmetic units.

2. An arithmetic processing device capable of changing functions, comprising:
at least one arithmetic unit that performs a plurality of independent arithmetic processes;
a plurality of storage areas; and
a storage area configuration circuit that forms an address space for each arithmetic process using at least one storage area,
wherein at least one storage area is connected to the arithmetic unit so as to be usable by only some of the plurality of arithmetic processes.

3. The arithmetic processing device according to claim 1, wherein the storage area configuration circuit includes:
a control circuit that selects, based on some bits of a supplied address signal, the storage area to which data is written or from which data is read, from among the at least one storage area constituting the address space; and
a read data selection circuit that selects one of the data items read from the at least one storage area,
and wherein the control circuit is provided for each storage area.

4. The arithmetic processing device, wherein, when the address space is made up of storage areas having different storage capacities, smaller address values are assigned in order starting from the storage area with the smaller storage capacity.

5. The arithmetic processing device according to any one of claims 1 to 3, wherein the address space is made up of a plurality of storage areas and, when storage areas having the same storage capacity exist, smaller address values are assigned in order starting from the group containing the largest number of storage areas of the same storage capacity.

6. The arithmetic processing apparatus according to claim 1, wherein a single address space can be shared by a plurality of arithmetic processes.

7. The arithmetic processing device, wherein, when an address space constituted by the same storage area is shared by a plurality of arithmetic processes, access timing is controlled so that the plurality of arithmetic processes do not access the shared address space at the same time.

8. The arithmetic processing device according to claim 7, wherein the arithmetic unit has a multistage array structure of logic circuits, and
when a first arithmetic process is enabled to use the address space configured for a second arithmetic process, the stages that output to, or receive input from, the address space are restricted so that the first arithmetic process and the second arithmetic process do not access the shared address space at the same time.

9. The arithmetic processing device according to claim 1, wherein the storage areas include a storage area usable by all of the arithmetic processes and a storage area usable by only some of the plurality of arithmetic processes.

10. A storage area allocation method for determining the storage area to be allocated to each arithmetic process in an arithmetic processing device that can execute a plurality of arithmetic processes and has a storage area configuration circuit that configures storage areas for the arithmetic processes, the method comprising:
obtaining the amount of data that each arithmetic process must hold;
obtaining the capacity of the usable storage areas; and
allocating, to each arithmetic process, storage areas usable by that process according to the amount of data that the process must hold.

11. The storage area allocation method according to claim 10, further comprising selecting, from among the arithmetic processes for which storage area allocation has not been completed, the arithmetic process whose usable storage areas have the smallest total capacity,
wherein storage areas are allocated starting from the selected arithmetic process.

12. The arithmetic processing device according to any one of claims 1 to 9, wherein the storage area configuration circuit configures the address space using the allocation relationship of the storage areas allocated to each arithmetic process by the storage area allocation method according to claim 10 or 11.