JP6897213B2

JP6897213B2 - Code generation device, code generation method, and code generation program

Info

Publication number: JP6897213B2
Application number: JP2017058550A
Authority: JP
Inventors: 敏也平田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-24
Filing date: 2017-03-24
Publication date: 2021-06-30
Anticipated expiration: 2037-03-24
Also published as: JP2018163381A

Description

The present disclosure relates to techniques for generating code that can be executed by a processor.

Recent information processing devices (such as computers) sometimes use processors that can execute SIMD (Single Instruction Multiple Data) instructions. A SIMD instruction is an instruction (a parallel execution instruction) that executes the same arithmetic operation in parallel on multiple data items. SIMD instructions therefore allow operations on multiple data items to be performed with fewer instructions.

The following techniques that use SIMD operation instructions are known.

Patent Document 1 describes a technique used while converting source code written in a high-level language into an intermediate representation. The technique finds places where the same operation is applied to different data and replaces them with SIMD instructions.

Patent Document 2 describes a technique with three steps. It generates operation trees that represent dependencies in the source code. It then extracts a common sub-operation sequence from the operation sequences formed by arranging the operation instructions of each tree. Finally, it combines the instructions in that sub-sequence to generate SIMD instructions.

Patent Document 3 describes a technique that, when compiling source code, generates a tree representing dependencies between instructions and replaces part of that tree with a compound operation instruction.

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-202991
Patent Document 2: Japanese Unexamined Patent Publication No. 2013-206289
Patent Document 3: Japanese Unexamined Patent Publication No. 2015-143939

Consider a program in which several arithmetic instructions of the same type have no dependency on one another and can therefore be executed in parallel. A dependency would be, for example, one instruction referring to the result of another. In this case, these instructions can be combined into a SIMD instruction. For example, Patent Document 1 replaces an operation instruction that is executed on multiple data items with a SIMD instruction.

Other cases are harder. Arithmetic instructions of the same type may depend on one another, or different arithmetic instructions (for example, addition/subtraction and multiplication) may be executed in succession. In such cases, the instructions cannot always simply be replaced with a SIMD instruction.

The methods of Patent Documents 2 and 3 are also restricted. They target a specific combination of instructions in a program: a multiplication instruction plus an addition instruction that refers to its result. Such a combination can be executed in parallel as a SIMD instruction only when it can be replaced with a multiply-add instruction (an FMA (Fused Multiply-Add) instruction). These techniques therefore limit which instructions can be converted into SIMD instructions. As a result, for parts of a program that contain different types of instructions, it can be difficult to improve execution efficiency, for example by using SIMD instructions.

The technology according to the present disclosure was conceived in view of the above circumstances. One of its main objectives is to provide a technique that improves the execution efficiency of programs containing several different types of instructions.

A code generation device according to one aspect of the technology of the present disclosure includes a code analysis unit and an instruction generation unit.

The code analysis unit examines one or more operation instructions included in analysis target code (a computer program). Based on the dependencies between operation instructions of different types, it determines whether those instructions can be converted into a parallel execution instruction. Such an instruction executes, in parallel on multiple data items, a fused operation instruction that can carry out different types of operations as a single instruction.

The instruction generation unit first converts the operation instructions judged convertible into fused operation instructions. It then generates data in which the operands of those fused operation instructions are arranged in a format executable as the parallel execution instruction. In this way it generates the parallel execution instruction that executes the fused operation instructions in parallel.

A code generation method according to one aspect of the technology of the present disclosure includes two steps. The first step determines whether one or more operation instructions included in analysis target code (a computer program composed of multiple operation instructions) can be converted into a parallel execution instruction. This determination is based on the types of the operation instructions and the dependencies between them. A parallel execution instruction here executes, in parallel on multiple data items, a fused operation instruction that can carry out different types of operations as a single instruction.

The second step converts the operation instructions judged convertible into fused operation instructions. It then generates data in which the operands of those fused operation instructions are arranged in a format executable as the parallel execution instruction, thereby generating the parallel execution instruction that executes the fused operation instructions in parallel.

The above objective is also achieved by a computer program that implements, on a computer, the code generation device having the above configuration and the corresponding code generation method. It is likewise achieved by a computer-readable storage medium that stores that computer program.

The technology according to the present disclosure can improve the execution efficiency of programs that contain several different types of instructions.

FIG. 1 is an explanatory diagram showing an example of processing by a SIMD instruction.
FIG. 2 is an explanatory diagram showing a specific example of converting multiple instructions into one SIMD instruction.
FIG. 3 is an explanatory diagram showing a specific example of multiple different instructions.
FIG. 4 is a block diagram illustrating the functional configuration of the code generation device according to the first embodiment of the present disclosure.
FIG. 5A is a flowchart (1/2) showing a specific example of the operation of the code generation device.
FIG. 5B is a flowchart (2/2) showing a specific example of the operation of the code generation device.
FIG. 6 is an explanatory diagram illustrating a method of converting simple arithmetic operations into FMA operations.
FIG. 7 is an explanatory diagram outlining the process of generating a SIMD instruction that executes multiple FMA operations in parallel from simple arithmetic instructions included in the analysis target code.
FIG. 8 is an explanatory diagram showing a specific example of analysis target code.
FIG. 9 is an explanatory diagram illustrating the dependencies between the instructions in the analysis target code illustrated in FIG. 8.
FIG. 10 is an explanatory diagram illustrating a SIMD instruction converted from instructions in the analysis target code illustrated in FIG. 8.
FIG. 11 is an explanatory diagram showing a specific example of FMA operations converted from a given arithmetic operation using two methods.
FIG. 12 is a block diagram illustrating the configuration of an information processing device that can execute object code generated from analysis target code in each embodiment of the present disclosure.
FIG. 13 is a block diagram illustrating the functional configuration of a code generation device according to the second embodiment of the present disclosure.
FIG. 14 is an explanatory diagram illustrating a hardware configuration that can implement each embodiment.

First, the technical considerations relating to the present disclosure are described in more detail.

In general, a SIMD instruction executes the same arithmetic operation on multiple data items in parallel (simultaneously) with a single instruction, as illustrated in FIG. 1. In the specific example of FIG. 1, one SIMD instruction performs additions on four data items.
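The following Python sketch models the lane-wise behavior of FIG. 1, where one instruction applies the same addition to four data items. The function name `simd_add` and the fixed width of four lanes are illustrative assumptions. Real hardware processes all lanes in a single instruction, whereas this model simply iterates over them.

```python
from typing import Sequence

LANES = 4  # assumed vector width, matching the four-element example of FIG. 1


def simd_add(a: Sequence[float], b: Sequence[float]) -> list[float]:
    """Model one SIMD add: the same '+' is applied to every lane."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"expected {LANES} lanes per operand")
    return [x + y for x, y in zip(a, b)]


# One modeled "instruction" stands in for four scalar additions.
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```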

Using SIMD instructions, therefore, arithmetic on multiple data items can be performed with fewer instructions. This reduces code size and improves program execution time.

For example, as illustrated in FIG. 2, when code contains two addition instructions, some compilers can convert them into a SIMD instruction that performs both additions in parallel (simultaneously). A pair of different instructions is another matter; FIG. 3 illustrates an addition instruction and a multiplication instruction. Such a pair typically cannot be converted directly into a SIMD instruction, so each instruction is executed sequentially. In that case, it is difficult to improve program execution efficiency with SIMD instructions.
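The contrast between FIG. 2 and FIG. 3 can be sketched as the opcode-matching rule a conventional packer might apply. Under that rule, two instructions are candidates for one SIMD instruction only if they perform the same operation. The `Instr` type and the `can_pack_same_op` function below are hypothetical, and the rule is deliberately simplified. A real vectorizer would also check dependencies, operand types, and data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dst = src1 <op> src2."""
    op: str
    dst: str
    src1: str
    src2: str


def can_pack_same_op(i: Instr, j: Instr) -> bool:
    """Simplified conventional rule: both instructions must use the same opcode."""
    return i.op == j.op


add1 = Instr("add", "t1", "a", "b")
add2 = Instr("add", "t2", "c", "d")
mul1 = Instr("mul", "t3", "e", "f")

print(can_pack_same_op(add1, add2))  # True  (FIG. 2: two additions)
print(can_pack_same_op(add1, mul1))  # False (FIG. 3: addition and multiplication)
```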

The technology according to the present disclosure was inspired by these circumstances. It reduces code size by generating code (for example, programs such as object code or executable code) in which different types of instructions can be executed as SIMD instructions. This also improves program execution time.

The following embodiments describe in detail a code generation device, a code generation method, and related elements that can implement the technology according to the present disclosure.

Each of the following embodiments describes a technique for generating code (object code, executable code, etc.) for a processor with an arithmetic unit that can execute SIMD instructions. More specifically, the unit can execute SIMD-type fused operations (for example, FMA operations); a SIMD-type FMA unit is one example.

A fused operation is an operation, such as an FMA operation, that executes several types of operations as a single operation. An arithmetic unit capable of SIMD-type fused operations can execute an instruction representing a fused operation (a fused operation instruction) as one instruction. For convenience, the following description uses the FMA operation as the fused operation. The technology according to the present disclosure is not limited to this, and fused operations other than FMA may also be used. Hereinafter, an instruction that uses a SIMD-type FMA unit may be referred to as a SIMD-FMA instruction.

An FMA unit includes a multiplier and an adder and can execute a multiply-add operation with a single instruction. A multiply-add operation is, for example, an operation of the form ((A * B) + C), which adds a value to the result of a multiplication. A SIMD-type FMA unit can process several (for example, two) sets of FMA operations with a single instruction.
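As a rough model of the units described above, the sketch below treats a scalar FMA as `(A * B) + C` and a SIMD-type FMA unit as one such operation per lane. The names `fma` and `simd_fma` are assumptions for illustration. Plain Python rounds the product and the sum separately, while a hardware FMA rounds only once, so this model can differ from real hardware in the last bit for some inputs.

```python
from typing import Sequence


def fma(a: float, b: float, c: float) -> float:
    """Scalar multiply-add (A * B) + C.

    Plain Python rounds the product and the sum separately; a hardware FMA
    rounds once, so the last bit can differ for some inputs.
    """
    return a * b + c


def simd_fma(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> list[float]:
    """Model a SIMD-type FMA unit: one FMA per lane, issued as one instruction."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all operand vectors must have the same number of lanes")
    return [fma(x, y, z) for x, y, z in zip(a, b, c)]


# Two sets of FMA operations in one call, matching the two-set example in the text.
print(simd_fma([2.0, 3.0], [4.0, 5.0], [1.0, 1.0]))  # [9.0, 16.0]
```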

The technology according to the present disclosure makes it possible, for example, to determine whether operation instructions in the code to be analyzed (for example, source code) can be converted into SIMD-FMA instructions. The determination depends on the types of those instructions and the dependencies between them. Based on the result, the instructions can then be converted into SIMD-FMA instructions. In doing so, the technology can also gather (pack) the data (operands) processed by a SIMD-FMA instruction into a format that can be placed in the registers that instruction references.
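The sketch below shows the overall flow in Python. Operations of different types are converted into FMA form, and their operands are then packed so that one SIMD-FMA instruction executes them together. The helper names, the two-lane width, and the choice of which operand receives the dummy value are assumptions for illustration, not the implementation of the disclosure. The dummy values 1 and 0 follow the conversions described for FIG. 6.

```python
from typing import Sequence


def simd_fma(va: Sequence[float], vb: Sequence[float], vc: Sequence[float]) -> list[float]:
    """Lane-wise a * b + c, standing in for one SIMD-FMA instruction."""
    return [a * b + c for a, b, c in zip(va, vb, vc)]


def to_fma_operands(op: str, x: float, y: float) -> tuple[float, float, float]:
    """Rewrite a simple arithmetic op as (a, b, c) so that a * b + c gives the same value."""
    if op == "add":
        return (x, 1.0, y)   # x + y == x * 1 + y
    if op == "sub":
        return (x, 1.0, -y)  # x - y == x * 1 + (-y)
    if op == "mul":
        return (x, y, 0.0)   # x * y == x * y + 0
    raise ValueError(f"unsupported operation: {op}")


# Two independent instructions of different types: t1 = a + b and t2 = c * d.
a, b, c, d = 2.0, 3.0, 4.0, 5.0
lanes = [to_fma_operands("add", a, b), to_fma_operands("mul", c, d)]

# Pack: operand position k of every lane goes into vector k.
va, vb, vc = (list(col) for col in zip(*lanes))
t1, t2 = simd_fma(va, vb, vc)
print(t1, t2)  # 5.0 20.0
```

The two-lane width mirrors the "for example, two" sets of FMA operations mentioned earlier. A wider unit would only change how many lanes are gathered per vector.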

Embodiments that can implement the technology according to the present disclosure are described below. The configurations of the devices described in each embodiment are examples, and the scope of the technology is not limited to them. The way the components of each device are divided (for example, into functional units) is likewise only one example of how the technology can be realized. Various other configurations are possible: each component illustrated below may be further divided, and one or more components may be integrated.

The technology according to the present disclosure may be implemented using a single device (physical or virtual) or multiple separate devices (physical or virtual). When multiple devices are used, they may be communicably connected through a wired network, a wireless network, or an appropriate combination of the two. The communication network may be physical or virtual. Hardware configurations that can implement each embodiment are described later.

<First Embodiment>
The first embodiment of the technology according to the present disclosure is described below.

[Configuration]
FIG. 4 is a block diagram illustrating the functional configuration of the code generation device 100 according to this embodiment. As illustrated in FIG. 4, the code generation device 100 includes a compiler 101 and may further include a file management unit 104. These components may be communicably connected to each other by any appropriate method, such as inter-process communication, shared memory, or various APIs (Application Programming Interfaces). Each component is described below.

The compiler 101 receives source code as input and performs lexical analysis, syntax analysis, semantic analysis, object code generation, and related processing. In this way, the compiler 101 converts (compiles) the source code into object code. During this process, the compiler 101 can perform optimizations. Their aims include reducing the size of the object code and improving the execution speed of the executable code ultimately generated from it. The components that perform these optimizations are described below.

The compiler 101 includes a code analysis unit 102 and an instruction generation unit 103.

The following describes a configuration in which the code analysis unit 102 analyzes source code or intermediate code. Intermediate code is code generated by applying lexical analysis, syntax analysis, semantic analysis, and similar processing to the source code. Hereinafter, source code and intermediate code may be collectively referred to as analysis target code.

The code analysis unit 102 analyzes the analysis target code and determines whether the operation instructions it contains can be converted into SIMD instructions. For example, the code analysis unit 102 can determine whether operations inside a loop (iterative processing) in the analysis target code can be converted into SIMD instructions. It can also determine whether operations that are executed sequentially in the analysis target code can be converted into SIMD instructions.

The code analysis unit 102 includes a search unit 102a that searches the analysis target code for operations (for example, addition/subtraction, multiplication, division, and FMA operations). For example, the search unit 102a can search for operations inside loops in the analysis target code. It may provide its search results to the dependency analysis unit 102b, which is described later.
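The disclosure does not specify an intermediate representation. The following sketch therefore assumes a hypothetical three-address form (`Instr`) for a loop body. The search step is modeled as a filter that records the position and opcode of each arithmetic operation. The opcode set `ARITH_OPS` is an assumption based on the examples in the text.

```python
from dataclasses import dataclass

# Opcodes treated as arithmetic operations (assumed from the examples in the text).
ARITH_OPS = {"add", "sub", "mul", "div", "fma"}


@dataclass(frozen=True)
class Instr:
    op: str                # opcode, e.g. "add", "mul", "load"
    dst: str               # destination variable
    srcs: tuple[str, ...]  # source operands


def search_operations(loop_body: list[Instr]) -> list[tuple[int, Instr]]:
    """Return (position, instruction) for each arithmetic operation in the loop body."""
    return [(pos, ins) for pos, ins in enumerate(loop_body) if ins.op in ARITH_OPS]


body = [
    Instr("load", "a", ("x[i]",)),
    Instr("add", "t1", ("a", "b")),
    Instr("mul", "t2", ("c", "d")),
    Instr("store", "y[i]", ("t1",)),
]
found = search_operations(body)
print([(pos, ins.op) for pos, ins in found])  # [(1, 'add'), (2, 'mul')]
```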

The code analysis unit 102 also includes a dependency analysis unit 102b. This unit analyzes the dependencies between the operations found by the search unit 102a and determines whether those operations can be converted into SIMD instructions. For example, it can analyze the dependencies between operations inside a loop found by the search unit 102a and decide whether they can be converted.
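A common way to decide whether two operations depend on each other is to check the classic register hazards: read-after-write, write-after-read, and write-after-write. The sketch below applies that check to a hypothetical `Instr` type. It assumes register-only operands and does not model memory aliasing or loop-carried dependences, which a real dependency analysis would also have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: tuple[str, ...]


def depends(first: Instr, second: Instr) -> bool:
    """True if `second` cannot be merged with or reordered around `first`.

    Checks the three classic register hazards:
      - read-after-write  (second reads what first writes)
      - write-after-read  (second overwrites something first reads)
      - write-after-write (both write the same destination)
    """
    raw = first.dst in second.srcs
    war = second.dst in first.srcs
    waw = first.dst == second.dst
    return raw or war or waw


add = Instr("add", "t1", ("a", "b"))
mul_indep = Instr("mul", "t2", ("c", "d"))
mul_dep = Instr("mul", "t3", ("t1", "d"))  # reads the result of `add`

print(depends(add, mul_indep))  # False: candidate for one SIMD instruction
print(depends(add, mul_dep))    # True: must stay sequential
```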

The instruction generation unit 103 converts operations in the analysis target code into SIMD instructions, according to the results of the analysis performed by the code analysis unit 102.

The instruction generation unit 103 includes an operation conversion unit 103a. This unit converts operation instructions (for example, addition, subtraction, and multiplication instructions) into FMA form. The instruction generation unit 103 also includes a SIMD instruction generation unit 103b, which works in three steps. It selects two FMA operations that can be combined, based on the dependencies among multiple FMA operations. It then shapes (packs) the data representing the operands of the SIMD instruction. Finally, it generates a SIMD-FMA instruction.
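The selection and packing step can be sketched as follows, assuming FMA operations in a hypothetical three-address form `dst = a * b + c`. The sketch pairs only operations that are adjacent in program order and independent of each other. It then emits a textual two-lane `simd_fma` whose vector operands gather the same operand position from each lane. The pairing policy and the pseudo-instruction syntax are illustrative assumptions, not the method of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fma:
    """Hypothetical FMA in three-address form: dst = a * b + c (operand names)."""
    dst: str
    a: str
    b: str
    c: str

    def reads(self) -> set[str]:
        return {self.a, self.b, self.c}


def independent(x: Fma, y: Fma) -> bool:
    """Neither reads the other's result, and they do not write the same destination."""
    return x.dst not in y.reads() and y.dst not in x.reads() and x.dst != y.dst


def pair_and_pack(fmas: list[Fma]) -> list[str]:
    """Pair adjacent independent FMAs into two-lane SIMD-FMA pseudo-instructions."""
    out: list[str] = []
    i = 0
    while i < len(fmas):
        x = fmas[i]
        if i + 1 < len(fmas) and independent(x, fmas[i + 1]):
            y = fmas[i + 1]
            # Pack operand position k of both lanes into one vector {lane0,lane1}.
            out.append(
                f"simd_fma {{{x.dst},{y.dst}}} = {{{x.a},{y.a}}} * "
                f"{{{x.b},{y.b}}} + {{{x.c},{y.c}}}"
            )
            i += 2
        else:
            out.append(f"fma {x.dst} = {x.a} * {x.b} + {x.c}")
            i += 1
    return out


# t1 = a + b and t2 = c * d after FMA conversion ("one"/"zero" hold dummy values);
# t3 uses both results, so it cannot join either lane.
fmas = [
    Fma("t1", "a", "one", "b"),
    Fma("t2", "c", "d", "zero"),
    Fma("t3", "t1", "t2", "zero"),
]
for line in pair_and_pack(fmas):
    print(line)
# simd_fma {t1,t2} = {a,c} * {one,d} + {b,zero}
# fma t3 = t1 * t2 + zero
```

Restricting pairs to adjacent instructions keeps the model safe. Merging non-adjacent operations would also require checking every instruction between them, which this sketch does not do.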

The code analysis unit 102 and the instruction generation unit 103 may be implemented as components of the compiler 101 that perform part of its optimization processing. The compiler 101 may also be configured to perform typical processing such as lexical analysis, syntax analysis, semantic analysis, and object generation. This typical processing may be implemented with well-known techniques, for example.

The file management unit 104 is configured to store and manage source code, object code, and similar data. In this embodiment, the file management unit 104 may manage them as files, in which case it may be implemented using a file system. This embodiment is not limited to that approach: the file management unit 104 may instead use a technology other than a file system, such as a database.

The code generation device 100 may, for example, generate object code to be executed by the code generation device 100 itself. It may also generate object code executable by another information processing device 1200, as illustrated in FIG. 12. The information processing device 1200 may be, for example, a computer that includes a processor 1201, a memory 1202, and a storage 1203.

The processor 1201 may be a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). At minimum, it includes a SIMD-type FMA unit and registers that can store the operands used in SIMD operations (sometimes referred to as SIMD-FMA registers). The processor 1201 may also have any configuration capable of the typical computation performed today. The memory 1202 is a storage device that the processor 1201 can access, and it can store the object code executed by the processor 1201. The storage 1203 consists of, for example, a non-volatile storage device (such as a hard disk drive or semiconductor flash memory) and can store object code. The configuration of the information processing device 1200 illustrated in FIG. 12 is one specific example, and this embodiment is not limited to it.

[Operation]
The operation of the code generation device 100 configured as described above is described below.

FIG. 5A is a flowchart showing an example of the processing in the code analysis unit 102. The code analysis unit 102 analyzes the analysis target code and, using the search unit 102a, identifies the types of operations it contains (step S501). For example, the code analysis unit 102 may analyze the loops in the analysis target code and use the search unit 102a to identify the types of operations inside them.

The search unit 102a selects the addition/subtraction and multiplication operations in the analysis target code. It then searches for combinations of these operations that can be converted into SIMD instructions (step S502). In doing so, it may, for example, look for an addition/subtraction instruction and a multiplication instruction whose operands have the same type and precision.
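Step S502 can be sketched as enumerating pairs of an addition/subtraction and a multiplication whose operands have the same type and precision. The `Op` record and type labels such as `"f64"` are hypothetical. The dependency check of step S503 is intentionally left out here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Op:
    op: str     # "add", "sub" or "mul"
    dst: str
    dtype: str  # operand type and precision, e.g. "f64" or "f32"


def candidate_pairs(ops: list[Op]) -> list[tuple[Op, Op]]:
    """Pair each add/sub with each mul whose operands share type and precision."""
    addsub = [o for o in ops if o.op in ("add", "sub")]
    muls = [o for o in ops if o.op == "mul"]
    return [(x, m) for x, m in product(addsub, muls) if x.dtype == m.dtype]


ops = [Op("add", "t1", "f64"), Op("mul", "t2", "f64"), Op("mul", "t3", "f32")]
print([(x.dst, m.dst) for x, m in candidate_pairs(ops)])  # [('t1', 't2')]
```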

The dependency analysis unit 102b analyzes whether each combination of operations found by the search unit 102a contains a dependency. Based on the result, it determines whether those instructions can be converted into SIMD instructions (step S503). For example, suppose several operations belong to the same loop and none of them depends on another. In that case, the dependency analysis unit 102b may determine that the combination has no dependency.

If step S503 finds a dependency (YES in step S504), the code analysis unit 102 decides not to convert those instructions into SIMD instructions. The reason is that converting dependent instructions into a SIMD instruction could change the computed values. The code analysis unit 102 may exclude those instructions from the candidates for conversion into SIMD instructions (step S505).

If the analysis finds no dependency (NO in step S504), the code analysis unit 102 determines that the instructions can be converted into SIMD instructions. In this case, the code analysis unit 102 may, for example, pass the result of this processing to the instruction generation unit 103.

The instruction generation unit 103 converts combinations of operations that have no dependency into SIMD instructions, more specifically SIMD-FMA instructions (step S506). The processing in step S506 is described below with reference to FIG. 5B, a flowchart that illustrates an example of the processing in the instruction generation unit 103.

Using the operation conversion unit 103a, the instruction generation unit 103 converts the addition/subtraction and multiplication operations into FMA form (step S508).

A method of converting a non-FMA operation in the analysis target code into an FMA operation is described below with reference to the explanatory diagram in FIG. 6. A non-FMA operation is, for example, a simple arithmetic operation such as an addition, subtraction, multiplication or division.

For an addition in the analysis target code, the instruction generation unit 103 multiplies one operand by the value "1" as a dummy multiplication, that is, a nominal multiplication that does not affect the result. This converts the addition into an FMA operation (601 in FIG. 6). Hereinafter, converting an addition into an FMA operation may be referred to as the first operation conversion. The dummy operand it uses ("1" in this example) may be referred to as the first dummy operand.

For a subtraction in the analysis target code, the instruction generation unit 103 multiplies one operand by the value "1" as a dummy multiplication and inverts the sign of the subtrahend operand. This converts the subtraction into an FMA operation (602 in FIG. 6). Hereinafter, converting a subtraction into an FMA operation may be referred to as the second operation conversion. The dummy operand it uses ("1" in this example) may be referred to as the second dummy operand.

For a multiplication in the analysis target code, the instruction generation unit 103 adds the value "0" to the product as a dummy addition, that is, a nominal addition that does not affect the result. This converts the multiplication into an FMA operation (603 in FIG. 6). Hereinafter, converting a multiplication into an FMA operation may be referred to as the third operation conversion. The dummy operand it uses ("0" in this example) may be referred to as the third dummy operand.
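The three conversions can be written as rewrites into a triple (a, b, c) meaning a * b + c. Below is a minimal Python sketch; the function names are hypothetical. `eval_fma` only emulates FMA with an ordinary multiply and add, which rounds twice, whereas a hardware FMA rounds once.

```python
# An FMA operation is represented as a triple (a, b, c) meaning a * b + c.
FIRST_DUMMY = 1.0   # multiplier inserted for an addition
SECOND_DUMMY = 1.0  # multiplier inserted for a subtraction
THIRD_DUMMY = 0.0   # addend inserted for a multiplication


def add_to_fma(x, y):
    """First operation conversion: x + y  ->  x * 1 + y."""
    return (x, FIRST_DUMMY, y)


def sub_to_fma(x, y):
    """Second operation conversion: x - y  ->  x * 1 + (-y)."""
    return (x, SECOND_DUMMY, -y)


def mul_to_fma(x, y):
    """Third operation conversion: x * y  ->  x * y + 0."""
    return (x, y, THIRD_DUMMY)


def eval_fma(t):
    """Emulate a * b + c (two roundings here; a hardware FMA rounds once)."""
    a, b, c = t
    return a * b + c


print(eval_fma(add_to_fma(2.5, 4.0)))  # 6.5
print(eval_fma(sub_to_fma(2.5, 4.0)))  # -1.5
print(eval_fma(mul_to_fma(2.5, 4.0)))  # 10.0
```

Multiplying by 1 and negating are exact in IEEE 754, so the first two conversions reproduce the original result bit for bit. The third conversion has one subtlety. Under round-to-nearest, (-0.0) + (+0.0) is +0.0, so x * y + 0 turns a negative-zero product into positive zero. An implementation that must preserve signed zeros could use -0.0 as the third dummy operand instead.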

In this way, the first, second and third operation conversions let the instruction generation unit 103 convert additions, subtractions and multiplications into FMA operations without changing their results. Consider a combination of an addition (or subtraction) and a multiplication that the code analysis unit 102 has judged convertible into a SIMD instruction. Such a combination is converted into a pair of FMA operations: one converted from the addition (or subtraction), the other from the multiplication.

The SIMD instruction generation unit 103b generates a SIMD instruction (SIMD-FMA instruction) that can execute this pair of FMA operations simultaneously, in parallel (step S508). To do so, it generates data representing the operands of the SIMD-FMA instruction. In that data, the operands of the two FMA operations are packed (placed) in the upper bits and the lower bits of a SIMD register, respectively.

The processing of the instruction generation unit 103 is described below using the specific example in FIG. 7. In this example, the analysis target code (source code) contains operations of different types, an addition and a multiplication, so these instructions cannot simply be replaced with a SIMD instruction.

First, the operation conversion unit 103a of the instruction generation unit 103 converts the addition and the multiplication in the source code into FMA operations. Next, the SIMD instruction generation unit 103b converts these FMA operations into a SIMD instruction (SIMD-FMA instruction). In doing so, it generates the operand data so that the operands of the FMA operations are packed into the upper bits and the lower bits of the SIMD register.

When the object code converted from the source code is executed, the arithmetic therefore proceeds as if two operations ran side by side. The addition "A1 + B1" runs in the upper bits of the SIMD register, and the multiplication "A2 * B2" runs in the lower bits.
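The sketch below makes the packing concrete. It models a 64-bit SIMD register as a Python integer whose upper and lower 32 bits each hold one single-precision value. It then emulates one two-lane SIMD-FMA in which the upper lane computes A1 * 1 + B1 and the lower lane computes A2 * B2 + 0.

The register model and helper names are assumptions for illustration. Lane arithmetic is done in double precision and rounded back to single precision, which only approximates a hardware single-precision FMA.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Single-precision value encoded by a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pack2(upper: float, lower: float) -> int:
    """Place two float32 values in the upper and lower halves of a 64-bit word."""
    return (f32_bits(upper) << 32) | f32_bits(lower)


def unpack2(reg: int) -> tuple:
    """Split a 64-bit word back into its (upper, lower) float32 lanes."""
    return bits_f32(reg >> 32), bits_f32(reg & 0xFFFFFFFF)


def simd_fma2(ra: int, rb: int, rc: int) -> int:
    """Two-lane SIMD-FMA: each lane independently computes a * b + c."""
    lanes = [a * b + c for a, b, c in zip(unpack2(ra), unpack2(rb), unpack2(rc))]
    return pack2(*lanes)


A1, B1, A2, B2 = 1.5, 2.25, 3.0, 0.5
ra = pack2(A1, A2)      # multiplicands: A1 | A2
rb = pack2(1.0, B2)     # multipliers:   first dummy operand 1 | B2
rc = pack2(B1, 0.0)     # addends:       B1 | third dummy operand 0
print(unpack2(simd_fma2(ra, rb, rc)))  # (3.75, 1.5)
```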

FIG. 7 illustrates a specific example in which two FMA instructions are executed as one SIMD instruction, but the present embodiment is not limited to this. Depending on the environment in which the generated object code is executed (for example, the configuration of the processor 1201), the code generation device 100 may generate a SIMD-FMA instruction that executes three or more FMA instructions as one SIMD instruction.

The following outlines the case in which the code generation device 100 generates a SIMD instruction (SIMD-FMA instruction) that can execute n FMA instructions in parallel, where n is an integer of 2 or more.

In this case, the instruction generation unit 103 may divide the SIMD-FMA register of the execution environment (for example, the processor 1201) into n equal-sized regions. It may then generate the SIMD-FMA instruction so that the operands of the n FMA instructions are placed in those regions, one instruction per region.

For example, when the SIMD-FMA register is 64 bits wide and the FMA operands are single-precision floating-point data (32 bits), two FMA instructions may be executed in parallel as one SIMD-FMA instruction. The instruction generation unit then divides the register into two regions, the upper 32 bits and the lower 32 bits. It places the operands of one of the two FMA operations in each region.

Likewise, when the SIMD-FMA register is 64 bits wide and the FMA operands are half-precision floating-point data (16 bits), four FMA instructions may be executed in parallel as one SIMD-FMA instruction. The instruction generation unit then divides the register into four 16-bit regions and places the operands of one of the four FMA operations in each region.
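The n-way split generalizes directly: the lane count is the register width divided by the element width. Below is a sketch, assuming a little-endian packing in which lane 0 occupies the lowest bits of a Python integer standing in for the SIMD-FMA register. It uses the `struct` format codes `f` (single precision) and `e` (half precision).

The helper names are hypothetical. Lane arithmetic is emulated in double precision before rounding to the element format.

```python
import struct

FORMATS = {16: "e", 32: "f"}   # struct codes: IEEE half and single precision


def lane_count(register_bits: int, element_bits: int) -> int:
    """Number n of equal regions the register is divided into."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def pack_lanes(values, element_bits: int) -> int:
    """Pack values into one integer; lane 0 occupies the lowest bits."""
    raw = struct.pack("<" + FORMATS[element_bits] * len(values), *values)
    return int.from_bytes(raw, "little")


def unpack_lanes(reg: int, n: int, element_bits: int) -> list:
    """Recover the n lane values from the packed integer."""
    raw = reg.to_bytes(n * element_bits // 8, "little")
    return list(struct.unpack("<" + FORMATS[element_bits] * n, raw))


def simd_fma(ra, rb, rc, register_bits: int, element_bits: int) -> int:
    """n-lane SIMD-FMA: lane i computes a[i] * b[i] + c[i]."""
    n = lane_count(register_bits, element_bits)
    a, b, c = (unpack_lanes(r, n, element_bits) for r in (ra, rb, rc))
    return pack_lanes([x * y + z for x, y, z in zip(a, b, c)], element_bits)


print(lane_count(64, 32), lane_count(64, 16))  # 2 4
ra = pack_lanes([1.0, 2.0, 3.0, 4.0], 16)
rb = pack_lanes([1.0, 1.0, 0.5, 2.0], 16)
rc = pack_lanes([0.5, 0.0, 0.0, 1.0], 16)
print(unpack_lanes(simd_fma(ra, rb, rc, 64, 16), 4, 16))  # [1.5, 2.0, 1.5, 9.0]
```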

An example of the effects obtained with the code generation device 100 is described below using the specific examples in FIGS. 8 to 10. These are merely examples, and the present embodiment is not limited to them.

FIG. 8 is an explanatory diagram showing an example of an instruction sequence that appears in a loop in the analysis target code (source code). All operands of the operations in FIG. 8 are assumed to be single-precision floating-point data (32 bits). The processor of the execution environment (for example, the information processing device illustrated in FIG. 12) is assumed to have a SIMD FMA arithmetic unit. It is further assumed to compute two single-precision floating-point values at once with a 64-bit-wide SIMD instruction, and to execute at most four single-precision floating-point operations per clock.

In the example in FIG. 8, the analysis target code contains the instructions (801) to (808). Instructions (801) to (804) perform additions, and instructions (805) to (808) perform multiplications. As illustrated in FIG. 9, instruction (802) refers to the result of instruction (801) ("A1"), so instruction (802) depends on instruction (801). Likewise, instructions (803) and (804) each refer to the result of the preceding instruction, so the addition instructions (801) to (804) form a chain of dependencies.

Similarly, instruction (806) refers to the result of instruction (805) ("X1"), so instruction (806) depends on instruction (805). Instructions (807) and (808) also each refer to the result of the preceding instruction, so the multiplication instructions (805) to (808) depend on one another as well. In other words, dependencies exist among operations of the same type: among the additions (801) to (804) and among the multiplications (805) to (808).

On the other hand, the multiplication instructions (805) to (808) do not refer to the results of the addition instructions (801) to (804). That is, there is no dependency between the addition instructions (801) to (804) and the multiplication instructions (805) to (808).
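FIG. 8 itself is not reproduced here, so the sketch below uses a hypothetical reconstruction consistent with the text. Instruction (801) writes "A1" and (802) reads it; (805) writes "X1" and (806) reads it; and so on, following A(n) = A(n-1) + B(n) and X(n) = X(n-1) * Y(n). Operand names beyond "A1" and "X1" are assumptions.

The sketch computes transitive dependencies in program order. It confirms that every pair of operations of the same type is dependent, while every addition/multiplication pair is independent.

```python
from itertools import combinations

# Hypothetical reconstruction of FIG. 8: label -> (dst, kind, sources)
OPS = {
    801: ("A1", "add", ("A0", "B1")),
    802: ("A2", "add", ("A1", "B2")),
    803: ("A3", "add", ("A2", "B3")),
    804: ("A4", "add", ("A3", "B4")),
    805: ("X1", "mul", ("X0", "Y1")),
    806: ("X2", "mul", ("X1", "Y2")),
    807: ("X3", "mul", ("X2", "Y3")),
    808: ("X4", "mul", ("X3", "Y4")),
}


def transitive_deps(ops):
    """For each op, the set of earlier ops it depends on, directly or indirectly."""
    order = sorted(ops)
    deps = {}
    for i, k in enumerate(order):
        dst, _, srcs = ops[k]
        found = set()
        for j in order[:i]:
            pdst, _, psrcs = ops[j]
            if pdst in srcs or dst in psrcs or pdst == dst:
                found |= {j} | deps[j]
        deps[k] = found
    return deps


def independent(a, b, deps):
    """Two ops may share one SIMD instruction only if neither depends on the other."""
    return a not in deps[b] and b not in deps[a]


deps = transitive_deps(OPS)
pairs = list(combinations(sorted(OPS), 2))
same = [independent(a, b, deps) for a, b in pairs if OPS[a][1] == OPS[b][1]]
cross = [independent(a, b, deps) for a, b in pairs if OPS[a][1] != OPS[b][1]]
print(any(same), all(cross))  # False True
```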

Typically, operations of the same type can be converted into a SIMD instruction if there is no dependency between them. By contrast, instructions of different operation types (for example, an addition instruction and a multiplication instruction) cannot simply be replaced with a SIMD instruction.

In the instruction sequence in FIG. 8, the operation instructions of the same type depend on one another, so they cannot simply be converted into SIMD instructions. The eight instructions are then executed sequentially, and sufficient execution performance is not obtained. Moreover, the processor executes only one operation per clock, which falls well short of its peak performance of up to four simultaneous operations.

By contrast, the code generation device 100 of the present embodiment can convert the instructions (801) to (808) into the FMA operations (f1) to (f8), respectively, as illustrated in FIG. 10. It determines that the following dependency-free pairs can each be executed simultaneously (in parallel): (f1) and (f5), (f2) and (f6), (f3) and (f7), and (f4) and (f8). It then converts these pairs into SIMD-FMA instructions, generating the instructions (s1) to (s4).

The code generation device 100 generates the SIMD-FMA instructions so that the operands of the original addition instructions are packed into the upper bits (upper 32 bits) of the SIMD register. The operands of the original multiplication instructions are packed into the lower bits (lower 32 bits). As a result, the instructions (s1) to (s4) behave as if the additions corresponding to (f1) to (f4) ran in the upper bits of the SIMD register while the multiplications corresponding to (f5) to (f8) ran in the lower bits.
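The sketch below shows the resulting schedule. It uses a hypothetical reading of FIG. 8 in which (801) to (804) compute A(n) = A(n-1) + B(n) and (805) to (808) compute X(n) = X(n-1) * Y(n); the data values are made up.

The scalar version issues eight dependent instructions. The paired version issues four two-lane FMAs, (s1) to (s4), with the upper lane carrying the addition chain and the lower lane carrying the multiplication chain. FMA is emulated as a * b + c in double precision, and the values are chosen so that every result is exact.

```python
def fma(a, b, c):
    """Emulated fused multiply-add (two roundings in this sketch)."""
    return a * b + c


def simd_fma2(a, b, c):
    """Two-lane SIMD-FMA on (upper, lower) tuples."""
    return tuple(fma(x, y, z) for x, y, z in zip(a, b, c))


def run_scalar(a, x, bs, ys):
    """Eight sequential instructions; returns (A4, X4, instructions issued)."""
    issued = 0
    for b in bs:            # (801)-(804): each addition needs the previous A
        a = a + b
        issued += 1
    for y in ys:            # (805)-(808): each multiplication needs the previous X
        x = x * y
        issued += 1
    return a, x, issued


def run_simd_fma(a, x, bs, ys):
    """Four two-lane SIMD-FMA instructions (s1)-(s4)."""
    issued = 0
    for b, y in zip(bs, ys):
        # upper lane: (f1)-(f4) = a * 1 + b; lower lane: (f5)-(f8) = x * y + 0
        a, x = simd_fma2((a, x), (1.0, y), (b, 0.0))
        issued += 1
    return a, x, issued


bs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 0.5, 4.0, 1.5]
print(run_scalar(0.0, 1.0, bs, ys))    # (10.0, 6.0, 8)
print(run_simd_fma(0.0, 1.0, bs, ys))  # (10.0, 6.0, 4)
```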

In the example in FIG. 8, the code generation device 100 halves the number of instructions (from eight to four) compared with the case in which no conversion into SIMD instructions is possible, and execution performance doubles.

The code generation device 100 can generate code (object code) that issues two FMA operations simultaneously as one SIMD instruction. A processor executing this code can therefore perform effectively four operations (multiplications and additions/subtractions) per clock. Compared with the case in which no conversion into SIMD instructions is possible, execution efficiency relative to peak performance quadruples.
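The figures in the two preceding paragraphs can be checked with a short calculation. It assumes one instruction is issued per clock and counts, as the text does, the dummy multiplications and additions inside each FMA as operations. Here N is the instruction count and η is the fraction of the peak of four operations per clock; both symbols are introduced only for this illustration.

```latex
\begin{aligned}
N_{\text{scalar}} &= 8, \qquad N_{\text{SIMD-FMA}} = \frac{8}{2} = 4,\\
\eta_{\text{scalar}} &= \frac{1\ \text{operation/clock}}{4\ \text{operations/clock}} = \frac{1}{4},\\
\eta_{\text{SIMD-FMA}} &= \frac{2\ \text{lanes} \times 2\ \text{operations (multiply, add)}}{4\ \text{operations/clock}} = 1,\\
\frac{\eta_{\text{SIMD-FMA}}}{\eta_{\text{scalar}}} &= 4.
\end{aligned}
```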

Another example of the effects obtained with the code generation device 100 configured as described above is described below using the specific example in FIG. 11. This is merely an example, and the present embodiment is not limited to it.

In the analysis target code illustrated in part (a) of FIG. 11, the result of the multiplication in operation instruction (1101) ("A0") is referred to by the subsequent instructions (1102) to (1104). Because these instructions mix additions and multiplications, it is difficult to convert them simply into FMA operation instructions.

An approach different from the present embodiment would be to convert instruction (1101) and the other instructions (1102) to (1104) into FMA operations simply by combining them. Such an approach might involve three steps:

- searching for combinations of instructions that can be converted directly into FMA operations, without the first, second and third dummy operations described above;
- converting the instructions found into FMA operations;
- generating SIMD-FMA instructions that execute the converted FMA operations.

With such an approach, the instructions (1101) to (1104) are converted into, for example, the instructions (1105) to (1108) shown in part (b) of FIG. 11. Instructions (1106) and (1108) are FMA instructions, and each of them executes the operation "B0 * C0", so this operation is executed more than once. The cost (latency and so on) of the processing needed to execute it, for example reading its operands, may therefore be incurred more than once. Moreover, under such an approach, computing "A0" first without converting it into an FMA operation would require fewer multiplication instructions.

By contrast, the code generation device 100 of the present embodiment does not build FMA operations by combining several operations that can be converted directly into one FMA operation. Instead, it converts each instruction that has no dependency into its own FMA instruction. In the example in FIG. 11, the code generation device 100 can generate the instructions (1109) to (1112) from the instructions (1101) to (1104). Because "A0" is computed first, in instruction (1109), the cost of executing "B0 * C0" is incurred only once.
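FIG. 11 is not reproduced here, so the sketch below uses a hypothetical stand-in in which (1101) computes A0 = B0 * C0 and three later instructions read A0. All names other than A0, B0 and C0 are assumptions, and operands are symbolic strings.

The sketch contrasts two approaches and counts how often the product B0 * C0 is formed:

- naive contraction, which folds the product into every addition that uses A0;
- the per-instruction conversion of the present embodiment.

```python
# Hypothetical stand-in for FIG. 11(a): (dst, kind, x, y)
SOURCE = [
    ("A0", "mul", "B0", "C0"),   # (1101)
    ("D0", "add", "A0", "E0"),   # reads A0
    ("F0", "mul", "A0", "G0"),   # reads A0
    ("H0", "add", "A0", "I0"),   # reads A0
]


def contract(src):
    """Naive contraction: rewrite p + y as fma(x1, x2, y) whenever p = x1 * x2.

    (1101) itself is kept because the multiplication producing F0 still reads A0.
    """
    products = {d: (x, y) for d, k, x, y in src if k == "mul"}
    out = []
    for d, k, x, y in src:
        if k == "add" and x in products:
            out.append((d, "fma", *products[x], y))
        else:
            out.append((d, k, x, y))
    return out


def per_op_fma(src):
    """Each op becomes its own FMA with a dummy operand (adds and muls only here)."""
    out = []
    for d, k, x, y in src:
        if k == "add":
            out.append((d, "fma", x, "1", y))    # first operation conversion
        else:
            out.append((d, "fma", x, y, "0"))    # third operation conversion
    return out


def count_product(ops, a, b):
    """How many instructions multiply a by b."""
    return sum(1 for op in ops if op[1] in ("mul", "fma") and op[2:4] == (a, b))


print(count_product(contract(SOURCE), "B0", "C0"))    # 3
print(count_product(per_op_fma(SOURCE), "B0", "C0"))  # 1
```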

In addition, the first and second dummy operands in the instructions (1110) to (1112) are all immediate values, so executing the operations that involve them is expected to be relatively cheap. Furthermore, the code generation device 100 can convert two of the instructions (1110) to (1112) into a SIMD-FMA instruction, thereby reducing the number of instructions.

As described above, the code generation device 100 of the present embodiment can reduce the code size of the analysis target code and improve its execution speed. The reasons are as follows.

The code generation device 100 can convert instruction sequences in the analysis target code that contain different types of operations into instructions (SIMD instructions) that operate on multiple operands in parallel, that is, simultaneously. It does this in three stages:

- By determining the dependencies among the operations in the analysis target code, it determines whether those operations can be converted into SIMD instructions and executed as such.
- Using dummy operands, it converts operations in the analysis target code (for example, additions/subtractions and multiplications) into FMA operations.
- It converts FMA operations that do not depend on one another into a SIMD instruction (SIMD-FMA instruction).

Converting part of the analysis target code into SIMD-FMA instructions in this way reduces code size compared with the case in which no conversion into SIMD instructions is possible. Using SIMD-FMA instructions also improves the execution speed of the code, specifically the object code generated from the analysis target code.

In scientific computing, for example, programs may process large amounts of array data in loops and evaluate recurrences in which each iteration depends on the previous one. The instructions in FIG. 8 can be seen as an example of such recurrences: A(n) = A(n-1) + B(n) and X(n) = X(n-1) * Y(n), where A, B, X and Y are arrays and n is an array index. Executing such a program issues many operation instructions, such as additions and multiplications.

In a recurrence, however, operations of the same type (additions with other additions, multiplications with other multiplications) often depend on one another, so they are difficult to replace directly with SIMD instructions. Even in that case, the code generation device 100 of the present embodiment can generate SIMD instructions (SIMD-FMA instructions) by combining instructions of different types. As a result, it can reduce the code size, and improve the execution speed, even for programs that contain recurrences.

Systems with limited memory, such as embedded systems, also require small code size, so the scheme described in the present embodiment is expected to be highly useful in such fields as well.

[Modification]
A modification of the present embodiment is described below. The functional configuration of the code generation device in this modification may be the same as in the first embodiment.

The code generation device 100 in the first embodiment can convert a combination of an addition/subtraction and a multiplication into a SIMD instruction (specifically, a SIMD-FMA instruction), depending on the dependencies between them.

In this modification, the code generation device 100 is additionally configured to convert a combination of an addition/subtraction and a division into a SIMD-FMA instruction. In floating-point arithmetic, a division can be converted into a multiplication by taking the reciprocal of the divisor. The code generation device 100 can therefore first convert the division into a multiplication and then convert that multiplication into an FMA instruction. This lets it generate a SIMD-FMA instruction from a combination of an addition/subtraction and a division.
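Below is a sketch of the modification. It assumes the reciprocal 1 / y is available as a separate value; the text does not say how it is computed. The division x / y becomes x * (1 / y) + 0, which can then share a two-lane SIMD-FMA with an addition. The function names are hypothetical.

In IEEE 754 arithmetic, x * (1 / y) can differ from x / y in the last bit, because 1 / y is itself rounded. Compilers such as GCC therefore apply this rewrite only under relaxed options like `-freciprocal-math`, or when the reciprocal is exact (for example, when y is a power of two).

```python
def div_to_mul(x, y):
    """Division to multiplication: x / y -> x * (1 / y)."""
    return x, 1.0 / y


def mul_to_fma(x, y):
    """Third operation conversion: x * y -> x * y + 0."""
    return (x, y, 0.0)


def add_to_fma(x, y):
    """First operation conversion: x + y -> x * 1 + y."""
    return (x, 1.0, y)


def simd_fma2(upper, lower):
    """Evaluate two FMA triples a * b + c as the two lanes of one instruction."""
    return tuple(a * b + c for a, b, c in (upper, lower))


# upper lane: 1.5 + 2.0; lower lane: 3.0 / 4.0 (the reciprocal 0.25 is exact)
print(simd_fma2(add_to_fma(1.5, 2.0), mul_to_fma(*div_to_mul(3.0, 4.0))))  # (3.5, 0.75)
```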

<Second Embodiment>
A second embodiment, which is a basic embodiment of the technique according to the present disclosure, is described below.

FIG. 13 is a block diagram illustrating the functional configuration of the code generation device 1300 in the present embodiment. As illustrated in FIG. 13, the code generation device 1300 includes a code analysis unit 1301 (code analysis means) and an instruction generation unit 1302 (instruction generation means).

As in the first embodiment, the code analysis unit 1301 and the instruction generation unit 1302 may be implemented as part of a compiler that generates object code from source code.

Like the code generation device 100 in the first embodiment, the code generation device 1300 may generate object code to be executed by the code generation device 1300 itself. Alternatively, it may generate object code executable on another information processing device 1200, such as the one illustrated in FIG. 12.

Each component of the code generation device 1300 is described below.

The code analysis unit 1301 examines one or more operation instructions in the analysis target code, which is a computer program composed of a plurality of operation instructions. Based on the types of those instructions and the dependencies between them, it determines whether they can be converted into a parallel execution instruction that executes fused operation instructions in parallel. The code analysis unit 1301 may, for example, check the dependencies between operation instructions of different types and determine whether those instructions can be converted into a parallel execution instruction.

A fused operation instruction is an instruction that executes operations of different types on a plurality of data as a single instruction. It may be, for example, an instruction that executes an FMA operation.

A parallel execution instruction is a single instruction that executes operations on a plurality of data in parallel. It may be, for example, a SIMD instruction. A parallel execution instruction that executes fused operation instructions in parallel may be a SIMD-FMA instruction.

The instruction generation unit 1302 takes the operation instructions that the code analysis unit 1301 has judged convertible into a parallel execution instruction and converts them into fused operation instructions. It then generates data in which the operands of the converted fused operation instructions are arranged in a format executable as a parallel execution instruction. This yields a parallel execution instruction that executes the fused operation instructions in parallel.

As an example, assume that the fused operation instruction is an FMA instruction and the parallel execution instruction is a SIMD instruction, more specifically a SIMD-FMA instruction. In this case, the code analysis unit 1301 can check the dependencies between instructions of different operation types in the analysis target code, such as an addition and a multiplication, or a subtraction and a multiplication. It can then select combinations of operation instructions that have no dependency.

The instruction generation unit 1302 may, for example, convert the selected dependency-free operations of different types into FMA operation instructions and combine them into a SIMD-FMA instruction. In doing so, it may generate the operand data so that the operands of the FMA instructions executed in parallel are assigned to separate regions of the register used to execute the SIMD-FMA instruction.
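The selection and generation steps described above can be sketched as a greedy pass over a straight-line block. This is one possible strategy, not the procedure of the disclosure. Each operation is paired with the first later operation of a different type that meets two conditions:

- it is independent of the operation;
- it can legally be moved up past the still-unscheduled operations in between.

Each member of a pair is then rewritten as an FMA triple using the dummy operands "1", negation or "0". The `Op` type, the operand names and the string encoding of the triples are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dst = srcs[0] <kind> srcs[1]."""
    kind: str      # "add", "sub" or "mul"
    dst: str
    srcs: tuple


def conflicts(a: Op, b: Op) -> bool:
    """Read/write or write/write overlap that forbids reordering a and b."""
    return a.dst in b.srcs or b.dst in a.srcs or a.dst == b.dst


def to_fma(op: Op) -> tuple:
    """FMA triple (a, b, c) meaning a * b + c, using dummy operands."""
    x, y = op.srcs
    if op.kind == "add":
        return (x, "1", y)          # first operation conversion
    if op.kind == "sub":
        return (x, "1", "-" + y)    # second operation conversion
    return (x, y, "0")              # third operation conversion


def pair_for_simd_fma(ops: list) -> list:
    """Group ops into SIMD-FMA instructions of one or two lanes, in issue order."""
    scheduled, groups = set(), []
    for i, a in enumerate(ops):
        if i in scheduled:
            continue
        scheduled.add(i)
        lanes = [to_fma(a)]
        for j in range(i + 1, len(ops)):
            b = ops[j]
            if j in scheduled or b.kind == a.kind:
                continue
            # b is hoisted to position i, so it must not conflict with a
            # or with any op between i and j that has not been issued yet.
            between = [ops[k] for k in range(i + 1, j) if k not in scheduled]
            if not conflicts(a, b) and not any(conflicts(c, b) for c in between):
                scheduled.add(j)
                lanes.append(to_fma(b))
                break
        groups.append(tuple(lanes))
    return groups


fig8 = (
    [Op("add", f"A{n}", (f"A{n - 1}", f"B{n}")) for n in range(1, 5)]
    + [Op("mul", f"X{n}", (f"X{n - 1}", f"Y{n}")) for n in range(1, 5)]
)
for group in pair_for_simd_fma(fig8):
    print(group)
```

On the FIG. 8-like input, this yields four two-lane groups. If the multiplication instead read the result of the addition, the two would stay in separate one-lane groups.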

In the present embodiment, the code analysis unit 1301 may be configured in the same manner as the code analysis unit 102 in the first embodiment. Likewise, the instruction generation unit 1302 may be configured in the same manner as the instruction generation unit 103 in the first embodiment.

The code generation device 1300 configured as described above can improve the execution efficiency of programs that contain instructions of different types. This is because the code analysis unit 1301 determines the dependencies between operation instructions of different types. The instruction generation unit 1302 then converts the dependency-free operation instructions into a parallel execution instruction that executes fused operation instructions in parallel.

For example, the code generation device 1300 can generate object code in which operations of different types in the analysis target code are converted into SIMD-FMA instructions. Such pairs include an addition and a multiplication, or a subtraction and a multiplication. The code generation device 1300 can thereby reduce the size of the object code and improve its execution speed.

<Configuration of Hardware and Software Programs (Computer Programs)>
Hereinafter, a hardware configuration capable of realizing each of the above embodiments is described.

In the following description, the code generation devices (100, 1300) of the above embodiments are collectively referred to simply as the "code generation device", and each component of the code generation device is referred to simply as a "component of the code generation device".

The code generation device described in each of the above embodiments may be composed of one or more dedicated hardware devices. In that case, some or all of the components shown in the above figures may be integrated and realized as hardware (circuitry implementing the processing logic).

For example, when the code generation device is realized in hardware, its components may be realized by an SoC (System on a Chip) or the like that implements circuitry capable of providing each function. In this case, the data held by the components of the code generation device may be stored, for example, in a RAM (Random Access Memory) area, a ROM (Read Only Memory) area, or a flash memory area integrated in the SoC. This circuit configuration is one specific example, and various variations are conceivable in implementation.

A well-known communication bus or communication network may be used as the communication line connecting the components of the code generation device. Alternatively, the communication lines may connect the components to one another peer-to-peer. When the code generation device is composed of multiple hardware devices, those hardware devices may be communicably connected by any suitable communication method (wired, wireless, or a combination thereof).

The code generation device described above may be composed of a general-purpose hardware device 1400, as illustrated in FIG. 14, and various software programs (computer programs, for example, a compiler) executed by the hardware device 1400. In this case, the code generation device may be composed of multiple hardware devices 1400 and software programs.

The processor 1401 in FIG. 14 is an arithmetic processing device such as a general-purpose CPU (Central Processing Unit) or a microprocessor. The processor 1401 may, for example, read various software programs stored in the non-volatile storage device 1403 (described later) into the memory 1402 and execute processing according to those programs. Like the processor 1201 illustrated in FIG. 12, the processor 1401 may include an arithmetic unit capable of executing SIMD-FMA instructions.

In this case, the components of the code generation device in each of the above embodiments can be realized, for example, as software programs executed by the processor 1401. Various variations are conceivable in implementing such software programs.

The memory 1402 is a memory device, such as RAM, that the processor 1401 can reference; it stores software programs, various data, and the like. The memory 1402 may be a volatile memory device.

The non-volatile storage device 1403 is a non-volatile storage device such as a magnetic disk drive or a flash-memory-based semiconductor storage device. It can store various software programs, data, and the like.

For example, the file management unit 104 in the above embodiments may hold its data in the non-volatile storage device 1403.

The drive device 1404 is, for example, a device that reads data from and writes data to the recording medium 1405 described below.

The recording medium 1405 is a recording medium capable of recording data, such as an optical disk, a magneto-optical disk, or a semiconductor flash memory.

The code generation device described in each of the above embodiments may be realized, for example, by supplying the hardware device 1400 illustrated in FIG. 14 with a software program (for example, a compiler) capable of realizing the functions described in those embodiments. More specifically, the technology according to the present disclosure may be realized by the processor 1401 executing the software program supplied to the device. In this case, an operating system running on the hardware device 1400, or middleware such as database management software or network software, may execute part of each process.

In each of the above embodiments, each unit shown in the above figures (for example, FIGS. 4 and 13) may be realized as a software module, that is, a functional (processing) unit of a software program executed by the hardware described above. However, the division into software modules shown in these drawings is only an example, and various other configurations are conceivable in implementation.

For example, when each of the above units is realized as a software module, these software modules may be stored in the non-volatile storage device 1403, and the processor 1401 may read them into the memory 1402 when executing each process.

These software modules may also be configured to exchange various data with one another via shared memory, interprocess communication, or the like. With this configuration, the software modules can be communicably connected to one another.

Further, each of the above software programs may be recorded on the recording medium 1405. In this case, each software program may be stored in the non-volatile storage device 1403 via the drive device 1404 as appropriate, for example at the shipping stage or the operation stage of the code generation device.

In the above case, the software programs may be supplied to the code generation device by installing them in the device with an appropriate jig (tool) during the manufacturing stage before shipment, the maintenance stage after shipment, or the like. Alternatively, a now-common procedure, such as downloading them from outside via a communication line such as the Internet, may be used.

In such cases, the technology according to the present disclosure can be regarded as being constituted by the code that makes up the software program, or by a computer-readable recording medium on which that code is recorded. Here, the recording medium is not limited to a medium independent of the hardware device 1400; it also includes a storage medium that stores, permanently or temporarily, a software program downloaded after being transmitted over a LAN, the Internet, or the like.

The code generation device described above, or its components, may also be composed of a virtualized environment in which the hardware device 1400 illustrated in FIG. 14 is virtualized, together with various software programs (computer programs) executed in that environment. In this case, the components of the hardware device 1400 illustrated in FIG. 14 are provided as virtual devices in the virtualized environment. Here too, the technology according to the present disclosure can be realized with the same configuration as when the hardware device 1400 illustrated in FIG. 14 is configured as a physical device.

The technology according to the present disclosure has been described above using examples in which it is applied to the exemplary embodiments described above. However, the technical scope of the present disclosure is not limited to the scope described in those embodiments and modifications. It will be apparent to those skilled in the art that various changes or improvements can be made to these embodiments. New embodiments incorporating such changes or improvements may also be included in the technical scope of the present disclosure. Furthermore, embodiments that combine the above embodiments and modifications with such new embodiments may also be included in the technical scope of the present disclosure. This is clear from the matters described in the claims.

100 Code generation device
101 Compiler
102 Code analysis unit
103 Instruction generation unit
104 File management unit
1200 Information processing device
1201 Processor
1202 Memory
1203 Storage
1300 Code generation device
1301 Code analysis unit
1302 Instruction generation unit
1401 Processor
1402 Memory
1403 Non-volatile storage device
1404 Drive device
1405 Recording medium

Claims

1. A code generation device comprising: code analysis means for selecting a combination of operation instructions having no dependency by analyzing dependencies between operation instructions of different types among one or more operation instructions included in code to be analyzed, the code being a computer program; and
instruction generation means for converting, for each combination of operation instructions having no dependency, the operation instructions into fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, and generating a parallel execution instruction that executes the fused operation instructions in parallel,
wherein the instruction generation means converts operation instructions that combine addition or subtraction with division and have no dependency into the fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, by obtaining the reciprocal of the divisor and thereby converting the division into multiplication.

2. The code generation device according to claim 1, wherein the code analysis means determines, among the operation instructions included in the code to be analyzed, whether there is a dependency between an addition instruction and a multiplication instruction, which are operation instructions of different types, and whether there is a dependency between a subtraction instruction and a multiplication instruction, and
the instruction generation means converts the operation instructions of different types that the code analysis means has determined to have no dependency into the fused operation instructions.

3. The code generation device according to claim 2, wherein the instruction generation means converts all of the operation instructions of different types determined by the code analysis means to have no dependency into the fused operation instructions, and combines the converted fused operation instructions to generate the parallel execution instruction that executes the fused operation instructions in parallel.

4. The code generation device according to claim 2 or 3, wherein the fused operation instruction is an FMA (Fused Multiply-Add) operation instruction that executes a multiply-add operation with one instruction,
the parallel execution instruction is a SIMD (Single Instruction Multiple Data)-FMA instruction capable of executing FMA operation instructions in parallel on multiple data, and
the instruction generation means converts the operation instructions determined to have no dependency into FMA operation instructions according to the result of the determination by the code analysis means, and combines the converted FMA operation instructions to generate the SIMD-FMA instruction.

5. The code generation device according to claim 4, wherein the instruction generation means:
when the operation instruction is an addition instruction, converts the addition instruction into an FMA operation instruction by adding an instruction that multiplies one operand of the addition instruction by a first dummy operand, given as an immediate value, whose value does not affect the result of the addition;
when the operation instruction is a subtraction instruction, converts the subtraction instruction into an FMA operation instruction by adding an instruction that multiplies one operand of the subtraction instruction by a second dummy operand, given as an immediate value, whose value does not affect the result of the subtraction, and by inverting the sign of the other operand of the subtraction instruction; and
when the operation instruction is a multiplication instruction, converts the multiplication instruction into an FMA operation instruction by adding an instruction that adds a third dummy operand, given as an immediate value, whose value does not affect the result of the multiplication.

6. The code generation device according to claim 5, wherein, when the SIMD-FMA instruction can execute two FMA operation instructions in parallel, the instruction generation means generates data representing the operands of the SIMD-FMA instruction such that:
of the two areas into which the register storing the operands of the SIMD-FMA instruction is divided, one area holds the operands of the addition instruction or the subtraction instruction among the operation instructions before conversion into FMA operation instructions; and
the other area holds the operands of the multiplication instruction among the operation instructions before conversion into FMA operation instructions.

7. A code generation method comprising: selecting a combination of operation instructions having no dependency by analyzing dependencies between operation instructions of different types among one or more operation instructions included in code to be analyzed, the code being a computer program;
converting, for each combination of operation instructions having no dependency, the operation instructions into fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, and generating a parallel execution instruction that executes the fused operation instructions in parallel; and
when generating the parallel execution instruction, converting operation instructions that combine addition or subtraction with division and have no dependency into the fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, by obtaining the reciprocal of the divisor and thereby converting the division into multiplication.

8. The code generation method according to claim 7, further comprising determining whether there is a dependency between operation instructions of different types among the one or more operation instructions, and converting the operation instructions of different types determined to have no dependency into the fused operation instructions.

9. A code generation program causing a computer to execute:
an analysis process of selecting a combination of operation instructions having no dependency by analyzing dependencies between operation instructions of different types among one or more operation instructions included in code to be analyzed, the code being a computer program; and
an instruction generation process of converting, for each combination of operation instructions having no dependency, the operation instructions into fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, and generating a parallel execution instruction that executes the fused operation instructions in parallel,
wherein, in the instruction generation process, operation instructions that combine addition or subtraction with division and have no dependency are converted into the fused operation instructions composed of operation instructions combining addition or subtraction with multiplication, by obtaining the reciprocal of the divisor and thereby converting the division into multiplication.

10. The code generation program according to claim 9, wherein the analysis process includes a process of determining whether there is a dependency between operation instructions of different types among the one or more operation instructions, and
the instruction generation process includes a process of converting the operation instructions of different types determined to have no dependency into the fused operation instructions.