WO2021166840A1 - Recording medium, compiling device, processing system, and compiling method - Google Patents

Publication number
WO2021166840A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
vector
address
mask
scalar
Prior art date
Application number
PCT/JP2021/005479
Other languages
French (fr)
Japanese (ja)
Inventor
敏也 平田
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2022501869A (patent JP7513080B2)
Publication of WO2021166840A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present invention relates to a recording medium, a compiling device, a processing system, and a compiling method.
  • Patent Document 1 discloses, as a related technique, a technique relating to an apparatus and a method for managing address collisions when performing a vector operation.
  • Patent Document 2 discloses a technique for handling a list vector on a vector register as a related technique.
  • a summation operation that includes an indirect address reference, such as X(L(I)) = X(L(I)) + Y(I) inside a DO loop over I, is called a list summation operation.
  • the same value may appear more than once in L(I).
  • when the same value is duplicated in L(I), the operation on X(L(I)) cannot be performed as a vector operation, and sequential processing is performed using scalar instructions.
  • FIG. 15 shows an example of an instruction sequence generated for the list summation operation when this method is used.
  • the compiler according to the LISTVEC instruction line method generates a vector scatter instruction (hereinafter referred to as a VSC instruction) and a vector gather instruction (hereinafter referred to as a VGT instruction).
  • VSC instructions and VGT instructions have a high execution cost, and when the LISTVEC instruction line method is used, these instructions are executed on every iteration of the loop, so the list summation operation takes a long time to execute. Therefore, there is a demand for a technique capable of shortening the execution time of the list summation operation even when the elements are duplicated in the list summation operation.
  • Each aspect of the present invention aims to provide a recording medium, a compiling device, a processing system, and a compiling method capable of solving the above problems.
  • the recording medium records a compiler that causes a computer of a compiling device to generate instructions that execute: calculating the address of an array referenced by an indirect address; detecting duplication of the calculated addresses and creating a vector mask; performing an operation between vectors based on the bits of the vector mask; and recalculating with a scalar based on the result of the operation between the vectors.
  • the compiling device comprises an address calculation means, a duplicate detection-mask creation means, a vector addition means, and a result invalid term detection means, and generates instructions that cause operations to be executed.
  • the address calculation means generates an instruction that calculates the address of an array referenced by an indirect address.
  • the duplicate detection-mask creation means generates an instruction that detects duplication of the calculated addresses and creates a vector mask.
  • the vector addition means generates an instruction that performs an operation between vectors based on the bits of the vector mask.
  • the result invalid term detecting means generates an instruction that recalculates with a scalar based on the result of the operation between the vectors.
  • the processing system includes the above compilation device and an arithmetic unit that performs arithmetic operations according to the instructions generated by the compilation device.
  • the compilation method calculates the address of the array referenced by the indirect address, detects the duplication of the calculated address, and creates a vector mask. This includes performing operations between vectors based on the bits of the vector mask, and recalculating with a scalar based on the calculation results between the vectors.
  • the execution time of the list summation operation can be shortened even when the elements are duplicated in the list summation operation.
  • the compiler 1 is a compiler having a vector instruction generation function for high-speed processing of a summation operation that includes an indirect address reference in the compilation device 10. Specifically, the compiler 1 uses a newly added VFMD instruction, which detects elements having the same value within a single vector register and creates a vector mask, to generate a fast instruction sequence with fewer of the costly instructions that accompany indirect access.
  • the hardware that executes the instruction sequence generated by the compiler 1 has an instruction set that includes a vector gather instruction, which loads into a vector register the data at the memory addresses stored in each element of the vector register specified as the list vector, and a vector scatter instruction, which stores the data in a vector register to the memory addresses stored in each element of the vector register specified as the list vector.
  • the arithmetic unit 20 is a computing device having a vector processor.
  • the processing system 100 includes a compilation device 10 and an arithmetic unit 20.
  • the compiler 1 generates an object code (instruction) from the source program in the compile device 10. As shown in FIG. 2, the compiler 1 functions as a code analysis means 11 and an instruction generation means 12 in the compilation device 10.
  • the code analysis means 11 is a means for analyzing a program and determining whether or not to perform vectorization of the list summation operation.
  • the code analysis means 11 includes an instruction line analysis means 111 and a list summation operation syntax analysis means 112.
  • the instruction line analysis means 111 analyzes whether or not an instruction line that allows vectorization of the list summation operation is specified.
  • when the instruction line analysis means 111 finds that an instruction line permitting vectorization of the list summation operation is specified, the list summation operation syntax analysis means 112 analyzes whether the list summation operation to which the instruction line applies is in a vectorizable form.
  • the instruction generation means 12 is a means for generating a vectorization code when the code analysis means 11 determines that the list summation operation is vectorized based on the analysis result.
  • the instruction generation means 12 includes a vector instruction generation means 121 and a scalar recalculation instruction generation means 122.
  • the vector instruction generation means 121 performs vector addition of all elements, and generates an instruction for detecting an address duplication at the time of the addition.
  • the scalar recalculation instruction generation means 122 generates an instruction (hereinafter, referred to as “scalar recalculation instruction”) for recalculating an element whose result is invalid due to duplication of addresses.
  • the code analysis means 11 and the scalar recalculation instruction generation means 122 are means that a compiler according to the LISTVEC instruction line method also has, and are described in, for example, Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools (2nd Edition)", (USA), Pearson Education, Inc, 2007, pp. 1-581.
  • the vector instruction generation means 121 includes an address calculation means 1211, a duplicate detection-mask creation means 1212, a vector addition means 1213, and a result invalid term detection means 1214.
  • the address calculation means 1211 calculates the address of the array referred to by the indirect address.
  • the duplicate detection-mask creation means 1212 detects duplication of the addresses calculated by the address calculation means 1211 and creates a vector mask.
  • the vector addition means 1213 calculates the addition of vectors based on the bits of the vector mask.
  • the result invalid term detecting means 1214 confirms the necessity of generating the scalar recalculation instruction and branches the process.
  • when the result invalid term detecting means 1214 detects, before the addition is performed, that the result of the addition between the vectors will be invalid and therefore determines that a scalar recalculation instruction is needed, it causes the vector addition means 1213 to generate the scalar recalculation instruction.
  • the address calculation means 1211 generates the instructions of line numbers 6 and 7 shown in FIG. 4, which calculate the address of each element of X(L(I)).
  • the duplicate detection-mask creation means 1212 generates the newly added VFMD instruction at line number 8 shown in FIG. 4, which detects duplicate addresses among the elements of X(L(I)) stored in the vector register and generates a vector mask.
  • the vector addition means 1213 generates the instructions of line numbers 9 to 12 shown in FIG. 4.
  • the result invalid term detection means 1214 generates the instructions of line numbers 13 and 14, which use the PCVM instruction to count the elements whose bit is 1 in the vector mask created by the duplicate detection-mask creation means 1212. If the count is not 0, that is, if even one address is duplicated, the process branches to generation of the scalar recalculation instruction.
  • VSEQ: vector sequential number instruction
  • VSC: vector scatter instruction (line number 10)
  • VGT: vector gather instruction (line number 11)
  • VCMPS: vector compare instruction
  • VSFA: Vector Shift Left and Add instruction
  • the address of the L(I)-th element of X is calculated and stored in a vector register.
  • when 4-byte data is the target of compilation, the instruction "vsfa %v59, %v60, 2, %s59" is generated.
  • by line number 7, each element of L(I) has already been stored in the vector register %v60.
  • each of these values is multiplied by the data size of 4 bytes and added to the address of X held in the scalar register %s59 to obtain the address of X(L(I)), which is stored in the vector register %v59.
  • the VFMD (Vector Form Mask Duplicate) instruction is given the vector register %v59, which stores the addresses of the elements of X(L(I)). It detects whether there are elements having the same value, that is, whether any address of X(L(I)) is duplicated, and creates a vector mask in which the bit at the index number of each duplicated element is 1.
  • as shown in FIG. 5, the VFMD instruction uses as its source the vector register VR0, which stores the addresses of the elements of X(L(I)) calculated by the VSFA instruction, and stores the created vector mask in the vector mask register VM0.
  • the compiler 1 has a function of generating the newly added vector mask creation instruction VFMD, in contrast to the LISTVEC instruction line method, which generates VSC instructions when detecting the duplication of addresses.
  • the array X, the array Y, and the array L that is the index of the array X are recorded in the memory of the arithmetic unit 20.
  • the number of elements N of these arrays X, Y, and L is 5, and the numerical values shown in part (a) of FIG. 7 are assumed to be stored in the memory of the arithmetic unit 20 as the elements of each array in the initial state.
  • the arithmetic unit 20 reads each element of the array L into the vector register VR0 as shown in the part (b) of FIG.
  • the arithmetic unit 20 calculates the address of each element of the array X(L(I)) (step S1) and stores it in the vector register VR1.
  • the arithmetic unit 20 detects duplication among the values of the elements of the vector register VR1 with the newly added VFMD instruction (step S2) and generates a vector mask in the vector mask register VM0 (step S3).
  • the arithmetic unit 20 compares the first element adr(X(1)) of the vector register VR1 with the second and subsequent elements of the vector register VR1 in order.
  • the third element of the vector register VR1 is adr(X(1)), which duplicates the first element adr(X(1)).
  • the first element of the vector register VR1 is therefore determined to be duplicated, and the bit of the first element of the vector mask register VM0 is set to 1.
  • the arithmetic unit 20 determines duplication for the second and subsequent elements of the vector register VR1 in the same way, setting the corresponding bit of the vector mask register VM0 to 1 if there is duplication and to 0 if there is none. Among the duplicated elements of the vector register VR1, the bit of the vector mask register VM0 corresponding to the last occurrence (the fourth element (X(1)) in the example shown in part (d) of FIG. 8) is 0.
  • the arithmetic unit 20 reads the values of the array X(L(I)) into the vector register VR2 and the values of the array Y(I) into the vector register VR3. Then, as shown in part (f) of FIG. 9, the arithmetic unit 20 executes vector addition of the nth element of the vector register VR2 and the nth element of the vector register VR3 (step S4) and stores the result in the vector register VR4. Here, n is an integer from 1 to 5.
  • the arithmetic unit 20 writes the arithmetic result to the memory indicated by the array X (L (I)) as shown in the portion (g) of FIG.
  • for the element X(1), whose address is duplicated, the arithmetic unit 20 first writes the first stored value, 3, of the result held in the vector register VR4, then overwrites it with the third stored value, 7, and then with the fourth stored value, 9, so that the last written value, 9, is what remains in the memory.
  • the arithmetic unit 20 can determine that the arithmetic result is an invalid result (inappropriate result) due to the duplication of addresses (step S5). As shown in part (h) of FIG. 10, the arithmetic unit 20 counts the number of elements in which the bit of the vector mask register VM0 is 1, and if the count is not 0, performs branch control so that the process proceeds to recalculation with a scalar.
  • when the arithmetic unit 20 determines that the arithmetic result is an invalid result (inappropriate result) (YES in step S5), it recalculates with a scalar the elements whose bit in the vector mask register VM0 is 1 (step S6; part (i) of FIG.).
  • the arithmetic unit 20 calculates X (L (1)) + Y (1) for the first element.
  • the arithmetic unit 20 writes 10 to the element X (L (1)), that is, the element X (1).
  • FIG. 11 shows the final result. Because the other element whose mask bit is 1 is also recalculated, the value stored in X(1) in FIG. 11 is 15 rather than 10.
  • the process ends.
  • the compiler 1 has been described above.
  • VFMK: vector form mask instruction
  • the result invalid term detecting means 1214 causes the vector adding means 1213 to generate the scalar recalculation instruction when it is determined that the scalar recalculation instruction needs to be generated based on the addition result between the vectors.
  • the VSC instruction and the VGT instruction, which have a high execution cost because they access memory, are eliminated, and the processing can be performed with the VFMD instruction, which involves only register access.
  • the compiler 1 can shorten the execution time of the list summation operation in the compile device 10 even when the elements are duplicated in the list summation operation.
  • in the description above, the op part is an addition instruction.
  • the op part may be other than an addition instruction as long as it is an instruction capable of vector operation, such as a subtraction instruction.
  • for vector operations other than addition as well, the VSC instruction and the VGT instruction, which have a high execution cost because they access memory, are eliminated and the processing can be sped up.
  • the minimum configuration compiling device 10 includes address calculation means 1211, duplication detection-mask creation means 1212, vector addition means 1213, and result invalid term detection means 1214.
  • the address calculation means 1211 generates an instruction to execute the calculation of the address of the array to which the indirect address is referenced.
  • the duplicate detection-mask creation means 1212 generates an instruction that detects duplication of the addresses calculated by the address calculation means 1211 and creates a vector mask.
  • the vector addition means 1213 generates an instruction to execute an operation between the vectors based on the bits of the vector mask.
  • the result invalid term detecting means 1214 generates an instruction for causing the vector adding means 1213 to recalculate with a scalar based on the calculation result of the vectors.
  • the address calculation means 1211 generates an instruction for executing the calculation of the address of the array referred to by the indirect address (step S11).
  • the duplicate detection-mask creation means 1212 generates an instruction that detects duplication of the addresses calculated by the address calculation means 1211 and creates a vector mask (step S12).
  • the vector addition means 1213 generates an instruction to execute an operation between the vectors based on the bits of the vector mask (step S13).
  • the result invalid term detecting means 1214 generates an instruction to cause the vector adding means 1213 to recalculate with a scalar based on the calculation result of the vectors (step S14).
  • the compilation device 10 having the minimum configuration according to the embodiment of the present invention has been described above. With this compile device 10, the execution time of the list summation operation can be shortened even when the elements are duplicated in the list summation operation.
  • the newly added VFMD instruction has been described as detecting duplication between elements using a vector register that stores the calculated addresses of X(L(I)).
  • the address duplication of the list summation operation may instead be detected by detecting duplication between the elements of the vector register loaded with the values of the array L(I).
  • the values of the array L(I) are stored in the vector register %v60 at line number 6, so in this case the VFMD instruction can take that register as its source, as in "vfmd %vm15, %v60".
  • the compilation device 10 and the arithmetic unit 20 have been described as separate devices. However, in another embodiment of the present invention, the compiling device 10 and the arithmetic unit 20 may be housed in one device that performs both the processing performed by the compiling device 10 and the processing performed by the arithmetic unit 20.
  • the order of the processing may be changed within the range in which the appropriate processing is performed.
  • FIG. 14 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
  • the computer 5 includes a CPU 6 (including a vector processor), a main memory 7, a storage 8, and an interface 9.
  • the above-mentioned compilation device 10, arithmetic unit 20, and other control devices are mounted on the computer 5.
  • each processing unit described above is stored in the storage 8 in the form of a program.
  • the CPU 6 reads the program from the storage 8, expands it into the main memory 7, and executes the above processing according to the program. Further, the CPU 6 secures a storage area corresponding to each of the above-mentioned storage units in the main memory 7 according to the program.
  • examples of the storage 8 include an HDD (Hard Disk Drive), an SSD (Solid State Drive), a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), and a semiconductor memory.
  • the storage 8 may be internal media directly connected to the bus of the computer 5, or external media connected to the computer 5 via the interface 9 or a communication line. When this program is distributed to the computer 5 via a communication line, the computer 5 that receives it may expand the program in the main memory 7 and execute the above processing.
  • the storage 8 is a non-transitory tangible storage medium.
  • the above program may realize a part of the above-mentioned functions.
  • the program may be a file that can realize the above-mentioned functions in combination with a program already recorded in the computer device, that is, a so-called difference file (difference program).
  • Each aspect of the present invention may be applied to a recording medium, a compiling device, a processing system, and a compiling method.
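
The worked example described in the bullets above (N = 5, steps S1 to S6) can be traced with a short Python sketch. This is only an illustration under stated assumptions: `vsfa` and `vfmd` are hypothetical Python stand-ins for the instructions of the same name, `BASE` is an invented address, and the array values are chosen so that the intermediate values quoted in the text (3, 7 and 9 written to X(1), then 10, then 15) are reproduced. The values of the elements the text does not mention are assumed, since the figures are not available here.

```python
BASE = 0x1000  # hypothetical content of the scalar register holding the address of X


def vsfa(values, shift, base):
    """Shift-left-and-add on every element: (value << shift) + base."""
    return [(v << shift) + base for v in values]


def vfmd(reg):
    """Mask bit is 1 if the same value occurs again later in the register."""
    return [1 if reg[k] in reg[k + 1:] else 0 for k in range(len(reg))]


L = [1, 2, 1, 1, 3]        # elements 1, 3 and 4 refer to X(1) as in the text; the rest is assumed
Y = [1, 2, 5, 7, 4]        # Y(1), Y(3), Y(4) fit the quoted values; Y(2), Y(5) are assumed
X = [2, 20, 30, 40, 50]    # X(1) = 2 fits the quoted values; the rest is assumed

# Memory as a map from byte address to value, 4 bytes per element (shift of 2).
x_addresses = vsfa([1, 2, 3, 4, 5], 2, BASE)
memory = dict(zip(x_addresses, X))

vr0 = list(L)                                # load L(I)
vr1 = vsfa(vr0, 2, BASE)                     # step S1: addresses of X(L(I))
vm0 = vfmd(vr1)                              # steps S2 and S3: duplicate mask
vr2 = [memory[a] for a in vr1]               # load X(L(I))
vr3 = list(Y)                                # load Y(I)
vr4 = [a + b for a, b in zip(vr2, vr3)]      # step S4: vector addition
for a, v in zip(vr1, vr4):                   # write back; the last write to X(1) wins
    memory[a] = v
after_vector = memory[vr1[0]]

if sum(vm0) != 0:                            # step S5: PCVM-like count of mask bits
    for k, bit in enumerate(vm0):
        if bit:                              # step S6: scalar recalculation
            memory[vr1[k]] += vr3[k]

final_x = [memory[a] for a in x_addresses]
print(vm0, vr4, after_vector, final_x)
```

The last occurrence of a duplicated address keeps mask bit 0 because its value is the one that survives the write-back; only the earlier occurrences need to be added again.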

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Devices For Executing Special Programs (AREA)
  • Complex Calculations (AREA)

Abstract

This recording medium records a compiler that is executed to cause a computer of a compiling device to generate instructions to execute: calculation of the address of an array to be referred to by indirect addressing; generation of a vector mask by detecting duplication of the calculated address; calculation between vectors on the basis of the bits in the vector mask; and recalculation using a scalar on the basis of the calculation results between the vectors.

Description

Recording medium, compiling device, processing system, and compiling method
The present invention relates to a recording medium, a compiling device, a processing system, and a compiling method.
In fields that perform large-scale numerical analysis and simulation (for example, fields that handle AI and big data), information processing may be performed using a computer that has vector instructions.
Patent Document 1 discloses, as a related technique, an apparatus and a method for managing address collisions when performing a vector operation.
Patent Document 2 discloses, as a related technique, a technique for handling a list vector in a vector register.
Patent Document 1: Japanese Translation of PCT International Application Publication No. 2019-517060
Patent Document 2: Japanese Unexamined Patent Application Publication No. H4-127367
A summation operation that includes an indirect address reference, such as the following, is called a list summation operation.
 DO I = 1, N
   X(L(I)) = X(L(I)) + Y(I)
 ENDDO
In this list summation operation, the same value may appear more than once in L(I). The definition and the reference of X(L(I)) depend on each other. Therefore, when the same value is duplicated in L(I), the loop cannot be executed as a vector operation and is instead processed sequentially using scalar instructions.
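
To make the dependency concrete, the following Python sketch compares the sequential loop with a naive gather, add, scatter version that ignores duplicates. It is an illustration only: the function names and sample values are assumptions, Python lists stand in for memory and vector registers, and indices are 0-based rather than 1-based as in the Fortran loop.

```python
def list_sum_sequential(x, idx, y):
    """Scalar loop: X(L(I)) = X(L(I)) + Y(I), one iteration at a time."""
    out = list(x)
    for i in range(len(idx)):
        out[idx[i]] = out[idx[i]] + y[i]
    return out


def list_sum_naive_vector(x, idx, y):
    """Gather, add, scatter with no duplicate handling (wrong if idx repeats)."""
    out = list(x)
    gathered = [out[j] for j in idx]                # gather X(L(I))
    summed = [g + v for g, v in zip(gathered, y)]   # element-wise addition
    for j, s in zip(idx, summed):                   # scatter; later writes overwrite earlier ones
        out[j] = s
    return out


x = [2, 20, 30]
idx = [0, 1, 0, 0, 2]   # index 0 appears three times
y = [1, 2, 5, 7, 4]

sequential = list_sum_sequential(x, idx, y)
naive = list_sum_naive_vector(x, idx, y)
print(sequential, naive)
```

In the naive version every duplicated element reads the original value of `x[0]`, so only the contribution of the last duplicate survives the scatter.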
For example, the LISTVEC directive (instruction line) method described in the non-patent document "Sugiyama, T., N. Terada, T. Murata, Y. Omura, H. Usui, and H. Matsumoto, Vectorized Particle Simulation Using "LISTVEC" Compile-directive on SX Super-computer, IPSJ journal, 45, SIG 6 (ACS 6), p. 171 (2004)" first performs the vector operation without considering duplication, then generates instructions that detect duplication, and recalculates the duplicated elements with scalar instructions. With this method, the fewer the duplicates, the greater the benefit of vectorization and the faster the processing. FIG. 15 shows an example of the instruction sequence generated for the above list summation operation when this method is used. In the example shown in FIG. 15, the elements of the array X to be added are divided into chunks of the maximum vector length, 256, and processed in a loop. However, as shown at line numbers 10 and 11 of FIG. 15, in the processing for detecting duplication, a compiler that follows the LISTVEC directive method generates a vector scatter instruction (hereinafter referred to as a VSC instruction) and a vector gather instruction (hereinafter referred to as a VGT instruction). In general, VSC instructions and VGT instructions have a high execution cost, and with the LISTVEC directive method they are executed on every iteration of the loop, so the list summation operation takes a long time to execute.
Therefore, there is a demand for a technique capable of shortening the execution time of the list summation operation even when the elements are duplicated in the list summation operation.
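
One common way a scatter followed by a gather can reveal duplicates is sketched below in Python. This is not a reproduction of the instruction sequence in FIG. 15, which is not available here: the work array, the function names and the exact ordering of the steps are assumptions made for illustration. Only the chunk size of 256 and the overall pattern (vector operation first, then duplicate detection, then scalar recalculation) come from the text.

```python
MAX_VL = 256  # maximum vector length mentioned in the text


def list_sum_sequential(x, idx, y):
    """Reference scalar loop used to check the sketch."""
    out = list(x)
    for j, v in zip(idx, y):
        out[j] += v
    return out


def detect_overwritten(idx_chunk, work):
    """Scatter sequence numbers, gather them back, and compare.

    An element whose sequence number did not survive was overwritten by a
    later element with the same index, so it has to be recalculated.
    """
    seq = list(range(len(idx_chunk)))            # sequence numbers 0, 1, 2, ...
    for j, s in zip(idx_chunk, seq):             # scatter (memory write)
        work[j] = s
    back = [work[j] for j in idx_chunk]          # gather (memory read)
    return [b != s for b, s in zip(back, seq)]   # compare


def list_sum_scatter_gather_check(x, idx, y):
    out = list(x)
    work = [0] * len(x)                          # hypothetical work array, one slot per element of x
    for start in range(0, len(idx), MAX_VL):     # process chunks of at most MAX_VL elements
        ic = idx[start:start + MAX_VL]
        yc = y[start:start + MAX_VL]
        summed = [out[j] + v for j, v in zip(ic, yc)]   # vector add, duplicates ignored
        for j, s in zip(ic, summed):             # write back; last write wins
            out[j] = s
        for i, lost in enumerate(detect_overwritten(ic, work)):
            if lost:                             # scalar recalculation of overwritten elements
                out[ic[i]] += yc[i]
    return out


flags = detect_overwritten([0, 1, 0, 0, 2], [0, 0, 0])
small = list_sum_scatter_gather_check([2, 20, 30], [0, 1, 0, 0, 2], [1, 2, 5, 7, 4])
big_idx = [(i * 7) % 13 for i in range(600)]
big_y = list(range(600))
big = list_sum_scatter_gather_check([0] * 13, big_idx, big_y)
print(flags, small)
```

The detection step in this sketch touches memory twice per chunk, which is the kind of cost the text attributes to the VSC and VGT instructions.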
 本発明の各態様は、上記の課題を解決することのできる記録媒体、コンパイル装置、処理システム及びコンパイル方法を提供することを目的としている。 Each aspect of the present invention aims to provide a recording medium, a compiling device, a processing system, and a compiling method capable of solving the above problems.
 上記目的を達成するために、本発明の一態様によれば、記録媒体は、コンパイル装置のコンピュータに、間接アドレス参照される配列のアドレスを算出することと、算出した前記アドレスの重複を検出しベクトルマスクを作成することと、前記ベクトルマスクのビットに基づいて、ベクトルどうしの演算を行うことと、前記ベクトルどうしの演算結果に基づいてスカラで計算し直すことと、を実行させる命令を生成するコンパイラを記録する。 In order to achieve the above object, according to one aspect of the present invention, the recording medium calculates the address of the array referred to by the indirect address on the computer of the compiling device, and detects the duplication of the calculated address. Generates an instruction to create a vector mask, perform operations between vectors based on the bits of the vector mask, and recalculate with a scalar based on the calculation results between the vectors. Record the compiler.
 上記目的を達成するために、本発明の別の態様によれば、コンパイル装置は、アドレス計算手段、重複検出-マスク作成手段、ベクトル加算手段、及び、結果不正項検出手段を備え、演算を実行させる命令を生成するコンパイル装置であって、前記アドレス計算手段は、間接アドレス参照される配列のアドレスを算出することを実行させる命令を生成し、前記重複検出-マスク作成手段は、算出した前記アドレスの重複を検出しベクトルマスクを作成することを実行させる命令を生成し、前記ベクトル加算手段は、前記ベクトルマスクのビットに基づいて、ベクトルどうしの演算を行うことを実行させる命令を生成し、前記結果不正項検出手段は、前記ベクトルどうしの演算結果に基づいてスカラで計算し直すことを実行させる命令を生成する。 In order to achieve the above object, according to another aspect of the present invention, the compiling device comprises an address calculation means, a duplicate detection-mask creation means, a vector addition means, and a result invalid term detection means to perform an operation. An instruction to generate an instruction to be executed, the address calculation means generating an instruction to execute calculating the address of an array to be referred to by an indirect address, and the duplicate detection-mask creation means to calculate the calculated address. The vector addition means generates an instruction to execute an operation between vectors based on the bits of the vector mask, and generates an instruction to execute the operation of detecting the duplication of the vectors and creating the vector mask. The result invalid term detecting means generates an instruction to execute recalculation with a scalar based on the calculation result of the vectors.
 上記目的を達成するために、本発明の別の態様によれば、処理システムは、上記のコンパイル装置と、前記コンパイル装置によって生成された命令に従って演算する演算装置と、を備える。 In order to achieve the above object, according to another aspect of the present invention, the processing system includes the above compilation device and an arithmetic unit that performs arithmetic operations according to the instructions generated by the compilation device.
 上記目的を達成するために、本発明の別の態様によれば、コンパイル方法は、間接アドレス参照される配列のアドレスを算出することと、算出した前記アドレスの重複を検出しベクトルマスクを作成することと、前記ベクトルマスクのビットに基づいて、ベクトルどうしの演算を行うことと、前記ベクトルどうしの演算結果に基づいてスカラで計算し直すことと、を含む。 In order to achieve the above object, according to another aspect of the present invention, the compilation method calculates the address of the array referenced by the indirect address, detects the duplication of the calculated address, and creates a vector mask. This includes performing operations between vectors based on the bits of the vector mask, and recalculating with a scalar based on the calculation results between the vectors.
According to each aspect of the present invention, the execution time of a list summation operation can be shortened even when elements are duplicated in the list summation operation.
FIG. 1 is a diagram showing an example of the configuration of a processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the functions realized by the compiler in the compiling device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the vector instruction generation means according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the instruction sequence of a list summation operation generated by the compiler according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a mnemonic according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the processing flow of the arithmetic unit according to an embodiment of the present invention.
FIG. 7 is a first diagram for explaining processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 8 is a second diagram for explaining processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 9 is a third diagram for explaining processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 10 is a fourth diagram for explaining processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 11 is a fifth diagram for explaining processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 12 is a diagram showing a compiling device with a minimum configuration according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example of the processing flow of the compiling device with the minimum configuration according to an embodiment of the present invention.
FIG. 14 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
FIG. 15 is a diagram showing an example of the instruction sequence of a list summation operation generated by a compiler related to the present invention.
Hereinafter, embodiments will be described in detail with reference to the drawings.
<Embodiment>
The compiler 1 according to an embodiment of the present invention is a compiler that, in the compiling device 10, has a vector instruction generation function for high-speed processing of summation operations that include indirect address references. Specifically, a VFMD instruction, which detects elements having the same value within a single vector register and creates a vector mask, is newly added, and the compiler 1 uses this VFMD instruction to generate a fast instruction sequence from which costly instructions involving indirect access have been removed.
The hardware that executes processing according to the instruction sequence generated by the compiler 1 is the arithmetic unit 20, which has a vector processor whose instruction set includes a vector gather instruction and a vector scatter instruction. The vector gather instruction loads, into a destination vector register, the data pointed to by the memory addresses stored in the elements of the vector register specified as the list vector. The vector scatter instruction stores the data in a vector register to the destination memory addresses stored in the elements of the vector register specified as the list vector.
As shown in FIG. 1, the processing system 100 according to an embodiment of the present invention includes the compiling device 10 and the arithmetic unit 20.
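The gather and scatter semantics described above can be modeled in a few lines. The following is a minimal sketch, not the hardware's definition: memory is modeled as a Python dict keyed by address, the function names `vector_gather` and `vector_scatter` and all addresses and values are hypothetical, and the element-by-element store order is an assumption chosen to match the overwrite behavior described later in this embodiment.

```python
def vector_gather(memory, list_vector):
    """Load memory[addr] for every address held in the list vector."""
    return [memory[addr] for addr in list_vector]


def vector_scatter(memory, list_vector, values):
    """Store values[i] to memory[list_vector[i]], element by element.

    Assumption: stores are applied in element order, so when two elements
    hold the same address the later element overwrites the earlier one.
    """
    for addr, value in zip(list_vector, values):
        memory[addr] = value


# Hypothetical addresses and data, for illustration only.
memory = {100: 2, 104: 10, 108: 20}
list_vector = [100, 104, 100]  # address 100 appears twice

loaded = vector_gather(memory, list_vector)
vector_scatter(memory, list_vector, [3, 13, 7])
```

Because address 100 appears twice in the list vector, only the later store to it survives, which is the situation that the duplicate detection in this embodiment is meant to catch.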
The compiler 1 generates object code (instructions) from a source program in the compiling device 10. As shown in FIG. 2, the compiler 1 functions as code analysis means 11 and instruction generation means 12 in the compiling device 10.
The code analysis means 11 analyzes the program and determines whether or not to vectorize a list summation operation. The code analysis means 11 includes directive analysis means 111 and list summation operation syntax analysis means 112.
The directive analysis means 111 analyzes whether or not a directive that permits vectorization of the list summation operation is specified.
When the directive analysis means 111 finds that a directive permitting vectorization of the list summation operation is specified, the list summation operation syntax analysis means 112 analyzes whether the list summation operation to which that directive applies is in a form that can be vectorized.
The instruction generation means 12 generates vectorized code when the code analysis means 11 determines, based on its analysis, that the list summation operation is to be vectorized. The instruction generation means 12 includes vector instruction generation means 121 and scalar recalculation instruction generation means 122.
The vector instruction generation means 121 generates instructions that perform vector addition over all elements and detect address duplication as part of that addition.
The scalar recalculation instruction generation means 122 generates instructions (hereinafter referred to as "scalar recalculation instructions") that recalculate, with scalar operations, the elements whose results are invalid because of address duplication.
The code analysis means 11 and the scalar recalculation instruction generation means 122 are means that a compiler using the LISTVEC directive method also has. They may be realized, for example, in the same manner as the techniques described in Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools (2nd Edition)", (USA), Pearson Education, Inc., 2007, pp. 1-581.
As shown in FIG. 3, the vector instruction generation means 121 includes address calculation means 1211, duplicate detection-mask creation means 1212, vector addition means 1213, and result invalid term detection means 1214.
The address calculation means 1211 calculates the addresses of the array referenced by indirect addressing.
The duplicate detection-mask creation means 1212 detects duplicates among the addresses calculated by the address calculation means 1211 and creates a vector mask.
The vector addition means 1213 performs addition between vectors based on the bits of the vector mask.
The result invalid term detection means 1214 checks whether a scalar recalculation instruction needs to be generated and branches the processing accordingly. That is, when the result invalid term detection means 1214 determines that a scalar recalculation instruction is needed, by detecting before the addition is performed that the result of the addition between the vectors will be invalid, it causes the vector addition means 1213 to generate the scalar recalculation instruction.
Next, the instructions generated by the vector instruction generation means 121 will be described, taking as an example the instruction sequence of the list summation operation generated by the compiler 1 according to an embodiment of the present invention shown in FIG. 4.
The address calculation means 1211 generates the instructions at line numbers 6 and 7 shown in FIG. 4, which calculate the address of each element of X(L(I)). The duplicate detection-mask creation means 1212 generates the newly added VFMD instruction at line number 8 shown in FIG. 4, which detects duplicates among the addresses of the elements of X(L(I)) stored in the vector register and generates a vector mask. The vector addition means 1213 generates the instructions at line numbers 9 through 12 shown in FIG. 4, which perform the vector addition X(L(I))+Y(I) and write the result to the memory indicated by X(L(I)). The result invalid term detection means 1214 generates the instructions at line numbers 13 and 14. These use the PCVM instruction to count the elements whose bit is 1 in the vector mask created by the duplicate detection-mask creation means 1212 and, if the count is not 0, that is, if there is even one duplicate address, branch to the processing for the scalar recalculation instructions.
Comparing the instruction sequence of the list summation operation generated by the compiler 1 according to an embodiment of the present invention shown in FIG. 4 with the instruction sequence generated using the LISTVEC directive method shown in FIG. 15, the following instructions, which were generated for address duplication detection in the sequence of FIG. 15, are not generated in the sequence of FIG. 4: VSEQ (vector sequential number instruction) (line number 1), VSC (line number 10), VGT (line number 11), VCMPS (vector compare instruction) (line number 15), and VFMK (vector form mask instruction) (line number 16). In their place, the new VFMD instruction (line number 8) is generated.
In the example instruction sequence generated using the LISTVEC directive method shown in FIG. 15, a VSFA instruction (Vector Shift Left and Add) is generated at line number 8 to calculate the address of each element of X(L(I)) and store it in a vector register. This example compiles 4-byte data, so the instruction "vsfa %v59,%v60,2,%s59" is generated. At the preceding line number 7, the elements of L(I) have been stored in vector register %v60. Each of these values is multiplied by the data size of 4 bytes, the address of X stored in scalar register %s59 is added to the product to obtain the address of X(L(I)), and the result is stored in vector register %v59.
In contrast, in the example instruction sequence of the list summation operation generated by the compiler 1 according to an embodiment of the present invention shown in FIG. 4, a new instruction is added. For vector register %v59, which holds the address of each element of X(L(I)), this instruction detects whether any elements have the same value, that is, whether the addresses of X(L(I)) contain duplicates, and creates a vector mask in which the bit at the index number of each duplicated element is 1. This new instruction is named VFMD (Vector Form Mask Duplicate), and an example of its mnemonic is shown in FIG. 5. As shown in FIG. 5, the VFMD instruction takes as its source the vector register VR0, which holds the address of each element of X(L(I)) calculated by the VSFA instruction described above, and stores the created vector mask in vector mask register VM0.
The compiler 1 according to an embodiment of the present invention has a function of generating this newly added vector mask creation instruction VFMD. For example, in place of the instruction sequence containing the VSC and VGT instructions that was generated to detect address duplication under the LISTVEC directive method, the compiler 1 generates the new VFMD instruction, thereby removing instructions with a high execution cost from the list summation operation and speeding up processing.
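The address computation attributed to the VSFA instruction above (shift each index left, then add the base address) can be illustrated as follows. This is a sketch under stated assumptions rather than the instruction's specification: the function name `vsfa`, the index values in `L`, and the base address `X_BASE` are hypothetical, and the operand order simply mirrors the example "vsfa %v59,%v60,2,%s59".

```python
def vsfa(index_vector, shift, base):
    """Model of shift-left-and-add: (index << shift) + base per element."""
    return [(index << shift) + base for index in index_vector]


# Hypothetical inputs, for illustration only.
L = [1, 2, 1, 1, 3]   # contents of %v60 (elements of L(I))
X_BASE = 0x1000       # contents of %s59 (address of X)

# A shift of 2 multiplies each index by the 4-byte data size.
addresses = vsfa(L, 2, X_BASE)
```

Equal index values always produce equal addresses, so any duplicates in `L` reappear as duplicates in `addresses`.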
Next, the processing of the arithmetic unit 20, which executes the object code (instructions) generated by the compiler 1, will be described.
Here, the list summation processing of the arithmetic unit 20 shown in FIG. 6 will be described with reference to FIGS. 7 to 11.
The list summation processing with the specific numerical values shown in FIGS. 7 to 11 is an example and does not limit the list summation processing of the arithmetic unit 20 according to an embodiment of the present invention.
Array X, array Y, and array L, which holds the indices into array X, are stored in the memory of the arithmetic unit 20. Each of the arrays X, Y, and L has N = 5 elements, and in the initial state the values shown in part (a) of FIG. 7 are stored in the memory of the arithmetic unit 20 as the elements of each array.
First, as shown in part (b) of FIG. 7, the arithmetic unit 20 reads each element of array L into vector register VR0.
Next, as shown in part (c) of FIG. 8, the arithmetic unit 20 calculates the address of each element of X(L(I)) (step S1) and stores it in vector register VR1.
Next, as shown in part (d) of FIG. 8, the arithmetic unit 20 uses the newly added VFMD instruction to detect duplicate values among the elements of vector register VR1 (step S2) and generates a vector mask in vector mask register VM0 (step S3). The arithmetic unit 20 compares the first element of vector register VR1, addr(X(1)), with the second and subsequent elements of VR1 in order. In the example shown in part (d) of FIG. 8, the third element of VR1 is addr(X(1)), which duplicates the first element addr(X(1)). The first element of VR1 is therefore judged to have a duplicate, and the bit for the first element of vector mask register VM0 is set to 1. The arithmetic unit 20 checks the second and subsequent elements of VR1 for duplicates in the same way, setting the corresponding bit of VM0 to 1 if there is a duplicate and to 0 if there is not.
Among the duplicated elements of vector register VR1, the bit of vector mask register VM0 corresponding to the element that appears last (the fourth element, addr(X(1)), in the example shown in part (d) of FIG. 8) is 0. This reflects the fact that, when the vector operation is performed, the result for the last element overwrites the earlier results and is the one reflected in memory, so that element does not need to be recalculated with scalar operations. As described later, the scalar recalculation by the arithmetic unit 20 corresponds to X(1)=X(1)+Y(1)+Y(3)+Y(4). Of the right-hand side, X(1)+Y(4)=9 has already been calculated and its result stored in vector register VR1, so this last element does not need to be recalculated. Setting its bit in VM0 to 0 keeps the scalar instruction for X(1)+Y(4) from being generated, which reduces the number of scalar recalculations.
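The duplicate-detection rule described above (a bit is 1 when the same value appears again in a later element, so the last occurrence always gets 0) can be sketched as below. This models only the described behavior; the function name `vfmd` is borrowed from the instruction name for readability, and the contents of elements 2 and 5 of `vr1` are assumed, since the text only states that elements 1, 3, and 4 hold addr(X(1)).

```python
def vfmd(vr):
    """Return a mask whose bit i is 1 when vr[i] appears again later in vr."""
    return [1 if vr[i] in vr[i + 1:] else 0 for i in range(len(vr))]


# Elements 1, 3 and 4 follow the worked example; elements 2 and 5 are assumed.
vr1 = ["addr(X(1))", "addr(X(2))", "addr(X(1))", "addr(X(1))", "addr(X(3))"]
vm0 = vfmd(vr1)
```

The fourth element is the last occurrence of addr(X(1)), so its bit stays 0 and only the first and third elements are flagged for scalar recalculation.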
Next, as shown in part (e) of FIG. 9, the arithmetic unit 20 reads the values of X(L(I)) into vector register VR2 and the values of Y(I) into vector register VR3. Then, as shown in part (f) of FIG. 9, the arithmetic unit 20 performs vector addition of the n-th element of vector register VR2 and the n-th element of vector register VR3 (step S4) and stores the result in vector register VR4. Here, n is an integer from 1 to 5.
Next, as shown in part (g) of FIG. 10, the arithmetic unit 20 writes the operation results to the memory indicated by X(L(I)). At this time, for element X(1), whose address is duplicated, the arithmetic unit 20 first writes the value 3 stored in the first element of vector register VR1, then overwrites it with the value 7 stored in the third element, and finally overwrites it with the value 9 stored in the fourth element, so that the last written value, 9, is what is reflected in memory.
If each element were instead calculated with scalar operations using X(L(I))=X(L(I))+Y(I), then X(1)=X(1)+Y(1)+Y(3)+Y(4), and the correct value of element X(1) is 2+1+5+7=15. The arithmetic unit 20 can determine from the address duplication that the operation result is invalid (step S5).
As shown in part (h) of FIG. 10, the arithmetic unit 20 counts the elements whose bit in vector mask register VM0 is 1 and, if the count is not 0, branches control to the scalar recalculation processing.
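The vector pass and the duplicate count described above can be traced with concrete numbers. The sketch below is illustrative only: X(1)=2, Y(1)=1, Y(3)=5, Y(4)=7, and L(1)=L(3)=L(4)=1 come from the text, while every other value is an assumption, and `X` is modeled as a dict keyed by the 1-based element number instead of by byte address.

```python
# Values for X(1), Y(1), Y(3), Y(4) and L(1), L(3), L(4) follow the text;
# all other values are assumed for illustration.
X = {1: 2, 2: 10, 3: 20}
Y = [1, 3, 5, 7, 9]
L = [1, 2, 1, 1, 3]
VM0 = [1, 0, 1, 0, 0]  # mask from the duplicate detection step

vr2 = [X[k] for k in L]                  # gather X(L(I))
vr3 = list(Y)                            # load Y(I)
vr4 = [a + b for a, b in zip(vr2, vr3)]  # element-wise vector addition

# Scatter in element order: later writes to the same element overwrite earlier ones.
for k, value in zip(L, vr4):
    X[k] = value

# Count the set mask bits (the role the text gives to PCVM) and decide the branch.
duplicate_count = sum(VM0)
needs_scalar_recalculation = duplicate_count != 0
```

After this pass, `X[1]` holds 9 instead of the correct 15, and the nonzero count signals that the scalar recalculation path must run.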
When the arithmetic unit 20 determines that the operation result is invalid (YES in step S5), it recalculates with scalar operations the elements whose bit in vector mask register VM0 is 1, as shown in part (i) of FIG. 11 (step S6).
First, the arithmetic unit 20 calculates X(L(1))+Y(1) for the first element. At this point, element X(L(1)) holds the vector result X(1)+Y(4)=9, so X(L(1))+Y(1) is 9+1=10, and the arithmetic unit 20 writes 10 to element X(L(1)), that is, element X(1).
Note that FIG. 11 shows the final result. Therefore, in FIG. 11, the value stored in X(1) is 15 rather than 10.
Next, the arithmetic unit 20 calculates X(L(3))=X(L(3))+Y(3) for the third element. Element X(L(3))=X(1) holds the value 10 obtained by the preceding scalar recalculation, so X(L(3))+Y(3) is 10+5=15, and the arithmetic unit 20 writes 15 to element X(L(3)), that is, element X(1). This scalar recalculation is repeated for every element whose bit in VM0 is 1, and as a result the invalid terms are corrected. In the example shown in FIG. 4, the scalar recalculation ends with the third element, and the final operation result is stored as shown in part (j) of FIG. 11. The value of the first element of array X, whose address was duplicated, is 15, which is the correct result.
When the arithmetic unit 20 determines that the operation result is not invalid, it ends the processing.
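The whole procedure (steps S1 to S6) can be put together and checked against a plain element-by-element loop. The following is a minimal sketch under stated assumptions, not the generated instruction sequence: the function names are hypothetical, `X` is a dict keyed by element number, the mask is computed on index values (equal indices imply equal addresses), and only X(1), Y(1), Y(3), Y(4), and L(1)=L(3)=L(4)=1 are taken from the text.

```python
def list_sum_scalar(x, y, l):
    """Reference: X(L(I)) = X(L(I)) + Y(I), one element at a time."""
    x = dict(x)
    for i in range(len(l)):
        x[l[i]] += y[i]
    return x


def vfmd(vr):
    """Bit i is 1 when vr[i] appears again in a later element."""
    return [1 if vr[i] in vr[i + 1:] else 0 for i in range(len(vr))]


def list_sum_vector(x, y, l):
    """Vector pass followed by scalar recalculation of flagged elements."""
    x = dict(x)
    mask = vfmd(l)                                 # S2, S3: detect duplicates
    gathered = [x[k] for k in l]                   # gather X(L(I))
    summed = [a + b for a, b in zip(gathered, y)]  # S4: vector addition
    for k, value in zip(l, summed):                # scatter, last write wins
        x[k] = value
    if sum(mask) != 0:                             # S5: any duplicates?
        for i, bit in enumerate(mask):             # S6: scalar recalculation
            if bit:
                x[l[i]] += y[i]
    return x


# Partly assumed data; see the lead-in for which values come from the text.
X = {1: 2, 2: 10, 3: 20}
Y = [1, 3, 5, 7, 9]
L = [1, 2, 1, 1, 3]

result = list_sum_vector(X, Y, L)
```

For this data the vector pass leaves X(1)=9, and the two flagged elements add Y(1) and Y(3) to reach 15, the same value the reference loop produces.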
The compiler 1 according to an embodiment of the present invention has been described above.
A compiler related to the present invention that uses, for example, the LISTVEC directive method generates, as shown in FIG. 15, the instructions VSEQ (vector sequential number instruction) (line number 1), VSC (line number 10), VGT (line number 11), VCMPS (vector compare instruction) (line number 15), and VFMK (vector form mask instruction) (line number 16) for address duplication detection. In contrast, in the compiler 1 according to an embodiment of the present invention, only the VFMD instruction is newly added (line number 8 in FIG. 4), and the compiler 1 functions as the vector instruction generation means 121 in the compiling device 10. The vector instruction generation means 121 generates instructions that perform vector addition over all elements and detect address duplication as part of that addition. The vector instruction generation means 121 includes address calculation means 1211, duplicate detection-mask creation means 1212, vector addition means 1213, and result invalid term detection means 1214. The address calculation means 1211 calculates the addresses of the array referenced by indirect addressing. The duplicate detection-mask creation means 1212 detects duplicates among the addresses calculated by the address calculation means 1211 and creates a vector mask. The vector addition means 1213 performs addition between vectors based on the bits of the vector mask. The result invalid term detection means 1214 checks whether a scalar recalculation instruction needs to be generated and branches the processing accordingly. That is, when the result invalid term detection means 1214 determines, based on the result of the addition between the vectors, that a scalar recalculation instruction is needed, it causes the vector addition means 1213 to generate the scalar recalculation instruction.
With the newly added VFMD instruction, the VSC and VGT instructions, which involve memory access and have a high execution cost, are no longer needed, and the processing can be done with the VFMD instruction alone, which accesses only registers. As a result, the compiler 1 in the compiling device 10 can shorten the execution time of a list summation operation even when elements are duplicated in the list summation operation.
Although the list summation operation has been described in one embodiment of the present invention, another embodiment of the present invention may be applied not to the list summation operation but to operations of the following form.
 X(L(I)) = X(L(I)) op expr
  op: an operation that can be executed as a vector operation
  expr: an expression that does not include a reference to X
In one embodiment of the present invention, the case where op is an addition instruction has been described as an example, but op may be any instruction other than addition, such as a subtraction instruction, as long as it can be executed as a vector operation.
By replacing the addition in the embodiment with another vector operation such as subtraction and applying the same compilation and computation, the VSC and VGT instructions, which involve memory access and have a high execution cost, are likewise eliminated for vector operations other than addition, and processing can be sped up.
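The generalized form X(L(I)) = X(L(I)) op expr can be sketched by passing the operation as a parameter. This is an illustration under assumptions, not part of the described embodiment: the function names and data are hypothetical, expr is simplified to Y(I), and the sketch relies on the updates to one element being reorderable, because the vector pass applies the last duplicate's update first. That holds for exact integer addition, subtraction, multiplication, and max, while floating-point results may differ in rounding.

```python
import operator


def vfmd(vr):
    """Bit i is 1 when vr[i] appears again in a later element."""
    return [1 if vr[i] in vr[i + 1:] else 0 for i in range(len(vr))]


def list_update_scalar(x, y, l, op):
    """Reference: X(L(I)) = X(L(I)) op Y(I), one element at a time."""
    x = dict(x)
    for i in range(len(l)):
        x[l[i]] = op(x[l[i]], y[i])
    return x


def list_update_vector(x, y, l, op):
    """Vector pass with op, then scalar recalculation of flagged elements."""
    x = dict(x)
    mask = vfmd(l)
    results = [op(x[k], v) for k, v in zip(l, y)]  # gather and apply op
    for k, value in zip(l, results):               # scatter, last write wins
        x[k] = value
    for i, bit in enumerate(mask):                 # scalar recalculation
        if bit:
            x[l[i]] = op(x[l[i]], y[i])
    return x


# Hypothetical data, for illustration only.
X = {1: 2, 2: 10, 3: 20}
Y = [1, 3, 5, 7, 9]
L = [1, 2, 1, 1, 3]

subtracted = list_update_vector(X, Y, L, operator.sub)
```

With subtraction, element 1 ends at 2-1-5-7 = -11 on both paths, even though the vector path subtracts Y(4) first.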
The compiling device 10 with the minimum configuration according to an embodiment of the present invention will be described.
As shown in FIG. 12, the compiling device 10 with the minimum configuration according to an embodiment of the present invention includes address calculation means 1211, duplicate detection-mask creation means 1212, vector addition means 1213, and result invalid term detection means 1214.
The address calculation means 1211 generates an instruction that calculates the addresses of the array referenced by indirect addressing.
The duplicate detection-mask creation means 1212 generates an instruction that detects duplicates among the addresses calculated by the address calculation means 1211 and creates a vector mask.
The vector addition means 1213 generates an instruction that performs an operation between vectors based on the bits of the vector mask.
The result invalid term detection means 1214 generates an instruction that causes the vector addition means 1213 to recalculate with scalar operations based on the result of the operation between the vectors.
Next, processing by the compiling device 10 with the minimum configuration according to an embodiment of the present invention will be described.
Here, the processing flow shown in FIG. 13 will be described.
The address calculation means 1211 generates an instruction that calculates the addresses of the array referenced by indirect addressing (step S11).
The duplicate detection-mask creation means 1212 generates an instruction that detects duplicates among the addresses calculated by the address calculation means 1211 and creates a vector mask (step S12).
The vector addition means 1213 generates an instruction that performs an operation between vectors based on the bits of the vector mask (step S13).
The result invalid term detection means 1214 generates an instruction that causes the vector addition means 1213 to recalculate with scalar operations based on the result of the operation between the vectors (step S14).
The compiling device 10 with the minimum configuration according to an embodiment of the present invention has been described above.
With this compiling device 10, the execution time of a list summation operation can be shortened even when elements are duplicated in the list summation operation.
In one embodiment of the present invention, the newly added VFMD instruction has been described as detecting duplication between elements using a vector register that stores the calculated addresses of X(L(I)). However, in another embodiment of the present invention, address duplication in the list summation operation may be detected by detecting duplication between the elements of the vector register into which the values of the array L(I) have been loaded. Taking the instruction sequence shown in FIG. 4 as an example, the values of the array L(I) are stored in the vector register %v60 at line number 6. By generating a VFMD instruction with %v60 as the source, as in "vfmd %vm15,%v60", the same vector mask as the vector mask in the one embodiment of the present invention can be created.
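The reason both variants give the same mask is that the address of X(L(I)) is a one-to-one function of the index value L(I), so two elements collide in address exactly when their indices collide. The short Python sketch below checks this on one example. The helper names `first_occurrence_mask` and `addresses_of`, the base address `0x1000`, the element size of 8 bytes, and the mask polarity are assumptions for illustration, not details taken from the embodiment.

```python
def first_occurrence_mask(values):
    """Assumed mask semantics: True for the first occurrence of a value,
    False for any element that repeats an earlier one."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v not in seen)
        seen.add(v)
    return mask


def addresses_of(indices, base=0x1000, elem_size=8):
    """Hypothetical address of X(L(I)): base plus index times element size."""
    return [base + elem_size * i for i in indices]


indices = [1, 3, 1, 1]  # values of L(I) loaded into one vector register
mask_from_indices = first_occurrence_mask(indices)
mask_from_addresses = first_occurrence_mask(addresses_of(indices))
```

Here `mask_from_indices` and `mask_from_addresses` are identical, so the duplicate check can be applied to the loaded index values without first computing the addresses.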
In one embodiment of the present invention, the compilation device 10 and the arithmetic unit 20 have been described as separate devices. However, in another embodiment of the present invention, the compilation device 10 and the arithmetic unit 20 may be housed in a single device that performs both the processing of the compilation device 10 and the processing of the arithmetic unit 20.
In the processing according to the embodiment of the present invention, the order of the processing steps may be changed as long as appropriate processing is still performed.
Although the embodiment of the present invention has been described, the above-mentioned compilation device 10, arithmetic unit 20, and other control devices may each contain a computer device. The steps of the processing described above are stored in a computer-readable recording medium in the form of a program, and the processing is performed when a computer reads and executes this program. A specific example of the computer is shown below.
FIG. 14 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
As shown in FIG. 14, the computer 5 includes a CPU 6 (including a vector processor), a main memory 7, a storage 8, and an interface 9.
For example, each of the above-mentioned compilation device 10, arithmetic unit 20, and other control devices is implemented on the computer 5. The operation of each processing unit described above is stored in the storage 8 in the form of a program. The CPU 6 reads the program from the storage 8, loads it into the main memory 7, and executes the above processing according to the program. The CPU 6 also reserves, in the main memory 7, a storage area corresponding to each of the above-mentioned storage units according to the program.
Examples of the storage 8 include an HDD (Hard Disk Drive), an SSD (Solid State Drive), a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), and a semiconductor memory. The storage 8 may be an internal medium directly connected to the bus of the computer 5, or an external medium connected to the computer 5 via the interface 9 or a communication line. When the program is distributed to the computer 5 via a communication line, the computer 5 that receives the distribution may load the program into the main memory 7 and execute the above processing. In at least one embodiment, the storage 8 is a non-transitory tangible storage medium.
The program may realize only some of the functions described above. Furthermore, the program may be a so-called difference file (difference program), that is, a file that realizes the functions described above in combination with a program already recorded in the computer device.
Although some embodiments of the present invention have been described, these embodiments are examples and do not limit the scope of the invention. Various additions, omissions, replacements, and changes may be made to these embodiments without departing from the gist of the invention.
This application claims priority based on Japanese Patent Application No. 2020-024338 filed on February 17, 2020, the entire disclosure of which is incorporated herein.
Each aspect of the present invention may be applied to a recording medium, a compilation device, a processing system, and a compilation method.
1 ... Compiler
5 ... Computer
6 ... CPU
7 ... Main memory
8 ... Storage
9 ... Interface
10 ... Compilation device
11 ... Code analysis means
12 ... Instruction generation means
20 ... Arithmetic unit
100 ... Processing system
111 ... Instruction line analysis means
112 ... List summation operation syntax analysis means
121 ... Vector instruction generation means
122 ... Scalar recalculation instruction generation means
1211 ... Address calculation means
1212 ... Duplicate detection-mask creation means
1213 ... Vector addition means
1214 ... Result invalid term detection means

Claims (7)

  1.  A recording medium recording a compiler that generates instructions for causing a computer of a compilation device to execute:
     calculating the address of an array that is referenced by indirect addressing;
     detecting duplicates among the calculated addresses and creating a vector mask;
     performing an operation between vectors based on the bits of the vector mask; and
     recalculating with scalar operations based on the result of the operation between the vectors.
  2.  The recording medium according to claim 1,
     wherein the operation is addition or subtraction.
  3.  The recording medium according to claim 1 or claim 2,
     wherein the instructions include a VFMD instruction for detecting duplicates among the addresses and creating the vector mask, and
     the VFMD instruction changes an access from a CPU of the computer to a memory into an access from the CPU to a register of the computer.
  4.  The recording medium according to claim 3,
     wherein, when there is an address duplication in the register, the VFMD instruction causes generation of a scalar recalculation instruction for recalculating the operation between the vectors with scalar operations.
  5.  A compilation device that comprises address calculation means, duplicate detection-mask creation means, vector addition means, and result invalid term detection means, and that generates instructions for executing an operation, wherein
     the address calculation means generates an instruction for calculating the address of an array that is referenced by indirect addressing,
     the duplicate detection-mask creation means generates an instruction for detecting duplicates among the calculated addresses and creating a vector mask,
     the vector addition means generates an instruction for performing an operation between vectors based on the bits of the vector mask, and
     the result invalid term detection means generates an instruction for recalculating with scalar operations based on the result of the operation between the vectors.
  6.  A processing system comprising:
     the compilation device according to claim 5; and
     an arithmetic unit that performs operations according to the instructions generated by the compilation device.
  7.  A compilation method comprising:
     calculating the address of an array that is referenced by indirect addressing;
     detecting duplicates among the calculated addresses and creating a vector mask;
     performing an operation between vectors based on the bits of the vector mask; and
     recalculating with scalar operations based on the result of the operation between the vectors.
PCT/JP2021/005479 2020-02-17 2021-02-15 Recording medium, compiling device, processing system, and compiling method WO2021166840A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022501869A JP7513080B2 (en) 2020-02-17 2021-02-15 Processing system and processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020024338 2020-02-17
JP2020-024338 2020-02-17

Publications (1)

Publication Number Publication Date
WO2021166840A1 true WO2021166840A1 (en) 2021-08-26

Family

ID=77391169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/005479 WO2021166840A1 (en) 2020-02-17 2021-02-15 Recording medium, compiling device, processing system, and compiling method

Country Status (2)

Country Link
JP (1) JP7513080B2 (en)
WO (1) WO2021166840A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04127366A (en) * 1990-09-19 1992-04-28 Koufu Nippon Denki Kk List vector processor
JPH0554059A (en) * 1991-08-29 1993-03-05 Nec Corp Vector processor
JPH11242598A (en) * 1998-02-24 1999-09-07 Fujitsu Ltd Compiling method and device, object program executing method and device and program storage medium
JP2003150577A (en) * 2001-11-08 2003-05-23 Japan Atom Energy Res Inst High speed processing method of addition including indirect address reference on vector computer, program and vector computer using this program
US20070283127A1 (en) * 2003-08-18 2007-12-06 Cray Inc. Method and apparatus for indirectly addressed vector load-add-store across multi-processors
US20160092285A1 (en) * 2014-09-25 2016-03-31 Intel Corporation Method and Apparatus for Approximating Detection of Overlaps Between Memory Ranges

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TOORU SUGIYAMA, NAOKI TERADA, TAKESHI MURATA, YOSHIHARU OMURA, HIDEYUKI USUI, HIROSHI MATSUMOTO: "Vectorized Particle Simulation Using "LISTVEC" Compile-directive on SX Super-Computer", JOHO SHORI GAKKAI RONBUNSHI - TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN., vol. 45, no. SIG06 (ACS6), 15 May 2004 (2004-05-15), JP, pages 171 - 175, XP009530679, ISSN: 0387-5806 *

Also Published As

Publication number Publication date
JP7513080B2 (en) 2024-07-09
JPWO2021166840A1 (en) 2021-08-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21756941

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022501869

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21756941

Country of ref document: EP

Kind code of ref document: A1