WO2021166840A1

WO2021166840A1 - Recording medium, compiling device, processing system, and compiling method

Info

Publication number: WO2021166840A1
Application number: PCT/JP2021/005479
Authority: WO
Inventors: 敏也平田
Original assignee: 日本電気株式会社
Priority date: 2020-02-17
Filing date: 2021-02-15
Publication date: 2021-08-26
Also published as: JP7513080B2; JPWO2021166840A1

Abstract

This recording medium records a compiler that is executed to cause a computer of a compiling device to generate instructions to execute: calculation of the address of an array to be referred to by indirect addressing; generation of a vector mask by detecting duplication of the calculated address; calculation between vectors on the basis of the bits in the vector mask; and recalculation using a scalar on the basis of the calculation results between the vectors.

Description

Recording medium, compilation device, processing system and compilation method

The present invention relates to a recording medium, a compiling device, a processing system, and a compiling method.

In the field of large-scale numerical analysis and simulation (for example, the field of handling AI and big data), information processing may be performed using a computer having vector instructions.
Patent Document 1 discloses, as a related technique, a technique relating to an apparatus and a method for managing address collisions when performing a vector operation.
Patent Document 2 discloses a technique for handling a list vector on a vector register as a related technique.

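Patent Document 1: Japanese Translation of PCT International Application Publication No. 2019-517060
Patent Document 2: Japanese Unexamined Patent Application Publication No. 4-127367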

A summation operation that includes an indirect address reference, such as the following, is called a list summation operation.
DO I = 1, N
  X(L(I)) = X(L(I)) + Y(I)
ENDDO
In this list summation operation, the same value may appear more than once in L(I), and there is a dependency between the definition and the reference of X(L(I)). Therefore, when L(I) may contain duplicate values, the loop cannot simply be executed as a vector operation, and it is processed sequentially using scalar instructions.
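The dependency described above can be illustrated with a small sketch. The Python below is not part of the original disclosure: the function names and all array values are hypothetical, and indices are 0-based. It compares the sequential loop with a gather, add, scatter sequence that ignores duplicates.

```python
def list_sum_scalar(x, l, y):
    """Sequential reference: X(L(I)) = X(L(I)) + Y(I), one element at a time."""
    x = list(x)
    for i in range(len(l)):
        x[l[i]] += y[i]
    return x


def list_sum_naive_vector(x, l, y):
    """Gather, add, scatter with no duplicate handling (models a plain vector operation)."""
    x = list(x)
    gathered = [x[idx] for idx in l]               # gather X(L(I)) for every lane at once
    summed = [g + v for g, v in zip(gathered, y)]  # lane-wise addition
    for idx, s in zip(l, summed):                  # scatter; the last write to an index wins
        x[idx] = s
    return x


# Hypothetical data: index 0 appears three times in the list vector.
x = [2, 10, 20]
l = [0, 1, 0, 0, 2]
y = [1, 2, 5, 7, 3]

scalar_result = list_sum_scalar(x, l, y)
naive_result = list_sum_naive_vector(x, l, y)
print(scalar_result)  # [15, 12, 23]
print(naive_result)   # [9, 12, 23]
```

Every lane of the naive version reads the old value of the duplicated element, so only the contribution of the last duplicate survives.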

For example, the LISTVEC directive method (also called the LISTVEC instruction line method) described in the non-patent document (Sugiyama, T., N. Terada, T. Murata, Y. Omura, H. Usui, and H. Matsumoto, "Vectorized Computer Simulation Directing" LISTVEC, IPSJ journal, 45, SIG 6 (ACS 6), p. 171 (2004)) first performs the vector operation without considering duplication, then executes generated instructions that detect duplication, and finally recalculates the duplicated elements with a scalar. With this method, the fewer the duplicates, the more effective the vectorization and the faster the processing. FIG. 15 shows an example of an instruction sequence generated for the list summation operation when this method is used. In the instruction sequence example shown in FIG. 15, the elements of the array X to be added are divided into chunks of the maximum vector length of 256 and processed in a loop. However, as shown in line numbers 10 and 11 of FIG. 15, a compiler using the LISTVEC instruction line method generates a vector scatter instruction (hereinafter referred to as a VSC instruction) and a vector gather instruction (hereinafter referred to as a VGT instruction) in the process for detecting duplication. In general, VSC and VGT instructions have a high execution cost, and with the LISTVEC instruction line method these instructions are executed on every iteration of the loop, so the execution time of the list summation operation becomes long.
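One way a scatter and a gather can expose duplicates is sketched below. The text names the instructions involved (VSEQ, VSC, VGT, VCMPS, VFMK) but does not spell out how they are combined, so the combination shown here is an assumption, and the function name, the `work` buffer, and the data are hypothetical.

```python
def detect_duplicates_scatter_gather(l, table_size):
    """Return a 0/1 mask; 1 marks a lane whose target index is written again by a later lane."""
    n = len(l)
    seq = list(range(n))                 # lane numbers 0, 1, ..., n-1 (role assumed for VSEQ)
    work = [None] * table_size           # stand-in for the memory that is scattered to
    for i in range(n):                   # scatter the lane numbers through the list vector
        work[l[i]] = seq[i]              # (role assumed for VSC); later lanes overwrite earlier ones
    readback = [work[idx] for idx in l]  # gather through the same list vector (role assumed for VGT)
    # Compare and form the mask (roles assumed for VCMPS and VFMK):
    # a lane that does not read back its own number was overwritten by a later duplicate.
    return [1 if readback[i] != seq[i] else 0 for i in range(n)]


# Hypothetical list vector, 0-based; index 0 appears in lanes 0, 2 and 3.
listvec_mask = detect_duplicates_scatter_gather([0, 1, 0, 0, 2], table_size=3)
print(listvec_mask)  # [1, 0, 1, 0, 0]
```

Both the scatter and the gather touch memory on every loop iteration, which is the cost the passage above refers to.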
Therefore, there is a demand for a technique capable of shortening the execution time of the list summation operation even when the elements are duplicated in the list summation operation.

Each aspect of the present invention aims to provide a recording medium, a compiling device, a processing system, and a compiling method capable of solving the above problems.

In order to achieve the above object, according to one aspect of the present invention, a recording medium records a compiler that causes a computer of a compiling device to generate instructions for executing: calculating the address of an array referred to by an indirect address; detecting duplication of the calculated address and creating a vector mask; performing an operation between vectors based on the bits of the vector mask; and recalculating with a scalar based on the result of the operation between the vectors.

In order to achieve the above object, according to another aspect of the present invention, a compiling device comprises an address calculation means, a duplicate detection-mask creation means, a vector addition means, and a result invalid term detection means, each of which generates instructions to be executed. The address calculation means generates an instruction to calculate the address of an array referred to by an indirect address. The duplicate detection-mask creation means generates an instruction to detect duplication of the calculated address and create a vector mask. The vector addition means generates an instruction to execute an operation between vectors based on the bits of the vector mask. The result invalid term detection means generates an instruction to execute recalculation with a scalar based on the result of the operation between the vectors.

In order to achieve the above object, according to another aspect of the present invention, the processing system includes the above compilation device and an arithmetic unit that performs arithmetic operations according to the instructions generated by the compilation device.

In order to achieve the above object, according to another aspect of the present invention, a compiling method includes: calculating the address of an array referred to by an indirect address; detecting duplication of the calculated address and creating a vector mask; performing an operation between vectors based on the bits of the vector mask; and recalculating with a scalar based on the result of the operation between the vectors.

According to each aspect of the present invention, the execution time of the list summation operation can be shortened even when the elements are duplicated in the list summation operation.

FIG. 1 is a diagram showing an example of the configuration of a processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the functions realized by the compiler in the compilation device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the vector instruction generation means according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the instruction sequence of the list summation operation generated by the compiler according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of the mnemonic according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the processing flow of the arithmetic unit according to an embodiment of the present invention.
FIG. 7 is a first diagram for explaining the processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 8 is a second diagram for explaining the processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 9 is a third diagram for explaining the processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 10 is a fourth diagram for explaining the processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 11 is a fifth diagram for explaining the processing by the arithmetic unit according to an embodiment of the present invention.
FIG. 12 is a diagram showing the compilation device of the minimum configuration according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example of the processing flow of the compilation device of the minimum configuration according to an embodiment of the present invention.
FIG. 14 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
FIG. 15 is a diagram showing an example of the instruction sequence of the list summation operation generated by a compiler related to the present invention.

Hereinafter, embodiments will be described in detail with reference to the drawings.
<Embodiment>
The compiler 1 according to the embodiment of the present invention is a compiler that, in the compilation device 10, has a vector instruction generation function for high-speed processing of a summation operation that includes an indirect address reference. Specifically, the compiler 1 uses a newly added VFMD instruction, which detects elements having the same value within a single vector register and creates a vector mask, to generate a fast instruction sequence that contains fewer of the costly instructions involving indirect access.
The hardware that operates according to the instruction sequence generated by the compiler 1 is the arithmetic unit 20, which has a vector processor. Its instruction set includes a vector gather instruction, which loads into a vector register the data at the memory addresses stored in the elements of the vector register specified as the list vector, and a vector scatter instruction, which stores the data held in a vector register to the destination memory addresses stored in the elements of the vector register specified as the list vector.
As shown in FIG. 1, the processing system 100 according to the embodiment of the present invention includes a compilation device 10 and an arithmetic unit 20.

The compiler 1 generates object code (instructions) from the source program in the compilation device 10. As shown in FIG. 2, the compiler 1 functions as a code analysis means 11 and an instruction generation means 12 in the compilation device 10.

The code analysis means 11 is a means for analyzing a program and determining whether or not to perform vectorization of the list summation operation. The code analysis means 11 includes an instruction line analysis means 111 and a list summation operation syntax analysis means 112.
The instruction line analysis means 111 analyzes whether or not an instruction line that allows vectorization of the list summation operation is specified.
When the instruction line analysis means 111 finds that an instruction line permitting vectorization of the list summation operation is specified, the list summation operation syntax analysis means 112 analyzes whether the list summation operation to which the instruction line applies is in a vectorizable format.

The instruction generation means 12 is a means for generating a vectorization code when the code analysis means 11 determines that the list summation operation is vectorized based on the analysis result. The instruction generation means 12 includes a vector instruction generation means 121 and a scalar recalculation instruction generation means 122.
The vector instruction generation means 121 performs vector addition of all elements, and generates an instruction for detecting an address duplication at the time of the addition.
The scalar recalculation instruction generation means 122 generates an instruction (hereinafter, referred to as “scalar recalculation instruction”) for recalculating an element whose result is invalid due to duplication of addresses.

The code analysis means 11 and the scalar recalculation instruction generation means 122 are means that a compiler using the LISTVEC instruction line method also has, and they are described in, for example, Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools (2nd Edition)", (USA), Pearson Education, Inc, 2007, pp. 1-581.

As shown in FIG. 3, the vector instruction generation means 121 includes an address calculation means 1211, a duplicate detection-mask creation means 1212, a vector addition means 1213, and a result invalid term detection means 1214.
The address calculation means 1211 calculates the address of the array referred to by the indirect address.
The duplicate detection-mask creation means 1212 detects duplication of the addresses calculated by the address calculation means 1211 and creates a vector mask.
The vector addition means 1213 calculates the addition of vectors based on the bits of the vector mask.
The result invalid term detection means 1214 checks whether a scalar recalculation instruction needs to be generated and branches the process accordingly. That is, when the result invalid term detection means 1214 detects, before the addition is performed, that the result of the addition between the vectors will be invalid and therefore determines that a scalar recalculation instruction must be generated, it causes the vector addition means 1213 to execute the generation of the scalar recalculation instruction.

Next, the instructions generated by the vector instruction generation means 121 will be described by taking as an example the instruction sequence of the list summation operation generated by the compiler 1 according to the embodiment of the present invention shown in FIG.
The address calculation means 1211 generates the instructions at line numbers 6 and 7 shown in FIG. 4, which calculate the address of each element of X(L(I)). The duplicate detection-mask creation means 1212 generates the newly added VFMD instruction at line number 8 shown in FIG. 4, which detects duplicate addresses among the elements of X(L(I)) stored in the vector register and generates a vector mask. The vector addition means 1213 generates the instructions at line numbers 9 to 12 shown in FIG. 4, which perform the vector addition X(L(I)) + Y(I) and write the result to the memory indicated by X(L(I)). The result invalid term detection means 1214 generates the instructions at line numbers 13 and 14, which use the PCVM instruction to count the number of elements whose bit is 1 in the vector mask created by the duplicate detection-mask creation means 1212. If the count is not 0, that is, if there is at least one duplicate address, the process branches to the scalar recalculation instruction.

Compare the instruction sequence of the list summation operation generated by the compiler 1 according to the embodiment of the present invention shown in FIG. 4 with the instruction sequence generated using the LISTVEC instruction line method shown in FIG. 15. The instructions generated for address duplication detection in the instruction sequence shown in FIG. 15, namely VSEQ (vector sequential number instruction) (line number 1), VSC (line number 10), VGT (line number 11), VCMPS (vector compare instruction) (line number 15), and VFMK (vector form mask instruction) (line number 16), are not generated in the instruction sequence shown in FIG. 4. Instead of those instructions, the new VFMD instruction (line number 8) is generated.

In the example instruction sequence generated using the LISTVEC instruction line method shown in FIG. 15, a VSFA (Vector Shift Left and Add) instruction is generated at line number 8; it calculates the address of each element of X(L(I)) and stores it in a vector register. In this example, 4-byte data is the target of compilation, and the instruction "vsfa %v59, %v60, 2, %s59" is generated. The value of each element of L(I), which was stored in the vector register %v60 at the preceding line number 7, is multiplied by the data size of 4 bytes, and the address of X stored in the scalar register %s59 is added to the product to calculate the address of X(L(I)), which is stored in the vector register %v59.
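The address arithmetic described for the VSFA instruction (shift left by 2, which is multiplication by 4, then add the base address) can be sketched as follows. This is an illustrative model only: the function name, the base address, and the index values are hypothetical, and the exact contents of the scalar register in the real instruction sequence are not given in the text.

```python
def vsfa(indices, shift, base):
    """Lane-wise (index << shift) + base, modelling the shift-left-and-add address calculation."""
    return [(v << shift) + base for v in indices]


base_x = 0x1000             # hypothetical value held in the scalar register
l_values = [1, 2, 1, 1, 3]  # hypothetical contents of the list vector
addrs = vsfa(l_values, 2, base_x)
print([hex(a) for a in addrs])  # ['0x1004', '0x1008', '0x1004', '0x1004', '0x100c']
```

Lanes that hold the same index produce the same address, which is what makes duplicate detection on the address register possible.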
On the other hand, for the instruction sequence of the list summation operation generated by the compiler 1 according to the embodiment of the present invention shown in FIG. 4, a new instruction is added that examines the vector register %v59, which stores the addresses of the elements of X(L(I)), detects whether any elements have the same value, that is, whether any addresses of X(L(I)) are duplicated, and creates a vector mask in which the bit at the index of each duplicated element is 1. This new instruction is named VFMD (Vector Form Mask Duplicate), and an example of its mnemonic is shown in FIG. 5. As shown in FIG. 5, the VFMD instruction takes as its source the vector register VR0, which stores the addresses of the elements of X(L(I)) calculated by the VSFA instruction, and stores the created vector mask in the vector mask register VM0.
The compiler 1 according to the embodiment of the present invention has a function of generating the newly added vector mask creation instruction VFMD. By generating the new VFMD instruction in place of the instruction sequence, including the VSC and VGT instructions, that is generated for detecting address duplication under the LISTVEC instruction line method, the compiler 1 reduces the instructions with a high execution cost in the list summation operation and achieves faster processing.

Next, the processing of the arithmetic unit 20 that executes the object code (instruction) generated by the compiler 1 will be described.
Here, the process of the list summation calculation of the arithmetic unit 20 shown in FIG. 6 will be described with reference to FIGS. 7 to 11.
The process of the list summation calculation using the specific numerical values shown in FIGS. 7 to 11 is an example, and does not limit the process of the list summation calculation of the arithmetic unit 20 according to the embodiment of the present invention.

The array X, the array Y, and the array L that is the index of the array X are recorded in the memory of the arithmetic unit 20. The number of elements N of these arrays X, Y, and L is 5, and it is assumed that, in the initial state, the numerical values shown in part (a) of FIG. 7 are stored in the memory of the arithmetic unit 20 as the elements of each array.

First, the arithmetic unit 20 reads each element of the array L into the vector register VR0 as shown in the part (b) of FIG.

Next, as shown in part (c) of FIG. 8, the arithmetic unit 20 calculates the address of each element of the array X(L(I)) (step S1) and stores it in the vector register VR1.

Next, as shown in part (d) of FIG. 8, the arithmetic unit 20 uses the newly added VFMD instruction to detect duplication among the values of the elements of the vector register VR1 (step S2) and generates a vector mask in the vector mask register VM0 (step S3). The arithmetic unit 20 compares the first element adr(X(1)) of the vector register VR1 with the second and subsequent elements of the vector register VR1 in order. In the example shown in part (d) of FIG. 8, the third element of the vector register VR1 is adr(X(1)), which duplicates the first element adr(X(1)). Therefore, the first element of the vector register VR1 is determined to be duplicated, and the bit of the vector mask register VM0 corresponding to the first element is set to 1. The arithmetic unit 20 determines duplication for the second and subsequent elements of the vector register VR1 in the same way, setting the corresponding bit of the vector mask register VM0 to 1 if there is duplication and to 0 if there is none. Among the duplicated elements of the vector register VR1, the bit of the vector mask register VM0 corresponding to the element that appears last (in the example shown in part (d) of FIG. 8, the fourth element, which refers to X(1)) is 0. This indicates that, when the vector operation is performed, the operation result of this last element overwrites the earlier results and is the one reflected in memory, so it does not need to be recalculated with a scalar. As will be described later, when the arithmetic unit 20 recalculates with a scalar, the equation is X(1) = X(1) + Y(1) + Y(3) + Y(4). Of its right side, X(1) + Y(4) = 9 has already been calculated by the vector operation and stored, so this last element does not need to be recalculated with a scalar. Therefore, the corresponding bit of VM0 is set to 0, no scalar instruction for X(1) + Y(4) is generated, and the number of scalar recalculations is reduced.
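The mask rule described here (a bit is 1 when the same value appears again in a later element, and the last occurrence gets 0) can be modelled with the following sketch. It models the described result only; how the hardware computes the mask is not stated in the text, and the function name and the address values are hypothetical.

```python
def vfmd(values):
    """Return a 0/1 mask; bit i is 1 when values[i] also appears at some later position."""
    n = len(values)
    mask = [0] * n
    for i in range(n):
        for j in range(i + 1, n):  # compare element i with every subsequent element
            if values[j] == values[i]:
                mask[i] = 1        # a later duplicate exists, so this lane needs recalculation
                break
    return mask


# Hypothetical addresses: lanes 0, 2 and 3 all refer to the same element.
vm0 = vfmd([0x1004, 0x1008, 0x1004, 0x1004, 0x100C])
print(vm0)  # [1, 0, 1, 0, 0]
```

Lane 3 is the last occurrence of the duplicated address, so its bit stays 0 and its vector result is kept as is.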

Next, as shown in part (e) of FIG. 9, the arithmetic unit 20 reads the values of the array X(L(I)) into the vector register VR2 and the values of the array Y(I) into the vector register VR3. Then, as shown in part (f) of FIG. 9, the arithmetic unit 20 performs vector addition of the nth element of the vector register VR2 and the nth element of the vector register VR3 (step S4) and stores the result in the vector register VR4. Here, n is an integer from 1 to 5.

Next, the arithmetic unit 20 writes the operation results to the memory indicated by the array X(L(I)), as shown in part (g) of FIG. At this time, the arithmetic unit 20 first writes the first stored value 3 of the vector register VR4 to the element X(1), which has the duplicated address, then overwrites it with the third stored value 7, and then with the fourth stored value 9, so that the last written value 9 is the one reflected in the memory.
When each element is calculated with a scalar using X(L(I)) = X(L(I)) + Y(I), the element X(1) becomes X(1) = X(1) + Y(1) + Y(3) + Y(4), so 2 + 1 + 5 + 7 = 15 is the correct value of the element X(1). The value 9 reflected in the memory differs from this, and the arithmetic unit 20 can determine that the operation result is invalid (inappropriate) because of the duplicated addresses (step S5).
As shown in part (h) of FIG. 10, the arithmetic unit 20 counts the number of elements whose bit in the vector mask register VM0 is 1 and controls branching so that, if the count is not 0, processing proceeds to recalculation with a scalar.

When the arithmetic unit 20 determines that the operation result is invalid (inappropriate) (YES in step S5), it recalculates with a scalar (step S6) the elements whose bit in the vector mask register VM0 is 1, as shown in part (i) of FIG.

First, the arithmetic unit 20 calculates X(L(1)) + Y(1) for the first element. At this point the element X(L(1)) holds the vector operation result X(1) + Y(4) = 9, so the result of X(L(1)) + Y(1) is 9 + 1 = 10, and the arithmetic unit 20 writes 10 to the element X(L(1)), that is, the element X(1).
Note that FIG. 11 shows the final result. Therefore, in FIG. 11, the value stored in X (1) is 15 instead of 10.

Next, the arithmetic unit 20 calculates X(L(3)) = X(L(3)) + Y(3) for the third element. Since the element X(L(3)) = X(1) holds the value 10 obtained by the scalar recalculation above, the result of X(L(3)) + Y(3) is 10 + 5 = 15, and the arithmetic unit 20 writes 15 to the element X(L(3)), that is, the element X(1). The scalar calculation is repeated in this way for each element whose bit in VM0 is 1, which corrects the invalid term. In the example shown in FIG. 4, the scalar recalculation is completed at the third element, and the final result is stored as shown in part (j) of FIG. The value of the first element of the array X, whose address was duplicated, is 15, which is the correct result.
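The whole walk-through (steps S1 to S6) can be traced with the sketch below. It is a simplified model, not the generated instruction sequence: index values stand in for addresses, storage is 0-based, and the function names are hypothetical. The values X(1) = 2, Y(1) = 1, Y(3) = 5, Y(4) = 7 and L(1) = L(3) = L(4) = 1 follow the text; all other array values are assumed for illustration.

```python
def vfmd(values):
    """Mask bit is 1 when the same value appears again at a later position."""
    return [1 if v in values[i + 1:] else 0 for i, v in enumerate(values)]


def list_sum(x, l, y):
    """Vector pass over all elements, then scalar fix-up of the masked elements."""
    x = list(x)
    trace = []
    mask = vfmd(l)                                 # steps S2 and S3: build the vector mask
    gathered = [x[idx] for idx in l]               # load X(L(I)) for every lane
    summed = [g + v for g, v in zip(gathered, y)]  # step S4: vector addition
    for idx, s in zip(l, summed):                  # write back; the last write wins
        x[idx] = s
    trace.append(("after vector pass", list(x)))
    if sum(mask) != 0:                             # step S5: count the mask bits and branch
        for i, bit in enumerate(mask):             # step S6: scalar recalculation
            if bit:
                x[l[i]] += y[i]
                trace.append((f"scalar fix-up of element {i + 1}", list(x)))
    return x, mask, trace


x0 = [2, 10, 20]       # X(1) = 2 from the text; the rest assumed
l0 = [0, 1, 0, 0, 2]   # L(1) = L(3) = L(4) = 1 from the text (0-based here); the rest assumed
y0 = [1, 2, 5, 7, 3]   # Y(1) = 1, Y(3) = 5, Y(4) = 7 from the text; the rest assumed

result, mask, trace = list_sum(x0, l0, y0)
for label, state in trace:
    print(label, state)
```

Under these assumptions the first element goes from 9 after the vector pass to 10 and then 15 after the two scalar fix-ups, matching the values in the description.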

If the arithmetic unit 20 determines that the arithmetic result is not an invalid result (inappropriate result), the process ends.

The compiler 1 according to the embodiment of the present invention has been described above.
In a compiler using, for example, the LISTVEC instruction line method related to the present invention, VSEQ (vector sequential number instruction) (line number 1), VSC (line number 10), VGT (line number 11), VCMPS (vector compare instruction) (line number 15), and VFMK (vector form mask instruction) (line number 16) are generated for address duplication detection, as shown in FIG. 15. In the compiler 1 according to one embodiment, only the new VFMD instruction is added in their place (line number 8 in FIG. 4), and the compiler 1 functions as the vector instruction generation means 121 in the compilation device 10. The vector instruction generation means 121 performs vector addition of all elements and generates an instruction for detecting address duplication at the time of the addition. The vector instruction generation means 121 includes an address calculation means 1211, a duplicate detection-mask creation means 1212, a vector addition means 1213, and a result invalid term detection means 1214. The address calculation means 1211 calculates the address of the array referred to by the indirect address. The duplicate detection-mask creation means 1212 detects duplication of the addresses calculated by the address calculation means 1211 and creates a vector mask. The vector addition means 1213 calculates the addition of vectors based on the bits of the vector mask. The result invalid term detection means 1214 checks whether a scalar recalculation instruction needs to be generated and branches the process accordingly. That is, when the result invalid term detection means 1214 determines, based on the result of the addition between the vectors, that a scalar recalculation instruction needs to be generated, it causes the vector addition means 1213 to generate the scalar recalculation instruction.
As described above, by newly adding the VFMD instruction, the VSC and VGT instructions, which have a high execution cost because they involve memory access, are eliminated, and processing can be performed with the VFMD instruction, which involves only register access. As a result, the compiler 1 can shorten the execution time of the list summation operation in the compilation device 10 even when elements are duplicated in the list summation operation.

In one embodiment of the present invention, the list summation calculation has been described, but in another embodiment of the present invention, it may be applied to the following type of calculation instead of the list summation calculation.
X(L(I)) = X(L(I)) op expr
op: an instruction capable of vector operation
expr: an expression that does not include a reference to X

In one embodiment of the present invention, the case where the op part is an addition instruction has been described as an example, but the op part may be other than an addition instruction as long as it is an instruction capable of vector operation such as a subtraction instruction.
By replacing the addition in one embodiment of the present invention with another vector operation such as subtraction and applying the same compilation and processing, the VSC and VGT instructions, which have a high execution cost because they involve memory access, are likewise eliminated for vector operations other than addition, and the processing can be sped up.
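A minimal sketch of the generalized form X(L(I)) = X(L(I)) op expr is given below, with the operation passed in as a parameter. All names and data are hypothetical, storage is 0-based, and index values stand in for addresses. Note that the fix-up applies the duplicated updates in a different order than the sequential loop, so the sketch assumes an operation for which that order does not change the result (true for the integer subtraction and multiplication shown here).

```python
import operator


def vfmd(values):
    """Mask bit is 1 when the same value appears again at a later position."""
    return [1 if v in values[i + 1:] else 0 for i, v in enumerate(values)]


def list_update(x, l, expr, op):
    """Vector pass with `op`, then scalar fix-up of the lanes flagged by the mask."""
    x = list(x)
    mask = vfmd(l)
    results = [op(x[idx], e) for idx, e in zip(l, expr)]  # lane-wise X(L(I)) op expr
    for idx, r in zip(l, results):                        # write back; the last write wins
        x[idx] = r
    for i, bit in enumerate(mask):                        # scalar recalculation of flagged lanes
        if bit:
            x[l[i]] = op(x[l[i]], expr[i])
    return x


def list_update_scalar(x, l, expr, op):
    """Sequential reference loop."""
    x = list(x)
    for idx, e in zip(l, expr):
        x[idx] = op(x[idx], e)
    return x


x = [100, 10, 20]
l = [0, 1, 0, 0, 2]
expr = [1, 2, 5, 7, 3]

sub_result = list_update(x, l, expr, operator.sub)
mul_result = list_update(x, l, expr, operator.mul)
print(sub_result)  # [87, 8, 17]
print(mul_result)  # [3500, 20, 60]
```

Both results agree with the sequential reference loop for this data.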

The minimum configuration compilation device 10 according to the embodiment of the present invention will be described.
As shown in FIG. 12, the minimum configuration compiling device 10 according to the embodiment of the present invention includes address calculation means 1211, duplication detection-mask creation means 1212, vector addition means 1213, and result invalid term detection means 1214.
The address calculation means 1211 generates an instruction to execute the calculation of the address of the array to which the indirect address is referenced.
The duplicate detection-mask creation means 1212 generates an instruction to detect duplication of the address calculated by the address calculation means 1211 and create a vector mask.
The vector addition means 1213 generates an instruction to execute an operation between the vectors based on the bits of the vector mask.
The result invalid term detecting means 1214 generates an instruction for causing the vector adding means 1213 to recalculate with a scalar based on the calculation result of the vectors.

Next, processing by the compilation device 10 having the minimum configuration according to the embodiment of the present invention will be described.
Here, the processing flow shown in FIG. 13 will be described.
The address calculation means 1211 generates an instruction for executing the calculation of the address of the array referred to by the indirect address (step S11).
The duplicate detection-mask creation means 1212 generates an instruction to detect duplication of the address calculated by the address calculation means 1211 and create a vector mask (step S12).
The vector addition means 1213 generates an instruction to execute an operation between the vectors based on the bits of the vector mask (step S13).
The result invalid term detecting means 1214 generates an instruction to cause the vector adding means 1213 to recalculate with a scalar based on the calculation result of the vectors (step S14).

The compilation device 10 having the minimum configuration according to the embodiment of the present invention has been described above.
With this compile device 10, the execution time of the list summation operation can be shortened even when the elements are duplicated in the list summation operation.

In one embodiment of the present invention, the newly added VFMD instruction has been described as detecting duplication between elements using a vector register that stores the calculated addresses of X(L(I)). However, in another embodiment of the present invention, address duplication in the list summation operation may be detected by detecting duplication between the elements of the vector register loaded with the values of the array L(I). Taking the instruction sequence shown in FIG. 4 as an example, the values of the array L(I) are stored in the vector register %v60 at line number 6; by generating the VFMD instruction "vfmd %vm15, %v60" there, with %v60 as the source, the same vector mask as in the one embodiment of the present invention can be created.
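The equivalence claimed here follows because the address calculation maps distinct index values to distinct addresses, so two lanes match in one register exactly when they match in the other. A small check, with a hypothetical mask function, base address, and index values:

```python
def vfmd(values):
    """Mask bit is 1 when the same value appears again at a later position."""
    return [1 if v in values[i + 1:] else 0 for i, v in enumerate(values)]


l_values = [1, 2, 1, 1, 3]                         # hypothetical contents of the list vector
addresses = [(v << 2) + 0x1000 for v in l_values]  # hypothetical base address, 4-byte elements

mask_from_indices = vfmd(l_values)
mask_from_addresses = vfmd(addresses)
print(mask_from_indices == mask_from_addresses)  # True
```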

In one embodiment of the present invention, the compilation device 10 and the arithmetic unit 20 have been described as separate devices. However, in another embodiment of the present invention, the compilation device 10 and the arithmetic unit 20 may be housed in a single device that performs both the processing performed by the compilation device 10 and the processing performed by the arithmetic unit 20.

In the processing according to the embodiment of the present invention, the order of the processing may be changed within the range in which the appropriate processing is performed.

Although the embodiment of the present invention has been described, the above-mentioned compilation device 10, arithmetic unit 20, and other control devices may each contain a computer. The steps of the processing described above are stored in a computer-readable recording medium in the form of a program, and the processing is performed when the computer reads and executes this program. A specific example of a computer is shown below.
FIG. 14 is a schematic block diagram showing the configuration of a computer according to at least one embodiment.
As shown in FIG. 14, the computer 5 includes a CPU 6 (including a vector processor), a main memory 7, a storage 8, and an interface 9.
For example, each of the above-mentioned compilation device 10, arithmetic unit 20, and other control devices is mounted on the computer 5. The operation of each processing unit described above is stored in the storage 8 in the form of a program. The CPU 6 reads the program from the storage 8, expands it into the main memory 7, and executes the above processing according to the program. Further, the CPU 6 secures a storage area corresponding to each of the above-mentioned storage units in the main memory 7 according to the program.

Examples of the storage 8 include an HDD (Hard Disk Drive), an SSD (Solid State Drive), a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), a semiconductor memory, and the like. The storage 8 may be internal media directly connected to the bus of the computer 5, or external media connected to the computer 5 via the interface 9 or a communication line. When this program is distributed to the computer 5 via a communication line, the computer 5 that receives the distribution may load the program into the main memory 7 and execute the above processing. In at least one embodiment, the storage 8 is a non-transitory tangible storage medium.

Further, the above program may realize a part of the above-mentioned functions. Further, the program may be a file that can realize the above-mentioned functions in combination with a program already recorded in the computer device, that is, a so-called difference file (difference program).

Although some embodiments of the present invention have been described, these embodiments are examples and do not limit the scope of the invention. Various additions, omissions, replacements, and changes may be made to these embodiments without departing from the gist of the invention.

This application claims priority based on Japanese Patent Application No. 2020-024338 filed on February 17, 2020, and incorporates all of its disclosures here.

Each aspect of the present invention may be applied to a recording medium, a compiling device, a processing system, and a compiling method.

1 ... Compiler
5 ... Computer
6 ... CPU
7 ... Main memory
8 ... Storage
9 ... Interface
10 ... Compiling device
11 ... Code analysis means
12 ... Instruction generation means
20 ... Arithmetic unit
100 ... Processing system
111 ... Instruction line analysis means
112 ... List summation operation syntax analysis means
121 ... Vector instruction generation means
122 ... Scalar recalculation instruction generation means
1211 ... Address calculation means
1212 ... Duplicate detection-mask creation means
1213 ... Vector addition means
1214 ... Result invalid term detection means

Claims

1. A recording medium recording a compiler that causes a computer of a compiling device to generate instructions to execute:
calculating the address of an array referenced by indirect addressing;
detecting duplication of the calculated addresses and creating a vector mask;
performing an operation between vectors based on the bits of the vector mask; and
recalculating with a scalar based on the result of the operation between the vectors.

2. The recording medium according to claim 1, wherein
the operation is addition or subtraction.

3. The recording medium according to claim 1 or 2, wherein
the instructions include a VFMD instruction that detects duplication of the addresses and creates the vector mask, and
the VFMD instruction changes access from a CPU to a memory in the computer into access from the CPU to a register in the computer.

4. The recording medium according to claim 3, wherein
the VFMD instruction generates a scalar recalculation instruction that recalculates the operation between the vectors with a scalar when there is address duplication in the register.

5. A compiling device that includes an address calculation means, a duplicate detection-mask creation means, a vector addition means, and a result invalid term detection means, and that generates instructions to execute an operation, wherein
the address calculation means generates an instruction to calculate the address of an array referenced by indirect addressing,
the duplicate detection-mask creation means generates an instruction to detect duplication of the calculated addresses and create a vector mask,
the vector addition means generates an instruction to perform an operation between vectors based on the bits of the vector mask, and
the result invalid term detection means generates an instruction to recalculate with a scalar based on the result of the operation between the vectors.

6. A processing system comprising:
the compiling device according to claim 5; and
an arithmetic unit that performs operations according to the instructions generated by the compiling device.

7. A compiling method including:
calculating the address of an array referenced by indirect addressing;
detecting duplication of the calculated addresses and creating a vector mask;
performing an operation between vectors based on the bits of the vector mask; and
recalculating with a scalar based on the result of the operation between the vectors.
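
For illustration only, the four steps recited in the method can be sketched in Python as follows. This is a simplified model under stated assumptions, not the instruction sequence generated by the compiler. It assumes 0-based indices, assumes that a mask bit of 1 marks an element whose index already appeared earlier, and uses plain loops in place of vector instructions. The function names `duplicate_mask`, `list_sum`, and `list_sum_reference` are hypothetical.

```python
def duplicate_mask(indices):
    """Assumed convention: bit is 1 if the index already occurred in an earlier element."""
    seen = set()
    mask = []
    for i in indices:
        mask.append(1 if i in seen else 0)
        seen.add(i)
    return mask


def list_sum(x, l, y):
    """Masked 'vector' pass over non-duplicated elements, then scalar recalculation."""
    x = list(x)
    mask = duplicate_mask(l)

    # Vector part (modelled with lists): gather, add, scatter for mask bit 0.
    # These lanes all target distinct elements of x, so they are independent.
    lanes = [k for k, m in enumerate(mask) if m == 0]
    gathered = [x[l[k]] for k in lanes]
    summed = [g + y[k] for g, k in zip(gathered, lanes)]
    for k, s in zip(lanes, summed):
        x[l[k]] = s

    # Scalar part: process the duplicated elements sequentially.
    for k, m in enumerate(mask):
        if m == 1:
            x[l[k]] += y[k]
    return x


def list_sum_reference(x, l, y):
    """Purely sequential list summation, X(L(I)) = X(L(I)) + Y(I)."""
    x = list(x)
    for k in range(len(l)):
        x[l[k]] += y[k]
    return x


x = [10, 20, 30, 40]
l = [2, 0, 2, 3, 0, 2]
y = [1, 2, 3, 4, 5, 6]
result = list_sum(x, l, y)
print(result == list_sum_reference(x, l, y))
```

With integer data the masked version matches the sequential loop exactly. With floating-point data the order of additions differs, so results may differ by rounding.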