CN115328493A - Compiling method, electronic device, and medium for vector instructions with masks

Info

Publication number: CN115328493A
Application number: CN202211064698.4A
Authority: CN
Inventors: 廖兴龙
Original assignee: ARM Technology China Co Ltd
Current assignee: ARM Technology China Co Ltd
Priority date: 2022-09-01
Filing date: 2022-09-01
Publication date: 2022-11-11

Abstract

The present application relates to the field of computer technologies, and in particular to a compiling method, an electronic device, and a medium for vector instructions with masks. The method is applied to an electronic device on which a compiler is installed, and comprises: the compiler acquires a program to be compiled; the compiler determines that the program to be compiled comprises a first program code whose code form conforms to the form of a preset conditional operator expression; the compiler generates a first vector instruction according to a first mask and a vector operation expression of the first program code; and, if the compiler determines that the operation mode of the first program code is a zero setting mode, the compiler generates a second vector instruction. With this compiling method, a user does not need to remember and call built-in interfaces related to the operation mode when programming, which improves the generality of the program code, the usability of programming, and programming efficiency.

Description

Compiling method, electronic device, and medium for vector instructions with masks

Technical Field

The invention relates to the technical field of computers, in particular to a compiling method, electronic equipment and medium for vector instructions with masks.

Background

The process of data processing in an electronic device is shown in fig. 1: first, a user inputs a program code for implementing data processing into the electronic device 10, and the compiler 100 compiles the received program code to obtain an executable instruction or an executable file and stores the executable instruction or the executable file in the memory 200. The operating system of the electronic device 10 distributes the compiled executable instructions or executable files that can be executed by the processor 300 to the processor 300 for data processing.

Currently, when the compiler 100 compiles received program code that includes vector instructions with masks, the compilation of those instructions has to be implemented through a series of compiler built-in (built-ins) interfaces. For example, for an addition of vector op1 and vector op2 with a mask, the corresponding program code may be: op1 = add_m(op1, op2, mask), where mask is the mask and add_m is a function used to call the built-ins interface corresponding to the merge mode. When the compiler compiles this code, the compilation back end determines the number of the compiler built-in function (intrinsic) that corresponds to the add_m function in the software architecture, and generates the corresponding assembly instruction or binary instruction according to that intrinsic number, thereby completing compilation of the program code.

Because the software architectures of compilers from different companies differ, and the functions corresponding to intrinsic numbers differ between software architectures, a user has to remember the functions (such as add_m) that correspond to the various built-ins interfaces in each software architecture in order to call the right built-ins interface under that architecture. Program code written by the user is therefore not portable across software architectures, and the user's programming efficiency is low. In addition, to support vector operations with masks, a compiler needs to provide specific built-ins interfaces, including a merge mode built-ins interface and a zero mode built-ins interface, and software system developers have to spend a lot of time maintaining them.

Disclosure of Invention

In order to solve the problems that program code lacks generality and programming efficiency is low, embodiments of the present application provide a compiling method for vector instructions with masks, an electronic device, and a readable storage medium.

In a first aspect, an embodiment of the present application provides a compiling method for a vector instruction with a mask, which is applied to an electronic device, where the electronic device is installed with a compiler;

the method comprises the following steps:

the compiler acquires a program to be compiled;

the compiler determines that the program to be compiled comprises a first program code, and the code form of the first program code conforms to the form of a preset conditional operator expression, wherein the first program code comprises a first mask of a vector instruction with a mask, a vector operation expression and an operation mode, and the operation mode is a merge mode or a zero setting mode;

the compiler generates a first vector instruction according to the first mask of the first program code and the vector operation expression, wherein the first vector instruction is used for representing the vector operation corresponding to the first program code;

and if the compiler determines that the operation mode of the first program code is a zero setting mode, generating a second vector instruction, wherein the second vector instruction is used for representing that the operation mode corresponding to the first program code is the zero setting mode.

With this compiling method for vector instructions with masks, a user who inputs program code into the compiler only needs to write the first program code in the form of the conditional operation expression, and does not need to memorize or call built-in interfaces related to the operation mode. The user can program vector instructions in the same way as ordinary code, which improves the generality of the program code, the usability of programming, and programming efficiency. Moreover, the compiler does not need to provide built-in interfaces matched to the vector instructions, which reduces the number of compiler interfaces and the cost of maintaining them.

In addition, since the embodiment of the present application does not need a built-in interface for performing the vector operation, the variable corresponding to the operation result of the vector operation may not be limited to be one of the input variables participating in the vector operation.

In a possible implementation manner of the first aspect, the preset conditional operator expression includes a first expression, a second expression and a third expression, and the second expression is located between the first expression and the third expression;

the first expression corresponds to the first mask in the first program code, the second expression corresponds to the vector operation expression in the first program code, and the third expression corresponds to an operation mode in the first program code.

Alternatively, the first expression corresponds to the first mask in the first program code, the second expression corresponds to the operation mode in the first program code, and the third expression corresponds to the vector operation expression in the first program code.

In a possible implementation manner of the first aspect, the method further includes:

the compiler performs bitwise negation on the first mask to obtain a second mask;

the compiler generating a second program code comprising the second mask, the vector operational expression, and the operational mode;

the compiler generates a third vector instruction according to the second mask of the second program code and the vector operation expression.

if the first mask does not match the Boolean vector type, determining that the comparison result of the mask and the zero vector is a third mask corresponding to the first expression;

the method further comprises the following steps:

the compiler generates a fourth vector instruction according to the third mask of the first program code and the vector operation expression.

In a possible implementation manner of the first aspect, if the compiler determines that the operation mode of the first program code is a zero setting mode, generating a second vector instruction includes:

the compiler determines that the operation mode of the first program code is the zero setting mode;

the compiler generates a zero operation instruction.

the compiler determines that an execution order of the first vector instructions precedes an execution order of the second vector instructions.

the compiler determines the operation mode of the first program code to be a merge mode;

and the compiler determines that the return variable of the first program code is different from the variable corresponding to the operation mode, and the compiler generates a copy operation instruction.

the compiler determines that an execution order of the first vector instructions is subsequent to an execution order of the copy operation instructions.

the compiler determines that the program to be compiled includes a third program code, and the third program code is a code of a pre-copy instruction or a code of a post-copy instruction.

if the third program code is the code of the pre-copy instruction and the operation mode is a zero setting mode, generating the second vector instruction;

and if the third program code is the code of the post-copy instruction and the operation mode is a merge mode, generating the second vector instruction.

the compiler identifies the second vector instruction with an instruction identification;

the compiler generates a first vector instruction packet including the first vector instruction and the second vector instruction according to the instruction identification.

In a second aspect, an embodiment of the present application provides a compiling apparatus for a vector instruction with a mask, which is applied to an electronic device, where a compiler is installed on the electronic device; the device includes:

the program acquisition module is used for acquiring a program to be compiled by the compiler;

a first instruction generation module, configured to generate, by the compiler, a first vector instruction according to the first mask of the first program code and the vector operation expression, where the first vector instruction is used to characterize a vector operation corresponding to the first program code;

and a second instruction generating module, configured to, if the compiler determines that the operation mode of the first program code is a zero setting mode, generate a second vector instruction, where the second vector instruction is used to characterize that the operation mode corresponding to the first program code is the zero setting mode.

In a third aspect, an embodiment of the present application provides an electronic device comprising: one or more processors; and one or more memories; the one or more memories store one or more programs that, when executed by the one or more processors, cause the electronic device to perform the above-described compiling method for vector instructions with masks.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having instructions stored thereon, which when executed on a computer cause the computer to perform the above-mentioned compiling method for vector instructions with masks.

In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions that, when executed, cause a computer to perform the above compiling method for masked vector instructions.

Drawings

Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;

Fig. 2 is a schematic diagram illustrating the operation process of computing a vector instruction with a mask according to an embodiment of the present application;

Fig. 3 is a schematic diagram illustrating a compiling flow of the compiler 100 according to an embodiment of the present application;

Fig. 4 is a flowchart illustrating a compiling method for a vector instruction with a mask according to an embodiment of the present application;

Fig. 5 is a flowchart illustrating another compiling method for vector instructions with masks according to an embodiment of the present application;

Fig. 6 is a schematic block diagram illustrating a system structure of an electronic device according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an SOC according to an embodiment of the present application;

Fig. 8 is a schematic program module diagram of a compiling apparatus for vector instructions with masks according to an embodiment of the present application.

Detailed Description

The illustrative embodiments of the present application include, but are not limited to, a data processing method, an electronic device, and a readable storage medium. Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

In the following description, numerous technical details are set forth in order to provide a better understanding of the present invention. However, it will be understood by those skilled in the art that the claimed embodiments of the present invention may be practiced without these specific details and with various changes and modifications based on the following embodiments.

In order to better understand the scheme of the embodiments of the present application, the following first introduces the related terms and concepts that may be involved in the embodiments of the present application.

Compiling: compilation is the process of converting a user program (usually written in a high-level source language, or called source file) into an Intermediate Representation (IR), and performing multiple conversions on the IR to finally generate a corresponding target language code (usually assembly code or binary executable code for a target architecture).

Compiler: software that translates computer code written in one programming language (the source language) into another language (the target language). It is generally divided into three parts: front end, middle end, and back end.

Vector register: a register for performing vector calculations and/or storing vector data.

Mask register: a register for storing mask data. The mask data consists of multiple mask bits that correspond to the data elements of a vector register, and the value of each mask bit controls whether the corresponding data element of the vector register participates in a vector operation.

Merge (merge) mode: an operation mode in which the positions of the vector register that do not participate in the vector operation retain their original values after the vector operation is completed.

Zero (zero) mode: an operation mode in which the positions of the vector register that do not participate in the vector operation are set to zero after the vector operation is completed.

Built-in function (intrinsic): a non-standard compiler function provided by the compiler that may be predefined or customized.

Instruction lowering (Lowering): mapping IR instructions onto the operations that are actually processed. For example, in the Low Level Virtual Machine (LLVM) architecture, instruction lowering converts the IR into a SelectionDAG, which can be used to describe the program flow of the input program code; that is, before lowering the input program code is represented by the IR, and after lowering it is described by the SelectionDAG.

The processing of data by electronic device 10 is further described below in conjunction with FIG. 1.

As shown in fig. 1, the electronic device 10 may include a compiler 100, a memory 200, and a processor 300. The compiler 100 is a software structure of the electronic device 10 and is configured to compile a received program. The memory 200 and the processor 300 are hardware structures of the electronic device 10: the memory 200 is used for storing programs, the executable instructions obtained by compiling the programs, and the like, and the processor 300 is used for performing data processing, such as vector operations.

It is understood that the processor 300 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc.

It is understood that the electronic device 10 may include, but is not limited to: laptop computers, desktop computers, tablet computers, cell phones, servers, wearable devices, head-mounted displays, mobile email devices, portable gaming devices, reader devices, televisions, and the like.

Specifically, when the program is input into the electronic device 10, the compiler 100 compiles the received program to obtain executable instructions or executable files, and stores the executable instructions or executable files in the memory 200. The system of the electronic device 10 may allocate the compiled executable instructions or executable files that the processor 300 can execute to the processor 300 for data processing.

It is to be appreciated that processor 300 may include an execution unit 301. Execution unit 301 is used to execute instructions, such as compiled masked vector instructions, and may include circuitry to execute the instructions.

It is to be appreciated that processor 300 may also include a register set 302. The register set 302 may include one or more sets of vector registers 311 and one or more sets of mask registers 312. Where each vector register 311 may be 512 bits, 256 bits, or 128 bits wide, or different vector widths may be used. Each mask register 312 may contain a plurality of mask bits, each mask bit corresponding to one data element in the vector register 311. For example, a 64-bit mask register may be used to mask 64 8-bit data elements of a 512-bit register. For vector registers having different widths (e.g., 256-bits or 128-bits) and data elements of different lengths (e.g., 16-bits, 32-bits, or 64-bits), different numbers of mask bits may be used in conjunction with vector operations.
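The relationship between register width, element width, and mask width described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function name `mask_bits_needed` is hypothetical and does not come from the application.

```python
def mask_bits_needed(register_bits: int, element_bits: int) -> int:
    """Return the number of mask bits: one per data element in the vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


# Widths mentioned in the text: 512/256/128-bit registers, 8/16/32/64-bit elements.
for reg in (512, 256, 128):
    for elem in (8, 16, 32, 64):
        print(f"{reg}-bit register, {elem}-bit elements: "
              f"{mask_bits_needed(reg, elem)} mask bits")
```

For the case given in the text, a 512-bit register with 8-bit data elements needs 64 mask bits, which matches the 64-bit mask register.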

Processor 300 executes masked vector instructions, which may specifically be: execution unit 301 may read a vector instruction from memory 200 and obtain a mask and vector operands corresponding to the vector instruction. Execution unit 301 may assign the read mask to mask register 312 and the vector operands to vector register 311. When the execution unit 301 executes a vector instruction, the corresponding data elements in the vector register 311 may be controlled to participate in vector operations in the vector instruction by a mask in the mask register 312.

The operation of the processor 300 to perform the computation of masked vector instructions is illustrated below with reference to the figures. For ease of illustration, each vector register 311 in fig. 2 is shown as having eight data elements. In other embodiments, the vector register 311 may have a different number of data elements therein. The vector register 311 may be 128 bits, 256 bits, or 512 bits wide, or may use different widths. Since there are eight data elements in each vector register, only eight mask bits are used in conjunction with each vector register. Further, the vector operation is described as an example of the addition operation.

Fig. 2 is a schematic diagram illustrating an operation process of computing a vector instruction with a mask according to an embodiment of the present application.

As shown in fig. 2, the vector register 311 may include a vector register t1 and a vector register t2. The vector register t1 is used for storing the vector data A and the vector data C, which is the result of adding the vector data A and the vector data B, and the vector register t2 is used for storing the vector data B. The subscript j of A_j and B_j indicates the position (sequence number) of the data element in the vector data. The mask register K is used for storing a mask: a mask bit value of 1 indicates that the data element corresponding to the mask bit participates in the addition operation, and a mask bit value of 0 indicates that it does not. After A + B is executed, the data in vector register t1, vector register t2, and mask register K is as shown in fig. 2.

The mask values of mask bit 6, mask bit 2, and mask bit 1 of mask register K are 0, which indicates that data elements A₆, A₂, and A₁ in vector register t1 do not participate in the addition operation. The values of the corresponding data positions of vector register t1 after the addition operation are determined by the operation mode, namely the merge mode or the zero setting mode.

When the operation mode is the merge mode, data elements C₆, C₂, and C₁ in the vector register t1 keep their original values after the addition operation is performed.

When the operation mode is the zero setting mode, data elements C₆, C₂, and C₁ in the vector register t1 are zero after the addition operation is performed.
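The two operation modes can be illustrated with a small software model of the eight-element example. This is a sketch of the semantics only, not of the hardware or of the compiler; the function name `masked_add`, the parameter `passthrough`, and the sample values are assumptions made for illustration.

```python
from typing import List, Optional


def masked_add(a: List[int], b: List[int], mask: List[int],
               passthrough: Optional[List[int]] = None) -> List[int]:
    """Element-wise a + b under a mask (mask[j] == 1 means element j participates).

    Non-participating elements take passthrough[j] when passthrough is given
    (merge mode) and 0 otherwise (zero setting mode).
    """
    result = []
    for j, (x, y, m) in enumerate(zip(a, b, mask)):
        if m:
            result.append(x + y)
        elif passthrough is not None:
            result.append(passthrough[j])
        else:
            result.append(0)
    return result


# Index j runs from 0 to 7; mask bits 6, 2 and 1 are 0, as in the example above.
A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [10] * 8
K = [1, 0, 0, 1, 1, 1, 0, 1]

merged = masked_add(A, B, K, passthrough=A)  # t1 keeps A's values where masked off
zeroed = masked_add(A, B, K)                 # masked-off positions become 0
print(merged)  # [11, 2, 3, 14, 15, 16, 7, 18]
print(zeroed)  # [11, 0, 0, 14, 15, 16, 0, 18]
```

In the merge case the original contents of t1 (vector data A) are passed through, which mirrors the statement that C₆, C₂, and C₁ keep their original values.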

To facilitate understanding of the method in the embodiment of the present application, the compiling process of the compiler 100 is described below with reference to fig. 3.

Fig. 3 is a schematic diagram illustrating a compiling flow of the compiler 100 according to an embodiment of the present application.

As shown in fig. 3, the compilation process of compiler 100 includes three phases:

(1) Front end

The front end may include lexical analysis 321, syntactic analysis 322, and semantic analysis 323.

The lexical analysis 321 may include: scanning the characters of the source program line by line with a lexical analyzer, identifying each word in the source program and determining its type, and converting the identified words into a uniform internal representation, namely lexical units (tokens), so as to obtain the category code and the attribute value of each word. For example, keywords in the source program are coded one code per word, and the attribute value is the keyword itself; constants are coded one code per type, and the attribute value is the constant.
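A toy tokenizer makes the "category code plus attribute value" idea concrete. The sketch below is an assumption-laden illustration, not the lexical analyzer of the compiler 100: the keyword set, the numeric category codes, and the function name `tokenize` are all hypothetical.

```python
import re

# Hypothetical category codes: every keyword gets its own code (one code per word),
# while identifiers, constants and operators each share one code (one code per type).
KEYWORDS = {"if": 1, "else": 2, "return": 3}
IDENT, CONST, OP = 100, 101, 102

# One token per match: an integer constant, a word, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(source: str):
    """Return a list of (category code, attribute value) pairs."""
    tokens = []
    for number, word, other in TOKEN_RE.findall(source):
        if number:
            tokens.append((CONST, int(number)))
        elif word in KEYWORDS:
            tokens.append((KEYWORDS[word], word))
        elif word:
            tokens.append((IDENT, word))
        elif other.strip():  # skip stray whitespace matched by the last alternative
            tokens.append((OP, other))
    return tokens


print(tokenize("return x + 42"))
```

Here `return` receives its own keyword code, while `42` receives the shared constant code with the value 42 as its attribute.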

The syntactic analysis 322 may include: identifying various types of phrases from the token sequence output by the lexical analyzer by means of a parser, and constructing a parse tree. The root node of the parse tree may be a "sentence", the child nodes may be the phrases and words that constitute the "sentence", and the leaf nodes correspond to the grammatical types of the words in the "sentence", such as "preposition", "article", "noun", and the like.

The semantic analysis 323 may include: performing semantic operations on the parse tree, such as type checking, type binding, type derivation, and variable resolution.

(2) Middle end

The middle end may include intermediate representation 324, code optimization 325, and the like.

The intermediate representation 324 may include: the compiler converts the processing result of the front end into an internal representation that represents the semantic and syntactic structure of the source program, and the later stages of the compiler analyze or optimize this intermediate representation. The intermediate representation may be an abstract syntax tree, reverse Polish notation, three-address code, or the like. In some embodiments, the conditional operation expression may be expressed in the intermediate representation by a vselect instruction.

The code optimization 325 may include: improving the intermediate representation so that better object code is generated, where "better" may mean faster, shorter, or less energy intensive.

(3) Back end

The back end may include instruction selection 326, register allocation 327, instruction reordering 328, and generation of a final file of executable instructions.

Among other things, instruction selection 326 may include: the instructions that match the mid-end output are selected among a plurality of instruction sets executable by processor 300 according to the result or level of optimization of the code.

Register allocation 327 may include: assigning the variables, instructions, and the like that were assigned to virtual registers during the intermediate representation to physical registers. Virtual registers are registers in an abstract sense, i.e., they have no corresponding hardware structure, whereas physical registers are registers that actually exist on the processor 300. It is to be appreciated that the vector register 311 and the mask register 312 in the register set 302 of the processor 300 in fig. 1 are physical registers.

Instruction reordering 328 may include: analyzing the instructions chosen during instruction selection and, according to the data dependencies among them, marking instructions without data dependencies as parallel instructions and ordering the execution of instructions with data dependencies according to those dependencies.
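The dependency analysis behind instruction reordering can be sketched as follows. This is a generic, simplified illustration rather than the scheduler of the compiler 100; the `Instr` record, the helper names, and the sample instruction strings are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str               # printable form of the instruction
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` must stay after `earlier` because of a data dependency."""
    return (earlier.dest in later.srcs      # read-after-write
            or later.dest in earlier.srcs   # write-after-read
            or later.dest == earlier.dest)  # write-after-write


def schedule(instrs: List[Instr]) -> List[List[str]]:
    """Group instructions into levels; instructions in one level are independent."""
    levels: List[int] = []
    for i, ins in enumerate(instrs):
        level = 0
        for j in range(i):
            if depends_on(ins, instrs[j]):
                level = max(level, levels[j] + 1)
        levels.append(level)
    groups: List[List[str]] = [[] for _ in range(max(levels, default=-1) + 1)]
    for ins, lvl in zip(instrs, levels):
        groups[lvl].append(ins.text)
    return groups


program = [
    Instr("add t3, t1, t2", "t3", ("t1", "t2")),
    Instr("mul t5, t1, t4", "t5", ("t1", "t4")),
    Instr("sub t6, t3, t5", "t6", ("t3", "t5")),
]
print(schedule(program))
```

The add and the mul share no dependency and land in the same level, so they could be issued in parallel, while the sub reads both results and is placed in a later level.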

As described above, in the current scheme, a user programming a vector instruction with a mask generally has to be familiar with, and call, the built-ins interface corresponding to the vector operation. Because the built-in interfaces of different software architectures differ, the user has to remember the built-in interfaces of each software architecture in order to program vector operations, which makes programming tedious and the code poorly portable. In addition, a software architecture needs specific built-ins interfaces to implement the corresponding vector operations, and maintaining these interfaces consumes a large amount of time and cost.

In order to solve the above problem, an embodiment of the present application provides a compiling method for a vector instruction with a mask. Specifically, when the compiler 100 compiles an input program and determines that a vector instruction code portion of the program code satisfies the form of a conditional operation expression, the compiler 100 may generate, according to the mask and the vector operation in that code portion, a first vector instruction that characterizes the vector operation with the mask, and, when the operation mode in the conditional operation expression is the zero setting mode, may generate a second vector instruction that characterizes the operation mode. The conditional operation expression may include a mask, a vector operation, and a corresponding operation mode; for example, it may take the form "mask ? (op1 + op2) : a", where mask denotes the mask, op1 + op2 denotes the vector operation, and a denotes the operation mode.

With this compiling method, a user who inputs program code into the compiler 100 only needs to write the masked vector instruction portion of the program code as a conditional operation expression, and does not need to memorize or call built-in interfaces related to the operation mode. The user can program vector instructions in the same way as ordinary code, which improves the generality of the program code, the usability of programming, and programming efficiency. Moreover, the compiler does not need to provide built-in interfaces matched to the vector instructions, which reduces the number of compiler interfaces and the cost of maintaining them.

It can be understood that the mask indicates whether each position of an operand (i.e., the vector data described above) participates in the vector operation. Each mask bit may correspond to one data element of the operand: for example, one mask bit may correspond to 1 bit of data in the operand, in which case the data element is 1-bit data, or to 8 bits of data, in which case the data element is 8-bit data. This correspondence depends on the data type of the operand. In some embodiments, the mask may be a binary sequence in which 0 indicates that the data element at the corresponding position in the operand does not participate in the vector operation and 1 indicates that it does; alternatively, 0 may indicate participation and 1 non-participation, which is not limited in the present application.

It will be appreciated that the vector operations may be logical operations performed on input operands, such as addition operations, multiplication operations, shift operations, multiply-accumulate operations, and the like. The vector operation may include a single operation type, or may include a mixture of multiple operation types, which is not limited in this application.

In some embodiments, the vector operation may be an operation supported by a library file linked by the compiler, or may be a non-standard operation supported by the compiler, that is, a built-in function, which is not limited in this application.

It is to be understood that the conditional operation expression may take the form "expression 1 ? expression 2 : expression 3". Expression 1 may represent the mask, expression 2 the vector operation, and expression 3 the operation mode; in other embodiments, expression 1 may represent the mask, expression 2 the operation mode, and expression 3 the vector operation, which is not limited in this application.
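The two orderings of the conditional operation expression select the same elements once the mask is inverted, which is one way to read the bitwise negation of the first mask mentioned in the summary. The sketch below models only these element-wise semantics; `vselect` here is a plain Python helper written for illustration, not the intermediate-representation instruction of the same name, and the sample values are assumptions.

```python
def vselect(mask, if_set, if_clear):
    """Element-wise model of 'mask ? if_set : if_clear'."""
    return [s if m else c for m, s, c in zip(mask, if_set, if_clear)]


def invert(mask):
    """Bitwise negation of a mask stored as a list of 0/1 values."""
    return [1 - m for m in mask]


mask = [1, 0, 1, 0]
op_result = [11, 12, 13, 14]   # stands in for the result of (op1 + op2)
mode_value = [0, 0, 0, 0]      # zero setting mode: non-participating elements become 0

# Form "mask ? operation : mode"
form_a = vselect(mask, op_result, mode_value)
# Form "mask' ? mode : operation", with mask' the bitwise negation of mask
form_b = vselect(invert(mask), mode_value, op_result)

print(form_a, form_b)
```

Both forms yield the same vector, so a compiler could in principle normalize one form into the other before generating the masked instruction.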

It will be appreciated that the first vector instruction is generated from the mask and the vector operation. For example, for the program code "op3 = mask ? (op1 + op2) : 0", the first vector instruction in assembly language generated by the compiler may be "add.w t3, t1, t2, p0", where t3 corresponds to the vector register in which operand op3 is located, t1 to the vector register in which operand op1 is located, t2 to the vector register in which operand op2 is located, and p0 to the mask register that holds the mask.

It will be appreciated that the operation modes, namely the zero-setting mode and the merge mode in the foregoing, are used to indicate the values of the data bits that are not involved in the vector operation. Furthermore, for the operation mode being the zero setting mode, the second vector instruction may be a zero-assigning operation instruction, configured to assign a value of a data bit not involved in the operation in the vector data to zero, as a value of a corresponding data bit in the vector operation result.

For example, for the program code "op3 = mask ? (op1 + op2) : 0", the generated second vector instruction may be a zero-assignment operation instruction, and the corresponding assembly language code may be, for example, "movz.post t3", which assigns 0 to the data bits in the vector register t3, where the operand op3 is located, that correspond to the data elements of the result operand op3 that do not participate in the vector operation.

In some embodiments, for the merge mode, a copy operation instruction may or may not be generated, which is determined according to the return value of the conditional operation expression and the parameter of the operation mode.

For example, for the program code "op3 = mask ? (op1 + op2) : op4", a copy operation instruction may be generated, and the corresponding assembly language code may be, for example, "mov.pre t3, t4", which copies the values of the data bits of the vector register t4, where the operand op4 is located, that correspond to the data elements of the operand op1 that do not participate in the vector operation, to the corresponding data bits of the vector register t3, where the operand op3 is located.

It is understood that the above examples of the program code and the assembly language code are only one example of the embodiments of the present application, and those skilled in the art may make adaptive changes according to different high-level program languages and assembly languages, and the present application is not limited thereto.

The compiler generates a first vector instruction, which characterizes the masked vector operation, according to the mask and the vector operation in the conditional operation expression, and, when the operation mode in the conditional operation expression is the zeroing mode, may generate a second vector instruction, which characterizes the operation mode. This may specifically include the following. At the compile middle end, namely the intermediate representation 324 stage in fig. 3, the compiler may generate a vselect instruction as the intermediate representation of the conditional operation expression. During instruction lowering at the compile back end, namely the instruction selection 326 stage in fig. 3, the compiler may perform specific processing on the vselect instruction: it generates the first vector instruction according to the vector operation and the mask in the conditional operation expression, and generates the second vector instruction when the operation mode in the conditional operation expression is the zeroing mode. For the zeroing mode, the first vector instruction precedes the second vector instruction. Accordingly, when the processor 300 executes the executable program compiled by the compiler 100 and the operation mode is the zeroing mode, the processor 300 may execute the first vector instruction first and then the second vector instruction.

A compiling method for a vector instruction with mask in the embodiment of the present application is described in detail below with reference to fig. 4.

It can be understood that, because compiling the conditional operation expression in the embodiments of the present application requires evaluating true and false values, the compiler 100 needs to support Boolean (bool) vectors. To better implement the method in the embodiments of the present application, a programming language that does not support bool vectors needs to be extended to support them. For example, a compiler for the OpenCL language supports integer (int) vectors but not bool vectors, and therefore needs to be extended to support bool vectors.

Fig. 4 is a flowchart illustrating a compiling method for a vector instruction with a mask according to an embodiment of the present application.

As shown in fig. 4, a compiling method for a vector instruction with a mask in an embodiment of the present application includes:

401: the compiler 100 acquires program code input by a user.

It is to be understood that the program code obtained by the compiler may be written in a high-level programming language, such as C, C++, or OpenCL, which is not limited in this application.

402: the compiler 100 determines that the program code includes code of a masked vector instruction expressed in the form of a conditional operation expression.

It is to be understood that the form of the conditional operation expression can be expressed as "expression 1 ? expression 2 : expression 3". Expression 1 is bool-type data with two possible values, 0 (false) and 1 (true), and is used to determine whether the condition holds. If the condition is satisfied, the value of expression 1 is 1; if the condition is not satisfied, the value of expression 1 is 0. The conditional operation expression "expression 1 ? expression 2 : expression 3" can be understood as follows: when the value of expression 1 is 1, the return value of the conditional operation expression is the value of expression 2; when the value of expression 1 is 0, the return value is the value of expression 3.
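Applied to vectors, the conditional operation expression is evaluated element by element. The following Python sketch illustrates this reading; the helper name `vector_select` and the list-based vectors are hypothetical and serve only to illustrate the semantics described above.

```python
def vector_select(cond, if_true, if_false):
    """Return if_true[i] where cond[i] is true, otherwise if_false[i]."""
    if not (len(cond) == len(if_true) == len(if_false)):
        raise ValueError("all three vectors must have the same length")
    # Each lane is chosen independently by its own condition element.
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]
```

Each lane of the result is taken from expression 2 when the corresponding element of expression 1 is true and from expression 3 otherwise.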

It will be appreciated that the code characterizing a masked vector instruction in the form of a conditional operation expression may be "mask variable ? vector operation expression : operation mode variable". This code can be understood as follows. When a mask bit of the mask variable is 1, the data elements corresponding to that mask bit in the operands of the vector operation expression participate in the vector operation; when the mask bit is 0, those data elements do not participate in the vector operation. The operation mode variable determines the value that corresponds to a mask bit of 0 in the return value of the conditional operation expression. Specifically, when the operation mode variable indicates the merge mode, the value corresponding to a mask bit of 0 in the return value keeps the value of the corresponding data element of the operand in the original vector operation expression unchanged; when the operation mode variable indicates the zero setting mode, the value corresponding to a mask bit of 0 in the return value is set to zero.

It is to be understood that the data type of the mask variable is a boolean vector type, the data type of the operand in the vector operation expression is a vector type, and the data type of the operation mode variable is a vector type.

It is understood that the vector operation corresponding to the vector operation expression may be a logical operation performed on an input operand, such as an addition operation, a multiplication operation, a shift operation, a multiply-accumulate operation, an accumulate operation, and the like. The vector operation may include a single operation type, or may include a mixture of multiple operation types, which is not limited in this application.

It will be appreciated that data corresponding to variables of the vector operation in the vector operation expression is stored in the vector register 311 and data corresponding to the mask variables is stored in the mask register 312.

In some embodiments, the code characterizing a masked vector instruction in the form of a conditional operation expression may be "mask variable ? operation mode variable : vector operation expression". The compiler 100 may first perform bitwise negation on the vector data of the mask variable corresponding to expression 1 and exchange the contents of expressions 2 and 3 before performing steps 403 and 404. For example, for the program code "op3 = mask ? 0 : (op1 + op2)", if the compiler 100 determines that expression 2 corresponds to the operation mode vector 0, then, before executing steps 403 and 404 to generate the first vector instruction and the second vector instruction, the compiler 100 generates a bitwise negation instruction for "mask", which inverts the 0s and 1s in the mask vector. For example, bitwise inversion of the mask "10111001" of the mask register K in fig. 2 yields the mask "01000110". Moreover, the compiler 100 exchanges the 0 vector corresponding to expression 2 with the addition expression corresponding to expression 3, that is, it modifies the conditional operation expression to obtain "op3 = mask ? (op1 + op2) : 0", where mask is now the mask vector after bitwise negation.
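The normalization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `normalize_conditional`, its `expr2_is_mode` flag, and the helper `bits` are hypothetical names, and expressions are represented as plain strings rather than compiler data structures.

```python
def bits(text):
    """Turn a string such as "10111001" into a list of mask bits."""
    return [int(ch) for ch in text]


def normalize_conditional(mask, expr2, expr3, expr2_is_mode):
    """Return (mask, vector_expression, operation_mode) in canonical order.

    If expression 2 holds the operation mode, the mask is inverted bit by
    bit and expressions 2 and 3 are exchanged, so that later steps always
    see "mask ? vector operation expression : operation mode".
    """
    if expr2_is_mode:
        inverted = [1 - bit for bit in mask]   # bitwise negation of the mask
        return inverted, expr3, expr2          # exchange expressions 2 and 3
    return list(mask), expr2, expr3
```

With the mask "10111001" and the code "op3 = mask ? 0 : (op1 + op2)", the sketch returns the inverted mask "01000110" together with the addition as the vector operation expression and 0 as the operation mode.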

In some embodiments, if the compiler 100 determines that the function corresponding to the vector operation expression in the conditional operation expression is an intrinsic function of the compiler, the compiler 100 may determine the correspondence between the intrinsic function and an intrinsic number in the software architecture, and determine the built-in interface and the generated instruction corresponding to the intrinsic function. The compiler may then perform steps 403 and 404 as described below.

In some embodiments, after executing step 402, the compiler 100 may determine expression 1 in the conditional operation expression, i.e. the mask variable, and determine whether the mask variable is a bool vector. If it is a bool vector, the compiler continues with step 403 and step 404; otherwise, the mask variable needs to be compared with the 0 vector, and the return value of the comparison is the bool vector corresponding to the mask variable. For example, if a mask variable in a conditional operation expression in the input program code is an integer vector, and the compiler 100 determines that the mask variable is a non-bool vector, the compiler 100 may generate a comparison instruction for comparing the integer vector with a 0 vector, and use the return value of the comparison instruction as expression 1 in the conditional operation expression.
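The conversion of a non-bool mask can be modelled as follows. The text above states only that the mask is compared with the 0 vector; the choice of a "not equal to zero" comparison, the function name `to_bool_mask`, and the list representation are assumptions of this sketch.

```python
def to_bool_mask(mask):
    """Return a bool vector for the given mask variable (illustrative).

    A mask that already consists of bool elements is used as it is; any
    other mask is compared element by element with the 0 vector.
    """
    if all(isinstance(x, bool) for x in mask):
        return list(mask)             # already a bool vector
    # Assumed comparison: a lane is active when its element is non-zero.
    return [x != 0 for x in mask]
```

For the integer vector [3, 0, -1, 0], this yields a bool vector in which only the first and third lanes are active.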

In some embodiments, before performing step 403, the compiler 100 may generate an intermediate representation of the program code and optimize it via a compile front end and a compile middle end. At the compile middle end, for example at the intermediate representation 324 stage in fig. 3, the compiler 100 may generate a vselect instruction as the intermediate representation of the conditional operation expression.

It will be appreciated that the compiler 100 can convert a conditional operation expression into its intermediate representation, resulting in a vselect instruction.

403: the compiler 100 generates a first vector instruction according to a mask variable and a vector operation expression in a code of the vector instruction with the mask.

It will be appreciated that the first vector instruction is an instruction generated from a mask and vector operation. The first vector instruction may include a vector register corresponding to a vector operation return value, a vector operation instruction, a vector register corresponding to a vector operation, and a mask register corresponding to a mask.

In some embodiments, generating the first vector instruction according to the mask variable and the vector operation expression in step 403 specifically includes the following. At the compile back end, during instruction lowering, that is, at the instruction selection 326 stage in fig. 3, the compiler may process the vselect instruction generated from the conditional operation expression in the intermediate representation stage. That is, according to the vector operation and the mask in the conditional operation expression, the compiler 100 may generate the first vector instruction, which includes the vector register corresponding to the vector operation return value, the vector operation instruction, the vector registers corresponding to the vector operation, and the mask register corresponding to the mask.

For example, for the program code "op3 = mask ? (op1 + op2) : 0", the first vector instruction in assembly language generated by the compiler may be "add.w t3, t1, t2, p0", where t3 corresponds to the vector register in which the operand op3 is located, t1 and t2 correspond to the vector registers in which the operands op1 and op2 are located, respectively, p0 corresponds to the mask register holding the mask, and add.w denotes performing an addition operation on vector data with a length of 32 bits.
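How the first vector instruction might be assembled from the parts of the conditional operation expression can be sketched as follows. The `VSelect` class, the `MNEMONICS` table, and the register map are hypothetical stand-ins for the compiler's intermediate representation and register allocation, which the present application does not specify at this level of detail; only the "+" to "add.w" mapping comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class VSelect:
    """Hypothetical IR node for "dest = mask ? (lhs op rhs) : mode"."""
    dest: str
    mask: str
    op: str
    lhs: str
    rhs: str
    mode: str


# Assumed operator-to-mnemonic table; only "+" appears in the text.
MNEMONICS = {"+": "add.w"}


def select_first_instruction(node, regs):
    """Build the masked vector instruction from the operation and the mask."""
    mnemonic = MNEMONICS[node.op]
    operands = [regs[node.dest], regs[node.lhs], regs[node.rhs], regs[node.mask]]
    return mnemonic + " " + ", ".join(operands)
```

Given a node for "op3 = mask ? (op1 + op2) : 0" and a register map that places op1, op2, op3 and mask in t1, t2, t3 and p0, the sketch produces the string "add.w t3, t1, t2, p0". The operation mode is deliberately ignored here, because it only affects the second vector instruction.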

404: compiler 100 determines that the operation mode in the code of the masked vector instruction is a zeroing mode and generates a second vector instruction.

For example, in the merge mode, for the program code "op3 = mask ? (op1 + op2) : op4", a copy operation instruction may be generated, and the corresponding assembly language code may be, for example, "mov.pre t3, t4", which copies the values of the data bits of the vector register t4, where the operand op4 is located, that correspond to the data elements of the operand op1 that do not participate in the vector operation, to the corresponding data bits of the vector register t3, where the operand op3 is located. As another example, for the program code "op3 = mask ? (op1 + op2) : op3", the copy operation instruction may be omitted.

In some embodiments, step 404 specifically includes: during instruction lowering at the compile back end, namely the instruction selection 326 stage in fig. 3, the compiler may perform specific processing on the vselect instruction, and generate the second vector instruction when the operation mode in the conditional operation expression is determined to be the zeroing mode.

In some embodiments, the compiler 100 may arrange the execution order of the first vector instruction and the second vector instruction in the instruction reordering stage. Specifically, for the zeroing mode, the first vector instruction precedes the second vector instruction. Accordingly, when the processor 300 executes the executable program compiled by the compiler 100 and the operation mode is the zeroing mode, the processor 300 may execute the first vector instruction first and then execute the second vector instruction.

In some embodiments, for a copy operation instruction generated for the merge mode, the first vector instruction follows the copy operation instruction. Accordingly, when the processor 300 executes the executable program compiled by the compiler 100, the processor 300 may execute the second vector instruction, i.e. the copy operation instruction, first and then execute the first vector instruction.

In some embodiments, the compiler 100 may pack the first vector instruction and the second vector instruction. For example, for the program code "op3 = mask ? (op1 + op2) : 0", the following first vector instruction and second vector instruction may be generated:

{add.w t3, t1, t2, p0;

movz.post t3;}

The meaning of each parameter is described above and is not repeated here.

As another example, for the program code "op3 = mask ? (op1 + op2) : op3", the following first vector instruction may be generated, without generating a second vector instruction:

{add.w t3, t1, t2, p0;}

The meaning of each parameter is described above and is not repeated here.

As another example, for the program code "op3 = mask ? (op1 + op2) : op4", the following first vector instruction and second vector instruction may be generated:

{mov.pre t3, t4;

add.w t3, t1, t2, p0;}
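The three packets above follow one decision rule, which can be sketched in Python as follows. The function `emit_packet` and its string-based interface are assumptions made for illustration; the sketch covers only the addition example used in the text and takes register names as inputs instead of performing register allocation.

```python
def emit_packet(dest, lhs, rhs, mask, mode):
    """Return the instruction packet for "dest = mask ? (lhs + rhs) : mode".

    All arguments are register names, except that mode may be the literal
    "0", which stands for the zeroing mode.
    """
    first = "add.w {}, {}, {}, {}".format(dest, lhs, rhs, mask)
    if mode == "0":
        # Zeroing mode: the zero-assignment instruction follows the addition.
        return [first, "movz.post {}".format(dest)]
    if mode == dest:
        # Merge into the destination itself: no copy instruction is needed.
        return [first]
    # Merge from another register: the copy instruction precedes the addition.
    return ["mov.pre {}, {}".format(dest, mode), first]
```

Calling the function with the modes "0", "t3" and "t4" reproduces the three instruction sequences listed above, in the same order.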

According to the compiling method for the vector instruction with the mask, when a user inputs a program code into the compiler 100, the user only needs to input the code part of the vector instruction with the mask in the program code according to the conditional operation expression, and does not need to remember and call a built-in interface related to an operation mode during programming. In addition, the compiler does not need to provide a built-in interface matched with the vector instruction, which reduces the number of interfaces of the compiler and the maintenance cost of the built-in interfaces.

Another compiling method for vector instructions with masks provided by the embodiments of the present application is described below with reference to fig. 5. It is understood that, in the present embodiment, the Very Long Instruction Word (VLIW) architecture corresponding to the compiler 100 includes a pre-copy (pre-move) instruction and a post-copy (post-move) instruction, and the compiler 100 supports the bool vector type.

Fig. 5 is a flowchart illustrating another compiling method for vector instructions with masks according to an embodiment of the present disclosure.

As shown in fig. 5, a compiling method for a vector instruction with a mask in an embodiment of the present application includes:

501: the compiler 100 acquires program code input by a user. Step 501 is the same as step 401, and is not described herein again.

502: the compiler 100 determines that the program code includes target copy code and code of a masked vector instruction expressed in the form of a conditional operation expression.

It is to be understood that the code of the masked vector instruction expressed in the form of the conditional operation expression is similar to that in step 402, except that in this embodiment the operation mode variable is fixed and may be a 0 vector or the variable corresponding to the return value of the conditional operation expression. That is, in the present embodiment, the compiler 100 has a default operation mode, which is either the zeroing mode or the merge mode, and the user does not need to change the setting of the expression.

It will be appreciated that the target copy code may be the code of the pre-copy instruction or the code of the post-copy instruction. The user may modify the default operation mode of the compiler 100 via the target copy code, as described in step 504 below.

503: the compiler 100 generates a first vector instruction according to a mask variable and a vector operation expression in a code of the vector instruction with the mask. Step 503 is the same as step 403, and is not described herein again.

504: the compiler 100 determines a target operation mode of the masked vector instruction according to the target copy code and the operation mode variable in the code of the masked vector instruction.

It will be appreciated that the second vector instruction may be a pre-copy instruction or a post-copy instruction. The pre-copy instruction is used for keeping the default operation mode unchanged, and the post-copy instruction is used for changing the default operation mode.

For example, when the operation mode variable in the conditional operation expression indicates that the default operation mode of the compiler 100 is the merge mode, and the target copy code in the program code is the code of the pre-copy instruction, the target operation mode corresponding to the generated pre-copy instruction is the merge mode; when the operation mode variable indicates that the default operation mode of the compiler 100 is the merge mode, and the target copy code in the program code is the code of the post-copy instruction, the target operation mode corresponding to the generated post-copy instruction is the zero-setting mode.
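The selection of the target operation mode in step 504 can be sketched as a small function. The names `target_mode`, "pre" and "post" are hypothetical. The text gives an example only for a default merge mode; the behaviour for a default zeroing mode shown here (post-copy switches to the merge mode) is an assumption extrapolated from the statement that the post-copy instruction changes the default operation mode.

```python
def target_mode(default_mode, copy_kind):
    """Return the target operation mode, "merge" or "zero" (illustrative).

    copy_kind == "pre":  the default operation mode is kept.
    copy_kind == "post": the default operation mode is changed to the other one.
    """
    if default_mode not in ("merge", "zero"):
        raise ValueError("default_mode must be 'merge' or 'zero'")
    if copy_kind == "pre":
        return default_mode
    if copy_kind == "post":
        # Assumed symmetric behaviour: switch to the other mode.
        return "zero" if default_mode == "merge" else "merge"
    raise ValueError("copy_kind must be 'pre' or 'post'")
```

With a default merge mode, the pre-copy code keeps the merge mode and the post-copy code selects the zero-setting mode, which matches the example above.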

505: when the compiler 100 determines that the target operation mode is the zero setting mode, it generates a second vector instruction and marks the second vector instruction with an instruction identifier. The second vector instruction is generated in step 505 in the same manner as in step 404, which is not described herein again.

It can be understood that marking the second vector instruction by the instruction identifier specifically may include: the compiler 100 marks the second vector instruction in the instruction description.

In some embodiments, for the merge mode, a copy operation instruction may or may not be generated, which is determined according to the return value variable of the conditional operation expression and the variable of the operation mode.

For example, for the program code "op3 = mask ? (op1 + op2) : op4", a copy operation instruction may be generated, and the corresponding assembly language code may be, for example, "mov.pre t3, t4", which copies the values of the data bits in the vector register t4, where the operand op4 is located, that correspond to the data elements of the operand op1 that are not involved in the vector operation, to the corresponding data bits in the vector register t3, where the operand op3 is located. As another example, for the program code "op3 = mask ? (op1 + op2) : op3", the copy operation instruction may be omitted.

506: the compiler 100 generates a vector instruction packet including the first vector instruction and the second vector instruction according to the instruction identification of the second vector instruction.

It is to be understood that, in step 506, the compiler 100 packs the first vector instruction and the second vector instruction according to the instruction identification, for example by enclosing the first vector instruction and the second vector instruction in brackets to form an instruction packet. The program code input in step 501 may correspond to a plurality of instruction packets, and each instruction packet may include at least two instructions.

Fig. 6 is a schematic block diagram of a system structure of an electronic device applied to a method in an embodiment of the present application.

The electronic device 600 may include one or more processors 601 coupled to system control logic 603. For at least one embodiment, the system control logic 603 communicates with the processor 601 via a multi-drop bus such as a front-side bus (FSB), a point-to-point interface such as a QuickPath Interconnect (QPI), or a similar connection. The processor 601 executes instructions that control data processing operations of a general type. In some embodiments, the system control logic 603 includes, but is not limited to, a graphics memory controller hub (GMCH) (not shown) and an input/output hub (IOH) (not shown), which may be on separate chips, where the GMCH includes memory and graphics controllers and is coupled with the IOH.

The electronic device 600 may also include a coprocessor 602 and memory 604 coupled to the system control logic 603. Alternatively, one or both of the memory and GMCH may be integrated within the processor (as described herein), with the memory 604 and coprocessor 602 coupled directly to the processor 601 and system control logic 603, with the system control logic 603 and IOH in a single chip. The memory 604 may be, for example, dynamic random access memory (DRAM), phase change memory (PCM), or a combination of the two. In one embodiment, the coprocessor 602 is, for example, a high-throughput MIC processor, a network or communication processor, a graphics processor, a GPGPU, an embedded processor, or the like. Specifically, the compiling method for the vector instructions with masks in the embodiments of the present application is applied to the coprocessor 602 or the processor 601 in fig. 6, and it is understood that the coprocessor 602 or the processor 601 is the processor 300 in fig. 1.

In one embodiment, the electronic device 600 may further include a Network Interface (NIC) 606. The network interface 606 may include a transceiver to provide a radio interface for the electronic device 600 to communicate with any other suitable device (e.g., front end module, antenna, etc.). In various embodiments, the network interface 606 may be integrated with other components of the electronic device 600. The network interface 606 may implement the functions of the communication unit in the above-described embodiments.

The electronic device 600 may further include an input/output (I/O) device 605. The I/O device 605 may include: a user interface designed to enable a user to interact with the electronic device 600; a peripheral component interface designed to enable peripheral components to interact with the electronic device 600; and/or sensors designed to determine environmental conditions and/or location information associated with the electronic device 600.

It is noted that fig. 6 is merely exemplary. That is, although fig. 6 shows that the electronic device 600 includes a plurality of devices, such as a processor 601, system control logic 603, and a memory 604, in a practical application, a system using the methods of the present application may include only some of the devices of the electronic device 600, for example only the processor 601 and the NIC 606. Optional devices in fig. 6 are shown in dashed lines.

Fig. 7 is a schematic structural diagram of an SOC according to an embodiment of the present application. In fig. 7, similar components have the same reference numerals. In addition, the dashed boxes are optional features of more advanced SOCs. In fig. 7, the SOC 700 includes: an interconnect unit 705 coupled to the processor 701; a system agent unit 707; a bus controller unit 708; an integrated memory control unit 704; a set of one or more coprocessors 702, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a Static Random Access Memory (SRAM) unit 703; and a Direct Memory Access (DMA) unit 706. In one embodiment, the coprocessor 702 comprises a special-purpose processor, such as, for example, a network or communication processor, a GPGPU, a high-throughput MIC processor, or an embedded processor. The processor 701 and the coprocessor 702 may correspond to the processor 300 in fig. 1.

Fig. 8 is a schematic diagram of the program modules of a compiling apparatus 800 for a vector instruction with a mask according to an embodiment of the present application. The apparatus may be used to execute the compiling method for a vector instruction with a mask according to the embodiments of the present application, for example the foregoing steps 401 to 404. The compiling apparatus is applied to an electronic device on which a compiler is installed, and includes:

a program obtaining module 801, configured to obtain, by a compiler, a program to be compiled;

a code analysis module 802, configured to determine that a program to be compiled includes a first program code, and a code form of the first program code conforms to a form of a preset conditional operator expression, where the first program code includes a first mask of a vector instruction with a mask, a vector operation expression, and an operation mode, and the operation mode is a merge mode or a zero setting mode;

a first instruction generating module 803, configured to generate, by the compiler, a first vector instruction according to the first mask of the first program code and the vector operation expression, where the first vector instruction is used to represent a vector operation corresponding to the first program code;

a second instruction generating module 804, configured to generate a second vector instruction if the compiler determines that the operation mode of the first program code is the zeroing mode, where the second vector instruction is used to characterize that the operation mode of the first program code is the zeroing mode.

Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.

Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.

The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with a processing system, including but not limited to OpenCL, C, C++, and Java. For languages such as C++ and Java, because they convert how data is stored, those skilled in the art may perform the conversion based on the specific high-level language, which may differ from the application of the data processing method in the embodiments of the present application, without departing from the scope of the embodiments of the present application.

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.

It should be noted that, in the embodiments of the apparatuses in the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or may be a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solve the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules which are not so closely related to solve the technical problems presented in the present application, which does not indicate that no other units/modules exist in the above-mentioned device embodiments.

It is noted that, in the examples and description of the present patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims

1. A compiling method for vector instructions with masks, applied to an electronic device, characterized in that a compiler is installed on the electronic device;

the method comprises the following steps:

the compiler acquires a program to be compiled;

2. The compiling method for a vector instruction with a mask according to claim 1, wherein the preset conditional operator expression comprises a first expression, a second expression and a third expression, the second expression being located between the first expression and the third expression;

3. The compiling method for a vector instruction with a mask according to claim 1, wherein the preset conditional operator expression comprises a first expression, a second expression and a third expression, the second expression being located between the first expression and the third expression;

4. The compiling method for a masked vector instruction according to claim 3, further comprising:

and the compiler generates a third vector instruction according to the second mask of the second program code and the vector operation expression.

5. The compiling method for a masked vector instruction according to claim 2, further comprising:

the method further comprises the following steps:

6. The compiling method for a masked vector instruction according to claim 1, wherein the compiler determining that the operation mode of the first program code is a zeroing mode and generating a second vector instruction comprises:

the compiler generates a zero operation instruction.

7. The compiling method for a masked vector instruction according to claim 6, further comprising:

8. The compiling method for a masked vector instruction according to claim 1, further comprising:

and the compiler determines that the return variable of the first program code is different from the variable corresponding to the operation mode, and generates a copy operation instruction.

9. The compiling method for a masked vector instruction according to claim 8, further comprising:

10. The compiling method for a masked vector instruction according to claim 1, further comprising:

11. The compiling method for a masked vector instruction according to claim 10, wherein the compiler determining that the operation mode of the first program code is a zeroing mode and generating a second vector instruction comprises:

12. The compiling method for a masked vector instruction according to claim 11, further comprising:

13. An electronic device, comprising:

a memory for storing instructions for execution by one or more processors of the electronic device, and

a processor, being one of the processors of the electronic device, for controlling execution of the compiling method for a masked vector instruction according to any one of claims 1 to 12.

14. A computer-readable storage medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the compiling method for a masked vector instruction according to any one of claims 1 to 12.

15. A computer program product, characterized in that the computer program product comprises instructions which, when executed, cause a computer to perform the compiling method for a masked vector instruction according to any one of claims 1 to 12.