WO2022206969A1 - Vector instruction identification method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Vector instruction identification method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2022206969A1
Authority
WO
WIPO (PCT)
Prior art keywords
loop iteration
parameter
dimension
instruction
vector instruction
Prior art date
Application number
PCT/CN2022/084890
Other languages
English (en)
French (fr)
Inventor
何忠政
刘飞
詹荣开
Original Assignee
北京希姆计算科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京希姆计算科技有限公司
Publication of WO2022206969A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions

Definitions

  • the present disclosure relates to the field of program compilation and processors, and in particular, to a vector instruction identification method, apparatus, electronic device, and computer-readable storage medium.
  • Dedicated or vector processors such as GPUs (graphics processing units) and NPUs (neural-network processing units) have become one of the main ways for modern computers to improve performance.
  • scalar software code needs to be converted to vectorized code or instructions.
  • the compiler automatically vectorizes the scalar program and automatically generates vector instructions.
  • GCC GNU Compiler Collection
  • LLVM Low Level Virtual Machine
  • the automatic vectorization of general-purpose compilers can only handle simple vector instructions, such as addition and subtraction, in simple scenarios.
  • the second is to use an explicit programming model, mainly through the compiler's intrinsic functions or inline assembly. This method places high demands on programmers, who must be familiar with vectorized programming interfaces, and the resulting code has poor portability.
  • IR in scalar form is converted into vector instructions for execution on a dedicated or vector processor.
  • when IR in scalar form is converted into vector instructions, the vector instructions that can be converted and have the highest performance must be selected from vector instructions of different forms.
  • An essential step to achieve this conversion is the identification of vector instructions, that is, to identify vector instructions that can be successfully converted from IR in scalar form.
  • Scheme 1: According to the constraints on each dimension of the input parameters and output parameters of a specific vector instruction, logically judge the loop variables to be vectorized. If the dimension constraints of the instruction parameters are met, the mapping succeeds; otherwise the mapping fails.
  • the disadvantage of Scheme 1 is that it is realized through logical judgment, and the process is complicated. Branch statements must be written for each vector instruction to analyze the constraints on each dimension of its input parameters and output parameters, so the solution is not very general.
  • Scheme 2: Establish an IR description in Tree-Pattern form from the IR in scalar form, and identify the scalar IR that can be vectorized through Tree-Pattern similarity matching.
  • the current solution only supports a few vector instructions, such as vector add and vector subtract.
  • for the vector addition instruction, it cannot distinguish between different forms of the instruction.
  • an embodiment of the present disclosure provides a method for identifying a vector instruction, including:
  • the scalar instruction is mapped to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction; the instruction configuration table includes a vector instruction set;
  • the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result
  • vector instruction dimension mapping rule includes:
  • the mapping rule is that the row dimension of the parameter does not map the loop iteration variable and the column dimension of the parameter does not map the loop iteration variable;
  • the mapping rule is that the row dimension of the parameter does not map the loop iteration variable and the column dimension of the parameter maps the loop iteration variable with a step size of 1;
  • the mapping rule is that the row dimension of the parameter maps the loop iteration variable whose step size is not 1 and the column dimension of the parameter does not map the loop iteration variable;
  • the mapping rule is that the row dimension of the parameter maps the loop iteration variable whose step size is not 1 and the column dimension of the parameter maps the loop iteration variable with a step size of 1.
  • vector instruction dimension validity rule includes:
  • when the data type of the parameter is 1*n type data, there is no constraint on whether a loop iteration variable is mapped in the row dimension of the parameter, and a loop iteration variable is mapped in the column dimension of the parameter;
  • when the data type of the parameter is m*1 type data, a loop iteration variable is mapped in the row dimension of the parameter, and there is no constraint on whether a loop iteration variable is mapped in the column dimension of the parameter;
  • when the data type of the parameter is m*n type data, there is no constraint on whether a loop iteration variable is mapped in the row dimension of the parameter, and a loop iteration variable is mapped in the column dimension of the parameter;
  • a loop iteration variable is mapped in the row dimension of the parameter or a loop iteration variable is mapped in the column dimension of the parameter.
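The validity rules listed above can be read as a per-parameter check on the two sets of mapped loop iteration variables. The following is a minimal sketch under that reading, not the patent's implementation: the function name `is_valid` and the type labels `"1xn"`, `"mx1"`, `"mxn"` are hypothetical, and the final clause of the list is left out because the text here does not state which data type it applies to.

```python
def is_valid(data_type, row_axes, col_axes):
    """Check one parameter against the dimension validity rules.

    row_axes / col_axes are the loop iteration variables that were mapped
    to the row and column dimension of the parameter (hypothetical names).
    """
    if data_type == "1xn":
        # Row dimension unconstrained; column dimension must have a mapped variable.
        return len(col_axes) > 0
    if data_type == "mx1":
        # Row dimension must have a mapped variable; column dimension unconstrained.
        return len(row_axes) > 0
    if data_type == "mxn":
        # Row dimension unconstrained; column dimension must have a mapped variable.
        return len(col_axes) > 0
    raise ValueError(f"data type {data_type!r} is not covered by this sketch")


# An m*n parameter that only received a column variable still passes,
# while an m*1 parameter with no row variable does not.
print(is_valid("mxn", [], [("j", 1)]))
print(is_valid("mx1", [], [("j", 1)]))
```

Keeping the check per parameter means an instruction can be accepted only when every one of its output and input parameters passes.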
  • instruction configuration table includes:
  • the data type of the parameter in the vector instruction is constrained by the step size of the data type; wherein,
  • the step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 0;
  • the step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 1;
  • the step size constraint is that the step size in the row dimension is 1 and the step size in the column dimension is 0;
  • the step size constraint is that the step size in the row dimension is max and the step size in the column dimension is 1; wherein, the max represents the maximum value that the register can represent.
  • mapping of the scalar instruction to a vector instruction according to the vector instruction dimension mapping rule and the instruction configuration table includes:
  • the loop iteration variable is mapped to the parameter of the vector instruction according to the data type of the parameter of the vector instruction.
  • mapping of the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction includes:
  • the selected loop iteration variable is respectively mapped to the row dimension and the column dimension of the parameter according to the vector instruction dimension mapping rule to obtain a loop iteration variable set of the parameter of the vector instruction.
  • mapping of the loop iteration variable to the parameter of the vector instruction according to the dimension mapping rule of the vector instruction includes:
  • the loop iteration variable used by the selected output parameter is respectively mapped to the row dimension and the column dimension of the output parameter according to the vector instruction dimension mapping rule of the output parameter to obtain the loop iteration variable set of the output parameter of the vector instruction;
  • the loop iteration variable used by the selected input parameter is respectively mapped to the row dimension and the column dimension of the input parameter according to the vector instruction dimension mapping rule of the input parameter to obtain the loop iteration variable set of the input parameter of the vector instruction.
  • mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction including:
  • the loop iteration variable used by the selected input parameter is respectively mapped to the row dimension and the column dimension of the input parameter according to the vector instruction dimension mapping rule of the input parameter to obtain the loop iteration variable set of the input parameter of the vector instruction;
  • the loop iteration variable used by the selected output parameter is respectively mapped to the row dimension and the column dimension of the output parameter according to the vector instruction dimension mapping rule of the output parameter to obtain the loop iteration variable set of the output parameter of the vector instruction.
  • judging the validity of the vector instruction according to the dimension validity rule of the vector instruction includes:
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the parameters; or,
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the output parameters and the dimension and step size of the loop iteration variable in the loop iteration variable set of the input parameters.
  • an embodiment of the present disclosure provides an apparatus for identifying a vector instruction, including:
  • an instruction acquisition module, configured to acquire a scalar instruction;
  • a mapping module, configured to map the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction, and the instruction configuration table includes a vector instruction set;
  • an identification and verification module, configured to judge the validity of the vector instruction according to the vector instruction dimension validity rule; wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is identified successfully.
  • embodiments of the present disclosure provide an electronic device, including: a memory for storing computer-readable instructions; and one or more processors for executing the computer-readable instructions, so that the processors, when running, implement the method in any one of the foregoing first aspects.
  • embodiments of the present disclosure provide a non-transitory computer-readable storage medium that stores computer instructions, and the computer instructions are used to cause a computer to execute the method in any one of the foregoing first aspects.
  • embodiments of the present disclosure provide a computer program product, including computer instructions, when the computer instructions are executed by a computing device, the computing device can perform any one of the methods in the foregoing first aspect.
  • Embodiments of the present disclosure disclose a method, apparatus, electronic device, and computer-readable storage medium for identifying a vector instruction.
  • the method for identifying the vector instruction includes: obtaining a scalar instruction; mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction and the instruction configuration table includes a vector instruction set; and judging the validity of the vector instruction according to the vector instruction dimension validity rule, wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result. If the vector instruction is valid, the vector instruction is identified successfully.
  • the above method identifies vector instructions through the vector instruction dimension mapping rule, the instruction configuration table and the vector instruction dimension validity rule, and solves the technical problems in the prior art that converting scalar instructions into vector instructions is complex, lacks generality, and cannot distinguish between different forms of vector instructions.
  • FIG. 1 is a schematic diagram of an example of code in scalar form
  • FIGS. 2a-2d are schematic diagrams of data types of parameters in a vector instruction provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for identifying a vector instruction provided by an embodiment of the present disclosure
  • FIGS. 4a-4r are schematic diagrams of the types of input parameters and output parameters in various vector instructions
  • FIG. 5 is a further schematic flowchart of a method for identifying a vector instruction provided by an embodiment of the present disclosure
  • FIG. 6a is a schematic diagram of an example of a scalar code block used in an embodiment of the present disclosure
  • FIG. 6b is an exemplary schematic diagram of an instruction configuration table used in an embodiment of the present disclosure.
  • FIG. 7 is a further schematic flowchart of a method for identifying a vector instruction provided by an embodiment of the present disclosure
  • FIG. 8 is an exemplary schematic diagram of a scalar code block used in another embodiment of the present disclosure.
  • FIG. 9 is a further schematic flowchart of a method for identifying a vector instruction provided by an embodiment of the present disclosure.
  • FIG. 10a is an exemplary schematic diagram of a scalar code block used in another embodiment of the present disclosure.
  • FIG. 10b is an exemplary schematic diagram of an instruction configuration table used in another embodiment of the present disclosure.
  • the term “including” and variations thereof are open-ended inclusions, i.e., "including but not limited to".
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is an example, in the present disclosure, of code in scalar form that needs to be converted into vector instructions.
  • the for loop statement block in scalar form needs to be converted into a corresponding vector instruction, and the loop iteration variables i and j in the for loop need to be mapped to the corresponding dimensions of the input parameters A and B and the output parameter C of the vector instruction, so that the loop iteration variables are vectorized.
  • the parameter type shown in Figure 2a is a scalar/vector type, which can be understood as a matrix of shape [1,1] ([row,col]);
  • the parameter type shown in Figure 2b is a vector type, which can be understood as a matrix of shape [1,n] ([row,col]);
  • the parameter type shown in Figure 2c is a vector type, which can be understood as a matrix of shape [m,1] ([row,col]);
  • the parameter type shown in Figure 2d is a vector type, which can be understood as a matrix of shape [m,n] ([row,col]); in all four cases, row represents the row dimension, col represents the column dimension, m>1, and n>1.
  • FIG. 3 is a schematic flowchart of a method for identifying a vector instruction according to an embodiment of the present disclosure.
  • the method identifies the vector instructions into which a scalar instruction can be mapped. It can be performed by a vector instruction identification apparatus, which can be implemented in software, in hardware, or in a combination of the two, and which can be integrated into a device of a vector instruction identification system, such as a terminal device.
  • the method includes the following steps:
  • Step S301 obtaining a scalar instruction
  • the scalar instruction includes a statement block in scalar form, and the scalar instruction includes loop iteration variables, such as the for loop statement block shown in FIG. 1 , in which loop iteration variables i and j are used.
  • the method for identifying the vector instruction further includes:
  • Step S302: mapping the scalar instruction to a vector instruction according to the vector instruction dimension mapping rule and the instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction, and the instruction configuration table includes a set of vector instructions.
  • the dimension mapping rule of the vector instruction includes: the data type of the parameter in the vector instruction; and the rule for mapping the loop iteration variable in the scalar instruction to the parameter in the vector instruction.
  • the vector instruction dimension mapping rule includes the data types of the parameters in the vector instruction and, for each of these data types, the rule by which a scalar loop variable is mapped to the row dimension and/or the column dimension of the parameter of the vector instruction.
  • the vector instruction dimension mapping rule includes:
  • the mapping rule is that the row dimension of the parameter does not map the loop iteration variable and the column dimension of the parameter does not map the loop iteration variable;
  • the mapping rule is that the row dimension of the parameter does not map the loop iteration variable and the column dimension of the parameter maps the loop iteration variable with a step size of 1;
  • the mapping rule is that the row dimension of the parameter maps the loop iteration variable whose step size is not 1 and the column dimension of the parameter does not map the loop iteration variable;
  • the mapping rule is that the row dimension of the parameter maps the loop iteration variable whose step size is not 1 and the column dimension of the parameter maps the loop iteration variable with a step size of 1.
  • the row dimension of the parameter does not map a scalar loop iteration variable, and the column dimension of the parameter maps a scalar loop iteration variable with a step size of 1.
  • i is in the inner loop and increases by 1 on each iteration, so the step size of i is 1;
  • j is in the outer loop and increases by 1 each time i has completed its 30 iterations, so the step size of j is 30.
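The step sizes described above follow from the loop nest: the step size of a loop variable equals the product of the trip counts of all loops nested inside it. The sketch below illustrates this under the assumption of a dense, row-major access pattern; the helper name `loop_strides` and the outer trip count of 20 are made up for the example, since the text only gives the inner trip count of 30.

```python
def loop_strides(loops):
    """Compute the step size of each loop iteration variable.

    loops: list of (name, trip_count) pairs, ordered from the outermost
    loop to the innermost loop. Returns (name, step) pairs in the same order.
    """
    strides = []
    step = 1
    # Walk from the innermost loop outwards, accumulating trip counts.
    for name, trip_count in reversed(loops):
        strides.append((name, step))
        step *= trip_count
    strides.reverse()
    return strides


# j is the outer loop (trip count 20 is an assumed value),
# i is the inner loop with 30 iterations.
print(loop_strides([("j", 20), ("i", 30)]))  # [('j', 30), ('i', 1)]
```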
  • the row dimension of the parameter maps the loop iteration variable whose step size is not 1, and the column dimension of the parameter does not map a loop iteration variable.
  • the row dimension of the parameter maps a loop iteration variable whose step size is not 1, and the column dimension of the parameter maps a loop iteration variable whose step size is 1.
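The four dimension mapping rules can be held in a small lookup table keyed by the data type of the parameter. This is only an illustrative sketch: the pairing of each rule with a data type is inferred from the order of the rules and from the [1,1], [1,n], [m,1] and [m,n] shapes of Figures 2a-2d, and the names `MAPPING_RULES`, `accepts` and `map_axis` are hypothetical.

```python
# For each data type: (row rule, column rule).
# None       -> the dimension maps no loop iteration variable
# "unit"     -> the dimension maps a variable whose step size is 1
# "non_unit" -> the dimension maps a variable whose step size is not 1
MAPPING_RULES = {
    "1x1": (None, None),
    "1xn": (None, "unit"),
    "mx1": ("non_unit", None),
    "mxn": ("non_unit", "unit"),
}


def accepts(rule, step):
    """Return True if a dimension governed by `rule` can map a variable with this step."""
    if rule is None:
        return False
    if rule == "unit":
        return step == 1
    return step != 1


def map_axis(data_type, step):
    """Return "row", "col" or None for a loop variable with the given step size."""
    row_rule, col_rule = MAPPING_RULES[data_type]
    if accepts(row_rule, step):
        return "row"
    if accepts(col_rule, step):
        return "col"
    return None


print(map_axis("mxn", 30))  # row
print(map_axis("mxn", 1))   # col
print(map_axis("1xn", 30))  # None
```

A table-driven rule like this avoids writing a separate branch per vector instruction, which is the drawback the text attributes to Scheme 1.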
  • the instruction configuration table includes: all vector instructions to be identified and data types of parameters in each vector instruction.
  • the data type of the parameter in the vector instruction is constrained by the step size of the data type;
  • the step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 0;
  • step size constraint is that the step size on the row dimension is 0 and the step size on the column dimension is 1;
  • the step size constraint is that the step size in the row dimension is 1 and the step size in the column dimension is 0;
  • the step size constraint is that the step size in the row dimension is max and the step size in the column dimension is 1; wherein, the max represents the maximum value that the register can represent.
  • the data type of the parameter in the vector instruction may be represented by the step size of the row dimension and the column dimension.
  • the data type shown in Figure 2a has a step size of 0 in the row dimension and 0 in the column dimension, which can be expressed as: RowTable: {0}, ColTable: {0};
  • the data type shown in Figure 2b has a step size of 0 in the row dimension and 1 in the column dimension, which can be expressed as: RowTable: {0}, ColTable: {1};
  • the data type shown in Figure 2c has a step size of 1 in the row dimension and 0 in the column dimension, which can be expressed as: RowTable: {1}, ColTable: {0};
  • the data type shown in Figure 2d has a step size of max in the row dimension and 1 in the column dimension, which can be expressed as: RowTable: {max}, ColTable: {1}.
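One way to picture the instruction configuration table is as a pair of lists per instruction, where index 0 describes the output parameter and the following indices describe the input parameters. The sketch below is an assumed layout, not the patent's data structure: the dictionary names, the `param_type` helper and the string placeholder for max are hypothetical, while the two entries reuse the parameter types that the text gives for veadd_mm and veadd_mv_dimh.

```python
MAX = "max"  # placeholder for the maximum value the register can represent

# (row step, column step) -> shape of the data type, as in Figures 2a-2d.
DATA_TYPES = {
    (0, 0): "[1,1]",
    (0, 1): "[1,n]",
    (1, 0): "[m,1]",
    (MAX, 1): "[m,n]",
}

# Index 0 is the output parameter; indices 1 and 2 are the input parameters.
CONFIG_TABLE = {
    "veadd_mm": {"RowTable": [MAX, MAX, MAX], "ColTable": [1, 1, 1]},
    "veadd_mv_dimh": {"RowTable": [MAX, MAX, 0], "ColTable": [1, 1, 1]},
}


def param_type(instruction, index):
    """Return [RowTable[index], ColTable[index]] for one parameter as a tuple."""
    entry = CONFIG_TABLE[instruction]
    return (entry["RowTable"][index], entry["ColTable"][index])


print(DATA_TYPES[param_type("veadd_mm", 0)])       # [m,n]
print(DATA_TYPES[param_type("veadd_mv_dimh", 2)])  # [1,n]
```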
  • Figures 4a-4r show the types of input parameters and output parameters in various vector instructions.
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter src is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [1,0].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [0,1].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [1,0].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [max,1].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,0].
  • the type of the input parameter is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [1,0].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [0,1].
  • the type of the output parameter dst is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [0,0].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,0].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [max,1].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [max,1].
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [1,0].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [0,1].
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [0,0].
  • the configuration in the instruction configuration table is:
  • the type of the output parameter dst is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of the input parameter src1 is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of the input parameter src2 is [RowTable[2],ColTable[2]], which is [0,0].
  • the reduce operation instruction refers to an operation instruction in which the output parameter is reduced in the row dimension and/or the column dimension relative to the input parameter.
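To make the shape change of a reduce operation concrete, the sketch below reduces a small m*n matrix along the column dimension, along the row dimension, and along both. Summation is used purely as an assumed example of a reduction operator, since the text does not name one, and all function names are hypothetical.

```python
def shape(matrix):
    """Return (rows, cols) of a matrix stored as a list of row lists."""
    return (len(matrix), len(matrix[0]))


def reduce_along_cols(matrix):
    """Collapse the column dimension: m*n -> m*1 (output type [1,0])."""
    return [[sum(row)] for row in matrix]


def reduce_along_rows(matrix):
    """Collapse the row dimension: m*n -> 1*n (output type [0,1])."""
    return [[sum(col) for col in zip(*matrix)]]


def reduce_all(matrix):
    """Collapse both dimensions: m*n -> 1*1 (output type [0,0])."""
    return [[sum(sum(row) for row in matrix)]]


src = [[1, 2, 3], [4, 5, 6]]          # m*n input, type [max,1]
print(shape(reduce_along_cols(src)))  # (2, 1)
print(shape(reduce_along_rows(src)))  # (1, 3)
print(shape(reduce_all(src)))         # (1, 1)
```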
  • step S302 includes:
  • the loop iteration variable is mapped to the parameter of the vector instruction according to the data type of the parameter of the vector instruction.
  • the loop iteration variable in the scalar instruction is mapped to the parameter of the vector instruction according to the dimension mapping rule of the vector instruction.
  • the parameters of the vector instruction can be scalar parameters or vector parameters.
  • mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction includes:
  • Step S501 generating a loop iteration variable set according to the loop iteration variable in the scalar instruction and the step size of the loop iteration variable;
  • Step S502 obtaining the vector instruction dimension mapping rule corresponding to the data type according to the data type of the parameter of the vector instruction;
  • Step S503 selecting the loop iteration variable used by the parameter of the vector instruction from the loop iteration variable set;
  • Step S504 Map the selected loop iteration variable to the row dimension and column dimension of the parameter according to the vector instruction dimension mapping rule to obtain a loop iteration variable set of the parameter of the vector instruction.
  • in this embodiment, no distinction is made between reduce and non-reduce vector instructions.
  • in step S501, a set of loop iteration variables is generated according to the obtained loop iteration variables.
  • an example of a scalar code block is shown in Figure 6a.
  • the code block includes the instruction veadd, whose loop iteration variables are i and j, where the step size of j is 1 and the step size of i is 10; the loop iteration variable set generated from the obtained loop iteration variables is then axes: {(i,10),(j,1)}.
  • in steps S502 to S504, for each vector instruction in the instruction configuration table, the vector instruction dimension mapping rule corresponding to the parameter type is obtained according to the type of each of its parameters, and the loop iteration variables used by the parameter are selected from the loop iteration variable set. The selected loop iteration variables are then mapped, according to the vector instruction dimension mapping rule, into the loop iteration variable set of the parameter of the vector instruction, which includes the set of loop iteration variables of the parameter in the row dimension and the set of loop iteration variables of the parameter in the column dimension.
  • the instruction configuration table shown in Figure 6b includes vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw, and veadd_mf.
  • veadd_mm includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [max,1].
  • mapping rules for C, A, and B are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • the loop iteration variables used by C are i and j.
  • the stride of the loop iteration variable i is 10, which is not 1, so it can be mapped to the row of C.
  • the stride of the loop iteration variable j is 1, so it can be mapped to the col of C.
  • the dst_row_axises set of C is: {(i,10)}, and the dst_col_axises set is: {(j,1)}.
  • the loop iteration variables used by A are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(j,1)}.
  • the loop iteration variable used by B is j.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of B, but cannot be mapped to the row of B, because the row of B can only map the loop iteration variable whose stride is not 1.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {(j,1)}.
  • veadd_mv_dimh includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,1].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • the loop iteration variables used by C are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of C.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of C.
  • the dst_row_axises set of C is: {(i,10)}, and the dst_col_axises set is: {(j,1)}.
  • the loop iteration variables used by A are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(j,1)}.
  • the loop iteration variable used by B is j.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of B; and the row direction of B does not map the loop iteration variable.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {(j,1)}.
  • veadd_mv_dimw includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [1,0].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • the loop iteration variables used by C are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of C.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of C.
  • the dst_row_axises set of C is: {(i,10)}, and the dst_col_axises set is: {(j,1)}.
  • the loop iteration variables used by A are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(j,1)}.
  • the loop iteration variable used by B is j.
  • the stride of the loop iteration variable j is 1, so it cannot be mapped to the row of B; and according to the mapping rule, the col dimension of B does not map loop iteration variables. Therefore, the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
  • veadd_mf includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,0].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • Row dimensions do not map loop iteration variables, and column dimensions do not map loop iteration variables.
  • the loop iteration variables used by C are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of C.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of C.
  • the dst_row_axises set of C is: {(i,10)}, and the dst_col_axises set is: {(j,1)}.
  • the loop iteration variables used by A are i and j.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A.
  • the stride of the loop iteration variable j is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(j,1)}.
  • the loop iteration variable used by B is j.
  • the stride of the loop iteration variable j is 1, which cannot be mapped in the row direction of B; and the col dimension of B does not map the loop iteration variable.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
  • In this way, the scalar instruction is mapped to the vector instructions. Because it cannot be determined, before the mapping is completed, which vector instructions in the vector instruction configuration table can execute the operation in the scalar instruction, in step S302 the scalar instruction needs to be mapped to each vector instruction in the configuration table. In the above example, the scalar instruction veadd is mapped to the vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw, and veadd_mf, respectively.
  • non-reduce instructions and reduce instructions may also be distinguished, and two different types of instructions use different mapping processes.
  • the step S302 includes:
  • Step S701 generating a loop iteration variable set according to the loop iteration variable in the scalar instruction and the step size of the loop iteration variable;
  • Step S702 obtaining the vector instruction dimension mapping rule of the output parameter corresponding to the data type according to the data type of the output parameter of the vector instruction;
  • Step S703 selecting the loop iteration variable used by the output parameter from the loop iteration variable set;
  • Step S704 mapping the loop iteration variables used by the selected output parameter to the row dimension and the column dimension of the output parameter respectively, according to the vector instruction dimension mapping rule of the output parameter, to obtain the loop iteration variable set of the output parameter of the vector instruction;
  • Step S705 obtaining the vector instruction dimension mapping rule of the input parameter corresponding to the data type according to the data type of the input parameter of the vector instruction;
  • Step S706 selecting the loop iteration variable used by the input parameter from the loop iteration variable set of the output parameter;
  • Step S707 mapping the loop iteration variables used by the selected input parameter to the row dimension and the column dimension of the input parameter respectively, according to the vector instruction dimension mapping rule of the input parameter, to obtain the loop iteration variable set of the input parameter of the vector instruction.
  • Steps S701 to S707 first map the loop iteration variables onto the output parameter, then use the resulting loop iteration variable set of the output parameter as the set from which the loop iteration variables of the input parameters are selected, and then map the selected loop iteration variables onto the input parameters.
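  • The ordering of steps S701 to S707 can be sketched as below, using the veadd_mv_dimh entry and the axes {(i,10),(j,1)} from the first example. The helper names `map_axes` and `map_non_reduce`, the dictionary layout of the inputs, and the encoding of parameter types as tuples are hypothetical choices made for this sketch; the mapping rule itself is the one read from the examples (nonzero row entry maps steps other than 1, nonzero column entry maps step 1).

```python
def map_axes(used_axes, param_type):
    """Map (variable, step) pairs to row/column sets for one parameter."""
    row_entry, col_entry = param_type
    row_axes = [(v, s) for v, s in used_axes if row_entry != 0 and s != 1]
    col_axes = [(v, s) for v, s in used_axes if col_entry != 0 and s == 1]
    return row_axes, col_axes


def map_non_reduce(axes, out_type, out_vars, inputs):
    """Non-reduce flow: map the output first, then the inputs.

    inputs maps a parameter name to (param_type, set of variables it uses).
    """
    # S702-S704: map the output parameter from the full variable set.
    out_used = [(v, s) for v, s in axes if v in out_vars]
    dst_row, dst_col = map_axes(out_used, out_type)
    # S706: inputs may only select from what the output actually mapped.
    out_mapped = dst_row + dst_col
    src = {}
    for name, (in_type, in_vars) in inputs.items():
        in_used = [(v, s) for v, s in out_mapped if v in in_vars]
        # S707: map the selected variables onto the input parameter.
        src[name] = map_axes(in_used, in_type)
    return (dst_row, dst_col), src


AXES = [("i", 10), ("j", 1)]                    # S701
INPUTS = {
    "A": (("max", 1), {"i", "j"}),
    "B": ((0, 1), {"j"}),                       # veadd_mv_dimh: B is [0,1]
}
dst, src = map_non_reduce(AXES, ("max", 1), {"i", "j"}, INPUTS)
print("C:", dst)
print("A:", src["A"])
print("B:", src["B"])
```

  • Selecting the input variables from the output's mapped set, rather than from the full set, means an input can never map a variable that the output did not map.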
  • An example of another scalar code block is shown in Figure 8.
  • the instruction veadd is included, and the loop iteration variable of veadd is i, whose step size is 1; the loop iteration variable set generated from this loop iteration variable is axes: {(i,1)}.
  • the instruction configuration table shown in FIG. 6b is also used, which includes vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw, and veadd_mf.
  • veadd_mm includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [max,1].
  • mapping rules for C, A, and B are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • the loop iteration variable used by C is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(i,1)}.
  • the loop iteration variable used by A is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {}, and the set of src_col_axises is: {(i,1)}.
  • the loop iteration variable used by B is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {(i,1)}.
  • veadd_mv_dimh includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,1].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • the loop iteration variable used by C is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(i,1)}.
  • the loop iteration variable used by A is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {}, and the set of src_col_axises is: {(i,1)}.
  • the loop iteration variable used by B is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {(i,1)}.
  • veadd_mv_dimw includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [1,0].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • the loop iteration variable used by C is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(i,1)}.
  • the loop iteration variable used by A is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {}, and the set of src_col_axises is: {(i,1)}.
  • the loop iteration variable used by B is i.
  • the stride of the loop iteration variable i is 1, which cannot be mapped in the row and col directions of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
  • veadd_mf includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0],ColTable[0]], which is [max,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,0].
  • mapping rules of C and A are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule for B is:
  • Row dimensions do not map loop iteration variables, and column dimensions do not map loop iteration variables.
  • the loop iteration variable used by C is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(i,1)}.
  • the loop iteration variable used by A is i.
  • the stride of the loop iteration variable i is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {}, and the set of src_col_axises is: {(i,1)}.
  • the loop iteration variable used by B is i.
  • the stride of the loop iteration variable i is 1, which cannot be mapped in the row and col directions of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
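  • The results for input parameter B in the Figure 8 example can be reproduced with a small table-driven sketch. The names `map_axes` and `B_TYPES` and the tuple encoding of the parameter types are assumptions introduced for illustration; the type values are the ones listed above for each instruction.

```python
def map_axes(used_axes, param_type):
    """Map (variable, step) pairs to row/column sets for one parameter."""
    row_entry, col_entry = param_type
    row_axes = [(v, s) for v, s in used_axes if row_entry != 0 and s != 1]
    col_axes = [(v, s) for v, s in used_axes if col_entry != 0 and s == 1]
    return row_axes, col_axes


# Figure 8 example: a single loop iteration variable i with step 1.
AXES = [("i", 1)]

# Type of input parameter B for each vector instruction in the table.
B_TYPES = {
    "veadd_mm": ("max", 1),
    "veadd_mv_dimh": (0, 1),
    "veadd_mv_dimw": (1, 0),
    "veadd_mf": (0, 0),
}

b_sets = {name: map_axes(AXES, b_type) for name, b_type in B_TYPES.items()}
for name, (row_axes, col_axes) in b_sets.items():
    print(f"{name}: src_row_axises={row_axes} src_col_axises={col_axes}")
```

  • With only a step-1 variable available, no instruction can fill a row dimension, so B ends up with an empty row set in every case and an empty column set whenever its column entry is 0.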
  • the step S302 includes:
  • Step S901 generating a loop iteration variable set according to the loop iteration variable in the scalar instruction and the step size of the loop iteration variable;
  • Step S902 obtaining the vector instruction dimension mapping rule of the input parameter corresponding to the data type according to the data type of the input parameter of the vector instruction;
  • Step S903 selecting the loop iteration variable of the input parameter from the loop iteration variable set;
  • Step S904 respectively map the loop iteration variable of the selected input parameter to the row dimension and column dimension of the input parameter according to the vector instruction dimension mapping rule of the input parameter to obtain the loop iteration variable set of the input parameter of the vector instruction ;
  • Step S905 combining the loop iteration variable sets of all input parameters to obtain the loop iteration variable joint set of input parameters;
  • Step S906 obtaining the vector instruction dimension mapping rule of the output parameter corresponding to the data type according to the data type of the output parameter of the vector instruction;
  • Step S907 selecting the loop iteration variable of the output parameter from the joint set of loop iteration variables of the input parameter;
  • Step S908 respectively map the loop iteration variable of the selected output parameter to the row dimension and column dimension of the output parameter according to the vector instruction dimension mapping rule of the output parameter to obtain the loop iteration variable set of the output parameter of the vector instruction .
  • Steps S901 to S908 first map the loop iteration variables onto the input parameters, then use the union of the loop iteration variable sets of all input parameters as the set from which the loop iteration variables of the output parameter are selected, and then map the selected loop iteration variables onto the output parameter.
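  • The reduce flow of steps S901 to S908 reverses the order used for non-reduce instructions. The sketch below is an illustration under assumptions: the helper names, the dictionary layout, and the tuple encoding of parameter types are hypothetical, and the sample data are taken from the veemacc_mm_dimh entry of the veemacc example in this description (axes {(i,10),(k,1)}, A and B of type [max,1], C of type [0,1] using k).

```python
def map_axes(used_axes, param_type):
    """Map (variable, step) pairs to row/column sets for one parameter."""
    row_entry, col_entry = param_type
    row_axes = [(v, s) for v, s in used_axes if row_entry != 0 and s != 1]
    col_axes = [(v, s) for v, s in used_axes if col_entry != 0 and s == 1]
    return row_axes, col_axes


def map_reduce(axes, inputs, out_type, out_vars):
    """Reduce flow: map the inputs first, then the output from their union."""
    src, union_row, union_col = {}, [], []
    for name, (in_type, in_vars) in inputs.items():
        # S902-S904: map each input parameter from the full variable set.
        used = [(v, s) for v, s in axes if v in in_vars]
        row_axes, col_axes = map_axes(used, in_type)
        src[name] = (row_axes, col_axes)
        # S905: accumulate the union of all input sets, without duplicates.
        union_row += [a for a in row_axes if a not in union_row]
        union_col += [a for a in col_axes if a not in union_col]
    # S907: the output selects its variables from the union only.
    out_used = [(v, s) for v, s in union_row + union_col if v in out_vars]
    # S908: map the selected variables onto the output parameter.
    dst = map_axes(out_used, out_type)
    return src, (union_row, union_col), dst


AXES = [("i", 10), ("k", 1)]                    # S901
INPUTS = {
    "A": (("max", 1), {"i", "k"}),
    "B": (("max", 1), {"i", "k"}),
}
src, union, dst = map_reduce(AXES, INPUTS, (0, 1), {"k"})
print("union:", union)
print("C:", dst)
```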
  • Another example of a scalar code block is shown in Figure 10a.
  • the instruction veemacc is included, and the loop iteration variables of veemacc are i and k, where the step size of k is 1 and the step size of i is 10; the loop iteration variable set generated from these loop iteration variables is therefore axes: {(i,10),(k,1)}.
  • the instruction configuration table shown in FIG. 10b is used, which includes vector instructions veemacc_mm, veemacc_mm_dimh, veemacc_mm_dimw, veemacc_mv_dimh, veemacc_mv_dimw, veemacc_mf_dimh, and veemacc_mf_dimw.
  • veemacc_mm includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [0,0].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [max,1].
  • mapping rules of A and B are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of C can be obtained as:
  • Row dimensions do not map loop iteration variables, and column dimensions do not map loop iteration variables.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of B; the stride of the loop iteration variable k is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which cannot be mapped to the row and col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {}.
  • veemacc_mm_dimh includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [max,1].
  • mapping rules of A and B are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of C can be obtained as:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of B; the stride of the loop iteration variable k is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(k,1)}.
  • veemacc_mm_dimw includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [max,1].
  • mapping rules of A and B are:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of C can be obtained as:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of B; the stride of the loop iteration variable k is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which cannot be mapped to the row and col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {}.
  • veemacc_mv_dimh includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [1,0].
  • mapping rule of A can be obtained as:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of B can be obtained as:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • mapping rule of C can be obtained as:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of B.
  • the set of src_row_axises of B is: {(i,10)}, and the set of src_col_axises is: {}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(k,1)}.
  • veemacc_mv_dimw includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,1].
  • mapping rule of A can be obtained as:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of B can be obtained as:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • mapping rule of C can be obtained as:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable k is 1, which can be mapped to the col of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {(k,1)}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which cannot be mapped to the row and col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {}.
  • veemacc_mf_dimh includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [0,1].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,0].
  • mapping rule of A can be obtained as:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of B can be obtained as:
  • Row dimensions do not map loop iteration variables, and column dimensions do not map loop iteration variables.
  • mapping rule of C can be obtained as:
  • the row dimension does not map the loop iteration variable, and the column dimension maps the loop iteration variable with stride 1.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which can be mapped to the col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {(k,1)}.
  • veemacc_mf_dimw includes 2 inputs and 1 output, and the parameter types in its instruction configuration table are:
  • the type of the output parameter C is [RowTable[0], ColTable[0]], which is [1,0].
  • the type of input parameter A is [RowTable[1],ColTable[1]], which is [max,1].
  • the type of input parameter B is [RowTable[2],ColTable[2]], which is [0,0].
  • mapping rule of A can be obtained as:
  • the row dimension maps the loop iteration variables with a step size other than 1, and the column dimension maps the loop iteration variables with a step size of 1.
  • mapping rule of B can be obtained as:
  • Row dimensions do not map loop iteration variables, and column dimensions do not map loop iteration variables.
  • mapping rule of C can be obtained as:
  • the row dimension maps loop iteration variables with a stride other than 1, and the column dimension does not map loop iteration variables.
  • the stride of the loop iteration variable i is 10, which can be mapped to the row of A; the stride of the loop iteration variable k is 1, which can be mapped to the col of A.
  • the set of src_row_axises of A is: {(i,10)}, and the set of src_col_axises is: {(k,1)}.
  • the stride of the loop iteration variable k is 1, which cannot be mapped to the row and col of B.
  • the set of src_row_axises of B is: {}, and the set of src_col_axises is: {}.
  • union_row_axises is: {(i,10)}, and union_col_axises is: {(k,1)}.
  • the loop iteration variable used by C is k.
  • the stride of the loop iteration variable k is 1, which cannot be mapped to the row and col of C.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {}.
  • In this way, the result of mapping the scalar instruction to each of the vector instructions is obtained.
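  • The output-parameter results of the veemacc example can be summarized with the sketch below. It starts from the union sets stated above, which are the same for all seven instructions, and maps the variable k used by C according to each instruction's output type. The names `UNION_ROW`, `UNION_COL`, `C_TYPES`, and `map_output`, and the tuple encoding of the types, are assumptions made for this illustration.

```python
# Union of the input parameters' sets, as stated for every instruction above.
UNION_ROW = [("i", 10)]
UNION_COL = [("k", 1)]

# Type of output parameter C for each vector instruction in the table.
C_TYPES = {
    "veemacc_mm": (0, 0),
    "veemacc_mm_dimh": (0, 1),
    "veemacc_mm_dimw": (1, 0),
    "veemacc_mv_dimh": (0, 1),
    "veemacc_mv_dimw": (1, 0),
    "veemacc_mf_dimh": (0, 1),
    "veemacc_mf_dimw": (1, 0),
}


def map_output(c_type, used_vars):
    """Select C's variables from the union, then map them by C's type."""
    candidates = [(v, s) for v, s in UNION_ROW + UNION_COL if v in used_vars]
    row_entry, col_entry = c_type
    # Assumed rule: nonzero row entry maps steps other than 1,
    # nonzero column entry maps step 1.
    dst_row = [(v, s) for v, s in candidates if row_entry != 0 and s != 1]
    dst_col = [(v, s) for v, s in candidates if col_entry != 0 and s == 1]
    return dst_row, dst_col


c_sets = {name: map_output(c_type, {"k"}) for name, c_type in C_TYPES.items()}
for name, (dst_row, dst_col) in c_sets.items():
    print(f"{name}: dst_row_axises={dst_row} dst_col_axises={dst_col}")
```

  • Only the instructions whose output type is [0,1] can place k, which is why the *_dimh variants end with dst_col_axises {(k,1)} and the others end with both sets empty.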
  • the method for identifying the vector instruction further includes:
  • Step S303 judging the validity of the vector instruction according to the vector instruction dimension validity rule, wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is recognized successfully.
  • the vector instruction dimension validity rule is used to further determine which vector instructions are valid instructions.
  • step S303 includes:
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the parameters; or,
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the output parameters and the dimension and step size of the loop iteration variable in the loop iteration variable set of the input parameters.
  • the dimension validity rules of vector instructions are different.
  • mapping loop iteration variables of row dimension and column dimension have no constraints
  • the row dimension mapping loop iteration variable has no constraints, and the column dimension must have the mapping loop iteration variable;
  • the row dimension must have a mapping loop iteration variable, and the column dimension mapping loop iteration variable has no constraints;
  • the row dimension mapping loop iteration variable has no constraints, and the column dimension must have the mapping loop iteration variable.
  • the validity judgment rules for the row dimension/column dimension in the input parameters also include:
  • the loop iteration variable used in the input parameter to be processed by reduce must be successfully mapped to either the row dimension or the column dimension.
  • for the dimension to which the loop iteration variable processed by reduce is mapped in the input parameter, the same dimension in the output parameter cannot map a loop iteration variable.
  • veadd_mv_dimw is an invalid vector instruction, because B is m*1 type data, whose validity rule is: the row dimension must have a mapped loop iteration variable, and the column dimension has no constraint on mapped loop iteration variables. The row dimension in the set of B has no mapped loop iteration variable, so veadd_mv_dimw is invalid.
  • the other vector instructions conform to the validity rules, so they are all valid.
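  • This validity check can be sketched as below, applied to the sets obtained for input parameter B in the first veadd example (axes {(i,10),(j,1)}). The sketch rests on assumptions: the function name `shape_valid` is hypothetical, type [1,0] is treated as m*1 data as stated above, type [0,1] is assumed to be the type whose column dimension must be mapped, and all other types are assumed to be unconstrained. Only B is checked, because C and A are of type [max,1] in all four instructions.

```python
def shape_valid(param_type, row_axes, col_axes):
    """Check one parameter's mapping result against its dimension rule."""
    if param_type == (1, 0):
        # m*1 data: the row dimension must have a mapped variable.
        return len(row_axes) > 0
    if param_type == (0, 1):
        # Assumed rule for [0,1]: the column dimension must be mapped.
        return len(col_axes) > 0
    # Assumption: no constraint for the remaining types.
    return True


# (type of B, src_row_axises of B, src_col_axises of B) per instruction.
B_RESULTS = {
    "veadd_mm": (("max", 1), [], [("j", 1)]),
    "veadd_mv_dimh": ((0, 1), [], [("j", 1)]),
    "veadd_mv_dimw": ((1, 0), [], []),
    "veadd_mf": ((0, 0), [], []),
}

validity = {name: shape_valid(*entry) for name, entry in B_RESULTS.items()}
for name, ok in validity.items():
    print(name, "valid" if ok else "invalid")
```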
  • veadd_mv_dimw is an invalid vector instruction
  • B is m*1 type data
  • its validity rule is: the row dimension must have a mapped loop iteration variable, and the column dimension has no constraint on mapping loop iteration variables.
  • the set of src_row_axises of B is: {}
  • the set of src_col_axises is: {}, that is, the row dimension of B does not map a loop iteration variable, so it does not meet the validity rule
  • veadd_mv_dimw is invalid.
  • veemacc_mm_dimw is an invalid instruction, because C is m*1 type data, whose validity rule is: the row dimension must have a mapped loop iteration variable, and the column dimension has no constraint on mapping loop iteration variables.
  • the set of dst_row_axises of C is: {}, and the set of dst_col_axises is: {}. Because the row dimension of C has no mapped loop iteration variable, the validity rule is not met, so veemacc_mm_dimw is invalid.
  • veemacc_mv_dimw is an invalid instruction, because C is m*1 type data, whose validity rule is: the row dimension must have a mapped loop iteration variable, and the column dimension has no constraint on mapping loop iteration variables.
  • the set of dst_row_axises of C is: {}
  • the set of dst_col_axises is: {}. Because the row dimension of C has no mapped loop iteration variable, the validity rule is not met, so veemacc_mv_dimw is invalid.
  • veemacc_mf_dimh is an invalid instruction, because B uses the loop iteration variable i processed by reduce, and the validity rule is: the loop iteration variable processed by reduce that is used in the input parameter must be successfully mapped to the row dimension or the column dimension.
  • the set of src_row_axises of B is: {}
  • the set of src_col_axises is: {}
  • neither the row nor the col of B has a mapped axis, which does not meet the validity rule, so veemacc_mf_dimh is invalid.
  • veemacc_mf_dimw is an invalid instruction, because C is m*1 type data, whose validity rule is: the row dimension must have a mapped loop iteration variable, and the column dimension has no constraint on mapping loop iteration variables.
  • the set of dst_row_axises of C is: {}
  • the set of dst_col_axises is: {}
  • the row dimension of C does not map a loop iteration variable
  • B uses the loop iteration variable i processed by reduce
  • its validity rule is: the loop iteration variable processed by reduce that is used in the input parameter must be successfully mapped to either the row dimension or the column dimension.
  • the set of src_row_axises of B is: {}
  • the set of src_col_axises is: {}
  • neither the row nor the col of B has a mapped axis, which does not meet the validity rule, so veemacc_mf_dimw is an invalid instruction.
  • veadd_mm, veadd_mv_dimh, and veadd_mf are vector instructions that are successfully identified, that is, the scalar instruction veadd can be converted into any one of veadd_mm, veadd_mv_dimh, and veadd_mf;
  • veemacc_mm, veemacc_mm_dimh, and veemacc_mv_dimh are vector instructions that are successfully identified, that is, the scalar instruction veemacc can be converted into any one of veemacc_mm, veemacc_mm_dimh, and veemacc_mv_dimh.
  • scalar instructions can be converted into corresponding vector instructions, thereby improving the efficiency of data processing by the hardware chip.
  • the vector instructions supported by the above identification method can be expanded.
  • An embodiment of the present disclosure discloses a method for identifying a vector instruction.
  • the method for identifying a vector instruction includes: acquiring a scalar instruction; mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction and the instruction configuration table includes a vector instruction set; judging the validity of the vector instruction according to a vector instruction dimension validity rule, wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; and, if the vector instruction is valid, the vector instruction is identified successfully.
  • the above method identifies vector instructions through the vector instruction dimension mapping rule, the instruction configuration table and the vector instruction dimension validity rule, and solves the technical problems in the prior art that converting scalar instructions into vector instructions is complex, lacks generality, and cannot distinguish between vector instructions.
  • An embodiment of the present disclosure provides an apparatus for identifying vector instructions, including: an instruction acquisition module for acquiring a scalar instruction; a mapping module for mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction and the instruction configuration table includes a vector instruction set; and an identification verification module for judging the validity of the vector instruction according to a vector instruction dimension validity rule, wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is successfully identified.
  • vector instruction dimension mapping rule includes:
  • when the data type of the parameter is 1*1 type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter does not map a loop iteration variable;
  • when the data type of the parameter is 1*n type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter maps a loop iteration variable with a step size of 1;
  • when the data type of the parameter is m*1 type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter does not map a loop iteration variable;
  • when the data type of the parameter is m*n type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter maps a loop iteration variable with a step size of 1.
  • vector instruction dimension validity rule includes:
  • when the data type of the parameter is 1*n type data, the parameter has no constraint on mapping loop iteration variables in the row dimension, and the parameter has a mapped loop iteration variable in the column dimension;
  • when the data type of the parameter is m*1 type data, the parameter has a mapped loop iteration variable in the row dimension, and the parameter has no constraint on mapping loop iteration variables in the column dimension;
  • when the data type of the parameter is m*n type data, the parameter has no constraint on mapping loop iteration variables in the row dimension, and the parameter has a mapped loop iteration variable in the column dimension;
  • when the vector instruction is an instruction that eliminates a loop iteration variable, the parameter has a mapped loop iteration variable in the row dimension or the parameter has a mapped loop iteration variable in the column dimension.
  • instruction configuration table includes:
  • the data type of a parameter in the vector instruction is expressed by the step size constraint of that data type; wherein,
  • for 1*1 type data, the step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 0;
  • for 1*n type data, the step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 1;
  • for m*1 type data, the step size constraint is that the step size in the row dimension is 1 and the step size in the column dimension is 0;
  • for m*n type data, the step size constraint is that the step size in the row dimension is max and the step size in the column dimension is 1; wherein max represents the maximum value that the register can represent.
  • mapping module is also used for:
  • the loop iteration variable is mapped to the parameter of the vector instruction according to the data type of the parameter of the vector instruction.
  • mapping module is also used for:
  • the selected loop iteration variable is respectively mapped to the row dimension and the column dimension of the parameter according to the vector instruction dimension mapping rule to obtain a loop iteration variable set of the parameter of the vector instruction.
  • mapping module is also used to:
  • the loop iteration variable used by the selected output parameter is respectively mapped to the row dimension and the column dimension of the output parameter according to the vector instruction dimension mapping rule of the output parameter to obtain the loop iteration variable set of the output parameter of the vector instruction;
  • the loop iteration variable used by the selected input parameter is respectively mapped to the row dimension and the column dimension of the input parameter according to the vector instruction dimension mapping rule of the input parameter to obtain the loop iteration variable set of the input parameter of the vector instruction.
  • mapping module is also used to:
  • the loop iteration variable used by the selected input parameter is respectively mapped to the row dimension and the column dimension of the input parameter according to the vector instruction dimension mapping rule of the input parameter to obtain the loop iteration variable set of the input parameter of the vector instruction;
  • the loop iteration variable used by the selected output parameter is respectively mapped to the row dimension and the column dimension of the output parameter according to the vector instruction dimension mapping rule of the output parameter to obtain the loop iteration variable set of the output parameter of the vector instruction.
  • identification verification module is also used for:
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the parameters; or,
  • the validity of the vector instruction is determined according to the dimension and step size of the loop iteration variable in the loop iteration variable set of the output parameters and the dimension and step size of the loop iteration variable in the loop iteration variable set of the input parameters.
  • Embodiments of the present disclosure also provide an electronic device, including: a memory for storing computer-readable instructions; and one or more processors for executing the computer-readable instructions, so that the processors, when running them, implement any of the methods described in the foregoing embodiments.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute any one of the methods in the foregoing embodiments.
  • Embodiments of the present disclosure also provide a computer program product, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can perform any one of the methods in the foregoing embodiments.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

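The present disclosure relates to a method and apparatus for identifying a vector instruction, an electronic device, and a computer-readable storage medium. The method includes: acquiring a scalar instruction; mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, where the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction and the instruction configuration table includes a vector instruction set; judging the validity of the vector instruction according to a vector instruction dimension validity rule, where the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; and, if the vector instruction is valid, determining that the vector instruction is successfully identified. By identifying vector instructions through the vector instruction dimension mapping rule, the instruction configuration table, and the vector instruction dimension validity rule, the method solves the technical problems that converting scalar instructions into vector instructions is complex, lacks generality, and cannot distinguish between vector instructions.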

Description

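Method and apparatus for identifying vector instructions, electronic device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202110357565.5, filed on April 1, 2021 and entitled "Method and apparatus for identifying vector instructions, electronic device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of program compilation and processors, and in particular to a method and apparatus for identifying vector instructions, an electronic device, and a computer-readable storage medium.
Background
Dedicated or vector processors, such as GPUs (Graphics Processing Units) and NPUs (neural-network processing units), have become one of the main ways for modern computers to improve performance. Once a dedicated or vector processor is provided, scalar software code needs to be converted into vectorized code or instructions. The conversion is generally implemented in one of two ways. In the first, the compiler automatically vectorizes the scalar program and generates vector instructions; for example, GCC (GNU Compiler Collection) and LLVM (Low Level Virtual Machine) both automatically vectorize some simple loops. The automatic vectorization of a general-purpose compiler can only handle simple vector instructions such as addition and subtraction, and only in simple scenarios. In the second, an explicit programming model is used, mainly through compiler intrinsics or inline assembly. This approach places high demands on programmers, who must be familiar with the vectorized programming interface, and the resulting code has poor portability.
In the field of deep learning, dedicated or vector processors provide different forms of a vector instruction according to the types of its input and output parameters. In current implementations that use vector instructions to speed up code, the compiler toolchain first optimizes a custom scalar-form IR (intermediate representation). After optimization, the scalar-form IR is converted into vector instructions, which are executed on the dedicated or vector processor. When the scalar-form IR is converted into vector instructions, the convertible vector instruction with the highest performance must be selected from the different forms of vector instructions. An indispensable step in this conversion is vector instruction identification, that is, identifying the vector instructions into which the scalar-form IR can be successfully converted.
Existing schemes for converting a custom scalar-form IR into vector instructions are as follows:
Scheme 1: logical judgments are made on the loop variables to be vectorized according to the constraints on each dimension of the input and output parameters of a specific vector instruction. If the requirements on the dimensions of the instruction parameters are met, the mapping succeeds; otherwise it fails. The drawback of Scheme 1 is that it is implemented through logical judgments and the flow is complex: branch statements must be written and analyzed for the constraints on each dimension of the input and output parameters of every vector instruction, so the scheme lacks generality.
Scheme 2: a Tree-Pattern IR description is built from the scalar-form IR, and the scalar-form IR that can be vectorized is confirmed through Tree-Pattern similarity matching. At present the scheme supports only vector instructions such as vector addition and vector subtraction. Moreover, when the input and output parameters of a vector addition instruction are of different types, the scheme cannot distinguish between the different forms of the vector addition instruction.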
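Summary
This Summary is provided to introduce, in a simplified form, concepts that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
To solve the above technical problems in the prior art, embodiments of the present disclosure propose the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides a method for identifying a vector instruction, including:
acquiring a scalar instruction;
mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction, and the instruction configuration table includes a vector instruction set;
judging the validity of the vector instruction according to a vector instruction dimension validity rule; wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result;
if the vector instruction is valid, the vector instruction is identified successfully.
Further, the vector instruction dimension mapping rule includes:
the data type of a parameter in the vector instruction;
a rule for mapping a loop iteration variable in the scalar instruction to the parameter in the vector instruction.
Further, when the data type of the parameter is 1*1 type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter does not map a loop iteration variable;
when the data type of the parameter is 1*n type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter maps a loop iteration variable with a step size of 1;
when the data type of the parameter is m*1 type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter does not map a loop iteration variable;
when the data type of the parameter is m*n type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter maps a loop iteration variable with a step size of 1.
Further, the vector instruction dimension validity rule includes:
when the data type of the parameter is 1*1 type data, the parameter has no constraint on mapping loop iteration variables in the row dimension or the column dimension;
when the data type of the parameter is 1*n type data, the parameter has no constraint on mapping loop iteration variables in the row dimension, and the parameter has a mapped loop iteration variable in the column dimension;
when the data type of the parameter is m*1 type data, the parameter has a mapped loop iteration variable in the row dimension, and the parameter has no constraint on mapping loop iteration variables in the column dimension;
when the data type of the parameter is m*n type data, the parameter has no constraint on mapping loop iteration variables in the row dimension, and the parameter has a mapped loop iteration variable in the column dimension;
when the vector instruction is an instruction that eliminates a loop iteration variable, the parameter has a mapped loop iteration variable in the row dimension or the parameter has a mapped loop iteration variable in the column dimension.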
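The validity rules above can be sketched as a small predicate. This is only an illustrative sketch: the string labels for the data types and the function names `is_param_valid` and `is_reduce_param_valid` are assumptions made for the example, not names used by the disclosure.

```python
# Illustrative sketch of the per-parameter validity rules.
# Data type labels and function names are assumptions for this example.

def is_param_valid(dtype, row_axes, col_axes):
    """Return True if the mapped sets satisfy the rule for this data type."""
    if dtype == "1*1":
        return True                  # no constraint on either dimension
    if dtype in ("1*n", "m*n"):
        return len(col_axes) > 0     # column dimension must have a mapped variable
    if dtype == "m*1":
        return len(row_axes) > 0     # row dimension must have a mapped variable
    raise ValueError(f"unknown data type: {dtype}")


def is_reduce_param_valid(row_axes, col_axes):
    """Rule for instructions that eliminate a loop iteration variable:
    a variable must be mapped in the row dimension or the column dimension."""
    return bool(row_axes) or bool(col_axes)
```

Each set is modelled as a list of `(variable, step size)` pairs, so an empty list stands for a dimension with no mapped loop iteration variable.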
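Further, the instruction configuration table includes:
all vector instructions to be identified and the data type of the parameters in each of the vector instructions.
Further, the data type of a parameter in the vector instruction is expressed by the step size constraint of that data type; wherein,
when the data type is 1*1 type data, its step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 0;
when the data type is 1*n type data, its step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 1;
when the data type is m*1 type data, its step size constraint is that the step size in the row dimension is 1 and the step size in the column dimension is 0;
when the data type is m*n type data, its step size constraint is that the step size in the row dimension is max and the step size in the column dimension is 1; wherein max represents the maximum value that the register can represent.
Further, mapping the scalar instruction to a vector instruction according to the vector instruction dimension mapping rule and the instruction configuration table includes:
acquiring a vector instruction in the instruction configuration table;
determining the data type of the parameter of the vector instruction;
acquiring the loop iteration variable in the scalar instruction;
mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction.
Further, mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction includes:
generating a loop iteration variable set according to the loop iteration variables in the scalar instruction and the step sizes of the loop iteration variables;
obtaining, according to the data type of the parameter of the vector instruction, the vector instruction dimension mapping rule corresponding to that data type;
selecting, from the loop iteration variable set, the loop iteration variables used by the parameter of the vector instruction;
mapping the selected loop iteration variables to the row dimension and the column dimension of the parameter respectively according to the vector instruction dimension mapping rule, to obtain the loop iteration variable set of the parameter of the vector instruction.
Further, when the vector instruction is an instruction that does not eliminate a loop iteration variable, mapping the loop iteration variable to the parameter of the vector instruction according to the vector instruction dimension mapping rule includes:
generating a loop iteration variable set according to the loop iteration variables in the scalar instruction and the step sizes of the loop iteration variables;
obtaining, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
selecting, from the loop iteration variable set, the loop iteration variables used by the output parameter;
mapping the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter respectively according to the vector instruction dimension mapping rule of the output parameter, to obtain the loop iteration variable set of the output parameter of the vector instruction;
obtaining, according to the data type of the input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
selecting, from the loop iteration variable set of the output parameter, the loop iteration variables used by the input parameter;
mapping the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter respectively according to the vector instruction dimension mapping rule of the input parameter, to obtain the loop iteration variable set of the input parameter of the vector instruction.
Further, when the vector instruction is an instruction that eliminates a loop iteration variable, mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction includes:
generating a loop iteration variable set according to the loop iteration variables in the scalar instruction and the step sizes of the loop iteration variables;
obtaining, according to the data type of the input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
selecting, from the loop iteration variable set, the loop iteration variables used by the input parameter;
mapping the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter respectively according to the vector instruction dimension mapping rule of the input parameter, to obtain the loop iteration variable set of the input parameter of the vector instruction;
merging the loop iteration variable sets of all input parameters to obtain a joint loop iteration variable set of the input parameters;
obtaining, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
selecting, from the joint loop iteration variable set of the input parameters, the loop iteration variables used by the output parameter;
mapping the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter respectively according to the vector instruction dimension mapping rule of the output parameter, to obtain the loop iteration variable set of the output parameter of the vector instruction.
Further, judging the validity of the vector instruction according to the vector instruction dimension validity rule includes:
determining the validity of the vector instruction according to the dimension and step size of the loop iteration variables in the loop iteration variable set of the parameter; or,
determining the validity of the vector instruction according to the dimension and step size of the loop iteration variables in the loop iteration variable set of the output parameter and the dimension and step size of the loop iteration variables in the loop iteration variable set of the input parameter.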
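In a second aspect, an embodiment of the present disclosure provides an apparatus for identifying a vector instruction, including:
an instruction acquisition module, configured to acquire a scalar instruction;
a mapping module, configured to map the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction, and the instruction configuration table includes a vector instruction set;
an identification verification module, configured to judge the validity of the vector instruction according to a vector instruction dimension validity rule; wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is identified successfully.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions, so that the processors, when running, implement the method of any item of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the method of any item of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the method of any item of the first aspect.
Embodiments of the present disclosure disclose a method and apparatus for identifying a vector instruction, an electronic device, and a computer-readable storage medium. The method for identifying a vector instruction includes: acquiring a scalar instruction; mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction and the instruction configuration table includes a vector instruction set; judging the validity of the vector instruction according to a vector instruction dimension validity rule, wherein the vector instruction dimension validity rule includes a rule for verifying the validity of the instruction mapping result; and, if the vector instruction is valid, the vector instruction is identified successfully. The above method identifies vector instructions through the vector instruction dimension mapping rule, the instruction configuration table, and the vector instruction dimension validity rule, and solves the technical problems in the prior art that converting scalar instructions into vector instructions is complex, lacks generality, and cannot distinguish between vector instructions.
The above description is only an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the present disclosure are more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.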
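Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of an example of code in scalar form;
FIGS. 2a-2d are schematic diagrams of the data types of parameters in vector instructions provided by embodiments of the present disclosure;
FIG. 3 is a schematic flowchart of the method for identifying a vector instruction provided by an embodiment of the present disclosure;
FIGS. 4a-4r are schematic diagrams of the types of input parameters and output parameters in various vector instructions;
FIG. 5 is a further schematic flowchart of the method for identifying a vector instruction provided by an embodiment of the present disclosure;
FIG. 6a is a schematic diagram of an example of a scalar code block used in an embodiment of the present disclosure;
FIG. 6b is a schematic diagram of an example of an instruction configuration table used in an embodiment of the present disclosure;
FIG. 7 is a further schematic flowchart of the method for identifying a vector instruction provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an example of a scalar code block used in another embodiment of the present disclosure;
FIG. 9 is a further schematic flowchart of the method for identifying a vector instruction provided by an embodiment of the present disclosure;
FIG. 10a is a schematic diagram of an example of a scalar code block used in another embodiment of the present disclosure;
FIG. 10b is a schematic diagram of an example of an instruction configuration table used in another embodiment of the present disclosure.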
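Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and its variants as used herein are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
FIG. 1 is an example of code in scalar form that needs to be converted into vector instructions in the present disclosure. As shown in FIG. 1, to convert the scalar-form for-loop statement block into the corresponding vector instruction, the loop iteration variables i and j in the for loop must be mapped onto the dimensions corresponding to the input parameters A and B and the output parameter C of the vector instruction, so as to vectorize the loop iteration variables.
FIGS. 2a-2d are schematic diagrams of the data types of parameters in vector instructions provided by embodiments of the present disclosure. Vector instructions support the following four data types for input parameters and output parameters: the parameter type shown in FIG. 2a is a scalar/vector type, which can be understood as a matrix of shape [1,1] ([row,col]); the parameter type shown in FIG. 2b is a vector type, which can be understood as a matrix of shape [1,n] ([row,col]); the parameter type shown in FIG. 2c is a vector type, which can be understood as a matrix of shape [m,1] ([row,col]); the parameter type shown in FIG. 2d is a vector type, which can be understood as a matrix of shape [m,n] ([row,col]); where row denotes the row dimension, col denotes the column dimension, m>1, and n>1.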
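FIG. 3 is a schematic flowchart of the method for identifying a vector instruction provided by an embodiment of the present disclosure.
The method for identifying a vector instruction is used to identify the vector instructions into which a scalar instruction can be mapped. The method may be executed by an apparatus for identifying a vector instruction, which may be implemented as software, as hardware, or as a combination of software and hardware, and which may be integrated in a device of a vector instruction identification system, for example in a vector instruction identification terminal device.
As shown in FIG. 3, the method includes the following steps:
Step S301: acquire a scalar instruction;
The scalar instruction includes a statement block in scalar form, and the scalar instruction includes loop iteration variables, such as the for-loop statement block shown in FIG. 1, which uses the loop iteration variables i and j.
Returning to FIG. 3, the method for identifying a vector instruction further includes:
Step S302: map the scalar instruction to a vector instruction according to the vector instruction dimension mapping rule and the instruction configuration table; wherein the vector instruction dimension mapping rule includes a rule for mapping a scalar instruction to a vector instruction, and the instruction configuration table includes a vector instruction set.
The vector instruction dimension mapping rule includes: the data type of a parameter in the vector instruction; and a rule for mapping a loop iteration variable in the scalar instruction to the parameter in the vector instruction.
Specifically, the vector instruction dimension mapping rule includes the data types of the parameters in the vector instruction and, for each data type of the parameter, the rule for mapping scalar loop variables to the row dimension and/or column dimension of the parameter of the vector instruction.
Optionally, the vector instruction dimension mapping rule includes:
when the data type of the parameter is 1*1 type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter does not map a loop iteration variable;
when the data type of the parameter is 1*n type data, the mapping rule is that the row dimension of the parameter does not map a loop iteration variable and the column dimension of the parameter maps a loop iteration variable with a step size of 1;
when the data type of the parameter is m*1 type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter does not map a loop iteration variable;
when the data type of the parameter is m*n type data, the mapping rule is that the row dimension of the parameter maps a loop iteration variable whose step size is not 1 and the column dimension of the parameter maps a loop iteration variable with a step size of 1.
That is, when the type of the parameter of the vector instruction is [1,1] type data as shown in FIG. 2a, neither the row dimension nor the column dimension of the parameter maps a scalar loop iteration variable.
When the type of the parameter of the vector instruction is [1,n] type data as shown in FIG. 2b, the row dimension of the parameter does not map a scalar loop iteration variable, and the column dimension of the parameter maps a scalar loop iteration variable with a step size of 1. In the scalar instruction shown in FIG. 1, i is in the inner loop and increases by 1 on every iteration, so the step size of i is 1; j is in the outer loop and increases once only after i has iterated 30 times, so the step size of j is 30.
When the type of the parameter of the vector instruction is [m,1] type data as shown in FIG. 2c, the row dimension of the parameter maps a loop iteration variable whose step size is not 1, and the column dimension of the parameter does not map a loop iteration variable.
When the type of the parameter of the vector instruction is [m,n] type data as shown in FIG. 2d, the row dimension of the parameter maps a loop iteration variable whose step size is not 1, and the column dimension of the parameter maps a loop iteration variable with a step size of 1.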
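The four mapping rules can be condensed into one small function. The sketch below is illustrative only: the function name `map_axes`, the string labels for the data types, and the representation of a loop iteration variable as a `(name, step size)` pair are assumptions made for the example.

```python
# Illustrative sketch of the dimension mapping rules.
# `used` lists the (variable, step size) pairs that the parameter uses.

def map_axes(dtype, used):
    """Return (row_axes, col_axes) for one parameter of the given data type."""
    row_maps = dtype in ("m*1", "m*n")   # row maps variables whose step size is not 1
    col_maps = dtype in ("1*n", "m*n")   # column maps variables whose step size is 1
    row_axes = [(v, s) for v, s in used if row_maps and s != 1]
    col_axes = [(v, s) for v, s in used if col_maps and s == 1]
    return row_axes, col_axes


# Loop nest of FIG. 1: i has step size 1, j has step size 30.
print(map_axes("m*n", [("j", 30), ("i", 1)]))
```

For an m*n parameter that uses both variables of FIG. 1, j lands in the row dimension and i in the column dimension; for a 1*1 parameter both returned lists are empty.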
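The instruction configuration table includes: all vector instructions to be identified and the data type of the parameters in each vector instruction.
The data type of a parameter in the vector instruction is expressed by the step size constraint of that data type; wherein,
when the data type is 1*1 type data, its step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 0;
when the data type is 1*n type data, its step size constraint is that the step size in the row dimension is 0 and the step size in the column dimension is 1;
when the data type is m*1 type data, its step size constraint is that the step size in the row dimension is 1 and the step size in the column dimension is 0;
when the data type is m*n type data, its step size constraint is that the step size in the row dimension is max and the step size in the column dimension is 1; wherein max represents the maximum value that the register can represent.
The data type of a parameter in the vector instruction can thus be expressed by the step sizes of the row dimension and the column dimension. The data type shown in FIG. 2a has a step size of 0 in the row dimension and 0 in the column dimension, and can be expressed as RowTable:{0}, ColTable:{0}. The data type shown in FIG. 2b has a step size of 0 in the row dimension and 1 in the column dimension, and can be expressed as RowTable:{0}, ColTable:{1}. The data type shown in FIG. 2c has a step size of 1 in the row dimension and 0 in the column dimension, and can be expressed as RowTable:{1}, ColTable:{0}. The data type shown in FIG. 2d has a step size of max in the row dimension and 1 in the column dimension, and can be expressed as RowTable:{max}, ColTable:{1}.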
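As a rough illustration, the step size pairs above can be used as lookup keys to recover each parameter's data type from a `RowTable`/`ColTable` entry. The dictionary `DTYPE_BY_STEP`, the function `param_types`, and the use of the string `"max"` as a stand-in for the register maximum are assumptions made for this sketch.

```python
# Illustrative sketch: decode parameter data types from RowTable / ColTable.
MAX = "max"  # placeholder for the largest value the register can represent

DTYPE_BY_STEP = {
    (0, 0): "1*1",
    (0, 1): "1*n",
    (1, 0): "m*1",
    (MAX, 1): "m*n",
}


def param_types(row_table, col_table):
    """Pair the k-th row step with the k-th column step and look up the type."""
    return [DTYPE_BY_STEP[(r, c)] for r, c in zip(row_table, col_table)]


# RowTable:{max,1}, ColTable:{1,0} describes an m*n output and an m*1 input.
print(param_types([MAX, 1], [1, 0]))
```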
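FIGS. 4a-4r show the types of the input parameters and output parameters in various vector instructions.
For the instruction shown in FIG. 4a, the configuration in the instruction configuration table is:
RowTable:{max,max}    //dst,src
ColTable:{1,1}        //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter src is [RowTable[1],ColTable[1]], that is, [max,1].
For the instruction shown in FIG. 4b, the configuration in the instruction configuration table is:
RowTable:{max,1}      //dst,src
ColTable:{1,0}        //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [1,0].
For the instruction shown in FIG. 4c, the configuration in the instruction configuration table is:
RowTable:{max,0}      //dst,src
ColTable:{1,1}        //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [0,1].
For the instruction shown in FIG. 4d, the configuration in the instruction configuration table is:
RowTable:{max,0}      //dst,src
ColTable:{1,0}        //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [0,0].
For the reduce operation instruction shown in FIG. 4e, the configuration in the instruction configuration table is:
RowTable:{1,max}    //dst,src
ColTable:{0,1}      //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [1,0]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [max,1].
For the reduce operation instruction shown in FIG. 4f, the configuration in the instruction configuration table is:
RowTable:{0,max}    //dst,src
ColTable:{1,1}      //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,1]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [max,1].
For the reduce operation instruction shown in FIG. 4g, the configuration in the instruction configuration table is:
RowTable:{0,max}    //dst,src
ColTable:{0,1}      //dst,src
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,0]. The type of the input parameter is [RowTable[1],ColTable[1]], that is, [max,1].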
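For the instruction shown in FIG. 4h, the configuration in the instruction configuration table is:
RowTable:{max,max,max}    //dst,src1,src2
ColTable:{1,1,1}           //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [max,1].
For the instruction shown in FIG. 4i, the configuration in the instruction configuration table is:
RowTable:{max,max,1}      //dst,src1,src2
ColTable:{1,1,0}           //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [1,0].
For the instruction shown in FIG. 4j, the configuration in the instruction configuration table is:
RowTable:{max,max,0}     //dst,src1,src2
ColTable:{1,1,1}          //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [0,1].
For the instruction shown in FIG. 4k, the configuration in the instruction configuration table is:
RowTable:{max,max,0}     //dst,src1,src2
ColTable:{1,1,0}          //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [0,0].
For the reduce operation instruction shown in FIG. 4l, the configuration in the instruction configuration table is:
RowTable:{0,max,max}      //dst,src1,src2
ColTable:{0,1,1}           //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,0]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [max,1].
For the reduce operation instruction shown in FIG. 4m, the configuration in the instruction configuration table is:
RowTable:{0,max,max}      //dst,src1,src2
ColTable:{1,1,1}           //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [max,1].
For the reduce operation instruction shown in FIG. 4n, the configuration in the instruction configuration table is:
RowTable:{1,max,max}     //dst,src1,src2
ColTable:{0,1,1}            //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [1,0]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [max,1].
For the reduce operation instruction shown in FIG. 4o, the configuration in the instruction configuration table is:
RowTable:{0,max,1}       //dst,src1,src2
ColTable:{1,1,0}           //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [1,0].
For the reduce operation instruction shown in FIG. 4p, the configuration in the instruction configuration table is:
RowTable:{1,max,0}        //dst,src1,src2
ColTable:{0,1,1}          //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [1,0]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [0,1].
For the reduce operation instruction shown in FIG. 4q, the configuration in the instruction configuration table is:
RowTable:{0,max,0}       //dst,src1,src2
ColTable:{1,1,0}         //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [0,1]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [0,0].
For the reduce operation instruction shown in FIG. 4r, the configuration in the instruction configuration table is:
RowTable:{1,max,0}       //dst,src1,src2
ColTable:{0,1,0}         //dst,src1,src2
The type of the output parameter dst is [RowTable[0],ColTable[0]], that is, [1,0]. The type of the input parameter src1 is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter src2 is [RowTable[2],ColTable[2]], that is, [0,0].
Here, a reduce operation instruction is an operation instruction whose output parameter is reduced in the row dimension and/or the column dimension relative to its input parameters.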
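One possible way to read the reduce definition off a configuration entry is sketched below. It is an inference from the FIG. 4a-4r examples rather than a rule stated in the text: it assumes that a dimension is "reduced" when the output's step size in that dimension is 0 while at least one input's step size in the same dimension is non-zero. The function name `is_reduce` and the `"max"` placeholder are likewise assumptions.

```python
# Illustrative sketch (an inference from the figures, not a stated rule):
# entry 0 of each table is the output dst, the remaining entries are inputs.
MAX = "max"  # placeholder for the largest value the register can represent


def is_reduce(row_table, col_table):
    """Return True if dst is reduced in the row and/or column dimension."""
    row_reduced = row_table[0] == 0 and any(r != 0 for r in row_table[1:])
    col_reduced = col_table[0] == 0 and any(c != 0 for c in col_table[1:])
    return row_reduced or col_reduced


print(is_reduce([MAX, MAX], [1, 1]))  # FIG. 4a: dst and src are both m*n
print(is_reduce([1, MAX], [0, 1]))    # FIG. 4e: the column dimension is reduced
```

Under this assumption the entries of FIGS. 4a-4d and 4h-4k classify as non-reduce, and the entries of FIGS. 4e-4g and 4l-4r classify as reduce, which matches how the text labels them.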
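Optionally, step S302 includes:
acquiring a vector instruction in the instruction configuration table;
determining the data type of the parameter of the vector instruction;
acquiring the loop iteration variable in the scalar instruction;
mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction.
In the above steps, after the vector instructions in the instruction configuration table and the data types of their parameters are obtained, the loop iteration variables in the scalar instruction are mapped to the parameters of the vector instruction according to the vector instruction dimension mapping rule; a parameter of the vector instruction may be a scalar parameter or a vector parameter.
Optionally, mapping the loop iteration variable to the parameter of the vector instruction according to the data type of the parameter of the vector instruction includes:
Step S501: generate a loop iteration variable set according to the loop iteration variables in the scalar instruction and the step sizes of the loop iteration variables;
Step S502: obtain, according to the data type of the parameter of the vector instruction, the vector instruction dimension mapping rule corresponding to that data type;
Step S503: select, from the loop iteration variable set, the loop iteration variables used by the parameter of the vector instruction;
Step S504: map the selected loop iteration variables to the row dimension and the column dimension of the parameter respectively according to the vector instruction dimension mapping rule, to obtain the loop iteration variable set of the parameter of the vector instruction.
The above optional embodiment does not distinguish whether the vector instruction is a non-reduce instruction or a reduce instruction.
In step S501, the loop iteration variable set is generated from the acquired loop iteration variables.
FIG. 6a shows an example of a scalar code block. It includes the instruction veadd, whose loop iteration variables are i and j, where the step size of j is 1 and the step size of i is 10; the loop iteration variable set generated from the acquired loop iteration variables is therefore axises:{(i,10),(j,1)}.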
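Step S501 can be sketched as follows. The text only gives two-level examples (an inner variable with step size 1 and an outer variable whose step size equals the inner trip count), so treating the step size of a variable as the product of the trip counts of all loops nested inside it is an assumption of this sketch, as are the function name `build_axises` and the trip counts 4 and 8 used for the outermost loops.

```python
# Illustrative sketch of step S501: derive (variable, step size) pairs
# from a loop nest given outermost-first as (name, trip_count) pairs.

def build_axises(loops):
    axises = []
    step = 1
    for name, trip_count in reversed(loops):   # walk from the innermost loop outward
        axises.append((name, step))
        step *= trip_count                     # outer variable advances once per inner sweep
    axises.reverse()                           # restore outermost-first order
    return axises


# A nest like FIG. 6a: j is the inner loop with 10 iterations (assumed),
# the outer trip count 4 is hypothetical.
print(build_axises([("i", 4), ("j", 10)]))
```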
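In steps S502 to S504, for each vector instruction in the instruction configuration table, the vector instruction dimension mapping rule corresponding to the parameter type is obtained according to the type of its parameter, and the loop iteration variables used by the parameter of the vector instruction are selected from the loop iteration variable set. The selected loop iteration variables are then mapped, according to the vector instruction dimension mapping rule, into the loop iteration variable set of the parameter of the vector instruction, which includes the loop iteration variable set of the parameter in the row dimension and the loop iteration variable set of the parameter in the column dimension.
The instruction configuration table shown in FIG. 6b includes the vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw, and veadd_mf.
For the vector instruction veadd_mm:
veadd_mm has 2 inputs and 1 output, and its parameter types in the instruction configuration table are:
RowTable:{max,max,max}       //C,A,B
ColTable:{1,1,1}             //C,A,B
The type of the output parameter C is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter A is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter B is [RowTable[2],ColTable[2]], that is, [max,1].
It follows that the mapping rule for C, A, and B is:
the row dimension maps a loop iteration variable whose step size is not 1, and the column dimension maps a loop iteration variable with a step size of 1.
According to the set axises, the loop iteration variables of output C are mapped. C uses the loop iteration variables i and j. The stride of i is 10, which is not 1, so i can be mapped to the row of C. The stride of j is 1, so j can be mapped to the col of C. The dst_row_axises set of C is {(i,10)}, and the dst_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input A are mapped. A uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of A. The stride of j is 1, so j can be mapped to the col of A. The src_row_axises set of A is {(i,10)}, and the src_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input B are mapped. B uses the loop iteration variable j. The stride of j is 1, so j can be mapped to the col of B, but it cannot be mapped to the row of B, because the row of B can only map loop iteration variables whose stride is not 1. The src_row_axises set of B is {}, and the src_col_axises set is {(j,1)}.
Similarly, for the vector instruction veadd_mv_dimh:
veadd_mv_dimh has 2 inputs and 1 output, and its parameter types in the instruction configuration table are:
RowTable:{max,max,0}       //C,A,B
ColTable:{1,1,1}           //C,A,B
The type of the output parameter C is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter A is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter B is [RowTable[2],ColTable[2]], that is, [0,1].
It follows that the mapping rule for C and A is:
the row dimension maps a loop iteration variable whose step size is not 1, and the column dimension maps a loop iteration variable with a step size of 1.
The mapping rule for B is:
the row dimension does not map a loop iteration variable, and the column dimension maps a loop iteration variable with a step size of 1.
According to the set axises, the loop iteration variables of output C are mapped. C uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of C. The stride of j is 1, so j can be mapped to the col of C. The dst_row_axises set of C is {(i,10)}, and the dst_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input A are mapped. A uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of A. The stride of j is 1, so j can be mapped to the col of A. The src_row_axises set of A is {(i,10)}, and the src_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input B are mapped. B uses the loop iteration variable j. Under the above mapping rule, the stride of j is 1, so j can be mapped to the col of B, while the row of B does not map a loop iteration variable. The src_row_axises set of B is {}, and the src_col_axises set is {(j,1)}.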
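For the vector instruction veadd_mv_dimw:
veadd_mv_dimw has 2 inputs and 1 output, and its parameter types in the instruction configuration table are:
RowTable:{max,max,1}       //C,A,B
ColTable:{1,1,0}           //C,A,B
The type of the output parameter C is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter A is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter B is [RowTable[2],ColTable[2]], that is, [1,0].
It follows that the mapping rule for C and A is:
the row dimension maps a loop iteration variable whose step size is not 1, and the column dimension maps a loop iteration variable with a step size of 1.
The mapping rule for B is:
the row dimension maps a loop iteration variable whose step size is not 1, and the column dimension does not map a loop iteration variable.
According to the set axises, the loop iteration variables of output C are mapped. C uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of C. The stride of j is 1, so j can be mapped to the col of C. The dst_row_axises set of C is {(i,10)}, and the dst_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input A are mapped. A uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of A. The stride of j is 1, so j can be mapped to the col of A. The src_row_axises set of A is {(i,10)}, and the src_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input B are mapped. B uses the loop iteration variable j. The stride of j is 1, so j cannot be mapped to the row of B; and under the mapping rule, the col dimension of B does not map a loop iteration variable. Therefore, the src_row_axises set of B is {}, and the src_col_axises set is {}.
For the vector instruction veadd_mf:
veadd_mf has 2 inputs and 1 output, and its parameter types in the instruction configuration table are:
RowTable:{max,max,0}       //C,A,B
ColTable:{1,1,0}           //C,A,B
The type of the output parameter C is [RowTable[0],ColTable[0]], that is, [max,1]. The type of the input parameter A is [RowTable[1],ColTable[1]], that is, [max,1]. The type of the input parameter B is [RowTable[2],ColTable[2]], that is, [0,0].
It follows that the mapping rule for C and A is:
the row dimension maps a loop iteration variable whose step size is not 1, and the column dimension maps a loop iteration variable with a step size of 1.
The mapping rule for B is:
the row dimension does not map a loop iteration variable, and the column dimension does not map a loop iteration variable.
According to the set axises, the loop iteration variables of output C are mapped. C uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of C. The stride of j is 1, so j can be mapped to the col of C. The dst_row_axises set of C is {(i,10)}, and the dst_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input A are mapped. A uses the loop iteration variables i and j. The stride of i is 10, so i can be mapped to the row of A. The stride of j is 1, so j can be mapped to the col of A. The src_row_axises set of A is {(i,10)}, and the src_col_axises set is {(j,1)}.
According to the set axises, the loop iteration variables of input B are mapped. B uses the loop iteration variable j. The stride of j is 1, so j cannot be mapped to the row of B, and the col dimension of B does not map a loop iteration variable. The src_row_axises set of B is {}, and the src_col_axises set is {}.
Through steps S501 to S504 above, the scalar instruction is mapped to vector instructions. Because it cannot be determined before the mapping is complete which vector instructions in the vector instruction configuration table can perform the operation in the scalar instruction, step S302 must map the scalar instruction to every vector instruction in the vector instruction configuration table. In the above example, the scalar instruction veadd is mapped to the vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw, and veadd_mf respectively.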
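The walk-through for the FIG. 6a/6b example can be reproduced with a short end-to-end sketch that maps every parameter of every candidate instruction and then applies the dimension validity rules. The names `CONFIG`, `map_axes`, `is_param_valid`, and `identify`, the `"max"` placeholder, and the encoding of which variables each parameter uses are assumptions made for this example; the table values and the variable usage (C and A use i and j, B uses j) are taken from the text.

```python
# Illustrative end-to-end sketch for the veadd example (not the patented implementation).
MAX = "max"  # placeholder for the largest value the register can represent
DTYPE_BY_STEP = {(0, 0): "1*1", (0, 1): "1*n", (1, 0): "m*1", (MAX, 1): "m*n"}

# Instruction configuration table; entries are ordered C, A, B.
CONFIG = {
    "veadd_mm":      ([MAX, MAX, MAX], [1, 1, 1]),
    "veadd_mv_dimh": ([MAX, MAX, 0],   [1, 1, 1]),
    "veadd_mv_dimw": ([MAX, MAX, 1],   [1, 1, 0]),
    "veadd_mf":      ([MAX, MAX, 0],   [1, 1, 0]),
}


def map_axes(dtype, used):
    """Mapping rule: row takes step sizes != 1, column takes step size 1."""
    row = [(v, s) for v, s in used if dtype in ("m*1", "m*n") and s != 1]
    col = [(v, s) for v, s in used if dtype in ("1*n", "m*n") and s == 1]
    return row, col


def is_param_valid(dtype, row, col):
    """Validity rule per data type."""
    if dtype == "1*1":
        return True
    if dtype == "m*1":
        return bool(row)
    return bool(col)  # 1*n and m*n need a mapped column variable


def identify(config, axises, uses):
    """Return the names of the vector instructions whose mapping is valid."""
    identified = []
    for name, (row_table, col_table) in config.items():
        valid = True
        for k, names in enumerate(uses):
            dtype = DTYPE_BY_STEP[(row_table[k], col_table[k])]
            used = [(v, s) for v, s in axises if v in names]
            row, col = map_axes(dtype, used)
            valid = valid and is_param_valid(dtype, row, col)
        if valid:
            identified.append(name)
    return identified


axises = [("i", 10), ("j", 1)]
uses = [{"i", "j"}, {"i", "j"}, {"j"}]  # variables used by C, A, B
print(identify(CONFIG, axises, uses))
```

In this sketch veadd_mv_dimw is rejected because B is m*1 type data whose row set comes out empty, while the other three candidates pass, which agrees with the sets derived in the walk-through.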
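Optionally, in a specific implementation, step S302 may further distinguish non-reduce instructions from reduce instructions, with the two types of instruction using different mapping procedures.
Optionally, for a non-reduce instruction, step S302 comprises:
Step S701: generate a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
Step S702: obtain, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
Step S703: select from the loop iteration variable set the loop iteration variables used by the output parameter;
Step S704: map the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction;
Step S705: obtain, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
Step S706: select from the loop iteration variable set of the output parameter the loop iteration variables used by the input parameter;
Step S707: map the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction.
In steps S701 to S707, loop iteration variables are first mapped onto the output parameter; the resulting loop iteration variable set of the output parameter then serves as the set from which the input parameters select loop iteration variables; finally, loop iteration variables are mapped onto the input parameters.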
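The ordering of steps S701 to S707 (output first, inputs chosen from the output's sets) can be sketched as below. The function names `map_axes` and `map_non_reduce`, the list-based encoding, and the `"max"` placeholder are assumptions made for illustration; the sample data is a single loop variable i with stride 1, and the table rows are those given for veadd_mm and veadd_mv_dimw.

```python
MAX = "max"  # placeholder for the "max" entry of RowTable (assumed encoding)


def map_axes(candidates, used, row_type, col_type):
    """Map the used loop variables onto row/column sets for one parameter."""
    row_axes, col_axes = [], []
    for name, stride in candidates:
        if name not in used:
            continue
        if stride != 1 and row_type != 0:
            row_axes.append((name, stride))
        elif stride == 1 and col_type != 0:
            col_axes.append((name, stride))
    return row_axes, col_axes


def map_non_reduce(axises, uses, row_table, col_table):
    """Non-reduce flow: index 0 is the output, the remaining indices are inputs."""
    # S702-S704: map the output parameter from the full variable set.
    dst_row, dst_col = map_axes(axises, uses[0], row_table[0], col_table[0])
    # S705-S707: each input selects only from what the output mapped.
    candidates = dst_row + dst_col
    srcs = [
        map_axes(candidates, uses[k], row_table[k], col_table[k])
        for k in range(1, len(uses))
    ]
    return (dst_row, dst_col), srcs


axises = [("i", 1)]                 # S701: one loop variable, stride 1
uses = [{"i"}, {"i"}, {"i"}]        # C, A, B all index with i

mm = map_non_reduce(axises, uses, (MAX, MAX, MAX), (1, 1, 1))    # veadd_mm
dimw = map_non_reduce(axises, uses, (MAX, MAX, 1), (1, 1, 0))    # veadd_mv_dimw
```

Because inputs only see variables the output already accepted, an input can never map a loop variable that the output dropped.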
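Figure 8 shows another example of a scalar code block. It contains the instruction veadd, whose loop iteration variables consist of i, with stride 1; the loop iteration variable set generated from the obtained loop iteration variables is therefore axises: {(i,1)}.
In this embodiment, the instruction configuration table shown in Figure 6b is again used; it contains the vector instructions veadd_mm, veadd_mv_dimh, veadd_mv_dimw and veadd_mf.
For the vector instruction veadd_mm:
veadd_mm has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{max,max,max}       //C,A,B
ColTable:{1,1,1}             //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [max,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [max,1].
The mapping rule for C, A and B is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
Based on the set axises, map the loop iteration variables of output C. C uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input A. A uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of A. For A, src_row_axises is {} and src_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input B. B uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of B. For B, src_row_axises is {} and src_col_axises is {(i,1)}.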
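Likewise, for the vector instruction veadd_mv_dimh:
veadd_mv_dimh has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{max,max,0}       //C,A,B
ColTable:{1,1,1}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [max,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [0,1].
The mapping rule for both C and A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
The row dimension maps no loop iteration variable, and the column dimension maps loop iteration variables whose stride is 1.
Based on the set axises, map the loop iteration variables of output C. C uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input A. A uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of A. For A, src_row_axises is {} and src_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input B. B uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of B. For B, src_row_axises is {} and src_col_axises is {(i,1)}.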
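For the vector instruction veadd_mv_dimw:
veadd_mv_dimw has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{max,max,1}       //C,A,B
ColTable:{1,1,0}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [max,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [1,0].
The mapping rule for both C and A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps no loop iteration variable.
Based on the set axises, map the loop iteration variables of output C. C uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input A. A uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of A. For A, src_row_axises is {} and src_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input B. B uses the loop iteration variable i. The stride of i is 1, so i cannot be mapped to the row or the col of B. For B, src_row_axises is {} and src_col_axises is {}.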
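For the vector instruction veadd_mf:
veadd_mf has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{max,max,0}       //C,A,B
ColTable:{1,1,0}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [max,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [0,0].
The mapping rule for both C and A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
Neither the row dimension nor the column dimension maps a loop iteration variable.
Based on the set axises, map the loop iteration variables of output C. C uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input A. A uses the loop iteration variable i. The stride of i is 1, so i can be mapped to the col of A. For A, src_row_axises is {} and src_col_axises is {(i,1)}.
Based on the sets dst_row_axises and dst_col_axises, map the loop iteration variables of input B. B uses the loop iteration variable i. The stride of i is 1, so i cannot be mapped to the row or the col of B. For B, src_row_axises is {} and src_col_axises is {}.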
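Optionally, for a reduce instruction, step S302 comprises:
Step S901: generate a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
Step S902: obtain, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
Step S903: select from the loop iteration variable set the loop iteration variables of the input parameter;
Step S904: map the selected loop iteration variables of the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction;
Step S905: merge the loop iteration variable sets of all input parameters to obtain a joint loop iteration variable set of the input parameters;
Step S906: obtain, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
Step S907: select from the joint loop iteration variable set of the input parameters the loop iteration variables of the output parameter;
Step S908: map the selected loop iteration variables of the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction.
In steps S901 to S908, loop iteration variables are first mapped onto the input parameters; the union of the loop iteration variable sets of all input parameters then serves as the set from which the output parameter selects loop iteration variables; finally, the selected loop iteration variables are mapped onto the output parameter.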
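The reduce flow of steps S901 to S908 can be sketched as below: inputs are mapped first, their sets are merged (keeping the largest stride for a repeated variable, as the worked example states), and the output then selects from the merged sets. The function names and the list encoding are assumptions made for illustration. The sample data is the veemacc case with i of stride 10 and k of stride 1; the assumption that both A and B index with i and k is inferred from the mapping results in the text, since Figure 10a is not reproduced here.

```python
MAX = "max"  # placeholder for the "max" entry of RowTable (assumed encoding)


def map_axes(candidates, used, row_type, col_type):
    """Map the used loop variables onto row/column sets for one parameter."""
    row_axes, col_axes = [], []
    for name, stride in candidates:
        if name not in used:
            continue
        if stride != 1 and row_type != 0:
            row_axes.append((name, stride))
        elif stride == 1 and col_type != 0:
            col_axes.append((name, stride))
    return row_axes, col_axes


def union_axes(axis_sets):
    """Union of (name, stride) sets; a repeated name keeps its largest stride."""
    merged = {}
    for axes in axis_sets:
        for name, stride in axes:
            merged[name] = max(stride, merged.get(name, stride))
    return list(merged.items())


def map_reduce(axises, uses, row_table, col_table):
    """Reduce flow: index 0 is the output, the remaining indices are inputs."""
    # S902-S904: map every input from the full variable set.
    srcs = [
        map_axes(axises, uses[k], row_table[k], col_table[k])
        for k in range(1, len(uses))
    ]
    # S905: joint sets over all inputs.
    union_row = union_axes([row for row, _ in srcs])
    union_col = union_axes([col for _, col in srcs])
    # S906-S908: the output selects from the joint sets only.
    dst = map_axes(union_row + union_col, uses[0], row_table[0], col_table[0])
    return srcs, (union_row, union_col), dst


axises = [("i", 10), ("k", 1)]
uses = [{"k"}, {"i", "k"}, {"i", "k"}]   # C, A, B (A/B usage is an assumption)

dimh = map_reduce(axises, uses, (0, MAX, MAX), (1, 1, 1))   # veemacc_mm_dimh
dimw = map_reduce(axises, uses, (1, MAX, MAX), (0, 1, 1))   # veemacc_mm_dimw
```

For veemacc_mm_dimh the output C picks up k on its column; for veemacc_mm_dimw C ends up with both sets empty.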
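Figure 10a shows another example of a scalar code block. It contains the instruction veemacc, whose loop iteration variables consist of i and k, where k has stride 1 and i has stride 10; the loop iteration variable set generated from the obtained loop iteration variables is therefore axises: {(i,10),(k,1)}.
In this embodiment, the instruction configuration table shown in Figure 10b is used; it contains the vector instructions veemacc_mm, veemacc_mm_dimh, veemacc_mm_dimw, veemacc_mv_dimh, veemacc_mv_dimw, veemacc_mf_dimh and veemacc_mf_dimw.
For the vector instruction veemacc_mm:
veemacc_mm has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{0,max,max}       //C,A,B
ColTable:{0,1,1}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [0,0]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [max,1].
The mapping rule for both A and B is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for C is:
Neither the row dimension nor the column dimension maps a loop iteration variable.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of i is 10, so i can be mapped to the row of B; the stride of k is 1, so k can be mapped to the col of B. For B, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, and k cannot be mapped to the row or the col of C. For C, dst_row_axises is {} and dst_col_axises is {}.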
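For the vector instruction veemacc_mm_dimh:
veemacc_mm_dimh has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{0,max,max}       //C,A,B
ColTable:{1,1,1}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [0,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [max,1].
The mapping rule for both A and B is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for C is:
The row dimension maps no loop iteration variable, and the column dimension maps loop iteration variables whose stride is 1.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of i is 10, so i can be mapped to the row of B; the stride of k is 1, so k can be mapped to the col of B. For B, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, so k can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(k,1)}.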
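For the vector instruction veemacc_mm_dimw:
veemacc_mm_dimw has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{1,max,max}       //C,A,B
ColTable:{0,1,1}           //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [1,0]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [max,1].
The mapping rule for both A and B is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for C is:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps no loop iteration variable.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of i is 10, so i can be mapped to the row of B; the stride of k is 1, so k can be mapped to the col of B. For B, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, and k cannot be mapped to the row or the col of C. For C, dst_row_axises is {} and dst_col_axises is {}.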
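For the vector instruction veemacc_mv_dimh:
veemacc_mv_dimh has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{0,max,1}       //C,A,B
ColTable:{1,1,0}         //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [0,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [1,0].
The mapping rule for A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps no loop iteration variable.
The mapping rule for C is:
The row dimension maps no loop iteration variable, and the column dimension maps loop iteration variables whose stride is 1.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of i is 10, so i can be mapped to the row of B. For B, src_row_axises is {(i,10)} and src_col_axises is {}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, so k can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(k,1)}.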
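For the vector instruction veemacc_mv_dimw:
veemacc_mv_dimw has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{1,max,0}       //C,A,B
ColTable:{0,1,1}         //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [1,0]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [0,1].
The mapping rule for A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
The row dimension maps no loop iteration variable, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for C is:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps no loop iteration variable.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of k is 1, so k can be mapped to the col of B. For B, src_row_axises is {} and src_col_axises is {(k,1)}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, and k cannot be mapped to the row or the col of C. For C, dst_row_axises is {} and dst_col_axises is {}.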
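For the vector instruction veemacc_mf_dimh:
veemacc_mf_dimh has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{0,max,0}       //C,A,B
ColTable:{1,1,0}         //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [0,1]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [0,0].
The mapping rule for A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
Neither the row dimension nor the column dimension maps a loop iteration variable.
The mapping rule for C is:
The row dimension maps no loop iteration variable, and the column dimension maps loop iteration variables whose stride is 1.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. No loop iteration variable can be mapped to the row or the col of B. For B, src_row_axises is {} and src_col_axises is {}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, so k can be mapped to the col of C. For C, dst_row_axises is {} and dst_col_axises is {(k,1)}.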
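For the vector instruction veemacc_mf_dimw:
veemacc_mf_dimw has 2 inputs and 1 output; its parameter types in the instruction configuration table are:
RowTable:{1,max,0}       //C,A,B
ColTable:{0,1,0}         //C,A,B
The type of output parameter C is [RowTable[0],ColTable[0]], i.e. [1,0]. The type of input parameter A is [RowTable[1],ColTable[1]], i.e. [max,1]. The type of input parameter B is [RowTable[2],ColTable[2]], i.e. [0,0].
The mapping rule for A is therefore:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps loop iteration variables whose stride is 1.
The mapping rule for B is:
Neither the row dimension nor the column dimension maps a loop iteration variable.
The mapping rule for C is:
The row dimension maps loop iteration variables whose stride is not 1, and the column dimension maps no loop iteration variable.
Based on the set axises, map the loop iteration variables of input A. The stride of i is 10, so i can be mapped to the row of A; the stride of k is 1, so k can be mapped to the col of A. For A, src_row_axises is {(i,10)} and src_col_axises is {(k,1)}.
Based on the set axises, map the loop iteration variables of input B. The stride of k is 1, and k cannot be mapped to the row or the col of B. For B, src_row_axises is {} and src_col_axises is {}.
Take the union of the row-direction and col-direction loop iteration variable sets of all input parameters; where the same loop iteration variable appears more than once, the largest stride is kept. This gives union_row_axises = {(i,10)} and union_col_axises = {(k,1)}.
Based on the sets union_row_axises and union_col_axises, map the loop iteration variables of output C. C uses the loop iteration variable k. The stride of k is 1, and k cannot be mapped to the row or the col of C. For C, dst_row_axises is {} and dst_col_axises is {}.
The different mapping procedures described above for reduce and non-reduce instructions yield the result of mapping the scalar instruction to vector instructions.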
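Returning to Figure 3, the method for identifying vector instructions further comprises:
Step S303: judge the validity of the vector instruction according to the vector instruction dimension validity rule, where the vector instruction dimension validity rule comprises rules for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is successfully identified.
After the scalar instruction has been mapped to the vector instructions in the instruction configuration table, it is not yet known whether each mapped vector instruction is valid; in this step, the vector instruction dimension validity rule is therefore used to determine which vector instructions are valid.
Optionally, step S303 comprises:
determining the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the parameter; or,
determining the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the output parameter and the dimensions and strides of the loop iteration variables in the loop iteration variable set of the input parameter.
Optionally, the vector instruction dimension validity rule differs for different data types.
When the data type is 1*1-type data, there is no constraint on the loop iteration variables mapped to the row dimension or the column dimension;
when the data type is 1*n-type data, there is no constraint on the loop iteration variables mapped to the row dimension, and the column dimension must have a mapped loop iteration variable;
when the data type is m*1-type data, the row dimension must have a mapped loop iteration variable, and there is no constraint on the loop iteration variables mapped to the column dimension;
when the data type is m*n-type data, there is no constraint on the loop iteration variables mapped to the row dimension, and the column dimension must have a mapped loop iteration variable.
In addition, the validity rules for the row dimension and the column dimension of the input parameters further include:
when the loop iteration variables mapped to the row dimension (respectively, the column dimension) of all input parameters are non-empty, their intersection over that dimension must be non-empty.
A reduce-type instruction must additionally satisfy the following validity rules:
a loop iteration variable that is processed by the reduce and used in an input parameter must be successfully mapped to the row dimension or the column dimension;
the dimension to which a reduced loop iteration variable is mapped in an input parameter must not have any loop iteration variable mapped to it in the output parameter.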
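The validity rules above can be sketched as three small checks. This is one reading of the rules, offered as an illustration only: the function names, the `(row_type, col_type)` encoding and the `reduced` argument (the set of loop variables the instruction reduces over) are assumptions, and the sample sets are copied from the veemacc walkthrough with i as the reduced variable.

```python
MAX = "max"  # placeholder for the "max" entry of RowTable (assumed encoding)


def type_ok(row_type, col_type, row_axes, col_axes):
    """Per-type rule: which dimension must carry a mapped loop variable."""
    if row_type == 0 and col_type == 0:      # 1*1: no constraint
        return True
    if row_type == 0:                        # 1*n: column must be mapped
        return bool(col_axes)
    if col_type == 0:                        # m*1: row must be mapped
        return bool(row_axes)
    return bool(col_axes)                    # m*n: column must be mapped


def inputs_overlap(axis_sets):
    """If every input maps something on a dimension, the names must intersect."""
    if not axis_sets or not all(axis_sets):
        return True
    names = [{name for name, _ in axes} for axes in axis_sets]
    return bool(set.intersection(*names))


def reduce_ok(reduced, inputs, input_uses, dst):
    """Reduce rules: reduced variables must be mapped in each input that uses
    them, and the output must leave that same dimension unmapped."""
    for (row, col), used in zip(inputs, input_uses):
        for var in reduced & used:
            in_row = var in {name for name, _ in row}
            in_col = var in {name for name, _ in col}
            if not (in_row or in_col):
                return False
            if (in_row and dst[0]) or (in_col and dst[1]):
                return False
    return True


def is_valid(types, mapped, uses, reduced=frozenset()):
    """types, mapped and uses are ordered output first, then the inputs."""
    if not all(type_ok(rt, ct, r, c) for (rt, ct), (r, c) in zip(types, mapped)):
        return False
    inputs = mapped[1:]
    if not inputs_overlap([r for r, _ in inputs]):
        return False
    if not inputs_overlap([c for _, c in inputs]):
        return False
    return reduce_ok(set(reduced), inputs, uses[1:], mapped[0])


A = ([("i", 10)], [("k", 1)])
uses = [{"k"}, {"i", "k"}, {"i", "k"}]       # C, A, B (A/B usage is an assumption)
mm_dimh = is_valid([(0, 1), (MAX, 1), (MAX, 1)], [([], [("k", 1)]), A, A], uses, {"i"})
mm_dimw = is_valid([(1, 0), (MAX, 1), (MAX, 1)], [([], []), A, A], uses, {"i"})
mf_dimh = is_valid([(0, 1), (MAX, 1), (0, 0)], [([], [("k", 1)]), A, ([], [])], uses, {"i"})
```

With these inputs, `mm_dimh` is accepted, `mm_dimw` fails the m*1 rule on C, and `mf_dimh` fails because B uses the reduced variable i without mapping it.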
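Thus, by the rules above: among the vector instructions in the example of steps S501 to S504, veadd_mv_dimw is an invalid vector instruction, because B is m*1-type data, whose validity rule is that the row dimension must have a mapped loop iteration variable while the column dimension is unconstrained; however, no loop iteration variable is mapped to the row dimension in B's sets, so veadd_mv_dimw is invalid. The other vector instructions all satisfy the validity rules and are therefore valid.
Similarly, among the vector instructions in the example of steps S701 to S707, veadd_mv_dimw is an invalid vector instruction, because B is m*1-type data, whose validity rule is that the row dimension must have a mapped loop iteration variable while the column dimension is unconstrained. For B, src_row_axises is {} and src_col_axises is {}, i.e. no loop iteration variable is mapped to B's row dimension, so the validity rule is not satisfied and veadd_mv_dimw is invalid. The other vector instructions all satisfy the validity rules and are therefore valid.
Likewise, among the vector instructions in the example of steps S901 to S908, veemacc_mm_dimw is an invalid instruction, because C is m*1-type data, whose validity rule is that the row dimension must have a mapped loop iteration variable while the column dimension is unconstrained. For C, dst_row_axises is {} and dst_col_axises is {}; since no loop iteration variable is mapped to C's row dimension, the validity rule is not satisfied and veemacc_mm_dimw is invalid.
veemacc_mv_dimw is an invalid instruction, because C is m*1-type data, whose validity rule is that the row dimension must have a mapped loop iteration variable while the column dimension is unconstrained. For C, dst_row_axises is {} and dst_col_axises is {}; since no loop iteration variable is mapped to C's row dimension, the validity rule is not satisfied and veemacc_mv_dimw is invalid.
veemacc_mf_dimh is an invalid instruction, because B uses the reduced loop iteration variable i, and the validity rule requires that a reduced loop iteration variable used in an input parameter be successfully mapped to the row dimension or the column dimension. For B, however, src_row_axises is {} and src_col_axises is {}: neither the row nor the col of B has a mapped axis, so the validity rule is not satisfied and veemacc_mf_dimh is invalid.
veemacc_mf_dimw is an invalid instruction, for two reasons. First, C is m*1-type data, whose validity rule is that the row dimension must have a mapped loop iteration variable while the column dimension is unconstrained; for C, dst_row_axises is {} and dst_col_axises is {}, so no loop iteration variable is mapped to C's row dimension. Second, B uses the reduced loop iteration variable i, and the validity rule requires that a reduced loop iteration variable used in an input parameter be successfully mapped to the row dimension or the column dimension; for B, src_row_axises is {} and src_col_axises is {}, so neither the row nor the col of B has a mapped axis. The validity rules are not satisfied, and veemacc_mf_dimw is therefore invalid.
The other vector instructions all satisfy the validity rules.
Finally, the valid vector instructions are retained as the successfully identified vector instructions. In the embodiments above, veadd_mm, veadd_mv_dimh and veadd_mf are successfully identified, i.e. the scalar instruction veadd can be converted into any one of veadd_mm, veadd_mv_dimh and veadd_mf; veemacc_mm, veemacc_mm_dimh and veemacc_mv_dimh are successfully identified, i.e. the scalar instruction veemacc can be converted into any one of veemacc_mm, veemacc_mm_dimh and veemacc_mv_dimh.
With the above scheme, scalar instructions can be converted into corresponding vector instructions, improving the efficiency with which the hardware chip processes data. In addition, the instruction configuration file makes it possible to extend the set of vector instructions that the identification method supports.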
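An embodiment of the present disclosure discloses a method for identifying vector instructions, comprising: obtaining a scalar instruction; mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, where the vector instruction dimension mapping rule comprises rules for mapping a scalar instruction to a vector instruction and the instruction configuration table comprises a set of vector instructions; judging the validity of the vector instruction according to a vector instruction dimension validity rule, where the vector instruction dimension validity rule comprises rules for verifying the validity of the instruction mapping result; and, if the vector instruction is valid, treating the vector instruction as successfully identified. By identifying vector instructions through the vector instruction dimension mapping rule, the instruction configuration table and the vector instruction dimension validity rule, the method solves the technical problems of the prior art that converting scalar instructions to vector instructions is complex, lacks generality and cannot distinguish between vector instructions.
An embodiment of the present disclosure further provides an apparatus for identifying vector instructions, comprising: an instruction obtaining module, configured to obtain a scalar instruction; a mapping module, configured to map the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table, where the vector instruction dimension mapping rule comprises rules for mapping a scalar instruction to a vector instruction and the instruction configuration table comprises a set of vector instructions; and an identification and verification module, configured to judge the validity of the vector instruction according to a vector instruction dimension validity rule, where the vector instruction dimension validity rule comprises rules for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is successfully identified.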
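Further, the vector instruction dimension mapping rule comprises:
the data types of the parameters in the vector instruction;
rules for mapping the loop iteration variables in the scalar instruction to the parameters in the vector instruction.
Further, when the data type of the parameter is 1*1-type data, the mapping rule is that the row dimension of the parameter maps no loop iteration variable and the column dimension of the parameter maps no loop iteration variable;
when the data type of the parameter is 1*n-type data, the mapping rule is that the row dimension of the parameter maps no loop iteration variable and the column dimension of the parameter maps loop iteration variables whose stride is 1;
when the data type of the parameter is m*1-type data, the mapping rule is that the row dimension of the parameter maps loop iteration variables whose stride is not 1 and the column dimension of the parameter maps no loop iteration variable;
when the data type of the parameter is m*n-type data, the mapping rule is that the row dimension of the parameter maps loop iteration variables whose stride is not 1 and the column dimension of the parameter maps loop iteration variables whose stride is 1.
Further, the vector instruction dimension validity rule comprises:
when the data type of the parameter is 1*1-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension or the column dimension;
when the data type of the parameter is 1*n-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension, and the parameter has a mapped loop iteration variable on the column dimension;
when the data type of the parameter is m*1-type data, the parameter has a mapped loop iteration variable on the row dimension, and there is no constraint on the loop iteration variables the parameter maps on the column dimension;
when the data type of the parameter is m*n-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension, and the parameter has a mapped loop iteration variable on the column dimension;
when the vector instruction is an instruction that eliminates a loop iteration variable, the parameter has a mapped loop iteration variable on the row dimension or on the column dimension.
Further, the instruction configuration table comprises:
all vector instructions to be identified and the data types of the parameters in each of the vector instructions.
Further, the data type of a parameter in the vector instruction is constrained by the strides of that data type, where:
when the data type is 1*1-type data, its stride constraint is a stride of 0 on the row dimension and a stride of 0 on the column dimension;
when the data type is 1*n-type data, its stride constraint is a stride of 0 on the row dimension and a stride of 1 on the column dimension;
when the data type is m*1-type data, its stride constraint is a stride of 1 on the row dimension and a stride of 0 on the column dimension;
when the data type is m*n-type data, its stride constraint is a stride of max on the row dimension and a stride of 1 on the column dimension, where max denotes the largest value the register can represent.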
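The four stride constraints above amount to a lookup from a (row stride, column stride) pair to a data type. A minimal sketch follows; the function names, the string labels and the `"max"` placeholder are assumptions made for illustration.

```python
MAX = "max"  # stands for the largest value the register can represent

# (row stride, column stride) -> data type, as listed above.
STRIDE_TO_TYPE = {
    (0, 0): "1*1",
    (0, 1): "1*n",
    (1, 0): "m*1",
    (MAX, 1): "m*n",
}


def data_type(row_stride, col_stride):
    """Return the data type named by one RowTable/ColTable pair."""
    try:
        return STRIDE_TO_TYPE[(row_stride, col_stride)]
    except KeyError:
        raise ValueError(f"unsupported stride pair: {(row_stride, col_stride)}")


def param_types(row_table, col_table):
    """Decode every parameter of a configuration-table entry."""
    return [data_type(r, c) for r, c in zip(row_table, col_table)]


# veadd_mv_dimw: RowTable {max,max,1}, ColTable {1,1,0} for C, A, B.
veadd_mv_dimw_types = param_types((MAX, MAX, 1), (1, 1, 0))
```

Keeping the table as data rather than branching logic matches the stated goal of extending the supported instructions through configuration.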
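Further, the mapping module is also configured to:
obtain the vector instructions in the instruction configuration table;
determine the data types of the parameters of the vector instruction;
obtain the loop iteration variables in the scalar instruction;
map the loop iteration variables to the parameters of the vector instruction according to the data types of the parameters of the vector instruction.
Further, the mapping module is also configured to:
generate a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
obtain, according to the data type of a parameter of the vector instruction, the vector instruction dimension mapping rule corresponding to that data type;
select from the loop iteration variable set the loop iteration variables used by the parameter of the vector instruction;
map the selected loop iteration variables to the row dimension and the column dimension of the parameter according to the vector instruction dimension mapping rule, obtaining the loop iteration variable set of the parameter of the vector instruction.
Further, when the vector instruction is an instruction that does not eliminate a loop iteration variable, the mapping module is also configured to:
generate a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
obtain, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
select from the loop iteration variable set the loop iteration variables used by the output parameter;
map the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction;
obtain, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
select from the loop iteration variable set of the output parameter the loop iteration variables used by the input parameter;
map the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction.
Further, when the vector instruction is an instruction that eliminates a loop iteration variable, the mapping module is also configured to:
generate a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
obtain, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
select from the loop iteration variable set the loop iteration variables used by the input parameter;
map the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction;
merge the loop iteration variable sets of all input parameters to obtain a joint loop iteration variable set of the input parameters;
obtain, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
select from the joint loop iteration variable set of the input parameters the loop iteration variables used by the output parameter;
map the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction.
Further, the identification and verification module is also configured to:
determine the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the parameter; or,
determine the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the output parameter and the dimensions and strides of the loop iteration variables in the loop iteration variable set of the input parameter.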
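An embodiment of the present disclosure further provides an electronic device, comprising: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions, such that the processors, when running, implement any of the methods of the embodiments.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform any of the methods of the foregoing embodiments.
An embodiment of the present disclosure further provides a computer program product comprising computer instructions; when the computer instructions are executed by a computing device, the computing device can perform any of the methods of the foregoing embodiments.
The flowcharts and block diagrams in the drawings of the present disclosure illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.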

Claims (15)

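  1. A method for identifying vector instructions, characterized by comprising:
    obtaining a scalar instruction;
    mapping the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule comprises rules for mapping a scalar instruction to a vector instruction; and the instruction configuration table comprises a set of vector instructions;
    judging the validity of the vector instruction according to a vector instruction dimension validity rule; wherein the vector instruction dimension validity rule comprises rules for verifying the validity of the instruction mapping result;
    if the vector instruction is valid, the vector instruction is successfully identified.
  2. The method for identifying vector instructions according to claim 1, characterized in that the vector instruction dimension mapping rule comprises:
    the data types of the parameters in the vector instruction;
    rules for mapping the loop iteration variables in the scalar instruction to the parameters in the vector instruction.
  3. The method for identifying vector instructions according to claim 2, characterized in that:
    when the data type of the parameter is 1*1-type data, the mapping rule is that the row dimension of the parameter maps no loop iteration variable and the column dimension of the parameter maps no loop iteration variable;
    when the data type of the parameter is 1*n-type data, the mapping rule is that the row dimension of the parameter maps no loop iteration variable and the column dimension of the parameter maps loop iteration variables whose stride is 1;
    when the data type of the parameter is m*1-type data, the mapping rule is that the row dimension of the parameter maps loop iteration variables whose stride is not 1 and the column dimension of the parameter maps no loop iteration variable;
    when the data type of the parameter is m*n-type data, the mapping rule is that the row dimension of the parameter maps loop iteration variables whose stride is not 1 and the column dimension of the parameter maps loop iteration variables whose stride is 1.
  4. The method for identifying vector instructions according to claim 2 or 3, characterized in that the vector instruction dimension validity rule comprises:
    when the data type of the parameter is 1*1-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension or the column dimension;
    when the data type of the parameter is 1*n-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension, and the parameter has a mapped loop iteration variable on the column dimension;
    when the data type of the parameter is m*1-type data, the parameter has a mapped loop iteration variable on the row dimension, and there is no constraint on the loop iteration variables the parameter maps on the column dimension;
    when the data type of the parameter is m*n-type data, there is no constraint on the loop iteration variables the parameter maps on the row dimension, and the parameter has a mapped loop iteration variable on the column dimension;
    when the vector instruction is an instruction that eliminates a loop iteration variable, the parameter has a mapped loop iteration variable on the row dimension or on the column dimension.
  5. The method for identifying vector instructions according to any one of claims 1-4, characterized in that the instruction configuration table comprises:
    all vector instructions to be identified and the data types of the parameters in each of the vector instructions.
  6. The method for identifying vector instructions according to claim 5, characterized in that:
    the data type of a parameter in the vector instruction is constrained by the strides of that data type; wherein,
    when the data type is 1*1-type data, its stride constraint is a stride of 0 on the row dimension and a stride of 0 on the column dimension;
    when the data type is 1*n-type data, its stride constraint is a stride of 0 on the row dimension and a stride of 1 on the column dimension;
    when the data type is m*1-type data, its stride constraint is a stride of 1 on the row dimension and a stride of 0 on the column dimension;
    when the data type is m*n-type data, its stride constraint is a stride of max on the row dimension and a stride of 1 on the column dimension; wherein max denotes the largest value the register can represent.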
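  7. The method for identifying vector instructions according to any one of claims 2-6, characterized in that the mapping of the scalar instruction to a vector instruction according to the vector instruction dimension mapping rule and the instruction configuration table comprises:
    obtaining the vector instructions in the instruction configuration table;
    determining the data types of the parameters of the vector instruction;
    obtaining the loop iteration variables in the scalar instruction;
    mapping the loop iteration variables to the parameters of the vector instruction according to the data types of the parameters of the vector instruction.
  8. The method for identifying vector instructions according to claim 7, characterized in that the mapping of the loop iteration variables to the parameters of the vector instruction according to the data types of the parameters of the vector instruction comprises:
    generating a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
    obtaining, according to the data type of a parameter of the vector instruction, the vector instruction dimension mapping rule corresponding to that data type;
    selecting from the loop iteration variable set the loop iteration variables used by the parameter of the vector instruction;
    mapping the selected loop iteration variables to the row dimension and the column dimension of the parameter according to the vector instruction dimension mapping rule, obtaining the loop iteration variable set of the parameter of the vector instruction.
  9. The method for identifying vector instructions according to claim 7, characterized in that, when the vector instruction is an instruction that eliminates a loop iteration variable, the mapping of the loop iteration variables to the parameters of the vector instruction according to the data types of the parameters of the vector instruction comprises:
    generating a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
    obtaining, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
    selecting from the loop iteration variable set the loop iteration variables used by the input parameter;
    mapping the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction;
    merging the loop iteration variable sets of all input parameters to obtain a joint loop iteration variable set of the input parameters;
    obtaining, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
    selecting from the joint loop iteration variable set of the input parameters the loop iteration variables used by the output parameter;
    mapping the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction.
  10. The method for identifying vector instructions according to claim 8 or 9, characterized in that the judging of the validity of the vector instruction according to the vector instruction dimension validity rule comprises:
    determining the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the parameter; or,
    determining the validity of the vector instruction according to the dimensions and strides of the loop iteration variables in the loop iteration variable set of the output parameter and the dimensions and strides of the loop iteration variables in the loop iteration variable set of the input parameter.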
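  11. The method for identifying vector instructions according to claim 7, characterized in that the vector instruction is an instruction that does not eliminate a loop iteration variable, and the mapping of the loop iteration variables to the parameters of the vector instruction according to the vector instruction dimension mapping rule comprises:
    generating a loop iteration variable set from the loop iteration variables in the scalar instruction and their strides;
    obtaining, according to the data type of the output parameter of the vector instruction, the vector instruction dimension mapping rule of the output parameter corresponding to that data type;
    selecting from the loop iteration variable set the loop iteration variables used by the output parameter;
    mapping the selected loop iteration variables used by the output parameter to the row dimension and the column dimension of the output parameter according to the output parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the output parameter of the vector instruction;
    obtaining, according to the data type of an input parameter of the vector instruction, the vector instruction dimension mapping rule of the input parameter corresponding to that data type;
    selecting from the loop iteration variable set of the output parameter the loop iteration variables used by the input parameter;
    mapping the selected loop iteration variables used by the input parameter to the row dimension and the column dimension of the input parameter according to the input parameter's vector instruction dimension mapping rule, obtaining the loop iteration variable set of the input parameter of the vector instruction.
  12. An apparatus for identifying vector instructions, comprising:
    an instruction obtaining module, configured to obtain a scalar instruction;
    a mapping module, configured to map the scalar instruction to a vector instruction according to a vector instruction dimension mapping rule and an instruction configuration table; wherein the vector instruction dimension mapping rule comprises rules for mapping a scalar instruction to a vector instruction; and the instruction configuration table comprises a set of vector instructions;
    an identification and verification module, configured to judge the validity of the vector instruction according to a vector instruction dimension validity rule; wherein the vector instruction dimension validity rule comprises rules for verifying the validity of the instruction mapping result; if the vector instruction is valid, the vector instruction is successfully identified.
  13. An electronic device, comprising: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions, such that the processors, when running, implement the method according to any one of claims 1-11.
  14. A computer-readable storage medium, the computer-readable storage medium being non-transitory and storing computer instructions, the computer instructions being used to cause a computer to perform the method according to any one of claims 1-11.
  15. A computer program product, comprising computer instructions; when the computer instructions are executed by a computing device, the computing device can perform the method according to any one of claims 1-11.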
PCT/CN2022/084890 2021-04-01 2022-04-01 Method and apparatus for identifying vector instructions, electronic device, and computer-readable storage medium WO2022206969A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110357565.5A CN115185524A (zh) 2021-04-01 2021-04-01 Method and apparatus for identifying vector instructions, electronic device, and computer-readable storage medium
CN202110357565.5 2021-04-01

Publications (1)

Publication Number Publication Date
WO2022206969A1 true WO2022206969A1 (zh) 2022-10-06

Family

ID=83458053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084890 WO2022206969A1 (zh) 2021-04-01 2022-04-01 Method and apparatus for identifying vector instructions, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115185524A (zh)
WO (1) WO2022206969A1 (zh)

Also Published As

Publication number Publication date
CN115185524A (zh) 2022-10-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22779149

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 130224)